Disclosure of Invention
(I) technical problem to be solved
In order to solve the above problems in the prior art, the present invention provides a remote sensing image scene classification method based on a convolutional neural network.
(II) technical scheme
In order to achieve the above object, the invention mainly adopts the following technical solution:
a remote sensing image scene classification method based on a convolutional neural network comprises the following steps:
S101, acquiring a remote sensing image to be classified;
S102, classifying the remote sensing image through a pre-trained convolutional neural network model;
S103, taking the classification result as the scene classification result of the remote sensing image;
the convolutional neural network model is as follows:
the first convolutional layer is sequentially connected with a BN layer, an activation function and a pooling layer;
the second convolutional layer is sequentially connected with a BN layer, an activation function and a pooling layer;
the third convolutional layer is sequentially connected with a BN layer, an activation function and a pooling layer;
the fourth convolutional layer is sequentially connected with a BN layer and an activation function;
the fifth convolutional layer is sequentially connected with a BN layer, an activation function and a pooling layer;
the sixth layer is a fully connected layer;
the seventh layer is a fully connected layer;
and the eighth layer is a fully connected layer.
Optionally, the step of training the convolutional neural network model comprises:
S201, constructing a sample remote sensing data set I = {I_1, I_2, …, I_n},
where I_i is the set of class-i sample remote sensing images, i is the class identifier, i = 1, 2, …, n, and n is the total number of sample remote sensing image classes;
s202, training a convolutional neural network model according to the sample remote sensing data set;
s203, calculating the whole loss function of the trained convolutional neural network model;
and S204, updating the convolutional neural network model parameters according to the overall loss function, and repeating S201 to S203 until the value of the overall loss function is minimized.
Optionally, the activation function is a ReLU function.
Optionally, the pooling layer employs a maximum pooling process.
Optionally, in the first convolutional layer, 96 convolution kernels of size 5 × 5 are defined, with a stride of 4;
in the second convolutional layer, 256 convolution kernels of size 3 × 3, with a stride of 1;
in the third convolutional layer, 256 convolution kernels of size 3 × 3, with a stride of 1;
in the fourth convolutional layer, 384 convolution kernels of size 3 × 3, with a stride of 1;
and in the fifth convolutional layer, 256 convolution kernels of size 3 × 3, with a stride of 1.
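For illustration, the convolutional layer specification above can be collected into a small configuration table. The following Python sketch is illustrative only and not part of the claimed method: the names CONV_LAYERS, FC_LAYERS and describe are ours, and the widths of the fully connected layers, which the text does not state, are omitted.

```python
# Hypothetical summary of the 8-layer model described above.
# The fully connected layer widths are not given in the text, so
# only the layer indices are recorded for layers 6-8.
CONV_LAYERS = [
    {"layer": 1, "kernels": 96,  "size": 5, "stride": 4, "pool": True},
    {"layer": 2, "kernels": 256, "size": 3, "stride": 1, "pool": True},
    {"layer": 3, "kernels": 256, "size": 3, "stride": 1, "pool": True},
    {"layer": 4, "kernels": 384, "size": 3, "stride": 1, "pool": False},
    {"layer": 5, "kernels": 256, "size": 3, "stride": 1, "pool": True},
]
FC_LAYERS = [6, 7, 8]  # three fully connected layers

def describe(cfg):
    """Render one convolutional layer spec as a one-line summary."""
    pool = " + max-pool" if cfg["pool"] else ""
    return (f"conv{cfg['layer']}: {cfg['kernels']} kernels "
            f"{cfg['size']}x{cfg['size']}, stride {cfg['stride']}"
            f" -> BN -> ReLU{pool}")

for cfg in CONV_LAYERS:
    print(describe(cfg))
```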
Optionally, the calculation formula of the BN layer is:

y_m = γ · (y_{m−1} − E(y_{m−1})) / √(V(y_{m−1}) + ε) + β

where m is the layer index, y_m is the input data of the m-th layer, E(y_{m−1}) and V(y_{m−1}) are the expectation and variance of the previous layer's output y_{m−1}, ε is a small constant for numerical stability, and γ and β are learnable parameters.
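As an illustration of the batch normalization a BN layer performs, the following pure-Python sketch normalizes a batch of scalar activations to zero mean and unit variance and then applies γ and β. This is a simplified assumption on our part: a real BN layer normalizes per channel over 4-D feature map tensors, and the function name batch_norm and the ε default are ours.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one batch of scalars: subtract the batch mean,
    divide by the batch standard deviation, then scale by gamma
    and shift by beta (the learnable parameters)."""
    n = len(batch)
    mean = sum(batch) / n                          # expectation E(y)
    var = sum((x - mean) ** 2 for x in batch) / n  # variance V(y)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([2.0, 4.0, 6.0, 8.0])  # zero mean, ~unit variance
```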
Optionally, the overall loss function E is:

E = − Σ_i Σ_j y_ij · log(out_ij)

where j is the image identifier within a class of sample remote sensing images, m is the total number of images in each class (j = 1, 2, …, m), out_ij is the predicted value, and y_ij is the actual label value of the j-th image in the class-i sample remote sensing images.
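The embodiment later identifies E as the cross entropy of the predicted values against the actual labels; a minimal sketch under that reading follows (the helper name cross_entropy and the eps guard against log(0) are ours).

```python
import math

def cross_entropy(pred, label):
    """E = -sum over i, j of y_ij * log(out_ij), where pred holds the
    predicted class distributions out_ij and label the actual values y_ij."""
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(p + eps)
                for row_p, row_y in zip(pred, label)
                for p, y in zip(row_p, row_y))

# A confident correct prediction yields a near-zero loss; a confident
# wrong prediction yields a large one.
good = cross_entropy([[0.0, 1.0]], [[0.0, 1.0]])
bad = cross_entropy([[0.9, 0.1]], [[0.0, 1.0]])
```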
Optionally, the convolutional neural network model parameter K = (w, b),
where w is the weight of the convolution kernels in the convolutional neural network model and b is the bias.
Optionally, the updating the convolutional neural network model parameters according to the overall loss function in S204 includes:
updating the convolutional neural network model parameters by the following formula:

K_i = K_i − α · ∂E/∂K_i

where α is the learning rate and K_i is the convolutional neural network model parameter corresponding to the class-i sample remote sensing images.
Optionally, b = 1.
(III) advantageous effects
The invention has the following beneficial effects: a remote sensing image to be classified is acquired; the remote sensing image is classified through a pre-trained convolutional neural network model; and the classification result is taken as the scene classification result of the remote sensing image. In the convolutional neural network model, the first, second, third and fifth convolutional layers are each sequentially connected with a BN layer, an activation function and a pooling layer; the fourth convolutional layer is sequentially connected with a BN layer and an activation function; and the sixth, seventh and eighth layers are fully connected layers. The BN layers perform batch normalization, which preserves the feature extraction capability while improving the convergence speed of the model.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
With the rapid development of remote sensing technology, high-resolution remote sensing images have become easier to acquire. A high-resolution remote sensing image contains complex nonlinear features, and its spectral information is both complementary and highly redundant, which makes feature extraction increasingly difficult; traditional remote sensing image scene classification methods, such as Bayesian models, decision tree classification and maximum likelihood classification, therefore struggle to obtain ideal classification results. Traditional classification methods can only extract shallow structural features and can hardly extract deeper, more complex feature information.
To solve this problem, the present embodiment provides a method, shown in fig. 1, which is implemented as follows:
and S101, obtaining the remote sensing image to be classified.
And S102, classifying the remote sensing image through a pre-trained convolutional neural network model.
And S103, taking the classification result as a scene classification result of the remote sensing image.
The convolutional neural network model is as follows:
the first convolution layer is connected with BN (batch normalization) layer, activation function and pooling layer in turn.
The second convolution layer is connected with the BN layer, the activation function and the pooling layer in sequence.
The third layer of convolution layer is connected with the BN layer, the activation function and the pooling layer in sequence.
The fourth layer is coiled and connected with BN layer and activation function in turn.
And the fifth layer of convolution layer is sequentially connected with the BN layer, the activation function and the pooling layer.
And the sixth fully connected layer.
And a seventh fully connected layer.
And the eighth layer is a full connecting layer.
Specifically, the activation function is a ReLU function, and the pooling layer adopts maximum pooling.
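As a sketch of the maximum pooling mentioned above, the following applies a 2 × 2 window with stride 2 to a 2-D feature map. The window size and stride are our assumption, since the text does not specify them.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 over a 2-D feature map
    given as a list of equal-length rows."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

# Each 2x2 block of the map collapses to its maximum value.
pooled = max_pool_2x2([[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12],
                       [13, 14, 15, 16]])
```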
In the first convolutional layer, 96 convolution kernels of size 5 × 5 are defined, with a stride of 4.
In the second convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1.
In the third convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1.
In the fourth convolutional layer, 384 convolution kernels of size 3 × 3 are defined, with a stride of 1.
In the fifth convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1.
The calculation formula of the BN layer is:

y_m = γ · (y_{m−1} − E(y_{m−1})) / √(V(y_{m−1}) + ε) + β

where m is the layer index, y_m is the input data of the m-th layer, E(y_{m−1}) and V(y_{m−1}) are the expectation and variance of the previous layer's output y_{m−1}, ε is a small constant for numerical stability, and γ and β are learnable parameters.
The step of training the convolutional neural network model comprises:
S201, constructing a sample remote sensing data set I = {I_1, I_2, …, I_n},
where I_i is the set of class-i sample remote sensing images, i is the class identifier, i = 1, 2, …, n, and n is the total number of sample remote sensing image classes.
S202, training the convolutional neural network model on the sample remote sensing data set.
S203, calculating the overall loss function of the trained convolutional neural network model.
S204, updating the convolutional neural network model parameters according to the overall loss function, and repeating S201 to S203 until the value of the overall loss function is minimized.
The overall loss function E is:

E = − Σ_i Σ_j y_ij · log(out_ij)

where j is the image identifier within a class of sample remote sensing images, m is the total number of images in each class (j = 1, 2, …, m), out_ij is the predicted value, and y_ij is the actual label value of the j-th image in the class-i sample remote sensing images.
The convolutional neural network model parameter K = (w, b), where w is the weight of the convolution kernels in the convolutional neural network model and b is the bias.
Optionally, b = 1.
In S204, updating the convolutional neural network model parameters according to the global loss function, including:
updating the parameters of the convolutional neural network model by the following formula:

K_i = K_i − α · ∂E/∂K_i

where α is the learning rate and K_i is the convolutional neural network model parameter corresponding to the class-i sample remote sensing images.
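The gradient descent update K_i ← K_i − α·∂E/∂K_i can be illustrated on a toy one-dimensional objective. This is purely illustrative, not the actual network training loop; the function gradient_descent and the quadratic example are ours.

```python
def gradient_descent(grad, k0, alpha=0.1, steps=100):
    """Repeatedly apply the update K <- K - alpha * dE/dK using a
    caller-supplied gradient function, starting from k0."""
    k = k0
    for _ in range(steps):
        k = k - alpha * grad(k)
    return k

# Toy objective E(K) = (K - 3)^2 with gradient dE/dK = 2*(K - 3);
# its minimum is at K = 3, which the iteration converges to.
k_star = gradient_descent(lambda k: 2 * (k - 3), k0=0.0)
```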
The remote sensing image scene classification method based on the convolutional neural network can effectively improve the classification accuracy and improve the model convergence rate.
The convolutional neural network model adopted in the method may be as shown in fig. 2: a BN layer is added after each convolutional layer to perform batch normalization and reduce the amount of computation; pooling layers are added after the 1st, 2nd, 3rd and 5th layers to reduce the dimensionality of each feature map, retaining the most important information while compressing the amount of data and parameters; the ReLU function is used as the activation function to prevent gradient saturation; and a mini-batch training method, combined with the normalization of the BN layers, accelerates the convergence of the model. The method combines the feature extraction advantages of the AlexNet model with the introduction of BN layers, thereby preserving the feature extraction capability while improving the convergence speed of the model.
In this embodiment, when the convolutional neural network model is constructed, the BN layer normalizes each batch of data in order to reduce the amount of computation and to prevent an excessive amount of input data from consuming system resources, which would otherwise cause problems such as insufficient memory, low efficiency, an unresponsive system or overly long run times. Normalizing the data in batches before it is fed into the neural network for training accelerates network convergence.
Compared with the traditional neural network construction method, the method adopted by the embodiment has the advantages that the accuracy is improved, the convergence rate of the network is greatly increased, and a new thought is provided for the remote sensing image classification method based on the convolutional neural network.
Referring to fig. 3, in the convolutional neural network model construction method provided in this embodiment, a sample remote sensing data set I = {I_1, I_2, …, I_n} is first constructed, where I_i is the set of class-i sample remote sensing images, i is the class identifier, i = 1, 2, …, n, and n is the total number of sample remote sensing image classes.
While constructing each I_i, the corresponding label values Y = {Y_1, Y_2, …, Y_n} are also produced, where Y_i is the set of labels of the class-i sample remote sensing images.
In addition, in order to validate the model, the sample remote sensing data set may be divided into a training set and a test set: for example, 80% of the samples are selected as training samples and 20% as test samples, and the model is tested once training is finished.
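The 80%/20% split can be sketched per class as follows; the file names and the helper split_class are hypothetical.

```python
import random

def split_class(images, train_frac=0.8, seed=0):
    """Randomly split one class's images into training and test subsets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = images[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# e.g. 100 images of one class -> 80 for training, 20 for testing
train_set, test_set = split_class([f"img_{j:03d}.tif" for j in range(100)])
```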
When the convolutional neural network model is trained, the model is trained with the training set, and the output value of the neural network is computed in a forward pass.
Specifically, an 8-layer network model as shown in fig. 2 is constructed, with 5 convolutional layers and 3 fully connected layers. The 1st, 2nd, 3rd and 5th convolutional layers are connected in the order convolutional layer → BN layer → activation function → pooling layer; the 4th layer has no pooling layer. The widely used and highly adaptable ReLU function serves as the activation function.
In the first convolutional layer, 96 convolution kernels of size 5 × 5 are defined, with a stride of 4; the pooling layer uses maximum pooling.
In the second convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1; the pooling layer uses maximum pooling.
In the third convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1; the pooling layer uses maximum pooling.
In the fourth convolutional layer, 384 convolution kernels of size 3 × 3 are defined, with a stride of 1; there is no pooling layer.
In the fifth convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1; the pooling layer uses maximum pooling.
The sixth layer is a fully connected layer.
The seventh layer is a fully connected layer.
The eighth layer is a fully connected layer.
Calculating the overall loss function of the neural network, reversely optimizing the loss function, using a gradient descent method to minimize the loss value, and updating the network parameters.
Specifically, each image is subjected to model calculation, a prediction tag value is output, and cross entropy is calculated according to the prediction tag value and an actual tag value and serves as a loss function.
The network parameters are updated using gradient descent as the optimization method, following the least-squares idea, so that the objective function, i.e., the loss function, is minimized. The calculation is as follows:

K_i = K_i − α · ∂E/∂K_i

where α is the learning rate and K_i is the convolutional neural network model parameter corresponding to the class-i sample remote sensing images.
After saving the model, verification is performed using the test set.
Specifically, the training is completed and the model is saved. The training set trains the model, so that the model can learn the capability of extracting the features; the test set verifies the trained model, and in principle, no intersection exists between the training set and the images in the test set, so that the accuracy of the result is ensured.
And inputting the images in the test set into the trained model, and calculating the loss value and the accuracy.
And finishing training of the convolutional neural network model in the remote sensing image scene classification method based on the convolutional neural network.
The following describes the training process of the model again, taking the training of the convolutional neural network model on the UC Merced Land-Use data set as an example.
1) Preparing data: the UC Merced Land-Use data set is used here; it contains 21 classes with 100 images per class.
2) Constructing a data set and a label:
constructing a multi-class remote sensing image data set I = {I_1, I_2, …, I_n},
where I_i is the set of class-i sample remote sensing images, i is the class identifier, i = 1, 2, …, n, and n is the total number of sample remote sensing image classes. While constructing each I_i, the corresponding label values Y = {Y_1, Y_2, …, Y_n} are also made, where Y_i is the set of labels of the class-i sample remote sensing images. Each I_i and each Y_i contains 100 sample data.
The data set is divided into a training set Train and a test set Test:
for example, using 80% of the data set as training samples, 80 remote sensing images are randomly selected from each I_i to construct the training set;
and using the remaining 20% of the data set as test samples, the other 20 remote sensing images of each I_i are taken to construct the test set.
3) Constructing the network model, training the model with the training set, and computing the output value of the neural network in a forward pass.
Network model structure: each convolutional layer is followed by a BN layer, and the ReLU function is used as the activation function.
In the first convolutional layer, 96 convolution kernels of size 5 × 5 are defined, with a stride of 4; the pooling layer uses maximum pooling.
In the second convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1; the pooling layer uses maximum pooling.
In the third convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1; the pooling layer uses maximum pooling.
In the fourth convolutional layer, 384 convolution kernels of size 3 × 3 are defined, with a stride of 1; there is no pooling layer.
In the fifth convolutional layer, 256 convolution kernels of size 3 × 3 are defined, with a stride of 1; the pooling layer uses maximum pooling.
The sixth layer is a fully connected layer.
The seventh layer is a fully connected layer.
The eighth layer is a fully connected layer.
In the calculation of the BN layer, the m-th BN layer uses the following formula:

y_m = γ · (y_{m−1} − E(y_{m−1})) / √(V(y_{m−1}) + ε) + β

where y_m is the input data of the m-th layer, E(y_{m−1}) and V(y_{m−1}) are the expectation and variance of the previous layer's output y_{m−1}, ε is a small constant for numerical stability, and γ and β are learnable parameters.
4) Calculating the overall loss function of the neural network, reversely optimizing the loss function, using a gradient descent method to minimize the loss value, and updating the network parameters.
The fully connected layers of the model output a feature vector F3 (the predicted value), which is combined with the actual label value of the input image to compute the cross entropy as the network loss function E:

E = − Σ_i Σ_j y_ij · log(out_ij)

where out_ij is the predicted value and y_ij is the actual label value of the j-th image in the class-i sample remote sensing images.
The model parameter is K = (w, b), where w is the weight of the convolution kernels in the convolutional neural network model and b is the bias, e.g. b = 1.
The parameter update formula is:

K_i = K_i − α · ∂E/∂K_i

where α is the learning rate and K_i is the convolutional neural network model parameter corresponding to the class-i sample remote sensing images.
5) The model is saved and verified using the test set.
Model training is finished and the model is saved. The test set is then input into the saved model, and each image is passed through the model to obtain a predicted value Pre, i.e., the classification result.
And calculating a loss function E according to the predicted value Pre and the Real label value Real of the image.
And comparing the predicted value Pre with the Real label value Real of the image to obtain the classification accuracy.
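Comparing the predicted values Pre against the real label values Real to obtain the classification accuracy can be sketched as follows (the helper name accuracy is ours).

```python
def accuracy(pred, real):
    """Fraction of predicted labels that match the real labels."""
    assert len(pred) == len(real) and pred, "label lists must match"
    return sum(p == r for p, r in zip(pred, real)) / len(real)

# e.g. 3 of 4 predictions correct -> accuracy 0.75
acc = accuracy([1, 2, 3, 2], [1, 2, 1, 2])
```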
In this example, three models were compared: method 1 is the method of the present invention, method 2 is the method of the present invention with the BN layer removed, and method 3 is the AlexNet model.
The convergence speed comparison of the three methods is shown in fig. 4.
Table 1 shows the comparison of the accuracy obtained by the three methods on the data set UC Merced Land-Use.
TABLE 1
In the method provided by this embodiment, a remote sensing image to be classified is acquired; the remote sensing image is classified through a pre-trained convolutional neural network model; and the classification result is taken as the scene classification result of the remote sensing image. In the convolutional neural network model, the first, second, third and fifth convolutional layers are each sequentially connected with a BN layer, an activation function and a pooling layer; the fourth convolutional layer is sequentially connected with a BN layer and an activation function; and the sixth, seventh and eighth layers are fully connected layers. The BN layers perform batch normalization, which preserves the feature extraction capability while improving the convergence speed of the model.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Finally, it should be noted that: the above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.