CN111008642A - High-resolution remote sensing image classification method and system based on convolutional neural network - Google Patents

High-resolution remote sensing image classification method and system based on convolutional neural network

Info

Publication number
CN111008642A
CN111008642A
Authority
CN
China
Prior art keywords
image
convolutional neural
remote sensing
neural network
segmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911024275.8A
Other languages
Chinese (zh)
Inventor
张晓东
陈关州
谢义娟
王庆
戴凡
龚元夫
朱坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Furuier Technology Co Ltd
Original Assignee
Hubei Furuier Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Furuier Technology Co Ltd filed Critical Hubei Furuier Technology Co Ltd
Priority to CN201911024275.8A priority Critical patent/CN111008642A/en
Publication of CN111008642A publication Critical patent/CN111008642A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a high-resolution remote sensing image classification method and system based on a convolutional neural network. The method comprises the following steps: first, a remote sensing image is preprocessed and the preprocessed image is segmented; each segmented image is then processed into an image object in the fixed-size input format acceptable to a convolutional neural network; a dual-input parallel convolutional neural network is then constructed and trained to obtain a trained network; finally, the trained network is used to classify actual remote sensing images. The beneficial effects of the invention are: irregular image objects are handled with a minimum-bounding-rectangle mask, so that the geometric shape information of an image object is learned separately from its spectral, color and other information; a multi-input parallel convolutional neural network is provided; and errors caused by manual intervention are reduced, improving the degree of automation of remote sensing image classification.

Description

High-resolution remote sensing image classification method and system based on convolutional neural network
Technical Field
The invention relates to the technical field of remote sensing image processing and information extraction, in particular to a high-resolution remote sensing image classification method and system based on a convolutional neural network.
Background
Remote sensing images differ from natural scene images in several ways: 1) remote sensing images contain richer spectral bands, whereas natural images generally contain only three bands; 2) a single ground object usually occupies a small fraction of a remote sensing image, while a single object usually occupies a large fraction of a natural scene image; 3) remote sensing images are generally captured looking vertically downward, while the viewing angle of natural scene images varies; 4) ground objects in remote sensing images can differ greatly in scale, whereas the scale differences among objects in natural scene images are small. These similarities and differences mean that deep learning methods developed for natural scene image processing can be borrowed for remote sensing image processing, but corresponding adjustments are needed to suit the characteristics of remote sensing images.
Deep learning techniques have achieved results in object detection on natural scene images that traditional methods find difficult to match. In view of this, how to apply deep learning to remote sensing imagery, build a deep-learning-based model for detecting ground-object targets in remote sensing images, improve working efficiency and precision, reduce manual operation, and realize fully automatic ground-object detection in remote sensing images has become a hot research topic.
Disclosure of Invention
In order to solve the above problems, the invention provides a high-resolution remote sensing image classification method and system based on a convolutional neural network.
a high-resolution remote sensing image classification method based on a convolutional neural network mainly comprises the following steps:
s101: acquiring a remote sensing image for training, and preprocessing the remote sensing image to obtain a preprocessed remote sensing image;
s102: segmenting the preprocessed remote sensing image by using a graph theory-based minimum spanning tree segmentation method so as to segment the preprocessed remote sensing image into a plurality of segmented images with irregular shapes and inconsistent sizes;
s103: processing the plurality of segmented images respectively to process all the segmented images into image objects in a fixed-size input format acceptable by a convolutional neural network;
s104: constructing a double-input parallel convolutional neural network, and training it by using the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network as training data, so as to obtain a trained double-input parallel convolutional neural network;
s105: and classifying the actual remote sensing images by adopting the trained double-input parallel convolutional neural network.
Further, in step S101, the specific method of preprocessing is as follows:
and scaling the shortest edge of the remote sensing image to 600 pixels, and simultaneously carrying out normalization processing on the pixel value of the remote sensing image to obtain the preprocessed remote sensing image.
Further, in step S103, for a certain segmented image, the processing method specifically includes:
firstly, segmenting the segmented image by using the minimum circumscribed rectangle of the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and then compressing the minimum circumscribed rectangle object to a preset size in an equal proportion so as to process the minimum circumscribed rectangle object into an image object with a fixed size input format acceptable by a convolutional neural network;
all the segmented images are processed by adopting the method, so that all the segmented images are processed into image objects in a fixed-size input format acceptable by a convolutional neural network.
Further, the minimum circumscribed rectangle object comprises two kinds, namely a binary mask image and an image without a mask; the minimum circumscribed rectangle of the segmented image is used for segmenting the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and the method specifically comprises the following steps:
1) intercepting the segmentation image by using the minimum circumscribed rectangle without the mask, namely filling the area of the minimum circumscribed rectangle with the original value to obtain the image without the mask of the segmentation image;
2) and intercepting the segmentation image by using the minimum circumscribed rectangle with the mask, and carrying out binarization on the result, namely using white filling for the object region and using zero filling for other regions to obtain a binary mask image of the segmentation image.
Further, in step S104, the constructed double-input parallel convolutional neural network has two parallel input branches: one branch takes binary mask data as input and the other takes unmasked data as input. Each branch first learns features from its own input; the features learned by the two branches are then fused to learn more abstract features, and a classification result is finally output. The binary mask data comprise a plurality of different binary mask images, and the unmasked data comprise a plurality of different unmasked images.
Further, in step S104, the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network are used as training data; when the dual-input parallel convolutional neural network is trained, the activation function is the ReLU; the convolution kernel size is 3 × 3 with a stride of 1; the pooling window size is 2 × 2 with a stride of 2; and the network optimization algorithm is Adadelta.
Further, a high-resolution remote sensing image classification system based on a convolutional neural network comprises the following modules:
the remote sensing image acquisition module is used for acquiring a remote sensing image for training and preprocessing the remote sensing image to obtain a preprocessed remote sensing image;
the image segmentation module is used for segmenting the preprocessed remote sensing image by using a graph theory-based minimum spanning tree segmentation method so as to segment the preprocessed remote sensing image into a plurality of segmented images with irregular shapes and inconsistent sizes;
the image processing module is used for respectively processing the plurality of segmented images so as to process all the segmented images into image objects with a fixed size input format acceptable by a convolutional neural network;
the network construction module is used for constructing a double-input parallel convolutional neural network and training it, using the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network as training data, to obtain the trained double-input parallel convolutional neural network;
and the actual classification module is used for classifying the actual remote sensing image by adopting the trained double-input parallel convolutional neural network.
Further, in the remote sensing image acquisition module, a specific method of preprocessing is as follows:
and scaling the shortest edge of the remote sensing image to 600 pixels, and simultaneously carrying out normalization processing on the pixel value of the remote sensing image to obtain the preprocessed remote sensing image.
Further, in the image processing module, for a certain segmented image, the processing method specifically includes:
firstly, segmenting the segmented image by using the minimum circumscribed rectangle of the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and then compressing the minimum circumscribed rectangle object to a preset size in an equal proportion so as to process the minimum circumscribed rectangle object into an image object with a fixed size input format acceptable by a convolutional neural network;
all the segmented images are processed by adopting the method, so that all the segmented images are processed into image objects in a fixed-size input format acceptable by a convolutional neural network.
Further, the minimum circumscribed rectangle object comprises two kinds, namely a binary mask image and an image without a mask; the minimum circumscribed rectangle of the segmented image is used for segmenting the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and the method specifically comprises the following steps:
1) intercepting the segmentation image by using the minimum circumscribed rectangle without the mask, namely filling the area of the minimum circumscribed rectangle with the original value to obtain the image without the mask of the segmentation image;
2) and intercepting the segmentation image by using the minimum circumscribed rectangle with the mask, and carrying out binarization on the result, namely using white filling for the object region and using zero filling for other regions to obtain a binary mask image of the segmentation image.
The technical scheme provided by the invention has the following beneficial effects:
(1) Irregular image objects are processed with a minimum-bounding-rectangle mask, so that the geometric shape information of an image object is learned separately from its spectral, color and other information.
(2) A multi-input parallel convolutional neural network is provided: one branch receives masked data and learns object shape information; the other receives unmasked data and learns information about the object other than its shape.
(3) Feature extraction and classification, the two stages of object-oriented remote sensing image classification, are integrated into one model, which reduces errors caused by manual intervention and improves the degree of automation of remote sensing image classification.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of a high resolution remote sensing image classification method based on a convolutional neural network according to an embodiment of the present invention;
FIG. 2(a) is a schematic diagram of an original remote sensing image according to an embodiment of the present invention;
FIG. 2(b) is a schematic diagram of a segmented image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a segmentation process in an embodiment of the present invention;
FIG. 4 is a diagram of a dual-input parallel convolutional neural network in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a convolution operation in an embodiment of the present invention;
FIG. 6 is a schematic illustration of a pooling operation in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a fully connected layer in an embodiment of the invention;
FIG. 8 is a flow chart of an experiment in an embodiment of the present invention;
fig. 9 is a schematic diagram of digital orthophoto images of the unmanned aerial vehicle according to the embodiment of the present invention;
FIG. 10 is a schematic diagram of the segmentation result of the digital orthophoto image of the UAV according to the embodiment of the present invention;
fig. 11 is a schematic diagram illustrating a classification result of digital ortho images of an unmanned aerial vehicle according to an embodiment of the present invention;
fig. 12 is a schematic block diagram of a high-resolution remote sensing image classification system based on a convolutional neural network according to an embodiment of the present invention.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a high-resolution remote sensing image classification method and system based on a convolutional neural network.
Referring to fig. 1, fig. 1 is a flowchart of a high resolution remote sensing image classification method based on a convolutional neural network in an embodiment of the present invention, which specifically includes the following steps:
s101: acquiring a remote sensing image for training, and preprocessing the remote sensing image to obtain a preprocessed remote sensing image; the remote sensing images for training can be downloaded from publicly available remote sensing image datasets;
s102: segmenting the preprocessed remote sensing image by using a graph theory-based minimum spanning tree segmentation method so as to segment the preprocessed remote sensing image into a plurality of segmented images with irregular shapes and inconsistent sizes;
s103: processing the plurality of segmented images respectively to process all the segmented images into image objects in a fixed-size input format acceptable by a convolutional neural network;
s104: constructing a double-input parallel convolutional neural network, and training it by using the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network as training data, so as to obtain a trained double-input parallel convolutional neural network;
s105: and classifying the actual remote sensing images by adopting the trained double-input parallel convolutional neural network.
In step S101, the specific method of preprocessing is as follows:
and scaling the shortest edge of the remote sensing image to 600 pixels, and simultaneously carrying out normalization processing on the pixel value of the remote sensing image to obtain a preprocessed remote sensing image:
for a certain remote sensing image, the calculation formula of the specific scaling ratio is shown as the following formula:
scale = 600 / min(h, w)
in the above formula, h and w are the height and width of the remote sensing image, respectively, and the whole remote sensing image is scaled by the obtained ratio;
the mean value (mean) and standard deviation (std) of the pixel values of the scaled remote sensing image are then calculated;
using the mean and standard deviation, each pixel value x of the scaled image is normalized as shown in the following formula, giving the normalized, i.e. preprocessed, remote sensing image:
x' = (x - mean) / std
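A minimal Python sketch of this preprocessing step is given below. It is not part of the patent; the function name, interpolation choice and use of NumPy/OpenCV are assumptions made for illustration, while the 600-pixel target for the shortest edge and the (x - mean)/std normalization follow the text.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, shortest_edge: int = 600) -> np.ndarray:
    """Scale the shortest edge to 600 px, then normalize with (x - mean) / std."""
    h, w = image.shape[:2]
    scale = shortest_edge / min(h, w)                      # scale = 600 / min(h, w)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(image, (new_w, new_h),
                         interpolation=cv2.INTER_LINEAR).astype(np.float32)
    mean, std = resized.mean(), resized.std()
    return (resized - mean) / (std + 1e-8)                 # avoid division by zero
```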
In step S102, when the preprocessed remote sensing image is segmented with the graph-theory-based minimum spanning tree method, the segmentation criterion is a predicate: it judges whether the two vertices currently connected by an edge belong to the same region, returning TRUE if they do and FALSE otherwise. In the implementation based on a union-find (disjoint-set) data structure, if the two vertices connected by an edge already belong to the same tree, they share the same root and the same set, and the edge does not need to be added; this is how the algorithm decides whether an edge joins the minimum spanning tree. Based on these two points, the algorithm decides whether an edge can be added to the spanning tree, i.e. whether its vertices are merged into one set. When the algorithm finishes, a number of trees satisfying the criterion are obtained, each tree representing one region. Depending on the parameter settings, however, some vertices may never satisfy the merging condition after all edges have been processed and remain isolated points or very small regions. Therefore, after the minimum spanning tree segmentation has run, small regions are merged in order of increasing edge weight, and the region size can be controlled by parameters. In the embodiment of the invention, the acquired original remote sensing image and the segmented image are shown in fig. 2(a) and fig. 2(b), respectively.
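The patent does not name a concrete implementation of this graph-based segmentation. As a hedged illustration only, the Felzenszwalb–Huttenlocher graph-based segmentation available in scikit-image follows the same idea (edges sorted by weight, regions merged under a predicate, small regions merged afterwards by a minimum-size parameter); the parameter values in the sketch below are purely illustrative.

```python
import numpy as np
from skimage.segmentation import felzenszwalb

def segment(image: np.ndarray, scale: float = 200.0,
            sigma: float = 0.8, min_size: int = 50) -> np.ndarray:
    """Graph-based segmentation: returns an integer label map in which each
    label corresponds to one irregular, variably sized image object."""
    img = image.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)   # normalize to [0, 1]
    return felzenszwalb(img, scale=scale, sigma=sigma, min_size=min_size)
```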
In step S103, for a certain segmented image, the processing method specifically includes:
firstly, segmenting the segmented image by using the minimum circumscribed rectangle of the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and then compressing the minimum circumscribed rectangle object to a preset size in an equal proportion so as to process the minimum circumscribed rectangle object into an image object with a fixed size input format acceptable by a convolutional neural network;
all the segmented images are processed by adopting the method, so that all the segmented images are processed into image objects in a fixed-size input format acceptable by a convolutional neural network.
The minimum circumscribed rectangle objects comprise two kinds, namely a binary mask image and an unmasked image; cropping the segmented image with its minimum circumscribed rectangle to obtain the corresponding minimum circumscribed rectangle objects specifically comprises the following steps:
1) cropping the segmented image with its minimum circumscribed rectangle without applying a mask, i.e. filling the rectangle area with the original pixel values, to obtain the unmasked image of the segmented object;
2) cropping the segmented image with its minimum circumscribed rectangle and applying a mask, then binarizing the result, i.e. filling the object region with white (RGB value 255) and all other regions with zero (RGB value 0), to obtain the binary mask image of the segmented object, as shown in fig. 3.
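The following sketch, written for this text rather than taken from the patent, produces the two minimum-bounding-rectangle objects for one segmented region: the unmasked crop and the binary mask crop, both resized to a fixed network input size. For brevity it uses an axis-aligned bounding box and a direct resize; the patent's equal-proportion compression to a preset size would require an additional padding step.

```python
import cv2
import numpy as np

def rectangle_objects(image: np.ndarray, labels: np.ndarray,
                      label: int, out_size: int = 64):
    """Return (unmasked crop, binary mask crop) for one segmented object."""
    ys, xs = np.where(labels == label)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

    unmasked = image[y0:y1, x0:x1]                         # original pixel values
    mask = np.zeros((y1 - y0, x1 - x0, 3), dtype=np.uint8)
    mask[labels[y0:y1, x0:x1] == label] = 255              # object white, rest zero

    size = (out_size, out_size)
    return (cv2.resize(unmasked, size, interpolation=cv2.INTER_AREA),
            cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST))
```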
In step S104, the constructed dual-input parallel convolutional neural network has two parallel input branches, as shown in fig. 4: one branch takes binary mask data as input and the other takes unmasked data as input. Each branch first learns features from its own input; the features learned by the two branches are then fused to learn more abstract features, and a classification result is finally output. The binary mask data comprise a plurality of different binary mask images, and the unmasked data comprise a plurality of different unmasked images.
In step S104, the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network are used as training data. When the dual-input parallel convolutional neural network is trained, the activation function is the ReLU; the convolution kernel size is 3 × 3 with a stride of 1; the pooling window size is 2 × 2 with a stride of 2; the network optimization algorithm is Adadelta; and the convolutional layers use SpatialDropout and Batch Normalization.
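A minimal Keras sketch of such a dual-input parallel network follows. The elements fixed by the text (two parallel branches whose features are fused before classification, ReLU activations, 3 × 3 convolutions with stride 1, 2 × 2 pooling with stride 2, Adadelta, SpatialDropout and Batch Normalization) are reproduced; the number of layers, filter counts, dense width, dropout rate and number of classes are assumptions made purely for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_branch(inp):
    """One input branch: 3x3 convs (stride 1) with BN and SpatialDropout,
    followed by 2x2 max pooling (stride 2), flattened to a feature vector."""
    x = inp
    for filters in (32, 64, 128):                 # filter counts are illustrative
        x = layers.Conv2D(filters, 3, strides=1, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.SpatialDropout2D(0.2)(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    return layers.Flatten()(x)

def build_dual_input_cnn(input_shape=(64, 64, 3), num_classes=6):
    mask_in = keras.Input(shape=input_shape, name="binary_mask")   # shape branch
    plain_in = keras.Input(shape=input_shape, name="unmasked")     # spectrum/color branch
    fused = layers.concatenate([conv_branch(mask_in), conv_branch(plain_in)])
    x = layers.Dense(256, activation="relu")(fused)
    out = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs=[mask_in, plain_in], outputs=out)
    model.compile(optimizer="adadelta", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With two named inputs, training then takes a pair of arrays, e.g. model.fit({"binary_mask": masks, "unmasked": crops}, labels, ...) on the hypothetical training arrays.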
During training, data augmentation can be used to expand the training data as needed.
Data augmentation generates new sample data by applying random rotations, translations and similar transformations to the original data, thereby enlarging the original sample library. Augmentation can randomly crop target-scale patches from large-scale image objects, or randomly rotate or translate sample data to generate new data. This not only increases the generalization ability of the network but also helps prevent overfitting to a certain extent. In the embodiment of the invention, the training sample library is expanded using data augmentation.
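For illustration only, the rotation/translation augmentation described above could be realized with the Keras ImageDataGenerator as sketched below; the ranges are assumptions, and with a dual-input network the same random transform must be applied consistently to the masked and unmasked crops of each object.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotation and translation enlarge the sample library, as described above.
augmenter = ImageDataGenerator(rotation_range=90,        # illustrative range
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               fill_mode="nearest")

# Hypothetical usage: yields augmented batches during training.
# batches = augmenter.flow(train_images, train_labels, batch_size=32, seed=42)
```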
In the embodiment of the invention, remote sensing image features are extracted with a convolutional neural network. A convolutional neural network mainly consists of convolutional layers (Convolutional Layer), pooling layers (Pooling Layer) and fully connected layers (Fully Connected Layer), which are introduced as follows:
1) convolutional layer
The convolutional layer slides a fixed-size window (the convolution kernel) over the input image or feature map and convolves it to produce an output, called a feature map. The input to a convolutional layer is a three-dimensional array of the form (r, m, n), where r is the number of input image channels or feature maps, and m and n give the size of the image or feature map. The basic principle is shown in fig. 5: the highlighted area of the input image or feature map is the region currently being convolved, and the corresponding pixel value matrix and convolution kernel are given in the figure (reproduced only as an image in the original publication). The result of the computation is the sum of the element-wise products at corresponding positions, which becomes the value at the corresponding position of the output feature map, namely 1 in the example of fig. 5. The parameters of the convolution kernels are the parameters to be learned by the network; they are learned automatically through training and need not be set manually. The convolutional layer thus extracts image features that are useful for completing the task.
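Since the example matrices of fig. 5 are only available as an image, the sum-of-products rule is illustrated here with a small NumPy sketch using made-up values; it is not the patent's code.

```python
import numpy as np

def conv2d_valid(feature_map: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Each output value is the sum of element-wise products of the kernel
    with the window of the input it currently covers."""
    kh, kw = kernel.shape
    oh, ow = feature_map.shape[0] - kh + 1, feature_map.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative values (not the ones shown in fig. 5):
patch = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
print(conv2d_valid(patch, kernel))   # single output value: sum of products = 1
```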
2) Pooling layer
The pooling layer is another important building block of a convolutional neural network. A pooling operation is similar to a convolution, except that the pooling function is generally a maximum or an average: the maximum or average is taken over the matrix of pixels covered by a window of a specified size on the feature map, giving the value at the corresponding position of the pooled feature map. Because the pooling function simply takes an extreme value or an average, the pooling layer has no parameters to learn. The pooling layer integrates and extracts adjacent features, simplifying and refining them and avoiding redundancy in the extracted features. The basic principles of the commonly used max pooling, average pooling and global average pooling are shown in fig. 6: the pooling window is 2 × 2 with a stride of 2, and a, b and c in the figure are the outputs of max pooling, average pooling and global average pooling of the feature map, respectively. The cell colours of a and b correspond to the regions of the feature map on which the pooling operation is performed. Since global pooling produces a single output per feature map, c contains only one value, the global average over the input feature map.
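A toy NumPy sketch of 2 × 2 pooling with stride 2 and of global average pooling follows; the feature-map values are invented for illustration and are not those of fig. 6.

```python
import numpy as np

def pool_2x2(feature_map: np.ndarray, mode: str = "max") -> np.ndarray:
    """2x2 pooling with stride 2: the max (or mean) of each window."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)   # illustrative 4x4 feature map
print(pool_2x2(fm, "max"))      # 2x2 output of max pooling
print(pool_2x2(fm, "mean"))     # 2x2 output of average pooling
print(fm.mean())                # global average pooling: one value per feature map
```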
3) Full connection layer
The features extracted by the convolutional and pooling layers are two-dimensional; they are flattened into one-dimensional features and fed into the fully connected layer, which combines them into the final features needed to complete the task. The basic structure of the fully connected layer is shown in fig. 7, where x1, x2, x3 are the inputs of the layer and a1, a2, a3 are its outputs, computed as:
a = Wx + b
where the elements of the weight matrix W and the bias vector b are the parameters learned by the network.
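A toy NumPy sketch of this computation follows; the numeric values are invented for illustration and are not taken from the patent.

```python
import numpy as np

def fully_connected(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """a = W x + b; the entries of W and b are learned during training."""
    return W @ x + b

x = np.array([0.5, -1.0, 2.0])        # x1, x2, x3 (illustrative inputs)
W = np.eye(3)                          # illustrative weight matrix
b = np.array([0.1, 0.1, 0.1])          # illustrative biases
print(fully_connected(x, W, b))        # a1, a2, a3
```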
In the embodiment of the invention, an unmanned aerial vehicle DOM (digital orthophoto map) image with a spatial resolution of 0.5 m and a size of 10595 × 9748 pixels is used for the experiment. The preprocessed image is first segmented with the graph-theory-based minimum spanning tree method, yielding 28452 image objects; the segmented images are then processed into an experimental dataset of fixed size (64 × 64); finally, 7093 images are selected as training samples, 1520 as validation samples and 1521 as test samples. The dataset distribution is shown in table 1:
TABLE 1 Experimental data Category distribution
The experimental network model is shown in fig. 8. The activation function of the network is the ReLU; the convolution kernel size is 3 × 3 with a stride of 1; the pooling window size is 2 × 2 with a stride of 2; the network optimization algorithm is Adadelta; and the convolutional layers use SpatialDropout and Batch Normalization.
For precision evaluation, a certain number of test samples of each category are randomly selected on the image, and the overall accuracy, recall, precision and Kappa coefficient of the classification result are then computed from the confusion matrix between the predicted class of each image object and its true ground-surface class. The classification accuracy of the experiment is shown in table 2:
TABLE 2 Dual input parallel convolutional neural network classification accuracy
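The accuracy indicators named above (overall accuracy, recall, precision and the Kappa coefficient derived from the confusion matrix) can be computed, for example, with scikit-learn. The sketch below is an illustration added to this text, not part of the patent; macro averaging is assumed for the per-class metrics.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, precision_score, recall_score)

def evaluate(y_true, y_pred):
    """Accuracy statistics for the per-object classification result."""
    return {
        "overall_accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```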
In the embodiment of the invention, the ground-object classification experiment is carried out with a digital orthophoto image (0.5 m) of an unmanned aerial vehicle as the data source. The experimental results are shown in the following figures: fig. 9 is the original image, fig. 10 the segmentation result, and fig. 11 the classification result.
Referring to fig. 12, fig. 12 is a schematic diagram of the module composition of a high-resolution remote sensing image classification system based on a convolutional neural network according to an embodiment of the present invention. The system comprises, connected in order: a remote sensing image acquisition module 11, an image segmentation module 12, an image processing module 13, a network construction module 14 and an actual classification module 15, wherein:
the remote sensing image acquisition module 11 is used for acquiring a remote sensing image for training and preprocessing the remote sensing image to obtain a preprocessed remote sensing image;
an image segmentation module 12, configured to segment the preprocessed remote sensing image by using a graph theory-based minimum spanning tree segmentation method, so as to segment the preprocessed remote sensing image into a plurality of segmented images with irregular shapes and inconsistent sizes;
an image processing module 13, configured to process each of the multiple segmented images, so as to process all of the segmented images into image objects in a fixed-size input format that can be accepted by a convolutional neural network;
the network construction module 14 is configured to construct a dual-input parallel convolutional neural network and train it, using the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network as training data, so as to obtain a trained dual-input parallel convolutional neural network;
and the actual classification module 15 is used for classifying the actual remote sensing image by adopting the trained double-input parallel convolutional neural network.
In the remote sensing image acquisition module 11, the specific method of preprocessing is as follows:
and scaling the shortest edge of the remote sensing image to 600 pixels, and simultaneously carrying out normalization processing on the pixel value of the remote sensing image to obtain the preprocessed remote sensing image.
In the image processing module 13, for a certain segmented image, the processing method specifically includes:
firstly, segmenting the segmented image by using the minimum circumscribed rectangle of the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and then compressing the minimum circumscribed rectangle object to a preset size in an equal proportion so as to process the minimum circumscribed rectangle object into an image object with a fixed size input format acceptable by a convolutional neural network;
all the segmented images are processed by adopting the method, so that all the segmented images are processed into image objects in a fixed-size input format acceptable by a convolutional neural network.
The minimum circumscribed rectangle objects comprise two kinds, namely a binary mask image and an image without a mask; the minimum circumscribed rectangle of the segmented image is used for segmenting the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and the method specifically comprises the following steps:
1) intercepting the segmentation image by using the minimum circumscribed rectangle without the mask, namely filling the area of the minimum circumscribed rectangle with the original value to obtain the image without the mask of the segmentation image;
2) intercepting the segmentation image by using the minimum circumscribed rectangle with the mask, and carrying out binarization on the result, namely using white filling for the object region and using zero filling for other regions to obtain a binary mask image of the segmentation image
The beneficial effects of the invention are as follows:
(1) Irregular image objects are processed with a minimum-bounding-rectangle mask, so that the geometric shape information of an image object is learned separately from its spectral, color and other information.
(2) A multi-input parallel convolutional neural network is provided: one branch receives masked data and learns object shape information; the other receives unmasked data and learns information about the object other than its shape.
(3) Feature extraction and classification, the two stages of object-oriented remote sensing image classification, are integrated into one model, which reduces errors caused by manual intervention and improves the degree of automation of remote sensing image classification.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A high-resolution remote sensing image classification method based on a convolutional neural network, characterized by comprising the following steps:
s101: acquiring a remote sensing image for training, and preprocessing the remote sensing image to obtain a preprocessed remote sensing image;
s102: segmenting the preprocessed remote sensing image by using a graph theory-based minimum spanning tree segmentation method so as to segment the preprocessed remote sensing image into a plurality of segmented images with irregular shapes and inconsistent sizes;
s103: processing the plurality of segmented images respectively to process all the segmented images into image objects in a fixed-size input format acceptable by a convolutional neural network;
s104: constructing a double-input parallel convolutional neural network, and training it by using the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network as training data, so as to obtain a trained double-input parallel convolutional neural network;
s105: and classifying the actual remote sensing images by adopting the trained double-input parallel convolutional neural network.
2. The method for classifying the high-resolution remote sensing image based on the convolutional neural network as claimed in claim 1, wherein: in step S101, the specific method of preprocessing is as follows:
and scaling the shortest edge of the remote sensing image to 600 pixels, and simultaneously carrying out normalization processing on the pixel value of the remote sensing image to obtain the preprocessed remote sensing image.
3. The method for classifying the high-resolution remote sensing image based on the convolutional neural network as claimed in claim 1, wherein: in step S103, for a certain segmented image, the processing method specifically includes:
firstly, segmenting the segmented image by using the minimum circumscribed rectangle of the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and then compressing the minimum circumscribed rectangle object to a preset size in an equal proportion so as to process the minimum circumscribed rectangle object into an image object with a fixed size input format acceptable by a convolutional neural network;
all the segmented images are processed by adopting the method, so that all the segmented images are processed into image objects in a fixed-size input format acceptable by a convolutional neural network.
4. The method for classifying high-resolution remote sensing images based on the convolutional neural network as claimed in claim 3, wherein: the minimum circumscribed rectangle objects comprise two kinds, namely a binary mask image and an image without a mask; the minimum circumscribed rectangle of the segmented image is used for segmenting the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and the method specifically comprises the following steps:
1) intercepting the segmentation image by using the minimum circumscribed rectangle without the mask, namely filling the area of the minimum circumscribed rectangle with the original value to obtain the image without the mask of the segmentation image;
2) and intercepting the segmentation image by using the minimum circumscribed rectangle with the mask, and carrying out binarization on the result, namely using white filling for the object region and using zero filling for other regions to obtain a binary mask image of the segmentation image.
5. The method for classifying high-resolution remote sensing images based on the convolutional neural network as claimed in claim 4, wherein: in step S104, the constructed dual-input parallel convolutional neural network has two parallel input branches, one branch inputs binary mask data, and the other branch inputs data without a mask, the two branches begin to learn features of respective input data, then the learned features of the two branches are fused together to learn more abstract features, and finally a classification result is output; the binary mask data includes a plurality of different binary mask images and the unmasked data includes a plurality of different unmasked images.
6. The method for classifying the high-resolution remote sensing image based on the convolutional neural network as claimed in claim 1, wherein: in step S104, the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network are used as training data; when the dual-input parallel convolutional neural network is trained, the activation function is the ReLU; the convolution kernel size is 3 × 3 with a stride of 1; the pooling window size is 2 × 2 with a stride of 2; and the network optimization algorithm is Adadelta.
7. A high-resolution remote sensing image classification system based on a convolutional neural network is characterized in that: the system comprises the following modules:
the remote sensing image acquisition module is used for acquiring a remote sensing image for training and preprocessing the remote sensing image to obtain a preprocessed remote sensing image;
the image segmentation module is used for segmenting the preprocessed remote sensing image by using a graph theory-based minimum spanning tree segmentation method so as to segment the preprocessed remote sensing image into a plurality of segmented images with irregular shapes and inconsistent sizes;
the image processing module is used for respectively processing the plurality of segmented images so as to process all the segmented images into image objects with a fixed size input format acceptable by a convolutional neural network;
the network construction module is used for constructing a double-input parallel convolutional neural network and training it, using the plurality of image objects in the fixed-size input format acceptable to the convolutional neural network as training data, to obtain the trained double-input parallel convolutional neural network;
and the actual classification module is used for classifying the actual remote sensing image by adopting the trained double-input parallel convolutional neural network.
8. The high-resolution remote sensing image classification system based on the convolutional neural network as claimed in claim 7, wherein: in the remote sensing image acquisition module, the specific method of preprocessing is as follows:
and scaling the shortest edge of the remote sensing image to 600 pixels, and simultaneously carrying out normalization processing on the pixel value of the remote sensing image to obtain the preprocessed remote sensing image.
9. The high-resolution remote sensing image classification system based on the convolutional neural network as claimed in claim 7, wherein: in the image processing module, for a certain segmented image, the processing method specifically includes:
firstly, segmenting the segmented image by using the minimum circumscribed rectangle of the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and then compressing the minimum circumscribed rectangle object to a preset size in an equal proportion so as to process the minimum circumscribed rectangle object into an image object with a fixed size input format acceptable by a convolutional neural network;
all the segmented images are processed by adopting the method, so that all the segmented images are processed into image objects in a fixed-size input format acceptable by a convolutional neural network.
10. The high-resolution remote sensing image classification system based on the convolutional neural network as claimed in claim 9, wherein: the minimum circumscribed rectangle objects comprise two kinds, namely a binary mask image and an image without a mask; the minimum circumscribed rectangle of the segmented image is used for segmenting the segmented image to obtain a minimum circumscribed rectangle object corresponding to the segmented image, and the method specifically comprises the following steps:
1) intercepting the segmentation image by using the minimum circumscribed rectangle without the mask, namely filling the area of the minimum circumscribed rectangle with the original value to obtain the image without the mask of the segmentation image;
2) and intercepting the segmentation image by using the minimum circumscribed rectangle with the mask, and carrying out binarization on the result, namely using white filling for the object region and using zero filling for other regions to obtain a binary mask image of the segmentation image.
CN201911024275.8A 2019-10-25 2019-10-25 High-resolution remote sensing image classification method and system based on convolutional neural network Pending CN111008642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911024275.8A CN111008642A (en) 2019-10-25 2019-10-25 High-resolution remote sensing image classification method and system based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911024275.8A CN111008642A (en) 2019-10-25 2019-10-25 High-resolution remote sensing image classification method and system based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN111008642A true CN111008642A (en) 2020-04-14

Family

ID=70111544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911024275.8A Pending CN111008642A (en) 2019-10-25 2019-10-25 High-resolution remote sensing image classification method and system based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111008642A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784256A (en) * 2016-08-30 2018-03-09 合肥君正科技有限公司 Multiwindow image characteristic point statistical method and device
CN107358176A (en) * 2017-06-26 2017-11-17 武汉大学 Sorting technique based on high score remote sensing image area information and convolutional neural networks
CN108596940A (en) * 2018-04-12 2018-09-28 北京京东尚科信息技术有限公司 A kind of methods of video segmentation and device
CN110348522A (en) * 2019-07-12 2019-10-18 创新奇智(青岛)科技有限公司 A kind of image detection recognition methods and system, electronic equipment, image classification network optimized approach and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAODONG ZHANG等: "An object-based supervised classification framework for very-high-resolution remote sensing images using convolutional neural networks", 《REMOTE SENSING LETTERS》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580504A (en) * 2020-12-17 2021-03-30 中国科学院空天信息创新研究院 Tree species classification counting method and device based on high-resolution satellite remote sensing image
CN112580504B (en) * 2020-12-17 2023-01-17 中国科学院空天信息创新研究院 Tree species classification counting method and device based on high-resolution satellite remote sensing image
CN112861774A (en) * 2021-03-04 2021-05-28 山东产研卫星信息技术产业研究院有限公司 Method and system for identifying ship target by using remote sensing image
CN113724381A (en) * 2021-07-23 2021-11-30 广州市城市规划勘测设计研究院 Dynamic three-dimensional scene rapid reconstruction method based on high-resolution remote sensing image
CN113724381B (en) * 2021-07-23 2022-06-28 广州市城市规划勘测设计研究院 Dynamic three-dimensional scene rapid reconstruction method based on high-resolution remote sensing image
CN113607684A (en) * 2021-08-18 2021-11-05 燕山大学 Spectrum qualitative modeling method based on GAF image and quaternion convolution

Similar Documents

Publication Publication Date Title
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN108009542B (en) Weed image segmentation method in rape field environment
CN108573276B (en) Change detection method based on high-resolution remote sensing image
CN111008642A (en) High-resolution remote sensing image classification method and system based on convolutional neural network
CN107392925B (en) Remote sensing image ground object classification method based on super-pixel coding and convolutional neural network
CN109558806B (en) Method for detecting high-resolution remote sensing image change
CN111291826B (en) Pixel-by-pixel classification method of multisource remote sensing image based on correlation fusion network
CN111401380B (en) RGB-D image semantic segmentation method based on depth feature enhancement and edge optimization
CN107818303B (en) Unmanned aerial vehicle oil and gas pipeline image automatic contrast analysis method, system and software memory
CN110110131B (en) Airplane cable support identification and parameter acquisition method based on deep learning and binocular stereo vision
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN112766155A (en) Deep learning-based mariculture area extraction method
CN110705449A (en) Land utilization change remote sensing monitoring analysis method
CN115861409B (en) Soybean leaf area measuring and calculating method, system, computer equipment and storage medium
CN113255452A (en) Extraction method and extraction system of target water body
CN114241321A (en) Rapid and accurate identification method for high-resolution remote sensing image flat-topped building
CN113255434A (en) Apple identification method fusing fruit features and deep convolutional neural network
CN109741358B (en) Superpixel segmentation method based on adaptive hypergraph learning
CN111832508B (en) DIE _ GA-based low-illumination target detection method
CN116703744B (en) Remote sensing image dodging and color homogenizing method and device based on convolutional neural network
CN113506230B (en) Photovoltaic power station aerial image dodging processing method based on machine vision
CN115311520A (en) Passion fruit maturity detection and positioning method based on visual identification
CN113627329A (en) Wheat seed hyperspectral image classification method and system based on hybrid convolutional network
CN107133634A (en) One plant Water deficit levels acquisition methods and device
CN111178175A (en) Automatic building information extraction method and system based on high-view satellite image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200414