CN111401455B - Remote sensing image deep learning classification method and system based on Capsules-Unet model - Google Patents

Remote sensing image deep learning classification method and system based on Capsules-Unet model Download PDF

Info

Publication number
CN111401455B
Authority
CN
China
Prior art keywords
capsule
capsules
model
data
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010199056.XA
Other languages
Chinese (zh)
Other versions
CN111401455A (en
Inventor
廖静娟
郭宇娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202010199056.XA priority Critical patent/CN111401455B/en
Publication of CN111401455A publication Critical patent/CN111401455A/en
Application granted granted Critical
Publication of CN111401455B publication Critical patent/CN111401455B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image deep learning classification method and system based on Capsules-Unet model, comprising the following steps: carrying out data preprocessing on the remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data; fusing a capsule (Capsules) model by taking the Unet model as a basic network architecture, and establishing a Capsules-Unet model; training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model; and classifying the remote sensing image data to be classified by utilizing the trained Capsules-Unet model. The invention establishes the remote sensing image deep learning classification Capsules-Unet model capable of encapsulating the multidimensional characteristics of the ground objects, improves the dynamic routing algorithm of the existing capsule model, and more accurately and efficiently classifies the high-resolution remote sensing images.

Description

Remote sensing image deep learning classification method and system based on Capsules-Unet model
Technical Field
The invention relates to the field of remote sensing image classification, in particular to a remote sensing image deep learning classification method and system based on Capsules-Unet model.
Background
Classification is a fundamental problem in the field of remote sensing, and image classification with deep Convolutional Neural Networks (CNNs) as the representative base model has become a major trend. Compared with traditional remote sensing image classification methods, deep convolutional neural networks do not require hand-crafted features. They are usually composed of multiple successive layers that can automatically learn extremely complex hierarchical features from large amounts of data, thus avoiding the heavy reliance on expert knowledge for feature design. Common convolutional neural networks such as Fully Convolutional Networks (FCN), U-net, generative adversarial networks (GANs) and other cross-connected neural networks have become ideal models for various image classification tasks and show huge potential in remote sensing applications. For example, in the classification of complex urban terrain types, a convolutional neural network automatically extracting multi-scale features can reach a classification accuracy above 90%. To address the problem that the upsampling scheme of convolutional neural networks is not fine enough, model structures have been improved, and the ability of models to distinguish ground objects has been greatly improved through optimizations such as data enhancement, multi-scale fusion, post-processing (CRF, voting, etc.) and additional features (elevation information, vegetation indexes and spectral features).
Despite the great success of convolutional neural networks, there are still some challenging problems in remote sensing classification. The main reasons are as follows:
(1) Remote sensing images are more complex than natural images. They contain various types of objects that vary greatly in size, color, location, and orientation. Spectral characteristics alone may not be sufficient to distinguish objects; features may also need to be identified from spatial location and other cues. How to combine rich spectral information and spatial information as complementary clues, so as to significantly improve the performance of deep learning in remote sensing classification, is therefore a hot spot of current research.
(2) Remote sensing applications often lack large labeled datasets; the difficulty lies not only in the number of samples but also in defining the category labels of a dataset. Current deep networks require a large number of hyper-parameters to be configured, making the whole network too complex to optimize. Overfitting is therefore inevitable when deep neural networks are trained on a small dataset.
(3) The current "end-to-end" learning strategy makes the excellent performance of deep learning in classification tasks difficult to interpret. Apart from the final network output, it is hard to understand the prediction logic hidden inside the network. This creates difficulties for the further mining and processing of remote sensing image classification results.
In view of the above, researchers have proposed many approaches to these challenges. Recently, Sabour et al. designed a new neuron, the "capsule", to replace the traditional scalar neuron in constructing a capsule network (CapsNet). A capsule is a carrier that encapsulates a plurality of neurons. Its output is a high-dimensional vector that can express various attributes of an entity, such as pose, illumination and deformation. The probability that the entity is present is represented by the length of the vector: the larger the norm of the high-dimensional vector, the more likely the entity is to exist. The vector can be used to represent the orientation of ground objects and their relative spatial relations, which greatly remedies the shortcomings of convolutional neural networks. The training algorithm of the capsule network is mainly a dynamic routing mechanism between the capsules of successive network layers. The dynamic routing mechanism can improve the robustness of model training and prediction while reducing the number of samples required.
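As a minimal illustration of how a capsule's output is read (the vector values below are invented for the example; the norm is interpreted as the existence probability once the vector has been squashed to length at most 1):

```python
import numpy as np

# A capsule outputs a vector whose direction encodes attributes (pose,
# deformation, ...) and whose length encodes the existence probability.
capsule_output = np.array([0.3, 0.5, 0.1, 0.6])  # illustrative attribute values
existence_prob = np.linalg.norm(capsule_output)  # ~0.84: entity likely present
```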
Disclosure of Invention
The invention aims to solve the technical problem of establishing a remote sensing image deep learning classification model capable of packaging multidimensional characteristics of surface features so as to more accurately classify high-resolution remote sensing images; the dynamic routing algorithm of the existing capsule model is improved, the operation memory burden is reduced, and data parameters are reduced, so that the high-resolution remote sensing images can be classified more efficiently.
According to one aspect of the invention, a remote sensing image deep learning classification method based on a Capsules-Unet model is provided, and comprises the following steps: s1, carrying out data preprocessing on remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data; s2, fusing a capsule (Capsules) model by using the Unet model as a basic network architecture, and establishing a Capsules-Unet model; s3, training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model; and S4, classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
Optionally, step S3 further includes verifying the Capsules-Unet model with the verification set data, wherein training iterations stop when the error falls below a given threshold or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
Optionally, the Capsules-Unet model comprises: a feature extraction module comprising an input convolution layer and a convolution capsule layer (ConCaps), the input convolution layer being used to extract low-level features of an input remote sensing image, and the convolution capsule layer (ConCaps) performing convolution filtering on the low-level features extracted by the input convolution layer and converting them into capsules; a contraction path module comprising a plurality of primary capsule layers (PrimaryCaps), used to down-sample the capsules obtained by the feature extraction module; an extended path module comprising a plurality of primary capsule layers (PrimaryCaps) and a plurality of deconvolution capsule layers (DeconCaps) configured to alternate with each other, used to up-sample the capsules from the contraction path module, and further comprising an output primary capsule layer that convolves the data obtained by the up-sampling in the extended path module and outputs it to the classification module; a skip connection layer (skip layer), through which the extended path module clips and copies the low-level features of the contraction path module for use during up-sampling in the extended path module; and a classification module comprising a classification capsule layer (Class Capsule) containing a plurality of capsules, where the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
Optionally, in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the same transformation matrix is used for child and parent capsules of the same type.
Optionally, in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
According to another aspect of the invention, a remote sensing image deep learning classification system based on a Capsules-Unet model is provided, which comprises: the data preprocessing unit is used for preprocessing the remote sensing image data and dividing the preprocessed remote sensing image data into training set data and verification set data; a model establishing unit, which takes the Unet model as a basic network architecture, fuses capsule (Capsules) models and establishes Capsules-Unet models; the model training unit is used for training the Capsules-Unet model by utilizing the training set data and the verification set data to obtain the trained Capsules-Unet model; and the classification unit is used for classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
Optionally, the model training unit further includes a model verification subunit, configured to verify the Capsules-Unet model with the verification set data, wherein training iterations stop when the set error falls below a given threshold or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
Optionally, the Capsules-Unet model in the system includes: a feature extraction module comprising an input convolution layer and a convolution capsule layer (ConCaps), the input convolution layer being used to extract low-level features of an input remote sensing image, and the convolution capsule layer (ConCaps) performing convolution filtering on the low-level features extracted by the input convolution layer and converting them into capsules; a contraction path module comprising a plurality of primary capsule layers (PrimaryCaps), used to down-sample the capsules obtained by the feature extraction module; an extended path module comprising a plurality of primary capsule layers (PrimaryCaps) and a plurality of deconvolution capsule layers (DeconCaps) configured to alternate with each other, used to up-sample the capsules from the contraction path module, and further comprising an output primary capsule layer that convolves the data obtained by the up-sampling in the extended path module and outputs it to the classification module; a skip connection layer (skip layer), through which the extended path module clips and copies the low-level features of the contraction path module for use during up-sampling in the extended path module; and a classification module comprising a classification capsule layer (Class Capsule) containing a plurality of capsules, where the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
Optionally, in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the same transformation matrix is used for child and parent capsules of the same type.
Optionally, in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
The technical scheme of the invention has the beneficial technical effects that: a remote sensing image deep learning classification Capsules-Unet model capable of packaging multidimensional characteristics of ground objects is established, and high-resolution remote sensing images are classified more accurately; the dynamic routing algorithm of the existing capsule model is improved, the operation memory burden and the data parameters are reduced, and the high-resolution remote sensing images are classified more efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a remote sensing image deep learning classification method according to the invention.
FIG. 2 is a flow diagram of a remote sensing image deep learning classification method according to an embodiment of the invention.
FIG. 3 is a schematic diagram of the Capsules-Unet model structure of the present invention.
FIG. 4 is a further detailed block diagram of the Capsules-Unet model structure of FIG. 3 in accordance with the present invention.
FIG. 5 is a schematic structural diagram of a remote sensing image deep learning classification system according to the invention.
FIG. 6 is a schematic diagram of an improved locally constrained dynamic routing algorithm of the Capsules-Unet model of the present invention.
FIG. 7 is a schematic diagram of the process of updating coupling coefficients by the improved locally constrained dynamic routing algorithm of the Capsules-Unet model of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems consistent with certain aspects of the invention, as detailed in the appended claims.
As shown in fig. 1, according to an aspect of the present invention, the present invention provides a remote sensing image deep learning classification method based on Capsules-Unet model, including: s1, carrying out data preprocessing on remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data; s2, fusing a capsule (Capsules) model by using the Unet model as a basic network architecture, and establishing a Capsules-Unet model; s3, training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model; and S4, classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
Data pre-processing
The high-resolution remote sensing image data of the invention come from the Vaihingen and Potsdam datasets of the ISPRS 2D semantic labeling benchmark. The ISPRS Vaihingen dataset comprises 33 orthorectified images of different sizes with ground truth data, all taken over the Vaihingen area in Germany with a ground sampling distance of 9 cm, and is composed of 3 channels: near infrared, red and green (IRRG). The data include 6 surface feature categories: impervious surfaces, buildings, low vegetation, trees, vehicles, and others (background). During model training, the images with ground truth labels are divided into a training set and a verification set: 31 images are selected as the training set and the remaining 2 images serve as the verification set.
The Potsdam dataset contains 38 orthorectified images of different sizes and corresponding Digital Surface Models (DSMs). A DSM is an array of the same size as the input image that provides an elevation value at each pixel. The images were taken over the Potsdam area of Germany with a ground sampling distance of 5 cm and consist of 4 channels: near infrared, red, green and blue (IRRGB). In the experiment, to increase the number of bands in the Potsdam dataset, the NDVI index was calculated from the near infrared and red bands. The dataset provides ground truth labels for 24 images for model training and validation, and has the same feature classes as the Vaihingen dataset. During model training, 23 images are selected as the training set and the remaining 1 image as the verification set. The image numbers of the labeled data in the two datasets, divided into training samples and test samples, are shown in Table 1.
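As a side note on this band-augmentation step, NDVI is conventionally computed from the near infrared and red bands; a minimal sketch (the IRRGB channel ordering assumed below is an illustration, not taken from the text):

```python
import numpy as np

def ndvi(image):
    """Compute NDVI from an IRRGB image; channel order NIR, R, G, B assumed."""
    nir = image[..., 0].astype(float)
    red = image[..., 1].astype(float)
    return (nir - red) / (nir + red + 1e-9)  # small epsilon avoids division by zero
```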
TABLE 1 image numbering of labeled data divided into training samples and test samples
See fig. 2. In the data preprocessing step, the original images are sampled into 64 × 64 pixel patches using a random sliding-window sampling method. 80% of the samples in the sample bank are used as training samples and 20% as test samples. To increase the diversity and variability of the samples, a large amount of labeled training data is fed into the model after normalization, random sampling, and data enhancement.
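A minimal sketch of this sampling and splitting step (patch size and split ratio follow the text; the NumPy implementation, array names and sample count are assumptions for illustration):

```python
import numpy as np

def random_sliding_window_samples(image, label, patch=64, n_samples=5000, seed=0):
    """Randomly crop patch x patch windows from a (H, W, C) image and its
    (H, W) label map, building the sample bank described above."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    patches, labels = [], []
    for _ in range(n_samples):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        patches.append(image[y:y + patch, x:x + patch])
        labels.append(label[y:y + patch, x:x + patch])
    return np.stack(patches), np.stack(labels)

# 80% of the sample bank as training samples, 20% as test samples
X, Y = random_sliding_window_samples(img, gt)
split = int(0.8 * len(X))
X_train, Y_train, X_test, Y_test = X[:split], Y[:split], X[split:], Y[split:]
```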
Capsule-Unet model establishment
The present invention is based on the new neuron designed by Sabour et al., the "capsule", which replaces the traditional scalar neuron in constructing the capsule network. The capsule network is composed of multiple layers of capsules. Each capsule involves two kinds of parameters: weights and coupling coefficients. Each weight matrix W represents a linear transformation that carries the spatial relationships between low-level and high-level features as well as other important relationships, such as pose (position, size, orientation), deformation, velocity, hue and texture. The coupling coefficient c_ij determines to which higher-level capsule the output of a lower-level capsule is directed. Unlike the weights, c_ij is updated by dynamic routing; the coupling coefficients thus essentially determine how information flows between the capsules. For each lower-level capsule, the coefficients c_ij over all potential higher-level capsules sum to 1. The capsule network is built on this basis and aims to overcome the inability of traditional CNNs to recognize the pose information of an entity and the part-whole relationships between objects.
The invention provides Capsules-Unet, a high-spatial-resolution image classification model with the U-net model as its basic framework. The invention designs the classification model around the concept and structure of the capsule, so as to improve classification performance through the viewpoint invariance and routing mechanism of the capsule model.
Referring to fig. 3 and 4, the Capsules-Unet model of the present invention comprises: a feature extraction module 1, which comprises an input convolution layer 11 and a convolution capsule layer (ConCaps) 12, wherein the input convolution layer 11 extracts low-level features of the input remote sensing image and the convolution capsule layer (ConCaps) 12 performs convolution filtering on the low-level features extracted by the input convolution layer 11 and converts them into capsules; a contraction path module 2, which includes a plurality of primary capsule layers (PrimaryCaps) 21 and down-samples the capsules obtained by the feature extraction module 1; an extended path module 3, which includes a plurality of primary capsule layers (PrimaryCaps) 31 and a plurality of deconvolution capsule layers (DeconCaps) 32 configured to alternate with each other and up-samples the capsules from the contraction path module, and which also includes an output primary capsule layer 33 that convolves the data obtained by the up-sampling in the extended path module 3 and outputs it to the classification module 5; a skip connection layer (skip layer) 4, through which the extended path module 3 clips and copies the low-level features of the contraction path module 2 for use during up-sampling in the extended path module 3; and a classification module 5 comprising a classification capsule layer (Class Capsule) 52, which contains a plurality of capsules, where the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
According to an alternative embodiment of the present invention, the input convolution layer 11 contains 16 5 × 5 convolution kernels with stride 1; it takes a 64 × 64 × 3 image as input and outputs a 64 × 64 × 1 × 16 tensor. The primary capsule layers (PrimaryCaps) 21, 31 function in many respects like CNN convolution layers, but they take capsules as input and use a locally constrained dynamic routing algorithm to determine their output, which is also capsules. The deconvolution capsule layers (DeconCaps) 32 use transposed convolution to compensate for the loss of global connectivity caused by locally constrained dynamic routing. The classification capsule layer (Class Capsule) 52 may include k capsules, corresponding to k classification categories; the length of each capsule's activation vector gives the probability that an instance of each class exists. Each capsule in the previous layer is fully connected to the capsules in this layer.
As shown in fig. 4, the input is a 64 × 64 multiband remote sensing image. First, the input convolution layer 11 of the feature extraction module 1 applies 16 convolution kernels of size 5 × 5 with stride 1, producing a tensor of 64 × 64 × 1 × 16; that is, the input convolution layer 11 generates 16 feature maps with the same spatial dimensions. The 16 basic features detected by the input convolution layer 11 then pass through the convolution capsule layer 12 with 5 × 5 capsule kernels and stride 2, producing a first set of feature-combination capsules with output size 32 × 32 × 2 × 16. The feature extraction module 1 thus captures the discriminative features of the input data and feeds them into the following modules. Next, following the Unet design, the architecture is composed of a contraction path module 2 and an extended path module 3. The contraction path module 2 down-samples the image through 4 successive primary capsule layers (PrimaryCaps) 21: a 5 × 5 × 4 × 16 capsule convolution with stride 1 (4 capsules, 16 feature maps), a 5 × 5 × 4 × 16 capsule convolution with stride 2 (4 capsules, 16 feature maps), a 5 × 5 × 8 × 16 capsule convolution with stride 1 (8 capsules, 16 feature maps), and a 5 × 5 × 8 × 32 capsule convolution with stride 2 (8 capsules, 32 feature maps). The size of the image at the bottom layer is 8 × 8 × 8 × 32. By adding capsule layers instead of convolution layers to the Unet, part-whole context information is preserved. The extended path module 3 is composed of 3 primary capsule layers (PrimaryCaps) 31 alternating with 3 deconvolution capsule layers (DeconCaps) 32, plus an output primary capsule layer 33: a 5 × 5 × 8 × 32 capsule convolution with stride 1 (8 capsules, 32 feature maps), a 4 × 4 × 8 × 32 deconvolution capsule (8 capsules, 32 feature maps), a 5 × 5 × 4 × 32 capsule convolution with stride 1 (4 capsules, 32 feature maps), a 4 × 4 × 4 × 16 deconvolution capsule (4 capsules, 16 feature maps), a 5 × 5 × 4 × 16 capsule convolution with stride 1 (4 capsules, 16 feature maps), and a 4 × 4 × 2 × 16 deconvolution capsule (2 capsules, 16 feature maps). When the image reaches the top layer, i.e. after the 3rd deconvolution, it becomes 64 × 64 × 2 × 16; the data up-sampled in the extended path module 3 are then convolved by the output primary capsule layer 33 and output to the classification module 5. Features in the contraction path module 2 are clipped and copied through the skip connection layer (skip layer) 4 for the corresponding up-sampling in the extended path module 3. The classification module 5 comprises a classification capsule layer (Class Capsule) 52, which contains a plurality of capsules; the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
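The layer sequence above can be summarized as the following configuration sketch (a schematic outline for readability, not the authors' code; shapes follow the text as height × width × capsule types × atoms per capsule, and the Class Capsule dimension is an assumption):

```python
# Schematic outline of the Capsules-Unet architecture described above.
capsules_unet = [
    # Feature extraction module
    ("Conv",        dict(kernel=5, stride=1, filters=16)),      # -> 64x64x1x16
    ("ConCaps",     dict(kernel=5, stride=2, caps=2, dim=16)),  # -> 32x32x2x16
    # Contraction path: four successive PrimaryCaps layers
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=4, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=2, caps=4, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=8, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=2, caps=8, dim=32)),  # -> 8x8x8x32
    # Extended path: PrimaryCaps alternating with DeconCaps (+ skip connections)
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=8, dim=32)),
    ("DeconCaps",   dict(kernel=4, caps=8, dim=32)),
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=4, dim=32)),
    ("DeconCaps",   dict(kernel=4, caps=4, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=4, dim=16)),
    ("DeconCaps",   dict(kernel=4, caps=2, dim=16)),            # -> 64x64x2x16
    ("OutputCaps",  dict()),                                    # output primary capsule layer
    # Classification module: one capsule per class, vector length = probability
    ("ClassCapsule", dict(caps="k", dim=16)),
]
```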
Model training and validation
The operating environment of the embodiment of the invention was built on an NVIDIA Quadro P600 GPU and the Keras deep learning platform. The Capsules-Unet model was trained on a computer with a 3.7 GHz 8-core CPU and 32 GB of memory.
Referring to fig. 2, in the model training phase, all training set data are input into the Capsules-Unet model, and the verification set data are used to judge the performance of the Capsules-Unet model under different parameter values. The maximum number of training cycles is set to 10000. The batch size of each training step is 30, i.e. 30 samples are input at a time to fit the Capsules-Unet model. The initial learning rate is 0.001. All parameters of the Capsules-Unet model are updated with the Adam optimization method. Iteration stops when the error falls below a given threshold or the maximum number of iterations is reached.
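A minimal Keras-style training setup matching these hyper-parameters (Adam, initial learning rate 0.001, batch size 30, an error-threshold stopping rule). Here `capsules_unet_model`, `margin_loss` and the data arrays are hypothetical placeholders, and the `min_delta`/`patience` values are assumptions standing in for the unspecified error threshold:

```python
from tensorflow import keras

model = capsules_unet_model()  # hypothetical builder for the Capsules-Unet network
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss=margin_loss)  # margin loss of formula (1), defined below

# Stop when the validation error no longer improves beyond a small threshold
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           min_delta=1e-4, patience=10)

model.fit(X_train, Y_train,
          batch_size=30,                      # 30 samples per training step
          epochs=10000,                       # maximum number of training cycles
          validation_data=(X_val, Y_val),
          callbacks=[early_stop])
```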
Classification
In the classification stage, the trained Capsules-Unet model classifies the data to be classified to generate preliminary class predictions. The prediction data are then input to the classification capsule layer (Class Capsule) 52, and the final surface feature class is generated according to the probability of the predicted class.
For a classification task involving k classes, the classification Capsule layer (Class Capsule) 52 has k capsules, each representing one class. Since the length of a capsule's output vector represents the presence of a visual entity, the length ||v_c|| of each capsule in the last layer represents the probability of class k. For each predefined class there is a loss contribution L_k, which plays a role similar to Softmax in multi-class classification:

L_k = T_k · max(0, m⁺ − ||v_c||)² + λ_margin · (1 − T_k) · max(0, ||v_c|| − m⁻)²  (1)

where m⁺ = 0.9, m⁻ = 0.1 and λ_margin = 0.5. T_k is the indicator of the classification, i.e., 1 if class k is present and 0 if it is absent. ||v_c|| represents the output probability of the capsule. m⁺ is the upper bound, penalizing false positives (predicting that class k exists when it does not); m⁻ is the lower bound, penalizing false negatives (predicting that class k does not exist when it does); λ_margin is a proportionality coefficient that balances the two terms.
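A NumPy sketch of formula (1) (the batched array layout is an assumption for illustration):

```python
import numpy as np

def margin_loss(T, v_norm, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss L_k of formula (1), summed over classes and averaged
    over the batch.
    T      : (batch, k) one-hot indicators T_k
    v_norm : (batch, k) lengths ||v_c|| of the class-capsule activation vectors
    """
    L = (T * np.maximum(0.0, m_pos - v_norm) ** 2
         + lam * (1 - T) * np.maximum(0.0, v_norm - m_neg) ** 2)
    return L.sum(axis=-1).mean()
```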
Referring to fig. 5, according to another aspect of the present invention, a remote sensing image deep learning classification system based on the Capsules-Unet model is provided, comprising: a data preprocessing unit 101, which preprocesses the remote sensing image data and divides the preprocessed remote sensing image data into training set data and verification set data; a model establishing unit 102, which takes the Unet model as the basic network architecture, fuses the capsule (Capsules) model and establishes the Capsules-Unet model; a model training unit 103, which trains the Capsules-Unet model with the training set data and the verification set data to obtain the trained Capsules-Unet model; and a classification unit 104, which classifies the remote sensing image data to be classified with the trained Capsules-Unet model.
According to an optional embodiment of the present invention, the model training unit 103 further comprises a model verification subunit, configured to verify the Capsules-Unet model using the verification set data.
Local constraint dynamic routing algorithm
The original capsule network and dynamic routing algorithm occupy a large amount of memory, and running the model is very time-consuming, because when the dynamic routing algorithm determines the coefficients for routing a child capsule to a parent capsule in the next layer, an additional intermediate representation is required to store the output of the child capsule in a given layer. Therefore, to solve the problems of excessive memory burden and parameter explosion, the invention improves the original dynamic routing algorithm into a locally constrained dynamic routing algorithm.
According to one embodiment of the improved locally constrained dynamic routing algorithm of the present invention, when the data of a child capsule (a capsule in the current layer) of the Capsules-Unet model is routed to the data of a parent capsule (a capsule in the next layer), the same transformation matrix is used for child and parent capsules of the same type.
As shown in fig. 6, in layer l there is a set of capsule types T^l = {t_1^l, t_2^l, ..., t_n^l}. For every t_i^l ∈ T^l there exists an h^l × w^l grid of z^l-dimensional child capsules, where h^l × w^l is the spatial size of the output of layer l−1. Likewise, in layer l+1 of the network there is a set of capsule types T^(l+1) = {t_1^(l+1), t_2^(l+1), ..., t_m^(l+1)}, and for every t_j^(l+1) ∈ T^(l+1) there exists an h^(l+1) × w^(l+1) grid of z^(l+1)-dimensional parent capsules, where h^(l+1) × w^(l+1) is the spatial size of the output of layer l.
Taking the convolution capsule layer 12 as an example, each parent capsule p_xy ∈ P receives a set of "prediction vectors" {û_(t_1|xy), û_(t_2|xy), ..., û_(t_n|xy)}. This set is defined as the matrix multiplication between a learned transformation matrix M_(t_i) and the outputs u_(t_i|xy) of the child capsules within a kernel centered at (x, y) in layer l; that is, for any t_i, û_(t_i|xy) = M_(t_i) · u_(t_i|xy). Each u_(t_i|xy) therefore has shape k_h × k_w × z^l, where k_h × k_w is the size of the user-defined kernel. For all capsule types T^l, each M_(t_i) has shape k_h × k_w × z^l × |T^(l+1)| × z^(l+1), where |T^(l+1)| is the number of parent capsule types in layer l+1. Notably, each M_(t_i) does not depend on the spatial position (x, y), because the same transformation matrix is shared across all spatial positions of a given capsule type. In brief, locally constrained dynamic routing performs a convolution inside each capsule of the lower layer, each capsule convolving out a tensor with the same dimensions as all the capsules of the upper layer, and routing is then performed on the convolution result of each lower-layer capsule. This is why matrix sharing can be exploited here to significantly reduce the number of parameters.
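To see how much this sharing saves, a back-of-the-envelope count can compare one shared matrix per capsule type against one matrix per spatial position (the layer sizes below are taken from this architecture; the comparison itself is an illustrative assumption, not a figure from the text):

```python
# Transformation-matrix parameters for one child capsule type t_i at layer l
k_h, k_w = 5, 5          # kernel size
z_l, z_l1 = 16, 16       # child / parent capsule dimensions
n_parent_types = 4       # |T^(l+1)|
h, w = 32, 32            # spatial grid of the layer

shared   = k_h * k_w * z_l * n_parent_types * z_l1  # one matrix per type: 25600
unshared = shared * h * w                           # one matrix per position: 26214400
print(f"{unshared // shared}x fewer parameters with sharing")  # 1024x
```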
To determine the value of each parent capsule p_xy ∈ P, a weighted sum of these "prediction vectors" is computed, p_xy = Σ_i r_(t_i|xy) · û_(t_i|xy), where the r_(t_i|xy) are routing coefficients determined by the locally constrained dynamic routing algorithm. These routing coefficients are computed by a "routing softmax":

r_(t_i|xy) = exp(b_(t_i|xy)) / Σ_j exp(b_(t_i|j))   (2)

where b_(t_i|xy) is the log prior probability that child capsule t_i is routed to parent capsule p_xy, initialized to 0. b_(t_i|xy) is iteratively updated as

b_(t_i|xy) ← b_(t_i|xy) + û_(t_i|xy) · v_xy   (3)

The initial b_(t_i|xy) is independent of the current input image; it depends only on the location and type of the two capsules. The initial coupling coefficients are then iteratively refined by measuring the agreement between the current output v_xy of each capsule in the layer and the prediction vector û_(t_i|xy).
In the Capsules-Unet model, because the preceding layers of the network transport vectors, a directional treatment is applied to the "capsule" at activation time. The activation function of the Capsules-Unet model is called Squashing, and its expression is shown in formula (4):

v_xy = (||p_xy||² / (1 + ||p_xy||²)) · (p_xy / ||p_xy||)   (4)

where p_xy and v_xy respectively represent the input vector and the output vector of capsule j at spatial location (x, y); p_xy is in fact the weighted sum of all vectors output to capsule j by the previous layer. The first part of the formula scales the length of the input vector p_xy, and the second part is the unit vector of p_xy. The Squashing function keeps the length of the output vector between 0 and 1 while preserving the direction of the input vector: when ||p_xy|| is zero, ||v_xy|| approaches 0; when ||p_xy|| tends to infinity, ||v_xy|| approaches 1.
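A NumPy sketch of formula (4) (the epsilon guard for the zero-vector case is an implementation assumption):

```python
import numpy as np

def squash(p, axis=-1, eps=1e-9):
    """Squashing non-linearity of formula (4): compresses the capsule vector
    p_xy to a length in (0, 1) while preserving its direction."""
    sq_norm = np.sum(p ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * p / np.sqrt(sq_norm + eps)  # eps guards against a zero vector
```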
The dynamic process by which locally constrained dynamic routing updates the coupling coefficients between capsules is shown in fig. 7. In the first iteration of dynamic routing, since the temporary variables b_(t_i|xy) are all initialized to zero, the coupling coefficients from capsule i to all capsules in layer l+1 are equal. A weighted sum of all received inputs û_(t_i|xy) is then computed to obtain p_xy, where the weights are the coupling coefficients r_(t_i|xy). Next, p_xy is passed through the nonlinear Squashing function of formula (4) to obtain v_xy, and finally b_(t_i|xy) is updated according to formula (3). After r iterations, the output v_xy of the capsule is returned. Typically, the optimal number of iterations in the experiments is 3.
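A compact NumPy sketch of this routing loop for a single local window, reusing the `squash` function above (the array layout is an assumption for illustration):

```python
import numpy as np

def locally_constrained_routing(u_hat, r_iters=3):
    """Routing loop of fig. 7 for one local window.
    u_hat : (n_child, n_parent, dim) prediction vectors û_(t_i|xy)
    Returns parent outputs v_xy of shape (n_parent, dim)."""
    n_child, n_parent, _ = u_hat.shape
    b = np.zeros((n_child, n_parent))           # log priors, initialized to zero
    for _ in range(r_iters):                    # 3 iterations is typical
        # routing softmax over parents -> coupling coefficients r_(t_i|xy)
        r = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        p = (r[..., None] * u_hat).sum(axis=0)  # weighted sum -> p_xy
        v = squash(p)                           # nonlinearity of formula (4)
        b += (u_hat * v[None, :, :]).sum(-1)    # agreement update of formula (3)
    return v
```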
According to another embodiment of the improved locally constrained dynamic routing algorithm of the present invention, in the model training and classification steps, when the data of a child capsule of the Capsules-Unet model is routed to the data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window. As shown in fig. 6, only a local window area k_h × k_w × z^l of the capsules of layer l is sampled, instead of operating on the whole capsule grid, which greatly reduces the amount of computation, lightens the memory burden, and reduces the number of parameters.
Evaluation method
In the present invention, the Overall Accuracy (OA) and the Kappa coefficient are used to evaluate the classification performance of the method. The overall accuracy is the percentage of correctly classified samples over the whole test set. The Kappa coefficient is another widely used evaluation criterion, which evaluates the accuracy of remote sensing classification on the basis of the confusion matrix. Its equation is as follows:

Kappa = (N · Σ_(i=1..n) M_ii − Σ_(i=1..n) M_(i+) · M_(+i)) / (N² − Σ_(i=1..n) M_(i+) · M_(+i))

where N is the total number of samples, n is the number of classes, M_ij is the (i, j)-th value of the confusion matrix C, and M_(i+) and M_(+i) respectively denote the sum of the i-th row and the i-th column of C.
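A short NumPy sketch of both metrics from a confusion matrix (the function name and count-based layout are assumptions):

```python
import numpy as np

def overall_accuracy_and_kappa(conf):
    """OA and Kappa coefficient from an n x n confusion matrix C of counts."""
    conf = np.asarray(conf, dtype=float)
    N = conf.sum()                         # total number of samples
    oa = np.trace(conf) / N                # fraction correctly classified
    row, col = conf.sum(axis=1), conf.sum(axis=0)   # M_(i+), M_(+i)
    chance = (row * col).sum()
    kappa = (N * np.trace(conf) - chance) / (N ** 2 - chance)
    return oa, kappa
```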
Comparative analysis
The invention compares the Capsules-Unet model with CapsNet and Unet, with the following results. Both the Capsules-Unet model and CapsNet achieve good results in the classification of impervious surfaces, buildings and low vegetation: these ground object types show good area connectivity and clear edges. However, vehicles and other distinctive, small-coverage categories are not well identified and are easily confused with buildings. The homogeneous regions in the Vaihingen dataset are small, so feature extraction is not accurate enough. Although both the proposed Capsules-Unet model and CapsNet use the capsule form to preserve spatial information between ground objects, when the homogeneous regions are small the spatial information of small target regions such as vehicles is limited. The Unet model works well for the classification of impervious surfaces and low vegetation, but building boundaries are unclear.
Table 2 lists the per-class accuracy of the three models on the Vaihingen dataset. As can be seen from the table, the method of the present invention achieves high classification performance. The Overall Accuracy (OA) of Capsules-Unet is 1.22% higher than that of CapsNet and 1.89% higher than that of Unet. The Kappa coefficients of the Capsules-Unet, CapsNet and Unet models are 0.74, 0.74 and 0.72, respectively. The classification accuracy of Capsules-Unet on the Vaihingen dataset is thus slightly better than that of CapsNet and Unet.
TABLE 2 precision evaluation results of Vaihingen dataset and Potsdam dataset
The features and advantages of the invention have been illustrated by reference to examples. Accordingly, the invention is expressly not limited to these exemplary embodiments, which illustrate some possible non-limiting combinations of features that may also be present alone or in other combinations.
The above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions for some of the technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A remote sensing image deep learning classification method based on Capsules-Unet model comprises the following steps:
s1, carrying out data preprocessing on remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data;
s2, taking the Unet model as a basic network architecture, fusing the capsule model, and establishing a Capsules-Unet model;
the Capsules-Unet model comprises:
the characteristic extraction module comprises an input convolution layer and a convolution capsule layer, wherein the input convolution layer is used for extracting low-level features of an input remote sensing image; the convolution capsule layer is used for performing convolution filtering processing on the low-level features extracted by the input convolution layer and converting the low-level features into capsules;
the contraction path module comprises a plurality of main capsule layers and is used for performing down-sampling processing on the capsules obtained by the characteristic extraction module;
an extended path module including a plurality of main capsule layers and a plurality of deconvolution capsule layers, the main capsule layers and the deconvolution capsule layers being configured to be interleaved with each other, for up-sampling the capsules from the contraction path module; the extended path module also comprises an output main capsule layer which is used for carrying out convolution processing on data obtained by the up-sampling processing in the extended path module and outputting the data to the classification module;
a skip connection layer, through which the extended path module clips and copies low-level features in the contraction path module for the up-sampling processing in the extended path module;
a classification module comprising a classification capsule layer which contains a plurality of capsules, wherein the length of the activation vector of each capsule is used for calculating the probability of whether an instance of each class exists;
s3, training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model;
and S4, classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
2. The remote sensing image deep learning classification method according to claim 1, characterized in that: step S3 further comprises verifying the Capsules-Unet model by using the verification set data, wherein training iteration is stopped when the set error is smaller than a given threshold value or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
3. The remote sensing image deep learning classification method according to claim 1, characterized in that: in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the same transformation matrix is adopted for child and parent capsules of the same type.
4. The remote sensing image deep learning classification method according to claim 1 or 3, characterized in that: in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
5. A remote sensing image deep learning classification system based on Capsules-Unet model comprises:
the data preprocessing unit is used for preprocessing the remote sensing image data and dividing the preprocessed remote sensing image data into training set data and verification set data;
a model establishing unit, which takes the Unet model as a basic network architecture, fuses the capsule model and establishes a Capsules-Unet model;
the Capsules-Unet model comprises:
the characteristic extraction module comprises an input convolution layer and a convolution capsule layer, wherein the input convolution layer is used for extracting low-level features of an input remote sensing image; the convolution capsule layer is used for performing convolution filtering processing on the low-level features extracted by the input convolution layer and converting the low-level features into capsules;
the contraction path module comprises a plurality of main capsule layers and is used for performing down-sampling processing on the capsules obtained by the characteristic extraction module;
an extended path module comprising a plurality of main capsule layers and a plurality of deconvolution capsule layers, the main capsule layers and the deconvolution capsule layers being configured to interleave with each other, for up-sampling the capsules from the contraction path module; the extended path module also comprises an output main capsule layer which is used for carrying out convolution processing on data obtained by the up-sampling processing in the extended path module and outputting the data to the classification module;
a skip connection layer, through which the extended path module clips and copies low-level features in the contraction path module for the up-sampling processing in the extended path module;
a classification module comprising a classification capsule layer which contains a plurality of capsules, wherein the length of the activation vector of each capsule is used for calculating the probability of whether an instance of each class exists;
the model training unit is used for training the Capsules-Unet model by utilizing the training set data and the verification set data to obtain the trained Capsules-Unet model;
and the classification unit is used for classifying the remote sensing image data to be classified by utilizing the trained Capsules-Unet model.
6. The remote sensing image deep learning classification system according to claim 5, characterized in that: the model training unit further comprises a model verification subunit for verifying the Capsules-Unet model by using the verification set data, wherein training iteration is stopped when the set error is smaller than a given threshold value or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
7. The remote sensing image deep learning classification system according to claim 5, characterized in that: in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the same transformation matrix is adopted for child and parent capsules of the same type.
8. The remote sensing image deep learning classification system according to claim 5 or 7, characterized in that: in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
CN202010199056.XA 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model Active CN111401455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199056.XA CN111401455B (en) 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199056.XA CN111401455B (en) 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model

Publications (2)

Publication Number Publication Date
CN111401455A CN111401455A (en) 2020-07-10
CN111401455B true CN111401455B (en) 2023-04-18

Family

ID=71429004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199056.XA Active CN111401455B (en) 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model

Country Status (1)

Country Link
CN (1) CN111401455B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184687B (en) * 2020-10-10 2023-09-26 南京信息工程大学 Road crack detection method based on capsule feature pyramid and storage medium
CN112163549B (en) * 2020-10-14 2022-06-10 中南大学 Remote sensing image scene classification method based on automatic machine learning
CN112348118A (en) * 2020-11-30 2021-02-09 华平信息技术股份有限公司 Image classification method based on gradient maintenance, storage medium and electronic device
CN112580484B (en) * 2020-12-14 2024-03-29 中国农业大学 Remote sensing image corn straw coverage recognition method and device based on deep learning
CN112507039A (en) * 2020-12-15 2021-03-16 苏州元启创人工智能科技有限公司 Text understanding method based on external knowledge embedding
CN112766340B (en) * 2021-01-11 2024-06-04 中山大学 Depth capsule network image classification method and system based on self-adaptive spatial mode

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830243A (en) * 2018-06-22 2018-11-16 西安电子科技大学 Hyperspectral image classification method based on capsule network
CN110321859A (en) * 2019-07-09 2019-10-11 中国矿业大学 A kind of optical remote sensing scene classification method based on the twin capsule network of depth
CN110728224A (en) * 2019-10-08 2020-01-24 西安电子科技大学 Remote sensing image classification method based on attention mechanism depth Contourlet network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830243A (en) * 2018-06-22 2018-11-16 西安电子科技大学 Hyperspectral image classification method based on capsule network
CN110321859A (en) * 2019-07-09 2019-10-11 中国矿业大学 A kind of optical remote sensing scene classification method based on the twin capsule network of depth
CN110728224A (en) * 2019-10-08 2020-01-24 西安电子科技大学 Remote sensing image classification method based on attention mechanism depth Contourlet network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mercedes E. Paoletti et al. Capsule Networks for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(4): 2145-2160. *
Ruirui Li et al. DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation. https://arxiv.org/pdf/1709.00201.pdf, 2017, pp. 1-8. *

Also Published As

Publication number Publication date
CN111401455A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401455B (en) Remote sensing image deep learning classification method and system based on Capsules-Unet model
CN111191736B (en) Hyperspectral image classification method based on depth feature cross fusion
Ghaderizadeh et al. Hyperspectral image classification using a hybrid 3D-2D convolutional neural networks
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN109461157B (en) Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
CN111612008B (en) Image segmentation method based on convolution network
CN111489358A (en) Three-dimensional point cloud semantic segmentation method based on deep learning
Schulz et al. Learning Object-Class Segmentation with Convolutional Neural Networks.
CN110717553A (en) Traffic contraband identification method based on self-attenuation weight and multiple local constraints
CN110728197B (en) Single-tree-level tree species identification method based on deep learning
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN114782311B (en) CENTERNET improvement-based multi-scale defect target detection method and system
Xu et al. Robust self-ensembling network for hyperspectral image classification
CN113191213B (en) High-resolution remote sensing image newly-added building detection method
CN115359372A (en) Unmanned aerial vehicle video moving object detection method based on optical flow network
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113435253A (en) Multi-source image combined urban area ground surface coverage classification method
CN115205590A (en) Hyperspectral image classification method based on complementary integration Transformer network
CN115238758A (en) Multi-task three-dimensional target detection method based on point cloud feature enhancement
CN115527056A (en) Hyperspectral image classification method based on dual-hybrid convolution generation countermeasure network
CN115482518A (en) Extensible multitask visual perception method for traffic scene
CN113850324A (en) Multispectral target detection method based on Yolov4
CN113205103A (en) Lightweight tattoo detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant