CN111401455B - Remote sensing image deep learning classification method and system based on Capsules-Unet model - Google Patents

Remote sensing image deep learning classification method and system based on Capsules-Unet model Download PDF

Info

Publication number
CN111401455B
Authority
CN
China
Prior art keywords
capsule
capsules
model
data
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010199056.XA
Other languages
Chinese (zh)
Other versions
CN111401455A (en
Inventor
廖静娟
郭宇娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202010199056.XA priority Critical patent/CN111401455B/en
Publication of CN111401455A publication Critical patent/CN111401455A/en
Application granted granted Critical
Publication of CN111401455B publication Critical patent/CN111401455B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image deep learning classification method and system based on Capsules-Unet model, comprising the following steps: carrying out data preprocessing on the remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data; fusing a capsule (Capsules) model by taking the Unet model as a basic network architecture, and establishing a Capsules-Unet model; training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model; and classifying the remote sensing image data to be classified by utilizing the trained Capsules-Unet model. The invention establishes the remote sensing image deep learning classification Capsules-Unet model capable of encapsulating the multidimensional characteristics of the ground objects, improves the dynamic routing algorithm of the existing capsule model, and more accurately and efficiently classifies the high-resolution remote sensing images.

Description

Remote sensing image deep learning classification method and system based on Capsules-Unet model
Technical Field
The invention relates to the field of remote sensing image classification, in particular to a remote sensing image deep learning classification method and system based on Capsules-Unet model.
Background
Classification is a fundamental problem in the field of remote sensing, and image classification with deep Convolutional Neural Networks (CNNs) as the representative base model has become a major trend. Compared with traditional remote sensing image classification methods, deep convolutional neural networks do not require hand-crafted features. They are usually composed of multiple successive layers that can automatically learn extremely complex hierarchical features from large amounts of data, thus avoiding the heavy reliance on expert knowledge for feature design. Common convolutional neural networks such as Fully Convolutional Networks (FCN), U-net, generative adversarial networks (GANs) and other cross-connected neural networks have become ideal models for various image classification tasks and show huge potential in remote sensing applications. For example, in the classification of complex urban terrain types, a convolutional neural network automatically extracting multi-scale features can reach a classification accuracy above 90%. To address the problem that the upsampling scheme of convolutional neural networks is not fine enough, model structures have been improved, and the ability of models to distinguish ground objects has been greatly improved through optimizations such as data enhancement, multi-scale fusion, post-processing (CRF, voting, etc.) and additional features (elevation information, vegetation indexes and spectral features).
Despite the great success of convolutional neural networks, there are still some challenging problems in remote sensing classification. The main reasons are as follows:
(1) Remote sensing images are more complex than natural images. They contain various types of objects that vary greatly in size, color, location, and orientation. Spectral characteristics alone may not be sufficient to distinguish objects; features may also need to be identified from spatial location and other cues. How to combine rich spectral information and spatial information as complementary clues, so as to significantly improve the performance of deep learning in remote sensing classification, is therefore a hot spot of current research.
(2) Remote sensing applications often lack large labeled datasets; the difficulty lies not only in the number of samples but also in defining the category labels of a dataset. Current deep networks require a large number of hyper-parameters to be configured, making the whole network too complex to optimize. Overfitting is therefore inevitable when deep neural networks are trained on a small dataset.
(3) The current "end-to-end" learning strategy makes the excellent performance of deep learning in classification tasks difficult to interpret. Apart from the final network output, it is hard to understand the prediction logic hidden inside the network. This creates difficulties for the further mining and processing of remote sensing image classification results.
In view of the above, researchers have proposed many approaches to these challenges. Recently, Sabour et al. designed a new neuron, the "capsule", to replace the traditional scalar neuron in constructing a capsule network (CapsNet). A capsule is a carrier that encapsulates a plurality of neurons. Its output is a high-dimensional vector that can express various attributes of an entity, such as pose, illumination and deformation. The probability that the entity is present is represented by the length of the vector: the larger the norm of the high-dimensional vector, the more likely the entity is to exist. The vector can be used to represent the orientation of ground objects and their relative spatial relations, which greatly remedies the shortcomings of convolutional neural networks. The training algorithm of the capsule network is mainly a dynamic routing mechanism between the capsules of successive network layers. The dynamic routing mechanism can improve the robustness of model training and prediction while reducing the number of samples required.
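As a minimal illustration of how a capsule's output is read (the vector values below are invented for the example; the norm is interpreted as the existence probability once the vector has been squashed to length at most 1):

```python
import numpy as np

# A capsule outputs a vector whose direction encodes attributes (pose,
# deformation, ...) and whose length encodes the existence probability.
capsule_output = np.array([0.3, 0.5, 0.1, 0.6])  # illustrative attribute values
existence_prob = np.linalg.norm(capsule_output)  # ~0.84: entity likely present
```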
Disclosure of Invention
The invention aims to solve the technical problem of establishing a remote sensing image deep learning classification model capable of packaging multidimensional characteristics of surface features so as to more accurately classify high-resolution remote sensing images; the dynamic routing algorithm of the existing capsule model is improved, the operation memory burden is reduced, and data parameters are reduced, so that the high-resolution remote sensing images can be classified more efficiently.
According to one aspect of the invention, a remote sensing image deep learning classification method based on a Capsules-Unet model is provided, and comprises the following steps: s1, carrying out data preprocessing on remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data; s2, fusing a capsule (Capsules) model by using the Unet model as a basic network architecture, and establishing a Capsules-Unet model; s3, training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model; and S4, classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
Optionally, step S3 further includes verifying the Capsules-Unet model with the verification set data, wherein training iterations stop when the error falls below a given threshold or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
Optionally, the Capsules-Unet model comprises: a feature extraction module comprising an input convolution layer and a convolution capsule layer (ConCaps), the input convolution layer being used to extract low-level features of an input remote sensing image, and the convolution capsule layer (ConCaps) performing convolution filtering on the low-level features extracted by the input convolution layer and converting them into capsules; a contraction path module comprising a plurality of primary capsule layers (PrimaryCaps), used to down-sample the capsules obtained by the feature extraction module; an extended path module comprising a plurality of primary capsule layers (PrimaryCaps) and a plurality of deconvolution capsule layers (DeconCaps) configured to alternate with each other, used to up-sample the capsules from the contraction path module, and further comprising an output primary capsule layer that convolves the data obtained by the up-sampling in the extended path module and outputs it to the classification module; a skip connection layer (skip layer), through which the extended path module clips and copies the low-level features of the contraction path module for use during up-sampling in the extended path module; and a classification module comprising a classification capsule layer (Class Capsule) containing a plurality of capsules, where the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
Optionally, in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the same transformation matrix is used for child and parent capsules of the same type.
Optionally, in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
According to another aspect of the invention, a remote sensing image deep learning classification system based on a Capsules-Unet model is provided, which comprises: the data preprocessing unit is used for preprocessing the remote sensing image data and dividing the preprocessed remote sensing image data into training set data and verification set data; a model establishing unit, which takes the Unet model as a basic network architecture, fuses capsule (Capsules) models and establishes Capsules-Unet models; the model training unit is used for training the Capsules-Unet model by utilizing the training set data and the verification set data to obtain the trained Capsules-Unet model; and the classification unit is used for classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
Optionally, the model training unit further includes a model verification subunit, configured to verify the Capsules-Unet model with the verification set data, wherein training iterations stop when the set error falls below a given threshold or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
Optionally, the Capsules-Unet model in the system includes: a feature extraction module comprising an input convolution layer and a convolution capsule layer (ConCaps), the input convolution layer being used to extract low-level features of an input remote sensing image, and the convolution capsule layer (ConCaps) performing convolution filtering on the low-level features extracted by the input convolution layer and converting them into capsules; a contraction path module comprising a plurality of primary capsule layers (PrimaryCaps), used to down-sample the capsules obtained by the feature extraction module; an extended path module comprising a plurality of primary capsule layers (PrimaryCaps) and a plurality of deconvolution capsule layers (DeconCaps) configured to alternate with each other, used to up-sample the capsules from the contraction path module, and further comprising an output primary capsule layer that convolves the data obtained by the up-sampling in the extended path module and outputs it to the classification module; a skip connection layer (skip layer), through which the extended path module clips and copies the low-level features of the contraction path module for use during up-sampling in the extended path module; and a classification module comprising a classification capsule layer (Class Capsule) containing a plurality of capsules, where the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
Optionally, in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the same transformation matrix is used for child and parent capsules of the same type.
Optionally, in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when the data of a child capsule is routed to the data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
The technical scheme of the invention has the beneficial technical effects that: a remote sensing image deep learning classification Capsules-Unet model capable of packaging multidimensional characteristics of ground objects is established, and high-resolution remote sensing images are classified more accurately; the dynamic routing algorithm of the existing capsule model is improved, the operation memory burden and the data parameters are reduced, and the high-resolution remote sensing images are classified more efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a remote sensing image deep learning classification method according to the invention.
FIG. 2 is a flow diagram of a remote sensing image deep learning classification method according to an embodiment of the invention.
FIG. 3 is a schematic diagram of the Capsules-Unet model structure of the present invention.
FIG. 4 is a further detailed block diagram of the Capsules-Unet model structure of FIG. 3 in accordance with the present invention.
FIG. 5 is a schematic structural diagram of a remote sensing image deep learning classification system according to the invention.
FIG. 6 is a schematic diagram of an improved locally constrained dynamic routing algorithm of the Capsules-Unet model of the present invention.
FIG. 7 is a schematic diagram of the process of updating coupling coefficients by the improved locally constrained dynamic routing algorithm of the Capsules-Unet model of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems consistent with certain aspects of the invention, as detailed in the appended claims.
As shown in fig. 1, according to an aspect of the present invention, the present invention provides a remote sensing image deep learning classification method based on Capsules-Unet model, including: s1, carrying out data preprocessing on remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data; s2, fusing a capsule (Capsules) model by using the Unet model as a basic network architecture, and establishing a Capsules-Unet model; s3, training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model; and S4, classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
Data pre-processing
The high-resolution remote sensing image data of the invention come from the Vaihingen and Potsdam datasets of the ISPRS 2D semantic labeling benchmark. The ISPRS Vaihingen dataset comprises 33 orthorectified images of different sizes with ground truth data, all taken over the Vaihingen area in Germany with a ground sampling distance of 9 cm, and is composed of 3 channels: near infrared, red and green (IRRG). The data include 6 surface feature categories: impervious surfaces, buildings, low vegetation, trees, vehicles, and others (background). During model training, the images with ground truth labels are divided into a training set and a verification set: 31 images are selected as the training set and the remaining 2 images serve as the verification set.
The Potsdam dataset contains 38 orthorectified images of different sizes and corresponding Digital Surface Models (DSMs). A DSM is an array of the same size as the input image that provides an elevation value at each pixel. The images were taken over the Potsdam area of Germany with a ground sampling distance of 5 cm and consist of 4 channels: near infrared, red, green and blue (IRRGB). In the experiment, to increase the number of bands in the Potsdam dataset, the NDVI index was calculated from the near infrared and red bands. The dataset provides ground truth labels for 24 images for model training and validation, and has the same feature classes as the Vaihingen dataset. During model training, 23 images are selected as the training set and the remaining 1 image as the verification set. The image numbers of the labeled data in the two datasets, divided into training samples and test samples, are shown in Table 1.
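As a side note on this band-augmentation step, NDVI is conventionally computed from the near infrared and red bands; a minimal sketch (the IRRGB channel ordering assumed below is an illustration, not taken from the text):

```python
import numpy as np

def ndvi(image):
    """Compute NDVI from an IRRGB image; channel order NIR, R, G, B assumed."""
    nir = image[..., 0].astype(float)
    red = image[..., 1].astype(float)
    return (nir - red) / (nir + red + 1e-9)  # small epsilon avoids division by zero
```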
TABLE 1 image numbering of labeled data divided into training samples and test samples
See fig. 2. In the data preprocessing step, the original images are sampled into 64 × 64 pixel patches using a random sliding-window sampling method. 80% of the samples in the sample bank are used as training samples and 20% as test samples. To increase the diversity and variability of the samples, a large amount of labeled training data is fed into the model after normalization, random sampling, and data enhancement.
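A minimal sketch of this sampling and splitting step (patch size and split ratio follow the text; the NumPy implementation, array names and sample count are assumptions for illustration):

```python
import numpy as np

def random_sliding_window_samples(image, label, patch=64, n_samples=5000, seed=0):
    """Randomly crop patch x patch windows from a (H, W, C) image and its
    (H, W) label map, building the sample bank described above."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    patches, labels = [], []
    for _ in range(n_samples):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        patches.append(image[y:y + patch, x:x + patch])
        labels.append(label[y:y + patch, x:x + patch])
    return np.stack(patches), np.stack(labels)

# 80% of the sample bank as training samples, 20% as test samples
X, Y = random_sliding_window_samples(img, gt)
split = int(0.8 * len(X))
X_train, Y_train, X_test, Y_test = X[:split], Y[:split], X[split:], Y[split:]
```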
Capsule-Unet model establishment
The present invention is based on the new neuron designed by Sabour et al., the "capsule", which replaces the traditional scalar neuron in constructing the capsule network. The capsule network is composed of multiple layers of capsules. Each capsule involves two kinds of parameters: weights and coupling coefficients. Each weight matrix W represents a linear transformation that carries the spatial relationships between low-level and high-level features as well as other important relationships, such as pose (position, size, orientation), deformation, velocity, hue and texture. The coupling coefficient c_ij determines to which higher-level capsule the output of a lower-level capsule is directed. Unlike the weights, c_ij is updated by dynamic routing; the coupling coefficients thus essentially determine how information flows between the capsules. For each lower-level capsule, the coefficients c_ij over all potential higher-level capsules sum to 1. The capsule network is built on this basis and aims to overcome the inability of traditional CNNs to recognize the pose information of an entity and the part-whole relationships between objects.
The invention provides Capsules-Unet, a high-spatial-resolution image classification model with the U-net model as its basic framework. The invention designs the classification model around the concept and structure of the capsule, so as to improve classification performance through the viewpoint invariance and routing mechanism of the capsule model.
Referring to fig. 3 and 4, the Capsules-Unet model of the present invention comprises: a feature extraction module 1, which comprises an input convolution layer 11 and a convolution capsule layer (ConCaps) 12, wherein the input convolution layer 11 extracts low-level features of the input remote sensing image and the convolution capsule layer (ConCaps) 12 performs convolution filtering on the low-level features extracted by the input convolution layer 11 and converts them into capsules; a contraction path module 2, which includes a plurality of primary capsule layers (PrimaryCaps) 21 and down-samples the capsules obtained by the feature extraction module 1; an extended path module 3, which includes a plurality of primary capsule layers (PrimaryCaps) 31 and a plurality of deconvolution capsule layers (DeconCaps) 32 configured to alternate with each other and up-samples the capsules from the contraction path module, and which also includes an output primary capsule layer 33 that convolves the data obtained by the up-sampling in the extended path module 3 and outputs it to the classification module 5; a skip connection layer (skip layer) 4, through which the extended path module 3 clips and copies the low-level features of the contraction path module 2 for use during up-sampling in the extended path module 3; and a classification module 5 comprising a classification capsule layer (Class Capsule) 52, which contains a plurality of capsules, where the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
According to an alternative embodiment of the present invention, the input convolution layer 11 contains 16 5 × 5 convolution kernels with stride 1; it takes a 64 × 64 × 3 image as input and outputs a 64 × 64 × 1 × 16 tensor. The primary capsule layers (PrimaryCaps) 21, 31 function in many respects like CNN convolution layers, but they take capsules as input and use a locally constrained dynamic routing algorithm to determine their output, which is also capsules. The deconvolution capsule layers (DeconCaps) 32 use transposed convolution to compensate for the loss of global connectivity caused by locally constrained dynamic routing. The classification capsule layer (Class Capsule) 52 may include k capsules, corresponding to k classification categories; the length of each capsule's activation vector gives the probability that an instance of each class exists. Each capsule in the previous layer is fully connected to the capsules in this layer.
As shown in fig. 4, the input is a 64 × 64 multiband remote sensing image. First, the input convolution layer 11 of the feature extraction module 1 applies 16 convolution kernels of size 5 × 5 with stride 1, producing a tensor of 64 × 64 × 1 × 16; that is, the input convolution layer 11 generates 16 feature maps with the same spatial dimensions. The 16 basic features detected by the input convolution layer 11 then pass through the convolution capsule layer 12 with 5 × 5 capsule kernels and stride 2, producing a first set of feature-combination capsules with output size 32 × 32 × 2 × 16. The feature extraction module 1 thus captures the discriminative features of the input data and feeds them into the following modules. Next, following the Unet design, the architecture is composed of a contraction path module 2 and an extended path module 3. The contraction path module 2 down-samples the image through 4 successive primary capsule layers (PrimaryCaps) 21: a 5 × 5 × 4 × 16 capsule convolution with stride 1 (4 capsules, 16 feature maps), a 5 × 5 × 4 × 16 capsule convolution with stride 2 (4 capsules, 16 feature maps), a 5 × 5 × 8 × 16 capsule convolution with stride 1 (8 capsules, 16 feature maps), and a 5 × 5 × 8 × 32 capsule convolution with stride 2 (8 capsules, 32 feature maps). The size of the image at the bottom layer is 8 × 8 × 8 × 32. By adding capsule layers instead of convolution layers to the Unet, part-whole context information is preserved. The extended path module 3 is composed of 3 primary capsule layers (PrimaryCaps) 31 alternating with 3 deconvolution capsule layers (DeconCaps) 32, plus an output primary capsule layer 33: a 5 × 5 × 8 × 32 capsule convolution with stride 1 (8 capsules, 32 feature maps), a 4 × 4 × 8 × 32 deconvolution capsule (8 capsules, 32 feature maps), a 5 × 5 × 4 × 32 capsule convolution with stride 1 (4 capsules, 32 feature maps), a 4 × 4 × 4 × 16 deconvolution capsule (4 capsules, 16 feature maps), a 5 × 5 × 4 × 16 capsule convolution with stride 1 (4 capsules, 16 feature maps), and a 4 × 4 × 2 × 16 deconvolution capsule (2 capsules, 16 feature maps). When the image reaches the top layer, i.e. after the 3rd deconvolution, it becomes 64 × 64 × 2 × 16; the data up-sampled in the extended path module 3 are then convolved by the output primary capsule layer 33 and output to the classification module 5. Features in the contraction path module 2 are clipped and copied through the skip connection layer (skip layer) 4 for the corresponding up-sampling in the extended path module 3. The classification module 5 comprises a classification capsule layer (Class Capsule) 52, which contains a plurality of capsules; the length of each capsule's activation vector is used to calculate the probability that an instance of each class exists.
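The layer sequence above can be summarized as the following configuration sketch (a schematic outline for readability, not the authors' code; shapes follow the text as height × width × capsule types × atoms per capsule, and the Class Capsule dimension is an assumption):

```python
# Schematic outline of the Capsules-Unet architecture described above.
capsules_unet = [
    # Feature extraction module
    ("Conv",        dict(kernel=5, stride=1, filters=16)),      # -> 64x64x1x16
    ("ConCaps",     dict(kernel=5, stride=2, caps=2, dim=16)),  # -> 32x32x2x16
    # Contraction path: four successive PrimaryCaps layers
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=4, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=2, caps=4, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=8, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=2, caps=8, dim=32)),  # -> 8x8x8x32
    # Extended path: PrimaryCaps alternating with DeconCaps (+ skip connections)
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=8, dim=32)),
    ("DeconCaps",   dict(kernel=4, caps=8, dim=32)),
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=4, dim=32)),
    ("DeconCaps",   dict(kernel=4, caps=4, dim=16)),
    ("PrimaryCaps", dict(kernel=5, stride=1, caps=4, dim=16)),
    ("DeconCaps",   dict(kernel=4, caps=2, dim=16)),            # -> 64x64x2x16
    ("OutputCaps",  dict()),                                    # output primary capsule layer
    # Classification module: one capsule per class, vector length = probability
    ("ClassCapsule", dict(caps="k", dim=16)),
]
```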
Model training and validation
The operating environment of the embodiment of the invention was built on an NVIDIA Quadro P600 GPU and the Keras deep learning platform. The Capsules-Unet model was trained on a computer with a 3.7 GHz 8-core CPU and 32 GB of memory.
Referring to fig. 2, in the model training phase, all training set data are input into the Capsules-Unet model, and the verification set data are used to judge the performance of the Capsules-Unet model under different parameter values. The maximum number of training cycles is set to 10000. The batch size of each training step is 30, i.e. 30 samples are input at a time to fit the Capsules-Unet model. The initial learning rate is 0.001. All parameters of the Capsules-Unet model are updated with the Adam optimization method. Iteration stops when the error falls below a given threshold or the maximum number of iterations is reached.
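A minimal Keras-style training setup matching these hyper-parameters (Adam, initial learning rate 0.001, batch size 30, an error-threshold stopping rule). Here `capsules_unet_model`, `margin_loss` and the data arrays are hypothetical placeholders, and the `min_delta`/`patience` values are assumptions standing in for the unspecified error threshold:

```python
from tensorflow import keras

model = capsules_unet_model()  # hypothetical builder for the Capsules-Unet network
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss=margin_loss)  # margin loss of formula (1), defined below

# Stop when the validation error no longer improves beyond a small threshold
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           min_delta=1e-4, patience=10)

model.fit(X_train, Y_train,
          batch_size=30,                      # 30 samples per training step
          epochs=10000,                       # maximum number of training cycles
          validation_data=(X_val, Y_val),
          callbacks=[early_stop])
```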
Classification
In the classification stage, the trained Capsules-Unet model classifies the data to be classified to generate preliminary class predictions. The prediction data are then input to the classification capsule layer (Class Capsule) 52, and the final surface feature class is generated according to the probability of the predicted class.
For a classification task involving k classes, the classification Capsule layer (Class Capsule) 52 has k capsules, each representing one class. Since the length of a capsule's output vector represents the presence of a visual entity, the length ||v_c|| of each capsule in the last layer represents the probability of class k. For each predefined class there is a loss contribution L_k, which plays a role similar to Softmax in multi-class classification:

L_k = T_k · max(0, m⁺ − ||v_c||)² + λ_margin · (1 − T_k) · max(0, ||v_c|| − m⁻)²  (1)

where m⁺ = 0.9, m⁻ = 0.1 and λ_margin = 0.5. T_k is the indicator of the classification, i.e., 1 if class k is present and 0 if it is absent. ||v_c|| represents the output probability of the capsule. m⁺ is the upper bound, penalizing false positives (predicting that class k exists when it does not); m⁻ is the lower bound, penalizing false negatives (predicting that class k does not exist when it does); λ_margin is a proportionality coefficient that balances the two terms.
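A NumPy sketch of formula (1) (the batched array layout is an assumption for illustration):

```python
import numpy as np

def margin_loss(T, v_norm, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss L_k of formula (1), summed over classes and averaged
    over the batch.
    T      : (batch, k) one-hot indicators T_k
    v_norm : (batch, k) lengths ||v_c|| of the class-capsule activation vectors
    """
    L = (T * np.maximum(0.0, m_pos - v_norm) ** 2
         + lam * (1 - T) * np.maximum(0.0, v_norm - m_neg) ** 2)
    return L.sum(axis=-1).mean()
```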
Referring to fig. 5, according to another aspect of the present invention, a remote sensing image deep learning classification system based on the Capsules-Unet model is provided, comprising: a data preprocessing unit 101, which preprocesses the remote sensing image data and divides the preprocessed remote sensing image data into training set data and verification set data; a model establishing unit 102, which takes the Unet model as the basic network architecture, fuses the capsule (Capsules) model and establishes the Capsules-Unet model; a model training unit 103, which trains the Capsules-Unet model with the training set data and the verification set data to obtain the trained Capsules-Unet model; and a classification unit 104, which classifies the remote sensing image data to be classified with the trained Capsules-Unet model.
According to an optional embodiment of the present invention, the model training unit 103 further comprises a model verification subunit, configured to verify the Capsules-Unet model using the verification set data.
Local constraint dynamic routing algorithm
The original capsule network and dynamic routing algorithm occupy a large amount of memory, and running the model is very time-consuming, because when the dynamic routing algorithm determines the coefficients for routing a child capsule to a parent capsule in the next layer, an additional intermediate representation is required to store the output of the child capsule in a given layer. Therefore, to solve the problems of excessive memory burden and parameter explosion, the invention improves the original dynamic routing algorithm into a locally constrained dynamic routing algorithm.
According to one embodiment of the improved locally constrained dynamic routing algorithm of the present invention, when the data of a child capsule (a capsule in the current layer) of the Capsules-Unet model is routed to the data of a parent capsule (a capsule in the next layer), the same transformation matrix is used for child and parent capsules of the same type.
As shown in fig. 6, in layer l there is a set of capsule types T^l = {t_1^l, t_2^l, ..., t_n^l}. For every t_i^l ∈ T^l there exists an h^l × w^l grid of z^l-dimensional child capsules, where h^l × w^l is the spatial size of the output of layer l−1. Likewise, in layer l+1 of the network there is a set of capsule types T^(l+1) = {t_1^(l+1), t_2^(l+1), ..., t_m^(l+1)}, and for every t_j^(l+1) ∈ T^(l+1) there exists an h^(l+1) × w^(l+1) grid of z^(l+1)-dimensional parent capsules, where h^(l+1) × w^(l+1) is the spatial size of the output of layer l.
Taking the convolution capsule layer 12 as an example, each parent capsule p_xy ∈ P receives a set of "prediction vectors" {û_(t_1|xy), û_(t_2|xy), ..., û_(t_n|xy)}. This set is defined as the matrix multiplication between a learned transformation matrix M_(t_i) and the outputs u_(t_i|xy) of the child capsules within a kernel centered at (x, y) in layer l; that is, for any t_i, û_(t_i|xy) = M_(t_i) · u_(t_i|xy). Each u_(t_i|xy) therefore has shape k_h × k_w × z^l, where k_h × k_w is the size of the user-defined kernel. For all capsule types T^l, each M_(t_i) has shape k_h × k_w × z^l × |T^(l+1)| × z^(l+1), where |T^(l+1)| is the number of parent capsule types in layer l+1. Notably, each M_(t_i) does not depend on the spatial position (x, y), because the same transformation matrix is shared across all spatial positions of a given capsule type. In brief, locally constrained dynamic routing performs a convolution inside each capsule of the lower layer, each capsule convolving out a tensor with the same dimensions as all the capsules of the upper layer, and routing is then performed on the convolution result of each lower-layer capsule. This is why matrix sharing can be exploited here to significantly reduce the number of parameters.
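To see how much this sharing saves, a back-of-the-envelope count can compare one shared matrix per capsule type against one matrix per spatial position (the layer sizes below are taken from this architecture; the comparison itself is an illustrative assumption, not a figure from the text):

```python
# Transformation-matrix parameters for one child capsule type t_i at layer l
k_h, k_w = 5, 5          # kernel size
z_l, z_l1 = 16, 16       # child / parent capsule dimensions
n_parent_types = 4       # |T^(l+1)|
h, w = 32, 32            # spatial grid of the layer

shared   = k_h * k_w * z_l * n_parent_types * z_l1  # one matrix per type: 25600
unshared = shared * h * w                           # one matrix per position: 26214400
print(f"{unshared // shared}x fewer parameters with sharing")  # 1024x
```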
To determine the value of each parent capsule p_xy ∈ P, a weighted sum of these "prediction vectors" is computed, p_xy = Σ_i r_(t_i|xy) · û_(t_i|xy), where the r_(t_i|xy) are routing coefficients determined by the locally constrained dynamic routing algorithm. These routing coefficients are computed by a "routing softmax":

r_(t_i|xy) = exp(b_(t_i|xy)) / Σ_j exp(b_(t_i|j))   (2)

where b_(t_i|xy) is the log prior probability that child capsule t_i is routed to parent capsule p_xy, initialized to 0. b_(t_i|xy) is iteratively updated as

b_(t_i|xy) ← b_(t_i|xy) + û_(t_i|xy) · v_xy   (3)

The initial b_(t_i|xy) is independent of the current input image; it depends only on the location and type of the two capsules. The initial coupling coefficients are then iteratively refined by measuring the agreement between the current output v_xy of each capsule in the layer and the prediction vector û_(t_i|xy).
In the Capsules-Unet model, because the preceding layers of the network transport vectors, a directional treatment is applied to the "capsule" at activation time. The activation function of the Capsules-Unet model is called Squashing, and its expression is shown in formula (4):

v_xy = (||p_xy||² / (1 + ||p_xy||²)) · (p_xy / ||p_xy||)   (4)

where p_xy and v_xy respectively represent the input vector and the output vector of capsule j at spatial location (x, y); p_xy is in fact the weighted sum of all vectors output to capsule j by the previous layer. The first part of the formula scales the length of the input vector p_xy, and the second part is the unit vector of p_xy. The Squashing function keeps the length of the output vector between 0 and 1 while preserving the direction of the input vector: when ||p_xy|| is zero, ||v_xy|| approaches 0; when ||p_xy|| tends to infinity, ||v_xy|| approaches 1.
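A NumPy sketch of formula (4) (the epsilon guard for the zero-vector case is an implementation assumption):

```python
import numpy as np

def squash(p, axis=-1, eps=1e-9):
    """Squashing non-linearity of formula (4): compresses the capsule vector
    p_xy to a length in (0, 1) while preserving its direction."""
    sq_norm = np.sum(p ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * p / np.sqrt(sq_norm + eps)  # eps guards against a zero vector
```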
The dynamic process by which locally constrained dynamic routing updates the coupling coefficients between capsules is shown in fig. 7. In the first iteration of dynamic routing, since the temporary variables b_(t_i|xy) are all initialized to zero, the coupling coefficients from capsule i to all capsules in layer l+1 are equal. A weighted sum of all received inputs û_(t_i|xy) is then computed to obtain p_xy, where the weights are the coupling coefficients r_(t_i|xy). Next, p_xy is passed through the nonlinear Squashing function of formula (4) to obtain v_xy, and finally b_(t_i|xy) is updated according to formula (3). After r iterations, the output v_xy of the capsule is returned. Typically, the optimal number of iterations in the experiments is 3.
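A compact NumPy sketch of this routing loop for a single local window, reusing the `squash` function above (the array layout is an assumption for illustration):

```python
import numpy as np

def locally_constrained_routing(u_hat, r_iters=3):
    """Routing loop of fig. 7 for one local window.
    u_hat : (n_child, n_parent, dim) prediction vectors û_(t_i|xy)
    Returns parent outputs v_xy of shape (n_parent, dim)."""
    n_child, n_parent, _ = u_hat.shape
    b = np.zeros((n_child, n_parent))           # log priors, initialized to zero
    for _ in range(r_iters):                    # 3 iterations is typical
        # routing softmax over parents -> coupling coefficients r_(t_i|xy)
        r = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        p = (r[..., None] * u_hat).sum(axis=0)  # weighted sum -> p_xy
        v = squash(p)                           # nonlinearity of formula (4)
        b += (u_hat * v[None, :, :]).sum(-1)    # agreement update of formula (3)
    return v
```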
According to another embodiment of the improved locally constrained dynamic routing algorithm of the present invention, in the model training and classification steps, when the data of a child capsule of the Capsules-Unet model is routed to the data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window. As shown in fig. 6, only a local window area k_h × k_w × z^l of the capsules of layer l is sampled, instead of operating on the whole capsule grid, which greatly reduces the amount of computation, lightens the memory burden, and reduces the number of parameters.
Evaluation method
In the present invention, the Overall Accuracy (OA) and the Kappa coefficient are used to evaluate the classification performance of the method. The overall accuracy is the percentage of correctly classified samples over the whole test set. The Kappa coefficient is another widely used evaluation criterion, which evaluates the accuracy of remote sensing classification on the basis of the confusion matrix. Its equation is as follows:

Kappa = (N · Σ_(i=1..n) M_ii − Σ_(i=1..n) M_(i+) · M_(+i)) / (N² − Σ_(i=1..n) M_(i+) · M_(+i))

where N is the total number of samples, n is the number of classes, M_ij is the (i, j)-th value of the confusion matrix C, and M_(i+) and M_(+i) respectively denote the sum of the i-th row and the i-th column of C.
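A short NumPy sketch of both metrics from a confusion matrix (the function name and count-based layout are assumptions):

```python
import numpy as np

def overall_accuracy_and_kappa(conf):
    """OA and Kappa coefficient from an n x n confusion matrix C of counts."""
    conf = np.asarray(conf, dtype=float)
    N = conf.sum()                         # total number of samples
    oa = np.trace(conf) / N                # fraction correctly classified
    row, col = conf.sum(axis=1), conf.sum(axis=0)   # M_(i+), M_(+i)
    chance = (row * col).sum()
    kappa = (N * np.trace(conf) - chance) / (N ** 2 - chance)
    return oa, kappa
```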
Comparative analysis
The invention compares the Capsules-Unet model with CapsNet and Unet, with the following results. Both the Capsules-Unet model and CapsNet achieve good results in the classification of impervious surfaces, buildings and low vegetation: these ground object types show good area connectivity and clear edges. However, vehicles and other distinctive, small-coverage categories are not well identified and are easily confused with buildings. The homogeneous regions in the Vaihingen dataset are small, so feature extraction is not accurate enough. Although both the proposed Capsules-Unet model and CapsNet use the capsule form to preserve spatial information between ground objects, when the homogeneous regions are small the spatial information of small target regions such as vehicles is limited. The Unet model works well for the classification of impervious surfaces and low vegetation, but building boundaries are unclear.
Table 2 lists the per-class accuracy of the three models on the Vaihingen dataset. As can be seen from the table, the method of the present invention achieves high classification performance. The Overall Accuracy (OA) of Capsules-Unet is 1.22% higher than that of CapsNet and 1.89% higher than that of Unet. The Kappa coefficients of the Capsules-Unet, CapsNet and Unet models are 0.74, 0.74 and 0.72, respectively. The classification accuracy of Capsules-Unet on the Vaihingen dataset is thus slightly better than that of CapsNet and Unet.
TABLE 2 precision evaluation results of Vaihingen dataset and Potsdam dataset
The features and advantages of the invention have been illustrated by reference to examples. Accordingly, the invention is expressly not limited to these exemplary embodiments, which illustrate some possible non-limiting combinations of features that may also be present alone or in other combinations.
The above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions for some of the technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A remote sensing image deep learning classification method based on Capsules-Unet model comprises the following steps:
s1, carrying out data preprocessing on remote sensing image data, and dividing the preprocessed remote sensing image data into training set data and verification set data;
s2, taking the Unet model as a basic network architecture, fusing the capsule model, and establishing a Capsules-Unet model;
the Capsules-Unet model comprises:
the characteristic extraction module comprises an input convolution layer and a convolution capsule layer, wherein the input convolution layer is used for extracting low-level features of an input remote sensing image; the convolution capsule layer is used for performing convolution filtering processing on the low-level features extracted by the input convolution layer and converting the low-level features into capsules;
the contraction path module comprises a plurality of main capsule layers and is used for performing down-sampling processing on the capsules obtained by the characteristic extraction module;
an extended path module including a plurality of main capsule layers and a plurality of deconvolution capsule layers, the main capsule layers and the deconvolution capsule layers being configured to be interleaved with each other, for up-sampling the capsules from the contraction path module; the extended path module also comprises an output main capsule layer which is used for carrying out convolution processing on data obtained by the up-sampling processing in the extended path module and outputting the data to the classification module;
a skip connection layer, through which the extended path module clips and copies low-level features in the contraction path module for the up-sampling processing in the extended path module;
a classification module comprising a classification capsule layer which contains a plurality of capsules, wherein the length of the activation vector of each capsule is used for calculating the probability of whether an instance of each class exists;
s3, training the Capsules-Unet model by using the training set data and the verification set data to obtain the trained Capsules-Unet model;
and S4, classifying the remote sensing image data to be classified by using the trained Capsules-Unet model.
2. The remote sensing image deep learning classification method according to claim 1, characterized in that: step S3 further comprises verifying the Capsules-Unet model by using the verification set data, wherein training iteration is stopped when the set error is smaller than a given threshold value or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
3. The remote sensing image deep learning classification method according to claim 1, characterized in that: in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the same transformation matrix is adopted for child and parent capsules of the same type.
4. The remote sensing image deep learning classification method according to claim 1 or 3, characterized in that: in step S3 and step S4, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
5. A remote sensing image deep learning classification system based on Capsules-Unet model comprises:
the data preprocessing unit is used for preprocessing the remote sensing image data and dividing the preprocessed remote sensing image data into training set data and verification set data;
a model establishing unit, which takes the Unet model as a basic network architecture, fuses the capsule model and establishes a Capsules-Unet model;
the Capsules-Unet model comprises:
the characteristic extraction module comprises an input convolution layer and a convolution capsule layer, wherein the input convolution layer is used for extracting low-level features of an input remote sensing image; the convolution capsule layer is used for performing convolution filtering processing on the low-level features extracted by the input convolution layer and converting the low-level features into capsules;
the contraction path module comprises a plurality of main capsule layers and is used for performing down-sampling processing on the capsules obtained by the characteristic extraction module;
an extended path module comprising a plurality of main capsule layers and a plurality of deconvolution capsule layers, the main capsule layers and the deconvolution capsule layers being configured to interleave with each other, for up-sampling the capsules from the contraction path module; the extended path module also comprises an output main capsule layer which is used for carrying out convolution processing on data obtained by the up-sampling processing in the extended path module and outputting the data to the classification module;
a skip connection layer, through which the extended path module clips and copies low-level features in the contraction path module for the up-sampling processing in the extended path module;
a classification module comprising a classification capsule layer which contains a plurality of capsules, wherein the length of the activation vector of each capsule is used for calculating the probability of whether an instance of each class exists;
the model training unit is used for training the Capsules-Unet model by utilizing the training set data and the verification set data to obtain the trained Capsules-Unet model;
and the classification unit is used for classifying the remote sensing image data to be classified by utilizing the trained Capsules-Unet model.
6. The remote sensing image deep learning classification system according to claim 5, characterized in that: the model training unit further comprises a model verification subunit for verifying the Capsules-Unet model by using the verification set data, wherein training iteration is stopped when the set error is smaller than a given threshold value or the maximum number of iterations is reached, completing the training of the Capsules-Unet model.
7. The remote sensing image deep learning classification system according to claim 5, characterized in that: in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the same transformation matrix is adopted for child and parent capsules of the same type.
8. The remote sensing image deep learning classification system according to claim 5 or 7, characterized in that: in the model training unit and the classification unit, an improved locally constrained dynamic routing algorithm is adopted in the Capsules-Unet model, so that when data of a child capsule is routed to data of a parent capsule in the next layer, the child capsule is routed to the parent capsule only within one defined local window.
CN202010199056.XA 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model Active CN111401455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199056.XA CN111401455B (en) 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199056.XA CN111401455B (en) 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model

Publications (2)

Publication Number Publication Date
CN111401455A CN111401455A (en) 2020-07-10
CN111401455B true CN111401455B (en) 2023-04-18

Family

ID=71429004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199056.XA Active CN111401455B (en) 2020-03-20 2020-03-20 Remote sensing image deep learning classification method and system based on Capsules-Unet model

Country Status (1)

Country Link
CN (1) CN111401455B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184687B (en) * 2020-10-10 2023-09-26 南京信息工程大学 Road crack detection method based on capsule feature pyramid and storage medium
CN112163549B (en) * 2020-10-14 2022-06-10 中南大学 Remote sensing image scene classification method based on automatic machine learning
CN112348118A (en) * 2020-11-30 2021-02-09 华平信息技术股份有限公司 Image classification method based on gradient maintenance, storage medium and electronic device
CN112580484B (en) * 2020-12-14 2024-03-29 中国农业大学 Remote sensing image corn straw coverage recognition method and device based on deep learning
CN112507039A (en) * 2020-12-15 2021-03-16 苏州元启创人工智能科技有限公司 Text understanding method based on external knowledge embedding
CN112766340B (en) * 2021-01-11 2024-06-04 中山大学 Depth capsule network image classification method and system based on self-adaptive spatial mode

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830243A (en) * 2018-06-22 2018-11-16 西安电子科技大学 Hyperspectral image classification method based on capsule network
CN110321859A (en) * 2019-07-09 2019-10-11 中国矿业大学 A kind of optical remote sensing scene classification method based on the twin capsule network of depth
CN110728224A (en) * 2019-10-08 2020-01-24 西安电子科技大学 Remote sensing image classification method based on attention mechanism depth Contourlet network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830243A (en) * 2018-06-22 2018-11-16 西安电子科技大学 Hyperspectral image classification method based on capsule network
CN110321859A (en) * 2019-07-09 2019-10-11 中国矿业大学 A kind of optical remote sensing scene classification method based on the twin capsule network of depth
CN110728224A (en) * 2019-10-08 2020-01-24 西安电子科技大学 Remote sensing image classification method based on attention mechanism depth Contourlet network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mercedes E. Paoletti et al. Capsule Networks for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(4): 2145-2160. *
Ruirui Li et al. DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation. https://arxiv.org/pdf/1709.00201.pdf, 2017, pp. 1-8. *

Also Published As

Publication number Publication date
CN111401455A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401455B (en) Remote sensing image deep learning classification method and system based on Capsules-Unet model
CN111191736B (en) Hyperspectral image classification method based on depth feature cross fusion
Ghaderizadeh et al. Hyperspectral image classification using a hybrid 3D-2D convolutional neural networks
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN109461157B (en) Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
CN111612008B (en) Image segmentation method based on convolution network
CN111489358A (en) Three-dimensional point cloud semantic segmentation method based on deep learning
Schulz et al. Learning Object-Class Segmentation with Convolutional Neural Networks.
CN110717553A (en) Traffic contraband identification method based on self-attenuation weight and multiple local constraints
CN110728197B (en) Single-tree-level tree species identification method based on deep learning
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN114782311B (en) CENTERNET improvement-based multi-scale defect target detection method and system
Xu et al. Robust self-ensembling network for hyperspectral image classification
CN113191213B (en) High-resolution remote sensing image newly-added building detection method
CN115359372A (en) Unmanned aerial vehicle video moving object detection method based on optical flow network
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113435253A (en) Multi-source image combined urban area ground surface coverage classification method
CN115205590A (en) Hyperspectral image classification method based on complementary integration Transformer network
CN115238758A (en) Multi-task three-dimensional target detection method based on point cloud feature enhancement
CN115527056A (en) Hyperspectral image classification method based on dual-hybrid convolution generation countermeasure network
CN115482518A (en) Extensible multitask visual perception method for traffic scene
CN113850324A (en) Multispectral target detection method based on Yolov4
CN113205103A (en) Lightweight tattoo detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant