CN115937697A - Remote sensing image change detection method - Google Patents


Publication number
CN115937697A
CN115937697A (application CN202210834324.XA)
Authority
CN
China
Prior art keywords
layer
remote sensing
sensing image
output
network
Prior art date
Legal status
Pending
Application number
CN202210834324.XA
Other languages
Chinese (zh)
Inventor
郭海涛
卢俊
龚志辉
徐青
丁磊
林雨准
刘相云
牛艺婷
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202210834324.XA
Publication of CN115937697A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting changes in remote sensing images, and belongs to the technical field of remote sensing image recognition. In the encoding part, the invention selects a weight-sharing twin-network input mode: the two temporal images are input separately into identical network branches for feature extraction, and sharing weights reduces overfitting. Both network branches select a residual connection module as the convolution unit to extract image features, which speeds up network convergence. Meanwhile, a dense connection method aggregates, layer by layer through up-sampling, the information lost during the image's successive down-sampling, improving the network's ability to extract small targets. An attention module is introduced at the lateral connection between the encoding and decoding parts to strengthen the capture of change information and improve boundary discrimination under complex backgrounds. Finally, the decoding part restores the feature map to the original image size through up-sampling and inputs it into a classification layer to obtain an accurate change detection result.

Description

Remote sensing image change detection method
Technical Field
The invention relates to a method for detecting changes of remote sensing images, and belongs to the technical field of remote sensing image recognition.
Background
Remote sensing image change detection has been widely applied in many fields such as topographic map updating, urban change analysis, disaster assessment and landslide monitoring. Traditional change detection methods such as image algebra, image transformation and post-classification comparison perform well on medium- and low-resolution remote sensing images, but as image resolution increases the image background becomes more complex, automatic change detection becomes harder to realize, and the precision of the change detection results cannot meet the requirements of practical application.
Deep learning, with its strong capacity for autonomous learning, is widely applied in computer vision, remote sensing and other fields. Models based on the fully convolutional network (FCN) structure are applied to semantic segmentation tasks and extract image features in an end-to-end manner. Typical networks include UNet, SegNet and DenseNet, and the continuous emergence of semantic segmentation networks has rapidly advanced the task of remote sensing image change detection. The twin network, owing to its two weight-sharing branches, has gradually become a popular input mode in remote sensing image change detection. Daudt et al. examined three fully convolutional networks, FC-EF, FC-Siam-conc and FC-Siam-diff, and verified the effectiveness of the twin network in change detection tasks. The encoding layers of FC-Siam-conc and FC-Siam-diff are both split into two branches with identical structure and shared weights; the difference is that the decoding structure of FC-Siam-conc fuses the features obtained from the encoding structure, whereas FC-Siam-diff computes the absolute value of the difference between corresponding feature maps of the encoding structure and feeds it into the decoding structure. By introducing modules such as attention and deep metric learning into the network, researchers have better distinguished changed features from unchanged ones and improved the performance of remote sensing image change detection.
For example, SNUNet-CD adds an ensemble channel attention module on top of UNet++ to refine the most representative features at different semantic levels, thereby accomplishing the change detection task. STANet, built on a twin network, adds a basic spatial-temporal attention module (BAM) and a pyramid spatial-temporal attention module (PAM) so that the network can better adapt to targets at different scales, improving change detection accuracy. DASNet uses a dual attention mechanism to capture long-range dependencies and obtain more discriminative features, improving the recognition capability of the model. DSAMNet uses a deep metric module to strengthen the feature extractor's learning capability and to generate finer features.
At present, the widely used approach in remote sensing image change detection is to feed the remote sensing images of the two temporal phases directly into a convolutional neural network and output the final change result. There are two modes of feeding the remote sensing images into a convolutional neural network for feature extraction. In the first, the bi-temporal images are stacked along the channel dimension and used as a single network input, and the strong feature extraction capability of the deep convolutional neural network extracts change information from the stacked image; however, this method struggles to preserve the high-dimensional features of the original images. In the second, the two images are input independently into a twin network whose sub-networks share weights, so fewer parameters need to be trained and overfitting is reduced; the twin network extracts features of the two temporal images simultaneously, obtains their feature maps through weight sharing, and then compares the feature maps to obtain the features of the changed regions, achieving accurate pixel-level classification.
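The two input modes described above can be illustrated with a minimal NumPy sketch. This is illustrative only: the array sizes, the 1 × 1 channel-mixing stand-in for a real convolutional branch, and the `branch` helper are assumptions, not the patent's implementation.

```python
import numpy as np

# Toy bi-temporal pair: (channels, height, width)
t1 = np.random.rand(3, 64, 64)
t2 = np.random.rand(3, 64, 64)

# Mode 1: early fusion -- stack along the channel axis, one 6-channel input.
early_fused = np.concatenate([t1, t2], axis=0)
assert early_fused.shape == (6, 64, 64)

# Mode 2: twin (Siamese) input -- feed each image through the same
# weight-shared branch and compare the resulting feature maps.
def branch(x, w):
    # Stand-in for a shared-weight feature extractor (1x1 "conv" as tensordot).
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

w_shared = np.random.rand(16, 3)           # one set of weights, used twice
f1, f2 = branch(t1, w_shared), branch(t2, w_shared)
change_features = np.abs(f1 - f2)          # FC-Siam-diff style comparison
assert change_features.shape == (16, 64, 64)
```

The point of the second mode is that `w_shared` is applied to both images, so the two branches cannot drift apart during training.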
Although these methods have steadily improved change detection accuracy, change detection on high-resolution remote sensing images still suffers from problems such as rough boundary segmentation and missed small targets.
Disclosure of Invention
The invention aims to provide a remote sensing image change detection method to solve the problems of rough boundary segmentation and missed small targets in current remote sensing image change detection.
To solve the above technical problem, the invention provides a remote sensing image change detection method comprising the following steps:
1) Acquiring a remote sensing image pair to be detected, comprising a first remote sensing image and a second remote sensing image of the same scale and size;
2) Constructing a change detection model comprising an encoding part and a decoding part, wherein the encoding part adopts a twin network, the decoding part adopts an up-sampling network, and the encoding part and the decoding part are connected through an attention module;
the twin network comprises a first network branch and a second network branch, both of which adopt multiple layers of residual connection modules for feature extraction. The first network branch extracts features from the first remote sensing image to obtain first remote sensing image features at different scales, and the second network branch extracts features from the second remote sensing image to obtain second remote sensing image features at different scales. Between the two branches, first and second remote sensing image features of the same scale are superposed and fused through dense connections to obtain fusion features at different scales;
the attention module is used for respectively carrying out feature enhancement processing on the fusion features with different scales; and the decoding part is used for carrying out feature recovery and fusion on the features of different scales after the feature enhancement processing through an up-sampling network so as to obtain a change detection result.
According to the invention, a weight-sharing twin-network input mode is selected in the encoding part: the two temporal images are input separately into identical network branches for feature extraction, and sharing weights reduces overfitting. Both network branches select a residual connection module as the convolution unit to extract image features, which speeds up network convergence. Meanwhile, a dense connection method aggregates, layer by layer through up-sampling, the information lost during the image's successive down-sampling, improving the network's ability to extract small targets. An attention module is introduced at the lateral connection between the encoding and decoding parts to strengthen the capture of change information and improve boundary discrimination under complex backgrounds. Finally, the decoding part restores the feature map to the original image size through up-sampling and inputs it into a classification layer to obtain an accurate change detection result.
Further, the residual connection module comprises a first convolution layer, a second convolution layer, a first regularization layer, a second regularization layer, an activation layer and a superposition module. The first convolution layer performs a convolution operation on the input features; its output, after being processed by the first regularization layer and the activation layer, is input in turn to the second convolution layer and the second regularization layer. In parallel, the input features are processed by a further convolution and regularization and fed to the superposition module, where they are added to the output of the second regularization layer; the superposed features are the output of the residual connection module.
The residual connection module adopts the idea of identity shortcut connection, which speeds up network convergence and improves detection efficiency.
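The structure just described (convolution, regularization, activation, and an additive shortcut) can be sketched in NumPy. This is a simplified illustration: 1 × 1 channel mixing stands in for the patent's 3 × 3 convolutions, and all helper names and shapes are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise convolution: mixes channels; stand-in for the 3x3 convs."""
    return np.tensordot(w, x, axes=([1], [0]))

def batch_norm(x, eps=1e-5):
    """Per-channel normalisation to zero mean / unit variance (BN stand-in)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2, w_shortcut):
    """Main path: conv -> BN -> ReLU -> conv -> BN.
    Shortcut path: conv -> BN applied to the input.
    Summing the two paths preserves the original input information."""
    main = batch_norm(conv1x1(relu(batch_norm(conv1x1(x, w1))), w2))
    shortcut = batch_norm(conv1x1(x, w_shortcut))
    return main + shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))            # (channels, H, W)
w1, w2 = rng.standard_normal((16, 3)), rng.standard_normal((16, 16))
w_sc = rng.standard_normal((16, 3))
y = residual_block(x, w1, w2, w_sc)
assert y.shape == (16, 8, 8)
```

The additive shortcut is what gives the block its fast-convergence property: gradients can flow through the sum without passing through the nonlinearity.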
Further, the dense connection comprises an up-sampling layer and a superposition layer. The up-sampling layer up-samples the output of a residual connection module at a given layer of the second network branch, and the superposition layer superposes the up-sampling result with the output of the next-shallower residual connection module of the second network branch and the output of the corresponding-layer residual connection module of the first network branch.
By densely connecting the feature maps of different scales obtained in the encoding part, the invention not only learns shallow spatial information but also transmits the obtained deep semantic information to the shallow layers, improving the capture of small-target change information and alleviating the information loss caused by the image's successive down-sampling.
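A single dense connection of this kind (up-sample the deeper feature, then superpose it with the two same-scale encoder features along the channel axis) can be sketched as follows; the channel counts and the `dense_connect` helper are illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def dense_connect(deeper, f_branch1, f_branch2):
    """Up-sample the deeper feature of branch 2 and stack it on the channel
    axis with the same-scale features from both encoder branches."""
    up = upsample2x(deeper)
    assert up.shape[1:] == f_branch1.shape[1:] == f_branch2.shape[1:]
    return np.concatenate([up, f_branch1, f_branch2], axis=0)

f25 = np.zeros((64, 4, 4))   # deepest feature, 1/16 scale (hypothetical sizes)
f14 = np.zeros((32, 8, 8))   # branch-1 feature at 1/8 scale
f24 = np.zeros((32, 8, 8))   # branch-2 feature at 1/8 scale
f4 = dense_connect(f25, f14, f24)
assert f4.shape == (128, 8, 8)
```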
Further, the attention module adopts a convolutional block attention module (CBAM).
The convolutional block attention module adopted by the invention strengthens attention to necessary features, suppresses unnecessary features, and enhances the capture of change information.
Further, the first network branch comprises N layers of residual connection modules and the second network branch comprises N+1 layers, with N ≥ 3. The input of the first-layer residual connection module of the first network branch is the first remote sensing image, and its output is a convolution feature with the same scale as the first remote sensing image; the input of the second-layer residual connection module of the first network branch is the output of the first-layer module, and its output is a convolution feature at 1/2 the scale of the first remote sensing image; the input of the N-th-layer residual connection module of the first network branch is the output of the (N-1)-th-layer module, and its output is a convolution feature at 1/2^(N-1) the scale of the first remote sensing image. The input of the first-layer residual connection module of the second network branch is the second remote sensing image, and its output is a convolution feature with the same scale as the second remote sensing image; the input of the second-layer residual connection module of the second network branch is the output of the first-layer module, and its output is a convolution feature at 1/2 the scale of the second remote sensing image; the input of the (N+1)-th-layer residual connection module of the second network branch is the output of the N-th-layer module, and its output is a convolution feature at 1/2^N the scale of the second remote sensing image.
Further, N dense connections are used. The input of the up-sampling layer of the N-th dense connection is the output of the (N+1)-th-layer residual connection module of the second network branch; the superposition layer of the N-th dense connection superposes the output of its up-sampling layer, the output of the N-th-layer residual connection module of the first network branch, and the output of the N-th-layer residual connection module of the second network branch, and the superposition result is the output of that dense connection. The input of the up-sampling layer of the (N-1)-th dense connection is the output of the N-th dense connection; its superposition layer superposes the output of the up-sampling layer, the output of the (N-1)-th-layer residual connection module of the first network branch, and the output of the (N-1)-th-layer residual connection module of the second network branch, the superposition result being the output of the (N-1)-th dense connection, and so on for the shallower layers.
Further, N is 4.
Further, the change detection model adopts a mixed Loss function of weighted cross entropy and Dice Loss during training.
Further, the loss function of the weighted cross entropy employed is:

L_wce = -∑_i [ w · y_i · log(p_i) + (1 - y_i) · log(1 - p_i) ]

where i denotes a pixel, w is the weight coefficient applied to the change class, and y_i and p_i respectively denote the label of pixel i and the probability that pixel i belongs to the change class.
Further, the loss function of Dice Loss employed is:

L_Dice = 1 - 2|Y ∩ Ŷ| / (|Y| + |Ŷ|)

where Y represents the true value of the sample and Ŷ represents the predicted value of the sample.
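Both loss terms can be sketched in a few lines of NumPy. The clipping constant and function names are illustrative, and the weighted form follows the standard weighted binary cross entropy rather than any patent-specific variant.

```python
import numpy as np

def weighted_bce(y_true, p_pred, w, eps=1e-7):
    """Weighted binary cross entropy: the change class (y=1) is scaled by w
    to counter the strong class imbalance of change-detection labels."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    per_pixel = -(w * y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return per_pixel.mean()

def dice_loss(y_true, p_pred, eps=1e-7):
    """Dice loss: one minus twice the overlap over the sum of magnitudes."""
    inter = (y_true * p_pred).sum()
    return 1.0 - (2.0 * inter + eps) / (y_true.sum() + p_pred.sum() + eps)

# Tiny 2x2 example: ground truth vs predicted change probabilities.
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
total = weighted_bce(y, p, w=2.0) + dice_loss(y, p)
assert 0.0 < total < 2.0
```

The mixed loss is simply the sum (or a weighted sum) of the two terms; Dice handles region overlap while the weighted cross entropy handles per-pixel classification under imbalance.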
Drawings
FIG. 1 is a schematic diagram of the network structure of the detection model used in the remote sensing image change detection method of the present invention;
FIG. 2 is a schematic diagram of a residual connection module used in the present invention;
FIG. 3 is a schematic illustration of a dense connection employed by the present invention;
FIG. 4 is a schematic diagram of a network structure of a convolution attention module employed in the present invention;
FIG. 5 (a) is a remote sensing image of selected region 1 during the period T1 during the experiment of the present invention;
FIG. 5 (b) is a remote sensing image of selected region 1 during the T2 period of the experimental procedure of the present invention;
FIG. 5 (c) is a graph of the true change in the T1 and T2 periods for selected region 1 during the course of the experiment of the present invention;
FIG. 5 (d) is a schematic diagram of the change detection result of the remote sensing image of the selected region 1 by using the FC-Siam-conc model in the experimental process of the present invention;
FIG. 5 (e) is a schematic diagram of the change detection result of the remote sensing image of the selected region 1 by using the FC-Siam-diff model in the experimental process of the present invention;
FIG. 5 (f) is a schematic diagram of the change detection result of the remote sensing image of the selected region 1 by using the DSAMNet model in the experimental process of the present invention;
FIG. 5 (g) is a schematic diagram of the change detection result of the remote sensing image of the selected region 1 by using the SNUNet-CD model in the experimental process of the present invention;
FIG. 5 (h) is a schematic diagram of the change detection result of the remote sensing image of the selected region 1 by using the CDASNet model of the present invention in the experimental process of the present invention;
FIG. 6 (a) is a remote sensing image of region 2 selected during the experiment of the present invention during the period T1;
FIG. 6 (b) is a remote sensing image of region 2 selected during the experiment of the present invention during the period T2;
FIG. 6 (c) is a graph of the true change in the T1 and T2 periods for selected region 2 during the course of the experiment of the present invention;
FIG. 6 (d) is a schematic diagram of the change detection result of the remote sensing image of the selected region 2 by using the FC-Siam-conc model in the experimental process of the present invention;
FIG. 6 (e) is a schematic diagram of the change detection result of the remote sensing image of the selected region 2 by using the FC-Siam-diff model in the experimental process of the present invention;
FIG. 6 (f) is a schematic diagram of the change detection result of the remote sensing image of the selected region 2 by using the DSAMNet model in the experimental process of the present invention;
FIG. 6 (g) is a schematic diagram of the change detection result of the remote sensing image of the selected region 2 by using the SNUNet-CD model in the experimental process of the present invention;
FIG. 6 (h) is a schematic diagram of the change detection result of the remote sensing image of the selected region 2 using the CDASNet model of the present invention during the experiment;
FIG. 7 (a) is a remote sensing image of selected region 3 during the experiment of the present invention during the period T1;
FIG. 7 (b) is a remote sensing image of selected region 3 during the T2 period of the experimental procedure of the present invention;
FIG. 7 (c) is a graph of the true change in the T1 and T2 periods for selected region 3 during the course of the experiment of the present invention;
FIG. 7 (d) is a schematic diagram of the change detection result of the remote sensing image of the selected region 3 by using the FC-Siam-conc model in the experimental process of the present invention;
FIG. 7 (e) is a schematic diagram of the change detection result of the remote sensing image of the selected region 3 by using the FC-Siam-diff model in the experimental process of the present invention;
FIG. 7 (f) is a schematic diagram of the change detection result of the remote sensing image of the selected region 3 by using the DSAMNet model in the experimental process of the present invention;
FIG. 7 (g) is a schematic diagram of the change detection result of the remote sensing image of the selected region 3 by using the SNUNet-CD model in the experimental process of the present invention;
FIG. 7 (h) is a schematic diagram of the change detection result of the selected remote sensing image of area 3 using the CDASNet model of the present invention during the experiment;
FIG. 8 (a) is a remote sensing image of selected region 4 during the period T1 during the experiment of the present invention;
FIG. 8 (b) is a remote sensing image of selected region 4 during the T2 period of the experimental procedure of the present invention;
FIG. 8 (c) is a plot of the true change in the T1 and T2 periods for selected region 4 during the course of the experiment of the present invention;
FIG. 8 (d) is a schematic diagram of the change detection result of the remote sensing image of the selected region 4 by using the FC-Siam-conc model in the experimental process of the present invention;
FIG. 8 (e) is a schematic diagram of the change detection result of the remote sensing image of the selected region 4 by using the FC-Siam-diff model in the experimental process of the present invention;
FIG. 8 (f) is a schematic diagram of the change detection result of the remote sensing image of the selected region 4 by using the DSAMNet model in the experimental process of the present invention;
FIG. 8 (g) is a schematic diagram of the change detection result of the remote sensing image of the selected region 4 by using the SNUNet-CD model in the experimental process of the present invention;
FIG. 8 (h) is a schematic diagram of the change detection result of the remote sensing image of the selected region 4 using the CDASNet model of the present invention during the experiment;
FIG. 9 (a) is a selected remote sensing image during an ablation experiment of the present invention during a period T1;
FIG. 9 (b) is a remote sensing image of a selected T2 period during an ablation experiment of the present invention;
FIG. 9 (c) is a plot of the true change values for selected T1 and T2 periods during the course of an experiment of the present invention;
FIG. 9 (d) is a schematic diagram of the change detection result of the base model in the ablation experiment of the present invention;
FIG. 9 (e) is a schematic diagram of the change detection result of the base + CBAM model in the ablation experiment of the present invention;
FIG. 9 (f) is a schematic diagram of the change detection result of the base + DC model in the ablation experiment of the present invention;
FIG. 9 (g) is a schematic diagram of the change detection result of CDASNet in the ablation experiment of the present invention;
fig. 9 (h) is a detail view of the change detection of CDASNet used by the model during the ablation experiment of the present invention.
Detailed Description
The following description will further describe embodiments of the present invention with reference to the accompanying drawings.
Method embodiment
The invention provides the CDASNet network based on the idea that a UNet-style network can extract multi-scale features along a multi-layer path; the overall structure is shown in figure 1. Considering the characteristics of remote sensing change detection data, the network adopts a twin-network input mode in the encoding stage and reduces overfitting by sharing weights; the two sub-networks select a residual connection module as the convolution unit to extract image features separately, speeding up network convergence. Meanwhile, a new dense connection method aggregates, layer by layer through up-sampling, the information lost during the image's successive down-sampling. An attention module is introduced at the lateral connection between the encoding and decoding stages to strengthen the capture of change information. Finally, the feature map is restored to the original image size through up-sampling and input into a classification layer to obtain the change detection result.
Specifically, as shown in fig. 1, the network used by the invention for remote sensing image change detection is the CDASNet network. The network comprises an encoding part and a decoding part; the encoding part adopts a twin network, the decoding part adopts an up-sampling network, and the two parts are connected through an attention module. The twin network comprises a first network branch and a second network branch, both of which adopt multiple layers of residual connection modules for feature extraction: the first network branch extracts features from the first remote sensing image to obtain first remote sensing image features at different scales, and the second network branch extracts features from the second remote sensing image to obtain second remote sensing image features at different scales. Between the two branches, first and second remote sensing image features of the same scale are superposed and fused through dense connections to obtain fusion features at different scales. The attention module performs feature enhancement on the fusion features at each scale, and the decoding part performs feature recovery and fusion on the enhanced features of different scales through the up-sampling network to obtain the change detection result. The first remote sensing image and the second remote sensing image are images of the same area in different time periods.
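The overall encoding-stage data flow (two weight-shared branches of different depths, followed by deep-to-shallow dense fusion) can be traced at the level of tensor shapes with placeholder operations. The channel counts, image size, and all helper names below are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

def conv_block(x, c_out):   # stand-in residual unit: changes channel count only
    return np.zeros((c_out,) + x.shape[1:])

def down2(x):               # stand-in for stride-2 downsampling
    return x[:, ::2, ::2]

def up2(x):                 # nearest-neighbour 2x upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

H = W = 32
t1, t2 = np.zeros((3, H, W)), np.zeros((3, H, W))

# Shared-weight encoder: the same functions are applied to both images.
def encode(x, depth):
    feats, f = [], x
    for i in range(depth):
        f = conv_block(f if i == 0 else down2(f), 16 * 2 ** i)
        feats.append(f)
    return feats

e1 = encode(t1, 4)          # branch 1: scales 1, 1/2, 1/4, 1/8
e2 = encode(t2, 5)          # branch 2: one level deeper, down to 1/16

# Dense connections: upsample the deeper feature and fuse it with the
# same-scale features of both branches, proceeding from deep to shallow.
fused, deeper = [], e2[4]
for i in range(3, -1, -1):
    f = np.concatenate([up2(deeper), e1[i], e2[i]], axis=0)
    fused.insert(0, f)
    deeper = f

assert fused[0].shape[1:] == (H, W)   # shallowest fusion, full resolution
```

In the real model each fused map would then pass through the attention module and into the decoder; here only the encoder/fusion shape flow is shown.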
The first network branch comprises N layers of residual connection modules and the second network branch comprises N+1 layers, with N ≥ 3; in this embodiment N is 4, i.e. the first network branch comprises 4 layers of residual connection modules and the second network branch comprises 5. The input of the first-layer residual connection module of the first network branch is the first remote sensing image (the image of period T1, T1 for short), and the output is the convolution feature F_11 with the same scale as the first remote sensing image; the input of the second-layer residual connection module of the first network branch is the output F_11 of the first-layer module, and the output is the convolution feature F_12 at 1/2 the scale of the first remote sensing image; by analogy, the input of the fourth-layer residual connection module of the first network branch is the output F_13 of the third-layer module, and the output is the convolution feature F_14 at 1/8 the scale of the first remote sensing image.
The input of the first-layer residual connection module of the second network branch is the second remote sensing image (the image of period T2, T2 for short), and the output is the convolution feature F_21 with the same scale as the second remote sensing image; the input of the second-layer residual connection module of the second network branch is the output F_21 of the first-layer module, and the output is the convolution feature F_22 at 1/2 the scale of the second remote sensing image; by analogy, the input of the fifth-layer residual connection module of the second network branch is the output F_24 of the fourth-layer module, and the output is the convolution feature F_25 at 1/16 the scale of the second remote sensing image. Thus the first network branch yields, in order, the features F_11, F_12, F_13, F_14, and the second network branch yields F_21, F_22, F_23, F_24, F_25.
As shown in fig. 2, the residual connection module adopts an identity shortcut connection and comprises a first convolution layer, a second convolution layer, a first regularization layer, a second regularization layer, an activation layer and a superposition module. Let x be the input; the objective function H(x) decomposes the structure into two branches. One branch is the residual mapping F(x): x first passes through a 3 × 3 convolution layer to obtain a feature vector; batch normalization (BN) then constrains the feature vector to a distribution with mean 0 and variance 1; a ReLU nonlinear activation function is selected to reduce the computational load of the network and alleviate overfitting; F(x) is then obtained through another 3 × 3 convolution layer and BN layer. The other branch passes the input information x through a 3 × 3 convolution layer directly to the output, where it is added to F(x) to give the final output result, preserving the integrity of the original input information.
The first network branch and the second network branch are connected by dense connections (DC). As shown in fig. 3, a dense connection comprises an up-sampling layer and a superposition layer: the up-sampling layer up-samples the output of a residual connection module at a given layer of the second network branch, and the superposition layer superposes the up-sampling result with the output of the next-shallower residual connection module of the second network branch and the output of the corresponding-layer residual connection module of the first network branch. In this embodiment the first network branch uses four layers of residual connection modules to extract image features at different scales, so four dense connections are used to fuse the features extracted by the two branches. For the fourth dense connection, the fifth-layer residual connection module of the second network branch extracts the feature F_25 at 1/16 of the original image; F_25 is fed into the up-sampling layer of this dense connection to obtain a feature at 1/8 of the original image, which is superposed and fused with the 1/8-scale feature F_14 extracted by the fourth-layer residual connection module of the first network branch and the 1/8-scale feature F_24 extracted by the fourth-layer module of the second network branch, giving the feature map F_4. F_4 is then up-sampled and superposed along the channel dimension with F_13 and F_23 to obtain F_3; by analogy, fusion features at all scales are obtained.
Through this dense connection scheme, the same-scale feature maps F_1, F_2, F_3, F_4 obtained in the encoding stage are each passed to the decoding stage through a single skip connection, reducing the parameter count that excessive skip connections would incur. The network thereby not only learns shallow spatial information but also propagates the deep semantic information it obtains back to the shallow layers, improving its ability to capture small-target change information.
Because the background of a high-resolution remote sensing image is complex, feature extraction by a single model alone often fails to distinguish mixed-pixel areas, and not all features in the decoding part contribute equally to the model. A Convolutional Block Attention Module (CBAM) is therefore inserted before the decoding part on each encoder feature map F_1, F_2, F_3, F_4 to strengthen attention to necessary features and suppress unnecessary ones, yielding new feature maps F_1', F_2', F_3', F_4'. F_4' is then fused with the feature obtained by upsampling F_25 in the decoding stage to produce the feature D_4; D_4 is upsampled and fused with F_3' to produce D_3, and so on until D_1 is finally obtained. Finally, D_1 is reduced to 2 channels by a 1 × 1 convolution to generate the final prediction map.
As shown in fig. 4, the CBAM adopted by the invention concatenates a channel attention submodule and a spatial attention submodule, capturing global spatial information through global pooling. The channel attention submodule is similar in structure to the SE module, except that it aggregates global information with parallel max-pooling and average-pooling operations, compressing the features along the spatial dimension; the spatial attention submodule models the spatial relationships of the features and compresses them along the channel dimension. Given an input feature map X ∈ R^(C×H×W), where C, H and W are the number of channels, height and width of the feature, the channel attention submodule produces a one-dimensional channel attention vector M_c ∈ R^(C×1×1), which is multiplied with the input features; the spatial attention submodule then generates a two-dimensional spatial attention map M_s ∈ R^(1×H×W), and finally a new feature map F' is obtained.
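The channel-then-spatial attention sequence can be sketched as below, following the standard CBAM layout; the reduction ratio of 16 and the 7 × 7 spatial kernel are conventional CBAM defaults assumed here, not values stated in the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Parallel average- and max-pooling over space, shared MLP, sigmoid."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1))

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)  # M_c in R^(C x 1 x 1)

class SpatialAttention(nn.Module):
    """Channel-wise mean and max stacked, one conv, sigmoid."""
    def __init__(self, kernel=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s

class CBAM(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.ca = ChannelAttention(ch)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)       # refine channels first
        return x * self.sa(x)    # then refine spatial locations -> F'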
Remote sensing image change detection suffers from pronounced class imbalance: unchanged pixels far outnumber changed pixels, so model training tends to focus on the majority class and neglect the minority class, and changes of small targets are easily missed. To reduce the influence of sample imbalance and increase the weight of changed samples in the whole image, the invention uses a mixed loss function L combining Weighted Cross Entropy (WCE) and Dice Loss, defined as:
L = L_wce + L_dice  (1)
The weighted cross entropy loss function can be expressed as:

L_wce = −Σ_i [w · y_i · log(p_i) + (1 − y_i) · log(1 − p_i)]  (2)

where i denotes a pixel; w is a weight coefficient; y_i and p_i denote, respectively, the label of pixel i and the predicted probability that pixel i belongs to the changed class.
The Dice Loss function measures the similarity between the ground truth of the changed samples and the predicted changed samples. It is expressed as:

L_dice = 1 − (2 · |Y ∩ Ŷ|) / (|Y| + |Ŷ|)  (3)

where Y represents the true value of the sample and Ŷ represents the predicted value of the sample.
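A pure-Python sketch of the mixed loss over flattened pixel lists; the weight value w = 2.0 and the eps stabilizer are illustrative assumptions (the patent does not specify w), and a mean-reduced WCE is used.

```python
import math

def mixed_loss(y, p, w=2.0, eps=1e-6):
    """L = L_wce + L_dice for flat lists of labels y (0/1) and predicted
    change probabilities p. w up-weights the changed (positive) class."""
    n = len(y)
    # Weighted cross entropy, averaged over pixels.
    wce = -sum(w * yi * math.log(pi + eps)
               + (1 - yi) * math.log(1 - pi + eps)
               for yi, pi in zip(y, p)) / n
    # Soft Dice: overlap between truth and prediction.
    inter = sum(yi * pi for yi, pi in zip(y, p))
    dice = 1 - (2 * inter + eps) / (sum(y) + sum(p) + eps)
    return wce + dice
```

A perfect prediction drives both terms toward zero, while the Dice term keeps the gradient signal meaningful even when changed pixels are rare.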
To further demonstrate the detection effect of the remote sensing image change detection method (CDASNet) of the invention, simulation experiments are performed on two remote sensing image datasets, LEVIR-CD and CDD. The LEVIR-CD dataset collects, through Google Earth, 637 high-resolution remote sensing image pairs of 1024 × 1024 pixels covering 20 different areas in Texas, with a time span of 5-14 years and a resolution of 0.5 m per pixel; the dataset focuses on building changes. Because the images are too large to be input into the model directly, and to avoid information loss at slice edges, the LEVIR-CD dataset is cut in the experiment into 256 × 256 slices with 20% overlap, of which 70% are training samples, 10% validation samples, and the remainder test samples. The CDD dataset is a public satellite image-pair dataset provided by Lebedev et al. It is obtained through Google Earth as remote sensing images covering the same areas across seasonal change, and contains 11 multispectral image pairs: 7 pairs of 4725 × 2200-pixel seasonally varying images used to create manual label maps, and 4 pairs of 1900 × 1000-pixel images with manually added objects. The dataset consists of multi-source remote sensing images with spatial resolutions of 3 cm to 100 cm per pixel; seasonal variation between the two temporal images is large, and in generating the label maps only the appearance and disappearance of objects is regarded as change, while seasonal variation is ignored.
In the experiment, the original images were cropped and rotated to generate 10,000 training samples, 3,000 validation samples and 3,000 test samples, each 256 × 256 pixels.
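The 20% overlapping slicing used on LEVIR-CD can be sketched as a one-axis offset computation; the edge handling (shifting the last tile back so it ends flush with the image border) is an assumption chosen to avoid the edge information loss mentioned above.

```python
def tile_offsets(size, tile, overlap=0.2):
    """Top-left offsets for cutting a size-pixel axis into tile-pixel
    slices with the given fractional overlap; the last slice is shifted
    back so it ends exactly at the image border."""
    step = int(tile * (1 - overlap))
    offs = list(range(0, size - tile + 1, step))
    if offs[-1] + tile < size:
        offs.append(size - tile)
    return offs
```

For a 1024-pixel axis and 256-pixel tiles this gives offsets [0, 204, 408, 612, 768]; applying it to both axes yields the full slice grid.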
Remote sensing image change detection can essentially be regarded as a pixel-wise binary classification problem, and the common accuracy evaluation indices are Precision (P), Recall (R), F1 score (F1) and Intersection over Union (IOU). A higher precision means the prediction produces fewer false detections and the model is more accurate; a higher recall means the positive samples in the prediction are detected to the greatest extent, making the model more valuable; the F1 score is the harmonic mean of precision and recall, and a larger F1 indicates a better overall prediction; the IOU is the ratio of the intersection to the union of the positive-sample predictions and the ground truth. The expressions are as follows:
P = TP / (TP + FP)  (4)

R = TP / (TP + FN)  (5)

F1 = 2 · P · R / (P + R)  (6)

IOU = TP / (TP + FP + FN)  (7)
where TP represents the number of pixels in which positive samples are predicted as positive samples, FP the number in which negative samples are predicted as positive samples, TN the number in which negative samples are predicted as negative samples, and FN the number in which positive samples are predicted as negative samples. In this experiment, the positive samples are changed pixels and the negative samples are unchanged pixels.
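The four indices follow directly from the confusion counts; a minimal sketch (note that TN does not enter any of the four expressions, matching formulas (4)-(7)):

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and IoU from pixel-level confusion counts.
    tn is accepted for completeness but unused by these four indices."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    iou = tp / (tp + fp + fn)
    return p, r, f1, iou
```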
To better illustrate the effect of the invention, it is compared with the currently mainstream twin-network change detection methods FC-Siam-conc, FC-Siam-diff, DSAMNet and SUNet-CD; every network is trained in the same environment with consistent training parameters. The experiments use the PyTorch deep learning framework under Linux; the hardware environment is an Intel(R) Core(TM) i9-9900K CPU and a GTX 2080 Ti GPU with 11 GB of video memory. During training, Adam is selected as the optimizer, the batch size is 4, the total number of iterations EPOCH is 100, the initial learning rate lr_base is 0.0001, and the learning rate is adjusted with a fixed-step decay strategy whose step_size is 8 and whose decay factor gamma is set to 0.5.
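The fixed-step decay schedule above (mirroring what PyTorch's StepLR scheduler computes) reduces to one expression; sketched in pure Python with the stated hyperparameters as defaults:

```python
def step_lr(epoch, base_lr=1e-4, step_size=8, gamma=0.5):
    """Fixed-step decay: the learning rate is multiplied by gamma
    every step_size epochs, starting from base_lr."""
    return base_lr * gamma ** (epoch // step_size)
```

So the rate is halved every 8 epochs: 1e-4 for epochs 0-7, 5e-5 for epochs 8-15, and so on across the 100-epoch run.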
To compare the change detection results of the methods under complex backgrounds more comprehensively, two typical regions are selected from each of the LEVIR-CD and CDD datasets, denoted regions 1-4. Region 1 contains small, regularly arranged buildings and probes each network's ability to detect small-size ground-object details; region 2 covers large buildings with obvious color changes and probes the networks' ability to discriminate large-size ground objects; region 3 contains vehicles as small target ground objects and probes detection performance under class imbalance; region 4 mainly probes each network's ability to extract linear ground objects.
For region 1, unchanged ground objects with similar spectral characteristics surround the changed region. Figs. 5(a) and 5(b) show the remote sensing images of periods T1 and T2, fig. 5(c) the ground-truth change map for the two periods, and figs. 5(d), 5(e), 5(f) and 5(g) the detection results of the FC-Siam-conc, FC-Siam-diff, DSAMNet and SUNet-CD models, respectively. None of these methods detects the slightly changed building region in the lower-left corner. Because some edges are hidden in shadow, FC-Siam-conc struggles to delineate building boundaries accurately, producing incomplete and adhering edges; FC-Siam-diff exhibits obvious missed detections; DSAMNet and SUNet-CD, which use a deep metric module and a channel attention module respectively, improve on the former two networks but are easily affected by shadows and trees, causing slight false detections. The detection result of the CDASNet adopted by the invention is shown in fig. 5(h): the convolutional attention module effectively distinguishes boundary information, dense connections fuse high- and low-level features, the small-scale change region is detected relatively completely, and the result is clearly superior to the other networks.
For region 2, figs. 6(a) and 6(b) show the remote sensing images of periods T1 and T2, fig. 6(c) the ground-truth change map for the two periods, and figs. 6(d), 6(e), 6(f) and 6(g) the detection results of the FC-Siam-conc, FC-Siam-diff, DSAMNet and SUNet-CD models, respectively. FC-Siam-conc shows obvious false detections; FC-Siam-diff and SUNet-CD misjudge, to different degrees, the grassland around the changed area as changed; DSAMNet extracts deep features and detects the change extent more accurately, but its ability to extract edge detail information is weak. The CDASNet of the invention (fig. 6(h)) detects the region with obvious color change completely and with smoother edges.
For region 3, figs. 7(a) and 7(b) show the remote sensing images of periods T1 and T2, fig. 7(c) the ground-truth change map, and figs. 7(d), 7(e), 7(f), 7(g) and 7(h) the detection results of the FC-Siam-conc, FC-Siam-diff, DSAMNet, SUNet-CD and CDASNet models, respectively. For region 4, figs. 8(a) and 8(b) show the remote sensing images of periods T1 and T2, fig. 8(c) the ground-truth change map, and figs. 8(d), 8(e), 8(f), 8(g) and 8(h) the detection results of the same five models, respectively. From regions 3 and 4 it can be seen that FC-Siam-conc, FC-Siam-diff and DSAMNet detect only a few changed regions, with obvious omissions. SUNet-CD fuses multi-stage features through dense skip connections and a channel integration module, improving detection of small target objects, but because of occlusion by shadows and trees, and especially when the target's texture features resemble unchanged regions, missed and false detections remain, and road detection tends to be broken and incomplete. The CDASNet proposed by the invention effectively improves small-target detection, detects the changed area completely even under shadow, localizes boundaries more accurately, and yields more satisfactory results for linear ground objects.
For quantitative analysis, the evaluation results of each method on the LEVIR-CD and CDD datasets are summarized in Table 1. On LEVIR-CD, FC-Siam-conc and FC-Siam-diff obtain relatively low accuracy, but FC-Siam-conc achieves a higher F1 score than FC-Siam-diff because it fuses the features obtained from the encoding structure into the decoding structure. DSAMNet uses a deep metric module to obtain deep features in the image, and its recall, F1 score and intersection over union surpass those of FC-Siam-conc and FC-Siam-diff. SUNet-CD achieves a precision, recall, F1 score and intersection over union of 90.20%, 87.26%, 88.71% and 79.71% respectively; its precision is the best among all networks, and its remaining results are suboptimal. The CDASNet of the invention achieves the best recall, F1 score and intersection over union in the experiments, with recall and F1 improved by 1.29% and 0.31% over the suboptimal values, while its precision is slightly below the suboptimal value.
On the CDD dataset, FC-Siam-diff and DSAMNet score lower F1 than the other networks, and SUNet-CD reaches suboptimal values on every evaluation index. The precision, recall, F1 score and intersection over union of the CDASNet adopted by the invention are 95.54%, 93.37%, 94.44% and 89.47% respectively, improvements of 0.87%, 1.65%, 1.26% and 2.25% over the suboptimal values.
TABLE 1
[Table 1: evaluation results (precision, recall, F1 score, intersection over union) of each method on the LEVIR-CD and CDD datasets — rendered as an image in the original publication]
To explore how much the dense connections and CBAM each contribute to the network, ablation experiments are performed on the two datasets; the specific results are listed in Table 2. On LEVIR-CD, adding CBAM to the base network improves the F1 score by 0.61%, and adopting dense connections raises it to 89.02%. On CDD, adding CBAM improves the F1 score by 2.49%, and adopting dense connections raises it to 94.44%, 3.89% above the base network. This demonstrates the necessity of both modules.
TABLE 2
[Table 2: ablation results (base, base + CBAM, base + DC, CDASNet) on the LEVIR-CD and CDD datasets — rendered as an image in the original publication]
Figs. 9(a) and 9(b) show remote sensing images of different periods selected for the ablation experiments, fig. 9(c) the corresponding ground-truth change map, and figs. 9(d), 9(e), 9(f) and 9(g) the change detection results of the base, base + CBAM, base + DC and CDASNet models, respectively. Comparison shows that CBAM yields accurate edges in the detection result but fails to detect changes of small ground objects, while DC misjudges more at the edges and tends to produce adhesion, yet accurately identifies small-object changes; the detection details are shown in fig. 9(h). CDASNet combines the advantages of both modules and achieves the best detection effect.
The method described above may be run as a computer program on a computer device to implement the change detection method of the invention.
The invention provides the CDASNet network model and performs remote sensing image change detection based on it: the twin-network input mode better suits the change detection task; a new dense connection scheme fuses information between high- and low-level feature layers, improving the network's ability to extract small targets; and an attention mechanism introduced into the network strengthens attention to changed areas and improves boundary discrimination under complex backgrounds. Experiments on the LEVIR-CD and CDD datasets show that CDASNet adapts to the remote sensing image change detection task under complex backgrounds and outperforms current mainstream change detection networks.

Claims (10)

1. A remote sensing image change detection method is characterized by comprising the following steps:
1) Acquiring remote sensing images to be detected, wherein the remote sensing images are a first remote sensing image and a second remote sensing image respectively, and the first remote sensing image and the second remote sensing image have the same scale;
2) Constructing a change detection model, wherein the change detection model comprises a coding part and a decoding part, the coding part adopts a twin network, the decoding part adopts an up-sampling network, and the coding part and the decoding part adopt an attention module for connection;
the twin network comprises a first network branch and a second network branch, both of which adopt a plurality of layers of residual connection modules for feature extraction; the first network branch performs feature extraction on the first remote sensing image to obtain first remote sensing image features of different scales, and the second network branch performs feature extraction on the second remote sensing image to obtain second remote sensing image features of different scales; first and second remote sensing image features of the same scale are superposed and fused through dense connections between the first network branch and the second network branch to obtain fused features of different scales;
the attention module is used for respectively carrying out feature enhancement processing on the fusion features with different scales; and the decoding part is used for carrying out feature recovery and fusion on the features of different scales after the feature enhancement processing through an up-sampling network so as to obtain a change detection result.
2. The remote sensing image change detection method according to claim 1, wherein the residual connection module includes a first convolution layer, a second convolution layer, a first regularization layer, a second regularization layer, an activation layer, and a superposition module, the first convolution layer is configured to perform convolution operation on input features, output of the first convolution layer is sequentially input to the second convolution layer and the second regularization layer after being processed by the first regularization layer and the activation layer, and is input to the superposition module after being processed by convolution and regularization again, the output of the second regularization layer and output of the first convolution layer are subjected to superposition processing, and the superposed features are output of the residual connection module.
3. The method for detecting the change of the remote sensing image according to claim 1 or 2, wherein the dense connection comprises an upsampling layer and an overlaying layer, the upsampling layer is used for upsampling the output of a residual connecting module on a certain layer in the second network branch, and the overlaying layer is used for overlaying the upsampling result, the output of the residual connecting module on the upper layer in the second network branch and the output of the residual connecting module on the corresponding layer in the first network branch.
4. A method as recited in claim 3, wherein said attention module is a convolution attention module.
5. The remote sensing image change detection method according to claim 4, wherein the first network branch comprises N layers of residual connection modules, the second network branch comprises N+1 layers of residual connection modules, and N ≥ 3; the input of the first-layer residual connection module of the first network branch is the first remote sensing image, and its output is a convolution feature with the same scale as the first remote sensing image; the input of the second-layer residual connection module of the first network branch is the output of the first-layer residual connection module, and its output is a convolution feature at 1/2 the scale of the first remote sensing image; the input of the Nth-layer residual connection module of the first network branch is the output of the (N−1)th-layer residual connection module, and its output is a convolution feature at 1/2^(N−1) the scale of the first remote sensing image; the input of the first-layer residual connection module of the second network branch is the second remote sensing image, and its output is a convolution feature with the same scale as the second remote sensing image; the input of the second-layer residual connection module of the second network branch is the output of the first-layer residual connection module, and its output is a convolution feature at 1/2 the scale of the second remote sensing image; the input of the (N+1)th-layer residual connection module of the second network branch is the output of the Nth-layer residual connection module, and its output is a convolution feature at 1/2^N the scale of the second remote sensing image.
6. The remote sensing image change detection method according to claim 5, wherein there are N dense connections; the input of the upsampling layer of the Nth dense connection is the output of the (N+1)th-layer residual connection module of the second network branch, the superposition layer of the Nth dense connection superposes the output of that upsampling layer, the output of the Nth-layer residual connection module of the first network branch, and the output of the Nth-layer residual connection module of the second network branch, and the superposition result is the output of the Nth dense connection; the input of the upsampling layer of the (N−1)th dense connection is the output of the Nth dense connection, the superposition layer of the (N−1)th dense connection superposes the output of that upsampling layer, the output of the (N−1)th-layer residual connection module of the first network branch, and the output of the (N−1)th-layer residual connection module of the second network branch, and the superposition result is the output of the (N−1)th dense connection.
7. The method for detecting changes in remote sensing images of claim 5, wherein N is 4.
8. The method for detecting changes in remote sensing images according to any one of claims 4-7, wherein the change detection model is trained using a mixed Loss function of weighted cross entropy and Dice Loss.
9. The remote sensing image change detection method according to claim 8, wherein the weighted cross entropy loss function adopted is:

L_wce = −Σ_i [w · y_i · log(p_i) + (1 − y_i) · log(1 − p_i)]

where i denotes a pixel; w is a weight coefficient; y_i and p_i denote, respectively, the label of pixel i and the predicted probability that pixel i belongs to the changed class.
10. The remote sensing image change detection method according to claim 8, wherein the Dice Loss function is:

L_dice = 1 − (2 · |Y ∩ Ŷ|) / (|Y| + |Ŷ|)

where Y represents the true value of the sample and Ŷ represents the predicted value of the sample.
CN202210834324.XA 2022-07-14 2022-07-14 Remote sensing image change detection method Pending CN115937697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210834324.XA CN115937697A (en) 2022-07-14 2022-07-14 Remote sensing image change detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210834324.XA CN115937697A (en) 2022-07-14 2022-07-14 Remote sensing image change detection method

Publications (1)

Publication Number Publication Date
CN115937697A true CN115937697A (en) 2023-04-07

Family

ID=86554554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210834324.XA Pending CN115937697A (en) 2022-07-14 2022-07-14 Remote sensing image change detection method

Country Status (1)

Country Link
CN (1) CN115937697A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116363521A (en) * 2023-06-02 2023-06-30 山东科技大学 Semantic prediction method for remote sensing image
CN116363521B (en) * 2023-06-02 2023-08-18 山东科技大学 Semantic prediction method for remote sensing image
CN117671437A (en) * 2023-10-19 2024-03-08 中国矿业大学(北京) Open stope identification and change detection method based on multitasking convolutional neural network
CN117612025A (en) * 2023-11-23 2024-02-27 国网江苏省电力有限公司扬州供电分公司 Remote sensing image roof recognition method and system based on diffusion model
CN117789042A (en) * 2024-02-28 2024-03-29 中国地质大学(武汉) Road information interpretation method, system and storage medium
CN117789042B (en) * 2024-02-28 2024-05-14 中国地质大学(武汉) Road information interpretation method, system and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination