CN113901900A - Unsupervised change detection method and system for homologous or heterologous remote sensing image - Google Patents


Info

Publication number
CN113901900A
Authority
CN
China
Prior art keywords
feature
network
convolution
difference
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111153270.2A
Other languages
Chinese (zh)
Inventor
唐旭
张华煜
马晶晶
张向荣
焦李成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202111153270.2A priority Critical patent/CN113901900A/en
Publication of CN113901900A publication Critical patent/CN113901900A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised change detection method and system for homologous or heterologous remote sensing images. Homologous or heterologous double-temporal images are input into a twin full convolution feature extraction network to obtain feature difference maps at different scales; a feature fusion network based on meta-learning and grouping convolution is constructed for each scale, and the feature difference maps of different scales are fused to obtain a difference map; the twin full convolution feature extraction network and the feature fusion network are optimized with an improved objective function, and their weight values are iteratively updated by the Adam optimization method, continuously enlarging the numerical difference between changed and unchanged pixels in the difference map during training; finally, the trained difference map is binarized by a threshold to obtain a binary change map of the same size as the original image. The method efficiently and accurately obtains the change detection result of a pair of homologous or heterologous double-temporal remote sensing images in an unsupervised manner.

Description

Unsupervised change detection method and system for homologous or heterologous remote sensing image
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an unsupervised change detection method and system for homologous or heterologous remote sensing images, which can be used to accurately detect the changes between two remote sensing images acquired by the same or different sensors at two different times and to generate the corresponding change detection map.
Background
With the growth in the variety and number of remote sensing satellites and the improvement of Earth observation technology in recent years, people have more opportunities to monitor changes on the Earth's surface from space, and the remote sensing image change detection task has emerged accordingly. This task plays a crucial role in practical applications such as land cover monitoring, disaster management, ecosystem surveillance and urban planning. To enable change detection, satellite platforms in constant operation have provided change detection databases with a large number of multi-temporal remote sensing images, typically of very high spatial resolution. To achieve good detection performance, common supervised or semi-supervised training requires a large amount of manually labeled data, and the manual labeling process is complex and tedious, so it is desirable to complete change detection as accurately as possible under unsupervised conditions. However, because remote sensing images contain rich information and complex spatial details, and the changes between an image pair are unknown, change detection of remote sensing images is highly challenging. Meanwhile, the increase in the types of remote sensing satellites also places a new requirement on change detection tasks, namely change detection for heterogeneous images: when two images of the same location are taken by two different sensors at different times, they are called a heterogeneous image pair. Removing the interference caused by different sensors while accurately identifying the changes between the two images makes remote sensing change detection even more challenging.
Because traditional machine learning methods have good stability and high efficiency, handcrafted features (such as texture, spectral, color and shape features) have been widely applied in the field of change detection. Conventional unsupervised change detection methods are typically built on difference images, and many practical algorithms, such as principal component analysis and slow feature analysis, have been proposed to obtain useful difference images. However, since handcrafted features can hardly describe the information of a remote sensing image fully, the change detection performance of traditional machine learning methods falls short of expectations.
In recent years, the development of deep learning, especially convolutional neural networks, has brought the computer vision field into a new era and greatly advanced remote sensing image processing. Owing to the strong nonlinear fitting capability and hierarchical structure of convolutional neural networks, the learned features capture both high-level semantics and rich spatial context. A series of remote sensing image change detection methods based on convolutional neural networks have been proposed, and their implementation steps are generally as follows: extract depth features through a pre-trained convolutional neural network, combine multi-layer features, compare and select the depth features through a feature selection strategy, and set a threshold to generate the corresponding binary change map. However, current change detection methods still struggle to adequately mine the multi-scale depth features between image pairs, to reasonably select and utilize the extracted features, and to highlight and emphasize the difference information between image pairs.
Disclosure of Invention
The technical problem to be solved by the present invention is to overcome the above defects in the prior art by providing an unsupervised change detection method and system for homologous or heterologous remote sensing images, in which a multi-scale full convolution network is constructed so that the depth feature maps capture the spatial relationships between pixels at different scales; meta-learning improves the full convolution network's grasp of global features and reduces the feature differences between heterologous images; grouping convolution ensures the fusion effect of different features; and an improved objective function further highlights the changed regions in the difference map, thereby improving the accuracy and recall of change detection.
The invention adopts the following technical scheme:
an unsupervised change detection method for homologous or heterologous remote sensing images comprises the following steps:
s1, constructing a twin full convolution feature extraction network, respectively inputting homologous or heterologous double-temporal images into the constructed twin full convolution feature extraction network, and extracting multi-scale semantic features to obtain feature difference maps of different scales;
s2, extracting global features through a global feature sampling network according to the feature difference graphs of different scales obtained in the step S1, and mapping the heterogeneous images to the same feature space; constructing a feature fusion network corresponding to each scale and based on meta-learning and packet convolution, fusing feature difference maps of different scales, and obtaining a difference map by a depth change vector analysis method;
s3, completing optimization of the twin full convolution feature extraction network constructed in the step S1 and the feature fusion network constructed in the step S2 based on meta-learning and grouping convolution by using an improved objective function, and iteratively updating the weight values of the twin full convolution feature extraction network and the feature fusion network by using an Adam optimization method; continuously enlarging the numerical difference between the changed pixels and the unchanged pixels in the difference image obtained in the step S2 in the training process;
and S4, completing the binarization operation of the difference map trained in step S3 through a threshold value to obtain a binary change map with the same size as the original image, thereby completing image change detection.

In step S1, the twin full convolution feature extraction network is a set of dual-branch weight-shared full convolution networks, and each full convolution network is a VGG16 model composed of five convolution modules connected in series.
Specifically, in step S1, inputting the homologous or heterologous double-temporal images specifically comprises:
adjusting the size of the homologous image pair to 640 × 640, standardizing the double-temporal images by subtracting the mean value from each pixel point in the homologous image pair and dividing the result by the standard deviation, and completing the registration of the input homologous image pair by a preprocessing method;

adjusting the size of the heterologous image pair to 640 × 640, standardizing the double-temporal images by subtracting the mean value from each pixel point in the heterologous image pair and dividing the result by the standard deviation, and completing the registration of the two images by a preprocessing method;
and inputting the processed homologous or heterologous images into a twin full convolution feature extraction network and a feature fusion network to obtain corresponding feature difference maps.
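The preprocessing above (resize to 640 × 640, then per-pixel standardization) can be sketched as follows. This is a minimal numpy illustration; the nearest-neighbour resize and the epsilon guard are assumptions, since the text does not name an interpolation method:

```python
import numpy as np

def preprocess(img, size=640):
    """Resize img to size x size and standardize each spectral band.

    Nearest-neighbour resize is a stand-in for the unspecified resize;
    standardization subtracts the per-band mean and divides by the
    per-band standard deviation, as described in the text."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size        # source row for each target row
    cols = np.arange(size) * w // size        # source column for each target column
    img = img[rows][:, cols].astype(np.float64)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True)
    return (img - mean) / (std + 1e-8)        # epsilon avoids division by zero
```

After this step each band of the 640 × 640 image has approximately zero mean and unit variance, which matches the stated goal of making the pixel distribution conform to a common data distribution before training.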
Specifically, in step S2, the feature fusion network based on meta-learning and grouping convolution comprises a global feature sampling network, a feature up-sampling module based on meta-learning, and a feature fusion module based on grouping convolution; the global feature sampling network is a U-shaped network whose two branches do not share weights, with skip connections between the corresponding layers of its encoder and decoder; the meta-learning-based feature up-sampling module comprises a position encoding process and an inverted residual block; the grouping-convolution-based feature fusion module is formed by connecting 4 grouping convolution modules with different convolution kernels in parallel.
Specifically, in step S3, the feature difference maps of different scales obtained in step S2 are passed through a depth change vector analysis network to obtain the change amplitude ρ at the corresponding position of each pixel, and the network parameters of the constructed twin full convolution feature extraction network and the feature fusion network based on meta-learning and grouping convolution are updated with the improved loss function, so that the change amplitude ρ continuously approaches 0 or 1 and an optimal difference map is obtained.
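As an illustration of how a per-pixel change amplitude ρ might be derived from a fused difference feature map, the sketch below takes the per-pixel L2 norm over channels and min-max scales it into [0, 1]. The exact form of the depth change vector analysis network is not given in the text, so this specific reduction is an assumption:

```python
import numpy as np

def change_magnitude(diff):
    """Per-pixel change amplitude rho in [0, 1] from a difference
    feature map of shape (H, W, C): L2 norm over the channel axis,
    then min-max normalization."""
    rho = np.sqrt((diff.astype(np.float64) ** 2).sum(axis=2))
    return (rho - rho.min()) / (rho.max() - rho.min() + 1e-12)
```

Pixels with large difference vectors then map near 1 (likely changed) and pixels with small difference vectors map near 0 (likely unchanged), which is the quantity the improved loss function operates on.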
Specifically, in step S3, the step of iteratively updating the network weight value by using the Adam optimization method specifically includes:
inputting a pair of homologous or heterologous double-temporal images into the built twin full convolution feature extraction network and the feature fusion network based on meta-learning and grouping convolution, and updating the weight values of the two networks; then repeatedly inputting the homologous or heterologous double-temporal image pair into the networks and updating the loss value of the loss function after each weight update.
Further, the loss function loss is as follows:

$$\mathrm{loss}=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\rho_{i,j}^{2}, & \rho_{i,j}<\tau_{local}(i,j)\\(1-\rho_{i,j})^{2}, & \rho_{i,j}\geq\tau_{local}(i,j)\end{cases}$$

wherein τ_local(i, j) is the local adaptive threshold, H is the height of the difference map, W is the width of the difference map, and ρ_{i,j} is the value of the difference map at position (i, j), i.e., the change amplitude of the pixel at position (i, j).
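A minimal numpy sketch of one objective consistent with the description — pixels whose change amplitude falls below the local adaptive threshold τ_local are pulled toward 0 and pixels above it are pulled toward 1. The original formula appears only as an embedded figure, so this specific piecewise form is an assumption:

```python
import numpy as np

def improved_loss(rho, tau_local):
    """Assumed form of the improved objective: below-threshold pixels are
    penalized by rho^2 (driven to 0), above-threshold pixels by
    (1 - rho)^2 (driven to 1), averaged over the H x W difference map."""
    H, W = rho.shape
    per_pixel = np.where(rho < tau_local, rho ** 2, (1.0 - rho) ** 2)
    return per_pixel.sum() / (H * W)
```

Minimizing this quantity widens the gap between changed and unchanged pixels, which is the stated purpose of the improved objective in step S3.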
Further, the updated weight value W_new is:

$$W_{new}=W-L\cdot\frac{\partial\,\mathrm{loss}}{\partial W}$$

wherein W is the initial weight value of the network to be trained, L is the learning rate used in training, and ∂loss/∂W denotes the partial derivative of the loss with respect to the weights.
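The weight update can be illustrated with a single hand-written Adam step in numpy, combining the gradient-descent update above with Adam's first- and second-moment estimates. The hyperparameter defaults (learning rate, β₁, β₂, ε) are the usual Adam defaults, assumed here rather than taken from the text:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for weights w at step t (t starts at 1):
    exponential moving averages of the gradient and its square,
    bias-corrected, then a scaled descent step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    w_new = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w_new, m, v
```

Because the step size is normalized by the second-moment estimate, the update behaves well under the sparse or noisy gradients mentioned later in the text.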
Specifically, in step S4, the binarization process of the difference map is expressed as:

$$CM(x,y)=\begin{cases}1, & \rho_{x,y}>\tau_{N}\\0, & \rho_{x,y}\leq\tau_{N}\end{cases}$$

wherein τ_N is the threshold used by the improved loss function to distinguish changed and unchanged regions during the last training iteration, and CM(x, y) is the expected binary change map.
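The binarization reduces to a per-pixel threshold test, sketched below (the function name `binarize` is illustrative):

```python
import numpy as np

def binarize(rho, tau_N):
    """Binary change map CM: 1 where the change amplitude rho exceeds the
    threshold tau_N from the last training iteration, 0 elsewhere."""
    return (rho > tau_N).astype(np.uint8)
```

The result has the same height and width as the difference map, and therefore the same size as the original (resized) image pair.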
Another technical solution of the present invention is an unsupervised change detection system for homologous or heterologous remote sensing images, comprising:
the feature module is used for constructing a twin full convolution feature extraction network, inputting the homologous or heterologous double-temporal images into the constructed twin full convolution feature extraction network respectively, and extracting multi-scale semantic features to obtain feature difference maps of different scales;
the fusion module extracts global features through a global feature sampling network according to the feature difference graphs of different scales obtained by the feature module and maps the heterogeneous images into the same feature space; constructing a feature fusion network corresponding to each scale and based on meta-learning and packet convolution, fusing feature difference maps of different scales, and obtaining a difference map by a depth change vector analysis method;
the optimization module is used for completing the optimization of the twin full convolution feature extraction network constructed by the feature module and the feature fusion network of the fusion module based on meta-learning and grouping convolution by using an improved objective function, and iteratively updating the weight values of the twin full convolution feature extraction network and the feature fusion network by using an Adam optimization method; continuously enlarging the numerical difference between the variable pixels and the invariable pixels in the difference image obtained by the fusion module in the training process;
and the detection module finishes binarization operation on the difference image trained by the optimization module through a threshold value to obtain a binary change image with the same size as the original image, and finishes image change detection.
Compared with the prior art, the invention has at least the following beneficial effects:
the unsupervised change detection method for the homologous or heterologous remote sensing image can fully mine and reserve the depth characteristics of the remote sensing image for the homologous and heterologous images, effectively fuse the multi-scale characteristics, organically combine the space and the inter-spectrum characteristics in the characteristic image, and finally effectively highlight the change area in the difference image through the improved objective function to finish accurate change detection. For the heterogeneous image pair, the data difference caused by different sensors can be effectively reduced, so that the change detection effect of the heterogeneous image is effectively ensured.
Furthermore, the invention uses the twin full convolution feature extraction network as the feature extraction unit of the method; through multiple convolution layers it extracts multi-scale depth feature information from the double-temporal images to be detected, thereby obtaining difference feature maps containing feature information at different scales and providing effective data support for the subsequent change detection task.
Furthermore, the homologous image pair used by the invention is a pair of double-temporal multispectral high-resolution satellite images of the Montpellier area provided in the Onera Satellite Change Detection public data set. The heterologous image pair used by the invention is a pair of two-time-phase synthetic aperture radar (SAR)/optical satellite images of the Wuhan region of China. Resizing them to 640 × 640 ensures size consistency for the subsequent convolution and deconvolution operations. The normalization operation makes the pixel value distribution of the images conform to the expected data distribution, so a good generalization effect is more easily obtained after training.
Furthermore, the meta-learning-based up-sampling module fully fuses the global features of the images to guide the generation of the convolution weights in the up-sampling convolution layer, effectively avoiding information loss during up-sampling. The grouping-convolution-based feature fusion module fully fuses the up-sampled features with the original features at the same scale, further avoiding feature loss during the up-sampling of the full convolution network. Meanwhile, the feature fusion module exploits the relationships among the feature map channels, introducing inter-spectrum relationships into the generation of the final difference map and improving the final change detection effect. The meta-learning-based up-sampling module also uses an inverted residual block, which distributes the obtained global features to each meta-block generated according to position and guides the generation of the weight parameters through a dynamic block convolution function, so that the difference feature map better grasps the spatial relationships among its pixels and up-sampling of the deep network is better realized. The grouping-convolution-based feature fusion module uses grouping convolutions with different kernel sizes, which better complete the feature fusion between different feature difference maps and capture the features between them.
Furthermore, a depth change vector analysis network converts the difference feature maps of different scales into a difference map representing the change amplitude at the position of each pixel, so that the required change detection map can be obtained by operating directly on the difference map. A new objective function for change detection is proposed, which emphasizes the intra-class similarity within the changed and unchanged regions while continuously enlarging the inter-class difference between them, thereby highlighting the changed regions, ensuring the final change detection effect and improving the robustness of the detection process.
Furthermore, the Adam optimization algorithm is used to update the network weight values. It comprehensively considers the first- and second-moment estimates of the gradient, is well suited to problems with sparse or noisy gradients and non-stationary objective functions, and performs well in the unsupervised setting.
Further, the parameters are continuously updated using the improved loss function to obtain a weight value W_new that better generates the image difference map, thereby ensuring the final change detection accuracy.
Further, the invention uses the threshold from the last application of the improved loss function to carry out the binarization of the difference map. Because of the improved loss function and the updatability of its threshold, the changed and unchanged regions can be better distinguished, and the resulting binary change map well reflects the changes between the two images of the double-temporal pair.
In conclusion, the method and the device can efficiently and accurately obtain the change detection result of the pair of homologous or heterologous double-temporal remote sensing images in an unsupervised mode.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a meta-learning upsampling module of the present invention;
FIG. 3 is a block diagram of a packet convolution module according to the present invention;
FIG. 4 is a schematic diagram of the overall change detection network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be understood that the terms "comprises" and/or "comprising" indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides an unsupervised change detection method for homologous or heterologous remote sensing images: a homologous or heterologous double-temporal image pair is input; a full convolution feature extraction network is constructed; a feature fusion network based on meta-learning and grouping convolution is constructed; training of the whole network is completed through an improved objective function; finally, a binary change map corresponding to the input image pair is generated.
The invention utilizes the meta-learning module to fuse the global features in the images and reduce the feature difference between the pair of heterogeneous images, and simultaneously utilizes the feature fusion module based on the grouping convolution to fully fuse the features with different scales and introduce the inter-spectrum relationship for the difference feature map. The invention also uses the improved objective function to further highlight the sample characteristics of the variable region and the invariable region, thereby ensuring the reliability of the training process. The modules act together, so that the final detection result of the method has strong robustness and high detection accuracy, and the method can be well suitable for homologous or heterologous change detection tasks.
Referring to fig. 1, the unsupervised change detection method for homologous or heterologous remote sensing images of the present invention includes the following steps:
s1, constructing a twin full convolution feature extraction network;
the twin full convolution feature extraction network is a group of dual-branch weight sharing full convolution networks, wherein each full convolution network is composed of a VGG16 model formed by five convolution modules connected in series.
The sizes of the convolution kernels in the VGG16 model are all set to 3 × 3 pixels, the strides are all set to 1 × 1 pixel, and the padding is set to 1 pixel; the model adopts weights pre-trained on a building extraction data set as its initial values.
In the network parameters of VGG16, the numbers of input feature maps (channels) of the second, third, fourth and fifth convolution modules are 64, 128, 256 and 512 in turn, and the numbers of output feature maps are 128, 256, 512 and 512. The modules are connected by average pooling layers, which reduce the feature map size and enlarge the receptive field of the convolution kernels, so that the network extracts richer multi-scale information.
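The weight-sharing idea of the twin feature extractor can be illustrated with a toy stand-in. `toy_extract` below is an average-pooling pyramid, not the pretrained VGG16 branch described above; it only shows how the SAME parameters are applied to both temporal images and how per-scale difference maps arise:

```python
import numpy as np

def toy_extract(img, levels=3):
    """Stand-in for one branch of the twin network: an average-pooling
    pyramid yielding feature maps at successively halved scales."""
    feats, f = [], img.astype(np.float64)
    for _ in range(levels):
        feats.append(f)
        h, w = f.shape[:2]
        # 2x2 average pooling (crop odd edges first)
        f = f[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
    return feats

def siamese_differences(extract, img1, img2):
    """Weight sharing: the same extractor (same parameters) is applied to
    both temporal images; per-scale absolute differences follow."""
    return [np.abs(a - b) for a, b in zip(extract(img1), extract(img2))]
```

In the actual method each scale's difference map then enters the meta-learning and grouping-convolution fusion stage; here the pyramid merely demonstrates the multi-scale, dual-branch structure.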
Inputting a homologous or heterologous two-phase image pair;
for the used image pair, the images are respectively input into a twin full convolution feature extraction network built in the preceding text and a feature fusion network based on meta learning and grouping convolution through certain preprocessing, and the change result of the image pair is finally and better detected through continuous training.
S101, the homologous image pair data adopts the double-temporal images of the Montpellier area provided in the Onera Satellite Change Detection (OSCD) public data set: a pair of multispectral satellite images taken of the Montpellier area by the Sentinel-2 satellite on August 12, 2015 and October 30, 2017, with a spatial resolution of 10 meters/pixel and a size of 451 × 426 pixels. The data set provides a reference image that represents the change between the two periods.
The method uses four spectral bands of red, green, blue and near infrared in the multispectral as research bands.
The pair of images is resized to 640 x 640, the dual-temporal images are normalized by subtracting the mean value from each pixel point in the image pair and dividing the result by the standard deviation, and the registration of the input homologous image pair is completed by using a related preprocessing method.
S102, a pair of synthetic aperture radar (SAR)/optical satellite images of the Wuhan region of China is adopted as the heterologous image data. The SAR image was taken by Radarsat-2 in C band in June 2008; the optical image consists of red, green and blue band data obtained from *** earth in November 2011; both images are 495 × 503 pixels. In this data set, the main changes in the scene are changes in buildings and roads. The data set provides a reference image that represents the change between the two periods.
The pair of images is resized to 640 x 640, the dual-phase images are normalized by subtracting the mean value from each pixel point in the image pair and dividing the result by the standard deviation, and the registration of the input pair of heterogeneous images is completed by using a relevant preprocessing method.
The processed homologous or heterologous images are input into the twin full convolution feature extraction network built in S1 and the feature fusion network based on meta-learning and grouping convolution built in S2 to obtain the corresponding difference map.
S2, constructing a feature fusion network based on meta-learning and packet convolution;
a feature fusion network which can eliminate the feature difference of different source images and can fully fuse features of different scales is constructed. In order to achieve the purpose, the invention constructs a feature fusion network based on meta-learning and packet convolution. The feature fusion network is composed of a global feature sampling network, a feature up-sampling model based on a meta-learning module and a feature fusion model 3 based on a grouping convolution module, and is shown in fig. 3.
Firstly, the feature maps output by the two branches of the twin fully convolutional feature extraction network are input into two global feature sampling networks with the same structure but independent parameters. The global feature sampling networks capture the global features of the images of the two time periods, and the extracted global features are used to guide the generation of feature weights in the up-sampling stage of the twin fully convolutional feature extraction network. In addition, for heterogeneous images, the twin fully convolutional feature extraction network and the global feature sampling network further preprocess the input bi-temporal images, and the two images are finally classified by a classifier; in this way the feature difference between the two images is eliminated, yielding features on which change detection can be performed in the same feature space.
Next, for the feature maps of the two time periods obtained by the different convolution module layers of the twin fully convolutional feature extraction network, feature difference maps of the corresponding layers are obtained by pixel-wise differencing. Under the guidance of the global feature sampling network, weight parameters corresponding to different positions are provided for the difference feature map obtained at each layer, and a difference feature map containing the relationship between global features and positions is obtained through the meta-learning module.
The up-sampled difference feature map is then concatenated with the difference map of the current layer, and grouped convolution modules are used so that grouped convolutions of different kernel sizes capture changed-region features of different shapes, take the relationships among feature channels into account, and endow the fused difference features with inter-spectral relationships.
Five difference feature maps, corresponding to the five convolution module scales of the twin fully convolutional feature extraction network, are obtained through feature up-sampling based on the meta-learning module and feature fusion based on the grouped convolution module; they are input into the depth change vector analysis module to obtain the final difference map.
Each module in the feature fusion network will be described in detail with reference to the block diagram.
Referring to fig. 2 and 4, the structure of the global feature sampling network and the meta-learning module will be further described.
First, the global feature sampling network is introduced, as shown in fig. 4. The global feature sampling network is similar to a U-Net and uses skip connections between the corresponding layers of its encoder and decoder.
In the encoder of the global feature sampling network, 2 convolutional layers are used to obtain 2 feature maps of different scales, and global average pooling is applied after the last layer of the encoder to obtain a global feature of spatial size 1 × 1. The first convolutional layer contains a convolution operation and a downsampling operation: the convolution kernel size is set to 1 × 1 pixel, the stride to 1 × 1 pixel and the padding to 0 pixels, and an average pooling layer performs the downsampling. The second convolutional layer contains a convolution whose kernel size is 2 × 2 pixels, stride 2 × 2 pixels and padding 0 pixels. The number of channels of every feature map in the encoder is 512, the same as the input feature map.
In the decoder of the global feature sampling network, the global feature is up-sampled by bilinear interpolation to obtain a feature map of the same size as the output of the second convolutional layer. The two are concatenated and fused by a convolution with kernel size 1 × 1 pixel, stride 1 × 1 pixel and padding 0 pixels. Then 2 up-sampling steps are performed using deconvolution operations; after each one, the result is concatenated with the encoder output features of the corresponding size and fused by a convolution with kernel size 1 × 1 pixel, stride 1 × 1 pixel and padding 0 pixels. Finally, the two global feature sampling networks with the same structure output feature maps that have the same size as the 5th-layer output of the twin fully convolutional feature extraction network and are fused with global features. A feature difference map containing global features is obtained by subtracting the two feature maps.
For heterogeneous image pairs, pre-training is performed with the encoders of the twin fully convolutional feature extraction network and the global feature sampling network to eliminate the feature differences between the heterogeneous images. The global feature of spatial size 1 × 1 obtained after global average pooling in the encoder of the global feature sampling network is passed through a convolution with kernel size 1 × 1 pixel, stride 1 × 1 pixel and padding 0 pixels for classification; the number of classes is 2, and the classification identifies whether the feature maps of the two heterogeneous images lie in the same feature space.
Referring to fig. 2, the structure of the upsampling module based on meta-learning will be described.
The main flow of the up-sampling module based on meta-learning comprises a position encoding process and an inverted residual block. It mainly uses the feature map containing global information obtained by the global feature sampling network to endow each layer of difference features with global features and to guide the up-sampling process.
First, a position encoding process is applied to each layer of difference features to ensure that the subsequent stages can assign different weights to different positions. The position encoding process is shown by the following formula:
p(i, j) = (i / H, j / W)

wherein p(i, j) represents the position code of the difference feature map at (i, j), and H and W are the height and width of the current difference feature map, respectively. The position code and the difference feature map are then jointly input into the inverted residual block part of the module.
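The exact position-encoding formula is not preserved in the published text; the sketch below assumes the common normalized-coordinate form p(i, j) = (i/H, j/W):

```python
import numpy as np

def position_code(h, w):
    # Normalized (i/H, j/W) coordinate pair for every position of an
    # H x W difference feature map, stacked as a (2, H, W) array.
    i = np.arange(h)[:, None] / h   # row coordinate, broadcast over columns
    j = np.arange(w)[None, :] / w   # column coordinate, broadcast over rows
    return np.stack(np.broadcast_arrays(i, j), axis=0)
```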
The inverted residual block is a convolution structure consisting of two ordinary convolutions and one depthwise convolution. First, a convolution with kernel size 1 × 1 pixel, stride 1 × 1 pixel and padding 0 pixels enlarges the dimension of the input feature map, followed by a batch normalization layer and a ReLU activation function; then a depthwise convolution with kernel size 3 × 3 pixels, stride 1 × 1 pixel and padding 1 pixel captures more comprehensive feature information without excessively increasing the number of network parameters, again followed by a batch normalization layer and a ReLU activation function; finally, a convolution with kernel size 1 × 1 pixel, stride 1 × 1 pixel and padding 0 pixels restores the feature map to its input dimension.
In the inverted residual blocks used in the invention, the input feature dimensions are 64, 128, 256, 512 from layer 1 to layer 5, respectively, and the expanded feature dimensions are 128, 256, 512, 1024, respectively.
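The following sketch counts the weights of one inverted residual block (biases and batch-normalization parameters omitted; the dimension pairs follow the description above) and compares it with an ordinary 3 × 3 convolution at the expanded width, illustrating why the depthwise convolution avoids an excessive parameter increase:

```python
def inverted_residual_params(c_in, c_mid):
    # 1x1 expansion, 3x3 depthwise, 1x1 projection
    expand = c_in * c_mid        # 1x1 kernels: c_in -> c_mid
    depthwise = c_mid * 3 * 3    # one 3x3 kernel per channel
    project = c_mid * c_in       # 1x1 kernels: c_mid -> c_in
    return expand + depthwise + project

def plain_conv_params(c_mid):
    # an ordinary 3x3 convolution at the expanded width, for comparison
    return c_mid * c_mid * 3 * 3

# dimension pairs taken from the description above
for c_in, c_mid in [(64, 128), (128, 256), (256, 512), (512, 1024)]:
    print(c_in, c_mid, inverted_residual_params(c_in, c_mid), plain_conv_params(c_mid))
```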
The convolution weights in the inverted residual block are generated under the guidance of the feature map containing global information; this is the meta-learning process. First, the channels of the feature map containing global information are divided into several parts, allocated according to the number of weights required by each meta block (5 blocks in total in the invention). The global features allocated to each meta block are then used to guide the generation of the weight parameters through a dynamic block convolution function, whose operation is shown as follows:
O(i, j) = X(i, j) ⊙ θ(i, j)

wherein X(i, j) represents the input difference feature map, θ(i, j) the global feature, and O(i, j) the resulting weight of the inverted residual block, with ⊙ denoting position-by-position (element-wise) multiplication. Owing to the position code obtained earlier, this position-by-position multiplication in the dynamic block convolution endows the resulting weights with positional relationships, so that the feature map produced by the inverted residual block better captures the relationships among the pixels in the difference feature map space.
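The position-by-position multiplication of the dynamic block convolution can be sketched as follows (θ stands for a global-feature map already resampled to a shape broadcastable to the difference features; the shapes are illustrative):

```python
import numpy as np

def dynamic_weight(x, theta):
    # O(i, j) = X(i, j) * theta(i, j): the difference features x (C, H, W)
    # are multiplied position by position with the global features theta.
    return x * np.broadcast_to(theta, x.shape)
```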
Referring to fig. 3, the structure of the grouped convolution module is further described.
The grouped convolution module consists of 4 grouped convolutions with different kernel sizes; its main structure comprises a feature concatenation part and a grouped-convolution feature fusion part. The feature concatenation part fuses the up-sampled difference feature map with the feature map of the current layer, i.e., connects them channel by channel to obtain a feature map with 2C channels. This is then input into the grouped-convolution feature fusion part, which consists of four independent grouped convolutions; each grouped convolution is divided into C groups, i.e., a dedicated convolution kernel is provided for each group of channels in the feature map (the two channels of each group coming from the two different feature maps), so that the features between the feature maps can be better captured. In addition, since the position and extent of changed areas vary greatly in the change detection task, the kernel sizes of the grouped convolutions differ so as to better capture changed areas of different sizes and shapes. The 4 kernel sizes are 1 × 1, 3 × 1, 1 × 3 and 3 × 3, finally yielding 4 feature maps of the same size as the input feature map with 2C channels. The feature maps obtained by the 4 grouped convolutions are then concatenated, and a fused feature map with C channels is obtained through 1 convolution with kernel size 1 × 1, completing the channel fusion of the feature difference map at the corresponding scale. C is 64, 128, 256, 512 in the different layers from layer 1 to layer 5, respectively.
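The channel pairing of the grouped convolution can be sketched with 1 × 1 kernels only (the weights w are hypothetical; the real module also uses 3 × 1, 1 × 3 and 3 × 3 kernels): each of the C groups mixes one channel from each of the two concatenated feature maps:

```python
import numpy as np

def grouped_fuse_1x1(a, b, w):
    # a, b: (C, H, W) feature maps to fuse; w: (C, 2) per-group 1x1 weights.
    # Group g sees channel g of each input map, mimicking a grouped
    # convolution with C groups over the concatenated 2C channels.
    stacked = np.stack([a, b], axis=1)            # (C, 2, H, W)
    return np.einsum("cg,cghw->chw", w, stacked)  # (C, H, W)
```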
S3, improving the objective function to complete network optimization
In order to optimize the twin fully convolutional feature extraction network and the global feature sampling network built in steps S1 and S2, a difference map with values in the range [0, 1] is first generated from the feature difference maps by the depth change vector analysis method. The loss function designed by the invention is then optimized together with the common Adam algorithm, so that the numerical difference between changed and unchanged pixels in the difference map is enlarged to the greatest extent, better supporting the generation of the binary change map in S5.
Through the constructed twin fully convolutional feature extraction network and global feature sampling network, the homologous or heterologous image pair yields difference feature maps at 5 different scales, and through the depth change vector analysis network the change magnitude at each pixel position can be obtained, as shown in the following formula:
ρ = sqrt( Σ_d g_d² )

wherein ρ represents the change magnitude of the feature map (a large ρ indicates a high degree of change and a small ρ a low degree of change), and g_d is the d-th channel of the difference feature map obtained by the network. Through this operation, a change map of dimension 1 is obtained, in which the value of each pixel represents the change magnitude at that position.
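The change magnitude can be computed as the per-pixel L2 norm over the channels of the difference feature map, a minimal sketch of the depth change vector analysis step above:

```python
import numpy as np

def change_magnitude(g):
    # g: difference feature map (D, H, W); rho is the per-pixel L2 norm
    # over the D feature channels, giving an (H, W) change map.
    return np.sqrt((g ** 2).sum(axis=0))
```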
For a general method, after obtaining the change map, the change degrees can be divided into 2 classes by a specific threshold selection method, giving a binary change map and completing change detection. In the invention, in order to further highlight the changed regions and obtain more accurate region boundaries, an improved loss function is proposed, so that the twin fully convolutional feature extraction network and the feature fusion network based on meta-learning and grouped convolution constructed in steps S1 and S2 are continuously updated during training and better capture the features of the input image pair.
The formula for the loss function is as follows:
Loss = −(1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} | ρ(i, j) − τ_local(i, j) |

wherein H is the height of the difference map, W is the width of the difference map, ρ(i, j) is the value of the difference map at position (i, j), i.e., the change magnitude of the pixel there, and τ_local(i, j) is a locally adaptive threshold that captures strong local changes in the image and highlights the boundaries of the change map well; minimizing this loss pushes each pixel value away from its local threshold, enlarging the gap between changed and unchanged pixels. Low-pass filtering is used to determine the locally adaptive threshold: Gaussian filtering is chosen, i.e., a context-dependent threshold is generated for each pixel as a weighted average over that pixel's neighborhood.
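The locally adaptive threshold can be sketched as a Gaussian-weighted neighborhood average (a separable Gaussian with reflected borders; the kernel radius of 3σ and the σ value are illustrative choices, not stated in the patent):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def local_threshold(rho, sigma=2.0):
    # tau_local: weighted average over each pixel's neighborhood,
    # computed with a separable Gaussian filter.
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(rho, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
```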
Next, the Adam optimization algorithm is used to iteratively update the network weight values, as follows:

Step one, a pair of homologous or heterologous bi-temporal images is input into the constructed twin fully convolutional feature extraction network and the feature fusion network based on meta-learning and grouped convolution, and the weight values of both networks are updated by:
W_new = W − L × ∂Loss / ∂W

wherein W_new is the updated weight value, W is the initial weight value of the network, L is the learning rate used in training, with values in the range [0.00001, 0.001], × represents multiplication, and ∂/∂W represents the partial derivative operation;
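The plain update above, and the standard Adam rule the invention names (shown here in its textbook form, not taken from the patent text), can be sketched as:

```python
import numpy as np

def gradient_step(w, grad, lr):
    # W_new = W - L * dLoss/dW
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps exponential moving averages of the gradient (m) and its
    # square (v), with bias correction at step t (t >= 1).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```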
Step two, the homologous or heterologous bi-temporal image pair is repeatedly input into the twin fully convolutional feature extraction network and the feature fusion network based on meta-learning and grouped convolution, and the loss value of the loss function is updated after each weight update.
S5, generating a binary change map
Through continuous optimization with the improved objective function proposed in step S4, the difference map obtained by the depth change vector analysis method is continuously updated, finally yielding a difference map of the same size as the input image. Binarization of the difference map is completed using the threshold from the last application of the improved loss function; the binarization process is expressed as:
CM(x, y) = 1 if ρ(x, y) > τ_N, and CM(x, y) = 0 otherwise

wherein τ_N is the threshold used in the improved loss function during the last training iteration to distinguish changed from unchanged regions, and CM(x, y) is the expected binary change map, with 0 representing unchanged regions and 1 representing changed regions. Through the above process, a binary change map of size 640 × 640 is obtained; it is restored to the original size of the images in the data set to give the final binary change map.
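The binarization step reduces to a single threshold comparison:

```python
import numpy as np

def binarize(rho, tau):
    # CM = 1 where the change magnitude exceeds the threshold, else 0
    return (rho > tau).astype(np.uint8)
```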
In another embodiment of the invention, an unsupervised change detection system for homologous or heterologous remote sensing images is provided, which can be used for implementing the unsupervised change detection method for homologous or heterologous remote sensing images.
The feature module is used for constructing a twin full convolution feature extraction network, inputting homologous or heterologous double-temporal images into the constructed twin full convolution feature extraction network respectively, and extracting multi-scale semantic features to obtain feature difference maps of different scales;
the fusion module extracts global features through the global feature sampling network from the feature difference maps of different scales obtained by the feature module and maps the images from different sources into the same feature space; it constructs, for each scale, a feature fusion network based on meta-learning and grouped convolution, fuses the feature difference maps of different scales, and obtains a difference map by the depth change vector analysis method;

the optimization module completes the optimization of the twin fully convolutional feature extraction network constructed by the feature module and the meta-learning and grouped-convolution-based feature fusion network of the fusion module using the improved objective function, iteratively updating the weight values of both networks with the Adam optimization method, and continuously enlarges the numerical difference between changed and unchanged pixels in the difference map obtained by the fusion module during training;

and the detection module completes the binarization of the difference map trained by the optimization module through a threshold to obtain a binary change map of the same size as the original image, completing image change detection.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored by the computer storage medium. The processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor provided by the embodiment of the invention can be used for the operation of the unsupervised change detection method for homologous or heterologous remote sensing images, comprising the following steps:
constructing a twin fully convolutional feature extraction network, inputting homologous or heterologous bi-temporal images into it respectively, and extracting multi-scale semantic features to obtain feature difference maps of different scales; extracting global features through a global feature sampling network from the feature difference maps of different scales and mapping images from different sources into the same feature space; constructing, for each scale, a feature fusion network based on meta-learning and grouped convolution, fusing the feature difference maps of different scales, and obtaining a difference map by the depth change vector analysis method; optimizing the twin fully convolutional feature extraction network and the feature fusion network based on meta-learning and grouped convolution with the improved objective function, iteratively updating their weight values with the Adam optimization method, and continuously enlarging the numerical difference between changed and unchanged pixels in the difference map during training; and completing the binarization of the trained difference map through a threshold to obtain a binary change map of the same size as the original image, completing image change detection.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory.
One or more instructions stored in the computer-readable storage medium can be loaded and executed by the processor to implement the corresponding steps of the unsupervised change detection method for the homologous or heterologous remote sensing images in the above embodiments; one or more instructions in the computer-readable storage medium are loaded by the processor and perform the steps of:
constructing a twin fully convolutional feature extraction network, inputting homologous or heterologous bi-temporal images into it respectively, and extracting multi-scale semantic features to obtain feature difference maps of different scales; extracting global features through a global feature sampling network from the feature difference maps of different scales and mapping images from different sources into the same feature space; constructing, for each scale, a feature fusion network based on meta-learning and grouped convolution, fusing the feature difference maps of different scales, and obtaining a difference map by the depth change vector analysis method; optimizing the twin fully convolutional feature extraction network and the feature fusion network based on meta-learning and grouped convolution with the improved objective function, iteratively updating their weight values with the Adam optimization method, and continuously enlarging the numerical difference between changed and unchanged pixels in the difference map during training; and completing the binarization of the trained difference map through a threshold to obtain a binary change map of the same size as the original image, completing image change detection.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions
The hardware platform of the simulation experiments of the invention is: an Intel(R) Core(TM) i7-8700X CPU with a main frequency of 3.2 GHz, 64 GB of memory, and an NVIDIA 2080 Ti GPU.
The software platform of the simulation experiment of the invention is as follows: ubuntu operating system and python 3.6.
2. Simulation content and result analysis
The simulation experiments of the invention measure the detection capability of the constructed unsupervised remote sensing image change detection method based on meta-learning and grouped convolution on homologous and heterogeneous images.
For the homologous bi-temporal image pair, the method provided by the invention and five prior-art methods (the robust change vector analysis method, the principal component analysis network method, the symmetric coupling convolution network method, the depth change vector analysis method and the deep belief network method) are each trained unsupervised on the training image pair. After training, change detection is performed on the image pair to obtain a binary change map for this group of images.
For the homologous images, the simulation experiment uses a set of multispectral satellite images of the Montpellier region from the Onera Satellite Change Detection (OSCD) data set as the bi-temporal images. The multispectral images were captured by the Sentinel-2 satellite over the Montpellier area on 12 August 2015 and 30 October 2017, respectively, with a spatial resolution of 10 m/pixel and a size of 451 × 426 pixels. The images in the data set have 13 spectral channels; in the experiment, the four spectral bands red, green, blue and near infrared are selected as the study bands. The image pair provides a reference image representing the change between the two periods, but it does not participate in the training process and is used only in the final evaluation of each method's change detection result.
For the heterogeneous bi-temporal image pair, the method provided by the invention and three prior-art methods (the multi-modal Markov model method, the symmetric coupling convolution network method and the patch similarity graph matrix-based method) are each trained unsupervised on the training image pair. After training, change detection is performed on the image pair to obtain a binary change map for this group of images.
For the heterogeneous images, the simulation experiment uses a pair of Synthetic Aperture Radar (SAR)/optical satellite images captured over the Wuhan area of China. The SAR image was captured by Radarsat-2 in C band in June 2008; the optical image consists of the red, green and blue bands obtained from *** earth in November 2011; both images are 495 × 503 pixels. In this dataset, the main changes in the scene are changes in buildings and roads. The data set provides a reference image that represents the change between the two periods, but it does not participate in the training process and is used only in the final evaluation of each method's change detection result.
In the simulation experiments on homologous image pairs, the five prior arts adopted refer to:
the Robust change vector analysis method in the prior art refers to a remote sensing image change detection method proposed in a paper "Robust Change Vector Analysis (RCVA) for multi-sensor very high resolution optical satellite" (int.j.appl.ear. observer. geooil. 2016) published by Thonfeld et al.
The principal component analysis network method refers to the remote sensing image change detection method proposed by Gao et al. in the paper "Automatic change detection in synthetic aperture radar images based on PCANet" (IEEE Geosci. Remote Sens. Lett., 2016), referred to as the principal component analysis network method for short.
The symmetric coupling convolution network method refers to the remote sensing image change detection method proposed by Liu et al. in the paper "A deep convolutional coupling network for change detection based on heterogeneous optical and radar images" (IEEE Trans. Neural Netw. Learn. Syst., 2018).
The depth change vector analysis method refers to the remote sensing image change detection method proposed by Saha et al. in the paper "Unsupervised deep change vector analysis for multiple-change detection in VHR images" (IEEE Trans. Geosci. Remote Sens., 2019); based on deep learning, it detects the changes occurring in an image pair through direct deep feature comparison on the result of change vector analysis, and is referred to as the depth change vector analysis method for short.
The deep belief network method refers to the paper "Change detection in synthetic aperture radar images based on deep neural networks" (IEEE Trans. Neural Netw. Learn. Syst., 2016) published by Gong et al.; the method builds a model that predicts the label of each spatial position by learning the concepts of changed and unchanged pixels from an initial classification result, completing unsupervised change detection, and is referred to as the deep belief network method for short.
The change maps of the Montpellier area obtained by each method are comprehensively compared with the reference image provided by the data set using 4 evaluation indexes (Precision, Recall, Accuracy and the Kappa coefficient). Precision, Recall, Accuracy and Kappa are calculated with the following formulas, and the results are listed in Table 1:
Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Kappa = (Accuracy − PRE) / (1 − PRE)

wherein

PRE = ((TP + FP) × (TP + FN) + (FN + TN) × (FP + TN)) / (TP + TN + FP + FN)²

wherein TP represents the number of pixels correctly classified into the changed category in the change map, TN the number of pixels correctly classified into the unchanged category, FP the number of pixels incorrectly classified into the changed category, and FN the number of pixels incorrectly classified into the unchanged category.
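The four indexes can be computed from a predicted and a reference binary change map as follows (a sketch assuming NumPy arrays with entries in {0, 1}; no guard against degenerate counts):

```python
import numpy as np

def change_metrics(pred, ref):
    # pred, ref: binary change maps (1 = changed, 0 = unchanged)
    tp = np.sum((pred == 1) & (ref == 1))
    tn = np.sum((pred == 0) & (ref == 0))
    fp = np.sum((pred == 1) & (ref == 0))
    fn = np.sum((pred == 0) & (ref == 1))
    n = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / n
    # expected agreement for the Kappa coefficient
    pre = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - pre) / (1 - pre)
    return precision, recall, accuracy, kappa
```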
In Table 1, "the invention" denotes the proposed unsupervised remote sensing image change detection method based on meta-learning and grouped convolution, RCVA the robust change vector analysis method of Thonfeld et al., PCANet the principal component analysis network method of Gao et al., SCCN the symmetric coupling convolution network method of Liu et al., DCVA the depth change vector analysis method of Saha et al., and DBN the deep belief network method of Gong et al.
TABLE 1 Performance evaluation of the change detection method of the present invention and the existing unsupervised remote sensing image change detection methods
[Table 1 appears only as an image in the source; it reports the Precision, Recall, Accuracy and Kappa of RCVA, PCANet, SCCN, DCVA, DBN and the method of the present invention.]
As can be seen from Table 1, the binary change map obtained by unsupervised learning on this group of double-temporal remote sensing images achieves a Precision of 83.56%, a Recall of 72.73%, an Accuracy of 97.21%, and a Kappa coefficient of 0.7629. All 4 indices are higher than those of the 5 prior-art methods in the simulation experiment, indicating that the method achieves higher change detection accuracy.
In the simulation experiments on heterogeneous image pairs, the three prior-art methods adopted are:
The prior-art multi-modal Markov model method refers to the heterogeneous remote sensing image change detection method proposed in the paper "Multimodal change detection in remote sensing images using an unsupervised pixel pairwise-based Markov random field model" (IEEE Trans. Image Process., 2019) published by Touati et al. The method uses an improved Markov model to complete change detection on both homogeneous and heterogeneous images, and is referred to as the multi-modal Markov model method for short.
The prior-art symmetric coupling convolutional network method refers to the heterogeneous remote sensing image change detection method proposed in the paper "A deep convolutional coupling network for change detection based on heterogeneous optical and radar images" (IEEE Trans. Neural Netw. Learn. Syst., 2018) published by Liu et al.
The prior-art patch similarity graph matrix method refers to the heterogeneous remote sensing image change detection method proposed in the paper "Patch similarity graph matrix-based unsupervised remote sensing change detection with homogeneous and heterogeneous sensors" (IEEE Trans. Geosci. Remote Sens., 2020) published by Sun et al.
The change maps of the Montpellier area obtained by the four methods are each evaluated comprehensively against the reference image provided by the data set using the same 4 evaluation indices (Precision, Recall, Accuracy, Kappa coefficient). The results are listed in Table 2.
"The invention" in Table 2 denotes the unsupervised remote sensing image change detection method based on meta-learning and grouped convolution proposed by the present invention; "M3CD" denotes the multi-modal Markov model method proposed by Touati et al.; "SCCN" denotes the symmetric convolutional coupling network method proposed by Liu et al.; and "PSGM" denotes the patch similarity graph matrix method proposed by Sun et al.
TABLE 2 Performance evaluation of the method of the present invention and the existing unsupervised heterogeneous remote sensing image change detection methods
[Table 2 appears only as an image in the source; it reports the Precision, Recall, Accuracy and Kappa of M3CD, SCCN, PSGM and the method of the present invention.]
As can be seen from Table 2, the binary change map obtained by unsupervised learning on this heterogeneous dual-temporal remote sensing image set achieves a Precision of 75.48%, a Recall of 71.75%, an Accuracy of 95.13%, and a Kappa coefficient of 0.7088. All 4 indices are higher than those of the 3 prior-art methods in the simulation experiment, indicating that the method achieves higher accuracy on the heterogeneous dual-temporal image change detection task.
In summary, the unsupervised change detection method and system for homologous or heterologous remote sensing images can fully extract multi-scale, multi-level information with the constructed twin fully convolutional feature extraction network. With the meta-learning-based up-sampling network, the global feature extraction network grasps the global information of the image pair, while the added classifier resolves the feature discrepancy between images from different sources. The fully fused global features of the images guide the generation of the convolution weights in the up-sampling convolutional layers, effectively avoiding information loss during up-sampling. With the grouped-convolution-based feature fusion network, the up-sampled features are combined with the difference features of the original layer, achieving effective feature fusion. The meta-learning module and the grouped convolution module act together, effectively solving the problem in prior-art methods of low detection accuracy caused by an incomplete grasp of the multi-scale change feature maps and the resulting loss of content. The present invention further provides an improved loss function that emphasizes the change features by continuously shortening the distance between the changed and unchanged features and their feature centers, effectively solving the prior-art problems of insufficient emphasis on changed regions and inaccurate detection of irregular change-region boundaries. The invention is thus a highly practical unsupervised change detection method for homologous or heterologous remote sensing images.
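As a rough illustration of the final two stages summarized above (the deep change vector analysis magnitude and the threshold binarization of step S4), the following NumPy sketch derives a per-pixel change magnitude from a pair of feature maps and thresholds it into a binary change map. The feature shapes, the min-max normalization, and the threshold value 0.5 are assumptions made for illustration, not the patent's trained values:

```python
import numpy as np

def change_magnitude(feat_t1, feat_t2):
    """Per-pixel change magnitude as the L2 norm of the feature difference
    over the channel axis (a deep-change-vector-analysis style measure)."""
    diff = feat_t1 - feat_t2                      # (C, H, W) difference features
    rho = np.sqrt((diff ** 2).sum(axis=0))        # (H, W) magnitude map
    # Normalize to [0, 1] so a fixed threshold is meaningful.
    return (rho - rho.min()) / (rho.max() - rho.min() + 1e-12)

def binarize(rho, tau):
    """Step S4: pixels whose magnitude exceeds the threshold are 'changed'."""
    return (rho > tau).astype(np.uint8)

# Toy example with synthetic "features" for two time phases.
rng = np.random.default_rng(0)
f1 = rng.normal(size=(8, 16, 16))
f2 = f1.copy()
f2[:, 4:8, 4:8] += 3.0                            # inject a changed region
cm = binarize(change_magnitude(f1, f2), tau=0.5)  # binary change map, (16, 16)
```

In this toy example only the injected 4 × 4 block exceeds the threshold, so the binary change map flags exactly those 16 pixels.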
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.
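The exact form of the improved loss function appears only as an image in the published text, but its stated idea, namely pulling the changed and unchanged change magnitudes toward their respective centers (1 and 0) so that their numerical separation grows during training, can be sketched as follows. The quadratic penalty and the fixed threshold used here are illustrative assumptions, not the patent's formula:

```python
import numpy as np

def center_distance_loss(rho, tau=0.5):
    """Illustrative center-pulling loss: magnitudes above the threshold are
    pulled toward the 'changed' center 1, the rest toward the 'unchanged'
    center 0, enlarging the gap between the two populations."""
    changed = rho[rho > tau]
    unchanged = rho[rho <= tau]
    loss = 0.0
    if changed.size:
        loss += np.mean((1.0 - changed) ** 2)  # distance to the changed center
    if unchanged.size:
        loss += np.mean(unchanged ** 2)        # distance to the unchanged center
    return loss
```

A well-separated difference map (values near 0 or 1) yields a smaller loss than an ambiguous one, which is the behavior such a training objective rewards.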

Claims (10)

1. An unsupervised change detection method for homologous or heterologous remote sensing images is characterized by comprising the following steps:
s1, constructing a twin full convolution feature extraction network, respectively inputting homologous or heterologous double-temporal images into the constructed twin full convolution feature extraction network, and extracting multi-scale semantic features to obtain feature difference maps of different scales;
s2, extracting global features through a global feature sampling network according to the feature difference graphs of different scales obtained in the step S1, and mapping the heterogeneous images to the same feature space; constructing a feature fusion network corresponding to each scale and based on meta-learning and packet convolution, fusing feature difference maps of different scales, and obtaining a difference map by a depth change vector analysis method;
s3, completing optimization of the twin full convolution feature extraction network constructed in the step S1 and the feature fusion network constructed in the step S2 based on meta-learning and grouping convolution by using an improved objective function, and iteratively updating the weight values of the twin full convolution feature extraction network and the feature fusion network by using an Adam optimization method; continuously enlarging the numerical difference between the changed pixels and the unchanged pixels in the difference image obtained in the step S2 in the training process;
and S4, completing the binarization operation on the difference map trained in step S3 through a threshold to obtain a binary change map with the same size as the original image, thereby completing image change detection.
2. The method of claim 1, wherein in step S1, the twin full convolution feature extraction network is a set of dual-branch weight-shared full convolution networks, and each full convolution network is formed by a VGG16 model composed of five convolution modules connected in series.
3. The method according to claim 1, wherein in step S1, inputting the homologous or heterologous double-temporal images specifically comprises:
adjusting the size of the homologous image pair to 640 × 640, standardizing the double-temporal images by subtracting the mean from each pixel point in the homologous image pair and dividing by the standard deviation, and completing registration of the input homologous image pair using a preprocessing method;
adjusting the size of the heterologous image pair to 640 × 640, standardizing the double-temporal images by subtracting the mean from each pixel point in the heterologous image pair and dividing by the standard deviation, and completing registration of the two images using a preprocessing method;
and inputting the processed homologous or heterologous images into a twin full convolution feature extraction network and a feature fusion network to obtain corresponding feature difference maps.
4. The method according to claim 1, wherein in step S2, the feature fusion network based on meta-learning and grouped convolution comprises a global feature sampling network, a meta-learning-based feature up-sampling module and a grouped-convolution-based feature fusion module; the global feature sampling network is a U-shaped network with dual branches whose weights are not shared, and skip connections are adopted between corresponding layers of the encoder and the decoder of the U-shaped network; the meta-learning-based feature up-sampling module comprises a position encoding process and an inverted residual block; and the grouped-convolution-based feature fusion module is formed by connecting 4 grouped convolution modules with different convolution kernels in parallel.
5. The method according to claim 1, wherein in step S3, the feature difference maps of different scales obtained in step S2 are passed through a deep change vector analysis network to obtain the change magnitude ρ at the corresponding position of each pixel, and the network parameters of the constructed twin full convolution feature extraction network and the feature fusion network based on meta-learning and grouped convolution are updated using the improved loss function, so that the change magnitude ρ continuously approaches 0 or 1 to obtain an optimal difference map.
6. The method according to claim 1, wherein in step S3, the iteratively updating the network weight values using the Adam optimization method specifically includes:
inputting a pair of homologous or heterologous double-temporal images into the constructed twin full convolution feature extraction network and the feature fusion network based on meta-learning and grouped convolution, and updating the weight values of the two networks; and repeatedly inputting the homologous or heterologous double-temporal image pair into the two networks, and updating the loss function value after the weight values are updated.
7. The method of claim 6, wherein the loss function loss is as follows:
Figure FDA0003287763730000031
wherein τ_local(i, j) is the local adaptive threshold, H is the height of the difference map, W is the width of the difference map, and ρ_(i,j) is the value of the difference map at position (i, j), i.e., the change magnitude of the pixel at position (i, j).
8. The method of claim 6, wherein the updated weight value W_new is:

W_new = W - L · ∂loss/∂W

wherein W is the initial weight value of the network before the update, L is the learning rate used in training, and ∂loss/∂W denotes the partial derivative of the loss function with respect to the weight.
9. The method according to claim 1, wherein in step S4, the binarization process of the difference map is represented as:
CM(x, y) = 1, if ρ(x, y) > τ_N; CM(x, y) = 0, otherwise
wherein τ_N is the threshold used in the improved loss function during the last round of training to distinguish changed regions from unchanged regions, and CM(x, y) is the resulting binary change map.
10. An unsupervised change detection system for homologous or heterologous remote sensing images, comprising:
the feature module is used for constructing a twin full convolution feature extraction network, inputting the homologous or heterologous double-temporal images into the constructed twin full convolution feature extraction network respectively, and extracting multi-scale semantic features to obtain feature difference maps of different scales;
the fusion module is used for extracting global features through a global feature sampling network according to the feature difference maps of different scales obtained by the feature module, and mapping the heterogeneous images into the same feature space; constructing, for each scale, a corresponding feature fusion network based on meta-learning and grouped convolution, fusing the feature difference maps of different scales, and obtaining a difference map by a deep change vector analysis method;
the optimization module is used for completing optimization of the twin full convolution feature extraction network constructed by the feature module and the feature fusion network based on meta-learning and grouped convolution of the fusion module using an improved objective function, and iteratively updating the weight values of the two networks using the Adam optimization method; continuously enlarging, during training, the numerical difference between the changed pixels and the unchanged pixels in the difference map obtained by the fusion module;
and the detection module is used for completing the binarization operation on the difference map trained by the optimization module through a threshold to obtain a binary change map with the same size as the original image, thereby completing image change detection.
CN202111153270.2A 2021-09-29 2021-09-29 Unsupervised change detection method and system for homologous or heterologous remote sensing image Pending CN113901900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111153270.2A CN113901900A (en) 2021-09-29 2021-09-29 Unsupervised change detection method and system for homologous or heterologous remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111153270.2A CN113901900A (en) 2021-09-29 2021-09-29 Unsupervised change detection method and system for homologous or heterologous remote sensing image

Publications (1)

Publication Number Publication Date
CN113901900A true CN113901900A (en) 2022-01-07

Family

ID=79189353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111153270.2A Pending CN113901900A (en) 2021-09-29 2021-09-29 Unsupervised change detection method and system for homologous or heterologous remote sensing image

Country Status (1)

Country Link
CN (1) CN113901900A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114494372A (en) * 2022-01-11 2022-05-13 西南交通大学 Remote sensing image registration method based on unsupervised deep learning
CN114359603A (en) * 2022-02-18 2022-04-15 西北工业大学 Self-adaptive unsupervised matching method in multi-mode remote sensing image field
CN114359603B (en) * 2022-02-18 2024-03-15 西北工业大学 Multi-mode remote sensing image field self-adaptive unsupervised matching method
CN114419464A (en) * 2022-03-29 2022-04-29 南湖实验室 Twin network change detection model based on deep learning
CN114419464B (en) * 2022-03-29 2022-07-26 南湖实验室 Construction method of twin network change detection model based on deep learning
CN114863235A (en) * 2022-05-07 2022-08-05 清华大学 Fusion method of heterogeneous remote sensing images
CN115661016A (en) * 2022-12-08 2023-01-31 瑞纳智能设备股份有限公司 Fault monitoring method and system of heat exchange station and embedded image diagnosis control platform
CN115661016B (en) * 2022-12-08 2023-04-18 瑞纳智能设备股份有限公司 Fault monitoring method and system of heat exchange station and embedded image diagnosis control platform
CN116258877A (en) * 2023-05-08 2023-06-13 南京信息工程大学 Land utilization scene similarity change detection method, device, medium and equipment
CN117173587A (en) * 2023-08-23 2023-12-05 哈尔滨工程大学 Feature refinement fusion change detection method based on heterogeneous image depth conversion
CN117853738A (en) * 2024-03-06 2024-04-09 贵州健易测科技有限公司 Image processing method and device for grading tea leaves
CN117853738B (en) * 2024-03-06 2024-05-10 贵州健易测科技有限公司 Image processing method and device for grading tea leaves

Similar Documents

Publication Publication Date Title
CN113901900A (en) Unsupervised change detection method and system for homologous or heterologous remote sensing image
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
Li et al. Multiscale features supported DeepLabV3+ optimization scheme for accurate water semantic segmentation
US20210398287A1 (en) Image processing method and image processing device
CN113887459B (en) Open-pit mining area stope change area detection method based on improved Unet +
CN112541904B (en) Unsupervised remote sensing image change detection method, storage medium and computing device
CN111738110A (en) Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN110909642A (en) Remote sensing image target detection method based on multi-scale semantic feature fusion
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN104299006A (en) Vehicle license plate recognition method based on deep neural network
CN106295613A (en) A kind of unmanned plane target localization method and system
CN112906662B (en) Method, device and equipment for detecting change of remote sensing image and storage medium
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN112926548A (en) Lane line detection method and device, electronic equipment and storage medium
Sofla et al. Road extraction from satellite and aerial image using SE-Unet
CN114037640A (en) Image generation method and device
CN113378897A (en) Neural network-based remote sensing image classification method, computing device and storage medium
CN111079807B (en) Ground object classification method and device
CN116977311A (en) Flood disaster area detection method, system, electronic equipment and storage medium
CN112633123B (en) Heterogeneous remote sensing image change detection method and device based on deep learning
CN114092803A (en) Cloud detection method and device based on remote sensing image, electronic device and medium
CN117636298A (en) Vehicle re-identification method, system and storage medium based on multi-scale feature learning
CN116503677B (en) Wetland classification information extraction method, system, electronic equipment and storage medium
CN112132867B (en) Remote sensing image change detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination