CN114743109A - Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system - Google Patents

Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system

Info

Publication number
CN114743109A
CN114743109A (application CN202210461115.5A)
Authority
CN
China
Prior art keywords
training
label
model
training set
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210461115.5A
Other languages
Chinese (zh)
Inventor
李树涛
谢雨欣
方乐缘
付巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202210461115.5A priority Critical patent/CN114743109A/en
Publication of CN114743109A publication Critical patent/CN114743109A/en
Pending legal-status Critical Current

Classifications

    • G06F 18/241 — Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; architectures; combinations of networks
    • G06N 3/047 — Probabilistic or stochastic networks
    • G06N 3/08 — Neural networks; learning methods
    • Y02T 10/40 — Climate change mitigation technologies related to transportation; engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-model collaborative optimization semi-supervised change detection method and system for high-resolution remote sensing images, comprising the following steps: training a twin neural network model S on a labeled training set X_L to obtain a labeled-trained model S(X_L, θ); adding a Monte Carlo dropout layer and performing multi-model collaborative prediction and screening on the unlabeled training set X_U to generate a pseudo-label training set X'_U; selecting salient change regions from the labeled training set X_L and cropping them into the pseudo-label training set X'_U to obtain an enhanced pseudo-label data set X_M; and training the twin neural network model S on the labeled training set X_L together with the enhanced pseudo-label data set X_M. The invention can complete change detection with high precision using only a small number of labels; it combines real labels with pseudo labels, uses supervised learning to improve the semi-supervised learning process, adds more edge data, makes the distribution of positive and negative samples more balanced, and thereby improves change detection performance.

Description

Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system
Technical Field
The invention relates to a high-resolution remote sensing image processing technology, in particular to a multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system.
Background
Remote sensing images provide a macroscopic Earth-observation view reflecting the types and attributes of objects within the observed region. Change detection is a technique for recognizing differences in the state of the same object by observing it at different times. Multi-temporal remote sensing images covering the same scene can reveal dynamic changes on the ground, so change detection on multi-temporal image sequences has become increasingly important. At present, remote sensing image change detection is widely applied in fields such as ecosystem monitoring, resource management, land use/land cover change analysis, urban expansion research, and damage assessment.
Deep learning methods have advantages in processing massive data with complex features. In recent years, fully supervised deep learning has achieved great success in change detection on medium- and low-resolution remote sensing images. However, fully supervised change detection requires a large number of labeled samples, and labeling large-scale high-resolution change detection datasets is very time-consuming and laborious owing to the complexity of high-resolution remote sensing scenes: it requires annotators with rich experience and expertise, as well as considerable time to analyze the changes in each image pair. In this case, methods that rely on large numbers of labels are impractical for the diverse and fast-moving emergency scenarios of interest (such as natural disaster assessment and land cover conversion). Unsupervised deep learning change detection methods, which directly exploit linear transformation theory, alleviate these problems to some extent. However, without label supervision the unsupervised methods require many parameters to be set manually and produce a large number of false change regions in the detection results, so their detection accuracy is low.
Semi-supervised learning can not only learn from labeled data but also extract useful features from unlabeled data, thereby improving the generalization of the model and making it possible to train on a small labeled set. Most existing semi-supervised change detection methods focus on hyperspectral images, SAR images, or medium- and low-resolution multispectral images. Little work exists on semi-supervised change detection for high-resolution remote sensing images, and with limited labeled training samples the detection results are inaccurate and edge information is blurred. Therefore, an efficient semi-supervised change detection method is of great significance for improving the accuracy and efficiency of change detection and promoting its practical deployment.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the problems in the prior art, the invention provides a multi-model collaborative optimization semi-supervised change detection method and system for high-resolution remote sensing images, which aims to solve the problem of uncertain detection results when labeled training samples are limited, and improves the accuracy and performance of change detection by performing multi-model collaborative prediction through a Monte Carlo dropout layer.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method comprises the following steps of training a twin neural network model S for high-resolution remote sensing image change detection:
1) respectively obtaining labeled and unlabeled remote sensing images to generate a labeled training set X_L and an unlabeled training set X_U;
2) training the twin neural network model S with the labeled training set X_L to obtain a labeled-trained model S(X_L, θ), where θ is a network parameter;
3) adding a Monte Carlo dropout layer to the labeled-trained model S(X_L, θ) and sampling multiple times to obtain multiple sets of model parameters; performing multi-model collaborative prediction on the unlabeled training set X_U to obtain, for any i-th unlabeled training sample, the prediction probability p_i^U and the binary prediction result y_i^U of the labeled-trained model S(X_L, θ); obtaining an uncertainty estimate D_i by combining the multiple prediction results; computing the confidence of the prediction probability p_i^U of unlabeled training sample i; screening the unlabeled training samples against a set confidence threshold a and taking their binary prediction results y_i^U as the corresponding pseudo labels; and forming a pseudo-label training set X'_U from the screened unlabeled training samples i, their pseudo labels, and the uncertainty estimates D_i;
4) computing salient change regions of the labeled training set X_L in the attention map of the twin neural network model S, selecting them, and cropping them into the pseudo-label training set X'_U, thereby obtaining an enhanced pseudo-label data set X_M that combines real labels and pseudo labels;
5) forming a mixed training set X from the labeled training set X_L and the enhanced pseudo-label data set X_M, and training the twin neural network model S on the mixed training set X to obtain updated network parameters θ', yielding the semi-supervised-trained twin neural network model S(X, θ');
6) judging whether a preset end condition is satisfied; if not, jumping back to step 3) to continue iterative training and update the network parameters θ' of the twin neural network model S; otherwise, judging that training of the twin neural network model S is complete.
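As a minimal runnable sketch (not from the patent; all function names are hypothetical stand-ins), steps 2)-6) can be wired together as below, with the twin network, the Monte Carlo screening, and the cropping passed in as callables:

```python
def semi_supervised_training(fit, mc_screen, paste, X_L, X_U, rounds=3):
    """Outline of steps 2)-6): `fit` trains the twin network S and returns
    its parameters; `mc_screen` performs the Monte Carlo multi-model
    prediction and confidence screening of step 3); `paste` builds the
    enhanced pseudo-label set of step 4)."""
    theta = fit(X_L)                       # step 2): supervised warm-up on X_L
    for _ in range(rounds):                # step 6): iterate until the stop condition
        X_pseudo = mc_screen(theta, X_U)   # step 3): pseudo labels + uncertainty
        X_M = paste(X_L, X_pseudo)         # step 4): enhanced pseudo-label data set
        theta = fit(X_L + X_M)             # step 5): retrain on the mixed set X
    return theta
```

Here the fixed `rounds` count stands in for the unspecified end condition of step 6).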
Optionally, the labeled training set X_L generated in step 1) contains N_L labeled training samples, each comprising two labeled remote sensing images of different time phases and a corresponding pixel-level label; the unlabeled training set X_U contains N_U unlabeled training samples, each comprising two unlabeled remote sensing images of different time phases.
Optionally, the twin neural network model S comprises two branch encoders that share parameters and a decoding-merging module. Each branch encoder comprises two encoding modules followed by three encoding modules with attention modules, connected in sequence; the decoding-merging module comprises three decoding modules with attention modules, two decoding modules, and a convolutional layer connected in sequence. The output of one of the two branch encoders is connected to the input of the decoding-merging module, and the absolute value of the feature-map difference between same-level encoding modules (or encoding modules with attention modules) of the two branch encoders is connected to the output of the corresponding decoding module (or decoding module with attention module) in the decoding-merging module.
Optionally, when training the twin neural network model S with the labeled training set X_L in step 2), each training round comprises: inputting the labeled training samples of the round into the twin neural network model S to obtain the prediction probability p_i^L; and back-propagating through the twin neural network model S according to a preset loss function L_S to obtain the network parameters θ of the twin neural network, where the preset loss function L_S is:
L_S = -[ω·y_i^L·log p_i^L + (1 - y_i^L)·log(1 - p_i^L)]
where ω is the ratio of the number of changed-class samples to the number of unchanged-class samples in the labeled training set, y_i^L is the pixel-level label of the i-th labeled training sample, and p_i^L is the prediction probability produced by the twin neural network model S.
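A small numpy sketch of this class-weighted binary cross entropy (illustrative code; the patent does not prescribe an implementation):

```python
import numpy as np

def weighted_bce(y_true, p_pred, omega, eps=1e-7):
    """Preset loss L_S = -[omega * y * log(p) + (1 - y) * log(1 - p)],
    averaged over pixels. omega is the changed-to-unchanged sample ratio
    of the labeled training set, re-weighting the changed class."""
    p = np.clip(np.asarray(p_pred, float), eps, 1.0 - eps)  # guard against log(0)
    y = np.asarray(y_true, float)
    loss = -(omega * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(loss.mean())
```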
Optionally, the functional expression for computing the confidence of the prediction results y^U in step 3) is:
γ_i = [L_p(p_{i,j}^U) + L_n(p_{i,j}^U)] / N_P,  i = 1, …, N_U
where γ_i is the confidence of the i-th unlabeled training sample, N_P is the number of pixels of the i-th unlabeled training sample, L_p(p_{i,j}^U) counts the pixels whose p_{i,j}^U is greater than a preset upper confidence threshold γ_p, L_n(p_{i,j}^U) counts the pixels whose p_{i,j}^U is less than a preset lower confidence threshold γ_n, p_{i,j}^U is the prediction probability of pixel j in the i-th unlabeled training sample of the prediction probability p^U, and N_U is the number of samples in the unlabeled training set X_U, with:
L_p(p_{i,j}^U) = Σ_j 1(p_{i,j}^U > γ_p),  L_n(p_{i,j}^U) = Σ_j 1(p_{i,j}^U < γ_n)
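Under these definitions the per-sample confidence can be sketched as follows (illustrative code, not from the patent):

```python
import numpy as np

def sample_confidence(p_u, gamma_p=0.9, gamma_n=0.1):
    """gamma_i = (L_p + L_n) / N_P: the fraction of pixels in one unlabeled
    sample predicted confidently changed (p > gamma_p) or confidently
    unchanged (p < gamma_n). Samples with gamma_i > a keep their binary
    prediction as a pseudo label."""
    p = np.asarray(p_u, dtype=float)
    l_p = int(np.count_nonzero(p > gamma_p))   # confidently changed pixels
    l_n = int(np.count_nonzero(p < gamma_n))   # confidently unchanged pixels
    return (l_p + l_n) / p.size
```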
optionally, obtaining an uncertainty estimation result D by collaborating multiple prediction results in step 3)iThe functional expression of (a) is:
Figure BDA0003622257080000033
in the above formula, NpredNumber of models, y, generated for removing layers using Monte Carloi,n UThe binary prediction result, E (y), generated for the nth prediction for the ith unlabeled training samplei,n U) And (4) averaging the binary prediction results generated by N times of prediction of the multi-model of the ith label-free training sample.
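One reading of D_i consistent with the surrounding definitions is the per-pixel variance of the N_pred binary prediction maps around their ensemble mean; a numpy sketch (illustrative, not from the patent):

```python
import numpy as np

def mc_uncertainty(binary_preds):
    """D_i from N_pred Monte Carlo dropout passes. binary_preds has shape
    (N_pred, ...) with 0/1 entries; the result is the per-pixel spread
    (y - E(y))^2 averaged over passes: 0 where all models agree, and
    maximal (0.25) at a 50/50 disagreement."""
    preds = np.asarray(binary_preds, dtype=float)
    mean = preds.mean(axis=0)                  # ensemble mean E(y)
    return ((preds - mean) ** 2).mean(axis=0)  # per-pixel variance
```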
Optionally, step 4) comprises:
4.1) computing salient change regions of the labeled training set X_L in the attention map of the twin neural network model S;
4.2) selecting a salient change region and generating from it a binary mask matrix M of the same size as the original labeled training sample;
4.3) selecting the salient change region and, based on M, cropping it into the pseudo-label training set X'_U to obtain the enhanced pseudo-label data set X_M that combines real labels and pseudo labels:
X_M = {(X_{1,i}^M, X_{2,i}^M, y_i^M, D_i^M)},  i = 1, …, N_U'
X_{1,i}^M = M ⊙ X_{1,i}^L + (1 - M) ⊙ X_{1,i}^{U'}
X_{2,i}^M = M ⊙ X_{2,i}^L + (1 - M) ⊙ X_{2,i}^{U'}
y_i^M = M ⊙ y_i^L + (1 - M) ⊙ y_i^{U'}
D_i^M = (1 - M) ⊙ D_i'
where X_{1,i}^M and X_{2,i}^M are the two enhanced pseudo-label remote sensing images of different time phases, y_i^M is the enhanced pixel-level pseudo label, D_i^M is the uncertainty estimate corresponding to the pseudo label y_i^M, and N_U' is the number of samples in the enhanced pseudo-label data set X_M; X_{1,i}^L and X_{2,i}^L are the two labeled remote sensing images of different time phases in the i-th labeled training sample, y_i^L is the label of the i-th labeled training sample, and M is the binary mask; X_{1,i}^{U'} and X_{2,i}^{U'} are the two unlabeled remote sensing images of different time phases in the selected unlabeled training sample, y_i^{U'} is the pseudo label of the selected unlabeled training sample, D_i' is the uncertainty estimate corresponding to the selected unlabeled training sample, and the operator ⊙ denotes multiplication of corresponding positions (element-wise multiplication).
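A numpy sketch of this mask-based mixing for a single sample pair (illustrative names; images, labels, and the mask are treated as arrays, and the zeroing of uncertainty in the pasted real-label region is an assumption):

```python
import numpy as np

def enhance_sample(x1_l, x2_l, y_l, x1_u, x2_u, y_u, d_u, mask):
    """Paste the salient change region (mask == 1) of a labeled sample into
    a pseudo-labeled sample. The pasted region carries the real label and
    zero uncertainty; elsewhere the pseudo label and its uncertainty
    estimate D' are kept."""
    m = np.asarray(mask, dtype=float)
    mix = lambda a, b: m * np.asarray(a, float) + (1.0 - m) * np.asarray(b, float)
    x1_m = mix(x1_l, x1_u)                     # enhanced time-phase-1 image
    x2_m = mix(x2_l, x2_u)                     # enhanced time-phase-2 image
    y_m = mix(y_l, y_u)                        # real label inside mask, pseudo outside
    d_m = (1.0 - m) * np.asarray(d_u, float)   # real-label pixels are certain
    return x1_m, x2_m, y_m, d_m
```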
Optionally, when training the twin neural network model S on the mixed training set X in step 5), the adopted loss function L is:
L = L_S + L_U
where L_S is the preset loss function used when training the twin neural network model S on the labeled training set X_L in step 2), and L_U is an uncertainty-weighted binary cross-entropy loss with:
L_WBCE = -[ω·y_i^M·log p_i^M + (1 - y_i^M)·log(1 - p_i^M)]
L_U = (1 - D_i^M) ⊙ L_WBCE
where D_i^M is the uncertainty estimate of the i-th sample in the mixed training set X, L_WBCE is an intermediate variable, ω is the ratio of the number of changed-class samples to the number of unchanged-class samples in the labeled training set, y_i^M is the pseudo label corresponding to the i-th training sample in the mixed training set X, and p_i^M is the prediction probability corresponding to the i-th training sample in the mixed training set X.
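A sketch of the combined objective, assuming the uncertainty weighting takes the form (1 - D)·L_WBCE (an assumption; illustrative code, not the patent's implementation):

```python
import numpy as np

def combined_loss(y_l, p_l, y_m, p_m, d_m, omega, eps=1e-7):
    """L = L_S + L_U: class-weighted BCE on labeled samples plus the same
    BCE on the mixed set, scaled per pixel by (1 - D) so that pixels with
    high-uncertainty pseudo labels contribute less."""
    def wbce(y, p):
        p = np.clip(np.asarray(p, float), eps, 1.0 - eps)
        y = np.asarray(y, float)
        return -(omega * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    l_s = wbce(y_l, p_l).mean()
    l_u = ((1.0 - np.asarray(d_m, float)) * wbce(y_m, p_m)).mean()
    return float(l_s + l_u)
```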
In addition, the invention also provides a multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection system which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method.
In addition, the invention also provides a computer-readable storage medium in which a computer program is stored, the computer program being executed by computer equipment to implement the steps of the multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method.
Compared with the prior art, the invention has the following advantages:
First, aiming at the limited labeled training data of high-resolution remote sensing images, the invention completes change detection with high precision using a semi-supervised method and only a small number of labels.
Second, the invention crops salient change regions from the labeled training set into the pseudo-label training set, combining real labels with pseudo labels; this improves the semi-supervised learning process with supervised learning, adds more edge data, makes the distribution of positive and negative samples more balanced, and further improves change detection performance.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a twin neural network model S according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an encoder module model according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a decoder module according to an embodiment of the present invention.
Fig. 5 is a schematic view of an attention module model according to an embodiment of the invention.
FIG. 6 is a graph showing a comparison of the results of a first set of variation tests performed by the method of the present invention with other prior art methods.
FIG. 7 is a graphical representation comparing the results of a second set of change detections in the method of the present invention with other prior art methods.
FIG. 8 is a graphical comparison of the results of a third set of change detections using the method of the present invention and other prior art methods.
Detailed Description
As shown in fig. 1, the present embodiment provides a multi-model collaborative optimization semi-supervised change detection method for high-resolution remote sensing images, comprising the following steps of training a twin neural network model S for high-resolution remote sensing image change detection:
1) respectively obtaining labeled and unlabeled remote sensing images to generate a labeled training set X_L and an unlabeled training set X_U;
2) training the twin neural network model S with the labeled training set X_L to obtain a labeled-trained model S(X_L, θ), where θ is a network parameter;
3) adding a Monte Carlo dropout layer to the labeled-trained model S(X_L, θ) and sampling multiple times to obtain multiple sets of model parameters; performing multi-model collaborative prediction on the unlabeled training set X_U to obtain the prediction probability p_i^U and binary prediction result y_i^U of any i-th unlabeled training sample; obtaining the uncertainty estimate D_i by combining the multiple prediction results; computing the confidence of the prediction probability p_i^U of unlabeled training sample i; screening the unlabeled training samples against the set confidence threshold a and taking the binary prediction results y_i^U as the corresponding pseudo labels; and forming the pseudo-label training set X'_U from the screened unlabeled training samples i, their pseudo labels, and the uncertainty estimates D_i;
4) computing salient change regions of the labeled training set X_L in the attention map of the twin neural network model S, selecting them, and cropping them into the pseudo-label training set X'_U, thereby obtaining the enhanced pseudo-label data set X_M that combines real labels and pseudo labels;
5) forming a mixed training set X from the labeled training set X_L and the enhanced pseudo-label data set X_M, and training the twin neural network model S on the mixed training set X to obtain updated network parameters θ', yielding the semi-supervised-trained twin neural network model S(X, θ');
6) judging whether a preset end condition is satisfied; if not, jumping back to step 3) to continue iterative training and update the network parameters θ' of the twin neural network model S; otherwise, judging that training of the twin neural network model S is complete. After the twin neural network model S is trained, change detection on remote sensing images is achieved by inputting the remote sensing images to be detected into the twin neural network model S.
In this embodiment, the labeled training set X_L generated in step 1) contains N_L labeled training samples, each comprising two labeled remote sensing images of different time phases and a corresponding pixel-level label. The labeled training set X_L can be expressed as:
X_L = {(X_{1,i}^L, X_{2,i}^L, y_i^L)},  i = 1, …, N_L
where X_{1,i}^L and X_{2,i}^L are labeled remote sensing images of different time phases, y_i^L is the corresponding pixel-level label, and N_L is the number of samples in the labeled training set. The unlabeled training set X_U contains N_U unlabeled training samples, each comprising two unlabeled remote sensing images of different time phases, and can be expressed as:
X_U = {(X_{1,i}^U, X_{2,i}^U)},  i = 1, …, N_U
where X_{1,i}^U and X_{2,i}^U are unlabeled remote sensing images of different time phases and N_U is the number of samples in the unlabeled training set. The proportions of N_L and N_U can be configured as needed; for example, as an alternative implementation, in this embodiment the labeled training set has N_L = 712 samples, accounting for 10% of the total training set, and the unlabeled training set has N_U = 6408 samples, accounting for 90% of the total training set.
As shown in fig. 2, the twin neural network model S comprises two branch encoders that share parameters and a decoding-merging module. Each branch encoder comprises two encoding modules (Enc1 and Enc2) and three encoding modules with attention modules (EAM1 to EAM3) connected in sequence; the decoding-merging module comprises three decoding modules with attention modules (DAM1 to DAM3), two decoding modules (Dec1 and Dec2), and a convolutional layer connected in sequence. The output of one of the two branch encoders is connected to the input of the decoding-merging module, and the absolute value of the feature-map difference between same-level encoding modules (or encoding modules with attention modules) of the two branch encoders is connected to the output of the corresponding decoding module (or decoding module with attention module) in the decoding-merging module. For example, as an alternative embodiment, referring to fig. 2: the absolute value of the feature-map difference between the encoding modules Enc1 of the two branch encoders is connected to the output of the decoding module Dec2; between the encoding modules Enc2, to the output of Dec1; between the encoding modules with attention EAM1, to the output of DAM3; between EAM2, to the output of DAM2; and between EAM3, to the output of DAM1. Referring to fig. 2, the convolutional layer in this embodiment comprises two convolution modules, Conv16 and Conv1, and the output of the decoding-merging module passes through Conv16 and Conv1 in sequence to obtain the binary change map.
The structure of the encoding module and of the encoding module with attention module is shown in fig. 3; they differ in whether the optional attention module (Attention) is included. The encoding module comprises two 3 × 3 convolutional layers (Conv), each followed by a batch normalization layer and a rectified linear unit (batch normalization and rectified linear units are not shown in the figure), and a strided convolutional layer (Strided Conv) that down-samples the features by a factor of two; the encoding module with attention module comprises two 3 × 3 convolutional layers with batch normalization and rectified linear units, an attention module (Attention), and a strided convolutional layer that down-samples the features by a factor of two.
The structure of the decoding module and of the decoding module with attention module is shown in fig. 4; they differ in whether the optional attention module (Attention) is included. The decoding module comprises two 3 × 3 convolutional layers (Conv), each followed by a batch normalization layer and a rectified linear unit (batch normalization and rectified linear units are not shown in the figure), and a transposed convolutional layer (TransposeConv) that up-samples the features by a factor of two; the decoding module with attention module comprises two 3 × 3 convolutional layers with batch normalization and rectified linear units, an attention module (Attention), and a transposed convolutional layer that up-samples the features by a factor of two.
The structure of the attention module is shown in fig. 5. Each attention module comprises a spatial attention module and a channel attention module: the spatial attention module includes a global covariance pooling layer (GCP) and two 1 × 1 convolutional layers (Conv1 × 1), and the channel attention module includes two 3 × 3 dilated convolutional layers (Dilated Conv) with a dilation rate of 4, a global average pooling layer (GAP), and a 1 × 1 convolutional layer (Conv1 × 1). The input and output of the spatial attention module are summed to form the input of the channel attention module, and the input and output of the channel attention module are summed to form the final output.
In this embodiment, when training the twin neural network model S with the labeled training set X_L in step 2), each training round comprises: inputting the labeled training samples of the round into the twin neural network model S to obtain the prediction probability p_i^L (as noted above, the labeled training samples in this embodiment comprise labeled remote sensing images of different time phases); and back-propagating through the twin neural network model S according to the preset loss function L_S to obtain the network parameters θ of the twin neural network, where the preset loss function L_S is:
L_S = -[ω·y_i^L·log p_i^L + (1 - y_i^L)·log(1 - p_i^L)]
where ω is the ratio of the number of changed-class samples to the number of unchanged-class samples in the labeled training set, y_i^L is the pixel-level label of the i-th labeled training sample, and p_i^L is the prediction probability produced by the twin neural network model S. The twin neural network model S is back-propagated according to the preset loss function L_S to obtain the network parameters θ, and the attention map of a sample in the twin neural network is extracted according to the learned network parameters θ; the twin neural network model S trained on the labeled training set is denoted the labeled-trained model S(X_L, θ).
In this embodiment, the functional expression for calculating the confidence of the prediction results y^U in step 3) is:

γ_i = [L_p(p_{i,j}^U) + L_n(p_{i,j}^U)] / N_P,  i = 1, 2, …, N_U

In the above formula, γ_i is the confidence of the i-th unlabeled training sample, N_P is the number of pixels of the i-th unlabeled training sample, L_p(p_{i,j}^U) denotes the number of pixels for which p_{i,j}^U is greater than a preset upper confidence threshold γ_p, L_n(p_{i,j}^U) denotes the number of pixels for which p_{i,j}^U is less than a preset lower confidence threshold γ_n, p_{i,j}^U is the prediction probability of pixel j in the i-th unlabeled training sample of the prediction probability p^U, and N_U is the number of samples in the unlabeled training set X_U.
In this embodiment, the number of pixels of the i-th unlabeled training sample is N_P = 65536, the preset upper confidence threshold is γ_p = 0.9, and the preset lower confidence threshold is γ_n = 0.1; the values of these parameters can of course be set as required. On this basis, if γ_i > a, the binary prediction result y_i^U corresponding to the image pair is regarded as a pseudo label, and the image pair together with its pseudo label y_i^U and uncertainty estimation result D_i is added to the pseudo-label training set X'_U, which can be expressed as:

X'_U = {(x_{1,i}^{U'}, x_{2,i}^{U'}, y_i^{U'}, D'_i) | i = 1, 2, …, N_{U'}}

wherein x_{1,i}^{U'} and x_{2,i}^{U'} denote the pseudo-labeled remote sensing images of different time phases, y_i^{U'} is the corresponding pixel-level pseudo label, D'_i is the uncertainty estimation result corresponding to the pseudo label, N_{U'} is the number of samples in the pseudo-label training set, and x^{U'}, y^{U'} and D' are all drawn from X_U, y^U and D. In this embodiment, the confidence threshold is a = 0.9.
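The confidence computation and pseudo-label screening described above can be illustrated as follows (a hypothetical NumPy sketch, assuming, as read here, that L_p counts pixels above γ_p and L_n counts pixels below γ_n; names are illustrative):

```python
import numpy as np

def confidence(p_map, gamma_p=0.9, gamma_n=0.1):
    """Fraction of pixels predicted confidently: probability above
    gamma_p (confidently changed) or below gamma_n (confidently
    unchanged)."""
    confident = (p_map > gamma_p) | (p_map < gamma_n)
    return confident.mean()

# screen unlabeled samples whose confidence exceeds the threshold a
a = 0.9
p_maps = [np.full((4, 4), 0.95),   # every pixel confidently "changed"
          np.full((4, 4), 0.5)]    # no pixel confident
pseudo = [(p > 0.5).astype(np.uint8)   # binary prediction becomes the pseudo label
          for p in p_maps if confidence(p) > a]
```

Only the first map passes the screen, and its thresholded binary map is kept as a pseudo label.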
Because the twin neural network model S has few training samples, the network is not trained well and the predictions obtained from the model carry uncertainty. A Monte Carlo dropout layer is therefore adopted in the multi-prediction process: the dropout layers of the network are kept active at test time, so that the parameters of the network S follow a Bernoulli distribution, the model parameters obtained at each sampling differ, and a multi-model prediction result is obtained through repeated sampling. In this embodiment, the functional expression for obtaining the uncertainty estimation result D_i by cooperating multiple prediction results in step 3) is:
D_i = (1/N_pred) · Σ_{n=1}^{N_pred} (y_{i,n}^U − E(y_{i,n}^U))²
In the above formula, N_pred is the number of models generated using Monte Carlo dropout, y_{i,n}^U is the binary prediction result generated at the n-th prediction for the i-th unlabeled training sample, and E(y_{i,n}^U) is the average of the binary prediction results generated by the multi-model predictions for the i-th unlabeled training sample. The number of models N_pred generated by Monte Carlo dropout can be chosen as required; in this embodiment, N_pred = 10.
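The original formula for D_i is given as an image; a common realization of "cooperating multiple predictions" is the per-pixel sample variance across the N_pred stochastic forward passes, sketched below with a noise-based stand-in for Monte Carlo dropout (the noise model and names are assumptions, not the patent's network):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_predictions(p_clean, n_pred=10, noise=0.05):
    """Stand-in for Monte Carlo dropout: each forward pass perturbs
    the prediction map, mimicking differently sampled model
    parameters. In a real network the dropout layers would simply
    stay active at test time."""
    draws = [np.clip(p_clean + rng.normal(0, noise, p_clean.shape), 0, 1)
             for _ in range(n_pred)]
    return np.stack([(d > 0.5).astype(float) for d in draws])  # binary maps

def uncertainty(binary_maps):
    """Per-pixel variance of the binary predictions across samples."""
    mean = binary_maps.mean(axis=0)            # E(y_{i,n}^U)
    return ((binary_maps - mean) ** 2).mean(axis=0)

y_samples = mc_predictions(np.full((4, 4), 0.5))  # maximally ambiguous pixels
D = uncertainty(y_samples)
```

For binary predictions the per-pixel variance is bounded by 0.25, reached when the sampled models disagree half the time.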
In this embodiment, step 4) includes:
4.1) computing the significant change region of the labeled training set X_L in the attention map of the twin neural network model S;
4.2) selecting the significant change region and generating from it a binary mask matrix M of the same size as the original labeled training sample;
4.3) selecting the significant change region and, on the basis of the mask, cutting it into the pseudo-label training set X'_U to obtain an enhanced pseudo-label data set X_M combining real labels and pseudo labels:
x_{1,i}^M = M ⊙ x_{1,i}^L + (1 − M) ⊙ x_{1,i}^U
x_{2,i}^M = M ⊙ x_{2,i}^L + (1 − M) ⊙ x_{2,i}^U
y_i^M = M ⊙ y_i^L + (1 − M) ⊙ y_i^U
D_i^M = (1 − M) ⊙ D'_i
In the above formulas, X_M = {(x_{1,i}^M, x_{2,i}^M, y_i^M, D_i^M) | i = 1, 2, …, N_{U'}}, wherein x_{1,i}^M and x_{2,i}^M denote the two enhanced pseudo-label remote sensing images of different time phases, y_i^M is the enhanced pixel-level pseudo label, D_i^M is the uncertainty estimation result corresponding to the pseudo label y_i^M, and N_{U'} is the number of samples in the enhanced pseudo-label data set X_M; x_{1,i}^L and x_{2,i}^L are the two labeled remote sensing images of different time phases in the i-th labeled training sample, y_i^L is the label of the i-th labeled training sample, M is the binary mask, x_{1,i}^U and x_{2,i}^U are the two remote sensing images of different time phases in the unlabeled training sample selected from the unlabeled training set X_U, y_i^U is the pseudo label of the selected unlabeled training sample, D'_i is the uncertainty estimation result corresponding to the selected unlabeled training sample, and the ⊙ operator denotes element-wise multiplication.
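The mask-based combination can be sketched as follows (an illustrative NumPy sketch in the spirit of CutMix-style pasting; the exact formulas appear as images in the original, and the element-wise mix below is an assumption based on the surrounding prose):

```python
import numpy as np

def paste_changed_region(img_l, lbl_l, img_u, lbl_u, mask):
    """Paste the significant-change region (mask == 1) of a labeled
    sample into a pseudo-labeled sample, element-wise."""
    img_m = mask * img_l + (1 - mask) * img_u
    lbl_m = mask * lbl_l + (1 - mask) * lbl_u
    return img_m, lbl_m

M = np.zeros((4, 4)); M[:2, :2] = 1          # top-left block is the cut region
img_l, lbl_l = np.ones((4, 4)), np.ones((4, 4))   # labeled sample (all "changed")
img_u, lbl_u = np.zeros((4, 4)), np.zeros((4, 4)) # pseudo-labeled sample
img_m, lbl_m = paste_changed_region(img_l, lbl_l, img_u, lbl_u, M)
```

Inside the mask the enhanced sample carries real labels; outside it keeps the pseudo labels, so one sample mixes both supervision sources.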
According to the embodiment, uncertainty estimation is obtained by coordinating with multiple prediction results, a loss function is designed according to the uncertainty estimation, high-quality pseudo labels are effectively utilized, overfitting of a model is avoided, and accuracy of model edge detection is improved. Specifically, when the twin neural network model S is trained through the hybrid training set X in step 5), the loss function L adopted is as follows:
L = L_S + L_U
In the above formula, L_S is the preset loss function used when training the twin neural network model S on the labeled training set X_L in step 2), and L_U is an uncertainty-weighted binary cross-entropy loss given by:
L_WBCE = -[ω·y_i^M·log(p_i^M) + (1-y_i^M)·log(1-p_i^M)]
L_U = (1/N_{U'}) · Σ_{i=1}^{N_{U'}} (1 − D_i^M) ⊙ L_WBCE
In the above formulas, D_i^M is the i-th uncertainty estimation result in the mixed training set X, L_WBCE is an intermediate variable, ω is the ratio of the number of changed-class samples to the number of unchanged-class samples in the labeled training set, y_i^M is the pseudo label corresponding to the i-th training sample in the mixed training set X, and p_i^M is the prediction probability corresponding to the i-th training sample in the mixed training set X.
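The uncertainty-weighted term can be illustrated as below (a sketch assuming the weighting down-scales each pixel's cross-entropy by (1 − D), so unreliable pseudo-labeled pixels contribute less; the original formula is provided only as an image):

```python
import numpy as np

def wbce(y, p, omega, eps=1e-7):
    """Per-element class-weighted binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return -(omega * y * np.log(p) + (1 - y) * np.log(1 - p))

def uncertainty_weighted_loss(y, p, D, omega):
    """Down-weight each pixel's BCE by its uncertainty estimate D."""
    return np.mean((1.0 - D) * wbce(y, p, omega))

y = np.array([1.0, 0.0]); p = np.array([0.6, 0.4])
certain = uncertainty_weighted_loss(y, p, np.zeros(2), omega=1.0)     # D = 0
uncertain = uncertainty_weighted_loss(y, p, np.full(2, 0.5), omega=1.0)  # D = 0.5
```

With uniform uncertainty 0.5 the loss is exactly half the fully certain loss, showing how unreliable pseudo labels are suppressed rather than discarded.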
In this embodiment, the steps of predicting, screening pseudo-label samples, cutting the significant change region, and retraining on the mixed data set to update the model parameters are cycled continuously on the basis of the current semi-supervised model until all unlabeled samples have been assigned pseudo labels, at which point the cycle ends and the training of the twin neural network model S is complete. After the twin neural network model S is trained, change detection of remote sensing images is achieved by inputting the remote sensing images to be detected into the twin neural network model S.
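The iterative cycle described above (predict, screen, mix, retrain, until no unlabeled samples remain) can be sketched schematically; every function here is a stand-in, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def train(model, labeled):
    """Placeholder trainer: returns an 'updated' model value."""
    return model + 0.1 * len(labeled)

def predict_confidence(model, sample):
    """Placeholder per-sample confidence score."""
    return rng.uniform(0.5, 1.0)

labeled = [("x1", "x2", "y")]                 # small labeled set
unlabeled = [f"u{i}" for i in range(5)]       # unlabeled pool
model, a = 0.0, 0.75                          # model state, confidence threshold
while unlabeled:
    model = train(model, labeled)             # retrain on current mixed set
    kept = [u for u in unlabeled if predict_confidence(model, u) > a]
    if not kept:                              # guarantee progress each cycle
        kept = unlabeled[:1]
    labeled += [(u, u, "pseudo") for u in kept]   # attach pseudo labels
    unlabeled = [u for u in unlabeled if u not in kept]
```

The loop terminates exactly when every unlabeled sample has been promoted into the (pseudo-)labeled pool, mirroring the stopping condition of the embodiment.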
The data set used for training and testing in this embodiment is the LEVIR-CD data set (LEVIR building Change Detection dataset). The LEVIR-CD data set contains 637 groups of data, each group containing two images of different time phases, each image being 1024 × 1024 pixels in size. Because of the large image size, each image was cropped into 16 non-overlapping 256 × 256 images, 10192 images in total. 10% of the data set was selected as the training set; within that training set, 10% of the samples were used as the labeled training set and the remaining 90% as the unlabeled training set.
To verify the method of this embodiment, the proposed method is compared with 5 existing methods: the Bitemporal Image Transformer (BIT), the Dual Attention Network (DANet), the Spatial-Temporal Attention Neural Network (STANet), the W Network (W-Net), and the fully convolutional early-fusion network with residuals (FC-EF-Res). These 5 supervised methods all use 10% of the LEVIR-CD data set as the labeled training set; the final experimental results are shown in Table 1, FIG. 6, FIG. 7 and FIG. 8.
Table 1: comparison result of the method of the embodiment and the existing full-supervision method
[The contents of Table 1 are provided as an image in the original document and are not reproduced here.]
Referring to Table 1, four evaluation indexes are used: Precision, Recall, F1 score (F1-score), and Overall Accuracy (OA), where the F1 score considers precision and recall jointly. Table 1 shows that, although the fully supervised methods use 10% of the LEVIR-CD data set as a labeled training set, the method of this embodiment performs semi-supervised training using only 10% of the labels in that training set, yet achieves results comparable to full supervision, outperforming the four fully supervised methods DANet, STANet, W-Net and FC-EF-Res on the three evaluation indexes of precision, F1 score and overall accuracy. It is worth noting that the recall of the BIT and STANet methods is slightly higher than that of this embodiment: both adopt an attention mechanism for feature learning, and although most changed pixels are detected correctly, a small portion of unchanged pixels are also detected as the changed class, which raises recall. On the F1 score, the method of this embodiment outperforms all 5 fully supervised comparison methods.
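The four evaluation indexes can be computed from the binary confusion matrix as follows (a standard sketch, not taken from the patent; the toy arrays are illustrative):

```python
import numpy as np

def change_metrics(y_true, y_pred):
    """Precision, recall, F1 and overall accuracy for a binary change map."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, oa

y_true = np.array([1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1, 1])
prec, rec, f1, oa = change_metrics(y_true, y_pred)
```

Note that predicting more pixels as changed raises recall at the cost of precision, which is the effect discussed above for BIT and STANet.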
FIG. 6, FIG. 7 and FIG. 8 compare three groups of change detection results for three different scenes between this embodiment and the existing methods. The T1 and T2 images are a pair of input remote sensing images of different time phases; the label is a pixel-level binary label annotated by the data set authors, in which white pixels (value 1) denote the changed class and black pixels (value 0) denote the unchanged class; BIT, DANet, STANet, W-Net and FC-EF-Res denote the change detection results of the five existing methods, followed by the change detection result of this embodiment. As FIGS. 6 to 8 show, the detection results of this embodiment have clear edge contours and markedly fewer missed detections. The method therefore completes change detection with high precision using only a small number of labels, improves the accuracy of model edge detection, and thereby improves change detection performance.
In summary, in the method of this embodiment, model training is first performed on a small labeled training set to obtain a labeled training model; a Monte Carlo dropout layer is added to the labeled training model, and multiple samplings yield multiple sets of model parameters; multi-model prediction on the unlabeled training set yields the prediction probabilities of the unlabeled samples; uncertainty estimates are obtained by cooperating the multiple prediction results, and part of the unlabeled samples are screened out according to a set confidence threshold and assigned pseudo labels; the significant change regions of the labeled training set are cut into the pseudo-label training set, combining real labels and pseudo labels into an enhanced pseudo-label data set; and a loss function designed from the uncertainty estimates is used to train on the combined labeled and pseudo-label data sets, yielding a multi-model collaboratively optimized change detection model. The method can perform semi-supervised change detection when labeled training data are limited, optimizes the network by cooperating multiple prediction results to learn more reliable pseudo labels, and combines labeled and pseudo-labeled images so that supervised learning improves the semi-supervised learning process, further improving model performance and effectively addressing the problems of limited labeled samples in high-resolution remote sensing data and unclear contours in change detection results.
In addition, the embodiment also provides a multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method.
In addition, the embodiment also provides a computer-readable storage medium in which a computer program is stored, the computer program being executed by computer equipment to implement the steps of the foregoing multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiments, and all technical solutions that belong to the idea of the present invention belong to the scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method is characterized by comprising the following steps of training a twin neural network model S for high-resolution remote sensing image change detection:
1) respectively obtaining labeled and unlabeled remote sensing images to generate a labeled training set X_L and an unlabeled training set X_U;
2) training the twin neural network model S on the labeled training set X_L to obtain a labeled training model S(X_L, θ), where θ is a network parameter;
3) adding a Monte Carlo dropout layer to the labeled training model S(X_L, θ), sampling multiple times to obtain multiple sets of model parameters, and performing multi-model cooperative prediction on the unlabeled training set X_U to obtain, for any i-th unlabeled training sample in X_U, the prediction probability p_i^U and binary prediction result y_i^U of the labeled training model S(X_L, θ); obtaining an uncertainty estimate D_i by cooperating the multiple prediction results; calculating the confidence of the prediction probability p_i^U of the unlabeled training sample i; screening unlabeled training samples according to a set confidence threshold a and taking their binary prediction results y_i^U as corresponding pseudo labels; and obtaining a pseudo-label training set X'_U from the screened unlabeled training samples i, the corresponding pseudo labels and the uncertainty estimation results D_i;
4) computing the significant change region of the labeled training set X_L in the attention map of the twin neural network model S, selecting it and cutting it into the pseudo-label training set X'_U, thereby obtaining an enhanced pseudo-label data set X_M combining real labels and pseudo labels;
5) combining the labeled training set X_L and the enhanced pseudo-label data set X_M into a mixed training set X, training the twin neural network model S on the mixed training set X to obtain updated network parameters θ', and obtaining the semi-supervised twin neural network model S(X, θ');
6) judging whether a preset ending condition is satisfied; if not, jumping to step 3) to continue iterative training and update the network parameter θ' of the twin neural network model S; otherwise, judging that the training of the twin neural network model S is complete.
2. The multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to claim 1, wherein the labeled training set X_L generated in step 1) contains N_L labeled training samples, each comprising two labeled remote sensing images of different time phases and a corresponding pixel-level label; and the unlabeled training set X_U contains N_U unlabeled training samples, each comprising two remote sensing images of different time phases.
3. The multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to claim 2, wherein the twin neural network model S comprises two branch encoders and a decoding and merging module; the two branch encoders share parameters; each branch encoder comprises two encoding modules and three attention-equipped encoding modules connected in sequence; the decoding and merging module comprises three attention-equipped decoding modules, two decoding modules and a convolution layer connected in sequence; the output end of one of the two branch encoders is connected with the input end of the decoding and merging module, and the absolute value of the difference of the feature maps output by same-level encoding modules, or attention-equipped encoding modules, in the two branch encoders is connected to the output end of the corresponding decoding module, or attention-equipped decoding module, in the decoding and merging module.
4. The multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to claim 1, wherein when the twin neural network model S is trained on the labeled training set X_L in step 2), each round of training comprises: inputting the labeled training samples of the round into the twin neural network model S to obtain a prediction probability p_i^L, and back-propagating through the twin neural network model S according to a preset loss function L_S to obtain the network parameters θ of the twin neural network, the preset loss function L_S having the functional expression:
L_S = -[ω·y_i^L·log(p_i^L) + (1-y_i^L)·log(1-p_i^L)]
wherein ω is the ratio of the number of changed-class samples to the number of unchanged-class samples in the labeled training set, y_i^L is the pixel-level label corresponding to the labeled training sample, and p_i^L is the prediction probability produced by the twin neural network model S.
5. The multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to claim 1, wherein the functional expression for calculating the confidence of the prediction results y^U in step 3) is:
γ_i = [L_p(p_{i,j}^U) + L_n(p_{i,j}^U)] / N_P,  i = 1, 2, …, N_U
wherein γ_i is the confidence of the i-th unlabeled training sample, N_P is the number of pixels of the i-th unlabeled training sample, L_p(p_{i,j}^U) denotes the number of pixels for which p_{i,j}^U is greater than a preset upper confidence threshold γ_p, L_n(p_{i,j}^U) denotes the number of pixels for which p_{i,j}^U is less than a preset lower confidence threshold γ_n, p_{i,j}^U is the prediction probability of pixel j in the i-th unlabeled training sample of the prediction probability p^U, and N_U is the number of samples in the unlabeled training set X_U.
6. The multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to claim 1, wherein the functional expression for obtaining the uncertainty estimate D_i by cooperating multiple prediction results in step 3) is:
D_i = (1/N_pred) · Σ_{n=1}^{N_pred} (y_{i,n}^U − E(y_{i,n}^U))²
wherein N_pred is the number of models generated using Monte Carlo dropout, y_{i,n}^U is the binary prediction result generated at the n-th prediction for the i-th unlabeled training sample, and E(y_{i,n}^U) is the average of the binary prediction results generated by the multi-model predictions for the i-th unlabeled training sample.
7. The multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to claim 1, wherein the step 4) comprises the following steps:
4.1) computing the significant change region of the labeled training set X_L in the attention map of the twin neural network model S;
4.2) selecting the significant change region and generating from it a binary mask matrix M of the same size as the original labeled training sample;
4.3) selecting the significant change region and, on the basis of the mask, cutting it into the pseudo-label training set X'_U to obtain an enhanced pseudo-label data set X_M combining real labels and pseudo labels:
x_{1,i}^M = M ⊙ x_{1,i}^L + (1 − M) ⊙ x_{1,i}^U
x_{2,i}^M = M ⊙ x_{2,i}^L + (1 − M) ⊙ x_{2,i}^U
y_i^M = M ⊙ y_i^L + (1 − M) ⊙ y_i^U
D_i^M = (1 − M) ⊙ D'_i
In the above formulas, X_M = {(x_{1,i}^M, x_{2,i}^M, y_i^M, D_i^M) | i = 1, 2, …, N_{U'}}, wherein x_{1,i}^M and x_{2,i}^M denote the two enhanced pseudo-label remote sensing images of different time phases, y_i^M is the enhanced pixel-level pseudo label, D_i^M is the uncertainty estimation result corresponding to the pseudo label y_i^M, and N_{U'} is the number of samples in the enhanced pseudo-label data set X_M; x_{1,i}^L and x_{2,i}^L are the two labeled remote sensing images of different time phases in the i-th labeled training sample, y_i^L is the label of the i-th labeled training sample, M is the binary mask, x_{1,i}^U and x_{2,i}^U are the two remote sensing images of different time phases in the unlabeled training sample selected from the unlabeled training set X_U, y_i^U is the pseudo label of the selected unlabeled training sample, D'_i is the uncertainty estimation result corresponding to the selected unlabeled training sample, and the ⊙ operator denotes element-wise multiplication.
8. The semi-supervised change detection method for the multi-model collaborative optimization high-resolution remote sensing image according to claim 1, wherein when the twin neural network model S is trained through the hybrid training set X in the step 5), the loss function L is as follows:
L = L_S + L_U
wherein L_S is the preset loss function used when training the twin neural network model S on the labeled training set X_L in step 2), and L_U is an uncertainty-weighted binary cross-entropy loss given by:
L_WBCE = -[ω·y_i^M·log(p_i^M) + (1-y_i^M)·log(1-p_i^M)]
L_U = (1/N_{U'}) · Σ_{i=1}^{N_{U'}} (1 − D_i^M) ⊙ L_WBCE
wherein D_i^M is the i-th uncertainty estimation result in the mixed training set X, L_WBCE is an intermediate variable, ω is the ratio of the number of changed-class samples to the number of unchanged-class samples in the labeled training set, y_i^M is the pseudo label corresponding to the i-th training sample in the mixed training set X, and p_i^M is the prediction probability corresponding to the i-th training sample in the mixed training set X.
9. A multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection system, which comprises a microprocessor and a memory which are connected with each other, and is characterized in that the microprocessor is programmed or configured to execute the steps of the multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to any one of claims 1-8.
10. A computer-readable storage medium in which a computer program is stored, the computer program being executed by computer equipment to implement the steps of the multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method according to any one of claims 1 to 8.
CN202210461115.5A 2022-04-28 2022-04-28 Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system Pending CN114743109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210461115.5A CN114743109A (en) 2022-04-28 2022-04-28 Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210461115.5A CN114743109A (en) 2022-04-28 2022-04-28 Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system

Publications (1)

Publication Number Publication Date
CN114743109A true CN114743109A (en) 2022-07-12

Family

ID=82283749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210461115.5A Pending CN114743109A (en) 2022-04-28 2022-04-28 Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system

Country Status (1)

Country Link
CN (1) CN114743109A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482418A (en) * 2022-10-09 2022-12-16 宁波大学 Semi-supervised model training method, system and application based on pseudo negative label
CN116778239A (en) * 2023-06-16 2023-09-19 酷哇科技有限公司 Instance segmentation model-oriented semi-supervised training method and equipment
CN118015476A (en) * 2024-04-09 2024-05-10 南京理工大学 Railway external environment remote sensing change detection method and system based on low-parameter neural network and standardized flow

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022042002A1 (en) * 2020-08-31 2022-03-03 华为技术有限公司 Training method for semi-supervised learning model, image processing method, and device
CN114359603A (en) * 2022-02-18 2022-04-15 西北工业大学 Self-adaptive unsupervised matching method in multi-mode remote sensing image field

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022042002A1 (en) * 2020-08-31 2022-03-03 华为技术有限公司 Training method for semi-supervised learning model, image processing method, and device
CN114359603A (en) * 2022-02-18 2022-04-15 西北工业大学 Self-adaptive unsupervised matching method in multi-mode remote sensing image field

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YAN Chunman; WANG Cheng: "Development and Application of Convolutional Neural Network Models", Journal of Frontiers of Computer Science and Technology, 20 September 2020 (2020-09-20) *
LI Shutao; LI Congyu; KANG Xudong: "Development Status and Future Prospects of Multi-source Remote Sensing Image Fusion", Journal of Remote Sensing, 25 January 2021 (2021-01-25) *
WANG Zhiyou; LI Huan; LIU Zizeng; WU Jiamin; SHI Zuxian: "Satellite Image Change Monitoring Based on Deep Learning Algorithms", Computer Systems & Applications, no. 01, 15 January 2020 (2020-01-15) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482418A (en) * 2022-10-09 2022-12-16 宁波大学 Semi-supervised model training method, system and application based on pseudo negative label
CN115482418B (en) * 2022-10-09 2024-06-07 北京呈创科技股份有限公司 Semi-supervised model training method, system and application based on pseudo-negative labels
CN116778239A (en) * 2023-06-16 2023-09-19 酷哇科技有限公司 Instance segmentation model-oriented semi-supervised training method and equipment
CN116778239B (en) * 2023-06-16 2024-06-11 酷哇科技有限公司 Instance segmentation model-oriented semi-supervised training method and equipment
CN118015476A (en) * 2024-04-09 2024-05-10 南京理工大学 Railway external environment remote sensing change detection method and system based on low-parameter neural network and standardized flow
CN118015476B (en) * 2024-04-09 2024-06-11 南京理工大学 Railway external environment remote sensing change detection method and system based on low-parameter neural network and standardized flow

Similar Documents

Publication Publication Date Title
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN114743109A (en) Multi-model collaborative optimization high-resolution remote sensing image semi-supervised change detection method and system
Shi et al. License plate recognition system based on improved YOLOv5 and GRU
CN111369572A (en) Weak supervision semantic segmentation method and device based on image restoration technology
CN111104903A (en) Depth perception traffic scene multi-target detection method and system
CN112597815A (en) Synthetic aperture radar image ship detection method based on Group-G0 model
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN116051840A (en) Semi-supervised underwater image semantic segmentation method based on generation of countermeasure learning
Yin et al. Attention-guided siamese networks for change detection in high resolution remote sensing images
JP2024513596A (en) Image processing method and apparatus and computer readable storage medium
CN116563680B (en) Remote sensing image feature fusion method based on Gaussian mixture model and electronic equipment
CN113657414B (en) Object identification method
CN115880529A (en) Method and system for classifying fine granularity of birds based on attention and decoupling knowledge distillation
Kolbeinsson et al. Multi-class segmentation from aerial views using recursive noise diffusion
Banerjee et al. Explaining deep-learning models using gradient-based localization for reliable tea-leaves classifications
CN112927266A (en) Weak supervision time domain action positioning method and system based on uncertainty guide training
Wang et al. Self-supervised learning for high-resolution remote sensing images change detection with variational information bottleneck
Cui et al. Global context dependencies aware network for efficient semantic segmentation of fine-resolution remoted sensing images
CN112348011B (en) Vehicle damage assessment method and device and storage medium
Cui et al. CM-Unet: A novel remote sensing image segmentation method based on improved U-Net
Ung et al. Leverage Samples with Single Positive Labels to Train CNN-based Models For Multi-label Plant Species Prediction.
CN116343063B (en) Road network extraction method, system, equipment and computer readable storage medium
CN116385466B (en) Method and system for dividing targets in image based on boundary box weak annotation
Li et al. Boundary discrimination and proposal evaluation for temporal action proposal generation
CN116030347B (en) High-resolution remote sensing image building extraction method based on attention network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination