CN116363516A - Remote sensing image change detection method based on edge-assisted adaptation - Google Patents

Remote sensing image change detection method based on edge-assisted adaptation

Info

Publication number
CN116363516A
CN116363516A
Authority
CN
China
Prior art keywords
change detection
edge
module
layer
EATDer
Prior art date
Legal status
Pending
Application number
CN202310339916.9A
Other languages
Chinese (zh)
Inventor
唐旭 (Tang Xu)
段钧益 (Duan Junyi)
马晶晶 (Ma Jingjing)
张向荣 (Zhang Xiangrong)
焦李成 (Jiao Licheng)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202310339916.9A
Publication of CN116363516A
Legal status: Pending

Classifications

    • G06V 20/10: Scenes; scene-specific elements; terrestrial scenes
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/084: Learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image change detection method based on edge assistance and adaptation, implemented in the following steps: construct the edge-assisted adaptive change detection network EATDer; generate the corresponding training set and verification set; train EATDer with the training set and verification set; and detect changes in remote sensing images. Using the SAVT and FRFM modules built by the invention, the global features of remote sensing images can be extracted at reasonable computational cost and the influence of the dataset's imbalance between positive and negative samples can be controlled; using the change detection module that emphasizes edge information together with the joint loss function, the network can fully attend to the edge detection accuracy of the targets to be detected, solving the problems of high computational cost, frequent edge adhesion, and low accuracy in prior art methods.

Description

Remote sensing image change detection method based on edge-assisted adaptation
Technical Field
The invention belongs to the technical field of image processing, and further relates to a remote sensing image change detection method based on edge-assisted adaptation in the technical field of image detection. The method can be used for bi-temporal change detection of remote sensing images acquired for land cover analysis and disaster monitoring.
Background
Change detection is the operation of quantitatively analyzing surface changes of a phenomenon or object between two different periods. Change detection in remote sensing images plays a vital role in the remote sensing field and, owing to its important application value, is receiving more and more attention. The main technical scheme for remote sensing image change detection is the supervised scheme based on deep learning: with the advent of convolutional neural network (CNN) technology, deep learning methods were gradually applied to change detection, and a CNN can further explore the semantic features of a remote sensing image through its layered structure and layered learning, thereby generating an accurate change detection feature map. A difficult problem remains, however: a CNN cannot capture the long-distance context information hidden in a remote sensing image, and this shortcoming limits, to some degree, the detection accuracy of change detection networks built on a pure CNN structure. In view of this limitation, researchers introduced Transformer technology into the remote sensing field. Thanks to its global receptive field, the Transformer compensates well for the CNN's weak extraction of long-distance information, but Transformer-based methods still leave room for improvement. First, because the land cover types in remote sensing images are diverse and appear at many scales, the changed areas in multi-temporal remote sensing images are generally irregular, and conventional Transformer-based models do not detect boundaries accurately, which negatively impacts the remote sensing change detection task. Second, current Transformer-based models focus only on the remote sensing images themselves while ignoring the temporal cues between images acquired at different times, cues that are also of great importance to remote sensing change detection. Finally, Transformer-based models are always cumbersome because of the self-attention and multi-head mechanisms, which limits the efficiency of the remote sensing change detection process.
China Star Map Measurement and Control Technology (Hefei) Co., Ltd. proposed a remote sensing change detection method in the patent document "An algorithm and system for remote sensing change detection" (patent application number: 202210941062.7, application publication number: CN 115019186 A). The method extracts and concatenates features from images of different time phases, performs feature extraction with a pyramid module based on a CNN structure, and finally feeds the fused feature map into FCN-head and SPP-head modules to obtain the final output. Although this method tries to improve the network's ability to extract context information with a new structure, it still has two shortcomings. First, limited by the insufficient design of its CNN-structured pyramid module, its global feature extraction capability is limited, so the network suffers false and missed detections, which further affects detection accuracy. Second, because neither its feature extraction nor its network design attends to edge information, its edge detection accuracy is insufficient, and adhesion among the edges of different detected targets is still erroneously produced in large numbers.
Harbin Engineering University proposed a remote sensing change detection method in the patent document "A remote sensing image change detection method based on a multi-scale semantic token Transformer network" (patent application number: 202211026042.3, application publication number: CN 115393317 A). The method adopts a feature encoder-decoder to extract feature maps, uses a multi-scale semantic token encoder to convert the feature maps of different scales from the feature encoder into semantic tokens of different lengths, and then sends the semantic tokens into a Transformer to obtain global semantic information. A multi-scale semantic token decoder combines the semantic tokens, rich in semantic information at different scales, with the multi-scale feature maps, rich in spatial information, to obtain joint semantic-spatial features; finally, the multi-scale joint features are aggregated in the feature decoder through skip connections, and a classifier produces the final change result map. The disadvantage of this method is that it directly uses the original Transformer structure, whose multi-head attention computation consumes a great deal of computation and memory resources.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a remote sensing image change detection method based on edge-assisted adaptation, which solves three problems in the prior art: first, the excessive consumption of computing resources by multi-head attention when the original Transformer structure is used directly; second, the limited global feature extraction capability of neural networks with a pure CNN structure; third, the false and missed detections caused by poor edge preservation in change detection results.
The idea of the invention is as follows. For feature map inputs of different sizes and scales, the network adaptively reduces the size and scale of the input feature map, greatly reducing the consumption of computing resources. The invention combines a CNN with a Transformer, using the Transformer structure's capability of extracting global features to compensate for the weakness of a pure CNN structure, and constructs a feature extractor consisting of a self-adaptive vision Transformer feature extraction module SAVT (Self-adaption Vision Transformer) and a cross-branch information interaction module FRFM (Full Range Fusion Module). SAVT first performs feature extraction with a CNN and then sends the extracted features into a Transformer module with adaptive multi-head attention, so that global feature extraction is carried out at a controllable computational cost. The FRFM module builds information exchange between the two branches of the change detection network, strengthens the network's robustness to the uneven distribution of positive and negative samples in the original dataset, and makes it easier for the network to distinguish change detection targets from the background. The invention extracts edge information from the original supervised data labels with a Canny operator; when designing the decoder and the loss function, a Transformer is introduced into the decoder to refine features and improve detection accuracy, and a specially designed loss function sends the original data and the edge data together into the network's back-propagation process, so that the network keeps sufficient, continuous attention on edge information, improving edge preservation in the network's detection results.
The technical scheme of the invention for realizing the purpose comprises the following steps:
step 1, constructing an edge-assisted adaptive change detection network EATDer (Edge-Assisted Adaptive Transformer Detector):
step 1.1, constructing a feature extractor consisting of first, second, and third sub-extractors with the same structure connected in series;
each sub-extractor consists of two SAVT modules with the same structure connected in parallel, followed in series by an FRFM module; after a 256×256 input image passes through the three sub-extractors, three feature maps of sizes 64×64, 32×32 and 16×16 are output;
the SAVT module in each sub-extractor consists of a feature map coding layer and four adaptive multi-head attention SAVT encoders with the same structure; the feature map coding layer is realized by a convolution layer; the feature map dimensions of the first to third sub-extractors are set to 64, 128, 256 respectively, the convolution kernel sizes to 7, 3, 3, the strides to 4, 2, 2, and the paddings to 3, 1, 1; each adaptive multi-head attention SAVT encoder is formed by connecting a Layer Norm layer, an adaptive reduction layer, a multi-head attention layer, a Layer Norm layer, and an MLP layer in series; the adaptive reduction layer applies average pooling and convolution to its input, reducing the length and width: for an input of size H×W×C, the average-pooled output has size $\lceil H/r \rceil \times \lceil W/r \rceil$, where $r$ is a reduction ratio chosen adaptively from the input size and $\lceil \cdot \rceil$ is the upward rounding (ceiling) operation; the convolution kernel size, stride, and padding of the convolution in the adaptive reduction layer are set to 1, 1, 0 respectively, and the number of heads of the multi-head attention in every sub-extractor is fixed to 4;
the FRFM module in each sub-extractor is formed by connecting two FRFM sub-modules A and B with the same structure in parallel; each FRFM sub-module consists of an information interaction branch connected in series with a convolutional block attention module CBAM (Convolutional Block Attention Module), where the information interaction branch receives the two output feature maps {X, Y} produced by the parallel SAVT modules as its input; the information interaction branch in FRFM sub-module A is realized by the following formula:

$$X_{cf} = \mathrm{softmax}\!\left(\frac{(Y W_y^Q)(X W_x^K)^T}{\sqrt{d_{kx}}}\right) X W_x^V$$

where $X_{cf}$ represents the output of the information interaction branch in FRFM sub-module A, $W_y^Q$, $W_x^K$, and $W_x^V$ represent different parameter matrices learnable by the neural network, softmax(·) represents the softmax function, and $d_{kx}$ represents the scaling factor, whose value equals the dimension of $X W_x^K$. The formula of the cross-fusion process of the parallel branch B is expressed as follows:

$$Y_{cf} = \mathrm{softmax}\!\left(\frac{(X W_x^Q)(Y W_y^K)^T}{\sqrt{d_{ky}}}\right) Y W_y^V$$

where $Y_{cf}$ represents the output of the information interaction branch in FRFM sub-module B, $W_x^Q$, $W_y^K$, and $W_y^V$ represent different parameter matrices learnable by the neural network, and $d_{ky}$ represents the scaling factor, whose value equals the dimension of $Y W_y^K$;
step 1.2, establishing a change detection module formed by connecting a recovery sub-module and a refinement sub-module in series;
the recovery sub-module is formed by connecting a 3×3 convolution layer and an up-sampling layer in series; the refinement sub-module is formed by connecting, in sequence, a 3×3 convolution layer, a SAVT module, a 2×2 deconvolution layer, a Batch Norm layer, a 3×3 convolution layer, a SAVT module, a 3×3 convolution layer, a Batch Norm layer, and a 1×1 convolution layer in series;
step 1.3, connecting a feature extractor and a change detection module in series to form an edge auxiliary self-adaptive network EATDer;
step 2, generating a training set and a verification set:
step 2.1, collecting at least 1000 pairs of aligned bi-temporal remote sensing change detection images, cutting them all into 256×256, then creating a binarized label for each change detection image pair, marking pixels containing a change target as positive examples and pixels without a change target as negative examples;
step 2.2, processing each binarized label with a Canny operator to obtain an edge label image of the target label to be detected, the edge labels forming edge label 1, and then dilating edge label 1 with a 3×3 kernel to obtain edge label 2;
step 2.3, applying data enhancement to the bi-temporal remote sensing image pairs, the change detection labels, and edge labels 1 and 2;
step 2.4, dividing all images and labels after data enhancement at a ratio of 4:1, combining the bi-temporal remote sensing image pairs, the change detection labels, and edge label 2 to form training set 1 and verification set 1, and combining the bi-temporal remote sensing image pairs, the change detection labels, and edge label 1 to form training set 2 and verification set 2;
step 3, training EATDer by using the training set and the verification set:
step 3.1, inputting training set 1 into EATDer, using a joint loss function to calculate the loss between the change detection prediction and target edge prediction output by EATDer and the corresponding change detection label and target edge label 2, and iteratively updating the parameters of the EATDer network by gradient descent until the joint loss function converges, obtaining a pre-trained EATDer;
step 3.2, inputting training set 2 into the pre-trained EATDer, using the joint loss function to calculate the loss between the change detection prediction and target edge prediction output by EATDer and the corresponding change detection label and target edge label 1, iteratively updating the parameters of the EATDer network by gradient descent, inputting the verification set into the updated EATDer for validation after each iteration, and taking the network parameters with the highest accuracy among the 100 validation results as the final trained network parameters, obtaining the trained EATDer;
step 4, detecting the change of the remote sensing image:
cutting the aligned remote sensing images to be detected into 256×256 images, inputting them into the trained EATDer network, and taking the output of EATDer as the change detection prediction result.
Compared with the prior art, the invention has the following advantages:
First, through the Transformer-based neural network sub-module SAVT, which adaptively compresses the size of the K and V matrices in the multi-head attention computation according to the size of the input feature map, the invention overcomes the high computational cost of the prior art, so that when detecting changes in remote sensing images the computation and storage cost of multi-head attention is greatly reduced with a controllable loss of information.
Second, the constructed feature extraction module combining the FRFM and SAVT modules mines the global information of the image more fully, overcoming the insufficient feature extraction capability and proneness to false detection of the prior art, and improving the accuracy of remote sensing image change detection.
Third, the invention introduces an edge loss function in the final detection module and uses the Transformer to refine features, overcoming the poor edge preservation of detection results in the prior art, so that the detection results output by the network contain fewer adhesion misjudgments when detecting remote sensing images.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of an EATDer network in the present invention;
FIG. 3 is a schematic diagram of the SAVT module in the EATDer network of the invention;
FIG. 4 is a schematic diagram of the FRFM module in the EATDer network of the present invention;
FIG. 5 is a schematic diagram of the configuration of a change detection module in the EATDer network of the present invention;
FIG. 6 is a schematic representation of an edge dataset generated by the present invention;
fig. 7 is a graph of simulation results of remote sensing image change detection using the present invention and the conventional detection method, respectively.
Detailed Description
The invention will now be described in further detail with reference to the drawings and examples.
The implementation steps of the embodiment of the present invention will be described in further detail with reference to fig. 1.
Step 1, construct the edge-assisted adaptive change detection network EATDer.
Step 1.1, referring to fig. 2, construct a feature extractor formed by connecting first, second, and third sub-extractors with the same structure in series.
Each sub-extractor consists of two SAVT modules with the same structure connected in parallel, followed in series by an FRFM module. After a 256×256 input image passes through the three sub-extractors, three feature maps of different scales (64×64, 32×32, 16×16) are output, improving the network's multi-scale feature extraction capability; the extracted multi-scale feature maps are then sent to the change detection module.
The sub-module SAVT of the constructed EATDer network is further described with reference to fig. 3. The module comprises a feature map coding step and four layers of SAVT encoders; the feature coding step is realized by convolution, and each SAVT encoder layer comprises a first half formed by a Layer Norm layer, an adaptive reduction layer, and a multi-head attention layer, and a second half formed by a Layer Norm layer and an MLP layer.

For convenience of explanation, the input and output of the whole SAVT module are described first, and then the data processing flow inside the SAVT encoder is described in detail.

Assume the feature map input to a SAVT module has size $h \times w \times c$. The feature map is encoded by a convolution with $c'$ kernels, where $c'$ is set to 64, 128, 256 in the three sub-extractors respectively, and the convolution kernel size, stride, and padding are $2s-1$, $s$, and $s-1$, with $s$ set to 4, 2, 2 according to the order of the sub-extractors. This produces patches of size $\frac{h}{s} \times \frac{w}{s} \times c'$, which are then reshaped into a token sequence of size $\frac{hw}{s^2} \times c'$ and input to the SAVT encoders for processing. After processing by the SAVT encoders, the module's output feature map $F_{out} \in \mathbb{R}^{\frac{h}{s} \times \frac{w}{s} \times c'}$ is obtained; this feature map $F_{out}$ is sent to the FRFM module for further processing.
The input and output feature maps of every SAVT encoder have the same size. For ease of understanding, the data processing in the first of the four encoder layers is described in detail; the remaining encoder layers use the same processing. The first SAVT encoder performs the following operations on its input data $P_1'$:

$$\hat{P}_1 = \mathrm{SAMSA}(\mathrm{LN}(P_1')) + P_1'$$
$$P_1^{out} = \mathrm{MLP}(\mathrm{LN}(\hat{P}_1)) + \hat{P}_1$$

where LN(·), SAMSA(·), and MLP(·) represent the functions of the Layer Norm layer, the adaptive multi-head attention layer, and the MLP layer respectively, and $P_1^{out}$ represents the encoded features. The SAMSA part mainly comprises an adaptive reduction and a conventional multi-head attention. In this process, the Layer Norm-processed $P_1'$ is first copied three times, generating the pending data $Q_1$, $K_1$, and $V_1$; the adaptive reduction is then used to process $K_1$ and $V_1$.

In the adaptive reduction, $K_1$ is first reshaped into a two-dimensional feature map and processed with average pooling and a 1×1 convolution:

$$\hat{K}_1 = \mathrm{Conv}_{1\times1}(\mathrm{AvgPool}(K_1))$$

The purpose of the average pooling is to reduce the computation, and the purpose of the 1×1 convolution is to reduce the information loss of the feature map. Through a reshaping operation, the reduced data $\hat{K}_1$ is obtained; similarly, the reduced data $\hat{V}_1$ of $V_1$ can be obtained. The multi-head attention mechanism is then applied to the newly obtained data $(Q_1, \hat{K}_1, \hat{V}_1)$.
The multi-head attention mechanism is expressed as follows:

$$\mathrm{SAMSA}(Q, \hat{K}, \hat{V}) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\, W^O$$
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, \hat{K} W_i^K, \hat{V} W_i^V)$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V$$

where $W^O$, $W_i^Q$, $W_i^K$, $W_i^V$ represent learnable parameter matrices, and Concat(·), Attention(·), softmax(·) represent dimension concatenation, self-attention calculation, and the softmax function respectively. $\sqrt{d_k}$ is a scaling factor, and the value of $d_k$ equals the dimension of K. The number of heads of the multi-head attention in the embodiment of the present invention is fixed to 4, i.e., n = 4. This algorithm has a lower computational complexity than standard multi-head attention.
Quantitatively, the time complexities of the original multi-head attention and the reduced multi-head attention can be compared as follows:

$$\Omega(\mathrm{MSA}) = 4HWC^2 + 2(HW)^2 C$$
$$\Omega(\mathrm{SAMSA}) = 2HWC^2 + 2N'C^2 + 2\,HW \cdot N' C$$

where Ω(MSA) represents the time complexity of standard multi-head attention, Ω(SAMSA) represents the time complexity of the adaptive multi-head attention in the present invention, and $N' = \lceil H/r \rceil \cdot \lceil W/r \rceil < HW$ is the reduced number of key/value tokens after the adaptive reduction (the cost of the average pooling and the 1×1 convolution is lower-order). Because $N' < HW$, the dominant quadratic attention term shrinks by a factor of $HW/N'$.
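As a concrete numerical illustration under the assumption r = 2: for a feature map with H = W = 64 and C = 64, the quadratic attention term of standard MSA costs 2·(64·64)²·64 ≈ 2.1×10⁹ multiply-accumulate operations, while the reduced term costs 2·(64·64)·(32·32)·64 ≈ 5.4×10⁸, i.e., the dominant term shrinks by a factor of HW/N' = 4.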
Referring to fig. 4, the FRFM sub-module comprises a cross-fusion module and a CBAM module, and is the main module for completing the information interaction between the two branches of the twin network.

A schematic diagram of the cross-fusion sub-module of FRFM is shown in fig. 4(b). When the outputs $\{X, Y\}$ from the SAVT modules of the two different branches are input into FRFM, the cross-fusion sub-module first fuses the information of the two branches to generate $\{X_{cf}, Y_{cf}\}$. For ease of understanding, for input X this process can be expressed as:

$$X_{cf} = \mathrm{softmax}\!\left(\frac{Q_y K_x^T}{\sqrt{d_{kx}}}\right) V_x$$

where $X_{cf}$ represents the output of the information interaction branch in FRFM sub-module A, $\sqrt{d_{kx}}$ is a scaling factor whose value $d_{kx}$ equals the dimension of $K_x$, and $Q_y$, $K_x$, $V_x$ are defined as:

$$Q_y = Y W_y^Q, \qquad K_x = X W_x^K, \qquad V_x = X W_x^V$$

where $W_y^Q$, $W_x^K$, and $W_x^V$ are learnable weights.

Similarly, the formula of the cross-fusion process for Y is as follows:

$$Y_{cf} = \mathrm{softmax}\!\left(\frac{Q_x K_y^T}{\sqrt{d_{ky}}}\right) V_y, \qquad Q_x = X W_x^Q, \quad K_y = Y W_y^K, \quad V_y = Y W_y^V$$

where $Y_{cf}$ represents the output of the information interaction branch in FRFM sub-module B, $W_x^Q$, $W_y^K$, and $W_y^V$ represent different learnable parameter matrices, and $\sqrt{d_{ky}}$ is a scaling factor whose value $d_{ky}$ equals the dimension of $K_y$.
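The cross-fusion branch is, in effect, a cross-attention in which the query comes from the opposite branch. A minimal PyTorch sketch follows; the module and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossFusion(nn.Module):
    """One FRFM information-interaction branch: query from the other branch."""
    def __init__(self, dim: int):
        super().__init__()
        self.wq_y = nn.Linear(dim, dim)   # W_y^Q applied to the other branch
        self.wk_x = nn.Linear(dim, dim)   # W_x^K
        self.wv_x = nn.Linear(dim, dim)   # W_x^V
        self.scale = dim ** -0.5          # 1 / sqrt(d_kx)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x, y: (B, N, C) token sequences from the two SAVT branches
        q_y = self.wq_y(y)                # queries come from branch Y
        k_x = self.wk_x(x)
        v_x = self.wv_x(x)
        attn = torch.softmax(q_y @ k_x.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v_x                 # X_cf; Y_cf uses the mirrored module
```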
The outputs $X_{cf}$ and $Y_{cf}$ of the cross-fusion module are input to the CBAM module, which connects channel attention and spatial attention in series. For $X_{cf}$, CBAM derives a channel attention score $M_c^x \in \mathbb{R}^{1 \times 1 \times C}$ and a spatial attention score $M_s^x \in \mathbb{R}^{H \times W \times 1}$ through mean pooling, max pooling, an MLP, and convolution in sequence, then multiplies the obtained scores back onto $X_{cf}$ to generate the refined feature data $\hat{X}$, as follows:

$$X' = M_c^x \otimes X_{cf}, \qquad \hat{X} = M_s^x \otimes X'$$

where ⊗ represents element-wise multiplication (with broadcasting). Likewise, the CBAM procedure for $Y_{cf}$ is:

$$Y' = M_c^y \otimes Y_{cf}, \qquad \hat{Y} = M_s^y \otimes Y'$$

where $M_c^y$ and $M_s^y$ represent the channel attention score and the spatial attention score for $Y_{cf}$. By mining this salient information, the changed/unchanged information in the remote sensing images can be highlighted.
Combination of the feature extraction modules: the twin encoder of the present invention consists of three successive stages, each containing two SAVT blocks and one FRFM. A bi-temporal remote sensing image pair $\{I_A, I_B\}$ of size H×W×3 is input into the encoder; the two SAVT blocks of the first stage map it into feature maps $\{F_1^A, F_1^B\}$ of size $\frac{H}{4} \times \frac{W}{4} \times C_1$, and $\{F_1^A, F_1^B\}$ is fed into the FRFM to generate the enhanced features $\{\hat{F}_1^A, \hat{F}_1^B\}$. After the second and third stages, $\{\hat{F}_2^A, \hat{F}_2^B\}$ of size $\frac{H}{8} \times \frac{W}{8} \times C_2$ and $\{\hat{F}_3^A, \hat{F}_3^B\}$ of size $\frac{H}{16} \times \frac{W}{16} \times C_3$ are obtained in order. The channels $C_1, C_2, C_3$ are set to 64, 128, 256 respectively. The feature maps obtained through the feature extraction module not only cover the complex contents of various scales in the remote sensing images but also contain much temporal information. The enhanced features are then fed to the change detection module for change detection.
Step 1.2, establish the change detection module.
Referring to fig. 5, the recovery sub-module consists of a 3×3 convolution and an up-sampling, whose main purpose is to further fuse and splice the multi-scale features extracted by the feature extractor. This process can be formulated as follows:

$$O_i' = \mathrm{Conv}_{3\times3}(O_i), \quad i = 1, 2, 3$$
$$O_m = \mathrm{Concat}\big(\mathrm{Up}(O_1'), \mathrm{Up}(O_2'), \mathrm{Up}(O_3')\big)$$

where $O_1, O_2, O_3$ represent the outputs of the first, second, and third sub-extractors, $O_i'$ represents the output feature map after feature fusion by convolution, $O_m$ represents the output feature map of the recovery sub-module, and $\mathrm{Conv}_{3\times3}(\cdot)$ and $\mathrm{Up}(\cdot)$ represent the 3×3 convolution and up-sampling operations respectively, with all scales up-sampled to a common resolution before concatenation.
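A minimal sketch of the recovery sub-module under the sizes given earlier (64×64, 32×32, 16×16 inputs with 64, 128, 256 channels); the common 64×64 target resolution and the bilinear upsampling mode are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Recovery(nn.Module):
    def __init__(self, chans=(64, 128, 256)):
        super().__init__()
        # One 3x3 fusion convolution per scale.
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, c, kernel_size=3, padding=1) for c in chans])

    def forward(self, feats):
        # feats: [O1 (B,64,64,64), O2 (B,128,32,32), O3 (B,256,16,16)]
        ups = [F.interpolate(conv(o), size=(64, 64), mode='bilinear',
                             align_corners=False)
               for conv, o in zip(self.convs, feats)]
        return torch.cat(ups, dim=1)      # O_m: (B, 448, 64, 64)
```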
Referring to fig. 5, the refinement sub-module consists of four 3×3 convolutions, one 1×1 convolution, two refinement SAVT modules (each containing only one SAVT encoder layer), two deconvolutions, and three Layer Norm layers. The refinement sub-module is designed to improve the feature map $O_m$ for change detection, further highlighting the change information and temporal cues hidden in $O_m$.

In the refinement sub-module, the invention captures local knowledge through convolution, comprehensively analyzes global context cues through the two SAVT modules, and introduces deconvolution to complete the up-sampling. $O_m$ first passes through a 3×3 convolution and a SAVT module to obtain $O_m'$; then $O_m'$ is fed in sequence into two deconvolution blocks combined with Layer Norm layers to obtain $O_m''$; finally, two further 3×3 convolutions, one 1×1 convolution, and one refinement SAVT module are applied to $O_m''$ to obtain the feature map $M_s$, which is then processed with softmax.
In order to improve edge detection accuracy, the invention adds an edge detection task so that the model can fully perceive edges, thereby improving detection performance. Along the channel dimension, $M_s$ is divided into two parts: a change prediction result $M_c$ and an edge prediction result $M_e$. The joint loss function is applied to narrow the gap between the change detection prediction and the target edge prediction and the corresponding change detection label and target edge label. Considering that the change detection task is a binary classification task, the binary cross-entropy loss is selected as the basic unit of the joint loss function.
The joint loss function is defined as follows:

$$L = L_{bce}(M_c, G_c) + \lambda\, L_{bce}(M_e, G_e), \qquad L_{bce}(M, G) = -\big[G \log M + (1 - G)\log(1 - M)\big]$$

where λ represents a hyper-parameter with a value of 0.3, log(·) represents the logarithmic operation with base 10, $G_c$ represents the change detection label value, $M_c$ represents the change detection prediction output by the network, $G_e$ represents the edge detection label value, and $M_e$ represents the edge detection prediction output by the network; the loss is averaged over all pixels.
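A hedged sketch of the joint loss in PyTorch; note that binary_cross_entropy uses the natural logarithm, which differs from the base-10 logarithm above only by a constant factor that can be absorbed into the learning rate.

```python
import torch
import torch.nn.functional as F

def joint_loss(m_c: torch.Tensor, g_c: torch.Tensor,
               m_e: torch.Tensor, g_e: torch.Tensor,
               lam: float = 0.3) -> torch.Tensor:
    # m_c, m_e: predicted change / edge probability maps in [0, 1]
    # g_c, g_e: binary change / edge labels of the same shape
    return (F.binary_cross_entropy(m_c, g_c)
            + lam * F.binary_cross_entropy(m_e, g_e))
```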
Step 2, generate the training set and verification set.
Step 2.1, in the embodiment of the invention, 1000 pairs of aligned bi-temporal remote sensing change detection images are collected and all cut into 256×256; a binarized label is then created for each change detection image pair, marking pixels with a change target as positive examples and pixels without a change target as negative examples.
Step 2.2, each binarized label is processed with a Canny operator to obtain an edge label image of the target label to be detected; these edge labels form edge label 1, and edge label 1 is then dilated with a 3×3 kernel to obtain edge label 2. The process is shown in fig. 6.
Step 2.3, data enhancement is applied to the bi-temporal remote sensing image pairs, the change detection labels, and edge labels 1 and 2.
Step 2.4, all images and labels after data enhancement are divided at a ratio of 4:1; the bi-temporal remote sensing image pairs, change detection labels, and edge label 2 form training set 1 and verification set 1, and the bi-temporal remote sensing image pairs, change detection labels, and edge label 1 form training set 2 and verification set 2.
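Step 2.2 can be illustrated with OpenCV as follows; the Canny hysteresis thresholds (100, 200) are assumptions, since the patent does not specify them.

```python
import cv2
import numpy as np

def make_edge_labels(change_label: np.ndarray):
    # change_label: (256, 256) uint8 binary mask, 0 = negative, 255 = positive
    edge1 = cv2.Canny(change_label, 100, 200)          # edge label 1
    kernel = np.ones((3, 3), np.uint8)                 # 3x3 dilation kernel
    edge2 = cv2.dilate(edge1, kernel, iterations=1)    # edge label 2
    return edge1, edge2
```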
Step 3, train EATDer with the training set and verification set.
Step 3.1, training set 1 is input into EATDer; the joint loss function is used to calculate the loss between the change detection prediction and target edge prediction output by EATDer and the corresponding change detection label and target edge label 2, and the parameters of the EATDer network are iteratively updated by gradient descent with a learning rate of 0.001 until the joint loss function converges, yielding the pre-trained EATDer.
Step 3.2, training set 2 is input into the pre-trained EATDer; the joint loss function is used to calculate the loss between the change detection prediction and target edge prediction output by EATDer and the corresponding change detection label and target edge label 1, and the parameters of the EATDer network are iteratively updated by gradient descent with a learning rate of 0.001. After each iteration, the verification set is input into the updated EATDer for validation, and the network parameters with the highest accuracy among the 100 validation results are taken as the final trained network parameters, yielding the trained EATDer.
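The two-stage schedule of steps 3.1 and 3.2 can be condensed into one helper, sketched below; EATDer's forward signature, the data loader format, and the caller-supplied evaluate metric function are placeholders assumed for illustration.

```python
import copy
import torch

def train_stage(model, loader, val_loader, evaluate, epochs=100, lr=1e-3):
    # One training stage: step 3.1 runs this on (training set 1, edge label 2);
    # step 3.2 re-runs it on (training set 2, edge label 1) from the
    # pre-trained weights, keeping the best-validating checkpoint.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        for img_a, img_b, g_c, g_e in loader:
            m_c, m_e = model(img_a, img_b)      # assumed forward signature
            loss = joint_loss(m_c, g_c, m_e, g_e)  # as sketched above
            opt.zero_grad()
            loss.backward()
            opt.step()
        acc = evaluate(model, val_loader)       # caller-supplied metric
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```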
Step 4, obtain the change result map.
The aligned remote sensing images to be detected are cut into 256×256 images and input into the trained EATDer network; the output of EATDer is the change detection prediction result.
The effects of the present invention can be further illustrated by the following simulations.
1. Simulation conditions.
The hardware platform of the simulation environment is an Intel Xeon 4214R processor with a main frequency of 2.4 GHz and two GeForce RTX 3090 graphics cards with 48 GB of video memory in total.
The software platform of the simulation environment of the invention is Ubuntu 16.04 with Python 3.9.5 and PyTorch 1.13.
The input images used in the simulations of the invention are the LEVIR-CD, CDD, and WHU remote sensing change detection datasets, where:
LEVIR-CD contains 637 remote sensing images acquired from Google Earth with a size of 1024×1024 pixels and a pixel resolution of 0.5 m. LEVIR-CD contains bi-temporal images with a time span of 5 to 14 years and is mainly concerned with various building changes, such as villas and high-rise apartments.
CDD contains 16000 pairs of seasonally varying bi-temporal remote sensing images collected from Google Earth. Their spatial size is 256×256, and the resolution varies from 0.03 m to 1 m. The land cover changes involved in CDD are diverse, including buildings, automobiles, and roads.
WHU contains a single pair of bi-temporal remote sensing images with spatial size 32507×15354 and resolution 0.2 m. The primary objects in the dataset are buildings.
2. Simulation content and result analysis.
Under the above simulation conditions, change detection simulations are carried out on the three datasets using the method of the invention and three existing methods (ICIF-Net, DSAMNet, SNUNet). The results are shown in fig. 7.
In the simulation experiments, the three prior art methods refer to:
The deeply supervised attention metric network DSAMNet change detection method, the remote sensing image change detection method published by Liu Mengxi et al. in IEEE, namely: Liu, Mengxi, and Qian Shi, "DSAMNet: A deeply supervised attention metric based network for change detection of high-resolution images," 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), IEEE, 2021, abbreviated as the DSAMNet method.
The densely connected Siamese network SNUNet change detection method, the remote sensing image change detection method published by Fang Sheng et al. in IEEE, namely: Fang, Sheng, et al., "SNUNet-CD: A densely connected Siamese network for change detection of VHR images," IEEE Geoscience and Remote Sensing Letters (2021): 1-5, abbreviated as the SNUNet method.
The intra-scale cross-interaction and inter-scale feature fusion network ICIF-Net change detection method, the remote sensing image change detection method published by Feng Yuchao et al. in IEEE, namely: Feng, Yuchao, et al., "Intra-scale cross-interaction and inter-scale feature fusion network for bitemporal remote sensing images change detection," IEEE Transactions on Geoscience and Remote Sensing 60 (2022): 1-13, abbreviated as the ICIF-Net method.
The simulation effect of the present invention is further described below with reference to fig. 7.
Fig. 7(a), 7(b), and 7(c) are visualization result graphs of the comparison algorithms on the LEVIR-CD dataset.
Fig. 7(d), 7(e), and 7(f) are visualization result graphs of the comparison algorithms on the CDD dataset.
Fig. 7(g), 7(h), and 7(i) are visualization result graphs of the comparison algorithms on the WHU dataset.
As can be seen from fig. 7(b), the existing DSAMNet method produces a large number of adhered detection results, mainly because the method does not emphasize the edge information of the objects to be detected, which reduces detection accuracy.
As can be seen from fig. 7(g), the existing ICIF-Net and SNUNet methods produce a large number of false detections, mainly because these methods perform no dual-branch information interaction and have insufficient feature extraction capability, mistaking background changes for changes to be detected, so that their detection accuracy falls short of expectations.
The visualization results show that the method of the invention achieves higher detection accuracy, fewer noise points, fewer missed and false detections, and more accurate boundary detection of the changed ground objects, fully illustrating its superiority.
To quantitatively illustrate the performance of the proposed method, the invention selects the numerical performance indices commonly used in change detection tasks to measure the difference between the existing methods and the invention: precision (P), recall (R), F1 score (F1), and overall accuracy (OA), with parameter count and FLOPs as computational cost indices. The results are reported in the following tables, where Table 1 shows the accuracy comparison between the invention and the comparison algorithms, and Table 2 shows the computational cost comparison.
Table 1. Accuracy comparison (P, R, F1, OA) between the invention and the comparison algorithms.
Table 2. Computational cost comparison (parameters, FLOPs) between the invention and the comparison algorithms.
As can be seen from the two tables, the method of the invention achieves relatively higher values on all four accuracy indices, a smaller overall change detection error, higher change detection accuracy, and a relatively lower computational cost, further illustrating its superiority.
The above simulation experiments show that: the EATDer neural network constructed by the invention can extract the global spatial features of remote sensing images and fully attend to the detection accuracy of image edges; the constructed SAVT and FRFM modules extract global features of remote sensing images at reasonable computational cost and control the influence of the dataset's imbalance between positive and negative samples; and the change detection module emphasizing edge information, together with the joint loss function, lets the network fully attend to the edge detection accuracy of the targets to be detected, solving the problems of high computational cost, frequent edge adhesion, and low accuracy in prior art methods.

Claims (2)

1. A change detection method based on edge-assisted adaptation, characterized by constructing an FRFM module comprising adaptive-reduction SAVT modules and two-way information communication, and designing a change detection module emphasizing edge information and a corresponding loss function; the change detection method comprises the following steps:
step 1, constructing an edge-assisted adaptive change detection network EATDer:
step 1.1, constructing a feature extractor consisting of first, second, and third sub-extractors with the same structure connected in series;
each sub-extractor consists of two SAVT modules with the same structure connected in parallel, followed in series by an FRFM module; after a 256×256 input image passes through the three sub-extractors, three feature maps of sizes 64×64, 32×32 and 16×16 are output;
the SAVT module in each sub-extractor consists of a feature map coding layer and four adaptive multi-head attention SAVT encoders with the same structure; the feature map coding layer is realized by a convolution layer; the feature map dimensions of the first to third sub-extractors are set to 64, 128, 256 respectively, the convolution kernel sizes to 7, 3, 3, the strides to 4, 2, 2, and the paddings to 3, 1, 1; each adaptive multi-head attention SAVT encoder is formed by connecting a Layer Norm layer, an adaptive reduction layer, a multi-head attention layer, a Layer Norm layer, and an MLP layer in series; the adaptive reduction layer applies average pooling and convolution to the input data, and for an input of size H×W×C the average-pooled output has size $\lceil H/r \rceil \times \lceil W/r \rceil$, where $r$ is a reduction ratio chosen adaptively from the input size and $\lceil \cdot \rceil$ is the upward rounding (ceiling) operation; the convolution kernel size, stride, and padding of the convolution in the adaptive reduction layer are set to 1, 1, 0 respectively, and the number of heads of the multi-head attention in every sub-extractor is fixed to 4;
the FRFM module in each sub-extractor is formed by connecting two FRFM sub-modules A and B with the same structure in parallel; each FRFM sub-module consists of an information interaction branch connected in series with a CBAM structure, where the information interaction branch receives the two output feature maps {X, Y} processed by the parallel SAVT modules as its input; the information interaction branch in FRFM sub-module A is realized by the following formula:

$$X_{cf} = \mathrm{softmax}\!\left(\frac{(Y W_y^Q)(X W_x^K)^T}{\sqrt{d_{kx}}}\right) X W_x^V$$

where $X_{cf}$ represents the output of the information interaction branch in FRFM sub-module A, $W_y^Q$, $W_x^K$, and $W_x^V$ represent different parameter matrices learnable by the neural network, softmax(·) represents the softmax function, and $d_{kx}$ represents the scaling factor, whose value equals the dimension of $X W_x^K$; the formula of the cross-fusion process of the parallel branch B is expressed as follows:

$$Y_{cf} = \mathrm{softmax}\!\left(\frac{(X W_x^Q)(Y W_y^K)^T}{\sqrt{d_{ky}}}\right) Y W_y^V$$

where $Y_{cf}$ represents the output of the information interaction branch in FRFM sub-module B, $W_x^Q$, $W_y^K$, and $W_y^V$ represent different parameter matrices learnable by the neural network, and $d_{ky}$ represents the scaling factor, whose value equals the dimension of $Y W_y^K$;
step 1.2, a change detection module formed by connecting a recovery submodule and a refinement submodule in series is established;
the recovery sub-module is formed by connecting a 3×3 convolution layer and an up-sampling layer in series; the refinement sub-module is formed by connecting, in sequence, a 3×3 convolution layer, a SAVT module, a 2×2 deconvolution layer, a Batch Norm layer, a 3×3 convolution layer, a SAVT module, a 3×3 convolution layer, a Batch Norm layer, and a 1×1 convolution layer in series;
step 1.3, connecting a feature extractor and a change detection module in series to form an edge auxiliary self-adaptive network EATDer;
step 2, generating a training set and a verification set:
step 2.1, collecting at least 1000 pairs of aligned bi-temporal remote sensing change detection images, cutting them all into 256×256, then creating a binarized label for each change detection image pair, marking pixels containing a change target as positive examples and pixels without a change target as negative examples;
step 2.2, processing each binarized label with a Canny operator to obtain an edge label image of the target label to be detected, the edge labels forming edge label 1, and then dilating edge label 1 with a 3×3 kernel to obtain edge label 2;
step 2.3, applying data enhancement to the bi-temporal remote sensing image pairs, the change detection labels, and edge labels 1 and 2;
step 2.4, dividing all images and labels after data enhancement at a ratio of 4:1, combining the bi-temporal remote sensing image pairs, the change detection labels, and edge label 2 to form training set 1 and verification set 1, and combining the bi-temporal remote sensing image pairs, the change detection labels, and edge label 1 to form training set 2 and verification set 2;
step 3, training EATDer by using the training set and the verification set:
step 3.1, inputting training set 1 into EATDer, using a joint loss function to calculate the loss between the change detection prediction and target edge prediction output by EATDer and the corresponding change detection label and target edge label 2, and iteratively updating the parameters of the EATDer network by gradient descent until the joint loss function converges, obtaining a pre-trained EATDer;
step 3.2, inputting training set 2 into the pre-trained EATDer, using the joint loss function to calculate the loss between the change detection prediction and target edge prediction output by EATDer and the corresponding change detection label and target edge label 1, iteratively updating the parameters of the EATDer network by gradient descent, inputting the verification set into the updated EATDer for validation after each iteration, and taking the network parameters with the highest accuracy among the 100 validation results as the final trained network parameters, obtaining the trained EATDer;
step 4, detecting the change of the remote sensing image:
cutting the aligned remote sensing images to be detected into 256×256 images, inputting them into the trained EATDer network, and taking the output of EATDer as the change detection prediction result.
2. The change detection method based on edge-assisted adaptation of claim 1, wherein the joint loss function in step 3.1 and step 3.2 is as follows:

$$L = L_{bce}(M_c, G_c) + \lambda\, L_{bce}(M_e, G_e), \qquad L_{bce}(M, G) = -\big[G \log M + (1 - G)\log(1 - M)\big]$$

where λ represents a hyper-parameter with a value of 0.3, log(·) represents the logarithmic operation with base 10, $G_c$ represents the change detection label value, $M_c$ represents the change detection prediction output by the network, $G_e$ represents the edge detection label value, and $M_e$ represents the edge detection prediction output by the network.
CN202310339916.9A 2023-03-31 2023-03-31 Remote sensing image change detection method based on edge-assisted adaptation Pending CN116363516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310339916.9A CN116363516A (en) 2023-03-31 2023-03-31 Remote sensing image change detection method based on edge auxiliary self-adaption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310339916.9A CN116363516A (en) 2023-03-31 2023-03-31 Remote sensing image change detection method based on edge auxiliary self-adaption

Publications (1)

Publication Number Publication Date
CN116363516A true CN116363516A (en) 2023-06-30

Family

ID=86919327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310339916.9A Pending CN116363516A (en) 2023-03-31 2023-03-31 Remote sensing image change detection method based on edge auxiliary self-adaption

Country Status (1)

Country Link
CN (1) CN116363516A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036962A (en) * 2023-10-08 2023-11-10 中国科学院空天信息创新研究院 Remote sensing image change detection method, device, equipment and storage medium
CN117036962B (en) * 2023-10-08 2024-02-06 中国科学院空天信息创新研究院 Remote sensing image change detection method, device, equipment and storage medium
CN117933309A (en) * 2024-03-13 2024-04-26 西安理工大学 Three-path neural network and method for detecting change of double-phase remote sensing image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination