CN116363516A - Remote sensing image change detection method based on edge auxiliary self-adaption - Google Patents
- Publication number: CN116363516A (application CN202310339916.9A)
- Authority
- CN
- China
- Prior art keywords
- change detection
- edge
- module
- layer
- EATDer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/10 — Scenes; scene-specific elements: terrestrial scenes
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/084 — Learning methods: backpropagation, e.g. using gradient descent
- G06V10/42 — Global feature extraction by analysis of the whole pattern
- G06V10/44 — Local feature extraction, e.g. by detecting edges, contours, corners
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Recognition or understanding using neural networks
- Y02D30/70 — Reducing energy consumption in wireless communication networks
Abstract
The invention provides an edge-assisted, adaptive remote sensing image change detection method, implemented in the following steps: constructing an edge-assisted adaptive change detection network, EATDer; generating corresponding training and verification sets; training EATDer with the training and verification sets; and detecting changes in remote sensing images. Using the SAVT and FRFM modules built by the invention, the global features of a remote sensing image can be extracted at reasonable computational cost and the influence of imbalanced positive and negative samples in the data set is controlled. Using the change detection module that emphasizes edge information together with the joint loss function, the network pays sufficient attention to the edge detection accuracy of the targets to be detected. This addresses the high computational cost, frequent edge adhesion, and low accuracy of prior-art methods.
Description
Technical Field
The invention belongs to the technical field of image processing, and more specifically to an edge-assisted adaptive remote sensing image change detection method in the field of image detection. The method can be used for bi-temporal change detection of remote sensing images acquired for land cover analysis and disaster monitoring.
Background
Change detection is the quantitative analysis of surface changes of a phenomenon or object between two different periods. Change detection in remote sensing images plays a vital role in the remote sensing field and, due to its important application value, is receiving more and more attention. The main technical scheme for remote sensing image change detection is supervised deep learning: with the advent of the convolutional neural network (CNN), deep learning methods were gradually applied to change detection. A CNN can explore the semantic features of a remote sensing image through its layered structure and hierarchical learning and thereby generate an accurate change detection feature map, but a CNN cannot capture the long-range context information hidden in a remote sensing image. This disadvantage limits the detection accuracy of change detection networks based on a pure CNN structure. In view of this limitation, researchers introduced the Transformer into the remote sensing field. Thanks to its global receptive field, the Transformer compensates well for the CNN's weakness in extracting long-range information, but room for improvement remains. First, because land cover types in remote sensing images are diverse and appear at many scales, the changed regions in multi-temporal remote sensing images are generally irregular; conventional Transformer-based models cannot accurately detect their boundaries, which negatively impacts the remote sensing change detection task.
Second, current Transformer-based models focus only on the remote sensing images themselves and ignore the temporal cues between images acquired at different times, which are also of great importance to remote sensing change detection. Finally, Transformer-based models are always cumbersome due to their self-attention and multi-head mechanisms, which limits the efficiency of the remote sensing change detection process.
China Star Map Measurement and Control Technology (Hefei) Co., Ltd. proposed a remote sensing change detection method in its patent "An algorithm and a system for remote sensing change detection" (application number 202210941062.7, publication number CN 115019186A). The method extracts and concatenates features from images of different time phases, performs feature extraction with a CNN-based pyramid module, and finally feeds the fused feature map into FCN-head and SPP-head modules to obtain the final output. Although the method tries to improve the network's context extraction ability with a new structure, it still has two defects. First, limited by the insufficient design of the CNN pyramid module, its global feature extraction capability is restricted, so the network may produce false and missed detections, reducing accuracy. Second, because neither its feature extraction nor its network design attends to edge information, its edge detection accuracy is insufficient, and adhesion between the edges of different targets is still frequently detected in error.
Harbin Engineering University proposed a remote sensing change detection method in its patent "A remote sensing image change detection method based on a multi-scale semantic token Transformer network" (application number 202211026042.3, publication number CN 115393317A). The method uses a feature encoder and decoder to extract feature maps; a multi-scale semantic token encoder converts feature maps of different scales from the feature encoder into semantic tokens of different lengths, which are fed into a Transformer to obtain global semantic information. A multi-scale semantic token decoder combines the semantically rich tokens of different scales with the spatially rich multi-scale feature maps to obtain joint semantic-spatial features; finally, the feature decoder aggregates these multi-scale joint features through skip connections, and a classifier produces the final change result map. The disadvantage of this method is that it directly uses the original Transformer structure, whose multi-head attention computation consumes a great deal of computation and memory resources.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art, and provides an edge-assisted adaptive remote sensing image change detection method that addresses three problems in the prior art: first, the excessive resource consumption of multi-head attention when the original Transformer structure is used directly; second, the limited global feature extraction capability of pure-CNN neural networks; third, the false and missed detections caused by poor edge preservation in change detection results.
To handle feature-map inputs of different sizes and scales, the method adaptively reduces the size of the input feature map, greatly reducing the consumption of computing resources. The invention combines a CNN with a Transformer, using the Transformer's ability to extract global features to compensate for the weakness of a pure CNN structure. It constructs a feature extractor consisting of a self-adaptive Transformer feature extraction module, SAVT (Self-Adaption Vision Transformer), and a cross-branch information interaction module, FRFM (Full Range Fusion Module). The SAVT first performs feature extraction with a CNN and then sends the extracted features into a Transformer module with adaptive multi-head attention, so that global features are extracted with a controllable amount of computation. The FRFM module builds information communication between the two branches of the change detection network, strengthening the network's robustness to the imbalanced distribution of positive and negative samples in the original data set and making change targets easier to distinguish from the background.
The invention extracts edge information from the original supervised data labels using a Canny operator. In designing the decoder and the loss function, a Transformer is introduced into the decoder to refine features and improve detection precision, and a specially designed loss function sends the original data and the edge data together into the network's back-propagation, so that the network continuously pays sufficient attention to edge information and the edge preservation of the detection results is improved.
The technical scheme of the invention for realizing the purpose comprises the following steps:
step 1, constructing an Edge-aided adaptive change detection network EATDer (Edge-Assisted Adaptive Transformer Detector):
step 1.1, constructing a feature extractor consisting of three first, second and third sub-extractors with the same structure in series;
each sub-extractor is composed of two SAVT modules with the same structure, which are connected in parallel and then connected in series with an FRFM module, and after 256×256 input images pass through the three sub-extractors, three characteristic diagrams of 64×64, 32×32 and 16×16 are output;
the SAVT module in each sub-extractor is composed of a characteristic diagram coding layer and four self-adaptive multi-head attention SAVT encoders with the same structure; the characteristic map coding layer is realized by a convolution layer; the first to third sub-extractor feature map dimensions are set to 64,128,256, respectively, the convolution kernel sizes are set to 7,3,3, the step sizes are set to 4,2,2, and the fills are set to 3,1, respectively; the SAVT encoder of the self-adaptive multi-head attention is formed by sequentially connecting a Layer Norm Layer, a self-adaptive reduction Layer, a multi-head attention Layer, a Layer Norm Layer and an MLP Layer in series; the adaptive reduction layer is used for carrying out average pooling and convolution on input data and adapting to the inputReducing the length and width of the image with the input dimension of H multiplied by W multiplied by C before layer, and averaging the size after pooling to beWherein (1)>Is an upward rounding operation; the convolution kernel size, the step length and the filling of the convolution in the self-adaptive reduction layer are respectively set to be 1,1 and 0, and the number of heads of the multi-head attention in different sub-extractors is fixed to be 4;
the FRFM module in each sub-extractor is formed by connecting two FRFM sub-modules A and B with the same structure in parallel, each FRFM sub-module is formed by connecting an information interaction branch and a convolution attention module CBAM (Convolutional Block Attention Module) in series, wherein the information interaction branch receives two output characteristic graphs { X, Y } processed by the parallel SAVT module as the input of the branch; the information interaction branch in the FRFM sub-module A is realized by the following formula:
wherein X is cf Representing the output of the information interaction leg in the FRFM sub-module a,representing different parameter matrices available for neural network learning, softmax (·) representing the softmax function, d kx Representing the value of the scaling factor and +.>Is equal in dimension. The formula of the cross fusion process of the same branch B is expressed as follows:
wherein Y is cf Representing the output of the information interaction leg in the FRFM sub-module B,and->Representing different parameter matrices for neural network learning, d ky Representing the scaling factor and its value and YW y K Is equal in dimension;
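The cross-fusion of sub-module A, X_cf = softmax(Q_y·K_x^T/√d_kx)·V_x, can be sketched in NumPy; the token count, channel width, and random projection matrices below are toy stand-ins for the learned parameters, not the patented weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_fusion(X, Y, Wq, Wk, Wv):
    # Query comes from the *other* branch (Y); keys/values from this branch (X).
    Q_y, K_x, V_x = Y @ Wq, X @ Wk, X @ Wv
    d_kx = K_x.shape[-1]
    attn = softmax(Q_y @ K_x.T / np.sqrt(d_kx))
    return attn @ V_x

n, c = 16, 8  # 16 tokens, 8 channels (toy sizes)
X, Y = rng.normal(size=(n, c)), rng.normal(size=(n, c))
Wq, Wk, Wv = (rng.normal(size=(c, c)) for _ in range(3))
X_cf = cross_fusion(X, Y, Wq, Wk, Wv)
print(X_cf.shape)  # -> (16, 8)
```

Swapping the roles of X and Y in the same function gives the symmetric branch-B output Y_cf.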
step 1.2, a change detection module formed by connecting a recovery submodule and a refinement submodule in series is established;
the recovery submodule is formed by connecting a 3X 3 convolution layer and an up-sampling layer in series; the thinning submodule is formed by serially connecting a 3X 3 convolution layer, a SAVT module, a 2X 2 deconvolution layer, a Batch normal layer, a 3X 3 convolution layer, a SAVT module, a 3X 3 convolution layer, a Batch normal layer and a 1X 1 convolution layer in sequence;
step 1.3, connecting a feature extractor and a change detection module in series to form an edge auxiliary self-adaptive network EATDer;
step 2, generating a training set and a verification set:
step 2.1, collecting at least 1000 pairs of aligned double-phase remote sensing change detection image pairs, cutting all the aligned double-phase remote sensing change detection images into 256×256, then manufacturing a binarization label for each change detection image pair, marking the pixel points containing a change target in the image as positive examples, and marking the pixel points without the change target as negative examples;
step 2.2, processing each binarized label by adopting a Canny operator to obtain an edge label image of a target label to be detected, forming the edge labels into an edge label 1, and then performing expansion processing on the edge label 1 by using an image with the kernel size of 3 multiplied by 3 to obtain an edge label 2;
step 2.3, carrying out data enhancement on the double-phase remote sensing image pair and the change detection labels as well as the edge labels 1 and 2;
step 2.4, dividing all the images and labels after data enhancement according to a ratio of 4:1, combining the double-phase remote sensing image pair, the change detection label and the edge label 2 to form a training set 1 and a verification set 1, and combining the double-phase remote sensing image pair, the change detection label and the edge label 1 to form a training set 2 and a verification set 2;
step 3, training EATDers by using the training set and the verification set:
step 3.1, inputting the training set 1 into an EATDer, calculating a change detection prediction result and a target edge prediction result which are output by the EATDer and loss values of a corresponding change detection tag and a corresponding target edge tag 2 by adopting a joint loss function, and iteratively updating parameters of the EATDer network by using a gradient descent method until the joint loss function converges to obtain a pre-trained EATDer;
step 3.2, inputting the training set 2 into the pre-trained EATDer, calculating a change detection prediction result and a target edge prediction result which are output by the EATDer and loss values of a corresponding change detection tag and a target edge tag 1 by adopting a joint loss function, iteratively updating parameters of an EATDer network by using a gradient descent method, inputting the evidence set into the iterated EATDer for verification once each iteration, and taking the network parameter with the highest precision in the verification result for 100 times as the network parameter of which the final training is finished to obtain the trained EATDer;
step 4, detecting the change of the remote sensing image:
and cutting the aligned remote sensing image to be detected into an image with the size of 256 multiplied by 256, inputting the image into a trained EATDer network, and outputting the EATDer as a prediction result of the change detection.
Compared with the prior art, the invention has the following advantages:
firstly, the invention overcomes the defect of high calculation cost in the prior art by compressing the size of the K matrix and the V matrix in multi-head attention calculation in a self-adaptive mode according to the size of the input feature map through the nerve network sub-module SAVT based on the transducer, so that the calculation and storage cost of multi-head attention calculation is greatly reduced under the condition of controllable information loss when the invention detects the change of the remote sensing image.
Second, the feature extraction module combining the FRFM and SAVT modules mines the global information of the image more fully, overcoming the insufficient feature extraction capability and tendency toward false detections of the prior art, and improving the accuracy of remote sensing image change detection.
Third, the invention introduces an edge loss function in the final detection module and uses a Transformer to refine features, overcoming the poor edge preservation of prior-art detection results, so that the detection results output by the network contain fewer erroneous adhesions between targets.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of an EATDer network in the present invention;
FIG. 3 is a schematic diagram of the SAVT module in the EATDer network of the invention;
FIG. 4 is a schematic diagram of the FRFM module in the EATDer network of the present invention;
FIG. 5 is a schematic diagram of the configuration of a change detection module in the EATDer network of the present invention;
FIG. 6 is a schematic representation of an edge dataset generated by the present invention;
fig. 7 is a graph of simulation results of remote sensing image change detection using the present invention and the conventional detection method, respectively.
Detailed Description
The invention will now be described in further detail with reference to the drawings and examples.
The implementation steps of the embodiment of the present invention will be described in further detail with reference to fig. 1.
Step 1, constructing the edge-assisted adaptive change detection network EATDer.
Step 1.1 referring to fig. 2, a feature extractor is constructed which is formed by connecting three first, second and third sub-extractors with the same structure in series.
Each sub-extractor consists of two structurally identical SAVT modules connected in parallel, followed by an FRFM module in series. After a 256×256 input image passes through the three sub-extractors, three feature maps at different scales (64×64, 32×32, and 16×16) are output, improving the network's multi-scale feature extraction capability; the extracted multi-scale feature maps are then sent to the change detection module.
The sub-module SAVT of the constructed EATDer network is further described with reference to fig. 3. The module comprises a feature-map coding step and four layers of SAVT encoders. Feature coding is implemented by convolution; each SAVT encoder layer consists of a front half formed by a LayerNorm layer, an adaptive reduction layer, and a multi-head attention layer, and a back half formed by a LayerNorm layer and an MLP layer:
for convenience of explanation, the input/output of the entire SAVT module is described first, and then the data processing flow inside the SAVT encoder is described in detail.
Assume that the feature map input to each SAVT module is of sizeThe feature map is encoded by a convolution operation comprising c 'convolution kernels, c' being set to 64,128,256, respectively, convolution size, step size, filling 2s-1, s, and s-1 in three sub-extractors, wherein s is sequentially designed as 4,2,2 according to the arrangement order of the sub-extractors to form%>Is then subjected to a dimension-size-warping operation such that the patch becomesThe size and input to the SAVT encoder for processing. After being processed by the SAVT encoder, the output characteristic diagram of the module is obtained>The characteristic diagram F out Will be sent to the FRFM module for further processing.
The input and output feature maps of every SAVT encoder have the same size. To ease understanding of the SAVT encoder's data processing, the processing in the first of the four encoder layers is described in detail; the remaining encoder layers process data in the same way. On its input P_1', the first SAVT encoder performs:

P̂_1 = SAMSA(LN(P_1')) + P_1',
P_1^out = MLP(LN(P̂_1)) + P̂_1,

where LN(·), SAMSA(·), and MLP(·) denote the functions of the LayerNorm layer, the adaptive multi-head attention layer, and the MLP layer, respectively, and P̂_1 denotes the encoded features. The SAMSA part consists mainly of an adaptive reduction and a conventional multi-head attention. In this process, the LayerNorm-processed P_1' is first copied three times to generate the pending data Q_1, K_1, and V_1; the adaptive reduction is then used to shrink K_1 and V_1.
In the adaptive reduction process, K_1 is first reshaped into its two-dimensional H×W×C form and processed by average pooling and a 1×1 convolution. The purpose of the average pooling is to reduce the amount of computation, and the purpose of the 1×1 convolution is to reduce the information loss of the feature map. After a reshaping operation, the reduced data K̂_1 is obtained; similarly, the reduced data V̂_1 is obtained from V_1. A multi-head attention mechanism is then applied to the newly obtained data.
The multi-head attention mechanism is expressed as follows:

SAMSA(Q_1, K̂_1, V̂_1) = Concat(head_1, …, head_n)·W^O,
head_i = softmax(Q_1·W_i^Q · (K̂_1·W_i^K)^T / √(d_k)) · V̂_1·W_i^V,

where W^O, W_i^Q, W_i^K, and W_i^V denote learnable parameter matrices, and Concat(·) and softmax(·) denote dimension-wise concatenation and the softmax function, respectively. √(d_k) is a scaling factor, with d_k equal to the dimension of K̂_1. The number of attention heads in this embodiment of the invention is fixed at 4, i.e., n = 4. This algorithm has a lower computational complexity than standard multi-head attention.
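The effect of shrinking K and V before attention can be illustrated with a single-head NumPy sketch; the fixed pooling factor r and single-head form are simplifications of the adaptive reduction described above:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pooled_attention(Q, K, V, h, w, r=2):
    """Attention where K/V tokens are average-pooled from h*w down to (h/r)*(w/r)."""
    def pool(t):
        c = t.shape[-1]
        g = t.reshape(h, w, c)  # token sequence back to its 2-D grid
        return g.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3)).reshape(-1, c)
    K_red, V_red = pool(K), pool(V)
    d_k = K_red.shape[-1]
    # attention matrix is (h*w) x (h*w/r^2) instead of (h*w) x (h*w)
    return softmax(Q @ K_red.T / np.sqrt(d_k)) @ V_red

rng = np.random.default_rng(1)
h = w = 8
c = 16
Q = rng.normal(size=(h * w, c))
K = rng.normal(size=(h * w, c))
V = rng.normal(size=(h * w, c))
out = pooled_attention(Q, K, V, h, w, r=2)
print(out.shape)  # -> (64, 16): query length kept, K/V shrunk to 16 tokens
```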
The time complexities of the original multi-head attention and the reduced multi-head attention are compared as follows:

Ω(MSA) = 4HWC² + 2(HW)²C,
Ω(SAMSA) = 2HWC² + 2mC² + 2·HW·m·C,

where Ω(MSA) denotes the time complexity of standard multi-head attention, Ω(SAMSA) denotes the time complexity of the adaptive multi-head attention in the invention, and m is the number of K/V tokens after adaptive reduction (m ≪ HW), so that the term quadratic in HW is replaced by the much smaller 2·HW·m·C.
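For a sense of scale, the following sketch compares rough FLOP counts, using the commonly cited 4HWC² + 2(HW)²C estimate for standard multi-head attention; the reduced-attention formula here is an assumption for illustration, not the patent's exact derivation:

```python
# Rough FLOP estimates (assumed formulas, toy sizes).
def msa_flops(hw, c):
    # Q/K/V/output projections + the (HW)^2 attention term
    return 4 * hw * c**2 + 2 * hw**2 * c

def samsa_flops(hw, m, c):
    # projections with only m K/V tokens + the HW x m attention term
    return 2 * hw * c**2 + 2 * m * c**2 + 2 * hw * m * c

hw, c = 64 * 64, 64  # 64x64 feature map, 64 channels
m = 16 * 16          # K/V pooled to 16x16 tokens
speedup = msa_flops(hw, c) / samsa_flops(hw, m, c)
print(f"{speedup:.1f}x fewer FLOPs")
```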
Referring to fig. 4, the FRFM sub-module comprises a cross-fusion module and a CBAM module, and is the main module for completing the information interaction between the two branches of the twin network.
Schematic diagram of the cross-fusion submodule FRFM is shown in fig. 4 (b). When the outputs from the SAVT modules of two different branches are taken as inputsWhen the information is input into FRFM, the cross fusion submodule fuses the information of two different branches first to generate +.>For ease of understanding, for input X, this process can be expressed as:
wherein X is cf Representing the output of the information interaction leg in the FRFM sub-module a,representing a learnable parameter matrix, +.>For scaling factor d kx Has a value of K x Dimension, Q of y ,K x ,V x Is defined as:
Similarly, the formula of the cross-fusion process for Y is as follows:

$$Y_{cf}=\mathrm{softmax}\!\left(\frac{Q_x K_y^{T}}{\sqrt{d_{ky}}}\right)V_y,\qquad Q_x=XW_Q',\quad K_y=YW_K',\quad V_y=YW_V',$$

wherein Y_cf represents the output of the information interaction branch in FRFM sub-module B, W_Q', W_K' and W_V' represent a further set of parameter matrices learned by the neural network, and √d_ky is a scaling factor whose value d_ky is equal to the dimension of K_y.
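The branch-A side of the cross fusion — queries taken from the other branch Y, keys and values from X — can be sketched as plain single-head cross-attention. This is a hedged numpy illustration, not the patented module: the parameter matrices are passed in as arguments and all names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_fuse(x, y, w_q, w_k, w_v):
    """Branch-A cross fusion: queries come from the other branch (Y),
    keys and values from this branch (X)."""
    q_y, k_x, v_x = y @ w_q, x @ w_k, x @ w_v
    d_kx = k_x.shape[-1]               # scaling factor = dimension of K_x
    return softmax(q_y @ k_x.T / np.sqrt(d_kx)) @ v_x
```

Sub-module B is the mirror image: swap the roles of `x` and `y` and use the second set of weight matrices.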
X_cf and Y_cf output by the cross-fusion module are input to the CBAM module, which cascades channel attention and spatial attention. For X_cf, CBAM derives a channel attention score S_c and a spatial attention score S_s sequentially through mean pooling, max pooling, an MLP and a convolution.

The obtained scores are multiplied back onto X_cf to generate the refined feature data, as follows:

$$\tilde X_{cf}=S_s\otimes\big(S_c\otimes X_{cf}\big),$$
wherein ⊗ represents element-wise multiplication. Likewise, the CBAM procedure for Y_cf is:

$$\tilde Y_{cf}=S_s'\otimes\big(S_c'\otimes Y_{cf}\big),$$

wherein S_c' and S_s' represent the channel attention score and the spatial attention score. By mining this salient information, the changed/unchanged information in the RS image can be highlighted.
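The channel-then-spatial attention cascade can be sketched as follows. This is a hedged stand-in, not real CBAM: the learned MLP and 7×7 convolution are replaced by a plain sum of the mean- and max-pooled descriptors, so only the attention plumbing (pool → sigmoid score → element-wise rescale) is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_refine(x):
    """Apply channel attention, then spatial attention, to a (C, H, W) map."""
    # channel attention score from mean- and max-pooled spatial descriptors
    s_c = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))   # shape (C,)
    x = x * s_c[:, None, None]
    # spatial attention score from per-pixel mean and max over channels
    s_s = sigmoid(x.mean(axis=0) + x.max(axis=0))             # shape (H, W)
    return x * s_s[None, :, :]
```

The output has the same shape as the input; only the relative weighting of channels and pixels changes.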
Combining the feature extraction modules, the twin encoder of the present invention is composed of three successive stages, each stage containing two SAVT blocks and one FRFM. A bi-temporal RS image pair {I_A, I_B} of size H×W×3 is input to the encoder; the two SAVT blocks of the first stage map it into feature maps F_1^A and F_1^B of size H/4×W/4×C_1, and {F_1^A, F_1^B} are passed through the FRFM to generate the enhanced features. After the second and third stages, feature maps of sizes H/8×W/8×C_2 and H/16×W/16×C_3 are obtained in turn. The channels C_1, C_2, C_3 are set to 64, 128 and 256, respectively. The feature maps obtained through the feature extraction module not only capture complex content at various scales in the remote sensing image, but also contain abundant temporal information. The enhanced features are then fed to the change detection module for change detection.
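The stage resolutions quoted above (a 256×256 input becoming 64×64, 32×32 and 16×16 maps with 64, 128 and 256 channels) follow from per-stage spatial strides of 4, 2 and 2, and can be checked with a small helper; the function name is illustrative.

```python
def stage_shapes(size=256, strides=(4, 2, 2), channels=(64, 128, 256)):
    """Spatial size and channel count of the feature map after each
    encoder stage, given the per-stage downsampling strides."""
    shapes, s = [], size
    for stride, c in zip(strides, channels):
        s //= stride
        shapes.append((s, s, c))
    return shapes

# stage_shapes() walks 256 -> 64 -> 32 -> 16
```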
And 1.2, establishing a change detection module.
Referring to fig. 5, the recovery sub-module is composed of a 3×3 convolution and an upsampling, and its main purpose is to further fuse and splice the multi-scale features extracted by the feature extractor. This process can be formulated as follows:

$$O_i'=\mathrm{Conv}_{3\times 3}(O_i),\quad i=1,2,3,$$

wherein O_1, O_2, O_3 represent the outputs of the first, second and third sub-extractors, O_i' represents the output feature map after feature fusion by convolution, O_m represents the output feature map of the recovery sub-module, and Conv_{3×3}(·) and Up(·) represent the 3×3 convolution and the upsampling operation, respectively.
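A shape-level sketch of the recovery step is below. It is an assumption-laden illustration, not the patent's exact fusion rule: the 3×3 convolutions are omitted, the three scales are assumed to have already been projected to a common channel width of 32, and the upsampled maps are merged by summation (the patent may concatenate instead).

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# bring the three encoder scales to the largest resolution and merge them
o1 = np.zeros((64, 64, 32))   # stage-1 output (already at target scale)
o2 = np.zeros((32, 32, 32))   # stage-2 output, needs one 2x upsampling
o3 = np.zeros((16, 16, 32))   # stage-3 output, needs two 2x upsamplings
o_m = o1 + upsample2x(o2) + upsample2x(upsample2x(o3))
```

After this merge, `o_m` has a single resolution and can be handed to the refinement sub-module.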
Referring to fig. 5, the refinement sub-module is composed of four 3×3 convolutions, one 1×1 convolution, two refinement SAVT modules (each containing only one layer of the SAVT encoder), two deconvolutions and three Layer Norm layers. The refinement sub-module is designed to improve the feature map O_m used for change detection, thereby further highlighting the change information hidden in O_m and providing temporal cues.

In the refinement sub-module, the invention captures local knowledge through convolution, comprehensively analyses global context cues through the two refinement SAVT modules, and introduces deconvolution to complete the upsampling. O_m is first passed through a 3×3 convolution and a SAVT module to obtain O_m'. O_m' is then fed in sequence into two deconvolution blocks combined with Layer Norm layers to obtain O_m''. Finally, two further 3×3 convolutions, one 1×1 convolution and one refinement SAVT module are applied to O_m'' to obtain the output feature map, which is then processed using softmax to yield M_s.
In order to improve the detection accuracy of edges, the invention adds an edge detection task so that the model can fully perceive edges, thereby improving detection performance. The channel dimension of M_s is divided into two parts, comprising a change prediction result M_c and an edge prediction result M_e. A joint loss function is applied to narrow the gap between the change detection prediction result and the target edge prediction result and the corresponding change detection label and target edge label. Considering that the change detection task is a binary classification task, the binary cross-entropy loss is selected as the basic unit of the joint loss function.
The joint loss function is defined as follows:

$$L = L_{bce}(M_c, G_c) + \lambda\, L_{bce}(M_e, G_e),\qquad L_{bce}(M,G) = -\big[G\log(M) + (1-G)\log(1-M)\big],$$

wherein λ represents a hyper-parameter with a value of 0.3, log(·) represents the logarithm with base 10, G_c represents the change detection label value, M_c represents the change detection prediction result output by the network, G_e represents the edge detection label value, and M_e represents the edge detection prediction result output by the network.
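A numpy sketch of this joint loss is below. The base-10 logarithm and the λ = 0.3 weight follow the text above; the mean reduction over pixels and the clipping constant are assumptions for numerical safety.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy with base-10 logs, as stated in the text."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log10(0)
    return -np.mean(target * np.log10(pred)
                    + (1.0 - target) * np.log10(1.0 - pred))

def joint_loss(m_c, g_c, m_e, g_e, lam=0.3):
    """Change-detection BCE plus lambda-weighted edge-detection BCE."""
    return bce(m_c, g_c) + lam * bce(m_e, g_e)
```

Weighting the edge term by λ < 1 keeps the change-detection objective dominant while still penalising blurred or adhered edges.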
And 2, generating a training set and a verification set.
Step 2.1, the embodiment of the invention collects 1000 pairs of aligned double-phase remote sensing change detection images, cuts all the aligned double-phase images into 256×256, and then makes a binarized label for each change detection image pair, marking the pixels containing a change target in the image as positive examples and the pixels without a change target as negative examples.

And 2.2, processing each binarized label with the Canny operator to obtain an edge label image of the target label to be detected; these edge label images form edge label 1. Morphological dilation with a 3×3 kernel is then applied to edge label 1 to obtain edge label 2. The process is shown in fig. 6.
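The two edge labels can be sketched with numpy alone. This is a hedged stand-in: the thin boundary is obtained as mask minus its erosion rather than with the Canny operator the patent specifies, and the 3×3 dilation is implemented as a maximum over shifted copies.

```python
import numpy as np

def dilate3x3(mask):
    """Binary dilation with a 3x3 kernel, via maxima over shifted copies."""
    h, w = mask.shape
    p = np.pad(mask, 1)  # zero border so edges dilate correctly
    return np.max([p[i:i + h, j:j + w] for i in range(3) for j in range(3)],
                  axis=0)

def edge_labels(mask):
    """Edge label 1 as the boundary of the binary mask (a stand-in for
    the Canny operator), edge label 2 as its 3x3 dilation."""
    eroded = 1 - dilate3x3(1 - mask)  # erosion via dilation of complement
    edge1 = mask - eroded             # thin one-pixel boundary
    edge2 = dilate3x3(edge1)          # thickened edge band
    return edge1, edge2
```

The thickened edge label 2 gives the pre-training stage a more tolerant target before fine-tuning on the thin edge label 1.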
And 2.3, carrying out data enhancement on the double-phase remote sensing image pair and the change detection labels as well as the edge labels 1 and 2.
And 2.4, dividing all the images and labels after data enhancement at a ratio of 4:1: the double-phase remote sensing image pairs, the change detection labels and edge label 2 are combined to form training set 1 and verification set 1, and the double-phase remote sensing image pairs, the change detection labels and edge label 1 are combined to form training set 2 and verification set 2.
And 3, training the EATDer by using the training set and the verification set.
And 3.1, inputting training set 1 into the EATDer, using the joint loss function to calculate the loss values between the change detection prediction result and target edge prediction result output by the EATDer and the corresponding change detection label and target edge label 2, and iteratively updating the parameters of the EATDer network by the gradient descent method with a learning rate of 0.001 until the joint loss function converges, obtaining the pre-trained EATDer.

And 3.2, inputting training set 2 into the pre-trained EATDer, using the joint loss function to calculate the loss values between the change detection prediction result and target edge prediction result output by the EATDer and the corresponding change detection label and target edge label 1, and iteratively updating the parameters of the EATDer network by the gradient descent method with a learning rate of 0.001; after each iteration, the verification set is input into the updated EATDer for verification, and the network parameters with the highest accuracy among the 100 verification results are taken as the final trained network parameters, obtaining the trained EATDer.
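The validate-every-iteration, keep-the-best schedule of step 3.2 can be sketched as a generic loop. This is a framework-agnostic skeleton under stated assumptions: `step_fn` stands in for one gradient-descent update of the joint loss and `eval_fn` for the verification-set accuracy; both names are illustrative.

```python
def train_stage(step_fn, eval_fn, params, iters=100):
    """Run `iters` gradient-descent iterations, validating after each one
    and keeping the parameters with the best verification score."""
    best, best_score = params, float("-inf")
    for _ in range(iters):
        params = step_fn(params)   # one joint-loss gradient update
        score = eval_fn(params)    # accuracy on the verification set
        if score > best_score:
            best, best_score = params, score
    return best
```

Because the best snapshot is kept rather than the last one, later iterations that overfit the training split cannot degrade the returned model.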
And 4, obtaining a change result graph.
And cutting the aligned remote sensing images to be detected into images of size 256×256 and inputting them into the trained EATDer network; the output of the EATDer is the change detection prediction result.
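The cutting step at inference can be sketched as non-overlapping tiling. Dropping any border remainder is one possible cropping policy and an assumption here; padding or overlapping tiles would be equally valid.

```python
import numpy as np

def tile_256(img):
    """Cut an (H, W, C) image into non-overlapping 256x256 tiles,
    discarding any remainder at the right and bottom borders."""
    th, tw = img.shape[0] // 256, img.shape[1] // 256
    return [img[i * 256:(i + 1) * 256, j * 256:(j + 1) * 256]
            for i in range(th) for j in range(tw)]
```

Each tile is fed through the network independently, and the per-tile change maps are stitched back in the same row-major order.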
The effects of the present invention can be further illustrated by the following simulations.
Simulation conditions.
The hardware platform of the simulation environment of the present invention is an Intel Xeon 4214R processor with a main frequency of 2.4 GHz and two GeForce RTX 3090 graphics cards with 48 GB of video memory.

The software platform of the simulation environment of the present invention is Ubuntu 16.04 with Python 3.9.5 and PyTorch 1.13.
The input images used in the simulation of the invention are the LEVIR-CD, CDD and WHU remote sensing change detection datasets, where:
the LEVIR-CD contains 637 remote sensing images acquired by Google Earth, and the size is 1024×1024 pixels. The pixel resolution of these images is 0.5m. LEVIR-CD contains a double-phase image with a time span of 5 to 14 years, and is mainly concerned with various building changes such as villa houses and high-rise apartments.
CDD contains 16000 pairs of seasonally varying remote sensing bi-temporal images collected from Google Earth. The spatial size is 256×256, and the resolution varies from 0.03 m to 1 m. The land-cover changes involved in CDD are diverse, including buildings, automobiles and roads.
WHU contains a single pair of large remote sensing images. Its spatial size and resolution are 32507×15354 and 0.2 m, respectively. The primary objects in the dataset are buildings.
Second, simulation content and analysis of its results.
Under the above simulation conditions, change detection simulations are carried out on the three datasets using the method of the present invention and three existing methods (DSAMNet, SNUNet and ICIF-Net). The results are shown in fig. 7.
In simulation experiments, the three prior art techniques refer to:
the prior art method for detecting the change of the network DSAMNet based on the deep supervision attention metric refers to a method published by Liu Mengxi et al on IEEE and used for detecting the change of a remote sensing image, namely: liu, mengxi, and Qian Shi, "DSAMNET: adeeply supervised attention metric based network for change detection of high-resolution images," 2021IEEE International Geoscience and Remote Sensing Symposium IGARSS.IEEE,2021, abbreviated as DSAMNet method.
The prior art change detection method based on the densely connected Siamese network SNUNet refers to the remote sensing image change detection method published by Fang Sheng et al. in IEEE, namely: Fang, Sheng, et al., "SNUNet-CD: A densely connected Siamese network for change detection of VHR images," IEEE Geoscience and Remote Sensing Letters (2021): 1-5, abbreviated as the SNUNet method.
The prior art change detection method based on the intra-scale cross-interaction and inter-scale feature fusion network ICIF-Net refers to the remote sensing image change detection method published by Feng Yuchao et al. in IEEE, namely: Feng, Yuchao, et al., "Intra-scale cross-interaction and inter-scale feature fusion network for bitemporal remote sensing images change detection," IEEE Transactions on Geoscience and Remote Sensing 60 (2022): 1-13, abbreviated as the ICIF-Net method.
The simulation effect of the present invention is further described below with reference to fig. 7.
Fig. 7 (a), 7 (b), and 7 (c) are graphs of contrast algorithm visualizations of the LEVIR-CD dataset.
Fig. 7 (d), fig. 7 (e), fig. 7 (f) are graphs of the results of the comparative algorithm visualization of the CDD dataset.
Fig. 7 (g), fig. 7 (h), fig. 7 (i) are graphs of the results of the comparative algorithm visualization of the WHU dataset.
As can be seen from fig. 7 (b), the detection results of the existing DSAMNet method contain a large number of adhesions, mainly because the method does not emphasize the edge information of the object to be detected, which reduces the detection accuracy.
As can be seen from fig. 7 (g), the existing ICIF-Net and SNUNet methods produce a large number of false detections, mainly because, in the dual-branch information exchange, these methods have insufficient feature extraction capability and mistake background changes for the changes to be detected, so that the detection accuracy does not meet expectations.
The visualized results show that the method of the invention achieves higher detection accuracy, with fewer noise points, fewer missed and false detections, and more accurate boundary detection of changed ground objects, which fully illustrates the superiority of the method.
In order to quantitatively illustrate the performance of the proposed method, the present invention selects numerical performance indexes commonly used in the change detection task to measure the difference between the existing methods and the present invention. The performance indexes comprise change detection precision (P), change detection recall (R), F1 score (F1) and overall accuracy (OA); the computational cost indexes comprise the number of parameters and FLOPs. The results are listed in the tables below: table 1 shows the accuracy comparison between the invention and the comparison algorithms, and table 2 shows the computational cost comparison between the invention and the comparison algorithms.
TABLE 1
TABLE 2
As can be seen from the two tables, the method of the present invention achieves relatively higher values on all four accuracy indexes, a smaller overall change detection error and higher change detection accuracy at a relatively lower computational cost, which further illustrates the superiority of the method.
The above simulation experiments show that: the EATDer neural network constructed by the invention can extract global spatial features of remote sensing images while fully attending to the detection accuracy of image edges; the constructed SAVT and FRFM modules can extract global features of remote sensing images at a reasonable computational cost and control the influence of the imbalance between positive and negative samples in the dataset; and the change detection module emphasizing edge information, together with the joint loss function, can fully attend to the edge detection accuracy of the target to be detected, thereby solving the problems of high computational cost, frequent edge adhesion and low accuracy in prior art methods.
Claims (2)
1. A change detection method based on edge-aided adaptation, characterized in that: a SAVT module with adaptive reduction and an FRFM module with two-way information exchange are constructed, and a change detection module emphasizing edge information and a corresponding loss function are designed; the change detection method comprises the following steps:
step 1, constructing an edge-aided adaptive change detection network EATDer:
step 1.1, constructing a feature extractor consisting of three first, second and third sub-extractors with the same structure in series;
each sub-extractor is composed of two SAVT modules with the same structure connected in parallel and then connected in series with an FRFM module; after a 256×256 input image passes through the three sub-extractors, three feature maps of sizes 64×64, 32×32 and 16×16 are output;
the SAVT module in each sub-extractor is composed of a feature map coding layer and four adaptive multi-head attention SAVT encoders with the same structure; the feature map coding layer is realized by a convolution layer; the feature map dimensions of the first to third sub-extractors are set to 64, 128 and 256, the convolution kernel sizes are set to 7, 3 and 3, the strides are set to 4, 2 and 2, and the paddings are set to 3, 1 and 1, respectively; the adaptive multi-head attention SAVT encoder is formed by sequentially connecting a Layer Norm layer, an adaptive reduction layer, a multi-head attention layer, a Layer Norm layer and an MLP layer in series; the adaptive reduction layer applies average pooling and convolution to the input data, and for an input of height, width and channel dimension H×W×C before the adaptive reduction layer, the spatial size after the average pooling is reduced accordingly, where ⌈·⌉ in the size expression denotes the round-up operation; the convolution kernel size, stride and padding of the convolution in the adaptive reduction layer are set to 1, 1 and 0, respectively, and the number of heads of the multi-head attention in the different sub-extractors is fixed to 4;
the FRFM module in each sub-extractor is formed by connecting two FRFM sub-modules A and B with the same structure in parallel; each FRFM sub-module is formed by connecting an information interaction branch and a CBAM structure in series, wherein the information interaction branch receives the two output feature maps {X, Y} processed by the parallel SAVT modules as the input of the branch; the information interaction branch in FRFM sub-module A is realized by the following formula:

$$X_{cf}=\mathrm{softmax}\!\left(\frac{Q_y K_x^{T}}{\sqrt{d_{kx}}}\right)V_x,\qquad Q_y=YW_Q,\; K_x=XW_K,\; V_x=XW_V,$$

wherein X_cf represents the output of the information interaction branch in FRFM sub-module A, W_Q, W_K, W_V represent different parameter matrices available for neural network learning, softmax(·) represents the softmax function, and the value of the scaling factor d_kx is equal to the dimension of K_x; the cross-fusion process of the corresponding branch in sub-module B is expressed as follows:

$$Y_{cf}=\mathrm{softmax}\!\left(\frac{Q_x K_y^{T}}{\sqrt{d_{ky}}}\right)V_y,\qquad Q_x=XW_Q',\; K_y=YW_K',\; V_y=YW_V',$$

wherein Y_cf represents the output of the information interaction branch in FRFM sub-module B, W_Q', W_K' and W_V' represent a further set of parameter matrices learned by the neural network, and the value of the scaling factor d_ky is equal to the dimension of K_y;
step 1.2, a change detection module formed by connecting a recovery submodule and a refinement submodule in series is established;
the recovery sub-module is formed by connecting a 3×3 convolution layer and an upsampling layer in series; the refinement sub-module is formed by serially connecting, in order, a 3×3 convolution layer, a SAVT module, a 2×2 deconvolution layer, a Batch Norm layer, a 3×3 convolution layer, a SAVT module, a 3×3 convolution layer, a Batch Norm layer and a 1×1 convolution layer;
step 1.3, connecting a feature extractor and a change detection module in series to form an edge auxiliary self-adaptive network EATDer;
step 2, generating a training set and a verification set:
step 2.1, collecting at least 1000 pairs of aligned double-phase remote sensing change detection image pairs, cutting all the aligned double-phase remote sensing change detection images into 256×256, then manufacturing a binarization label for each change detection image pair, marking the pixel points containing a change target in the image as positive examples, and marking the pixel points without the change target as negative examples;
step 2.2, processing each binarized label with the Canny operator to obtain an edge label image of the target label to be detected, the edge label images forming edge label 1, and then performing morphological dilation on edge label 1 with a 3×3 kernel to obtain edge label 2;
step 2.3, carrying out data enhancement on the double-phase remote sensing image pair and the change detection labels as well as the edge labels 1 and 2;
step 2.4, dividing all the images and labels after data enhancement according to a ratio of 4:1, combining the double-phase remote sensing image pair, the change detection label and the edge label 2 to form a training set 1 and a verification set 1, and combining the double-phase remote sensing image pair, the change detection label and the edge label 1 to form a training set 2 and a verification set 2;
step 3, training EATDers by using the training set and the verification set:
step 3.1, inputting the training set 1 into an EATDer, calculating a change detection prediction result and a target edge prediction result which are output by the EATDer and loss values of a corresponding change detection tag and a corresponding target edge tag 2 by adopting a joint loss function, and iteratively updating parameters of the EATDer network by using a gradient descent method until the joint loss function converges to obtain a pre-trained EATDer;
step 3.2, inputting training set 2 into the pre-trained EATDer, using the joint loss function to calculate the loss values between the change detection prediction result and target edge prediction result output by the EATDer and the corresponding change detection label and target edge label 1, iteratively updating the parameters of the EATDer network by the gradient descent method, inputting the verification set into the updated EATDer for verification after each iteration, and taking the network parameters with the highest accuracy among the 100 verification results as the final trained network parameters to obtain the trained EATDer;
step 4, detecting the change of the remote sensing image:
and cutting the aligned remote sensing image to be detected into an image with the size of 256 multiplied by 256, inputting the image into a trained EATDer network, and outputting the EATDer as a prediction result of the change detection.
2. The edge-aided adaptation-based change detection method of claim 1, wherein the joint loss function in step 3.1 and step 3.2 is as follows:

$$L = L_{bce}(M_c, G_c) + \lambda\, L_{bce}(M_e, G_e),\qquad L_{bce}(M,G) = -\big[G\log(M) + (1-G)\log(1-M)\big],$$

wherein λ represents a hyper-parameter with a value of 0.3, log(·) represents the logarithm with base 10, G_c represents the change detection label value, M_c represents the change detection prediction result output by the network, G_e represents the edge detection label value, and M_e represents the edge detection prediction result output by the network.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310339916.9A | 2023-03-31 | 2023-03-31 | Remote sensing image change detection method based on edge auxiliary self-adaption
Publications (1)

Publication Number | Publication Date
---|---
CN116363516A (en) | 2023-06-30
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117036962A | 2023-10-08 | 2023-11-10 | Aerospace Information Research Institute, Chinese Academy of Sciences | Remote sensing image change detection method, device, equipment and storage medium
CN117933309A | 2024-03-13 | 2024-04-26 | Xi'an University of Technology | Three-path neural network and method for detecting change of double-phase remote sensing image
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination