CN116310839A - Remote sensing image building change detection method based on feature enhancement network - Google Patents
Remote sensing image building change detection method based on feature enhancement network
- Publication number
- CN116310839A CN116310839A CN202310426990.4A CN202310426990A CN116310839A CN 116310839 A CN116310839 A CN 116310839A CN 202310426990 A CN202310426990 A CN 202310426990A CN 116310839 A CN116310839 A CN 116310839A
- Authority
- CN
- China
- Prior art keywords
- feature
- building
- input
- channel
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/176—Urban or other man-made structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a remote sensing image building change detection method based on a feature enhancement network, comprising the following steps. First step: prepare a data set. Second step: perform data enhancement. Third step: build and train a network model. Fourth step: detect building changes. By introducing a visual Transformer structure, spatial and channel attention, a U-shaped residual module, an enhanced feature extraction module, and a self-attention feature fusion module, the invention fully fuses information about different buildings, better distinguishes buildings of irregular shapes and different sizes to prevent false and missed detections, and improves the extraction of features of differently shaped buildings and their edge details. The invention achieves a higher F1 score and Kappa coefficient than advanced algorithms such as BIT and ChangeFormer.
Description
Technical Field
The invention relates to the technical field of remote sensing image processing, and in particular to a remote sensing image building change detection method based on a feature enhancement network.
Background
Since the beginning of the twenty-first century, living standards have gradually improved and urbanization has become increasingly important. As one of the hallmarks of urban construction, buildings largely represent urban change and are of great significance for urban planning and management. Change detection refers to the process of observing differences in the state of the same geographic location at different times; it is of great significance for building change detection, land resource utilization, post-disaster reconstruction, and the like.
Building change detection methods can be broadly classified into conventional change detection algorithms and deep learning-based change detection algorithms. Among conventional methods, change detection algorithms for remote sensing images can be roughly divided into direct comparison methods and post-classification comparison methods. Direct comparison methods mainly analyze geometric features, spectral textures, and the like of a building, and obtain change information by directly comparing images. Post-classification comparison algorithms first classify the ground objects in remote sensing images from different periods and then compare the classifications to determine the final changed and unchanged areas; here, the accuracy of change detection is mainly determined by the classification results.
However, as buildings become more complex and remote sensing images contain more and more object information, conventional methods increasingly fail to meet the demand. Conventional building change detection methods are mostly based on manually constructed features and are easily disturbed by factors such as noise and image registration errors. Moreover, the features constructed by traditional methods can only fit relatively simple buildings; fitting complex, abstract building features is difficult. In addition, constructing features for traditional algorithms requires extensive expertise and experience from professionals in different fields and consumes considerable manpower and material resources, and the currently prevalent approaches based on manual field investigation are inefficient. An automated, intelligent, and fast building change detection method is therefore increasingly needed.
With the development of space remote sensing technology, deep learning has been applied to change detection. It has strong modeling and learning capabilities: by establishing a series of models (such as UNet, STANet, and the like), features can be extracted from images and end-to-end change detection performed, improving both detection accuracy and speed.
However, some existing model algorithms suffer from insufficient feature extraction capability: they lose the edge details of changed buildings, miss small-scale target buildings in complex backgrounds, perform poorly on irregularly shaped target buildings, and struggle to distinguish changes between different buildings at nearby locations.
Disclosure of Invention
The invention aims to provide a remote sensing image building change detection method based on a feature enhancement network, which improves the feature representation capability of the network and thereby further improves change detection accuracy.
The invention adopts the technical scheme that:
A remote sensing image building change detection method based on a feature enhancement network specifically comprises the following steps. First step: prepare a data set. The public change detection dataset CDD is collected; the dataset comprises a training set, a validation set, and a test set. Each subset contains A, B, and OUT folders, which correspond to the pre-change image, the post-change image, and the label of actually changed buildings, respectively; each image is 256×256 pixels.
As a further improvement scheme of the technical scheme: and a second step of: data enhancement is performed. In order to enhance the identification capability of the network to buildings under different scenes and the robustness of the network, the generalization capability of the network is enhanced, and the data enhancement is carried out on the image by adopting methods of horizontal overturning, rotation and the like.
As a further improvement scheme of the technical scheme: and a third step of: and (5) building a network model and training. The input image is first input into a feature extractor to extract building features. The feature extractor consists of three parts: a primary feature extractor, an enhanced feature extractor, a ResNet decoder.
As a further improvement scheme of the technical scheme: the primary feature extractor consists mainly of a Unet code and a visual transducer structure. Each coding block of the Unet contains two convolutional layers, each of which outputs a feature map. These two feature maps are input into the VTS to obtain a larger receptive field and enhance the ability of the feature representation, ultimately outputting five feature maps.
As a further improvement scheme of the technical scheme: the fifth output feature map is input into an enhanced feature extractor, the enhanced feature extractor is composed of four modules, namely a space and channel attention module, a U-shaped residual module, an enhanced feature extraction module and a self-attention feature fusion module, which are combined to further enhance the representation capability and the robustness of the network to the building features.
As a further improvement scheme of the technical scheme: the space and channel attention module consists of space attention and channel attention, and the attention to the feature map is increased in the channel dimension and the space dimension, so that the expression capability of the network to important building features can be effectively enhanced.
As a further improvement scheme of the technical scheme: the U-shaped residual module may better capture global and local information to enhance building feature extraction, and the enhanced feature extraction module may improve the ability to extract representative building features from the channel and space dimensions.
As a further improvement scheme of the technical scheme: the self-attention feature fusion module fully merges feature information through different operations (such as summation, subtraction and stitching).
As a further improvement scheme of the technical scheme: the outputs of the enhancement feature extractor and the primary feature extractor are input into a ResNet decoder for decoding, and finally two feature graphs are output.
As a further improvement scheme of the technical scheme: the two output feature maps of the feature extractor are input into the cross-channel up and down Wen Yuyi aggregation module for full fusion of the channel information.
As a further improvement scheme of the technical scheme: the output signature of the cross-channel up and down Wen Yuyi aggregation module is input into a convolution layer to obtain the final change detection map.
As a further improvement scheme of the technical scheme: the loss function used for network training is a combination of cross entropy loss, dice loss and Focal loss to improve the impact of imbalance between changing and unchanged buildings.
As a further improvement scheme of the technical scheme: the test set samples are input into a trained network model to predict a building change map.
According to the invention, introducing the visual Transformer structure provides spatial correlation among feature maps of buildings at different levels and enhances the network's ability to recognize buildings at different positions. Furthermore, introducing spatial and channel attention filters out interference from irrelevant background information in both the spatial and channel dimensions, improving the detection of small-scale buildings.
Furthermore, the U-shaped residual module and the enhanced feature extraction module are designed to improve the extraction of features from differently shaped buildings and their edge details. The self-attention feature fusion module fully fuses information about different buildings and better distinguishes buildings of irregular shapes and different sizes, preventing false and missed detections.
Furthermore, a cross-channel context semantic aggregation module is designed to aggregate information in the channel dimension, making better use of contextual semantic information and reducing information loss when merging feature maps, thereby improving the network's ability to detect buildings. The invention achieves a higher F1 score and Kappa coefficient than advanced algorithms such as BIT and ChangeFormer.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a change detection flow in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network model structure according to an embodiment of the present invention;
- FIG. 3 is a schematic view of the visual Transformer structure according to an embodiment of the present invention;
FIG. 4 is a block diagram of a space and channel attention module in accordance with an embodiment of the present invention;
FIG. 5 is a block diagram of a u-shaped residual module and an enhanced feature extraction module in accordance with an embodiment of the present invention;
FIG. 6 is a block diagram of a self-attention feature fusion module in accordance with an embodiment of the present invention;
- FIG. 7 is a block diagram of the cross-channel context semantic aggregation module in accordance with an embodiment of the present invention;
FIG. 8 is a comparison of the detection results of the building change according to the present invention with other prior art advanced methods.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, 2 and 3, the present invention includes the steps of:
The first step: prepare a data set. The public change detection dataset CDD is collected; it comprises 11 pairs of remote sensing images with seasonal changes, of which 7 pairs are 4725×2700 pixels and 4 pairs are 1900×1000 pixels, with resolutions from 0.03 m/pixel to 1 m/pixel. The images are cut into 256×256-pixel tiles: 10000 for training, 3000 for validation, and 3000 for testing.
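The tiling arithmetic above (large scenes cut into 256×256 crops) can be sketched as follows; the function name and the discard-partial-border behavior are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def tile_image(img, tile=256):
    """Cut an image (H, W, C) into non-overlapping tile x tile patches,
    discarding any partial border strip (an assumed convention)."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patches.append(img[y:y + tile, x:x + tile])
    return patches

# A 4725 x 2700 scene yields floor(4725/256) * floor(2700/256) = 18 * 10 tiles.
img = np.zeros((2700, 4725, 3), dtype=np.uint8)  # (H, W, C)
print(len(tile_image(img)))  # 180
```

Under this convention, the 11 image pairs alone do not reach the 16000 tiles reported, so the actual CDD preparation presumably uses overlapping crops; the sketch only shows the basic mechanism.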
The second step: perform data enhancement. To enhance the network's ability to identify buildings in different scenes, its robustness, and its generalization capability, data enhancement is performed on the images using methods such as horizontal flipping and rotation.
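A minimal sketch of this step, under the assumption that the same random flip and rotation must be applied jointly to the pre-change image, post-change image, and label so the pair stays aligned (function name and exact augmentation set are hypothetical):

```python
import numpy as np

def augment_pair(img_a, img_b, label, rng):
    """Apply one shared random horizontal flip and one shared 90-degree
    rotation to the pre-change image, post-change image, and label."""
    if rng.random() < 0.5:                      # horizontal flip, p = 0.5
        img_a, img_b, label = (np.flip(x, axis=1) for x in (img_a, img_b, label))
    k = rng.integers(0, 4)                      # rotate by k * 90 degrees
    img_a, img_b, label = (np.rot90(x, k) for x in (img_a, img_b, label))
    return img_a, img_b, label

rng = np.random.default_rng(0)
a = np.arange(16).reshape(4, 4)
a2, b2, l2 = augment_pair(a, a.copy(), a.copy(), rng)
print(a2.shape)  # (4, 4)
```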
The third step: build and train the network model. The model structure of the feature enhancement network is shown in FIG. 2. The input image is first fed into a feature extractor to extract building features. The feature extractor consists of three parts: a primary feature extractor, an enhanced feature extractor, and a ResNet decoder.
The primary feature extractor mainly comprises a UNet encoder and a visual Transformer structure. Each encoding block of the UNet contains two convolutional layers, each of which outputs a feature map. These two feature maps are input into the VTS to obtain a larger receptive field and enhance the feature representation capability. Finally, the primary feature extractor outputs five feature maps. The fifth feature map is input into the enhanced feature extractor, which consists of four modules, for further feature enhancement. The first four output features and the fifth, enhanced feature are input into the ResNet structure for decoding. After decoding, the feature extractor outputs two feature maps of the same size. These two feature maps are input into the cross-channel context semantic aggregation module for fusion. Finally, the output change detection map is obtained through two convolutional layers with 3×3 kernels and one convolutional layer with a 1×1 kernel; the output change map has 2 channels.
Primary feature extractor
As shown in fig. 2(b), the primary feature extractor employs a UNet encoding structure and a visual Transformer structure (VTS). Each encoding block contains two convolutional layers, each of which outputs a feature map. These two feature maps are input into the VTS, whose structure is shown in fig. 3. Because the receptive field of the feature map after the second convolution is larger than before it, the second feature map contains more semantic information. After patch embedding, the first feature map is used as the query vector, and the second feature map is used as the key and value vectors. The number of attention heads is set to 12 in the present invention. Meanwhile, the first feature is used as a position matrix and superimposed on the second feature as input. To reduce network parameters, the number of Transformer blocks is set to 1. After transforming the scale of the feature map, its size is 768×16×16. It is then input into a transposed convolution layer to change the size and channel number of the feature map; the final output feature map has the same size as the input feature map. The ResNet feature decoder mainly uses part of the ResNet18 network architecture: it first expands the input feature map to twice its original size by a transposed convolution with a 7×7 kernel. The feature map is then input into a residual module, followed by a Dropout layer to reduce overfitting. It is then spliced with the feature map of the corresponding scale and input into another residual module, and this process repeats until the final output feature map is obtained.
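The query/key/value assignment described above (first feature map as query, second as key and value) can be sketched as a single attention head; the 12 heads, learned projection matrices, and position matrix of the actual VTS are omitted, so this is an illustrative reduction rather than the patented structure:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vts_cross_attention(f1, f2):
    """Single-head sketch: the first (shallower) feature map supplies the
    query; the second (larger-receptive-field) map supplies key and value.
    f1, f2: (N, D) token matrices after patch embedding."""
    q, k, v = f1, f2, f2
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (N, N) scaled dot products
    return softmax(scores, axis=-1) @ v         # (N, D) attended output

rng = np.random.default_rng(1)
f1, f2 = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
print(vts_cross_attention(f1, f2).shape)  # (16, 8)
```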
Space and channel attention module
As shown in fig. 4, channel attention uses an adaptive average pooling operation to pool each channel of the input feature map. Two fully connected layers are then used to reduce the feature parameters, and the ReLU function adds nonlinearity. The fully connected result is input into a sigmoid function to perform weight normalization, and the weights are multiplied by each element of the input feature map to obtain the channel attention feature map. Spatial attention employs both average pooling, which takes global feature information into account, and max pooling, which mines the representative features of a building, using pooling kernels of different sizes: the left branch uses a 3×3 pooling kernel and the middle branch a 5×5 pooling kernel. Smaller pooling kernels capture finer target features, while larger pooling kernels mine richer target features. After pooling, a 1×1 convolutional layer adjusts the number of channels. The results of the different pooling operations are then spliced together to fuse information. The result is input into two convolutional layers to obtain initial weights, and the final weights are computed by a sigmoid function. Finally, the spatial attention and channel attention outputs are added to obtain the final output.
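The channel branch just described (adaptive average pooling, two fully connected layers with ReLU between them, sigmoid weight normalization, element-wise scaling) can be sketched in NumPy; the weight shapes and reduction ratio are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel-attention sketch. x: (C, H, W); w1: (C//r, C) and
    w2: (C, C//r) are hypothetical fully connected weights."""
    pooled = x.mean(axis=(1, 2))                 # (C,) adaptive average pool
    hidden = np.maximum(w1 @ pooled, 0.0)        # FC + ReLU reduces parameters
    weights = sigmoid(w2 @ hidden)               # (C,) normalized weights
    return x * weights[:, None, None]            # reweight every channel

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 4, 4))
w1, w2 = rng.standard_normal((2, 8)), rng.standard_normal((8, 2))
print(channel_attention(x, w1, w2).shape)  # (8, 4, 4)
```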
u-shape residual error module and enhanced feature extraction module
As shown in fig. 5(a), the U-shaped residual module is divided into two parts: an upper branch and a lower branch. A max pooling layer with a 2×2 pooling kernel reduces the input feature map to half its original size. Deep semantic features are then extracted through four convolutional layers, with skip connections reusing earlier information to reduce information loss. The outputs of the fourth and third convolutional layers are spliced together and input into a convolutional layer; after three such operations, the output is upsampled. The output and input features are then added. The lower branch performs the same operations, except that max pooling is replaced by average pooling.
As shown in fig. 5(b), the enhanced feature extraction module is divided into a left branch and a right branch. The left branch mainly extracts feature information in the spatial dimension, and the right branch mainly mines features in the channel dimension. The left branch first compresses the channels of the input feature to 1 by a 1×1 convolution; its spatial features are then extracted by two convolutional layers. The result is fed to a sigmoid function to perform weight normalization, and the weights are multiplied by the input features, assigning a weight to each feature element; the left branch output is then obtained through addition. In the right branch, the left side uses an average pooling operation to pool each channel of the input feature, while the right side uses a max pooling operation, so that feature information is considered from both global and local perspectives. The channels are then compressed and expanded using two 1×1 convolutional layers, and the result is activated by the sigmoid function. After multiplying the activation result by the input feature, the left- and right-side features are added to obtain the final output.
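The left (spatial) branch can be sketched as follows, with the two intermediate convolutional layers omitted; the weight vector standing in for the 1×1 convolution is hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def efe_spatial_branch(x, w):
    """Spatial-branch sketch: a 1x1 convolution compresses the channels to
    one, a sigmoid turns the result into a per-pixel weight map, and the
    reweighted input is added back to the input.
    x: (C, H, W); w: (C,) hypothetical 1x1-conv weights."""
    squeezed = np.tensordot(w, x, axes=(0, 0))   # (H, W): 1x1 conv to 1 channel
    weights = sigmoid(squeezed)                  # per-pixel spatial weights
    return x * weights[None] + x                 # reweight, then additive merge

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 4, 4))
print(efe_spatial_branch(x, rng.standard_normal(8)).shape)  # (8, 4, 4)
```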
Self-attention feature fusion module
As shown in fig. 6, to better fuse feature information, three operations are used: addition, subtraction, and concatenation. Two convolutional layers are then used to obtain deep representative features, and 1×1 convolutions adjust the number of channels. The self-attention mechanism can correlate pixels at different locations, which identifies buildings well; therefore, the subtraction branch is taken as the query vector, the addition branch as the key vector, and the concatenation branch as the value vector. The subtraction output undergoes reshape and transpose operations, and the addition output undergoes reshape. They are then multiplied, and the product is activated by a sigmoid function; meanwhile, the feature information of the concatenation branch is fused using two convolutional layers. The result is then multiplied by the final weight matrix to build long-range spatial dependencies, and the product is reshaped. Finally, this result and the two input features are added element by element, so the final output is a weighted sum of the input features and all position features.
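Under the stated assignment (subtraction as query, addition as key, concatenation as value), a stripped-down sketch of the fusion might look like this; the convolutional layers and exact reshape/transpose bookkeeping of the real module are simplified away, and the slice standing in for the channel-adjusting 1×1 convolution is a crude assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def saff(f1, f2):
    """Self-attention feature fusion sketch. f1, f2: (N, D) flattened
    bitemporal features; the sigmoid-activated query-key product weights
    the concatenated value, and both inputs are added back."""
    q, k = f1 - f2, f1 + f2                      # subtraction / addition branches
    v = np.concatenate([f1, f2], axis=-1)        # (N, 2D) spliced value
    attn = sigmoid(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) position weights
    fused = (attn @ v)[:, :f1.shape[-1]]         # slice back to (N, D)
    return fused + f1 + f2                       # weighted sum plus inputs

rng = np.random.default_rng(4)
f1, f2 = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
print(saff(f1, f2).shape)  # (16, 8)
```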
Cross-channel up-down Wen Yuyi aggregation module
As shown in fig. 7, in this module the middle feature map is obtained by channel-splicing the left and right feature maps. The middle feature is then compressed to size 1×1 by an adaptive average pooling layer, its channel number is adjusted by a 1×1 convolutional layer, and the convolution output is spliced with the middle feature map. The splicing result is passed to another 1×1 convolutional layer, whose output is input to the right branch for connection and fusion; the weight matrix is normalized using a sigmoid function. Through the left and right branches, the aggregation and fusion of channel information is fully realized. Two convolutional layers extract multi-scale features of the left- and right-branch features. Finally, the weight matrix is multiplied by the convolution result and added to it element by element, so that the channel information of the output feature maps intermixes well.
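A reduced sketch of the channel-aggregation idea (splice, pool to a 1×1 channel descriptor, sigmoid-normalize, reweight, and merge), under the assumption that the module's convolutional layers can be omitted for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ccsa(left, right):
    """Cross-channel context semantic aggregation sketch.
    left, right: (C, H, W) decoder outputs."""
    mid = np.concatenate([left, right], axis=0)  # (2C, H, W) channel splicing
    desc = mid.mean(axis=(1, 2))                 # adaptive average pool to 1x1
    weights = sigmoid(desc)                      # normalized channel weights
    return mid * weights[:, None, None] + mid    # reweight, element-wise merge

rng = np.random.default_rng(5)
l, r = rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 8, 8))
print(ccsa(l, r).shape)  # (8, 8, 8)
```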
The loss function used in training the network is a combination of cross-entropy loss, Dice loss, and Focal loss, as follows, to mitigate the impact of the imbalance between changed and unchanged buildings.
L = L_bce + δ·L_dc + φ·L_fc (4)
where y_n denotes the true surface change, p_n denotes the predicted building change, and H and W denote the height and width of the image, respectively. α is a hyperparameter with α ≥ 0; p is the model's estimated probability, with values in [0, 1]. δ and φ balance the losses L_bce, L_dc, and L_fc.
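The combined loss of equation (4) can be sketched per image as follows; the exact Dice and Focal formulations, and the default values of δ, φ, α, and γ, are standard choices assumed for illustration, since the patent only constrains α ≥ 0:

```python
import numpy as np

def combined_loss(p, y, delta=1.0, phi=1.0, alpha=0.25, gamma=2.0, eps=1e-7):
    """L = L_bce + delta * L_dc + phi * L_fc over a predicted change
    probability map p and binary label map y, both of shape (H, W)."""
    p = np.clip(p, eps, 1 - eps)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))      # cross entropy
    dice = 1 - (2 * (p * y).sum() + eps) / (p.sum() + y.sum() + eps)
    pt = np.where(y == 1, p, 1 - p)              # probability of the true class
    focal = np.mean(-alpha * (1 - pt) ** gamma * np.log(pt))
    return bce + delta * dice + phi * focal

rng = np.random.default_rng(6)
y = (rng.random((8, 8)) > 0.5).astype(float)
p = rng.random((8, 8))
print(combined_loss(p, y) > 0)  # True
```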
The fourth step: detect building changes. After network training is finished and has converged, the test set sample images are passed through the network to output change detection maps.
To verify the effectiveness of the invention, the CDD data set is used to train and test different algorithm models under the same environment. The algorithms used for comparison are STANet, SNUNet, BIT, ChangeFormer, and IDET. Five evaluation indices are used: Overall Accuracy, Precision, Recall, F1-score, and Kappa coefficient. F1 is the harmonic mean of precision and recall; the larger the value, the better. The specific evaluation index results are shown in Table 1.
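The five evaluation indices can be computed from a binary confusion matrix as follows (a straightforward sketch, not taken from the patent's own code):

```python
import numpy as np

def change_metrics(pred, label):
    """Overall accuracy, precision, recall, F1, and Kappa from binary
    prediction and label maps (1 = changed, 0 = unchanged)."""
    tp = np.sum((pred == 1) & (label == 1))
    tn = np.sum((pred == 0) & (label == 0))
    fp = np.sum((pred == 1) & (label == 0))
    fn = np.sum((pred == 0) & (label == 1))
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, precision, recall, f1, kappa

pred = np.array([1, 1, 0, 0, 1, 0])
label = np.array([1, 0, 0, 0, 1, 1])
oa, p, r, f1, k = change_metrics(pred, label)
print(round(oa, 3), round(f1, 3))  # 0.667 0.667
```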
As can be seen from Table 1, the method of the invention surpasses all 5 existing advanced methods in every metric, which demonstrates its effectiveness.
FIG. 8 compares the building change detection results of the method of the present invention with those of other prior art methods.
As can be seen from fig. 8, the proposed model produces more complete and accurate detection results. The method distinguishes irregularly shaped target buildings well, separates changes between different buildings at nearby locations, filters out the influence of background noise, and enhances the detection of small-scale target buildings.
The present invention has been described in detail with reference to the drawings and the embodiments, but the present invention is not limited to the embodiments; various changes can be made on the basis of the technical matters disclosed above, within the knowledge of those skilled in the art, without departing from the spirit of the present invention. The invention may also be practiced otherwise than as specifically described.
In the description of the present invention, it should be noted that orientation terms such as "center", "lateral", "longitudinal", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" indicate orientations or positional relationships based on those shown in the drawings; they are used merely for convenience and simplicity of description, do not indicate that the device or element referred to must have a specific orientation, configuration and operation, and are not to be construed as limiting the scope of protection of the present invention.
It should be noted that the terms "comprises" and "comprising," along with any variations thereof, in the description and claims of the present application are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed.
Note that the above is only a preferred embodiment of the present invention together with the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions without departing from the scope of the invention. Therefore, while the present invention has been described in connection with the above embodiments, it is to be understood that the invention is not limited to the specific embodiments disclosed and that many other and equally effective embodiments may be devised without departing from the spirit of the invention, the scope of which is determined by the appended claims.
Claims (10)
1. A remote sensing image building change detection method based on a feature enhancement network, characterized by comprising the following steps:
first step: preparing a dataset: collecting the public change detection dataset CDD, which comprises a training set, a validation set and a test set; each subset comprises folders A, B and OUT, corresponding respectively to the pre-change image, the post-change image and the ground-truth changed-building label, and the size of each image is 256×256 pixels;
second step: performing data augmentation: in order to enhance the network's ability to identify buildings in different scenes, its robustness and its generalization capability, the images are augmented by methods such as horizontal flipping and rotation;
third step: building the network model and training it; the network model comprises:
a feature extractor: the input images are first fed into the feature extractor, which extracts building features;
a cross-channel contextual semantic aggregation module: for sufficient fusion of channel information;
fourth step: building change detection: the test-set samples are input into the trained network model to predict a building change map.
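The data augmentation of the second step must apply the same random transform to the pre-change image, the post-change image and the label so the triplet stays spatially aligned; the following numpy sketch is a hypothetical helper illustrating this, assuming horizontal flips and multiples of 90° rotation as the augmentations:

```python
import numpy as np

def augment_pair(img_a, img_b, label, rng):
    """Apply the SAME random geometric transform to the pre-change image (A),
    the post-change image (B) and the change label so the triplet stays aligned.
    Images are (C, H, W) arrays, the label is (H, W)."""
    if rng.random() < 0.5:  # horizontal flip with probability 0.5
        img_a, img_b, label = (np.flip(x, axis=-1) for x in (img_a, img_b, label))
    k = int(rng.integers(0, 4))  # rotate by k * 90 degrees
    img_a, img_b, label = (np.rot90(x, k, axes=(-2, -1)) for x in (img_a, img_b, label))
    return img_a, img_b, label
```

Drawing the random decisions once and reusing them for all three arrays is what keeps the bi-temporal pair and its label consistent.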
2. The remote sensing image building change detection method based on the feature enhancement network according to claim 1, wherein the feature extractor comprises:
a primary feature extractor: for enhancing feature expression capability; the primary feature extractor mainly consists of a UNet encoder and a visual Transformer structure (VTS); each encoding block of the UNet comprises two convolution layers, and each convolution layer outputs a feature map; these two feature maps are input into the VTS to obtain a larger receptive field and enhance the feature representation capability, finally outputting five feature maps; specifically:
after patch embedding, the first feature is used as the query vector;
the second feature is used as the key vector and the value vector;
the number of heads of the multi-head attention is set to 12 in the present invention; the first feature is also used as a position matrix, which is superimposed on the second feature as input; the number of Transformer blocks is set to 1;
after transforming the scale of the feature map, its size is 768×16×16; the feature maps are then input into a transposed convolution layer to change their size and number of channels, finally outputting a feature map of the same size as the input feature map;
an enhancement feature extractor: for further enhancing feature expression capability;
and a ResNet decoder: the outputs of the enhancement feature extractor and the primary feature extractor are input into the ResNet decoder for decoding, finally outputting two feature maps.
3. The remote sensing image building change detection method based on the feature enhancement network according to claim 2, wherein the fifth output feature map is input into the enhancement feature extractor to further enhance feature expression capability; the enhancement feature extractor is composed of four modules, namely a spatial and channel attention module, a U-shaped residual module, an enhanced feature extraction module and a self-attention feature fusion module, which combine to further enhance the network's representation capability for, and robustness to, building features; the spatial and channel attention module consists of spatial attention and channel attention, increases attention to the feature map in the channel and spatial dimensions, and can effectively enhance the network's expression of important building features; the U-shaped residual module is used to enhance the extraction of building features and better capture global and local information; the enhanced feature extraction module is used to improve the ability to extract representative building features in the channel and spatial dimensions; and the self-attention feature fusion module fully combines feature information through summation, difference and concatenation.
4. A remote sensing image building change detection method based on a feature enhancement network according to claim 3, wherein the ResNet feature decoder mainly uses part of the ResNet18 network structure; first, the size of the input feature map is expanded to twice its original size by a transposed convolution with a kernel size of 7×7; the feature map is then input into a residual module, which is followed by a Dropout layer to reduce overfitting; the result is then concatenated with the feature map of the corresponding scale and input into a residual module, and this process is repeated until the final output feature map is obtained.
5. The remote sensing image building change detection method based on the feature enhancement network according to claim 3, wherein the spatial and channel attention module specifically comprises the following steps:
channel attention uses an adaptive average pooling operation to pool each channel of the input feature map;
two fully connected layers are then used to reduce the feature parameters, and the ReLU function is used to increase nonlinearity;
and the fully connected result is input into a sigmoid function to perform weight normalization, and the weights are multiplied with each element of the input feature map to obtain the channel attention feature map.
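The channel attention steps of claim 5 (adaptive average pooling, two fully connected layers with ReLU between them, sigmoid weight normalization, channel-wise reweighting) follow the familiar squeeze-and-excitation pattern; a minimal numpy sketch, with hypothetical weight matrices w1 and w2 standing in for the two fully connected layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention on a (C, H, W) feature map: adaptive average pool to
    one value per channel, two FC layers (reduce then expand) with ReLU in
    between, sigmoid weight normalization, and channel-wise reweighting."""
    squeeze = x.mean(axis=(1, 2))           # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)  # first FC + ReLU
    weights = sigmoid(w2 @ hidden)          # second FC + sigmoid, each in (0, 1)
    return x * weights[:, None, None]       # scale each channel by its weight
```

Since every weight lies in (0, 1), the module can only attenuate channels, emphasizing those most informative for buildings.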
6. The remote sensing image building change detection method based on the feature enhancement network according to claim 3, wherein the loss function used in the network training in the third step is a combination of cross-entropy loss, Dice loss and Focal loss, used to mitigate the impact of the imbalance between changed and unchanged buildings, as expressed by the following formula:
L = L_bce + δL_dc + φL_fc (4)
where y_n denotes the true surface change, p_n denotes the predicted building change, and H and W denote the height and width of the image, respectively; α is a hyperparameter with α ≥ 0; p is the model's estimated probability, with value range [0, 1]; δ and φ are used to balance the losses; L_bce is the cross-entropy loss function, and L_dc and L_fc are the Dice loss function and the Focal loss function, respectively.
7. The remote sensing image building change detection method based on the feature enhancement network according to claim 3, wherein the U-shaped residual module is divided into two parts, an upper branch and a lower branch, applied as follows:
the size of the input feature map is reduced to half its original size with a max-pooling layer having a 2×2 pooling kernel;
deep semantic features are then extracted through four convolution layers, and skip connections use this information to reduce information loss;
the outputs of the fourth convolution layer and the third convolution layer are concatenated and input into a convolution layer;
after three identical operations are performed, the output is up-sampled and then added to the input features;
and the lower branch operates the same as the upper branch, except that max pooling is replaced by average pooling.
8. The remote sensing image building change detection method based on the feature enhancement network according to claim 3, wherein the enhanced feature extraction module is divided into a left branch and a right branch; the left branch mainly extracts feature information in the spatial dimension, and the right branch mainly mines features in the channel dimension; the left branch first compresses the channels of the input feature to 1 by a 1×1 convolution, then extracts its spatial features through two convolution layers; the result is fed to a sigmoid function to perform weight normalization; the weights are multiplied with the input feature, assigning a weight to each feature element; finally, the left-branch output is obtained through addition; in order to comprehensively consider feature information from global and local perspectives, the left side of the right branch uses an average pooling operation to pool each channel of the input feature, while the right side of the right branch uses a max pooling operation; the channels are then compressed and expanded using two 1×1 convolution layers, and the result is activated by a sigmoid function; after multiplying the activation result with the input feature, the left and right features are added to obtain the final output.
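The right (channel) branch of claim 8, which gates the input with both an average-pooled (global) and a max-pooled (local) channel descriptor, can be sketched as follows; treating the two 1×1 convolutions as plain channel matmuls and sharing them across both descriptors are simplifying assumptions of this sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_branch(x, w_down, w_up):
    """Channel-branch sketch for a (C, H, W) feature map: average- and
    max-pooled channel descriptors each pass through shared compress/expand
    weights (stand-ins for the 1x1 convolutions) and a sigmoid gate; the two
    reweighted features are then added."""
    avg_desc = x.mean(axis=(1, 2))  # global view of each channel
    max_desc = x.max(axis=(1, 2))   # peak (local) view of each channel
    out = np.zeros_like(x)
    for desc in (avg_desc, max_desc):
        gate = sigmoid(w_up @ np.maximum(w_down @ desc, 0.0))  # (C,) in (0, 1)
        out = out + x * gate[:, None, None]
    return out
```

Combining the two pooled descriptors is what lets the branch weigh channels by both their overall response and their strongest local response.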
9. The remote sensing image building change detection method based on the feature enhancement network according to claim 3, wherein, when the number of channels is adjusted by 1×1 convolution, the self-attention mechanism can correlate pixels at different positions, so that buildings can be well identified; the self-attention feature fusion module therefore works as follows: the subtraction branch is taken as the query vector, the addition branch as the key vector, and the concatenation branch as the value vector; reshape and transpose operations are performed, and the subtraction output is reshaped; the two are then multiplied, the multiplication result is activated using a sigmoid function, and the feature information of the concatenation branch is fused using two convolution layers; the result is then multiplied by a final weight matrix to construct long-range spatial dependencies, and the multiplication result is reshaped; finally, this result and the two input features are added element by element, so that the final output is a weighted sum of the input features and all position features.
10. The remote sensing image building change detection method based on the feature enhancement network according to claim 1, wherein, in the cross-channel contextual semantic aggregation module, an intermediate feature map is obtained by channel-wise concatenation of the left and right features; the intermediate feature is compressed to a size of 1×1 by an adaptive average pooling layer, and the number of channels is adjusted by a 1×1 convolution layer; the convolution output is concatenated with the intermediate feature, the concatenation result is passed to a 1×1 convolution layer, and the convolution result is passed to the right branch for connection and fusion; a sigmoid function is used to normalize the weight matrix, so that aggregation and fusion of channel information can be fully realized through the left and right branches; two convolution layers are used to extract multi-scale features from the left-branch and right-branch features; finally, the weight matrix is multiplied with the convolution result and added element by element, so that the channel information of the output feature maps is well intersected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310426990.4A CN116310839A (en) | 2023-04-20 | 2023-04-20 | Remote sensing image building change detection method based on feature enhancement network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310426990.4A CN116310839A (en) | 2023-04-20 | 2023-04-20 | Remote sensing image building change detection method based on feature enhancement network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116310839A true CN116310839A (en) | 2023-06-23 |
Family
ID=86787251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310426990.4A Pending CN116310839A (en) | 2023-04-20 | 2023-04-20 | Remote sensing image building change detection method based on feature enhancement network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310839A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117152621A (en) * | 2023-10-30 | 2023-12-01 | 中国科学院空天信息创新研究院 | Building change detection method, device, electronic equipment and storage medium |
CN117152621B (en) * | 2023-10-30 | 2024-02-23 | 中国科学院空天信息创新研究院 | Building change detection method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111539316B (en) | High-resolution remote sensing image change detection method based on dual-attention twin network | |
CN110705457B (en) | Remote sensing image building change detection method | |
CN111080629B (en) | Method for detecting image splicing tampering | |
CN112668494A (en) | Small sample change detection method based on multi-scale feature extraction | |
CN111738110A (en) | Remote sensing image vehicle target detection method based on multi-scale attention mechanism | |
CN113569788B (en) | Building semantic segmentation network model training method, system and application method | |
Yin et al. | Attention-guided siamese networks for change detection in high resolution remote sensing images | |
CN103745453B (en) | Urban residential areas method based on Google Earth remote sensing image | |
CN114187520B (en) | Building extraction model construction and application method | |
CN115601661A (en) | Building change detection method for urban dynamic monitoring | |
CN116310839A (en) | Remote sensing image building change detection method based on feature enhancement network | |
CN113569724A (en) | Road extraction method and system based on attention mechanism and dilation convolution | |
CN114494821A (en) | Remote sensing image cloud detection method based on feature multi-scale perception and self-adaptive aggregation | |
CN111275694B (en) | Attention mechanism guided progressive human body division analysis system and method | |
CN114494870A (en) | Double-time-phase remote sensing image change detection method, model construction method and device | |
CN115376019A (en) | Object level change detection method for heterogeneous remote sensing image | |
CN117475236B (en) | Data processing system and method for mineral resource exploration | |
CN112818818B (en) | Novel ultra-high-definition remote sensing image change detection method based on AFFPN | |
CN113298689B (en) | Large-capacity image steganography method | |
CN117173573A (en) | Urban building type change remote sensing detection method | |
CN114463175B (en) | Mars image super-resolution method based on deep convolutional neural network | |
CN115527118A (en) | Remote sensing image target detection method fused with attention mechanism | |
CN115147727A (en) | Method and system for extracting impervious surface of remote sensing image | |
CN116958800A (en) | Remote sensing image change detection method based on hierarchical attention residual unet++ | |
CN113963271A (en) | Model for identifying impervious surface from remote sensing image and method for training model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||