CN113609896B - Object-level remote sensing change detection method and system based on dual-related attention


Info

Publication number
CN113609896B
CN113609896B (application CN202110692812.7A)
Authority
CN
China
Prior art keywords
features
dual
feature
attention
change detection
Prior art date
Legal status
Active
Application number
CN202110692812.7A
Other languages
Chinese (zh)
Other versions
CN113609896A (en
Inventor
胡翔云
张琳
张觅
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110692812.7A priority Critical patent/CN113609896B/en
Publication of CN113609896A publication Critical patent/CN113609896A/en
Application granted granted Critical
Publication of CN113609896B publication Critical patent/CN113609896B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention provides an object-level remote sensing change detection method and system based on dual-related attention. Data enhancement dedicated to change detection is performed to generate a double input stream; a backbone network with shared weights is set up to receive the double input stream and extract features of different scales from the double-phase images; a dual-related-attention-guided feature fusion neck focuses on the spatial-level and channel-level correlation of same-scale dual-temporal features to obtain refined difference features, and a refinement path aggregation pyramid module fuses features among layers of different scales; finally, the difference features of different scales are sent into a change detection head, which predicts the position, size and change confidence of the changed ground features in the form of bounding boxes. The data enhancement method dedicated to change detection accelerates model training and improves model performance, and the guidance of the dual-related attention mechanism effectively resists pseudo-change interference in image pairs, giving higher accuracy and robustness.

Description

Object-level remote sensing change detection method and system based on dual-related attention
Technical Field
The invention belongs to the field of automatic change detection of remote sensing images, and particularly relates to an object-level remote sensing change detection method and system based on dual-related attention.
Background
Change detection is the process of identifying differences by observing and comparing the states of objects or phenomena at different times. More precisely, the purpose of change detection is to find the change information of a semantic category of particular interest while filtering out irrelevant change information. It has long been one of the most important problems in the remote sensing field. Change detection is now widely used in applications such as urban planning, land resource management, environmental monitoring, agricultural investigation and disaster assessment, and has great research value.
Current change detection methods can be classified, according to the granularity of the basic analysis unit, into pixel-level, object-level and scene-level methods. Pixel-level methods typically extract changing ground features and classify each pixel as changed or unchanged at the input resolution, outputting a pixel-level binary mask prediction. Object-level change detection generally acquires objects by segmentation or detection, determines whether each object has changed by extracting and comparing its multi-temporal features, and outputs object-level predictions such as binary masks of object instances or bounding boxes of objects. Scene-level methods analyze, at the semantic level, whether the category of a scene has changed at different times and what the change is; they classify images or image slices and provide image-level label predictions. Since change detection usually requires determining changes of a particular class of regions or objects, pixel-level and object-level methods are used more often.
For pixel-level change detection, representative conventional methods include Principal Component Analysis (PCA) and Change Vector Analysis (CVA). Given the poor adaptability of manually designed features to high-resolution images and the superiority of deep learning in feature extraction, more and more change detection studies have introduced deep learning methods. These models have large receptive fields and perform far better than traditional methods, but they still operate at the pixel level. Pixel-level methods always carry an implicit constraint: high-precision co-registration between data of different phases, i.e., strict alignment of the earlier and later images. However, shape changes due to viewing-angle differences and the projected shadows cast by high-rise buildings tend to create false change regions. Furthermore, recent studies have shown that modern deep neural networks are not strictly translation-invariant. Therefore, under supervision in the "pixel label" mode, pixel-level methods that rely on pixel correspondence have difficulty avoiding such pseudo-changes.
Conventional object-based change detection methods use an "object" as the analysis unit and can reduce misjudgment of pseudo-change regions to some extent. Notably, an "object" in an object-based approach is typically a local cluster of pixels obtained by segmenting the image using spectral and texture features, geometric features (e.g., shape and area) and other information. However, owing to the limitations of conventional hand-crafted feature extraction, these segmented regions often depend on thresholds, making the "objects" prone to over-segmentation and boundary fragmentation; they lack semantic integrity and do not truly correspond to actual geographic entities.
Disclosure of Invention
In order to overcome the technical problems, the invention provides an object-level remote sensing change detection scheme based on dual-related attention by utilizing a deep learning technology, which is used for detecting changed geographic entities (such as newly added buildings and changed artificial ground objects). Compared with the pixel-level method and the traditional object-based method, the method disclosed by the invention focuses more on the overall information and context connection of the changed geographic entity, and can effectively resist pseudo-change interference in the image.
The invention provides an object-level remote sensing change detection method based on dual-related attention, which comprises the following steps,
step 1, data enhancement for change detection is carried out to generate a double input stream, the implementation mode comprises that random time sequence exchange and combination are carried out on four pairs of images to obtain a pair of synthesized images, enhancement operation is carried out on one of the synthesized images, the enhancement operation comprises random geometric transformation, brightness transformation, gaussian noise addition and color transformation, and only corresponding geometric transformation is carried out on the other synthesized image;
step 2, setting a backbone network with shared weight for receiving the double input streams and extracting different scale characteristics of the double-phase images;
step 3, setting a feature fusion neck for dual-related attention guidance on the basis of a backbone network, wherein a dual-related attention module is arranged in the feature fusion neck for guiding the network to focus on the correlation of the spatial hierarchy and the channel hierarchy of dual-temporal features with the same scale so as to acquire refined difference features, and a refinement path aggregation pyramid module is arranged for fusing features among layers with different scales;
and 4, finally, sending the difference features with different scales into a change detection head, and predicting the position, the size and the change confidence of the change feature in a boundary frame mode.
In addition, in the backbone network sharing the weight, feature reuse is realized through a layered feature fusion strategy, and excessively repeated gradient information is truncated and used for extracting parallel features of different scales of the double-phase images.
Moreover, a twin network is constructed from the CSPDarkNet-53 network as the weight-sharing backbone network, and features of five scales are extracted, denoted as layers C1-C5 sequentially from the top layer to the bottom layer.
And the feature fusion neck of the dual collaborative attention guide fuses the features of the C3-C5 layers of the twin CSPDarknet-53 network by adopting three dual relevant attention modules to generate P3-P5 layer features, and then fuses the difference features among different scale layers by adopting a refinement path aggregation pyramid module.
Moreover, the dual-correlation attention module comprises a spatial correlation attention module, a channel correlation attention module and a change difference module; the parallel features F_a and F_b from the same scale layer are concatenated to obtain the feature F_ab, and the difference feature F'_P guided by spatial correlation attention, the difference feature F'_C guided by channel correlation attention and the feature F_ab are further fused to obtain the final change difference feature F.
The change detection heads are divided into three layers, namely detection head-S, detection head-M and detection head-L, respectively correspond to N3-N5 layer characteristics from the refinement path aggregation pyramid module, learn the change confidence and coordinates of the change object in parallel according to the supervision information, and learn the regression of the bounding box of the change object by using a priori frame; training the whole network model according to the training sample labels by using a multi-task loss function; and inputting the double-phase time-phase images to be detected into a trained network to obtain an accurate change detection result.
On the other hand, the invention also provides an object-level remote sensing change detection system based on the dual-related attention, which is used for realizing the object-level remote sensing change detection method based on the dual-related attention.
Furthermore, the device comprises the following modules,
the first module is used for carrying out data enhancement for change detection to generate a double input stream, the implementation mode comprises that random time sequence exchange and combination are carried out on four pairs of images to obtain a pair of synthesized images, enhancement operation is carried out on one of the synthesized images, the enhancement operation comprises random geometric transformation, brightness transformation, gaussian noise addition and color transformation, and only corresponding geometric transformation is carried out on the other synthesized image;
the second module is used for setting a backbone network with shared weight and receiving double input streams and extracting different scale characteristics of the double-phase images;
and the third module is used for setting a feature fusion neck for dual-related attention guidance on the basis of a backbone network, wherein the feature fusion neck is internally provided with a dual-related attention module for guiding the network to focus on the correlation of the dual-time phase features with the same scale in a space level and a channel level so as to acquire refined difference features, and a refinement path aggregation pyramid module is arranged for fusing features among layers with different scales.
And a fourth module, which is used for sending the difference features with different scales to the change detection head, and predicting the position, the size and the change confidence of the change feature in the form of a boundary frame.
Alternatively, the system comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute the object-level remote sensing change detection method based on dual-related attention.
Or comprises a readable storage medium having stored thereon a computer program which, when executed, implements an object-level remote sensing change detection method based on dual-related attention as described above.
Compared with the prior art, the invention has the following three advantages:
1) The data enhancement mode special for change detection can accelerate model training and improve model performance.
The data enhancement method provided by the invention randomly exchanges the earlier and later images online, which helps the network pay more attention to the nature of the change rather than overfitting to a particular input order. Combining four pairs of dual-phase images into one composite image pair implicitly increases the batch size, enriches the change scenes, and helps the model converge more stably and faster. Scaling the composite image pair implicitly increases the proportion of small changed targets, guiding the network to attend to change detection of small targets and effectively improving model performance.
2) Feature correlation is used to capture the difference features and speed up model convergence.
The dual correlation attention module designed by the invention establishes correlation attention of parallel features from channels and spatial layers, guides the network to further refine the features related to the change from the channels and spatial layers of the features, suppresses the unrelated features, and is beneficial to rapid convergence of the network.
3) The method has the advantages of being excellent in robustness and good in change detection effect, and can effectively resist pseudo-change interference.
According to the invention, through network structure design and loss function design, the constructed end-to-end change detection network is more focused on the integral characteristics and context association of the change object, and pseudo-change interference caused by visual angle change and projection difference is avoided. The whole training process does not need to design artificial feature guidance, and the network can self-adaptively learn the required features, so that the method has better generalization. Under complex changing scenes, the method also has better stable performance.
Drawings
FIG. 1 is a flow chart of a data enhancement method dedicated to change detection according to an embodiment of the present invention.
Fig. 2 is a general framework diagram of a dual-related attention-directed change detection network in accordance with an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a dual-related attention module according to an embodiment of the present invention, where fig. 3 (a) is a general architecture of the dual-related attention module, fig. 3 (b) is a specific architecture of the spatial-related attention module, fig. 3 (c) is a specific architecture of the channel-related attention module, and fig. 3 (d) is a specific architecture of the variation difference module.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings and examples.
In the data layer, the framework designs a data enhancement method special for change detection, which can effectively accelerate the training speed of the model and improve the performance of the model. At the model level, the framework constructs a dual-related attention-guided change detection network, and can effectively extract the integral characteristics and the context association of a change object. The framework ultimately represents the detected changing geographic entity (e.g., newly added building, artificial structure, etc.) in the form of a bounding box.
The embodiment of the invention provides an object-level remote sensing change detection method based on dual-related attention, which comprises a data enhancement process (shown in figure 1) special for change detection and a dual-related attention-guided change detection network (shown in figure 2). The dual-related-attention-guided change detection network sequentially comprises a backbone network sharing weights, a dual-collaborative-attention-guided feature fusion neck and a change detection head, wherein the dual-collaborative-attention-guided feature fusion neck comprises a plurality of dual-related attention modules and a refinement path aggregation pyramid module corresponding to the backbone network. In addition, fig. 3 illustrates architectural details of dual attention modules in a network. The method specifically comprises the following steps:
step 1: the data enhancement mode special for change detection performs random combination and transformation on the four pairs of images to generate a double-input stream.
The embodiment adopts a data enhancement mode special for change detection to perform random time sequence exchange and combination on four pairs of images, and performs random geometric transformation, color transformation and other operations to generate a double input stream.
Comprises the following substeps:
step 1.1, randomly selecting four pairs of double-phase images of different areas from a training set, and randomly exchanging front-phase images and rear-phase images.
And 1.2, splicing and combining the four pairs of processed images into a pair of synthetic image pairs in random order.
For example, the T0 and T1 phase images of regions 1-4 are randomly exchanged, after which the region 1 T1 image, region 2 T0 image, region 3 T1 image and region 4 T0 image are combined into composite image 1, and the region 1 T0 image, region 2 T1 image, region 3 T0 image and region 4 T1 image are combined into composite image 2.
Step 1.3, for one of the synthesized images, the embodiment performs the following enhancement functions:
(1) Geometric transformation: random transformation, including clipping, scaling, translation, flipping, and rotation;
(2) Brightness conversion;
(3) Increasing Gaussian noise;
(4) And (5) color transformation.
To ensure the versatility and robustness of the network, the embodiment applies only the same geometric transformations to the other composite image. Through the above steps, a dual input image stream is generated from the composite images.
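For illustration, a minimal sketch of this data enhancement in Python/NumPy follows. The function name, the 2×2 tiling, the choice of a horizontal flip as the shared geometric transformation, and the brightness/noise parameters are assumptions of the sketch rather than values taken from the patent; handling of the change-object labels (which must follow the same geometric operations) is omitted.

```python
import random
import numpy as np

def change_detection_augment(pairs):
    """Sketch: build one composite dual-input pair from four bi-temporal pairs.

    pairs: four (img_t0, img_t1) tuples of equally sized HxWx3 uint8 arrays.
    Returns (comp_a, comp_b); only comp_a receives the photometric operations.
    """
    assert len(pairs) == 4
    # 1) random temporal exchange within each pair
    swapped = [(t1, t0) if random.random() < 0.5 else (t0, t1) for t0, t1 in pairs]
    # 2) mosaic the four regions into a 2x2 composite, in a random order
    order = random.sample(range(4), 4)
    tiles_a = [swapped[i][0] for i in order]
    tiles_b = [swapped[i][1] for i in order]
    comp_a = np.concatenate([np.concatenate(tiles_a[:2], axis=1),
                             np.concatenate(tiles_a[2:], axis=1)], axis=0)
    comp_b = np.concatenate([np.concatenate(tiles_b[:2], axis=1),
                             np.concatenate(tiles_b[2:], axis=1)], axis=0)
    # 3) the same geometric transformation is applied to BOTH composites
    if random.random() < 0.5:
        comp_a, comp_b = comp_a[:, ::-1], comp_b[:, ::-1]
    # 4) photometric enhancement (brightness, Gaussian noise) on ONE composite only
    comp_a = comp_a.astype(np.float32) * random.uniform(0.8, 1.2)
    comp_a += np.random.normal(0.0, 5.0, comp_a.shape)
    comp_a = np.clip(comp_a, 0, 255).astype(np.uint8)
    return comp_a, comp_b
```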
Step 2: and setting a backbone network with shared weight to receive the double input stream and extract different scale characteristics of the double-phase image.
The embodiment uses a CSPDarkNet-53 network to construct a twin network as a backbone network with shared weight to receive double input streams, realizes feature reuse through a hierarchical feature fusion strategy, and cuts off excessively repeated gradient information for extracting parallel features of different scales of a double-phase image (the C1-C5 layers from top layer to bottom layer in FIG. 2 correspond to five corresponding scale features).
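As a minimal PyTorch sketch, weight sharing between the two input streams simply means running both images through the same backbone instance. The CSPDarkNet-53 backbone itself is assumed to exist elsewhere (it is not part of torchvision), so a dummy multi-scale backbone is used here only to make the example runnable; all class names are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBackbone(nn.Module):
    """Apply one backbone instance to both temporal images so the weights are shared."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. a CSPDarknet-53 returning the C1-C5 features

    def forward(self, img_a, img_b):
        return self.backbone(img_a), self.backbone(img_b)  # same weights, two streams

class DummyMultiScaleBackbone(nn.Module):
    """Stand-in that returns five progressively downsampled feature maps."""

    def forward(self, x):
        feats = []
        for _ in range(5):
            x = F.avg_pool2d(x, 2)
            feats.append(x)
        return feats

# usage sketch
net = SiameseBackbone(DummyMultiScaleBackbone())
c_a, c_b = net(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
```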
Step 3: the feature fusion neck of the dual collaborative attention guide fuses the features of the C3-C5 layers of the twin CSPDarknet-53 network by adopting three dual relevant attention modules to generate P3-P5 layer features, and then fuses the difference features among different scale layers by adopting a refinement path aggregation pyramid module. Comprises the following substeps:
step 3.1, the dual-correlation attention module acquires refined difference features by using the feature correlation of the spatial hierarchy and the channel hierarchy on the parallel features of the same scale layer, wherein the refined difference features comprise the spatial correlation attention module, the channel correlation attention module and the variation difference module, and specific details are shown in fig. 3 and comprise the following substeps:
step 3.1.1 parallel features F from the Co-scale layer a B are fed into a spatially dependent attention module. First calculate their spatial correlation map C P As shown in fig. 3 (b), the calculation is as follows:
wherein ,is through F a Results obtained from the transpose and deformation operations, F b Is made up of F b The results obtained by the deformation operation->Representing a matrix multiplication. According to the space correlation diagram C P For F a And F is equal to b The spatial local features related to the change in (C) are enhanced, the spatial local features unrelated to the change are suppressed, and the spatial correlation diagram C is used for solving the problem that the spatial local features are not related to the change P Normalized to softmax of (c) and normalized to deformed F a Matrix multiplication may yield enhanced features F Pa From transposed spatial correlation diagram C P Normalized to softmax of (c) and normalized to deformed F b Matrix multiplication may yield enhanced features F Pb . Finally, fuse feature F Pa and FPb To generate spatially dependent attention weights W P The fusion process includes using the feature F Pa and FPb Make a connection (symbol C in FIG. 3 (b)), sigmoid transform to obtain feature F Pab And compressing the channel number from 2C to obtain W' p through 1*1 convolution and 3*3 convolution, and finally performing sigmoid conversion.
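A minimal PyTorch sketch of such a spatial correlation attention block is given below; the class name, the exact convolution settings (1×1 then 3×3, 2C→C) and the softmax placement follow the reading above and are assumptions of the sketch, not code from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialCorrelationAttention(nn.Module):
    """Compute the spatial correlation attention weight W_P from F_a and F_b."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, f_a, f_b):
        n, c, h, w = f_a.shape
        a = f_a.view(n, c, h * w)                     # reshape F_a -> N x C x HW
        b = f_b.view(n, c, h * w)                     # reshape F_b -> N x C x HW
        c_p = torch.bmm(a.transpose(1, 2), b)         # spatial correlation map, N x HW x HW
        f_pa = torch.bmm(a, F.softmax(c_p, dim=-1))                   # enhanced F_Pa
        f_pb = torch.bmm(b, F.softmax(c_p.transpose(1, 2), dim=-1))   # enhanced F_Pb
        f_pa = f_pa.view(n, c, h, w)
        f_pb = f_pb.view(n, c, h, w)
        w_p = torch.sigmoid(self.fuse(torch.cat([f_pa, f_pb], dim=1)))  # W_P in (0, 1)
        return w_p
```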
Step 3.1.2, the parallel features F_a and F_b from the same scale layer are fed into the channel correlation attention module. Their channel correlation map C_C is first computed, as shown in Fig. 3(c):
C_C = F'_a ⊗ F̃_b
where F̃_b is the result obtained from F_b through transpose and reshape operations, F'_a is the result obtained from F_a through a reshape operation, and ⊗ denotes matrix multiplication. According to the channel correlation map C_C, the channels in F_a and F_b that are related to the change are enhanced and the channels unrelated to the change are suppressed, giving the enhanced features F_Ca and F_Cb. Finally, the features F_Ca and F_Cb are fused to generate the channel correlation attention weight W_C: F_Ca and F_Cb are concatenated (symbol C in Fig. 3(c)) to obtain the feature F_Cab, the channel number is compressed from 2C to C through a 1×1 convolution and a 3×3 convolution to obtain W'_C, and a sigmoid transformation is finally applied.
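The channel correlation attention admits an analogous sketch; exactly which operand the softmax-normalized C_C multiplies is not fully specified in the text, so the choice below is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelCorrelationAttention(nn.Module):
    """Compute the channel correlation attention weight W_C from F_a and F_b."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, f_a, f_b):
        n, c, h, w = f_a.shape
        a = f_a.view(n, c, h * w)                     # reshape F_a -> N x C x HW
        b = f_b.view(n, c, h * w)                     # reshape F_b -> N x C x HW
        c_c = torch.bmm(a, b.transpose(1, 2))         # channel correlation map, N x C x C
        f_ca = torch.bmm(F.softmax(c_c, dim=-1), a).view(n, c, h, w)                  # F_Ca
        f_cb = torch.bmm(F.softmax(c_c.transpose(1, 2), dim=-1), b).view(n, c, h, w)  # F_Cb
        w_c = torch.sigmoid(self.fuse(torch.cat([f_ca, f_cb], dim=1)))
        return w_c
```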
Step 3.1.3, the parallel features F_a and F_b from the same scale layer are fed into the change difference module. The change difference module extracts difference features through two branches, an absolute-difference branch and a concatenation branch, and finally fuses the difference features extracted by the two branches to obtain the final difference feature F_D, as shown in Fig. 3(d). F_d1 is the absolute difference of F_a and F_b; F_ab is obtained by concatenating F_a and F_b, and the feature F_d2 is then obtained through a 1×1 convolution; finally, F_d1 and F_d2 are concatenated and passed through a 1×1 convolution, batch normalization and a LeakyReLU transformation to obtain the final feature F_D, where
LeakyReLU(x) = max(αx, x)
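A corresponding sketch of the change difference module is shown below; the LeakyReLU slope α = 0.1 is an assumed value (the patent only gives the general form LeakyReLU(x) = max(αx, x)), and the class and layer names are assumptions.

```python
import torch
import torch.nn as nn

class ChangeDifferenceModule(nn.Module):
    """Extract the difference feature F_D from F_a and F_b via two branches."""

    def __init__(self, channels: int, negative_slope: float = 0.1):
        super().__init__()
        self.concat_branch = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.out = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(negative_slope),
        )

    def forward(self, f_a, f_b):
        f_d1 = torch.abs(f_a - f_b)                      # absolute-difference branch
        f_ab = torch.cat([f_a, f_b], dim=1)              # concatenation branch (F_ab)
        f_d2 = self.concat_branch(f_ab)                  # 1x1 convolution on F_ab
        return self.out(torch.cat([f_d1, f_d2], dim=1))  # fuse the two branches -> F_D
```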
Step 3.1.4, the parallel features F_a and F_b from the same scale layer are concatenated to obtain the feature F_ab.
Step 3.1.5, F_D is enhanced under the guidance of W_P and W_C respectively to obtain the difference feature F'_P guided by spatial correlation attention and the difference feature F'_C guided by channel correlation attention, as shown in Fig. 3(a), calculated as follows:
F'_P = W_P ⊙ F_D
F'_C = W_C ⊙ F_D
where ⊙ denotes an element-wise product operation and ⊕ denotes an element-wise addition operation.
The difference feature F'_P guided by spatial correlation attention, the difference feature F'_C guided by channel correlation attention and the feature F_ab are further fused to obtain the final change difference feature F.
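Putting the module sketches above together, a dual-correlation attention block could be assembled as follows. Because the final fusion of F'_P, F'_C and F_ab is only described as "further fused", the element-wise addition followed by concatenation and a 1×1 convolution used here is an assumption of the sketch.

```python
import torch
import torch.nn as nn

class DualCorrelationAttention(nn.Module):
    """Combine W_P, W_C, F_D and F_ab into the change difference feature F."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial_attn = SpatialCorrelationAttention(channels)   # sketched above
        self.channel_attn = ChannelCorrelationAttention(channels)   # sketched above
        self.diff = ChangeDifferenceModule(channels)                 # sketched above
        self.out = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, f_a, f_b):
        w_p = self.spatial_attn(f_a, f_b)
        w_c = self.channel_attn(f_a, f_b)
        f_d = self.diff(f_a, f_b)
        f_p = w_p * f_d                       # F'_P = W_P ⊙ F_D
        f_c = w_c * f_d                       # F'_C = W_C ⊙ F_D
        f_ab = torch.cat([f_a, f_b], dim=1)   # F_ab (2C channels)
        return self.out(torch.cat([f_p + f_c, f_ab], dim=1))  # assumed fusion -> F
```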
And 3.2, fusing the difference features among the layers with different scales by using a refinement path aggregation pyramid module.
The path aggregation pyramid network adds a bottom-up path in addition to the conventional top-down path, which effectively enhances the expression of low-level features during feature fusion. When fusing features of different scales, the refinement path aggregation pyramid module provided by the invention uses concatenation instead of element-wise addition, so that lower-level and higher-level features can be fused adaptively and semantic changes at different scales can be captured more effectively.
The features of the C3-C5 layers first undergo cross-phase feature fusion through the dual-related attention modules, and feature fusion across scales is then completed through a top-down path. Take the generation of the P4 layer features as an example: the C5 layer features pass directly through a dual-related attention module to obtain the P5 layer features; the P5 layer features are upsampled to the same resolution as the C4 layer features and fused with the features obtained by applying a dual-related attention module to the C4 layer features, yielding the P4 layer features. Specifically, the features are concatenated and passed through a 1×1 convolution, a 3×3 convolution, batch normalization and a LeakyReLU transformation. The P3 layer features are generated in the same way as the P4 layer features.
Then, the P3-P5 layer features undergo feature fusion through a bottom-up path to generate the features N3-N5. Taking the generation of the N4 layer features as an example, the N3 layer features are obtained by applying a 1×1 convolution, a 3×3 convolution, batch normalization and a LeakyReLU transformation to the P3 layer features; the N3 layer features are then downsampled to the resolution of the P4 layer features and fused with the P4 layer features to obtain the N4 layer features, again by feature concatenation followed by a 1×1 convolution, a 3×3 convolution, batch normalization and a LeakyReLU transformation. The N5 layer features are generated in the same way as the N4 layer features.
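The following PyTorch sketch illustrates the concatenation-based top-down/bottom-up fusion described above, assuming all scale layers carry the same channel count; max-pooling is used as the downsampling operation (the patent does not specify one), and the block and class names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fusion_block(c_in, c_out, slope=0.1):
    """1x1 conv, 3x3 conv, batch norm and LeakyReLU, as used at each fusion step."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=1),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(slope),
    )

class RefinedPathAggregation(nn.Module):
    """Top-down (P3-P5) then bottom-up (N3-N5) fusion using concatenation."""

    def __init__(self, channels: int):
        super().__init__()
        self.p4_fuse = fusion_block(2 * channels, channels)
        self.p3_fuse = fusion_block(2 * channels, channels)
        self.n3_conv = fusion_block(channels, channels)
        self.n4_fuse = fusion_block(2 * channels, channels)
        self.n5_fuse = fusion_block(2 * channels, channels)

    def forward(self, d3, d4, d5):
        # d3-d5: outputs of the dual-related attention modules at the C3-C5 scales
        p5 = d5
        p4 = self.p4_fuse(torch.cat([F.interpolate(p5, scale_factor=2), d4], dim=1))
        p3 = self.p3_fuse(torch.cat([F.interpolate(p4, scale_factor=2), d3], dim=1))
        n3 = self.n3_conv(p3)
        n4 = self.n4_fuse(torch.cat([F.max_pool2d(n3, 2), p4], dim=1))
        n5 = self.n5_fuse(torch.cat([F.max_pool2d(n4, 2), p5], dim=1))
        return n3, n4, n5
```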
And 4, sending the features (N3-N5) with different scales from the refinement path aggregation pyramid module into a change detection head to output a change object prediction in the form of a boundary box.
In the dual-related attention-guided change detection network, finally, the difference features with different scales are sent to a change detection head of the network, and the detected change geographic entity (such as a newly added building, an artificial structure and the like) is represented in the form of a boundary box.
In an embodiment, step 4 comprises the sub-steps of:
step 4.1, the change detection heads are divided into three layers, namely a detection head-S, a detection head-M and a detection head-L, which respectively correspond to N3-N5 layer features from the refined path aggregation pyramid module, and the change confidence Conf of the change object is learned in parallel according to the supervision information changed And coordinates (x, y, w, h). And the prior frames are used for quickly and efficiently learning the regression of the boundary frames of the change object, the prior frames on the three detection layers are 3 by default, and the length and width of all prior frames are obtained by carrying out kmeans clustering according to the training set data. A is the prior frame number, C is the feature channel number, W is the feature map width, and H is the feature map height.
Step 4.2, the model is trained with a multi-task loss function according to the training sample labels. The loss function Loss comprises a bounding-box regression loss L_reg and a change-confidence prediction loss L_Conf:
Loss = λ_reg × L_reg + λ_Conf × L_Conf
where λ_reg is the weight of the regression loss in the total loss and λ_Conf is the weight of the change-confidence prediction loss in the total loss; λ_reg = 1 and λ_Conf = 2.5 are used.
Comprises the following substeps:
and 4.2.1, generating positive and negative samples.
The ratio of the ground-truth box size to the anchor size is calculated. If the ratio lies within the set threshold range (1/4 to 4), the anchor is treated as a positive sample (confidence 1); otherwise it is treated as a negative sample (confidence 0).
Step 4.2.2, calculating the regression loss of the boundary box. The CIoU loss function is used as a bounding box regression loss function.
The CIoU loss takes the form
L_CIoU = 1 - IoU + ρ²(b, b_gt)/c² + αv, with v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))² and α = v/((1 - IoU) + v),
where b represents the center point coordinates of the predicted bounding box, b_gt represents the center point coordinates of the ground-truth box, IoU is the intersection-over-union, i.e. the intersection area of the predicted box and the ground-truth box divided by their union area, ρ(·,·) is the Euclidean distance between the center point of the predicted bounding box and the center point of the ground-truth box, c is the diagonal length of the smallest enclosing rectangle containing both the predicted box and the ground-truth box, v measures the similarity of the aspect ratios of the predicted bounding box and the ground-truth box, w_gt and h_gt represent the width and height of the ground-truth box, w and h represent the width and height of the predicted box, and α is the weight coefficient.
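A self-contained sketch of the CIoU loss as reconstructed above is given below, for boxes in (cx, cy, w, h) form; the epsilon terms and the detaching of α are numerical-stability choices of the sketch, not details taken from the patent.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes given as (cx, cy, w, h); pred and target are N x 4 tensors."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)
    # corner coordinates
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # intersection-over-union
    inter_w = (torch.min(p_x2, t_x2) - torch.max(p_x1, t_x1)).clamp(min=0)
    inter_h = (torch.min(p_y2, t_y2) - torch.max(p_y1, t_y1)).clamp(min=0)
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    cw = torch.max(p_x2, t_x2) - torch.min(p_x1, t_x1)
    ch = torch.max(p_y2, t_y2) - torch.min(p_y1, t_y1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(tw / (th + eps)) - torch.atan(pw / (ph + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```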
and 4.2.3, calculating the change confidence prediction loss. The Focal loss function is used as the varying confidence loss function.
Where y' represents the predicted confidence of the change, y represents the true confidence, α is used to control the contribution of positive and negative samples to the loss, and γ is used to control the contribution of difficult and simple samples to the loss. In the experiments of the present invention, α=0.25 and γ=1.5 were used by default.
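A corresponding sketch of the change-confidence focal loss is shown below; the way α weights the positive versus negative terms follows the common binary focal-loss form and is an assumption, while α = 0.25 and γ = 1.5 are the defaults stated above.

```python
import torch

def focal_change_confidence_loss(pred_conf, true_conf, alpha=0.25, gamma=1.5, eps=1e-7):
    """Binary focal loss on the change confidence (pred_conf in (0, 1) after sigmoid)."""
    p = pred_conf.clamp(eps, 1 - eps)
    pos = -alpha * (1 - p) ** gamma * true_conf * torch.log(p)            # changed samples
    neg = -(1 - alpha) * p ** gamma * (1 - true_conf) * torch.log(1 - p)  # unchanged samples
    return (pos + neg).mean()
```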
And 4.2.4, training the object level change detection network guided by the dual relevant attention by utilizing the multi-task loss until the whole network is converged to the optimal precision.
Step 4.3, the dual-phase image pair to be detected is input into the network trained in step 4.2 to obtain accurate change detection results. The dual-phase image pairs to be detected are generated in the same way as the training set in the preceding steps.
Prediction results of change detection on part of the experimental data show that the invention can accurately and robustly perform change detection on remote sensing images in different scenes. The experimental results demonstrate that the data enhancement method dedicated to change detection accelerates model training and improves model performance, and that, through end-to-end learning on the training set and the guidance of the dual-related attention mechanism, the detection network based on dual-related attention can effectively resist pseudo-change interference in image pairs and has higher accuracy and robustness.
In particular, the method according to the technical solution of the present invention may be implemented by those skilled in the art using computer software technology to implement an automatic operation flow, and a system apparatus for implementing the method, such as a computer readable storage medium storing a corresponding computer program according to the technical solution of the present invention, and a computer device including the operation of the corresponding computer program, should also fall within the protection scope of the present invention.
In some possible embodiments, an object-level remote sensing change detection system based on dual-related attention is provided, comprising the following modules,
the first module is used for carrying out data enhancement for change detection to generate a double input stream, the implementation mode comprises that random time sequence exchange and combination are carried out on four pairs of images to obtain a pair of synthesized images, enhancement operation is carried out on one of the synthesized images, the enhancement operation comprises random geometric transformation, brightness transformation, gaussian noise addition and color transformation, and only corresponding geometric transformation is carried out on the other synthesized image;
the second module is used for setting a backbone network with shared weight and receiving double input streams and extracting different scale characteristics of the double-phase images;
and the third module is used for setting a feature fusion neck for dual-related attention guidance on the basis of a backbone network, wherein the feature fusion neck is internally provided with a dual-related attention module for guiding the network to focus on the correlation of the dual-time phase features with the same scale in a space level and a channel level so as to acquire refined difference features, and a refinement path aggregation pyramid module is arranged for fusing features among layers with different scales.
And a fourth module, which is used for sending the difference features with different scales to the change detection head, and predicting the position, the size and the change confidence of the change feature in the form of a boundary frame.
In some possible embodiments, an object level remote sensing change detection system based on dual related attention is provided, including a processor and a memory, the memory for storing program instructions, the processor for invoking the stored instructions in the memory to perform an object level remote sensing change detection method based on dual related attention as described above.
In some possible embodiments, an object level remote sensing change detection system based on dual-related attention is provided, which comprises a readable storage medium having stored thereon a computer program which, when executed, implements an object level remote sensing change detection method based on dual-related attention as described above.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims (10)

1. An object-level remote sensing change detection method based on dual-related attention is characterized by comprising the following steps of: comprises the steps of,
step 1, data enhancement for change detection is carried out to generate a double input stream, the implementation mode comprises that random time sequence exchange and combination are carried out on four pairs of images to obtain a pair of synthesized images, enhancement operation is carried out on one of the synthesized images, the enhancement operation comprises random geometric transformation, brightness transformation, gaussian noise addition and color transformation, and only corresponding geometric transformation is carried out on the other synthesized image;
step 2, setting a backbone network with shared weight for receiving the double input streams and extracting different scale characteristics of the double-phase images;
step 3, setting a feature fusion neck for dual-related attention guidance on the basis of a backbone network, wherein a dual-related attention module is arranged in the feature fusion neck for guiding the network to focus on the correlation of the spatial hierarchy and the channel hierarchy of dual-temporal features with the same scale so as to acquire refined difference features, and a refinement path aggregation pyramid module is arranged for fusing features among layers with different scales; the implementation is as follows,
step 3.1, the dual-related attention module acquires refined difference features by using the feature correlation of the spatial hierarchy and the channel hierarchy on the parallel features of the same scale layer, wherein the dual-related attention module comprises a spatial related attention module, a channel related attention module and a change difference module, and the implementation process comprises the following substeps:
step 3.1.1, the parallel features F_a and F_b from the same scale layer are fed into a spatial correlation attention module; their spatial correlation map C_P is first calculated as follows:
C_P = F̃_a ⊗ F'_b
wherein F̃_a is the result obtained from F_a through transpose and reshape operations, F'_b is the result obtained from F_b through a reshape operation, and ⊗ denotes matrix multiplication; according to the spatial correlation map C_P, the spatial local features in F_a and F_b related to the change are enhanced and the spatial local features unrelated to the change are suppressed: the softmax-normalized C_P is matrix-multiplied with the reshaped F_a to obtain the enhanced feature F_Pa, and the softmax-normalized transpose of C_P is matrix-multiplied with the reshaped F_b to obtain the enhanced feature F_Pb; finally, the features F_Pa and F_Pb are fused to generate the spatial correlation attention weight W_P, the fusion process comprising concatenating the features F_Pa and F_Pb to obtain the feature F_Pab, compressing the channel number by convolution to obtain W'_P, and finally performing a sigmoid transformation;
step 3.1.2, the parallel features F_a and F_b from the same scale layer are fed into a channel correlation attention module; their channel correlation map C_C is first calculated as follows:
C_C = F'_a ⊗ F̃_b
wherein F̃_b is the result obtained from F_b through transpose and reshape operations, F'_a is the result obtained from F_a through a reshape operation, and ⊗ denotes matrix multiplication; according to the channel correlation map C_C, the channels in F_a and F_b related to the change are enhanced and the channels unrelated to the change are suppressed, obtaining the enhanced features F_Ca and F_Cb; finally, the features F_Ca and F_Cb are fused to generate the channel correlation attention weight W_C, the fusion process comprising concatenating the features F_Ca and F_Cb to obtain the feature F_Cab, compressing the channel number by convolution to obtain W'_C, and finally performing a sigmoid transformation;
step 3.1.3 parallel features F from the Co-scale layer a ,F b Is sent to a variance module; the variation difference module extracts difference features through two branches of absolute difference operation and serial operation respectively, and finally fuses the difference features extracted by the two branches to obtain a final difference feature F D ;F d1 From F a ,F b Obtaining absolute difference F ab From F a ,F b Performing characteristic connection to obtain a characteristic F, and performing convolution to obtain a characteristic F d2 Final F d1 and Fd2 Performing feature connection, convolution, batch normalization and LeakyReLU transformation to obtain final feature F D
step 3.1.4, the parallel features F_a and F_b from the same scale layer are concatenated to obtain the feature F_ab;
Step 3.1.5, F D Respectively at W P ,W C Is guided by (a) to perform feature enhancement to obtain difference feature F guided by spatially dependent attention P Channel dependent attention directed difference feature F C The calculation method is as follows:
F′ p =W P ⊙F D
F′ c =W C ⊙F D
wherein +.H indicates an element-wise dot product operation,representing an element-wise addition operation;
difference feature F that directs spatially dependent attention P Channel dependent attention directed difference feature F C And feature F ab Further fusing to obtain a final variation difference characteristic F;
step 3.2, fusing the difference features among different scale layers by a refinement path aggregation pyramid module;
the path aggregation pyramid network adds a bottom-up path outside a conventional top-down path, and enhances the expression of low-layer features in feature fusion; the refinement path aggregation pyramid module uses series operation instead of element-by-element addition operation when fusion of features of different scales is carried out so as to adaptively fuse the features of a lower layer and a higher layer and capture semantic changes of ground objects of different scales;
the features of the C3-C5 layers are subjected to feature fusion of different time phases through a dual-related attention module, and then subjected to feature fusion of different scales through a top-down path; then, the features of the P3-P5 layers are further subjected to feature fusion through a bottom-up path to generate features N3-N5;
and 4, finally, sending the difference features with different scales into a change detection head, and predicting the position, the size and the change confidence of the change feature in a boundary frame mode.
2. The object-level remote sensing change detection method based on dual-related attention according to claim 1, wherein: in the backbone network sharing the weight, feature reuse is realized through a layered feature fusion strategy, and excessively repeated gradient information is truncated and used for extracting parallel features of different scales of the double-phase images.
3. The object-level remote sensing change detection method based on dual-related attention according to claim 2, wherein: and constructing a twin network by using the CSPDarkNet-53 network as a backbone network sharing the weight, extracting the characteristics of five scales, and marking the characteristics as C1-C5 layers from the top layer to the bottom layer.
4. The object-level remote sensing change detection method based on dual-related attention according to claim 3, wherein: the feature fusion neck of the dual collaborative attention guide fuses the features of the C3-C5 layers of the twin CSPDarknet-53 network by adopting three dual relevant attention modules to generate P3-P5 layer features, and then fuses the difference features among different scale layers by adopting a refinement path aggregation pyramid module.
5. The object-level remote sensing change detection method based on dual-related attention as claimed in claim 1 or 2 or 3 or 4, wherein: the dual-related attention module comprises a spatial correlation attention module, a channel correlation attention module and a change difference module; the parallel features F_a and F_b from the same scale layer are concatenated to obtain the feature F_ab, and the difference feature F'_P guided by spatial correlation attention, the difference feature F'_C guided by channel correlation attention and the feature F_ab are further fused to obtain the final change difference feature F.
6. The object-level remote sensing change detection method based on dual-related attention according to claim 4, wherein: the change detection heads are divided into three layers, namely a detection head-S, a detection head-M and a detection head-L, which respectively correspond to N3-N5 layer characteristics from the refined path aggregation pyramid module, change confidence and coordinates of the change object are learned in parallel according to the supervision information, and regression of a boundary frame of the change object is learned by using a priori frame; training the whole network model according to the training sample labels by using a multi-task loss function; and inputting the double-phase time-phase images to be detected into a trained network to obtain an accurate change detection result.
7. An object-level remote sensing change detection system based on dual-related attention, which is characterized in that: an object-level remote sensing change detection method for implementing a dual-related attention-based method as claimed in any one of claims 1 to 6.
8. The dual-related attention-based object-level remote sensing change detection system of claim 7, wherein: comprising the following modules, wherein the modules are arranged in a row,
the first module is used for carrying out data enhancement for change detection to generate a double input stream, the implementation mode comprises that random time sequence exchange and combination are carried out on four pairs of images to obtain a pair of synthesized images, enhancement operation is carried out on one of the synthesized images, the enhancement operation comprises random geometric transformation, brightness transformation, gaussian noise addition and color transformation, and only corresponding geometric transformation is carried out on the other synthesized image;
the second module is used for setting a backbone network with shared weight and receiving double input streams and extracting different scale characteristics of the double-phase images;
the third module is used for setting a feature fusion neck for dual-related attention guidance on the basis of a backbone network, wherein the feature fusion neck is internally provided with a dual-related attention module for guiding the network to focus on the correlation of the dual-time phase features with the same scale in a space level and a channel level so as to acquire refined difference features, and a refinement path aggregation pyramid module is arranged for fusing features among layers with different scales;
and a fourth module, which is used for sending the difference features with different scales to the change detection head, and predicting the position, the size and the change confidence of the change feature in the form of a boundary frame.
9. The dual-related attention-based object-level remote sensing change detection system of claim 7, wherein: comprising a processor and a memory for storing program instructions, the processor being adapted to invoke the stored instructions in the memory to perform a method of object-level telemetry change detection based on dual-related attention as claimed in any one of claims 1-6.
10. The dual-related attention-based object-level remote sensing change detection system of claim 7, wherein: comprising a readable storage medium having stored thereon a computer program which, when executed, implements a dual related attention based object level remote sensing change detection method as claimed in any of claims 1-6.
CN202110692812.7A 2021-06-22 2021-06-22 Object-level remote sensing change detection method and system based on dual-related attention Active CN113609896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110692812.7A CN113609896B (en) 2021-06-22 2021-06-22 Object-level remote sensing change detection method and system based on dual-related attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110692812.7A CN113609896B (en) 2021-06-22 2021-06-22 Object-level remote sensing change detection method and system based on dual-related attention

Publications (2)

Publication Number Publication Date
CN113609896A (en) 2021-11-05
CN113609896B (en) 2023-09-01

Family

ID=78336707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110692812.7A Active CN113609896B (en) 2021-06-22 2021-06-22 Object-level remote sensing change detection method and system based on dual-related attention

Country Status (1)

Country Link
CN (1) CN113609896B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155200B (en) * 2021-11-09 2022-08-26 二十一世纪空间技术应用股份有限公司 Remote sensing image change detection method based on convolutional neural network
CN114037912A (en) * 2022-01-07 2022-02-11 成都国星宇航科技有限公司 Method and device for detecting change of remote sensing image and computer readable storage medium
CN115114395B (en) * 2022-04-15 2024-03-19 腾讯科技(深圳)有限公司 Content retrieval and model training method and device, electronic equipment and storage medium
CN114821354B (en) * 2022-04-19 2024-06-07 福州大学 Urban building change remote sensing detection method based on twin multitasking network
CN114937204B (en) * 2022-04-29 2023-07-25 南京信息工程大学 Neural network remote sensing change detection method for lightweight multi-feature aggregation
CN115601318B (en) * 2022-10-10 2023-05-02 广东昱升个人护理用品股份有限公司 Intelligent production method and system for quick-absorption low-reverse-osmosis paper diaper
CN116385881B (en) * 2023-04-10 2023-11-14 北京卫星信息工程研究所 Remote sensing image ground feature change detection method and device
CN117574259B (en) * 2023-10-12 2024-05-07 南京工业大学 Attention twin intelligent migration interpretability diagnosis method suitable for high-end equipment
CN117671437B (en) * 2023-10-19 2024-06-18 中国矿业大学(北京) Open stope identification and change detection method based on multitasking convolutional neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705457A (en) * 2019-09-29 2020-01-17 核工业北京地质研究院 Remote sensing image building change detection method
CN111091712A (en) * 2019-12-25 2020-05-01 浙江大学 Traffic flow prediction method based on cyclic attention dual graph convolution network
CN112287123A (en) * 2020-11-19 2021-01-29 国网湖南省电力有限公司 Entity alignment method and device based on edge type attention mechanism
CN112949549A (en) * 2021-03-19 2021-06-11 中山大学 Super-resolution-based change detection method for multi-resolution remote sensing image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Attention-guided 3D convolutional network for change detection in remote sensing scenes; Zhang Han et al.; Journal of Applied Sciences; Vol. 39, No. 2; full text *

Also Published As

Publication number Publication date
CN113609896A (en) 2021-11-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant