CN117237188A - Multi-scale attention network saliency target detection method based on remote sensing image

Multi-scale attention network saliency target detection method based on remote sensing image

Info

Publication number
CN117237188A
Authority
CN
China
Prior art keywords: tensor, feature, scale, upsampling, input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311185844.3A
Other languages
Chinese (zh)
Inventor
霍丽娜
王咏梅
王威
刘金生
李欢
高学渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Normal University filed Critical Hebei Normal University
Priority to CN202311185844.3A priority Critical patent/CN117237188A/en
Publication of CN117237188A publication Critical patent/CN117237188A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale attention network salient object detection method based on remote sensing images. The method adopts a lightweight backbone network for feature encoding; mines high-level semantic information at the top of the encoder through a compression operation and propagates it top-down; adds a semantic-information-guided multi-scale extraction module after the encoder to expand the receptive field and enhance the expression of multi-scale features; adds a residual fusion module to eliminate redundant information and noise from the encoder; and adds a two-dimensional residual attention module to capture the dependencies between the spatial and channel dimensions, thereby improving the accuracy of small-target detection. The invention can extract the most attention-grabbing content in any given image, pays more attention to small objects, and suppresses background interference.

Description

Multi-scale attention network saliency target detection method based on remote sensing image
Technical Field
The invention relates to a method for detecting a salient object of a multi-scale attention network, in particular to a method for detecting a salient object of a multi-scale attention network based on a remote sensing image, and belongs to the technical field of computer vision.
Background
With the rapid development and wide application of information technology and intelligent technology, a wave of enthusiasm for "intelligence" has risen worldwide, and artificial intelligence and deep learning have attracted great attention in recent years. The amount of data generated by applications in various fields has been growing explosively, so extracting important information from massive data accurately and rapidly has become essential. In particular, short videos and graphic messages have become an integral part of people's work and entertainment. Video and image information is a highly intuitive mode of expression; when processing such information, the human brain tends to focus on the more important regions and reduce attention to irrelevant regions in order to obtain more detailed information about the salient object. Therefore, enabling a computer to simulate the human visual system and quickly locate the most attention-grabbing object in an image or video has become a focus for researchers tackling complex visual problems. In deep learning, the more training samples there are, the better the trained model performs and the stronger its generalization ability. The journal paper "The theoretical research of generative adversarial networks: an overview" by Li et al. (2021) discloses the SRGAN model and applies data enhancement to two data sets respectively, improving image clarity and reducing noise interference. SRGAN has good data-distribution modeling capability: two neural networks learn by playing a game against each other, and after continuous optimization iterations the model reaches a Nash equilibrium, so small-target information is well preserved and small-target detection capability is improved. This opens up a new way to detect complete salient objects.
To date, salient object detection has been extended from the field of natural images to the field of optical remote sensing images, which are usually captured from a high-altitude bird's-eye view; most objects may be small in size, and the scene structure is more complex. Many researchers have proposed deep models for salient object detection in optical remote sensing images. In the paper "RRNet: relational reasoning network with parallel multi-scale attention for salient object detection in optical remote sensing images" by Cong et al. (IEEE Transactions on Geoscience and Remote Sensing, 2022), a parallel multi-scale attention module is introduced to recover salient-object detail information and address the multi-scale variation problem, but incomplete detection remains a limitation. Lin et al., in the international conference paper "Attention guided network for salient object detection in optical remote sensing images" (Artificial Neural Networks and Machine Learning, 2022), combined channel attention and spatial attention to extract multiple features from different dimensions, but the salient-object boundaries were not sufficiently sharp. All of these methods operate on a single dimension, ignoring the correlations and dependencies between dimensions. Therefore, modeling the correlation between the channel dimension and the spatial dimension is needed to improve the accuracy of small-target detection.
Disclosure of Invention
The invention aims to provide a multi-scale attention network saliency target detection method based on a remote sensing image.
In order to solve the technical problems, the invention adopts the following technical scheme: a method for detecting a salient object of a multi-scale attention network based on a remote sensing image comprises the following steps:
step 1, image preprocessing: using an SRGAN model to adjust the size of an input image to an input tensor X with a preset size;
step 2, establishing a two-dimensional multi-scale attention network: the two-dimensional multi-scale attention network comprises first to fifth feature processing units E1-E5 and a semantic compression operation; the semantic compression operation comprises a depth separable convolution layer DSConv and an adaptive average pooling layer AP. Tensor X is processed sequentially by feature encoders E1-E5 to obtain first to fifth feature tensors; the fifth feature tensor is subjected to the semantic compression operation to obtain global semantic information K. The global semantic information K is fed forward into the multi-scale extraction modules SMEM1-SMEM4: the fifth feature tensor, after upsampling and splicing with the fourth feature tensor, is input into SMEM4; the output of SMEM4, after upsampling and splicing with the third feature tensor, is input into SMEM3; the output of SMEM3, after upsampling and splicing with the second feature tensor, is input into SMEM2; the output of SMEM2, after upsampling and splicing with the first feature tensor, is input into SMEM1; and the modules SMEM1-SMEM4 respectively output decoded feature tensors F1^s-F4^s. Feature tensor F4^s after upsampling is spliced with decoded feature tensor F3^s to obtain joint decoding feature tensor F34^s; feature tensor F3^s after upsampling is spliced with decoded feature tensor F2^s to obtain joint decoding feature tensor F23^s; feature tensor F2^s after upsampling is spliced with decoded feature tensor F1^s to obtain joint decoding feature tensor F12^s. The joint decoding feature tensors F12^s, F23^s and F34^s are input into the residual fusion module RFM; the RFM outputs a multi-scale feature tensor Fr. The multi-scale feature tensor Fr is input into the two-dimensional residual attention module BRAM, which outputs a decoding feature tensor F^sa as the saliency map;
step 3, detecting a significance map: the image to be detected is input into a two-dimensional multi-scale attention network, and a saliency map is output.
Further, the size of the input tensor X is 224×224×3, and the sizes of the first to fifth feature tensors are 112×112×16, 56×56×24, 28×28×32, 14×14×96 and 7×7×320, respectively.
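As a quick sanity check (a sketch, not part of the patent: the backbone is unnamed, though these channel widths resemble a MobileNetV2-style lightweight encoder), the stage shapes follow from halving the resolution at each of the five stages:

```python
# Sketch: the encoder halves the spatial resolution at each of its five stages.
input_size = 224
stage_sizes = [input_size // 2 ** i for i in range(1, 6)]
print(stage_sizes)  # [112, 56, 28, 14, 7]

# Channel widths of the five feature tensors as stated above.
stage_channels = [16, 24, 32, 96, 320]
shapes = [(s, s, c) for s, c in zip(stage_sizes, stage_channels)]
print(shapes)  # [(112, 112, 16), (56, 56, 24), (28, 28, 32), (14, 14, 96), (7, 7, 320)]
```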
Further, the fifth feature tensor is subjected to semantic compression operation, and first, 3×3 depth separable convolution is applied to the fifth feature tensor to obtain a 7×7×64 feature map, and then, adaptive average pooling is applied to obtain a 5×5×64 feature map.
Further, the fifth feature tensor is reduced to the advanced semantic feature tensor K through convolution and pooling:
K = AvgPool(DSConv_{3×3}(E5))
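The compression operation can be sketched in PyTorch as follows; the class name and layer arrangement are assumptions, with only the 3×3 depth separable convolution, the 64-channel output, and the 5×5 adaptive average pooling taken from the text:

```python
import torch
import torch.nn as nn

class SemanticCompression(nn.Module):
    """Sketch of the semantic compression operation: depthwise-separable 3x3
    convolution (DSConv) followed by adaptive average pooling (AP)."""
    def __init__(self, in_ch=320, out_ch=64):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.pool = nn.AdaptiveAvgPool2d(5)     # spatial compression to 5x5

    def forward(self, e5):
        x = self.pointwise(self.depthwise(e5))  # 7x7x320 -> 7x7x64
        return self.pool(x)                     # 7x7x64  -> 5x5x64

e5 = torch.randn(1, 320, 7, 7)   # fifth feature tensor E5
k = SemanticCompression()(e5)    # global semantic information K
print(tuple(k.shape))  # (1, 64, 5, 5)
```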
Further, the output of the decoder features is:
F_i^s = SMEM_i(cat(UP(F_{i+1}^s), E_i), K), i = 4, 3, 2, 1 (with F_5^s taken as E_5)
where UP denotes upsampling using bilinear interpolation and cat denotes concatenation in the channel dimension. The deep coding features are upsampled and combined pairwise with the previous-layer features in the channel dimension for splicing, then processed by the SMEM to obtain the multi-scale features F_i^s; this operation is repeated on F_i^s, and the results are input into the RFM for optimization and correction.
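The upsample-and-concatenate step that feeds each SMEM can be sketched as below; `decode_step` is a hypothetical helper, with shapes taken from the SMEM4-to-SMEM3 hand-off:

```python
import torch
import torch.nn.functional as F

def decode_step(deep, shallow):
    """UP then cat: bilinear upsampling of the deeper feature, followed by
    concatenation with the previous-layer feature in the channel dimension."""
    up = F.interpolate(deep, scale_factor=2, mode='bilinear', align_corners=False)
    return torch.cat([up, shallow], dim=1)

deep = torch.randn(1, 64, 14, 14)     # e.g. output of SMEM4
shallow = torch.randn(1, 32, 28, 28)  # e.g. third feature tensor E3
fused = decode_step(deep, shallow)    # input handed to SMEM3
print(tuple(fused.shape))  # (1, 96, 28, 28)
```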
Further, after BRAM decoding, the output feature map F^sa of the original size is obtained by upsampling using bilinear interpolation.
By adopting the technical scheme, the invention has the following technical effects:
according to the invention, by introducing data enhancement and multi-scale attention strategies, the definition of an original image can be improved, and the practicability and accuracy of the model are improved. The semantic information representation capability is enhanced by compressing the advanced features, the semantic information is fed into the SMEM for multi-scale feature interaction, more effective multi-scale information is captured, and the problem of complex scale is solved; constructing RFM fusion semantic information and detail information, further capturing remarkable clues, and improving detail representation capability of low-level features; and the correlation between the BRAM modeling space and the channel is designed, and the BRAM modeling space and the channel are fused to inhibit background interference, so that the accuracy of small target detection is improved.
Drawings
Fig. 1 is a frame diagram of the present invention.
Fig. 2 is a block diagram of the semantic information guided multi-scale extraction module SMEM of the invention.
Fig. 3 is a block diagram of the residual fusion module RFM of the invention.
Fig. 4 is a block diagram of a two-dimensional residual attention module BRAM of the present invention.
Fig. 5 is an input image of embodiment 1 of the present invention.
Fig. 6 is a graph of the significance of the test of example 1 of the present invention.
Detailed Description
The following examples serve to illustrate the invention.
Example 1
Referring to fig. 1, a method for detecting a salient object of a two-dimensional multi-scale attention network based on an optical remote sensing image includes the following steps:
step 1, image preprocessing: the SRGAN model is used to enhance the fine-grained detail of the image, and the input image is adjusted to a tensor X of a preset size, in this embodiment 224×224×3;
step 2, establishing a two-dimensional multi-scale attention network: the two-dimensional multi-scale attention network comprises first to fifth feature processing units E1-E5 and a semantic compression operation; the semantic compression operation comprises a depth separable convolution layer DSConv and an adaptive average pooling layer AP. Tensor X is processed sequentially by feature encoders E1-E5 to obtain first to fifth feature tensors; the fifth feature tensor is subjected to the semantic compression operation to obtain global semantic information K. The global semantic information K is fed forward into the multi-scale extraction modules SMEM1-SMEM4: the fifth feature tensor, after upsampling and splicing with the fourth feature tensor, is input into SMEM4; the output of SMEM4, after upsampling and splicing with the third feature tensor, is input into SMEM3; the output of SMEM3, after upsampling and splicing with the second feature tensor, is input into SMEM2; the output of SMEM2, after upsampling and splicing with the first feature tensor, is input into SMEM1; and the modules SMEM1-SMEM4 respectively output decoded feature tensors F1^s-F4^s. Feature tensor F4^s after upsampling is spliced with decoded feature tensor F3^s to obtain joint decoding feature tensor F34^s; feature tensor F3^s after upsampling is spliced with decoded feature tensor F2^s to obtain joint decoding feature tensor F23^s; feature tensor F2^s after upsampling is spliced with decoded feature tensor F1^s to obtain joint decoding feature tensor F12^s. The joint decoding feature tensors F12^s, F23^s and F34^s are input into the residual fusion module RFM; the RFM outputs a multi-scale feature tensor Fr. The multi-scale feature tensor Fr is input into the two-dimensional residual attention module BRAM, which outputs a decoding feature tensor F^sa as the saliency map;
step 3, detecting the saliency map: the tensor is input, processed by the two-dimensional multi-scale attention network, and the saliency map is obtained.
The sizes of the first to fifth feature tensors in the present embodiment are 112×112×16, 56×56×24, 28×28×32, 14×14×96, 7×7×320, respectively.
The fifth feature tensor undergoes the semantic compression operation: first, a 3×3 depth separable convolution is applied for dimension reduction, yielding a 7×7×64 depth-separated feature map; then, adaptive average pooling is applied for spatial compression, yielding a 5×5×64 compressed feature map.
The fifth feature tensor is reduced to the advanced semantic feature tensor K through convolution and pooling:
K = AvgPool(DSConv_{3×3}(E5))
and the global semantic information K is sequentially fed forward and fused with each level of coding feature tensor and is input into a module SMEM to relieve interference caused by scale change and complex background. Referring to fig. 3, two feature tensors are obtained by respectively performing two 1×1 convolutions on the input, one feature tensor is input into three parallel dynamic depth convolutions to extract multi-scale information, and the void ratio is r=1, 2 and 3; another feature tensor is refined by adopting the traditional 3×3 convolution; the input feature tensor is compressed into a one-dimensional vector by the operations of maximum pooling and average pooling. The two one-dimensional vectors are then passed through a 1 x 1 convolution, sigmoid function and SELayThe er layer enhances the salient features; finally, the three pieces of context information are aggregated, and 4 groups of characteristic tensors F with different resolution sizes are output 1 s -F 4 s Sizes are 112×112×64, 56×56×64, 28×28×64, 14×14×64, respectively;
The feature tensors F1^s-F4^s are then connected pairwise in the channel dimension to obtain three groups of decoding feature tensors F12^s, F23^s and F34^s as inputs to the residual fusion module RFM, with sizes 112×112×64, 56×56×64 and 28×28×64, respectively. Referring to fig. 3, the RFM fuses semantic information and detail information layer by layer to achieve effective multi-scale feature fusion and obtains the multi-scale feature tensor Fr, with size 112×112×64;
tensor of multi-scale features F r Input into a two-dimensional residual attention module BRAM for processing. Referring to fig. 5, the input is first subjected to multi-scale feature extraction by adopting a multi-scale strategy, so as to obtain multi-scale features; then explore two-dimensional attention along channel and space dimension, solve the problems of attention loss and multi-scale change, recover detail information of remarkable object, output decoding characteristic tensor F sa As a significance map, the size thereof was 112×112×64. The output process of each module in the decoder is as follows:
wherein UP is UP sampled by bilinear interpolation, cat denotes connection in channel dimension, SMEM, RFM and BRAM denote operation of three modules.
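A hedged sketch of a BRAM-style module consistent with the description (channel attention plus spatial attention with a residual connection); the actual internals of the patented module are not disclosed in detail here, so every layer choice below is an assumption:

```python
import torch
import torch.nn as nn

class BRAM(nn.Module):
    """Hypothetical sketch: channel attention then spatial attention on F_r,
    combined through a residual connection; all layer choices are assumptions."""
    def __init__(self, ch=64):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)  # dependency along the channel dimension
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x + x * self.spatial(s)  # spatial dependency + residual

f_r = torch.randn(1, 64, 112, 112)  # multi-scale feature tensor F_r
f_sa = BRAM()(f_r)                  # decoding feature tensor F_sa
print(tuple(f_sa.shape))  # (1, 64, 112, 112)
```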
Finally, the output feature map is upsampled by a factor of 2 to the original spatial size, i.e., 224×224×64.
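The final 2× restoration can be expressed directly with bilinear interpolation:

```python
import torch
import torch.nn.functional as F

f_sa = torch.randn(1, 64, 112, 112)  # BRAM output
restored = F.interpolate(f_sa, scale_factor=2, mode='bilinear', align_corners=False)
print(tuple(restored.shape))  # (1, 64, 224, 224)
```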
It should be noted that the technical scheme of the invention has so far been applied and studied on a small scale, and the results show high user satisfaction. Preparations for technology-transfer application have now begun, together with an intellectual-property risk early-warning investigation.

Claims (9)

1. The method for detecting the saliency target of the multi-scale attention network based on the remote sensing image is characterized by comprising the following steps of:
step 1, image preprocessing: using an SRGAN model to adjust the size of an input image to an input tensor X with a preset size;
step 2, establishing a two-dimensional multi-scale attention network: the two-dimensional multi-scale attention network comprises first to fifth feature processing units E1-E5 and a semantic compression operation; the semantic compression operation comprises a depth separable convolution layer DSConv and an adaptive average pooling layer AP. Tensor X is processed sequentially by feature encoders E1-E5 to obtain first to fifth feature tensors; the fifth feature tensor is subjected to the semantic compression operation to obtain global semantic information K. The global semantic information K is fed forward into the multi-scale extraction modules SMEM1-SMEM4: the fifth feature tensor, after upsampling and splicing with the fourth feature tensor, is input into SMEM4; the output of SMEM4, after upsampling and splicing with the third feature tensor, is input into SMEM3; the output of SMEM3, after upsampling and splicing with the second feature tensor, is input into SMEM2; the output of SMEM2, after upsampling and splicing with the first feature tensor, is input into SMEM1; and the modules SMEM1-SMEM4 respectively output decoded feature tensors F1^s-F4^s. Feature tensor F4^s after upsampling is spliced with decoded feature tensor F3^s to obtain joint decoding feature tensor F34^s; feature tensor F3^s after upsampling is spliced with decoded feature tensor F2^s to obtain joint decoding feature tensor F23^s; feature tensor F2^s after upsampling is spliced with decoded feature tensor F1^s to obtain joint decoding feature tensor F12^s. The joint decoding feature tensors F12^s, F23^s and F34^s are input into the residual fusion module RFM; the RFM outputs a multi-scale feature tensor Fr. The multi-scale feature tensor Fr is input into the two-dimensional residual attention module BRAM, which outputs a decoding feature tensor F^sa as the saliency map;
step 3, detecting a significance map: the image to be detected is input into a two-dimensional multi-scale attention network, and a saliency map is output.
2. The method for detecting a saliency target of a multi-scale attention network based on remote sensing images according to claim 1, wherein the size of the input tensor X is 224×224×3, and the sizes of the first to fifth feature tensors are 112×112×16, 56×56×24, 28×28×32, 14×14×96 and 7×7×320, respectively.
3. The method for detecting the saliency target of the multi-scale attention network based on the remote sensing image according to claim 2, wherein the semantic compression operation is performed on the fifth feature tensor: first, a 3×3 depth separable convolution is applied for dimension reduction, yielding a 7×7×64 depth-separated feature map; then, adaptive average pooling is applied for spatial compression, yielding a 5×5×64 compressed feature map.
4. The method for detecting a saliency target of a remote sensing image-based multi-scale attention network of claim 1, wherein the output of the decoder features is:
F_i^s = SMEM_i(cat(UP(F_{i+1}^s), E_i), K), i = 4, 3, 2, 1
the deep coding features are upsampled and combined pairwise with the previous-layer features in the channel dimension for splicing, then processed by the SMEM to obtain the multi-scale features F_i^s; this operation is repeated on F_i^s, and the results are input into the RFM for optimization and correction.
5. The method for detecting a saliency target of a multi-scale attention network based on remote sensing images according to claim 1, wherein the feature tensor F4^s is upsampled by bilinear interpolation and spliced with the decoded feature tensor F3^s to obtain the joint decoding feature tensor F34^s.
6. The method for detecting a saliency target of a multi-scale attention network based on remote sensing images according to claim 1, wherein the feature tensor F3^s is upsampled by bilinear interpolation and spliced with the decoded feature tensor F2^s to obtain the joint decoding feature tensor F23^s.
7. The method for detecting a saliency target of a multi-scale attention network based on remote sensing images according to claim 1, wherein the feature tensor F2^s is upsampled by bilinear interpolation and spliced with the decoded feature tensor F1^s to obtain the joint decoding feature tensor F12^s.
8. The method for detecting a saliency target of a multi-scale attention network based on remote sensing images according to claim 1, wherein the feature tensors F34^s, F23^s and F12^s are input into the RFM for optimization and correction.
9. The method for detecting the saliency target of the multi-scale attention network based on the remote sensing image according to claim 1, wherein after BRAM decoding, the output feature map F^sa of the original size is obtained by upsampling using bilinear interpolation.
CN202311185844.3A 2023-09-14 2023-09-14 Multi-scale attention network saliency target detection method based on remote sensing image Pending CN117237188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311185844.3A CN117237188A (en) 2023-09-14 2023-09-14 Multi-scale attention network saliency target detection method based on remote sensing image


Publications (1)

Publication Number Publication Date
CN117237188A true CN117237188A (en) 2023-12-15

Family

ID=89092400



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117994506A (en) * 2024-04-07 2024-05-07 厦门大学 Remote sensing image saliency target detection method based on dynamic knowledge integration



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination