CN117893761A - SAR image ship instance segmentation method based on cross-scale attention - Google Patents

SAR image ship instance segmentation method based on cross-scale attention

Info

Publication number: CN117893761A
Application number: CN202410081388.6A
Authority: CN (China)
Prior art keywords: feature map, scale, target, cross, network
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张强, 韩臻
Original and current assignee: Xidian University
Application filed by Xidian University

Classifications

    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40 Extraction of image or video features
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/10 Terrestrial scenes
    • G06V 2201/07 Target detection
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a SAR image ship instance segmentation method based on cross-scale attention. An image and the true class, true coordinates and true mask of each target in the image are taken as one group of training samples, and several groups of training samples are extracted from a dataset; a CARSNet network structure based on cross-scale attention is constructed; the training samples are input into the CARSNet network structure to obtain a trained CARSNet network; and SAR image ship instances are segmented with the trained CARSNet network structure. During segmentation, the method uses a cross-scale attention module to supplement the high-scale features with stable detail information, improving the segmentation of ship target contours. The invention also adopts a positive sample sampling method based on a Gaussian distribution, which adaptively generates more positive samples covering the bow and stern of a ship according to the geometric characteristics of the ship target, helping the network learn the overall characteristics of the ship.

Description

SAR image ship instance segmentation method based on cross-scale attention
Technical Field
The invention belongs to the technical field of image segmentation, and particularly relates to a SAR image ship instance segmentation method based on cross-scale attention.
Background
As an active microwave imaging sensor, synthetic aperture radar (SAR) images a target coherently by transmitting electromagnetic pulses and receiving their echoes. Because it transmits microwaves, it is almost unaffected by weather, cloud cover and time of day, and can acquire clear images of ground targets around the clock in all weather; its wavelength is longer than that of visible light, so it penetrates more strongly and can see through cloud layers, vegetation, smoke and similar media. Owing to these advantages, SAR is widely used in both the military and civilian fields. In the civilian field, SAR can be used for monitoring illegal ships at sea, environmental monitoring, urban planning, map making and so on; in the military field, SAR can detect important military targets and provide accurate coordinate information to weapon systems, and because of these unique advantages countries have long used SAR satellites to observe marine ships. Instance segmentation can delineate the contour details of a target more clearly, and is therefore widely applied in autonomous driving, medical image analysis, face recognition, video analysis, industrial automation, remote sensing image analysis and other fields. Compared with object detection, ship instance segmentation of SAR images can obtain more accurate ship position information and contour details. Ship instance segmentation of SAR images therefore has important practical value and broad application prospects in both civilian and military fields.
In the existing SRNet algorithm, the segmentation head uses only a single-layer feature map of the FPN network. Because high-scale feature maps lack detail information, the contours the algorithm produces for large and medium targets are rough, and the accurate positions of such targets cannot be provided.
Disclosure of Invention
The invention aims to provide a SAR image ship instance segmentation method based on cross-scale attention, which solves the problem that the lack of detail information on the feature map prevents accurate target positions from being provided and degrades the segmentation result.
The technical scheme adopted by the invention is a SAR image ship instance segmentation method based on cross-scale attention, implemented according to the following steps:
Step 1, taking an image and the true class, true coordinates and true mask of each target in the image as one group of training samples, and extracting several groups of training samples from a dataset;
Step 2, constructing a CARSNet network structure based on cross-scale attention;
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network;
Step 4, segmenting SAR image ship instances with the trained CARSNet network structure.
The invention is also characterized in that:
In step 1, the true coordinates of a target comprise the coordinates of the target's center point in the image, the width and height of the target, and the rotation angle of the target.
The dataset in step 1 is the Instance-RSDD dataset or the SSDD dataset.
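For concreteness, a minimal sketch of how one group of training samples from step 1 might be organized is shown below; the field names and array shapes are illustrative assumptions, not a format specified by the patent.

```python
import numpy as np

# Hypothetical layout of one training sample group: an image plus, for each
# ship target in it, the true class, the true rotated-box coordinates
# (cx, cy, w, h, theta) and a binary true mask.
sample = {
    "image": np.zeros((512, 512), dtype=np.float32),            # SAR image
    "gt_classes": np.array([0]),                                # "ship" class index
    "gt_boxes": np.array([[256.0, 240.0, 120.0, 30.0, 0.3]]),   # center, size, angle (rad)
    "gt_masks": np.zeros((1, 512, 512), dtype=np.uint8),        # one mask per target
}
```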
The cross-scale-attention-based CARSNet network structure in step 2 comprises:
a ResNet network for extracting a low-scale feature map from the input image;
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center-point coordinates, rotation angle, width and height, and class score of a target's rotated detection box;
an instance segmentation network, comprising a cross-scale attention module and a segmentation head, for sampling the feature map inside the rotated detection box at 14×14, supplementing the detail information of the low-scale feature map onto the sampled feature map, and feeding the detail-supplemented feature map into the segmentation head to predict the segmentation mask.
The 14×14 sampling inside the rotated detection box is performed with a bilinear interpolation algorithm.
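Below is a minimal sketch of such 14×14 bilinear sampling inside a rotated detection box, built on PyTorch's affine_grid and grid_sample; the patent does not specify its implementation, so this is one plausible realization that assumes box coordinates are given in feature-map pixels.

```python
import math
import torch
import torch.nn.functional as F

def rotated_roi_sample(feat, box, out_size=14):
    # feat: (1, C, H, W) feature map; box: (cx, cy, w, h, theta) in feature-map
    # pixels, theta in radians.
    _, c, H, W = feat.shape
    cx, cy, w, h, theta = box
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Affine matrix mapping the normalized out_size x out_size output grid
    # ([-1, 1]^2) to normalized input coordinates inside the rotated box.
    aff = torch.tensor([
        [w * cos_t / (W - 1), -h * sin_t / (W - 1), 2.0 * cx / (W - 1) - 1.0],
        [w * sin_t / (H - 1),  h * cos_t / (H - 1), 2.0 * cy / (H - 1) - 1.0],
    ], dtype=feat.dtype).unsqueeze(0)
    grid = F.affine_grid(aff, [1, c, out_size, out_size], align_corners=True)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```

For example, rotated_roi_sample(torch.randn(1, 256, 64, 64), (32.0, 32.0, 20.0, 8.0, 0.5)) returns a (1, 256, 14, 14) patch aligned with the box axes.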
The specific process of extracting the low-scale feature map of the input image is as follows:
after the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C_0 and feature C_1, and C_0 and C_1 are fused through formula (1) to obtain the fused feature C_F:
C_F = Conv(C_0) + DeConv(C_1) (1)
where Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fused feature C_F is taken as the low-scale feature map of the image.
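A sketch of formula (1) in PyTorch is given below; the channel counts, and the assumption that C_1 has half the spatial resolution of C_0 so that the 4×4 stride-2 transposed convolution aligns the two maps, are illustrative rather than taken from the patent.

```python
import torch
import torch.nn as nn

class LowScaleFusion(nn.Module):
    # C_F = Conv(C_0) + DeConv(C_1), formula (1).
    def __init__(self, c0_ch=64, c1_ch=256, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(c0_ch, out_ch, kernel_size=1)    # 1x1 convolution
        self.deconv = nn.ConvTranspose2d(c1_ch, out_ch,        # 4x4 transposed conv,
                                         kernel_size=4,        # doubles the resolution
                                         stride=2, padding=1)

    def forward(self, c0, c1):
        # c0: (B, c0_ch, H, W); c1: (B, c1_ch, H/2, W/2) -> C_F: (B, out_ch, H, W)
        return self.conv(c0) + self.deconv(c1)
```

With c0 of shape (1, 64, 128, 128) and c1 of shape (1, 256, 64, 64), both branches yield (1, 64, 128, 128) maps whose element-wise sum is C_F.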
The specific process of step 3 is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the true box of each target into a two-dimensional Gaussian distribution, selecting positive and negative anchor boxes from the anchor boxes with a Gaussian-distribution-based positive sample sampling method, and predicting the center-point coordinates, rotation angle, width and height, and class score of the target's rotated detection box;
Step 3.4, selecting the single-layer feature map to be supplemented from the multi-scale feature map according to the width and height of the rotated detection box, inputting this single-layer feature map and the rotated detection box into the instance segmentation network, sampling the single-layer feature map inside the rotated detection box at 14×14 through the cross-scale attention module, and supplementing the detail information of the low-scale feature map onto the sampled single-layer feature map to obtain a detail-supplemented feature map;
Step 3.5, inputting the detail-supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification loss that measures the difference between the predicted class score and the true class, a regression loss that measures the difference between the predicted rotated detection box and the true target coordinates, and a mask loss that measures the difference between the predicted segmentation mask and the true target mask; summing the three loss values, returning to step 3.1, and reducing the total loss by stochastic gradient descent until it reaches its minimum after 36 training epochs, the corresponding CARSNet network parameters giving the trained CARSNet network.
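A compact sketch of the training loop of step 3.6 follows; the model, the data loader and the concrete forms of the three loss functions are placeholders (the patent does not name them), and the learning rate and momentum are assumptions.

```python
import torch

def train(model, train_loader, cls_loss, reg_loss, mask_loss, epochs=36):
    # Sum the three losses and minimize the total with stochastic gradient
    # descent for 36 epochs, as described in step 3.6.
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for epoch in range(epochs):
        for images, gt_cls, gt_boxes, gt_masks in train_loader:
            pred_cls, pred_boxes, pred_masks = model(images)
            total = (cls_loss(pred_cls, gt_cls)          # class score vs true class
                     + reg_loss(pred_boxes, gt_boxes)    # detection box vs true coords
                     + mask_loss(pred_masks, gt_masks))  # predicted vs true mask
            opt.zero_grad()
            total.backward()
            opt.step()
```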
In step 3.3, the true box of a target is converted into a two-dimensional Gaussian distribution by the following formula:
γ(p) = 1 / (2π |Σ|^(1/2)) · exp(−(1/2)(p − m)^T Σ^(−1) (p − m)), Σ = R S S^T R^T
where γ(p) denotes the value of the Gaussian distribution at point p, m denotes the center-point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point of an anchor box, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the rotation angle of the target.
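The conversion can be sketched as follows, assuming the standard rotated-box-to-Gaussian construction implied by the definitions of m, R and S above.

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h, theta):
    # Mean m and covariance Sigma = R S (R S)^T of the box Gaussian, with
    # R the rotation matrix and S = diag([w/2, h/2]).
    m = np.array([cx, cy])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w / 2.0, h / 2.0])
    return m, R @ S @ S @ R.T   # S is diagonal, so S S^T = S @ S

def gaussian_value(p, m, sigma):
    # Density gamma(p) of the box Gaussian at an anchor-box center p.
    d = p - m
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d)
```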
The specific process of selecting the single-layer feature map to be supplemented from the multi-scale feature map according to the width and height of the rotated detection box is as follows:
a single-layer feature map F_k is selected from the multi-scale feature map according to the width and height of the rotated detection box, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k_0 + log_2(√(wh) / 224)⌋
where k_0 denotes the lowest scale of the multi-scale feature map, and w and h denote the width and height of the rotated detection box, respectively;
the single-layer feature map F_k corresponding to k is selected as the single-layer feature map to be supplemented.
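A sketch of the level-selection rule follows; the canonical size of 224 and the clamping range follow the common FPN heuristic, since the patent does not state its constants, and are therefore assumptions.

```python
import math

def select_level(w, h, k0=2, k_max=5, canonical=224):
    # Larger boxes map to higher (coarser) pyramid levels.
    k = int(math.floor(k0 + math.log2(math.sqrt(w * h) / canonical)))
    return max(k0, min(k, k_max))   # clamp to the pyramid's valid levels
```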
The specific process of step 3.4 is as follows: the single-layer feature map to be supplemented is selected from the multi-scale feature map according to the width and height of the rotated detection box, and this single-layer feature map together with the rotated detection box is input into the instance segmentation network. Inside the target's rotated detection box, 14×14 feature sampling is performed with a bilinear interpolation algorithm on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map F_H and a detail-information feature map F_L, respectively. Through the fully connected networks of the cross-scale attention module, F_H is mapped to a query matrix Q_H, and F_L is mapped to a key matrix K_L and a value matrix V_L. Matrix multiplication between Q_H and K_L gives the similarity matrix of F_H and F_L, and multiplying the similarity matrix by V_L gives the stable detail-information feature map F_D:
Q_H = Linear(F_H), K_L = Linear(F_L), V_L = Linear(F_L)
F_D = softmax(Q_H K_L^T / √d) V_L
where Linear denotes a fully connected network for dimension transformation and d denotes the dimension of the query matrix;
the stable detail-information feature map F_D and the high-scale feature map F_H are added element-wise to obtain the detail-supplemented feature map.
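A minimal PyTorch sketch of the cross-scale attention module follows; the channel and query dimensions are assumptions, and the two 14×14 patches are treated as sequences of 196 tokens.

```python
import torch
import torch.nn as nn

class CrossScaleAttention(nn.Module):
    # Queries come from the high-scale patch F_H, keys and values from the
    # low-scale detail patch F_L; the attended detail F_D is added back to F_H.
    def __init__(self, channels=256, dim=128):
        super().__init__()
        self.q = nn.Linear(channels, dim)        # F_H -> Q_H
        self.k = nn.Linear(channels, dim)        # F_L -> K_L
        self.v = nn.Linear(channels, channels)   # F_L -> V_L
        self.scale = dim ** -0.5                 # 1 / sqrt(d)

    def forward(self, f_h, f_l):
        # f_h, f_l: (B, N, C) with N = 14 * 14 sampled positions
        q, k, v = self.q(f_h), self.k(f_l), self.v(f_l)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        f_d = attn @ v        # stable detail-information features F_D
        return f_h + f_d      # element-wise addition of detail to F_H
```

Because each high-scale position queries the whole low-scale patch, it retrieves the detail features most similar to itself before the element-wise addition, which is what lets stable low-scale detail sharpen the predicted contour.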
The invention has the beneficial effects that:
The invention adopts a cross-scale attention module to supplement the high-scale features with stable detail information, improving the segmentation of ship target contours.
The invention adopts a Gaussian-distribution-based positive sample sampling method, which adaptively generates more positive samples covering the bow and stern of a ship according to the geometric characteristics of the ship target, helping the network learn the overall characteristics of the ship.
Drawings
FIG. 1 is a block diagram of the cross-scale-attention-based CARSNet network structure of the present invention;
FIG. 2 is a schematic illustration of the ResNet network used in the present invention to extract the low-scale feature map of the input image;
FIG. 3 is a schematic diagram of the multi-scale feature extraction network used in the present invention;
FIG. 4 is a block diagram of the instance segmentation network used in the present invention;
FIG. 5 is a block diagram of the cross-scale attention module of the present invention;
FIG. 6 is a graph of the true annotation results of a SAR image from the SSDD dataset;
FIG. 7 is a graph of the segmentation results of SRNet on the SSDD dataset;
FIG. 8 is a graph of the segmentation results of the present invention on the SSDD dataset;
FIG. 9 is a graph of the true annotation results of a SAR image from the Instance-RSDD dataset;
FIG. 10 is a graph of the segmentation results of SRNet on the Instance-RSDD dataset;
FIG. 11 is a graph of the segmentation results of the present invention on the Instance-RSDD dataset.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1
The SAR image ship instance segmentation method based on the cross-scale attention is implemented according to the following steps:
Step 1, taking an image and the true class, true coordinates and true mask of each target in the image as one group of training samples, and extracting several groups of training samples from a dataset;
the true coordinates of a target comprise the coordinates of the target's center point in the image, the width and height of the target, and the rotation angle of the target;
the dataset is the Instance-RSDD dataset or the SSDD dataset.
Step 2, constructing a CARSNet network structure based on cross-scale attention; the cross-scale-attention-based CARSNet network structure is shown in FIG. 1 and includes:
a ResNet network for extracting the low-scale feature map of the input image; the specific process is as follows:
after the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C_0 and feature C_1, and C_0 and C_1 are fused through formula (1) to obtain the fused feature C_F:
C_F = Conv(C_0) + DeConv(C_1) (1)
where Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fused feature C_F is taken as the low-scale feature map of the image;
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center-point coordinates, rotation angle, width and height, and class score of a target's rotated detection box;
an instance segmentation network, comprising a cross-scale attention module and a segmentation head as shown in FIG. 4, for sampling the feature map inside the rotated detection box at 14×14, supplementing the detail information of the low-scale feature map onto the sampled feature map, and feeding the detail-supplemented feature map into the segmentation head to predict the segmentation mask.
The 14×14 sampling inside the rotated detection box is performed with a bilinear interpolation algorithm.
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network; the specific process is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the true box of each target into a two-dimensional Gaussian distribution, selecting positive and negative anchor boxes from the anchor boxes with a Gaussian-distribution-based positive sample sampling method, and predicting the center-point coordinates, rotation angle, width and height, and class score of the target's rotated detection box;
In the training stage of the rotating target detection network, the invention designs a Gaussian-distribution-based positive sample sampling method. For the true box coordinates (x, y, w, h, θ) of a ship target, the true box is converted into a two-dimensional Gaussian distribution by formula (2):
γ(p) = 1 / (2π |Σ|^(1/2)) · exp(−(1/2)(p − m)^T Σ^(−1) (p − m)), Σ = R S S^T R^T (2)
where γ(p) denotes the value of the Gaussian distribution at point p, m denotes the center-point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point of an anchor box, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the rotation angle of the target.
For each anchor box, the value of its center point on the two-dimensional Gaussian distribution of the true box is calculated, and the top k anchor boxes with the largest values are retained. The IoU between each retained anchor box and the true box is then computed, the mean m and variance g of these IoUs are calculated, and the IoU threshold for screening positive samples is set as t = m + g. Finally, an anchor box is judged to be a positive sample if its IoU with the true box is larger than t.
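A sketch of this selection procedure is given below; the anchor IoUs are assumed to be computed elsewhere, top_k is left as a free parameter since the patent does not fix k, and the text's "variance g" is taken literally (the related ATSS method uses the standard deviation instead).

```python
import numpy as np

def select_positive_anchors(anchor_centers, anchor_ious, m, sigma, top_k=9):
    # anchor_centers: (N, 2) anchor-box center points; anchor_ious: (N,) IoU of
    # each anchor box with the true box.
    inv = np.linalg.inv(sigma)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))
    d = anchor_centers - m
    gamma = norm * np.exp(-0.5 * np.einsum("ni,ij,nj->n", d, inv, d))
    keep = np.argsort(gamma)[-top_k:]    # top-k anchors by Gaussian value
    ious = anchor_ious[keep]
    t = ious.mean() + ious.var()         # IoU threshold t = m + g
    return keep[ious > t]                # indices of positive anchor boxes
```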
Step 3.4, selecting the single-layer feature map to be supplemented from the multi-scale feature map according to the width and height of the rotated detection box; the specific process is as follows:
a single-layer feature map F_k is selected from the multi-scale feature map according to the width and height of the rotated detection box, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k_0 + log_2(√(wh) / 224)⌋
where k_0 denotes the lowest scale of the multi-scale feature map, and w and h denote the width and height of the rotated detection box, respectively;
the single-layer feature map F_k corresponding to k is selected as the single-layer feature map to be supplemented.
As shown in FIG. 5, the single-layer feature map to be supplemented and the rotated detection box are input into the instance segmentation network, the single-layer feature map inside the rotated detection box is sampled at 14×14 through the cross-scale attention module, and the detail information of the low-scale feature map is supplemented onto the sampled single-layer feature map to obtain the detail-supplemented feature map. The specific process is as follows: inside the target's rotated detection box, 14×14 feature sampling is performed with a bilinear interpolation algorithm on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map F_H and a detail-information feature map F_L, respectively. Through the fully connected networks of the cross-scale attention module, F_H is mapped to a query matrix Q_H, and F_L is mapped to a key matrix K_L and a value matrix V_L. Matrix multiplication between Q_H and K_L gives the similarity matrix of F_H and F_L, and multiplying the similarity matrix by V_L gives the stable detail-information feature map F_D:
Q_H = Linear(F_H), K_L = Linear(F_L), V_L = Linear(F_L)
F_D = softmax(Q_H K_L^T / √d) V_L
where Linear denotes a fully connected network for dimension transformation and d denotes the dimension of the query matrix;
the stable detail-information feature map F_D and the high-scale feature map F_H are added element-wise to obtain the detail-supplemented feature map.
Step 3.5, inputting the detail-supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification loss that measures the difference between the predicted class score and the true class, a regression loss that measures the difference between the predicted rotated detection box and the true target coordinates, and a mask loss that measures the difference between the predicted segmentation mask and the true target mask; summing the three loss values, returning to step 3.1, and reducing the total loss by stochastic gradient descent until it reaches its minimum after 36 training epochs, the corresponding CARSNet network parameters giving the trained CARSNet network.
Step 4, segmenting SAR image ship instances with the trained CARSNet network structure.
The framework of the invention is divided into three parts: a feature extraction network, a rotating target detection network and an instance segmentation network. The invention designs a cross-scale attention module that supplements the high-scale features with low-scale stable detail information, improving the segmentation of target contours. The invention also designs a Gaussian-distribution-based positive sample sampling method: a whole ship approximately follows a two-dimensional Gaussian distribution, so positive samples are screened according to the Gaussian distribution, which increases the number of positive samples at the bow and stern of the ship and improves the rotating target detection stage.
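A schematic composition of the three parts at inference time might look as follows; all module arguments are placeholders, select_level and rotated_roi_sample refer to the earlier sketches, and the rescaling of box coordinates between pyramid levels is omitted for brevity.

```python
def carsnet_forward(image, resnet, fpn, det_head, csa, seg_head):
    c_f = resnet(image)                    # low-scale detail feature map C_F
    pyramid = fpn(image)                   # multi-scale feature maps {F_k}
    boxes, scores = det_head(pyramid)      # rotated boxes (cx, cy, w, h, theta)
    masks = []
    for box in boxes:
        f_k = pyramid[select_level(box[2], box[3])]
        # 14x14 patches, flattened to (1, 196, C) token sequences
        f_h = rotated_roi_sample(f_k, box).flatten(2).transpose(1, 2)
        f_l = rotated_roi_sample(c_f, box).flatten(2).transpose(1, 2)
        masks.append(seg_head(csa(f_h, f_l)))   # detail-supplemented mask
    return boxes, scores, masks
```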
Example 2
The method of the invention is used to segment ships in SAR images of the SSDD dataset and is compared with the SRNet method; the comparison results are shown in Table 1:
TABLE 1
From the AP results of the invention and of the comparison method SRNet under the SSDD instance segmentation benchmark, the accuracy of the invention exceeds that of SRNet on all AP indicators, showing a good detection effect on SAR image ship targets.
FIGS. 6, 7 and 8 show results on SAR images from the SSDD dataset: FIG. 6 is the true annotation in this scene, FIG. 7 is the SRNet segmentation result, and FIG. 8 is the segmentation result of the present invention.
Example 3
The method of the invention is used to segment ships in SAR images of the Instance-RSDD dataset and is compared with the SRNet method; the comparison results are shown in Table 2:
TABLE 2
From the AP results in Table 2 of the invention and of the comparison method SRNet under the Instance-RSDD instance segmentation benchmark, the accuracy of the invention exceeds that of SRNet on all AP indicators, showing a good detection effect on SAR image ship targets.
FIGS. 9, 10 and 11 show results on SAR images from the Instance-RSDD dataset: FIG. 9 is the true annotation in this scene, FIG. 10 is the SRNet segmentation result, and FIG. 11 is the segmentation result of the present invention.
As can be seen from Table 1 and Table 2 in Example 2 and Example 3, the invention exceeds SRNet on all AP indicators and has a better detection effect on SAR image ship targets. Comparing FIG. 7 with FIG. 8 and FIG. 10 with FIG. 11 shows that SRNet may generate detection boxes that cannot completely enclose the ship target, affecting the final segmentation result; the Ga-ATSS proposed by the invention greatly improves the accuracy of the detection boxes, benefiting the subsequent segmentation. Meanwhile, SRNet segments the contour of the ship poorly, whereas the cross-scale attention module proposed by the invention greatly alleviates this problem and improves the segmentation of ships.

Claims (10)

1. A SAR image ship instance segmentation method based on cross-scale attention, characterized by comprising the following steps:
Step 1, taking an image and the true class, true coordinates and true mask of each target in the image as one group of training samples, and extracting several groups of training samples from a dataset;
Step 2, constructing a CARSNet network structure based on cross-scale attention;
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network;
Step 4, segmenting SAR image ship instances with the trained CARSNet network structure.
2. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 1, characterized in that the true coordinates of the target in step 1 comprise the coordinates of the target's center point in the image, the width and height of the target, and the rotation angle of the target.
3. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 1, characterized in that the dataset in step 1 is the Instance-RSDD dataset or the SSDD dataset.
4. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 2, characterized in that the cross-scale-attention-based CARSNet network structure in step 2 comprises:
a ResNet network for extracting a low-scale feature map from the input image;
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center-point coordinates, rotation angle, width and height, and class score of a target's rotated detection box;
an instance segmentation network, comprising a cross-scale attention module and a segmentation head, for sampling the feature map inside the rotated detection box at 14×14, supplementing the detail information of the low-scale feature map onto the sampled feature map, and feeding the detail-supplemented feature map into the segmentation head to predict the segmentation mask.
5. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 4, characterized in that the 14×14 sampling of the feature map inside the rotated detection box is performed with a bilinear interpolation algorithm.
6. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 5, characterized in that the specific process of extracting the low-scale feature map of the input image is as follows:
after the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C_0 and feature C_1, and C_0 and C_1 are fused through formula (1) to obtain the fused feature C_F:
C_F = Conv(C_0) + DeConv(C_1) (1)
where Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fused feature C_F is taken as the low-scale feature map of the image.
7. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 5, characterized in that the specific process of step 3 is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the true box of each target into a two-dimensional Gaussian distribution, selecting positive and negative anchor boxes from the anchor boxes with a Gaussian-distribution-based positive sample sampling method, and predicting the center-point coordinates, rotation angle, width and height, and class score of the target's rotated detection box;
Step 3.4, selecting the single-layer feature map to be supplemented from the multi-scale feature map according to the width and height of the rotated detection box, inputting this single-layer feature map and the rotated detection box into the instance segmentation network, sampling the single-layer feature map inside the rotated detection box at 14×14 through the cross-scale attention module, and supplementing the detail information of the low-scale feature map onto the sampled single-layer feature map to obtain a detail-supplemented feature map;
Step 3.5, inputting the detail-supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification loss that measures the difference between the predicted class score and the true class, a regression loss that measures the difference between the predicted rotated detection box and the true target coordinates, and a mask loss that measures the difference between the predicted segmentation mask and the true target mask; summing the three loss values, returning to step 3.1, and reducing the total loss by stochastic gradient descent until it reaches its minimum after 36 training epochs, the corresponding CARSNet network parameters giving the trained CARSNet network.
8. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 7, characterized in that the formula for converting the true box of the target into a two-dimensional Gaussian distribution in step 3.3 is:
γ(p) = 1 / (2π |Σ|^(1/2)) · exp(−(1/2)(p − m)^T Σ^(−1) (p − m)), Σ = R S S^T R^T
where γ(p) denotes the value of the Gaussian distribution at point p, m denotes the center-point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point of an anchor box, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the rotation angle of the target.
9. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 7, characterized in that the specific process of selecting the single-layer feature map to be supplemented from the multi-scale feature map according to the width and height of the rotated detection box is as follows:
a single-layer feature map F_k is selected from the multi-scale feature map according to the width and height of the rotated detection box, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k_0 + log_2(√(wh) / 224)⌋
where k_0 denotes the lowest scale of the multi-scale feature map, and w and h denote the width and height of the rotated detection box, respectively;
the single-layer feature map F_k corresponding to k is selected as the single-layer feature map to be supplemented.
10. The cross-scale-attention-based SAR image ship instance segmentation method according to claim 7, characterized in that the specific process of step 3.4 is as follows: the single-layer feature map to be supplemented is selected from the multi-scale feature map according to the width and height of the rotated detection box, and this single-layer feature map together with the rotated detection box is input into the instance segmentation network; inside the target's rotated detection box, 14×14 feature sampling is performed with a bilinear interpolation algorithm on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map F_H and a detail-information feature map F_L, respectively; through the fully connected networks of the cross-scale attention module, F_H is mapped to a query matrix Q_H, and F_L is mapped to a key matrix K_L and a value matrix V_L; matrix multiplication between Q_H and K_L gives the similarity matrix of F_H and F_L, and multiplying the similarity matrix by V_L gives the stable detail-information feature map F_D:
Q_H = Linear(F_H), K_L = Linear(F_L), V_L = Linear(F_L)
F_D = softmax(Q_H K_L^T / √d) V_L
where Linear denotes a fully connected network for dimension transformation and d denotes the dimension of the query matrix;
the stable detail-information feature map F_D and the high-scale feature map F_H are added element-wise to obtain the detail-supplemented feature map.
CN202410081388.6A (priority date 2024-01-19, filing date 2024-01-19): SAR image ship instance segmentation method based on cross-scale attention, CN117893761A (pending)

Priority Applications (1)

CN202410081388.6A: SAR image ship instance segmentation method based on cross-scale attention

Applications Claiming Priority (1)

CN202410081388.6A: SAR image ship instance segmentation method based on cross-scale attention

Publications (1)

CN117893761A, published 2024-04-16

Family ID: 90639318

Family Applications (1)

CN202410081388.6A: CN117893761A

Country Status (1)

CN: CN117893761A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination