CN117893761A - SAR image ship instance segmentation method based on cross-scale attention - Google Patents
- Publication number
- CN117893761A (application CN202410081388.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a SAR image ship instance segmentation method based on cross-scale attention. An image, together with the real target category, real target coordinates and real target mask in the image, is taken as a group of training samples, and a plurality of groups of training samples are extracted from a data set; a CARSNet network structure based on cross-scale attention is constructed; the training samples are input into the CARSNet network structure to obtain a trained CARSNet network; and SAR image ship instances are segmented by the trained CARSNet network. During segmentation, the method adopts a cross-scale attention module to supplement stable detail information to the high-scale features, improving the segmentation of the ship target contour by the algorithm. The invention further adopts a positive sample sampling method based on Gaussian distribution, which adaptively generates more positive samples containing the bow and stern of the ship according to the geometric characteristics of the ship target, thereby benefiting the network's learning of the overall characteristics of the ship.
Description
Technical Field
The invention belongs to the technical field of image segmentation, and particularly relates to a SAR image ship instance segmentation method based on cross-scale attention.
Background
As an active microwave imaging sensor, synthetic aperture radar (SAR) images a target coherently by transmitting electromagnetic pulses and receiving their echoes. Because it relies on microwave transmission, SAR is almost unaffected by weather, cloud cover or time of day, and can acquire clear images of ground targets around the clock in all weather conditions; moreover, its wavelength is longer than that of visible light, giving it stronger penetration capability through cloud layers, vegetation, smoke and similar media. Owing to these advantages, SAR is widely used in both military and civilian fields. In the civilian field, SAR can be used for monitoring illegal ships at sea, environmental surveying, urban planning, map drawing and so on; in the military field, SAR enables detection of important military targets and provides accurate coordinate information for weapon systems, and many countries employ SAR satellites to observe marine ships because of these unique advantages. Instance segmentation can delineate the contour details of a target more clearly, and is therefore widely applied in autonomous driving, medical image analysis, face recognition, video analysis, industrial automation, remote sensing image analysis and other fields. Compared with target detection, ship instance segmentation of SAR images can obtain more accurate ship position information and contour details. Ship instance segmentation of SAR images therefore has important practical value and broad application prospects in both civilian and military fields.
The SRNet algorithm performs segmentation using only a single-layer feature map of the FPN network at the segmentation head. Because the high-scale feature map lacks detail information, the contours this algorithm produces for large and medium-sized targets are rough, and the accurate positions of such targets cannot be provided.
Disclosure of Invention
The invention aims to provide a SAR image ship instance segmentation method based on cross-scale attention, which solves the problem that the lack of detail information on the feature map prevents accurate positions of such targets from being provided and thus degrades the segmentation effect.
The technical scheme adopted by the invention is a SAR image ship instance segmentation method based on cross-scale attention, implemented according to the following steps:
Step 1, taking an image together with the real target category, real target coordinates and real target mask in the image as a group of training samples, and extracting a plurality of groups of training samples from a data set;
Step 2, constructing a CARSNet network structure based on cross-scale attention;
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network;
Step 4, segmenting SAR image ship instances with the trained CARSNet network.
The invention is also characterized in that:
In step 1, the real coordinates of the target comprise the coordinates of the center point of the target in the image, the width and height of the target, and the rotation angle of the target.
The dataset in step 1 is the Instance-RSDD dataset or the SSDD dataset.
The CARSNet network structure based on cross-scale attention in step 2 comprises:
a ResNet network for extracting a low-scale feature map of the input image;
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
an instance segmentation network, comprising a cross-scale attention module and a segmentation head, for performing 14×14 sampling of the feature map within the rotation detection frame, supplementing the detail information of the low-scale feature map to the sampled feature map, and inputting the supplemented feature map into the segmentation head to predict the segmentation mask.
The 14×14 samples within the rotation detection frame are obtained by a bilinear interpolation algorithm.
The specific process for extracting the low-scale feature map of the input image is as follows:
After the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C0 and feature C1, and the two features are fused through formula (1) to obtain the fusion feature CF:
CF = Conv(C0) + DeConv(C1) (1)
wherein Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fusion feature CF is taken as the low-scale feature map of the image.
The specific process of step 3 is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the real frame of the target into a two-dimensional Gaussian distribution, selecting positive and negative sample anchor frames from the anchor frames by the positive sample sampling method based on the Gaussian distribution, and predicting the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
Step 3.4, selecting a single-layer feature map to be supplemented with information from the multi-scale feature map according to the width and height of the rotation detection frame, inputting this single-layer feature map and the rotation detection frame into the instance segmentation network, performing 14×14 sampling of the single-layer feature map within the rotation detection frame through the cross-scale attention module, and supplementing the detail information of the low-scale feature map to the sampled single-layer feature map to obtain a feature map supplemented with detail information;
Step 3.5, inputting the supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification prediction loss that measures the difference between the predicted class score and the real category, a regression prediction loss that measures the difference between the predicted rotation detection frame and the real target coordinates, and a mask prediction loss that measures the difference between the predicted segmentation mask and the real target mask. The values of the three loss functions are summed to give the total loss, and the process returns to step 3.1, reducing the total loss by the stochastic gradient descent algorithm until it reaches a minimum after 36 rounds of training; the corresponding CARSNet network parameters are retained, yielding the trained CARSNet network.
In step 3.3, the real frame of the target is converted into a two-dimensional Gaussian distribution through formula (2):
γ(p) = 1/(2π|Σ|^(1/2)) · exp(−(1/2)(p−m)^T Σ^(−1) (p−m)), Σ = (RS)(RS)^T (2)
where γ(p) denotes the Gaussian distribution value at point p, m denotes the center point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point coordinates of an anchor frame, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the target rotation angle.
The specific process of selecting the single-layer feature map to be supplemented with information from the multi-scale feature map according to the width and height of the rotation detection frame is as follows:
a single-layer feature map Fk is selected from the multi-scale feature map according to the width and height of the rotation detection frame, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k0 + log2(√(wh)/224)⌋ (3)
wherein k0 denotes the lowest level of the multi-scale feature map, and w and h denote the width and height of the rotation detection frame, respectively;
the single-layer feature map Fk corresponding to k is selected as the single-layer feature map to be supplemented with information.
The specific process of step 3.4 is as follows: a single-layer feature map to be supplemented with information is selected from the multi-scale feature map according to the width and height of the rotation detection frame, and this single-layer feature map and the rotation detection frame are input into the instance segmentation network. Using a bilinear interpolation algorithm, 14×14 feature sampling is performed within the target's rotation detection frame on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map FH and a detail information feature map FL, respectively. Through the fully connected networks of the cross-scale attention module, FH is mapped into a query matrix QH, and FL is mapped into a key matrix KL and a value matrix VL. The similarity matrix between FH and FL is obtained by matrix multiplication of QH and KL, and the stable detail information feature map FD is obtained by multiplying the similarity matrix by VL:
QH = Linear(FH), KL = Linear(FL), VL = Linear(FL)
FD = Softmax(QH·KL^T/√d)·VL (4)
wherein Linear denotes a fully connected network used for dimension transformation and d denotes the dimension of the query matrix;
the stable detail information feature map FD and the high-scale feature map FH are added element-wise to obtain the feature map supplemented with detail information.
The invention has the beneficial effects that:
The invention adopts a cross-scale attention module to supplement stable detail information to the high-scale features, thereby improving the segmentation of the ship target contour by the algorithm.
The invention adopts a positive sample sampling method based on Gaussian distribution, which adaptively generates more positive samples containing the bow and stern of the ship according to the geometric characteristics of the ship target, thereby benefiting the network's learning of the overall characteristics of the ship.
Drawings
FIG. 1 is a block diagram of the cross-scale attention-based CARSNet network architecture of the present invention;
FIG. 2 is a schematic illustration of the extraction of the low-scale feature map of an input image by the ResNet network employed in the present invention;
FIG. 3 is a schematic representation of the multi-scale feature extraction network employed in the present invention;
FIG. 4 is a block diagram of the instance segmentation network architecture employed by the present invention;
FIG. 5 is a block diagram of the cross-scale attention module of the present invention;
FIG. 6 is a graph of the true labeling results of a SAR image from the SSDD dataset;
FIG. 7 is a graph of the segmentation results of SRNet on the SSDD dataset;
FIG. 8 is a graph of the segmentation results of the present invention on the SSDD dataset;
FIG. 9 is a graph of the true labeling results of a SAR image from the Instance-RSDD dataset;
FIG. 10 is a graph of the segmentation results of SRNet on the Instance-RSDD dataset;
FIG. 11 is a graph of the segmentation results of the present invention on the Instance-RSDD dataset.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and detailed description.
Example 1
The SAR image ship instance segmentation method based on the cross-scale attention is implemented according to the following steps:
Step 1, taking an image together with the real target category, real target coordinates and real target mask in the image as a group of training samples, and extracting a plurality of groups of training samples from a data set;
The real coordinates of the target comprise the coordinates of the center point of the target in the image, the width and height of the target, and the rotation angle of the target.
The dataset is an Instance-RSDD dataset or a SSDD dataset.
Step 2, constructing a CARSNet network structure based on cross-scale attention; the cross-scale-attention-based CARSNet network structure is shown in FIG. 1 and includes:
a ResNet network for extracting a low-scale feature map of the input image; the specific process is as follows:
after the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C0 and feature C1, and the two features are fused through formula (1) to obtain the fusion feature CF:
CF = Conv(C0) + DeConv(C1) (1)
wherein Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fusion feature CF is taken as the low-scale feature map of the image.
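The fusion of formula (1) can be sketched in PyTorch as follows; the channel counts (64 for stage0, 256 for stage1, 128 fused) and spatial sizes are illustrative assumptions, not values stated in the invention:

```python
import torch
import torch.nn as nn

class LowScaleFusion(nn.Module):
    """Sketch of CF = Conv(C0) + DeConv(C1) from formula (1).

    Channel counts are assumptions for illustration only.
    """
    def __init__(self, c0_ch=64, c1_ch=256, out_ch=128):
        super().__init__()
        # 1x1 convolution applied to the stage0 feature C0
        self.conv = nn.Conv2d(c0_ch, out_ch, kernel_size=1)
        # 4x4 transposed convolution upsampling the stage1 feature C1 by 2x
        self.deconv = nn.ConvTranspose2d(c1_ch, out_ch,
                                         kernel_size=4, stride=2, padding=1)

    def forward(self, c0, c1):
        return self.conv(c0) + self.deconv(c1)  # element-wise sum -> CF

fuse = LowScaleFusion()
c0 = torch.randn(1, 64, 64, 64)    # stage0 output (higher resolution)
c1 = torch.randn(1, 256, 32, 32)   # stage1 output (half resolution)
out = fuse(c0, c1)
print(out.shape)  # torch.Size([1, 128, 64, 64])
```

The transposed convolution brings C1 back to the spatial size of C0, so the two terms can be summed element-wise.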
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
an instance segmentation network, which includes a cross-scale attention module and a segmentation head as shown in FIG. 4, and is configured to perform 14×14 sampling of the feature map within the rotation detection frame, supplement the detail information of the low-scale feature map to the sampled feature map, and input the supplemented feature map into the segmentation head to predict the segmentation mask.
The 14×14 samples within the rotation detection frame are obtained by a bilinear interpolation algorithm.
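A minimal sketch of bilinear interpolation at a single fractional sampling point, as used for the 14×14 sampling; the plain-Python feature map and the clamping border handling are illustrative assumptions (in the network, the 14×14 grid points are obtained inside the rotated detection frame):

```python
def bilinear_sample(fmap, x, y):
    """Sample feature map `fmap` (a list of rows) at fractional (x, y).

    Values are blended from the four surrounding integer grid points,
    weighted by the fractional offsets; out-of-range neighbours are
    clamped to the map border (an assumption for this sketch).
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    y1 = min(y0 + 1, len(fmap) - 1)
    fx, fy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bot = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bot * fy

fmap = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))  # 1.5, the average of the four corners
```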
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network; the specific process is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the real frame of the target into a two-dimensional Gaussian distribution, selecting positive and negative sample anchor frames from the anchor frames by the positive sample sampling method based on the Gaussian distribution, and predicting the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
In the training stage of the rotating target detection network, the invention designs a positive sample sampling method based on Gaussian distribution. The real frame coordinates (x, y, w, h, θ) of a ship target are converted into a two-dimensional Gaussian distribution through formula (2):
γ(p) = 1/(2π|Σ|^(1/2)) · exp(−(1/2)(p−m)^T Σ^(−1) (p−m)), Σ = (RS)(RS)^T (2)
where γ(p) denotes the Gaussian distribution value at point p, m denotes the center point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point coordinates of an anchor frame, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the target rotation angle.
For each anchor frame, the value of its center point on the two-dimensional Gaussian distribution of the real frame is calculated, and the anchor frames with the k largest values are retained. The IoU between these anchor frames and the real frame is then calculated, the mean m and variance g of the IoU values are computed, and the IoU threshold for screening positive samples is set as t = m + g. Finally, an anchor frame is determined to be a positive sample if its IoU with the real frame is greater than t.
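The Gaussian value of formula (2) and the dynamic IoU threshold t = m + g can be sketched in plain Python; the normalisation constant follows the standard bivariate normal density and Σ = (RS)(RS)^T is expanded by hand for the 2×2 case (both reconstructions, since the typeset equation is not reproduced in this text):

```python
import math

def gaussian_value(p, box):
    """Value of the real frame's 2-D Gaussian at point p = (px, py).

    box = (x, y, w, h, theta); Sigma = (R S)(R S)^T with R the rotation
    matrix and S = diag(w/2, h/2), following the definitions in the text.
    """
    x, y, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2) ** 2, (h / 2) ** 2
    # Sigma = R diag(a, b) R^T, expanded entry by entry
    s11 = c * c * a + s * s * b
    s12 = c * s * (a - b)
    s22 = s * s * a + c * c * b
    det = s11 * s22 - s12 * s12
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det  # inverse of Sigma
    dx, dy = p[0] - x, p[1] - y
    quad = dx * (i11 * dx + i12 * dy) + dy * (i12 * dx + i22 * dy)
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def iou_threshold(ious):
    """Dynamic positive-sample threshold t = mean + variance of the IoUs."""
    m = sum(ious) / len(ious)
    g = sum((v - m) ** 2 for v in ious) / len(ious)
    return m + g
```

For an elongated ship (w > h, θ = 0), the Gaussian decays more slowly along the ship's long axis, which is why anchor centers near the bow and stern still receive high values and can be kept as positive samples.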
Step 3.4, selecting the single-layer feature map to be supplemented with information from the multi-scale feature map according to the width and height of the rotation detection frame; the specific process is as follows:
a single-layer feature map Fk is selected from the multi-scale feature map according to the width and height of the rotation detection frame, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k0 + log2(√(wh)/224)⌋ (3)
wherein k0 denotes the lowest level of the multi-scale feature map, and w and h denote the width and height of the rotation detection frame, respectively;
the single-layer feature map Fk corresponding to k is selected as the single-layer feature map to be supplemented with information.
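The level selection can be sketched as follows, assuming the standard FPN region-of-interest assignment rule k = ⌊k0 + log2(√(wh)/224)⌋, which is consistent with the definitions in the text; the canonical size 224 and the level bounds are assumptions, not values stated in the source:

```python
import math

def select_fpn_level(w, h, k0=2, k_max=5, canonical=224):
    """Map a rotation detection frame of size (w, h) to a pyramid level.

    k0, k_max and `canonical` are illustrative assumptions; the rule
    clamps k to the range of levels actually present in the pyramid.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k0, min(k, k_max))

print(select_fpn_level(224, 224))  # 2: a canonical-size box stays at k0
print(select_fpn_level(448, 448))  # 3: doubling the box size moves up one level
```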
As shown in FIG. 5, the single-layer feature map to be supplemented and the rotation detection frame are input into the instance segmentation network; the cross-scale attention module performs 14×14 sampling of the single-layer feature map within the rotation detection frame and supplements the detail information of the low-scale feature map to the sampled single-layer feature map, giving the feature map supplemented with detail information. The specific process is as follows: using a bilinear interpolation algorithm, 14×14 feature sampling is performed within the target's rotation detection frame on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map FH and a detail information feature map FL, respectively. Through the fully connected networks of the cross-scale attention module, FH is mapped into a query matrix QH, and FL is mapped into a key matrix KL and a value matrix VL. The similarity matrix between FH and FL is obtained by matrix multiplication of QH and KL, and the stable detail information feature map FD is obtained by multiplying the similarity matrix by VL:
QH = Linear(FH), KL = Linear(FL), VL = Linear(FL)
FD = Softmax(QH·KL^T/√d)·VL (4)
wherein Linear denotes a fully connected network used for dimension transformation and d denotes the dimension of the query matrix;
the stable detail information feature map FD and the high-scale feature map FH are added element-wise to obtain the feature map supplemented with detail information.
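A minimal NumPy sketch of the cross-scale attention computation described in step 3.4; the feature dimension d = 32, the random features, and the random weight matrices standing in for the learned Linear mappings are all illustrative assumptions:

```python
import numpy as np

N, d = 14 * 14, 32  # 14x14 sampled positions; channel dimension d (assumed)
rng = np.random.default_rng(0)
F_H = rng.standard_normal((N, d))  # high-scale features sampled in the frame
F_L = rng.standard_normal((N, d))  # low-scale detail features sampled there

# Stand-ins for the learned fully connected ("Linear") mappings.
W_q = rng.standard_normal((d, d)) * 0.1
W_k = rng.standard_normal((d, d)) * 0.1
W_v = rng.standard_normal((d, d)) * 0.1
Q_H, K_L, V_L = F_H @ W_q, F_L @ W_k, F_L @ W_v

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Similarity matrix between high-scale queries and low-scale keys, scaled by
# sqrt(d), then aggregation of the low-scale values to get F_D.
A = softmax(Q_H @ K_L.T / np.sqrt(d))
F_D = A @ V_L

# Element-wise addition supplements the stable detail information to F_H.
F_out = F_H + F_D
```

Each row of A sums to 1, so every high-scale position receives a convex combination of low-scale detail features, which is what stabilises the supplemented details.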
Step 3.5, inputting the supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification prediction loss that measures the difference between the predicted class score and the real category, a regression prediction loss that measures the difference between the predicted rotation detection frame and the real target coordinates, and a mask prediction loss that measures the difference between the predicted segmentation mask and the real target mask. The values of the three loss functions are summed to give the total loss, and the process returns to step 3.1, reducing the total loss by the stochastic gradient descent algorithm until it reaches a minimum after 36 rounds of training; the corresponding CARSNet network parameters are retained, yielding the trained CARSNet network.
Step 4, segmenting SAR image ship instances with the trained CARSNet network.
The framework of the invention is divided into three parts: a feature extraction network, a rotating target detection network, and an instance segmentation network. The invention designs a cross-scale attention module that supplements low-scale stable detail information to the high-scale features, improving the segmentation of the target contour by the algorithm. The invention also designs a positive sample sampling method based on Gaussian distribution: the whole ship approximately follows a two-dimensional Gaussian distribution, positive samples are screened according to this distribution, the number of positive samples at the bow and stern of the ship is increased, and the performance of the rotating target detection stage is improved.
Example 2
The SAR image ships of the SSDD dataset are segmented by the method of the invention and, for comparison, by the SRNet method; the comparison results are shown in Table 1:
TABLE 1
According to the AP results under the SSDD dataset instance segmentation benchmark in Table 1, the detection accuracy of the invention exceeds the SRNet comparison results on all AP metrics, showing that the method has a good detection effect on SAR image ship targets.
FIGS. 6, 7 and 8 show SAR image simulation results on the SSDD dataset: FIG. 6 is the true labeling result in this scene, FIG. 7 is the SRNet segmentation result, and FIG. 8 is the segmentation result of the present invention.
Example 3
The SAR image ships of the Instance-RSDD dataset are segmented by the method of the invention and, for comparison, by the SRNet method; the comparison results are shown in Table 2:
TABLE 2
According to the AP results under the Instance-RSDD dataset instance segmentation benchmark in Table 2, the detection accuracy of the invention exceeds the SRNet comparison results on all AP metrics, showing that the method has a good detection effect on SAR image ship targets.
FIGS. 9, 10 and 11 show SAR image simulation results on the Instance-RSDD dataset: FIG. 9 is the true labeling result in this scene, FIG. 10 is the SRNet segmentation result, and FIG. 11 is the segmentation result of the present invention.
As can be seen from Table 1 and Table 2 in Example 2 and Example 3, the detection accuracy of the invention exceeds that of SRNet on all AP metrics, giving a better detection effect on SAR image ship targets. As can be seen from the comparisons of FIGS. 7 and 8 and of FIGS. 10 and 11, SRNet may generate detection frames that cannot completely surround the ship target, affecting the final segmentation result; the Ga-ATSS proposed by the invention greatly improves the accuracy of the detection frame, which benefits the subsequent segmentation. Meanwhile, SRNet handles the segmentation details of the ship contour poorly, and the cross-scale attention module proposed by the invention greatly alleviates this problem and improves the ship segmentation effect.
Claims (10)
1. The SAR image ship instance segmentation method based on the cross-scale attention is characterized by comprising the following steps of:
Step 1, taking an image and a target real category, a target real coordinate and a real target mask in the image as a group of training samples, and extracting a plurality of groups of training samples from a data set;
Step 2, constructing a CARSNet network structure based on cross-scale attention;
Step 3, inputting the training samples into the CARSNet network structure based on cross-scale attention to obtain a trained CARSNet network;
and Step 4, segmenting the SAR image ship instances through the trained CARSNet network.
2. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 1, wherein the real coordinates of the target in the step 1 comprise the coordinates of the center point of the target in the image, the width and height of the target, and the rotation angle of the target.
3. The method for segmenting the SAR image ship Instance based on the cross-scale attention according to claim 1, wherein the dataset in the step 1 is an Instance-RSDD dataset or a SSDD dataset.
4. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 2, wherein the CARSNet network structure based on the cross-scale attention in the step 2 comprises the following steps:
The Resnet network is used for extracting a low-scale feature map of the input image;
The feature extraction network is used for extracting a multi-scale feature map from the input image;
The rotating target detection network is used for predicting, from the multi-scale feature map, the center point coordinates, the rotation angle, the width and height, and the class score of the rotation detection frame of the target;
The instance segmentation network comprises a cross-scale attention module and a segmentation head, and is used for 14×14 sampling of the feature map in the rotation detection frame, supplementing the detail information on the low-scale feature map to the sampled feature map, and inputting the information-supplemented feature map into the segmentation head to predict a segmentation mask.
5. The method for segmenting the SAR image ship instance based on the cross-scale attention as set forth in claim 4, wherein the 14×14 sampling of the feature map in the rotation detection frame is performed by a bilinear interpolation algorithm.
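As a concrete illustration of the 14×14 bilinear sampling named in claim 5, the following minimal numpy sketch samples a regular 14×14 grid inside a box on a feature map. It is a simplification: it handles an axis-aligned box on a single-channel map, whereas the patent samples inside a rotated detection frame; the function names and grid construction are assumptions of this sketch.

```python
import numpy as np

def bilinear_sample(feature, ys, xs):
    """Bilinearly interpolate a 2-D feature map at float coordinates."""
    H, W = feature.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    dy, dx = ys - y0, xs - x0
    return (feature[y0, x0] * (1 - dy) * (1 - dx)
            + feature[y0, x0 + 1] * (1 - dy) * dx
            + feature[y0 + 1, x0] * dy * (1 - dx)
            + feature[y0 + 1, x0 + 1] * dy * dx)

def roi_sample(feature, y1, x1, y2, x2, size=14):
    """Sample a size x size grid inside an axis-aligned box (rotation omitted)."""
    g = np.linspace(0.0, 1.0, size)
    yy, xx = np.meshgrid(y1 + g * (y2 - y1), x1 + g * (x2 - x1), indexing="ij")
    return bilinear_sample(feature, yy, xx)
```

The same interpolation weights apply to a rotated frame once the grid points are rotated about the box center.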
6. The method for segmenting the SAR image ship instance based on the cross-scale attention as set forth in claim 5, wherein the specific process for extracting the low-scale feature map of the input image is as follows:
After the image is input to the Resnet network, the outputs of stage0 and stage1 of the Resnet network are taken as feature C0 and feature C1, and the feature C0 and the feature C1 are fused through formula (1) to obtain a fusion feature CF, the fusion formula being:
CF=Conv(C0)+DeConv(C1) (1)
Wherein Conv represents a 1×1 convolution and DeConv represents a 4×4 transposed convolution;
The fusion feature CF is taken as the low-scale feature map of the image.
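A minimal numpy sketch of formula (1) follows. All shapes and weights are illustrative assumptions, and the learned 4×4 stride-2 transpose convolution DeConv is approximated here by nearest-neighbour 2× upsampling followed by a channel projection; only the shape bookkeeping of the fusion is demonstrated.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution as a per-pixel channel projection; x: (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling, standing in for the learned
    4x4 stride-2 transpose convolution DeConv of formula (1)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

C0 = rng.standard_normal((64, 32, 32))    # stage0 output (shapes assumed)
C1 = rng.standard_normal((128, 16, 16))   # stage1 output, half resolution
W0 = rng.standard_normal((64, 64))        # Conv weights (assumed)
W1 = rng.standard_normal((64, 128))       # channel projection after upsampling

# CF = Conv(C0) + DeConv(C1): both terms brought to the same (64, 32, 32) shape
CF = conv1x1(C0, W0) + conv1x1(upsample2x(C1), W1)
```

The point of the construction is that both terms land on the spatial resolution of C0, so the element-wise sum is well defined.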
7. The SAR image ship instance segmentation method based on the cross-scale attention as set forth in claim 5, wherein the specific process of the step 3 is as follows:
Step 3.1, inputting a training sample into the Resnet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the real frame of the target into a two-dimensional Gaussian distribution, selecting positive and negative sample anchor frames from the anchor frames by a positive sample sampling method based on the Gaussian distribution, and predicting the rotation detection frame of the target, the width and height of the rotation detection frame, and the class score of the rotation detection frame;
Step 3.4, selecting a single-layer feature map of the information to be supplemented from the multi-scale feature map according to the width and height of the rotation detection frame, inputting the single-layer feature map and the rotation detection frame into the instance segmentation network, performing 14×14 sampling of the single-layer feature map in the rotation detection frame through the cross-scale attention module, and supplementing the detail information on the low-scale feature map to the sampled single-layer feature map to obtain a feature map of the supplemental information;
Step 3.5, inputting the feature map of the supplemental information into the segmentation head to predict a segmentation mask;
And Step 3.6, introducing three loss functions: a classification prediction loss function measuring the difference between the predicted class score and the real class, a regression prediction loss function measuring the difference between the predicted rotation detection frame and the target real coordinates, and a mask prediction loss function measuring the difference between the predicted segmentation mask and the real target mask; adding the values of the three loss functions to obtain a total loss value, returning to step 3.1, and reducing the total loss value through a stochastic gradient descent algorithm until the total loss value is minimized after 36 rounds of training, so as to obtain the corresponding CARSNet network parameters and thereby a trained CARSNet network.
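Step 3.6 only states what each loss term measures, not which loss functions are used; the sketch below assumes common choices (cross-entropy for classification, smooth L1 for box regression, binary cross-entropy for the mask) purely to show how the three terms are summed into one total loss.

```python
import numpy as np

def cross_entropy(scores, label):
    """Classification loss: softmax cross-entropy on raw class scores."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return -np.log(p[label])

def smooth_l1(pred, target):
    """Regression loss between predicted and real box coordinates."""
    d = np.abs(pred - target)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum()

def bce(pred_mask, gt_mask, eps=1e-7):
    """Mask loss: per-pixel binary cross-entropy."""
    p = np.clip(pred_mask, eps, 1 - eps)
    return -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean()

# toy predictions/targets (values are illustrative only)
total_loss = (cross_entropy(np.array([2.0, 0.5]), 0)
              + smooth_l1(np.array([0.1, 0.2]), np.array([0.0, 0.0]))
              + bce(np.full((14, 14), 0.8), np.ones((14, 14))))
```

In training, this scalar would be minimized by stochastic gradient descent over the 36 rounds described above.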
8. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 7, wherein the formula for converting the real frame of the target into the two-dimensional Gaussian distribution in the step 3.3 is as follows:
γ(p) = exp(-1/2 (p-m)^T Σ^(-1) (p-m)) / (2π|Σ|^(1/2)), Σ = R S S^T R^T
Where γ(p) represents the Gaussian distribution value of point p, m represents the center point coordinates (x, y) of the ship target, R = (cos θ, -sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p represents the center point of an anchor frame, (·)^(-1) represents the inverse of the matrix, |·| represents the determinant of the matrix, and θ represents the target rotation angle.
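The Gaussian value of claim 8 can be computed directly from the symbols it defines. In this sketch, writing the covariance as sigma = (R S)(R S)^T is an assumption consistent with those definitions; the function name and arguments are illustrative.

```python
import numpy as np

def gaussian_value(p, m, w, h, theta):
    """Two-dimensional Gaussian value of point p for a rotated box (m, w, h, theta)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w / 2.0, h / 2.0])
    sigma = R @ S @ S.T @ R.T            # covariance built from R and S
    d = np.asarray(p, float) - np.asarray(m, float)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d) / norm)
```

An anchor frame whose center lies near the box center m receives a large value, which is what makes the value usable for positive-sample selection in step 3.3.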
9. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 7, wherein the specific process of selecting the single-layer feature map of the information to be supplemented from the multi-scale feature map according to the width and height of the rotation detection frame is as follows:
Selecting a single-layer feature map Fk on the multi-scale feature map according to the width and height of the rotation detection frame, wherein the level k of the single-layer feature map in the multi-scale feature map is calculated as follows:
Wherein k0 represents the lowest scale of the multi-scale feature map, and w and h represent the width and height of the rotation detection frame respectively;
The single-layer feature map Fk corresponding to k is selected as the single-layer feature map of the information to be supplemented.
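The exact expression for the level k is not reproduced in this text; the sketch below uses the widely used FPN assignment k = floor(k0 + log2(sqrt(w*h)/224)) as a stand-in, clamped to the valid level range. The constant 224 and the clamp bounds are assumptions, not taken from the patent.

```python
import math

def fpn_level(w, h, k0=2, k_max=5, canonical=224):
    """Assign a pyramid level from the rotation detection frame's width/height.

    Standard FPN heuristic used as a stand-in for the patent's formula:
    larger boxes map to coarser (higher-k) levels.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k0, min(k_max, k))
```

Usage: a box of the canonical size lands on level k0, doubling the box side raises k by one, and very small ship targets are clamped to the finest level.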
10. The method for segmenting the SAR image ship instance based on the cross-scale attention as set forth in claim 7, wherein the specific process of the step 3.4 is as follows: selecting a single-layer feature map of the information to be supplemented from the multi-scale feature map according to the width and height of the rotation detection frame, and inputting the single-layer feature map and the rotation detection frame into the instance segmentation network; performing 14×14 feature sampling on the low-scale feature map of the image and on the single-layer feature map within the rotation detection frame of the target by a bilinear interpolation algorithm to obtain a high-scale feature map FH and a detail information feature map FL respectively; mapping the high-scale feature map FH into a query matrix QH through the fully connected network of the cross-scale attention module, and mapping the detail information feature map FL into a key matrix KL and a value matrix VL; obtaining a similarity matrix between the high-scale feature map FH and the detail information feature map FL through matrix multiplication of the query matrix QH and the key matrix KL, and obtaining a stable detail information feature map FD through multiplication of the similarity matrix and the value matrix VL, the expression being as follows:
FD = Softmax(QH KL^T / √d) VL, with QH = Linear(FH), KL = Linear(FL), VL = Linear(FL)
Wherein Linear represents a fully connected network for dimension transformation, and d represents the dimension of the query matrix;
The stable detail information feature map FD and the high-scale feature map FH are added element-wise to obtain the feature map of the supplementary detail information.
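The cross-scale attention of claim 10 is ordinary scaled dot-product attention with the query from the high-scale map and key/value from the detail map. The sketch below uses random weights for the Linear projections and an assumed channel dimension, so only the data flow (not learned behaviour) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n, d = 196, 32                       # 14*14 sampled positions, channel dim (assumed)
F_H = rng.standard_normal((n, d))    # high-scale feature map, flattened
F_L = rng.standard_normal((n, d))    # detail information feature map

# the Linear projections of claim 10, with random weights for illustration
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
Q_H, K_L, V_L = F_H @ W_q, F_L @ W_k, F_L @ W_v

# similarity matrix between Q_H and K_L, then the stable detail map F_D
A = softmax(Q_H @ K_L.T / np.sqrt(d))
F_D = A @ V_L

# element-wise addition gives the detail-supplemented feature map
F_out = F_D + F_H
```

Each row of the similarity matrix A sums to one, so F_D is a convex combination of detail-map values, which is why the attention output stays on the same scale as V_L before it is added to F_H.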
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410081388.6A CN117893761A (en) | 2024-01-19 | 2024-01-19 | SAR image ship instance segmentation method based on cross-scale attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117893761A true CN117893761A (en) | 2024-04-16 |
Family
ID=90639318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410081388.6A Pending CN117893761A (en) | 2024-01-19 | 2024-01-19 | SAR image ship instance segmentation method based on cross-scale attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117893761A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||