CN117893761A - SAR image ship instance segmentation method based on cross-scale attention - Google Patents
- Publication number
- CN117893761A (application CN202410081388.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a SAR image ship instance segmentation method based on cross-scale attention. An image, together with the real target category, real target coordinates and real target mask in the image, is taken as a group of training samples, and a plurality of groups of training samples are extracted from a data set; a CARSNet network structure based on cross-scale attention is constructed; the training samples are input into the CARSNet network structure to obtain a trained CARSNet network; and SAR image ship instances are segmented by the trained CARSNet network. During segmentation, the method adopts a cross-scale attention module to supplement stable detail information to the high-scale features, improving the segmentation of the ship target contour by the algorithm. The invention further adopts a positive sample sampling method based on Gaussian distribution, which adaptively generates more positive samples containing the bow and stern of the ship according to the geometric characteristics of the ship target, thereby benefiting the network's learning of the overall characteristics of the ship.
Description
Technical Field
The invention belongs to the technical field of image segmentation, and particularly relates to a SAR image ship instance segmentation method based on cross-scale attention.
Background
As an active microwave imaging sensor, synthetic aperture radar (SAR) images a target coherently by transmitting electromagnetic pulses and receiving their echoes. Because it relies on microwave transmission, SAR is almost unaffected by weather, cloud cover or time of day, and can acquire clear images of ground targets around the clock in all weather conditions; moreover, its wavelength is longer than that of visible light, giving it stronger penetration capability through cloud layers, vegetation, smoke and similar media. Owing to these advantages, SAR is widely used in both military and civilian fields. In the civilian field, SAR can be used for monitoring illegal ships at sea, environmental surveying, urban planning, map drawing and so on; in the military field, SAR enables detection of important military targets and provides accurate coordinate information for weapon systems, and many countries employ SAR satellites to observe marine ships because of these unique advantages. Instance segmentation can delineate the contour details of a target more clearly, and is therefore widely applied in autonomous driving, medical image analysis, face recognition, video analysis, industrial automation, remote sensing image analysis and other fields. Compared with target detection, ship instance segmentation of SAR images can obtain more accurate ship position information and contour details. Ship instance segmentation of SAR images therefore has important practical value and broad application prospects in both civilian and military fields.
The SRNet algorithm performs segmentation using only a single-layer feature map of the FPN network at the segmentation head. Because the high-scale feature map lacks detail information, the contours this algorithm produces for large and medium-sized targets are rough, and the accurate positions of such targets cannot be provided.
Disclosure of Invention
The invention aims to provide a SAR image ship instance segmentation method based on cross-scale attention, which solves the problem that the lack of detail information on the feature map prevents accurate positions of such targets from being provided and thus degrades the segmentation effect.
The technical scheme adopted by the invention is a SAR image ship instance segmentation method based on cross-scale attention, implemented according to the following steps:
Step 1, taking an image together with the real target category, real target coordinates and real target mask in the image as a group of training samples, and extracting a plurality of groups of training samples from a data set;
Step 2, constructing a CARSNet network structure based on cross-scale attention;
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network;
Step 4, segmenting SAR image ship instances with the trained CARSNet network.
The invention is also characterized in that:
In step 1, the real coordinates of the target comprise the coordinates of the center point of the target in the image, the width and height of the target, and the rotation angle of the target.
The dataset in step 1 is the Instance-RSDD dataset or the SSDD dataset.
The CARSNet network structure based on cross-scale attention in step 2 comprises:
a ResNet network for extracting a low-scale feature map of the input image;
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
an instance segmentation network, comprising a cross-scale attention module and a segmentation head, for performing 14×14 sampling of the feature map within the rotation detection frame, supplementing the detail information of the low-scale feature map to the sampled feature map, and inputting the supplemented feature map into the segmentation head to predict the segmentation mask.
The 14×14 samples within the rotation detection frame are obtained by a bilinear interpolation algorithm.
The specific process for extracting the low-scale feature map of the input image is as follows:
After the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C0 and feature C1, and the two features are fused through formula (1) to obtain the fusion feature CF:
CF = Conv(C0) + DeConv(C1) (1)
wherein Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fusion feature CF is taken as the low-scale feature map of the image.
The specific process of step 3 is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the real frame of the target into a two-dimensional Gaussian distribution, selecting positive and negative sample anchor frames from the anchor frames by the positive sample sampling method based on the Gaussian distribution, and predicting the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
Step 3.4, selecting a single-layer feature map to be supplemented with information from the multi-scale feature map according to the width and height of the rotation detection frame, inputting this single-layer feature map and the rotation detection frame into the instance segmentation network, performing 14×14 sampling of the single-layer feature map within the rotation detection frame through the cross-scale attention module, and supplementing the detail information of the low-scale feature map to the sampled single-layer feature map to obtain a feature map supplemented with detail information;
Step 3.5, inputting the supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification prediction loss that measures the difference between the predicted class score and the real category, a regression prediction loss that measures the difference between the predicted rotation detection frame and the real target coordinates, and a mask prediction loss that measures the difference between the predicted segmentation mask and the real target mask. The values of the three loss functions are summed to give the total loss, and the process returns to step 3.1, reducing the total loss by the stochastic gradient descent algorithm until it reaches a minimum after 36 rounds of training; the corresponding CARSNet network parameters are retained, yielding the trained CARSNet network.
In step 3.3, the real frame of the target is converted into a two-dimensional Gaussian distribution through formula (2):
γ(p) = 1/(2π|Σ|^(1/2)) · exp(−(1/2)(p−m)^T Σ^(−1) (p−m)), Σ = (RS)(RS)^T (2)
where γ(p) denotes the Gaussian distribution value at point p, m denotes the center point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point coordinates of an anchor frame, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the target rotation angle.
The specific process of selecting the single-layer feature map to be supplemented with information from the multi-scale feature map according to the width and height of the rotation detection frame is as follows:
a single-layer feature map Fk is selected from the multi-scale feature map according to the width and height of the rotation detection frame, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k0 + log2(√(wh)/224)⌋ (3)
wherein k0 denotes the lowest level of the multi-scale feature map, and w and h denote the width and height of the rotation detection frame, respectively;
the single-layer feature map Fk corresponding to k is selected as the single-layer feature map to be supplemented with information.
The specific process of step 3.4 is as follows: a single-layer feature map to be supplemented with information is selected from the multi-scale feature map according to the width and height of the rotation detection frame, and this single-layer feature map and the rotation detection frame are input into the instance segmentation network. Using a bilinear interpolation algorithm, 14×14 feature sampling is performed within the target's rotation detection frame on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map FH and a detail information feature map FL, respectively. Through the fully connected networks of the cross-scale attention module, FH is mapped into a query matrix QH, and FL is mapped into a key matrix KL and a value matrix VL. The similarity matrix between FH and FL is obtained by matrix multiplication of QH and KL, and the stable detail information feature map FD is obtained by multiplying the similarity matrix by VL:
QH = Linear(FH), KL = Linear(FL), VL = Linear(FL)
FD = Softmax(QH·KL^T/√d)·VL (4)
wherein Linear denotes a fully connected network used for dimension transformation and d denotes the dimension of the query matrix;
the stable detail information feature map FD and the high-scale feature map FH are added element-wise to obtain the feature map supplemented with detail information.
The invention has the beneficial effects that:
The invention adopts a cross-scale attention module to supplement stable detail information to the high-scale features, thereby improving the segmentation of the ship target contour by the algorithm.
The invention adopts a positive sample sampling method based on Gaussian distribution, which adaptively generates more positive samples containing the bow and stern of the ship according to the geometric characteristics of the ship target, thereby benefiting the network's learning of the overall characteristics of the ship.
Drawings
FIG. 1 is a block diagram of the cross-scale attention-based CARSNet network architecture of the present invention;
FIG. 2 is a schematic illustration of the extraction of the low-scale feature map of an input image by the ResNet network employed in the present invention;
FIG. 3 is a schematic representation of the multi-scale feature extraction network employed in the present invention;
FIG. 4 is a block diagram of the instance segmentation network architecture employed by the present invention;
FIG. 5 is a block diagram of the cross-scale attention module of the present invention;
FIG. 6 is a graph of the true labeling results of a SAR image from the SSDD dataset;
FIG. 7 is a graph of the segmentation results of SRNet on the SSDD dataset;
FIG. 8 is a graph of the segmentation results of the present invention on the SSDD dataset;
FIG. 9 is a graph of the true labeling results of a SAR image from the Instance-RSDD dataset;
FIG. 10 is a graph of the segmentation results of SRNet on the Instance-RSDD dataset;
FIG. 11 is a graph of the segmentation results of the present invention on the Instance-RSDD dataset.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and detailed description.
Example 1
The SAR image ship instance segmentation method based on the cross-scale attention is implemented according to the following steps:
Step 1, taking an image together with the real target category, real target coordinates and real target mask in the image as a group of training samples, and extracting a plurality of groups of training samples from a data set;
The real coordinates of the target comprise the coordinates of the center point of the target in the image, the width and height of the target, and the rotation angle of the target.
The dataset is an Instance-RSDD dataset or a SSDD dataset.
Step 2, constructing a CARSNet network structure based on cross-scale attention; the cross-scale-attention-based CARSNet network structure is shown in FIG. 1 and includes:
a ResNet network for extracting a low-scale feature map of the input image; the specific process is as follows:
after the image is input into the ResNet network, the outputs of stage0 and stage1 of the ResNet network are taken as feature C0 and feature C1, and the two features are fused through formula (1) to obtain the fusion feature CF:
CF = Conv(C0) + DeConv(C1) (1)
wherein Conv denotes a 1×1 convolution and DeConv denotes a 4×4 transposed convolution;
the fusion feature CF is taken as the low-scale feature map of the image.
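The fusion of formula (1) can be sketched in PyTorch as follows; the channel counts (64 for stage0, 256 for stage1, 128 fused) and spatial sizes are illustrative assumptions, not values stated in the invention:

```python
import torch
import torch.nn as nn

class LowScaleFusion(nn.Module):
    """Sketch of CF = Conv(C0) + DeConv(C1) from formula (1).

    Channel counts are assumptions for illustration only.
    """
    def __init__(self, c0_ch=64, c1_ch=256, out_ch=128):
        super().__init__()
        # 1x1 convolution applied to the stage0 feature C0
        self.conv = nn.Conv2d(c0_ch, out_ch, kernel_size=1)
        # 4x4 transposed convolution upsampling the stage1 feature C1 by 2x
        self.deconv = nn.ConvTranspose2d(c1_ch, out_ch,
                                         kernel_size=4, stride=2, padding=1)

    def forward(self, c0, c1):
        return self.conv(c0) + self.deconv(c1)  # element-wise sum -> CF

fuse = LowScaleFusion()
c0 = torch.randn(1, 64, 64, 64)    # stage0 output (higher resolution)
c1 = torch.randn(1, 256, 32, 32)   # stage1 output (half resolution)
out = fuse(c0, c1)
print(out.shape)  # torch.Size([1, 128, 64, 64])
```

The transposed convolution brings C1 back to the spatial size of C0, so the two terms can be summed element-wise.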
a feature extraction network for extracting a multi-scale feature map from the input image;
a rotating target detection network for predicting, from the multi-scale feature map, the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
an instance segmentation network, which includes a cross-scale attention module and a segmentation head as shown in FIG. 4, and is configured to perform 14×14 sampling of the feature map within the rotation detection frame, supplement the detail information of the low-scale feature map to the sampled feature map, and input the supplemented feature map into the segmentation head to predict the segmentation mask.
The 14×14 samples within the rotation detection frame are obtained by a bilinear interpolation algorithm.
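A minimal sketch of bilinear interpolation at a single fractional sampling point, as used for the 14×14 sampling; the plain-Python feature map and the clamping border handling are illustrative assumptions (in the network, the 14×14 grid points are obtained inside the rotated detection frame):

```python
def bilinear_sample(fmap, x, y):
    """Sample feature map `fmap` (a list of rows) at fractional (x, y).

    Values are blended from the four surrounding integer grid points,
    weighted by the fractional offsets; out-of-range neighbours are
    clamped to the map border (an assumption for this sketch).
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    y1 = min(y0 + 1, len(fmap) - 1)
    fx, fy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bot = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bot * fy

fmap = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))  # 1.5, the average of the four corners
```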
Step 3, inputting the training samples into the cross-scale-attention-based CARSNet network structure to obtain a trained CARSNet network; the specific process is as follows:
Step 3.1, inputting a training sample into the ResNet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the real frame of the target into a two-dimensional Gaussian distribution, selecting positive and negative sample anchor frames from the anchor frames by the positive sample sampling method based on the Gaussian distribution, and predicting the center point coordinates, rotation angle, width and height, and class score of the target's rotation detection frame;
In the training stage of the rotating target detection network, the invention designs a positive sample sampling method based on Gaussian distribution. The real frame coordinates (x, y, w, h, θ) of a ship target are converted into a two-dimensional Gaussian distribution through formula (2):
γ(p) = 1/(2π|Σ|^(1/2)) · exp(−(1/2)(p−m)^T Σ^(−1) (p−m)), Σ = (RS)(RS)^T (2)
where γ(p) denotes the Gaussian distribution value at point p, m denotes the center point coordinates (x, y) of the ship target, R = (cos θ, −sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p denotes the center point coordinates of an anchor frame, (·)^(−1) denotes the matrix inverse, |·| denotes the matrix determinant, and θ denotes the target rotation angle.
For each anchor frame, the value of its center point on the two-dimensional Gaussian distribution of the real frame is calculated, and the anchor frames with the k largest values are retained. The IoU between these anchor frames and the real frame is then calculated, the mean m and variance g of the IoU values are computed, and the IoU threshold for screening positive samples is set as t = m + g. Finally, an anchor frame is determined to be a positive sample if its IoU with the real frame is greater than t.
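The Gaussian value of formula (2) and the dynamic IoU threshold t = m + g can be sketched in plain Python; the normalisation constant follows the standard bivariate normal density and Σ = (RS)(RS)^T is expanded by hand for the 2×2 case (both reconstructions, since the typeset equation is not reproduced in this text):

```python
import math

def gaussian_value(p, box):
    """Value of the real frame's 2-D Gaussian at point p = (px, py).

    box = (x, y, w, h, theta); Sigma = (R S)(R S)^T with R the rotation
    matrix and S = diag(w/2, h/2), following the definitions in the text.
    """
    x, y, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2) ** 2, (h / 2) ** 2
    # Sigma = R diag(a, b) R^T, expanded entry by entry
    s11 = c * c * a + s * s * b
    s12 = c * s * (a - b)
    s22 = s * s * a + c * c * b
    det = s11 * s22 - s12 * s12
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det  # inverse of Sigma
    dx, dy = p[0] - x, p[1] - y
    quad = dx * (i11 * dx + i12 * dy) + dy * (i12 * dx + i22 * dy)
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def iou_threshold(ious):
    """Dynamic positive-sample threshold t = mean + variance of the IoUs."""
    m = sum(ious) / len(ious)
    g = sum((v - m) ** 2 for v in ious) / len(ious)
    return m + g
```

For an elongated ship (w > h, θ = 0), the Gaussian decays more slowly along the ship's long axis, which is why anchor centers near the bow and stern still receive high values and can be kept as positive samples.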
Step 3.4, selecting the single-layer feature map to be supplemented with information from the multi-scale feature map according to the width and height of the rotation detection frame; the specific process is as follows:
a single-layer feature map Fk is selected from the multi-scale feature map according to the width and height of the rotation detection frame, where the level k of the single-layer feature map in the multi-scale feature map is computed as
k = ⌊k0 + log2(√(wh)/224)⌋ (3)
wherein k0 denotes the lowest level of the multi-scale feature map, and w and h denote the width and height of the rotation detection frame, respectively;
the single-layer feature map Fk corresponding to k is selected as the single-layer feature map to be supplemented with information.
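The level selection can be sketched as follows, assuming the standard FPN region-of-interest assignment rule k = ⌊k0 + log2(√(wh)/224)⌋, which is consistent with the definitions in the text; the canonical size 224 and the level bounds are assumptions, not values stated in the source:

```python
import math

def select_fpn_level(w, h, k0=2, k_max=5, canonical=224):
    """Map a rotation detection frame of size (w, h) to a pyramid level.

    k0, k_max and `canonical` are illustrative assumptions; the rule
    clamps k to the range of levels actually present in the pyramid.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k0, min(k, k_max))

print(select_fpn_level(224, 224))  # 2: a canonical-size box stays at k0
print(select_fpn_level(448, 448))  # 3: doubling the box size moves up one level
```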
As shown in FIG. 5, the single-layer feature map to be supplemented and the rotation detection frame are input into the instance segmentation network; the cross-scale attention module performs 14×14 sampling of the single-layer feature map within the rotation detection frame and supplements the detail information of the low-scale feature map to the sampled single-layer feature map, giving the feature map supplemented with detail information. The specific process is as follows: using a bilinear interpolation algorithm, 14×14 feature sampling is performed within the target's rotation detection frame on both the low-scale feature map of the image and the single-layer feature map to be supplemented, yielding a high-scale feature map FH and a detail information feature map FL, respectively. Through the fully connected networks of the cross-scale attention module, FH is mapped into a query matrix QH, and FL is mapped into a key matrix KL and a value matrix VL. The similarity matrix between FH and FL is obtained by matrix multiplication of QH and KL, and the stable detail information feature map FD is obtained by multiplying the similarity matrix by VL:
QH = Linear(FH), KL = Linear(FL), VL = Linear(FL)
FD = Softmax(QH·KL^T/√d)·VL (4)
wherein Linear denotes a fully connected network used for dimension transformation and d denotes the dimension of the query matrix;
the stable detail information feature map FD and the high-scale feature map FH are added element-wise to obtain the feature map supplemented with detail information.
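A minimal NumPy sketch of the cross-scale attention computation described in step 3.4; the feature dimension d = 32, the random features, and the random weight matrices standing in for the learned Linear mappings are all illustrative assumptions:

```python
import numpy as np

N, d = 14 * 14, 32  # 14x14 sampled positions; channel dimension d (assumed)
rng = np.random.default_rng(0)
F_H = rng.standard_normal((N, d))  # high-scale features sampled in the frame
F_L = rng.standard_normal((N, d))  # low-scale detail features sampled there

# Stand-ins for the learned fully connected ("Linear") mappings.
W_q = rng.standard_normal((d, d)) * 0.1
W_k = rng.standard_normal((d, d)) * 0.1
W_v = rng.standard_normal((d, d)) * 0.1
Q_H, K_L, V_L = F_H @ W_q, F_L @ W_k, F_L @ W_v

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Similarity matrix between high-scale queries and low-scale keys, scaled by
# sqrt(d), then aggregation of the low-scale values to get F_D.
A = softmax(Q_H @ K_L.T / np.sqrt(d))
F_D = A @ V_L

# Element-wise addition supplements the stable detail information to F_H.
F_out = F_H + F_D
```

Each row of A sums to 1, so every high-scale position receives a convex combination of low-scale detail features, which is what stabilises the supplemented details.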
Step 3.5, inputting the supplemented feature map into the segmentation head to predict the segmentation mask;
Step 3.6, introducing three loss functions: a classification prediction loss that measures the difference between the predicted class score and the real category, a regression prediction loss that measures the difference between the predicted rotation detection frame and the real target coordinates, and a mask prediction loss that measures the difference between the predicted segmentation mask and the real target mask. The values of the three loss functions are summed to give the total loss, and the process returns to step 3.1, reducing the total loss by the stochastic gradient descent algorithm until it reaches a minimum after 36 rounds of training; the corresponding CARSNet network parameters are retained, yielding the trained CARSNet network.
Step 4, segmenting SAR image ship instances with the trained CARSNet network.
The framework of the invention is divided into three parts: a feature extraction network, a rotating target detection network, and an instance segmentation network. The invention designs a cross-scale attention module that supplements low-scale stable detail information to the high-scale features, improving the segmentation of the target contour by the algorithm. The invention also designs a positive sample sampling method based on Gaussian distribution: the whole ship approximately follows a two-dimensional Gaussian distribution, positive samples are screened according to this distribution, the number of positive samples at the bow and stern of the ship is increased, and the performance of the rotating target detection stage is improved.
Example 2
The SAR image ships of the SSDD dataset are segmented by the method of the invention and, for comparison, by the SRNet method; the comparison results are shown in Table 1:
TABLE 1
According to the AP results under the SSDD dataset instance segmentation benchmark in Table 1, the detection accuracy of the invention exceeds the SRNet comparison results on all AP metrics, showing that the method has a good detection effect on SAR image ship targets.
FIGS. 6, 7 and 8 show SAR image simulation results on the SSDD dataset: FIG. 6 is the true labeling result in this scene, FIG. 7 is the SRNet segmentation result, and FIG. 8 is the segmentation result of the present invention.
Example 3
The SAR image ships of the Instance-RSDD dataset are segmented by the method of the invention and, for comparison, by the SRNet method; the comparison results are shown in Table 2:
TABLE 2
According to the AP results under the Instance-RSDD dataset instance segmentation benchmark in Table 2, the detection accuracy of the invention exceeds the SRNet comparison results on all AP metrics, showing that the method has a good detection effect on SAR image ship targets.
FIGS. 9, 10 and 11 show SAR image simulation results on the Instance-RSDD dataset: FIG. 9 is the true labeling result in this scene, FIG. 10 is the SRNet segmentation result, and FIG. 11 is the segmentation result of the present invention.
As can be seen from Table 1 and Table 2 in Example 2 and Example 3, the detection accuracy of the invention exceeds that of SRNet on all AP metrics, giving a better detection effect on SAR image ship targets. As can be seen from the comparisons of FIGS. 7 and 8 and of FIGS. 10 and 11, SRNet may generate detection frames that cannot completely surround the ship target, affecting the final segmentation result; the Ga-ATSS proposed by the invention greatly improves the accuracy of the detection frame, which benefits the subsequent segmentation. Meanwhile, SRNet handles the segmentation details of the ship contour poorly, and the cross-scale attention module proposed by the invention greatly alleviates this problem and improves the ship segmentation effect.
Claims (10)
1. The SAR image ship instance segmentation method based on the cross-scale attention is characterized by comprising the following steps of:
Step 1, taking an image and a target real category, a target real coordinate and a real target mask in the image as a group of training samples, and extracting a plurality of groups of training samples from a data set;
Step 2, constructing a CARSNet network structure based on cross-scale attention;
Step 3, inputting the training samples into the CARSNet network structure based on cross-scale attention to obtain a trained CARSNet network;
and Step 4, segmenting the SAR image ship instances through the trained CARSNet network.
2. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 1, wherein the real coordinates of the target in the step 1 comprise the coordinates of the center point of the target in the image, the width and height of the target, and the rotation angle of the target.
3. The method for segmenting the SAR image ship Instance based on the cross-scale attention according to claim 1, wherein the dataset in the step 1 is an Instance-RSDD dataset or a SSDD dataset.
4. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 2, wherein the CARSNet network structure based on the cross-scale attention in the step 2 comprises the following steps:
The Resnet network is used for extracting a low-scale feature map of the input image;
The feature extraction network is used for extracting a multi-scale feature map from the input image;
The rotating target detection network is used for predicting, from the multi-scale feature map, the center point coordinates, the rotation angle, the width and height, and the class score of the rotation detection frame of the target;
The instance segmentation network comprises a cross-scale attention module and a segmentation head, and is used for 14×14 sampling of the feature map in the rotation detection frame, supplementing the detail information on the low-scale feature map to the sampled feature map, and inputting the information-supplemented feature map into the segmentation head to predict a segmentation mask.
5. The method for segmenting the SAR image ship instance based on the cross-scale attention as set forth in claim 4, wherein the 14×14 sampling of the feature map in the rotation detection frame is performed by a bilinear interpolation algorithm.
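As a concrete illustration of the 14×14 bilinear sampling named in claim 5, the following minimal numpy sketch samples a regular 14×14 grid inside a box on a feature map. It is a simplification: it handles an axis-aligned box on a single-channel map, whereas the patent samples inside a rotated detection frame; the function names and grid construction are assumptions of this sketch.

```python
import numpy as np

def bilinear_sample(feature, ys, xs):
    """Bilinearly interpolate a 2-D feature map at float coordinates."""
    H, W = feature.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    dy, dx = ys - y0, xs - x0
    return (feature[y0, x0] * (1 - dy) * (1 - dx)
            + feature[y0, x0 + 1] * (1 - dy) * dx
            + feature[y0 + 1, x0] * dy * (1 - dx)
            + feature[y0 + 1, x0 + 1] * dy * dx)

def roi_sample(feature, y1, x1, y2, x2, size=14):
    """Sample a size x size grid inside an axis-aligned box (rotation omitted)."""
    g = np.linspace(0.0, 1.0, size)
    yy, xx = np.meshgrid(y1 + g * (y2 - y1), x1 + g * (x2 - x1), indexing="ij")
    return bilinear_sample(feature, yy, xx)
```

The same interpolation weights apply to a rotated frame once the grid points are rotated about the box center.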
6. The method for segmenting the SAR image ship instance based on the cross-scale attention as set forth in claim 5, wherein the specific process for extracting the low-scale feature map of the input image is as follows:
After the image is input to the Resnet network, the outputs of stage0 and stage1 of the Resnet network are taken as feature C0 and feature C1, and the feature C0 and the feature C1 are fused through formula (1) to obtain a fusion feature CF, the fusion formula being:
CF=Conv(C0)+DeConv(C1) (1)
Wherein Conv represents a 1×1 convolution and DeConv represents a 4×4 transposed convolution;
The fusion feature CF is taken as the low-scale feature map of the image.
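A minimal numpy sketch of formula (1) follows. All shapes and weights are illustrative assumptions, and the learned 4×4 stride-2 transpose convolution DeConv is approximated here by nearest-neighbour 2× upsampling followed by a channel projection; only the shape bookkeeping of the fusion is demonstrated.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution as a per-pixel channel projection; x: (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling, standing in for the learned
    4x4 stride-2 transpose convolution DeConv of formula (1)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

C0 = rng.standard_normal((64, 32, 32))    # stage0 output (shapes assumed)
C1 = rng.standard_normal((128, 16, 16))   # stage1 output, half resolution
W0 = rng.standard_normal((64, 64))        # Conv weights (assumed)
W1 = rng.standard_normal((64, 128))       # channel projection after upsampling

# CF = Conv(C0) + DeConv(C1): both terms brought to the same (64, 32, 32) shape
CF = conv1x1(C0, W0) + conv1x1(upsample2x(C1), W1)
```

The point of the construction is that both terms land on the spatial resolution of C0, so the element-wise sum is well defined.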
7. The SAR image ship instance segmentation method based on the cross-scale attention as set forth in claim 5, wherein the specific process of the step 3 is as follows:
Step 3.1, inputting a training sample into the Resnet network and outputting the low-scale feature map of the training sample image;
Step 3.2, inputting the training sample into the feature extraction network and outputting the multi-scale feature map of the training sample image;
Step 3.3, inputting the multi-scale feature map of the training sample image into the rotating target detection network, converting the real frame of the target into a two-dimensional Gaussian distribution, selecting positive and negative sample anchor frames from the anchor frames by a positive sample sampling method based on the Gaussian distribution, and predicting the rotation detection frame of the target, the width and height of the rotation detection frame, and the class score of the rotation detection frame;
Step 3.4, selecting a single-layer feature map of the information to be supplemented from the multi-scale feature map according to the width and height of the rotation detection frame, inputting the single-layer feature map and the rotation detection frame into the instance segmentation network, performing 14×14 sampling of the single-layer feature map in the rotation detection frame through the cross-scale attention module, and supplementing the detail information on the low-scale feature map to the sampled single-layer feature map to obtain a feature map of the supplemental information;
Step 3.5, inputting the feature map of the supplemental information into the segmentation head to predict a segmentation mask;
And Step 3.6, introducing three loss functions: a classification prediction loss function measuring the difference between the predicted class score and the real class, a regression prediction loss function measuring the difference between the predicted rotation detection frame and the target real coordinates, and a mask prediction loss function measuring the difference between the predicted segmentation mask and the real target mask; adding the values of the three loss functions to obtain a total loss value, returning to step 3.1, and reducing the total loss value through a stochastic gradient descent algorithm until the total loss value is minimized after 36 rounds of training, so as to obtain the corresponding CARSNet network parameters and thereby a trained CARSNet network.
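Step 3.6 only states what each loss term measures, not which loss functions are used; the sketch below assumes common choices (cross-entropy for classification, smooth L1 for box regression, binary cross-entropy for the mask) purely to show how the three terms are summed into one total loss.

```python
import numpy as np

def cross_entropy(scores, label):
    """Classification loss: softmax cross-entropy on raw class scores."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return -np.log(p[label])

def smooth_l1(pred, target):
    """Regression loss between predicted and real box coordinates."""
    d = np.abs(pred - target)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum()

def bce(pred_mask, gt_mask, eps=1e-7):
    """Mask loss: per-pixel binary cross-entropy."""
    p = np.clip(pred_mask, eps, 1 - eps)
    return -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean()

# toy predictions/targets (values are illustrative only)
total_loss = (cross_entropy(np.array([2.0, 0.5]), 0)
              + smooth_l1(np.array([0.1, 0.2]), np.array([0.0, 0.0]))
              + bce(np.full((14, 14), 0.8), np.ones((14, 14))))
```

In training, this scalar would be minimized by stochastic gradient descent over the 36 rounds described above.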
8. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 7, wherein the formula for converting the real frame of the target into the two-dimensional Gaussian distribution in the step 3.3 is as follows:
γ(p) = exp(-1/2 (p-m)^T Σ^(-1) (p-m)) / (2π|Σ|^(1/2)), Σ = R S S^T R^T
Where γ(p) represents the Gaussian distribution value of point p, m represents the center point coordinates (x, y) of the ship target, R = (cos θ, -sin θ; sin θ, cos θ), S = diag([w/2, h/2]), p represents the center point of an anchor frame, (·)^(-1) represents the inverse of the matrix, |·| represents the determinant of the matrix, and θ represents the target rotation angle.
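The Gaussian value of claim 8 can be computed directly from the symbols it defines. In this sketch, writing the covariance as sigma = (R S)(R S)^T is an assumption consistent with those definitions; the function name and arguments are illustrative.

```python
import numpy as np

def gaussian_value(p, m, w, h, theta):
    """Two-dimensional Gaussian value of point p for a rotated box (m, w, h, theta)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w / 2.0, h / 2.0])
    sigma = R @ S @ S.T @ R.T            # covariance built from R and S
    d = np.asarray(p, float) - np.asarray(m, float)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d) / norm)
```

An anchor frame whose center lies near the box center m receives a large value, which is what makes the value usable for positive-sample selection in step 3.3.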
9. The method for segmenting the SAR image ship instance based on the cross-scale attention according to claim 7, wherein the specific process of selecting the single-layer feature map of the information to be supplemented from the multi-scale feature map according to the width and height of the rotation detection frame is as follows:
Selecting a single-layer feature map Fk on the multi-scale feature map according to the width and height of the rotation detection frame, wherein the level k of the single-layer feature map in the multi-scale feature map is calculated as follows:
Wherein k0 represents the lowest scale of the multi-scale feature map, and w and h represent the width and height of the rotation detection frame respectively;
The single-layer feature map Fk corresponding to k is selected as the single-layer feature map of the information to be supplemented.
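The exact expression for the level k is not reproduced in this text; the sketch below uses the widely used FPN assignment k = floor(k0 + log2(sqrt(w*h)/224)) as a stand-in, clamped to the valid level range. The constant 224 and the clamp bounds are assumptions, not taken from the patent.

```python
import math

def fpn_level(w, h, k0=2, k_max=5, canonical=224):
    """Assign a pyramid level from the rotation detection frame's width/height.

    Standard FPN heuristic used as a stand-in for the patent's formula:
    larger boxes map to coarser (higher-k) levels.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k0, min(k_max, k))
```

Usage: a box of the canonical size lands on level k0, doubling the box side raises k by one, and very small ship targets are clamped to the finest level.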
10. The method for segmenting the SAR image ship instance based on the cross-scale attention as set forth in claim 7, wherein the specific process of the step 3.4 is as follows: selecting a single-layer feature map of the information to be supplemented from the multi-scale feature map according to the width and height of the rotation detection frame, and inputting the single-layer feature map and the rotation detection frame into the instance segmentation network; performing 14×14 feature sampling on the low-scale feature map of the image and on the single-layer feature map within the rotation detection frame of the target by a bilinear interpolation algorithm to obtain a high-scale feature map FH and a detail information feature map FL respectively; mapping the high-scale feature map FH into a query matrix QH through the fully connected network of the cross-scale attention module, and mapping the detail information feature map FL into a key matrix KL and a value matrix VL; obtaining a similarity matrix between the high-scale feature map FH and the detail information feature map FL through matrix multiplication of the query matrix QH and the key matrix KL, and obtaining a stable detail information feature map FD through multiplication of the similarity matrix and the value matrix VL, the expression being as follows:
FD = Softmax(QH KL^T / √d) VL, with QH = Linear(FH), KL = Linear(FL), VL = Linear(FL)
Wherein Linear represents a fully connected network for dimension transformation, and d represents the dimension of the query matrix;
The stable detail information feature map FD and the high-scale feature map FH are added element-wise to obtain the feature map of the supplementary detail information.
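The cross-scale attention of claim 10 is ordinary scaled dot-product attention with the query from the high-scale map and key/value from the detail map. The sketch below uses random weights for the Linear projections and an assumed channel dimension, so only the data flow (not learned behaviour) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n, d = 196, 32                       # 14*14 sampled positions, channel dim (assumed)
F_H = rng.standard_normal((n, d))    # high-scale feature map, flattened
F_L = rng.standard_normal((n, d))    # detail information feature map

# the Linear projections of claim 10, with random weights for illustration
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
Q_H, K_L, V_L = F_H @ W_q, F_L @ W_k, F_L @ W_v

# similarity matrix between Q_H and K_L, then the stable detail map F_D
A = softmax(Q_H @ K_L.T / np.sqrt(d))
F_D = A @ V_L

# element-wise addition gives the detail-supplemented feature map
F_out = F_D + F_H
```

Each row of the similarity matrix A sums to one, so F_D is a convex combination of detail-map values, which is why the attention output stays on the same scale as V_L before it is added to F_H.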
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410081388.6A CN117893761A (en) | 2024-01-19 | 2024-01-19 | SAR image ship instance segmentation method based on cross-scale attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117893761A true CN117893761A (en) | 2024-04-16 |
Family
ID=90639318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410081388.6A Pending CN117893761A (en) | 2024-01-19 | 2024-01-19 | SAR image ship instance segmentation method based on cross-scale attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117893761A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||