CN111563414B

CN111563414B - SAR image ship target detection method based on non-local feature enhancement

Info

Publication number: CN111563414B
Application number: CN202010267019.8A
Authority: CN
Inventors: 耿杰; 徐哲; 蒋雯; 邓鑫洋; 黄凯
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2020-04-08
Filing date: 2020-04-08
Publication date: 2022-03-01
Anticipated expiration: 2040-04-08
Also published as: CN111563414A

Abstract

The invention discloses a non-local feature enhancement-based SAR image ship target detection method, which comprises the following steps of: inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format; constructing a convolutional neural network ResNet, and training the ResNet network by using the manufactured ship SAR image data set; fusing the feature graphs output by ResNet by using a feature pyramid network to obtain a fused feature graph; grouping, scaling and averaging the fused feature graphs to obtain an average feature graph; performing feature enhancement on the average feature map by using a non-local feature enhancement model; carrying out scale transformation on the enhanced feature map; and performing regression prediction on the enhanced feature map by using a full convolution network to obtain a final detection result. The invention provides a non-local feature enhancement method, which not only effectively solves the problem of detecting the SAR image ship small target, but also can reduce the influence of the SAR image noise of the ship target on the detection.

Description

SAR image ship target detection method based on non-local feature enhancement

Technical Field

The invention belongs to the field of intelligent interpretation of remote sensing images, and particularly relates to a non-local feature enhancement-based SAR image ship target detection method.

Background

Synthetic Aperture Radar (SAR) is an active microwave remote sensor, can perform high-resolution imaging on various targets, can work all day long and all weather, has the advantages of strong penetration capability, strong anti-interference performance, long acting distance and the like, and is widely applied to the fields of military reconnaissance, marine application, agriculture and forestry monitoring and the like. With the development of synthetic aperture radar imaging technology, the SAR image target detection has been widely applied to military and civil fields, and the SAR image ship target detection is one of the hot spot problems.

The Constant False Alarm Rate (CFAR) detection algorithm is a traditional SAR image ship target detection method, and the ship target is detected by carrying out statistical distribution modeling on background clutter. But due to the complexity of modeling and long calculation time, ideal results are often difficult to obtain for ship detection with complex background. With the development of artificial intelligence technology, deep learning models are gradually used for target detection tasks. Target detection algorithms based on a convolutional neural network can be divided into two categories, one category is a double-stage target detection algorithm, such as fast R-CNN, Mask R-CNN and the like, detection is divided into two stages of candidate region extraction, candidate target category and position prediction by the algorithm, the other category is a single-stage target detection algorithm, such as You Only Look one (Yolov2), Ret inaNet and the like, the algorithm directly predicts the category and position information of a target in a regression mode, compared with the double-stage target detection algorithm, the detection efficiency is greatly improved, and an end-to-end target detection task is realized.

However, existing convolutional neural network-based object detection algorithms are mainly applied to optical image data. The SAR image ship target detection technology based on the convolutional neural network has made great progress in the last two years, but still has some problems, which are mainly shown in the following: (1) the size of a ship target of the low-resolution SAR image is small, and the position of the ship target in the deep characteristic map based on the convolutional neural network is rough, so that the ship target is difficult to locate and even missed to detect; (2) the SAR image of the ship target has the problems of complex background and large noise, the difficulty of distinguishing the target from the background is increased, and the detection effect of the ship target is influenced.

Disclosure of Invention

Aiming at the technical problems, the invention provides the SAR image ship target detection method based on non-local feature enhancement, which solves the problem of unbalance among feature layers, enhances the robustness of features, improves the detection capability of small-size ship targets and improves the precision of ship target detection.

The technical method adopted by the invention is as follows: a non-local feature enhancement-based SAR image ship target detection method is characterized by comprising the following steps:

step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:

step 101, labeling a ship SAR image data set by using open source software labelImg;

102, respectively putting the annotation file and the image file into different folders to manufacture a standard Pascal VOC data set format;

103, dividing the manufactured ship SAR image data set into a training set, a testing set and a verification set according to the proportion of 7:2:1, and inputting the training set, the testing set and the verification set into an SAR image ship target detection model;

step two, constructing a convolutional neural network ResNet, and training the ResNet network by using the manufactured ship SAR image data set:

step 201, constructing a ResNet-50 network as a basic network, wherein the ResNet-50 network comprises 5 convolution modules which are Conv1, Conv2_ x, Conv3_ x, Conv4_ x and Conv5_ x respectively;

step 202, adopting a ResNet-50 network to extract the features of the SAR image, and outputting feature maps of the last 4 convolution modules, wherein the feature maps are respectively C₂、C₃、C₄、C₅；

Thirdly, fusing the feature graphs output by ResNet by using the feature pyramid network to obtain fused feature graphs:

step 301, for the characteristic diagram C output in step 202₂、C₃、C₄、C₅Convolution operations are respectively carried out by adopting convolution kernels with the number being half of the number of the characteristic diagram channels and the size being 1 multiplied by 1, so that characteristic dimension reduction is realized, and a dimension-reduced characteristic diagram C is obtained₂′、C₃′、C₄′、P₅′；

Step 302, interpolate P bilinearly₅' size extension to and C₄' same size, then with C₄' element by element addition to P₄', same principle, P₄Extended to and C by bilinear interpolation₃' same size, then with C₃' Pixel-by-pixel addition to P₃′，P₃Extended to and C by bilinear interpolation₂' same size, then with C₂' Pixel-by-pixel addition to P₂′；

Step 303, checking P by respectively adopting convolution of 3 × 3₂′、P₃′、P₄′、P₅Performing convolution operation to eliminate aliasing effect caused by bilinear interpolation to obtain a fused characteristic diagram P₂、P₃、P₄、P₅；

Step four, grouping, scaling and averaging the fused feature graphs to obtain an average feature graph:

step 401, fusing the feature map P₅Performing bilinear interpolation expansion to sum P₄Same size, to give P₅", mixing P with₂Performing a spatial pyramid pooling reduction to sum P₃Same size, to give P₂″；

Step 402, averaging the two groups of feature maps with the re-scaled sizes to obtain an average feature map:

wherein, P_a、P_bRespectively representing the average features of the two groups of feature mapsFigure representation;

fifthly, performing feature enhancement on the average feature map by using a non-local feature enhancement model:

step 501, average feature map P_aPerforming three groups of convolution operations by adopting convolution kernels with the number being half of the average feature map channel number and the size being 1 multiplied by 1 respectively to realize feature dimensionality reduction, and expanding the dimensionality of the feature kernels respectively to obtain three feature matrixes:

θ(pⁱ)＝W_θpⁱ

φ(p^j)＝W_φp^j

g(p^j)＝W_gp^j

wherein p isⁱAnd p^jRespectively representing the information of the ith and jth positions on the average profile, W_θ、W_φAnd W_gRespectively representing the weight matrixes of the corresponding convolution kernels;

step 502, calculating a feature matrix theta (p)ⁱ) And phi (p)^j) To obtain a similarity matrix f (p)ⁱ,p^j) Comprises the following steps:

step 503, match the similarity matrix f (p)ⁱ,p^j) And a feature matrix g (p)^j) Carrying out matrix multiplication to obtain an enhanced feature matrix y_iComprises the following steps:

step 504, the enhanced feature matrix y is checked by using the convolution with the number of average feature map channels and the size of 1 multiplied by 1_iPerforming convolution operation to realize feature dimension increasing, and then performing convolution operation with the average feature map P_aAdding element by element to enhance original characteristics to obtain enhanced characteristic diagram D₃；

Step 505, the average feature map P is processed according to steps 501 to 504_bIs treated to obtainStrong feature map D₄；

Step six, carrying out scale transformation on the enhanced feature graph:

step 601, enhancing the feature map D₃Obtaining D by size expansion after bilinear interpolation₂；

Step 602, enhance feature map D₄The size is reduced after the spatial pyramid pooling to obtain D₅；

Step 603, enhanced feature map D of scale transformation₂、D₃、D₄、D₅Respectively replacing the feature map P of step 303₂、P₃、P₄、P₅；

Step 604, for the enhanced feature map D₅The size is reduced after the space pyramid pooling is carried out, and an enhanced feature map D is obtained₆；

Seventhly, carrying out regression prediction on the enhanced feature graph by using a full convolution network to obtain a final detection result:

step 701, for the enhanced feature map D₂、D₃、D₄、D₅、D₆Respectively inputting the data into a frame regression sub-network and a classification sub-network;

loss function L of bounding box regression subnetwork training_regThe formula is as follows:

wherein d represents the difference between the predicted bounding box vector and the true label box vector; loss function L for classification subnetwork training_clsThe formula is as follows:

L_cls＝-α_t(1-q_t)^γlog(q_t)

wherein q is_tProbability value, alpha, representing class prediction_tAnd gamma is an over-parameter for adjusting the balance of the difficult and easy samples; thus, the Loss function Loss formula of the whole network training is as follows:

wherein N is_posRepresenting the number of positive samples, mu and lambda representing balance factors of classification loss and bounding box regression loss, respectively;

step 702, enhance the feature map D₂、D₃、D₄、D₅、D₆And after respectively passing through the frame regression sub-network and the classification sub-network, obtaining a final prediction frame by using a non-maximum suppression algorithm, and outputting a ship SAR image target detection result.

Compared with the prior art, the invention mainly has the following advantages:

firstly, the invention provides a non-local feature enhancement-based SAR image ship target detection method, which balances semantic information among different feature layers through feature layer scaling, integration, refining and enhanced non-local feature enhancement processing, improves the utilization rate of feature layer information, can reduce the influence of ship target SAR image noise on detection, and further improves the robustness of ship target detection;

secondly, the SAR image feature learning is carried out by adopting the feature pyramid network, the high resolution of the low-level features and the semantic information of the high-level features are fused, so that each layer of features of the network can have rich semantic information, and the detection effect of the SAR image ship small target can be further improved.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic structural diagram of a non-local feature enhancement model according to the present invention.

Detailed Description

The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Spatially relative terms, such as "above … …," "above … …," "above … …," "above," and the like, may be used herein for ease of description to describe one device or feature's spatial relationship to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above … …" can include both an orientation of "above … …" and "below … …". The device may be otherwise variously oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.

As shown in fig. 1, the disclosed SSDD ship SAR image data set is taken as an example to illustrate the rationality and effectiveness of the present invention, and includes the following steps:

101, marking the ship SAR image data set by adopting an open SSDD ship SAR image data set, totaling 1160 images and adopting open source software labelImg;

step 201, constructing a ResNet-50 network as a basic network, which comprises 5 convolution modules, namely Conv1, Conv2_ x, Conv3_ x, Conv4_ x and Conv5_ x, wherein the ship SAR image is subjected to bilinear interpolation, is scaled to 400 × 512 in size and is input into Conv1 after being normalized;

step 202, adopting a ResNet-50 network to extract the features of the SAR image, and outputting feature maps of the last 4 convolution modules, wherein the feature maps are respectively C₂、C₃、C₄、C₅In which C is₂256 feature maps, C, of 100X 128 dimensions are shown₃Denotes a 512-piece feature map, C, of dimensions 50X 64₄Representing 1024 signatures of size 25X 32, C₅2048 feature maps of size 13 × 16 are shown;

step 301, for the characteristic diagram C output in step 202₂、C₃、C₄、C₅Respectively adopting 256 convolution kernels with the size of 1 multiplied by 1 to carry out convolution operation, realizing feature dimension reduction and obtaining a dimension-reduced feature map C₂′、C₃′、C₄′、P₅', wherein C₂' denotes 256 feature maps, C, of size 100X 128₃' denotes 256 feature maps, C, of size 50X 64₄' denotes 256 feature maps of size 25X 32, P₅' 256 feature maps with dimensions 13 × 16 are shown;

Step 303, checking P by respectively adopting convolution of 3 × 3₂′、P₃′、P₄′、P₅Performing convolution operation to eliminate aliasing effect caused by bilinear interpolation to obtain a fused characteristic diagram P₂、P₃、P₄、P₅In which P is₂Representing 256 features of size 100X 128, P₃Representing 256 features of size 50X 64, P₄256 feature maps, P, of 25X 32 size are shown₅256 feature maps of size 13 × 16 are shown;

wherein, P_a、P_bRespectively representing the feature maps of two groups of feature maps after averaging, wherein P_aRepresenting 256 features of size 50X 64, P_b256 feature maps of 25 × 32 size are shown;

step 501, average feature map P_aRespectively adopting 128 convolution kernels with the size of 1 multiplied by 1 to carry out three groups of convolution operations, realizing feature dimensionality reduction, and respectively expanding the dimensionality of the three groups of convolution kernels to obtain three feature matrixes:

θ(pⁱ)＝W_θpⁱ

φ(p^j)＝W_φp^j

g(p^j)＝W_gp^j

step 504, check the enhanced feature matrix y with 256 convolution of size 1 × 1_iPerforming convolution operation to realize feature dimension increasing, and then performing convolution operation with the average feature map P_aAdding element by element to enhance original characteristics to obtain enhanced characteristic diagram D₃；

Step 505, the average feature map P is processed according to steps 501 to 504_bProcessing to obtain enhanced feature map D₄；

Step six, carrying out scale transformation on the enhanced feature graph:

Step 603, enhanced feature map D of scale transformation₂、D₃、D₄、D₅Respectively replacing the feature map P of step 303₂、P₃、P₄、P₅Wherein D is₂256 feature maps, D, of 100X 128 size are shown₃256 feature maps of size 50X 64, D₄256 feature maps of 25X 32 size, D₅256 feature maps of size 13 × 16 are shown;

step 604, for the enhanced feature map D₅The size is reduced after the space pyramid pooling is carried out, and an enhanced feature map D is obtained₆Wherein D is₆256 feature maps of size 7 × 8 are shown;

step 701, enhancing the feature map D₂、D₃、D₄、D₅、D₆Respectively inputting the data into a frame regression sub-network and a classification sub-network;

bounding box regression subnetworkLoss function of training L_regThe formula is as follows:

L_cls＝-α_t(1-q_t)^γlog(q_t)

step 702, enhance the feature map D₂、D₃、D₄、D₅、D₆And respectively obtaining a final prediction frame by using a non-maximum suppression algorithm through a frame regression sub-network and a classification sub-network, and outputting a ship SAR image target detection result.

The above embodiments are only examples of the present invention, and are not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiments according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims

1. A non-local feature enhancement-based SAR image ship target detection method is characterized by comprising the following steps:

wherein, P_a、P_bRespectively representing the feature maps after the two groups of feature maps are averaged;

θ(pⁱ)＝W_θpⁱ

φ(p^j)＝W_φp^j

g(p^j)＝W_gp^j

Step six, carrying out scale transformation on the enhanced feature graph:

L_cls＝-α_t(1-q_t)^γlog(q_t)