CN113642606B - Marine ship detection method based on attention mechanism - Google Patents

Marine ship detection method based on attention mechanism

Info

Publication number
CN113642606B
CN113642606B (application number CN202110788250.6A)
Authority
CN
China
Prior art keywords
ship
inputting
attention
feature
feature map
Prior art date
Legal status
Active
Application number
CN202110788250.6A
Other languages
Chinese (zh)
Other versions
CN113642606A (en)
Inventor
徐晓刚
陈雨杭
余新洲
徐冠雷
Current Assignee
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202110788250.6A priority Critical patent/CN113642606B/en
Publication of CN113642606A publication Critical patent/CN113642606A/en
Application granted granted Critical
Publication of CN113642606B publication Critical patent/CN113642606B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a marine ship detection method based on an attention mechanism, which enables a deep convolutional neural network to focus on the more discriminative parts of a feature representation, thereby effectively improving the accuracy of ship detection and recognition. The method comprises the following steps: inputting an original ship picture into the backbone of a YOLOv5 network to obtain the feature map output by the last convolution layer; inputting the feature map into a coordinate attention module to obtain a feature map optimized based on coordinate attention; inputting the optimized feature map into the Neck part, which generates three feature layers after a series of processing steps; and inputting the three feature layers into the output end, which gives bounding boxes and confidence scores according to the generated three feature layers. The coordinate attention module introduced into the optimized network considers both the inter-channel relationships and the positional information. The method not only captures cross-channel information, but also encodes direction-aware and position-sensitive information, allowing the target area to be located and identified more accurately.

Description

Marine ship detection method based on attention mechanism
Technical Field
The invention relates to the field of image recognition, in particular to a marine ship detection method based on an attention mechanism.
Background
In recent years, intelligent science and technology have developed rapidly, and high technology has become a primary driving force of social development. Militarily, as the strategic status of the ocean keeps rising, competition over the ocean is unprecedented, and in order to improve maritime supervision and control capabilities, countries are steadily increasing research on ocean monitoring. In national defense construction, ships are important transportation and military equipment of the new era, so rapidly acquiring ship information in Chinese and international waters is of great significance to China's national defense security.
Early research on marine ship recognition treated detection and recognition separately; Wang Ruifu et al. proposed a detection algorithm combining a CFAR global detection algorithm with CNN-based image recognition. The algorithm designed by Deng Zhipeng et al. comprises two detection sub-networks, one for generating multi-scale proposal boxes and the other a high-precision recognition network based on feature fusion, which responds strongly to small and dense ship targets. With the development of image processing and artificial intelligence technology, object detection algorithms have advanced rapidly in recent years, and ship targets can now be localized and classified jointly by an object detection algorithm.
Disclosure of Invention
The invention aims to provide a marine ship detection method based on an attention mechanism for detecting and identifying different types of ships. The method can identify ship types in batches while improving recognition accuracy, avoiding the errors of manual identification and improving the overall efficiency of related fields.
In order to achieve the above purpose, the invention provides a ship detection method based on an attention mechanism, which comprises the following steps:
1) Acquiring multiple types of ship pictures from a network, and marking each picture to obtain a ship data set;
2) Training the YOLOv5 network on the target detection data set to obtain a pre-training weight file of the YOLOv5 network model;
3) Modifying the structure of the YOLOv5 network;
4) Loading the pre-training weight file obtained in the step 2), and retraining the modified YOLOv5 network model in the step 3) on a ship data set to obtain a new weight file, namely a target network model;
5) And loading the target network model into a ship detection and identification system, and inputting the ship picture to be identified into the system for detection and identification to obtain the ship type and the confidence coefficient.
Preferably, the step 3) specifically includes:
3.1) Inputting the original ship image into the backbone of a YOLOv5 network to obtain a feature map X output by the last convolution layer;
3.2) Inputting the feature map X into a coordinate attention module to obtain a feature map Y optimized based on coordinate attention;
3.3) Inputting the optimized feature map Y into the Neck part, which adopts an FPN+PAN structure; three feature layers are generated after processing and input to the output end, and the output end gives bounding boxes and confidence scores according to the generated three feature layers.
Preferably, the step 3.2) specifically comprises:
for the input feature map X, each channel is first encoded along the horizontal and vertical coordinates using pooling kernels of size (H, 1) and (1, W), respectively, so the output of the c-th channel at height h is expressed as:
z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)
likewise, the output of the c-th channel at width w is expressed as:
z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)
the two transforms aggregate features along the two spatial directions respectively, yielding a pair of direction-aware feature maps z^h and z^w;
the direction-aware feature maps obtained by these transforms are concatenated and then transformed with a 1×1 convolution function F_1, the calculation formula being:
f = δ(F_1([z^h, z^w]))
wherein δ is the nonlinear ReLU activation function, [·,·] is the concat operation along the spatial dimension, f ∈ R^(C/r × (H+W)) is the intermediate feature map encoding spatial information in the horizontal and vertical directions, and r represents a reduction ratio parameter controlling the module size, as in the SE module;
then f is split along the spatial dimension into two independent tensors f^h ∈ R^(C/r × H) and f^w ∈ R^(C/r × W), and two additional 1×1 convolutions F_h and F_w transform f^h and f^w respectively into tensors with the same channel dimension as X, the calculation formulas being:
g^h = σ(F_h(f^h))
g^w = σ(F_w(f^w))
wherein σ is a sigmoid activation function;
next, the outputs g^h and g^w are expanded and used as attention weights acting on the input to obtain the optimized feature map Y of the attention module, the calculation formula being:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)
the beneficial effects of the invention are as follows: the invention provides a marine ship detection method based on an attention mechanism, which can enable a deep convolutional neural network to pay attention to a more discriminative part in feature representation, thereby effectively improving the precision of ship detection and identification. The coordination attention module related to the optimized network provided by the invention simultaneously considers the relationship among channels and the position information. It captures not only cross-channel information, but also direction-aware and position-sensitive information, which allows the model to more accurately locate and identify the target area.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of the overall structure of a modified YOLOv5 network of the present invention;
FIG. 2 is a diagram of a coordinated attention module according to an embodiment of the present invention;
fig. 3 is an example of the detection result of the present invention.
Detailed Description
In order to more particularly describe the present invention, the following detailed description of the technical scheme of the present invention is provided with reference to the accompanying drawings and the specific embodiments.
Fig. 1 is a network structure diagram of a marine ship detection method based on an attention mechanism according to an embodiment of the present invention. Referring to fig. 1, the overall flow of the attention mechanism-based marine vessel detection method of the present invention comprises the steps of:
step 1: acquiring multiple types of ship pictures from a network, and marking each picture to obtain a ship data set; (seven types of ship pictures are specifically acquired in the embodiment, including aircraft carriers, cruisers, expelling ships, guard ships, amphibious warships, civil ships and other ships);
step 2: The data set is divided into a training set, a validation set and a test set in an 8:1:1 ratio;
step 3: Training the YOLOv5 network on a large object detection data set such as COCO to obtain the pre-trained weights of the YOLOv5 model;
step 4: modifying the YOLOv5 network structure;
wherein, step 4 specifically comprises:
4.1: The original ship image is preprocessed and input into the backbone network, where it is processed by the Focus structure, convolution operations and BottleneckCSP structures to obtain 3 feature maps of different sizes, and the last feature map is further processed by an SPP structure to obtain the output feature map X;
4.2: inputting the feature map X into a coordinated attention (Coordinate Attention) module to obtain a feature map Y optimized based on coordinated attention;
4.3: The output feature map Y is input into the Neck part, where Neck adopts an FPN+PAN structure and, after BottleneckCSP structures and up-sampling operations, performs concat operations with the first two feature layers generated by the backbone. After this series of operations, three feature layers are generated and input to the output end; the output end gives bounding boxes and confidence scores according to the generated three feature layers, and a non-maximum suppression method is then used to filter out duplicate bounding boxes and obtain the prediction boxes.
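The non-maximum suppression mentioned in step 4.3 can be sketched as follows. This is a minimal NumPy illustration of greedy NMS with made-up boxes and scores, not the exact implementation used inside YOLOv5.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each box as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]   # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # discard remaining boxes that overlap the kept box too strongly
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

# Hypothetical detections: the second box heavily overlaps the first.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

Here the second box has IoU 0.81 with the higher-scoring first box and is suppressed, so only the first and third boxes survive.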
Step 5: loading the pre-trained weight file, and retraining the modified YOLOv5 network model on the ship data set to obtain a new weight file, namely a target network model;
step 6: and loading the target network model into a ship detection and identification system, and inputting the ship picture into the system for detection and identification to obtain the ship type and the confidence.
As shown in fig. 2, in a specific embodiment of the present invention, step 4.2 specifically comprises the following sub-steps:
4.2.1: For the input feature map X, each channel is first encoded along the horizontal and vertical coordinates using pooling kernels of size (H, 1) and (1, W), respectively. Thus, the output of the c-th channel at height h can be expressed as:
z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)
likewise, the output of the c-th channel at width w can be expressed as:
z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)
4.2.2: The two transforms aggregate features along the two spatial directions respectively, yielding a pair of direction-aware feature maps z^h and z^w. The direction-aware feature maps are concatenated and then transformed with a 1×1 convolution function F_1, the calculation formula being:
f = δ(F_1([z^h, z^w]))
where δ is the nonlinear ReLU activation function, [·,·] is the concat operation along the spatial dimension, and f ∈ R^(C/r × (H+W)) is the intermediate feature map encoding spatial information in both the horizontal and vertical directions. r denotes a reduction ratio parameter controlling the module size, as in the SE module.
4.2.3: Then f is split along the spatial dimension into two independent tensors f^h ∈ R^(C/r × H) and f^w ∈ R^(C/r × W), and two additional 1×1 convolutions F_h and F_w transform f^h and f^w respectively into tensors with the same channel dimension as X, the calculation formulas being:
g^h = σ(F_h(f^h))
g^w = σ(F_w(f^w))
where σ is the sigmoid activation function.
4.2.4: Next, the outputs g^h and g^w are expanded and used as attention weights acting on the input to obtain the output Y of the attention module, the calculation formula being:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)
the invention provides a marine ship detection method based on an attention mechanism, which can enable a deep convolutional neural network to pay attention to a more discriminative part in feature representation, thereby effectively improving the precision of ship detection and identification. The coordination attention module related to the optimized network provided by the invention simultaneously considers the relationship among channels and the position information. It captures not only cross-channel information, but also direction-aware and position-sensitive information, which allows the model to more accurately locate and identify the target area. As shown in fig. 3, a ship picture of the aircraft carrier is input, an output detection result "air carrier" is obtained through the convolutional neural network based on the attention mechanism, and the position of the aircraft carrier is framed on the picture by a rectangular frame.
On the test set of the self-built ship data set, the method was compared with the original YOLOv5 network. At the same precision, the invention obtains a 2% improvement in mAP@0.5 over the original YOLOv5 network, with slight improvements in recall and mAP@0.5:0.95, demonstrating the effectiveness of the invention on the self-built ship data set. The specific results are shown in the following table:

Model     Precision    Recall    mAP@0.5    mAP@0.5:0.95
YOLOv5    0.967        0.953     0.962      0.821
Ours      0.967        0.955     0.982      0.822
The above description is only of the preferred embodiments of the present invention, and is not intended to limit the present invention. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the present invention. Accordingly, the scope of the invention is defined by the appended claims.

Claims (1)

1. A marine ship detection method based on an attention mechanism, characterized by comprising the following steps:
1) Acquiring multiple types of ship pictures from a network, and marking each picture to obtain a ship data set;
2) Training the YOLOv5 network on the target detection data set to obtain a pre-training weight file of the YOLOv5 network model;
3) Modifying the structure of the YOLOv5 network;
the step 3) is specifically as follows:
3.1) Inputting the original ship image into the backbone of a YOLOv5 network to obtain a feature map X output by the last convolution layer;
3.2) Inputting the feature map X into a coordinate attention module to obtain a feature map Y optimized based on coordinate attention;
3.3) Inputting the optimized feature map Y into the Neck part, wherein the Neck part adopts an FPN+PAN structure; three feature layers are generated after processing and input to the output end, and the output end gives bounding boxes and confidence scores according to the generated three feature layers;
the step 3.2) specifically comprises the following steps:
for the input feature map X, each channel is encoded along the horizontal and vertical coordinates using pooling kernels of size (H, 1) and (1, W), respectively, so the output of the c-th channel at height h is expressed as:
z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)
likewise, the output of the c-th channel at width w is expressed as:
z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)
the two transforms aggregate features along the two spatial directions respectively to obtain a pair of direction-aware feature maps z^h and z^w;
the direction-aware feature maps obtained by the transforms are concatenated and then transformed with a 1×1 convolution function F_1, the calculation formula being:
f = δ(F_1([z^h, z^w]))
wherein δ is the nonlinear ReLU activation function, [·,·] is the concat operation along the spatial dimension, f ∈ R^(C/r × (H+W)) is the intermediate feature map encoding spatial information in the horizontal and vertical directions, and r represents a reduction ratio parameter for controlling the module size, as in the SE module;
then f is split along the spatial dimension into two independent tensors f^h ∈ R^(C/r × H) and f^w ∈ R^(C/r × W), and two additional 1×1 convolutions F_h and F_w transform f^h and f^w respectively into tensors with the same channel dimension as X, the calculation formulas being:
g^h = σ(F_h(f^h))
g^w = σ(F_w(f^w))
wherein σ is a sigmoid activation function;
next, the outputs g^h and g^w are expanded and used as attention weights acting on the input to obtain the optimized feature map Y of the attention module, the calculation formula being:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j);
4) Loading the pre-training weight file obtained in the step 2), and retraining the modified YOLOv5 network model in the step 3) on a ship data set to obtain a new weight file, namely a target network model;
5) And loading the target network model into a ship detection and identification system, and inputting the ship picture to be identified into the system for detection and identification to obtain the ship type and the confidence coefficient.
CN202110788250.6A 2021-07-13 2021-07-13 Marine ship detection method based on attention mechanism Active CN113642606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110788250.6A CN113642606B (en) 2021-07-13 2021-07-13 Marine ship detection method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110788250.6A CN113642606B (en) 2021-07-13 2021-07-13 Marine ship detection method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN113642606A CN113642606A (en) 2021-11-12
CN113642606B true CN113642606B (en) 2024-01-09

Family

ID=78417171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110788250.6A Active CN113642606B (en) 2021-07-13 2021-07-13 Marine ship detection method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113642606B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092820B (en) * 2022-01-20 2022-04-22 城云科技(中国)有限公司 Target detection method and moving target tracking method applying same
CN114219817A (en) * 2022-02-22 2022-03-22 湖南师范大学 New coronary pneumonia CT image segmentation method and terminal equipment
CN117351359B (en) * 2023-10-24 2024-06-21 中国矿业大学(北京) Mining area unmanned aerial vehicle image sea-buckthorn identification method and system based on improved Mask R-CNN
CN117557912A (en) * 2023-12-18 2024-02-13 大连海事大学 Iceberg scene identification method based on improved YoloV7 model

Citations (2)

Publication number Priority date Publication date Assignee Title
CN112464787A (en) * 2020-11-25 2021-03-09 北京航空航天大学 Remote sensing image ship target fine-grained classification method based on spatial fusion attention
CN112766087A (en) * 2021-01-04 2021-05-07 武汉大学 Optical remote sensing image ship detection method based on knowledge distillation

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN113316792B (en) * 2019-02-19 2024-03-26 赫尔实验室有限公司 System, method, medium for migrating learned knowledge from EO domain to SAR domain
US11461998B2 (en) * 2019-09-25 2022-10-04 Samsung Electronics Co., Ltd. System and method for boundary aware semantic segmentation

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN112464787A (en) * 2020-11-25 2021-03-09 北京航空航天大学 Remote sensing image ship target fine-grained classification method based on spatial fusion attention
CN112766087A (en) * 2021-01-04 2021-05-07 武汉大学 Optical remote sensing image ship detection method based on knowledge distillation

Non-Patent Citations (1)

Title
Data-adaptive SAR image ship target detection model with bidirectional feature fusion; Zhang Xiaohan; Yao Libo; Lü Yafei; Jian Tao; Zhao Zhiwei; Zang Jie; Journal of Image and Graphics (09); full text *

Also Published As

Publication number Publication date
CN113642606A (en) 2021-11-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant