CN111898699A - Automatic detection and identification method for hull target - Google Patents
Automatic detection and identification method for hull target
- Publication number
- CN111898699A (application CN202010802211.2A)
- Authority
- CN
- China
- Prior art keywords
- module
- convolution
- data set
- hull
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 46
- 238000000034 method Methods 0.000 title claims abstract description 36
- 238000012549 training Methods 0.000 claims abstract description 74
- 238000003062 neural network model Methods 0.000 claims abstract description 33
- 238000012360 testing method Methods 0.000 claims abstract description 12
- 230000006870 function Effects 0.000 claims description 16
- 230000004913 activation Effects 0.000 claims description 15
- 230000004927 fusion Effects 0.000 claims description 15
- 238000005070 sampling Methods 0.000 claims description 12
- 238000011176 pooling Methods 0.000 claims description 9
- 230000003044 adaptive effect Effects 0.000 claims description 4
- 238000000137 annealing Methods 0.000 claims description 4
- 230000009466 transformation Effects 0.000 claims description 4
- 230000001629 suppression Effects 0.000 claims 1
- 238000013528 artificial neural network Methods 0.000 description 9
- 238000000605 extraction Methods 0.000 description 7
- 230000008569 process Effects 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to an automatic detection and identification method for hull targets, which comprises the following steps. Step 1: acquire a training data set and a test data set. Step 2: construct a hull recognition neural network model. Step 3: expand the training data set to obtain an expanded training data set, and train the neural network model with the expanded training data set. Step 4: test the neural network model with the test data set and judge whether the model accuracy meets the preset accuracy; if so, execute step 5, otherwise return to step 3. Step 5: carry out automatic detection and identification of hulls using the trained hull recognition neural network model. Compared with the prior art, the method has the advantages of high identification accuracy and high identification speed.
Description
Technical Field
The invention relates to the technical field of automatic identification of hull targets under a navigation condition, in particular to an automatic detection and identification method of hull targets.
Background
With the development of computer vision, and of target detection in particular in recent years, more and more automatic driving systems are equipped with automatic image recognition systems to ensure the safety of unmanned driving. Similarly, in the field of navigation, the automatic navigation of unmanned ships requires automatic recognition of ship hulls to ensure the safety of ship-handling tasks.
Existing target detection algorithms are rarely trained specifically for hull targets, and their identification accuracy and speed on hulls are limited. A target detection algorithm tailored to ships is therefore needed to improve both the accuracy and the speed of hull identification.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an automatic detection and identification method for hull targets which is based on YOLOv3, improves upon it, and achieves high identification accuracy and high identification speed.
The purpose of the invention can be realized by the following technical scheme:
a ship hull target automatic detection and identification method comprises the following steps:
step 1: acquiring a training data set and a test data set;
step 2: constructing a ship body recognition neural network model;
step 3: expanding the training data set to obtain an expanded training data set, and training the neural network model by using the expanded training data set;
step 4: testing the neural network model by using the test data set, and judging whether the model accuracy meets the preset accuracy; if so, executing step 5, otherwise returning to step 3;
step 5: carrying out automatic detection and identification of the hull by using the trained hull recognition neural network model.
Preferably, the method for acquiring the training data set comprises:
the hull pictures are labeled using LabelImg, and then processed into txt format files using python, all of which constitute the training data set.
Preferably, the method for expanding the training data set specifically comprises:
Different pictures in the training data set are mixed at the image-pixel level using Mixup image enhancement to expand the data set; the expanded training data set is then scale-transformed to expand it again; and the data set after these two expansions is taken as the final expanded training data set.
Preferably, the training of the hull recognition neural network model in the step 3 adopts a mixed precision training mode.
More preferably, the mixing precision training mode is specifically as follows:
the 16-bit half-precision weight is used for calculating the activation function part, and the 32-bit single-precision weight is used for updating.
Preferably, the hull recognition neural network model comprises an improved CSPDarknet53 framework, a second convolution module, an SPP pooling layer, a feature fusion module, an output layer and an anchor frame detection module which are connected in sequence; the improved CSPDarknet53 framework is also connected with an output layer;
the improved CSPDarknet53 framework comprises a first convolution module, a first residual error module, a second residual error module, a third residual error module, a fourth residual error module and a fifth residual error module which are sequentially connected;
the output layer comprises a first splicing module, a third convolution module, a fourth convolution module, a second splicing module, a fifth convolution module, a sixth convolution module, a first up-sampling module, a seventh convolution module and a second up-sampling module;
the input end of the first splicing module is connected with the third residual error module, and the output end of the first splicing module is sequentially connected with the third convolution module, the fourth convolution module and the anchor frame detection module; the input end of the second splicing module is connected with the fourth residual error module, and the output end of the second splicing module is sequentially connected with the fifth convolution module, the sixth convolution module and the anchor frame detection module; the input end of the seventh convolution module is connected with the characteristic fusion module, and the output end of the seventh convolution module is connected with the anchor frame detection module;
the input end of the first up-sampling module is connected with the output end of the fifth convolution module, and the output end of the first up-sampling module is connected with the input end of the first splicing module; the input end of the second up-sampling module is connected with the output end of the seventh convolution module, and the output end of the second up-sampling module is connected with the input end of a second splicing module (504);
the feature fusion module is three adaptive spatial feature fusion structures (ASFFs) which are sequentially connected.
More preferably, the first convolution module, the second convolution module, the third convolution module, the fourth convolution module, the fifth convolution module and the sixth convolution module are respectively composed of a plurality of convolution blocks; the convolution block comprises a convolution layer and a DropBlock regularization unit; the output end of the convolution layer is connected with the input end of the DropBlock;
the first residual error module, the second residual error module, the third residual error module, the fourth residual error module and the fifth residual error module are respectively composed of a plurality of residual blocks; a stack of n residual blocks is denoted residual block × n, with the expression:
residual block × n = input + convolution block + residual unit × n
The residual unit is specifically a residual network containing two convolution blocks.
More preferably, the convolutional layer takes the form of a convolution of Mixconv2 d;
the activation function of the convolutional layer is specifically a Mish activation function, and specifically includes:
Mish = x * tanh(ln(1 + e^x)).
More preferably, the hull recognition neural network model adopts a cosine annealing mode to realize automatic decay of the learning rate.
More preferably, the anchor frame detection module comprises three anchor frame modules obtained by clustering the training set. The three outputs of the hull recognition neural network model are respectively fed into the three anchor frame modules to obtain three outputs containing the coordinate information and probability of the hull. Non-maximum suppression is then applied to the three outputs to obtain a suitable final output; if the hull probability output by the anchor frame detection module is greater than the set base probability, that output is judged to be a real hull detection, and the coordinate information output by the anchor frame detection module is finally output as the position of the hull in the image.
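The probability thresholding plus non-maximum suppression described above can be sketched in a few lines of pure Python. This is an illustrative implementation, not code from the patent; the 0.45 IoU threshold is a common default and an assumption here (the patent only fixes the base probability of 0.5 elsewhere in the description).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detect_hulls(boxes, scores, prob_threshold=0.5, iou_threshold=0.45):
    """Keep boxes whose hull probability exceeds the base probability,
    then suppress boxes that overlap an already-kept, higher-scoring box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] <= prob_threshold:
            continue
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return [(boxes[i], scores[i]) for i in keep]
```

Duplicate detections of the same hull are removed while distant, confident detections survive.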
Compared with the prior art, the invention has the following advantages:
firstly, the recognition precision is high, and the recognition speed is fast: the target detection and identification method of the invention uses the hull recognition neural network model to automatically recognize the hull target, and adopts the Mixup image enhancement mode to expand the training data set, so that the training data set is more comprehensive, and the recognition precision of the hull recognition neural network model obtained by training is higher; the hull recognition neural network model simplifies the number of convolution layers, and simultaneously, a Mish activation function is selected, so that the recognition speed of the model is improved.
Secondly, perfecting the model structure: the use of the spp structure improves the feature extraction capability of the neural network and reduces the possibility of overfitting, the use of the DenseNet connection method improves the gradient back propagation capability, so that the network is easier to train, the use of the MixConv2d convolution method improves the feature extraction capability of convolution, so that the structure of the hull recognition neural network model is more perfect, and the hull recognition method is more suitable for hull recognition.
Drawings
FIG. 1 is a schematic flow chart of a hull target automatic detection and identification method according to the present invention;
fig. 2 is a schematic structural diagram of a hull recognition neural network model in the invention.
The reference numbers in the figures indicate:
1. the improved CSPDarknet53 framework, 2, a second convolution module, 3, an SPP pooling layer, 4, a feature fusion module, 5, an output layer, 6, an anchor frame detection module, 101, a first convolution module, 102, a first residual module, 103, a second residual module, 104, a third residual module, 105, a fourth residual module, 106, a fifth residual module, 501, a first splicing module, 502, a third convolution module, 503, a fourth convolution module, 504, a second splicing module, 505, a fifth convolution module, 506, a sixth convolution module, 507, a first upsampling module, 508, a seventh convolution module, 509, and a second upsampling module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
An automatic detection and identification method for a ship body target is structurally shown in fig. 1 and comprises the following steps:
step 1: acquiring a training data set and a test data set;
step 2: constructing a ship body recognition neural network model;
step 3: expanding the training data set to obtain an expanded training data set, and training the neural network model by using the expanded training data set;
step 4: testing the neural network model by using the test data set, and judging whether the model accuracy meets the preset accuracy; if so, executing step 5, otherwise returning to step 3;
step 5: carrying out automatic detection and identification of the hull by using the trained hull recognition neural network model.
The method for acquiring the training data set in the embodiment comprises the following steps: the hull pictures are labeled using LabelImg, and then processed into txt format files using python, all of which constitute the training data set.
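The txt files produced from the LabelImg annotations presumably follow the usual Darknet/YOLO label convention of one normalized box per line; the patent does not spell the format out, so the sketch below follows that common convention and its function name is illustrative.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-coordinate box (x1, y1, x2, y2) to the YOLO txt format:
    'class x_center y_center width height', all normalized to [0, 1]."""
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

Since only hulls are detected, every line would use class id 0.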
The method for expanding the training data set in the embodiment is as follows: different pictures in the training data set are mixed at the image-pixel level using Mixup image enhancement to expand the data set; the expanded training data set is then scale-transformed to expand it again; and the data set after these two expansions is taken as the final expanded training data set.
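A minimal sketch of pixel-level Mixup, assuming the detection-style variant in which both source images' boxes are kept and the Beta-distributed mixing coefficient acts as a per-box loss weight; the function name, signature, and alpha value are illustrative, not from the patent.

```python
import random

def mixup(img_a, img_b, labels_a, labels_b, alpha=1.5):
    """Blend two equally sized images pixel-wise with a Beta(alpha, alpha)
    coefficient and carry over the boxes of both images."""
    lam = random.betavariate(alpha, alpha)
    mixed = [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
             for row_a, row_b in zip(img_a, img_b)]
    # Each box keeps its coordinates; the mixing weight scales its loss.
    labels = ([(box, lam) for box in labels_a]
              + [(box, 1.0 - lam) for box in labels_b])
    return mixed, labels
```

Applying this to random picture pairs roughly doubles the effective variety of training samples before the scale-transformation pass.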
The training of the hull recognition neural network model in the embodiment adopts a mixed precision training mode, which specifically comprises the following steps: the 16-bit half-precision weight is used for calculating the activation function part, and the 32-bit single-precision weight is used for updating.
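A toy NumPy illustration of the scheme just described: the activation is computed with a 16-bit half-precision copy of the weight, while the update is applied to the 32-bit master weight. The scalar weight and the hand-written gradient are purely illustrative, not the patent's training code.

```python
import numpy as np

def mixed_precision_step(w_master, x, lr=0.01):
    """One update step: forward pass in float16, weight update in float32."""
    w16 = w_master.astype(np.float16)      # 16-bit copy used for computation
    act = np.tanh(w16 * np.float16(x))     # activation evaluated in half precision
    a = float(act)
    grad = (1.0 - a ** 2) * x              # toy gradient of tanh, full precision
    w_master = np.float32(w_master - lr * np.float32(grad))  # 32-bit update
    return w_master, a
```

Keeping the master weights in single precision is what limits the information loss from the half-precision forward pass.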
The structure of the hull recognition neural network model in this embodiment is shown in fig. 2, and includes an improved CSPDarknet53 framework 1, a second convolution module 2, an SPP pooling layer 3, a feature fusion module 4, an output layer 5, and an anchor frame detection module 6, which are connected in sequence, and the improved CSPDarknet53 framework 1 is further connected to the output layer 5.
The improved CSPDarknet53 framework 1 includes a first convolution module 101, a first residual module 102, a second residual module 103, a third residual module 104, a fourth residual module 105, and a fifth residual module 106, which are connected in sequence.
The output layer 5 comprises a first concatenation module 501, a third convolution module 502, a fourth convolution module 503, a second concatenation module 504, a fifth convolution module 505, a sixth convolution module 506, a first upsampling module 507, a seventh convolution module 508 and a second upsampling module 509.
The input end of the first splicing module 501 is connected with the third residual module 104, and the output end of the first splicing module 501 is sequentially connected with the third convolution module 502, the fourth convolution module 503 and the anchor frame detection module 6. The input end of the second splicing module 504 is connected with the fourth residual module 105, and the output end of the second splicing module 504 is sequentially connected with the fifth convolution module 505, the sixth convolution module 506 and the anchor frame detection module 6. The input end of the seventh convolution module 508 is connected with the feature fusion module 4, and the output end of the seventh convolution module 508 is connected with the anchor frame detection module 6.
The input end of the first upsampling module 507 is connected to the output end of the fifth convolution module 505, the output end of the first upsampling module 507 is connected to the input end of the first splicing module 501, the input end of the second upsampling module 509 is connected to the output end of the seventh convolution module 508, and the output end of the second upsampling module 509 is connected to the input end of the second splicing module 504.
The feature fusion module 4 in this embodiment consists of three adaptive spatial feature fusion structures (ASFF) connected in sequence.
The convolution modules in this embodiment, that is, the first convolution module 101, the second convolution module 2, the third convolution module 502, the fourth convolution module 503, the fifth convolution module 505 and the sixth convolution module 506, are respectively composed of a plurality of convolution blocks; each convolution block comprises a convolution layer and a DropBlock regularization unit, with the output end of the convolution layer connected to the input end of the DropBlock unit;
the first residual module 102, the second residual module 103, the third residual module 104, the fourth residual module 105 and the fifth residual module 106 are respectively composed of a plurality of residual blocks, the plurality of residual blocks are represented as residual blocks x n, and the expression of the residual modules is as follows:
residual block n input + convolution block + residual unit n
The residual unit is specifically a residual network containing two convolution blocks.
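The composition rule above can be mimicked with placeholder functions. This is a purely structural sketch: `conv_block` below stands in for a real MixConv2d + Mish + DropBlock block, and the arithmetic is only there to make the data flow visible.

```python
def conv_block(x):
    # Placeholder for a learned transformation (MixConv2d -> Mish -> DropBlock).
    return x + 1

def residual_unit(x):
    """A residual unit: two convolution blocks plus a skip connection."""
    return x + conv_block(conv_block(x))

def residual_block(x, n):
    """residual block x n = input -> convolution block -> n residual units."""
    y = conv_block(x)
    for _ in range(n):
        y = residual_unit(y)
    return y
```

The skip connection in `residual_unit` is what carries gradients past the two convolution blocks during backpropagation.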
The convolutional layer in this embodiment adopts a convolutional form of Mixconv2d, and the activation function is a Mish function, specifically:
Mish = x * tanh(ln(1 + e^x))
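The Mish function is straightforward to implement and check numerically: it is smooth, passes through zero, behaves like the identity for large positive inputs, and saturates near zero for large negative inputs.

```python
import math

def mish(x):
    """Mish activation: x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))."""
    return x * math.tanh(math.log1p(math.exp(x)))
```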
The hull recognition neural network model in the embodiment adopts a cosine annealing mode to realize automatic decay of the learning rate.
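A cosine-annealing schedule of the kind described might look like the following; the lr_max and lr_min defaults are illustrative values, not ones specified by the patent.

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: starts at lr_max at step 0 and
    decays smoothly to lr_min at total_steps."""
    cos_term = (1 + math.cos(math.pi * step / total_steps)) / 2
    return lr_min + (lr_max - lr_min) * cos_term
```

The schedule decays slowly at first, fastest mid-training, and flattens out near the preset minimum.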
The following describes the model structure in detail:
one, improve CSPDarknet53 framework: the CSPDarknet53 frame-based target identification structure simplifies the middle neural network structure based on the existing CSPDarknet53 frame, so that the number of convolutional layers is reduced from 75 layers to 43 layers, the training process can be accelerated, and the detection performance cannot be reduced due to the reduction of convolutional layers because only the target detection of a single ship body is performed.
Secondly, the SPP pooling layer: Spatial Pyramid Pooling, a spatial pyramid pooling layer that extracts image features better than a conventional pooling structure.
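An SPP block in the YOLO style concatenates the input feature map with stride-1 max-poolings at several kernel sizes. The pure-Python sketch below illustrates the idea on a single-channel map; the kernel sizes 5/9/13 follow common YOLO practice and are an assumption here.

```python
def max_pool_same(feat, k):
    """Max-pool a 2-D map with kernel k, stride 1, 'same' padding."""
    h, w = len(feat), len(feat[0])
    r = k // 2
    return [[max(feat[i2][j2]
                 for i2 in range(max(0, i - r), min(h, i + r + 1))
                 for j2 in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]

def spp(feat, kernels=(5, 9, 13)):
    """SPP block: the input plus max-poolings at several scales; channel-wise
    concatenation is represented here as a list of maps."""
    return [feat] + [max_pool_same(feat, k) for k in kernels]
```

Because every pooling uses stride 1 with 'same' padding, all outputs keep the input's spatial size and can be concatenated along the channel axis.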
Thirdly, convolutional layer using the form of a Mixconv2d convolution: the Mixconv2d structure is a mixed receptive field convolution structure, and the capability of feature extraction in the image convolution process is effectively improved.
Fourthly, the Mish activation function: this activation function allows information to propagate deeper into the neural network, yielding better accuracy and generalization.
Fifthly, Mixup image enhancement: it stably improves image classification accuracy by about 1 percentage point with almost no extra computational overhead.
Sixthly, the adaptive spatial feature fusion structure ASFF: it filters out inconsistencies across feature scales during training by learning adaptive spatial fusion weights, markedly improving a strong baseline at negligible inference overhead and achieving a state-of-the-art speed/accuracy trade-off among single-shot detectors.
Seventhly, Mixed Precision Training: the same precision can be achieved under the condition that the memory is reduced by half, and the training efficiency is improved.
Eight, DropBlock regularization: to discard local semantic information more effectively and push the network to learn more robust features, DropBlock drops activations block by block. Unlike Dropout, the information of all pixels in a contiguous region of the image is dropped together, so nearby activations cannot trivially reconstruct the dropped signal.
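A minimal sketch of block-wise dropping on a 2-D map (training-time only; the rescaling of surviving activations used in the original DropBlock formulation is omitted for brevity, and the interface is illustrative):

```python
import random

def dropblock(feat, block_size=2, drop_prob=1.0, rng=random):
    """Zero out one contiguous block_size x block_size region of the map,
    instead of independent pixels as Dropout would."""
    h, w = len(feat), len(feat[0])
    out = [row[:] for row in feat]
    if rng.random() < drop_prob:
        i0 = rng.randrange(h - block_size + 1)
        j0 = rng.randrange(w - block_size + 1)
        for i in range(i0, i0 + block_size):
            for j in range(j0, j0 + block_size):
                out[i][j] = 0.0
    return out
```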
Ninthly, cosine annealing learning-rate decay: an automatic learning-rate decay method that keeps a relatively high learning rate early in training and a lower one late in training, smoothly approaching a preset minimum value.
The structural composition of each module is shown in table 1.
TABLE 1 structural composition of the various modules
Module | Structural assembly | Module | Structural assembly |
First convolution module 101 | (608,608,32)*3 | Seventh convolution module 508 | (19,19,1024)*2 |
Second convolution module 2 | (19,19,1024)*2 | First residual module 102 | (304,304,64)*1 |
Third convolution module 502 | (76,76,256)*2 | Second residual module 103 | (152,152,128)*2 |
Fourth convolution module 503 | (76,76,512)*2 | Third residual module 104 | (76,76,256)*2 |
Fifth convolution Module 505 | (38,38,512)*2 | Fourth residual module 105 | (38,38,512)*2 |
Sixth convolution module 506 | (38,38,1024)*2 | Fifth residual module 106 | (19,19,1024)*2 |
In the training stage of the hull recognition neural network model of this embodiment, different training pictures are first mixed at the image-pixel level using Mixup image enhancement to expand the data set; the expanded training set is then scale-transformed to expand it again, and the twice-expanded data set is fed to the neural network for training. Training uses Mixed Precision Training: in this embodiment the 32-bit weights are cast to 16 bits for computing the activation-function part, while 32-bit weights are used for the updates, which keeps the loss of half-precision information under control.
During training, a picture first passes through 3 convolutional layers (including an extraction layer) to extract features. All convolutional layers adopt the Mixconv2d convolution form: within each convolution layer, image features are first extracted by convolution, the activation function is changed from the traditional ReLU to Mish, and the last part of the convolution layer is normalized by a DropBlock regularization unit. The output then passes through the extraction layers formed by the residual blocks. After two residual blocks, one part of the output is fed to the output layer as the high-scale output (76 × 76), while the other part continues down the network structure; after two more residual blocks, one part of the result is fed to the output layer as the medium-scale output (38 × 38) and the other continues down; after two further residual blocks, one part of the result is fed to the output layer as the low-scale output (19 × 19).
The remaining output passes through three convolution blocks, then through the SPP pooling layer 3, and then through the feature fusion module 4 formed by the three sequentially connected ASFF structures. The result, after one convolution block, serves as the third output. The third output is simultaneously upsampled to 38 × 38 and spliced with the original 38 × 38 output, which after two convolution blocks serves as the second output; that result is in turn upsampled to 76 × 76 and spliced with the original 76 × 76 output, which after two convolution blocks serves as the first output. The first, second and third outputs are then combined with their three corresponding anchor boxes to obtain the final outputs. All outputs share the same format, comprising the coordinates x, y, h, w and the probability c; if the probability c is greater than the base set value of 0.5, the detection is judged to be a real hull, and x, y, h, w are output as the coordinate position of the hull in the image.
In step 3, the network architecture changes the number of classes to be detected in the original YOLO algorithm to 1, and the hull recognition neural network is built as described above, which reduces the depth of the neural network and the number of convolutional layers.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A ship body target automatic detection and identification method is characterized by comprising the following steps:
step 1: acquiring a training data set and a test data set;
step 2: constructing a ship body recognition neural network model;
step 3: expanding the training data set to obtain an expanded training data set, and training the neural network model by using the expanded training data set;
step 4: testing the neural network model by using the test data set, and judging whether the model accuracy meets the preset accuracy; if so, executing step 5, otherwise returning to step 3;
step 5: carrying out automatic detection and identification of the hull by using the trained hull recognition neural network model.
2. The hull target automatic detection and identification method according to claim 1, characterized in that the training data set is obtained by the following method:
the hull pictures are labeled using LabelImg, and then processed into txt format files using python, all of which constitute the training data set.
3. The hull target automatic detection and identification method according to claim 1, characterized in that the method for expanding the training data set specifically comprises:
Different pictures in the training data set are mixed at the image-pixel level using Mixup image enhancement to expand the data set; the expanded training data set is then scale-transformed to expand it again; and the data set after these two expansions is taken as the final expanded training data set.
4. The hull target automatic detection and identification method according to claim 1, characterized in that the training of the hull recognition neural network model in the step 3 adopts a mixed precision training mode.
5. The automatic detection and identification method for the ship hull target according to claim 4, characterized in that the mixed precision training mode is specifically as follows:
the 16-bit half-precision weight is used for calculating the activation function part, and the 32-bit single-precision weight is used for updating.
6. The automatic detection and identification method for the ship hull targets according to claim 1, characterized in that the ship hull recognition neural network model comprises a modified CSPDarknet53 framework (1), a second convolution module (2), an SPP pooling layer (3), a feature fusion module (4), an output layer (5) and an anchor frame detection module (6) which are connected in sequence; the improved CSPDarknet53 framework (1) is also connected with an output layer (5);
the improved CSPDarknet53 framework (1) comprises a first convolution module (101), a first residual error module (102), a second residual error module (103), a third residual error module (104), a fourth residual error module (105) and a fifth residual error module (106) which are connected in sequence;
the output layer (5) comprises a first splicing module (501), a third convolution module (502), a fourth convolution module (503), a second splicing module (504), a fifth convolution module (505), a sixth convolution module (506), a first upsampling module (507), a seventh convolution module (508) and a second upsampling module (509);
the input end of the first splicing module (501) is connected with the third residual module (104), and the output end of the first splicing module (501) is connected in sequence with the third convolution module (502), the fourth convolution module (503) and the anchor frame detection module (6); the input end of the second splicing module (504) is connected with the fourth residual module (105), and the output end of the second splicing module (504) is connected in sequence with the fifth convolution module (505), the sixth convolution module (506) and the anchor frame detection module (6); the input end of the seventh convolution module (508) is connected with the feature fusion module (4), and its output end is connected with the anchor frame detection module (6);
the input end of the first upsampling module (507) is connected with the output end of the fifth convolution module (505), and the output end of the first upsampling module (507) is connected with the input end of the first splicing module (501); the input end of the second upsampling module (509) is connected with the output end of the seventh convolution module (508), and the output end of the second upsampling module (509) is connected with the input end of the second splicing module (504);
the feature fusion module (4) consists of three adaptive spatial feature fusion (ASFF) structures connected in sequence.
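The core idea of the ASFF structures in the feature fusion module (4) can be sketched as below: feature maps from different scales, once resized to a common resolution, are combined with softmax-normalised weights. This is a simplified sketch with a single scalar weight per level; real ASFF learns per-pixel weights via 1x1 convolutions, which is omitted here.

```python
import numpy as np

def asff(feat_a, feat_b, feat_c, w_logits):
    """Adaptively fuse three same-shaped feature maps using
    softmax-normalised level weights (simplified ASFF sketch)."""
    w = np.exp(w_logits - w_logits.max())
    w = w / w.sum()  # softmax over the three pyramid levels
    return w[0] * feat_a + w[1] * feat_b + w[2] * feat_c

# Three dummy feature maps already resized to a common resolution
f1 = np.ones((8, 8))
f2 = 2 * np.ones((8, 8))
f3 = 3 * np.ones((8, 8))
fused = asff(f1, f2, f3, np.array([0.0, 0.0, 0.0]))  # equal weights -> mean
```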
7. The hull target automatic detection and identification method according to claim 6, characterized in that the first convolution module (101), the second convolution module (2), the third convolution module (502), the fourth convolution module (503), the fifth convolution module (505) and the sixth convolution module (506) are each composed of a plurality of convolution blocks; each convolution block comprises a convolution layer and a DropBlock regularization unit, with the output end of the convolution layer connected to the input end of the DropBlock regularization unit;
the first residual module (102), the second residual module (103), the third residual module (104), the fourth residual module (105) and the fifth residual module (106) are each composed of a plurality of residual blocks; a residual block containing n residual units is denoted residual block n, and is expressed as:
residual block n = input + convolution block + n residual units,
where each residual unit is a residual network containing two convolution blocks.
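The DropBlock regularization named in claim 7 zeroes out contiguous square regions of the feature map rather than individual activations. A minimal single-channel sketch follows; the block size, drop probability, and rescaling convention are assumptions for illustration, not values taken from the patent.

```python
import numpy as np

def dropblock(feat, block_size=3, drop_prob=0.1, rng=None):
    """Minimal DropBlock sketch: zero out contiguous block_size x block_size
    regions of a 2-D feature map, then rescale the surviving activations."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = feat.shape
    # Seed probability (DropBlock's gamma): chosen so the expected fraction
    # of dropped units is roughly drop_prob.
    valid = (h - block_size + 1) * (w - block_size + 1)
    gamma = drop_prob * (h * w) / (block_size ** 2) / valid
    mask = np.ones_like(feat, dtype=np.float64)
    seeds = rng.random((h - block_size + 1, w - block_size + 1)) < gamma
    for i, j in zip(*np.nonzero(seeds)):
        mask[i:i + block_size, j:j + block_size] = 0.0  # drop a whole block
    kept = mask.mean()
    return feat * mask / max(kept, 1e-6)

out = dropblock(np.ones((16, 16)))
```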
8. The hull target automatic detection and identification method according to claim 7, characterized in that the convolution layer adopts the MixConv2d form of convolution;
the activation function of the convolution layer is specifically the Mish activation function, defined as:
Mish(x) = x · tanh(ln(1 + e^x)).
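The Mish function above is straightforward to implement; a short sketch with a few sample evaluations (using `log1p` for numerical stability near zero):

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * np.tanh(np.log1p(np.exp(x)))

y = mish(np.array([-2.0, 0.0, 2.0]))
```

Unlike ReLU, Mish is smooth everywhere and allows small negative outputs, which is often cited as the reason for its use in CSPDarknet-style backbones.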
9. The hull target automatic detection and identification method according to claim 6, characterized in that the hull recognition neural network model uses a cosine annealing schedule to automatically decay the learning rate.
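Cosine annealing decays the learning rate along a half cosine wave from a maximum to a minimum value. A sketch of the schedule follows; the specific lr_max/lr_min values are assumptions, as the claim does not state them.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: lr_max at step 0, lr_min at the end."""
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_factor
```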
10. The hull target automatic detection and identification method according to claim 6, characterized in that the anchor frame detection module (6) comprises three anchor frame modules, whose anchor sizes are obtained by clustering the training set; the three outputs of the hull recognition neural network model are respectively fed into the three anchor frame modules to obtain three outputs containing hull coordinate information and probabilities; non-maximum suppression is then applied to the three outputs to obtain a single suitable output; if the ship probability output by the anchor frame detection module (6) is greater than a set base probability, a ship is judged to be detected, and the coordinate information output by the anchor frame detection module (6) is taken as the position of the ship in the image and finally output.
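The non-maximum suppression step in claim 10 can be sketched with the standard greedy algorithm: keep the highest-scoring box, discard boxes overlapping it beyond an IoU threshold, and repeat. This is a generic sketch, not the patent's exact post-processing; the IoU threshold is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    Returns indices of the kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between box i and all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping box i too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # the second box heavily overlaps the first
```

After NMS, a final score threshold (the "set base probability" of claim 10) decides which surviving boxes are reported as detected ships.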
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010802211.2A CN111898699B (en) | 2020-08-11 | 2020-08-11 | Automatic detection and identification method for ship body target |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111898699A true CN111898699A (en) | 2020-11-06 |
CN111898699B CN111898699B (en) | 2024-05-10 |
Family
ID=73228819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010802211.2A Active CN111898699B (en) | 2020-08-11 | 2020-08-11 | Automatic detection and identification method for ship body target |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111898699B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112508001A (en) * | 2020-12-03 | 2021-03-16 | 安徽理工大学 | Coal gangue positioning method based on multispectral waveband screening and improved U-Net |
CN112598604A (en) * | 2021-03-04 | 2021-04-02 | 成都东方天呈智能科技有限公司 | Blind face restoration method and system |
CN112700446A (en) * | 2021-03-23 | 2021-04-23 | 常州微亿智造科技有限公司 | Algorithm model training method and device for industrial quality inspection |
CN112818913A (en) * | 2021-02-24 | 2021-05-18 | 西南石油大学 | Real-time smoking calling identification method |
CN113449634A (en) * | 2021-06-28 | 2021-09-28 | 上海翰声信息技术有限公司 | Video detection method and device for processing under strong light environment |
CN114170421A (en) * | 2022-02-10 | 2022-03-11 | 青岛海尔工业智能研究院有限公司 | Image detection method, device, equipment and storage medium |
WO2022142297A1 (en) * | 2021-01-04 | 2022-07-07 | Guangzhou Institute Of Advanced Technology, Chinese Academy Of Sciences | A robot grasping system and method based on few-shot learning |
WO2022147965A1 (en) * | 2021-01-09 | 2022-07-14 | 江苏拓邮信息智能技术研究院有限公司 | Arithmetic question marking system based on mixnet-yolov3 and convolutional recurrent neural network (crnn) |
CN118172787A (en) * | 2024-05-09 | 2024-06-11 | 南昌航空大学 | Lightweight document layout analysis method |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170068840A1 (en) * | 2015-09-09 | 2017-03-09 | Accenture Global Services Limited | Predicting accuracy of object recognition in a stitched image |
CN108647648A (en) * | 2018-05-14 | 2018-10-12 | 电子科技大学 | A kind of Ship Recognition system and method under visible light conditions based on convolutional neural networks |
CN108776779A (en) * | 2018-05-25 | 2018-11-09 | 西安电子科技大学 | SAR Target Recognition of Sequential Images methods based on convolution loop network |
CN108875486A (en) * | 2017-09-28 | 2018-11-23 | 北京旷视科技有限公司 | Recongnition of objects method, apparatus, system and computer-readable medium |
CN109165585A (en) * | 2018-06-15 | 2019-01-08 | 沈阳理工大学 | A kind of improved ship target detection method based on YOLO V2 |
CN109271856A (en) * | 2018-08-03 | 2019-01-25 | 西安电子科技大学 | Remote sensing image object detection method based on expansion residual error convolution |
CN109685847A (en) * | 2018-12-26 | 2019-04-26 | 北京因时机器人科技有限公司 | A kind of training method and device of sensation target detection model |
CN110060508A (en) * | 2019-04-08 | 2019-07-26 | 武汉理工大学 | A kind of ship automatic testing method for inland river bridge zone |
CN110070142A (en) * | 2019-04-29 | 2019-07-30 | 上海大学 | A kind of marine vessel object detection method based on YOLO neural network |
US20190303715A1 (en) * | 2018-03-29 | 2019-10-03 | Qualcomm Incorporated | Combining convolution and deconvolution for object detection |
CN110321874A (en) * | 2019-07-12 | 2019-10-11 | 南京航空航天大学 | A kind of light-weighted convolutional neural networks pedestrian recognition method |
CN110348303A (en) * | 2019-06-06 | 2019-10-18 | 武汉理工大学 | A kind of auxiliary water surface patrol system being equipped on unmanned boat and water surface monitoring method |
CN110688955A (en) * | 2019-09-27 | 2020-01-14 | 西安建筑科技大学 | Building construction target detection method based on YOLO neural network |
US20200103498A1 (en) * | 2018-10-02 | 2020-04-02 | Metawave Corporation | Adaptive radar for near-far target identification |
WO2020071839A1 (en) * | 2018-10-04 | 2020-04-09 | 씨드로닉스㈜ | Ship and harbor monitoring device and method |
CN111046793A (en) * | 2019-12-11 | 2020-04-21 | 北京工业大学 | Tomato disease identification method based on deep convolutional neural network |
US20200160061A1 (en) * | 2017-12-11 | 2020-05-21 | Zhuhai Da Hengqin Technology Development Co., Ltd. | Automatic ship tracking method and system based on deep learning network and mean shift |
Non-Patent Citations (2)
Title |
---|
WANG Xinli; JIANG Fucai; NING Fangxin; MA Quandang; ZHANG Fan; ZOU Hongbing: "Ship Target Detection Based on an Improved Convolutional Neural Network", Navigation of China, no. 02, 25 June 2018 (2018-06-25) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111898699A (en) | Automatic detection and identification method for hull target | |
CN110298361B (en) | Semantic segmentation method and system for RGB-D image | |
CN110348376B (en) | Pedestrian real-time detection method based on neural network | |
CN111126359B (en) | High-definition image small target detection method based on self-encoder and YOLO algorithm | |
CN112766087A (en) | Optical remote sensing image ship detection method based on knowledge distillation | |
CN111563508A (en) | Semantic segmentation method based on spatial information fusion | |
CN111079739B (en) | Multi-scale attention feature detection method | |
JP2020123330A (en) | Method for acquiring sample image for label acceptance inspection from among auto-labeled images utilized for neural network learning, and sample image acquisition device utilizing the same | |
CN111489339A (en) | Method for detecting defects of bolt spare nuts of high-speed railway positioner | |
CN111079604A (en) | Method for quickly detecting tiny target facing large-scale remote sensing image | |
CN114495029A (en) | Traffic target detection method and system based on improved YOLOv4 | |
CN113052057A (en) | Traffic sign identification method based on improved convolutional neural network | |
CN110533166A (en) | It is a kind of based on when space fusion feature indoor orientation method | |
CN115620180A (en) | Aerial image target detection method based on improved YOLOv5 | |
CN115346068A (en) | Automatic generation method for bolt loss fault image of railway freight train | |
CN113298817A (en) | High-accuracy semantic segmentation method for remote sensing image | |
CN115797808A (en) | Unmanned aerial vehicle inspection defect image identification method, system, device and medium | |
CN115240259A (en) | Face detection method and face detection system based on YOLO deep network in classroom environment | |
CN113052071B (en) | Method and system for rapidly detecting distraction behavior of driver of hazardous chemical substance transport vehicle | |
CN113963333A (en) | Traffic sign board detection method based on improved YOLOF model | |
CN116778346B (en) | Pipeline identification method and system based on improved self-attention mechanism | |
CN114494893B (en) | Remote sensing image feature extraction method based on semantic reuse context feature pyramid | |
CN115861595A (en) | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning | |
CN115661657A (en) | Lightweight unmanned ship target detection method | |
CN113569650A (en) | Unmanned aerial vehicle autonomous inspection positioning method based on electric power tower label identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||