CN110796009A - Method and system for detecting marine vessel based on multi-scale convolution neural network model - Google Patents



Publication number
CN110796009A
CN110796009A (application CN201910930804.4A)
Authority
CN
China
Prior art keywords
layer
ship
image
convolution
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910930804.4A
Other languages
Chinese (zh)
Inventor
王平
李明
雷建胜
赵光辉
安玉拴
金明磊
李超
陈浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Star Technology Co Ltd
Space Star Technology Co Ltd
Original Assignee
Aerospace Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Star Technology Co Ltd filed Critical Aerospace Star Technology Co Ltd
Priority to CN201910930804.4A
Publication of CN110796009A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The invention provides a method and a system for detecting marine ships based on a multi-scale convolutional neural network model. The method comprises the following steps: constructing a ship image sample library by acquiring visible-light video of ships in coastal areas from an unmanned-aerial-vehicle platform, extracting every frame, and annotating the ground-truth position, length and width of each ship; then enhancing the data through digital image processing algorithms such as inversion (flipping) and scaling; constructing a multilayer convolutional neural network as the ship target detector, and inputting the processed images as sample data into the deep learning network to obtain the convolved feature maps; and constructing a multi-scale convolutional neural unit, performing feature fusion on the multilayer convolutional feature maps, and training against the annotated ship positions to obtain the trained model. By adopting multi-scale fusion, the invention maintains detection accuracy while reducing training difficulty.

Description

Method and system for detecting marine vessel based on multi-scale convolution neural network model
Technical Field
The invention belongs to the technical field of ship digital image processing, and particularly relates to a method and a system for detecting a marine ship based on a multi-scale convolutional neural network model.
Background
In modern society, surveillance cameras are everywhere, and if their footage is inspected only by human eyes, abnormal events in the video are easily missed. With the rapid development of computer networking, communication and semiconductor technologies, there is growing interest in analyzing video images with computer vision, in place of human eyes, to extract useful information from them. Target detection is a key topic of computer vision research; its main function is to extract the position of targets of interest in an image, along with other information. Target detection underlies many video applications and is a necessity for traffic monitoring, intelligent robots and human-machine interaction; it plays an important role in intelligent city management, fighting crime, and the construction of safe and smart cities, and remains a key difficulty of current video-processing research. For ship targets in particular, it plays a crucial role in the ship management, supervision and scheduling of coastal cities.
In the patent application entitled "Ship detection method and system based on scene multi-dimensional features" (application No. 201711311822.1, publication No. CN107818326B), the inventor provided a ship detection method and system in which all edges of each frame are extracted as a fourth image channel; the coastline is extracted so that the sea-surface area becomes the region where ships can appear; a Faster Region-based Convolutional Neural Network (Faster R-CNN) is constructed as the deep learning network and sample data is input into it; a Region Proposal Network (RPN) generates region proposal boxes of different sizes within the ship-appearance region using a sliding window, these are combined with the deep learning network, and the model is trained against the real ship positions; the trained model then detects ships in the part of the test image between the coastlines. The disadvantages of that method are that the Hough transform segments only straight coastlines well, so segmentation robustness is poor, and that Faster R-CNN-style networks require the default anchor-box sizes to be specified manually in advance, which makes training difficult and the method hard to adapt to ships of various scales.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method and a system for detecting a marine vessel based on a multi-scale convolutional neural network model.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a marine vessel detection method based on a multi-scale convolutional neural network model comprises the following steps:
step 1, constructing a ship image sample library: collecting visible-light video of ships in coastal areas, extracting each frame, annotating the ground-truth position, length and width of each ship, and then enhancing the image data through a digital image processing algorithm;
step 2, constructing a multilayer convolutional neural network as a ship target detector, inputting the processed image obtained in the step 1 into a deep learning network as sample data, and obtaining a feature map after convolution;
and 3, constructing a multi-scale convolution neural unit, performing feature fusion on the multilayer convolution feature maps based on the feature maps obtained in the step 2 after convolution, and training according to the real position of the ship to further obtain a training model.
As a further preferable solution of the method for detecting a marine vessel based on the multi-scale convolutional neural network model of the present invention, in step 1, the scaling operation of the data enhancement process is implemented as follows:
step 1.1, scaling the image at its original aspect ratio until the longest edge equals 500 pixels;
step 1.2, filling the short edge (less than 500 pixels) with gray to form a picture with a side length of 500 pixels;
and step 1.3, if the longest edge of the original picture is already less than 500 pixels, performing the gray padding directly, without scaling.
As a further preferable aspect of the method for detecting a marine vessel based on the multi-scale convolutional neural network model of the present invention, in step 2,
the network structure of the constructed multilayer convolutional neural network consists of 62 convolutional layers and 5 pooling layers;
for an ordinary convolutional layer, a convolution kernel with learnable values convolves the feature maps of the previous layer, and the output feature map is obtained through an activation function:

x_j^l = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big)

where M_j represents the set of selected input maps, i is the index of an input-map unit, j is the index of an output-map unit, k_{ij}^l represents the weight between the i-th input map and the j-th output map, b_j^l represents the activation bias of the j-th output map, f(\cdot) represents the activation function of that output layer, x_j^l represents the j-th output map of layer l, and x_i^{l-1} represents the i-th input map of layer l-1; the pooling layer comprises N input maps and N output maps:

x_j^l = f\big(\beta_j^l \cdot \mathrm{down}(x_j^{l-1}) + b_j^l\big)

where \mathrm{down}(\cdot) represents a down-sampling function, typically summing all pixels in each n \times n region of the input image so that the output image is reduced by a factor of n in both dimensions; each output map corresponds to its own multiplicative bias \beta_j^l and additive bias b_j^l; x_j^l represents the j-th output map of layer l, and x_j^{l-1} represents the j-th input map of layer l-1;
for the output fully connected layer, several input feature maps are convolved and the convolved values are summed to obtain the output map; using a_{ij} to represent the weight or contribution of the i-th input map in the j-th output feature map, the j-th output map can be represented as:

x_j^l = f\Big(\sum_{i=1}^{N_{in}} a_{ij}\,(x_i^{l-1} * k_i^l) + b_j^l\Big)

and the weights a_{ij} need to satisfy the constraints:

\sum_i a_{ij} = 1, \qquad 0 \le a_{ij} \le 1

where N_{in} indicates the number of input feature maps, k_i^l represents the weight between the input and output maps, b_j^l indicates the activation bias between the layers, x_j^l represents the j-th output map of layer l, and x_i^{l-1} represents the i-th input map of layer l-1.
as a further preferable scheme of the method for detecting the marine vessel based on the multi-scale convolutional neural network model of the present invention, in step 3, the multi-scale feature fusion module specifically comprises:
step 3.1, extracting the feature maps output by convolutional layers 9, 12, 15, 17 and 19 of the whole neural network, and then splicing the 5 feature maps to form a new feature map; the new feature map is then fed into two multi-scale loss modules, the classification loss L_{cls} and the regression loss L_{reg}, which calculate the multi-scale loss function and update the network parameters; the multi-scale loss function during training is defined as follows:

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*)

the multi-scale loss function in training consists of the two modules L_{cls} and L_{reg}, where i represents the element number in the feature map; p_i indicates the probability that the i-th element in the feature map contains a target at the corresponding position of the original image; p_i^* indicates whether that position contains a target in the ground-truth label data: p_i^* = 1 means the i-th position in the feature map is a target, and p_i^* = 0 means it is not; t_i indicates the coordinate offset of the box corresponding to the i-th element in the feature map, and t_i^* represents the offset of the ground-truth box coordinates; N_{cls} and N_{reg} are the total number of target classes and the total number of box-coordinate offsets contained in the feature map; the classification loss L_{cls} uses the conventional softmax function, and the regression loss L_{reg} uses the smooth L1 loss:

L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

where the factor p_i^* means that the regression loss is calculated only for candidate boxes containing a target; \lambda is an adjustable parameter, set to 3 by default, to balance the unbalanced influence of the ratio between positive and negative samples in the data on the final loss function.
A ship detection system that builds a deep learning network model based on scene multi-dimensional features comprises the following modules:
the image acquisition module is used for constructing a ship image sample library, and comprises the steps of acquiring video data acquired by the unmanned aerial vehicle in the coastal region under visible light, extracting each frame of image and obtaining a ship position true value and length and width;
the data enhancement processing module is used for carrying out data enhancement processing on the data set, and comprises classic digital image processing algorithms such as inversion and scaling on the image;
the characteristic extraction module is used for constructing a multilayer convolutional neural network as a ship target detector, and inputting an image serving as sample data into the multilayer convolutional neural network after data enhancement to obtain a characteristic diagram;
and the training module is used for constructing a multi-scale convolution nerve unit, performing feature fusion on the plurality of feature maps, and training according to the real position of the ship in the data set to obtain a training result.
The technical scheme provided by the invention has the beneficial effects that:
(1) In real data, small and large ships of many kinds coexist at sea, and the large variation within the same target class is the main reason for a low detection recall rate. The invention fuses the features of multiple feature maps of the multilayer convolutional neural network, integrating the image detail information of the shallow layers with the macroscopic semantic information of the deep layers, and thereby improves the recall rate on small targets while preserving the original detection quality on large targets.
(2) The invention adds an image-data enhancement strategy to target detection, improving the robustness of the algorithm. The method still produces good detection results in complex scenes such as fog, overcast skies and rain. It can be used to improve the efficiency of ocean supervision, save supervision costs, and provide a scientific basis for the formulation of ocean management decisions, and therefore has important market value.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
fig. 2 is a schematic diagram of a network structure according to an embodiment of the present invention.
Detailed Description
The invention provides a ship detection method using a multi-scale feature-fusion convolutional neural network. First, an image sample library is constructed and the ship images are annotated to obtain calibrated image samples. The data is then enhanced through digital image processing algorithms such as inversion and scaling, and a deep learning network is constructed to convolve the images. A multilayer convolutional neural network is built as the ship target detector: the processed images are input as sample data into the deep learning network to obtain the convolved feature maps, a multi-scale convolutional neural unit is constructed, and feature fusion is performed on the multilayer convolutional feature maps; finally, the loss function over the proposal boxes is obtained using the ground-truth ship positions, the whole network is trained, and the trained model is output. The trained model is then used to detect ships in the test data. The method comprises five processes: sample-library construction, image-data enhancement, multi-scale feature fusion, deep-learning network training, and ship detection.
To illustrate the specific embodiments in detail, as shown in fig. 1, the flow of the example is as follows:
step a, constructing a ship image sample library; the data is enhanced through digital image processing algorithms such as inversion and scaling, expanding the amount of image data.
First, ship images are prepared; the data to be collected is mainly visible-light video shot by unmanned aerial vehicles over coastal areas. Each frame of the acquired video is extracted with a decoder or a script, and a sufficiently diverse ship image sample library is built from multiple videos. Each frame in the library is then annotated in advance to obtain the ground-truth position, length and width of each ship. Finally, the image data is enhanced through inversion and scaling, and the new images formed by inversion and scaling are added to the sample library to be annotated and trained together with the originals.
Wherein implementation details of scaling include:
(1) scaling the image to have its original aspect ratio with the longest side equal to 500 pixels;
(2) then, the short edges with the length less than 500 pixels are filled with gray, and finally a picture with the side length of 500 pixels is formed;
(3) if the longest edge of the original picture is less than 500 pixels, the gray padding operation is directly performed.
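The three rules above can be sketched as arithmetic on image dimensions. The helper below (a hypothetical name, not from the patent) computes the scaled size and the gray padding needed to reach a 500 x 500 canvas, assuming the actual resampling and gray fill are then done by any image library:

```python
def letterbox_500(w, h, side=500):
    """Compute the scaled size and gray padding for a side x side canvas.

    Rules (1)-(3): keep the aspect ratio and scale so the longest edge
    equals `side`; if the longest edge is already below `side`, skip the
    scaling and pad directly.
    """
    if max(w, h) >= side:            # rule (1): scale so the longest edge == side
        s = side / max(w, h)
        w, h = round(w * s), round(h * s)
    # rules (2)/(3): remaining gray padding per axis to reach side x side
    return w, h, side - w, side - h
```

For an 800 x 600 frame this yields a 500 x 375 image with 125 rows of gray padding; a 400 x 300 frame is left unscaled and padded to 500 x 500.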
And b, constructing a multilayer convolutional neural network as a ship target detector, and inputting the processed image obtained in the step a into the multilayer convolutional neural network as sample data to obtain a characteristic diagram.
The network structure of the built multilayer convolutional neural network consists of 62 convolutional layers and 5 maximum pooling layers.
For an ordinary convolutional layer, a convolution kernel with learnable values is used to convolve the feature maps of the previous layer, and the output feature map is then obtained through an activation function. Each output map may combine the convolutions of multiple input maps:

x_j^l = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big)

where M_j represents the set of selected input maps, i is the index of an input-map unit, j is the index of an output-map unit, k_{ij}^l represents the weight between the i-th input map and the j-th output map, b_j^l represents the activation bias of the j-th output map, f(\cdot) represents the activation function of that output layer, x_j^l represents the j-th output map of layer l, and x_i^{l-1} represents the i-th input map of layer l-1. For the pooling layer, there are N input maps and N output maps, except that each output map is smaller:

x_j^l = f\big(\beta_j^l \cdot \mathrm{down}(x_j^{l-1}) + b_j^l\big)

\mathrm{down}(\cdot) represents a down-sampling function; typically, all pixels in each n \times n region of the input image are summed, so that the output image is reduced by a factor of n in both dimensions. Each output map corresponds to its own multiplicative bias \beta_j^l and additive bias b_j^l; x_j^l represents the j-th output map of layer l, and x_j^{l-1} represents the j-th input map of layer l-1.
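A minimal NumPy sketch of the convolutional and pooling formulas above: one output map as an activated sum of "valid" convolutions over selected input maps, and sum-pooling over n x n blocks. The function names and the tanh activation are illustrative choices, not specified by the patent:

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D convolution (cross-correlation, as is conventional in
    CNNs) of one input map x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

def conv_output_map(xs, ks, b, f=np.tanh):
    """x_j^l = f( sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j^l )."""
    return f(sum(conv2d_valid(x, k) for x, k in zip(xs, ks)) + b)

def sum_pool(x, n, beta=1.0, b=0.0, f=lambda z: z):
    """x_j^l = f( beta_j^l * down(x_j^{l-1}) + b_j^l ),
    where down() sums each n x n block of the input map."""
    H, W = x.shape
    d = x[:H - H % n, :W - W % n].reshape(H // n, n, W // n, n).sum(axis=(1, 3))
    return f(beta * d + b)
```

As the text states, `sum_pool` shrinks both dimensions by a factor of n: a 4 x 4 map of ones pooled with n = 2 becomes a 2 x 2 map whose entries are the block sums, 4.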
For the output fully connected layer, it is often better to convolve the multiple input feature maps and then sum the convolved values to obtain the output map. The embodiment of the invention uses a_{ij} to indicate the weight or contribution of the i-th input map in obtaining the j-th output feature map. Thus, the j-th output map can be represented as:

x_j^l = f\Big(\sum_{i=1}^{N_{in}} a_{ij}\,(x_i^{l-1} * k_i^l) + b_j^l\Big)

and the weights a_{ij} need to satisfy the constraints:

\sum_i a_{ij} = 1, \qquad 0 \le a_{ij} \le 1

where N_{in} indicates the number of input feature maps, k_i^l represents the weight between the input and output maps, b_j^l indicates the activation bias between the layers, x_j^l represents the j-th output map of layer l, and x_i^{l-1} represents the i-th input map of layer l-1.
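The constrained weighted combination above can be sketched as follows, taking the already-convolved maps x_i * k_i as given. Producing the weights a_ij with a softmax is one convenient parameterisation that the patent does not prescribe; it simply guarantees the stated constraints (non-negative weights summing to 1) by construction:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: non-negative weights summing to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def weighted_output_map(conv_maps, alpha, b=0.0, f=np.tanh):
    """x_j^l = f( sum_i a_ij (x_i^{l-1} * k_i^l) + b_j^l ).

    `conv_maps` are the already-convolved input maps x_i * k_i;
    `alpha` are unconstrained scores turned into weights a_ij by softmax,
    enforcing sum_i a_ij = 1 and 0 <= a_ij <= 1 automatically.
    """
    a = softmax(np.asarray(alpha, dtype=float))
    return f(sum(w * m for w, m in zip(a, conv_maps)) + b)
```

With two constant maps of 1 and 3 and equal scores, the weights become 0.5 each and the pre-activation output is the plain average, 2.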
And c, constructing a multi-scale convolution nerve training unit, performing feature fusion on a plurality of feature maps based on the plurality of feature maps obtained in the step b, and finally training according to the real position of the ship obtained in the step a to obtain a training model.
As shown in fig. 2, the workflow of the multi-scale feature fusion module is as follows:
Firstly, the feature maps output by convolutional layers 9, 12, 15, 17 and 19 of the whole neural network are extracted, and the 5 feature maps are spliced to form a new feature map; the new feature map is then fed into two multi-scale loss modules, the classification loss L_{cls} and the regression loss L_{reg}, which calculate the multi-scale loss function and update the network parameters. The multi-scale loss function during training is defined as follows:
L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*)

The function is composed of the two loss modules L_{cls} and L_{reg}. Here i represents the element number in the feature map; p_i indicates the probability that the i-th element of the feature map contains a target at the corresponding position of the original image; p_i^* indicates whether that position contains a target in the ground-truth label data: p_i^* = 1 means the i-th position in the feature map is a target, and p_i^* = 0 means it is not. t_i indicates the coordinate offset of the box corresponding to the i-th element of the feature map, and t_i^* indicates the offset of the ground-truth box coordinates. N_{cls} and N_{reg} are the total number of target classes and the total number of box-coordinate offsets contained in the feature map. The classification loss L_{cls} uses the conventional softmax function. The regression loss L_{reg} uses the smooth L1 loss:

L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

The factor p_i^* indicates that the regression loss is calculated only for candidate boxes containing a target. \lambda is an adjustable parameter, set to 3 by default, to balance the unbalanced influence of the ratio between positive and negative samples in the data on the final loss function.
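The multi-scale loss described above can be sketched numerically as follows. Binary cross-entropy stands in here for the two-class softmax loss, the normalisers are taken as the element and box counts, and the function names and toy shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def smooth_l1(x):
    """smooth_L1(x) = 0.5 x^2 if |x| < 1, else |x| - 0.5 (elementwise)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def multiscale_loss(p, p_star, t, t_star, lam=3.0, eps=1e-9):
    """L = 1/N_cls * sum_i L_cls(p_i, p_i*) + lam/N_reg * sum_i p_i* L_reg(t_i, t_i*)."""
    p, p_star = np.asarray(p, float), np.asarray(p_star, float)
    t, t_star = np.asarray(t, float), np.asarray(t_star, float)
    # classification term: cross-entropy averaged over all elements
    l_cls = -np.mean(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    # regression term: smooth L1 on box offsets, counted only where p_i* = 1
    per_box = smooth_l1(t - t_star).sum(axis=1)
    l_reg = np.sum(p_star * per_box) / len(t)
    return l_cls + lam * l_reg
```

When no element is a positive sample (all p_i* = 0), the p_i* factor zeroes out the regression term and only the classification term remains, exactly as the formula requires.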
So far, the detailed implementation process of the marine vessel detection method based on the multi-scale feature fusion convolutional neural network used in the embodiment of the application is introduced. In specific implementation, the method provided by the invention can realize automatic operation flow based on software technology, and can also realize a corresponding system in a modularized mode.
The embodiment of the invention also provides a ship detection system for constructing a deep learning network model based on scene multi-dimensional features, which comprises the following modules:
the image acquisition module is used for constructing a ship image sample library, and comprises the steps of acquiring video data acquired by the unmanned aerial vehicle in the coastal region under visible light, extracting each frame of image and obtaining a ship position true value and length and width;
the data enhancement processing module is used for carrying out data enhancement processing on the data set, and comprises classic digital image processing algorithms such as inversion and scaling on the image;
the characteristic extraction module is used for constructing a multilayer convolutional neural network as a ship target detector, and inputting an image serving as sample data into the multilayer convolutional neural network after data enhancement to obtain a characteristic diagram;
and the training module is used for constructing a multi-scale convolution nerve unit, performing feature fusion on the plurality of feature maps, and training according to the real position of the ship in the data set to obtain a training result. The specific implementation of each module can refer to the corresponding step, and the detailed description of the invention is omitted. The specific examples described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made or substituted in a similar manner to the specific embodiments described herein by those skilled in the art without departing from the spirit of the invention or exceeding the scope thereof as defined in the appended claims.

Claims (5)

1. A method for detecting a marine vessel based on a multi-scale convolutional neural network model is characterized by comprising the following steps:
step 1, constructing a ship image sample library: collecting visible-light video of ships in coastal areas, extracting each frame, obtaining the ground-truth position, length and width of each ship, and enhancing the image data through a digital image processing algorithm;
step 2, constructing a multilayer convolutional neural network as a ship target detector, inputting the processed image obtained in the step 1 into a deep learning network as sample data, and obtaining a feature map after convolution;
and 3, constructing a multi-scale convolution neural unit, performing feature fusion on the multilayer convolution feature maps based on the feature maps obtained in the step 2 after convolution, and training according to the real position of the ship to further obtain a training model.
2. The method of claim 1, wherein the method comprises the following steps: in step 1, the data enhancement process includes:
step 1.1, scaling the image at its original aspect ratio until the longest edge equals 500 pixels;
step 1.2, filling the short edge (less than 500 pixels) with gray to form a picture with a side length of 500 pixels;
and step 1.3, if the longest edge of the original picture is less than 500 pixels, performing the gray padding directly.
3. The method of claim 1, wherein the method comprises the following steps: in the step 2, the process is carried out,
the network structure of the constructed multilayer convolutional neural network consists of 62 convolutional layers and 5 pooling layers;
for an ordinary convolutional layer, convolving the feature maps of the previous layer with a convolution kernel with learnable values, and obtaining the output feature map through an activation function:

x_j^l = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big)

where M_j represents the set of selected input maps, i is the index of an input-map unit, j is the index of an output-map unit, k_{ij}^l represents the weight between the i-th input map and the j-th output map, b_j^l represents the activation bias of the j-th output map, f(\cdot) represents the activation function of that output layer, x_j^l represents the j-th output map of layer l, and x_i^{l-1} represents the i-th input map of layer l-1; the pooling layer comprises N input maps and N output maps:

x_j^l = f\big(\beta_j^l \cdot \mathrm{down}(x_j^{l-1}) + b_j^l\big)

where \mathrm{down}(\cdot) represents a down-sampling function that sums all pixels in each n \times n region of the input image, so that the output image is reduced by a factor of n in both dimensions; each output map corresponds to its own multiplicative bias \beta_j^l and additive bias b_j^l, x_j^l representing the j-th output map of layer l and x_j^{l-1} the j-th input map of layer l-1;
for the output fully connected layer, convolving a plurality of input feature maps and summing the convolved values to obtain the output map; with a_{ij} representing the weight or contribution of the i-th input map in the j-th output feature map, the j-th output map can be represented as:

x_j^l = f\Big(\sum_{i=1}^{N_{in}} a_{ij}\,(x_i^{l-1} * k_i^l) + b_j^l\Big)

and a_{ij} needs to satisfy the constraints:

\sum_i a_{ij} = 1, \qquad 0 \le a_{ij} \le 1

where N_{in} indicates the number of input feature maps, k_i^l represents the weight between the input and output maps, b_j^l indicates the activation bias between the layers, x_j^l represents the j-th output map of layer l, and x_i^{l-1} represents the i-th input map of layer l-1.
4. The method according to claim 1, wherein in step 3 the feature fusion comprises:

step 3.1, extracting the feature maps output by convolutional layers 9, 12, 15, 17 and 19 of the whole neural network, then splicing the 5 feature maps to form a new feature map; the new feature map is then fed into the two multi-scale loss-function modules $L_{cls}$ and $L_{reg}$ to calculate the multi-scale loss function and update the network parameters; the multi-scale loss function during training is defined as follows:

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^{*}) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^{*}\,L_{reg}(t_i, t_i^{*})$$

The multi-scale loss function in training consists of the two modules $L_{cls}$ and $L_{reg}$, wherein $i$ denotes the element index in the feature map; $p_i$ denotes the probability that the position in the original image corresponding to the i-th element of the feature map contains a target; $p_i^{*}$ denotes whether that position contains a target in the ground-truth label data: if $p_i^{*} = 1$, the i-th position in the feature map is a target, and if $p_i^{*} = 0$, it is not; $t_i$ denotes the coordinate offset of the frame in the original image corresponding to the i-th element of the feature map, and $t_i^{*}$ denotes the offset of the ground-truth frame coordinates; $N_{cls}$ and $N_{reg}$ denote, respectively, the total number of target classes and the total number of frame-coordinate offsets contained in the feature map. The classification loss function $L_{cls}$ uses the conventional softmax function, and the regression loss function $L_{reg}$ uses the smooth-L1 loss:

$$L_{reg}(t_i, t_i^{*}) = \mathrm{smooth}_{L1}(t_i - t_i^{*}),\qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

wherein the factor $p_i^{*}$ means that the regression loss function is calculated only for candidate boxes that contain a target; $\lambda$ is an adjustable parameter, set to 3 by default, to balance the effect of the imbalance between positive and negative samples in the data on the final loss function.
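To make the loss concrete, here is a hedged NumPy sketch of the multi-scale loss above, assuming binary cross-entropy as a stand-in for the two-class softmax classification loss; `smooth_l1` and `multiscale_loss` are illustrative names, not from the patent.

```python
import numpy as np

def smooth_l1(x):
    """smooth-L1 loss: 0.5*x^2 where |x| < 1, |x| - 0.5 elsewhere (elementwise)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x * x, np.abs(x) - 0.5)

def multiscale_loss(p, p_star, t, t_star, lam=3.0):
    """(1/N_cls) sum_i L_cls(p_i, p_i*) + lam * (1/N_reg) sum_i p_i* L_reg(t_i, t_i*)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0 - 1e-12)
    p_star = np.asarray(p_star, dtype=float)
    # classification term: cross-entropy over object vs. background
    l_cls = -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p)).mean()
    # regression term: smooth-L1 over positives only (p_i* = 1), as in the claim
    n_pos = max(p_star.sum(), 1.0)
    diff = np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float)
    l_reg = (p_star[:, None] * smooth_l1(diff)).sum() / n_pos
    return l_cls + lam * l_reg

print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```

With perfectly predicted box offsets the regression term vanishes and only the classification cross-entropy remains, which is the behaviour the $p_i^{*}$ gating in the claim is meant to produce.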
5. A ship detection system for building a deep learning network model based on scene multi-dimensional features according to any one of claims 1 to 4, the system comprising:
the image acquisition module, used for constructing a ship image sample library, including acquiring video data captured by an unmanned aerial vehicle over a coastal region under visible light, extracting each frame of image, and obtaining the ground-truth ship position, length and width;
the data enhancement processing module, used for performing data enhancement on the data set, including classic digital image processing operations on the images such as flipping and scaling;
the feature extraction module, used for constructing a multilayer convolutional neural network as the ship target detector and inputting the data-enhanced images, as sample data, into the multilayer convolutional neural network to obtain feature maps;
and the training module, used for constructing multi-scale convolution neural units, performing feature fusion on the plurality of feature maps, and training against the real ship positions in the data set to obtain the training result.
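A small sketch of the flip-and-scale data enhancement that the module above describes, assuming boxes are stored as (x, y, w, h) measured from the top-left image corner; the helper names are illustrative, not from the patent.

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Horizontally flip an HxW(xC) image and mirror its (x, y, w, h) boxes."""
    h, w = img.shape[:2]
    flipped = img[:, ::-1].copy()
    # a box at horizontal offset x moves to w - x - box_width after the flip
    out = [(w - x - bw, y, bw, bh) for (x, y, bw, bh) in boxes]
    return flipped, out

def scale_boxes(boxes, sx, sy):
    """Rescale (x, y, w, h) boxes when the image is resized by (sx, sy)."""
    return [(x * sx, y * sy, bw * sx, bh * sy) for (x, y, bw, bh) in boxes]

img = np.arange(12).reshape(3, 4)   # toy 3x4 "image"
boxes = [(0, 1, 2, 1)]              # one box at the left edge
f_img, f_boxes = hflip_with_boxes(img, boxes)
print(f_boxes)  # [(2, 1, 2, 1)] -- the box moves to the right edge
```

Keeping the ground-truth boxes consistent with each geometric transform is the essential point: augmenting the pixels without transforming the labels would corrupt the training signal.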
CN201910930804.4A 2019-09-29 2019-09-29 Method and system for detecting marine vessel based on multi-scale convolution neural network model Pending CN110796009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910930804.4A CN110796009A (en) 2019-09-29 2019-09-29 Method and system for detecting marine vessel based on multi-scale convolution neural network model


Publications (1)

Publication Number Publication Date
CN110796009A true CN110796009A (en) 2020-02-14

Family

ID=69438678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910930804.4A Pending CN110796009A (en) 2019-09-29 2019-09-29 Method and system for detecting marine vessel based on multi-scale convolution neural network model

Country Status (1)

Country Link
CN (1) CN110796009A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814536A (en) * 2020-05-21 2020-10-23 闽江学院 Breeding monitoring method and device
CN111967305A (en) * 2020-07-01 2020-11-20 华南理工大学 Real-time multi-scale target detection method based on lightweight convolutional neural network
CN112016542A (en) * 2020-05-08 2020-12-01 珠海欧比特宇航科技股份有限公司 Urban waterlogging intelligent detection method and system
CN112036404A (en) * 2020-08-31 2020-12-04 上海大学 Target detection method and system for offshore ship
CN112085001A (en) * 2020-09-23 2020-12-15 清华大学苏州汽车研究院(相城) Tunnel recognition model and method based on multi-scale edge feature detection
CN112183463A (en) * 2020-10-23 2021-01-05 珠海大横琴科技发展有限公司 Ship identification model verification method and device based on radar image
CN112183232A (en) * 2020-09-09 2021-01-05 上海鹰觉科技有限公司 Ship board number position positioning method and system based on deep learning
CN112285712A (en) * 2020-10-15 2021-01-29 电子科技大学 Method for improving detection precision of ship on shore in SAR image
CN112464765A (en) * 2020-09-10 2021-03-09 天津师范大学 Safety helmet detection algorithm based on single-pixel characteristic amplification and application thereof
CN112986210A (en) * 2021-02-10 2021-06-18 四川大学 Scale-adaptive microbial Raman spectrum detection method and system
CN113807386A (en) * 2021-07-21 2021-12-17 广东工业大学 Target detection method and system fusing multi-scale information and computer equipment
CN114720957A (en) * 2022-06-08 2022-07-08 中国人民解放军空军预警学院 Radar target detection method and system and storable medium
CN114943888A (en) * 2022-03-24 2022-08-26 中国人民解放军海军大连舰艇学院 Sea surface small target detection method based on multi-scale information fusion, electronic equipment and computer readable medium
CN116051548A (en) * 2023-03-14 2023-05-02 中国铁塔股份有限公司 Positioning method and device
CN116883913A (en) * 2023-09-05 2023-10-13 长江信达软件技术(武汉)有限责任公司 Ship identification method and system based on video stream adjacent frames

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818326A (en) * 2017-12-11 2018-03-20 珠海大横琴科技发展有限公司 A kind of ship detection method and system based on scene multidimensional characteristic
CN109345508A (en) * 2018-08-31 2019-02-15 北京航空航天大学 A kind of Assessing Standards For Skeletal method based on two stages neural network
CN109522963A (en) * 2018-11-26 2019-03-26 北京电子工程总体研究所 A kind of the feature building object detection method and system of single-unit operation
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A kind of image small target detecting method combined based on hierarchical detection
CN109740515A (en) * 2018-12-29 2019-05-10 科大讯飞股份有限公司 One kind reading and appraising method and device
CN110110783A (en) * 2019-04-30 2019-08-09 天津大学 A kind of deep learning object detection method based on the connection of multilayer feature figure


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Yun et al., "Research on multi-scale multi-person target detection method based on deep learning", Computer Engineering and Applications *
GU Yafeng, "Convolutional neural networks for image content retrieval", China Master's Theses Full-text Database, Information Science and Technology *


Similar Documents

Publication Publication Date Title
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN110310241B (en) Method for defogging traffic image with large air-light value by fusing depth region segmentation
CN109871798A (en) A kind of remote sensing image building extracting method based on convolutional neural networks
CN111046880A (en) Infrared target image segmentation method and system, electronic device and storage medium
CN111832443B (en) Construction method and application of construction violation detection model
CN117078943B (en) Remote sensing image road segmentation method integrating multi-scale features and double-attention mechanism
CN109919223B (en) Target detection method and device based on deep neural network
CN115205264A (en) High-resolution remote sensing ship detection method based on improved YOLOv4
CN110490155B (en) Method for detecting unmanned aerial vehicle in no-fly airspace
CN113160062A (en) Infrared image target detection method, device, equipment and storage medium
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN113420794B (en) Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
CN114359245A (en) Method for detecting surface defects of products in industrial scene
CN116071676A (en) Infrared small target detection method based on attention-directed pyramid fusion
CN115527096A (en) Small target detection method based on improved YOLOv5
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN117372829A (en) Marine vessel target identification method, device, electronic equipment and readable medium
CN111950476A (en) Deep learning-based automatic river channel ship identification method in complex environment
CN116977866A (en) Lightweight landslide detection method
CN116452900A (en) Target detection method based on lightweight neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200214