CN114627415A - Ship detection method and system based on adaptive data enhancement - Google Patents

Ship detection method and system based on adaptive data enhancement

Info

Publication number
CN114627415A
Authority
CN
China
Prior art keywords
picture
ship
feature
loss
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210249886.8A
Other languages
Chinese (zh)
Inventor
谢晓华
秦潼
赖剑煌
周华君
叶标华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210249886.8A
Publication of CN114627415A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a ship detection method and system based on adaptive data enhancement. The method comprises the following steps: acquiring a picture data set through a camera and performing data enhancement processing; performing picture feature extraction processing on the enhanced picture based on a neural network model; performing multi-scale dimension attention calculation on the feature picture to obtain a feature picture with multi-scale dimensions; performing loss calculation on the feature picture with multi-scale dimensions through a loss function to obtain a final loss value; updating the neural network model according to the final loss value and constructing a ship detection model; and detecting the picture to be detected based on the ship detection model to obtain a detection result. The method is based on visible-light video detection technology; by constructing the ship detection model it improves the perception of the ship target and can improve ship detection accuracy in special detection environments. The ship detection method and system based on adaptive data enhancement can be widely applied in the technical field of ship detection.

Description

Ship detection method and system based on adaptive data enhancement
Technical Field
The invention relates to the technical field of ship detection, in particular to a ship detection method and system based on adaptive data enhancement.
Background
Marine transportation is an important link in international logistics. With the continuous development of China's economy, shipping traffic keeps increasing, and various problems arise, such as maritime violations and illegal transportation. Therefore, in order to ensure the continuity and timeliness of supervision at major ports and navigation channels, intelligent ship detection technology must be introduced.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a ship detection method and system based on adaptive data enhancement, which use visible-light video detection technology and introduce multi-scale dimension attention calculation and an aspect ratio loss function to improve the ship detection algorithm, so as to improve ship detection accuracy in a special detection environment.
The first technical scheme adopted by the invention is as follows: the ship detection method based on adaptive data enhancement comprises the following steps:
acquiring a picture data set through a camera and performing data enhancement processing to obtain an enhanced picture;
based on the neural network model, carrying out picture feature extraction processing on the enhanced picture to obtain a feature picture;
performing multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions;
performing loss calculation on the characteristic picture with multi-scale dimensionality through a loss function to obtain a final loss value;
updating the neural network model according to the final loss value, and constructing a ship detection model;
and detecting the picture to be detected based on the ship detection model to obtain a detection result.
Further, the step of obtaining a picture data set through a camera and performing data enhancement processing to obtain an enhanced picture specifically includes:
acquiring a picture data set through a camera;
extracting pictures according to the picture data set to obtain a first picture and a second picture;
marking and cutting the ship image in the first picture to obtain an enhanced material;
and pasting the enhanced material to the second picture according to a preset rule to obtain an enhanced picture.
Further, the step of labeling and cutting the ship image in the first picture to obtain the enhanced material specifically includes:
labeling the ship image in the first picture according to a ship detection frame to obtain a ship image with a label, wherein the ship detection frame is provided with a detection label;
cutting the marked ship image to obtain a cut image;
judging whether the cut image is cut off or not according to the detection label of the ship detection frame to obtain the cut image which is cut off and the cut image which is not cut off;
and integrating the cut-off image and the non-cut-off image to obtain the enhanced material.
Further, the step of pasting the enhancement material to the second picture according to a preset rule to obtain the enhanced picture specifically includes:
the preset rule comprises selecting a pasting area in the second picture, carrying out scaling processing on the enhancement materials and judging the pasting position of the enhancement materials;
judging that the enhancement material is a truncated cut image, and selecting an edge of the second picture for pasting to obtain the edge target pasting position;
judging that the enhancement material is a non-truncated cut image, and randomly selecting a position on the second picture to obtain the center point for pasting the target;
judging that the size of the enhanced material is larger than a preset size, and reducing the enhanced material;
and when the cross ratio value of the pasting position of the enhancement material is judged to be larger than or equal to the preset value, reselecting the pasting position of the enhancement material, and pasting the enhancement material to the second picture to obtain the enhanced picture until the cross ratio value of the pasting position of the enhancement material is judged to be smaller than the preset value.
Further, the step of performing image feature extraction processing on the enhanced image based on the neural network model to obtain a feature image specifically includes:
the neural network model comprises four stage layers, each stage layer comprises a plurality of residual blocks, and each residual block comprises a first convolution layer, a second convolution layer, a first batch of normalization layers, a second batch of normalization layers, a linear rectification layer and a down-sampling layer;
and performing feature extraction processing on the enhanced picture through convolution calculation to obtain a feature picture.
Further, the step of performing multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions specifically includes:
performing channel dimension lifting processing on the feature picture based on an extended dimension method to obtain a multi-dimensional feature picture;
calculating an attention value of the multi-dimensional feature picture based on a channel weighting method to obtain a multi-scale feature picture;
and performing fusion processing on the multi-scale feature picture based on a dimension fusion method to obtain the feature picture with multi-scale dimensions.
Further, the step of performing loss calculation on the feature picture with multi-scale dimensions through a loss function to obtain a final loss value specifically includes:
the loss functions include a thermodynamic loss function, a center point offset value loss function, a length-width prediction loss function, and an aspect ratio loss function;
respectively performing loss calculation on the characteristic picture with multi-scale dimensionality through a thermodynamic diagram loss function, a central point deviation value loss function, a length-width prediction loss function and an aspect ratio loss function to obtain a thermodynamic diagram loss value, a central point deviation loss value, an aspect ratio prediction loss value and an aspect ratio loss value;
and integrating the thermodynamic diagram loss value, the central point offset loss value, the aspect ratio prediction loss value and the aspect ratio loss value to obtain a final loss value.
Further, the length and width prediction loss function is expressed as follows:
L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{s}_k - s_k\right|
In the above formula, L_size represents the length-width prediction loss value, ŝ_k indicates the predicted length and width, and s_k represents the real label.
Further, the aspect ratio loss function is represented as follows:
L_{Ratio} = \frac{1}{N}\sum_{x,y,c}\left|\hat{H}_{xyc}\,W_{xyc} - \hat{W}_{xyc}\,H_{xyc}\right|
In the above formula, L_Ratio represents the aspect ratio loss value, x and y represent coordinate points on the thermodynamic diagram, c represents the target class, Ĥ_xyc indicates the predicted length of the detected target, Ŵ_xyc indicates the predicted width of the detected target, H_xyc represents the true length of the detected target, and W_xyc represents the true width of the detected target.
The second technical scheme adopted by the invention is as follows: a ship detection system based on adaptive data enhancement comprises:
the enhancement module is used for acquiring a picture data set through a camera and performing data enhancement processing to obtain an enhanced picture;
the characteristic extraction module is used for extracting the image characteristics of the enhanced image based on the neural network model to obtain a characteristic image;
the multi-scale dimension calculation module is used for carrying out multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions;
the loss calculation module is used for performing loss calculation on the characteristic picture with the multi-scale dimensionality through a loss function to obtain a final loss value;
the updating module is used for updating the neural network model according to the final loss value and constructing a ship detection model;
and the detection module is used for detecting the picture to be detected based on the ship detection model to obtain a detection result.
The method and system have the following beneficial effects: the method applies visible-light video detection technology; multi-scale dimension attention calculation enhances the neural network model's perception of the ship detection target; and the loss function based on aspect ratio consistency allows the model to predict the aspect ratio information of the ship to be detected, further improving the model's prediction precision and thereby improving ship detection accuracy in special detection environments.
Drawings
FIG. 1 is a flow chart of the steps of the adaptive data enhancement based ship detection method of the present invention;
FIG. 2 is a block diagram of the structure of the ship detection system based on adaptive data enhancement;
FIG. 3 is a schematic illustration of the adaptive data enhancement of the present invention;
FIG. 4 is a schematic diagram of the deep neural network model structure of the present invention;
FIG. 5 is a schematic diagram of a backbone network architecture to which the present invention is applied;
FIG. 6 is a block diagram of a multi-scale dimensional attention calculation module of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
The method relies on a visible light video detection technology, improves a ship detection algorithm by introducing multi-scale dimension attention calculation and an aspect ratio loss function, and can improve the ship detection precision in a special detection environment.
Referring to fig. 1, the invention provides a ship detection method based on adaptive data enhancement, which comprises the following steps:
s1, acquiring a picture data set through a camera and performing data enhancement processing to obtain an enhanced picture;
s11, acquiring a picture data set through a camera;
s12, extracting pictures according to the picture data set to obtain a first picture and a second picture;
Specifically, referring to fig. 3, two pictures, I_1 and I_2, are first randomly extracted from the picture data set, and the enhancement material is then obtained from picture I_1.
S13, labeling and cutting the ship image in the first picture to obtain an enhanced material;
s131, performing annotation processing on the ship image in the first picture according to a ship detection frame to obtain a ship image with an annotation, wherein the ship detection frame is provided with a detection label;
s132, cutting the marked ship image to obtain a cut image;
s133, judging whether the cut image is cut off or not according to the detection label of the ship detection frame to obtain the cut image which is cut off and the cut image which is not cut off;
and S134, integrating the cut-off image and the non-cut-off image to obtain the enhanced material.
In particular, in picture I_1 the annotated ship detection boxes are G = {g_1, g_2, …, g_n}, with each box represented by four variables g_i = (x_i,min, y_i,min, x_i,max, y_i,max), where (x_i,min, y_i,min) indicates the upper-left corner of the corresponding detection box and (x_i,max, y_i,max) indicates its lower-right corner. One target among all the targets in the picture is then selected at random and cropped directly according to its annotated coordinates to obtain a cut image. Whether the target is truncated (cut off) is then judged by the following rule: if (x_i,min, y_i,min) or (x_i,max, y_i,max) lies on the image border, the target is considered truncated; otherwise it is not. Finally, the truncated cut images and the non-truncated cut images are integrated to obtain the enhancement material.
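For illustration, a minimal Python sketch of this cropping and truncation check is given below; the function name, the NumPy image representation and the exact border test are assumptions made for illustration and are not part of the original disclosure.

```python
import random
import numpy as np

def extract_enhancement_material(img1: np.ndarray, boxes):
    """Randomly crop one annotated ship from picture I_1 and flag whether it is truncated.

    boxes: list of (x_min, y_min, x_max, y_max) annotations g_i on img1.
    Returns (crop, truncated), where truncated is True if a box corner lies on the image border.
    """
    h, w = img1.shape[:2]
    x_min, y_min, x_max, y_max = random.choice(boxes)   # randomly select one target
    crop = img1[y_min:y_max, x_min:x_max].copy()        # cut directly along the annotation
    truncated = x_min <= 0 or y_min <= 0 or x_max >= w - 1 or y_max >= h - 1
    return crop, truncated
```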
And S14, pasting the enhancement material to the second picture according to a preset rule to obtain an enhanced picture.
S141, the preset rule comprises selecting a pasting area in the second picture, zooming the enhancement material and judging the pasting position of the enhancement material;
S142, judging that the enhancement material is a truncated cut image, and selecting an edge of the second picture for pasting to obtain the edge target pasting position;
S143, judging that the enhancement material is a non-truncated cut image, and randomly selecting a position on the second picture to obtain the center point for pasting the target;
Specifically, the center point for pasting the target is selected at random, generally from the middle-to-lower part of the whole picture, i.e., between 1/3 and 3/4 of the picture. A truncated target is pasted to the corresponding edge, while other targets are pasted to a randomly selected area.
S144, judging that the size of the enhanced material is larger than the preset size, and reducing the enhanced material;
Specifically, the size of the enhancement material is judged. Enhancement material of smaller size is not reduced before pasting, because an overly small target loses its target features severely and harms the training effect; the enhancement material is reduced only when its size is judged to be larger than the preset size.
And S145, when the cross ratio value of the pasting position of the enhancement material is judged to be larger than or equal to the preset value, reselecting the pasting position of the enhancement material, and pasting the enhancement material to the second picture to obtain the enhanced picture until the cross ratio value of the pasting position of the enhancement material is judged to be smaller than the preset value.
Specifically, since the enhancement material itself may carry some background information from its source picture, random pasting may affect the original targets of the picture to be trained. Before pasting, the intersection ratio (IoU) between the target pasting position and every target position of the picture is therefore calculated; if the sum of these values is less than a certain threshold, the next operation is performed, otherwise the pasting position is reselected. If the number of reselections exceeds the preset maximum, data enhancement of this picture is abandoned. When the intersection ratio of the pasting position of the enhancement material is judged to be smaller than the preset value, the enhancement material is pasted to the second picture to obtain the enhanced picture, which is denoted I_input.
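The pasting rule can be sketched as follows, taking the (crop, truncated) pair from the sketch above; the IoU threshold, the retry limit, the choice of the bottom edge for truncated material and the coordinate clamping are illustrative assumptions (the patent leaves these values open), and the crop is assumed to fit inside the target picture.

```python
import random

def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def paste_material(img2, boxes2, crop, truncated, iou_thresh=0.1, max_tries=10):
    """Paste the enhancement material onto picture I_2, avoiding its existing targets.

    Returns (enhanced_img, new_box); new_box is None if enhancement is abandoned.
    """
    h, w = img2.shape[:2]
    ch, cw = crop.shape[:2]
    for _ in range(max_tries):
        if truncated:
            # Truncated material is pasted back against an edge (here: the bottom edge).
            x1, y1 = random.randint(0, w - cw), h - ch
        else:
            # Otherwise the paste centre is drawn from the middle-to-lower part of the picture.
            cy = random.randint(int(h / 3), int(3 * h / 4))
            cx = random.randint(0, w - 1)
            x1 = min(max(0, cx - cw // 2), w - cw)
            y1 = min(max(0, cy - ch // 2), h - ch)
        new_box = (x1, y1, x1 + cw, y1 + ch)
        # Accept the position only if the summed overlap with existing targets stays small.
        if sum(iou(new_box, b) for b in boxes2) < iou_thresh:
            out = img2.copy()
            out[y1:y1 + ch, x1:x1 + cw] = crop
            return out, new_box
    return img2, None  # reselection limit reached: give up enhancing this picture
```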
S2, based on the neural network model, carrying out picture feature extraction processing on the enhanced picture to obtain a feature picture;
s21, the neural network model comprises four stage layers, each stage layer comprises a plurality of residual blocks, and each residual block comprises a first convolution layer, a second convolution layer, a first batch normalization layer, a second batch normalization layer, a linear rectification layer and a down-sampling layer;
and S22, performing feature extraction processing on the enhanced picture through convolution calculation to obtain a feature picture.
Specifically, referring to fig. 4 and 5, the enhanced picture is subjected to feature extraction processing using ResNet-34 as the model backbone network. The four stages of the neural network model are denoted S_0, S_1, S_2 and S_3; each stage consists of N residual blocks, and one residual block consists of two convolution layers, two batch normalization layers, one linear rectification layer and one down-sampling layer. The enhanced picture I_input passes through the convolution layers of each stage of the ResNet-34 network in turn, and the features obtained by this extraction are f_e0 = S_0(I_input), f_e1 = S_1(f_e0), f_e2 = S_2(f_e1) and f_e3 = S_3(f_e2), i.e., the feature maps f_e0, f_e1, f_e2 and f_e3.
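A rough PyTorch sketch of such a four-stage backbone is shown below, reusing the stage layout of torchvision's ResNet-34; grouping the stem together with the first residual stage into S_0 is an assumption made for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class Backbone(nn.Module):
    """Four-stage ResNet-34 backbone returning f_e0 .. f_e3."""

    def __init__(self):
        super().__init__()
        net = resnet34(weights=None)
        # S_0: stem plus the first residual stage; S_1..S_3: the remaining residual stages.
        self.s0 = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool, net.layer1)
        self.s1, self.s2, self.s3 = net.layer2, net.layer3, net.layer4

    def forward(self, x: torch.Tensor):
        fe0 = self.s0(x)    # stride 4,  64 channels
        fe1 = self.s1(fe0)  # stride 8,  128 channels
        fe2 = self.s2(fe1)  # stride 16, 256 channels
        fe3 = self.s3(fe2)  # stride 32, 512 channels
        return fe0, fe1, fe2, fe3
```

For example, `Backbone()(torch.randn(1, 3, 512, 512))` returns the four feature maps consumed by the attention module described next.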
S3, performing multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions;
s31, carrying out channel dimension lifting processing on the feature picture based on an extended dimension method to obtain a multi-dimension feature picture;
Specifically, the feature maps f_e0, f_e1, f_e2 and f_e3 are each calculated through a convolution layer and a linear rectification layer and then input into the multi-scale attention module: the convolution layer lifts the channel dimension, and the result is matrix-fused with the adjacent feature maps to obtain the multi-dimensional feature map fed to the next stage. The multi-dimensional feature maps are f_d0, f_d1, f_d2 and f_d3, and the process can be expressed by the following formula:
[Formula image RE-GDA0003624794530000061]
In the formula, f_d0, f_d1, f_d2 and f_d3 represent the multi-dimensional feature maps, ⊕ represents the merge operation in the matrix channel dimension, P(·) represents processing through a convolution layer and a linear rectification layer, and f_e0, f_e1, f_e2 and f_e3 represent the feature maps extracted by the backbone.
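Because the merging formula itself is only available as an image, the sketch below implements just one possible reading of the step described above: each backbone map is lifted by a convolution plus linear rectification P(·) and concatenated in the channel dimension with the previous multi-dimensional map, which is resized to match spatially. The cascade order, the resizing step and the channel counts are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DimLift(nn.Module):
    """P(.): a convolution layer followed by linear rectification that lifts the channel dimension."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

def build_multidim_maps(fe, lifts):
    """Cascade P(f_ei) with the previous multi-dimensional map along the channel axis.

    fe: [f_e0, f_e1, f_e2, f_e3]; lifts: matching DimLift modules.
    """
    fd = [lifts[0](fe[0])]
    for i in range(1, len(fe)):
        lifted = lifts[i](fe[i])
        # Adjacent stages differ in spatial size, so resize before merging channels.
        prev = F.interpolate(fd[i - 1], size=lifted.shape[-2:], mode="nearest")
        fd.append(torch.cat([lifted, prev], dim=1))
    return fd
```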
S32, calculating an attention value of the multi-dimensional feature picture based on a channel weighting method to obtain a multi-dimensional feature picture;
Specifically, referring to fig. 6, the multi-dimensional feature maps f_d0, f_d1, f_d2 and f_d3 are passed through four attention modules, denoted A_0, A_1, A_2 and A_3 respectively. Using channel weighting, each attention module first globally pools its multi-dimensional feature map into a one-dimensional vector of shape (c, 1), where c represents the number of channels; after a fully connected layer transformation, the resulting weights are multiplied channel-wise onto the original feature map to obtain the multi-scale feature map. The specific process can be represented by the following formula:
m_0 = A_0(f_d0), m_1 = A_1(f_d1), m_2 = A_2(f_d2), m_3 = A_3(f_d3)
In the above formula, m_0, m_1, m_2 and m_3 represent the multi-scale feature maps, and A_i(f_di) indicates processing of the feature map by the i-th attention module.
And S33, carrying out fusion processing on the multi-scale feature picture based on a dimension fusion method to obtain the feature picture with multi-scale dimensions.
Specifically, after the multi-scale feature maps respectively pass through the attention module, the multi-scale feature maps are fused on the channel dimension, and finally the feature picture f with the multi-scale dimension of the downstream task is obtainedinputThe specific process can be represented by the following formula:
Figure RE-GDA0003624794530000071
in the above formula, the first and second carbon atoms are,
Figure RE-GDA0003624794530000072
representing a fusion operation in the dimension of the matrix channel, finputRepresenting a feature picture with multi-scale dimensions.
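A minimal sketch of one attention module A_i and the final channel-dimension fusion is given below; the two-layer fully connected transform with a sigmoid gate, the reduction ratio, and the assumption of equal spatial sizes before fusion are illustrative choices, since the text only specifies global pooling, a fully connected transformation and channel-wise multiplication.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """A_i: global pooling to a (c, 1) vector, a fully connected transform, channel-wise re-weighting."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = x.mean(dim=(2, 3))                 # global pooling -> (b, c)
        weights = self.fc(weights).view(b, c, 1, 1)  # fully connected transform
        return x * weights                           # multiply back onto the original map

def fuse_multiscale(m):
    """Fuse m_0..m_3 in the channel dimension to obtain f_input (equal spatial sizes assumed)."""
    return torch.cat(m, dim=1)
```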
S4, performing loss calculation on the characteristic picture with multi-scale dimensionality through a loss function to obtain a final loss value;
s41, the loss functions comprise a thermodynamic loss function, a center point offset value loss function, a length-width prediction loss function and an aspect ratio loss function;
s42, respectively performing loss calculation on the feature picture with multi-scale dimensions through a thermodynamic diagram loss function, a central point offset value loss function, a length-width prediction loss function and an aspect ratio loss function to obtain a thermodynamic diagram loss value, a central point offset loss value, an aspect ratio prediction loss value and an aspect ratio loss value;
Specifically, the multi-scale-dimension feature map f_input first has its dimension reduced by a first convolution layer and a second convolution layer, and is then input into each module that calculates a loss function to compute the loss values. The loss functions comprise a thermodynamic diagram loss function, a center point offset value loss function, a length-width prediction loss function and an aspect ratio loss function, each of which is calculated on the multi-scale-dimension feature map f_input;
the thermodynamic diagram loss function calculation formula is as follows:
L_k = -\frac{1}{N}\sum_{xyc}\begin{cases}\left(1-\hat{Y}_{xyc}\right)^{\alpha}\log\left(\hat{Y}_{xyc}\right) & \text{if } Y_{xyc}=1 \\ \left(1-Y_{xyc}\right)^{\beta}\left(\hat{Y}_{xyc}\right)^{\alpha}\log\left(1-\hat{Y}_{xyc}\right) & \text{otherwise}\end{cases}
In the above formula, L_k represents the thermodynamic diagram loss value, N represents the number of image key points, Ŷ_xyc indicates the predicted value, Y_xyc represents the true value, α and β represent hyper-parameters, x and y represent coordinate points on the thermodynamic diagram, and c represents the target class.
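The formula above can be sketched in PyTorch as follows; the clamping for numerical stability and the default values α = 2, β = 4 are assumptions, since the patent does not state concrete values.

```python
import torch

def heatmap_loss(pred: torch.Tensor, gt: torch.Tensor, alpha: float = 2.0, beta: float = 4.0) -> torch.Tensor:
    """Penalty-reduced focal loss over the predicted heatmap.

    pred, gt: (B, C, H, W); gt equals 1 at key points and decays towards 0 elsewhere.
    """
    pred = pred.clamp(1e-6, 1 - 1e-6)     # keep the logarithms finite
    pos = gt.eq(1).float()
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * (1 - pos)
    num_pos = pos.sum().clamp(min=1.0)    # N: number of key points
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```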
The central point offset value loss function calculation formula is as follows:
L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|
In the above formula, L_off represents the center point offset loss value, p represents the coordinates of the target center point, p̃ represents the approximate integer coordinate obtained after scaling, R is the scaling ratio, and Ô_p̃ indicates the predicted offset;
the calculation formula of the length and width prediction loss function is as follows:
L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{s}_k - s_k\right|
In the above formula, L_size represents the length-width prediction loss value, ŝ_k indicates the predicted length and width, and s_k represents the real label;
the aspect ratio loss function is calculated as follows:
L_{Ratio} = \frac{1}{N}\sum_{x,y,c}\left|\hat{H}_{xyc}\,W_{xyc} - \hat{W}_{xyc}\,H_{xyc}\right|
In the above formula, L_Ratio represents the aspect ratio loss value, x and y represent coordinate points on the thermodynamic diagram, c represents the target class, Ĥ_xyc indicates the predicted length of the detected target, Ŵ_xyc indicates the predicted width of the detected target, H_xyc represents the true length of the detected target, and W_xyc represents the true width of the detected target;
In the calculation process, λ_Ration and λ_Ratio represent the same parameter. The denominator part is deleted directly: rather than directly subtracting the aspect ratio fractions, the loss is calculated from the numerator part obtained after putting the fractions over a common denominator. The purpose of this is that the model may predict some very small values in the initial training stage, which would make a direct aspect ratio loss excessively large and impair the model training effect.
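A sketch of this "numerator only" aspect ratio loss is given below; gathering the predicted and true lengths and widths at the N object centres, and the normalisation by N, are assumptions consistent with the other loss terms.

```python
import torch

def ratio_loss(pred_h: torch.Tensor, pred_w: torch.Tensor,
               gt_h: torch.Tensor, gt_w: torch.Tensor) -> torch.Tensor:
    """Aspect-ratio consistency loss using the numerator after a common denominator.

    A direct |pred_h / pred_w - gt_h / gt_w| would explode for the tiny values predicted early in
    training, so only |pred_h * gt_w - pred_w * gt_h| is penalised. Inputs are 1-D tensors of the
    values gathered at the object centres.
    """
    n = max(pred_h.numel(), 1)
    return torch.abs(pred_h * gt_w - pred_w * gt_h).sum() / n
```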
And S43, integrating the thermodynamic diagram loss value, the central point offset loss value, the aspect ratio prediction loss value and the aspect ratio loss value to obtain a final loss value.
Specifically, the weight of each loss value is set, the final loss value is the weighted sum of all the above loss values, and the weighting function calculation formula is as follows:
L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off} + \lambda_{Ration} L_{Ratio}
In the above formula, L_det represents the final loss value, λ_size represents the weight of the length-width prediction loss value, λ_off represents the weight of the center point offset loss value, and λ_Ratio represents the weight of the aspect ratio loss value.
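Combining the terms as in the weighting formula is straightforward, as in the sketch below; the default weight values are placeholders, since the patent does not disclose them.

```python
import torch

def total_loss(l_k: torch.Tensor, l_size: torch.Tensor, l_off: torch.Tensor, l_ratio: torch.Tensor,
               lambda_size: float = 0.1, lambda_off: float = 1.0, lambda_ratio: float = 0.1) -> torch.Tensor:
    """L_det = L_k + lambda_size * L_size + lambda_off * L_off + lambda_Ratio * L_Ratio."""
    return l_k + lambda_size * l_size + lambda_off * l_off + lambda_ratio * l_ratio
```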
S5, updating the neural network model according to the final loss value, and constructing a ship detection model;
specifically, the final loss value is subjected to gradient reverse updating of the neural network model, loading parameters of the neural network model are updated to obtain an updated neural network model, and the ship detection model is constructed based on the updated neural network model.
And S6, detecting the picture to be detected based on the ship detection model to obtain a detection result.
Specifically, based on the ship detection model, pictures are selected in turn from the pictures to be tested and input into the ship detection model, which outputs the predicted values of each picture, including the category, the center point position and the corresponding length and width values; finally, the overall accuracy over the pictures is calculated.
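Decoding the model outputs into category, centre position and length-width values can be sketched as follows; the max-pooling peak extraction, the top-k limit, the score threshold, the down-sampling ratio and the channel ordering of the width-height and offset heads are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def decode_detections(heatmap, wh, offset, k=100, score_thresh=0.3, down_ratio=4):
    """Turn heatmap / width-height / offset outputs into (class, score, x1, y1, x2, y2) boxes.

    heatmap: (1, C, H, W) after a sigmoid; wh and offset: (1, 2, H, W) in feature-map units.
    """
    # Keep only local maxima of the heatmap (a simple non-maximum suppression).
    peaks = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)).float() * heatmap
    scores, idx = peaks.flatten(1).topk(k)
    _, h, w = heatmap.shape[1:]
    cls = idx // (h * w)
    ys, xs = (idx % (h * w)) // w, (idx % (h * w)) % w
    boxes = []
    for s, c, y, x in zip(scores[0], cls[0], ys[0], xs[0]):
        if s < score_thresh:
            continue
        off_x, off_y = offset[0, :, y, x]
        bw, bh = wh[0, :, y, x]
        cx, cy = x + off_x, y + off_y                  # refined centre on the feature map
        boxes.append((int(c), float(s),
                      float((cx - bw / 2) * down_ratio), float((cy - bh / 2) * down_ratio),
                      float((cx + bw / 2) * down_ratio), float((cy + bh / 2) * down_ratio)))
    return boxes
```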
Referring to fig. 2, the ship detection system based on adaptive data enhancement comprises:
the enhancement module is used for acquiring a picture data set through a camera and carrying out data enhancement processing to obtain an enhanced picture;
the characteristic extraction module is used for extracting the image characteristics of the enhanced image based on the neural network model to obtain a characteristic image;
the multi-scale dimension calculation module is used for carrying out multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions;
the loss calculation module is used for performing loss calculation on the characteristic picture with the multi-scale dimensionality through a loss function to obtain a final loss value;
the updating module is used for updating the neural network model according to the final loss value and constructing a ship detection model;
and the detection module is used for detecting the picture to be detected based on the ship detection model to obtain a detection result.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. The ship detection method based on adaptive data enhancement is characterized by comprising the following steps of:
acquiring a picture data set through a camera and performing data enhancement processing to obtain an enhanced picture;
based on the neural network model, carrying out picture feature extraction processing on the enhanced picture to obtain a feature picture;
performing multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions;
performing loss calculation on the characteristic picture with multi-scale dimensionality through a loss function to obtain a final loss value;
updating the neural network model according to the final loss value, and constructing a ship detection model;
and detecting the picture to be detected based on the ship detection model to obtain a detection result.
2. The ship detection method based on adaptive data enhancement according to claim 1, wherein the step of obtaining a picture data set through a camera and performing data enhancement processing to obtain an enhanced picture specifically comprises:
acquiring a picture data set through a camera;
extracting pictures according to the picture data set to obtain a first picture and a second picture;
marking and cutting the ship image in the first picture to obtain an enhanced material;
and pasting the enhanced material to the second picture according to a preset rule to obtain an enhanced picture.
3. The adaptive data enhancement-based ship detection method according to claim 2, wherein the step of labeling and cropping the ship image in the first picture to obtain enhanced material specifically comprises:
labeling the ship image in the first picture according to a ship detection frame to obtain a ship image with a label, wherein the ship detection frame is provided with a detection label;
cutting the marked ship image to obtain a cut image;
judging whether the cut image is cut off or not according to the detection label of the ship detection frame to obtain the cut image which is cut off and the cut image which is not cut off;
and integrating the cut image and the non-cut image to obtain the enhanced material.
4. The ship detection method based on adaptive data enhancement according to claim 3, wherein the step of pasting the enhancement material to the second picture according to a preset rule to obtain the enhanced picture specifically comprises:
the preset rule comprises selecting a pasting area in the second picture, zooming the enhancement material and judging the pasting position of the enhancement material;
judging that the enhancement material is a cut image which is cut off, and selecting the edge which is pasted on the second picture to obtain the pasting position of the edge target;
judging that the enhancement material is an uncut cut image, and randomly selecting a position on the second picture to obtain a target pasted central point;
judging that the size of the enhancement material is larger than a preset size, and performing reduction processing on the enhancement material;
and when the cross ratio value of the pasting position of the enhancement material is judged to be larger than or equal to the preset value, reselecting the pasting position of the enhancement material, and pasting the enhancement material to the second picture to obtain the enhanced picture until the cross ratio value of the pasting position of the enhancement material is judged to be smaller than the preset value.
5. The ship detection method based on adaptive data enhancement according to claim 4, wherein the step of performing picture feature extraction processing on the enhanced picture based on the neural network model to obtain a feature picture specifically comprises:
the neural network model comprises four stage layers, each stage layer comprises a plurality of residual blocks, and each residual block comprises a first convolution layer, a second convolution layer, a first batch of normalization layers, a second batch of normalization layers, a linear rectification layer and a down-sampling layer;
and performing feature extraction processing on the enhanced picture through convolution calculation to obtain a feature picture.
6. The ship detection method based on adaptive data enhancement according to claim 5, wherein the step of performing multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions specifically comprises:
performing channel dimension lifting processing on the feature picture based on an extended dimension method to obtain a multi-dimensional feature picture;
calculating attention values of the multi-dimensional feature pictures based on a channel weighting method to obtain multi-scale feature pictures;
and performing fusion processing on the multi-scale feature picture based on a dimension fusion method to obtain the feature picture with multi-scale dimensions.
7. The ship detection method based on adaptive data enhancement according to claim 6, wherein the step of performing loss calculation on the feature picture with multi-scale dimensions through a loss function to obtain a final loss value specifically comprises:
the loss functions include a thermodynamic loss function, a center point offset value loss function, a length-width prediction loss function, and an aspect ratio loss function;
respectively performing loss calculation on the feature picture with multi-scale dimensionality through a thermodynamic diagram loss function, a central point offset value loss function, a length-width prediction loss function and an aspect ratio loss function to obtain a thermodynamic diagram loss value, a central point offset loss value, an aspect ratio prediction loss value and an aspect ratio loss value;
and integrating the thermodynamic diagram loss value, the central point offset loss value, the aspect ratio prediction loss value and the aspect ratio loss value to obtain a final loss value.
8. The adaptive data enhancement based ship detection method according to claim 7, wherein the length-width prediction loss function is expressed as follows:
L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{s}_k - s_k\right|
In the above formula, L_size represents the length-width prediction loss value, ŝ_k indicates the predicted length and width, and s_k represents the real label.
9. The adaptive data enhancement based ship detection method according to claim 8, wherein the aspect ratio loss function is expressed as follows:
L_{Ratio} = \frac{1}{N}\sum_{x,y,c}\left|\hat{H}_{xyc}\,W_{xyc} - \hat{W}_{xyc}\,H_{xyc}\right|
In the above formula, L_Ratio represents the aspect ratio loss value, x and y represent coordinate points on the thermodynamic diagram, c represents the target class, Ĥ_xyc indicates the predicted length of the detected target, Ŵ_xyc indicates the predicted width of the detected target, H_xyc represents the true length of the detected target, and W_xyc represents the true width of the detected target.
10. The ship detection system based on adaptive data enhancement is characterized by comprising the following modules:
the enhancement module is used for acquiring a picture data set through a camera and carrying out data enhancement processing to obtain an enhanced picture;
the characteristic extraction module is used for extracting the image characteristics of the enhanced image based on the neural network model to obtain a characteristic image;
the multi-scale dimension calculation module is used for carrying out multi-scale dimension attention calculation on the feature picture to obtain the feature picture with multi-scale dimensions;
the loss calculation module is used for performing loss calculation on the characteristic picture with the multi-scale dimensionality through a loss function to obtain a final loss value;
the updating module is used for updating the neural network model according to the final loss value and constructing a ship detection model;
and the detection module is used for detecting the picture to be detected based on the ship detection model to obtain a detection result.
CN202210249886.8A 2022-03-14 2022-03-14 Ship detection method and system based on adaptive data enhancement Pending CN114627415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210249886.8A CN114627415A (en) 2022-03-14 2022-03-14 Ship detection method and system based on adaptive data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210249886.8A CN114627415A (en) 2022-03-14 2022-03-14 Ship detection method and system based on adaptive data enhancement

Publications (1)

Publication Number Publication Date
CN114627415A true CN114627415A (en) 2022-06-14

Family

ID=81901972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210249886.8A Pending CN114627415A (en) 2022-03-14 2022-03-14 Ship detection method and system based on adaptive data enhancement

Country Status (1)

Country Link
CN (1) CN114627415A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331113A (en) * 2022-10-12 2022-11-11 浙江华是科技股份有限公司 Ship target detection model training method and system and computer storage medium
CN116052094A (en) * 2023-03-07 2023-05-02 浙江华是科技股份有限公司 Ship detection method, system and computer storage medium
CN116052094B (en) * 2023-03-07 2023-06-09 浙江华是科技股份有限公司 Ship detection method, system and computer storage medium

Similar Documents

Publication Publication Date Title
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN111080645B (en) Remote sensing image semi-supervised semantic segmentation method based on generation type countermeasure network
CN108009543B (en) License plate recognition method and device
CN111368600B (en) Remote sensing image target detection and identification method and device, readable storage medium and equipment
CN114627415A (en) Ship detection method and system based on adaptive data enhancement
CN111126359A (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN111797829A (en) License plate detection method and device, electronic equipment and storage medium
CN109977997A (en) Image object detection and dividing method based on convolutional neural networks fast robust
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN113869138A (en) Multi-scale target detection method and device and computer readable storage medium
CN113657409A (en) Vehicle loss detection method, device, electronic device and storage medium
CN113903022B (en) Text detection method and system based on feature pyramid and attention fusion
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111553351A (en) Semantic segmentation based text detection method for arbitrary scene shape
CN113159215A (en) Small target detection and identification method based on fast Rcnn
CN114897738A (en) Image blind restoration method based on semantic inconsistency detection
CN114723945A (en) Vehicle damage detection method and device, electronic equipment and storage medium
CN112364709A (en) Cabinet intelligent asset checking method based on code identification
CN115272826A (en) Image identification method, device and system based on convolutional neural network
CN114743201A (en) Multimeter reading identification method and system based on rotating target detection
CN114219988A (en) ViT-framework-based rapid multi-category rock mine classification method, device and storage medium
CN113269717A (en) Building detection method and device based on remote sensing image
CN110874170A (en) Image area correction method, image segmentation method and device
CN117612136A (en) Automatic driving target detection method based on increment small sample learning
CN110188682B (en) Optical remote sensing image target detection method based on geometric structure double-path convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination