CN112926486A - Improved RFBnet target detection algorithm for ship small target


Info

Publication number
CN112926486A
Authority
CN
China
Prior art keywords
training
frames
network
layers
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110281458.9A
Other languages
Chinese (zh)
Inventor
Fang Jian
Liu Kun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN202110281458.9A
Publication of CN112926486A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/38 Outdoor scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method based on an improved RFBnet, addressing the missed detections and misclassifications that occur when ship targets occlude one another in multi-target scenes. First, feature fusion is performed with a pooling feature fusion (PFF) module and a deconvolution feature fusion (DFF) module. Second, stepwise convolution is proposed to extract the information of each feature unit's region of interest in the original image, and a dilated convolution module integrating an attention mechanism is designed so that the first three new effective feature layers are fused again. A focal classification loss function is then introduced to counter the unbalanced distribution of positive and negative samples during training. Finally, the network is trained on the ship detection data set SeaShips. The results show that the improved algorithm performs well, particularly on small targets occluded in multi-target scenes: the mean average precision reaches 96.26%, an improvement of 4.74% over the original algorithm, and the frame rate reaches 26 FPS, meeting the requirement of real-time detection.

Description

Improved RFBnet target detection algorithm for ship small target
Technical field:
The invention relates to a method for detecting small targets under multi-target occlusion among ships, and in particular to a ship target detection method based on an improved RFBnet network.
Background art:
Ships are important carriers of maritime activity, and computer-vision-based ship target detection is applied in practical ship management systems, so detection of ships at sea and near shore is widely used in both military and civil fields. Natural images captured at eye level, as currently used for ship target detection, offer small data volume, high resolution, rich color and texture information, and easy acquisition, making them an important data source in the target detection field. However, ship targets in multi-target scenes are easily occluded by one another, causing missed detections of small targets and classification errors; improving detection accuracy and speed to meet the demands of maritime security in practical applications is therefore an urgent problem.
Traditional ship target detection algorithms fall into three categories: statistics-based, knowledge-based, and model-based. These algorithms require manually extracted target features, such as scale-invariant feature transform (SIFT) matching, histogram of oriented gradients (HOG) features, and speeded-up robust features (SURF), all of which involve cumbersome feature extraction and have clear disadvantages in detection accuracy and speed.
Although mainstream deep learning target detection algorithms have been studied and improved, they only detect ship targets without classifying the detected ships further, which does not meet the needs of practical ship supervision.
Summary of the invention:
To address the missed detections and classification errors caused by mutual occlusion of ship targets in multi-target scenes, the invention provides a natural-image target detection method based on an improved RFBnet, comprising the following steps:
s1, creating a SeaShips containing 7000 ship data sets with 1920 × 1080 resolution, the data set contains six types of vessels, which are defined to cover substantially all vessels present in the offshore area and to take into account background, lighting, perspective, visible hull proportion, dimensions and occlusion, the data set adopts a standard PASCALVOC labeling format, each picture is precisely labeled with a label and a boundary box of a target, preprocessing is carried out before training, the size of the picture is adjusted to 300 multiplied by 300 pixels, the ore carrier type comprises 1141 pieces, the bulk carrier type comprises 1129 pieces, the container hip type comprises 814 pieces, the general carrier hip type comprises 1188 pieces, the fisher boat type comprises 1258 pieces, the passenger hip type comprises 705 pieces, the six mixed types mainly comprise 765 pieces of mutually shielded ships in the image, and the training set, the verification set and the testing set are randomly divided according to the proportion of 7:2: 1;
s2, firstly cutting a natural image with original resolution into a size of 300 × 300 × 3, transmitting the natural image into an improved RFBnet network, keeping a common VGG only containing conv4-3 and fc7, adding some RFB modules, forming new feature layers which are respectively BasicRFB P3, P5, P6, P7 and P8, after conv4-3, changing one part of the natural image into a size of 38 × 38 × 256 through convolution, changing the other part of the natural image into a size of 19 × 19 × 512 through a mode of maximum pooling, convolution and pooling, continuing to change the natural image into a size of 19 × 19 × 1024 through a dilation convolution with a dilation rate of 6 and fc7, then changing the natural image into a size of 19 × 19 × 256 through convolution once, continuing to sample into a size of 38 × 256, and then fusing with the improvement of the first part to obtain a new feature layer, wherein the six feature layers are totally included;
s3, firstly, processing relatively shallow BasicRFB _ aP3 and BasicRFB P3 features by using maximum pooling layer 2 × 2, 3 × 3 convolution and Relu activation functions in the PFF module, enabling shallow layer network features to learn more nonlinear relations while keeping significant detail features and reducing feature dimensions of a shallow layer network, fusing with BasicRFB P5 to enable the BasicRFB P5 to acquire more edge detail information of BasicRFB _ a P3 and BasicRFB P3, then processing relatively deep BasicRFB P6, Conv 2P 5848P 7 and Conv 2P 8 by using Deconv 2 × 2, 3 × 3 convolution and Relu activation functions in the DFF module, enabling the network features to learn more nonlinear relations while filling feature contents and extracting sensitive feature information, fusing with BasicRFB P5, enabling the Conv 2P d P8 to extract more deep layer RFP 3527, and finally extracting more normalized features of the ConicRFP 638, ConicRFP 638 and Conv2, forming a new BasicRFB P5 characteristic, and similarly, performing the same operation on other layers in the backbone network, thus forming six new effective characteristic layers;
s4, adding DB1, DB2 and DB3 expansion volume blocks to the new layers of the basic RFB _ a P3, the basic RFB P3 and the basic RFB P5 in the network framework respectively, then the information of the receptive fields of the three layers of characteristic units in the original image is learned through DB1, DB2 and DB3 respectively, and the merged features are fused in a concatee mode, the number of channels of the original feature graph is increased by the merged features, finally, a convolution layer is added for increasing the learning capability of the network and simultaneously reducing the feature dimension, because the RFBnet algorithm obtains the feature map and then respectively inputs the feature map into the classification network and the positioning network, the category information and the position information are obtained by convolution of 3 x 3, i.e. information for one object is present in a feature cell of 3 x 3 size, therefore, the algorithm also uses a convolution dimension reduction mode of 3 multiplied by 3, so that the effective characteristic layers after the dimension reduction of the first three layers and the effective characteristic layers of the last three layers form six layers of latest effective characteristic layers;
s5, for the first latest effective feature layer 38 x 512, 1444 grids are contained in the first latest effective feature layer, each grid corresponds to 6 prior frames, each grid of the second, third and fourth latest effective feature layers corresponds to 6 prior frames, each grid of the fifth and sixth latest effective feature layers corresponds to 4 prior frames, finally 11620 prior frames are formed, then the 11620 frames are respectively adjusted according to the prediction result, then whether the 11620 adjusted frames contain the required object or not is judged, if yes, the frame is marked, of course, some frames obtained by utilizing the prior frames can be overlapped, therefore, the score and the overlapping condition of the frames are also judged, the required frame is found by utilizing a non-maximum inhibition method, and the type of the required frame is marked;
s6 candidate box matching and loss function design
In order to detect ship targets at different scales in the image, candidate frames with different aspect ratios are designed for matching. According to the RFBnet loss function, a set of candidate frames D = {d1, d2, …, dn} is obtained, where each di consists of the 4 coordinate values (cx, cy, w, h): (cx, cy) is the center-point coordinate, and w and h are the width and height of the candidate frame. Each candidate frame is matched against the real label frames, yielding pairs (li, di) of candidate-frame coordinates and their corresponding target categories, where l denotes the category class. For concise labeling, let c = {c1, c2, …, cn} denote the set of predicted class vectors and l = {l1, l2, …, ln} denote the set of predicted coordinate vectors. A candidate frame whose match with a real label frame is greater than the threshold is marked as a positive sample Pos; one whose match is less than the threshold is marked as a negative sample Neg;
In addition, the RFBnet algorithm predicts 11620 boxes over 6 prediction scales, of which only a small fraction contain targets; most contain only image background information, so the network attends more to the easily classified background boxes and its ability to classify targets is reduced. To prevent this situation from degrading model training, the method introduces a focal classification loss to supervise training. The overall objective can be expressed as

L(x, c, l, g) = (1/N) [ L_conf(x, c) + α L_loc(x, l, g) ]

where N is the number of candidate frames matched to real frames. The localization loss function can be expressed as

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise.

In these formulas, p_t (equal to p when the prediction box is correct and to 1 − p when it is a background prediction box) denotes the prediction probability, and the focal classification loss is

FL(p_t) = −a_t (1 − p_t)^r log(p_t)

where a_t and r are hyperparameters with a_t ∈ [0, 1] and r ∈ [0, 5]. When r > 0, the loss of positive samples is relatively reduced and the model focuses more on training the negative samples, so adding the focal classification loss effectively resolves the unbalanced distribution of positive and negative samples and improves the optimization efficiency of the model;
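A minimal PyTorch sketch of this focal classification loss follows; treating a_t as a single scalar and the values a_t = 0.25, r = 2 are common choices assumed for illustration, since the text gives only the ranges a_t ∈ [0, 1] and r ∈ [0, 5].

```python
# Sketch of FL(p_t) = -a_t * (1 - p_t)^r * log(p_t) over a batch of boxes.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, a_t=0.25, r=2.0):
    """logits: (N, C) class scores; targets: (N,) integer class labels."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()                                          # p_t
    return (-a_t * (1.0 - pt) ** r * log_pt).mean()

# 7 classes here = 6 ship classes + background (an assumption).
loss = focal_loss(torch.randn(8, 7), torch.randint(0, 7, (8,)))
```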
s7, network training
In the whole training process, to optimize training quickly, prior boxes whose IoU with a real box exceeds 0.5 are set as positive example boxes, hard-to-learn negative example boxes are included in training, and the positive-to-negative sample ratio is set to 3:1. Every 4 pictures form a batch, the optimizer is Adam, and the learning rate is managed by a callback: when the loss has not decreased after two epochs, the learning rate is halved. Training is carried out in three stages: first, the model is pre-trained on the ILSVRC CLS-LOC data set; in the second stage, the parameters of the first 20 layers of the network are excluded from training, with an initial learning rate of 0.0005 and 50 epochs; in the third stage, all network parameters participate in training, with an initial learning rate of 0.0001 and 100 epochs. To shorten training time, early stopping is added: each stage ends when the loss has not decreased for 4 epochs (sketched below);
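The schedule can be sketched as follows; the model and data here are stand-ins, but the Adam optimizer, batch of 4, halving of the learning rate after two stagnant epochs, and early stop after four stagnant epochs follow S7.

```python
# Sketch of the stage-2 training schedule of S7 (initial LR 0.0005, 50 epochs).
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the improved RFBnet detector
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)  # halve LR after 2 flat epochs

def train_one_epoch():
    x, y = torch.randn(4, 4), torch.randn(4, 2)  # one batch of 4 (stand-in data)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

best, stale = float("inf"), 0
for epoch in range(50):
    epoch_loss = train_one_epoch()
    scheduler.step(epoch_loss)
    if epoch_loss < best:
        best, stale = epoch_loss, 0
    else:
        stale += 1
        if stale >= 4:  # early stop: 4 epochs without a lower loss
            break
```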
s8, network test
For the trained network model, the test samples are taken as input to obtain predicted outputs, which are compared with the ground-truth values to compute the mAP.
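The per-class average precision behind the mAP can be sketched as below, assuming the usual VOC-style all-point interpolation; the IoU matching of detections to ground truth is elided, so each detection is already a (score, true-positive) pair.

```python
# Sketch of average precision for one class; mAP is the mean over classes.
import numpy as np

def average_precision(scores, tp, n_gt):
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(tp, dtype=float)[order]
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (cum_tp + cum_fp)
    envelope = np.maximum.accumulate(precision[::-1])[::-1]  # monotone envelope
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, envelope):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], n_gt=2))  # ~0.833
```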
In summary, compared with a conventional convolutional neural network, the network model provided by the invention differs in that concatenation-based feature fusion is performed on the conventional six feature layers, interconnecting shallow and deep layers, while a DB module is added to the first three of the six new feature layers to enhance, through fusion, the shallow layers' efficiency in detecting small targets.
Description of the drawings:
FIG. 1 is a flow chart of fusing the BasicRFB P5 feature.
FIG. 2 shows DB1 with the attention mechanism incorporated.
FIG. 3 shows the structures of DB1, DB2 and DB3.
Detailed description of embodiments:
a ship target detection method based on an improved RFBnet network comprises the following steps:
s1, creating a SeaShips containing 7000 ship data sets with 1920 × 1080 resolution, the data set contains six types of vessels, which are defined to cover substantially all vessels present in the offshore area and to take into account background, lighting, perspective, visible hull proportion, dimensions and occlusion, the data set adopts a standard PASCALVOC labeling format, each picture is precisely labeled with a label and a boundary box of a target, preprocessing is carried out before training, the size of the picture is adjusted to 300 multiplied by 300 pixels, the ore carrier type comprises 1141 pieces, the bulk carrier type comprises 1129 pieces, the container hip type comprises 814 pieces, the general carrier hip type comprises 1188 pieces, the fisher boat type comprises 1258 pieces, the passenger hip type comprises 705 pieces, the six mixed types mainly comprise 765 pieces of mutually shielded ships in the image, and the training set, the verification set and the testing set are randomly divided according to the proportion of 7:2: 1;
s2, firstly cutting a natural image with an original resolution into a size of 300 × 300 × 3, transmitting the natural image into an improved RFBnet network, keeping a common VGG only containing conv4-3 and fc7, adding some RFB modules, forming new feature layers which are respectively BasicRFB P3, P5, P6, P7 and P8, after conv4-3, changing one part of the natural image into a size of 38 × 38 × 256 through convolution, changing the other part of the natural image into a size of 19 × 19 × 512 through a mode of maximum pooling, convolution and pooling, continuing to change the natural image into a size of 19 × 19 × 1024 through a dilation convolution with a dilation rate of 6 and fc7, changing the natural image into a size of 19 × 19 × 256 through a convolution, continuing to sample into a size of 38 × 256, and then fusing the natural image with the improved cone of the first part to obtain a new feature layer, wherein the size is six feature layers;
s3, firstly, processing relatively shallow BasicRFB _ aP3 and BasicRFB P3 features by using maximum pooling layer 2 × 2, 3 × 3 convolution and Relu activation functions in the PFF module, enabling shallow layer network features to learn more nonlinear relations while keeping significant detail features and reducing feature dimensions of a shallow layer network, fusing with BasicRFB P5 to enable the BasicRFB P5 to acquire more edge detail information of BasicRFB _ a P3 and BasicRFB P3, then processing relatively deep BasicRFB P6, Conv 2P 5848P 7 and Conv 2P 8 by using Deconv 2 × 2, 3 × 3 convolution and Relu activation functions in the DFF module, enabling the network features to learn more nonlinear relations while filling feature contents and extracting sensitive feature information, fusing with BasicRFB P5, enabling the Conv 2P d P8 to extract more deep layer RFP 3527, and finally extracting more normalized features of the ConicRFP 638, ConicRFP 638 and Conv2, forming a new BasicRFB P5 characteristic, and similarly, performing the same operation on other layers in the backbone network, thus forming six new effective characteristic layers;
s4, adding DB1, DB2 and DB3 expansion volume blocks to the new layers of the basic RFB _ a P3, the basic RFB P3 and the basic RFB P5 in the network framework respectively, then the information of the receptive fields of the three layers of characteristic units in the original image is learned through DB1, DB2 and DB3 respectively, and the merged features are fused in a concatee mode, the number of channels of the original feature graph is increased by the merged features, finally, a convolution layer is added for increasing the learning capability of the network and simultaneously reducing the feature dimension, because the RFBnet algorithm obtains the feature map and then respectively inputs the feature map into the classification network and the positioning network, the category information and the position information are obtained by convolution of 3 x 3, i.e. information for one object is present in a feature cell of 3 x 3 size, therefore, the algorithm also uses a convolution dimension reduction mode of 3 multiplied by 3, so that the effective characteristic layers after the dimension reduction of the first three layers and the effective characteristic layers of the last three layers form six layers of latest effective characteristic layers;
s5, for the first latest effective feature layer 38 x 512, 1444 grids are contained in the first latest effective feature layer, each grid corresponds to 6 prior frames, and each grid of the second, third and fourth latest effective feature layers corresponds to 6 prior frames; each grid of a fifth and a sixth latest effective feature layer corresponds to 4 prior frames, finally 11620 prior frames are formed, then the 11620 frames are adjusted respectively according to a prediction result, then whether the 11620 adjusted frames contain the required object or not is judged, if yes, the frame is marked, of course, some frames obtained by utilizing the prior frames are overlapped, the score and the overlapping condition of the frames are also judged, and the required frame is found by utilizing a non-maximum inhibition method and the type of the frame is marked;
s6 candidate box matching and loss function design
In order to detect ship targets at different scales in the image, candidate frames with different aspect ratios are designed for matching. According to the RFBnet loss function, a set of candidate frames D = {d1, d2, …, dn} is obtained, where each di consists of the 4 coordinate values (cx, cy, w, h): (cx, cy) is the center-point coordinate, and w and h are the width and height of the candidate frame. Each candidate frame is matched against the real label frames, yielding pairs (li, di) of candidate-frame coordinates and their corresponding target categories, where l denotes the category class. For concise labeling, let c = {c1, c2, …, cn} denote the set of predicted class vectors and l = {l1, l2, …, ln} denote the set of predicted coordinate vectors. A candidate frame whose match with a real label frame is greater than the threshold is marked as a positive sample Pos; one whose match is less than the threshold is marked as a negative sample Neg;
In addition, the RFBnet algorithm predicts 11620 boxes over 6 prediction scales, of which only a small fraction contain targets; most contain only image background information, so the network attends more to the easily classified background boxes and its ability to classify targets is reduced. To prevent this situation from degrading model training, the method introduces a focal classification loss to supervise training. The overall objective can be expressed as

L(x, c, l, g) = (1/N) [ L_conf(x, c) + α L_loc(x, l, g) ]

where N is the number of candidate frames matched to real frames. The localization loss function can be expressed as

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise.

For the classification loss function, the cross entropy is used to compute the loss, which can be expressed as

L_conf(x, c) = −Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p).

In these formulas, p_t (equal to p when the prediction box is correct and to 1 − p when it is a background prediction box) denotes the prediction probability, and the focal classification loss is

FL(p_t) = −a_t (1 − p_t)^r log(p_t)

where a_t and r are hyperparameters with a_t ∈ [0, 1] and r ∈ [0, 5]. When r > 0, the loss of positive samples is relatively reduced and the model focuses more on training the negative samples, so adding the focal classification loss effectively resolves the unbalanced distribution of positive and negative samples and improves the optimization efficiency of the model;
s7, network training
In the whole training process, to optimize training quickly, prior boxes whose IoU with a real box exceeds 0.5 are set as positive example boxes, hard-to-learn negative example boxes are included in training, and the positive-to-negative sample ratio is set to 3:1. Every 4 pictures form a batch, the optimizer is Adam, and the learning rate is managed by a callback: when the loss has not decreased after two epochs, the learning rate is halved. Training is carried out in three stages: first, the model is pre-trained on the ILSVRC CLS-LOC data set; in the second stage, the parameters of the first 20 layers of the network are excluded from training, with an initial learning rate of 0.0005 and 50 epochs; in the third stage, all network parameters participate in training, with an initial learning rate of 0.0001 and 100 epochs. To shorten training time, early stopping is added: each stage ends when the loss has not decreased for 4 epochs;
s8, network test
For the trained network model, the test samples are taken as input to obtain predicted outputs, which are compared with the ground-truth values to compute the mAP.

Claims (1)

1. A ship target detection method based on an improved RFBnet network, characterized by comprising the following steps:
s1, creating a 7000-piece 1920 × 1080-resolution ship data set SeaShips, wherein the data set comprises six types of ship types, the data set adopts a standard PASCALVOC labeling format, each picture is accurately labeled with a target label and a boundary box, preprocessing is performed before training, the size of an image is adjusted to 300 × 300 pixels, 1141 of an ore carrier type, 1129 of a bulk carrier type, 814 of a container type, 1188 of a general carrier type, 1258 of a firm boat type, 705 of a passger type, 765 of ship mutual occlusion observed by human eyes of the six types of mixed types, and the training set, the verification set and the test set are randomly divided according to a ratio of 7:2: 1;
s2, firstly cutting a natural image with original resolution into a size of 300 × 300 × 3, transmitting the natural image into an improved RFBnet network, keeping a common VGG only containing conv4-3 and fc7, adding some RFB modules, forming new feature layers which are respectively BasicRFB P3, P5, P6, P7 and P8, after conv4-3, changing one part of the natural image into a size of 38 × 38 × 256 through convolution, changing the other part of the natural image into a size of 19 × 19 × 512 through a mode of maximum pooling, convolution and pooling, continuing to change the natural image into a size of 19 × 19 × 1024 through a dilation convolution with a dilation rate of 6 and fc7, then changing the natural image into a size of 19 × 19 × 256 through convolution once, continuing to sample into a size of 38 × 256, and then fusing with the improvement of the first part to obtain a new feature layer, wherein the six feature layers are totally included;
s3, firstly, processing relatively shallow BasicRFB _ a P3 and BasicRFB P3 features by using maximum pooling layer 2 x 2, 3 x 3 convolution and Relu activation functions in a PFF module, then fusing with the BasicRFB P5, secondly, processing relatively deep BasicRFB P6, Conv2dP7 and Conv2dP8 by using Deconv 2 x 2, 3 x 3 convolution and Relu activation functions in the DFF module, fusing with the BasicRFB P5, and finally fusing the extracted features and performing L2 norm normalization operation to form new BasicRFB P5 features, and similarly, performing the same operation on other layers in a main network, thus forming six new effective feature layers in total;
s4, adding DB1, DB2 and DB3 expansion convolution blocks to new layers of BasicRFB _ a P3, BasicRFB P3 and BasicRFB P5 in a network frame respectively, then learning information of the receptive field of the three layers of feature units in an original image through DB1, DB2 and DB3 respectively, and finally adding a convolution layer, wherein after an RFBnet algorithm obtains a feature map, the feature map is input into a classification network and a positioning network respectively, category information and position information are obtained through convolution of 3 x 3, namely information of a target exists in feature units of 3 x 3 size, so that the algorithm also uses a convolution dimension reduction mode of 3 x 3, and thus an effective feature layer after the previous three layers of dimension reduction and an original three layers of effective feature layers form a six-layer latest effective feature layer together;
s5, for the first latest effective feature layer 38 x 512, 1444 grids are contained in the first latest effective feature layer, each grid corresponds to 6 prior frames, each grid of the second, third and fourth latest effective feature layers corresponds to 6 prior frames, each grid of the fifth and sixth latest effective feature layers corresponds to 4 prior frames, finally 11620 prior frames are formed, then the 11620 frames are respectively adjusted according to the prediction result, then whether the 11620 adjusted frames contain the required object or not is judged, if yes, the frame is marked, of course, some frames obtained by utilizing the prior frames can be overlapped, therefore, the score and the overlapping condition of the frames are also judged, the required frame is found by utilizing a non-maximum inhibition method, and the type of the required frame is marked;
s6 candidate box matching and loss function design
In order to detect ship targets at different scales in the image, candidate frames with different aspect ratios are designed for matching. According to the RFBnet loss function, a set of candidate frames D = {d1, d2, …, dn} is obtained, where each di consists of the 4 coordinate values (cx, cy, w, h): (cx, cy) is the center-point coordinate, and w and h are the width and height of the candidate frame. Each candidate frame is matched against the real label frames, yielding pairs (li, di) of candidate-frame coordinates and their corresponding target categories, where l denotes the category class. For concise labeling, let c = {c1, c2, …, cn} denote the set of predicted class vectors and l = {l1, l2, …, ln} denote the set of predicted coordinate vectors. A candidate frame whose match with a real label frame is greater than the threshold is marked as a positive sample Pos; one whose match is less than the threshold is marked as a negative sample Neg;
In addition, the RFBnet algorithm predicts 11620 boxes over 6 prediction scales, of which only a small fraction contain targets; most contain only image background information, so the network attends more to the easily classified background boxes and its ability to classify targets is reduced. To prevent this situation from degrading model training, the method introduces a focal classification loss to supervise training. The overall objective can be expressed as

L(x, c, l, g) = (1/N) [ L_conf(x, c) + α L_loc(x, l, g) ]

where N is the number of candidate frames matched to real frames. The localization loss function can be expressed as

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise.

For the classification loss function, the cross entropy is used to compute the loss, which can be expressed as

L_conf(x, c) = −Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p).

In these formulas, p_t (equal to p when the prediction box is correct and to 1 − p when it is a background prediction box) denotes the prediction probability, and the focal classification loss is

FL(p_t) = −a_t (1 − p_t)^r log(p_t)

where a_t and r are hyperparameters with a_t ∈ [0, 1] and r ∈ [0, 5]; when r > 0, the loss of positive samples is relatively reduced and the model focuses more on training the negative samples;
s7, network training
In the whole training process, to optimize training quickly, prior boxes whose IoU with a real box exceeds 0.5 are set as positive example boxes, hard-to-learn negative example boxes are included in training, and the positive-to-negative sample ratio is set to 3:1. Every 4 pictures form a batch, the optimizer is Adam, and the learning rate is managed by a callback: when the loss has not decreased after two epochs, the learning rate is halved. Training is carried out in three stages: first, the model is pre-trained on the ILSVRC CLS-LOC data set; in the second stage, the parameters of the first 20 layers of the network are excluded from training, with an initial learning rate of 0.0005 and 50 epochs; in the third stage, all network parameters participate in training, with an initial learning rate of 0.0001 and 100 epochs. To shorten training time, early stopping is added: each stage ends when the loss has not decreased for 4 epochs;
s8, network test
For the trained network model, the test samples are taken as input to obtain predicted outputs, which are compared with the ground-truth values to compute the mAP.
CN202110281458.9A 2021-03-16 2021-03-16 Improved RFBnet target detection algorithm for ship small target Withdrawn CN112926486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110281458.9A CN112926486A (en) 2021-03-16 2021-03-16 Improved RFBnet target detection algorithm for ship small target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110281458.9A CN112926486A (en) 2021-03-16 2021-03-16 Improved RFBnet target detection algorithm for ship small target

Publications (1)

Publication Number Publication Date
CN112926486A 2021-06-08

Family

ID=76175583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110281458.9A Withdrawn CN112926486A (en) 2021-03-16 2021-03-16 Improved RFBnet target detection algorithm for ship small target

Country Status (1)

Country Link
CN (1) CN112926486A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705327A (en) * 2021-07-06 2021-11-26 中国电子科技集团公司第二十八研究所 Fine-grained target classification method based on priori knowledge
CN113705327B (en) * 2021-07-06 2024-02-09 中国电子科技集团公司第二十八研究所 Fine granularity target classification method based on priori knowledge
CN113627310A (en) * 2021-08-04 2021-11-09 中国电子科技集团公司第十四研究所 Background and scale perception SAR ship target detection method
CN113627310B (en) * 2021-08-04 2023-11-24 中国电子科技集团公司第十四研究所 SAR ship target detection method based on background and scale sensing
CN113808164A (en) * 2021-09-08 2021-12-17 西安电子科技大学 Infrared video multi-target tracking method
CN114445689A (en) * 2022-01-29 2022-05-06 福州大学 Multi-scale weighted fusion target detection method and system guided by target prior information

Similar Documents

Publication Publication Date Title
US20210398294A1 (en) Video target tracking method and apparatus, computer device, and storage medium
CN112884064B (en) Target detection and identification method based on neural network
CN109902677B (en) Vehicle detection method based on deep learning
CN112926486A (en) Improved RFBnet target detection algorithm for ship small target
CN110135503B (en) Deep learning identification method for parts of assembly robot
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN111985376A (en) Remote sensing image ship contour extraction method based on deep learning
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN112800838A (en) Channel ship detection and identification method based on deep learning
CN112529090B (en) Small target detection method based on improved YOLOv3
CN110827312B (en) Learning method based on cooperative visual attention neural network
Zheng et al. A lightweight ship target detection model based on improved YOLOv5s algorithm
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN116612292A (en) Small target detection method based on deep learning
CN111368637B (en) Transfer robot target identification method based on multi-mask convolutional neural network
CN114331946A (en) Image data processing method, device and medium
CN116091823A (en) Single-feature anchor-frame-free target detection method based on fast grouping residual error module
CN116994135A (en) Ship target detection method based on vision and radar fusion
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
CN113610178A (en) Inland ship target detection method and device based on video monitoring image
CN116824333A (en) Nasopharyngeal carcinoma detecting system based on deep learning model
CN116740572A (en) Marine vessel target detection method and system based on improved YOLOX
CN114743045B (en) Small sample target detection method based on double-branch area suggestion network
CN110889418A (en) Gas contour identification method
CN113435389B (en) Chlorella and golden algae classification and identification method based on image feature deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210608