CN111914917A - Target detection improved algorithm based on feature pyramid network and attention mechanism - Google Patents
- Publication number
- CN111914917A (application number CN202010710684.XA)
- Authority
- CN
- China
- Prior art keywords
- feature
- algorithm
- fusion
- network
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural network architectures; combinations of networks
- G06V2201/07 — Image or video recognition; target detection
Abstract
The invention discloses an improved target detection algorithm based on a feature pyramid network and an attention mechanism. Following the principle of the feature pyramid network, the algorithm fuses the 6 multi-scale feature maps extracted by the base network of the original SSD algorithm; each fused feature map then contains rich context information, which improves the detection capability. An attention model is further added to the fused feature maps, so that the feature information of small targets is extracted effectively. Missed detections are reduced and the robustness of the algorithm is improved, while the detection speed still meets the real-time requirement.
Description
Technical Field
The invention belongs to the field of digital image processing, relates to target detection, and particularly relates to an improved target detection algorithm based on a feature pyramid network and an attention mechanism.
Background
The task of target detection is to find the targets of interest in an image and determine their categories and positions. It is one of the core problems in computer vision and is widely applied in infrared detection, intelligent video surveillance, remote-sensing target detection, medical diagnosis, and fire and smoke detection in intelligent buildings. Target detection algorithms can be divided into traditional algorithms and deep-learning-based algorithms. Traditional algorithms are represented by the SIFT and V-J (Viola-Jones) detectors, but they have high time complexity and poor robustness. Deep-learning-based algorithms include R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD, and others. Although many excellent target detection algorithms exist, their detection performance is still insufficient, leading to problems such as missed detections and false detections.
Disclosure of Invention
In view of the above-mentioned drawbacks and disadvantages of the prior art, an object of the present invention is to provide an improved algorithm for object detection based on a feature pyramid network and an attention mechanism.
In order to realize the task, the invention adopts the following technical solution:
an improved target detection algorithm based on a feature pyramid network and an attention mechanism is characterized by comprising the following steps:
step 1) combining the principle of the feature pyramid network, extracting 6 multi-scale feature maps of the input image from the base network VGG-16 of the original SSD algorithm, and performing feature fusion in order of feature-map size from small to large, obtaining feature maps that fuse different layers; the fused feature maps simultaneously contain rich semantic information and detail information;
in the original SSD algorithm, the scales of the feature maps extracted from the input image by the base network VGG-16 decrease progressively: the bottom-layer feature maps have high resolution and contain more detail information, while the high-layer feature maps have low resolution and contain more abstract semantic information; the original SSD algorithm therefore uses the bottom-layer feature maps to detect small targets and the high-layer feature maps to detect medium and large targets;
step 2) introducing a channel attention mechanism by adding an attention model to the two fused feature maps that have the richest detail and semantic information and are the most sensitive to small-target detection; that is, a mask is applied to a feature map to realize the attention mechanism and identify the features of the region of interest. Through continuous training, the network learns which regions of each image need attention and suppresses the influence of other, interfering regions, thereby enhancing the detection capability of the algorithm for small targets.
According to the invention, the size of the input image in step 1) is 300 × 300, and the feature maps used for detection after the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1. According to the principle of the feature pyramid network, feature fusion is carried out on the feature maps in order of size from small to large, yielding 6 fused feature maps whose sizes are still 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1.
Further, in step 2), the attention model is added to the feature maps fused according to the feature pyramid principle in step 1). Because the fusion proceeds in order of feature-map size from small to large, the fused (38, 38) and (19, 19) feature maps contain the most abundant information; compared with the other feature maps they have richer detail and semantic information and are more sensitive to small-target detection. To maintain the detection speed and reduce the computational cost of the algorithm, the attention model is added only to the fused (38, 38) and (19, 19) feature maps. The detection algorithm proceeds as follows:
a) Target detection is based on a single-stage network model: using the idea of regression, the category and bounding box of the target are regressed directly on the input image by a convolutional neural network. First, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large. In the original SSD algorithm, the multi-scale feature maps extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1; fusion according to the feature pyramid principle, in order of size from small to large, yields 6 feature maps of the same sizes, all of which contain rich semantic information and detail information.
b) Channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the feature maps fused in a). The fused 38 × 38 and 19 × 19 feature maps contain the most abundant information; to preserve the real-time performance of the algorithm, the attention model is added only to these two feature maps.
c) For the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different scales and aspect ratios are set at each cell. The scale of the candidate boxes is computed by formula (1):

s_k = s_min + ((s_max − s_min) / (m − 1)) (k − 1), k ∈ [1, m]  (1)

where m is the number of feature layers, s_k is the ratio of the candidate box to the picture, and s_max and s_min are the maximum and minimum of this ratio, set to 0.9 and 0.2 respectively; formula (1) gives the scale of each candidate box.

The aspect ratio generally takes the values a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate box are computed by formula (2):

w_k^a = s_k √(a_r), h_k^a = s_k / √(a_r)  (2)

For the candidate box with aspect ratio 1, an additional candidate box with scale s'_k = √(s_k s_(k+1)) is added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature layer.
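The scale and shape computation of formulas (1) and (2) can be sketched in Python; this is a minimal illustration in which the function names and the fallback s_(m+1) = 1.0 for the last layer are assumptions, not taken from the patent.

```python
import math

S_MIN, S_MAX = 0.2, 0.9   # minimum and maximum scale ratios from the text
M = 6                     # number of feature layers

def box_scale(k, m=M, s_min=S_MIN, s_max=S_MAX):
    """Formula (1): s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1)."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

def box_shapes(k, aspect_ratios=(1, 2, 3, 1/2, 1/3)):
    """Formula (2): width s_k*sqrt(ar), height s_k/sqrt(ar), plus the extra
    box of scale sqrt(s_k * s_{k+1}) for aspect ratio 1."""
    s_k = box_scale(k)
    shapes = [(s_k * math.sqrt(ar), s_k / math.sqrt(ar)) for ar in aspect_ratios]
    # extra box for aspect ratio 1; s_{m+1} = 1.0 is an assumed convention
    s_next = box_scale(k + 1) if k < M else 1.0
    s_extra = math.sqrt(s_k * s_next)
    shapes.append((s_extra, s_extra))
    return shapes

scales = [round(box_scale(k), 2) for k in range(1, M + 1)]
print(scales)  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

With s_min = 0.2 and s_max = 0.9 the six layer scales step evenly by 0.14, matching the values implied by formula (1).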
d) A 3 × 3 convolution kernel is applied to the multi-scale feature maps to predict the category and confidence, and the target detection algorithm is trained. The loss function for model training is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if matched, otherwise x = 0); c is the predicted class confidence; g is the location parameter of the ground-truth box; l is the predicted location of the prediction box; and the weight coefficient α is set to 1.

For the localization loss in SSD, the offsets of the candidate-box center (cx, cy), width (w) and height (h) are regressed with the Smooth L1 loss:

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij^k smooth_L1(l_i^m − ĝ_j^m)

For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = − Σ_(i∈Pos) x_ij^p log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0), with ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
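A toy numerical sketch of this loss, assuming Smooth L1 for localization and softmax cross-entropy for confidence as stated above; the function names and inputs are illustrative, not from the patent.

```python
import math

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def softmax_loss(logits, target):
    """Softmax cross-entropy for one candidate box: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]

def total_loss(loc_offsets, conf_pairs, alpha=1.0):
    """loc_offsets: regression residuals (l - g_hat) of matched boxes;
    conf_pairs: (logits, target_class) per matched box; N = number of matches."""
    n = len(conf_pairs)
    l_loc = sum(smooth_l1(d) for d in loc_offsets)
    l_conf = sum(softmax_loss(lg, t) for lg, t in conf_pairs)
    return (l_conf + alpha * l_loc) / n

print(round(smooth_l1(0.5), 3))  # 0.125
print(round(smooth_l1(2.0), 3))  # 1.5
```

Smooth L1 behaves quadratically near zero and linearly for large residuals, which is why it is less sensitive to outlier boxes than a pure L2 loss.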
the invention relates to a target detection improved algorithm based on a feature pyramid network and an attention mechanism, which is based on a single-stage target detection algorithm (SSD) algorithm, takes the influence of the resolution of a feature map on the target detection performance into consideration, improves the original algorithm, combines the thought of the feature pyramid network, fuses multi-scale feature maps extracted by the original SSD algorithm, and fuses to form a feature map with abundant semantic information and detailed information; and combining the principle of an attention mechanism, adding an attention model for the two feature maps with the fused sizes of 38 multiplied by 38 and 19 multiplied by 19 so as to enhance the recognition effect on the small target object.
Drawings
FIG. 1 is a schematic diagram of a network architecture for an object detection algorithm that combines a feature pyramid network and an attention mechanism;
FIG. 2 compares the detection results of the original SSD algorithm and the improved target detection algorithm: the left-hand pictures a1, a2, a3, a4 and a5 are detections of the original SSD algorithm; the right-hand pictures b1, b2, b3, b4 and b5 are detections of the improved target detection algorithm.
The invention is described in further detail below with reference to the figures and examples.
Detailed Description
The invention discloses an improved target detection algorithm based on a feature pyramid network and an attention mechanism. Following the principle of the feature pyramid network, the 6 feature maps extracted by the original SSD algorithm are fused into new feature maps that contain both rich semantic information and rich detail information. An attention model is then added to the fused feature maps; to preserve the real-time performance of the algorithm, it is added only to the 38 × 38 and 19 × 19 feature maps, which contain the most abundant information and are the most sensitive to small-target detection. These improvements increase the detection capability of the algorithm and alleviate problems such as missed detection.
The embodiment provides an improved target detection algorithm based on a feature pyramid network and an attention mechanism, which comprises the following steps:
step 1) combining the principle of the feature pyramid network, extracting 6 multi-scale feature maps of the input image from the base network VGG-16 of the original SSD algorithm, and performing feature fusion in order of feature-map size from small to large, obtaining feature maps that fuse different layers; the fused feature maps simultaneously contain rich semantic information and detail information;
in the original SSD algorithm, the scales of the feature maps extracted from the input image by the base network VGG-16 decrease progressively: the bottom-layer feature maps have high resolution and contain more detail information, while the high-layer feature maps have low resolution and contain more abstract semantic information; the original SSD algorithm therefore uses the bottom-layer feature maps to detect small targets and the high-layer feature maps to detect medium and large targets;
step 2) introducing a channel attention mechanism by adding an attention model to the two fused feature maps that have the richest detail and semantic information and are the most sensitive to small-target detection; that is, a mask is applied to a feature map to realize the attention mechanism and identify the features of the region of interest. Through continuous training, the network learns which regions of each image need attention and suppresses the influence of other, interfering regions, thereby enhancing the detection capability of the algorithm for small targets.
In step 1), the input image has size 300 × 300 and the feature maps extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1. Following the idea of the feature pyramid network, the 6 extracted feature maps are fused pairwise from small to large: 1 × 1 with 3 × 3, 3 × 3 with 5 × 5, 5 × 5 with 10 × 10, 10 × 10 with 19 × 19, and 19 × 19 with 38 × 38. The fused feature maps still have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1.
In step 2), an attention model is added to the fused feature maps according to the principle of the attention mechanism. The fused 38 × 38 and 19 × 19 feature maps contain the richest information; to preserve the real-time performance of the detection algorithm and reduce the computational cost, the attention model is added only to these two feature maps. Adding the attention model strengthens the extraction of small-target features.
The detection process of the improved target detection algorithm is as follows:
a) Target detection is based on a single-stage network model: using the idea of regression, the category and bounding box of the target are regressed directly on the input image by a convolutional neural network. First, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large. In the original SSD algorithm, the multi-scale feature maps extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1. Taking the (1, 1) and (3, 3) feature maps as an example:
First, the (1, 1) feature map is up-sampled: on the basis of the original pixels, new elements are inserted between pixels with a suitable interpolation algorithm, enlarging the feature map to the size of the (3, 3) feature map. Then a 1 × 1 convolution is applied to the (3, 3) feature map to change its number of channels so that it matches the up-sampled feature map. Finally the two are fused, and a 3 × 3 convolution is applied to the fused feature map to eliminate the aliasing effect of up-sampling. Fusion between the other adjacent feature maps follows the same method. The fusion yields 6 feature maps of sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, all containing rich semantic information and detail information.
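The fusion step above can be sketched with NumPy. This is a simplified illustration: nearest-neighbour interpolation stands in for the interpolation method, the 1 × 1 and 3 × 3 convolutions are omitted, and the channel counts are assumed to match already.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Enlarge a (C, H, W) feature map by nearest-neighbour interpolation."""
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return fmap[:, rows][:, :, cols]

def fuse(high, low):
    """Fuse a smaller high-level map into a larger low-level map:
    up-sample 'high' to the size of 'low', then add element-wise."""
    _, h, w = low.shape
    return low + upsample_nearest(high, h, w)

high = np.ones((256, 1, 1))   # stand-in for the 1 x 1 feature map
low = np.zeros((256, 3, 3))   # stand-in for the 3 x 3 feature map
fused = fuse(high, low)
print(fused.shape)  # (256, 3, 3)
```

In the actual network the element-wise addition would be followed by the 3 × 3 convolution described above to suppress up-sampling aliasing.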
b) Channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the feature maps fused in step a). The fused 38 × 38 and 19 × 19 feature maps contain the most abundant information; to preserve the real-time performance of the algorithm, the attention model is added only to these two feature maps. Adding the attention model proceeds in three steps: squeeze, excitation, and attention.
The squeeze operation is given by formula (1):

y_c = (1 / (H × W)) Σ_(i=1)^(H) Σ_(j=1)^(W) u_c(i, j)  (1)

where H and W are the height and width of the input, U is the input, Y is the output, and C is the number of input channels. Formula (1) converts the H × W × C input into a 1 × 1 × C output, which corresponds to a global average pooling operation.
The excitation operation is given by formula (2):

S = h-Swish(W_2 × ReLU6(W_1 Y))  (2)

where Y is the output of the squeeze operation and S is the output of the excitation operation; W_1 has dimension C/r × C and W_2 has dimension C × C/r, where r is a scaling parameter set to 4. Multiplying W_1 by Y corresponds to a fully connected layer, followed by the ReLU6 activation function; multiplying by W_2 is again a fully connected layer, followed by the hard-Swish activation function, which completes the excitation operation. The ReLU6 and hard-Swish activation functions are given in formula (3):

ReLU6(x) = min(max(0, x), 6), h-Swish(x) = x · ReLU6(x + 3) / 6  (3)
The attention operation is given by formula (4):

X = S × U  (4)

where X is the feature map after the attention mechanism is applied, U is the original input, and S is the output of the excitation operation; the weight of each channel is multiplied by the features of the corresponding feature map.
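The squeeze, excitation, and attention steps (formulas (1) to (4)) can be sketched with NumPy; the weight matrices W1 and W2 below are random placeholders for the learned fully connected layers, and the channel count is a toy value.

```python
import numpy as np

def relu6(x):
    """Formula (3): ReLU6(x) = min(max(0, x), 6)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    """Formula (3): h-Swish(x) = x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0

def se_attention(u, w1, w2):
    """u: (C, H, W) input; w1: (C//r, C); w2: (C, C//r).
    Returns the reweighted feature map X of formula (4)."""
    y = u.mean(axis=(1, 2))            # squeeze: global average pooling -> (C,)
    s = h_swish(w2 @ relu6(w1 @ y))    # excitation: two FC layers -> (C,)
    return s[:, None, None] * u        # attention: channel-wise reweighting

rng = np.random.default_rng(0)
c, r = 8, 4                            # r = 4 as stated in the text
u = rng.standard_normal((c, 5, 5))
w1 = rng.standard_normal((c // r, c))
w2 = rng.standard_normal((c, c // r))
x = se_attention(u, w1, w2)
print(x.shape)  # (8, 5, 5)
```

The output keeps the input's shape; only the per-channel weighting changes, which is what lets the network emphasize channels carrying small-target features.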
c) For the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different scales and aspect ratios are set at each cell. The scale of the candidate boxes is computed by formula (5):

s_k = s_min + ((s_max − s_min) / (m − 1)) (k − 1), k ∈ [1, m]  (5)

where m is the number of feature layers, s_k is the ratio of the candidate box to the picture, and s_max and s_min are the maximum and minimum of this ratio, set to 0.9 and 0.2 respectively; formula (5) gives the scale of each candidate box.

The aspect ratio generally takes the values a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate box are computed by formula (6):

w_k^a = s_k √(a_r), h_k^a = s_k / √(a_r)  (6)

For the candidate box with aspect ratio 1, an additional candidate box with scale s'_k = √(s_k s_(k+1)) is added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature layer.
d) A 3 × 3 convolution kernel is applied to the multi-scale feature maps to predict the category and confidence, and the target detection algorithm is trained. The loss function for model training is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if matched, otherwise x = 0); c is the predicted class confidence; g is the location parameter of the ground-truth box; l is the predicted location of the prediction box; and the weight coefficient α is set to 1.

For the localization loss in SSD, the offsets of the candidate-box center (cx, cy), width (w) and height (h) are regressed with the Smooth L1 loss:

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij^k smooth_L1(l_i^m − ĝ_j^m)

For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = − Σ_(i∈Pos) x_ij^p log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0), with ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
and then training the improved target detection algorithm model.
In this embodiment, the PASCAL VOC2007 and PASCAL VOC2012 data sets are used as the training set for model training, and data augmentation is applied to expand the training images through operations such as horizontal flipping and random cropping.
Data used for the experiment: the PASCAL VOC data set is a standardized data set for image recognition and classification. It contains 20 categories: person, bird, cat, cow, dog, horse, sheep, airplane, bicycle, boat, bus, car, motorcycle, train, bottle, chair, dining table, potted plant, sofa and television.
This embodiment is trained on the VOC2007 and VOC2012 data sets described above and tested on the VOC2007 data set. Training uses stochastic gradient descent (SGD) with a batch size of 32, an initial learning rate of 0.001 and a momentum of 0.9; the learning rate is reduced by 90% at iterations 100000 and 150000, and training runs for 200000 iterations.
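The learning-rate schedule described above can be sketched as follows; the function name is illustrative, and "reduced by 90%" is read as multiplication by 0.1.

```python
def learning_rate(iteration, base_lr=0.001, steps=(100000, 150000), gamma=0.1):
    """Step schedule: multiply the base learning rate by gamma at each
    milestone iteration that has been reached."""
    lr = base_lr
    for step in steps:
        if iteration >= step:
            lr *= gamma
    return lr

print(learning_rate(50000))   # 0.001
print(learning_rate(120000))  # ~1e-4 (after the first decay)
print(learning_rate(180000))  # ~1e-5 (after both decays)
```

This mirrors the common "multi-step" schedule used for SSD-style training, where the decay milestones fall late in training so most iterations run at the full rate.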
To verify the detection effect of the improved single-stage target detection algorithm of this embodiment, the applicant selects the test set of the PASCAL VOC2007 data set for detection and uses the mAP (mean average precision) as the evaluation index. Each detected category yields a precision-recall (P-R) curve; the area under the curve is the AP value, and averaging the AP values over all categories gives the mAP. The detection effect is compared with other mainstream target detection models both subjectively and objectively (see Tables 1 and 2).
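The AP and mAP computation described above can be sketched as follows. This is a simplified illustration: the rectangle-rule integration over sorted recall points is an assumption, since PASCAL VOC actually uses an interpolated variant of the AP.

```python
def average_precision(recalls, precisions):
    """Area under a P-R curve, given monotonically increasing recall values
    and the precision attained at each recall level."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p   # rectangle rule over each recall increment
        prev_r = r
    return ap

def mean_average_precision(per_class_curves):
    """mAP: mean of the per-class AP values."""
    aps = [average_precision(r, p) for r, p in per_class_curves]
    return sum(aps) / len(aps)

# a perfect detector for one class: precision 1.0 at every recall level
print(average_precision([0.5, 1.0], [1.0, 1.0]))  # 1.0
```

A detector that reaches full recall at precision 1.0 scores AP = 1.0; lower precision at any recall level shrinks the area and therefore the AP.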
TABLE 1
TABLE 2
For the subjective evaluation of the detection effect, the results of the original SSD algorithm and the improved detection algorithm are compared (as shown in FIG. 2, pictures a1, a2, a3, a4 and a5 are detections of the original SSD algorithm, and pictures b1, b2, b3, b4 and b5 are detections of the improved target detection algorithm). As the figure shows, compared with the original SSD algorithm the improved algorithm markedly reduces missed detections, handles densely distributed small targets better, and detects more targets; the detection effect is clearly improved over the original SSD algorithm.
Claims (3)
1. An improved target detection algorithm based on a feature pyramid network and an attention mechanism is characterized by comprising the following steps:
step 1) combining the principle of the feature pyramid network, performing feature fusion, in order of feature-map size from small to large, on the 6 multi-scale feature maps extracted from the input image by the base network VGG-16 of the original SSD algorithm, obtaining feature maps that fuse different layers; the fused feature maps simultaneously contain rich semantic information and detail information;
in the original SSD algorithm, the scales of the feature maps extracted from the input image by the base network VGG-16 decrease progressively: the bottom-layer feature maps have high resolution and contain more detail information, while the high-layer feature maps have low resolution and contain more abstract semantic information; the original SSD algorithm therefore uses the bottom-layer feature maps to detect small targets and the high-layer feature maps to detect medium and large targets;
step 2) introducing a channel attention mechanism by adding an attention model to the two fused feature maps that have the richest detail and semantic information and are the most sensitive to small-target detection; that is, a mask is applied to a feature map to realize the attention mechanism and identify the features of the region of interest; through continuous training, the network learns which regions of each image need attention and suppresses the influence of other, interfering regions, thereby enhancing the detection capability of the algorithm for small targets.
2. The algorithm of claim 1, wherein the size of the input image in step 1) is 300 × 300 and the feature maps for detection obtained through the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1; according to the principle of the feature pyramid network, feature fusion is carried out on the feature maps for detection in order of size from small to large, yielding 6 feature maps whose sizes are still 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1.
3. The algorithm according to claim 1, wherein in step 2) an attention model is added to the feature maps fused according to the feature pyramid principle in step 1); because fusion proceeds in order of feature-map size from small to large, the fused (38, 38) and (19, 19) feature maps contain the most abundant information and, compared with the other feature maps, have richer detail and semantic information and are more sensitive to small-target detection; to maintain the detection speed and reduce the computational cost, the attention model is added only to the fused (38, 38) and (19, 19) feature maps, and the detection process of the target detection algorithm is as follows:
a) target detection is based on a single-stage network model: using the idea of regression, the category and bounding box of the target are regressed directly on the input image by a convolutional neural network; first, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large; in the original SSD algorithm, the multi-scale feature maps of the input image extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, and fusion according to the feature pyramid principle, in order of size from small to large, yields 6 feature maps of the same sizes, all containing rich semantic information and detail information;
b) channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the feature maps fused in step a); the fused 38 × 38 and 19 × 19 feature maps contain the most abundant information, and to preserve the real-time performance of the algorithm the attention model is added only to these two feature maps;
c) According to the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set at each cell, and the scale of the candidate boxes is calculated by formula (1):

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1),  k ∈ [1, m]    (1)

where m is the number of feature layers; s_k is the ratio of the candidate box to the picture; s_max and s_min are the maximum and minimum values of this ratio, set to 0.9 and 0.2 respectively. The scale of each candidate box is obtained from formula (1).
For the aspect ratio, the values are generally a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate box are calculated by formula (2):

w_k^a = s_k · √a_r,  h_k^a = s_k / √a_r    (2)

For the candidate box with aspect ratio 1, an additional candidate box with scale s'_k = √(s_k · s_{k+1}) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), i, j ∈ [0, |f_k|), where |f_k| denotes the size of the k-th feature layer.
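Formulas (1) and (2) are easy to check numerically; the short script below evaluates them with the values given in the claim (s_min = 0.2, s_max = 0.9, m = 6 feature layers, and the usual SSD aspect-ratio set).

```python
import math

def candidate_scales(m=6, s_min=0.2, s_max=0.9):
    """Formula (1): linearly spaced scales s_k for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def box_dims(s_k, aspect_ratios=(1, 2, 3, 1/2, 1/3)):
    """Formula (2): width s_k*sqrt(a_r), height s_k/sqrt(a_r) per aspect ratio."""
    return [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]

scales = candidate_scales()
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]

# extra box for aspect ratio 1 uses the geometric mean s'_k = sqrt(s_k * s_{k+1})
extra = math.sqrt(scales[0] * scales[1])
print(round(extra, 3))
```

Note that every box with a_r ≠ 1 keeps area s_k², since width × height = s_k²; only its shape changes with the aspect ratio.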
d) The category and confidence of the multi-scale feature maps are detected by convolution with a 3 × 3 kernel, and the target detection algorithm is trained. The loss function during model training is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))    (3)

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box is matched to a ground-truth box (x = 1 if matched, otherwise x = 0); c is the predicted class confidence; g are the position parameters of the ground-truth box; l are the predicted positions of the predicted box; α is a weight coefficient, set to 1.
For the localization loss in SSD, the offsets of the candidate box center (cx, cy), width (w) and height (h) are regressed with the Smooth L1 loss:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx, cy, w, h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m),

where smooth_L1(x) = 0.5 x² if |x| < 1, and |x| − 0.5 otherwise.
For the confidence loss in SSD, the standard softmax loss is used:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0),  where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p).
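The loss in formula (3) and its two terms can be sketched numerically. The NumPy code below assumes the standard SSD formulation (Smooth L1 over matched-box offsets, softmax cross-entropy over class scores) and is illustrative rather than the patented training code.

```python
import numpy as np

def smooth_l1(d):
    """Smooth L1 loss: 0.5*d^2 where |d| < 1, |d| - 0.5 elsewhere."""
    d = np.asarray(d, dtype=float)
    return np.where(np.abs(d) < 1.0, 0.5 * d * d, np.abs(d) - 0.5)

def softmax_loss(logits, label):
    """Softmax cross-entropy for one box: -log(softmax(logits)[label])."""
    z = logits - np.max(logits)              # subtract max for stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_loss(loc_offsets, conf_logits, labels, alpha=1.0):
    """Formula (3): (1/N) * (L_conf + alpha * L_loc) over N matched boxes."""
    n = len(labels)
    l_loc = smooth_l1(loc_offsets).sum()
    l_conf = sum(softmax_loss(c, y) for c, y in zip(conf_logits, labels))
    return (l_conf + alpha * l_loc) / n

print(smooth_l1([0.5, 2.0]))   # 0.125 and 1.5
```

The Smooth L1 branch point at |d| = 1 is what makes the localization term robust: small offsets are penalized quadratically, large ones only linearly, so outlier boxes do not dominate the gradient.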
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010710684.XA CN111914917A (en) | 2020-07-22 | 2020-07-22 | Target detection improved algorithm based on feature pyramid network and attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111914917A true CN111914917A (en) | 2020-11-10 |
Family
ID=73280105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010710684.XA Pending CN111914917A (en) | 2020-07-22 | 2020-07-22 | Target detection improved algorithm based on feature pyramid network and attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914917A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180182109A1 (en) * | 2016-12-22 | 2018-06-28 | TCL Research America Inc. | System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles |
US20190341025A1 (en) * | 2018-04-18 | 2019-11-07 | Sony Interactive Entertainment Inc. | Integrated understanding of user characteristics by multimodal processing |
CN110533084A (en) * | 2019-08-12 | 2019-12-03 | 长安大学 | A kind of multiscale target detection method based on from attention mechanism |
CN110674866A (en) * | 2019-09-23 | 2020-01-10 | 兰州理工大学 | Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network |
CN110705457A (en) * | 2019-09-29 | 2020-01-17 | 核工业北京地质研究院 | Remote sensing image building change detection method |
CN111179217A (en) * | 2019-12-04 | 2020-05-19 | 天津大学 | Attention mechanism-based remote sensing image multi-scale target detection method |
CN111401201A (en) * | 2020-03-10 | 2020-07-10 | 南京信息工程大学 | Aerial image multi-scale target detection method based on spatial pyramid attention drive |
Non-Patent Citations (3)
Title |
---|
XU Chengqi; HONG Xuehai: "Feature pyramid object detection network based on function preservation", Pattern Recognition and Artificial Intelligence, no. 06, 15 June 2020 (2020-06-15) *
SHEN Wenxiang; QIN Pinle; ZENG Jianchao: "Indoor crowd detection network based on multi-level features and hybrid attention mechanism", Journal of Computer Applications, no. 12 *
GAO Jianling; SUN Jian; WANG Ziniu; HAN Yulu; FENG Jiaojiao: "SSD object detection algorithm based on attention mechanism and feature fusion", Software, no. 02, 15 February 2020 (2020-02-15) *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112418345A (en) * | 2020-12-07 | 2021-02-26 | 苏州小阳软件科技有限公司 | Method and device for quickly identifying fine-grained small target |
CN112418345B (en) * | 2020-12-07 | 2024-02-23 | 深圳小阳软件有限公司 | Method and device for quickly identifying small targets with fine granularity |
CN112465057A (en) * | 2020-12-08 | 2021-03-09 | 中国人民解放军空军工程大学 | Target detection and identification method based on deep convolutional neural network |
CN112819737A (en) * | 2021-01-13 | 2021-05-18 | 西北大学 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
CN112837747A (en) * | 2021-01-13 | 2021-05-25 | 上海交通大学 | Protein binding site prediction method based on attention twin network |
CN112837747B (en) * | 2021-01-13 | 2022-07-12 | 上海交通大学 | Protein binding site prediction method based on attention twin network |
CN112819737B (en) * | 2021-01-13 | 2023-04-07 | 西北大学 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
CN113158738A (en) * | 2021-01-28 | 2021-07-23 | 中南大学 | Port environment target detection method, system, terminal and readable storage medium based on attention mechanism |
CN113158738B (en) * | 2021-01-28 | 2022-09-20 | 中南大学 | Port environment target detection method, system, terminal and readable storage medium based on attention mechanism |
CN113177579A (en) * | 2021-04-08 | 2021-07-27 | 北京科技大学 | Feature fusion method based on attention mechanism |
CN113255443A (en) * | 2021-04-16 | 2021-08-13 | 杭州电子科技大学 | Pyramid structure-based method for positioning time sequence actions of graph attention network |
CN113255443B (en) * | 2021-04-16 | 2024-02-09 | 杭州电子科技大学 | Graph annotation meaning network time sequence action positioning method based on pyramid structure |
CN113409249A (en) * | 2021-05-17 | 2021-09-17 | 上海电力大学 | Insulator defect detection method based on end-to-end algorithm |
CN114387202A (en) * | 2021-06-25 | 2022-04-22 | 南京交通职业技术学院 | 3D target detection method based on vehicle end point cloud and image fusion |
CN113408549A (en) * | 2021-07-14 | 2021-09-17 | 西安电子科技大学 | Few-sample weak and small target detection method based on template matching and attention mechanism |
CN113408549B (en) * | 2021-07-14 | 2023-01-24 | 西安电子科技大学 | Few-sample weak and small target detection method based on template matching and attention mechanism |
CN113807291B (en) * | 2021-09-24 | 2024-04-26 | 南京莱斯电子设备有限公司 | Airport runway foreign matter detection and identification method based on feature fusion attention network |
CN113807291A (en) * | 2021-09-24 | 2021-12-17 | 南京莱斯电子设备有限公司 | Airport runway foreign matter detection and identification method based on feature fusion attention network |
CN113920468A (en) * | 2021-12-13 | 2022-01-11 | 松立控股集团股份有限公司 | Multi-branch pedestrian detection method based on cross-scale feature enhancement |
CN114220015A (en) * | 2021-12-21 | 2022-03-22 | 一拓通信集团股份有限公司 | Improved YOLOv 5-based satellite image small target detection method |
CN114972860A (en) * | 2022-05-23 | 2022-08-30 | 郑州轻工业大学 | Target detection method based on attention-enhanced bidirectional feature pyramid network |
CN115019169A (en) * | 2022-05-31 | 2022-09-06 | 海南大学 | Single-stage water surface small target detection method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111914917A (en) | Target detection improved algorithm based on feature pyramid network and attention mechanism | |
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
CN110348376B (en) | Pedestrian real-time detection method based on neural network | |
CN111079584A (en) | Rapid vehicle detection method based on improved YOLOv3 | |
CN110532970B (en) | Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces | |
JP7097641B2 (en) | Loop detection method based on convolution perception hash algorithm | |
CN109034092A (en) | Accident detection method for monitoring system | |
CN110287777B (en) | Golden monkey body segmentation algorithm in natural scene | |
CN113255589B (en) | Target detection method and system based on multi-convolution fusion network | |
Lyu et al. | Small object recognition algorithm of grain pests based on SSD feature fusion | |
CN113313031B (en) | Deep learning-based lane line detection and vehicle transverse positioning method | |
CN113610046B (en) | Behavior recognition method based on depth video linkage characteristics | |
CN111860587A (en) | Method for detecting small target of picture | |
CN108133235A (en) | A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure | |
CN113139489A (en) | Crowd counting method and system based on background extraction and multi-scale fusion network | |
CN116129291A (en) | Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device | |
CN116052271A (en) | Real-time smoking detection method and device based on CenterNet | |
CN116452966A (en) | Target detection method, device and equipment for underwater image and storage medium | |
CN113887649B (en) | Target detection method based on fusion of deep layer features and shallow layer features | |
WO2022205329A1 (en) | Object detection method, object detection apparatus, and object detection system | |
CN112070181B (en) | Image stream-based cooperative detection method and device and storage medium | |
CN110136098B (en) | Cable sequence detection method based on deep learning | |
CN110309786B (en) | Lactating sow posture conversion identification method based on depth video | |
CN117409244A (en) | SCKConv multi-scale feature fusion enhanced low-illumination small target detection method | |
CN117079125A (en) | Kiwi fruit pollination flower identification method based on improved YOLOv5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||