CN111914917A - Target detection improved algorithm based on feature pyramid network and attention mechanism - Google Patents

Target detection improved algorithm based on feature pyramid network and attention mechanism

Info

Publication number
CN111914917A
Authority
CN
China
Prior art keywords
feature
algorithm
fusion
network
detection
Prior art date
Legal status
Pending
Application number
CN202010710684.XA
Other languages
Chinese (zh)
Inventor
王燕妮
刘祥
翟会杰
余丽仙
孙雪松
Current Assignee
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Application filed by Xian University of Architecture and Technology
Priority to CN202010710684.XA
Publication of CN111914917A
Legal status: Pending

Classifications

    • G06F 18/253 — Pattern recognition; analysing; fusion techniques of extracted features
    • G06F 18/214 — Pattern recognition; design or setup of recognition systems; generating training patterns, bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06V 2201/07 — Image or video recognition or understanding; target detection


Abstract

The invention discloses an improved target detection algorithm based on a feature pyramid network and an attention mechanism. Following the principle of the feature pyramid network, the algorithm fuses the 6 multi-scale feature maps extracted by the base network of the original SSD algorithm; each fused feature map contains rich context information, which improves detection capability. An attention model is then added to the fused feature maps so that the feature information of small targets is extracted effectively. Missed detections are reduced and the robustness of the algorithm is improved, while the detection speed still meets the real-time requirement.

Description

Target detection improved algorithm based on feature pyramid network and attention mechanism
Technical Field
The invention belongs to the field of digital image processing, relates to target detection, and particularly relates to an improved target detection algorithm based on a feature pyramid network and an attention mechanism.
Background
The task of target detection is to find targets of interest in an image and determine their categories and positions. It is one of the core problems of computer vision and is widely applied in infrared detection, intelligent video surveillance, remote-sensing target detection, medical diagnosis, and fire and smoke detection in intelligent buildings. Target detection algorithms can be divided into traditional algorithms and deep-learning-based algorithms. Traditional algorithms are represented by the SIFT and V-J detectors, but they have high time complexity and poor robustness. Deep-learning-based algorithms include R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD and others. Although many excellent target detection algorithms exist, their detection performance is still insufficient, which leads to missed detections, false detections and similar problems.
Disclosure of Invention
In view of the above-mentioned drawbacks and disadvantages of the prior art, an object of the present invention is to provide an improved algorithm for object detection based on a feature pyramid network and an attention mechanism.
In order to realize the task, the invention adopts the following technical solution:
an improved target detection algorithm based on a feature pyramid network and an attention mechanism is characterized by comprising the following steps:
step 1) following the principle of the feature pyramid network, extract the 6 multi-scale feature maps of the input image from the base network VGG-16 of the original SSD algorithm, and perform feature fusion in order of feature-map size from small to large, obtaining feature maps that fuse different layers; the fused feature maps contain both rich semantic information and detail information;
in the original SSD algorithm, the scale of a feature map extracted from an input image through a basic network VGG-16 is gradually decreased from large to small, wherein the resolution of a bottom-layer feature map is large and contains more detailed information, and the resolution of a high-layer feature map is small and contains more abstract semantic information, so that the original SSD algorithm uses the bottom-layer feature map for detecting small targets and the high-layer feature map for detecting medium and large targets;
step 2) introduce a channel attention mechanism and add an attention model to the two fused feature maps that have the richest detail and semantic information and are most sensitive to small-target detection; that is, a mask is added to a feature map to realize the attention mechanism: the features of the region of interest are marked, the network learns through continuous training which region of each image needs attention, and the influence of other interfering regions is suppressed, thereby enhancing the detection capability of the algorithm for small target objects.
According to the invention, the size of the input image in step 1) is 300 × 300, and the sizes of the feature maps for detection obtained after passing through the basic network VGG-16 are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, respectively. According to the principle of the feature pyramid network, feature fusion is sequentially carried out on feature graphs from small to large in size, and 6 feature graphs with the sizes still being 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1 are obtained.
Further, in step 2) an attention model is added to the feature maps fused according to the feature pyramid principle in step 1). Because the fusion proceeds in order of feature-map size from small to large, the fused 38 × 38 and 19 × 19 feature maps contain the richest information: compared with the other feature maps they have more detail and semantic information and are more sensitive to small-target detection. In order to maintain the detection speed and reduce the computation of the algorithm, the attention model is added only to these two fused feature maps. The detection process of the algorithm is as follows:
a) target detection is based on a single-stage network model: using the idea of regression, the category and bounding box of the target are regressed directly from the input image by a convolutional neural network. First, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large. In the original SSD algorithm the multi-scale feature maps extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1; after fusion, 6 feature maps of the same sizes are obtained, all containing rich semantic and detail information.
b) channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the feature maps fused in step a); the fused 38 × 38 and 19 × 19 feature maps contain the richest information, and to keep the algorithm real-time the attention model is added only to these two feature maps.
c) on each cell of the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set; the scale of the candidate boxes is calculated by the following formula (1):

s_k = s_min + (s_max − s_min)/(m − 1) · (k − 1),  k ∈ [1, m]    (1)

where m is the number of feature layers; s_k is the ratio of the candidate box to the picture; s_max and s_min are the maximum and minimum of this ratio, set to 0.9 and 0.2 respectively. Formula (1) gives the scale of each candidate box.

The aspect ratio generally takes the values a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate box are calculated by the following formula (2):

w_k^a = s_k · √a_r,  h_k^a = s_k / √a_r    (2)

For the aspect ratio of 1, a candidate box with the additional scale s′_k = √(s_k · s_{k+1}) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature map;
d) the category and confidence of the multi-scale feature maps are detected by convolution with a 3 × 3 kernel, and the target detection algorithm is trained. The loss function during model training is defined as the weighted sum of the position loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a real box (x = 1 if matched, otherwise x = 0); c is the predicted category confidence; g are the position parameters of the real box; l are the predicted positions of the predicted box; the weight coefficient α is set to 1.

For the position loss in SSD, the offsets of the center (cx, cy), width (w) and height (h) of the candidate box are regressed with the Smooth L1 loss:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^cx = (g_j^cx − d_i^cx)/d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy)/d_i^h,  ĝ_j^w = log(g_j^w/d_i^w),  ĝ_j^h = log(g_j^h/d_i^h)

smooth_L1(x) = 0.5x²  if |x| < 1,  |x| − 0.5  otherwise

For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = −Σ_{i∈Pos} x_{ij}^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0),  where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
the invention relates to a target detection improved algorithm based on a feature pyramid network and an attention mechanism, which is based on a single-stage target detection algorithm (SSD) algorithm, takes the influence of the resolution of a feature map on the target detection performance into consideration, improves the original algorithm, combines the thought of the feature pyramid network, fuses multi-scale feature maps extracted by the original SSD algorithm, and fuses to form a feature map with abundant semantic information and detailed information; and combining the principle of an attention mechanism, adding an attention model for the two feature maps with the fused sizes of 38 multiplied by 38 and 19 multiplied by 19 so as to enhance the recognition effect on the small target object.
Drawings
FIG. 1 is a schematic diagram of a network architecture for an object detection algorithm that combines a feature pyramid network and an attention mechanism;
FIG. 2 is a picture comparing the detection effect of the original SSD algorithm with the improved target detection algorithm, wherein the left graph a1, the graph a2, the graph a3, the graph a4 and the graph a5 are the detection pictures of the original SSD algorithm; the right-hand graph b1, the graph b2, the graph b3, the graph b4 and the graph b5 are all detection pictures of the improved target detection algorithm.
The invention is described in further detail below with reference to the figures and examples.
Detailed Description
The invention discloses an improved target detection algorithm based on a feature pyramid network and an attention mechanism. Following the principle of the feature pyramid network, the 6 feature maps extracted by the original SSD algorithm are fused into new feature maps that have both rich semantic information and detail information; an attention model is then added to the fused feature maps, but to keep the algorithm real-time it is added only to the 38 × 38 and 19 × 19 feature maps, which contain the richest information and are most sensitive to small-target detection. These improvements raise the detection capability of the algorithm and alleviate problems such as missed detection.
The embodiment provides an improved target detection algorithm based on a feature pyramid network and an attention mechanism, which comprises the following steps:
step 1) following the principle of the feature pyramid network, extract the 6 multi-scale feature maps of the input image from the base network VGG-16 of the original SSD algorithm, and perform feature fusion in order of feature-map size from small to large, obtaining feature maps that fuse different layers; the fused feature maps contain both rich semantic information and detail information;
in the original SSD algorithm, the scale of a feature map extracted from an input image through a basic network VGG-16 is gradually decreased from large to small, wherein the resolution of a bottom-layer feature map is large and contains more detailed information, and the resolution of a high-layer feature map is small and contains more abstract semantic information, so that the original SSD algorithm uses the bottom-layer feature map for detecting small targets and the high-layer feature map for detecting medium and large targets;
step 2) introduce a channel attention mechanism and add an attention model to the two fused feature maps that have the richest detail and semantic information and are most sensitive to small-target detection; that is, a mask is added to a feature map to realize the attention mechanism: the features of the region of interest are marked, the network learns through continuous training which region of each image needs attention, and the influence of other interfering regions is suppressed, thereby enhancing the detection capability of the algorithm for small target objects.
In step 1), the size of the input image is 300 × 300, the sizes of the feature maps extracted through the basic network VGG-16 are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1, respectively, and the extracted 6 feature maps are fused from small to large in size by combining the idea of the feature pyramid network, that is, 1 × 1 and 3 × 3, 3 × 3 and 5 × 5, 5 × 5 and 10 × 10, 10 × 10 and 19 × 19, and 19 × 19 and 38 × 38. The sizes of the fused feature maps are still 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1.
In the step 2), an attention model is added to the fused feature map in combination with the principle of an attention mechanism, because the 38 × 38 and 19 × 19 feature maps after feature fusion contain the richest information, and in order to keep the real-time performance of the detection algorithm and reduce the calculation amount, the attention model is only added to the two feature maps, and the extraction of the features of the small target object can be enhanced after the attention model is added.
The detection process of the improved target detection algorithm is as follows:
a) target detection is based on a single-stage network model: using the idea of regression, the category and bounding box of the target are regressed directly from the input image by a convolutional neural network. First, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large. In the original SSD algorithm the multi-scale feature maps extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1; feature fusion proceeds, following the feature pyramid principle, in order of size from small to large. Taking the 1 × 1 and 3 × 3 feature maps as an example:
First, the 1 × 1 feature map is up-sampled: using an interpolation method, new elements are inserted between the original pixels with a suitable interpolation algorithm, enlarging the feature map until it matches the size of the 3 × 3 feature map. Then a 1 × 1 convolution is applied to the 3 × 3 feature map to change its number of channels so that it equals the number of channels of the up-sampled feature map. Finally the two are fused, and a 3 × 3 convolution kernel is applied to the fused feature map to eliminate the aliasing effect of up-sampling. Fusion between the other adjacent feature maps proceeds in the same way. The fusion yields 6 feature maps of sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, all containing rich semantic and detail information.
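The fusion step just described (up-sample the smaller map, match channels with a 1 × 1 convolution, add element-wise) can be sketched in NumPy. This is only a minimal illustration, not the patented implementation: nearest-neighbour interpolation stands in for the unspecified interpolation algorithm, the trailing 3 × 3 anti-aliasing convolution is omitted, and all array names and sizes are hypothetical.

```python
import numpy as np

def upsample_nearest(x, out_h, out_w):
    """Enlarge an (H, W, C) feature map to (out_h, out_w, C) by
    nearest-neighbour interpolation (a stand-in for the patent's
    unspecified interpolation algorithm)."""
    in_h, in_w = x.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return x[rows][:, cols]

def conv1x1(x, w):
    """A 1x1 convolution is per-pixel channel mixing; w: (C_in, C_out)."""
    return x @ w

def fuse(small, large, w_channels):
    """Fuse a smaller (higher-level) map into the next larger map:
    up-sample `small` to the size of `large`, adjust the channels of
    `large` with a 1x1 convolution, then add element-wise."""
    up = upsample_nearest(small, large.shape[0], large.shape[1])
    lat = conv1x1(large, w_channels)   # match the channel count
    return up + lat                    # element-wise fusion

# example: fuse a 3x3x8 map into a 5x5x4 map (channels 4 -> 8)
rng = np.random.default_rng(0)
small = rng.normal(size=(3, 3, 8))
large = rng.normal(size=(5, 5, 4))
fused = fuse(small, large, rng.normal(size=(4, 8)))
print(fused.shape)   # (5, 5, 8)
```

Note that the index-mapping form of the up-sampler handles the non-integer size ratios that occur in the pyramid here (e.g. 3 → 5 and 10 → 19).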
b) channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the feature maps fused in step a); the fused 38 × 38 and 19 × 19 feature maps contain the richest information, and to keep the algorithm real-time the attention model is added only to these two feature maps. The process of adding the attention model has three steps: squeeze, excitation and attention.
The squeeze operation is given by the following formula (1):

Y_c = F_sq(U_c) = (1/(H × W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} U_c(i, j)    (1)

where H and W are the height and width of the input, U is the input, Y is the output, and C is the number of input channels. Formula (1) converts the H × W × C input into a 1 × 1 × C output, which corresponds to a global average pooling operation.
The excitation operation is given by the following formula (2):

S = h-Swish(W2 · ReLU6(W1 · Y))    (2)

where Y is the output of the squeeze operation and S is the output of the excitation operation; W1 has dimension (C/r) × C and W2 has dimension C × (C/r), where r is a scaling parameter set to 4. Multiplying Y by W1 is a fully connected operation followed by the ReLU6 activation function; multiplying by W2 is another fully connected operation followed by the hard-Swish activation function, which completes the excitation. The ReLU6 and hard-Swish activation functions are given in formula (3):

ReLU6(x) = min(max(x, 0), 6),  h-Swish(x) = x · ReLU6(x + 3)/6    (3)
The attention operation is given by formula (4):

X = S × U    (4)

where X is the feature map after the attention mechanism is applied, U is the original input, and S is the output of the excitation operation; the weight of each channel is multiplied by the features of that channel.
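The squeeze–excitation–attention pipeline of formulas (1)–(4) can be sketched as one small NumPy function. This is a hedged illustration under the stated dimensions (W1: (C/r) × C, W2: C × (C/r), r = 4); the weight matrices below are random stand-ins for the learned parameters.

```python
import numpy as np

def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6), formula (3)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    """h-Swish(x) = x * ReLU6(x + 3) / 6, formula (3)."""
    return x * relu6(x + 3.0) / 6.0

def channel_attention(U, W1, W2):
    """SE-style channel attention, formulas (1)-(4).
    U : (H, W, C) input feature map
    W1: (C//r, C) and W2: (C, C//r) fully connected weights."""
    # squeeze, formula (1): global average pooling over H and W -> (C,)
    Y = U.mean(axis=(0, 1))
    # excitation, formula (2): FC -> ReLU6 -> FC -> hard-Swish -> (C,)
    S = hard_swish(W2 @ relu6(W1 @ Y))
    # attention, formula (4): reweight each channel of the original input
    return U * S[None, None, :]

# example with C = 8 and r = 4 on a toy 19x19 map
rng = np.random.default_rng(0)
U = rng.normal(size=(19, 19, 8))
W1 = rng.normal(size=(2, 8))   # (C/r, C)
W2 = rng.normal(size=(8, 2))   # (C, C/r)
X = channel_attention(U, W1, W2)
print(X.shape)   # (19, 19, 8)
```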
c) on each cell of the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set; the scale of the candidate boxes is calculated by the following formula (5):

s_k = s_min + (s_max − s_min)/(m − 1) · (k − 1),  k ∈ [1, m]    (5)

where m is the number of feature layers; s_k is the ratio of the candidate box to the picture; s_max and s_min are the maximum and minimum of this ratio, set to 0.9 and 0.2 respectively. Formula (5) gives the scale of each candidate box.

The aspect ratio generally takes the values a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate box are calculated by the following formula (6):

w_k^a = s_k · √a_r,  h_k^a = s_k / √a_r    (6)

For the aspect ratio of 1, a candidate box with the additional scale s′_k = √(s_k · s_{k+1}) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature map;
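The candidate-box geometry above is simple enough to compute directly. A short sketch under the constants stated in the text (s_min = 0.2, s_max = 0.9, m = 6 feature layers); the function names are illustrative.

```python
import math

S_MIN, S_MAX, M = 0.2, 0.9, 6   # constants stated in the text

def box_scale(k, m=M):
    """Scale of the k-th feature layer, k = 1..m (formula (5))."""
    return S_MIN + (S_MAX - S_MIN) * (k - 1) / (m - 1)

def box_wh(s_k, a_r):
    """Width and height for aspect ratio a_r (formula (6))."""
    return s_k * math.sqrt(a_r), s_k / math.sqrt(a_r)

def box_center(i, j, f_k):
    """Center of the box in cell (i, j) of an f_k x f_k feature map."""
    return (i + 0.5) / f_k, (j + 0.5) / f_k

scales = [round(box_scale(k), 2) for k in range(1, M + 1)]
print(scales)   # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
# extra box for aspect ratio 1: s'_k = sqrt(s_k * s_{k+1})
s_extra = math.sqrt(box_scale(1) * box_scale(2))
```

Note that formula (6) preserves the box area for any aspect ratio: w · h = s_k² regardless of a_r.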
d) the category and confidence of the multi-scale feature maps are detected by convolution with a 3 × 3 kernel, and the target detection algorithm is trained. The loss function during model training is defined as the weighted sum of the position loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a real box (x = 1 if matched, otherwise x = 0); c is the predicted category confidence; g are the position parameters of the real box; l are the predicted positions of the predicted box; the weight coefficient α is set to 1.

For the position loss in SSD, the offsets of the center (cx, cy), width (w) and height (h) of the candidate box are regressed with the Smooth L1 loss:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^cx = (g_j^cx − d_i^cx)/d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy)/d_i^h,  ĝ_j^w = log(g_j^w/d_i^w),  ĝ_j^h = log(g_j^h/d_i^h)

smooth_L1(x) = 0.5x²  if |x| < 1,  |x| − 0.5  otherwise

For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = −Σ_{i∈Pos} x_{ij}^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0),  where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
and then training the improved target detection algorithm model.
In this embodiment, the PASCAL VOC2007 and PASCAL VOC2012 data sets are used as the training set for model training, and data augmentation techniques such as horizontal flipping and random cropping are adopted to expand the training images.
Data used for the experiments: the PASCAL VOC data set is a standardized data set for image recognition and classification, containing 20 categories: person, bird, cat, cow, dog, horse, sheep, airplane, bicycle, boat, bus, car, motorcycle, train, bottle, chair, dining table, potted plant, sofa and television.
This embodiment is trained on the VOC2007 and VOC2012 data sets described above and tested on the VOC2007 data set. Training uses stochastic gradient descent (SGD) with a batch size of 32, an initial learning rate of 0.001 and momentum of 0.9; the learning rate is reduced by 90% at 100 000 and 150 000 iterations, and training runs for 200 000 iterations.
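The training schedule just described (initial learning rate 0.001, reduced by 90% at 100 000 and 150 000 iterations, 200 000 iterations in total) can be written as a small step function; the function name is illustrative.

```python
def learning_rate(step, base_lr=0.001, milestones=(100_000, 150_000), gamma=0.1):
    """Step schedule from the embodiment: multiply the learning rate
    by 0.1 (i.e. a 90% reduction) at each milestone iteration."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= gamma
    return lr

# 0.001 until 100k iterations, 0.0001 until 150k, 0.00001 up to 200k
```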
To verify the detection effect of the improved single-stage target detection algorithm of this embodiment, the applicant runs detection on the test set of the PASCAL VOC2007 data set and uses mAP (mean Average Precision) as the evaluation index: for each detected category, a curve of precision versus recall (the P-R curve) is obtained, the area under the curve is the AP value, and averaging the AP values of all categories gives the mAP. The detection effect is compared with other mainstream target detection models both subjectively and objectively (see Tables 1 and 2).
TABLE 1

[Table image not reproduced in the text: objective comparison of detection results with mainstream target detection models.]

TABLE 2

[Table image not reproduced in the text: continuation of the comparison with mainstream target detection models.]
In the subjective evaluation, the detection results of the original SSD algorithm and the improved algorithm are compared (see FIG. 2: graphs a1–a5 are detections by the original SSD algorithm, graphs b1–b5 by the improved target detection algorithm). As the figure shows, the improved algorithm markedly alleviates the missed detections of the original algorithm, handles densely distributed small target objects better, and detects more targets; the detection effect is clearly improved over the original SSD algorithm.
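The mAP evaluation used above (area under each category's P-R curve, averaged over categories) can be sketched as follows. The all-point interpolation used here is an assumption, since the text does not specify which AP integration variant is used.

```python
def average_precision(recalls, precisions):
    """Area under the P-R curve (all-point interpolation):
    make precision monotonically non-increasing from the right,
    then sum precision * recall-step.
    `recalls` must be sorted in increasing order."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))

def mean_ap(per_class_aps):
    """mAP: mean of the per-category AP values."""
    return sum(per_class_aps) / len(per_class_aps)
```

For example, a detector that reaches recall 0.5 at precision 1.0 and then finds nothing more scores an AP of 0.5 for that category.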

Claims (3)

1. An improved target detection algorithm based on a feature pyramid network and an attention mechanism is characterized by comprising the following steps:
step 1) following the principle of the feature pyramid network, perform feature fusion, in order of feature-map size from small to large, on the 6 multi-scale feature maps extracted from the input image by the base network VGG-16 of the original SSD algorithm, obtaining feature maps that fuse different layers; the fused feature maps contain both rich semantic information and detail information;
in the original SSD algorithm, the scale of a feature map extracted from an input image through a basic network VGG-16 is gradually decreased from large to small, wherein the resolution of a bottom-layer feature map is large and contains more detailed information, and the resolution of a high-layer feature map is small and contains more abstract semantic information, so that the original SSD algorithm uses the bottom-layer feature map for detecting small targets and the high-layer feature map for detecting medium and large targets;
step 2) introduce a channel attention mechanism and add an attention model to the two fused feature maps that have the richest detail and semantic information and are most sensitive to small-target detection; that is, a mask is added to a feature map to realize the attention mechanism: the features of the region of interest are marked, the network learns through continuous training which region of each image needs attention, and the influence of other interfering regions is suppressed, thereby enhancing the detection capability of the algorithm for small target objects.
2. The algorithm of claim 1, wherein the size of the input image in step 1) is 300 x 300, and the sizes of the feature maps for detection obtained after passing through the underlying network VGG-16 are 38 x 38, 19 x 19, 10 x 10, 5 x 5, 3 x 3, 1 x 1; according to the principle of the feature pyramid network, feature fusion is sequentially carried out on feature graphs for detection from small to large in size, and 6 feature graphs with the feature graph size still being 38 multiplied by 38, 19 multiplied by 19, 10 multiplied by 10, 5 multiplied by 5, 3 multiplied by 3 and 1 multiplied by 1 are obtained.
3. The algorithm according to claim 1, wherein in step 2) an attention model is added to the feature maps fused according to the feature pyramid principle in step 1); because the fusion proceeds in order of feature-map size from small to large, the fused 38 × 38 and 19 × 19 feature maps contain the richest information, having more detail and semantic information than the other feature maps and being more sensitive to small-target detection; in order to maintain the detection speed and reduce the computation of the algorithm, the attention model is added only to these two fused feature maps, and the detection process of the target detection algorithm is as follows:
a) target detection is based on a single-stage network model: using the idea of regression, the category and bounding box of the target are regressed directly from the input image by a convolutional neural network; first, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large; in the original SSD algorithm the multi-scale feature maps of the input image extracted by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, and fusion yields 6 feature maps of the same sizes, all containing rich semantic and detail information;
b) channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the feature maps fused in step a); the fused 38 × 38 and 19 × 19 feature maps contain the richest information, and to keep the algorithm real-time the attention model is added only to these two feature maps;
c) For the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set in each cell, and the scale of the candidate boxes is calculated according to formula (1):

$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]$ (1)

where m is the number of feature layers; $s_k$ is the ratio of the candidate box to the image; $s_{\max}$ and $s_{\min}$ are the maximum and minimum values of this ratio, set to 0.9 and 0.2 respectively;

the scale of each candidate box is obtained from formula (1);
For the aspect ratio, the values are generally taken as

$a_r \in \{1, 2, 3, \tfrac{1}{2}, \tfrac{1}{3}\}$
and the width and height of the candidate box are calculated according to formula (2):

$w_k^a = s_k \sqrt{a_r}, \quad h_k^a = s_k / \sqrt{a_r}$ (2)
For the candidate box with aspect ratio 1, a candidate box with the additional scale

$s_k' = \sqrt{s_k s_{k+1}}$

is also added; the center coordinates of the candidate boxes are

$\left( \frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|} \right), \quad i, j \in [0, |f_k|)$

where $|f_k|$ denotes the size of the k-th feature map;
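The candidate-box computations of formulas (1) and (2) can be sketched as follows, using m = 6 feature layers and s_min = 0.2, s_max = 0.9 as stated in the claim:

```python
import math

def candidate_box_scales(m, s_min=0.2, s_max=0.9):
    """Formula (1): scale s_k of candidate boxes for each of m feature layers."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def candidate_box_shapes(s_k, s_k1, ratios=(1, 2, 3, 1/2, 1/3)):
    """Formula (2): (width, height) per aspect ratio, plus the extra
    sqrt(s_k * s_{k+1}) box for aspect ratio 1."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    extra = math.sqrt(s_k * s_k1)
    shapes.append((extra, extra))
    return shapes

def box_centers(f_k):
    """Centers ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|) for a feature map of size |f_k|."""
    return [((i + 0.5) / f_k, (j + 0.5) / f_k)
            for i in range(f_k) for j in range(f_k)]

scales = candidate_box_scales(m=6)
print([round(s, 3) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

With these scales, the 38×38 layer uses s_1 = 0.2 and produces 38 × 38 cells, each carrying 6 candidate boxes (5 aspect ratios plus the extra aspect-ratio-1 box).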
d) Using a 3×3 convolution kernel, the categories and confidences on the multi-scale feature maps are detected by convolution, and the target detection algorithm is trained; the loss function used during model training is defined as the weighted sum of the position loss (loc) and the confidence loss (conf), calculated as:

$L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)$

where N is the number of matched candidate boxes; $x \in \{1, 0\}$ indicates whether a candidate box is matched to a ground-truth box (x = 1 if matched, otherwise x = 0); c is the predicted category confidence; g is the position parameter of the ground-truth box; l is the predicted position of the predicted box; and $\alpha$ is a weight coefficient, set to 1;
for the position loss function in SSD, the center (cx, cy) of the candidate frame, and the offset of the width (w) and height (h) are regressed using Smooth L1 loss. The formula is as follows:
Figure FDA0002596425360000041
Figure FDA0002596425360000042
Figure FDA0002596425360000043
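A minimal sketch of the Smooth L1 position loss and the offset encoding above, for a single matched candidate-box/ground-truth pair; the example box coordinates are illustrative, not from the patent:

```python
import math

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode_offsets(g, d):
    """Encode a ground-truth box g against a candidate box d,
    both given as (cx, cy, w, h), per the g-hat formulas above."""
    return {
        "cx": (g[0] - d[0]) / d[2],
        "cy": (g[1] - d[1]) / d[3],
        "w": math.log(g[2] / d[2]),
        "h": math.log(g[3] / d[3]),
    }

def loc_loss(pred, g_hat):
    """Position loss for one matched pair: sum of Smooth L1 terms."""
    return sum(smooth_l1(pred[m] - g_hat[m]) for m in ("cx", "cy", "w", "h"))

# Illustrative boxes in normalized (cx, cy, w, h) coordinates
g_hat = encode_offsets(g=(0.52, 0.50, 0.30, 0.40), d=(0.50, 0.50, 0.25, 0.50))
pred = {"cx": 0.0, "cy": 0.0, "w": 0.0, "h": 0.0}
print(round(loc_loss(pred, g_hat), 4))
```

Encoding the regression targets as relative offsets and log-ratios keeps the targets roughly zero-centered regardless of box scale, which is why the loss is computed on g-hat rather than raw coordinates.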
For the confidence loss function in SSD, the standard softmax loss is used:

$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log(\hat{c}_i^{p}) - \sum_{i \in Neg} \log(\hat{c}_i^{0}), \quad \text{where} \; \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$
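A minimal sketch of the softmax confidence loss above; the three-class logits and the choice of class 0 as background are illustrative assumptions:

```python
import math

def softmax(logits):
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conf_loss(positives, negatives):
    """Softmax confidence loss: -log p(true class) over positive boxes,
    -log p(background, class 0) over selected negative boxes."""
    loss = 0.0
    for logits, label in positives:
        loss -= math.log(softmax(logits)[label])
    for logits in negatives:
        loss -= math.log(softmax(logits)[0])  # class 0 = background
    return loss

# Toy example: 3 classes (0 = background), one positive and one negative box
positives = [([0.1, 2.0, -0.5], 1)]   # (logits, true class)
negatives = [[1.5, 0.2, -0.3]]
loss = conf_loss(positives, negatives)
print(round(loss, 4))
```

In practice SSD selects the negative boxes by hard negative mining so that the ratio of negatives to positives stays bounded (typically 3:1); that selection step is outside the scope of this sketch.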
CN202010710684.XA 2020-07-22 2020-07-22 Target detection improved algorithm based on feature pyramid network and attention mechanism Pending CN111914917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010710684.XA CN111914917A (en) 2020-07-22 2020-07-22 Target detection improved algorithm based on feature pyramid network and attention mechanism

Publications (1)

Publication Number Publication Date
CN111914917A true CN111914917A (en) 2020-11-10

Family

ID=73280105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010710684.XA Pending CN111914917A (en) 2020-07-22 2020-07-22 Target detection improved algorithm based on feature pyramid network and attention mechanism

Country Status (1)

Country Link
CN (1) CN111914917A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182109A1 (en) * 2016-12-22 2018-06-28 TCL Research America Inc. System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles
US20190341025A1 (en) * 2018-04-18 2019-11-07 Sony Interactive Entertainment Inc. Integrated understanding of user characteristics by multimodal processing
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
CN110674866A (en) * 2019-09-23 2020-01-10 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN110705457A (en) * 2019-09-29 2020-01-17 核工业北京地质研究院 Remote sensing image building change detection method
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
CN111401201A (en) * 2020-03-10 2020-07-10 南京信息工程大学 Aerial image multi-scale target detection method based on spatial pyramid attention drive

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XU CHENGQI; HONG XUEHAI: "Function-Preserving Feature Pyramid Object Detection Network", Pattern Recognition and Artificial Intelligence, no. 06, 15 June 2020 (2020-06-15) *
SHEN WENXIANG; QIN PINLE; ZENG JIANCHAO: "Indoor Crowd Detection Network Based on Multi-level Features and Hybrid Attention Mechanism", Journal of Computer Applications, no. 12 *
GAO JIANLING; SUN JIAN; WANG ZINIU; HAN YULU; FENG JIAOJIAO: "SSD Object Detection Algorithm Based on Attention Mechanism and Feature Fusion", Software, no. 02, 15 February 2020 (2020-02-15) *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418345A (en) * 2020-12-07 2021-02-26 苏州小阳软件科技有限公司 Method and device for quickly identifying fine-grained small target
CN112418345B (en) * 2020-12-07 2024-02-23 Shenzhen Xiaoyang Software Co., Ltd. Method and device for quickly identifying small targets with fine granularity
CN112465057A (en) * 2020-12-08 2021-03-09 中国人民解放军空军工程大学 Target detection and identification method based on deep convolutional neural network
CN112819737A (en) * 2021-01-13 2021-05-18 西北大学 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN112837747A (en) * 2021-01-13 2021-05-25 上海交通大学 Protein binding site prediction method based on attention twin network
CN112837747B (en) * 2021-01-13 2022-07-12 上海交通大学 Protein binding site prediction method based on attention twin network
CN112819737B (en) * 2021-01-13 2023-04-07 西北大学 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN113158738A (en) * 2021-01-28 2021-07-23 中南大学 Port environment target detection method, system, terminal and readable storage medium based on attention mechanism
CN113158738B (en) * 2021-01-28 2022-09-20 中南大学 Port environment target detection method, system, terminal and readable storage medium based on attention mechanism
CN113177579A (en) * 2021-04-08 2021-07-27 北京科技大学 Feature fusion method based on attention mechanism
CN113255443A (en) * 2021-04-16 2021-08-13 杭州电子科技大学 Pyramid structure-based method for positioning time sequence actions of graph attention network
CN113255443B (en) * 2021-04-16 2024-02-09 杭州电子科技大学 Graph annotation meaning network time sequence action positioning method based on pyramid structure
CN113409249A (en) * 2021-05-17 2021-09-17 上海电力大学 Insulator defect detection method based on end-to-end algorithm
CN114387202A (en) * 2021-06-25 2022-04-22 南京交通职业技术学院 3D target detection method based on vehicle end point cloud and image fusion
CN113408549A (en) * 2021-07-14 2021-09-17 西安电子科技大学 Few-sample weak and small target detection method based on template matching and attention mechanism
CN113408549B (en) * 2021-07-14 2023-01-24 西安电子科技大学 Few-sample weak and small target detection method based on template matching and attention mechanism
CN113807291B (en) * 2021-09-24 2024-04-26 南京莱斯电子设备有限公司 Airport runway foreign matter detection and identification method based on feature fusion attention network
CN113807291A (en) * 2021-09-24 2021-12-17 南京莱斯电子设备有限公司 Airport runway foreign matter detection and identification method based on feature fusion attention network
CN113920468A (en) * 2021-12-13 2022-01-11 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN114220015A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Improved YOLOv 5-based satellite image small target detection method
CN114972860A (en) * 2022-05-23 2022-08-30 郑州轻工业大学 Target detection method based on attention-enhanced bidirectional feature pyramid network
CN115019169A (en) * 2022-05-31 2022-09-06 海南大学 Single-stage water surface small target detection method and device

Similar Documents

Publication Publication Date Title
CN111914917A (en) Target detection improved algorithm based on feature pyramid network and attention mechanism
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN111079584A (en) Rapid vehicle detection method based on improved YOLOv3
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
JP7097641B2 (en) Loop detection method based on convolution perception hash algorithm
CN109034092A (en) Accident detection method for monitoring system
CN110287777B (en) Golden monkey body segmentation algorithm in natural scene
CN113255589B (en) Target detection method and system based on multi-convolution fusion network
Lyu et al. Small object recognition algorithm of grain pests based on SSD feature fusion
CN113313031B (en) Deep learning-based lane line detection and vehicle transverse positioning method
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN111860587A (en) Method for detecting small target of picture
CN108133235A (en) A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure
CN113139489A (en) Crowd counting method and system based on background extraction and multi-scale fusion network
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN116052271A (en) Real-time smoking detection method and device based on CenterNet
CN116452966A (en) Target detection method, device and equipment for underwater image and storage medium
CN113887649B (en) Target detection method based on fusion of deep layer features and shallow layer features
WO2022205329A1 (en) Object detection method, object detection apparatus, and object detection system
CN112070181B (en) Image stream-based cooperative detection method and device and storage medium
CN110136098B (en) Cable sequence detection method based on deep learning
CN110309786B (en) Lactating sow posture conversion identification method based on depth video
CN117409244A (en) SCKConv multi-scale feature fusion enhanced low-illumination small target detection method
CN117079125A (en) Kiwi fruit pollination flower identification method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination