CN114494728B - Small target detection method based on deep learning - Google Patents


Info

Publication number
CN114494728B
CN114494728B (application CN202210123900.XA / CN202210123900A; publication CN114494728A / CN114494728B)
Authority
CN
China
Prior art keywords
feature map
feature
channel
targets
pooling
Prior art date
Legal status
Active
Application number
CN202210123900.XA
Other languages
Chinese (zh)
Other versions
CN114494728A (en)
Inventor
杜金莲
李攀
张潇
苏航
赵青
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202210123900.XA priority Critical patent/CN114494728B/en
Publication of CN114494728A publication Critical patent/CN114494728A/en
Application granted granted Critical
Publication of CN114494728B publication Critical patent/CN114494728B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods


Abstract

The invention discloses a small target detection method based on deep learning, comprising the following steps: data enhancement is applied to the small targets in the training data; features are extracted from the processed images by a feature extraction network, and the resulting feature maps are fused by concatenation; the fused feature map is weighted by a channel attention module and then by a spatial attention module to obtain the final feature map; the extracted potential targets are divided into regular targets and small targets according to their area; RoI Align region pooling is applied to the small target regions, and category judgment and position regression are performed on the pooled results to obtain the final detection result. The mixed attention module improves the region extraction capability of the RPN, the extracted regions are divided into small targets and other targets by area, and RoI Align pooling makes full use of the feature information of the small target regions, so that the network's detection of small targets is improved with little increase in computation.

Description

Small target detection method based on deep learning
Technical Field
The invention relates to the technical field of target detection in computer vision, in particular to a small target detection method based on deep learning.
Background
New-generation technologies represented by artificial intelligence have become one of the important driving forces for promoting the development of social productivity and realizing industrial digitization and modernization. As an important research direction in the field of artificial intelligence, object detection has developed rapidly in recent years. Object detection is one of the four fundamental tasks of computer vision; its goal is to identify the objects in an image and mark their exact positions with rectangular boxes. Within an image, however, the detection performance on small targets has long lagged behind that on large targets. The reason is that small targets occupy few pixels, so it is difficult for a detection method to obtain rich information from such limited pixel data. Yet small target detection is of great significance for practical applications, such as detecting small traffic signs in autonomous driving, identifying small lesions in medical imaging, and searching for people at sea in remote-sensing imagery. In these tasks, improving the detection of small targets can effectively promote the development of social productivity and safeguard people's lives and property.
At present, research on object detection methods falls into two main directions. The first is the R-CNN family of two-stage detection algorithms, in which the first stage extracts candidate regions and the second stage performs position regression and classification on the extracted regions; the representative algorithm is Faster R-CNN. The second is one-stage detection, which discards the region extraction step and performs position regression and classification directly on the feature map; the representative algorithms are SSD and the YOLO series. Research on small target detection usually modifies these methods to improve their performance. The SSD algorithm proposed by Liu et al. detects targets on six feature maps of different sizes, which improves the model's ability to detect small targets, but the network ignores the connections between the feature layers; in particular, the low-level feature maps used to detect small targets lack the rich semantic information of the high-level layers, so the improvement for small targets is limited. The DSSD algorithm proposed by Fu et al. addresses the lack of rich semantic information in the feature maps SSD uses to detect small targets, which causes inaccurate classification and hence false detections of small targets.
The reasons for poor small target detection can be summarized in three points: (1) the feature maps used to detect small targets lack sufficient semantic and detail information; (2) the number of small target instances in datasets is relatively scarce; (3) small targets lose part of their information after repeated downsampling. Existing detection methods enhance target features on top of a general detection model to improve small target detection, but they do not treat targets of different sizes differently: a single unified method detects all sizes, so small targets, whose pixel information is already scarce, lose part of their precious feature information. To solve these problems, the invention provides a small target detection method based on deep learning.
Summary of the Invention
The invention provides a small target detection method based on deep learning that remedies the poor detection of small targets in general object detection methods. The method divides the potential targets extracted by the network model into regular targets and small targets according to their region area, and gives the potential small targets dedicated processing to improve their detection.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The specific flow of the method is shown in FIG. 1. The small target detection method based on deep learning comprises the following steps:
Step one: perform data enhancement on the small targets in the training data; the enhancement methods used include scaling, flipping, color-gamut distortion, small-target instance copying, and Mosaic augmentation of the pictures in the dataset.
Step two: extract features from the images processed in step one through a feature extraction network that uses ResNet as the backbone; the feature maps of the third and fourth stages are fused by concatenation to form the feature map.
Step three: apply a mixed attention mechanism to the feature map generated in step two to improve the network's ability to distinguish foreground from background; the mixed attention mechanism is serial, weighting the feature map first with a channel attention module and then with a spatial attention module to obtain the final feature map.
Step four: extract the regions of potential targets from the feature map obtained in step three with the RPN region extraction network, and divide the extracted potential targets into regular targets and small targets according to region area. Allowing for later adjustment of target positions, a region with area no larger than 64×64 is taken as a small target region, and a region larger than that as a regular target region.
Step five: perform RoI Align region pooling on the small target regions obtained in step four and RoI Pooling on the other target regions, then perform category judgment and position regression on the pooled results to obtain the final detection result.
Compared with the prior art, the invention has the following advantages:
(1) To address the lack of sufficient semantic and detail information in the feature maps used to detect small targets, the invention uses a ResNet feature extraction network and fuses low-level and high-level feature maps, so that the feature map carries both rich semantic information and spatial detail.
(2) To address the relative scarcity of small target instances in datasets, the invention improves the robustness of the network model through data enhancement; in particular, small-target instance copying and Mosaic augmentation increase the probability that the network captures small targets, thereby improving its detection capability.
(3) To address the loss of feature information that occurs when a single unified method detects targets of all sizes, the invention uses a mixed attention module to improve RPN region extraction, divides the extracted regions into small targets and other targets by area, and applies RoI Align pooling to the small target regions, making full use of their feature information and improving the network's detection of small targets with little increase in computation.
Drawings
Fig. 1 is a schematic view of the overall structure of the present invention.
Fig. 2 is a network diagram of feature extraction of the present invention.
Fig. 3 is a network diagram of an attention module of the present invention.
Fig. 4 is a region extraction network diagram of the present invention.
FIG. 5a is a schematic diagram of RoI Pooling area pooling in accordance with the present invention.
FIG. 5b is a schematic representation of the pooling of RoI Align regions of the present invention.
Detailed Description
The invention provides a small target detection method based on deep learning, described in further detail below with reference to the accompanying drawings.
The overall structure of the invention is schematically shown in fig. 1, and comprises:
First, data enhancement is applied to the image data in the training dataset; the enhancement methods comprise five means: scaling, flipping, color-gamut distortion, small-target instance copying, and Mosaic augmentation. Second, features are extracted from the enhanced images with a ResNet feature extraction network that uses feature fusion in the extraction stage; the feature extraction network is shown in fig. 2. A mixed attention mechanism, applied in serial form, then improves the network's ability to distinguish foreground from background in the feature map; the attention module network is shown in fig. 3. Next, an RPN region extraction network extracts the regions of potential targets from the attention-processed feature map, and the extracted potential target regions are divided into regular targets and small targets by area, with 64×64 as the threshold separating the two. Finally, RoI Pooling is applied to the regular target regions and RoI Align to the small target regions; RoI Align pooling avoids the feature loss caused by the two quantization-and-rounding steps in the usual processing flow, thereby improving small target detection. Category judgment and position regression are then performed on the pooled results to obtain the final detection result.
The feature extraction network diagram of the present invention is shown in fig. 2, and includes:
ResNet50 overall structure: the ResNet network has five stages in total. The first stage comprises one convolution layer with a 7×7 kernel and one max-pooling layer with a 3×3 kernel and stride 2. The second to fifth stages are similar in structure, each comprising one convolution block and several identity blocks; the numbers of identity blocks are 2, 3, 5, and 2 respectively. The input and output dimensions of a convolution block differ, so it is used to change the dimensionality of the network; the input and output dimensions of an identity block are the same, so identity blocks can be connected in series to deepen the network.
Feature fusion stage: feature fusion is shown by the broken line in fig. 2. In the prior art, the stage-four feature map is generally used for the subsequent region extraction; however, while that map carries the rich semantic information of a high-level feature map, it lacks the spatial detail of a low-level one. The invention therefore fuses the feature maps generated by stages three and four, enhancing both the detail and the semantic information of the result. Feature fusion generally uses either pixel-wise addition or channel concatenation; considering the negative effects of direct pixel-wise addition, channel concatenation is adopted here. Taking a three-channel picture of size 600×600 as an example: stage three yields a feature map C1 with width and height 75 and 512 channels, and stage four yields a feature map C2 with width and height 38 and 1024 channels. C1 is resized to the size of C2 by max pooling to obtain C3; C2 and C3 are then concatenated along the channel dimension; finally, the channel count of the concatenated map is adjusted by a 1×1 convolution to obtain the final fused feature map C.
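The fusion walk-through above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the patent's implementation: the max pooling uses adaptive bins to reach C2's exact spatial size, and the 1×1-convolution weights are random placeholders.

```python
import numpy as np

def adaptive_max_pool(x, out_hw):
    # x: (C, H, W) -> (C, out_h, out_w); bins tile the input as in adaptive pooling
    C, H, W = x.shape
    oh, ow = out_hw
    out = np.empty((C, oh, ow), dtype=x.dtype)
    for i in range(oh):
        h0, h1 = (i * H) // oh, -((-(i + 1) * H) // oh)      # ceil division for bin end
        for j in range(ow):
            w0, w1 = (j * W) // ow, -((-(j + 1) * W) // ow)
            out[:, i, j] = x[:, h0:h1, w0:w1].max(axis=(1, 2))
    return out

def fuse(c1, c2, w_1x1):
    # c1: stage-three map (512, 75, 75); c2: stage-four map (1024, 38, 38)
    c3 = adaptive_max_pool(c1, c2.shape[1:])    # resize C1 to C2's spatial size
    cat = np.concatenate([c2, c3], axis=0)      # channel concatenation -> (1536, 38, 38)
    C_in, H, W = cat.shape
    # a 1x1 convolution is a per-pixel linear map over channels
    return (w_1x1 @ cat.reshape(C_in, H * W)).reshape(-1, H, W)

rng = np.random.default_rng(0)
c1 = rng.random((512, 75, 75), dtype=np.float32)
c2 = rng.random((1024, 38, 38), dtype=np.float32)
w = rng.random((1024, 1536), dtype=np.float32) / 1536.0     # placeholder 1x1 weights
fused = fuse(c1, c2, w)                                     # -> (1024, 38, 38)
```

The shapes follow the 600×600 example: 512 + 1024 = 1536 concatenated channels are projected back to 1024 by the 1×1 convolution.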
The attention module of the present invention is shown in fig. 3, and includes:
Because each channel and position in the feature map carries different information, using the feature map directly for region extraction would ignore certain important channel and position information and reduce the extraction capability of the region extraction network. The invention therefore applies an attention mechanism before region extraction so that the network automatically judges which channels and positions in the feature map are important. The serial structure of the classic attention module CBAM is adopted: first a channel attention module judges the importance of the feature map's channels, then a spatial attention module judges the importance of its spatial positions. The specific steps are as follows:
Step one: the input feature map F is processed by the maximum pooling block and the average pooling block of the channel attention module, respectively. Taking the maximum pooling block as an example, the feature map F firstly obtains compressed channel information through global maximum pooling, the shape is changed from 1024×38×38 to 1024×1×1, then one-dimensional convolution with a convolution kernel size of k is used for aggregating the compressed channel information, and due to the fact that the convolution has the property of parameter sharing, the number of parameters of a module can be effectively reduced by using the one-dimensional convolution compared with the use of a full-link layer in the conventional channel attention, wherein the convolution kernel size k is determined by the following formula, and C is the number of channels of the feature map:
Step two: and (3) carrying out pixel-by-pixel addition operation on the channel aggregation information processed by the full-maximum pooling block and the average pooling block obtained in the step (I), obtaining the weight of each channel importance through sigmoid nonlinear activation, and then carrying out pixel-by-pixel multiplication operation on the channel importance weight and the original feature map F to obtain the feature map F 1 reinforced by the channel attention module.
Step three: and D, performing spatial attention module processing on the feature map F 1 obtained in the second step. The feature map F 1 is subjected to global maximum pooling and global average pooling respectively in the channel dimension to obtain compressed space information, the shape is changed from 1024×38×38 to 1×38×38, then cascade operation is carried out on the pooling result in the channel dimension to obtain compressed information with the shape of 2×38×38, and then two-dimensional convolution with the convolution kernel size of 7 is used for polymerizing the compressed space information to obtain aggregation information with the shape of 1×38×38.
Step four: and (3) carrying out pixel-by-pixel addition operation on the spatial aggregation information obtained in the step (III), then carrying out nonlinear activation through sigmoid to obtain a weight of spatial importance, and then carrying out pixel-by-pixel multiplication operation on the weight and the original feature map F 1 to obtain a feature map F 2 reinforced by a spatial attention module.
The area extraction network diagram of the present invention is shown in fig. 4, and includes the following steps:
step one: and carrying out feature integration on the feature map processed by the attention module to enhance robustness, wherein the feature map is operated to be processed by a convolution layer with a convolution kernel size of 3.
Step two: m prior frames are laid on the feature map, the set of the prior frame sizes is {8,16, 32}, and the set of the aspect ratios is {0.5,1,2}, that is, each feature point corresponds to 9 different prior frames.
Step three: judging whether the prior frame is a foreground or a background, namely whether the prior frame contains targets or not, wherein a 1×1 convolution is used for obtaining an information matrix with the channel number of 2×9, the information matrix is used for predicting whether 9 prior frames on each feature point on the feature map contain targets or not, and then the foreground prior frame is obtained through softmax classification.
Step four: and carrying out coordinate adjustment on the prior frames, and obtaining an information matrix with 4 multiplied by 9 channels by using 1 multiplied by 1 convolution for predicting the change of the position coordinates of 9 prior frames on each feature point on the feature map.
Step five: and (3) screening the areas obtained in the step (III) and the step (IV), preventing the extracted areas from being too small or exceeding the boundary, sorting according to the softmax score, taking out corresponding suggestion frames, and using non-maximum suppression for the suggestion frames to obtain a final target area.
The region extraction network extracts the regions of potential targets from the feature map. Each region is described by four values corresponding to its upper-left and lower-right corners, from which the area of the potential target region can be computed. To enhance the detection of small targets, the potential target regions are divided into small target regions and other target regions with a threshold of 64×64: regions with area no larger than this value are small target regions, and the rest are other target regions.
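The area-based split described above amounts to a few lines; this sketch assumes corner-format regions and applies the 64×64 = 4096 threshold inclusively on the small-target side, matching the description.

```python
def split_by_area(regions, thr=64 * 64):
    # regions: (x1, y1, x2, y2) boxes; area <= 64*64 -> small target region
    small, regular = [], []
    for x1, y1, x2, y2 in regions:
        (small if (x2 - x1) * (y2 - y1) <= thr else regular).append((x1, y1, x2, y2))
    return small, regular
```

Small regions then go to RoI Align and regular regions to RoI Pooling.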
The RoI Pooling region pooling schematic diagram of the present invention is shown in fig. 5a, and includes:
RoI Pooling is used to pool regular targets, and the pooled results are used for the final position and classification predictions. Take a picture of size 256×256 and an extracted region of size 72×72 as an example. First, the picture passes through the convolutional neural network to produce a 16×16 feature map; the region maps to 72/16 = 4.5 on that feature map, and the first quantization of RoI Pooling rounds this down to a 4×4 mapped region. Then, taking an output size of 3×3 as an example, the 4×4 mapped region is divided into 9 bins of size 1.33×1.33 each, and the second quantization of RoI Pooling rounds each bin down to 1×1. Finally, the maximum value of each bin is taken, giving a final result of size 3×3.
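The two quantization steps in this example reduce to simple floor arithmetic; the helper below just reproduces the 256 → 16 → 4 → 1 numbers of the walk-through.

```python
import math

def roi_pool_mapping(roi_size, img_size=256, feat_size=16, out=3):
    # First quantization: project the RoI onto the feature map and floor.
    mapped = math.floor(roi_size * feat_size / img_size)   # 72 * 16/256 = 4.5 -> 4
    # Second quantization: floor the per-bin size of the out x out grid.
    bin_size = math.floor(mapped / out)                    # 4 / 3 = 1.33 -> 1
    return mapped, bin_size
```

Each floor discards fractional coverage, which is exactly the feature loss the next paragraph attributes to RoI Pooling.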
RoI Pooling thus quantizes the proposed region twice, losing part of the feature information in the process. The lost information has little effect on the detection of large targets, but it is hard to afford for small targets, which carry little information to begin with; the invention therefore adopts RoI Align region pooling for small target detection.
The schematic of the pooling of the RoI Align region of the present invention is shown in FIG. 5b, comprising:
RoI Align is used to pool small targets, and the pooled results are used for the final position and classification predictions. Take a picture of size 256×256 and an extracted region of size 72×72 as an example. First, the picture passes through the convolutional neural network to produce a 16×16 feature map; the region maps to 72/16 = 4.5 on that feature map, and RoI Align keeps the fractional part, giving a 4.5×4.5 mapped region. Then, taking an output size of 3×3 as an example, the 4.5×4.5 mapped region is divided into 9 bins of size 1.5×1.5 each, again keeping the fractional part. Finally, each bin is divided evenly into four parts; the value at the center of each part is obtained by bilinear interpolation, and the four sampled values are pooled to give a final result of size 3×3.
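The sampling scheme above can be sketched as follows. The 2×2 sampling grid per bin follows the four-center-point description, and max-pooling the samples is an assumption (averaging is equally common); everything operates on a toy feature map.

```python
import numpy as np

def bilinear(feat, y, x):
    # Sample feat (H, W) at a continuous location (y, x).
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align(feat, roi, out=3, samples=2):
    # roi = (y1, x1, y2, x2) in feature-map coordinates, fractions kept.
    # Each output bin samples samples*samples interior points and pools them.
    y1, x1, y2, x2 = roi
    bh, bw = (y2 - y1) / out, (x2 - x1) / out
    result = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            vals = [bilinear(feat,
                             y1 + bh * (i + (si + 0.5) / samples),
                             x1 + bw * (j + (sj + 0.5) / samples))
                    for si in range(samples) for sj in range(samples)]
            result[i, j] = max(vals)
    return result

feat = np.arange(256, dtype=float).reshape(16, 16)   # toy 16x16 feature map
pooled = roi_align(feat, (0.0, 0.0, 4.5, 4.5))       # the 4.5x4.5 mapped region
```

No coordinate is ever rounded, so every feature point under the region contributes to the 3×3 output.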
RoI Align pooling makes effective use of every feature point in the region and benefits the subsequent detection of small targets. After the potential target regions are pooled by RoI Pooling and RoI Align, the features are further extracted by the fifth stage of ResNet, followed by average pooling and a flatten operation; finally, fully connected layers produce the final position and classification predictions.
The detailed description above covers only specific practical embodiments of the invention; it is not intended to limit the scope of the invention, and all equivalent embodiments or modifications that do not depart from the spirit of the invention fall within its scope.

Claims (4)

1. A small target detection method based on deep learning, characterized by comprising the following steps:
step one: the method comprises the steps of carrying out data enhancement on small targets in training data, wherein the used enhancement method comprises the steps of zooming, overturning, color gamut distortion, replication of small target examples and mosaics enhancement on pictures in a data set;
step two: extracting the characteristics of the image processed in the first step through a characteristic extraction network, wherein the characteristic extraction network uses ResNet as a backbone network, and the characteristic images of the third stage and the fourth stage are fused through cascading to be used as characteristic images;
step three: the mixed attention mechanism is used for improving the distinguishing capability of the network to the foreground and the background in the feature map, the mixed attention mechanism is in a serial form, and the feature map is weighted by the channel attention module and then is subjected to the space attention module to obtain a final feature map;
Step four: extracting the areas of potential targets in the feature map obtained in the third step by using an RPN area extraction network, and dividing the extracted potential targets into conventional targets and small targets according to the area size; taking an area with the area being more than or equal to 64 multiplied by 64 as a small target area, and taking a target area with the area being more than 64 multiplied by 64 as a large target area;
step five: performing RoI alignment region pooling operation on the small target region obtained in the step four, performing RoI Pooling region pooling operation on other target regions, and performing category judgment and position regression on the pooled result to obtain a final detection result;
The feature extraction network includes:
ResNet50 overall structure: the ResNet network has five stages in total; the first stage comprises one convolution layer with a 7×7 kernel and one max-pooling layer with a 3×3 kernel and stride 2; the second to fifth stages are similar in structure, each comprising one convolution block and several identity blocks, the numbers of identity blocks being 2, 3, 5, and 2 respectively; the input and output dimensions of a convolution block differ, so it is used to change the dimensionality of the network, while the input and output dimensions of an identity block are the same, so identity blocks are connected in series to deepen the network; the attention module includes:
using an attention mechanism so that the network automatically judges the important channels and positions in the feature map; the serial structure of the attention module CBAM is adopted: first a channel attention module judges the importance of the feature map's channels, then a spatial attention module judges the importance of its spatial positions; the specific steps are as follows:
Step one: the input feature map F is processed by a maximum pooling block and an average pooling block of the channel attention module respectively; taking the maximum pooling block as an example, the feature map F firstly obtains compressed channel information through global maximum pooling, the shape is changed from 1024×38×38 to 1024×1×1, then one-dimensional convolution with a convolution kernel size of k is used for aggregating the compressed channel information, and due to the fact that the convolution has the property of parameter sharing, the number of parameters of a module can be effectively reduced by using the one-dimensional convolution compared with the use of a full-link layer in the conventional channel attention, wherein the convolution kernel size k is determined by the following formula, and C is the number of channels of the feature map:
step two: carrying out pixel-by-pixel addition operation on the channel aggregation information processed by the full-maximum pooling block and the average pooling block obtained in the step one, obtaining the weight of each channel importance through sigmoid nonlinear activation, and then carrying out pixel-by-pixel multiplication operation on the channel aggregation information and the original feature map F to obtain a feature map F 1 reinforced by a channel attention module;
Step three: carrying out space attention module processing on the feature map F 1 obtained in the second step; the feature map F 1 is subjected to global maximum pooling and global average pooling respectively in the channel dimension to obtain compressed space information, the shape is changed from 1024×38×38 to 1×38×38, then cascade operation is carried out on the pooling result in the channel dimension to obtain compressed information with the shape of 2×38×38, and then two-dimensional convolution with the convolution kernel size of 7 is used for polymerizing the compressed space information to obtain aggregation information with the shape of 1×38×38;
Step four: and (3) carrying out pixel-by-pixel addition operation on the spatial aggregation information obtained in the step (III), then carrying out nonlinear activation through sigmoid to obtain a weight of spatial importance, and then carrying out pixel-by-pixel multiplication operation on the weight and the original feature map F 1 to obtain a feature map F 2 reinforced by a spatial attention module.
2. The deep learning-based small target detection method according to claim 1, wherein: first, data enhancement is applied to the image data in the training data set using five means: scaling, flipping, color-gamut distortion, copying of small-target instances, and Mosaic enhancement. Second, features are extracted from the enhanced images with the ResNet feature extraction network, and a feature fusion technique is used during the feature extraction stage. A mixed attention mechanism, applied in serial form, improves the network's ability to distinguish foreground from background in the feature map. The RPN region extraction network then extracts regions of potential targets from the attention-processed feature map; the extracted regions are divided into conventional targets and small targets by area, with 64×64 as the threshold distinguishing small targets from conventional targets. Finally, RoI Pooling is applied to the conventional target regions and RoI Align to the small target regions; RoI Align avoids the feature loss caused by the two quantization roundings in the network processing flow and thereby improves small target detection. Category judgment and position regression are performed on the pooled results to obtain the final detection result.
3. The deep learning-based small target detection method according to claim 1, wherein the feature fusion stage: the feature maps generated in stages three and four are fused to strengthen the detail information and semantic information of the feature map. Feature fusion may use pixel-wise addition or channel concatenation; considering the negative effects of direct pixel addition, channel concatenation is adopted. Taking a three-channel picture of size 600×600 as an example: after stage three the picture yields a feature map C1 with width and height 75 and 512 channels, and after stage four a feature map C2 with width and height 38 and 1024 channels. C1 is resized to the size of C2 by maximum pooling to obtain a feature map C3; C2 and C3 are then concatenated along the channel direction; finally, a 1×1 convolution adjusts the channel count of the concatenated map to obtain the final fused feature map C.
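A minimal sketch of the cascade fusion described above, assuming the stated shapes (C1: 512×75×75, C2: 1024×38×38). Adaptive max pooling stands in for the "maximum pooling" resize from 75×75 to 38×38, and the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StageFusion(nn.Module):
    """Channel-cascade fusion of the stage-three and stage-four maps:
    C1 is max-pooled to C2's spatial size to become C3, concatenated with
    C2 along the channel axis, and a 1x1 convolution restores the channel
    count to that of C2."""

    def __init__(self, c1_channels: int = 512, c2_channels: int = 1024):
        super().__init__()
        self.reduce = nn.Conv2d(c1_channels + c2_channels, c2_channels, kernel_size=1)

    def forward(self, c1, c2):
        # Resize C1 to C2's spatial size via adaptive max pooling (75 -> 38)
        c3 = F.adaptive_max_pool2d(c1, c2.shape[-2:])
        fused = torch.cat([c2, c3], dim=1)   # 1024 + 512 = 1536 channels
        return self.reduce(fused)            # 1x1 conv back to 1024 channels
```

With the shapes above, `StageFusion()(c1, c2)` yields the fused map C of shape 1024×38×38, carrying both the detail of stage three and the semantics of stage four.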
4. The deep learning-based small target detection method according to claim 1, wherein the region extraction network comprises the following steps:
step one: feature integration is carried out on the feature graphs processed by the attention module so as to enhance robustness, and the feature graphs are operated to be processed by a convolution layer with the convolution kernel size of 3;
Step two: paving m prior frames on the feature map, wherein the size sets of the prior frames are {8,16, 32}, and the aspect ratio sets are {0.5,1,2}, namely each feature point corresponds to 9 different prior frames;
Step three: judging whether the prior frame is a foreground or a background, namely whether the prior frame contains targets or not, wherein 1X 1 convolution is used for obtaining an information matrix with the channel number of 2X 9, the information matrix is used for predicting whether 9 prior frames on each feature point on the feature map contain targets or not, and then the foreground prior frame is obtained through softmax classification;
Step four: carrying out coordinate adjustment on the prior frames, and obtaining an information matrix with 4 multiplied by 9 channels by using 1 multiplied by 1 convolution, wherein the information matrix is used for predicting the change of the position coordinates of 9 prior frames on each feature point on the feature map;
Step five: and (3) screening the areas obtained in the step (III) and the step (IV), preventing the extracted areas from being too small or exceeding the boundary, sorting according to the softmax score, taking out corresponding suggestion frames, and using non-maximum suppression for the suggestion frames to obtain a final target area.
CN202210123900.XA 2022-02-10 2022-02-10 Small target detection method based on deep learning Active CN114494728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210123900.XA CN114494728B (en) 2022-02-10 2022-02-10 Small target detection method based on deep learning


Publications (2)

Publication Number Publication Date
CN114494728A CN114494728A (en) 2022-05-13
CN114494728B true CN114494728B (en) 2024-06-07

Family

ID=81477644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210123900.XA Active CN114494728B (en) 2022-02-10 2022-02-10 Small target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN114494728B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116766213B (en) * 2023-08-24 2023-11-03 烟台大学 Bionic hand control method, system and equipment based on image processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
CN112801158A (en) * 2021-01-21 2021-05-14 中国人民解放军国防科技大学 Deep learning small target detection method and device based on cascade fusion and attention mechanism
CN113052185A (en) * 2021-03-12 2021-06-29 电子科技大学 Small sample target detection method based on fast R-CNN
CN113673618A (en) * 2021-08-26 2021-11-19 河南中烟工业有限责任公司 Tobacco insect target detection method fused with attention model




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant