CN116258940A - Small target detection method for multi-scale features and self-adaptive weights - Google Patents
- Publication number
- CN116258940A (application CN202310205418.5A)
- Authority
- CN
- China
- Prior art keywords
- scale
- detection
- features
- network
- detection method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a small target detection method based on multi-scale features and adaptive weights, which performs object detection on images acquired in real time by a two-dimensional camera; the detection method adopts a ResNet network as the backbone network. By improving the ResNet network and adopting a method of fusing multi-scale features and adaptively assigning weights, the network adaptively and dynamically adjusts the importance of each feature, improving the model's detection capability, accuracy, and overall effect on small targets, and effectively raising the small-target detection rate.
Description
Technical Field
The invention belongs to the technical field of image detection and processing. More particularly, the invention relates to a method for small target detection of multi-scale features and adaptive weights.
Background
Target detection is an important research direction in the field of computer vision, and is also a research basis for other complex visual tasks such as image segmentation and target tracking.
With the development of deep learning, a target detection technology based on the deep learning has made tremendous progress. In a real scene, as a large number of small targets exist, the small target detection has wide application prospect.
For example, in an unmanned system, when objects such as traffic lights or pedestrians are relatively small, unmanned vehicles are still required to accurately recognize the objects and respond accordingly; in the analysis of satellite images, it is necessary to detect objects such as automobiles, ships, and the like.
These targets are often difficult to detect due to the too small scale. Because the pixels of the small target are few, effective information is difficult to extract, and the detection of the small target faces great difficulties and challenges.
Therefore, research on an effective method for detecting a small target and improvement of detection performance and detection effect of the small target are very important and urgent research subjects in the current target detection field.
Disclosure of Invention
The invention provides a small target detection method with multi-scale characteristics and self-adaptive weights, and aims to improve the detection capability and detection effect of small targets.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
according to the small target detection method, object detection is carried out on the image acquired in real time through the two-dimensional camera; the detection method adopts a ResNet network as a backbone network.
According to the detection method, a cross-scale feature map is introduced into the ResNet backbone network, together with a large convolution kernel containing a channel attention mechanism, and small target features are enhanced with multi-scale features and adaptive weights.
The detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
The block module is the generic module in the ResNet network, namely the residual block, which can be expressed by the following formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l denotes the features of layer l;
x_{l+1} denotes the features of layer l+1;
w_l denotes the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used in ConvNeXt is adopted, a combination of fully connected layers and an activation function.
The detection method generates channel statistical information using global average pooling, as shown in the following formula (2):
g_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c denotes the statistical information of channel c;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively;
this realizes the compression of information along the spatial dimensions. On this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s denotes the weight assigned to each channel;
g denotes the channel statistics obtained from formula (2);
W denotes the convolution kernel weights;
F(g, W) is a nonlinear function, a combination of a fully connected layer and an activation function.
According to the detection method, a cross-scale feature layer is added to the backbone network, which facilitates the extraction of network features.
The detection method introduces a learnable weight parameter in the multi-scale fusion process, so that each scale is fully fused, and the detection of a small target is facilitated.
The network output of the detection method has four scales, and each scale fuses the feature information of the previous scale; this operation helps strengthen shallow features and effectively describe small targets; finally, the four scales are fused in subsequent processing.
The detection method obtains feature maps at four scales through the backbone network; to effectively detect targets of different scales in the target area, these multi-scale feature maps must be fused. Specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, ultimately improving the model's accuracy on small target detection.
The FPN structure transmits high-level strong semantic features, so that the whole pyramid is enhanced in semantic information, but positioning information is not transmitted; the PAN transmits the strong positioning information of the low-layer network to the high-layer characteristics, thereby being beneficial to improving the positioning precision.
By adopting this technical scheme, the ResNet network is improved; through the method of fusing multi-scale features and adaptively assigning weights, the importance of each feature is dynamically and adaptively adjusted, improving the model's detection capability, accuracy, and effect on small targets, and effectively raising the small-target detection rate.
Drawings
The contents of the drawings and the marks in the drawings are briefly described as follows:
FIG. 1 is a diagram of the ResNet network of the present invention;
FIG. 2 is a system block diagram of ConvNeXt-SE of the present invention;
FIG. 3 is a schematic diagram of a scale fusion structure according to the present invention;
fig. 4 is a system block diagram of the present invention.
Detailed Description
The following detailed description of the embodiments of the invention, given by way of example only, is presented in the accompanying drawings to aid in a more complete, accurate, and thorough understanding of the inventive concepts and aspects of the invention by those skilled in the art.
The invention relates to a multi-scale characteristic and self-adaptive weight small target detection method, which is used for detecting objects in a real-time collected image through a two-dimensional camera, so that small target objects in a target area can be detected as accurately and comprehensively as possible.
In order to solve the problems existing in the prior art and overcome the defects thereof and realize the aim of improving the detection capability and the detection effect of small targets, the invention adopts the following technical scheme:
the invention discloses a multi-scale characteristic and self-adaptive weight small target detection method, which adopts a ResNet network as a backbone network.
Fig. 4 shows a system applying the invention: a camera collects images, a cross-scale feature map is introduced through the ResNet backbone network, and multi-scale feature adaptive weights are added to obtain target position information and target category information.
The specific analysis is as follows:
1. backbone network selection and optimization:
to prevent gradient vanishing and degradation in the network while retaining sufficient feature representation capability, and considering the simplicity and real-time performance of the network, the ResNet network is selected as the backbone; its residual structure allows deeper network designs, which facilitates the representation of effective features.
According to the detection method, a cross-scale feature map is introduced into the ResNet backbone network, together with a large convolution kernel containing a channel attention mechanism, and small target features are enhanced with multi-scale features and adaptive weights. Introducing a large convolution kernel containing a channel attention mechanism facilitates the extraction of target features.
Using large convolution kernels containing attention mechanisms in the block module facilitates the extraction of network features; this applies to, but is not limited to, the ResNet network.
On this basis, multi-scale feature adaptive weights are added and the FPN+PAN structure is improved, strengthening the characterization of small target features and ultimately improving the model's accuracy on small target detection. The key to the technique is that, when detecting small targets, the network adaptively and dynamically adjusts the importance of each feature by improving the ResNet network and setting multi-scale adaptive weights.
The detection method facilitates the extraction of network features by adding a cross-scale feature layer in the ResNet network (though it is not limited to the ResNet network).
The invention is mainly based on the ResNet network and improves it, chiefly by processing multi-scale features and fusing adaptive weights, thereby improving the small-target detection rate. Since the sizes of the targets to be detected vary greatly, the network structure design must also consider multi-scale fusion.
Fig. 1 shows the ResNet network structure employed in the invention.
As shown in fig. 1, the network output of the detection method has four scales, each of which fuses the feature information of the previous scale; this operation helps strengthen shallow features and effectively describe small targets. Finally, the four scales are fused in subsequent processing.
2. Large convolution kernel attention mechanism:
the detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
The block in fig. 1 is the generic module in the ResNet network, namely the residual block, which can be expressed using formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l denotes the features of layer l;
x_{l+1} denotes the features of layer l+1;
w_l denotes the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, a combination of fully connected layers and an activation function.
For F(x_l, w_l) in formula (1), the invention draws on the large convolution kernel block used in ConvNeXt and improves it on the basis of a different expression.
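As an illustration, the residual update of formula (1) with a large-kernel F can be sketched in NumPy as follows. The 7×7 depthwise kernel and the GELU nonlinearity are illustrative stand-ins for the ConvNeXt-style block, not the patented implementation:

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """Naive 'same'-padded depthwise convolution. x: (C, H, W), kernel: (C, k, k)."""
    C, H, W = x.shape
    k = kernel.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernel[c])
    return out

def residual_block(x, w):
    """Formula (1): x_{l+1} = x_l + F(x_l, w_l), with F modeled as a
    large-kernel depthwise convolution followed by a GELU nonlinearity."""
    f = depthwise_conv2d(x, w)
    f = 0.5 * f * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (f + 0.044715 * f ** 3)))
    return x + f

x = np.random.randn(4, 8, 8)          # 4 channels, 8x8 feature map
w = np.random.randn(4, 7, 7) * 0.01   # 7x7 "large" kernel per channel
y = residual_block(x, w)
print(y.shape)  # (4, 8, 8): the residual connection preserves shape
```

The identity path is what lets very deep networks train: with a zero kernel, F contributes nothing and the block reduces exactly to x.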
Considering that each learned filter operates on a local receptive field, each element of the output feature cannot exploit context information outside that region.
To solve this problem, the invention introduces a channel attention mechanism. As shown in fig. 2, the difference between (a) and (b) in ConvNeXt-SE lies in the attention branch: (a) uses a ReLU activation function, while (b) uses a GELU activation function.
GELU is chosen because nonlinearity is an important property of the model, and stochastic regularization such as dropout is needed for generalization; GELU builds the idea of stochastic regularization into the activation itself, giving a probabilistic description of the neuron input that accords more intuitively with natural perception, and it also outperforms ReLU experimentally.
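For reference, the ReLU and GELU activations mentioned above can be compared numerically (using the common tanh approximation of GELU; a sketch, not the patent's exact formulation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU: x * Phi(x), Phi the standard normal CDF
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(xs))               # hard zero for every negative input
print(np.round(gelu(xs), 4))  # smooth; slightly negative for small negative inputs
```

Unlike ReLU's hard gate, GELU weights each input by the probability that a standard normal variable falls below it, which is the probabilistic interpretation the text alludes to.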
The detection method generates channel statistical information using global average pooling, as shown in the following formula (2):
g_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c denotes the statistical information of channel c;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively;
this realizes the compression of information along the spatial dimensions. On this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s denotes the weight assigned to each channel;
g denotes the channel statistics obtained from formula (2);
W denotes the convolution kernel weights;
F(g, W) is a nonlinear function, a combination of a fully connected layer and an activation function.
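Taken together, formulas (2) and (3) amount to an SE-style squeeze-and-excitation step. A minimal NumPy sketch follows; the layer sizes, the reduction ratio, and the sigmoid gating are illustrative assumptions:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel attention. x: (C, H, W); w1: (C//r, C) and
    w2: (C, C//r) are the two fully connected layers (r = reduction ratio)."""
    g = x.mean(axis=(1, 2))                # formula (2): global average pooling
    z = np.maximum(w1 @ g, 0.0)            # squeeze: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # formula (3): FC + sigmoid -> weights s
    return x * s[:, None, None]            # rescale to highlight important channels

rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.normal(size=(C, 6, 6))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 6, 6)
```

Because each weight s is squashed into (0, 1), the mechanism can only attenuate channels relative to one another, never amplify them beyond the input.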
FIG. 2 is a block diagram of a ConvNeXt-SE system.
3. Multiscale feature fusion structure:
according to the detection method, four scale feature images are obtained through a backbone network, and if targets with different scales in a target area can be detected effectively, the scale images are required to be fused.
The method comprises the following steps: the FPN+PAN structure is improved, the characterization capability of the small target features is improved, and the accuracy of the model on small target detection is finally improved through the mode.
In this regard, the present invention improves upon the FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) structure in which:
the structure of the FPN transmits the high-level strong semantic features, enhances the semantic information of the whole pyramid, but does not transmit the positioning information.
The PAN transmits the strong positioning information of the low-layer network to the high-layer characteristic, thereby being beneficial to improving the positioning precision.
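The complementary top-down (FPN) and bottom-up (PAN) information flow described above can be sketched as follows. This is a schematic NumPy sketch in which the lateral and output convolutions are omitted and fusion is plain addition, both illustrative simplifications:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """2x2 average-pool downsampling of a (C, H, W) map."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def fpn_pan(levels):
    """levels: [shallow/high-res, ..., deep/low-res] feature maps.
    FPN top-down pass spreads semantics downward; PAN bottom-up pass
    spreads localization upward."""
    td = levels[:]
    for i in range(len(td) - 2, -1, -1):       # FPN: deep semantics -> shallow
        td[i] = td[i] + upsample2x(td[i + 1])
    out = td[:]
    for i in range(1, len(out)):               # PAN: shallow localization -> deep
        out[i] = out[i] + downsample2x(out[i - 1])
    return out

levels = [np.ones((2, 16, 16)), np.ones((2, 8, 8)), np.ones((2, 4, 4))]
fused = fpn_pan(levels)
print([f.shape for f in fused])  # per-level shapes are preserved
```

After both passes, every level has mixed in information from every other level, which is exactly what the FPN+PAN pairing is meant to achieve.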
FIG. 3 is a schematic diagram of a scale fusion structure.
The structure of the invention is shown in fig. 3, in which s11, s12, s13, s21, s22, and s23 are all learnable weight parameters; this helps highlight the contribution of each scale and increases the small-target detection rate.
The detection method introduces a learnable weight parameter in the multi-scale fusion process, so that each scale is fully fused, and the detection of a small target is facilitated.
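The learnable fusion weights can be sketched as follows. The non-negativity clipping and normalization scheme here is an assumption (in the spirit of fast normalized fusion), and the scalar weights play the role of s11, s12, etc.:

```python
import numpy as np

def fuse_scales(features, weights, eps=1e-4):
    """Adaptively weighted fusion of same-shaped feature maps.
    `weights` are learnable scalars; they are clipped to be non-negative
    and normalized to sum to ~1, so each scale's contribution is learned
    rather than fixed."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

f1 = np.ones((4, 4)) * 1.0   # e.g. an upsampled deep feature map
f2 = np.ones((4, 4)) * 3.0   # a shallow, small-target-friendly feature map
out = fuse_scales([f1, f2], [1.0, 1.0])
print(round(float(out[0, 0]), 3))  # ~2.0: equal weights average the two scales
```

During training, gradients flow into the scalar weights, so a scale that helps small-target detection automatically earns a larger share of the fused map.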
In summary, the invention fuses information from an adjacent scale into each scale of the ResNet backbone and introduces a large convolution kernel containing a channel attention mechanism, which facilitates the extraction of target features; on this basis, an FPN+PAN structure improved with multi-scale adaptive feature weights is added, strengthening the characterization of small target features and improving the model's accuracy on small target detection.
While the invention has been described above with reference to the accompanying drawings, it will be apparent that the invention is not limited to the above embodiments, but is capable of being modified or applied directly to other applications without modification, as long as various insubstantial modifications of the method concept and technical solution of the invention are adopted, all within the scope of the invention.
Claims (10)
1. A small target detection method with multi-scale features and adaptive weights, performing object detection on images acquired in real time by a two-dimensional camera, characterized in that: the detection method adopts a ResNet network as a backbone network.
2. The method for small object detection of multi-scale features and adaptive weights according to claim 1, wherein: the detection method introduces a cross-scale feature map into the ResNet backbone network, together with a large convolution kernel containing a channel attention mechanism, and enhances small target features with multi-scale features and adaptive weights.
3. The method for small object detection of multi-scale features and adaptive weights according to claim 1, wherein: the detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
4. A method of small object detection of multi-scale features and adaptive weights as claimed in claim 3, characterized by: the block module is the generic module in the ResNet network, namely the residual block, which can be expressed by the following formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l denotes the features of layer l;
x_{l+1} denotes the features of layer l+1;
w_l denotes the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used in ConvNeXt is adopted, a combination of fully connected layers and an activation function.
5. The method for small object detection of multi-scale features and adaptive weights according to claim 2, wherein: the detection method generates channel statistical information using global average pooling, as shown in the following formula (2):
g_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c denotes the statistical information of channel c;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively;
this realizes the compression of information along the spatial dimensions, and on this basis a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s denotes the weight assigned to each channel;
g denotes the channel statistics obtained from formula (2);
W denotes the convolution kernel weights;
F(g, W) is a nonlinear function, a combination of a fully connected layer and an activation function.
6. The method for small object detection of multi-scale features and adaptive weights according to claim 1, wherein: according to the detection method, a trans-scale characteristic layer is added into a backbone network, so that extraction of network characteristics is facilitated.
7. The method for small object detection of multi-scale features and adaptive weights according to claim 6, wherein: the detection method introduces a learnable weight parameter in the multi-scale fusion process, so that each scale is fully fused, and the detection of a small target is facilitated.
8. The method for small object detection of multi-scale features and adaptive weights according to claim 6, wherein: the network output of the detection method has four scales, and each scale fuses the feature information of the previous scale; this operation helps strengthen shallow features and effectively describe small targets; finally, the four scales are fused in subsequent processing.
9. The method for small object detection of multi-scale features and adaptive weights according to claim 8, wherein: the detection method obtains feature maps at four scales through the backbone network; to effectively detect targets of different scales in the target area, these multi-scale feature maps must be fused; specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, ultimately improving the model's accuracy on small target detection.
10. The method for small object detection of multi-scale features and adaptive weights according to claim 9, wherein: the FPN structure transmits high-level strong semantic features, so that the whole pyramid is enhanced in semantic information, but positioning information is not transmitted; the PAN transmits the strong positioning information of the low-layer network to the high-layer characteristics, thereby being beneficial to improving the positioning precision.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310205418.5A CN116258940A (en) | 2023-03-06 | 2023-03-06 | Small target detection method for multi-scale features and self-adaptive weights |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310205418.5A CN116258940A (en) | 2023-03-06 | 2023-03-06 | Small target detection method for multi-scale features and self-adaptive weights |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116258940A true CN116258940A (en) | 2023-06-13 |
Family
ID=86682319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310205418.5A Pending CN116258940A (en) | 2023-03-06 | 2023-03-06 | Small target detection method for multi-scale features and self-adaptive weights |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116258940A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237830A (en) * | 2023-11-10 | 2023-12-15 | 湖南工程学院 | Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention |
CN117237830B (en) * | 2023-11-10 | 2024-02-20 | 湖南工程学院 | Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention |
CN117314898A (en) * | 2023-11-28 | 2023-12-29 | 中南大学 | Multistage train rail edge part detection method |
CN117314898B (en) * | 2023-11-28 | 2024-03-01 | 中南大学 | Multistage train rail edge part detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Vehicle-damage-detection segmentation algorithm based on improved mask RCNN | |
CN111460968B (en) | Unmanned aerial vehicle identification and tracking method and device based on video | |
CN110163187B (en) | F-RCNN-based remote traffic sign detection and identification method | |
Tian et al. | A dual neural network for object detection in UAV images | |
CN112183203B (en) | Real-time traffic sign detection method based on multi-scale pixel feature fusion | |
CN112686207B (en) | Urban street scene target detection method based on regional information enhancement | |
CN116258940A (en) | Small target detection method for multi-scale features and self-adaptive weights | |
CN110569754A (en) | Image target detection method, device, storage medium and equipment | |
CN114266977B (en) | Multi-AUV underwater target identification method based on super-resolution selectable network | |
CN111738071B (en) | Inverse perspective transformation method based on motion change of monocular camera | |
CN112801027A (en) | Vehicle target detection method based on event camera | |
CN115115973A (en) | Weak and small target detection method based on multiple receptive fields and depth characteristics | |
CN111881984A (en) | Target detection method and device based on deep learning | |
CN113743163A (en) | Traffic target recognition model training method, traffic target positioning method and device | |
CN113901931A (en) | Knowledge distillation model-based behavior recognition method for infrared and visible light videos | |
CN116363535A (en) | Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network | |
Schenkel et al. | Domain adaptation for semantic segmentation using convolutional neural networks | |
CN114140698A (en) | Water system information extraction algorithm based on FasterR-CNN | |
Leipnitz et al. | The effect of image resolution in the human presence detection: A case study on real-world image data | |
CN111275027A (en) | Method for realizing detection and early warning processing of expressway in foggy days | |
CN117274957B (en) | Road traffic sign detection method and system based on deep learning | |
WO2023037494A1 (en) | Model training device, control method, and non-transitory computer-readable medium | |
CN114882478B (en) | Driver behavior recognition method for local multiscale feature fusion under weight optimization | |
CN115762178B (en) | Intelligent electronic police violation detection system and method | |
CN118072146B (en) | Unmanned aerial vehicle aerial photography small target detection method based on multi-level feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||