CN116258940A - Small target detection method for multi-scale features and self-adaptive weights - Google Patents

Small target detection method for multi-scale features and self-adaptive weights

Info

Publication number
CN116258940A
Authority
CN
China
Prior art keywords
scale
detection
features
network
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310205418.5A
Other languages
Chinese (zh)
Inventor
张天飞
凌强
周荣强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Institute of Information Engineering
Original Assignee
Anhui Institute of Information Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Institute of Information Engineering filed Critical Anhui Institute of Information Engineering
Priority to CN202310205418.5A
Publication of CN116258940A
Pending legal status: Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small target detection method with multi-scale features and adaptive weights, which performs object detection on images acquired in real time by a two-dimensional camera; the detection method adopts a ResNet network as the backbone network. With this technical scheme, the ResNet network is improved: multi-scale features are fused and weights are assigned adaptively, so that the network dynamically adjusts the importance of each feature. This improves the model's detection capability, accuracy and detection effect for small targets and effectively raises the small target detection rate.

Description

Small target detection method for multi-scale features and self-adaptive weights
Technical Field
The invention belongs to the technical field of image detection and processing. More particularly, the invention relates to a small target detection method with multi-scale features and adaptive weights.
Background
Target detection is an important research direction in the field of computer vision, and is also a research basis for other complex visual tasks such as image segmentation and target tracking.
With the development of deep learning, target detection technology based on deep learning has made tremendous progress. Since real scenes contain large numbers of small targets, small target detection has broad application prospects.
For example, in an unmanned driving system, when objects such as traffic lights or pedestrians are relatively small, the unmanned vehicle is still required to recognize them accurately and respond accordingly; in the analysis of satellite images, objects such as automobiles and ships need to be detected.
These targets are often difficult to detect because their scale is too small. A small target occupies only a few pixels, so effective information is hard to extract, and small target detection therefore faces great difficulties and challenges.
Therefore, studying effective small target detection methods and improving the detection performance and detection effect for small targets is a very important and urgent research topic in the current field of target detection.
Disclosure of Invention
The invention provides a small target detection method with multi-scale features and adaptive weights, aiming to improve the detection capability and detection effect for small targets.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
according to the small target detection method, object detection is carried out on the image acquired in real time through the two-dimensional camera; the detection method adopts a ResNet network as a backbone network.
The detection method introduces a cross-scale feature map into the ResNet backbone network, and introduces a large convolution kernel containing a channel attention mechanism together with multi-scale features and adaptive weights to enhance small target features.
The detection method uses a large convolution kernel containing a channel attention mechanism in the block module, which facilitates the extraction of network features.
The block module is a general module in the ResNet network, namely a residual block, and can be expressed by the following formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l represents the features of layer l;
x_{l+1} represents the features of layer l+1;
w_l represents the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used by ConvNeXt is adopted; it is a combination of fully connected layers and activation functions.
The detection method generates channel statistics by using global average pooling, as shown in the following formula (2):
g_c = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c represents the statistic of each channel;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively.
This compresses the information along the spatial dimensions; on this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s represents a different weight for each channel;
g represents channel statistics obtained according to formula (2);
W represents the convolution kernel weights;
F(g, W) is a nonlinear function, which is a combination of a fully connected layer and an activation function.
The detection method adds a cross-scale feature layer into the backbone network, which facilitates the extraction of network features.
The detection method introduces learnable weight parameters into the multi-scale fusion process, so that the scales are fully fused, which benefits the detection of small targets.
The network of the detection method outputs four scales, and each scale fuses the feature information of the previous scale; this operation strengthens shallow features and helps describe small targets effectively. The four scales are finally fused in subsequent processing.
The detection method obtains feature maps at four scales through the backbone network; in order to effectively detect targets of different scales in the target area, these scale feature maps need to be fused. Specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, which ultimately improves the model's accuracy for small target detection.
The FPN structure propagates strong high-level semantic features, enhancing the semantic information of the whole pyramid, but it does not propagate localization information; the PAN propagates the strong localization information of the low-level layers to the high-level features, which helps improve localization precision.
With this technical scheme, the ResNet network is improved: by fusing multi-scale features and adaptively assigning weights, the importance of each feature is dynamically and adaptively adjusted. This improves the model's detection capability, accuracy and detection effect for small targets and effectively raises the small target detection rate.
Drawings
The contents of the drawings and the marks in the drawings are briefly described as follows:
FIG. 1 is a diagram of the ResNet network of the present invention;
FIG. 2 is a system block diagram of ConvNeXt-SE of the present invention;
FIG. 3 is a schematic diagram of a scale fusion structure according to the present invention;
fig. 4 is a system block diagram of the present invention.
Detailed Description
The following detailed description of embodiments of the invention, given by way of example with reference to the accompanying drawings, is intended to help those skilled in the art understand the inventive concept and technical solution of the invention more completely, accurately and thoroughly.
The invention relates to a small target detection method with multi-scale features and adaptive weights, which detects objects in images collected in real time by a two-dimensional camera, so that small target objects in the target area are detected as accurately and comprehensively as possible.
In order to solve the problems existing in the prior art and overcome the defects thereof and realize the aim of improving the detection capability and the detection effect of small targets, the invention adopts the following technical scheme:
the invention discloses a multi-scale characteristic and self-adaptive weight small target detection method, which adopts a ResNet network as a backbone network.
Fig. 4 shows a system applying the present invention: a camera collects an image, a cross-scale feature map is introduced through the ResNet backbone network, and multi-scale feature adaptive weights are added to obtain the target position information and target category information.
The specific analysis is as follows:
1. Backbone network selection and optimization:
To prevent the gradient vanishing and degradation problems in the network, to ensure that the network has sufficient feature representation capability, and considering the simplicity and real-time performance of the network, the ResNet network is selected as the backbone network. Its residual structure allows a deeper network to be designed, which facilitates the representation of effective features.
The detection method introduces a cross-scale feature map into the ResNet backbone network, and introduces a large convolution kernel containing a channel attention mechanism together with multi-scale features and adaptive weights to enhance small target features. Introducing a large convolution kernel with a channel attention mechanism facilitates the extraction of target features.
Using large convolution kernels containing an attention mechanism in the block module facilitates the extraction of network features; this applies in, but is not limited to, the ResNet network.
On this basis, multi-scale feature adaptive weights are added and the FPN+PAN structure is improved, strengthening the characterization of small target features and ultimately improving the model's accuracy for small target detection. The technical key is that, when detecting small targets, the network adaptively and dynamically adjusts the importance of each feature through the improved ResNet network and the multi-scale adaptive weights.
The detection method adds a cross-scale feature layer in the ResNet network (but is not limited to the ResNet network), which facilitates the extraction of network features.
The invention is mainly based on the ResNet network and improves it, processing multi-scale features and fusing adaptive weights, thereby raising the small target detection rate. Since the sizes of the targets to be detected differ greatly, the network structure design must also address multi-scale fusion.
Fig. 1 is a diagram of the ResNet network structure employed in the present invention.
As shown in Fig. 1, the network outputs four scales, and each scale fuses the feature information of the previous scale; this operation strengthens shallow features and helps describe small targets effectively. The four scales are finally fused in subsequent processing, as sketched below.
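For illustration only, the following is a minimal PyTorch-style sketch of a backbone whose four stage outputs each fuse the features of the previous (shallower) scale, in the spirit of Fig. 1. The stage channel counts, the strides and the 1x1 projection bridge are assumptions made for the example and are not taken from the patent.

```python
import torch
import torch.nn as nn

class CrossScaleBackbone(nn.Module):
    """Toy backbone: four scale outputs, each stage also adds in the previous scale."""
    def __init__(self, channels=(256, 512, 1024, 2048)):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels[0], 7, stride=4, padding=3),
            nn.BatchNorm2d(channels[0]), nn.ReLU(inplace=True))
        self.stages = nn.ModuleList()
        self.bridges = nn.ModuleList()  # carry the previous scale into the next stage
        for i in range(1, 4):
            self.stages.append(nn.Sequential(
                nn.Conv2d(channels[i - 1], channels[i], 3, stride=2, padding=1),
                nn.BatchNorm2d(channels[i]), nn.ReLU(inplace=True)))
            # stride-2 1x1 projection so the shallower map can be added element-wise
            self.bridges.append(nn.Conv2d(channels[i - 1], channels[i], 1, stride=2))

    def forward(self, x):
        c = self.stem(x)
        outs = [c]
        for stage, bridge in zip(self.stages, self.bridges):
            c = stage(outs[-1]) + bridge(outs[-1])  # fuse the previous scale's information
            outs.append(c)
        return outs  # four feature maps at strides 4, 8, 16 and 32
```

In a real implementation each stage would be a stack of residual blocks such as the one in formula (1) rather than a single convolution; the sketch only illustrates how each scale can incorporate the previous one.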
2. Large convolution kernel attention mechanism:
the detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
The block in Fig. 1 is a generic module in the ResNet network, namely the residual block, which can be expressed by formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l represents the features of layer l;
x_{l+1} represents the features of layer l+1;
w_l represents the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used by ConvNeXt is adopted; it is a combination of fully connected layers and activation functions.
For F(x_l, w_l) in formula (1), the invention takes the large convolution kernel block used by ConvNeXt as a reference and improves upon it.
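A minimal sketch of the residual block of formula (1), x_{l+1} = x_l + F(x_l, w_l), with F modelled on the ConvNeXt large-kernel design (a 7x7 depthwise convolution followed by pointwise layers). The kernel size, the expansion ratio and the use of BatchNorm in place of ConvNeXt's LayerNorm are simplifying assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    """Residual block x_{l+1} = x_l + F(x_l, w_l) with a large depthwise kernel."""
    def __init__(self, dim, kernel_size=7, expansion=4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)  # large-kernel depthwise conv
        self.norm = nn.BatchNorm2d(dim)
        self.pw1 = nn.Conv2d(dim, dim * expansion, 1)  # pointwise expansion
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(dim * expansion, dim, 1)  # pointwise projection

    def forward(self, x):
        residual = x                          # x_l
        x = self.norm(self.dwconv(x))
        x = self.pw2(self.act(self.pw1(x)))   # F(x_l, w_l)
        return residual + x                   # x_{l+1}
```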
Considering that each learned filter operates on a local receptive field, each element of the output features cannot exploit context information outside that region.
To solve this problem, the present invention introduces a channel attention mechanism. As shown in Fig. 2, the difference between (a) and (b) of the ConvNeXt-SE block lies in the attention branch: (a) uses a ReLU activation function, while (b) uses a GELU activation function.
GELU is chosen because nonlinearity is an important property of the model, while stochastic regularization such as dropout is also needed for model generalization; GELU embodies the idea of introducing stochastic regularization into the activation, its probabilistic description of the neuron input is intuitively more consistent with natural perception, and its experimental results are better than those of ReLU.
The detection method generates channel statistics by using global average pooling, as shown in the following formula (2):
g_c = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c represents the statistic of each channel;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively.
This compresses the information along the spatial dimensions; on this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s represents a different weight for each channel;
g represents channel statistics obtained according to formula (2);
W represents the convolution kernel weights;
F(g, W) is a nonlinear function, which is a combination of a fully connected layer and an activation function.
FIG. 2 is a block diagram of a ConvNeXt-SE system.
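The channel-attention branch of formulas (2) and (3) can be sketched as follows: global average pooling produces the per-channel statistic g_c, and a small bottleneck of fully connected layers produces the per-channel weights s that rescale the feature map. The reduction ratio of 16 is an assumption; the GELU activation follows variant (b) of Fig. 2 described above.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze (formula 2) then excite (formula 3)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # formula (2): spatial squeeze
        self.fc = nn.Sequential(                         # formula (3): s = F(g, W)
            nn.Linear(channels, channels // reduction),
            nn.GELU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        g = self.pool(x).view(b, c)                      # g_c: one statistic per channel
        s = self.fc(g).view(b, c, 1, 1)                  # a different weight per channel
        return x * s                                     # highlight the important channels
```

In the ConvNeXt-SE block of Fig. 2, the attention branch has this general form, rescaling the features produced by the large convolution kernel.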
3. Multiscale feature fusion structure:
according to the detection method, four scale feature images are obtained through a backbone network, and if targets with different scales in a target area can be detected effectively, the scale images are required to be fused.
Specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, which ultimately improves the model's accuracy for small target detection.
In this regard, the present invention improves upon the FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) structure in which:
the structure of the FPN transmits the high-level strong semantic features, enhances the semantic information of the whole pyramid, but does not transmit the positioning information.
The PAN propagates the strong localization information of the low-level layers to the high-level features, which helps improve localization precision.
FIG. 3 is a schematic diagram of a scale fusion structure.
The structure of the invention is shown in Fig. 3, in which s11, s12, s13, s21, s22 and s23 are all learnable weight parameters; this helps highlight the contribution of each scale and increases the small target detection rate.
The detection method introduces learnable weight parameters into the multi-scale fusion process, so that the scales are fully fused, which benefits the detection of small targets; a sketch follows below.
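Below is a sketch of the learnable-weight fusion suggested by Fig. 3, where parameters such as s11...s23 weight the contribution of each scale before the scales are combined. The ReLU-based weight normalization and the nearest-neighbour resizing are assumptions made for the example and are not specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedScaleFusion(nn.Module):
    """Fuse feature maps from several scales with learnable, normalized weights."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # e.g. s21, s22, s23 in Fig. 3
        self.eps = eps

    def forward(self, feats):
        # resize every input to the spatial size of the first one
        size = feats[0].shape[-2:]
        feats = [f if f.shape[-2:] == size
                 else F.interpolate(f, size=size, mode="nearest") for f in feats]
        w = torch.relu(self.weights)          # keep the learned weights non-negative
        w = w / (w.sum() + self.eps)          # normalize so contributions sum to one
        return sum(wi * fi for wi, fi in zip(w, feats))

# Example: fuse a PAN-level map with its FPN counterpart and the backbone feature.
# All inputs are assumed to have the same channel count.
# fuse = WeightedScaleFusion(num_inputs=3)
# out = fuse([pan_feat, fpn_feat, backbone_feat])
```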
In summary, the invention fuses the information of an adjacent scale into each scale of the ResNet backbone network and introduces a large convolution kernel containing a channel attention mechanism, which benefits the extraction of target features; on this basis, an FPN+PAN structure improved with multi-scale feature adaptive weights is added, strengthening the characterization of small target features and improving the model's accuracy for small target detection.
While the invention has been described above with reference to the accompanying drawings, the invention is clearly not limited to the above embodiments. Whether the method concept and technical solution of the invention are adopted with various insubstantial modifications or applied directly to other occasions without modification, such uses all fall within the protection scope of the invention.

Claims (10)

1. A small target detection method with multi-scale features and adaptive weights, which performs object detection on images acquired in real time by a two-dimensional camera, characterized in that: the detection method adopts a ResNet network as the backbone network.
2. The small target detection method with multi-scale features and adaptive weights according to claim 1, wherein: the detection method introduces a cross-scale feature map into the ResNet backbone network, and introduces a large convolution kernel containing a channel attention mechanism together with multi-scale features and adaptive weights to enhance small target features.
3. The small target detection method with multi-scale features and adaptive weights according to claim 1, wherein: the detection method uses a large convolution kernel containing a channel attention mechanism in the block module, which facilitates the extraction of network features.
4. The small target detection method with multi-scale features and adaptive weights according to claim 3, characterized in that: the block module is a generic module in the ResNet network, namely a residual block, which can be expressed by the following formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l represents the features of layer l;
x_{l+1} represents the features of layer l+1;
w_l represents the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used by ConvNeXt is adopted; it is a combination of fully connected layers and activation functions.
5. The small target detection method with multi-scale features and adaptive weights according to claim 2, wherein: the detection method generates channel statistics by using global average pooling, as shown in the following formula (2):
g_c = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c represents the statistic of each channel;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively.
This compresses the information along the spatial dimensions; on this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s represents a different weight for each channel;
g represents channel statistics obtained according to formula (2);
W represents the convolution kernel weights;
F(g, W) is a nonlinear function, which is a combination of a fully connected layer and an activation function.
6. The small target detection method with multi-scale features and adaptive weights according to claim 1, wherein: the detection method adds a cross-scale feature layer into the backbone network, which facilitates the extraction of network features.
7. The small target detection method with multi-scale features and adaptive weights according to claim 6, wherein: the detection method introduces learnable weight parameters into the multi-scale fusion process, so that the scales are fully fused, which benefits the detection of small targets.
8. The small target detection method with multi-scale features and adaptive weights according to claim 6, wherein: the network of the detection method outputs four scales, and each scale fuses the feature information of the previous scale; this operation strengthens shallow features and helps describe small targets effectively, and the four scales are finally fused in subsequent processing.
9. The small target detection method with multi-scale features and adaptive weights according to claim 8, wherein: the detection method obtains feature maps at four scales through the backbone network; in order to effectively detect targets of different scales in the target area, these scale feature maps need to be fused; specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, which ultimately improves the model's accuracy for small target detection.
10. The small target detection method with multi-scale features and adaptive weights according to claim 9, wherein: the FPN structure propagates strong high-level semantic features, enhancing the semantic information of the whole pyramid, but does not propagate localization information; the PAN propagates the strong localization information of the low-level layers to the high-level features, which helps improve localization precision.
CN202310205418.5A 2023-03-06 2023-03-06 Small target detection method for multi-scale features and self-adaptive weights Pending CN116258940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310205418.5A CN116258940A (en) 2023-03-06 2023-03-06 Small target detection method for multi-scale features and self-adaptive weights

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310205418.5A CN116258940A (en) 2023-03-06 2023-03-06 Small target detection method for multi-scale features and self-adaptive weights

Publications (1)

Publication Number Publication Date
CN116258940A true CN116258940A (en) 2023-06-13

Family

ID=86682319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310205418.5A Pending CN116258940A (en) 2023-03-06 2023-03-06 Small target detection method for multi-scale features and self-adaptive weights

Country Status (1)

Country Link
CN (1) CN116258940A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237830A (en) * 2023-11-10 2023-12-15 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention
CN117237830B (en) * 2023-11-10 2024-02-20 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention
CN117314898A (en) * 2023-11-28 2023-12-29 中南大学 Multistage train rail edge part detection method
CN117314898B (en) * 2023-11-28 2024-03-01 中南大学 Multistage train rail edge part detection method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination