CN116258940A - Small target detection method for multi-scale features and self-adaptive weights - Google Patents
- Publication number
- CN116258940A (application CN202310205418.5A)
- Authority
- CN
- China
- Prior art keywords
- scale
- detection
- features
- network
- detection method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a small target detection method based on multi-scale features and adaptive weights, which performs object detection on images acquired in real time by a two-dimensional camera; the detection method adopts a ResNet network as the backbone network. By improving the ResNet network and adopting a method of fusing multi-scale features and adaptively assigning weights, the network adaptively and dynamically adjusts the importance of each feature, improving the model's detection capability, accuracy, and overall effect on small targets, and effectively raising the small-target detection rate.
Description
Technical Field
The invention belongs to the technical field of image detection and processing. More particularly, the invention relates to a method for small target detection of multi-scale features and adaptive weights.
Background
Target detection is an important research direction in the field of computer vision, and is also a research basis for other complex visual tasks such as image segmentation and target tracking.
With the development of deep learning, a target detection technology based on the deep learning has made tremendous progress. In a real scene, as a large number of small targets exist, the small target detection has wide application prospect.
For example, in an unmanned system, when objects such as traffic lights or pedestrians are relatively small, unmanned vehicles are still required to accurately recognize the objects and respond accordingly; in the analysis of satellite images, it is necessary to detect objects such as automobiles, ships, and the like.
These targets are often difficult to detect due to the too small scale. Because the pixels of the small target are few, effective information is difficult to extract, and the detection of the small target faces great difficulties and challenges.
Therefore, research on an effective method for detecting a small target and improvement of detection performance and detection effect of the small target are very important and urgent research subjects in the current target detection field.
Disclosure of Invention
The invention provides a small target detection method with multi-scale characteristics and self-adaptive weights, and aims to improve the detection capability and detection effect of small targets.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
according to the small target detection method, object detection is carried out on the image acquired in real time through the two-dimensional camera; the detection method adopts a ResNet network as a backbone network.
According to the detection method, a cross-scale feature map is introduced into the ResNet backbone network, together with a large convolution kernel containing a channel attention mechanism, and small target features are enhanced with multi-scale features and adaptive weights.
The detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
The block module is the generic module in the ResNet network, namely the residual block, which can be expressed by the following formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l denotes the features of layer l;
x_{l+1} denotes the features of layer l+1;
w_l denotes the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used in ConvNeXt is adopted, a combination of fully connected layers and an activation function.
The detection method generates channel statistical information using global average pooling, as shown in the following formula (2):
g_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c denotes the statistical information of channel c;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively;
this realizes the compression of information along the spatial dimensions. On this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s denotes the weight assigned to each channel;
g denotes the channel statistics obtained from formula (2);
W denotes the convolution kernel weights;
F(g, W) is a nonlinear function, a combination of a fully connected layer and an activation function.
According to the detection method, a cross-scale feature layer is added to the backbone network, which facilitates the extraction of network features.
The detection method introduces a learnable weight parameter in the multi-scale fusion process, so that each scale is fully fused, and the detection of a small target is facilitated.
The network output of the detection method has four scales, and each scale fuses the feature information of the previous scale; this operation helps strengthen shallow features and effectively describe small targets; finally, the four scales are fused in subsequent processing.
The detection method obtains feature maps at four scales through the backbone network; to effectively detect targets of different scales in the target area, these multi-scale feature maps must be fused. Specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, ultimately improving the model's accuracy on small target detection.
The FPN structure transmits high-level strong semantic features, so that the whole pyramid is enhanced in semantic information, but positioning information is not transmitted; the PAN transmits the strong positioning information of the low-layer network to the high-layer characteristics, thereby being beneficial to improving the positioning precision.
By adopting this technical scheme, the ResNet network is improved; through the method of fusing multi-scale features and adaptively assigning weights, the importance of each feature is dynamically and adaptively adjusted, improving the model's detection capability, accuracy, and effect on small targets, and effectively raising the small-target detection rate.
Drawings
The contents of the drawings and the marks in the drawings are briefly described as follows:
FIG. 1 is a diagram of the ResNet network of the present invention;
FIG. 2 is a system block diagram of ConvNeXt-SE of the present invention;
FIG. 3 is a schematic diagram of a scale fusion structure according to the present invention;
fig. 4 is a system block diagram of the present invention.
Detailed Description
The following detailed description of the embodiments of the invention, given by way of example only, is presented in the accompanying drawings to aid in a more complete, accurate, and thorough understanding of the inventive concepts and aspects of the invention by those skilled in the art.
The invention relates to a multi-scale characteristic and self-adaptive weight small target detection method, which is used for detecting objects in a real-time collected image through a two-dimensional camera, so that small target objects in a target area can be detected as accurately and comprehensively as possible.
In order to solve the problems existing in the prior art and overcome the defects thereof and realize the aim of improving the detection capability and the detection effect of small targets, the invention adopts the following technical scheme:
the invention discloses a multi-scale characteristic and self-adaptive weight small target detection method, which adopts a ResNet network as a backbone network.
Fig. 4 shows a system applying the invention: a camera collects images, a cross-scale feature map is introduced through the ResNet backbone network, and multi-scale feature adaptive weights are added to obtain target position information and target category information.
The specific analysis is as follows:
1. backbone network selection and optimization:
to prevent gradient vanishing and degradation in the network while retaining sufficient feature representation capability, and considering the simplicity and real-time performance of the network, the ResNet network is selected as the backbone; its residual structure allows deeper network designs, which facilitates the representation of effective features.
According to the detection method, a cross-scale feature map is introduced into the ResNet backbone network, together with a large convolution kernel containing a channel attention mechanism, and small target features are enhanced with multi-scale features and adaptive weights. Introducing a large convolution kernel containing a channel attention mechanism facilitates the extraction of target features.
Using large convolution kernels containing attention mechanisms in the block module facilitates the extraction of network features; this applies to, but is not limited to, the ResNet network.
On this basis, multi-scale feature adaptive weights are added and the FPN+PAN structure is improved, strengthening the characterization of small target features and ultimately improving the model's accuracy on small target detection. The key to the technique is that, when detecting small targets, the network adaptively and dynamically adjusts the importance of each feature by improving the ResNet network and setting multi-scale adaptive weights.
The detection method facilitates the extraction of network features by adding a cross-scale feature layer in the ResNet network (though it is not limited to the ResNet network).
The invention is mainly based on the ResNet network and improves it, chiefly by processing multi-scale features and fusing adaptive weights, thereby improving the small-target detection rate. Since the sizes of the targets to be detected vary greatly, the network structure design must also consider multi-scale fusion.
Fig. 1 shows the ResNet network structure employed in the invention.
As shown in fig. 1, the network output of the detection method has four scales, each of which fuses the feature information of the previous scale; this operation helps strengthen shallow features and effectively describe small targets. Finally, the four scales are fused in subsequent processing.
2. Large convolution kernel attention mechanism:
the detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
The block in fig. 1 is the generic module in the ResNet network, namely the residual block, which can be expressed using formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l denotes the features of layer l;
x_{l+1} denotes the features of layer l+1;
w_l denotes the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, a combination of fully connected layers and an activation function.
For F(x_l, w_l) in formula (1), the invention draws on the large convolution kernel block used in ConvNeXt and improves it on the basis of a different expression.
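As an illustration, the residual update of formula (1) with a large-kernel F can be sketched in NumPy as follows. The 7×7 depthwise kernel and the GELU nonlinearity are illustrative stand-ins for the ConvNeXt-style block, not the patented implementation:

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """Naive 'same'-padded depthwise convolution. x: (C, H, W), kernel: (C, k, k)."""
    C, H, W = x.shape
    k = kernel.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernel[c])
    return out

def residual_block(x, w):
    """Formula (1): x_{l+1} = x_l + F(x_l, w_l), with F modeled as a
    large-kernel depthwise convolution followed by a GELU nonlinearity."""
    f = depthwise_conv2d(x, w)
    f = 0.5 * f * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (f + 0.044715 * f ** 3)))
    return x + f

x = np.random.randn(4, 8, 8)          # 4 channels, 8x8 feature map
w = np.random.randn(4, 7, 7) * 0.01   # 7x7 "large" kernel per channel
y = residual_block(x, w)
print(y.shape)  # (4, 8, 8): the residual connection preserves shape
```

The identity path is what lets very deep networks train: with a zero kernel, F contributes nothing and the block reduces exactly to x.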
Considering that each learned filter operates on a local receptive field, each element of the output feature cannot exploit context information outside that region.
To solve this problem, the invention introduces a channel attention mechanism. As shown in fig. 2, the difference between (a) and (b) in ConvNeXt-SE lies in the attention branch: (a) uses a ReLU activation function, while (b) uses a GELU activation function.
GELU is chosen because nonlinearity is an important property of the model, and stochastic regularization such as dropout is needed for generalization; GELU builds the idea of stochastic regularization into the activation itself, giving a probabilistic description of the neuron input that accords more intuitively with natural perception, and it also outperforms ReLU experimentally.
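For reference, the ReLU and GELU activations mentioned above can be compared numerically (using the common tanh approximation of GELU; a sketch, not the patent's exact formulation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU: x * Phi(x), Phi the standard normal CDF
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(xs))               # hard zero for every negative input
print(np.round(gelu(xs), 4))  # smooth; slightly negative for small negative inputs
```

Unlike ReLU's hard gate, GELU weights each input by the probability that a standard normal variable falls below it, which is the probabilistic interpretation the text alludes to.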
The detection method generates channel statistical information using global average pooling, as shown in the following formula (2):
g_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c denotes the statistical information of channel c;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively;
this realizes the compression of information along the spatial dimensions. On this basis, a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s denotes the weight assigned to each channel;
g denotes the channel statistics obtained from formula (2);
W denotes the convolution kernel weights;
F(g, W) is a nonlinear function, a combination of a fully connected layer and an activation function.
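Taken together, formulas (2) and (3) amount to an SE-style squeeze-and-excitation step. A minimal NumPy sketch follows; the layer sizes, the reduction ratio, and the sigmoid gating are illustrative assumptions:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel attention. x: (C, H, W); w1: (C//r, C) and
    w2: (C, C//r) are the two fully connected layers (r = reduction ratio)."""
    g = x.mean(axis=(1, 2))                # formula (2): global average pooling
    z = np.maximum(w1 @ g, 0.0)            # squeeze: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # formula (3): FC + sigmoid -> weights s
    return x * s[:, None, None]            # rescale to highlight important channels

rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.normal(size=(C, 6, 6))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 6, 6)
```

Because each weight s is squashed into (0, 1), the mechanism can only attenuate channels relative to one another, never amplify them beyond the input.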
FIG. 2 is a block diagram of a ConvNeXt-SE system.
3. Multiscale feature fusion structure:
according to the detection method, four scale feature images are obtained through a backbone network, and if targets with different scales in a target area can be detected effectively, the scale images are required to be fused.
The method comprises the following steps: the FPN+PAN structure is improved, the characterization capability of the small target features is improved, and the accuracy of the model on small target detection is finally improved through the mode.
In this regard, the present invention improves upon the FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) structure in which:
the structure of the FPN transmits the high-level strong semantic features, enhances the semantic information of the whole pyramid, but does not transmit the positioning information.
The PAN transmits the strong positioning information of the low-layer network to the high-layer characteristic, thereby being beneficial to improving the positioning precision.
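The complementary top-down (FPN) and bottom-up (PAN) information flow described above can be sketched as follows. This is a schematic NumPy sketch in which the lateral and output convolutions are omitted and fusion is plain addition, both illustrative simplifications:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """2x2 average-pool downsampling of a (C, H, W) map."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def fpn_pan(levels):
    """levels: [shallow/high-res, ..., deep/low-res] feature maps.
    FPN top-down pass spreads semantics downward; PAN bottom-up pass
    spreads localization upward."""
    td = levels[:]
    for i in range(len(td) - 2, -1, -1):       # FPN: deep semantics -> shallow
        td[i] = td[i] + upsample2x(td[i + 1])
    out = td[:]
    for i in range(1, len(out)):               # PAN: shallow localization -> deep
        out[i] = out[i] + downsample2x(out[i - 1])
    return out

levels = [np.ones((2, 16, 16)), np.ones((2, 8, 8)), np.ones((2, 4, 4))]
fused = fpn_pan(levels)
print([f.shape for f in fused])  # per-level shapes are preserved
```

After both passes, every level has mixed in information from every other level, which is exactly what the FPN+PAN pairing is meant to achieve.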
FIG. 3 is a schematic diagram of a scale fusion structure.
The structure of the invention is shown in fig. 3, in which s11, s12, s13, s21, s22, and s23 are all learnable weight parameters; this helps highlight the contribution of each scale and increases the small-target detection rate.
The detection method introduces a learnable weight parameter in the multi-scale fusion process, so that each scale is fully fused, and the detection of a small target is facilitated.
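The learnable fusion weights can be sketched as follows. The non-negativity clipping and normalization scheme here is an assumption (in the spirit of fast normalized fusion), and the scalar weights play the role of s11, s12, etc.:

```python
import numpy as np

def fuse_scales(features, weights, eps=1e-4):
    """Adaptively weighted fusion of same-shaped feature maps.
    `weights` are learnable scalars; they are clipped to be non-negative
    and normalized to sum to ~1, so each scale's contribution is learned
    rather than fixed."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

f1 = np.ones((4, 4)) * 1.0   # e.g. an upsampled deep feature map
f2 = np.ones((4, 4)) * 3.0   # a shallow, small-target-friendly feature map
out = fuse_scales([f1, f2], [1.0, 1.0])
print(round(float(out[0, 0]), 3))  # ~2.0: equal weights average the two scales
```

During training, gradients flow into the scalar weights, so a scale that helps small-target detection automatically earns a larger share of the fused map.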
In summary, the invention fuses information from an adjacent scale into each scale of the ResNet backbone and introduces a large convolution kernel containing a channel attention mechanism, which facilitates the extraction of target features; on this basis, an FPN+PAN structure improved with multi-scale adaptive feature weights is added, strengthening the characterization of small target features and improving the model's accuracy on small target detection.
While the invention has been described above with reference to the accompanying drawings, it will be apparent that the invention is not limited to the above embodiments, but is capable of being modified or applied directly to other applications without modification, as long as various insubstantial modifications of the method concept and technical solution of the invention are adopted, all within the scope of the invention.
Claims (10)
1. A small target detection method with multi-scale features and adaptive weights, performing object detection on images acquired in real time by a two-dimensional camera, characterized in that: the detection method adopts a ResNet network as a backbone network.
2. The method for small object detection of multi-scale features and adaptive weights according to claim 1, wherein: the detection method introduces a cross-scale feature map into the ResNet backbone network, together with a large convolution kernel containing a channel attention mechanism, and enhances small target features with multi-scale features and adaptive weights.
3. The method for small object detection of multi-scale features and adaptive weights according to claim 1, wherein: the detection method uses a large convolution kernel containing a channel attention mechanism in a block module, which is beneficial to extracting network characteristics.
4. A method of small object detection of multi-scale features and adaptive weights as claimed in claim 3, characterized by: the block module is the generic module in the ResNet network, namely the residual block, which can be expressed by the following formula (1):
x_{l+1} = x_l + F(x_l, w_l)    (1)
in formula (1):
x_l denotes the features of layer l;
x_{l+1} denotes the features of layer l+1;
w_l denotes the convolution kernel weights of layer l;
F(x_l, w_l) is a nonlinear function, for which the large convolution kernel block used in ConvNeXt is adopted, a combination of fully connected layers and an activation function.
5. The method for small object detection of multi-scale features and adaptive weights according to claim 2, wherein: the detection method generates channel statistical information using global average pooling, as shown in the following formula (2):
g_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)    (2)
in formula (2):
g_c denotes the statistical information of channel c;
c is the index of the feature map;
X_c is the c-th feature map;
H and W are the height and width of the feature map, respectively;
this realizes the compression of information along the spatial dimensions, and on this basis a corresponding weight is generated for each channel according to the following formula (3), so as to highlight the important channels:
s=F(g,W) (3)
in formula (3):
s denotes the weight assigned to each channel;
g denotes the channel statistics obtained from formula (2);
W denotes the convolution kernel weights;
F(g, W) is a nonlinear function, a combination of a fully connected layer and an activation function.
6. The method for small object detection of multi-scale features and adaptive weights according to claim 1, wherein: according to the detection method, a trans-scale characteristic layer is added into a backbone network, so that extraction of network characteristics is facilitated.
7. The method for small object detection of multi-scale features and adaptive weights according to claim 6, wherein: the detection method introduces a learnable weight parameter in the multi-scale fusion process, so that each scale is fully fused, and the detection of a small target is facilitated.
8. The method for small object detection of multi-scale features and adaptive weights according to claim 6, wherein: the network output of the detection method has four scales, and each scale fuses the feature information of the previous scale; this operation helps strengthen shallow features and effectively describe small targets; finally, the four scales are fused in subsequent processing.
9. The method for small object detection of multi-scale features and adaptive weights according to claim 8, wherein: the detection method obtains feature maps at four scales through the backbone network; to effectively detect targets of different scales in the target area, these multi-scale feature maps must be fused; specifically, the FPN+PAN structure is improved to strengthen the characterization of small target features, ultimately improving the model's accuracy on small target detection.
10. The method for small object detection of multi-scale features and adaptive weights according to claim 9, wherein: the FPN structure transmits high-level strong semantic features, so that the whole pyramid is enhanced in semantic information, but positioning information is not transmitted; the PAN transmits the strong positioning information of the low-layer network to the high-layer characteristics, thereby being beneficial to improving the positioning precision.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310205418.5A CN116258940A (en) | 2023-03-06 | 2023-03-06 | Small target detection method for multi-scale features and self-adaptive weights |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310205418.5A CN116258940A (en) | 2023-03-06 | 2023-03-06 | Small target detection method for multi-scale features and self-adaptive weights |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116258940A true CN116258940A (en) | 2023-06-13 |
Family
ID=86682319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310205418.5A Pending CN116258940A (en) | 2023-03-06 | 2023-03-06 | Small target detection method for multi-scale features and self-adaptive weights |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116258940A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237830A (en) * | 2023-11-10 | 2023-12-15 | 湖南工程学院 | Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention |
CN117237830B (en) * | 2023-11-10 | 2024-02-20 | 湖南工程学院 | Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention |
CN117314898A (en) * | 2023-11-28 | 2023-12-29 | 中南大学 | Multistage train rail edge part detection method |
CN117314898B (en) * | 2023-11-28 | 2024-03-01 | 中南大学 | Multistage train rail edge part detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Vehicle-damage-detection segmentation algorithm based on improved mask RCNN | |
CN111460968B (en) | Unmanned aerial vehicle identification and tracking method and device based on video | |
CN110163187B (en) | F-RCNN-based remote traffic sign detection and identification method | |
Tian et al. | A dual neural network for object detection in UAV images | |
CN112183203B (en) | Real-time traffic sign detection method based on multi-scale pixel feature fusion | |
CN112686207B (en) | Urban street scene target detection method based on regional information enhancement | |
CN116258940A (en) | Small target detection method for multi-scale features and self-adaptive weights | |
CN110569754A (en) | Image target detection method, device, storage medium and equipment | |
CN114266977B (en) | Multi-AUV underwater target identification method based on super-resolution selectable network | |
CN111738071B (en) | Inverse perspective transformation method based on motion change of monocular camera | |
CN112801027A (en) | Vehicle target detection method based on event camera | |
CN115115973A (en) | Weak and small target detection method based on multiple receptive fields and depth characteristics | |
CN111881984A (en) | Target detection method and device based on deep learning | |
CN113743163A (en) | Traffic target recognition model training method, traffic target positioning method and device | |
CN113901931A (en) | Knowledge distillation model-based behavior recognition method for infrared and visible light videos | |
CN116363535A (en) | Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network | |
Schenkel et al. | Domain adaptation for semantic segmentation using convolutional neural networks | |
CN114140698A (en) | Water system information extraction algorithm based on FasterR-CNN | |
Leipnitz et al. | The effect of image resolution in the human presence detection: A case study on real-world image data | |
CN111275027A (en) | Method for realizing detection and early warning processing of expressway in foggy days | |
CN117274957B (en) | Road traffic sign detection method and system based on deep learning | |
WO2023037494A1 (en) | Model training device, control method, and non-transitory computer-readable medium | |
CN114882478B (en) | Driver behavior recognition method for local multiscale feature fusion under weight optimization | |
CN115762178B (en) | Intelligent electronic police violation detection system and method | |
CN118072146B (en) | Unmanned aerial vehicle aerial photography small target detection method based on multi-level feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||