CN114549959A - Infrared dim target real-time detection method and system based on target detection model

Infrared dim target real-time detection method and system based on target detection model

Info

Publication number
CN114549959A
Authority
CN
China
Prior art keywords
layer
network
infrared
target
cbl
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210183525.8A
Other languages
Chinese (zh)
Inventor
刘晓涛
魏子翔
刘静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Technology of Xidian University filed Critical Guangzhou Institute of Technology of Xidian University
Priority to CN202210183525.8A priority Critical patent/CN114549959A/en
Publication of CN114549959A publication Critical patent/CN114549959A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Transforming Light Signals Into Electric Signals (AREA)

Abstract

The invention provides a real-time infrared dim small target detection method and system based on a target detection model. An infrared dim small aircraft detection data set is first constructed; the backbone network used for feature extraction is then redesigned so that the network is better suited to extracting the features of dim small targets; a spatial pyramid pooling layer is added behind the backbone network to obtain different receptive fields, strengthening the network model's ability to localize dim small targets; finally, because infrared dim small targets have a fixed, single scale, the number of detection heads is reduced and shallow and deep information are fused through a feature pyramid, so that the network achieves higher detection accuracy and stronger robustness. The invention improves both the accuracy and the speed of dim small target detection against complex backgrounds, and also improves the robustness and adaptability of the dim small target detection algorithm.

Description

Infrared dim target real-time detection method and system based on target detection model
Technical Field
The invention relates to the technical field of image recognition, in particular to a real-time infrared dim small target detection method and system based on a target detection model.
Background
Infrared dim small target detection is a difficult and key problem in the field of target detection and plays an important role in national defense and military applications. In the past, infrared dim small target detection mainly relied on methods such as searching for regions of interest based on target characteristics, threshold segmentation exploiting the physical characteristics of infrared imaging, and background prediction modeling; however, these algorithms have low detection efficiency and struggle to cope with complex and changing scenes. In recent years, with the development of deep learning, target detection algorithms based on deep learning have advanced greatly. Deep learning extracts abstract features from images through the nonlinear transformations of network layers trained with the back-propagation algorithm, so that targets can be identified accurately. However, because infrared images suffer from low imaging resolution and low contrast and dim small targets lack corresponding texture, target detection algorithms designed on common data sets are not suitable for detecting infrared dim small targets. Most existing algorithms are designed for high-resolution visible-light images containing large, medium and small targets, so they are not suitable for infrared dim small target detection and carry a certain amount of design redundancy. It is therefore urgent to design a dedicated algorithm for infrared dim small target detection.
In the prior art, infrared dim small target detection algorithms mainly use single-frame detection or multi-frame detection. Multi-frame detection algorithms generally assume a static background, consume more time than single-frame algorithms, and are difficult to apply to real-time detection of infrared dim small targets. The patent application with publication number CN113643315A, entitled 'Infrared small target detection method based on adaptive peak gradient descent filter', discloses an infrared small target detection method based on an adaptive peak gradient descent filter; it addresses the problems that existing infrared small target detection methods, which use a rectangular window to extract local features of the infrared image, cannot suppress irregular clutter and have low detection performance for small targets, but the method still cannot achieve real-time detection.
With the rapid development of deep learning methods, the field of target detection has made breakthrough progress, bringing new methods and ideas to infrared dim small target detection research. The patent application with publication number CN113591968A, entitled 'Infrared weak and small target detection method based on asymmetric attention feature fusion', discloses an infrared dim small target detection method based on asymmetric attention feature fusion. Building on a deep-learning-based target detection network, it optimizes dim small target detection by optimizing the anchor boxes and introducing an attention mechanism, so that the improved network pays more attention to shallow semantic information and its ability to detect dim small targets is improved. At present, most networks focus only on performance and keep increasing network complexity to improve the detection of infrared dim small targets; however, infrared dim small targets occupy few pixels and lack texture features, and an overly deep and overly complex network can lose their characteristics. Designing a dedicated network tailored to the characteristics of infrared dim small targets is therefore of great significance for real-time operation on resource-limited devices.
Disclosure of Invention
In view of the above deficiencies of the prior art, the invention aims to provide a real-time infrared dim small target detection method and system based on a target detection model, which can effectively lighten the network structures that require a large amount of computation during inference, improve the model's performance in detecting infrared dim small targets, reduce the complexity of the network, and offer high reliability.
In order to achieve the purpose, the invention adopts the following specific technical scheme:
A real-time infrared dim small target detection method based on a target detection model comprises the following steps:
(1) acquiring an infrared dim small target data set, and selecting and labeling data from it to obtain a training set and a verification set;
(2) improving the feature extraction backbone network of a target detection model, adding a spatial pyramid pooling layer after the feature extraction network, and optimizing the target detection head, so as to construct an infrared dim small target detection network;
(3) training the infrared dim small target detection network by using a loss function together with the training set and verification set from step (1) to obtain an improved target detection model;
(4) detecting the infrared dim small target by using the improved target detection model to obtain a detection result.
Preferably, the step (1) of acquiring an infrared dim small target data set and selecting and labeling data from it to obtain a training set and a verification set comprises:
acquiring an infrared dim small target data set and selecting part of the image data from it, wherein each target corresponds to a labeled center point; labeling the image data according to the coordinates of the center point and a preset bounding box; and dividing the resulting data set according to a set proportion to obtain a training set and a verification set.
Preferably, step (2) comprises the following steps:
(2a) improving the feature extraction backbone network of the target detection model and reducing its down-sampling factor;
(2b) adding a spatial pyramid pooling layer after the modified feature extraction network;
(2c) adopting a feature pyramid structure to fuse shallow and deep information, retaining the feature information of the shallow network and the global information of the deeper network;
(2d) reducing the number of detection heads to cut the redundant structure of the network;
(2e) constructing a network model, and obtaining a basic network model for infrared dim small target detection by readjusting the configuration file of the network.
Preferably, the 32-fold down-sampling in step (2a) is changed to 16-fold down-sampling.
Preferably, in step (2b), the input feature map is divided into 4 branches during spatial pyramid pooling: the 1st branch is passed backwards directly, and the other three branches each perform max pooling with kernels of set sizes to enlarge the receptive field; the resulting rich feature information is then concatenated along the channel dimension.
Preferably, the number of detection heads in step (2d) is reduced to 1.
Preferably, the structure of the network model in step (2e) is: input layer → first CBL layer → second CBL layer → first Resblock layer → max pooling layer → second Resblock layer → max pooling layer → third CBL layer → SPP layer → fourth CBL layer → fifth CBL layer → up-sampling layer → sixth CBL layer → Conv layer → Yolo output layer; wherein the second Resblock layer is coupled to the up-sampling layer; each CBL is composed of a convolutional layer, a normalization layer and an activation layer, and the CSP-res block is composed of a first CBL, a second CBL, a third CBL and a fourth CBL, wherein the first CBL is routed to the output and the second CBL is routed to the fourth CBL;
the convolution kernel size of the first CBL layer is set to 3x3, the convolution stride to 2, and the number of convolution kernels to 32; the convolution kernel size of the second CBL layer is set to 3x3, the stride to 2, and the number of kernels to 64; the convolution kernel size of the third CBL layer is set to 3x3, the stride to 1, and the number of kernels to 256; the convolution kernel size of the fourth CBL layer is set to 1x1, the stride to 1, and the number of kernels to 256; the convolution kernel size of the fifth CBL layer is set to 1x1, the stride to 1, and the number of kernels to 128; the convolution kernel size of the sixth CBL layer is set to 3x3, the stride to 1, and the number of kernels to 256;
the convolution kernel size of the Conv layer is set to 1x1, the stride to 1, and the number of kernels to 18;
the first Resblock layer is composed of four CBL layers with convolution kernel sizes of 3x3, 3x3, 3x3 and 1x1 respectively, convolution strides all equal to 1, and numbers of convolution kernels of 64, 32 and 64 respectively;
the second Resblock layer is composed of four CBL layers with convolution kernel sizes of 3x3, 3x3, 3x3 and 1x1 respectively, convolution strides all equal to 1, and numbers of convolution kernels of 128, 64 and 128 respectively;
the input of the SPP module is linked through 1x1, 5x5, 9x9 and 13x13 max pooling layers.
Preferably, step (3) comprises the following steps:
(3a) initializing the training parameters of the infrared dim small target detection network;
(3b) inputting the training set into the infrared dim small target detection network, and training the network with a CIoU loss function.
Preferably, step (4) comprises the following step:
(4a) inputting the data to be detected into the improved target detection model, performing non-maximum suppression on the bounding boxes predicted for the infrared dim small targets when processing the multi-scale prediction information, and counting the numbers of true positives, false positives and false negatives.
A real-time infrared dim small target detection system based on a target detection model comprises:
a data acquisition unit, used for acquiring an infrared dim small target data set and selecting and labeling data from it to obtain a training set and a verification set;
a model construction unit, used for improving the feature extraction backbone network of a target detection model, adding a spatial pyramid pooling layer after the feature extraction network, and optimizing the target detection head to construct an infrared dim small target detection network;
a model training unit, used for training the infrared dim small target detection network with a loss function, the training set and the verification set to obtain an improved target detection model;
and a target detection unit, used for detecting the infrared dim small target by using the improved target detection model to obtain a detection result.
The beneficial effects of the invention are as follows: by combining a cross-stage partial network, a smaller down-sampling factor, and the fusion of shallow network features with the global features of the deep network, the network can better adapt to dim small target detection, the detection accuracy is improved, and the complexity of the network is reduced. Compared with the prior art, the invention improves the detection of infrared dim small targets in complex scenes while using fewer computing resources, so that the dim small target detection algorithm can easily be deployed on resource-limited edge devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a real-time detection method for infrared small and weak targets based on a target detection model;
FIG. 2 is a basic network module designed by the present invention;
FIG. 3 is a diagram of a network architecture designed by the present invention;
fig. 4 is a diagram of the effect of the present invention on detecting weak and small targets in different scenes.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Other embodiments, which can be derived by one of ordinary skill in the art from the embodiments given herein without any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present invention provides a real-time infrared dim small target detection method based on a target detection model, which comprises:
(1) acquiring an infrared dim small target data set, and selecting and labeling data from it to obtain a training set and a verification set;
(2) improving the feature extraction backbone network of a target detection model, adding a spatial pyramid pooling layer after the feature extraction network, and optimizing the target detection head, so as to construct an infrared dim small target detection network;
(3) training the infrared dim small target detection network by using a loss function together with the training set and verification set from step (1) to obtain an improved target detection model;
(4) detecting the infrared dim small target by using the improved target detection model to obtain a detection result.
Preferably, the step (1) of acquiring an infrared dim small target data set and selecting and labeling data from it to obtain a training set and a verification set comprises:
acquiring an infrared dim small target data set and selecting part of the image data from it, wherein each target corresponds to a labeled center point; labeling the image data according to the coordinates of the center point and a preset bounding box; and dividing the resulting data set according to a set proportion to obtain a training set and a verification set.
Preferably, step (2) comprises the following steps:
(2a) improving the feature extraction backbone network of the target detection model and reducing its down-sampling factor;
(2b) adding a spatial pyramid pooling layer after the modified feature extraction network;
(2c) adopting a feature pyramid structure to fuse shallow and deep information, retaining the feature information of the shallow network and the global information of the deeper network;
(2d) reducing the number of detection heads to cut the redundant structure of the network;
(2e) constructing a network model, and obtaining a basic network model for infrared dim small target detection by readjusting the configuration file of the network.
Preferably, the 32-fold down-sampling in step (2a) is changed to 16-fold down-sampling.
Preferably, in step (2b), the input feature map is divided into 4 branches during spatial pyramid pooling: the 1st branch is passed backwards directly, and the other three branches each perform max pooling with kernels of set sizes to enlarge the receptive field; the resulting rich feature information is then concatenated along the channel dimension.
Preferably, the number of detection heads in step (2d) is reduced to 1.
Preferably, the structure of the network model in step (2e) is: input layer → first CBL layer → second CBL layer → first Resblock layer → max pooling layer → second Resblock layer → max pooling layer → third CBL layer → SPP layer → fourth CBL layer → fifth CBL layer → up-sampling layer → sixth CBL layer → Conv layer → Yolo output layer; wherein the second Resblock layer is coupled to the up-sampling layer; each CBL is composed of a convolutional layer, a normalization layer and an activation layer, and the CSP-res block is composed of a first CBL, a second CBL, a third CBL and a fourth CBL, wherein the first CBL is routed to the output and the second CBL is routed to the fourth CBL;
the convolution kernel size of the first CBL layer is set to 3x3, the convolution stride to 2, and the number of convolution kernels to 32; the convolution kernel size of the second CBL layer is set to 3x3, the stride to 2, and the number of kernels to 64; the convolution kernel size of the third CBL layer is set to 3x3, the stride to 1, and the number of kernels to 256; the convolution kernel size of the fourth CBL layer is set to 1x1, the stride to 1, and the number of kernels to 256; the convolution kernel size of the fifth CBL layer is set to 1x1, the stride to 1, and the number of kernels to 128; the convolution kernel size of the sixth CBL layer is set to 3x3, the stride to 1, and the number of kernels to 256;
the convolution kernel size of the Conv layer is set to 1x1, the stride to 1, and the number of kernels to 18;
the first Resblock layer is composed of four CBL layers with convolution kernel sizes of 3x3, 3x3, 3x3 and 1x1 respectively, convolution strides all equal to 1, and numbers of convolution kernels of 64, 32 and 64 respectively;
the second Resblock layer is composed of four CBL layers with convolution kernel sizes of 3x3, 3x3, 3x3 and 1x1 respectively, convolution strides all equal to 1, and numbers of convolution kernels of 128, 64 and 128 respectively;
the input of the SPP module is linked through 1x1, 5x5, 9x9 and 13x13 max pooling layers.
Preferably, step (3) comprises the following steps:
(3a) initializing the training parameters of the infrared dim small target detection network;
(3b) inputting the training set into the infrared dim small target detection network, and training the network with a CIoU loss function.
Preferably, step (4) comprises the following step:
(4a) inputting the data to be detected into the improved target detection model, performing non-maximum suppression on the bounding boxes predicted for the infrared dim small targets when processing the multi-scale prediction information, and counting the numbers of true positives, false positives and false negatives.
The invention also provides a real-time infrared dim small target detection system based on the target detection model, which comprises:
a data acquisition unit, used for acquiring an infrared dim small target data set and selecting and labeling data from it to obtain a training set and a verification set;
a model construction unit, used for improving the feature extraction backbone network of a target detection model, adding a spatial pyramid pooling layer after the feature extraction network, and optimizing the target detection head to construct an infrared dim small target detection network;
a model training unit, used for training the infrared dim small target detection network with a loss function, the training set and the verification set to obtain an improved target detection model;
and a target detection unit, used for detecting the infrared dim small target by using the improved target detection model to obtain a detection result.
The present invention will be described in detail below, taking the real-time detection of infrared dim small aircraft targets as an example.
Step 1) acquiring an infrared dim small target data set, selecting and labeling data from it, and constructing the data set used by the invention:
Step 1a) the data selected for the experiment are sequences data5, data10 and data21 from a ground/air background infrared dim small aircraft target detection and tracking data set, 3900 images in total; the images are in bmp format with a resolution of 256x256 and a bit depth of 24 bits, and each target corresponds to a labeled center point;
step 1b) converting the bmp files under the data5, data10 and data21 folders into jpg files;
step 1c) renaming and integrating jpg files in the data5, data10 and data21 folders into a JPEGImages folder;
Step 1c1) naming each picture as datax_000xxx.jpg to prevent data confusion; the names may be unified in other ways, and the invention is not limited in this respect.
Step 1d) labeling the data according to the coordinates of the target center point and a bounding box with a width of 12 and a height of 6, and writing the class, the normalized center coordinates and the normalized width and height into a txt file corresponding to each picture;
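As an illustration of this labeling step, the following Python sketch (not part of the original disclosure) converts target center points into YOLO-style txt label files; the 256x256 image size, the fixed 12x6 box and the datax_000xxx.jpg naming come from the embodiment, while the single class id, the in-memory annotation format and the output folder name are assumptions.

```python
# Hypothetical sketch of step 1d): write YOLO-format labels from center points.
# Assumes one class (aircraft, class id 0), 256x256 images and a fixed 12x6 box;
# the dict-of-centers annotation format and the "labels" folder are invented here.
from pathlib import Path

IMG_W, IMG_H = 256, 256          # image resolution from the data set
BOX_W, BOX_H = 12, 6             # fixed labeling box from step 1d)

def write_yolo_labels(centers_per_image, label_dir="labels"):
    """centers_per_image: {"data5_000001.jpg": [(cx, cy), ...], ...} in pixel coords."""
    out = Path(label_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image_name, centers in centers_per_image.items():
        lines = []
        for cx, cy in centers:
            # class id, normalized center, normalized width/height (YOLO txt format)
            lines.append(f"0 {cx / IMG_W:.6f} {cy / IMG_H:.6f} "
                         f"{BOX_W / IMG_W:.6f} {BOX_H / IMG_H:.6f}")
        (out / (Path(image_name).stem + ".txt")).write_text("\n".join(lines) + "\n")

# Example: one image with a single target centered at pixel (120, 88).
write_yolo_labels({"data5_000001.jpg": [(120, 88)]})
```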
Step 1e) taking 70% of the data under the JPEGImages folder as the training set, 15% as the verification set, and the remaining data as the test set; alternatively, 80% of the data can be used as the training set and 20% as the verification set, with the test set randomly extracted from the data; the specific proportion is not limited.
Step 1e1) extracting a training set, a verification set and a test set from a JPEGImages folder by adopting a random extraction method;
Step 1e2) writing the paths and names of all pictures in the training sample set into a train.txt file under the ImageSets/Main folder, and writing the names of all pictures in the test sample set into a test.txt file under the same folder, with the name of each picture occupying one line in train.txt and test.txt;
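A minimal sketch of the random split and list-file generation described in steps 1e) to 1e2) might look as follows; the folder names JPEGImages and ImageSets/Main come from the embodiment, the 70/15/15 proportion is the first option mentioned above, and the random seed, file names and helper structure are assumptions.

```python
# Hypothetical sketch of steps 1e)-1e2): random 70/15/15 split and list files.
import random
from pathlib import Path

def split_dataset(image_dir="JPEGImages", list_dir="ImageSets/Main", seed=0):
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(0.70 * len(images))
    n_val = int(0.15 * len(images))
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    out = Path(list_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, files in splits.items():
        # one image path per line, as expected by Darknet-style list files
        (out / f"{name}.txt").write_text("\n".join(str(p) for p in files) + "\n")

split_dataset()
```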
step 2) constructing an infrared small and weak target detection network:
step 2a) the yolov4-tiny feature extraction backbone network is improved, the original 32 times down-sampling of the network is changed into 16 times down-sampling, so as to ensure that the weak and small target information is not lost, and the calculation complexity of the network is reduced;
step 2a1) referring to fig. 2(a), the CBL layer is composed of a standard convolutional layer batchnorm batch normalization layer and a leak nonlinear activation function layer, wherein the slope of the leak nonlinear activation function is 0.1.
Step 2a2) referring to the Resblock layer in FIG. 2(b), the Resblock layer is composed of four CBL layers and outputs, wherein the first CBL output directly reaches the output, the second CBL output is used as one input of the fourth CBL, the learning capability of the network of the difference of the gradient combination is maximized, and the calculation amount is reduced.
Step 2b), adding a spatial pyramid pooling layer behind the modified feature extraction network, obtaining a larger receptive field through maximum pooling operations of different scales, and combining the global features and the local features together to enhance the positioning capability of the network on infrared weak and small targets;
step 2b1) referring to fig. 2(c), the spatial pyramid pooling module divides the input feature map into 4 branches, the 1 st branch is directly transmitted backwards without any processing, the other three branches respectively use the kernels with the maximum pooling cores of 5, 9 and 13 to perform maximum pooling operation, the receptive field is enlarged, and then rich feature information is spliced through the channels, so that the detection performance is improved.
Step 2c) reducing the 2 detection heads of yolo to 1, so that the network can more accurately identify infrared dim small targets;
Step 2c1) removing the detection head used for predicting large objects from the original network structure, leaving only the detection head used for detecting small objects.
Step 2c2) adjusting the number and size of anchor boxes in the Yolov4-tiny target detection network, reducing the number of anchor boxes from the original 6 to 3 and setting their sizes to (10,3), (12,6) and (14,7);
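To make the single-detection-head design concrete, the following sketch shows how the 18-channel output map of the remaining head, i.e. 3 anchors x (4 box offsets + 1 objectness + 1 class), could be decoded with the three anchor sizes above. Standard YOLO decoding rules, a 256x256 input and the confidence threshold are assumptions, not details taken from the patent.

```python
# Hypothetical decoding of the single YOLO head (3 anchors, 1 class -> 18 channels).
# Standard YOLO decoding is assumed: sigmoid for center/objectness/class, exp for size.
import torch

ANCHORS = [(10, 3), (12, 6), (14, 7)]   # anchor sizes in input-image pixels

def decode_single_head(pred, img_size=256, conf_thresh=0.25):
    """pred: tensor of shape (18, H, W) from the detection head."""
    num_anchors, grid = len(ANCHORS), pred.shape[-1]
    stride = img_size / grid
    pred = pred.reshape(num_anchors, 6, grid, grid)   # (anchor, [tx,ty,tw,th,obj,cls], y, x)
    ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
    boxes = []
    for a, (aw, ah) in enumerate(ANCHORS):
        tx, ty, tw, th, obj, cls = pred[a]
        cx = (torch.sigmoid(tx) + xs) * stride        # box center in image pixels
        cy = (torch.sigmoid(ty) + ys) * stride
        bw = aw * torch.exp(tw)                       # box size from the anchor prior
        bh = ah * torch.exp(th)
        score = torch.sigmoid(obj) * torch.sigmoid(cls)
        keep = score > conf_thresh
        boxes.append(torch.stack([cx[keep], cy[keep], bw[keep], bh[keep], score[keep]], dim=1))
    return torch.cat(boxes, dim=0)                    # (N, 5): cx, cy, w, h, score

# Example with a random 16x16 prediction map (a 16x down-sampled 256x256 input).
print(decode_single_head(torch.randn(18, 16, 16)).shape)
```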
Step 2d) as shown in fig. 3, building the network model and obtaining a basic network model for infrared dim small target detection by readjusting the configuration file of the network.
Step 2d1) building the network, the structure of which is: input layer → first CBL layer → second CBL layer → first Resblock layer → max pooling layer → second Resblock layer → max pooling layer → third CBL layer → SPP layer → fourth CBL layer → fifth CBL layer → up-sampling layer → sixth CBL layer → Conv layer → Yolo output layer, wherein the second Resblock layer is coupled to the up-sampling layer. Each CBL is composed of a convolutional layer, a normalization layer and an activation layer, and the CSP-res block is composed of a first CBL, a second CBL, a third CBL and a fourth CBL, wherein the first CBL is routed to the output and the second CBL is routed to the fourth CBL.
Step 2d2) setting the convolution kernel size of the first CBL layer to 3x3, the convolution stride to 2 and the number of convolution kernels to 32; the convolution kernel size of the second CBL layer to 3x3, the stride to 2 and the number of kernels to 64; the convolution kernel size of the third CBL layer to 3x3, the stride to 1 and the number of kernels to 256; the convolution kernel size of the fourth CBL layer to 1x1, the stride to 1 and the number of kernels to 256; the convolution kernel size of the fifth CBL layer to 1x1, the stride to 1 and the number of kernels to 128; and the convolution kernel size of the sixth CBL layer to 3x3, the stride to 1 and the number of kernels to 256;
Step 2d3) setting the convolution kernel size of the Conv layer to 1x1, the stride to 1 and the number of kernels to 18;
Step 2d4) the first Resblock layer, shown in fig. 2, is composed of four CBL layers with convolution kernel sizes of 3x3, 3x3, 3x3 and 1x1 respectively, convolution strides all equal to 1, and numbers of convolution kernels of 64, 32 and 64 respectively;
Step 2d5) the second Resblock layer, shown in fig. 2, is composed of four CBL layers with convolution kernel sizes of 3x3, 3x3, 3x3 and 1x1 respectively, convolution strides all equal to 1, and numbers of convolution kernels of 128, 64 and 128 respectively;
Step 2d6) the input of the SPP module, as shown, is linked through 1x1, 5x5, 9x9 and 13x13 max pooling layers;
Step 2d7) in the implementation, convolutional layers are used to extract features, route layers to connect feature maps, max pooling layers to reduce the sensitivity of the convolutional layers to position, up-sampling layers to increase the spatial dimensions of the feature maps, and the yolo layer to perform classification and localization of the target object.
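For readers who prefer code to the layer list above, the following PyTorch sketch wires the described layers together. It is only an illustration under several assumptions: the unlisted fourth filter count of each Resblock is taken to be equal to the second, the routed branches are fused by channel concatenation, and a 3-channel 256x256 input is assumed; the original work is realized as a Darknet configuration file rather than the code below.

```python
# Hypothetical PyTorch sketch of the modified yolov4-tiny-style detector (steps 2a)-2d)).
import torch
import torch.nn as nn

def cbl(c_in, c_out, k, s):
    """CBL block: convolution + batch normalization + LeakyReLU(0.1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class Resblock(nn.Module):
    """CSP-res style block: four CBLs; the first CBL is routed to the output,
    the second CBL feeds the fourth CBL (filter counts c, c/2, c/2, c assumed)."""
    def __init__(self, c):
        super().__init__()
        self.cbl1 = cbl(c, c, 3, 1)
        self.cbl2 = cbl(c, c // 2, 3, 1)
        self.cbl3 = cbl(c // 2, c // 2, 3, 1)
        self.cbl4 = cbl(c, c, 1, 1)
    def forward(self, x):
        x1 = self.cbl1(x)
        x2 = self.cbl2(x1)
        x3 = self.cbl3(x2)
        x4 = self.cbl4(torch.cat([x3, x2], dim=1))
        return torch.cat([x1, x4], dim=1)          # doubles the channel count

class SPP(nn.Module):
    """Spatial pyramid pooling: identity branch plus 5/9/13 max pooling, concatenated."""
    def __init__(self):
        super().__init__()
        self.pools = nn.ModuleList(nn.MaxPool2d(k, 1, k // 2) for k in (5, 9, 13))
    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

class InfraredTinyDetector(nn.Module):
    def __init__(self, num_anchors=3, num_classes=1):
        super().__init__()
        self.cbl1 = cbl(3, 32, 3, 2)
        self.cbl2 = cbl(32, 64, 3, 2)
        self.res1, self.pool1 = Resblock(64), nn.MaxPool2d(2, 2)     # -> 128 channels
        self.res2, self.pool2 = Resblock(128), nn.MaxPool2d(2, 2)    # -> 256 channels
        self.cbl3 = cbl(256, 256, 3, 1)
        self.spp = SPP()                                             # -> 1024 channels
        self.cbl4 = cbl(1024, 256, 1, 1)
        self.cbl5 = cbl(256, 128, 1, 1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.cbl6 = cbl(128 + 256, 256, 3, 1)                        # fuse with res2 output
        self.head = nn.Conv2d(256, num_anchors * (5 + num_classes), 1)  # 18 channels
    def forward(self, x):
        x = self.cbl2(self.cbl1(x))
        x = self.pool1(self.res1(x))
        shallow = self.res2(x)              # second Resblock output, routed to the up-sampling path
        x = self.pool2(shallow)
        x = self.cbl5(self.cbl4(self.spp(self.cbl3(x))))
        x = torch.cat([self.up(x), shallow], dim=1)
        return self.head(self.cbl6(x))      # single Yolo output map

print(InfraredTinyDetector()(torch.zeros(1, 3, 256, 256)).shape)   # torch.Size([1, 18, 32, 32])
```

The printed 18-channel map corresponds to one prediction per grid cell for each of the 3 anchor boxes configured in step 2c2), matching the single detection head retained in step 2c).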
Step 3) training on the infrared dim small target data set using the loss function:
Step 3a) initializing the training parameters of the infrared dim small aircraft detection network:
setting max_batches to 6000 iterations, the learning rate learning_rate to 0.00261, batch to 64 and subdivisions to 8, so that 64 pictures are loaded into memory at a time and forward propagation is completed in 8 passes of 8 pictures each, with backward propagation and weight updating performed after all 64 pictures have finished forward propagation; the width and height of the input pictures are set to 256.
Step 3b) training the network by using a CIoU loss function, wherein the CIoU loss function can be expressed by the following four formulas:
$$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$
$$IoU = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|}$$
$$v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$$
$$\alpha = \frac{v}{(1 - IoU) + v}$$
CIoU considers the overlapping area, the center-point distance and the aspect ratio of the two boxes: ρ²(b, b^gt) is the squared Euclidean distance between the center points of the prediction box and the labeled box, c is the diagonal length of the smallest enclosing region that contains both the prediction box and the labeled box, b and b^gt denote the center coordinates of the prediction box and the labeled box, w^gt and h^gt are the width and height of the real box, w and h are the width and height of the prediction box, α is a positive trade-off parameter, v measures the consistency of the aspect ratios, and IoU denotes the intersection-over-union of the prediction box and the real box.
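A compact implementation of the CIoU loss described by the four formulas above could look as follows; this is a sketch rather than the original Darknet code, and boxes are assumed to be given as center coordinates with width and height.

```python
# Hypothetical CIoU loss between a predicted box and a ground-truth box,
# both given as (cx, cy, w, h); follows the four formulas above.
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx1, gy1 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2
    gx2, gy2 = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2

    # IoU: intersection over union of the two boxes
    inter = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(0) * \
            (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(0)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter + eps
    iou = inter / union

    # rho^2 / c^2: normalized squared distance between the box centers
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    cw = torch.max(px2, gx2) - torch.min(px1, gx1)   # enclosing box width
    ch = torch.max(py2, gy2) - torch.min(py1, gy1)   # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency term v and trade-off weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(gt[2] / gt[3]) - torch.atan(pred[2] / pred[3])) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss(torch.tensor([120., 88., 12., 6.]), torch.tensor([121., 90., 12., 6.])))
```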
Step 4) detecting infrared dim small aircraft targets by using the improved yolov4-tiny model to obtain detection results:
Step 4a) when processing the multi-scale prediction information, performing non-maximum suppression on the bounding boxes predicted for the infrared dim small targets; after the non-maximum suppression, counting the number of true positives N_TP, the number of false positives N_FP and the number of false negatives N_FN.
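The non-maximum suppression on the predicted boxes can be sketched as follows; a greedy IoU-based NMS is assumed, the box format (cx, cy, w, h, score) matches the decoding sketch shown earlier, and the IoU threshold of 0.45 is an assumption rather than a value from the patent.

```python
# Hypothetical greedy non-maximum suppression over (cx, cy, w, h, score) detections.
import torch

def box_iou_xywh(a, b):
    """IoU between one box a (4,) and boxes b (N, 4), all in (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[:, 0] - b[:, 2] / 2, b[:, 1] - b[:, 3] / 2
    bx2, by2 = b[:, 0] + b[:, 2] / 2, b[:, 1] + b[:, 3] / 2
    iw = (torch.min(ax2, bx2) - torch.max(ax1, bx1)).clamp(0)
    ih = (torch.min(ay2, by2) - torch.max(ay1, by1)).clamp(0)
    inter = iw * ih
    return inter / (a[2] * a[3] + b[:, 2] * b[:, 3] - inter + 1e-7)

def nms(dets, iou_thresh=0.45):
    """dets: (N, 5) tensor of (cx, cy, w, h, score); returns the kept detections."""
    order = dets[:, 4].argsort(descending=True)
    dets, keep = dets[order], []
    while dets.shape[0] > 0:
        best, dets = dets[0], dets[1:]
        keep.append(best)
        if dets.shape[0] > 0:
            dets = dets[box_iou_xywh(best[:4], dets[:, :4]) < iou_thresh]
    return torch.stack(keep) if keep else dets

# Two heavily overlapping detections: the lower-scoring one is suppressed.
print(nms(torch.tensor([[120., 88., 12., 6., 0.9], [121., 88., 12., 6., 0.6]])))
```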
Step 4a1) Accuracy: the proportion of all samples, positive and negative, that are classified correctly,
$$Accuracy = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}$$
Step 4a2) Precision: the proportion of samples predicted as positive that are actually positive,
$$Precision = \frac{N_{TP}}{N_{TP} + N_{FP}}$$
Step 4a3) Recall: the proportion of actual positive samples that are correctly detected,
$$Recall = \frac{N_{TP}}{N_{TP} + N_{FN}}$$
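Counting N_TP, N_FP and N_FN and evaluating the metrics above can be sketched as follows; a simple center-distance matching rule is assumed here purely for illustration, since the patent does not specify the matching criterion, and accuracy is omitted because true negatives are not counted in step 4a).

```python
# Hypothetical evaluation sketch: match detections to ground-truth centers and
# compute precision and recall from the resulting counts.
def count_matches(detections, gt_centers, max_dist=6.0):
    """detections: list of (cx, cy); gt_centers: list of (cx, cy)."""
    unmatched_gt = list(gt_centers)
    n_tp = n_fp = 0
    for dx, dy in detections:
        hit = next((g for g in unmatched_gt
                    if (g[0] - dx) ** 2 + (g[1] - dy) ** 2 <= max_dist ** 2), None)
        if hit is not None:
            unmatched_gt.remove(hit)      # each ground-truth target is matched at most once
            n_tp += 1
        else:
            n_fp += 1
    n_fn = len(unmatched_gt)
    return n_tp, n_fp, n_fn

n_tp, n_fp, n_fn = count_matches([(120, 88), (40, 40)], [(121, 88), (200, 150)])
precision = n_tp / (n_tp + n_fp)     # proportion of detections that are correct
recall = n_tp / (n_tp + n_fn)        # proportion of real targets that are detected
print(n_tp, n_fp, n_fn, precision, recall)
```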
The infrared dim small target test set to be detected is input into the infrared dim small target detection network for forward computation to obtain the detection results for the infrared dim small aircraft targets.
The technical effects of the invention are further explained by combining simulation experiments as follows:
1. simulation conditions and contents:
The simulation experiments of the invention are implemented with the Darknet framework in a hardware environment of a GeForce RTX 2080Ti GPU and 32 GB of RAM and a software environment of Ubuntu 18.04. The data set used in the experiments is derived from part of the infrared sequence-image dim small aircraft target detection and tracking data set prepared by Hui Bingwei et al. in 2019 through steps such as typical scene design, field experiment shooting, and data processing and labeling.
Simulation experiment: after the infrared dim small target data set is constructed according to the invention, 6000 iterations of training are carried out on the training set using the optimized network. The test set is then input into the trained infrared dim small target detection network for detection, as shown in fig. 4.
2. Simulation result analysis:
Compared with other infrared dim small target detection algorithms, the detection results obtained with the method have clear advantages: the accuracy of infrared dim small target detection reaches 98.24%, whereas the prior art achieves only 92%. The frame rate for infrared dim small aircraft detection was tested on a GeForce RTX 2080Ti GPU: the average frame rate reaches 911.7 fps, while the original yolov4-tiny reaches only 443 fps, so the detection speed is more than doubled. The detection results show that the method achieves a good detection effect on infrared dim small aircraft targets in various complex scenes.
In light of the foregoing description of the preferred embodiments of the present invention, those skilled in the art can now make various alterations and modifications without departing from the scope of the invention. The technical scope of the present invention is not limited to the contents of the specification, and must be determined according to the scope of the claims.

Claims (10)

1. A real-time infrared dim small target detection method based on a target detection model, characterized by comprising the following steps:
(1) acquiring an infrared dim small target data set, and selecting and labeling data from it to obtain a training set and a verification set;
(2) improving the feature extraction backbone network of a target detection model, adding a spatial pyramid pooling layer after the feature extraction network, and optimizing the target detection head, so as to construct an infrared dim small target detection network;
(3) training the infrared dim small target detection network by using a loss function together with the training set and verification set from step (1) to obtain an improved target detection model;
(4) detecting the infrared dim small target by using the improved target detection model to obtain a detection result.
2. The infrared small target real-time detection method based on the target detection model as claimed in claim 1, wherein the step (1) of obtaining the infrared small target data set and selecting and labeling the infrared small target data set to obtain a training set and a verification set comprises:
acquiring an infrared small target data set and selecting a part of image data from the infrared small target data set, wherein each target corresponds to a position central point; marking the image data according to the coordinates of the position central point and a setting frame; and dividing the obtained data set according to a set proportion to obtain a training set and a verification set.
3. The infrared small target real-time detection method based on the target detection model as claimed in claim 1, wherein the step (2) comprises the steps of:
(2a) improving a feature extraction backbone network of the target detection model, and reducing down-sampling multiples of the feature extraction backbone network;
(2b) adding a spatial pyramid pooling layer after the modified feature extraction network;
(2c) adopting a characteristic pyramid structure, fusing shallow information and deep information, and reserving characteristic information of a shallow network and global information of a deeper network;
(2d) the number of detection heads is reduced, and the redundant structure of the network is reduced;
(2e) constructing a network model, and obtaining a basic network model for infrared dim small target detection by readjusting the configuration file of the network.
4. The method for real-time detection of infrared small and weak targets based on the target detection model as claimed in claim 3, wherein the step (2a) is changed from 32 times down-sampling to 16 times down-sampling.
5. The infrared small and weak target real-time detection method based on the target detection model as claimed in claim 3, characterized in that the input feature map is divided into 4 branches during the spatial pyramid pooling in step (2b), the 1 st branch is directly transmitted backwards, the other three branches respectively use a set number of kernels to perform maximum pooling operation to expand the receptive field, and then rich feature information is spliced through the channels.
6. The method for real-time detection of infrared small and weak targets based on the target detection model as claimed in claim 3, wherein the number of detection heads in step (2d) is reduced to 1.
7. The real-time infrared dim small target detection method based on the target detection model as claimed in claim 3, wherein the structure of the network model in step (2e) is: input layer → first CBL layer → second CBL layer → first Resblock layer → max pooling layer → second Resblock layer → max pooling layer → third CBL layer → SPP layer → fourth CBL layer → fifth CBL layer → up-sampling layer → sixth CBL layer → Conv layer → Yolo output layer; wherein the second Resblock layer is coupled to the up-sampling layer; each CBL is composed of a convolutional layer, a normalization layer and an activation layer, and the CSP-res block is composed of a first CBL, a second CBL, a third CBL and a fourth CBL, wherein the first CBL is routed to the output and the second CBL is routed to the fourth CBL;
the size of the convolution kernel of the first CBL layer is set to be 3x3, the convolution step length is set to be 2, and the number of the convolution kernels is set to be 32; the size of a convolution kernel of the second CBL layer is set to be 3x3, the convolution step size is set to be 2, and the number of the convolution kernels is set to be 64; the size of the convolution kernel of the third CBL layer is set to be 3x3, the convolution step is set to be 1, and the number of the convolution kernels is set to be 256; the size of a convolution kernel of the fourth CBL layer is set to be 1x1, the convolution step is set to be 1, and the number of the convolution kernels is set to be 256; the size of a convolution kernel of the fifth CBL layer is set to be 1x1, the convolution step is set to be 1, and the number of the convolution kernels is set to be 128; setting the size of convolution kernels of a sixth CBL layer to be 3x3, setting convolution step size to be 1 and setting the number of convolution kernels to be 256;
the size of the Conv layer convolution kernel is set to be 1x1, the convolution step is set to be 1, and the number of the convolution kernels is set to be 18;
the first Resblock layer is composed of four CBL layers, the sizes of convolution kernels are respectively 3x3, 3x3, 3x3 and 1x1, the convolution step lengths are all 1, and the number of the convolution kernels is respectively 64, 32 and 64;
the second Resblock layer is composed of four CBL layers, the sizes of convolution kernels are respectively 3x3, 3x3, 3x3 and 1x1, the convolution step lengths are all 1, and the number of the convolution kernels is respectively 128, 64 and 128;
SPP module inputs are coupled through 1x1, 5x5, 9x9, 13x13 max pooling layers.
8. The infrared small target real-time detection method based on the target detection model as claimed in claim 1, wherein the step (3) comprises the steps of:
(3a) carrying out initialization setting on training parameters of the infrared small and weak target detection network;
(3b) inputting the training set into the infrared dim small target detection network, and training the network with a CIoU loss function.
9. The infrared small target real-time detection method based on the target detection model as claimed in claim 1, wherein the step (4) comprises the steps of:
(4a) inputting the data to be detected into the improved target detection model, performing non-maximum suppression on the bounding boxes predicted for the infrared dim small targets when processing the multi-scale prediction information, and counting the numbers of true positives, false positives and false negatives.
10. A real-time infrared dim small target detection system based on a target detection model, characterized by comprising:
the data acquisition unit is used for acquiring an infrared small target data set, selecting and labeling the infrared small target data set, and obtaining a training set and a verification set;
the model construction unit is used for improving a feature extraction backbone network of the target detection model, adding a spatial pyramid pooling layer after the feature extraction network, optimizing a target detection head and constructing an infrared weak and small target detection network;
the model training unit is used for training the infrared weak and small target detection network by utilizing the loss function, the training set and the verification set to obtain an improved target detection model;
and the target detection unit is used for detecting the infrared dim target by using the improved target detection model to obtain a detection result.
CN202210183525.8A 2022-02-28 2022-02-28 Infrared dim target real-time detection method and system based on target detection model Pending CN114549959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210183525.8A CN114549959A (en) 2022-02-28 2022-02-28 Infrared dim target real-time detection method and system based on target detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210183525.8A CN114549959A (en) 2022-02-28 2022-02-28 Infrared dim target real-time detection method and system based on target detection model

Publications (1)

Publication Number Publication Date
CN114549959A true CN114549959A (en) 2022-05-27

Family

ID=81678571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210183525.8A Pending CN114549959A (en) 2022-02-28 2022-02-28 Infrared dim target real-time detection method and system based on target detection model

Country Status (1)

Country Link
CN (1) CN114549959A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665015A (en) * 2023-06-26 2023-08-29 中国科学院长春光学精密机械与物理研究所 Method for detecting dim and small targets in infrared sequence image based on YOLOv5
CN116665015B (en) * 2023-06-26 2024-04-02 中国科学院长春光学精密机械与物理研究所 Method for detecting dim and small targets in infrared sequence image based on YOLOv5
CN117576488A (en) * 2024-01-17 2024-02-20 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction
CN117576488B (en) * 2024-01-17 2024-04-05 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction

Similar Documents

Publication Publication Date Title
CN111104898B (en) Image scene classification method and device based on target semantics and attention mechanism
Li et al. Cross-layer attention network for small object detection in remote sensing imagery
CN109598241B (en) Satellite image marine ship identification method based on Faster R-CNN
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN111783590A (en) Multi-class small target detection method based on metric learning
CN111476159B (en) Method and device for training and detecting detection model based on double-angle regression
CN109978035B (en) Pedestrian detection method based on improved k-means and loss function
CN105574550A (en) Vehicle identification method and device
CN108334848A (en) A kind of small face identification method based on generation confrontation network
Wang et al. Small-object detection based on yolo and dense block via image super-resolution
CN114565860B (en) Multi-dimensional reinforcement learning synthetic aperture radar image target detection method
CN114549959A (en) Infrared dim target real-time detection method and system based on target detection model
CN111611861B (en) Image change detection method based on multi-scale feature association
CN113255589B (en) Target detection method and system based on multi-convolution fusion network
CN111191649A (en) Method and equipment for identifying bent multi-line text image
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN109409288A (en) Image processing method, device, electronic equipment and storage medium
CN114117614A (en) Method and system for automatically generating building facade texture
CN111539456B (en) Target identification method and device
JP2019185787A (en) Remote determination of containers in geographical region
CN115861848A (en) Method and device for processing rock mass image
CN114519819A (en) Remote sensing image target detection method based on global context awareness
Hu et al. Supervised multi-scale attention-guided ship detection in optical remote sensing images
CN114119514A (en) Method and device for detecting infrared weak and small target, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination