CN114445332A - Multi-scale detection method based on FASTER-RCNN model - Google Patents

Multi-scale detection method based on FASTER-RCNN model

Info

Publication number
CN114445332A
Authority
CN
China
Prior art keywords
faster
rcnn model
scale
model
rcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111575232.6A
Other languages
Chinese (zh)
Inventor
关新锋
刘凯
吴波
胡荣
王兆俊
常泽民
王嘉楠
徐小玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Aerospace Pohu Cloud Technology Co ltd
Original Assignee
Jiangxi Aerospace Pohu Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Aerospace Pohu Cloud Technology Co ltd filed Critical Jiangxi Aerospace Pohu Cloud Technology Co ltd
Priority to CN202111575232.6A priority Critical patent/CN114445332A/en
Publication of CN114445332A publication Critical patent/CN114445332A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale detection method based on the FASTER-RCNN model, comprising the following steps: inputting an image to be detected into a VGG16 convolutional neural network, and passing the outputs of three feature extraction layers (conv1_2, conv3_3 and conv5_3) of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model; performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights; pruning the small-scale-factor channels to obtain a small-target Faster-RCNN model; and quantizing the features in the small-target Faster-RCNN model to obtain a multi-scale small-target Faster-RCNN model. The method addresses the problems of low recognition accuracy, low analysis efficiency and heavy occupation of video-memory resources.

Description

Multi-scale detection method based on FASTER-RCNN model
Technical Field
The invention belongs to the technical field of image detection, and relates to a multi-scale detection method based on a FASTER-RCNN model.
Background
The era of big data has arrived. With the development of society, the advance of complex high and new technologies, and strong state support for the construction of smart cities, particularly the growing application and scale of artificial intelligence, video surveillance has gradually become a focus of attention. Adding an efficient video-surveillance application mechanism to smart-city construction can well guarantee the security monitoring of city operations. Target detection in video images has become an indispensable surveillance technology, and the demand for detecting small targets in particular is increasing day by day. Currently, the deep learning field can be roughly divided into two camps. One is the academic camp, which studies powerful and complex network models and experimental methods; it is not constrained by the equipment environment, pursues only higher performance, and suits high-precision detection projects such as military applications. The other is the engineering camp, which aims to make algorithms more stable, pays more attention to the overall efficiency and practical deployment of a product, and takes high efficiency as its goal. Complex models naturally perform better, but their high consumption of memory and computing resources makes them difficult to deploy on many hardware platforms, such as embedded chips and micro edge-computing devices.
Urban video-surveillance images have complex backgrounds, the city-management pictures captured on video are noisy, and the captured images are numerous and varied. Moreover, a case picture does not follow a definite track the way a moving target does: the scene of a single case picture is hard to determine, and the behavior of the main subject is hard to identify. Case features, sizes, shapes and positions are not uniformly distributed, and some cases are difficult to distinguish even with the naked eye. One-stage target detection methods in the prior art, such as YOLO and SSD, struggle to guarantee localization and recognition accuracy, while high-precision two-stage target detection frameworks have low recognition efficiency and can hardly achieve real-time detection.
Disclosure of Invention
The invention aims to provide a multi-scale detection method based on the FASTER-RCNN model, which solves the problem of low recognition efficiency in the prior art.
The technical scheme adopted by the invention is a multi-scale detection method based on the FASTER-RCNN model, comprising the following steps:
step 1, inputting an image to be detected into a VGG16 convolutional neural network, and passing the outputs of three feature extraction layers (conv1_2, conv3_3 and conv5_3) of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model;
step 2, performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights;
step 3, pruning the small-scale-factor channels to obtain a small-target Faster-RCNN model;
step 4, converting the floating-point data type of the feature extraction network in the small-target Faster-RCNN model from float32 to int8 to obtain a multi-scale small-target Faster-RCNN model.
The invention is also characterized in that:
marking interest points of an image to be detected before step 1, and carrying out data cleaning work to form a pre-training model sample library; and then, carrying out data enhancement processing on the images in the pre-training model sample library, adding a characteristic image noise filtering mechanism, and unifying the number of the images of each type of sample to obtain the image to be detected.
The specific process of step 1 is as follows:
Each type of image to be detected is input into a VGG16 convolutional neural network; the feature maps output by the three feature extraction layers conv1_2, conv3_3 and conv5_3 of the VGG16 network are taken as the input of the RPN; the RPN generates candidate boxes at three scales to judge the feature maps, corrects the candidate boxes, and outputs candidate-box feature maps at different scales, i.e. the target feature maps. ROI pooling maps the candidate boxes at the three scales onto the target feature maps; by setting a fixed scale, ROI pooling computes the sampling-grid size each time and converts the features inside any valid region of interest into a feature map of fixed size. Target classification and bounding-box regression are then performed on the candidate-box feature maps to complete the localization of targets in the image, forming the preliminary Faster-RCNN model.
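The fixed-size mapping performed by ROI pooling can be illustrated with a minimal NumPy sketch. The toy feature map, box coordinates, and 2x2 output size below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def roi_pool(feat, box, out_size=2):
    """Max-pool the features inside one region of interest down to a
    fixed out_size x out_size grid, regardless of the box's own size.
    Assumes the box spans at least out_size pixels in each dimension."""
    x1, y1, x2, y2 = box
    region = feat[y1:y2, x1:x2]
    h, w = region.shape
    # Split the region into out_size roughly equal bins per axis.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

feat = np.arange(64, dtype=np.float32).reshape(8, 8)  # toy feature map
pooled = roi_pool(feat, box=(0, 0, 4, 6))  # any valid box -> fixed 2x2 output
print(pooled.shape)
```

Whatever the size of the candidate box, the pooled output has the same fixed shape, which is what lets the subsequent classification and regression heads use fully connected layers.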
In step 3, the pruning threshold for the scale factors lies between 0 and 1.
The specific process of step 4 is as follows: for the uniformly distributed activation values of the feature extraction network, the float32 data in the range [-|max|, +|max|] of a tensor are mapped directly to int8 data in the range [-127, +127]; for activation values that are unevenly distributed, a divergence criterion is first used to set a threshold |T| for the activations of each layer, then the float32 data in [-|T|, +|T|] are mapped to int8 data in the range [-127, +127], and the parts beyond the threshold |T| are saturated to the two endpoints, yielding the multi-scale small-target Faster-RCNN model.
The invention has the following beneficial effects: the multi-scale detection method based on the FASTER-RCNN model introduces a multi-scale regression mechanism into the FASTER-RCNN target detection algorithm and, combined with lightweight network pruning, realizes a small-sample, high-precision and high-efficiency target detection model for city-management events. On the basis of the classic two-stage target detection framework, multi-scale detection is combined with model pruning and quantization, the training and detection speeds are increased, and the problems of low recognition accuracy, low analysis efficiency and heavy occupation of video-memory resources are solved.
Drawings
FIG. 1 is a flow chart of the multi-scale detection method based on the FASTER-RCNN model of the present invention;
FIG. 2 is a flow chart of the pruning process in the FASTER-RCNN model-based multi-scale detection method of the present invention;
FIG. 3 is a flow chart of the quantization process in the FASTER-RCNN model-based multi-scale detection method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the multi-scale detection method based on the FASTER-RCNN model comprises the following steps:
step 1, taking face masks under urban video cameras as the detection target, labeling points of interest in the images to be detected (mask worn and mask not worn) and cleaning the data to form a pre-training-model sample library; applying data enhancement to the images in the sample library, adding a feature-image noise filtering mechanism, and unifying the number of images per sample class to avoid the drop in overall model accuracy caused by class imbalance in the training data, thereby obtaining the images to be detected;
step 2, inputting the images to be detected processed in step 1 into a VGG16 convolutional neural network, and passing the outputs of the three feature extraction layers conv1_2, conv3_3 and conv5_3 of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model.
Specifically, each type of image to be detected is input into the VGG16 convolutional neural network; the feature maps output by the three feature extraction layers conv1_2, conv3_3 and conv5_3 are simultaneously used as the input of the RPN; the RPN generates candidate boxes at three scales to judge the feature maps, corrects the candidate boxes, and outputs candidate-box feature maps at different scales, i.e. the target feature maps. ROI pooling maps the candidate boxes at the three scales onto the target feature maps; by setting a fixed scale, ROI pooling computes the sampling-grid size each time and converts the features inside any valid region of interest into a feature map of fixed size. Target classification (head with or without a mask) and bounding-box regression are then performed on the candidate-box feature maps to complete the localization of targets in the image, forming the preliminary Faster-RCNN model.
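Generating candidate boxes at three scales, one set per feature map, can be sketched as follows. The strides and scale values are assumptions for illustration only; the patent does not specify them:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, scales):
    """Generate square candidate boxes (x1, y1, x2, y2), one per scale,
    centered on every cell of a feat_h x feat_w feature map."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    # Centers of the feature-map cells mapped back to image coordinates.
    cx = (xs.ravel() + 0.5) * stride
    cy = (ys.ravel() + 0.5) * stride
    boxes = []
    for s in scales:
        half = s / 2.0
        boxes.append(np.stack([cx - half, cy - half, cx + half, cy + half], axis=1))
    return np.concatenate(boxes, axis=0)

# The three feature maps (conv1_2, conv3_3, conv5_3 of VGG16) sit at
# strides 1, 4 and 16 relative to the input; scale values are assumed.
anchors_fine   = generate_anchors(8, 8, stride=1,  scales=[16])
anchors_mid    = generate_anchors(8, 8, stride=4,  scales=[64])
anchors_coarse = generate_anchors(8, 8, stride=16, scales=[256])
print(anchors_fine.shape, anchors_mid.shape, anchors_coarse.shape)
```

The shallow, fine-stride layer supplies small candidate boxes for small targets, while the deep layer supplies large ones; the RPN then scores and corrects these boxes.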
Step 3, performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights.
Specifically, as shown in fig. 2, each scale factor of the BN layers of the VGG16 network corresponds to a specific convolution channel, and each scale factor is associated with one channel of a convolution layer. During training, the BN scale factors are sparsified with L1 regularization, pushing their values toward zero, and the importance of the convolution channels is quantified and sorted so that the unimportant channels (the small-scale-factor channels with smaller weights) can be identified. A scale factor gamma is introduced for each channel and multiplied by the channel's output, and the network weights are trained jointly with the scale factors while the scale factors are sparsely regularized.
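The L1 sparsification of the BN scale factors can be sketched as a subgradient update. The learning rate, penalty weight, and channel count below are illustrative assumptions:

```python
import numpy as np

def sparsity_step(gamma, task_grad, lr=0.1, lam=0.01):
    """One update of the BN scale factors: the task-loss gradient plus
    the subgradient of the L1 sparsity penalty lam * ||gamma||_1."""
    return gamma - lr * (task_grad + lam * np.sign(gamma))

rng = np.random.default_rng(0)
gamma0 = rng.normal(0.0, 0.5, size=64)  # one scale factor per BN channel
gamma = gamma0.copy()
for _ in range(200):
    # A zero task gradient isolates the effect of the L1 term, which
    # shrinks every scale factor toward zero at a constant rate.
    gamma = sparsity_step(gamma, task_grad=np.zeros_like(gamma))
# Channels whose gamma has collapsed toward zero are the "unimportant"
# candidates for pruning in the next step.
```

In a real run the task gradient keeps the useful channels' factors large, so only the channels the network does not rely on are driven to zero.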
Step 4, setting the pruning threshold for the scale factors between 0 and 1, and pruning the small-scale-factor channels to obtain the small-target Faster-RCNN model. The larger the pruning threshold, the more of the model is cropped; the smaller the resulting model, the faster the recognition speed. In this embodiment the ratio is 0.25. The channels or nodes with small scale factors, automatically identified as unimportant by the sparse training, are pruned. Compared with the base network, the network structure of the pruned model is changed: the channels or nodes with small scale factors and smaller weights are deleted, giving a model with a more compact network structure.
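Pruning the channels whose scale factors fall in the lowest quarter (the ratio of 0.25 used in this embodiment) can be sketched as follows; the tensor shapes are illustrative assumptions:

```python
import numpy as np

def prune_channels(conv_w, gamma, ratio=0.25):
    """Remove the output channels whose BN scale factors fall in the
    lowest `ratio` fraction; conv_w has shape (out_ch, in_ch, kh, kw)."""
    thresh = np.quantile(np.abs(gamma), ratio)
    keep = np.abs(gamma) > thresh
    return conv_w[keep], gamma[keep], keep

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 32, 3, 3))      # one conv layer's weights
gamma = np.abs(rng.normal(size=64))      # its BN scale factors
w_p, g_p, keep = prune_channels(w, gamma, ratio=0.25)
print(w_p.shape)  # the lowest quarter of the 64 channels is removed
```

In a full implementation the input channels of the following layer and the BN statistics would be sliced with the same `keep` mask so the pruned network stays consistent.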
Step 5, converting the floating-point data type of the feature extraction network in the small-target Faster-RCNN model from float32 to int8 to obtain the multi-scale small-target Faster-RCNN model.
Specifically, as shown in fig. 3, for the uniformly distributed activation values in the feature extraction network, the float32 data in the range [-|max|, +|max|] of a tensor are mapped directly to int8 data in the range [-127, +127]; for activation values that are unevenly distributed, a divergence criterion is first used to set a threshold |T| for the activations of each layer, then the float32 data in [-|T|, +|T|] are mapped to int8 data in the range [-127, +127], and the parts beyond the threshold |T| are saturated to the two endpoints, yielding the multi-scale small-target Faster-RCNN model.
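The two int8 mappings can be sketched as follows. The divergence-based selection of |T| (e.g. picking the threshold that minimizes the divergence between the original and quantized activation distributions, as in common int8 calibration schemes) is not shown; the threshold and sample values are illustrative assumptions:

```python
import numpy as np

def quantize_int8(x, threshold=None):
    """Symmetric linear quantization of float32 data to int8.
    threshold=None: map [-|max|, +|max|] onto [-127, +127] (uniform case).
    threshold=T:    map [-T, +T] onto [-127, +127] and saturate values
                    beyond T to the endpoints (non-uniform case)."""
    t = np.abs(x).max() if threshold is None else threshold
    scale = 127.0 / t
    q = np.round(np.clip(x, -t, t) * scale)
    return q.astype(np.int8), scale

acts = np.array([-2.0, -0.5, 0.0, 0.5, 2.0], dtype=np.float32)
q_max, s_max = quantize_int8(acts)              # max-calibrated mapping
q_sat, s_sat = quantize_int8(acts, threshold=1.0)  # saturate beyond |T| = 1.0
print(q_max, q_sat)
```

Saturating at |T| instead of |max| spends the 8-bit range on the dense part of the distribution, so a few outliers do not destroy the resolution of the many small activations.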
In this way, the multi-scale detection method based on the FASTER-RCNN model introduces a multi-scale regression mechanism into the Faster-RCNN target detection algorithm and, combined with lightweight network pruning, realizes a small-sample, high-precision and high-efficiency target detection model for city-management events. On the basis of the classic two-stage target detection framework, multi-scale detection is combined with model pruning and quantization, the training and detection speeds are increased, and the problems of low recognition accuracy, low analysis efficiency and heavy occupation of video-memory resources are solved.

Claims (5)

1. A multi-scale detection method based on the FASTER-RCNN model, characterized by comprising the following steps:
step 1, inputting an image to be detected into a VGG16 convolutional neural network, and passing the outputs of three feature extraction layers (conv1_2, conv3_3 and conv5_3) of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model;
step 2, performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights;
step 3, pruning the small-scale-factor channels to obtain a small-target Faster-RCNN model;
step 4, converting the floating-point data type of the feature extraction network in the small-target Faster-RCNN model from float32 to int8 to obtain a multi-scale small-target Faster-RCNN model.
2. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that before step 1, points of interest in the images to be detected are labeled and the data are cleaned to form a pre-training-model sample library; data enhancement is then applied to the images in the sample library, a feature-image noise filtering mechanism is added, and the number of images per sample class is unified to obtain the images to be detected.
3. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that the specific process of step 1 is as follows:
each type of image to be detected is input into a VGG16 convolutional neural network; the feature maps output by the three feature extraction layers conv1_2, conv3_3 and conv5_3 of the VGG16 network are taken as the input of the RPN; the RPN generates candidate boxes at three scales to judge the feature maps, corrects the candidate boxes, and outputs candidate-box feature maps at different scales, i.e. the target feature maps; ROI pooling maps the candidate boxes at the three scales onto the target feature maps, computes the sampling-grid size each time by setting a fixed scale, and converts the features inside any valid region of interest into a feature map of fixed size; target classification and bounding-box regression are then performed on the candidate-box feature maps to complete the localization of targets in the image, forming the preliminary Faster-RCNN model.
4. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that the pruning threshold for the scale factors in step 3 lies between 0 and 1.
5. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that the specific process of step 4 is: for the uniformly distributed activation values in the feature extraction network of the small-target Faster-RCNN model, the float32 data in the range [-|max|, +|max|] of a tensor are mapped directly to int8 data in the range [-127, +127]; for activation values that are unevenly distributed, a divergence criterion is first used to set a threshold |T| for the activations of each layer, then the float32 data in [-|T|, +|T|] are mapped to int8 data in the range [-127, +127], and the parts beyond the threshold |T| are saturated to the two endpoints, obtaining the multi-scale small-target Faster-RCNN model.
CN202111575232.6A 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model Pending CN114445332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575232.6A CN114445332A (en) 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111575232.6A CN114445332A (en) 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model

Publications (1)

Publication Number Publication Date
CN114445332A true CN114445332A (en) 2022-05-06

Family

ID=81363236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575232.6A Pending CN114445332A (en) 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model

Country Status (1)

Country Link
CN (1) CN114445332A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315722A (en) * 2023-11-24 2023-12-29 广州紫为云科技有限公司 Pedestrian detection method based on knowledge migration pruning model
CN117315722B (en) * 2023-11-24 2024-03-15 广州紫为云科技有限公司 Pedestrian detection method based on knowledge migration pruning model

Similar Documents

Publication Publication Date Title
CN107563372B (en) License plate positioning method based on deep learning SSD frame
US9865042B2 (en) Image semantic segmentation
CN108230339A (en) A kind of gastric cancer pathological section based on pseudo label iteration mark marks complementing method
Nguyen et al. Yolo based real-time human detection for smart video surveillance at the edge
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN111461212A (en) Compression method for point cloud target detection model
CN110443279B (en) Unmanned aerial vehicle image vehicle detection method based on lightweight neural network
CN111524117A (en) Tunnel surface defect detection method based on characteristic pyramid network
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN112308087B (en) Integrated imaging identification method based on dynamic vision sensor
Guan et al. Multi-scale object detection with feature fusion and region objectness network
CN110569764B (en) Mobile phone model identification method based on convolutional neural network
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
CN116977844A (en) Lightweight underwater target real-time detection method
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN109615610B (en) Medical band-aid flaw detection method based on YOLO v2-tiny
CN114445332A (en) Multi-scale detection method based on FASTER-RCNN model
CN114092467A (en) Scratch detection method and system based on lightweight convolutional neural network
CN114998866A (en) Traffic sign identification method based on improved YOLOv4
CN113705404A (en) Face detection method facing embedded hardware
CN112989928A (en) Lightweight image identification method based on wood structure
CN113012167A (en) Combined segmentation method for cell nucleus and cytoplasm
CN110738230A (en) clothes identification and classification method based on F-CDSSD
Li et al. Instance segmentation of traffic scene based on YOLACT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination