CN114445332A - Multi-scale detection method based on FASTER-RCNN model - Google Patents

Multi-scale detection method based on FASTER-RCNN model

Info

Publication number
CN114445332A
Authority
CN
China
Prior art keywords
faster
rcnn model
scale
model
rcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111575232.6A
Other languages
Chinese (zh)
Inventor
关新锋
刘凯
吴波
胡荣
王兆俊
常泽民
王嘉楠
徐小玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Aerospace Pohu Cloud Technology Co ltd
Original Assignee
Jiangxi Aerospace Pohu Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Aerospace Pohu Cloud Technology Co ltd filed Critical Jiangxi Aerospace Pohu Cloud Technology Co ltd
Priority to CN202111575232.6A priority Critical patent/CN114445332A/en
Publication of CN114445332A publication Critical patent/CN114445332A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale detection method based on the FASTER-RCNN model, comprising the following steps: inputting an image to be detected into a VGG16 convolutional neural network, and passing the outputs of three feature extraction layers (conv1_2, conv3_3 and conv5_3) of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model; performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights; pruning the small-scale-factor channels to obtain a small-target Faster-RCNN model; and quantizing the features in the small-target Faster-RCNN model to obtain a multi-scale small-target Faster-RCNN model. The method addresses the problems of low recognition accuracy, low analysis efficiency and heavy occupation of video-memory resources.

Description

Multi-scale detection method based on FASTER-RCNN model
Technical Field
The invention belongs to the technical field of image detection, and relates to a multi-scale detection method based on a FASTER-RCNN model.
Background
The era of big data has arrived. With the development of society, the advance of complex high and new technologies, and strong state support for the construction of smart cities, particularly the growing application and scale of artificial intelligence, video surveillance has gradually become a focus of attention. Adding an efficient video-surveillance application mechanism to smart-city construction can well guarantee the security monitoring of city operations. Target detection in video images has become an indispensable surveillance technology, and the demand for detecting small targets in particular is increasing day by day. Currently, the deep learning field can be roughly divided into two camps. One is the academic camp, which studies powerful and complex network models and experimental methods; it is not constrained by the equipment environment, pursues only higher performance, and suits high-precision detection projects such as military applications. The other is the engineering camp, which aims to make algorithms more stable, pays more attention to the overall efficiency and practical deployment of a product, and takes high efficiency as its goal. Complex models naturally perform better, but their high consumption of memory and computing resources makes them difficult to deploy on many hardware platforms, such as embedded chips and micro edge-computing devices.
Urban video-surveillance images have complex backgrounds, the city-management pictures captured on video are noisy, and the captured images are numerous and varied. Moreover, a case picture does not follow a definite track the way a moving target does: the scene of a single case picture is hard to determine, and the behavior of the main subject is hard to identify. Case features, sizes, shapes and positions are not uniformly distributed, and some cases are difficult to distinguish even with the naked eye. One-stage target detection methods in the prior art, such as YOLO and SSD, struggle to guarantee localization and recognition accuracy, while high-precision two-stage target detection frameworks have low recognition efficiency and can hardly achieve real-time detection.
Disclosure of Invention
The invention aims to provide a multi-scale detection method based on the FASTER-RCNN model, which solves the problem of low recognition efficiency in the prior art.
The technical scheme adopted by the invention is a multi-scale detection method based on the FASTER-RCNN model, comprising the following steps:
step 1, inputting an image to be detected into a VGG16 convolutional neural network, and passing the outputs of three feature extraction layers (conv1_2, conv3_3 and conv5_3) of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model;
step 2, performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights;
step 3, pruning the small-scale-factor channels to obtain a small-target Faster-RCNN model;
step 4, converting the floating-point data type of the feature extraction network in the small-target Faster-RCNN model from float32 to int8 to obtain a multi-scale small-target Faster-RCNN model.
The invention is also characterized in that:
marking interest points of an image to be detected before step 1, and carrying out data cleaning work to form a pre-training model sample library; and then, carrying out data enhancement processing on the images in the pre-training model sample library, adding a characteristic image noise filtering mechanism, and unifying the number of the images of each type of sample to obtain the image to be detected.
The specific process of step 1 is as follows:
Each type of image to be detected is input into a VGG16 convolutional neural network; the feature maps output by the three feature extraction layers conv1_2, conv3_3 and conv5_3 of the VGG16 network are taken as the input of the RPN; the RPN generates candidate boxes at three scales to judge the feature maps, corrects the candidate boxes, and outputs candidate-box feature maps at different scales, i.e. the target feature maps. ROI pooling maps the candidate boxes at the three scales onto the target feature maps; by setting a fixed scale, ROI pooling computes the sampling-grid size each time and converts the features inside any valid region of interest into a feature map of fixed size. Target classification and bounding-box regression are then performed on the candidate-box feature maps to complete the localization of targets in the image, forming the preliminary Faster-RCNN model.
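The fixed-size mapping performed by ROI pooling can be illustrated with a minimal NumPy sketch. The toy feature map, box coordinates, and 2x2 output size below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def roi_pool(feat, box, out_size=2):
    """Max-pool the features inside one region of interest down to a
    fixed out_size x out_size grid, regardless of the box's own size.
    Assumes the box spans at least out_size pixels in each dimension."""
    x1, y1, x2, y2 = box
    region = feat[y1:y2, x1:x2]
    h, w = region.shape
    # Split the region into out_size roughly equal bins per axis.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

feat = np.arange(64, dtype=np.float32).reshape(8, 8)  # toy feature map
pooled = roi_pool(feat, box=(0, 0, 4, 6))  # any valid box -> fixed 2x2 output
print(pooled.shape)
```

Whatever the size of the candidate box, the pooled output has the same fixed shape, which is what lets the subsequent classification and regression heads use fully connected layers.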
In step 3, the pruning threshold for the scale factors lies between 0 and 1.
The specific process of step 4 is as follows: for the uniformly distributed activation values of the feature extraction network, the float32 data in the range [-|max|, +|max|] of a tensor are mapped directly to int8 data in the range [-127, +127]; for activation values that are unevenly distributed, a divergence criterion is first used to set a threshold |T| for the activations of each layer, then the float32 data in [-|T|, +|T|] are mapped to int8 data in the range [-127, +127], and the parts beyond the threshold |T| are saturated to the two endpoints, yielding the multi-scale small-target Faster-RCNN model.
The invention has the following beneficial effects: the multi-scale detection method based on the FASTER-RCNN model introduces a multi-scale regression mechanism into the FASTER-RCNN target detection algorithm and, combined with lightweight network pruning, realizes a small-sample, high-precision and high-efficiency target detection model for city-management events. On the basis of the classic two-stage target detection framework, multi-scale detection is combined with model pruning and quantization, the training and detection speeds are increased, and the problems of low recognition accuracy, low analysis efficiency and heavy occupation of video-memory resources are solved.
Drawings
FIG. 1 is a flow chart of the multi-scale detection method based on the FASTER-RCNN model of the present invention;
FIG. 2 is a flow chart of the pruning process in the FASTER-RCNN model-based multi-scale detection method of the present invention;
FIG. 3 is a flow chart of the quantization process in the FASTER-RCNN model-based multi-scale detection method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the multi-scale detection method based on the FASTER-RCNN model comprises the following steps:
step 1, taking face masks under urban video cameras as the detection target, labeling points of interest in the images to be detected (mask worn and mask not worn) and cleaning the data to form a pre-training-model sample library; applying data enhancement to the images in the sample library, adding a feature-image noise filtering mechanism, and unifying the number of images per sample class to avoid the drop in overall model accuracy caused by class imbalance in the training data, thereby obtaining the images to be detected;
step 2, inputting the images to be detected processed in step 1 into a VGG16 convolutional neural network, and passing the outputs of the three feature extraction layers conv1_2, conv3_3 and conv5_3 of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model.
Specifically, each type of image to be detected is input into the VGG16 convolutional neural network; the feature maps output by the three feature extraction layers conv1_2, conv3_3 and conv5_3 are simultaneously used as the input of the RPN; the RPN generates candidate boxes at three scales to judge the feature maps, corrects the candidate boxes, and outputs candidate-box feature maps at different scales, i.e. the target feature maps. ROI pooling maps the candidate boxes at the three scales onto the target feature maps; by setting a fixed scale, ROI pooling computes the sampling-grid size each time and converts the features inside any valid region of interest into a feature map of fixed size. Target classification (head with or without a mask) and bounding-box regression are then performed on the candidate-box feature maps to complete the localization of targets in the image, forming the preliminary Faster-RCNN model.
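Generating candidate boxes at three scales, one set per feature map, can be sketched as follows. The strides and scale values are assumptions for illustration only; the patent does not specify them:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, scales):
    """Generate square candidate boxes (x1, y1, x2, y2), one per scale,
    centered on every cell of a feat_h x feat_w feature map."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    # Centers of the feature-map cells mapped back to image coordinates.
    cx = (xs.ravel() + 0.5) * stride
    cy = (ys.ravel() + 0.5) * stride
    boxes = []
    for s in scales:
        half = s / 2.0
        boxes.append(np.stack([cx - half, cy - half, cx + half, cy + half], axis=1))
    return np.concatenate(boxes, axis=0)

# The three feature maps (conv1_2, conv3_3, conv5_3 of VGG16) sit at
# strides 1, 4 and 16 relative to the input; scale values are assumed.
anchors_fine   = generate_anchors(8, 8, stride=1,  scales=[16])
anchors_mid    = generate_anchors(8, 8, stride=4,  scales=[64])
anchors_coarse = generate_anchors(8, 8, stride=16, scales=[256])
print(anchors_fine.shape, anchors_mid.shape, anchors_coarse.shape)
```

The shallow, fine-stride layer supplies small candidate boxes for small targets, while the deep layer supplies large ones; the RPN then scores and corrects these boxes.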
Step 3, performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights.
Specifically, as shown in fig. 2, each scale factor of the BN layers of the VGG16 network corresponds to a specific convolution channel, and each scale factor is associated with one channel of a convolution layer. During training, the BN scale factors are sparsified with L1 regularization, pushing their values toward zero, and the importance of the convolution channels is quantified and sorted so that the unimportant channels (the small-scale-factor channels with smaller weights) can be identified. A scale factor gamma is introduced for each channel and multiplied by the channel's output, and the network weights are trained jointly with the scale factors while the scale factors are sparsely regularized.
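The L1 sparsification of the BN scale factors can be sketched as a subgradient update. The learning rate, penalty weight, and channel count below are illustrative assumptions:

```python
import numpy as np

def sparsity_step(gamma, task_grad, lr=0.1, lam=0.01):
    """One update of the BN scale factors: the task-loss gradient plus
    the subgradient of the L1 sparsity penalty lam * ||gamma||_1."""
    return gamma - lr * (task_grad + lam * np.sign(gamma))

rng = np.random.default_rng(0)
gamma0 = rng.normal(0.0, 0.5, size=64)  # one scale factor per BN channel
gamma = gamma0.copy()
for _ in range(200):
    # A zero task gradient isolates the effect of the L1 term, which
    # shrinks every scale factor toward zero at a constant rate.
    gamma = sparsity_step(gamma, task_grad=np.zeros_like(gamma))
# Channels whose gamma has collapsed toward zero are the "unimportant"
# candidates for pruning in the next step.
```

In a real run the task gradient keeps the useful channels' factors large, so only the channels the network does not rely on are driven to zero.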
Step 4, setting the pruning threshold for the scale factors between 0 and 1, and pruning the small-scale-factor channels to obtain the small-target Faster-RCNN model. The larger the pruning threshold, the more of the model is cropped; the smaller the resulting model, the faster the recognition speed. In this embodiment the ratio is 0.25. The channels or nodes with small scale factors, automatically identified as unimportant by the sparse training, are pruned. Compared with the base network, the network structure of the pruned model is changed: the channels or nodes with small scale factors and smaller weights are deleted, giving a model with a more compact network structure.
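Pruning the channels whose scale factors fall in the lowest quarter (the ratio of 0.25 used in this embodiment) can be sketched as follows; the tensor shapes are illustrative assumptions:

```python
import numpy as np

def prune_channels(conv_w, gamma, ratio=0.25):
    """Remove the output channels whose BN scale factors fall in the
    lowest `ratio` fraction; conv_w has shape (out_ch, in_ch, kh, kw)."""
    thresh = np.quantile(np.abs(gamma), ratio)
    keep = np.abs(gamma) > thresh
    return conv_w[keep], gamma[keep], keep

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 32, 3, 3))      # one conv layer's weights
gamma = np.abs(rng.normal(size=64))      # its BN scale factors
w_p, g_p, keep = prune_channels(w, gamma, ratio=0.25)
print(w_p.shape)  # the lowest quarter of the 64 channels is removed
```

In a full implementation the input channels of the following layer and the BN statistics would be sliced with the same `keep` mask so the pruned network stays consistent.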
Step 5, converting the floating-point data type of the feature extraction network in the small-target Faster-RCNN model from float32 to int8 to obtain the multi-scale small-target Faster-RCNN model.
Specifically, as shown in fig. 3, for the uniformly distributed activation values in the feature extraction network, the float32 data in the range [-|max|, +|max|] of a tensor are mapped directly to int8 data in the range [-127, +127]; for activation values that are unevenly distributed, a divergence criterion is first used to set a threshold |T| for the activations of each layer, then the float32 data in [-|T|, +|T|] are mapped to int8 data in the range [-127, +127], and the parts beyond the threshold |T| are saturated to the two endpoints, yielding the multi-scale small-target Faster-RCNN model.
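The two int8 mappings can be sketched as follows. The divergence-based selection of |T| (e.g. picking the threshold that minimizes the divergence between the original and quantized activation distributions, as in common int8 calibration schemes) is not shown; the threshold and sample values are illustrative assumptions:

```python
import numpy as np

def quantize_int8(x, threshold=None):
    """Symmetric linear quantization of float32 data to int8.
    threshold=None: map [-|max|, +|max|] onto [-127, +127] (uniform case).
    threshold=T:    map [-T, +T] onto [-127, +127] and saturate values
                    beyond T to the endpoints (non-uniform case)."""
    t = np.abs(x).max() if threshold is None else threshold
    scale = 127.0 / t
    q = np.round(np.clip(x, -t, t) * scale)
    return q.astype(np.int8), scale

acts = np.array([-2.0, -0.5, 0.0, 0.5, 2.0], dtype=np.float32)
q_max, s_max = quantize_int8(acts)              # max-calibrated mapping
q_sat, s_sat = quantize_int8(acts, threshold=1.0)  # saturate beyond |T| = 1.0
print(q_max, q_sat)
```

Saturating at |T| instead of |max| spends the 8-bit range on the dense part of the distribution, so a few outliers do not destroy the resolution of the many small activations.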
In this way, the multi-scale detection method based on the FASTER-RCNN model introduces a multi-scale regression mechanism into the Faster-RCNN target detection algorithm and, combined with lightweight network pruning, realizes a small-sample, high-precision and high-efficiency target detection model for city-management events. On the basis of the classic two-stage target detection framework, multi-scale detection is combined with model pruning and quantization, the training and detection speeds are increased, and the problems of low recognition accuracy, low analysis efficiency and heavy occupation of video-memory resources are solved.

Claims (5)

1. A multi-scale detection method based on the FASTER-RCNN model, characterized by comprising the following steps:
step 1, inputting an image to be detected into a VGG16 convolutional neural network, and passing the outputs of three feature extraction layers (conv1_2, conv3_3 and conv5_3) of the VGG16 network through an RPN and an ROI pooling network to obtain a preliminary Faster-RCNN model;
step 2, performing sparse training on the preliminary Faster-RCNN model to find the small-scale-factor channels with smaller weights;
step 3, pruning the small-scale-factor channels to obtain a small-target Faster-RCNN model;
step 4, converting the floating-point data type of the feature extraction network in the small-target Faster-RCNN model from float32 to int8 to obtain a multi-scale small-target Faster-RCNN model.
2. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that before step 1, points of interest in the images to be detected are labeled and the data are cleaned to form a pre-training-model sample library; data enhancement is then applied to the images in the sample library, a feature-image noise filtering mechanism is added, and the number of images per sample class is unified to obtain the images to be detected.
3. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that the specific process of step 1 is as follows:
each type of image to be detected is input into a VGG16 convolutional neural network; the feature maps output by the three feature extraction layers conv1_2, conv3_3 and conv5_3 of the VGG16 network are taken as the input of the RPN; the RPN generates candidate boxes at three scales to judge the feature maps, corrects the candidate boxes, and outputs candidate-box feature maps at different scales, i.e. the target feature maps; ROI pooling maps the candidate boxes at the three scales onto the target feature maps, computes the sampling-grid size each time by setting a fixed scale, and converts the features inside any valid region of interest into a feature map of fixed size; target classification and bounding-box regression are then performed on the candidate-box feature maps to complete the localization of targets in the image, forming the preliminary Faster-RCNN model.
4. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that the pruning threshold for the scale factors in step 3 lies between 0 and 1.
5. The FASTER-RCNN model-based multi-scale detection method according to claim 1, characterized in that the specific process of step 4 is: for the uniformly distributed activation values in the feature extraction network of the small-target Faster-RCNN model, the float32 data in the range [-|max|, +|max|] of a tensor are mapped directly to int8 data in the range [-127, +127]; for activation values that are unevenly distributed, a divergence criterion is first used to set a threshold |T| for the activations of each layer, then the float32 data in [-|T|, +|T|] are mapped to int8 data in the range [-127, +127], and the parts beyond the threshold |T| are saturated to the two endpoints, obtaining the multi-scale small-target Faster-RCNN model.
CN202111575232.6A 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model Pending CN114445332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575232.6A CN114445332A (en) 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111575232.6A CN114445332A (en) 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model

Publications (1)

Publication Number Publication Date
CN114445332A true CN114445332A (en) 2022-05-06

Family

ID=81363236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575232.6A Pending CN114445332A (en) 2021-12-21 2021-12-21 Multi-scale detection method based on FASTER-RCNN model

Country Status (1)

Country Link
CN (1) CN114445332A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315722A (en) * 2023-11-24 2023-12-29 广州紫为云科技有限公司 Pedestrian detection method based on knowledge migration pruning model
CN117315722B (en) * 2023-11-24 2024-03-15 广州紫为云科技有限公司 Pedestrian detection method based on knowledge migration pruning model

Similar Documents

Publication Publication Date Title
CN107563372B (en) License plate positioning method based on deep learning SSD frame
US9865042B2 (en) Image semantic segmentation
CN108230339A (en) A kind of gastric cancer pathological section based on pseudo label iteration mark marks complementing method
Nguyen et al. Yolo based real-time human detection for smart video surveillance at the edge
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN111461212A (en) Compression method for point cloud target detection model
CN110443279B (en) Unmanned aerial vehicle image vehicle detection method based on lightweight neural network
CN111524117A (en) Tunnel surface defect detection method based on characteristic pyramid network
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN112308087B (en) Integrated imaging identification method based on dynamic vision sensor
Guan et al. Multi-scale object detection with feature fusion and region objectness network
CN110569764B (en) Mobile phone model identification method based on convolutional neural network
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
CN116977844A (en) Lightweight underwater target real-time detection method
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN109615610B (en) Medical band-aid flaw detection method based on YOLO v2-tiny
CN114445332A (en) Multi-scale detection method based on FASTER-RCNN model
CN114092467A (en) Scratch detection method and system based on lightweight convolutional neural network
CN114998866A (en) Traffic sign identification method based on improved YOLOv4
CN113705404A (en) Face detection method facing embedded hardware
CN112989928A (en) Lightweight image identification method based on wood structure
CN113012167A (en) Combined segmentation method for cell nucleus and cytoplasm
CN110738230A (en) clothes identification and classification method based on F-CDSSD
Li et al. Instance segmentation of traffic scene based on YOLACT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination