CN113052185A - Small sample target detection method based on Faster R-CNN - Google Patents
Small sample target detection method based on Faster R-CNN
- Publication number
- CN113052185A (application CN202110270154.2A)
- Authority
- CN
- China
- Prior art keywords
- network
- image
- attention
- features
- branch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a small sample target detection method based on Faster R-CNN. The Faster R-CNN network is substantially improved and optimized by combining a traditional target detection algorithm with a small sample learning algorithm, making it suitable for small sample target detection. The invention provides an attention-based RPN module, which assigns different weights to different channel features using a channel attention mechanism, then performs depth-wise cross-correlation between the support set features and the query set features to generate an attention feature map, which is sent to the RPN network to generate candidate boxes. Based on metric learning, an improved weighted prototype network replaces the Faster R-CNN classifier head, improving the classification accuracy of candidate regions under small-sample conditions. The invention also introduces a multi-scale FPN module comprising two branches: one branch, like a general detection network, is applied at the RPN layer; the other is applied to the support set images to extract multi-scale feature maps, addressing the scale sparsity of small sample data sets and the scale difference between query images and support set images.
Description
Technical Field
The invention relates to the fields of small sample learning and target detection in deep learning, and in particular to a target detection technique under small-sample conditions.
Background
In recent years, with the development of massively parallel computing devices, deep learning has achieved great success in practical computer vision applications. For example, image recognition has been widely applied in face recognition, autonomous driving, biomedicine, and other fields, where the core task is to detect and recognize targets in a scene with a neural network model. However, image algorithms based on deep neural networks usually require large amounts of labeled data and end-to-end supervised training over many iterations to perform well. Due to limitations and particularities of some practical applications, it is often difficult to obtain large-scale image sample sets — for example, pictures of rare species, rare remote sensing images, precious medical diagnosis images, and special military targets. On the other hand, even when enough sample pictures are available, annotating large-scale sample data requires enormous manpower and material resources. Therefore, under data scarcity, how to learn from a small number of samples and generalize to new tasks has become an active research topic in computer vision and related fields.
With the continuous progress of deep-learning-based target detection, excellent detection frameworks such as Faster R-CNN, YOLO, and SSD have appeared, but small sample target detection remains a difficult problem in the field. The invention aims to detect targets when only a few samples are available. It combines a small sample learning algorithm with a traditional target detection algorithm, Faster R-CNN, to design a method capable of target detection under small-sample conditions.
Disclosure of Invention
In order to solve the target detection problem under small-sample conditions, the invention provides a small sample target detection technique based on Faster R-CNN. The technique builds on the general two-stage target detection algorithm Faster R-CNN in deep learning and, for the case of insufficient samples, further improves Faster R-CNN by combining it with small sample learning techniques.
The technical scheme adopted by the invention is as follows:
Step 1: input an image to be detected as the query set image and a small number of images containing the target as support set images;
Step 2: extract query image features and support set image features ("support features") through a feature extraction network;
Step 3: send the support features and query features simultaneously into an FPN network to generate multi-scale feature maps;
Step 4: generate an attention feature map from the feature maps through a channel attention mechanism and spatial attention, send it into an RPN network to generate candidate boxes, and produce RoI feature maps through RoI Pooling;
Step 5: send the support features and RoI features into the metric branch and the regression branch respectively for classification and localization, detecting the target.
Compared with the prior art, the invention has the beneficial effects that:
(1) compared with traditional target detection algorithms, the method generalizes better;
(2) for small sample targets with insufficient samples, recognition and detection are performed better.
Drawings
FIG. 1: schematic diagram of the FPN.
FIG. 2: diagram of the channel attention mechanism.
FIG. 3: the multi-scale feature map extraction process.
FIG. 4: partial detection results on the PASCAL VOC data set.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
In this embodiment, the method for detecting a small sample target includes the following processing steps:
step 1: image input
Unlike a traditional target detection algorithm, which takes only a single image to be detected as input, small sample target detection takes the image to be detected as the query image and a few images containing the target as support set images. The method therefore comprises a query image branch and a support image branch, and the two branches run concurrently.
Step 2: multi-scale feature map extraction
The invention improves the Faster R-CNN detection algorithm by introducing a multi-scale FPN module that extracts multi-scale feature maps of the query image and the support set images simultaneously, addressing the detection of targets at different scales and the scale difference between the query image and the support set images; the FPN is shown schematically in FIG. 1. The query image branch, like a common detection network, applies the FPN at the RPN layer; the other branch is applied to the support set images to extract multi-scale feature maps, yielding a feature pyramid for each support image and thereby enriching the support set scale space. Further, after the support image feature pyramids are obtained, a multi-scale prototype of each class is generated through the weighted prototype network; the process is shown in FIG. 3.
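As a sketch of the top-down fusion shown schematically in FIG. 1 — the channel counts, the 1 × 1 lateral projection, and the nearest-neighbour upsampling below are illustrative assumptions, with random untrained weights, not the patent's exact configuration:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a C x H x W map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(c_levels, out_ch=4):
    """Minimal FPN top-down pathway: project each backbone level with a
    (random, untrained) 1x1 conv, then add the upsampled coarser level."""
    rng = np.random.default_rng(0)
    laterals = []
    for c in c_levels:                                   # finest -> coarsest
        w = rng.standard_normal((out_ch, c.shape[0]))    # 1x1 conv weights
        laterals.append(np.einsum('oc,chw->ohw', w, c))
    p = [None] * len(laterals)
    p[-1] = laterals[-1]                                 # start from the coarsest map
    for i in range(len(laterals) - 2, -1, -1):
        p[i] = laterals[i] + upsample2x(p[i + 1])
    return p

# Backbone maps at strides 1, 2, 4 (channels double as resolution halves).
c2 = np.ones((8, 16, 16)); c3 = np.ones((16, 8, 8)); c4 = np.ones((32, 4, 4))
p2, p3, p4 = fpn_merge([c2, c3, c4])
print([m.shape for m in (p2, p3, p4)])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

The same `fpn_merge` can be run on each support image to produce the per-image feature pyramids the description mentions.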
And step 3: candidate region extraction
Faster R-CNN generates potential candidate regions with an RPN (Region Proposal Network): softmax first judges whether each anchor box belongs to the foreground or the background, and bounding-box regression then corrects the anchor boxes to obtain more accurate candidate boxes. In small sample target detection the target object to be detected has only a few training samples, and an RPN trained on the base classes is likely to generate many candidate boxes irrelevant to the object when detecting a new class, so the candidate classification network must have good discrimination capability. On the other hand, the RPN should filter out candidate regions that do not belong to the support set categories, reducing the number of candidate boxes to be distinguished and helping to further improve accuracy. The invention therefore proposes an RPN based on a multiple attention mechanism. The idea of the attention mechanism stems from the human visual system, which selectively focuses on certain areas of emphasis while ignoring others. The proposed multi-attention RPN takes the support set and query set samples as input, so that the RPN can generate candidate boxes for small sample targets more effectively. After feature extraction, the query image and the support image are first sent to a channel attention module, which selectively strengthens certain channel features and suppresses less useful ones by learning global information. As shown in the channel attention mechanism of FIG. 2, an input set of feature maps first undergoes global average pooling to compress global information: all pixels of each feature map are averaged, compressing it to a size of 1 × 1.
Then, to learn the nonlinear correlations between channels, the pooled vector passes through two fully connected layers with a ReLU activation and is normalized by a sigmoid function, finally producing an attention weight for each channel. Multiplying these weights with the feature maps gives the channel-attention-weighted feature maps. After channel attention, the support set features and the query set features undergo depth-wise cross-correlation to generate an attention feature map, which is then sent to the RPN network to generate candidate boxes. In the spatial attention module, the query set features pass through a convolution layer which, unlike conventional convolution, uses depth-wise convolution. Meanwhile, the support set features are pooled and depth-wise convolved into a 1 × 1 × C vector, which serves as a kernel for the depth-wise cross-correlation with the query set features, generating an attention feature map that represents the correlation between the query set and the support set.
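A minimal NumPy sketch of the two operations just described — squeeze-and-excitation-style channel attention followed by depth-wise cross-correlation of a pooled 1 × 1 × C support kernel with the query features. The weights, shapes, and the mean-pooling used to form the kernel are stand-in assumptions; a real implementation would learn the FC and convolution layers in a deep-learning framework:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel attention: GAP -> FC -> ReLU -> FC -> sigmoid -> rescale channels."""
    squeeze = feat.mean(axis=(1, 2))                         # global average pool to C
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))     # per-channel weight in (0,1)
    return feat * excite[:, None, None]

def depthwise_xcorr(query, kernel):
    """Depth-wise cross-correlation with a 1x1xC kernel: each channel of the
    query map is scaled by its own kernel value, giving a same-shape map."""
    return query * kernel[:, None, None]

rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
q = rng.standard_normal((C, H, W))               # query features
s = rng.standard_normal((C, H, W))               # support features
w1 = rng.standard_normal((C // 2, C))            # FC layer 1 (channel reduction)
w2 = rng.standard_normal((C, C // 2))            # FC layer 2 (channel restore)

q_att = channel_attention(q, w1, w2)
kernel = s.mean(axis=(1, 2))                     # pooled support -> 1x1xC vector
att_map = depthwise_xcorr(q_att, kernel)
print(att_map.shape)  # (8, 6, 6): the attention map fed to the RPN
```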
And step 3: candidate region classification and regression
After candidate boxes are extracted by the RPN, Faster R-CNN combines them with the feature map into RoI feature maps, which undergo final classification and localization to screen out the final targets. The original Faster R-CNN network directly outputs the target class with the traditional softmax function, but in the small-sample case this classification method lacks the generalization capability to detect targets of new classes. Based on metric learning, the invention proposes an improved weighted prototype network to replace the Faster R-CNN classifier head. By adopting metric learning combined with a meta-learning training strategy, a model with generalization capability can be trained that determines new target classes from only a few samples, realizing target detection under small-sample conditions.
The prototype network learns an embedding network to extract embedded features of images, and uses the mean vector of the support set features of each category as the prototype of that class, as shown in formula 1. The query image feature is then classified by its distance to each prototype under a non-parametric metric such as the Euclidean distance.
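Formula 1 is not reproduced in this text. In the standard prototypical-network formulation, which the description appears to follow, the class prototype and the distance-based classification would take the following form (the symbols f_phi for the embedding network, S_i for the support set of class i, and d for the Euclidean distance are assumptions here, not taken from the patent):

```latex
% Presumed form of formula 1: the class prototype is the mean embedded support feature
c_i = \frac{1}{|S_i|} \sum_{x_{ij} \in S_i} f_\phi(x_{ij})
% Classification by softmax over negative distances to each prototype:
p(y = i \mid x_q) =
  \frac{\exp\bigl(-d(f_\phi(x_q),\, c_i)\bigr)}
       {\sum_{k} \exp\bigl(-d(f_\phi(x_q),\, c_k)\bigr)}
```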
However, this approach has a problem: when the support set samples are distributed very differently or contain bad samples, the computed mean vector does not represent the class well. Taking the mean makes every sample feature contribute equally to the representative vector, whereas different sample features should contribute to different degrees.
The invention instead computes the class prototype in a weighted manner. Specifically, a one-dimensional Gaussian kernel function first computes a weighting coefficient for each support set sample feature, as shown in formula 2.
where x_ij denotes the j-th support sample of the i-th class, x_q denotes the query sample, and σ_i denotes the width of the Gaussian function, taken as 0.1.
After obtaining the weighting coefficients for each support set feature, the present invention calculates the prototype of the class in a weighting manner, and the specific calculation is shown in formula 3.
A query sample is expected to be close to the weighted prototype of its own class and far from the weighted prototypes of other classes, from which a loss function is obtained, as shown in formula 4.
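Since formulas 2–4 are likewise not reproduced in this text, the following NumPy sketch shows one plausible reading of them: a Gaussian-kernel weight per support feature, a weighted prototype, and a cross-entropy loss over softmax of negative squared distances. The exact functional forms are assumptions; the demo uses sigma = 1.0 to avoid numerical underflow that sigma = 0.1 would cause with these random features:

```python
import numpy as np

def gaussian_weights(support_feats, query_feat, sigma=0.1):
    """Formula 2 (as read here): weight each support feature by a Gaussian
    kernel of its squared distance to the query feature."""
    d2 = ((support_feats - query_feat) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def weighted_prototype(support_feats, weights):
    """Formula 3 (as read here): normalized weighted mean of support features."""
    w = weights / weights.sum()
    return (w[:, None] * support_feats).sum(axis=0)

def proto_loss(query_feat, prototypes, true_class):
    """Formula 4 (as read here): cross-entropy over softmax of negative
    squared Euclidean distances to each class prototype."""
    d2 = ((prototypes - query_feat) ** 2).sum(axis=1)
    logp = -d2 - np.log(np.exp(-d2).sum())
    return -logp[true_class]

rng = np.random.default_rng(0)
support = rng.standard_normal((5, 4))      # 5 support features of dimension 4
query = support.mean(axis=0)               # a query near the class centre
w = gaussian_weights(support, query, sigma=1.0)
proto = weighted_prototype(support, w)
protos = np.stack([proto, proto + 5.0])    # this class vs. a far-away class
loss = proto_loss(query, protos, 0)
print(float(loss) < 1e-6)  # True: the query sits essentially at its own prototype
```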
FIG. 4 shows detection results of the method on the PASCAL VOC test set; the method achieves good detection results and can also detect some small targets.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except combinations where mutually exclusive features or/and steps are present.
Claims (4)
1. A small sample target detection method based on Faster R-CNN, characterized by comprising the following steps:
Step 1: input an image to be detected as the query set image and a small number of images containing the target as support set images;
Step 2: extract query image features and support set image features ("support features") through a feature extraction network;
Step 3: send the support features and query features simultaneously into an FPN network to generate multi-scale feature maps;
Step 4: generate an attention feature map from the feature maps through a channel attention mechanism and spatial attention, send it into an RPN network to generate candidate boxes, and produce RoI feature maps through RoI Pooling;
Step 5: send the support features and RoI features into the metric branch and the regression branch respectively for classification and localization, detecting the target.
2. The method of claim 1, wherein in step 3 the FPN simultaneously handles a query image branch and a support set image branch: in the query image branch, feature maps of different scales output by FPN fusion are input into the RPN network to generate candidate regions; in the support set image branch, the support set image features are input into the FPN network to obtain a feature pyramid for each support image.
3. The method of claim 1, wherein in step 4 the channel attention mechanism and the spatial attention are applied in series, i.e., after channel attention, the support set features and the query set features are depth-wise cross-correlated to generate the attention feature map.
4. The method of claim 1, wherein step 5 uses an improved weighted prototype network as the metric branch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110270154.2A CN113052185A (en) | 2021-03-12 | 2021-03-12 | Small sample target detection method based on fast R-CNN |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113052185A true CN113052185A (en) | 2021-06-29 |
Family
ID=76513149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110270154.2A Pending CN113052185A (en) | 2021-03-12 | 2021-03-12 | Small sample target detection method based on fast R-CNN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052185A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555475A (en) * | 2019-08-29 | 2019-12-10 | 华南理工大学 | few-sample target detection method based on semantic information fusion |
US20200334490A1 (en) * | 2019-04-16 | 2020-10-22 | Fujitsu Limited | Image processing apparatus, training method and training apparatus for the same |
CN112434721A (en) * | 2020-10-23 | 2021-03-02 | 特斯联科技集团有限公司 | Image classification method, system, storage medium and terminal based on small sample learning |
CN112465752A (en) * | 2020-11-16 | 2021-03-09 | 电子科技大学 | Improved Faster R-CNN-based small target detection method |
Non-Patent Citations (5)
Title |
---|
QI FAN et al.: "Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector", arXiv:1908.01998 *
S. CHEN et al.: "R2FA-Det: Delving into high-quality rotatable boxes for ship detection in SAR image", Remote Sensing *
XUAN NIE et al.: "Attention Mask R-CNN for Ship Detection and Segmentation From Remote Sensing Images", IEEE Access *
ZHANG Tingting et al.: "A survey of deep-learning-based image object detection algorithms", Telecommunications Science *
LI Hongyan et al.: "Object detection in remote sensing images via convolutional neural networks improved by an attention mechanism", Journal of Image and Graphics *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191359A (en) * | 2021-06-30 | 2021-07-30 | 之江实验室 | Small sample target detection method and system based on support and query samples |
CN113705570A (en) * | 2021-08-31 | 2021-11-26 | 长沙理工大学 | Few-sample target detection method based on deep learning |
CN113705570B (en) * | 2021-08-31 | 2023-12-08 | 长沙理工大学 | Deep learning-based few-sample target detection method |
CN113723558A (en) * | 2021-09-08 | 2021-11-30 | 北京航空航天大学 | Remote sensing image small sample ship detection method based on attention mechanism |
CN114612702A (en) * | 2022-01-24 | 2022-06-10 | 珠高智能科技(深圳)有限公司 | Image data annotation system and method based on deep learning |
CN114494728A (en) * | 2022-02-10 | 2022-05-13 | 北京工业大学 | Small target detection method based on deep learning |
CN114494728B (en) * | 2022-02-10 | 2024-06-07 | 北京工业大学 | Small target detection method based on deep learning |
CN114663707A (en) * | 2022-03-28 | 2022-06-24 | 中国科学院光电技术研究所 | Improved few-sample target detection method based on fast RCNN |
CN114818963A (en) * | 2022-05-10 | 2022-07-29 | 电子科技大学 | Small sample detection algorithm based on cross-image feature fusion |
CN114818963B (en) * | 2022-05-10 | 2023-05-09 | 电子科技大学 | Small sample detection method based on cross-image feature fusion |
CN115100432A (en) * | 2022-08-23 | 2022-09-23 | 浙江大华技术股份有限公司 | Small sample target detection method and device and computer readable storage medium |
CN115100432B (en) * | 2022-08-23 | 2022-11-18 | 浙江大华技术股份有限公司 | Small sample target detection method and device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113052185A (en) | Small sample target detection method based on Faster R-CNN | |
CN111783576B (en) | Pedestrian re-identification method based on improved YOLOv3 network and feature fusion | |
CN113065558A (en) | Lightweight small target detection method combined with attention mechanism | |
CN111767882A (en) | Multi-mode pedestrian detection method based on improved YOLO model | |
CN110782420A (en) | Small target feature representation enhancement method based on deep learning | |
CN111461083A (en) | Rapid vehicle detection method based on deep learning | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN111709311A (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN108520203B (en) | Multi-target feature extraction method based on fusion of self-adaptive multi-peripheral frame and cross pooling feature | |
CN112949572A (en) | Slim-YOLOv 3-based mask wearing condition detection method | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN111738344A (en) | Rapid target detection method based on multi-scale fusion | |
CN114463677B (en) | Safety helmet wearing detection method based on global attention | |
CN109635726B (en) | Landslide identification method based on combination of symmetric deep network and multi-scale pooling | |
CN111339975A (en) | Target detection, identification and tracking method based on central scale prediction and twin neural network | |
CN108734200B (en) | Human target visual detection method and device based on BING (building information network) features | |
CN116342894B (en) | GIS infrared feature recognition system and method based on improved YOLOv5 | |
CN114663707A (en) | Improved few-sample target detection method based on fast RCNN | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN114429577B (en) | Flag detection method, system and equipment based on high confidence labeling strategy | |
CN111553337A (en) | Hyperspectral multi-target detection method based on improved anchor frame | |
CN116912670A (en) | Deep sea fish identification method based on improved YOLO model | |
CN107679528A (en) | A kind of pedestrian detection method based on AdaBoost SVM Ensemble Learning Algorithms | |
CN114565918A (en) | Face silence living body detection method and system based on multi-feature extraction module | |
CN111046861B (en) | Method for identifying infrared image, method for constructing identification model and application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20210629 |