CN113283513B - Small sample target detection method and system based on target interchange and metric learning - Google Patents


Info

Publication number
CN113283513B
CN113283513B (application CN202110603033A)
Authority
CN
China
Prior art keywords
image
target
query
target image
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110603033.5A
Other languages
Chinese (zh)
Other versions
CN113283513A (en)
Inventor
刘芳
焦李成
刘静
刘旭
李鹏芳
李玲玲
郭雨薇
古晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110603033.5A priority Critical patent/CN113283513B/en
Publication of CN113283513A publication Critical patent/CN113283513A/en
Application granted granted Critical
Publication of CN113283513B publication Critical patent/CN113283513B/en

Classifications

    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22 — Pattern recognition: matching criteria, e.g. proximity measures
    • G06F18/24 — Pattern recognition: classification techniques
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/08 — Neural networks: learning methods
    • G06V10/25 — Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V2201/07 — Indexing scheme for image/video recognition: target detection


Abstract

The invention discloses a small sample target detection method and system based on target interchange and metric learning. The data enhancement strategy increases the contrast among targets in the target image, provides more comparable samples for the similarity measurement module, reduces the influence of other classes of targets and of the background in the target image on the target to be detected, and improves the detection precision of the metric-learning-based small sample target detection model on both the base classes and the new classes.

Description

Small sample target detection method and system based on target interchange and metric learning
Technical Field
The invention belongs to the technical field of image detection, and particularly relates to a small sample target detection method and system based on target interchange and metric learning.
Background
Deep learning models have achieved great success in target detection, primarily because deep neural networks can learn higher-level, deeper features from data. However, deep learning models depend heavily on large amounts of labeled data, while manual annotation is time-consuming, labor-intensive, and expensive, and some application fields lack sufficient data accumulation. Deep learning therefore works well in data-intensive applications but is hindered when labeled samples are scarce or data sets are small.
The input to a metric-learning-based small sample target detection model is a query-target image pair, and the output is the region in the target image that is similar to the query image. When the model is trained on the base classes, the labels of the query image and the target image are known; the purpose of training is to learn a class-agnostic similarity metric between the query image and the target image, and the learned metric is then used directly on the new class data set in the testing stage. Analysis of the characteristics of the target and query images and of the results of metric-learning-based small sample detection models shows that, because the target image has a rich background and contains foreground objects of multiple classes, such algorithms easily produce false detections for classes that are related to one another.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, against the deficiencies of the prior art, a small sample target detection method and system based on target interchange and metric learning, in which the existing label information of the target image and the query image in the base classes is used to interchange same-class targets between the query image and the target image, forming a new query-target image pair as input. This data enhancement increases the contrast among foreground objects in the target image, provides more comparable samples for the similarity measurement module, further reduces the influence of other classes of targets and of the background in the target image on the target to be detected, and improves the precision of the metric-learning-based target detection model.
The invention adopts the following technical scheme:
a small sample target detection method based on target interchange and metric learning comprises the following steps:
s1, dividing a category set C, obtaining a base category and a new category according to category division, and dividing an image data set into a base image data set and a new image data set according to the base category and the new category;
S2, respectively constructing paired base class and new class data sets in the form of query image-target image pairs from the base class image data set and the new class image data set obtained in step S1;
s3, constructing a small sample target detection model based on metric learning by using the Faster R-CNN as a main frame of the detection model;
S4, constructing a data enhancement module based on target interchange: randomly select a query image-target image pair and its corresponding label information from the paired base class data set constructed in step S2 as the input of the data enhancement module; according to the label information, crop the region of an instance object in the target image that belongs to the same class as the query image to serve as a new query image, and embed the original query image into the region of that instance object, forming a new target image from the original target image and thus constructing a new query-target image pair;
S5, using the paired base class data set constructed in step S2 as the training data set: for each batch of query-target image pairs used as input, generate a random number rand; if rand is greater than 0.5, generate a new query-target image pair with the data enhancement module of step S4, otherwise retain the original input image pair; input the image pairs and their corresponding label information in batches, each of size K, into the small sample target detection model constructed in step S3 for training;
and S6, randomly selecting a query-target image pair from the paired new class data set or base class data set constructed in the step S2, inputting the query-target image pair into the small sample target detection model trained in the step S5, and obtaining a detection result of the new class or the base class, namely finding a target example belonging to the same class as the query image in the target image.
Specifically, in step S1, the 80 categories in the COCO2017 dataset are divided into 4 groups. Three groups of categories are used as base classes, and the images containing base class targets constitute the base class dataset

D_b = {(X_m, Y_m)}_{m=1}^{M_b},

all of which is used for model training, where b denotes the base classes and M_b is the number of base class images. The remaining group is used as the new classes, and the images containing targets of those categories constitute the new class dataset

D_n = {(X_m, Y_m)}_{m=1}^{M_n},

which is used for testing. Here X_m ∈ R^{N×N}, where R denotes the real number field, and Y_m = {(c_j, I_j), j = 1, ..., N_m}, where c_j is the class information of the j-th target contained in image X_m and I_j is its location information.
Specifically, in step S2, a pre-trained Mask R-CNN is used to filter the images in the new class and base class data sets of step S1, and only the label information of targets detectable by Mask R-CNN is used for training. During training, one image is randomly selected, the label information of a target on that image is obtained, and the image is cropped and scaled according to the position label to serve as a query image P; then another image containing the target class of the query image is randomly selected as the target image I, and the query-target image pair is constructed as the input of the small sample target detection model.
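The pairing logic of S2 can be sketched at the annotation level — a hedged Python toy in which `build_pair` (a hypothetical helper; names and dataset layout are illustrative) picks one labeled instance as the query and a different image containing that class as the target image. The Mask R-CNN filtering step is omitted.

```python
import random

# Toy annotation store: image id -> list of (class, bbox) labels.
def build_pair(dataset, rng):
    """Select a query instance and a different image containing its class."""
    img_id = rng.choice(list(dataset))
    cls, bbox = rng.choice(dataset[img_id])            # query target + position label
    candidates = [i for i, anns in dataset.items()
                  if i != img_id and any(c == cls for c, _ in anns)]
    target_id = rng.choice(candidates)                 # target image I
    return {"query_img": img_id, "query_class": cls,
            "query_bbox": bbox, "target_img": target_id}

random.seed(2)
dataset = {
    "img1": [("dog", (0, 0, 5, 5))],
    "img2": [("dog", (2, 2, 8, 8)), ("cat", (0, 0, 3, 3))],
    "img3": [("cat", (1, 1, 4, 4))],
}
pair = build_pair(dataset, random)
```

The query image P would then be cropped from `query_img` at `query_bbox` and scaled before being paired with the target image.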
Specifically, in step S3, the small sample target detection model includes a feature extraction network F, a candidate region generation network (RPN), and a metric module M. ResNet-50 is adopted as the backbone of the feature extraction module to extract the features F(P) and F(I) of the query image and the target image; the RPN generates candidate box regions containing foreground objects; the metric module M uses a two-layer MLP network ending in a two-way softmax. The input of the metric module M is the feature of each candidate box on the target image after ROI Pooling, concatenated with the target feature of the query image.
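A minimal sketch of the metric module M described above, assuming illustrative feature and hidden sizes: a two-layer MLP over the concatenation of a candidate-box feature and the query feature, ending in a two-way softmax. The weights here are random placeholders, not the trained model.

```python
import math
import random

random.seed(0)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mlp_similarity(candidate_feat, query_feat, w1, b1, w2, b2):
    """Two-layer MLP over concatenated features, two-way softmax output."""
    x = candidate_feat + query_feat                      # feature concatenation
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)   # ReLU layer
         for row, bi in zip(w1, b1)]
    logits = [sum(wi * hi for wi, hi in zip(row, h)) + bi
              for row, bi in zip(w2, b2)]
    return softmax(logits)                               # [p_same, p_different]

d, hdim = 4, 8                                           # illustrative sizes
w1 = [[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(hdim)]
b1 = [0.0] * hdim
w2 = [[random.uniform(-1, 1) for _ in range(hdim)] for _ in range(2)]
b2 = [0.0] * 2
probs = mlp_similarity([0.5] * d, [0.2] * d, w1, b1, w2, b2)
```

In the actual model the first output would serve as the similarity score between a candidate box and the query.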
Specifically, step S4 specifically includes:
S401, randomly select a query-target image pair from the paired base class data set constructed in step S2, and, according to the label information of the target image I and the query image P, select a target i in the target image I that belongs to the same class as the query image P;
S402, according to the position information loc_i of the target i selected in step S401, calculate the pixel area of the corresponding target; crop the target whose pixel area is greater than 50², and scale it to the same size as the query image to serve as the new query image P';
S403, scale the original query image to the same pixel area as the target i in the selected target image, and replace the target i with the scaled query image in the region of the target bounding box selected in the original target image, forming the new target image I';
S404, use the new query image P' formed in step S402 and the new target image I' formed in step S403 as a new query-target image pair, replacing the original query image P and the original target image I as the input of the model, wherein the label information of the new query image P' is consistent with that of the original query image P, and the label information of the new target image I' is consistent with that of the original target image I.
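The interchange of S401–S404 can be illustrated on toy 2-D "images" (nested lists of pixel values): the same-class instance inside the bounding box becomes the new query P', and the query, rescaled to the box, is pasted back to form the new target I'. `nearest_resize` is a naive stand-in for the scaling the patent describes, and the 50² area filter is omitted for brevity.

```python
# bbox = (y0, x0, y1, x1) in row/column coordinates.

def nearest_resize(img, out_h, out_w):
    """Naive nearest-neighbor resize of a 2-D list."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def crop(img, bbox):
    y0, x0, y1, x1 = bbox
    return [row[x0:x1] for row in img[y0:y1]]

def interchange(query, target, bbox):
    """Swap the query with the same-class instance inside bbox of target."""
    y0, x0, y1, x1 = bbox
    # S402: crop the same-class instance and scale it to the query's size.
    new_query = nearest_resize(crop(target, bbox), len(query), len(query[0]))
    # S403: scale the original query to the instance's box and paste it in.
    patch = nearest_resize(query, y1 - y0, x1 - x0)
    new_target = [row[:] for row in target]
    for dy in range(y1 - y0):
        new_target[y0 + dy][x0:x1] = patch[dy]
    return new_query, new_target

query = [[9, 9], [9, 9]]                     # toy query "image"
target = [[1, 1, 1, 1],
          [1, 5, 5, 1],                      # same-class instance in the box
          [1, 5, 5, 1],
          [1, 1, 1, 1]]
new_query, new_target = interchange(query, target, (1, 1, 3, 3))
```

Per S404, the labels travel with the swap: P' keeps the class label of P, and I' keeps the box annotations of I.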
Specifically, step S5 specifically includes:
s501, randomly selecting a pair of query-target image pairs in the base class query-target image pair set constructed in the step S2, wherein the target image is I, and the query image is P;
S502, generate a random number rand; if rand is greater than 0.5, transform the input image pair with the data enhancement module of step S4 to generate a new query-target image pair as input, otherwise retain the original input image pair;
s503, respectively sending the target image I and the query image P in the step S501 to a feature extraction module in the small sample target detection model constructed in the step S3 to obtain corresponding target image features F (I) and query image features F (P);
S504, use the target image features F(I) obtained in step S503 as the input of the RPN of the small sample target detection model; the RPN generates anchors, which are clipped and filtered, then classified into foreground or background by a two-way softmax, and the anchor position information is corrected by bounding box regression;
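The bounding box correction in S504 can be sketched with the conventional Faster R-CNN delta parameterisation (assumed here, since the patent builds on that framework): regression outputs (dx, dy, dw, dh) shift the anchor center and rescale its width and height.

```python
import math

def apply_deltas(anchor, deltas):
    """Apply Faster R-CNN-style regression deltas to an (x0, y0, x1, y1) anchor."""
    x0, y0, x1, y1 = anchor
    dx, dy, dw, dh = deltas
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + w / 2.0, y0 + h / 2.0
    cx, cy = cx + dx * w, cy + dy * h        # shift center by fractions of size
    w, h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

refined = apply_deltas((0, 0, 10, 10), (0.0, 0.0, 0.0, 0.0))  # zero deltas
```

With zero deltas the anchor is returned unchanged; nonzero dx shifts the box along x by dx times its width.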
S505, map the anchor position information corrected in step S504 onto the target image features F(I), and acquire the feature set F(bboxes_i) corresponding to the candidate box set using ROI Pooling;
S506, concatenate the i-th candidate box feature F(bboxes_i) of step S505 with the query image feature F(P), send the result into the metric module of the small sample target detection model, and output the similarity score between the i-th candidate box feature in the target image and the query image feature F(P);
S507, according to the corrected anchor position information obtained in S504, the similarity scores obtained in S506, and the label information of the query-target image pair, use the target detection loss function L to jointly train the small sample target detection model.
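The per-batch augmentation decision of S5/S502 reduces to a coin flip on rand; a minimal sketch with an illustrative `maybe_interchange` helper (the augmentation itself is stubbed out with a placeholder function):

```python
import random

def maybe_interchange(pair, augment_fn, rng=random):
    """With probability ~0.5 (rand > 0.5) apply the interchange augmentation."""
    rand = rng.random()
    return augment_fn(pair) if rand > 0.5 else pair

random.seed(1)
counts = {"aug": 0, "orig": 0}
for _ in range(10000):
    out = maybe_interchange("orig", lambda p: "aug")  # stubbed augmentation
    counts[out] += 1
```

Over many batches, roughly half of the input pairs are replaced by interchanged pairs and half are kept as-is.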
Specifically, in step S507, the target detection loss function L is:

L = L_CE + L_Reg + λ·L_MR

where L_CE is the cross-entropy loss in Faster R-CNN, L_Reg is the bounding box regression loss, L_MR is the margin-based ranking loss, and λ = 0.1.
Further, the margin-based ranking loss L_MR is:

L_MR = Σ_{i=1}^{Z} [ y_i · max(m⁺ − s_i, 0) + (1 − y_i) · max(s_i − m⁻, 0) ]

where m⁺ is the lower bound for the foreground and m⁻ is the upper bound for the background. Z is the number of anchors obtained after the target image passes through the RPN; the IoU of each of the Z candidate boxes is computed, and a box whose IoU is greater than 0.5 is assigned to the foreground with label y_i = 1, otherwise to the background with label y_i = 0. The input x_i is the feature vector obtained by concatenating the i-th candidate box feature F(bbox_i) in the target image with the query image feature F(P), and s_i = M(x_i) is the probability, predicted by the metric network M, that candidate box i is a foreground match.
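A hedged sketch of the margin-based ranking loss above: candidates are labeled foreground (y_i = 1) when their IoU with the ground-truth box exceeds 0.5, and the hinge terms push foreground scores above m⁺ and background scores below m⁻. The plain hinge form and the margin values used here are assumptions consistent with the text; the patent's exact formulation may add further terms.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    return inter / float(area_a + area_b - inter)

def margin_ranking_loss(scores, labels, m_plus=0.7, m_minus=0.3):
    """Hinge penalties: foreground scores below m_plus, background above m_minus."""
    return sum(y * max(0.0, m_plus - s) + (1 - y) * max(0.0, s - m_minus)
               for s, y in zip(scores, labels))

gt = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (20, 20, 30, 30)]
labels = [1 if iou(c, gt) > 0.5 else 0 for c in candidates]  # IoU > 0.5 => foreground
scores = [0.9, 0.1]                       # well-separated predictions => zero loss
loss = margin_ranking_loss(scores, labels)
```

Swapping the scores (foreground scored low, background scored high) makes both hinge terms active and the loss positive.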
Specifically, step S6 specifically includes:
s601, giving paired query-target image pairs in the new class image or the base class image constructed in the step S2, wherein the label information of the query image is known;
S602, send the query-target image pair constructed in step S601 into the small sample target detection model trained in step S5 to obtain the target image feature F(I) and the query image feature F(P);
S603, use the target image feature F(I) obtained in step S602 as the input of the RPN of the small sample target detection model, and generate candidate box region information containing foreground objects with the RPN;
S604, map the candidate box information obtained in step S603 onto the target image feature F(I) obtained in step S602, and obtain the feature set F(bboxes_i) corresponding to the candidate box set using ROI Pooling;
S605, concatenate the i-th candidate box feature F(bboxes_i) of step S604 with the query image feature F(P), send the result into the metric module of the small sample target detection model, and output the similarity score between the i-th candidate box feature in the target image and the query image feature F(P);
and S606, set the threshold to 0.75, and output the target regions in the target image whose similarity score with the query image exceeds the threshold, together with their scores, completing the metric-learning-based small sample target detection.
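S606 is a simple score threshold; a minimal sketch with illustrative box tuples and scores:

```python
def filter_detections(boxes, scores, threshold=0.75):
    """Keep only candidate regions whose similarity score exceeds the threshold."""
    return [(b, s) for b, s in zip(boxes, scores) if s > threshold]

boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (30, 30, 40, 40)]
scores = [0.9, 0.6, 0.8]
kept = filter_detections(boxes, scores)   # regions scored above 0.75
```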
Another technical solution of the present invention is a small sample target detection system based on target interchange and metric learning, comprising:
the division module is used for dividing the category set C, obtaining base classes and new classes according to the category division, and dividing the image data set into a base class image data set and a new class image data set according to the base and new classes;
the construction module is used for respectively constructing paired base class and new class data sets in the form of query image-target image pairs from the base class and new class image data sets obtained by the division module;
the target module is used for constructing a small sample target detection model based on metric learning by using the Faster R-CNN as a main frame of the detection model;
the enhancement module is used for constructing a data enhancement module based on target interchange: a query image-target image pair and its corresponding label information are randomly selected from the paired base class data set constructed by the construction module as the input of the data enhancement module; according to the label information, the region of an instance object in the target image that belongs to the same class as the query image is cropped as a new query image, and the original query image is embedded into the region of that instance object, forming a new target image from the original target image and constructing a new query-target image pair;
the training module is used for setting a random number rand for each batch of inquiry-target image pairs which are used as input by using the paired base class data sets constructed by the construction module as training data sets, generating a new inquiry-target image pair according to target interchange in the enhancement module if the random number rand is more than 0.5, reserving the original input image pair if the random number rand is less than 0.5, and inputting the image pair and corresponding label information into a small sample target detection model constructed in the target module in batches for training, wherein the size of each batch is K;
and the detection module randomly selects a query-target image pair in the paired new class data set or base class data set constructed in the construction module, inputs the query-target image pair into the small sample target detection model trained by the training module, and obtains a detection result of the new class or the base class, namely, a target example which belongs to the same class as the query image is found in the target image.
Compared with the prior art, the invention at least has the following beneficial effects:
compared with the existing method based on metric learning, the small sample target detection method based on target interchange and metric learning provided by the invention has the advantages that because the label information of the target image and the query image in the base class data is known, the data enhancement method based on target interchange is utilized to interchange the same type of target in the query image and the target image to form a new query-target image pair as input, the comparison among all targets in the target image is increased through the data enhancement mode, more contrastable samples are provided for the similarity measurement module, the influence of other types of targets or backgrounds in the target image on the target to be detected is reduced, and the detection accuracy of the small sample detection model based on metric learning on the base class and the new type is improved.
Further, the category set C is divided to obtain base classes and new classes, and the image data set is divided into a base class image data set and a new class image data set accordingly. The purpose of small sample learning is to train a model with a large amount of labeled sample data so that the model can recognize new class data for which only a few labels exist. Therefore, the category set C is divided according to the setting of small sample learning, and the image data set is divided into base class and new class image data sets according to the base and new categories; the detection model is trained on the base class data set, and the trained model performs detection on the new class data set.
Further, the input of the metric-learning-based small sample target detection method is a query-target image pair, and the output is the targets in the target image that belong to the same class as the query image. Therefore, for the images in the new class and base class data sets of S1, a pre-trained Mask R-CNN is used to filter out objects that are too small or too heavily occluded. Only the label information of targets detectable by Mask R-CNN is used for training: one image is randomly selected during training, the label information of a target on that image is obtained, and the image is cropped and scaled according to the position label to serve as the query image P; then another image containing the target class of the query image is randomly selected as the target image I, and the query-target image pair is constructed as the input of the metric-learning-based small sample target detection model.
Furthermore, Faster R-CNN is a two-stage detection model: it first generates a series of candidate regions that may contain targets from the features extracted by the network, judging at this stage only whether each candidate region is a foreground object or background, and then performs fine-grained class classification and position regression on the candidate regions to complete the detection task. Faster R-CNN is used as the main framework of the detection model to construct the metric-learning-based small sample target detection model. The model comprises a feature extraction network F, a candidate region generation network (RPN), and a metric module M. ResNet-50 is adopted as the backbone of the feature extraction module to extract the features F(P) and F(I) of the query image and the target image; the RPN generates candidate box regions containing foreground objects; the metric module M uses a two-layer MLP network ending in a two-way softmax. The input of the metric module M is the feature of each candidate box on the target image after ROI Pooling, concatenated with the target feature of the query image.
Further, in order to grasp new things more quickly, a human learner refers to the environment in which an object is often located, or to other kinds of objects appearing together with it, and also learns the differences between the object and these reference objects. For a metric-learning-based small sample target detection model, the training samples are query-target image pairs, and the class label of the query image and the label information of the objects contained in the target image are known. When a person looks for an object in the target image that belongs to the same class as the query object, the person can still recognize it even if the object in the target image is replaced by the object in the query image. Therefore, the invention provides a data enhancement method based on target interchange. The general idea is that, when training on the base classes, the label information of the query-target image pair is used to obtain the position information of the targets in the target image that belong to the same class as the query image, and the same-class targets in the query-target image pair are interchanged to form a new query-target image pair as a training sample, providing more comparison samples for the metric module and improving the performance of the metric-learning-based small sample target detection algorithm.
further, the objective of the small sample detection task based on metric learning is to find and match the target imageThe query image belongs to a class of objects and is identified and located, thus, the loss function is detected using the objects
Figure BDA0003093239960000101
And carrying out joint training on the small sample target detection model.
Furthermore, in order to better learn, implicitly, the similarity measurement between the foreground object features in the target image and the target features of the query image so as to predict similarity, a margin-based ranking loss is adopted to guide the learning of the metric module. This loss enlarges the distance between different classes, and all foreground objects other than the target, as well as the background, are treated as negative examples. In this way the location of the target can be better highlighted.
Further, given a query image and a target image, a small sample target detection task based on metric learning is to find and locate a target in the target image that belongs to the same class as the query image. Therefore, a query-target image pair is randomly selected from the paired new class data set or base class data set, and is input into the small sample target detection model trained by the paired base class data set, so that a detection result of the new class or the base class is obtained, that is, a target example belonging to the same class as the query image is found in the target image, and a small sample target detection task based on metric learning is completed.
In conclusion, the method and system not only realize comparison among the foreground objects in the target image, provide more diversified metric-learning samples for the metric module, and reduce the influence of other foreground objects in the target image on the target to be detected; they also increase, to a certain extent, the background variation of the targets in the query image, so that the template targets are diverse and the influence of background information on the detection result is reduced. Experimental results show that the method simultaneously improves the detection accuracy of the metric-learning-based small sample target detection algorithm on both the base class and the new class data.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a block diagram of a small sample target detection method based on target interchange and metric learning according to the present invention;
FIG. 2 is a block diagram of a data enhancement module based on object exchange according to the present invention;
FIG. 3 is a task setting for small sample target detection based on metric learning in the present invention;
FIG. 4 is a schematic diagram of the operation of the data enhancement module based on object exchange according to the present invention;
FIG. 5 illustrates COCO data set partitioning in accordance with the present invention;
fig. 6 is a graph showing comparison between experimental results, (a) is a detection result of a CoAE model based on metric learning, and (b) is an experimental result added with the data enhancement method based on object exchange proposed by the present invention;
FIG. 7 is a comparison example of experimental results, (a) is a query image, (b) is a target image, (c) is the detection result of the existing CoAE model based on metric learning, and (d) is the experimental result added with the data enhancement method based on target interchange proposed by the present invention;
FIG. 8 is a graph showing comparative examples of experimental results.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and some details may be omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a small sample target detection method based on target interchange and metric learning. The essential idea of metric-learning-based small sample target detection is a comparison and prediction task: given a query image and a target image, a metric-learning-based method uses a large amount of labeled base class data to learn a similarity metric between the query image and the foreground objects contained in the target image, i.e., it finds the region in the target image that is most similar and closest to the query image, and the foreground object contained in that region is considered to belong to the same class as the target in the query image. Then, given a query image and a target image of a new class target, the learned similarity metric is applied to a small amount of new class data with labeled samples, realizing the target detection task in a small sample learning scenario.

Analysis of the characteristics of the input data and of the experimental results shows that, because the background of the target image is complex and the objects it contains are varied, the relations between objects of different classes and the complex background can affect the detection of certain classes and cause false detections. Aiming at this problem, the invention provides a data enhancement strategy based on target interchange: using the existing label information of the target images and query images in the base classes, the query image and the same-class target in the target image are interchanged to form a new query-target image pair as input. This data enhancement mode increases the contrast among the targets in the target image, provides more comparable samples for the similarity measurement module, and further reduces the influence of other targets or the background in the target image on the target to be detected.
Referring to fig. 1, the present invention provides a small sample target detection method based on target interchange and metric learning, including the following steps:
s1, dividing a category set C, obtaining a base category and a new category according to category division, and dividing an image data set into a base image data set and a new image data set according to the base category and the new category;
The 80 classes in the COCO2017 dataset are divided into 4 groups. Three of the groups are taken as the base classes, and the images containing base-class targets form the base-class dataset

D_b = {(X_m, Y_m)}, m = 1, ..., M_b,

which is used entirely for model training, where b denotes the base classes and M_b is the number of base-class images. The remaining group is taken as the new classes, and the images containing new-class targets form the new-class dataset

D_n = {(X_m, Y_m)}, m = 1, ..., M_n,

which is used only for testing, where M_n is the number of new-class images. Here X_m ∈ R^{N×N}, where R denotes the real number field, and Y_m = {(c_j, l_j), j = 1, ..., N_m}, where c_j is the class information of the j-th target contained in image X_m and l_j is its location information.
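The split described above can be sketched as follows; assigning classes to groups by index modulo 4 is an illustrative assumption (the actual grouping follows fig. 5):

```python
# Sketch of the base/new class split. Group assignment by class index
# modulo 4 is a stand-in; the real split follows the grouping in fig. 5.
def split_classes(all_classes, novel_group=0, n_groups=4):
    """Return (base_classes, novel_classes): one group is held out as the
    new (novel) classes, the remaining three groups form the base classes."""
    groups = [all_classes[i::n_groups] for i in range(n_groups)]
    novel = groups[novel_group]
    base = [c for g in range(n_groups) if g != novel_group for c in groups[g]]
    return base, novel

coco_classes = list(range(80))              # the 80 COCO2017 class ids
base, novel = split_classes(coco_classes, novel_group=0)
assert len(base) == 60 and len(novel) == 20
assert set(base).isdisjoint(novel)          # base and new classes never overlap
```

Each of the 4 possible choices of held-out group yields one of the 4 experimental splits, so training never sees the novel classes.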
S2, respectively constructing paired data sets in the form of query image-target image for the base class image data set and the new class image data set;
The target image contains at least one instance object belonging to the same category as the query image. For the base class data, the label information of both the query image and the target image is available, while for the new class data, only the label information of the query image is available.
Referring to fig. 2, for the images in the new class data set and the base class data set of S1, objects that are too small or too heavily occluded are filtered out using a pre-trained Mask R-CNN, and only the label information detected by Mask R-CNN is then used for training. During training, one image is randomly selected and the label information of the targets on it is obtained; a target is cropped and scaled according to its position label to serve as the query image P; another image containing the target class present in the query image is randomly selected as the target image I; and the query-target image pair is constructed as the input of the model.
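A simplified sketch of this pair construction, assuming an in-memory dataset of arrays with (class, box) labels; the nearest-neighbour resize is a stand-in for the real cropping/scaling pipeline:

```python
import random
import numpy as np

def resize_nn(img, h, w):
    """Nearest-neighbour resize (stand-in for bilinear scaling)."""
    ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
    return img[ys][:, xs]

def make_query_target_pair(dataset, query_size=128):
    """dataset: list of {"image": HxWx3 array,
                         "labels": [(cls, (x1, y1, x2, y2)), ...]}.
    Crop one labelled object as the query image P, then pick another image
    containing the same class as the target image I."""
    img = random.choice(dataset)
    cls, (x1, y1, x2, y2) = random.choice(img["labels"])
    query = resize_nn(img["image"][y1:y2, x1:x2], query_size, query_size)
    candidates = [d for d in dataset
                  if d is not img and any(c == cls for c, _ in d["labels"])]
    target = random.choice(candidates)
    return query, target["image"], cls
```

The dictionary-based dataset format and the helper names are assumptions for illustration only.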
S3, constructing a small sample target detection model based on metric learning by using the Faster R-CNN as a main frame of the detection model, wherein the small sample target detection model comprises a feature extraction network F, a candidate region generation network RPN and a metric module M;
the small sample target detection model takes Faster R-CNN pre-trained on ImageNet which removes 80 types of images contained in COCO data set as a basic frame, and adopts ResNet-50 as a backbone network of a feature extraction module for extracting features F (P) and F (I) of a query image and a target image, and parameters of the ResNet-50 network are shared by the query image and the target image.
The candidate region generation network RPN is used to generate a candidate frame region containing a foreground object.
The metric module M uses a two-layer MLP network ending in a two-way softmax classifier. The input of the metric module M is the feature of each candidate box on the target image after ROI Pooling together with the target feature of the query image; its purpose is to measure the similarity between each candidate box feature and the query image feature, output that similarity, and retain the candidate boxes with high similarity as the detection result.
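As a hedged illustration of the metric module (two-layer MLP on the concatenated candidate-box and query features, ending in a two-way softmax), a NumPy sketch with randomly initialized weights; the hidden width and the use of NumPy instead of a deep learning framework are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class MetricModule:
    """Two-layer MLP scoring similarity between each candidate-box feature
    and the query feature; the second softmax output is the 'same class'
    probability, used as the similarity score."""
    def __init__(self, feat_dim=1024, hidden=256):
        self.w1 = rng.normal(0.0, 0.01, (2 * feat_dim, hidden))
        self.w2 = rng.normal(0.0, 0.01, (hidden, 2))

    def __call__(self, box_feats, query_feat):
        # box_feats: (num_boxes, feat_dim); query_feat: (feat_dim,)
        q = np.broadcast_to(query_feat, box_feats.shape)
        x = np.concatenate([box_feats, q], axis=1)   # concatenated features
        h = np.maximum(x @ self.w1, 0.0)             # ReLU hidden layer
        return softmax(h @ self.w2)[:, 1]            # similarity per box
```

In the trained model the weights are learned jointly with the rest of the network; here they are random only to keep the sketch runnable.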
S4, constructing a data enhancement module based on target interchange: randomly selecting a pair of "query image-target image" and its corresponding label information from the paired base class data set constructed in step S2 as the input of the data enhancement module; according to the label information, cropping the area of the instance object in the target image that belongs to the same class as the query image as a new query image, and embedding the original query image into the area of that instance object to form, together with the rest of the original target image, a new target image, thereby constructing a new "query-target" image pair;
referring to fig. 2, the specific steps are as follows:
s401, according to the paired base class data set constructed in step S2, randomly selecting a pair of "query-target" images and, according to the label information of the target image I and the query image P, selecting a target i in the target image I that belongs to the same class as the query image P;
s402, according to the position information loc_i of the target i on the target image, calculating the pixel area of the corresponding target; if the pixel area is greater than 50², cropping the target and scaling it to the same size as the query image to serve as the new query image P'; otherwise, returning to step S401 to select again, as shown in fig. 2 (a);
s403, scaling the original query image to the same pixel area as that of the target i in the selected target image, and replacing the target i with the scaled query image in the area corresponding to the selected target bounding box in the original target image, forming the new target image I', as shown in fig. 2 (a);
s404, using the new query image P' formed in S402 and the new target image I' formed in S403 as a new query-target image pair, replacing the original query image P and target image I as the input of the model; the label information of the new query image P' is consistent with that of the original query image P, and the label information of the new target image I' is consistent with that of the original target image I, as shown in fig. 2 (b).
Referring to fig. 3, the data enhancement method based on target interchange provided by the present invention not only adds similarity measurements between the various foreground objects in the original target image, providing more samples for the metric module, but also lets the target in the query image carry a variety of background information, reducing the influence of background information.
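Steps S401-S404 can be sketched as follows; NumPy arrays stand in for images and nearest-neighbour resize for the real scaling:

```python
import numpy as np

MIN_AREA = 50 ** 2          # minimum pixel area of a swappable target (S402)

def resize_nn(img, h, w):
    """Nearest-neighbour resize (stand-in for proper interpolation)."""
    ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
    return img[ys][:, xs]

def target_interchange(query, target_img, box, query_size=128):
    """Swap the query with a same-class instance at `box` in the target image.
    box = (x1, y1, x2, y2); returns (P', I') or None if the instance is too
    small, in which case the caller re-samples another pair (S401)."""
    x1, y1, x2, y2 = box
    if (x2 - x1) * (y2 - y1) <= MIN_AREA:
        return None
    # S402: crop the same-class instance and rescale it as the new query P'
    new_query = resize_nn(target_img[y1:y2, x1:x2], query_size, query_size)
    # S403: scale the old query into the instance's bounding box, forming I'
    new_target = target_img.copy()
    new_target[y1:y2, x1:x2] = resize_nn(query, y2 - y1, x2 - x1)
    return new_query, new_target
```

The function names and array-based image representation are illustrative; the label bookkeeping of S404 (P' inheriting P's class label, I' inheriting I's labels) is omitted for brevity.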
S5, using the paired base class data set constructed in step S2 as the training data set; for each input query-target image pair, a random number rand is generated: if rand is greater than 0.5, a new query-target image pair is generated by the target interchange of step S4, otherwise the original input image pair is kept; the image pairs and their corresponding label information are input into the small sample target detection model constructed in step S3 in batches of size K for training;
referring to fig. 1, the specific steps are as follows:
s501, randomly selecting a pair of query-target image pairs in the paired base class data set constructed in the step S2, wherein the target image is I, and the query image is P;
s502, generating a random number rand, if the random number is larger than 0.5, transforming the input image pair according to the target interchange method in the step S4, and generating a new query-target image pair as input; if the value is less than 0.5, no change is made;
s503, respectively sending the target image I and the query image P into a pre-trained feature extraction module in the model constructed in S3 to obtain corresponding target image features F (I) and query image features F (P);
s504, taking the target image feature F(I) as the input of the RPN network, generating a group of anchors with the RPN region generation network, clipping and filtering the anchors, then performing binary classification on the anchors through softmax to judge whether each anchor belongs to the foreground or the background, i.e., whether it contains an object, and correcting the position information of the anchors using bounding box regression;
s505, mapping the candidate box information onto the target image feature F(I), and acquiring the feature set F(bboxes_i) corresponding to the candidate box set using ROI Pooling;
S506, concatenating the i-th candidate box feature F(bboxes_i) with the query image feature F(P), sending the result into the metric module, and outputting the similarity score between the i-th candidate box feature of the target image and the query image feature F(P);
S507, jointly training the small sample target detection model with the target detection loss function L, which combines the cross entropy loss L_cls (classification probability), the bounding-box regression loss L_reg, and the boundary-based ranking loss L_MR.
The target detection loss function L is constructed with the following optimization objective:

L = L_cls + L_reg + L_MR,

where L_cls is the cross entropy loss in Faster R-CNN, L_reg is the bounding-box regression loss, and L_MR is the boundary-based ranking loss, specifically:

L_MR = Σ_{i=1}^{N} [ y_i · max(m⁺ − s_i, 0) + (1 − y_i) · max(s_i − m⁻, 0) ]
        + Σ_{i=1}^{N} Σ_{j=i+1}^{N} [ [y_i = y_j] · max(|s_i − s_j| − m⁻, 0) + [y_i ≠ y_j] · max(m⁺ − |s_i − s_j|, 0) ],

where s_i is the similarity score of the i-th candidate box, y_i ∈ {0, 1} indicates whether the i-th candidate box is a foreground sample, and [·] is the indicator function.
where m⁺ = 0.7 is the lower confidence bound of the foreground and m⁻ = 0.3 is the upper confidence bound of the background. The first half of the loss constrains the foreground and background confidences and enhances the features of foreground objects in the target image; the second half is the ranking loss, which does not actually impose a ranking order but only requires that the confidence gap between samples of the same category be less than 0.3 and the gap between different categories be greater than 0.7, thereby constraining the confidences between categories.
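A minimal NumPy sketch of the boundary-based ranking loss consistent with the constraints described above (the exact pairwise formulation is an assumption reconstructed from that description; a real implementation would be a differentiable loss in a deep learning framework):

```python
import numpy as np

M_POS, M_NEG = 0.7, 0.3   # foreground lower bound m+ / background upper bound m-

def margin_ranking_loss(scores, labels):
    """scores: similarity score s_i for each candidate box;
    labels: y_i = 1 for foreground (same class as query), 0 for background.
    The pointwise term pushes foreground scores above m+ and background
    scores below m-; the pairwise term keeps same-class score gaps below
    m- and different-class gaps above m+."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    loss = float((y * np.maximum(M_POS - s, 0)
                  + (1 - y) * np.maximum(s - M_NEG, 0)).sum())
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            gap = abs(s[i] - s[j])
            if y[i] == y[j]:
                loss += max(gap - M_NEG, 0.0)   # same class: gap < 0.3
            else:
                loss += max(M_POS - gap, 0.0)   # different class: gap > 0.7
    return loss

# A confident, well-separated prediction incurs zero loss:
assert margin_ranking_loss([0.9, 0.1], [1, 0]) == 0.0
```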
And S6, inputting paired 'query-target' images in the new type of images or the base type of images constructed in the step S2 into the small sample target detection model based on metric learning trained in the step S5 to obtain a detection result of the new type of images or the base type of images, namely finding target examples belonging to the same type as the query images in the target images.
Referring to fig. 4, the specific steps are as follows:
s601, given a query-target image pair from the paired new class data set or base class data set, where the label information of the query image is known and the label information of the target image is unavailable;
s602, sending the image pair into the feature extractor of the small sample target detection model based on target interchange and metric learning trained in step S5 to obtain the query image feature F(P) and the target image feature F(I);
s603, taking the target image feature F(I) as the input of the RPN network, and using the RPN region generation network to generate candidate box region information containing foreground objects;
s604, mapping the candidate box information onto the target image feature F(I), and acquiring the feature set F(bboxes_i) corresponding to the candidate box set using ROI Pooling;
s605, concatenating the i-th candidate box feature F(bboxes_i) with the query image feature F(P), sending the result into the metric module, and outputting the similarity score between the i-th candidate box feature of the target image and the query image feature F(P);
s606, setting the threshold to 0.75, and outputting the target regions of the target image whose similarity score with the query image is higher than the threshold, together with their similarity scores; the foreground objects contained in these regions are considered to belong to the same class as the query image, completing the small sample target detection task based on metric learning.
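The thresholding in S606 amounts to keeping the candidate boxes whose similarity score with the query exceeds 0.75; a trivial sketch:

```python
THRESHOLD = 0.75   # similarity threshold from step S606

def filter_detections(boxes, scores, threshold=THRESHOLD):
    """Keep (box, score) pairs whose similarity score exceeds the threshold;
    these regions are predicted to contain objects of the query's class."""
    return [(b, s) for b, s in zip(boxes, scores) if s > threshold]

dets = filter_detections([(0, 0, 10, 10), (5, 5, 20, 20)], [0.9, 0.4])
assert dets == [((0, 0, 10, 10), 0.9)]
```

The function name is an assumption; in practice non-maximum suppression would typically also be applied to the surviving boxes.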
In another embodiment of the present invention, a small sample target detection system based on target interchange and metric learning is provided, which can be used to implement the small sample target detection method based on target interchange and metric learning described above.
The division module is used for dividing the category set C, obtaining a base class and a new class according to the class division, and dividing the image dataset into a base class image dataset and a new class image dataset according to the base class and the new class;
the construction module is used for respectively constructing a base class data set and a new class data set which are in a form of a pair of a query image-target image according to the base class image data set and the new class image data set obtained by the division module;
the target module uses the Faster R-CNN as a main frame of a detection model to construct a small sample target detection model based on metric learning, and the small sample target detection model comprises a feature extraction network F, a candidate region generation network RPN and a metric module M;
the enhancement module is used for constructing a data enhancement module based on target interchange, randomly selecting a pair of query image-target image and corresponding label information thereof in a pair of base class data sets constructed by the construction module as input of the data enhancement module, cutting an area where an example object belonging to the same category as the query image in the target image is located as a new query image according to the label information, embedding the original query image into the area where the corresponding example object is located, forming a new target image with the original target image, and constructing a new query-target image;
the training module is used for using a paired base class data set constructed by the construction module as a training data set, setting a random number rand for each batch of query-target image pairs which are input, if the random number rand is larger than 0.5, generating a new query-target image pair according to target interchange in the enhancement module, if the random number rand is smaller than 0.5, reserving the original input image pair, and inputting the image pair and corresponding label information into a small sample target detection model constructed in the target module in batches for training, wherein the size of each batch is K;
and the detection module randomly selects a query-target image pair in the paired new class data set or base class data set constructed in the construction module, inputs the query-target image pair into the small sample target detection model trained by the training module, and obtains a detection result of the new class or the base class, namely, a target example which belongs to the same class as the query image is found in the target image.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored by the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor according to the embodiment of the present invention may be used to perform the operations of the small sample target detection method based on target interchange and metric learning, including:
dividing the category set C, obtaining base classes and new classes according to the class division, and dividing the image dataset into a base class image dataset and a new class image dataset accordingly; constructing, from the base class and new class image datasets, paired base class and new class data sets in the form of query image-target image; constructing a small sample target detection model based on metric learning using Faster R-CNN as the main framework of the detection model, the model comprising a feature extraction network F, a candidate region generation network RPN, and a metric module M; constructing a data enhancement module based on target interchange: randomly selecting a query image-target image pair and its corresponding label information from the constructed paired base class data set as the input of the data enhancement module, cropping the area of the instance object in the target image that belongs to the same class as the query image as a new query image according to the label information, and embedding the original query image into the area of that instance object to form, together with the rest of the original target image, a new target image, thereby constructing a new query-target image pair; using the constructed paired base class data set as the training data set and generating a random number rand for each input query-target image pair: if rand is greater than 0.5, a new query-target image pair is generated by target interchange, and if rand is less than 0.5, the original input image pair is retained; inputting the image pairs and their corresponding label information into the small sample target detection model in batches of size K for training; and randomly selecting a query-target image pair from the constructed new class data set or base class data set, inputting it into the trained small sample target detection model, and obtaining the detection result of the new class or base class, i.e., finding the target instances in the target image that belong to the same class as the query image.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory.
One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to implement the corresponding steps of the small sample object detection method based on object exchange and metric learning in the above embodiments; one or more instructions in the computer-readable storage medium are loaded by the processor and perform the steps of:
dividing the category set C, obtaining base classes and new classes according to the class division, and dividing the image dataset into a base class image dataset and a new class image dataset accordingly; constructing, from the base class and new class image datasets, paired base class and new class data sets in the form of query image-target image; constructing a small sample target detection model based on metric learning using Faster R-CNN as the main framework of the detection model, the model comprising a feature extraction network F, a candidate region generation network RPN, and a metric module M; constructing a data enhancement module based on target interchange: randomly selecting a query image-target image pair and its corresponding label information from the constructed paired base class data set as the input of the data enhancement module, cropping the area of the instance object in the target image that belongs to the same class as the query image as a new query image according to the label information, and embedding the original query image into the area of that instance object to form, together with the rest of the original target image, a new target image, thereby constructing a new query-target image pair; using the constructed paired base class data set as the training data set and generating a random number rand for each input query-target image pair: if rand is greater than 0.5, a new query-target image pair is generated by target interchange, and if rand is less than 0.5, the original input image pair is retained; inputting the image pairs and their corresponding label information into the small sample target detection model in batches of size K for training; and randomly selecting a query-target image pair from the constructed new class data set or base class data set, inputting it into the trained small sample target detection model, and obtaining the detection result of the new class or base class, i.e., finding the target instances in the target image that belong to the same class as the query image.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The effect of the present invention will be further described below with reference to the simulation diagrams.
1. Simulation conditions
The hardware conditions of the simulation are as follows: the graphics workstation of the Intelligent Perception and Image Understanding Laboratory uses an Nvidia GeForce GTX 1080Ti GPU with 12G of video memory and two 10-core Intel Xeon E5-2360 v4 CPUs with a clock frequency of 2.20 GHz and 64 GB of memory. The simulations are performed on the COCO2017 dataset, which contains 80 classes. According to the task setting of small sample target detection based on metric learning, the dataset needs to be divided into base classes and new classes, and the base classes and new classes do not intersect. Therefore, for comparison with other mainstream methods, referring to fig. 5, the COCO2017 dataset is divided into 4 groups; when one group is used as the new classes for testing, the other three groups are used as the base classes for training.
2. Simulation content and results
Experiments are carried out under the above simulation conditions. The "query-target" image pairs of the base classes in the COCO2017 dataset are used to learn the similarity measurement between the query image and each foreground object contained in the target image; then, given a query image and a target image of a new class target, the learned similarity measurement is applied to the new class data with a small number of labeled samples, realizing the target detection task in the small sample learning scenario.
The data enhancement method based on target interchange provided by the invention is added to the existing mainstream method based on metric learning for comparison, and the result is shown in table 1:
TABLE 1
From the results in table 1, the method of the present invention achieves a certain improvement in the target detection accuracy on both the base class and the new class.
Referring to fig. 6, since the background information of the target image is rich, many cases of background being detected as the target appear in fig. 6 (a); in fig. 6 (b), almost no redundant background regions are detected, and the bounding boxes of the detection results are more accurate. Therefore, the data enhancement method provided by the invention can enlarge the distance between the target to be detected and the background information in the target image, enrich the diversity of the targets and background information in the query image, and reduce the influence of the background in the target image on the target to be detected.
Referring to fig. 7, the data enhancement method based on target exchange provided by the present invention can provide more diverse comparison samples for the measurement module, and improve the detection accuracy for small targets and multiple targets.
Referring to fig. 8, a first column from left to right is a query image, a second column is a detection result of an existing CoAE model based on metric learning, and a third column is an experimental result added with the data enhancement method based on object interchange provided by the present invention.
In summary, in the small sample target detection method and system based on target interchange and metric learning, the data enhancement method not only realizes comparison between the foreground objects in the target image and provides the metric module with more diverse samples for metric learning, reducing the influence of other foreground objects in the target image on the target to be detected, but also increases to a certain extent the background information of the target in the query image, so that the template targets are more diverse, the influence of background information on the detection result is reduced, and the detection accuracy of the metric-learning-based small sample target detection algorithm on both the base class and new class data is improved.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above contents merely illustrate the technical idea of the present invention and shall not limit its protection scope; any modification made on the basis of the technical idea proposed by the present invention falls within the protection scope of the claims of the present invention.

Claims (7)

1. The small sample target detection method based on target interchange and metric learning is characterized by comprising the following steps:
s1, dividing a category set C, obtaining a base category and a new category according to category division, and dividing an image data set into a base image data set and a new image data set according to the base category and the new category;
s2, respectively constructing a base class data set and a new class data set which are in a form of a query image-target image pair according to the base class image data set and the new class image data set obtained in the step S1;
s3, constructing a small sample target detection model based on metric learning by using Faster R-CNN as the main framework of the detection model, wherein the small sample target detection model comprises a feature extraction network F, a candidate region generation network RPN and a metric module M; ResNet-50 is adopted as the backbone of the feature extraction module for extracting the features F(P) and F(I) of the query image and the target image, and the candidate region generation network RPN is used for generating candidate box regions containing foreground objects; the metric module M uses a two-layer MLP network ending with a two-class softmax; the input of the metric module M is the feature of each candidate box on the target image after ROI Pooling together with the target feature of the query image;
s4, constructing a data enhancement module based on target interchange: randomly selecting a query image-target image pair and its corresponding label information from the paired base class data set constructed in step S2 as the input of the data enhancement module; according to the label information, cropping the region where an instance object in the target image belonging to the same category as the query image is located to serve as a new query image, and embedding the original query image into the region where that instance object was located, which together with the rest of the original target image forms a new target image, thereby constructing a new query-target image pair;
s5, using the paired base class data set constructed in step S2 as the training data set; for each batch of input query-target image pairs a random number rand is drawn: if rand is greater than 0.5, a new query-target image pair is generated by the data enhancement module of step S4, and if rand is smaller than 0.5, the original input image pair is kept; the image pairs and their corresponding label information are then input in batches, each batch of size K, into the small sample target detection model constructed in step S3 for training, where the target detection loss function L_det is:

L_det = L_cls + L_reg + λ·L_MR

wherein L_cls is the cross-entropy loss in Faster R-CNN, L_reg is the bounding-box regression loss, L_MR is the boundary-based ranking loss, and λ is 0.1;

the boundary-based ranking loss L_MR is:

L_MR = Σ_{i=1}^{Z} [ y_i · max(m⁺ − s_i, 0) + (1 − y_i) · max(s_i − m⁻, 0) ]

wherein m⁺ is the lower bound for foreground scores, m⁻ is the upper bound for background scores, and Z is the number of anchors obtained after the target image passes through the RPN; IoU is calculated for the Z candidate boxes, and a candidate box with IoU greater than 0.5 is classified as foreground with label y_i = 1, while one with IoU less than 0.5 is classified as background with label y_i = 0; x_i = [F(bbox_i); F(P)] is the feature vector obtained by concatenating the i-th candidate box feature F(bbox_i) in the target image with the query image feature F(P); s_i = M(x_i) is the predicted foreground probability output by the metric network M for each candidate box;
and S6, randomly selecting a query-target image pair from the paired new class data set or base class data set constructed in the step S2, inputting the query-target image pair into the small sample target detection model trained in the step S5, and obtaining a detection result of the new class data set or base class data set, namely finding a target example belonging to the same class as the query image in the target image.
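Purely as an illustrative sketch (not part of the claims), the detection loss of claim 1 can be written out in a few lines; the function names and the default margin values m⁺ = 0.7 and m⁻ = 0.3 are assumptions, since the claim only defines the variables, not their numeric settings.

```python
import numpy as np

def margin_ranking_loss(scores, labels, m_pos=0.7, m_neg=0.3):
    """Boundary-based ranking loss over Z candidate boxes.

    scores: predicted foreground probabilities s_i = M(x_i), shape (Z,)
    labels: y_i = 1 for foreground (IoU > 0.5), else 0, shape (Z,)
    m_pos:  lower bound a foreground score should exceed (m+)
    m_neg:  upper bound a background score should stay below (m-)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    fg = labels * np.maximum(m_pos - scores, 0.0)          # push foreground scores above m+
    bg = (1.0 - labels) * np.maximum(scores - m_neg, 0.0)  # push background scores below m-
    return float(np.sum(fg + bg))

def detection_loss(l_cls, l_reg, l_rank, lam=0.1):
    """Total loss L_det = L_cls + L_reg + lambda * L_MR, with lambda = 0.1."""
    return l_cls + l_reg + lam * l_rank
```

With a confident foreground score (0.9) and a confident background score (0.1) the ranking term vanishes; ambiguous scores near 0.5 are penalized from both sides.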
2. The method according to claim 1, wherein in step S1, the 80 classes in the COCO2017 dataset are divided into 4 groups; three of the groups are taken as the base classes, and the images containing objects of the base classes constitute the base class dataset

D_b = {(X_m, Y_m), m = 1, ..., M_b},

all of which is used for model training, where b denotes the base classes and M_b is the number of base class images; the remaining group is taken as the new classes, and the images containing objects of the remaining classes constitute the new class dataset

D_n = {(X_m, Y_m), m = 1, ..., M_n},

which is used for testing; X_m ∈ R^{N×N}, where R denotes the real number field, and Y_m = {(c_j, I_j), j = 1, ..., N_m}, where c_j is the class information of the j-th object contained in image X_m and I_j is its location information.
3. The method according to claim 1, wherein in step S2, a pre-trained Mask R-CNN is used to filter the images contained in the new class data set and the base class data set of step S1, and only the target label information that the Mask R-CNN can detect is used for training; during training, one image is randomly selected, the label information of the targets on that image is obtained, and the image is cropped and scaled according to the position label to serve as the query image P; another image containing the target class of the query image is then randomly selected as the target image I, and the query-target image pair is constructed as the input of the small sample target detection model.
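As an illustration of the pair construction in claim 3 (the annotation format below is assumed, not taken from the patent), a query is cropped from one image's label and a second image containing the same class is drawn as the target:

```python
import random

def build_query_target_pair(annotations, rng=None):
    """annotations: dict image_id -> list of (class_name, bbox) labels.
    Returns (query_image_id, query_bbox, query_class, target_image_id),
    or None if no other image contains the query's class."""
    rng = rng or random.Random(0)
    img_id = rng.choice(sorted(annotations))
    cls, bbox = rng.choice(annotations[img_id])
    # any other image that also contains an object of the query class
    candidates = [i for i, objs in annotations.items()
                  if i != img_id and any(c == cls for c, _ in objs)]
    if not candidates:
        return None
    return img_id, bbox, cls, rng.choice(candidates)
```

The query image itself would then be obtained by cropping `query_bbox` out of the selected image and rescaling it.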
4. The method according to claim 1, wherein step S4 is specifically:
s401, randomly selecting a query-target image pair from the paired base class data set constructed in step S2, and selecting an instance target i belonging to the same category as the query image P from the target image I according to the label information of the target image I and the query image P;
s402, according to the position information loc_i of the target i selected in step S401 on the target image, calculating the pixel area of the corresponding target, cropping the region whose pixel area is greater than 50², and scaling it to the same size as the query image to obtain a new query image P';
s403, scaling the original query image to the same pixel area as the target i in the selected target image, and replacing the target i with the scaled query image in the region corresponding to the selected target bounding box in the original target image, thereby forming a new target image I';
s404, using the new query image P 'formed in step S402 and the new target image I' formed in step S403 as a new query-target image pair, replacing the original query image P and the original target image I to form a query-target image pair as an input of the model, wherein the label information of the new query image P 'is consistent with the label information of the original query image P, and the label information of the new target image I' is consistent with the label information of the original target image I.
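Steps S401-S404 can be sketched with plain arrays; this is an illustrative reimplementation only — the nearest-neighbour resize is a stand-in for whatever image scaling the method actually uses, and the box format (y0, x0, y1, x1) is an assumption.

```python
import numpy as np

def resize_nn(img, h, w):
    """Nearest-neighbour resize of an (H, W, C) array to (h, w, C)."""
    ys = (np.arange(h) * img.shape[0] / h).astype(int)
    xs = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[ys][:, xs]

def target_interchange(query, target, inst_box, min_side=50):
    """Swap the query with a same-class instance in the target image.

    query:    query image P, shape (H, W, C)
    target:   target image I, shape (H, W, C)
    inst_box: (y0, x0, y1, x1) of a same-class instance in I
    Returns (new_query P', new_target I'), or None if the instance
    area does not exceed 50^2 pixels (step S402's size filter).
    """
    y0, x0, y1, x1 = inst_box
    if (y1 - y0) * (x1 - x0) <= min_side ** 2:
        return None
    # crop the instance and scale it to the query's size -> new query P'
    new_query = resize_nn(target[y0:y1, x0:x1], query.shape[0], query.shape[1])
    # scale the old query into the instance's box -> new target I'
    new_target = target.copy()
    new_target[y0:y1, x0:x1] = resize_nn(query, y1 - y0, x1 - x0)
    return new_query, new_target
```

The labels follow the swap: P' keeps the class label of P, and I' keeps the box annotations of I, as stated in step S404.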
5. The method according to claim 1, wherein step S5 is specifically:
s501, randomly selecting a pair of query-target image pairs in the base class query-target image pair set constructed in the step S2, wherein the target image is I, and the query image is P;
s502, generating a random number rand, if the random number is larger than 0.5, transforming the input image pair according to the data enhancement module in the step S4, and generating a new query-target image pair as input;
s503, respectively sending the target image I and the query image P in the step S501 to a feature extraction module in the small sample target detection model constructed in the step S3 to obtain corresponding target image features F (I) and query image features F (P);
s504, taking the target image features F(I) obtained in step S503 as the input of the RPN region generation network of the small sample target detection model; the RPN generates anchors, crops and filters them, classifies each anchor as foreground or background through a two-class softmax, and corrects the anchor position information through bounding box regression;
s505, mapping the anchor position information corrected in step S504 onto the target image feature F(I), and obtaining the feature set F(bboxes) corresponding to the candidate box set by using ROI Pooling;
s506, concatenating the i-th candidate box feature F(bbox_i) from step S505 with the query image feature F(P), feeding the result into the metric module of the small sample target detection model, and outputting the similarity score between the i-th candidate box feature in the target image and the query image feature F(P);
s507, performing joint training of the small sample target detection model with the target detection loss function L_det, according to the corrected anchor position information obtained in S504, the similarity scores obtained in S506 and the label information of the query-target image pair.
6. The method according to claim 1, wherein step S6 is specifically:
s601, giving paired query-target image pairs in the new type of images or the base type of images constructed in the step S2, wherein the label information of the query images is known;
s602, sending the query-target image pair constructed in step S601 into the small sample target detection model trained in step S5 to obtain the target image feature F(I) and the query image feature F(P);
s603, taking the target image feature F(I) obtained in step S602 as the input of the RPN region generation network of the small sample target detection model, and generating candidate box region information containing foreground objects by using the RPN;
s604, mapping the candidate box information obtained in step S603 onto the target image feature F(I) obtained in step S602, and obtaining the feature set F(bboxes) corresponding to the candidate box set by using ROI Pooling;
s605, concatenating the i-th candidate box feature F(bbox_i) from step S604 with the query image feature F(P), feeding the result into the metric module of the small sample target detection model, and outputting the similarity score between the i-th candidate box feature in the target image and the query image feature F(P);
and s606, setting the threshold to 0.75, and outputting the target regions in the target image whose similarity score with the query image exceeds the threshold, together with their similarity scores, thereby completing the metric-learning-based small sample target detection.
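The final thresholding of step S606 amounts to a one-line filter; the sketch below is illustrative, with boxes represented as plain tuples:

```python
def filter_detections(boxes, scores, threshold=0.75):
    """Keep candidate boxes whose similarity score to the query exceeds
    the threshold (0.75 per step S606), paired with their scores."""
    return [(b, s) for b, s in zip(boxes, scores) if s > threshold]
```

Only regions scoring above 0.75 are reported as instances of the query's class.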
7. A small sample object detection system based on object exchange and metric learning, comprising:
the dividing module is used for dividing the category set C, obtaining a base category and a new category according to category division, and dividing the image data set into a base image data set and a new image data set according to the base category and the new category;
the construction module is used for respectively constructing a base class data set and a new class data set which are paired and take the form of a query image-target image according to the base class image data set and the new class image data set obtained by the division module;
the target module, which uses Faster R-CNN as the main framework of the detection model to construct a small sample target detection model based on metric learning, the model comprising a feature extraction network F, a candidate region generation network RPN and a metric module M; ResNet-50 is adopted as the backbone of the feature extraction module for extracting the features F(P) and F(I) of the query image and the target image, and the candidate region generation network RPN is used for generating candidate box regions containing foreground objects; the metric module M uses a two-layer MLP network ending with a two-class softmax; the input of the metric module M is the feature of each candidate box on the target image after ROI Pooling together with the target feature of the query image;
the enhancement module is used for constructing a data enhancement module based on target interchange, randomly selecting a pair of query image-target image and corresponding label information thereof from a pair of base class data sets constructed by the construction module as input of the data enhancement module, cutting an area where an example object belonging to the same category as the query image in the target image is located as a new query image according to the label information, embedding the original query image into the area where the corresponding example object is located, forming a new target image with the original target image, and constructing a new query-target image pair;
the training module, which uses the paired base class data set constructed by the construction module as the training data set; for each batch of input query-target image pairs a random number rand is drawn: if rand is greater than 0.5, a new query-target image pair is generated by target interchange in the enhancement module, and if rand is smaller than 0.5, the original input image pair is kept; the image pairs and their corresponding label information are then input in batches, each batch of size K, into the small sample target detection model constructed in the target module for training, where the target detection loss function L_det is:

L_det = L_cls + L_reg + λ·L_MR

wherein L_cls is the cross-entropy loss in Faster R-CNN, L_reg is the bounding-box regression loss, L_MR is the boundary-based ranking loss, and λ is 0.1;

the boundary-based ranking loss L_MR is:

L_MR = Σ_{i=1}^{Z} [ y_i · max(m⁺ − s_i, 0) + (1 − y_i) · max(s_i − m⁻, 0) ]

wherein m⁺ is the lower bound for foreground scores, m⁻ is the upper bound for background scores, and Z is the number of anchors obtained after the target image passes through the RPN; IoU is calculated for the Z candidate boxes, and a candidate box with IoU greater than 0.5 is classified as foreground with label y_i = 1, while one with IoU less than 0.5 is classified as background with label y_i = 0; x_i = [F(bbox_i); F(P)] is the feature vector obtained by concatenating the i-th candidate box feature F(bbox_i) in the target image with the query image feature F(P); s_i = M(x_i) is the predicted foreground probability output by the metric network M for each candidate box;
and the detection module randomly selects a query-target image pair in the paired new class data set or base class data set constructed in the construction module, inputs the query-target image pair into the small sample target detection model trained by the training module, and obtains a detection result of the new class or base class, namely, a target example belonging to the same class as the query image is found in the target image.
CN202110603033.5A 2021-05-31 2021-05-31 Small sample target detection method and system based on target interchange and metric learning Active CN113283513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110603033.5A CN113283513B (en) 2021-05-31 2021-05-31 Small sample target detection method and system based on target interchange and metric learning

Publications (2)

Publication Number Publication Date
CN113283513A CN113283513A (en) 2021-08-20
CN113283513B true CN113283513B (en) 2022-12-13

Family

ID=77282816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110603033.5A Active CN113283513B (en) 2021-05-31 2021-05-31 Small sample target detection method and system based on target interchange and metric learning

Country Status (1)

Country Link
CN (1) CN113283513B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837171B (en) * 2021-11-26 2022-02-08 成都数之联科技有限公司 Candidate region extraction method, candidate region extraction system, candidate region extraction device, medium and target detection method
CN117011575B (en) * 2022-10-27 2024-07-19 腾讯科技(深圳)有限公司 Training method and related device for small sample target detection model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652216A (en) * 2020-06-03 2020-09-11 北京工商大学 Multi-scale target detection model method based on metric learning
CN111783590A (en) * 2020-06-24 2020-10-16 西北工业大学 Multi-class small target detection method based on metric learning
CN112861720A (en) * 2021-02-08 2021-05-28 西北工业大学 Remote sensing image small sample target detection method based on prototype convolutional neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10837962B2 (en) * 2017-12-20 2020-11-17 General Electric Company Method and associated device for rapid detection of target biomolecules with enhanced sensitivity
US10832096B2 (en) * 2019-01-07 2020-11-10 International Business Machines Corporation Representative-based metric learning for classification and few-shot object detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652216A (en) * 2020-06-03 2020-09-11 北京工商大学 Multi-scale target detection model method based on metric learning
CN111783590A (en) * 2020-06-24 2020-10-16 西北工业大学 Multi-class small target detection method based on metric learning
CN112861720A (en) * 2021-02-08 2021-05-28 西北工业大学 Remote sensing image small sample target detection method based on prototype convolutional neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
One-shot object detection with co-attention and co-excitation; Ting-I Hsieh et al.; Advances in Neural Information Processing Systems 32 (NeurIPS 2019); 2019-12-31; pp. 1-10 *
Unsupervised Few-shot Object Recognition by Integrating Adversarial, Self-supervision, and Deep Metric Learning of Latent Parts; Khoi Nguyen et al.; ICLR 2020 Conference; 2019-12-24; pp. 1-10 *
Research on task-related few-shot deep learning methods for image classification (in Chinese); Chen Chen et al.; Journal of Integration Technology; 2020-05-15 (No. 03); pp. 15-25 *
Research status of few-shot object detection (in Chinese); Pan Xingjia et al.; Journal of Nanjing University of Information Science and Technology (Natural Science Edition); 2019-11-28 (No. 06); pp. 698-705 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant