CN114841257A - Small sample target detection method based on self-supervision contrast constraint - Google Patents

Small sample target detection method based on self-supervision contrast constraint

Info

Publication number: CN114841257A
Application number: CN202210421310.5A
Authority: CN (China)
Prior art keywords: sample, training, target detection, small sample, network
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114841257B (en)
Inventors: 邢薇薇, 姚杰, 刘渭滨, 张顺利, 魏翔
Assignee (original and current): Beijing Jiaotong University
Filing / priority date: 2022-04-21
Application filed by Beijing Jiaotong University; published as CN114841257A, granted as CN114841257B

Classifications

    • G06F18/2415: Pattern recognition; classification techniques relating to the classification model, based on parametric or probabilistic models (e.g. likelihood ratio or false acceptance rate versus false rejection rate)
    • G06F18/2431: Pattern recognition; classification techniques relating to the number of classes: multiple classes
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification (e.g. of video objects)
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning; neural networks
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The invention provides a small sample target detection method based on self-supervision contrast constraint. The method comprises the following steps: modeling the small sample target detection problem as a mathematical optimization problem based on self-supervised learning, and constructing a small sample target detection model sensitive to data disturbance; designing an optimized objective function of the small sample target detection model; and training the small sample target detection model with a deep learning update process based on the optimization objective function to obtain a trained small sample target detection model, which is then used to perform target detection on the small samples to be detected. The invention is based on a two-stage learning process: transfer learning is used to learn domain knowledge, and the model is fine-tuned on a small sample data set. Experimental results show that the method achieves good performance on the public PASCAL VOC data set, effectively improves model performance on the small sample target detection problem, and has strong practical application significance.

Description

Small sample target detection method based on self-supervision contrast constraint
Technical Field
The invention relates to the technical field of target detection, in particular to a small sample target detection method based on self-supervision contrast constraint.
Background
In recent years, with the development of deep convolutional neural networks, target detection has improved remarkably. However, existing target detection methods depend heavily on large amounts of annotated data; when annotated data become scarce, deep neural networks suffer from severe overfitting and fail to generalize. In reality, much data is difficult to annotate with object bounding boxes, such as object classes with rare examples or medical data. Small sample learning aims to train a model from few examples, and most existing small sample learning work focuses on the image classification problem, with only a little on the small sample target detection problem. Since target detection requires not only class prediction but also object localization, it is much more difficult than the small sample classification task. Specifically, the small sample target detection method based on self-supervision contrast constraint maximizes the difference between objects of different classes and minimizes the difference between objects of the same class when little training data is available, so that class prediction and localization of objects achieve the best effect; this is a mathematical optimization problem based on self-supervised learning.
In order to enhance class prediction and localization for small sample objects, a reasonable detection method needs to be designed. Existing methods based on two-stage fine-tuning have great advantages for improving small sample target detection; the two stages are as follows. First stage: train the base classes on large-scale data. Second stage: freeze all base class training parameters, and fine-tune the classifier and bounding box regressor with a small amount of new data. However, such small sample target detection methods still have problems: after the model is fine-tuned on new data, target objects are often mislabeled as other confusable categories.
Disclosure of Invention
The embodiment of the invention provides a small sample target detection method based on self-supervision contrast constraint so as to effectively perform target detection on a small sample.
In order to achieve the purpose, the invention adopts the following technical scheme.
A small sample target detection method based on self-supervision contrast constraint comprises the following steps:
modeling the small sample target detection problem as a mathematical optimization problem based on self-supervised learning, and constructing an input-oriented small sample target detection model sensitive to data disturbance;

designing an optimized objective function of the small sample target detection model;

and training the small sample target detection model with a deep learning update process based on the optimization objective function to obtain a trained small sample target detection model, and performing target detection on the small samples to be detected with the trained small sample target detection model.
Preferably, the modeling of the small sample target detection problem into a mathematical optimization problem based on self-supervised learning, and the constructing of the input-oriented small sample target detection model sensitive to data disturbance includes:
(1) In the first stage, construct a data set $D_{train}$ containing all training data of the base classes;

(2) In the second stage, construct a basic data set $D_{base}$, whose class information is the same as that of the data set $D_{train}$ and whose amount of training data per class is the same as in the small sample target data set $D_{novel}$;

(3) In the second stage, construct a small sample target data set $D_{novel}$, whose class information differs from the first-stage data set $D_{train}$ and the second-stage basic data set $D_{base}$, and whose number of training samples per class is the same as in the second-stage basic data set $D_{base}$;

(4) Use a contrastive loss to impose a feature consistency constraint, and propose a contrastive loss based on the predicted distribution to impose a consistency constraint on the predicted distributions of samples, by constructing positive and negative sample pairs; here $a$ denotes a sample pair, $a_p$ a positive sample pair, $a_n$ a negative sample pair, and $y_a$ the label of the sample pair:

$$y_a = \begin{cases} 1, & a = a_p \\ 0, & a = a_n \end{cases}$$

$S, S^+, S^-$ denote the sample features used to construct the positive and negative sample pairs: $S$ denotes the features corresponding to the reference sample, $S^+$ a sample feature of the same category as the reference sample with the maximum IoU value, and $S^-$ a sample feature of a category different from the reference sample, i.e. $a_p = \{S, S^+\}$, $a_n = \{S, S^-\}$.
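For concreteness, the pair construction described in (4) can be sketched as follows. This is a minimal illustration, not code from the patent: the helper name build_pairs is ours, the IoU is assumed to be measured against each candidate box's matched ground-truth box, and the two-positives/two-negatives selection anticipates step 3-7 described later in the text.

```python
import torch

def build_pairs(labels, ious, num_pos=2, num_neg=2):
    """Build positive/negative sample pairs over candidate boxes.

    labels: (N,) class labels of the candidate boxes.
    ious:   (N,) IoU of each candidate box with its matched ground truth
            (assumption; the patent does not say which IoU is meant).
    Returns index tensors (idx_a, idx_b) and pair labels y, where y = 1
    marks a positive pair a_p = {S, S+} (same class, largest IoU first)
    and y = 0 a negative pair a_n = {S, S-} (different class).
    """
    idx_a, idx_b, y = [], [], []
    for i in range(labels.numel()):
        same = (labels == labels[i]).nonzero(as_tuple=True)[0]
        same = same[same != i]
        diff = (labels != labels[i]).nonzero(as_tuple=True)[0]
        # positives: same class as the reference sample, largest IoU first
        for j in same[ious[same].argsort(descending=True)][:num_pos]:
            idx_a.append(i); idx_b.append(int(j)); y.append(1.0)
        # negatives: boxes of a different class, chosen at random
        for j in diff[torch.randperm(diff.numel())[:num_neg]]:
            idx_a.append(i); idx_b.append(int(j)); y.append(0.0)
    return torch.tensor(idx_a), torch.tensor(idx_b), torch.tensor(y)
```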
Preferably, the designing an optimized objective function of the small sample object detection model includes:
setting the optimization objective function of the small sample target detection model to comprise a base class training network optimization objective function $L_{base} = L_{rpn} + L_{cls} + L_{reg}$ and a fine-tuning network optimization objective function $L_{fine\_tune} = L_{rpn} + L_{cls} + L_{reg} + L_{contrastive} + L_{contrastive\text{-}JS}$, where the fine-tuning network adds the contrastive optimization objective functions on the basis of the base class training network;

1: $L_{rpn}$ is the region extraction network loss function, computed as in formula (1):

$$L_{rpn} = \frac{1}{N_{rpn\_cls}} \sum_i L_{rpn\_cls}(p_i, p_i^*) + \lambda \frac{1}{N_{rpn\_reg}} \sum_i p_i^* L_{rpn\_reg}(t_i, t_i^*) \tag{1}$$

the loss function of the region extraction network is divided into two parts, a classification loss function $L_{rpn\_cls}$ and a bounding box regression loss function $L_{rpn\_reg}$; $L_{rpn\_cls}$ is used for training the network to classify anchor boxes as positive and negative samples, with the complete description shown in formula (2); $L_{rpn\_reg}$ is used for the bounding box regression network training, with the complete description shown in formula (3); $N_{rpn\_cls}$ denotes the batch size of training samples in the region extraction network, $N_{rpn\_reg}$ the number of anchor boxes generated by the region extraction network, $p_i^*$ the true classification probability corresponding to the i-th anchor box, and $\lambda$ a weight balance parameter;

$$L_{rpn\_cls}(p_i, p_i^*) = -\left[\, p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \,\right] \tag{2}$$

$L_{rpn\_cls}$ uses cross entropy to compute whether an anchor box contains a target and is a binary classification loss; $p_i$ denotes the predicted classification probability of the i-th anchor box and $p_i^*$ the true classification probability corresponding to the i-th anchor box; this function is used to judge whether the extracted image region contains an object;

$$L_{rpn\_reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{3}$$

$\mathrm{smooth}_{L1}$ is shown in formula (4); $t_i = \{t_x, t_y, t_w, t_h\}$ denotes the predicted bounding box regression parameters of the i-th anchor box, and $t_i^*$ the regression parameters of the true value box corresponding to the i-th anchor box; the computation of $t_i$ and $t_i^*$ is shown in formulas (5) and (6);

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{4}$$

$$t_x = (x - x_{anchor})/w_{anchor}, \quad t_y = (y - y_{anchor})/h_{anchor}, \quad t_w = \log(w/w_{anchor}), \quad t_h = \log(h/h_{anchor}) \tag{5}$$

$x, y$ denote the center point coordinates of the predicted bounding box and $w, h$ its width and height; $x_{anchor}, y_{anchor}$ denote the center point coordinates of the current anchor box and $w_{anchor}, h_{anchor}$ its width and height;

$$t_x^* = (x^* - x_{anchor})/w_{anchor}, \quad t_y^* = (y^* - y_{anchor})/h_{anchor}, \quad t_w^* = \log(w^*/w_{anchor}), \quad t_h^* = \log(h^*/h_{anchor}) \tag{6}$$

$x^*, y^*$ denote the center point coordinates of the real bounding box of the object in the image, and $w^*, h^*$ its width and height;

2: the classification loss function $L_{cls}$ is computed as follows:

$$L_{cls}(p_i, p_i^*) = -\log p_i[p_i^*] \tag{7}$$

cross entropy is used as the classification loss function in the target detection network, where $s_i$ denotes the i-th detection box, $p_i$ the predicted classification probability of the i-th detection box, $p_i^*$ the classification truth value of the i-th detection box, and $p_i[p_i^*]$ the probability the network assigns to the true class; this function provides the basis for the classification behavior of the network: whether the network classifies the object in the detection region accurately is judged through it, and the model is updated for inaccurate objects by computing the loss value;

3: the bounding box regression loss function $L_{reg}$ is computed as follows:

$$L_{reg} = \sum_i \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{8}$$

$t_i$ and $t_i^*$ denote the predicted value and the true value of the parameterized bounding box coordinates of the i-th detection box, respectively, and $\mathrm{smooth}_{L1}$ is the smooth L1 loss; the position information of the detection region is further adjusted through this function;

4: the contrastive loss function $L_{contrastive}$ is computed as follows:

$$L_{contrastive} = y_a D_a^2 + (1 - y_a) \max(0,\, m - D_a)^2 \tag{9}$$

the sample features $S, S^+, S^-$ are constructed, giving the positive sample pair $a_p = \{S, S^+\}$ and the negative sample pair $a_n = \{S, S^-\}$; $D_a$ denotes the Euclidean distance within the positive sample pair $a_p$ or the negative sample pair $a_n$, and $y_a$ the label of sample pair $a$; when the current sample pair is a positive sample pair $a_p$, the model is updated to minimize the distance between the sample and the positive sample; $m$ denotes the upper bound of the sample pair distance: when the distance between the sample and the negative sample is greater than $m$, the loss value equals 0 and the model is not updated; otherwise the model is updated until the negative sample pair distance reaches $m$;

5: the Contrastive-JS loss function $L_{contrastive\text{-}JS}$ is computed as follows:

$$L_{contrastive\text{-}JS} = y_a\, \mathrm{JS}(p_a[1] \,\|\, p_a[2]) + (1 - y_a) \max\!\left(0,\; m' - \mathrm{JS}(p_a[1] \,\|\, p_a[2])\right) \tag{10}$$

where $p_a$ is the predicted distribution of the sample pair $a$, $y_a$ is the label of the current sample pair, $p_a[i]$ denotes the i-th predicted distribution in the sample pair, $\mathrm{JS}$ is the Jensen-Shannon divergence, and $m'$ denotes the upper bound of the sample pair distance, with the same meaning as $m$ in formula (9).
Preferably, the training of the small sample target detection model based on the optimization objective function by using the deep learning update process to obtain the trained small sample target detection model includes:
training the small sample target detection model with the optimized objective function through a two-stage deep learning model update process, wherein the two-stage deep learning process consists of a data training stage and a small sample data fine-tuning stage: in the first stage, training samples are used to train the whole detection framework, obtaining the model parameters on the basic samples; in the second stage, the network is first initialized with the first-stage model parameters and the parameters of the feature extraction module are fixed, then the model parameters are fine-tuned with the small sample data set, and a consistency strategy based on self-supervised learning is introduced to constrain the feature expression and distribution expression of the samples; finally, the training of the small sample target detection model is completed, and the trained small sample target detection model is obtained.
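This two-stage update process can be summarized in the following PyTorch-style sketch. It is illustrative only: the methods loss_base and loss_fine_tune stand for the objective functions L_base and L_fine_tune defined above, model.backbone stands for the feature extraction module, and the learning rates are assumptions (the text specifies only batch size 16, momentum 0.9 and weight decay 1e-4).

```python
import torch

def two_stage_training(model, base_loader, finetune_loader, base_steps, ft_steps):
    # Stage 1: train the whole detection framework on the base data D_train.
    opt = torch.optim.SGD(model.parameters(), lr=0.02,       # lr assumed
                          momentum=0.9, weight_decay=1e-4)
    for _, batch in zip(range(base_steps), base_loader):
        loss = model.loss_base(batch)        # L_rpn + L_cls + L_reg
        opt.zero_grad(); loss.backward(); opt.step()
    base_params = {k: v.clone() for k, v in model.state_dict().items()}

    # Stage 2: initialize from stage 1, freeze the feature extractor,
    # and fine-tune on D_base + D_novel with the contrastive terms added.
    model.load_state_dict(base_params)
    for p in model.backbone.parameters():    # fix feature-extraction module
        p.requires_grad = False
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                          lr=0.002, momentum=0.9, weight_decay=1e-4)
    for _, batch in zip(range(ft_steps), finetune_loader):
        loss = model.loss_fine_tune(batch)   # adds L_contrastive, L_contrastive-JS
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```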
Preferably, the training of the small sample target detection model based on the optimization objective function by using the deep learning update process to obtain the trained small sample target detection model includes:
step 3-1: generation of dataset D Using PASCAL VOC dataset train ,D base And D noval There are 20 classes in the PASCAL VOC data set, 15 classes of which are divided into basic classes and 5 new classes, and the base is usedAll instances of this class construct D train Randomly sampling K-1, 2, 3, 5, 10 instances from the new class and the basic class as D of K-shot base And D noval
Step 3-2: creating a base class training network taking fast-RCNN as a basic frame, selecting ResNet101 and a characteristic pyramid as a characteristic extraction network, initializing model parameters, setting a hyper-parameter, setting the standard batch size to be 16, creating a standard SGD optimizer, wherein the momentum is 0.9, and the weight attenuation is 1 e-4;
step 3-3: construction of D train The data loader is used for performing data enhancement on the original input;
step 3-4: training the base class training network, calculating the output value of each base class training sample, and calculating the loss L base Updating network parameters by using a gradient descent algorithm;
step 3-5: if the model converges or the required training steps are reached, ending the base class training network training process and storing the model parameters; otherwise, returning to the step 4-2;
step 3-6: construction of D base And D noval The data loader is used for creating a fine tuning network model, initializing a network by using model parameters obtained by a base class training network and creating an optimizer;
step 3-7: training a fine tuning network, on the basis of a base class training network, obtaining a candidate frame feature map generated by training samples after the pooling operation of an interested area, traversing a candidate frame feature map list, matching a candidate frame feature map with the same category as a positive example for each candidate frame feature map, matching a candidate frame feature map with different categories as a negative example, selecting two positive examples and a current sample to form 2 positive sample pairs, selecting two negative examples and the current sample to form 2 negative sample pairs, and calculating L for the candidate frame feature maps of the obtained positive sample pairs and the negative sample pairs contrastive (ii) a Obtaining the class probability distribution of the training samples after the classification operation, and calculating the class probability distribution of the positive sample pair and the negative sample pair contrastive Calculating the output value of each training sample, calculating the loss L fine_tune Using gradient descent algorithmsUpdating the network parameters;
step 3-8: using AP50 for new prediction (nAP50) on a PASCAL VOC 2007 test set as a model performance evaluation index, observing the convergence condition of the model, and ending the fine-tuning network training process if the model converges or reaches the required training steps; otherwise, go back to step 3-7.
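A sketch of the K-shot sampling of step 3-1 (referenced above). The annotation format is an assumption: instances is taken here to be a flat list of (image_id, class_name) object annotations, which is not how the patent specifies its data loading.

```python
import random
from collections import defaultdict

def build_fewshot_splits(instances, base_classes, novel_classes, k):
    """D_train: all instances of the 15 base classes.
    D_base / D_novel: K instances sampled per base / novel class
    (K in {1, 2, 3, 5, 10} for the K-shot settings)."""
    by_class = defaultdict(list)
    for image_id, class_name in instances:
        by_class[class_name].append((image_id, class_name))

    d_train = [x for c in base_classes for x in by_class[c]]
    d_base  = [x for c in base_classes  for x in random.sample(by_class[c], k)]
    d_novel = [x for c in novel_classes for x in random.sample(by_class[c], k)]
    return d_train, d_base, d_novel
```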
According to the technical scheme provided by the embodiment of the invention, the method is based on a two-stage learning process: transfer learning is used to learn domain knowledge, and the model is fine-tuned on a small sample data set. Experimental results show that the method achieves good performance on the public PASCAL VOC data set, effectively improves model performance on the small sample target detection problem, and has strong practical application significance.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a base class training network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a trimming network according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Self-supervised learning is an unsupervised learning method. It mainly uses auxiliary tasks to mine supervision information from large-scale unsupervised data, trains the network by constructing this supervision information, and learns universal feature expressions for downstream tasks. Self-supervised learning based on contrastive constraints mainly learns how to construct representations of similar and dissimilar things, so that the distance between a sample and a positive sample is far smaller than the distance between the sample and a negative sample; that is, self-supervised learning is realized by constructing positive and negative samples and measuring the distances between them.
The small sample target detection method based on the self-supervision contrast constraint provided by the embodiment of the invention comprises the following steps:
step S1: aiming at the characteristics of the small sample target detection problem, the small sample target detection problem is modeled into a mathematical optimization problem based on self-supervision learning, and a small sample target detection model sensitive to data disturbance is constructed for input.
Step S2: and setting an optimized objective function of the small sample target detection model.
Step S3: and training the small sample target detection model by using a deep learning updating process based on the optimization target function to obtain a trained small sample target detection model, and performing target detection on the small sample to be detected by using the trained small sample target detection model.
Specifically, the step S1 includes:
the small sample target detection model is specifically expressed as follows:
(1) In the first stage, construct a data set $D_{train}$ containing all training data of the base classes. The core of this stage is to provide initialization parameters for the second-stage model training, and at the same time to train a feature extraction module for the second stage. To obtain a good feature extraction model, the sufficiency of the data needs to be guaranteed in the first stage, so $D_{train}$ contains all the category information of the base classes and relatively complete data information;

(2) In the second stage, construct a basic data set $D_{base}$, whose class information is the same as that of $D_{train}$ and whose amount of training data per class is the same as in the small sample target data set $D_{novel}$. The second stage is the fine-tuning stage, whose main goal is to obtain a model that performs well on small sample data sets. After the first-stage training on $D_{train}$ is completed, in order to balance the target data set and the basic data set, a certain amount of basic data, namely $D_{base}$, is used for training in the second stage. At this time, the class information of $D_{base}$ is the same as that of $D_{train}$, but the amount of image data per class is the same as in the final small sample data set. In summary, $D_{base}$ has the same source as $D_{train}$ and is auxiliary data constructed so that the model achieves good performance on the small sample target data;

(3) In the second stage, construct a small sample target data set $D_{novel}$, whose class information differs from the first-stage training set and the second-stage basic data set, and whose number of samples per class is the same as in the second-stage basic data set $D_{base}$. $D_{novel}$ is the core data for evaluating the model on the small sample target detection problem. To reflect the characteristics of small sample learning, the number of samples in this data set is very limited, and the number of samples differs across evaluation settings;

(4) The biggest problem of small samples is insufficient training data. To make full use of the limited data, the invention proposes using a contrastive loss to impose a feature consistency constraint, and a contrastive loss based on the predicted distribution to impose a consistency constraint on the predicted distributions of samples. Here $a$ denotes a sample pair, $a_p$ a positive sample pair, $a_n$ a negative sample pair, and $y_a$ the label of the sample pair:

$$y_a = \begin{cases} 1, & a = a_p \\ 0, & a = a_n \end{cases}$$

$S, S^+, S^-$ denote the sample features used to construct the positive and negative sample pairs: $S$ denotes the features corresponding to the reference sample, $S^+$ a sample feature of the same category as the reference sample with the maximum IoU value, and $S^-$ a sample feature of a category different from the reference sample, i.e. $a_p = \{S, S^+\}$, $a_n = \{S, S^-\}$.
Specific constraint forms of the above-described feature consistency constraint are shown in the following equations (9) and (10).
Specifically, step S2 includes:
and setting an optimized objective function of the small sample target detection model. The invention adopts a two-stage network training process, firstly, a base class training network is trained on a large number of base class data sets, and then, fine adjustment is carried out on a balance data set, so that the optimization objective function of the small sample target detection model set by the invention can be divided into the optimization objective function L of the base class training network base =L rpn +L cls +L reg And fine tuning network optimization objective function L fine_tune =L rpn +L cls +L reg +L contrastwe +L contrastwe-JS . The goal of the fine tuning network is to maximize the difference between different classes of objects and minimize the difference between the same classes of objects with less training data available, and specifically, the fine tuning network adds a contrast optimization objective function on the basis of the base class training network.
(1) Region extraction network loss function
The function of the region extraction network is to screen out anchor boxes that may contain targets. Specifically, the region extraction network implements two functions. 1) Judging whether an anchor box contains an object or background: a specified number of anchor boxes are screened out by non-maximum suppression (NMS), and IoU thresholds are set, so that an anchor box whose IoU is larger than a given upper threshold is considered to contain a target, i.e. a positive sample, while one whose IoU is smaller than a given lower threshold is treated as background, i.e. a negative sample; other anchor boxes do not participate in training. 2) Coordinate correction and regression: finding the mapping relation between anchor boxes and true value boxes, which can be realized by translation and scaling; when an anchor box and a true value box are relatively close, the transformation between the predicted bounding box and the true value box can be considered linear, and a linear regression model can be used to fine-tune the parameterized coordinates of the bounding box. After the correction parameters of each anchor box are obtained, accurate anchor box coordinates can be computed. The complete description of the region extraction network loss function is shown in formula (1).
$$L_{rpn} = \frac{1}{N_{rpn\_cls}} \sum_i L_{rpn\_cls}(p_i, p_i^*) + \lambda \frac{1}{N_{rpn\_reg}} \sum_i p_i^* L_{rpn\_reg}(t_i, t_i^*) \tag{1}$$

The loss function of the region extraction network is divided into two parts, a classification loss function $L_{rpn\_cls}$ and a bounding box regression loss function $L_{rpn\_reg}$. $L_{rpn\_cls}$ trains the network to classify anchor boxes as positive and negative samples; its complete description is shown in formula (2). $L_{rpn\_reg}$ is used for the bounding box regression network training; its complete description is shown in formula (3). Here $N_{rpn\_cls}$ denotes the batch size of training samples in the region extraction network, $N_{rpn\_reg}$ the number of anchor boxes generated by the region extraction network, $p_i^*$ the true classification probability corresponding to the i-th anchor box, and $\lambda$ a weight balance parameter.

$$L_{rpn\_cls}(p_i, p_i^*) = -\left[\, p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \,\right] \tag{2}$$

$L_{rpn\_cls}$ uses cross entropy to compute whether the anchor box contains a target and is a binary classification loss, where $p_i$ denotes the predicted classification probability of the i-th anchor box and $p_i^*$ the true classification probability corresponding to the i-th anchor box. This function is used to judge whether the extracted image region contains an object, and mainly serves to distinguish whether the current region is foreground or background.

$$L_{rpn\_reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{3}$$

$L_{rpn\_reg}$ aims to adjust the position information of the proposal regions in the region extraction network so as to guide it to extract more accurate object positions, using $\mathrm{smooth}_{L1}$ to compute the difference between the predicted bounding box and the real bounding box. $\mathrm{smooth}_{L1}$ is shown in formula (4); $t_i = \{t_x, t_y, t_w, t_h\}$ denotes the predicted bounding box regression parameters of the i-th anchor box, and $t_i^*$ the regression parameters of the true value box corresponding to the i-th anchor box. The computation of $t_i$ and $t_i^*$ is shown in formulas (5) and (6).

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{4}$$

$$t_x = (x - x_{anchor})/w_{anchor}, \quad t_y = (y - y_{anchor})/h_{anchor}, \quad t_w = \log(w/w_{anchor}), \quad t_h = \log(h/h_{anchor}) \tag{5}$$

$x, y$ denote the center point coordinates of the predicted bounding box and $w, h$ its width and height; $x_{anchor}, y_{anchor}$ denote the center point coordinates of the current anchor box and $w_{anchor}, h_{anchor}$ its width and height.

$$t_x^* = (x^* - x_{anchor})/w_{anchor}, \quad t_y^* = (y^* - y_{anchor})/h_{anchor}, \quad t_w^* = \log(w^*/w_{anchor}), \quad t_h^* = \log(h^*/h_{anchor}) \tag{6}$$

$x^*, y^*$ denote the center point coordinates of the real bounding box of the object in the image, and $w^*, h^*$ the width and height of the real bounding box of the object in the image.
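The anchor labeling described under (1) and the box parameterization of formulas (5) and (6) can be sketched as follows. The IoU thresholds 0.7/0.3 are the customary Faster-RCNN defaults, assumed here because the text does not state the values.

```python
import torch

def label_anchors(ious, hi=0.7, lo=0.3):
    """ious: (A,) max IoU of each anchor box with any ground-truth box.
    Returns 1 (positive), 0 (negative/background), -1 (ignored in training)."""
    labels = torch.full_like(ious, -1.0)
    labels[ious >= hi] = 1.0
    labels[ious < lo] = 0.0
    return labels

def encode_boxes(boxes, anchors):
    """Formulas (5)/(6): map (x, y, w, h) boxes to regression parameters
    (t_x, t_y, t_w, t_h) relative to anchors; both inputs are (A, 4)
    tensors in center-size form."""
    tx = (boxes[:, 0] - anchors[:, 0]) / anchors[:, 2]
    ty = (boxes[:, 1] - anchors[:, 1]) / anchors[:, 3]
    tw = torch.log(boxes[:, 2] / anchors[:, 2])
    th = torch.log(boxes[:, 3] / anchors[:, 3])
    return torch.stack([tx, ty, tw, th], dim=1)
```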
(2) Classification loss function
$$L_{cls}(p_i, p_i^*) = -\log p_i[p_i^*] \tag{7}$$

Cross entropy is used as the classification loss function in the target detection network, where $s_i$ denotes the i-th detection box, $p_i$ the predicted classification probability of the i-th detection box, $p_i^*$ the classification truth value of the i-th detection box, and $p_i[p_i^*]$ the probability the network assigns to the true class. This function provides the basis for the classification behavior of the network; whether the network classifies the object in the detection region accurately can be judged through it, and the model is updated for inaccurate objects by computing the loss value.
(3) Bounding box regression loss function
$$L_{reg} = \sum_i \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{8}$$

The bounding box regression loss function used in the target detection network is the same as formula (3), where $t_i$ and $t_i^*$ denote the predicted value and the true value of the parameterized bounding box coordinates of the i-th detection box, respectively, and $\mathrm{smooth}_{L1}$ is the smooth L1 loss. The position information of the detection region can be further adjusted through this function.
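Formulas (7), (8) and the smooth L1 of formula (4) map directly onto standard PyTorch losses; a minimal sketch (the reduction and normalization choices here are ours):

```python
import torch.nn.functional as F

def detection_head_losses(class_logits, gt_classes, box_deltas, gt_deltas):
    """class_logits: (N, C) predictions for the N detection boxes;
    gt_classes: (N,) true class indices -> cross entropy, formula (7).
    box_deltas / gt_deltas: (N, 4) predicted / true regression
    parameters -> smooth L1 with beta = 1, formulas (8) and (4)."""
    l_cls = F.cross_entropy(class_logits, gt_classes)
    l_reg = F.smooth_l1_loss(box_deltas, gt_deltas, beta=1.0)
    return l_cls, l_reg
```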
(4) Contrast loss function
Since a detection box can be regarded as a perturbed variant of the ground-truth target, the contrastive loss function constructs positive sample pairs $a_p$ and negative sample pairs $a_n$ of detection boxes, reduces the distance within positive sample pairs $a_p$, and enlarges the distance within negative sample pairs $a_n$. By controlling the feature expression of the sample pairs during model training, the feature expressions of objects of the same category become closer in the model, and the differences between the feature expressions of objects of different categories become more obvious, so that the feature expression of the detection boxes is learned better.

$$L_{contrastive} = y_a D_a^2 + (1 - y_a) \max(0,\, m - D_a)^2 \tag{9}$$

The sample features $S, S^+, S^-$ are constructed, giving the positive sample pair $a_p = \{S, S^+\}$ and the negative sample pair $a_n = \{S, S^-\}$. $D_a$ denotes the Euclidean distance within the positive sample pair $a_p$ or the negative sample pair $a_n$, and $y_a$ the label of sample pair $a$. When the current sample pair is a positive sample pair $a_p$, the model is updated to minimize the distance between the sample and the positive sample. $m$ denotes the upper bound of the sample pair distance: when the distance between the sample and the negative sample is greater than $m$, the loss value equals 0 and the model is not updated; otherwise the model is updated until the negative sample pair distance reaches $m$.
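Formula (9) translates directly into code; a sketch (the margin value m is a hyperparameter whose value the text does not specify):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(s, s_other, y, m=1.0):
    """Formula (9). s, s_other: (P, d) features of the two members of each
    pair; y: (P,) pair labels (1 = positive pair a_p, 0 = negative pair
    a_n); m: upper bound of the negative-pair distance."""
    d = F.pairwise_distance(s, s_other)       # Euclidean distance D_a
    pos = y * d.pow(2)                        # pull positive pairs together
    neg = (1 - y) * F.relu(m - d).pow(2)      # push negative pairs beyond m
    return (pos + neg).mean()
```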
(5) Contrastive-JS loss function
To extend the effect of the contrastive constraint, besides using the contrastive loss function to constrain the feature learning process, the invention proposes a Contrastive-JS loss function to provide guidance for the predicted distributions, so that a consistency constraint is imposed on the predicted distributions generated by the classifier and the model becomes more sensitive to the class information of objects. Its specific form is shown in formula (10).

$$L_{contrastive\text{-}JS} = y_a\, \mathrm{JS}(p_a[1] \,\|\, p_a[2]) + (1 - y_a) \max\!\left(0,\; m' - \mathrm{JS}(p_a[1] \,\|\, p_a[2])\right) \tag{10}$$

where $p_a$ is the predicted distribution of the sample pair $a$, $y_a$ is the label of the current sample pair, $p_a[i]$ denotes the i-th predicted distribution in the sample pair, $\mathrm{JS}$ is the Jensen-Shannon divergence, and $m'$ denotes the upper bound of the sample pair distance, with the same meaning as $m$ in formula (9).
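A sketch of formula (10) as reconstructed above, replacing the Euclidean distance of formula (9) with the Jensen-Shannon divergence between the two predicted class distributions of a pair; the exact form (e.g. whether the negative-pair term is squared) is our assumption.

```python
import torch

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two batches of distributions (P, C)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps).log() - (b + eps).log())).sum(dim=1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def contrastive_js_loss(p1, p2, y, m_prime=1.0):
    """Formula (10): consistency constraint on the classifier's predicted
    distributions. p1, p2: (P, C) softmax outputs of the two samples of
    each pair; y: (P,) pair labels; m_prime plays the role of m in (9)."""
    js = js_divergence(p1, p2)
    return (y * js + (1 - y) * torch.relu(m_prime - js)).mean()
```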
Specifically, step S3 includes:
aiming at the problem of small sample target detection, the invention constructs a two-stage deep learning model updating process and trains a small sample target detection model by using an optimized objective function. The two-stage learning process consists of two stages of sufficient data training and small sample data fine tuning. In the first stage, sufficient training samples are used for training the whole detection frame to obtain model parameters of the model on the basic sample; in the second stage, firstly, the model parameters in the first stage are used for carrying out parameter initialization on the network, the parameters of the feature extraction module are fixed, the model parameters are finely adjusted by using a small sample data set, and in addition, a consistency strategy based on self-supervision learning provided by the invention is introduced in the second stage to restrain the feature expression and distribution expression of the sample, and finally, model training is completed.
The specific process is as follows:
step 3-1: generation of dataset D Using PASCAL VOC dataset train ,D base And D noval There are 20 classes in the PASCAL VOC data set, 15 of which are divided into a basic class and 5 new classes. Building D with all instances of the base class train Randomly sampling K-1, 2, 3, 5, 10 instances from the new class and the basic class as D of K-shot base And D noval . When partitioning is performed on the PASCAL VOCs, three different random partitioning approaches are employed, referred to as Split1, Split2, and Split 3.
Step 3-2: creating a base class training network taking fast-RCNN as a basic frame, selecting ResNet101 and a feature pyramid as a feature extraction network, initializing model parameters, setting hyper-parameters, and setting the standard batch size to be 16. A standard SGD optimizer was created with momentum of 0.9 and weight decay of 1 e-4.
Step 3-3: construction of D train And the data loader is used for performing data enhancement on the original input.
Step 3-4: training the base class training network, calculating the output value of each base class training sample, and calculating the loss L base The network parameters are updated using a gradient descent algorithm.
Step 3-5: if the model converges or the required training steps are reached, ending the base class training network training process and storing the model parameters; otherwise, go back to step 4-2.
Step 3-6: construction of D base And D noval And the data loader is used for creating a fine tuning network model, initializing the network by using model parameters obtained by the base class training network and creating the optimizer.
Step 3-7: training a fine tuning network, on the basis of a base class training network, obtaining a candidate frame feature map generated by training samples after the pooling operation of an interested area, traversing a candidate frame feature map list, matching a candidate frame feature map with the same category as a positive example for each candidate frame feature map, matching a candidate frame feature map with different categories as a negative example, selecting two positive examples and a current sample to form 2 positive sample pairs, selecting two negative examples and the current sample to form 2 negative sample pairs, and calculating L for the candidate frame feature maps of the obtained positive sample pairs and the negative sample pairs contrastive (ii) a Obtaining the class probability distribution of the training samples after the classification operation, and calculating the class probability distribution of the positive sample pair and the negative sample pair contrastive Calculating the output value of each training sample, calculating the loss L fine_tune The network parameters are updated using a gradient descent algorithm.
Step 3-8: using AP50 as a model performance evaluation index on a PASCAL VOC 2007 test set, evaluating the performance of the model on a new category, observing the convergence condition of the model, and ending the fine-tuning network training process if the model converges or reaches the required training steps; otherwise, go back to step 3-7.
Results of the experiment
Table 1 compares the method designed by the invention with previous small sample target detection algorithms. As the table shows, the method achieves the highest accuracy under the different base class / new class settings, i.e. the different data set splits; when the number of new class instances is 1, the detection accuracy improves by up to 7.0% over the strongest prior small sample target detection algorithm.
Table 1. Comparative experimental results of the invention under the different data set splits (the table is reproduced as an image in the original publication).
In summary, the small sample target detection method based on self-supervision contrast constraint provided by the invention enhances small sample target detection by using the self-supervised contrastive constraint. Compared with traditional algorithms that indirectly adjust the parameters of the region proposal network and the feature pyramid through fully connected layers, the method directly influences feature extraction; specifically, it directly constrains the parameter updates of the region proposal network and the feature pyramid, introduces no new parameters into the network, and adds no extra computation.
Compared with traditional algorithms, the method enhances the classification and localization ability for small sample targets by using the contrastive loss, and stably improves new class target detection under different instance counts.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, for apparatus or system embodiments, which are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to the partial description of the method embodiments. The above-described apparatus and system embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A small sample target detection method based on self-supervision contrast constraint is characterized by comprising the following steps:
modeling the small sample target detection problem as a mathematical optimization problem based on self-supervised learning, and constructing an input-oriented small sample target detection model sensitive to data disturbance;

designing an optimized objective function of the small sample target detection model;

and training the small sample target detection model with a deep learning update process based on the optimization objective function to obtain a trained small sample target detection model, and performing target detection on the small samples to be detected with the trained small sample target detection model.
2. The method of claim 1, wherein the modeling of the small sample target detection problem as a mathematical optimization problem based on self-supervised learning, and the constructing of the input-oriented small sample target detection model sensitive to data disturbance comprises:
(1) In the first stage, construct a data set $D_{train}$ containing all training data of the base classes;

(2) In the second stage, construct a basic data set $D_{base}$, whose class information is the same as that of the data set $D_{train}$ and whose amount of training data per class is the same as in the small sample target data set $D_{novel}$;

(3) In the second stage, construct a small sample target data set $D_{novel}$, whose class information differs from the first-stage data set $D_{train}$ and the second-stage basic data set $D_{base}$, and whose number of training samples per class is the same as in the second-stage basic data set $D_{base}$;

(4) Use a contrastive loss to impose a feature consistency constraint, and propose a contrastive loss based on the predicted distribution to impose a consistency constraint on the predicted distributions of samples, by constructing positive and negative sample pairs; here $a$ denotes a sample pair, $a_p$ a positive sample pair, $a_n$ a negative sample pair, and $y_a$ the label of the sample pair:

$$y_a = \begin{cases} 1, & a = a_p \\ 0, & a = a_n \end{cases}$$

$S, S^+, S^-$ denote the sample features used to construct the positive and negative sample pairs: $S$ denotes the features corresponding to the reference sample, $S^+$ a sample feature of the same category as the reference sample with the maximum IoU value, and $S^-$ a sample feature of a category different from the reference sample, i.e. $a_p = \{S, S^+\}$, $a_n = \{S, S^-\}$.
3. The method of claim 1, wherein the designing an optimized objective function for the small sample object detection model comprises:
setting the optimization objective function of the small sample target detection model to comprise a base class training network optimization objective function $L_{base} = L_{rpn} + L_{cls} + L_{reg}$ and a fine-tuning network optimization objective function $L_{fine\_tune} = L_{rpn} + L_{cls} + L_{reg} + L_{contrastive} + L_{contrastive\text{-}JS}$, where the fine-tuning network adds the contrastive optimization objective functions on the basis of the base class training network;

1: $L_{rpn}$ is the region extraction network loss function, computed as in formula (1):

$$L_{rpn} = \frac{1}{N_{rpn\_cls}} \sum_i L_{rpn\_cls}(p_i, p_i^*) + \lambda \frac{1}{N_{rpn\_reg}} \sum_i p_i^* L_{rpn\_reg}(t_i, t_i^*) \tag{1}$$

the loss function of the region extraction network is divided into two parts, a classification loss function $L_{rpn\_cls}$ and a bounding box regression loss function $L_{rpn\_reg}$; $L_{rpn\_cls}$ is used for training the network to classify anchor boxes as positive and negative samples, with the complete description shown in formula (2); $L_{rpn\_reg}$ is used for the bounding box regression network training, with the complete description shown in formula (3); $N_{rpn\_cls}$ denotes the batch size of training samples in the region extraction network, $N_{rpn\_reg}$ the number of anchor boxes generated by the region extraction network, $p_i^*$ the true classification probability corresponding to the i-th anchor box, and $\lambda$ a weight balance parameter;

$$L_{rpn\_cls}(p_i, p_i^*) = -\left[\, p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \,\right] \tag{2}$$

$L_{rpn\_cls}$ uses cross entropy to compute whether an anchor box contains a target and is a binary classification loss; $p_i$ denotes the predicted classification probability of the i-th anchor box and $p_i^*$ the true classification probability corresponding to the i-th anchor box; this function is used to judge whether the extracted image region contains an object;

$$L_{rpn\_reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{3}$$

$\mathrm{smooth}_{L1}$ is shown in formula (4); $t_i = \{t_x, t_y, t_w, t_h\}$ denotes the predicted bounding box regression parameters of the i-th anchor box, and $t_i^*$ the regression parameters of the true value box corresponding to the i-th anchor box; the computation of $t_i$ and $t_i^*$ is shown in formulas (5) and (6);

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{4}$$

$$t_x = (x - x_{anchor})/w_{anchor}, \quad t_y = (y - y_{anchor})/h_{anchor}, \quad t_w = \log(w/w_{anchor}), \quad t_h = \log(h/h_{anchor}) \tag{5}$$

$x, y$ denote the center point coordinates of the predicted bounding box and $w, h$ its width and height; $x_{anchor}, y_{anchor}$ denote the center point coordinates of the current anchor box and $w_{anchor}, h_{anchor}$ its width and height;

$$t_x^* = (x^* - x_{anchor})/w_{anchor}, \quad t_y^* = (y^* - y_{anchor})/h_{anchor}, \quad t_w^* = \log(w^*/w_{anchor}), \quad t_h^* = \log(h^*/h_{anchor}) \tag{6}$$

$x^*, y^*$ denote the center point coordinates of the real bounding box of the object in the image, and $w^*, h^*$ its width and height;

2: the classification loss function $L_{cls}$ is computed as follows:

$$L_{cls}(p_i, p_i^*) = -\log p_i[p_i^*] \tag{7}$$

cross entropy is used as the classification loss function in the target detection network, where $s_i$ denotes the i-th detection box, $p_i$ the predicted classification probability of the i-th detection box, $p_i^*$ the classification truth value of the i-th detection box, and $p_i[p_i^*]$ the probability the network assigns to the true class; this function provides the basis for the classification behavior of the network: whether the network classifies the object in the detection region accurately is judged through it, and the model is updated for inaccurate objects by computing the loss value;

3: the bounding box regression loss function $L_{reg}$ is computed as follows:

$$L_{reg} = \sum_i \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{8}$$

$t_i$ and $t_i^*$ denote the predicted value and the true value of the parameterized bounding box coordinates of the i-th detection box, respectively, and $\mathrm{smooth}_{L1}$ is the smooth L1 loss; the position information of the detection region is further adjusted through this function;

4: the contrastive loss function $L_{contrastive}$ is computed as follows:

$$L_{contrastive} = y_a D_a^2 + (1 - y_a) \max(0,\, m - D_a)^2 \tag{9}$$

the sample features $S, S^+, S^-$ are constructed, giving the positive sample pair $a_p = \{S, S^+\}$ and the negative sample pair $a_n = \{S, S^-\}$; $D_a$ denotes the Euclidean distance within the positive sample pair $a_p$ or the negative sample pair $a_n$, and $y_a$ the label of sample pair $a$; when the current sample pair is a positive sample pair $a_p$, the model is updated to minimize the distance between the sample and the positive sample; $m$ denotes the upper bound of the sample pair distance: when the distance between the sample and the negative sample is greater than $m$, the loss value equals 0 and the model is not updated; otherwise the model is updated until the negative sample pair distance reaches $m$;

5: the Contrastive-JS loss function $L_{contrastive\text{-}JS}$ is computed as follows:

$$L_{contrastive\text{-}JS} = y_a\, \mathrm{JS}(p_a[1] \,\|\, p_a[2]) + (1 - y_a) \max\!\left(0,\; m' - \mathrm{JS}(p_a[1] \,\|\, p_a[2])\right) \tag{10}$$

where $p_a$ is the predicted distribution of the sample pair $a$, $y_a$ is the label of the current sample pair, $p_a[i]$ denotes the i-th predicted distribution in the sample pair, $\mathrm{JS}$ is the Jensen-Shannon divergence, and $m'$ denotes the upper bound of the sample pair distance, with the same meaning as $m$ in formula (9).
4. The method of claim 3, wherein the training of the small sample target detection model based on the optimization objective function using a deep learning update process to obtain the trained small sample target detection model comprises:
training the small sample target detection model with the optimized objective function through a two-stage deep learning model update process, wherein the two-stage deep learning process consists of a data training stage and a small sample data fine-tuning stage: in the first stage, training samples are used to train the whole detection framework, obtaining the model parameters on the basic samples; in the second stage, the network is first initialized with the first-stage model parameters and the parameters of the feature extraction module are fixed, then the model parameters are fine-tuned with the small sample data set, and a consistency strategy based on self-supervised learning is introduced to constrain the feature expression and distribution expression of the samples; finally, the training of the small sample target detection model is completed, and the trained small sample target detection model is obtained.
5. The method of claim 4, wherein the training of the small sample target detection model based on the optimization objective function using a deep learning update process to obtain the trained small sample target detection model comprises:
step 3-1: generating the data sets $D_{train}$, $D_{base}$ and $D_{novel}$ from the PASCAL VOC data set; the PASCAL VOC data set contains 20 classes, of which 15 are divided into base classes and 5 into novel classes; $D_{train}$ is constructed with all instances of the base classes, and for the K-shot setting, K = 1, 2, 3, 5, 10 instances are randomly sampled from the novel classes and the base classes to form $D_{base}$ and $D_{novel}$;
step 3-2: creating a base-class training network with Faster R-CNN as the basic framework, selecting ResNet-101 with a feature pyramid network as the feature extraction network, initializing the model parameters and setting the hyper-parameters: the standard batch size is set to 16, and a standard SGD optimizer is created with momentum 0.9 and weight decay 1e-4;
step 3-3: constructing the $D_{train}$ data loader, which performs data augmentation on the raw input;
step 3-4: training the base-class training network: calculating the output value of each base-class training sample, calculating the loss $L_{base}$, and updating the network parameters with a gradient descent algorithm;
step 3-5: if the model has converged or the required number of training steps has been reached, ending the base-class training process and saving the model parameters; otherwise, returning to step 3-4;
step 3-6: constructing the $D_{base}$ and $D_{novel}$ data loaders, creating the fine-tuning network model, initializing the network with the model parameters obtained from the base-class training network, and creating the optimizer;
step 3-7: training the fine-tuning network: on the basis of the base-class training network, the candidate-box feature maps generated by the training samples are obtained after the region-of-interest pooling operation; the candidate-box feature-map list is traversed, and for each candidate-box feature map, candidate-box feature maps of the same class are matched as positive examples and candidate-box feature maps of different classes as negative examples; two positive examples are selected to form 2 positive sample pairs with the current sample, and two negative examples to form 2 negative sample pairs, and $L_{contrastive}$ is calculated for the candidate-box feature maps of the obtained positive and negative sample pairs; the class probability distributions of the training samples are obtained after the classification operation, and $L_{contrastive\text{-}JS}$ is calculated for the class probability distributions of the positive and negative sample pairs (the pair construction is sketched after these steps); the output value of each training sample is calculated, the loss $L_{fine\_tune}$ is calculated, and the network parameters are updated with a gradient descent algorithm;
step 3-8: using the AP50 on the novel classes (nAP50) of the PASCAL VOC 2007 test set as the model performance evaluation index and observing the convergence of the model; if the model has converged or the required number of training steps has been reached, ending the fine-tuning network training process; otherwise, returning to step 3-7.
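A minimal Python sketch of the K-shot split in step 3-1 and the positive/negative pair matching in step 3-7, assuming samples are simple (data, label) pairs; all function and variable names here are illustrative.

```python
import random
from collections import defaultdict

def build_kshot_split(instances, base_classes, novel_classes, k):
    # instances: list of (data, class_label) pairs; assumes at least k
    # instances exist per class.
    d_train = [x for x in instances if x[1] in base_classes]  # all base instances
    by_class = defaultdict(list)
    for x in instances:
        by_class[x[1]].append(x)
    # K instances per class, sampled from base and novel classes alike.
    d_base = [x for c in base_classes for x in random.sample(by_class[c], k)]
    d_novel = [x for c in novel_classes for x in random.sample(by_class[c], k)]
    return d_train, d_base, d_novel

def build_pairs(labels):
    # For each candidate box, match up to 2 same-class boxes as positive
    # pairs (y=1) and up to 2 different-class boxes as negative pairs (y=0).
    pairs = []  # (anchor_index, other_index, y)
    for i, li in enumerate(labels):
        pos = [j for j, lj in enumerate(labels) if j != i and lj == li]
        neg = [j for j, lj in enumerate(labels) if lj != li]
        for j in random.sample(pos, min(2, len(pos))):
            pairs.append((i, j, 1))
        for j in random.sample(neg, min(2, len(neg))):
            pairs.append((i, j, 0))
    return pairs
```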
CN202210421310.5A 2022-04-21 2022-04-21 Small sample target detection method based on self-supervision contrast constraint Active CN114841257B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210421310.5A CN114841257B (en) 2022-04-21 2022-04-21 Small sample target detection method based on self-supervision contrast constraint

Publications (2)

Publication Number Publication Date
CN114841257A (en) 2022-08-02
CN114841257B (en) 2023-09-22

Family

ID=82566522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210421310.5A Active CN114841257B (en) 2022-04-21 2022-04-21 Small sample target detection method based on self-supervision contrast constraint

Country Status (1)

Country Link
CN (1) CN114841257B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254555A1 (en) * 2014-03-04 2015-09-10 SignalSense, Inc. Classifying data with deep learning neural records incrementally refined through expert input
CN108985334A (en) * 2018-06-15 2018-12-11 广州深域信息科技有限公司 The generic object detection system and method for Active Learning are improved based on self-supervisory process
WO2020144508A1 (en) * 2019-01-07 2020-07-16 International Business Machines Corporation Representative-based metric learning for classification and few-shot object detection
WO2022037233A1 (en) * 2020-08-18 2022-02-24 浙江大学 Small sample visual target identification method based on self-supervised knowledge transfer
CN112069921A (en) * 2020-08-18 2020-12-11 浙江大学 Small sample visual target identification method based on self-supervision knowledge migration
CN112464879A (en) * 2020-12-10 2021-03-09 山东易视智能科技有限公司 Ocean target detection method and system based on self-supervision characterization learning
CN112712049A (en) * 2021-01-11 2021-04-27 中国电子科技集团公司第十五研究所 Satellite image ship model identification method under small sample condition
CN112820322A (en) * 2021-03-18 2021-05-18 中国科学院声学研究所 Semi-supervised audio event labeling method based on self-supervised contrast learning
CN113379718A (en) * 2021-06-28 2021-09-10 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113392855A (en) * 2021-07-12 2021-09-14 昆明理工大学 Small sample target detection method based on attention and comparative learning
CN113642574A (en) * 2021-07-30 2021-11-12 中国人民解放军军事科学院国防科技创新研究院 Small sample target detection method based on feature weighting and network fine tuning
CN114202074A (en) * 2021-11-09 2022-03-18 北京百度网讯科技有限公司 Pre-training model generation method, device and equipment for target detection task
CN114119966A (en) * 2021-12-01 2022-03-01 中山大学 Small sample target detection method based on multi-view learning and meta-learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BO SUN et al.: "FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding", arXiv, pages 1-11 *
DAN HENDRYCKS et al.: "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty", arXiv, pages 1-15 *
YANG XIAO et al.: "Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild", arXiv, pages 1-18 *
ZHANG Zhenwei et al.: "A Survey of Few-Shot Image Object Detection", Computer Engineering and Applications, vol. 58, no. 5, pages 1-11 *
ZHANG Lingling et al.: "Interpretable Few-Shot Learning Based on Contrastive Constraints", Journal of Computer Research and Development, vol. 58, no. 12, pages 2573-2584 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393634A (en) * 2022-08-11 2022-11-25 重庆邮电大学 Transfer learning strategy-based small-sample target real-time detection method
CN115393634B (en) * 2022-08-11 2023-12-26 重庆邮电大学 Small sample target real-time detection method based on migration learning strategy
CN116310894B (en) * 2023-02-22 2024-04-16 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN116310894A (en) * 2023-02-22 2023-06-23 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN116228715A (en) * 2023-02-28 2023-06-06 抖音视界有限公司 Training method of polyp detection model, polyp detection method and related device
CN116228715B (en) * 2023-02-28 2023-09-22 抖音视界有限公司 Training method of polyp detection model, polyp detection method and related device
CN116452858A (en) * 2023-03-24 2023-07-18 哈尔滨市科佳通用机电股份有限公司 Rail wagon connecting pull rod round pin breaking fault identification method and system
CN116452858B (en) * 2023-03-24 2023-12-15 哈尔滨市科佳通用机电股份有限公司 Rail wagon connecting pull rod round pin breaking fault identification method and system
CN117409250A (en) * 2023-10-27 2024-01-16 北京信息科技大学 Small sample target detection method, device and medium
CN117409250B (en) * 2023-10-27 2024-04-30 北京信息科技大学 Small sample target detection method, device and medium
CN117292213A (en) * 2023-11-27 2023-12-26 江西啄木蜂科技有限公司 Pine color-changing different wood identification method for unbalanced samples under multiple types of cameras
CN117292213B (en) * 2023-11-27 2024-01-30 江西啄木蜂科技有限公司 Pine color-changing different wood identification method for unbalanced samples under multiple types of cameras
CN117690011A (en) * 2024-02-04 2024-03-12 中国海洋大学 Target detection method suitable for noisy underwater scene and model building method thereof
CN117690011B (en) * 2024-02-04 2024-04-19 中国海洋大学 Target detection method suitable for noisy underwater scene and model building method thereof

Also Published As

Publication number Publication date
CN114841257B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN114841257B (en) Small sample target detection method based on self-supervision contrast constraint
CN112101430B (en) Anchor frame generation method for image target detection processing and lightweight target detection method
CN108095716B (en) Electrocardiosignal detection method based on confidence rule base and deep neural network
CN103559504A (en) Image target category identification method and device
CN112115967B (en) Image increment learning method based on data protection
CN114139676A (en) Training method of domain adaptive neural network
CN110555459A (en) Score prediction method based on fuzzy clustering and support vector regression
CN115908908B (en) Remote sensing image aggregation type target recognition method and device based on graph attention network
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN106021402A (en) Multi-modal multi-class Boosting frame construction method and device for cross-modal retrieval
CN111144462B (en) Unknown individual identification method and device for radar signals
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN112560948A (en) Eye fundus map classification method and imaging method under data deviation
CN115393631A (en) Hyperspectral image classification method based on Bayesian layer graph convolution neural network
CN113837266B (en) Software defect prediction method based on feature extraction and Stacking ensemble learning
CN117274750B (en) Knowledge distillation semi-automatic visual labeling method and system
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure
CN113656707A (en) Financing product recommendation method, system, storage medium and equipment
CN112651499A (en) Structural model pruning method based on ant colony optimization algorithm and interlayer information
CN117437507A (en) Prejudice evaluation method for evaluating image recognition model
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN117034060A (en) AE-RCNN-based flood classification intelligent forecasting method
CN115273645B (en) Map making method for automatically clustering indoor surface elements
CN116208399A (en) Network malicious behavior detection method and device based on metagraph
CN114120367B (en) Pedestrian re-recognition method and system based on circle loss measurement under meta-learning framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant