CN111582345A - Target identification method for complex environment under small sample - Google Patents

Target identification method for complex environment under small sample Download PDF

Info

Publication number
CN111582345A
CN111582345A CN202010358400.5A CN202010358400A
Authority
CN
China
Prior art keywords
training
network
data
data set
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010358400.5A
Other languages
Chinese (zh)
Inventor
姚远
郑志浩
张学睿
张帆
尚明生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS filed Critical Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN202010358400.5A priority Critical patent/CN111582345A/en
Publication of CN111582345A publication Critical patent/CN111582345A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a target identification method for complex environments under small samples, belonging to the technical field of image processing. The method comprises the following steps: 1) data expansion, specifically comprising: S11: constructing and training a GAN network; S12: after the GAN network training is finished, screening the data set generated by the GAN and mixing the result with the real data set to form a new data set, thereby obtaining an expanded small-sample data set; labeling the new data set, and taking the labeled new data set as the input of YOLOv3; 2) target identification, specifically comprising: S21: constructing and training a YOLOv3 network; S22: after the coordinate, confidence and classification training of the YOLOv3 network is completed, inputting the new data set into the YOLOv3 network, performing NMS processing on the detection frames finally remaining in the picture, deleting redundant frames, and outputting the picture with the detection frames. The method solves the problem that targets are difficult to identify clearly in complex environments under small samples.

Description

Target identification method for complex environment under small sample
Technical Field
The invention belongs to the technical field of image processing, and relates to a target identification method for a complex environment under a small sample.
Background
In actual engineering, the acquired data samples are usually insufficient, so the model is under-trained and tends to overfit. In complex scenes, overexposure occurs and the number of positive samples exceeds the number of negative samples. Under such circumstances, a method for identifying targets in complex environments from small samples is urgently needed.
In recent years, recognition technology based on deep learning has developed rapidly, and deep convolutional classification networks represented by GoogLeNet, VGG, ResNet and SENet have been particularly successful in industry and academia. Compared with traditional image classification and recognition techniques, deep convolutional classification networks unify feature extraction and feature classification into a single framework for joint training, which removes the manual feature engineering and the semantic gap of traditional recognition methods. However, these classification models are end-to-end supervised models whose high accuracy depends on a large amount of labeled data; when data are scarce, the models easily overfit, yielding poor generalization and lower accuracy. Data augmentation and regularization techniques can only alleviate the small-sample target identification problem and cannot solve it fundamentally.
In order to solve the problem of a small number of collected samples, a natural approach is to expand the sample data with a Generative Adversarial Network (GAN), a deep learning model that has become one of the most promising methods for unsupervised learning over complex distributions in recent years. The model contains two modules, a generative model and a discriminative model, whose mutual adversarial game produces high-quality outputs and thereby yields expanded samples.
Disclosure of Invention
In view of this, the present invention aims to provide a target identification method for complex environments under small samples, which expands the sample data through a GAN network to solve the problem of difficult sample collection, and then feeds the expanded sample data into a YOLOv3 network, improving the network's recognition ability by learning the offset of the bounding-box center point, thereby solving the problem that targets are difficult to identify clearly in complex environments under small samples.
In order to achieve the purpose, the invention provides the following technical scheme:
a target identification method of a complex environment under a small sample comprises the following steps:
S1: data expansion, specifically comprising:
S11: constructing and training a GAN network;
S12: after the GAN network training is finished, screening the data set generated by the GAN and mixing the result with the real data set to form a new data set, thereby obtaining an expanded small-sample data set; labeling the new data set, and taking the labeled new data set as the input of YOLOv3;
S2: target identification, which specifically comprises the following steps:
S21: constructing and training a YOLOv3 network;
S22: after the coordinate, confidence and classification training of the YOLOv3 network is completed, inputting the new data set into the YOLOv3 network, performing NMS processing on the detection frames finally remaining in the picture, deleting redundant frames, and outputting the picture with the detection frames.
Further, in step S11, the constructed GAN network comprises a generator C and a discriminator T. The generator C has a single input, namely noise data conforming to a probability distribution such as a Gaussian, Bernoulli or uniform distribution; here the noise data are assumed to follow a Gaussian distribution, and the role of C is to generate a new picture from the input noise data. The discriminator T has two inputs: one is the real data set, whose label is automatically set to 1, and the other is the data generated by the generator C, whose label is automatically set to 0; the function of T is to distinguish the real data from the generated data as well as possible, so T can be regarded as a binary classification network;
the loss function of the GAN network is:
$$\min_C \max_T V(T,C) = \mathbb{E}_{t \sim A_{true}(t)}[\log T(t)] + \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
where $t \sim A_{true}(t)$ indicates that the data $t$ come from the real data set and $T(t)$ is the discriminator's score for $t$; $n \sim A_{noise}(n)$ indicates that the noise $n$ comes from the noise distribution and $C(n)$ is the data set generated by the generator C.
Further, in step S11, the GAN network is trained by single alternating iterations of the generator C and the discriminator T; before training, the generator C is initialized randomly and the discriminator T is pre-trained so that T has a certain classification ability when training starts;
the specific steps of GAN network training are as follows:
1) fix the generator C, train the discriminator T, and execute the following steps K times in a loop:
① sample n objects from the noise data $A_{noise}$ to generate a set $n \sim A_{noise}(n)$;
② sample t objects from the real data set $A_{true}(t)$ to generate a set $t \sim A_{true}(t)$;
③ input $n \sim A_{noise}(n)$ into the generator C to generate a new data set C(n);
④ input C(n) and $t \sim A_{true}(t)$ into T and train with the following formula as the loss function; the loss function is similar to that of a binary classification network, and when discriminating, T tends to push the score of the data in $A_{true}(t)$ close to 1 and the score of the data in C(n) close to 0:
$$\max_T \; \mathbb{E}_{t \sim A_{true}(t)}[\log T(t)] + \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
cross-entropy loss is adopted as the loss function, the network parameters are updated by gradient descent, and the loop is run K times to find the optimal discriminator T for the current GAN;
2) fix the discriminator T, train the generator C, and execute the following steps once:
① sample n objects from the noise data $A_{noise}$ to generate a set $n \sim A_{noise}(n)$;
② input $n \sim A_{noise}(n)$ into C and denote the output data as C(n);
③ sample n data from C(n) and $A_{true}(t)$ to form a set and input it into T;
④ train C according to the loss function below and the output of T, updating the network parameters by gradient descent:
$$\min_C \; \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
the objective of this loss function is to find, under the current discriminator, the network parameters of C that minimize the KL divergence between the real data set $A_{true}(t)$ and the generated data set C(n); since the parameters of the discriminator T are fixed while training the generator C, C is optimized according to the discrimination result T(C(n));
3) the single training process ends; return to the beginning and train again.
Further, in step S21, the YOLOv3 network is constructed as follows: the backbone network is the 53-layer Darknet-53, image features are extracted using a residual structure and small convolution kernels, and a feature pyramid structure with three detection layers of different sizes is used to detect larger and smaller targets respectively.
Further, in step S21, the training of the YOLOv3 network specifically includes:
1) obtaining the clustering centers of the data set by using the K-means clustering algorithm, and setting the clustering centers as the values of the anchors;
2) using random resizing for data augmentation, adjusting the size of the input picture to any multiple of 16;
3) inputting a picture into the network, extracting the picture's features through Darknet-53, dividing the feature maps into three grids with different numbers of cells, sending the extracted features to the three YOLO detection layers respectively, and outputting a picture with prediction boxes drawn by the YOLO layers;
4) comparing the coordinates of the prediction boxes drawn by the YOLO layers with the anchor coordinates, regressing the coordinate offsets in a logistic manner, and calculating with the following four formulas:
$$b_m = \mathrm{sigmoid}(O_m) + R_m, \qquad b_n = \mathrm{sigmoid}(O_n) + R_n$$
$$b_w = A_w e^{O_w}, \qquad b_h = A_h e^{O_h}$$
where $R_m$ and $R_n$ are the coordinates of the top-left corner of the cell containing the prediction box, $\mathrm{sigmoid}(O_m)$ and $\mathrm{sigmoid}(O_n)$ are the offsets of the prediction-box center relative to the anchor center, $O_m$ and $O_n$ are the network outputs for the center point of the prediction box, $b_m$ and $b_n$ are the normalized center coordinates of the prediction box relative to the top-left corner of the cell, $A_w$ and $A_h$ are the width and height of the anchor, $O_w$ and $O_h$ are the network outputs for the width and height of the detection box, and $b_w$ and $b_h$ are the normalized width and height of the prediction box relative to the width and height of the anchor;
5) meanwhile, scoring the probability that an object exists in each detection box using logistic regression, recording the score as the confidence, keeping the detection box with the highest confidence and deleting the remaining boxes;
6) after the confidence scores are obtained, the network classifies the objects in the detection boxes according to the classification loss function;
Further, in step 4) of training the YOLOv3 network, the loss functions of the center coordinates and of the width and height are as follows. The loss function of the center coordinates is:
$$Loss_{center} = \alpha_{exit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ (m_a - \hat{m}_a)^2 + (n_a - \hat{n}_a)^2 \right]$$
where $\alpha_{exit}$ is the weight coefficient of the center-coordinate loss function, $l \times l$ represents the number of cells into which the feature map is divided, $K$ represents the number of prediction boxes, and $I_{ab}^{obj}$ judges whether the b-th prediction box of the a-th cell is responsible for detecting the current object, taking the value 1 if so and 0 otherwise; the squared error is then used, where $\hat{m}_a$ and $\hat{n}_a$ are the center coordinates of the manually labeled box and $m_a$ and $n_a$ are the center coordinates of the prediction box;
the loss function for width and height is:
$$Loss_{wh} = \alpha_{exit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ (p_a - \hat{p}_a)^2 + (q_a - \hat{q}_a)^2 \right]$$
where $p_a$ and $q_a$ are the width and height of the prediction box, and $\hat{p}_a$ and $\hat{q}_a$ are the width and height of the manually labeled box.
Further, in step 5) of training the YOLOv3 network, the loss function of the confidence during training is:
$$Loss_{obj} = -\sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ \hat{e}_a \log(e_a) + (1 - \hat{e}_a) \log(1 - e_a) \right]$$
$$Loss_{noobj} = -\alpha_{noexit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{noobj} \left[ \hat{e}_a \log(e_a) + (1 - \hat{e}_a) \log(1 - e_a) \right]$$
The confidence loss function adopts the cross-entropy error: the first expression represents the confidence loss when an object is present in the prediction box, and the second expression represents the confidence loss when no object is present; $\alpha_{noexit}$ is the weight coefficient used when no object is present, which reduces the influence of object-free prediction boxes on the update of the network parameters; $\hat{e}_a$ represents the ground-truth confidence that an object exists, equal to 1 when an object is present and 0 when it is not; $e_a$ represents the confidence computed by the network itself.
Further, in step 6) of training the YOLOv3 network, the classification loss function adopts the cross-entropy error, and the calculation formula is:
$$Loss_{class} = -\sum_{a=0}^{l \times l} I_{a}^{obj} \sum_{e \in classes} \left[ \hat{G}_a(e) \log(G_a(e)) + (1 - \hat{G}_a(e)) \log(1 - G_a(e)) \right]$$
where $e \in classes$ represents the object class to which the object in the prediction box belongs, $\hat{G}_a(e)$ is 1 when $e$ is the correct class of the object and 0 when it belongs to another class, and $G_a(e)$ represents the score given by the network after classifying $e$.
The invention has the following beneficial effects: the GAN network is adopted to expand the sample data, which solves the problem of difficult sample collection; the expanded sample data are then fed into a YOLOv3 network, whose recognition ability is improved by learning the offset of the bounding-box center point, thereby solving the problem that targets are difficult to identify clearly in complex environments under small samples.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of a target identification method of the present invention;
FIG. 2 is an original picture obtained according to an embodiment of the present invention;
fig. 3 shows the recognition result of the embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1, a method for identifying a target in a complex environment under a small sample includes the following steps:
S1: data expansion, specifically comprising:
S11: constructing and training a GAN network;
The constructed GAN network comprises a generator C and a discriminator T. The generator C has a single input, namely noise data conforming to a probability distribution such as a Gaussian, Bernoulli or uniform distribution; here the noise data are assumed to follow a Gaussian distribution, and the role of C is to generate a new picture from the input noise data. The discriminator T has two inputs: one is the real data set, whose label is automatically set to 1, and the other is the data generated by the generator C, whose label is automatically set to 0; the function of T is to distinguish the real data from the generated data as well as possible, so T can be regarded as a binary classification network;
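By way of illustration only, the following is a minimal PyTorch sketch of such a generator/discriminator pair; the 64×64 picture size, the 100-dimensional Gaussian noise vector and the layer widths are assumptions made for the example and are not specified by this description:

```python
import torch
import torch.nn as nn

class GeneratorC(nn.Module):
    """Generator C: maps a Gaussian noise vector to a 64x64 RGB picture (sizes assumed)."""
    def __init__(self, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),  # picture values in [-1, 1]
        )

    def forward(self, n):  # n: (batch, noise_dim, 1, 1), drawn from a Gaussian distribution
        return self.net(n)

class DiscriminatorT(nn.Module):
    """Discriminator T: binary classifier scoring a picture as real (towards 1) or generated (towards 0)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),
            nn.Conv2d(256, 1, 4, 1, 0), nn.Sigmoid(),  # probability that the input is real
        )

    def forward(self, x):  # x: (batch, 3, 64, 64)
        return self.net(x).view(-1)
```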
the loss function of the GAN network is:
$$\min_C \max_T V(T,C) = \mathbb{E}_{t \sim A_{true}(t)}[\log T(t)] + \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
where $t \sim A_{true}(t)$ indicates that the data $t$ come from the real data set and $T(t)$ is the discriminator's score for $t$; $n \sim A_{noise}(n)$ indicates that the noise $n$ comes from the noise distribution and $C(n)$ is the data set generated by the generator C.
The GAN network is trained by single alternating iterations of the generator C and the discriminator T; before training, the generator C is initialized randomly and the discriminator T is pre-trained so that T has a certain classification ability when training starts;
the specific steps of GAN network training are as follows:
1) fix the generator C, train the discriminator T, and execute the following steps K times in a loop:
① sample n objects from the noise data $A_{noise}$ to generate a set $n \sim A_{noise}(n)$;
② sample t objects from the real data set $A_{true}(t)$ to generate a set $t \sim A_{true}(t)$;
③ input $n \sim A_{noise}(n)$ into the generator C to generate a new data set C(n);
④ input C(n) and $t \sim A_{true}(t)$ into T and train with the following formula as the loss function; the loss function is similar to that of a binary classification network, and when discriminating, T tends to push the score of the data in $A_{true}(t)$ close to 1 and the score of the data in C(n) close to 0:
$$\max_T \; \mathbb{E}_{t \sim A_{true}(t)}[\log T(t)] + \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
because the network parameters of the generator C are fixed and unchanged while the discriminator T is trained, cross-entropy loss is adopted as the loss function, the network parameters are updated by gradient descent, and the loop is run K times to find the optimal discriminator T for the current GAN;
2) fix the discriminator T, train the generator C, and execute the following steps once:
① sample n objects from the noise data $A_{noise}$ to generate a set $n \sim A_{noise}(n)$;
② input $n \sim A_{noise}(n)$ into C and denote the output data as C(n);
③ sample n data from C(n) and $A_{true}(t)$ to form a set and input it into T;
④ train C according to the loss function below and the output of T, updating the network parameters by gradient descent:
$$\min_C \; \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
the objective of this loss function is to find, under the current discriminator, the network parameters of C that minimize the KL divergence between the real data set $A_{true}(t)$ and the generated data set C(n); since the parameters of the discriminator T are fixed while training the generator C, C is optimized according to the discrimination result T(C(n));
3) the single training process ends; return to the beginning and train again.
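The alternating procedure above can be sketched as follows (illustrative only). It assumes the GeneratorC/DiscriminatorT modules from the previous sketch, a data loader real_loader that yields batches of real pictures, and externally created optimizers opt_C and opt_T; the value of K, the batch size of 64 and the use of the non-saturating generator loss (a common practical substitute for directly minimizing log(1 - T(C(n)))) are assumptions:

```python
import itertools
import torch
import torch.nn as nn

def gan_training_round(C, T, real_loader, opt_C, opt_T, K=5, noise_dim=100, device="cpu"):
    """One round of the single alternating iteration described above:
    K updates of the discriminator T with C fixed, then one update of the generator C with T fixed."""
    bce = nn.BCELoss()  # cross-entropy loss for the binary discriminator

    # 1) fix C, train T for K loops
    for real_batch in itertools.islice(real_loader, K):
        t_real = real_batch.to(device)                                   # t ~ A_true(t)
        n = torch.randn(t_real.size(0), noise_dim, 1, 1, device=device)  # n ~ A_noise(n), Gaussian
        c_n = C(n).detach()                                              # C(n); generator kept fixed
        loss_T = bce(T(t_real), torch.ones(t_real.size(0), device=device)) \
               + bce(T(c_n), torch.zeros(c_n.size(0), device=device))
        opt_T.zero_grad(); loss_T.backward(); opt_T.step()               # gradient descent on T

    # 2) fix T, train C once (non-saturating form of the generator loss)
    n = torch.randn(64, noise_dim, 1, 1, device=device)
    c_n = C(n)
    loss_C = bce(T(c_n), torch.ones(c_n.size(0), device=device))         # push T(C(n)) towards 1
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()
    return loss_T.item(), loss_C.item()
```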
S12: after the GAN network training is finished, screening the data set generated by the GAN and mixing the result with the real data set to form a new data set, thereby obtaining an expanded small-sample data set; the new data set is labeled, and the labeled new data set is taken as the input of YOLOv3.
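The screening criterion is not specified here; purely as an illustration, one possible sketch keeps only the generated pictures whose discriminator score exceeds a threshold before mixing them with the real data set (the threshold value and the number of generated candidates are assumptions):

```python
import torch

def screen_and_mix(C, T, real_images, num_generate=1000, threshold=0.5,
                   noise_dim=100, device="cpu"):
    """Screen the GAN-generated data set and mix it with the real data set.
    The rule 'keep samples whose discriminator score exceeds threshold' is an assumed
    screening criterion used only for illustration."""
    C.eval(); T.eval()
    with torch.no_grad():
        n = torch.randn(num_generate, noise_dim, 1, 1, device=device)
        generated = C(n)                      # candidate pictures C(n)
        scores = T(generated)                 # discriminator scores in [0, 1]
        kept = generated[scores > threshold]  # screened generated samples
    # the new (expanded) data set = screened generated samples + real samples,
    # which is then labeled and used as the input of YOLOv3
    return torch.cat([kept.cpu(), real_images.cpu()], dim=0)
```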
S2: the target identification specifically comprises the following steps:
S21: constructing and training a YOLOv3 network;
The constructed YOLOv3 network is as follows: the backbone network is the 53-layer Darknet-53, image features are extracted using a residual structure and small convolution kernels, and a feature pyramid structure with three detection layers of different sizes is used to detect larger and smaller targets respectively.
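As an illustration of the residual structure with small convolution kernels mentioned above, a minimal PyTorch sketch of one Darknet-53 residual unit follows (the channel counts are illustrative):

```python
import torch.nn as nn

class DarknetResidual(nn.Module):
    """One residual unit of Darknet-53: a 1x1 bottleneck convolution followed by a
    3x3 convolution, whose output is added back to the input (the residual connection)."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels // 2),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return x + self.block(x)  # small-kernel convolutions plus the residual shortcut
```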
Training the YOLOv3 network specifically includes:
1) obtaining the clustering centers of the data set by using the K-means clustering algorithm, and setting the clustering centers as the values of the anchors;
2) using random resizing for data augmentation, adjusting the size of the input picture to any multiple of 16;
3) inputting a picture into the network, extracting the picture's features through Darknet-53, dividing the feature maps into three grids with different numbers of cells, sending the extracted features to the three YOLO detection layers respectively, and outputting a picture with prediction boxes drawn by the YOLO layers;
4) comparing the coordinates of the prediction boxes drawn by the YOLO layers with the anchor coordinates, regressing the coordinate offsets in a logistic manner, and calculating with the following four formulas:
$$b_m = \mathrm{sigmoid}(O_m) + R_m, \qquad b_n = \mathrm{sigmoid}(O_n) + R_n$$
$$b_w = A_w e^{O_w}, \qquad b_h = A_h e^{O_h}$$
where $R_m$ and $R_n$ are the coordinates of the top-left corner of the cell containing the prediction box, $\mathrm{sigmoid}(O_m)$ and $\mathrm{sigmoid}(O_n)$ are the offsets of the prediction-box center relative to the anchor center, $O_m$ and $O_n$ are the network outputs for the center point of the prediction box, $b_m$ and $b_n$ are the normalized center coordinates of the prediction box relative to the top-left corner of the cell, $A_w$ and $A_h$ are the width and height of the anchor, $O_w$ and $O_h$ are the network outputs for the width and height of the detection box, and $b_w$ and $b_h$ are the normalized width and height of the prediction box relative to the width and height of the anchor.
The loss functions of the center coordinates and of the width and height are as follows. The loss function of the center coordinates is:
$$Loss_{center} = \alpha_{exit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ (m_a - \hat{m}_a)^2 + (n_a - \hat{n}_a)^2 \right]$$
where $\alpha_{exit}$ is the weight coefficient of the center-coordinate loss function, $l \times l$ represents the number of cells into which the feature map is divided, $K$ represents the number of prediction boxes, and $I_{ab}^{obj}$ judges whether the b-th prediction box of the a-th cell is responsible for detecting the current object, taking the value 1 if so and 0 otherwise; the squared error is then used, where $\hat{m}_a$ and $\hat{n}_a$ are the center coordinates of the manually labeled box and $m_a$ and $n_a$ are the center coordinates of the prediction box;
the loss function for width and height is:
$$Loss_{wh} = \alpha_{exit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ (p_a - \hat{p}_a)^2 + (q_a - \hat{q}_a)^2 \right]$$
where $p_a$ and $q_a$ are the width and height of the prediction box, and $\hat{p}_a$ and $\hat{q}_a$ are the width and height of the manually labeled box.
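The four decoding formulas of step 4) can be illustrated with the following sketch (plain Python, all quantities in grid-cell units as in the text; the sample values in the usage line are arbitrary):

```python
import math

def decode_box(O_m, O_n, O_w, O_h, R_m, R_n, A_w, A_h):
    """Decode one prediction following the formulas above:
    b_m = sigmoid(O_m) + R_m, b_n = sigmoid(O_n) + R_n, b_w = A_w*exp(O_w), b_h = A_h*exp(O_h)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    b_m = sigmoid(O_m) + R_m   # normalized center x, offset from the cell's top-left corner
    b_n = sigmoid(O_n) + R_n   # normalized center y
    b_w = A_w * math.exp(O_w)  # width relative to the anchor width
    b_h = A_h * math.exp(O_h)  # height relative to the anchor height
    return b_m, b_n, b_w, b_h

# usage: raw outputs (0.2, -0.3, 0.1, 0.4) for the cell at (3, 5) with a 3.6 x 4.9 anchor
print(decode_box(0.2, -0.3, 0.1, 0.4, 3, 5, 3.6, 4.9))
```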
5) Meanwhile, the probability that an object exists in each detection box is scored using logistic regression and recorded as the confidence; the detection box with the highest confidence is kept and the remaining detection boxes are deleted.
The loss function of the confidence during training is:
$$Loss_{obj} = -\sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ \hat{e}_a \log(e_a) + (1 - \hat{e}_a) \log(1 - e_a) \right]$$
$$Loss_{noobj} = -\alpha_{noexit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{noobj} \left[ \hat{e}_a \log(e_a) + (1 - \hat{e}_a) \log(1 - e_a) \right]$$
The confidence loss function adopts the cross-entropy error: the first expression represents the confidence loss when an object is present in the prediction box, and the second expression represents the confidence loss when no object is present; $\alpha_{noexit}$ is the weight coefficient used when no object is present, which reduces the influence of object-free prediction boxes on the update of the network parameters; $\hat{e}_a$ represents the ground-truth confidence that an object exists, equal to 1 when an object is present and 0 when it is not; $e_a$ represents the confidence computed by the network itself.
6) After the confidence scores are obtained, the network classifies the objects in the detection boxes according to the classification loss function.
The classification loss function adopts the cross-entropy error, and the calculation formula is:
$$Loss_{class} = -\sum_{a=0}^{l \times l} I_{a}^{obj} \sum_{e \in classes} \left[ \hat{G}_a(e) \log(G_a(e)) + (1 - \hat{G}_a(e)) \log(1 - G_a(e)) \right]$$
where $e \in classes$ represents the object class to which the object in the prediction box belongs, $\hat{G}_a(e)$ is 1 when $e$ is the correct class of the object and 0 when it belongs to another class, and $G_a(e)$ represents the score given by the network after classifying $e$.
S22: after the coordinate, confidence and classification training of the YOLOV3 network is completed, inputting the new data set into the YOLOV3 network, performing NMS processing on the finally remaining detection frames in the picture, deleting redundant frames, and outputting the picture with the detection frames.
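The NMS post-processing of step S22 can be sketched as follows (boxes are assumed to be (x1, y1, x2, y2, confidence) tuples; the IoU threshold of 0.5 is an assumption):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, iou_threshold=0.5):
    """Non-maximum suppression: repeatedly keep the highest-confidence box and delete
    the redundant boxes that overlap it too much."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)
        kept.append(best)
        boxes = [b for b in boxes if iou(best[:4], b[:4]) < iou_threshold]
    return kept
```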
Example 1: a citrus data set captured by an unmanned aerial vehicle is used to compare the present invention with common target recognition algorithms; the obtained citrus data are shown in Fig. 2. In this embodiment, the method is compared with three common target recognition algorithms, YOLO V3, Mask fast rcnn and Fast rcnn; the specific recognition results are shown in Table 1, and the method of the present invention achieves the highest recognition accuracy.
TABLE 1 Comparative experimental results
Serial number | Algorithm name | Recognition accuracy
1             | GAN + YOLO V3  | 90.1%
2             | YOLO V3        | 85.4%
3             | Mask fast rcnn | 76.5%
4             | Fast rcnn      | 69.4%
As can be seen from FIG. 3, the method of the present invention has better recognition under the condition of high density.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (8)

1. A target identification method of a complex environment under a small sample is characterized by comprising the following steps:
S1: data expansion, specifically comprising:
S11: constructing and training a generative adversarial network (GAN);
S12: after the GAN network training is finished, screening the data set generated by the GAN and mixing the result with the real data set to form a new data set, thereby obtaining an expanded small-sample data set; labeling the new data set, and taking the labeled new data set as the input of YOLOv3;
S2: target identification, which specifically comprises the following steps:
S21: constructing and training a YOLOv3 network;
S22: after the coordinate, confidence and classification training of the YOLOv3 network is completed, inputting the new data set into the YOLOv3 network, performing NMS processing on the detection frames finally remaining in the picture, deleting redundant frames, and outputting the picture with the detection frames.
2. The method for identifying a target in a complex environment under a small sample according to claim 1, wherein in step S11, the constructed GAN network comprises a generator C and a discriminator T; the generator C has a single input, namely noise data conforming to a certain probability distribution, and the role of C is to generate a new picture based on the input noise data; the discriminator T has two inputs: one is the real data set, whose label is automatically set to 1, and the other is the data generated by the generator C, whose label is automatically set to 0; the function of T is to distinguish the real data from the generated data;
the loss function of the GAN network is:
$$\min_C \max_T V(T,C) = \mathbb{E}_{t \sim A_{true}(t)}[\log T(t)] + \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
where $t \sim A_{true}(t)$ indicates that the data $t$ come from the real data set and $T(t)$ is the discriminator's score for $t$; $n \sim A_{noise}(n)$ indicates that the noise $n$ comes from the noise distribution and $C(n)$ is the data set generated by the generator C.
3. The method for target recognition in a complex environment under a small sample according to claim 2, wherein in step S11, the GAN network is trained by single alternating iterations of the generator C and the discriminator T; before training, the generator C is initialized randomly and the discriminator T is pre-trained so that T has a certain classification ability when training starts;
the specific steps of GAN network training are as follows:
1) fix the generator C, train the discriminator T, and execute the following steps K times in a loop:
① sample n objects from the noise data $A_{noise}$ to generate a set $n \sim A_{noise}(n)$;
② sample t objects from the real data set $A_{true}(t)$ to generate a set $t \sim A_{true}(t)$;
③ input $n \sim A_{noise}(n)$ into the generator C to generate a new data set C(n);
④ input C(n) and $t \sim A_{true}(t)$ into T and train with the following formula as the loss function; the loss function is similar to that of a binary classification network, and when discriminating, T tends to push the score of the data in $A_{true}(t)$ close to 1 and the score of the data in C(n) close to 0:
$$\max_T \; \mathbb{E}_{t \sim A_{true}(t)}[\log T(t)] + \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
cross-entropy loss is adopted as the loss function, the network parameters are updated by gradient descent, and the loop is run K times to find the optimal discriminator T for the current GAN;
2) fix the discriminator T, train the generator C, and execute the following steps once:
① sample n objects from the noise data $A_{noise}$ to generate a set $n \sim A_{noise}(n)$;
② input $n \sim A_{noise}(n)$ into C and denote the output data as C(n);
③ sample n data from C(n) and $A_{true}(t)$ to form a set and input it into T;
④ train C according to the loss function below and the output of T, updating the network parameters by gradient descent:
$$\min_C \; \mathbb{E}_{n \sim A_{noise}(n)}[\log(1 - T(C(n)))]$$
3) the single training process ends; return to the beginning and train again.
4. The method for target recognition in a complex environment under a small sample according to claim 1, wherein in step S21, the YOLOv3 network is constructed as follows: the backbone network is the 53-layer Darknet-53, image features are extracted using a residual structure and small convolution kernels, and a feature pyramid structure with three detection layers of different sizes is used to detect larger and smaller targets respectively.
5. The method for target recognition in a complex environment under a small sample according to claim 4, wherein the step S21 of training the YOLOv3 network specifically includes:
1) obtaining the clustering centers of the data set by using the K-means clustering algorithm, and setting the clustering centers as the values of the anchors;
2) using random resizing for data augmentation, adjusting the size of the input picture to any multiple of 16;
3) inputting a picture into the network, extracting the picture's features through Darknet-53, dividing the feature maps into three grids with different numbers of cells, sending the extracted features to the three YOLO detection layers respectively, and outputting a picture with prediction boxes drawn by the YOLO layers;
4) comparing the coordinates of the prediction boxes drawn by the YOLO layers with the anchor coordinates, regressing the coordinate offsets in a logistic manner, and calculating with the following four formulas:
$$b_m = \mathrm{sigmoid}(O_m) + R_m, \qquad b_n = \mathrm{sigmoid}(O_n) + R_n$$
$$b_w = A_w e^{O_w}, \qquad b_h = A_h e^{O_h}$$
where $R_m$ and $R_n$ are the coordinates of the top-left corner of the cell containing the prediction box, $\mathrm{sigmoid}(O_m)$ and $\mathrm{sigmoid}(O_n)$ are the offsets of the prediction-box center relative to the anchor center, $O_m$ and $O_n$ are the network outputs for the center point of the prediction box, $b_m$ and $b_n$ are the normalized center coordinates of the prediction box relative to the top-left corner of the cell, $A_w$ and $A_h$ are the width and height of the anchor, $O_w$ and $O_h$ are the network outputs for the width and height of the detection box, and $b_w$ and $b_h$ are the normalized width and height of the prediction box relative to the width and height of the anchor;
5) meanwhile, scoring the probability that an object exists in each detection box using logistic regression, recording the score as the confidence, keeping the detection box with the highest confidence and deleting the remaining boxes;
6) after the confidence scores are obtained, the network classifies the objects in the detection boxes according to the classification loss function.
6. The method for target recognition in a complex environment under a small sample according to claim 5, wherein in step 4) of training the YOLOv3 network, the loss function of the center coordinates is:
$$Loss_{center} = \alpha_{exit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ (m_a - \hat{m}_a)^2 + (n_a - \hat{n}_a)^2 \right]$$
where $\alpha_{exit}$ is the weight coefficient of the center-coordinate loss function, $l \times l$ represents the number of cells into which the feature map is divided, $K$ represents the number of prediction boxes, and $I_{ab}^{obj}$ judges whether the b-th prediction box of the a-th cell is responsible for detecting the current object, taking the value 1 if so and 0 otherwise; the squared error is then used, where $\hat{m}_a$ and $\hat{n}_a$ are the center coordinates of the manually labeled box and $m_a$ and $n_a$ are the center coordinates of the prediction box;
the loss function for width and height is:
$$Loss_{wh} = \alpha_{exit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ (p_a - \hat{p}_a)^2 + (q_a - \hat{q}_a)^2 \right]$$
where $p_a$ and $q_a$ are the width and height of the prediction box, and $\hat{p}_a$ and $\hat{q}_a$ are the width and height of the manually labeled box.
7. The method for target recognition in a complex environment under a small sample according to claim 5, wherein in step 5) of training the YOLOv3 network, the loss function of the confidence during training is:
$$Loss_{obj} = -\sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{obj} \left[ \hat{e}_a \log(e_a) + (1 - \hat{e}_a) \log(1 - e_a) \right]$$
$$Loss_{noobj} = -\alpha_{noexit} \sum_{a=0}^{l \times l} \sum_{b=0}^{K} I_{ab}^{noobj} \left[ \hat{e}_a \log(e_a) + (1 - \hat{e}_a) \log(1 - e_a) \right]$$
The confidence loss function adopts the cross-entropy error: the first expression represents the confidence loss when an object is present in the prediction box, and the second expression represents the confidence loss when no object is present; $\alpha_{noexit}$ is the weight coefficient used when no object is present, which reduces the influence of object-free prediction boxes on the update of the network parameters; $\hat{e}_a$ represents the ground-truth confidence that an object exists, equal to 1 when an object is present and 0 when it is not; $e_a$ represents the confidence computed by the network itself.
8. The method for identifying a target in a complex environment under a small sample according to claim 5, wherein in step 6) of training the YOLOv3 network, the classification loss function adopts the cross-entropy error, and the calculation formula is:
$$Loss_{class} = -\sum_{a=0}^{l \times l} I_{a}^{obj} \sum_{e \in classes} \left[ \hat{G}_a(e) \log(G_a(e)) + (1 - \hat{G}_a(e)) \log(1 - G_a(e)) \right]$$
where $e \in classes$ represents the object class to which the object in the prediction box belongs, $\hat{G}_a(e)$ is 1 when $e$ is the correct class of the object and 0 when it belongs to another class, and $G_a(e)$ represents the score given by the network after classifying $e$.
CN202010358400.5A 2020-04-29 2020-04-29 Target identification method for complex environment under small sample Pending CN111582345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010358400.5A CN111582345A (en) 2020-04-29 2020-04-29 Target identification method for complex environment under small sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010358400.5A CN111582345A (en) 2020-04-29 2020-04-29 Target identification method for complex environment under small sample

Publications (1)

Publication Number Publication Date
CN111582345A true CN111582345A (en) 2020-08-25

Family

ID=72117097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010358400.5A Pending CN111582345A (en) 2020-04-29 2020-04-29 Target identification method for complex environment under small sample

Country Status (1)

Country Link
CN (1) CN111582345A (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065901A1 (en) * 2017-08-29 2019-02-28 Vintra, Inc. Systems and methods for a tailored neural network detector
CN107844770A (en) * 2017-11-03 2018-03-27 东北大学 A kind of electric melting magnesium furnace unusual service condition automatic recognition system based on video
CN108805070A (en) * 2018-06-05 2018-11-13 合肥湛达智能科技有限公司 A kind of deep learning pedestrian detection method based on built-in terminal
CN109171605A (en) * 2018-08-29 2019-01-11 合肥工业大学 Intelligent edge calculations system with target positioning and hysteroscope video enhancing processing function
CN109409365A (en) * 2018-10-25 2019-03-01 江苏德劭信息科技有限公司 It is a kind of that method is identified and positioned to fruit-picking based on depth targets detection
CN109671018A (en) * 2018-12-12 2019-04-23 华东交通大学 A kind of image conversion method and system based on production confrontation network and ResNets technology
CN109635875A (en) * 2018-12-19 2019-04-16 浙江大学滨海产业技术研究院 A kind of end-to-end network interface detection method based on deep learning
CN109753946A (en) * 2019-01-23 2019-05-14 哈尔滨工业大学 A kind of real scene pedestrian's small target deteection network and detection method based on the supervision of body key point
CN109934117A (en) * 2019-02-18 2019-06-25 北京联合大学 Based on the pedestrian's weight recognition detection method for generating confrontation network
CN109919058A (en) * 2019-02-26 2019-06-21 武汉大学 A kind of multisource video image highest priority rapid detection method based on Yolo V3
CN109928107A (en) * 2019-04-08 2019-06-25 江西理工大学 A kind of automatic classification system
CN110472467A (en) * 2019-04-08 2019-11-19 江西理工大学 The detection method for transport hub critical object based on YOLO v3
CN110135267A (en) * 2019-04-17 2019-08-16 电子科技大学 A kind of subtle object detection method of large scene SAR image
CN110060248A (en) * 2019-04-22 2019-07-26 哈尔滨工程大学 Sonar image submarine pipeline detection method based on deep learning
CN110135468A (en) * 2019-04-24 2019-08-16 中国矿业大学(北京) A kind of recognition methods of gangue
CN110070072A (en) * 2019-05-05 2019-07-30 厦门美图之家科技有限公司 A method of generating object detection model
CN110659702A (en) * 2019-10-17 2020-01-07 黑龙江德亚文化传媒有限公司 Calligraphy copybook evaluation system and method based on generative confrontation network model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Feng: "Application of artificial intelligence in drawing recognition for door and window inspection", Sichuan Cement *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767907A (en) * 2020-09-03 2020-10-13 江苏铨铨信息科技有限公司 Method of multi-source data fire detection system based on GA and VGG network
CN112184673A (en) * 2020-09-30 2021-01-05 中国电子科技集团公司电子科学研究院 Tablet target detection method for medication compliance management
CN112200055A (en) * 2020-09-30 2021-01-08 深圳市信义科技有限公司 Pedestrian attribute identification method, system and device of joint countermeasure generation network
CN112200055B (en) * 2020-09-30 2024-04-30 深圳市信义科技有限公司 Pedestrian attribute identification method, system and device of combined countermeasure generation network
CN113139476A (en) * 2021-04-27 2021-07-20 山东英信计算机技术有限公司 Data center-oriented human behavior attribute real-time detection method and system
CN113239813B (en) * 2021-05-17 2022-11-25 中国科学院重庆绿色智能技术研究院 YOLOv3 distant view target detection method based on third-order cascade architecture
CN113239813A (en) * 2021-05-17 2021-08-10 中国科学院重庆绿色智能技术研究院 Three-order cascade architecture-based YOLOv3 prospective target detection method
CN113744262A (en) * 2021-09-17 2021-12-03 浙江工业大学 Target segmentation detection method based on GAN and YOLO-v5
CN113744262B (en) * 2021-09-17 2024-02-02 浙江工业大学 Target segmentation detection method based on GAN and YOLO-v5
CN113780480A (en) * 2021-11-11 2021-12-10 深圳佑驾创新科技有限公司 Method for constructing multi-target detection and category identification model based on YOLOv5
CN114170597A (en) * 2021-11-12 2022-03-11 天健创新(北京)监测仪表股份有限公司 Algae detection equipment and detection method
CN113903009A (en) * 2021-12-10 2022-01-07 华东交通大学 Railway foreign matter detection method and system based on improved YOLOv3 network
CN114155453A (en) * 2022-02-10 2022-03-08 深圳爱莫科技有限公司 Training method for ice chest commodity image recognition, model and occupancy calculation method

Similar Documents

Publication Publication Date Title
CN111582345A (en) Target identification method for complex environment under small sample
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN109657584B (en) Improved LeNet-5 fusion network traffic sign identification method for assisting driving
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN107247956B (en) Rapid target detection method based on grid judgment
CN103605972B (en) Non-restricted environment face verification method based on block depth neural network
Xu et al. Augmenting strong supervision using web data for fine-grained categorization
WO2019140767A1 (en) Recognition system for security check and control method thereof
CN111832608B (en) Iron spectrum image multi-abrasive particle identification method based on single-stage detection model yolov3
CN110929679B (en) GAN-based unsupervised self-adaptive pedestrian re-identification method
CN113011319A (en) Multi-scale fire target identification method and system
Wang et al. LPR-Net: Recognizing Chinese license plate in complex environments
Silva-Rodriguez et al. Self-learning for weakly supervised gleason grading of local patterns
CN107833213A (en) A kind of Weakly supervised object detecting method based on pseudo- true value adaptive method
CN113643228B (en) Nuclear power station equipment surface defect detection method based on improved CenterNet network
CN105930792A (en) Human action classification method based on video local feature dictionary
CN114092742B (en) Multi-angle-based small sample image classification device and method
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
WO2022062419A1 (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN114092747A (en) Small sample image classification method based on depth element metric model mutual learning
CN112613428B (en) Resnet-3D convolution cattle video target detection method based on balance loss
CN112597324A (en) Image hash index construction method, system and equipment based on correlation filtering
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
JP2022082493A (en) Pedestrian re-identification method for random shielding recovery based on noise channel
CN114818963B (en) Small sample detection method based on cross-image feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210527

Address after: 400714 No. 266 Fangzheng Road, Beibei District, Chongqing.

Applicant after: Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences

Applicant after: Chongqing University

Address before: 400714 No. 266 Fangzheng Road, Beibei District, Chongqing.

Applicant before: Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences

CB03 Change of inventor or designer information

Inventor after: Zhang Xuerui

Inventor after: Yao Yuan

Inventor after: Zheng Zhihao

Inventor after: Zhang Fan

Inventor after: Shang Mingsheng

Inventor before: Yao Yuan

Inventor before: Zheng Zhihao

Inventor before: Zhang Xuerui

Inventor before: Zhang Fan

Inventor before: Shang Mingsheng

RJ01 Rejection of invention patent application after publication

Application publication date: 20200825