CN109543740B

CN109543740B - Target detection method based on a generative adversarial network

Info

Publication number: CN109543740B
Application number: CN201811363392.2A
Authority: CN
Inventors: 项学智; 于泽婷; 翟明亮; 吕宁; 郭鑫立; 王帅; 张荣芳; 张玉琦
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2018-11-14
Filing date: 2018-11-14
Publication date: 2022-07-15
Anticipated expiration: 2038-11-14
Also published as: CN109543740A

Abstract

The invention provides a target detection method based on a generative adversarial network. The method comprises: designing a generator that generates samples of various classes according to class labels; designing an agent that detects the data produced by the generator and provides pseudo ground truth, so that the generated data labeled by the agent can be used to train a target detector; designing the target detector, which judges whether the generated data helps improve target detection accuracy; designing an adversary, which judges during the training stage whether data comes from the real data or the generated data; alternately training the generator and the discriminator; and, in the testing stage, feeding the data to be detected directly into the target detector to obtain the detection result. Combining the samples produced by the generative network with real samples enriches the training data and improves detection accuracy, while the target detection network provides feedback to the generative network so that the generated samples become more realistic. Because the agent labels the generated data, the data can be used directly to train the target detector without spending substantial labor and material resources on manual annotation.

Description

Target detection method based on a generative adversarial network

Technical Field

The invention belongs to the field of target detection methods, and in particular relates to a target detection method based on a generative adversarial network.

Background

In recent years deep learning has developed rapidly, and target detection algorithms based on deep learning have achieved good results, but open problems remain. First, deep-learning-based target detection needs a large number of labeled samples, and deep models trained on small data sets overfit easily. A rough rule of thumb from the literature is that a supervised deep learning algorithm generally achieves acceptable performance with about 5,000 labeled samples per class, and matches or exceeds human performance when trained on a data set of at least 10 million labeled samples. Labeling such data sets also consumes considerable labor and material resources, so how to train with large numbers of unlabeled or generated samples deserves further study. Second, most current target detection research relies on deeper networks to improve accuracy, which brings more complex computation and larger memory requirements, consumes more hardware, and makes deployment difficult. Generative adversarial networks are one of the most advanced and exciting areas of deep learning today, but most research focuses on improving the quality of the generated data, and comparatively little addresses how to apply that data.

To solve these problems, the invention provides a target detection method based on a generative adversarial network. First, the generative adversarial network is fused with the target detection network. Combining the samples produced by the generative network with real samples enriches the training data and improves detection accuracy, while the target detection network provides feedback to the generative network so that the generated samples become more realistic. The invention introduces an agent that produces pseudo ground truth for the generated samples, so the generated data can be used directly to train the target detector without spending substantial labor and material resources on labeling. Second, only the target detection network is used at test time, so the deployed structure is simple and easy to deploy.

Disclosure of Invention

The invention aims to provide a target detection method based on a generative adversarial network that is suitable for target detection when only a small number of labeled samples are available.

The purpose of the invention is realized by the following steps:

A target detection method based on a generative adversarial network comprises the following specific implementation steps:

Step 1, designing a generator G that generates samples of various classes according to class labels;

Step 2, selecting a trained high-accuracy detector as an agent F, which detects the data generated by the generator G and provides pseudo ground truth, so that the generated data labeled by the agent F can be used to train the target detector;

Step 3, designing a target detector O which, in the training stage, judges whether the generated data helps improve target detection accuracy and provides feedback to the generator; the target detector O also produces the final output in the testing stage;

Step 4, designing an adversary A which, in the training stage, judges whether data comes from the real data or the generated data and provides feedback to the generator;

Step 5, in the training stage, alternately training the generator G and the discriminator D, wherein the input of the generator G is normalized to the range [-1, 1], the input of the discriminator D is the real data and the generated data, the generator uses Adam as its optimizer, and the discriminator uses SGD as its optimizer;

Step 6, in the testing stage, feeding the data to be detected directly into the target detector O to obtain the detection result.

In step 1, the generator G adopts a conditional GAN structure: a condition variable y, namely the class label, is input, and samples of different classes are generated under the guidance of the class label. The generator G uses a residual network structure comprising 4 residual blocks and upsampling layers, wherein each residual block is a combination of two BN layers, a ReLU layer, and a 3 × 3 convolutional layer, and the output layer of the generator G uses the Tanh activation function.

In step 3, the target detector O uses a fully convolutional architecture. Its feature extraction part has 15 convolutional layers in total, wherein convolutional layers 2, 4, 6, 8, and 10 have 3 × 3 kernels and a stride of 2, and the remaining convolutional layers have a stride of 1. A multi-feature fusion technique fuses the lower-layer convolutional features with the higher-layer convolutional features.

The adversary A also takes the condition variable y as input. Its network structure comprises 5 residual blocks and downsampling layers, wherein each residual block is a combination of two LeakyReLU layers and 3 × 3 convolutional layers, and the adversary A uses spectral normalization.

The parameters of the generator G in step 1 are θ_g, which are obtained by optimizing the loss function L_D.

The loss function L_D consists of two parts, the adversarial loss L_{D_A} and the detection loss L_{D_O}, that is, L_D = L_{D_A} + λ L_{D_O}.

The adversarial loss L_{D_A} is defined in terms of the following quantities: z, the input to the generator; y, the class label; the output of the generator; the adversarial branch of the discriminator network, with parameters θ_a; and the probability that the adversary judges a generated sample to be a real sample.

The detection loss L_{D_O} is defined in terms of the following quantities: the target information, class probability, and position coordinates output by the target detector; and the target information, class probability, and position coordinates output by the agent.

The parameters of the adversary A in step 4 are θ_a.

The parameters of the target detector O in step 3 are θ_o.

The beneficial effects of the invention are as follows: the generative adversarial network is fused with the target detection network; combining the samples produced by the generative network with real samples enriches the training data and improves detection accuracy, while the target detection network provides feedback to the generative network so that the generated samples become more realistic.

Drawings

Fig. 1 is a diagram of the network architecture of the present invention.

Fig. 2 is an architecture diagram of the generator G of the present invention.

Fig. 3 is a diagram of a residual block in the generator G of the present invention.

Fig. 4 is an architecture diagram of the feature extraction part of the target detector O of the present invention.

Fig. 5 is a structure diagram of the adversary A of the present invention.

Fig. 6 is a structure diagram of a residual block in the adversary A of the present invention.

Detailed Description

The invention is further described with reference to the accompanying drawings in which:

Example 1

A target detection method based on a generative adversarial network comprises the following specific implementation steps:

Step 1, designing a generator G that generates samples of various classes according to class labels, as shown in Fig. 2;

Step 2, selecting a trained high-accuracy detector as an agent F, which detects the data generated by the generator G and provides pseudo ground truth, so that the generated data labeled by the agent F can be used to train the target detector. In this scheme a YOLO v3 network is selected as the agent; the agent produces bounding box information and class information for each sample generated by the generator, and these serve as the agent ground truth. Introducing the agent allows the generated data to be used directly without additional labeling and yields an end-to-end network structure;

Step 3, designing a target detector O which, in the training stage, judges whether the generated data helps improve target detection accuracy and provides feedback to the generator; the target detector O also produces the final output in the testing stage. As shown in Fig. 4, the target detector O acts as one of the discriminators and is trained alternately with the generator in the training stage. The inputs of the target detector O are the generated samples and the pseudo ground truth produced by the agent, and its outputs are the class and position coordinates of the detected targets. The feature extraction part adopts a fully convolutional architecture with 15 convolutional layers in total, wherein convolutional layers 2, 4, 6, 8, and 10 have 3 × 3 kernels and a stride of 2, and the remaining convolutional layers have a stride of 1. To ensure the stability of the whole network model, all pooling layers are replaced by strided convolutional layers. In addition, to obtain richer detail features, a multi-feature fusion method fuses the lower-layer convolutional features with the higher-layer convolutional features. The structure of the target detector O is similar to SSD and YOLO and belongs to one-stage detection; therefore, unlike the RCNN series of detectors, the bounding box coordinates and the classification probabilities are predicted simultaneously as the output of the last layer. Each cell of the last-layer feature map predicts N bounding boxes, where N is the number of anchor boxes. The number of feature maps in the last layer is set to N × (K + 5), where K is the number of classes used to predict class probabilities and 5 covers the bounding box coordinates and the objectness value (5 = 4 + 1);

Step 4, designing an adversary A which, in the training stage, judges whether data comes from the real data or the generated data and provides feedback to the generator. As shown in Fig. 5, the adversary A is one of the discriminators. Its inputs are the real data and the output of the generative model, together with the class information, and its output is a binary 0/1 label, where 1 means real (real data) and 0 means fake (generated data). The adversary A uses a residual network structure in which each residual block comprises two LeakyReLU layers and 3 × 3 convolutional layers. There are 5 residual blocks in total, each followed by a downsampling layer. The adversary A uses spectral normalization as its weight normalization technique, which has a low computational cost and can yield generated images of higher or comparable quality relative to other techniques;

Step 5, in the training stage, alternately training the generator G and the discriminator D, wherein the input of the generator G is normalized to the range [-1, 1], the input of the discriminator D is the real data and the generated data, the generator uses Adam as its optimizer, and the discriminator uses SGD as its optimizer. Throughout training the generator and the discriminator compete with each other, and the capability of both improves continuously during the alternating training, until the generator produces new data resembling the real data by learning the essential characteristics of the real data. The discriminative model consists of two parts: the adversary A judges whether the input data is real or fake, and the target detector O judges whether the input data can improve target detection accuracy. The discriminative model guides the adjustment of the generative model so that the generated data moves closer to the real data, and it keeps the iterative training process from diverging. In each round the discriminator is trained first and the generator second, and the two are trained alternately;

In step 1, the generator G adopts a conditional GAN structure. Starting from noise drawn from p_z, G generates an image by forward propagation through the network; the generated image is expected to approach a real image closely enough to fool the discriminator. A condition variable y, namely the class label, is input, and samples of different classes are generated under the guidance of the class label. The generator G uses a residual network structure comprising 4 residual blocks and upsampling layers, wherein each residual block is a combination of two BN layers, a ReLU layer, and a 3 × 3 convolutional layer, and the output layer of the generator G uses the Tanh activation function.

In step 3, the target detector O uses a fully convolutional architecture whose feature extraction part has 15 convolutional layers with kernel sizes of 3 × 3 and 1 × 1. A multi-feature fusion technique fuses the lower-layer convolutional features with the higher-layer convolutional features.

The adversary A also takes the condition variable y as input. Its network structure comprises 5 residual blocks and downsampling layers, wherein each residual block is a combination of two LeakyReLU layers and 3 × 3 convolutional layers, and the adversary A uses spectral normalization.

The parameters of the generator G in step 1 are θ_g, which are obtained by optimizing the loss function L_D.

The loss function L_D consists of two parts, the adversarial loss L_{D_A} and the detection loss L_{D_O}, that is, L_D = L_{D_A} + λ L_{D_O}.

The adversarial loss L_{D_A} is defined in terms of z, the input to the generator; y, the class label; and the output of the generator.

The detection loss L_{D_O} is defined in terms of the following quantities: the target information, class probability, and position coordinates output by the target detector; and the target information, class probability, and position coordinates output by the agent.

The parameters of the adversary A in step 4 are θ_a.

The parameters of the target detector O in step 3 are θ_o.

The invention relates to a target detection method based on a generative adversarial network, and in particular to such a method for the case where only a small number of labeled samples are available.

In recent years deep learning has developed rapidly, and target detection algorithms based on deep learning have achieved good results, but open problems remain. First, deep-learning-based target detection needs a large number of labeled samples, and deep models trained on small data sets overfit easily. A rough rule of thumb from the literature is that a supervised deep learning algorithm generally achieves acceptable performance with about 5,000 labeled samples per class, and matches or exceeds human performance when trained on a data set of at least 10 million labeled samples. Labeling such data sets also consumes considerable labor and material resources, so how to train with large numbers of unlabeled or generated samples deserves further study. Second, most current target detection research relies on deeper networks to improve accuracy, which brings more complex computation and larger memory requirements, consumes more hardware, and makes deployment difficult. Generative adversarial networks are one of the leading and most exciting areas of deep learning today, but most research focuses on improving the quality of the generated data, and comparatively little addresses how to apply that data.

To solve these problems, the invention provides a target detection method based on a generative adversarial network. First, the generative adversarial network is fused with the target detection network. Combining the samples produced by the generative network with real samples enriches the training data and improves detection accuracy, while the target detection network provides feedback to the generative network so that the generated samples become more realistic. The method introduces an agent that produces pseudo ground truth for the generated samples, so the generated data can be used directly to train the target detector without spending substantial labor and material resources on labeling. Second, only the target detection network is used at test time, so the deployed structure is simple and easy to deploy.

The invention aims to provide a target detection method based on a generative adversarial network for the case of a small number of labeled samples. The overall network structure is shown in Fig. 1 and comprises three main parts: the generator G, the agent F, and the discriminator D, where the discriminator comprises the target detector O and the adversary A. The method comprises the following steps:

S1. Design a generator G that generates samples of various classes according to class labels; its structure is shown in Fig. 2.

S2. Introduce an agent F. In this scheme a YOLO v3 detector is selected as the agent; it detects the data generated by the generator G and provides pseudo ground truth. With the agent F, the generated data can be used directly to train the target detector without additional labeling, which yields an end-to-end network structure.

S3. Design a target detector O; its structure is shown in Fig. 4. In the training stage it judges whether the generated data helps improve target detection accuracy and provides feedback to the generator. The target detector O also produces the final output in the testing stage.

S4. Design an adversary A; its structure is shown in Fig. 5. In the training stage it judges whether data comes from the real data or the generated data and provides feedback to the generator.

S5. In the training stage, alternately train the generator G and the discriminator D, wherein the input of the generator G is normalized to the range [-1, 1], the input of the discriminator D is the real data and the generated data, the generator uses Adam as its optimizer, and the discriminator uses SGD as its optimizer. The capability of both improves continuously during training, so the generated data becomes more realistic while the detection accuracy also keeps improving.

S6. In the testing stage, feed the data to be detected directly into the target detector O to obtain the detection result.
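Step S5 states that the generator input is normalized to the range from -1 to 1, which is also the output range of a Tanh output layer. A minimal sketch of such a mapping follows, assuming the data are 8-bit images with pixel values 0 to 255; the function names are hypothetical and the patent does not specify the exact normalization formula.

```python
def to_generator_range(pixels):
    """Map 8-bit pixel values (0..255) linearly to the range [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


def from_generator_range(values):
    """Map Tanh-range values in [-1, 1] back to 8-bit pixel values."""
    return [int(round((v + 1.0) * 127.5)) for v in values]


# Illustrative pixel row (assumed data, not from the patent).
row = [0, 64, 128, 255]
scaled = to_generator_range(row)
restored = from_generator_range(scaled)
```

Keeping real and generated images in the same numeric range means the discriminator sees both on a common scale.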

The invention is described in more detail below with reference to fig. 1.

S1. As shown in Fig. 2, a generator G is constructed. Starting from noise drawn from p_z, G generates an image by forward propagation through the network; the generated image is expected to approach a real image closely enough to fool the discriminator. Class information is also fed into the generative network, so that samples of various classes are generated under the guidance of the class information. The generator uses a residual network structure; the residual block, shown in Fig. 3, comprises two BN layers, a ReLU layer, and a 3 × 3 convolutional layer. There are four residual blocks, each followed by an upsampling layer. The output layer of the generator uses the Tanh activation function.
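To illustrate how four residual blocks, each followed by an upsampling layer, enlarge the feature map, the sketch below traces the spatial size through the generator. It assumes 2× nearest-neighbour upsampling, size-preserving residual blocks, and a 4 × 4 starting feature map; none of these values is stated in the patent, and the function names are hypothetical.

```python
def upsample_nearest(image, factor=2):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def generator_output_size(seed_size, num_blocks=4, factor=2):
    """Spatial size after num_blocks (residual block + upsampling) stages."""
    size = seed_size
    for _ in range(num_blocks):
        # The residual block is assumed to keep the size; only upsampling changes it.
        size *= factor
    return size


small = [[1, 2], [3, 4]]
bigger = upsample_nearest(small)
final_size = generator_output_size(4)
```

Under these assumptions a 4 × 4 feature map grows to 64 × 64 after the four stages.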

S2. In this scheme a YOLO v3 network is selected as the agent, and the input of the agent is the output of the generator. For each sample generated by the generator, the agent produces bounding box information and class information, which serve as the agent ground truth. Introducing the agent allows the generated data to be used directly without additional labeling and yields an end-to-end network structure.
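The role of the agent can be sketched as turning detector outputs into labels for generated samples. The snippet below is only an illustration: the `Detection` structure is hypothetical and is not the YOLO v3 interface, and the confidence threshold is an assumption, since the patent does not say whether the agent's detections are filtered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    class_id: int                      # predicted category index
    score: float                       # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]     # (x_min, y_min, x_max, y_max) in pixels


def pseudo_ground_truth(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep confident agent detections as pseudo ground truth (threshold is an assumption)."""
    return [d for d in detections if d.score >= min_score]


# Made-up agent output for one generated image.
agent_output = [
    Detection(class_id=2, score=0.91, box=(10, 20, 110, 220)),
    Detection(class_id=0, score=0.12, box=(5, 5, 30, 30)),
]
labels = pseudo_ground_truth(agent_output)
```

The retained boxes and class indices would then stand in for manual annotations when the target detector is trained on generated samples.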

S3. As shown in Fig. 4, a target detector O is constructed. The target detector O acts as one of the discriminators; in the training stage it is trained alternately with the generator, judges whether the generated data helps improve detection accuracy, and provides feedback to the generator. The inputs of the target detector O are the generated samples and the pseudo ground truth produced by the agent, and its outputs are the class and position coordinates of the detected targets. The feature extraction part adopts a fully convolutional architecture with 15 convolutional layers, wherein convolutional layers 2, 4, 6, 8, and 10 have 3 × 3 kernels and a stride of 2, and the remaining convolutional layers have a stride of 1. To ensure the stability of the whole network model, all pooling layers are replaced by strided convolutional layers. In addition, to obtain richer detail features, a multi-feature fusion method fuses the lower-layer convolutional features with the higher-layer convolutional features. The structure of the target detector O is similar to SSD and YOLO and belongs to one-stage detection; therefore, unlike the RCNN series of detectors, the bounding box coordinates and the classification probabilities are predicted simultaneously as the output of the last layer. Each cell of the last-layer feature map predicts N bounding boxes, where N is the number of anchor boxes. The number of feature maps in the last layer is set to N × (K + 5), where K is the number of classes used to predict class probabilities and 5 covers the bounding box coordinates and the objectness value (5 = 4 + 1).
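The arithmetic behind the detector head can be checked with a short sketch. It computes the N × (K + 5) channel count and the spatial size left after the five stride-2 layers. The values N = 5, K = 20 and a 416-pixel input are illustrative assumptions, as is the use of size-preserving ("same") padding; the patent does not specify them.

```python
def head_channels(num_anchors, num_classes):
    """Number of feature maps in the last layer: N * (K + 5)."""
    return num_anchors * (num_classes + 5)


def feature_map_size(input_size, strides):
    """Spatial size after a stack of 'same'-padded convolutions with the given strides."""
    size = input_size
    for s in strides:
        size = -(-size // s)  # ceiling division, as with 'same' padding
    return size


# 15 convolutional layers; layers 2, 4, 6, 8 and 10 use stride 2, the rest stride 1.
strides = [2 if layer in (2, 4, 6, 8, 10) else 1 for layer in range(1, 16)]

channels = head_channels(num_anchors=5, num_classes=20)
grid = feature_map_size(416, strides)
```

With these assumed values the head has 125 feature maps on a 13 × 13 grid, since five stride-2 layers reduce the resolution by a factor of 32.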

S4. As shown in Fig. 5, an adversary A is constructed. The adversary A is one of the discriminators; its inputs are the real data and the output of the generative model (the generated data), together with the class information. Its output is a binary 0/1 label, where 1 means real (real data) and 0 means fake (generated data). The adversary A uses a residual network structure in which each residual block contains two LeakyReLU layers and 3 × 3 convolutional layers. There are 5 residual blocks, each followed by a downsampling layer. The adversary A uses spectral normalization as its weight normalization technique, which has a low computational cost and can yield generated images of higher or comparable quality relative to other techniques.
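Spectral normalization, as used by the discriminator branch A in S4, divides a weight matrix by its largest singular value. A minimal pure-Python sketch of that idea follows, estimating the singular value by power iteration. The function names are hypothetical, and the many iterations are for clarity only; the patent does not describe how the normalization is implemented.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def spectral_norm(weight, iterations=50):
    """Estimate the largest singular value of weight by power iteration."""
    wt = transpose(weight)
    u = l2_normalize([1.0] * len(weight))
    v = l2_normalize(matvec(wt, u))
    for _ in range(iterations):
        v = l2_normalize(matvec(wt, u))
        u = l2_normalize(matvec(weight, v))
    # sigma = u^T W v
    return sum(ui * wi for ui, wi in zip(u, matvec(weight, v)))


def spectrally_normalize(weight, iterations=50):
    """Return weight / sigma so that its largest singular value is about 1."""
    sigma = spectral_norm(weight, iterations)
    return [[w / sigma for w in row] for row in weight]


w = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(w)
w_sn = spectrally_normalize(w)
```

For a convolutional layer, the kernel would first be reshaped into a two-dimensional matrix before this normalization is applied.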

S5. Throughout the training of the whole network, the generator and the discriminator compete with each other, and the capability of both improves continuously during the alternating training, until the generator produces new data resembling the real data by learning the essential characteristics of the real data. The discriminative model consists of two parts: the adversary A judges whether the input data is real or fake, and the target detector O judges whether the input data can improve target detection accuracy. The discriminative model guides the adjustment of the generative model so that the generated data moves closer to the real data, and it keeps the iterative training process from diverging. In each round the discriminator is trained first and the generator second, and the two are trained alternately; the generator uses Adam as its optimizer and the discriminator uses SGD. The loss functions are as follows:

θ_g denotes the parameters of the generator G. The generator wants its generated data to fool the discriminator into misjudging them as real. θ_g is obtained by optimizing the loss function L_D,

where the loss function L_D consists of two parts, the adversarial loss L_{D_A} and the detection loss L_{D_O}, that is,

L_D = L_{D_A} + λ L_{D_O}. (2)

In the adversarial loss L_{D_A}, z is the input to the generator and y is the class label; the remaining terms are the output of the generator and the probability that the adversary judges a generated sample to be a real sample.

The detection loss L_{D_O} can be divided into three parts: regression loss, target (objectness) loss, and classification loss. Its terms are the target information, class probabilities, and position coordinates output by the target detector, and the class labels and position coordinates output by the agent.
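The combination in equation (2) can be sketched numerically. The exact forms of the two loss terms are not recoverable from this text, so the snippet below uses assumed stand-ins: a negative-log form for the adversarial term, squared error for box regression, binary cross-entropy for objectness, and cross-entropy for classification. All function names and dictionary keys are hypothetical.

```python
import math

EPS = 1e-12


def adversarial_loss(p_real_for_generated):
    """Assumed form: -log of the probability that A judges a generated sample real."""
    return -math.log(max(p_real_for_generated, EPS))


def detection_loss(pred, target):
    """Sum of regression, objectness and classification terms (assumed forms)."""
    regression = sum((p - t) ** 2 for p, t in zip(pred["box"], target["box"]))
    obj = min(max(pred["objectness"], EPS), 1.0 - EPS)
    objectness = -(target["objectness"] * math.log(obj)
                   + (1.0 - target["objectness"]) * math.log(1.0 - obj))
    classification = -math.log(max(pred["class_probs"][target["class_id"]], EPS))
    return regression + objectness + classification


def total_loss(p_real_for_generated, pred, target, lam=1.0):
    """L_D = L_{D_A} + lambda * L_{D_O}, following equation (2)."""
    return adversarial_loss(p_real_for_generated) + lam * detection_loss(pred, target)


# Made-up detector output and agent pseudo ground truth for one box.
pred = {"box": (0.5, 0.5, 0.2, 0.2), "objectness": 0.9, "class_probs": [0.1, 0.8, 0.1]}
target = {"box": (0.5, 0.5, 0.2, 0.2), "objectness": 1.0, "class_id": 1}
loss = total_loss(0.5, pred, target, lam=2.0)
```

The weight λ (here `lam`) sets how strongly the detector's feedback influences the generator relative to the adversarial feedback.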

θ_a denotes the parameters of the adversary A. The adversary aims to judge real data as real and generated data as fake, and the parameters θ_a are obtained by optimizing this objective.

θ_o denotes the parameters of the target detector O. The target detector aims to correctly detect the class and position information of each target, and the parameters θ_o are obtained by optimizing this objective.
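The alternating schedule of S5, with the discriminator updated before the generator in each round, can be sketched as a plain loop. The step functions below are placeholders that return made-up loss values; in a real system they would run an SGD update for the discriminator parameters and an Adam update for the generator parameters. One update of each per round is an assumption.

```python
def train_alternately(num_rounds, discriminator_step, generator_step):
    """Run discriminator and generator updates in alternation, discriminator first."""
    history = []
    for step in range(num_rounds):
        history.append(("D", discriminator_step(step)))  # would update theta_a and theta_o
        history.append(("G", generator_step(step)))      # would update theta_g
    return history


# Placeholder update functions standing in for real optimizer steps.
def fake_discriminator_step(step):
    return 1.0 / (step + 1)


def fake_generator_step(step):
    return 2.0 / (step + 1)


log = train_alternately(3, fake_discriminator_step, fake_generator_step)
order = [name for name, _ in log]
```

Passing the update functions in as arguments keeps the schedule separate from the model and optimizer details.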

S6. In the testing stage, the data to be detected are input into the trained target detector O, which outputs the corresponding class information and bounding box information.

Claims

1. A target detection method based on a generative adversarial network, characterized by comprising the following specific implementation steps:

the target detector O acts as one of the discriminators and is trained alternately with the generator in the training stage; it judges whether the generated data helps improve detection accuracy and provides feedback to the generator; the inputs of the target detector O are the generated samples and the pseudo ground truth produced by the agent, and its outputs are the class and position coordinates of the detected targets; the feature extraction part adopts a fully convolutional architecture with 15 convolutional layers in total, wherein convolutional layers 2, 4, 6, 8, and 10 have 3 × 3 kernels and a stride of 2, and the remaining convolutional layers have a stride of 1; to ensure the stability of the whole network model, all pooling layers are replaced by strided convolutional layers; in addition, to obtain richer detail features, a multi-feature fusion method fuses the lower-layer convolutional features with the higher-layer convolutional features; the structure of the target detector O is similar to SSD and YOLO and belongs to one-stage detection, so that, unlike the RCNN series of detectors, the bounding box coordinates and the classification probabilities are predicted simultaneously as the output of the last layer; each cell of the last-layer feature map predicts N bounding boxes, where N is the number of anchor boxes; the number of feature maps in the last layer is set to N × (K + 5), where K is the number of classes used to predict class probabilities and 5 covers the bounding box coordinates and the objectness value;

the adversary A is one of the discriminators; its inputs are the real data and the output of the generative model, together with the class information, and its output is a binary 0/1 label, where 1 means real, namely real data, and 0 means fake, namely generated data; the adversary A uses a residual network structure in which each residual block comprises two LeakyReLU layers and 3 × 3 convolutional layers; there are 5 residual blocks in total, each followed by a downsampling layer; the adversary A uses spectral normalization as its weight normalization technique, which has a low computational cost and can yield generated images of higher or comparable quality relative to other techniques;

2. The target detection method based on a generative adversarial network according to claim 1, wherein: in step 1, the generator G adopts a conditional GAN structure, a condition variable y, namely the class label, is input, and samples of different classes are generated under the guidance of the class label; the generator G uses a residual network structure comprising 4 residual blocks and upsampling layers, wherein each residual block is a combination of two BN layers, a ReLU layer, and a 3 × 3 convolutional layer, and the output layer of the generator G uses the Tanh activation function.

3. The target detection method based on a generative adversarial network according to claim 1, wherein: the parameters of the generator G in step 1 are θ_g, which are obtained by optimizing the loss function L_D,

where z is the input to the generator, y is the class label, and the remaining adversarial term is the output of the generator;

and where the detection terms are the target information, class probability, and position coordinates output by the target detector, and the target information, class probability, and position coordinates output by the agent.

4. The target detection method based on a generative adversarial network according to claim 1, wherein: the parameters of the adversary A in step 4 are θ_a, that is,

5. The target detection method based on a generative adversarial network according to claim 1, wherein: the parameters of the target detector O in step 3 are θ_o, that is,