CN112434599B - Pedestrian re-identification method based on random occlusion recovery of noise channel - Google Patents

Pedestrian re-identification method based on random occlusion recovery of a noise channel

Info

Publication number
CN112434599B
Authority
CN
China
Prior art keywords
network
pedestrian
noise
data
noise channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011321451.7A
Other languages
Chinese (zh)
Other versions
CN112434599A (en)
Inventor
黄德双
张焜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202011321451.7A priority Critical patent/CN112434599B/en
Publication of CN112434599A publication Critical patent/CN112434599A/en
Priority to JP2021087114A priority patent/JP7136500B2/en
Application granted granted Critical
Publication of CN112434599B publication Critical patent/CN112434599B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a pedestrian re-identification method based on random occlusion recovery with a noise channel, which comprises the following steps. Step 1: after data division and preprocessing of a reference data set, construct a CAN network structure, use it to perform data expansion on the training set obtained from the data division and preprocessing, and train a basic network main body feature extraction structure with the expanded training set to obtain a trained basic network main body feature extraction structure. Step 2: construct a noise channel structure for the label errors. Step 3: build the pedestrian re-identification network based on random occlusion recovery of a noise channel from the trained basic network main body feature extraction structure, the noise channel structure and the CAN network structure. Step 4: identify the actual original image to be detected with this pedestrian re-identification network. Compared with the prior art, the method has the advantages of good network robustness, high accuracy and low error.

Description

Pedestrian re-identification method based on random occlusion recovery of noise channel
Technical Field
The invention relates to the technical field of computer vision, in particular to a pedestrian re-identification method based on random occlusion recovery of a noise channel.
Background
The basic task of a distributed multi-camera surveillance system is to associate people across camera views at different locations and different times. This is called the pedestrian re-identification problem; more specifically, pedestrian re-identification mainly answers the questions "where has a target pedestrian appeared before" and "where did a target pedestrian go after being captured in the monitoring network". It supports many critical applications such as long-term multi-camera tracking and forensic search. In practice, each camera shoots under different illumination conditions, degrees of occlusion, and different static and dynamic backgrounds, from different angles and distances. This presents a number of significant challenges for the pedestrian re-identification task. Meanwhile, re-identification techniques that rely on traditional biometrics such as face recognition are neither feasible nor reliable, since pedestrians observed by cameras at unknown distances are subject to limitations such as crowded backgrounds and low resolution.
The traditional pedestrian re-identification technology mainly comprises two aspects: feature expression and similarity measurement. Common features mainly include color features, texture features, shape features, higher-level attribute features, behavior semantic features and the like. For the similarity measurement, the Euclidean distance was used first, and later some supervised similarity discrimination methods were also proposed.
With the development of deep learning, methods based on deep models now dominate the field of pedestrian re-identification. At the present stage, deep models for pedestrian re-identification are mainly divided into three types: the Identification model, the Verification model and the Triplet model. The Identification model is the same as a classification model on other tasks: given a picture, it outputs the label of the picture, so it can fully utilize the annotation information of a single image. The Verification model takes two pictures as input and outputs whether they show the same pedestrian; it uses only a weak label (the relationship between two pedestrian images) and does not use the annotation information of a single picture. Similarly, the Triplet model takes three pictures as input and pulls the anchor and the positive sample closer while pushing the negative sample away, but the label information of a single picture is likewise not used.
In the aspect of feature extraction, deep models abandon the traditional way of manually designing features; a network model and structural modules are designed based on a convolutional neural network to learn the features automatically. Classical network structures include GoogLeNet, ResNet and DenseNet, among others. Common feature extraction structures include the abstraction structure, the feature pyramid, the attention structure and the like.
Against this background, the invention designs a network model for random occlusion recovery based on a noise channel. Discriminative features (both global and local) and strengthened spatial-relationship learning can be extracted through multi-scale representation learning, while the random batch mask strategy employs a random masking and attention mechanism to mitigate the suppression of local detail features.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a pedestrian re-identification method based on random occlusion recovery of a noise channel.
The purpose of the invention can be realized by the following technical scheme:
A pedestrian re-identification method based on random occlusion recovery of a noise channel comprises the following steps:
Step 1: after data division and preprocessing are carried out on a reference data set, a CAN network structure for occlusion recovery is constructed, data expansion is carried out with the CAN network structure on the training set obtained from the data division and preprocessing, and the basic network main body feature extraction structure is trained with the expanded training set to obtain a trained basic network main body feature extraction structure;
Step 2: a noise channel structure for reducing the label errors caused by the data expansion is constructed;
Step 3: a pedestrian re-identification network based on random occlusion recovery of a noise channel is built from the trained basic network main body feature extraction structure, the noise channel structure and the CAN network structure for occlusion recovery;
Step 4: the actual original image to be detected is identified with the pedestrian re-identification network based on random occlusion recovery of a noise channel.
Further, the step 1 comprises the following sub-steps:
Step 101: dividing a reference data set into a training set and a test set, then randomly extracting picture data from the training set and carrying out the preprocessing operation;
Step 102: constructing a CAN network structure for occlusion recovery and further performing data expansion on the training set by using the CAN network structure;
Step 103: setting the parameters and corresponding formulas required for training the network model;
Step 104: after the setting is finished, inputting the picture data obtained from the preprocessing operation and the data expansion into the basic network main body feature extraction structure for training, to obtain the trained basic network main body feature extraction structure.
Further, the reference data set in step 101 is the Market1501 data set; the preprocessing operation in step 101 comprises horizontal flipping, noise addition or random erasing; the basic network main body feature extraction structure in step 104 is the ResNet50 network structure.
Further, in step 104, while the picture data obtained after the preprocessing operation and the data expansion are input into the basic network main body feature extraction structure for training, the parameters are adjusted automatically with the Adam optimization method, the Dropout strategy is used to avoid over-fitting, and Batch Normalization is used to accelerate the convergence of the network.
Further, step 103 specifically includes: setting the total training period (epoch) to 150, the weight decay parameter to 0.0005 and the batch size to 180, and setting the learning rate update mode, whose description formula is as follows:
(learning rate update formula, rendered as an image in the original document)
in the formula, lr is a learning rate.
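For readers implementing this training step, the following is a minimal PyTorch sketch of the setup described above (Adam optimization with weight decay 0.0005, Dropout, Batch Normalization inside the backbone, batch size 180 and 150 epochs). The exact learning rate schedule only appears as an image in the original document, so the step decay, the initial learning rate and the train_loader/model definitions below are illustrative assumptions, not the patent's specification.

import torch
import torch.nn as nn
from torchvision import models

# Backbone: ResNet50 (BatchNorm built in) with a Dropout layer before the classifier head.
num_ids = 751  # number of pedestrian identities in the Market1501 training set
backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(2048, num_ids))

criterion = nn.CrossEntropyLoss()
# Adam with the weight decay quoted in the patent (0.0005); the lr value is an assumption.
optimizer = torch.optim.Adam(backbone.parameters(), lr=3.5e-4, weight_decay=5e-4)
# Placeholder schedule: the patent's exact lr update rule is an image, so a step decay is assumed.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

def train(model, train_loader, epochs=150, device="cuda"):
    model.to(device).train()
    for epoch in range(epochs):
        for images, labels in train_loader:      # batches of size 180 in the patent's setting
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()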
Further, the CAN network structure for occlusion recovery in step 1 is composed of a generator network for learning the original data set and generating an image and a discriminator network for determining whether the input image is real, i.e. whether the input image belongs to the original data set or is generated by the generator network, and the corresponding mathematical description formula is as follows:
\min_G \max_D \; \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right]
where x is the occlusion image, y is the target image, and D and G represent the discriminator network and the generator network, respectively.
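To illustrate this adversarial objective, a minimal PyTorch training-step sketch follows. The generator and discriminator modules are assumed to be any image-to-image generator and a conditional discriminator that takes the (occluded image, candidate image) pair; their architectures are not specified in this section, so this is a generic sketch rather than the patent's implementation.

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def cgan_step(generator, discriminator, g_opt, d_opt, x_occluded, y_target):
    # One optimization step of the conditional GAN objective:
    # D tries to tell (x, y_real) from (x, G(x)); G tries to fool D.
    d_opt.zero_grad()
    real_logits = discriminator(x_occluded, y_target)
    fake_image = generator(x_occluded).detach()
    fake_logits = discriminator(x_occluded, fake_image)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    d_loss.backward()
    d_opt.step()

    g_opt.zero_grad()
    fake_image = generator(x_occluded)
    g_loss = bce(discriminator(x_occluded, fake_image),
                 torch.ones_like(real_logits))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()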
Further, the process of using the noise channel structure in step 2 to reduce the label errors caused by the data expansion specifically includes:
Step 201: using the noise channel structure, defining a distribution over the transition probability between the original label of the generated image data and the observed noise label;
Step 202: solving the distribution with an EM algorithm to obtain the implicit parameters, and using the implicit parameters to reduce the label errors caused by the data expansion.
Further, the distribution in step 201 is described by the formula:
p(z_t = j \mid x_t; \theta, w) = \begin{cases} p(y_t = j \mid x_t; w), & t \in C \\ \sum_{i=1}^{k} \theta(i, j)\, p(y_t = i \mid x_t; w), & t \in N \end{cases}
in the formula, z is a noise label, N is a noise label set, theta and w are implicit parameters, C is a clean label set, k is the number of categories, and p is the predicted label probability.
Further, the process of obtaining the hidden parameters by using the EM algorithm for the distribution in step 202 includes fixing the hidden parameters θ and w and estimating a transition probability in step E, and updating the parameter θ in step M, where the estimated transition probability corresponds to a description formula:
c_{ti} = p(y_t = i \mid x_t, z_t; \theta, w) = \frac{p(z_t \mid y_t = i; \theta)\, p(y_t = i \mid x_t; w)}{\sum_{j=1}^{k} p(z_t \mid y_t = j; \theta)\, p(y_t = j \mid x_t; w)}
in the formula, c_{ti} is the estimated probability that the true label of the t-th sample is i, y_t is the true label information of the t-th sample, x_t is the t-th input sample, and z_t is the noise label of the t-th input sample;
the corresponding description formula of the update parameter θ is:
\theta(i, j) = \frac{\sum_{t} c_{ti}\, \mathbb{1}[z_t = j]}{\sum_{t} c_{ti}}
in the equation, θ (i, j) is the true transition probability from tag i to tag j.
Further, the objective function adopted in the EM algorithm has a corresponding description formula as follows:
S(w) = \sum_{t \in N} \sum_{i=1}^{k} c_{ti} \log p(y_t = i \mid x_t; w) + \sum_{t \in C} \log p(y_t = z_t \mid x_t; w)
in the formula, S (w) represents an objective function employed in the EM algorithm.
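To make the E step and M step concrete, the NumPy sketch below shows one EM round under the formulation reconstructed above. The inputs pred (the classifier's predicted label probabilities p(y|x; w)), the observed noisy labels z and the transition matrix theta are assumed quantities for illustration, and the update of the classifier parameters w is left to ordinary gradient training on S(w).

import numpy as np

def em_round(pred, z, theta):
    # pred : (n, k) array, p(y = i | x_t; w) from the current classifier
    # z    : (n,)  integer array, observed (possibly noisy) labels of the generated samples
    # theta: (k, k) array, theta[i, j] approximates p(z = j | y = i)
    # Returns the posterior c[t, i] = p(y_t = i | x_t, z_t) and the updated theta.
    n, k = pred.shape
    # E step: posterior over the true label given the observed noisy label.
    c = pred * theta[:, z].T            # c[t, i] proportional to p(z_t | y=i) * p(y=i | x_t)
    c /= c.sum(axis=1, keepdims=True)
    # M step: re-estimate the transition probabilities theta(i, j).
    new_theta = np.zeros_like(theta)
    for j in range(k):
        new_theta[:, j] = c[z == j].sum(axis=0)
    new_theta /= new_theta.sum(axis=1, keepdims=True) + 1e-12
    return c, new_theta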
Compared with the prior art, the invention has the following advantages:
(1) The method uses deep learning. A training-set picture is first preprocessed by operations such as flipping and cropping, features are then extracted with a basic network model (ResNet50), and a random batch mask training strategy and multi-scale representation learning are applied to the high-dimensional features extracted by the ResNet50 network. This yields feature information that is more discriminative, more detailed and contains the spatial relevance of pedestrians, and the network is then trained jointly with a fusion of multiple loss functions.
(2) The invention uses the restored occlusion images to expand the data set and introduces a label noise channel, thereby mitigating the errors brought by the expanded data and improving the robustness of the network.
Drawings
Fig. 1 is a network overall block diagram of a pedestrian re-identification technology based on random occlusion recovery of a noise channel according to an embodiment of the present invention.
Fig. 2 is a network training flowchart of a pedestrian re-identification technology based on random occlusion recovery of a noise channel according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating evaluation of a result of a pedestrian re-identification technique based on random occlusion recovery of a noise channel according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
The invention relates to a pedestrian re-identification technique based on random occlusion recovery with a noise channel, which achieves a more accurate and efficient pedestrian re-identification task on several reference data sets. The pedestrian re-identification task establishes relationships between pedestrian images or video samples collected by different cameras with non-overlapping fields of view, i.e., it identifies whether pedestrians captured by cameras at different positions and at different moments are the same person. Traditional pedestrian re-identification mainly comprises two steps: pedestrian feature expression and pedestrian similarity judgment.
Building on pedestrian re-identification algorithms based on deep learning, the invention provides a person re-identification method based on random occlusion recovery with a noise channel. A random occlusion block is added to the original image, the image is repaired with a GAN model, and the repaired images are then used to expand the original training set. The baseline model is trained with the enhanced data set, and the label errors of the augmented images are mitigated by the noise channel.
Practical embodiments
1. Basic technical scheme
The invention relates to a pedestrian re-identification technique based on random occlusion recovery of a noise channel; as shown in figure 1, its main implementation depends on the following parts:
1) Dividing a training set and a test set for an original data set;
2) A basic network main body feature extraction structure;
3) A noise channel structure;
4) A GAN network structure for occlusion recovery;
5) Network hyper-parameter adjustment, including the iteration step size adjustment method, the initial iteration step size, the choice of learning functions and the like;
6) Selecting a loss function, wherein different loss functions are used for different structures;
7) The whole technical method is written based on PyTorch, python and some auxiliary libraries.
Step 1) of the above 7 steps specifically includes: dividing the reference data set into a training set and a test set. Taking the Market1501 data set as an example, 12,936 pictures of 751 pedestrian IDs are used as the training set, and 19,732 pictures of the other 750 pedestrian IDs plus some background pictures are used as the test set.
On this basis, the data set is processed further: part of the training set is split off as a validation set, in order to monitor the training process and effectively obtain the optimal state. The test set is divided into two parts, query and gallery.
The trained network is used to extract features from the pictures in the query set and the candidate (gallery) set, the pairwise Euclidean distances between the extracted features are computed, and the results are sorted by distance, yielding the candidate-set pictures closest to each query picture.
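A minimal sketch of this retrieval step is given below; extract_features is a hypothetical helper that runs the trained backbone over a set of images and returns one feature vector per image.

import torch

def rank_gallery(query_feats, gallery_feats):
    # query_feats: (q, d) tensor, gallery_feats: (g, d) tensor.
    # Returns, for every query, gallery indices sorted by Euclidean distance.
    dist = torch.cdist(query_feats, gallery_feats, p=2)   # pairwise Euclidean distances
    return dist.argsort(dim=1)                            # closest gallery images first

# usage (hypothetical helpers):
# query_feats = extract_features(model, query_images)
# gallery_feats = extract_features(model, gallery_images)
# ranking = rank_gallery(query_feats, gallery_feats)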
Step 2) of the above 7 steps specifically includes: selecting a mature, well-performing network for the experiments and for exploring and comparing results. The ResNet50 network structure is adopted; ResNet learns residuals through shortcut connections, which alleviates the degradation problem that appears as the network depth increases.
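To illustrate the shortcut connection mentioned here, a minimal residual block in PyTorch is sketched below; it is a generic basic block, not the exact bottleneck block used inside ResNet50.

import torch.nn as nn

class ResidualBlock(nn.Module):
    # y = F(x) + x: the block only has to learn the residual F(x).
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)   # shortcut (identity) connection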
Step 3) of the above 7 steps specifically includes: for a generated image, the original label cannot be directly regarded as a genuine label. For the observed noise label, the transition probabilities between the noise label and the real label need to be learned; among all training images, the labels of the original data are considered clean, while the labels of the generated data are noisy. For the observed labels, a distribution is given and the hidden parameters are solved with the EM algorithm;
step 4) in the above 7 steps, specifically including: the generation countermeasure network (GAN) adopts the idea of two-person zero-sum game, and is composed of two parts, namely a generation network and a discrimination network. The GAN is used to learn the original data set and generate an image, while the discriminator network is used to determine whether the input image is authentic (original data set) or counterfeit (generated by the generator network). Both networks are trained simultaneously. The purpose is to make the discrimination model unable to distinguish the authenticity of the generated image. In the technical scheme of the invention, using the condition GAN [15], the mathematical expression of the optimization target is as follows:
\min_G \max_D \; \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right]
where x is the occlusion image, y is the target image, and D and G represent the discriminator network and the generator network, respectively.
In the technical scheme of the invention, aiming at a ResNet50 network structure, in order to solve the difficulty of SGD parameter selection, an Adam optimization method is used for automatically adjusting parameters. The strategy of Dropout is used to avoid the occurrence of the over-fitting condition, and Batch Normalization is used to accelerate the convergence speed of the network.
The method is characterized in that a total training period (epoch) is set to be 150, a weight decay parameter (weight decay) is 0.0005, a batch size (batch size) is 180, and a learning rate updating mode is as follows:
(learning rate update formula, rendered as an image in the original document)
in the formula, lr is a learning rate.
Step 7) of the above 7 steps specifically includes: PyTorch uses dynamic computation graphs, which makes it easy to implement the proposed network construction.
2. Practical implementation
The embodiment of the invention is realized as follows: a pedestrian re-identification technique based on random occlusion recovery of a noise channel comprises the following steps:
the reference data set needs data preprocessing for data expansion, and the following data processing modes are used
1) Randomly extracting a plurality of pictures in a data set and adding Gaussian noise processing
2) Randomly extracting a plurality of pictures from the data set and adding a rectangular occlusion block at a random position on each picture, with the length and width of the block randomly chosen between 2 cm and 5 cm. To make the rectangle occlude the pedestrian as much as possible, the image is divided into three columns from left to right and the center of the rectangle is chosen randomly within the middle column. The pixel values of the R, G and B channels of the occlusion block are chosen among 0, 255 and the mean values of the data set; on the Market-1501 data set the pixel means are 89.3, 102.5 and 98.7. The occlusion images are restored by CycleGAN.
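The occlusion step can be sketched as follows. The block size range in pixels and the per-channel fill convention are assumptions for illustration, since the patent specifies the block size in centimetres and only states that the fill value is chosen among 0, 255 and the dataset means.

import random
import numpy as np

MARKET1501_MEAN = (89.3, 102.5, 98.7)   # per-channel pixel means quoted for Market-1501

def add_random_occlusion(img, min_size=20, max_size=50):
    # img: HxWx3 uint8 array. Places a rectangular occlusion block whose centre
    # falls in the middle third of the image, filled with 0, 255 or the dataset mean.
    h, w, _ = img.shape
    bh = random.randint(min_size, max_size)
    bw = random.randint(min_size, max_size)
    cx = random.randint(w // 3, 2 * w // 3)          # centre restricted to the middle column
    cy = random.randint(bh // 2, h - bh // 2)
    fill = random.choice([(0, 0, 0), (255, 255, 255), MARKET1501_MEAN])
    y0, y1 = max(cy - bh // 2, 0), min(cy + bh // 2, h)
    x0, x1 = max(cx - bw // 2, 0), min(cx + bw // 2, w)
    out = img.copy()
    out[y0:y1, x0:x1] = np.array(fill, dtype=img.dtype)
    return out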
A number of pictures are also randomly extracted from the training data for horizontal flipping, noise addition, random erasing and other processing. Meanwhile, for the 6 cameras of the Market1501 data set, CycleGAN is used to transfer the camera style between images from different cameras, which multiplies the size of the data set.
After the data set has been organized and processed as described above, the images are input into a convolutional neural network (ResNet50) for feature extraction. ResNet50 is used as the reference network model for reasons of parameter count and time, and since Market1501 is a pedestrian data set with a relatively large data volume, a network model pre-trained on ImageNet is used for the extraction.
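As a sketch of this feature-extraction stage, the classifier head of an ImageNet-pretrained ResNet50 can be removed so that the network outputs a 2048-dimensional feature per image; the input resolution and normalization constants below follow common ImageNet preprocessing and are assumptions, not values given in the patent.

import torch
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pretrained ResNet50 with the final classifier removed -> 2048-d features.
resnet50 = models.resnet50(pretrained=True)
feature_extractor = nn.Sequential(*list(resnet50.children())[:-1])
feature_extractor.eval()

preprocess = transforms.Compose([
    transforms.Resize((256, 128)),              # common pedestrian aspect ratio (assumption)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(pil_image):
    x = preprocess(pil_image).unsqueeze(0)       # (1, 3, H, W)
    return feature_extractor(x).flatten(1)       # (1, 2048)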
For the whole network, joint training is performed by fusing an identification loss with a weighted list loss, and the whole model contains a feature learning structure with three branches. A feature map of the picture is extracted through each branch, and network training and weight updating are then performed through the combined loss.
For the label noise channel, the original label of a generated image cannot be directly regarded as a genuine label; for the observed noise label, the transition probabilities between the noise label and the real label need to be learned;
the tag of the original data is considered clean, while the tag of the generated data is noisy. For the observation tags, the following distributions are defined:
p(z_t = j \mid x_t; \theta, w) = \begin{cases} p(y_t = j \mid x_t; w), & t \in C \\ \sum_{i=1}^{k} \theta(i, j)\, p(y_t = i \mid x_t; w), & t \in N \end{cases}
in the formula, z is a noise label, N is a noise label set, theta and w are implicit parameters, C is a clean label set, k is the number of categories, and p is the predicted label probability.
Given the distribution, the implicit parameters are computed with an EM algorithm. In the E step, the parameters are fixed and the transition probabilities are estimated:
c_{ti} = p(y_t = i \mid x_t, z_t; \theta, w) = \frac{p(z_t \mid y_t = i; \theta)\, p(y_t = i \mid x_t; w)}{\sum_{j=1}^{k} p(z_t \mid y_t = j; \theta)\, p(y_t = j \mid x_t; w)}
in the formula, c_{ti} is the estimated probability that the true label of the t-th sample is i, y_t is the true label information of the t-th sample, x_t is the t-th input sample, and z_t is the noise label of the t-th input sample;
in step M, updating parameters:
\theta(i, j) = \frac{\sum_{t} c_{ti}\, \mathbb{1}[z_t = j]}{\sum_{t} c_{ti}}
in the equation, θ (i, j) is the true transition probability from tag i to tag j.
Finally, the objective function can be expressed as:
S(w) = \sum_{t \in N} \sum_{i=1}^{k} c_{ti} \log p(y_t = i \mid x_t; w) + \sum_{t \in C} \log p(y_t = z_t \mid x_t; w)
in the formula, S (w) represents an objective function employed in the EM algorithm.
The invention achieves the best recognition results at the current stage on the Market-1501 data set; the results on the Market-1501 data set are shown in Table 1.
TABLE 1 comparison of experiments on Market-1501 data set
(Table 1 is rendered as an image in the original document.)
As shown in FIG. 3, the evaluation shows that the pedestrian re-identification technique based on random occlusion recovery of a noise channel proposed by the invention reaches an mAP of 70.1, a rank-1 accuracy of 86.6 and a rank-5 accuracy of 94.6 on the Market1501 data set (without re-ranking). Good experimental results are also obtained on other data sets.
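For reference, rank-k accuracy and mAP can be computed from a distance-sorted gallery ranking as in the simplified sketch below; it ignores the same-camera and junk-image filtering normally applied in the official Market1501 evaluation protocol, which is an assumption made for brevity.

import numpy as np

def evaluate(ranking, query_ids, gallery_ids, ks=(1, 5)):
    # ranking: (q, g) array of gallery indices sorted by ascending distance.
    cmc = {k: 0.0 for k in ks}
    aps = []
    for q, order in enumerate(ranking):
        matches = (gallery_ids[order] == query_ids[q]).astype(np.float32)
        for k in ks:
            cmc[k] += float(matches[:k].any())
        hits = np.where(matches)[0]
        if len(hits) == 0:
            continue
        precisions = np.arange(1, len(hits) + 1) / (hits + 1)   # precision at each correct match
        aps.append(precisions.mean())
    n = len(ranking)
    return {f"rank-{k}": cmc[k] / n for k in ks}, float(np.mean(aps))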
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A pedestrian re-identification method based on random occlusion recovery of a noise channel, characterized by comprising the following steps:
Step 1: after data division and preprocessing are carried out on a reference data set, a CAN network structure for occlusion recovery is constructed, data expansion is carried out with the CAN network structure on the training set obtained from the data division and preprocessing, and the basic network main body feature extraction structure is trained with the expanded training set to obtain a trained basic network main body feature extraction structure;
Step 2: a noise channel structure for reducing the label errors caused by the data expansion is constructed;
Step 3: a pedestrian re-identification network based on random occlusion recovery of a noise channel is built from the trained basic network main body feature extraction structure, the noise channel structure and the CAN network structure for occlusion recovery;
Step 4: the actual original image to be detected is identified with the pedestrian re-identification network based on random occlusion recovery of a noise channel;
the process of using the noise channel structure in step 2 to reduce the label errors caused by the data expansion specifically comprises:
Step 201: using the noise channel structure, defining a distribution over the transition probability between the original label of the generated image data and the observed noise label;
Step 202: obtaining the hidden parameters for the distribution with an EM algorithm, and using the hidden parameters to reduce the label errors caused by the data expansion;
the distribution in step 201 is described by the formula:
p(z_t = j \mid x_t; \theta, w) = \begin{cases} p(y_t = j \mid x_t; w), & t \in C \\ \sum_{i=1}^{k} \theta(i, j)\, p(y_t = i \mid x_t; w), & t \in N \end{cases}
in the formula, z is a noise label, N is a noise label set, theta and w are implicit parameters, C is a clean label set, k is the number of categories, and p is the probability of a predicted label;
in the step 202, the process of obtaining the hidden parameters by using the EM algorithm for the distribution includes fixing the hidden parameters θ and w and estimating the transition probability in step E, and updating the parameter θ in step M, where the estimated transition probability corresponds to a description formula:
c_{ti} = p(y_t = i \mid x_t, z_t; \theta, w) = \frac{p(z_t \mid y_t = i; \theta)\, p(y_t = i \mid x_t; w)}{\sum_{j=1}^{k} p(z_t \mid y_t = j; \theta)\, p(y_t = j \mid x_t; w)}
in the formula, c_{ti} is the estimated probability that the true label of the t-th sample is i, y_t is the true label information of the t-th input sample, x_t is the t-th input sample, and z_t is the noise label of the t-th input sample;
the corresponding description formula of the update parameter θ is:
\theta(i, j) = \frac{\sum_{t} c_{ti}\, \mathbb{1}[z_t = j]}{\sum_{t} c_{ti}}
where θ (i, j) is the true transition probability from tag i to tag j;
the objective function adopted in the EM algorithm has a corresponding description formula as follows:
S(w) = \sum_{t \in N} \sum_{i=1}^{k} c_{ti} \log p(y_t = i \mid x_t; w) + \sum_{t \in C} \log p(y_t = z_t \mid x_t; w)
in the formula, S (w) represents an objective function employed in the EM algorithm.
2. The pedestrian re-identification method based on random occlusion recovery of a noise channel as claimed in claim 1, wherein the step 1 comprises the sub-steps of:
Step 101: dividing a reference data set into a training set and a test set, then randomly extracting picture data from the training set and carrying out the preprocessing operation;
Step 102: constructing a CAN network structure for occlusion recovery and further performing data expansion on the training set by using the CAN network structure;
Step 103: setting the parameters and corresponding formulas required for training the network model;
Step 104: after the setting is finished, inputting the picture data obtained from the preprocessing operation and the data expansion into the basic network main body feature extraction structure for training, to obtain the trained basic network main body feature extraction structure.
3. The pedestrian re-identification method based on random occlusion recovery of a noise channel as claimed in claim 2, wherein the reference data set in step 101 is the Market1501 data set; the preprocessing operation in step 101 comprises horizontal flipping, noise addition or random erasing; and the basic network main body feature extraction structure in step 104 is the ResNet50 network structure.
4. The pedestrian re-identification method based on random occlusion recovery of the noise channel as claimed in claim 2, wherein in the step 104, the picture data after the preprocessing operation and the data expansion is input to the basic network main body feature extraction structure for training, parameters are automatically adjusted by using an Adam optimization method, the Dropout strategy is used to avoid the occurrence of the over-fitting condition, and the Batch Normalization is used to accelerate the convergence speed of the network.
5. The method according to claim 2, wherein step 103 specifically comprises: setting the total training period (epoch) to 150, the weight decay parameter to 0.0005 and the batch size to 180, and setting the learning rate update mode, whose description formula is:
(learning rate update formula, rendered as an image in the original document)
in the formula, lr is a learning rate.
6. The method according to claim 1, wherein the CAN network structure for occlusion recovery in step 1 comprises a generator network for learning an original data set and generating an image, and a discriminator network for determining whether an input image is real, that is, whether the input image belongs to the original data set or is generated by the generator network, and the corresponding mathematical description formula is as follows:
\min_G \max_D \; \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right]
where x is the occlusion image, y is the target image, and D and G represent the discriminator network and the generator network, respectively.
CN202011321451.7A 2020-11-23 2020-11-23 Pedestrian re-identification method based on random occlusion recovery of noise channel Active CN112434599B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011321451.7A CN112434599B (en) 2020-11-23 2020-11-23 Pedestrian re-identification method based on random occlusion recovery of noise channel
JP2021087114A JP7136500B2 (en) 2020-11-23 2021-05-24 Pedestrian Re-identification Method for Random Occlusion Recovery Based on Noise Channel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011321451.7A CN112434599B (en) 2020-11-23 2020-11-23 Pedestrian re-identification method based on random occlusion recovery of noise channel

Publications (2)

Publication Number Publication Date
CN112434599A CN112434599A (en) 2021-03-02
CN112434599B true CN112434599B (en) 2022-11-18

Family

ID=74693648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011321451.7A Active CN112434599B (en) 2020-11-23 2020-11-23 Pedestrian re-identification method based on random occlusion recovery of noise channel

Country Status (2)

Country Link
JP (1) JP7136500B2 (en)
CN (1) CN112434599B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239782B (en) * 2021-05-11 2023-04-28 广西科学院 Pedestrian re-recognition system and method integrating multi-scale GAN and tag learning
TWI779760B (en) * 2021-08-04 2022-10-01 瑞昱半導體股份有限公司 Method of data augmentation and non-transitory computer-readable medium
CN113742775B (en) * 2021-09-08 2023-07-28 哈尔滨工业大学(深圳) Image data security detection method, system and storage medium
CN115909464B (en) * 2022-12-26 2024-03-26 淮阴工学院 Self-adaptive weak supervision tag marking method for pedestrian re-identification

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693723A (en) * 2012-04-01 2012-09-26 北京安慧音通科技有限责任公司 Method and device for recognizing speaker-independent isolated word based on subspace

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334848B (en) * 2018-02-06 2020-12-25 哈尔滨工业大学 Tiny face recognition method based on generation countermeasure network
KR101941994B1 (en) * 2018-08-24 2019-01-24 전북대학교산학협력단 System for pedestrian detection and attribute extraction based on a joint deep network
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth
CN109977841A (en) * 2019-03-20 2019-07-05 中南大学 A kind of face identification method based on confrontation deep learning network
CN110135366B (en) * 2019-05-20 2021-04-13 厦门大学 Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN110443203B (en) * 2019-08-07 2021-10-15 中新国际联合研究院 Confrontation sample generation method of face spoofing detection system based on confrontation generation network
CN111126360B (en) * 2019-11-15 2023-03-24 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN110929679B (en) * 2019-12-05 2023-06-16 杭州电子科技大学 GAN-based unsupervised self-adaptive pedestrian re-identification method
CN111666800A (en) * 2019-12-23 2020-09-15 珠海大横琴科技发展有限公司 Pedestrian re-recognition model training method and pedestrian re-recognition method
CN111259850B (en) * 2020-01-23 2022-12-16 同济大学 Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN111310728B (en) * 2020-03-16 2022-07-15 中国科学技术大学 Pedestrian re-identification system based on monitoring camera and wireless positioning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693723A (en) * 2012-04-01 2012-09-26 北京安慧音通科技有限责任公司 Method and device for recognizing speaker-independent isolated word based on subspace

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Semi-supervised learning methods; Liu Jianwei; Chinese Journal of Computers; 2015-08-31; pp. 1832-1842 *
Structured data table generation model based on generative adversarial networks; Song Kehui et al.; Journal of Computer Research and Development; 2019-12-31; pp. 1593-1618 *

Also Published As

Publication number Publication date
JP2022082493A (en) 2022-06-02
CN112434599A (en) 2021-03-02
JP7136500B2 (en) 2022-09-13

Similar Documents

Publication Publication Date Title
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
CN112434599B (en) Pedestrian re-identification method based on random occlusion recovery of noise channel
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN109598268A (en) A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN111611905A (en) Visible light and infrared fused target identification method
CN113673510B (en) Target detection method combining feature point and anchor frame joint prediction and regression
CN112668483B (en) Single-target person tracking method integrating pedestrian re-identification and face detection
CN110909741A (en) Vehicle re-identification method based on background segmentation
Zhang et al. License plate localization in unconstrained scenes using a two-stage CNN-RNN
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN112801019B (en) Method and system for eliminating re-identification deviation of unsupervised vehicle based on synthetic data
CN112926522B (en) Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN113963399A (en) Personnel trajectory retrieval method and device based on multi-algorithm fusion application
Han et al. A method based on multi-convolution layers joint and generative adversarial networks for vehicle detection
CN111462173B (en) Visual tracking method based on twin network discrimination feature learning
CN114782997A (en) Pedestrian re-identification method and system based on multi-loss attention adaptive network
Anwer et al. Accident vehicle types classification: a comparative study between different deep learning models
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
Zhang [Retracted] Sports Action Recognition Based on Particle Swarm Optimization Neural Networks
KR20010050988A (en) Scale and Rotation Invariant Intelligent Face Detection
CN115393788B (en) Multi-scale monitoring pedestrian re-identification method based on global information attention enhancement
Lai et al. Robust text line detection in equipment nameplate images
CN113627380A (en) Cross-vision-field pedestrian re-identification method and system for intelligent security and early warning
CN113052875A (en) Target tracking algorithm based on state perception template updating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant