CN112101217A - Pedestrian re-identification method based on semi-supervised learning - Google Patents

Pedestrian re-identification method based on semi-supervised learning Download PDF

Info

Publication number
CN112101217A
CN112101217A (application CN202010970306.5A; granted as CN112101217B)
Authority
CN
China
Prior art keywords
samples
sample
pedestrian
new
semi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010970306.5A
Other languages
Chinese (zh)
Other versions
CN112101217B (en)
Inventor
葛永新 (Ge Yongxin)
高志顺 (Gao Zhishun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhenjiang Qidi Digital World Technology Co ltd
Original Assignee
Zhenjiang Qidi Digital World Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhenjiang Qidi Digital World Technology Co ltd filed Critical Zhenjiang Qidi Digital World Technology Co ltd
Priority to CN202010970306.5A priority Critical patent/CN112101217B/en
Publication of CN112101217A publication Critical patent/CN112101217A/en
Application granted granted Critical
Publication of CN112101217B publication Critical patent/CN112101217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322 Rendering the within-class scatter matrix non-singular
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322 Rendering the within-class scatter matrix non-singular
    • G06F18/21328 Rendering the within-class scatter matrix non-singular involving subspace restrictions, e.g. nullspace techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Software Systems (AREA)
  • Social Psychology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on semi-supervised learning, comprising the following steps. S100: learn a projection matrix U ∈ R^(d×c) that projects the original d-dimensional feature space onto a c-dimensional subspace, so that U^T·X ∈ R^(c×N) satisfies, in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is small, and the Euclidean distance between sample pairs from different pedestrians is large; samples from the same pedestrian are defined as similar samples, and samples from different pedestrians are defined as dissimilar samples. S200: project a new sample into the new subspace with the projection matrix U ∈ R^(d×c) to obtain a predicted sample sequence, arranged from small to large by the Euclidean distance between the new sample and the samples in the training sample set. The method makes full use of the accurately labeled samples and, through a contrastive loss function, fully exploits negative sample pairs while constraining positive ones; identification is fast and the identification accuracy is high.

Description

Pedestrian re-identification method based on semi-supervised learning
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a pedestrian re-identification method based on semi-supervised learning.
Background
Pedestrian re-identification (person re-identification, also called person re-ID) is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence. It is widely considered a sub-problem of image retrieval: given a monitored pedestrian image, images of that pedestrian are retrieved across devices. The technique overcomes the visual limitations of a fixed camera, can be combined with pedestrian detection and pedestrian tracking, and is widely applicable to intelligent video surveillance, intelligent security, and related fields.
Although computer vision practitioners have recently proposed many algorithms for pedestrian re-identification from different perspectives, attempting to raise recognition rates on public data sets, pedestrian re-identification remains a very challenging task because of several real-world factors.
At present, semi-supervised learning is commonly used to tackle the pedestrian re-identification task, roughly as follows: first, unlabeled samples are labeled automatically; second, the labeled samples and the automatically labeled samples are trained together, optimizing the model and giving it better discrimination ability. This use of labeled samples and automatically labeled unlabeled samples has two problems:
First, the usual method for automatically labeling unlabeled samples applies a K-Nearest Neighbor (KNN) algorithm in the newly mapped space. If the learned new space is not discriminative enough, the automatic labels carry large errors. When data with large labeling errors are used for training, the model is unlikely to generalize better merely because there are more training samples; instead, its discrimination ability deteriorates.
Second, during training only the positive sample pairs in the training set are constrained and the negative sample pairs are ignored, so the training samples are not fully utilized.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is as follows: in existing methods that use semi-supervised learning for pedestrian re-identification, automatic labeling errors are easily amplified by a poorly learned new space, and the training samples are insufficiently utilized.
In order to solve the technical problems, the invention adopts the following technical scheme: the pedestrian re-identification method based on semi-supervised learning comprises the following steps:
S100: learning a projection matrix U ∈ R^(d×c) that projects the original d-dimensional feature space onto a c-dimensional subspace, so that U^T·X ∈ R^(c×N) satisfies, in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is small, and the Euclidean distance between sample pairs from different pedestrians is large; samples from the same pedestrian are defined as similar samples, and samples from different pedestrians are defined as dissimilar samples;
S200: projecting a new sample into the new subspace with the projection matrix U ∈ R^(d×c) to obtain a predicted sample sequence, arranged from small to large by the Euclidean distance between the new sample and the samples in the training sample set.
Preferably, learning the projection matrix U ∈ R^(d×c) in S100 specifically comprises the following steps:
s110, establishing a training sample set, wherein the training sample set comprises a plurality of samples, the plurality of samples comprise labeled samples and unlabeled samples, and the labels of the samples of the same pedestrian in the labeled samples are the same;
Let X = [X_L, X_U] ∈ R^(d×N) denote all training samples, where N is the total number of pictures in the training set and d is the length of the feature vector; X_L = {x_i^L, i = 1, ..., N_L} denotes the N_L labeled samples, and X_U = {x_i^U, i = 1, ..., N_U} denotes the N_U unlabeled samples;
S120, establishing the objective function:

min_U L(U) + α·L_W(U) + λ·Ω(U)   (1)

where L(U) is a regression function, L_W(U) is a weighted regression function, Ω(U) is a regularization constraint, and α, λ > 0 are balance coefficients;
S130, taking the labeled-sample loss function to be a contrastive loss function: for the N_P sampled sample pairs (x_n^1, x_n^2), if x_n^1 and x_n^2 are samples of the same pedestrian, then the Euclidean distance d_n between U^T·x_n^1 and U^T·x_n^2 in the new projection space should be as small as possible, close to 0; otherwise d_n should be at least greater than a preset threshold margin > 0; whenever these conditions are not met, a loss is incurred;
S140, labeling the unlabeled samples by the K mutual nearest neighbor method, with the unlabeled-sample loss function:

L_W(U) = Σ_{i,j} W_ij·||U^T·x_i − U^T·x_j||_2²   (7)

where W_ij = cos(U^T·x_i, U^T·x_j) if U^T·x_i and U^T·x_j are K mutual nearest neighbors and x_i and x_j come from different cameras; otherwise W_ij = 0   (8);
after the unlabeled samples have been labeled, they are used to further constrain the existing subspace, the weight of the constraint being the cosine distance between the two samples in the new projection space;
S150: the regularization term: the projection matrix U is constrained with the L_{2,1} norm:

Ω(U) = ||U||_{2,1}   (4).
Preferably, the labeled-sample loss function of S130 is:

L(U) = (1/(2·N_P))·Σ_{n=1}^{N_P} [ y_n·d_n² + (1 − y_n)·max(margin − d_n, 0)² ]   (2)

where:

d_n = ||U^T·x_n^1 − U^T·x_n^2||_2, and y_n = 1 if x_n^1 and x_n^2 come from the same pedestrian, y_n = 0 otherwise   (3).
preferably, N sampled in S130PThe sampling strategy of each sample is a sampling strategy for maximizing the top-k recognition rate, namely for each image, all samples of k nearest neighbors are sampled.
Preferably, in S140 the unlabeled samples are labeled with the K mutual nearest neighbor method as follows:
the K nearest neighbors N (x, K) of sample x are defined as follows:
N(x,k)={x1,x2,...,xk},|N(p,k)|=k (5);
where | represents the number of samples in the set, then K is defined as the mutual nearest R (x, K) as follows:
R(x,k)={xi|(xi∈N(x,k))∧(x∈N(xi,k))} (6)。
compared with the prior art, the invention has at least the following advantages:
(1) By using K mutual nearest neighbors, the automatic labeling of unlabeled samples becomes more reliable.
(2) The accurately labeled samples are fully exploited. By using the contrastive loss function common in deep neural network training, negative sample pairs can be fully utilized while positive pairs are constrained. It should be noted that any identification or classification loss may serve as an alternative labeled-sample loss function.
(3) To allow the model to be migrated to a deep model later, an end-to-end training mode is used, with stochastic gradient descent as the training strategy. A batch-generation strategy that maximizes the top-k recognition rate is proposed, which alleviates the slow convergence of pairwise training with random batches and prevents model overfitting.
Drawings
Fig. 1 illustrates the K mutual nearest neighbor sampling strategy for the pedestrian re-identification problem. First row: a picture to be retrieved and its 10 nearest neighbors, where P1-P4 are positive samples and N1-N6 are negative samples. Second row: every two columns are the 10 nearest neighbors of the corresponding first-row image. The thick-line rectangular frame without chamfers and the thin-line rectangular frame with chamfers mark the retrieved picture and the positive sample pictures, respectively.
Fig. 2: the negative sample closest to the image to be retrieved is the hardest negative sample; the first positive sample ranked just below the hardest negative is a moderate positive sample; the samples in the boxes illustrate the sampling strategy used herein.
Fig. 3 proper positive sample sampling.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1-3, the pedestrian re-identification method based on semi-supervised learning includes the following steps:
S100: learning a projection matrix U ∈ R^(d×c) that projects the original d-dimensional feature space onto a c-dimensional subspace, so that U^T·X ∈ R^(c×N) satisfies, in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is small, and the Euclidean distance between sample pairs from different pedestrians is large; samples from the same pedestrian are defined as homogeneous samples, and samples from different pedestrians are defined as heterogeneous samples.
Learning the projection matrix U ∈ R^(d×c) specifically comprises the following steps:
s110, establishing a training sample set, wherein the training sample set comprises a plurality of samples, the plurality of samples comprise labeled samples and unlabeled samples, and the labels of the samples of the same pedestrian in the labeled samples are the same;
Let X = [X_L, X_U] ∈ R^(d×N) denote all training samples, where N is the total number of pictures in the training set and d is the length of the feature vector; X_L = {x_i^L, i = 1, ..., N_L} denotes the N_L labeled samples, and X_U = {x_i^U, i = 1, ..., N_U} denotes the N_U unlabeled samples;
S120, establishing the objective function:

min_U L(U) + α·L_W(U) + λ·Ω(U)   (1)

where L(U) is a regression function whose goal is that, in the new mapped space, labeled sample pairs with the same label lie closer together and uncorrelated labeled sample pairs lie farther apart; L_W(U) is a weighted regression function that improves the model's discrimination by using unlabeled samples; Ω(U) is a regularization constraint that selects the more discriminative features from the original feature space while avoiding overfitting; and α, λ are balance coefficients;
S130, the labeled-sample loss function: the purpose of this constraint is to fully use the label information of the labeled samples. To constrain positive and negative sample pairs simultaneously, the contrastive loss function common in training deep neural networks is used:

L(U) = (1/(2·N_P))·Σ_{n=1}^{N_P} [ y_n·d_n² + (1 − y_n)·max(margin − d_n, 0)² ]   (2)

where:

d_n = ||U^T·x_n^1 − U^T·x_n^2||_2, and y_n = 1 if x_n^1 and x_n^2 come from the same pedestrian, y_n = 0 otherwise   (3).

For the N_P sampled sample pairs (x_n^1, x_n^2): if x_n^1 and x_n^2 are samples of the same pedestrian, then the Euclidean distance d_n between U^T·x_n^1 and U^T·x_n^2 in the new projection space should be as small as possible, close to 0; otherwise d_n should be at least greater than a preset threshold margin > 0; whenever these conditions are not met, a loss is incurred;
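The contrastive loss above can be sketched in NumPy; the function name, array layout, and default margin are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def contrastive_loss(U, x1, x2, y, margin=0.5):
    """Contrastive loss over N_P sample pairs (illustrative sketch).

    U  : (d, c) projection matrix
    x1 : (N_P, d) first sample of each pair
    x2 : (N_P, d) second sample of each pair
    y  : (N_P,) 1 if the pair shows the same pedestrian, else 0
    """
    # Euclidean distance d_n between the projected samples U^T x_n^1 and U^T x_n^2
    d = np.linalg.norm(x1 @ U - x2 @ U, axis=1)
    # positive pairs: pull d_n toward 0; negative pairs: push d_n past the margin
    loss = y * d**2 + (1 - y) * np.maximum(margin - d, 0.0)**2
    return loss.mean() / 2.0
```

A same-pedestrian pair contributes d_n², so it is pulled together; a different-pedestrian pair contributes only when its distance is still below the margin, exactly as described above.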
S140, labeling the unlabeled samples by the K mutual nearest neighbor method. To use the discriminative information of the unlabeled samples effectively while reducing the harm that wrong labels do to the model, K mutual nearest neighbors are used instead of plain K nearest neighbors for labeling, and only positive sample pairs are constrained in this term. The specific loss function is:

L_W(U) = Σ_{i,j} W_ij·||U^T·x_i − U^T·x_j||_2²   (7)

where W_ij = cos(U^T·x_i, U^T·x_j) if U^T·x_i and U^T·x_j are K mutual nearest neighbors and x_i and x_j come from different cameras; otherwise W_ij = 0   (8).
The significance of this term is that, in the learned discriminative subspace, K mutual nearest neighbor sample pairs are most likely to come from the same pedestrian. After the unlabeled samples have been labeled, the newly labeled samples further constrain the existing subspace, the weight of each constraint being the cosine distance between the two samples in the new projection space.
S150: the regularization term: its purpose is to make the learned projection matrix sparser while avoiding overfitting. The L_{2,1} norm is used to constrain the projection matrix U:

Ω(U) = ||U||_{2,1}   (4).
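As an illustrative sketch (the function name is an assumption), the L_{2,1} norm of equation (4) sums the Euclidean norms of the rows of U, so penalizing it drives whole rows, i.e. whole original feature dimensions, to zero:

```python
import numpy as np

def l21_norm(U):
    """L2,1 norm: sum of the Euclidean norms of the rows of U.

    Each row of U corresponds to one original feature dimension, so
    this penalty encourages row-wise sparsity and thereby selects a
    subset of discriminative features.
    """
    return np.sqrt((U**2).sum(axis=1)).sum()
```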
as an improvement, N sampled in S130PThe sampling strategy of each sample is a sampling strategy for maximizing the top-k recognition rate, namely for each image, all samples of k nearest neighbors are sampled. Therefore, the judgment information of the labeled sample can be utilized to the maximum extent while over-fitting is avoided.
When stochastic gradient descent is used for optimization, all samples must be fed to the model in batches. With purely random sampling, a small number of classes is drawn each time, two images per class, and every sample pair that the images in a batch can form participates in the loss computation. Although many pairs can thus be computed at once, the randomness of class sampling means the resulting update direction may not be the one that decreases the objective fastest. After each optimization step, the distances of all samples under the current model are recomputed. To make the objective decrease faster, only the hardest negative pair under the current model is selected for each image, as in Fig. 2.
Note that some positive sample pairs have too large an intra-class difference because of drastic appearance changes; training on them would very likely overfit the model, as shown in Fig. 3. To avoid this overfitting, each picture is paired with a moderate positive sample in the manner of Fig. 2, i.e. the first positive sample ranked just below the hardest negative. To exploit as much of the information provided by the labeled samples as possible, a sampling strategy that maximizes the top-k recognition rate is proposed: as shown in Fig. 2, for each image all samples among its k nearest neighbors are sampled, so that the discriminative information of labeled samples is used maximally while overfitting is avoided.
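The hardest-negative and moderate-positive selection described above can be sketched as follows, under the assumption that `dists` holds each gallery sample's distance to the query image under the current model; the function names, and the fallback to the closest positive when no positive ranks below the hardest negative, are illustrative choices:

```python
import numpy as np

def hardest_negative(dists, labels, query_label):
    """Index of the negative sample closest to the query (the 'hardest' negative)."""
    neg = np.where(labels != query_label)[0]
    return neg[np.argmin(dists[neg])]

def moderate_positive(dists, labels, query_label):
    """Among the positives, pick the one whose distance is largest while still
    smaller than the hardest negative's distance (a 'moderate' positive);
    fall back to the closest positive if none qualifies (assumed behaviour)."""
    hn_dist = dists[hardest_negative(dists, labels, query_label)]
    pos = np.where(labels == query_label)[0]
    below = pos[dists[pos] < hn_dist]
    if below.size:
        return below[np.argmax(dists[below])]
    return pos[np.argmin(dists[pos])]
```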
As an improvement, in S140 the unlabeled samples are labeled with the K mutual nearest neighbor method as follows:
As in Fig. 1, P1-P4 are four positive samples of the picture to be retrieved, yet they do not occupy the first four positions among its nearest neighbors, so directly using the K-nearest-neighbor result would introduce large errors. Note, however, that the picture to be retrieved and the four positive samples are mutually among each other's K nearest neighbors; this relation is called K mutual nearest neighbors (also known as K-reciprocal nearest neighbors). Labeling the unlabeled data in this way reduces the introduced errors to some extent.
The K nearest neighbors N (x, K) of sample x are defined as follows:
N(x,k)={x1,x2,...,xk},|N(p,k)|=k (5);
where | represents the number of samples in the set, then K is defined as the mutual nearest R (x, K) as follows:
R(x,k)={xi|(xi∈N(x,k))∧(x∈N(xi,k))} (6)。
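Definitions (5) and (6) can be sketched with a brute-force Euclidean k-nearest-neighbor search; the helper names and the use of index sets are illustrative assumptions:

```python
import numpy as np

def knn_indices(X, i, k):
    """Indices of the k nearest neighbors N(x_i, k), excluding x_i itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                      # exclude the query sample
    return set(np.argsort(d)[:k])

def k_reciprocal(X, i, k):
    """R(x_i, k): samples x_j with x_j in N(x_i, k) AND x_i in N(x_j, k)."""
    return {j for j in knn_indices(X, i, k) if i in knn_indices(X, j, k)}
```

Only mutually confirmed neighbor pairs survive, which is why automatic labels derived from R(x, k) are more reliable than plain k-NN labels.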
S200: projecting a new sample into the new subspace with the projection matrix U ∈ R^(d×c) to obtain a predicted sample sequence, arranged from small to large by the Euclidean distance between the new sample and the samples in the training sample set. The higher a predicted sample ranks, the more likely the new sample and that predicted sample show the same person.
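A minimal sketch of this retrieval step, assuming the query and gallery features are row vectors and `U` is the learned projection matrix (names illustrative):

```python
import numpy as np

def rank_gallery(U, query, gallery):
    """Project the query and the training gallery with U, then return
    gallery indices sorted by ascending Euclidean distance; the top-ranked
    gallery samples are most likely the same pedestrian as the query."""
    q = query @ U                      # (c,) projected query
    G = gallery @ U                    # (N, c) projected gallery
    d = np.linalg.norm(G - q, axis=1)  # distance of each gallery sample to the query
    return np.argsort(d)
```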
Experiment and analysis:
selecting characteristics: in order to quickly verify the validity of the proposed method, the LOMO feature and the GOG feature commonly used in the task of pedestrian re-identification are used herein.
Parameter settings: the algorithm is implemented in the Theano framework. The minimum interval margin is 0.5, the balance coefficients α and λ are 0.005 and 0.0001 respectively, the subspace dimension c is 512, and the batch size, learning rate, and k are 32, 1, and 10 respectively.
VIPeR database test results and analysis
The VIPeR database is one of the most popular databases for the pedestrian re-identification task. It contains 1264 images of 632 pedestrians collected by two cameras with different lighting conditions and a 90° viewpoint change. 316 pedestrians were freely selected to form the training set and the remaining 316 the test set, with both semi-supervised and fully supervised experimental setups.
Semi-supervised experiment: for the semi-supervised setup, 1/3 of the pedestrian pictures in the training set had their labels erased at random to serve as unlabeled samples, while the remaining 2/3 served as labeled samples. The results are shown in Table 4.1. Comparing the proposed method with SSCDL and DLLAP shows a large performance gain; in particular, the Rank-1 recognition rate reaches 47.5% when the LOMO and GOG features are combined.
TABLE 4.1 Comparison of recognition rates (%) of semi-supervised learning methods on the VIPeR database

Method           Rank-1  Rank-5  Rank-10  Rank-20
SSCDL             25.6    53.7    68.2     83.6
DLLAP             32.5    61.8    74.3     84.1
LOMO+Our          34.2    65.2    76.4     85.4
GOG+Our           42.4    73.4    83.9     91.0
LOMO+GOG+Our      47.5    78.3    86.9     92.1
Fully supervised experiment: the proposed method was also run in a fully supervised setup, i.e. using the labels of all training samples. The results are shown in Table 4.2. Comparison with DLLAP and L1Graph shows large gains when the GOG feature is used and when LOMO and GOG are combined. Compared with the semi-supervised setting, using only the labels of 2/3 of the training samples already reaches a 47.5% recognition rate with LOMO and GOG, only 3% below the fully supervised case, fully demonstrating the effectiveness of the proposed method.
TABLE 4.2 Comparison of recognition rates (%) under the fully supervised setting on the VIPeR database

Method           Rank-1  Rank-5  Rank-10  Rank-20
DLLAP[41]         38.5    70.8    78.5     86.1
L1Graph[42]       41.5    -       -        -
LOMO+Our          36.1    68.2    79.6     88.5
GOG+Our           48.6    77.1    87.3     92.9
LOMO+GOG+Our      50.5    79.6    88.8     94.3
The method of the invention uses a contrastive loss function to fully exploit the label information of the labeled samples, and uses the K mutual nearest neighbor method instead of the plain K nearest neighbor method to label the unlabeled samples. Experimental results on the public pedestrian re-identification data set VIPeR confirm the effectiveness of the method.
Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the invention without departing from their spirit and scope, and all such changes are covered by the claims of the present invention.

Claims (5)

1. A pedestrian re-identification method based on semi-supervised learning, characterized by comprising the following steps:
S100: learning a projection matrix U ∈ R^(d×c) that projects the original d-dimensional feature space onto a c-dimensional subspace, so that U^T·X ∈ R^(c×N) satisfies, in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is small, and the Euclidean distance between sample pairs from different pedestrians is large; samples from the same pedestrian are defined as similar samples, and samples from different pedestrians are defined as dissimilar samples;
S200: projecting a new sample into the new subspace with the projection matrix U ∈ R^(d×c) to obtain a predicted sample sequence, arranged from small to large by the Euclidean distance between the new sample and the samples in the training sample set.
2. The pedestrian re-identification method based on semi-supervised learning as claimed in claim 1, wherein learning the projection matrix U ∈ R^(d×c) in S100 specifically comprises the following steps:
S110, establishing a training sample set, the training sample set comprising a plurality of samples, including labeled samples and unlabeled samples, the samples of the same pedestrian among the labeled samples sharing the same label;
Let X = [X_L, X_U] ∈ R^(d×N) denote all training samples, where N is the total number of pictures in the training set and d is the length of the feature vector; X_L = {x_i^L, i = 1, ..., N_L} denotes the N_L labeled samples, and X_U = {x_i^U, i = 1, ..., N_U} denotes the N_U unlabeled samples;
S120, establishing the objective function:

min_U L(U) + α·L_W(U) + λ·Ω(U)   (1)

where L(U) is a regression function, L_W(U) is a weighted regression function, Ω(U) is a regularization constraint, and α, λ > 0 are balance coefficients;
S130, taking the labeled-sample loss function to be a contrastive loss function: for the N_P sampled sample pairs (x_n^1, x_n^2), if x_n^1 and x_n^2 are samples of the same pedestrian, then the Euclidean distance d_n between U^T·x_n^1 and U^T·x_n^2 in the new projection space should be as small as possible, close to 0; otherwise d_n should be at least greater than a preset threshold margin > 0; whenever these conditions are not met, a loss is incurred;
S140, labeling the unlabeled samples by the K mutual nearest neighbor method, with the unlabeled-sample loss function:

L_W(U) = Σ_{i,j} W_ij·||U^T·x_i − U^T·x_j||_2²   (7)

where W_ij = cos(U^T·x_i, U^T·x_j) if U^T·x_i and U^T·x_j are K mutual nearest neighbors and x_i and x_j come from different cameras; otherwise W_ij = 0   (8);
after the unlabeled samples have been labeled, they are used to further constrain the existing subspace, the weight of the constraint being the cosine distance between the two samples in the new projection space;
S150: the regularization term: the projection matrix U is constrained with the L_{2,1} norm:

Ω(U) = ||U||_{2,1}   (4).
3. The pedestrian re-identification method based on semi-supervised learning as claimed in claim 2, wherein the labeled-sample loss function of S130 is:

L(U) = (1/(2·N_P))·Σ_{n=1}^{N_P} [ y_n·d_n² + (1 − y_n)·max(margin − d_n, 0)² ]   (2)

where:

d_n = ||U^T·x_n^1 − U^T·x_n^2||_2, and y_n = 1 if x_n^1 and x_n^2 come from the same pedestrian, y_n = 0 otherwise   (3).
4. The pedestrian re-identification method based on semi-supervised learning as claimed in claim 2, wherein the sampling strategy for the N_P sample pairs of S130 maximizes the top-k recognition rate, namely for each image, all samples among its k nearest neighbors are sampled.
5. The pedestrian re-identification method based on semi-supervised learning as claimed in claim 2, wherein in S140 the unlabeled samples are labeled with the K mutual nearest neighbor method as follows:
The k nearest neighbors N(x, k) of a sample x are defined as:

N(x, k) = {x_1, x_2, ..., x_k}, |N(x, k)| = k   (5);

where |·| denotes the number of samples in a set; the K mutual nearest neighbors R(x, k) are then defined as:

R(x, k) = {x_i | (x_i ∈ N(x, k)) ∧ (x ∈ N(x_i, k))}   (6).
CN202010970306.5A 2020-09-15 2020-09-15 Pedestrian re-identification method based on semi-supervised learning Active CN112101217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010970306.5A CN112101217B (en) 2020-09-15 2020-09-15 Pedestrian re-identification method based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010970306.5A CN112101217B (en) 2020-09-15 2020-09-15 Pedestrian re-identification method based on semi-supervised learning

Publications (2)

Publication Number Publication Date
CN112101217A true CN112101217A (en) 2020-12-18
CN112101217B CN112101217B (en) 2024-04-26

Family

ID=73758623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010970306.5A Active CN112101217B (en) 2020-09-15 2020-09-15 Pedestrian re-identification method based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN112101217B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657176A (en) * 2021-07-22 2021-11-16 西南财经大学 Pedestrian re-identification method based on active contrastive learning
CN116052095A (en) * 2023-03-31 2023-05-02 松立控股集团股份有限公司 Vehicle re-identification method for smart city panoramic video monitoring

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020122593A1 (en) * 2000-12-22 2002-09-05 Fuji Xerox Co., Ltd. Pattern recognition method and apparatus
US8527432B1 (en) * 2008-08-08 2013-09-03 The Research Foundation Of State University Of New York Semi-supervised learning based on semiparametric regularization
US20140280232A1 (en) * 2013-03-14 2014-09-18 Xerox Corporation Method and system for tagging objects comprising tag recommendation based on query-based ranking and annotation relationships between objects and tags
US20150117764A1 (en) * 2013-10-29 2015-04-30 Nec Laboratories America, Inc. Efficient distance metric learning for fine-grained visual categorization
CN107145827A (en) * 2017-04-01 2017-09-08 浙江大学 Cross-camera pedestrian re-identification method based on adaptive distance metric learning
CN109522956A (en) * 2018-11-16 2019-03-26 哈尔滨理工大学 Low-rank discriminative feature subspace learning method
CN110008828A (en) * 2019-02-21 2019-07-12 上海工程技术大学 Pairwise-constrained component analysis metric optimization method based on difference regularization
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 Pedestrian re-identification method based on a deep multi-loss fusion model
CN110175511A (en) * 2019-04-10 2019-08-27 杭州电子科技大学 Pedestrian re-identification method embedding positive and negative samples and distance distributions
CN111027442A (en) * 2019-12-03 2020-04-17 腾讯科技(深圳)有限公司 Model training method, recognition method, device and medium for pedestrian re-recognition
CN111027421A (en) * 2019-11-26 2020-04-17 西安宏规电子科技有限公司 Graph-based direct-push type semi-supervised pedestrian re-identification method
CN111033509A (en) * 2017-07-18 2020-04-17 视语智能有限公司 Object re-identification
US20200125897A1 (en) * 2018-10-18 2020-04-23 Deepnorth Inc. Semi-Supervised Person Re-Identification Using Multi-View Clustering
CN111144451A (en) * 2019-12-10 2020-05-12 东软集团股份有限公司 Training method, device and equipment of image classification model
CN111353516A (en) * 2018-12-21 2020-06-30 华为技术有限公司 Sample classification method and model updating method for online learning
CN111563424A (en) * 2020-04-20 2020-08-21 清华大学 Pedestrian re-identification method and device based on semi-supervised learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
G. DING et al.: "Center Based Pseudo-Labeling For Semi-Supervised Person Re-Identification", 2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 29 November 2018 (2018-11-29), pages 1-6 *
ZHANG Huaxiang et al.: "A Survey of Person Re-identification Research", Journal of Shandong Normal University (Natural Science Edition), vol. 33, no. 04, 31 December 2018 (2018-12-31), pages 379-387 *


Also Published As

Publication number Publication date
CN112101217B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN111814584B (en) Vehicle re-identification method based on multi-center metric loss in a multi-view environment
Ren et al. Instance-aware, context-focused, and memory-efficient weakly supervised object detection
Liu et al. Stepwise metric promotion for unsupervised video person re-identification
Deng et al. Rethinking triplet loss for domain adaptation
CN106682696B (en) Multiple-instance detection network based on online instance classifier refinement and its training method
Elhamifar et al. A convex optimization framework for active learning
CN103150580B (en) Semi-supervised hyperspectral image classification method and device
CN111950372B (en) Unsupervised pedestrian re-identification method based on graph convolution network
US20210319215A1 (en) Method and system for person re-identification
CN111598004B (en) Progressive reinforcement self-learning unsupervised cross-domain pedestrian re-identification method
Wu et al. Vehicle re-identification with the space-time prior
CN113095442B (en) Hail identification method based on semi-supervised learning under multi-dimensional radar data
CN109635708B (en) Unsupervised pedestrian re-identification method based on three-data-set cross migration learning
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN108921107A (en) Pedestrian re-identification method based on ranking loss and Siamese network
CN112101217B (en) Pedestrian re-identification method based on semi-supervised learning
CN113158955B (en) Pedestrian re-recognition method based on clustering guidance and paired measurement triplet loss
Lan et al. Robust multi-modality anchor graph-based label prediction for RGB-infrared tracking
CN112115780A (en) Semi-supervised pedestrian re-identification method based on deep multi-model cooperation
CN116052212A (en) Semi-supervised cross-mode pedestrian re-recognition method based on dual self-supervised learning
Zheng et al. Adaptive boosting for domain adaptation: Toward robust predictions in scene segmentation
CN114495004A (en) Unsupervised cross-modal pedestrian re-identification method
CN112052722A (en) Pedestrian identity re-identification method and storage medium
Liu et al. Automatic inline defect detection for a thin film transistor–liquid crystal display array process using locally linear embedding and support vector data description
Rahimpour et al. Attention-based few-shot person re-identification using meta learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant