CN107145826B - Pedestrian re-identification method based on double-constraint metric learning and sample reordering - Google Patents

Pedestrian re-identification method based on double-constraint metric learning and sample reordering

Info

Publication number
CN107145826B
CN107145826B (application CN201710213894.6A)
Authority
CN
China
Prior art keywords
camera
constraint
matrix
candidate
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710213894.6A
Other languages
Chinese (zh)
Other versions
CN107145826A (en)
Inventor
于慧敏
谢奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201710213894.6A priority Critical patent/CN107145826B/en
Publication of CN107145826A publication Critical patent/CN107145826A/en
Application granted granted Critical
Publication of CN107145826B publication Critical patent/CN107145826B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on double-constraint metric learning and sample reordering, which comprises a training stage and a testing stage. The training stage comprises: establishing a cross-camera association constraint; establishing a same-camera association constraint; and solving a metric matrix. The testing stage comprises: performing feature space projection with the metric matrix; calculating the Euclidean distances between the query picture and the candidate pictures in the feature space; computing the initial ranking of the candidate pictures; selecting the top K candidate pictures in the ranking queue; constructing a probability hypergraph from the associations of these K candidate pictures in the feature space; computing a reordering result based on the probability hypergraph; and returning the final ranking of the candidate pictures. By considering two kinds of association constraints on the training samples at the same time, the learned feature space is better suited to pedestrian re-identification; meanwhile, reordering with the associations among candidate pictures yields a more accurate pedestrian re-identification result.

Description

Pedestrian re-identification method based on double-constraint metric learning and sample reordering
Technical Field
The invention relates to a method in the technical field of video image processing, in particular to a pedestrian re-identification method based on double-constraint metric learning and sample reordering.
Background
Video monitoring provides a rich information source for safety early warning, investigation and evidence collection, suspect tracking and other tasks. However, the monitoring range of a single camera is very limited, so a large or complex scene (e.g. a train station, airport or campus) cannot be covered in all directions. In order to capture more comprehensive and extensive information in a public area, a large number of monitoring cameras are often required to work in concert. Traditional video processing technology is mainly designed for a single camera: when a pedestrian target moves out of the current view, the whereabouts of the target cannot be determined. Therefore, how to re-identify a pedestrian in the monitoring network from a query picture of the pedestrian target, and how to establish the identity association of that target across different cameras, has become a core problem in the field of intelligent video monitoring.
For the pedestrian re-identification problem, traditional methods are mainly based on the appearance of the pedestrian image: features such as color, shape and texture are extracted, and the re-identification result is obtained according to feature similarity. However, differences in illumination, viewing angle and pedestrian posture between cameras can significantly change the appearance of the same pedestrian, and similarity matching of appearance features alone cannot reach satisfactory re-identification accuracy. The introduction of metric learning provides an important means of relieving the influence of cross-camera differences on pedestrian re-identification: a metric matrix is learned from a training set so that pedestrian pictures can be projected into a new feature space in which the feature distance between pictures of the same pedestrian is smaller and that between pictures of different pedestrians is larger. However, existing metric learning algorithms only consider the cross-camera association information between pedestrian pictures of different cameras during training, and ignore the association between different pedestrian pictures within the same camera. Meanwhile, metric learning algorithms are prone to overfitting on the training set, and relying entirely on the learned distance metric matrix for similarity ranking in the testing stage yields suboptimal re-identification results.
Aiming at these defects of existing metric-learning-based pedestrian re-identification methods, the dual-constraint metric learning technique provided by the invention simultaneously considers the same-camera and cross-camera association information between training samples during metric learning, and learns a feature space projection matrix with stronger discriminability. In addition, by introducing a reordering technique in the testing stage, the method exploits the association information among candidate pictures to effectively relieve the influence of overfitting in metric learning, and obtains a candidate picture ranking that is more stable and accurate than existing pedestrian re-identification techniques.
Disclosure of Invention
The invention provides a pedestrian re-identification method based on double-constraint metric learning and sample reordering to solve the problems in the prior art, thereby improving the accuracy and stability of existing metric-learning-based pedestrian re-identification methods.
In order to achieve the purpose, the invention discloses a pedestrian re-identification method based on double-constraint metric learning and sample reordering, which comprises two stages of training and testing;
the training phase comprises the steps of:
step 1, establishing a cross-camera association constraint: forming cross-camera sample pairs from pedestrian pictures of different cameras in the training set, and establishing a constraint term so that the feature distance of a cross-camera positive sample pair is smaller than that of a cross-camera negative sample pair;
step 2, establishing a same-camera association constraint: forming same-camera sample pairs from pedestrian pictures of the same camera in the training set, and establishing a constraint term so that the feature distance of a same-camera negative sample pair is larger than that of a cross-camera positive sample pair;
step 3, solving the metric matrix: combining the two constraint terms of step 1 and step 2 to obtain the objective function of double-constraint metric learning, and solving for the positive semi-definite metric matrix M that minimizes the objective function to obtain the training result of metric learning, which ends the training stage;
the testing phase comprises the following steps:
step 4, performing feature space projection with the metric matrix: exploiting the positive semi-definiteness of the metric matrix M, decompose it as M = P^T P; using the matrix P, project the feature vector x_p of the query picture in the testing stage and the feature vectors {y_i}, i = 1, …, N, of the candidate set into a new feature space in a unified manner, where N is the total number of pictures in the candidate set at the testing stage;
step 5, calculating the Euclidean distances between the query picture and the candidate pictures in the feature space: respectively calculate the Euclidean distance between the query picture and each candidate picture in the new feature space:
d(x_p, y_i) = ||P x_p − P y_i||_2, i = 1, …, N;
step 6, computing the initial ranking of the candidate pictures: sort the candidate pictures according to the Euclidean distances calculated in step 5, so that candidate pictures with smaller Euclidean distance to the query picture are ranked higher;
step 7, selecting the top K candidate pictures in the ranking queue: select the K top-ranked candidate pictures from the candidate picture ranking queue obtained in step 6;
step 8, constructing a probability hypergraph from the associations of the top K candidate pictures in the feature space: take the query picture and the K candidate pictures as vertices of the probability hypergraph, generate the hyperedges of the probability hypergraph from the associations between vertices, and finally assign a corresponding weight to each hyperedge;
step 9, computing a reordering result based on the probability hypergraph: calculate the Laplacian matrix of the probability hypergraph, establish an objective function by combining it with the empirical loss on the initial labels, compute the ranking scores of the candidate pictures from the objective function, and reorder the K candidate pictures in descending order of ranking score;
step 10, returning the final ranking of the candidate pictures: replace the positions of the top K pictures in the ranking queue of step 6 with the reordering result of the K candidate pictures in step 9, and return the whole candidate-set ranking queue as the final pedestrian re-identification result.
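As a concrete illustration of steps 4-7 of the testing phase, a minimal Python sketch follows; the function name initial_ranking, the eigen-decomposition used to obtain P, and the array shapes are illustrative assumptions rather than part of the claimed method. Steps 8-10 (hypergraph construction and reordering) are sketched separately in the detailed embodiment below.

import numpy as np

def initial_ranking(x_query, Y_candidates, M, K=100):
    """Steps 4-7 of the testing phase: project with P (M = P^T P), rank by
    Euclidean distance in the projected space, and keep the top-K candidates.

    x_query      : (d,) feature vector of the query picture
    Y_candidates : (N, d) feature vectors of the candidate set
    M            : (d, d) learned positive semi-definite metric matrix
    """
    # Step 4: decompose M = P^T P via an eigen-decomposition and project.
    eigvals, eigvecs = np.linalg.eigh(M)
    P = np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    xq, Yp = P @ x_query, Y_candidates @ P.T

    # Steps 5-6: Euclidean distances to the query; smaller distance ranks higher.
    dists = np.linalg.norm(Yp - xq, axis=1)
    order = np.argsort(dists)

    # Step 7: only the top-K candidates enter the hypergraph reranking
    # (steps 8-10 are sketched separately in the detailed embodiment below).
    return order, order[:K], xq, Yp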
Further: the establishment of the cross-camera association constraint in the step 1 comprises the following steps:
step 1.1, define the training pictures from different cameras respectively as the query set X = {x_i}, i = 1, …, n, and the candidate set Y = {y_j}, j = 1, …, m, where x_i and y_j are feature vectors of pedestrian pictures, x_i ∈ R^d and y_j ∈ R^d, n is the number of pictures in the query set, and m is the number of pictures in the candidate set;
step 1.2, define a sample pair (x_i, y_j) composed of pedestrian pictures from different cameras as a cross-camera sample pair; when x_i and y_j belong to the same pedestrian, (x_i, y_j) is called a cross-camera positive sample pair and z_ij = 1 is defined; when x_i and y_j belong to different pedestrians, (x_i, y_j) is a cross-camera negative sample pair and z_ij = −1 is set;
step 1.3, constrain the distance of any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance of any cross-camera negative sample pair (x_i, y_k):
d_M(x_i, y_j) < d_M(x_i, y_k), for z_ij = 1 and z_ik = −1,
where d_M(·,·) is the Mahalanobis distance metric function to be learned, expressed as follows:
d_M(x_i, y_j) = sqrt((x_i − y_j)^T M (x_i − y_j));
in the above formula, M is a positive semi-definite metric matrix, i.e. the target of metric learning;
step 1.4, equivalently transform the constraint in step 1.3 into: the distance of any cross-camera positive sample pair in the training set is smaller than a threshold ξ, and the distance of any cross-camera negative sample pair in the training set is larger than the threshold ξ, which yields the following loss functions:
E_p(M) = Σ_{z_ij = 1} ℓ(d_M^2(x_i, y_j) − ξ)
E_d(M) = Σ_{z_ij = −1} ℓ(ξ − d_M^2(x_i, y_j))
where ℓ(x) = log(1 + e^x) is the logistic regression function; E_p(M) is the loss function of the cross-camera positive sample pairs, E_d(M) is the loss function of the cross-camera negative sample pairs, and ξ takes the value of the average distance over all cross-camera sample pairs (x_i, y_j) and same-camera sample pairs (y_j, y_k).
Further: the establishment of the camera association constraint in the step 2 comprises the following steps:
step 2.1, define a sample pair (y_j, y_k) composed of pictures y_j and y_k of different pedestrians in the candidate set Y as a same-camera negative sample pair, and set the label z_jk = −1;
step 2.2, constrain the distance of any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance of any same-camera negative sample pair (y_j, y_k):
d_M(x_i, y_j) < d_M(y_j, y_k);
step 2.3, since step 1.4 already constrains the distance of every cross-camera positive sample pair to be smaller than the threshold ξ, the constraint in step 2.2 is equivalently converted into: the distance of any same-camera negative sample pair (y_j, y_k) in the training set is larger than ξ, which yields the following loss function:
E_s(M) = Σ_{z_jk = −1} ℓ(ξ − d_M^2(y_j, y_k))
where E_s(M) is the loss function of the same-camera negative sample pairs.
Further: the solving of the measurement matrix in the step 3 specifically comprises the following steps:
step 3.1, jointly consider the loss functions of step 1.4 and step 2.3 to obtain the objective function of double-constraint distance metric learning:
Φ(M) = E_p(M) + E_d(M) + E_s(M);
step 3.2, assign weights w_ij and w_jk to the sample pairs in the objective function and simplify the expression of step 3.1 to obtain:
Φ(M) = Σ_{i,j} w_ij · ℓ(z_ij(d_M^2(x_i, y_j) − ξ)) + Σ_{j,k} w_jk · ℓ(z_jk(d_M^2(y_j, y_k) − ξ))
where ℓ(·) is the logistic regression function defined in step 1.4; when z_ij = 1, w_ij = 1/N_pos, where N_pos is the total number of cross-camera positive sample pairs in the training set; when z_ij = −1, w_ij is set to 1/N_neg, where N_neg is the total number of all cross-camera and same-camera negative sample pairs in the training set; meanwhile, since there is no same-camera positive sample pair, w_jk is uniformly set to 1/N_neg;
step 3.3, define dual-constraint metric learning as the following optimization problem:
min_M Φ(M)  subject to  M ⪰ 0;
step 3.4, solve the optimization problem in step 3.3 to obtain the positive semi-definite metric matrix M.
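For illustration, a minimal Python sketch of the dual-constraint objective of steps 1-3 follows; it assumes the squared-Mahalanobis distance inside the softplus (logistic) loss, consistent with the definitions reconstructed above, and the helper name dual_constraint_objective is ours rather than the patent's (the weighted form of step 3.2 is obtained by scaling each term with w_ij or w_jk).

import numpy as np

def softplus(t):
    # logistic regression loss l(t) = log(1 + exp(t)), evaluated stably
    return np.logaddexp(0.0, t)

def dual_constraint_objective(M, X, Y, ids_x, ids_y, xi):
    """Phi(M) = E_p(M) + E_d(M) + E_s(M) under the assumed formulation.

    X, Y        : (n, d) query-set and (m, d) candidate-set feature vectors
    ids_x, ids_y: pedestrian identity of every picture
    xi          : distance threshold (the average pairwise distance in the text)
    """
    def d2(a, b):                       # squared Mahalanobis distance
        diff = a - b
        return diff @ M @ diff

    e_p = e_d = e_s = 0.0
    # Cross-camera pairs: positives pulled below xi, negatives pushed above.
    for i in range(len(X)):
        for j in range(len(Y)):
            if ids_x[i] == ids_y[j]:
                e_p += softplus(d2(X[i], Y[j]) - xi)
            else:
                e_d += softplus(xi - d2(X[i], Y[j]))
    # Same-camera negative pairs (different pedestrians in the candidate set).
    for j in range(len(Y)):
        for k in range(j + 1, len(Y)):
            if ids_y[j] != ids_y[k]:
                e_s += softplus(xi - d2(Y[j], Y[k]))
    return e_p + e_d + e_s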
Further: the constructing of the probability hypergraph by using the relevance of the previous K candidate pictures in the feature space in the step 8 specifically comprises the following steps:
step 8.1, first merge the query picture and the K candidate pictures to obtain the vertex set V of the probability hypergraph, which contains K + 1 vertices;
step 8.2, take each vertex v_i in V as a central node and generate three hyperedges by connecting v_i with its 5, 15 and 25 nearest vertices in the projected feature space; add the three hyperedges to the hyperedge set ε of the probability hypergraph, so that the set ε contains 3 × (K + 1) hyperedges in total;
step 8.3, assign a non-negative weight w_h(e_i) to each hyperedge e_i in the hyperedge set ε: when the hyperedge takes the query picture as its central node, it is assigned a larger weight; when the hyperedge takes a candidate picture as its central node, it is assigned a smaller weight;
step 8.4, according to the membership relation between the vertices in V and the hyperedges in ε, construct an incidence matrix H of size |V| × |ε|, whose elements are defined as:
H(v_i, e_j) = A(v_i, e_j) if v_i ∈ e_j, and H(v_i, e_j) = 0 otherwise,
where A(v_i, e_j) represents the probability that vertex v_i belongs to hyperedge e_j and is calculated by:
A(v_i, e_j) = exp(−d(v_i, v_j)^2 / σ^2),
where v_j is the central node of hyperedge e_j, d(v_i, v_j) is the distance between v_i and v_j in the projected feature space, and σ is the average distance between all vertices in the projected feature space; this completes the construction of the probability hypergraph and yields the incidence matrix H.
Further: in step 9, a reordering result is calculated based on the probabilistic hypergraph, which specifically includes the following substeps:
step 9.1, based on the incidence matrix H, calculate the degree d(v) of each vertex and the degree δ(e) of each hyperedge in the probability hypergraph, where d(v) = Σ_{e∈ε} w_h(e)·H(v, e) and δ(e) = Σ_{v∈V} H(v, e); define a diagonal matrix D_v whose diagonal elements correspond to the degrees of the vertices of the probability hypergraph; define a diagonal matrix D_e whose diagonal elements correspond to the degrees of the hyperedges; and define a diagonal matrix W whose diagonal elements correspond to the hyperedge weights w_h(e);
step 9.2, using the incidence matrix H, the vertex degree matrix D_v, the hyperedge degree matrix D_e and the hyperedge weight matrix W, compute the Laplacian matrix L of the probability hypergraph:
L = I − D_v^(−1/2) H W D_e^(−1) H^T D_v^(−1/2),
where I is the identity matrix of size |V| × |V|;
step 9.3, using a regularization framework that simultaneously considers the Laplacian constraint of the probability hypergraph and the empirical loss on the initial labels, define the objective function of sample reordering as:
min_f  f^T L f + μ ||f − r||^2,
where f denotes the reordering score vector to be learned, r denotes the initial label vector in which the label of the query picture is set to 1 and the labels of all candidate pictures are set to 0, and μ > 0 is a regularization parameter that balances the importance of the first and second terms of the objective function; the first term of the objective function constrains vertices sharing more hyperedges in the probability hypergraph to obtain similar reordering scores, and the second term constrains the reordering scores to stay close to the initial label information;
step 9.4, by setting the first derivative of the objective function in step 9.3 with respect to f to zero, the optimal solution of the reordering problem is obtained quickly in closed form:
f = (I + L/μ)^(−1) r;
step 9.5, reorder the K candidate pictures in descending order of their reordering scores in the vector f.
Compared with the prior art, the invention adopting the technical scheme has the following beneficial effects:
1) compared with the existing pedestrian re-identification method based on metric learning, which only considers the cross-camera association constraint of the training samples, the method provided by the invention simultaneously considers the same-camera and cross-camera association information among the training samples in the process of metric learning, so that the learned metric matrix has stronger discriminability;
2) according to the method, the probability hypergraph is constructed by using the associated information among different candidate pictures, the similarity sorting result in the test stage is reordered, the influence of an over-fitting phenomenon in metric learning is effectively relieved, and a more stable and accurate candidate picture sorting result is obtained;
3) during reordering the invention only considers the K top-ranked candidate pictures of the initial ordering; compared with reordering the whole candidate set, this reduces the computational complexity of constructing the probability hypergraph while preserving ranking accuracy, thereby speeding up reordering.
Drawings
FIG. 1 is a schematic overall flow chart of the present invention.
Detailed Description
The technical solution of the present invention will be further described in detail with reference to the following specific examples.
The following examples are carried out on the premise of the technical scheme of the invention, and detailed embodiments and specific operation processes are given, but the scope of the invention is not limited to the following examples.
Examples
In the embodiment of the invention, pedestrian pictures captured by different cameras are processed: a metric matrix is learned from the training set, and in the testing stage, given a query picture of a pedestrian target, the correct match of that target is found in a candidate set captured by a different camera. Referring to fig. 1, the method of this embodiment comprises two stages, training and testing;
the training phase comprises the steps of:
step 1, establishing a cross-camera association constraint: form cross-camera sample pairs from pedestrian pictures of different cameras in the training set, and establish a constraint term so that the feature distance of a cross-camera positive sample pair is smaller than that of a cross-camera negative sample pair; this specifically comprises the following sub-steps:
step 1.1, define the training pictures from different cameras respectively as the query set X = {x_i}, i = 1, …, n, and the candidate set Y = {y_j}, j = 1, …, m, where x_i and y_j are feature vectors of pedestrian pictures, x_i ∈ R^d and y_j ∈ R^d, n is the number of pictures in the query set, and m is the number of pictures in the candidate set;
step 1.2, define a sample pair (x_i, y_j) composed of pedestrian pictures from different cameras as a cross-camera sample pair; when x_i and y_j belong to the same pedestrian, (x_i, y_j) is called a cross-camera positive sample pair and z_ij = 1 is defined; when x_i and y_j belong to different pedestrians, (x_i, y_j) is a cross-camera negative sample pair and z_ij = −1 is set;
step 1.3, constrain the distance of any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance of any cross-camera negative sample pair (x_i, y_k):
d_M(x_i, y_j) < d_M(x_i, y_k), for z_ij = 1 and z_ik = −1,
where d_M(·,·) is the Mahalanobis distance metric function to be learned, expressed as follows:
d_M(x_i, y_j) = sqrt((x_i − y_j)^T M (x_i − y_j));
in the above formula, M is a positive semi-definite metric matrix, i.e. the target of metric learning;
step 1.4, equivalently transform the constraint in step 1.3 into: the distance of any cross-camera positive sample pair in the training set is smaller than a threshold ξ, and the distance of any cross-camera negative sample pair in the training set is larger than the threshold ξ, which yields the following loss functions:
E_p(M) = Σ_{z_ij = 1} ℓ(d_M^2(x_i, y_j) − ξ)
E_d(M) = Σ_{z_ij = −1} ℓ(ξ − d_M^2(x_i, y_j))
where ℓ(x) = log(1 + e^x) is the logistic regression function; E_p(M) is the loss function of the cross-camera positive sample pairs, E_d(M) is the loss function of the cross-camera negative sample pairs, and ξ takes the value of the average distance over all cross-camera sample pairs (x_i, y_j) and same-camera sample pairs (y_j, y_k).
step 2, establishing a same-camera association constraint: form same-camera sample pairs from pedestrian pictures of the same camera in the training set, and establish a constraint term so that the feature distance of a same-camera negative sample pair is larger than that of a cross-camera positive sample pair; this specifically comprises the following sub-steps:
step 2.1, define a sample pair (y_j, y_k) composed of pictures y_j and y_k of different pedestrians in the candidate set Y as a same-camera negative sample pair, and set the label z_jk = −1;
step 2.2, constrain the distance of any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance of any same-camera negative sample pair (y_j, y_k):
d_M(x_i, y_j) < d_M(y_j, y_k);
step 2.3, since step 1.4 already constrains the distance of every cross-camera positive sample pair to be smaller than the threshold ξ, the constraint in step 2.2 is equivalently converted into: the distance of any same-camera negative sample pair (y_j, y_k) in the training set is larger than ξ, which yields the following loss function:
E_s(M) = Σ_{z_jk = −1} ℓ(ξ − d_M^2(y_j, y_k));
step 3, solving the metric matrix: combine the two constraint terms of step 1 and step 2 to obtain the objective function of double-constraint metric learning, and solve for the positive semi-definite metric matrix M that minimizes the objective function to obtain the training result of metric learning; this specifically comprises the following sub-steps:
step 3.1, jointly consider the loss functions of step 1.4 and step 2.3 to obtain the objective function of double-constraint distance metric learning:
Φ(M) = E_p(M) + E_d(M) + E_s(M);
step 3.2, assign weights w_ij and w_jk to the sample pairs in the objective function and simplify the expression of step 3.1 to obtain:
Φ(M) = Σ_{i,j} w_ij · ℓ(z_ij(d_M^2(x_i, y_j) − ξ)) + Σ_{j,k} w_jk · ℓ(z_jk(d_M^2(y_j, y_k) − ξ))
where ℓ(·) is the logistic regression function defined in step 1.4; when z_ij = 1, w_ij = 1/N_pos, where N_pos is the total number of cross-camera positive sample pairs in the training set; when z_ij = −1, w_ij is set to 1/N_neg, where N_neg is the total number of all cross-camera and same-camera negative sample pairs in the training set; meanwhile, since there is no same-camera positive sample pair, w_jk is uniformly set to 1/N_neg;
step 3.3, define dual-constraint metric learning as the following optimization problem:
min_M Φ(M)  subject to  M ⪰ 0;
step 3.4, solve the optimization problem in step 3.3 to obtain the positive semi-definite metric matrix M; in this embodiment, matrices X and Y are first defined, which respectively store the feature vectors of the n pictures of the query set and the m pictures of the candidate set; X and Y are then concatenated into a matrix C = [X, Y], and c_i denotes the i-th column of C; by stipulating that z_jk = 0 and w_jk = 0 when y_j and y_k are the same candidate picture, the objective function in step 3.2 can be written over the column pairs of C as:
Φ(M) = Σ_{i,j} w_ij · ℓ(z_ij((c_i − c_j)^T M (c_i − c_j) − ξ))
The gradient of the objective function with respect to the matrix M is:
∂Φ(M)/∂M = Σ_{i,j} w_ij · z_ij · ℓ′(z_ij((c_i − c_j)^T M (c_i − c_j) − ξ)) · (c_i − c_j)(c_i − c_j)^T,
where ℓ′(·) denotes the derivative of the logistic regression function; finally, the metric matrix M that minimizes the objective function is solved iteratively by gradient descent;
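The gradient-descent solution of step 3.4 can be sketched in Python as follows; the outer-product gradient and the projection back onto the positive semi-definite cone after each update are standard choices assumed here that are consistent with the objective above, and the function name solve_metric and its parameters are illustrative rather than a verbatim transcription of the patent's solver.

import numpy as np

def solve_metric(C, z, w, xi, lr=0.01, iters=200):
    """Gradient descent for M under the assumed weighted softplus objective.

    C  : (d, n+m) matrix whose columns are all training feature vectors
    z  : dict {(i, j): +1 or -1} labels over the sample pairs used in training
    w  : dict {(i, j): weight} pair weights (1/N_pos or 1/N_neg)
    xi : distance threshold
    """
    d = C.shape[0]
    M = np.eye(d)
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))   # derivative of log(1 + e^t)
    for _ in range(iters):
        grad = np.zeros((d, d))
        for (i, j), z_ij in z.items():
            diff = C[:, i] - C[:, j]
            dist2 = diff @ M @ diff
            # derivative of l(z_ij * (dist2 - xi)) with respect to M
            coeff = w[(i, j)] * z_ij * sigmoid(z_ij * (dist2 - xi))
            grad += coeff * np.outer(diff, diff)
        M -= lr * grad
        # project back onto the positive semi-definite cone
        vals, vecs = np.linalg.eigh(M)
        M = (vecs * np.clip(vals, 0.0, None)) @ vecs.T
    return M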
the testing phase comprises the following steps:
step 4, performing feature space projection with the metric matrix: exploiting the positive semi-definiteness of the metric matrix M, decompose it as M = P^T P; using the matrix P, project the feature vector x_p of the query picture in the testing stage and the feature vectors {y_i}, i = 1, …, N, of the candidate set into a new feature space in a unified manner, where N is the total number of pictures in the candidate set at the testing stage;
step 5, calculating the Euclidean distances between the query picture and the candidate pictures in the feature space: respectively calculate the Euclidean distance between the query picture and each candidate picture in the new feature space:
d(x_p, y_i) = ||P x_p − P y_i||_2, i = 1, …, N;
step 6, computing the initial ranking of the candidate pictures: sort the candidate pictures according to the Euclidean distances calculated in step 5, so that candidate pictures with smaller Euclidean distance to the query picture are ranked higher;
step 7, selecting the top K candidate pictures in the ranking queue: select the K top-ranked candidate pictures from the candidate picture ranking queue obtained in step 6, where K = 100 in this embodiment;
step 8, constructing a probability hypergraph from the associations of the top K candidate pictures in the feature space: take the query picture and the K candidate pictures as vertices of the probability hypergraph, generate the hyperedges of the probability hypergraph from the associations between vertices, and finally assign a corresponding weight to each hyperedge; this specifically comprises the following sub-steps:
step 8.1, first merge the query picture and the K candidate pictures to obtain the vertex set V of the probability hypergraph, which contains K + 1 vertices;
step 8.2, take each vertex v_i in V as a central node and generate three hyperedges by connecting v_i with its 5, 15 and 25 nearest vertices in the projected feature space; add the three hyperedges to the hyperedge set ε of the probability hypergraph, so that the set ε contains 3 × (K + 1) hyperedges in total;
step 8.3, assign a non-negative weight w_h(e_i) to each hyperedge e_i in the hyperedge set ε: when the hyperedge takes the query picture as its central node, it is assigned a larger weight, emphasizing the role of the query picture in reordering; when the hyperedge takes a candidate picture as its central node, it is assigned a smaller weight; in this example both weights are set to fixed constant values;
step 8.4, according to the membership relation between the vertices in V and the hyperedges in ε, construct an incidence matrix H of size |V| × |ε|, whose elements are defined as:
H(v_i, e_j) = A(v_i, e_j) if v_i ∈ e_j, and H(v_i, e_j) = 0 otherwise,
where A(v_i, e_j) represents the probability that vertex v_i belongs to hyperedge e_j and is calculated by:
A(v_i, e_j) = exp(−d(v_i, v_j)^2 / σ^2),
where v_j is the central node of hyperedge e_j, d(v_i, v_j) is the distance between v_i and v_j in the projected feature space, and σ is the average distance between all vertices in the projected feature space; this completes the construction of the probability hypergraph and yields the incidence matrix H;
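A Python sketch of the probability hypergraph construction in step 8 follows, using the neighbourhood sizes 5, 15 and 25 stated above; the concrete weight values w_query and w_cand are placeholders (the embodiment only requires the query-centred weight to be the larger one), and the function name is an assumption for illustration.

import numpy as np

def build_probabilistic_hypergraph(V, w_query=1.0, w_cand=0.5, neighbourhoods=(5, 15, 25)):
    """Construct hyperedges, weights and incidence matrix H for the vertices V.

    V : (K+1, d) projected features; row 0 is the query picture, rows 1..K candidates.
    Returns (H, w_h) with H of shape (K+1, 3*(K+1)).
    """
    n = V.shape[0]
    # pairwise Euclidean distances in the projected feature space
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)
    sigma = D[np.triu_indices(n, k=1)].mean()      # average inter-vertex distance

    edges, weights, centres = [], [], []
    for i in range(n):                             # each vertex acts as a central node
        order = np.argsort(D[i])                   # nearest vertices first (self included)
        for k in neighbourhoods:                   # three hyperedges per centre
            edges.append(set(order[:k + 1]))       # centre plus its k nearest neighbours
            centres.append(i)
            weights.append(w_query if i == 0 else w_cand)

    # probabilistic incidence matrix: membership degree A(v_i, e_j)
    H = np.zeros((n, len(edges)))
    for j, (edge, c) in enumerate(zip(edges, centres)):
        for v in edge:
            H[v, j] = np.exp(-(D[v, c] ** 2) / (sigma ** 2))
    return H, np.array(weights)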
step 9, computing a reordering result based on the probability hypergraph: calculate the Laplacian matrix of the probability hypergraph, establish an objective function by combining it with the empirical loss on the initial labels, compute the ranking scores of the candidate pictures from the objective function, and reorder the K candidate pictures in descending order of ranking score; this specifically comprises the following sub-steps:
step 9.1, based on the incidence matrix H, calculate the degree d(v) of each vertex and the degree δ(e) of each hyperedge in the probability hypergraph, where d(v) = Σ_{e∈ε} w_h(e)·H(v, e) and δ(e) = Σ_{v∈V} H(v, e); define a diagonal matrix D_v whose diagonal elements correspond to the degrees of the vertices of the probability hypergraph; define a diagonal matrix D_e whose diagonal elements correspond to the degrees of the hyperedges; and define a diagonal matrix W whose diagonal elements correspond to the hyperedge weights w_h(e);
step 9.2, using the incidence matrix H, the vertex degree matrix D_v, the hyperedge degree matrix D_e and the hyperedge weight matrix W, compute the Laplacian matrix L of the probability hypergraph:
L = I − D_v^(−1/2) H W D_e^(−1) H^T D_v^(−1/2),
where I is the identity matrix of size |V| × |V|;
step 9.3, using a regularization framework that simultaneously considers the Laplacian constraint of the probability hypergraph and the empirical loss on the initial labels, define the objective function of sample reordering as:
min_f  f^T L f + μ ||f − r||^2,
where f denotes the reordering score vector to be learned, r denotes the initial label vector in which the label of the query picture is set to 1 and the labels of all candidate pictures are set to 0, and μ > 0 is a regularization parameter that balances the importance of the first and second terms of the objective function; the first term of the objective function constrains vertices sharing more hyperedges in the probability hypergraph to obtain similar reordering scores, and the second term constrains the reordering scores to stay close to the initial label information; μ = 0.01 in this example;
step 9.4, by setting the first derivative of the objective function in step 9.3 with respect to f to zero, the optimal solution of the reordering problem is obtained quickly in closed form:
f = (I + L/μ)^(−1) r;
step 9.5, reorder the K candidate pictures in descending order of their reordering scores in the vector f;
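Finally, steps 9.1-9.5 can be sketched in Python as follows; the normalized hypergraph Laplacian and the closed-form solution match the formulas reconstructed above, with mu = 0.01 as used in this example, and the function name hypergraph_rerank is an assumption for illustration.

import numpy as np

def hypergraph_rerank(H, w_h, mu=0.01):
    """Return reranking scores f = (I + L/mu)^(-1) r for the probability hypergraph.

    H   : (n, E) probabilistic incidence matrix (vertex 0 is the query picture)
    w_h : (E,) hyperedge weights
    """
    n = H.shape[0]
    d_v = H @ w_h                                   # vertex degrees d(v) = sum_e w_h(e) H(v, e)
    delta_e = H.sum(axis=0)                         # hyperedge degrees delta(e) = sum_v H(v, e)

    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / delta_e)
    W = np.diag(w_h)

    # normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    L = np.eye(n) - Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt

    r = np.zeros(n)
    r[0] = 1.0                                      # initial labels: query = 1, candidates = 0
    f = np.linalg.solve(np.eye(n) + L / mu, r)      # optimum of f^T L f + mu ||f - r||^2

    order = np.argsort(-f[1:])                      # candidates reranked by descending score
    return f, order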
step 10, returning the final ranking of the candidate pictures: replace the positions of the top K pictures in the ranking queue of step 6 with the reordering result of the K candidate pictures in step 9, and return the whole candidate-set ranking queue as the final pedestrian re-identification result.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A pedestrian re-identification method based on double-constraint metric learning and sample reordering is characterized by comprising two stages of training and testing;
the training phase comprises the steps of:
step 1, establishing a cross-camera association constraint: forming cross-camera sample pairs from pedestrian pictures of different cameras in the training set, and establishing a constraint term so that the feature distance of a cross-camera positive sample pair is smaller than that of a cross-camera negative sample pair;
step 2, establishing a same-camera association constraint: forming same-camera sample pairs from pedestrian pictures of the same camera in the training set, and establishing a constraint term so that the feature distance of a same-camera negative sample pair is larger than that of a cross-camera positive sample pair;
step 3, solving the metric matrix: combining the two constraint terms of step 1 and step 2 to obtain the objective function of double-constraint metric learning, and solving for the positive semi-definite metric matrix M that minimizes the objective function to obtain the training result of metric learning, which ends the training stage;
the testing phase comprises the following steps:
step 4, performing feature space projection with the metric matrix: exploiting the positive semi-definiteness of the metric matrix M, decompose it as M = P^T P; using the matrix P, project the feature vector x_p of the query picture in the testing stage and the feature vectors {y_i}, i = 1, …, N, of the candidate set into a new feature space in a unified manner, where N is the total number of pictures in the candidate set at the testing stage;
step 5, calculating the Euclidean distances between the query picture and the candidate pictures in the feature space: respectively calculate the Euclidean distance between the query picture and each candidate picture in the new feature space:
d(x_p, y_i) = ||P x_p − P y_i||_2, i = 1, …, N;
step 6, computing the initial ranking of the candidate pictures: sort the candidate pictures according to the Euclidean distances calculated in step 5, so that candidate pictures with smaller Euclidean distance to the query picture are ranked higher;
step 7, selecting the top K candidate pictures in the ranking queue: select the K top-ranked candidate pictures from the candidate picture ranking queue obtained in step 6;
step 8, constructing a probability hypergraph from the associations of the top K candidate pictures in the feature space: take the query picture and the K candidate pictures as vertices of the probability hypergraph, generate the hyperedges of the probability hypergraph from the associations between vertices, and finally assign a corresponding weight to each hyperedge;
step 9, computing a reordering result based on the probability hypergraph: calculate the Laplacian matrix of the probability hypergraph, establish an objective function by combining it with the empirical loss on the initial labels, compute the ranking scores of the candidate pictures from the objective function, and reorder the K candidate pictures in descending order of ranking score;
step 10, returning the final ranking of the candidate pictures: replace the positions of the top K pictures in the ranking queue of step 6 with the reordering result of the K candidate pictures in step 9, and return the whole candidate-set ranking queue as the final pedestrian re-identification result.
2. The pedestrian re-identification method based on the dual-constraint metric learning and the sample re-ranking as claimed in claim 1, wherein: the establishment of the cross-camera association constraint in the step 1 comprises the following steps:
step 1.1, define the training pictures from different cameras respectively as the query set X = {x_i}, i = 1, …, n, and the candidate set Y = {y_j}, j = 1, …, m, where x_i and y_j are feature vectors of pedestrian pictures, x_i ∈ R^d and y_j ∈ R^d, n is the number of pictures in the query set, and m is the number of pictures in the candidate set;
step 1.2, define a sample pair (x_i, y_j) composed of pedestrian pictures from different cameras as a cross-camera sample pair; when x_i and y_j belong to the same pedestrian, (x_i, y_j) is called a cross-camera positive sample pair and z_ij = 1 is defined; when x_i and y_j belong to different pedestrians, (x_i, y_j) is a cross-camera negative sample pair and z_ij = −1 is set;
step 1.3, constrain the distance of any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance of any cross-camera negative sample pair (x_i, y_k):
d_M(x_i, y_j) < d_M(x_i, y_k), for z_ij = 1 and z_ik = −1,
where d_M(·,·) is the Mahalanobis distance metric function to be learned, expressed as follows:
d_M(x_i, y_j) = sqrt((x_i − y_j)^T M (x_i − y_j));
in the above formula, M is a positive semi-definite metric matrix, i.e. the target of metric learning;
step 1.4, equivalently transform the constraint in step 1.3 into: the distance of any cross-camera positive sample pair in the training set is smaller than a threshold ξ, and the distance of any cross-camera negative sample pair in the training set is larger than the threshold ξ, which yields the following loss functions:
E_p(M) = Σ_{z_ij = 1} ℓ(d_M^2(x_i, y_j) − ξ)
E_d(M) = Σ_{z_ij = −1} ℓ(ξ − d_M^2(x_i, y_j))
where ℓ(x) = log(1 + e^x) is the logistic regression function; E_p(M) is the loss function of the cross-camera positive sample pairs, E_d(M) is the loss function of the cross-camera negative sample pairs, and ξ takes the value of the average distance over all cross-camera sample pairs (x_i, y_j) and same-camera sample pairs (y_j, y_k).
3. The pedestrian re-identification method based on the dual-constraint metric learning and the sample re-ranking as claimed in claim 2, wherein: the establishment of the same-camera association constraint in step 2 comprises the following steps:
step 2.1, define a sample pair (y_j, y_k) composed of pictures y_j and y_k of different pedestrians in the candidate set Y as a same-camera negative sample pair, and set the label z_jk = −1;
step 2.2, constrain the distance of any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance of any same-camera negative sample pair (y_j, y_k):
d_M(x_i, y_j) < d_M(y_j, y_k);
step 2.3, since step 1.4 already constrains the distance of every cross-camera positive sample pair to be smaller than the threshold ξ, the constraint in step 2.2 is equivalently converted into: the distance of any same-camera negative sample pair (y_j, y_k) in the training set is larger than ξ, which yields the following loss function:
E_s(M) = Σ_{z_jk = −1} ℓ(ξ − d_M^2(y_j, y_k))
where E_s(M) is the loss function of the same-camera negative sample pairs.
4. The pedestrian re-identification method based on the dual-constraint metric learning and the sample re-ranking as claimed in claim 3, wherein: the solving of the metric matrix in step 3 specifically comprises the following steps:
step 3.1, jointly consider the loss functions of step 1.4 and step 2.3 to obtain the objective function of double-constraint distance metric learning:
Φ(M) = E_p(M) + E_d(M) + E_s(M);
step 3.2, assign weights w_ij and w_jk to the sample pairs in the objective function and simplify the expression of step 3.1 to obtain:
Φ(M) = Σ_{i,j} w_ij · ℓ(z_ij(d_M^2(x_i, y_j) − ξ)) + Σ_{j,k} w_jk · ℓ(z_jk(d_M^2(y_j, y_k) − ξ))
where ℓ(·) is the logistic regression function defined in step 1.4; when z_ij = 1, w_ij = 1/N_pos, where N_pos is the total number of cross-camera positive sample pairs in the training set; when z_ij = −1, w_ij is set to 1/N_neg, where N_neg is the total number of all cross-camera and same-camera negative sample pairs in the training set; meanwhile, since there is no same-camera positive sample pair, w_jk is uniformly set to 1/N_neg;
step 3.3, define dual-constraint metric learning as the following optimization problem:
min_M Φ(M)  subject to  M ⪰ 0;
step 3.4, solve the optimization problem in step 3.3 to obtain the positive semi-definite metric matrix M.
5. The pedestrian re-identification method based on the dual-constraint metric learning and the sample re-ranking as claimed in claim 1, wherein: the construction of the probability hypergraph from the associations of the top K candidate pictures in the feature space in step 8 specifically comprises the following steps:
step 8.1, first merge the query picture and the K candidate pictures to obtain the vertex set V of the probability hypergraph, which contains K + 1 vertices;
step 8.2, take each vertex v_i in V as a central node and generate three hyperedges by connecting v_i with its 5, 15 and 25 nearest vertices in the projected feature space; add the three hyperedges to the hyperedge set ε of the probability hypergraph, so that the set ε contains 3 × (K + 1) hyperedges in total;
step 8.3, assign a non-negative weight w_h(e_i) to each hyperedge e_i in the hyperedge set ε: when the hyperedge takes the query picture as its central node, it is assigned a larger weight; when the hyperedge takes a candidate picture as its central node, it is assigned a smaller weight;
step 8.4, according to the membership relation between the vertices in V and the hyperedges in ε, construct an incidence matrix H of size |V| × |ε|, whose elements are defined as:
H(v_i, e_j) = A(v_i, e_j) if v_i ∈ e_j, and H(v_i, e_j) = 0 otherwise,
where A(v_i, e_j) represents the probability that vertex v_i belongs to hyperedge e_j and is calculated by:
A(v_i, e_j) = exp(−d(v_i, v_j)^2 / σ^2),
where v_j is the central node of hyperedge e_j, d(v_i, v_j) is the distance between v_i and v_j in the projected feature space, and σ is the average distance between all vertices in the projected feature space; this completes the construction of the probability hypergraph and yields the incidence matrix H.
6. The pedestrian re-identification method based on the dual-constraint metric learning and the sample re-ranking as claimed in claim 5, wherein: in step 9, a reordering result is calculated based on the probabilistic hypergraph, which specifically includes the following substeps:
step 9.1, based on the incidence matrix H, calculate the degree d(v) of each vertex and the degree δ(e) of each hyperedge in the probability hypergraph, where d(v) = Σ_{e∈ε} w_h(e)·H(v, e) and δ(e) = Σ_{v∈V} H(v, e); define a diagonal matrix D_v whose diagonal elements correspond to the degrees of the vertices of the probability hypergraph; define a diagonal matrix D_e whose diagonal elements correspond to the degrees of the hyperedges; and define a diagonal matrix W whose diagonal elements correspond to the hyperedge weights w_h(e);
step 9.2, using the incidence matrix H, the vertex degree matrix D_v, the hyperedge degree matrix D_e and the hyperedge weight matrix W, compute the Laplacian matrix L of the probability hypergraph:
L = I − D_v^(−1/2) H W D_e^(−1) H^T D_v^(−1/2),
where I is the identity matrix of size |V| × |V|;
step 9.3, using a regularization framework that simultaneously considers the Laplacian constraint of the probability hypergraph and the empirical loss on the initial labels, define the objective function of sample reordering as:
min_f  f^T L f + μ ||f − r||^2,
where f denotes the reordering score vector to be learned, r denotes the initial label vector in which the label of the query picture is set to 1 and the labels of all candidate pictures are set to 0, and μ > 0 is a regularization parameter that balances the importance of the first and second terms of the objective function; the first term of the objective function constrains vertices sharing more hyperedges in the probability hypergraph to obtain similar reordering scores, and the second term constrains the reordering scores to stay close to the initial label information;
step 9.4, by setting the first derivative of the objective function in step 9.3 with respect to f to zero, the optimal solution of the reordering problem is obtained quickly in closed form:
f = (I + L/μ)^(−1) r;
step 9.5, reorder the K candidate pictures in descending order of their reordering scores in the vector f.
CN201710213894.6A 2017-04-01 2017-04-01 Pedestrian re-identification method based on double-constraint metric learning and sample reordering Active CN107145826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710213894.6A CN107145826B (en) 2017-04-01 2017-04-01 Pedestrian re-identification method based on double-constraint metric learning and sample reordering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710213894.6A CN107145826B (en) 2017-04-01 2017-04-01 Pedestrian re-identification method based on double-constraint metric learning and sample reordering

Publications (2)

Publication Number Publication Date
CN107145826A CN107145826A (en) 2017-09-08
CN107145826B true CN107145826B (en) 2020-05-08

Family

ID=59773502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710213894.6A Active CN107145826B (en) 2017-04-01 2017-04-01 Pedestrian re-identification method based on double-constraint metric learning and sample reordering

Country Status (1)

Country Link
CN (1) CN107145826B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729818B (en) * 2017-09-21 2020-09-22 北京航空航天大学 Multi-feature fusion vehicle re-identification method based on deep learning
CN107704824B (en) * 2017-09-30 2020-05-29 北京正安维视科技股份有限公司 Pedestrian re-identification method and equipment based on space constraint
CN108133192A (en) * 2017-12-26 2018-06-08 武汉大学 A kind of pedestrian based on Gauss-Laplace distribution statistics identifies again
CN109002792B (en) * 2018-07-12 2021-07-20 西安电子科技大学 SAR image change detection method based on layered multi-model metric learning
CN109635686B (en) * 2018-11-29 2021-04-23 上海交通大学 Two-stage pedestrian searching method combining human face and appearance
CN109711366B (en) * 2018-12-29 2021-04-23 浙江大学 Pedestrian re-identification method based on group information loss function
CN109784266B (en) * 2019-01-09 2021-12-03 江西理工大学应用科学学院 Handwritten Chinese character recognition algorithm of multi-model hypergraph
CN111291611A (en) * 2019-12-20 2020-06-16 长沙千视通智能科技有限公司 Pedestrian re-identification method and device based on Bayesian query expansion
CN111259786B (en) * 2020-01-14 2022-05-03 浙江大学 Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111476168B (en) * 2020-04-08 2022-06-21 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages
CN112651335B (en) * 2020-12-25 2024-05-07 深圳集智数字科技有限公司 Method, system, equipment and storage medium for identifying fellow persons

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103500345A (en) * 2013-09-29 2014-01-08 华南理工大学 Method for learning person re-identification based on distance measure
CN104268140A (en) * 2014-07-31 2015-01-07 浙江大学 Image retrieval method based on weight learning hypergraphs and multivariate information combination
US9141852B1 (en) * 2013-03-14 2015-09-22 Toyota Jidosha Kabushiki Kaisha Person detection and pose estimation system
US9436895B1 (en) * 2015-04-03 2016-09-06 Mitsubishi Electric Research Laboratories, Inc. Method for determining similarity of objects represented in images
CN105989369A (en) * 2015-02-15 2016-10-05 中国科学院西安光学精密机械研究所 Pedestrian Re-Identification Method Based on Metric Learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396412B2 (en) * 2012-06-21 2016-07-19 Siemens Aktiengesellschaft Machine-learnt person re-identification
US20150206069A1 (en) * 2014-01-17 2015-07-23 Matthew BEERS Machine learning-based patent quality metric

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9141852B1 (en) * 2013-03-14 2015-09-22 Toyota Jidosha Kabushiki Kaisha Person detection and pose estimation system
CN103500345A (en) * 2013-09-29 2014-01-08 华南理工大学 Method for learning person re-identification based on distance measure
CN104268140A (en) * 2014-07-31 2015-01-07 浙江大学 Image retrieval method based on weight learning hypergraphs and multivariate information combination
CN105989369A (en) * 2015-02-15 2016-10-05 中国科学院西安光学精密机械研究所 Pedestrian Re-Identification Method Based on Metric Learning
US9436895B1 (en) * 2015-04-03 2016-09-06 Mitsubishi Electric Research Laboratories, Inc. Method for determining similarity of objects represented in images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Learning Visual-Spatial Saliency for Multiple-Shot Person Re-identification;Yi Xie 等;《IEEE Signal Processing Letters》;20151130;第1854-1857页 *
Person Re-identification by Graph-based;Yi Xie 等;《Electronics Letters》;20160818;第1447-1449页 *
距离度量学*** et al.;《中国计量大学学报》(Journal of China Jiliang University);20161231;pp. 424-428 *

Also Published As

Publication number Publication date
CN107145826A (en) 2017-09-08

Similar Documents

Publication Publication Date Title
CN107145826B (en) Pedestrian re-identification method based on double-constraint metric learning and sample reordering
CN107832672B (en) Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
CN107506703B (en) Pedestrian re-identification method based on unsupervised local metric learning and reordering
CN109711366B (en) Pedestrian re-identification method based on group information loss function
CN106127780B (en) A kind of curved surface defect automatic testing method and its device
CN104601964B (en) Pedestrian target tracking and system in non-overlapping across the video camera room of the ken
CN104573614B (en) Apparatus and method for tracking human face
CN108268838B (en) Facial expression recognition method and facial expression recognition system
CN111160297A (en) Pedestrian re-identification method and device based on residual attention mechanism space-time combined model
Jiang et al. Optimizing through learned errors for accurate sports field registration
CN105138998B (en) Pedestrian based on the adaptive sub-space learning algorithm in visual angle recognition methods and system again
CN112036447B (en) Zero-sample target detection system and learnable semantic and fixed semantic fusion method
CN104615986B (en) The method that pedestrian detection is carried out to the video image of scene changes using multi-detector
WO2021218671A1 (en) Target tracking method and device, and storage medium and computer program
CN109858437B (en) Automatic luggage volume classification method based on generation query network
CN110516707B (en) Image labeling method and device and storage medium thereof
US9202138B2 (en) Adjusting a contour by a shape model
Haque et al. Two-handed bangla sign language recognition using principal component analysis (PCA) and KNN algorithm
US20140098988A1 (en) Fitting Contours to Features
CN111401113A (en) Pedestrian re-identification method based on human body posture estimation
CN111709317B (en) Pedestrian re-identification method based on multi-scale features under saliency model
Nguyen et al. Combined YOLOv5 and HRNet for high accuracy 2D keypoint and human pose estimation
Chacua et al. People identification through facial recognition using deep learning
Yu et al. Hid 2021: Competition on human identification at a distance 2021
Jiang et al. Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant