CN115661722B - Pedestrian re-identification method combining attribute and orientation - Google Patents


Info

Publication number
CN115661722B
CN115661722B (application CN202211431088.3A)
Authority
CN
China
Prior art keywords
pedestrian
human body
orientation
similarity
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211431088.3A
Other languages
Chinese (zh)
Other versions
CN115661722A (en)
Inventor
张天宇
张永飞
李波
李林洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202211431088.3A
Publication of CN115661722A
Application granted
Publication of CN115661722B
Legal status: Active

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 — Road transport of goods or passengers
    • Y02T10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method combining attribute and orientation. First, a human attribute recognition model and an orientation recognition model are used to obtain the human apparent attribute features and human orientation information of the pedestrian targets in pedestrian images. Then, the visual similarity between different pedestrians is computed from the similarity of their human apparent attribute features in the training data set to obtain a visual similarity matrix, and the classification labels used in the classification loss function are adjusted according to these similarity relations. Finally, the method designs a metric learning loss based on orientation similarity, which strengthens the intra-class distance minimization constraint between samples of the same orientation on top of a triplet loss function, while avoiding directly optimizing the sample distance between opposite orientations with large visual differences. The proposed pedestrian re-identification method combining attribute and orientation alleviates overfitting of the pedestrian re-identification training algorithm to the pedestrian identity information in the training data, and helps improve the transferability of the pedestrian re-identification model to real-world scenes.

Description

Pedestrian re-identification method combining attribute and orientation
Technical Field
The invention relates to the technical field of image recognition, and in particular to a pedestrian re-recognition method combining attribute and orientation.
Background
At present, the public demand for safety is increasing, placing higher requirements on public security infrastructure. Because video investigation plays an important role in case investigation, security situation early warning, and the like, the number of deployed surveillance cameras grows year by year, and surveillance video has become an important information source in case handling. However, content analysis of the massive video data produced by surveillance cameras is challenging. Pedestrians are the main target type in video surveillance and the basis for intelligent traffic and intelligent security, so searching for and identifying pedestrians in surveillance video has become a core requirement of surveillance video content analysis. Identifying pedestrians across cameras by manpower alone cannot meet the needs of urban traffic management and video surveillance security, which gave rise to pedestrian re-identification methods based on computer vision technology; pedestrian re-identification has become a hot topic and has attracted wide attention in the fields of computer vision and artificial intelligence.
Pedestrian re-recognition technology is generally used to search for images of the same pedestrian across a surveillance network of multiple cameras. Given an image of a target pedestrian, the apparent characteristics of the pedestrian in the image, such as clothing color and texture, are expressed as a group of vectors, i.e., human apparent features, and the degree of feature similarity is used as the basis for ranking image search results. The technology provides a reliable basis for video surveillance and public safety, and can provide key technical support for future intelligent surveillance and smart city systems.
When existing pedestrian re-identification methods extract human apparent features, the optimization objective is generally based on identity consistency: images of the same person are mapped to nearby features, while features of images of different persons must be distinguishable, with distances in the feature space as large as possible. The identity consistency objective can be realized by optimizing deep models with large numbers of parameters, so that the human features become strongly correlated with identity information. However, in real scenes, the similarity between images may diverge from identity consistency, for example when images of the same pedestrian in different orientations look very different, or when images of different pedestrians look alike because of similar clothing. Training a model only on identity consistency makes it more prone to learning orientation-invariant features, or to mining differences between distinct but visually similar pedestrians from regions outside the foreground; such a model overfits, generalizes poorly, and cannot adapt to pedestrian re-identification tasks in test scenes or practical application scenes.
Therefore, how to reduce the gap between identity consistency and visual similarity, alleviate overfitting of the pedestrian re-identification training algorithm to identity information, and improve the transferability of the pedestrian re-identification model is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a pedestrian re-recognition method combining attribute and orientation, which alleviates the problem that a pedestrian re-recognition training algorithm overfits to the pedestrian identity information in training data. It uses human attribute information to measure the visual similarity between different people, designs a new neighborhood label regularization method to adjust the classification labels, and designs a new loss function based on the differences between orientations for training a pedestrian re-recognition deep neural network, thereby helping improve the transferability of the pedestrian re-recognition model to real-world scenes.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a pedestrian re-identification method combining attributes and directions comprises the following steps:
step 1: collecting pedestrian images from the monitoring video to form re-identification training data, and labeling pedestrian classification labels; respectively utilizing a human body attribute recognition model and a human body orientation recognition model to acquire human body apparent attribute characteristics and human body orientation labels of pedestrian targets in the pedestrian images;
step 2: calculating attribute characteristics of the same pedestrian object according to the human body apparent attribute characteristics of each pedestrian image, calculating visual similarity of the attribute characteristics of each pedestrian object and other pedestrian objects, obtaining a visual similarity matrix, judging similarity relations among different pedestrian objects according to the visual similarity matrix, and adjusting pedestrian classification labels according to the similarity relations by adopting a neighborhood label regularization method;
step 3: training a pedestrian re-recognition model according to the human body orientation label generated in the step 1 and the pedestrian classification label obtained in the step 2, using the classification loss and the metric learning loss based on orientation similarity;
step 4: and respectively inputting the target image and the pedestrian image data to be identified into the trained pedestrian re-identification model, extracting corresponding human body feature vectors, calculating human body feature similarity, searching and matching by utilizing the human body feature similarity, and completing the task of pedestrian re-identification by using the sequence of the human body feature similarity from large to small as a search result of pedestrian re-identification.
Preferably, the step 1 of obtaining the apparent attribute feature and the human body orientation specifically includes the following steps:
step 11: selecting a plurality of images from the human attribute identification data set, and carrying out human attribute labeling and body orientation labeling on the images to obtain a human attribute data set and a body orientation data set respectively; the human attribute labels comprise labels for clothes type, clothes color, backpack, hat and glasses; inputting the human attribute data set and the body orientation data set into deep residual neural networks respectively, and training to obtain the attribute recognition model and the orientation recognition model;
step 12: inputting all images in the re-recognition training data into the attribute recognition model, and taking the feature vector after global maximum pooling of the feature map after the last layer of convolution as the human body apparent attribute feature of the pedestrian image;
step 13: and inputting all the images in the re-identification training data into the orientation identification model to obtain the human orientation labels of all the images.
Preferably, the attribute features of the same pedestrian target are average values of human body apparent attribute features of all pedestrian images corresponding to the same pedestrian target;
the expression of the visual similarity is:

D_{j,k} = dist(f_j, f_k)

wherein dist(·) is the Euclidean distance calculation function; D_{j,k} represents the visual similarity of pedestrian target j and pedestrian target k; f_j is the attribute feature of pedestrian target j; f_k is the attribute feature of pedestrian target k.
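The pairwise computation can be sketched as follows; mapping the Euclidean distance to a [0, 1] similarity via 1/(1 + dist) is an illustrative assumption, since the patent only specifies the distance function:

```python
import numpy as np

def visual_similarity_matrix(attr_feats):
    """Pairwise visual similarity between pedestrian targets, where each row of
    attr_feats is the mean apparent attribute feature of one target.
    The 1/(1+dist) normalization is an assumption for illustration."""
    diff = attr_feats[:, None, :] - attr_feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # dist(f_j, f_k), Euclidean
    return 1.0 / (1.0 + dist)

feats = np.array([[0.0, 0.0],
                  [3.0, 4.0],
                  [0.0, 1.0]])
D = visual_similarity_matrix(feats)  # symmetric, diagonal equals 1
```

With this convention, larger entries of D mean more visually similar targets, matching the thresholding used later.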
Preferably, the step 2 of adjusting the classification label by using a neighborhood label regularization method specifically includes the following steps:
step 21: estimating the scale N of the number of visually similar pedestrians for each pedestrian target in the re-recognition training data, where the value of N is generally one percent of the number of pedestrian targets; selecting a suitable threshold t from the visual similarity matrix by bisection, such that pedestrian targets whose visual similarity to the current pedestrian target is larger than the threshold are its similar pedestrians, and the average number of similar pedestrians lies in the range [N, N+1];
step 22: assigning category weights to pedestrian classification tags of similar pedestrians of each pedestrian target, thereby adjusting the corresponding pedestrian classification tags;
regarding the threshold t found in the step 21: if the visual similarity of two pedestrians is greater than t, the pedestrians are considered similar, and on the basis of the one-hot coded pedestrian classification labels, similar pedestrians are given a higher category weight than dissimilar pedestrians.
Preferably, the calculation of the metric learning loss based on orientation similarity in the step 3 includes the following steps:
step 31: batching the re-recognition training data; in each batch, inputting each pedestrian image into the pedestrian re-recognition model to acquire the corresponding human body feature vector, and sequentially selecting each image as an anchor image; selecting, from the set of pedestrian images that have the same pedestrian classification label as the anchor image and whose human body orientation is labeled as an adjacent orientation (for example, an orientation adjacent to the frontal orientation for a front-facing image), the image with the largest Euclidean distance as the adjacent-orientation positive sample image; selecting the pedestrian image with a different pedestrian classification label from the anchor image and the smallest Euclidean distance as the negative sample image; forming a triplet from the human body feature vectors of the anchor image, the adjacent-orientation positive sample image and the negative sample image, inputting the triplet into a triplet loss function, and calculating the triplet loss;
step 32: for each anchor image selected in the step 31, selecting the pedestrian image with the same pedestrian classification label as the anchor image, the same human body orientation, and the largest Euclidean distance, as the same-orientation positive sample image; inputting the human body feature vectors of the anchor image and the same-orientation positive sample image into a contrast loss function, and calculating the contrast loss between positive samples;
step 33: calculating the sum of the contrast loss between positive samples and the triplet loss to obtain the metric learning loss based on orientation similarity.
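The three steps above can be sketched on precomputed distances; the margin value 0.3 is an assumption, as the patent does not specify the triplet margin, and the function name is illustrative:

```python
def orientation_metric_loss(d_anchor_adjpos, d_anchor_neg,
                            d_same_orient_pos, margin=0.3):
    """Sketch of steps 31-33: a margin-based triplet loss over
    (anchor, adjacent-orientation positive, negative) distances, plus a
    positive-pair contrast term that directly takes the largest
    same-orientation positive distance as its loss value."""
    triplet = max(0.0, d_anchor_adjpos - d_anchor_neg + margin)
    contrast = d_same_orient_pos  # max distance among same-orientation positives
    return triplet + contrast
```

Because only adjacent orientations enter the triplet and only same orientations enter the contrast term, distances between opposite orientations are never optimized directly, as the description requires.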
Preferably, the specific process of the step 4 is as follows:
the method comprises the steps of collecting target images and pedestrian image data to be identified as a query set and a candidate set respectively, inputting the query set and the candidate set into a pedestrian re-identification model to obtain corresponding human body feature vectors, calculating Euclidean distances between the human body feature vectors corresponding to the query set and the candidate set to obtain human body feature similarity, sequencing all the human body feature similarity from large to small, and taking a sequencing result as a retrieval result of pedestrian re-identification.
Compared with the prior art, the invention discloses a pedestrian re-identification method combining attribute and orientation, which reduces overfitting of the pedestrian re-identification algorithm to identity information by introducing visual similarity information, and improves the generalization performance of the pedestrian re-identification features. The content of the invention mainly comprises the following steps: first, a human attribute recognition model and an orientation recognition model are used to obtain the human apparent attribute features and human orientation information of the pedestrian targets in pedestrian images; then, the visual similarity between different pedestrians is computed from the similarity of their human apparent attribute features in the training data set to obtain a visual similarity matrix, and the classification labels used in the classification loss function are adjusted according to these similarity relations; finally, the method designs a metric learning loss based on orientation similarity, which strengthens the intra-class distance minimization constraint between samples of the same orientation on top of a triplet loss function, while avoiding directly optimizing the sample distance between opposite orientations with large visual differences. The proposed pedestrian re-identification method combining attribute and orientation alleviates overfitting of the pedestrian re-identification training algorithm to the pedestrian identity information in the training data, and helps improve the transferability of the pedestrian re-identification model to real-world scenes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of the pedestrian re-recognition method combining attribute and orientation.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a pedestrian re-recognition method combining attribute and orientation, which alleviates the problem that a pedestrian re-recognition training algorithm overfits to the pedestrian identity information in training data, and helps improve the transferability of the pedestrian re-recognition model to real-world scenes.
Referring to fig. 1, this embodiment discloses the flow of the pedestrian re-recognition method combining attribute and orientation. First, a human attribute recognition model and an orientation recognition model are used to obtain the human apparent attribute features and human orientation information of the pedestrian targets in pedestrian images. Then, the visual similarity between different pedestrians is computed from the similarity of their human apparent attribute features in the training data set to obtain a visual similarity matrix, and the classification labels used in the classification loss function are adjusted according to these similarity relations. Finally, the method designs a metric learning loss based on orientation similarity, which strengthens the intra-class distance minimization constraint between samples of the same orientation on top of a triplet loss function, while avoiding directly optimizing the sample distance between opposite orientations with large visual differences. After the classification loss and the metric learning loss based on orientation similarity are calculated using the classification labels, the sum of the two losses supervises the training of the deep residual neural network. The method specifically comprises the following steps:
s1: re-identifying the pedestrian image in the training data, acquiring the human body apparent attribute characteristics and the human body orientation labels of the pedestrian targets in the pedestrian image by utilizing the human body attribute identification model and the orientation identification model, and labeling the pedestrian classification labels;
s11: selecting a plurality of images in the human attribute identification data set, marking the images for clothes types, clothes colors, backpacks, hats, glasses and body orientations, inputting the marked images into a depth residual error neural network, and training an attribute identification model and an orientation identification model;
s12: inputting all images in the pedestrian re-recognition training data into the attribute recognition model trained in S11, and taking the feature vector obtained by global max pooling of the feature map after the last convolution layer as the human body apparent attribute feature of each pedestrian image;
s13: performing body orientation recognition on all images in the pedestrian re-recognition training data with the orientation recognition model trained in S11, obtaining the body orientation labels of all images;
s2: for the human body apparent attribute features of each image acquired in S1, calculating the average feature vector over all images of each person as that pedestrian's attribute feature; then calculating the similarity of the attribute features between different pedestrians to acquire a visual similarity matrix, and adjusting the classification labels with the neighborhood label regularization method according to the similarity relations;
s21: in the pedestrian re-identification training data, taking the average of the human body apparent attribute features over all images of each pedestrian as that pedestrian's attribute feature, recorded as f_y;
s22: calculating the Euclidean distance between the attribute features of each pair of pedestrians to obtain the visual similarity matrix D, namely

D_{j,k} = dist(f_j, f_k)

wherein dist(·) is the Euclidean distance calculation function; D_{j,k} represents the visual similarity of pedestrians j and k; f_j is the attribute feature of pedestrian target j; f_k is the attribute feature of pedestrian target k;
s23: estimating the scale N of the number of other visually similar pedestrians for each pedestrian in the data set, where N=5 can be set; selecting a suitable threshold t from the similarity matrix by bisection so that the average number of similar pedestrians whose similarity to each pedestrian is greater than the threshold lies between N and N+1; recording the set of similar pedestrian categories of pedestrian y as Sim(y); the specific process of selecting the suitable threshold t is as follows:
s231: let the possible minimum value l=0 and the maximum value r=1 of the threshold t;
s232: estimating the threshold t = (l+r)/2 = 0.5 according to bisection; all pedestrian targets whose visual similarity is larger than the threshold t are regarded as similar pedestrians and form the similarity sets;
s233: calculating the mean size of all the similarity sets at this time, a = average(Card(Sim(y)));
s234: if a is between N and N+1, the current threshold t is the selected suitable threshold, and t is output; if a < N, the threshold t should be lowered to increase the number of elements in the similarity sets, so let the maximum value r = t;
s235: if a > N+1, the threshold t should be raised to decrease the number of elements in the similarity sets, so let the minimum value l = t; then re-estimate the threshold t = (r+l)/2 and return to S232;
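The bisection of S231–S235 can be sketched as follows; the `select_threshold` helper and the toy similarity matrix are illustrative, not from the patent, and similarities are assumed normalized to [0, 1] so that the initial bounds l=0, r=1 apply:

```python
import numpy as np

def select_threshold(sim, N, iters=30):
    """Bisection from S231-S235: find t in (0, 1) such that the average number
    of similar pedestrians per target (off-diagonal similarity > t) lies in
    [N, N+1]."""
    n = sim.shape[0]
    off_diag = ~np.eye(n, dtype=bool)      # exclude self-similarity
    l, r = 0.0, 1.0                        # S231
    t = 0.5
    for _ in range(iters):
        t = (l + r) / 2                    # S232
        a = ((sim > t) & off_diag).sum() / n   # S233: mean similarity-set size
        if N <= a <= N + 1:                # S234: accept
            break
        if a < N:
            r = t                          # lower t -> larger similarity sets
        else:
            l = t                          # S235: raise t -> smaller sets
    return t

sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.2, 0.8, 1.0]])
t = select_threshold(sim, N=1)
```

On this toy matrix each pedestrian has exactly one neighbor above 0.5, so the very first bisection estimate is accepted.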
s24: if the visual similarity of two pedestrians is larger than t, the two pedestrians are considered similar; on the basis of the one-hot coded pedestrian category labels, similar pedestrians are given a higher category weight than dissimilar pedestrians, realizing the adjustment of the pedestrian classification labels. The pedestrian classification label of pedestrian i is denoted q_i (the pedestrian classification label is called a category), and q_i(c) is the probability that pedestrian i is marked as belonging to category c, given by the following formula:

q_i(c) = 1 − ε, if c = y_i;  q_i(c) = ε / n_i, if c ∈ Sim(y_i);  q_i(c) = 0, otherwise

wherein y_i is the category of pedestrian i noted in the pedestrian re-identification data set; Sim(y_i) is the set of categories visually similar to y_i; n_i is the number of categories contained in Sim(y_i); and ε is a hyperparameter with a value between 0 and 1 that controls the smoothness of the label, preferably ε = 0.2;
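The label adjustment can be sketched directly; the closed form (1−ε on the true category, ε/n_i shared over the similar categories, 0 elsewhere) follows the surrounding description, and the helper name is illustrative:

```python
import numpy as np

def neighborhood_labels(y_i, sim_set, num_classes, eps=0.2):
    """Neighborhood label regularization from S24: the annotated category y_i
    keeps weight 1-eps, and the eps mass is spread evenly over the n_i
    visually similar categories Sim(y_i); dissimilar categories get 0."""
    q = np.zeros(num_classes)
    q[y_i] = 1.0 - eps
    n_i = len(sim_set)
    for c in sim_set:
        q[c] = eps / n_i
    return q

# Pedestrian of category 0 with similar categories {2, 3} among 5 classes
q = neighborhood_labels(0, [2, 3], num_classes=5, eps=0.2)
```

Unlike uniform label smoothing, the ε mass goes only to visually similar categories, so dissimilar identities keep zero weight.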
s3: using the human body orientation labels generated in S1 and the pedestrian classification labels acquired in S2, training the pedestrian re-recognition model with the classification loss and the metric learning loss based on orientation similarity; extracting the human body feature vectors of pedestrian images, calculating the human body feature similarity, and completing the pedestrian re-recognition task by retrieval and matching with the human body features;
the specific implementation steps for training the pedestrian re-identification model and extracting the characteristics to complete the task of pedestrian re-identification comprise the following steps:
s31: initializing a depth residual neural network comprising a classifier, wherein the depth residual neural network can be initialized by using an ImageNet pre-training parameter, and the classifier can be realized by using a full connection layer;
s32: selecting a batch of image data of size batch-size (generally 64) from the pedestrian re-identification training data each time; performing the necessary data enhancement on the data, including but not limited to random erasing, random color enhancement, random flipping, random cropping and the like, and then inputting the data into the deep residual neural network; for the output of the network classifier, the classification loss of each image sample is calculated as follows:

L_cls = − Σ_c q_i(c) · log p_i(c)

wherein p_i(c) is the probability predicted by the classifier that sample i belongs to category c, and q_i(c) is the probability that pedestrian i is marked as belonging to category c;
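The soft-label cross entropy of S32 can be checked numerically; `classification_loss` is an illustrative helper name, and the small epsilon guards against log(0):

```python
import numpy as np

def classification_loss(p, q, eps=1e-12):
    """Classification loss from S32: L = -sum_c q_i(c) * log p_i(c), with p the
    classifier's predicted probabilities and q the adjusted (soft) label."""
    return -np.sum(q * np.log(p + eps))

p = np.array([0.7, 0.1, 0.1, 0.1])   # classifier output for one sample
q = np.array([0.8, 0.0, 0.1, 0.1])   # neighborhood-regularized label
loss = classification_loss(p, q)
```

With a one-hot q this reduces to ordinary cross entropy; the adjusted labels additionally penalize low predicted probability on visually similar categories.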
s33: sequentially selecting each image from each batch of data as the anchor image; selecting the pedestrian image with the same category as the anchor image, an adjacent orientation, and the largest Euclidean distance as the adjacent-orientation positive sample image, and selecting the pedestrian image with a different category from the anchor image and the smallest Euclidean distance as the negative sample image; forming a triplet from the anchor image, the adjacent-orientation positive sample image and the negative sample image, inputting the triplet into a triplet loss function, and calculating the triplet loss;
s34: for each anchor image selected in S33, selecting the pedestrian image with the same category, the same orientation and the largest distance as the same-orientation positive sample image; inputting the features of the anchor image and the same-orientation positive sample image into a contrast loss function, and calculating the contrast loss between positive samples;
s35: calculating the sum of the contrast loss and the triplet loss, namely the metric learning loss based on orientation similarity:

L_os^{p,k} = L_con^{p,k} + L_tri^{p,k}

wherein L_con^{p,k} is the contrast loss between positive samples obtained when the k-th image of the p-th pedestrian is taken as the anchor image in S34, with the maximum distance between positive samples directly taken as the loss value; L_tri^{p,k} is the triplet loss obtained in S33;
s36: taking the sum of the classification loss L_cls and the metric learning loss L_os as the final loss function value, and adjusting the model parameters of the deep residual neural network with the back-propagation algorithm until a preset convergence condition is reached; the learning rate is 0.001, the optimizer is Adam, and training runs for 120 epochs;
s4: after model training is completed, extracting human body feature vectors from all images of the query set and the candidate set, which are formed by the collected target images and the pedestrian image data to be identified; calculating the Euclidean distances between the human body feature vectors of the query set and the candidate set as the human body feature similarity, and ranking by similarity from large to small as the retrieval result, completing the pedestrian re-identification task.
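The retrieval in S4 reduces to a distance sort; a minimal NumPy sketch with toy feature vectors (the function name and values are illustrative):

```python
import numpy as np

def rank_candidates(query_feat, gallery_feats):
    """S4 sketch: Euclidean distance between the query's human body feature
    vector and each candidate's; candidates are returned best match first
    (smallest distance = highest similarity)."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)

query = np.array([0.0, 0.0])
gallery = np.array([[3.0, 4.0],   # distance 5
                    [0.0, 1.0],   # distance 1
                    [6.0, 8.0]])  # distance 10
order = rank_candidates(query, gallery)
```

The returned index order is the re-identification ranking over the candidate set.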
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A pedestrian re-identification method combining attributes and orientation, characterized by comprising the following steps:
step 1: collecting pedestrian images to form re-identification training data and labeling pedestrian classification labels; using an attribute recognition model and an orientation recognition model, respectively, to obtain the human body apparent attribute features and the human body orientation labels of the pedestrian targets in the pedestrian images;
step 2: calculating the attribute feature of each pedestrian target from the human body apparent attribute features of its pedestrian images, computing the visual similarity between the attribute feature of each pedestrian target and those of the other pedestrian targets to obtain a visual similarity matrix, judging the similarity relations among different pedestrian targets from the visual similarity matrix, and adjusting the pedestrian classification labels according to these relations with a neighborhood label regularization method;
step 3: training a pedestrian re-identification model with the classification loss and an orientation-similarity-based metric learning loss, according to the human body orientation labels and the pedestrian classification labels obtained in step 2;
step 31: initializing a depth residual neural network comprising a classifier;
step 32: selecting a batch of image data from the pedestrian re-identification training data each time, applying data enhancement, and inputting the batch into the depth residual neural network; for the output of the network classifier, the classification loss of each image sample is calculated as:
$$L_{cls} = -\sum_{c} q_i(c)\,\log p_i(c)$$
wherein p_i(c) is the probability, predicted by the classifier, that pedestrian i belongs to category c, and q_i(c) is the probability that pedestrian i is labeled as belonging to category c;
step 33: within each batch, inputting every pedestrian image into the pedestrian re-identification model to obtain its human body feature vector, and selecting each image in turn as the anchor image; among the pedestrian images that share the anchor image's pedestrian classification label and whose human body orientation is labeled as adjacent to the anchor's orientation, selecting the one with the largest Euclidean distance as the adjacent-orientation positive sample image; among the pedestrian images whose pedestrian classification label differs from the anchor's, selecting the one with the smallest Euclidean distance as the negative sample image; forming the human body feature vectors of the anchor image, the adjacent-orientation positive sample image and the negative sample image into a triplet, and inputting the triplet into a triplet loss function to calculate the triplet loss;
step 34: for each selected anchor image, selecting the pedestrian image that has the same pedestrian classification label and the same human body orientation as the anchor image and the largest Euclidean distance, as the same-orientation positive sample image; inputting the human body feature vectors of the anchor image and the same-orientation positive sample image into a contrast loss function to calculate the contrast loss between positive samples;
step 35: calculating the sum of the contrast loss and the triplet loss, i.e. the orientation-similarity-based metric learning loss:
$$L_{os} = L_{pos} + L_{tri}$$
wherein $L_{pos}$ is the contrast loss between positive samples obtained in step 34, which directly takes the maximum distance between the positive samples as the loss value when the k-th image of the p-th pedestrian is used as the anchor image, and $L_{tri}$ is the triplet loss calculated in step 33;
step 36: taking the sum of the classification loss $L_{cls}$ and the metric learning loss $L_{os}$ as the final loss function value, and adjusting the model parameters of the depth residual neural network with the back propagation algorithm until a preset convergence condition is reached;
step 4: inputting the target image and the pedestrian image data to be identified into the trained pedestrian re-identification model, respectively, extracting the corresponding human body feature vectors, calculating the human body feature similarity, and performing retrieval and matching according to the human body feature similarity to complete the pedestrian re-identification task.
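The anchor-based sampling of steps 33–35 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the 8-bin orientation encoding, and the adjacency rule (neighboring bins one step apart, wrapping around) are assumptions for the example; the claim itself only requires an "adjacent orientation" label.

```python
import numpy as np

def orientation_metric_losses(feats, pids, orients, anchor, margin=0.3):
    """Sketch of the step-33/34 sampling and losses for one anchor image.

    feats   : (N, d) human body feature vectors from the re-identification model
    pids    : (N,)   pedestrian classification labels
    orients : (N,)   orientation bins in {0..7} (assumed 8-way discretization)
    anchor  : index of the anchor image within the batch
    """
    d = np.linalg.norm(feats - feats[anchor], axis=1)   # Euclidean distances to anchor
    same_id = (pids == pids[anchor])
    diff = np.abs(orients - orients[anchor]) % 8
    adjacent = np.minimum(diff, 8 - diff) == 1          # neighboring orientation bins

    # step 33: farthest adjacent-orientation positive and closest negative
    pos_adj = np.where(same_id & adjacent)[0]
    neg = np.where(~same_id)[0]
    d_ap = d[pos_adj].max()
    d_an = d[neg].min()
    triplet = max(0.0, d_ap - d_an + margin)

    # step 34: farthest same-orientation positive; its distance is taken
    # directly as the contrast loss between positive samples (step 35)
    pos_same = np.where(same_id & (orients == orients[anchor]))[0]
    pos_same = pos_same[pos_same != anchor]
    contrast = d[pos_same].max()

    # step 35: orientation-similarity-based metric learning loss
    return triplet + contrast, triplet, contrast
```

The example assumes every anchor has at least one adjacent-orientation positive, one same-orientation positive, and one negative in the batch; a batch sampler would normally guarantee this.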
2. The pedestrian re-identification method combining attributes and orientation according to claim 1, wherein obtaining the apparent attribute features and the human body orientation in step 1 specifically comprises the following steps:
step 11: selecting a plurality of images in a human attribute recognition data set and annotating them with human body attribute labels and body orientation labels, obtaining a human body attribute data set and a body orientation data set, respectively; the human body attribute labels comprise labels for clothes type, clothes color, backpack, hat and glasses; inputting the human body attribute data set and the body orientation data set into depth residual neural networks, respectively, and training to obtain the attribute recognition model and the orientation recognition model;
step 12: inputting all images in the re-identification training data into the attribute recognition model, and taking the feature vector obtained by global maximum pooling of the feature map after the last convolutional layer as the human body apparent attribute feature of each pedestrian image;
step 13: inputting all images in the re-identification training data into the orientation recognition model to obtain the human body orientation labels of all images.
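The pooling in step 12 reduces to a single operation on the last convolutional feature map. A minimal sketch (the function name and the (C, H, W) layout are assumptions; the real feature map would come from the depth residual neural network):

```python
import numpy as np

def attribute_feature(conv_map):
    """Global maximum pooling over the spatial dimensions of the last
    convolutional feature map, as described in step 12.

    conv_map : (C, H, W) array, one channel per learned attribute feature
    returns  : (C,) human body apparent attribute feature vector
    """
    # Collapse H x W into one axis, then take the max per channel.
    return conv_map.reshape(conv_map.shape[0], -1).max(axis=1)
```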
3. The pedestrian re-identification method combining attributes and orientation according to claim 1, wherein the attribute feature of a pedestrian target in step 2 is the average of the human body apparent attribute features of all pedestrian images corresponding to that pedestrian target;
the visual similarity is expressed as:
$$D_{j,k} = \mathrm{dist}(F_j, F_k)$$
wherein dist(·) is a Euclidean distance calculation function, $D_{j,k}$ represents the visual similarity of pedestrian target j and pedestrian target k, $F_j$ is the attribute feature of pedestrian target j, and $F_k$ is the attribute feature of pedestrian target k.
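Claim 3 amounts to averaging per-image attribute features per target and then computing all pairwise Euclidean distances. A sketch under those assumptions (the function name and the list-based input format are illustrative only):

```python
import numpy as np

def visual_similarity_matrix(per_image_feats, per_image_targets):
    """Per claim 3: each target's attribute feature F is the mean of the
    apparent attribute features of its images; D[j, k] = dist(F_j, F_k)
    with dist(.) the Euclidean distance."""
    targets = sorted(set(per_image_targets))
    F = np.stack([
        np.mean([f for f, t in zip(per_image_feats, per_image_targets) if t == tg],
                axis=0)
        for tg in targets
    ])
    diff = F[:, None, :] - F[None, :, :]     # pairwise differences
    return np.linalg.norm(diff, axis=2)      # D[j, k] = dist(F_j, F_k)
```

Note that, as written in the claim, D is a distance; whether a larger or smaller value counts as "more similar" is fixed by the thresholding convention of claim 4.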
4. The pedestrian re-identification method combining attributes and orientation according to claim 1, wherein adjusting the pedestrian classification labels with the neighborhood label regularization method in step 2 specifically comprises the following steps:
step 21: estimating the average number N of visually similar pedestrians per pedestrian target in the re-identification training data, and selecting by bisection a similarity threshold over the visual similarity matrix of each pedestrian target, such that the pedestrians whose visual similarity value exceeds the threshold are the similar pedestrians of the current pedestrian target and the average number of similar pedestrians falls in the range [N, N+1];
step 22: assigning class weights to the pedestrian classification labels of the similar pedestrians of each pedestrian target, thereby adjusting the corresponding pedestrian classification labels.
5. The pedestrian re-identification method combining attributes and orientation according to claim 1, wherein the specific process of step 4 is as follows:
collecting the target images and the pedestrian image data to be identified as a query set and a candidate set, respectively; inputting the query set and the candidate set into the pedestrian re-identification model to obtain the corresponding human body feature vectors; calculating the Euclidean distances between the human body feature vectors of the query set and of the candidate set to obtain the human body feature similarities; and sorting all the human body feature similarities from large to small, the sorted result being taken as the retrieval result of pedestrian re-identification.
CN202211431088.3A 2022-11-16 2022-11-16 Pedestrian re-identification method combining attribute and orientation Active CN115661722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211431088.3A CN115661722B (en) 2022-11-16 2022-11-16 Pedestrian re-identification method combining attribute and orientation

Publications (2)

Publication Number Publication Date
CN115661722A CN115661722A (en) 2023-01-31
CN115661722B true CN115661722B (en) 2023-06-06

Family

ID=85021338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211431088.3A Active CN115661722B (en) 2022-11-16 2022-11-16 Pedestrian re-identification method combining attribute and orientation

Country Status (1)

Country Link
CN (1) CN115661722B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414432A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, object identifying method and the corresponding device of Object identifying model
CN112418134A (en) * 2020-12-01 2021-02-26 厦门大学 Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A pedestrian re-identification method based on deep learning
CN112101150B (en) * 2020-09-01 2022-08-12 北京航空航天大学 Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN114419669A (en) * 2021-12-30 2022-04-29 杭州电子科技大学 Real-time cross-camera pedestrian tracking method based on re-recognition and direction perception

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414432A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, object identifying method and the corresponding device of Object identifying model
CN112418134A (en) * 2020-12-01 2021-02-26 厦门大学 Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis

Also Published As

Publication number Publication date
CN115661722A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
CN107506703B (en) Pedestrian re-identification method based on unsupervised local metric learning and reordering
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN111666851B (en) Cross domain self-adaptive pedestrian re-identification method based on multi-granularity label
CN111666843B (en) Pedestrian re-recognition method based on global feature and local feature splicing
Wang et al. A survey of vehicle re-identification based on deep learning
CN110033007B (en) Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion
CN110263845B (en) SAR image change detection method based on semi-supervised countermeasure depth network
CN109583482A (en) A kind of infrared human body target image identification method based on multiple features fusion Yu multicore transfer learning
CN109871875B (en) Building change detection method based on deep learning
CN111027421A (en) Graph-based direct-push type semi-supervised pedestrian re-identification method
CN109299707A An unsupervised pedestrian re-identification method based on fuzzy deep clustering
CN104598883A (en) Method for re-recognizing target in multi-camera monitoring network
CN108960142B (en) Pedestrian re-identification method based on global feature loss function
CN109271932A Pedestrian re-identification method based on color matching
CN111967325A (en) Unsupervised cross-domain pedestrian re-identification method based on incremental optimization
CN109447175A Pedestrian re-identification method combining deep learning and metric learning
CN112070010B (en) Pedestrian re-recognition method for enhancing local feature learning by combining multiple-loss dynamic training strategies
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN116168274A (en) Object detection method and object detection model training method
CN116778530A (en) Cross-appearance pedestrian re-identification detection method based on generation model
CN115661722B (en) Pedestrian re-identification method combining attribute and orientation
Peltomäki et al. Evaluation of long-term LiDAR place recognition
CN116052057A (en) Cross-modal pedestrian re-recognition method based on self-supervision learning and pre-training model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant