CN112801051A

CN112801051A - Method for re-identifying occluded pedestrians based on multi-task learning

Info

Publication number: CN112801051A
Application number: CN202110333021.5A
Authority: CN
Inventors: 沈子荷; 崔鹏; 马超
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2021-03-29
Filing date: 2021-03-29
Publication date: 2021-05-14

Abstract

The embodiment of the application discloses a method for re-identifying occluded pedestrians based on multi-task learning, which comprises the following steps: one branch of a pedestrian re-identification network is built with a random occlusion module, so that the network automatically generates partially occluded pictures and obtains an occlusion data set; the two branches are trained on whole-body images and occluded images respectively; the pedestrian identities learned from the whole-body images and the occluded images are combined and ranked by similarity to obtain an identification result; occluded pictures are used as the query set and whole-body pictures as the gallery, and half of the pedestrian identities are selected for training; the remaining half of the identities are used as a test set to evaluate the recognition performance of the network model. Compared with other pedestrian re-identification methods, a random occlusion module is introduced into one branch of multi-task learning so that all pictures are partially occluded; the whole-body images and occluded images are then learned separately, and the features learned from both are combined to achieve pedestrian re-identification. The effectiveness of the method is verified on the Occluded-REID and DukeMTMC-reID data sets.

Description

Method for re-identifying occluded pedestrians based on multi-task learning

Technical Field

The embodiment of the application relates to the technical field of computer vision and image processing, and in particular to a method for re-identifying occluded pedestrians based on multi-task learning.

Background

Pedestrian re-identification (ReID) aims to re-identify a target person across multiple non-overlapping cameras: given a probe image and a gallery, the goal is to find all gallery images that show the same person as the probe. It is an important research topic in computer vision and has been applied in many important public places, especially crowded ones such as campuses, shopping malls and airports. However, research scenarios differ considerably from practical ones, and many factors affect ReID accuracy in real scenes. The main problems are the following: camera resolution, since an image may have low resolution when the camera is far from the pedestrian; pedestrian pose, since the variety and uncertainty of pose changes greatly increase the difficulty; and occlusion, since a pedestrian in a photo may be occluded by other pedestrians, billboards, walls, cars and so on. Because of these challenges re-identification remains an unsolved problem, and occlusion in particular still poses great difficulty.

Occlusion is unavoidable, and when occluding objects appear in a picture they can cause wrong recognition results. Multi-task learning lets the network pay more attention to local information and features in the picture rather than to the picture as a whole, which reduces the attention given to the occluded part and can largely solve the problems caused by occlusion.

Disclosure of Invention

Therefore, the embodiment of the application discloses an occluded pedestrian re-identification method based on multi-task learning. The method is divided into two branches. One branch builds a random occlusion module that automatically generates partially occluded pictures, yielding an occlusion data set for learning. The other branch performs local feature learning and pose estimation on the whole-body images. The pedestrian IDs learned from the whole-body images and from the occluded images are combined and ranked by similarity to obtain a recognition result. Occluded pictures are used as the query set and whole-body images as the gallery; half of the pedestrian identities are randomly selected for training the pedestrian re-identification network to obtain a network model, and the remaining half are used as a test set to evaluate the recognition performance of the model. The test data set is input into the model, and the model is evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve. The gallery data set is input into the model and the pedestrian image features extracted by the model are stored, finally giving a pedestrian image feature database in which each feature has a unique pedestrian ID. A query pedestrian image is then input to obtain its features, the stored image features are retrieved, the similarity is computed, and the photo with the highest similarity is selected; its ID is the pedestrian ID of the query image.

In order to achieve the above object, the embodiments of the present application provide the following technical solutions:

According to the embodiment of the application, a method for re-identifying occluded pedestrians based on multi-task learning is provided, comprising the following steps:

one branch builds a random occlusion module that automatically generates partially occluded pictures to obtain an occlusion data set for learning;

the other branch performs local feature learning and pose estimation on the whole-body images separately, and then combines the features learned by the two;

the pedestrian IDs learned from the whole-body images and from the occluded images are combined and ranked by similarity to obtain an identification result;

occluded pictures are used as the query set and whole-body pictures as the gallery, and half of the pedestrian identities are randomly selected for training to obtain a network model;

the remaining half of the pedestrian identities are used as a test set to evaluate the recognition performance of the network model: the test data set is input into the model, and the model is evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve;

the gallery data set is input into the trained model, and the pedestrian image features extracted by the model are stored, finally giving a pedestrian image feature database in which each feature has a unique pedestrian ID;

a query pedestrian image is input to obtain its features, the stored image features are retrieved, the similarity is computed, and the photo with the highest similarity is selected; its ID is the pedestrian ID of the query image.
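The identity-level split in the steps above (half of the pedestrian identities for training, the remaining half for testing) can be illustrated with a minimal sketch. The function name `split_identities`, the fixed seed and the toy label list are assumptions made for illustration; the source does not say how the random selection is implemented.

```python
import random

def split_identities(pedestrian_ids, seed=0):
    """Randomly assign half of the identities to training and the rest to testing."""
    unique_ids = sorted(set(pedestrian_ids))
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    rng.shuffle(unique_ids)
    half = len(unique_ids) // 2
    return set(unique_ids[:half]), set(unique_ids[half:])

# Hypothetical labels: one entry per image, the value is the pedestrian ID.
labels = [0, 0, 1, 2, 2, 3, 4, 4, 5, 5]
train_ids, test_ids = split_identities(labels)

# Images are then routed by identity, so no person appears in both sets.
train_images = [i for i, pid in enumerate(labels) if pid in train_ids]
test_images = [i for i, pid in enumerate(labels) if pid in test_ids]
```

Splitting by identity rather than by image keeps every test pedestrian unseen during training, which is what makes the later evaluation a re-identification test rather than a classification test.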

In summary, the embodiment of the present application provides a method for re-identifying occluded pedestrians based on multi-task learning. The method is divided into two branches: one branch builds a random occlusion module that automatically generates partially occluded pictures, yielding an occlusion data set for learning, and the other branch performs local feature learning and pose estimation on the whole-body images. The pedestrian IDs learned from the whole-body images and from the occluded images are combined and ranked by similarity to obtain an identification result. Occluded pictures are used as the query set and whole-body pictures as the gallery, and half of the pedestrian identities are randomly selected for training to obtain a network model. The remaining half of the identities are used as a test set to evaluate the recognition performance of the network model: the test data set is input into the model, and the model is evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve. The gallery data set is input into the multi-task model and the pedestrian image features extracted by the model are stored, finally giving a pedestrian image feature database in which each feature has a unique pedestrian ID. An occluded picture is used as the query image, the stored image features are retrieved, the similarity is computed, and the picture with the highest similarity is selected from the gallery; its ID is the pedestrian ID of the query image.

Drawings

Fig. 1 is a flowchart of a method for re-identifying an occluded pedestrian based on multitask learning according to an embodiment of the present application.

Fig. 2 is a block diagram of the occluded pedestrian re-identification network structure based on multi-task learning according to an embodiment of the present application.

Fig. 3 shows the structure of the occlusion map module according to an embodiment of the present application.

Fig. 4 shows the structure of the whole-body image module according to an embodiment of the present application.

Detailed Description

The following description is given for illustration and not limitation. The invention may be embodied in other forms and is not limited to the embodiments set forth herein; the embodiments are provided so that the invention will be readily apparent to those skilled in the art.

Fig. 1 is a flowchart of a method for re-identifying an occluded pedestrian based on multitask learning according to an embodiment of the present application, where the method includes the following steps:

Step 101: one branch builds a random occlusion module that automatically generates partially occluded pictures to obtain an occlusion data set for learning;

Step 102: the other branch performs local feature learning and pose estimation on the whole-body images separately, and then combines the features learned by the two;

Step 103: the pedestrian IDs learned from the whole-body images and from the occluded images are combined and ranked by similarity to obtain an identification result;

Step 104: occluded pictures are used as the query set and whole-body pictures as the gallery, and half of the pedestrian identities are randomly selected for training to obtain a network model;

Step 105: the remaining half of the pedestrian identities are used as a test set to evaluate the recognition performance of the network model; the test data set is input into the model, and the model is evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve;

Step 106: the gallery data set is input into the trained model, and the pedestrian image features extracted by the model are stored, finally giving a pedestrian image feature database in which each feature has a unique pedestrian ID;

Step 107: a query pedestrian image is input to obtain its features, the stored image features are retrieved, the similarity is computed, and the photo with the highest similarity is selected; its ID is the pedestrian ID of the query image.
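The retrieval stage at the end of the step list (a feature database built from the gallery, then a query ranked by similarity) can be sketched as follows. The source does not name the similarity measure, so cosine similarity is an assumption here; the function names and the three-dimensional toy features are also illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_gallery(query_feature, gallery):
    """Return (similarity, pedestrian_id) pairs sorted from most to least similar.

    gallery is a list of (pedestrian_id, feature) entries, standing in for the
    pedestrian image feature database described in the text.
    """
    scored = [(cosine_similarity(query_feature, feat), pid) for pid, feat in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

# Hypothetical feature database and query feature.
gallery = [(7, [1.0, 0.0, 0.0]), (3, [0.0, 1.0, 0.0]), (9, [0.6, 0.8, 0.0])]
query = [0.1, 0.9, 0.0]

ranking = rank_gallery(query, gallery)
predicted_id = ranking[0][1]   # ID of the most similar gallery entry
```

In a real system the features would come from the trained network, and the full ranking (not only the top entry) is what the mAP and CMC metrics are computed from.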

In a possible implementation manner, the method for constructing the random occlusion module in one branch in step 101 specifically includes the following steps:

given a data set X with N images and M pedestrian IDs, each sample of X is the j-th image of the i-th pedestrian ID. A mapping function converts the data set X into a data set Z in which every picture is randomly occluded, so that each sample of Z is the occluded picture generated from the j-th image of the i-th pedestrian;

in addition, a feature extractor h must be learned so that the features of the same person are close together and the features of different persons are far apart;

the occluded pictures are then used to train the convolutional neural network to identify the identity of each person. Training on the data set X uses an objective function in which f(·) is the classifier and L^p(·) is the identity loss function; after the data set Z is obtained through the mapping function, the two data sets are combined to obtain the overall objective function.
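The two objective functions referred to above are not reproduced in this text. A plausible reconstruction from the stated definitions (feature extractor h, classifier f, identity loss L^p, data sets X and Z) is given below. The sample symbols x_i^j and z_i^j, the label y_i and the unweighted sum of the two terms are assumptions, not taken from the source.

```latex
% Assumed notation: x_i^j is the j-th image of the i-th pedestrian ID,
% z_i^j is its randomly occluded counterpart, y_i is the identity label.

% Training on X only:
\min_{f,\,h} \sum_{x_i^j \in X} L^p\!\left(f\!\left(h(x_i^j)\right),\, y_i\right)

% Training on X and Z combined:
\min_{f,\,h} \sum_{x_i^j \in X} L^p\!\left(f\!\left(h(x_i^j)\right),\, y_i\right)
           + \sum_{z_i^j \in Z} L^p\!\left(f\!\left(h(z_i^j)\right),\, y_i\right)
```

Under this reading the occluded copy of an image keeps the identity label of its original, so the same classifier and feature extractor are trained on both views of each pedestrian.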

Pedestrian re-identification is treated as a classification problem, and the softmax loss is used as the identity loss. In the loss function, K is the total number of pedestrian identities, and the prediction term is the predicted result that the i-th training sample belongs to the k-th class.
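As a rough illustration of using the softmax loss as the identity loss, the sketch below computes the mean cross-entropy of softmax probabilities over K identity classes. The averaging over the batch, the function names and the toy logits are assumptions; the exact form of the loss in the source formula is not reproduced in this text.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one sample's K class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def identity_loss(batch_logits, labels):
    """Mean negative log-probability assigned to each sample's true identity."""
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)        # probs[k]: predicted probability of class k
        total += -math.log(probs[label])
    return total / len(labels)

# Hypothetical classifier outputs for two training samples and K = 3 identities.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
loss = identity_loss(batch_logits, labels)
```

With uniform scores the loss equals log K, which is a convenient sanity check when wiring such a loss into a training loop.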

In step 102, local feature learning and pose estimation are performed separately on the whole-body image, with the following specific steps:

the whole-body image is passed through the backbone network ResNet50 to obtain the tensor T;

pose estimation is then performed on the whole-body image: an image is input and a pose estimator produces 18 landmarks, predicting the coordinates and a confidence score for each landmark; each confidence score is compared with a threshold γ, the coordinates are kept when the confidence is greater than γ, and the landmark is set to 0 when the confidence is smaller than γ;

the landmarks are used to generate two-dimensional Gaussian heat maps centered on the ground-truth positions, and each heat map explicitly encodes information about a different region of the pedestrian;

the tensor T is average-pooled to give the global feature f_g, and each heat map is multiplied by T to obtain a pose-guided feature map, which focuses on the non-occluded parts of the pedestrian and suppresses information from occluded regions;

each pose-guided feature map passes through an average pooling layer to generate a 2048-dimensional feature vector; all the feature vectors are then max-pooled and concatenated with the global feature f_g, and the result is denoted f_cat; f_cat is input into a fully connected layer, whose output is denoted the global pose-guided feature f_pose, and the ID of each input image is predicted using softmax.
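The pose-guided steps above (confidence thresholding, Gaussian heat maps, heat-map weighting of T, pooling and concatenation into f_cat) can be traced on a toy tensor. This is a sketch under several assumptions: T is a small C x H x W nested list rather than a ResNet50 output, the Gaussian width `sigma` is arbitrary, a suppressed landmark is represented as `None` and yields an all-zero heat map (one reading of "set to 0"), and the fully connected layer and softmax that produce f_pose are left out.

```python
import math

def keep_confident(landmarks, gamma):
    """Keep (x, y) when confidence exceeds gamma; otherwise mark the landmark as suppressed."""
    return [(x, y) if conf > gamma else None for x, y, conf in landmarks]

def gaussian_heatmap(center, height, width, sigma=1.0):
    """2-D Gaussian centered on a landmark; all zeros for a suppressed landmark."""
    if center is None:
        return [[0.0] * width for _ in range(height)]
    cx, cy = center
    return [[math.exp(-((c - cx) ** 2 + (r - cy) ** 2) / (2.0 * sigma ** 2))
             for c in range(width)] for r in range(height)]

def average_pool(feature_map):
    """Spatial average of each channel of a C x H x W nested list."""
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
            for channel in feature_map]

def pose_guided_feature(T, heatmaps):
    """Return f_cat: element-wise max over pooled pose-guided maps, concatenated with f_g."""
    f_g = average_pool(T)                       # global feature
    pooled = []
    for heat in heatmaps:
        # Multiply every channel of T by the heat map to get one pose-guided map.
        guided = [[[v * w for v, w in zip(t_row, h_row)]
                   for t_row, h_row in zip(channel, heat)] for channel in T]
        pooled.append(average_pool(guided))
    f_max = [max(values) for values in zip(*pooled)]   # max pooling across landmarks
    return f_max + f_g                          # concatenation

# Toy input: 2 channels on a 4 x 3 grid, two landmarks given as (x, y, confidence).
height, width = 4, 3
T = [[[1.0] * width for _ in range(height)],
     [[2.0] * width for _ in range(height)]]
landmarks = [(1.0, 1.0, 0.9), (2.0, 3.0, 0.05)]

centers = keep_confident(landmarks, gamma=0.2)
heatmaps = [gaussian_heatmap(c, height, width) for c in centers]
f_cat = pose_guided_feature(T, heatmaps)
```

With C channels the concatenated vector has 2C entries; in the setting described in the text, C would be 2048 and there would be 18 heat maps.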

Fig. 2 shows the overall network structure used in the embodiment of the present application. The method is divided into two branches: one branch builds a random occlusion module that automatically generates partially occluded pictures, yielding an occlusion data set for learning, and the other branch performs local feature learning and pose estimation on the whole-body images. The pedestrian IDs learned from the whole-body images and from the occluded images are combined and ranked by similarity to obtain an identification result. Occluded pictures are used as the query set and whole-body pictures as the gallery, and half of the pedestrian identities are randomly selected for training to obtain a network model. The remaining half of the identities are used as a test set to evaluate the recognition performance of the network model: the test data set is input into the model, and the model is evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve. The gallery data set is input into the multi-task model and the pedestrian image features extracted by the model are stored, finally giving a pedestrian image feature database in which each feature has a unique pedestrian ID. An occluded picture is used as the query image, the stored image features are retrieved, the similarity is computed, and the picture with the highest similarity is selected from the gallery; its ID is the pedestrian ID of the query image.

Fig. 3 is a schematic structural diagram of the occlusion map module provided in the embodiment of the present application. The calculation flow is as follows: a data set X with N images and M pedestrian IDs is converted through the mapping function into a data set Z, each sample of which is the randomly occluded picture generated from the j-th image of the i-th pedestrian; the network is trained on these pictures with the softmax identity loss, in which K is the total number of pedestrian identities.
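The mapping from the data set X to the randomly occluded data set Z, as handled by the occlusion map module, can be illustrated with a small sketch. The source does not specify the shape, size or fill value of the occluder, so the rectangular patch, the 20% to 50% size range and the zero fill below are all assumptions, as are the function name and the toy grayscale images.

```python
import random

def random_occlude(image, rng, min_frac=0.2, max_frac=0.5, fill=0):
    """Return a copy of a 2-D image (list of rows) with one random rectangle filled in."""
    height, width = len(image), len(image[0])
    occ_h = max(1, int(height * rng.uniform(min_frac, max_frac)))
    occ_w = max(1, int(width * rng.uniform(min_frac, max_frac)))
    top = rng.randint(0, height - occ_h)     # randint is inclusive at both ends
    left = rng.randint(0, width - occ_w)
    occluded = [row[:] for row in image]     # copy so the original stays intact
    for r in range(top, top + occ_h):
        for c in range(left, left + occ_w):
            occluded[r][c] = fill
    return occluded

rng = random.Random(0)

# Hypothetical data set X: three 16 x 8 grayscale images with every pixel at 255.
X = [[[255] * 8 for _ in range(16)] for _ in range(3)]

# Data set Z: one randomly occluded picture per picture in X, same order and labels.
Z = [random_occlude(img, rng) for img in X]
```

Because Z is generated image by image from X, each occluded picture inherits the pedestrian ID of its source picture, which is what allows the two sets to be trained together.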

Fig. 4 is a schematic structural diagram of the whole-body image module provided in the embodiment of the present application, which performs local feature learning and pose estimation separately on the whole-body image. The calculation flow is as follows:

for the local feature branch, the whole-body image is passed through the backbone network ResNet50 to obtain the tensor T; T is divided horizontally into six parts, and average pooling gives 6 column vectors g; a convolution is applied to each vector to reduce its dimensionality, giving column vectors h; finally, all column vectors h are input into a classifier composed of a fully connected layer and a softmax function to obtain the predicted pedestrian ID;

for the pose estimation branch, a picture is input and a pose estimator produces 18 landmarks, predicting the coordinates and a confidence score for each landmark; the landmarks are used to generate two-dimensional Gaussian heat maps centered on the ground-truth positions, and each heat map explicitly encodes information about a different region of the pedestrian; the tensor T is average-pooled to give the global feature f_g, and each heat map is multiplied by T to obtain a pose-guided feature map, which focuses on the non-occluded parts of the pedestrian and suppresses information from occluded regions.
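The first stage of the local feature branch described above (splitting T horizontally into six parts and average-pooling each part into a column vector g) can be sketched on a toy tensor. The function name and the 2-channel, 12 x 4 input are assumptions for illustration; the dimension-reducing convolution and the fully connected softmax classifier are left out.

```python
def horizontal_part_features(T, num_parts=6):
    """Split a C x H x W nested list into horizontal stripes and average-pool each one.

    Returns num_parts vectors, each with one value per channel.
    """
    height = len(T[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts in this sketch")
    stripe = height // num_parts
    parts = []
    for p in range(num_parts):
        rows = range(p * stripe, (p + 1) * stripe)
        vector = []
        for channel in T:
            values = [v for r in rows for v in channel[r]]
            vector.append(sum(values) / len(values))
        parts.append(vector)
    return parts

# Toy tensor: channel 0 stores the row index, channel 1 is constant.
T = [[[float(r)] * 4 for r in range(12)],
     [[1.0] * 4 for _ in range(12)]]

g = horizontal_part_features(T)   # six vectors, top stripe first
```

Each stripe covers a different height band of the pedestrian, so an occluder that hides one band affects only the corresponding vector.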

To verify the accuracy and robustness of the invention, experiments were performed on the Occluded-REID and DukeMTMC-reID data sets. The model was evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve, and the identification results were compared with those of existing pedestrian re-identification methods, giving the result data shown in Table 1.

TABLE 1

On Occluded-REID and DukeMTMC-reID the method achieves Rank-1 accuracies of 50.3% and 79.8% and mAP values of 35.8% and 62.1% respectively, exceeding most pedestrian re-identification algorithms.
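For reference, the two metrics used in the experiments can be computed from ranked gallery lists as in the sketch below. It is a simplified version: the function names and toy rankings are assumptions, and the camera-based filtering of gallery entries that re-identification benchmarks commonly apply is omitted.

```python
def average_precision(ranked_ids, query_id):
    """Average of the precision values at each rank where the correct ID appears."""
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid == query_id:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings, query_ids):
    """mAP: mean of the per-query average precision."""
    return sum(average_precision(r, q) for r, q in zip(rankings, query_ids)) / len(query_ids)

def cmc_curve(rankings, query_ids, max_rank):
    """CMC value at rank k: fraction of queries with a correct match in the top k."""
    curve = []
    for k in range(1, max_rank + 1):
        matched = sum(1 for ranked, qid in zip(rankings, query_ids) if qid in ranked[:k])
        curve.append(matched / len(query_ids))
    return curve

# Hypothetical ranked gallery IDs for two queries (most similar first).
rankings = [[1, 2, 1, 3], [3, 2, 2, 1]]
query_ids = [1, 2]

m_ap = mean_average_precision(rankings, query_ids)
cmc = cmc_curve(rankings, query_ids, max_rank=3)
rank1 = cmc[0]   # Rank-1 accuracy is the first point of the CMC curve
```

Rank-1 only asks whether the top match is correct, while mAP rewards placing every image of the queried pedestrian near the top, which is why the two numbers reported for a data set can differ considerably.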

In summary, the embodiment of the present application provides a method for re-identifying occluded pedestrians based on multi-task learning. The method is divided into two branches: one branch builds a random occlusion module that automatically generates partially occluded pictures, yielding an occlusion data set for learning, and the other branch performs local feature learning and pose estimation on the whole-body images. The pedestrian IDs learned from the whole-body images and from the occluded images are combined and ranked by similarity to obtain an identification result. Occluded pictures are used as the query set and whole-body pictures as the gallery, and half of the pedestrian identities are randomly selected for training to obtain a network model. The remaining half of the identities are used as a test set to evaluate the recognition performance of the network model: the test data set is input into the model, and the model is evaluated with the mean average precision (mAP) and the cumulative matching characteristic (CMC) curve. The gallery data set is input into the multi-task model and the pedestrian image features extracted by the model are stored, finally giving a pedestrian image feature database in which each feature has a unique pedestrian ID. A query pedestrian image is input to obtain its features, the stored image features are retrieved, the similarity is computed, and the picture with the highest similarity is selected; its ID is the pedestrian ID of the query image.

It should be noted that although methods and flow diagrams are provided herein, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed to achieve the desired results. More or fewer operation steps may be included, certain steps may be omitted, several steps may be combined into one, and/or one step may be split into several, based on conventional or non-inventive means.

It should be understood that various changes and modifications to the technical solution of the present invention by those skilled in the art without departing from the spirit of the invention shall fall within the scope of the invention defined by the claims.

Claims

1. A method for re-identifying occluded pedestrians based on multi-task learning, characterized by comprising the following steps:

inputting the gallery data set into the model obtained in the fourth step, storing the pedestrian image features extracted by the model, and finally obtaining a pedestrian image feature database in which each feature has a unique pedestrian ID;

inputting a query pedestrian image, obtaining its pedestrian features, retrieving the stored image features, computing the similarity, and selecting the picture with the highest similarity, the ID of which is the pedestrian ID of the query image.

2. The method of claim 1, wherein the step of constructing the random occlusion module by one of the branches comprises the steps of:

a data set X with N images and M pedestrian IDs is converted into a data set Z through a mapping function, each image in the data set Z being a randomly occluded image;

the occluded pictures are then used to train a convolutional neural network to identify the identity of each person, with training on the data set X;

pedestrian re-identification is treated as a classification problem, and the softmax loss is used as the identity loss.

3. The method as claimed in claim 1, wherein local feature learning and pose estimation are performed separately on the whole-body image, and the pedestrian ID result is then obtained by combining the features learned by the two, specifically as follows:

the whole-body image is passed through the backbone network ResNet50 to obtain the tensor T;

the tensor T is divided horizontally into six parts, and average pooling gives 6 column vectors g; a convolution is applied to each vector to reduce its dimensionality, giving column vectors h; finally, all column vectors h are input into a classifier composed of a fully connected layer and a softmax function to obtain the predicted pedestrian ID;

pose estimation is then performed on the whole-body image: an image is input and a pose estimator produces 18 landmarks, predicting the coordinates and a confidence score for each landmark; each confidence score is compared with a threshold γ, the coordinates are kept when the confidence is greater than γ, and the landmark is set to 0 when the confidence is smaller than γ;

the tensor T is average-pooled into the global feature, and each heat map is multiplied by T to obtain a pose-guided feature map, which focuses on the non-occluded parts of the pedestrian and suppresses information from occluded regions;

each pose-guided feature map passes through an average pooling layer to generate a 2048-dimensional feature vector; all the feature vectors are then max-pooled, concatenated with the global feature and input into a fully connected layer to obtain the global pose-guided feature, and the ID of each input image is predicted using softmax.