CN112801008B

CN112801008B - Pedestrian re-recognition method and device, electronic equipment and readable storage medium

Info

Publication number: CN112801008B
Application number: CN202110168058.7A
Authority: CN
Inventors: 黄燕挺; 冯子钜; 叶润源; 毛永雄; 董帅; 邹昆
Original assignee: Zhongshan Xidao Technology Co ltd; University of Electronic Science and Technology of China Zhongshan Institute
Current assignee: Zhongshan Xidao Technology Co ltd; University of Electronic Science and Technology of China Zhongshan Institute
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2024-05-31
Anticipated expiration: 2041-02-05
Also published as: CN112801008A

Abstract

The application provides a pedestrian re-identification method, a pedestrian re-identification device, electronic equipment and a readable storage medium, and relates to the technical field of image processing. According to the method, the first branch network is added into the pedestrian re-recognition model for extracting the pedestrian segmentation attention feature map, so that the model can pay more attention to the features of the area where the pedestrian is located when the pedestrian re-recognition is carried out, the features which are more beneficial and more obvious to the pedestrian re-recognition in the image can be extracted, a better recognition effect is achieved for the pedestrian re-recognition under the condition that the pedestrian is blocked, and the accuracy of the pedestrian re-recognition can be effectively improved.

Description

Pedestrian re-recognition method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a pedestrian re-recognition method, a device, an electronic apparatus, and a readable storage medium.

Background

Pedestrian re-identification is a technology for researching fire heat in the field of computer vision in recent years, and is identified through the characteristics of wearing, posture, hairstyle and the like of pedestrians, and is mainly oriented to identification and retrieval of pedestrians under a cross-camera cross-scene condition. The method has wide application prospect in the fields of video monitoring, intelligent security protection and the like, and the development of pedestrian re-identification technology has important significance for building safe cities.

Pedestrian re-recognition is a very challenging computer vision task whose task goal is to retrieve pedestrians under different cameras, where the difficulties include varying background, illumination, blurring of pictures, differences in the pose of pedestrians, and occlusion of debris, among others.

In recent years, the deep learning method is widely applied to multiple computer vision fields such as image classification, target recognition and the like, and compared with the traditional manual design method, the deep learning can obtain better performance, however, due to the fact that a monitoring video is complex, the acquired image is influenced by multiple factors to possibly cause shielding of pedestrians, such as garbage cans, buildings, other pedestrians and the like, the existing recognition method is difficult to recognize the same pedestrian under different cameras, and the pedestrian re-recognition accuracy is low.

Disclosure of Invention

The embodiment of the application aims to provide a pedestrian re-identification method, a device, electronic equipment and a readable storage medium, which are used for solving the problem of low pedestrian re-identification accuracy in the prior art.

In a first aspect, an embodiment of the present application provides a pedestrian re-recognition method, where the method includes:

Extracting image features of an image to be identified through a backbone network of the pedestrian re-identification model;

extracting a pedestrian segmentation attention characteristic map according to the image characteristics through a first branch network of the pedestrian re-identification model, wherein the pedestrian segmentation attention characteristic map is used for marking the position of a pedestrian in the image to be identified;

extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-recognition model;

fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fused feature map;

and carrying out pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result.

In the implementation process, the first branch network is added into the pedestrian re-recognition model for extracting the pedestrian segmentation attention feature map, so that the model can pay more attention to the features of the area where the pedestrian is located when the pedestrian re-recognition is carried out, the features which are more beneficial and more obvious to the pedestrian re-recognition in the image can be extracted, the pedestrian re-recognition under the condition that the pedestrian is blocked has a better recognition effect, and the accuracy of the pedestrian re-recognition can be effectively improved.

Optionally, the extracting, by the first branch network of the pedestrian re-recognition model, a pedestrian segmentation attention characteristic map according to the image features includes:

detecting pedestrians in the image to be identified according to the image characteristics through the first branch network to obtain a pedestrian detection frame;

Dividing the area selected by the pedestrian detection frame from the image to be identified through the first branch network to obtain a divided image;

Filling the segmented image through the first branch network to obtain a target segmented image with the same size as the image to be identified;

And carrying out pedestrian identification on each pixel point in the target segmentation image through the first branch network to obtain a pedestrian segmentation attention characteristic diagram.

In the implementation process, the pedestrian detection frame is divided, so that the pedestrian area can be divided, and interference caused by the background feature on pedestrian recognition is eliminated.

Optionally, the step of performing pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result includes:

Dividing the fusion feature map through the pedestrian re-recognition model to obtain a plurality of feature blocks;

Uniformly pooling each feature block through the pedestrian re-recognition model to obtain a first local feature corresponding to each feature block;

Performing dimension reduction processing on each first local feature by using a preset convolution check through the pedestrian re-identification model to obtain a corresponding second local feature;

And inputting each second local feature into a corresponding classifier in the pedestrian re-recognition model to obtain a recognition result output by the classifier.

In the implementation process, the fusion feature map is divided into local features for later prediction, so that finer granularity features can be provided for pedestrian re-recognition, and the precision of pedestrian re-recognition is improved.

Optionally, the inputting each second local feature into a corresponding classifier in the pedestrian re-recognition model, to obtain a recognition result output by the classifier, includes:

connecting a plurality of second local features with the global feature map according to channel dimensions through the pedestrian re-recognition model to obtain total features;

And inputting the total features to corresponding classifiers in the pedestrian re-recognition model to obtain recognition results output by the classifiers.

In the implementation process, the obtained total features are input into the classifier, so that effective fusion among all input features is realized, more features can be identified, and the identification effect is improved.

Optionally, the pedestrian segmentation attention feature map is a binary image, a feature point with a feature value of 1 represents a position of a pedestrian in the pedestrian segmentation attention feature map, the pedestrian segmentation attention feature map and the global feature map are fused through the pedestrian re-recognition model to obtain a fused feature map, and the method includes:

And multiplying the pedestrian segmentation attention feature map with the global feature map through the pedestrian re-recognition model to obtain a fusion feature map, so that effective positioning of features belonging to pedestrians in the global feature map can be realized.

Optionally, the pedestrian re-recognition model is trained by:

inputting a training image into the pedestrian re-recognition model to obtain a prediction result output by the pedestrian re-recognition model, wherein the training image comprises a pedestrian shielding image;

calculating a total loss value by using a loss function according to the prediction result;

And updating network parameters in the pedestrian re-identification model according to the total loss value.

In the implementation process, the pedestrian shielding image is added in the training process to serve as a training image, so that the accuracy of re-identifying pedestrians in the pedestrian shielding state of the model can be effectively improved.

Optionally, the tag information of the training image includes a tag of whether each pixel belongs to a pedestrian, and the method further includes:

pedestrian detection is carried out on the training image through an example segmentation algorithm, and a segmentation mask image is obtained and is used for marking pixel points belonging to pedestrians in the training image;

and marking the training image by taking the segmentation mask image as a label.

In the implementation process, the example segmentation algorithm is adopted to obtain the segmentation mask image to serve as the label of the training image, so that the problem that a great deal of time is consumed for manual labeling can be avoided.

Optionally, the obtaining the segmentation mask image includes:

Pedestrian detection is carried out on the training image through an example segmentation algorithm, and at least one pedestrian detection frame is obtained;

If the at least one pedestrian detection frame comprises at least two detection frames, determining the position relation between the detection frame center of each detection frame and the horizontal center of the training image;

determining a target detection frame according to the position relation, wherein the area selected by the target detection frame is the area where the pedestrian to be identified is located;

and obtaining a corresponding segmentation mask image according to the target detection frame.

In the implementation process, the detection frame close to the central position in the image can be selected as the area where the pedestrian is located in the mode, so that the model can pay more attention to the pedestrian at the middle position in the training process, and accurate segmentation of the pedestrian can be realized under the condition that the pedestrian is shielded.

Optionally, the method further comprises:

and preprocessing the pedestrian images in the training images by adopting a random erasure data enhancement algorithm to obtain pedestrian shielding images, so that the number of the pedestrian shielding images can be increased.

In a second aspect, an embodiment of the present application provides a pedestrian re-recognition apparatus, including:

the main feature extraction module is used for extracting image features of the image to be identified through a main network of the pedestrian re-identification model;

The first branch feature extraction module is used for extracting a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-recognition model, and the pedestrian segmentation attention feature map is used for marking the position of a pedestrian in the image to be recognized;

the second branch characteristic extraction module is used for extracting a global characteristic diagram of the pedestrian according to the image characteristics through a second branch network of the pedestrian re-identification model;

The feature fusion module is used for fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fused feature map;

And the pedestrian re-recognition module is used for carrying out pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result.

Optionally, the first branch feature extraction module is configured to:

Optionally, the pedestrian re-identification module is configured to:

Optionally, the pedestrian re-recognition module is configured to connect, through the pedestrian re-recognition model, the plurality of second local features with the global feature map according to a channel dimension, so as to obtain a total feature; and inputting the total features to corresponding classifiers in the pedestrian re-recognition model to obtain recognition results output by the classifiers.

Optionally, the pedestrian segmentation attention feature map is a binary image, a feature point with a feature value of 1 represents a position of a pedestrian in the pedestrian segmentation attention feature map, and the feature fusion module is configured to multiply the pedestrian segmentation attention feature map with the global feature map through the pedestrian re-recognition model to obtain a fusion feature map.

Optionally, the apparatus further comprises:

The model training module is used for inputting a training image into the pedestrian re-recognition model to obtain a prediction result output by the pedestrian re-recognition model, and the training image comprises a pedestrian shielding image; calculating a total loss value by using a loss function according to the prediction result; and updating network parameters in the pedestrian re-identification model according to the total loss value.

Optionally, the label information of the training image includes a label of whether each pixel belongs to a pedestrian, and the model training module is configured to detect the pedestrian on the training image by using an example segmentation algorithm to obtain a segmentation mask image, where the segmentation mask image is used to mark the pixel belonging to the pedestrian in the training image; and marking the training image by taking the segmentation mask image as a label.

Optionally, the model training module is configured to:

Optionally, the model training module is configured to perform preprocessing on the pedestrian image in the training image by adopting a random erasure data enhancement algorithm, so as to obtain a pedestrian shielding image.

In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing computer readable instructions which, when executed by the processor, perform the steps of the method as provided in the first aspect above.

In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method as provided in the first aspect above.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic structural diagram of an electronic device for performing a pedestrian re-recognition method according to an embodiment of the present application;

FIG. 2 is a flowchart of a pedestrian re-recognition method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a pedestrian re-recognition model according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an image obtained by using a random erasure data enhancement algorithm according to an embodiment of the present application;

Fig. 5 is a block diagram of a pedestrian re-recognition device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

The embodiment of the application provides a pedestrian re-recognition method, which is characterized in that a first branch network is added into a pedestrian re-recognition model for extracting a pedestrian segmentation attention feature map, so that the model can pay more attention to the features of the area where a pedestrian is located when the pedestrian re-recognition is carried out, thereby extracting the features which are more beneficial and more obvious to the pedestrian re-recognition in an image, having a better recognition effect to the pedestrian re-recognition under the condition that the pedestrian is blocked, and effectively improving the accuracy of the pedestrian re-recognition.

Referring to fig. 1, fig. 1 is a schematic structural diagram of an electronic device for performing a pedestrian re-recognition method according to an embodiment of the present application, where the electronic device may include: at least one processor 110, such as a CPU, at least one communication interface 120, at least one memory 130, and at least one communication bus 140. Wherein the communication bus 140 is used to enable direct connection communication of these components. The communication interface 120 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The memory 130 may be a high-speed RAM memory or a nonvolatile memory (non-volatile memory), such as at least one disk memory. Memory 130 may also optionally be at least one storage device located remotely from the aforementioned processor. The memory 130 has stored therein computer readable instructions which, when executed by the processor 110, perform the method process shown in fig. 2 described below. For example, the memory 130 is used to store images, as well as various feature maps extracted, and the processor 110 may be used to perform feature extraction and pedestrian re-recognition based on the features.

It will be appreciated that the configuration shown in fig. 1 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.

Referring to fig. 2, fig. 2 is a flowchart of a pedestrian re-recognition method according to an embodiment of the present application, where the method includes the following steps:

step S110: and extracting image features of the image to be identified through a backbone network of the pedestrian re-identification model.

In some embodiments, the pedestrian re-recognition model may be a neural network model, such as a convolutional neural network, a cyclic convolutional neural network, resNet network, or variations of these neural networks, or the like.

The structural schematic diagram of the pedestrian re-recognition model in the embodiment of the application is shown in fig. 3, and the structural schematic diagram comprises a main network, a first branch network and a second branch network, wherein the main network is respectively connected with the first branch network and the second branch network, the main network is used for extracting basic image characteristics of an image, the first branch network is used for dividing a pedestrian image, extracting a pedestrian division attention characteristic diagram, and the second branch network is used for extracting a global image of a pedestrian.

The backbone network is used for extracting image features of the image to be identified, and the structure of the backbone network can be a convolution layer in a convolution neural network, for example, the backbone network can be other network structures, for example, VGG, and the convolution layer can be replaced by other deformable convolutions.

The image to be identified can be a series of pedestrian images acquired by a camera, or a video image, etc. After the identified image is input into the pedestrian re-identification model, the identified image is processed through a convolution layer in a backbone network, and image characteristics are output.

For ease of computation, the image features may be feature descriptions or feature vectors, or other forms of feature maps.

Step S120: and extracting a pedestrian segmentation attention characteristic diagram according to the image characteristics through a first branch network of the pedestrian re-identification model.

In order to facilitate the pedestrian re-recognition model to pay more attention to the pedestrian region in the image to be recognized so that the model can extract the characteristics favorable for pedestrian re-recognition, a first branch network for recognizing the region where the pedestrian is in the image to be recognized can be added in the model so as to divide the pedestrian region from the original image, and therefore the influence of other background characteristics on the recognition result of pedestrian re-recognition can be avoided.

Wherein the pedestrian segmentation attention characteristic map is used for marking the positions of pedestrians in the pedestrian segmentation attention map. In some embodiments, feature values of individual feature points in the pedestrian segmentation attention feature map are used to characterize the probability that the feature point belongs to a pedestrian.

The feature point with the feature value larger than the preset threshold value can be set as the pixel point belonging to the pedestrian, the preset threshold value can be set according to actual requirements, for example, 0.5 or 0.8, for example, the feature value of a certain feature point in the pedestrian segmentation attention feature map is 0.9, and the position where the feature point is the pedestrian is determined. In this way, the location of the pedestrian in the pedestrian segmentation attention characteristic map can be determined.

In some other embodiments, the pedestrian segmentation attention feature map may also be a binary image, where the feature value of the feature point is 0 or 1, the feature point with the feature value of 1 corresponds to the pixel point in the image to be identified, that is, the position where the pedestrian is located, and the feature point with the feature value of 0 corresponds to the pixel point in the image to be identified, that is, the position where the pedestrian is not located, and may be a background feature. In this way, the pedestrian segmentation attention profile can be understood as a mask image.

In the embodiment of the application, the pedestrian segmentation attention characteristic map can be used as an attention fusion to the global characteristic map of the pedestrian, so that the extraction of the characteristics of the area where the pedestrian is located by the model is increased, and the accuracy rate of the pedestrian re-identification in the shielding state is increased.

Step S130: and extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model.

The second branch network is used for extracting global features, and can comprise a downsampling layer, a convolution layer, a normalization layer, a dimension reduction layer and the like, for example, the image features are input into the second branch network, the image features can be downsampled through global average pooling to obtain feature vectors, then the feature vectors are dimension reduced through the convolution layer, the normalization layer and the dimension reduction layer, and the obtained dimension reduction features can be global features. The expression form of the global feature is a global feature map.

Step S140: and fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fused feature map.

Attention mechanisms mean that after attention is applied in a global feature map, features that need to be focused on can be obtained, and then more attention resources are devoted to these features. Therefore, after the pedestrian segmentation attention feature map and the global feature map are fused, a feature region needing to be focused on can be applied to the global feature map, so that the pedestrian re-recognition model can recognize the features of the feature region, the obvious features in the global features can be enhanced, meaningless features are restrained, the obvious features of pedestrian re-recognition are obtained, and the recognition accuracy is effectively improved.

In some embodiments, if the pedestrian segmentation attention feature map is a binary image, the fusion method may be to multiply the pedestrian segmentation attention feature map with the global feature map to obtain a fusion feature map, so that global features of the pedestrian position are retained in the obtained fusion feature map, and other nonsensical features other than the pedestrian position are removed.

Of course, in other embodiments, whether the pedestrian segmentation attention profile is a binary image or not, the pedestrian segmentation attention profile may be weighted and summed with each corresponding feature in the global profile to obtain a fused profile.

Step S150: and carrying out pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result.

After the fusion feature map is obtained, pedestrian re-recognition can be performed based on the fusion feature map. When the identification is carried out, the classification in the pedestrian re-identification model can be adopted to carry out pedestrian ID classification on the fusion feature map, so that an identification result is obtained.

In some embodiments, if there are multiple pedestrians in the image to be identified and there is a situation that the pedestrians are blocked, in this case, in the process of obtaining the pedestrian segmentation attention pattern, the pedestrians in the image to be identified may be detected through the first branch network according to the image features to obtain a pedestrian detection frame, then the region selected by the pedestrian detection frame is segmented from the image to be identified through the first branch network to obtain a segmented image, then the segmented image is filled through the first branch network to obtain a target segmented image with the same size as the image to be identified, and finally, the pedestrians are identified for each pixel point in the target segmented image to obtain the pedestrian segmentation attention pattern.

The area selected by the pedestrian detection frame is an area where pedestrians are located, and if a plurality of pedestrians exist in the image to be identified, a plurality of pedestrian detection frames may be obtained, and each detection frame may be divided. Since the segmented image may be smaller than the image to be identified, the segmented image may also be enlarged to be the same size as the image to be identified in order to facilitate subsequent fusion with the global feature map.

After the segmented image is obtained, pedestrian recognition can be performed on each pixel point in the segmented image, so that a pedestrian segmentation attention characteristic map is obtained.

If there are a plurality of detection frames, and if there are overlapping areas in two of the plurality of detection frames, when dividing, in order to avoid dividing other pedestrians, if the detection frame 1 corresponds to the pedestrian 1, and the detection frame 2 corresponds to the pedestrian 2, when the pedestrian 1 is covered by the pedestrian 2, there is a certain overlapping area between the detection frame 1 and the detection frame 2, so that the pedestrian 2 may be divided into two when dividing the pedestrian 1, in order to avoid this, the overlapping area between the detection frame 1 and the detection frame 2 may be filled with a background color, that is, the overlapping area between the detection frame 1 and the detection frame 2 is not the pedestrian area but is regarded as the background area, so that the overlapping area between the detection frame 1 is not recognized as the pedestrian area when recognizing the pedestrian, and the pixel point belonging to the pedestrian 2 is not recognized as the pixel point of the pedestrian 1 when recognizing the pedestrian in the detection frame 1.

In the implementation process, the pedestrian detection frame is divided, so that the pedestrian area can be divided, and interference caused by background features on pedestrian recognition is eliminated.

In some embodiments, in order to improve accuracy of pedestrian re-recognition, during the re-recognition, the fusion feature map may be segmented through a pedestrian re-recognition model to obtain a plurality of feature blocks, then each feature block is uniformly pooled through the pedestrian re-recognition model to obtain a first local feature corresponding to each feature block, each first local feature is subjected to dimension reduction processing through a preset convolution check through the pedestrian re-recognition model to obtain a corresponding second local feature, and then each second local feature is input to a corresponding classifier in the pedestrian re-recognition model to obtain a recognition result output by the classifier.

When the fusion feature map is divided, the fusion feature map can be uniformly divided into a plurality of feature blocks in the horizontal direction, for example, 6 feature blocks. And uniformly pooling each feature block to obtain a plurality of first local features, wherein the expression forms of the local features are feature vectors. In the dimension reduction processing, the preset convolution kernel may be a 1*1 convolution kernel, and of course, convolution kernels with other sizes may be set according to actual requirements, and the dimension reduction processing is performed on the first local feature to obtain a corresponding second local feature, where the expression form of the second local feature may also be a feature vector. When the identification is performed, the second local features can be input into a classifier formed by the full connection layer and the Softmax function, each second local feature is input into a corresponding classifier, namely, the second local features do not share the classifier, and the identification result output by each classifier is obtained.

In some embodiments, when the second local features are input into the classifier to perform classification and identification, in order to improve accuracy of classification and identification, a plurality of second local features and the global feature map may be connected according to a channel dimension to obtain a total feature, and then the total feature is input into the classifier to obtain an identification result output by the classifier.

And connecting a plurality of second local features with the global feature map according to the channel dimension, and inputting the obtained total features into the classifier, so that effective fusion among all input features is realized, more features can be identified, and the identification effect is improved.

In some embodiments, to improve the recognition accuracy of the pedestrian re-recognition model, the pedestrian re-recognition model may be trained in the following manner:

inputting the training image into the pedestrian re-recognition model to obtain a prediction result output by the pedestrian re-recognition model, wherein the training image comprises a pedestrian shielding image, calculating a total loss value according to the prediction result by using a loss function, and updating network parameters in the pedestrian re-recognition model according to the total loss value.

The training images refer to images in a plurality of data sets, and the plurality of data sets comprise mark-1501, dukeMTMC-reID, OChuman, custom data sets and the like. Both the training data sets Market-1501 and DukeMTMC-reID are very small, the Market-1501 training set has only 12936 pictures of 751 individuals and the DukeMTMC-reID training set has only 16522 pictures of 702 individuals; therefore, in order to form a data set with a larger standard, pseudo tag data of some field video can be added, so that the accuracy of the trained pedestrian re-identification model is higher. In addition, the sharpness of the pictures of mark-1501 and DukeMTMC-reID is actually lower, which may be the reason for the shooting by the camera far away, and the relatively large person, the data set or picture with relatively higher sharpness may be added.

Wherein the mark-1501 dataset has 1501 identities collected by 6 cameras, there are 32668 pedestrian images in total, the dataset is divided into a training set and a test set, the training set contains 12936 images of 751 identities; the testing set comprises 3368 query images and 1593 gallery images, and 750 identities; dukeMTMC-reID dataset contained 1404 identities collected by more than 2 cameras, total 36411 images, training set contained 16522 images of 702 identities, and test set contained another 702 identities.

In addition, the pedestrian video of the existing actual scene can be inferred by using a pre-trained pedestrian detection model resnet on the coco data set, a plurality of human body pictures are cut out, then pictures with larger similarity are screened out by using a universal pedestrian re-identification model, part of the pictures are cut out from continuous video images, the human body pictures have smaller differences, and even two adjacent frames of images are not very different, in this case, the images with poor model identification can be screened out by manpower, and finally the reserved images are used as the self-defined data set.

It will be appreciated that the process of data in the pedestrian re-recognition model in the training process is similar to that in the above embodiment, and the description is not repeated here for brevity. If the pedestrian re-recognition is performed by inputting the global feature map and the local features into the classifier together for recognition, the total loss obtained may be the sum of the loss obtained by prediction based on the global feature map and the total loss obtained by prediction based on each local feature, and the calculation formula is as follows:

Where id_loss represents the total Loss, loss _global represents the Loss obtained by prediction based on the global feature, n represents the number of the second local features described above, λ represents the weight at the time of weighting, and Loss _i represents the Loss corresponding to the ith second local feature.

The weight can be obtained by carrying out dot multiplication according to the mask and the local feature corresponding to each second local feature and then summing the calculated average values, and the calculation formula is as follows:

λ＝Avg(∑P_mask·P_i)；

Where λ represents the weight, P _mask represents the mask corresponding to the second local feature, and P _i represents the ith second local feature.

After the total loss is calculated, the total loss is transmitted back to the model, and the network parameters in the pedestrian re-identification model are updated in the direction of reducing the total loss, wherein the network parameters comprise parameters of a main network, a first branch network, a second branch network and the like.

The loss function for calculating the total loss can adopt a triplet loss function, and the triplet loss function can effectively pull the intra-class distance closer and the inter-class distance farther. Of course, other loss functions may be used for calculation, such as a quad loss function, etc. The loss corresponding to the global feature can be calculated by adopting a cross entropy loss function or a triplet loss function, and the loss corresponding to the local feature can also be calculated by adopting a cross entropy loss function or a triplet loss function.

After the model reaches convergence or reaches the iteration number, the model training is completed.

In some embodiments, in order to achieve the segmentation of the pedestrian, the label information of the training image includes whether each pixel belongs to a label of the pedestrian, and since the label information required for the segmentation of the pedestrian is at a pixel level, if the label information is marked manually, it may take a lot of time, in order to avoid this, the pedestrian may be detected by the example segmentation algorithm on the training image, so as to obtain a segmentation mask image, and the segmentation mask image is used for marking the pixels belonging to the pedestrian in the training image, and then the segmentation mask image is used as a label to mark the training image.

The specific manner of obtaining the segmentation mask image by using the example segmentation algorithm may refer to the related implementation process in the prior art, and will not be described herein in detail.

In some embodiments, in order to ensure accuracy of segmentation, pedestrian detection is performed on a training image through an example segmentation algorithm to obtain at least one pedestrian detection frame, if at least one pedestrian detection frame comprises at least two pedestrian detection frames, determining a position relationship between a detection frame center of each detection frame and a horizontal center of the training image, then determining a target detection frame according to the position relationship, wherein a region selected by the target detection frame is a region in which a pedestrian to be identified is located, and obtaining a corresponding segmentation mask image according to the target detection frame.

It will be appreciated that in a pedestrian occlusion image, the majority of the image is two persons, so selecting a detection box is not possible to select only a detection box that is large, but rather should select a detection box that is closest to the horizontal center of the image, and if the horizontal centers of both detection boxes are close together, then the vertical center is selected to be above, thus determining the final selected target detection box. And then, each pixel point in the target detection frame can be identified to determine the pixel point belonging to the pedestrian, so that a segmentation mask image is obtained, the pixel point belonging to each pedestrian can be accurately identified, and the accuracy of the label is improved.

In some embodiments, in order to expand the pedestrian occlusion image, a random search data enhancement algorithm may be used to pre-process the pedestrian image in the training image to obtain the pedestrian occlusion image.

The random erasure data enhancement algorithm is to randomly select an area in the training image, and add random normal distribution noise (the schematic diagram of which is shown in fig. 4), so that the situation of model overfitting can be reduced, and the performance of the model can be improved. In addition, the specific implementation process of the random erasure data enhancing algorithm may refer to the related implementation process in the prior art, and will not be described herein for brevity.

Referring to fig. 5, fig. 5 is a block diagram illustrating a pedestrian re-recognition apparatus 200 according to an embodiment of the application, where the apparatus 200 may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus 200 corresponds to the above embodiment of the method of fig. 2, and is capable of executing the steps involved in the embodiment of the method of fig. 2, and specific functions of the apparatus 200 may be referred to in the above description, and detailed descriptions thereof are omitted herein as appropriate to avoid redundancy.

Optionally, the apparatus 200 includes:

The trunk feature extraction module 210 is configured to extract image features of an image to be identified through a trunk network of the pedestrian re-identification model;

A first branch feature extraction module 220, configured to extract a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-recognition model, where the pedestrian segmentation attention feature map is used to mark a position of a pedestrian in the image to be recognized;

a second branch feature extraction module 230, configured to extract a global feature map of the pedestrian according to the image feature through a second branch network of the pedestrian re-recognition model;

The feature fusion module 240 is configured to fuse the pedestrian segmentation attention feature map with the global feature map through the pedestrian re-recognition model, so as to obtain a fused feature map;

And the pedestrian re-recognition module 250 is used for performing pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result.

Optionally, the first branch feature extraction module 220 is configured to:

Optionally, the pedestrian re-recognition module 250 is configured to:

Optionally, the pedestrian re-recognition module 250 is configured to connect, through the pedestrian re-recognition model, the plurality of second local features with the global feature map according to a channel dimension, to obtain a total feature; and inputting the total features to corresponding classifiers in the pedestrian re-recognition model to obtain recognition results output by the classifiers.

Optionally, the pedestrian segmentation attention feature map is a binary image, a feature point with a feature value of 1 represents a position of a pedestrian in the pedestrian segmentation attention feature map, and the feature fusion module 240 is configured to multiply the pedestrian segmentation attention feature map with the global feature map through the pedestrian re-recognition model to obtain a fusion feature map.

Optionally, the apparatus 200 further includes:

Optionally, the model training module is configured to:

It should be noted that, for convenience and brevity, a person skilled in the art will clearly understand that, for the specific working procedure of the apparatus described above, reference may be made to the corresponding procedure in the foregoing method embodiment, and the description will not be repeated here.

An embodiment of the application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method procedure performed by an electronic device in the method embodiment shown in fig. 1.

The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the methods provided by the above-described method embodiments, for example, comprising: extracting image features of an image to be identified through a backbone network of the pedestrian re-identification model; extracting a pedestrian segmentation attention characteristic map according to the image characteristics through a first branch network of the pedestrian re-identification model, wherein the pedestrian segmentation attention characteristic map is used for marking the position of a pedestrian in the image to be identified; extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-recognition model; fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fused feature map; and carrying out pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result.

In summary, the embodiment of the application provides a pedestrian re-recognition method, a device, an electronic device and a readable storage medium, which are used for extracting a pedestrian segmentation attention feature map by adding a first branch network into a pedestrian re-recognition model, so that the model can pay more attention to the features of the area where the pedestrian is located when the pedestrian re-recognition is performed, thereby extracting the features which are more beneficial and more obvious to the pedestrian re-recognition in the image, having a better recognition effect to the pedestrian re-recognition under the condition that the pedestrian is blocked, and effectively improving the accuracy of the pedestrian re-recognition.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.

Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of pedestrian re-identification, the method comprising:

Performing pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result;

The pedestrian re-recognition is performed based on the fusion feature map through the pedestrian re-recognition model, and a recognition result is obtained, including:

Inputting the total features into corresponding classifiers in the pedestrian re-recognition model to obtain recognition results output by the classifiers;

training the pedestrian re-recognition model by:

updating network parameters in the pedestrian re-recognition model according to the total loss value;

The method further comprises the steps of:

Preprocessing a pedestrian image in the training image by adopting a random erasure data enhancement algorithm to obtain a pedestrian shielding image;

The training image refers to images in a plurality of data sets, and the plurality of data sets comprise mark-1501, dukeMTMC-reID, OChuman and custom data sets; the custom data is obtained by cutting out human body pictures from pedestrian video, and then screening out pictures with larger similarity in the human body pictures through a general pedestrian re-identification model and a manual mode;

The prediction result comprises a prediction result of the global feature map and a prediction result of the second local feature; the calculating the total loss value by using a loss function according to the prediction result comprises the following steps:

And calculating a total loss value based on the loss obtained by predicting the global feature map and the loss obtained by predicting each second local feature, wherein the calculation formula is as follows:

Wherein id_loss represents a total Loss value, loss _global represents a Loss obtained by prediction based on the global feature map, n represents the number of the second local features, λ represents a weight at the time of weighting, and Loss _i represents a Loss corresponding to the ith second local feature;

And the weight lambda is obtained by carrying out dot multiplication on the mask corresponding to each second local feature and the corresponding second local feature and then summing the calculated mean value, and the calculation formula is as follows:

λ＝Avg(ΣP_mask·P_i)

Wherein P _mask represents a mask corresponding to the second local feature, and P _i represents an ith second local feature.

2. The method of claim 1, wherein the extracting a pedestrian segmentation attention profile from the image features by the first branch network of the pedestrian re-recognition model comprises:

3. The method according to claim 1, wherein the pedestrian segmentation attention feature map is a binary image, and a feature point with a feature value of 1 indicates a position of a pedestrian in the pedestrian segmentation attention feature map, and the fusing the pedestrian segmentation attention feature map with the global feature map through the pedestrian re-recognition model to obtain a fused feature map includes:

And multiplying the pedestrian segmentation attention feature map with the global feature map through the pedestrian re-recognition model to obtain a fusion feature map.

4. The method of claim 1, wherein the tag information of the training image includes a tag of whether each pixel belongs to a pedestrian, the method further comprising:

5. The method of claim 4, wherein the obtaining a segmentation mask image comprises:

6. A pedestrian re-identification device, the device comprising:

The pedestrian re-recognition module is used for carrying out pedestrian re-recognition based on the fusion feature map through the pedestrian re-recognition model to obtain a recognition result;

The pedestrian re-identification module is specifically configured to:

the apparatus further comprises a model training module for:

The training image refers to images in a plurality of data sets, and the plurality of data sets comprise mark-1501, dukeMTMC-reID, OChuman and custom data sets; the custom data is obtained by cutting out human body pictures from pedestrian video, and then screening out pictures with larger similarity in the human body pictures through a universal pedestrian re-identification model;

λ＝Avg(ΣP_mask·P_i)

7. An electronic device comprising a processor and a memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-5.

8. A readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of claims 1-5.