CN110110689B - Pedestrian re-identification method - Google Patents

Pedestrian re-identification method

Info

Publication number
CN110110689B
CN110110689B CN201910403777.5A
Authority
CN
China
Prior art keywords
pedestrian
feature map
channel
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910403777.5A
Other languages
Chinese (zh)
Other versions
CN110110689A (en)
Inventor
张云洲
刘双伟
齐林
朱尚栋
徐文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910403777.5A priority Critical patent/CN110110689B/en
Publication of CN110110689A publication Critical patent/CN110110689A/en
Application granted granted Critical
Publication of CN110110689B publication Critical patent/CN110110689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure relate to a pedestrian re-identification method, which comprises the following steps: extracting a pedestrian CNN feature map from a plurality of pictures; performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model; and performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain a pedestrian re-identification result. The method provides a feature-level data enhancement strategy: the input feature map of the auxiliary classifier is partially erased, which increases the variation of pedestrian features, counters the situation in which pedestrians are occluded, and improves the generalization capability of the deep pedestrian re-identification model.

Description

Pedestrian re-identification method
Technical Field
The disclosure relates to the technical field of computer vision, and in particular to a pedestrian re-identification method.
Background
Pedestrian re-identification matches and identifies the identity of a pedestrian across a non-overlapping multi-camera surveillance system, and plays an important role in intelligent video surveillance, crime prevention, the maintenance of public security, and the like. However, when human-body attributes such as posture, gait and clothing, and environmental factors such as illumination and background change, the appearance of the same pedestrian differs markedly across surveillance videos, while the appearances of different pedestrians may be similar under certain conditions.
In recent years, deep learning methods have been widely adopted, and deep learning can achieve better performance than conventional hand-crafted methods. However, deep pedestrian re-identification models typically have a large number of network parameters yet are optimized on limited data sets, which increases the risk of overfitting and reduces generalization capability. Improving the generalization ability of the model is therefore a significant and important issue for deep pedestrian re-identification.
To improve the generalization ability of a deep convolutional neural network, the variation of the training data set can be increased and a large number of pedestrian images containing occlusion can be collected; however, this only realizes data enhancement at the image level and cannot provide data enhancement beyond the image level to improve the generalization ability of the deep convolutional neural network.
The above drawbacks urgently need to be overcome by those skilled in the art.
Disclosure of Invention
(I) Technical problem to be solved
In order to solve the above-mentioned problems of the prior art, the present disclosure provides a pedestrian re-identification method that performs data enhancement at the feature level to improve the generalization ability of a deep convolutional neural network.
(II) Technical solution
In order to achieve the above purpose, the main technical solution adopted by the present disclosure includes:
An embodiment of the present disclosure provides a pedestrian re-identification method, including:
extracting a pedestrian CNN feature map from a plurality of pictures;
performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model;
and performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain a pedestrian re-identification result.
In one embodiment of the present disclosure, extracting a pedestrian CNN feature map from a plurality of pictures includes:
randomly selecting the plurality of pictures from a training data set;
inputting the pictures into a plurality of different semantic layers of a ResNet50 model for extraction to obtain feature maps of a plurality of channels;
processing the feature maps of the plurality of channels with a channel attention module to obtain a channel-processed feature map;
and processing spatial context information of the channel-processed feature map at different positions with a spatial attention module to obtain the pedestrian CNN feature map.
In one embodiment of the disclosure, processing the feature maps of the plurality of channels with the channel attention module to obtain the channel-processed feature map includes:
obtaining a channel feature descriptor from the feature map of each channel in the feature maps of the plurality of channels;
obtaining a channel attention feature map by an activation function operation on the channel feature descriptor;
and multiplying the channel attention feature map by the original input feature map to obtain the channel-processed feature map.
In one embodiment of the disclosure, the feature descriptor comprises the statistics of the plurality of channels, the feature descriptor being:

$$s = [s_1, s_2, \ldots, s_N] \in \mathbb{R}^{N}$$

the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{a=1}^{A} \sum_{b=1}^{B} S_n(a, b)$$

wherein N is the number of channels, n is the channel index, and A and B are the length and width of the feature map, respectively;
the channel attention feature map being:

$$e = \sigma(W_2\,\delta(W_1 s))$$

wherein σ and δ denote the Sigmoid and ReLU activation functions, respectively, $W_1 \in \mathbb{R}^{\frac{N}{r} \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times \frac{N}{r}}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.
In one embodiment of the disclosure, processing the spatial context information of the channel-processed feature map at different positions with the spatial attention module to obtain the pedestrian CNN feature map includes:
performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U;
performing a matrix multiplication of the transpose of the first spatial information feature map T with the second spatial information feature map U to obtain a spatial attention feature map;
performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V;
performing a matrix multiplication of the third spatial information feature map V with the transpose of the spatial attention feature map to obtain a spatially processed feature map;
and obtaining the pedestrian CNN feature map from the channel processing and the spatial processing.
In one embodiment of the present disclosure, performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain the training model, includes:
inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, the main classifier and the auxiliary classifier each outputting a pedestrian-category-specific feature map;
performing partial erasure at the auxiliary classifier to obtain an erased feature map;
computing a loss value through a loss function from the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier;
and updating the parameters of the training model according to the loss value.
In one embodiment of the disclosure, the main classifier and the auxiliary classifier comprise the same number of convolution layers and global average pooling layers, the number of channels of the convolution layers is the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map represents the body response heat map of the pedestrian image for one category.
In one embodiment of the disclosure, performing the partial erasure at the auxiliary classifier includes:
determining the region of the body response heat map whose value is higher than a set adversarial erasure threshold as the discriminative region;
and erasing the part of the auxiliary classifier's pedestrian-category-specific feature map corresponding to the discriminative region, in an adversarial manner in which the response values are replaced with 0.
In one embodiment of the present disclosure, performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain the pedestrian re-identification result, includes:
inputting the target pedestrian image and the pedestrian image to be identified into the training model to obtain their corresponding depth features, respectively;
computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified;
and determining the similarity between the target pedestrian image and the pedestrian image to be identified according to the cosine distance, the pedestrian image to be identified with the greatest similarity being the pedestrian re-identification result.
In one embodiment of the present disclosure, the calculation formula for computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified is:

$$d(\mathrm{feature}_1, \mathrm{feature}_2) = 1 - \frac{\mathrm{feature}_1 \cdot \mathrm{feature}_2}{\|\mathrm{feature}_1\|\,\|\mathrm{feature}_2\|}$$

wherein feature₁ is the depth feature of the target pedestrian image and feature₂ is the depth feature of the pedestrian image to be identified.
(III) Beneficial effects
The beneficial effects of the present disclosure are as follows: in the pedestrian re-identification method provided by the embodiments of the disclosure, a feature-level data enhancement strategy partially erases the input feature map of the auxiliary classifier, thereby increasing the variation of pedestrian features, countering the situation in which pedestrians are occluded, and improving the generalization capability of the deep pedestrian re-identification model.
Drawings
FIG. 1 is a flow chart of a pedestrian re-identification method provided in one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a network architecture for implementing the method of FIG. 1 in one embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating step S110 in FIG. 1 according to one embodiment of the present disclosure;
FIG. 4 is a flowchart of step S303 in FIG. 3 according to one embodiment of the present disclosure;
FIG. 5 is a schematic illustration of channel attention in one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of spatial attention in one embodiment of the present disclosure;
FIG. 7 is a flowchart of step S304 in FIG. 3 according to one embodiment of the present disclosure;
FIG. 8 is a schematic diagram of adversarial erasure learning in an embodiment of the present disclosure;
FIG. 9 is a flowchart of step S120 in FIG. 1 according to one embodiment of the present disclosure;
fig. 10 is a flowchart of step S130 in fig. 1 according to an embodiment of the disclosure.
Detailed Description
For a better explanation of the present disclosure, for ease of understanding, the present disclosure is described in detail below by way of specific embodiments in conjunction with the accompanying drawings.
All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In other embodiments of the present disclosure, increasing the variation of the training data set is an effective way to improve the generalization ability of a deep convolutional neural network. However, unlike the visual task of object recognition, pedestrian re-identification requires image data collected across cameras, and pedestrian labeling is very difficult; building a sufficiently large data set for pedestrian re-identification is therefore costly, and existing data sets contain only a small amount of labeled pedestrians. To address this problem, data enhancement can use only the current data set to augment the variation of the training samples without additional cost. Recent data enhancement studies have used generative adversarial networks (GANs) to generate pedestrian images with different human poses and camera styles, but this approach suffers from long training time, difficult convergence, low quality of the generated images, and the like. Besides explicitly generating new images, common approaches also enhance the training images by jittering pixel values, random cropping, flipping the original image, and so on.
In addition, occlusion is also an important factor affecting the generalization ability of convolutional neural networks. Collecting a large number of pedestrian images containing occlusion is one way to effectively address the occlusion problem, but it requires a high cost. Another reasonable approach is to simulate the situation in which pedestrians are occluded. For example, a rectangular box of random size and random position can occlude the training image, with the pixel values of the rectangular area replaced by random values, so as to increase the variation of the data set. However, such an occlusion area is randomly selected. Alternatively, a pedestrian re-identification classification model is first trained, the discriminative region of the image is then found with the aid of network visualization and multiple classifiers, the discriminative region is occluded on the original image to generate a new sample, and the new sample is finally added to the original data set to retrain the pedestrian re-identification model.
Both of the above methods add sample variation by occluding the original pedestrian image, and therefore belong to image-level data enhancement methods.
Fig. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present disclosure; as shown in fig. 1, the method includes the following steps:
as shown in fig. 1, in step S110, a pedestrian CNN feature map is extracted from a plurality of pictures;
as shown in fig. 1, in step S120, model training is performed by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model;
as shown in fig. 1, in step S130, the training model is used to combine the target pedestrian image and the pedestrian image to be identified to perform pedestrian re-identification, so as to obtain a pedestrian re-identification result.
The specific implementation of the steps of the embodiment shown in fig. 1 is described in detail below:
With reference to the flowchart shown in fig. 1, fig. 2 is a schematic diagram of a network structure for implementing the method of fig. 1 in an embodiment of the present disclosure. As shown in fig. 2, each branch applies the complementary attention of channel attention and spatial attention, and adversarial erasure learning and Softmax loss computation are then performed on the resulting feature map. In addition, as shown in fig. 2, the three branches are divided into two types: mid-level semantic branches and a high-level semantic branch.
In step S110, a pedestrian CNN feature map is extracted from a plurality of pictures.
Fig. 3 is a flowchart of step S110 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 3, in step S301, the plurality of pictures are randomly selected from the training dataset.
As shown in fig. 3, in step S302, the plurality of pictures are input into a plurality of different semantic layers of the ResNet50 model for extraction, so as to obtain feature maps of a plurality of channels.
In one embodiment of the present disclosure, an image batch is first input: a batch of pictures, i.e., the plurality of pictures, is randomly selected from the training data set. Next, the pictures are resized to 384×128 and fed into different semantic layers of the backbone network ResNet50 (res_conv5a, res_conv5b and res_conv5c, as shown in fig. 2, where res_conv5a and res_conv5b correspond to the mid-level semantic branches and res_conv5c corresponds to the high-level semantic branch) to extract the pedestrian CNN feature map.
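For illustration, the following is a minimal PyTorch sketch of this multi-branch extraction. It assumes torchvision's ResNet50, whose layer4 contains the three bottleneck blocks that fig. 2 labels res_conv5a/5b/5c; the tapping points are an interpretation of the figure, not the patented implementation.

```python
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)

def extract_branch_features(images: torch.Tensor):
    """Return the res_conv5a/5b/5c outputs for a batch of 384x128 pedestrian images."""
    x = backbone.conv1(images)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    x = backbone.layer1(x)
    x = backbone.layer2(x)
    x = backbone.layer3(x)
    # torchvision's layer4 holds the three res_conv5 bottleneck blocks.
    feat_5a = backbone.layer4[0](x)        # mid-level semantic branch
    feat_5b = backbone.layer4[1](feat_5a)  # mid-level semantic branch
    feat_5c = backbone.layer4[2](feat_5b)  # high-level semantic branch
    return feat_5a, feat_5b, feat_5c

batch = torch.randn(8, 3, 384, 128)  # a randomly selected, resized picture batch
feat_5a, feat_5b, feat_5c = extract_branch_features(batch)
```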
As shown in fig. 3, in step S303, the channel attention module is used to process the feature maps of the channels, so as to obtain a feature map processed by the channels.
Fig. 4 is a flowchart of step S303 in fig. 3 according to an embodiment of the present disclosure, specifically including the following steps:
as shown in fig. 4, in step S401, a channel feature descriptor is obtained according to a feature map of each channel in the feature maps of the plurality of channels.
As shown in fig. 4, in step S402, a channel attention profile is obtained by performing an activation function operation on the channel feature descriptors.
As shown in fig. 4, in step S403, the channel attention feature map is multiplied by the original input feature map to obtain the channel-processed feature map.
In one embodiment of the present disclosure, step S303 may employ a channel attention module to explore the relationships between the channels of the pedestrian CNN feature map, capturing and describing the discriminative regions of the input image.
Fig. 5 is a schematic view of channel attention in an embodiment of the disclosure. As shown in fig. 5, the extracted feature map is $S \in \mathbb{R}^{N \times A \times B}$, where A and B are the length and width of the feature map, N is the number of channels, and n indexes the channels.
First, a GAP (global average pooling) operation is used to aggregate the spatial information of each channel of the feature map $S \in \mathbb{R}^{N \times A \times B}$, generating the channel feature descriptor $s \in \mathbb{R}^{N}$. It can be seen that the feature descriptor comprises the statistics of the plurality of channels, the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{a=1}^{A} \sum_{b=1}^{B} S_n(a, b) \quad (1)$$

Secondly, s is passed through a gating module to obtain the channel attention feature map $e \in \mathbb{R}^{N}$:

$$e = \sigma(W_2\,\delta(W_1 s)) \quad (2)$$

wherein σ and δ denote the Sigmoid and ReLU activation functions, respectively, $W_1 \in \mathbb{R}^{\frac{N}{r} \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times \frac{N}{r}}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.

Finally, the channel attention e is multiplied by the original input feature map S to obtain the corrected feature map $S' \in \mathbb{R}^{N \times A \times B}$:

$$S'_n = e_n \cdot S_n \quad (3)$$
Because the channel attention feature map e encodes the dependencies and relative importance among the channel feature maps, the neural network learns the important types of features by dynamically updating e, while down-weighting less useful feature maps.
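To make the channel attention concrete, the following is a minimal PyTorch sketch of equations (1)–(3). It is an illustrative reading of the description, not the patented implementation; the reduction ratio r = 16 and the layer naming are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, num_channels: int, r: int = 16):  # r is an assumed reduction ratio
        super().__init__()
        self.fc1 = nn.Linear(num_channels, num_channels // r)  # W1
        self.fc2 = nn.Linear(num_channels // r, num_channels)  # W2

    def forward(self, S: torch.Tensor) -> torch.Tensor:
        # S: (batch, N, A, B). Equation (1): per-channel statistic s via GAP.
        s = S.mean(dim=(2, 3))
        # Equation (2): e = sigma(W2 * delta(W1 * s)).
        e = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        # Equation (3): rescale each channel of S by its attention weight.
        return S * e.unsqueeze(-1).unsqueeze(-1)

S = torch.randn(8, 2048, 24, 8)       # feature map from a res_conv5 branch
S_prime = ChannelAttention(2048)(S)   # channel-processed feature map S'
```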
As shown in fig. 3, in step S304, the spatial context information of the channel-processed feature map at different positions is processed by using a spatial attention module, so as to obtain the pedestrian CNN feature map.
In one embodiment of the present disclosure, step S304 may use a spatial attention module to integrate the spatial context information of different positions of the feature map into the pedestrian local features, so as to enhance the spatial correlation of pedestrian local areas. Fig. 6 is a schematic diagram of spatial attention in an embodiment of the disclosure. As shown in fig. 6, the channel-processed feature map is convolved to obtain a first spatial information feature map T, a second spatial information feature map U and a third spatial information feature map V; the transpose of T is multiplied by U to obtain D; V is multiplied by D to obtain X; and X, scaled by a certain factor, is added to the channel-processed feature map to implement the spatial processing of the feature map, obtaining the final pedestrian CNN feature map.
Fig. 7 is a flowchart of step S304 in fig. 3 according to an embodiment of the present disclosure, which specifically includes the following steps:
as shown in fig. 7, in step S701, a 1×1 convolution operation is performed on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U. The channel-attention-corrected feature map (i.e., the channel-processed feature map) $S' \in \mathbb{R}^{N \times A \times B}$ is fed into the 1×1 convolutions $f_{key}$ and $f_{query}$ to obtain the two feature maps T and U, where $T, U \in \mathbb{R}^{N' \times A \times B}$ and N' is the reduced number of channels.
As shown in fig. 7, in step S702, a matrix multiplication operation is performed on the transpose of the first spatial information feature map T and the second spatial information feature map U, so as to obtain a spatial attention feature map. T and U are reshaped to $\mathbb{R}^{N' \times Z}$, where Z = A × B is the number of spatial positions; then T is transposed and matrix-multiplied with U, and a Softmax function is applied in the row direction to obtain the spatial attention feature map $D \in \mathbb{R}^{Z \times Z}$, each element $d_{j,i}$ of which can be expressed as:

$$d_{j,i} = \frac{\exp(T_i \cdot U_j)}{\sum_{i=1}^{Z} \exp(T_i \cdot U_j)} \quad (4)$$

wherein $d_{j,i}$ represents the correlation of the feature at the i-th position to that at the j-th position; the more similar the feature expressions of two positions are, the higher the correlation between them.
As shown in fig. 7, in step S703, a 1×1 convolution operation is performed on the channel-processed feature map to obtain a third spatial information feature map V.
The channel-processed feature map S' is fed into the 1×1 convolution layer $f_{value}$ to obtain a new feature map $V \in \mathbb{R}^{N' \times A \times B}$, which is reshaped to $\mathbb{R}^{N' \times Z}$.
As shown in fig. 7, in step S704, the third spatial information feature map V is matrix-multiplied with the transpose of the spatial attention feature map, so as to obtain the spatially processed feature map.
In this step, V is first matrix-multiplied with the transpose of D, the result is reshaped to $\mathbb{R}^{N' \times A \times B}$, and it is passed through the 1×1 convolution $f_{up}$ to obtain the feature map $X \in \mathbb{R}^{N \times A \times B}$.
As shown in fig. 7, in step S705, the pedestrian CNN feature map is obtained from the channel processing, i.e., steps S401 to S403, and the spatial processing, i.e., steps S701 to S704.
In this step, X is multiplied by a scaling parameter α and added element-wise to the channel-processed feature map S' to obtain the feature map $S'' \in \mathbb{R}^{N \times A \times B}$, namely:

$$S'' = \alpha X + S' \quad (5)$$

Accordingly, the element at each position of the feature map S'' can be expressed as:

$$S''_j = \alpha \sum_{i=1}^{Z} d_{j,i} V_i + S'_j \quad (6)$$

where α is a learnable parameter, initially set to 0, which can progressively learn larger weights starting from 0. As can be seen from equation (6), the feature $S''_j$ at each position of the feature map S'' is a weighted sum of the features of all positions and the channel-processed feature $S'_j$, so that it has a global receptive field; each element $d_{j,i}$ in the spatial attention feature map D selectively aggregates the associated local regions $V_i$, and thereby the links between different local features of a pedestrian can be enhanced.
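The spatial attention of equations (4)–(6) can be sketched in PyTorch as follows. The reduced channel width of $f_{key}$/$f_{query}$/$f_{value}$ (here one eighth of the input, a common choice) is an assumption, since the patent does not state it; α is initialized to 0 as described.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, num_channels: int, reduction: int = 8):  # reduction is assumed
        super().__init__()
        inner = num_channels // reduction
        self.f_key = nn.Conv2d(num_channels, inner, kernel_size=1)
        self.f_query = nn.Conv2d(num_channels, inner, kernel_size=1)
        self.f_value = nn.Conv2d(num_channels, inner, kernel_size=1)
        self.f_up = nn.Conv2d(inner, num_channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable scale, initialized to 0

    def forward(self, S_prime: torch.Tensor) -> torch.Tensor:
        n, c, a, b = S_prime.shape
        T = self.f_key(S_prime).flatten(2)    # (n, N', Z) with Z = A * B
        U = self.f_query(S_prime).flatten(2)  # (n, N', Z)
        V = self.f_value(S_prime).flatten(2)  # (n, N', Z)
        # Equation (4): energy[i, j] = T_i . U_j; Softmax over i yields d_{j,i}.
        energy = torch.bmm(T.transpose(1, 2), U)  # (n, Z, Z)
        attn = torch.softmax(energy, dim=1)       # attn[:, i, j] = d_{j,i}
        # Inner sum of equation (6): X_j = sum_i d_{j,i} V_i.
        X = torch.bmm(V, attn).reshape(n, -1, a, b)
        X = self.f_up(X)                          # restore N channels
        # Equation (5): S'' = alpha * X + S'.
        return self.alpha * X + S_prime

S_prime = torch.randn(8, 2048, 24, 8)             # channel-processed feature map S'
S_double_prime = SpatialAttention(2048)(S_prime)  # final pedestrian CNN feature map
```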
Based on the foregoing steps, using channel attention and spatial attention in series to correct the CNN feature map is more effective, letting the neural network automatically focus on which types of features, and which locations of features, matter. Thus, in the present disclosure, the channel attention module and the spatial attention module are used in combination, giving full play to both. As shown in fig. 5, the feature map $S \in \mathbb{R}^{N \times A \times B}$ of the present disclosure is corrected by the complementary attention of first the channel attention module and then the spatial attention module:

$$S' = M_c(S), \quad S'' = M_s(S') \quad (7)$$
In step S120, model training is performed by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model.
Fig. 8 is a schematic diagram of adversarial erasure learning in an embodiment of the present disclosure. As shown in fig. 8, the pedestrian CNN feature map is processed by convolution, GAP and the Softmax loss function in the main classifier and the auxiliary classifier, respectively.
Fig. 9 is a flowchart of step S120 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 9, in step S901, the pedestrian CNN feature map is input to a main classifier and an auxiliary classifier, respectively, for classification training, and feature maps specific to pedestrian categories are output from the main classifier and the auxiliary classifier.
The main classifier and the auxiliary classifier comprise the same number of convolution layers and global average pooling layers (Global Average Pooling, GAP for short), the number of channels of the convolution layers is the same as the number of pedestrian categories in the training data set, and each channel of the characteristic map specific to the pedestrian category represents a body response heat map when the pedestrian image belongs to different categories.
In the step, the full connection layer of the classification model is replaced by a 1×1 convolution layer to form a classification model based on a full convolution network, and the corrected feature image (namely, pedestrian CNN feature image) is fed into the 1×1 convolution layer to directly obtain the feature image exclusive to the pedestrian category. In the training stage, a pedestrian image category label can be obtained, and the characteristic diagram of the channel corresponding to the category label is indexed out to obtain a characteristic diagram exclusive to the pedestrian category, namely a body response heat diagram of the pedestrian image.
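A hedged sketch of this fully convolutional classifier head follows. The feature dimension 2048 and the identity count 751 (the Market-1501 training split) are assumptions used only to make the example runnable.

```python
import torch
import torch.nn as nn

num_classes = 751  # assumed number of pedestrian identities in the training set
classifier = nn.Conv2d(2048, num_classes, kernel_size=1)  # replaces the FC layer

features = torch.randn(8, 2048, 24, 8)   # corrected pedestrian CNN feature map
heatmaps = classifier(features)          # (8, num_classes, 24, 8): one heat map per category
logits = heatmaps.mean(dim=(2, 3))       # global average pooling -> classification logits

labels = torch.randint(0, num_classes, (8,))  # identity labels of the batch
# Index out the channel matching each image's label: its body response heat map.
class_heatmaps = heatmaps[torch.arange(8), labels]  # (8, 24, 8)
```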
As shown in fig. 9, in step S902, partial erasure is performed at the auxiliary classifier, so as to obtain an erased feature map.
First, the region of the body response heat map whose value is higher than the set adversarial erasure threshold is determined as the discriminative region; second, the part of the auxiliary classifier's pedestrian-category-specific feature map corresponding to the discriminative region is erased in an adversarial manner in which the response values are replaced with 0.
In this step, the input feature map of the auxiliary classifier is partially erased: the main classifier of step S901 generates the pedestrian-category-specific feature map, the part of the body response heat map whose value is higher than the adversarial erasure threshold is set as the discriminative region, and the corresponding region in the input feature map of the auxiliary classifier is erased in an adversarial manner in which the response values are replaced with 0. Partially erasing the feature map input to the auxiliary classifier increases the variation of the feature map and simulates the situation in which pedestrians are occluded.
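The partial erasure can be sketched as below. The normalization of the heat map to [0, 1], the threshold value tau = 0.8, and detaching the heat map from the gradient are assumptions; the patent only specifies that values above the adversarial erasure threshold define the discriminative region and are replaced with 0.

```python
import torch

def erase_discriminative_region(features: torch.Tensor,
                                class_heatmaps: torch.Tensor,
                                tau: float = 0.8) -> torch.Tensor:
    """features: (P, N, A, B) auxiliary-classifier input; class_heatmaps: (P, A, B)."""
    h = class_heatmaps.detach()
    h_min = h.amin(dim=(1, 2), keepdim=True)
    h_max = h.amax(dim=(1, 2), keepdim=True)
    h = (h - h_min) / (h_max - h_min + 1e-6)            # normalize to [0, 1] (assumed)
    keep = (h <= tau).unsqueeze(1).to(features.dtype)   # 0 over the discriminative region
    return features * keep                              # response values replaced with 0

features = torch.randn(8, 2048, 24, 8)  # feature map fed to the auxiliary classifier
class_heatmaps = torch.randn(8, 24, 8)  # label-indexed body response heat maps
erased = erase_discriminative_region(features, class_heatmaps)
```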
As shown in fig. 9, in step S903, a loss value is computed through a loss function from the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier.
As shown in fig. 9, in step S904, the parameters of the training model are updated according to the loss value.
In this step, both the main classifier branch and the auxiliary classifier branch are updated under the supervision of the Softmax loss function, whose expression is:

$$L = -\sum_{p=1}^{P} \sum_{m=1}^{M} \sum_{k=1}^{K} \lambda_k \log \frac{\exp\!\big(z^{m,k}_{l_p}\big)}{\sum_{c=1}^{C} \exp\!\big(z^{m,k}_{c}\big)} \quad (8)$$

where P is the batch size, M is the number of branches, K is the number of classifiers in the adversarial erasure learning (2 in this embodiment), C is the number of categories, and $z^{m,k}_{l_p}$ is, for the p-th sample, the value of the $l_p$-th node of the Softmax input of the k-th classifier of the m-th branch of the fully convolutional classification network, $l_p$ being the category of the p-th sample. The first classifier of each branch is the main classifier and the second is the auxiliary classifier; the parameter $\lambda_k$ is the weight assigned to the losses of the two classifiers, with $\lambda_1 = 1$ corresponding to the main classifier and $\lambda_2 = 0.5$ corresponding to the auxiliary classifier.
In step S130, the training model is used to combine the target pedestrian image and the pedestrian image to be identified to perform pedestrian re-identification, so as to obtain a pedestrian re-identification result.
Fig. 10 is a flowchart of step S130 in fig. 1 according to an embodiment of the disclosure, specifically including the following steps:
as shown in fig. 10, in step S1001, corresponding depth features are obtained respectively according to the target pedestrian image and the pedestrian image to be identified being input into the training model for training. In this step, the target pedestrian image and the pedestrian image to be identified are sent to the CNN model trained in step 2 to extract image features, specifically, features (res_conv5a, res_conv5b, res_conv5c) with different semantic levels in fig. 2 are connected in series as final feature descriptors.
As shown in fig. 10, in step S1002, the cosine distance is computed from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified, with the calculation formula:

$$d(\mathrm{feature}_1, \mathrm{feature}_2) = 1 - \frac{\mathrm{feature}_1 \cdot \mathrm{feature}_2}{\|\mathrm{feature}_1\|\,\|\mathrm{feature}_2\|} \quad (9)$$

wherein feature₁ is the depth feature of the target pedestrian image and feature₂ is the depth feature of the pedestrian image to be identified.
As shown in fig. 10, in step S1003, the similarity between the target pedestrian image and the pedestrian image to be identified is determined according to the magnitude of the cosine distance, and the pedestrian image to be identified with the greatest similarity is the pedestrian re-identification result.
Since the similarity of the image pair composed of the target pedestrian image and a pedestrian image to be identified is negatively and linearly correlated with the feature cosine distance, the smaller the feature cosine distance, the higher the similarity of the image pair. Accordingly, the cosine distances can be computed and sorted in ascending order, i.e., the images are ranked in descending order of similarity, and the pedestrian image to be identified with the greatest similarity is taken as the pedestrian re-identification result.
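The retrieval step can be sketched as follows. L2-normalizing the features makes the dot product equal to the cosine similarity, so the distance of equation (9) reduces to 1 minus a dot product; the feature dimension 6144 (three concatenated 2048-dimensional branches) and the gallery size are assumptions.

```python
import torch
import torch.nn.functional as F

query = F.normalize(torch.randn(1, 6144), dim=1)      # target pedestrian feature
gallery = F.normalize(torch.randn(100, 6144), dim=1)  # features of images to be identified

cosine_distance = 1.0 - query @ gallery.t()  # equation (9), shape (1, 100)
ranking = cosine_distance.argsort(dim=1)     # ascending distance = descending similarity
best_match = ranking[0, 0]                   # index of the pedestrian re-ID result
```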
In summary, with the pedestrian re-identification method provided by the embodiments of the present disclosure, on the one hand, a feature-level data enhancement strategy partially erases the input feature map of the auxiliary classifier, increasing the variation of pedestrian features, countering the situation in which pedestrians are occluded, and improving the generalization capability of the deep pedestrian re-identification model. On the other hand, the spatial attention module of the disclosure integrates spatial context information into the pedestrian local features and enhances the spatial correlation of different positions of the pedestrian; together with the channel attention module it forms a complementary attention model, and the combination of the two corrects the feature map along both the channel and spatial dimensions, so that the discriminative region can be better captured. In addition, the classification model based on the fully convolutional network can directly obtain the body response heat map during forward propagation, guiding the erasure of the discriminative body region and realizing feature-level data enhancement.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1. A pedestrian re-identification method, characterized in that it comprises:
extracting a pedestrian CNN feature map from a plurality of pictures, which comprises:
randomly selecting the plurality of pictures from a training data set;
inputting the pictures into a plurality of different semantic layers of a ResNet50 model for extraction to obtain feature maps of a plurality of channels;
processing the feature maps of the plurality of channels with a channel attention module to obtain a channel-processed feature map;
and processing spatial context information of the channel-processed feature map at different positions with a spatial attention module to obtain the pedestrian CNN feature map;
performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model, which comprises:
inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, the main classifier and the auxiliary classifier each outputting a pedestrian-category-specific feature map;
the auxiliary classifier being an auxiliary classifier added on the basis of the ResNet50;
the main classifier and the auxiliary classifier comprising the same number of convolution layers and global average pooling layers, the number of channels of the convolution layers being the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map representing the body response heat map of the pedestrian image for one category;
performing partial erasure at the auxiliary classifier to obtain an erased feature map, wherein performing the partial erasure at the auxiliary classifier comprises:
determining the region of the body response heat map whose value is higher than a set adversarial erasure threshold as the discriminative region;
and erasing the part of the auxiliary classifier's pedestrian-category-specific feature map corresponding to the discriminative region in an adversarial manner in which the response values are replaced with 0;
computing a loss value through a loss function from the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier;
and updating the parameters of the training model according to the loss value;
and performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain a pedestrian re-identification result.
2. The pedestrian re-identification method of claim 1, wherein processing the feature maps of the plurality of channels with the channel attention module to obtain the channel-processed feature map comprises:
obtaining a channel feature descriptor from the feature map of each channel in the feature maps of the plurality of channels;
obtaining a channel attention feature map by an activation function operation on the channel feature descriptor;
and multiplying the channel attention feature map by the original input feature map to obtain the channel-processed feature map.
3. The pedestrian re-identification method of claim 2, wherein the feature descriptor comprises the statistics of the plurality of channels, the feature descriptor being:

$$s = [s_1, s_2, \ldots, s_N] \in \mathbb{R}^{N}$$

the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{a=1}^{A} \sum_{b=1}^{B} S_n(a, b)$$

wherein N is the number of channels, n is the channel index, and A and B are the length and width of the feature map, respectively;
the channel attention feature map being:

$$e = \sigma(W_2\,\delta(W_1 s))$$

wherein σ and δ denote the Sigmoid and ReLU activation functions, respectively, $W_1 \in \mathbb{R}^{\frac{N}{r} \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times \frac{N}{r}}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.
4. The pedestrian re-identification method of claim 1, wherein processing the spatial context information of the channel-processed feature map at different positions with the spatial attention module to obtain the pedestrian CNN feature map comprises:
performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U;
performing a matrix multiplication of the transpose of the first spatial information feature map T with the second spatial information feature map U to obtain a spatial attention feature map;
performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V;
performing a matrix multiplication of the third spatial information feature map V with the transpose of the spatial attention feature map to obtain a spatially processed feature map;
and obtaining the pedestrian CNN feature map from the channel processing and the spatial processing.
5. The pedestrian re-identification method according to claim 2, wherein performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain the pedestrian re-identification result, comprises:
inputting the target pedestrian image and the pedestrian image to be identified into the training model to obtain their corresponding depth features, respectively;
computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified;
and determining the similarity between the target pedestrian image and the pedestrian image to be identified according to the cosine distance, the pedestrian image to be identified with the greatest similarity being the pedestrian re-identification result.
6. The pedestrian re-identification method of claim 5, wherein the calculation formula for computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified is:

$$d(\mathrm{feature}_1, \mathrm{feature}_2) = 1 - \frac{\mathrm{feature}_1 \cdot \mathrm{feature}_2}{\|\mathrm{feature}_1\|\,\|\mathrm{feature}_2\|}$$

wherein feature₁ is the depth feature of the target pedestrian image and feature₂ is the depth feature of the pedestrian image to be identified.
CN201910403777.5A 2019-05-15 2019-05-15 Pedestrian re-identification method Active CN110110689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910403777.5A CN110110689B (en) 2019-05-15 2019-05-15 Pedestrian re-identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910403777.5A CN110110689B (en) 2019-05-15 2019-05-15 Pedestrian re-identification method

Publications (2)

Publication Number Publication Date
CN110110689A CN110110689A (en) 2019-08-09
CN110110689B true CN110110689B (en) 2023-05-26

Family

ID=67490255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910403777.5A Active CN110110689B (en) 2019-05-15 2019-05-15 Pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN110110689B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516603B (en) * 2019-08-28 2022-03-18 北京百度网讯科技有限公司 Information processing method and device
CN112633459A (en) * 2019-09-24 2021-04-09 华为技术有限公司 Method for training neural network, data processing method and related device
CN112784648B (en) * 2019-11-07 2022-09-06 中国科学技术大学 Method and device for optimizing feature extraction of pedestrian re-identification system of video
CN111160096A (en) * 2019-11-26 2020-05-15 北京海益同展信息科技有限公司 Method, device and system for identifying poultry egg abnormality, storage medium and electronic device
CN111198964B (en) * 2020-01-10 2023-04-25 中国科学院自动化研究所 Image retrieval method and system
CN111461038B (en) * 2020-04-07 2022-08-05 中北大学 Pedestrian re-identification method based on layered multi-mode attention mechanism
CN111582587B (en) * 2020-05-11 2021-06-04 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111814618B (en) * 2020-06-28 2023-09-01 浙江大华技术股份有限公司 Pedestrian re-recognition method, gait recognition network training method and related devices
CN112131943B (en) * 2020-08-20 2023-07-11 深圳大学 Dual-attention model-based video behavior recognition method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT201600068348A1 (en) * 2016-07-01 2018-01-01 Octo Telematics Spa Procedure for determining the status of a vehicle by detecting the vehicle's battery voltage.
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning
CN107679483A (en) * 2017-09-27 2018-02-09 北京小米移动软件有限公司 Number plate recognition methods and device
CN107992882A (en) * 2017-11-20 2018-05-04 电子科技大学 A kind of occupancy statistical method based on WiFi channel condition informations and support vector machines
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359559B (en) * 2018-09-27 2021-11-12 天津师范大学 Pedestrian re-identification method based on dynamic shielding sample
CN109583502B (en) * 2018-11-30 2022-11-18 天津师范大学 Pedestrian re-identification method based on anti-erasure attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT201600068348A1 (en) * 2016-07-01 2018-01-01 Octo Telematics Spa Procedure for determining the status of a vehicle by detecting the vehicle's battery voltage.
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning
CN107679483A (en) * 2017-09-27 2018-02-09 北京小米移动软件有限公司 Number plate recognition methods and device
CN107992882A (en) * 2017-11-20 2018-05-04 电子科技大学 A kind of occupancy statistical method based on WiFi channel condition informations and support vector machines
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian re-identification algorithm based on sparse learning; Zhang Wenwen et al.; Journal of Data Acquisition and Processing (数据采集与处理); Vol. 33, No. 5; pp. 855-864 *

Also Published As

Publication number Publication date
CN110110689A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110110689B (en) Pedestrian re-identification method
CN109740419B (en) Attention-LSTM network-based video behavior identification method
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
Sun et al. Lattice long short-term memory for human action recognition
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN108960080B (en) Face recognition method based on active defense image anti-attack
CN110889375B (en) Hidden-double-flow cooperative learning network and method for behavior recognition
CN110378208B (en) Behavior identification method based on deep residual error network
CN112434608B (en) Human behavior identification method and system based on double-current combined network
US20220292394A1 (en) Multi-scale deep supervision based reverse attention model
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN112861970B (en) Fine-grained image classification method based on feature fusion
Zhu et al. Attentive multi-stage convolutional neural network for crowd counting
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
Cai et al. A real-time smoke detection model based on YOLO-smoke algorithm
CN114241456A (en) Safe driving monitoring method using feature adaptive weighting
Gao et al. Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition
CN110728238A (en) Personnel re-detection method of fusion type neural network
CN114492634A (en) Fine-grained equipment image classification and identification method and system
Kumar et al. Content based movie scene retrieval using spatio-temporal features
CN112528077A (en) Video face retrieval method and system based on video embedding
CN110852272B (en) Pedestrian detection method
CN116229323A (en) Human body behavior recognition method based on improved depth residual error network
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant