CN110110689B - Pedestrian re-identification method - Google Patents

Pedestrian re-identification method

Info

Publication number
CN110110689B
CN110110689B CN201910403777.5A
Authority
CN
China
Prior art keywords
pedestrian
feature map
channel
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910403777.5A
Other languages
Chinese (zh)
Other versions
CN110110689A (en)
Inventor
张云洲
刘双伟
齐林
朱尚栋
徐文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910403777.5A priority Critical patent/CN110110689B/en
Publication of CN110110689A publication Critical patent/CN110110689A/en
Application granted granted Critical
Publication of CN110110689B publication Critical patent/CN110110689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure relate to a pedestrian re-identification method, which comprises the following steps: extracting a pedestrian CNN feature map from a plurality of pictures; performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model; and performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain a pedestrian re-identification result. The method provides a feature-level data enhancement strategy: the input feature map of the auxiliary classifier is partially erased, which increases the variation of pedestrian features, counters the situation in which pedestrians are occluded, and improves the generalization capability of the deep pedestrian re-identification model.

Description

Pedestrian re-identification method
Technical Field
The disclosure relates to the technical field of computer vision, and in particular to a pedestrian re-identification method.
Background
Pedestrian re-identification matches and identifies the identity of a pedestrian across a non-overlapping multi-camera surveillance system, and plays an important role in intelligent video surveillance, crime prevention, the maintenance of public security, and the like. However, when human-body attributes such as posture, gait and clothing, and environmental factors such as illumination and background change, the appearance of the same pedestrian differs markedly across surveillance videos, while the appearances of different pedestrians may be similar under certain conditions.
In recent years, deep learning methods have been widely adopted, and deep learning can achieve better performance than conventional hand-crafted methods. However, deep pedestrian re-identification models typically have a large number of network parameters yet are optimized on limited data sets, which increases the risk of overfitting and reduces generalization capability. Improving the generalization ability of the model is therefore a significant and important issue for deep pedestrian re-identification.
To improve the generalization ability of a deep convolutional neural network, the variation of the training data set can be increased and a large number of pedestrian images containing occlusion can be collected; however, this only realizes data enhancement at the image level and cannot provide data enhancement beyond the image level to improve the generalization ability of the deep convolutional neural network.
The above drawbacks urgently need to be overcome by those skilled in the art.
Disclosure of Invention
(I) Technical problem to be solved
In order to solve the above-mentioned problems of the prior art, the present disclosure provides a pedestrian re-identification method that performs data enhancement at the feature level to improve the generalization ability of a deep convolutional neural network.
(II) Technical solution
In order to achieve the above purpose, the main technical solution adopted by the present disclosure includes:
An embodiment of the present disclosure provides a pedestrian re-identification method, including:
extracting a pedestrian CNN feature map from a plurality of pictures;
performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model;
and performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain a pedestrian re-identification result.
In one embodiment of the present disclosure, extracting a pedestrian CNN feature map from a plurality of pictures includes:
randomly selecting the plurality of pictures from a training data set;
inputting the pictures into a plurality of different semantic layers of a ResNet50 model for extraction to obtain feature maps of a plurality of channels;
processing the feature maps of the plurality of channels with a channel attention module to obtain a channel-processed feature map;
and processing spatial context information of the channel-processed feature map at different positions with a spatial attention module to obtain the pedestrian CNN feature map.
In one embodiment of the disclosure, processing the feature maps of the plurality of channels with the channel attention module to obtain the channel-processed feature map includes:
obtaining a channel feature descriptor from the feature map of each channel in the feature maps of the plurality of channels;
obtaining a channel attention feature map by an activation function operation on the channel feature descriptor;
and multiplying the channel attention feature map by the original input feature map to obtain the channel-processed feature map.
In one embodiment of the disclosure, the feature descriptor comprises the statistics of the plurality of channels, the feature descriptor being:

$$s = [s_1, s_2, \ldots, s_N] \in \mathbb{R}^{N}$$

the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{a=1}^{A} \sum_{b=1}^{B} S_n(a, b)$$

wherein N is the number of channels, n is the channel index, and A and B are the length and width of the feature map, respectively;
the channel attention feature map being:

$$e = \sigma(W_2\,\delta(W_1 s))$$

wherein σ and δ denote the Sigmoid and ReLU activation functions, respectively, $W_1 \in \mathbb{R}^{\frac{N}{r} \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times \frac{N}{r}}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.
In one embodiment of the disclosure, processing the spatial context information of the channel-processed feature map at different positions with the spatial attention module to obtain the pedestrian CNN feature map includes:
performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U;
performing a matrix multiplication of the transpose of the first spatial information feature map T with the second spatial information feature map U to obtain a spatial attention feature map;
performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V;
performing a matrix multiplication of the third spatial information feature map V with the transpose of the spatial attention feature map to obtain a spatially processed feature map;
and obtaining the pedestrian CNN feature map from the channel processing and the spatial processing.
In one embodiment of the present disclosure, performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain the training model, includes:
inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, the main classifier and the auxiliary classifier each outputting a pedestrian-category-specific feature map;
performing partial erasure at the auxiliary classifier to obtain an erased feature map;
computing a loss value through a loss function from the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier;
and updating the parameters of the training model according to the loss value.
In one embodiment of the disclosure, the main classifier and the auxiliary classifier comprise the same number of convolution layers and global average pooling layers, the number of channels of the convolution layers is the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map represents the body response heat map of the pedestrian image for one category.
In one embodiment of the disclosure, performing the partial erasure at the auxiliary classifier includes:
determining the region of the body response heat map whose value is higher than a set adversarial erasure threshold as the discriminative region;
and erasing the part of the auxiliary classifier's pedestrian-category-specific feature map corresponding to the discriminative region, in an adversarial manner in which the response values are replaced with 0.
In one embodiment of the present disclosure, performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain the pedestrian re-identification result, includes:
inputting the target pedestrian image and the pedestrian image to be identified into the training model to obtain their corresponding depth features, respectively;
computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified;
and determining the similarity between the target pedestrian image and the pedestrian image to be identified according to the cosine distance, the pedestrian image to be identified with the greatest similarity being the pedestrian re-identification result.
In one embodiment of the present disclosure, the calculation formula for computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified is:

$$d(\mathrm{feature}_1, \mathrm{feature}_2) = 1 - \frac{\mathrm{feature}_1 \cdot \mathrm{feature}_2}{\|\mathrm{feature}_1\|\,\|\mathrm{feature}_2\|}$$

wherein feature₁ is the depth feature of the target pedestrian image and feature₂ is the depth feature of the pedestrian image to be identified.
(III) Beneficial effects
The beneficial effects of the present disclosure are as follows: in the pedestrian re-identification method provided by the embodiments of the disclosure, a feature-level data enhancement strategy partially erases the input feature map of the auxiliary classifier, thereby increasing the variation of pedestrian features, countering the situation in which pedestrians are occluded, and improving the generalization capability of the deep pedestrian re-identification model.
Drawings
FIG. 1 is a flow chart of a pedestrian re-identification method provided in one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a network architecture for implementing the method of FIG. 1 in one embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating step S110 in FIG. 1 according to one embodiment of the present disclosure;
FIG. 4 is a flowchart of step S303 in FIG. 3 according to one embodiment of the present disclosure;
FIG. 5 is a schematic illustration of channel attention in one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of spatial attention in one embodiment of the present disclosure;
FIG. 7 is a flowchart of step S304 in FIG. 3 according to one embodiment of the present disclosure;
FIG. 8 is a schematic diagram of adversarial erasure learning in an embodiment of the present disclosure;
FIG. 9 is a flowchart of step S120 in FIG. 1 according to one embodiment of the present disclosure;
fig. 10 is a flowchart of step S130 in fig. 1 according to an embodiment of the disclosure.
Detailed Description
For a better explanation of the present disclosure, for ease of understanding, the present disclosure is described in detail below by way of specific embodiments in conjunction with the accompanying drawings.
All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In other embodiments of the present disclosure, increasing the variation of the training data set is an effective way to improve the generalization ability of a deep convolutional neural network. However, unlike the visual task of object recognition, pedestrian re-identification requires image data collected across cameras, and pedestrian labeling is very difficult; building a sufficiently large data set for pedestrian re-identification is therefore costly, and existing data sets contain only a small amount of labeled pedestrians. To address this problem, data enhancement can use only the current data set to augment the variation of the training samples without additional cost. Recent data enhancement studies have used generative adversarial networks (GANs) to generate pedestrian images with different human poses and camera styles, but this approach suffers from long training time, difficult convergence, low quality of the generated images, and the like. Besides explicitly generating new images, common approaches also enhance the training images by jittering pixel values, random cropping, flipping the original image, and so on.
In addition, occlusion is also an important factor affecting the generalization ability of convolutional neural networks. Collecting a large number of pedestrian images containing occlusion is one way to effectively address the occlusion problem, but it requires a high cost. Another reasonable approach is to simulate the situation in which pedestrians are occluded. For example, a rectangular box of random size and random position can occlude the training image, with the pixel values of the rectangular area replaced by random values, so as to increase the variation of the data set. However, such an occlusion area is randomly selected. Alternatively, a pedestrian re-identification classification model is first trained, the discriminative region of the image is then found with the aid of network visualization and multiple classifiers, the discriminative region is occluded on the original image to generate a new sample, and the new sample is finally added to the original data set to retrain the pedestrian re-identification model.
Both of the above methods add sample variation by occluding the original pedestrian image, and therefore belong to image-level data enhancement methods.
Fig. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present disclosure; as shown in fig. 1, the method includes the following steps:
as shown in fig. 1, in step S110, a pedestrian CNN feature map is extracted from a plurality of pictures;
as shown in fig. 1, in step S120, model training is performed by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model;
as shown in fig. 1, in step S130, the training model is used to combine the target pedestrian image and the pedestrian image to be identified to perform pedestrian re-identification, so as to obtain a pedestrian re-identification result.
The specific implementation of the steps of the embodiment shown in fig. 1 is described in detail below:
With reference to the flowchart shown in fig. 1, fig. 2 is a schematic diagram of a network structure for implementing the method of fig. 1 in an embodiment of the present disclosure. As shown in fig. 2, each branch applies the complementary attention of channel attention and spatial attention, and adversarial erasure learning and Softmax loss computation are then performed on the resulting feature map. In addition, as shown in fig. 2, the three branches are divided into two types: mid-level semantic branches and a high-level semantic branch.
In step S110, a pedestrian CNN feature map is extracted from a plurality of pictures.
Fig. 3 is a flowchart of step S110 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 3, in step S301, the plurality of pictures are randomly selected from the training dataset.
As shown in fig. 3, in step S302, the plurality of pictures are input into a plurality of different semantic layers of the ResNet50 model for extraction, so as to obtain feature maps of a plurality of channels.
In one embodiment of the present disclosure, an image batch is first input: a batch of pictures, i.e., the plurality of pictures, is randomly selected from the training data set. Next, the pictures are resized to 384×128 and fed into different semantic layers of the backbone network ResNet50 (res_conv5a, res_conv5b and res_conv5c, as shown in fig. 2, where res_conv5a and res_conv5b correspond to the mid-level semantic branches and res_conv5c corresponds to the high-level semantic branch) to extract the pedestrian CNN feature map.
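For illustration, the following is a minimal PyTorch sketch of this multi-branch extraction. It assumes torchvision's ResNet50, whose layer4 contains the three bottleneck blocks that fig. 2 labels res_conv5a/5b/5c; the tapping points are an interpretation of the figure, not the patented implementation.

```python
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)

def extract_branch_features(images: torch.Tensor):
    """Return the res_conv5a/5b/5c outputs for a batch of 384x128 pedestrian images."""
    x = backbone.conv1(images)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    x = backbone.layer1(x)
    x = backbone.layer2(x)
    x = backbone.layer3(x)
    # torchvision's layer4 holds the three res_conv5 bottleneck blocks.
    feat_5a = backbone.layer4[0](x)        # mid-level semantic branch
    feat_5b = backbone.layer4[1](feat_5a)  # mid-level semantic branch
    feat_5c = backbone.layer4[2](feat_5b)  # high-level semantic branch
    return feat_5a, feat_5b, feat_5c

batch = torch.randn(8, 3, 384, 128)  # a randomly selected, resized picture batch
feat_5a, feat_5b, feat_5c = extract_branch_features(batch)
```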
As shown in fig. 3, in step S303, the channel attention module is used to process the feature maps of the channels, so as to obtain a feature map processed by the channels.
Fig. 4 is a flowchart of step S303 in fig. 3 according to an embodiment of the present disclosure, specifically including the following steps:
as shown in fig. 4, in step S401, a channel feature descriptor is obtained according to a feature map of each channel in the feature maps of the plurality of channels.
As shown in fig. 4, in step S402, a channel attention profile is obtained by performing an activation function operation on the channel feature descriptors.
As shown in fig. 4, in step S403, the channel attention feature map is multiplied by the original input feature map to obtain the channel-processed feature map.
In one embodiment of the present disclosure, step S303 may employ a channel attention module to explore the relationships between the channels of the pedestrian CNN feature map, capturing and describing the discriminative regions of the input image.
Fig. 5 is a schematic view of channel attention in an embodiment of the disclosure. As shown in fig. 5, the extracted feature map is $S \in \mathbb{R}^{N \times A \times B}$, where A and B are the length and width of the feature map, N is the number of channels, and n indexes the channels.
First, a GAP (global average pooling) operation is used to aggregate the spatial information of each channel of the feature map $S \in \mathbb{R}^{N \times A \times B}$, generating the channel feature descriptor $s \in \mathbb{R}^{N}$. It can be seen that the feature descriptor comprises the statistics of the plurality of channels, the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{a=1}^{A} \sum_{b=1}^{B} S_n(a, b) \quad (1)$$

Secondly, s is passed through a gating module to obtain the channel attention feature map $e \in \mathbb{R}^{N}$:

$$e = \sigma(W_2\,\delta(W_1 s)) \quad (2)$$

wherein σ and δ denote the Sigmoid and ReLU activation functions, respectively, $W_1 \in \mathbb{R}^{\frac{N}{r} \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times \frac{N}{r}}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.

Finally, the channel attention e is multiplied by the original input feature map S to obtain the corrected feature map $S' \in \mathbb{R}^{N \times A \times B}$:

$$S'_n = e_n \cdot S_n \quad (3)$$
Because the channel attention feature map e encodes the dependencies and relative importance among the channel feature maps, the neural network learns the important types of features by dynamically updating e, while down-weighting less useful feature maps.
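To make the channel attention concrete, the following is a minimal PyTorch sketch of equations (1)–(3). It is an illustrative reading of the description, not the patented implementation; the reduction ratio r = 16 and the layer naming are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, num_channels: int, r: int = 16):  # r is an assumed reduction ratio
        super().__init__()
        self.fc1 = nn.Linear(num_channels, num_channels // r)  # W1
        self.fc2 = nn.Linear(num_channels // r, num_channels)  # W2

    def forward(self, S: torch.Tensor) -> torch.Tensor:
        # S: (batch, N, A, B). Equation (1): per-channel statistic s via GAP.
        s = S.mean(dim=(2, 3))
        # Equation (2): e = sigma(W2 * delta(W1 * s)).
        e = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        # Equation (3): rescale each channel of S by its attention weight.
        return S * e.unsqueeze(-1).unsqueeze(-1)

S = torch.randn(8, 2048, 24, 8)       # feature map from a res_conv5 branch
S_prime = ChannelAttention(2048)(S)   # channel-processed feature map S'
```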
As shown in fig. 3, in step S304, the spatial context information of the channel-processed feature map at different positions is processed by using a spatial attention module, so as to obtain the pedestrian CNN feature map.
In one embodiment of the present disclosure, step S304 may use a spatial attention module to integrate the spatial context information of different positions of the feature map into the pedestrian local features, so as to enhance the spatial correlation of pedestrian local areas. Fig. 6 is a schematic diagram of spatial attention in an embodiment of the disclosure. As shown in fig. 6, the channel-processed feature map is convolved to obtain a first spatial information feature map T, a second spatial information feature map U and a third spatial information feature map V; the transpose of T is multiplied by U to obtain D; V is multiplied by D to obtain X; and X, scaled by a certain factor, is added to the channel-processed feature map to implement the spatial processing of the feature map, obtaining the final pedestrian CNN feature map.
Fig. 7 is a flowchart of step S304 in fig. 3 according to an embodiment of the present disclosure, which specifically includes the following steps:
as shown in fig. 7, in step S701, a 1×1 convolution operation is performed on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U. The channel-attention-corrected feature map (i.e., the channel-processed feature map) $S' \in \mathbb{R}^{N \times A \times B}$ is fed into the 1×1 convolutions $f_{key}$ and $f_{query}$ to obtain the two feature maps T and U, where $T, U \in \mathbb{R}^{N' \times A \times B}$ and N' is the reduced number of channels.
As shown in fig. 7, in step S702, a matrix multiplication operation is performed on the transpose of the first spatial information feature map T and the second spatial information feature map U, so as to obtain a spatial attention feature map. T and U are reshaped to $\mathbb{R}^{N' \times Z}$, where Z = A × B is the number of spatial positions; then T is transposed and matrix-multiplied with U, and a Softmax function is applied in the row direction to obtain the spatial attention feature map $D \in \mathbb{R}^{Z \times Z}$, each element $d_{j,i}$ of which can be expressed as:

$$d_{j,i} = \frac{\exp(T_i \cdot U_j)}{\sum_{i=1}^{Z} \exp(T_i \cdot U_j)} \quad (4)$$

wherein $d_{j,i}$ represents the correlation of the feature at the i-th position to that at the j-th position; the more similar the feature expressions of two positions are, the higher the correlation between them.
As shown in fig. 7, in step S703, a 1×1 convolution operation is performed on the channel-processed feature map to obtain a third spatial information feature map V.
The channel-processed feature map S' is fed into the 1×1 convolution layer $f_{value}$ to obtain a new feature map $V \in \mathbb{R}^{N' \times A \times B}$, which is reshaped to $\mathbb{R}^{N' \times Z}$.
As shown in fig. 7, in step S704, the third spatial information feature map V is matrix-multiplied with the transpose of the spatial attention feature map, so as to obtain the spatially processed feature map.
In this step, V is first matrix-multiplied with the transpose of D, the result is reshaped to $\mathbb{R}^{N' \times A \times B}$, and it is passed through the 1×1 convolution $f_{up}$ to obtain the feature map $X \in \mathbb{R}^{N \times A \times B}$.
As shown in fig. 7, in step S705, the pedestrian CNN feature map is obtained from the channel processing, i.e., steps S401 to S403, and the spatial processing, i.e., steps S701 to S704.
In this step, X is multiplied by a scaling parameter α and added element-wise to the channel-processed feature map S' to obtain the feature map $S'' \in \mathbb{R}^{N \times A \times B}$, namely:

$$S'' = \alpha X + S' \quad (5)$$

Accordingly, the element at each position of the feature map S'' can be expressed as:

$$S''_j = \alpha \sum_{i=1}^{Z} d_{j,i} V_i + S'_j \quad (6)$$

where α is a learnable parameter, initially set to 0, which can progressively learn larger weights starting from 0. As can be seen from equation (6), the feature $S''_j$ at each position of the feature map S'' is a weighted sum of the features of all positions and the channel-processed feature $S'_j$, so that it has a global receptive field; each element $d_{j,i}$ in the spatial attention feature map D selectively aggregates the associated local regions $V_i$, and thereby the links between different local features of a pedestrian can be enhanced.
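The spatial attention of equations (4)–(6) can be sketched in PyTorch as follows. The reduced channel width of $f_{key}$/$f_{query}$/$f_{value}$ (here one eighth of the input, a common choice) is an assumption, since the patent does not state it; α is initialized to 0 as described.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, num_channels: int, reduction: int = 8):  # reduction is assumed
        super().__init__()
        inner = num_channels // reduction
        self.f_key = nn.Conv2d(num_channels, inner, kernel_size=1)
        self.f_query = nn.Conv2d(num_channels, inner, kernel_size=1)
        self.f_value = nn.Conv2d(num_channels, inner, kernel_size=1)
        self.f_up = nn.Conv2d(inner, num_channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable scale, initialized to 0

    def forward(self, S_prime: torch.Tensor) -> torch.Tensor:
        n, c, a, b = S_prime.shape
        T = self.f_key(S_prime).flatten(2)    # (n, N', Z) with Z = A * B
        U = self.f_query(S_prime).flatten(2)  # (n, N', Z)
        V = self.f_value(S_prime).flatten(2)  # (n, N', Z)
        # Equation (4): energy[i, j] = T_i . U_j; Softmax over i yields d_{j,i}.
        energy = torch.bmm(T.transpose(1, 2), U)  # (n, Z, Z)
        attn = torch.softmax(energy, dim=1)       # attn[:, i, j] = d_{j,i}
        # Inner sum of equation (6): X_j = sum_i d_{j,i} V_i.
        X = torch.bmm(V, attn).reshape(n, -1, a, b)
        X = self.f_up(X)                          # restore N channels
        # Equation (5): S'' = alpha * X + S'.
        return self.alpha * X + S_prime

S_prime = torch.randn(8, 2048, 24, 8)             # channel-processed feature map S'
S_double_prime = SpatialAttention(2048)(S_prime)  # final pedestrian CNN feature map
```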
Based on the foregoing steps, using channel attention and spatial attention in series to correct the CNN feature map is more effective, letting the neural network automatically focus on which types of features, and which locations of features, matter. Thus, in the present disclosure, the channel attention module and the spatial attention module are used in combination, giving full play to both. As shown in fig. 5, the feature map $S \in \mathbb{R}^{N \times A \times B}$ of the present disclosure is corrected by the complementary attention of first the channel attention module and then the spatial attention module:

$$S' = M_c(S), \quad S'' = M_s(S') \quad (7)$$
In step S120, model training is performed by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model.
Fig. 8 is a schematic diagram of adversarial erasure learning in an embodiment of the present disclosure. As shown in fig. 8, the pedestrian CNN feature map is processed by convolution, GAP and the Softmax loss function in the main classifier and the auxiliary classifier, respectively.
Fig. 9 is a flowchart of step S120 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 9, in step S901, the pedestrian CNN feature map is input to a main classifier and an auxiliary classifier, respectively, for classification training, and feature maps specific to pedestrian categories are output from the main classifier and the auxiliary classifier.
The main classifier and the auxiliary classifier comprise the same number of convolution layers and global average pooling layers (Global Average Pooling, GAP for short), the number of channels of the convolution layers is the same as the number of pedestrian categories in the training data set, and each channel of the characteristic map specific to the pedestrian category represents a body response heat map when the pedestrian image belongs to different categories.
In the step, the full connection layer of the classification model is replaced by a 1×1 convolution layer to form a classification model based on a full convolution network, and the corrected feature image (namely, pedestrian CNN feature image) is fed into the 1×1 convolution layer to directly obtain the feature image exclusive to the pedestrian category. In the training stage, a pedestrian image category label can be obtained, and the characteristic diagram of the channel corresponding to the category label is indexed out to obtain a characteristic diagram exclusive to the pedestrian category, namely a body response heat diagram of the pedestrian image.
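A hedged sketch of this fully convolutional classifier head follows. The feature dimension 2048 and the identity count 751 (the Market-1501 training split) are assumptions used only to make the example runnable.

```python
import torch
import torch.nn as nn

num_classes = 751  # assumed number of pedestrian identities in the training set
classifier = nn.Conv2d(2048, num_classes, kernel_size=1)  # replaces the FC layer

features = torch.randn(8, 2048, 24, 8)   # corrected pedestrian CNN feature map
heatmaps = classifier(features)          # (8, num_classes, 24, 8): one heat map per category
logits = heatmaps.mean(dim=(2, 3))       # global average pooling -> classification logits

labels = torch.randint(0, num_classes, (8,))  # identity labels of the batch
# Index out the channel matching each image's label: its body response heat map.
class_heatmaps = heatmaps[torch.arange(8), labels]  # (8, 24, 8)
```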
As shown in fig. 9, in step S902, partial erasure is performed at the auxiliary classifier, so as to obtain an erased feature map.
First, the region of the body response heat map whose value is higher than the set adversarial erasure threshold is determined as the discriminative region; second, the part of the auxiliary classifier's pedestrian-category-specific feature map corresponding to the discriminative region is erased in an adversarial manner in which the response values are replaced with 0.
In this step, the input feature map of the auxiliary classifier is partially erased: the main classifier of step S901 generates the pedestrian-category-specific feature map, the part of the body response heat map whose value is higher than the adversarial erasure threshold is set as the discriminative region, and the corresponding region in the input feature map of the auxiliary classifier is erased in an adversarial manner in which the response values are replaced with 0. Partially erasing the feature map input to the auxiliary classifier increases the variation of the feature map and simulates the situation in which pedestrians are occluded.
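The partial erasure can be sketched as below. The normalization of the heat map to [0, 1], the threshold value tau = 0.8, and detaching the heat map from the gradient are assumptions; the patent only specifies that values above the adversarial erasure threshold define the discriminative region and are replaced with 0.

```python
import torch

def erase_discriminative_region(features: torch.Tensor,
                                class_heatmaps: torch.Tensor,
                                tau: float = 0.8) -> torch.Tensor:
    """features: (P, N, A, B) auxiliary-classifier input; class_heatmaps: (P, A, B)."""
    h = class_heatmaps.detach()
    h_min = h.amin(dim=(1, 2), keepdim=True)
    h_max = h.amax(dim=(1, 2), keepdim=True)
    h = (h - h_min) / (h_max - h_min + 1e-6)            # normalize to [0, 1] (assumed)
    keep = (h <= tau).unsqueeze(1).to(features.dtype)   # 0 over the discriminative region
    return features * keep                              # response values replaced with 0

features = torch.randn(8, 2048, 24, 8)  # feature map fed to the auxiliary classifier
class_heatmaps = torch.randn(8, 24, 8)  # label-indexed body response heat maps
erased = erase_discriminative_region(features, class_heatmaps)
```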
As shown in fig. 9, in step S903, a loss value is computed through a loss function from the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier.
As shown in fig. 9, in step S904, the parameters of the training model are updated according to the loss value.
In this step, both the main classifier branch and the auxiliary classifier branch are updated under the supervision of the Softmax loss function, whose expression is:

$$L = -\sum_{p=1}^{P} \sum_{m=1}^{M} \sum_{k=1}^{K} \lambda_k \log \frac{\exp\!\big(z^{m,k}_{l_p}\big)}{\sum_{c=1}^{C} \exp\!\big(z^{m,k}_{c}\big)} \quad (8)$$

where P is the batch size, M is the number of branches, K is the number of classifiers in the adversarial erasure learning (2 in this embodiment), C is the number of categories, and $z^{m,k}_{l_p}$ is, for the p-th sample, the value of the $l_p$-th node of the Softmax input of the k-th classifier of the m-th branch of the fully convolutional classification network, $l_p$ being the category of the p-th sample. The first classifier of each branch is the main classifier and the second is the auxiliary classifier; the parameter $\lambda_k$ is the weight assigned to the losses of the two classifiers, with $\lambda_1 = 1$ corresponding to the main classifier and $\lambda_2 = 0.5$ corresponding to the auxiliary classifier.
In step S130, the training model is used to combine the target pedestrian image and the pedestrian image to be identified to perform pedestrian re-identification, so as to obtain a pedestrian re-identification result.
Fig. 10 is a flowchart of step S130 in fig. 1 according to an embodiment of the disclosure, specifically including the following steps:
as shown in fig. 10, in step S1001, corresponding depth features are obtained respectively according to the target pedestrian image and the pedestrian image to be identified being input into the training model for training. In this step, the target pedestrian image and the pedestrian image to be identified are sent to the CNN model trained in step 2 to extract image features, specifically, features (res_conv5a, res_conv5b, res_conv5c) with different semantic levels in fig. 2 are connected in series as final feature descriptors.
As shown in fig. 10, in step S1002, the cosine distance is computed from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified, with the calculation formula:

$$d(\mathrm{feature}_1, \mathrm{feature}_2) = 1 - \frac{\mathrm{feature}_1 \cdot \mathrm{feature}_2}{\|\mathrm{feature}_1\|\,\|\mathrm{feature}_2\|} \quad (9)$$

wherein feature₁ is the depth feature of the target pedestrian image and feature₂ is the depth feature of the pedestrian image to be identified.
As shown in fig. 10, in step S1003, the similarity between the target pedestrian image and the pedestrian image to be identified is determined according to the magnitude of the cosine distance, and the pedestrian image to be identified with the greatest similarity is the pedestrian re-identification result.
Since the similarity of the image pair composed of the target pedestrian image and a pedestrian image to be identified is negatively and linearly correlated with the feature cosine distance, the smaller the feature cosine distance, the higher the similarity of the image pair. Accordingly, the cosine distances can be computed and sorted in ascending order, i.e., the images are ranked in descending order of similarity, and the pedestrian image to be identified with the greatest similarity is taken as the pedestrian re-identification result.
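The retrieval step can be sketched as follows. L2-normalizing the features makes the dot product equal to the cosine similarity, so the distance of equation (9) reduces to 1 minus a dot product; the feature dimension 6144 (three concatenated 2048-dimensional branches) and the gallery size are assumptions.

```python
import torch
import torch.nn.functional as F

query = F.normalize(torch.randn(1, 6144), dim=1)      # target pedestrian feature
gallery = F.normalize(torch.randn(100, 6144), dim=1)  # features of images to be identified

cosine_distance = 1.0 - query @ gallery.t()  # equation (9), shape (1, 100)
ranking = cosine_distance.argsort(dim=1)     # ascending distance = descending similarity
best_match = ranking[0, 0]                   # index of the pedestrian re-ID result
```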
In summary, with the pedestrian re-identification method provided by the embodiments of the present disclosure, on the one hand, a feature-level data enhancement strategy partially erases the input feature map of the auxiliary classifier, increasing the variation of pedestrian features, countering the situation in which pedestrians are occluded, and improving the generalization capability of the deep pedestrian re-identification model. On the other hand, the spatial attention module of the disclosure integrates spatial context information into the pedestrian local features and enhances the spatial correlation of different positions of the pedestrian; together with the channel attention module it forms a complementary attention model, and the combination of the two corrects the feature map along both the channel and spatial dimensions, so that the discriminative region can be better captured. In addition, the classification model based on the fully convolutional network can directly obtain the body response heat map during forward propagation, guiding the erasure of the discriminative body region and realizing feature-level data enhancement.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1. A pedestrian re-identification method, characterized in that it comprises:
extracting a pedestrian CNN feature map from a plurality of pictures, which comprises:
randomly selecting the plurality of pictures from a training data set;
inputting the pictures into a plurality of different semantic layers of a ResNet50 model for extraction to obtain feature maps of a plurality of channels;
processing the feature maps of the plurality of channels with a channel attention module to obtain a channel-processed feature map;
and processing spatial context information of the channel-processed feature map at different positions with a spatial attention module to obtain the pedestrian CNN feature map;
performing model training by simulating, through adversarial erasure learning, the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model, which comprises:
inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, the main classifier and the auxiliary classifier each outputting a pedestrian-category-specific feature map;
the auxiliary classifier being an auxiliary classifier added on the basis of the ResNet50;
the main classifier and the auxiliary classifier comprising the same number of convolution layers and global average pooling layers, the number of channels of the convolution layers being the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map representing the body response heat map of the pedestrian image for one category;
performing partial erasure at the auxiliary classifier to obtain an erased feature map, wherein performing the partial erasure at the auxiliary classifier comprises:
determining the region of the body response heat map whose value is higher than a set adversarial erasure threshold as the discriminative region;
and erasing the part of the auxiliary classifier's pedestrian-category-specific feature map corresponding to the discriminative region in an adversarial manner in which the response values are replaced with 0;
computing a loss value through a loss function from the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier;
and updating the parameters of the training model according to the loss value;
and performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain a pedestrian re-identification result.
2. The pedestrian re-identification method of claim 1, wherein processing the feature maps of the plurality of channels with the channel attention module to obtain the channel-processed feature map comprises:
obtaining a channel feature descriptor from the feature map of each channel in the feature maps of the plurality of channels;
obtaining a channel attention feature map by an activation function operation on the channel feature descriptor;
and multiplying the channel attention feature map by the original input feature map to obtain the channel-processed feature map.
3. The pedestrian re-identification method of claim 2, wherein the feature descriptor comprises the statistics of the plurality of channels, the feature descriptor being:

$$s = [s_1, s_2, \ldots, s_N] \in \mathbb{R}^{N}$$

the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{a=1}^{A} \sum_{b=1}^{B} S_n(a, b)$$

wherein N is the number of channels, n is the channel index, and A and B are the length and width of the feature map, respectively;
the channel attention feature map being:

$$e = \sigma(W_2\,\delta(W_1 s))$$

wherein σ and δ denote the Sigmoid and ReLU activation functions, respectively, $W_1 \in \mathbb{R}^{\frac{N}{r} \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times \frac{N}{r}}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.
4. The pedestrian re-identification method of claim 1, wherein processing the spatial context information of the channel-processed feature map at different positions with the spatial attention module to obtain the pedestrian CNN feature map comprises:
performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U;
performing a matrix multiplication of the transpose of the first spatial information feature map T with the second spatial information feature map U to obtain a spatial attention feature map;
performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V;
performing a matrix multiplication of the third spatial information feature map V with the transpose of the spatial attention feature map to obtain a spatially processed feature map;
and obtaining the pedestrian CNN feature map from the channel processing and the spatial processing.
5. The pedestrian re-identification method according to claim 2, wherein performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian image to be identified, so as to obtain the pedestrian re-identification result, comprises:
inputting the target pedestrian image and the pedestrian image to be identified into the training model to obtain their corresponding depth features, respectively;
computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified;
and determining the similarity between the target pedestrian image and the pedestrian image to be identified according to the cosine distance, the pedestrian image to be identified with the greatest similarity being the pedestrian re-identification result.
6. The pedestrian re-identification method of claim 5, wherein the calculation formula for computing the cosine distance from the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified is:

$$d(\mathrm{feature}_1, \mathrm{feature}_2) = 1 - \frac{\mathrm{feature}_1 \cdot \mathrm{feature}_2}{\|\mathrm{feature}_1\|\,\|\mathrm{feature}_2\|}$$

wherein feature₁ is the depth feature of the target pedestrian image and feature₂ is the depth feature of the pedestrian image to be identified.
CN201910403777.5A 2019-05-15 2019-05-15 Pedestrian re-identification method Active CN110110689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910403777.5A CN110110689B (en) 2019-05-15 2019-05-15 Pedestrian re-identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910403777.5A CN110110689B (en) 2019-05-15 2019-05-15 Pedestrian re-identification method

Publications (2)

Publication Number Publication Date
CN110110689A CN110110689A (en) 2019-08-09
CN110110689B true CN110110689B (en) 2023-05-26

Family

ID=67490255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910403777.5A Active CN110110689B (en) 2019-05-15 2019-05-15 Pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN110110689B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516603B (en) * 2019-08-28 2022-03-18 北京百度网讯科技有限公司 Information processing method and device
CN112633459A (en) * 2019-09-24 2021-04-09 华为技术有限公司 Method for training neural network, data processing method and related device
CN112784648B (en) * 2019-11-07 2022-09-06 中国科学技术大学 Method and device for optimizing feature extraction of pedestrian re-identification system of video
CN111160096A (en) * 2019-11-26 2020-05-15 北京海益同展信息科技有限公司 Method, device and system for identifying poultry egg abnormality, storage medium and electronic device
CN111198964B (en) * 2020-01-10 2023-04-25 中国科学院自动化研究所 Image retrieval method and system
CN111461038B (en) * 2020-04-07 2022-08-05 中北大学 Pedestrian re-identification method based on layered multi-mode attention mechanism
CN111582587B (en) * 2020-05-11 2021-06-04 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111814618B (en) * 2020-06-28 2023-09-01 浙江大华技术股份有限公司 Pedestrian re-recognition method, gait recognition network training method and related devices
CN112131943B (en) * 2020-08-20 2023-07-11 深圳大学 Dual-attention model-based video behavior recognition method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT201600068348A1 (en) * 2016-07-01 2018-01-01 Octo Telematics Spa Procedure for determining the status of a vehicle by detecting the vehicle's battery voltage.
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning
CN107679483A (en) * 2017-09-27 2018-02-09 北京小米移动软件有限公司 Number plate recognition methods and device
CN107992882A (en) * 2017-11-20 2018-05-04 电子科技大学 A kind of occupancy statistical method based on WiFi channel condition informations and support vector machines
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359559B (en) * 2018-09-27 2021-11-12 天津师范大学 Pedestrian re-identification method based on dynamic shielding sample
CN109583502B (en) * 2018-11-30 2022-11-18 天津师范大学 Pedestrian re-identification method based on anti-erasure attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT201600068348A1 (en) * 2016-07-01 2018-01-01 Octo Telematics Spa Procedure for determining the status of a vehicle by detecting the vehicle's battery voltage.
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning
CN107679483A (en) * 2017-09-27 2018-02-09 北京小米移动软件有限公司 Number plate recognition methods and device
CN107992882A (en) * 2017-11-20 2018-05-04 电子科技大学 A kind of occupancy statistical method based on WiFi channel condition informations and support vector machines
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian re-identification algorithm based on sparse learning; Zhang Wenwen et al.; Journal of Data Acquisition and Processing (数据采集与处理); Vol. 33, No. 5; pp. 855-864 *

Also Published As

Publication number Publication date
CN110110689A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110110689B (en) Pedestrian re-identification method
CN109740419B (en) Attention-LSTM network-based video behavior identification method
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
Sun et al. Lattice long short-term memory for human action recognition
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN108960080B (en) Face recognition method based on active defense image anti-attack
CN110889375B (en) Hidden-double-flow cooperative learning network and method for behavior recognition
CN110378208B (en) Behavior identification method based on deep residual error network
CN112434608B (en) Human behavior identification method and system based on double-current combined network
US20220292394A1 (en) Multi-scale deep supervision based reverse attention model
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN112861970B (en) Fine-grained image classification method based on feature fusion
Zhu et al. Attentive multi-stage convolutional neural network for crowd counting
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
Cai et al. A real-time smoke detection model based on YOLO-smoke algorithm
CN114241456A (en) Safe driving monitoring method using feature adaptive weighting
Gao et al. Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition
CN110728238A (en) Personnel re-detection method of fusion type neural network
CN114492634A (en) Fine-grained equipment image classification and identification method and system
Kumar et al. Content based movie scene retrieval using spatio-temporal features
CN112528077A (en) Video face retrieval method and system based on video embedding
CN110852272B (en) Pedestrian detection method
CN116229323A (en) Human body behavior recognition method based on improved depth residual error network
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant