CN114495269A - Pedestrian re-identification method - Google Patents


Info

Publication number
CN114495269A
CN114495269A
Authority
CN
China
Prior art keywords
pedestrian
features
local
global
identification method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210034867.3A
Other languages
Chinese (zh)
Inventor
张索非
吴晓富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210034867.3A priority Critical patent/CN114495269A/en
Publication of CN114495269A publication Critical patent/CN114495269A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method comprising the following steps: inputting an image of the pedestrian to be identified into a pre-trained pedestrian re-identification model and extracting pedestrian features; matching the extracted pedestrian features against the features corresponding to each image in the gallery and outputting an identification result. The pedestrian re-identification model is constructed on an asymmetric branch network comprising 1 backbone network, 1 global branch network and 1 asymmetric local branch network. Building the model on an asymmetric branch network improves the diversity of the extracted features and thereby the identification accuracy.

Description

Pedestrian re-identification method
Technical Field
The invention relates to a pedestrian re-identification method, and belongs to the technical field of computer vision.
Background
With the wide application of deep learning in image processing and computer vision, using a pre-trained network model as the feature-extraction module has become a common means of solving this problem. In recent years, much work on pedestrian re-identification has shown that a multi-branch network is an effective feature-extraction strategy: the features extracted by different branches complement one another and greatly improve re-identification performance.
However, most existing multi-branch methods adopt a symmetric network structure and apply explicit constraints between the branch networks to guarantee the diversity of the extracted features. This makes training the pedestrian re-identification model computationally expensive and model construction inefficient.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a pedestrian re-identification method, which comprises the following steps:
inputting the image of the pedestrian to be recognized into a pre-trained pedestrian re-recognition model, and extracting the characteristics of the pedestrian;
matching the extracted pedestrian features with the features corresponding to the images in the gallery, and outputting a recognition result;
the pedestrian re-identification model is constructed based on an asymmetric branch network, and the asymmetric branch network comprises 1 trunk network, 1 global branch network and 1 asymmetric local branch network.
Further, the backbone network is ResNet50.
Further, the global branch network includes a convolution layer, a down-sampling layer, a BN layer, a residual structure and a residual module, and the step size of the convolution kernel of the down-sampling layer is 1.
Furthermore, the local branch network comprises a convolution layer, a down-sampling layer, a BN layer and a residual structure, wherein the step length of a convolution kernel of the down-sampling layer is 1, and the network weight of the local branch network is not shared.
Further, the extraction of the pedestrian features comprises the following steps:
obtaining 1 global feature from the output feature map of the global branch network through 1 global average operation;
the global features pass through a batch normalization layer to obtain normalized global features;
the output characteristic diagram of the local branch network obtains a plurality of local characteristics through 1 group of local average operations;
and the normalized global features and the plurality of local features are connected in series end to serve as the extracted pedestrian features.
Further, the normalized global features are trained by adopting a cross entropy loss function.
Further, the multiple local features are subjected to dimensionality reduction to obtain multiple shorter local features, and the multiple shorter local features are trained by adopting a cross entropy loss function.
Further, the extracted pedestrian features are trained by adopting a triple loss function.
Further, the pedestrian re-identification model comprises a lightweight attention module, and the lightweight attention module comprises a space attention sub-module and a channel attention sub-module.
Further, the spatial attention submodule employs 1 one-dimensional convolution to reduce the number of parameters.
Compared with the prior art, the invention has the following beneficial effects. The pedestrian re-identification model is constructed on an asymmetric branch network, which improves the diversity of the extracted features and thereby the identification accuracy. The stride of the down-sampling convolution kernel in both the global and local branch networks is adjusted from 2 to 1, doubling each spatial dimension of the output feature map and further improving accuracy. The backbone network additionally adopts a lightweight attention module that models the relationship between a feature and its surrounding features by comparing features at different positions and in different channels, so the extracted features become more representative and the identification accuracy improves further. Here "lightweight" means that one-dimensional convolutions are used inside the attention module to reduce the amount of computation, so that only a very small increase in computational complexity is traded for a significant performance improvement.
Drawings
FIG. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an asymmetric branch network architecture according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a lightweight spatial attention submodule according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below, and examples thereof are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary only and are not to be construed as limiting the invention.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below" and the like are understood to include it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In the description of the present invention, unless otherwise explicitly limited, terms such as "arrangement", "installation" and "connection" should be understood in a broad sense; those skilled in the art can reasonably determine their specific meanings in combination with the specific contents of the technical solutions.
In the description of the present invention, reference to "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Such terms do not necessarily refer to the same embodiment or example, and the described features, structures, materials, or characteristics may be combined in any suitable manner in one or more embodiments or examples.
The present invention provides a pedestrian re-identification method, which is further described with reference to the accompanying drawings and embodiments.
As shown in fig. 1, the present embodiment provides a pedestrian re-identification method comprising: inputting the image of the pedestrian to be identified into a pedestrian re-identification model, extracting pedestrian features, matching the extracted pedestrian features against the corresponding features of each image in the gallery, and outputting an identification result.
In this embodiment, the pedestrian re-identification model is built with a convolutional neural network as follows: obtain historical pedestrian images, construct a pedestrian re-identification training data set, and train the convolutional neural network on this data set to obtain the pedestrian re-identification model.
As shown in fig. 2, the convolutional neural network in this embodiment is an asymmetric branch network comprising 1 backbone network, 1 global branch network, and 1 asymmetric local branch network.
The backbone network adopts ResNet50 for feature extraction. ResNet50 comprises layer 0 through layer 3, and the output feature map of layer 3 is fed simultaneously into the global branch network and the asymmetric local branch network.
The global branch network comprises a convolution layer, a down-sampling layer, a BN layer, a residual structure and a residual module; the stride of the convolution kernel of its down-sampling layer is set to 1, and its residual module is composed of the down-sampling layer and the convolution layer. The output feature map of the global branch network is reduced to 1 global feature through 1 global average operation; this global feature is passed through a batch normalization layer to obtain the normalized global feature, which is trained with a cross-entropy loss function.
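The effect of setting the down-sampling stride to 1 can be checked with the standard convolution output-size formula. A minimal sketch (the 3×3 kernel, padding of 1, and the 16×8 input map for a typical 256×128 pedestrian crop are assumptions for illustration):

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1

h, w = 16, 8  # assumed input to the branch's down-sampling layer
print(conv_out(h), conv_out(w))                      # 8 4  -> stride 2 halves the map
print(conv_out(h, stride=1), conv_out(w, stride=1))  # 16 8 -> stride 1 keeps full resolution
```

With stride 1 each spatial dimension of the output feature map is twice what stride 2 would give, which is the resolution gain the invention exploits.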
The local branch network comprises a convolution layer, a down-sampling layer, a BN layer and a residual structure; the stride of the convolution kernel of its down-sampling layer is set to 1, and the network weights of the local branch network are not shared. The output feature map of the local branch network yields a plurality of local features through 1 group of local average operations; these comprise an upper local feature, a middle local feature and a lower local feature.
The normalized global features and the plurality of local features are connected in series end to end and serve as extracted pedestrian features, and the extracted pedestrian features are trained by adopting a triple loss function.
The plurality of local features are passed through a dimensionality-reduction layer to obtain a plurality of shorter local features, which are then trained with a cross-entropy loss function.
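The two-branch feature-extraction pipeline just described can be sketched in NumPy as follows. The toy shapes, the stripe-wise form of the local average operations, and the inference-style normalization are assumptions for illustration; a real batch-normalization layer would use learned per-channel statistics.

```python
import numpy as np

def extract_features(global_map, local_map, n_parts=3, eps=1e-5):
    """Sketch: global_map and local_map are (C, H, W) branch output feature maps."""
    # 1 global average operation -> 1 global feature of length C
    f_g = global_map.mean(axis=(1, 2))
    # batch-normalization layer (simplified inference form; assumption)
    f_b = (f_g - f_g.mean()) / np.sqrt(f_g.var() + eps)
    # 1 group of local average operations: upper / middle / lower stripes along H
    stripes = np.array_split(local_map, n_parts, axis=1)
    f_p = [s.mean(axis=(1, 2)) for s in stripes]  # n_parts vectors of length C
    # concatenate normalized global feature and local features end to end
    return np.concatenate([f_b] + f_p)

feat = extract_features(np.random.rand(8, 12, 4), np.random.rand(8, 12, 4))
print(feat.shape)  # (32,) = (1 global + 3 local) x 8 channels
```

The concatenated vector is what the triplet loss is trained on, while the cross-entropy losses act on the global and (reduced) local parts separately.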
The backbone network is also provided with a lightweight attention module comprising a spatial attention submodule (i.e. a position attention module) and a channel attention submodule (i.e. a channel attention module), connected in series to form the complete attention module. As shown in fig. 3, the spatial attention submodule reduces the number of parameters by using a specially designed one-dimensional convolution, as follows. The submodule first straightens the input feature X from dimension C×H×W to dimension C×S, where C is the number of feature-map channels, H the height of the feature map, W its width, and S = H×W the size of the straightened feature. The straightened feature X is fed into a single-channel one-dimensional convolution filter with kernel width 5 and stride 1 to obtain the convolved feature Q, and into another single-channel one-dimensional convolution filter with kernel width 5 and stride 1 to obtain the convolved feature K. The feature K is transposed and multiplied by the feature Q, and a softmax function yields the attention map A:
A = softmax(K^T Q)    (1)
where ^T denotes the matrix transpose.
The feature X is also fed into a two-dimensional convolution filter with kernel width 1 and stride 1, whose numbers of input and output channels are both C, to obtain the convolved feature V. Multiplying the feature V by the attention map A gives the attention-reweighted feature map, which is added to the original feature through a residual structure and output.
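A minimal NumPy sketch of this spatial attention submodule (the random stand-in weights, the softmax axis, and the "same" padding of the 1-D convolutions are assumptions for illustration):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_single(x, w):
    """Single-channel 1-D convolution (kernel width 5, stride 1),
    applied row-wise to the straightened (C, S) feature."""
    pad = len(w) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    return np.stack([np.convolve(row, w, mode="valid") for row in xp])

def spatial_attention(X, wq, wk, Wv):
    C, H, W = X.shape
    Xs = X.reshape(C, H * W)            # straighten: C x S, S = H*W
    Q = conv1d_single(Xs, wq)           # C x S
    K = conv1d_single(Xs, wk)           # C x S
    A = softmax(K.T @ Q, axis=0)        # S x S attention map, eq. (1)
    V = Wv @ Xs                         # 1x1 conv = channel-mixing matmul
    return (V @ A).reshape(C, H, W) + X # re-weight, then residual add

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6, 3))
Y = spatial_attention(X, rng.standard_normal(5), rng.standard_normal(5),
                      rng.standard_normal((4, 4)))
print(Y.shape)  # (4, 6, 3)
```

Because Q and K each need only a width-5 one-dimensional kernel instead of full C-channel projections, the parameter count of the submodule stays very small.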
The loss function used by the asymmetric branch network during model training is expressed as follows:
L = L_ce(W_b f_b, y) + Σ_{n=1}^{3} L_ce(W_s^(n) f_s^(n), y) + λ L_t(f_g ⊙ f_p)    (2)
where L_ce denotes the cross-entropy loss function and L_t the triplet loss function; W_b and W_s^(n) denote the corresponding fully connected layer parameters for the cross-entropy losses; n is the block index of the local branch, generally taking values 1 to 3; y is the identity label of the pedestrian sample, and i and j denote the serial numbers of different samples; λ is a weight constant, which can be set directly to 1; and ⊙ denotes the concatenation of feature vectors. For the global branch, in the training phase the global feature in the triplet loss is taken from the global feature f_g before normalization, while the global feature in the cross-entropy loss is taken from the normalized global feature f_b; in the testing phase, the normalized f_b replaces f_g in the concatenated model output (the BNNeck structure in fig. 1). For the local branches, f_p denotes the local features obtained after average pooling and f_s the shortened local features obtained after the dimensionality-reduction layer.
The pedestrian feature finally output by the recognition model is f_b ⊙ f_p. As shown in Table 1, the asymmetric network structure provided by the invention effectively improves the diversity of the features extracted by the different branches, thereby improving the overall identification accuracy of the system.
TABLE 1 Performance comparison of the latest approach on four common pedestrian re-identification datasets
Table 1 compares different pedestrian re-identification methods on the four most common standard data sets. Measured by the two standard evaluation indices of the pedestrian re-identification task, mAP and Rank-1, the proposed method shows a clear advantage over the second-ranked method on every data set, except for a slight lag in the Rank-1 index on the Market1501 data set.
As will be appreciated by one skilled in the art, the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, and may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
While the present application has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application, it will be understood that each flowchart illustration and/or block diagram block and combination of flowchart illustrations and/or block diagram blocks can be implemented by computer program instructions, which may be provided to a processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart illustration, flow or blocks, and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A pedestrian re-identification method, characterized in that the method comprises the steps of:
inputting the image of the pedestrian to be recognized into a pre-trained pedestrian re-recognition model, and extracting the characteristics of the pedestrian;
matching the extracted pedestrian features with features corresponding to the images in the image library, and outputting an identification result;
the pedestrian re-identification model is constructed based on an asymmetric branch network, and the asymmetric branch network comprises 1 trunk network, 1 global branch network and 1 asymmetric local branch network.
2. The pedestrian re-identification method according to claim 1, wherein the backbone network is ResNet50.
3. The pedestrian re-identification method of claim 1, wherein the global branching network comprises a convolution layer, a downsampling layer, a BN layer, a residual structure, and a residual module, wherein the downsampling layer convolution kernel step size is 1.
4. The pedestrian re-identification method according to claim 1, wherein the local branch network includes a convolution layer, a down-sampling layer, a BN layer and a residual structure, the step size of the convolution kernel of the down-sampling layer is 1, and the network weights of the local branch network are not shared.
5. The pedestrian re-identification method according to claim 1, wherein the extraction of the pedestrian feature includes the steps of:
obtaining 1 global feature from the output feature map of the global branch network through 1 global average operation;
the global features pass through a batch normalization layer to obtain normalized global features;
the output feature map of the local branch network yields a plurality of local features through 1 group of local average operations;
and the normalized global features and the plurality of local features are connected in series end to serve as the extracted pedestrian features.
6. The pedestrian re-identification method of claim 5, wherein the normalized global features are trained using a cross entropy loss function.
7. The pedestrian re-identification method according to claim 5, wherein the plurality of local features are passed through a dimensionality reduction layer to obtain a plurality of shorter local features, and the plurality of shorter local features are trained using a cross entropy loss function.
8. The pedestrian re-identification method of claim 5, wherein the extracted pedestrian features are trained using a triple loss function.
9. The pedestrian re-identification method according to claim 1, wherein the pedestrian re-identification model comprises a lightweight attention module including a spatial attention sub-module and a channel attention sub-module.
10. The pedestrian re-identification method of claim 9, wherein the spatial attention sub-module employs 1 one-dimensional convolution to reduce the number of parameters.
CN202210034867.3A 2022-01-13 2022-01-13 Pedestrian re-identification method Pending CN114495269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210034867.3A CN114495269A (en) 2022-01-13 2022-01-13 Pedestrian re-identification method


Publications (1)

Publication Number Publication Date
CN114495269A true CN114495269A (en) 2022-05-13

Family

ID=81511359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210034867.3A Pending CN114495269A (en) 2022-01-13 2022-01-13 Pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN114495269A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114792315A (en) * 2022-06-22 2022-07-26 浙江太美医疗科技股份有限公司 Medical image visual model training method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN112257794B (en) YOLO-based lightweight target detection method
CN111882040B (en) Convolutional neural network compression method based on channel number search
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN110738146A (en) target re-recognition neural network and construction method and application thereof
CN111310598B (en) Hyperspectral remote sensing image classification method based on 3-dimensional and 2-dimensional mixed convolution
CN111860683B (en) Target detection method based on feature fusion
Li et al. Data-driven neuron allocation for scale aggregation networks
CN112926641A (en) Three-stage feature fusion rotating machine fault diagnosis method based on multi-modal data
CN113066065B (en) No-reference image quality detection method, system, terminal and medium
CN107679539B (en) Single convolution neural network local information and global information integration method based on local perception field
CN112580480B (en) Hyperspectral remote sensing image classification method and device
CN113569672A (en) Lightweight target detection and fault identification method, device and system
CN108363962B (en) Face detection method and system based on multi-level feature deep learning
CN109919084A (en) A kind of pedestrian's recognition methods again more indexing Hash based on depth
CN111160378A (en) Depth estimation system based on single image multitask enhancement
CN116958687A (en) Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR
CN115937693A (en) Road identification method and system based on remote sensing image
CN108537235A (en) A kind of method of low complex degree scale pyramid extraction characteristics of image
CN115713755A (en) Efficient and accurate image identification method for Spodoptera frugiperda
CN111898614B (en) Neural network system and image signal and data processing method
CN114495269A (en) Pedestrian re-identification method
CN116977747B (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN113705394A (en) Behavior identification method combining long and short time domain features
CN116777842A (en) Light texture surface defect detection method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination