CN114792430A - Pedestrian re-identification method, system and related equipment based on polarization self-attention - Google Patents


Info

Publication number
CN114792430A
Authority
CN
China
Prior art keywords
pedestrian
attention
self
branch
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210462489.9A
Other languages
Chinese (zh)
Inventor
闫潇宁
陈晓艳
杨坤志
张东洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Anruan Huishi Technology Co ltd
Original Assignee
Shenzhen Anruan Huishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Anruan Huishi Technology Co ltd
Priority to CN202210462489.9A
Publication of CN114792430A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention is applicable to the field of computer vision and provides a pedestrian re-identification method, system and related equipment based on polarized self-attention. The method comprises: acquiring a shooting data set containing pedestrian pictures, and preprocessing the shooting data set to obtain a data set to be divided; dividing the data set to be divided into a training set and a test set, wherein each pedestrian picture has a real label; constructing a pedestrian re-identification model comprising a dual-branch structure and a polarized self-attention mechanism structure; training the pedestrian re-identification model with the Adam optimization algorithm, taking the training set as input and the corresponding real labels as the reference for parameter tuning, to obtain training parameter weights; and taking the training parameter weights as test weights and the test set as the input of the pedestrian re-identification model, to obtain a pedestrian re-identification result for the test set. The invention reduces information loss during feature extraction and improves the accuracy of pedestrian re-identification by combining global and local feature extraction.

Description

Pedestrian re-identification method and system based on polarization self-attention and related equipment
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a pedestrian re-identification method and system based on polarization self-attention and related equipment.
Background
In recent years, growing demand in intelligent security and video surveillance has drawn increasing attention to research on pedestrian re-identification. In video surveillance, factors such as low camera resolution, insufficient lighting, poor camera angles and occlusion by objects make it difficult to capture clear facial information, so pedestrians are hard to identify from their faces. Pedestrian re-identification can be regarded as a picture retrieval task: computer vision techniques are used to determine whether a specific pedestrian is present in a given picture or video sequence, which avoids relying on the facial information that these factors make unreliable.
Pedestrian re-identification is generally implemented with deep learning: a convolutional neural network extracts pedestrian features, and pedestrians with the same features are then matched across different pictures. Existing pedestrian re-identification methods mainly consider the coarse-grained features of the whole pedestrian picture, that is, the overall features of the pedestrian, and pay little attention to fine-grained features such as hairstyle, clothing color and whether a bag is carried. As a result, existing methods extract pedestrian features insufficiently and their accuracy is low.
Disclosure of Invention
The embodiment of the invention provides a pedestrian re-identification method, system and related equipment based on polarized self-attention, and aims to solve the problem that traditional pedestrian re-identification methods insufficiently extract pedestrian features of different granularities, which leads to low accuracy.
In a first aspect, an embodiment of the present invention provides a method for re-identifying a pedestrian based on polarized self-attention, where the method includes:
acquiring a shooting data set with a pedestrian picture, and preprocessing the shooting data set to obtain a data set to be divided;
dividing the data set to be divided into a training set and a test set, wherein each pedestrian picture in the training set and the test set has a real label;
constructing a pedestrian re-identification model comprising a dual-branch structure and a polarized self-attention mechanism structure;
taking the training set as the input of the pedestrian re-identification model, taking the real labels corresponding to the training set as the reference for parameter tuning, and training the pedestrian re-identification model with the Adam optimization algorithm to obtain training parameter weights;
and taking the training parameter weights as the test weights of the pedestrian re-identification model, and taking the test set as the input of the pedestrian re-identification model to obtain the pedestrian re-identification result for the test set.
Further, in the step of acquiring a shooting data set containing pedestrian pictures and preprocessing the shooting data set to obtain a data set to be divided, the preprocessing specifically includes:
normalizing the size of each picture in the shooting data set, and applying data augmentation by flipping, random cropping and random erasing to each picture in the shooting data set.
Furthermore, the pedestrian re-identification model takes a convolutional neural network as a feature extraction network, the convolutional neural network comprises an input layer, a convolutional layer, a feature extraction layer and an output layer, wherein the convolutional layer comprises a plurality of layers, the double-branch structure comprises a global branch and a local branch, the double-branch structure is positioned between the convolutional layer and the output layer, the polarization self-attention mechanism structure comprises a channel self-attention branch and a space self-attention branch, and the polarization self-attention mechanism structure is positioned after each convolutional layer.
Further, for the polarized self-attention mechanism structure, the feature matrix of the pedestrian picture output by the convolutional layer is defined as X and the weight of the spatial self-attention branch is defined as $A^{sp}(X)$; then $A^{sp}(X)$ satisfies relation (1):
$$A^{sp}(X) = F_{SG}\left[\sigma_3\left(F_{SM}\left(\sigma_1\left(F_{GP}\left(W_q(X)\right)\right)\right) \times \sigma_2\left(W_v(X)\right)\right)\right] \tag{1}$$
the weight of the channel self-attention branch is defined as $A^{ch}(X)$; then $A^{ch}(X)$ satisfies relation (2):
[Relation (2) is provided only as an image in the original publication: Figure BDA0003620269850000031]
in relations (1) and (2) above, $W_q$, $W_v$ and a third operator (shown only as image BDA0003620269850000032 in the original publication) are all 1×1 convolution operations; $\sigma_1$, $\sigma_2$ and $\sigma_3$ are all tensor reshape operations; $F_{SM}$ is the softmax operation; $F_{GP}$ is the global average pooling operation; $F_{SG}$ is the Sigmoid function; and the × operator is the matrix dot product;
the result of fusing the weight $A^{sp}(X)$ of the spatial self-attention branch and the weight $A^{ch}(X)$ of the channel self-attention branch in parallel is defined as $PSA_p(X)$; then $PSA_p(X)$ satisfies relation (3):
$$PSA_p(X) = Z^{ch} + Z^{sp} = A^{ch}(X) \odot^{ch} X + A^{sp}(X) \odot^{sp} X \tag{3}$$
in relation (3) above, $PSA_p(X)$ is the output of the polarized self-attention mechanism structure, $\odot^{ch}$ is the channel-wise multiplication operation, $\odot^{sp}$ is the spatial-wise multiplication operation, $Z^{sp}$ is the output of the spatial self-attention branch, and $Z^{ch}$ is the output of the channel self-attention branch.
Furthermore, for the dual-branch structure, the global branch has the same structure as the feature extraction layer of the convolutional neural network; it takes the parallel fusion result output by the polarized self-attention mechanism structure as input and extracts the global feature corresponding to the pedestrian picture. The local branch sits at the same level as the global branch, in parallel with it; it takes the same parallel fusion result as input and extracts the local feature corresponding to the pedestrian picture.
Further, the output layer of the pedestrian re-identification model takes the fusion result of the global feature and the local feature as an output result.
Still further, the loss function used by the pedestrian re-identification model in the training process comprises at least one of a triplet loss, a classification loss and a center loss.
In a second aspect, an embodiment of the present invention further provides a pedestrian re-identification system based on polarized self-attention, including:
a preprocessing module, configured to acquire a shooting data set containing pedestrian pictures and preprocess the shooting data set to obtain a data set to be divided;
a data dividing module, configured to divide the data set to be divided into a training set and a test set, wherein each pedestrian picture in the training set and the test set has a real label;
a model construction module, configured to construct a pedestrian re-identification model comprising a dual-branch structure and a polarized self-attention mechanism structure;
a model training module, configured to take the training set as the input of the pedestrian re-identification model, take the real labels corresponding to the training set as the reference for parameter tuning, and train the pedestrian re-identification model with the Adam optimization algorithm to obtain training parameter weights;
and a data identification module, configured to take the training parameter weights as the test weights of the pedestrian re-identification model and take the test set as the input of the pedestrian re-identification model to obtain the pedestrian re-identification result for the test set.
In a third aspect, an embodiment of the present invention further provides a computer device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for pedestrian re-identification based on polarized self-attention as described in any one of the above embodiments when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps in the method for pedestrian re-identification based on polarized self-attention as described in any one of the above embodiments.
The beneficial effects of the invention are as follows: because a two-dimensional polarized self-attention mechanism structure is adopted, information from different feature spaces can be learned adaptively and the information loss caused by dimension reduction is reduced; at the same time, global features and local features are extracted separately by the dual-branch network, which further improves the recognition accuracy of the pedestrian re-identification model.
Drawings
FIG. 1 is a block flow diagram illustrating steps of a method for re-identifying a pedestrian based on polarized self-attention according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a pedestrian re-identification model provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a polarization self-attention mechanism configuration provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a pedestrian re-identification system 200 based on polarized self-attention according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to fig. 1, fig. 1 is a flow chart of steps of a pedestrian re-identification method based on polarized self-attention according to an embodiment of the present invention, which specifically includes the following steps:
S101, acquiring a shooting data set containing pedestrian pictures, and preprocessing the shooting data set to obtain a data set to be divided.
In this step, the preprocessing specifically comprises:
normalizing the size of each picture in the shooting data set, and applying data augmentation by flipping, random cropping and random erasing to each picture in the shooting data set.
For example, the shooting data set selected in the embodiment of the present invention may be a public Market1501 data set, the Market1501 data set includes 1501 pedestrian targets captured through camera shooting, each pedestrian target further includes a plurality of pedestrian images in the data set, and in step S101, the pedestrian images in the Market1501 data set are preprocessed in manners of normalization, flipping, random cropping, random erasing, and the like, so as to obtain the data set to be divided.
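The preprocessing of step S101 can be illustrated with a minimal sketch. It treats a picture as a nested list of grayscale values; the function names, the target size, the crop margin and the erased patch size are all illustrative assumptions and are not taken from the embodiment.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Normalize the picture size with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def horizontal_flip(img):
    """Mirror every row of the picture."""
    return [row[::-1] for row in img]

def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

def random_erase(img, erase_h, erase_w, rng, fill=0):
    """Overwrite a random erase_h x erase_w patch with a constant value."""
    top = rng.randint(0, len(img) - erase_h)
    left = rng.randint(0, len(img[0]) - erase_w)
    out = [row[:] for row in img]
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            out[r][c] = fill
    return out

def preprocess(img, rng, out_h=8, out_w=4, margin=2, flip_prob=0.5, erase=2):
    """Resize slightly larger, crop to the target size, maybe flip, then erase."""
    img = resize_nearest(img, out_h + margin, out_w + margin)
    img = random_crop(img, out_h, out_w, rng)
    if rng.random() < flip_prob:
        img = horizontal_flip(img)
    return random_erase(img, erase, erase, rng)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when comparing training runs.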
S102, dividing the data set to be divided into a training set and a test set, wherein each pedestrian picture in the training set and the test set has a real label.
Specifically, the data set to be divided is split into the training set and the test set. The Market1501 data set used in the embodiment of the present invention is split, according to its existing content, into a training set and a test set containing different pedestrian targets: the training set contains 751 pedestrian targets with 12936 pedestrian pictures, and the test set contains 750 pedestrian targets with 19732 pedestrian pictures. Each pedestrian picture has a real label, which marks a certain feature expressed in that picture.
Preferably, the training set is divided into N batches, each batch includes P different pedestrian targets, and each pedestrian target corresponds to K pedestrian pictures, that is, each batch includes B = P × K pedestrian pictures as training samples.
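The batch layout B = P × K can be sketched as follows. The helper name `pk_batches` is hypothetical, and two behaviours are assumptions made only for this illustration: pedestrian targets with fewer than K pictures are skipped, and leftover targets that do not fill a complete batch are dropped.

```python
import random
from collections import defaultdict

def pk_batches(labels, p, k, rng):
    """Group sample indices into batches of P identities x K pictures each."""
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    # Assumption: identities with fewer than K pictures are skipped.
    ids = [pid for pid, idxs in by_id.items() if len(idxs) >= k]
    rng.shuffle(ids)
    batches = []
    # Assumption: identities that do not fill a whole batch are dropped.
    for start in range(0, len(ids) - p + 1, p):
        batch = []
        for pid in ids[start:start + p]:
            batch.extend(rng.sample(by_id[pid], k))  # K distinct pictures
        batches.append(batch)
    return batches
```

With 8 identities of 5 pictures each, `pk_batches(labels, 4, 3, random.Random(0))` yields 2 batches of 12 indices.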
S103, constructing a pedestrian re-identification model comprising a double-branch structure and a polarized self-attention mechanism structure.
Specifically, referring to FIG. 2, which is a schematic structural diagram of the pedestrian re-identification model provided by an embodiment of the present invention, the pedestrian re-identification model uses a convolutional neural network (CNN) as its feature extraction network. The convolutional neural network comprises an input layer, convolutional layers, a feature extraction layer and an output layer, with a plurality of convolutional layers. On the basis of this network, the dual-branch structure, which comprises a global branch and a local branch, is placed at the position of the feature extraction layer, that is, between the convolutional layers and the output layer of the overall network.
In the embodiment of the invention, the global branch has the same structure as the original feature extraction layer of the convolutional neural network, that is, it performs the original extraction of the overall features of the pedestrian picture: after two groups of unified convolution operations, global max pooling is used to obtain output data with the same size as the original input data. The local branch can be regarded as a structure additionally added between the convolutional layers and the output layer. Unlike the global branch, it contains no global max pooling structure; instead, after two groups of convolution operations, it divides the convolved image data into smaller local images and then extracts the features of those local images, so as to capture fine-grained features that are otherwise hard to extract. Preferably, the local branch may comprise a plurality of structures for dividing local pictures and extracting local features. When the pedestrian picture is large enough, or when the pictures contain many pedestrian targets in a complex environment, the local features extracted by the plurality of local branches are spliced to obtain output data with the same size as the original input data, the loss is calculated with the same loss function as for the output of the global branch, and the resulting feature extraction is better. The dual-branch structure of the embodiment is used only for illustration; a multi-branch neural network structure based on the embodiment of the present invention also falls within the protection scope of the present invention.
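The difference between the two branches can be sketched on a small C × H × W feature map stored as nested lists. The embodiment does not state how the convolved data is divided or how each local image is summarized, so the horizontal stripes and the per-stripe average pooling below are assumptions chosen only for illustration, as are the function names.

```python
def global_max_pool(fmap):
    """Global branch: one maximum per channel of a C x H x W feature map."""
    return [max(max(row) for row in channel) for channel in fmap]

def split_stripes(fmap, parts):
    """Split the feature map into `parts` horizontal stripes (assumed layout)."""
    height = len(fmap[0])
    bounds = [height * i // parts for i in range(parts + 1)]
    return [[channel[bounds[i]:bounds[i + 1]] for channel in fmap]
            for i in range(parts)]

def local_features(fmap, parts):
    """Local branch: average each stripe per channel and splice the results."""
    feats = []
    for stripe in split_stripes(fmap, parts):
        for channel in stripe:
            values = [v for row in channel for v in row]
            feats.append(sum(values) / len(values))
    return feats
```

The spliced vector has `parts × C` entries, so each stripe contributes its own description of the pedestrian picture.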
In an embodiment of the present invention, the polarized self-attention mechanism structure comprises a channel self-attention branch and a spatial self-attention branch, and is located after each convolution layer of the convolutional neural network.
Specifically, referring to FIG. 3, which is a schematic diagram of the polarized self-attention mechanism structure provided by an embodiment of the present invention, C, H and W in FIG. 3 denote the number of channels, the height and the width of the pedestrian picture, respectively. For the polarized self-attention mechanism structure, the feature matrix of the pedestrian picture output by the convolutional layer is defined as X and the weight of the spatial self-attention branch is defined as $A^{sp}(X)$; then $A^{sp}(X)$ satisfies relation (1):
$$A^{sp}(X) = F_{SG}\left[\sigma_3\left(F_{SM}\left(\sigma_1\left(F_{GP}\left(W_q(X)\right)\right)\right) \times \sigma_2\left(W_v(X)\right)\right)\right] \tag{1}$$
the weight of the channel self-attention branch is defined as $A^{ch}(X)$; then $A^{ch}(X)$ satisfies relation (2):
[Relation (2) is provided only as an image in the original publication: Figure BDA0003620269850000071]
in relations (1) and (2) above, $W_q$, $W_v$ and a third operator (shown only as image BDA0003620269850000072 in the original publication) are all 1×1 convolution operations; $\sigma_1$, $\sigma_2$ and $\sigma_3$ are all tensor reshape operations; $F_{SM}$ is the softmax operation; $F_{GP}$ is the global average pooling operation; and $F_{SG}$ is the Sigmoid function;
the result of fusing the weight $A^{sp}(X)$ of the spatial self-attention branch and the weight $A^{ch}(X)$ of the channel self-attention branch in parallel is defined as $PSA_p(X)$; then $PSA_p(X)$ satisfies relation (3):
$$PSA_p(X) = Z^{ch} + Z^{sp} = A^{ch}(X) \odot^{ch} X + A^{sp}(X) \odot^{sp} X \tag{3}$$
in relation (3) above, $PSA_p(X)$ is the output of the polarized self-attention mechanism structure, $\odot^{ch}$ is the channel-wise multiplication operation, $\odot^{sp}$ is the spatial-wise multiplication operation, $Z^{sp}$ is the output of the spatial self-attention branch, and $Z^{ch}$ is the output of the channel self-attention branch.
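Relations (1) and (3) can be traced with a small NumPy sketch on a single C × H × W feature map. `spatial_weight` follows relation (1) term by term and returns a 1 × H × W weight applied by spatial multiplication. Relation (2) is available only as an image, so `channel_weight` is an assumed form (a softmax over positions followed by a 1×1 convolution back to C channels) and should not be read as the formula of the embodiment. All function names, the C/2 intermediate channel count and the random weights are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight):
    """1x1 convolution: x is C_in x H x W, weight is C_out x C_in."""
    return np.einsum("oc,chw->ohw", weight, x)

def spatial_weight(x, w_q, w_v):
    """Relation (1): returns A_sp(X) with shape 1 x H x W."""
    _, h, w = x.shape
    q = conv1x1(x, w_q).mean(axis=(1, 2))     # F_GP: one value per channel
    q = softmax(q.reshape(1, -1), axis=1)     # sigma_1, then F_SM
    v = conv1x1(x, w_v).reshape(-1, h * w)    # sigma_2: channels x (H*W)
    return sigmoid((q @ v).reshape(1, h, w))  # sigma_3, then F_SG

def channel_weight(x, w_q, w_v, w_z):
    """Assumed stand-in for relation (2): returns A_ch(X) with shape C x 1 x 1."""
    _, h, w = x.shape
    q = softmax(conv1x1(x, w_q).reshape(h * w, 1), axis=0)  # (H*W) x 1
    v = conv1x1(x, w_v).reshape(-1, h * w)                  # channels x (H*W)
    z = (v @ q).reshape(-1, 1, 1)
    return sigmoid(conv1x1(z, w_z))                         # back to C channels

def psa_parallel(x, params):
    """Relation (3): Z_ch + Z_sp, with broadcasting as the two multiplications."""
    a_ch = channel_weight(x, *params["ch"])
    a_sp = spatial_weight(x, *params["sp"])
    return a_ch * x + a_sp * x
```

Because both weights pass through a Sigmoid, every entry lies between 0 and 1, and the output keeps the shape of X, which is what allows the structure to be placed after each convolutional layer.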
For the dual-branch structure, the global branch has the same structure as the feature extraction layer of the convolutional neural network; it takes the parallel fusion result output by the polarized self-attention mechanism structure as input and extracts the global feature corresponding to the pedestrian picture. The local branch sits at the same level as the global branch, in parallel with it; it takes the same parallel fusion result as input and extracts the local feature corresponding to the pedestrian picture.
Specifically, in the pedestrian re-identification model constructed in the embodiment of the present invention, relative to the original structure of the convolutional neural network, the polarized self-attention mechanism structure acts on the convolutional layers and the dual-branch structure acts on the feature extraction layer. The output of the convolutional layers, after the polarized self-attention mechanism structure has been applied, serves as the input of the dual-branch structure as a whole. Within the dual-branch structure there is no structure for passing information directly between the global branch and the local branch; at the data input level both branches receive the same input and apply different feature extraction methods to it.
And the output layer of the pedestrian re-identification model takes the fusion result of the global features and the local features as an output result.
It should be noted that the convolutional neural network refers to any neural network containing a convolutional structure. When a commonly used feature extraction network such as ResNet (deep residual network) or OSNet (omni-scale network) serves as the feature extraction network, the polarized self-attention mechanism structure described in the embodiment of the present invention may be used within its convolutional structure, and the dual-branch structure may be used at the end of its convolutional structure. Both structures achieve the same technical effect as in the embodiment by taking part in the convolution computation and fusing the features of the different branches through the output layer. The embodiment therefore does not limit the type of the underlying convolutional neural network: any neural network model that is built on a feature extraction network for tasks such as target classification and scene segmentation may use the structures of the embodiment and falls within the protection scope of the invention.
S104, taking the training set as the input of the pedestrian re-identification model, taking the real labels corresponding to the training set as the reference for parameter tuning, and training the pedestrian re-identification model with the Adam optimization algorithm to obtain the training parameter weights.
Specifically, using the real labels corresponding to the training set as the reference for parameter tuning means that, during training, the real labels serve as the evaluation criterion for the re-identification effect of the current model. In the embodiment of the present invention, the real label refers to a feature expressed by the pedestrian target in the pedestrian picture, and how well the real labels are re-identified reflects how well the overall features of the pedestrian target are re-identified.
Preferably, the loss function used by the pedestrian re-identification model in the training process includes at least one of a triplet loss, a classification loss and a center loss. In an embodiment of the present invention, the loss function includes all three, and the loss is computed on the data output after the global-branch and local-branch features are fused. The training parameter weights refer to the weight $A^{sp}(X)$ of the spatial self-attention branch and the weight $A^{ch}(X)$ of the channel self-attention branch in the above embodiment, together with the parallel fusion result $PSA_p(X)$.
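The three loss terms can be written out in their common textbook forms, as a sketch only. The embodiment does not give the margin, the weighting between terms or how triplets are formed, so the default `margin`, the `center_weight` factor and the single-triplet interface below are assumptions, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the same identity closer than a different one by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def cross_entropy(logits, label):
    """Classification loss over identity logits (numerically stable log-softmax)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def center_loss(feature, center):
    """Half the squared distance between a feature and its identity center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(feature, center))

def total_loss(anchor, positive, negative, logits, label, center,
               margin=0.3, center_weight=0.0005):
    """Assumed combination: plain sum with a small weight on the center term."""
    return (triplet_loss(anchor, positive, negative, margin)
            + cross_entropy(logits, label)
            + center_weight * center_loss(anchor, center))
```

Here `anchor` stands for the fused global and local feature of one picture, `positive` and `negative` for fused features of the same and of a different pedestrian target.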
The number of training iterations for the pedestrian re-identification model is related to the number of batches into which the training set is divided. In the embodiment of the invention, the training set is divided into N batches and the model is trained iteratively at least N times, so that training covers the portions of the training set corresponding to all of the batches. During training, the parameters of the pedestrian re-identification model are optimized with an Adam optimizer.
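For reference, one Adam update keeps running averages of the gradient and of its square, corrects both for their zero initialization, and scales the step by their ratio. The sketch below uses the customary default hyperparameters, which are assumptions here because the embodiment does not list its settings; `init_state` and `adam_step` are hypothetical helper names.

```python
import math

def init_state(n):
    """Zero-initialized step counter, first moments and second moments."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one in-place Adam update to `params` given `grads`."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # first moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # second moment
        m_hat = state["m"][i] / (1 - beta1 ** t)                     # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

In training, `grads` would be the gradient of the loss on one batch, so one call corresponds to one of the N batch iterations described above.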
S105, taking the training parameter weights as the test weights of the pedestrian re-identification model, and taking the test set as the input of the pedestrian re-identification model to obtain the pedestrian re-identification result for the test set.
In the embodiment of the present invention, the training parameter weights refer to the weight parameters in the polarized self-attention mechanism structure. During training, continuous iterative optimization moves each of these weight parameters toward the value that gives the best re-identification effect. The test weights take the parameter values obtained when training is complete. The test set is then used as the input of the pedestrian re-identification model trained in step S104, yielding the pedestrian re-identification result for the pedestrian targets in the test set.
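Since pedestrian re-identification is treated as a picture retrieval task, one plausible way to turn the features of the test set into a re-identification result is to rank gallery pictures by their distance to a query picture. The embodiment does not specify the distance measure, so the L2 normalization and Euclidean distance below are assumptions, and `rank_gallery` is a hypothetical helper name.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def rank_gallery(query, gallery):
    """Return gallery indices ordered from the closest feature to the farthest."""
    q = l2_normalize(query)
    scored = []
    for idx, feat in enumerate(gallery):
        g = l2_normalize(feat)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, g)))
        scored.append((dist, idx))
    return [idx for _, idx in sorted(scored)]
```

The first index in the returned list is the gallery picture judged most likely to show the same pedestrian target as the query.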
For example, please refer to table 1, where table 1 is a comparison result between the pedestrian re-identification method provided by the embodiment of the present invention and other disclosed algorithms.
TABLE 1 Comparison of the pedestrian re-identification method with other public algorithms
[Table 1 is provided only as images in the original publication: Figures BDA0003620269850000091 and BDA0003620269850000101]
mAP and Rank-1 are common evaluation metrics in the field of pedestrian re-identification, where mAP (mean average precision) represents identification accuracy and Rank-1 represents the first-hit rate. The data in Table 1 show that, compared with existing pedestrian re-identification algorithms, the pedestrian re-identification model based on the polarized self-attention mechanism structure and the dual-branch structure provided by the embodiment of the invention greatly improves both the identification accuracy and the first-hit rate.
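The two metrics can be computed from ranked match lists as in the sketch below. Each inner list holds, for one query, whether the gallery picture at each rank shows the same pedestrian target. This is a simplified illustration: benchmark protocols usually apply extra filtering, such as excluding gallery pictures taken by the query's own camera, which is omitted here. The function names are hypothetical.

```python
def average_precision(ranked_matches):
    """Mean of the precision values at each rank where a correct match occurs."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def evaluate(all_ranked_matches):
    """Return (mAP, Rank-1) over all queries."""
    n = len(all_ranked_matches)
    m_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / n
    return m_ap, rank1
```

For two queries with match lists `[True, False, True]` and `[False, True, False]`, the average precisions are 5/6 and 1/2, so mAP is 2/3 and Rank-1 is 1/2.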
The beneficial effects of the invention are as follows: because a two-dimensional polarized self-attention mechanism structure is adopted, information from different feature spaces can be learned adaptively and the information loss caused by dimension reduction is reduced; at the same time, global features and local features are extracted separately by the dual-branch network, which further improves the identification accuracy of the pedestrian re-identification model.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a pedestrian re-identification system 200 based on polarized self-attention according to an embodiment of the present invention, where the pedestrian re-identification system 200 includes:
a preprocessing module 201, configured to acquire a shooting data set containing pedestrian pictures and preprocess the shooting data set to obtain a data set to be divided;
a data dividing module 202, configured to divide the data set to be divided into a training set and a test set, where each pedestrian picture in the training set and the test set has a real label;
a model construction module 203, configured to construct a pedestrian re-identification model comprising a dual-branch structure and a polarized self-attention mechanism structure;
a model training module 204, configured to take the training set as the input of the pedestrian re-identification model, take the real labels corresponding to the training set as the reference for parameter tuning, and train the pedestrian re-identification model with the Adam optimization algorithm to obtain the training parameter weights;
a data identification module 205, configured to take the training parameter weights as the test weights of the pedestrian re-identification model and take the test set as the input of the pedestrian re-identification model, so as to obtain the pedestrian re-identification result for the test set.
The polarized self-attention based pedestrian re-identification system 200 can implement the steps of the polarized self-attention based pedestrian re-identification method in the foregoing embodiments and achieves the same technical effects; refer to the description in the foregoing embodiments, which is not repeated here.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device provided in an embodiment of the present invention, where the computer device 300 includes: a memory 302, a processor 301, and a computer program stored on the memory 302 and executable on the processor 301.
The processor 301 calls the computer program stored in the memory 302 to execute the steps of the pedestrian re-identification method provided by the embodiment of the present invention, and with reference to fig. 1, the method specifically includes:
S101, acquiring a shooting data set with a pedestrian picture, and preprocessing the shooting data set to obtain a data set to be divided.
Furthermore, in the step of obtaining a shooting data set with a pedestrian picture and preprocessing the shooting data set to obtain a data set to be divided, the preprocessing specifically includes:
normalizing the size of each picture in the shooting data set, and applying flipping, random cropping and erasing data augmentation to each picture in the shooting data set.
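The preprocessing described above can be sketched on a toy picture stored as a nested list. This is only an illustration under stated assumptions: the function names, the target size, the resize-then-crop order, the flip probability and the size of the erased rectangle are all hypothetical and are not specified in the source.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Normalize the picture size using nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def horizontal_flip(img):
    """Mirror every row left to right."""
    return [row[::-1] for row in img]

def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

def random_erase(img, erase_h, erase_w, rng, fill=0):
    """Overwrite a random erase_h x erase_w rectangle with `fill`."""
    top = rng.randint(0, len(img) - erase_h)
    left = rng.randint(0, len(img[0]) - erase_w)
    out = [row[:] for row in img]
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            out[r][c] = fill
    return out

def preprocess(img, rng, size=(8, 4), margin=1, flip_prob=0.5):
    """Assumed order: resize slightly larger, flip, crop to `size`, erase."""
    h, w = size
    out = resize_nearest(img, h + 2 * margin, w + 2 * margin)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    out = random_crop(out, h, w, rng)
    return random_erase(out, h // 4, w // 2, rng)

rng = random.Random(0)
# Toy 12 x 5 "picture" whose pixel values are all >= 1.
picture = [[r * 10 + c + 1 for c in range(5)] for r in range(12)]
augmented = preprocess(picture, rng)
```

Every output picture has the same size regardless of the random choices, which is what allows the pictures to be batched for training.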
S102, dividing the data set to be divided into a training set and a testing set, wherein each pedestrian picture in the training set and the data set has a real label.
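A minimal sketch of the division step follows. The source only states that the data set is divided into a training set and a test set of labelled pictures; splitting by pedestrian identity so that no identity appears in both sets is an assumption borrowed from common re-identification practice, and the file names, ratio and function name are hypothetical.

```python
import random

def split_by_identity(samples, train_ratio=0.5, seed=0):
    """Split (picture_path, person_id) pairs so identities do not overlap."""
    ids = sorted({pid for _, pid in samples})
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    train_ids = set(ids[:n_train])
    train = [s for s in samples if s[1] in train_ids]
    test = [s for s in samples if s[1] not in train_ids]
    return train, test

# Six hypothetical identities with three labelled pictures each.
samples = [(f"img_{pid}_{k}.jpg", pid) for pid in range(6) for k in range(3)]
train_set, test_set = split_by_identity(samples)
```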
S103, constructing a pedestrian re-identification model comprising a double-branch structure and a polarized self-attention mechanism structure.
Furthermore, the pedestrian re-identification model takes a convolutional neural network as its feature extraction network. The convolutional neural network comprises an input layer, convolutional layers, a feature extraction layer and an output layer, where there are a plurality of convolutional layers. The double-branch structure comprises a global branch and a local branch and is located between the convolutional layers and the output layer. The polarized self-attention mechanism structure comprises a channel self-attention branch and a spatial self-attention branch and is located after each convolutional layer.
Further, for the polarized self-attention mechanism structure, the feature matrix of the pedestrian picture output by the convolutional layer is defined as $X$ and the weight of the spatial self-attention branch is defined as $A^{sp}(X)$; the weight $A^{sp}(X)$ of the spatial self-attention branch then satisfies relation (1):
$$A^{sp}(X) = F_{SG}\left[\sigma_3\left(F_{SM}\left(\sigma_1\left(F_{GP}\left(W_q(X)\right)\right)\right) \times \sigma_2\left(W_v(X)\right)\right)\right] \tag{1}$$
the weight of the channel self-attention branch is defined as $A^{ch}(X)$; the weight $A^{ch}(X)$ of the channel self-attention branch then satisfies relation (2):
Figure BDA0003620269850000121
in relations (1) and (2) above, $W_q$, $W_v$ and
Figure BDA0003620269850000122
are all 1×1 convolution operations, $\sigma_1$, $\sigma_2$ and $\sigma_3$ are all tensor reshape operations, $F_{SM}$ is the softmax operation, $F_{GP}$ is the global average pooling operation, and $F_{SG}$ is the Sigmoid function;
the result of fusing the channel self-attention branch with weight $A^{ch}(X)$ and the spatial self-attention branch with weight $A^{sp}(X)$ in parallel is defined as $PSA_p(X)$; the parallel fusion result $PSA_p(X)$ then satisfies relation (3):
$$PSA_p(X) = Z^{ch} + Z^{sp} = A^{ch}(X) \odot^{ch} X + A^{sp}(X) \odot^{sp} X \tag{3}$$
in relation (3) above, $PSA_p(X)$ is the output of the polarized self-attention mechanism structure, $\odot^{ch}$ is the channel-wise multiplication operation, $\odot^{sp}$ is the spatial-wise multiplication operation, $Z^{sp}$ is the output of the spatial self-attention branch, and $Z^{ch}$ is the output of the channel self-attention branch.
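Relations (1) and (3) can be sketched in NumPy on a single feature map of shape (C, H, W). In this sketch $A^{sp}$ holds one value per spatial position and is applied with $\odot^{sp}$, while $A^{ch}$ holds one value per channel and is applied with $\odot^{ch}$. Relation (2) is available here only as a figure, so `weight_ch` follows the channel branch of the published Polarized Self-Attention formulation as an assumption and may differ from the figure. The function names, the C/2 bottleneck width and the random weights are likewise hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def softmax(v, axis):
    e = np.exp(v - v.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def weight_sp(x, wq, wv):
    """Relation (1): returns A^sp with shape (1, H, W)."""
    _, h, w = x.shape
    q = conv1x1(x, wq).mean(axis=(1, 2))       # F_GP: (C/2,)
    q = softmax(q.reshape(1, -1), axis=1)      # sigma_1, then F_SM: (1, C/2)
    v = conv1x1(x, wv).reshape(-1, h * w)      # sigma_2: (C/2, H*W)
    return sigmoid((q @ v).reshape(1, h, w))   # sigma_3, then F_SG

def weight_ch(x, wq, wv, wz):
    """Assumed form of relation (2): returns A^ch with shape (C, 1, 1)."""
    c, h, w = x.shape
    v = conv1x1(x, wv).reshape(-1, h * w)                   # (C/2, H*W)
    q = softmax(conv1x1(x, wq).reshape(h * w, 1), axis=0)   # (H*W, 1)
    return sigmoid(wz @ (v @ q)).reshape(c, 1, 1)           # 1x1 conv, then F_SG

def psa_parallel(x, p):
    """Relation (3): PSA_p(X) = A^ch(X) * X + A^sp(X) * X via broadcasting."""
    z_ch = weight_ch(x, p["ch_q"], p["ch_v"], p["ch_z"]) * x
    z_sp = weight_sp(x, p["sp_q"], p["sp_v"]) * x
    return z_ch + z_sp

rng = np.random.default_rng(0)
C, H, W = 4, 3, 2
x = rng.normal(size=(C, H, W))
params = {
    "sp_q": rng.normal(size=(C // 2, C)),
    "sp_v": rng.normal(size=(C // 2, C)),
    "ch_q": rng.normal(size=(1, C)),
    "ch_v": rng.normal(size=(C // 2, C)),
    "ch_z": rng.normal(size=(C, C // 2)),
}
out = psa_parallel(x, params)
```

Because both weights pass through the Sigmoid function, each lies strictly between 0 and 1, and the output keeps the shape of the input feature map, which is what lets the structure be placed after a convolutional layer without changing the layers that follow.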
Furthermore, in the dual-branch structure, the global branch has the same structure as the feature extraction layer of the convolutional neural network; it takes the parallel fusion result output by the polarized self-attention mechanism structure as input and extracts the global feature corresponding to the pedestrian picture. The local branch is parallel to the global branch; it takes the same parallel fusion result as input and extracts the local feature corresponding to the pedestrian picture.
Further, the output layer of the pedestrian re-identification model takes the fusion result of the global feature and the local feature as an output result.
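The source does not state how the two branches pool their features or how the results are fused. One possible reading is sketched below purely as an assumption: the global branch averages over the whole feature map, the local branch averages over horizontal stripes, and fusion is concatenation. The function names and the number of stripes are hypothetical.

```python
import numpy as np

def global_feature(fmap):
    """Average over all spatial positions of a (C, H, W) feature map."""
    return fmap.mean(axis=(1, 2))

def local_features(fmap, num_parts=2):
    """Average over each horizontal stripe of the feature map."""
    stripes = np.array_split(fmap, num_parts, axis=1)
    return [s.mean(axis=(1, 2)) for s in stripes]

def fuse(fmap, num_parts=2):
    """Assumed fusion: concatenate the global and local feature vectors."""
    return np.concatenate([global_feature(fmap)] + local_features(fmap, num_parts))

fmap = np.arange(24, dtype=float).reshape(2, 4, 3)  # toy map: C=2, H=4, W=3
fused = fuse(fmap)
```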
S104, taking the training set as the input of the pedestrian re-identification model, taking the real label corresponding to the training set as the reference for parameter tuning, and training the pedestrian re-identification model with the Adam optimization algorithm to obtain the training parameter weight.
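For reference, the standard Adam update rule can be sketched in plain Python on a toy quadratic objective. The objective, the learning rate and the iteration count are illustrative assumptions; the source does not give its training hyperparameters.

```python
import math

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

# Toy objective f(w) = (w0 - 3)^2 + (w1 + 1)^2 with its minimum at (3, -1).
weights = [0.0, 0.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
for _ in range(2000):
    grads = [2 * (weights[0] - 3.0), 2 * (weights[1] + 1.0)]
    weights = adam_step(weights, grads, state, lr=0.05)
```

In an actual training run the gradients would come from backpropagating the loss through the pedestrian re-identification model rather than from a closed-form expression.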
Still further, the loss function used by the pedestrian re-identification model in the training process comprises at least one of a triplet loss, a classification loss and a center loss.
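The three loss terms named above can be sketched for a single sample as follows. Using cross-entropy as the classification loss, the margin value and the weighting of the center loss in the sum are assumptions; the source states only that at least one of the three losses is used.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the same-identity pair together and push the other identity away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def classification_loss(logits, label):
    """Cross-entropy over identity logits, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def center_loss(feature, center):
    """Half the squared distance between a feature and its identity center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(feature, center))

def total_loss(anchor, positive, negative, logits, label, center, center_weight=0.0005):
    """Assumed combination: unweighted sum plus a down-weighted center loss."""
    return (triplet_loss(anchor, positive, negative)
            + classification_loss(logits, label)
            + center_weight * center_loss(anchor, center))
```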
S105, taking the training parameter weight as the test weight of the pedestrian re-identification model, and taking the test set as the input of the pedestrian re-identification model to obtain the pedestrian re-identification result of the test set.
Illustratively, when the computer device 300 used in the embodiment of the present invention executes the method for re-identifying a pedestrian in park management provided in the embodiment of the present invention and obtains the data in table 1 in the above embodiment, the computer hardware environment is based on an Intel Xeon E5-2630 v4 central processing unit from Intel Corporation, the memory size is 128 GB, and the graphics processing unit is an NVIDIA GeForce GTX 1080 Ti; the software environment of the computer program is based on Ubuntu 20.04, the programming language is Python 3.6, the deep learning framework is PyTorch 1.8, and the CUDA version is 11.4. In addition, the computer device 300 provided by the embodiment of the present invention may also be implemented on hardware based on platforms such as Jetson Nano and HiSilicon Hi3559.
The computer device 300 provided in the embodiment of the present invention can implement the steps of the polarized self-attention based pedestrian re-identification method in the foregoing embodiments and achieves the same technical effects; refer to the description in the foregoing embodiments, which is not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process and step in the pedestrian re-identification method based on polarization self-attention provided in the embodiment of the present invention, and can implement the same technical effects, and in order to avoid repetition, the detailed description is omitted here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element identified by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described in connection with the preferred embodiments of the present invention, as illustrated and described in the accompanying drawings, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for re-identifying a pedestrian based on polarized self-attention, the method comprising:
acquiring a shooting data set with a pedestrian picture, and preprocessing the shooting data set to obtain a data set to be divided;
dividing the data set to be divided into a training set and a testing set, wherein each pedestrian picture in the training set and the data set has a real label;
constructing a pedestrian re-identification model comprising a double-branch structure and a polarization self-attention mechanism structure;
taking the training set as the input of the pedestrian re-identification model, taking the real label corresponding to the training set as the reference for parameter tuning, and training the pedestrian re-identification model with the Adam optimization algorithm to obtain the training parameter weight;
and taking the training parameter weight as the test weight of the pedestrian re-identification model, and taking the testing set as the input of the pedestrian re-identification model to obtain the pedestrian re-identification result of the testing set.
2. The pedestrian re-identification method based on polarized self-attention as claimed in claim 1, wherein in the step of acquiring a shooting data set with a pedestrian picture and preprocessing the shooting data set to obtain a data set to be divided, the preprocessing specifically comprises:
normalizing the size of each picture in the shooting data set, and applying flipping, random cropping and erasing data augmentation to each picture in the shooting data set.
3. The pedestrian re-identification method based on the polarized self-attention as claimed in claim 1, wherein the pedestrian re-identification model takes a convolutional neural network as a feature extraction network, the convolutional neural network comprises an input layer, a convolutional layer, a feature extraction layer and an output layer, and the convolutional layer comprises a plurality of layers; the dual-branch structure comprises a global branch and a local branch, and is positioned between the convolutional layer and the output layer; the polarized self-attention mechanism structure includes a channel self-attention branch and a spatial self-attention branch, the polarized self-attention mechanism structure being located after each of the convolutional layers.
4. The pedestrian re-identification method based on polarized self-attention of claim 3, wherein, for the polarized self-attention mechanism structure, the feature matrix of the pedestrian picture output by the convolutional layer is defined as $X$ and the weight of the spatial self-attention branch is defined as $A^{sp}(X)$; the weight $A^{sp}(X)$ of the spatial self-attention branch then satisfies relation (1):
$$A^{sp}(X) = F_{SG}\left[\sigma_3\left(F_{SM}\left(\sigma_1\left(F_{GP}\left(W_q(X)\right)\right)\right) \times \sigma_2\left(W_v(X)\right)\right)\right] \tag{1}$$
the weight of the channel self-attention branch is defined as $A^{ch}(X)$; the weight $A^{ch}(X)$ of the channel self-attention branch then satisfies relation (2):
Figure FDA0003620269840000021
in relations (1) and (2) above, $W_q$, $W_v$ and
Figure FDA0003620269840000022
are all 1×1 convolution operations, $\sigma_1$, $\sigma_2$ and $\sigma_3$ are all tensor reshape operations, $F_{SM}$ is the softmax operation, $F_{GP}$ is the global average pooling operation, and $F_{SG}$ is the Sigmoid function;
the result of fusing the channel self-attention branch with weight $A^{ch}(X)$ and the spatial self-attention branch with weight $A^{sp}(X)$ in parallel is defined as $PSA_p(X)$; the parallel fusion result $PSA_p(X)$ then satisfies relation (3):
$$PSA_p(X) = Z^{ch} + Z^{sp} = A^{ch}(X) \odot^{ch} X + A^{sp}(X) \odot^{sp} X \tag{3}$$
in relation (3) above, $PSA_p(X)$ is the output of the polarized self-attention mechanism structure, $\odot^{ch}$ is the channel-wise multiplication operation, $\odot^{sp}$ is the spatial-wise multiplication operation, $Z^{sp}$ is the output of the spatial self-attention branch, and $Z^{ch}$ is the output of the channel self-attention branch.
5. The pedestrian re-identification method based on polarized self-attention as claimed in claim 4, wherein, for the dual-branch structure: the global branch has the same structure as the feature extraction layer of the convolutional neural network and is used for taking the parallel fusion result output by the polarized self-attention mechanism structure as input and extracting the global feature corresponding to the pedestrian picture; the local branch is parallel to the global branch and is used for taking the parallel fusion result output by the polarized self-attention mechanism structure as input and extracting the local feature corresponding to the pedestrian picture.
6. The pedestrian re-identification method based on polarized self-attention according to claim 5, wherein the output layer of the pedestrian re-identification model takes the fusion result of the global feature and the local feature as its output result.
7. The pedestrian re-identification method based on polarization self-attention as claimed in claim 1, wherein the loss function used by the pedestrian re-identification model in the training process comprises at least one of a triplet loss, a classification loss and a center loss.
8. A polarized self-attention based pedestrian re-identification system, comprising:
a preprocessing module, configured to acquire a shooting data set with a pedestrian picture and preprocess the shooting data set to obtain a data set to be divided;
a data dividing module, configured to divide the data set to be divided into a training set and a testing set, wherein each pedestrian picture in the training set and the data set has a real label;
a model building module, configured to build a pedestrian re-identification model comprising a double-branch structure and a polarized self-attention mechanism structure;
a model training module, configured to take the training set as the input of the pedestrian re-identification model, take the real label corresponding to the training set as the reference for parameter tuning, and train the pedestrian re-identification model with the Adam optimization algorithm to obtain the training parameter weight;
and a data identification module, configured to take the training parameter weight as the test weight of the pedestrian re-identification model and take the testing set as the input of the pedestrian re-identification model to obtain the pedestrian re-identification result of the testing set.
9. A computer device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method for pedestrian re-identification based on polarized self-attention as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the method for pedestrian re-identification based on polarized self-attention as claimed in any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210462489.9A CN114792430A (en) 2022-04-24 2022-04-24 Pedestrian re-identification method, system and related equipment based on polarization self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210462489.9A CN114792430A (en) 2022-04-24 2022-04-24 Pedestrian re-identification method, system and related equipment based on polarization self-attention

Publications (1)

Publication Number Publication Date
CN114792430A (en) 2022-07-26

Family

ID=82461658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210462489.9A Pending CN114792430A (en) 2022-04-24 2022-04-24 Pedestrian re-identification method, system and related equipment based on polarization self-attention

Country Status (1)

Country Link
CN (1) CN114792430A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116110076A (en) * 2023-02-09 2023-05-12 国网江苏省电力有限公司苏州供电分公司 Power transmission aerial work personnel identity re-identification method and system based on mixed granularity network
CN116823914A (en) * 2023-08-30 2023-09-29 中国科学技术大学 Unsupervised focal stack depth estimation method based on all-focusing image synthesis

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116110076A (en) * 2023-02-09 2023-05-12 国网江苏省电力有限公司苏州供电分公司 Power transmission aerial work personnel identity re-identification method and system based on mixed granularity network
CN116110076B (en) * 2023-02-09 2023-11-07 国网江苏省电力有限公司苏州供电分公司 Power transmission aerial work personnel identity re-identification method and system based on mixed granularity network
CN116823914A (en) * 2023-08-30 2023-09-29 中国科学技术大学 Unsupervised focal stack depth estimation method based on all-focusing image synthesis
CN116823914B (en) * 2023-08-30 2024-01-09 中国科学技术大学 Unsupervised focal stack depth estimation method based on all-focusing image synthesis

Similar Documents

Publication Publication Date Title
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN112906720B (en) Multi-label image identification method based on graph attention network
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN108090472B (en) Pedestrian re-identification method and system based on multi-channel consistency characteristics
CN114792430A (en) Pedestrian re-identification method, system and related equipment based on polarization self-attention
CN111008618B (en) Self-attention deep learning end-to-end pedestrian re-identification method
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
Cun et al. Image splicing localization via semi-global network and fully connected conditional random fields
CN114170516B (en) Vehicle weight recognition method and device based on roadside perception and electronic equipment
CN114749342A (en) Method, device and medium for identifying coating defects of lithium battery pole piece
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN111695460A (en) Pedestrian re-identification method based on local graph convolution network
CN114676776A (en) Fine-grained image classification method based on Transformer
CN112329771A (en) Building material sample identification method based on deep learning
CN116206334A (en) Wild animal identification method and device
CN116503398B (en) Insulator pollution flashover detection method and device, electronic equipment and storage medium
CN117456480A (en) Light vehicle re-identification method based on multi-source information fusion
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
Obeso et al. Introduction of explicit visual saliency in training of deep cnns: Application to architectural styles classification
CN116975828A (en) Face fusion attack detection method, device, equipment and storage medium
CN116824330A (en) Small sample cross-domain target detection method based on deep learning
CN116310323A (en) Aircraft target instance segmentation method, system and readable storage medium
CN114842417A (en) Anti-unmanned aerial vehicle system image identification method based on coordinate attention mechanism fusion
CN114387489A (en) Power equipment identification method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination