CN111368815B - Pedestrian re-identification method based on multi-component self-attention mechanism - Google Patents
- Publication number
- CN111368815B CN111368815B CN202010467045.5A CN202010467045A CN111368815B CN 111368815 B CN111368815 B CN 111368815B CN 202010467045 A CN202010467045 A CN 202010467045A CN 111368815 B CN111368815 B CN 111368815B
- Authority
- CN
- China
- Prior art keywords
- pcpa
- attention
- feature
- self
- pedestrian
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a pedestrian re-identification method based on a multi-component self-attention mechanism, which comprises: pre-training a deep convolutional neural network backbone model; branching the backbone model and constructing a multi-component self-attention network to obtain multi-component self-attention features; inputting the multi-component self-attention features into a classifier and jointly training to minimize cross-entropy loss and metric loss; and finally inputting test-set pictures into the trained model, fusing the output component features to obtain an overall feature, and ranking by the distances between feature vectors to realize pedestrian re-identification. Various challenges in the pedestrian re-identification problem are fully considered: the multi-component self-attention mechanism effectively expands the attention activation region and enriches the pedestrian features; the self-attention modules let the network attend more fully and finely to regions with distinguishing characteristics, and the spatial attention module and the channel attention module are fused into the network in residual form, making the network more robust, stable and easy to train.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence and computer vision, and particularly relates to a pedestrian re-identification method based on a multi-component self-attention mechanism.
Background
With the acceleration of urbanization, public safety has become a focus of increasing attention. Monitoring cameras now widely cover important public areas such as university campuses, theme parks, hospitals and streets, creating good objective conditions for automatic monitoring with computer vision technology.
In recent years, pedestrian re-identification has received increasing attention as an important research direction in the field of video monitoring. Specifically, pedestrian re-identification refers to the technology of judging, by computer vision, whether a specific pedestrian is present in an image or video sequence under cross-camera and cross-scene conditions. As an important supplement to face recognition, it can recognize pedestrians by their clothing, posture, hairstyle and other cues, so that pedestrians whose faces cannot be clearly captured can be tracked continuously across cameras in actual monitoring scenes; this enhances the spatio-temporal continuity of the data, saves a large amount of manpower and material resources, and gives the technology important research significance.
In an open environment, because the monitoring scene is complex and changeable, the acquired pedestrian images often suffer from interference factors such as background noise, illumination change, posture change and severe occlusion; existing recognition models cannot attend well to regions with strong discriminability and high distinctiveness, the extracted features are not robust enough, and recognition performance is poor. Therefore, a pedestrian re-identification method capable of accurately extracting strongly discriminative and highly distinctive features is highly desirable.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on a multi-component self-attention mechanism, addressing the defects of existing methods. The method exploits the advantages of deep learning and extracts features through a deep residual neural network; it builds a recognition model based on a multi-component self-attention mechanism, extracting pedestrian features in a component-wise, multi-branch, highly fused manner, expanding the attention activation range so that discriminative regions are attended to more widely and sufficiently; it fuses spatial self-attention and channel self-attention, making the model attend more to spatially key regions with distinguishing characteristics and integrate and summarize channels containing similar semantic information, so that classification results are more distinctive; meanwhile, the spatial attention module and the channel attention module are fused into the network in residual form, making the network more robust, stable and easy to train. In conclusion, the invention improves pedestrian re-identification performance across cameras, with good robustness and general applicability.
The purpose of the invention is realized by the following technical scheme: a pedestrian re-identification method based on a multi-component self-attention mechanism comprises the following steps:
s1: pre-training a deep convolutional neural network backbone model B;
S2: split the backbone model B into B_common and B_branch, where B_branch corresponds to the last residual layer (layer4) of the backbone model B; deep-copy B_branch twice to obtain three branches B_branch1, B_branch2, B_branch3; construct a multi-component self-attention network ANet after the branches to obtain the multi-component self-attention feature F of the pedestrian;
S3: input the multi-component self-attention features into a classifier CLS, and jointly train to minimize the cross-entropy loss L_xent and the metric loss L_triplet;
S4: input the test-set pictures into the trained model, fuse the output component features to obtain an overall feature f, and rank by the distances between feature vectors to realize pedestrian re-identification.
Further, step S1 specifically includes: the deep convolutional neural network backbone model B adopts ResNet and is pre-trained on the ImageNet data set so that B obtains an initial value.
Further, the step S2 includes the following sub-steps:
S2.1: let the learnable parameters of B_common and B_branch1, B_branch2, B_branch3 be W_common and W_branch1, W_branch2, W_branch3, with W_branch1, W_branch2, W_branch3 initialized identically; a pedestrian image P passes through B_common and then through B_branch1, B_branch2, B_branch3 respectively, and the correspondingly extracted feature maps are F1 ∈ R^(C×H×W), F2 ∈ R^(C×H×W), F3 ∈ R^(C×H×W), where C is the number of channels, H the height and W the width of the feature maps; the calculation formula is:

F_i = B_branchi(B_common(P)), i = 1, 2, 3

wherein T, where it appears in the formulas below, denotes matrix transposition;
S2.2: after B_branch1, establish a branch: the local-component-based self-attention network PCPA; F1 is input to PCPA, which outputs the feature set F_pcpa; the PCPA network parameter is W_pcpa;
S2.3: after B_branch2, establish a branch: the global-component-based self-attention network PCA; F2 is input to PCA, which outputs the feature vector F_pca; the PCA network parameter is W_pca;
S2.4: after B_branch3, establish a branch: the global-component-based feature mapping network Global; F3 is input to Global, which outputs the feature vector F_g; the Global network parameter is W_global;
S2.5: F_pcpa, F_pca and F_g together constitute the multi-component self-attention feature set F.
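The branching scheme of S2 hinges on the deep copy: the three branches start from identical layer4 weights but are trained independently. A minimal sketch, with hypothetical parameter dictionaries standing in for the actual ResNet weights:

```python
import copy

import numpy as np

rng = np.random.default_rng(0)
B_common = {"conv1": rng.standard_normal((4, 4))}    # shared trunk parameters
B_branch1 = {"layer4": rng.standard_normal((4, 4))}  # last residual layer
B_branch2 = copy.deepcopy(B_branch1)                 # identical initialization
B_branch3 = copy.deepcopy(B_branch1)

# The copies agree at initialization but may diverge during training.
assert np.allclose(B_branch1["layer4"], B_branch2["layer4"])
B_branch2["layer4"] += 1.0  # stand-in for an independent gradient update
assert not np.allclose(B_branch1["layer4"], B_branch2["layer4"])
```

In a real implementation the copy would be applied to the layer4 module of a pretrained ResNet, so that B_branch1, B_branch2, B_branch3 share B_common but keep separate W_branch parameters.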
Further, the PCPA in step S2.2 comprises the following sub-steps:
S2.2.1: the input F1 ∈ R^(C×H×W) is split horizontally along its height into equal upper and lower halves, giving the feature maps F1_up (upper half) and F1_down (lower half);
S2.2.2: the input F1_up passes through three branches: a feature mapping module Identity, a spatial self-attention module PAtt and a channel self-attention module CAtt; the correspondingly extracted feature maps are F1_up_identity, F1_up_patt, F1_up_catt, obtained by the following sub-steps:
(a) the Identity mapping is
F1_up_identity = F1_up
(b) for the input F1_up, the spatial self-attention module PAtt specifically includes: for arbitrary feature vectors x_i, x_j ∈ F1_up, 1 ≤ i ≤ N, 1 ≤ j ≤ N, where N is the number of spatial positions of F1_up, self-attention relation modeling is carried out on the spatial scale to obtain a relation matrix D ∈ R^(N×N); for each value D_k of D, with k = (i, j), the calculation formula is:

D_(i,j) = exp(x_i^T · x_j) / Σ_{j'=1}^{N} exp(x_i^T · x_j')

The relation matrix D is applied to F1_up and merged into the network in residual form to obtain the updated feature map F1_up_patt:

F1_up_patt = W_patt · (D ⊗ F1_up) + F1_up

where W_patt is the parameter of the PAtt module and ⊗ denotes matrix multiplication;
(c) for the input F1_up, the channel self-attention module CAtt specifically includes: for arbitrary feature vectors c_i, c_j ∈ F1_up, 1 ≤ i ≤ C, 1 ≤ j ≤ C, self-attention relation modeling is carried out on the channel scale to obtain a relation matrix E ∈ R^(C×C); for each value E_k of E, with k = (i, j), the calculation formula is:

E_(i,j) = exp(c_i^T · c_j) / Σ_{j'=1}^{C} exp(c_i^T · c_j')

The relation matrix E is applied to F1_up in residual form to obtain the updated feature map F1_up_catt:

F1_up_catt = W_catt · (E ⊗ F1_up) + F1_up

where W_catt is the parameter of the CAtt module;
S2.2.3: fuse F1_up_identity, F1_up_patt, F1_up_catt to obtain the output feature map F1_up_pcpa, calculated as:
F1_up_pcpa = F1_up_identity + F1_up_patt + F1_up_catt
S2.2.4: operate on F1_down in the same way as on F1_up to obtain the output feature map F1_down_pcpa through the PCPA, sharing the parameters W_patt and W_catt with the F1_up branch;
S2.2.5: perform global average pooling on F1_up_pcpa and F1_down_pcpa to obtain the feature vector set:
F_pcpa = {F_up_pcpa, F_down_pcpa}
where F_up_pcpa = AvgP(F1_up_pcpa) and F_down_pcpa = AvgP(F1_down_pcpa); AvgP(·) denotes the global average pooling operation, calculated as:

AvgP(F1_up_pcpa)_c = (1 / (W1 × H1)) Σ_{w=1}^{W1} Σ_{h=1}^{H1} x_{c,w,h}

where 1 ≤ c ≤ C1, 1 ≤ w ≤ W1, 1 ≤ h ≤ H1; C1 is the number of channels, W1 the width and H1 the height of the feature map F1_up_pcpa, and x_{c,w,h} is an element of the three-dimensional feature map F1_up_pcpa; AvgP(F1_down_pcpa) is computed in the same way.
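The PAtt/CAtt residual-attention scheme above can be sketched in a few lines of numpy. Shapes, the softmax normalization of D and E, and the scalar stand-ins for the learned weights W_patt and W_catt are illustrative assumptions; the patent realizes these modules inside a ResNet branch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def patt(F, W_patt):
    """Spatial self-attention: F is (C, N) with N = H*W positions."""
    D = softmax(F.T @ F, axis=-1)   # (N, N) spatial relation matrix
    return W_patt * (F @ D.T) + F   # residual fusion

def catt(F, W_catt):
    """Channel self-attention: relation matrix over the C channels."""
    E = softmax(F @ F.T, axis=-1)   # (C, C) channel relation matrix
    return W_catt * (E @ F) + F     # residual fusion

C, H, W = 8, 4, 2
F1_up = np.random.default_rng(1).standard_normal((C, H * W))
# Identity + PAtt + CAtt, fused as in S2.2.3:
F1_up_pcpa = F1_up + patt(F1_up, 0.1) + catt(F1_up, 0.1)
assert F1_up_pcpa.shape == (C, H * W)
```

The residual form (`... + F`) is what keeps the module stable to train: with small attention weights the branch degenerates to the identity mapping.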
Further, the PCA in step S2.3 is specifically: the input F2 ∈ R^(C×H×W) passes through three branches: the feature mapping module Identity, the spatial self-attention module PAtt and the channel self-attention module CAtt, calculated in the same manner as steps S2.2.2-S2.2.3 of the PCPA, but without horizontal splitting of the feature map.
Further, Global in step S2.4 is specifically: perform global average pooling on the input F3 ∈ R^(C×H×W) to obtain the feature vector F_g, calculated as:
F_g = AvgP(F3)
where AvgP(·) denotes the global average pooling operation:

AvgP(F3)_c = (1 / (W3 × H3)) Σ_{w=1}^{W3} Σ_{h=1}^{H3} x_{c,w,h}

where 1 ≤ c ≤ C3, 1 ≤ w ≤ W3, 1 ≤ h ≤ H3; C3 is the number of channels, W3 the width and H3 the height of the feature map F3, and x_{c,w,h} is an element of the three-dimensional feature map F3.
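The AvgP(·) operation above reduces a (C, H, W) map to a C-dimensional vector by averaging over the spatial grid; a one-line numpy sketch with a toy map:

```python
import numpy as np

F3 = np.arange(24, dtype=float).reshape(2, 3, 4)  # C=2, H=3, W=4 toy feature map
F_g = F3.mean(axis=(1, 2))                        # AvgP over H and W
assert F_g.shape == (2,)
assert F_g[0] == 5.5  # mean of channel 0's values 0..11
```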
Further, the multi-component self-attention feature set in step S2.5 is F = {F_up_pcpa, F_down_pcpa, F_pca, F_g}.
Further, the step S3 includes the following sub-steps:
S3.1: for the input pedestrian images P = {p1, p2, p3, ..., pn} and the corresponding identity tags IDs = {q1, q2, q3, ..., qn}, where n is the number of samples of P, obtain through step S2 the multi-component self-attention feature F = {F_up_pcpa, F_down_pcpa, F_pca, F_g} corresponding to the pedestrian images P;
S3.2: the classifier CLS(·) denotes a fully connected layer; for p_i ∈ P, 1 ≤ i ≤ n, and its corresponding multi-component self-attention feature F_i = {F_i,up_pcpa, F_i,down_pcpa, F_i,pca, F_i,g}, let the weight matrix be W_cls = {W_cls1, W_cls2, W_cls3, W_cls4}, W_cls1, W_cls2, W_cls3, W_cls4 ∈ R^(K×Z), where K is the dimension of the input feature vectors F_i,up_pcpa, F_i,down_pcpa, F_i,pca, F_i,g and Z is the output dimension, i.e. the number of pedestrian identity tags; through CLS(·), the classification probability Pro is output:

Pro_i = softmax(W_cls^T · F_i)
S3.3: calculate the cross-entropy loss L_xent, with the calculation formula:

L_xent = -(1/n) Σ_{i=1}^{n} log Pro_i(q_i)

where Pro_i(q_i) is the predicted probability of the true identity tag q_i;
S3.4: for any input pedestrian sample p_i ∈ P = {p1, p2, p3, ..., pn} with identity tag q_i ∈ IDs = {q1, q2, q3, ..., qn}, take from P the negative sample p_j nearest to p_i and the positive sample p_k farthest from p_i; obtain through step S2 the multi-component self-attention features:
F_pi = {F_pi,up_pcpa, F_pi,down_pcpa, F_pi,pca, F_pi,g}
F_pj = {F_pj,up_pcpa, F_pj,down_pcpa, F_pj,pca, F_pj,g}
F_pk = {F_pk,up_pcpa, F_pk,down_pcpa, F_pk,pca, F_pk,g}
S3.5: calculate the metric loss L_triplet as follows: for F_up_pcpa, denote

sp_up_pcpa = ||F_pi,up_pcpa - F_pk,up_pcpa||_2, sn_up_pcpa = ||F_pi,up_pcpa - F_pj,up_pcpa||_2

requiring sn_up_pcpa - sp_up_pcpa ≥ m, where m is a margin value; the metric loss L_triple_up_pcpa,i,j,k for F_up_pcpa is then calculated as:

L_triple_up_pcpa,i,j,k = [sp_up_pcpa - sn_up_pcpa + m]_+

where [·]_+ is the hinge function max(·, 0); the metric losses of F_down_pcpa, F_pca and F_g are calculated in the same way, giving L_triplet as:

L_triplet = L_triple_up_pcpa + L_triple_down_pcpa + L_triple_pca + L_triple_g

summed over the sampled triplets;
S3.6: jointly train to minimize the cross-entropy loss L_xent and the metric loss L_triplet; the total loss is L = L_xent + λ·L_triplet, where λ is the balance parameter between L_xent and L_triplet.
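A hedged numpy sketch of the joint objective L = L_xent + λ·L_triplet for a single feature branch. The margin m = 1.2 and λ = 0.5 follow the embodiment values; the logits and feature vectors are toy stand-ins, not outputs of the patent's network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    """Negative log-probability of the true identity tag."""
    return -np.log(softmax(logits)[label])

def triplet_loss(fa, fp, fn, m=1.2):
    """Hinge triplet loss [sp - sn + m]_+ for one anchor/positive/negative."""
    sp = np.linalg.norm(fa - fp)  # distance to the farthest positive
    sn = np.linalg.norm(fa - fn)  # distance to the nearest negative
    return max(sp - sn + m, 0.0)

logits = np.array([2.0, 0.5, -1.0])  # toy classifier output, 3 identities
fa = np.zeros(4)                     # anchor feature
fp = np.full(4, 0.1)                 # positive feature (close)
fn = np.full(4, 2.0)                 # negative feature (far)
lam = 0.5
L = cross_entropy(logits, 0) + lam * triplet_loss(fa, fp, fn)
```

Here the triplet term is already zero because the negative is well beyond the margin, so the total loss reduces to the cross-entropy term.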
Further, the step S4 includes the following sub-steps:
S4.1: the query picture set A = {a1, a2, a3, ..., aM} and the to-be-selected picture set G = {g1, g2, g3, ..., gT} are input into the backbone model B and the multi-component self-attention network ANet respectively to obtain the corresponding feature sets:
F_A = {F_A,up_pcpa, F_A,down_pcpa, F_A,pca, F_A,g}
F_G = {F_G,up_pcpa, F_G,down_pcpa, F_G,pca, F_G,g}
S4.2: fuse the component features of F_A and F_G respectively by concatenating them along the feature dimension to obtain the overall features f_A and f_G;
S4.3: calculate the Euclidean distances between f_A and f_G, construct a distance matrix S ∈ R^(M×T), and sort by distance to obtain a retrieval candidate list.
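Steps S4.1-S4.3 can be sketched with numpy: a Euclidean distance matrix S ∈ R^(M×T) between fused query features f_A and gallery features f_G, then a ranked candidate list per query. The feature values here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
f_A = rng.standard_normal((3, 16))  # M=3 fused query features
f_G = rng.standard_normal((5, 16))  # T=5 fused to-be-selected features

# Pairwise Euclidean distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
S = np.sqrt(np.maximum(
    (f_A ** 2).sum(1)[:, None] + (f_G ** 2).sum(1)[None, :] - 2 * f_A @ f_G.T,
    0.0))
ranking = np.argsort(S, axis=1)     # candidate list per query, nearest first
assert S.shape == (3, 5)
```

The `maximum(..., 0)` clamp guards against tiny negative values from floating-point cancellation before the square root.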
The invention has the following beneficial effects for cross-camera pedestrian picture retrieval and identification:
(1) the invention constructs an identification model based on a multi-component self-attention mechanism, extracts pedestrian features in a way of splitting, multi-branching and high fusion, enlarges the attention activation range, more widely and fully pays attention to key areas with discriminability, and improves the feature robustness.
(2) The invention integrates the self-mapping module, the space attention module and the channel attention module to extract the pedestrian characteristics, and more accurately focuses on the areas with high discriminability. The self-mapping module is helpful for the model to pay attention to local information and global information of the model, the space attention module enables the model to pay more attention to key areas with distinguishing characteristics, and the channel attention module enables the model to integrate and summarize channels containing similar semantic information, so that the classification result is more distinctive, and the accuracy of pedestrian re-identification is improved.
(3) According to the invention, through multi-branch combined training measurement loss and classification cross entropy loss, the characteristic discrimination and the discrimination of each branch are improved, so that the identification accuracy is improved.
Drawings
FIG. 1 is a network model structure diagram of a pedestrian re-identification method based on a multi-component self-attention mechanism disclosed in the invention;
fig. 2 is a schematic diagram of a multi-component self-attention network ANet disclosed in the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions and specific operation procedures in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, but the scope of the present invention is not limited to the embodiments described below.
As shown in fig. 1-2, an embodiment of the present invention discloses a pedestrian re-identification method based on a multi-component self-attention mechanism, which includes the following steps:
s1: and pre-training a deep convolutional neural network backbone model B.
In step S1, ResNet is used as the convolutional neural network backbone model, and pretraining is performed on the ImageNet large-scale data set, so that the backbone network B obtains an ideal initial value.
S2: split the backbone model B into B_common and B_branch, where B_branch corresponds to the last residual layer (layer4) of ResNet50; deep-copy B_branch twice to obtain three branches B_branch1, B_branch2, B_branch3, and construct the multi-component self-attention network ANet after the branches to obtain the multi-component self-attention feature F of the pedestrian.
The step S2 specifically includes:
S2.1: let the learnable parameters of B_common and B_branch1, B_branch2, B_branch3 be W_common and W_branch1, W_branch2, W_branch3, with W_branch1, W_branch2, W_branch3 initialized identically; the input pedestrian image P is a picture in RGB format resized to 384 × 128 × 3; P passes through B_common and then through B_branch1, B_branch2, B_branch3 respectively, and the correspondingly extracted feature maps are F1 ∈ R^(C×H×W), F2 ∈ R^(C×H×W), F3 ∈ R^(C×H×W), where C is the number of channels, H the height and W the width of the feature maps; the calculation formula is:

F_i = B_branchi(B_common(P)), i = 1, 2, 3

where T, where it appears in the formulas below, denotes matrix transposition.
S2.2: after B_branch1, establish a branch: the local-component-based self-attention network PCPA. For PCPA, the input is F1 and the output is the feature set F_pcpa, where the PCPA network parameter is W_pcpa.
In S2.2, the PCPA specifically includes the following steps:
S2.2.1: F1 ∈ R^(C×H×W) is split horizontally into equal upper and lower halves along its height, forming the feature maps F1_up (upper half) and F1_down (lower half).
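The horizontal split of S2.2.1 halves the (C, H, W) map along its height axis; a numpy sketch with a toy shape:

```python
import numpy as np

F1 = np.random.default_rng(3).standard_normal((8, 6, 4))  # C=8, H=6, W=4
F1_up, F1_down = np.split(F1, 2, axis=1)                  # halve along H
assert F1_up.shape == (8, 3, 4) and F1_down.shape == (8, 3, 4)
```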
S2.2.2: the input F1_up passes through three branches: the feature mapping module Identity, the spatial self-attention module PAtt and the channel self-attention module CAtt; the correspondingly extracted feature maps are F1_up_identity, F1_up_patt, F1_up_catt. Identity maps the input to itself, PAtt aims to place more attention on spatial regions with discriminability and distinctiveness, and CAtt aims to generalize the channel features containing similar semantic information:
(a) for the input F1_up, the Identity mapping is calculated as: F1_up_identity = F1_up;
(b) for arbitrary feature vectors x_i, x_j ∈ F1_up, 1 ≤ i ≤ N, 1 ≤ j ≤ N, where N is the number of spatial positions of F1_up, self-attention relation modeling is carried out on the spatial scale to obtain a relation matrix D ∈ R^(N×N). For each value D_k of D, with k = (i, j), the calculation formula is:

D_(i,j) = exp(x_i^T · x_j) / Σ_{j'=1}^{N} exp(x_i^T · x_j')

The relation matrix D is applied to F1_up to obtain the updated feature map F1_up_patt:

F1_up_patt = W_patt · (D ⊗ F1_up) + F1_up

where W_patt is the parameter of the PAtt module and ⊗ denotes matrix multiplication;
(c) for arbitrary feature vectors c_i, c_j ∈ F1_up, 1 ≤ i ≤ C, 1 ≤ j ≤ C, self-attention relation modeling is carried out on the channel scale to obtain a relation matrix E ∈ R^(C×C). For each value E_k of E, with k = (i, j), the calculation formula is:

E_(i,j) = exp(c_i^T · c_j) / Σ_{j'=1}^{C} exp(c_i^T · c_j')

The relation matrix E is applied to F1_up to obtain the updated feature map F1_up_catt:

F1_up_catt = W_catt · (E ⊗ F1_up) + F1_up

where W_catt is the parameter of the CAtt module.
S2.2.3: handle F1_up_identity,F1_up_patt,F1_up_cattFusing to obtain an output characteristic diagram F1_up_pcpaThe calculation method is as follows:
F1_up_pcpa=F1_up_identity+F1_up_patt+F1_up_catt。
s2.2.4: for theMode of operation and1_upobtaining an output characteristic diagram F passing through the PCPA1_down_pcpaAnd is with F1_upAnd sharing the parameters.
S2.2.5: perform global average pooling on F1_up_pcpa and F1_down_pcpa to obtain the feature vector set F_pcpa:
F_pcpa = {F_up_pcpa, F_down_pcpa},
F_up_pcpa = AvgP(F1_up_pcpa),
F_down_pcpa = AvgP(F1_down_pcpa)
where AvgP(·) denotes the global average pooling operation, calculated as:

AvgP(F1_up_pcpa)_c = (1 / (W1 × H1)) Σ_{w=1}^{W1} Σ_{h=1}^{H1} x_{c,w,h}

where x_{c,w,h} is an element of the three-dimensional feature map F1_up_pcpa, 1 ≤ c ≤ C1, 1 ≤ w ≤ W1, 1 ≤ h ≤ H1, and C1, W1, H1 are the number of channels, width and height of F1_up_pcpa.
S2.3: after B_branch2, establish a branch: the global-component-based self-attention network PCA. For PCA, the input is F2 and the output is the feature vector F_pca, where the PCA network parameter is W_pca.
In S2.3, for the input F2 ∈ R^(C×H×W), the PCA passes through three branches: the feature mapping module Identity, the spatial self-attention module PAtt and the channel self-attention module CAtt, calculated consistently with Identity, PAtt and CAtt in the PCPA, but the operations of steps S2.2.2-S2.2.3 are performed on F2 without horizontal splitting. In particular, although this module after B_branch2 is calculated in the same manner as the PCPA after B_branch1, it does not share parameters with it.
S2.4: after B_branch3, establish a branch: the global-component-based feature mapping network Global. For Global, the input is F3 and the output is the feature vector F_g, where the Global network parameter is W_global.
In S2.4, Global specifically performs global average pooling on the input F3 ∈ R^(C×H×W) to obtain the feature vector F_g, calculated as:
F_g = AvgP(F3)
where AvgP(·) denotes the global average pooling operation:

AvgP(F3)_c = (1 / (W3 × H3)) Σ_{w=1}^{W3} Σ_{h=1}^{H3} x_{c,w,h}

where x_{c,w,h} is an element of the three-dimensional feature map F3, 1 ≤ c ≤ C3, 1 ≤ w ≤ W3, 1 ≤ h ≤ H3, and C3, W3, H3 are the number of channels, width and height of F3.
S2.5: F_pcpa, F_pca and F_g together form the multi-component self-attention feature set F, specifically:
F = {F_up_pcpa, F_down_pcpa, F_pca, F_g}
S3: input the multi-component self-attention features into the classifier CLS, and jointly train to minimize the cross-entropy loss L_xent and the metric loss L_triplet.
S3.1: for the input pedestrian images P = {p1, p2, p3, ..., pn} and the corresponding identity tags IDs = {q1, q2, q3, ..., qn}, where n is the number of input pedestrian pictures, obtain the multi-component self-attention feature F = {F_up_pcpa, F_down_pcpa, F_pca, F_g} through S2.
S3.2: the classifier CLS(·) denotes a fully connected layer; for p_i ∈ P, 1 ≤ i ≤ n, and its corresponding multi-component self-attention feature F_i = {F_i,up_pcpa, F_i,down_pcpa, F_i,pca, F_i,g}, the input feature vectors F_i,up_pcpa, F_i,down_pcpa, F_i,pca, F_i,g all have dimension K; let the weight matrix be W_cls = {W_cls1, W_cls2, W_cls3, W_cls4}, W_cls1, W_cls2, W_cls3, W_cls4 ∈ R^(K×Z), where Z is the output dimension, i.e. the number of pedestrian identity tags. Through CLS(·), the classification probability Pro is output:

Pro_i = softmax(W_cls^T · F_i)

S3.3: calculate the cross-entropy loss L_xent, with the calculation formula:

L_xent = -(1/n) Σ_{i=1}^{n} log Pro_i(q_i)

where q_i ∈ IDs.
S3.4: for any input pedestrian sample p_i ∈ P = {p1, p2, p3, ..., pn} with identity tag q_i ∈ IDs = {q1, q2, q3, ..., qn}, take from P the negative sample p_j nearest to p_i and the positive sample p_k farthest from p_i. Obtain through S2 the multi-component self-attention features:
F_pi = {F_pi,up_pcpa, F_pi,down_pcpa, F_pi,pca, F_pi,g}
F_pj = {F_pj,up_pcpa, F_pj,down_pcpa, F_pj,pca, F_pj,g}
F_pk = {F_pk,up_pcpa, F_pk,down_pcpa, F_pk,pca, F_pk,g}.
S3.5: calculate the metric loss L_triplet as follows:
Taking F_up_pcpa as an example, denote

sp_up_pcpa = ||F_pi,up_pcpa - F_pk,up_pcpa||_2, sn_up_pcpa = ||F_pi,up_pcpa - F_pj,up_pcpa||_2

requiring sn_up_pcpa - sp_up_pcpa ≥ m, where m is a margin value taking the value 1.2 in this embodiment; then the metric loss L_triple_up_pcpa,i,j,k for F_up_pcpa is calculated as:

L_triple_up_pcpa,i,j,k = [sp_up_pcpa - sn_up_pcpa + m]_+

where q_i, q_j, q_k are the identity tags corresponding to the samples p_i, p_j, p_k, and [·]_+ is the hinge function max(·, 0); L_triplet is thus obtained as:

L_triplet = L_triple_up_pcpa + L_triple_down_pcpa + L_triple_pca + L_triple_g

summed over the sampled triplets.
S3.6: jointly train to minimize the cross-entropy loss L_xent and the metric loss L_triplet; the total loss is L = L_xent + λ·L_triplet, where λ is the balance parameter between L_xent and L_triplet. In this embodiment, λ is 0.5.
S4: input the test-set pictures into the trained model, fuse the output component features to obtain the overall feature f, and rank by the distances between feature vectors to realize pedestrian re-identification. The test-set pictures comprise a query picture set and a to-be-selected picture set, and the to-be-selected pictures containing the same pedestrian are found from the to-be-selected picture set according to the query picture set.
The step S4 specifically includes:
S4.1: for the query picture set A = {a1, a2, a3, ..., aM} and the to-be-selected picture set G = {g1, g2, g3, ..., gT}, where M is the number of query pictures and T the number of to-be-selected pictures, A and G are both RGB pictures resized to 384 × 128 × 3 and are input into the backbone model B and the multi-component self-attention network ANet respectively to obtain the corresponding feature sets:
F_A = {F_A,up_pcpa, F_A,down_pcpa, F_A,pca, F_A,g}
F_G = {F_G,up_pcpa, F_G,down_pcpa, F_G,pca, F_G,g}.
S4.2: fuse the component features of F_A and F_G respectively by concatenating them along the feature dimension to obtain the overall features f_A and f_G.
S4.3: calculate the Euclidean distances between f_A and f_G, construct a distance matrix S ∈ R^(M×T), sort the to-be-selected pictures by distance for each query picture, set the query number s, and take the s nearest to-be-selected pictures as the retrieval candidate list of the query picture; the accuracy of the result is evaluated with mAP and Rank@1.
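An illustrative Rank@1 check for the evaluation mentioned in S4.3, with hypothetical identity labels: a query counts as a hit if its nearest gallery picture shares its identity. mAP would additionally average precision over the full ranking.

```python
import numpy as np

S = np.array([[0.2, 0.9, 0.5],   # toy distance matrix, 2 queries x 3 gallery
              [0.8, 0.1, 0.4]])
q_ids = np.array([7, 9])         # query identities (hypothetical)
g_ids = np.array([7, 9, 7])      # gallery identities (hypothetical)

rank1_hits = g_ids[S.argmin(axis=1)] == q_ids  # nearest neighbour matches?
rank1 = rank1_hits.mean()
assert rank1 == 1.0  # both queries' nearest gallery pictures match
```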
Table 1 below shows the recognition accuracy obtained by the method of the above embodiment of the invention, with reference methods listed above it for comparison; it can be seen that the recognition performance of the embodiment is greatly improved.
Table 1: identification accuracy results
Method | mAP | Rank@1
SVD-Net | 62.1% | 82.3%
AACN | 66.87% | 85.9%
MGCAM | 74.3% | 83.8%
HA-CNN | 75.7% | 91.2%
PCB | 81.6% | 93.8%
Method of this embodiment | 88.5% | 95.4%
In summary, the embodiment of the invention discloses a pedestrian re-identification method based on a multi-component self-attention mechanism, which constructs an identification model based on the multi-component self-attention mechanism, extracts pedestrian features in a component-by-component, multi-branch and high-fusion manner, enlarges the attention activation range, pays attention to a key region with discriminability more widely and fully, and improves the feature robustness; the method integrates a self-mapping module, a space attention module and a channel attention module to extract pedestrian features, and focuses on the regions with high discriminability more accurately. The self-mapping module is helpful for the model to pay attention to local information and global information of the model, the space attention module enables the model to pay more attention to key areas with distinguishing characteristics, and the channel attention module enables the model to integrate and summarize channels containing similar semantic information, so that classification results are more distinctive; in addition, the method improves the characteristic discrimination and the discrimination of each branch through multi-branch combined training measurement loss and classification cross entropy loss, thereby improving the identification accuracy.
Claims (7)
1. A pedestrian re-identification method based on a multi-component self-attention mechanism, comprising the steps of:
s1: pre-training a deep convolutional neural network backbone model B;
S2: split the backbone model B into B_common and B_branch, where B_branch corresponds to the last residual layer (layer4) of the backbone model B; deep-copy B_branch twice to obtain three branches B_branch1, B_branch2, B_branch3; construct a multi-component self-attention network ANet after the branches to obtain the multi-component self-attention feature F of the pedestrian;
S3: input the multi-component self-attention features into a classifier CLS, and jointly train to minimize the cross-entropy loss L_xent and the metric loss L_triplet;
S4: input the test-set pictures into the trained model, fuse the output component features to obtain an overall feature f, and rank by the distances between feature vectors to realize pedestrian re-identification.
The step S2 includes the following sub-steps:
s2.1: let BcommonAnd Bbranch1,Bbranch2,Bbranch3Corresponding to a learning parameter of WcommonAnd Wbranch1,Wbranch2,Wbranch3,Wbranch1,Wbranch2,Wbranch3The initialization parameters are consistent; pedestrian image P passing through BcommonRespectively pass through Bbranch1,Bbranch2,Bbranch3Then, the corresponding extracted feature maps are respectively F1∈RC×H×W,F2∈RC×H×W,F3∈RC×H×WWherein C is the number of channels of the feature map, H is the height of the feature map, W is the width of the feature map, and the calculation formula is as follows:
wherein T represents a matrix transposition function;
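The common-trunk/three-branch construction of steps S2 and S2.1 can be sketched in Python: `copy.deepcopy` gives the three branches identical initial parameters that then evolve independently. The dict-of-lists `branch` below is a toy stand-in for the layer4 sub-network, not the patent's actual model.

```python
import copy

# Toy stand-in for the layer4 sub-network of the backbone: just a parameter dict.
branch = {"W": [0.1, 0.2, 0.3]}

# Deep-copy the branch twice so all three branches start from identical
# parameters (step S2.1) but can diverge independently during training.
branch1 = branch
branch2 = copy.deepcopy(branch)
branch3 = copy.deepcopy(branch)

assert branch1["W"] == branch2["W"] == branch3["W"]  # identical initialization

# Updating one branch leaves the others untouched.
branch2["W"][0] = 9.9
print(branch1["W"][0], branch2["W"][0], branch3["W"][0])  # 0.1 9.9 0.1
```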
S2.2: establishing a branch after B_branch1: a local-component-based self-attention network PCPA; F_1 is input to the PCPA, which outputs the feature set F_pcpa; the PCPA network parameters are W_pcpa;
S2.3: establishing a branch after B_branch2: a global-component-based self-attention network PCA; F_2 is input to the PCA, which outputs the feature vector F_pca; the PCA network parameters are W_pca;
S2.4: establishing a branch after B_branch3: a global-component-based feature mapping network Global; F_3 is input to Global, which outputs the feature vector F_g; the Global network parameters are W_global;
S2.5: F_pcpa, F_pca, and F_g collectively constitute the multi-component self-attention feature set F.
The PCPA in said step S2.2 comprises the following sub-steps:
S2.2.1: horizontally splitting the input feature map F_1 into an upper part F_1_up and a lower part F_1_down;
S2.2.2: the input F_1_up passes through three branches: a feature mapping module Identity, a spatial self-attention module PAtt, and a channel self-attention module CAtt, whose correspondingly extracted feature maps are F_1_up_identity, F_1_up_patt, and F_1_up_catt; this comprises the following sub-steps:
(a) the feature mapping module Identity keeps the input unchanged:
F_1_up_identity = F_1_up
(b) for the input F_1_up, the spatial self-attention module PAtt specifically comprises: for arbitrary feature vectors x_i, x_j ∈ F_1_up, 1 ≤ i ≤ N, 1 ≤ j ≤ N, where N is the number of spatial positions of F_1_up, self-attention relation modeling is carried out on the spatial scale to obtain a relation matrix D ∈ R^(N×N); each value D_(i,j) of D is calculated as:
D_(i,j) = exp(x_i^T x_j) / Σ_(j'=1)^(N) exp(x_i^T x_j')
the relation matrix D is applied to F_1_up and merged into the network in the form of a residual connection, yielding the updated feature map F_1_up_patt; the calculation formula is:
F_1_up_patt = F_1_up + D · F_1_up
(c) for the input F_1_up, the channel self-attention module CAtt specifically comprises: for arbitrary channel vectors c_i, c_j ∈ F_1_up, 1 ≤ i ≤ C, 1 ≤ j ≤ C, self-attention relation modeling is carried out on the channel scale to obtain a relation matrix E ∈ R^(C×C); each value E_(i,j) of E is calculated as:
E_(i,j) = exp(c_i^T c_j) / Σ_(j'=1)^(C) exp(c_i^T c_j')
the relation matrix E is applied to F_1_up, yielding the updated feature map F_1_up_catt; the calculation formula is:
F_1_up_catt = F_1_up + E · F_1_up
S2.2.3: fusing F_1_up_identity, F_1_up_patt, and F_1_up_catt to obtain the output feature map F_1_up_pcpa; the calculation is:
F_1_up_pcpa = F_1_up_identity + F_1_up_patt + F_1_up_catt
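A minimal NumPy sketch of the three PCPA branches (Identity, PAtt, CAtt) and their additive fusion. The dot-product softmax relation matrices and the residual connections follow standard self-attention practice and are assumptions where the patent's formula images are not reproduced; all dimensions are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 3, 2                    # toy channels / height / width
N = H * W                            # number of spatial positions
F_up = rng.standard_normal((C, N))   # feature map flattened to C x N

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Spatial self-attention (PAtt): N x N relation matrix over positions.
D = softmax(F_up.T @ F_up, axis=1)   # rows sum to 1
F_patt = F_up + F_up @ D.T           # residual connection

# Channel self-attention (CAtt): C x C relation matrix over channels.
E = softmax(F_up @ F_up.T, axis=1)   # rows sum to 1
F_catt = F_up + E @ F_up             # residual connection

# Identity branch and additive fusion (step S2.2.3).
F_identity = F_up
F_pcpa = F_identity + F_patt + F_catt
print(F_pcpa.shape)                  # (4, 6)
```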
S2.2.4: F_1_down is processed in the same manner as F_1_up, obtaining the output feature map F_1_down_pcpa through the PCPA; this branch shares the parameters W_patt and W_catt with the F_1_up branch;
S2.2.5: performing global average pooling on F_1_up_pcpa and F_1_down_pcpa to obtain the feature vector set:
F_pcpa = {F_up_pcpa, F_down_pcpa}
where F_up_pcpa = AvgP(F_1_up_pcpa) and F_down_pcpa = AvgP(F_1_down_pcpa); AvgP(·) denotes the global average pooling operation, calculated per channel c as:
AvgP(F_1_up_pcpa)_c = (1 / (W_1 · H_1)) · Σ_(w=1)^(W_1) Σ_(h=1)^(H_1) x_(c,w,h)
where x_(c,w,h) is an element of the three-dimensional feature map F_1_up_pcpa, 1 ≤ c ≤ C_1, 1 ≤ w ≤ W_1, 1 ≤ h ≤ H_1; C_1 is the number of channels, W_1 the width, and H_1 the height of the feature map F_1_up_pcpa; AvgP(F_1_down_pcpa) is computed in the same way.
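The AvgP(·) operation of step S2.2.5 reduces each channel of a C×W×H map to the mean of its W·H spatial elements; a sketch with toy values:

```python
import numpy as np

# Global average pooling: each channel collapses to the mean of its
# W*H spatial elements, turning a C x W x H map into a length-C vector.
F = np.arange(24, dtype=float).reshape(2, 3, 4)   # C=2, W=3, H=4

def avg_pool(x):
    return x.mean(axis=(1, 2))    # reduce over the two spatial axes

v = avg_pool(F)
print(v.tolist())   # [5.5, 17.5]
```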
2. The pedestrian re-identification method based on the multi-component self-attention mechanism according to claim 1, wherein the step S1 is specifically: the deep convolutional neural network backbone model B adopts ResNet and is pre-trained on the ImageNet dataset so that B obtains initial values.
3. The pedestrian re-identification method based on the multi-component self-attention mechanism as claimed in claim 1, wherein the PCA in step S2.3 is specifically: the input F_2 ∈ R^(C×H×W) passes through three branches: the feature mapping module Identity, the spatial self-attention module PAtt, and the channel self-attention module CAtt, computed in the same manner as steps S2.2.2-S2.2.3 of the PCPA but without horizontal segmentation of the feature map.
4. The pedestrian re-identification method based on the multi-component self-attention mechanism according to claim 3, wherein Global in the step S2.4 is specifically: performing global average pooling on the input F_3 ∈ R^(C×H×W) to obtain the feature vector F_g; the calculation formula is:
F_g = AvgP(F_3)
where AvgP(·) denotes the global average pooling operation, calculated per channel c as:
AvgP(F_3)_c = (1 / (W_3 · H_3)) · Σ_(w=1)^(W_3) Σ_(h=1)^(H_3) x_(c,w,h)
where 1 ≤ c ≤ C_3, 1 ≤ w ≤ W_3, 1 ≤ h ≤ H_3; C_3 is the number of channels, W_3 the width, and H_3 the height of the feature map F_3; x_(c,w,h) is an element of the three-dimensional feature map F_3.
5. The method for pedestrian re-identification based on the multi-component self-attention mechanism according to claim 4, wherein the multi-component self-attention feature set in step S2.5 is F = {F_up_pcpa, F_down_pcpa, F_pca, F_g}.
6. The method for pedestrian re-identification based on the multi-component self-attention mechanism as claimed in claim 5, wherein the step S3 comprises the sub-steps of:
S3.1: for the input pedestrian images P = {p_1, p_2, p_3, ..., p_n} and the corresponding identity labels IDs = {q_1, q_2, q_3, ..., q_n}, n being the number of samples in P, obtaining through step S2 the multi-component self-attention feature F = {F_up_pcpa, F_down_pcpa, F_pca, F_g} corresponding to each pedestrian image;
S3.2: the classifier CLS(·) denotes a fully connected layer; for p_i ∈ P, 1 ≤ i ≤ n, with corresponding multi-component self-attention feature F_i = {F_i,up_pcpa, F_i,down_pcpa, F_i,pca, F_i,g}, let the weight matrices be W_cls = {W_cls1, W_cls2, W_cls3, W_cls4}, with W_cls1, W_cls2, W_cls3, W_cls4 ∈ R^(K×Z), where K is the dimension of the input feature vectors F_i,up_pcpa, F_i,down_pcpa, F_i,pca, F_i,g and Z is the output dimension, i.e., the number of pedestrian identity labels; through CLS(·), the output classification probability Pro is obtained per component, e.g. for the first component:
Pro_i,up_pcpa = softmax(W_cls1^T · F_i,up_pcpa)
and analogously for the other three components with W_cls2, W_cls3, W_cls4;
S3.3: calculating the cross-entropy loss L_xent as:
L_xent = -(1/n) · Σ_(i=1)^(n) Σ_(comp) log Pro_i,comp(q_i)
where the inner sum runs over the four component classifiers and Pro_i,comp(q_i) is the predicted probability of the true identity label q_i for sample p_i;
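A NumPy sketch of steps S3.2-S3.3 for a single component feature: a fully connected classifier followed by softmax and the cross-entropy of the true identity labels. The dimensions, random weights, and labels are illustrative, not the patent's values.

```python
import numpy as np

rng = np.random.default_rng(1)
K, Z, n = 8, 5, 3                     # feature dim, identities, batch size
Fi = rng.standard_normal((n, K))      # one component feature per sample
W_cls = rng.standard_normal((K, Z))   # fully connected classifier weights
labels = np.array([0, 3, 2])          # ground-truth identity indices q_i

logits = Fi @ W_cls
logits -= logits.max(axis=1, keepdims=True)                 # stability
Pro = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Cross-entropy: mean negative log-probability of the true identity.
L_xent = -np.mean(np.log(Pro[np.arange(n), labels]))
print(L_xent > 0)   # True
```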
S3.4: for an arbitrary input pedestrian sample p_i ∈ P = {p_1, p_2, p_3, ..., p_n} with identity label q_i ∈ IDs = {q_1, q_2, q_3, ..., q_n}, selecting from P the negative sample p_j nearest to p_i and the positive sample p_k farthest from p_i, and obtaining their multi-component self-attention features via said step S2;
S3.5: calculating the metric loss L_triplet as follows: for F_up_pcpa, denote sp_up_pcpa = ||F_i,up_pcpa - F_k,up_pcpa||_2, the distance to the farthest positive sample, and sn_up_pcpa = ||F_i,up_pcpa - F_j,up_pcpa||_2, the distance to the nearest negative sample; let m be a boundary value (margin); the metric loss of F_up_pcpa is calculated as:
L_triple_up_pcpa,i,j,k = [sp_up_pcpa - sn_up_pcpa + m]_+
where [·]_+ denotes the hinge function max(·, 0); the metric losses of F_down_pcpa, F_pca, and F_g are calculated in the same way, and L_triplet is their sum over all sampled triplets (i, j, k):
L_triplet = Σ_(i,j,k) (L_triple_up_pcpa,i,j,k + L_triple_down_pcpa,i,j,k + L_triple_pca,i,j,k + L_triple_g,i,j,k)
S3.6: jointly training to minimize the cross-entropy loss L_xent and the metric loss L_triplet; the total loss is L = L_xent + λ·L_triplet, where λ is the balance parameter between L_xent and L_triplet.
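The hard-triplet metric loss of step S3.5 and the joint objective of step S3.6 can be sketched as follows; the hinge form [sp − sn + m]_+ with Euclidean distances is an assumption based on standard triplet-loss practice, and m, λ, and the sample points are illustrative values.

```python
import numpy as np

def triplet_loss(anchor, hardest_pos, hardest_neg, m=0.3):
    """Hinge triplet loss [sp - sn + m]_+ with Euclidean distances:
    sp = distance to the farthest positive, sn = to the nearest negative."""
    sp = np.linalg.norm(anchor - hardest_pos)
    sn = np.linalg.norm(anchor - hardest_neg)
    return max(sp - sn + m, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])    # positive feature close to the anchor
ng = np.array([2.0, 0.0])   # negative feature far from the anchor
L_trip = triplet_loss(a, p, ng)
print(L_trip)               # 0.0 : this triplet already satisfies the margin

# Joint objective of step S3.6 with a balance parameter lambda.
L_xent, lam = 1.2, 0.5
L_total = L_xent + lam * L_trip
print(L_total)              # 1.2
```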
7. The method for pedestrian re-identification based on the multi-component self-attention mechanism as claimed in claim 6, wherein the step S4 comprises the sub-steps of:
S4.1: the query picture set A = {a_1, a_2, a_3, ..., a_M} and the candidate picture set G = {g_1, g_2, g_3, ..., g_T} are each passed through the backbone model B and the multi-component self-attention network ANet to obtain the corresponding feature sets:
F_A = {F_A,up_pcpa, F_A,down_pcpa, F_A,pca, F_A,g}
F_G = {F_G,up_pcpa, F_G,down_pcpa, F_G,pca, F_G,g}
S4.2: fusing the component features of F_A and F_G respectively by concatenation along the feature dimension to obtain the overall features f_A and f_G;
S4.3: calculating the Euclidean distances between f_A and f_G to construct a distance matrix S ∈ R^(M×T), and sorting by distance to obtain the retrieval candidate list; where M is the number of pictures in the query picture set A and T is the number of pictures in the candidate picture set G.
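Steps S4.1-S4.3 amount to building a pairwise Euclidean distance matrix between fused query and gallery features and ranking the gallery per query; a sketch with illustrative dimensions and random stand-in features:

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, dim = 2, 4, 6
fA = rng.standard_normal((M, dim))   # fused query features f_A
fG = rng.standard_normal((T, dim))   # fused gallery features f_G
fG[1] = fA[0]                        # plant an exact match for query 0

# Pairwise Euclidean distance matrix S of shape M x T.
S = np.linalg.norm(fA[:, None, :] - fG[None, :, :], axis=2)

# Ranking: for each query, gallery indices sorted by ascending distance.
ranking = np.argsort(S, axis=1)
print(ranking[0][0])                 # planted zero-distance match -> 1
```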
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010467045.5A CN111368815B (en) | 2020-05-28 | 2020-05-28 | Pedestrian re-identification method based on multi-component self-attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111368815A CN111368815A (en) | 2020-07-03 |
CN111368815B true CN111368815B (en) | 2020-09-04 |
Family
ID=71209699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010467045.5A Active CN111368815B (en) | 2020-05-28 | 2020-05-28 | Pedestrian re-identification method based on multi-component self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111368815B (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860291A (en) * | 2020-07-16 | 2020-10-30 | 上海交通大学 | Multi-mode pedestrian identity recognition method and system based on pedestrian appearance and gait information |
CN111914107B (en) * | 2020-07-29 | 2022-06-14 | 厦门大学 | Instance retrieval method based on multi-channel attention area expansion |
CN111931624B (en) * | 2020-08-03 | 2023-02-07 | 重庆邮电大学 | Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system |
CN112163498B (en) * | 2020-09-23 | 2022-05-27 | 华中科技大学 | Method for establishing pedestrian re-identification model with foreground guiding and texture focusing functions and application of method |
CN112633089B (en) * | 2020-12-11 | 2024-01-09 | 深圳市爱培科技术股份有限公司 | Video pedestrian re-identification method, intelligent terminal and storage medium |
CN112766156B (en) * | 2021-01-19 | 2023-11-03 | 南京中兴力维软件有限公司 | Riding attribute identification method and device and storage medium |
CN113158739B (en) * | 2021-01-28 | 2024-01-05 | 中山大学 | Method for solving re-identification of replacement person by twin network based on attention mechanism |
CN112836637B (en) * | 2021-02-03 | 2022-06-14 | 江南大学 | Pedestrian re-identification method based on space reverse attention network |
CN113029327B (en) * | 2021-03-02 | 2023-04-18 | 招商局重庆公路工程检测中心有限公司 | Tunnel fan embedded foundation damage identification method based on metric attention convolutional neural network |
CN113095221B (en) * | 2021-04-13 | 2022-10-18 | 电子科技大学 | Cross-domain pedestrian re-identification method based on attribute feature and identity feature fusion |
CN113283320A (en) * | 2021-05-13 | 2021-08-20 | 桂林安维科技有限公司 | Pedestrian re-identification method based on channel feature aggregation |
CN113191338B (en) * | 2021-06-29 | 2021-09-17 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device and equipment and readable storage medium |
CN113705880A (en) * | 2021-08-25 | 2021-11-26 | 杭州远眺科技有限公司 | Traffic speed prediction method and device based on space-time attention diagram convolutional network |
CN113420742B (en) * | 2021-08-25 | 2022-01-11 | 山东交通学院 | Global attention network model for vehicle weight recognition |
CN113920470B (en) * | 2021-10-12 | 2023-01-31 | 中国电子科技集团公司第二十八研究所 | Pedestrian retrieval method based on self-attention mechanism |
CN113837208B (en) * | 2021-10-18 | 2024-01-23 | 北京远鉴信息技术有限公司 | Method and device for determining abnormal image, electronic equipment and storage medium |
CN114120069B (en) * | 2022-01-27 | 2022-04-12 | 四川博创汇前沿科技有限公司 | Lane line detection system, method and storage medium based on direction self-attention |
US11810366B1 (en) | 2022-09-22 | 2023-11-07 | Zhejiang Lab | Joint modeling method and apparatus for enhancing local features of pedestrians |
CN115240121B (en) * | 2022-09-22 | 2023-01-03 | 之江实验室 | Joint modeling method and device for enhancing local features of pedestrians |
CN115795394A (en) * | 2022-11-29 | 2023-03-14 | 哈尔滨工业大学(深圳) | Biological feature fusion identity recognition method for hierarchical multi-modal and advanced incremental learning |
CN116704453B (en) * | 2023-08-08 | 2023-11-28 | 山东交通学院 | Method for vehicle re-identification by adopting self-adaptive division and priori reinforcement part learning network |
CN117593514B (en) * | 2023-12-08 | 2024-05-24 | 耕宇牧星(北京)空间科技有限公司 | Image target detection method and system based on deep principal component analysis assistance |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110642A (en) * | 2019-04-29 | 2019-08-09 | 华南理工大学 | A kind of pedestrian's recognition methods again based on multichannel attention feature |
CN110188611A (en) * | 2019-04-26 | 2019-08-30 | 华中科技大学 | A kind of pedestrian recognition methods and system again introducing visual attention mechanism |
CN110688938A (en) * | 2019-09-25 | 2020-01-14 | 江苏省未来网络创新研究院 | Pedestrian re-identification method integrated with attention mechanism |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9928410B2 (en) * | 2014-11-24 | 2018-03-27 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing object, and method and apparatus for training recognizer |
CN110175527B (en) * | 2019-04-29 | 2022-03-25 | 北京百度网讯科技有限公司 | Pedestrian re-identification method and device, computer equipment and readable medium |
CN110610129A (en) * | 2019-08-05 | 2019-12-24 | 华中科技大学 | Deep learning face recognition system and method based on self-attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||