CN112597866B - Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method - Google Patents

Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method Download PDF

Info

Publication number
CN112597866B
CN112597866B (application number CN202011489557.8A)
Authority
CN
China
Prior art keywords
picture
loss function
infrared
visible light
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011489557.8A
Other languages
Chinese (zh)
Other versions
CN112597866A (en)
Inventor
邵昊
高广谓
吴飞
徐国安
岳东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202011489557.8A priority Critical patent/CN112597866B/en
Publication of CN112597866A publication Critical patent/CN112597866A/en
Application granted granted Critical
Publication of CN112597866B publication Critical patent/CN112597866B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 - Distances to prototypes
    • G06F18/24137 - Distances to cluster centroïds
    • G06F18/2414 - Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/10 - Image acquisition
    • G06V10/12 - Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 - Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143 - Sensing or illuminating at different wavelengths
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method. The method is built on a residual network and comprises a feature extraction part, a feature mapping part and a loss function part. K pairs of pictures are first input to the feature extraction part for shallow feature extraction, each pair comprising a visible light picture and an infrared picture of the same target; a knowledge distillation function is introduced and its loss is calculated. The shallow feature extraction results are then input to the feature mapping part, which extracts the features shared by the visible light modality and the infrared modality. Finally, the classification results are output after passing, in sequence, through a GEM pooling layer, a batch normalization layer and a fully connected layer. The invention also designs an improved enumeration loss function, further alleviating the large modality gap between the conventional visible light image modality and the infrared image modality.

Description

Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method
Technical Field
The invention relates to the technical field of computer vision, in particular to a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method.
Background
Pedestrian re-identification is a popular research topic in computer vision. It combines computer image processing and statistical techniques and is widely applied in security, intelligent surveillance and related fields. Its main difficulties are large intra-class differences (the appearance of the same person can vary greatly) and small inter-class differences (the appearance of different persons can be very similar), which stem mainly from factors such as camera viewing angle, illumination changes, pedestrian pose changes and occlusion. Prior-art pedestrian re-identification algorithms mainly study daytime re-identification based on visible light (RGB) images. However, night scenes are also crucial in surveillance, security and related fields. Although many surveillance cameras can automatically switch from a visible light mode to an infrared mode and thus acquire both color (RGB) images and infrared images, many otherwise excellent pedestrian re-identification algorithms do not support matching between color and infrared images, because of the large modality gap between them: a visible light image has 3 channels containing color information, whereas an infrared image has only 1 channel containing invisible-light information.
At present, pedestrian re-identification algorithms for the visible light-infrared cross-modal setting fall mainly into two classes: (1) methods based on dual-stream networks; (2) methods based on generative adversarial networks. The first class addresses inter-modality differences by aligning the feature distributions of the different modalities; the second class resolves inter-modality differences through modality conversion while preserving identity information.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems of conventional pedestrian re-identification methods, such as large intra-class differences, small inter-class differences and susceptibility to occlusion, the invention provides a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method that effectively alleviates the large modality gap between the visible light image modality and the infrared image modality.
The technical scheme is as follows: in order to achieve the above purpose, the invention adopts the following technical scheme:
a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method is characterized by comprising the following steps:
Step S1: initially input K pairs of pictures into the feature extraction part and perform shallow feature extraction, each of the K pairs comprising a visible light picture and an infrared picture of the same target; the feature extraction is as follows:
I_V = F_V(i_V)
I_T = F_T(i_T)
where i_V denotes a visible light picture, F_V denotes the shallow feature extraction for visible light, and I_V denotes the features extracted from the visible light picture; i_T denotes an infrared picture, F_T denotes the shallow feature extraction for the infrared picture, and I_T denotes the features extracted from the infrared picture;
Step S2: introduce a knowledge distillation function KD Loss and, from the feature pair I_V and I_T obtained in step S1, calculate the loss function as follows:
[formula for L_KD shown as an image in the original document]
Step S3: input the feature pair I_V and I_T obtained in step S1 into the feature mapping part, and extract the modality-shared features of the visible light modality and the infrared modality as follows:
K_V = E(I_V)
K_T = E(I_T)
where E denotes the deep extraction of modality-shared features, and K_V and K_T denote the extracted shared features;
Step S4: pass the modality-shared features K_V and K_T obtained in step S3 through the GEM pooling layer, the batch normalization layer and the fully connected layer in sequence, and output the classification results as follows:
L_V = FC(BN(GEM(K_V)))
L_T = FC(BN(GEM(K_T)))
where GEM denotes the pooling operation, defined as follows:
x = ( (1/|Ω|) Σ_{u ∈ Ω} u^p )^(1/p)
where x is the feature output by the pooling operation and Ω denotes the set of activations in the input feature map; p is a hyper-parameter that can be set in advance or learned by back propagation, with p → ∞ corresponding to maximum pooling and p → 1 to average pooling;
BN denotes the batch normalization operation and FC denotes the fully connected layer;
Step S5: introduce an improved enumeration loss function;
Step S5.1: introduce the inter-class cross-modal enumeration loss function L_c as follows:
[formula for L_c shown as an image in the original document]
where {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures from different modalities, x_i is the anchor (standard) sample picture, y_i is the positive sample picture, and y_j is a negative sample picture;
Step S5.2: introduce the intra-class same-modality enumeration loss function L_s as follows:
[formula for L_s shown as an image in the original document]
where {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures from different modalities, x_i is the anchor (standard) sample picture, y_i is the positive sample picture, and x_j is a negative sample picture;
Step S5.3: introduce the compact term C as follows:
[formula for C shown as an image in the original document]
where f_r(y_i) denotes the r-th element of f(y_i), f̄(y_i) denotes the mean of f(y_i), and R is the dimension of the output deep local feature representation f(y_i);
the final enumeration loss function is as follows:
L_enumerate = L_c + L_s + λC
where λ is a balancing coefficient that weights the compact term C;
Step S6: integrate the identity information into the overall loss function; specifically, the cross entropy loss functions are designed as follows:
[formulas for L_idv and L_idt shown as images in the original document]
where N denotes the number of identity classes of the samples, L_idt denotes the cross entropy loss function of an infrared picture, L_idv denotes the cross entropy loss function of a visible light picture, q(·) denotes the predicted label, p(·) denotes the true label, x_i denotes a visible light picture, and y_i denotes an infrared picture;
Step S7: based on steps S2, S5 and S6, the final loss function is as follows:
L_total = L_enumerate + L_idv + L_idt + L_KD
has the advantages that:
according to the method, shallow feature extraction is carried out on an infrared image and a visible light image to extract modal unique features of a visible light modality and an infrared modality, then deep feature extraction is carried out on the infrared image and the visible light image in a feature mapping part to extract modal sharing features of the visible light modality and the infrared modality, and finally classification results are output through operations of pooling, batch normalization and the like. A knowledge distillation function is introduced, the difference between a visible light mode and an infrared mode in a shallow network is reduced, a common feature space with different angles can be learned by designing an improved enumeration loss function, and the included angle between mapping features can be effectively constrained through the common feature space. Most of the existing work uses euclidean metric based constraints to account for differences between different modal characteristics. However, these methods cannot learn angle-discriminating embedded features because euclidean distance cannot effectively measure the angle between embedded features, and thus the improved enumeration loss function solves this problem with cosine distances. Since a common feature space that is angularly distinguishable is particularly important for classification based on pedestrian images between mapped features, an improved enumeration loss function may better learn this space.
Drawings
Fig. 1 is an overall network framework diagram of a pedestrian re-identification method provided by the invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method is based on the residual network shown in Fig. 1. Specifically, the residual network comprises residual blocks, convolutional layers, batch normalization layers, activation function layers, fully connected layers and pooling layers.
In the figure, stage0, stage1, stage2, stage3 and stage4 respectively denote the shallow convolutional layer, the first residual block, the second residual block, the third residual block and the fourth residual block of the ResNet-50 network. The structures of the shallow convolutional layer and the residual blocks are shown in Table 1 below. GEM denotes the pooling operation, BNNeck is a batch normalization layer, and FC is a fully connected layer.
TABLE 1 Structure of the shallow convolutional layer and the residual blocks
[Table 1 is shown as an image in the original document]
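To make the structure concrete, the following PyTorch sketch (not part of the patent text; class and variable names are illustrative assumptions) shows one way to realize the two-stream design described above: stage0 is duplicated per modality for modality-specific shallow feature extraction, while stage1 to stage4 are shared. Under this reading, F_V and F_T in step S1 correspond to the two stage0 branches and E in step S3 to the shared stages.

```python
import copy
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoStreamBackbone(nn.Module):
    """Illustrative sketch of the two-stream ResNet-50 described in Fig. 1.

    stage0 (conv1 + bn1 + relu + maxpool) is duplicated per modality;
    stage1 to stage4 (the four residual blocks) are shared by both modalities.
    """
    def __init__(self):
        super().__init__()
        base = resnet50()  # randomly initialized ResNet-50; load pretrained weights as needed
        stage0 = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        self.stage0_visible = stage0                  # F_V: shallow branch for RGB pictures
        self.stage0_infrared = copy.deepcopy(stage0)  # F_T: shallow branch for infrared pictures
        self.shared = nn.Sequential(base.layer1, base.layer2,
                                    base.layer3, base.layer4)  # E: shared deep stages

    def forward(self, i_v, i_t):
        I_V = self.stage0_visible(i_v)    # modality-specific shallow features
        I_T = self.stage0_infrared(i_t)
        K_V = self.shared(I_V)            # modality-shared deep features
        K_T = self.shared(I_T)
        return I_V, I_T, K_V, K_T
```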
The re-identification method mainly comprises 3 parts, namely a feature extraction part, a feature mapping part and a loss function part, and the implementation mode of each part is described in detail below.
(1) Feature extraction section
S1, initially inputting K to a feature extraction part of the picture, and performing shallow feature extraction; each pair of the K pairs of pictures comprises a visible light picture and an infrared picture aiming at the same target; the feature extraction is as follows:
I V =F V (i V )
I T =F T (i T )
wherein i V Representing a visible light picture, F V Indicating shallow feature extraction of visible light, I V Representing features of visible light picture extraction; i.e. i T Representing an infrared picture, F T Indicating shallow feature extraction of an infrared picture, I T Representing the characteristics of infrared picture extraction;
Step S2: in order to reduce the gap between the visible light modality and the infrared modality in the shallow network, introduce a knowledge distillation function KD Loss and, from the feature pair I_V and I_T obtained in step S1, calculate the loss function as follows:
[formula for L_KD shown as an image in the original document]
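The exact expression for L_KD appears only as an image in this extract. As an assumption for illustration, the sketch below aligns the two shallow feature maps with a mean-squared-error term, one common realization of feature-level distillation; it is not necessarily the formula used in the patent.

```python
import torch.nn.functional as F

def kd_loss(I_V, I_T):
    """Hypothetical knowledge-distillation term between the shallow visible and
    infrared feature maps (the patent's exact formula is shown only as an image
    here); a plain MSE alignment is assumed for illustration."""
    return F.mse_loss(I_V, I_T)
```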
(2) Feature mapping section
Step S3: input the feature pair I_V and I_T obtained in step S1 into the feature mapping part, and extract the modality-shared features of the visible light modality and the infrared modality as follows:
K_V = E(I_V)
K_T = E(I_T)
where E denotes the deep extraction of modality-shared features, and K_V and K_T denote the extracted shared features;
Step S4: pass the modality-shared features K_V and K_T obtained in step S3 through the GEM pooling layer, the batch normalization layer and the fully connected layer in sequence, and output the classification results as follows:
L_V = FC(BN(GEM(K_V)))
L_T = FC(BN(GEM(K_T)))
where GEM denotes the pooling operation, BN denotes the batch normalization operation, and FC denotes the fully connected layer.
Because pedestrian re-identification is a fine-grained instance retrieval task, the widely used max pooling and average pooling fail to capture domain-specific discriminative features. Instead of these, a GEM pooling layer is used to convert the 3-dimensional feature maps into 1-dimensional feature vectors. Given the 3-dimensional features, the pooling is defined as follows:
x = ( (1/|Ω|) Σ_{u ∈ Ω} u^p )^(1/p)
where x is the feature output by the pooling operation and Ω denotes the set of activations in the input feature map; p is a hyper-parameter that can be set in advance or learned by back propagation, with p → ∞ corresponding to maximum pooling and p → 1 to average pooling.
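A minimal PyTorch sketch of a GEM pooling layer consistent with the formula above might look as follows; here p is a learnable parameter, one of the two options mentioned in the text, and the clamping epsilon is an implementation detail assumed for numerical stability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling: x = ((1/|Ω|) * Σ_u u^p)^(1/p).

    p → ∞ approaches max pooling, p → 1 gives average pooling."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # learnable pooling exponent
        self.eps = eps

    def forward(self, x):
        # x: (B, C, H, W) feature map -> (B, C) pooled feature vector
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, kernel_size=x.shape[-2:]).pow(1.0 / self.p)
        return x.flatten(1)
```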
after GEM pooling, in order to enable an enumeration loss function to constrain features in a free Euclidean space and simultaneously constrain features near a hypersphere for classification loss, a batch normalization layer is introduced, and then the features are obtained after dropout and activation function operation.
(3) Loss function section
Step S5, introducing an improved enumeration loss function;
this section will discuss the enumerated penalty function proposed in this patent in more detail. The enumeration loss function provided by the invention is inspired by a common triplet loss function, and the calculation formula of the common triplet loss function is as follows:
L_triplet = Σ_i [ ||f(x_i^a) - f(x_i^p)||_2^2 - ||f(x_i^a) - f(x_i^n)||_2^2 + α ]_+
where x_i^a, x_i^p and x_i^n respectively denote the anchor (standard) sample picture, the positive sample picture and the negative sample picture, f(·) denotes the feature extraction operation, ||·||_2^2 denotes the squared Euclidean distance, α denotes a preset hyper-parameter, and [z]_+ = max(z, 0).
The existing triplet loss function raises two key issues: (1) the selection of sample pictures and (2) the setting of the hyper-parameter α. In existing triplet losses, sample pictures are usually selected with an online hard/soft mining strategy and α is set manually, which leaves the question of how to select suitable sample pictures in a cross-modal scene.
Suppose there are N pairs of pictures from different modalities, {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)}, input to the network for training, where x_i and y_i form a pair of pictures from different modalities but with the same identity information. If the anchor (standard) sample picture is x_i, then the positive sample picture is y_i, but the negative sample picture can take different forms, x_j or y_j, where j denotes identity information different from i. In one form, the negative sample is selected from pictures with a modality and identity information both different from those of x_i, which gives the following cross-modal triplet loss function:
L_cross = Σ_i [ ||f(x_i) - f(y_i)||_2^2 - ||f(x_i) - f(y_j)||_2^2 + α ]_+
where L_cross denotes the resulting cross-modal triplet loss function.
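For comparison with the enumeration loss introduced next, a batch-wise sketch of this cross-modal triplet loss is given below. It is an illustrative assumption: hardest-negative mining within the batch and averaging over anchors are not specified in the text.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(f_x, f_y, labels, alpha=0.3):
    """Illustrative L_cross: anchors are visible features f(x_i), positives are
    the paired infrared features f(y_i), and negatives are infrared features
    f(y_j) of a different identity. alpha is the margin hyper-parameter."""
    dist = torch.cdist(f_x, f_y).pow(2)                # squared Euclidean distances
    pos = dist.diag()                                  # d(x_i, y_i)
    diff_id = labels.unsqueeze(0) != labels.unsqueeze(1)
    neg = dist.masked_fill(~diff_id, float('inf')).min(dim=1).values  # hardest y_j
    return F.relu(pos - neg + alpha).mean()
```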
Because the difference between the infrared modality and the visible light modality is smaller at the local level, and traditional hand-crafted local feature descriptors concentrate on reducing the difference between modalities, an enumeration loss function is proposed on this basis. The purpose of this loss function is to eliminate the local inter-modality differences as completely as possible by means of a deep convolutional network. Specifically, the enumeration loss requires that the distance between the anchor (standard) sample picture x_i and the positive sample picture y_i is smaller not only than the distance to x_j (same modality as the anchor, different identity information) but also than the distance to y_j (different modality from the anchor, different identity information).
Step S5.1: introduce the inter-class cross-modal enumeration loss function L_c as follows:
[formula for L_c shown as an image in the original document]
where {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures from different modalities, x_i is the anchor (standard) sample picture, y_i is the positive sample picture, and y_j is a negative sample picture; this embodiment assumes by default that the intra-class cross-modal variation is smaller than the inter-class cross-modal variation.
Step S5.2: introduce the intra-class same-modality enumeration loss function L_s as follows:
[formula for L_s shown as an image in the original document]
where {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures from different modalities, x_i is the anchor (standard) sample picture, y_i is the positive sample picture, and x_j is a negative sample picture; this embodiment assumes by default that the intra-class cross-modal variation is smaller than the inter-class same-modality variation.
Step S5.3: the two loss functions above help train deep local feature descriptions while ignoring inter-modality differences during the training phase. In practice, however, training with only the combination of these two losses converges with difficulty, so a compact term C is introduced to make each dimension of the generated deep local feature description as uniformly distributed as possible, yielding a more compact and informative feature description. The compact term C is introduced as follows:
[formula for C shown as an image in the original document]
where f_r(y_i) denotes the r-th element of f(y_i), f̄(y_i) denotes the mean of f(y_i), and R is the dimension of the output deep local feature representation f(y_i). The purpose of the compact term is to avoid overfitting of the network during training (in experiments the network is difficult to converge without it); it helps reduce redundancy, making the deep local features more discriminative and informative.
The final enumeration loss function is as follows:
L_enumerate = L_c + L_s + λC
where λ is a balancing coefficient that weights the compact term C;
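The exact expressions for L_c, L_s and the compact term C appear only as images in this extract, so the PyTorch sketch below is an assumption built from the surrounding description (cosine rather than Euclidean distances, intra-class cross-modal distances pushed below both inter-class cross-modal and inter-class same-modality distances, plus a compactness penalty) rather than a reproduction of the patent's formulas. The margin and the form of C are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def enumerate_loss(f_x, f_y, labels, margin=0.3, lam=0.1):
    """Assumed sketch of L_enumerate = L_c + L_s + lambda * C.

    f_x: visible-light features (N, D); f_y: infrared features (N, D);
    labels: identity labels (N,). Cosine distance 1 - cos(a, b) replaces the
    Euclidean distance, as motivated in the text."""
    x = F.normalize(f_x, dim=1)
    y = F.normalize(f_y, dim=1)
    d_cross = 1.0 - x @ y.t()                          # cross-modal cosine distances
    d_same = 1.0 - x @ x.t()                           # same-modality cosine distances
    pos = d_cross.diag()                               # d(x_i, y_i)
    diff_id = labels.unsqueeze(0) != labels.unsqueeze(1)

    # L_c: intra-class cross-modal distance < inter-class cross-modal distance d(x_i, y_j)
    neg_c = d_cross.masked_fill(~diff_id, float('inf')).min(dim=1).values
    L_c = F.relu(pos - neg_c + margin).mean()

    # L_s: intra-class cross-modal distance < inter-class same-modality distance d(x_i, x_j)
    neg_s = d_same.masked_fill(~diff_id, float('inf')).min(dim=1).values
    L_s = F.relu(pos - neg_s + margin).mean()

    # C: assumed compactness term over the elements of each infrared feature f(y_i)
    C = (f_y - f_y.mean(dim=1, keepdim=True)).pow(2).mean()
    return L_c + L_s + lam * C
```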
step S6, the characteristics of the visible and infrared images may be completely different due to cross-modality variation, and therefore, the loss function will fall into a convergence problem due to incorrect relationship metrics, and is difficult to converge for large datasets. At the same time, learned features cannot account for intra-class variations by using relationship constraints only. The identity information is thus integrated into the overall loss function in this embodiment. This is done using a cross entropy loss function that is widely used. The identity loss function will model the identity specific information to enhance robustness in the feature learning process. The cross entropy loss function is calculated as follows:
[formulas for L_idv and L_idt shown as images in the original document]
where N denotes the number of identity classes of the samples, L_idt denotes the cross entropy loss function of an infrared picture, L_idv denotes the cross entropy loss function of a visible light picture, q(·) denotes the predicted label, p(·) denotes the true label, x_i denotes a visible light picture, and y_i denotes an infrared picture;
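The identity loss is a standard cross-entropy classification loss over the identity labels. A minimal sketch is shown below; the shared classifier head, feature dimension and number of identities are illustrative assumptions.

```python
import torch
import torch.nn as nn

feat_dim = 2048          # assumed ResNet-50 feature dimension after pooling
num_identities = 395     # assumed number of identity classes N in the training set

classifier = nn.Linear(feat_dim, num_identities, bias=False)  # shared identity classifier
ce = nn.CrossEntropyLoss()

def identity_loss(feat_v, feat_t, labels):
    """L_idv + L_idt: cross-entropy on the pooled visible and infrared features."""
    L_idv = ce(classifier(feat_v), labels)   # visible-light identity loss
    L_idt = ce(classifier(feat_t), labels)   # infrared identity loss
    return L_idv, L_idt
```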
step S7, based on steps S2, S5, and S6, the final loss function is as follows:
L total =L enumerate +L idv +L idt +L KD
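Putting the pieces together, one training step under L_total could look like the following sketch, which reuses the illustrative components defined in the earlier snippets and omits the BNNeck layer, dropout and data loading for brevity; the Adam settings follow the experiment description below.

```python
import torch

model = TwoStreamBackbone()   # two-stream ResNet-50 sketch from above
gem = GeM()
params = list(model.parameters()) + list(gem.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)   # initial learning rate 1e-4

def train_step(i_v, i_t, labels):
    I_V, I_T, K_V, K_T = model(i_v, i_t)          # shallow and deep features
    f_v, f_t = gem(K_V), gem(K_T)                 # pooled 1-D feature vectors
    L_KD = kd_loss(I_V, I_T)                      # shallow-feature distillation term
    L_enum = enumerate_loss(f_v, f_t, labels)     # improved enumeration loss
    L_idv, L_idt = identity_loss(f_v, f_t, labels)
    loss = L_enum + L_idv + L_idt + L_KD          # L_total
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```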
in the experiment, an Adam optimizer is selected to optimize the model, the initial learning rate is set to be 1 x 10-4, and the results of part of the experiment are shown in the following tables 2-3. The present invention is still optimal in accuracy on the SYSU-MM01 dataset without using pre-training experimental processing methods. Compared with a Hi-cmd method, the rank1 value is improved by 8.29%, the index is most important in an actual application scene, and other indexes are obviously improved. Meanwhile, on the RegDB data set, the present invention is still optimal in terms of accuracy without using a pre-training experimental processing method. Compared with the best Edfl method, rank1 is improved by 17, 72%, and other indexes such as Map and the like are also improved obviously. In conclusion, the improvement of the two data sets in the invention is a great difference in the pedestrian re-identification field.
TABLE 2 Experimental results on the SYSU-MM01 dataset
[Table 2 is shown as an image in the original document]
TABLE 3 Experimental results on the RegDB dataset

Method  | Rank 1 | Rank 10 | Rank 20 | mAP
Zero[1] | 17.75  | 34.21   | 44.35   | 18.9
Hcml[2] | 24.44  | 47.53   | 56.78   | 20.8
Hsme[3] | 50.85  | 73.36   | 81.66   | 47
Edfl[5] | 52.58  | 72.1    | 81.47   | 52.98
Ours    | 70.3   | 80.31   | 87.25   | 69.32
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (1)

1. A knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method is characterized by comprising the following steps:
Step S1: initially input K pairs of pictures into the feature extraction part and perform shallow feature extraction, each of the K pairs comprising a visible light picture and an infrared picture of the same target; the feature extraction is as follows:
I_V = F_V(i_V)
I_T = F_T(i_T)
where i_V denotes a visible light picture, F_V denotes the shallow feature extraction for visible light, and I_V denotes the features extracted from the visible light picture; i_T denotes an infrared picture, F_T denotes the shallow feature extraction for the infrared picture, and I_T denotes the features extracted from the infrared picture;
Step S2: introduce a knowledge distillation function KD Loss and, from the feature pair I_V and I_T obtained in step S1, calculate the loss function as follows:
[formula for L_KD shown as an image in the original document]
Step S3: input the feature pair I_V and I_T obtained in step S1 into the feature mapping part, and extract the modality-shared features of the visible light modality and the infrared modality as follows:
K_V = E(I_V)
K_T = E(I_T)
where E denotes the deep extraction of modality-shared features, and K_V and K_T denote the extracted shared features;
Step S4: pass the modality-shared features K_V and K_T obtained in step S3 through the GEM pooling layer, the batch normalization layer and the fully connected layer in sequence, and output the classification results as follows:
L_V = FC(BN(GEM(K_V)))
L_T = FC(BN(GEM(K_T)))
where GEM denotes the pooling operation, defined as follows:
x = ( (1/|Ω|) Σ_{u ∈ Ω} u^p )^(1/p)
where x is the feature output by the pooling operation and Ω denotes the set of activations in the input feature map; p is a hyper-parameter that can be set in advance or learned by back propagation, with p → ∞ corresponding to maximum pooling and p → 1 to average pooling;
BN denotes the batch normalization operation and FC denotes the fully connected layer;
Step S5: introduce an improved enumeration loss function;
Step S5.1: introduce the inter-class cross-modal enumeration loss function L_c as follows:
[formula for L_c shown as an image in the original document]
where {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures from different modalities, x_i is the anchor (standard) sample picture, y_i is the positive sample picture, and y_j is a negative sample picture;
Step S5.2: introduce the intra-class same-modality enumeration loss function L_s as follows:
[formula for L_s shown as an image in the original document]
where {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures from different modalities, x_i is the anchor (standard) sample picture, y_i is the positive sample picture, and x_j is a negative sample picture;
Step S5.3: introduce the compact term C as follows:
[formula for C shown as an image in the original document]
where f_r(y_i) denotes the r-th element of f(y_i), f̄(y_i) denotes the mean of f(y_i), and R is the dimension of the output deep local feature representation f(y_i);
The final enumeration loss function is as follows:
L_enumerate = L_c + L_s + λC
where λ is a balancing coefficient that weights the compact term C;
Step S6: integrate the identity information into the overall loss function; specifically, the cross entropy loss functions are designed as follows:
[formulas for L_idv and L_idt shown as images in the original document]
where N denotes the number of identity classes of the samples, L_idt denotes the cross entropy loss function of an infrared picture, L_idv denotes the cross entropy loss function of a visible light picture, q(·) denotes the predicted label, p(·) denotes the true label, x_i denotes a visible light picture, and y_i denotes an infrared picture;
Step S7: based on steps S2, S5 and S6, the final loss function is as follows:
L_total = L_enumerate + L_idv + L_idt + L_KD.
CN202011489557.8A 2020-12-16 2020-12-16 Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method Active CN112597866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011489557.8A CN112597866B (en) 2020-12-16 2020-12-16 Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011489557.8A CN112597866B (en) 2020-12-16 2020-12-16 Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method

Publications (2)

Publication Number Publication Date
CN112597866A (en) 2021-04-02
CN112597866B (en) 2022-08-02

Family

ID=75196844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011489557.8A Active CN112597866B (en) 2020-12-16 2020-12-16 Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN112597866B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128460B (en) * 2021-05-06 2022-11-08 东南大学 Knowledge distillation-based multi-resolution pedestrian re-identification method
CN113269117B (en) * 2021-06-04 2022-12-13 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN113283362B (en) * 2021-06-04 2024-03-22 中国矿业大学 Cross-mode pedestrian re-identification method
CN114220124B (en) * 2021-12-16 2024-07-12 华南农业大学 Near infrared-visible light cross-mode double-flow pedestrian re-identification method and system
CN114550220B (en) * 2022-04-21 2022-09-09 中国科学技术大学 Training method of pedestrian re-recognition model and pedestrian re-recognition method
CN114694185B (en) * 2022-05-31 2022-11-04 浪潮电子信息产业股份有限公司 Cross-modal target re-identification method, device, equipment and medium
CN115115919B (en) * 2022-06-24 2023-05-05 国网智能电网研究院有限公司 Power grid equipment thermal defect identification method and device
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717411A (en) * 2019-09-23 2020-01-21 湖北工业大学 Pedestrian re-identification method based on deep layer feature fusion
CN110796026A (en) * 2019-10-10 2020-02-14 湖北工业大学 Pedestrian re-identification method based on global feature stitching
CN111931637A (en) * 2020-08-07 2020-11-13 华南理工大学 Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717411A (en) * 2019-09-23 2020-01-21 湖北工业大学 Pedestrian re-identification method based on deep layer feature fusion
CN110796026A (en) * 2019-10-10 2020-02-14 湖北工业大学 Pedestrian re-identification method based on global feature stitching
CN111931637A (en) * 2020-08-07 2020-11-13 华南理工大学 Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on cross-modal pedestrian re-identification based on generative adversarial networks; Feng Min et al.; Modern Information Technology (《现代信息科技》); 2020-02-25 (Issue 04); full text *

Also Published As

Publication number Publication date
CN112597866A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN112597866B (en) Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method
CN107577990B (en) Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval
Zhang et al. Chinese sign language recognition with adaptive HMM
Wang et al. Large-scale isolated gesture recognition using convolutional neural networks
US20190057299A1 (en) System for building a map and subsequent localization
WO2021103721A1 (en) Component segmentation-based identification model training and vehicle re-identification methods and devices
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN105184238A (en) Human face recognition method and system
CN108596010B (en) Implementation method of pedestrian re-identification system
JPH06150000A (en) Image clustering device
WO2023279935A1 (en) Target re-recognition model training method and device, and target re-recognition method and device
CN110516533A (en) A kind of pedestrian based on depth measure discrimination method again
CN117746467A (en) Modal enhancement and compensation cross-modal pedestrian re-recognition method
CN113076891A (en) Human body posture prediction method and system based on improved high-resolution network
CN112084895A (en) Pedestrian re-identification method based on deep learning
TW202125323A (en) Processing method of learning face recognition by artificial intelligence module
CN117351518B (en) Method and system for identifying unsupervised cross-modal pedestrian based on level difference
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN112232147B (en) Method, device and system for self-adaptive acquisition of super-parameters of face model
CN116935329A (en) Weak supervision text pedestrian retrieval method and system for class-level comparison learning
Ran et al. Improving visible-thermal ReID with structural common space embedding and part models
CN109740405B (en) Method for detecting front window difference information of non-aligned similar vehicles
CN111738039A (en) Pedestrian re-identification method, terminal and storage medium
CN114972146A (en) Image fusion method and device based on generation countermeasure type double-channel weight distribution
CN110941994B (en) Pedestrian re-identification integration method based on meta-class-based learner

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant