CN112597866B - Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method
- Publication number
- CN112597866B (application number CN202011489557.8A)
- Authority
- CN
- China
- Prior art keywords
- picture
- loss function
- infrared
- visible light
- Prior art date
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention discloses a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method that is based on a residual network and comprises a feature extraction part, a feature mapping part and a loss function part. K pairs of pictures are first input to the feature extraction part for shallow feature extraction, each pair comprising a visible light picture and an infrared picture of the same target; a knowledge distillation function is introduced and its loss is calculated. The shallow feature extraction results are then input to the feature mapping part, which extracts the modality-shared features of the visible light and infrared modalities. Finally, the classification results are output after passing through a GEM pooling layer, a batch normalization layer and a fully connected layer in sequence. The invention also designs an improved enumeration loss function, further alleviating the large modality gap between the traditional visible light image modality and the infrared image modality.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method.
Background
Pedestrian re-identification is a popular research topic in the field of computer vision; it integrates computer image processing and statistical techniques and is widely applied in security, intelligent surveillance and related fields. Its difficulties include large intra-class differences (the appearance of the same person can vary greatly) and small inter-class differences (the appearances of different people can be very similar), mainly caused by factors such as camera shooting angle, illumination differences, pedestrian posture changes and occlusion. Pedestrian re-identification algorithms in the prior art mainly study daytime re-identification based on visible light (RGB) images, yet night scenes are also key settings for surveillance and security. Although many surveillance cameras can automatically switch from a visible light mode to an infrared mode and thus acquire both color (RGB) and infrared images, many otherwise excellent pedestrian re-identification algorithms do not support matching between color and infrared images, because of the large modality gap between them: a visible light image has 3 channels containing color information, while an infrared image has only 1 channel containing invisible-light information.
At present, visible light-infrared cross-modal pedestrian re-identification algorithms fall mainly into two types: (1) methods based on dual-stream networks, which address inter-modal differences by aligning the feature distributions of the different modalities; and (2) methods based on generative adversarial networks, which resolve inter-modal differences through modality conversion while preserving identity information.
Disclosure of Invention
Purpose of the invention: aiming at the problems of large intra-class difference, small inter-class difference and frequent occlusion in traditional pedestrian re-identification, the invention provides a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method that effectively alleviates the large modality gap between the visible light modality and the infrared image modality.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
a knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method is characterized by comprising the following steps:
S1, initially inputting K pairs of pictures into the feature extraction part and performing shallow feature extraction, where each of the K pairs comprises a visible light picture and an infrared picture of the same target; the feature extraction is as follows:
I_V = F_V(i_V)
I_T = F_T(i_T)
wherein i_V denotes a visible light picture, F_V denotes shallow feature extraction of the visible light picture, and I_V denotes the extracted visible light features; i_T denotes an infrared picture, F_T denotes shallow feature extraction of the infrared picture, and I_T denotes the extracted infrared features;
step S2, introducing a knowledge distillation function KD Loss and computing the distillation loss L_KD over the feature pair I_V and I_T obtained in step S1;
step S3, inputting the feature pair I_V and I_T obtained in step S1 into the feature mapping part, and extracting the modality-shared features of the visible light and infrared modalities as follows:
K_V = E(I_V)
K_T = E(I_T)
wherein E denotes the operation of deeply extracting modality-shared features, and K_V and K_T denote the extracted shared features;
step S4, passing the modality-shared features K_V and K_T obtained in step S3 through a GEM pooling layer, a batch normalization layer and a fully connected layer in sequence, and outputting the classification results:
L_V = FC(BN(GEM(K_V)))
L_T = FC(BN(GEM(K_T)))
wherein GEM represents pooling operations as follows:
x_c = ( (1/(H·W)) · Σ_{h,w} u_{c,h,w}^p )^{1/p}
wherein u is the 3-dimensional input feature map of size C×H×W and x is the feature output by the pooling operation; p is a hyper-parameter, which can be set in advance or learned by back propagation, and corresponds to maximum pooling when p → ∞ and to average pooling when p → 1;
BN represents batch normalization operation, FC represents full connection layer;
step S5, introducing an improved enumeration loss function;
step S5.1, introducing the inter-class cross-modal enumeration loss function L_c as follows:
wherein {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures of different modalities, x_i is the standard sample picture, y_i is the positive sample picture, and y_j is the negative sample picture;
step S5.2, introducing the intra-class same-modality enumeration loss function L_s as follows:
wherein {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures of different modalities, x_i is the standard sample picture, y_i is the positive sample picture, and x_j is the negative sample picture;
step S5.3, introducing the compact term C as follows:
wherein f_r(y_i) denotes the r-th element of f(y_i), f̄(y_i) denotes the mean of the elements of f(y_i), and R is the dimension of the output deep local feature representation f(y_i);
the final enumeration loss function is as follows:
L_enumerate = L_c + L_s + λC
wherein λ is a balance coefficient that weights the compact term C;
step S6, integrating the identity information into the overall loss function; specifically, the cross entropy loss function is designed as follows:
wherein N denotes the number of identity classes of the samples, L_idt denotes the cross entropy loss function of the infrared pictures, L_idv denotes the cross entropy loss function of the visible light pictures, q(·) denotes the predicted label, p(·) denotes the true label, x_i denotes a visible light picture, and y_i denotes an infrared picture;
step S7, based on steps S2, S5, and S6, the final loss function is as follows:
L_total = L_enumerate + L_idv + L_idt + L_KD.
Advantageous effects:
according to the method, shallow feature extraction is carried out on an infrared image and a visible light image to extract modal unique features of a visible light modality and an infrared modality, then deep feature extraction is carried out on the infrared image and the visible light image in a feature mapping part to extract modal sharing features of the visible light modality and the infrared modality, and finally classification results are output through operations of pooling, batch normalization and the like. A knowledge distillation function is introduced, the difference between a visible light mode and an infrared mode in a shallow network is reduced, a common feature space with different angles can be learned by designing an improved enumeration loss function, and the included angle between mapping features can be effectively constrained through the common feature space. Most of the existing work uses euclidean metric based constraints to account for differences between different modal characteristics. However, these methods cannot learn angle-discriminating embedded features because euclidean distance cannot effectively measure the angle between embedded features, and thus the improved enumeration loss function solves this problem with cosine distances. Since a common feature space that is angularly distinguishable is particularly important for classification based on pedestrian images between mapped features, an improved enumeration loss function may better learn this space.
Drawings
Fig. 1 is an overall network framework diagram of a pedestrian re-identification method provided by the invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method is based on the residual network shown in Fig. 1. Specifically, the residual network comprises residual blocks, convolutional layers, batch normalization layers, activation function layers, fully connected layers and pooling layers.
In the figure, stage0, stage1, stage2, stage3 and stage4 respectively denote the shallow convolutional layer and the first, second, third and fourth residual blocks of the ResNet-50 network. The structures of the shallow convolutional layer and the residual blocks are shown in Table 1 below. GEM denotes the pooling operation, BNNeck is a batch normalization layer, and FC is a fully connected layer.
Table 1: Structure of the shallow convolutional layer and the residual blocks
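For concreteness, the following is a minimal PyTorch sketch of this two-stream backbone. It assumes that stage0 is duplicated per modality to produce the modality-specific shallow features (the F_V and F_T of the feature extraction part) while stages 1-4 are shared and play the role of the deep mapping E; this split, the use of torchvision's resnet50 stages, and the replication of the 1-channel infrared picture to 3 channels are stated assumptions, not verbatim details from the patent.

```python
import torch.nn as nn
from torchvision.models import resnet50

class TwoStreamBackbone(nn.Module):
    """Sketch: modality-specific stage0 (F_V / F_T) + shared stages 1-4 (E)."""
    def __init__(self):
        super().__init__()
        def make_stage0():
            r = resnet50(weights=None)
            # conv1 expects 3 channels; infrared pictures are assumed to be
            # replicated to 3 channels before being fed in
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stage0_vis = make_stage0()  # F_V: visible-light stream
        self.stage0_ir = make_stage0()   # F_T: infrared stream
        r = resnet50(weights=None)
        self.shared = nn.Sequential(r.layer1, r.layer2, r.layer3, r.layer4)  # E

    def forward(self, x_vis, x_ir):
        i_v = self.stage0_vis(x_vis)  # I_V: shallow visible features
        i_t = self.stage0_ir(x_ir)    # I_T: shallow infrared features
        k_v = self.shared(i_v)        # K_V: modality-shared features
        k_t = self.shared(i_t)        # K_T
        return (i_v, i_t), (k_v, k_t)
```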
The re-identification method mainly comprises 3 parts, namely a feature extraction part, a feature mapping part and a loss function part, and the implementation mode of each part is described in detail below.
(1) Feature extraction part
S1, initially inputting K to a feature extraction part of the picture, and performing shallow feature extraction; each pair of the K pairs of pictures comprises a visible light picture and an infrared picture aiming at the same target; the feature extraction is as follows:
I V =F V (i V )
I T =F T (i T )
wherein i V Representing a visible light picture, F V Indicating shallow feature extraction of visible light, I V Representing features of visible light picture extraction; i.e. i T Representing an infrared picture, F T Indicating shallow feature extraction of an infrared picture, I T Representing the characteristics of infrared picture extraction;
step S2, in order to reduce the difference between the visible light modality and the infrared modality in the shallow network, introducing a knowledge distillation function KD Loss and computing the distillation loss L_KD over the feature pair I_V and I_T obtained in step S1.
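The exact KD Loss formula appears only as an image in the original patent and is not reproduced in this text. Purely as a hedged illustration of the role it plays, the sketch below instantiates it as a mean-squared-error term that pulls the shallow feature maps of a visible/infrared pair together; the MSE form is an assumption, not the patent's formula.

```python
import torch
import torch.nn.functional as F

def kd_loss(i_v: torch.Tensor, i_t: torch.Tensor) -> torch.Tensor:
    """Hypothetical KD Loss: align the shallow features of the two modalities.

    i_v, i_t: shallow feature maps I_V and I_T of a visible/infrared pair,
    shape (K, C, H, W). The MSE form is an illustrative assumption.
    """
    return F.mse_loss(i_v, i_t)
```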
(2) Feature mapping part
step S3, inputting the feature pair I_V and I_T obtained in step S1 into the feature mapping part, and extracting the modality-shared features of the visible light and infrared modalities as follows:
K_V = E(I_V)
K_T = E(I_T)
wherein E denotes the operation of deeply extracting modality-shared features, and K_V and K_T denote the extracted shared features;
step S4, passing the modality-shared features K_V and K_T obtained in step S3 through a GEM pooling layer, a batch normalization layer and a fully connected layer in sequence, and outputting the classification results:
L_V = FC(BN(GEM(K_V)))
L_T = FC(BN(GEM(K_T)))
where GEM represents a pooling approach, BN represents batch normalization operations, and FC represents the fully connected layer.
Since cross-modal pedestrian re-identification is a fine-grained instance retrieval task, the widely used maximum pooling or average pooling fails to capture domain-specific discriminative features. Instead of these, a GEM pooling layer is used to convert the 3-dimensional features into 1-dimensional feature vectors. Given a 3-dimensional input feature map u of size C×H×W, each channel c of the pooled output x is computed as
x_c = ( (1/(H·W)) · Σ_{h,w} u_{c,h,w}^p )^{1/p}
where x is the feature output by the pooling operation and p is a hyper-parameter, which can be set in advance or learned by back propagation; p → ∞ corresponds to maximum pooling and p → 1 to average pooling.
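A minimal PyTorch sketch of this GEM pooling layer follows; the learnable p matches the text, while the clamping epsilon is an implementation detail assumed here for numerical stability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling: p -> inf approaches max pooling, p = 1 is average pooling."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6, learn_p: bool = True):
        super().__init__()
        # p can be preset or learned by back propagation, as the text states
        self.p = nn.Parameter(torch.tensor(p)) if learn_p else p
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (K, C, H, W) feature map -> (K, C) feature vector
        x = x.clamp(min=self.eps).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)
```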
After GEM pooling, a batch normalization layer is introduced so that the enumeration loss function can constrain the features in free Euclidean space while the classification loss constrains the features near the surface of a hypersphere; the features are then obtained after dropout and activation function operations.
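A sketch of this head under stated assumptions: the feature dimension 2048 matches ResNet-50's last stage, while the number of identity classes, the dropout rate and the exact placement of the activation are illustrative choices not specified in this text.

```python
import torch.nn as nn

class ClassifierHead(nn.Module):
    """BNNeck-style head: GEM feature -> BN -> activation/dropout -> FC logits.

    num_classes, the dropout rate and the ReLU placement are illustrative
    assumptions, not values taken from the patent.
    """
    def __init__(self, feat_dim: int = 2048, num_classes: int = 395):
        super().__init__()
        self.bnneck = nn.BatchNorm1d(feat_dim)
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(p=0.5)
        self.fc = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, x):
        # x: pre-BN feature, constrained by the enumeration loss in free
        # Euclidean space; x_bn: post-BN feature, used by the classification loss
        x_bn = self.bnneck(x)
        logits = self.fc(self.drop(self.relu(x_bn)))
        return logits, x_bn
```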
(3) Loss function part
Step S5, introducing an improved enumeration loss function;
this section will discuss the enumerated penalty function proposed in this patent in more detail. The enumeration loss function provided by the invention is inspired by a common triplet loss function, and the calculation formula of the common triplet loss function is as follows:
wherein x i a ,And x i n Respectively representing a standard sample picture, an active sample picture and a passive sample picture, f() Representing a feature extraction operation, | () | luminance 2 2 Represents the squaring operation of Euclidean distance, alpha represents a preset hyper-parameter, [ z ]] + =max(z,0)。
The existing triplet loss function has two key problems: (1) the selection of the sample pictures, and (2) the setting of the hyper-parameter α. The sample pictures are usually selected by an online hard/soft mining strategy, and α is set manually; this raises the question of how to select suitable sample pictures in a cross-modal scene.
Suppose there are N pairs of pictures of different modalities, {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)}, input to the network for training, where x_i and y_i are a pair of pictures of different modalities but the same identity information. If the standard sample picture is x_i, then the positive sample picture is y_i, but the negative sample picture can take two different forms, x_j or y_j, where j denotes identity information different from i; choosing y_j, for example, selects a negative sample of a different modality and different identity information from x_i. The cross-modal triplet loss function L_cross is computed accordingly.
Because the difference between the infrared modality and the visible light modality is smaller at the local level, and traditional hand-crafted local feature descriptors concentrate on reducing the differences between modalities, an enumeration loss function is proposed on this basis. The purpose of this loss function is to eliminate inter-modal local differences as completely as possible by means of a deep convolutional network; specifically, the enumeration loss requires the distance between the standard sample picture x_i and the positive sample picture y_i to be smaller not only than the distance to x_j (same modality as the standard sample, different identity information) but also than the distance to y_j (different modality from the standard sample, different identity information).
Step S5.1, introducing inter-class cross-modal enumeration loss function L c The following were used:
wherein { (x) 1 ,y 1 ),(x 2 ,y 2 ),…(x i ,y i )…(x n ,y n ) Denotes a picture with N pairs of different modalities, x i As a standard sample picture, y i For positive sample pictures, y j A negative sample picture; the present embodiment defaults cross-modal changes within a class to be less than cross-modal changes between classes.
Step S5.2, introducing in-class homomorphic enumeration loss function L s The following were used:
wherein { (x) 1 ,y 1 ),(x 2 ,y 2 ),…(x i ,y i )…(x n ,y n ) Denotes a picture with N pairs of different modalities, x i As a standard sample picture, y i For an active sample picture, x j A negative sample picture; the same cross-modal variation within a default class of the present embodiment is less than the same modal variation between classes.
Step S5.3, the two loss functions can help to train deep local feature description when ignoring inter-modal differences in the training phase. But in the practical experimental process, convergence based on the combination of the two loss functions is difficult, so a tight term C is introduced to ensure that each dimension in the generated deep local feature description is distributed as uniformly as possible, so that the obtained feature description is more compact and information-rich, and the tight term C is introduced as follows:
wherein f is r (y i ) Denotes f (y) i ) The (c) th element of (a),denotes f (y) i ) R is the output deep local feature representation f (y) i ) Dimension (d); the purpose of the compact term is to avoid overfitting of the network in the training process, in the experiment, if no network loss exists, the network is difficult to converge, and the compact term can help reduce redundancy, so that the deep local features are more discriminative and informative.
The final enumeration loss function is as follows:
L_enumerate = L_c + L_s + λC
wherein λ is a balance coefficient that weights the compact term C;
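The exact L_c, L_s and C formulas likewise appear only as images in the original patent. Purely as a hedged structural illustration — assuming cosine distances (per the angle-based motivation in the advantageous effects), a margin-free hinge enumerated over all negatives, and a variance-style compact term, all of which are assumptions — a sketch might look like:

```python
import torch
import torch.nn.functional as F

def enumeration_loss(f_x, f_y, labels, lam: float = 0.01):
    """Hypothetical sketch of L_enumerate = L_c + L_s + lam * C.

    f_x[i], f_y[i]: features of a visible/infrared pair with identity
    labels[i]; cosine distance 1 - cos(u, v) is assumed throughout.
    """
    fx, fy = F.normalize(f_x, dim=1), F.normalize(f_y, dim=1)
    pos = 1 - (fx * fy).sum(dim=1)                     # d(x_i, y_i)
    diff = labels.unsqueeze(0) != labels.unsqueeze(1)  # identities j != i
    d_xy = 1 - fx @ fy.t()                             # cross-modal distances
    d_xx = 1 - fx @ fx.t()                             # same-modal distances
    # L_c: d(x_i, y_i) smaller than every cross-modal negative d(x_i, y_j)
    l_c = torch.clamp(pos.unsqueeze(1) - d_xy, min=0)[diff].mean()
    # L_s: d(x_i, y_i) smaller than every same-modal negative d(x_i, x_j)
    l_s = torch.clamp(pos.unsqueeze(1) - d_xx, min=0)[diff].mean()
    # C: keep the dimensions of each f(y_i) evenly distributed (variance form assumed)
    c = fy.var(dim=1).mean()
    return l_c + l_s + lam * c
```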
step S6, because cross-modal variation can make the features of visible light and infrared pictures completely different, a loss function built only on relationship metrics can run into convergence problems caused by incorrect relationship measurements and is difficult to converge on large datasets; at the same time, features learned from relationship constraints alone cannot account for intra-class variation. The identity information is therefore integrated into the overall loss function in this embodiment, using the widely adopted cross entropy loss; the identity loss models identity-specific information to enhance robustness during feature learning. The cross entropy loss function is calculated as follows:
wherein N denotes the number of identity classes of the samples, L_idt denotes the cross entropy loss function of the infrared pictures, L_idv denotes the cross entropy loss function of the visible light pictures, q(·) denotes the predicted label, p(·) denotes the true label, x_i denotes a visible light picture, and y_i denotes an infrared picture;
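Since this identity loss is the standard cross entropy over identity labels, a brief sketch suffices (logits_v and logits_t are the L_V and L_T outputs of step S4):

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def identity_loss(logits_v, logits_t, ids):
    """L_idv + L_idt: cross entropy of the visible and infrared logits
    against the shared identity labels of each picture pair."""
    return ce(logits_v, ids) + ce(logits_t, ids)
```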
step S7, based on steps S2, S5, and S6, the final loss function is as follows:
L_total = L_enumerate + L_idv + L_idt + L_KD.
in the experiment, an Adam optimizer is selected to optimize the model, the initial learning rate is set to be 1 x 10-4, and the results of part of the experiment are shown in the following tables 2-3. The present invention is still optimal in accuracy on the SYSU-MM01 dataset without using pre-training experimental processing methods. Compared with a Hi-cmd method, the rank1 value is improved by 8.29%, the index is most important in an actual application scene, and other indexes are obviously improved. Meanwhile, on the RegDB data set, the present invention is still optimal in terms of accuracy without using a pre-training experimental processing method. Compared with the best Edfl method, rank1 is improved by 17, 72%, and other indexes such as Map and the like are also improved obviously. In conclusion, the improvement of the two data sets in the invention is a great difference in the pedestrian re-identification field.
Table 2: Experimental results on the SYSU-MM01 dataset
Table 3: Experimental results on the RegDB dataset

Method | Rank 1 | Rank 10 | Rank 20 | mAP
---|---|---|---|---
Zero[1] | 17.75 | 34.21 | 44.35 | 18.90
Hcml[2] | 24.44 | 47.53 | 56.78 | 20.80
Hsme[3] | 50.85 | 73.36 | 81.66 | 47.00
Edfl[5] | 52.58 | 72.10 | 81.47 | 52.98
Ours | 70.30 | 80.31 | 87.25 | 69.32
The above description covers only preferred embodiments of the present invention. It should be noted that various modifications and adaptations will be apparent to those skilled in the art without departing from the principles of the invention, and these are also intended to fall within the scope of the invention.
Claims (1)
1. A knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method is characterized by comprising the following steps:
S1, initially inputting K pairs of pictures into the feature extraction part and performing shallow feature extraction, where each of the K pairs comprises a visible light picture and an infrared picture of the same target; the feature extraction is as follows:
I_V = F_V(i_V)
I_T = F_T(i_T)
wherein i_V denotes a visible light picture, F_V denotes shallow feature extraction of the visible light picture, and I_V denotes the extracted visible light features; i_T denotes an infrared picture, F_T denotes shallow feature extraction of the infrared picture, and I_T denotes the extracted infrared features;
step S2, introducing a knowledge distillation function KD Loss and computing the distillation loss L_KD over the feature pair I_V and I_T obtained in step S1;
step S3, inputting the feature pair I_V and I_T obtained in step S1 into the feature mapping part, and extracting the modality-shared features of the visible light and infrared modalities as follows:
K_V = E(I_V)
K_T = E(I_T)
wherein E denotes the operation of deeply extracting modality-shared features, and K_V and K_T denote the extracted shared features;
step S4, passing the modality-shared features K_V and K_T obtained in step S3 through a GEM pooling layer, a batch normalization layer and a fully connected layer in sequence, and outputting the classification results:
L_V = FC(BN(GEM(K_V)))
L_T = FC(BN(GEM(K_T)))
wherein GEM represents pooling operations as follows:
x_c = ( (1/(H·W)) · Σ_{h,w} u_{c,h,w}^p )^{1/p}
wherein u is the 3-dimensional input feature map of size C×H×W and x is the feature output via the pooling operation; p is a hyper-parameter, set in advance or acquired by back propagation learning; p → ∞ corresponds to maximum pooling and p → 1 to average pooling;
BN represents batch normalization operation, FC represents full connection layer;
step S5, introducing an improved enumeration loss function;
step S5.1, introducing the inter-class cross-modal enumeration loss function L_c as follows:
wherein {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures of different modalities, x_i is the standard sample picture, y_i is the positive sample picture, and y_j is the negative sample picture;
step S5.2, introducing the intra-class same-modality enumeration loss function L_s as follows:
wherein {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)} denotes N pairs of pictures of different modalities, x_i is the standard sample picture, y_i is the positive sample picture, and x_j is the negative sample picture;
step S5.3, introducing the compact term C as follows:
wherein f_r(y_i) denotes the r-th element of f(y_i), f̄(y_i) denotes the mean of the elements of f(y_i), and R is the dimension of the output deep local feature representation f(y_i);
the final enumeration loss function is as follows:
L_enumerate = L_c + L_s + λC
wherein λ is a balance coefficient that weights the compact term C;
step S6, integrating the identity information into the overall loss function; specifically, the cross entropy loss function is designed as follows:
wherein N denotes the number of identity classes of the samples, L_idt denotes the cross entropy loss function of the infrared pictures, L_idv denotes the cross entropy loss function of the visible light pictures, q(·) denotes the predicted label, p(·) denotes the true label, x_i denotes a visible light picture, and y_i denotes an infrared picture;
step S7, based on steps S2, S5, and S6, the final loss function is as follows:
L_total = L_enumerate + L_idv + L_idt + L_KD.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011489557.8A (granted as CN112597866B) | 2020-12-16 | 2020-12-16 | Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112597866A CN112597866A (en) | 2021-04-02 |
CN112597866B true CN112597866B (en) | 2022-08-02 |
Family
ID=75196844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011489557.8A (Active) | Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method | 2020-12-16 | 2020-12-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112597866B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113128460B (en) * | 2021-05-06 | 2022-11-08 | 东南大学 | Knowledge distillation-based multi-resolution pedestrian re-identification method |
CN113269117B (en) * | 2021-06-04 | 2022-12-13 | 重庆大学 | Knowledge distillation-based pedestrian re-identification method |
CN113283362B (en) * | 2021-06-04 | 2024-03-22 | 中国矿业大学 | Cross-mode pedestrian re-identification method |
CN114220124B (en) * | 2021-12-16 | 2024-07-12 | 华南农业大学 | Near infrared-visible light cross-mode double-flow pedestrian re-identification method and system |
CN114550220B (en) * | 2022-04-21 | 2022-09-09 | 中国科学技术大学 | Training method of pedestrian re-recognition model and pedestrian re-recognition method |
CN114694185B (en) * | 2022-05-31 | 2022-11-04 | 浪潮电子信息产业股份有限公司 | Cross-modal target re-identification method, device, equipment and medium |
CN115115919B (en) * | 2022-06-24 | 2023-05-05 | 国网智能电网研究院有限公司 | Power grid equipment thermal defect identification method and device |
CN116824695A (en) * | 2023-06-07 | 2023-09-29 | 南通大学 | Pedestrian re-identification non-local defense method based on feature denoising |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717411A (en) * | 2019-09-23 | 2020-01-21 | 湖北工业大学 | Pedestrian re-identification method based on deep layer feature fusion |
CN110796026A (en) * | 2019-10-10 | 2020-02-14 | 湖北工业大学 | Pedestrian re-identification method based on global feature stitching |
CN111931637A (en) * | 2020-08-07 | 2020-11-13 | 华南理工大学 | Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network |
Non-Patent Citations (1)
Title |
---|
Feng Min et al., "Research on cross-modal pedestrian re-identification based on generative adversarial networks" (基于生成对抗网络的跨模态行人重识别研究), Modern Information Technology (现代信息科技), No. 4, 2020-02-25, full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |