CN114511878A - Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation - Google Patents

Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation

Info

Publication number
CN114511878A
CN114511878A (application CN202210004347.8A)
Authority
CN
China
Prior art keywords
modal
pedestrian
visible light
sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210004347.8A
Other languages
Chinese (zh)
Inventor
张立言 (Zhang Liyan)
袁野 (Yuan Ye)
陈志贤 (Chen Zhixian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210004347.8A priority Critical patent/CN114511878A/en
Publication of CN114511878A publication Critical patent/CN114511878A/en
Pending legal-status Critical Current

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visible light infrared pedestrian re-identification method based on multi-modal relational aggregation, comprising the following steps: all-weather, multi-angle visible light and infrared pedestrian surveillance video is collected from real scenes and preprocessed to obtain identity-labeled pedestrian image samples in the two modalities; a multi-granularity dual-stream cross-modal deep neural network based on modal relational aggregation is constructed and trained under supervision, taking the pedestrian image samples of the two modalities as network input and the corresponding identity information as labels; a visible light or infrared pedestrian image is then input into the network as a query target, and the trained network returns a list of pedestrian images from the other modality's data set ranked by similarity to the query target, realizing cross-modal pedestrian matching. The invention mines pedestrian image feature information at multiple granularities and optimizes its distribution in the feature space, so that identity-invariant features of the two modalities are better extracted.

Description

Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a visible light infrared pedestrian re-identification method implemented with machine learning.
Background
In the current era of informatization and intelligence, road surveillance systems appear ever more frequently in streets and alleys; their main purpose is to maintain public safety and deter crime. Through such surveillance systems, public security organs and other departments can locate and track criminal suspects, providing a legal basis for solving cases. The traditional screening approach is to watch the recordings manually, which incurs very high time and labor costs, and human limitations also lead to missed or mistaken observations. With the development of deep learning and information technology, relatively mature face recognition technology was the first to be applied in this field, but because surveillance cameras vary in performance and shooting angle, it has not achieved good results. Pedestrian re-identification, which can extract and exploit feature information from a pedestrian's whole body, has therefore attracted growing attention; its goal is to retrieve the same pedestrian target from video images across cameras and thereby determine the pedestrian's movement route and trajectory.
In real scenes, however, most criminal activity takes place at night, so retrieval must operate around the clock, jointly analyzing the RGB images captured by cameras during the day and the infrared images captured at night. On the basis of traditional single-modality pedestrian re-identification, cross-modal pedestrian re-identification, which can combine the information of the two modalities, has therefore emerged and drawn researchers' attention.
Cross-modal pedestrian re-identification mainly studies the following problem: given an RGB image or an infrared image of a specific pedestrian, retrieve and match the images belonging to the same pedestrian in an image library spanning the two modalities.
The main challenge facing cross-modal pedestrian re-identification is modeling the modalities in the cross-modal problem. How to better reduce the difference between images of the two modalities and learn robust features shared between them is the key question of current research. Early work focused on two approaches, representation learning and metric learning; later, modality-translation methods were proposed, using generative adversarial networks (GANs) to convert between RGB and infrared images and thereby reduce the cross-modal pedestrian re-identification problem to a single-modality one.
Owing to the complexity of the problem, existing visible light infrared pedestrian re-identification methods are all built on deep convolutional neural networks.
The central difficulty of cross-modal pedestrian re-identification is that the modal gap between a visible light image and an infrared image is large: the two differ in the number of channels and carry different effective information. How to extract features common to the two modalities while making maximal use of the information in both images is the key point to be solved. Beyond the inter-modality gap, images within each modality also suffer from the problems inherent to traditional single-modality pedestrian re-identification, such as low resolution, occlusion, and viewpoint variation.
Disclosure of Invention
The invention aims to provide a visible light infrared pedestrian re-identification method based on multi-modal relational aggregation, so that identity-invariant features of the two modalities can be better extracted and higher accuracy obtained.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a visible light infrared pedestrian re-identification method based on multi-modal relational polymerization comprises the following steps:
Step 1, acquiring all-weather, multi-angle visible light and infrared pedestrian surveillance video information from real scene collection, and preprocessing it to obtain identity-labeled pedestrian image samples in the two modalities;
Step 2, constructing a multi-granularity dual-stream cross-modal deep neural network based on modal relational aggregation, initializing the network parameters, taking the pedestrian image samples of the two modalities obtained in step 1 as network input and the corresponding identity information as labels, performing supervised training on the multi-granularity dual-stream cross-modal deep neural network, and tuning continuously to obtain the deep learning network parameters with the best effect;
Step 3, inputting a daytime visible light or nighttime infrared pedestrian image into the network as a query target; the trained multi-granularity dual-stream cross-modal deep neural network returns a list of pedestrian images from the other modality's data set ranked by similarity to the query target, thereby realizing cross-modal pedestrian matching.
In step 1, the preprocessing comprises: performing frame sampling and cropping, and recording the identity information corresponding to each pedestrian; after preprocessing, the images retain all information about the pedestrians while redundant, invalid background and other parts are removed.
In step 1, the data set is the SYSU-MM01 data set, collected from real scenes.
In step 2, the multi-granularity dual-stream cross-modal deep neural network uses cross-modal relations to update the original features of each channel, supplementing the information missing from single-modality features and thereby reducing the gap between visible light and infrared modality images. The computation is as follows:

First, batch normalization (BN) and a ReLU activation are applied to the extracted original visible light features $f_R$ and infrared features $f_I$, giving the two processed modality features:

$f_R = \sigma_R f_R, \quad f_I = \sigma_I f_I$

where $\sigma_R$ and $\sigma_I$ denote learnable parameters;

Then, taking the feature map of each channel as a feature vector, the Euclidean distance $d_{ij}$ between the $i$-th feature vector of $f_R$ and the $j$-th feature vector of $f_I$ is computed as

$d_{ij} = \left\| f_R^i - f_I^j \right\|_2$

where $\|\cdot\|_2$ denotes the L2 norm; the distances $d_{ij}$ form the relation matrix $M_R$, and performing the same operation from the infrared features $f_I$ yields the relation matrix $M_I$;

To avoid losing the original information, the original features and the relation matrix information are fused by the following formulas, giving the features that finally reduce the inter-modality gap:

$f_R = f_R + f_R \times S(H_R[\alpha(f_R), \beta(M_R)])$

$f_I = f_I + f_I \times S(H_I[\alpha(f_I), \beta(M_I)])$

where $\alpha$ and $\beta$ denote the two embedding functions for the original features and the relation matrix, $S$ denotes the Sigmoid function, and $H_R$ and $H_I$ denote learnable parameters.
In step 2, the dual-stream cross-modal deep neural network learns the features of the two modalities with two granularity branches. At the global granularity, ResNet-50 pre-trained on ImageNet is used as the backbone network: its first layer is duplicated into a dual-stream structure that does not share parameters and extracts the features of the visible light and infrared modalities separately, while the second through fifth layers share parameters and extract the modality-invariant features of the two modalities. After modal relational aggregation, a linear layer reduces the feature dimension, and the feature space is then optimized with the hard negative sample triplet loss (TriHard loss), the identity loss, and the MMD-ID loss, pulling samples of the same identity closer together and pushing samples of different identities apart. The loss functions are computed as follows:

The identity loss (Identification Loss) is:

$L_{id} = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i)$

where $n$ denotes the total number of samples, $x_i$ denotes a given input image, $y_i$ denotes its corresponding label, and $p(y_i \mid x_i)$, produced by the Softmax function, is the predicted probability that $x_i$ belongs to class $y_i$;

The hard negative sample triplet loss (TriHard Loss) is:

$L_{th} = \frac{1}{P \times K} \sum_{a \in \text{batch}} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \epsilon \right]_{+}$

where, for each training batch, pedestrian samples of $P$ identities are selected at random and $K$ different sample pictures are selected at random for each identity, forming a batch of $P \times K$ images; then, for each sample $a$ in the batch, the closest sample with a different label is selected as its hard negative sample. $d_{a,p}$ denotes the distance between sample $a$ and a positive sample (same label), $d_{a,n}$ denotes the distance between sample $a$ and a negative sample (different label); $A$ is the set of samples with the same label as $a$, $B$ is the set of samples with a different label from $a$, and $\epsilon$ denotes a threshold parameter set according to practical requirements.

The MMD-ID loss function is:

$L_{MMD\text{-}ID} = \sum_{c=1}^{C} \mathrm{MMD}^2(P_c, Q_c)$

$\mathrm{MMD}^2(P_c, Q_c) = \mathbb{E}_{x,x' \sim P_c}\left[k(x, x')\right] + \mathbb{E}_{y,y' \sim Q_c}\left[k(y, y')\right] - 2\,\mathbb{E}_{x \sim P_c,\, y \sim Q_c}\left[k(x, y)\right]$

where $k(x, x')$ and $k(y, y')$ denote the kernel similarities between samples within the same modality, $k(x, y)$ denotes the similarity across modal samples, $P_c$ and $Q_c$ denote the distributions of the visible light image and infrared image samples with label $c$, and $C$ denotes the number of identity classes;

At the local granularity, ResNet-50 pre-trained on ImageNet is used as the backbone network with no layers sharing parameters, extracting the unique features of each modality; the sample image features are divided into six horizontal parts, the features of the two modalities for each corresponding part are concatenated and passed through a shared-parameter linear layer for dimensionality reduction; finally, the feature space is optimized with the same loss functions as at the global granularity.
Beneficial effects: the invention constructs a visible light infrared pedestrian re-identification method based on multi-modal relational aggregation, divided into three parts. The first part computes cross-modal relations and adds them to the original features, reducing the gap between modalities; the second part adopts multi-granularity feature extraction, better combining global, comprehensive features with finer local features so that the information in the cross-modal images is better exploited; the third part introduces three loss functions, optimizing the feature space and further improving the performance of the model.
Detailed Description
The present invention is further described below.
The invention discloses a visible light infrared pedestrian re-identification method based on multi-modal relational aggregation, comprising the following steps:
step 1, data preparation and formalization definition: the method comprises the steps of acquiring all-weather multi-angle visible light infrared pedestrian monitoring video information through real scene collection, carrying out frame sampling and cutting operation on the video information, and recording identity information corresponding to different pedestrians of the video information. All information of pedestrians in the image is reserved in the data preprocessing, and redundant invalid backgrounds and other parts in the image are removed. After data preprocessing, obtaining pedestrian image samples under two modes of a multi-identity information label;
and 2, constructing a multi-granularity double-flow cross-modal deep neural network based on modal relation aggregation, initializing network parameters, inputting the pedestrian image samples of the two modalities obtained in the step 1 as a network, taking corresponding identity information as a label, performing supervised training on the multi-granularity double-flow cross-modal deep neural network, and continuously adjusting to obtain the deep learning network parameters with the optimal effect.
The multi-granularity dual-stream cross-modal deep neural network uses cross-modal relations to update the original features of each channel, supplementing the information missing from single-modality features and thereby reducing the gap between visible light and infrared modality images. The computation is as follows:

First, batch normalization (BN) and a ReLU activation are applied to the extracted original visible light features $f_R$ and infrared features $f_I$, giving the two processed modality features:

$f_R = \sigma_R f_R, \quad f_I = \sigma_I f_I$

where $\sigma_R$ and $\sigma_I$ denote learnable parameters.

Then, taking the feature map of each channel as a feature vector, the Euclidean distance $d_{ij}$ between the $i$-th feature vector of $f_R$ and the $j$-th feature vector of $f_I$ is computed as

$d_{ij} = \left\| f_R^i - f_I^j \right\|_2$

where $\|\cdot\|_2$ denotes the L2 norm. The distances $d_{ij}$ form the relation matrix $M_R$, and performing the same operation from the infrared features $f_I$ yields the relation matrix $M_I$.

To avoid losing the original information, the original features and the relation matrix information are fused by the following formulas, giving the features that finally reduce the inter-modality gap:

$f_R = f_R + f_R \times S(H_R[\alpha(f_R), \beta(M_R)])$

$f_I = f_I + f_I \times S(H_I[\alpha(f_I), \beta(M_I)])$

where $\alpha$ and $\beta$ denote the two embedding functions for the original features and the relation matrix, $S$ denotes the Sigmoid function, and $H_R$ and $H_I$ denote learnable parameters.
The dual-stream cross-modal deep neural network learns the features of the two modalities with two granularity branches. At the global granularity, ResNet-50 pre-trained on ImageNet is used as the backbone network: its first layer is duplicated into a dual-stream structure that does not share parameters and extracts the features of the visible light and infrared modalities separately, while the second through fifth layers share parameters and extract the modality-invariant features of the two modalities. After modal relational aggregation, a linear layer reduces the feature dimension, and the feature space is then optimized with the hard negative sample triplet loss (TriHard loss), the identity loss, and the MMD-ID loss, pulling samples of the same identity closer together and pushing samples of different identities apart. The loss functions are computed as follows:
The identity loss (Identification Loss) is:

$L_{id} = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i)$

where $n$ denotes the total number of samples, $x_i$ denotes a given input image, $y_i$ denotes its corresponding label, and $p(y_i \mid x_i)$, produced by the Softmax function, is the predicted probability that $x_i$ belongs to class $y_i$.
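In implementation terms this is ordinary Softmax cross-entropy over the identity classes; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def identity_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Identity loss: -1/n * sum_i log p(y_i | x_i) via Softmax cross-entropy.

    logits: (N, num_identities) outputs of a classifier head over the features;
    labels: (N,) ground-truth identity indices.
    """
    return F.cross_entropy(logits, labels)
```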
The hard negative sample triplet loss (TriHard Loss) is:

$L_{th} = \frac{1}{P \times K} \sum_{a \in \text{batch}} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \epsilon \right]_{+}$

For each training batch, pedestrian samples of $P$ identities are selected at random, and $K$ different sample pictures are selected at random for each identity, forming a batch of $P \times K$ images. Then, for each sample $a$ in the batch, the closest sample with a different label is selected as its hard negative sample. $d_{a,p}$ denotes the distance between sample $a$ and a positive sample (same label), $d_{a,n}$ denotes the distance between sample $a$ and a negative sample (different label); $A$ is the set of samples with the same label as $a$, $B$ is the set of samples with a different label from $a$, and $\epsilon$ denotes a threshold parameter set according to practical requirements.
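A minimal PyTorch sketch of batch-hard mining for this loss follows; the margin value 0.3 is an assumed example, since the patent leaves $\epsilon$ to practical requirements.

```python
import torch

def trihard_loss(features, labels, margin=0.3):
    """Hard negative sample triplet loss over a P*K batch (sketch).

    features: (P*K, D) embeddings; labels: (P*K,) identity labels.
    For each anchor, uses its farthest positive and nearest negative.
    """
    dist = torch.cdist(features, features)               # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-identity mask
    d_ap = dist.masked_fill(~same, float('-inf')).max(dim=1).values  # hardest positive
    d_an = dist.masked_fill(same, float('inf')).min(dim=1).values    # hardest negative
    return torch.clamp(d_ap - d_an + margin, min=0).mean()           # [.]_+ averaged
```

In use, `features` would be the P × K batch embeddings produced by the network and `labels` the corresponding identity labels.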
The MMD-ID loss function is:

$L_{MMD\text{-}ID} = \sum_{c=1}^{C} \mathrm{MMD}^2(P_c, Q_c)$

$\mathrm{MMD}^2(P_c, Q_c) = \mathbb{E}_{x,x' \sim P_c}\left[k(x, x')\right] + \mathbb{E}_{y,y' \sim Q_c}\left[k(y, y')\right] - 2\,\mathbb{E}_{x \sim P_c,\, y \sim Q_c}\left[k(x, y)\right]$

where $k(x, x')$ and $k(y, y')$ denote the kernel similarities between samples within the same modality, $k(x, y)$ denotes the similarity across modal samples, $P_c$ and $Q_c$ denote the distributions of the visible light image and infrared image samples with label $c$, and $C$ denotes the number of identity classes.
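The sketch below computes a class-wise MMD in this spirit; the Gaussian (RBF) kernel and its bandwidth are assumptions, as the patent does not name a particular kernel function.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between two sample sets (bandwidth is an assumed value)."""
    return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))

def mmd_id_loss(feat_rgb, feat_ir, labels_rgb, labels_ir):
    """Class-wise MMD between visible and infrared feature distributions (sketch)."""
    loss = feat_rgb.new_zeros(())
    for c in labels_rgb.unique():
        p = feat_rgb[labels_rgb == c]   # visible samples of class c  (P_c)
        q = feat_ir[labels_ir == c]     # infrared samples of class c (Q_c)
        if len(p) == 0 or len(q) == 0:
            continue
        loss = loss + (gaussian_kernel(p, p).mean()        # within visible modality
                       + gaussian_kernel(q, q).mean()      # within infrared modality
                       - 2 * gaussian_kernel(p, q).mean()) # across the two modalities
    return loss
```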
At the local granularity, ResNet-50 pre-trained on ImageNet is used as the backbone network, with no layers sharing parameters, to extract the unique features of each modality. The sample image features are then divided into six horizontal parts, the features of the two modalities for each corresponding part are concatenated and passed through a shared-parameter linear layer for dimensionality reduction. Finally, the feature space is optimized with the same loss functions as at the global granularity.
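A minimal PyTorch sketch of the global-granularity branch described above is given below; the layer split mirrors the text (a modality-specific first stage, shared stages two to five), while the reduced dimension of 512 and the torchvision weights identifier are assumptions of this sketch.

```python
import copy
import torch.nn as nn
from torchvision.models import resnet50

class DualStreamGlobalBranch(nn.Module):
    """Global-granularity branch: per-modality stem, shared ResNet-50 stages 2-5."""
    def __init__(self, reduced_dim=512):
        super().__init__()
        base = resnet50(weights='IMAGENET1K_V1')     # pre-trained on ImageNet
        # first stage: one copy per modality, parameters not shared
        stem = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        self.stem_rgb = stem
        self.stem_ir = copy.deepcopy(stem)
        # stages 2-5: shared parameters, extract modality-invariant features
        self.shared = nn.Sequential(base.layer1, base.layer2, base.layer3, base.layer4)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.reduce = nn.Linear(2048, reduced_dim)   # linear dimensionality reduction

    def forward(self, x_rgb, x_ir):
        f_rgb = self.shared(self.stem_rgb(x_rgb))    # (B, 2048, h, w)
        f_ir = self.shared(self.stem_ir(x_ir))
        # modal relation aggregation (see RelationAggregation above) would sit here
        g_rgb = self.reduce(self.pool(f_rgb).flatten(1))
        g_ir = self.reduce(self.pool(f_ir).flatten(1))
        return g_rgb, g_ir
```

The local-granularity branch would follow the same pattern with no shared stages, dividing the final feature map into six horizontal parts before the shared reduction layer.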
Step 3, a daytime visible light or nighttime infrared pedestrian image is input into the network as a query target, and the trained multi-granularity dual-stream deep network returns a list of pedestrian images from the other modality's data set ranked by similarity to the query target, realizing cross-modal pedestrian matching.
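A minimal sketch of this query stage follows; cosine similarity over L2-normalized features is an assumed choice of metric, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feat, gallery_feats, gallery_ids, top_k=10):
    """Rank the other modality's gallery by similarity to the query image.

    query_feat: (D,) feature of the query image (visible light or infrared);
    gallery_feats: (N, D) features of the other modality's image set.
    Returns the top-k gallery IDs and their similarity scores.
    """
    q = F.normalize(query_feat.unsqueeze(0), dim=1)
    g = F.normalize(gallery_feats, dim=1)
    sims = (q @ g.t()).squeeze(0)                  # cosine similarities
    order = sims.argsort(descending=True)[:top_k]  # most similar first
    return [gallery_ids[i] for i in order], sims[order]
```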
To address the problem of large modality differences, the invention reduces the inter-modality gap by computing the relation between the two modalities and adding it to the original features. Multi-granularity feature extraction is further used to better combine global, comprehensive features with finer local features and to better exploit the information in the images of the two modalities. Finally, three loss functions, the identity loss, the hard negative sample triplet loss, and the MMD-ID loss, are introduced to optimize the feature space and further improve the performance of the model. Experiments show that the proposed model performs well in real scenes and achieves high accuracy.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A visible light infrared pedestrian re-identification method based on multi-modal relational aggregation, characterized by comprising the following steps:
step 1, acquiring all-weather, multi-angle visible light and infrared pedestrian surveillance video information from real scene collection, and preprocessing it to obtain identity-labeled pedestrian image samples in the two modalities;
step 2, constructing a multi-granularity dual-stream cross-modal deep neural network based on modal relational aggregation, initializing the network parameters, taking the pedestrian image samples of the two modalities obtained in step 1 as network input and the corresponding identity information as labels, performing supervised training on the multi-granularity dual-stream cross-modal deep neural network, and tuning continuously to obtain the deep learning network parameters with the best effect;
and step 3, inputting a daytime visible light or nighttime infrared pedestrian image into the network as a query target, the trained multi-granularity dual-stream cross-modal deep neural network returning a list of pedestrian images from the other modality's data set ranked by similarity to the query target, thereby realizing cross-modal pedestrian matching.
2. The visible light infrared pedestrian re-identification method based on multi-modal relational aggregation according to claim 1, characterized in that: in step 1, the preprocessing comprises: performing frame sampling and cropping, and recording the identity information corresponding to each pedestrian; after preprocessing, the images retain all information about the pedestrians while redundant, invalid background and other parts are removed.
3. The visible light infrared pedestrian re-identification method based on multi-modal relational aggregation according to claim 1, characterized in that: in step 1, the data set is the SYSU-MM01 data set, collected from real scenes.
4. The visible light infrared pedestrian re-identification method based on multi-modal relational aggregation according to claim 1, characterized in that: in step 2, the multi-granularity dual-stream cross-modal deep neural network uses cross-modal relations to update the original features of each channel, supplementing the information missing from single-modality features and thereby reducing the gap between visible light and infrared modality images, the computation being as follows:

first, batch normalization (BN) and a ReLU activation are applied to the extracted original visible light features $f_R$ and infrared features $f_I$, giving the two processed modality features:

$f_R = \sigma_R f_R, \quad f_I = \sigma_I f_I$

where $\sigma_R$ and $\sigma_I$ denote learnable parameters;

then, taking the feature map of each channel as a feature vector, the Euclidean distance $d_{ij}$ between the $i$-th feature vector of $f_R$ and the $j$-th feature vector of $f_I$ is computed as

$d_{ij} = \left\| f_R^i - f_I^j \right\|_2$

where $\|\cdot\|_2$ denotes the L2 norm; the distances $d_{ij}$ form the relation matrix $M_R$, and performing the same operation from the infrared features $f_I$ yields the relation matrix $M_I$;

to avoid losing the original information, the original features and the relation matrix information are fused by the following formulas, giving the features that finally reduce the inter-modality gap:

$f_R = f_R + f_R \times S(H_R[\alpha(f_R), \beta(M_R)])$

$f_I = f_I + f_I \times S(H_I[\alpha(f_I), \beta(M_I)])$

where $\alpha$ and $\beta$ denote the two embedding functions for the original features and the relation matrix, $S$ denotes the Sigmoid function, and $H_R$ and $H_I$ denote learnable parameters.
5. The visible light infrared pedestrian re-identification method based on multi-modal relational aggregation according to claim 1, characterized in that: in step 2, the dual-stream cross-modal deep neural network learns the features of the two modalities with two granularity branches: at the global granularity, ResNet-50 pre-trained on ImageNet is used as the backbone network, its first layer being duplicated into a dual-stream structure that does not share parameters and extracts the features of the visible light and infrared modalities separately, and the second through fifth layers sharing parameters and extracting the modality-invariant features of the two modalities; after modal relational aggregation, a linear layer reduces the feature dimension, and the feature space is then optimized with the hard negative sample triplet loss, the identity loss, and the MMD-ID loss, pulling samples of the same identity closer together and pushing samples of different identities apart; the loss functions are computed as follows:

the identity loss (Identification Loss) is:

$L_{id} = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i)$

where $n$ denotes the total number of samples, $x_i$ denotes a given input image, $y_i$ denotes its corresponding label, and $p(y_i \mid x_i)$, produced by the Softmax function, denotes the predicted probability that $x_i$ belongs to class $y_i$;

the hard negative sample triplet loss (TriHard Loss) is:

$L_{th} = \frac{1}{P \times K} \sum_{a \in \text{batch}} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \epsilon \right]_{+}$

where, for each training batch, pedestrian samples of $P$ identities are selected at random and $K$ different sample pictures are selected at random for each identity, forming a batch of $P \times K$ images, and then, for each sample $a$ in the batch, the closest sample with a different label is selected as its hard negative sample; $d_{a,p}$ denotes the distance between sample $a$ and a positive sample (same label), $d_{a,n}$ denotes the distance between sample $a$ and a negative sample (different label); $A$ is the set of samples with the same label as $a$, $B$ is the set of samples with a different label from $a$, and $\epsilon$ denotes a threshold parameter set according to practical requirements;

the MMD-ID loss function is:

$L_{MMD\text{-}ID} = \sum_{c=1}^{C} \mathrm{MMD}^2(P_c, Q_c)$

$\mathrm{MMD}^2(P_c, Q_c) = \mathbb{E}_{x,x' \sim P_c}\left[k(x, x')\right] + \mathbb{E}_{y,y' \sim Q_c}\left[k(y, y')\right] - 2\,\mathbb{E}_{x \sim P_c,\, y \sim Q_c}\left[k(x, y)\right]$

where $k(x, x')$ and $k(y, y')$ denote the kernel similarities between samples within the same modality, $k(x, y)$ denotes the similarity across modal samples, $P_c$ and $Q_c$ denote the distributions of the visible light image and infrared image samples with label $c$, and $C$ denotes the number of identity classes;

at the local granularity, ResNet-50 pre-trained on ImageNet is used as the backbone network with no layers sharing parameters, extracting the unique features of each modality; the sample image features are divided into six horizontal parts, the features of the two modalities for each corresponding part are concatenated and passed through a shared-parameter linear layer for dimensionality reduction; finally, the feature space is optimized with the same loss functions as at the global granularity.
CN202210004347.8A 2022-01-05 2022-01-05 Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation Pending CN114511878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210004347.8A 2022-01-05 2022-01-05 Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210004347.8A 2022-01-05 2022-01-05 Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation

Publications (1)

Publication Number Publication Date
CN114511878A 2022-05-17

Family

ID=81550377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210004347.8A Visible light infrared pedestrian re-identification method based on multi-modal relational aggregation 2022-01-05 2022-01-05

Country Status (1)

Country Link
CN (1) CN114511878A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117528233A (en) * 2023-09-28 2024-02-06 哈尔滨航天恒星数据***科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method
CN117528233B (en) * 2023-09-28 2024-05-17 哈尔滨航天恒星数据***科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method
CN117152851A (en) * 2023-10-09 2023-12-01 中科天网(广东)科技有限公司 Face and human body collaborative clustering method based on large model pre-training
CN117152851B (en) * 2023-10-09 2024-03-08 中科天网(广东)科技有限公司 Face and human body collaborative clustering method based on large model pre-training
CN117746467A (en) * 2024-01-05 2024-03-22 南京信息工程大学 Modal enhancement and compensation cross-modal pedestrian re-recognition method
CN117746467B (en) * 2024-01-05 2024-05-28 南京信息工程大学 Modal enhancement and compensation cross-modal pedestrian re-recognition method
CN117935172A (en) * 2024-03-21 2024-04-26 南京信息工程大学 Visible light infrared pedestrian re-identification method and system based on spectral information filtering


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination