CN112434654A - Cross-modal pedestrian re-identification method based on symmetric convolutional neural network - Google Patents
- Publication number
- CN112434654A (application number CN202011430914.3A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- visible light
- feature vector
- infrared light
- sample feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53 — Recognition of crowd images, e.g. recognition of crowd congestion
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Classification techniques
- G06N3/045 — Combinations of networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a cross-modal pedestrian re-identification method based on a symmetric convolutional neural network, which comprises the following steps: 1, acquiring pedestrian photos in the two different modalities of visible light and infrared light, constructing a cross-modal pedestrian re-identification dataset, and constructing a search library; 2, establishing a symmetric convolutional neural network cross-modal pedestrian re-identification model; 3, training the model based on the symmetric convolutional neural network with the dataset; and 4, performing prediction with the trained model to achieve cross-modal pedestrian re-identification. The method greatly alleviates the inaccurate detection of existing pedestrian re-identification methods in the cross-modal setting, and retains high detection precision even under large modality differences.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a cross-modal pedestrian re-identification method based on a symmetric convolutional neural network.
Background
Cross-modal pedestrian re-identification is an important subject in the field of computer vision. It plays a crucial role in target tracking, video surveillance and public security, and has attracted the attention of more and more scholars.
For the traditional single-modality pedestrian re-identification task, the data comprise only visible light images, and the difficulty lies mainly in camera viewpoint changes, occlusion, pedestrian posture changes, illumination changes, complex backgrounds, and the like. Cross-modal pedestrian re-identification data include not only visible light images but also infrared light images, and image retrieval needs to be performed across the two modalities. At night, the visible light camera has difficulty capturing enough pedestrian appearance information due to weak lighting, and appearance information is then mainly acquired by an infrared camera or a depth camera. Because the imaging mechanisms of the two cameras differ, two modalities are formed, with a huge modality gap between the two kinds of images. As shown in FIG. 1, the visible light image differs from the infrared light image in that it contains more color information. In addition to the intra-modal differences, the inter-modal differences thus present another significant problem to be solved for cross-modal pedestrian re-identification.
The inter-modal differences between the visible light modality and the infrared light modality can be subdivided into feature differences and appearance differences. To reduce the impact of feature differences, some approaches align cross-modal features in a shared embedding space, but this ignores the large appearance difference between the two modalities. Other methods use generative adversarial networks (GANs) to perform image conversion between visible and infrared light images so as to reduce the effect of appearance differences. Although the virtual image generated by a GAN resembles the original image, it is not guaranteed to reproduce identity-related details, nor is the generated information guaranteed to be completely reliable.
Disclosure of Invention
Aiming at the problem of inter-modal and intra-modal differences, the invention provides a cross-modal pedestrian re-identification method based on a symmetric convolutional neural network, so as to reduce these differences and improve re-identification effect and precision.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a cross-modal pedestrian re-identification method based on a symmetric convolutional neural network, which is characterized by comprising the following steps of:
Step 1, collecting a visible light image set V and an infrared light image set T of N pedestrians, the infrared light images being collected by an infrared light camera or a depth camera, wherein the m infrared light images of the i-th pedestrian are recorded as T_i, with T_i = {T_i^1, T_i^2, ..., T_i^m}, and T_i^m represents the m-th infrared light image of the i-th pedestrian;
constructing a search library by visible light pictures and infrared light images of other pedestrians with known identity information;
step 2, constructing a symmetrical convolutional neural network consisting of a generator and a discriminator;
the generator consists of two independent columns of ResNet50 networks, where each ResNet50 network consists of d residual sub-modules; a fully connected layer S_1 is added after the (d-1)-th residual sub-module, and a fully connected layer S_2 is added after the d-th residual sub-module;
The discriminator consists of a visible light image classifier and an infrared light image classifier;
initializing network weights for the ResNet50 network;
initializing parameters of the full connection layer and the discriminator by adopting a random initialization mode;
step 3, inputting the visible light image set V and the infrared light image set T of the N pedestrians into the two independent ResNet50 networks respectively; the (d-1)-th residual sub-module outputs the (d-1)-th group of visible light feature information v_{d-1} and the (d-1)-th group of infrared light feature information t_{d-1}, which are respectively input into the d-th residual sub-module, which outputs the d-th group of visible light feature information v_d and the d-th group of infrared light feature information t_d;
Step 4, constructing the (d-1)-th sample feature space X_{d-1};
Select the visible light feature information and infrared light feature information of P pedestrians from all the feature information output by the (d-1)-th residual sub-module; from the visible light feature information v_{i,d-1} and infrared light feature information t_{i,d-1} of each pedestrian, select K feature vectors each, so as to construct the (d-1)-th sample feature space X_{d-1};
The (d-1)-th sample feature space X_{d-1} is input as a whole to the subsequent fully connected layer S_1, which outputs the (d-1)-th group of visible light feature vectors v'_{d-1} and infrared light feature vectors t'_{d-1};
Step 5, constructing the d-th sample feature space X_d;
Select the visible light feature information and infrared light feature information of P pedestrians from all the feature information output by the d-th residual sub-module; from the visible light feature information v_{i,d} and infrared light feature information t_{i,d} of each pedestrian, select K feature vectors each, so as to construct the d-th sample feature space X_d;
The d-th sample feature space X_d is then input as a whole to the subsequent fully connected layer S_2, which outputs the d-th group of visible light feature vectors v'_d and infrared light feature vectors t'_d;
Step 6, inputting the (d-1)-th group of visible light feature vectors v'_{d-1} into the visible light image classifier and outputting the initial probability distribution GV of visible light; inputting the (d-1)-th group of infrared light feature vectors t'_{d-1} into the infrared light image classifier and outputting the initial probability distribution GT of infrared light;
construct the identity loss function L_ID using equation (1), which in the standard cross-entropy form consistent with the description is:
L_ID = L_CE(GV, Y_V) + L_CE(GT, Y_T)   (1)
where L_CE(·,·) denotes the cross-entropy between a predicted identity distribution and the ground-truth identity labels Y_V (visible light) or Y_T (infrared light);
Step 7, from the (d-1)-th sample feature space X_{d-1}, select the k-th feature vector of the a-th pedestrian, x_a^k, as the anchor sample feature vector; record the z-th feature vector x_a^z of the a-th pedestrian, which shares the anchor's identity information, as the z-th positive sample feature vector; and record the c-th feature vector x_f^c of the f-th pedestrian, whose identity information differs from the anchor's, as the c-th negative sample feature vector. Establish the mixed ternary loss function L_TRI1(X_{d-1}) using equation (2), given here in the hinge form implied by the description:
L_TRI1(X_{d-1}) = [ρ_1 + d(x_a^k, x_a^z) − d(x_a^k, x_f^c)]_+   (2)
In equation (2), d(x_a^k, x_a^z) is the Euclidean distance between the anchor sample feature vector and the z-th positive sample feature vector, d(x_a^k, x_f^c) is the Euclidean distance between the anchor sample feature vector and the c-th negative sample feature vector of the f-th pedestrian, [·]_+ = max(·, 0), and ρ_1 is the predefined minimum interval (margin) of the mixed ternary loss function L_TRI1(X_{d-1});
Step 8, from the d-th sample feature space X_d, select the s-th feature vector of the r-th pedestrian, x_r^s, as the anchor sample feature vector; record the b-th feature vector x_r^b of the r-th pedestrian, which shares the anchor's identity information, as the b-th positive sample feature vector; and record the q-th feature vector x_h^q of the h-th pedestrian, whose identity information differs from the anchor's, as the q-th negative sample feature vector. Establish the mixed ternary loss function L_TRI2(X_d) using equation (3):
L_TRI2(X_d) = [ρ_2 + d(x_r^s, x_r^b) − d(x_r^s, x_h^q)]_+   (3)
In equation (3), d(x_r^s, x_r^b) is the Euclidean distance between the anchor sample feature vector and the b-th positive sample feature vector, d(x_r^s, x_h^q) is the Euclidean distance between the anchor sample feature vector and the q-th negative sample feature vector of the h-th pedestrian, and ρ_2 is the predefined minimum interval (margin) of the mixed ternary loss function L_TRI2(X_d);
step 9, establishing a mixed ternary loss function L by using the formula (4)TRI:
LTRI=LTRI1+LTRI2 (4)
Establishing a global penalty function L using equation (5)ALL:
LALL=LID+βLTRI (5)
In the formula (5), β represents a mixed ternary loss function LTRIThe coefficient of (a);
carry out optimization of equation (5) by the stochastic gradient descent method, perform gradient back-propagation, and train each parameter of the symmetric convolutional neural network to obtain a preliminarily trained symmetric convolutional neural network model;
Step 10, inputting the (d-1)-th group of visible light feature vectors v'_{d-1} into the visible light image classifier of the preliminarily trained symmetric convolutional neural network model and outputting the visible light probability distribution GV'; inputting the (d-1)-th group of infrared light feature vectors t'_{d-1} into the infrared light image classifier of the preliminarily trained model and outputting the infrared light probability distribution GT'; additionally inputting the (d-1)-th group of visible light feature vectors v'_{d-1} into the infrared light classifier of the preliminarily trained model to obtain the pseudo visible light probability distribution GV'';
Construct the divergence loss function L_KL between the pseudo visible light probability distribution GV'' and the visible light probability distribution GV' using equation (6):
L_KL = KL(GV'', GV')   (6)
In equation (6), KL(·,·) denotes the Kullback–Leibler divergence between the two probability distributions;
Establish the discriminator loss function L_DIS using equation (7):
L_DIS = L_ID − α·L_KL   (7)
In equation (7), α is the coefficient of L_KL;
step 11, establishing a generator loss function L by using the formula (8)GEN:
LGEN=αLKL+βLTRI (8)
And 12, sequentially optimizing and solving the formula (5), the formula (7) and the formula (8) by a gradient descent method:
firstly, carrying out optimization solution on the formula (5) and training all parameters of the network;
secondly, carrying out optimization solution on the formula (7), in the gradient back propagation process, only carrying out back propagation on the gradient of the discriminator, and setting the gradient of the generator to zero, thereby freezing the generator parameters and training the discriminator parameters;
finally, carrying out optimization solution on the formula (8), in the gradient back propagation process, only carrying out back propagation on the gradient of the generator, and setting the gradient of the discriminator to zero, thereby freezing the parameters of the discriminator and training the parameters of the generator;
After training in turn, L_ALL, L_DIS and L_GEN converge to their optima under adversarial learning: when L_DIS reaches its optimum the discriminator is optimal, and when L_GEN reaches its optimum the generator is optimal, thereby obtaining the final symmetric convolutional neural network cross-modal pedestrian re-identification model;
step 13, utilizing the final symmetrical convolutional neural network model to query and match the cross-modal pedestrian re-identification;
Input the pedestrian image to be queried into the final symmetric convolutional neural network model to extract features, compare their similarity with the features of the pedestrians in the search library, and find the corresponding pedestrian identity information from the ranking list in order of similarity, thereby obtaining the identification result.
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the inter-modal difference, the invention combines the probability-distribution-based idea of modality confusion with adversarial learning to construct a symmetric convolutional neural network composed of a generator and a discriminator. The network generates modality-invariant features by minimizing the difference between the output probability distributions of the classifiers in the discriminator, thereby achieving modality confusion and retaining high detection precision under occlusion, pedestrian posture change, illumination change and modality change.
2. To address both inter-modal and intra-modal differences, the invention combines the triplet loss with adversarial learning and proposes a mixed ternary loss that reduces both kinds of difference. While modality confusion is achieved through adversarial learning, positive and negative samples are selected without distinguishing modality, aligning features and reducing modality differences, so that high detection precision is maintained even when the modality difference is large, giving the method strong adaptability.
3. Exploiting the ability of hidden-layer convolutional features to describe structural and spatial information, the invention uses the (d-1)-th layer hidden convolutional features (i.e., features from the (d-1)-th ResNet50 residual sub-module) as the input of the subsequent fully connected layer S_1 and of the following discriminator, so that the network learns more spatial structure information, the influence of color difference is reduced, and the gap between the two modalities shrinks. This improves the detection precision of the invention and gives it strong applicability in target tracking, video surveillance, public security and related fields.
4. The invention aligns features at different depths of the symmetric convolutional neural network, so that the network learns more deep information and its robustness improves. This greatly alleviates the inaccurate detection of existing pedestrian re-identification methods in the cross-modal setting and enables accurate detection despite appearance differences and related problems.
Drawings
FIG. 1 is a schematic view of a pedestrian in the two modalities in the prior art;
FIG. 2 is a network architecture proposed by the present invention;
FIG. 3 is a schematic view of inter-modal and intra-modal losses to which the present invention relates;
FIG. 4 is a graph of the effect of the coefficient α on the RegDB dataset in the present invention;
FIG. 5 is a graph of the effect of the coefficient α on the SYSU-MM01 dataset in the present invention;
FIG. 6 is a graph of the effect of the coefficient β on the RegDB dataset in the present invention;
FIG. 7 is a graph of the effect of the coefficient β on the SYSU-MM01 dataset in the present invention.
Detailed Description
In this embodiment, the cross-modal pedestrian re-identification method based on a symmetric convolutional neural network mainly reduces inter-modal and intra-modal differences by using the symmetric convolutional neural network together with adversarial learning; the network is optimized at different depths, and appearance differences are reduced by using shallow features that carry more spatial structure information. FIG. 1 shows a schematic diagram of images in the two different modalities. The detailed steps are as follows:
Step 1, collecting a visible light image set V and an infrared light image set T of N pedestrians, the infrared light images being collected by an infrared light camera or a depth camera, wherein the m infrared light images of the i-th pedestrian are recorded as T_i, with T_i = {T_i^1, T_i^2, ..., T_i^m}, and T_i^m represents the m-th infrared light image of the i-th pedestrian;
constructing a search library by visible light pictures and infrared light images of other pedestrians with known identity information;
This embodiment uses the RegDB dataset and the SYSU-MM01 dataset. SYSU-MM01 is a large-scale cross-modal pedestrian re-identification dataset collected by four visible light cameras and two infrared light cameras. The dataset covers two different scenes, indoor and outdoor, and the training set contains 395 pedestrian identities, with 11,909 infrared pedestrian images and 22,258 visible light pedestrian images in total.
The RegDB dataset contains 412 pedestrian identities, captured by a dual-camera system. Each pedestrian ID contains 10 visible light images and 10 infrared light images. The invention adopts the widely recognized dataset protocol: all data in the dataset are randomly divided into two parts, and one part is randomly selected for training.
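The split protocol described above can be sketched as follows. This is an illustrative Python sketch, not part of the patent; the function name and seed handling are assumptions.

```python
# Hypothetical sketch of the RegDB-style protocol: identities are randomly
# split into two equal halves, one for training and one for testing.
import random

def split_identities(num_ids, seed=0):
    """Randomly split identity labels into equal train/test halves."""
    ids = list(range(num_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = num_ids // 2
    return sorted(ids[:half]), sorted(ids[half:])

train_ids, test_ids = split_identities(412)  # RegDB has 412 identities
print(len(train_ids), len(test_ids))  # 206 206
```

Each run with a fixed seed yields a deterministic, disjoint split over the 412 identities.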
Step 2, constructing a symmetrical convolutional neural network consisting of a generator and a discriminator;
the generator consists of two independent columns of ResNet50 networks, where each ResNet50 network consists of d residual sub-modules; a fully connected layer S_1 is added after the (d-1)-th residual sub-module, and a fully connected layer S_2 after the d-th residual sub-module. S_1 and S_2 are used for extracting modality-shared information. The ResNet50 network adopted by the invention is composed of 4 residual sub-modules, i.e., d = 4 and d-1 = 3; the number of neurons in the fully connected layers S_1 and S_2 is set to 1024;
the discriminator consists of a visible light image classifier and an infrared light image classifier, which are shown in fig. 2;
initializing network weights for the ResNet50 network;
initializing parameters of the full connection layer and the discriminator by adopting a random initialization mode;
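The symmetric two-column generator of step 2 can be sketched at the shape level as follows. Random matrices stand in for the real ResNet50 residual sub-modules; the stage widths, batch size, and input dimension are illustrative assumptions, not taken from the patent.

```python
# Shape-level sketch of the generator: two independent backbones (stand-ins
# for the two ResNet50 columns) whose (d-1)-th and d-th stage outputs feed
# fully connected layers S1 and S2 of width 1024.
import numpy as np

rng = np.random.default_rng(0)

def make_backbone(in_dim, stage_dims):
    """One column: a list of per-stage weight matrices (stand-ins for residual sub-modules)."""
    dims = [in_dim] + stage_dims
    return [rng.standard_normal((dims[i], dims[i + 1])) * 0.01 for i in range(len(dims) - 1)]

def forward(backbone, x):
    """Return the (d-1)-th and d-th stage outputs (ReLU between stages)."""
    feats = []
    for W in backbone:
        x = np.maximum(x @ W, 0.0)
        feats.append(x)
    return feats[-2], feats[-1]

stage_dims = [256, 512, 1024, 2048]       # d = 4 stages, as in ResNet50
vis_net = make_backbone(128, stage_dims)  # visible light column
ir_net = make_backbone(128, stage_dims)   # infrared light column
S1 = rng.standard_normal((1024, 1024)) * 0.01  # after stage d-1
S2 = rng.standard_normal((2048, 1024)) * 0.01  # after stage d

x_vis = rng.standard_normal((4, 128))  # a toy batch of 4 visible light samples
f3, f4 = forward(vis_net, x_vis)
v_prime_d1, v_prime_d = f3 @ S1, f4 @ S2
print(v_prime_d1.shape, v_prime_d.shape)  # both (4, 1024)
```

The two columns share no weights, matching the "two independent columns" wording; only the output width (1024) is shared.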
Step 3, inputting the visible light image set V and the infrared light image set T of the N pedestrians into the two independent ResNet50 networks respectively to extract the feature information of the pedestrians; the (d-1)-th residual sub-module outputs the (d-1)-th group of visible light feature information v_{d-1} and the (d-1)-th group of infrared light feature information t_{d-1}, which are respectively input into the d-th residual sub-module, which outputs the d-th group of visible light feature information v_d and the d-th group of infrared light feature information t_d;
Step 4, constructing the (d-1)-th sample feature space X_{d-1};
Select the visible light feature information and infrared light feature information of P pedestrians from all the feature information output by the (d-1)-th residual sub-module; from the visible light feature information v_{i,d-1} and infrared light feature information t_{i,d-1} of each pedestrian, select K feature vectors each, so as to construct the (d-1)-th sample feature space X_{d-1}; in the invention, P = 16 and K = 4;
The (d-1)-th sample feature space X_{d-1} is input as a whole to the subsequent fully connected layer S_1 for extracting modality-shared information, which outputs the (d-1)-th group of visible light feature vectors v'_{d-1} and infrared light feature vectors t'_{d-1};
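The P×K batch construction above can be sketched as follows. This is an illustrative assumption of how such sampling is commonly done; the data layout and function name are not from the patent.

```python
# Minimal sketch of PxK sampling for the sample feature space: P identities,
# K visible and K infrared samples per identity per batch (P=16, K=4 above).
import random

def pk_sample(samples_by_id, P=16, K=4, seed=0):
    """samples_by_id: {pid: {'vis': [...], 'ir': [...]}} -> list of (pid, modality, sample)."""
    rng = random.Random(seed)
    pids = rng.sample(sorted(samples_by_id), P)
    batch = []
    for pid in pids:
        for modality in ('vis', 'ir'):
            chosen = rng.sample(samples_by_id[pid][modality], K)
            batch.extend((pid, modality, s) for s in chosen)
    return batch

data = {i: {'vis': list(range(10)), 'ir': list(range(10))} for i in range(100)}
batch = pk_sample(data)
print(len(batch))  # 16 identities * 2 modalities * 4 samples = 128
```

Sampling both modalities for the same P identities is what later lets the triplet loss pick positives and negatives "without distinguishing modality".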
Step 5, constructing the d-th sample feature space X_d;
Select the visible light feature information and infrared light feature information of P pedestrians from all the feature information output by the d-th residual sub-module; from the visible light feature information v_{i,d} and infrared light feature information t_{i,d} of each pedestrian, select K feature vectors each, so as to construct the d-th sample feature space X_d; again P = 16 and K = 4;
The d-th sample feature space X_d is then input as a whole to the subsequent fully connected layer S_2 for extracting modality-shared information, which outputs the d-th group of visible light feature vectors v'_d and infrared light feature vectors t'_d;
Step 6, inputting the (d-1)-th group of visible light feature vectors v'_{d-1} into the visible light image classifier and outputting the initial probability distribution GV of visible light; inputting the (d-1)-th group of infrared light feature vectors t'_{d-1} into the infrared light image classifier and outputting the initial probability distribution GT of infrared light;
construct the identity loss function L_ID using equation (1), which in the standard cross-entropy form consistent with the description is:
L_ID = L_CE(GV, Y_V) + L_CE(GT, Y_T)   (1)
where L_CE(·,·) denotes the cross-entropy between a predicted identity distribution and the ground-truth identity labels Y_V (visible light) or Y_T (infrared light);
Step 7, from the (d-1)-th sample feature space X_{d-1}, select the k-th feature vector of the a-th pedestrian, x_a^k, as the anchor sample feature vector; record the z-th feature vector x_a^z of the a-th pedestrian, which shares the anchor's identity information, as the z-th positive sample feature vector; and record the c-th feature vector x_f^c of the f-th pedestrian, whose identity information differs from the anchor's, as the c-th negative sample feature vector. Establish the mixed ternary loss function L_TRI1(X_{d-1}) using equation (2), given here in the hinge form implied by the description:
L_TRI1(X_{d-1}) = [ρ_1 + d(x_a^k, x_a^z) − d(x_a^k, x_f^c)]_+   (2)
In equation (2), d(x_a^k, x_a^z) is the Euclidean distance between the anchor sample feature vector and the z-th positive sample feature vector, d(x_a^k, x_f^c) is the Euclidean distance between the anchor sample feature vector and the c-th negative sample feature vector of the f-th pedestrian, [·]_+ = max(·, 0), and ρ_1 is the predefined minimum interval (margin) of L_TRI1(X_{d-1}); ρ_1 is set to 0.5. Optimizing equation (2) reduces the distance between the anchor sample feature vector and the positive sample feature vector and increases the distance between the anchor sample feature vector and the negative sample feature vector, as shown in FIG. 3;
Step 8, from the d-th sample feature space X_d, select the s-th feature vector of the r-th pedestrian, x_r^s, as the anchor sample feature vector; record the b-th feature vector x_r^b of the r-th pedestrian, which shares the anchor's identity information, as the b-th positive sample feature vector; and record the q-th feature vector x_h^q of the h-th pedestrian, whose identity information differs from the anchor's, as the q-th negative sample feature vector. Establish the mixed ternary loss function L_TRI2(X_d) using equation (3):
L_TRI2(X_d) = [ρ_2 + d(x_r^s, x_r^b) − d(x_r^s, x_h^q)]_+   (3)
In equation (3), d(x_r^s, x_r^b) is the Euclidean distance between the anchor sample feature vector and the b-th positive sample feature vector, d(x_r^s, x_h^q) is the Euclidean distance between the anchor sample feature vector and the q-th negative sample feature vector of the h-th pedestrian, and ρ_2 is the predefined minimum interval (margin) of L_TRI2(X_d); ρ_2 is set to 0.5.
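The hinge form of equations (2) and (3) can be sketched in NumPy as follows. The function name and the toy vectors are illustrative; only the margin value ρ = 0.5 comes from the text.

```python
# Sketch of the mixed ternary (triplet) hinge: max(0, rho + d(a,p) - d(a,n)),
# with Euclidean distances, applied to samples drawn without distinguishing
# modality. Illustrative, not the patent's exact implementation.
import numpy as np

def triplet_loss(anchor, positive, negative, rho=0.5):
    """Hinge triplet loss with margin rho over Euclidean distances."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, rho + d_ap - d_an)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close positive: margin satisfied
n = np.array([5.0, 0.0])   # far negative
print(triplet_loss(a, p, n))  # 0.0
print(triplet_loss(a, n, p))  # 5.4: positive far, negative close
```

When the negative is already more than ρ farther than the positive, the loss vanishes and contributes no gradient, which is the intended behavior of the margin.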
Step 9, establishing the mixed ternary loss function L_TRI using equation (4):
L_TRI = L_TRI1 + L_TRI2   (4)
Establish the global loss function L_ALL using equation (5):
L_ALL = L_ID + β·L_TRI   (5)
In equation (5), β is the coefficient of the mixed ternary loss function L_TRI; it is set to β = 1.4.
Carry out optimization of equation (5) by the stochastic gradient descent method, perform gradient back-propagation, and train each parameter of the symmetric convolutional neural network to obtain a preliminarily trained symmetric convolutional neural network model;
Step 10, inputting the (d-1)-th group of visible light feature vectors v'_{d-1} into the visible light image classifier of the preliminarily trained symmetric convolutional neural network model and outputting the visible light probability distribution GV'; inputting the (d-1)-th group of infrared light feature vectors t'_{d-1} into the infrared light image classifier of the preliminarily trained model and outputting the infrared light probability distribution GT'; additionally inputting the (d-1)-th group of visible light feature vectors v'_{d-1} into the infrared light classifier of the preliminarily trained model to obtain the pseudo visible light probability distribution GV'';
Construct the divergence loss function L_KL between the pseudo visible light probability distribution GV'' and the visible light probability distribution GV' using equation (6):
L_KL = KL(GV'', GV')   (6)
In equation (6), KL(·,·) denotes the Kullback–Leibler divergence between the two probability distributions;
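The mode-confusion term of equation (6) can be sketched as follows: the KL divergence between the pseudo visible light distribution GV'' (visible features through the infrared classifier) and GV'. The distributions and epsilon smoothing below are illustrative assumptions.

```python
# Sketch of L_KL = KL(GV'', GV'). Minimizing this pushes the generator toward
# modality-invariant features, since both classifiers then agree on identity.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete probability distributions, with smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

gv_pseudo = np.array([0.7, 0.2, 0.1])  # toy GV'' over identity classes
gv_prime = np.array([0.6, 0.3, 0.1])   # toy GV'
print(round(kl_divergence(gv_pseudo, gv_prime), 4))
print(kl_divergence(gv_prime, gv_prime))  # ~0 when the distributions match
```

Note KL is asymmetric: the order KL(GV'', GV') follows equation (6) as written.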
Establish the discriminator loss function L_DIS using equation (7):
L_DIS = L_ID − α·L_KL   (7)
In equation (7), α is the coefficient of L_KL; it is set to α = 1.
Step 11, establishing the generator loss function L_GEN using equation (8):
L_GEN = α·L_KL + β·L_TRI   (8)
The invention performed verification experiments on the settings of α and β. FIG. 4 shows the effect of the coefficient α on the RegDB dataset, and FIG. 5 its effect on the SYSU-MM01 dataset; performance is best when α = 1.
FIG. 6 shows the effect of the coefficient β on the RegDB dataset, and FIG. 7 its effect on the SYSU-MM01 dataset. Performance is optimal when α = 1 and β = 1.4, and experiments show that good results are obtained over a fairly wide range of α and β values, reflecting the superiority of the invention.
Step 12, sequentially carry out optimization solving on formula (5), formula (7), and formula (8) by a gradient descent method. The invention optimizes the network model using the adaptive moment estimation optimizer (Adam).
Firstly, carrying out optimization solution on the formula (5) and training all parameters of the network;
secondly, carrying out optimization solution on the formula (7), in the gradient back propagation process, only carrying out back propagation on the gradient of the discriminator, and setting the gradient of the generator to zero, thereby freezing the generator parameters and training the discriminator parameters;
finally, carrying out optimization solution on the formula (8), in the gradient back propagation process, only carrying out back propagation on the gradient of the generator, and setting the gradient of the discriminator to zero, thereby freezing the parameters of the discriminator and training the parameters of the generator;
After training in turn, L_ALL, L_DIS, and L_GEN converge to their optima in adversarial learning; when L_DIS reaches its optimum the discriminator is optimal, and when L_GEN reaches its optimum the generator is optimal, so that the final symmetric convolutional neural network cross-modal pedestrian re-identification model is obtained;
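The alternating freeze-and-train scheme of step 12 can be illustrated with a toy update rule. This is a plain-Python stand-in: in a deep learning framework the same effect is obtained by zeroing the frozen sub-network's gradients during back propagation, as described above, and all names and values here are illustrative:

```python
def sgd_step(params, grads, frozen=(), lr=0.1):
    # one gradient descent step, leaving any parameter listed in `frozen` untouched
    return {k: (v if k in frozen else v - lr * grads[k]) for k, v in params.items()}

params = {"generator_w": 1.0, "discriminator_w": 1.0}
grads = {"generator_w": 0.5, "discriminator_w": 0.5}

# step on L_ALL (equation (5)): all parameters trained
params = sgd_step(params, grads)
# step on L_DIS (equation (7)): generator frozen, discriminator trained
params = sgd_step(params, grads, frozen=("generator_w",))
# step on L_GEN (equation (8)): discriminator frozen, generator trained
params = sgd_step(params, grads, frozen=("discriminator_w",))
```

Each of the three losses thus updates exactly the parameter set it is responsible for, which is what lets the discriminator and generator be optimized against each other without interfering.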
Step 13, use the final symmetric convolutional neural network model to perform query matching for cross-modal pedestrian re-identification;
input the pedestrian image to be queried into the final symmetric convolutional neural network model to extract its features, compare the similarity between these features and the pedestrian features in the search library, and find the corresponding pedestrian identity information from the ranking list in descending order of similarity, thereby obtaining the identification result.
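The query-matching step above can be sketched as a similarity ranking over extracted features. A minimal NumPy sketch, assuming cosine similarity as the comparison measure (the patent text does not fix a particular similarity metric):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats, gallery_ids):
    # rank search-library identities by cosine similarity to the query feature
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    order = np.argsort(-(g @ q))  # descending similarity
    return [gallery_ids[i] for i in order]
```

The first identity in the returned list is the recognition result; the full list is the ranking list referred to in step 13.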
Example:
In order to prove the effectiveness of the invention, comparative tests were carried out against other methods. As shown in Table 1, the effect of the invention is obviously better than that of the other methods in the prior art, which proves its effectiveness. Ablation experiments were also performed on each module of the network of the invention; the results are shown in Table 2 and demonstrate the effectiveness of each module.
Table 1 compares the effectiveness of the invention with that of other methods.
Table 2 shows the ablation experiments of the invention.
Experiments prove that the method can greatly alleviate the inaccurate-detection problem of existing pedestrian re-identification methods in the cross-modal setting, and maintains high detection precision even under large modal differences.
Claims (1)
1. A cross-mode pedestrian re-identification method based on a symmetric convolutional neural network is characterized by comprising the following steps:
step 1, collecting a visible light image set V of N pedestrians, wherein the j visible light images of the ith pedestrian are recorded as V_i, and V_i = {V_i^1, V_i^2, …, V_i^j}, V_i^j representing the jth visible light picture of the ith pedestrian; the ith pedestrian is given identity information y_i; i = 1, 2, …, N;
collecting an infrared light image set T of the N pedestrians by using an infrared light camera or a depth camera, wherein the m infrared light images of the ith pedestrian are recorded as T_i, and T_i = {T_i^1, T_i^2, …, T_i^m}, T_i^m representing the mth infrared light image of the ith pedestrian;
constructing a search library by visible light pictures and infrared light images of other pedestrians with known identity information;
step 2, constructing a symmetrical convolutional neural network consisting of a generator and a discriminator;
the generator consists of two independent columns of ResNet50 networks, wherein each ResNet50 network consists of d residual sub-modules; a column of fully-connected layers S_1 is added after the (d-1)th residual sub-module, and a column of fully-connected layers S_2 is added after the dth residual sub-module;
The discriminator consists of a visible light image classifier and an infrared light image classifier;
initializing network weights for the ResNet50 network;
initializing parameters of the full connection layer and the discriminator by adopting a random initialization mode;
step 3, respectively inputting the visible light image set V and the infrared light image set T of the N pedestrians into the two independent ResNet50 networks; the (d-1)th residual sub-module outputs the (d-1)th group of visible light feature information v_{d-1} and the (d-1)th group of infrared light feature information t_{d-1}, which are then respectively input into the dth residual sub-module to output the dth group of visible light feature information v_d and the dth group of infrared light feature information t_d;
Step 4, constructing the (d-1)th sample feature space X_{d-1};
selecting the visible light feature information and infrared light feature information of P pedestrians from all the feature information output by the (d-1)th residual sub-module, wherein K pieces of feature information are respectively selected from the visible light feature information v_{i,d-1} and the infrared light feature information t_{i,d-1} of each pedestrian, so as to construct the (d-1)th sample feature space X_{d-1};
the (d-1)th sample feature space X_{d-1} is input as a whole into the subsequent fully-connected layer S_1, which outputs the (d-1)th group visible light feature vector v′_{d-1} and infrared light feature vector t′_{d-1};
Step 5, constructing the dth sample feature space X_d;
selecting the visible light feature information and infrared light feature information of P pedestrians from all the feature information output by the dth residual sub-module, wherein K pieces of feature information are respectively selected from the visible light feature information v_{i,d} and the infrared light feature information t_{i,d} of each pedestrian, so as to construct the dth sample feature space X_d;
the dth sample feature space X_d is then input as a whole into the subsequent fully-connected layer S_2, which outputs the dth group visible light feature vector v′_d and infrared light feature vector t′_d;
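The P×K sampling used to build the sample feature spaces in steps 4–5 can be sketched as follows. The data layout (a dictionary keyed by pedestrian identity with per-modality feature lists) and the function name are assumptions for illustration:

```python
import random

def build_sample_space(features_by_id, P, K, seed=0):
    # features_by_id: {pedestrian_id: {"vis": [feat, ...], "ir": [feat, ...]}}
    rng = random.Random(seed)
    chosen = rng.sample(sorted(features_by_id), P)  # P pedestrians
    feats, labels = [], []
    for pid in chosen:
        for modality in ("vis", "ir"):  # K features per pedestrian per modality
            feats += rng.sample(features_by_id[pid][modality], K)
            labels += [pid] * K
    return feats, labels
```

Each sample feature space thus contains P × 2K entries, guaranteeing that every selected pedestrian contributes both visible and infrared samples, which is what makes the later triplet mining cross-modal.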
Step 6, inputting the (d-1)th group visible light feature vector v′_{d-1} into the visible light image classifier and outputting an initial probability distribution GV of visible light, and inputting the (d-1)th group infrared light feature vector t′_{d-1} into the infrared light image classifier and outputting an initial probability distribution GT of infrared light;
constructing an identity loss function L_ID by using equation (1):
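The exact form of formula (1) does not survive in this text. A common choice for the identity loss in person re-identification is the cross-entropy between the predicted identity distribution and the true identity label; the sketch below is that common choice standing in for the elided formula, not a reproduction of it:

```python
import numpy as np

def identity_loss(probs, labels, eps=1e-12):
    # mean negative log-likelihood of each sample's true identity
    # probs: (batch, num_identities) rows of probabilities; labels: identity indices
    rows = np.arange(len(labels))
    return float(-np.mean(np.log(np.clip(probs[rows, labels], eps, 1.0))))
```

The loss decreases as the classifiers assign higher probability to each pedestrian's true identity y_i.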
Step 7, select the kth feature information x_a^k of the ath pedestrian from the (d-1)th sample feature space X_{d-1} and record it as the anchor sample feature vector; record the zth feature information x_a^z of the ath pedestrian, which has the same identity information as the anchor sample feature vector, as the zth positive sample feature vector; record the cth feature information x_f^c of the fth pedestrian, which has identity information different from the anchor sample feature vector, as the cth negative sample feature vector; then establish the mixed ternary loss function L_TRI1(X_{d-1}) by using equation (2):
L_TRI1(X_{d-1}) = [ρ_1 + D(x_a^k, x_a^z) − D(x_a^k, x_f^c)]_+  (2)
In formula (2), D(x_a^k, x_a^z) represents the Euclidean distance between the anchor sample feature vector x_a^k and the zth positive sample feature vector x_a^z; D(x_a^k, x_f^c) represents the Euclidean distance between the anchor sample feature vector and the cth negative sample feature vector x_f^c of the fth pedestrian; [·]_+ denotes max(·, 0); ρ_1 is the predefined minimum interval of the mixed ternary loss function L_TRI1(X_{d-1});
Step 8, select the sth feature information x_r^s of the rth pedestrian from the dth sample feature space X_d and record it as the anchor sample feature vector; record the bth feature information x_r^b of the rth pedestrian, which has the same identity information as the anchor sample feature vector, as the bth positive sample feature vector; record the qth feature information x_h^q of the hth pedestrian, which has identity information different from the anchor sample feature vector, as the qth negative sample feature vector; then establish the mixed ternary loss function L_TRI2(X_d) by using equation (3):
L_TRI2(X_d) = [ρ_2 + D(x_r^s, x_r^b) − D(x_r^s, x_h^q)]_+  (3)
In formula (3), D(x_r^s, x_r^b) represents the Euclidean distance between the anchor sample feature vector x_r^s and the bth positive sample feature vector x_r^b; D(x_r^s, x_h^q) represents the Euclidean distance between the anchor sample feature vector and the qth negative sample feature vector x_h^q of the hth pedestrian; [·]_+ denotes max(·, 0); ρ_2 is the predefined minimum interval of the mixed ternary loss function L_TRI2(X_d);
step 9, establishing the mixed ternary loss function L_TRI by using equation (4):
L_TRI = L_TRI1 + L_TRI2  (4)
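The triplet terms of equations (2)–(4) can be sketched as follows. This NumPy sketch uses batch-hard mining (hardest positive and hardest negative per anchor), which is one common instantiation of how the anchor/positive/negative samples in steps 7–8 may be chosen; the patent text itself does not fix the mining strategy:

```python
import numpy as np

def mixed_triplet_loss(feats, labels, margin):
    # hinge over (anchor, hardest positive, hardest negative) Euclidean distances
    feats, labels = np.asarray(feats, float), np.asarray(labels)
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    n, total = len(feats), 0.0
    for a in range(n):
        same = (labels == labels[a]) & (np.arange(n) != a)
        hardest_pos = dist[a][same].max()                  # farthest same identity
        hardest_neg = dist[a][labels != labels[a]].min()   # closest other identity
        total += max(0.0, margin + hardest_pos - hardest_neg)
    return total / n
```

Applied once to X_{d-1} with margin ρ_1 and once to X_d with margin ρ_2, the two results sum to L_TRI as in equation (4).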
establishing a global loss function L_ALL by using equation (5):
L_ALL = L_ID + βL_TRI  (5)
In formula (5), β represents the coefficient of the mixed ternary loss function L_TRI;
carrying out optimization solution on the formula (5) by a random gradient descent method, carrying out gradient back propagation, training each parameter of the symmetric convolutional neural network, and obtaining a preliminarily trained symmetric convolutional neural network model;
step 10, inputting the (d-1)th group visible light feature vector v′_{d-1} into the visible light image classifier in the preliminarily trained symmetric convolutional neural network model and outputting the probability distribution GV′ of visible light; inputting the (d-1)th group infrared light feature vector t′_{d-1} into the infrared light image classifier in the preliminarily trained symmetric convolutional neural network model and outputting the probability distribution GT′ of infrared light; inputting the (d-1)th group visible light feature vector v′_{d-1} into the infrared light classifier in the preliminarily trained symmetric convolutional neural network model to obtain the pseudo visible light probability distribution GV″;
constructing the divergence loss function L_KL between the pseudo visible light probability distribution GV″ and the visible light probability distribution GV′ by using equation (6):
L_KL = KL(GV″, GV′)  (6)
In formula (6), KL(·,·) represents the difference between the two probability distributions;
establishing the discriminator loss function L_DIS by using equation (7):
L_DIS = L_ID − αL_KL  (7)
In formula (7), α represents the coefficient of L_KL;
step 11, establishing the generator loss function L_GEN by using equation (8):
L_GEN = αL_KL + βL_TRI  (8)
And 12, sequentially optimizing and solving the formula (5), the formula (7) and the formula (8) by a gradient descent method:
firstly, carrying out optimization solution on the formula (5) and training all parameters of the network;
secondly, carrying out optimization solution on the formula (7), in the gradient back propagation process, only carrying out back propagation on the gradient of the discriminator, and setting the gradient of the generator to zero, thereby freezing the generator parameters and training the discriminator parameters;
finally, carrying out optimization solution on the formula (8), in the gradient back propagation process, only carrying out back propagation on the gradient of the generator, and setting the gradient of the discriminator to zero, thereby freezing the parameters of the discriminator and training the parameters of the generator;
after training in turn, L_ALL, L_DIS, and L_GEN converge to their optima in adversarial learning; when L_DIS reaches its optimum the discriminator is optimal, and when L_GEN reaches its optimum the generator is optimal, so that the final symmetric convolutional neural network cross-modal pedestrian re-identification model is obtained;
step 13, utilizing the final symmetric convolutional neural network model to perform query matching for cross-modal pedestrian re-identification;
inputting the pedestrian image to be queried into the final symmetric convolutional neural network model to extract its features, comparing the similarity between these features and the pedestrian features in the search library, and finding the corresponding pedestrian identity information from the ranking list in descending order of similarity, thereby obtaining the identification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011430914.3A CN112434654B (en) | 2020-12-07 | 2020-12-07 | Cross-modal pedestrian re-identification method based on symmetric convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112434654A true CN112434654A (en) | 2021-03-02 |
CN112434654B CN112434654B (en) | 2022-09-13 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033438A (en) * | 2021-03-31 | 2021-06-25 | 四川大学 | Data feature learning method for modal imperfect alignment |
CN113112534A (en) * | 2021-04-20 | 2021-07-13 | 安徽大学 | Three-dimensional biomedical image registration method based on iterative self-supervision |
CN113627272A (en) * | 2021-07-19 | 2021-11-09 | 上海交通大学 | Serious misalignment pedestrian re-identification method and system based on normalization network |
CN114550210A (en) * | 2022-02-21 | 2022-05-27 | 中国科学技术大学 | Pedestrian re-identification method based on modal adaptive mixing and invariance convolution decomposition |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608450A (en) * | 2016-03-01 | 2016-05-25 | 天津中科智能识别产业技术研究院有限公司 | Heterogeneous face identification method based on deep convolutional neural network |
KR101908481B1 (en) * | 2017-07-24 | 2018-12-10 | 동국대학교 산학협력단 | Device and method for pedestraian detection |
CN110580460A (en) * | 2019-08-28 | 2019-12-17 | 西北工业大学 | Pedestrian re-identification method based on combined identification and verification of pedestrian identity and attribute characteristics |
CN110956094A (en) * | 2019-11-09 | 2020-04-03 | 北京工业大学 | RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network |
CN111325115A (en) * | 2020-02-05 | 2020-06-23 | 山东师范大学 | Countermeasures cross-modal pedestrian re-identification method and system with triple constraint loss |
CN111539255A (en) * | 2020-03-27 | 2020-08-14 | 中国矿业大学 | Cross-modal pedestrian re-identification method based on multi-modal image style conversion |
CN111597876A (en) * | 2020-04-01 | 2020-08-28 | 浙江工业大学 | Cross-modal pedestrian re-identification method based on difficult quintuple |
US20200302176A1 (en) * | 2019-03-18 | 2020-09-24 | Nvidia Corporation | Image identification using neural networks |
CN111767882A (en) * | 2020-07-06 | 2020-10-13 | 江南大学 | Multi-mode pedestrian detection method based on improved YOLO model |
CN111898510A (en) * | 2020-07-23 | 2020-11-06 | 合肥工业大学 | Cross-modal pedestrian re-identification method based on progressive neural network |
CN111931637A (en) * | 2020-08-07 | 2020-11-13 | 华南理工大学 | Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network |
CN111985313A (en) * | 2020-07-09 | 2020-11-24 | 上海交通大学 | Multi-style pedestrian re-identification method, system and terminal based on counterstudy |
Non-Patent Citations (4)
Title |
---|
BO LI等: "Visible Infrared Cross-Modality Person Re-Identification Network Based on Adaptive Pedestrian Alignment", 《IEEE ACCESS》 * |
JIN KYU KANG等: "Person Re-Identification Between Visible and Thermal Camera Images Based on Deep Residual CNN Using Single Input", 《IEEE ACCESS》 * |
WANG, Haibin: "Research on Cross-Modal Person Re-Identification Technology Based on Deep Features", 《China Masters' Theses Full-text Database, Information Science and Technology》 *
ZHENG, Aihua et al.: "Cross-Modality Person Re-Identification Based on Local Heterogeneous Collaborative Dual-Path Network", 《Pattern Recognition and Artificial Intelligence》 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||