CN115601791A - Unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution - Google Patents


Info

Publication number
CN115601791A
Authority
CN
China
Prior art keywords
camera
network
domain
training
clustering
Prior art date
Legal status
Granted
Application number
CN202211404730.9A
Other languages
Chinese (zh)
Other versions
CN115601791B (en)
Inventor
蒋敏
张千
孔军
陶雪峰
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202211404730.9A
Publication of CN115601791A
Application granted
Publication of CN115601791B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution. A multi-branch network recognition model, Multiformer, is constructed based on the Transformer network and comprises single-camera-domain Intraformer networks and a multi-camera-domain Interformer network; all single-camera-domain Intraformer networks share backbone network parameters, which enhances generalization capability, alleviates to a certain extent the inter-domain differences caused by the backgrounds, illumination and the like of different camera domains, improves the robustness of the model to noisy pseudo labels, and further improves the accuracy of unsupervised pedestrian re-identification. Adaptive outlier sample redistribution expands the number of pseudo labels and enhances the feature-representation capability of the multi-branch network recognition model Multiformer. During model training, the joint learning composed of instance-level contrastive learning and cluster-level contrastive learning greatly improves clustering accuracy and alleviates the noisy-pseudo-label problem, thereby effectively improving the accuracy and robustness of unsupervised pedestrian re-identification.

Description

Unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution
Technical Field
The invention relates to an unsupervised pedestrian re-identification method, in particular to an unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution.
Background
With extensive research in both the theory and practice of computer vision, pedestrian re-identification has become an important branch of the field; it aims to identify a target pedestrian across non-overlapping cameras. Pedestrian re-identification has a wide range of real-world applications, such as criminal search, multi-camera tracking, and missing-person search.
At present, traditional pedestrian re-identification research relies on large numbers of manually annotated images, which is inefficient and expensive. Unsupervised pedestrian re-identification addresses this problem: it requires no additional annotation of pedestrian identities, and therefore has a much wider application space than traditional pedestrian re-identification.
Owing to the diversity of objective environments and the complexity of pedestrian behaviour, unsupervised pedestrian re-identification still faces many urgent problems, chiefly the following. 1) Without real identity labels, the model must determine pseudo identity labels for the training data; at present, similar images are mainly assigned the same label through clustering or KNN search to generate pseudo labels for training, but if the estimated identity is incorrect, the learning of the model is hindered. 2) Because pedestrian images suffer from occlusion, differing viewpoints, background interference, and similar factors, the estimated pseudo labels are noisy; the main task of a pedestrian re-identification model is to learn discriminative pedestrian feature representations from different pedestrian images, minimizing the influence of noisy pseudo labels while maximizing the discriminability of the model, which is a core challenge of unsupervised pedestrian re-identification. 3) Pedestrian re-identification is essentially a multi-camera retrieval task; how to fully learn pedestrian features that remain invariant across cameras, despite the differences in background, viewpoint, lighting, and the like among different cameras, is also a problem to be solved.
In addition, the traditional unsupervised pedestrian re-identification task mainly adopts a CNN as the backbone network for feature extraction. A CNN can only process one local neighborhood at a time, its receptive field is limited, and it cannot capture global information well; moreover, the convolution and down-sampling operations of a CNN cause a great loss of detail and spatial information, so it cannot effectively meet the requirements of unsupervised pedestrian re-identification.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution, effectively improving the accuracy and robustness of unsupervised pedestrian re-identification.
According to the technical scheme provided by the invention, the unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution comprises the following steps:
constructing a Transformer-network-based multi-branch network recognition model Multiformer, so as to use the constructed Multiformer model to perform the required unsupervised pedestrian re-identification on pedestrian images acquired by m cameras, wherein,

the constructed multi-branch network recognition model Multiformer comprises a single-camera-domain Intraformer network constructed on the basis of the Transformer network for each camera and a multi-camera-domain Interformer network constructed on the basis of the Transformer network for all cameras;

when the multi-branch network recognition model Multiformer is constructed, the single-camera-domain Intraformer networks of all cameras and the multi-camera-domain Interformer network adopt the same backbone network, and the single-camera-domain Intraformer networks of all cameras share the backbone network parameters during training;

when pedestrian re-identification is performed, the multi-camera-domain Interformer network extracts features from an identification image containing the pedestrian to be identified, so that the pedestrian images matching the extracted pedestrian features are searched for and determined among the pedestrian images collected by the m cameras.
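As an illustrative sketch of this retrieval step (not code from the patent; all names here are hypothetical), matching by feature similarity can be organized as follows:

```python
import torch

def retrieve(query_feat: torch.Tensor, gallery_feats: torch.Tensor, top_k: int = 10):
    """Rank gallery pedestrian images by cosine similarity to the query feature.

    query_feat:    (D,) feature of the identification image, extracted by the
                   multi-camera-domain Interformer network.
    gallery_feats: (G, D) features of the pedestrian images collected by the m cameras.
    """
    q = torch.nn.functional.normalize(query_feat.unsqueeze(0), dim=1)   # (1, D)
    g = torch.nn.functional.normalize(gallery_feats, dim=1)             # (G, D)
    sims = (q @ g.t()).squeeze(0)                                       # (G,)
    return torch.topk(sims, k=min(top_k, g.size(0))).indices            # best matches
```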
When the multi-branch network recognition model Multiformer is constructed, the construction steps comprise:

constructing a Transformer-network-based multi-branch network recognition basic model, which comprises a Transformer-network-based multi-camera-domain basic network and m Transformer-network-based single-camera-domain basic networks, where a classifier is configured in the multi-camera-domain basic network and in each single-camera-domain basic network, and each configured classifier is adaptively connected to the corresponding backbone network of its multi-camera-domain or single-camera-domain basic network;

when the multi-branch network recognition basic model is constructed, pre-training the backbone network used to construct the multi-camera-domain basic network on the ImageNet data set to obtain the multi-camera-domain backbone-network pre-training parameters of the multi-camera-domain basic network;

when the constructed single-camera-domain basic networks are trained, loading the obtained multi-camera-domain backbone-network pre-training parameters into the backbone networks of all single-camera-domain basic networks, so that the single-camera-domain basic networks of all cameras share the backbone network parameters;
performing required training on the constructed multi-branch network identification basic model, so as to form a corresponding single-camera-domain Intraformer network based on a trained single-camera-domain basic network and form a multi-camera-domain Interformer network based on a trained multi-camera-domain basic network when a target training state is reached;
and forming a multi-branch network identification model Multiformer by using the multi-camera domain Interformer network and the m single-camera domain Intraformer networks.
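The branch structure described above can be summarized in the following minimal sketch, assuming a ViT-style encoder module as the shared backbone; all class, function, and parameter names are hypothetical rather than the patent's actual implementation:

```python
import copy
import torch.nn as nn

class Branch(nn.Module):
    """One branch: a Transformer backbone plus a branch-specific classifier
    adaptively connected through an MLP (classifier parameters are NOT shared)."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(),   # MLP
            nn.Linear(feat_dim, num_classes))           # Classifier
    def forward(self, x):
        token = self.backbone(x)                        # Cls-token feature
        return token, self.classifier(token)

class Multiformer(nn.Module):
    """m single-camera-domain Intraformer branches + one multi-camera-domain
    Interformer branch. The m Intraformer branches reference the SAME backbone
    object, so their backbone parameters are shared during training."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, m: int):
        super().__init__()
        self.interformer = Branch(copy.deepcopy(backbone), feat_dim, num_classes)
        shared = backbone                               # one shared module instance
        self.intraformers = nn.ModuleList(
            Branch(shared, feat_dim, num_classes) for _ in range(m))
```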
When the constructed multi-branch network recognition basic model is trained, the training process comprises the following steps:
Step 1, performing feature extraction on a training data set by using the multi-branch network recognition basic model to obtain the multi-camera-domain picture features F_mc and the single-camera-domain picture features F_c_i of the i-th camera, i = 1, …, m;

Step 2, clustering the obtained multi-camera-domain picture features F_mc and the single-camera-domain picture features F_c_i of the i-th camera, wherein the successfully clustered pictures form the clustering points (Inliers) and are assigned clustering-point pseudo labels, and the unsuccessfully clustered pictures form the Outliers;

Step 3, generating the clustering-point pseudo-label cluster centers from the clustering-point pseudo labels, performing adaptive outlier sample redistribution on the Outliers using the generated cluster centers, assigning the corresponding clustering-point pseudo labels to the outlier samples in the Outliers after the adaptive redistribution, and forming a pseudo-label training set from all clustering-point pseudo labels;

Step 4, performing joint contrastive learning on the multi-branch network recognition basic model so as to optimize its network parameters based on joint contrastive learning, wherein,

for the i-th single-camera-domain basic network, joint contrastive learning is performed based on the training data set, the single-camera-domain picture features F_c_i of the i-th camera, and the clustering-point pseudo-label cluster centers;

for the multi-camera-domain basic network, joint contrastive learning is performed based on the training data set, the multi-camera-domain picture features F_mc, and the clustering-point pseudo-label cluster centers;

the joint contrastive learning comprises cluster-level contrastive learning and instance-level contrastive learning;

Step 5, performing collaborative training of the single-camera-domain basic networks and the multi-camera-domain basic network on the multi-branch network recognition basic model optimized by joint contrastive learning, wherein,

the multi-camera-domain basic network is trained with the multi-camera-domain picture features F_mc and the pseudo-label training set;

the i-th single-camera-domain basic network is trained with the single-camera-domain picture features F_c_i of the i-th camera and the pseudo-label training set;

Step 6, repeating the training process from step 1 to step 5 until the target training state is reached; one training round is sketched below.
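The following schematic summarizes one training round (steps 1 to 5); every helper function here is a hypothetical placeholder for the operation named in the corresponding step, not code from the patent:

```python
def train_round(model, dataset, m, nu):
    # Step 1: feature extraction with the multi-branch basic model.
    F_mc = extract_features(model.interformer, dataset.all())           # multi-camera domain
    F_c = [extract_features(model.intraformers[i], dataset.camera(i))   # i-th camera domain
           for i in range(m)]
    # Step 2: clustering; inliers receive clustering-point pseudo labels.
    labels, inliers, outliers = cluster_features(F_mc)
    # Step 3: cluster centers + adaptive outlier sample redistribution (AORA).
    centers = compute_cluster_centers(F_mc, labels, inliers)
    labels = reassign_outlier_labels(F_mc, centers, outliers, labels, nu)
    # Step 4: joint contrastive learning (cluster-level + instance-level).
    optimize_joint_contrast(model, dataset, labels, centers)
    # Step 5: collaborative training of all branches on the pseudo-label set.
    cotrain(model, dataset, labels)
```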
For step 1, when extracting the multi-camera-domain picture features F_mc, Split processing is performed on any training picture in the training data set, a parameter Cls token is connected to the image blocks obtained by the Split processing, and the position information of each image block and the camera information encoding of the training picture are embedded, to configure and form the training-picture multi-camera-domain feature-extraction information;

the multi-camera-domain basic network processes the training-picture multi-camera-domain feature-extraction information to extract the multi-camera-domain picture features F_mc.

When extracting the single-camera-domain picture features F_c_i of the i-th camera, Split processing is performed on the training pictures acquired by the i-th camera, a parameter Cls token is connected to the image blocks obtained by the Split processing, and the position information of each image block is embedded to form the training-picture single-camera-domain feature-extraction information;

the single-camera-domain basic network corresponding to the i-th camera processes the training-picture single-camera-domain feature-extraction information to extract the single-camera-domain picture features F_c_i.
In step 2, when clustering the obtained multi-camera-domain picture features F_mc and all single-camera-domain picture features F_c, the clustering method includes the DBSCAN clustering method.
In step 3, the clustering-point pseudo-label cluster centers are given by:

$$\Phi_i = \frac{1}{num_i}\sum_{j=1}^{num_i} f_j, \quad i = 1,\dots,Y$$

where Y is the number of categories of the clustering-point pseudo labels, Φ_i is the cluster-center feature of the i-th category, f_j is the feature of the j-th picture in the i-th category, and num_i is the number of pictures contained in the i-th category;

the generated clustering-point pseudo-label cluster centers are stored in a cluster-center feature repository (Center Memory Bank);
an affinity matrix between the outlier samples within the Outliers and the clustering-point pseudo-label cluster centers is computed, wherein,

the affinity matrix between the Outliers and the clustering-point pseudo-label cluster centers is:

$$AFM(i,j) = \frac{\sum_{r=1}^{N}\Phi_{i\_r}\,O_{j\_r}}{\sqrt{\sum_{r=1}^{N}\Phi_{i\_r}^{2}}\;\sqrt{\sum_{r=1}^{N}O_{j\_r}^{2}}}$$

where AFM(i,j) is the mutual-similarity value in the affinity matrix AFM between the i-th cluster-center feature Φ_i and the j-th outlier sample, O_j is the feature of the j-th outlier sample, Φ_i_r denotes the r-th element of the i-th cluster-center feature Φ_i, O_j_r denotes the r-th element of the j-th outlier-sample feature O_j, and N is the feature dimension;
and when adaptive outlier sample redistribution is performed based on the computed affinity matrix AFM, each outlier sample is assigned to the clustering-point pseudo-label cluster center with which it has the strongest mutual similarity.
A mutual-similarity threshold ν is configured for the mutual similarity between the outlier samples and the clustering-point pseudo-label cluster centers, wherein,

ν is scheduled over training as a function of the training round epoch: starting from ν_start, it increases at a rate governed by γ while epoch < e_peak and decays thereafter, where Num_O is the number of outlier samples within the Outliers, ν_start is the initial value of the mutual-similarity threshold ν, γ is the threshold decay rate, epoch is the training round, e_peak is the training round at which the mutual-similarity threshold ν reaches its peak, and 𝕀(·) is an indicator function whose value is 1 when the training round is less than e_peak, i.e. 𝕀(·) = 𝕀{epoch < e_peak}.
When outlier samples are assigned based on the configured mutual-similarity threshold ν, the j-th outlier sample whose mutual-similarity value AFM(i,j) is greater than the threshold ν is assigned to the clustering-point pseudo-label cluster center with the strongest mutual similarity.
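Reading the affinity as cosine similarity, as in the formula above, the redistribution step can be sketched as follows (a sketch under that assumption; the function and argument names are hypothetical):

```python
import numpy as np

def reassign_outliers(outlier_feats: np.ndarray, centers: np.ndarray, nu: float) -> np.ndarray:
    """outlier_feats: (Num_O, N) features O_j; centers: (Y, N) features Phi_i.
    Returns the assigned cluster index per outlier, or -1 when the best
    mutual-similarity value does not exceed the threshold nu."""
    O = outlier_feats / np.linalg.norm(outlier_feats, axis=1, keepdims=True)
    C = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    afm = C @ O.T                          # AFM(i, j), shape (Y, Num_O)
    best = afm.argmax(axis=0)              # center with the strongest mutual similarity
    best_sim = afm.max(axis=0)
    return np.where(best_sim > nu, best, -1)
```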
In step 4, during joint contrastive learning, cluster-level contrastive learning yields the cluster contrast loss l_c, and instance-level contrastive learning yields the instance contrast loss l_t, wherein,

for the cluster contrast loss l_c:

$$l_c = -\log \frac{\exp\left(f(q)\cdot \Phi_{+}/\gamma\right)}{\sum_{i=1}^{Y}\exp\left(f(q)\cdot \Phi_{i}/\gamma\right)}$$

where Φ_+ is the cluster center of the positive samples of the sample picture q, γ is a set temperature parameter, and f(q) is the query instance feature of the sample picture q;
for the instance contrast loss l_t:

$$l_t = \sum_{i=1}^{P}\sum_{a=1}^{K}\left[\beta + \max_{p=1,\dots,K}\left\|f\!\left(x_a^{i}\right)-f\!\left(x_p^{i}\right)\right\| - \min_{\substack{j=1,\dots,P,\; j\neq i\\ n=1,\dots,K}}\left\|f\!\left(x_a^{i}\right)-f\!\left(x_n^{j}\right)\right\|\right]_{+}$$

where P is the number of different pedestrians selected in a given sample, K is the number of sample pictures selected for each pedestrian in the given sample, a indexes one picture among the K sample pictures, x_a^i is an anchor image with identity i, x_p^i is a positive sample with identity i, x_n^j is a negative sample with identity j, β is the minimum gap between the similarity of a positive sample pair and the similarity of a negative sample pair, and f(x_a^i) is the image feature extracted from the anchor image x_a^i.
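A sketch of the two losses in PyTorch, under the batch-hard reading of l_t used above (the names and default values are illustrative, not the patent's settings):

```python
import torch
import torch.nn.functional as F

def cluster_contrast_loss(q_feat, centers, pos_idx, gamma=0.05):
    """l_c: InfoNCE over cluster centers -- pull f(q) toward its positive
    center Phi_+, push it away from the other centers."""
    q = F.normalize(q_feat, dim=-1)                      # (D,)
    c = F.normalize(centers, dim=-1)                     # (Y, D)
    logits = (c @ q) / gamma                             # (Y,)
    target = torch.tensor([pos_idx], device=q.device)
    return F.cross_entropy(logits.unsqueeze(0), target)

def instance_contrast_loss(feats, labels, beta=0.3):
    """l_t: batch-hard triplet loss over a P x K batch with margin beta."""
    dist = torch.cdist(feats, feats)                     # (PK, PK) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = dist.masked_fill(~same, float('-inf')).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(beta + hardest_pos - hardest_neg).mean()
```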
When model network-parameter optimization based on joint contrastive learning is performed, the model network parameters θ are determined so as to minimize the loss function of the N_H training samples under the determined parameters θ, wherein,

during optimization, the multi-camera-domain basic network and all single-camera-domain basic networks are optimized simultaneously:

$$\theta^{*}=\arg\min_{\theta}\sum_{a=1}^{N_H}\Big(l_c\big(f(x_a);\theta\big)+l_t\big(f(x_a);\theta\big)\Big)$$

where f(x_a) is the image feature extracted from the anchor image x_a.
For the collaborative-training identity loss, there is:

$$l_{id} = -\frac{1}{N_z}\sum_{i=1}^{N_z}\log p\big(\tilde{y}_i \mid x_i\big)$$

where l_id is the collaborative-training identity loss, ỹ_i is the pseudo label of x_i, N_z is the number of training samples in the training data set, and p(ỹ_i | x_i) is the probability that the multi-branch network recognition basic model outputs the real identity label ỹ_i for the training sample x_i.
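Combining the identity loss with simultaneous optimization of all branches, one collaborative-training step might look like the following sketch, reusing the hypothetical Multiformer structure sketched earlier; all names are assumptions:

```python
import torch.nn.functional as F

def cotrain_step(model, images, pseudo_labels, cam_ids, optimizer):
    """The Interformer branch sees the whole batch; each Intraformer branch sees
    only its own camera's images; l_id is the cross-entropy against pseudo labels."""
    _, logits_mc = model.interformer(images)
    loss = F.cross_entropy(logits_mc, pseudo_labels)         # l_id, multi-camera branch
    for i, branch in enumerate(model.intraformers):
        mask = cam_ids == i
        if mask.any():
            _, logits_c = branch(images[mask])
            loss = loss + F.cross_entropy(logits_c, pseudo_labels[mask])
    optimizer.zero_grad()
    loss.backward()          # all branches (and the shared backbone) updated together
    optimizer.step()
    return loss.item()
```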
The invention has the advantages that: the multi-branch network recognition model Multiformer is constructed based on the Transformer network and comprises single-camera-domain Intraformer networks and a multi-camera-domain Interformer network; all single-camera-domain Intraformer networks share backbone network parameters, which enhances generalization capability, alleviates to a certain extent the inter-domain differences caused by the backgrounds, illumination and the like of different camera domains, improves the robustness of the model to noisy pseudo labels, and further improves the accuracy of unsupervised pedestrian re-identification.

Adaptive outlier sample redistribution expands the number of pseudo labels and enhances the feature-representation capability of the multi-branch network recognition model Multiformer. During model training, the joint learning composed of instance-level contrastive learning and cluster-level contrastive learning greatly improves clustering accuracy and alleviates the noisy-pseudo-label problem, thereby effectively improving the accuracy and robustness of unsupervised pedestrian re-identification.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a flowchart of an embodiment of constructing a multi-branch network recognition model Multiformer according to the present invention.
Fig. 3 is a diagram of an embodiment of a multi-branch network recognition model Multiformer according to the present invention.
FIG. 4 is a schematic diagram of a single-camera-domain Intraformer network and the multi-camera-domain Interformer network according to an embodiment of the present invention.
Fig. 5 is a diagram illustrating the visualization effect of the multi-branch network according to the present invention.
FIG. 6 is a schematic diagram of an embodiment of counting the distribution of Outliers after clustering according to the present invention.
FIG. 7 is a diagram illustrating adaptive outlier sample reallocation according to the present invention.
FIG. 8 is a schematic diagram of the joint contrastive learning of the present invention.
Fig. 9 is a schematic diagram of the visualization effect in the comparative example of the present invention.
Detailed Description
The invention is further illustrated by the following specific figures and examples.
In order to effectively improve the accuracy and robustness of unsupervised pedestrian re-identification, the invention adopts an unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution. In an embodiment of the invention, the unsupervised pedestrian re-identification method comprises the following steps:

constructing a Transformer-network-based multi-branch network recognition model Multiformer, so as to use the constructed Multiformer model to perform the required unsupervised pedestrian re-identification on pedestrian images acquired by m cameras, wherein,

the constructed multi-branch network recognition model Multiformer comprises a single-camera-domain Intraformer network constructed on the basis of the Transformer network for each camera and a multi-camera-domain Interformer network constructed on the basis of the Transformer network for all cameras;

when the multi-branch network recognition model Multiformer is constructed, the single-camera-domain Intraformer networks of all cameras and the multi-camera-domain Interformer network adopt the same backbone network, and the single-camera-domain Intraformer networks of all cameras share the backbone network parameters during training;

when pedestrian re-identification is performed, the multi-camera-domain Interformer network extracts features from an identification image containing the pedestrian to be identified, so that the pedestrian images matching the extracted pedestrian features are searched for and determined among the pedestrian images collected by the m cameras.
Fig. 1 shows an implementation flowchart of unsupervised pedestrian re-identification. To implement unsupervised pedestrian re-identification, a Transformer-network-based multi-branch network recognition model Multiformer needs to be constructed. The scene range of pedestrian image acquisition determined by the m cameras is the range of the pedestrian re-identification area of the Multiformer model; the constructed model can then perform unsupervised pedestrian re-identification on the pedestrian images acquired by the m cameras. Here a camera is any device capable of acquiring pedestrian images, such as a still camera or a video camera; the specific types and the number of cameras can be selected as required so as to meet the required unsupervised pedestrian re-identification. In addition, the m cameras are generally installed in different areas, that is, the m cameras can acquire images of pedestrians in m different area scenes.
To improve the accuracy and robustness of unsupervised pedestrian re-identification, in an embodiment of the present invention the multi-branch network recognition model Multiformer needs to include a single-camera-domain Intraformer network constructed on the basis of the Transformer network for each camera and a multi-camera-domain Interformer network constructed on the basis of the Transformer network for all cameras, where the single-camera domain specifically refers to the acquisition range of pedestrian images of one camera, and the multi-camera domain refers to the acquisition range of pedestrian images of the m cameras. Because the single-camera-domain Intraformer networks and the multi-camera-domain Interformer network are constructed on the basis of the Transformer network, the characteristics of the Transformer network can be used to better acquire global information and picture details, enhancing the utilization of globally effective information.

In one embodiment of the invention, the single-camera-domain Intraformer networks of all cameras adopt the same backbone network and share the network backbone parameters, which can enhance the generalization capability of the multi-branch network recognition model Multiformer, alleviate to a certain extent the inter-domain differences brought by the backgrounds, illumination and the like of different camera domains, improve robustness to noisy pseudo labels, and further improve the accuracy of unsupervised pedestrian re-identification.
FIG. 5 is a t-SNE plot drawn on the public data set Market-1501. Plot (a) shows the feature distribution obtained without the multi-branch network recognition model Multiformer of the present invention, and plot (b) shows the feature distribution obtained with Multiformer feature extraction. Dots of the same color represent the same camera; the Market-1501 data set contains pictures from 6 cameras, so there are 6 colors in the figure. Plot (a) is influenced by the domain differences between cameras, so image features from the same camera are more similar, which means the attention of the network is driven not by the pedestrian but by noise. In plot (b) the image features of each camera are uniformly distributed; it can be seen that, after the multi-branch network recognition model Multiformer is introduced, the domain differences among cameras are clearly alleviated.
In an embodiment of the present invention, when the multi-branch network recognition model Multiformer is constructed, the construction steps include:

constructing a Transformer-network-based multi-branch network recognition basic model, which comprises a Transformer-network-based multi-camera-domain basic network and m Transformer-network-based single-camera-domain basic networks, where a classifier is configured in the multi-camera-domain basic network and in each single-camera-domain basic network, and each configured classifier is adaptively connected to the corresponding backbone network of its multi-camera-domain or single-camera-domain basic network;

when the multi-branch network recognition basic model is constructed, pre-training the backbone network used to construct the multi-camera-domain basic network on the ImageNet data set to obtain the multi-camera-domain backbone-network pre-training parameters of the multi-camera-domain basic network;

when the constructed single-camera-domain basic networks are trained, loading the obtained multi-camera-domain backbone-network pre-training parameters into the backbone networks of all single-camera-domain basic networks, so that the single-camera-domain basic networks of all cameras share the network backbone parameters;

performing the required training on the constructed multi-branch network recognition basic model, so that, when the target training state is reached, the corresponding single-camera-domain Intraformer networks are formed from the trained single-camera-domain basic networks and the multi-camera-domain Interformer network is formed from the trained multi-camera-domain basic network;
and forming a multi-branch network identification model Multiformer by using the multi-camera domain Interformer network and the m single-camera domain Intraformer networks.
As can be seen from the above description, since the multi-branch network recognition model comprises single-camera-domain Intraformer networks and a multi-camera-domain Interformer network, the constructed multi-branch network recognition basic model at least comprises m single-camera-domain basic networks for forming the single-camera-domain Intraformer networks and a multi-camera-domain basic network for forming the multi-camera-domain Interformer network; that is, the m single-camera-domain basic networks correspond to the m finally formed single-camera-domain Intraformer networks, and the multi-camera-domain basic network corresponds to the multi-camera-domain Interformer network.

In an embodiment of the present invention, the single-camera-domain basic networks and the multi-camera-domain basic network all use the same backbone network, for example the Encoder of the Transformer network. In addition, a classifier is configured in the multi-camera-domain basic network and in each single-camera-domain basic network, forming a multi-branch classifier.

Fig. 3 shows a schematic diagram of the architectures of the multi-camera-domain Interformer network and a single-camera-domain Intraformer network after the target training state is reached; since training only optimizes and adjusts the corresponding network parameters, the corresponding architectures of the constructed single-camera-domain basic network and multi-camera-domain basic network can also be read from Fig. 3.
In Fig. 3 and Fig. 4, for the multi-camera-domain Interformer network, Split slices the input picture into a number of image blocks. Linear Projection of Flattened Patches denotes the linear projection and dimension transformation, Embedding denotes the data embedding, and Feature Extraction denotes the feature extraction, in which E_mc, the Blocks, and the Token are obtained in sequence. In Fig. 4, Branch-1 to Branch-m are the Intraformer networks of the m single-camera domains.

Affinity Matrix is the affinity matrix, Pseudo Label is the pseudo label, AORA is the adaptive outlier sample redistribution strategy, Joint Contrast Learning (JCL) is the joint contrastive learning, MLP is the multilayer perceptron, and Classifier is the classifier. The joint contrastive learning (JCL) comprises instance-level contrastive learning and cluster-level contrastive learning.

Fig. 4 shows an implementation of the backbone network corresponding to the multi-camera-domain Interformer network and the single-camera-domain Intraformer networks; the backbone network in Fig. 4 comprises the above-mentioned Split, linear projection, dimension transformation, and so on, and the specific manner of forming these backbone networks on the basis of the Transformer is the same as in the prior art.
In one embodiment of the invention, the Classifier is adaptively connected to the backbone network via the MLP; the information handled by the Classifier is thereby determined. In specific implementation, all Classifier classifiers use the same classifier form, and normal initialization can be adopted for all of them. After the constructed multi-branch network recognition basic model is trained to the target state, the corresponding Classifier classifiers are obtained respectively.

For the single-camera-domain Intraformer networks, since they use the same backbone network as the multi-camera-domain Interformer network, the specific case of the m single-camera-domain Intraformer networks in Fig. 3 may refer to the corresponding description of the multi-camera-domain Interformer network and is not repeated here.
In order to realize the sharing of network backbone parameters, in one embodiment of the invention, the backbone network of the multi-camera-domain basic network is constructed and pre-trained on the ImageNet data set to obtain the multi-camera-domain backbone-network pre-training parameters of the multi-camera-domain basic network;

when the single-camera-domain basic networks are constructed, the multi-camera-domain backbone-network pre-training parameters are loaded into the backbone networks of all single-camera-domain basic networks, so that the single-camera-domain basic networks of all cameras share the network backbone parameters.

In specific implementation, the ImageNet data set is a commonly used public data set, and the method and process of pre-training the backbone network of the multi-camera-domain basic network with the ImageNet data set are consistent with the prior art. In the multi-camera-domain basic network, after pre-training yields the multi-camera-domain network pre-training parameters, a Classifier is added. In each single-camera-domain basic network, after the multi-camera-domain network pre-training parameters are loaded, a Classifier is likewise added.
The Classifier can take any currently common classification form; the manner of adding the Classifier and its specific form can be selected according to actual needs so as to meet the required classification. Once all classifiers are added, the construction of the multi-branch network recognition basic model is complete, and the model then needs to be trained.

As can be seen from the above description, the backbone networks of the single-camera-domain basic networks constructed for each camera on the basis of the Transformer network share network parameters, but the corresponding parameters of the Classifiers are not shared. In specific implementation, after the backbone networks of the single-camera-domain basic networks share network parameters, the corresponding backbone-network parameters remain essentially consistent.
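This parameter-sharing scheme can be sketched as follows, assuming a hypothetical checkpoint file and the branch structure sketched earlier; if the Intraformer branches tie the same backbone object, a single load suffices:

```python
import torch

# Sketch: load ImageNet pre-training parameters of the multi-camera-domain
# backbone into every single-camera-domain backbone. Classifier parameters
# are NOT shared; each branch keeps its own (normally initialized) classifier.
pretrained = torch.load("imagenet_pretrained_backbone.pth")   # hypothetical file
model.interformer.backbone.load_state_dict(pretrained)
for branch in model.intraformers:
    branch.backbone.load_state_dict(pretrained)               # shared backbone parameters
```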
In an embodiment of the present invention, the training required for the constructed multi-branch network recognition basic model is specifically configured so that the single-camera-domain basic networks of all cameras first share the network backbone parameters, and the resulting multi-branch network recognition basic model is then trained until the target training state is reached.
When the constructed multi-branch network recognition basic model is trained, the training process comprises the following steps:

Step 1, performing feature extraction on a training data set by using the multi-branch network recognition basic model to obtain the multi-camera-domain picture features F_mc and the single-camera-domain picture features F_c_i of the i-th camera, i = 1, …, m;

Step 2, clustering the obtained multi-camera-domain picture features F_mc and the single-camera-domain picture features F_c_i of the i-th camera, wherein the successfully clustered pictures form the clustering points (Inliers) and are assigned clustering-point pseudo labels, and the unsuccessfully clustered pictures form the Outliers;

Step 3, generating the clustering-point pseudo-label cluster centers from the clustering-point pseudo labels, performing adaptive outlier sample redistribution on the Outliers using the generated cluster centers, assigning the corresponding clustering-point pseudo labels to the outlier samples in the Outliers after the adaptive redistribution, and forming a pseudo-label training set from all clustering-point pseudo labels;

Step 4, performing joint contrastive learning on the multi-branch network recognition basic model so as to optimize its network parameters based on joint contrastive learning, wherein,

for the i-th single-camera-domain basic network, joint contrastive learning is performed based on the training data set, the single-camera-domain picture features F_c_i of the i-th camera, and the clustering-point pseudo-label cluster centers;

for the multi-camera-domain basic network, joint contrastive learning is performed based on the training data set, the multi-camera-domain picture features F_mc, and the clustering-point pseudo-label cluster centers;

the joint contrastive learning comprises cluster-level contrastive learning and instance-level contrastive learning;

Step 5, performing collaborative training of the single-camera-domain basic networks and the multi-camera-domain basic network on the multi-branch network recognition basic model optimized by joint contrastive learning, wherein,

the multi-camera-domain basic network is trained with the multi-camera-domain picture features F_mc and the pseudo-label training set;

the i-th single-camera-domain basic network is trained with the single-camera-domain picture features F_c_i of the i-th camera and the pseudo-label training set;

during collaborative training, network-parameter optimization based on collaborative training is performed on the multi-branch network recognition basic model using the computed collaborative-training identity loss;

Step 6, repeating the training process from step 1 to step 5 until the target training state is reached.
Fig. 2 shows an embodiment of the training process of the multi-branch network recognition basic model: during training, the steps of feature extraction, clustering to generate partial pseudo labels and outlier samples, adaptive outlier sample allocation, joint contrastive learning, and multi-branch network collaborative training are generally performed. The termination condition of training is generally whether the model converges: when the model is judged to have converged after training, training terminates and the target training state is reached; otherwise training is repeated. In specific implementation, the model is judged to be in the convergence state when, during training, the precision of the model no longer increases and the loss of the model no longer decreases.
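This convergence criterion (precision no longer increasing, loss no longer decreasing) could be checked with a sketch like the following; the patience window is an assumption:

```python
def converged(acc_history, loss_history, patience=3, eps=1e-4):
    """True when precision has stopped increasing AND loss has stopped
    decreasing over the last `patience` training rounds."""
    if len(acc_history) <= patience or len(loss_history) <= patience:
        return False
    acc_stalled = max(acc_history[-patience:]) <= max(acc_history[:-patience]) + eps
    loss_stalled = min(loss_history[-patience:]) >= min(loss_history[:-patience]) - eps
    return acc_stalled and loss_stalled
```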
The training procedure is described in detail below.
Specifically, during training, a training data set needs to be provided or configured; the training data set consists of images captured and collected by the m cameras, and its size can be chosen according to actual needs so as to meet the required training requirements.
In an embodiment of the present invention, for step 1, when extracting the multi-camera-domain picture features F_mc, Split processing is performed on any training picture in the training data set, a parameter Cls token is connected to the image blocks obtained by the Split processing, and the position information of each image block and the camera information encoding of the training picture are embedded to form the training-picture multi-camera-domain feature-extraction information;

the multi-camera-domain basic network processes the training-picture multi-camera-domain feature-extraction information to extract the multi-camera-domain picture features F_mc.

When extracting the single-camera-domain picture features F_c_i of the i-th camera, Split processing is performed on the training pictures acquired by the i-th camera, a parameter Cls token is connected to the image blocks obtained by the Split processing, and the position information of each image block is embedded to form the training-picture single-camera-domain feature-extraction information;

the single-camera-domain basic network corresponding to the i-th camera processes the training-picture single-camera-domain feature-extraction information to extract the single-camera-domain picture features F_c_i.
In specific implementation, the input training data is the set X_mc ∈ R^{B×C×H×W} of all camera pictures (i.e. the pictures collected by the m cameras), where H×W is the resolution of the input picture, C is the number of channels (for RGB pictures, C = 3), and B is the batch size, which can be selected and determined according to the actual application scenario. The input picture is segmented (Split) and the spatial dimensions are flattened to obtain the image-block Patch sequence

$$X_p \in \mathbb{R}^{B\times N\times (P_h \cdot P_w \cdot C)}$$

where N is the number of Patches obtained by the division and P_h × P_w is the size of each cut image block Patch.

The image-block Patch sequence X_p is linearly projected and dimension-transformed to obtain the image-block Patch encoding E_mc ∈ R^{B×N×D}, where D is the generated feature dimension. A parameter Cls token representing global features is connected to the image-block Patch encoding E_mc, and the position encoding and the camera information encoding are embedded, giving E_mc_cls ∈ R^{B×N′×D}. After training, the Cls token parameter contains a feature representation of the input picture for classification. The size of the parameter Cls token is R^{B×1×D}; after the parameter Cls token is connected to the image-block Patch encoding E_mc, the Patch-number dimension N increases by 1, i.e. N′ = N + 1.

The parameter Cls token is a learnable parameter of size R^{B×1×D}; the position encoding is the position information of each image block Patch in the original picture after division, the camera information encoding is formed from the camera-number information of the picture, and the sizes of the position encoding and the camera information encoding are both R^{1×N′×D}, with all initial values 0.

E_mc_cls is fed into the Block network of the Transformer network, and the Block network processes the training-picture multi-camera-domain feature-extraction information to extract the multi-camera-domain picture features F_mc; the extracted multi-camera-domain picture features F_mc are the Token generated by the Interformer network in Fig. 3 and Fig. 4.
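The input pipeline just described (Split, linear projection, Cls token, position encoding, camera information encoding) corresponds to a ViT-style patch embedding; the following is a minimal sketch with illustrative default sizes, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class MultiCameraPatchEmbed(nn.Module):
    """Split -> linear projection -> Cls token -> position (+ camera) encoding."""
    def __init__(self, H=256, W=128, P_h=16, P_w=16, C=3, D=768, m=6):
        super().__init__()
        N = (H // P_h) * (W // P_w)                        # number of Patches
        self.proj = nn.Conv2d(C, D, kernel_size=(P_h, P_w), stride=(P_h, P_w))
        self.cls = nn.Parameter(torch.zeros(1, 1, D))      # parameter Cls token
        self.pos = nn.Parameter(torch.zeros(1, N + 1, D))  # position encoding, init 0
        self.cam = nn.Parameter(torch.zeros(m, 1, D))      # camera information encoding
    def forward(self, x, cam_id=None):                     # x: (B, C, H, W)
        E = self.proj(x).flatten(2).transpose(1, 2)        # (B, N, D) Patch encoding E_mc
        cls = self.cls.expand(E.size(0), -1, -1)
        E = torch.cat([cls, E], dim=1)                     # N' = N + 1
        E = E + self.pos                                   # embed position information
        if cam_id is not None:                             # Interformer branch only
            E = E + self.cam[cam_id]                       # (B, 1, D), broadcast over tokens
        return E                                           # E_mc_cls, fed to the Blocks
```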
For the single-camera-domain basic networks, the input training pictures are classified according to their camera labels and sent into the corresponding single-camera-domain basic network; for example, for the i-th camera, the training pictures sent into the corresponding single-camera-domain basic network are the pictures acquired by the i-th camera. Specifically, the input data of each single-camera-domain basic network is the set X_c_i ∈ R^{B×C×H×W} of single-camera pictures, where c_i denotes the i-th camera. After the same segmentation and dimension transformation as in the multi-camera-domain basic network, the image-block Patch encoding E_c_i ∈ R^{B×N×D} is obtained. A parameter Cls token is connected to the image-block Patch encoding E_c_i, into which the position encoding of the image blocks is also embedded. Subsequently, the image-block Patch encoding E_c_i is passed into the Block network of the Transformer network to extract the single-camera-domain picture features F_c_i; the extracted features F_c_i are the Token generated by the Intraformer network in Fig. 3 and Fig. 4.
Fig. 4 shows an embodiment in which the Block network processes E_mc_cls, i.e. the position information and camera information, to obtain the Token; the specific method and process by which the Block network obtains the Token can refer to the processing procedure in Fig. 4, which is similar to the Block network processing in the existing Transformer network and is not detailed here.
In the multi-camera-domain basic network used to form the multi-camera-domain Interformer network and the single-camera-domain basic networks used to form the single-camera-domain Intraformer networks, the parameters of the Block network are determined through the above steps, so the picture features can be extracted directly with the Block network; the specifics of the Block network are well known to those skilled in the art and are not described here again.
In an embodiment of the present invention, in step 2, when clustering the obtained multi-camera-domain picture features F_mc and all single-camera-domain picture features F_c, the clustering method includes the DBSCAN clustering method.
In the clustering process, some of the extracted image features are disturbed by noise such as pedestrian pose and background, so they lie far from any cluster center and cannot be clustered successfully; such samples are called outlier samples, and all outlier samples form the Outliers. In an embodiment of the invention, unsupervised pedestrian re-identification is performed through collaborative training with the pseudo labels obtained by clustering, and outlier samples, lacking labels, cannot be used in training.

In specific implementation, the clustering method can adopt the DBSCAN clustering method, which does not require the number of clusters to be specified and can learn the number of cluster categories on its own. After clustering, clustering-point pseudo labels are assigned to the successfully clustered pictures, and the unsuccessfully clustered pictures form the Outliers. Of course, other common clustering forms may also be adopted, provided the actual clustering requirements are met. When the DBSCAN clustering method is adopted, the specific conditions under which the clustering points (Inliers) and the outlier samples (Outliers) are formed can be selected and determined according to actual needs.
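A sketch of this clustering step with scikit-learn's DBSCAN (the eps and min_samples values are illustrative assumptions, not the patent's settings):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def generate_pseudo_labels(feats: np.ndarray, eps: float = 0.6, min_samples: int = 4):
    """DBSCAN learns the number of categories on its own; samples it cannot
    cluster are labeled -1 and form the Outliers."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    inliers = np.where(labels >= 0)[0]      # clustering points (Inliers)
    outliers = np.where(labels < 0)[0]      # Outliers awaiting redistribution
    return labels, inliers, outliers
```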
Fig. 6 counts the number of outlier samples of the pedestrian re-identification data set Market-1501 after DBSCAN clustering: outlier samples account for more than 60% of all samples at the initial training stage, and still account for more than 10% after the model has iterated multiple times. Compared with convolutional neural networks, the Transformer network has less inductive bias for the structure of input data, such as locality and translation invariance; therefore more data is needed to train the Transformer network, especially in the early stage of model training. To obtain better results, outlier samples need to be fully utilized.
In an embodiment of the present invention, in step 3, the clustering-point pseudo-label cluster centers are given by:

$$\Phi_i = \frac{1}{num_i}\sum_{j=1}^{num_i} f_j, \quad i = 1,\dots,Y$$

where Y is the number of categories of the clustering-point pseudo labels, Φ_i is the cluster-center feature of the i-th category, f_j is the feature of the j-th picture in the i-th category, and num_i is the number of pictures contained in the i-th category;

the generated clustering-point pseudo-label cluster centers are stored in the cluster-center feature repository (Center Memory Bank);
an affinity matrix between the outlier samples within the Outliers and the clustering-point pseudo-label cluster centers is computed, wherein,

the affinity matrix between the Outliers and the clustering-point pseudo-label cluster centers is:

$$AFM(i,j) = \frac{\sum_{r=1}^{N}\Phi_{i\_r}\,O_{j\_r}}{\sqrt{\sum_{r=1}^{N}\Phi_{i\_r}^{2}}\;\sqrt{\sum_{r=1}^{N}O_{j\_r}^{2}}}$$

where AFM(i,j) is the mutual-similarity value in the affinity matrix AFM between the i-th cluster-center feature Φ_i and the j-th outlier sample, O_j is the feature of the j-th outlier sample, Φ_i_r denotes the r-th element of the i-th cluster-center feature Φ_i, O_j_r denotes the r-th element of the j-th outlier-sample feature O_j, and N is the feature dimension;

when adaptive outlier sample redistribution is performed based on the computed affinity matrix AFM, each outlier sample is assigned to the clustering-point pseudo-label cluster center with which it has the strongest mutual similarity.
In specific implementation, after clustering with the DBSCAN clustering method, the number of categories Y of the clustering-point pseudo labels can be obtained from the clustering points (Inliers) formed; likewise, the feature f_j of the j-th picture of the i-th category and the number of pictures num_i contained in the i-th category can be obtained. Thus, after clustering, the clustering-point pseudo-label cluster centers {Φ_1, Φ_2, …, Φ_i, …, Φ_Y} can be generated.

The feature dimension N is determined by the constructed multi-branch network recognition model Multiformer; for a given Multiformer model, the feature dimension N remains fixed. The i-th cluster-center feature Φ_i therefore has the same feature dimension as the j-th outlier sample. For the affinity matrix AFM, the mutual-similarity values between each outlier sample and each cluster-center feature can be obtained in accordance with the prior art, that is, based on the affinity matrix AFM.
As can be seen from the above description, when determining convergence, the constructed multi-branch network recognition basic model generally needs to be trained multiple times. In an embodiment of the invention, the cluster-center feature repository (Center Memory Bank) is used to store the clustering-point pseudo-label cluster centers after clustering in each training round.

After the clustering-point pseudo-label cluster centers are stored in the cluster-center feature repository (Center Memory Bank), the affinity matrix between the outlier samples within the Outliers and the cluster centers can be computed, and adaptive outlier sample redistribution is carried out based on the computed affinity matrix; this expands the amount of data available for training the model, enhances the feature-representation capability of the model, and obtains better performance.

The mutual-similarity value AFM(i,j) between the i-th cluster-center feature Φ_i and the j-th outlier sample depends on the feature dimension N and on the i-th cluster-center feature Φ_i itself. In specific implementation, during adaptive outlier sample redistribution, the j-th outlier sample is assigned to the clustering-point pseudo-label cluster center with which the mutual similarity is strongest; that is, for the j-th outlier sample, the mutual-similarity value AFM(i,j) of the assigned center Φ_i is maximal.
In one embodiment of the invention, a mutual-similarity threshold ν is configured for the mutual similarity between the outlier samples and the clustering-point pseudo-label cluster centers, wherein,

ν is scheduled over training as a function of the training round epoch: starting from ν_start, it increases at a rate governed by γ while epoch < e_peak and decays thereafter, where Num_O is the number of outlier samples within the Outliers, ν_start is the initial value of the mutual-similarity threshold ν, γ is the threshold decay rate, epoch is the training round, e_peak is the training round at which the mutual-similarity threshold ν reaches its peak, and 𝕀(·) is an indicator function whose value is 1 when the training round is less than e_peak, i.e. 𝕀(·) = 𝕀{epoch < e_peak}.
When outlier samples are assigned based on the configured mutual-similarity threshold ν, the j-th outlier sample whose mutual-similarity value AFM(i,j) is greater than the threshold ν is assigned to the clustering-point pseudo-label cluster center with the strongest mutual similarity.
In specific implementation, when the multi-branch network recognition basic model is in the initial training stage, its feature extraction capability is poor and the accuracy of the extracted features is relatively low, so a smaller mutual similarity relation threshold ν is adopted. As training proceeds, the feature extraction capability of the model is gradually strengthened, and the mutual similarity relation threshold ν is adaptively increased accordingly. However, because some pictures in the data contain multiple pedestrians, occlusion, blurring and the like, a part of the outlier sample points perpetually oscillate or can never be clustered; these are called strong noise points. Therefore, after a certain number of iterations, the mutual similarity relation threshold ν is adaptively reduced to ignore the interference of the strong noise points on the model, as shown in fig. 7.
The mutual similarity relation threshold ν is thus configured according to the conditions of the different training stages. The initial value ν_start may generally be set to 0.6, the threshold decay rate γ may generally be set to 0.9, and e_peak is an empirical value that may generally be set to 10. epoch is the round of model training.
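The exact schedule formula is rendered only as an image in the original; purely as an assumed stand-in that mimics the behavior described above (small at the start, peaking at e_peak, then decaying at rate γ), one might write:

```python
def similarity_threshold(epoch, nu_start=0.6, gamma=0.9, e_peak=10):
    """Assumed schedule for the mutual similarity threshold nu: it rises
    toward nu_start until e_peak, then decays so that strong noise points
    are eventually excluded. This only mimics the described behavior; the
    patent's actual formula is not recoverable from the text."""
    if epoch < e_peak:                      # indicator II{epoch < e_peak} = 1
        return nu_start * gamma ** (e_peak - epoch)   # grows toward nu_start
    return nu_start * gamma ** (epoch - e_peak)       # decays after the peak
```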
After the mutual similarity relation threshold ν is configured, when the mutual similarity relation value AFM(i, j) is greater than the threshold ν, the jth outlier sample is allocated to the clustering point pseudo label clustering center with the strongest mutual similarity relation; otherwise the jth outlier sample is not allocated.
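A minimal sketch of this allocation rule, under the conventions of the DBSCAN sketch above (label -1 for outliers, AFM of shape (Y, Num_O) with columns ordered like the outlier samples):

```python
import numpy as np

def reassign_outliers(afm, labels, nu):
    """Adaptive outlier sample redistribution: the j-th outlier sample is
    given the pseudo label of the cluster center with the largest AFM(i, j),
    but only when that value exceeds the threshold nu; otherwise it stays
    unallocated for this training round."""
    new_labels = labels.copy()
    outlier_idx = np.where(labels == -1)[0]
    best_center = afm.argmax(axis=0)   # strongest mutual similarity per outlier
    best_value = afm.max(axis=0)
    for j, sample in enumerate(outlier_idx):
        if best_value[j] > nu:
            new_labels[sample] = best_center[j]
    return new_labels
```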
In an embodiment of the invention, in step 4, in the process of joint contrast learning, the cluster contrast loss l_c is obtained after the cluster-level contrast learning, and the example contrast loss l_t is obtained after the example-level contrast learning, wherein,
for the cluster contrast loss l_c:

l_c = −log [ exp(f(q)·Φ_+ / Γ) / Σ_{i=1}^{Y} exp(f(q)·Φ_i / Γ) ]

wherein Φ_+ is the positive sample (cluster center) of the sample picture q, Γ is a set parameter, and f(q) is the query instance feature of the sample picture q;
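A direct NumPy transcription of this loss for a single query feature, assuming the centers are stored row-wise and pos_id indexes Φ_+ (Γ = 0.5 follows the value mentioned further below):

```python
import numpy as np

def cluster_contrast_loss(f_q, centers, pos_id, gamma=0.5):
    """Cluster-level contrast loss l_c for one query feature f(q):
    softmax over similarities to all Y cluster centers, with Phi_+ the
    center of q's own cluster and Gamma a temperature parameter."""
    logits = centers @ f_q / gamma          # similarity to every Phi_i
    logits = logits - logits.max()          # numerical stabilization
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[pos_id]))
```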
for the example contrast loss l_t:

l_t = Σ_{i=1}^{P} Σ_{a=1}^{K} [ β + max_{p=1…K} ‖f(x_a^i) − f(x_p^i)‖_2 − min_{j≠i, n=1…K} ‖f(x_a^i) − f(x_n^j)‖_2 ]_+

wherein P is the number of different pedestrians selected in a given sample, K is the number of sample pictures selected for each pedestrian in the given sample, a indexes one picture among the K sample pictures, x_a^i is an anchor image with identity i, x_p^i is a positive sample with identity i, x_n^j is a negative sample with identity j, f(x_a^i), f(x_p^i) and f(x_n^j) are the corresponding extracted image features, β is the minimum gap between the similarity of the positive sample pair and the similarity of the negative sample pair, and [·]_+ denotes max(·, 0).
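A sketch of this batch-hard selection for one P×K batch, assuming `feats` holds the extracted features row-wise and `pids` the pedestrian identities, with at least two images per identity and two identities per batch (β = 0.3 follows the value mentioned further below):

```python
import numpy as np

def example_contrast_loss(feats, pids, beta=0.3):
    """Example-level contrast loss l_t: for every anchor, take the most
    dissimilar positive and the most similar negative under the two-norm,
    with margin beta."""
    n = len(feats)
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    same = pids[:, None] == pids[None, :]
    total = 0.0
    for a in range(n):
        pos_mask = same[a].copy()
        pos_mask[a] = False                     # exclude the anchor itself
        hardest_pos = dist[a][pos_mask].max()   # most dissimilar positive
        hardest_neg = dist[a][~same[a]].min()   # most similar negative
        total += max(0.0, beta + hardest_pos - hardest_neg)
    return total
```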
In specific implementation, in addition to the conventional example-level contrast learning, in order to improve the clustering effect of the model and reduce the distance between the outlier samples and the clustering point pseudo label clustering centers, an embodiment of the invention adds cluster-level contrast learning and trains it jointly with the example-level contrast learning. Cluster-level contrast learning mainly pulls samples toward their positive clusters and pushes them away from their negative clusters, and compared with example-level contrast learning it greatly reduces the computation of the model. Clustering is facilitated by the use of contrast samples, and this cluster-oriented contrast learning paradigm helps the model minimize the similarity between clusters so as to separate different clusters. The situation of joint contrast learning is shown in fig. 8.
In specific implementation, during training, the single-camera domain basic networks and the multi-camera domain basic network must undergo joint contrast learning. Within joint contrast learning, the purpose of cluster-level contrast learning is to minimize the distance between the sample picture q and its positive cluster and maximize the distance between the sample picture q and its negative clusters, so that after the cluster-level contrast learning the cluster contrast loss l_c can be obtained.
The samples of each batch in cluster-level contrast learning only need to be contrasted with the clustering point pseudo label cluster center features. In specific implementation, when cluster-level contrast learning is performed on a single-camera domain basic network, the sample picture q is a picture captured by the camera corresponding to that single-camera domain basic network; when cluster-level contrast learning is performed on the multi-camera domain basic network, the sample picture q is any picture in the training data set.
As can be seen from the above description, after clustering, the pictures in the training data set that clustered successfully are configured with clustering point pseudo labels. Hence, once the sample picture q is determined, a picture of the same category as q is a positive sample and a picture of a different category is a negative sample; that is, the positive sample Φ_+ of the sample picture q and the query instance feature f(q) of the sample picture q can be determined by technical means commonly used in this field. The cluster contrast losses l_c corresponding to the single-camera domain basic networks and the multi-camera domain basic network can thus be obtained respectively.
In specific implementation, the value range of the parameter Γ is [0,1]; for example, Γ may be taken as 0.5. In addition, the minimum gap β between the similarity of the positive sample pair and the similarity of the negative sample pair is an empirical value and may take, for example, 0.3.
In specific implementation, the purpose of example-level contrast learning is to increase the similarity between samples of the same identity and reduce the similarity between samples of different identities. For a given batch, the given samples are sample pictures selected from the training data set: P different pedestrians are selected, and K pictures are selected for each pedestrian. For each image a, the most dissimilar positive sample p and the most similar negative sample n are picked for example-level contrast learning. Generally, P may be set to 8 and K to 32 in a given sample. Identity i or identity j refers to one of the P different pedestrians.
Example-level contrast learning helps the multi-branch network recognition basic model learn the salient features that distinguish different samples, strengthening its feature representation capability. Combining the two forms of contrast learning greatly improves the clustering accuracy of the model and alleviates the problem of noisy pseudo labels.
In an embodiment of the present invention, the query instance feature f(q) is used for contrast learning against the clustering Center feature repository Center Memory Bank, and the Center Memory Bank is updated as follows: after the cluster contrast loss l_c is calculated, the cluster center feature is updated with the query instance feature f(q) according to

Φ_+ ← (1 − u)·f(q) + u·Φ_+

wherein u is a parameter used to update the features of the clustering Center feature repository Center Memory Bank slowly, avoiding the loss of feature consistency caused by drastic oscillation. The value range of u is generally [0,1], and its specific value can be chosen according to actual needs.
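As a one-line illustration of this update, with `center_bank` holding the Center Memory Bank row-wise and u = 0.9 merely an assumed value within the stated [0,1] range:

```python
import numpy as np

def update_center(center_bank, pos_id, f_q, u=0.9):
    """Slow momentum-style update of the Center Memory Bank after l_c is
    computed: Phi_+ <- (1 - u) * f(q) + u * Phi_+; u close to 1 keeps the
    stored center features from oscillating drastically."""
    center_bank[pos_id] = (1.0 - u) * f_q + u * center_bank[pos_id]
    return center_bank
```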
In one embodiment of the invention, during the model network parameter optimization based on joint contrast learning, the model network parameters θ are determined so as to minimize the loss function of the NH training samples under the determined parameters θ. When optimizing, the multi-camera domain basic network and all the single-camera domain basic networks are optimized simultaneously:

θ* = argmin_θ Σ_{a=1}^{NH} [ β + max d_{a,p} − min d_{a,n} ]_+

wherein f(x_a) is the extracted image feature of the anchor point image x_a, max d_{a,p} = max_p ‖f(x_a) − f(x_p)‖_2 and min d_{a,n} = min_n ‖f(x_a) − f(x_n)‖_2, i.e. the two-norm is applied. In specific implementation, the NH training samples are samples selected from the training data set, and their number may be chosen as needed.
In an embodiment of the present invention, the collaborative training identity loss is:

l_id = −(1/Nz) Σ_{i=1}^{Nz} log p(ỹ_i | x_i)

wherein l_id is the collaborative training identity loss, ỹ_i is the real identity label of x_i, Nz is the number of training samples in the training data set, and p(ỹ_i | x_i) is the probability that the multi-branch network recognition basic model outputs the real identity label ỹ_i for the training sample x_i.

In specific implementation, the number Nz of training samples can be determined from the provided training data set. As can be seen from the above description, during training, clustering is followed by the assignment of clustering point pseudo labels and the reassignment of outlier samples, after which the probability p(ỹ_i | x_i) that the multi-branch network recognition basic model outputs the real identity label ỹ_i for the training sample x_i can be obtained.
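A sketch of this loss from classifier logits, where `logits` has shape (Nz, number of identity classes) and `pseudo_labels` holds the assigned labels ỹ_i (the names are ours):

```python
import numpy as np

def cotraining_identity_loss(logits, pseudo_labels):
    """Collaborative training identity loss l_id: the average negative
    log-probability -(1/Nz) * sum_i log p(y_i | x_i), computed with a
    numerically stable log-softmax."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(pseudo_labels)), pseudo_labels].mean())
```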
In specific implementation, as can be seen from the above description, the cluster contrast loss l_c, the example contrast loss l_t and the collaborative training identity loss l_id are all obtained during training; the total loss of one training round can then be obtained by combining the three, e.g. l = l_c + l_t + l_id.

As can be seen from the above description, when judging whether the multi-branch network recognition basic model has converged, the main indexes are the precision of the model and the loss of the model: the precision is generally the mean average precision mAP, and the loss is the total loss l. After training, the specific computation of the mean average precision mAP can be consistent with the prior art, so convergence of the multi-branch network recognition basic model can be judged effectively. When training of the multi-branch network recognition basic model converges, the target training state is reached, and the multi-branch network recognition basic model is then used to form the multi-branch network recognition model Multiformer.
After the multi-branch network recognition model Multiformer is obtained, when unsupervised pedestrian re-identification is performed, a query picture R needs to be provided in order to search, in the picture set taken by the m cameras, for pedestrians with features similar to those of the query picture R. As can be seen from the above description, for the picture set captured by the m cameras, all features in the picture set are extracted with the multi-branch network recognition model Multiformer; the specific manner and process of extracting the picture features are as described above.

After the query picture R is processed by the same technical means, its corresponding features are extracted with the multi-camera domain Interformer network. Once the features of the query picture R are obtained, the feature similarity with the features extracted from the picture set is calculated; the specific manner and process of calculating the feature similarity may follow conventional practice. From the calculated feature similarities, the pedestrian images matching the extracted pedestrian features can be selected according to actual requirements; for example, a feature similarity threshold can be set, and all pedestrian images meeting the threshold are determined as the pedestrian images matching the query picture R. The feature similarity threshold and the like can be chosen as needed to satisfy the requirements of the actual application scenario.
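A minimal retrieval sketch under these conventions, with cosine similarity as the feature similarity and a purely illustrative threshold of 0.5:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats, sim_threshold=0.5):
    """Compare the Interformer feature of query picture R with the features
    extracted from the m-camera picture set; return (index, similarity)
    pairs above the threshold, best matches first."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q
    order = np.argsort(-sims)                 # best matches first
    return [(int(i), float(sims[i])) for i in order if sims[i] >= sim_threshold]
```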
In order to verify the accuracy and robustness of the invention, experiments are carried out on three public data sets: Market-1501, MSMT17 and DukeMTMC-reID. Specifically, the DukeMTMC-reID data set contains 36411 images of 1812 identities taken by 8 cameras, with a training set of 702 identities containing 16522 images and a test set of 702 identities. The Market-1501 data set contains 1501 pedestrians photographed by 6 cameras, with 751 identities in the training set containing 12936 images and 750 identities in the test set containing 19732 images. The MSMT17 data set contains 4101 pedestrians and 126441 bounding boxes captured by 15 cameras; the training set contains 1041 pedestrians with 32621 bounding boxes, and the test set contains 3060 pedestrians with 93820 bounding boxes.
Because these data sets were collected by multiple camera devices, they contain diverse poses, viewing angles and illumination changes, along with heavily cluttered backgrounds and occlusion between pedestrians in different scenes, so all of them pose great challenges.
Table 1 data set introduction
Data set Number of categories Number of training classes Number of test classes Size of picture
DukeMTMC-reID 1812 702 1110 256*128
Market-1501 1501 751 750 256*128
MSMT17 4101 1041 3060 256*128
Table 1 gives the total number of categories, training categories and test categories of the three data sets; the picture size is uniformly set to 256*128.
TABLE 2 accuracy of the model over three pedestrian re-identification tasks
Data set Market-1501 DukeMTMC-reID MSMT17
mAP 79.1% 68.9% 36.0%
Table 2 shows the test results of the unsupervised pedestrian re-identification method provided by the invention on three unsupervised pedestrian re-identification tasks, Market-1501, DukeMTMC-reID and MSMT17, with the mean average precision mAP used as the evaluation index.
The invention achieves a high recognition rate on all three data sets. Although the three data sets present difficulties such as occlusion, deformation, background clutter and low resolution, the method benefits from the robust feature representation capability of the Multiformer, the ability of the joint contrast learning strategy to optimize the cluster representations, and the efficient data utilization of the adaptive outlier sample redistribution strategy; it is therefore robust to these difficulties and performs excellently.
In order to verify the performance improvement brought to the whole unsupervised pedestrian re-identification task by the multi-branch network identification model Multiformer, the adaptive outlier sample redistribution strategy and the joint contrast learning strategy, an ablation experiment is performed on the Market-1501 data set, as shown in Table 3. Specifically, VIT is taken as the baseline network, denoted Baseline; Multiformer denotes the multi-branch network identification model Multiformer of the invention; JCL denotes the Joint Contrast Learning module; and AORA denotes the adaptive outlier sample redistribution strategy.
As can be seen from Table 3, using the baseline network alone reaches an accuracy of only 59.6% on the Market-1501 unsupervised pedestrian re-identification task. Modifying the network model structure of the baseline network into the multi-branch network recognition model Multiformer raises the accuracy to 69.2%, which shows that the Multiformer improves the feature representation capability of the model.
After the cluster center features are established for joint contrast learning, the accuracy of the model reaches 77.1%, which shows that cluster-level contrast learning effectively enables the model to learn the similarity with positive clusters and the difference from negative clusters. On this basis, after the adaptive outlier sample redistribution strategy is added, the accuracy of the model reaches 79.1%, which shows that this module makes fuller use of the limited data samples, so that the model is trained more adequately.
TABLE 3 influence of different modules on Market-1501 unsupervised pedestrian re-identification task
Method mAP
Baseline 59.6%
Baseline+Multiformer 69.2%
Baseline+Multiformer+JCL 77.1%
Baseline+Multiformer+JCL+AORA 79.1%
In order to better show the effect of the Multiformer, the adaptive outlier sample redistribution strategy and the joint contrast learning strategy designed in the present invention, a visualization result is given in fig. 9.
In summary, the multi-branch network identification model Multiformer is constructed based on a Transformer network; the constructed Multiformer comprises single-camera-domain Intraformer networks and a multi-camera-domain Interformer network, and all the single-camera-domain Intraformer networks share the backbone network parameters. This strengthens the generalization capability, alleviates to a certain extent the inter-domain differences caused by the backgrounds, illumination and the like of different camera domains, improves the robustness of the model to noisy pseudo labels, and further improves the accuracy of unsupervised pedestrian re-identification.
By using adaptive outlier sample redistribution, the number of pseudo labels can be expanded and the feature representation capability of the multi-branch network recognition model Multiformer enhanced. During model training, example-level contrast learning and cluster-level contrast learning are combined, which greatly improves the clustering accuracy and alleviates the problem of noisy pseudo labels.

Claims (10)

1. An unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution is characterized by comprising the following steps:
constructing a multi-branch network identification model Multiformer based on a Transformer network to perform required unsupervised pedestrian re-identification on pedestrian images collected by m cameras by using the constructed multi-branch network identification model Multiformer, wherein,
for the constructed multi-branch network identification model Multiformer, the model comprises a single-camera-domain Intraformer network constructed on the basis of a Transformer network for each camera and a multi-camera-domain Interformer network constructed on the basis of the Transformer network for all cameras;
when the multi-branch network recognition model Multiformer is constructed, the single-camera-domain Intraformer networks of all cameras and the multi-camera-domain Interformer network adopt the same backbone network, and the single-camera-domain Intraformer networks of all cameras share the backbone network parameters during training;
when the pedestrian is re-identified, feature extraction is carried out on an identification image containing the pedestrian to be identified by using a multi-camera domain Interformer network, so that the pedestrian image matched with the extracted pedestrian feature is searched and determined in the pedestrian images collected by the m cameras according to the extracted pedestrian feature.
2. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample redistribution as claimed in claim 1, wherein when constructing the multi-branch network identification model Multiformer, the construction step comprises:
constructing a multi-branch network identification basic model based on a transform network, wherein the multi-branch network identification basic model comprises a multi-camera domain basic network based on the transform network and m single-camera domain basic networks based on the transform network, a classifier is configured in the multi-camera domain basic network and all the single-camera domain basic networks, and the configured classifier is adaptively connected with corresponding backbone networks in the multi-camera domain basic network or the single-camera domain basic network;
when a multi-branch network identification basic model is constructed, pre-training a backbone network for constructing a multi-camera domain basic network on the basis of an ImageNet data set to obtain multi-camera domain backbone network pre-training parameters of the multi-camera domain basic network;
when the constructed single-camera domain basic network is trained, loading the obtained multi-camera domain backbone network pre-training parameters to the backbone networks of all the single-camera domain basic networks so as to enable the single-camera domain basic networks of all the cameras to share the network backbone parameters;
performing the required training on the constructed multi-branch network recognition basic model, so that when the target training state is reached, a corresponding single-camera-domain Intraformer network is formed from each trained single-camera-domain basic network and the multi-camera-domain Interformer network is formed from the trained multi-camera-domain basic network;
and forming a multi-branch network identification model Multiformer by using the multi-camera domain Interformer network and the m single-camera domain Intraformer networks.
3. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-allocation as claimed in claim 2, wherein when training the constructed multi-branch network identification basic model, the training process comprises:
step 1, performing feature extraction on a training data set by utilizing the multi-branch network identification basic model to obtain the multi-camera-domain picture features F_mc and the single-camera-domain picture features F_c_i of the ith camera, i = 1, …, m;
step 2, clustering the obtained multi-camera-domain picture features F_mc and the single-camera-domain picture features F_c_i of the ith camera, wherein successfully clustered pictures form the clustering points Inliers, clustering point pseudo labels are allocated to the pictures in the clustering points Inliers, and unsuccessfully clustered pictures form the Outliers;
step 3, generating clustering point pseudo label clustering centers based on the clustering point pseudo labels, performing adaptive outlier sample redistribution on the Outliers by using the generated clustering point pseudo label clustering centers, allocating corresponding clustering point pseudo labels to the outlier samples in the Outliers after the adaptive outlier sample redistribution, and forming a pseudo label training set from all the clustering point pseudo labels;
step 4, performing joint comparison learning on the multi-branch network identification basic model to perform model network parameter optimization based on the joint comparison learning on the multi-branch network identification basic model, wherein,
for the ith single-camera domain basic network, joint contrast learning is performed based on the training data set, the single-camera-domain picture features F_c_i of the ith camera, and the clustering point pseudo label clustering centers;
for the multi-camera domain basic network, joint contrast learning is performed based on the training data set, the multi-camera-domain picture features F_mc, and the clustering point pseudo label clustering centers;
the joint contrast learning comprises clustering level contrast learning and example level contrast learning;
step 5, carrying out the collaborative training of the single-camera domain basic network and the multi-camera domain basic network on the multi-branch network identification basic model after the optimization based on the joint comparison learning, wherein,
the multi-camera domain basic network is trained with the multi-camera-domain picture features F_mc and the pseudo label training set;
the ith single-camera domain basic network is trained with the single-camera-domain picture features F_c_i of the ith camera and the pseudo label training set;
and 6, repeating the training process from the step 1 to the step 5 until a target training state is reached.
4. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-allocation as claimed in claim 3, wherein, for step 1, when extracting the multi-camera-domain picture features F_mc, Split processing is performed on any training picture in the training data set, a parameter Cls token is connected to each image block obtained by the Split processing, and the position information of each image block and the camera information code of the training picture are embedded to configure and form the training picture multi-camera-domain feature extraction information;
the multi-camera-domain feature extraction information of the training picture is processed by the multi-camera domain basic network to extract the multi-camera-domain picture features F_mc;
when extracting the single-camera-domain picture features F_c_i of the ith camera, Split processing is performed on the training pictures acquired by the ith camera, a parameter Cls token is connected to each image block obtained by the Split processing, and the position information of each image block is embedded to form the training picture single-camera-domain feature extraction information;
the training picture single-camera-domain feature extraction information is processed by the single-camera domain basic network corresponding to the ith camera to extract the single-camera-domain picture features F_c_i.
5. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample redistribution as claimed in claim 3, wherein in step 2, when clustering the obtained multi-camera-domain picture features F_mc and all the single-camera-domain picture features F_c_i, the clustering method comprises the DBSCAN clustering method.
6. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-allocation as claimed in claim 3, wherein in step 3, the clustering point pseudo label clustering centers are:

Φ_i = (1/num_i) · Σ_{j=1}^{num_i} f_j

wherein Y is the category number of the clustering point pseudo labels, Φ_i is the cluster center feature of the ith category, f_j is the feature of the jth picture of the ith category, and num_i is the number of pictures included in the ith category;
the generated clustering point pseudo label clustering centers are stored in the clustering Center feature repository Center Memory Bank;
an affinity matrix between the outlier samples within the Outliers and the clustering point pseudo label clustering centers is computed, wherein the affinity matrix is:

AFM(i, j) = Σ_{r=1}^{N} Φ_{i_r}·O_{j_r} / ( √(Σ_{r=1}^{N} Φ_{i_r}²) · √(Σ_{r=1}^{N} O_{j_r}²) )

wherein AFM(i, j) is the mutual similarity relation value in the AFM between the ith cluster center feature Φ_i and the jth outlier sample, O_j is the feature of the jth outlier sample, Φ_{i_r} denotes the rth element of the ith cluster center feature Φ_i, O_{j_r} denotes the rth element of the jth outlier sample feature O_j, and N denotes the feature dimension;
when adaptive outlier sample redistribution is performed based on the calculated affinity matrix AFM, each outlier sample is allocated to the clustering point pseudo label clustering center with the strongest mutual similarity relation.
7. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample redistribution as claimed in claim 6, wherein a mutual similarity relation threshold ν is configured for the mutual similarity relation between the outlier samples and the clustering point pseudo label clustering centers, the threshold following a schedule over the training rounds (given as a formula image in the original), wherein Num_O is the number of outlier samples within the Outliers, ν_start is the initial value of the mutual similarity relation threshold ν, γ is the threshold decay rate, epoch is the training round, e_peak is the training round at which the threshold ν reaches its peak, and II(·) is an indicator function equal to 1 when the training round is less than e_peak, i.e. II(·) = II{epoch < e_peak};
when the outlier samples are allocated based on the configured mutual similarity relation threshold ν, the jth outlier sample whose mutual similarity relation value AFM(i, j) is greater than the threshold ν is allocated to the clustering point pseudo label clustering center with the strongest mutual similarity relation.
8. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-allocation as claimed in claim 6, wherein in step 4, in the process of joint contrast learning, the cluster contrast loss l_c is obtained after the cluster-level contrast learning and the example contrast loss l_t is obtained after the example-level contrast learning, wherein,
for the cluster contrast loss l_c:

l_c = −log [ exp(f(q)·Φ_+ / Γ) / Σ_{i=1}^{Y} exp(f(q)·Φ_i / Γ) ]

wherein Φ_+ is the positive sample of the sample picture q, Γ is a set parameter, and f(q) is the query instance feature of the sample picture q;
for the example contrast loss l_t:

l_t = Σ_{i=1}^{P} Σ_{a=1}^{K} [ β + max_{p=1…K} ‖f(x_a^i) − f(x_p^i)‖_2 − min_{j≠i, n=1…K} ‖f(x_a^i) − f(x_n^j)‖_2 ]_+

wherein P is the number of different pedestrians selected in a given sample, K is the number of sample pictures selected for each pedestrian in the given sample, a indexes one picture among the K sample pictures, x_a^i is an anchor image with identity i, x_p^i is a positive sample with identity i, x_n^j is a negative sample with identity j, f(x_a^i), f(x_p^i) and f(x_n^j) are the corresponding extracted image features, and β is the minimum gap between the similarity of the positive sample pair and the similarity of the negative sample pair.
9. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample redistribution as claimed in claim 8, wherein, during the model network parameter optimization based on joint contrast learning, the model network parameters θ are determined so as to minimize the loss function of the NH training samples under the determined parameters θ, wherein the multi-camera domain basic network and all the single-camera domain basic networks are optimized simultaneously:

θ* = argmin_θ Σ_{a=1}^{NH} [ β + max d_{a,p} − min d_{a,n} ]_+

wherein f(x_a) is the extracted image feature of the anchor point image x_a, max d_{a,p} = max_p ‖f(x_a) − f(x_p)‖_2, and min d_{a,n} = min_n ‖f(x_a) − f(x_n)‖_2.
10. The unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-allocation as claimed in claim 8, wherein the collaborative training identity loss is:

l_id = −(1/Nz) Σ_{i=1}^{Nz} log p(ỹ_i | x_i)

wherein l_id is the collaborative training identity loss, ỹ_i is the real identity label of x_i, Nz is the number of training samples in the training data set, and p(ỹ_i | x_i) is the probability that the multi-branch network recognition basic model outputs the real identity label ỹ_i for the training sample x_i.