CN116403015A - Unsupervised target re-identification method and system based on perception-aided learning Transformer model - Google Patents

Unsupervised target re-identification method and system based on perception-aided learning Transformer model

Info

Publication number
CN116403015A
CN116403015A (application CN202310248659.8A)
Authority
CN
China
Prior art keywords
learning
perception
target
mask
unsupervised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310248659.8A
Other languages
Chinese (zh)
Other versions
CN116403015B (en)
Inventor
叶茫
陈朔怡
李辰玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202310248659.8A priority Critical patent/CN116403015B/en
Publication of CN116403015A publication Critical patent/CN116403015A/en
Application granted granted Critical
Publication of CN116403015B publication Critical patent/CN116403015B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised target re-identification method and system based on a perception-aided learning Transformer model. For target re-identification in an unsupervised setting, it builds on the Transformer's advantages in global modeling and learning structural information and designs a mutual-learning method that takes both discriminative information and detail perception into account. For discriminative feature learning, the model is optimized with a combined cluster-level and instance-level loss based on pseudo labels generated by clustering. For perception-aided learning, the invention locally masks the image at the block level and constructs an alignment strategy guided by the original visual signal to achieve fine-grained modeling. Furthermore, the invention proposes a target-aware masking strategy to avoid interference from part of the background. Without increasing the test setup or inference time, the invention greatly improves the retrieval accuracy of the unsupervised target re-identification task.

Description

Unsupervised target re-identification method and system based on perception-aided learning Transformer model
Technical Field
The invention belongs to the technical field of computer vision image retrieval, relates to a target re-identification method and system, and particularly relates to an unsupervised target re-identification method and system based on a perception-aided learning Transformer model.
Background
Unsupervised target re-identification (Re-ID) is the task of retrieving specific objects (e.g. pedestrians, vehicles) across non-overlapping cameras without data labels; its query set consists of captured images of multiple pedestrians or vehicles. Much research in this area has focused on supervised learning methods that require labels. In practical application scenarios, annotating large amounts of surveillance image data consumes considerable time and labor, so unsupervised target re-identification can save substantial labor cost and therefore has great application value.
The lack of identity labels to supervise model learning makes unsupervised target re-identification very challenging. Existing unsupervised target re-identification research is mainly based on convolutional neural networks, and the mainstream approach is to train a model with clustering pseudo labels. Among these methods, which build on features extracted by convolutional neural networks, some focus mainly on generating high-quality pseudo labels, while others focus more on the design of clustering algorithms and training strategies. Lin et al. (document 1) propose a bottom-up clustering method that exploits diversity among individuals and similarity within individuals. Dai et al. (document 2) design a cluster-based unsupervised baseline that stores features and computes a cluster-level contrastive loss. RLCC, proposed by Zhang et al. (document 3), focuses on improving cluster quality; they propose a way of generating samples that provide complementary information to aid clustering. In addition, PPLR (document 4) uses the relationship between local features and global features to reduce label noise and improve pseudo-label quality. However, convolutional neural network structures are limited by local receptive fields, and long-range relationships are difficult to establish in the early stages. Recently, Luo et al. (document 5) employed a vision-Transformer-based self-supervised method, pre-training on the large-scale unlabeled pedestrian re-identification dataset LUPerson (document 6). Their study shows that directly applying a pre-trained vision Transformer to existing methods can significantly improve Re-ID performance. Indeed, the self-attention mechanism of the vision Transformer has inherently long-range properties suited to efficient global modeling. Furthermore, the Transformer is more inclined to learn shape and structure information than CNNs, which rely on local texture information. For common challenges faced by Re-ID tasks such as occlusion and interference, vision Transformers have shape-recognition capabilities comparable to the human visual system and greater robustness (document 7). The potential of vision Transformers in the field of unsupervised Re-ID can be exploited further.
On the other hand, although the vision Transformer has demonstrated a powerful capability to extract feature representations, simply applying it to existing methods still suffers from a lack of fine-grained information capture. Because existing unsupervised Re-ID methods are based on global discriminative learning from pseudo labels, they focus mainly on identity-related attributes at the category level; the visual perception of the image's own details is not well exploited. Compared with convolutional neural networks, vision Transformers have greater potential for learning the rich visual information in images. Building on the vision Transformer's block (patch) design, MAE (document 8) constructs self-supervised training by randomly masking blocks and then performing pixel-level reconstruction. Similarly, SimMIM (document 9) learns better feature representations by predicting the original signal of the occluded region, enhancing the model's understanding of visual information. These studies indicate that model learning can also benefit from low-level visual signals. Moreover, introducing visual-information learning strategies (e.g., masking) into convolutional neural networks typically requires very complex designs, because the feature map generated by convolution retains a large number of interfering edges. However, these vision-Transformer-based methods can only learn generalized features and usually require task-specific supervised fine-tuning when applied to different kinds of downstream tasks. For Re-ID tasks, learning identity-discriminative features plays the key role, while local fine-grained information helps to further distinguish difficult samples (documents 10-14).
Therefore, how to combine discriminative information with local detail perception during feature learning under a unified framework, improving fine-grained modeling capability while performing identity discrimination, is a critical problem in the unsupervised target re-identification task.
[Document 1] Yutian Lin, Xuanyi Dong, Liang Zheng, Yan Yan, and Yi Yang. A bottom-up clustering approach to unsupervised person re-identification. In AAAI, volume 33, pages 8738-8745, 2019.
[Document 2] Zuozhuo Dai, Guangyuan Wang, Weihao Yuan, Xiaoli Liu, Siyu Zhu, and Ping Tan. Cluster contrast for unsupervised person re-identification. arXiv preprint arXiv:2103.11568, 2021.
[Document 3] Xiao Zhang, Yixiao Ge, Yu Qiao, and Hongsheng Li. Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In CVPR, pages 3436-3445, 2021.
[Document 4] Yoonki Cho, Woo Jae Kim, Seunghoon Hong, and Sung-Eui Yoon. Part-based pseudo label refinement for unsupervised person re-identification. In CVPR, pages 7308-7318, 2022.
[Document 5] Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, and Rong Jin. Self-supervised pre-training for transformer-based person re-identification. arXiv preprint arXiv:2111.12084, 2021.
[Document 6] Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, and Dong Chen. Unsupervised pre-training for person re-identification. In CVPR, pages 14750-14759, 2021.
[Document 7] Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H. Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Intriguing properties of vision transformers. NeurIPS, 34:23296-23308, 2021.
[Document 8] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In CVPR, pages 16000-16009, 2022.
[Document 9] Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, and Han Hu. SimMIM: A simple framework for masked image modeling. In CVPR, pages 9653-9663, 2022.
[Document 10] Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. Learning discriminative features with multiple granularities for person re-identification. In ACM MM, pages 274-282, 2018.
[Document 11] Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, and Shengjin Wang. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV, pages 480-496, 2018.
[Document 12] Yoonki Cho, Woo Jae Kim, Seunghoon Hong, and Sung-Eui Yoon. Part-based pseudo label refinement for unsupervised person re-identification. In CVPR, pages 7308-7318, 2022.
[Document 13] Kuan Zhu, Haiyun Guo, Tianyi Yan, Yousong Zhu, Jinqiao Wang, and Ming Tang. PASS: Part-aware self-supervised pre-training for person re-identification. In ECCV, pages 198-214. Springer Nature Switzerland, Cham, 2022.
[Document 14] Yifan Sun, Qin Xu, Yali Li, Chi Zhang, Yikang Li, Shengjin Wang, and Jian Sun. Perceive where to focus: Learning visibility-aware part-level features for partial person re-identification. In CVPR, pages 393-402, 2019.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an unsupervised target re-identification method and system based on a perception-aided learning Transformer model, and designs a target-aware local mask alignment method to mine the fine-grained visual perception information of an image, so as to assist and supplement the learning of discriminative features and thereby improve the retrieval accuracy of the unsupervised target re-identification model.
The technical scheme adopted by the method of the invention is as follows: an unsupervised target re-identification method based on a perception-aided learning Transformer model, comprising the following steps:
step 1: constructing a Transformer model based on perception-aided learning;
the perception-aided learning Transformer model comprises a block generation module, a target perception mask module, a Transformer backbone network module and a mask alignment module;
the block generation module comprises four sequentially connected convolution layers; the convolution kernel of the first convolution layer is 7×7, and after convolution half of the channels are processed by a batch normalization layer and the other half by an instance normalization layer, followed by a ReLU activation layer; the convolution kernel of the second convolution layer is 3×3, with half of the channels processed by a batch normalization layer and the other half by an instance normalization layer after convolution, followed by a ReLU activation layer; the convolution kernel of the third convolution layer is 3×3, with all channels processed by a batch normalization layer after convolution, followed by a ReLU activation layer; the convolution kernel size of the fourth convolution layer is 16×16;
the target perception mask module comprises a plurality of randomly initialized masks; each mask is a trainable parameter whose data format is the same as the block length of the block generation module, and the masks replace a designated portion of ordinary blocks before being used as the input of the Transformer backbone network;
the Transformer backbone network comprises a plurality of Transformer layers; each layer consists of multi-head self-attention (MSA) and a two-layer fully connected network (MLP) using the GELU activation function, with LayerNorm and residual connections applied before the MSA and the MLP;
the mask alignment module comprises a dimension conversion function that converts the original image dimensions into the feature dimension for the subsequent pixel-level alignment loss function;
step 2: inputting the image to be identified into the Transformer model based on perception-aided learning to obtain the target re-identification result.
The technical scheme adopted by the system of the invention is as follows: an unsupervised target re-identification system based on a perception-aided learning Transformer model, comprising:
one or more processors;
and a storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the unsupervised target re-identification method based on a perception-aided learning Transformer model as described above.
The invention has the following advantages:
(1) The invention applies the Transformer to unsupervised target re-identification work for the first time. Using the Vision Transformer's long-range attention modeling and stronger feature extraction, we propose a mutual-learning framework that comprehensively considers discriminative features and detail perception.
(2) The invention designs a perception-aided learning strategy based on target-aware mask alignment, which helps the Transformer learn block-level local details. Under the mutual learning of the model, better discriminative features supplemented by local details can be extracted;
(3) Compared with models based on convolutional neural networks, the method provided by the invention effectively improves the retrieval accuracy of the model while the test setup and inference time remain unchanged.
Drawings
FIG. 1 is a structural diagram of the block generation module according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of perception-aided learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Transformer-based framework for perception-aided learning and discriminative feature learning according to an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and practice of the invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the drawings and embodiments; it should be understood that the embodiments described herein are for illustration and explanation only and are not intended to limit the invention.
The invention provides an unsupervised target re-identification method based on a perception-aided learning Transformer model, which comprises the following steps:
step 1: constructing a Transformer model based on perception-aided learning;
the perception-aided learning Transformer model comprises a block generation module, a target perception mask module, a Transformer backbone network module and a mask alignment module;
referring to FIG. 1, the block generation module of this embodiment comprises four sequentially connected convolution layers (an illustrative code sketch is given after this enumeration); the convolution kernel of the first convolution layer is 7×7, and after convolution half of the channels are processed by a batch normalization layer (BN layer) and the other half by an instance normalization layer (IN layer), followed by a ReLU activation layer; the convolution kernel of the second convolution layer is 3×3, with half of the channels processed by a batch normalization layer and the other half by an instance normalization layer after convolution, followed by a ReLU activation layer; the convolution kernel of the third convolution layer is 3×3, with all channels processed by a batch normalization layer after convolution, followed by a ReLU activation layer; the convolution kernel size of the fourth convolution layer is 16×16;
the target perception mask module of this embodiment comprises a plurality of randomly initialized masks; each mask is a trainable parameter whose data format is the same as the block length of the block generation module, namely a 768-dimensional vector, and the masks replace a designated portion of ordinary blocks before being used as the input of the Transformer backbone network;
the Transformer backbone network of this embodiment comprises a plurality of Transformer layers; each layer consists of multi-head self-attention (MSA) and a two-layer fully connected network (MLP) using the GELU activation function, with LayerNorm and residual connections applied before the MSA and the MLP;
the mask alignment module of this embodiment comprises a dimension conversion function that converts the original image dimensions into the feature dimension for the subsequent pixel-level alignment loss function;
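For concreteness, a minimal PyTorch sketch of such a block generation module is given below. This is an illustrative assumption rather than the patented implementation: the class names (HalfBNHalfIN, BlockGeneration), channel widths and strides are not specified in the text and are chosen here only to show the half-BatchNorm/half-InstanceNorm layout of the first two layers and the final 16×16 block-embedding convolution.

```python
import torch
import torch.nn as nn

class HalfBNHalfIN(nn.Module):
    """Normalizes half of the channels with BatchNorm and the other half with InstanceNorm."""
    def __init__(self, channels):
        super().__init__()
        self.half = channels // 2
        self.bn = nn.BatchNorm2d(self.half)
        self.inorm = nn.InstanceNorm2d(channels - self.half, affine=True)

    def forward(self, x):
        a, b = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.bn(a), self.inorm(b)], dim=1)

class BlockGeneration(nn.Module):
    """Illustrative block generation module: 7x7 -> 3x3 -> 3x3 convolutions with ReLU,
    then a 16x16 convolution producing D-dimensional block embeddings."""
    def __init__(self, embed_dim=768, mid=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, mid, kernel_size=7, stride=1, padding=3)
        self.norm1 = HalfBNHalfIN(mid)          # half BN, half IN
        self.conv2 = nn.Conv2d(mid, mid, kernel_size=3, stride=1, padding=1)
        self.norm2 = HalfBNHalfIN(mid)          # half BN, half IN
        self.conv3 = nn.Conv2d(mid, mid, kernel_size=3, stride=1, padding=1)
        self.norm3 = nn.BatchNorm2d(mid)        # all channels BN
        self.act = nn.ReLU(inplace=True)
        # 16x16 convolution turning the feature map into non-overlapping block embeddings
        self.proj = nn.Conv2d(mid, embed_dim, kernel_size=16, stride=16)

    def forward(self, x):                       # x: (B, 3, H, W)
        x = self.act(self.norm1(self.conv1(x)))
        x = self.act(self.norm2(self.conv2(x)))
        x = self.act(self.norm3(self.conv3(x)))
        x = self.proj(x)                        # (B, D, H/16, W/16)
        return x.flatten(2).transpose(1, 2)     # (B, N, D) block embeddings
```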
step 2: inputting the image to be identified into the Transformer model based on perception-aided learning to obtain the target re-identification result.
The Transformer model based on perception-aided learning in this embodiment is a trained model. The method comprehensively considers the importance of fine-grained information for the Re-ID task and the fact that conventional discriminative learning based on pseudo labels generated by clustering algorithms cannot capture detail-perception information. Its core idea is to supplement the discriminative features by improving the fine-grained modeling capability of the model through rich visual signals. Specifically, the perception-aided learning Transformer model proposed by the invention exploits the advantages of the vision Transformer in global modeling and learning structural information to extract more robust discriminative features. The whole consists of two branches: discriminative feature learning and perception-aided learning. For discriminative feature learning, the model is trained with a combined cluster-level and instance-level loss with the help of pseudo labels generated by clustering. For perception-aided learning, the invention locally masks the image at the block level and builds an alignment guided by the original visual signal. The unmasked image blocks can be regarded as representations of local information, and the model must use the existing partially visible information to infer the visual signal of the blank area, which improves the model's ability to understand local detail. Finally, the two branches act together to complete the training process through mutual learning.
Furthermore, it is considered that directly using random masking may be affected by large background areas, causing the model to focus on interfering regions. The invention therefore proposes a target-aware masking approach that prefers the central region of the target, so that the key regions of the target are better aligned during training.
Without increasing the test setup or inference time, the invention greatly improves the retrieval accuracy of the unsupervised target re-identification task.
The deep learning framework adopted in this embodiment is PyTorch. The hardware environment of the experiment comprises 8 NVIDIA GeForce RTX 3090 graphics cards, an Intel Xeon Gold 6240 processor, and 256 GB of DDR4 memory. The specific implementation flow of the unsupervised Transformer-based target re-identification method is as follows:
the first step: constructing a transducer model based on perception aided learning;
Referring to FIG. 3, this experiment adopts a Vision Transformer (ViT) network as the feature extractor, and training is completed through mutual learning between the discriminative feature learning branch and the perception-aided learning branch. In discriminative feature learning, the obtained features are clustered to obtain pseudo labels, and contrastive learning is then used according to the pseudo labels to guide the network update. In the perception-aided learning branch, a supervision signal is constructed by aligning the original pixels of the masked portion, guiding the model to learn fine-grained information.
The second step: training the Transformer model based on perception-aided learning.
The captured pictures are divided into a training set and a test set. The training-set images are fed into the Transformer-based network using the perception-aided learning algorithm, and the network parameters are optimized and updated through forward and backward propagation.
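A highly simplified sketch of one training epoch under this mutual-learning scheme is shown below. Every component is injected as an argument, so the function only fixes the control flow described in the following subsections; the attribute names model.patch_embed and model.backbone, the argument names, and the outlier handling of the clustering are assumptions, not identifiers from the patent.

```python
import torch

def train_one_epoch(model, loader, memory_cls, optimizer, cluster_fn,
                    instance_loss_fn, mask_fn, align_loss_fn, lambda_inst=0.4):
    """One epoch of mutual learning between the discriminative and perception-aided branches."""
    # Discriminative branch bookkeeping: cluster the whole training set into pseudo labels
    with torch.no_grad():
        feats = torch.cat([model(images) for images, _ in loader])
    pseudo_labels = cluster_fn(feats)              # e.g. DBSCAN; outlier handling omitted
    memory = memory_cls(feats, pseudo_labels)      # cluster-mean initialized memory dictionary

    for images, indices in loader:
        labels = pseudo_labels[indices]
        # Discriminative feature learning: cluster-level + instance-level losses
        global_feat = model(images)
        loss = memory.contrastive_loss(global_feat, labels) \
             + lambda_inst * instance_loss_fn(global_feat, labels)
        # Perception-aided learning: target-aware masking followed by pixel alignment
        masked_tokens, mask_idx = mask_fn(model.patch_embed(images))
        mask_feats = model.backbone(masked_tokens)
        loss = loss + align_loss_fn(mask_feats, images, mask_idx)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        memory.momentum_update(global_feat.detach(), labels)   # hardest-sample momentum update
```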
(1) Discriminative feature learning.
A set X = {x_1, x_2, ..., x_n} is defined to represent the input of the discriminative feature learning branch, where n is the number of training images. Each input image is x_i ∈ R^{H×W×C}, where H and W denote the height and width of the image respectively, C denotes the three RGB channels, and R denotes the real numbers. After the block embedding operation, the image is divided into N block embeddings of dimension D. A learnable global classification feature is then introduced as the sequence representation of the global feature, together with a learnable position embedding to preserve the spatial positional relationship. During training, the perception-aided learning Transformer model extracts features from all training images to obtain the initial features f ∈ R^D represented by the global classification feature. The initial features are clustered with the commonly used DBSCAN clustering algorithm (Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226-231, 1996), and pseudo labels are generated from the clustering result. In addition, a memory dictionary is established to store the representative feature of each cluster and the corresponding pseudo label. The representative feature of each cluster is initialized with the cluster mean and subsequently updated with momentum. Instances with the same pseudo label in the memory dictionary are regarded as positive samples, and the rest are negative samples. The cluster-level contrastive loss is:

L_c = -\log \frac{\exp(f \cdot m_{+} / \tau)}{\sum_{j=1}^{k} \exp(f \cdot m_{j} / \tau)}    (1)

where m_j denotes the cluster-level representative feature in the memory dictionary, m_+ denotes the corresponding positive feature in the memory dictionary, k is the number of clusters, and τ is a user-defined temperature parameter. As the network updates in each iteration, the memory dictionary also updates the features to maintain consistency.
The momentum update is as follows:

m_j ← μ m_j + (1 - μ) f_h    (2)

where f_h denotes the hardest sample in the batch (among the sample features with the same pseudo label, the one least similar to the feature in the memory dictionary) and μ denotes the momentum.
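A minimal PyTorch sketch of such a cluster-level memory dictionary is given below. It is an illustrative assumption: the class and method names are invented here, and cosine similarity between L2-normalized features is assumed as the similarity measure in Eqs. (1) and (2).

```python
import torch
import torch.nn.functional as F

class ClusterMemory:
    """Stores one representative feature per cluster, computes the cluster-level
    contrastive loss of Eq. (1), and applies the hardest-sample update of Eq. (2)."""
    def __init__(self, features, labels, temperature=0.05, momentum=0.2):
        self.tau = temperature
        self.mu = momentum
        k = labels.max().item() + 1
        # initialize each representative with the mean feature of its cluster
        self.centers = torch.stack(
            [F.normalize(features[labels == j].mean(0), dim=0) for j in range(k)])

    def contrastive_loss(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        logits = feats @ self.centers.t() / self.tau      # (B, k) similarities to all clusters
        return F.cross_entropy(logits, labels)            # -log softmax of the positive cluster

    @torch.no_grad()
    def momentum_update(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        for j in labels.unique():
            batch = feats[labels == j]
            sims = batch @ self.centers[j]                 # similarity to the stored representative
            f_h = batch[sims.argmin()]                     # hardest (least similar) sample
            self.centers[j] = F.normalize(
                self.mu * self.centers[j] + (1 - self.mu) * f_h, dim=0)
```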
(2) Instance-level loss function design.
The invention takes the instances in the batch that have the same pseudo label as the input sample as positive samples, and the others as negative samples. Pulling the positive samples closer and pushing the negative samples further away gathers the instances within a cluster together, making them easier to distinguish. This procedure is expressed as:

L_i = -\log \frac{\exp(f \cdot f_{+} / \tau)}{\exp(f \cdot f_{+} / \tau) + \sum \exp(f \cdot f_{-} / \tau)}    (3)

where L_i denotes the instance-level loss, f_+ is the feature of a positive sample, and f_- is the feature of a negative sample.
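Under the same assumptions (L2-normalized features, dot-product similarity), the instance-level loss of Eq. (3) can be sketched as a batch-wise contrastive term. The exact handling of multiple positives per anchor is not specified in the text, so averaging over positive pairs is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def instance_level_loss(feats, labels, tau=0.05):
    """Batch-wise instance contrastive loss: samples sharing a pseudo label are positives."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                          # (B, B) pairwise similarities
    B = feats.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=feats.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    denom = torch.exp(sim).masked_fill(eye, 0.0).sum(dim=1, keepdim=True)  # exclude self
    log_prob = sim - torch.log(denom)                      # log p(positive | anchor)
    # average the negative log-probability over each anchor's positive pairs
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()
```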
(3) Target-aware masking.
For a block embedding input similar to that of the initial discriminative feature learning, the set X^m processed by the target perception mask is used as the input of perception-aided learning. The invention chooses to mask part of the block embeddings near the center of the image: the block embeddings within c rings of the image edge are excluded, and the remaining part is masked randomly. These masks are defined as randomly initialized learnable block embeddings used for the subsequent direct learning of local visual perception, where x_i^m denotes one learnable mask block embedding and m denotes the number of mask blocks. This embodiment replaces the block embeddings corresponding to the center portion of the image with the mask features as the final input of the perception-aided learning branch.
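The target-aware masking step can be sketched as follows. The sketch assumes an h×w grid of block embeddings, uses a single shared learnable mask token for brevity (the description defines one learnable embedding per mask position), and the function name target_aware_mask is illustrative.

```python
import torch

def target_aware_mask(tokens, grid_hw, mask_token, border=2, mask_ratio=0.25):
    """tokens: (B, N, D) block embeddings; mask_token: learnable parameter of shape (D,).
    Blocks within `border` rings of the image edge are never masked; a fraction
    `mask_ratio` of the blocks, drawn from the central region, is replaced by the mask token."""
    B, N, D = tokens.shape
    h, w = grid_hw
    rows = torch.arange(N, device=tokens.device) // w
    cols = torch.arange(N, device=tokens.device) % w
    central = ((rows >= border) & (rows < h - border) &
               (cols >= border) & (cols < w - border)).nonzero(as_tuple=False).squeeze(1)
    num_mask = min(int(mask_ratio * N), len(central))
    out = tokens.clone()
    masked_idx = []
    for b in range(B):
        idx = central[torch.randperm(len(central), device=tokens.device)[:num_mask]]
        out[b, idx] = mask_token                 # replace selected central blocks with the mask
        masked_idx.append(idx)
    return out, masked_idx
```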
(4) Visual perception alignment. Referring to FIG. 2, the invention performs a dimensional transformation on the original image input to facilitate the alignment of pixels with block features: the image x_i ∈ R^{H×W×C} is reshaped into block-level pixel vectors y^p that correspond one-to-one with the block embeddings. The alignment formula between the perceived blocks and the blocks of the masked image region is:

L_p = \frac{1}{m} \sum_{i=1}^{m} \left\| f_i^{p} - y_i^{p} \right\|    (4)

where f_i^p denotes the feature learned by the i-th mask block, y_i^p denotes the pixel values of the corresponding i-th block of the image, and m denotes the number of mask blocks.
To enhance visual perception and fine-grained modeling capability so as to aid and supplement discriminative feature learning, the invention establishes a direct correlation between feature-level information and pixel information. The final loss function is expressed as:

L = λ_1 L_c + λ_2 L_i + L_p    (5)

where λ_1 and λ_2 denote the weight of each part.
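A sketch of the pixel-level alignment of Eq. (4) follows. It assumes that each 16×16×3 image block is flattened into a 768-dimensional pixel vector, matching the block feature dimension, so feature and pixels can be compared directly; the choice of an L1 distance and the helper names (blockify, mask_alignment_loss) are assumptions.

```python
import torch
import torch.nn.functional as F

def blockify(images, block=16):
    """Reshape images (B, C, H, W) into per-block pixel vectors (B, N, block*block*C)."""
    B, C, H, W = images.shape
    x = images.unfold(2, block, block).unfold(3, block, block)  # (B, C, H/b, W/b, b, b)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * block * block)

def mask_alignment_loss(mask_feats, images, masked_idx, block=16):
    """Eq. (4): align the feature learned at each masked block with the raw pixels of
    that block (L1 distance assumed). mask_feats: (B, N, D) features from the backbone."""
    targets = blockify(images, block)                            # (B, N, block*block*C)
    loss = 0.0
    for b, idx in enumerate(masked_idx):
        loss = loss + F.l1_loss(mask_feats[b, idx], targets[b, idx])
    return loss / len(masked_idx)

# Combined objective of Eq. (5):
#   loss = lambda_1 * L_cluster + lambda_2 * L_instance + L_align
```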
The third step: testing the Transformer model based on perception-aided learning.
In the test stage, the method only uses the trained vision Transformer model to extract features, and then computes the similarity between the target feature to be queried and all image features in the database to obtain a retrieval result sequence ranked by similarity.
The images of the target objects in the test set are used as the query set, and the remaining captured images are used as the gallery set. The model with the best effect during training is used for inference to obtain the final retrieval result on the test set. The evaluation indices are the Rank-1, mAP and mINP matching accuracies, which reflect the probability of retrieving the correct re-identification image.
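The test stage therefore amounts to plain nearest-neighbour retrieval over feature similarity, as in the sketch below (cosine similarity and the function name retrieve are assumptions; Rank-1, mAP and mINP are then computed with the standard Re-ID evaluation protocol).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(model, query_loader, gallery_loader):
    """Extract features with the trained Transformer and rank gallery images by similarity."""
    model.eval()
    q_feats = F.normalize(torch.cat([model(x) for x, _ in query_loader]), dim=1)
    g_feats = F.normalize(torch.cat([model(x) for x, _ in gallery_loader]), dim=1)
    sim = q_feats @ g_feats.t()                 # cosine similarity, (num_query, num_gallery)
    ranking = sim.argsort(dim=1, descending=True)
    return ranking                              # ranking[i] lists gallery indices for query i
```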
In the experiments, the invention is verified on two common pedestrian re-identification datasets collected with ground surveillance cameras, Market1501 and MSMT17. Market1501 contains 1501 pedestrians captured by 6 cameras (5 high-definition cameras and 1 low-definition camera) and 32668 detected pedestrian rectangular boxes. Each pedestrian is captured by at least 2 cameras and may have multiple images under one camera. The training set contains 751 identities with 12936 images, an average of 17 training images per person; the test set contains 750 identities with 19732 images, an average of 26 test images per person. The pedestrian detection rectangles of the 3368 query images were drawn manually, while the pedestrian detection boxes in the gallery were detected with a DPM detector. The fixed division of training and test sets provided by the dataset can be used under single-shot or multi-shot test settings. MSMT17 employs a network of 15 cameras placed on a campus, including 12 outdoor cameras and 3 indoor cameras, yielding 126441 pedestrian rectangular boxes of 4101 pedestrians.
The invention uniformly resizes the images to 256×128. In addition, data augmentation methods such as padding of 10 pixels, random cropping, and random erasing with a probability of 0.5 are applied to the training data. The block size is set to 16×16, giving a feature dimension of 768. The batch size is set to 256, comprising 32 identities with 8 images each. The number of training epochs is 50, with 400 iterations per epoch. The DBSCAN clustering algorithm is used, with the maximum neighborhood distance set to 0.5 on the Market1501 dataset and 0.7 on MSMT17. The memory dictionary is initialized with the mean of the cluster features, and the momentum μ of the hardest-sample memory update is set to 0.2. For the discriminative feature learning branch, the temperature of the cluster-level contrastive loss is set to 0.05. For the perception-aided learning branch, the blocks located in the center of the image are masked with a mask ratio of 25%. In terms of the loss function, the weight of the cluster-level loss is 1, the weight of the instance-level loss is 0.4 for Market1501 and 0.6 for MSMT17, and the weight of the mask alignment loss is 1. A stochastic gradient descent (SGD) optimizer is used during training. The initial learning rate is 0.00035 and is reduced by a factor of 10 every 20 epochs.
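These hyper-parameters roughly correspond to the optimizer and clustering setup sketched below; values not stated in the text (SGD momentum, weight decay, DBSCAN min_samples and distance metric) are assumptions.

```python
import torch
from sklearn.cluster import DBSCAN

def build_optimizer(model):
    # SGD momentum and weight decay are not stated in the text; typical Re-ID values assumed
    optimizer = torch.optim.SGD(model.parameters(), lr=3.5e-4,
                                momentum=0.9, weight_decay=5e-4)
    # learning rate reduced by a factor of 10 every 20 epochs
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    return optimizer, scheduler

# DBSCAN pseudo-label generation: maximum neighbourhood distance 0.5 on Market1501,
# 0.7 on MSMT17; min_samples and the use of a precomputed distance matrix are assumptions
cluster = DBSCAN(eps=0.5, min_samples=4, metric="precomputed")
```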
In order to verify the effectiveness of the invention, the retrieval results of the invention are compared with existing unsupervised and self-supervised target re-identification methods, which mainly include the following:
(1) BagTricks: Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, and Wei Jiang. Bag of tricks and a strong baseline for deep person re-identification. In CVPRW, 2019.
(2) SCSN: Xuesong Chen, Canmiao Fu, Yong Zhao, Feng Zheng, Jingkuan Song, Rongrong Ji, and Yi Yang. Salience-guided cascaded suppression network for person re-identification. In CVPR, pages 3300-3310, 2020.
(3) AGW: Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven C. H. Hoi. Deep learning for person re-identification: A survey and outlook. IEEE TPAMI, 2021.
(4) TransReID: Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. TransReID: Transformer-based object re-identification. In ICCV, pages 15013-15022, 2021.
(5) SpCL: Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. NeurIPS, 33:11309-11321, 2020.
(6) ICE: Hao Chen, Benoit Lagadec, and Francois Bremond. ICE: Inter-instance contrastive encoding for unsupervised person re-identification. In ICCV, pages 14960-14969, 2021.
(7) Cluster-Contrast: Zuozhuo Dai, Guangyuan Wang, Weihao Yuan, Xiaoli Liu, Siyu Zhu, and Ping Tan. Cluster contrast for unsupervised person re-identification. arXiv preprint arXiv:2103.11568, 2021.
(8) IIDS: Shiyu Xuan and Shiliang Zhang. Intra-inter domain similarity for unsupervised person re-identification. IEEE TPAMI, 2022.
(9) ISE: Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, and Jingdong Wang. Implicit sample extension for unsupervised person re-identification. In CVPR, pages 7369-7378, 2022.
(10) PPLR: Yoonki Cho, Woo Jae Kim, Seunghoon Hong, and Sung-Eui Yoon. Part-based pseudo label refinement for unsupervised person re-identification. In CVPR, pages 7308-7318, 2022.
(11) PASS: Kuan Zhu, Haiyun Guo, Tianyi Yan, Yousong Zhu, Jinqiao Wang, and Ming Tang. PASS: Part-aware self-supervised pretraining for person re-identification. In ECCV, pages 198-214. Springer Nature Switzerland, Cham, 2022.
(12) TransReID-SSL: Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, and Rong Jin. Self-supervised pre-training for transformer-based person re-identification. arXiv preprint arXiv:2111.12084, 2021.
Tests were performed on the Market1501 and MSMT17 datasets, and the results are shown in Table 1:
TABLE 1
[Table 1: comparison with existing unsupervised and self-supervised methods on Market1501 and MSMT17 (Rank-1 / mAP); the table appears as an image in the original document.]
As can be seen from Table 1: evaluated on the Market1501 dataset, the proposed method reaches the highest Rank-1 accuracy of 96.0% and an mAP of 91.0%. In addition, the evaluation results of the method on the larger and more complex MSMT17 dataset are clearly superior to those of CNN-based methods and also exceed the two Transformer-based methods that use self-supervised pre-training and fine-tuning with ClusterContrast; the Rank-1 accuracy and mAP reach 78.6% and 56.2% respectively, improvements of 3.6% and 5.6% over previous unsupervised methods. Compared with supervised methods, although the proposed approach is purely unsupervised, it is still not inferior in performance. On the Market1501 dataset, the proposed method achieves 96.0% Rank-1 accuracy and 91.0% mAP, outperforming most state-of-the-art supervised methods of the past two years. The experimental results on both datasets demonstrate the effectiveness and superiority of the invention.
The method provided by the invention has been tested on multiple unsupervised target re-identification datasets, and the obtained results are superior to the current most advanced unsupervised target re-identification methods and are even competitive with supervised target re-identification methods.
It should be understood that the foregoing description of the preferred embodiments is for illustration only and does not limit the scope of protection of the invention; those skilled in the art may make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims.

Claims (6)

1. An unsupervised target re-identification method based on a perception-aided learning Transformer model, characterized by comprising the following steps:
step 1: constructing a Transformer model based on perception-aided learning;
the perception-aided learning Transformer model comprises a block generation module, a target perception mask module, a Transformer backbone network module and a mask alignment module;
the block generation module comprises four sequentially connected convolution layers; the convolution kernel of the first convolution layer is 7×7, and after convolution half of the channels are processed by a batch normalization layer and the other half by an instance normalization layer, followed by a ReLU activation layer; the convolution kernel of the second convolution layer is 3×3, with half of the channels processed by a batch normalization layer and the other half by an instance normalization layer after convolution, followed by a ReLU activation layer; the convolution kernel of the third convolution layer is 3×3, with all channels processed by a batch normalization layer after convolution, followed by a ReLU activation layer; the convolution kernel size of the fourth convolution layer is 16×16;
the target perception mask module comprises a plurality of randomly initialized masks; each mask is a trainable parameter whose data format is the same as the block length of the block generation module, and the masks replace a designated portion of ordinary blocks before being used as the input of the Transformer backbone network;
the Transformer backbone network comprises a plurality of Transformer layers; each layer consists of multi-head self-attention (MSA) and a two-layer fully connected network (MLP) using the GELU activation function, with LayerNorm and residual connections applied before the MSA and the MLP;
the mask alignment module comprises a dimension conversion function that converts the original image dimensions into the feature dimension for the subsequent pixel-level alignment loss function;
step 2: inputting the image to be identified into the Transformer model based on perception-aided learning to obtain the target re-identification result.
2. The unsupervised target re-identification method based on a perception-aided learning Transformer model according to claim 1, characterized in that:
the Transformer model based on perception-aided learning is a trained model; during training, the discriminative feature learning branch and the perception-aided learning branch learn from each other to complete training jointly; in discriminative feature learning, the obtained features are clustered to obtain pseudo labels, and contrastive learning is then used according to the pseudo labels to guide the update of the model parameters; in the perception-aided learning branch, a supervision signal is constructed by aligning the original pixels of the masked portion, guiding the model to learn fine-grained information;
in discriminative feature learning, a set X = {x_1, x_2, ..., x_n} is defined to represent the input of the discriminative feature learning branch, where n is the number of training images; each input image is x_i ∈ R^{H×W×C}, where H and W denote the height and width of the image respectively, C denotes the three RGB channels, and R denotes the real numbers; after the block embedding operation, the image is divided into N block embeddings of dimension D; a learnable global classification feature is then introduced as the sequence representation of the global feature, together with a learnable position embedding to preserve the spatial positional relationship; during training, the perception-aided learning Transformer model extracts features from all training images to obtain the initial features f ∈ R^D represented by the global classification feature; a clustering algorithm is applied to the initial features, and pseudo labels are generated from the clustering result; the representative feature of each cluster and the corresponding pseudo label are stored in a memory dictionary; the representative feature of each cluster is initialized with the cluster mean and subsequently updated with momentum; instances with the same pseudo label in the memory dictionary are regarded as positive samples, and the rest are negative samples; the cluster-level contrastive loss is:

L_c = -\log \frac{\exp(f \cdot m_{+} / \tau)}{\sum_{j=1}^{k} \exp(f \cdot m_{j} / \tau)}    (1)

where m_j denotes the cluster-level representative feature in the memory dictionary, m_+ denotes the corresponding positive feature in the memory dictionary, k is the number of clusters, and τ is a user-defined temperature parameter; as the network updates in each iteration, the memory dictionary also updates the features to maintain consistency;
the momentum update is as follows:

m_j ← μ m_j + (1 - μ) f_h    (2)

where f_h denotes the hardest sample in the batch and μ denotes the momentum.
3. The unsupervised target re-identification method based on a perception-aided learning Transformer model according to claim 2, characterized in that: during clustering, positive samples are pulled closer and negative samples are pushed apart, so that the instances within a cluster are gathered together;

L_i = -\log \frac{\exp(f \cdot f_{+} / \tau)}{\exp(f \cdot f_{+} / \tau) + \sum \exp(f \cdot f_{-} / \tau)}    (3)

where L_i denotes the instance-level loss, f_+ is the feature of a positive sample, and f_- is the feature of a negative sample.
4. The unsupervised target re-identification method based on a perception-aided learning Transformer model according to claim 2, characterized in that: the set X^m processed by the target perception mask is used as the input of perception-aided learning; part of the block embeddings near the center of the image is selected for masking; specifically, the block embeddings within c rings of the image edge are excluded, and the remaining part is masked randomly; these masks are defined as randomly initialized learnable block embeddings used for the subsequent direct learning of local visual perception, where x_i^m denotes one learnable mask block embedding and m denotes the number of mask blocks.
5. The unsupervised target re-identification method based on a perception-aided learning Transformer model according to claim 2, characterized in that: the original image input is dimensionally transformed to facilitate the alignment of pixels with block features, the image being reshaped into block-level pixel vectors y^p that correspond one-to-one with the block embeddings; the alignment formula between the perceived blocks and the blocks of the masked image region is:

L_p = \frac{1}{m} \sum_{i=1}^{m} \left\| f_i^{p} - y_i^{p} \right\|    (4)

where f_i^p denotes the feature learned by the i-th mask block, y_i^p denotes the pixel values of the corresponding i-th block of the image, and m denotes the number of mask blocks;
the final loss function is expressed as:

L = λ_1 L_c + λ_2 L_i + L_p    (5)

where λ_1 and λ_2 denote the weight of each part.
6. An unsupervised target re-identification system based on a perception-aided learning Transformer model, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the unsupervised target re-identification method based on a perception-aided learning Transformer model as claimed in any one of claims 1 to 5.
CN202310248659.8A 2023-03-13 2023-03-13 Unsupervised target re-identification method and system based on perception-aided learning Transformer model Active CN116403015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310248659.8A CN116403015B (en) 2023-03-13 2023-03-13 Unsupervised target re-identification method and system based on perception-aided learning Transformer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310248659.8A CN116403015B (en) 2023-03-13 2023-03-13 Unsupervised target re-identification method and system based on perception-aided learning Transformer model

Publications (2)

Publication Number Publication Date
CN116403015A true CN116403015A (en) 2023-07-07
CN116403015B CN116403015B (en) 2024-05-03

Family

ID=87018938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310248659.8A Active CN116403015B (en) 2023-03-13 2023-03-13 Unsupervised target re-identification method and system based on perception-aided learning Transformer model

Country Status (1)

Country Link
CN (1) CN116403015B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160997A1 (en) * 2018-11-02 2020-05-21 University Of Central Florida Research Foundation, Inc. Method for detection and diagnosis of lung and pancreatic cancers from imaging scans
CN112069920A (en) * 2020-08-18 2020-12-11 武汉大学 Cross-domain pedestrian re-identification method based on attribute feature driven clustering
CN113487027A (en) * 2021-07-08 2021-10-08 中国人民大学 Sequence distance measurement method based on time sequence alignment prediction, storage medium and chip
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114596589A (en) * 2022-03-14 2022-06-07 大连理工大学 Domain-adaptive pedestrian re-identification method based on interactive cascade lightweight transformations
CN114677646A (en) * 2022-04-06 2022-06-28 上海电力大学 Vision transform-based cross-domain pedestrian re-identification method
CN115050045A (en) * 2022-04-06 2022-09-13 上海电力大学 Vision MLP-based pedestrian re-identification method
CN115359254A (en) * 2022-07-25 2022-11-18 华南理工大学 Vision transform network-based weak supervision instance segmentation method, system and medium
KR20230003827A (en) * 2021-06-30 2023-01-06 주식회사 사로리스 Image processing apparatus for improving license plate recognition rate and image processing method using the same
CN115601791A (en) * 2022-11-10 2023-01-13 江南大学(Cn) Unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution
US20230046066A1 (en) * 2021-05-25 2023-02-16 Samsung Electronics Co., Ltd. Method and apparatus for video recognition
KR20230026216A (en) * 2021-08-17 2023-02-24 한국과학기술원 Method and Apparatus for Denoising using Cycle-Consistent Learning and Attention Module to Achieve Robustness Against Adversarial Attacks
US20230062151A1 (en) * 2021-08-10 2023-03-02 Kwai Inc. Transferable vision transformer for unsupervised domain adaptation

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160997A1 (en) * 2018-11-02 2020-05-21 University Of Central Florida Research Foundation, Inc. Method for detection and diagnosis of lung and pancreatic cancers from imaging scans
CN112069920A (en) * 2020-08-18 2020-12-11 武汉大学 Cross-domain pedestrian re-identification method based on attribute feature driven clustering
US20230046066A1 (en) * 2021-05-25 2023-02-16 Samsung Electronics Co., Ltd. Method and apparatus for video recognition
KR20230003827A (en) * 2021-06-30 2023-01-06 주식회사 사로리스 Image processing apparatus for improving license plate recognition rate and image processing method using the same
CN113487027A (en) * 2021-07-08 2021-10-08 中国人民大学 Sequence distance measurement method based on time sequence alignment prediction, storage medium and chip
US20230062151A1 (en) * 2021-08-10 2023-03-02 Kwai Inc. Transferable vision transformer for unsupervised domain adaptation
KR20230026216A (en) * 2021-08-17 2023-02-24 한국과학기술원 Method and Apparatus for Denoising using Cycle-Consistent Learning and Attention Module to Achieve Robustness Against Adversarial Attacks
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114596589A (en) * 2022-03-14 2022-06-07 大连理工大学 Domain-adaptive pedestrian re-identification method based on interactive cascade lightweight transformations
CN115050045A (en) * 2022-04-06 2022-09-13 上海电力大学 Vision MLP-based pedestrian re-identification method
CN114677646A (en) * 2022-04-06 2022-06-28 上海电力大学 Vision transform-based cross-domain pedestrian re-identification method
CN115359254A (en) * 2022-07-25 2022-11-18 华南理工大学 Vision transform network-based weak supervision instance segmentation method, system and medium
CN115601791A (en) * 2022-11-10 2023-01-13 江南大学(Cn) Unsupervised pedestrian re-identification method based on Multiformer and outlier sample re-distribution

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ARYAN GUPTA: "Ensemble Learning using Vision Transformer and Convolutional Networks for Person Re-ID", 2022 6TH INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC), 13 April 2022 (2022-04-13) *
Zhang Liang; Che Jin; Yang Qi: "Research on person re-identification with multi-granularity feature fusion", Chinese Journal of Liquid Crystals and Displays, no. 06, 15 June 2020 (2020-06-15) *
Yang Yuting; Feng Lin; Dai Leichao; Su Han: "Aspect-level sentiment classification model with context-oriented attention joint learning network", Pattern Recognition and Artificial Intelligence, no. 08, 15 August 2020 (2020-08-15) *
Yan Zhixing; Wang Hairui; Yang Hongwei; Jing Wanting: "Research on rolling bearing fault diagnosis based on deep learning feature extraction and GWO-SVM", Journal of Yunnan University (Natural Sciences Edition), no. 04, 10 July 2020 (2020-07-10) *
Zheng Ye; Zhao Jieyu; Wang Chong; Zhang Yi: "Partial person re-identification based on pose-guided alignment network", Computer Engineering, no. 05, 15 May 2020 (2020-05-15) *

Also Published As

Publication number Publication date
CN116403015B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
Farfade et al. Multi-view face detection using deep convolutional neural networks
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
Anil et al. Literature survey on face and face expression recognition
CN112818931A (en) Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
Lee et al. Collaborative expression representation using peak expression and intra class variation face images for practical subject-independent emotion recognition in videos
Ranjan et al. Unconstrained age estimation with deep convolutional neural networks
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
Shuai et al. Integrating parametric and non-parametric models for scene labeling
CN113822246B (en) Vehicle weight identification method based on global reference attention mechanism
Zhang et al. IL-GAN: Illumination-invariant representation learning for single sample face recognition
Tang et al. Piap-df: Pixel-interested and anti person-specific facial action unit detection net with discrete feedback learning
Xia et al. Face occlusion detection using deep convolutional neural networks
Yan et al. Part-based representation enhancement for occluded person re-identification
CN116030495A (en) Low-resolution pedestrian re-identification algorithm based on multiplying power learning
Zheng et al. Vlad encoded deep convolutional features for unconstrained face verification
Lai et al. Deep siamese network for low-resolution face recognition
Wang et al. Exploring fine-grained sparsity in convolutional neural networks for efficient inference
CN116994319A (en) Model training method, face recognition equipment and medium
CN116403015B (en) Unsupervised target re-identification method and system based on perception-aided learning transducer model
Zhu et al. Correspondence-free dictionary learning for cross-view action recognition
Sharma et al. Face recognition using face alignment and PCA techniques: a literature survey
Zhang et al. Lightweight PM-YOLO network model for moving object recognition on the distribution network side
CN113869154B (en) Video actor segmentation method according to language description
Huang et al. Weighted graph embedded low-rank projection learning for feature extraction
Zhang et al. Discriminative feature representation for person re-identification by batch-contrastive loss

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant