WO2022160773A1 - Pedestrian Re-identification Method Based on Virtual Samples
Pedestrian re-identification method based on virtual samples
- Publication number
- WO2022160773A1 (PCT/CN2021/122343)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- virtual
- pedestrian
- data set
- real
- samples
- Prior art date: 2021-01-28
Classifications
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/20221—Image fusion; Image merging
- Y02T10/40—Engine management systems
Definitions
- The invention belongs to the technical field of pedestrian re-identification, and in particular relates to a pedestrian re-identification method based on virtual samples.
- Person re-identification (Re-ID) aims to match images of the same person across multiple non-overlapping cameras deployed at different locations.
- Person re-identification technology has developed rapidly and has rich application scenarios, such as finding people of interest (e.g., lost children or criminal suspects) and tracking specific people, which has made it an active research topic. Benefiting from deep convolutional neural networks, many proposed person re-identification methods achieve very high performance.
- However, these methods rely on images from large numbers of pedestrian surveillance videos for training, which exposes personal privacy information and may lead to further security problems. Due to growing concern about privacy, some real pedestrian datasets have been withdrawn, and in some cases their images may not be displayed in any form of publication.
- Unsupervised domain adaptation methods can still learn relevant features of the target-domain dataset with the help of a source-domain dataset, without relying on target-domain pedestrian labels, which to a certain extent avoids direct exposure of identity-specific information about target-domain pedestrians.
- State-of-the-art unsupervised domain adaptation methods generally fall into two categories: clustering-based methods and generation-based methods.
- The purpose of the present invention is to provide a pedestrian re-identification method based on virtual samples in view of the deficiencies of the prior art, to realize pedestrian re-identification under privacy protection through virtual samples, and to address the two challenges that the privacy-protected re-identification task faces: the missing pedestrian appearance in target images, and the large domain gap between virtual and real images.
- the present invention adopts the following technical solutions:
- a pedestrian re-identification method based on virtual samples comprising the following steps:
- Step S1: obtaining the virtual characters generated by the game engine and preprocessing them, and generating a batch of virtual samples with character labels by fusing the backgrounds of the target dataset and the poses of real people through a multi-factor variational generation network;
- Step S2: rendering the generated virtual samples according to the lighting conditions of the target dataset;
- Step S3: sampling the rendered virtual samples according to the character attributes of the target dataset;
- Step S4: constructing a training dataset from the sampled virtual samples to train the pedestrian re-identification model, and verifying the recognition effect of the trained model.
- Step S1 includes:
- Step S11: extract k characters from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively, and directly composite the two to obtain n virtual images fusing virtual characters with real backgrounds as training samples {x_1, ..., x_n};
- Step S12: extract the character pose of each training sample, use it together with the training sample and the corresponding background as the input of the constructed variational-autoencoder-based deep neural network, i.e., the multi-factor variational generation network, and construct the objective function for training so that the network learns the transformation laws of the synthesized images with respect to character, background and pose;
- Step S13: adjust the resolution of the virtual characters according to the character resolution of the target dataset;
- Step S14: use the adjusted virtual characters, the real backgrounds and the poses extracted from the target dataset as the input of the network, and generate a batch of virtual samples with character labels through the network.
- x represents the input training samples;
- z_(x,b) represents the joint latent variable;
- D_θ represents the decoder network acting as the generator;
- Φ_i represents the features extracted from the i-th layer of the perceptual network;
- q_φ represents the posterior distribution parameterization;
- p_θ represents the prior distribution parameterization;
- KL represents the Kullback-Leibler divergence;
- λ_i are pre-set hyperparameters used to control the contribution of different network layers to the total loss.
- The pixel proportion of the characters within the image is calculated for the virtual dataset and the real pedestrian dataset respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
- In step S2, each image is converted into HSV format; the V channel is extracted and its average value is calculated as the brightness value of the image (ranging from 0 to 255), so as to obtain the lighting conditions of the target dataset.
- In step S3, two attributes, the color of the upper-body clothes and the color of the lower-body clothes, are selected as the basic attributes for sampling, and attribute distribution statistics of the dataset are computed over them.
- The identification and verification process includes: using the trained model to match each query picture against the gallery to determine pictures of the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
- This addresses problems such as image brightness differences caused by different lighting conditions and inconsistent attribute distributions caused by different clothing, which may result from seasonal changes.
- The present invention uses a virtual image generation framework integrating translation, rendering and sampling to bring the distribution of virtual images as close as possible to that of real images and to generate a batch of new virtual samples. The pedestrian re-identification model trained on these virtual samples can be effectively applied to pedestrian datasets in real scenes.
- An effective pedestrian re-identification model is thus learned without obtaining the appearance of the real pedestrians in the target-domain dataset, completing the task of pedestrian re-identification under privacy protection. The beneficial effects specifically include the following aspects:
- The present invention defines three types of privacy-independent information: content information, namely background and pose; imaging information, such as resolution and lighting conditions; and description information, namely human attributes such as clothing color.
- The present invention adopts a virtual image generation framework integrating image translation, rendering and sampling to process the virtual data generated by the game engine into virtual samples, effectively realizing the approximation of the domain distribution from virtual samples to real images.
- The present invention has high adaptability and strong image-translation flexibility: it proposes a deep neural network based on variational autoencoders, the multi-factor variational generation network, which encodes and fuses privacy-independent factors and can effectively generate virtual samples that integrate virtual characters with real-world information.
- FIG. 1 is a flowchart of a method for pedestrian re-identification based on virtual samples in an embodiment of the present invention.
- FIG. 2 is a schematic structural diagram of a deep neural network of a multi-factor variational generation network in an embodiment of the present invention.
- This embodiment discloses a pedestrian re-identification method based on virtual samples, which aims to provide a pedestrian re-identification scheme under privacy protection. Since the appearance of real pedestrians cannot be obtained, the scheme uses virtual images generated by a game engine as the source dataset for extracting character features. However, if the virtual source dataset X_s is simply used to train the pedestrian re-identification model and the model is applied directly to the real pedestrian target dataset X_t, then, due to the huge domain gap between the virtual source dataset and the real pedestrian dataset, the model cannot learn effective discriminative feature representations of real pedestrians in the target dataset, and its performance falls far short of actual needs.
- To this end, this scheme introduces three types of privacy-independent information: content information (background, pose, etc.), imaging information (foreground resolution, lighting conditions, etc.) and description information (person attributes such as clothes color).
- Content information carries real-world context and the limb state of real pedestrians;
- imaging information forces the image style to approach the target domain;
- description information gives the overall attribute distribution of the dataset statistical and semantic consistency.
- The virtual sample-based pedestrian re-identification method specifically includes the following steps:
- Step S1: virtual data generated by the game engine is acquired and preprocessed to obtain a batch of virtual samples with character labels. Specifically, step S1 includes the following steps:
- Step S11: extract k characters from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively, and directly composite the two to obtain n virtual images fusing virtual characters with real backgrounds as training samples {x_1, ..., x_n}.
- Step S12: extract the character pose of each training sample and use it, together with the training sample and the corresponding background, as the input of the constructed variational-autoencoder-based deep neural network, i.e., the multi-factor variational generation network; construct the objective function for training so that the network learns the transformation laws of the synthesized images with respect to person, background and pose.
- In step S12, the objective function is defined in terms of the following quantities:
- x represents the input training samples;
- z_(x,b) represents the joint latent variable;
- D_θ represents the decoder network acting as the generator;
- Φ_i represents the features extracted from the i-th layer of the perceptual network;
- q_φ represents the posterior distribution parameterization;
- p_θ represents the prior distribution parameterization;
- KL represents the Kullback-Leibler divergence;
- λ_i are pre-set hyperparameters used to control the contribution of different network layers to the total loss.
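The formula itself is not reproduced in this text. Based on the symbols defined above, a standard objective of this kind would take the following form; this is a hedged reconstruction consistent with the listed quantities, not the verbatim patented formula:

$$
\mathcal{L}(\theta, \varphi) = \sum_{i} \lambda_i \left\lVert \Phi_i(x) - \Phi_i\big(D_\theta(z_{(x,b)})\big) \right\rVert_2^2 + \mathrm{KL}\big(q_\varphi(z_{(x,b)} \mid x, b) \,\Vert\, p_\theta(z_{(x,b)})\big)
$$

that is, a layer-weighted perceptual reconstruction term plus a Kullback-Leibler term that pulls the approximate posterior toward the prior.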
- Step S13: adjust the resolution of the virtual characters according to the character resolution of the target dataset.
- In step S13, the pixel proportions of the characters in the virtual dataset and the real pedestrian dataset are calculated respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset to a resolution similar to that of the target dataset.
- Step S14: use the adjusted virtual characters, the real backgrounds and the poses extracted from the target dataset as the input of the network, and generate a batch of virtual samples with character labels through the network.
- Step S2: render the generated virtual samples according to the lighting conditions of the target dataset.
- Specifically, each image is converted into HSV format; the V channel is extracted and its average value is computed as the brightness value of the image, so as to obtain the illumination conditions of the target dataset.
- The brightness value of an image ranges from 0 to 255.
- Step S3: sample the rendered virtual samples according to the character attributes of the target dataset.
- In step S3, two attributes, the color of the upper-body clothes and the color of the lower-body clothes, are selected as the basic attributes for sampling, and attribute distribution statistics of the dataset are computed over them.
- Step S4: construct a training dataset from the sampled virtual samples to train the pedestrian re-identification model, and verify the recognition effect of the trained model.
- The specific identification and verification process includes: using the trained model to match each query picture against the gallery to find pictures of the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
- In the privacy-protected setting of this embodiment, the pedestrian appearance in the target-domain real pedestrian dataset cannot be obtained, so virtual characters generated by the game engine replace real people as the foreground information from which pedestrian features are extracted. Based on this strategy, a batch of new virtual samples is generated by fusing virtual characters with real backgrounds and used as the training set of the pedestrian re-identification model.
- The model trained by the method of this embodiment effectively protects pedestrian privacy from infringement while making maximal use of non-private information in the target domain to shorten the distance to the target-domain distribution: the content information (background, pose, etc.) of the target-domain real pedestrian dataset realizes the basic transformation of the virtual characters; the imaging information (foreground resolution, lighting conditions, etc.) is then extracted from the target-domain dataset and applied to the virtual samples; finally, image sampling is performed according to the description information, i.e., person attributes such as clothes color.
- During model training, only access to the generated virtual samples is provided, and the testing and evaluation of the model's recognition effect on the real pedestrian dataset is completed under black-box conditions, thus achieving pedestrian re-identification under privacy protection.
- Step S1: Since the virtual samples lack real-world information, privacy-independent content is introduced from the real-world dataset to generate more realistic images. The virtual dataset X_s and the real pedestrian dataset X_t therefore need to be prepared in advance. Pedestrian images usually contain two parts: the background and the pedestrian as the foreground. In the traditional person re-identification task, many methods reduce the influence of the background through attention mechanisms, segmentation, or local-feature-extraction-based approaches, so that the model pays more attention to the pedestrian itself.
- This scheme instead proposes fusing the virtual characters of the virtual dataset with the real backgrounds of the target-domain dataset.
- A self-correcting human parsing network is used to extract the person mask in each image, and the area covered by the mask is then erased from the pedestrian image, thereby avoiding leakage of appearance information involving pedestrian privacy.
- The background image with the pedestrian removed is inpainted using a recurrent feature inference network to obtain a complete background image.
- Since the edge of the person mask obtained by the self-correcting human parsing network can be incomplete, dilation and erosion techniques are used to fill in missing pixels and further improve the integrity of the person mask.
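As an illustration of this erasing step, the following is a minimal sketch using OpenCV morphology; the self-correcting human parsing network and the inpainting network are assumed to be available separately, and `person_mask` is a hypothetical binary mask produced by the parser:

```python
import cv2
import numpy as np

def refine_person_mask(person_mask: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Fill holes and ragged edges in a binary person mask via morphological
    closing (dilation followed by erosion), as described above."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    # Dilation grows the mask over missing boundary pixels; the subsequent
    # erosion shrinks it back so the overall silhouette is preserved.
    return cv2.morphologyEx(person_mask, cv2.MORPH_CLOSE, kernel)

def erase_person(image: np.ndarray, person_mask: np.ndarray) -> np.ndarray:
    """Zero out the masked person region so pedestrian appearance is never
    exposed; the hole is later filled by the inpainting network."""
    erased = image.copy()
    erased[person_mask > 0] = 0
    return erased
```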
- The erasing of real pedestrian images should be done by the image provider to avoid privacy leakage.
- For the virtual images, which have solid-color backgrounds, this embodiment adopts a matting script to extract the character, separating character and background more quickly and conveniently.
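Such a matting step can be approximated by simple color thresholding, since the virtual renders have a solid-color background; the background color and tolerance below are illustrative assumptions:

```python
import cv2
import numpy as np

def extract_virtual_character(render: np.ndarray,
                              bg_color=(0, 255, 0),  # assumed solid background (BGR)
                              tol: int = 40):
    """Separate a virtual character from a solid-color background.
    Returns the foreground mask and an RGBA crop with transparent background."""
    diff = np.abs(render.astype(np.int32) - np.array(bg_color)).sum(axis=2)
    mask = (diff > tol).astype(np.uint8) * 255  # 255 where the character is
    rgba = cv2.cvtColor(render, cv2.COLOR_BGR2BGRA)
    rgba[:, :, 3] = mask
    return mask, rgba
```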
- k characters are extracted from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively; the two are directly composited to obtain n virtual images fusing virtual characters with real backgrounds as training samples {x_1, ..., x_n}.
- The character pose of each training sample is extracted, and together with the training sample and the corresponding background it is used as the input of the variational-autoencoder-based deep neural network, i.e., the multi-factor variational generation network, for which the objective function is constructed for training.
- The network thereby learns the transformation laws of the synthesized images with respect to person, background and pose.
- The multi-factor variational generation network feeds a variety of privacy-independent factors (such as background and pose) into encoder networks to obtain the corresponding codes, models a joint latent variable over these codes through autoregressive group modeling, and then generates virtual samples with the target image content through the decoder network.
- The specific modeling process is as follows:
- An effective method is to use a variational autoencoder to model p(x|z), where z represents the latent variable and p(z) represents the standard normal prior in the variational autoencoder framework.
- The latent variable z is extended to the joint latent variable z_(c,b). Since the foreground content information of character c is contained in the fused image x, x is used to encode c, and the goal shifts to learning p(x|z_(x,b)).
- KL here represents the Kullback-Leibler divergence that regularizes the approximate posterior toward the prior.
- To this end, this scheme proposes a novel multi-factor variational generation network.
- The multi-factor variational generation network inputs the person, background and pose into separate encoder networks to obtain their low-dimensional feature codes.
- The multi-factor variational generation network concatenates the target-domain-related codes into a joint code before fusing it with the person code.
- The multi-factor variational generation network adopts autoregressive group modeling to construct the joint latent variable representation z_(x,b).
- The parameters needed by the generative model can be learned by training the multi-factor variational generation network described above, as sketched below.
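The architecture is only described in prose here. The following PyTorch sketch shows one plausible reading of it: per-factor encoders, fusion of the codes into a joint latent variable z_(x,b), and a decoder D_θ acting as generator. All layer sizes and module names are illustrative assumptions, and the autoregressive group modeling is simplified to plain concatenation:

```python
import torch
import torch.nn as nn

class FactorEncoder(nn.Module):
    """Encodes one factor (fused image x, background b, or pose p) into a code."""
    def __init__(self, in_ch: int, code_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, code_dim),
        )

    def forward(self, x):
        return self.net(x)

class MultiFactorVAE(nn.Module):
    """Fuses per-factor codes into a joint latent z_(x,b) and decodes a sample."""
    def __init__(self, code_dim: int = 128, z_dim: int = 64):
        super().__init__()
        self.enc_x = FactorEncoder(3, code_dim)  # fused image
        self.enc_b = FactorEncoder(3, code_dim)  # real background
        self.enc_p = FactorEncoder(3, code_dim)  # pose map
        self.to_mu = nn.Linear(3 * code_dim, z_dim)      # posterior mean
        self.to_logvar = nn.Linear(3 * code_dim, z_dim)  # posterior log-variance
        self.decoder = nn.Sequential(  # D_theta, the generator
            nn.Linear(z_dim, 64 * 8 * 8), nn.ReLU(), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x, b, p):
        joint = torch.cat([self.enc_x(x), self.enc_b(b), self.enc_p(p)], dim=1)
        mu, logvar = self.to_mu(joint), self.to_logvar(joint)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decoder(z), mu, logvar
```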
- This embodiment assumes that the parameters of the prior distribution and the posterior distribution are θ and φ, respectively.
- p(z_(x,b)) is modeled as a Gaussian distribution, and its parameters μ and σ are inferred by a neural network. From this, the loss function for training can be deduced in the standard variational form: a reconstruction term plus a KL-divergence term.
- This embodiment further combines a perceptual function Φ to extract features more in line with visual intuition, used to calculate the perceptual loss between the original input image and the image generated by the decoder network. The final loss function of this scheme therefore takes the form reconstructed in step S12 above, where:
- Φ_i represents the features extracted from each layer of the visual perception network;
- λ_i are hyperparameters used to control the contribution of the different layers of the visual perception network to the total loss;
- D_θ represents the decoder network acting as the generator.
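Since the formulas themselves are not reproduced in this text, the following sketch shows how a layer-weighted perceptual loss of the kind described, plus the closed-form KL term, could be computed; the use of torchvision's VGG16 and the layer indices and weights are all assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(torch.nn.Module):
    """Weighted sum of feature distances between input x and reconstruction
    x_hat, taken at several layers Phi_i of a fixed perception network."""
    def __init__(self, layer_ids=(3, 8, 15), weights=(1.0, 0.75, 0.5)):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.weights = dict(zip(layer_ids, weights))

    def forward(self, x, x_hat):
        loss, h, h_hat = 0.0, x, x_hat
        for i, layer in enumerate(self.features):
            h, h_hat = layer(h), layer(h_hat)
            if i in self.weights:
                loss = loss + self.weights[i] * F.mse_loss(h, h_hat)
        return loss

def kl_term(mu, logvar):
    """KL(q_phi(z|x,b) || N(0, I)) in closed form for a diagonal Gaussian."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```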
- Character resolution refers to the number of pixels of foreground pedestrians in the image.
- The resolutions of different pedestrians usually differ according to the position and viewpoint of the camera.
- In the virtual dataset obtained from the game engine, by contrast, the number of pixels occupied by each person in the image is basically the same. Therefore, there is a large gap in the distribution of person resolution between the virtual source domain and the real target domain.
- By scaling the characters in the source domain, the pixel ratio of the characters within the whole image can be brought closer to that of the target domain.
- The mask of the person in each image is first obtained through the self-correcting human parsing network, and the number of pixels covered by the person mask is then divided by the number of pixels of the whole image to obtain the percentage.
- The pixel proportions of the characters in the virtual dataset and the target dataset are calculated separately, and the characters in the virtual dataset are scaled accordingly so that the virtual characters have a pixel percentage similar to that of the target domain.
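A minimal sketch of this percentage computation and the corresponding rescaling, assuming the person mask is already available from the parsing network:

```python
import cv2
import numpy as np

def person_pixel_ratio(mask: np.ndarray) -> float:
    """Fraction of image pixels covered by the person mask."""
    return float((mask > 0).sum()) / mask.size

def rescale_character(char_img: np.ndarray,
                      source_ratio: float,
                      target_ratio: float) -> np.ndarray:
    """Scale a character crop so its pixel ratio approaches the target domain.
    Covered area grows with the square of the linear scale, hence the sqrt."""
    scale = np.sqrt(target_ratio / source_ratio)
    h, w = char_img.shape[:2]
    new_size = (max(1, int(w * scale)), max(1, int(h * scale)))
    return cv2.resize(char_img, new_size, interpolation=cv2.INTER_AREA)
```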
- A batch of virtual samples with person labels is then generated by using the adjusted virtual characters, the real backgrounds, and the pedestrian poses extracted from the target dataset as the input of the deep neural network.
- Step S2: Render the generated virtual samples according to the lighting conditions of the target dataset.
- Lighting conditions can vary widely across datasets: some datasets only have specific lighting conditions, such as those captured at night. Due to large differences in brightness, a learned person re-identification model may not transfer properly to the actual target domain.
- This scheme therefore adjusts the lighting of the source domain to match the lighting of the target domain.
- Each image is converted to HSV format, the V channel is extracted, and the average value of the V channel is calculated as the brightness value of the image, which ranges from 0 to 255.
- This embodiment multiplies each image by a single coefficient to adjust the illumination of the source domain so that the brightness distributions of the two domains have similar peaks.
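A sketch of this brightness measurement and the single-coefficient rendering adjustment; matching the peaks of the brightness distributions is reduced here to matching mean V values, which is an assumption:

```python
import cv2
import numpy as np

def brightness(image_bgr: np.ndarray) -> float:
    """Mean of the V channel in HSV, in [0, 255]."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    return float(hsv[:, :, 2].mean())

def render_to_target_brightness(image_bgr: np.ndarray,
                                target_mean_v: float) -> np.ndarray:
    """Multiply the whole image by one coefficient so its brightness
    approaches the target dataset's statistics."""
    coeff = target_mean_v / max(brightness(image_bgr), 1e-6)
    out = image_bgr.astype(np.float32) * coeff
    return np.clip(out, 0, 255).astype(np.uint8)
```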
- Step S3: Sample the rendered virtual samples according to the character attributes of the target dataset.
- The sampling process draws virtual samples according to descriptive information of the target domain, such as clothing style, age and gender.
- The attributes of virtual characters can be manually set to ensure diversity, so the description information of virtual characters usually covers a wide variety of characteristics.
- The images of a real dataset, however, are usually captured in a specific area within a limited period of time.
- For example, a real pedestrian dataset captured on a campus in summer will contain a large number of pedestrians wearing T-shirts and backpacks.
- The virtual images are sampled according to the description information of the real target domain, so that the attributes of the virtual characters are as consistent as possible with the real scene and the learned person re-identification model can better adapt to the target domain.
- Two attributes are selected as the basic attributes for sampling: the color of the upper-body clothes and the color of the lower-body clothes.
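One way to realize this sampling is to draw from the virtual pool in proportion to the target dataset's statistics over the two clothing-color attributes; the attribute labels and quota scheme below are illustrative assumptions:

```python
import random
from collections import Counter

def sample_to_match(virtual_items, target_attrs, n_samples, seed=0):
    """virtual_items: list of (image_id, (upper_color, lower_color)).
    target_attrs: list of (upper_color, lower_color) pairs observed in the
    target domain. Draws virtual samples whose attribute distribution tracks
    the target's attribute distribution."""
    rng = random.Random(seed)
    target_dist = Counter(target_attrs)
    total = sum(target_dist.values())
    pool = {}
    for item_id, attrs in virtual_items:
        pool.setdefault(attrs, []).append(item_id)
    chosen = []
    for attrs, count in target_dist.items():
        quota = round(n_samples * count / total)  # per-attribute-pair quota
        candidates = pool.get(attrs, [])
        if candidates and quota:
            chosen.extend(rng.choices(candidates, k=quota))  # with replacement
    return chosen
```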
- Step S4: verify the recognition effect. A training dataset is constructed from the sampled virtual samples to train the pedestrian re-identification model; the trained model is then used to match each query picture against the gallery, determine the pictures with the same identity, output the corresponding picture indexes in order of likelihood, and compare them with the ground-truth labels.
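A sketch of this verification step as a standard ranked-retrieval evaluation, using cosine similarity and Rank-1 accuracy; the trained re-identification model is assumed to output one embedding per image:

```python
import numpy as np

def rank_gallery(query_feat: np.ndarray, gallery_feats: np.ndarray) -> np.ndarray:
    """Gallery indexes sorted from most to least likely match
    (cosine similarity on L2-normalized embeddings)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids) -> float:
    """Compare the top-ranked gallery identity with the ground-truth label."""
    hits = sum(
        gallery_ids[rank_gallery(qf, gallery_feats)[0]] == qid
        for qf, qid in zip(query_feats, query_ids)
    )
    return hits / len(query_ids)
```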
- The implementation platform of this embodiment is the PyCharm software; data reading and writing, basic mathematical operations, and optimization solvers are well-known technologies in the field and are not described in detail here.
- The automatic operation of the above process can be realized by means of software.
Claims (7)
- 1. A pedestrian re-identification method based on virtual samples, characterized in that it comprises the following steps: step S1, obtaining virtual characters generated by a game engine and preprocessing them, and generating a batch of virtual samples with character labels through a multi-factor variational generation network by fusing the backgrounds of a target dataset with the poses of real people; step S2, rendering the generated virtual samples according to the lighting conditions of the target dataset; step S3, sampling the rendered virtual samples according to the character attributes of the target dataset; step S4, constructing a training dataset from the sampled virtual samples to train a pedestrian re-identification model, and verifying the recognition effect of the trained model.
- 2. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that step S1 comprises: step S11, extracting k characters from the virtual dataset generated by the game engine and l backgrounds from the real pedestrian dataset, denoted {c_1, ..., c_k} and {b_1, ..., b_l} respectively, and directly compositing the two to obtain n virtual images fusing virtual characters with real backgrounds as training samples {x_1, ..., x_n}; step S12, extracting the character pose of each training sample and using it, together with the training sample and the corresponding background, as the input of the constructed variational-autoencoder-based deep neural network, i.e. the multi-factor variational generation network, and constructing the objective function for training so that the network learns the transformation laws of the synthesized images with respect to character, background and pose; step S13, adjusting the resolution of the virtual characters according to the character resolution of the target dataset; step S14, using the adjusted virtual characters, the real backgrounds and the poses extracted from the target dataset as the input of the network, and generating a batch of virtual samples with character labels through the network.
- 3. The pedestrian re-identification method based on virtual samples according to claim 2, characterized in that in step S13, the pixel proportions of the characters within the images are calculated for the virtual dataset and the real pedestrian dataset respectively, and the resolution of the virtual characters is adjusted by scaling the characters in the virtual dataset so that they have a resolution similar to that of the target dataset.
- 4. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that in step S2, each image is converted into HSV format, the V channel is extracted and its average value is calculated as the brightness value of the image, the brightness value ranging from 0 to 255, so as to obtain the lighting conditions of the target dataset.
- 5. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that in step S3, two attributes, the color of the upper-body clothes and the color of the lower-body clothes, are selected as the basic attributes for sampling in order to perform attribute distribution statistics of the dataset.
- 6. The pedestrian re-identification method based on virtual samples according to claim 1, characterized in that in step S4, the identification verification process comprises: using the trained model to match query pictures against the gallery to determine pictures of the same identity, outputting the corresponding picture indexes in order of likelihood, and comparing them with the ground-truth labels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/337,439 US11837007B2 (en) | 2021-01-28 | 2023-06-20 | Pedestrian re-identification method based on virtual samples |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110122521.4A CN112784783B (zh) | 2021-01-28 | 2021-01-28 | Pedestrian re-identification method based on virtual samples |
CN202110122521.4 | 2021-01-28 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/337,439 Continuation-In-Part US11837007B2 (en) | 2021-01-28 | 2023-06-20 | Pedestrian re-identification method based on virtual samples |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022160773A1 true WO2022160773A1 (zh) | 2022-08-04 |
Family
ID=75759603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/122343 WO2022160773A1 (zh) | 2021-01-28 | 2021-09-30 | Pedestrian re-identification method based on virtual samples |
Country Status (3)
Country | Link |
---|---|
US (1) | US11837007B2 (zh) |
CN (1) | CN112784783B (zh) |
WO (1) | WO2022160773A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115496191A (zh) * | 2022-11-08 | 2022-12-20 | Tencent Technology (Shenzhen) Co., Ltd. | Model training method and related apparatus |
CN117456389A (zh) * | 2023-11-07 | 2024-01-26 | Xidian University | Improved YOLOv5s-based method, ***, device and medium for dense and small target recognition in UAV aerial images |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112784783B (zh) | 2021-01-28 | 2023-05-02 | Wuhan University | Pedestrian re-identification method based on virtual samples |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414462A (zh) * | 2019-08-02 | 2019-11-05 | Zhongke Artificial Intelligence Innovation Technology Research Institute (Qingdao) Co., Ltd. | Unsupervised cross-domain pedestrian re-identification method and *** |
CN110490960A (zh) * | 2019-07-11 | 2019-11-22 | Alibaba Group Holding Limited | Synthetic image generation method and apparatus |
CN110555390A (zh) * | 2019-08-09 | 2019-12-10 | Xiamen Meiya Pico Information Co., Ltd. | Pedestrian re-identification method, apparatus and medium based on semi-supervised training |
US20190378333A1 (en) * | 2018-06-08 | 2019-12-12 | Verizon Patent And Licensing Inc. | Methods and systems for representing a pre-modeled object within virtual reality data |
CN112784783A (zh) * | 2021-01-28 | 2021-05-11 | Wuhan University | Pedestrian re-identification method based on virtual samples |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11138469B2 (en) * | 2019-01-15 | 2021-10-05 | Naver Corporation | Training and using a convolutional neural network for person re-identification |
GB2584727B (en) * | 2019-06-14 | 2024-02-28 | Vision Semantics Ltd | Optimised machine learning |
CN110427813B (zh) * | 2019-06-24 | 2023-06-09 | China University of Mining and Technology | Pedestrian re-identification method using a Siamese generative adversarial network based on pose-guided pedestrian image generation |
CN110796080B (zh) * | 2019-10-29 | 2023-06-16 | Chongqing University | Multi-pose pedestrian image synthesis algorithm based on generative adversarial networks |
US20230004760A1 (en) * | 2021-06-28 | 2023-01-05 | Nvidia Corporation | Training object detection systems with generated images |
- 2021-01-28: CN application CN202110122521.4A, granted as CN112784783B (zh), status Active
- 2021-09-30: PCT application PCT/CN2021/122343, published as WO2022160773A1 (zh), Application Filing
- 2023-06-20: US application US18/337,439, granted as US11837007B2 (en), status Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190378333A1 (en) * | 2018-06-08 | 2019-12-12 | Verizon Patent And Licensing Inc. | Methods and systems for representing a pre-modeled object within virtual reality data |
CN110490960A (zh) * | 2019-07-11 | 2019-11-22 | Alibaba Group Holding Limited | Synthetic image generation method and apparatus |
CN110414462A (zh) * | 2019-08-02 | 2019-11-05 | Zhongke Artificial Intelligence Innovation Technology Research Institute (Qingdao) Co., Ltd. | Unsupervised cross-domain pedestrian re-identification method and *** |
CN110555390A (zh) * | 2019-08-09 | 2019-12-10 | Xiamen Meiya Pico Information Co., Ltd. | Pedestrian re-identification method, apparatus and medium based on semi-supervised training |
CN112784783A (zh) * | 2021-01-28 | 2021-05-11 | Wuhan University | Pedestrian re-identification method based on virtual samples |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115496191A (zh) * | 2022-11-08 | 2022-12-20 | Tencent Technology (Shenzhen) Co., Ltd. | Model training method and related apparatus |
CN115496191B (zh) * | 2022-11-08 | 2023-04-07 | Tencent Technology (Shenzhen) Co., Ltd. | Model training method and related apparatus |
CN117456389A (zh) * | 2023-11-07 | 2024-01-26 | Xidian University | Improved YOLOv5s-based method, ***, device and medium for dense and small target recognition in UAV aerial images |
Also Published As
Publication number | Publication date |
---|---|
CN112784783B (zh) | 2023-05-02 |
CN112784783A (zh) | 2021-05-11 |
US20230334895A1 (en) | 2023-10-19 |
US11837007B2 (en) | 2023-12-05 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21922364; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21922364; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/01/2024) |