CN110210556B - Pedestrian re-identification data generation method - Google Patents
- Publication number
- CN110210556B CN110210556B CN201910466234.8A CN201910466234A CN110210556B CN 110210556 B CN110210556 B CN 110210556B CN 201910466234 A CN201910466234 A CN 201910466234A CN 110210556 B CN110210556 B CN 110210556B
- Authority
- CN
- China
- Prior art keywords
- network
- countermeasure
- migration
- image
- generation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/30—Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
The invention discloses a pedestrian re-identification data generation method based on an adaptive migration network, which converts the image style of a source data set using a Generative Adversarial Network to generate images consistent with the style of the target data set. This can effectively reduce the domain gap between different data sets, so that a model trained on one data set finally has strong generalization capability on other data sets.
Description
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a pedestrian re-identification data generation method.
Background
The pedestrian re-identification technology mainly aims to identify, in a large-scale image library acquired by different monitoring cameras at different places, images with the same identity as a target person. Pedestrian re-identification has very wide application in daily life, such as intelligent security, human-machine interaction, content-based retrieval, and behavior analysis. Pedestrian re-identification based on deep learning exploits the strong ability of neural networks to automatically learn abstract features, which has greatly improved the performance of pedestrian re-identification algorithms. Pedestrian re-identification is a supervised learning task: a large amount of data (pedestrian images) must be collected manually in advance to form a data set, different pedestrians in the data set must be labeled with different identity labels, and a recognition algorithm is then designed and trained with supervision on the sample data in the training set. Therefore, the quality and the characteristics of the data set have a great influence on the final performance of a pedestrian re-identification algorithm.
Although existing deep-learning-based pedestrian re-identification methods perform well on a single data set, their performance drops sharply when they are applied directly to other data sets; the reason is their poor generalization. Pedestrian images are susceptible to various monitoring environment factors and interference during imaging. In real life, pedestrian images are usually captured by different monitoring cameras at different angles, at different places, and at different times. This causes significant deviations in imaging conditions, such as illumination, sharpness, and camera view, across the images in different data sets. The images in one data set are usually shot by a few cameras at fixed angles at a specific time, so a model trained on a single data set cannot cope with the complex shooting conditions of the real world.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification data generation method that, by designing an adaptive migration network, generates source-data-set images consistent with the style of a target data set, thereby reducing the differences between data sets and improving the performance of pedestrian re-identification algorithms on the target data set.
The purpose of the invention is realized by the following technical scheme:
a pedestrian re-identification data generation method comprising:
using CycleGAN to construct a countermeasure generation network for migrating illumination change, a countermeasure generation network for migrating resolution change, a countermeasure generation network for migrating the photographing view angle, and an integrated countermeasure generation network; constructing a sub-network weight fitting network using a multilayer convolutional neural network and a fully-connected network;
the countermeasure generating network of the migration illumination change, the countermeasure generating network of the migration resolution change and the countermeasure generating network of the migration photographing view all comprise a generator consisting of an encoder and a decoder and a discriminator; the integrated countermeasure generating network comprises a decoder and a discriminator; parameters of a decoder and a discriminator in the four confrontation generation networks are mutually shared, and the four discriminators are all used in a training stage and are matched with the output of the corresponding decoder to realize parameter updating;
for an image x from a source data set, respectively performing coding dimension reduction through encoders in the countermeasure generation network for migration illumination change, the countermeasure generation network for migration resolution change and the countermeasure generation network for migration photographing view;
generating the weight of each countermeasure generation network from the outputs of the three encoders through the sub-network weight fitting network;
after the weights of the three countermeasure generation networks are multiplied by the outputs of the corresponding encoders, a decoder of the integrated countermeasure generation network generates a new image whose illumination, resolution and photographing view angle are consistent with the target data set.
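The data flow of the steps above can be sketched as follows. The encoder, weight-fitting, and decoder functions here are toy numpy stand-ins (random projections, a softmax over feature means, and a tanh) rather than the patent's convolutional networks; the sketch only illustrates the encode–weight–fuse–decode pipeline:

```python
import numpy as np

def encoder(x, seed):
    """Toy stand-in for one CycleGAN encoder (illumination / resolution /
    view angle); a random linear projection replaces the real conv layers."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], 8))
    return x @ w  # encoded, dimension-reduced feature map

def weight_fitting(features):
    """Toy stand-in for the sub-network weight fitting network: maps the
    three encoder outputs to one normalized weight per sub-network."""
    scores = np.array([f.mean() for f in features])
    e = np.exp(scores - scores.max())
    return e / e.sum()  # softmax: weights are positive and sum to 1

def shared_decoder(z):
    """Toy stand-in for the decoder shared by all four networks."""
    return np.tanh(z)

x = np.ones((16, 16, 3))                    # toy source-domain image
feats = [encoder(x, s) for s in (0, 1, 2)]  # three domain-specific encodings
w = weight_fitting(feats)                   # one weight per sub-network
z = sum(wi * f for wi, f in zip(w, feats))  # weighted fusion of encodings
y = shared_decoder(z)                       # migrated feature map
```

The point of the sketch is structural: three parallel encoders feed one shared decoder, with the fitting network deciding how much each encoding contributes.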
According to the technical scheme provided by the invention, the pedestrian re-identification data generation method based on the adaptive migration network converts the image style of the source data set using a Generative Adversarial Network to generate images whose style is consistent with that of the target data set, which effectively reduces the domain gap between different data sets and improves the re-identification rate across data sets. Finally, a model trained on one data set has strong generalization capability on other data sets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a schematic diagram of a pedestrian re-identification data generation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a generation network for migration of illumination change countermeasures according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a migration resolution change countermeasure generation network provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of a countermeasure generation network for migrating a photographing view according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a sub-network weight fitting network according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an integrated countermeasure generation network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a pedestrian re-identification data generation method based on an adaptive migration network that introduces a divide-and-fuse strategy and decoder parameter sharing. By introducing four countermeasure generation networks and a sub-network weight fitting network into the training of a pedestrian re-identification algorithm spanning multiple data sets, the method converts the image style of the source data set into the style of the target data set, thereby adaptively reducing the influence of data-set characteristics such as illumination, resolution and shooting position on model generalization. The method has wide application prospects: many applications of the intelligent society depend on pedestrian re-identification algorithms (such as intelligent security and pedestrian path identification), making it a necessary step toward strong artificial intelligence.
Fig. 1 is a schematic diagram of a method for generating pedestrian re-identification data according to an embodiment of the present invention, which mainly includes:
1. Using CycleGAN to construct a countermeasure generation network for migrating illumination change, a countermeasure generation network for migrating resolution change, a countermeasure generation network for migrating the photographing view angle, and an integrated countermeasure generation network; a sub-network weight fitting network is constructed using a multilayer convolutional neural network and a fully-connected network.
As will be appreciated by those skilled in the art, CycleGAN is a basic algorithmic model that can transform image content from one domain to another, such as style migration, etc.
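CycleGAN's central constraint, the cycle-consistency loss used throughout this method, can be illustrated with toy one-dimensional generators. The linear maps G and F below are hypothetical stand-ins for the learned convolutional generators (forward and reverse domain translation):

```python
import numpy as np

# Hypothetical 1-D "generators": G maps source style to target style and
# F maps back; exact inverses stand in for the learned networks.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0

x = np.linspace(0.0, 1.0, 5)          # toy batch of "source images"
l_cyc = np.abs(F(G(x)) - x).mean()    # cycle loss ||F(G(x)) - x||_1
```

Because F exactly inverts G here, the cycle loss vanishes; training real CycleGAN generators pushes them toward this behavior so that translated images retain the original content.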
2. The countermeasure generating network of the migration illumination change, the countermeasure generating network of the migration resolution change and the countermeasure generating network of the migration photographing view all comprise a generator consisting of an encoder and a decoder and a discriminator; the integrated countermeasure generating network comprises a decoder and a discriminator; the parameters of the decoder and the discriminator in the four confrontation generation networks are mutually shared, and the four discriminators are all used in the training stage and are matched with the output of the corresponding decoder to realize parameter updating.
In the embodiment of the invention, the encoders in the countermeasure generation networks are mainly used for extracting the relevant features of the input image, and the features extracted by the encoders in different countermeasure generation networks have different emphasis, namely different emphasis on illumination, resolution and visual angle.
3. And for an image x from a source data set, respectively carrying out coding dimension reduction through encoders in the countermeasure generation network for the migration illumination change, the countermeasure generation network for the migration resolution change and the countermeasure generation network for the migration photographing view.
4. And generating the weight of each confrontation generation network according to the output of the three encoders through a sub-network weight fitting network.
5. After the weights of the three confrontation generating networks are multiplied by the output of the corresponding encoder, a new image with the illumination, resolution and photographing visual angle consistent with the target data set is generated by a decoder of the integrated confrontation generating network.
For ease of understanding, each network is described in detail below.
First, the countermeasure generation network for migrating illumination change.
As shown in fig. 2, the countermeasure generation network for migrating illumination change mainly includes a generator and a discriminator; the generator consists of an Encoder and a Decoder.
For each image x from the source data set T, the purpose of the countermeasure generation network for migrating illumination change is to generate, based on x, a new image whose illumination characteristics are consistent with the target data set S. The main working process is as follows:
1. for an image x from a source data set, an encoder in a network is generated by shifting the confrontation of illumination change for encoding and dimension reduction, and then a decoder decodes an encoding result to generate a new image with illumination characteristics consistent with a target data set.
2. The new image is mixed with the image in the target data set and input to the discriminator, which discriminates the image, i.e. whether the image is the new image generated or the image in the target data set.
3. And updating parameters of the encoder according to the judgment result of the discriminator.
The loss function L_{total1} for the encoder parameter update in the countermeasure generation network for migrating illumination change is:
L_{total1} = L_{gan} + \eta_1 L_{ill}(G, H)
where \eta_1 is a hyperparameter that may be adjusted during experiments.
L_{gan} is the base loss function, expressed as:
L_{gan} = L_{adv} + \lambda_1 L_{cyc} + \lambda_2 L_{ide}
where L_{adv} ensures that the distribution of the generated new image is consistent with the distribution of the target data set, L_{cyc} is the reverse conversion in CycleGAN, transferring the generated new image back to the original image, L_{ide} ensures that the overall color of the two images remains the same, and \lambda_1 and \lambda_2 are hyperparameters that may be adjusted during experiments.
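The composition of the base loss is a simple weighted sum of its three terms. In the sketch below the λ values and the term values are illustrative placeholders, since the text leaves them as tunable hyperparameters:

```python
def base_gan_loss(l_adv, l_cyc, l_ide, lam1=10.0, lam2=5.0):
    """Base loss L_gan = L_adv + lam1 * L_cyc + lam2 * L_ide.

    l_adv : adversarial term (generated vs. target distribution)
    l_cyc : cycle-consistency term (reverse conversion back to the original)
    l_ide : identity term (overall colour preservation)
    lam1, lam2 are illustrative placeholder values for the hyperparameters.
    """
    return l_adv + lam1 * l_cyc + lam2 * l_ide

l_gan = base_gan_loss(l_adv=0.7, l_cyc=0.05, l_ide=0.02)
```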
L_{ill}(G, H) makes the countermeasure generation network for migrating illumination change focus on the illumination in the migrated image, expressed as:
L_{ill}(G, H) = E_{x~p(x)}[ ||H(G(x)) - H(x)||_1 ]
where G(x) denotes the generated new image, E denotes the mathematical expectation, p(x) is the data distribution function of images in the source data set, H(·) denotes the function extracting illumination-insensitive features, and ||·||_1 denotes the one-norm.
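A minimal numpy sketch of L_ill, using per-pixel chromaticity as a hypothetical stand-in for the illumination-insensitive feature extractor H(·) (the patent does not specify H). A generator that only rescales global brightness then incurs (near-)zero loss, which is exactly the invariance the term encodes:

```python
import numpy as np

def H(img):
    """Hypothetical illumination-insensitive feature: per-pixel chromaticity
    (each channel divided by the channel sum), which cancels a global gain."""
    s = img.sum(axis=-1, keepdims=True)
    return img / np.maximum(s, 1e-8)

def l_ill(generated, original):
    """L_ill = E[||H(G(x)) - H(x)||_1]: one-norm over channels, averaged
    over pixels (a stand-in for the expectation over the data set)."""
    return np.abs(H(generated) - H(original)).sum(axis=-1).mean()

x = np.random.default_rng(0).uniform(0.1, 1.0, (8, 8, 3))  # toy image
g_x = 1.5 * x            # a "generator" that only rescales brightness
loss = l_ill(g_x, x)     # chromaticity is unchanged, so loss is tiny
```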
Second, the countermeasure generation network for migrating resolution change.
As shown in fig. 3, the countermeasure generation network for migrating resolution change is also implemented with CycleGAN, and its structure and workflow are the same as those of the countermeasure generation network for migrating illumination change described above. The difference is that the resolution of the new image it generates is consistent with the target data set, and its loss function for the encoder parameter update differs.
The loss function L_{total2} for the encoder parameter update in the countermeasure generation network for migrating resolution change is:
L_{total2} = L_{gan} + \eta_2 L_{res}(G, I)
where \eta_2 is a hyperparameter that may be adjusted during experiments, and L_{gan} is the base loss function, the same as described above.
L_{res}(G, I) makes the countermeasure generation network for migrating resolution change focus on the resolution of the migrated image, expressed as:
L_{res}(G, I) = E_{x~p(x)}[ ||I(G(x)) - I(x)||_2^2 ]
where G(x) denotes the generated new image, p(x) is the data distribution function of images in the source data set, I(·) denotes the function extracting pixel-insensitive features, and ||·||_2^2 denotes the square of the two-norm.
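A minimal numpy sketch of L_res, using a 2×2 block average as a hypothetical stand-in for the pixel-insensitive feature extractor I(·) (again, the patent does not specify I). Adding zero-mean high-frequency detail inside each block leaves the loss near zero, since only fine pixel detail changes:

```python
import numpy as np

def I_feat(img):
    """Hypothetical pixel-insensitive feature: a 2x2 block average, which
    discards fine pixel detail while keeping coarse content."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def l_res(generated, original):
    """L_res = E[||I(G(x)) - I(x)||_2^2]; the mean squared difference of
    the features stands in for the expectation over the data set."""
    d = I_feat(generated) - I_feat(original)
    return (d ** 2).mean()

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, (4, 4, 1))                         # toy image
detail = np.tile([[1.0, -1.0], [-1.0, 1.0]], (2, 2))[..., None]
g_x = x + 0.1 * detail   # zero-mean high-frequency detail in each block
loss = l_res(g_x, x)     # block averages are unchanged
```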
Third, the countermeasure generation network for migrating the photographing view angle.
As shown in fig. 4, the countermeasure generation network for migrating the photographing view angle is also implemented with CycleGAN, and its structure and workflow are the same as those of the countermeasure generation network for migrating illumination change described above. The difference is that the photographing view angle of the new image it generates is consistent with the target data set, and its loss function for the encoder parameter update differs.
Since migrating the photographing view angle needs no additional constraint, the loss function L_{total3} for the encoder parameter update in this network is:
L_{total3} = L_{gan}
where L_{gan} is the base loss function, the same as described above.
Fourth, the sub-network weight fitting network.
In the embodiment of the present invention, the sub-network weight fitting network is implemented with a multilayer convolutional neural network and a fully-connected network; its structure is shown in fig. 5.
The sub-network weight fitting network needs to be trained so that the sub-network weight fitting network can fit the weight of the network generated by each countermeasure;
the training process is as follows:
1) The outputs of the encoders in the three countermeasure generation networks are stacked together as the input to the sub-network weight fitting network.
2) And normalizing the reciprocal of the loss functions of the three confrontation generation networks to be used as a real label.
3) Training by adopting the minimum mean square error as a loss function of a sub-network weight fitting network;
in the testing stage, the weight of each confrontation generation network is directly generated according to the output of the coder in the three confrontation generation networks:
wherein,generating a corresponding weight value of the network for the countermeasure of the migration illumination change,generating corresponding weights of the network for the countermeasure of the migration resolution change,and generating a weight corresponding to the network for the confrontation of the migration photographing visual angles.
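The construction of the training targets — normalized reciprocals of the three sub-network losses — together with the minimum-mean-square-error criterion can be sketched as follows; the loss values are illustrative:

```python
import numpy as np

def weight_labels(losses):
    """Training targets for the weight-fitting network: reciprocals of the
    three sub-network losses, normalized to sum to one; a sub-network with
    a lower loss gets a larger target weight."""
    inv = 1.0 / np.asarray(losses, dtype=float)
    return inv / inv.sum()

def mse(pred, target):
    """Minimum mean square error criterion used to train the fitting net."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

labels = weight_labels([0.5, 1.0, 2.0])  # illustrative L_total1..3 values
```

With these values the illumination sub-network (lowest loss) receives the largest target weight, 4/7.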
Fifth, the integrated countermeasure generation network.
As shown in fig. 6, the integrated countermeasure generation network includes a decoder and a discriminator, whose network structures are the same as in the three countermeasure generation networks introduced earlier (for migrating illumination change, resolution change, and the photographing view angle).
1. And (5) a training stage.
The integrated countermeasure generation network normalizes the reciprocals of the loss functions of the three countermeasure generation networks to obtain their weights, and then multiplies the outputs of the encoders of the three networks by the respective weights to form the input z_x of the integrated countermeasure generation network in the training stage, expressed as:
z_x = \omega_x^1 z_x^1 + \omega_x^2 z_x^2 + \omega_x^3 z_x^3
where z_x^1, z_x^2 and z_x^3 are the outputs of the encoders of the countermeasure generation network for migrating illumination change, the countermeasure generation network for migrating resolution change, and the countermeasure generation network for migrating the photographing view angle, respectively, and \omega_x^1, \omega_x^2 and \omega_x^3 are the corresponding normalized weights. The dimensions of z_x^1, z_x^2 and z_x^3 are the same, for example 64 × 64 × 768, so z_x ∈ R^{64×64×768}.
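The weighted fusion that forms z_x can be sketched directly; small 4×4×8 arrays stand in for the 64×64×768 feature maps, and the weight values are illustrative normalized placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
# Encoder outputs of the three migration networks; small 4x4x8 arrays stand
# in for the 64x64x768 feature maps mentioned in the text.
z1, z2, z3 = (rng.standard_normal((4, 4, 8)) for _ in range(3))
w1, w2, w3 = 0.5, 0.3, 0.2   # illustrative normalized sub-network weights

z_x = w1 * z1 + w2 * z2 + w3 * z3   # fused input to the integrated decoder
```

Because the weights are scalars, the fused z_x keeps the common encoder-output shape, which is what lets the shared decoder consume it unchanged.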
The decoder decodes z_x to obtain a new image, which is mixed with images of the target data set and distinguished by the discriminator; the parameters of the decoder and the discriminator are updated according to the discriminator's result. The corresponding loss function L_{total4} adds to the base loss L_{gan} (the same as described above) a term weighted by the hyperparameter \eta_3, which may be adjusted during experiments; the term is built from f(·,·), the Jensen-Shannon divergence between two distributions, evaluated on the encoder features multiplied by their normalized weights (i.e., the weighted features described above).
2. Testing stage.
In the testing stage, the weights of the three countermeasure generation networks used by the integrated countermeasure generation network are provided by the sub-network weight fitting network; after the weights are multiplied by the outputs of the corresponding encoders of the three countermeasure generation networks, a migrated image is output.
In the embodiment of the invention, the parameters of the decoder and the discriminator in the integrated countermeasure generation network are shared with those of the previous three countermeasure generation networks, and these parameters are updated only in the integrated countermeasure generation network.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (8)
1. A pedestrian re-identification data generation method, characterized by comprising:
constructing, using CycleGAN, a countermeasure generation network for migrating illumination change, a countermeasure generation network for migrating resolution change, a countermeasure generation network for migrating the photographing view angle, and an integrated countermeasure generation network; constructing a sub-network weight fitting network using a multilayer convolutional neural network and a fully-connected network;
the countermeasure generating network of the migration illumination change, the countermeasure generating network of the migration resolution change and the countermeasure generating network of the migration photographing view all comprise a generator consisting of an encoder and a decoder and a discriminator; the integrated countermeasure generation network comprises a decoder and a discriminator; parameters of decoders and discriminators in the four countermeasure generating networks are mutually shared, and the four discriminators are all used in a training stage and are matched with the output of corresponding decoders to realize parameter updating;
for an image x from a source data set, respectively performing coding dimension reduction through encoders in the countermeasure generation network for migration illumination change, the countermeasure generation network for migration resolution change and the countermeasure generation network for migration photographing view;
generating the weight of each confrontation generation network according to the output of the three encoders through a sub-network weight fitting network;
after the weights of the three confrontation generating networks are multiplied by the output of the corresponding encoder, a new image with the illumination, resolution and photographing visual angle consistent with the target data set is generated by a decoder of the integrated confrontation generating network.
2. The pedestrian re-identification data generation method according to claim 1, wherein the structures of the countermeasure generation network for migrating illumination variation, the countermeasure generation network for migrating resolution variation, and the countermeasure generation network for migrating photographing view angle are the same;
for an image x from a source data set, an encoder in a network is generated by shifting the confrontation of illumination change to perform encoding dimensionality reduction, and then a decoder decodes an encoding result to generate a new image with illumination characteristics consistent with a target data set; the new image is mixed with the image in the target data set and then input into a discriminator, and the image is distinguished by the discriminator, so that the parameters of the encoder are updated according to the distinguishing result of the discriminator;
the countermeasure generation network for the transition resolution change and the countermeasure generation network for the transition photographing view both work in the above manner, and the difference is that: the generated new images are respectively consistent with the target data set in the resolution and the photographing view, and the loss functions are different when the parameters of the encoder are updated.
3. The pedestrian re-identification data generation method of claim 1, wherein the loss function L_{total1} for the encoder parameter update in the countermeasure generation network for migrating illumination change is:
L_{total1} = L_{gan} + \eta_1 L_{ill}(G, H)
where \eta_1 is a hyperparameter and L_{gan} is the base loss function; L_{ill}(G, H) makes the countermeasure generation network for migrating illumination change focus on the illumination in the migrated image, L_{ill}(G, H) = E_{x~p(x)}[ ||H(G(x)) - H(x)||_1 ], where G(x) denotes the generated new image, E denotes the mathematical expectation, p(x) is the data distribution function of images in the source data set, H(·) denotes the function extracting illumination-insensitive features, and ||·||_1 denotes the one-norm.
4. The pedestrian re-identification data generation method according to claim 1, wherein the loss function L_{total2} for the encoder parameter update in the countermeasure generation network for migrating resolution change is:
L_{total2} = L_{gan} + \eta_2 L_{res}(G, I)
where \eta_2 is a hyperparameter and L_{gan} is the base loss function; L_{res}(G, I) makes the countermeasure generation network for migrating resolution change focus on the resolution of the migrated image, L_{res}(G, I) = E_{x~p(x)}[ ||I(G(x)) - I(x)||_2^2 ], where G(x) denotes the generated new image, E denotes the mathematical expectation, p(x) is the data distribution function of images in the source data set, I(·) denotes the function extracting pixel-insensitive features, and ||·||_2^2 denotes the square of the two-norm.
5. The pedestrian re-identification data generation method of claim 1, wherein the loss function L_{total3} for the encoder parameter update in the countermeasure generation network for migrating the photographing view angle is:
L_{total3} = L_{gan}
where L_{gan} is the base loss function.
6. The pedestrian re-identification data generation method according to claim 1, wherein the sub-network weight fitting network needs to be trained so that it can fit the weight of each adversarial generation network;
the training process is as follows: the outputs of the encoders in the three adversarial generation networks are stacked to serve as the input of the sub-network weight fitting network, and the reciprocals of the loss functions of the three adversarial generation networks are normalized to serve as the ground-truth labels; the minimum mean square error is then adopted as the loss function for training the sub-network weight fitting network;
in the testing stage, the weight of each adversarial generation network is generated directly from the outputs of the encoders in the three adversarial generation networks, the three fitted weights corresponding, respectively, to the adversarial generation network for migrating illumination changes, the adversarial generation network for migrating resolution changes, and the adversarial generation network for migrating shooting viewpoints.
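The ground-truth label construction of this claim (normalizing the reciprocals of the three losses so that a smaller loss yields a larger weight) can be sketched as:

```python
import numpy as np

def gan_weights_from_losses(losses):
    # Normalize the reciprocals of the per-network losses so they sum to 1:
    # the network with the smallest loss receives the largest weight.
    # These values serve as the training labels for the sub-network
    # weight fitting network (the fitting network itself, a small CNN
    # over stacked encoder outputs, is not sketched here).
    inv = 1.0 / np.asarray(losses, dtype=float)
    return inv / inv.sum()
```

For example, losses of 1, 2, and 4 yield weights 4/7, 2/7, and 1/7.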
7. The pedestrian re-identification data generation method according to claim 1, wherein, in the training stage, the integrated adversarial generation network normalizes the reciprocals of the loss functions of the three adversarial generation networks to obtain the weight of each network, and then multiplies the outputs of the encoders of the three adversarial generation networks by their respective weights to form the training-stage input z_x of the integrated adversarial generation network; the three encoder outputs correspond, respectively, to the adversarial generation network for migrating illumination changes, the adversarial generation network for migrating resolution changes, and the adversarial generation network for migrating shooting viewpoints;
the decoder decodes z_x to obtain a new image, which is mixed with the images of the target dataset and distinguished by the discriminator; the parameters of the decoder and the discriminator are updated according to the discriminator's result, with a corresponding loss function L_total4 in which η3 is a hyperparameter, L_gan is the base loss function, and f(·) is the Jensen-Shannon divergence between two distributions, its arguments being the encoded features multiplied by their normalized weights;
after the integrated adversarial generation network finishes updating the parameters of the decoder and the discriminator, the updated parameters are shared with the adversarial generation network for migrating illumination changes, the adversarial generation network for migrating resolution changes, and the adversarial generation network for migrating shooting viewpoints;
in the testing stage, the weights of the three adversarial generation networks used by the integrated adversarial generation network are provided by the sub-network weight fitting network.
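The construction of z_x from the weighted encoder outputs can be sketched as follows. The translated claim does not make explicit whether the weighted features are summed or stacked after scaling, so a weighted sum is assumed here:

```python
import numpy as np

def integrated_input(encodings, weights):
    # Scale each encoder output by its network's weight and combine them
    # into the integrated network's input z_x. A weighted sum is an
    # assumption; the claim only states that encoder outputs are
    # multiplied by their respective weights.
    return sum(w * z for w, z in zip(weights, encodings))
```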
8. The pedestrian re-identification data generation method according to claim 3, 4, 5 or 7, wherein the base loss function L_gan is:

L_gan = L_adv + λ1 · L_cyc + λ2 · L_ide

where L_adv ensures that the distribution of the newly generated images is consistent with the distribution of the target dataset; L_cyc is the cycle-consistency term from CycleGAN, under which the generated new image is migrated back to the original; L_ide ensures that the overall color of the two images remains consistent; and λ1 and λ2 are hyperparameters.
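The base loss is thus a weighted sum in the style of CycleGAN. A minimal sketch follows; the default λ values are common CycleGAN-style placeholders, not values disclosed by the patent:

```python
def base_loss(l_adv, l_cyc, l_ide, lam1=10.0, lam2=5.0):
    # L_gan = L_adv + lambda1 * L_cyc + lambda2 * L_ide.
    # lam1/lam2 defaults are typical CycleGAN-style choices, used here
    # only as placeholders: the patent does not disclose its values.
    return l_adv + lam1 * l_cyc + lam2 * l_ide
```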
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910466234.8A CN110210556B (en) | 2019-05-29 | 2019-05-29 | Pedestrian re-identification data generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210556A CN110210556A (en) | 2019-09-06 |
CN110210556B true CN110210556B (en) | 2022-09-06 |
Family
ID=67789683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910466234.8A Active CN110210556B (en) | 2019-05-29 | 2019-05-29 | Pedestrian re-identification data generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210556B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382675B (en) * | 2020-02-24 | 2024-02-27 | 江苏大学 | Generation countermeasure network system for pedestrian recognition data set enhancement training |
CN112116104B (en) * | 2020-09-17 | 2024-06-18 | 京东科技控股股份有限公司 | Method, device, medium and electronic equipment for automatically integrating machine learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108288072A (en) * | 2018-01-26 | 2018-07-17 | 深圳市唯特视科技有限公司 | Facial expression synthesis method based on a generative adversarial network
CN109376769A (en) * | 2018-09-21 | 2019-02-22 | 广东技术师范学院 | Information transfer method for multi-task classification based on a generative adversarial neural network
CN109447906A (en) * | 2018-11-08 | 2019-03-08 | 北京印刷学院 | Image synthesis method based on a generative adversarial network
CN109815893A (en) * | 2019-01-23 | 2019-05-28 | 中山大学 | Illumination-domain normalization method for color face images based on a cycle generative adversarial network
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6435581B2 (en) * | 2015-01-29 | 2018-12-12 | パナソニックIpマネジメント株式会社 | Transfer learning device, transfer learning system, transfer learning method and program |
Non-Patent Citations (4)
Title |
---|
Pose Transferrable Person Re-identification; Jinxian Liu et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-12-17; pp. 4099-4108 *
Visual Domain Adaptation: A survey of recent advances; Vishal M Patel et al.; IEEE Signal Processing Magazine; 2015-04-02; Vol. 32, No. 3; pp. 53-69 *
Research on SAR image registration methods based on deep auto-encoders; Ning Mengdan; China Master's Theses Full-text Database, Information Science and Technology; 2019-02-15; Vol. 2019, No. 2; I136-1247 *
Image style transfer based on generative adversarial networks; Xu Zhehao et al.; Software Guide (软件导刊); 2018-12-31; Vol. 17, No. 6; pp. 207-209, 212, 228 *
Also Published As
Publication number | Publication date |
---|---|
CN110210556A (en) | 2019-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101803471B1 (en) | Deep learning system and learning method using of convolutional neural network based image patterning | |
Chen et al. | Saliency detection via the improved hierarchical principal component analysis method | |
CN110191299B (en) | Multi-frame interpolation method based on convolutional neural network | |
Lu et al. | Evolving block-based convolutional neural network for hyperspectral image classification | |
KR102138657B1 (en) | Apparatus and method for robust face recognition via hierarchical collaborative representation | |
CN111950649A (en) | Attention mechanism and capsule network-based low-illumination image classification method | |
CN110210556B (en) | Pedestrian re-identification data generation method | |
Wozniak et al. | A multiscale image compressor with rbfnn and discrete wavelet decomposition | |
CN112560865B (en) | Semantic segmentation method for point cloud under outdoor large scene | |
CN113743544A (en) | Cross-modal neural network construction method, pedestrian retrieval method and system | |
CN103268484A (en) | Design method of classifier for high-precision face recognition | |
CN117743946B (en) | Signal type identification method and system based on fusion characteristic and group convolution ViT network | |
Chen et al. | Cumulative attribute space regression for head pose estimation and color constancy | |
CN117218351A (en) | Three-dimensional point cloud semantic segmentation method based on local and global context awareness | |
CN114170426A (en) | Algorithm model for classifying rare tumor category small samples based on cost sensitivity | |
Casagrande et al. | Abnormal motion analysis for tracking-based approaches using region-based method with mobile grid | |
CN113378598B (en) | Dynamic bar code detection method based on deep learning | |
Palomo et al. | Image compression and video segmentation using hierarchical self-organization | |
Gomes et al. | A Deep Learning Approach for Reconstruction of Color Images in Different Lighting Conditions Based on Autoencoder Technique | |
CN113205175A (en) | Multi-layer attribute network representation learning method based on mutual information maximization | |
Li et al. | Automated deep learning system for power line inspection image analysis and processing: Architecture and design issues | |
Psaltis et al. | Deep 3d flow features for human action recognition | |
Zhang et al. | GARBAGE CLASSIFICATION BASED ON A CASCADE NEURAL NETWORK. | |
Mousa et al. | Identification the modulation type in cognitive radio network based on Alexnet architecture | |
Faye et al. | Context Normalization Layer with Applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||