CN115713680A - Semantic guidance-based face image identity synthesis method


Info

Publication number
CN115713680A
Authority
CN
China
Prior art keywords
layer
inputting
feature
attribute
sampling
Prior art date
Legal status
Granted
Application number
CN202211451581.1A
Other languages
Chinese (zh)
Other versions
CN115713680B (en)
Inventor
刘瑞霞
李子安
舒明雷
陈长芳
单珂
Current Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202211451581.1A
Publication of CN115713680A
Application granted
Publication of CN115713680B
Legal status: Active
Anticipated expiration


Abstract

A semantic guidance-based face image identity synthesis method extracts identity information, attribute information and background information from each image, fuses the information by feature fusion, and finally obtains the final result from the fused information by image generation. The method introduces feature key points that guide changes in face shape. At the same time, by adding background information during training, it produces face images whose face shape is changed while the image quality remains stable.

Description

Semantic guidance-based face image identity synthesis method
Technical Field
The invention relates to the field of image-level deep forgery (deepfakes), and in particular to a semantic guidance-based face image identity synthesis method.
Background
In recent years, with breakthrough developments in machine learning and graphics, the field of deep forgery has advanced greatly, and face identity synthesis, a sub-direction of this field, has developed rapidly, so that more and more forged images and videos appear on the network. Specifically, face identity synthesis transfers the identity information of a source face onto a target face while leaving the attribute information of the target face in the image (background, posture, illumination and the like) intact. At present, face identity synthesis is widely applied in information protection, the film and television industry, virtual entertainment and other fields; for example, the film and television industry uses advanced equipment to reconstruct an actor's facial model and the illumination conditions of a scene to obtain a vivid effect. Compared with directions such as attribute editing and image restoration in the deep forgery field, face identity synthesis is more open and involves more innovative techniques in generative models.
Traditional research on face identity synthesis is mainly based on image editing and can be divided into two categories: face image analysis and fusion, and 3D face modeling. The first category requires manually analyzing the face region and fusing faces through rendering, deformation and similar operations; it is inefficient and consumes a great deal of time and effort. The second category requires acquiring a 3D model of the face image and generating an image with deep learning techniques, which can cause a loss of illumination and background. In addition, these generation methods pay little attention to the structure of the face, resulting in face shape problems in the generated images.
Disclosure of Invention
In order to overcome the shortcomings of the above techniques, the invention provides a face image identity synthesis method that first uses feature key points to semantically guide the change of face shape, then extracts identity information, attribute information and background information from the images, fuses this information by feature fusion, and finally generates an image from the fused information.
The technical scheme adopted by the invention for overcoming the technical problems is as follows:
a face image identity synthesis method based on semantic guidance comprises the following steps:
a) Extracting key points of the face image from all face images in the CelebA face image data set;
b) Establishing a PET key point adjustment network, inputting the key points of the face image into the PET key point adjustment network to obtain the feature key points lm_fake, and iterating on the feature key points lm_fake to obtain the optimized feature key points lm_fake;
c) Establishing a face image feature extraction network, inputting a source image Pic_s and a target image Pic_t from the CelebA face image data set into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively;
d) Establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg;
e) Establishing a generation network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generation network to obtain the face image Pic_fake, and iterating on the face image Pic_fake to obtain the optimized face image Pic_fake;
f) Repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed face contour.
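For orientation, steps a) to f) can be summarized as the following training-step sketch. It is a minimal illustration only; the module names such as "pet", "identity_encoder" and "generator" are placeholders and not names used by the patent.

```python
# High-level sketch of one training step of the method described in steps a) to f).
# All module names are illustrative placeholders.
def train_step(pic_s, pic_t, lm_s, lm_t, networks):
    # b) adjust key points: source face shape guided toward the target key points
    lm_fake = networks["pet"](lm_s, lm_t)
    # c) identity feature from the source image, attribute feature from the target image
    f_id = networks["identity_encoder"](pic_s)
    f_attr = networks["attribute_encoder"](pic_t)
    # d) background feature from the target image
    f_bg = networks["background_encoder"](pic_t)
    # e) fuse all information and generate the identity-swapped face image
    pic_fake = networks["generator"](f_id, f_attr, f_bg, lm_fake)
    return lm_fake, pic_fake
```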
Further, step a) comprises the following steps:
a-1) Detecting key points of all face images in the CelebA face image data set by using the face key point detection algorithm H3R; the key points extracted from a source image Pic_s in the CelebA face image data set are denoted as the source key points lm_s, and the key points extracted from a target image Pic_t in the CelebA face image data set are denoted as the target key points lm_t.
Further, step b) comprises the following steps:
b-1) Establishing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true-and-false discriminator D_TF;
b-2) The source encoder E_lms consists of a first down-sampling convolution layer, a second down-sampling convolution layer, a third down-sampling convolution layer, a fourth down-sampling convolution layer and a fifth down-sampling convolution layer. The source key points lm_s are input into the first down-sampling convolution layer of the source encoder E_lms, which outputs the feature information F_lms^1; the feature information F_lms^1 is input into the second down-sampling convolution layer, which outputs the feature information F_lms^2; the feature information F_lms^2 is input into the third down-sampling convolution layer, which outputs the feature information F_lms^3; the feature information F_lms^3 is input into the fourth down-sampling convolution layer, which outputs the feature information F_lms^4; the feature information F_lms^4 is input into the fifth down-sampling convolution layer, which outputs the feature information F_lms^5.
b-3) The target encoder E_lmt consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer and a fifth fully connected layer. The target key points lm_t are input into the first fully connected layer of the target encoder E_lmt, which outputs the feature information F_lmt^1; the feature information F_lmt^1 is input into the second fully connected layer, which outputs the feature information F_lmt^2; the feature information F_lmt^2 is input into the third fully connected layer, which outputs the feature information F_lmt^3; the feature information F_lmt^3 is input into the fourth fully connected layer, which outputs the feature information F_lmt^4; the feature information F_lmt^4 is input into the fifth fully connected layer, which outputs the feature information F_lmt^5.
b-4) The cat() function stacks the feature information F_lms^5 and the feature information F_lmt^5 to obtain the feature vector F_lm.
b-5) The key point generator G_lm consists of a first up-sampling convolution layer, a second up-sampling convolution layer, a third up-sampling convolution layer, a fourth up-sampling convolution layer and a fifth up-sampling convolution layer. The feature vector F_lm is input into the first up-sampling convolution layer of the key point generator G_lm, which outputs the feature key points lm_fake^1; the feature key points lm_fake^1 are input into the second up-sampling convolution layer, which outputs the feature key points lm_fake^2; the feature key points lm_fake^2 are input into the third up-sampling convolution layer, which outputs the feature key points lm_fake^3; the feature key points lm_fake^3 are input into the fourth up-sampling convolution layer, which outputs the feature key points lm_fake^4; the feature key points lm_fake^4 are input into the fifth up-sampling convolution layer, which outputs the feature key points lm_fake.
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; the feature information F_fake^1 is input into the second fully connected layer of the Layer_fake module, which outputs the feature information F_fake^2; the feature information F_fake^2 is input into the third fully connected layer of the Layer_fake module, which outputs the feature information F_fake^3; the feature information F_fake^3 is input into the fourth fully connected layer of the Layer_fake module, which outputs the feature information F_fake^4. The Layer_s module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; the feature information F_s^1 is input into the second fully connected layer of the Layer_s module, which outputs the feature information F_s^2; the feature information F_s^2 is input into the third fully connected layer of the Layer_s module, which outputs the feature information F_s^3; the feature information F_s^3 is input into the fourth fully connected layer of the Layer_s module, which outputs the feature information F_s^4. The cat() function stacks the feature information F_fake^4 and the feature information F_s^4 to obtain the feature vector F_c. The Layer_c module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; the similarity feature Fscore1 is input into the second fully connected layer of the Layer_c module, which outputs the similarity feature Fscore2; the similarity feature Fscore2 is input into the third fully connected layer of the Layer_c module, which outputs the similarity feature Fscore3; the similarity feature Fscore3 is input into the fourth fully connected layer of the Layer_c module, which outputs the similarity score;
b-7) The true-and-false discriminator D_TF consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer, a fifth fully connected layer and a sixth fully connected layer. The feature key points lm_fake are input into the first fully connected layer of the true-and-false discriminator D_TF, which outputs the feature F_TF^1; the feature F_TF^1 is input into the second fully connected layer, which outputs the feature F_TF^2; the feature F_TF^2 is input into the third fully connected layer, which outputs the feature F_TF^3; the feature F_TF^3 is input into the fourth fully connected layer, which outputs the feature F_TF^4; the feature F_TF^4 is input into the fifth fully connected layer, which outputs the feature F_TF^5; the feature F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value V_TF.
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true-and-false loss loss_DTF is calculated from the value V_TF output by the true-and-false discriminator D_TF, and the similarity loss loss_DS is calculated from the similarity score output by the similarity discriminator D_S. The feature key points lm_fake are iteratively optimized by back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true-and-false loss loss_DTF and the similarity loss loss_DS.
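A minimal sketch of the loss computation of step b-8). The point-by-point and reconstruction terms follow the formulas in the text; the true-and-false and similarity terms are written here as simple least-squares objectives purely for illustration, since the patent's exact formulas for loss_DTF and loss_DS are given only in its figures.

```python
import torch
import torch.nn.functional as F

def pet_losses(lm_fake, lm_s, lm_t, d_tf_value, d_s_score):
    """Key point adjustment losses of step b-8).

    lm_fake: adjusted key points; lm_s / lm_t: source / target key points;
    d_tf_value: output of the true-and-false discriminator D_TF for lm_fake;
    d_s_score: similarity score output by the similarity discriminator D_S.
    """
    loss_l1 = F.mse_loss(lm_fake, lm_s)       # point-by-point loss ||lm_fake - lm_s||_2
    loss_cycle = F.mse_loss(lm_fake, lm_t)    # reconstruction loss ||lm_fake - lm_t||_2
    # Assumed least-squares forms for the adversarial terms (not the patent's exact formulas):
    loss_dtf = F.mse_loss(d_tf_value, torch.ones_like(d_tf_value))
    loss_ds = F.mse_loss(d_s_score, torch.ones_like(d_s_score))
    return loss_l1 + loss_cycle + loss_dtf + loss_ds
```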
In the step b-2), convolution kernels of the first downsampling convolution layer, the second downsampling convolution layer, the third downsampling convolution layer, the fourth downsampling convolution layer and the fifth downsampling convolution layer are all 1, step length is all 1, and padding is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0.
Further, step c) comprises the steps of:
c-1) Establishing a face image feature extraction network composed of an identity encoder E_id and an attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into the identity encoder E_id; the interpolate() function adjusts the source image Pic_s to a 112 × 112 resolution, the 112 × 112 image is input into the Arcface algorithm, and the identity vector F_v of shape b × c × h × w is output, where b is the training batch, c is the number of channels, h is the image height and w is the image width. The identity vector F_v is input sequentially into a padding layer and a regularization layer, which outputs the identity feature F_id.
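A minimal sketch of step c-2), assuming a pretrained ArcFace backbone is available as a callable `arcface` that returns an embedding; treating the regularization layer as L2 normalization is an assumption and the padding step is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def extract_identity(pic_s: torch.Tensor, arcface) -> torch.Tensor:
    """pic_s: source image tensor of shape (b, 3, H, W).

    Resizes to the 112x112 input resolution required by ArcFace, then
    normalizes the resulting identity vector."""
    x = F.interpolate(pic_s, size=(112, 112), mode="bilinear", align_corners=False)
    f_id = arcface(x)                         # identity vector
    f_id = F.normalize(f_id, p=2, dim=1)      # regularization layer (assumed L2 normalization)
    return f_id
```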
c-3) The attribute encoder E_attr consists of a first down-sampling residual block, a second down-sampling residual block, a third down-sampling residual block, a fourth down-sampling residual block, a fifth down-sampling residual block, a first bottleneck residual block and a second bottleneck residual block. Each down-sampling residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a down-sampling layer and a residual connection layer; each bottleneck residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first down-sampling residual block of the attribute encoder E_attr, which outputs the attribute feature F_attr^1; the attribute feature F_attr^1 is input into the second down-sampling residual block, which outputs the attribute feature F_attr^2; the attribute feature F_attr^2 is input into the third down-sampling residual block, which outputs the attribute feature F_attr^3; the attribute feature F_attr^3 is input into the fourth down-sampling residual block, which outputs the attribute feature F_attr^4; the attribute feature F_attr^4 is input into the fifth down-sampling residual block, which outputs the attribute feature F_attr^5; the attribute feature F_attr^5 is input into the first bottleneck residual block, which outputs the attribute feature F_attr^6; the attribute feature F_attr^6 is input into the second bottleneck residual block, which outputs the attribute feature F_attr.
In the step c-3), the first normalization layer and the second normalization layer in the first, second, third, fourth and fifth down-sampling residual blocks all use BatchNorm2d; in the step c-3), the convolution kernels of the first convolution layer and the second convolution layer in the first, second, third, fourth and fifth down-sampling residual blocks are all 3, and filling and step length are all 1.
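The down-sampling residual block of step c-3) can be sketched as follows. The layer ordering, channel handling and the choice of average pooling for the down-sampling layer are assumptions; the patent only fixes BatchNorm2d normalization and 3×3 convolutions with stride and padding 1.

```python
import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    """Down-sampling residual block of the attribute encoder E_attr (step c-3)):
    BatchNorm -> ReLU -> 3x3 conv -> BatchNorm -> ReLU -> 3x3 conv -> downsample,
    with a residual connection from the block input."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(in_ch)
        self.norm2 = nn.BatchNorm2d(out_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.down = nn.AvgPool2d(2)                           # down-sampling layer (assumed pooling)
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # residual connection path
    def forward(self, x):
        h = self.conv1(torch.relu(self.norm1(x)))
        h = self.conv2(torch.relu(self.norm2(h)))
        return self.down(h + self.skip(x))                    # residual connection, then downsample
```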
Further, step d) comprises the following steps:
d-1) Establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet; the target image Pic_t is input into the face parsing module and parsed to obtain each part of the face, and the parsed face parts are filled with color to obtain an image Pic_bg in which only the background area is kept;
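Step d-2) can be illustrated as below. Here `parse_face` stands in for a BiSeNet-style face parsing model that returns a per-pixel label map; treating label 0 as the background class and filling the face region with a constant color are assumptions.

```python
import torch

def keep_background_only(pic_t: torch.Tensor, parse_face, fill_value: float = 0.0) -> torch.Tensor:
    """pic_t: (b, 3, H, W) target image.

    Returns Pic_bg in which the parsed face parts are filled with a constant
    color so that only the background area is kept."""
    labels = parse_face(pic_t)                               # (b, H, W) integer label map
    background_mask = (labels == 0).unsqueeze(1).float()     # assume class 0 is background
    return pic_t * background_mask + fill_value * (1.0 - background_mask)
```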
d-3) The background information encoder E_bg consists of a first self-attention module, a second self-attention module, a third self-attention module, a fourth self-attention module and a fifth self-attention module, where each self-attention module consists, in order, of a down-sampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of the background information encoder E_bg, which outputs the background feature F_bg^1; the background feature F_bg^1 is input into the second self-attention module, which outputs the background feature F_bg^2; the background feature F_bg^2 is input into the third self-attention module, which outputs the background feature F_bg^3; the background feature F_bg^3 is input into the fourth self-attention module, which outputs the background feature F_bg^4; the background feature F_bg^4 is input into the fifth self-attention module, which outputs the background feature F_bg.
In the step d-3), the convolution kernels of the downsampling convolution layers of the first self-attention module, the second self-attention module, the third self-attention module, the fourth self-attention module and the fifth self-attention module are all 3, the step length is 0, and the padding is 0.
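Each self-attention module of step d-3) combines a down-sampling convolution, a self-attention layer and a ReLU activation. The sketch below uses a stride-2 convolution and a standard non-local-style attention layer; both choices, and the channel sizes, are assumptions rather than parameters fixed by the patent.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Simple non-local-style self-attention over spatial positions."""
    def __init__(self, ch: int):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))
    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)     # (b, hw, c/8)
        k = self.k(x).flatten(2)                     # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)          # (b, hw, hw)
        v = self.v(x).flatten(2)                     # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out

class BackgroundBlock(nn.Module):
    """One module of E_bg: down-sampling convolution -> self-attention -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)  # stride 2 assumed
        self.attn = SelfAttention2d(out_ch)
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(self.attn(self.conv(x)))
```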
Further, step e) comprises the steps of:
e-1) establishing a generation network consisting of a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of a first fusion block, a second fusion block, a third fusion block, a fourth fusion block, a fifth fusion block and a sixth fusion block, where each fusion block consists, in order, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module; after passing through the first convolution layer of the first fusion block, the attribute feature F_attr^(1,1) is obtained. The identity feature F_id and the attribute feature F_attr^(1,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(1,1) is calculated by the formula F_fuse^(1,1) = σ_id · (F_attr^(1,1) − μ(F_attr^(1,1))) / σ(F_attr^(1,1)) + μ_id, where σ_id is the standard deviation of the identity feature F_id, μ_id is the channel average of the identity feature F_id, μ(F_attr^(1,1)) is the channel average of the attribute feature F_attr^(1,1), μ(·) is the channel averaging operation and σ(·) is the standard deviation operation. The fusion feature F_fuse^(1,1) is input into the ReLU activation layer to obtain the feature F_r^1; the feature F_r^1 is input into the second convolution layer to obtain the attribute feature F_attr^(1,2). The identity feature F_id and the attribute feature F_attr^(1,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(1,2) is calculated by the formula F_fuse^(1,2) = σ_id · (F_attr^(1,2) − μ(F_attr^(1,2))) / σ(F_attr^(1,2)) + μ_id, where μ(F_attr^(1,2)) is the channel average of the attribute feature F_attr^(1,2);
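Each fusion block of steps e-2) to e-7) applies adaptive instance normalization (AdaIN) twice, replacing the channel statistics of the attribute feature with those of the identity feature. A minimal sketch, assuming the identity statistics are taken directly per channel (as the formula description suggests) and that F_id has the same number of channels as the attribute feature; the channel count is an assumption, while kernel 3, stride 1 and padding 0 follow the stated parameters.

```python
import torch
import torch.nn as nn

def adain(f_attr: torch.Tensor, f_id: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: normalize the attribute feature by its own
    channel statistics, then rescale/shift it with the identity feature's statistics."""
    mu_attr = f_attr.mean(dim=(2, 3), keepdim=True)
    sigma_attr = f_attr.std(dim=(2, 3), keepdim=True) + eps
    mu_id = f_id.mean(dim=(2, 3), keepdim=True)      # channel average of F_id
    sigma_id = f_id.std(dim=(2, 3), keepdim=True)    # channel standard deviation of F_id
    return sigma_id * (f_attr - mu_attr) / sigma_attr + mu_id

class FusionBlock(nn.Module):
    """conv -> AdaIN -> ReLU -> conv -> AdaIN, as in one fusion block of the fusion module."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=0)
    def forward(self, f_prev, f_id):
        h = adain(self.conv1(f_prev), f_id)   # first adaptive instance normalization
        h = torch.relu(h)
        return adain(self.conv2(h), f_id)     # second adaptive instance normalization
```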
e-3) The fusion feature F_fuse^(1,2) is input into the fusion module; after passing through the first convolution layer of the second fusion block, the attribute feature F_attr^(2,1) is obtained. The identity feature F_id and the attribute feature F_attr^(2,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(2,1) is calculated by the formula F_fuse^(2,1) = σ_id · (F_attr^(2,1) − μ(F_attr^(2,1))) / σ(F_attr^(2,1)) + μ_id, where μ(F_attr^(2,1)) is the channel average of the attribute feature F_attr^(2,1). The fusion feature F_fuse^(2,1) is input into the ReLU activation layer to obtain the feature F_r^2; the feature F_r^2 is input into the second convolution layer to obtain the attribute feature F_attr^(2,2). The identity feature F_id and the attribute feature F_attr^(2,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(2,2) is calculated by the formula F_fuse^(2,2) = σ_id · (F_attr^(2,2) − μ(F_attr^(2,2))) / σ(F_attr^(2,2)) + μ_id, where μ(F_attr^(2,2)) is the channel average of the attribute feature F_attr^(2,2);
e-4) The fusion feature F_fuse^(2,2) is input into the fusion module; after passing through the first convolution layer of the third fusion block, the attribute feature F_attr^(3,1) is obtained. The identity feature F_id and the attribute feature F_attr^(3,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(3,1) is calculated by the formula F_fuse^(3,1) = σ_id · (F_attr^(3,1) − μ(F_attr^(3,1))) / σ(F_attr^(3,1)) + μ_id, where μ(F_attr^(3,1)) is the channel average of the attribute feature F_attr^(3,1). The fusion feature F_fuse^(3,1) is input into the ReLU activation layer to obtain the feature F_r^3; the feature F_r^3 is input into the second convolution layer to obtain the attribute feature F_attr^(3,2). The identity feature F_id and the attribute feature F_attr^(3,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(3,2) is calculated by the formula F_fuse^(3,2) = σ_id · (F_attr^(3,2) − μ(F_attr^(3,2))) / σ(F_attr^(3,2)) + μ_id, where μ(F_attr^(3,2)) is the channel average of the attribute feature F_attr^(3,2);
e-5) The fusion feature F_fuse^(3,2) is input into the fusion module; after passing through the first convolution layer of the fourth fusion block, the attribute feature F_attr^(4,1) is obtained. The identity feature F_id and the attribute feature F_attr^(4,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(4,1) is calculated by the formula F_fuse^(4,1) = σ_id · (F_attr^(4,1) − μ(F_attr^(4,1))) / σ(F_attr^(4,1)) + μ_id, where μ(F_attr^(4,1)) is the channel average of the attribute feature F_attr^(4,1). The fusion feature F_fuse^(4,1) is input into the ReLU activation layer to obtain the feature F_r^4; the feature F_r^4 is input into the second convolution layer to obtain the attribute feature F_attr^(4,2). The identity feature F_id and the attribute feature F_attr^(4,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(4,2) is calculated by the formula F_fuse^(4,2) = σ_id · (F_attr^(4,2) − μ(F_attr^(4,2))) / σ(F_attr^(4,2)) + μ_id, where μ(F_attr^(4,2)) is the channel average of the attribute feature F_attr^(4,2);
e-6) The fusion feature F_fuse^(4,2) is input into the fifth fusion block of the fusion module; after passing through the first convolution layer of the fifth fusion block, the attribute feature F_attr^(5,1) is obtained. The identity feature F_id and the attribute feature F_attr^(5,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(5,1) is calculated by the formula F_fuse^(5,1) = σ_id · (F_attr^(5,1) − μ(F_attr^(5,1))) / σ(F_attr^(5,1)) + μ_id, where μ(F_attr^(5,1)) is the channel average of the attribute feature F_attr^(5,1). The fusion feature F_fuse^(5,1) is input into the ReLU activation layer to obtain the feature F_r^5; the feature F_r^5 is input into the second convolution layer to obtain the attribute feature F_attr^(5,2). The identity feature F_id and the attribute feature F_attr^(5,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(5,2) is calculated by the formula F_fuse^(5,2) = σ_id · (F_attr^(5,2) − μ(F_attr^(5,2))) / σ(F_attr^(5,2)) + μ_id, where μ(F_attr^(5,2)) is the channel average of the attribute feature F_attr^(5,2);
e-7) The fusion feature F_fuse^(5,2) is input into the fusion module; after passing through the first convolution layer of the sixth fusion block, the attribute feature F_attr^(6,1) is obtained. The identity feature F_id and the attribute feature F_attr^(6,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(6,1) is calculated by the formula F_fuse^(6,1) = σ_id · (F_attr^(6,1) − μ(F_attr^(6,1))) / σ(F_attr^(6,1)) + μ_id, where μ(F_attr^(6,1)) is the channel average of the attribute feature F_attr^(6,1). The fusion feature F_fuse^(6,1) is input into the ReLU activation layer to obtain the feature F_r^6; the feature F_r^6 is input into the second convolution layer to obtain the attribute feature F_attr^(6,2). The identity feature F_id and the attribute feature F_attr^(6,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(6,2) is calculated by the formula F_fuse^(6,2) = σ_id · (F_attr^(6,2) − μ(F_attr^(6,2))) / σ(F_attr^(6,2)) + μ_id, where μ(F_attr^(6,2)) is the channel average of the attribute feature F_attr^(6,2);
e-8) The optimized feature key points lm_fake are input into two convolution layers respectively to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated from the feature F_gamma, the feature F_beta and the fusion feature F_fuse^(6,2) output by the sixth fusion block;
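Step e-8) injects the adjusted key points through two 1×1 convolutions that produce modulation maps F_gamma and F_beta (cf. the spatially adaptive normalization of FIG. 5). The exact combination formula appears only in the patent's figure; the sketch below assumes the common scale-and-shift form F_fuse = F_gamma · F + F_beta, and assumes lm_fake has been rendered into a spatial representation matching the fusion feature's size.

```python
import torch
import torch.nn as nn

class KeypointModulation(nn.Module):
    """Maps a spatial key point representation to per-pixel scale (F_gamma) and
    shift (F_beta) maps via two 1x1 convolutions, then modulates the fusion feature."""
    def __init__(self, lm_channels: int, feat_channels: int):
        super().__init__()
        self.to_gamma = nn.Conv2d(lm_channels, feat_channels, kernel_size=1, stride=1, padding=0)
        self.to_beta = nn.Conv2d(lm_channels, feat_channels, kernel_size=1, stride=1, padding=0)
    def forward(self, f_fusion: torch.Tensor, lm_fake_map: torch.Tensor) -> torch.Tensor:
        # lm_fake_map must have the same spatial size as f_fusion (assumption).
        f_gamma = self.to_gamma(lm_fake_map)
        f_beta = self.to_beta(lm_fake_map)
        return f_gamma * f_fusion + f_beta     # assumed scale-and-shift combination
```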
e-9) The up-sampling module consists of a first up-sampling layer, a second up-sampling layer, a third up-sampling layer, a fourth up-sampling layer and a fifth up-sampling layer. A background feature output by the background information encoder E_bg and the fusion vector F_fuse are input together into the first up-sampling layer of the up-sampling module, which outputs the feature F_up^1; the feature F_up^1 and a background feature are input together into the second up-sampling layer, which outputs the feature F_up^2; the feature F_up^2 and a background feature are input together into the third up-sampling layer, which outputs the feature F_up^3; the feature F_up^3 and a background feature are input together into the fourth up-sampling layer, which outputs the feature F_up^4; the feature F_up^4 and the background feature F_bg are input together into the fifth up-sampling layer, which outputs the face image Pic_fake;
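Step e-9) decodes the fusion vector back to an image while re-injecting a background feature at every scale. A sketch of one up-sampling layer, assuming the background feature is concatenated with the incoming feature on the channel axis before a 3×3 stride-1 convolution preceded by bilinear up-sampling; the combination operator and layer widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpLayer(nn.Module):
    """One up-sampling layer of the generator decoder with a background skip input."""
    def __init__(self, in_ch: int, bg_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + bg_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()
    def forward(self, x: torch.Tensor, f_bg: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        f_bg = F.interpolate(f_bg, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return self.act(self.conv(torch.cat([x, f_bg], dim=1)))   # assumed concatenation
```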
e-10) The discriminator module consists of a first down-sampling convolution layer, a second down-sampling convolution layer, a third down-sampling convolution layer, a fourth down-sampling convolution layer, a fifth down-sampling convolution layer, a sixth down-sampling convolution layer and a Sigmoid function layer. The face image Pic_fake is input into the first down-sampling convolution layer, which outputs the feature F_D^1; the feature F_D^1 is input into the second down-sampling convolution layer, which outputs the feature F_D^2; the feature F_D^2 is input into the third down-sampling convolution layer, which outputs the feature F_D^3; the feature F_D^3 is input into the fourth down-sampling convolution layer, which outputs the feature F_D^4; the feature F_D^4 is input into the fifth down-sampling convolution layer, which outputs the feature F_D^5; the feature F_D^5 is input into the sixth down-sampling convolution layer, which outputs the feature F_D^6; the feature F_D^6 is input into the Sigmoid function layer, which outputs the value V_fake. The target image Pic_t is input into the first down-sampling convolution layer, which outputs the feature F_T^1; the feature F_T^1 is input into the second down-sampling convolution layer, which outputs the feature F_T^2; the feature F_T^2 is input into the third down-sampling convolution layer, which outputs the feature F_T^3; the feature F_T^3 is input into the fourth down-sampling convolution layer, which outputs the feature F_T^4; the feature F_T^4 is input into the fifth down-sampling convolution layer, which outputs the feature F_T^5; the feature F_T^5 is input into the sixth down-sampling convolution layer, which outputs the feature F_T^6; the feature F_T^6 is input into the Sigmoid function layer, which outputs the value V_real;
e-11) The identity loss l1 is calculated from the identity feature of the generated face image Pic_fake and the identity feature F_id of the source image; the reconstruction loss l2 is calculated by the formula l2 = ||Pic_fake − Pic_t||_2; the attribute loss l3 is calculated from the attribute features of the face image Pic_fake and the target image Pic_t. The face image Pic_fake is iteratively optimized by back propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
Further, in the step e-2), convolution kernels of the first convolution layer and the second convolution layer of the first fusion block, the second fusion block, the third fusion block, the fourth fusion block, the fifth fusion block and the sixth fusion block are all 3, step length is 1, and filling is 0; in the step e-8), the convolution kernels of the two convolution layers are both 1, the step length is both 1, and the filling is both 0; in the step e-9), convolution kernels of the first upper sampling layer, the second upper sampling layer, the third upper sampling layer and the fourth upper sampling layer are all 3, step length is 1, filling is 1, convolution kernels of the fifth upper sampling layer are 7, step length is 1, and filling is 0; e-10), the convolution kernels of the first down-sampling convolution layer, the second down-sampling convolution layer and the third down-sampling convolution layer are all 4 x 4, the step lengths are all 2, the padding is all 1, the convolution kernels of the fourth down-sampling convolution layer, the fifth down-sampling convolution layer and the sixth down-sampling convolution layer are all 4 x 4, the step lengths are all 1, and the padding is all 1.
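For step e-11), only the reconstruction term is fully legible in the text (l2 = ||Pic_fake − Pic_t||_2); the identity and attribute terms are given in the patent's figures. The sketch below uses the common choices of a cosine identity loss and an L2 attribute loss, which should be read as assumptions, not the patent's exact formulas.

```python
import torch
import torch.nn.functional as F

def generator_losses(pic_fake, pic_t, id_encoder, attr_encoder, f_id_source):
    """Identity loss l1, reconstruction loss l2 and attribute loss l3 of step e-11).

    id_encoder / attr_encoder: the identity and attribute encoders of step c);
    f_id_source: identity feature F_id extracted from the source image Pic_s.
    """
    f_id_fake = id_encoder(pic_fake)
    l1 = 1.0 - F.cosine_similarity(f_id_fake, f_id_source, dim=1).mean()  # identity loss (assumed cosine form)
    l2 = F.mse_loss(pic_fake, pic_t)                                      # reconstruction loss ||Pic_fake - Pic_t||_2
    l3 = F.mse_loss(attr_encoder(pic_fake), attr_encoder(pic_t))          # attribute loss (assumed L2 form)
    return l1 + l2 + l3
```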
The beneficial effects of the invention are as follows: identity information, attribute information and background information are extracted from each image, the information is fused by feature fusion, and the final result is obtained from the fused information by image generation. The method introduces feature key points that guide changes in face shape, and by adding background information during training it produces face images whose face shape is changed while the image quality remains stable.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of the key point extraction and adjustment structure of the present invention;
FIG. 3 is a diagram of a key point discriminator network architecture of the present invention;
FIG. 4 is a diagram of an attribute extraction structure and a downsampling structure according to the present invention;
FIG. 5 is a block diagram illustrating an exemplary spatial adaptive normalization architecture according to the present invention;
FIG. 6 is a diagram of semantic parsing and background information extraction according to the present invention.
Detailed Description
The present invention is further described with reference to fig. 1 to 6.
A face image identity synthesis method based on semantic guidance comprises the following steps:
a) Extracting key points of the face image from all face images in the CelebA face image data set.
b) Establishing a PET key point adjustment network, inputting the key points of the face image into the PET key point adjustment network to obtain the feature key points lm_fake, and iterating on the feature key points lm_fake to obtain the optimized feature key points lm_fake.
c) Establishing a face image feature extraction network, inputting a source image Pic_s and a target image Pic_t from the CelebA face image data set into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively.
d) Establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg.
e) Establishing a generation network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generation network to obtain the face image Pic_fake, and iterating on the face image Pic_fake to obtain the optimized face image Pic_fake.
f) Repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed face contour. The method provides feature key points that semantically guide the change of face shape, extracts identity information, attribute information and background information from each image, fuses this information by feature fusion, and finally obtains the final result from the fused information by image generation. The method introduces feature key points that guide changes in face shape, and by adding background information during training it produces face images whose face shape is changed while the image quality remains stable.
Example 1:
the step a) comprises the following steps:
a-1) Detecting key points of all face images in the CelebA face image data set by using the face key point detection algorithm H3R; the key points extracted from a source image Pic_s in the CelebA face image data set are denoted as the source key points lm_s, and the key points extracted from a target image Pic_t in the CelebA face image data set are denoted as the target key points lm_t. The CelebA face image data set consists of 30000 face images of different identities, the resolution of each image is 512 × 512, and the source image Pic_s and the target image Pic_t are both images in the CelebA data set.
Example 2:
the step b) comprises the following steps:
b-1) Establishing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true-and-false discriminator D_TF.
b-2) The source encoder E_lms consists of a first down-sampling convolution layer, a second down-sampling convolution layer, a third down-sampling convolution layer, a fourth down-sampling convolution layer and a fifth down-sampling convolution layer. The source key points lm_s are input into the first down-sampling convolution layer of the source encoder E_lms, which outputs the feature information F_lms^1; the feature information F_lms^1 is input into the second down-sampling convolution layer, which outputs the feature information F_lms^2; the feature information F_lms^2 is input into the third down-sampling convolution layer, which outputs the feature information F_lms^3; the feature information F_lms^3 is input into the fourth down-sampling convolution layer, which outputs the feature information F_lms^4; the feature information F_lms^4 is input into the fifth down-sampling convolution layer, which outputs the feature information F_lms^5.
b-3) The target encoder E_lmt consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer and a fifth fully connected layer. The target key points lm_t are input into the first fully connected layer of the target encoder E_lmt, which outputs the feature information F_lmt^1; the feature information F_lmt^1 is input into the second fully connected layer, which outputs the feature information F_lmt^2; the feature information F_lmt^2 is input into the third fully connected layer, which outputs the feature information F_lmt^3; the feature information F_lmt^3 is input into the fourth fully connected layer, which outputs the feature information F_lmt^4; the feature information F_lmt^4 is input into the fifth fully connected layer, which outputs the feature information F_lmt^5.
b-4) The cat() function stacks the feature information F_lms^5 and the feature information F_lmt^5 to obtain the feature vector F_lm.
b-5) The key point generator G_lm consists of a first up-sampling convolution layer, a second up-sampling convolution layer, a third up-sampling convolution layer, a fourth up-sampling convolution layer and a fifth up-sampling convolution layer. The feature vector F_lm is input into the first up-sampling convolution layer of the key point generator G_lm, which outputs the feature key points lm_fake^1; the feature key points lm_fake^1 are input into the second up-sampling convolution layer, which outputs the feature key points lm_fake^2; the feature key points lm_fake^2 are input into the third up-sampling convolution layer, which outputs the feature key points lm_fake^3; the feature key points lm_fake^3 are input into the fourth up-sampling convolution layer, which outputs the feature key points lm_fake^4; the feature key points lm_fake^4 are input into the fifth up-sampling convolution layer, which outputs the feature key points lm_fake, whose dimension is 1 × 212.
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; the feature information F_fake^1 is input into the second fully connected layer of the Layer_fake module, which outputs the feature information F_fake^2; the feature information F_fake^2 is input into the third fully connected layer of the Layer_fake module, which outputs the feature information F_fake^3; the feature information F_fake^3 is input into the fourth fully connected layer of the Layer_fake module, which outputs the feature information F_fake^4. The Layer_s module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; the feature information F_s^1 is input into the second fully connected layer of the Layer_s module, which outputs the feature information F_s^2; the feature information F_s^2 is input into the third fully connected layer of the Layer_s module, which outputs the feature information F_s^3; the feature information F_s^3 is input into the fourth fully connected layer of the Layer_s module, which outputs the feature information F_s^4. The cat() function stacks the feature information F_fake^4 and the feature information F_s^4 to obtain the feature vector F_c. The Layer_c module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; the similarity feature Fscore1 is input into the second fully connected layer of the Layer_c module, which outputs the similarity feature Fscore2; the similarity feature Fscore2 is input into the third fully connected layer of the Layer_c module, which outputs the similarity feature Fscore3; the similarity feature Fscore3 is input into the fourth fully connected layer of the Layer_c module, which outputs the similarity score.
b-7) The true-and-false discriminator D_TF consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer, a fifth fully connected layer and a sixth fully connected layer. The feature key points lm_fake are input into the first fully connected layer of the true-and-false discriminator D_TF, which outputs the feature F_TF^1; the feature F_TF^1 is input into the second fully connected layer, which outputs the feature F_TF^2; the feature F_TF^2 is input into the third fully connected layer, which outputs the feature F_TF^3; the feature F_TF^3 is input into the fourth fully connected layer, which outputs the feature F_TF^4; the feature F_TF^4 is input into the fifth fully connected layer, which outputs the feature F_TF^5; the feature F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value V_TF.
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true-and-false loss loss_DTF is calculated from the value V_TF output by the true-and-false discriminator D_TF, and the similarity loss loss_DS is calculated from the similarity score output by the similarity discriminator D_S. The feature key points lm_fake are iteratively optimized by back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true-and-false loss loss_DTF and the similarity loss loss_DS.
Example 3:
in the step b-2), the convolution kernels of the first down-sampling convolution layer, the second down-sampling convolution layer, the third down-sampling convolution layer, the fourth down-sampling convolution layer and the fifth down-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0.
Example 4:
the step c) comprises the following steps:
c-1) Establishing a face image feature extraction network composed of an identity encoder E_id and an attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into the identity encoder E_id; the interpolate() function adjusts the source image Pic_s to a 112 × 112 resolution, the 112 × 112 image is input into the Arcface algorithm, and the identity vector F_v of shape b × c × h × w is output, where b is the training batch, c is the number of channels, h is the image height and w is the image width. The identity vector F_v is input sequentially into a padding layer and a regularization layer, which outputs the identity feature F_id.
c-3) The attribute encoder E_attr consists of a first down-sampling residual block, a second down-sampling residual block, a third down-sampling residual block, a fourth down-sampling residual block, a fifth down-sampling residual block, a first bottleneck residual block and a second bottleneck residual block. Each down-sampling residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a down-sampling layer and a residual connection layer; each bottleneck residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first down-sampling residual block of the attribute encoder E_attr, which outputs the attribute feature F_attr^1; the attribute feature F_attr^1 is input into the second down-sampling residual block, which outputs the attribute feature F_attr^2; the attribute feature F_attr^2 is input into the third down-sampling residual block, which outputs the attribute feature F_attr^3; the attribute feature F_attr^3 is input into the fourth down-sampling residual block, which outputs the attribute feature F_attr^4; the attribute feature F_attr^4 is input into the fifth down-sampling residual block, which outputs the attribute feature F_attr^5; the attribute feature F_attr^5 is input into the first bottleneck residual block, which outputs the attribute feature F_attr^6; the attribute feature F_attr^6 is input into the second bottleneck residual block, which outputs the attribute feature F_attr.
Example 5:
In the step c-3), the first normalization layer and the second normalization layer in the first, second, third, fourth and fifth down-sampling residual blocks all use BatchNorm2d; in the step c-3), the convolution kernels of the first convolution layer and the second convolution layer in the first, second, third, fourth and fifth down-sampling residual blocks are all 3, and filling and step length are all 1.
Example 6:
the step d) comprises the following steps:
d-1) Establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet; the target image Pic_t is input into the face parsing module and parsed to obtain each part of the face, and the parsed face parts are filled with color to obtain an image Pic_bg in which only the background area is kept;
d-3) The background information encoder E_bg is composed of a first, second, third, fourth and fifth self-attention module, each of which consists, in order, of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of the background information encoder E_bg, and the background feature F_bg^1 is output; F_bg^1 is input into the second self-attention module, and the background feature F_bg^2 is output; F_bg^2 is input into the third self-attention module, and the background feature F_bg^3 is output; F_bg^3 is input into the fourth self-attention module, and the background feature F_bg^4 is output; F_bg^4 is input into the fifth self-attention module, and the background feature F_bg is output.
Example 7:
In step d-3), the convolution kernels of the downsampling convolution layers of the first through fifth self-attention modules are all 3, the stride is 0, and the padding is 0.
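As an illustration of step d-3), the sketch below shows one way to realize a single self-attention module of the background information encoder E_bg (downsampling convolution, then self-attention over spatial positions, then ReLU); the stride of 2 for the downsampling convolution, the channel widths and the single-head attention formulation are assumptions, and the literal stride/padding values of Example 7 are not reproduced here.

```python
import torch
import torch.nn as nn

class SelfAttentionModule(nn.Module):
    """Sketch of one module of E_bg: downsampling conv -> self-attention -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Downsampling convolution; stride 2 is an assumption.
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        # Single-head self-attention over flattened spatial positions (assumed form).
        self.q = nn.Conv2d(out_ch, out_ch // 8, kernel_size=1)
        self.k = nn.Conv2d(out_ch, out_ch // 8, kernel_size=1)
        self.v = nn.Conv2d(out_ch, out_ch, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.down(x)
        b, c, hgt, wid = h.shape
        q = self.q(h).flatten(2).transpose(1, 2)        # (B, HW, C//8)
        k = self.k(h).flatten(2)                        # (B, C//8, HW)
        v = self.v(h).flatten(2)                        # (B, C, HW)
        attn = torch.softmax(q @ k, dim=-1)             # (B, HW, HW) attention map
        out = (v @ attn.transpose(1, 2)).view(b, c, hgt, wid)
        return self.relu(h + self.gamma * out)          # attention residual + ReLU
```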
Example 8:
step e) comprises the following steps:
e-1) establishing a generation network consisting of a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module is composed of a first, second, third, fourth, fifth and sixth fusion block, each of which consists, in order, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the first fusion block of the fusion module; after the first convolution layer of the first fusion block an intermediate attribute feature F_attr' is obtained. The identity feature F_id and the attribute feature F_attr' are input into the first adaptive instance normalization layer, and the fusion feature is computed by the formula AdaIN(F_attr', F_id) = σ_id * (F_attr' - μ(F_attr')) / σ(F_attr') + μ_id, where σ_id and μ_id are the standard deviation and channel mean of the identity feature F_id, μ(·) is the channel-mean operation and σ(·) is the standard-deviation operation. The resulting fusion feature is input into the ReLU activation layer, the activated feature is input into the second convolution layer to obtain a further attribute feature, and the identity feature F_id and this attribute feature are input into the second adaptive instance normalization layer; applying the same formula yields the fusion feature F_fuse^1 output by the first fusion block;
e-3) The fusion feature F_fuse^1 is input into the second fusion block of the fusion module and processed exactly as in step e-2): first convolution layer, first adaptive instance normalization with the identity feature F_id, ReLU activation, second convolution layer, and second adaptive instance normalization with F_id; the fusion feature F_fuse^2 is output;
e-4) The fusion feature F_fuse^2 is input into the third fusion block of the fusion module and processed in the same way, with the identity feature F_id injected at both adaptive instance normalization layers; the fusion feature F_fuse^3 is output;
e-5) The fusion feature F_fuse^3 is input into the fourth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^4 is output;
e-6) The fusion feature F_fuse^4 is input into the fifth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^5 is output;
e-7) The fusion feature F_fuse^5 is input into the sixth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^6 is output;
e-8) The optimized feature key points lm_fake are input into two convolution layers to obtain the feature F_gamma and the feature F_beta respectively, and the fusion vector F_fuse is computed by modulating the fusion feature F_fuse^6 with F_gamma and F_beta;
e-9) The up-sampling module is composed of a first, second, third, fourth and fifth up-sampling layer. The fusion vector F_fuse and one of the multi-scale background features F_bg^1 to F_bg^4 are input into the first up-sampling layer of the up-sampling module, and the feature F_up^1 is output; F_up^1 and a further background feature are input together into the second up-sampling layer, and the feature F_up^2 is output; F_up^2 and a further background feature are input together into the third up-sampling layer, and the feature F_up^3 is output; F_up^3 and a further background feature are input together into the fourth up-sampling layer, and the feature F_up^4 is output; F_up^4 and the background feature F_bg are input together into the fifth up-sampling layer, and the face image Pic_fake is output;
e-10) The discriminator module comprises a first, second, third, fourth, fifth and sixth downsampling convolution layer and a Sigmoid function layer. The face image Pic_fake is passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_fake; the target image Pic_t is likewise passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_t;
e-11) The identity loss l1 is computed between the identity features of the generated face image Pic_fake and of the source image Pic_s; the reconstruction loss is computed by the formula l2 = ||Pic_fake - Pic_t||_2; the attribute loss l3 is computed between the attribute features of Pic_fake and of the target image Pic_t; the face image Pic_fake is then iteratively optimized with the identity loss l1, the reconstruction loss l2 and the attribute loss l3 by back-propagation.
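For step e-11), a hedged sketch of the three training losses is given below; the cosine-distance form of the identity loss, the comparison of final attribute features for the attribute loss and the loss weighting are assumptions, while the reconstruction loss follows the stated formula l2 = ||Pic_fake - Pic_t||_2.

```python
import torch
import torch.nn.functional as F

def training_losses(pic_fake, pic_t, pic_s, identity_encoder, attribute_encoder):
    """Sketch of the identity loss l1, reconstruction loss l2 and attribute loss l3.
    identity_encoder and attribute_encoder stand for E_id and E_attr."""
    # l1: pull the generated face's identity towards the source identity
    # (the cosine-distance form is an assumption)
    id_fake = identity_encoder(pic_fake)
    id_src = identity_encoder(pic_s)
    l1 = (1.0 - F.cosine_similarity(id_fake, id_src, dim=1)).mean()

    # l2 = ||Pic_fake - Pic_t||_2, the stated reconstruction loss
    l2 = torch.norm(pic_fake - pic_t, p=2)

    # l3: keep the generated face's attributes close to the target's
    # (comparing only the final attribute features is an assumption)
    attr_fake = attribute_encoder(pic_fake)
    attr_t = attribute_encoder(pic_t)
    l3 = torch.norm(attr_fake - attr_t, p=2)

    # The weighting of the three terms is not specified here, so they are returned separately.
    return l1, l2, l3
```

How the three terms are combined before back-propagation is left to the caller.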
Example 9:
The convolution kernels of the first and second convolution layers of the first through sixth fusion blocks in step e-2) are all 3, the stride is 1, and the padding is 0; in step e-8), the convolution kernels of the two convolution layers are both 1, the stride is 1, and the padding is 0; in step e-9), the convolution kernels of the first, second, third and fourth up-sampling layers are all 3 with stride 1 and padding 1, and the convolution kernel of the fifth up-sampling layer is 7 with stride 1 and padding 0; in step e-10), the convolution kernels of the first, second and third downsampling convolution layers are all 4 x 4 with stride 2 and padding 1, and the convolution kernels of the fourth, fifth and sixth downsampling convolution layers are all 4 x 4 with stride 1 and padding 1.
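To make steps e-2) through e-7) concrete, the following PyTorch sketch shows one reading of a single fusion block (convolution, adaptive instance normalization against the identity feature, ReLU, convolution, adaptive instance normalization) using the Example 9 convolution parameters above (kernel 3, stride 1, padding 0); the exact AdaIN formula, the requirement that the identity feature be a feature map with the same channel count, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

def adain(f_attr: torch.Tensor, f_id: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: re-style the attribute feature with the
    mean and standard deviation of the identity feature (assumed form)."""
    b, c = f_attr.shape[:2]
    mu_attr = f_attr.reshape(b, c, -1).mean(dim=2).view(b, c, 1, 1)
    sigma_attr = f_attr.reshape(b, c, -1).std(dim=2, unbiased=False).view(b, c, 1, 1) + eps
    mu_id = f_id.reshape(b, c, -1).mean(dim=2).view(b, c, 1, 1)
    sigma_id = f_id.reshape(b, c, -1).std(dim=2, unbiased=False).view(b, c, 1, 1)
    return sigma_id * (f_attr - mu_attr) / sigma_attr + mu_id

class FusionBlock(nn.Module):
    """One fusion block of step e-2): conv -> AdaIN -> ReLU -> conv -> AdaIN.
    f_id is assumed to be a (B, C, h, w) feature map with the same channel count."""
    def __init__(self, channels: int):
        super().__init__()
        # Example 9: kernel 3, stride 1, padding 0 for both convolutions
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=0)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_fuse_prev: torch.Tensor, f_id: torch.Tensor) -> torch.Tensor:
        h = self.conv1(f_fuse_prev)
        h = adain(h, f_id)          # first adaptive instance normalization
        h = self.relu(h)
        h = self.conv2(h)
        return adain(h, f_id)       # second adaptive instance normalization
```

Chaining six such blocks, injecting F_id at every adaptive instance normalization call, and finally modulating the result with the keypoint-derived F_gamma and F_beta would mirror the e-2) to e-8) pipeline.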
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A face image identity synthesis method based on semantic guidance is characterized by comprising the following steps:
a) Extracting key points of the face image from all face images in the CelebA face image data set;
b) establishing a PET key point adjustment network, inputting the face image key points into the PET key point adjustment network to obtain the feature key points lm_fake, and iterating on the feature key points lm_fake to obtain the optimized feature key points lm_fake;
c) establishing a face image feature extraction network, inputting the source image Pic_s and the target image Pic_t from the CelebA face image dataset into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively;
d) establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg;
e) establishing a generation network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generation network to obtain the face image Pic_fake, and iterating on the image Pic_fake to obtain the optimized face image Pic_fake;
f) repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed face contour.
2. The face image identity synthesis method based on semantic guidance according to claim 1,
the step a) comprises the following steps:
a-1) detecting the key points of all face images in the CelebA face image dataset with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s of the CelebA face image dataset are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t of the CelebA face image dataset are denoted as the target key points lm_t.
3. The face image identity synthesis method based on semantic guidance according to claim 2,
the step b) comprises the following steps:
b-1) constructing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true-false discriminator D_TF;
b-2) the source encoder E_lms comprises a first, second, third, fourth and fifth downsampling convolution layer; the source key points lm_s are input into the first downsampling convolution layer of the source encoder E_lms and then passed in sequence through the second, third, fourth and fifth downsampling convolution layers, each layer taking the previous layer's output feature information as input, and the fifth downsampling convolution layer outputs the feature information F_lms;
b-3) the target encoder E_lmt comprises a first, second, third, fourth and fifth fully connected layer; the target key points lm_t are input into the first fully connected layer of the target encoder E_lmt and then passed in sequence through the second, third, fourth and fifth fully connected layers, each layer taking the previous layer's output feature information as input, and the fifth fully connected layer outputs the feature information F_lmt;
b-4) the feature information F_lms and the feature information F_lmt are stacked with the Cat() function to obtain the feature vector F_lm;
b-5) the key point generator G_lm is composed of a first, second, third, fourth and fifth up-sampling convolution layer; the feature vector F_lm is input into the first up-sampling convolution layer of the key point generator G_lm and then passed in sequence through the second, third, fourth and fifth up-sampling convolution layers, each layer taking the previous layer's output feature key points as input, and the fifth up-sampling convolution layer outputs the feature key points lm_fake;
b-6) the similarity discriminator D_S is composed of a Layer_s module, a Layer_fake module and a Layer_c module; the Layer_fake module consists of a first, second, third and fourth fully connected layer, and the feature key points lm_fake are passed in sequence through its four fully connected layers to obtain the feature information F_fake; the Layer_s module consists of a first, second, third and fourth fully connected layer, and the source key points lm_s are passed in sequence through its four fully connected layers to obtain the feature information F_s; the feature information F_fake and the feature information F_s are stacked with the Cat() function to obtain a feature vector, which is input into the Layer_c module; the Layer_c module consists of a first, second, third and fourth fully connected layer, whose first, second and third fully connected layers output the similarity features Fscore1, Fscore2 and Fscore3 in turn, and whose fourth fully connected layer outputs the similarity score;
b-7) the true-false discriminator D_TF is composed of a first, second, third, fourth, fifth and sixth fully connected layer; the feature key points lm_fake are input into the first fully connected layer of the true-false discriminator D_TF and then passed in sequence through the second, third, fourth, fifth and sixth fully connected layers, and the sixth fully connected layer outputs a 1-channel value;
b-8) the point-wise loss is computed by the formula loss_L1 = ||lm_fake - lm_s||_2, where ||·||_2 denotes the mean-square error; the reconstruction loss is computed by loss_Cycle = ||lm_fake - lm_t||_2; the true-false loss loss_DTF is computed from the output of the true-false discriminator D_TF, and the similarity loss loss_DS is computed from the output of the similarity discriminator D_S; the feature key points lm_fake are iteratively optimized by back-propagation using the point-wise loss loss_L1, the reconstruction loss loss_Cycle, the true-false loss loss_DTF and the similarity loss loss_DS.
4. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in the step b-2), the convolution kernels of the first down-sampling convolution layer, the second down-sampling convolution layer, the third down-sampling convolution layer, the fourth down-sampling convolution layer and the fifth down-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0.
5. The method for synthesizing the identity of the human face image based on semantic guidance according to claim 1, wherein the step c) comprises the following steps:
c-1) establishing a face image feature extraction network composed of an identity encoder E_id and an attribute encoder E_attr;
c-2) the identity encoder E_id consists of the Arcface algorithm; the source image Pic_s is input into the identity encoder E_id, the interpolate() function adjusts the source image Pic_s to a resolution of 112 x 112, the 112 x 112 image is input into the Arcface algorithm, and an identity vector of shape b x c x h x w is output, where b is the training batch size, c is the number of channels, h is the image height and w is the image width; the identity vector is input sequentially into a padding layer and a regularization layer, and the identity feature F_id is output;
c-3) the attribute encoder E_attr is composed of a first, second, third, fourth and fifth downsampling residual block, a first bottleneck residual block and a second bottleneck residual block; each downsampling residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer, and each bottleneck residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer; the target image Pic_t is input into the first downsampling residual block of the attribute encoder E_attr and then passed in sequence through the second, third, fourth and fifth downsampling residual blocks and the first and second bottleneck residual blocks, each block taking the previous block's output attribute feature as input, and the second bottleneck residual block outputs the attribute feature F_attr.
6. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in step c-3), the first and second normalization layers of the first through fifth downsampling residual blocks are all BatchNorm2d layers; the convolution kernels of the first and second convolution layers of the first through fifth downsampling residual blocks in step c-3) are all 3, and the padding and stride are all 1.
7. The method for synthesizing the identity of the human face image based on semantic guidance according to claim 1, wherein the step d) comprises the following steps:
d-1) establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) the face parsing module consists of the face parsing algorithm BiSeNet; the target image Pic_t is input into the face parsing module, the individual facial regions are obtained by parsing, and each facial region is filled with a solid color so that only the background region is retained, yielding the image Pic_bg;
d-3) the background information encoder E_bg is composed of a first, second, third, fourth and fifth self-attention module, each of which consists, in order, of a downsampling convolution layer, a self-attention layer and a ReLU activation layer; the image Pic_bg is input into the first self-attention module of the background information encoder E_bg, and the background feature F_bg^1 is output; F_bg^1 is input into the second self-attention module, and the background feature F_bg^2 is output; F_bg^2 is input into the third self-attention module, and the background feature F_bg^3 is output; F_bg^3 is input into the fourth self-attention module, and the background feature F_bg^4 is output; F_bg^4 is input into the fifth self-attention module, and the background feature F_bg is output.
8. The semantic guidance-based human face image identity synthesis method according to claim 1, characterized in that: in the step d-3), convolution kernels of the downsampling convolution layers of the first self-attention module, the second self-attention module, the third self-attention module, the fourth self-attention module and the fifth self-attention module are all 3, step length is all 0, and padding is all 0.
9. The method for synthesizing the identity of the human face image based on semantic guidance according to claim 7, wherein the step e) comprises the following steps:
e-1) establishing a generation network consisting of a fusion module, an up-sampling module and a discriminator module;
e-2) the fusion module is composed of a first, second, third, fourth, fifth and sixth fusion block, each of which consists, in order, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer; the attribute feature F_attr is input into the first fusion block of the fusion module, and after the first convolution layer of the first fusion block an intermediate attribute feature F_attr' is obtained; the identity feature F_id and the attribute feature F_attr' are input into the first adaptive instance normalization layer, and the fusion feature is computed by the formula AdaIN(F_attr', F_id) = σ_id * (F_attr' - μ(F_attr')) / σ(F_attr') + μ_id, where σ_id and μ_id are the standard deviation and channel mean of the identity feature F_id, μ(·) is the channel-mean operation and σ(·) is the standard-deviation operation; the resulting fusion feature is input into the ReLU activation layer, the activated feature is input into the second convolution layer to obtain a further attribute feature, and the identity feature F_id and this attribute feature are input into the second adaptive instance normalization layer; applying the same formula yields the fusion feature F_fuse^1 output by the first fusion block;
e-3) the fusion feature F_fuse^1 is input into the second fusion block of the fusion module and processed exactly as in step e-2): first convolution layer, first adaptive instance normalization with the identity feature F_id, ReLU activation, second convolution layer, and second adaptive instance normalization with F_id; the fusion feature F_fuse^2 is output;
e-4) the fusion feature F_fuse^2 is input into the third fusion block of the fusion module and processed in the same way, with the identity feature F_id injected at both adaptive instance normalization layers; the fusion feature F_fuse^3 is output;
e-5) the fusion feature F_fuse^3 is input into the fourth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^4 is output;
e-6) the fusion feature F_fuse^4 is input into the fifth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^5 is output;
e-7) the fusion feature F_fuse^5 is input into the sixth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^6 is output;
e-8) the optimized feature key points lm_fake are input into two convolution layers to obtain the feature F_gamma and the feature F_beta respectively, and the fusion vector F_fuse is computed by modulating the fusion feature F_fuse^6 with F_gamma and F_beta;
e-9) the up-sampling module comprises a first, second, third, fourth and fifth up-sampling layer; the fusion vector F_fuse and one of the multi-scale background features F_bg^1 to F_bg^4 are input into the first up-sampling layer of the up-sampling module, and the feature F_up^1 is output; F_up^1 and a further background feature are input together into the second up-sampling layer, and the feature F_up^2 is output; F_up^2 and a further background feature are input together into the third up-sampling layer, and the feature F_up^3 is output; F_up^3 and a further background feature are input together into the fourth up-sampling layer, and the feature F_up^4 is output; F_up^4 and the background feature F_bg are input together into the fifth up-sampling layer, and the face image Pic_fake is output;
e-10) the discriminator module comprises a first, second, third, fourth, fifth and sixth downsampling convolution layer and a Sigmoid function layer; the face image Pic_fake is passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_fake; the target image Pic_t is likewise passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_t;
e-11) the identity loss l1 is computed between the identity features of the generated face image Pic_fake and of the source image Pic_s; the reconstruction loss is computed by the formula l2 = ||Pic_fake - Pic_t||_2; the attribute loss l3 is computed between the attribute features of Pic_fake and of the target image Pic_t; the face image Pic_fake is then iteratively optimized with the identity loss l1, the reconstruction loss l2 and the attribute loss l3 by back-propagation.
10. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: convolution kernels of the first convolution layer and the second convolution layer of the first fusion block, the second fusion block, the third fusion block, the fourth fusion block, the fifth fusion block and the sixth fusion block in the step e-2) are all 3, step length is 1, and filling is 0; in the step e-8), the convolution kernels of the two convolution layers are both 1, the step length is both 1, and the filling is both 0; in the step e-9), convolution kernels of the first up-sampling layer, the second up-sampling layer, the third up-sampling layer and the fourth up-sampling layer are all 3, step length is 1, filling is 1, convolution kernels of the fifth up-sampling layer are 7, step length is 1, and filling is 0; e-10), wherein the convolution kernels of the first downsampling convolutional layer, the second downsampling convolutional layer and the third downsampling convolutional layer are all 4 x 4, the step lengths are all 2, the padding is all 1, the convolution kernels of the fourth downsampling convolutional layer, the fifth downsampling convolutional layer and the sixth downsampling convolutional layer are all 4 x 4, the step lengths are all 1, and the padding is all 1.
CN202211451581.1A 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method Active CN115713680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211451581.1A CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211451581.1A CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method

Publications (2)

Publication Number Publication Date
CN115713680A true CN115713680A (en) 2023-02-24
CN115713680B CN115713680B (en) 2023-07-25

Family

ID=85233817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211451581.1A Active CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method

Country Status (1)

Country Link
CN (1) CN115713680B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246022A (en) * 2023-03-09 2023-06-09 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116612211A (en) * 2023-05-08 2023-08-18 山东省人工智能研究院 Face image identity synthesis method based on GAN and 3D coefficient reconstruction

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122103A1 (en) * 2017-10-24 2019-04-25 International Business Machines Corporation Attention based sequential image processing
CN110197167A (en) * 2019-06-05 2019-09-03 清华大学深圳研究生院 A kind of video actions moving method
CN111368662A (en) * 2020-02-25 2020-07-03 华南理工大学 Method, device, storage medium and equipment for editing attribute of face image
CN111932444A (en) * 2020-07-16 2020-11-13 中国石油大学(华东) Face attribute editing method based on generation countermeasure network and information processing terminal
CN112734634A (en) * 2021-03-30 2021-04-30 中国科学院自动化研究所 Face changing method and device, electronic equipment and storage medium
CN112766160A (en) * 2021-01-20 2021-05-07 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
CN113112411A (en) * 2020-01-13 2021-07-13 南京信息工程大学 Human face image semantic restoration method based on multi-scale feature fusion
CN113627233A (en) * 2021-06-17 2021-11-09 中国科学院自动化研究所 Visual semantic information-based face counterfeiting detection method and device
CN113689328A (en) * 2021-09-13 2021-11-23 中国海洋大学 Image harmony system based on self-attention transformation
WO2021258920A1 (en) * 2020-06-24 2021-12-30 百果园技术(新加坡)有限公司 Generative adversarial network training method, image face swapping method and apparatus, and video face swapping method and apparatus
CN114078172A (en) * 2020-08-19 2022-02-22 四川大学 Text image generation method for progressively generating confrontation network based on resolution
WO2022151535A1 (en) * 2021-01-15 2022-07-21 苏州大学 Deep learning-based face feature point detection method
CN115311720A (en) * 2022-08-11 2022-11-08 山东省人工智能研究院 Defekake generation method based on Transformer

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122103A1 (en) * 2017-10-24 2019-04-25 International Business Machines Corporation Attention based sequential image processing
CN110197167A (en) * 2019-06-05 2019-09-03 清华大学深圳研究生院 A kind of video actions moving method
CN113112411A (en) * 2020-01-13 2021-07-13 南京信息工程大学 Human face image semantic restoration method based on multi-scale feature fusion
CN111368662A (en) * 2020-02-25 2020-07-03 华南理工大学 Method, device, storage medium and equipment for editing attribute of face image
WO2021258920A1 (en) * 2020-06-24 2021-12-30 百果园技术(新加坡)有限公司 Generative adversarial network training method, image face swapping method and apparatus, and video face swapping method and apparatus
CN111932444A (en) * 2020-07-16 2020-11-13 中国石油大学(华东) Face attribute editing method based on generation countermeasure network and information processing terminal
CN114078172A (en) * 2020-08-19 2022-02-22 四川大学 Text image generation method for progressively generating confrontation network based on resolution
WO2022151535A1 (en) * 2021-01-15 2022-07-21 苏州大学 Deep learning-based face feature point detection method
CN112766160A (en) * 2021-01-20 2021-05-07 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
CN112734634A (en) * 2021-03-30 2021-04-30 中国科学院自动化研究所 Face changing method and device, electronic equipment and storage medium
CN113627233A (en) * 2021-06-17 2021-11-09 中国科学院自动化研究所 Visual semantic information-based face counterfeiting detection method and device
CN113689328A (en) * 2021-09-13 2021-11-23 中国海洋大学 Image harmony system based on self-attention transformation
CN115311720A (en) * 2022-08-11 2022-11-08 山东省人工智能研究院 Defekake generation method based on Transformer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHILIANG XU et al.: "StyleSwap: Style-Based Generator Empowers Robust Face Swapping", arXiv, vol. 2022, pages 1 - 21 *
LI HUAN: "Research on a Dairy Goat Image Generation Algorithm Based on Normalized SAGAN", China Master's Theses Full-text Database, Agricultural Science and Technology Series, vol. 2022, no. 1, pages 050 - 152 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246022A (en) * 2023-03-09 2023-06-09 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116246022B (en) * 2023-03-09 2024-01-26 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116612211A (en) * 2023-05-08 2023-08-18 山东省人工智能研究院 Face image identity synthesis method based on GAN and 3D coefficient reconstruction
CN116612211B (en) * 2023-05-08 2024-02-02 山东省人工智能研究院 Face image identity synthesis method based on GAN and 3D coefficient reconstruction

Also Published As

Publication number Publication date
CN115713680B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
Wen et al. Cycle4completion: Unpaired point cloud completion using cycle transformation with missing region coding
Yuan et al. Tokens-to-token vit: Training vision transformers from scratch on imagenet
Liu et al. Convtransformer: A convolutional transformer network for video frame synthesis
CN115713680A (en) Semantic guidance-based face image identity synthesis method
WO2023072067A1 (en) Face attribute editing model training and face attribute editing methods
CN113140020B (en) Method for generating image based on text of countermeasure network generated by accompanying supervision
Li et al. Learning face image super-resolution through facial semantic attribute transformation and self-attentive structure enhancement
CN116309913B (en) Method for generating image based on ASG-GAN text description of generation countermeasure network
CN111161158B (en) Image restoration method based on generated network structure
CN116612211B (en) Face image identity synthesis method based on GAN and 3D coefficient reconstruction
CN116246022B (en) Face image identity synthesis method based on progressive denoising guidance
CN115311720A (en) Defekake generation method based on Transformer
CN113379597A (en) Face super-resolution reconstruction method
Bhunia et al. Word level font-to-font image translation using convolutional recurrent generative adversarial networks
CN112949707A (en) Cross-mode face image generation method based on multi-scale semantic information supervision
CN115063463A (en) Fish-eye camera scene depth estimation method based on unsupervised learning
CN110415261B (en) Expression animation conversion method and system for regional training
CN115909160A (en) Method and device for detecting depth video frame insertion and computer readable storage medium
Endo et al. Few-shot semantic image synthesis using stylegan prior
CN114463214A (en) Double-path iris completion method and system guided by regional attention mechanism
Luan et al. Learning unsupervised face normalization through frontal view reconstruction
CN114155139A (en) Deepfake generation method based on vector discretization representation
Lai et al. Generative focused feedback residual networks for image steganalysis and hidden information reconstruction
CN115496134B (en) Traffic scene video description generation method and device based on multi-mode feature fusion
CN108305219A (en) A kind of image de-noising method based on uncorrelated sparse dictionary

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant