CN115713680A - Semantic guidance-based face image identity synthesis method


Info

Publication number
CN115713680A
Authority
CN
China
Prior art keywords
layer
inputting
feature
attribute
sampling
Prior art date
Legal status
Granted
Application number
CN202211451581.1A
Other languages
Chinese (zh)
Other versions
CN115713680B (en)
Inventor
刘瑞霞
李子安
舒明雷
陈长芳
单珂
Current Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202211451581.1A
Publication of CN115713680A
Application granted
Publication of CN115713680B
Legal status: Active
Anticipated expiration


Abstract

A semantic guidance-based face image identity synthesis method extracts identity information, attribute information and background information from each image, fuses the information by feature fusion, and finally obtains the final result from the fused information by image generation. The method introduces feature key points that guide changes in face shape. At the same time, by adding background information during training, it produces face images whose face shape is changed while the image quality remains stable.

Description

Semantic guidance-based face image identity synthesis method
Technical Field
The invention relates to the field of image-level deep forgery (deepfakes), and in particular to a semantic guidance-based face image identity synthesis method.
Background
In recent years, with breakthrough developments in machine learning and graphics, the field of deep forgery has advanced greatly, and face identity synthesis, a sub-direction of this field, has developed rapidly, so that more and more forged images and videos appear on the network. Specifically, face identity synthesis transfers the identity information of a source face onto a target face while leaving the attribute information of the target face in the image (background, posture, illumination and the like) intact. At present, face identity synthesis is widely applied in information protection, the film and television industry, virtual entertainment and other fields; for example, the film and television industry uses advanced equipment to reconstruct an actor's facial model and the illumination conditions of a scene to obtain a vivid effect. Compared with directions such as attribute editing and image restoration in the deep forgery field, face identity synthesis is more open and involves more innovative techniques in generative models.
Traditional research on face identity synthesis is mainly based on image editing and can be divided into two categories: face image analysis and fusion, and 3D face modeling. The first category requires manually analyzing the face region and fusing faces through rendering, deformation and similar operations; it is inefficient and consumes a great deal of time and effort. The second category requires acquiring a 3D model of the face image and generating an image with deep learning techniques, which can cause a loss of illumination and background. In addition, these generation methods pay little attention to the structure of the face, resulting in face shape problems in the generated images.
Disclosure of Invention
In order to overcome the shortcomings of the above techniques, the invention provides a face image identity synthesis method that first uses feature key points to semantically guide the change of face shape, then extracts identity information, attribute information and background information from the images, fuses this information by feature fusion, and finally generates an image from the fused information.
The technical scheme adopted by the invention for overcoming the technical problems is as follows:
a face image identity synthesis method based on semantic guidance comprises the following steps:
a) Extracting key points of the face image from all face images in the CelebA face image data set;
b) Establishing a PET key point adjustment network, inputting the key points of the face image into the PET key point adjustment network to obtain the feature key points lm_fake, and iterating on the feature key points lm_fake to obtain the optimized feature key points lm_fake;
c) Establishing a face image feature extraction network, inputting a source image Pic_s and a target image Pic_t from the CelebA face image data set into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively;
d) Establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg;
e) Establishing a generation network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generation network to obtain the face image Pic_fake, and iterating on the face image Pic_fake to obtain the optimized face image Pic_fake;
f) Repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed face contour.
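For orientation, steps a) to f) can be summarized as the following training-step sketch. It is a minimal illustration only; the module names such as "pet", "identity_encoder" and "generator" are placeholders and not names used by the patent.

```python
# High-level sketch of one training step of the method described in steps a) to f).
# All module names are illustrative placeholders.
def train_step(pic_s, pic_t, lm_s, lm_t, networks):
    # b) adjust key points: source face shape guided toward the target key points
    lm_fake = networks["pet"](lm_s, lm_t)
    # c) identity feature from the source image, attribute feature from the target image
    f_id = networks["identity_encoder"](pic_s)
    f_attr = networks["attribute_encoder"](pic_t)
    # d) background feature from the target image
    f_bg = networks["background_encoder"](pic_t)
    # e) fuse all information and generate the identity-swapped face image
    pic_fake = networks["generator"](f_id, f_attr, f_bg, lm_fake)
    return lm_fake, pic_fake
```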
Further, step a) comprises the following steps:
a-1) Detecting key points of all face images in the CelebA face image data set by using the face key point detection algorithm H3R; the key points extracted from a source image Pic_s in the CelebA face image data set are denoted as the source key points lm_s, and the key points extracted from a target image Pic_t in the CelebA face image data set are denoted as the target key points lm_t.
Further, step b) comprises the following steps:
b-1) Establishing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true-and-false discriminator D_TF;
b-2) The source encoder E_lms consists of a first down-sampling convolution layer, a second down-sampling convolution layer, a third down-sampling convolution layer, a fourth down-sampling convolution layer and a fifth down-sampling convolution layer. The source key points lm_s are input into the first down-sampling convolution layer of the source encoder E_lms, which outputs the feature information F_lms^1; the feature information F_lms^1 is input into the second down-sampling convolution layer, which outputs the feature information F_lms^2; the feature information F_lms^2 is input into the third down-sampling convolution layer, which outputs the feature information F_lms^3; the feature information F_lms^3 is input into the fourth down-sampling convolution layer, which outputs the feature information F_lms^4; the feature information F_lms^4 is input into the fifth down-sampling convolution layer, which outputs the feature information F_lms^5.
b-3) The target encoder E_lmt consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer and a fifth fully connected layer. The target key points lm_t are input into the first fully connected layer of the target encoder E_lmt, which outputs the feature information F_lmt^1; the feature information F_lmt^1 is input into the second fully connected layer, which outputs the feature information F_lmt^2; the feature information F_lmt^2 is input into the third fully connected layer, which outputs the feature information F_lmt^3; the feature information F_lmt^3 is input into the fourth fully connected layer, which outputs the feature information F_lmt^4; the feature information F_lmt^4 is input into the fifth fully connected layer, which outputs the feature information F_lmt^5.
b-4) The cat() function stacks the feature information F_lms^5 and the feature information F_lmt^5 to obtain the feature vector F_lm.
b-5) The key point generator G_lm consists of a first up-sampling convolution layer, a second up-sampling convolution layer, a third up-sampling convolution layer, a fourth up-sampling convolution layer and a fifth up-sampling convolution layer. The feature vector F_lm is input into the first up-sampling convolution layer of the key point generator G_lm, which outputs the feature key points lm_fake^1; the feature key points lm_fake^1 are input into the second up-sampling convolution layer, which outputs the feature key points lm_fake^2; the feature key points lm_fake^2 are input into the third up-sampling convolution layer, which outputs the feature key points lm_fake^3; the feature key points lm_fake^3 are input into the fourth up-sampling convolution layer, which outputs the feature key points lm_fake^4; the feature key points lm_fake^4 are input into the fifth up-sampling convolution layer, which outputs the feature key points lm_fake.
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; the feature information F_fake^1 is input into the second fully connected layer of the Layer_fake module, which outputs the feature information F_fake^2; the feature information F_fake^2 is input into the third fully connected layer of the Layer_fake module, which outputs the feature information F_fake^3; the feature information F_fake^3 is input into the fourth fully connected layer of the Layer_fake module, which outputs the feature information F_fake^4. The Layer_s module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; the feature information F_s^1 is input into the second fully connected layer of the Layer_s module, which outputs the feature information F_s^2; the feature information F_s^2 is input into the third fully connected layer of the Layer_s module, which outputs the feature information F_s^3; the feature information F_s^3 is input into the fourth fully connected layer of the Layer_s module, which outputs the feature information F_s^4. The cat() function stacks the feature information F_fake^4 and the feature information F_s^4 to obtain the feature vector F_c. The Layer_c module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; the similarity feature Fscore1 is input into the second fully connected layer of the Layer_c module, which outputs the similarity feature Fscore2; the similarity feature Fscore2 is input into the third fully connected layer of the Layer_c module, which outputs the similarity feature Fscore3; the similarity feature Fscore3 is input into the fourth fully connected layer of the Layer_c module, which outputs the similarity score;
b-7) The true-and-false discriminator D_TF consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer, a fifth fully connected layer and a sixth fully connected layer. The feature key points lm_fake are input into the first fully connected layer of the true-and-false discriminator D_TF, which outputs the feature F_TF^1; the feature F_TF^1 is input into the second fully connected layer, which outputs the feature F_TF^2; the feature F_TF^2 is input into the third fully connected layer, which outputs the feature F_TF^3; the feature F_TF^3 is input into the fourth fully connected layer, which outputs the feature F_TF^4; the feature F_TF^4 is input into the fifth fully connected layer, which outputs the feature F_TF^5; the feature F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value V_TF.
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true-and-false loss loss_DTF is calculated from the value V_TF output by the true-and-false discriminator D_TF, and the similarity loss loss_DS is calculated from the similarity score output by the similarity discriminator D_S. The feature key points lm_fake are iteratively optimized by back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true-and-false loss loss_DTF and the similarity loss loss_DS.
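A minimal sketch of the loss computation of step b-8). The point-by-point and reconstruction terms follow the formulas in the text; the true-and-false and similarity terms are written here as simple least-squares objectives purely for illustration, since the patent's exact formulas for loss_DTF and loss_DS are given only in its figures.

```python
import torch
import torch.nn.functional as F

def pet_losses(lm_fake, lm_s, lm_t, d_tf_value, d_s_score):
    """Key point adjustment losses of step b-8).

    lm_fake: adjusted key points; lm_s / lm_t: source / target key points;
    d_tf_value: output of the true-and-false discriminator D_TF for lm_fake;
    d_s_score: similarity score output by the similarity discriminator D_S.
    """
    loss_l1 = F.mse_loss(lm_fake, lm_s)       # point-by-point loss ||lm_fake - lm_s||_2
    loss_cycle = F.mse_loss(lm_fake, lm_t)    # reconstruction loss ||lm_fake - lm_t||_2
    # Assumed least-squares forms for the adversarial terms (not the patent's exact formulas):
    loss_dtf = F.mse_loss(d_tf_value, torch.ones_like(d_tf_value))
    loss_ds = F.mse_loss(d_s_score, torch.ones_like(d_s_score))
    return loss_l1 + loss_cycle + loss_dtf + loss_ds
```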
In the step b-2), convolution kernels of the first downsampling convolution layer, the second downsampling convolution layer, the third downsampling convolution layer, the fourth downsampling convolution layer and the fifth downsampling convolution layer are all 1, step length is all 1, and padding is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0.
Further, step c) comprises the steps of:
c-1) Establishing a face image feature extraction network composed of an identity encoder E_id and an attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into the identity encoder E_id; the interpolate() function adjusts the source image Pic_s to a 112 × 112 resolution, the 112 × 112 image is input into the Arcface algorithm, and the identity vector F_v of shape b × c × h × w is output, where b is the training batch, c is the number of channels, h is the image height and w is the image width. The identity vector F_v is input sequentially into a padding layer and a regularization layer, which outputs the identity feature F_id.
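A minimal sketch of step c-2), assuming a pretrained ArcFace backbone is available as a callable `arcface` that returns an embedding; treating the regularization layer as L2 normalization is an assumption and the padding step is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def extract_identity(pic_s: torch.Tensor, arcface) -> torch.Tensor:
    """pic_s: source image tensor of shape (b, 3, H, W).

    Resizes to the 112x112 input resolution required by ArcFace, then
    normalizes the resulting identity vector."""
    x = F.interpolate(pic_s, size=(112, 112), mode="bilinear", align_corners=False)
    f_id = arcface(x)                         # identity vector
    f_id = F.normalize(f_id, p=2, dim=1)      # regularization layer (assumed L2 normalization)
    return f_id
```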
c-3) The attribute encoder E_attr consists of a first down-sampling residual block, a second down-sampling residual block, a third down-sampling residual block, a fourth down-sampling residual block, a fifth down-sampling residual block, a first bottleneck residual block and a second bottleneck residual block. Each down-sampling residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a down-sampling layer and a residual connection layer; each bottleneck residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first down-sampling residual block of the attribute encoder E_attr, which outputs the attribute feature F_attr^1; the attribute feature F_attr^1 is input into the second down-sampling residual block, which outputs the attribute feature F_attr^2; the attribute feature F_attr^2 is input into the third down-sampling residual block, which outputs the attribute feature F_attr^3; the attribute feature F_attr^3 is input into the fourth down-sampling residual block, which outputs the attribute feature F_attr^4; the attribute feature F_attr^4 is input into the fifth down-sampling residual block, which outputs the attribute feature F_attr^5; the attribute feature F_attr^5 is input into the first bottleneck residual block, which outputs the attribute feature F_attr^6; the attribute feature F_attr^6 is input into the second bottleneck residual block, which outputs the attribute feature F_attr.
In the step c-3), the first normalization layer and the second normalization layer in the first, second, third, fourth and fifth down-sampling residual blocks all use BatchNorm2d; in the step c-3), the convolution kernels of the first convolution layer and the second convolution layer in the first, second, third, fourth and fifth down-sampling residual blocks are all 3, and filling and step length are all 1.
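The down-sampling residual block of step c-3) can be sketched as follows. The layer ordering, channel handling and the choice of average pooling for the down-sampling layer are assumptions; the patent only fixes BatchNorm2d normalization and 3×3 convolutions with stride and padding 1.

```python
import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    """Down-sampling residual block of the attribute encoder E_attr (step c-3)):
    BatchNorm -> ReLU -> 3x3 conv -> BatchNorm -> ReLU -> 3x3 conv -> downsample,
    with a residual connection from the block input."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(in_ch)
        self.norm2 = nn.BatchNorm2d(out_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.down = nn.AvgPool2d(2)                           # down-sampling layer (assumed pooling)
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # residual connection path
    def forward(self, x):
        h = self.conv1(torch.relu(self.norm1(x)))
        h = self.conv2(torch.relu(self.norm2(h)))
        return self.down(h + self.skip(x))                    # residual connection, then downsample
```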
Further, step d) comprises the following steps:
d-1) Establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet; the target image Pic_t is input into the face parsing module and parsed to obtain each part of the face, and the parsed face parts are filled with color to obtain an image Pic_bg in which only the background area is kept;
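Step d-2) can be illustrated as below. Here `parse_face` stands in for a BiSeNet-style face parsing model that returns a per-pixel label map; treating label 0 as the background class and filling the face region with a constant color are assumptions.

```python
import torch

def keep_background_only(pic_t: torch.Tensor, parse_face, fill_value: float = 0.0) -> torch.Tensor:
    """pic_t: (b, 3, H, W) target image.

    Returns Pic_bg in which the parsed face parts are filled with a constant
    color so that only the background area is kept."""
    labels = parse_face(pic_t)                               # (b, H, W) integer label map
    background_mask = (labels == 0).unsqueeze(1).float()     # assume class 0 is background
    return pic_t * background_mask + fill_value * (1.0 - background_mask)
```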
d-3) The background information encoder E_bg consists of a first self-attention module, a second self-attention module, a third self-attention module, a fourth self-attention module and a fifth self-attention module, where each self-attention module consists, in order, of a down-sampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of the background information encoder E_bg, which outputs the background feature F_bg^1; the background feature F_bg^1 is input into the second self-attention module, which outputs the background feature F_bg^2; the background feature F_bg^2 is input into the third self-attention module, which outputs the background feature F_bg^3; the background feature F_bg^3 is input into the fourth self-attention module, which outputs the background feature F_bg^4; the background feature F_bg^4 is input into the fifth self-attention module, which outputs the background feature F_bg.
In the step d-3), the convolution kernels of the downsampling convolution layers of the first self-attention module, the second self-attention module, the third self-attention module, the fourth self-attention module and the fifth self-attention module are all 3, the step length is 0, and the padding is 0.
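Each self-attention module of step d-3) combines a down-sampling convolution, a self-attention layer and a ReLU activation. The sketch below uses a stride-2 convolution and a standard non-local-style attention layer; both choices, and the channel sizes, are assumptions rather than parameters fixed by the patent.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Simple non-local-style self-attention over spatial positions."""
    def __init__(self, ch: int):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))
    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)     # (b, hw, c/8)
        k = self.k(x).flatten(2)                     # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)          # (b, hw, hw)
        v = self.v(x).flatten(2)                     # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out

class BackgroundBlock(nn.Module):
    """One module of E_bg: down-sampling convolution -> self-attention -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)  # stride 2 assumed
        self.attn = SelfAttention2d(out_ch)
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(self.attn(self.conv(x)))
```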
Further, step e) comprises the steps of:
e-1) establishing a generation network consisting of a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of a first fusion block, a second fusion block, a third fusion block, a fourth fusion block, a fifth fusion block and a sixth fusion block, where each fusion block consists, in order, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module; after passing through the first convolution layer of the first fusion block, the attribute feature F_attr^(1,1) is obtained. The identity feature F_id and the attribute feature F_attr^(1,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(1,1) is calculated by the formula F_fuse^(1,1) = σ_id · (F_attr^(1,1) − μ(F_attr^(1,1))) / σ(F_attr^(1,1)) + μ_id, where σ_id is the standard deviation of the identity feature F_id, μ_id is the channel average of the identity feature F_id, μ(F_attr^(1,1)) is the channel average of the attribute feature F_attr^(1,1), μ(·) is the channel averaging operation and σ(·) is the standard deviation operation. The fusion feature F_fuse^(1,1) is input into the ReLU activation layer to obtain the feature F_r^1; the feature F_r^1 is input into the second convolution layer to obtain the attribute feature F_attr^(1,2). The identity feature F_id and the attribute feature F_attr^(1,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(1,2) is calculated by the formula F_fuse^(1,2) = σ_id · (F_attr^(1,2) − μ(F_attr^(1,2))) / σ(F_attr^(1,2)) + μ_id, where μ(F_attr^(1,2)) is the channel average of the attribute feature F_attr^(1,2);
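Each fusion block of steps e-2) to e-7) applies adaptive instance normalization (AdaIN) twice, replacing the channel statistics of the attribute feature with those of the identity feature. A minimal sketch, assuming the identity statistics are taken directly per channel (as the formula description suggests) and that F_id has the same number of channels as the attribute feature; the channel count is an assumption, while kernel 3, stride 1 and padding 0 follow the stated parameters.

```python
import torch
import torch.nn as nn

def adain(f_attr: torch.Tensor, f_id: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: normalize the attribute feature by its own
    channel statistics, then rescale/shift it with the identity feature's statistics."""
    mu_attr = f_attr.mean(dim=(2, 3), keepdim=True)
    sigma_attr = f_attr.std(dim=(2, 3), keepdim=True) + eps
    mu_id = f_id.mean(dim=(2, 3), keepdim=True)      # channel average of F_id
    sigma_id = f_id.std(dim=(2, 3), keepdim=True)    # channel standard deviation of F_id
    return sigma_id * (f_attr - mu_attr) / sigma_attr + mu_id

class FusionBlock(nn.Module):
    """conv -> AdaIN -> ReLU -> conv -> AdaIN, as in one fusion block of the fusion module."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=0)
    def forward(self, f_prev, f_id):
        h = adain(self.conv1(f_prev), f_id)   # first adaptive instance normalization
        h = torch.relu(h)
        return adain(self.conv2(h), f_id)     # second adaptive instance normalization
```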
e-3) The fusion feature F_fuse^(1,2) is input into the fusion module; after passing through the first convolution layer of the second fusion block, the attribute feature F_attr^(2,1) is obtained. The identity feature F_id and the attribute feature F_attr^(2,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(2,1) is calculated by the formula F_fuse^(2,1) = σ_id · (F_attr^(2,1) − μ(F_attr^(2,1))) / σ(F_attr^(2,1)) + μ_id, where μ(F_attr^(2,1)) is the channel average of the attribute feature F_attr^(2,1). The fusion feature F_fuse^(2,1) is input into the ReLU activation layer to obtain the feature F_r^2; the feature F_r^2 is input into the second convolution layer to obtain the attribute feature F_attr^(2,2). The identity feature F_id and the attribute feature F_attr^(2,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(2,2) is calculated by the formula F_fuse^(2,2) = σ_id · (F_attr^(2,2) − μ(F_attr^(2,2))) / σ(F_attr^(2,2)) + μ_id, where μ(F_attr^(2,2)) is the channel average of the attribute feature F_attr^(2,2);
e-4) The fusion feature F_fuse^(2,2) is input into the fusion module; after passing through the first convolution layer of the third fusion block, the attribute feature F_attr^(3,1) is obtained. The identity feature F_id and the attribute feature F_attr^(3,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(3,1) is calculated by the formula F_fuse^(3,1) = σ_id · (F_attr^(3,1) − μ(F_attr^(3,1))) / σ(F_attr^(3,1)) + μ_id, where μ(F_attr^(3,1)) is the channel average of the attribute feature F_attr^(3,1). The fusion feature F_fuse^(3,1) is input into the ReLU activation layer to obtain the feature F_r^3; the feature F_r^3 is input into the second convolution layer to obtain the attribute feature F_attr^(3,2). The identity feature F_id and the attribute feature F_attr^(3,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(3,2) is calculated by the formula F_fuse^(3,2) = σ_id · (F_attr^(3,2) − μ(F_attr^(3,2))) / σ(F_attr^(3,2)) + μ_id, where μ(F_attr^(3,2)) is the channel average of the attribute feature F_attr^(3,2);
e-5) The fusion feature F_fuse^(3,2) is input into the fusion module; after passing through the first convolution layer of the fourth fusion block, the attribute feature F_attr^(4,1) is obtained. The identity feature F_id and the attribute feature F_attr^(4,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(4,1) is calculated by the formula F_fuse^(4,1) = σ_id · (F_attr^(4,1) − μ(F_attr^(4,1))) / σ(F_attr^(4,1)) + μ_id, where μ(F_attr^(4,1)) is the channel average of the attribute feature F_attr^(4,1). The fusion feature F_fuse^(4,1) is input into the ReLU activation layer to obtain the feature F_r^4; the feature F_r^4 is input into the second convolution layer to obtain the attribute feature F_attr^(4,2). The identity feature F_id and the attribute feature F_attr^(4,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(4,2) is calculated by the formula F_fuse^(4,2) = σ_id · (F_attr^(4,2) − μ(F_attr^(4,2))) / σ(F_attr^(4,2)) + μ_id, where μ(F_attr^(4,2)) is the channel average of the attribute feature F_attr^(4,2);
e-6) The fusion feature F_fuse^(4,2) is input into the fifth fusion block of the fusion module; after passing through the first convolution layer of the fifth fusion block, the attribute feature F_attr^(5,1) is obtained. The identity feature F_id and the attribute feature F_attr^(5,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(5,1) is calculated by the formula F_fuse^(5,1) = σ_id · (F_attr^(5,1) − μ(F_attr^(5,1))) / σ(F_attr^(5,1)) + μ_id, where μ(F_attr^(5,1)) is the channel average of the attribute feature F_attr^(5,1). The fusion feature F_fuse^(5,1) is input into the ReLU activation layer to obtain the feature F_r^5; the feature F_r^5 is input into the second convolution layer to obtain the attribute feature F_attr^(5,2). The identity feature F_id and the attribute feature F_attr^(5,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(5,2) is calculated by the formula F_fuse^(5,2) = σ_id · (F_attr^(5,2) − μ(F_attr^(5,2))) / σ(F_attr^(5,2)) + μ_id, where μ(F_attr^(5,2)) is the channel average of the attribute feature F_attr^(5,2);
e-7) The fusion feature F_fuse^(5,2) is input into the fusion module; after passing through the first convolution layer of the sixth fusion block, the attribute feature F_attr^(6,1) is obtained. The identity feature F_id and the attribute feature F_attr^(6,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(6,1) is calculated by the formula F_fuse^(6,1) = σ_id · (F_attr^(6,1) − μ(F_attr^(6,1))) / σ(F_attr^(6,1)) + μ_id, where μ(F_attr^(6,1)) is the channel average of the attribute feature F_attr^(6,1). The fusion feature F_fuse^(6,1) is input into the ReLU activation layer to obtain the feature F_r^6; the feature F_r^6 is input into the second convolution layer to obtain the attribute feature F_attr^(6,2). The identity feature F_id and the attribute feature F_attr^(6,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^(6,2) is calculated by the formula F_fuse^(6,2) = σ_id · (F_attr^(6,2) − μ(F_attr^(6,2))) / σ(F_attr^(6,2)) + μ_id, where μ(F_attr^(6,2)) is the channel average of the attribute feature F_attr^(6,2);
e-8) The optimized feature key points lm_fake are input into two convolution layers respectively to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated from the feature F_gamma, the feature F_beta and the fusion feature F_fuse^(6,2) output by the sixth fusion block;
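Step e-8) injects the adjusted key points through two 1×1 convolutions that produce modulation maps F_gamma and F_beta (cf. the spatially adaptive normalization of FIG. 5). The exact combination formula appears only in the patent's figure; the sketch below assumes the common scale-and-shift form F_fuse = F_gamma · F + F_beta, and assumes lm_fake has been rendered into a spatial representation matching the fusion feature's size.

```python
import torch
import torch.nn as nn

class KeypointModulation(nn.Module):
    """Maps a spatial key point representation to per-pixel scale (F_gamma) and
    shift (F_beta) maps via two 1x1 convolutions, then modulates the fusion feature."""
    def __init__(self, lm_channels: int, feat_channels: int):
        super().__init__()
        self.to_gamma = nn.Conv2d(lm_channels, feat_channels, kernel_size=1, stride=1, padding=0)
        self.to_beta = nn.Conv2d(lm_channels, feat_channels, kernel_size=1, stride=1, padding=0)
    def forward(self, f_fusion: torch.Tensor, lm_fake_map: torch.Tensor) -> torch.Tensor:
        # lm_fake_map must have the same spatial size as f_fusion (assumption).
        f_gamma = self.to_gamma(lm_fake_map)
        f_beta = self.to_beta(lm_fake_map)
        return f_gamma * f_fusion + f_beta     # assumed scale-and-shift combination
```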
e-9) The up-sampling module consists of a first up-sampling layer, a second up-sampling layer, a third up-sampling layer, a fourth up-sampling layer and a fifth up-sampling layer. A background feature output by the background information encoder E_bg and the fusion vector F_fuse are input together into the first up-sampling layer of the up-sampling module, which outputs the feature F_up^1; the feature F_up^1 and a background feature are input together into the second up-sampling layer, which outputs the feature F_up^2; the feature F_up^2 and a background feature are input together into the third up-sampling layer, which outputs the feature F_up^3; the feature F_up^3 and a background feature are input together into the fourth up-sampling layer, which outputs the feature F_up^4; the feature F_up^4 and the background feature F_bg are input together into the fifth up-sampling layer, which outputs the face image Pic_fake;
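Step e-9) decodes the fusion vector back to an image while re-injecting a background feature at every scale. A sketch of one up-sampling layer, assuming the background feature is concatenated with the incoming feature on the channel axis before a 3×3 stride-1 convolution preceded by bilinear up-sampling; the combination operator and layer widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpLayer(nn.Module):
    """One up-sampling layer of the generator decoder with a background skip input."""
    def __init__(self, in_ch: int, bg_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + bg_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()
    def forward(self, x: torch.Tensor, f_bg: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        f_bg = F.interpolate(f_bg, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return self.act(self.conv(torch.cat([x, f_bg], dim=1)))   # assumed concatenation
```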
e-10) The discriminator module consists of a first down-sampling convolution layer, a second down-sampling convolution layer, a third down-sampling convolution layer, a fourth down-sampling convolution layer, a fifth down-sampling convolution layer, a sixth down-sampling convolution layer and a Sigmoid function layer. The face image Pic_fake is input into the first down-sampling convolution layer, which outputs the feature F_D^1; the feature F_D^1 is input into the second down-sampling convolution layer, which outputs the feature F_D^2; the feature F_D^2 is input into the third down-sampling convolution layer, which outputs the feature F_D^3; the feature F_D^3 is input into the fourth down-sampling convolution layer, which outputs the feature F_D^4; the feature F_D^4 is input into the fifth down-sampling convolution layer, which outputs the feature F_D^5; the feature F_D^5 is input into the sixth down-sampling convolution layer, which outputs the feature F_D^6; the feature F_D^6 is input into the Sigmoid function layer, which outputs the value V_fake. The target image Pic_t is input into the first down-sampling convolution layer, which outputs the feature F_T^1; the feature F_T^1 is input into the second down-sampling convolution layer, which outputs the feature F_T^2; the feature F_T^2 is input into the third down-sampling convolution layer, which outputs the feature F_T^3; the feature F_T^3 is input into the fourth down-sampling convolution layer, which outputs the feature F_T^4; the feature F_T^4 is input into the fifth down-sampling convolution layer, which outputs the feature F_T^5; the feature F_T^5 is input into the sixth down-sampling convolution layer, which outputs the feature F_T^6; the feature F_T^6 is input into the Sigmoid function layer, which outputs the value V_real;
e-11) The identity loss l1 is calculated from the identity feature of the generated face image Pic_fake and the identity feature F_id of the source image; the reconstruction loss l2 is calculated by the formula l2 = ||Pic_fake − Pic_t||_2; the attribute loss l3 is calculated from the attribute features of the face image Pic_fake and the target image Pic_t. The face image Pic_fake is iteratively optimized by back propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
Further, in the step e-2), convolution kernels of the first convolution layer and the second convolution layer of the first fusion block, the second fusion block, the third fusion block, the fourth fusion block, the fifth fusion block and the sixth fusion block are all 3, step length is 1, and filling is 0; in the step e-8), the convolution kernels of the two convolution layers are both 1, the step length is both 1, and the filling is both 0; in the step e-9), convolution kernels of the first upper sampling layer, the second upper sampling layer, the third upper sampling layer and the fourth upper sampling layer are all 3, step length is 1, filling is 1, convolution kernels of the fifth upper sampling layer are 7, step length is 1, and filling is 0; e-10), the convolution kernels of the first down-sampling convolution layer, the second down-sampling convolution layer and the third down-sampling convolution layer are all 4 x 4, the step lengths are all 2, the padding is all 1, the convolution kernels of the fourth down-sampling convolution layer, the fifth down-sampling convolution layer and the sixth down-sampling convolution layer are all 4 x 4, the step lengths are all 1, and the padding is all 1.
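For step e-11), only the reconstruction term is fully legible in the text (l2 = ||Pic_fake − Pic_t||_2); the identity and attribute terms are given in the patent's figures. The sketch below uses the common choices of a cosine identity loss and an L2 attribute loss, which should be read as assumptions, not the patent's exact formulas.

```python
import torch
import torch.nn.functional as F

def generator_losses(pic_fake, pic_t, id_encoder, attr_encoder, f_id_source):
    """Identity loss l1, reconstruction loss l2 and attribute loss l3 of step e-11).

    id_encoder / attr_encoder: the identity and attribute encoders of step c);
    f_id_source: identity feature F_id extracted from the source image Pic_s.
    """
    f_id_fake = id_encoder(pic_fake)
    l1 = 1.0 - F.cosine_similarity(f_id_fake, f_id_source, dim=1).mean()  # identity loss (assumed cosine form)
    l2 = F.mse_loss(pic_fake, pic_t)                                      # reconstruction loss ||Pic_fake - Pic_t||_2
    l3 = F.mse_loss(attr_encoder(pic_fake), attr_encoder(pic_t))          # attribute loss (assumed L2 form)
    return l1 + l2 + l3
```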
The beneficial effects of the invention are as follows: identity information, attribute information and background information are extracted from each image, the information is fused by feature fusion, and the final result is obtained from the fused information by image generation. The method introduces feature key points that guide changes in face shape, and by adding background information during training it produces face images whose face shape is changed while the image quality remains stable.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of the key point extraction and adjustment structure of the present invention;
FIG. 3 is a diagram of a key point discriminator network architecture of the present invention;
FIG. 4 is a diagram of an attribute extraction structure and a downsampling structure according to the present invention;
FIG. 5 is a block diagram illustrating an exemplary spatial adaptive normalization architecture according to the present invention;
FIG. 6 is a diagram of semantic parsing and background information extraction according to the present invention.
Detailed Description
The present invention is further described with reference to fig. 1 to 6.
A face image identity synthesis method based on semantic guidance comprises the following steps:
a) Extracting key points of the face image from all face images in the CelebA face image data set.
b) Establishing a PET key point adjustment network, inputting the key points of the face image into the PET key point adjustment network to obtain the feature key points lm_fake, and iterating on the feature key points lm_fake to obtain the optimized feature key points lm_fake.
c) Establishing a face image feature extraction network, inputting a source image Pic_s and a target image Pic_t from the CelebA face image data set into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively.
d) Establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg.
e) Establishing a generation network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generation network to obtain the face image Pic_fake, and iterating on the face image Pic_fake to obtain the optimized face image Pic_fake.
f) Repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed face contour. The method provides feature key points that semantically guide the change of face shape, extracts identity information, attribute information and background information from each image, fuses this information by feature fusion, and finally obtains the final result from the fused information by image generation. The method introduces feature key points that guide changes in face shape, and by adding background information during training it produces face images whose face shape is changed while the image quality remains stable.
Example 1:
the step a) comprises the following steps:
a-1) Detecting key points of all face images in the CelebA face image data set by using the face key point detection algorithm H3R; the key points extracted from a source image Pic_s in the CelebA face image data set are denoted as the source key points lm_s, and the key points extracted from a target image Pic_t in the CelebA face image data set are denoted as the target key points lm_t. The CelebA face image data set consists of 30000 face images of different identities, the resolution of each image is 512 × 512, and the source image Pic_s and the target image Pic_t are both images in the CelebA data set.
Example 2:
the step b) comprises the following steps:
b-1) Establishing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true-and-false discriminator D_TF.
b-2) The source encoder E_lms consists of a first down-sampling convolution layer, a second down-sampling convolution layer, a third down-sampling convolution layer, a fourth down-sampling convolution layer and a fifth down-sampling convolution layer. The source key points lm_s are input into the first down-sampling convolution layer of the source encoder E_lms, which outputs the feature information F_lms^1; the feature information F_lms^1 is input into the second down-sampling convolution layer, which outputs the feature information F_lms^2; the feature information F_lms^2 is input into the third down-sampling convolution layer, which outputs the feature information F_lms^3; the feature information F_lms^3 is input into the fourth down-sampling convolution layer, which outputs the feature information F_lms^4; the feature information F_lms^4 is input into the fifth down-sampling convolution layer, which outputs the feature information F_lms^5.
b-3) The target encoder E_lmt consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer and a fifth fully connected layer. The target key points lm_t are input into the first fully connected layer of the target encoder E_lmt, which outputs the feature information F_lmt^1; the feature information F_lmt^1 is input into the second fully connected layer, which outputs the feature information F_lmt^2; the feature information F_lmt^2 is input into the third fully connected layer, which outputs the feature information F_lmt^3; the feature information F_lmt^3 is input into the fourth fully connected layer, which outputs the feature information F_lmt^4; the feature information F_lmt^4 is input into the fifth fully connected layer, which outputs the feature information F_lmt^5.
b-4) The cat() function stacks the feature information F_lms^5 and the feature information F_lmt^5 to obtain the feature vector F_lm.
b-5) The key point generator G_lm consists of a first up-sampling convolution layer, a second up-sampling convolution layer, a third up-sampling convolution layer, a fourth up-sampling convolution layer and a fifth up-sampling convolution layer. The feature vector F_lm is input into the first up-sampling convolution layer of the key point generator G_lm, which outputs the feature key points lm_fake^1; the feature key points lm_fake^1 are input into the second up-sampling convolution layer, which outputs the feature key points lm_fake^2; the feature key points lm_fake^2 are input into the third up-sampling convolution layer, which outputs the feature key points lm_fake^3; the feature key points lm_fake^3 are input into the fourth up-sampling convolution layer, which outputs the feature key points lm_fake^4; the feature key points lm_fake^4 are input into the fifth up-sampling convolution layer, which outputs the feature key points lm_fake, whose dimension is 1 × 212.
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; the feature information F_fake^1 is input into the second fully connected layer of the Layer_fake module, which outputs the feature information F_fake^2; the feature information F_fake^2 is input into the third fully connected layer of the Layer_fake module, which outputs the feature information F_fake^3; the feature information F_fake^3 is input into the fourth fully connected layer of the Layer_fake module, which outputs the feature information F_fake^4. The Layer_s module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; the feature information F_s^1 is input into the second fully connected layer of the Layer_s module, which outputs the feature information F_s^2; the feature information F_s^2 is input into the third fully connected layer of the Layer_s module, which outputs the feature information F_s^3; the feature information F_s^3 is input into the fourth fully connected layer of the Layer_s module, which outputs the feature information F_s^4. The cat() function stacks the feature information F_fake^4 and the feature information F_s^4 to obtain the feature vector F_c. The Layer_c module consists of a first fully connected layer, a second fully connected layer, a third fully connected layer and a fourth fully connected layer; the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; the similarity feature Fscore1 is input into the second fully connected layer of the Layer_c module, which outputs the similarity feature Fscore2; the similarity feature Fscore2 is input into the third fully connected layer of the Layer_c module, which outputs the similarity feature Fscore3; the similarity feature Fscore3 is input into the fourth fully connected layer of the Layer_c module, which outputs the similarity score.
b-7) The true-and-false discriminator D_TF consists of a first fully connected layer, a second fully connected layer, a third fully connected layer, a fourth fully connected layer, a fifth fully connected layer and a sixth fully connected layer. The feature key points lm_fake are input into the first fully connected layer of the true-and-false discriminator D_TF, which outputs the feature F_TF^1; the feature F_TF^1 is input into the second fully connected layer, which outputs the feature F_TF^2; the feature F_TF^2 is input into the third fully connected layer, which outputs the feature F_TF^3; the feature F_TF^3 is input into the fourth fully connected layer, which outputs the feature F_TF^4; the feature F_TF^4 is input into the fifth fully connected layer, which outputs the feature F_TF^5; the feature F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value V_TF.
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true-and-false loss loss_DTF is calculated from the value V_TF output by the true-and-false discriminator D_TF, and the similarity loss loss_DS is calculated from the similarity score output by the similarity discriminator D_S. The feature key points lm_fake are iteratively optimized by back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true-and-false loss loss_DTF and the similarity loss loss_DS.
Example 3:
in the step b-2), the convolution kernels of the first down-sampling convolution layer, the second down-sampling convolution layer, the third down-sampling convolution layer, the fourth down-sampling convolution layer and the fifth down-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0.
Example 4:
the step c) comprises the following steps:
c-1) Establishing a face image feature extraction network composed of an identity encoder E_id and an attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into the identity encoder E_id; the interpolate() function adjusts the source image Pic_s to a 112 × 112 resolution, the 112 × 112 image is input into the Arcface algorithm, and the identity vector F_v of shape b × c × h × w is output, where b is the training batch, c is the number of channels, h is the image height and w is the image width. The identity vector F_v is input sequentially into a padding layer and a regularization layer, which outputs the identity feature F_id.
c-3) The attribute encoder E_attr consists of a first down-sampling residual block, a second down-sampling residual block, a third down-sampling residual block, a fourth down-sampling residual block, a fifth down-sampling residual block, a first bottleneck residual block and a second bottleneck residual block. Each down-sampling residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a down-sampling layer and a residual connection layer; each bottleneck residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first down-sampling residual block of the attribute encoder E_attr, which outputs the attribute feature F_attr^1; the attribute feature F_attr^1 is input into the second down-sampling residual block, which outputs the attribute feature F_attr^2; the attribute feature F_attr^2 is input into the third down-sampling residual block, which outputs the attribute feature F_attr^3; the attribute feature F_attr^3 is input into the fourth down-sampling residual block, which outputs the attribute feature F_attr^4; the attribute feature F_attr^4 is input into the fifth down-sampling residual block, which outputs the attribute feature F_attr^5; the attribute feature F_attr^5 is input into the first bottleneck residual block, which outputs the attribute feature F_attr^6; the attribute feature F_attr^6 is input into the second bottleneck residual block, which outputs the attribute feature F_attr.
Example 5:
In the step c-3), the first normalization layer and the second normalization layer in the first, second, third, fourth and fifth down-sampling residual blocks all use BatchNorm2d; in the step c-3), the convolution kernels of the first convolution layer and the second convolution layer in the first, second, third, fourth and fifth down-sampling residual blocks are all 3, and filling and step length are all 1.
Example 6:
the step d) comprises the following steps:
d-1) Establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet; the target image Pic_t is input into the face parsing module and parsed to obtain each part of the face, and the parsed face parts are filled with color to obtain an image Pic_bg in which only the background area is kept;
d-3) The background information encoder E_bg is composed of a first, second, third, fourth and fifth self-attention module, each of which consists, in order, of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of the background information encoder E_bg, and the background feature F_bg^1 is output; F_bg^1 is input into the second self-attention module, and the background feature F_bg^2 is output; F_bg^2 is input into the third self-attention module, and the background feature F_bg^3 is output; F_bg^3 is input into the fourth self-attention module, and the background feature F_bg^4 is output; F_bg^4 is input into the fifth self-attention module, and the background feature F_bg is output.
Example 7:
In step d-3), the convolution kernels of the downsampling convolution layers of the first through fifth self-attention modules are all 3, the stride is 0, and the padding is 0.
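As an illustration of step d-3), the sketch below shows one way to realize a single self-attention module of the background information encoder E_bg (downsampling convolution, then self-attention over spatial positions, then ReLU); the stride of 2 for the downsampling convolution, the channel widths and the single-head attention formulation are assumptions, and the literal stride/padding values of Example 7 are not reproduced here.

```python
import torch
import torch.nn as nn

class SelfAttentionModule(nn.Module):
    """Sketch of one module of E_bg: downsampling conv -> self-attention -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Downsampling convolution; stride 2 is an assumption.
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        # Single-head self-attention over flattened spatial positions (assumed form).
        self.q = nn.Conv2d(out_ch, out_ch // 8, kernel_size=1)
        self.k = nn.Conv2d(out_ch, out_ch // 8, kernel_size=1)
        self.v = nn.Conv2d(out_ch, out_ch, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.down(x)
        b, c, hgt, wid = h.shape
        q = self.q(h).flatten(2).transpose(1, 2)        # (B, HW, C//8)
        k = self.k(h).flatten(2)                        # (B, C//8, HW)
        v = self.v(h).flatten(2)                        # (B, C, HW)
        attn = torch.softmax(q @ k, dim=-1)             # (B, HW, HW) attention map
        out = (v @ attn.transpose(1, 2)).view(b, c, hgt, wid)
        return self.relu(h + self.gamma * out)          # attention residual + ReLU
```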
Example 8:
step e) comprises the following steps:
e-1) establishing a generation network consisting of a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module is composed of a first, second, third, fourth, fifth and sixth fusion block, each of which consists, in order, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the first fusion block of the fusion module; after the first convolution layer of the first fusion block an intermediate attribute feature F_attr' is obtained. The identity feature F_id and the attribute feature F_attr' are input into the first adaptive instance normalization layer, and the fusion feature is computed by the formula AdaIN(F_attr', F_id) = σ_id * (F_attr' - μ(F_attr')) / σ(F_attr') + μ_id, where σ_id and μ_id are the standard deviation and channel mean of the identity feature F_id, μ(·) is the channel-mean operation and σ(·) is the standard-deviation operation. The resulting fusion feature is input into the ReLU activation layer, the activated feature is input into the second convolution layer to obtain a further attribute feature, and the identity feature F_id and this attribute feature are input into the second adaptive instance normalization layer; applying the same formula yields the fusion feature F_fuse^1 output by the first fusion block;
e-3) The fusion feature F_fuse^1 is input into the second fusion block of the fusion module and processed exactly as in step e-2): first convolution layer, first adaptive instance normalization with the identity feature F_id, ReLU activation, second convolution layer, and second adaptive instance normalization with F_id; the fusion feature F_fuse^2 is output;
e-4) The fusion feature F_fuse^2 is input into the third fusion block of the fusion module and processed in the same way, with the identity feature F_id injected at both adaptive instance normalization layers; the fusion feature F_fuse^3 is output;
e-5) The fusion feature F_fuse^3 is input into the fourth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^4 is output;
e-6) The fusion feature F_fuse^4 is input into the fifth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^5 is output;
e-7) The fusion feature F_fuse^5 is input into the sixth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^6 is output;
e-8) The optimized feature key points lm_fake are input into two convolution layers to obtain the feature F_gamma and the feature F_beta respectively, and the fusion vector F_fuse is computed by modulating the fusion feature F_fuse^6 with F_gamma and F_beta;
e-9) The up-sampling module is composed of a first, second, third, fourth and fifth up-sampling layer. The fusion vector F_fuse and one of the multi-scale background features F_bg^1 to F_bg^4 are input into the first up-sampling layer of the up-sampling module, and the feature F_up^1 is output; F_up^1 and a further background feature are input together into the second up-sampling layer, and the feature F_up^2 is output; F_up^2 and a further background feature are input together into the third up-sampling layer, and the feature F_up^3 is output; F_up^3 and a further background feature are input together into the fourth up-sampling layer, and the feature F_up^4 is output; F_up^4 and the background feature F_bg are input together into the fifth up-sampling layer, and the face image Pic_fake is output;
e-10) The discriminator module comprises a first, second, third, fourth, fifth and sixth downsampling convolution layer and a Sigmoid function layer. The face image Pic_fake is passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_fake; the target image Pic_t is likewise passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_t;
e-11) The identity loss l1 is computed between the identity features of the generated face image Pic_fake and of the source image Pic_s; the reconstruction loss is computed by the formula l2 = ||Pic_fake - Pic_t||_2; the attribute loss l3 is computed between the attribute features of Pic_fake and of the target image Pic_t; the face image Pic_fake is then iteratively optimized with the identity loss l1, the reconstruction loss l2 and the attribute loss l3 by back-propagation.
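For step e-11), a hedged sketch of the three training losses is given below; the cosine-distance form of the identity loss, the comparison of final attribute features for the attribute loss and the loss weighting are assumptions, while the reconstruction loss follows the stated formula l2 = ||Pic_fake - Pic_t||_2.

```python
import torch
import torch.nn.functional as F

def training_losses(pic_fake, pic_t, pic_s, identity_encoder, attribute_encoder):
    """Sketch of the identity loss l1, reconstruction loss l2 and attribute loss l3.
    identity_encoder and attribute_encoder stand for E_id and E_attr."""
    # l1: pull the generated face's identity towards the source identity
    # (the cosine-distance form is an assumption)
    id_fake = identity_encoder(pic_fake)
    id_src = identity_encoder(pic_s)
    l1 = (1.0 - F.cosine_similarity(id_fake, id_src, dim=1)).mean()

    # l2 = ||Pic_fake - Pic_t||_2, the stated reconstruction loss
    l2 = torch.norm(pic_fake - pic_t, p=2)

    # l3: keep the generated face's attributes close to the target's
    # (comparing only the final attribute features is an assumption)
    attr_fake = attribute_encoder(pic_fake)
    attr_t = attribute_encoder(pic_t)
    l3 = torch.norm(attr_fake - attr_t, p=2)

    # The weighting of the three terms is not specified here, so they are returned separately.
    return l1, l2, l3
```

How the three terms are combined before back-propagation is left to the caller.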
Example 9:
The convolution kernels of the first and second convolution layers of the first through sixth fusion blocks in step e-2) are all 3, the stride is 1, and the padding is 0; in step e-8), the convolution kernels of the two convolution layers are both 1, the stride is 1, and the padding is 0; in step e-9), the convolution kernels of the first, second, third and fourth up-sampling layers are all 3 with stride 1 and padding 1, and the convolution kernel of the fifth up-sampling layer is 7 with stride 1 and padding 0; in step e-10), the convolution kernels of the first, second and third downsampling convolution layers are all 4 x 4 with stride 2 and padding 1, and the convolution kernels of the fourth, fifth and sixth downsampling convolution layers are all 4 x 4 with stride 1 and padding 1.
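To make steps e-2) through e-7) concrete, the following PyTorch sketch shows one reading of a single fusion block (convolution, adaptive instance normalization against the identity feature, ReLU, convolution, adaptive instance normalization) using the Example 9 convolution parameters above (kernel 3, stride 1, padding 0); the exact AdaIN formula, the requirement that the identity feature be a feature map with the same channel count, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

def adain(f_attr: torch.Tensor, f_id: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: re-style the attribute feature with the
    mean and standard deviation of the identity feature (assumed form)."""
    b, c = f_attr.shape[:2]
    mu_attr = f_attr.reshape(b, c, -1).mean(dim=2).view(b, c, 1, 1)
    sigma_attr = f_attr.reshape(b, c, -1).std(dim=2, unbiased=False).view(b, c, 1, 1) + eps
    mu_id = f_id.reshape(b, c, -1).mean(dim=2).view(b, c, 1, 1)
    sigma_id = f_id.reshape(b, c, -1).std(dim=2, unbiased=False).view(b, c, 1, 1)
    return sigma_id * (f_attr - mu_attr) / sigma_attr + mu_id

class FusionBlock(nn.Module):
    """One fusion block of step e-2): conv -> AdaIN -> ReLU -> conv -> AdaIN.
    f_id is assumed to be a (B, C, h, w) feature map with the same channel count."""
    def __init__(self, channels: int):
        super().__init__()
        # Example 9: kernel 3, stride 1, padding 0 for both convolutions
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=0)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_fuse_prev: torch.Tensor, f_id: torch.Tensor) -> torch.Tensor:
        h = self.conv1(f_fuse_prev)
        h = adain(h, f_id)          # first adaptive instance normalization
        h = self.relu(h)
        h = self.conv2(h)
        return adain(h, f_id)       # second adaptive instance normalization
```

Chaining six such blocks, injecting F_id at every adaptive instance normalization call, and finally modulating the result with the keypoint-derived F_gamma and F_beta would mirror the e-2) to e-8) pipeline.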
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A face image identity synthesis method based on semantic guidance is characterized by comprising the following steps:
a) Extracting key points of the face image from all face images in the CelebA face image data set;
b) establishing a PET key point adjustment network, inputting the face image key points into the PET key point adjustment network to obtain the feature key points lm_fake, and iterating on the feature key points lm_fake to obtain the optimized feature key points lm_fake;
c) establishing a face image feature extraction network, inputting the source image Pic_s and the target image Pic_t from the CelebA face image dataset into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively;
d) establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg;
e) establishing a generation network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generation network to obtain the face image Pic_fake, and iterating on the image Pic_fake to obtain the optimized face image Pic_fake;
f) repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed face contour.
2. The face image identity synthesis method based on semantic guidance according to claim 1,
the step a) comprises the following steps:
a-1) detecting the key points of all face images in the CelebA face image dataset with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s of the CelebA face image dataset are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t of the CelebA face image dataset are denoted as the target key points lm_t.
3. The face image identity synthesis method based on semantic guidance according to claim 2,
the step b) comprises the following steps:
b-1) constructing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true-false discriminator D_TF;
b-2) the source encoder E_lms comprises a first, second, third, fourth and fifth downsampling convolution layer; the source key points lm_s are input into the first downsampling convolution layer of the source encoder E_lms and then passed in sequence through the second, third, fourth and fifth downsampling convolution layers, each layer taking the previous layer's output feature information as input, and the fifth downsampling convolution layer outputs the feature information F_lms;
b-3) the target encoder E_lmt comprises a first, second, third, fourth and fifth fully connected layer; the target key points lm_t are input into the first fully connected layer of the target encoder E_lmt and then passed in sequence through the second, third, fourth and fifth fully connected layers, each layer taking the previous layer's output feature information as input, and the fifth fully connected layer outputs the feature information F_lmt;
b-4) the feature information F_lms and the feature information F_lmt are stacked with the Cat() function to obtain the feature vector F_lm;
b-5) the key point generator G_lm is composed of a first, second, third, fourth and fifth up-sampling convolution layer; the feature vector F_lm is input into the first up-sampling convolution layer of the key point generator G_lm and then passed in sequence through the second, third, fourth and fifth up-sampling convolution layers, each layer taking the previous layer's output feature key points as input, and the fifth up-sampling convolution layer outputs the feature key points lm_fake;
b-6) the similarity discriminator D_S is composed of a Layer_s module, a Layer_fake module and a Layer_c module; the Layer_fake module consists of a first, second, third and fourth fully connected layer, and the feature key points lm_fake are passed in sequence through its four fully connected layers to obtain the feature information F_fake; the Layer_s module consists of a first, second, third and fourth fully connected layer, and the source key points lm_s are passed in sequence through its four fully connected layers to obtain the feature information F_s; the feature information F_fake and the feature information F_s are stacked with the Cat() function to obtain a feature vector, which is input into the Layer_c module; the Layer_c module consists of a first, second, third and fourth fully connected layer, whose first, second and third fully connected layers output the similarity features Fscore1, Fscore2 and Fscore3 in turn, and whose fourth fully connected layer outputs the similarity score;
b-7) the true-false discriminator D_TF is composed of a first, second, third, fourth, fifth and sixth fully connected layer; the feature key points lm_fake are input into the first fully connected layer of the true-false discriminator D_TF and then passed in sequence through the second, third, fourth, fifth and sixth fully connected layers, and the sixth fully connected layer outputs a 1-channel value;
b-8) the point-wise loss is computed by the formula loss_L1 = ||lm_fake - lm_s||_2, where ||·||_2 denotes the mean-square error; the reconstruction loss is computed by loss_Cycle = ||lm_fake - lm_t||_2; the true-false loss loss_DTF is computed from the output of the true-false discriminator D_TF, and the similarity loss loss_DS is computed from the output of the similarity discriminator D_S; the feature key points lm_fake are iteratively optimized by back-propagation using the point-wise loss loss_L1, the reconstruction loss loss_Cycle, the true-false loss loss_DTF and the similarity loss loss_DS.
4. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in the step b-2), the convolution kernels of the first down-sampling convolution layer, the second down-sampling convolution layer, the third down-sampling convolution layer, the fourth down-sampling convolution layer and the fifth down-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the padding is all 0.
5. The method for synthesizing the identity of the human face image based on semantic guidance according to claim 1, wherein the step c) comprises the following steps:
c-1) establishing a face image feature extraction network composed of an identity encoder E_id and an attribute encoder E_attr;
c-2) the identity encoder E_id consists of the Arcface algorithm; the source image Pic_s is input into the identity encoder E_id, the interpolate() function adjusts the source image Pic_s to a resolution of 112 x 112, the 112 x 112 image is input into the Arcface algorithm, and an identity vector of shape b x c x h x w is output, where b is the training batch size, c is the number of channels, h is the image height and w is the image width; the identity vector is input sequentially into a padding layer and a regularization layer, and the identity feature F_id is output;
c-3) the attribute encoder E_attr is composed of a first, second, third, fourth and fifth downsampling residual block, a first bottleneck residual block and a second bottleneck residual block; each downsampling residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer, and each bottleneck residual block consists, in order, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer; the target image Pic_t is input into the first downsampling residual block of the attribute encoder E_attr and then passed in sequence through the second, third, fourth and fifth downsampling residual blocks and the first and second bottleneck residual blocks, each block taking the previous block's output attribute feature as input, and the second bottleneck residual block outputs the attribute feature F_attr.
6. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in step c-3), the first and second normalization layers of the first through fifth downsampling residual blocks are all BatchNorm2d layers; the convolution kernels of the first and second convolution layers of the first through fifth downsampling residual blocks in step c-3) are all 3, and the padding and stride are all 1.
7. The method for synthesizing the identity of the human face image based on semantic guidance according to claim 1, wherein the step d) comprises the following steps:
d-1) establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) the face parsing module consists of the face parsing algorithm BiSeNet; the target image Pic_t is input into the face parsing module, the individual facial regions are obtained by parsing, and each facial region is filled with a solid color so that only the background region is retained, yielding the image Pic_bg;
d-3) the background information encoder E_bg is composed of a first, second, third, fourth and fifth self-attention module, each of which consists, in order, of a downsampling convolution layer, a self-attention layer and a ReLU activation layer; the image Pic_bg is input into the first self-attention module of the background information encoder E_bg, and the background feature F_bg^1 is output; F_bg^1 is input into the second self-attention module, and the background feature F_bg^2 is output; F_bg^2 is input into the third self-attention module, and the background feature F_bg^3 is output; F_bg^3 is input into the fourth self-attention module, and the background feature F_bg^4 is output; F_bg^4 is input into the fifth self-attention module, and the background feature F_bg is output.
8. The semantic guidance-based human face image identity synthesis method according to claim 1, characterized in that: in the step d-3), convolution kernels of the downsampling convolution layers of the first self-attention module, the second self-attention module, the third self-attention module, the fourth self-attention module and the fifth self-attention module are all 3, step length is all 0, and padding is all 0.
9. The method for synthesizing the identity of the human face image based on semantic guidance according to claim 7, wherein the step e) comprises the following steps:
e-1) establishing a generation network consisting of a fusion module, an up-sampling module and a discriminator module;
e-2) the fusion module is composed of a first, second, third, fourth, fifth and sixth fusion block, each of which consists, in order, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer; the attribute feature F_attr is input into the first fusion block of the fusion module, and after the first convolution layer of the first fusion block an intermediate attribute feature F_attr' is obtained; the identity feature F_id and the attribute feature F_attr' are input into the first adaptive instance normalization layer, and the fusion feature is computed by the formula AdaIN(F_attr', F_id) = σ_id * (F_attr' - μ(F_attr')) / σ(F_attr') + μ_id, where σ_id and μ_id are the standard deviation and channel mean of the identity feature F_id, μ(·) is the channel-mean operation and σ(·) is the standard-deviation operation; the resulting fusion feature is input into the ReLU activation layer, the activated feature is input into the second convolution layer to obtain a further attribute feature, and the identity feature F_id and this attribute feature are input into the second adaptive instance normalization layer; applying the same formula yields the fusion feature F_fuse^1 output by the first fusion block;
e-3) the fusion feature F_fuse^1 is input into the second fusion block of the fusion module and processed exactly as in step e-2): first convolution layer, first adaptive instance normalization with the identity feature F_id, ReLU activation, second convolution layer, and second adaptive instance normalization with F_id; the fusion feature F_fuse^2 is output;
e-4) the fusion feature F_fuse^2 is input into the third fusion block of the fusion module and processed in the same way, with the identity feature F_id injected at both adaptive instance normalization layers; the fusion feature F_fuse^3 is output;
e-5) the fusion feature F_fuse^3 is input into the fourth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^4 is output;
e-6) the fusion feature F_fuse^4 is input into the fifth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^5 is output;
e-7) the fusion feature F_fuse^5 is input into the sixth fusion block of the fusion module and processed in the same way; the fusion feature F_fuse^6 is output;
e-8) the optimized feature key points lm_fake are input into two convolution layers to obtain the feature F_gamma and the feature F_beta respectively, and the fusion vector F_fuse is computed by modulating the fusion feature F_fuse^6 with F_gamma and F_beta;
e-9) the up-sampling module comprises a first, second, third, fourth and fifth up-sampling layer; the fusion vector F_fuse and one of the multi-scale background features F_bg^1 to F_bg^4 are input into the first up-sampling layer of the up-sampling module, and the feature F_up^1 is output; F_up^1 and a further background feature are input together into the second up-sampling layer, and the feature F_up^2 is output; F_up^2 and a further background feature are input together into the third up-sampling layer, and the feature F_up^3 is output; F_up^3 and a further background feature are input together into the fourth up-sampling layer, and the feature F_up^4 is output; F_up^4 and the background feature F_bg are input together into the fifth up-sampling layer, and the face image Pic_fake is output;
e-10) the discriminator module comprises a first, second, third, fourth, fifth and sixth downsampling convolution layer and a Sigmoid function layer; the face image Pic_fake is passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_fake; the target image Pic_t is likewise passed in sequence through the first to sixth downsampling convolution layers, and the resulting feature is input into the Sigmoid function layer, which outputs a value for Pic_t;
e-11) the identity loss l1 is computed between the identity features of the generated face image Pic_fake and of the source image Pic_s; the reconstruction loss is computed by the formula l2 = ||Pic_fake - Pic_t||_2; the attribute loss l3 is computed between the attribute features of Pic_fake and of the target image Pic_t; the face image Pic_fake is then iteratively optimized with the identity loss l1, the reconstruction loss l2 and the attribute loss l3 by back-propagation.
10. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: convolution kernels of the first convolution layer and the second convolution layer of the first fusion block, the second fusion block, the third fusion block, the fourth fusion block, the fifth fusion block and the sixth fusion block in the step e-2) are all 3, step length is 1, and filling is 0; in the step e-8), the convolution kernels of the two convolution layers are both 1, the step length is both 1, and the filling is both 0; in the step e-9), convolution kernels of the first up-sampling layer, the second up-sampling layer, the third up-sampling layer and the fourth up-sampling layer are all 3, step length is 1, filling is 1, convolution kernels of the fifth up-sampling layer are 7, step length is 1, and filling is 0; e-10), wherein the convolution kernels of the first downsampling convolutional layer, the second downsampling convolutional layer and the third downsampling convolutional layer are all 4 x 4, the step lengths are all 2, the padding is all 1, the convolution kernels of the fourth downsampling convolutional layer, the fifth downsampling convolutional layer and the sixth downsampling convolutional layer are all 4 x 4, the step lengths are all 1, and the padding is all 1.
CN202211451581.1A 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method Active CN115713680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211451581.1A CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211451581.1A CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method

Publications (2)

Publication Number Publication Date
CN115713680A true CN115713680A (en) 2023-02-24
CN115713680B CN115713680B (en) 2023-07-25

Family

ID=85233817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211451581.1A Active CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method

Country Status (1)

Country Link
CN (1) CN115713680B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246022A (en) * 2023-03-09 2023-06-09 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116612211A (en) * 2023-05-08 2023-08-18 山东省人工智能研究院 Face image identity synthesis method based on GAN and 3D coefficient reconstruction

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122103A1 (en) * 2017-10-24 2019-04-25 International Business Machines Corporation Attention based sequential image processing
CN110197167A (en) * 2019-06-05 2019-09-03 清华大学深圳研究生院 A kind of video actions moving method
CN111368662A (en) * 2020-02-25 2020-07-03 华南理工大学 Method, device, storage medium and equipment for editing attribute of face image
CN111932444A (en) * 2020-07-16 2020-11-13 中国石油大学(华东) Face attribute editing method based on generation countermeasure network and information processing terminal
CN112734634A (en) * 2021-03-30 2021-04-30 中国科学院自动化研究所 Face changing method and device, electronic equipment and storage medium
CN112766160A (en) * 2021-01-20 2021-05-07 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
CN113112411A (en) * 2020-01-13 2021-07-13 南京信息工程大学 Human face image semantic restoration method based on multi-scale feature fusion
CN113627233A (en) * 2021-06-17 2021-11-09 中国科学院自动化研究所 Visual semantic information-based face counterfeiting detection method and device
CN113689328A (en) * 2021-09-13 2021-11-23 中国海洋大学 Image harmony system based on self-attention transformation
WO2021258920A1 (en) * 2020-06-24 2021-12-30 百果园技术(新加坡)有限公司 Generative adversarial network training method, image face swapping method and apparatus, and video face swapping method and apparatus
CN114078172A (en) * 2020-08-19 2022-02-22 四川大学 Text image generation method for progressively generating confrontation network based on resolution
WO2022151535A1 (en) * 2021-01-15 2022-07-21 苏州大学 Deep learning-based face feature point detection method
CN115311720A (en) * 2022-08-11 2022-11-08 山东省人工智能研究院 Defekake generation method based on Transformer

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122103A1 (en) * 2017-10-24 2019-04-25 International Business Machines Corporation Attention based sequential image processing
CN110197167A (en) * 2019-06-05 2019-09-03 清华大学深圳研究生院 A kind of video actions moving method
CN113112411A (en) * 2020-01-13 2021-07-13 南京信息工程大学 Human face image semantic restoration method based on multi-scale feature fusion
CN111368662A (en) * 2020-02-25 2020-07-03 华南理工大学 Method, device, storage medium and equipment for editing attribute of face image
WO2021258920A1 (en) * 2020-06-24 2021-12-30 百果园技术(新加坡)有限公司 Generative adversarial network training method, image face swapping method and apparatus, and video face swapping method and apparatus
CN111932444A (en) * 2020-07-16 2020-11-13 中国石油大学(华东) Face attribute editing method based on generation countermeasure network and information processing terminal
CN114078172A (en) * 2020-08-19 2022-02-22 四川大学 Text image generation method for progressively generating confrontation network based on resolution
WO2022151535A1 (en) * 2021-01-15 2022-07-21 苏州大学 Deep learning-based face feature point detection method
CN112766160A (en) * 2021-01-20 2021-05-07 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
CN112734634A (en) * 2021-03-30 2021-04-30 中国科学院自动化研究所 Face changing method and device, electronic equipment and storage medium
CN113627233A (en) * 2021-06-17 2021-11-09 中国科学院自动化研究所 Visual semantic information-based face counterfeiting detection method and device
CN113689328A (en) * 2021-09-13 2021-11-23 中国海洋大学 Image harmony system based on self-attention transformation
CN115311720A (en) * 2022-08-11 2022-11-08 山东省人工智能研究院 Defekake generation method based on Transformer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHILIANG XU et al.: "StyleSwap: Style-Based Generator Empowers Robust Face Swapping", arXiv, vol. 2022, pages 1 - 21 *
LI HUAN: "Research on a Dairy Goat Image Generation Algorithm Based on Normalized SAGAN", China Master's Theses Full-text Database, Agricultural Science and Technology Series, vol. 2022, no. 1, pages 050 - 152 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246022A (en) * 2023-03-09 2023-06-09 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116246022B (en) * 2023-03-09 2024-01-26 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116612211A (en) * 2023-05-08 2023-08-18 山东省人工智能研究院 Face image identity synthesis method based on GAN and 3D coefficient reconstruction
CN116612211B (en) * 2023-05-08 2024-02-02 山东省人工智能研究院 Face image identity synthesis method based on GAN and 3D coefficient reconstruction

Also Published As

Publication number Publication date
CN115713680B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
Wen et al. Cycle4completion: Unpaired point cloud completion using cycle transformation with missing region coding
Yuan et al. Tokens-to-token vit: Training vision transformers from scratch on imagenet
Liu et al. Convtransformer: A convolutional transformer network for video frame synthesis
CN115713680A (en) Semantic guidance-based face image identity synthesis method
WO2023072067A1 (en) Face attribute editing model training and face attribute editing methods
CN113140020B (en) Method for generating image based on text of countermeasure network generated by accompanying supervision
Li et al. Learning face image super-resolution through facial semantic attribute transformation and self-attentive structure enhancement
CN116309913B (en) Method for generating image based on ASG-GAN text description of generation countermeasure network
CN111161158B (en) Image restoration method based on generated network structure
CN116612211B (en) Face image identity synthesis method based on GAN and 3D coefficient reconstruction
CN116246022B (en) Face image identity synthesis method based on progressive denoising guidance
CN115311720A (en) Defekake generation method based on Transformer
CN113379597A (en) Face super-resolution reconstruction method
Bhunia et al. Word level font-to-font image translation using convolutional recurrent generative adversarial networks
CN112949707A (en) Cross-mode face image generation method based on multi-scale semantic information supervision
CN115063463A (en) Fish-eye camera scene depth estimation method based on unsupervised learning
CN110415261B (en) Expression animation conversion method and system for regional training
CN115909160A (en) Method and device for detecting depth video frame insertion and computer readable storage medium
Endo et al. Few-shot semantic image synthesis using stylegan prior
CN114463214A (en) Double-path iris completion method and system guided by regional attention mechanism
Luan et al. Learning unsupervised face normalization through frontal view reconstruction
CN114155139A (en) Deepfake generation method based on vector discretization representation
Lai et al. Generative focused feedback residual networks for image steganalysis and hidden information reconstruction
CN115496134B (en) Traffic scene video description generation method and device based on multi-mode feature fusion
CN108305219A (en) A kind of image de-noising method based on uncorrelated sparse dictionary

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant