CN113343878A - High-fidelity face privacy protection method and system based on a generative adversarial network - Google Patents
- Publication number
- CN113343878A (application CN202110681374.4A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- identity
- fidelity
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2132—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Abstract
The invention provides a high-fidelity face privacy protection method and system based on a generative adversarial network. The method comprises the following steps: performing face recognition on a source face image to obtain multi-scale identity features of the source face; obtaining at least a target face bounding box and face key-point information from an input target face image; extracting from the detected face key points the specific key points related to pose and/or expression, and building a face key-point connection graph from the extracted key points; and obtaining a high-fidelity face privacy protection image based on a generative adversarial network (GAN), which comprises: taking the target face image and the multi-scale identity features of the source face as input, and generating, with a U-Net neural network structure, a high-fidelity face privacy protection image that has the identity of the source face and the attributes of the target face; and taking the generated image and the face key-point connection graph as joint input to judge the recognition result of the generated image, and optimizing the U-Net neural network structure based on that result.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a high-fidelity face privacy protection method and system based on a generative adversarial network.
Background
Face synthesis and replacement, an emerging computer-vision technology, is attracting increasing attention and has great application value in entertainment, virtual reality, privacy protection, video chat and other visual applications. However, current mainstream face synthesis and replacement models are complex, demand powerful hardware and have long inference times. Moreover, the generated face in each frame often fails to preserve the attributes of that frame's target face (such as pose, expression, skin color, illumination and makeup), causing temporal discontinuity and instability, so the results perform poorly in video applications. Where privacy protection is achieved, the target face image is often severely distorted; a good privacy protection effect with a high-fidelity face is currently difficult to achieve.
Therefore, how to achieve good privacy protection of the target face while keeping high fidelity is a problem to be solved.
Disclosure of Invention
In view of the above, the present invention provides a high-fidelity face privacy protection method and system based on a generative adversarial network, so as to eliminate or mitigate one or more defects of the prior art.
According to one aspect of the invention, a high-fidelity face privacy protection method based on a generative adversarial network is provided. The method comprises the following steps:
a source face identity encoding step: performing face recognition on a source face image to obtain multi-scale identity features of the source face;
a face and key-point detection step: obtaining at least a target face bounding box and face key-point information from an input target face image;
a key-point connection graph obtaining step: extracting the specific key points related to pose and/or expression from the detected face key points, and building the face key-point connection graph from the extracted key points;
obtaining a high-fidelity face privacy protection image based on a generative adversarial network (GAN), comprising:
an image generation step: taking the target face image and the multi-scale identity features of the source face as input, and synthesizing, with a U-Net neural network structure, a high-fidelity face privacy protection image that has the identity of the source face and the attributes of the target face;
a discrimination step: taking the generated high-fidelity face privacy protection image and the face key-point connection graph as joint input, judging the recognition result of the generated image, and optimizing the U-Net neural network structure based on that result.
In some embodiments of the invention, the identity encoding step comprises acquiring the multi-scale identity features of the source face with a k-shot strategy: k identity features are extracted from k source face images and averaged to obtain the final identity feature, and the multi-scale identity features of the source face are then derived from this final identity feature.
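The k-shot averaging and the derivation of multi-scale features can be sketched as follows; the embedding dimension, the cosine-style renormalization and the pooling scheme are illustrative assumptions, since the patent does not fix them:

```python
import numpy as np

def k_shot_identity(embeddings):
    """Average k per-image identity embeddings into one identity vector,
    as in the k-shot strategy described above."""
    E = np.asarray(embeddings, dtype=np.float64)   # shape (k, d)
    mean = E.mean(axis=0)
    # ArcFace-style embeddings are compared by cosine similarity,
    # so re-normalise the average to unit length (an assumption).
    return mean / np.linalg.norm(mean)

def multi_scale_identity(identity, scales=(1, 2, 4)):
    """Derive coarser 'multi-scale' identity features by average-pooling
    the identity vector at progressively lower resolutions (the exact
    pooling scheme is an assumption; the patent only says the resolution
    is reduced stage by stage)."""
    feats = []
    for s in scales:
        d = len(identity) // s
        feats.append(identity[: d * s].reshape(d, s).mean(axis=1))
    return feats
```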
In some embodiments of the invention, the image generation step comprises: an identity migration step, obtaining a first adaptive output feature from the multi-scale identity features of the source face image through instance normalization, layer normalization and learned parameters; a pose-expression control step, obtaining a second adaptive output feature of the target face image through instance normalization, layer normalization and parameters learned from the face key-point connection graph of the target face image; and generating, with the U-Net neural network structure, a high-fidelity face privacy protection image that has the identity of the source face and the attributes of the target face, based on the first and second adaptive output features.
In some embodiments of the invention, the first adaptive output feature conforms to the following equation:

$$M_{out}^{ID} = \gamma_{ID}\,\big(\rho\, M_{IN} + (1-\rho)\, M_{LN}\big) + \beta_{ID}$$

and the second adaptive output feature conforms to the following equation:

$$M_{out}^{PE} = \gamma_{PE}\,\big(\rho\, M_{IN} + (1-\rho)\, M_{LN}\big) + \beta_{PE}$$

where

$$M_{IN}^{c,x,y} = \frac{M^{c,x,y} - \mu_{IN}^{c}}{\sigma_{IN}^{c}},\qquad M_{LN}^{c,x,y} = \frac{M^{c,x,y} - \mu_{LN}}{\sigma_{LN}},\qquad M \in \mathbb{R}^{H\times W\times C}.$$

Here M is the feature of the input image and R denotes the real number domain; H and W are the height and width of the feature map, and C is the number of feature channels. $\mu_{IN}^{c}$ and $\sigma_{IN}^{c}$ are the mean and standard deviation of each instance in the c-th channel used for instance normalization; $\mu_{LN}$ and $\sigma_{LN}$ are the mean and standard deviation used for layer normalization; $\rho$ is a learnable blending parameter. $\gamma_{ID}$ and $\beta_{ID}$ are parameters learned from the multi-scale identity features; $\gamma_{PE}$ and $\beta_{PE}$ are parameters learned from the key-point connection graph $B_t$. $M_{IN}^{c,x,y}$ is the value of $M_{IN}$ at position (x, y) of the c-th channel, $M_{LN}^{c,x,y}$ is the value of $M_{LN}$ at position (x, y) of the c-th channel, and $\mu_{LN}$ is the layer-normalization mean over all channels.
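The two adaptive normalization branches can be illustrated with a small numerical sketch; the channel-first layout, the single-sample form and the epsilon value are assumptions for illustration:

```python
import numpy as np

def adaptive_norm(M, gamma, beta, rho, eps=1e-5):
    """Adaptive layer-instance normalisation for one sample.
    M: feature map of shape (C, H, W); gamma, beta: per-channel parameters
    (learned from identity features for the ID branch, or from the key-point
    connection graph B_t for the PE branch); rho in [0, 1] blends IN and LN."""
    C, H, W = M.shape
    # Instance normalisation: statistics per channel over (H, W).
    mu_in = M.mean(axis=(1, 2), keepdims=True)
    sigma_in = M.std(axis=(1, 2), keepdims=True)
    M_in = (M - mu_in) / (sigma_in + eps)
    # Layer normalisation: statistics over all of (C, H, W).
    mu_ln = M.mean()
    sigma_ln = M.std()
    M_ln = (M - mu_ln) / (sigma_ln + eps)
    mixed = rho * M_in + (1.0 - rho) * M_ln
    return gamma.reshape(C, 1, 1) * mixed + beta.reshape(C, 1, 1)
```

With rho = 1 the result reduces to pure instance normalization; with rho = 0 to pure layer normalization.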
In some embodiments of the invention, the method further comprises: determining a face region from the obtained target face bounding box and computing an affine transformation matrix to align the face; a fusion step, adjusting the generated high-fidelity face privacy protection image with a mirrored Sigmoid mask and fusing it with the target image, so that the outer-edge pixels of the generated face mainly use the pixels of the target face while its interior keeps the generated pixels; and pasting the adjusted image back to the corresponding position of the target image or video using the affine transformation matrix.
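The fusion step can be sketched as follows; the mask profile, steepness and margin values are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def mirror_sigmoid_mask(h, w, steepness=20.0, margin=0.2):
    """Build a 'mirrored sigmoid' blending mask: ~1 in the face interior,
    falling smoothly to ~0 at the borders."""
    def profile(n):
        x = np.linspace(0.0, 1.0, n)
        rise = 1.0 / (1.0 + np.exp(-steepness * (x - margin)))
        fall = 1.0 / (1.0 + np.exp(-steepness * ((1.0 - x) - margin)))
        return rise * fall                      # mirrored, symmetric profile
    return np.outer(profile(h), profile(w))

def blend(generated, target, mask):
    """Fuse the generated face with the target frame: interior pixels come
    from the generated face, outer-edge pixels from the target."""
    m = mask[..., None] if generated.ndim == 3 else mask
    return m * generated + (1.0 - m) * target
```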
In some embodiments of the invention, the method further comprises: the discrimination module optimizes the U-Net neural network using an adversarial loss function, a perceptual loss function, an identity loss function and a reconstruction loss function,
where G denotes the output of the generator, D_i the i-th discrimination result of the discriminator, and n the number of discrimination results; F_t denotes the target face image, F_s the source face image, and B_t the face key-point connection graph; E[·] denotes the expected value, Y the generated face image, and λ₁ a weight value, which may be set to 0.1; H and W denote the height and width of the feature map, and C the number of feature channels; F^(s) and Y^(s) are the features of F_t and Y at the s-th layer respectively, m is the number of layers, and C_s, H_s and W_s are the number of channels, width and height of the s-th layer; FG^(s) and YG^(s) are the Gram matrices of F^(s) and Y^(s); FG^(s)_{i,k} is the value in the i-th row and k-th column of the Gram matrix of the features of F_t at the s-th layer, FG^(s)_{j,k} the value in its j-th row and k-th column, YG^(s)_{i,k} the value in the i-th row and k-th column of the Gram matrix of the features of Y at the s-th layer, and YG^(s)_{j,k} the value in its j-th row and k-th column.
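The non-adversarial loss terms named above can be sketched in a common form; the exact formulas of the original publication are not reproduced here, so the Gram-matrix normalization, the cosine form of the identity loss and the L1 reconstruction term are assumptions:

```python
import numpy as np

def gram(F):
    """Gram matrix of a feature map F with shape (C, H, W)."""
    C, H, W = F.shape
    M = F.reshape(C, H * W)
    return (M @ M.T) / (C * H * W)

def style_loss(feats_target, feats_gen):
    """Sum of squared Gram-matrix differences over the chosen layers,
    a common form of the perceptual (style) loss term."""
    return sum(np.sum((gram(a) - gram(b)) ** 2)
               for a, b in zip(feats_target, feats_gen))

def identity_loss(z_source, z_gen):
    """1 - cosine similarity between source and generated identity embeddings."""
    return 1.0 - float(np.dot(z_source, z_gen) /
                       (np.linalg.norm(z_source) * np.linalg.norm(z_gen)))

def reconstruction_loss(y, y_ref):
    """Pixel-wise L1 reconstruction term."""
    return float(np.abs(y - y_ref).mean())
```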
In some embodiments of the present invention, the source face image or the target face image comprises a still image or a moving image, the moving image comprising video frames; the attributes of the target face comprise at least one of: pose, expression, skin color, illumination and makeup of the target face. The method further comprises: inputting the synthesized high-fidelity face privacy protection images at different resolutions into a plurality of discrimination modules with the same network structure, and optimizing the U-Net neural network structure of the generation module based on the discrimination results.
According to another aspect of the present invention, there is also provided a high-fidelity face privacy protection system based on a generative adversarial network. The system comprises a processor and a memory storing computer instructions; when the processor executes the instructions, the system implements the steps of the method described above.
In a further aspect of the invention, a computer-readable storage medium is also provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The high-fidelity face privacy protection method and system based on a generative adversarial network of the invention synthesize the whole face region, hair and background well, so that they can be easily fused into the target frame.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Fig. 1 is a schematic flow chart of a high-fidelity face privacy protection method based on a generative adversarial network in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a system module structure of a high-fidelity face privacy protection method in an embodiment of the present invention.
Fig. 3 is a schematic block diagram of an identity migration module according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a gesture expression control module according to an embodiment of the present invention.
Fig. 5 is a schematic flow chart of a high-fidelity face privacy protection method in another embodiment of the present invention.
Fig. 6a, 6b and 6c are schematic diagrams of a mirror sigmoid mask and 2D and 3D visualizations thereof, respectively, according to an embodiment of the invention.
Fig. 7 is a graph comparing the effect of the method of the present invention with prior-art methods.
Fig. 8 is a schematic implementation flow diagram of a high-fidelity face privacy protection method in another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
To obtain a high-fidelity image that fully protects face privacy, the invention provides a high-fidelity face privacy protection method based on a generative adversarial network.
Fig. 1 is a schematic flow chart of the high-fidelity face privacy protection method based on a generative adversarial network in an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps.
and a source face identity coding step S110, wherein the source face image is subjected to face recognition to obtain the multi-scale identity characteristics of the source face.
The module implementing this step may be called the identity encoding module; it performs face recognition on the source face image with a pre-trained face recognition network and extracts multi-scale identity features from the source face.
As an example, the face recognition network employed is the currently popular ArcFace. The multi-scale identity feature may be an identity feature of multiple resolution scales.
A face and key-point detection step S120: obtaining at least a target face bounding box and face key-point information from the input target face image.
The module implementing this step may be called the face and key-point detection module. More specifically, it may use the pre-trained mainstream SFD algorithm to detect the face bounding box, and the mainstream 2D-FAN algorithm to detect the face key points. These models have few parameters, short inference times and low video memory requirements.
A key-point connection graph obtaining step S130: extracting the specific key points related to pose and/or expression from the target face key points obtained in step S120, and obtaining the face key-point connection graph from the extracted key points.
For example, the face key-point connection graph B_t can be obtained by connecting the specific key points related to pose and/or expression with line segments; the resulting graph is therefore itself associated with pose and/or expression.
Since pose and expression can be considered to be described mainly by the opening and closing of the eyes, the position of the eyeballs, the direction of the nose bridge and the mouth shape, the key points selected in this embodiment of the present invention as closely related to pose and expression include, for example, the eyes, eyeballs, mouth and nose bridge. Selecting these fewer key points reduces the influence on the identity of the face as much as possible.
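The selection of pose/expression key points and the rasterization of the connection graph can be sketched as follows; the 68-point landmark indices are a conventional assumption, not values specified by the patent:

```python
import numpy as np

# Illustrative subset of the common 68-point landmark convention: nose
# bridge, eyes and mouth outline (the patent names eyes, eyeballs, mouth
# and nose bridge; the exact indices here are an assumption).
POSE_EXPR_GROUPS = {
    "nose_bridge":  list(range(27, 31)),
    "right_eye":    list(range(36, 42)) + [36],
    "left_eye":     list(range(42, 48)) + [42],
    "outer_mouth":  list(range(48, 60)) + [48],
}

def draw_connection_graph(landmarks, size):
    """Rasterise the key-point connection graph B_t: straight segments
    between consecutive selected key points on a blank canvas."""
    img = np.zeros((size, size), dtype=np.float32)
    for idx in POSE_EXPR_GROUPS.values():
        for a, b in zip(idx[:-1], idx[1:]):
            p, q = np.asarray(landmarks[a]), np.asarray(landmarks[b])
            n = int(max(abs(q - p))) + 1
            for t in np.linspace(0.0, 1.0, n + 1):
                x, y = np.round(p + t * (q - p)).astype(int)
                if 0 <= x < size and 0 <= y < size:
                    img[y, x] = 1.0
    return img
```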
In step S140, a high-fidelity face privacy protection image is obtained based on a generative adversarial network (GAN).
A generative adversarial network (GAN) is a neural network learning model in which a generator G and a discriminator D continuously play a game through which G learns the distribution of the data. During training, the generator G aims to generate pictures as realistic as possible to fool the discriminator D, while the discriminator D aims to distinguish the fake images generated by G from real images as well as possible. Generator and discriminator thus form a dynamic game; at its end the fake samples produced by the generator are almost indistinguishable from real samples, the discriminator can no longer tell them apart, generator and discriminator reach equilibrium, and training is finished.
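The two objectives of this game can be written as a minimal numerical sketch using the standard binary cross-entropy form (the patent does not fix the exact adversarial loss, so this form is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_loss(real_logits, fake_logits):
    """Discriminator objective: classify real images as real (label 1)
    and generated images as fake (label 0), via binary cross-entropy."""
    return float(-np.mean(np.log(sigmoid(real_logits) + 1e-12))
                 - np.mean(np.log(1.0 - sigmoid(fake_logits) + 1e-12)))

def g_loss(fake_logits):
    """Generator objective: fool the discriminator into labelling
    generated images as real."""
    return float(-np.mean(np.log(sigmoid(fake_logits) + 1e-12)))
```

When the discriminator is confidently right, d_loss is near zero and g_loss is large, which is exactly the pressure that drives the generator toward more realistic samples.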
Based on the principle of the generative adversarial network, in the embodiment of the present invention step S140 of obtaining a high-fidelity face privacy protection image with a GAN comprises an image generation step S141 and a discrimination step S142.
The image generation step S141 may be implemented by the generator (generation module) of the generative adversarial network. It takes the target face image F_t and the multi-scale identity features of the source face F_s as input and uses a U-Net neural network structure to generate (synthesize) a high-fidelity face image that carries the identity of the source face and the attributes of the target face. Replacing the target face with this generated face protects the target face; the generated face may therefore also be called a high-fidelity face privacy protection image.
The U-Net neural network is a semantic segmentation network that improves on the fully convolutional network (FCN). Its structure comprises two roughly symmetric parts: the first part uses 3x3 convolutions and pooling-based downsampling and captures context information in the image (i.e., the relations between pixels); the second part is essentially symmetric to the first, using 3x3 convolutions and upsampling to produce the output segmentation. U-Net also uses feature fusion: the features of the earlier downsampling path are fused with those of the later upsampling path to obtain more accurate context information and hence a better segmentation result. The U-Net model is simple, lightweight and compact, and reduces the complexity of face generation compared with existing face replacement models.
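The encoder-decoder topology with skip connections can be demonstrated at the shape level; convolutions are omitted and the depth is an illustrative choice:

```python
import numpy as np

def downsample(x):
    """2x average pooling over (H, W) of a (C, H, W) feature map."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_skeleton(x, depth=2):
    """Shape-level sketch of a U-Net: the encoder path stores feature maps,
    and the decoder path upsamples and concatenates the stored encoder
    features (skip connections) at each resolution. Real convolutions are
    omitted; this only demonstrates the topology."""
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):
        x = upsample(x)
        x = np.concatenate([x, skip], axis=0)   # skip connection
    return x
```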
The attributes of the target face include at least one of the following attributes: pose, expression, skin tone, lighting, and makeup of the target face.
In the discrimination step S142, the high-fidelity face privacy protection image synthesized in step S141 and the face key-point connection graph are used as joint input to judge the recognition result of the generated image, and the U-Net neural network structure is optimized based on that result.
The discrimination step S142 may be implemented by the discriminator (discrimination module) of the generative adversarial network. By concatenating the high-fidelity face privacy protection image generated in step S141 with the face key-point connection graph as input, the discrimination module perceives the pose and/or expression of the target face. That is, besides the authenticity of the image produced by the generation module, it also checks whether the pose and/or expression of the synthesized face, such as the positions of the facial features, are consistent with those of the target face, thereby providing explicit supervision of the pose and expression of the face.
After the high-fidelity face privacy protection image has been synthesized with the generative adversarial network (GAN), it may replace the target face in the target image.
In some embodiments of the present invention, to facilitate the face replacement, a face region cropping step may precede step S140: a face region is cropped based on the target face bounding box obtained by the face and key-point detection module, and the subsequent image generation step synthesizes the face image from the cropped region.
The face and key-point detection module may further detect the source face bounding box to crop the face region of the source face image, so that the image generation module synthesizes the face image from the cropped source and target face regions.
In addition, in practice the target face image may be tilted and poorly aligned with the source face. To improve the fidelity of the synthesized face, the invention aligns the cropped target face region with an affine transformation matrix AM before the generation step is performed.
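The affine alignment can be sketched as a least-squares fit of a 2x3 matrix from detected key points to a canonical template; the template points below are placeholders, not values from the patent:

```python
import numpy as np

def affine_from_points(src, dst):
    """Least-squares 2x3 affine matrix AM mapping src key points onto dst
    (e.g. detected landmarks onto a canonical aligned-face template)."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    A = np.hstack([src, np.ones((len(src), 1))])     # (N, 3) homogeneous
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)      # (3, 2)
    return M.T                                       # (2, 3)

def apply_affine(M, pts):
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]
```

Inverting the 3x3 homogeneous form of AM gives the transform used to paste the adjusted face back into the original frame.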
With the high-fidelity face privacy protection method based on a generative adversarial network of the present invention, the generated face preserves the attributes of the target face, including pose and expression, with high fidelity while carrying the identity of the source face. Replacing the target face in a video or picture with a face generated by this method is simple to implement and effectively protects the private information of the target face.
In this embodiment of the present invention, the source face identity encoding step S110 may further include: and acquiring the multi-scale identity characteristics of the source face by adopting a k-shot strategy.
The step of obtaining the multi-scale identity characteristics of the source face by adopting the k-shot strategy can comprise the following steps:
(1) k identity features are obtained based on k source face images, and the obtained k identity features are averaged to obtain the final identity feature.
The k source face images may be static source face images or source face video frames. The k source images or video frames are input into the identity encoding module simultaneously to obtain k identity features, and the final identity feature is then obtained by averaging.
(2) Obtaining the multi-scale identity features of the source face based on the final identity feature.
Here, multi-scale identity features are identity features at different resolution scales; from the final identity feature, the resolution can be reduced stage by stage to obtain multi-scale identity features at different resolutions.
Obtaining the identity features of the source face by k-shot averaging further improves the accuracy and comprehensiveness of the identity encoding. The k-shot strategy effectively avoids the interference and defects that extreme conditions in a single image or video frame would cause in identity extraction, and the richer face identity information improves the face swapping effect for both static images and videos.
In addition, in the embodiment of the present invention, to further improve the accuracy of the face conversion and guarantee high fidelity of the synthesized face, the invention also constrains the pose and/or expression of the generated face while ensuring accurate identity conversion. To this end, the image generation step S141 may further comprise an identity migration step and a pose-expression control step that employ an adaptive normalization technique.
In the identity migration step, the source face image undergoes not only instance normalization (IN) but also layer normalization (LN), which involves the feature channels jointly; the multi-scale identity features of the source face are fed in as well for learning, yielding the corrected source face identity feature output. The part of the generation module implementing this step may be called the identity migration module.
In the pose-expression control step, the target face image likewise undergoes both instance normalization and layer normalization, and the face key-point connection graph is fed in for learning, yielding target face features corrected for the pose and expression of the face. The part of the generation module implementing this step may be called the pose-expression control module.
In the identity migration module and the pose-expression control module, let M ∈ R^{H×W×C} denote the features of the input image, where H and W are the height and width of the feature map, C is the number of feature channels and R is the real number domain. As shown in Figs. 3 and 4, M is first passed through an instance normalization (IN) layer and a layer normalization (LN) layer respectively, giving M_IN and M_LN:

$$M_{IN}^{c,x,y}=\frac{M^{c,x,y}-\mu_{IN}^{c}}{\sigma_{IN}^{c}},\qquad M_{LN}^{c,x,y}=\frac{M^{c,x,y}-\mu_{LN}}{\sigma_{LN}}$$

Instance normalization computes the mean and variance over the (H, W) dimensions for each channel of each instance; layer normalization computes them over the (C, H, W) dimensions of each instance (each image).

In these formulas, $\mu_{IN}^{c}$ and $\sigma_{IN}^{c}$ are the mean and standard deviation of each instance in the c-th channel used by IN, $\mu_{LN}$ and $\sigma_{LN}$ are the mean and standard deviation used by LN, $M_{IN}^{c,x,y}$ is the value of $M_{IN}$ at position (x, y) of the c-th channel, and $M_{LN}^{c,x,y}$ is the value of $M_{LN}$ at position (x, y) of the c-th channel. The two outputs $M_{IN}$ and $M_{LN}$ are then combined with a learnable parameter $\rho$, and adaptive normalization is achieved with the parameters $\gamma$ and $\beta$:

$$M_{out}=\gamma\,\big(\rho\,M_{IN}+(1-\rho)\,M_{LN}\big)+\beta$$
More specifically, for the identity migration module the output features are:

$$M_{out}^{ID}=\gamma_{ID}\,\big(\rho\,M_{IN}+(1-\rho)\,M_{LN}\big)+\beta_{ID}$$

and for the pose-expression control module the output features are:

$$M_{out}^{PE}=\gamma_{PE}\,\big(\rho\,M_{IN}+(1-\rho)\,M_{LN}\big)+\beta_{PE}$$

For the identity migration module, $\gamma_{ID}$ and $\beta_{ID}$ are learned from the multi-scale identity features; for the pose-expression control module, $\gamma_{PE}$ and $\beta_{PE}$ are learned from the key-point connection graph $B_t$. Fig. 3 shows an exemplary flow in which the input features (source face image) undergo instance normalization and layer normalization in the identity migration module and are then combined with the ID features (the identity features of the source face image) and ID vectors (identity features at different resolution scales) to obtain the corrected source face feature output. Fig. 4 shows an exemplary flow in which the input features (target face image) undergo instance normalization and layer normalization in the pose-expression control module and are then combined with the key-point connection graph $B_t$ to obtain the corrected target face feature output. In Figs. 3 and 4, ⊕ denotes element-wise addition, ⊗ denotes element-wise multiplication, ⓒ denotes the concatenation (merging) operation, GAP is global average pooling, and Linear denotes a fully connected layer.
In the embodiment of the invention, as described above, instance normalization is computed channel by channel and ignores the correlations between channels, while layer normalization does not. By introducing both instance and layer normalization, the identity migration module achieves accurate identity conversion, and the pose-expression control module uses the face key-point information to correct the pose and expression of the synthesized face so that they match the target face. Introducing layer normalization into the generation step greatly reduces or eliminates the asymmetry problems of the synthesized face images.
Fig. 2 is a schematic diagram of a system module structure for implementing the high-fidelity face privacy protection method in an embodiment of the present invention. In Fig. 2, each ID block corresponds to the processing of the identity migration module shown in Fig. 3, with ID blocks at different levels corresponding to different ID resolution feature scales; each PE block corresponds to the processing of the pose-expression control module shown in Fig. 4, with PE blocks at different levels corresponding to facial-landmark connecting-line graphs of different resolutions. The identity migration module is applied at higher levels of the generation module, and the pose-expression control module at lower levels. The rationale is that identity features carry more abstract and richer semantic information and should therefore be used mainly at higher feature levels (low-resolution feature maps) to shape identity, whereas the face keypoints and their connecting-line graph should be used at lower feature levels (high-resolution feature maps) to correct contour details.
As shown in Fig. 2, the discrimination module perceives pose and/or expression by taking the face keypoint connecting-line graph B_t and the face image synthesized by the generation module as joint input, judges whether the image is a synthesized image or an original image, and optimizes the U-net neural network structure based on the recognition result, thereby weakly supervising the pose and expression of the generated face.
Further, in the embodiment of the present invention, the discrimination module may adopt a multi-scale design. Synthesized face images at different scales (different resolutions) are fed into several discrimination modules with the same network structure, and the U-net neural network structure of the generation module is optimized based on each discrimination result. The discrimination module that receives the large-scale image has a larger receptive field and a more global view, and therefore guides the generation module toward globally consistent images; the discrimination module that receives the small-scale image, in turn, guides the generation module toward better image detail.
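The multi-scale input described above can be illustrated with a simple image pyramid. The patent does not specify the downsampler; 2x2 average pooling is assumed here purely for illustration, and each level of the returned pyramid would be fed to one of the identically structured discriminators.

```python
import numpy as np

def downsample2x(img):
    # 2x2 average pooling (an assumed downsampler, not specified in the patent).
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def image_pyramid(img, n_scales=3):
    # Each scale goes to a discriminator with the same network structure:
    # the full-resolution copy supports global consistency, the smaller
    # copies help supervise fine image detail.
    scales = [img]
    for _ in range(n_scales - 1):
        scales.append(downsample2x(scales[-1]))
    return scales
```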
Furthermore, the discrimination module can also use its multi-scale features to help preserve attributes of the target face such as skin color, illumination, makeup, and occlusion.
More specifically, the discrimination module uses a perceptual loss function to preserve the attributes of the target face (such as skin color, illumination, makeup, and occlusion) without affecting identity conversion. The perceptual loss function comprises an occlusion-perception part and a style-perception part, and the multi-scale features of the discrimination module assist in retaining these attributes of the target face.
In an embodiment of the present invention, the perceptual loss function is defined as:
wherein H and W represent the height and width of the feature map, respectively, and C represents the number of feature channels; F_s and Y_s are the features of F_t and Y at layer s, respectively, and m represents the number of layers; C_s, H_s and W_s represent the number of channels, the width and the height at layer s, respectively; FG_s and YG_s are the Gram matrices of F_s and Y_s; FG^s_{ik} is the value at row i, column k of the Gram matrix of the features of F_t at layer s, and FG^s_{jk} the value at row j, column k; YG^s_{ik} is the value at row i, column k of the Gram matrix of the features of Y at layer s, and YG^s_{jk} the value at row j, column k.
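The formula referenced above is rendered as a figure in the original publication and does not survive text extraction. From the symbol definitions just given, the style-perception part appears to be the standard Gram-matrix style loss; the following is only a hedged reconstruction consistent with those definitions (the occlusion-perception part and any layer weights are not recoverable from the text and are omitted):

```latex
\mathcal{L}_{\mathrm{style}}
  = \sum_{s=1}^{m} \frac{1}{C_s^{2}}
    \sum_{i=1}^{C_s} \sum_{j=1}^{C_s}
    \left( FG^{s}_{ij} - YG^{s}_{ij} \right)^{2},
\qquad
FG^{s}_{ij} = \frac{1}{C_s H_s W_s}
              \sum_{k=1}^{H_s W_s} F^{s}_{ik}\, F^{s}_{jk}.
```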
In addition, in the preferred embodiment of the present invention, the discrimination module further uses an adversarial loss function to implement a two-player minimax game, where the adversarial loss function is defined as:
wherein G represents the output of the generator, D_i represents the i-th discrimination result of the discriminator, n represents the number of discrimination results, F_t represents the target face image, F_s represents the source face image, B_t represents the facial-landmark connecting-line graph, and E represents the expectation value.
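The adversarial loss formula is likewise rendered as a figure in the original. Given the symbols defined above (n discrimination results D_i, a generator G conditioned on F_s, F_t and B_t), one plausible multi-scale minimax form, offered only as a reconstruction and not as the patent's exact formula, is:

```latex
\mathcal{L}_{\mathrm{adv}}
  = \frac{1}{n} \sum_{i=1}^{n}
    \Big( \mathbb{E}\big[\log D_i(F_t, B_t)\big]
        + \mathbb{E}\big[\log\big(1 - D_i(G(F_s, F_t, B_t), B_t)\big)\big] \Big).
```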
In addition, the discrimination module uses an identity loss function to accurately convert the identity of the generated face into the source face identity; the identity loss function is:
wherein cos(·,·) represents the cosine similarity of two identity features; IDEnc(Y) represents the identity features of the synthesized face Y generated by the generation module, and IDEnc(F_s) represents the identity features of the source face. The identity features of the synthesized face Y and the source face F_s are both extracted by the identity encoder (identity encoding module).
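A cosine-similarity identity term of this kind can be sketched as follows; the common `1 - cos` form is an assumption, since the exact combination used in the patent's loss is rendered as a figure.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(., .) between two identity embedding vectors.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_loss(id_generated, id_source):
    # Zero when the generated face carries exactly the source identity,
    # growing as the two embeddings diverge.
    return 1.0 - cosine_similarity(id_generated, id_source)
```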
In addition, the discrimination module also utilizes a reconstruction loss function to help preserve the hair and part of the facial occlusions of the target face. The reconstruction loss function is defined as the square of the Euclidean distance between the target face and the generated face:
wherein the left-hand side is the reconstruction loss value and λ_1 is a weight value, which may be set to 0.1.
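As stated, the reconstruction loss is the squared Euclidean distance between target and generated faces, weighted by λ_1 = 0.1; a direct sketch:

```python
import numpy as np

def reconstruction_loss(target_face, generated_face, lam=0.1):
    # Squared Euclidean distance between target and generated face pixels,
    # weighted by lambda_1 (0.1 in the embodiment described above).
    diff = (np.asarray(target_face, dtype=float)
            - np.asarray(generated_face, dtype=float))
    return lam * float(np.sum(diff ** 2))
```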
In the embodiment of the invention, by introducing the adversarial, identity, reconstruction and perceptual loss functions, the discrimination module can preserve the various attributes of the target face without affecting identity conversion.
In addition, in order to eliminate the obvious boundary between the outer edge and the inner region of the synthesized face after face replacement, the method of the embodiment of the invention further comprises an image fusion step: a mirror-image Sigmoid mask is used to adjust the synthesized fidelity face privacy protection image, so that the outer-edge pixels of the generated face mainly take the pixels of the target face while the interior retains the pixels of the generated face.
More specifically, the mask formula employed is as follows:
where x denotes the abscissa of a synthesized face pixel, y denotes its ordinate, mid denotes half the image size, α controls the slope of the function and may, as an example, be set to 0.08, and θ denotes a variation factor and may, as an example, be set to 0.7; the present invention is not limited to these values.
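The mask formula itself is rendered as a figure in the original and is not recoverable from the text, so the sketch below implements one *plausible* mirror-image Sigmoid mask with the stated parameters; its exact functional form is an assumption for illustration only.

```python
import numpy as np

def mirror_sigmoid_mask(size, alpha=0.08, theta=0.7):
    # A guessed "mirror-image Sigmoid" mask (NOT the patent's exact formula):
    # a 1-D sigmoid rising from each border toward the centre, mirrored about
    # mid = size / 2, taken as the outer product of the x and y profiles and
    # scaled by the variation factor theta.
    mid = size / 2.0
    t = np.arange(size, dtype=float)
    profile = 1.0 / (1.0 + np.exp(-alpha * (mid - np.abs(t - mid) - mid / 2.0)))
    return theta * np.outer(profile, profile)

def blend(generated, target, mask):
    # Inner pixels keep the generated face; outer-edge pixels fall back to
    # the target face, giving a smooth transition without a visible seam.
    return mask * generated + (1.0 - mask) * target
```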
After the generated face pixels have been processed with the mirror-image Sigmoid mask, they can be pasted directly onto the corresponding position of the target image or video frame, either as-is or after being transformed with the previously computed affine transformation matrix AM of the target face. The visual effect of the masking process is shown in Fig. 6: Figs. 6a, 6b and 6c show the mirror-image Sigmoid mask, its 2D visualization and its 3D visualization, respectively. After image fusion, the outer edge and the inner region of the synthesized face transition smoothly, with no obvious boundary.
Fig. 5 is a schematic flow chart of a high-fidelity face privacy protection method in another embodiment of the present invention, as shown in fig. 5, the method in this embodiment includes the following steps:
step S1, the face and key point detection module obtains a face bounding box and face key points of a target image (a still image or a video frame), cuts out a target face region, obtains target face position information and target face key point information, and further calculates an affine transformation matrix AM using an OpenCV function to align the target face.
Step S2: connect the eye, mouth and nose-bridge points among the target face keypoints, respectively, to obtain the target face keypoint connecting-line graph B_t.
Step S3: perform face identity feature encoding on the recognized source face image to acquire the multi-scale identity features of the source face.
The multi-scale identity features of the source face can, for example, be extracted from k source images or video frames through a k-shot strategy.
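The k-shot strategy (per claim 2: extract k identity features from k source images and average them into a final identity feature) can be sketched as follows; the L2 normalization before and after averaging is an assumption, as the patent only specifies the averaging itself.

```python
import numpy as np

def kshot_identity(embeddings):
    # Average k per-frame identity embeddings into one final identity feature.
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # assumed normalization
    mean = e.mean(axis=0)
    return mean / np.linalg.norm(mean)
```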
Step S4: feed the target face, the keypoint connecting-line graph and the multi-scale identity features of the source face into the generation module to obtain a synthesized face image.
Step S5: take the synthesized face image and the face keypoint connecting-line graph as joint input, judge whether the fidelity face privacy protection image is a synthesized image or an original image, and optimize the U-net neural network structure based on the recognition result.
This step exists only during the training phase, not during the actual application phase.
Step S6: adjust the pixel values of the generated face with the mirror-image Sigmoid mask and fuse the synthesized face with the target video frame, so that the outer-edge pixels of the generated face mainly take the pixels of the target face while the interior retains the generated pixels.
Step S7: using the affine transformation matrix AM of the target face computed in step S1, paste the processed generated face directly back onto the corresponding position of the target image or video frame.
Fig. 8 shows a schematic flow chart of a corresponding implementation. As shown in Fig. 8, the target face attribute features and the face keypoint connecting-line graph are obtained from the target frame, and face pixel alignment is performed by computing the affine transformation matrix AM. The final face identity features are then obtained by averaging over several source face video frames; a high-fidelity face is synthesized by the generative adversarial network from the obtained face identity features, target face attribute features and keypoint connecting-line graph; and image fusion further improves the continuity and consistency of the face-swapped images.
The high-fidelity face privacy protection method based on the generation countermeasure network can well synthesize the whole face area, hair and background, so that the face area, the hair and the background can be easily fused into a target frame.
In order to verify the effect of the invention, its face-swapping results were compared with other open-source face synthesis and replacement methods on the FaceForensics++ dataset; the comparison is shown in Fig. 7. Both FaceSwap and DeepFakes suffer from blending inconsistency because they follow a strategy of first synthesizing an inner face region and then blending it into the target face. The present invention, by contrast, synthesizes the whole face region, hair and background well, so the result is easily fused into the target frame. In addition, the existing methods do not preserve attributes of the target face (e.g., pose, expression, skin color, illumination, makeup). Inconsistency between the pose and expression of the generated face and those of the target face reduces the temporal continuity of the results, while deviations in illumination, skin color and makeup reduce their temporal stability and realism. The invention preserves these face attributes well and thus achieves higher fidelity. Table 1 shows a quantitative comparison of the invention with other state-of-the-art methods (Nirkin, IP-GAN, FaceShifter, DeepFakes, FaceSwap and FS-GAN) using the following common criteria: identity retrieval (ID), domain-invariant perceptual distance (DIPD), pose distance (pos), expression distance (exp), structural similarity (SSIM), average endpoint error (AEE) and flow warping error (FWE).
Table 1. Quantitative comparison with other face-swapping methods (↑ indicates larger is better, ↓ indicates smaller is better).
As is apparent from Fig. 7 and Table 1, the present invention outperforms the other existing methods in identity conversion, attribute preservation, realism, and temporal continuity and stability.
In accordance with the foregoing method, there is also provided a high fidelity face privacy protection system based on generation of a countermeasure network, the system comprising a processor and a memory, the memory having stored therein computer instructions for execution by the processor, the system implementing the steps of the foregoing method when the computer instructions are executed by the processor.
As another embodiment, the high fidelity face privacy protection system based on generation of a countermeasure network of the present invention may comprise:
a target face input unit, wherein the target face image information is input in the form of video or still picture;
a source face input unit, wherein the source face image information is input in the form of video or still picture;
the high-fidelity face privacy protection unit, which takes the target face information and the source face information as input, synthesizes a face according to the high-fidelity face privacy protection method based on a generative adversarial network (the synthesized face retaining all attributes of the target face with high fidelity while carrying the identity of the source face), and replaces the target face in the video or picture with the generated face, thereby protecting the privacy information of the target face. The high-fidelity face privacy protection unit may comprise:
the source face identity coding module is used for executing the source face identity coding step;
the face and key point detection module is used for executing the face and key point detection step;
the key point wiring diagram acquisition module is used for executing the key point wiring diagram acquisition step;
an image generation module for performing the image generation step;
and the judging module is used for executing the judging step.
In addition, the high-fidelity face privacy protection unit can further comprise a fusion module used for executing a fusion step, adjusting the generated fidelity face privacy protection image by the mirror image Sigmoid mask, and fusing the synthesized face with the target video frame.
Software implementing embodiments of the present invention may be disposed in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of tangible storage medium known in the art.
Accordingly, the present disclosure also relates to a storage medium as above, on which a computer program code may be stored, which when executed may implement various embodiments of the method of the present invention.
It should be noted that the exemplary embodiments of the present invention describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A high-fidelity face privacy protection method based on a generation countermeasure network is characterized by comprising the following steps:
a source face identity coding step, which is used for carrying out face recognition on a source face image and acquiring the multi-scale identity characteristics of a source face;
a face and key point detection step, which is used for at least obtaining a target face boundary box and face key point information based on an input target face image;
a key point connecting line graph obtaining step, which is used for extracting specific key points related to gestures and/or expressions in the obtained face key points and obtaining the face key point connecting line graph based on the extracted specific key points;
obtaining a fidelity face privacy protection image based on a generation countermeasure network GAN, comprising:
an image generation step of taking the target face image and the multi-scale identity features of the source face as input and synthesizing, with a Unet neural network structure, a fidelity face privacy protection image having the identity of the source face and the attributes of the target face;
and a discrimination step of taking the fidelity face privacy protection image and the face key point connecting line graph as common input, obtaining a recognition result for the fidelity face privacy protection image, and optimizing the Unet neural network structure based on the recognition result.
2. The method of claim 1, wherein the identity encoding step comprises: acquiring multi-scale identity characteristics of a source face by adopting a k-shot strategy; the method for acquiring the multi-scale identity characteristics of the source face by adopting the k-shot strategy comprises the following steps:
acquiring k identity characteristics based on k source face images, and averaging the acquired k identity characteristics to obtain final identity characteristics;
and obtaining the multi-scale identity characteristics of the source face based on the final identity characteristics.
3. The method of claim 1, wherein the image generating step comprises:
an identity migration step of obtaining a first adaptive output feature of the source face image through instance normalization, layer normalization and deep learning based on the multi-scale identity features of the source face image;
a pose and expression control step of obtaining a second adaptive output feature of the target face image through instance normalization, layer normalization and deep learning based on the face key point connecting line graph of the target face image;
and generating, with the Unet neural network structure, a fidelity face privacy protection image having the identity of the source face and the attributes of the target person based on the first adaptive output feature and the second adaptive output feature.
4. The method of claim 3,
the first adaptive output characteristic conforms to the following equation:
the second adaptive output characteristic conforms to the following equation:
wherein the content of the first and second substances,
M∈RH×W×C;
wherein M is the feature of the input image and R represents the real number domain; H and W represent the height and width of the feature map, respectively, and C represents the number of feature channels; μ_IN and σ_IN are the mean and standard deviation for instance normalization, with μ_c and σ_c² denoting the mean and variance of each instance in the c-th channel; μ_LN and σ_LN are the mean and standard deviation for layer normalization; γ_ID and β_ID are parameters learned from the multi-scale identity features; γ_PE and β_PE are parameters learned from the keypoint connecting-line graph B_t; M^IN_{c,(x,y)} is the value of M_IN at position (x, y) of the c-th channel, M^LN_{c,(x,y)} is the value of M_LN at position (x, y) of the c-th channel, and μ^LN_c is the layer-normalization mean over the c-th channel.
5. The method of claim 1, further comprising:
determining a face area based on the obtained target face bounding box, and calculating an affine transformation matrix to align the face;
a fusion step, namely adjusting the generated fidelity face privacy protection image by using a mirror image Sigmoid mask, and fusing the synthesized fidelity face privacy protection image with a target image, so that the outer edge pixels of the generated face mainly use the pixels of the target face, and the inner part of the generated face keeps the pixels of the generated face;
and pasting the adjusted fidelity face privacy protection image back to the corresponding position of the target image or the video by utilizing the affine transformation matrix.
6. The method of claim 1, further comprising:
the discrimination module optimizes the Unet neural network using a countering loss function, a perceptual loss function, an identity loss function, and a reconstruction loss function.
7. The method of claim 6,
wherein G represents the output of the generator, D_i represents the i-th discrimination result of the discriminator, n represents the number of discrimination results, F_t represents the target face image, F_s represents the source face image, B_t represents the face keypoint connecting-line graph, E represents the expectation value, Y represents the generated face image, and λ_1 represents a weight value; H and W represent the height and width of the feature map, respectively, and C represents the number of feature channels; F_s and Y_s are the features of F_t and Y at layer s, respectively, and m represents the number of layers; C_s, H_s and W_s represent the number of channels, the width and the height at layer s, respectively; FG_s and YG_s are the Gram matrices of F_s and Y_s; FG^s_{ik} is the value at row i, column k of the Gram matrix of the features of F_t at layer s, and FG^s_{jk} the value at row j, column k; YG^s_{ik} is the value at row i, column k of the Gram matrix of the features of Y at layer s, and YG^s_{jk} the value at row j, column k.
8. The method of claim 1, wherein the source or target facial image comprises a still image or a moving image, the moving image comprising a video frame;
the attributes of the target face comprise at least one of the following attributes: pose, expression, skin color, lighting and makeup of the target face;
the method further comprises the following steps: and inputting the synthesized fidelity human face privacy protection images with different resolutions into a plurality of discrimination modules with the same network structure, and optimizing the Unet neural network structure of the generation module based on the discrimination result.
9. A high fidelity face privacy protection system based on a generative confrontation network, the system comprising a processor and a memory, wherein the memory has stored therein computer instructions for execution by the processor, the system implementing the steps of the method of any one of claims 1 to 8 when the computer instructions are executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110681374.4A CN113343878A (en) | 2021-06-18 | 2021-06-18 | High-fidelity face privacy protection method and system based on generation countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110681374.4A CN113343878A (en) | 2021-06-18 | 2021-06-18 | High-fidelity face privacy protection method and system based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113343878A true CN113343878A (en) | 2021-09-03 |
Family
ID=77477793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110681374.4A Pending CN113343878A (en) | 2021-06-18 | 2021-06-18 | High-fidelity face privacy protection method and system based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113343878A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113919998A (en) * | 2021-10-14 | 2022-01-11 | 天翼数字生活科技有限公司 | Image anonymization method based on semantic and attitude map guidance |
CN114581612A (en) * | 2022-04-28 | 2022-06-03 | 南京理工大学 | High-fidelity face reproduction method represented by mixed actions |
CN114842003A (en) * | 2022-07-04 | 2022-08-02 | 杭州健培科技有限公司 | Medical image follow-up target pairing method, device and application |
CN115272534A (en) * | 2022-07-29 | 2022-11-01 | 中国电信股份有限公司 | Face image protection method, protection device, electronic equipment and readable storage medium |
WO2023040679A1 (en) * | 2021-09-16 | 2023-03-23 | 百果园技术(新加坡)有限公司 | Fusion method and apparatus for facial images, and device and storage medium |
CN117196937A (en) * | 2023-09-08 | 2023-12-08 | 天翼爱音乐文化科技有限公司 | Video face changing method, device and storage medium based on face recognition model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190286950A1 (en) * | 2018-03-16 | 2019-09-19 | Ebay Inc. | Generating a digital image using a generative adversarial network |
CN111242837A (en) * | 2020-01-03 | 2020-06-05 | 杭州电子科技大学 | Face anonymous privacy protection method based on generation of countermeasure network |
CN111553235A (en) * | 2020-04-22 | 2020-08-18 | 支付宝(杭州)信息技术有限公司 | Network training method for protecting privacy, identity recognition method and device |
CN111815509A (en) * | 2020-09-02 | 2020-10-23 | 北京邮电大学 | Image style conversion and model training method and device |
CN112084962A (en) * | 2020-09-11 | 2020-12-15 | 贵州大学 | Face privacy protection method based on generation type countermeasure network |
CN112215927A (en) * | 2020-09-18 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for synthesizing face video |
-
2021
- 2021-06-18 CN CN202110681374.4A patent/CN113343878A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190286950A1 (en) * | 2018-03-16 | 2019-09-19 | Ebay Inc. | Generating a digital image using a generative adversarial network |
CN111242837A (en) * | 2020-01-03 | 2020-06-05 | 杭州电子科技大学 | Face anonymous privacy protection method based on generation of countermeasure network |
CN111553235A (en) * | 2020-04-22 | 2020-08-18 | 支付宝(杭州)信息技术有限公司 | Network training method for protecting privacy, identity recognition method and device |
CN111815509A (en) * | 2020-09-02 | 2020-10-23 | 北京邮电大学 | Image style conversion and model training method and device |
CN112084962A (en) * | 2020-09-11 | 2020-12-15 | 贵州大学 | Face privacy protection method based on generation type countermeasure network |
CN112215927A (en) * | 2020-09-18 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for synthesizing face video |
Non-Patent Citations (1)
Title |
---|
LONGHAO ZHANG 等: "AP-GAN: Improving Attribute Preservation in Video Face Swapping", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY(EARLY ACCESS)》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023040679A1 (en) * | 2021-09-16 | 2023-03-23 | 百果园技术(新加坡)有限公司 | Fusion method and apparatus for facial images, and device and storage medium |
CN113919998A (en) * | 2021-10-14 | 2022-01-11 | 天翼数字生活科技有限公司 | Image anonymization method based on semantic and attitude map guidance |
WO2023060918A1 (en) * | 2021-10-14 | 2023-04-20 | 天翼数字生活科技有限公司 | Image anonymization method based on guidance of semantic and pose graphs |
CN113919998B (en) * | 2021-10-14 | 2024-05-14 | 天翼数字生活科技有限公司 | Picture anonymizing method based on semantic and gesture graph guidance |
CN114581612A (en) * | 2022-04-28 | 2022-06-03 | 南京理工大学 | High-fidelity face reproduction method represented by mixed actions |
CN114581612B (en) * | 2022-04-28 | 2022-08-02 | 南京理工大学 | High-fidelity face reproduction method represented by mixed actions |
CN114842003A (en) * | 2022-07-04 | 2022-08-02 | 杭州健培科技有限公司 | Medical image follow-up target pairing method, device and application |
CN114842003B (en) * | 2022-07-04 | 2022-11-01 | 杭州健培科技有限公司 | Medical image follow-up target pairing method, device and application |
CN115272534A (en) * | 2022-07-29 | 2022-11-01 | 中国电信股份有限公司 | Face image protection method, protection device, electronic equipment and readable storage medium |
CN115272534B (en) * | 2022-07-29 | 2024-02-02 | 中国电信股份有限公司 | Face image protection method, protection device, electronic equipment and readable storage medium |
CN117196937A (en) * | 2023-09-08 | 2023-12-08 | 天翼爱音乐文化科技有限公司 | Video face changing method, device and storage medium based on face recognition model |
CN117196937B (en) * | 2023-09-08 | 2024-05-14 | 天翼爱音乐文化科技有限公司 | Video face changing method, device and storage medium based on face recognition model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113343878A (en) | High-fidelity face privacy protection method and system based on generation countermeasure network | |
JP7476428B2 (en) | Image line of sight correction method, device, electronic device, computer-readable storage medium, and computer program | |
CN110363116B (en) | Irregular human face correction method, system and medium based on GLD-GAN | |
He et al. | Photo-realistic monocular gaze redirection using generative adversarial networks | |
WO2022095721A1 (en) | Parameter estimation model training method and apparatus, and device and storage medium | |
CN111489287A (en) | Image conversion method, image conversion device, computer equipment and storage medium | |
WO2022156626A1 (en) | Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product | |
CN112733795B (en) | Method, device and equipment for correcting sight of face image and storage medium | |
CN110381268B (en) | Method, device, storage medium and electronic equipment for generating video | |
CN111445410A (en) | Texture enhancement method, device and equipment based on texture image and storage medium | |
WO2019032604A1 (en) | Adjusting a digital representation of a head region | |
CN113807265B (en) | Diversified human face image synthesis method and system | |
CN111931908B (en) | Face image automatic generation method based on face contour | |
CN115689869A (en) | Video makeup migration method and system | |
CN112101320A (en) | Model training method, image generation method, device, equipment and storage medium | |
CN115131492A (en) | Target object relighting method and device, storage medium and background replacement method | |
CN117095128A (en) | Priori-free multi-view human body clothes editing method | |
GB2612881A (en) | Techniques for re-aging faces in images and video frames | |
CN113887329A (en) | Head posture positioning and detecting method and application and system thereof | |
WO2024104144A1 (en) | Image synthesis method and apparatus, storage medium, and electrical device | |
CN111275778B (en) | Face simple drawing generation method and device | |
CN112825188A (en) | Occlusion face completion algorithm for generating confrontation network based on deep convolution | |
CN116342377A (en) | Self-adaptive generation method and system for camouflage target image in degraded scene | |
CN114120391B (en) | Multi-pose face recognition system and method thereof | |
CN115393471A (en) | Image processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20210903 |