CN114981835A - Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium - Google Patents

Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN114981835A
Authority
CN
China
Prior art keywords
face
image
face image
network model
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080002537.5A
Other languages
Chinese (zh)
Inventor
卢运华
张丽杰
陈冠男
刘瀚文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Publication of CN114981835A publication Critical patent/CN114981835A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A training method and device for a face reconstruction model, a face reconstruction method and device, an electronic device and a readable storage medium are provided. The training method of the face reconstruction model comprises: obtaining training data (101); inputting the first face image into a first network model to obtain a second face image (102); inputting the target face image and the second face image into a second network model to obtain a discrimination result (103); obtaining a first loss function corresponding to the first network model, and adjusting parameters of the first network model according to the first loss function (104); obtaining a second loss function corresponding to the second network model, and adjusting parameters of the second network model according to the second loss function (105); performing the above steps alternately to train the first network model and the second network model in turn (106); and taking the trained first network model as a face reconstruction model (107). The scheme can improve the accuracy and definition of face reconstruction.

Description

Training method and device for face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to the field of face reconstruction technologies, and in particular, to a training method and apparatus for a face reconstruction model, an electronic device, and a readable storage medium.
Background
Face reconstruction refers to a technology for reconstructing the face image of a person contained in video data, and is widely applied in person identification, tracking and similar tasks. In the related art, the accuracy and definition of face reconstruction are poor.
Disclosure of Invention
The embodiment of the disclosure provides a training method and a training device for a face reconstruction model, and a face reconstruction method and a face reconstruction device, so as to solve the problem of poor accuracy and definition of face reconstruction.
In a first aspect, an embodiment of the present disclosure provides a training method for a face reconstruction model, including the following steps:
acquiring training data, wherein the training data comprises a target face image and a first face image corresponding to the target face image, and the definition of the first face image is smaller than that of the target face image;
inputting the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
inputting the target face image and the second face image into a second network model to obtain a discrimination result, wherein the second network model is a discrimination network model that takes a face image as input and outputs a discrimination result of the authenticity of the input face image, and the discrimination result comprises discrimination results of the overall authenticity and the authenticity of local features of the input face image;
acquiring a first loss function corresponding to the first network model, and adjusting parameters of the first network model according to the first loss function;
acquiring a second loss function corresponding to the second network model, and adjusting parameters of the second network model according to the second loss function;
alternately carrying out the steps to carry out model training on the first network model and the second network model in turn;
and taking the trained first network model as a face reconstruction model, wherein under the condition of training completion, the values of the first loss function and the second loss function reach corresponding target threshold values.
In some embodiments, the second loss function comprises a first discriminative opponent loss, the second network model comprises a global discriminative subnetwork;
the obtaining of the second loss function corresponding to the second network model includes:
marking the second face image as false, marking the target face image as true, and respectively inputting the second face image and the target face image into the global discrimination sub-network to respectively obtain a first discrimination result and a second discrimination result;
and obtaining the first discrimination countermeasure loss according to the first discrimination result and the second discrimination result.
In some embodiments, the second loss function comprises a second discriminant pair loss and a third discriminant pair loss, the second network model further comprising an eye discriminant subnetwork and a mouth discriminant subnetwork;
the obtaining of the second loss function corresponding to the second network model includes:
obtaining a corresponding first eye image and a corresponding first mouth image according to the second face image;
obtaining a corresponding second eye image and a second mouth image according to the target face image;
marking the first eye image and the first mouth image as false, marking the second eye image and the second mouth image as true, and respectively inputting the first eye image and the second eye image into the eye judgment subnetwork to respectively output a third judgment result and a fourth judgment result; the first mouth image and the second mouth image are respectively input into the mouth judging sub-network and respectively output a fifth judging result and a sixth judging result;
obtaining a second judgment countermeasure loss according to the third judgment result and the fourth judgment result;
and obtaining a third discrimination countermeasure loss according to the fifth discrimination result and the sixth discrimination result.
In some embodiments, the first loss function includes a first sub-loss and a second sub-loss;
the obtaining of the first loss function corresponding to the first network model includes:
acquiring a first face bitmap and a second face bitmap corresponding to the target face image;
analyzing the second face image to obtain a third face bitmap and a fourth face bitmap corresponding to the second face image, wherein the first face bitmap and the second face bitmap correspond to different regions of the same face image, the first face bitmap and the third face bitmap correspond to the same region of different face images, and the second face bitmap and the fourth face bitmap correspond to the same region of different face images;
obtaining the first sub-loss according to the difference between the first face bitmap and the third face bitmap;
and obtaining the second sub-loss according to the difference between the second face bitmap and the fourth face bitmap.
In some embodiments, the first face bitmap comprises a facial-feature (organ) image of the face image, and the second face bitmap comprises a skin image of the face image.
In some embodiments, the first loss function includes a third sub-loss;
the obtaining of the first loss function corresponding to the first network model further includes:
acquiring first feature point data corresponding to the target face image;
analyzing the second face image to obtain second feature point data corresponding to the second face image;
and obtaining the third sub-loss according to the difference between the first characteristic point data and the second characteristic point data.
In some embodiments, the first feature point data comprises a heat map of the target face image and the second feature point data comprises a heat map of the second face image, wherein the heat map comprises one or more of a left eye heat map, a right eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image.
In some embodiments, the first loss function includes a fourth sub-loss;
the obtaining of the first loss function corresponding to the first network model further includes:
acquiring a first feature vector corresponding to the target face image;
acquiring a second feature vector corresponding to the second face image;
and obtaining the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
In some embodiments, the first loss function includes a fifth sub-loss;
the obtaining of the first loss function corresponding to the first network model further includes:
and obtaining the fifth sub-loss according to the difference between the target face image and the second face image.
In some embodiments, the first loss function comprises one or more of a sixth sub-loss and a seventh sub-loss;
the obtaining a first loss function corresponding to the first network model further includes:
taking the perception loss of the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and/or
And the perception loss according to the difference between the mouth region image of the target face image and the mouth region image of the second face image is used as the seventh sub-loss.
In some embodiments, the first loss function includes an eighth sub-loss;
the obtaining of the first loss function corresponding to the first network model further includes:
and obtaining the eighth sub-loss according to a generated countermeasure loss between the first network model and the second network model, wherein the second network model comprises one or more of a global discrimination sub-network, an eye discrimination sub-network and a mouth discrimination sub-network, and the generated countermeasure loss is determined according to the obtained discrimination result after the second face image output by the first network model is marked as true and then the second face image is input into one or more of the global discrimination sub-network, the eye discrimination sub-network and the mouth discrimination sub-network.
In a second aspect, an embodiment of the present disclosure provides a face reconstruction method, including the following steps:
acquiring an input image;
inputting the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained by performing model training through the training method of the face reconstruction model in any one of the first aspect.
In a third aspect, an embodiment of the present disclosure provides a training apparatus for a face reconstruction model, including:
the training data acquisition module is used for acquiring training data, wherein the training data comprises a target face image and a first face image corresponding to the target face image, and the definition of the first face image is smaller than that of the target face image;
the first input module is used for inputting the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
the second input module is used for inputting the target face image and the second face image into a second network model to obtain a discrimination result, wherein the second network model is a discrimination network model that takes a face image as input and outputs a discrimination result of the authenticity of the input face image, and the discrimination result comprises discrimination results of the overall authenticity and the authenticity of local features of the input face image;
a first loss function obtaining module, configured to obtain a first loss function corresponding to the first network model, and adjust a parameter of the first network model according to the first loss function;
a second loss function obtaining module, configured to obtain a second loss function corresponding to the second network model, and adjust a parameter of the second network model according to the second loss function;
the training module is used for performing model training on the first network model and the second network model in turn;
and the face reconstruction model confirmation module is used for taking the trained first network model as a face reconstruction model, wherein under the condition of finishing training, the values of the first loss function and the second loss function reach corresponding target threshold values.
In a fourth aspect, an embodiment of the present disclosure provides a face reconstruction apparatus, including:
the input image acquisition module is used for acquiring an input image;
and the input module is used for inputting the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained by performing model training through the training method of the face reconstruction model in any one of the first aspect.
In a fifth aspect, the disclosed embodiments provide an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the training method for a face reconstruction model according to any one of the first aspect, or implements the steps of the face reconstruction method according to the second aspect.
In a sixth aspect, the disclosed embodiments provide a readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the training method for a face reconstruction model according to any one of the first aspect, or implements the steps of the face reconstruction method according to the second aspect.
In the embodiments of the present disclosure, a generative adversarial network comprising the first network model and the second network model is established to train the two models, and the discrimination result of the second network model covers both the overall authenticity and the authenticity of local features of the input face image. This improves the accuracy with which the output of the first network model is judged, and therefore improves the accuracy of the trained reconstruction model for face image reconstruction, speeds up iteration, and improves model training efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required to be used in the description of the embodiments of the present disclosure will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art that other drawings may be obtained according to the drawings without inventive labor.
Fig. 1 is a flowchart of a training method of a face reconstruction model according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a face reconstruction method according to an embodiment of the present disclosure;
fig. 3 is a structural diagram of a training apparatus for a face reconstruction model according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a face reconstruction apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The embodiment of the disclosure provides a training method of a human face reconstruction model.
As shown in fig. 1, in an embodiment, the training method of the face reconstruction model includes the following steps:
step 101: training data is acquired.
The training data in this embodiment is also referred to as a training set. The training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image. The face images may come from video data or from photos.
The target face image and the first face image in this embodiment may both be provided directly by the training data. Alternatively, only the target face image (i.e. the face image with higher definition) is provided, and the definition of the target face image is then reduced to generate the first face image; this is also called degrading the target face image. After degradation, the definition of the target face image is reduced, yielding a first face image whose definition is lower than that of the target face image.
In the embodiments of the present disclosure, "definition" may refer to how distinct each detail and its boundary are in an image; the higher the definition, the better the perceived effect for the human eye. The definition of the output image is higher than that of the input image when, for example, the input image is processed by the image processing method provided by the embodiments of the present disclosure (such as denoising, super-resolution and/or deblurring), so that the output image obtained after processing is clearer than the input image.
In one embodiment, the target face image may be degraded by one or more of adding noise to the image, gaussian blur, adjusting brightness and contrast of the image, scaling the image, morphing the image, adding a motion blur effect to the image.
It should be understood that the target face image is of relatively high quality: it may, for example, have suitable brightness and contrast, a proper image scale, no motion blur and high image quality. In implementation, the target face image can be degraded by reducing or increasing the brightness and contrast, distorting the image scale, and so on, so as to obtain the first face image, that is, a face image with lower definition.
Thus, training data of the target face image and the first face image can be obtained.
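As an illustration only, the following is a minimal sketch (in PyTorch, assuming the target face image is a tensor with values in [0, 1]) of how several of the degradation operations mentioned above could be chained to derive the first face image; the specific operators, scale factor and noise level are assumptions for this sketch, not values given by the disclosure.

```python
import torch
import torch.nn.functional as F

def degrade(target_face: torch.Tensor, scale: float = 0.25,
            noise_std: float = 0.02, brightness: float = 0.9) -> torch.Tensor:
    """Produce a lower-definition first face image from a target face image.

    target_face: (N, C, H, W) tensor with values in [0, 1].
    The chain below (downscale -> upscale, additive noise, brightness change)
    is only one possible degradation; magnitudes are illustrative assumptions.
    """
    n, c, h, w = target_face.shape
    # Scale the image down and back up to lose detail.
    low = F.interpolate(target_face, scale_factor=scale, mode='bilinear',
                        align_corners=False)
    degraded = F.interpolate(low, size=(h, w), mode='bilinear',
                             align_corners=False)
    # Add Gaussian noise.
    degraded = degraded + noise_std * torch.randn_like(degraded)
    # A simple global scaling stands in for a brightness/contrast adjustment.
    degraded = degraded * brightness
    return degraded.clamp(0.0, 1.0)
```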
Step 102: and inputting the first face image into a first network model to obtain a second face image.
The first network model in this embodiment is a generated network model that takes a face image as input and takes a reconstructed image of the input face image as output.
The first network model in this embodiment serves as a generator, which processes and reconstructs the input first face image. The first network model performs deblurring or resolution-enhancement processing on the first face image to obtain the second face image; in other words, the second face image is the reconstruction result of the first network model for the first face image.
Step 103: and inputting the target face image and the second face image into a second network model to obtain a judgment result.
In this embodiment, the second network model is a discrimination network model that takes a face image as an input and takes a discrimination result of authenticity of the input face image as an output.
The second network model in this embodiment is equivalent to a discriminator, and the first network model and the second network model together form a generation countermeasure network for model training.
The second network model discrimination result comprises discrimination results of the integral authenticity of the input face image and the authenticity of the local features, wherein the integral authenticity refers to the discrimination result of the input face image from the global angle of the face image, and the authenticity of the local features refers to the discrimination result of the local detail features of the face image.
Generally, the output of the second network model, acting as the discriminator, is a value between 0 and 1: the closer the discrimination result is to 1, the higher the authenticity judged by the second network model; conversely, the closer the discrimination result is to 0, the lower the authenticity judged by the second network model.
Step 104: and acquiring a first loss function corresponding to the first network model, and adjusting parameters of the first network model according to the first loss function.
Step 105: and acquiring a second loss function corresponding to the second network model, and adjusting parameters of the second network model according to the second loss function.
Step 106: and alternately carrying out the steps to alternately carry out model training on the first network model and the second network model.
Further, a first loss function corresponding to the first network model and a second loss function corresponding to the second network model are respectively established, and parameters of the corresponding first network model or the second network model are adjusted according to the established loss functions to carry out model training on the first network model and the second network model.
In this embodiment, the process of alternately training the first network model and the second network model may be adjusted. For example, the first network model may be trained once, the second network model may be trained once, the first network model may be trained once again, and so on; the first network model may also be trained multiple times, followed by one training of the second network model, followed by multiple training of the first network model, and so on. Obviously, the training manner for the first network model and the second network model in the present embodiment is not limited to this.
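For illustration, a minimal sketch of the simplest alternating schedule described above (one discriminator update followed by one generator update per batch) is given below in PyTorch; `generator`, `discriminator`, `generator_loss` and `discriminator_loss` are placeholders for the first network model, the second network model and the loss functions of steps 104 and 105, not implementations provided by the disclosure.

```python
import torch

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

for first_face, target_face in train_loader:           # step 101
    second_face = generator(first_face)                 # step 102

    # Step 105: update the second network model (discriminator).
    opt_d.zero_grad()
    d_loss = discriminator_loss(discriminator, second_face.detach(), target_face)
    d_loss.backward()
    opt_d.step()

    # Step 104: update the first network model (generator).
    opt_g.zero_grad()
    g_loss = generator_loss(generator, discriminator, second_face, target_face)
    g_loss.backward()
    opt_g.step()
```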
Step 107: and taking the trained first network model as a face reconstruction model.
In this embodiment, when the values of the first loss function and the second loss function both reach the corresponding target thresholds, or the first loss function and the second loss function are both converged, it is considered that the first network model is trained, and the trained first network model is the face reconstruction model meeting the face reconstruction requirements. It should be noted that the target threshold value here may be set according to actual conditions, and may be, for example, a minimum value or a maximum value that the first loss function or the second loss function can reach.
In the embodiments of the present disclosure, a generative adversarial network comprising the first network model and the second network model is established to train the two models, and the discrimination result of the second network model covers both the overall authenticity and the authenticity of local features of the input face image. This improves the accuracy with which the output of the first network model is judged, and therefore improves the accuracy of the trained reconstruction model for face image reconstruction, speeds up iteration, and improves model training efficiency.
In some embodiments, the first penalty function comprises a first sub-penalty and a second sub-penalty, and step 104 comprises:
acquiring a first face bitmap and a second face bitmap corresponding to the target face image;
analyzing the second face image to obtain a third face bitmap and a fourth face bitmap corresponding to the second face image;
obtaining the first sub-loss according to the difference between the first face bitmap and the third face bitmap;
and obtaining the second sub-loss according to the difference between the second face bitmap and the fourth face bitmap.
In this embodiment, the first face bitmap and the second face bitmap corresponding to the target face image may be directly provided by the training data, or may be obtained by analyzing the target face image. And the third face bitmap and the fourth face bitmap corresponding to the second face image are obtained by analyzing the second face image.
Analyzing a face image to obtain the corresponding face bitmaps can be realized with a pre-trained face analysis model, which may be an existing or improved model such as RoI Tanh-warping (Face Parsing with RoI Tanh-Warping); this is not further limited here.
In this embodiment, the first face bitmap and the second face bitmap correspond to different regions of the same face image, and the regions corresponding to the first face bitmap and the third face bitmap are the same, in other words, the first face bitmap corresponds to a certain region, such as an eye region, of the target face image, and the third face bitmap corresponds to an eye region of the second face image, and similarly, the second face bitmap and the fourth face bitmap correspond to the same region of the face image.
In some embodiments, the first face bitmap comprises a facial-feature (organ) image of the face image, and the second face bitmap comprises a skin image of the face image.
That is, the first face bitmap and the third face bitmap correspond to images of five sense organs in the face image, which are also referred to as organ maps in this embodiment, and the organ may be five sense organs such as mouth, nose, eyes, eyebrows, and ears. The second face bitmap and the fourth face bitmap correspond to skin regions other than the five sense organs.
By respectively obtaining the first sub-loss corresponding to the organ map and the second sub-loss corresponding to the skin map, the reconstruction results of the first network model for the organ region and the skin region can be respectively determined, so that the adjustment precision of the first network model is improved, and the model training efficiency is improved.
It should be understood that each sub-loss in this embodiment may be calculated in different ways; for example, the L1 loss between the first face bitmap and the third face bitmap may be used as the first sub-loss, or the L2 loss between the first face bitmap and the third face bitmap may be used as the first sub-loss. The L1 loss refers to the least absolute deviation (LAD), and the L2 loss refers to the least square error (LSE); for the specific calculation methods, reference may be made to the related art, which is not repeated here.
In this embodiment, the first sub-loss is described as the L2 loss between the organ map of the target face image generated by the face analysis model and the organ map of the second face image output by the first network model; the first sub-loss is denoted as L2_feat in this embodiment.
In this embodiment, the second sub-loss is taken as the L2 loss between the skin map of the target face image generated by the face analysis model and the skin map of the second face image output by the first network model; the second sub-loss is denoted as L2_skin in this embodiment.
It should be understood that the organ map and the skin map of a face image are themselves images, and therefore reflect, from a person's visual and subjective perspective, how similar the output result of the first network model looks to the target face image.
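As an illustration only, the first and second sub-losses could be computed as in the sketch below, where `parse_face` stands for a pre-trained face analysis (parsing) model that is assumed to return an organ map and a skin map for a face image; the function name and its outputs are assumptions made for this sketch.

```python
import torch.nn.functional as F

def parsing_losses(target_face, second_face, parse_face):
    """L2_feat and L2_skin between the parsing maps of the two images."""
    organ_t, skin_t = parse_face(target_face)   # first and second face bitmaps
    organ_s, skin_s = parse_face(second_face)   # third and fourth face bitmaps
    l2_feat = F.mse_loss(organ_s, organ_t)      # first sub-loss
    l2_skin = F.mse_loss(skin_s, skin_t)        # second sub-loss
    return l2_feat, l2_skin
```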
In some embodiments, the first loss function includes a third sub-loss, and the step 104 further includes:
acquiring first feature point data corresponding to the target face image;
analyzing the second face image to obtain second feature point data corresponding to the second face image;
and obtaining the third sub-loss according to the difference between the first characteristic point data and the second characteristic point data.
In this embodiment, the face alignment analysis is performed according to the feature points in the target face image and the second face image, and the process of the face alignment analysis may be understood as extracting first feature point data of the target face image through a face alignment model, then extracting second feature point data of the second face image, comparing the extracted first feature point data with the extracted second feature point data, and determining a third sub-loss according to a difference value of the extracted first feature point data and the extracted second feature point data.
The analysis of the feature point data may be understood as comparing the similarity between the output result of the first network model and the real face corresponding to the input image from a numerical point of view.
In some embodiments, the feature point data of the face image can be extracted through coordinate regression, the speed is high, and the calculation amount is small.
In some other embodiments, the feature point data comprises a heat map of the face image, the heat map of the face image comprising one or more of a left eye heat map, a right eye heat map, a nose heat map, a mouth heat map, and a face contour heat map of the face image. For example, the left-eye heat map refers to a heat map composed of key points located in a region corresponding to the left eye, the face contour heat map refers to a heat map composed of key points corresponding to regions outside each organ, and so on, to generate a plurality of partial heat maps constituting a face image. By generating a plurality of local heat maps constituting the face image, it is helpful to further improve the accuracy of the feature point data calculation for the face image.
In this embodiment, key points are first determined, and their number may be set as required; for example, a 68-point heat map is selected. Next, n heat maps are output, one per key point, i.e. 68 heat maps in this embodiment. Finally, either the point with the highest peak value in each heat map is taken as the key point, or the contribution of each pixel in the heat map is weighted and summed to obtain the key-point coordinates.
The calculation accuracy can be further improved by obtaining feature point data of the face image based on the heat map regression.
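The two keypoint-recovery strategies mentioned above (taking the peak of each heat map, or weighting every pixel's contribution) can be sketched as follows; this is an illustrative soft-argmax style computation under assumed tensor shapes, not code from the disclosure.

```python
import torch

def keypoints_from_heatmaps(heatmaps: torch.Tensor, use_peak: bool = False):
    """heatmaps: (N, K, H, W), one channel per key point (e.g. K = 68).

    Returns (N, K, 2) key-point coordinates in pixel units.
    """
    n, k, h, w = heatmaps.shape
    if use_peak:
        # Take the highest-peak pixel of each heat map as the key point.
        flat = heatmaps.view(n, k, -1)
        idx = flat.argmax(dim=-1)
        ys = torch.div(idx, w, rounding_mode='floor').float()
        xs = (idx % w).float()
        return torch.stack([xs, ys], dim=-1)
    # Otherwise weight every pixel's contribution by its (normalized) value.
    weights = heatmaps.view(n, k, -1).softmax(dim=-1).view(n, k, h, w)
    xs_grid = torch.arange(w, dtype=weights.dtype, device=weights.device).view(1, 1, 1, w)
    ys_grid = torch.arange(h, dtype=weights.dtype, device=weights.device).view(1, 1, h, 1)
    xs = (weights * xs_grid).sum(dim=(-1, -2))
    ys = (weights * ys_grid).sum(dim=(-1, -2))
    return torch.stack([xs, ys], dim=-1)
```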
In implementation, a pre-trained face alignment model is used; the face alignment model may be, for example, AWing (Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression, ICCV 2019), and reference may be made to the related art.
Then, the face alignment model is used to obtain first feature point data of the target face image, namely the heat map of the target face image, and the face alignment model is used to obtain second feature point data of the second face image, namely the heat map of the second face image.
And finally, obtaining a third sub-loss according to the difference between the first feature point data and the second feature point data. In this embodiment, the third sub-loss is the L2 loss between the heat map of the target face image and the heat map of the second face image, denoted as L2_heatmap.
In some embodiments, the first loss function includes a fourth sub-loss, and the step 104 further includes:
acquiring a first feature vector corresponding to the target face image;
acquiring a second feature vector corresponding to the second face image;
and obtaining the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
In this embodiment, feature analysis is also performed on the target face image and the second face image. Specifically, the feature vector of the target face image is calculated first, then the feature vector of the second face image is calculated, and finally the fourth sub-loss is determined according to the difference between the two feature vectors.
In this embodiment, the cosine similarity of the two feature vectors is calculated and subtracted from 1 to serve as the loss corresponding to the feature analysis; the fourth sub-loss is denoted as LCosSimilarity in this embodiment.
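A one-line sketch of the fourth sub-loss, assuming `embed` is a face recognition network that maps an image to a feature vector (the name and its interface are assumptions):

```python
import torch.nn.functional as F

def cos_similarity_loss(target_face, second_face, embed):
    """LCosSimilarity = 1 - cosine similarity of the two feature vectors."""
    v1 = embed(target_face)   # first feature vector
    v2 = embed(second_face)   # second feature vector
    return 1.0 - F.cosine_similarity(v1, v2, dim=1).mean()
```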
In some embodiments, the first loss function includes a fifth sub-loss, and the step 104 further includes:
and obtaining the fifth sub-loss according to the difference between the target face image and the second face image.
In this embodiment, an L2 loss between the target face image and the face reconstruction image output by the first network model is further introduced as the fifth sub-loss. In implementation, the difference between the target face image and the second face image can be determined through a pre-trained face recognition model, which may be an existing or improved face recognition model such as ArcFace (Additive Angular Margin Loss for Deep Face Recognition). The fifth sub-loss is denoted as L2_0 in this embodiment.
In some embodiments, the first loss function comprises one or more of a sixth sub-loss and a seventh sub-loss, and the step 104 further comprises:
taking the perception loss of the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and/or
And the perception loss according to the difference between the mouth region image of the target face image and the mouth region image of the second face image is used as the seventh sub-loss.
In this embodiment, the eye region image and the mouth region image are further analyzed. The perceptual loss between the target face image and the second face image over the eye region image is determined and taken as the sixth sub-loss, denoted as L2_eye; the perceptual loss between the target face image and the second face image over the mouth region image is determined and taken as the seventh sub-loss, denoted as L2_mouth.
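A sketch of the sixth and seventh sub-losses is given below, using VGG16 features as the perceptual feature extractor; the choice of VGG16, the layer depth, and the cropping helper `crop_region` are assumptions made for illustration, as the disclosure does not fix the form of the perceptual loss.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen feature extractor used to compute perceptual distances (assumption).
vgg_features = models.vgg16(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Distance between deep features of two image crops."""
    return F.mse_loss(vgg_features(a), vgg_features(b))

def eye_mouth_losses(target_face, second_face, crop_region):
    # crop_region(image, name) is a hypothetical helper returning the crop.
    l2_eye = perceptual_loss(crop_region(target_face, "eye"),
                             crop_region(second_face, "eye"))      # sixth sub-loss
    l2_mouth = perceptual_loss(crop_region(target_face, "mouth"),
                               crop_region(second_face, "mouth"))  # seventh sub-loss
    return l2_eye, l2_mouth
```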
In some embodiments, the first loss function includes an eighth sub-loss, and the step 104 further includes:
obtaining the eighth sub-loss according to a generation countermeasure loss between the first network model and the second network model.
In implementation, the second face image output by the first network model is firstly marked as true, specifically, for example, it is marked as 1, then the second face image is input into one or more of the global discrimination sub-network, the eye discrimination sub-network and the mouth discrimination sub-network, and then the corresponding discrimination result is obtained, the obtained discrimination result is a numerical value between 0 and 1, and according to the difference between the discrimination result and 1, the generated countermeasure loss between the first network model and the second network model is obtained and is marked as the eighth sub-loss LG.
In this embodiment, the second network model includes one or more of a global discrimination subnetwork, an eye discrimination subnetwork, and a mouth discrimination subnetwork.
After the second face image marked as true is input into the global discrimination sub-network, the global countermeasure loss can be determined according to the discrimination result of the global discrimination sub-network, denoted as LG_all; after the second face image marked as true is input into the eye discrimination sub-network, the eye countermeasure loss can be determined according to the discrimination result of the eye discrimination sub-network, denoted as LG_eye; after the second face image marked as true is input into the mouth discrimination sub-network, the mouth countermeasure loss can be determined according to the discrimination result of the mouth discrimination sub-network, denoted as LG_mouth.
After the first to eighth sub-losses are determined, a first loss function can be obtained, where the first loss function is denoted as L in this embodiment, then:
L = w1*L2_feat + w2*L2_skin + w3*L2_heatmap + w4*LCosSimilarity + w5*L2_0 + w6*L2_eye + w7*L2_mouth + LG,
wherein LG = w8*LG_all + w9*LG_eye + w10*LG_mouth.
In the above formula, w1 to w10 are weight coefficients corresponding to the loss values, respectively, and may be set as needed, for example, all of them may be set to 1, or a coefficient corresponding to a loss value with a greater importance degree may be set relatively greater according to importance degrees of different loss values, so as to obtain a first loss function corresponding to the first network model.
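Putting the above together, the first loss function could be assembled as in the following sketch; the individual loss values are assumed to have been computed as in the preceding snippets, and the unit weights are only a default.

```python
def first_loss(losses: dict, w: dict = None):
    """L = w1*L2_feat + w2*L2_skin + w3*L2_heatmap + w4*LCosSimilarity
           + w5*L2_0 + w6*L2_eye + w7*L2_mouth + LG,
       with LG = w8*LG_all + w9*LG_eye + w10*LG_mouth.
    """
    w = w or {k: 1.0 for k in losses}   # default: all weight coefficients set to 1
    lg = (w["LG_all"] * losses["LG_all"] + w["LG_eye"] * losses["LG_eye"]
          + w["LG_mouth"] * losses["LG_mouth"])
    return (w["L2_feat"] * losses["L2_feat"] + w["L2_skin"] * losses["L2_skin"]
            + w["L2_heatmap"] * losses["L2_heatmap"]
            + w["LCosSimilarity"] * losses["LCosSimilarity"]
            + w["L2_0"] * losses["L2_0"] + w["L2_eye"] * losses["L2_eye"]
            + w["L2_mouth"] * losses["L2_mouth"] + lg)
```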
In some embodiments, the second loss function comprises a first discriminant penalty, and step 106 further comprises:
marking the second face image as false, marking the target face image as true, and respectively inputting the second face image and the target face image into the global discrimination sub-network to respectively obtain a first discrimination result and a second discrimination result;
and obtaining the first discrimination countermeasure loss according to the first discrimination result and the second discrimination result.
In this embodiment, the second network model includes a global discrimination sub-network. In implementation, the second face image output by the first network model is first marked as false (for example, marked as 0), and the target face image is marked as true (for example, marked as 1). The second face image and the target face image are then respectively input into the global discrimination sub-network to obtain discrimination results, each being a value between 0 and 1: the discrimination result corresponding to the second face image is the first discrimination result, and the discrimination result corresponding to the target face image is the second discrimination result.
Next, the first discrimination countermeasure loss, corresponding to the first network model and the global discrimination sub-network, is determined according to the obtained first discrimination result and second discrimination result, and is denoted as LD_all.
In some embodiments, the second loss function comprises a second discriminant penalty and a third discriminant penalty, and step 106 further comprises:
obtaining a corresponding first eye image and a corresponding first mouth image according to the second face image;
obtaining a corresponding second eye image and a second mouth image according to the target face image;
marking the first eye image and the first mouth image as false, marking the second eye image and the second mouth image as true, respectively inputting the first eye image and the second eye image into the eye judgment sub-network, and respectively outputting a third judgment result and a fourth judgment result; the first mouth image and the second mouth image are respectively input into the mouth judging sub-network and respectively output a fifth judging result and a sixth judging result;
obtaining a second discrimination countermeasure loss according to the third discrimination result and the fourth discrimination result;
and obtaining a third discrimination countermeasure loss according to the fifth discrimination result and the sixth discrimination result.
When the second discrimination immunity loss and the third discrimination immunity loss are determined, the eye image and the mouth image of the second face image need to be extracted.
When determining the second discrimination immunity loss and the third discrimination immunity loss, it is further necessary to extract an eye image and a mouth image of the target face image.
The extracted first eye image and first mouth image are both marked as false, e.g., both marked as 0, and the second eye image and second mouth image are both marked as true, e.g., both marked as 1.
Inputting the first eye image into an eye judgment sub-network to obtain a third judgment result; inputting the second eye image into an eye judgment sub-network to obtain a fourth judgment result; inputting the first mouth image into a mouth judging subnetwork to obtain a fifth judging result; the second mouth image is input to the mouth discrimination sub-network, and a sixth discrimination result is obtained.
And finally, a second discrimination countermeasure loss is obtained according to the difference between the third discrimination result and the fourth discrimination result, denoted as LD_eye, and a third discrimination countermeasure loss is obtained according to the difference between the fifth discrimination result and the sixth discrimination result, denoted as LD_mouth.
After the first discrimination countermeasure loss, the second discrimination countermeasure loss, and the third discrimination countermeasure loss are determined, the second loss function can be obtained, denoted as LD = w11*LD_all + w12*LD_eye + w13*LD_mouth, where w11 to w13 are the weight coefficients corresponding to the respective loss values.
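A sketch of the second loss function follows, using binary cross-entropy as the form of each discrimination countermeasure loss (the disclosure does not fix the exact form, so BCE is an assumption), with `d_all`, `d_eye`, `d_mouth` standing for the global, eye and mouth discrimination sub-networks and `crop_region` the same hypothetical cropping helper as before:

```python
import torch
import torch.nn.functional as F

def adv_d_loss(disc, fake, real):
    """One discrimination countermeasure loss: fake labelled 0, real labelled 1."""
    fake_pred, real_pred = disc(fake.detach()), disc(real)
    return (F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred))
            + F.binary_cross_entropy(real_pred, torch.ones_like(real_pred)))

def second_loss(second_face, target_face, d_all, d_eye, d_mouth, crop_region,
                w11=1.0, w12=1.0, w13=1.0):
    ld_all = adv_d_loss(d_all, second_face, target_face)                    # LD_all
    ld_eye = adv_d_loss(d_eye, crop_region(second_face, "eye"),
                        crop_region(target_face, "eye"))                    # LD_eye
    ld_mouth = adv_d_loss(d_mouth, crop_region(second_face, "mouth"),
                          crop_region(target_face, "mouth"))                # LD_mouth
    return w11 * ld_all + w12 * ld_eye + w13 * ld_mouth                     # LD
```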
The embodiment of the disclosure also provides a face reconstruction method.
As shown in fig. 2, the face reconstruction method includes the following steps:
step 201: acquiring an input image;
step 202: and inputting the input image into a face reconstruction model to obtain face reconstruction data.
In this embodiment, the face reconstruction model is obtained by performing model training by using any one of the above training methods for a face reconstruction model.
In this embodiment, the face reconstruction model used is a face reconstruction model obtained by training the face reconstruction model by the above-mentioned training method, and the input image is input into the face reconstruction model, so that a face reconstruction result with a high degree of consistency with a real face image can be output.
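At inference time the face reconstruction method reduces to a single forward pass through the trained first network model; a minimal sketch (the image loader and file name are assumptions):

```python
import torch

generator.eval()                                    # trained first network model
with torch.no_grad():
    input_image = load_image("input.png")           # hypothetical loader, (1, C, H, W) in [0, 1]
    reconstructed = generator(input_image)          # face reconstruction image
```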
The present disclosure provides a training device for a face reconstruction model.
In one embodiment, as shown in fig. 3, the training apparatus 300 for face reconstruction model includes:
a training data obtaining module 301, configured to obtain training data, where the training data includes a target face image and a first face image corresponding to the target face image, and a definition of the first face image is smaller than a definition of the target face image;
a first input module 302, configured to input the first face image into a first network model to obtain a second face image, where the first network model is a generated network model that takes the face image as input and takes a reconstructed image of the input face image as output;
a second input module 302, configured to input the target face image and the second face image into a second network model to obtain a determination result, where the second network model is a determination network model that takes the face image as input and takes a determination result of authenticity of the input face image as output, and the determination result includes a determination result of authenticity of the whole face image and authenticity of a local feature of the input face image;
a first loss function obtaining module 304, configured to obtain a first loss function corresponding to the first network model, and adjust a parameter of the first network model according to the first loss function;
a second loss function obtaining module 305, configured to obtain a second loss function corresponding to the second network model, and adjust a parameter of the second network model according to the second loss function;
a training module 306, configured to perform model training on the first network model and the second network model in turn;
and a face reconstruction model confirmation module 307, configured to use the trained first network model as a face reconstruction model, where values of the first loss function and the second loss function both reach corresponding target thresholds when training is completed.
In some embodiments, the second loss function comprises a first discriminative opponent loss, the second network model comprises a global discriminative subnetwork;
the second loss function obtaining module 305 includes:
a first discrimination result obtaining sub-module, configured to mark the second face image as false, mark the target face image as true, and input the second face image and the target face image into the global discrimination sub-network respectively to obtain a first discrimination result and a second discrimination result respectively;
and the first judgment countermeasure loss acquisition submodule is used for acquiring the first judgment countermeasure loss according to the first judgment result and the second judgment result.
In some embodiments, the second loss function comprises a second discriminative opposition loss and a third discriminative opposition loss, the second network model further comprising an eye discriminative subnetwork and a mouth discriminative subnetwork;
the second loss function obtaining module 305 further includes:
a second discrimination countermeasure loss acquisition sub-module for
The first image acquisition submodule is used for acquiring a corresponding first eye image and a corresponding first mouth image according to the second face image;
the second image acquisition submodule is used for acquiring a corresponding second eye image and a corresponding second mouth image according to the target face image;
a marking sub-module, configured to mark the first eye image and the first mouth image as false, mark the second eye image and the second mouth image as true, input the first eye image and the second eye image to the eye discrimination sub-network respectively, and output a third discrimination result and a fourth discrimination result respectively; the first mouth image and the second mouth image are respectively input into the mouth judging sub-network and respectively output a fifth judging result and a sixth judging result;
a second judgment countermeasure loss obtaining sub-module, configured to obtain a second judgment countermeasure loss according to the third judgment result and the fourth judgment result;
and the third discrimination countermeasure loss acquisition submodule is used for acquiring a third discrimination countermeasure loss according to the fifth discrimination result and the sixth discrimination result.
In some embodiments, the first loss function includes a first sub-loss and a second sub-loss;
the first loss function obtaining module 304 includes:
the face bitmap acquisition sub-module is used for acquiring a first face bitmap and a second face bitmap corresponding to the target face image;
the first analysis sub-module is used for analyzing the second face image to obtain a third face bitmap and a fourth face bitmap corresponding to the second face image, wherein the first face bitmap and the second face bitmap correspond to different regions of the same face image, the first face bitmap and the third face bitmap correspond to the same region of different face images, and the second face bitmap and the fourth face bitmap correspond to the same region of different face images;
a first sub-loss obtaining sub-module, configured to obtain the first sub-loss according to a difference between the first face bitmap and the third face bitmap;
and the second sub-loss obtaining sub-module is used for obtaining the second sub-loss according to the difference between the second face bitmap and the fourth face bitmap.
In some embodiments, the first face bitmap comprises a facial-feature (organ) image of the face image, and the second face bitmap comprises a skin image of the face image.
In some embodiments, the first loss function includes a third sub-loss;
the first loss function obtaining module 304 further includes:
the feature point data acquisition submodule is used for acquiring first feature point data corresponding to the target face image;
the second analysis submodule is used for analyzing the second face image to obtain second feature point data corresponding to the second face image;
and the second sub-loss obtaining sub-module is used for obtaining the third sub-loss according to the difference between the first characteristic point data and the second characteristic point data.
In some embodiments, the first feature point data comprises a heat map of the target face image and the second feature point data comprises a heat map of the second face image, wherein the heat map comprises one or more of a left eye heat map, a right eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image.
In some embodiments, the first loss function includes a fourth sub-loss;
the first loss function obtaining module 304 further includes:
the feature vector acquisition sub-module is used for acquiring a first feature vector corresponding to the target face image;
the feature vector acquisition sub-module is further configured to acquire a second feature vector corresponding to the second face image;
and the fourth sub-loss obtaining sub-module is used for obtaining the fourth sub-loss according to the difference between the first characteristic vector and the second characteristic vector.
In some embodiments, the first loss function includes a fifth sub-loss;
the first loss function obtaining module 304 further includes:
and the fifth sub-loss obtaining sub-module is used for obtaining the fifth sub-loss according to the difference between the target face image and the second face image.
In some embodiments, the first loss function comprises one or more of a sixth sub-loss and a seventh sub-loss;
the first loss function obtaining module 304 further includes:
a sixth sub-loss obtaining sub-module, configured to obtain a perception loss according to a difference between the eye region image of the target face image and the eye region image of the second face image, as a sixth sub-loss; and/or
A seventh sub-loss obtaining sub-module, configured to obtain a perceptual loss according to a difference between the mouth region image of the target face image and the mouth region image of the second face image as the seventh sub-loss.
In some embodiments, the first loss function includes an eighth sub-loss;
the first loss function obtaining module 304 further includes:
and the eighth sub-loss obtaining sub-module is configured to obtain the eighth sub-loss according to a generated countermeasure loss between the first network model and the second network model, where the second network model includes one or more of a global discrimination sub-network, an eye discrimination sub-network, and a mouth discrimination sub-network, and the generated countermeasure loss is determined according to a discrimination result obtained after the second face image output by the first network model is marked as true and then the second face image is input into one or more of the global discrimination sub-network, the eye discrimination sub-network, and the mouth discrimination sub-network.
The training device for the face reconstruction model of the embodiment of the present disclosure can implement the steps of the above-mentioned training method for the face reconstruction model, and can achieve at least the same or similar technical effects, which are not described herein again.
The embodiment of the disclosure provides a human face reconstruction device.
As shown in fig. 4, in one embodiment, the face reconstruction apparatus 400 includes:
an input image acquisition module 401, configured to acquire an input image;
an input module 402, configured to input the input image into a face reconstruction model to obtain a face reconstruction image, where the face reconstruction model is obtained by performing model training through any one of the above-mentioned training methods for a face reconstruction model.
The face reconstruction device of the embodiment of the present disclosure implements each step of the above face reconstruction method, and can achieve at least the same or similar technical effects, which are not repeated herein.
An embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the training method for a face reconstruction model according to any one of the above claims, or implements the steps of the face reconstruction method according to the above.
The disclosed embodiments provide a readable storage medium, on which a computer program is stored, which when executed by the processor, implements the steps of the training method for a face reconstruction model as described in any one of the above, or implements the steps of the face reconstruction method as described above.
The electronic device and the readable storage medium of this embodiment can implement the steps of the above training method for a face reconstruction model and the face reconstruction method, and can at least achieve the same or similar technical effects, which are not described herein again.
The above is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (16)

  1. A training method of a face reconstruction model comprises the following steps:
    acquiring training data, wherein the training data comprises a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image;
    inputting the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
    inputting the target face image and the second face image into a second network model to obtain a discrimination result, wherein the second network model is a discrimination network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result comprises discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
    acquiring a first loss function corresponding to the first network model, and adjusting parameters of the first network model according to the first loss function;
    acquiring a second loss function corresponding to the second network model, and adjusting parameters of the second network model according to the second loss function;
    alternately performing the above steps to train the first network model and the second network model in turn;
    and taking the trained first network model as the face reconstruction model, wherein, when training is completed, the values of the first loss function and the second loss function reach corresponding target thresholds.
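The alternating training of claim 1 can be pictured with the following minimal sketch; the optimizers, loss helpers and target thresholds are hypothetical placeholders and do not come from the claim itself:

# g: first network model (generator), d: second network model (discriminator)
for first_face, target_face in dataloader:
    second_face = g(first_face)                      # reconstructed second face image

    # adjust parameters of the first network model according to the first loss function
    g_opt.zero_grad()
    first_loss = first_loss_fn(second_face, target_face, d)
    first_loss.backward()
    g_opt.step()

    # adjust parameters of the second network model according to the second loss function
    d_opt.zero_grad()
    second_loss = second_loss_fn(g(first_face).detach(), target_face, d)
    second_loss.backward()
    d_opt.step()

    # training is completed once both losses reach their corresponding target thresholds
    if first_loss.item() <= g_target and second_loss.item() <= d_target:
        break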
  2. The method of claim 1, wherein the second loss function comprises a first discrimination adversarial loss, and the second network model comprises a global discrimination sub-network;
    the obtaining of the second loss function corresponding to the second network model includes:
    marking the second face image as false, marking the target face image as true, and inputting the second face image and the target face image into the global discrimination sub-network respectively to obtain a first discrimination result and a second discrimination result;
    and obtaining the first discrimination adversarial loss according to the first discrimination result and the second discrimination result.
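A minimal sketch of the first discrimination adversarial loss of claim 2, assuming a PyTorch-style global discrimination sub-network (the helper name and the binary cross-entropy choice are illustrative assumptions):

import torch
import torch.nn.functional as F

def first_discrimination_adversarial_loss(global_d, second_face, target_face):
    # second face image marked as false (0), target face image marked as true (1)
    first_result = global_d(second_face.detach())
    second_result = global_d(target_face)
    loss_fake = F.binary_cross_entropy_with_logits(first_result, torch.zeros_like(first_result))
    loss_real = F.binary_cross_entropy_with_logits(second_result, torch.ones_like(second_result))
    return loss_fake + loss_real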
  3. The method of claim 2, wherein the second loss function comprises a second discrimination adversarial loss and a third discrimination adversarial loss, and the second network model further comprises an eye discrimination sub-network and a mouth discrimination sub-network;
    the obtaining of the second loss function corresponding to the second network model includes:
    obtaining a corresponding first eye image and a corresponding first mouth image according to the second face image;
    obtaining a corresponding second eye image and a second mouth image according to the target face image;
    marking the first eye image and the first mouth image as false, marking the second eye image and the second mouth image as true, inputting the first eye image and the second eye image into the eye discrimination sub-network respectively to obtain a third discrimination result and a fourth discrimination result, and inputting the first mouth image and the second mouth image into the mouth discrimination sub-network respectively to obtain a fifth discrimination result and a sixth discrimination result;
    obtaining the second discrimination adversarial loss according to the third discrimination result and the fourth discrimination result;
    and obtaining the third discrimination adversarial loss according to the fifth discrimination result and the sixth discrimination result.
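Claim 3 applies the same true/false labelling to eye and mouth crops; the sketch below reuses the hypothetical helper from the previous sketch, and the crop boxes (for example taken from detected landmarks) are assumptions:

def region_discrimination_adversarial_losses(eye_d, mouth_d, second_face, target_face, eye_box, mouth_box):
    # eye_box / mouth_box: (top, left, height, width) crops of the eye and mouth regions
    def crop(img, box):
        t, l, h, w = box
        return img[..., t:t + h, l:l + w]

    second_loss = first_discrimination_adversarial_loss(
        eye_d, crop(second_face, eye_box), crop(target_face, eye_box))
    third_loss = first_discrimination_adversarial_loss(
        mouth_d, crop(second_face, mouth_box), crop(target_face, mouth_box))
    return second_loss, third_loss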
  4. The method of any of claims 1-3, wherein the first loss function includes a first sub-loss and a second sub-loss;
    the obtaining of the first loss function corresponding to the first network model includes:
    acquiring a first face bitmap and a second face bitmap corresponding to the target face image;
    analyzing the second face image to obtain a third face bitmap and a fourth face bitmap corresponding to the second face image, wherein the first face bitmap and the second face bitmap correspond to different regions of the same face image, the first face bitmap and the third face bitmap correspond to the same region of different face images, and the second face bitmap and the fourth face bitmap correspond to the same region of different face images;
    obtaining the first sub-loss according to the difference between the first face bitmap and the third face bitmap;
    and obtaining the second sub-loss according to the difference between the second face bitmap and the fourth face bitmap.
  5. The method of claim 4, wherein the first face bitmap comprises a facial image of the face image, and the second face bitmap comprises a skin image of the face image.
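One plausible reading of claims 4 and 5 is that the face bitmaps come from a face parsing network; the parse_face helper and the L1 distance below are assumptions, not the claimed method:

import torch.nn.functional as F

def bitmap_sub_losses(parse_face, target_face, second_face):
    # parse_face(img) -> (facial_map, skin_map), e.g. per-pixel region probability maps
    first_bitmap, second_bitmap = parse_face(target_face)    # from the target face image
    third_bitmap, fourth_bitmap = parse_face(second_face)    # from the second face image
    first_sub_loss = F.l1_loss(third_bitmap, first_bitmap)   # same region, different images
    second_sub_loss = F.l1_loss(fourth_bitmap, second_bitmap)
    return first_sub_loss, second_sub_loss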
  6. The method of any of claims 1-3, wherein the first loss function includes a third sub-loss;
    the obtaining of the first loss function corresponding to the first network model further includes:
    acquiring first feature point data corresponding to the target face image;
    analyzing the second face image to obtain second feature point data corresponding to the second face image;
    and obtaining the third sub-loss according to the difference between the first characteristic point data and the second characteristic point data.
  7. The method of claim 6, wherein the first feature point data comprises a heat map of the target face image and the second feature point data comprises a heat map of the second face image, wherein a heat map comprises one or more of a left eye heat map, a right eye heat map, a nose heat map, a mouth heat map, and a face contour heat map of a face image.
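For claims 6 and 7, a hedged sketch assuming a hypothetical landmark network that outputs one heat map channel per facial part (left eye, right eye, nose, mouth, face contour):

import torch.nn.functional as F

def third_sub_loss(heatmap_net, target_face, second_face):
    # heatmap_net(img) -> tensor of shape (N, 5, H, W), one heat map per facial part
    first_feature_points = heatmap_net(target_face)
    second_feature_points = heatmap_net(second_face)
    return F.mse_loss(second_feature_points, first_feature_points)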
  8. The method of any of claims 1-3, wherein the first loss function includes a fourth sub-loss;
    the obtaining a first loss function corresponding to the first network model further includes:
    acquiring a first feature vector corresponding to the target face image;
    acquiring a second feature vector corresponding to the second face image;
    and obtaining the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
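For claim 8 the feature vectors could, for instance, be embeddings from a pretrained face recognition backbone; the embed function and the cosine-distance choice are assumptions:

import torch.nn.functional as F

def fourth_sub_loss(embed, target_face, second_face):
    # embed(img) -> (N, D) feature vector, e.g. an identity embedding
    first_vector = embed(target_face)
    second_vector = embed(second_face)
    return (1.0 - F.cosine_similarity(first_vector, second_vector, dim=1)).mean()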
  9. The method of any of claims 1-3, wherein the first loss function includes a fifth sub-loss;
    the obtaining of the first loss function corresponding to the first network model further includes:
    and obtaining the fifth sub-loss according to the difference between the target face image and the second face image.
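The fifth sub-loss of claim 9 is a direct image difference; a one-line sketch (L1 is only one possible choice of distance):

import torch.nn.functional as F

def fifth_sub_loss(target_face, second_face):
    # pixel-wise difference between the target face image and the second face image
    return F.l1_loss(second_face, target_face)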
  10. The method of any one of claims 1 to 3, wherein the first loss function comprises one or more of a sixth sub-loss and a seventh sub-loss;
    the obtaining of the first loss function corresponding to the first network model further includes:
    taking a perceptual loss of the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and/or
    taking a perceptual loss of the difference between the mouth region image of the target face image and the mouth region image of the second face image as the seventh sub-loss.
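For claim 10 the perceptual losses could be computed on deep features of the eye and mouth crops; the feature extractor (for example a truncated VGG) is an assumption rather than the claimed implementation:

import torch.nn.functional as F

def perceptual_region_loss(feature_extractor, target_region, second_region):
    # feature_extractor(img) -> feature map used to compare the two region crops
    target_features = feature_extractor(target_region)
    second_features = feature_extractor(second_region)
    return F.mse_loss(second_features, target_features)

# sixth_sub_loss  = perceptual_region_loss(vgg, target_eye_region, second_eye_region)
# seventh_sub_loss = perceptual_region_loss(vgg, target_mouth_region, second_mouth_region)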
  11. The method of any of claims 1-3, wherein the first loss function includes an eighth sub-loss;
    the obtaining of the first loss function corresponding to the first network model further includes:
    and obtaining the eighth sub-loss according to a generative adversarial loss between the first network model and the second network model, wherein the second network model comprises one or more of a global discrimination sub-network, an eye discrimination sub-network and a mouth discrimination sub-network, and the generative adversarial loss is determined according to the discrimination result obtained after the second face image output by the first network model is marked as true and input into one or more of the global discrimination sub-network, the eye discrimination sub-network and the mouth discrimination sub-network.
  12. A face reconstruction method comprises the following steps:
    acquiring an input image;
    inputting the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained by performing model training by using the training method of the face reconstruction model according to any one of claims 1 to 11.
  13. A training device for a face reconstruction model comprises:
    the training data acquisition module is used for acquiring training data, wherein the training data comprises a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image;
    the first input module is used for inputting the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
    the second input module is used for inputting the target face image and the second face image into a second network model to obtain a discrimination result, wherein the second network model is a discrimination network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result comprises discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
    a first loss function obtaining module, configured to obtain a first loss function corresponding to the first network model, and adjust a parameter of the first network model according to the first loss function;
    a second loss function obtaining module, configured to obtain a second loss function corresponding to the second network model, and adjust a parameter of the second network model according to the second loss function;
    the training module is used for carrying out model training on the first network model and the second network model in turn;
    and the face reconstruction model confirmation module is used for taking the trained first network model as the face reconstruction model, wherein, when training is completed, the values of the first loss function and the second loss function reach corresponding target thresholds.
  14. A face reconstruction device, comprising:
    the input image acquisition module is used for acquiring an input image;
    an input module, configured to input the input image into a face reconstruction model to obtain a face reconstruction image, where the face reconstruction model is obtained by performing model training through the training method of the face reconstruction model according to any one of claims 1 to 11.
  15. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the training method of a face reconstruction model according to any one of claims 1 to 11 or implementing the steps of the face reconstruction method according to claim 12.
  16. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method for a face reconstruction model according to any one of claims 1 to 11, or implements the steps of the face reconstruction method according to claim 12.
CN202080002537.5A 2020-10-29 2020-10-29 Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium Pending CN114981835A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/124657 WO2022087941A1 (en) 2020-10-29 2020-10-29 Face reconstruction model training method and apparatus, face reconstruction method and apparatus, and electronic device and readable storage medium

Publications (1)

Publication Number Publication Date
CN114981835A true CN114981835A (en) 2022-08-30

Family

ID=81381725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080002537.5A Pending CN114981835A (en) 2020-10-29 2020-10-29 Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN114981835A (en)
WO (1) WO2022087941A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439610A (en) * 2022-09-14 2022-12-06 中国电信股份有限公司 Model training method, training device, electronic equipment and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362972B (en) * 2023-05-22 2023-08-08 飞狐信息技术(天津)有限公司 Image processing method, device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785258A (en) * 2019-01-10 2019-05-21 华南理工大学 A kind of facial image restorative procedure generating confrontation network based on more arbiters
CN111080521A (en) * 2019-12-12 2020-04-28 天津中科智能识别产业技术研究院有限公司 Face image super-resolution method based on structure prior
CN111489290A (en) * 2019-04-02 2020-08-04 同观科技(深圳)有限公司 Face image super-resolution reconstruction method and device and terminal equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102184755B1 (en) * 2018-05-31 2020-11-30 서울대학교 산학협력단 Apparatus and Method for Training Super Resolution Deep Neural Network
CN109615582B (en) * 2018-11-30 2023-09-01 北京工业大学 Face image super-resolution reconstruction method for generating countermeasure network based on attribute description
CN110543846B (en) * 2019-08-29 2021-12-17 华南理工大学 Multi-pose face image obverse method based on generation countermeasure network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785258A (en) * 2019-01-10 2019-05-21 华南理工大学 A kind of facial image restorative procedure generating confrontation network based on more arbiters
CN111489290A (en) * 2019-04-02 2020-08-04 同观科技(深圳)有限公司 Face image super-resolution reconstruction method and device and terminal equipment
CN111080521A (en) * 2019-12-12 2020-04-28 天津中科智能识别产业技术研究院有限公司 Face image super-resolution method based on structure prior

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Fei et al.: "Heterogeneous face image synthesis based on generative adversarial networks: progress and challenges", Journal of Nanjing University of Information Science & Technology (Natural Science Edition), vol. 11, no. 06, 28 November 2019 (2019-11-28) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439610A (en) * 2022-09-14 2022-12-06 中国电信股份有限公司 Model training method, training device, electronic equipment and readable storage medium
CN115439610B (en) * 2022-09-14 2024-04-26 中国电信股份有限公司 Training method and training device for model, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
WO2022087941A1 (en) 2022-05-05

Similar Documents

Publication Publication Date Title
Li et al. PDR-Net: Perception-inspired single image dehazing network with refinement
CN109376582B (en) Interactive face cartoon method based on generation of confrontation network
US11488308B2 (en) Three-dimensional object detection method and system based on weighted channel features of a point cloud
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN110147721B (en) Three-dimensional face recognition method, model training method and device
DE69910757T2 (en) WAVELET-BASED FACIAL MOTION DETECTION FOR AVATAR ANIMATION
US20030016853A1 (en) Image position matching method and apparatus therefor
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN109725721B (en) Human eye positioning method and system for naked eye 3D display system
CN114981835A (en) Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium
Hsieh et al. Automatic trimap generation for digital image matting
CN113343878A (en) High-fidelity face privacy protection method and system based on generation countermeasure network
CN112101320A (en) Model training method, image generation method, device, equipment and storage medium
Marin et al. The effect of latent space dimension on the quality of synthesized human face images
Rai et al. Robust face hallucination algorithm using motion blur embedded nearest proximate patch representation
Hao et al. Far-gan for one-shot face reenactment
CN112348892A (en) Point positioning method and related device and equipment
Ajith et al. Dark channel prior based single image dehazing of daylight captures
Pini et al. Learning to generate facial depth maps
Parekh et al. A survey of image enhancement and object detection methods
US11361467B2 (en) Pose selection and animation of characters using video data and training techniques
WO2023088276A1 (en) Caricaturization model construction method and apparatus, and device, storage medium and program product
Ma et al. Edge-guided cnn for denoising images from portable ultrasound devices
Zhou et al. Synchronizing detection and removal of smoke in endoscopic images with cyclic consistency adversarial nets
CN115424337A (en) Iris image restoration system based on priori guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination