CN113379594A - Face shape transformation model training, face shape transformation method and related device

Publication number: CN113379594A
Authority: CN (China)
Prior art keywords: image, face, network, target, sample
Legal status: Pending (the status is an assumption and is not a legal conclusion)
Application number: CN202110728844.8A
Other languages: Chinese (zh)
Inventors: 尚太章, 刘家铭, 洪智滨
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110728844.8A
Publication of CN113379594A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides a face shape transformation model training method, a face shape transformation method, corresponding apparatuses, an electronic device, a computer-readable storage medium and a computer program product, relates to artificial intelligence fields such as computer vision and deep learning, and can be applied to scenes such as face image processing. The method comprises the following steps: obtaining a first generation network and a second generation network based on a first facial feature set and a second facial feature set; generating a first target image corresponding to a first sample image and a second target image corresponding to a second sample image by using the first generation network and the second generation network, respectively; generating a first converted image corresponding to the first target image and a second converted image corresponding to the second target image by using the second generation network and the first generation network, respectively; controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be judged as the same image; and outputting the generative adversarial network meeting the requirements as the target face shape transformation model.

Description

Face shape transformation model training, face shape transformation method and related device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to computer vision and deep learning technologies applicable to scenes such as face image processing, and specifically to a face shape transformation model training method and a face shape transformation method, together with the corresponding apparatus, electronic device, computer-readable storage medium and computer program product.
Background
Face attribute editing covers many different kinds of edits, such as removing or adding glasses, removing or adding bangs, hair color editing, facial feature editing and makeup editing. There is also a category of edits that involves changes to the face shape itself, for example face shape editing, age editing (making a face younger or older also changes its shape) and the face shape changes involved in face swapping.
How to accurately transform the face shape without changing other information of the face is the key difficulty of face shape editing within face attribute editing.
Disclosure of Invention
Embodiments of the disclosure provide a face shape transformation model training method and apparatus, a face shape transformation method and apparatus, an electronic device, a computer-readable storage medium and a computer program product.
In a first aspect, an embodiment of the present disclosure provides a face shape transformation model training method, including: obtaining a first generation network and a second generation network based on a first facial feature set and a second facial feature set; generating a first target image corresponding to a first sample image by using the first generation network, and controlling the first target image to be judged as belonging to a second face shape; generating a second target image corresponding to a second sample image by using the second generation network, and controlling the second target image to be judged as belonging to a first face shape; generating a first converted image corresponding to the first target image and a second converted image corresponding to the second target image by using the second generation network and the first generation network, respectively; controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be judged as the same image; and outputting the generative adversarial network meeting a preset requirement as a target face shape transformation model; wherein the first sample image and the second sample image belong to the first face shape and the second face shape respectively, and the generative adversarial network comprises the first generation network and the second generation network.
In a second aspect, an embodiment of the present disclosure provides a face shape transformation model training apparatus, including: a generating network training unit configured to obtain a first generation network and a second generation network based on a first facial feature set and a second facial feature set; a first primary transformation unit configured to generate a first target image corresponding to a first sample image using the first generation network, and control the first target image to be judged as belonging to a second face shape; a second primary transformation unit configured to generate a second target image corresponding to a second sample image using the second generation network, and control the second target image to be judged as belonging to a first face shape; a secondary transformation unit configured to generate a first converted image corresponding to the first target image and a second converted image corresponding to the second target image using the second generation network and the first generation network, respectively; a same image control unit configured to control the first sample image and the first converted image, and the second sample image and the second converted image, to be judged as the same image; and a target face shape transformation model output unit configured to output the generative adversarial network meeting a preset requirement as a target face shape transformation model; wherein the first sample image and the second sample image belong to the first face shape and the second face shape respectively, and the generative adversarial network comprises the first generation network and the second generation network.
In a third aspect, an embodiment of the present disclosure provides a face shape transformation method, including: acquiring a face image to be transformed; and calling a target face shape transformation model to perform face shape transformation on the face image to be transformed, wherein the target face shape transformation model is obtained by the face shape transformation model training method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a face shape transformation apparatus, including: a to-be-transformed face image acquisition unit configured to acquire a face image to be transformed; and a model calling processing unit configured to call a target face shape transformation model to perform face shape transformation on the face image to be transformed, wherein the target face shape transformation model is obtained by the face shape transformation model training apparatus described in any implementation of the second aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform the face shape transformation model training method described in any implementation of the first aspect or the face shape transformation method described in any implementation of the third aspect.
In a sixth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium storing computer instructions that, when executed, enable a computer to implement the face shape transformation model training method described in any implementation of the first aspect or the face shape transformation method described in any implementation of the third aspect.
In a seventh aspect, the disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the face shape transformation model training method described in any implementation of the first aspect or the face shape transformation method described in any implementation of the third aspect.
According to the face shape transformation model training and face shape transformation methods provided by the embodiments of the disclosure, on the basis of ensuring that the image after the primary face shape transformation is judged as the corresponding face shape, the sample image and the image after the secondary face shape transformation are additionally controlled to be judged as the same image. Thus, even when training sample pairs with an explicit correspondence are difficult to obtain, the face shape of the transformed image can be controlled while the other facial information changes as little as possible, so that the image shows the same person with the same expression before and after the transformation.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present disclosure may be applied;
FIG. 2 is a flowchart of a face shape transformation model training method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another face shape transformation model training method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a StyleGAN2 model provided in an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a face key point migration network according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating an example of a key point migration provided by an embodiment of the present disclosure;
FIG. 7 is a block diagram of a face shape transformation model training apparatus according to an embodiment of the present disclosure;
fig. 8 is a block diagram of a face shape transformation apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device suitable for executing a face shape transformation model training method and/or a face shape transformation method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict.
In the technical solution of the present disclosure, the acquisition, storage and application of the personal information of the users involved comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the face shape transformation model training and face shape transformation methods, apparatuses, electronic devices and computer-readable storage media of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications for realizing information communication between the terminal devices 101, 102, and 103 and the server 105, such as a facial image editing application, a model training application, an online social application, and the like, may be installed on the terminal devices 101, 102, and 103 and the server 105.
The terminal apparatuses 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and they may be implemented as multiple software or software modules, or may be implemented as a single software or software module, and are not limited in this respect. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited herein.
The server 105 may provide various services through various built-in applications. Taking a face image editing application that provides a face editing service for users as an example, the server 105 may achieve the following when running that application: first, receiving, through the network 104, a face image to be transformed that is transmitted by the terminal devices 101, 102, 103; then, calling the target face shape transformation model to perform face shape transformation on the face image to be transformed. Further, the transformation result may also be returned to the terminal devices 101, 102, 103.
The target face shape transformation model can be obtained by a face shape transformation model training application built into the server 105 according to the following steps: first, obtaining a first generation network and a second generation network based on a first facial feature set and a second facial feature set; then, generating a first target image corresponding to the first sample image by using the first generation network, and controlling the first target image to be judged as belonging to the second face shape; meanwhile, generating a second target image corresponding to the second sample image by using the second generation network, and controlling the second target image to be judged as belonging to the first face shape; next, generating a first converted image corresponding to the first target image and a second converted image corresponding to the second target image using the second generation network and the first generation network, respectively; next, controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be judged as the same image; and finally, outputting the generative adversarial network meeting the preset requirement as the target face shape transformation model, wherein the first sample image and the second sample image belong to the first face shape and the second face shape respectively, and the generative adversarial network comprises the first generation network and the second generation network.
Since the training of the face shape transformation model requires more computation resources and higher computation power, the training method of the face shape transformation model provided in the following embodiments of the present application is generally executed by the server 105 having higher computation power and more computation resources, and accordingly, the training device of the face shape transformation model is also generally disposed in the server 105. However, it should be noted that, when the terminal devices 101, 102, and 103 also have computing capabilities and computing resources meeting the requirements, the terminal devices 101, 102, and 103 may also complete the above-mentioned operations performed by the server 105 through the face shape transformation model training application installed thereon, and then output the same result as the server 105. Accordingly, the face shape conversion model training means may be provided in the terminal apparatuses 101, 102, 103. In such a case, the exemplary system architecture 100 may also not include the server 105 and the network 104.
Of course, the server used to train the face shape transformation model may differ from the server that calls the trained model. In particular, the face shape transformation model trained by the server 105 may also be turned, through model distillation, into a lightweight face shape transformation model suitable for embedding in the terminal devices 101, 102, 103; that is, the lightweight model in the terminal devices or the more complex model in the server 105 can be chosen flexibly according to the accuracy required in practice.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of a face shape transformation model training method according to an embodiment of the present disclosure, in which the process 200 includes the following steps:
step 201: obtaining a first generation network and a second generation network based on the first facial feature set and the second facial feature set;
This step is directed to constructing two sample relationship pairs from the first facial feature set and the second facial feature set by an execution body of the face shape transformation model training method (e.g., the server 105 shown in fig. 1), so that two generation networks respectively learn corresponding transformation relationships from the two sample relationship pairs. For example, when the first facial feature set is used as the input sample and the second facial feature set as the output sample, a generation network can be trained to learn the transformation relationship from the first facial feature to the second facial feature; if the input and output roles are reversed, the opposite transformation relationship, from the second facial feature to the first facial feature, is learned instead.
It should be noted that the first facial feature set and the second facial feature set are feature sets extracted from a first face image set and a second face image set respectively, each representing its own face shape; the facial features can be represented in various ways, such as the facial outline, the facial features (eyes, nose, mouth, etc.) or face key points. It should be emphasized that the first face image set and the second face image set are obtained independently: there is no dependency or causal relationship between them, and neither is obtained by transforming the other. For example, a preset number of first face images and second face images may be generated directly by a random face image generation model to obtain the two image sets; such a model can first learn face shape distinguishing features from a small number of samples labeled with face shapes, and then randomly generate a large number of face images of a designated face shape under the guidance of those features. Facial features can then be extracted from the first and second face image sets respectively, yielding the first facial feature set and the second facial feature set.
Taking face key points as an example of the features that characterize the face shape, the step of obtaining the facial feature sets may specifically be: extracting face key points from the first face image set and the second face image set respectively to obtain a first face key point set and a second face key point set.
The first face shape and the second face shape refer generically to two different face shapes rather than to two specific ones; which face shapes they actually denote can be determined according to the actual situation.
It should be noted that although the sample pairs used to train the first generation network and the second generation network are both drawn from the first facial feature set and the second facial feature set, the two training processes are independent of each other. Due to the characteristics of neural networks and of the training algorithm, the two transformation relationships learned from sample pairs with opposite input/output roles are usually not exact inverses of each other, and the transformation relationship learned from the same sample pair may also differ across training runs.
Step 202: generating a first target image corresponding to the first sample image by using the first generation network, and controlling the first target image to be judged as belonging to the second face shape;
In this step, the execution body converts the first sample image, which belongs to the first face shape, into the first target image by using the first generation network obtained in step 201. The first generation network has learned the transformation relationship from the first face shape to the second face shape, so it can be preliminarily confirmed to have a usable face shape transformation capability only when the first target image is judged to belong to the second face shape.
However, it should be noted that, since sample pairs with an explicit correspondence are difficult to acquire, the transformation relationship from the first face shape to the second face shape can only be learned from the independently obtained first and second facial feature sets. In actual use it must be ensured not only that the face shape is transformed, but also that the other facial features in the image change as little as possible. This cannot be achieved by focusing on face shape features alone, and the sample pairs used to train the generation networks can hardly contain image pairs in which only the face shape differs, so other means are needed to achieve this goal.
Step 203: generating a second target image corresponding to the second sample image by using the second generation network, and controlling the second target image to be judged as belonging to the first face shape;
In this step, the execution body converts the second sample image, which belongs to the second face shape, into the second target image by using the second generation network obtained in step 201. The second generation network has learned the transformation relationship from the second face shape to the first face shape, so it can be preliminarily confirmed to have a usable face shape transformation capability only when the second target image is judged to belong to the first face shape.
Step 204: generating a first converted image corresponding to the first target image and a second converted image corresponding to the second target image by using a second generation network and the first generation network, respectively;
On the basis of steps 202 and 203, this step is intended for the above execution body to convert, using the second generation network, the first target image belonging to the second face shape into a first converted image that should belong to the first face shape, and meanwhile to convert, using the first generation network, the second target image belonging to the first face shape into a second converted image that should belong to the second face shape.
In other words, this step describes the following: after the primary face shape transformation, in which the original image of one face shape is judged to belong to the other face shape, the opposite transformation relationship is applied to obtain the secondarily transformed image.
The first and second generation networks can only ensure that the first converted image is ultimately judged to belong to the first face shape and the second converted image to the second face shape; parameters other than the face shape are no longer constrained. For example, other facial parameters may be adjusted heavily during the conversion so that the result is still judged to be the desired face shape while the expression and facial details change considerably. For practical use, a model that only satisfies the face shape requirement is obviously not acceptable, and the other facial information must be preserved by controlling those other parameters.
Step 205: controlling the first sample image and the first converted image, and the second sample image and the second converted image to be judged as the same image;
that is, the step can judge that the images after two times of conversion are not obviously different in face details due to the two times of conversion, and can also judge that the images belong to the same image. Thereby enabling preservation of facial details (e.g., face-related information, facial expression information, key point information, etc. for identifying whether or not belonging to the same user). There are various ways to achieve the control purpose of this step, for example, a discrimination network that is whether the generated countermeasure network belongs to the same image or not may be added, and the discrimination network may guide the adjustment of the transformation relation in the generated network, or a special loss function may be added to achieve the control purpose.
Step 206: outputting the generative adversarial network meeting the preset requirement as the target face shape transformation model.
On the basis of step 205, this step is intended to output the converged or generated countermeasure network satisfying the training end condition as a desired target face transformation model by the execution subject described above.
Specifically, the preset requirement mainly refers to the condition under which the model is considered converged or the training is considered finished, for example: the value of the loss function is smaller than a preset value, the number of training iterations exceeds a preset number, or the difference between the loss values of two adjacent iterations is smaller than a preset value.
According to the face shape transformation model training method provided by this embodiment of the disclosure, on the basis of ensuring that the image after the primary face shape transformation is judged as the corresponding face shape, the sample image and the image after the secondary face shape transformation are additionally controlled to be judged as the same image. Thus, even when training sample pairs with an explicit correspondence are difficult to obtain, the face shape of the transformed image can be controlled while the other facial information changes as little as possible, achieving the same person and the same expression before and after the transformation.
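As an illustration of the training flow of steps 201 to 206, the following is a minimal PyTorch-style sketch of one generator update; the networks G1/G2/D1/D2, the loss weight and the convergence check are illustrative assumptions rather than the patent's own implementation.

```python
import torch
import torch.nn.functional as F

def generator_step(G1, G2, D1, D2, x1, x2, opt_g, cycle_weight=10.0):
    """One generator update. x1: batch of first-face-shape samples,
    x2: batch of second-face-shape samples (both image tensors)."""
    # Steps 202/203: primary transformation, judged as the target face shape
    y2 = G1(x1)                       # should be judged as second face shape
    y1 = G2(x2)                       # should be judged as first face shape
    logit2, logit1 = D2(y2), D1(y1)
    adv_loss = (
        F.binary_cross_entropy_with_logits(logit2, torch.ones_like(logit2))
        + F.binary_cross_entropy_with_logits(logit1, torch.ones_like(logit1))
    )

    # Steps 204/205: secondary transformation back to the original face
    # shape; the result must be judged the same image as the sample
    cycle_loss = F.l1_loss(G2(y2), x1) + F.l1_loss(G1(y1), x2)

    loss = adv_loss + cycle_weight * cycle_loss
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    # Step 206: stop once the loss falls below a preset value, a preset
    # iteration count is reached, or the loss change between iterations stalls
    return loss.item()
```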
Referring to fig. 3, fig. 3 is a flowchart of another face shape transformation model training method according to an embodiment of the present disclosure, in which the process 300 includes the following steps:
step 301-1: respectively taking the first facial feature set and the second facial feature set as an input sample and an output sample for training a generating network to obtain a first generating network;
that is, the first generation network is able to learn a transformation relationship from the sample pair to transform from the first facial feature to the second facial feature.
Step 302-1: generating a first target image corresponding to the first sample image using the first generation network;
On the basis of step 301-1, this step is intended for the above execution body to generate the first target image corresponding to the first sample image through the first generation network, i.e., to attempt a face shape transformation of the first sample image by means of the learned transformation relationship from the first facial feature to the second facial feature.
Step 303-1: the second discrimination network judges whether the first target image belongs to the second face shape; if so, step 304-1 is executed; otherwise, the transformation relationship learned by the first generation network is readjusted until the output first target image is judged to belong to the second face shape (represented in fig. 3 by the arrow pointing back to step 301-1);
On the basis of step 302-1, this step is intended for the above execution body to judge, through the second discrimination network of the generative adversarial network, whether the face shape of the first target image belongs to the second face shape.
The second discrimination network with this discrimination capability may be constructed as follows: the second facial features are extracted based on the second facial feature set, and the second discrimination network is constructed based on the second facial features.
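The patent does not fix the internal structure of the discrimination networks here, but the application example later describes them as fully connected networks over face key points; the following is a minimal sketch under that assumption, with illustrative layer sizes.

```python
import torch.nn as nn

class KeypointDiscriminator(nn.Module):
    """Judges whether a flattened set of face key points belongs to the
    target face shape; 68 points x 2 coordinates = 136 inputs."""
    def __init__(self, num_points=68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_points * 2, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),  # single real/fake logit
        )

    def forward(self, keypoints):  # keypoints: (batch, num_points * 2)
        return self.net(keypoints)
```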
Step 304-1: generating a first converted image corresponding to the first target image using the second generation network;
On the basis of step 303-1, this step is intended for the above execution body to convert the first target image into the first converted image through the second generation network of the generative adversarial network.
Unlike the first generation network, the second generation network learns the transformation relationship that is the opposite of the first generation network, i.e., the transformation relationship from the second facial feature to the first facial feature.
Step 305-1: the third discrimination network judges whether the first sample image and the first converted image are the same image; if they are judged to be the same image, step 307 is executed; otherwise, step 306 is executed;
on the basis of step 304-1, this step is intended to discriminate by the executing body whether or not the first sample image and the first converted image are the same image through the third discrimination network.
The third discrimination network of this step is used to discriminate whether the two images belong to the same image, and the discrimination criterion can be obtained based on a preset same image discrimination rule.
One specific determination step may be:
acquiring at least one of the face shape key point difference, the expression difference and the face identity information difference between the first sample image and the first converted image;
and judging that the first sample image and the first converted image belong to the same image in response to at least one of the face shape key point difference, the expression difference and the face identity information difference being smaller than its corresponding preset difference.
The preset difference for the face shape key point difference, the preset difference for the expression difference and the preset difference for the face identity information difference may have the same or different values, but each applies only to its own category. This implementation uses up to three parameters simultaneously to judge whether the images are the same, covering multiple angles and improving the accuracy of the judgment result as much as possible.
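A sketch of this same-image judgment is given below; the extractor functions and the threshold values are placeholders, and the "at least one difference below its preset difference" criterion follows the wording above.

```python
import numpy as np

def judged_same_image(sample, transformed, extractors, thresholds):
    """extractors/thresholds: dicts keyed by e.g. 'keypoints',
    'expression', 'identity'; each extractor maps an image to a feature
    vector. Returns True if at least one acquired difference is below
    its corresponding preset difference, per the criterion above."""
    for name, extract in extractors.items():
        a, b = np.asarray(extract(sample)), np.asarray(extract(transformed))
        diff = float(np.linalg.norm(a - b))
        if diff < thresholds[name]:
            return True
    return False
```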
Step 301-2: respectively taking the second facial feature set and the first facial feature set as an input sample and an output sample for training another generation network to obtain a second generation network;
that is, the second generation network can learn a transformation relationship from the pair of samples to transform from the second facial feature to the first facial feature.
Step 302-2: generating a second target image corresponding to the second sample image using a second generation network;
step 303-2: the first judging network judges whether the second target image belongs to the first face type, if so, executing step 304-2, otherwise, readjusting the transformation relation learned by the second generating network until the output second target image is judged to belong to the first face type (indicated by pointing to step 301-2 again in fig. 3);
the method for constructing the first discriminant network with the discriminant capability may be: a first facial feature is extracted based on the first facial feature set, and a first discriminant network is constructed based on the first facial feature.
Step 304-2: generating a second converted image corresponding to the second target image using the first generation network;
This step is intended for the above execution body to convert the second target image into the second converted image through the first generation network of the generative adversarial network.
Step 305-2: the third discrimination network judges whether the second sample image and the second converted image are the same image; if so, step 307 is executed; otherwise, step 306 is executed;
step 301-2 to step 305-2 are similar to the other set of step 301-1 to step 305-1, the input-output relationship in the sample pair is opposite, the face shape of the initial image is different, and the discrimination capabilities of the generation network and the discrimination network are correspondingly adjusted adaptively.
Step 306: adjusting parameters of a current cyclic loss function;
This step is executed when the judgment result of step 305-1 is that the first sample image and the first converted image are not the same image, or the judgment result of step 305-2 is that the second sample image and the second converted image are not the same image. It is intended for the execution body to adjust the two generation networks by adjusting the parameters of the current cyclic loss function, so that the images generated by the adjusted generation networks can pass the judgments of step 305-1 and step 305-2 (indicated in fig. 3 by the arrows pointing back to step 301-1 and step 301-2).
That is, this step solves the above problem by introducing a new loss function. It is called a cyclic loss function because the first sample image, after the two successive conversions, finally returns to the first face shape it belongs to, which is equivalent to passing through a face shape transformation cycle. The cyclic loss function (cycle loss) is therefore the loss function used by the present disclosure to control the first sample image and the first converted image, and the second sample image and the second converted image, to be judged to belong to the same image.
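Written out in the standard cycle-consistency form (the patent does not give an explicit formula, so this is an assumed instantiation with L1 distance), the cyclic loss over the two generation networks G_1 and G_2 is:

```latex
\mathcal{L}_{cyc}(G_1, G_2)
  = \mathbb{E}_{x \sim p_{\mathrm{face}_1}}\left[ \lVert G_2(G_1(x)) - x \rVert_1 \right]
  + \mathbb{E}_{y \sim p_{\mathrm{face}_2}}\left[ \lVert G_1(G_2(y)) - y \rVert_1 \right]
```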
Step 307: outputting the generative adversarial network whose cyclic loss function meets the preset requirement as the target face shape transformation model.
This step is executed when the judgment result of step 305-1 is that the first sample image and the first converted image are the same image, or the judgment result of step 305-2 is that the second sample image and the second converted image are the same image; it is intended for the execution body to output the model whose cyclic loss function satisfies the preset requirement as the target face shape transformation model.
The preset requirement set for the cyclic loss function may take several forms, for example: its value is smaller than a preset value, the number of training iterations exceeds a preset number, or the difference between the loss values of two adjacent iterations is smaller than a preset value. For instance, the generative adversarial network whose cyclic loss function has an actual value smaller than the preset value is output as the target face shape transformation model.
That is, the generative adversarial network in this embodiment includes the first generation network, the second generation network, the first discrimination network, the second discrimination network and the third discrimination network.
In this embodiment, steps 301-1 and 301-2 show a concrete way of obtaining the first generation network and the second generation network so that they each learn a different transformation relationship; steps 303 and 305 realize the discrimination tasks through the discrimination networks of the generative adversarial network; and steps 306 and 307 use the cyclic loss function to keep the other image content of the face image unchanged before and after the face shape transformation.
It should be understood that the various parts of this embodiment have no causal or dependency relationship with one another; the detailed description of each part can be combined separately with the embodiment of process 200 to form different independent embodiments, and this embodiment exists only as one embodiment that includes multiple such detailed parts at the same time.
On the basis of any of the above embodiments, various ways may be adopted to obtain sufficiently large first and second facial feature sets. For example, a small number of face images can first be labeled by face shape to train a classifier, and the classifier together with a random face generator can then produce the required number of sample images of each face shape. Finally, facial features are extracted from the sample set of each face shape to form the facial feature sets, such as face shape key point sets, facial feature sets or facial outline sets. The classifier and the random face generator can be implemented with various concrete models, such as the widely used StyleGAN2 model and other similar or identical models, which are not enumerated here.
The above embodiments describe how to train the face shape transformation model from various aspects. To highlight, from an actual usage scenario, the effect of the trained model as much as possible, the present disclosure also provides a concrete scheme for solving a practical problem with the trained face shape transformation model. A face shape transformation method includes the following steps:
acquiring a face image to be transformed;
and calling a target face shape transformation model to perform face shape transformation on the face image to be transformed.
Furthermore, the executing body can also return the face-shape-converted image output by the model to the user who initiates the conversion task, or upload the face-shape-converted image to a certain network address according to the instruction of the user.
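A minimal sketch of this inference flow follows; the preprocessing (RGB, [0,1] tensors) and the image-to-image form of the model are assumptions for illustration.

```python
import numpy as np
import torch
from PIL import Image

def transform_face_shape(model, image_path, output_path):
    """Acquire the face image to be transformed, call the target face
    shape transformation model on it, and save the transformed result."""
    image = Image.open(image_path).convert("RGB")
    x = torch.from_numpy(np.array(image)).float().div(255)
    x = x.permute(2, 0, 1).unsqueeze(0)          # (1, 3, H, W)
    with torch.no_grad():
        y = model(x).clamp(0, 1)                 # transformed image tensor
    out = (y.squeeze(0).permute(1, 2, 0) * 255).byte().numpy()
    Image.fromarray(out).save(output_path)       # return/upload as requested
```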
In order to deepen understanding, the disclosure further provides a concrete implementation scheme for a specific application scenario that requires interchanging adult and child face shapes. The scheme is divided into several parts: 1) training a sample generation model; 2) using the trained sample generation model to generate adult and child face image sets that meet the requirements; 3) extracting an adult face key point set and a child face key point set from the adult face image set and the child face image set respectively; 4) training, based on the adult and child face key point sets, a face key point migration model that interchanges adult and child face shapes. Each part is explained below:
1) Training the sample generation model:
This embodiment adopts a StyleGAN2-based generation scheme to produce child and adult images. First, a StyleGAN2 model is obtained by training on a large amount of data; then a small number of pictures are randomly generated with the trained StyleGAN2 model; next, two groups of data, adults and children, are obtained through manual labeling, and a linear classifier is trained on the two groups to obtain a classification hyperplane (the normal vector of the hyperplane is the attribute axis of the adult and child data groups: one side of the axis yields adult pictures, the other side child pictures).
The StyleGAN2 architecture is shown in fig. 4. The left side of fig. 4 is a fully connected latent code mapping network that maps z from 512 dimensions to a w of 8 × 512 dimensions. In the decoder structure on the right, w passes through the affine layers A to obtain the modulation (mod) coefficients for the weights of the corresponding decoder layers; the weights are modulated and demodulated (mod/demod) to obtain the effective weights, the corresponding convolutions are computed, and the noise input B on the right is added to produce the final 1024 × 1024 picture. During training, the finally generated picture is directly judged real or fake, and training converges to the final StyleGAN2 model.
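The mod/demod operation mentioned here can be sketched as follows (simplified from the public StyleGAN2 formulation; tensor shapes and names are illustrative):

```python
import torch

def modulated_weight(weight, style, eps=1e-8):
    """StyleGAN2-style weight modulation/demodulation.
    weight: (out_ch, in_ch, k, k) conv weight of one decoder layer;
    style:  (batch, in_ch) per-sample scales produced by the affine
    layer 'A' from the latent w."""
    w = weight.unsqueeze(0) * style[:, None, :, None, None]    # mod
    demod = torch.rsqrt((w ** 2).sum(dim=[2, 3, 4]) + eps)     # demod coeff
    return w * demod[:, :, None, None, None]   # (batch, out_ch, in_ch, k, k)
```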
A small number of pictures (about 4000) are randomly generated using the resulting StyleGAN2 model, and the corresponding latent codes z are manually labeled with adult and child tags. A classifier is trained on the latent codes z using a linear support vector machine, yielding a classification hyperplane and, from it, the normal vector of the hyperplane, i.e. the direction vector of the adult/child attribute axis.
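A sketch of this classifier step using scikit-learn (an assumed tooling choice) is:

```python
import numpy as np
from sklearn.svm import LinearSVC

def adult_child_axis(latents, labels):
    """latents: (N, 512) latent codes z; labels: 1 = adult, 0 = child.
    Returns the fitted classifier and the unit normal vector of the
    classification hyperplane, i.e. the adult/child attribute axis."""
    clf = LinearSVC().fit(latents, labels)
    normal = clf.coef_[0]
    return clf, normal / np.linalg.norm(normal)
```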
2) Generating a set of adult and child facial images that meet the requirements:
the classification hyperplane of the stylegan2 model is used for randomly generating a large amount of data, meanwhile, the classifier obtained by the last training is adopted for distinguishing the attributes, the adult face images and the child face images with the required number can be generated, and 20000 sheets of generation are set and required in the embodiment.
A latent code z is randomly generated and classified with the trained support vector machine classifier. When an adult image is required but the classification result is child, z is moved in the adult direction along the direction vector of the adult/child attribute axis; when a child image is required but the classification result is adult, z is moved in the child direction. In this way, 20000 adult pictures and 20000 child pictures can be generated as required.
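The latent-code adjustment described here can be sketched as follows; the step size and retry count are assumptions:

```python
import numpy as np

def sample_latent_with_shape(clf, axis, want_adult, dim=512,
                             step=1.0, max_tries=10):
    """Randomly draw a latent code z and, if the classifier disagrees
    with the requested face shape, move z along the adult/child
    attribute axis until it classifies as requested."""
    z = np.random.randn(dim)
    direction = axis if want_adult else -axis
    for _ in range(max_tries):
        if (clf.predict(z[None, :])[0] == 1) == want_adult:
            break
        z = z + step * direction     # move toward the requested side
    return z                         # feed to the StyleGAN2 generator
```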
3) Extracting key points of the facial form:
and extracting key points of the generated facial images of the children and the adults. The key point extraction model can adopt an open-source human face key point detection model realized based on dlib (C + + open-source toolkit containing a machine learning algorithm).
4) Training a face type key point migration model:
and training a face key point migration model by using the generated face key point data. And training a key point migration model by using the extracted face key points, corresponding adult and child labels, a generator based on full-connection design and a discriminator.
The overall training adopts a cyclic design, whose specific structure is shown in fig. 5. The encoder, the decoder and the MLP (Multi-Layer Perceptron) are all fully connected networks. The encoder encodes the input face key points into the feature xfeature, and the MLP encodes the condition into the feature cfeature. The xfeature and cfeature are then concatenated and sent to the decoder, which decodes them into the face key points corresponding to the condition. The condition may be [0,1] or [1,0], where [0,1] represents the goal of turning an adult into a child and [1,0] represents migrating child key points to an adult. The network does not output the target key points directly; its final output is the relative movement distance of each original key point toward its target position. The discriminator is also composed of fully connected layers. Training follows a cycle-consistent adversarial scheme; specifically, the loss function includes a reconstruction loss, a regularization loss on the offset distances, an adversarial loss (these three are common losses of generative adversarial networks) and a cycle loss (i.e., the cyclic loss function described in the above embodiments of the present application).
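A sketch of the fig. 5 generator under this description follows; all layer widths are assumptions, and only the offset-producing forward pass is shown (the discriminator and the four losses would be built analogously to the earlier training sketch).

```python
import torch
import torch.nn as nn

class KeypointMigrator(nn.Module):
    """Fully connected encoder for the key points, MLP for the condition
    ([0,1] = adult->child, [1,0] = child->adult), concatenation of
    xfeature and cfeature, and a decoder that outputs the relative
    movement distances of the original key points."""
    def __init__(self, num_points=68, feat=256, cond_feat=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_points * 2, feat), nn.ReLU(),
            nn.Linear(feat, feat), nn.ReLU())
        self.mlp = nn.Sequential(
            nn.Linear(2, cond_feat), nn.ReLU(),
            nn.Linear(cond_feat, cond_feat), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(feat + cond_feat, feat), nn.ReLU(),
            nn.Linear(feat, num_points * 2))

    def forward(self, keypoints, condition):
        x = self.encoder(keypoints)               # xfeature
        c = self.mlp(condition)                   # cfeature
        offsets = self.decoder(torch.cat([x, c], dim=1))
        return keypoints + offsets                # migrated key points
```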
Fig. 6 shows the migration result of part of the key points. The first row, from left to right, depicts the transition from adult face keypoints to child face keypoints, and the second row, from left to right, depicts the transition from child face keypoints to adult face keypoints.
With further reference to fig. 7 and 8, as implementations of the methods shown in the above figures, the present disclosure provides an embodiment of a face shape transformation model training apparatus corresponding to the method embodiment shown in fig. 2 and an embodiment of a face shape transformation apparatus corresponding to the face shape transformation method embodiment. The apparatuses can be applied to various electronic devices.
As shown in fig. 7, the face shape transformation model training apparatus 700 of the present embodiment may include: a generating network training unit 701, a first primary transformation unit 702, a second primary transformation unit 703, a secondary transformation unit 704, a same image control unit 705 and a target face shape transformation model output unit 706. The generating network training unit 701 is configured to obtain a first generation network and a second generation network based on the first facial feature set and the second facial feature set; the first primary transformation unit 702 is configured to generate a first target image corresponding to the first sample image using the first generation network, and control the first target image to be judged as belonging to the second face shape; the second primary transformation unit 703 is configured to generate a second target image corresponding to the second sample image using the second generation network, and control the second target image to be judged as belonging to the first face shape; the secondary transformation unit 704 is configured to generate a first converted image corresponding to the first target image and a second converted image corresponding to the second target image using the second generation network and the first generation network, respectively; the same image control unit 705 is configured to control the first sample image and the first converted image, and the second sample image and the second converted image, to be judged as the same image; the target face shape transformation model output unit 706 is configured to output the generative adversarial network meeting a preset requirement as the target face shape transformation model; the first sample image and the second sample image belong to the first face shape and the second face shape respectively, and the generative adversarial network comprises the first generation network and the second generation network.
In the present embodiment, in the face shape transformation model training apparatus 700, the specific processing of the generating network training unit 701, the first primary transformation unit 702, the second primary transformation unit 703, the secondary transformation unit 704, the same image control unit 705 and the target face shape transformation model output unit 706, and the technical effects thereof, can be found in the related descriptions of steps 201 to 206 in the embodiment corresponding to fig. 2 and are not repeated here.
In some optional implementations of this embodiment, the generating network training unit 701 may be further configured to:
respectively taking the first facial feature set and the second facial feature set as an input sample and an output sample for training a generating network to obtain a first generating network;
and respectively taking the second facial feature set and the first facial feature set as an input sample and an output sample for training the other generation network to obtain a second generation network.
In some optional implementations of the present embodiment, the face shape transformation model training apparatus 700 may further include:
a first discrimination network construction unit configured to extract first facial features based on the first facial feature set and construct a first discrimination network based on the first facial features;
a second discrimination network construction unit configured to extract second facial features based on the second facial feature set and construct a second discrimination network based on the second facial features;
and a third discrimination network construction unit configured to construct a third discrimination network based on a preset same image discrimination rule.
In some optional implementations of the present embodiment, the first primary transformation unit 702 may include a first control subunit configured to control the first target image to be judged as belonging to the second face shape, and the first control subunit may be further configured to:
controlling the first target image to be judged by the second discrimination network as belonging to the second face shape;
the second primary transformation unit 703 may include a second control subunit configured to control the second target image to be judged as belonging to the first face shape, and the second control subunit may be further configured to:
controlling the second target image to be judged by the first discrimination network as belonging to the first face shape.
In some optional implementations of this embodiment, the same image control unit 705 may be further configured to:
controlling the first sample image and the first converted image to be judged as the same image by the third discrimination network;
and controlling the second sample image and the second converted image to be judged as the same image by the third discrimination network.
In some optional implementations of the present embodiment, the target face shape transformation model output unit 706 may be further configured to:
constructing a target generative adversarial network based on the first generation network, the second generation network, the first discrimination network, the second discrimination network and the third discrimination network;
and outputting the target generative adversarial network meeting the preset requirement as the target face shape transformation model.
In some optional implementations of the present embodiment, the face shape transformation model training apparatus 700 may further include:
a face image set random generation unit configured to generate a preset number of first face images and second face images respectively through a face image random generation model, and obtain a first face image set and a second face image set;
the facial feature extraction unit is configured to extract facial features from the first facial image set and the second facial image set respectively to obtain a first facial feature set and a second facial feature set.
In some optional implementations of the present embodiment, the face feature extraction unit may be further configured to:
face key points are extracted from the first face image set and the second face image set respectively to obtain a first face key point set and a second face key point set.
In some optional implementations of this embodiment, the same image control unit 705 may be further configured to:
controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be judged as the same image by adjusting the parameters of a preset cyclic loss function;
correspondingly, the target face shape transformation model output unit 706 may be further configured to:
outputting the generative adversarial network whose cyclic loss function meets a preset exit condition as the target face shape transformation model.
In some optional implementations of this embodiment, the same image control unit 704 may include a first subunit configured to control the first sample image and the first converted image to be discriminated as the same image, and the first subunit may be further configured to:
acquire at least one of a face shape key point difference, an expression difference, and a face identity information difference between the first sample image and the first converted image; and
determine that the first sample image and the first converted image are the same image in response to the at least one of the face shape key point difference, the expression difference, and the face identity information difference being smaller than the corresponding preset difference.
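The following sketch illustrates one way such a determination could be made; the concrete difference metrics and the threshold values are assumptions, since the patent only requires that each difference be compared against its corresponding preset difference.

```python
# Sketch: judge two images as "the same image" when at least one of the
# three differences is below its preset threshold. All metrics and
# thresholds here are illustrative assumptions.
import numpy as np

def is_same_image(keypoints_a, keypoints_b, expr_a, expr_b, ident_a, ident_b,
                  kp_thresh=2.0, expr_thresh=0.1, id_thresh=0.3) -> bool:
    kp_diff = float(np.mean(np.linalg.norm(keypoints_a - keypoints_b, axis=-1)))
    expr_diff = float(np.linalg.norm(expr_a - expr_b))
    id_diff = 1.0 - float(np.dot(ident_a, ident_b) /
                          (np.linalg.norm(ident_a) * np.linalg.norm(ident_b)))
    # "at least one of" the differences smaller than its preset difference
    return kp_diff < kp_thresh or expr_diff < expr_thresh or id_diff < id_thresh
```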
As shown in fig. 8, the face shape transformation apparatus 800 of this embodiment may include: a to-be-transformed face image acquiring unit 801 and a model calling processing unit 802. The to-be-transformed face image acquiring unit 801 is configured to acquire a face image to be transformed; the model calling processing unit 802 is configured to call a target face shape transformation model to perform face shape transformation on the face image to be transformed, the target face shape transformation model being obtained by the face shape transformation model training apparatus 700.
In this embodiment, the specific processing of the to-be-transformed face image acquiring unit 801 and the model calling processing unit 802 in the face shape transformation apparatus 800, and the technical effects thereof, correspond to the related descriptions in the method embodiments and are not repeated here.
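At inference time, the two units reduce to a few lines. The preprocessing pipeline and the way the model is supplied are illustrative assumptions, not APIs defined by this disclosure:

```python
# Sketch: acquire a face image to be transformed and call the target face
# shape transformation model on it. Resolution and normalization are assumed.
import torch
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([T.Resize((256, 256)), T.ToTensor()])

def transform_face(model: torch.nn.Module, image_path: str) -> torch.Tensor:
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(image)   # face image after face shape transformation
```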
This embodiment, as an apparatus counterpart of the method embodiments above, provides a face shape transformation model training apparatus and a face shape transformation apparatus. On the basis of ensuring that the image after a primary face shape transformation can be discriminated as the corresponding face type, the apparatus additionally controls each sample image and its image after a secondary face shape transformation to be discriminated as the same image. In this way, even when training sample pairs with an explicit correspondence are difficult to obtain, the face-shape-transformed image can be controlled both to exhibit the distinguishing face shape and to change other facial information as little as possible, so that the same person and the same expression are preserved before and after the transformation.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to implement the face shape transformation model training method and/or the face shape transformation method described in any one of the above embodiments.
According to an embodiment of the present disclosure, there is also provided a readable storage medium storing computer instructions that, when executed, enable a computer to implement the face shape transformation model training method and/or the face shape transformation method described in any one of the above embodiments.
An embodiment of the present disclosure further provides a computer program product which, when executed by a processor, implements the face shape transformation model training method and/or the face shape transformation method described in any one of the above embodiments.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the face shape transformation model training method and/or the face shape transformation method. For example, in some embodiments, the face shape transformation model training method and/or the face shape transformation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the face shape transformation model training method and/or the face shape transformation method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the face shape transformation model training method and/or the face shape transformation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also known as a cloud computing server or cloud host; it is a host product in the cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in conventional physical hosts and Virtual Private Server (VPS) services.
According to the technical solution of the embodiments of the present disclosure, on the basis of ensuring that the image after a primary face shape transformation can be discriminated as the corresponding face type, the sample image and the image after a secondary face shape transformation are additionally controlled to be discriminated as the same image. Therefore, even when training sample pairs with an explicit correspondence are difficult to obtain, the face-shape-transformed image can be controlled to exhibit the target face shape while other facial information changes as little as possible, so that the same person and the same expression are preserved before and after the transformation.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A face shape transformation model training method, comprising the following steps:
obtaining a first generation network and a second generation network based on the first facial feature set and the second facial feature set;
generating a first target image corresponding to a first sample image by using the first generation network, and controlling the first target image to be discriminated as belonging to a second face type;
generating a second target image corresponding to a second sample image by using the second generation network, and controlling the second target image to be discriminated as belonging to a first face type;
generating a first converted image corresponding to the first target image and a second converted image corresponding to the second target image using the second generation network and the first generation network, respectively;
controlling the first sample image and the first converted image, and the second sample image and the second converted image to be discriminated as the same image;
outputting the generative adversarial network that meets a preset requirement as a target face shape transformation model; wherein the first sample image and the second sample image belong to the first face type and the second face type, respectively, and the generative adversarial network includes the first generation network and the second generation network.
2. The method of claim 1, wherein deriving the first and second generated networks based on the first and second facial feature sets comprises:
taking the first facial feature set and the second facial feature set as an input sample and an output sample, respectively, for training one generation network to obtain the first generation network; and
taking the second facial feature set and the first facial feature set as an input sample and an output sample, respectively, for training another generation network to obtain the second generation network.
3. The method of claim 1, further comprising:
extracting a first facial feature based on the first facial feature set, and constructing a first discrimination network based on the first facial feature;
extracting a second facial feature based on the second facial feature set, and constructing a second discrimination network based on the second facial feature; and
constructing a third discrimination network based on a preset same-image discrimination rule.
4. The method of claim 3, wherein the controlling the first target image to be discriminated as belonging to a second face type comprises:
controlling the first target image to be discriminated by the second discrimination network as belonging to the second face type;
and the controlling the second target image to be discriminated as belonging to the first face type comprises:
controlling the second target image to be discriminated by the first discrimination network as belonging to the first face type.
5. The method of claim 3, wherein the controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be discriminated as the same image comprises:
controlling the first sample image and the first converted image to be discriminated as the same image by the third discrimination network; and
controlling the second sample image and the second converted image to be discriminated as the same image by the third discrimination network.
6. The method of claim 3, wherein the outputting the generative adversarial network that meets the preset requirement as a target face shape transformation model comprises:
constructing a target generative adversarial network based on the first generation network, the second generation network, the first discrimination network, the second discrimination network, and the third discrimination network; and
outputting the target generative adversarial network that meets the preset requirement as the target face shape transformation model.
7. The method of claim 1, further comprising:
generating preset numbers of first face images and second face images, respectively, through a face image random generation model to obtain a first face image set and a second face image set; and
extracting facial features from the first face image set and the second face image set, respectively, to obtain the first facial feature set and the second facial feature set.
8. The method of claim 7, wherein the extracting facial features from the first face image set and the second face image set, respectively, to obtain the first facial feature set and the second facial feature set comprises:
extracting face key points from the first face image set and the second face image set, respectively, to obtain a first face key point set and a second face key point set.
9. The method of claim 1, wherein the controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be discriminated as the same image comprises:
controlling the first sample image and the first converted image, and the second sample image and the second converted image, to be discriminated as the same image by adjusting parameters of a preset cycle loss function;
and correspondingly, the outputting the generative adversarial network that meets the preset requirement as a target face shape transformation model comprises:
outputting the generative adversarial network whose cycle loss function satisfies a preset exit condition as the target face shape transformation model.
10. The method according to any one of claims 1-9, wherein the controlling the first sample image and the first converted image to be discriminated as the same image comprises:
acquiring at least one of a face shape key point difference, an expression difference, and a face identity information difference between the first sample image and the first converted image; and
determining that the first sample image and the first converted image are the same image in response to the at least one of the face shape key point difference, the expression difference, and the face identity information difference being smaller than the corresponding preset difference.
11. A face shape transformation method, comprising:
acquiring a face image to be transformed; and
calling a target face shape transformation model to perform face shape transformation on the face image to be transformed; wherein the target face shape transformation model is obtained based on the face shape transformation model training method according to any one of claims 1 to 10.
12. A face shape transformation model training apparatus, comprising:
a generating network training unit configured to derive a first generating network and a second generating network based on the first facial feature set and the second facial feature set;
a first primary transformation unit configured to generate a first target image corresponding to a first sample image by using the first generation network, and to control the first target image to be discriminated as belonging to a second face type;
a second primary transformation unit configured to generate a second target image corresponding to a second sample image by using the second generation network, and to control the second target image to be discriminated as belonging to a first face type;
a quadratic transformation unit configured to generate a first transformed image corresponding to the first target image and a second transformed image corresponding to the second target image using the second generation network and the first generation network, respectively;
a same image control unit configured to control the first sample image and the first converted image, and the second sample image and the second converted image, to be discriminated as the same image;
a target face shape transformation model output unit configured to output the generative adversarial network that meets a preset requirement as a target face shape transformation model; wherein the first sample image and the second sample image belong to the first face type and the second face type, respectively, and the generative adversarial network includes the first generation network and the second generation network.
13. The apparatus of claim 12, wherein the generating network training unit is further configured to:
take the first facial feature set and the second facial feature set as an input sample and an output sample, respectively, for training one generation network to obtain the first generation network; and
take the second facial feature set and the first facial feature set as an input sample and an output sample, respectively, for training another generation network to obtain the second generation network.
14. The apparatus of claim 12, further comprising:
a first discrimination network construction unit configured to extract a first facial feature based on the first facial feature set and construct a first discrimination network based on the first facial feature;
a second discrimination network construction unit configured to extract a second facial feature based on the second facial feature set and construct a second discrimination network based on the second facial feature; and
a third discrimination network construction unit configured to construct a third discrimination network based on a preset same-image discrimination rule.
15. The apparatus of claim 12, wherein the first primary transformation unit comprises a first control subunit configured to control the first target image to be discriminated as belonging to the second face type, the first control subunit being further configured to:
control the first target image to be discriminated by the second discrimination network as belonging to the second face type;
and the second primary transformation unit comprises a second control subunit configured to control the second target image to be discriminated as belonging to the first face type, the second control subunit being further configured to:
control the second target image to be discriminated by the first discrimination network as belonging to the first face type.
16. The apparatus of claim 12, wherein the same image control unit is further configured to:
control the first sample image and the first converted image to be discriminated as the same image by the third discrimination network; and
control the second sample image and the second converted image to be discriminated as the same image by the third discrimination network.
17. The apparatus of claim 12, wherein the target face shape transformation model output unit is further configured to:
construct a target generative adversarial network based on the first generation network, the second generation network, the first discrimination network, the second discrimination network, and the third discrimination network; and
output the target generative adversarial network that meets the preset requirement as the target face shape transformation model.
18. The apparatus of claim 12, further comprising:
a face image set random generation unit configured to generate preset numbers of first face images and second face images, respectively, through a face image random generation model to obtain a first face image set and a second face image set; and
a facial feature extraction unit configured to extract facial features from the first face image set and the second face image set, respectively, to obtain the first facial feature set and the second facial feature set.
19. The apparatus of claim 18, wherein the facial feature extraction unit is further configured to:
extract face key points from the first face image set and the second face image set, respectively, to obtain a first face key point set and a second face key point set.
20. The apparatus of claim 12, wherein the same image control unit is further configured to:
control the first sample image and the first converted image, and the second sample image and the second converted image, to be discriminated as the same image by adjusting parameters of a preset cycle loss function;
and correspondingly, the target face shape transformation model output unit is further configured to:
output the generative adversarial network whose cycle loss function satisfies a preset exit condition as the target face shape transformation model.
21. The apparatus according to any one of claims 12-20, wherein the same image control unit comprises a first subunit configured to control the first sample image and the first converted image to be discriminated as the same image, the first subunit being further configured to:
acquire at least one of a face shape key point difference, an expression difference, and a face identity information difference between the first sample image and the first converted image; and
determine that the first sample image and the first converted image are the same image in response to the at least one of the face shape key point difference, the expression difference, and the face identity information difference being smaller than the corresponding preset difference.
22. A face shape transformation apparatus, comprising:
a to-be-transformed face image acquiring unit configured to acquire a face image to be transformed; and
a model calling processing unit configured to call a target face shape transformation model to perform face shape transformation on the face image to be transformed; wherein the target face shape transformation model is obtained based on the face shape transformation model training apparatus according to any one of claims 12-21.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the face shape transformation model training method of any one of claims 1-10 and/or the face shape transformation method of claim 11.
24. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the face shape transformation model training method of any one of claims 1-10 and/or the face shape transformation method of claim 11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the face shape transformation model training method of any one of claims 1-10 and/or the face shape transformation method of claim 11.
CN202110728844.8A 2021-06-29 2021-06-29 Face shape transformation model training, face shape transformation method and related device Pending CN113379594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110728844.8A CN113379594A (en) 2021-06-29 2021-06-29 Face shape transformation model training, face shape transformation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110728844.8A CN113379594A (en) 2021-06-29 2021-06-29 Face shape transformation model training, face shape transformation method and related device

Publications (1)

Publication Number Publication Date
CN113379594A true CN113379594A (en) 2021-09-10

Family

ID=77579915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110728844.8A Pending CN113379594A (en) 2021-06-29 2021-06-29 Face shape transformation model training, face shape transformation method and related device

Country Status (1)

Country Link
CN (1) CN113379594A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850714A (en) * 2021-09-29 2021-12-28 北京百度网讯科技有限公司 Training of image style conversion model, image style conversion method and related device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800732A (en) * 2019-01-30 2019-05-24 北京字节跳动网络技术有限公司 The method and apparatus for generating model for generating caricature head portrait
WO2021083069A1 (en) * 2019-10-30 2021-05-06 上海掌门科技有限公司 Method and device for training face swapping model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800732A (en) * 2019-01-30 2019-05-24 北京字节跳动网络技术有限公司 The method and apparatus for generating model for generating caricature head portrait
WO2021083069A1 (en) * 2019-10-30 2021-05-06 上海掌门科技有限公司 Method and device for training face swapping model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冷佳明 (Leng Jiaming); 曾振 (Zeng Zhen); 刘广源 (Liu Guangyuan); 郑新阳 (Zheng Xinyang); 刘璎慧 (Liu Yinghui): "基于条件生成对抗网络的图像转化方法研究" [Research on image transformation methods based on conditional generative adversarial networks], 数码世界 (Digital World), no. 09 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850714A (en) * 2021-09-29 2021-12-28 北京百度网讯科技有限公司 Training of image style conversion model, image style conversion method and related device

Similar Documents

Publication Publication Date Title
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
US20220222925A1 (en) Artificial intelligence-based image processing method and apparatus, device, and storage medium
CN111209878A (en) Cross-age face recognition method and device
CN114820871B (en) Font generation method, model training method, device, equipment and medium
CN111598979B (en) Method, device and equipment for generating facial animation of virtual character and storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113435365B (en) Face image migration method and device
US20230092619A1 (en) Image classification method and apparatus, device, storage medium, and program product
US20220237917A1 (en) Video comparison method and apparatus, computer device, and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN115050064A (en) Face living body detection method, device, equipment and medium
CN110555896A (en) Image generation method and device and storage medium
CN113902956B (en) Training method of fusion model, image fusion method, device, equipment and medium
CN114092759A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN113591566A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN113850714A (en) Training of image style conversion model, image style conversion method and related device
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN113379594A (en) Face shape transformation model training, face shape transformation method and related device
CN116975347A (en) Image generation model training method and related device
CN114419514B (en) Data processing method, device, computer equipment and storage medium
CN114898184A (en) Model training method, data processing method and device and electronic equipment
CN114550249A (en) Face image generation method and device, computer readable medium and electronic equipment
CN113902957B (en) Image generation method, training method and device of model, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination