CN113870399B - Expression driving method and device, electronic equipment and storage medium - Google Patents

Expression driving method and device, electronic equipment and storage medium

Info

Publication number
CN113870399B
Authority
CN
China
Prior art keywords
image
facial
expression
sample
dimensional
Legal status
Active
Application number
CN202111117185.0A
Other languages
Chinese (zh)
Other versions
CN113870399A
Inventor
梁柏荣
郭知智
洪智滨
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111117185.0A
Publication of CN113870399A
Priority to PCT/CN2022/088311 (published as WO2023045317A1)
Application granted
Publication of CN113870399B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides an expression driving method and device, an electronic device and a storage medium, relates to the field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenes such as face image processing and face recognition. The specific implementation scheme is as follows: a source image with an expression and a target image without an expression are respectively input into a three-dimensional expression model to obtain a plurality of first facial attributes and a plurality of second facial attributes; at least part of the first facial attributes are used to replace the corresponding facial attributes in the second facial attributes; three-dimensional facial reconstruction and rendering are performed on the replaced second facial attributes; and expression driving is performed on the rendered three-dimensional facial image through an expression driving model. In this way, the facial expression and the facial pose in the source image and the target image can be decoupled, so that the facial expression and the facial pose of the target image can be controlled independently, which better supports more diverse expression driving.

Description

Expression driving method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning, which can be applied to scenes such as face image processing and face recognition, and more particularly to an expression driving method and apparatus, an electronic device, and a storage medium.
Background
Facial expression driving is an important computer vision technology. Its task is to drive the facial expression of a target picture with an expression picture so that the facial expressions of the two pictures are as consistent as possible. Facial expression driving is widely used in entertainment applications.
Disclosure of Invention
The disclosure provides a method and a device for expression driving, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an expression driving method including: acquiring a source image with an expression and a target image without an expression; inputting the source image and the target image into a three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; replacing corresponding facial attributes in the plurality of second facial attributes with at least part of the plurality of first facial attributes to obtain a plurality of replaced second facial attributes; performing three-dimensional facial reconstruction and rendering on the face in the target image according to the plurality of replaced second facial attributes to obtain a rendered three-dimensional facial image; and inputting the rendered three-dimensional facial image into an expression driving model so as to perform expression driving on the face in the target image.
According to another aspect of the present disclosure, there is provided an expression driving apparatus including: a first acquisition module, configured to acquire a source image with an expression and a target image without an expression; a second acquisition module, configured to input the source image and the target image into a three-dimensional expression model respectively, so as to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; a replacing module, configured to replace corresponding facial attributes in the plurality of second facial attributes with at least part of the plurality of first facial attributes, so as to obtain a plurality of second facial attributes after replacement processing; a processing module, configured to perform three-dimensional facial reconstruction and rendering on the face in the target image according to the plurality of replaced second facial attributes to obtain a rendered three-dimensional facial image; and a driving module, configured to input the rendered three-dimensional facial image into an expression driving model so as to perform expression driving on the face in the target image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of an embodiment of the first aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a flowchart of an expression driving method according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Facial expression driving is an important computer vision technology. Its task is to drive the facial expression of a target picture with an expression picture so that the facial expressions of the two pictures are as consistent as possible. Facial expression driving is widely used in entertainment applications.
In the related art, 2D facial key points of a driving image are detected, and expression transfer is performed based on these 2D key points to generate the corresponding expression-driven face picture.
However, expression driving based on 2D facial key points cannot decouple the expression from the facial pose: when the pose of the driving picture differs greatly from that of the target image, the pose of the generated picture follows the driving image, the original pose of the target image cannot be maintained, and more diverse expression driving cannot be supported.
In order to solve the above problems, the present disclosure provides an expression driving method, an expression driving apparatus, an electronic device, and a storage medium.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. It should be noted that the expression driving method of the embodiment of the present disclosure may be applied to the expression driving apparatus of the embodiment of the present disclosure, and the apparatus may be configured in an electronic device. The electronic device may be a mobile terminal, for example, a mobile phone, a tablet computer, a personal digital assistant, and other hardware devices with various operating systems.
As shown in fig. 1, the expression driving method may include the steps of:
step 101, obtaining a source image with an expression and a target image without the expression.
In the embodiment of the disclosure, an object may be photographed with an image acquisition device to obtain a source image with an expression and a target image without an expression, or the source image and the target image may be downloaded from a network. The expression in the source image may include a happy, angry, or excited facial expression.
Step 102, inputting the source image and the target image into the three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image.
In order to decouple the individual facial attributes from one another, the source image and the target image may be respectively input into a three-dimensional expression model, and the three-dimensional expression model may output a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image. It should be noted that the first facial attributes and the second facial attributes each include at least one of a facial expression, a facial pose, facial lighting, and a facial shape, and the first facial attributes may differ from the second facial attributes.
In addition, it should be noted that the three-dimensional expression model may include an encoding layer and a decoding layer. The encoding layer maps the source image and the target image to the plurality of first facial attributes and the plurality of second facial attributes respectively, thereby decoupling the individual facial attributes from one another. The decoding layer performs three-dimensional facial reconstruction on the face in the target image according to the plurality of replaced second facial attributes to obtain a reconstructed three-dimensional facial image.
As an application scenario, in face image processing and face recognition scenarios, the three-dimensional expression model may be a statistical 3D face morphable model (3DMM). In order to decouple the individual facial attributes from one another, the source image and the target image may be respectively input into the encoding layer of the 3DMM to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image.
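By way of illustration only, the following sketch shows one possible shape of such an encoding layer. The `Encoder3DMM` class, its toy backbone, and all coefficient dimensions are hypothetical assumptions made for this example; they are not specified by the present disclosure.

```python
import torch
import torch.nn as nn

class Encoder3DMM(nn.Module):
    """Maps a face image to grouped 3DMM coefficients (illustrative dims)."""
    def __init__(self, shape_dim=80, exp_dim=64, pose_dim=6, light_dim=27):
        super().__init__()
        # Toy backbone standing in for a real face-image feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.dims = (shape_dim, exp_dim, pose_dim, light_dim)
        self.head = nn.Linear(32, sum(self.dims))

    def forward(self, image: torch.Tensor) -> dict:
        coeffs = self.head(self.backbone(image))
        shape, exp, pose, light = torch.split(coeffs, self.dims, dim=1)
        # One coefficient group per facial attribute: this grouping is what
        # makes the attributes separately replaceable downstream.
        return {"shape": shape, "exp": exp, "pose": pose, "light": light}
```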
Step 103, replacing the corresponding facial attributes in the second facial attributes with at least part of the first facial attributes to obtain a plurality of replaced second facial attributes.
In order to make the target image keep the original facial pose and only perform expression driving on the target image, in the embodiment of the present disclosure, at least part of the plurality of first facial attributes may be used to replace the corresponding facial attributes in the plurality of second facial attributes, so as to obtain a plurality of second facial attributes after replacement processing. For example, the facial expression in the second facial attribute may be replaced with the facial expression in the first facial attribute, and the second facial attribute after replacing the facial expression may be used as the plurality of second facial attributes after the replacement processing.
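A minimal sketch of this replacement step, assuming the attribute dictionaries produced by the hypothetical encoder sketched above; only the expression coefficients change hands, which is precisely what decouples the expression from the pose:

```python
def replace_expression(source_attrs: dict, target_attrs: dict) -> dict:
    """Overwrite only the target's expression with the source's expression."""
    driven = dict(target_attrs)           # keep target shape, pose, light
    driven["exp"] = source_attrs["exp"]   # take the expression from the source
    return driven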
Step 104, performing three-dimensional face reconstruction and rendering on the face in the target image according to the plurality of replaced second face attributes to obtain a rendered three-dimensional face image.
In order to present the replaced plurality of second facial attributes, the replaced plurality of second facial attributes may be input into a decoding layer of the three-dimensional expression model to obtain a reconstructed three-dimensional face image. Further, a rendered three-dimensional face image is obtained by a 3D rendering technique.
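Sketched below is how this step might be wired together; `decoder` (the decoding layer of the three-dimensional expression model) and `renderer` (a standard 3D rasterizer) are hypothetical callables assumed for illustration, not components named by the present disclosure.

```python
def reconstruct_and_render(decoder, renderer, driven_attrs: dict):
    """3D face reconstruction from the replaced attributes, then rendering."""
    mesh = decoder(driven_attrs["shape"], driven_attrs["exp"],
                   driven_attrs["pose"], driven_attrs["light"])
    return renderer(mesh)  # the rendered three-dimensional face image
```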
Step 105, inputting the rendered three-dimensional face image into an expression driving model so as to perform expression driving on the face in the target image.
It can be understood that the rendered three-dimensional face image has low realism. Therefore, in order to make the expression-driven target image more realistic, in the embodiment of the present disclosure, the rendered three-dimensional face image may be input into the expression driving model to perform expression driving on the face in the target image.
In summary, a source image with an expression and a target image without an expression are acquired; the source image and the target image are respectively input into a three-dimensional expression model to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; corresponding facial attributes in the second facial attributes are replaced with at least part of the first facial attributes to obtain the replaced second facial attributes; three-dimensional facial reconstruction and rendering are performed on the face in the target image according to the plurality of replaced second facial attributes to obtain a rendered three-dimensional facial image; and the rendered three-dimensional facial image is input into an expression driving model to perform expression driving on the face in the target image. In this way, the facial expression and the facial pose in the source image and the target image can be decoupled, so that the facial expression and the facial pose of the target image can be controlled independently, which better supports more diverse expression driving.
In order to keep the original facial pose of the target image and drive only its expression, the facial expression in the second facial attributes may be replaced with the facial expression in the plurality of first facial attributes to obtain the second facial attributes after replacement processing, as shown in fig. 2, which is a schematic diagram according to a second embodiment of the present disclosure. The embodiment shown in fig. 2 may include the following steps:
step 201, a source image with an expression and a target image without the expression are obtained.
Step 202, inputting the source image and the target image into the three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image.
Step 203, performing replacement processing on the facial expression in the second facial attributes according to the facial expression in the plurality of first facial attributes.
In the disclosed embodiment, each of the first and second face attributes may include: facial shape, facial pose, facial expression, and facial illumination, and the facial expression in the second facial attribute may be replaced with the facial expression in the first facial attribute.
Step 204, using the facial expression after the replacement processing in the second facial attributes, together with the facial pose, facial shape, and facial illumination retained in the second facial attributes, as the plurality of second facial attributes after replacement processing.
That is, after the facial expression in the second facial attributes is replaced with the facial expression in the first facial attributes, the replaced facial expression, together with the originally retained facial pose, facial shape, and facial illumination, may be used as the plurality of second facial attributes after replacement processing.
Step 205, performing three-dimensional face reconstruction and rendering on the face in the target image according to the plurality of replaced second face attributes to obtain a rendered three-dimensional face image.
Step 206, inputting the rendered three-dimensional face image into an expression driving model so as to perform expression driving on the face in the target image.
It should be noted that the execution processes of steps 201 to 202 and steps 205 to 206 may be implemented by any one of the embodiments of the present disclosure, and the embodiments of the present disclosure do not limit this and are not described again.
In conclusion, the facial expression in the second facial attribute is replaced according to the facial expression in the plurality of first facial attributes; the facial expression after the replacement processing in the second facial attribute and the facial pose, the facial shape and the facial illumination retained by the replacement processing in the second facial attribute are used as a plurality of second facial attributes after the replacement processing. Therefore, the target image can keep the original facial posture, and only the expression of the target image is driven.
In order to perform face reconstruction on the replaced plurality of second facial attributes, as shown in fig. 3, which is a schematic diagram according to a third embodiment of the present disclosure, three-dimensional face reconstruction and rendering may be performed on the face in the target image according to the replaced plurality of second facial attributes to obtain a rendered three-dimensional face image. The embodiment shown in fig. 3 may include the following steps:
step 301, acquiring a source image with an expression and a target image without the expression.
Step 302, inputting the source image and the target image into the three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image.
And step 303, replacing the corresponding face attribute in the second face attributes with at least part of the first face attributes to obtain a plurality of replaced second face attributes.
Step 304, performing three-dimensional face reconstruction on the face in the target image according to the plurality of replaced second face attribute coefficients to obtain a reconstructed three-dimensional face image.
In the embodiment of the disclosure, the plurality of second facial attribute coefficients after the replacement processing may be input into a decoding layer of the three-dimensional expression model, and the three-dimensional expression model may output a reconstructed three-dimensional facial image.
Step 305, performing three-dimensional face rendering on the reconstructed three-dimensional face image to obtain a rendered three-dimensional face image.
In order to make the acquired three-dimensional face image more accurate and real, a 3D rendering technology can be adopted to perform three-dimensional face rendering on the reconstructed three-dimensional face image so as to obtain a rendered three-dimensional face image.
Step 306, inputting the rendered three-dimensional face image into an expression driving model to perform expression driving on the face in the target image.
It should be noted that the execution processes of steps 301 to 303 and step 306 may be implemented by any one of the embodiments of the present disclosure, and the embodiments of the present disclosure do not limit this, and are not described again.
In conclusion, the face in the target image is subjected to three-dimensional face reconstruction according to the plurality of second face attribute coefficients after the replacement processing, so that a reconstructed three-dimensional face image is obtained; and performing three-dimensional face rendering on the reconstructed three-dimensional face image to obtain a rendered three-dimensional face image, so that the plurality of replaced second face attributes can be subjected to face reconstruction.
In order to enable the expression driving model to perform expression driving on the rendered three-dimensional facial image and obtain a more realistic face driving image, as shown in fig. 4, which is a schematic diagram according to a fourth embodiment of the present disclosure, the expression driving model may be trained to output a more realistic face driving image before the rendered three-dimensional facial image is input into it. The embodiment shown in fig. 4 may include the following steps:
step 401, acquiring a source image with an expression and a target image without the expression.
Step 402, inputting the source image and the target image into the three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image.
Step 403, replacing the corresponding facial attributes in the second facial attributes with at least part of the first facial attributes to obtain a plurality of replaced second facial attributes.
Step 404, performing three-dimensional face reconstruction and rendering on the face in the target image according to the plurality of replaced second face attributes to obtain a rendered three-dimensional face image.
Step 405, acquiring a plurality of sample images with expressions.
In the embodiment of the present disclosure, multiple frames of sample images with expressions may be captured by an image acquisition device or downloaded from a network. It should be noted that the multiple frames of sample images with expressions may be sample images of the same object with different expressions, or sample images of different objects with different expressions.
Step 406, inputting the sample image into a coding layer of the three-dimensional expression model for each frame of sample image to obtain a sample facial attribute corresponding to the sample image; wherein the sample facial attributes comprise: at least one of a sample facial expression, a sample facial shape, a sample facial pose, and a sample facial illumination.
Further, each frame of sample image of a plurality of frames of sample images with expressions may be respectively input into an encoding layer of a three-dimensional expression model, and the three-dimensional expression model may output a sample facial attribute corresponding to each frame of sample image, where it should be noted that the sample facial attribute may include: at least one of a sample facial expression, a sample facial shape, a sample facial pose, and a sample facial illumination.
Step 407, inputting the sample facial expression, the sample facial shape, the sample facial pose and the sample facial illumination into a decoding layer of the three-dimensional expression model to perform three-dimensional facial reconstruction on the face in the sample image, so as to obtain a reconstructed three-dimensional sample facial image.
Furthermore, the sample facial expression, the sample facial shape, the sample facial posture and the sample facial illumination can be input into a decoding layer of the three-dimensional expression model, and the three-dimensional expression model can perform three-dimensional facial reconstruction on the sample facial attribute to obtain a reconstructed three-dimensional sample facial image.
And step 408, performing three-dimensional face rendering on the reconstructed three-dimensional sample face image to obtain a rendered three-dimensional sample face image.
In the embodiment of the disclosure, the three-dimensional face rendering can be performed on the reconstructed three-dimensional sample face image by using a three-dimensional rendering technology, so as to obtain a rendered three-dimensional sample face image.
Step 409, training the initial expression driving model according to the rendered three-dimensional sample facial image and the sample image to generate the expression driving model.
As an example: inputting the rendered three-dimensional sample facial image into an initial expression driving model to obtain an expression prediction image; determining a loss function value according to the difference between the sample image and the expression prediction image; and training the initial expression driving model according to the loss function value so as to minimize the loss function value.
That is, in order to improve the accuracy of the expression driving model, the rendered three-dimensional sample facial image may be input into an initial expression driving model, and the initial expression driving model may output an expression prediction image. The sample image may then be compared with the expression prediction image to determine the difference between them, and a loss function value may be determined according to the difference. For example, the loss function value may include a first sub-loss function value and a second sub-loss function value. The first sub-loss function value may be determined according to the absolute value of the difference between the sample image and the expression prediction image. Meanwhile, the sample image and the expression prediction image may be input into a trained visual graphics generator (VGG) to generate a semantic vector corresponding to the sample image and a semantic vector corresponding to the expression prediction image, and the second sub-loss function value may be determined according to the absolute value of the difference between the two semantic vectors. Furthermore, according to the loss function value, the initial expression driving model may be trained by gradient back-propagation so as to minimize the loss function value.
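As a non-authoritative sketch, the two sub-losses described above could be implemented as follows. The use of torchvision's pretrained VGG16 and the cutoff after relu3_3 are assumptions; the disclosure only requires a trained VGG as the feature extractor.

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen feature extractor for the second (semantic) sub-loss. The layer
# cutoff (up to relu3_3) is an illustrative choice, not mandated above;
# newer torchvision versions pass weights=VGG16_Weights.IMAGENET1K_V1
# instead of pretrained=True.
vgg_features = vgg16(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def expression_driving_loss(pred, sample):
    l1 = F.l1_loss(pred, sample)                 # first sub-loss: pixel L1
    perceptual = F.l1_loss(vgg_features(pred),   # second sub-loss: distance
                           vgg_features(sample)) # between VGG features
    return l1 + perceptual
```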
As another example: image normalization processing is performed on the rendered three-dimensional sample facial image and the sample image to obtain a target three-dimensional sample facial image; the target three-dimensional sample facial image is input into the initial expression driving model to obtain an expression prediction image; a loss function value is determined according to the difference between the sample image and the expression prediction image; and the initial expression driving model is trained according to the loss function value so as to minimize the loss function value.
In order to distribute the data of the rendered three-dimensional sample facial image and the sample image over the same range, reduce the difference between them, and facilitate training of the initial expression driving model, image normalization processing may be performed on the rendered three-dimensional sample facial image and the sample image to obtain a target three-dimensional sample facial image. For example, the pixel value of each pixel in the rendered three-dimensional sample facial image and the sample image may be divided by 255 and then reduced by 0.5, so that the pixel value of each pixel lies in [-0.5, 0.5]. Then, the target three-dimensional sample facial image may be input into the initial expression driving model, the initial expression driving model may output an expression prediction image, the sample image may be compared with the expression prediction image to determine the difference between them, and a loss function value may be determined according to the difference. According to the loss function value, the initial expression driving model may be trained by gradient back-propagation so as to minimize the loss function value.
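A minimal sketch of this normalization, assuming 8-bit input images; the function name is illustrative:

```python
import numpy as np

def normalize_image(img_uint8: np.ndarray) -> np.ndarray:
    """Scale pixels from [0, 255] to [-0.5, 0.5] before training."""
    return img_uint8.astype(np.float32) / 255.0 - 0.5
```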
Step 410, inputting the rendered three-dimensional face image into an expression driving model to perform expression driving on the face in the target image.
It should be noted that the execution processes of steps 401 to 404 and step 410 may be implemented by any one of the embodiments of the present disclosure, and the embodiments of the present disclosure do not limit this, and are not described again.
In summary, multiple frames of sample images with expressions are acquired; for each frame of sample image, the sample image is input into the encoding layer of the three-dimensional expression model to obtain the sample facial attributes corresponding to the sample image, where the sample facial attributes include at least one of a sample facial expression, a sample facial shape, a sample facial pose, and sample facial illumination; the sample facial expression, the sample facial shape, the sample facial pose, and the sample facial illumination are input into the decoding layer of the three-dimensional expression model to perform three-dimensional facial reconstruction on the face in the sample image and obtain a reconstructed three-dimensional sample facial image; three-dimensional face rendering is performed on the reconstructed three-dimensional sample facial image to obtain a rendered three-dimensional sample facial image; and the initial expression driving model is trained according to the rendered three-dimensional sample facial image and the sample image to generate the expression driving model. In this way, the expression driving model can perform expression driving on the rendered three-dimensional facial image to obtain a more realistic face driving image.
In order to more clearly illustrate the above embodiments, the description will now be made by way of example.
For example, as shown in fig. 5, source may denote the source image with an expression, target may denote the target image without an expression, and 3DMM may denote the three-dimensional expression model. The source image and the target image may be respectively input into the encoding layer of the 3DMM to obtain the shape (facial shape), pose (facial pose), light (facial illumination), and exp (facial expression) corresponding to the source image, and the shape, pose, light, and exp corresponding to the target image. Then, the exp of the target image is replaced by the exp of the source image, so that the replaced facial attributes corresponding to the target image include the replaced exp together with the original shape, pose, and light retained in the target image. Furthermore, the replaced facial attributes corresponding to the target image may be input into the decoding layer of the 3DMM model for three-dimensional facial reconstruction and rendering to obtain a rendered three-dimensional facial image. Finally, the rendered three-dimensional facial image may be input into a translator model (the expression driving model), which outputs the expression-driven image corresponding to the target image.
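Putting the pieces together, the flow of fig. 5 could be sketched as follows, reusing the hypothetical components from the earlier snippets (`Encoder3DMM`, `replace_expression`, `reconstruct_and_render`) plus a `translator` callable standing in for the trained expression driving model; none of these names are defined by the present disclosure.

```python
def drive_expression(encoder, decoder, renderer, translator,
                     source_image, target_image):
    src_attrs = encoder(source_image)    # the expression comes from here
    tgt_attrs = encoder(target_image)    # shape, pose, light are kept
    driven = replace_expression(src_attrs, tgt_attrs)
    rendered = reconstruct_and_render(decoder, renderer, driven)
    return translator(rendered)          # expression-driven target image
```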
According to the expression driving method, a source image with an expression and a target image without an expression are acquired; the source image and the target image are respectively input into a three-dimensional expression model to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; corresponding facial attributes in the second facial attributes are replaced with at least part of the first facial attributes to obtain the replaced second facial attributes; three-dimensional facial reconstruction and rendering are performed on the face in the target image according to the plurality of replaced second facial attributes to obtain a rendered three-dimensional facial image; and the rendered three-dimensional facial image is input into an expression driving model to perform expression driving on the face in the target image. In this way, the facial expression and the facial pose in the source image and the target image can be decoupled, so that the facial expression and the facial pose of the target image can be controlled independently, which better supports more diverse expression driving.
In order to realize the embodiment, the present disclosure further provides an expression driving apparatus.
Fig. 6 is a schematic diagram according to a fifth embodiment of the present disclosure, and as shown in fig. 6, an expression driving apparatus 600 includes: a first acquisition module 610, a second acquisition module 620, a replacement module 630, a processing module 640, and a driving module 650.
The first obtaining module 610 is configured to obtain a source image with an expression and a target image without the expression; a second obtaining module 620, configured to input the source image and the target image into the three-dimensional expression model respectively, so as to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; a replacing module 630, configured to replace, by at least part of the plurality of first facial attributes, corresponding facial attributes in the plurality of second facial attributes to obtain a plurality of second facial attributes after replacement processing; the processing module 640 is configured to perform three-dimensional face reconstruction and rendering on a face in the target image according to the plurality of second face attributes after the replacement processing, so as to obtain a rendered three-dimensional face image; and the driving module 650 is configured to input the rendered three-dimensional facial image into an expression driving model, so as to perform expression driving on the face in the target image.
As a possible implementation manner of the embodiment of the present disclosure, the replacing module 630 is specifically configured to: performing replacement processing on the facial expression in the second facial attribute according to the facial expression in the plurality of first facial attributes; the facial expression after the replacement processing in the second facial attribute and the facial pose, the facial shape and the facial illumination retained by the replacement processing in the second facial attribute are used as a plurality of second facial attributes after the replacement processing.
As a possible implementation manner of the embodiment of the present disclosure, the processing module 640 is specifically configured to: according to the plurality of second face attribute coefficients after the replacement processing, performing three-dimensional face reconstruction on the face in the target image to obtain a reconstructed three-dimensional face image; and performing three-dimensional face rendering on the reconstructed three-dimensional face image to obtain a rendered three-dimensional face image.
As a possible implementation manner of the embodiment of the present disclosure, the three-dimensional expression model includes a coding layer and a decoding layer; the encoding layer is used for respectively inputting a source image and a target image into the three-dimensional expression model so as to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; and the decoding layer is used for carrying out three-dimensional face reconstruction on the face in the target image according to the plurality of replaced second face attributes to obtain a reconstructed three-dimensional face image.
As a possible implementation manner of the embodiment of the present disclosure, the expression driving apparatus 600 further includes: the device comprises a third acquisition module, a fourth acquisition module, a reconstruction module, a rendering module and a training module.
The third acquisition module is used for acquiring multiple frames of sample images with expressions; the fourth acquisition module is used for inputting the sample image into the encoding layer of the three-dimensional expression model for each frame of sample image, so as to acquire the sample facial attributes corresponding to the sample image, where the sample facial attributes include at least one of a sample facial expression, a sample facial shape, a sample facial pose, and sample facial illumination; the reconstruction module is used for inputting the sample facial expression, the sample facial shape, the sample facial pose, and the sample facial illumination into the decoding layer of the three-dimensional expression model, so as to perform three-dimensional facial reconstruction on the face in the sample image and obtain a reconstructed three-dimensional sample facial image; the rendering module is used for performing three-dimensional face rendering on the reconstructed three-dimensional sample facial image to obtain a rendered three-dimensional sample facial image; and the training module is used for training the initial expression driving model according to the rendered three-dimensional sample facial image and the sample image so as to generate the expression driving model.
As a possible implementation manner of the embodiment of the present disclosure, the training module is specifically configured to: inputting the rendered three-dimensional sample facial image into an initial expression driving model to obtain an expression predicted image; determining a loss function value according to the difference between the sample image and the expression predicted image; and training the initial expression driving model according to the loss function value so as to minimize the loss function value.
As a possible implementation manner of the embodiment of the present disclosure, the training module is specifically configured to: perform image normalization processing on the rendered three-dimensional sample facial image and the sample image to obtain a target three-dimensional sample facial image; input the target three-dimensional sample facial image into the initial expression driving model to obtain an expression prediction image; determine a loss function value according to the difference between the sample image and the expression prediction image; and train the initial expression driving model according to the loss function value so as to minimize the loss function value.
The expression driving device of the embodiment of the disclosure acquires a source image with an expression and a target image without an expression; inputs the source image and the target image into a three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image; replaces corresponding facial attributes in the second facial attributes with at least part of the first facial attributes to obtain the replaced second facial attributes; performs three-dimensional facial reconstruction and rendering on the face in the target image according to the plurality of replaced second facial attributes to obtain a rendered three-dimensional facial image; and inputs the rendered three-dimensional facial image into an expression driving model to perform expression driving on the face in the target image. In this way, the facial expression and the facial pose in the source image and the target image can be decoupled, so that the facial expression and the facial pose of the target image can be controlled independently, which better supports more diverse expression driving.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all carried out on the premise of obtaining the consent of the user, and all accord with the regulation of related laws and regulations without violating the good custom of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 701 executes the respective methods and processes described above, such as the expression driving method. For example, in some embodiments, the expression driver method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the expression driving method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the expression driving method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (13)

1. An expression driving method comprising:
acquiring a source image with an expression and a target image without the expression;
inputting the source image and the target image into a three-dimensional expression model respectively to obtain a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image;
replacing corresponding face attributes in the second face attributes with at least part of face attributes in the first face attributes to obtain replaced second face attributes;
according to the plurality of second facial attributes after the replacement processing, performing three-dimensional facial reconstruction and rendering on the face in the target image to obtain a rendered three-dimensional facial image;
inputting the rendered three-dimensional facial image into an expression driving model so as to perform expression driving on the face in the target image;
before the inputting the rendered three-dimensional facial image into an expression driving model, the method further includes:
acquiring a plurality of frames of sample images with expressions;
inputting the sample image into a coding layer of the three-dimensional expression model aiming at each frame of the sample image so as to obtain a sample face attribute corresponding to the sample image; wherein the sample facial attributes comprise: at least one of a sample facial expression, a sample facial shape, a sample facial pose, and a sample facial illumination;
inputting the sample facial expression, the sample facial shape, the sample facial posture and the sample facial illumination into a decoding layer of the three-dimensional expression model so as to carry out three-dimensional facial reconstruction on the face in the sample image and obtain a reconstructed three-dimensional sample facial image;
performing three-dimensional face rendering on the reconstructed three-dimensional sample face image to obtain a rendered three-dimensional sample face image;
training an initial expression driving model according to the rendered three-dimensional sample facial image and the sample image to generate the expression driving model;
the training of an initial expression driving model according to the rendered three-dimensional sample facial image and the sample image to generate the expression driving model comprises:
inputting the rendered three-dimensional sample facial image into an initial expression driving model to obtain an expression predicted image;
determining a loss function value according to the difference between the sample image and the expression predicted image;
training the initial expression driving model according to the loss function value so as to minimize the loss function value;
wherein the loss function value comprises a first sub-loss function value and a second sub-loss function value, and the determining of the loss function value according to the difference between the sample image and the expression predicted image comprises:
determining the first sub-loss function value according to the absolute value of the difference between the sample image and the expression predicted image;
and inputting the sample image and the expression predicted image into a trained eye view image generator to generate a semantic vector corresponding to the sample image and a semantic vector corresponding to the expression predicted image, and determining the second sub-loss function value according to the absolute value of the difference between the two semantic vectors.
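For orientation, the following Python sketch shows one plausible wiring of the inference pipeline and the two-part training loss recited in claim 1. Every name in it (drive_expression, expr_model, image_generator, the attribute keys, and so on) is an illustrative assumption; the patent discloses no source code or API.

```python
# Illustrative sketch only: the model objects and attribute keys below are
# assumptions made for readability, not names disclosed by the patent.
import torch

def drive_expression(source_img, target_img, expr_model, driver, render_fn):
    # Encode both images into facial attributes
    # (expression, shape, pose, illumination).
    first_attrs = expr_model.encode(source_img)    # from the source image
    second_attrs = expr_model.encode(target_img)   # from the target image

    # Replace only the target's expression coefficients with the source's;
    # the target's shape, pose and illumination are kept, which is what
    # decouples expression from pose.
    second_attrs["expression"] = first_attrs["expression"]

    # Reconstruct a 3D face from the replaced attributes, render it, and
    # let the expression driving model produce the final driven frame.
    mesh = expr_model.decode(second_attrs)
    rendered = render_fn(mesh)
    return driver(rendered)

def training_loss(sample_img, predicted_img, image_generator):
    # First sub-loss: mean absolute (L1) pixel difference between the
    # sample frame and the predicted frame.
    first_loss = (sample_img - predicted_img).abs().mean()

    # Second sub-loss: L1 distance between the semantic vectors produced
    # by a trained image generator for the two images.
    sem_sample = image_generator(sample_img)
    sem_pred = image_generator(predicted_img)
    second_loss = (sem_sample - sem_pred).abs().mean()

    return first_loss + second_loss
```

How the two sub-loss values are combined is not specified in the claim, so the equal-weight sum above is an assumption.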
2. The method of claim 1, wherein the replacing corresponding facial attributes in the plurality of second facial attributes with at least part of the facial attributes in the plurality of first facial attributes to obtain the plurality of second facial attributes after replacement processing comprises:
performing replacement processing on the facial expression in the plurality of second facial attributes according to the facial expression in the plurality of first facial attributes;
and taking the replaced facial expression, together with the facial pose, the facial shape, and the facial illumination that remain unchanged by the replacement processing in the plurality of second facial attributes, as the plurality of second facial attributes after the replacement processing.
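A minimal sketch of this replacement step, assuming the facial attributes are carried in a dictionary keyed by attribute name (a data structure the claims do not prescribe):

```python
def replace_expression(first_attrs: dict, second_attrs: dict) -> dict:
    # Copy the target's attributes so pose, shape and illumination
    # survive untouched, then swap in only the source's expression.
    replaced = dict(second_attrs)
    replaced["expression"] = first_attrs["expression"]
    return replaced
```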
3. The method of claim 1, wherein the performing three-dimensional facial reconstruction and rendering of the face in the target image according to the plurality of second facial attributes after the replacement processing to obtain a rendered three-dimensional facial image comprises:
performing three-dimensional facial reconstruction on the face in the target image according to the plurality of second facial attributes after the replacement processing, to obtain a reconstructed three-dimensional facial image;
and performing three-dimensional face rendering on the reconstructed three-dimensional facial image to obtain the rendered three-dimensional facial image.
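If the decoding layer follows a conventional 3DMM-style linear face model, the reconstruction-then-rendering step of claim 3 might look like the sketch below; the basis matrices, the rotation-translation pose representation, and render_fn are all assumptions rather than details taken from the claims.

```python
import numpy as np

def reconstruct_and_render(attrs, mean_shape, shape_basis, expr_basis, render_fn):
    # Linear 3DMM-style reconstruction: mean face plus shape and
    # expression offsets, reshaped to an (N, 3) array of vertices.
    verts = np.asarray(
        mean_shape
        + shape_basis @ attrs["shape"]
        + expr_basis @ attrs["expression"]
    ).reshape(-1, 3)

    # Apply the target's (retained) rigid head pose.
    R, t = attrs["pose"]          # rotation (3, 3) and translation (3,)
    verts = verts @ R.T + t

    # Rasterize under the target's retained illumination coefficients.
    return render_fn(verts, lighting=attrs["illumination"])
```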
4. The method of claim 3, wherein the three-dimensional expression model comprises an encoding layer and a decoding layer;
the encoding layer is configured to encode the source image and the target image respectively, so as to obtain the plurality of first facial attributes corresponding to the source image and the plurality of second facial attributes corresponding to the target image;
and the decoding layer is configured to perform three-dimensional facial reconstruction on the face in the target image according to the plurality of second facial attributes after the replacement processing, to obtain the reconstructed three-dimensional facial image.
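A minimal PyTorch-style skeleton of the encoding-layer/decoding-layer split of claim 4 is sketched below. The backbone, feature width, and per-attribute head dimensions are placeholder choices made only so the sketch runs; the decoding layer (omitted) would consume these coefficients as in the 3DMM sketch after claim 3.

```python
import torch
import torch.nn as nn

class ThreeDExpressionModel(nn.Module):
    """Skeleton only: layer sizes and backbone are assumed, not disclosed."""

    def __init__(self, feat_dim=512, attr_dims=(64, 80, 6, 27)):
        super().__init__()
        # Encoding layer: image -> shared feature vector.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # One regression head per facial attribute.
        names = ("expression", "shape", "pose", "illumination")
        self.heads = nn.ModuleDict(
            {n: nn.Linear(feat_dim, d) for n, d in zip(names, attr_dims)}
        )

    def encode(self, image: torch.Tensor) -> dict:
        feat = self.backbone(image)
        return {name: head(feat) for name, head in self.heads.items()}
```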
5. The method of claim 1, wherein the training of the initial expression driving model according to the rendered three-dimensional sample facial image and the sample image to generate the expression driving model comprises:
performing image normalization processing on the rendered three-dimensional sample facial image and the sample image to obtain a target three-dimensional sample facial image;
inputting the target three-dimensional sample facial image into an initial expression driving model to obtain an expression predicted image;
determining a loss function value according to the difference between the sample image and the expression predicted image;
and training the initial expression driving model according to the loss function value so as to minimize the loss function value.
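Claim 5 does not pin the normalization to particular statistics; a common convention, assumed here purely for illustration, is to map pixel values from [0, 1] into [-1, 1] before the images reach the driving model:

```python
import torch

def normalize_pair(rendered: torch.Tensor, sample: torch.Tensor,
                   mean: float = 0.5, std: float = 0.5):
    # Placeholder statistics: map [0, 1] pixels to roughly [-1, 1].
    # The normalization actually used by the patent is not disclosed.
    return (rendered - mean) / std, (sample - mean) / std
```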
6. An expression driving apparatus comprising:
the first acquisition module is used for acquiring a source image with an expression and a target image without the expression;
the second acquisition module is used for respectively inputting the source image and the target image into the three-dimensional expression model so as to acquire a plurality of first facial attributes corresponding to the source image and a plurality of second facial attributes corresponding to the target image;
a replacing module, configured to replace corresponding facial attributes in the plurality of second facial attributes with at least part of the facial attributes in the plurality of first facial attributes, so as to obtain a plurality of second facial attributes after replacement processing;
the processing module is used for performing three-dimensional facial reconstruction and rendering on the face in the target image according to the plurality of second facial attributes after the replacement processing, to obtain a rendered three-dimensional facial image;
the driving module is used for inputting the rendered three-dimensional facial image into an expression driving model, so as to perform expression driving on the face in the target image;
the device further comprises:
the third acquisition module is used for acquiring a plurality of frames of sample images with expressions;
a fourth obtaining module, configured to, for each frame of the sample image, input the sample image into a coding layer of the three-dimensional expression model to obtain a sample facial attribute corresponding to the sample image; wherein the sample facial attributes comprise: at least one of a sample facial expression, a sample facial shape, a sample facial pose, and a sample facial illumination;
the reconstruction module is used for inputting the sample facial expression, the sample facial shape, the sample facial pose and the sample facial illumination into a decoding layer of the three-dimensional expression model, so as to perform three-dimensional facial reconstruction on the face in the sample image and obtain a reconstructed three-dimensional sample facial image;
the rendering module is used for performing three-dimensional face rendering on the reconstructed three-dimensional sample face image to obtain a rendered three-dimensional sample face image;
the training module is used for training an initial expression driving model according to the rendered three-dimensional sample facial image and the sample image so as to generate the expression driving model;
the training module is specifically configured to:
inputting the rendered three-dimensional sample facial image into an initial expression driving model to obtain an expression predicted image;
determining a loss function value according to the difference between the sample image and the expression predicted image;
training the initial expression driving model according to the loss function value so as to minimize the loss function value;
wherein the loss function value comprises a first sub-loss function value and a second sub-loss function value, and the determining of the loss function value according to the difference between the sample image and the expression predicted image comprises:
determining the first sub-loss function value according to the absolute value of the difference between the sample image and the expression predicted image;
and inputting the sample image and the expression predicted image into a trained eye view image generator to generate a semantic vector corresponding to the sample image and a semantic vector corresponding to the expression predicted image, and determining the second sub-loss function value according to the absolute value of the difference between the two semantic vectors.
7. The apparatus according to claim 6, wherein the replacement module is specifically configured to:
performing replacement processing on the facial expression in the plurality of second facial attributes according to the facial expression in the plurality of first facial attributes;
and taking the replaced facial expression, together with the facial pose, the facial shape, and the facial illumination that remain unchanged by the replacement processing in the plurality of second facial attributes, as the plurality of second facial attributes after the replacement processing.
8. The apparatus according to claim 6, wherein the processing module is specifically configured to:
performing three-dimensional facial reconstruction on the face in the target image according to the plurality of second facial attributes after the replacement processing, to obtain a reconstructed three-dimensional facial image;
and performing three-dimensional face rendering on the reconstructed three-dimensional facial image to obtain the rendered three-dimensional facial image.
9. The apparatus of claim 8, wherein the three-dimensional expression model comprises an encoding layer and a decoding layer;
the encoding layer is configured to encode the source image and the target image respectively, so as to obtain the plurality of first facial attributes corresponding to the source image and the plurality of second facial attributes corresponding to the target image;
and the decoding layer is configured to perform three-dimensional facial reconstruction on the face in the target image according to the plurality of second facial attributes after the replacement processing, to obtain the reconstructed three-dimensional facial image.
10. The apparatus of claim 6, wherein the training module is specifically configured to:
performing image normalization processing on the rendered three-dimensional sample facial image and the sample image to obtain a target three-dimensional sample facial image;
inputting the target three-dimensional sample facial image into an initial expression driving model to obtain an expression predicted image;
determining a loss function value according to the difference between the sample image and the expression predicted image;
and training the initial expression driving model according to the loss function value so as to minimize the loss function value.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1-5.
CN202111117185.0A 2021-09-23 2021-09-23 Expression driving method and device, electronic equipment and storage medium Active CN113870399B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111117185.0A CN113870399B (en) 2021-09-23 2021-09-23 Expression driving method and device, electronic equipment and storage medium
PCT/CN2022/088311 WO2023045317A1 (en) 2021-09-23 2022-04-21 Expression driving method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111117185.0A CN113870399B (en) 2021-09-23 2021-09-23 Expression driving method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113870399A (en) 2021-12-31
CN113870399B (en) 2022-12-02

Family

ID=78993646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111117185.0A Active CN113870399B (en) 2021-09-23 2021-09-23 Expression driving method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113870399B (en)
WO (1) WO2023045317A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870399B (en) * 2021-09-23 2022-12-02 北京百度网讯科技有限公司 Expression driving method and device, electronic equipment and storage medium
CN115984947B (en) * 2023-02-21 2023-06-27 北京百度网讯科技有限公司 Image generation method, training device, electronic equipment and storage medium
CN117115317A (en) * 2023-08-10 2023-11-24 北京百度网讯科技有限公司 Avatar driving and model training method, apparatus, device and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298917A * 2019-07-05 2019-10-01 北京华捷艾米科技有限公司 Facial reconstruction method and system
CN110941332A (en) * 2019-11-06 2020-03-31 北京百度网讯科技有限公司 Expression driving method and device, electronic equipment and storage medium
CN111599002A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and apparatus for generating image
CN111968203A (en) * 2020-06-30 2020-11-20 北京百度网讯科技有限公司 Animation driving method, animation driving device, electronic device, and storage medium
CN112215050A (en) * 2019-06-24 2021-01-12 北京眼神智能科技有限公司 Nonlinear 3DMM face reconstruction and posture normalization method, device, medium and equipment
WO2021012590A1 (en) * 2019-07-22 2021-01-28 广州华多网络科技有限公司 Facial expression shift method, apparatus, storage medium, and computer device
CN112907725A (en) * 2021-01-22 2021-06-04 北京达佳互联信息技术有限公司 Image generation method, image processing model training method, image processing device, and image processing program
US11055514B1 (en) * 2018-12-14 2021-07-06 Snap Inc. Image face manipulation
CN113221847A (en) * 2021-06-07 2021-08-06 广州虎牙科技有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113313085A (en) * 2021-07-28 2021-08-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113344777A (en) * 2021-08-02 2021-09-03 中国科学院自动化研究所 Face changing and replaying method and device based on three-dimensional face decomposition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101944238B * 2010-09-27 2011-11-23 浙江大学 Data-driven facial expression synthesis method based on Laplace transformation
US20200020173A1 (en) * 2018-07-16 2020-01-16 Zohirul Sharif Methods and systems for constructing an animated 3d facial model from a 2d facial image
GB2586260B (en) * 2019-08-15 2021-09-15 Huawei Tech Co Ltd Facial image processing
CN110868598B * 2019-10-17 2021-06-22 上海交通大学 Video content replacement method and system based on generative adversarial network
CN113327278B (en) * 2021-06-17 2024-01-09 北京百度网讯科技有限公司 Three-dimensional face reconstruction method, device, equipment and storage medium
CN113870399B (en) * 2021-09-23 2022-12-02 北京百度网讯科技有限公司 Expression driving method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Real-time facial expression transfer method combining 3DMM and GAN; 高翔 (Gao Xiang) et al.; Computer Applications and Software (《计算机应用与软件》); 2020-04-12 (Issue 04); pp. 119-126 *

Also Published As

Publication number Publication date
WO2023045317A1 (en) 2023-03-30
CN113870399A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN113870399B (en) Expression driving method and device, electronic equipment and storage medium
CN113643412A (en) Virtual image generation method and device, electronic equipment and storage medium
CN112562069B (en) Method, device, equipment and storage medium for constructing three-dimensional model
EP3876204A2 (en) Method and apparatus for generating human body three-dimensional model, device and storage medium
CN113052962B (en) Model training method, information output method, device, equipment and storage medium
CN113658309A (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN113963110A (en) Texture map generation method and device, electronic equipment and storage medium
CN114549710A (en) Virtual image generation method and device, electronic equipment and storage medium
CN112989970A (en) Document layout analysis method and device, electronic equipment and readable storage medium
CN113365146B (en) Method, apparatus, device, medium and article of manufacture for processing video
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115147265A (en) Virtual image generation method and device, electronic equipment and storage medium
CN114937478B (en) Method for training a model, method and apparatus for generating molecules
CN114549728A (en) Training method of image processing model, image processing method, device and medium
CN112528995A (en) Method for training target detection model, target detection method and device
CN113379877A (en) Face video generation method and device, electronic equipment and storage medium
CN113962845B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113380269B (en) Video image generation method, apparatus, device, medium, and computer program product
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN115359166B (en) Image generation method and device, electronic equipment and medium
CN115393488B (en) Method and device for driving virtual character expression, electronic equipment and storage medium
CN115147547B (en) Human body reconstruction method and device
CN113421335B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113240780B (en) Method and device for generating animation
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant