CN113744129A - Semantic neural rendering-based face image generation method and system - Google Patents
- Publication number
- CN113744129A (application number CN202111050013.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- face image
- deformation
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T3/00—Geometric image transformations in the plane of the image
        - G06T3/04—Context-preserving transformations, e.g. by using an importance map
      - G06T7/00—Image analysis
        - G06T7/20—Analysis of motion
          - G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
      - G06T2207/00—Indexing scheme for image analysis or image enhancement
        - G06T2207/10—Image acquisition modality
          - G06T2207/10016—Video; Image sequence
Abstract
A face image generation method based on semantic neural rendering comprises the following steps: S1, a mapping network generates a hidden vector from a target face motion descriptor; S2, under the guidance of the hidden vector, a deformation network estimates the accurate deformation between a source face image and the desired target image, and warps the source face image with the estimated deformation parameters to generate a coarse deformed image; and S3, an editing network generates a final fine image from the coarse deformed image. The method can generate images with more accurate motion, producing more realistic results and accurate movement while still retaining the identity information of the source face image. It can generate not only realistic images with the correct global pose, but also vivid micro-expressions, such as pouting and raising eyebrows. In addition, motion-irrelevant information in the source face image is well preserved.
Description
Technical Field
The invention relates to face image generation and neural rendering, in particular to a face image generation method and a face image generation system based on semantic neural rendering.
Background
A face image is one of the most important photographic contents, widely used in daily life. Editing a portrait image by modifying the pose and expression of a given face is an important task with a variety of application scenarios. However, achieving such editing is extremely challenging, as it requires automatically perceiving the true 3D geometry of any given face. At the same time, the acuity of the human visual system toward portrait images requires the algorithm to generate realistic faces and backgrounds, which makes the task even more difficult.
To achieve intuitive control, the motion descriptors should be semantically meaningful, which requires representing facial expressions, head rotations, and translations as completely decoupled variables. Parametric face modeling methods provide a powerful tool for describing 3D faces with semantic parameters, allowing the shape, expression, and other characteristics of a 3D face to be controlled through parameters. With the priors provided by these techniques, one can hope to control the generation of realistic face images in a manner similar to a graphics rendering process. Currently, some model-based methods combine rendered images of a three-dimensional morphable face model (3DMM) and edit portrait images by modifying expression or pose parameters. These methods achieve impressive results, but they are person-specific, which means they cannot be applied to arbitrary portraits.
In a 3DMM, the 3D shape S of a face is parameterized as S = S̄ + B_id α + B_exp β, where S̄ is the average 3D face shape, and B_id and B_exp are the identity and expression bases obtained by scanning 200 faces and performing principal component analysis. The parameters α and β are 80-dimensional and 64-dimensional, respectively, and describe the identity and expression features of the face. The rotation and translation of the face are expressed as R ∈ SO(3) and t ∈ R³. Thus, the motion information of a face can be completely expressed by (β, R, t) in the 3DMM.
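The linear 3DMM shape model above can be sketched in a few lines of numpy. The vertex count and random bases here are toy placeholders (real models use tens of thousands of vertices and PCA bases learned from scans); only the 80/64-dimensional coefficient sizes come from the description.

```python
import numpy as np

# Illustrative dimensions: 80-dim identity and 64-dim expression coefficients
# (from the description); the tiny vertex count is a placeholder.
N_VERTS = 5
DIM_ID, DIM_EXP = 80, 64

rng = np.random.default_rng(0)
S_mean = rng.normal(size=(3 * N_VERTS,))        # average face shape S̄, flattened (x,y,z per vertex)
B_id = rng.normal(size=(3 * N_VERTS, DIM_ID))   # identity basis (PCA in a real model)
B_exp = rng.normal(size=(3 * N_VERTS, DIM_EXP)) # expression basis (PCA in a real model)

def face_shape(alpha, beta):
    """Standard 3DMM shape: S = S̄ + B_id @ alpha + B_exp @ beta."""
    return S_mean + B_id @ alpha + B_exp @ beta

# Zero coefficients recover the mean face exactly.
assert np.allclose(face_shape(np.zeros(DIM_ID), np.zeros(DIM_EXP)), S_mean)
```

A target motion descriptor p = (β, R, t) then carries only the expression coefficients plus rigid head pose, leaving identity α untouched.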
Disclosure of Invention
The invention provides a face image generation method and system based on semantic neural rendering, which can generate an image with more accurate action.
The technical scheme of the invention is as follows:
according to one aspect of the invention, a face image generation method based on semantic neural rendering is provided, which comprises the following steps: s1, a mapping network generates a hidden vector from a target face motion descriptor; s2, under the guidance of the hidden vector, a deformation network estimates accurate deformation between a source face image and a required target image, and deforms the source face image by using the estimated deformation parameters to generate a rough deformed image; and S3, generating a final fine image from the roughly deformed image by the editing network.
Preferably, in the above face image generation method based on semantic neural rendering, in step S1, the target face motion descriptor includes the expression, rotation, and translation information of the target face; after the target face motion descriptor is obtained, the mapping network generates a hidden vector from it.
Preferably, in the above method for generating a face image based on semantic neural rendering, in step S2, under the guidance of the hidden vector z, the deformation network estimates an accurate deformation between the source face image and the desired target image to obtain an optical flow field, and deforms the source face image by using the estimated optical flow field to generate a rough deformed image.
Preferably, in the above method for generating a face image based on semantic neural rendering, in step S3, the editing network receives the coarse deformed image obtained in the previous step, and combines the source face image and the hidden vector to obtain a final fine image.
According to another aspect of the invention, a face image generation system based on semantic neural rendering is provided, which comprises a mapping network, a deformation network and an editing network, wherein the mapping network is used for mapping an object motion descriptor to a hidden vector; the deformation network is used for estimating the accurate deformation between the source face image and the required target image under the guidance of the hidden vector, and deforming the source face image by using the estimated deformation parameters to generate a rough deformed image; and an editing network for generating a clear image with rich details by editing the coarse morphed image, and generating a final fine image from the coarse morphed image.
According to the technical scheme of the invention, the beneficial effects are as follows:
the semantic neural rendering-based face image generation method and system can generate images with more accurate actions, can generate more vivid results and accurate movement, and simultaneously still retain the identity information of the source face image. Not only can a realistic image with the correct global pose be generated, but also vivid micro-presentations, such as pounding mouth and raising eyebrows, can be generated. In addition, information in irrelevant source face images is well preserved.
For a better understanding and appreciation of the concepts, principles of operation, and effects of the invention, reference will now be made in detail to the following examples, taken in conjunction with the accompanying drawings, in which:
drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below.
FIG. 1 is a flow chart of a semantic neural rendering based face image generation method of the present invention;
FIG. 2 is a network overall frame diagram of the semantic neural rendering-based face image generation method of the present invention;
FIG. 3 is a qualitative comparison graph of the present invention and other algorithms on the task of intuitive face image control;
fig. 4 is an effect diagram of the indirect human face image editing task according to the present invention.
Detailed Description
In order to make the objects, technical means and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific examples. These examples are merely illustrative and not restrictive of the invention.
A face image generation method and system based on semantic neural rendering relate to a novel neural rendering model: given a source face image and target 3DMM parameters, the model can generate a realistic result with accurate target motion. The proposed system model can be divided into three parts: a mapping network, a deformation network, and an editing network. The mapping network generates a hidden vector from the motion descriptor. Under the guidance of the hidden vector, the deformation network estimates the accurate deformation between the source face image and the desired target image, and warps the source face image with the estimated deformation parameters to generate a coarse result. Finally, the editing network generates the final fine image from the coarse image.
Fig. 1 is a flowchart of a semantic-neural-rendering-based face image generation method of the present invention, and fig. 2 is an overall framework diagram of a semantic-neural-rendering-based face image generation system of the present invention, which is described with reference to fig. 1 and fig. 2, and includes the following steps:
S1. The mapping network generates a hidden vector from the target face motion descriptor (as shown in FIG. 2). In this step, the target face motion descriptor p includes the expression, rotation, and translation information of the target face. After the target face motion descriptor p is obtained, the mapping network generates a hidden vector z from p.
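The patent does not specify the mapping network's architecture, so the following is only a minimal sketch: a two-layer MLP mapping the motion descriptor p = (β, R, t) to a hidden vector z. All dimensions (70-dim descriptor, 128 hidden units, 256-dim z) and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: p packs a 64-dim expression beta, a 3-dim rotation,
# and a 3-dim translation -> 70 dims; z is chosen as 256-dim.
P_DIM, HIDDEN, Z_DIM = 70, 128, 256

W1 = rng.normal(scale=0.1, size=(HIDDEN, P_DIM)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(Z_DIM, HIDDEN)); b2 = np.zeros(Z_DIM)

def mapping_network(p):
    """Toy two-layer MLP: motion descriptor p -> hidden vector z."""
    h = np.maximum(W1 @ p + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

p = rng.normal(size=P_DIM)  # a random stand-in for (beta, R, t)
z = mapping_network(p)
assert z.shape == (Z_DIM,)
```

In the full system, z is then injected into the deformation and editing networks as conditioning information.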
S2. Under the guidance of the hidden vector, the deformation network estimates the accurate deformation between the source face image and the desired target image, and warps the source face image with the estimated deformation parameters to generate a coarse deformed image. In this step, guided by the hidden vector z, the deformation network estimates the accurate deformation between the source face image I_s and the desired target image to obtain an optical flow field w, and warps I_s using the estimated w to generate the coarse deformed image.
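The warping step above — sampling the source image through an estimated flow field — can be sketched with plain numpy bilinear sampling. This is a generic backward-warp, not the patent's specific deformation network; the flow convention (per-pixel (dx, dy) offsets into the source) is an assumption.

```python
import numpy as np

def warp_with_flow(img, flow):
    """Backward-warp a grayscale image img (H, W) by a flow field (H, W, 2):
    output pixel (y, x) is bilinearly sampled from source (y + dy, x + dx),
    with out-of-range coordinates clamped to the image border."""
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(16, dtype=float).reshape(4, 4)
# A flow of +1 in x: every output pixel reads one column to its right.
flow = np.zeros((4, 4, 2)); flow[..., 0] = 1.0
out = warp_with_flow(img, flow)
assert np.allclose(out[:, :3], img[:, 1:])
```

In the actual system the flow w is predicted by the deformation network conditioned on z; the sampling itself is differentiable, which is what lets the flow be learned end-to-end.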
S3. The editing network generates the final fine image from the coarse deformed image. In this step, the editing network receives the coarse deformed image obtained in the previous step and combines it with the source face image I_s and the hidden vector z to obtain the final fine image, i.e., the generated image in FIG. 2.
FIG. 2 is the overall network framework diagram of the semantic neural rendering-based face image generation method of the present invention. Given a source face image (the source image I_s in FIG. 2) and a target face motion descriptor, the output of the model is a face image with accurate target motion that retains the other information of the source face image, such as identity, lighting, and background. As shown in FIG. 2, the semantic neural rendering-based face image generation system model of the present invention can be divided into three parts: a mapping network, a deformation network, and an editing network. First, the target motion descriptor is mapped to a hidden vector; then a coarse image is generated by the deformation network; finally, the editing network is responsible for generating a sharp image with rich details by editing the coarse result (i.e., the generated image).
The invention also provides a face image generation system based on semantic neural rendering, which comprises a mapping network, a deformation network and an editing network, wherein the mapping network is used for mapping the target motion descriptor to the hidden vector; the deformation network is used for estimating the accurate deformation between the source face image and the required target image under the guidance of the hidden vector, and deforming the source face image by using the estimated deformation parameters to generate a rough deformed image; and an editing network for generating a clear image with rich details by editing the coarse morphed image, and generating a final fine image from the coarse morphed image.
Fig. 3 shows a qualitative comparison of the present invention (labeled "the present model" in FIG. 3) with other algorithms on the task of intuitive face image control. It can be seen that the compared StyleRig model produces impressive results with realistic details. However, it tends to generate images with a conservative strategy: face motions far from the distribution center are attenuated or ignored in exchange for better image quality. Meanwhile, some factors unrelated to the facial motion (such as glasses and clothing) are changed during the modification process. Although the proposed system was not trained on the FFHQ dataset, it still achieves impressive results when tested on it. The system model of the present invention can generate not only realistic images with the correct global pose, but also vivid micro-expressions, such as pouting and raising eyebrows. In addition, motion-irrelevant information in the source face image is well preserved.
Compared with existing face image generation methods, the method provided by the invention has two advantages: better generation quality and higher accuracy of the face motion. The two concepts of generation quality and face motion accuracy, and their related evaluation metrics, are explained below:
the quality of generation: and measuring whether the generated face image has higher image quality. On the evaluation index, the evaluation is divided into objective evaluation and subjective evaluation. Fraich perceptual distance is a commonly used objective assessment method of production quality. To calculate the Frey's perception distance of a face image generation model, a batch of face images is first generated using the model, and a batch of images is sampled from the data set for comparison. Then, the characteristics of the two batches of images are extracted, the statistical characteristics of the two batches of images are calculated, and the difference of distribution between the generated image and the real image is measured based on the statistical characteristics to serve as the evaluation of the quality of the generated image.
Face motion accuracy: and measuring whether the generated face image has the target face motion characteristics.
Specifically, the accuracy of the facial motion is measured by computing the average distance between the 3DMM expression parameters of the generated and target images (the average expression distance) and between their pose parameters (the average pose distance).
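These two metrics reduce to mean L2 distances over fitted 3DMM parameters. The sketch below assumes a 64-dim expression vector β and a 6-dim pose vector (rotation + translation); the exact pose encoding and the choice of L2 norm are illustrative assumptions, since the patent only names the metrics.

```python
import numpy as np

def motion_distances(gen_params, tgt_params):
    """Average expression distance (AED) and average pose distance (APD)
    between 3DMM parameters fitted to generated and target images.
    Each entry is a dict with 'beta' (64-dim expression) and 'pose'
    (assumed 6-dim rotation + translation)."""
    aed = np.mean([np.linalg.norm(g["beta"] - t["beta"])
                   for g, t in zip(gen_params, tgt_params)])
    apd = np.mean([np.linalg.norm(g["pose"] - t["pose"])
                   for g, t in zip(gen_params, tgt_params)])
    return aed, apd

gen = [{"beta": np.zeros(64), "pose": np.zeros(6)}]
tgt = [{"beta": np.ones(64), "pose": np.zeros(6)}]
aed, apd = motion_distances(gen, tgt)
assert np.isclose(aed, 8.0) and apd == 0.0  # ||ones(64)|| = sqrt(64) = 8
```

Lower AED/APD means the generated face reproduces the target expression and head pose more faithfully.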
Table 1 shows a quantitative comparison of the present invention with other algorithms on the task of intuitive face image control. As can be seen from Table 1, by using a style-based generative adversarial network (StyleGAN) model as its final generator, the StyleRig model is able to generate more realistic images, resulting in a lower Fréchet Inception Distance (FID) score. However, its higher average expression distance and average pose distance indicate that it may not faithfully reconstruct the target facial motion. Unlike the StyleRig model, the method and system model provided by the invention generate images with more accurate motion.
TABLE 1 quantitative comparison of the present invention to other algorithms on intuitive face image control task
| | Fréchet Inception Distance (FID) | Average expression distance | Average pose distance |
|---|---|---|---|
| StyleRig model | 47.37 | 0.316 | 0.0919 |
| Our model | 65.97 | 0.257 | 0.0252 |
Fig. 4 is an effect diagram of the indirect human face image editing task according to the present invention. It can be seen that the system model proposed by the present invention can generate more realistic results and accurate motion while still preserving the identity information of the source face image.
In summary, in order to realize controllable face image generation, the invention provides a novel neural rendering model. Given the source face image and the target 3DMM parameters, the model will produce a realistic result with accurate target motion. The proposed model can be divided into three parts: mapping networks, morphing networks, and editing networks. The mapping network generates hidden vectors from the motion descriptors. Under the guidance of the implicit vector, the deformation network estimates the accurate deformation between the source face image and the required target image, and deforms the source face image by using the estimated deformation parameters to generate a rough result. Finally, the editing network generates a final fine image from the coarse image.
Experiments demonstrate the superiority and versatility of the proposed model. They show that the model not only enables intuitive image control through user-specified facial motions, but also generates realistic results in the indirect portrait editing task (also called face reenactment), whose goal is to mimic another person's facial motions.
The foregoing description is of the preferred embodiment of the concepts and principles of operation in accordance with the invention. The above-described embodiments should not be construed as limiting the scope of the claims, and other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.
Claims (5)
1. A face image generation method based on semantic neural rendering is characterized by comprising the following steps:
s1, a mapping network generates a hidden vector from a target face motion descriptor;
s2, under the guidance of the hidden vector, a deformation network estimates accurate deformation between a source face image and a required target image, and deforms the source face image by using an estimated deformation parameter to generate a rough deformed image; and
S3, generating a final fine image from the rough deformed image by the editing network.
2. The method for generating a facial image based on semantic neural rendering of claim 1, wherein in step S1, the target facial motion descriptor comprises expression, rotation and transformation information of a target face, and after obtaining the target facial motion descriptor, the mapping network generates the hidden vector from the target facial motion descriptor.
3. The semantic neural rendering-based face image generation method according to claim 1, wherein in step S2, under the guidance of the implicit vector z, the deformation network estimates an accurate deformation between the source face image and the desired target image, obtains an optical flow field, and generates a rough deformed image by deforming the source face image using the estimated optical flow field.
4. The method for generating a facial image based on semantic neural rendering of claim 1, wherein in step S3, the editing network receives the coarse deformed image obtained in the previous step, and combines the source facial image and the hidden vector to obtain a final fine image.
5. A face image generation system based on semantic neural rendering is characterized by comprising a mapping network, a deformation network and an editing network, wherein,
a mapping network for mapping the target motion descriptor to the hidden vector;
the deformation network is used for estimating the accurate deformation between the source face image and the required target image under the guidance of the hidden vector, and deforming the source face image by using the estimated deformation parameters to generate a rough deformed image; and
and the editing network is used for generating a clear image with rich details by editing the rough deformed image and generating a final fine image from the rough deformed image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111050013.6A CN113744129A (en) | 2021-09-08 | 2021-09-08 | Semantic neural rendering-based face image generation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113744129A (en) | 2021-12-03
Family
ID=78737158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111050013.6A Pending CN113744129A (en) | 2021-09-08 | 2021-09-08 | Semantic neural rendering-based face image generation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113744129A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114648613A (en) * | 2022-05-18 | 2022-06-21 | 杭州像衍科技有限公司 | Three-dimensional head model reconstruction method and device based on deformable nerve radiation field |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563323A (en) * | 2017-08-30 | 2018-01-09 | 华中科技大学 | A kind of video human face characteristic point positioning method |
US20180046854A1 (en) * | 2015-02-16 | 2018-02-15 | University Of Surrey | Three dimensional modelling |
CN109961507A (en) * | 2019-03-22 | 2019-07-02 | 腾讯科技(深圳)有限公司 | A kind of Face image synthesis method, apparatus, equipment and storage medium |
CN110660076A (en) * | 2019-09-26 | 2020-01-07 | 北京紫睛科技有限公司 | Face exchange method |
CN110717418A (en) * | 2019-09-25 | 2020-01-21 | 北京科技大学 | Method and system for automatically identifying favorite emotion |
GB202007052D0 (en) * | 2020-05-13 | 2020-06-24 | Facesoft Ltd | Facial re-enactment |
CN111971713A (en) * | 2018-06-14 | 2020-11-20 | 英特尔公司 | 3D face capture and modification using image and time tracking neural networks |
CN113239857A (en) * | 2021-05-27 | 2021-08-10 | 京东科技控股股份有限公司 | Video synthesis method and device |
CN113343761A (en) * | 2021-05-06 | 2021-09-03 | 武汉理工大学 | Real-time facial expression migration method based on generation confrontation |
- 2021-09-08: application CN202111050013.6A filed; published as CN113744129A; status Pending
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20211203 |