CN113470182A - Face geometric feature editing method and deep face remodeling editing method - Google Patents

Face geometric feature editing method and deep face remodeling editing method

Info

Publication number
CN113470182A
Authority
CN
China
Prior art keywords
face
local
geometric
features
image
Prior art date
Legal status
Granted
Application number
CN202111029442.5A
Other languages
Chinese (zh)
Other versions
CN113470182B (en)
Inventor
Gao Lin (高林)
Chen Shuyu (陈姝宇)
Jiang Yueren (姜悦人)
Current Assignee
Zhongke Computing Technology Innovation Research Institute
Original Assignee
Zhongke Computing Technology Innovation Research Institute
Priority date
Filing date
Publication date
Application filed by Zhongke Computing Technology Innovation Research Institute filed Critical Zhongke Computing Technology Innovation Research Institute
Priority to CN202111029442.5A priority Critical patent/CN113470182B/en
Publication of CN113470182A publication Critical patent/CN113470182A/en
Application granted granted Critical
Publication of CN113470182B publication Critical patent/CN113470182B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2021 Shape modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a face geometric feature editing method and a deep face reshaping editing method. The face geometric feature editing method comprises the following steps: acquiring a geometric base face image and detecting face key points from the geometric base face image; connecting the face key points into a mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder for encoding; the graph convolutional variational autoencoder is trained on face key points from a face dataset and parameterizes natural face shapes, so that its latent-space features decode into face key points that are natural, smooth and consistent with the geometric characteristics of a face; acquiring the fixed points specified by the user among the face key points and the drag points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of each drag point before and after dragging and the positions of the fixed points; and rendering the optimized mesh into a face geometric feature map. The invention is applicable to the fields of computer vision and computer graphics.

Description

Face geometric feature editing method and deep face remodeling editing method
Technical Field
The invention relates to a face geometric feature editing method and a deep face reshaping editing method. It is suitable for the fields of computer vision and computer graphics.
Background
Face image editing is one of the important research directions of computer vision and graphics, and has wide application in mass media and the video industry. Early traditional face editing methods implemented editing mainly through image warping and pixel-level computation and rendering, and found it difficult to generate details or to handle hidden regions such as the interior of the eyes and mouth.
In recent years, interactive face editing methods can be roughly divided into two types: one generates the whole face from conditional input with a deep network and then edits it, for example "SEAN: Image Synthesis with Semantic Region-Adaptive Normalization" published by Zhu et al. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition in 2020; the other treats local modifications as image completion, such as "SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color" published by Jo et al. in Proceedings of the IEEE/CVF International Conference on Computer Vision in 2019.
The above methods, while capable of generating natural results, require the user to provide sketch or semantic-map input that approximates the network's training data in order to obtain high-quality synthesis. When the input sketch or semantic map is not realistic enough, the result is correspondingly flawed. This makes such methods relatively difficult to use for beginners or users without drawing skills.
There are also some efforts to optimize the sketch input, such as "Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches" published by Yang et al. in Proceedings of the European Conference on Computer Vision in 2020, but the user still needs to adjust the input to achieve good results.
Existing traditional editing methods such as liquify warping cannot handle large-scale edits of the mouth and eyes naturally and efficiently, and the achievable degree of editing is limited, as shown in fig. 4. Recent face editing work uses semantic maps and sketches as conditional input for network training, so the generated result fits the input closely, while the input of novice users is often abstract and differs greatly from the training set. Most such work feeds the user input directly into the network, so the generated result is flawed. Some work exists to optimize the sketch drawn by the user, which has a certain effect, but the user still needs to fine-tune repeatedly: because the editing freedom of a sketch is too high, the user's control precision drops correspondingly after optimization. Drawing-based interaction thus trades off between control accuracy and the naturalness of the generated result; existing face editing techniques are not easy for ordinary users to use, and their editing efficiency is low.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the existing problems, a face geometric feature editing method and a deep face reshaping editing method are provided.
The technical scheme adopted by the invention is as follows: a face geometric feature editing method, characterized by comprising the following steps:
acquiring a geometric base face image, and detecting face key points from the geometric base face image;
connecting the face key points into a mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder for encoding; the graph convolutional variational autoencoder is trained on face key points from a face dataset and parameterizes natural face shapes, so that its latent-space features decode into face key points that are natural, smooth and consistent with the geometric characteristics of a face;
acquiring the fixed points specified by the user among the face key points and the drag points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of each drag point before and after dragging and the positions of the fixed points;
and rendering the optimized mesh into a face geometric feature map.
The training of the graph convolutional variational autoencoder comprises:
statistically computing the average key points of the key points in the face database;
performing Delaunay triangulation on the average key points to determine a fixed connection topology, and taking this connection relation as the edge relation of the graph convolution;
training the graph convolutional variational autoencoder for planar face key points on a face dataset, with the same training procedure as a classical variational autoencoder, using an L2 loss function to constrain the decoded key-point coordinates to be consistent with the input; through training, natural face shapes are parameterized so that the latent-space features can be decoded into face key points that are naturally smooth and conform to the geometric characteristics of a face.
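The following is a minimal sketch of such a graph convolutional variational autoencoder over 2-D key points, assuming PyTorch; the class and function names, layer sizes, latent dimension, adjacency-based graph convolution and KL weight are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution step: mix each key point with its Delaunay neighbours."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)         # (N, N) row-normalized adjacency from the edge relation
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                         # x: (batch, N, in_dim)
        return torch.relu(self.linear(self.adj @ x))

class KeypointVAE(nn.Module):
    def __init__(self, adj, num_points, latent_dim=64):
        super().__init__()
        self.num_points = num_points
        self.enc = nn.Sequential(GraphConv(2, 32, adj), GraphConv(32, 64, adj))
        self.to_mu = nn.Linear(num_points * 64, latent_dim)
        self.to_logvar = nn.Linear(num_points * 64, latent_dim)
        self.from_z = nn.Linear(latent_dim, num_points * 64)
        self.dec_gc = GraphConv(64, 32, adj)
        self.dec_out = nn.Linear(32, 2)           # per-point 2-D coordinates, no activation

    def encode(self, points):                     # points: (batch, N, 2)
        h = self.enc(points).flatten(1)
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z):
        h = torch.relu(self.from_z(z)).view(-1, self.num_points, 64)
        return self.dec_out(self.dec_gc(h))       # (batch, N, 2) decoded key points

    def forward(self, points):
        mu, logvar = self.encode(points)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decode(z), mu, logvar

def vae_loss(recon, points, mu, logvar, kl_weight=1e-3):
    recon_l2 = ((recon - points) ** 2).mean()     # L2 constraint: decoded coordinates match the input
    kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
    return recon_l2 + kl_weight * kl
```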
The graph convolutional variational autoencoder optimizing, according to the coordinate difference of the drag points before and after dragging and the positions of the fixed points, the positions of the remaining face key points and the mesh formed by connecting the face key points, comprises:
regarding the mesh after the user drags points as a mesh with locally missing vertices, i.e., regarding the face key points other than the fixed points and the drag points as missing points which can move freely; the initial mesh is used to initialize the latent-space code;
and iteratively optimizing the latent code by minimizing the difference between the corresponding points of the mesh decoded from the latent space and the mesh given by the fixed points and drag points, and outputting the final deformed mesh.
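A minimal sketch of this latent-code optimization, reusing the hypothetical KeypointVAE above and assuming an Adam optimizer; the step count and learning rate are illustrative.

```python
import torch

def optimize_drag(vae, init_points, constrained_idx, constrained_targets, steps=100, lr=0.05):
    """init_points: (1, N, 2) initial key-point mesh; constrained_idx: indices of fixed + dragged
    key points; constrained_targets: (1, K, 2) their target coordinates after dragging."""
    with torch.no_grad():
        mu, _ = vae.encode(init_points)            # initialize the latent code from the initial mesh
    z = mu.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        decoded = vae.decode(z)                    # (1, N, 2) key points decoded from latent space
        loss = ((decoded[:, constrained_idx] - constrained_targets) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vae.decode(z).detach()                  # final deformed mesh; unconstrained points move freely
```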
A deep face reshaping editing method is characterized in that:
segmenting the face geometric feature map according to face parts to obtain the local geometric features corresponding to each part of the face, the face geometric feature map being one edited by the face geometric feature editing method described above;
inputting the face appearance feature map into a local generation module of a corresponding part according to the face part, and extracting local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to all parts of the face through the trained global fusion module to generate a face editing image with the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
The local generation module generating, based on the local geometric features and the local appearance features, the local face features of each face part that contain the corresponding local geometric features and local appearance features comprises:
inputting the local geometric features into the convolutional backbone of the local generation module;
encoding the face appearance feature map into high-dimensional features through convolution layers, and splitting the features into h·w indexed sequences according to position encoding, where h and w are the sizes of the third and fourth dimensions of the high-dimensional features, corresponding to the height and width of the image;
and combining each sequence with learnable position encoding parameters, sending them into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the backbone and injecting them, the convolutional backbone finally outputting the edited image.
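A minimal sketch of this splitting and recombination, assuming PyTorch's built-in Transformer encoder; the channel sizes, token grid, and the linear mapping to normalization parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    def __init__(self, in_ch=3, feat_ch=256, h=16, w=16, norm_param_ch=512):
        super().__init__()
        self.conv = nn.Sequential(                       # encode the appearance map to high-dim features
            nn.Conv2d(in_ch, 64, 4, stride=4), nn.ReLU(),
            nn.Conv2d(64, feat_ch, 8, stride=8))
        self.pos_embed = nn.Parameter(torch.zeros(1, h * w, feat_ch))   # learnable position codes
        layer = nn.TransformerEncoderLayer(d_model=feat_ch, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.to_norm_params = nn.Linear(feat_ch, norm_param_ch)         # parameters to inject into the
                                                                         # backbone's normalization layers

    def forward(self, appearance_map):                   # (B, 3, 512, 512) face appearance feature map
        feat = self.conv(appearance_map)                 # (B, C, h, w) high-dimensional features
        tokens = feat.flatten(2).transpose(1, 2)         # split into h*w sequences indexed by position
        tokens = self.transformer(tokens + self.pos_embed)   # recombine with a Transformer encoder
        return self.to_norm_params(tokens)               # per-position normalization parameters
```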
The features merged from the local generation module are those taken before its final convolutional layer, i.e. before the layer that reduces the channel dimension to 3.
The global fusion module fuses the local features of the human face corresponding to each part of the human face together by using a network of a U-net structure.
The training of the local generation module and the global fusion module comprises:
a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated result and the real result, as follows:

\mathcal{L}_{adv} = \mathbb{E}[\log D(L_{in}, I_{in})] + \mathbb{E}[\log(1 - D(L_{in}, G(L_{in}, I_{in})))]

where D(\cdot) is the output of the discriminator, G(\cdot) is the output of the generator, I_{in} is the face appearance feature map, and L_{in} is the face geometric feature map.
The training of the local generation module and the global fusion module comprises:
the feature matching loss function of the multi-scale discriminator used in Pix2PixHD is used, as follows:

\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_i}\left\| D_k^{(i)}(L_{in}, I_{in}) - D_k^{(i)}(L_{in}, I_{out}) \right\|_1

where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_{in} is the face appearance feature map, L_{in} is the face geometric feature map, and I_{out} is the output result image.
The color difference of the a and b channels of the input and output images converted into the CIELAB color space is constrained as follows:

\mathcal{L}_{ab} = \left\| \mathrm{Lab}(I_{out})_{ab} - \mathrm{Lab}(I_{in})_{ab} \right\|_1

where \mathrm{Lab}(\cdot)_{ab} converts an RGB image into the CIELAB color space and takes out the a and b channels.
In the training of the local generation module and the global fusion module, the pre-trained network VGG19 is adopted to encode the input and output images for a high-level feature loss function; in the global fusion module training, the pre-trained face recognition network ArcFace is used to encode the input and output, and the cosine similarity of the features is computed as a loss function, as follows:

\mathcal{L}_{vgg} = \left\| \mathrm{VGG}(I^{l}_{out}) - \mathrm{VGG}(I^{l}_{in}) \right\|_1 + \left\| \mathrm{VGG}(I^{g}_{out}) - \mathrm{VGG}(I^{g}_{in}) \right\|_1

\mathcal{L}_{id} = 1 - \cos\left( \mathrm{Arc}(I^{g}_{out}), \mathrm{Arc}(I^{g}_{in}) \right)

where Arc represents the face recognition network ArcFace and VGG represents the pre-trained network VGG19; I^{l}_{in} and I^{l}_{out} are the face images input to and output from the local generation module; I^{g}_{in} and I^{g}_{out} are the face images input to and output from the global fusion module.
A deep face reshaping editing device is characterized in that it comprises:
the geometric feature extraction unit is used for segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
the appearance feature extraction unit is used for inputting the face appearance feature image into the local generation module of the corresponding part according to the face part and extracting the local appearance feature corresponding to each part of the face;
the local generation unit is used for generating the local human face features which are corresponding to all parts of the human face and comprise the corresponding local geometric features and the corresponding local appearance features through the local generation module based on the local geometric features and the local appearance features;
and the global fusion unit is used for fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
A storage medium having stored thereon a computer program executable by a processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method as described in any one of claims 4 to 11.
A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method.
An interactive interface, comprising:
the display area I is used for displaying the geometric basic face image uploaded by the user and face key points which are detected by the geometric basic face image and can be operated by the user;
the display area II is used for displaying a human face geometric feature map corresponding to the human face key points in the display area I;
the display area III is used for displaying the face appearance characteristic diagram uploaded by the user;
and the display area IV is used for displaying the face editing image which is edited and generated by adopting the depth face reshaping editing method.
The invention has the following beneficial effects: the invention encodes the face key points with a graph convolutional network, iteratively optimizes the latent vector according to the loss between the user's dragged points and the corresponding points of the current network output, and finally obtains the corresponding deformed shape, so that the deformed face key-point shape can be optimized iteratively as the user drags.
The face appearance feature map is divided into four parts (eyes, nose, mouth, and background) according to face parts; each part is separately encoded locally by an appearance encoder designed on the basis of a Transformer to generate the corresponding local appearance features, the corresponding local geometric features are generated from the deformed key-point map, and finally all features are stitched into the final result by a global fusion module with a U-net structure; the stitched image has the geometric shape features edited by the user's dragging and the appearance features of the face appearance feature map.
The face portrait is reshaped and edited on the basis of face key-point deformation and a deep generative network, which supplements existing face editing techniques; compared with other drawing-based editing modes, drag-based editing is easier for ordinary users to use.
Drawings
Fig. 1 is a network architecture diagram of the embodiment, in which the left half shows the face key-point deformation network and its optimization process, and the right half shows the local-to-global generation network.
Fig. 2 is a schematic diagram illustrating a process of dynamically editing a picture by a user in the embodiment.
FIG. 3 illustrates a multi-instance shape editing effect of an embodiment.
Fig. 4 shows the effect of the embodiment compared with the conventional image warping method.
Fig. 5 shows the effect of two consecutive edits of the embodiment.
FIG. 6 shows the real-time drag interaction interface of the present embodiment.
Detailed Description
As shown in fig. 1, the present embodiment is a deep face reshaping editing method capable of real-time interaction based on key-point dragging, and specifically includes the following steps:
segmenting a deformed and edited face geometric feature map (semantic mask map) into four parts, namely eyes, nose, mouth and background, according to face parts, obtaining the local geometric features corresponding to each face part, and inputting them into the convolutional backbone of the local generation module;
inputting the face appearance feature map I_in into the corresponding local generation module according to face part (eyes, nose, mouth and background), and extracting the local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face editing image I_out. Ideally, the face editing image I_out has the same facial geometric features as the face geometric feature map and the same facial appearance features as the face appearance feature map.
In this embodiment, the geometric feature map of the face is edited by a face geometric feature editing method based on a geometric basic face image.
The face geometric feature editing method in this embodiment comprises the following steps:
acquiring a geometric base face image (a real face picture, which may be the same picture as the face appearance feature map I_in or a different one), and detecting the face key points from the geometric base face image;
connecting the key points into a 2D planar mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder (VAE) for encoding;
acquiring the user's editing of the face key points, including setting some of the key points as fixed points and dragging some of the key points; the graph convolutional variational autoencoder iteratively optimizes the latent-space code of the mesh with the coordinate difference of the key points before and after dragging as the loss function, so that the mesh output by the network after iterative optimization satisfies the user's drag editing while keeping the original shape characteristics;
and after the edited mesh is obtained, rendering the mesh into a semantic mask map, which serves as the face geometric feature map input to the deep face reshaping editing method.
In this embodiment, the idea of 3D mesh deformation is applied to the deformation of planar face key points. Planar 2D key points are adopted directly because planar face key-point detection is currently very accurate; by contrast, 3D face reconstruction not only involves more data and slower reconstruction, but also has larger errors than 2D key points. The 2D key-point data are compact and describe the shape characteristics of a face well.
In this embodiment, the average key points of the key points in the face database are first statistically computed; then Delaunay triangulation is performed to determine a fixed topology of the vertex connection relations, and this connection relation is used as the edge relation of the graph convolution; then a graph convolutional variational autoencoder (VAE) for planar face key points is trained on a face dataset, with the same training process as a classical variational autoencoder, using an L2 loss function to constrain the decoded key-point coordinates to be consistent with the input; through training, natural face shapes are parameterized so that the latent-space features can be decoded into face key points that are naturally smooth and conform to the geometric feature distribution of faces.
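A minimal sketch of building this fixed graph topology from the averaged key points, assuming NumPy and SciPy; the array layout and function name are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_keypoint_graph(keypoint_sets: np.ndarray):
    """keypoint_sets: (num_faces, num_points, 2) array of 2-D face key points from the database."""
    mean_points = keypoint_sets.mean(axis=0)          # average key points over the dataset
    tri = Delaunay(mean_points)                       # Delaunay triangulation of the mean shape
    edges = set()
    for a, b, c in tri.simplices:                     # every triangle contributes three undirected edges
        edges.update({(a, b), (b, a), (b, c), (c, b), (a, c), (c, a)})
    edge_index = np.array(sorted(edges)).T            # (2, num_edges), fixed topology reused for all faces
    return mean_points, edge_index
```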
The mesh after the user drags points is treated as a mesh with locally missing vertices; that is, except for the fixed points and drag points set by the user, the remaining points are regarded as missing points that can move freely, and the initial mesh is used to initialize the latent-space code. This embodiment iteratively optimizes the latent code by minimizing the difference between the points of the mesh decoded from the latent space and the corresponding points of the user-constrained mesh, and outputs the final deformed mesh, as shown in the following formula:

P' = \mathrm{Dec}\left( \arg\min_{z} \left\| S \cdot \mathrm{Dec}(z) - S \cdot P_{c} \right\|_2 \right)

where \mathrm{Dec} represents the decoder, z is the latent-space code, P_{c} denotes the mesh given by the user's fixed points and drag points, and S is a matrix for selecting the point sequence identical to the points in P_{c}.
In this embodiment, in order to enable the network to better learn the distribution characteristics of the face appearance texture and to better control local details, a local generation module is designed in combination with a Transformer encoder, and training is performed on each structured region image. As shown in the right half of fig. 1, the face appearance feature map I_in is first encoded by convolution layers into high-dimensional features; the features are then split into h·w indexed sequences according to position encoding (h and w are the sizes of the third and fourth dimensions of the high-dimensional features, corresponding to the height and width of the image), so that each sequence can be regarded as a word of the high-dimensional appearance features; each sequence is combined with learnable position encoding parameters and sent into a Transformer encoder for recombination, which yields the parameters corresponding to the Sandwich normalization layers in the convolutional semantic-mask backbone; these parameters are injected, and the convolutional backbone finally outputs the edited image.
Random noise is injected into the convolutional backbone in the local generation module of this embodiment to enhance the robustness of generation and avoid blurred details.
In order to combine the parts together, the global fusion module in this embodiment fuses the outputs of the four parts using a network with a U-net structure. In order to preserve the generation details of the local generation module as much as possible and to eliminate the style differences between the generated results of the parts as much as possible, the features before the final convolution layer of the local generation module, i.e. before the layer whose output dimension is reduced to 3, are merged, because these features retain rich higher-dimensional information and their size is consistent with the input picture, which makes direct alignment of the parts convenient.
In this embodiment, the global fusion module copies each local face feature into a zero-valued tensor of the input-picture size according to the coordinate position of the corresponding part in the picture; the four features, each of the same size as the input picture, are then concatenated along the channel dimension to form the input of the U-net network.
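A minimal sketch of this assembly for a single sample, assuming PyTorch; the part window positions, channel count, and function name are illustrative placeholders.

```python
import torch

def assemble_fusion_input(part_feats, part_boxes, image_size=512):
    """part_feats: dict part name -> (C, h, w) feature tensor from the local generation module;
    part_boxes: dict part name -> (top, left) coordinate of that part in the picture."""
    planes = []
    for name, feat in part_feats.items():
        c, h, w = feat.shape
        canvas = torch.zeros(c, image_size, image_size)     # zero-valued tensor of input-picture size
        top, left = part_boxes[name]
        canvas[:, top:top + h, left:left + w] = feat         # copy the feature to its coordinate position
        planes.append(canvas)
    return torch.cat(planes, dim=0)                          # channel-wise concatenation -> U-net input
```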
In order to avoid changing the unmodified parts of the picture other than the face as much as possible, this embodiment uses the convex hull of the key points to cut out the background, encodes it with a convolutional network, and injects it into the decoder of the global fusion module, so that the final generated result fuses well with the background.
In order to enable the local generation module and the global fusion module to learn the distribution of the face shape and improve the quality of the generated and edited image, the embodiment performs a series of preprocessing on the data set and designs and utilizes a plurality of loss functions to constrain the generated result. In this embodiment, the local generation module is trained first, and then the parameters of the local generation module are fixed to train the global fusion module.
This embodiment uses CelebA-HQ as the training dataset and performs a series of preprocessing steps:
first, side-facing faces whose left-right rotation angle exceeds a threshold are screened out with a deep face alignment method;
then, the Face++ dense face key-point prediction API is used to detect the faces of the dataset, 772 key points are saved for each face, and a semantic mask map is rendered.
This embodiment also screens out pictures with sunglasses, because the eye key points are then difficult to predict and do not represent the shape of the glasses. The window sizes of the four parts, eyes, nose, mouth and background, are set to 128 × 320, 160 × 160, 192 × 192 and 512 × 512 in turn, and all images are scaled to 512 × 512.
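A minimal sketch of cutting the four part windows out of an aligned 512 × 512 image, assuming Pillow; only the window sizes come from the text, while the window positions, the height-by-width reading of those sizes, and the function name are assumptions.

```python
from PIL import Image

# part name -> (left, top, width, height); sizes follow the text, positions are
# illustrative placeholders for an aligned face layout.
PART_WINDOWS = {
    "eyes":       (96, 140, 320, 128),
    "nose":       (176, 200, 160, 160),
    "mouth":      (160, 300, 192, 192),
    "background": (0, 0, 512, 512),
}

def crop_parts(image_path: str):
    img = Image.open(image_path).convert("RGB").resize((512, 512))   # scale to 512 x 512
    crops = {}
    for name, (left, top, w, h) in PART_WINDOWS.items():
        crops[name] = img.crop((left, top, left + w, top + h))       # (left, upper, right, lower)
    return crops
```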
This embodiment is trained in the classical training mode of generative adversarial networks, and a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated result and the real result, as shown in the following equation:

\mathcal{L}_{adv} = \mathbb{E}[\log D(L_{in}, I_{in})] + \mathbb{E}[\log(1 - D(L_{in}, G(L_{in}, I_{in})))]

where D(\cdot) is the output of the discriminator, G(\cdot) is the output of the generator, L_{in} is the input face geometric feature map, and I_{in} is the input face appearance feature map.
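A minimal sketch of a dual-scale PatchGAN discriminator, assuming PyTorch; the layer widths, the average-pool downsampling and the conditioning by channel concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn

def patch_discriminator(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(256, 1, 4, stride=1, padding=1))        # per-patch real/fake score map

class DualScalePatchGAN(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.d_full = patch_discriminator(in_ch)          # operates on the original resolution
        self.d_half = patch_discriminator(in_ch)          # operates on a 2x-downsampled copy
        self.down = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, condition, image):
        x = torch.cat([condition, image], dim=1)          # conditional input: geometry map + image
        return self.d_full(x), self.d_half(self.down(x))
```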
In order to make the training of the local generation module and the global fusion module more robust, this embodiment uses the feature matching loss function of the multi-scale discriminator used in Pix2PixHD, as follows:

\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_i}\left\| D_k^{(i)}(L_{in}, I_{in}) - D_k^{(i)}(L_{in}, I_{out}) \right\|_1

where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_{in} is the face appearance feature map, L_{in} is the face geometric feature map, and I_{out} is the output result image.
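A minimal sketch of this feature matching term, assuming PyTorch and that the per-layer discriminator features have already been collected for the real and generated images; using the mean-reduced L1 distance per layer is an assumption.

```python
import torch.nn.functional as F

def feature_matching_loss(real_feats, fake_feats):
    """real_feats / fake_feats: lists over discriminator scales; each entry is a list of
    per-layer feature tensors extracted for the real and the generated image."""
    loss = 0.0
    for scale_real, scale_fake in zip(real_feats, fake_feats):
        for f_real, f_fake in zip(scale_real, scale_fake):
            # mean reduction plays the role of the 1/N_i factor over the layer's elements
            loss = loss + F.l1_loss(f_fake, f_real.detach())
    return loss
```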
In order to keep the tone of the generated result consistent, this embodiment imposes a constraint on the color difference of the a and b channels of the input and output images converted into the CIELAB color space, as follows:

\mathcal{L}_{ab} = \left\| \mathrm{Lab}(I_{out})_{ab} - \mathrm{Lab}(I_{in})_{ab} \right\|_1

where \mathrm{Lab}(\cdot)_{ab} converts an RGB image into the CIELAB color space and takes out the a and b channels.
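A minimal sketch of this constraint, assuming scikit-image for the RGB-to-CIELAB conversion; the mean absolute difference is an illustrative choice of distance.

```python
import numpy as np
from skimage.color import rgb2lab

def ab_color_loss(img_out: np.ndarray, img_in: np.ndarray) -> float:
    """img_out, img_in: float RGB images in [0, 1] with shape (H, W, 3)."""
    ab_out = rgb2lab(img_out)[..., 1:]    # keep only the a and b channels
    ab_in = rgb2lab(img_in)[..., 1:]
    return float(np.abs(ab_out - ab_in).mean())
```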
One of the keys to the quality of the editing result is maintaining the identity attributes of the person, and this embodiment uses a mixture of loss functions for this purpose in the local and global stages. In both the local and global network training, the pre-trained network VGG19 is used to encode the input and output images for a high-level feature loss function. In order to better maintain the person's identity in global fusion, this embodiment further uses the pre-trained face recognition network ArcFace to encode the input and output and computes the cosine similarity of the features as a loss function, as shown in the following formulas:

\mathcal{L}_{vgg} = \left\| \mathrm{VGG}(I^{l}_{out}) - \mathrm{VGG}(I^{l}_{in}) \right\|_1 + \left\| \mathrm{VGG}(I^{g}_{out}) - \mathrm{VGG}(I^{g}_{in}) \right\|_1

\mathcal{L}_{id} = 1 - \cos\left( \mathrm{Arc}(I^{g}_{out}), \mathrm{Arc}(I^{g}_{in}) \right)

where Arc represents the ArcFace face recognition network (it should be noted that the loss function computed by this network is only suitable for global faces and is not used in local training), and VGG refers to the pre-trained network VGG19; I^{l}_{in} and I^{l}_{out} are the face images input to and output from the local generation module; I^{g}_{in} and I^{g}_{out} are the face images input to and output from the global fusion module.
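A minimal sketch of these two terms, assuming PyTorch and that pre-trained VGG19 and ArcFace feature extractors are supplied as callables; the L1 distance for the VGG term is an assumption.

```python
import torch.nn.functional as F

def vgg_feature_loss(vgg_encode, img_in, img_out):
    """vgg_encode: callable mapping an image batch to a feature tensor of a pre-trained VGG19."""
    return F.l1_loss(vgg_encode(img_out), vgg_encode(img_in))

def arcface_id_loss(arcface_encode, img_in, img_out):
    """arcface_encode: callable mapping an image batch to an ArcFace identity embedding."""
    sim = F.cosine_similarity(arcface_encode(img_out), arcface_encode(img_in), dim=-1)
    return 1.0 - sim.mean()    # one minus the cosine similarity of the identity features
```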
The geometric features mainly comprise two aspects: 1. shape information, such as the shape of the facial features, the face shape of the person, the length of the hair, and the like; 2. geometric details, i.e. the detailed expression of the geometric features of the face, such as the wrinkles of the face, the flow of the hair, and the like.
The appearance features mainly include three kinds of content: 1. color information, such as the hair color, skin color, and lip color of the face; 2. material information, i.e. the texture of the hair and skin of the face, such as the smoothness of the skin; 3. illumination information, i.e. the influence of the lighting conditions on the brightness of the face, such as the brightness of the light and the changes of shadows. In some cases these factors affect the appearance jointly, for example illumination changes may affect how the skin color appears, and the appearance features do not draw a sharp division between the above factors.
Fig. 2 is a schematic diagram illustrating the process of a user dynamically editing a picture in the embodiment. This embodiment is easy for ordinary users to use: after the user drags a face key point, the key-point mesh deformation is carried out in real time, the editing result is generated automatically, and the user can continue to edit each part.
Fig. 3 shows a multi-case shape editing effect of the embodiment, a user may rapidly drag a face key point to implement functions such as hairline reduction (leftmost column), expression control, face thinning, and the like, a corresponding face key point line graph is listed in the first row of fig. 3, a dragged point is displayed in a small box, and an arrow in the small box is a dragging direction.
Fig. 4 shows a comparison between this embodiment and a traditional image warping method, where the first image is the original image, the second shows the feature points before and after deformation, the third is the result of the traditional method, and the last is the result of this embodiment. The traditional image warping method has difficulty handling the eye and mouth regions; after the mouth is dragged open in this example, the image warping method cannot generate content such as teeth, so the result looks very unnatural, whereas the method of this embodiment automatically generates the corresponding missing parts.
Fig. 5 shows the effect of two consecutive edits in this embodiment, corresponding face keypoint wiring diagrams are listed in the first and third rows.
Fig. 6 shows the interactive interface that the user can drag in real time in this embodiment. The interactive interface includes a display area I, a display area II and a display area IV, where the display area I is used to display the geometric base face image uploaded by the user and the face key points detected from it that the user can operate; the display area II is used to display the face geometric feature map corresponding to the face key points; and the display area IV is used to display the face editing image generated by the deep face reshaping editing method. In the deep face reshaping editing method corresponding to this interactive interface, the geometric base face image is the same as the face appearance feature map, so no separate display area is provided for the face appearance feature map.
The embodiment also provides a deep face reshaping editing device, which comprises a geometric feature extraction unit, an appearance feature extraction unit, a local generation unit and a global fusion unit, wherein the geometric feature extraction unit is used for segmenting the face geometric feature map according to face parts to obtain the local geometric features corresponding to each part of the face; the appearance feature extraction unit is used for inputting the face appearance feature map into the local generation module of the corresponding part according to face part and extracting the local appearance features corresponding to each part of the face; the local generation unit is used for generating, through the local generation module and based on the local geometric features and the local appearance features, the local face features of each face part that contain the corresponding local geometric features and local appearance features; the global fusion unit is used for fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
The present embodiment also provides a storage medium on which a computer program executable by a processor is stored, where the computer program is executed to implement the steps of the deep face reshaping editing method in the present embodiment.
The embodiment also provides a computer device, which has a memory and a processor, wherein the memory stores a computer program capable of being executed by the processor, and the computer program realizes the steps of the deep face reshaping editing method in the embodiment when being executed.

Claims (15)

1. A face geometric feature editing method, characterized by comprising the following steps:
acquiring a geometric base face image, and detecting face key points from the geometric base face image;
connecting the face key points into a mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder for encoding; the graph convolutional variational autoencoder is trained on face key points from a face dataset and parameterizes natural face shapes, so that its latent-space features decode into face key points that are natural, smooth and consistent with the geometric characteristics of a face;
acquiring the fixed points specified by the user among the face key points and the drag points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of each drag point before and after dragging and the positions of the fixed points;
and rendering the optimized mesh into a face geometric feature map.
2. The face geometric feature editing method according to claim 1, wherein the training of the graph convolutional variational autoencoder comprises:
statistically computing the average key points of the key points in the face database;
performing Delaunay triangulation on the average key points to determine a fixed connection topology, and taking this connection relation as the edge relation of the graph convolution;
training the graph convolutional variational autoencoder for planar face key points on a face dataset, with the same training procedure as a classical variational autoencoder, using an L2 loss function to constrain the decoded key-point coordinates to be consistent with the input; through training, natural face shapes are parameterized so that the latent-space features can be decoded into face key points that are naturally smooth and conform to the geometric characteristics of a face.
3. The face geometric feature editing method according to claim 1, wherein the graph convolutional variational autoencoder optimizing, according to the coordinate difference of the drag points before and after dragging and the positions of the fixed points, the positions of the remaining face key points and the mesh formed by connecting the face key points, comprises:
regarding the mesh after the user drags points as a mesh with locally missing vertices, i.e., regarding the face key points other than the fixed points and the drag points as missing points which can move freely; the initial mesh is used to initialize the latent-space code;
and iteratively optimizing the latent code by minimizing the difference between the corresponding points of the mesh decoded from the latent space and the mesh given by the fixed points and drag points, and outputting the final deformed mesh.
4. A deep face reshaping editing method is characterized in that:
segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
inputting the face appearance feature map into a local generation module of a corresponding part according to the face part, and extracting local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to all parts of the face through the trained global fusion module to generate a face editing image with the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
5. The deep face reshaping editing method according to claim 4, wherein the local generation module generating, based on the local geometric features and the local appearance features, the local face features of each face part that contain the corresponding local geometric features and local appearance features comprises:
inputting the local geometric features into the convolutional backbone of the local generation module;
encoding the face appearance feature map into high-dimensional features through convolution layers, and splitting the features into h·w indexed sequences according to position encoding, where h and w are the sizes of the third and fourth dimensions of the high-dimensional features, corresponding to the height and width of the image;
and combining each sequence with learnable position encoding parameters, sending them into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the backbone and injecting them, the convolutional backbone finally outputting the edited image.
6. The deep face reshaping editing method according to claim 4, wherein the features merged from the local generation module are those taken before its final convolutional layer, i.e. before the layer that reduces the channel dimension to 3.
7. A deep face reshaping editing method according to claim 4, wherein: the global fusion module fuses the local features of the human face corresponding to each part of the human face together by using a network of a U-net structure.
8. The deep face reshaping editing method according to claim 4, wherein the training of the local generation module and the global fusion module comprises:
a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated result and the real result, as follows:

\mathcal{L}_{adv} = \mathbb{E}[\log D(L_{in}, I_{in})] + \mathbb{E}[\log(1 - D(L_{in}, G(L_{in}, I_{in})))]

where D(\cdot) is the output of the discriminator, G(\cdot) is the output of the generator, I_{in} is the face appearance feature map, and L_{in} is the face geometric feature map.
9. The deep face reshaping editing method according to claim 4, wherein the training of the local generation module and the global fusion module comprises:
the feature matching loss function of the multi-scale discriminator used in Pix2PixHD is used, as follows:

\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_i}\left\| D_k^{(i)}(L_{in}, I_{in}) - D_k^{(i)}(L_{in}, I_{out}) \right\|_1

where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_{in} is the face appearance feature map, L_{in} is the face geometric feature map, and I_{out} is the output result image.
10. The deep face reshaping editing method according to claim 4, wherein the color difference of the a and b channels of the input and output images converted into the CIELAB color space is constrained as follows:

\mathcal{L}_{ab} = \left\| \mathrm{Lab}(I_{out})_{ab} - \mathrm{Lab}(I_{in})_{ab} \right\|_1

where \mathrm{Lab}(\cdot)_{ab} converts an RGB image into the CIELAB color space and takes out the a and b channels.
11. The deep face reshaping editing method according to claim 4, wherein in the training of the local generation module and the global fusion module, the pre-trained network VGG19 is adopted to encode the input and output images for a high-level feature loss function;
in the global fusion module training, the pre-trained face recognition network ArcFace is used to encode the input and output, and the cosine similarity of the features is computed as a loss function, as follows:

\mathcal{L}_{vgg} = \left\| \mathrm{VGG}(I^{l}_{out}) - \mathrm{VGG}(I^{l}_{in}) \right\|_1 + \left\| \mathrm{VGG}(I^{g}_{out}) - \mathrm{VGG}(I^{g}_{in}) \right\|_1

\mathcal{L}_{id} = 1 - \cos\left( \mathrm{Arc}(I^{g}_{out}), \mathrm{Arc}(I^{g}_{in}) \right)

where Arc represents the face recognition network ArcFace and VGG represents the pre-trained network VGG19; I^{l}_{in} and I^{l}_{out} are the face images input to and output from the local generation module; I^{g}_{in} and I^{g}_{out} are the face images input to and output from the global fusion module.
12. A deep face reshaping editing device, characterized in that it comprises:
the geometric feature extraction unit is used for segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
the appearance feature extraction unit is used for inputting the face appearance feature image into the local generation module of the corresponding part according to the face part and extracting the local appearance feature corresponding to each part of the face;
the local generation unit is used for generating the local human face features which are corresponding to all parts of the human face and comprise the corresponding local geometric features and the corresponding local appearance features through the local generation module based on the local geometric features and the local appearance features;
and the global fusion unit is used for fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
13. A storage medium having stored thereon a computer program executable by a processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method as claimed in any one of claims 4 to 11.
14. A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method as claimed in any one of claims 4 to 11.
15. An interactive interface, comprising:
the display area I is used for displaying the geometric basic face image uploaded by the user and face key points which are detected by the geometric basic face image and can be operated by the user;
the display area II is used for displaying a human face geometric feature map corresponding to the human face key points in the display area I;
the display area III is used for displaying the face appearance characteristic diagram uploaded by the user;
a display area iv for displaying a face editing image edited and generated by the deep face reshaping editing method according to any one of claims 4 to 11.
CN202111029442.5A 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method Active CN113470182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111029442.5A CN113470182B (en) 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111029442.5A CN113470182B (en) 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method

Publications (2)

Publication Number Publication Date
CN113470182A true CN113470182A (en) 2021-10-01
CN113470182B CN113470182B (en) 2022-02-18

Family

ID=77867216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111029442.5A Active CN113470182B (en) 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method

Country Status (1)

Country Link
CN (1) CN113470182B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119977A (en) * 2021-12-01 2022-03-01 昆明理工大学 Graph convolution-based Transformer gastric cancer canceration region image segmentation method
CN114845067A (en) * 2022-07-04 2022-08-02 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing
CN115311730A (en) * 2022-09-23 2022-11-08 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115810215A (en) * 2023-02-08 2023-03-17 科大讯飞股份有限公司 Face image generation method, device, equipment and storage medium
WO2023132788A3 (en) * 2022-01-06 2023-10-05 Lemon Inc. Creating effects based on facial features
CN117594202A (en) * 2024-01-19 2024-02-23 深圳市宗匠科技有限公司 Wrinkle size analysis method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7450126B2 (en) * 2000-08-30 2008-11-11 Microsoft Corporation Methods and systems for animating facial features, and methods and systems for expression transformation
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image
CN110288697A (en) * 2019-06-24 2019-09-27 天津大学 3D face representation and method for reconstructing based on multiple dimensioned figure convolutional neural networks
CN112288851A (en) * 2020-10-23 2021-01-29 武汉大学 Three-dimensional face modeling method based on double-branch flow network
CN112991484A (en) * 2021-04-28 2021-06-18 中国科学院计算技术研究所数字经济产业研究院 Intelligent face editing method and device, storage medium and equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7450126B2 (en) * 2000-08-30 2008-11-11 Microsoft Corporation Methods and systems for animating facial features, and methods and systems for expression transformation
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image
CN110288697A (en) * 2019-06-24 2019-09-27 天津大学 3D face representation and method for reconstructing based on multiple dimensioned figure convolutional neural networks
CN112288851A (en) * 2020-10-23 2021-01-29 武汉大学 Three-dimensional face modeling method based on double-branch flow network
CN112991484A (en) * 2021-04-28 2021-06-18 中国科学院计算技术研究所数字经济产业研究院 Intelligent face editing method and device, storage medium and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chuanxia Zheng et al.: "Pluralistic Image Completion", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
Hu Han et al.: "Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach", IEEE Transactions on Pattern Analysis and Machine Intelligence *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119977A (en) * 2021-12-01 2022-03-01 昆明理工大学 Graph convolution-based Transformer gastric cancer canceration region image segmentation method
WO2023132788A3 (en) * 2022-01-06 2023-10-05 Lemon Inc. Creating effects based on facial features
US11900545B2 (en) 2022-01-06 2024-02-13 Lemon Inc. Creating effects based on facial features
CN114845067A (en) * 2022-07-04 2022-08-02 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing
CN114845067B (en) * 2022-07-04 2022-11-04 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing
CN115311730A (en) * 2022-09-23 2022-11-08 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115311730B (en) * 2022-09-23 2023-06-20 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115810215A (en) * 2023-02-08 2023-03-17 科大讯飞股份有限公司 Face image generation method, device, equipment and storage medium
CN117594202A (en) * 2024-01-19 2024-02-23 深圳市宗匠科技有限公司 Wrinkle size analysis method and device, electronic equipment and storage medium
CN117594202B (en) * 2024-01-19 2024-04-19 深圳市宗匠科技有限公司 Wrinkle size analysis method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113470182B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN113470182B (en) Face geometric feature editing method and deep face remodeling editing method
Jam et al. A comprehensive review of past and present image inpainting methods
US11880766B2 (en) Techniques for domain to domain projection using a generative model
Zhuang et al. Dreameditor: Text-driven 3d scene editing with neural fields
CN110322468A (en) A kind of automatic edit methods of image
Bermano et al. Facial performance enhancement using dynamic shape space analysis
CN111915693A (en) Sketch-based face image generation method and system
Zhang et al. Hair-GAN: Recovering 3D hair structure from a single image using generative adversarial networks
Piao et al. Inverting generative adversarial renderer for face reconstruction
WO2021140510A2 (en) Large-scale generation of photorealistic 3d models
CN113034355B (en) Portrait image double-chin removing method based on deep learning
Guo et al. 3D face from X: Learning face shape from diverse sources
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
CN116385606A (en) Speech signal driven personalized three-dimensional face animation generation method and application thereof
CN112991484B (en) Intelligent face editing method and device, storage medium and equipment
Ling et al. Semantically disentangled variational autoencoder for modeling 3d facial details
CN117036620B (en) Three-dimensional face reconstruction method based on single image
Jiang et al. 3d points splatting for real-time dynamic hand reconstruction
He et al. Data-driven 3D human head reconstruction
CN116630496A (en) Sketch character animation generation method and device, electronic equipment and storage medium
Berson et al. A robust interactive facial animation editing system
Zhang et al. Hair-gans: Recovering 3d hair structure from a single image
CN113129347A (en) Self-supervision single-view three-dimensional hairline model reconstruction method and system
Li et al. Guiding 3D Digital Content Generation with Pre-Trained Diffusion Models.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant