CN113470182A - Face geometric feature editing method and deep face remodeling editing method - Google Patents

Face geometric feature editing method and deep face remodeling editing method

Info

Publication number
CN113470182A
Authority
CN
China
Prior art keywords
face
local
geometric
features
image
Prior art date
Legal status
Granted
Application number
CN202111029442.5A
Other languages
Chinese (zh)
Other versions
CN113470182B (en)
Inventor
Gao Lin (高林)
Chen Shuyu (陈姝宇)
Jiang Yueren (姜悦人)
Current Assignee
Zhongke Computing Technology Innovation Research Institute
Original Assignee
Zhongke Computing Technology Innovation Research Institute
Priority date
Filing date
Publication date
Application filed by Zhongke Computing Technology Innovation Research Institute filed Critical Zhongke Computing Technology Innovation Research Institute
Priority to CN202111029442.5A priority Critical patent/CN113470182B/en
Publication of CN113470182A publication Critical patent/CN113470182A/en
Application granted granted Critical
Publication of CN113470182B publication Critical patent/CN113470182B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2021 Shape modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a face geometric feature editing method and a deep face reshaping editing method. The face geometric feature editing method comprises the following steps: acquiring a geometric base face image and detecting face key points from the geometric base face image; connecting the face key points into a mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder for encoding; the graph convolutional variational autoencoder is trained on face key points from a face dataset and parameterizes natural face shapes, so that its latent-space features decode into face key points that are natural, smooth and consistent with the geometric characteristics of a face; acquiring the fixed points specified by the user among the face key points and the drag points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of each drag point before and after dragging and the positions of the fixed points; and rendering the optimized mesh into a face geometric feature map. The invention is applicable to the fields of computer vision and computer graphics.

Description

Face geometric feature editing method and deep face remodeling editing method
Technical Field
The invention relates to a face geometric feature editing method and a deep face reshaping editing method. It is suitable for the fields of computer vision and computer graphics.
Background
Face image editing is one of the important research directions of computer vision and graphics, and has wide application in mass media and the video industry. Early traditional face editing methods implemented editing mainly through image warping and pixel-level computation and rendering, and found it difficult to generate details or to handle hidden regions such as the interior of the eyes and mouth.
In recent years, interactive face editing methods can be roughly divided into two types: one generates the whole face from conditional input with a deep network and then edits it, for example "SEAN: Image Synthesis with Semantic Region-Adaptive Normalization" published by Zhu et al. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition in 2020; the other treats local modifications as image completion, such as "SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color" published by Jo et al. in Proceedings of the IEEE/CVF International Conference on Computer Vision in 2019.
The above methods, while capable of generating natural results, require the user to provide sketch or semantic-map input that approximates the network's training data in order to obtain high-quality synthesis. When the input sketch or semantic map is not realistic enough, the result is correspondingly flawed. This makes such methods relatively difficult to use for beginners or users without drawing skills.
There are also some efforts to optimize the sketch input, such as "Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches" published by Yang et al. in Proceedings of the European Conference on Computer Vision in 2020, but the user still needs to adjust the input to achieve good results.
Existing traditional editing methods such as liquify warping cannot handle large-scale edits of the mouth and eyes naturally and efficiently, and the achievable degree of editing is limited, as shown in fig. 4. Recent face editing work uses semantic maps and sketches as conditional input for network training, so the generated result fits the input closely, while the input of novice users is often abstract and differs greatly from the training set. Most such work feeds the user input directly into the network, so the generated result is flawed. Some work exists to optimize the sketch drawn by the user, which has a certain effect, but the user still needs to fine-tune repeatedly: because the editing freedom of a sketch is too high, the user's control precision drops correspondingly after optimization. Drawing-based interaction thus trades off between control accuracy and the naturalness of the generated result; existing face editing techniques are not easy for ordinary users to use, and their editing efficiency is low.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the existing problems, a face geometric feature editing method and a deep face reshaping editing method are provided.
The technical scheme adopted by the invention is as follows: a face geometric feature editing method, characterized by comprising the following steps:
acquiring a geometric base face image, and detecting face key points from the geometric base face image;
connecting the face key points into a mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder for encoding; the graph convolutional variational autoencoder is trained on face key points from a face dataset and parameterizes natural face shapes, so that its latent-space features decode into face key points that are natural, smooth and consistent with the geometric characteristics of a face;
acquiring the fixed points specified by the user among the face key points and the drag points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of each drag point before and after dragging and the positions of the fixed points;
and rendering the optimized mesh into a face geometric feature map.
The training of the graph convolutional variational autoencoder comprises:
statistically computing the average key points of the key points in the face database;
performing Delaunay triangulation on the average key points to determine a fixed connection topology, and taking this connection relation as the edge relation of the graph convolution;
training the graph convolutional variational autoencoder for planar face key points on a face dataset, with the same training procedure as a classical variational autoencoder, using an L2 loss function to constrain the decoded key-point coordinates to be consistent with the input; through training, natural face shapes are parameterized so that the latent-space features can be decoded into face key points that are naturally smooth and conform to the geometric characteristics of a face.
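The following is a minimal sketch of such a graph convolutional variational autoencoder over 2-D key points, assuming PyTorch; the class and function names, layer sizes, latent dimension, adjacency-based graph convolution and KL weight are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution step: mix each key point with its Delaunay neighbours."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)         # (N, N) row-normalized adjacency from the edge relation
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                         # x: (batch, N, in_dim)
        return torch.relu(self.linear(self.adj @ x))

class KeypointVAE(nn.Module):
    def __init__(self, adj, num_points, latent_dim=64):
        super().__init__()
        self.num_points = num_points
        self.enc = nn.Sequential(GraphConv(2, 32, adj), GraphConv(32, 64, adj))
        self.to_mu = nn.Linear(num_points * 64, latent_dim)
        self.to_logvar = nn.Linear(num_points * 64, latent_dim)
        self.from_z = nn.Linear(latent_dim, num_points * 64)
        self.dec_gc = GraphConv(64, 32, adj)
        self.dec_out = nn.Linear(32, 2)           # per-point 2-D coordinates, no activation

    def encode(self, points):                     # points: (batch, N, 2)
        h = self.enc(points).flatten(1)
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z):
        h = torch.relu(self.from_z(z)).view(-1, self.num_points, 64)
        return self.dec_out(self.dec_gc(h))       # (batch, N, 2) decoded key points

    def forward(self, points):
        mu, logvar = self.encode(points)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decode(z), mu, logvar

def vae_loss(recon, points, mu, logvar, kl_weight=1e-3):
    recon_l2 = ((recon - points) ** 2).mean()     # L2 constraint: decoded coordinates match the input
    kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
    return recon_l2 + kl_weight * kl
```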
The graph convolutional variational autoencoder optimizing, according to the coordinate difference of the drag points before and after dragging and the positions of the fixed points, the positions of the remaining face key points and the mesh formed by connecting the face key points, comprises:
regarding the mesh after the user drags points as a mesh with locally missing vertices, i.e., regarding the face key points other than the fixed points and the drag points as missing points which can move freely; the initial mesh is used to initialize the latent-space code;
and iteratively optimizing the latent code by minimizing the difference between the corresponding points of the mesh decoded from the latent space and the mesh given by the fixed points and drag points, and outputting the final deformed mesh.
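A minimal sketch of this latent-code optimization, reusing the hypothetical KeypointVAE above and assuming an Adam optimizer; the step count and learning rate are illustrative.

```python
import torch

def optimize_drag(vae, init_points, constrained_idx, constrained_targets, steps=100, lr=0.05):
    """init_points: (1, N, 2) initial key-point mesh; constrained_idx: indices of fixed + dragged
    key points; constrained_targets: (1, K, 2) their target coordinates after dragging."""
    with torch.no_grad():
        mu, _ = vae.encode(init_points)            # initialize the latent code from the initial mesh
    z = mu.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        decoded = vae.decode(z)                    # (1, N, 2) key points decoded from latent space
        loss = ((decoded[:, constrained_idx] - constrained_targets) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vae.decode(z).detach()                  # final deformed mesh; unconstrained points move freely
```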
A deep face reshaping editing method is characterized in that:
segmenting the face geometric feature map according to face parts to obtain the local geometric features corresponding to each part of the face, the face geometric feature map being one edited by the face geometric feature editing method described above;
inputting the face appearance feature map into a local generation module of a corresponding part according to the face part, and extracting local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to all parts of the face through the trained global fusion module to generate a face editing image with the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
The local generation module generating, based on the local geometric features and the local appearance features, the local face features of each face part that contain the corresponding local geometric features and local appearance features comprises:
inputting the local geometric features into the convolutional backbone of the local generation module;
encoding the face appearance feature map into high-dimensional features through convolution layers, and splitting the features into h·w indexed sequences according to position encoding, where h and w are the sizes of the third and fourth dimensions of the high-dimensional features, corresponding to the height and width of the image;
and combining each sequence with learnable position encoding parameters, sending them into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the backbone and injecting them, the convolutional backbone finally outputting the edited image.
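A minimal sketch of this splitting and recombination, assuming PyTorch's built-in Transformer encoder; the channel sizes, token grid, and the linear mapping to normalization parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    def __init__(self, in_ch=3, feat_ch=256, h=16, w=16, norm_param_ch=512):
        super().__init__()
        self.conv = nn.Sequential(                       # encode the appearance map to high-dim features
            nn.Conv2d(in_ch, 64, 4, stride=4), nn.ReLU(),
            nn.Conv2d(64, feat_ch, 8, stride=8))
        self.pos_embed = nn.Parameter(torch.zeros(1, h * w, feat_ch))   # learnable position codes
        layer = nn.TransformerEncoderLayer(d_model=feat_ch, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.to_norm_params = nn.Linear(feat_ch, norm_param_ch)         # parameters to inject into the
                                                                         # backbone's normalization layers

    def forward(self, appearance_map):                   # (B, 3, 512, 512) face appearance feature map
        feat = self.conv(appearance_map)                 # (B, C, h, w) high-dimensional features
        tokens = feat.flatten(2).transpose(1, 2)         # split into h*w sequences indexed by position
        tokens = self.transformer(tokens + self.pos_embed)   # recombine with a Transformer encoder
        return self.to_norm_params(tokens)               # per-position normalization parameters
```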
The features merged from the local generation module are those taken before its final convolutional layer, i.e. before the layer that reduces the channel dimension to 3.
The global fusion module fuses the local features of the human face corresponding to each part of the human face together by using a network of a U-net structure.
The training of the local generation module and the global fusion module comprises:
a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated result and the real result, as follows:

\mathcal{L}_{adv} = \mathbb{E}[\log D(L_{in}, I_{in})] + \mathbb{E}[\log(1 - D(L_{in}, G(L_{in}, I_{in})))]

where D(\cdot) is the output of the discriminator, G(\cdot) is the output of the generator, I_{in} is the face appearance feature map, and L_{in} is the face geometric feature map.
The training of the local generation module and the global fusion module comprises:
the feature matching loss function of the multi-scale discriminator used in Pix2PixHD is used, as follows:

\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_i}\left\| D_k^{(i)}(L_{in}, I_{in}) - D_k^{(i)}(L_{in}, I_{out}) \right\|_1

where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_{in} is the face appearance feature map, L_{in} is the face geometric feature map, and I_{out} is the output result image.
The color difference of the a and b channels of the input and output images converted into the CIELAB color space is constrained as follows:

\mathcal{L}_{ab} = \left\| \mathrm{Lab}(I_{out})_{ab} - \mathrm{Lab}(I_{in})_{ab} \right\|_1

where \mathrm{Lab}(\cdot)_{ab} converts an RGB image into the CIELAB color space and takes out the a and b channels.
In the training of the local generation module and the global fusion module, the pre-trained network VGG19 is adopted to encode the input and output images for a high-level feature loss function; in the global fusion module training, the pre-trained face recognition network ArcFace is used to encode the input and output, and the cosine similarity of the features is computed as a loss function, as follows:

\mathcal{L}_{vgg} = \left\| \mathrm{VGG}(I^{l}_{out}) - \mathrm{VGG}(I^{l}_{in}) \right\|_1 + \left\| \mathrm{VGG}(I^{g}_{out}) - \mathrm{VGG}(I^{g}_{in}) \right\|_1

\mathcal{L}_{id} = 1 - \cos\left( \mathrm{Arc}(I^{g}_{out}), \mathrm{Arc}(I^{g}_{in}) \right)

where Arc represents the face recognition network ArcFace and VGG represents the pre-trained network VGG19; I^{l}_{in} and I^{l}_{out} are the face images input to and output from the local generation module; I^{g}_{in} and I^{g}_{out} are the face images input to and output from the global fusion module.
A deep face reshaping editing device is characterized in that it comprises:
the geometric feature extraction unit is used for segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
the appearance feature extraction unit is used for inputting the face appearance feature image into the local generation module of the corresponding part according to the face part and extracting the local appearance feature corresponding to each part of the face;
the local generation unit is used for generating the local human face features which are corresponding to all parts of the human face and comprise the corresponding local geometric features and the corresponding local appearance features through the local generation module based on the local geometric features and the local appearance features;
and the global fusion unit is used for fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
A storage medium having stored thereon a computer program executable by a processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method as described in any one of claims 4 to 11.
A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method.
An interactive interface, comprising:
the display area I is used for displaying the geometric basic face image uploaded by the user and face key points which are detected by the geometric basic face image and can be operated by the user;
the display area II is used for displaying a human face geometric feature map corresponding to the human face key points in the display area I;
the display area III is used for displaying the face appearance characteristic diagram uploaded by the user;
and the display area IV is used for displaying the face editing image which is edited and generated by adopting the depth face reshaping editing method.
The invention has the following beneficial effects: the invention encodes the face key points with a graph convolutional network, iteratively optimizes the latent vector according to the loss between the user's dragged points and the corresponding points of the current network output, and finally obtains the corresponding deformed shape, so that the deformed face key-point shape can be optimized iteratively as the user drags.
The face appearance feature map is divided into four parts (eyes, nose, mouth, and background) according to face parts; each part is separately encoded locally by an appearance encoder designed on the basis of a Transformer to generate the corresponding local appearance features, the corresponding local geometric features are generated from the deformed key-point map, and finally all features are stitched into the final result by a global fusion module with a U-net structure; the stitched image has the geometric shape features edited by the user's dragging and the appearance features of the face appearance feature map.
The face portrait is reshaped and edited on the basis of face key-point deformation and a deep generative network, which supplements existing face editing techniques; compared with other drawing-based editing modes, drag-based editing is easier for ordinary users to use.
Drawings
Fig. 1 is a network architecture diagram of the embodiment, in which the left half shows the face key-point deformation network and its optimization process, and the right half shows the local-to-global generation network.
Fig. 2 is a schematic diagram illustrating a process of dynamically editing a picture by a user in the embodiment.
FIG. 3 illustrates a multi-instance shape editing effect of an embodiment.
Fig. 4 shows the effect of the embodiment compared with the conventional image warping method.
Fig. 5 shows the effect of two consecutive edits of the embodiment.
FIG. 6 shows the real-time drag interaction interface of the present embodiment.
Detailed Description
As shown in fig. 1, the present embodiment is a deep face reshaping editing method capable of real-time interaction based on key-point dragging, and specifically includes the following steps:
segmenting a deformed and edited face geometric feature map (semantic mask map) into four parts, namely eyes, nose, mouth and background, according to face parts, obtaining the local geometric features corresponding to each face part, and inputting them into the convolutional backbone of the local generation module;
inputting the face appearance feature map I_in into the corresponding local generation module according to face part (eyes, nose, mouth and background), and extracting the local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face editing image I_out. Ideally, the face editing image I_out has the same facial geometric features as the face geometric feature map and the same facial appearance features as the face appearance feature map.
In this embodiment, the geometric feature map of the face is edited by a face geometric feature editing method based on a geometric basic face image.
The face geometric feature editing method in this embodiment comprises the following steps:
acquiring a geometric base face image (a real face picture, which may be the same picture as the face appearance feature map I_in or a different one), and detecting the face key points from the geometric base face image;
connecting the key points into a 2D planar mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder (VAE) for encoding;
acquiring the user's editing of the face key points, including setting some of the key points as fixed points and dragging some of the key points; the graph convolutional variational autoencoder iteratively optimizes the latent-space code of the mesh with the coordinate difference of the key points before and after dragging as the loss function, so that the mesh output by the network after iterative optimization satisfies the user's drag editing while keeping the original shape characteristics;
and after the edited mesh is obtained, rendering the mesh into a semantic mask map, which serves as the face geometric feature map input to the deep face reshaping editing method.
In this embodiment, the idea of 3D mesh deformation is applied to the deformation of planar face key points. Planar 2D key points are adopted directly because planar face key-point detection is currently very accurate; by contrast, 3D face reconstruction not only involves more data and slower reconstruction, but also has larger errors than 2D key points. The 2D key-point data are compact and describe the shape characteristics of a face well.
In this embodiment, the average key points of the key points in the face database are first statistically computed; then Delaunay triangulation is performed to determine a fixed topology of the vertex connection relations, and this connection relation is used as the edge relation of the graph convolution; then a graph convolutional variational autoencoder (VAE) for planar face key points is trained on a face dataset, with the same training process as a classical variational autoencoder, using an L2 loss function to constrain the decoded key-point coordinates to be consistent with the input; through training, natural face shapes are parameterized so that the latent-space features can be decoded into face key points that are naturally smooth and conform to the geometric feature distribution of faces.
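A minimal sketch of building this fixed graph topology from the averaged key points, assuming NumPy and SciPy; the array layout and function name are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_keypoint_graph(keypoint_sets: np.ndarray):
    """keypoint_sets: (num_faces, num_points, 2) array of 2-D face key points from the database."""
    mean_points = keypoint_sets.mean(axis=0)          # average key points over the dataset
    tri = Delaunay(mean_points)                       # Delaunay triangulation of the mean shape
    edges = set()
    for a, b, c in tri.simplices:                     # every triangle contributes three undirected edges
        edges.update({(a, b), (b, a), (b, c), (c, b), (a, c), (c, a)})
    edge_index = np.array(sorted(edges)).T            # (2, num_edges), fixed topology reused for all faces
    return mean_points, edge_index
```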
The mesh after the user drags points is treated as a mesh with locally missing vertices; that is, except for the fixed points and drag points set by the user, the remaining points are regarded as missing points that can move freely, and the initial mesh is used to initialize the latent-space code. This embodiment iteratively optimizes the latent code by minimizing the difference between the points of the mesh decoded from the latent space and the corresponding points of the user-constrained mesh, and outputs the final deformed mesh, as shown in the following formula:

P' = \mathrm{Dec}\left( \arg\min_{z} \left\| S \cdot \mathrm{Dec}(z) - S \cdot P_{c} \right\|_2 \right)

where \mathrm{Dec} represents the decoder, z is the latent-space code, P_{c} denotes the mesh given by the user's fixed points and drag points, and S is a matrix for selecting the point sequence identical to the points in P_{c}.
In this embodiment, in order to enable the network to better learn the distribution characteristics of the face appearance texture and to better control local details, a local generation module is designed in combination with a Transformer encoder, and training is performed on each structured region image. As shown in the right half of fig. 1, the face appearance feature map I_in is first encoded by convolution layers into high-dimensional features; the features are then split into h·w indexed sequences according to position encoding (h and w are the sizes of the third and fourth dimensions of the high-dimensional features, corresponding to the height and width of the image), so that each sequence can be regarded as a word of the high-dimensional appearance features; each sequence is combined with learnable position encoding parameters and sent into a Transformer encoder for recombination, which yields the parameters corresponding to the Sandwich normalization layers in the convolutional semantic-mask backbone; these parameters are injected, and the convolutional backbone finally outputs the edited image.
Random noise is injected into the convolutional backbone in the local generation module of this embodiment to enhance the robustness of generation and avoid blurred details.
In order to combine the parts together, the global fusion module in this embodiment fuses the outputs of the four parts using a network with a U-net structure. In order to preserve the generation details of the local generation module as much as possible and to eliminate the style differences between the generated results of the parts as much as possible, the features before the final convolution layer of the local generation module, i.e. before the layer whose output dimension is reduced to 3, are merged, because these features retain rich higher-dimensional information and their size is consistent with the input picture, which makes direct alignment of the parts convenient.
In this embodiment, the global fusion module copies each local face feature into a zero-valued tensor of the input-picture size according to the coordinate position of the corresponding part in the picture; the four features, each of the same size as the input picture, are then concatenated along the channel dimension to form the input of the U-net network.
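A minimal sketch of this assembly for a single sample, assuming PyTorch; the part window positions, channel count, and function name are illustrative placeholders.

```python
import torch

def assemble_fusion_input(part_feats, part_boxes, image_size=512):
    """part_feats: dict part name -> (C, h, w) feature tensor from the local generation module;
    part_boxes: dict part name -> (top, left) coordinate of that part in the picture."""
    planes = []
    for name, feat in part_feats.items():
        c, h, w = feat.shape
        canvas = torch.zeros(c, image_size, image_size)     # zero-valued tensor of input-picture size
        top, left = part_boxes[name]
        canvas[:, top:top + h, left:left + w] = feat         # copy the feature to its coordinate position
        planes.append(canvas)
    return torch.cat(planes, dim=0)                          # channel-wise concatenation -> U-net input
```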
In order to avoid changing the unmodified parts of the picture other than the face as much as possible, this embodiment uses the convex hull of the key points to cut out the background, encodes it with a convolutional network, and injects it into the decoder of the global fusion module, so that the final generated result fuses well with the background.
In order to enable the local generation module and the global fusion module to learn the distribution of the face shape and improve the quality of the generated and edited image, the embodiment performs a series of preprocessing on the data set and designs and utilizes a plurality of loss functions to constrain the generated result. In this embodiment, the local generation module is trained first, and then the parameters of the local generation module are fixed to train the global fusion module.
This embodiment uses CelebA-HQ as the training dataset and performs a series of preprocessing steps:
first, side-facing faces whose left-right rotation angle exceeds a threshold are screened out with a deep face alignment method;
then, the Face++ dense face key-point prediction API is used to detect the faces of the dataset, 772 key points are saved for each face, and a semantic mask map is rendered.
This embodiment also screens out pictures with sunglasses, because the eye key points are then difficult to predict and do not represent the shape of the glasses. The window sizes of the four parts, eyes, nose, mouth and background, are set to 128 × 320, 160 × 160, 192 × 192 and 512 × 512 in turn, and all images are scaled to 512 × 512.
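A minimal sketch of cutting the four part windows out of an aligned 512 × 512 image, assuming Pillow; only the window sizes come from the text, while the window positions, the height-by-width reading of those sizes, and the function name are assumptions.

```python
from PIL import Image

# part name -> (left, top, width, height); sizes follow the text, positions are
# illustrative placeholders for an aligned face layout.
PART_WINDOWS = {
    "eyes":       (96, 140, 320, 128),
    "nose":       (176, 200, 160, 160),
    "mouth":      (160, 300, 192, 192),
    "background": (0, 0, 512, 512),
}

def crop_parts(image_path: str):
    img = Image.open(image_path).convert("RGB").resize((512, 512))   # scale to 512 x 512
    crops = {}
    for name, (left, top, w, h) in PART_WINDOWS.items():
        crops[name] = img.crop((left, top, left + w, top + h))       # (left, upper, right, lower)
    return crops
```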
This embodiment is trained in the classical training mode of generative adversarial networks, and a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated result and the real result, as shown in the following equation:

\mathcal{L}_{adv} = \mathbb{E}[\log D(L_{in}, I_{in})] + \mathbb{E}[\log(1 - D(L_{in}, G(L_{in}, I_{in})))]

where D(\cdot) is the output of the discriminator, G(\cdot) is the output of the generator, L_{in} is the input face geometric feature map, and I_{in} is the input face appearance feature map.
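A minimal sketch of a dual-scale PatchGAN discriminator, assuming PyTorch; the layer widths, the average-pool downsampling and the conditioning by channel concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn

def patch_discriminator(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(256, 1, 4, stride=1, padding=1))        # per-patch real/fake score map

class DualScalePatchGAN(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.d_full = patch_discriminator(in_ch)          # operates on the original resolution
        self.d_half = patch_discriminator(in_ch)          # operates on a 2x-downsampled copy
        self.down = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, condition, image):
        x = torch.cat([condition, image], dim=1)          # conditional input: geometry map + image
        return self.d_full(x), self.d_half(self.down(x))
```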
In order to make the training of the local generation module and the global fusion module more robust, this embodiment uses the feature matching loss function of the multi-scale discriminator used in Pix2PixHD, as follows:

\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_i}\left\| D_k^{(i)}(L_{in}, I_{in}) - D_k^{(i)}(L_{in}, I_{out}) \right\|_1

where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_{in} is the face appearance feature map, L_{in} is the face geometric feature map, and I_{out} is the output result image.
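A minimal sketch of this feature matching term, assuming PyTorch and that the per-layer discriminator features have already been collected for the real and generated images; using the mean-reduced L1 distance per layer is an assumption.

```python
import torch.nn.functional as F

def feature_matching_loss(real_feats, fake_feats):
    """real_feats / fake_feats: lists over discriminator scales; each entry is a list of
    per-layer feature tensors extracted for the real and the generated image."""
    loss = 0.0
    for scale_real, scale_fake in zip(real_feats, fake_feats):
        for f_real, f_fake in zip(scale_real, scale_fake):
            # mean reduction plays the role of the 1/N_i factor over the layer's elements
            loss = loss + F.l1_loss(f_fake, f_real.detach())
    return loss
```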
In order to keep the tone of the generated result consistent, this embodiment imposes a constraint on the color difference of the a and b channels of the input and output images converted into the CIELAB color space, as follows:

\mathcal{L}_{ab} = \left\| \mathrm{Lab}(I_{out})_{ab} - \mathrm{Lab}(I_{in})_{ab} \right\|_1

where \mathrm{Lab}(\cdot)_{ab} converts an RGB image into the CIELAB color space and takes out the a and b channels.
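A minimal sketch of this constraint, assuming scikit-image for the RGB-to-CIELAB conversion; the mean absolute difference is an illustrative choice of distance.

```python
import numpy as np
from skimage.color import rgb2lab

def ab_color_loss(img_out: np.ndarray, img_in: np.ndarray) -> float:
    """img_out, img_in: float RGB images in [0, 1] with shape (H, W, 3)."""
    ab_out = rgb2lab(img_out)[..., 1:]    # keep only the a and b channels
    ab_in = rgb2lab(img_in)[..., 1:]
    return float(np.abs(ab_out - ab_in).mean())
```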
One of the keys to the quality of the editing result is maintaining the identity attributes of the person, and this embodiment uses a mixture of loss functions for this purpose in the local and global stages. In both the local and global network training, the pre-trained network VGG19 is used to encode the input and output images for a high-level feature loss function. In order to better maintain the person's identity in global fusion, this embodiment further uses the pre-trained face recognition network ArcFace to encode the input and output and computes the cosine similarity of the features as a loss function, as shown in the following formulas:

\mathcal{L}_{vgg} = \left\| \mathrm{VGG}(I^{l}_{out}) - \mathrm{VGG}(I^{l}_{in}) \right\|_1 + \left\| \mathrm{VGG}(I^{g}_{out}) - \mathrm{VGG}(I^{g}_{in}) \right\|_1

\mathcal{L}_{id} = 1 - \cos\left( \mathrm{Arc}(I^{g}_{out}), \mathrm{Arc}(I^{g}_{in}) \right)

where Arc represents the ArcFace face recognition network (it should be noted that the loss function computed by this network is only suitable for global faces and is not used in local training), and VGG refers to the pre-trained network VGG19; I^{l}_{in} and I^{l}_{out} are the face images input to and output from the local generation module; I^{g}_{in} and I^{g}_{out} are the face images input to and output from the global fusion module.
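A minimal sketch of these two terms, assuming PyTorch and that pre-trained VGG19 and ArcFace feature extractors are supplied as callables; the L1 distance for the VGG term is an assumption.

```python
import torch.nn.functional as F

def vgg_feature_loss(vgg_encode, img_in, img_out):
    """vgg_encode: callable mapping an image batch to a feature tensor of a pre-trained VGG19."""
    return F.l1_loss(vgg_encode(img_out), vgg_encode(img_in))

def arcface_id_loss(arcface_encode, img_in, img_out):
    """arcface_encode: callable mapping an image batch to an ArcFace identity embedding."""
    sim = F.cosine_similarity(arcface_encode(img_out), arcface_encode(img_in), dim=-1)
    return 1.0 - sim.mean()    # one minus the cosine similarity of the identity features
```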
The geometric features mainly comprise two aspects: 1. shape information, such as the shape of the facial features, the face shape of the person, the length of the hair, and the like; 2. geometric details, i.e. the detailed expression of the geometric features of the face, such as the wrinkles of the face, the flow of the hair, and the like.
The appearance features mainly include three kinds of content: 1. color information, such as the hair color, skin color, and lip color of the face; 2. material information, i.e. the texture of the hair and skin of the face, such as the smoothness of the skin; 3. illumination information, i.e. the influence of the lighting conditions on the brightness of the face, such as the brightness of the light and the changes of shadows. In some cases these factors affect the appearance jointly, for example illumination changes may affect how the skin color appears, and the appearance features do not draw a sharp division between the above factors.
Fig. 2 is a schematic diagram illustrating the process of a user dynamically editing a picture in the embodiment. This embodiment is easy for ordinary users to use: after the user drags a face key point, the key-point mesh deformation is carried out in real time, the editing result is generated automatically, and the user can continue to edit each part.
Fig. 3 shows a multi-case shape editing effect of the embodiment, a user may rapidly drag a face key point to implement functions such as hairline reduction (leftmost column), expression control, face thinning, and the like, a corresponding face key point line graph is listed in the first row of fig. 3, a dragged point is displayed in a small box, and an arrow in the small box is a dragging direction.
Fig. 4 shows a comparison between this embodiment and a traditional image warping method, where the first image is the original image, the second shows the feature points before and after deformation, the third is the result of the traditional method, and the last is the result of this embodiment. The traditional image warping method has difficulty handling the eye and mouth regions; after the mouth is dragged open in this example, the image warping method cannot generate content such as teeth, so the result looks very unnatural, whereas the method of this embodiment automatically generates the corresponding missing parts.
Fig. 5 shows the effect of two consecutive edits in this embodiment, corresponding face keypoint wiring diagrams are listed in the first and third rows.
Fig. 6 shows the interactive interface that the user can drag in real time in this embodiment. The interactive interface includes a display area I, a display area II and a display area IV, where the display area I is used to display the geometric base face image uploaded by the user and the face key points detected from it that the user can operate; the display area II is used to display the face geometric feature map corresponding to the face key points; and the display area IV is used to display the face editing image generated by the deep face reshaping editing method. In the deep face reshaping editing method corresponding to this interactive interface, the geometric base face image is the same as the face appearance feature map, so no separate display area is provided for the face appearance feature map.
The embodiment also provides a deep face reshaping editing device, which comprises a geometric feature extraction unit, an appearance feature extraction unit, a local generation unit and a global fusion unit, wherein the geometric feature extraction unit is used for segmenting the face geometric feature map according to face parts to obtain the local geometric features corresponding to each part of the face; the appearance feature extraction unit is used for inputting the face appearance feature map into the local generation module of the corresponding part according to face part and extracting the local appearance features corresponding to each part of the face; the local generation unit is used for generating, through the local generation module and based on the local geometric features and the local appearance features, the local face features of each face part that contain the corresponding local geometric features and local appearance features; the global fusion unit is used for fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
The present embodiment also provides a storage medium on which a computer program executable by a processor is stored, where the computer program is executed to implement the steps of the deep face reshaping editing method in the present embodiment.
The embodiment also provides a computer device, which has a memory and a processor, wherein the memory stores a computer program capable of being executed by the processor, and the computer program realizes the steps of the deep face reshaping editing method in the embodiment when being executed.

Claims (15)

1. A face geometric feature editing method, characterized by comprising the following steps:
acquiring a geometric base face image, and detecting face key points from the geometric base face image;
connecting the face key points into a mesh according to the positions of the facial features on the geometric base face image, and inputting the mesh into a graph convolutional variational autoencoder for encoding; the graph convolutional variational autoencoder is trained on face key points from a face dataset and parameterizes natural face shapes, so that its latent-space features decode into face key points that are natural, smooth and consistent with the geometric characteristics of a face;
acquiring the fixed points specified by the user among the face key points and the drag points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of each drag point before and after dragging and the positions of the fixed points;
and rendering the optimized mesh into a face geometric feature map.
2. The face geometric feature editing method according to claim 1, wherein the training of the graph convolutional variational autoencoder comprises:
statistically computing the average key points of the key points in the face database;
performing Delaunay triangulation on the average key points to determine a fixed connection topology, and taking this connection relation as the edge relation of the graph convolution;
training the graph convolutional variational autoencoder for planar face key points on a face dataset, with the same training procedure as a classical variational autoencoder, using an L2 loss function to constrain the decoded key-point coordinates to be consistent with the input; through training, natural face shapes are parameterized so that the latent-space features can be decoded into face key points that are naturally smooth and conform to the geometric characteristics of a face.
3. The face geometric feature editing method according to claim 1, wherein the graph convolutional variational autoencoder optimizing, according to the coordinate difference of the drag points before and after dragging and the positions of the fixed points, the positions of the remaining face key points and the mesh formed by connecting the face key points, comprises:
regarding the mesh after the user drags points as a mesh with locally missing vertices, i.e., regarding the face key points other than the fixed points and the drag points as missing points which can move freely; the initial mesh is used to initialize the latent-space code;
and iteratively optimizing the latent code by minimizing the difference between the corresponding points of the mesh decoded from the latent space and the mesh given by the fixed points and drag points, and outputting the final deformed mesh.
4. A deep face reshaping editing method is characterized in that:
segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
inputting the face appearance feature map into a local generation module of a corresponding part according to the face part, and extracting local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to all parts of the face through the trained global fusion module to generate a face editing image with the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
5. The deep face reshaping editing method according to claim 4, wherein the local generation module generating, based on the local geometric features and the local appearance features, the local face features of each face part that contain the corresponding local geometric features and local appearance features comprises:
inputting the local geometric features into the convolutional backbone of the local generation module;
encoding the face appearance feature map into high-dimensional features through convolution layers, and splitting the features into h·w indexed sequences according to position encoding, where h and w are the sizes of the third and fourth dimensions of the high-dimensional features, corresponding to the height and width of the image;
and combining each sequence with learnable position encoding parameters, sending them into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the backbone and injecting them, the convolutional backbone finally outputting the edited image.
6. The deep face reshaping editing method according to claim 4, wherein the features merged from the local generation module are those taken before its final convolutional layer, i.e. before the layer that reduces the channel dimension to 3.
7. A deep face reshaping editing method according to claim 4, wherein: the global fusion module fuses the local features of the human face corresponding to each part of the human face together by using a network of a U-net structure.
8. The deep face reshaping editing method according to claim 4, wherein the training of the local generation module and the global fusion module comprises:
a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated result and the real result, as follows:

\mathcal{L}_{adv} = \mathbb{E}[\log D(L_{in}, I_{in})] + \mathbb{E}[\log(1 - D(L_{in}, G(L_{in}, I_{in})))]

where D(\cdot) is the output of the discriminator, G(\cdot) is the output of the generator, I_{in} is the face appearance feature map, and L_{in} is the face geometric feature map.
9. The deep face reshaping editing method according to claim 4, wherein the training of the local generation module and the global fusion module comprises:
the feature matching loss function of the multi-scale discriminator used in Pix2PixHD is used, as follows:

\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_i}\left\| D_k^{(i)}(L_{in}, I_{in}) - D_k^{(i)}(L_{in}, I_{out}) \right\|_1

where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_{in} is the face appearance feature map, L_{in} is the face geometric feature map, and I_{out} is the output result image.
10. The deep face reshaping editing method according to claim 4, wherein the color difference of the a and b channels of the input and output images converted into the CIELAB color space is constrained as follows:

\mathcal{L}_{ab} = \left\| \mathrm{Lab}(I_{out})_{ab} - \mathrm{Lab}(I_{in})_{ab} \right\|_1

where \mathrm{Lab}(\cdot)_{ab} converts an RGB image into the CIELAB color space and takes out the a and b channels.
11. The deep face reshaping editing method according to claim 4, wherein in the training of the local generation module and the global fusion module, the pre-trained network VGG19 is adopted to encode the input and output images for a high-level feature loss function;
in the global fusion module training, the pre-trained face recognition network ArcFace is used to encode the input and output, and the cosine similarity of the features is computed as a loss function, as follows:

\mathcal{L}_{vgg} = \left\| \mathrm{VGG}(I^{l}_{out}) - \mathrm{VGG}(I^{l}_{in}) \right\|_1 + \left\| \mathrm{VGG}(I^{g}_{out}) - \mathrm{VGG}(I^{g}_{in}) \right\|_1

\mathcal{L}_{id} = 1 - \cos\left( \mathrm{Arc}(I^{g}_{out}), \mathrm{Arc}(I^{g}_{in}) \right)

where Arc represents the face recognition network ArcFace and VGG represents the pre-trained network VGG19; I^{l}_{in} and I^{l}_{out} are the face images input to and output from the local generation module; I^{g}_{in} and I^{g}_{out} are the face images input to and output from the global fusion module.
12. A deep face reshaping editing device, characterized in that it comprises:
the geometric feature extraction unit is used for segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
the appearance feature extraction unit is used for inputting the face appearance feature image into the local generation module of the corresponding part according to the face part and extracting the local appearance feature corresponding to each part of the face;
the local generation unit is used for generating the local human face features which are corresponding to all parts of the human face and comprise the corresponding local geometric features and the corresponding local appearance features through the local generation module based on the local geometric features and the local appearance features;
and the global fusion unit is used for fusing the local face features corresponding to each part of the face through the trained global fusion module to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
13. A storage medium having stored thereon a computer program executable by a processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method as claimed in any one of claims 4 to 11.
14. A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, characterized in that the computer program, when executed, implements the steps of the deep face reshaping editing method as claimed in any one of claims 4 to 11.
15. An interactive interface, comprising:
the display area I is used for displaying the geometric basic face image uploaded by the user and face key points which are detected by the geometric basic face image and can be operated by the user;
the display area II is used for displaying a human face geometric feature map corresponding to the human face key points in the display area I;
the display area III is used for displaying the face appearance characteristic diagram uploaded by the user;
a display area iv for displaying a face editing image edited and generated by the deep face reshaping editing method according to any one of claims 4 to 11.
CN202111029442.5A 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method Active CN113470182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111029442.5A CN113470182B (en) 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111029442.5A CN113470182B (en) 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method

Publications (2)

Publication Number Publication Date
CN113470182A true CN113470182A (en) 2021-10-01
CN113470182B CN113470182B (en) 2022-02-18

Family

ID=77867216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111029442.5A Active CN113470182B (en) 2021-09-03 2021-09-03 Face geometric feature editing method and deep face remodeling editing method

Country Status (1)

Country Link
CN (1) CN113470182B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119977A (en) * 2021-12-01 2022-03-01 昆明理工大学 Graph convolution-based Transformer gastric cancer canceration region image segmentation method
CN114845067A (en) * 2022-07-04 2022-08-02 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing
CN115311730A (en) * 2022-09-23 2022-11-08 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115810215A (en) * 2023-02-08 2023-03-17 科大讯飞股份有限公司 Face image generation method, device, equipment and storage medium
WO2023132788A3 (en) * 2022-01-06 2023-10-05 Lemon Inc. Creating effects based on facial features
CN117594202A (en) * 2024-01-19 2024-02-23 深圳市宗匠科技有限公司 Wrinkle size analysis method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7450126B2 (en) * 2000-08-30 2008-11-11 Microsoft Corporation Methods and systems for animating facial features, and methods and systems for expression transformation
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image
CN110288697A (en) * 2019-06-24 2019-09-27 天津大学 3D face representation and method for reconstructing based on multiple dimensioned figure convolutional neural networks
CN112288851A (en) * 2020-10-23 2021-01-29 武汉大学 Three-dimensional face modeling method based on double-branch flow network
CN112991484A (en) * 2021-04-28 2021-06-18 中国科学院计算技术研究所数字经济产业研究院 Intelligent face editing method and device, storage medium and equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7450126B2 (en) * 2000-08-30 2008-11-11 Microsoft Corporation Methods and systems for animating facial features, and methods and systems for expression transformation
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image
CN110288697A (en) * 2019-06-24 2019-09-27 天津大学 3D face representation and method for reconstructing based on multiple dimensioned figure convolutional neural networks
CN112288851A (en) * 2020-10-23 2021-01-29 武汉大学 Three-dimensional face modeling method based on double-branch flow network
CN112991484A (en) * 2021-04-28 2021-06-18 中国科学院计算技术研究所数字经济产业研究院 Intelligent face editing method and device, storage medium and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chuanxia Zheng et al.: "Pluralistic Image Completion", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
Hu Han et al.: "Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach", IEEE Transactions on Pattern Analysis and Machine Intelligence *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119977A (en) * 2021-12-01 2022-03-01 昆明理工大学 Graph convolution-based Transformer gastric cancer canceration region image segmentation method
WO2023132788A3 (en) * 2022-01-06 2023-10-05 Lemon Inc. Creating effects based on facial features
US11900545B2 (en) 2022-01-06 2024-02-13 Lemon Inc. Creating effects based on facial features
CN114845067A (en) * 2022-07-04 2022-08-02 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing
CN114845067B (en) * 2022-07-04 2022-11-04 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing
CN115311730A (en) * 2022-09-23 2022-11-08 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115311730B (en) * 2022-09-23 2023-06-20 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115810215A (en) * 2023-02-08 2023-03-17 科大讯飞股份有限公司 Face image generation method, device, equipment and storage medium
CN117594202A (en) * 2024-01-19 2024-02-23 深圳市宗匠科技有限公司 Wrinkle size analysis method and device, electronic equipment and storage medium
CN117594202B (en) * 2024-01-19 2024-04-19 深圳市宗匠科技有限公司 Wrinkle size analysis method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113470182B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN113470182B (en) Face geometric feature editing method and deep face remodeling editing method
Jam et al. A comprehensive review of past and present image inpainting methods
US11880766B2 (en) Techniques for domain to domain projection using a generative model
Zhuang et al. Dreameditor: Text-driven 3d scene editing with neural fields
CN110322468A (en) A kind of automatic edit methods of image
Bermano et al. Facial performance enhancement using dynamic shape space analysis
CN111915693A (en) Sketch-based face image generation method and system
Zhang et al. Hair-GAN: Recovering 3D hair structure from a single image using generative adversarial networks
Piao et al. Inverting generative adversarial renderer for face reconstruction
WO2021140510A2 (en) Large-scale generation of photorealistic 3d models
CN113034355B (en) Portrait image double-chin removing method based on deep learning
Guo et al. 3D face from X: Learning face shape from diverse sources
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
CN116385606A (en) Speech signal driven personalized three-dimensional face animation generation method and application thereof
CN112991484B (en) Intelligent face editing method and device, storage medium and equipment
Ling et al. Semantically disentangled variational autoencoder for modeling 3d facial details
CN117036620B (en) Three-dimensional face reconstruction method based on single image
Jiang et al. 3d points splatting for real-time dynamic hand reconstruction
He et al. Data-driven 3D human head reconstruction
CN116630496A (en) Sketch character animation generation method and device, electronic equipment and storage medium
Berson et al. A robust interactive facial animation editing system
Zhang et al. Hair-gans: Recovering 3d hair structure from a single image
CN113129347A (en) Self-supervision single-view three-dimensional hairline model reconstruction method and system
Li et al. Guiding 3D Digital Content Generation with Pre-Trained Diffusion Models.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant