CN114663274A - Portrait image hair removing method and device based on GAN network - Google Patents

Portrait image hair removing method and device based on GAN network

Info

Publication number
CN114663274A
CN114663274A (application CN202210172409.6A)
Authority
CN
China
Prior art keywords
hair, latent code, portrait image, male
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210172409.6A
Other languages
Chinese (zh)
Inventor
吴奕谦 (Wu Yiqian)
金小刚 (Jin Xiaogang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210172409.6A priority Critical patent/CN114663274A/en
Publication of CN114663274A publication Critical patent/CN114663274A/en
Pending legal-status Critical Current

Classifications

    • G06T 3/04: Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411: Pattern recognition; classification based on the proximity to a decision surface, e.g. support vector machines
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 9/002: Image coding using neural networks
    • G06T 2207/20081: Indexing scheme for image analysis or enhancement; training, learning
    • G06T 2207/20084: Indexing scheme for image analysis or enhancement; artificial neural networks [ANN]
    • G06T 2207/30201: Subject of image: human being, person; face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a portrait image hair removal method based on a GAN network, which comprises the following steps: step 1, randomly sampling latent codes in the StyleGAN latent space to obtain a hair latent-code–score dataset; step 2, training support vector machines to obtain a hair separation boundary and a gender separation boundary; step 3, obtaining bald male latent codes by editing along the hair separation boundary; step 4, training a Male HairMapper model for editing haired male latent codes, based on the data from step 3; step 5, obtaining bald female latent codes through the gender separation boundary and the Male HairMapper model; step 6, training, on these data, a hair removal model for generating high-quality bald portrait images; and step 7, inputting the portrait image whose hair is to be removed into the hair removal model, which outputs the hair-removed portrait image after computation. The invention also provides a portrait image hair removal device. The method can generate high-quality bald portrait images.

Description

Portrait image hair removing method and device based on GAN network
Technical Field
The invention relates to the technical field of portrait editing, and in particular to a method and a device for removing hair from portrait images based on a GAN network.
Background
Hair is not only an important part of the human body but also a key element of individuality and fashion. However, the presence of hair in portrait images poses significant challenges for digital hairstyle design and three-dimensional face reconstruction. In digital hairstyle design, directly overlaying a new hairstyle on the original image easily mixes the new hairstyle with the old hair, causing problems; replacing the old hairstyle with a new one requires matting and image-completion techniques, both of which are error-prone. For three-dimensional face reconstruction, most existing methods cannot handle hair that occludes the face: the hair remains in the texture and produces obvious artifacts in the reconstructed face. This motivated us to develop a method that naturally removes hair from portraits for practical use.
The traditional approach to hair removal builds a training set by manually modifying and annotating real male and female images, but this is time-consuming and labor-intensive, the resulting training set is small, and the final trained model performs poorly. More recently, researchers have represented image information with latent codes in the StyleGAN latent space and completed image annotation by computer, but the StyleGAN latent space has a problem: it lacks semantic information for the "female + bald" combination, so this approach can only obtain latent-code pairs of haired and bald males.
The academic paper "Interpreting the Latent Space of GANs for Semantic Face Editing" (In Advances in Neural Information Processing Systems 33 (2020): pages 12104-12114, 2020) discloses a method for establishing separation boundaries in the StyleGAN latent space, thereby editing face semantics by simple linear combination; however, the method cannot guarantee that the facial features remain unchanged before and after editing.
The academic paper "Text-Driven Manipulation of StyleGAN Imagery" (In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2085-2094, 2021) discloses using the CLIP language-image pre-training model to implement text-based semantic image editing by dynamically finding, in the StyleGAN latent space, the editing direction for a given text prompt.
Patent document CN111598762A discloses a generative robust image steganography method, which includes: constructing an image dataset and pre-training on it; constructing and initializing a deep-learning network architecture; training the architecture with a joint fine-tuning method to obtain a network model; and using the model to generate a stego image and carry out covert communication, completing the steganography process. The method integrates the embedding of secret information into the image generation process using the generative adversarial network StyleGAN, so the resulting generative steganography has large embedding capacity, good generated-image quality, strong statistical undetectability and high practicability, overcoming the poor stego-image quality, low embedding capacity and low extraction accuracy of existing generative steganography. However, because StyleGAN lacks semantic information for the "female + bald" combination, the method cannot handle the hair-removal problem well, and the disclosure does not mention how to solve this.
Disclosure of Invention
In order to solve these problems, the invention provides a portrait image hair removal method based on a GAN network, which uses a gender boundary to overcome the absence of "female + bald" semantic information in the StyleGAN latent space; furthermore, a model is constructed from the hair latent-code set of the StyleGAN latent space, so that hair can be removed while the other facial features remain unchanged, and hair in portrait images can be removed quickly and automatically.
A portrait image hair removal method based on a GAN network comprises the following steps:
step 1, randomly sampling latent codes in the StyleGAN latent space to obtain a latent-code set, labeling each code in the set as haired or bald, and combining the code set and the labels into a hair latent-code–score dataset;
step 2, training support vector machines to obtain a hair separation boundary and a gender separation boundary, wherein the hair separation boundary can only edit haired male latent codes and cannot ensure that the facial features of the generated portrait image remain unchanged, while the gender separation boundary edits the gender corresponding to a latent code while ensuring that the facial features of the generated portrait image remain unchanged;
step 3, editing the haired male latent codes in the hair latent-code–score dataset along the hair separation boundary and applying semantic diffusion refinement to obtain the bald male latent code corresponding to each haired male latent code, wherein the facial features of the portrait image generated from the bald code are consistent with those of the image generated from the haired code;
step 4, combining the bald male latent codes obtained in step 3 and the corresponding haired male latent codes into a training set, inputting the training set into a pre-constructed Male HairMapper model, and, after iterative training, obtaining a Male HairMapper model used to edit haired male latent codes so as to remove the hair in the portrait image while keeping its facial features unchanged;
step 5, converting the haired female latent codes in the hair latent-code–score dataset into haired male latent codes through the gender separation boundary, inputting the converted codes into the trained Male HairMapper model to obtain bald male latent codes, and optimizing those codes by semantic diffusion refinement to obtain the corresponding bald female latent codes;
step 6, forming a dataset from the bald male latent codes generated in step 3 and the corresponding haired male latent codes, inputting the dataset into a pre-constructed hair removal model, and, after iterative training, obtaining a hair removal model for generating high-quality bald portrait images;
and step 7, inputting the portrait image whose hair is to be removed into the hair removal model, which, after editing computation and fusion splicing with the image, outputs the portrait image with the hair removed.
Specifically, the latent-code set in step 1 includes haired female latent codes, haired male latent codes and the corresponding bald male latent codes.
Specifically, the hair separation boundary and the gender separation boundary in step 2 are obtained by support vector machine training as follows (a sketch follows this list):
step 2.1, training a support vector machine on the haired male latent codes and the corresponding bald male latent codes in the hair latent-code–score dataset to obtain the hair separation boundary;
and step 2.2, editing randomly generated latent codes with the StyleFlow software to obtain a gender latent-code–score dataset in which each portrait image has a gender-swapped counterpart, and training a support vector machine on this dataset to obtain the gender separation boundary.
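As a minimal sketch of how such a separation boundary could be fitted, assuming the latent codes are flattened w⁺ vectors and using scikit-learn (the SVM variant, its parameters and the flattened layout are illustrative assumptions, not details fixed by the patent):

```python
import numpy as np
from sklearn.svm import LinearSVC

def separation_boundary(codes: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Fit a linear SVM in latent space and return the unit normal of its
    separating hyperplane (the n_h / n_g of steps 2.1 and 2.2).

    codes:  (N, 18*512) flattened w+ latent codes -- assumed layout
    labels: (N,) 0/1 labels (bald/haired, or female/male)
    """
    svm = LinearSVC(C=1.0, max_iter=10000).fit(codes, labels)
    normal = svm.coef_.reshape(-1)
    return normal / np.linalg.norm(normal)

# Editing along a boundary then means w_edited = w + alpha * normal,
# with the step size alpha chosen empirically (an assumption).
```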
Preferably, semantic diffusion refinement diffuses the facial features of the original portrait image into the edited latent code: the target latent code is iteratively optimized, and when the total loss function reaches its minimum, the paired haired and bald latent codes are output.
Specifically, when the target latent code is iteratively optimized, the total loss function is:
L_diffuse = λ_rec·L_rec + λ_per·L_per
where L_rec is the pixel-level reconstruction loss, L_per the structure-level reconstruction loss, λ_rec the weight of the pixel-level reconstruction loss, and λ_per the weight of the structure-level reconstruction loss.
Preferably, the hair removal model in step 6 is iteratively trained with the following total loss function:
L = λ_l·L_latent + λ_h·L_hair + λ_f·L_face + λ_i·L_id
where L is the total loss, L_latent the latent-code loss with weight λ_l, L_hair the pixel-level loss of the hair region with weight λ_h, L_face the pixel-level loss of the face region with weight λ_f, and L_id the facial-feature loss with weight λ_i. Compared with the traditional loss formula, the latent-code loss and pixel-loss terms are added, improving the quality of the portrait images output by the final model.
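For concreteness, the two weighted-sum objectives above can be assembled as in the following PyTorch sketch; the helper callables and all default weight values are assumptions, since the patent does not fix the λ values:

```python
import torch

def diffusion_loss(gen_img, prior_img, features,
                   lambda_rec=1.0, lambda_per=0.8):
    """L_diffuse = lambda_rec*L_rec + lambda_per*L_per; `features` is an
    assumed perceptual feature extractor (e.g. VGG16 activations)."""
    l_rec = torch.norm(gen_img - prior_img, p=2)                 # pixel level
    l_per = torch.norm(features(gen_img) - features(prior_img), p=2)
    return lambda_rec * l_rec + lambda_per * l_per

def removal_loss(l_latent, l_hair, l_face, l_id,
                 lambda_l=1.0, lambda_h=1.0, lambda_f=1.0, lambda_i=0.1):
    """L = lambda_l*L_latent + lambda_h*L_hair + lambda_f*L_face + lambda_i*L_id."""
    return (lambda_l * l_latent + lambda_h * l_hair
            + lambda_f * l_face + lambda_i * l_id)
```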
Preferably, the hair removal model in step 6 further includes an encoder and an image generator: the encoder encodes the input portrait image into a corresponding latent code, which after computation and editing yields the corresponding bald latent code, and the image generator generates the hair-removed portrait image from that bald latent code.
Preferably, the fusion splicing in step 7 seamlessly fuses, via a Poisson editing operation, the facial features of the portrait image to be de-haired with the portrait image generated from the edited bald latent code, yielding the portrait image with the hair removed.
The invention also provides a portrait image hair removal device for quickly removing hair from portrait images, comprising:
a computer memory, a computer processor, and a computer program stored in and executable on the computer memory, wherein the computer processor executes the portrait image hair removal method described above; when executing the computer program, the computer processor performs the following steps: inputting the portrait image whose hair is to be removed into the device, and, after computation, outputting the portrait image with the hair removed.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention proposes a "female to male to bald" pipeline, solving the problem that the StyleGAN latent space contains no semantic information for the "female + bald" combination.
(2) A portrait image hair removal model is constructed from hair latent codes in the StyleGAN latent space, so that faces of different expressions, poses, ages and genders can be processed, and corresponding high-quality bald portrait images generated.
Drawings
FIG. 1 is a schematic flow diagram of the portrait image hair removal method of the invention;
FIG. 2 is a flow chart of the training of the Male HairMapper model and the hair removal model in this embodiment;
FIG. 3 is a flow chart of hair removal by the hair removal model in this embodiment;
FIG. 4 is a portrait image before hair removal in this embodiment;
FIG. 5 is the portrait image after hair removal in this embodiment.
Detailed Description
As shown in FIG. 1, a portrait image hair removal method based on a GAN network includes:
Step 1, randomly sample latent codes in the StyleGAN latent space to obtain a latent-code set, label each code as haired or bald, and combine the codes and labels into a hair latent-code–score dataset (W, S)_hair:
Step 1.1, randomly sample 2N_w latent codes w⁺, of which N_w codes form the dataset D₀ (the sampling formulas (I), shown as images in the original, are not reproduced here).
Compute the standard deviation w_std of the N_w codes, and compute the noise scale according to formula (II):
scale_noise = (0.5·w_std)²    (II)
Randomly generate noise in [0, 1], weight it by scale_noise, and obtain the final noise dataset D_noise (formula (III) not reproduced), where n_i0, ..., n_i17 is the noise added to each of the 18 layers of a w⁺ code;
Step 1.2, input the latent codes of D₀ and D_noise into the generator of StyleGAN2-ADA to obtain the corresponding randomly sampled portrait images, and train a hair classifier to score them; the score of each latent code follows formula (IV):
s = C(g(w⁺))    (IV)
where s is the hair score, C is the hair classifier, g is the StyleGAN2-ADA generator, and g(w⁺) is the generated portrait image; the hair classifier uses a ResNeXt-50 (32×4d) architecture, portrait images containing hair are scored 1 and bald portrait images 0, which completes the haired/bald labeling of the code sets D₀ and D_noise;
Step 1.3, the latent codes of D₀ and their hair scores form the hair latent-code–score dataset (W, S)_hair (formula (V) not reproduced).
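The sampling and scoring of step 1 could be sketched as below; the mapping network, generator and classifier are assumed to be pre-trained modules loaded elsewhere, and the sample count and batch size are illustrative:

```python
import torch

N_W, NUM_LAYERS = 10000, 18        # illustrative count; 18 w+ layers

def build_hair_dataset(mapping, generator, classifier, device="cuda"):
    """Sample w+ codes, add per-layer noise (formula (II)), and score each
    code with the hair classifier: s = C(g(w+)) (formula (IV))."""
    z = torch.randn(2 * N_W, 512, device=device)
    w_plus = mapping(z)                      # assumed to return (2*N_W, 18, 512)
    d0 = w_plus[:N_W]                        # dataset D0

    w_std = d0.std()                         # standard deviation of the codes
    scale_noise = (0.5 * w_std) ** 2         # formula (II)
    noise = torch.rand(N_W, NUM_LAYERS, 512, device=device) * scale_noise
    d_noise = d0 + noise                     # per-layer noise n_i0..n_i17

    codes, scores = torch.cat([d0, d_noise]), []
    for batch in codes.split(8):             # small illustrative batches
        with torch.no_grad():
            scores.append(classifier(generator(batch)))   # 1 = hair, 0 = bald
    return codes, torch.cat(scores)
```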
Step 2, obtain the hair separation boundary and the gender separation boundary by support vector machine training; the hair separation boundary can only edit haired male latent codes and cannot ensure that the facial features of the generated portrait image stay unchanged, whereas the gender separation boundary edits the gender corresponding to a latent code while keeping the facial features of the generated portrait image unchanged:
Step 2.1, on the hair latent-code–score dataset (W, S)_hair, train a support vector machine and output the normal vector n_h of a coarse hair separation boundary b_h;
Step 2.2, use StyleFlow to randomly edit N_gender latent codes according to formula (VII) and obtain their gender-converted counterparts (formulas (VI)-(VII) not reproduced); this formula-based method is disclosed in the academic paper "Attribute-conditioned exploration of StyleGAN-generated images using conditional continuous normalizing flows" (ACM Transactions on Graphics (TOG), 2021, 40(3): 1-21) and is not described in detail here;
Step 2.3, pool the gender-converted latent codes with the original codes w⁺, set the score of all male latent codes to 1 and of all female latent codes to 0, and obtain the gender latent-code–score dataset (W, S)_gender (formula (VIII) not reproduced), where each pooled symbol denotes either a gender-converted code or an original w⁺;
Step 2.4, on the gender latent-code–score dataset (W, S)_gender, train a support vector machine and output the normal vector n_g of a gender separation boundary b_g.
Step 3, edit the haired male latent codes in the hair latent-code–score dataset along the hair separation boundary, then apply semantic diffusion refinement to obtain the bald male latent code corresponding to each haired male latent code, such that the facial features of the portrait image generated from the bald code are consistent with those of the image generated from the haired code:
Step 3.1, use the hair separation boundary to edit each latent code w⁺ corresponding to a male portrait image in D₀ and D_noise, obtaining the intermediate bald latent code according to formula (IX) (not reproduced), where n_h is the normal vector of the hair separation boundary b_h;
Step 3.2, use FaceParsing to extract the mask m_h of the hair region of the original portrait image corresponding to w⁺;
Step 3.3, based on the original portrait image and the intermediate portrait image, compute the output prior image according to formula (X) (not reproduced), which combines the non-hair region of the original image with the hair region of the intermediate image using the mask m_h, where ⊙ denotes element-wise multiplication;
Step 3.4, take the intermediate code as the initial value of the latent code to be optimized;
Step 3.5, compute the pixel-level reconstruction loss L_rec between the generated image and the prior image according to formula (XI) (not reproduced);
Step 3.6, compute the structure-level reconstruction loss L_per according to formula (XII) (not reproduced), where φ denotes the trained VGG16 model;
Step 3.7, compute the total loss L_diffuse according to formula (XIII):
L_diffuse = λ_rec·L_rec + λ_per·L_per    (XIII)
where λ_rec is the weight of the pixel-level reconstruction loss and λ_per the weight of the structure-level reconstruction loss;
Step 3.8, optimize the latent code by continuous iteration so that the total loss L_diffuse is minimized, obtaining the final semantic diffusion result: a latent code that generates a portrait image without hair while keeping the facial features unchanged.
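Steps 3.4 to 3.8 amount to a small optimization loop; below is a sketch under the assumption that `generator` and `features` (a trained VGG16 feature extractor) are differentiable PyTorch modules, with step count, learning rate and weights as illustrative values:

```python
import torch

def semantic_diffusion(w_init, prior_img, generator, features,
                       steps=100, lr=0.01, lambda_rec=1.0, lambda_per=0.8):
    """Iteratively optimise the latent code so the generated image matches
    the prior image (original face region + edited hair region)."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = generator(w)
        l_rec = torch.norm(img - prior_img, p=2)                      # (XI)
        l_per = torch.norm(features(img) - features(prior_img), p=2)  # (XII)
        loss = lambda_rec * l_rec + lambda_per * l_per                # (XIII)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()      # final semantic diffusion result
```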
Step 4, combine the bald male latent codes obtained in step 3 and the corresponding haired male latent codes into a training set, input it into a pre-constructed Male HairMapper model, and after iterative training obtain a Male HairMapper model used to edit haired male latent codes so as to remove the hair in the portrait image while keeping its facial features unchanged:
Step 4.1, from the paired haired and bald male latent codes obtained in step 3, build the corresponding latent-code pair dataset H_m (formula (XIV) not reproduced), where n_m is the number of male code pairs in H_m;
Step 4.2, train a fully connected network, the Male HairMapper, denoted M_m; M_m edits a latent code according to formula (XV) (not reproduced), where β is a hyperparameter controlling the edit weight;
Step 4.3, compute the latent-code loss L_latent according to formula (XVI) (not reproduced);
Step 4.4, compute the pixel-level loss L_hair of the hair region according to formula (XVII) (not reproduced);
Step 4.5, compute the pixel-level loss L_face of the face region according to formula (XVIII) (not reproduced);
Step 4.6, compute the facial-feature loss L_id with ArcFace according to formula (XIX) (not reproduced), where R is the ArcFace network, which computes a cosine similarity;
Step 4.7, compute the total loss L according to formula (XX):
L = λ_l·L_latent + λ_h·L_hair + λ_f·L_face + λ_i·L_id    (XX)
where λ_l is the weight of the latent-code loss, λ_h the weight of the hair-region pixel-level loss, λ_f the weight of the face-region pixel-level loss, and λ_i the weight of the facial-feature loss;
Step 4.8, optimize M_m by continuous iteration to reduce the total loss L, obtaining the trained Male HairMapper model, which directly edits male latent codes to remove the hair in a portrait image while leaving its facial features unchanged.
Step 5, convert the haired female latent codes in the hair latent-code–score dataset into haired male latent codes through the gender separation boundary, input the converted codes into the trained Male HairMapper model to obtain bald male latent codes, and optimize those codes by semantic diffusion refinement to obtain the corresponding bald female latent codes:
Step 5.1, use the gender separation boundary to edit each latent code corresponding to a female portrait image in D₀ and D_noise, obtaining according to formula (XXX) (not reproduced) an intermediate male latent code whose facial pose and skin tone are essentially unchanged, where n_g is the normal vector of the gender separation boundary b_g;
Step 5.2, input the haired male latent code into the Male HairMapper model and compute the corresponding hair-removed male latent code (formula not reproduced); compared with the input code, the result has the hair removed but the facial features changed;
Step 5.3, use FaceParsing to extract the mask m_h of the hair region of the original female portrait image;
Step 5.4, based on the original female portrait image and the intermediate bald male portrait image, compute the output prior image according to formula (L) (not reproduced), where ⊙ denotes element-wise multiplication;
Step 5.5, take the intermediate code as the initial value of the latent code to be optimized;
Step 5.6, compute the pixel-level reconstruction loss L_rec according to formula (LX) (not reproduced);
Step 5.7, compute the structure-level reconstruction loss L_per according to formula (LXX) (not reproduced), where φ denotes the trained VGG16 model;
Step 5.8, compute the total loss L_diffuse according to formula (LXXX):
L_diffuse = λ_rec·L_rec + λ_per·L_per    (LXXX)
where λ_rec is the weight of the pixel-level reconstruction loss and λ_per the weight of the structure-level reconstruction loss;
Step 5.9, optimize the latent code by continuous iteration so that the total loss L_diffuse is minimized, obtaining the final semantic diffusion result, which together with the original haired female code forms a paired haired/bald female latent code.
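The "female to male to bald" chain of step 5 composes the pieces already sketched; γ (the edit strength along the gender normal n_g) and the two helper callables are assumptions:

```python
def female_to_bald(w_female, n_gender, mapper, generator,
                   make_prior, refine, gamma=1.0):
    """Convert a haired female code to a bald code with her face preserved.

    make_prior: builds the prior image (female face region + bald hair
                region) -- assumed helper
    refine:     a closure over the semantic-diffusion optimisation of step 3
    """
    w_male = w_female + gamma * n_gender       # gender edit, cf. formula (XXX)
    w_bald = mapper(w_male)                    # hair removed, face drifted
    prior = make_prior(generator(w_female), generator(w_bald))
    return refine(w_bald, prior)               # diffuse the female face back
```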
Step 6, form a dataset from the bald male latent codes generated in step 3 with their corresponding haired male latent codes, together with the bald female latent codes generated in step 5 and their corresponding haired female latent codes; input the dataset into a pre-constructed hair removal model and, after iterative training, obtain a hair removal model for generating high-quality bald portrait images:
Step 6.1, from the paired haired and bald male latent codes obtained in step 3 and the paired haired and bald female latent codes obtained in step 5, build the dataset H (formula not reproduced), where n_m is the number of male code pairs in H and n_f the number of female code pairs;
Step 6.2, construct the hair removal model, which comprises an identification module, a hair removal module, an imaging module and a fusion module (a composition sketch follows):
the identification module comprises an encoder, which encodes the input portrait image into the corresponding latent code in the StyleGAN latent space and passes it to the hair removal module; the encoder uses the projector provided by "Designing an Encoder for StyleGAN Image Manipulation", which is prior art, so its operation is not repeated here;
the hair removal module is a fully connected network that edits the haired latent code of the portrait image into a bald latent code and passes the edited code to the imaging module;
the imaging module comprises an image generator, which generates the corresponding bald portrait image from the input bald latent code; the hair-removed portrait image is then obtained through the image fusion method.
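A sketch of how the four modules of step 6.2 could be composed; all constructor arguments are assumed pre-trained components, and the interfaces are illustrative rather than those fixed by the patent:

```python
class HairRemovalModel:
    """Identification (encoder) -> hair removal (mapper) -> imaging
    (generator) -> fusion, as in step 6.2."""
    def __init__(self, encoder, mapper, generator, fuse):
        self.encoder, self.mapper = encoder, mapper
        self.generator, self.fuse = generator, fuse

    def remove_hair(self, image):
        w = self.encoder(image)             # image -> w+ latent code
        w_bald = self.mapper(w)             # edit to a bald code
        bald_img = self.generator(w_bald)   # generate the bald portrait
        return self.fuse(image, bald_img)   # Poisson-blend the face back
```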
The image fusion method is as follows:
first, use FaceParsing to extract the mask m_test of the hair region of the portrait image X input into the hair removal model;
then, apply dilation and blur operations to m_test to obtain a blurred-edge mask m̃_test;
finally, seamlessly fuse the newly generated portrait image with the portrait image whose hair is to be removed through Poisson editing, computing the hair-removed portrait image X_res (formula not reproduced), where P is the Poisson editing operation.
Step 6.3, iteratively train the hair removal model on the dataset H to finally obtain the hair removal model for generating high-quality bald portrait images; the training method is the same as that of the Male HairMapper model in step 4 and is not described again.
FIG. 2 shows a flow chart of the training of the Male HairMapper model and the hair removal model.
As shown in FIG. 3, the specific process by which the hair removal model removes hair is as follows: the facial features of the original portrait image are transferred onto the generated bald portrait image, completing the hair removal. Moving facial features in the StyleGAN latent space is a linear operation, which is simpler and faster than moving hair features and avoids losing hair features during the move, so a high-quality fused portrait image is obtained.
Step 7, input the portrait image whose hair is to be removed, shown in FIG. 4, into the hair removal model; encode the input image with the projector provided by "Designing an Encoder for StyleGAN Image Manipulation" to obtain the corresponding latent code, and edit it according to formula (XCIX) (not reproduced) to obtain the corresponding hair-removed latent code; input the hair-removed latent code to the image generator to obtain a new portrait image without hair and with the other facial features unchanged; finally, using the mask of the input image, seamlessly fuse the new portrait image with the input image through Poisson editing and compute the hair-removed portrait image shown in FIG. 5.
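Putting step 7 together, an end-to-end inference sketch; the file names and module objects are placeholders, and `hair_mask` denotes a FaceParsing-style mask extractor, an assumed helper:

```python
import cv2

image = cv2.imread("portrait.png")            # placeholder path
w = encoder(image)                            # e4e-style projector -> w+ code
w_bald = mapper(w)                            # edit away the hair, cf. (XCIX)
bald = generator(w_bald)                      # bald portrait, face unchanged
result = fuse(image, bald, hair_mask(image))  # Poisson fusion, as above
cv2.imwrite("portrait_bald.png", result)
```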

Claims (9)

1. A portrait image hair removal method based on a GAN network, characterized by comprising the following steps:
step 1, randomly sampling latent codes in the StyleGAN latent space to obtain a latent-code set, labeling each code in the set as haired or bald, and combining the code set and the labels into a hair latent-code–score dataset;
step 2, training support vector machines to obtain a hair separation boundary and a gender separation boundary, wherein the hair separation boundary can only edit haired male latent codes and cannot ensure that the facial features of the generated portrait image remain unchanged, while the gender separation boundary edits the gender corresponding to a latent code while ensuring that the facial features of the generated portrait image remain unchanged;
step 3, editing the haired male latent codes in the hair latent-code–score dataset along the hair separation boundary and applying semantic diffusion refinement to obtain the bald male latent code corresponding to each haired male latent code, wherein the facial features of the portrait image generated from the bald code are consistent with those of the image generated from the haired code;
step 4, combining the bald male latent codes obtained in step 3 and the corresponding haired male latent codes into a training set, inputting the training set into a pre-constructed Male HairMapper model, and, after iterative training, obtaining a Male HairMapper model used to edit haired male latent codes so as to remove the hair in the portrait image while keeping its facial features unchanged;
step 5, converting the haired female latent codes in the hair latent-code–score dataset into haired male latent codes through the gender separation boundary, inputting the converted codes into the trained Male HairMapper model to obtain bald male latent codes, and optimizing those codes by semantic diffusion refinement to obtain the corresponding bald female latent codes;
step 6, forming a dataset from the bald male latent codes generated in step 3 and the corresponding haired male latent codes, inputting the dataset into a pre-constructed hair removal model, and, after iterative training, obtaining a hair removal model for generating high-quality bald portrait images;
and step 7, inputting the portrait image whose hair is to be removed into the hair removal model, which, after editing computation and image fusion splicing, outputs the portrait image with the hair removed.
2. The GAN-network-based portrait image hair removal method of claim 1, wherein the latent-code set in step 1 includes haired female latent codes, haired male latent codes and the corresponding bald male latent codes.
3. The GAN-network-based portrait image hair removal method of claim 1, wherein step 2 obtains the hair separation boundary and the gender separation boundary by support vector machine training as follows:
step 2.1, training a support vector machine on the haired male latent codes and the corresponding bald male latent codes in the hair latent-code–score dataset to obtain the hair separation boundary;
and step 2.2, editing randomly generated latent codes with the StyleFlow software to obtain a gender latent-code–score dataset in which each portrait image has a gender-swapped counterpart, and training a support vector machine on this dataset to obtain the gender separation boundary.
4. The GAN-network-based portrait image hair removal method of claim 1, wherein semantic diffusion refinement diffuses the facial features of the original portrait image into the edited latent code: the target latent code is iteratively optimized, and when the total loss function reaches its minimum, the paired haired and bald latent codes are output.
5. The GAN-network-based portrait image hair removal method of claim 4, wherein the total loss function used when iteratively optimizing the target latent code is:
L_diffuse = λ_rec·L_rec + λ_per·L_per
where L_rec is the pixel-level reconstruction loss, L_per the structure-level reconstruction loss, λ_rec the weight of the pixel-level reconstruction loss, and λ_per the weight of the structure-level reconstruction loss.
6. The GAN-network-based portrait image hair removal method of claim 1, wherein the hair removal model in step 6 is iteratively trained with the following total loss function:
L = λ_l·L_latent + λ_h·L_hair + λ_f·L_face + λ_i·L_id
where L is the total loss, L_latent the latent-code loss with weight λ_l, L_hair the pixel-level loss of the hair region with weight λ_h, L_face the pixel-level loss of the face region with weight λ_f, and L_id the facial-feature loss with weight λ_i.
7. The GAN-network-based portrait image hair removal method of claim 1, wherein the hair removal model in step 6 further comprises an encoder and an image generator: the encoder encodes the input portrait image into a corresponding latent code, which after computation and editing yields the corresponding bald latent code, and the image generator generates the hair-removed portrait image from that bald latent code.
8. The GAN-network-based portrait image hair removal method of claim 1, wherein the fusion splicing in step 7 seamlessly fuses, via a Poisson editing operation, the facial features of the portrait image to be de-haired with the portrait image generated from the edited bald latent code, yielding the portrait image with the hair removed.
9. A portrait image hair removal device, comprising a computer memory, a computer processor, and a computer program stored in and executable on the computer memory, wherein the computer processor executes the GAN-network-based portrait image hair removal method of any of claims 1-8; when executing the computer program, the computer processor performs the following steps: inputting the portrait image whose hair is to be removed into the portrait image hair removal device, and, after computation, outputting the portrait image with the hair removed.
CN202210172409.6A 2022-02-24 2022-02-24 Portrait image hair removing method and device based on GAN network Pending CN114663274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210172409.6A CN114663274A (en) 2022-02-24 2022-02-24 Portrait image hair removing method and device based on GAN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210172409.6A CN114663274A (en) 2022-02-24 2022-02-24 Portrait image hair removing method and device based on GAN network

Publications (1)

Publication Number Publication Date
CN114663274A true CN114663274A (en) 2022-06-24

Family

ID=82027718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210172409.6A Pending CN114663274A (en) 2022-02-24 2022-02-24 Portrait image hair removing method and device based on GAN network

Country Status (1)

Country Link
CN (1) CN114663274A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115810215A (en) * 2023-02-08 2023-03-17 科大讯飞股份有限公司 Face image generation method, device, equipment and storage medium
CN117115295A (en) * 2023-09-28 2023-11-24 北京数字力场科技有限公司 Face texture generation method, electronic equipment and computer storage medium


Similar Documents

Publication Publication Date Title
Song et al. Geometry-aware face completion and editing
Guo et al. Image inpainting via conditional texture and structure dual generation
US11880766B2 (en) Techniques for domain to domain projection using a generative model
CN114663274A (en) Portrait image hair removing method and device based on GAN network
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN111814566A (en) Image editing method, image editing device, electronic equipment and storage medium
Ren et al. Two-stage sketch colorization with color parsing
CN111402394B (en) Three-dimensional exaggerated cartoon face generation method and device
CN113034355B (en) Portrait image double-chin removing method based on deep learning
CN112686816A (en) Image completion method based on content attention mechanism and mask code prior
CN112801914A (en) Two-stage image restoration method based on texture structure perception
CN111161405A (en) Three-dimensional reconstruction method for animal hair
Li et al. Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data
CN115049556A (en) StyleGAN-based face image restoration method
CN116188912A (en) Training method, device, medium and equipment for image synthesis model of theme image
Peng et al. Difffacesketch: High-fidelity face image synthesis with sketch-guided latent diffusion model
Mei et al. Deep supervised image retargeting
CN113516604B (en) Image restoration method
DE102021124537A1 (en) ENERGY-BASED VARIATIONAL AUTOENCODER
Tan et al. Style2talker: High-resolution talking head generation with emotion style and art style
CN117593178A (en) Virtual fitting method based on feature guidance
CN112241708A (en) Method and apparatus for generating new person image from original person image
CN117557683A (en) Novel image generation method driven by text and semantic segmentation map together
CN116740281A (en) Three-dimensional head model generation method, three-dimensional head model generation device, electronic equipment and storage medium
CN116152060A (en) Double-feature fusion guided depth image super-resolution reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination