CN117218302B - Mannequin generation algorithm based on a generative adversarial network - Google Patents


Info

Publication number
CN117218302B
CN117218302B CN202311484349.2A
Authority
CN
China
Prior art keywords: head, generation, skin, image, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311484349.2A
Other languages
Chinese (zh)
Other versions
CN117218302A (en)
Inventor
张海军
李国建
穆翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology (Shenzhen)
Original Assignee
Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology (Shenzhen)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology (Shenzhen)
Priority application: CN202311484349.2A
Publication of CN117218302A
Application granted
Publication of CN117218302B
Legal status: Active

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention relates to the field of image translation and discloses a mannequin generation algorithm based on a generative adversarial network (GAN), comprising the following steps: constructing a mannequin generation data set, designing a mannequin generation model, and training a head generation network and a skin generation network separately. The mannequin generation model comprises a GAN-based head generation network and a GAN-based skin generation network; the head generation network generates a realistic head image, and the skin generation network reconstructs the skin areas of the mannequin. The invention combines the head generation network and the skin generation network into a single mannequin generation algorithm that can convert a mannequin image into a realistic model image while keeping the model's pose, clothing, and other information unchanged. The model ensures the quality of the generated images while maintaining a practical running speed.

Description

Mannequin generation algorithm based on a generative adversarial network
Technical Field
The invention relates to the field of image translation, and in particular to a mannequin generation algorithm based on a generative adversarial network.
Background
Mannequin generation refers to converting an image of a mannequin into an image of a real model while keeping the clothing, pose, and other attributes of the mannequin unchanged. With the rise of electronic commerce, more and more merchants choose to sell garments online. However, online shoppers cannot touch a physical garment, so how to display garments online has become a major problem for merchants. Conventional garment display typically requires hiring models, preparing shooting venues and equipment, and then photographing the garments worn by the models, which imposes high shooting costs on merchants. To solve this problem, the invention provides a mannequin generation algorithm. The algorithm directly converts a mannequin image into a realistic model image while keeping areas such as clothing unchanged, greatly reducing merchants' cost of displaying garments online.
Disclosure of Invention
The invention aims to provide a mannequin generation algorithm based on a generative adversarial network, so as to solve the technical problem of converting a mannequin image into a realistic model image.
To achieve the above purpose, the invention provides the following scheme: a mannequin generation algorithm based on a generative adversarial network, comprising the following steps:
Constructing a mannequin generation data set: acquiring a real model image data set, and constructing the mannequin generation data set from it; the mannequin generation data set comprises a head data set and a skin data set, wherein the head data set comprises a head parsing image, facial key point features, the reserved area of the head image, and a head mask, and the skin data set comprises OpenPose features containing hand key points, DensePose features describing the human body surface, the reserved area of the body image, a skin mask, and a body mask;
Designing a mannequin generation model: the mannequin generation model comprises a GAN-based head generation network and a GAN-based skin generation network; the head generation network generates a realistic head image; its inputs are the facial key point features in the head data set, the reserved area of the head image, and the head mask, and its outputs are the realistic head image and the semantic distribution information of the head image; the skin generation network reconstructs the skin areas of the mannequin; its input is the skin data set, and its output is a model image with realistic skin;
Training the head generation network and the skin generation network separately.
In one embodiment, the head generation network comprises a Res-UNet-based head generator and a PatchGAN-based head discriminator. The head generator outputs a head image and its semantic distribution information from the input head data set, and the head discriminator judges the authenticity of the head image output by the generator, constraining the realism of the generated image. Specifically, the inputs to the head generator are the facial key point features in the head data set, the reserved area of the head image, and the head mask.
The skin generation network comprises a Res-UNet-based skin generator and a PatchGAN-based skin discriminator. The skin generator outputs a model image from the input skin data set, and the skin discriminator judges the authenticity of the skin in the model image output by the generator. Specifically, the inputs to the skin generator are the OpenPose features containing hand key points, the DensePose features describing the human body surface, the reserved area of the body image, the skin mask, and the body mask; the output of the skin generator is a three-channel model image.
In one embodiment, the head generator and the skin generator each comprise a five-layer Res-UNet, with the number of hidden units set to 512;
the basic unit of each Res-UNet layer is a residual block whose residual branch is a three-layer convolutional neural network.
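The per-layer basic unit described above can be sketched as follows: a residual block whose residual branch is a three-layer convolutional network. The kernel sizes, normalization, and activations are assumptions for illustration; the patent only fixes the overall shape (three convolution layers in the residual branch).

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with a three-layer convolutional residual branch (sketch)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Residual branch: three stacked 3x3 convolutions.
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the skip path matches the branch's channel count.
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + self.skip(x))
```

Stacking five such units with downsampling between them, mirrored by an upsampling path with skip connections, yields the five-layer Res-UNet shape described above.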
In one embodiment, the head generator outputs a head image and the semantic distribution information of the head image from the input head data set by the following steps:
the head data set is fed into the five-layer Res-UNet; five levels of downsampling are performed first, then skip connections combine the features of each Res-UNet layer during upsampling, and finally a head image and 15-dimensional head-image semantic distribution information are output.
In one embodiment, the head discriminator and the skin discriminator each comprise a PatchGAN with five convolution layers; all five use 4×4 convolution kernels, the first, second, and third layers use a stride of 2, and the fourth and fifth layers use a stride of 1.
In one embodiment, the first convolution layer comprises a convolution followed by a LeakyReLU activation; the second, third, and fourth layers each comprise a convolution, a batch-normalization (BN) layer, and a LeakyReLU activation; and the fifth layer comprises a single convolution operation that reduces the number of feature-map channels to 1.
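The discriminator layout just described (4×4 kernels, strides 2,2,2,1,1, BN on the middle three layers, a final 1-channel convolution) can be sketched as below. The channel widths (64 to 512) are assumptions borrowed from the common pix2pix-style PatchGAN, not stated in this document.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five-layer PatchGAN discriminator (sketch of the structure described above)."""

    def __init__(self, in_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),   # layer 1: conv + LeakyReLU
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),     # layer 2: conv + BN + LeakyReLU
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),    # layer 3
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, stride=1, padding=1),    # layer 4 (stride 1)
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),      # layer 5: reduce to 1 channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output is a grid of per-patch real/fake scores, not a single scalar.
        return self.net(x)
```

Each spatial position of the output corresponds to one image patch, so averaging the per-patch scores yields the global real/fake judgment described in the text.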
In one embodiment, the head generation network has a 20-dimensional input and an 18-dimensional output; the skin generation network has a 31-dimensional input and a 3-dimensional output.
In one embodiment, during the training phase the optimizers of the head generation network and the skin generation network are set to Adam, with the learning rate set to 0.00005;
when training the head generation network, the loss functions used comprise a first adversarial loss function, a head parsing loss function, a first reconstruction loss function, and a first VGG perceptual loss function;
the loss functions used when training the skin generation network comprise a second adversarial loss function, a second reconstruction loss function, and a second VGG perceptual loss function.
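The separately-trained setup above (one Adam optimizer per network, learning rate 0.00005) can be illustrated with a single alternating training step. The tiny 1×1-convolution stand-ins for the generator and discriminator, the plain BCE adversarial loss, and the single L1 reconstruction term are assumptions for the sketch; the real system uses the Res-UNet/PatchGAN pair and the full multi-term losses described in this document.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Conv2d(20, 3, 1)           # stand-in generator (20-dim input features)
D = nn.Conv2d(3, 1, 1)            # stand-in patch-style discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=5e-5)
opt_D = torch.optim.Adam(D.parameters(), lr=5e-5)

x = torch.randn(2, 20, 32, 32)    # input features
real = torch.randn(2, 3, 32, 32)  # ground-truth image

# Discriminator step: real patches toward 1, generated patches toward 0.
fake = G(x).detach()
d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones_like(D(real)))
          + F.binary_cross_entropy_with_logits(D(fake), torch.zeros_like(D(fake))))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# Generator step: fool the discriminator and reconstruct the target image.
fake = G(x)
g_loss = (F.binary_cross_entropy_with_logits(D(fake), torch.ones_like(D(fake)))
          + F.l1_loss(fake, real))  # stand-in reconstruction term
opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```

The head network and skin network would each run this loop independently on their own data, as the text specifies.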
In one embodiment, before the head generation network and the skin generation network are trained separately, the real model images are resized to 1024×768, a 256×256 head image is cropped centered on the face, and the head parsing image, facial key point features, human pose features, head mask, skin mask, and body mask are generated.
In one embodiment, the mannequin generation model is built on the PyTorch framework.
In one embodiment, constructing the mannequin generation data set from the real model image data set comprises the following steps:
Constructing the head data set: resizing each real model image, cropping a fixed-size head image centered on the face, and extracting the head parsing image and facial key point features; specifically, the real model image is resized to 1024×768 and the cropped head image is 256×256.
Constructing the skin data set: extracting two pose-estimation features from each real model image, namely OpenPose features containing hand key points and DensePose features describing the 24 human body surface patches.
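The resize-then-crop step above can be sketched as follows. The `face_center` coordinate is an assumption (e.g. taken from a facial key point detector); the document only fixes the 1024×768 resize and the 256×256 crop size.

```python
import numpy as np

def crop_head(image: np.ndarray, face_center: tuple, size: int = 256) -> np.ndarray:
    """Crop a size x size head region centered on the detected face.

    `image` is an H x W x 3 array already resized to 1024 x 768;
    `face_center` is an assumed (row, col) face-center coordinate.
    """
    h, w = image.shape[:2]
    half = size // 2
    # Clamp so the crop window stays inside the image bounds.
    top = min(max(face_center[0] - half, 0), h - size)
    left = min(max(face_center[1] - half, 0), w - size)
    return image[top:top + size, left:left + size]
```

The same crop coordinates would later be reused to paste the generated head back into the full image.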
In the mannequin generation algorithm based on a generative adversarial network provided by the invention, a head generation network driven by facial key points is applied to the mannequin generation task. The method can ignore the proportions of the facial features in the input head image and generate a realistic human image using only the positions of those features within the face region, which makes it well suited to mannequin generation. Meanwhile, a new head generation network is built on the GAN framework, and a head parsing loss is added to the network, improving the quality of the generated images and making the generated head images more realistic. In addition, the invention provides a skin generation network based on a skin mask; by using OpenPose features containing hand key points, the generated hand areas contain more detail, making the generated images more realistic. The invention combines the head generation network and the skin generation network into a single mannequin generation algorithm that can convert a mannequin into a realistic model while keeping the model's pose, clothing, and other information unchanged. The model ensures the quality of the generated images while maintaining a practical running speed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required by the embodiments are briefly described below. The drawings described below are evidently only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of the mannequin generation algorithm based on a generative adversarial network provided by an embodiment of the invention.
Fig. 2 is a flowchart of designing the mannequin generation model provided by an embodiment of the invention.
Fig. 3 shows images from the mannequin generation data set and their corresponding features, provided by an embodiment of the invention.
Fig. 4 is the model framework diagram of the mannequin generation algorithm based on a generative adversarial network provided by an embodiment of the invention.
Fig. 5 is a framework diagram of the GAN-based head generation network provided by an embodiment of the invention.
Fig. 6 is a framework diagram of the GAN-based skin generation network provided by an embodiment of the invention.
Fig. 7 shows mannequin images captured in real scenes and processed by the mannequin generation algorithm provided by an embodiment of the invention.
The achievement of the objects, functional features, and advantages of the invention is further described below with reference to the accompanying drawings and embodiments.
Detailed Description
The embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
It should be noted that the terms "first", "second", etc. in this disclosure are used for description only and should not be construed as indicating or implying relative importance or the number of technical features; a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the embodiments may be combined with each other, provided the combination can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered absent and outside the claimed scope of the invention.
Fig. 1 shows the flowchart of the mannequin generation algorithm based on a generative adversarial network provided by the invention; Fig. 2 shows the flowchart of designing the mannequin generation model provided by an embodiment of the invention; Fig. 3 shows images from the mannequin generation data set and their corresponding features; Fig. 4 shows the model framework diagram of the mannequin generation algorithm; Fig. 5 shows the framework diagram of the GAN-based head generation network; Fig. 6 shows the framework diagram of the GAN-based skin generation network; Fig. 7 shows mannequin images captured in real scenes and processed by the mannequin generation algorithm provided by an embodiment of the invention.
Referring to Fig. 1, the mannequin generation algorithm based on a generative adversarial network provided by an embodiment of the invention is described in detail as follows:
Step S1: construct a mannequin generation data set. The mannequin generation data set comprises a head data set and a skin data set; the head data set comprises a head parsing image, facial key point features, the reserved area of the head image, and a head mask; the skin data set comprises OpenPose features containing hand key points, DensePose features describing the human body surface, the reserved area of the body image, a skin mask, and a body mask.
The real model image data set used for training is collected from display images on shopping websites and comprises a variety of high-definition female model images. When constructing the mannequin generation data set, features must be extracted from the collected real model images: the head image, the head parsing image, the facial key points, and the model's pose information. For the head image, the invention crops the model image around the face region to obtain a 256×256 head image. For the head parsing image, a pre-trained BiSeNet-based head parsing model divides the image into 15 different regions. For the facial key points, an MTCNN-based facial key point detection algorithm detects the key points of the head image. For the model's pose information, the invention uses pre-trained models to extract two pose features: OpenPose features containing hand key points and DensePose features describing the 24 human body surface patches. The extracted features are shown in Fig. 3.
Step S2: and designing a model generation model of the doll model. The mannequin generation model includes a head generation network based on generating the countermeasure network and a skin generation network based on generating the countermeasure network. Specifically, referring to fig. 1 and 2, the design of the model generation model for the doll includes the steps of: the step S2-1 design is based on a header generation module that generates an countermeasure network and the step S2-2 design is based on a skin generation module that generates the countermeasure network.
Step S2-1: the design generates a network (also may be referred to as a header generation module) based on generating a header of the antagonism network. The header generation network of the invention uses the input characteristics to generate a real header image and 15-dimensional header image semantic distribution information, and the network structure of the stage is shown in fig. 5. The header generation network includes a Res-UNet based header generator (also referred to as a Res-UNet based header generation network) and a PatchGAN based header arbiter (also referred to as a PatchGAN based header arbitration network). The header generator comprises five layers of Res-UNet, the base unit of each layer consisting of a residual block of three layers of convolutional neural networks. Features are input into Res-UNet, five layers of downsampling are performed first, then a jump connection is used, features of each Res-UNet layer are combined, and upsampling is performed to enable the size of a final output feature map to be the same as that of the input feature map. The head discriminator includes PatchGAN comprising five convolutional layers. PatchGAN dividing the input image into a plurality of small image blocks, and respectively carrying out two classification on each small block to finally obtain a global discrimination result of the input image. The head generator uses the facial key point features, the reserved area of the head image, and the mask (head mask) features of the head as inputs, and finally outputs a real head image while predicting the semantic distribution of the head image. The head discriminator discriminates the authenticity of the head image output from the head generator to restrict the authenticity of the image generation. Head generation network loss functionComprising four parts, respectively a first countermeasures loss functionHead parsing loss functionA first VGG perceptual loss function and a first reconstruction loss function. 
The loss function of the head generation network is as follows (equation 1):

$$\mathcal{L}_{head} = \lambda_{1}\mathcal{L}_{VGG} + \lambda_{2}\mathcal{L}_{rec} + \lambda_{3}\mathcal{L}_{parse} + \lambda_{4}\mathcal{L}_{adv} \tag{1}$$

with

$$\mathcal{L}_{adv} = \mathbb{E}\big[\log D(R)\big] + \mathbb{E}_{x}\big[\log\big(1 - D(S_R)\big)\big],\qquad \mathcal{L}_{parse} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{B} y_{ij}\log p_{ij},$$

$$\mathcal{L}_{rec} = \frac{1}{CHW}\,\lVert S_R - R\rVert_{1},\qquad \mathcal{L}_{VGG} = \sum_{j}\lVert \phi_{j}(S_R) - \phi_{j}(R)\rVert_{1},$$

where $\mathcal{L}_{head}$ represents the overall loss of the head generation network, $\mathcal{L}_{adv}$ the first adversarial loss, $\mathcal{L}_{parse}$ the head parsing loss, $\mathcal{L}_{rec}$ the first reconstruction loss, and $\mathcal{L}_{VGG}$ the first VGG perceptual loss; $\lambda_{1}$, $\lambda_{2}$, $\lambda_{3}$, and $\lambda_{4}$ represent the weights of the first VGG perceptual loss, the first reconstruction loss, the head parsing loss, and the first adversarial loss, respectively; $D$ represents the head discriminator, $x$ the input features, $N$ the number of pixels, $B$ the number of pixel classes, $p_{ij}$ the model-generated probability that the $i$-th pixel belongs to the $j$-th class, $y_{ij}$ whether the true class of the $i$-th pixel is the $j$-th class, $S_R$ the generated head image, $R$ the real head image, $C$, $H$, $W$ the number of channels, height, and width of the image, and $\phi_{j}$ the output of the $j$-th layer of the VGG network.
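The four loss terms of the head generation network can be sketched in code using the symbols defined above ($S_R$ generated head, $R$ real head, $p$ per-pixel class probabilities over $B$ classes, $D$ the head discriminator). The exact adversarial formulation and weights are assumptions, and `phi` stands in for a frozen VGG feature extractor.

```python
import torch
import torch.nn.functional as F

def parse_loss(p: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Head parsing loss: pixel-wise cross-entropy over the B (=15) regions.
    # p: (N, B) class probabilities, y: (N,) true class indices.
    return -torch.log(p.gather(1, y[:, None]).clamp_min(1e-8)).mean()

def rec_loss(S_R: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    # Reconstruction loss: L1 distance, averaged over C*H*W elements.
    return (S_R - R).abs().mean()

def vgg_loss(S_R: torch.Tensor, R: torch.Tensor, phi) -> torch.Tensor:
    # Perceptual loss: L1 distance between layer activations phi_j.
    return sum(F.l1_loss(a, b) for a, b in zip(phi(S_R), phi(R)))

def adv_g_loss(D, S_R: torch.Tensor) -> torch.Tensor:
    # Generator-side adversarial loss on the discriminator's patch scores
    # (BCE-with-logits form, an assumption for this sketch).
    return F.binary_cross_entropy_with_logits(D(S_R), torch.ones_like(D(S_R)))
```

The total head loss is then the weighted sum of these four terms with the weights described in the text.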
Step S2-2: the design is based on generating a skin generation network (also may be referred to as a skin generation module) against the network. The present invention proposes a skin generation network capable of converting a model skin area into real skin in the case of a clothing area capable of retaining a model image. The network structure at this stage is shown in fig. 6. The overall structure of the network is the same as that of the header generation network, and comprises a Res-UNet-based generator and a PatchGAN-based arbiter. Inputs of the skin generation network are openpose features containing key points of the hand, densepose features containing 24 human surfaces, reserved areas of human images, masks of human bodies (human masks) and masks of skin (skin masks), and the images are output as a converted model image of human bodies. Meanwhile, the PatchGAN network is favorable for judging the true and false of the generated result. The loss function used by the skin generation network is a weighted sum of the second antagonistic loss function, the second VGG perceived loss function, and the second reconstructed loss function, the loss function formula of the skin generation network is as follows (equation 2):(2)。
Wherein, Representing the overall loss function of the skin-generated network,A second countermeasures loss function is represented,A second reconstruction loss function is represented which,Representing a second VGG perceptual loss function,AndRepresenting the weights of the second VGG perceived loss function, the second reconstructed loss function, and the second fight loss function, respectively, D' representing the skin discriminator, S V representing the generated skin image, V representing the real model image,Representing the output of the j-th layer in the VGG network, C ', H ', W ' represent the channel number, height, and width of the image, respectively.
Step S3: the head generation network and the skin generation network are trained separately. According to the design training strategy of the virtual dressing model, the doll model generation model provided by the invention is realized under the Pytorch framework. In the training and testing stage, the image is resized to 1024×768 before being sent to the network, and 256×256 head regions are truncated centering on the face, and features such as head analysis image, face key point feature, human body posture feature (openpose feature including hand key point and densepose feature including human body surface) and mask of model (mask of head, mask of skin and mask of human body) required for the task are generated using the pre-trained model. In the header generation network, the Res-UNet network has a five-layer structure, the basic unit of each layer consists of a residual block formed by a three-layer convolutional neural network, and the number of hidden units of each layer is set to 512. The head discriminator of the head generating network comprises five convolution layers, the convolution kernel sizes of all the convolution layers are 4×4, the step sizes of the first three convolution layers are 2, the step sizes of the second two convolution layers are 1, that is, the step sizes of the first layer convolution layer, the second layer convolution layer and the third layer convolution layer are 2, and the step sizes of the fourth layer convolution layer and the fifth layer convolution layer are 1. 
Regarding the specific structure of the head discriminator: the first convolution layer consists of a convolution and a LeakyReLU activation; the middle three layers each consist of a convolution, a BN layer, and a LeakyReLU activation; and the last layer contains a single convolution operation that reduces the number of feature-map channels to 1. In other words, the second, third, and fourth convolution layers consist of convolution, BN, and LeakyReLU, and the fifth layer consists of one convolution that reduces the channel count to 1. The skin generation network has the same overall structure as the head generation network, but its input and output dimensions differ: the head generation network has a 20-dimensional input and an 18-dimensional output, while the skin generation network has a 31-dimensional input and a 3-dimensional output. Different loss functions are used at the different stages of the mannequin generation algorithm: the head generation network is trained with the first adversarial loss, the head parsing loss, the first reconstruction loss, and the first VGG perceptual loss, and the skin generation network with the second adversarial loss, the second reconstruction loss, and the second VGG perceptual loss. The adversarial loss makes the generated images more realistic; the head parsing loss lets the model learn the semantic distribution of the head image; the reconstruction loss constrains the regions of the image the network should leave unchanged; and the VGG perceptual loss preserves perceptual image quality.
The head generation network and the skin generation network are each trained independently, optimized with the Adam optimizer at a learning rate of 0.00005.
In a specific application, the generated realistic head image and the realistic-skin model image are combined to obtain the final realistic model image. The combination can be done by image processing: for example, the generated realistic head image is pasted back into the head area of the image, and the head-restored model image is then fed into the skin generation network to obtain a realistic garment-display image of the model.
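The paste-back step of the combination just described can be sketched as follows. The `head_box` coordinate is an assumption: it would be the (top, left) corner recorded when the head region was originally cropped.

```python
import numpy as np

def compose(model_img: np.ndarray, gen_head: np.ndarray, head_box: tuple) -> np.ndarray:
    """Paste the generated 256x256 head back into the full model image (sketch).

    `head_box` is the assumed (top, left) corner of the original head crop.
    """
    out = model_img.copy()  # leave the input image untouched
    top, left = head_box
    h, w = gen_head.shape[:2]
    out[top:top + h, left:left + w] = gen_head
    return out
```

The composed, head-restored image would then be passed to the skin generation network, as the text describes.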
The mannequin generation algorithm based on a generative adversarial network provided by the invention has the following advantages: (1) a GAN-based head generation network is provided that incorporates head-image features such as facial key points; it can generate a realistic head image from a variety of input features while preserving the remaining areas such as the neck, solving the problem that the facial proportions of a mannequin differ from those of a real model, and a head parsing loss is added to the loss function to better generate realistic head images; (2) a GAN-based skin generation network is provided that takes the model's pose information, the reserved area of the body image, the body mask, and the skin mask as inputs to generate a final model image with realistic skin, and the use of OpenPose features containing hand key points makes the generated hand areas more realistic; (3) a mannequin generation algorithm based on a generative adversarial network is provided. The algorithm decomposes the mannequin generation task into two stages: the first stage generates a realistic head area, and the second stage uses features such as the model's pose and skin mask to generate realistic skin areas. The method effectively exploits the various features of the input image, so that the generated model image is more realistic.
The mannequin generation algorithm based on a generative adversarial network provided by the invention directly converts a mannequin image into a realistic model image while keeping areas such as clothing unchanged, greatly reducing merchants' cost of displaying garments online.
The foregoing description covers only preferred embodiments of the invention and does not limit its scope; all equivalent structural changes made using the description and drawings of the invention, or direct or indirect applications in other related technical fields, are included in the scope of the invention.

Claims (11)

1. A mannequin generation algorithm based on generation of an countermeasure network, comprising the steps of:
Constructing a doll model to generate a data set: acquiring a real model image data set, and constructing a doll model according to the real model image data set to generate a data set; the doll model generation data set comprises a head data set and a skin data set, wherein the head data set comprises a head analysis image, a face key point feature, a reserved area of the head image and a mask of the head, and the skin data set comprises openpose features containing hand key points, densepose features containing a human body surface, a reserved area of the human body image, a mask of the skin and a mask of the human body;
Designing a doll model generation model: the model generation model comprises a head generation network based on a generation countermeasure network and a skin generation network based on the generation countermeasure network; the head generation network is used for generating a real head image, the input of the head generation network is the facial key point characteristics in the head data set, the reserved area of the head image and the mask of the head, and the output of the head generation network is the real head image and the semantic distribution information of the head image; the skin generation network is used for reconstructing a skin area of the manikin, the input of the skin generation network is a skin data set, and the output of the skin generation network is a manikin image of real skin;
Training the head generation network and the skin generation network separately;
When training the head generation network, the loss functions used comprise a first adversarial loss function, a head parsing loss function, a first reconstruction loss function and a first VGG perceptual loss function;
the head parsing loss function is calculated according to the following formula:

$L_{parse} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log\left(p_{ic}\right)$

wherein $L_{parse}$ denotes the head parsing loss function, $N$ denotes the number of pixels, $M$ denotes the number of pixel classes, $y_{ic}$ indicates whether the true class of the $i$-th pixel is class $c$, and $p_{ic}$ denotes the model-predicted probability that the $i$-th pixel belongs to class $c$.
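For illustration, the head parsing loss in claim 1 is a standard pixel-wise cross-entropy; a minimal PyTorch sketch, assuming logits flattened to one row per pixel, is:

```python
import torch
import torch.nn.functional as F

def head_parse_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-wise cross-entropy over parsing classes.

    logits: (N, M) raw scores, one row per pixel, one column per class.
    target: (N,) true class index of each pixel.
    Computes -(1/N) * sum_i sum_c y_ic * log(p_ic).
    """
    log_p = F.log_softmax(logits, dim=1)  # log p_ic
    return F.nll_loss(log_p, target)      # averages -log p_{i, y_i} over the N pixels
```

This is numerically equivalent to `F.cross_entropy(logits, target)`.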
2. The mannequin generation algorithm based on a generative adversarial network of claim 1, wherein the head generation network comprises a Res-UNet-based head generator and a PatchGAN-based head discriminator, the head generator being used to output a head image and the semantic distribution information of the head image from the input head dataset, and the head discriminator being used to discriminate the authenticity of the head image output by the head generator;
The skin generation network comprises a Res-UNet-based skin generator and a PatchGAN-based skin discriminator, the skin generator being used to output a model image from the input skin dataset, and the skin discriminator being used to discriminate the authenticity of the skin in the model image output by the skin generator.
3. The mannequin generation algorithm based on a generative adversarial network of claim 2, wherein the head generator and the skin generator each comprise a five-layer Res-UNet in which the number of hidden units is set to 512;
The basic unit of each Res-UNet layer comprises a residual block consisting of a three-layer convolutional neural network.
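A minimal sketch of such a basic unit; the claim fixes only the three-layer residual structure, so the 3×3 kernels, batch normalization, and 1×1 skip projection below are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block whose body is a three-layer convolutional network
    (hypothetical layout for the Res-UNet basic unit of claim 3)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 projection so the skip path matches the output channel count
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + self.skip(x))
```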
4. The mannequin generation algorithm based on a generative adversarial network of claim 3, wherein the head generator outputting a head image and the semantic distribution information of the head image from the input head dataset comprises the steps of:
inputting the head dataset into the five-layer Res-UNet, first performing five layers of downsampling, then upsampling using skip connections that merge the features of each Res-UNet layer, and finally outputting the head image and 15-dimensional head-image semantic distribution information.
5. The mannequin generation algorithm based on a generative adversarial network of claim 2, wherein the head discriminator and the skin discriminator each comprise a PatchGAN with five convolutional layers; the convolution kernel size of each of the five layers is 4×4, the strides of the first, second and third convolutional layers are 2, and the strides of the fourth and fifth convolutional layers are 1.
6. The mannequin generation algorithm based on a generative adversarial network of claim 5, wherein the first convolutional layer comprises a convolution and a LeakyReLU activation, the second, third and fourth convolutional layers each comprise a convolution, a BN layer and a LeakyReLU layer, and the fifth convolutional layer comprises a convolution operation that reduces the number of feature-map channels to 1.
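Claims 5 and 6 fully specify the layer pattern of the PatchGAN discriminator; a sketch assuming the common 64/128/256/512 channel widths (not stated in the claims) is:

```python
import torch
import torch.nn as nn

def patchgan_discriminator(in_ch: int, base: int = 64) -> nn.Sequential:
    """Five 4x4 conv layers with strides 2,2,2,1,1 as in claims 5-6; the
    channel progression follows the usual pix2pix layout (an assumption)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, stride=2, padding=1),     # layer 1: conv + LeakyReLU
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base, base * 2, 4, stride=2, padding=1),  # layers 2-4: conv + BN + LeakyReLU
        nn.BatchNorm2d(base * 2),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
        nn.BatchNorm2d(base * 4),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base * 4, base * 8, 4, stride=1, padding=1),
        nn.BatchNorm2d(base * 8),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base * 8, 1, 4, stride=1, padding=1),     # layer 5: reduce channels to 1
    )
```

On a 256×256 input the three stride-2 layers halve the resolution to 32×32, and the two stride-1 layers shrink it to a 30×30 patch map of real/fake scores.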
7. The mannequin generation algorithm based on a generative adversarial network of claim 1, wherein the input of the head generation network is 20-dimensional and its output is 18-dimensional; the input of the skin generation network is 31-dimensional and its output is 3-dimensional.
8. The mannequin generation algorithm based on a generative adversarial network of claim 1, wherein during the training phase the optimizers of the head generation network and the skin generation network are set to Adam and the learning rate is set to 0.00005;
The loss functions used in training the skin generation network comprise a second adversarial loss function, a second reconstruction loss function and a second VGG perceptual loss function.
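A sketch of the claim-8 training setup; the exact forms of the adversarial and reconstruction terms are assumptions (L1 reconstruction and binary cross-entropy with logits), since the claims name the terms but not their formulas, and the VGG perceptual term is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def skin_losses(fake: torch.Tensor, real: torch.Tensor,
                disc_logits_on_fake: torch.Tensor):
    """Two of the skin-network loss terms: L1 reconstruction (assumed form)
    and a BCE-with-logits adversarial term pushing fake patches toward 'real'."""
    recon = F.l1_loss(fake, real)
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    return recon, adv

# Optimizers per claim 8: Adam with learning rate 0.00005 for both networks.
generator = nn.Conv2d(31, 3, 1)     # placeholder for the skin generator
discriminator = nn.Conv2d(3, 1, 1)  # placeholder for the skin discriminator
opt_g = torch.optim.Adam(generator.parameters(), lr=0.00005)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.00005)
```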
9. The mannequin generation algorithm based on a generative adversarial network of claim 1, wherein before the head generation network and the skin generation network are trained separately, the real-model image is resized to 1024×768, a 256×256 head image is cropped centered on the face, and the head parsing image, facial key-point features, human-body pose features, head mask, skin mask and body mask are generated.
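The claim-9 preprocessing can be sketched with Pillow; `face_center` is a hypothetical (x, y) face location assumed to come from an external detector, given in the coordinates of the resized image:

```python
from PIL import Image

def preprocess(img: Image.Image, face_center: tuple) -> tuple:
    """Resize a model image to 1024x768 (H x W) and cut a 256x256 head crop
    centered on the face, as in claim 9."""
    img = img.resize((768, 1024))  # PIL takes (width, height)
    cx, cy = face_center
    # clamp the crop box so it stays fully inside the resized image
    left = min(max(cx - 128, 0), 768 - 256)
    top = min(max(cy - 128, 0), 1024 - 256)
    head = img.crop((left, top, left + 256, top + 256))
    return img, head
```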
10. The mannequin generation algorithm based on a generative adversarial network of claim 1, wherein the mannequin generation model is built on the PyTorch framework.
11. The mannequin generation algorithm based on a generative adversarial network of claim 1, wherein constructing the mannequin generation dataset from the real-model image dataset comprises the steps of:
Constructing the head dataset: resizing the real-model image, cropping a head image of a given size centered on the face, and extracting the head parsing image and the facial key-point features;
Constructing the skin dataset: extracting two pose-estimation features from the real-model image, namely OpenPose features containing hand key points and DensePose features covering the 24 human-body surface parts.
CN202311484349.2A 2023-11-09 2023-11-09 Doll model generation algorithm based on generation countermeasure network Active CN117218302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311484349.2A CN117218302B (en) 2023-11-09 2023-11-09 Doll model generation algorithm based on generation countermeasure network

Publications (2)

Publication Number Publication Date
CN117218302A CN117218302A (en) 2023-12-12
CN117218302B true CN117218302B (en) 2024-04-23

Family

ID=89035676

Country Status (1)

Country Link
CN (1) CN117218302B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197155A (en) * 2019-05-30 2019-09-03 广州英特力信息科技有限公司 An online real-time try-on method and system
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A face inpainting method based on a generative adversarial network
CN112950661A (en) * 2021-03-23 2021-06-11 大连民族大学 Attention-based generative adversarial network method for face cartoon generation
EP4050515A1 (en) * 2021-02-24 2022-08-31 Tata Consultancy Services Limited Integrated pipeline for generation of virtual models for apparel catalogue and virtual try-ons for apparels
CN115761791A (en) * 2022-10-19 2023-03-07 哈尔滨工业大学(深圳) 2D-image-based human body semantic prediction module, virtual try-on model and method
CN116071619A (en) * 2023-02-14 2023-05-05 深圳数联天下智能科技有限公司 Training method of virtual fitting model, virtual fitting method and electronic equipment
CN116740212A (en) * 2023-06-20 2023-09-12 北京工商大学 Pose-guided human image generation and face optimization method based on a generative adversarial network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Collocated Clothing Synthesis with GANs Aided by Textual Information: A Multi-Modal Framework; LINLIN LIU et al.; ACM Trans. Multimedia Comput. Commun; pp. 1-25 *
Mask-2-Human: a person image generation method based on generative adversarial networks; Ouyang Wenqi; Xu Kun; China Sciencepaper (03); pp. 20-25 *
Design of an intelligent 3D virtual fitting model simulation ***; Wang Hongbo; Huang Xiang; Zeng Guangping; Tu Xuyan; Application Research of Computers (04); pp. 211-214 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant