CN112365553B - Human body image generation model training, human body image generation method and related device

Info

Publication number: CN112365553B
Application number: CN201910672687.6A
Authority: CN
Inventors: 冀志龙; 侯琦; 杨非
Original assignee: Beijing Xintang Sichuang Educational Technology Co Ltd
Current assignee: Beijing Xintang Sichuang Educational Technology Co Ltd
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2022-05-20
Anticipated expiration: 2039-07-24
Also published as: CN112365553A

Abstract

The embodiment of the invention provides a human body image generation model training method, a human body image generation method and a related device, wherein the model training method comprises the following steps: acquiring a human body training image; fixing model parameters of all levels of resolution generators lower than the current level resolution, generating a current level overall generated image and a local generated image according to the current level overall skeleton point image by using the current level resolution generator, acquiring the current level overall loss by using all levels of overall generated images, overall label images, local generated images and local label images with resolutions lower than or equal to the current level, and optimizing the model parameters of the current level resolution generator to enable the current level overall loss to meet a loss threshold; and adjusting the current level resolution, and optimizing the adjusted current level resolution generators until all generators complete optimization. The human body image generation model training method, the human body image generation method and the related device provided by the embodiment of the invention can ensure the accuracy of the human body image generation model and the accuracy of the human body image generation.

Description

Human body image generation model training, human body image generation method and related device

Technical Field

Embodiments of the present invention relate to the field of computers, and in particular, to a human body image generation model training method, apparatus, device, and storage medium, and a human body image generation method, apparatus, device, and storage medium.

Background

With the development of multimedia technology, in order to save cost and time, more and more images or videos of characters or cartoon characters are made by adopting an image generation method.

At present, deep-learning-based image generation is applied only to low-resolution images. When a high-resolution image is generated, the detail of the generated image is coarse and the image generation model is unstable during training, which severely limits the application scenarios of the algorithm.

Therefore, how to ensure the accuracy of the image generation model to realize the accuracy of one-step image generation is an urgent technical problem to be solved.

Disclosure of Invention

The technical problem to be solved by the embodiments of the invention is to provide a human body image generation model training method, apparatus, device and storage medium, and a human body image generation method, apparatus, device and storage medium, so as to ensure the accuracy of the image generation model and achieve accurate one-step image generation.

In order to solve the above problems, an embodiment of the present invention provides a human body image generation model training method, including: acquiring a human body training image, where the number of resolution levels of the human body training image is equal to the number of resolution levels of the generator of the human body image generation model, the resolution of each level of the human body training image is equal to the resolution of the corresponding level of resolution generator, and the human body training image includes an overall label image, a local label image, and an overall skeleton point image that identify the same image;

fixing model parameters of each level of resolution generators with the resolution lower than the current level resolution, starting from the lowest level resolution, generating a current level overall generated image and a current level local generated image by using the current level resolution generator according to the current level overall skeleton point image with the current level resolution, acquiring current level overall loss by using each level overall generated image, each level overall label image, each level local generated image and each level local label image with the resolution lower than or equal to the current level resolution, and optimizing the model parameters of the current level resolution generator according to the current level overall loss until the current level overall loss meets a loss threshold value to obtain a trained current level resolution generator;

and adjusting the current level resolution according to the resolution level to obtain the adjusted current level resolution, and optimizing the generators of the adjusted current level resolution until the generators of all resolutions of the resolution level complete optimization to obtain the trained human body image generation model.

In order to solve the above problem, an embodiment of the present invention further provides a human body image generating method, including:

acquiring a human skeleton point image;

and obtaining the human body image by utilizing the trained human body image generation model.

In order to solve the above problem, an embodiment of the present invention further provides a human body image generation model training apparatus, including:

the human body training image acquisition unit is suitable for acquiring a human body training image, the resolution grade number of the human body training image is equal to the resolution grade number of a generator of a human body image generation model, the resolution of each level of the human body training image is respectively equal to the resolution of each level of the resolution generator, and the human body training image comprises an integral label image, a local label image and an integral skeleton point image which identify the same image;

the current-level resolution generator training unit is suitable for fixing model parameters of all levels of resolution generators with the resolution lower than the current level resolution, starting from the lowest level resolution, generating a current-level overall generated image and a current-level local generated image by using the current-level resolution generator according to the current-level overall skeleton point image with the current level resolution, acquiring current-level overall loss by using all levels of overall generated images, all levels of overall label images, all levels of local generated images and all levels of local label images with the resolution lower than or equal to the current level resolution, and optimizing the model parameters of the current-level resolution generator according to the current-level overall loss until the current-level overall loss meets a loss threshold value to obtain the trained current-level resolution generator;

and the human body image generation model acquisition unit is suitable for adjusting the current level resolution according to the resolution level to obtain the adjusted current level resolution, and optimizing the generator of the adjusted current level resolution until the generators of all resolutions of the resolution level complete optimization to obtain the trained human body image generation model.

In order to solve the above problem, an embodiment of the present invention further provides a human body image generating apparatus, including:

the human skeleton point image acquisition unit is suitable for acquiring a human skeleton point image;

and the human body image acquisition unit is suitable for acquiring the human body image by utilizing the trained human body image generation model.

To solve the above problem, an embodiment of the present invention further provides an apparatus, including at least one memory and at least one processor; the memory stores a program that the processor calls to execute the human body image generation model training method or the human body image generation method.

In order to solve the above problem, an embodiment of the present invention further provides a storage medium, where a program suitable for training a human body image generation model is stored, so as to implement the human body image generation model training method.

In order to solve the above problem, an embodiment of the present invention further provides a storage medium storing a program suitable for human body image generation to implement the human body image generation method as described above.

Compared with the prior art, the technical scheme of the invention has the following advantages:

The human body image generation model training method, apparatus, device and storage medium provided by the embodiments of the invention use the overall label image, the local label image and the overall skeleton point image at each resolution level. Starting from the lowest resolution level, each level in turn is taken as the current level resolution: the model parameters of the already-trained generators whose resolution is lower than the current level resolution are fixed; the current-level resolution generator generates the current-level overall generated image from the current-level overall skeleton point image, from which the current-level local generated image is then obtained; and the current-level overall loss of the current-level resolution generator is calculated by combining the losses of the current-level overall generated image and the current-level local generated image. The current-level resolution generator is optimized on the basis of this loss until the loss meets the loss threshold, giving the trained current-level resolution generator. Training of the human body image generation model is complete once the generators at all resolution levels have been optimized.

It can be seen that the training method provided by the embodiment of the invention trains the resolution generators of the image generation model level by level, starting from the lowest resolution, so that the accuracy of the image generated at each level is ensured in turn. During training of the current-level resolution generator, the current-level overall generated image and the current-level local generated image are both obtained, and the generator is optimized using the losses of both. The trained current-level resolution generator therefore ensures the generation accuracy of both the overall generated image and the local generated image. As a result, when the human body image generation model obtained by this training method is used to generate a human body image, the required image can be generated directly by the single model, and the generated image has high overall and local accuracy.

Drawings

FIG. 1 is a schematic flow chart of a human body image generation model training method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a step of acquiring human training images in the human image generative model training method according to the embodiment of the present invention;

FIG. 3 is a schematic flow chart of the step of obtaining the current-level overall loss in the training method for human body image generative models according to the embodiment of the present invention;

FIG. 4 is a schematic flow chart of the step of obtaining a current-level overall generated image and a current-level local generated image in the human body image generation model training method according to the embodiment of the present invention;

FIG. 5 is a schematic flow chart of a human body image generation model training method according to an embodiment of the present invention;

FIG. 6 is a flowchart illustrating a human body image generating method according to an embodiment of the present invention;

FIG. 7 is a block diagram of a human body image generation model training apparatus according to an embodiment of the present invention;

FIG. 8 is a block diagram of a human body image generating apparatus according to an embodiment of the present invention;

FIG. 9 illustrates an alternative hardware architecture of the device provided in the embodiment of the present invention.

Detailed Description

In the prior art, image generation by deep learning can only be applied to low-resolution images, and the detail of the generated image is coarse.

In one method, to address the insufficient detail obtained when a high-resolution image is generated, distributed local enhancement is adopted: first, a complete overall image is generated by a CNN (Convolutional Neural Network) model; then the regions of the overall image with insufficient detail, such as the hands and the face, are cropped out, a locally enhanced image is generated for each region by a local CNN model, and the corresponding region of the original image is replaced to obtain the final overall image.

However, although the above method can improve the detail of local regions to some extent, it has two drawbacks. On the one hand, when several local regions lack detail, several local CNN models must be built and used for generation, which lengthens training and computation time and increases the load on computer hardware. On the other hand, although the final overall image recovers local detail to a certain extent, the overall image and the local images are generated by different models, so cropping-boundary artifacts and local jitter appear in the final overall image, and the result is still not ideal.

In order to ensure the accuracy of the image generation model and achieve accurate one-step image generation, the embodiments of the invention provide a human body image generation model training method, apparatus, device and storage medium, and a human body image generation method, apparatus, device and storage medium.

The human body image generation model training method provided by the embodiment of the invention comprises the following steps:

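acquiring a human body training image, wherein the number of resolution levels of the human body training image is equal to the number of resolution levels of the generator of the human body image generation model, the resolution of each level of the human body training image is equal to the resolution of the corresponding level of resolution generator, and the human body training image comprises an overall label image, a local label image and an overall skeleton point image that identify the same image;

starting from the lowest level resolution, fixing the model parameters of each level of resolution generator whose resolution is lower than the current level resolution, generating a current-level overall generated image and a current-level local generated image by using the current-level resolution generator according to the current-level overall skeleton point image having the current level resolution, acquiring the current-level overall loss by using the overall generated images, overall label images, local generated images and local label images of each level whose resolution is lower than or equal to the current level resolution, and optimizing the model parameters of the current-level resolution generator according to the current-level overall loss until the current-level overall loss meets a loss threshold, to obtain a trained current-level resolution generator;

and adjusting the current level resolution according to the resolution levels to obtain an adjusted current level resolution, and optimizing the generator of the adjusted current level resolution until the generators of all resolution levels have been optimized, to obtain the trained human body image generation model.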

Thus, the human body image generation model training method provided by the embodiment of the present invention uses the overall label image, the local label image and the overall skeleton point image at each resolution level. Starting from the lowest resolution level, each level in turn is taken as the current level resolution: the model parameters of the already-trained generators whose resolution is lower than the current level resolution are fixed; the current-level resolution generator generates the current-level overall generated image from the current-level overall skeleton point image, from which the current-level local generated image is then obtained; and the current-level overall loss of the current-level resolution generator is calculated by combining the losses of the current-level overall generated image and the current-level local generated image. The current-level resolution generator is optimized on the basis of this loss until the loss meets the loss threshold, giving the trained current-level resolution generator. Training of the human body image generation model is complete once the generators at all resolution levels have been optimized.

It can be seen that the training method provided by the embodiment of the present invention trains the resolution generators of the image generation model level by level, starting from the lowest resolution, so that the accuracy of the image generated at each level is ensured in turn. During training of the current-level resolution generator, the current-level overall generated image and the current-level local generated image are both obtained, and the generator is optimized using the losses of both. The trained current-level resolution generator therefore ensures the generation accuracy of both the overall generated image and the local generated image. As a result, when the human body image generation model obtained by this training method is used to generate a human body image, the required image can be generated directly by the single model, and the generated image has high overall and local accuracy.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It can be understood that the human body image generation model training method provided by the embodiment of the invention is a method for training an initially constructed human body image generation model, so as to improve the accuracy of human body image generation and ensure that the accuracy of the generated human body image meets requirements. The human body image generation method provided by the embodiment of the invention generates an image of the corresponding human body motion from an overall human skeleton point image, or a motion video of the human body from a series of overall skeleton point images.

Referring to FIG. 1, FIG. 1 is a schematic flow chart of a human body image generation model training method according to an embodiment of the present invention.

As shown in the figure, the human body image generation model training method provided by the embodiment of the invention comprises the following steps:

step S10: acquiring a human body training image, wherein the resolution grade number of the human body training image is equal to the resolution grade number of a generator of the human body image generation model, the resolution of each level of the human body training image is respectively equal to the resolution of each level of the resolution generator, and the human body training image comprises a whole label image, a local label image and a whole skeleton point image which identify the same image.

When training a human body image generation model, firstly, a human body training image needs to be acquired, and it can be understood that the human body training image refers to an image including a human body, so as to facilitate training and ensure the precision after training.

The overall label image, the local label image and the overall skeleton point image all identify the same image. The maximum-resolution overall label image (i.e. at the resolution of the directly captured image) can be acquired by an image capturing device such as a camera, or selected directly from already captured images; the maximum-resolution local label image and the maximum-resolution overall skeleton point image are extracted from the overall label image.

In order to ensure the accuracy of the trained human body image generation model, it is necessary to train resolution generators of each level of the human body image generation model, and for this reason, the human body training image needs to include not only the whole label image, the local label image and the whole skeleton point image having the maximum resolution, but also the whole label image, the local label image and the whole skeleton point image having the same resolution as the resolution of the resolution generators of each level.

It is easy to understand that the statement that the number of resolution levels of the human body training image is equal to the number of resolution levels of the generator of the image generation model, and that the resolution of each level of the human body training image is equal to the resolution of the corresponding level of resolution generator, means that the human body training image has at least as many resolution levels as the generator and includes a training image at the resolution of every generator level, so that training can proceed. If the human body training image also includes images at other resolutions, those images are simply not used during model training.

To ensure that the human body training image is acquired smoothly and that training is effective, refer to FIG. 2, which is a schematic flow chart of the step of acquiring the human body training image in the human body image generation model training method provided in the embodiment of the present invention.

As shown in the figure, in one embodiment, a human training image may be acquired by the following steps.

Step S101: acquiring a maximum-resolution overall label image.

First, the maximum-resolution overall label image is acquired. As described above, it may be captured by an image capturing device or selected directly from already captured images.

To improve training accuracy, a plurality of maximum-resolution overall label images may be selected, and the human body image generation model training method provided by the embodiment of the invention may be run multiple times. To reduce data duplication while preserving the processing effect, human body (character) video data may be obtained and frames extracted from it at a certain frame interval, so that the video is converted into images that serve as the maximum-resolution overall label images for training.

The specific frame interval can be set as needed; for example, 1 frame may be extracted out of every 5 frames to avoid duplicated image data.
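The frame extraction described above can be illustrated with a minimal sketch. It assumes the video has already been decoded into an iterable of frames by some other tool; the function names `sample_frame_indices` and `sample_frames` are hypothetical and not part of the patent.

```python
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def sample_frame_indices(total_frames: int, step: int = 5) -> List[int]:
    """Return the indices of the frames to keep: one out of every `step` frames."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(range(0, total_frames, step))


def sample_frames(frames: Iterable[Frame], step: int = 5) -> Iterator[Frame]:
    """Yield every `step`-th frame from an iterable of already-decoded frames."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame
```

With `step=5`, a 12-frame clip would contribute frames 0, 5 and 10 as candidate maximum-resolution overall label images.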

Step S102: extracting the skeleton point data of the maximum-resolution overall label image to obtain a maximum-resolution overall skeleton point image.

After the maximum-resolution overall label image is obtained, the skeleton point data are detected and extracted. Specifically, a human body posture detection algorithm may be used to extract the skeleton point data of the maximum-resolution overall label image, including the skeleton point data of the face, the hands and the body, to obtain the maximum-resolution overall skeleton point image.

In one embodiment, the maximum-resolution overall skeleton point image may also be marked with RGB values, with different parts identified by different colors, which makes it easier to identify the same part across images when multiple images are processed.
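As an illustration of the color marking, the following sketch draws skeleton points onto a blank RGB canvas with one color per body part. The color table `PART_COLORS`, the `(part, x, y)` point format and the square marker shape are assumptions made for the example; the patent does not specify them.

```python
import numpy as np

# Hypothetical part-to-color table (RGB); the patent does not fix any colors.
PART_COLORS = {
    "face": (255, 0, 0),
    "hand": (0, 255, 0),
    "body": (0, 0, 255),
}


def draw_skeleton_points(height, width, points, radius=1):
    """Draw each (part, x, y) point as a small square in that part's color."""
    image = np.zeros((height, width, 3), dtype=np.uint8)
    for part, x, y in points:
        # Clip the square marker to the image bounds.
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        x0, x1 = max(0, x - radius), min(width, x + radius + 1)
        image[y0:y1, x0:x1] = PART_COLORS[part]
    return image
```

Because every face point shares one color and every hand point another, the same part can be located in any skeleton point image by color alone.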

Step S103: acquiring a local mask region according to the maximum-resolution overall skeleton point image, and extracting a maximum-resolution local label image from the maximum-resolution overall label image according to the local mask region.

A mask region for local enhancement, namely the local mask region, is generated from the maximum-resolution overall skeleton point image, and the maximum-resolution local label image is extracted by using the local mask region.
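One simple way to realize a local mask region is a padded bounding box around the skeleton points of a part such as a hand. The sketch below assumes a rectangular mask and a hypothetical `margin` parameter; the patent does not state the shape of the mask or how it is derived from the points.

```python
import numpy as np


def local_mask_box(points, height, width, margin=2):
    """Return (top, bottom, left, right) of a padded box around (x, y) points.

    The box is clipped to the image; bottom and right are exclusive bounds.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    top = max(0, min(ys) - margin)
    bottom = min(height, max(ys) + margin + 1)
    left = max(0, min(xs) - margin)
    right = min(width, max(xs) + margin + 1)
    return top, bottom, left, right


def crop_local(image, box):
    """Cut the local image out of an H x W x C image using the box."""
    top, bottom, left, right = box
    return image[top:bottom, left:right]
```

The same box can be reused on a generated image, so the local label image and the local generated image cover exactly the same region.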

Step S104: downsampling the maximum-resolution overall label image, the maximum-resolution overall skeleton point image and the maximum-resolution local label image to obtain the human body training image.

After the maximum-resolution overall label image, the maximum-resolution overall skeleton point image and the maximum-resolution local label image are obtained, each is downsampled. It can be understood that the number of downsampling steps plus 1 equals the number of resolution levels of the generator of the human body image generation model, and that the downsampling factor must be the same as the scaling factor of the generator, so that human body training data at every resolution are obtained.

In one embodiment, the down-sampling may be performed by image interpolation.
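The relation between downsampling steps and resolution levels can be sketched as follows. The example assumes a scaling factor of 2 and uses block averaging as a stand-in for the interpolation method, neither of which is specified by the patent; `downsample` and `build_pyramid` are hypothetical names.

```python
import numpy as np


def downsample(image, factor=2):
    """Block-average an H x W x C image; H and W must be divisible by factor."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def build_pyramid(image, num_levels, factor=2):
    """Return num_levels images, lowest resolution first.

    Only num_levels - 1 downsampling steps are needed, because the
    maximum-resolution image itself is the highest level.
    """
    levels = [image.astype(np.float64)]
    for _ in range(num_levels - 1):
        levels.append(downsample(levels[-1], factor))
    return levels[::-1]
```

For a three-level generator, an 8 x 8 input yields images of 2 x 2, 4 x 4 and 8 x 8 pixels after two downsampling steps. The same routine would be applied to the overall label image, the overall skeleton point image and the local label image.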

The human body training image obtained in this way provides not only the input data for training the human body image generation model (the overall skeleton point image at each resolution level) and the overall label data for judging overall accuracy, but also the local label data, which serve as a reference for improving the accuracy of the local image when the human body image generation model is subsequently optimized.

Step S11: determining the current level resolution as the lowest level resolution.

To reduce the number of factors affecting accuracy, the human body image generation model training method provided by the embodiment of the invention starts training from the generator with the lowest resolution; to this end, the current level resolution is first set to the lowest level resolution.

Step S12: a current-level resolution generator generates a current-level global generation image and a current-level local generation image from a current-level global skeleton point image having a current-level resolution.

When the current level resolution is the lowest level resolution, a lowest level overall generation image and a lowest level local generation image are generated from the lowest level resolution skeleton point image using the lowest level resolution generator.

When the current-level resolution is not the lowest-level resolution, the current-level global generation image and the current-level local generation image are generated using the current-level global skeleton point image having the current-level resolution.

It is understood that when the current level resolution is not the lowest level resolution, it indicates that the optimization of the model parameters of each level of the resolution generators with the resolution lower than the current level resolution has been completed.
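The level-by-level schedule can be sketched with a toy model. Here each generator is reduced to a single scalar `weight`, the loss is a squared error against a made-up target, and optimization is plain gradient descent; all of these are stand-ins chosen so the example runs on its own, and none of them is the patent's actual network, loss or optimizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyGenerator:
    """Stand-in for one resolution level; `weight` represents all its parameters."""
    weight: float = 0.0
    trainable: bool = True


def toy_loss(generators: List[ToyGenerator], targets: List[float], level: int) -> float:
    """Toy loss summed over every level up to and including `level`."""
    return sum((generators[i].weight - targets[i]) ** 2 for i in range(level + 1))


def train_progressively(targets: List[float], threshold: float = 1e-6,
                        lr: float = 0.1, max_steps: int = 10_000) -> List[ToyGenerator]:
    generators = [ToyGenerator() for _ in targets]
    for level in range(len(generators)):  # lowest resolution level first
        for lower in generators[:level]:
            lower.trainable = False  # fix the already-trained lower levels
        for _ in range(max_steps):
            if toy_loss(generators, targets, level) <= threshold:
                break  # loss threshold met: this level counts as trained
            for i in range(level + 1):
                if generators[i].trainable:  # only the current level is updated
                    grad = 2.0 * (generators[i].weight - targets[i])
                    generators[i].weight -= lr * grad
    return generators
```

The point of the sketch is the control flow: the loss at each level still sees the lower levels, but their parameters no longer change once they have been trained.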

Also, in one embodiment, in order to acquire the current-level overall generated image and the current-level local generated image, the current-level resolution generator may first generate the current-level overall generated image from the current-level overall bone point image having the current-level resolution; and then extracting the current-level locally-generated image from the current-level wholly-generated image according to the local mask region.

It is to be understood that the current-level local generated image need not be acquired immediately after the overall image is generated; it may instead be acquired just before the current-level local loss is calculated.

Step S13: acquiring the current-level overall loss by using the overall generated images, overall label images, local generated images and local label images of each level whose resolution is lower than or equal to the current level resolution.

After the current-level overall generated image and the current-level local generated image are obtained, the current-level overall loss is calculated by combining the overall generated images, overall label images, local generated images and local label images of each level whose resolution is lower than or equal to the current level resolution.

It is to be understood that the images of each level whose resolution is lower than or equal to the current level resolution comprise both the images whose resolution is lower than the current level resolution and the image whose resolution is equal to it. This applies alike to the overall generated images, the overall label images, the local generated images and the local label images.
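To make the set of images entering the loss concrete, the sketch below sums an overall term and a local term over every level up to and including the current one. The mean absolute error used for each term and the `local_weight` parameter are assumptions for illustration only; the patent computes its terms with discriminators and feature losses rather than a pixel-wise error.

```python
import numpy as np


def mean_abs_error(generated, label):
    """Pixel-wise stand-in for a per-image loss term (an assumption)."""
    return float(np.mean(np.abs(generated - label)))


def current_level_loss(level, overall_generated, overall_labels,
                       local_generated, local_labels, local_weight=1.0):
    """Sum overall and local terms over levels 0..level (lowest level is 0)."""
    total = 0.0
    for i in range(level + 1):
        total += mean_abs_error(overall_generated[i], overall_labels[i])
        total += local_weight * mean_abs_error(local_generated[i], local_labels[i])
    return total
```

When `level` is 0 only the lowest-level images contribute; at higher levels the images produced by the already-trained lower-level generators are included as well.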

To improve the accuracy of the current-level overall loss, in a specific embodiment, refer to FIG. 3, which is a schematic flow chart of the step of obtaining the current-level overall loss in the human body image generation model training method provided by the embodiment of the present invention.

As shown in the figure, to obtain the current stage global loss, the following operations may be performed:

Step S131: calculating the current-level loss for the overall image according to the overall generated images and the overall label images of each level whose resolution is lower than or equal to the current level resolution.

In a specific embodiment, to obtain the current-stage overall loss, the resolution discriminators of each stage in the convolutional neural network may be used to extract the overall generated image features of the discriminators of each stage and the overall label image features of the discriminators of each stage, and then the current-stage overall discriminator loss and the current-stage overall multi-layer feature loss may be calculated according to the overall generated image features of the discriminators of each stage and the overall label image features of the discriminators of each stage.

When the current level resolution is the lowest level resolution, only the lowest level discriminator overall generation image features and the lowest level discriminator overall label image features of the lowest level integrally generated image need to be extracted by using a lowest level resolution discriminator; when the current stage resolution is not the lowest resolution, the discriminator with the corresponding resolution is required to extract the current stage resolution overall generated image feature and the current stage resolution overall label image feature of the current stage overall generated image, and the each stage discriminator overall generated image feature and each stage discriminator overall label image feature of each stage overall generated image with the resolution lower than the current stage resolution, and the each stage overall generated image with the resolution lower than the current stage resolution is obtained based on the generators with each stage resolution which have completed training.

The current-level overall discriminator loss can be calculated by adopting the following formula:

wherein: n is the total number of generator levels of the human body image generation model;

D_i is the i-th level resolution discriminator;

G_i(x) is the image generated by the i-th level generator;

y_i is the i-th level label image;

L_GAN is the WGAN-GP loss function.

The current-level overall multi-layer feature loss can be calculated by adopting the following formula:

wherein: n is the total number of generator levels of the human body image generation model;

m is the number of overall image features (feature layers) that the discriminator extracts from an image at one resolution level;

C_ik H_ik W_ik is the size of the overall image feature of the k-th layer in the i-th level discriminator;

D_ik is the feature extracted by the k-th layer of the i-th level resolution discriminator;

G_i(x) is the image generated by the i-th level generator;

y_i is the i-th level label image.
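As a purely illustrative sketch, the accumulation of a multi-layer feature loss over every level whose resolution is lower than or equal to the current-level resolution may be written as follows. The use of a mean absolute difference per layer (an L1 distance divided by C·H·W) is an assumption made for illustration, and the names l1_mean and multilayer_feature_loss are hypothetical; features are represented as flattened lists rather than real discriminator outputs.

```python
from typing import List, Sequence

Feature = Sequence[float]  # one flattened feature map holding C*H*W values


def l1_mean(a: Feature, b: Feature) -> float:
    """Mean absolute difference: the L1 distance divided by C*H*W (assumed norm)."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def multilayer_feature_loss(
    gen_feats: List[List[Feature]],
    label_feats: List[List[Feature]],
    current_level: int,
) -> float:
    """Sum per-layer distances over levels 0..current_level (inclusive).

    gen_feats[i][k] stands for the features that layer k of the level-i
    discriminator extracts from the level-i generated image;
    label_feats[i][k] is the same for the level-i label image.
    """
    total = 0.0
    for i in range(current_level + 1):
        for g, y in zip(gen_feats[i], label_feats[i]):
            total += l1_mean(g, y)
    return total


# Toy features: level 0 has two layers, level 1 has one layer.
gen = [[[0.0, 1.0], [1.0, 1.0, 1.0, 1.0]], [[2.0, 2.0]]]
lab = [[[0.0, 0.0], [1.0, 1.0, 1.0, 3.0]], [[0.0, 2.0]]]
loss_level0 = multilayer_feature_loss(gen, lab, current_level=0)
loss_level1 = multilayer_feature_loss(gen, lab, current_level=1)
```

When the current level is the lowest level, only the level-0 terms contribute; at a higher current level the terms of all lower levels are added as well.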

To ensure the training precision, the loss itself must be determined precisely, so a current-level overall perception loss is also obtained. Specifically, a trained VGG model first extracts, for each level, the VGG features of the overall generated image and the VGG features of the overall label image; the current-level overall perception loss is then calculated from these features.

In a specific embodiment, the trained VGG model may be a model pre-trained on the ImageNet data set.

When the current-level overall perception loss is calculated, the VGG features of the overall generated image and of the overall label image of each level are extracted first. When the current-level resolution is the lowest-level resolution, only the VGG features of the lowest-level overall generated image and of the lowest-level overall label image need to be obtained. When the current-level resolution is not the lowest-level resolution, the VGG features of the current-level overall generated image and overall label image need to be obtained, together with the VGG features of the overall generated images and overall label images of every level whose resolution is lower than the current-level resolution.

Then, the current-level overall perception loss is calculated from the VGG features of the overall generated images and of the overall label images of each level, specifically by the following formula:

wherein: G_i(x) is the image generated by the i-th level generator;

y_i is the i-th level label image;

φ(x) is the VGG model;

C_ik H_ik W_ik is the size of the image feature of the k-th layer at the i-th level;

L_GAN is the WGAN-GP loss function.

Step S132: calculating the current-level local loss according to the local generated images and the local label images of every level whose resolution is lower than or equal to the current-level resolution.

The resolution discriminator of each level in the convolutional neural network extracts, for each level, the discriminator features of the local generated image and the discriminator features of the local label image, and the current-level local discriminator loss and the current-level local multi-layer feature loss are calculated from these features.

The trained VGG model extracts, for each level, the VGG features of the local generated image and the VGG features of the local label image, and the current-level local perception loss is calculated from these features.

The specific process may be the same as that used to calculate the current-level whole-image loss and is not described again here.

Step S133: acquiring the current-level overall loss according to the current-level whole-image loss and the current-level local loss.

After the current-level whole-image loss and the current-level local loss are obtained, the current-level overall loss is obtained from them. The overall loss of the current-level resolution discriminator is also calculated: it shows whether the discrimination accuracy of the current-level resolution discriminator meets the discrimination requirement, and whether the discriminator can identify the difference between generated and label images directly influences the overall loss of the current-level resolution generator. Therefore, when the overall loss of the current-level resolution generator is obtained, the overall loss of the current-level resolution discriminator needs to be obtained as well.

The overall loss of the current-level resolution generator is obtained according to the current-level overall discriminator loss, the current-level overall multi-layer feature loss, the current-level overall perception loss, the current-level local discriminator loss, the current-level local multi-layer feature loss and the current-level local perception loss.

The overall loss of the current-level resolution discriminator is obtained according to the current-level overall discriminator loss and the current-level local discriminator loss.

The overall loss of the current-level resolution generator can be obtained by using the following formula:

L_Gen = L_vgg + λ_FM L_FM + λ_Dis L_Dis + λ_Local (L_{Local_vgg} + λ_FM L_{Local_FM} + λ_Dis L_{Local_Dis})

wherein: λ_FM is the weight coefficient of the current-level multi-layer feature loss;

λ_Dis is the weight coefficient of the current-level resolution discriminator loss;

λ_Local is the weight coefficient of the local image loss.

Different weights are set according to how strongly each loss influences the overall loss, so as to improve the accuracy of the overall loss of the current-level resolution generator.

The overall loss of the current-level resolution discriminator can be obtained by using the following formula:

L_{total_Dis} = L_Dis + λ_Local L_{Local_Dis}

wherein: λ_Local is the weight coefficient of the local image loss.
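The two weighted sums above can be sketched in a few lines. This is a minimal illustration only: the last local term is taken to be the local discriminator loss, as the list of six losses in the text suggests, and the class name LossTerms, the function names and all numeric weights are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    vgg: float  # perception loss
    fm: float   # multi-layer feature loss
    dis: float  # discriminator (adversarial) loss


def generator_total_loss(whole: LossTerms, local: LossTerms,
                         lam_fm: float, lam_dis: float, lam_local: float) -> float:
    """Weighted sum of the three whole-image terms and the three local terms."""
    whole_part = whole.vgg + lam_fm * whole.fm + lam_dis * whole.dis
    local_part = local.vgg + lam_fm * local.fm + lam_dis * local.dis
    return whole_part + lam_local * local_part


def discriminator_total_loss(whole_dis: float, local_dis: float,
                             lam_local: float) -> float:
    """Whole-image discriminator loss plus the weighted local discriminator loss."""
    return whole_dis + lam_local * local_dis


# Hypothetical loss values and weights, for illustration only.
whole = LossTerms(vgg=1.0, fm=2.0, dis=3.0)
local = LossTerms(vgg=0.5, fm=1.0, dis=2.0)
l_gen = generator_total_loss(whole, local, lam_fm=10.0, lam_dis=1.0, lam_local=0.5)
l_total_dis = discriminator_total_loss(whole.dis, local.dis, lam_local=0.5)
```

Sharing λ_FM and λ_Dis between the whole-image and local branches, with a single λ_Local scaling the local branch, keeps the number of weights to tune at three.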

Step S14: judging whether the current-level overall loss meets a loss threshold; if so, executing step S15, and if not, executing step S16.

Obtaining the overall loss of the current-level resolution generator and the overall loss of the current-level resolution discriminator amounts to obtaining the current-level overall loss. Judging whether the current-level overall loss meets the loss threshold specifically means judging whether the overall loss of the current-level resolution generator meets the generator loss threshold and whether the overall loss of the current-level resolution discriminator meets the discriminator loss threshold. If both do, step S15 is executed; otherwise, step S16 is executed.

Step S15: a trained current level resolution generator is obtained.

If the judgment shows that the current-level overall loss meets the loss threshold, the accuracy of the human body image generated by the current-level resolution generator meets the requirement, and the trained current-level resolution generator is obtained. The trained current-level resolution discriminator and the trained resolution generators of each level with a resolution lower than the current-level resolution are, of course, obtained as well.

Step S16: optimizing the model parameters of the current-level resolution generator according to the current-level overall loss.

If the current-level overall loss does not meet the loss threshold, the current-level resolution generator needs to be optimized. It can be understood that, as the current-level resolution generator is optimized, the discrimination capability of the current-level resolution discriminator also needs to be further optimized. For this reason, in a specific embodiment, the current-level resolution generator may be optimized in the following manner:

First, a training order of the current-level resolution discriminator and the current-level resolution generator is set, and the two are trained in turn according to that order.

Specifically, the order may be set as required: the current-level resolution discriminator may be trained first and the current-level resolution generator second, or the generator first and the discriminator second. As a further example, the current-level resolution discriminator may be trained 2 times, the current-level resolution generator then trained 3 times, and the discriminator then trained again. The number of training passes and the specific order may be chosen as required and are not described in detail here.

When the current-level resolution discriminator is trained, the parameters of the current-level resolution generator are fixed, and the parameters of the current-level resolution discriminator are adjusted according to the overall loss of the current-level resolution discriminator until that loss meets the discriminator overall loss threshold, so as to obtain the trained current-level resolution discriminator.

It is easily understood that, after the parameters of the current-level resolution discriminator are adjusted, steps S13 to S14 are executed again: the overall loss of the current-level resolution generator and the overall loss of the current-level resolution discriminator are recalculated and judged, until the overall loss of the current-level resolution discriminator meets the discriminator overall loss threshold.

When the current-level resolution generator is trained, the parameters of the current-level resolution discriminator are fixed, and the parameters of the current-level resolution generator are adjusted according to the overall loss of the current-level resolution generator until that loss meets the generator overall loss threshold, so as to obtain the trained current-level resolution generator.

Similarly, after the parameters of the current-level resolution generator are adjusted, steps S13 to S15 are executed again: the overall loss of the current-level resolution generator and the overall loss of the current-level resolution discriminator are recalculated and judged, until the overall loss of the current-level resolution generator meets the generator overall loss threshold.

Finally, the overall loss of the current-level resolution generator meets the generator loss threshold, and the overall loss of the current-level resolution discriminator meets the discriminator loss threshold.
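The alternating optimization described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions: "meeting" a threshold is taken to mean that the loss is less than or equal to it, the real parameter updates are replaced by toy step functions whose loss shrinks on every call, and the names make_decaying_step and train_level are hypothetical.

```python
from typing import Callable


def make_decaying_step(start: float, factor: float) -> Callable[[], float]:
    """Toy stand-in for one optimisation step: each call shrinks the loss."""
    state = {"loss": start}

    def step() -> float:
        state["loss"] *= factor
        return state["loss"]

    return step


def train_level(step_d: Callable[[], float], step_g: Callable[[], float],
                d_threshold: float, g_threshold: float,
                d_steps: int = 2, g_steps: int = 3,
                max_rounds: int = 1000) -> int:
    """Alternate d_steps discriminator updates and g_steps generator updates.

    Returns the number of rounds needed until both losses meet their
    thresholds (assumed to mean loss <= threshold).
    """
    if d_steps < 1 or g_steps < 1:
        raise ValueError("each network needs at least one step per round")
    for round_index in range(1, max_rounds + 1):
        for _ in range(d_steps):   # generator parameters are fixed here
            d_loss = step_d()
        for _ in range(g_steps):   # discriminator parameters are fixed here
            g_loss = step_g()
        if d_loss <= d_threshold and g_loss <= g_threshold:
            return round_index
    raise RuntimeError("loss thresholds not reached within max_rounds")


rounds = train_level(make_decaying_step(1.0, 0.5), make_decaying_step(1.0, 0.5),
                     d_threshold=0.1, g_threshold=0.01)
```

The defaults of 2 discriminator passes followed by 3 generator passes mirror the example ratio given in the text; any other order or ratio can be passed in.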

Step S17: judging whether the generators of all resolution levels have been optimized; if so, executing step S110; if not, executing step S18.

After the trained current-level resolution generator is obtained, it is further judged whether the generators of all resolution levels have been optimized. If so, the trained human body image generation model is obtained; if not, the generators of the resolution levels that have not yet been trained are trained.

Step S18: adjusting the current-level resolution according to the resolution levels.

The current-level resolution is adjusted according to the resolution levels to obtain the adjusted current-level resolution, which is generally one level higher than the resolution before adjustment.

Step S19: the model parameters of each level of resolution generator having a resolution lower than the current level of resolution are fixed, and the process goes to step S12.

After the new current-level resolution is obtained, the model parameters of the resolution generators of each level with a resolution lower than the current-level resolution are first fixed. This avoids the added training complexity that would arise if those parameters changed while the generator of the current-level resolution is being trained. The fixed model parameters are, of course, those obtained through the earlier training. Step S12 is then executed.

It is understood that when the current level resolution generator is the lowest level resolution generator, there are no resolution generators of each level having a resolution lower than the current level resolution, and the model parameters do not need to be fixed.

Step S110: obtaining the trained human body image generation model.
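The level-by-level flow of steps S12 to S110 can be sketched as follows. The sketch only tracks which generators are frozen at each stage; the class Generator, the function train_progressively, the callback train_one and the resolutions 64, 128 and 256 are all hypothetical and stand in for the real networks and their optimization.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Generator:
    resolution: int
    trainable: bool = True
    trained: bool = False


def train_progressively(
    generators: List[Generator],
    train_one: Callable[[List[Generator], int], None],
) -> List[Tuple[int, List[int]]]:
    """Train generators from the lowest resolution upward.

    `generators` must be sorted lowest resolution first. Before a level is
    trained, every lower level is frozen (step S19). `train_one` is a
    hypothetical callback standing in for steps S12 to S16.
    """
    log = []
    for level, current in enumerate(generators):
        for lower in generators[:level]:
            lower.trainable = False        # fix lower-level model parameters
        current.trainable = True
        train_one(generators, level)       # optimise until the thresholds are met
        current.trained = True
        frozen = [g.resolution for g in generators if not g.trainable]
        log.append((current.resolution, frozen))
    return log


gens = [Generator(64), Generator(128), Generator(256)]
log = train_progressively(gens, lambda generators, level: None)
```

At the lowest level the frozen list is empty, which matches the remark that no model parameters need to be fixed when the current-level resolution generator is the lowest-level one.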

In order to further improve the accuracy of the trained human body image generation model, in a specific embodiment, refer to fig. 4, which is a schematic flow chart of the step of obtaining the current-level overall generated image and the current-level local generated image in the human body image generation model training method provided by the embodiment of the present invention.

Step S121: acquiring the current-level coded image features of the current-level overall skeleton point image by using the coding module of the current-level resolution generator.

Step S122: adding the current-level coded image features to the previous-level decoded image features, which are obtained by the decoding module for the previous-level overall skeleton point image, whose resolution is lower than the current-level resolution, to obtain the current-level image features.

That is, the acquired current-level coded image features and the previous-level decoded image features are added to obtain the current-level image features.

It will be appreciated that the current-level coded image features must have the same resolution as the previous-level decoded image features so that the two can be added.

Step S123: decoding the current-level image features by using the decoding module of the current-level resolution generator to obtain the current-level overall generated image.

After the current-level image features are obtained by the addition, they are decoded by the decoding module of the current-level resolution generator, so that the current-level overall generated image is obtained.
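The addition in step S122 and its same-resolution requirement can be illustrated with a minimal sketch. Feature maps are simplified here to single-channel H x W nested lists, which is an assumption made for brevity, and the function name add_features is hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel H x W map (simplification)


def add_features(encoded: FeatureMap, prev_decoded: FeatureMap) -> FeatureMap:
    """Element-wise sum of the current-level coded features and the
    previous-level decoded features; both must have the same resolution."""
    same_height = len(encoded) == len(prev_decoded)
    same_width = same_height and all(
        len(r1) == len(r2) for r1, r2 in zip(encoded, prev_decoded)
    )
    if not same_width:
        raise ValueError("feature maps must have the same resolution to be added")
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(encoded, prev_decoded)]


current_features = add_features([[1.0, 2.0], [3.0, 4.0]],
                                [[10.0, 20.0], [30.0, 40.0]])
```

The explicit shape check makes the constraint visible: if the previous-level decoder did not bring its output to the current-level feature resolution, the addition is rejected rather than silently truncated.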

Step S124: extracting the current-level local generated image from the current-level overall generated image according to the position of the local label image in the overall label image.

That is, the current-level local generated image is acquired from the acquired current-level overall generated image according to the position of the local label image in the overall label image.

In one embodiment, the current-level local generated image may be acquired according to a local mask region. Specifically, the following formula can be used to obtain the current-level local generated image:

I_{local part} = I_Generating ⊙ I_mask

wherein: I_Generating is the current-level overall generated image;

⊙ denotes the element-wise (dot) product;

I_mask is a binary image whose pixel values are 0 or 1.
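The masking operation can be sketched as an element-wise product of the overall generated image with a 0/1 mask. Images are simplified to single-channel nested lists, which is an assumption for brevity, and the function name extract_local is hypothetical.

```python
from typing import List

Image = List[List[float]]


def extract_local(generated: Image, mask: List[List[int]]) -> Image:
    """Keep the pixels of the generated image where mask == 1, zero the rest."""
    if len(generated) != len(mask) or any(
        len(g_row) != len(m_row) for g_row, m_row in zip(generated, mask)
    ):
        raise ValueError("image and mask must have the same size")
    if any(m not in (0, 1) for row in mask for m in row):
        raise ValueError("mask must be binary (0 or 1)")
    return [[g * m for g, m in zip(g_row, m_row)]
            for g_row, m_row in zip(generated, mask)]


local_image = extract_local([[0.2, 0.4], [0.6, 0.8]], [[0, 1], [1, 0]])
```

Because the mask is derived from the position of the local label image in the overall label image, the local generated image and the local label image cover the same region and can be compared directly when the local loss is calculated.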

Thus, in the human body image generation model training method provided by the embodiment of the present invention, acquiring the current-level overall generated image, whose resolution is one level higher, relies not only on the current-level overall skeleton point image but also on the previous-level decoded image features, whose resolution is one level lower, so that the precision of the human body image produced by the human body image generation model can be further improved.

In order to further improve the precision of the human body image generation model training method, an embodiment of the present invention provides another human body image generation model training method. Refer to fig. 5, which is another schematic flow chart of the human body image generation model training method provided by the embodiment of the present invention.

As shown in the figure, the human body image generation model training method provided by the embodiment of the present invention includes the following steps:

Step S20: acquiring a human body training image, wherein the number of resolution levels of the human body training image is equal to the number of resolution levels of the generators of the image generation model, the resolution of each level of the human body training image is equal to the resolution of the corresponding level of resolution generator, and the human body training image comprises an overall label image, a local label image and an overall skeleton point image which identify the same image.

Step S21: determining the current level resolution as the lowest level resolution.

Step S22: a current-level resolution generator generates a current-level global generation image and a current-level local generation image from a current-level global skeleton point image having a current-level resolution.

Step S23: acquiring the current-level overall loss by using the current-level overall generated image, the current-level overall label image, the current-level local generated image and the current-level local label image.

Step S24: judging whether the current-level overall loss meets a loss threshold; if so, executing step S25, and if not, executing step S26.

Step S25: obtaining the trained current-level resolution generator.

Step S26: optimizing the current-level resolution generator according to the current-level overall loss.

For the contents of steps S20 to S26, refer to steps S10 to S16 shown in fig. 1; they are not described again here.

Step S27: unfreezing the model parameters of the resolution generators of each level with a resolution lower than the current-level resolution.

After the current-level resolution generator has been trained with the model parameters of the lower-resolution generators fixed, the current-level resolution generator is, in order to improve the training precision, trained again with those parameters unfrozen. The model parameters of the resolution generators of each level with a resolution lower than the current-level resolution are therefore unfrozen.

Step S28: determining the unfreezing current-level resolution to be the lowest-level resolution.

In order to reduce the factors that influence precision in the unfrozen state, the human body image generation model training method provided by the embodiment of the present invention still starts training from the generator with the lowest resolution; the unfreezing current-level resolution is therefore first determined to be the lowest-level resolution.

Step S29: a thawing current-level resolution generator generates a thawing current-level global production image and a thawing current-level partial production image from a thawing current-level global skeleton point image having a thawing current-level resolution.

The method for obtaining the unfreezing of the current-stage wholly-generated image and the unfreezing of the current-stage partially-generated image is the same as the method for obtaining the current-stage wholly-generated image and the current-stage partially-generated image, and please refer to the details of step S12 in fig. 1, which is not repeated herein.

Step S210: and utilizing unfreezing all-stage integrally generated images, unfreezing all-stage integrally labeled images, unfreezing all-stage partially generated images and unfreezing all-stage partially labeled images with the resolution lower than or equal to the current-stage unfreezing resolution to obtain the overall loss of the current unfrozen stage.

The method for obtaining the total loss of the unfreezing current stage is the same as the method for obtaining the total loss of the current stage, please refer to the specific content of step S13 in fig. 1, and will not be described herein again.

Step S211: it is determined whether the total loss of the thawing current stage satisfies the loss threshold, if so, step S212 is performed, and if not, step S213 is performed.

Step S212: resulting in a unfreezing current level resolution generator.

After judgment, when the overall thawing loss of the current stage meets the loss threshold, the precision of the human body image generated by the thawing current stage resolution generator meets the requirement, so that the trained thawing current stage resolution generator can be obtained, and of course, the trained thawing current stage resolution discriminator generator and the trained thawing resolution generators with the resolutions lower than the thawing current stage resolution are also obtained.

Step S213: and optimizing the model parameters of the resolution generator of the unfreezing current level according to the overall loss of the unfreezing current level.

For details of step S213, please refer to the discussion of step S16 in fig. 1, which is not repeated herein.

Step S214: judging whether the unfreezing current-level resolution is equal to the current-level resolution; if so, executing step S216; if not, executing step S215.

After the unfreezing current-level resolution generator is obtained, it is further judged whether the unfreezing current-level resolution is equal to the current-level resolution. If so, every resolution generator that was trained with fixed model parameters has also completed training with unfrozen model parameters, and the generator of the next higher resolution can be trained with fixed model parameters. If not, step S215 is executed and the unfreezing current-level resolution is raised by one level.

Step S215: and adjusting the resolution of the unfreezing current level to be increased by one level.

Step S216: judging whether the generators of all resolution levels have been optimized; if so, executing step S219; if not, executing step S217.

For details of step S216, please refer to step S17 shown in fig. 1, which is not described herein again.

Step S217: and adjusting the current level resolution according to the resolution level.

Step S218: the model parameters of each level of resolution generator having a lower resolution than the current level of resolution are fixed.

For details of step S217 and step S218, refer to step S18 and step S19 shown in fig. 1; they are not repeated here.

Step S219: obtaining the trained image generation model.

The human body image generation model training method provided by the embodiment of the present invention first trains the current-level resolution generator with the model parameters of the lower-resolution generators fixed. After that training is completed, it unfreezes the model parameters of the resolution generators of each level with a resolution lower than the current-level resolution and further trains those generators together with the current-level resolution generator in the unfrozen state. The training precision is thereby further improved while the training speed is maintained.
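One reading of the order in which the flow of fig. 5 visits the generators can be sketched as a schedule of (phase, level) pairs, where level 0 is the lowest resolution. The function name two_phase_schedule and the phase labels are hypothetical; the sketch lists only the order of training passes and performs no optimization.

```python
from typing import List, Tuple


def two_phase_schedule(num_levels: int) -> List[Tuple[str, int]]:
    """Order of training passes: for each current level, one pass with the
    lower levels frozen (steps S22 to S26), then unfrozen passes from the
    lowest level up to the current level (steps S27 to S215)."""
    steps: List[Tuple[str, int]] = []
    for current in range(num_levels):
        steps.append(("frozen", current))
        for unfrozen in range(current + 1):
            steps.append(("unfrozen", unfrozen))
    return steps


schedule = two_phase_schedule(3)
```

For three levels, the schedule shows that the lowest-level generator is revisited in every unfrozen phase, which is where the extra precision comes from, while each frozen pass only has one generator's parameters to optimize.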

In addition to the human body image generation model training method, an embodiment of the present invention further provides a human body image generation method. Refer to fig. 6, which is a schematic flow chart of the human body image generation method provided by the embodiment of the present invention.

As shown in the figure, the human body image generation method provided by the embodiment of the present invention includes the following steps:

Step S31: acquiring a human skeleton point image.

The human skeleton point image of the human body image to be generated is acquired. Specifically, the human skeleton point image may be acquired by extracting human skeleton points from other images or by motion modeling, which is not limited here.

Step S32: obtaining the human body image by using the trained human body image generation model.

The human body image is then generated by using the human body image generation model trained by the human body image generation model training method described above.

The human body image generation method provided by the embodiment of the present invention generates the human body image with a human body image generation model trained by the above training method. Because the generation precision of that model is high, both the overall precision of the human body image and the precision of its local details are high. An image that meets the precision requirement can be generated directly with a single human body image generation model, which reduces the amount of calculation, the operation time and the requirements on computer hardware.

Specifically, generating the human body image by using the trained human body image generation model may include the following steps:

down-sampling the human skeleton point image to obtain human skeleton point images of each level, whose resolutions match those of the skeleton point images used at each level of the human body image generation model;

starting from the lowest resolution, acquiring the current-level coded image features from the current-level human skeleton point image having the current-level resolution by using the coding module of the current-level resolution generator of the human body image generation model;

adding the current-level coded image features to the previous-level decoded image features, which are obtained by the decoding module for the previous-level overall skeleton point image, whose resolution is lower than the current-level resolution, to obtain the current-level image features;

decoding the current-level image features by using the decoding module of the current-level resolution generator to obtain the current-level decoded image features, and repeating the above for each higher level until the current-level resolution is the maximum resolution, at which point the decoding module of the current-level resolution generator decodes the features to obtain the overall image.
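The down-sampling step that prepares one skeleton point image per generator level can be sketched as an image pyramid. The factor of 2 between adjacent levels and the use of 2x2 average pooling are assumptions made for illustration and are not stated in the text; the function names downsample_2x and skeleton_pyramid are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample_2x(image: Image) -> Image:
    """Average-pool non-overlapping 2x2 blocks (assumed down-sampling rule)."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def skeleton_pyramid(image: Image, num_levels: int) -> List[Image]:
    """Return num_levels images ordered lowest resolution first, so the
    list can be consumed from the lowest-level generator upward."""
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(downsample_2x(levels[-1]))
    return levels[::-1]


full = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 2, 2],
        [0, 0, 2, 2]]
pyramid = skeleton_pyramid(full, num_levels=3)
```

Ordering the pyramid lowest resolution first matches the generation procedure, which starts from the lowest resolution and passes decoded features upward one level at a time.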

Therefore, when the human body image is generated, acquiring the current-level overall generated image, whose resolution is one level higher, relies not only on the current-level overall skeleton point image but also on the previous-level decoded image features, whose resolution is one level lower, so that the precision of the human body image produced by the human body image generation model can be further improved.

In the following, the human body image generation model training apparatus and the human body image generation apparatus provided by the embodiments of the present invention are introduced, and the human body image generation model training apparatus and the human body image generation apparatus described below may be regarded as a functional module architecture that is required to be set by an electronic device (e.g., a PC) to respectively implement the human body image generation model training method and the human body image generation method provided by the embodiments of the present invention. The contents of the human body image generation model training apparatus and the human body image generation apparatus described below may be referred to in correspondence with the contents of the human body image generation model training method and the human body image generation method described above, respectively.

Fig. 7 is a block diagram of the human body image generation model training apparatus provided in an embodiment of the present invention. The apparatus may be applied to either a client or a server. Referring to fig. 7, the human body image generation model training apparatus may include:

a human body training image obtaining unit 100 adapted to obtain a human body training image, wherein the number of resolution levels of the human body training image is equal to the number of resolution levels of a generator of the human body image generation model, the resolution of each level of the human body training image is respectively equal to the resolution of each level of the resolution generator, and the human body training image includes an overall label image, a local label image and an overall skeleton point image which identify the same image;

a current-level resolution generator training unit 110 adapted to fix model parameters of each level of resolution generators having a resolution lower than a current level resolution, starting from the lowest level resolution, generating a current-level overall generated image and a current-level local generated image by using the current-level resolution generator according to a current-level overall skeleton point image having the current level resolution, acquiring a current-level overall loss by using each level of overall generated image, each level of overall label image, each level of local generated image and each level of local label image having a resolution lower than or equal to the current level resolution, and optimizing the model parameters of the current-level resolution generator according to the current-level overall loss until the current-level overall loss satisfies a loss threshold, thereby obtaining a trained current-level resolution generator;

the human body image generation model obtaining unit 120 is adapted to adjust the current level resolution according to the resolution level to obtain an adjusted current level resolution, and optimize the generator of the adjusted current level resolution until the generators of all resolutions of the resolution level complete optimization to obtain the trained human body image generation model.

When training a human body image generation model, the human body training image obtaining unit 100 needs to obtain a human body training image first, and it can be understood that the human body training image refers to an image including a human body, so as to facilitate training and ensure precision after training.

In order to ensure the smooth acquisition of the human body training image and ensure the training effect, in one embodiment, the human body training image acquisition unit 100 may include the following modules.

And the maximum resolution integral label image acquisition module is suitable for acquiring the maximum resolution integral label image.

The maximum-resolution overall label image acquisition module acquires an overall label image with the maximum resolution. As described above, the maximum-resolution overall label image may be captured by an image capturing device or selected directly from images that have already been captured.

To improve training precision, a plurality of maximum-resolution overall label images may be selected, and the human body image generation model training apparatus provided by the embodiment of the invention may be run multiple times. To reduce data repetition while preserving the processing effect, human body (person) video data may be obtained and frames extracted at a certain frame rate, so that the video is converted into images that serve as the maximum-resolution overall label images for training.
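As an illustration of extracting frames at a fixed frame rate, the following minimal sketch computes which frame indices to keep. The function name `sample_frame_indices` and its parameters are hypothetical and are not taken from the embodiment; decoding the video itself is left out.

```python
from typing import List


def sample_frame_indices(total_frames: int, video_fps: float, sample_fps: float) -> List[int]:
    """Return the indices of the frames to keep when sampling at sample_fps.

    Assumption: frames are kept at a constant stride, and a sampling rate
    higher than the video rate simply keeps every frame.
    """
    if total_frames < 0 or video_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame count must be >= 0 and frame rates must be > 0")
    # Keep one frame out of every `step` frames (never less than 1).
    step = max(1, round(video_fps / sample_fps))
    return list(range(0, total_frames, step))


# Example: a 10-frame clip recorded at 30 fps, sampled at 10 fps.
print(sample_frame_indices(10, 30.0, 10.0))
```

A lower sampling rate yields fewer, less similar frames, which matches the stated goal of reducing data repetition.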

And the maximum resolution integral skeleton point image acquisition module is suitable for extracting the skeleton point data of the maximum resolution integral label image and acquiring the maximum resolution integral skeleton point image.

After the maximum-resolution overall label image is obtained, the maximum-resolution overall skeleton point image acquisition module detects and extracts the skeleton point data. Specifically, the skeleton point data of the maximum-resolution overall label image may be extracted using a human body posture detection algorithm. The skeleton point data includes the skeleton point data of the face, the hands and the body, from which the maximum-resolution overall skeleton point image is obtained.

And the maximum resolution local label image acquisition module is suitable for acquiring a local mask region according to the maximum resolution whole bone point image and extracting a maximum resolution local label image from the maximum resolution whole label image according to the local mask region.
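One possible way to derive a local mask region from skeleton points is to take the bounding box of the points of one body part (for example the face) and enlarge it by a margin. The sketch below assumes that approach; the names `local_mask_box` and `crop_local` and the `margin` parameter are illustrative and do not come from the embodiment.

```python
import numpy as np


def local_mask_box(keypoints, image_shape, margin=0.2):
    """Return (top, left, bottom, right) of a padded box around the keypoints.

    keypoints: array-like of shape (N, 2) holding (x, y) pixel coordinates.
    margin: fraction of the box size added on each side (an assumption).
    """
    pts = np.asarray(keypoints, dtype=float)
    height, width = image_shape[:2]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    pad_x = margin * (x1 - x0)
    pad_y = margin * (y1 - y0)
    # Clamp the padded box to the image borders; bottom/right are exclusive.
    left = int(max(0, np.floor(x0 - pad_x)))
    top = int(max(0, np.floor(y0 - pad_y)))
    right = int(min(width, np.ceil(x1 + pad_x) + 1))
    bottom = int(min(height, np.ceil(y1 + pad_y) + 1))
    return top, left, bottom, right


def crop_local(image, box):
    """Extract the local image covered by the mask box."""
    top, left, bottom, right = box
    return image[top:bottom, left:right]


label = np.zeros((100, 100, 3))
box = local_mask_box([[10, 20], [30, 40]], label.shape, margin=0.0)
local_label = crop_local(label, box)
```

The same box can later be applied to a generated image, so that the local label image and the local generated image cover the same region.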

And the human body training image acquisition module is used for carrying out down-sampling on the maximum resolution integral label image, the maximum resolution integral bone point image and the maximum resolution local label image to obtain the human body training image.

After the maximum-resolution overall label image, the maximum-resolution overall skeleton point image and the maximum-resolution local label image are obtained, the human body training image acquisition module downsamples them. It can be understood that the number of downsampling stages plus 1 equals the number of resolution levels of the generator of the human body image generation model, and that the downsampling magnification must be the same as the magnification between the generator levels of the human body image generation model, so that human body training data at every resolution is obtained.

In one embodiment, the down-sampling may be performed using a means of image interpolation.
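The relationship "number of downsampling stages plus 1 equals the number of resolution levels" can be sketched as follows. This is a minimal example that assumes a magnification of 2 between levels and uses 2x2 block averaging as the interpolation; both choices, and the names `downsample_by_2` and `build_pyramid`, are assumptions made for illustration.

```python
import numpy as np


def downsample_by_2(image):
    """Halve height and width by averaging each 2x2 block of pixels."""
    h2, w2 = image.shape[0] // 2, image.shape[1] // 2
    trimmed = np.asarray(image, dtype=np.float64)[: h2 * 2, : w2 * 2]
    # Split rows and columns into (block index, position inside block).
    blocks = trimmed.reshape(h2, 2, w2, 2, *image.shape[2:])
    return blocks.mean(axis=(1, 3))


def build_pyramid(max_res_image, num_levels):
    """Return num_levels images ordered from lowest to highest resolution."""
    levels = [np.asarray(max_res_image, dtype=np.float64)]
    for _ in range(num_levels - 1):  # downsampling stages = levels - 1
        levels.append(downsample_by_2(levels[-1]))
    return levels[::-1]


img = np.arange(16, dtype=float).reshape(4, 4)
pyr = build_pyramid(img, 3)
print([p.shape for p in pyr])
```

The same routine would be applied to the overall label image, the overall skeleton point image and the local label image so that all three exist at every generator resolution.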

The human body training image obtained by the device not only obtains input data for training a human body image generation model, namely the whole skeleton point image with each level of resolution and the whole label data for judging the whole precision, but also obtains the local label data, thereby providing a reference basis for improving the precision of the local image in the subsequent optimization of the human body image generation model.

After the human body training image is obtained, the current-level resolution generator training unit 110 trains the current-level resolution generator.

It is understood that when the current level resolution is not the lowest level resolution, it indicates that the optimization of the model parameters of each level of the resolution generator having a resolution lower than the current level resolution has been completed.

In order to further improve the accuracy of the trained human body image generation model, in one embodiment, the current-stage overall generation image and the current-stage local generation image may be acquired by the following method.

Firstly, acquiring the current-level coding image characteristics of the current-level overall bone point image according to the current-level overall bone point image by utilizing a coding module of the current-level resolution generator.

Next, the current-level encoded image features are added to the previous-level decoded image features, which are obtained by the decoding module of the previous-level resolution generator (whose resolution is lower than the current-level resolution) from the previous-level overall skeleton point image, to obtain the current-level image features.

It will be appreciated that the features of the encoded image at the current level need to be of the same resolution as the features of the decoded image at the previous level, to ensure that the two can be added.

And then, decoding the current-level image features by using a decoding module of the current-level resolution generator to obtain the current-level overall generated image.

Finally, the current-level local generated image is extracted from the current-level overall generated image according to the position of the local label image in the overall label image.

In one embodiment, the current-level locally generated image may be acquired according to the local mask region.
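The four steps above (encode, add the previous-level decoded features, decode, crop the local region) can be sketched for a single level as follows. The `encode` and `decode` callables stand in for the coding and decoding modules and are placeholders, as are the function name `generate_level` and the box format; a real generator would use learned networks.

```python
import numpy as np


def generate_level(skeleton, prev_decoded_features, encode, decode, local_box):
    """Produce the overall and local generated images for one resolution level.

    prev_decoded_features is None at the lowest level (nothing to add).
    local_box is (top, left, bottom, right), bottom/right exclusive.
    """
    encoded = encode(skeleton)
    if prev_decoded_features is None:
        features = encoded
    else:
        # The two feature maps must match in shape before they can be added.
        if encoded.shape != prev_decoded_features.shape:
            raise ValueError("encoded and previous decoded features differ in shape")
        features = encoded + prev_decoded_features
    whole = decode(features)
    top, left, bottom, right = local_box
    local = whole[top:bottom, left:right]
    return whole, local


# Toy stand-ins for the coding and decoding modules.
whole, local = generate_level(
    np.ones((4, 4)),
    np.full((4, 4), 3.0),
    encode=lambda s: s * 2.0,
    decode=lambda f: f + 1.0,
    local_box=(1, 1, 3, 3),
)
```

Cropping the local image out of the overall generated image, rather than generating it separately, is what lets a single generator be penalized for both whole-image and local errors.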

It is to be understood that the images of each level "with a resolution lower than or equal to the current-level resolution" include, for each of the four image types (overall generated images, overall label images, local generated images and local label images), both the images whose resolution is lower than the current-level resolution and the image whose resolution equals the current-level resolution.

To improve the accuracy with which the current-level overall loss is obtained, the current-level resolution generator training unit 110 of the human body image generation model training apparatus provided by the embodiment of the present invention may include:

and the current-stage overall loss acquisition module is suitable for calculating the current-stage overall loss according to the overall generated images and the overall label images of all stages with the resolution lower than or equal to the resolution of the current stage.

When the current-level resolution is the lowest-level resolution, only the lowest-level resolution discriminator needs to be used, to extract the discriminator features of the lowest-level overall generated image and of the lowest-level overall label image. When the current-level resolution is not the lowest-level resolution, the discriminator of each corresponding resolution needs to be used to extract the discriminator features of the current-level overall generated image and the current-level overall label image, as well as the discriminator features of the overall generated image and the overall label image of each level whose resolution is lower than the current-level resolution. The overall generated images of the lower levels are produced by the generators of those resolutions, which have already been trained.

D_i: the i-th level resolution discriminator;

G_i(x): the image generated by the i-th level generator;

y_i: the i-th level label image;

L_GAN: the WGAN-GP loss function.

The calculation uses the following formula, wherein:

m: the number of overall image features extracted by the discriminator from the image at one resolution level;

D_ik: the k-th layer image features extracted by the i-th level resolution discriminator;

G_i(x): the image generated by the i-th level generator;

y_i: the i-th level label image.

When the current-level overall perceptual loss is calculated, the VGG-model features of the overall generated image and of the overall label image are first extracted at each level. When the current-level resolution is the lowest-level resolution, only the lowest-level VGG-model overall generated image features and the lowest-level VGG-model overall label image features need to be acquired. When the current-level resolution is not the lowest-level resolution, the current-level VGG-model overall generated image features and overall label image features need to be acquired, together with the VGG-model overall generated image features and overall label image features of each level whose resolution is lower than the current-level resolution.

wherein:

G_i(x): the image generated by the i-th level generator;

y_i: the i-th level label image;

φ(x): the VGG model;

C_ik H_ik W_ik: the size of the image feature of the k-th layer at the i-th level;

L_GAN: the WGAN-GP loss function.
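Because the formula itself is not reproduced in this text, the following is only a sketch of one common form of a perceptual loss that is consistent with the symbols listed: for every level i and feature layer k, the absolute difference between generated and label features is summed and divided by the feature size C·H·W. The use of an L1 difference, the plain summation over levels and layers, and the name `perceptual_loss` are all assumptions; feature extraction by the VGG model is replaced by precomputed arrays.

```python
import numpy as np


def perceptual_loss(generated_feats, label_feats):
    """Sum over levels i and layers k of |phi_k(G_i(x)) - phi_k(y_i)| / (C*H*W).

    generated_feats[i][k] and label_feats[i][k] are feature maps of shape
    (C, H, W) for level i, layer k. The L1 form is an assumption.
    """
    total = 0.0
    for gen_level, lab_level in zip(generated_feats, label_feats):
        for g, y in zip(gen_level, lab_level):
            g = np.asarray(g, dtype=np.float64)
            y = np.asarray(y, dtype=np.float64)
            total += np.abs(g - y).sum() / g.size  # g.size == C * H * W
    return float(total)


# Two levels with one feature layer each (toy values).
gen = [[np.ones((2, 3, 3))], [np.full((1, 2, 2), 3.0)]]
lab = [[np.zeros((2, 3, 3))], [np.ones((1, 2, 2))]]
loss = perceptual_loss(gen, lab)
```

Passing only the lowest-level features reproduces the lowest-level case, and passing the current level plus all lower levels reproduces the general case described above.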

And the current-stage local loss acquisition module is suitable for calculating the current-stage local loss according to each stage of the local generation image and each stage of the local label image, the resolution of which is lower than or equal to the resolution of the current stage.

The trained VGG model is used to extract the VGG-model local generated image features and the VGG-model local label image features of the local generated image at each level, and the current-level local perceptual loss is calculated from these features.

And the current-level comprehensive loss acquisition module is suitable for acquiring the current-level comprehensive loss according to the current-level whole-image loss (computed from the overall generated images and overall label images) and the current-level local loss.

After the current-level whole-image loss and the current-level local loss are obtained, the current-level comprehensive loss is further obtained. Calculating the comprehensive loss of the current-level resolution discriminator shows whether the discrimination precision of the discriminator meets the requirement, that is, whether it can recognize the difference between generated images and label images; this directly affects the comprehensive loss of the current-level resolution generator. Therefore, when the comprehensive loss of the current-level resolution generator is obtained, the comprehensive loss of the current-level resolution discriminator also needs to be obtained.

$$L_{Gen} = L_{vgg} + \lambda_{FM} L_{FM} + \lambda_{Dis} L_{Dis} + \lambda_{Local}\left(L_{Local\_vgg} + \lambda_{FM} L_{Local\_FM} + \lambda_{Dis} L_{Local\_Dis}\right)$$

wherein:

λ_FM: the weight coefficient of the current-level multilayer feature loss;

λ_Dis: the weight coefficient of the current-level resolution discriminator loss;

λ_Local: the weight coefficient of the local image loss.

$$L_{total\_Dis} = L_{Dis} + \lambda_{Local} L_{Local\_Dis}$$

wherein: λ_Local: the weight coefficient of the local image loss.
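The two weighted sums can be written down directly. In the sketch below the individual loss values are plain numbers supplied by the caller, the container `LossWeights` and the dictionary keys are hypothetical, and the last local term of the generator loss is assumed to be the local discriminator loss weighted by λ_Dis.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    fm: float     # lambda_FM, weight of the multilayer feature loss
    dis: float    # lambda_Dis, weight of the discriminator loss
    local: float  # lambda_Local, weight of the local image loss


def generator_total_loss(whole, local, w):
    """L_Gen: whole-image terms plus lambda_Local times the local terms.

    whole and local are dicts with keys "vgg", "fm" and "dis".
    """
    whole_term = whole["vgg"] + w.fm * whole["fm"] + w.dis * whole["dis"]
    local_term = local["vgg"] + w.fm * local["fm"] + w.dis * local["dis"]
    return whole_term + w.local * local_term


def discriminator_total_loss(whole_dis, local_dis, w):
    """L_total_Dis = L_Dis + lambda_Local * L_Local_Dis."""
    return whole_dis + w.local * local_dis


weights = LossWeights(fm=10.0, dis=1.0, local=0.5)
l_gen = generator_total_loss(
    {"vgg": 1.0, "fm": 0.2, "dis": 0.3},
    {"vgg": 2.0, "fm": 0.1, "dis": 0.4},
    weights,
)
l_dis = discriminator_total_loss(0.6, 0.8, weights)
```

The weight values shown are arbitrary; the text does not specify them.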

Obtaining the comprehensive loss of the current-level resolution generator and the comprehensive loss of the current-level resolution discriminator yields the current-level comprehensive loss. It is then judged whether the current-level comprehensive loss meets the loss threshold; specifically, whether the comprehensive loss of the current-level resolution generator meets the generator loss threshold and whether the comprehensive loss of the current-level resolution discriminator meets the discriminator loss threshold. If both are met, the precision of the human body image generated by the current-level resolution generator meets the requirement, and the trained current-level resolution generator is obtained, along with the trained current-level resolution discriminator and the trained resolution generators of all levels whose resolution is lower than the current-level resolution. If the loss threshold is not met, the current-level resolution generator needs to be optimized. As the generator is optimized, the discrimination capability of the current-level resolution discriminator also needs to be further optimized; to this end, in one embodiment, the current-level resolution generator may be optimized in the following manner:

First, the parameters of the current-level resolution generator are fixed, and the parameters of the current-level resolution discriminator are adjusted according to the overall loss of the current-level resolution discriminator until that loss meets the discriminator overall loss threshold, so as to obtain a trained current-level resolution discriminator.

It is easy to understand that, after the parameters of the current-level resolution discriminator are adjusted, the overall loss of the current-level resolution generator and the overall loss of the current-level resolution discriminator need to be recalculated and the judgment repeated, until the overall loss of the current-level resolution discriminator meets the discriminator overall loss threshold.

Then, the parameters of the trained current level resolution discriminator are fixed, and the parameters of the current level resolution generator are adjusted according to the overall loss of the current level resolution generator until the overall loss of the current level resolution generator meets a generator overall loss threshold value.

Similarly, after the parameter adjustment of the current level resolution generator is performed, the overall loss of the current level resolution generator and the overall loss of the current level resolution discriminator also need to be calculated again, and the judgment is performed until the overall loss of the current level resolution generator meets the generator overall loss threshold.

The adjusted parameters of the current-level resolution generator are then fixed again and the parameters of the current-level resolution discriminator are adjusted again, and so on, until the overall loss of the current-level resolution generator meets the generator loss threshold and the overall loss of the current-level resolution discriminator meets the discriminator loss threshold.

It will be appreciated that in another embodiment, the current level resolution generator may be trained first, followed by the current level resolution discriminator.
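The alternating procedure can be sketched as a loop that updates the discriminator with the generator fixed, then the generator with the discriminator fixed, and repeats until both losses meet their thresholds. The callables `gen_step`, `dis_step` and `eval_losses`, the interpretation of "meets the threshold" as "less than or equal to", and the iteration caps are assumptions for illustration.

```python
def train_level(gen_step, dis_step, eval_losses, gen_threshold, dis_threshold,
                max_rounds=50, max_inner=1000):
    """Alternate discriminator and generator updates for one resolution level.

    eval_losses() returns (generator_loss, discriminator_loss).
    Returns (rounds_used, generator_loss, discriminator_loss).
    """
    for round_index in range(max_rounds):
        gen_loss, dis_loss = eval_losses()
        if gen_loss <= gen_threshold and dis_loss <= dis_threshold:
            return round_index, gen_loss, dis_loss
        # Generator fixed: adjust the discriminator until its threshold is met.
        for _ in range(max_inner):
            if eval_losses()[1] <= dis_threshold:
                break
            dis_step()
        # Discriminator fixed: adjust the generator until its threshold is met.
        for _ in range(max_inner):
            if eval_losses()[0] <= gen_threshold:
                break
            gen_step()
    gen_loss, dis_loss = eval_losses()
    if gen_loss <= gen_threshold and dis_loss <= dis_threshold:
        return max_rounds, gen_loss, dis_loss
    raise RuntimeError("loss thresholds not reached within max_rounds")


# Toy model: each update halves the corresponding loss.
state = {"g": 1.0, "d": 1.0}


def toy_gen_step():
    state["g"] *= 0.5


def toy_dis_step():
    state["d"] *= 0.5


result = train_level(toy_gen_step, toy_dis_step,
                     lambda: (state["g"], state["d"]), 0.1, 0.1)
```

Swapping the two inner loops gives the other embodiment, in which the generator is trained first and the discriminator second.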

After obtaining the current resolution generator after training, the human body image generation model obtaining unit 120 further determines whether all the generators with all resolutions of the resolution level have been optimized, if yes, obtains the human body image generation model after training, and if not, needs to train the resolution generators of all levels that have not been trained.

First, the current-level resolution is adjusted according to the resolution levels to obtain the adjusted current-level resolution, which is generally one level higher than the resolution just trained.

Then, to keep the training of the current-level resolution generator from being complicated by changes in the model parameters of the lower-resolution generators, the model parameters of every resolution generator whose resolution is lower than the current-level resolution need to be fixed; these parameters have, of course, already been obtained through training. Image generation and loss judgment are then performed again, until the trained human body image generation model is obtained.

It can be seen that the human body image generation model training apparatus provided by the embodiment of the present invention trains the resolution generators of each level of the image generation model in sequence, starting from the lowest-level resolution, and thereby ensures the accuracy of the image generated by each level in turn. During the training of the current-level resolution generator, the current-level overall generated image and the current-level local generated image are acquired at the same time, and the current-level resolution generator is optimized using the losses of both, so that the trained current-level resolution generator ensures the generation precision of both the overall generated image and the local generated image. As a result, when the human body image generation model obtained in this way is used to generate a human body image, the required image can be generated directly by that one model, and both the overall and the local precision of the generated human body image are high.

To further improve the accuracy of the human body image generation model training method, an embodiment of the present invention further provides a human body image generation model training apparatus which, after training the current-level resolution generator with the model parameters of every resolution generator whose resolution is lower than the current-level resolution fixed, unfreezes the model parameters of those lower-resolution generators. Then, starting from the lowest-level resolution, the generator at the unfrozen current-level resolution is used to generate an unfrozen current-level overall generated image and an unfrozen current-level local generated image from the overall skeleton point image at the unfrozen current-level resolution; an unfrozen current-level overall loss is obtained from the overall generated images, overall label images, local generated images and local label images of each level whose resolution is lower than or equal to the unfrozen current-level resolution; and the model parameters of the generator at the unfrozen current-level resolution are optimized according to that loss until it satisfies a loss threshold, yielding a trained generator at the unfrozen current-level resolution. The unfrozen current-level resolution is then adjusted according to the resolution levels, and the generator at the adjusted resolution is optimized, until the unfrozen current-level resolution equals the current-level resolution.

The human body image generation model training device provided by the embodiment of the invention not only needs to train the current resolution generator under the condition of fixing the model parameters of each level of resolution generator with the resolution lower than the current resolution, but also needs to unfreeze the model parameters of each level of resolution generator with the resolution lower than the current resolution after the training in the state is completed, and then further trains each level of resolution generator with the resolution lower than the current resolution and the current level resolution generator in the unfreezing state, thereby further improving the training precision on the basis of ensuring the training speed.
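The order of the frozen and unfrozen training passes can be made concrete with a small schedule generator. This is one reading of the description, with levels numbered from 0 (lowest resolution); the function name `training_schedule`, the tuple format, and the choice to skip the unfrozen pass at level 0 (which has no lower levels to unfreeze) are assumptions.

```python
from typing import List, Tuple


def training_schedule(num_levels: int) -> List[Tuple[str, int]]:
    """List the training passes as (phase, level being optimized).

    "frozen":   train the current level with all lower levels fixed.
    "unfrozen": after unfreezing, optimize from the lowest level up to
                and including the current level.
    """
    schedule = []
    for current in range(num_levels):
        schedule.append(("frozen", current))
        if current > 0:
            for thaw_level in range(current + 1):
                schedule.append(("unfrozen", thaw_level))
    return schedule


for phase, level in training_schedule(3):
    print(phase, level)
```

Each entry would correspond to one run of the threshold-driven optimization for that level, first with the lower levels fixed for speed and then with them unfrozen for precision.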

Fig. 8 is a block diagram of a human body image generating apparatus provided in an embodiment of the present invention, where the human body image generating apparatus is applicable to both a client and a server, and referring to fig. 8, the human body image generating apparatus may include:

a human bone point image obtaining unit 200 adapted to obtain a human bone point image;

the human body image obtaining unit 210 is adapted to obtain a human body image by using the trained human body image generation model.

The human skeleton point image obtaining unit 200 obtains the human skeleton point image of the human body image to be generated. Specifically, the human skeleton point image may be obtained by extracting human skeleton points from other images or by motion modeling, which is not limited herein. The human body image obtaining unit 210 then generates a human body image using the human body image generation model trained by the human body image generation model training method.

The human body image generating apparatus provided by the embodiment of the invention generates the human body image using the human body image generation model trained by the human body image generation model training method. Because the generation precision of that model is high, both the overall precision of the human body image and the precision of the local details of the directly generated human body image are high. An image meeting the precision requirement can be generated directly by a single human body image generation model, which reduces the amount of calculation and the operation time and lowers the requirements on computer hardware.

Of course, when the human body image obtaining unit 210 generates the human body image by using the trained human body image generation model, the following operations may be performed:

Therefore, when generating the human body image, the human body image generating device provided by the embodiment of the invention needs to refer to the current-level whole skeleton point image and combine the previous-level decoded image feature with the lower resolution level when acquiring the current-level whole generated image with the higher resolution level, so that the precision of the human body image acquired by the human body image generating model can be further improved.

To solve the foregoing problem, an embodiment of the present invention further provides a device, which may load the above program module architecture in the form of a program so as to implement the human body image generation model training method or the human body image generation method provided in the embodiment of the present invention. The hardware device may be an electronic device with data processing capability, such as a terminal device or a server device.

Optionally, fig. 9 shows an optional hardware architecture of the device provided in the embodiment of the present invention, which may include at least one memory 3 and at least one processor 1. The memory stores a program, and the processor calls the program to execute the human body image generation model training method or the human body image generation method. In addition, the device may also include at least one communication interface 2 and at least one communication bus 4. The processor 1 and the memory 3 may be located in the same electronic device, for example in a server device or a terminal device; the processor 1 and the memory 3 may also be located in different electronic devices.

As an optional implementation of the disclosure of the embodiment of the present invention, the memory 3 may store a program, and the processor 1 may call the program to execute the human body image generation model training method or the human body image generation method provided by the above embodiment of the present invention.

In the embodiment of the invention, the electronic device can be a tablet computer, a notebook computer and other devices capable of performing human body image generation model training.

In the embodiment of the present invention, the number of the processor 1, the communication interface 2, the memory 3, and the communication bus 4 is at least one, and the processor 1, the communication interface 2, and the memory 3 complete mutual communication through the communication bus 4; it is obvious that the communication connection of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 shown in fig. 9 is only an alternative;

optionally, the communication interface 2 may be an interface of a communication module, such as an interface of a GSM module;

the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the invention.

The memory 3 may comprise a high-speed RAM memory and may also comprise a non-volatile memory, such as at least one disk memory.

It should be noted that the above terminal device may further include other devices (not shown) that may not be necessary for the disclosure of the embodiment of the present invention; in view of these other elements that may not be necessary to understand the disclosure of embodiments of the present invention, embodiments of the present invention are not specifically described.

In addition, an embodiment of the present invention further provides a computer-readable storage medium, which stores a program suitable for training a human body image generation model; when the program is executed by a processor, it implements the human body image generation model training method as described above.

Embodiments of the present invention also provide a computer-readable storage medium, in which a program suitable for human body image generation is stored, and when the program is executed by a processor, the computer-readable storage medium can implement the human body image generation method as described above.

When the human body image generation model training method is implemented, the computer-executable instructions stored in the storage medium provided by the embodiment of the invention train the resolution generators of each level of the image generation model in sequence, starting from the lowest-level resolution, and thereby ensure the accuracy of the image generated by each level in turn. During the training of the current-level resolution generator, the current-level overall generated image and the current-level local generated image are acquired at the same time, and the current-level resolution generator is optimized using the losses of both, so that the trained current-level resolution generator ensures the generation precision of both the overall generated image and the local generated image. As a result, when the human body image generation model obtained in this way is used to generate a human body image, the required image can be generated directly by that one model, and both the overall and the local precision of the generated human body image are high.

When the human body image generation method is implemented, the human body image is generated using the human body image generation model trained by the human body image generation model training method. Because the generation precision of that model is high, both the overall precision of the human body image and the precision of the local details of the directly generated human body image are high. An image meeting the precision requirement can be generated directly by a single human body image generation model, which reduces the amount of calculation and the calculation time and lowers the requirements on computer hardware.

The embodiments of the present invention described above are combinations of elements and features of the present invention. Unless otherwise mentioned, the elements or features may be considered optional. Each element or feature may be practiced without being combined with other elements or features. In addition, the embodiments of the present invention may be configured by combining some elements and/or features. The order of operations described in the embodiments of the present invention may be rearranged. Some configurations of any embodiment may be included in another embodiment, and may be replaced with corresponding configurations of the other embodiment. It is obvious to those skilled in the art that claims that are not explicitly cited in each other in the appended claims may be combined into an embodiment of the present invention or may be included as new claims in a modification after the filing of the present application.

Embodiments of the invention may be implemented by various means, such as hardware, firmware, software, or a combination thereof. In a hardware configuration, the method according to an exemplary embodiment of the present invention may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and the like.

In a firmware or software configuration, embodiments of the present invention may be implemented in the form of modules, procedures, functions, and the like. The software codes may be stored in memory units and executed by processors. The memory unit is located inside or outside the processor, and may transmit and receive data to and from the processor via various known means.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Although the embodiments of the present invention have been disclosed, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A human body image generation model training method is characterized by comprising the following steps:

acquiring human body training images, wherein the number of resolution levels of the human body training images is equal to the number of resolution levels of the generators of a human body image generation model, the resolution of each level of the human body training images is equal to the resolution of the resolution generator of the corresponding level, and the human body training images comprise an overall label image, a local label image and an overall skeleton point image which identify the same image;

2. The human body image generation model training method as set forth in claim 1, wherein the step of generating the current-level overall generated image and the current-level local generated image according to the current-level overall skeleton point image having the current-level resolution by using the current-level resolution generator comprises:

acquiring current-level encoded image features of the current-level overall skeleton point image by using an encoding module of the current-level resolution generator;

decoding the current-level encoded image features by using a decoding module of the current-level resolution generator to obtain the current-level overall generated image;

and extracting the current-level local generated image from the current-level overall generated image according to the position of the local label image in the overall label image.

3. The human body image generation model training method of claim 2, wherein the step of acquiring the current-level overall loss by using the overall generated images, the overall label images, the local generated images and the local label images of all levels whose resolution is lower than or equal to the current-level resolution comprises:

calculating a current-level integral loss according to the overall generated images and the overall label images of all levels whose resolution is lower than or equal to the current-level resolution;

calculating a current-level local loss according to the local generated images and the local label images of all levels whose resolution is lower than or equal to the current-level resolution;

and acquiring the current-level overall loss according to the current-level integral loss and the current-level local loss.
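
As an illustrative, non-limiting sketch of the combination recited in claim 3, the following Python code sums a per-level loss over every level whose resolution is lower than or equal to the current level, once for the overall images and once for the local images, and then combines the two sums. The mean absolute difference used here is only a stand-in for the discriminator, multi-layer feature and perceptual losses of the later claims. The flattened-list image representation, the function names and the `local_weight` parameter are assumptions made for the example and do not appear in the claims.

```python
from typing import Sequence

Image = Sequence[float]  # hypothetical representation: a flattened image


def l1_loss(generated: Image, label: Image) -> float:
    """Mean absolute difference; a stand-in for the losses of the later claims."""
    if len(generated) != len(label):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(g - t) for g, t in zip(generated, label)) / len(generated)


def current_level_overall_loss(
    overall_generated: Sequence[Image],
    overall_label: Sequence[Image],
    local_generated: Sequence[Image],
    local_label: Sequence[Image],
    current_level: int,
    local_weight: float = 1.0,  # assumed weighting, not specified in the claims
) -> float:
    """Combine integral and local losses over levels 0..current_level.

    All sequences are indexed by level, lowest resolution first.
    """
    levels = range(current_level + 1)
    integral_loss = sum(
        l1_loss(overall_generated[k], overall_label[k]) for k in levels
    )
    local_loss = sum(
        l1_loss(local_generated[k], local_label[k]) for k in levels
    )
    return integral_loss + local_weight * local_loss
```

A weighted sum is one simple way to combine the two terms; other combinations are equally compatible with the wording of the claim.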

4. The human body image generation model training method of claim 3, wherein the step of calculating the current-level integral loss according to the overall generated images and the overall label images of all levels whose resolution is lower than or equal to the current-level resolution comprises:

extracting, by using the resolution discriminators of the respective levels in a convolutional neural network, discriminator features of the overall generated image of each level and discriminator features of the overall label image of each level, and calculating a current-level integral discriminator loss and a current-level integral multi-layer feature loss according to the discriminator features of the overall generated images and of the overall label images;

and extracting, by using a trained VGG model, VGG features of the overall generated image of each level and VGG features of the overall label image of each level, and calculating a current-level integral perceptual loss according to the VGG features of the overall generated images and of the overall label images.
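
The multi-layer feature loss and the perceptual loss of claim 4 both compare features extracted layer by layer from a generated image and from its label image. A minimal Python sketch of such a comparison is given below. The claim does not state the distance measure, so the per-layer mean absolute difference averaged over layers is an assumption, as are the function name and the representation of each layer's features as a flat list of numbers. In practice the features would come from the discriminator or from the trained VGG model.

```python
from typing import Sequence


def multi_layer_feature_loss(
    generated_features: Sequence[Sequence[float]],
    label_features: Sequence[Sequence[float]],
) -> float:
    """Average, over layers, of the mean absolute feature difference.

    Each argument holds one flattened feature vector per network layer.
    The L1 distance is an assumed choice, not taken from the claims.
    """
    if len(generated_features) != len(label_features):
        raise ValueError("both inputs must cover the same layers")
    if not generated_features:
        raise ValueError("at least one layer is required")
    total = 0.0
    for gen_layer, label_layer in zip(generated_features, label_features):
        if len(gen_layer) != len(label_layer):
            raise ValueError("feature sizes must match within a layer")
        layer_diff = sum(abs(g - t) for g, t in zip(gen_layer, label_layer))
        total += layer_diff / len(gen_layer)
    return total / len(generated_features)
```

Normalising each layer by its own size keeps large, early feature maps from dominating the smaller, deeper ones.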

5. The human image generative model training method of claim 4, wherein the step of calculating the local loss at the current stage according to the locally generated images at each stage and the locally labeled images at each stage comprises:

6. The human image generative model training method as claimed in claim 5 wherein said step of deriving said current stage global loss from said current stage global loss and said current stage local loss comprises:

7. The human body image generation model training method as claimed in claim 6, wherein the step of optimizing the model parameters of the current-level resolution generator according to the current-level overall loss until the current-level overall loss satisfies a loss threshold comprises:

setting a training sequence of a current level resolution discriminator and a current level resolution generator, and sequentially training the current level resolution discriminator and the current level resolution generator according to the training sequence;

when the current level resolution generator is trained, fixing the parameters of the current level resolution discriminator, and adjusting the parameters of the current level resolution generator according to the overall loss of the current level resolution generator until the overall loss of the current level resolution generator meets a generator overall loss threshold value, so as to obtain the trained current level resolution generator.

8. The human image generative model training method of claim 1, wherein between the step of obtaining a trained current level resolution generator and the step of adjusting the current level resolution according to the resolution level to obtain an adjusted current level resolution and optimizing the adjusted current level resolution generator until all resolution generators of the resolution level complete the optimization, further comprising:

unfreezing the model parameters of the resolution generators of all levels whose resolution is lower than the current-level resolution; starting from the lowest-level resolution, generating, by using the unfrozen current-level resolution generator, an unfrozen current-level overall generated image and an unfrozen current-level local generated image according to the unfrozen current-level overall skeleton point image having the unfrozen current-level resolution; acquiring an unfrozen current-level overall loss by using the unfrozen overall generated images, the unfrozen overall label images, the unfrozen local generated images and the unfrozen local label images of all levels whose resolution is lower than or equal to the unfrozen current-level resolution; and optimizing the model parameters of the unfrozen current-level resolution generator according to the unfrozen current-level overall loss until the unfrozen current-level overall loss meets a loss threshold, to obtain a trained unfrozen current-level resolution generator;

and adjusting the unfrozen current-level resolution according to the resolution levels to obtain an adjusted unfrozen current-level resolution, and optimizing the generator of the adjusted unfrozen current-level resolution until the unfrozen current-level resolution is equal to the current-level resolution.
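
The order of the training passes implied by claim 8 can be illustrated by the following Python sketch, which only enumerates the passes and does not train anything. Each level is first optimized with all lower-level generators frozen; the lower levels are then unfrozen and optimized one by one, from the lowest level up to the current level, before the next level is started. The tuple layout and the names are assumptions made for the example, and skipping the unfrozen pass at the lowest level (where there is nothing to unfreeze) is one reading of the claim rather than a stated requirement.

```python
from typing import Iterator, List, Tuple

# (phase, level being optimized, levels whose parameters are fixed)
Step = Tuple[str, int, List[int]]


def training_schedule(num_levels: int) -> Iterator[Step]:
    """Yield the training passes, with level 0 the lowest resolution."""
    for current in range(num_levels):
        # Pass 1: optimize the current level, lower levels fixed.
        yield ("frozen", current, list(range(current)))
        if current == 0:
            continue  # assumed: no lower-level generators to unfreeze yet
        # Pass 2: unfreeze and optimize from the lowest level upward
        # until the unfrozen level equals the current level.
        for unfrozen_level in range(current + 1):
            yield ("unfrozen", unfrozen_level, [])


if __name__ == "__main__":
    for step in training_schedule(3):
        print(step)
```

For three levels this yields seven passes: one frozen pass per level, plus two unfrozen passes after level 1 and three after level 2.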

9. The human image generative model training method of any one of claims 1 to 8, wherein the step of acquiring human training images comprises:

acquiring a maximum-resolution overall label image;

extracting the skeleton point data of the maximum-resolution overall label image to obtain a maximum-resolution overall skeleton point image;

acquiring a local mask region according to the maximum-resolution overall skeleton point image, and extracting a maximum-resolution local label image from the maximum-resolution overall label image according to the local mask region;

and downsampling the maximum-resolution overall label image, the maximum-resolution overall skeleton point image and the maximum-resolution local label image to obtain the human body training images.
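
The downsampling step of claim 9 can be sketched as follows in Python: a maximum-resolution image is repeatedly reduced so that one image exists for each resolution level. The claim does not specify the downsampling factor or filter, so the factor of 2 and the 2x2 average pooling used here are assumptions, as are the function names and the single-channel nested-list image representation.

```python
from typing import List

Image = List[List[float]]  # hypothetical: rows of single-channel pixel values


def downsample_2x(image: Image) -> Image:
    """Halve both dimensions by averaging each 2x2 block (assumed filter)."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def build_pyramid(max_resolution_image: Image, num_levels: int) -> List[Image]:
    """Return one image per resolution level, lowest resolution first."""
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    levels = [max_resolution_image]
    for _ in range(num_levels - 1):
        levels.append(downsample_2x(levels[-1]))
    return levels[::-1]
```

The same routine would be applied to the overall label image, the overall skeleton point image and the local label image, so that the three images of a given level share one resolution.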

10. The human body image generation model training method as set forth in claim 9, wherein the step of generating the current-level overall generated image and the current-level local generated image according to the current-level overall skeleton point image having the current-level resolution by using the current-level resolution generator comprises:

generating the current-level overall generated image according to the current-level overall skeleton point image having the current-level resolution by using the current-level resolution generator;

and extracting the current-level local generated image from the current-level overall generated image according to the local mask region.
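
The extraction recited in claim 10 amounts to cropping the overall generated image with the local mask region. A minimal Python sketch is given below, assuming that the mask region is an axis-aligned rectangle defined at the maximum resolution and that it is rescaled by integer division for the lower levels. The rectangle format `(top, left, height, width)`, the function names and the rescaling rule are assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[float]]          # hypothetical: rows of pixel values
Box = Tuple[int, int, int, int]    # assumed layout: (top, left, height, width)


def scale_box(box: Box, factor: int) -> Box:
    """Rescale a maximum-resolution mask region to a level reduced by `factor`."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    top, left, height, width = box
    # Keep at least one pixel so that the crop is never empty.
    return (top // factor, left // factor,
            max(1, height // factor), max(1, width // factor))


def crop_local_image(overall_image: Image, box: Box) -> Image:
    """Extract the local image covered by the mask region."""
    top, left, height, width = box
    if top < 0 or left < 0:
        raise ValueError("mask region must start inside the image")
    if top + height > len(overall_image) or left + width > len(overall_image[0]):
        raise ValueError("mask region exceeds the image bounds")
    return [row[left:left + width] for row in overall_image[top:top + height]]
```

Cropping the generated image and the label image with the same region keeps the local generated image and the local label image aligned pixel for pixel.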

11. A human body image generation method, comprising:

acquiring a human skeleton point image;

acquiring a human body image by using a human body image generation model trained by the human body image generation model training method of any one of claims 1-10.

12. A human body image generation model training device is characterized by comprising:

the current-level resolution generator training unit is suitable for fixing model parameters of all levels of resolution generators with resolution lower than the current level resolution, starting from the lowest level resolution, generating a current-level overall generated image and a current-level local generated image by using the current-level resolution generator according to a current-level overall skeleton point image with the current level resolution, acquiring current-level overall loss by using all levels of overall generated images, all levels of overall label images, all levels of local generated images and all levels of local label images with resolution lower than or equal to the current level resolution, and optimizing the model parameters of the current-level resolution generator according to the current-level overall loss until the current-level overall loss meets a loss threshold value to obtain the trained current-level resolution generator;

13. A human body image generation apparatus, characterized by comprising:

a human body image acquisition unit adapted to acquire a human body image by using a human body image generation model trained by the human body image generation model training method of any one of claims 1-10.

14. An electronic device comprising at least one memory and at least one processor; the memory stores a program that the processor calls to execute the human body image generation model training method according to any one of claims 1 to 10 or the human body image generation method according to claim 11.

15. A storage medium characterized in that the storage medium stores a program adapted for training a human image generative model to implement the human image generative model training method as set forth in any one of claims 1 to 10.

16. A storage medium characterized in that it stores a program adapted for human body image generation to realize the human body image generation method as claimed in claim 11.