CN115019128A - Image generation model training method, image generation method and related device - Google Patents

Image generation model training method, image generation method and related device

Info

Publication number
CN115019128A
Authority
CN
China
Prior art keywords
image
sample
model
generated
random noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210625806.4A
Other languages
Chinese (zh)
Inventor
杜鸿飞
邓攀
刘明
王晓敏
龚海刚
刘明辉
程旋
邓佳丽
解天舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210625806.4A
Publication of CN115019128A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image generation model training method, an image generation method and a related device. The method includes: acquiring a random noise sample and an image sample; and iteratively training an initial generative adversarial network on the random noise sample and the image sample until a pre-constructed loss function reaches a set condition, taking the trained generative adversarial network as the image generation model. The loss function is constructed from distance information between generated images obtained from the random noise sample, discrimination information of the generated images and the image sample, and preset parameter information; the iterative training is used to reduce the similarity between the generated images. For highly similar noise inputs, the invention can produce different generated images and enlarge the distance between the distribution modes corresponding to the generated images, thereby alleviating the mode collapse problem.

Description

Image generation model training method, image generation method and related device
Technical Field
The invention relates to the technical field of deep learning, in particular to an image generation model training method, an image generation method and a related device.
Background
A generative adversarial network (GAN) is a deep learning model and one of the most promising methods of recent years for unsupervised learning on complex distributions; it has attracted much attention in the fields of image generation, image restoration, image conversion, and the like.
The generative adversarial network is composed of a generator and a discriminator, which are in an adversarial relationship. The discriminator learns to distinguish real samples from the fake samples produced by the generator, while the generator learns to make the discriminator regard its generated samples as real. As training progresses, the discriminator's resolving power grows stronger and the samples produced by the generator come closer to the real samples. However, GAN training has too many degrees of freedom; the generator and discriminator easily fall into an abnormal adversarial state, mode collapse occurs, and the diversity of the generated images becomes insufficient; under severe mode collapse, all generated pictures are identical.
At present, most methods for solving the mode collapse problem start from the model structure. However, changing the model increases the consumption of computing resources and the complexity of model training, so how to provide a simple and effective training method is a technical problem to be solved.
Disclosure of Invention
An objective of the present invention is to provide an image generation model training method, an image generation method and a related apparatus, so as to solve the problem of mode collapse and improve the image generation quality.
In a first aspect, the present invention provides a method for training an image generation model, the method including: acquiring a random noise sample and an image sample; performing iterative training on an initial generative adversarial network according to the random noise sample and the image sample until a pre-constructed loss function reaches a set condition, and taking the trained generative adversarial network as an image generation model; the loss function is constructed based on distance information between generated images obtained from the random noise sample, discrimination information of the generated images and the image sample, and preset parameter information; the iterative training is used to reduce the similarity between the generated images.
In a second aspect, the present invention provides an image generation method, the method comprising: obtaining a random noise sample; inputting the random noise sample into an image generation model, and generating a first image and a second image, wherein the first image and the second image are different; the image generation model is obtained by the image generation model training method as provided in the first aspect.
In a third aspect, the present invention provides an image generation model training apparatus, including: an acquisition module, configured to acquire a random noise sample and an image sample; and a training module, configured to perform iterative training on an initial generative adversarial network according to the random noise sample and the image sample until a pre-constructed loss function reaches a set condition, and to take the trained generative adversarial network as an image generation model; in each iterative training, the loss function is constructed based on distance information between generated images obtained from the random noise sample, discrimination information of the generated images and the image samples, and preset parameter information; the iterative training is used to reduce the similarity between the generated images.
In a fourth aspect, the present invention provides an image generating apparatus comprising: an obtaining module, configured to obtain a random noise sample; the generation module is used for inputting the random noise samples into an image generation model to generate a first image and a second image; the first image and the second image are different; the image generation model is obtained by the image generation model training method according to the first aspect.
In a fifth aspect, the invention provides an electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being capable of executing the computer program to implement the method according to the first aspect or the method according to the second aspect.
In a sixth aspect, the present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect or the method according to the second aspect.
The invention provides an image generation model training method, an image generation method and a related device. The method includes: acquiring a random noise sample and an image sample; and iteratively training an initial generative adversarial network on the random noise sample and the image sample until a pre-constructed loss function reaches a set condition, taking the trained generative adversarial network as the image generation model. The loss function is constructed based on distance information between generated images obtained from the random noise sample, discrimination information of the generated images and the image sample, and preset parameter information; the iterative training is used to reduce the similarity between the generated images. Unlike prior-art approaches that address mode collapse at the level of the model structure, the invention optimizes at the level of the loss function: the loss is built from the distance information between generated images obtained from the random noise sample, the discrimination information of the generated images and the image sample, and preset parameter information. Training the model on this loss allows the resulting model to produce different generated images for highly similar noise inputs and to enlarge the distance between the distribution modes corresponding to the generated images, thereby alleviating the mode collapse problem.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a mode collapse occurring with GAN;
fig. 2 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of constructing a loss function provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating the effect of the loss function provided by the embodiment of the present invention;
FIG. 5 is a graph of probability curve information between iteration number and balance factor provided by an embodiment of the present invention;
FIG. 6 is a schematic flow chart of a method for training an image generation model according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a training method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating an effect of a training method according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating the testing effect of an image generation model according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart of an image generation method provided by an embodiment of the invention;
FIG. 11 is a functional block diagram of an image generation model training apparatus according to an embodiment of the present invention;
fig. 12 is a functional block diagram of an image generating apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner" and "outer", where they indicate an orientation or positional relationship, are based on the orientation shown in the drawings or the orientation in which the product of the invention is conventionally placed in use. They are used only for convenience and simplification of the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
A generative adversarial network (GAN) is a deep learning model and one of the most promising methods of recent years for unsupervised learning on complex distributions; it has attracted much attention in the fields of image generation, image restoration, image conversion, and the like.
The generative adversarial network consists of a generator and a discriminator. The generator is mainly used to capture the probability distribution of the real samples and then produce generated samples similar to the real ones, in order to confuse the discriminator; the discriminator's role is to distinguish the generated fake samples from the real samples. In each iterative update, the parameters of one module are fixed while the parameters of the other module are updated by back propagation, and the two steps alternate.
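For illustration only, this alternating scheme can be sketched in PyTorch as below; G, D, the optimizers and the WGAN-style critic loss are generic placeholders rather than the concrete models of the embodiments described later.

import torch

def gan_step(G, D, opt_G, opt_D, real, z):
    # Update the discriminator while the generator's parameters stay fixed.
    opt_D.zero_grad()
    fake = G(z).detach()                       # detach: no gradient into G
    loss_D = D(fake).mean() - D(real).mean()   # WGAN-style critic loss
    loss_D.backward()
    opt_D.step()

    # Update the generator while the discriminator's parameters stay fixed.
    opt_G.zero_grad()
    loss_G = -D(G(z)).mean()                   # try to fool the critic
    loss_G.backward()
    opt_G.step()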
Since its introduction, GAN has suffered from two serious problems, training instability and mode collapse; mode collapse in particular remains one of the most troublesome problems of GANs at present.
An important manifestation of mode collapse is that when similar, though not identical, noise samples are input, the output images are still completely identical; this is especially obvious when the noise varies within a small interval.
As shown in fig. 1, which is a schematic diagram of GAN mode collapse, when two random noises N1 and N2 with 91% similarity are input into the model, the generated images are completely identical. This means that when the noise input into the generator varies within a certain range δ, the generated instance does not substantially change, where G(z) represents the image generated after the noise passes through the generator. In other words, the modes the generator can cover are limited, and a limited set of sample modes is prone to incomplete mode coverage when facing a complex data set, which finally causes the mode collapse problem and poor generated image quality.
That is, ideally the generator could learn the entire mode distribution of the real samples, but this state is difficult to achieve in practice; usually the generator captures only part of the modes in the real samples, which seriously degrades the generation quality of the images.
In order to solve the mode collapse problem, the related art proposes the following solutions:
(1) Modifying the data processing is one direction for solving mode collapse. A typical method is the minibatch-discriminator generative adversarial network (Minibatch GAN), which uses the data correlation among multiple batches to strengthen the discriminator's ability to tell real data from generated data, thereby driving the generator to capture more data modes.
(2) Another direction for alleviating mode collapse is to add extra discriminators and use several of them to improve the model's mode-capturing ability, e.g. the Dual Discriminator Generative Adversarial Network (D2GAN), in which two discriminators are trained simultaneously to minimize the Kullback-Leibler (KL) divergence and the reverse KL divergence, making the captured mode distribution more uniform. Training a multi-generator model is also an important direction: for example, the MGAN model uses multiple generators to avoid the insufficient fitting ability of a single generator on a complex data set, and employs multiple generators together with an auxiliary classifier to capture all distributions of the real samples. This alleviates the mode collapse problem to a certain extent, but also increases the consumption of computing resources.
(3) AE-OT-GAN theoretically establishes the GAN mode collapse problem on the basis of optimal transport theory: a discontinuous optimal transport map sends the standard distribution fed to the generator to a discontinuous distribution in the latent space. It uses a combined model of an autoencoder and a GAN, where the autoencoder maps real data to the latent space and the GAN generates new samples from the latent distribution; this scheme also alleviates mode collapse to a certain extent.
It can be seen from the above solutions that most existing methods for solving mode collapse start from the model structure, but changing the model increases computational resource consumption and the complexity of model training; how to provide a simple and effective training method is therefore a technical problem to be solved.
Therefore, the training method provided in the embodiments of the present invention addresses mode collapse from the perspective of loss-function optimization in order to improve image generation quality. Its idea is as follows: when two similar noises N1 and N2 are input into the model, a diversity penalty constraint is added that pulls apart the generated results, forcing the model not to produce completely identical outputs.
Before describing the training method provided by the embodiments of the present invention, the electronic device is introduced. Referring to fig. 2, fig. 2 is a block diagram of an electronic device provided by an embodiment of the present invention. This electronic device may be the execution subject of the training method and of the image generation method provided by the embodiments of the present invention, and may be, but is not limited to: a server, a mobile terminal, etc.
As shown in fig. 2, the electronic device 200 comprises a memory 201, a processor 202 and a communication interface 203, wherein the memory 201, the processor 202 and the communication interface 203 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 201 may be used to store software programs and modules, such as instructions/modules of the image generation model training apparatus 300 or the image generation apparatus 400 provided in the embodiment of the present invention, which may be stored in the memory 201 in the form of software or firmware or fixed in an Operating System (OS) of the electronic device 200, and the processor 202 executes the software programs and modules stored in the memory 201, so as to execute various functional applications and data processing. The communication interface 203 may be used for communication of signaling or data with other node devices.
The memory 201 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 202 may be an integrated circuit chip having signal processing capabilities. It may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; or a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It will be appreciated that the configuration shown in fig. 2 is merely illustrative and that electronic device 200 may include more or fewer components than shown in fig. 2 or may have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
Before describing the training method provided by the embodiment of the present invention, a method for constructing a loss function in the embodiment of the present invention is described, please refer to fig. 3, where fig. 3 is a schematic flowchart for constructing a loss function provided by the embodiment of the present invention, and the method includes:
S31, constructing a distance information loss model according to a first generated image and a second generated image obtained from the random noise sample.
The first generated image is generated based on the random noise sample; the second generated image is generated based on a random disturbance sample of the random noise sample.
As shown in fig. 1, within a certain noise range, two noise samples with the same distribution mode produce the same image. In the embodiment of the present invention, when a diversity penalty constraint, i.e. the distance information loss model, is added for two noise samples with very high similarity (the random noise sample and the random disturbance sample, N1 and N2), the distance between the distribution modes of the generated images corresponding to N1 and N2 can be pulled apart, forcing the model not to produce completely identical results, as shown in fig. 4, which is a schematic diagram of the effect of the loss function provided by the embodiment of the present invention.
In an optional implementation, the diversity penalty constraint, i.e. the distance information loss model of the embodiment of the present invention, may satisfy the following relation:
Diversity=KL(G(z)||G(z+δ))
wherein G(z) represents the first generated image produced by the generator from the random noise sample z, and G(z+δ) represents the second generated image produced by the generator after a slight disturbance δ is added to z; δ characterizes the noise threshold; and the KL divergence (Kullback-Leibler divergence) is used to measure the "distance" between two sample probability distribution functions.
It should be noted that the KL divergence may be replaced by another divergence model, and is not limited herein.
The working mechanism of the diversity penalty term is as follows: in the initial stage, the coverage of the modes generated for z and z + δ is as shown in (a) of fig. 4. During training, the diversity penalty term keeps attending to the distance between the samples generated for the sampling point z and for the z + δ region of the noise space, and applies a reverse gradient update force to samples whose generated modes overlap, so that the generated modes corresponding to this noise interval are gradually pulled apart; finally the coverage of the generated modes reaches the state shown in (b) of fig. 4.
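As an illustration, the diversity penalty can be sketched in PyTorch as follows. This is a minimal sketch, not the patent's implementation: the text does not specify how the generated image tensors are converted into probability distributions for the KL computation, so the softmax over flattened pixels used here is an assumption.

import torch
import torch.nn.functional as F

def diversity_penalty(G, z, delta):
    # KL(G(z) || G(z + delta)) between the two generated batches.
    # Assumption: each image is flattened and softmax-normalized so that
    # the KL divergence over pixel "distributions" is well defined.
    p = F.softmax(G(z).flatten(1), dim=1)                  # G(z)
    log_q = F.log_softmax(G(z + delta).flatten(1), dim=1)  # G(z + delta)
    # F.kl_div(input=log q, target=p) computes KL(p || q).
    return F.kl_div(log_q, p, reduction="batchmean")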
S32, a discrimination information loss model is constructed based on the discrimination information of the first generated image and the image sample and the first weight information.
In the embodiment of the present invention, the discrimination information is generated by the discriminator based on the first generated image and the image sample. In an alternative embodiment, the discrimination information loss model may satisfy the following relation:
L_D = E_{z~Z}[D(G(z))] - E_{x~Pr}[D(x)] + λ1·E_{x̂~P_x̂}[(||∇_x̂ D(x̂)||_2 - 1)^2] + λ2·CT_{x′,x″}
wherein E_{z~Z}[D(G(z))] - E_{x~Pr}[D(x)] is the basic loss of the GAN model; D(G(z)) represents the discrimination information of the first generated image; D(x) represents the discrimination information of the image sample; Z represents the distribution model of the random noise sample z; Pr represents the distribution model of the image sample; E denotes expectation; and λ1, λ2 are the first weight information.
The gradient penalty term λ1·E_{x̂~P_x̂}[(||∇_x̂ D(x̂)||_2 - 1)^2] is used to enforce Lipschitz continuity of the discriminator on the distribution between the generated data and the real data. This continuity constraint lets the model satisfy the Kantorovich-Rubinstein duality, so that computing the optimal Wasserstein distance becomes the optimization problem of the generative adversarial loss function; at the same time, bounding the gradient within a certain range greatly reduces the occurrence of gradient explosion and fundamentally alleviates the training instability of GANs. The interpolate x̂ ~ P_x̂ satisfies the following relation:
x̂ = ε·x + (1 - ε)·G(z), ε ~ U(0, 1)
where the distribution P_x̂ consists of data points sampled uniformly along straight lines from the generated-image distribution Pg toward the real distribution Pr.
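For illustration, the gradient penalty and the straight-line sampling of x̂ can be sketched as follows; this is a sketch under the standard WGAN-GP construction, assuming NCHW image batches, not the patent's exact code.

import torch

def gradient_penalty(D, real, fake):
    # (||∇_x̂ D(x̂)||_2 - 1)^2 on interpolates between real and fake batches.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)  # ε ~ U(0,1)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    return ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()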
The consistency penalty term CT_{x′,x″} is used to constrain the continuity of the discriminator on the real data distribution, so that early in training the generated data distribution can be pulled toward the real data distribution more quickly; it also contributes to improving the generation quality of the model and alleviating mode collapse.
The consistency penalty term CT_{x′,x″} satisfies the following relation:
CT_{x′,x″} = E_{x~Pr}[max(0, d(D(x′), D(x″)) + 0.1·d(D_(x′), D_(x″)) - M′)]
where d(·,·) denotes the L2 norm, and D(x′) and D(x″) are the discrimination information output by the discriminator under two different perturbations applied around the image sample x. The data are perturbed by randomly dropping out hidden layers of the discriminator: when the dropout probability is small, the output of the perturbed discriminator can be regarded as the output of the original discriminator at a data point x′ not far from x, denoted D(x′). The discrimination information D(x″) of a second data point x″ around x is found in the same way, and the value of M′ is set to 0. In practical experiments it was found that additionally constraining the model's penultimate layer D_(·) slightly improves performance, so the two outputs D_(x′) and D_(x″) are also added to the CT penalty term.
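A sketch of the consistency term is given below. Assumptions beyond the text: the discriminator returns its score together with its penultimate-layer features, the two perturbed outputs come from leaving dropout active during two forward passes, and the 0.1 weight on the penultimate-layer term follows the CT-GAN formulation this section builds on.

import torch

def consistency_penalty(D, x, m_prime=0.0):
    # Two stochastic (dropout-perturbed) passes over the real samples give
    # D(x′) and D(x″); D is assumed to return (score, penultimate features).
    D.train()                                       # keep dropout active
    s1, h1 = D(x)                                   # first pass  -> D(x′)
    s2, h2 = D(x)                                   # second pass -> D(x″)
    d_out = (s1 - s2).flatten(1).norm(2, dim=1)     # d(D(x′), D(x″))
    d_feat = (h1 - h2).flatten(1).norm(2, dim=1)    # d(D_(x′), D_(x″))
    return torch.clamp(d_out + 0.1 * d_feat - m_prime, min=0).mean()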
S33, constructing the loss function based on the distance information loss model, second weight information corresponding to the distance information loss model, and the discrimination information loss model; or constructing the loss function based on the distance information loss model, the second weight information corresponding to the distance information loss model, the discrimination information loss model, a balance factor and a random coefficient.
In one embodiment, the distance information loss model and the discrimination information loss model obtained in steps S31 and S32 are added together to obtain the loss function constructed in the embodiment of the present invention, which may take the form of relation (1):
L = E_{z~Z}[D(G(z))] - E_{x~Pr}[D(x)] + λ1·E_{x̂~P_x̂}[(||∇_x̂ D(x̂)||_2 - 1)^2] + λ2·CT_{x′,x″} - λ3·KL(G(z)||G(z+δ))    (1)
where λ3 characterizes the second weight information. The diversity term enters with a negative sign so that minimizing the loss enlarges the KL distance between G(z) and G(z+δ).
In another possible embodiment, mode diversity and generation quality conflict to some extent: generation quality rises somewhat when sample diversity is reduced, and falls somewhat when sample diversity is increased. To balance sample diversity against generation quality, the invention introduces a balance factor p, which regulates the proportions of the distance information loss model and the discrimination information loss model during updating. When the balance factor increases, the proportion of the diversity penalty term increases and generation diversity rises; when it decreases, the proportion of the basic loss term increases and image generation quality improves. Concretely, during training the diversity penalty term is back-propagated and updated with a certain probability in each iteration, and this probability value is adjustable, which achieves the goal of controlling the proportion of the two losses in the update process.
The loss function obtained in this way can be expressed by the following relation (2):
L = E_{z~Z}[D(G(z))] - E_{x~Pr}[D(x)] + λ1·E_{x̂~P_x̂}[(||∇_x̂ D(x̂)||_2 - 1)^2] + λ2·CT_{x′,x″} - 1(μ ≤ p)·λ3·KL(G(z)||G(z+δ))    (2)
where p represents the balance factor; μ denotes the random coefficient, drawn from the uniform distribution U(0, 1); and the indicator 1(μ ≤ p) is 1 when μ ≤ p and 0 otherwise, so that the diversity penalty is back-propagated with probability p in each iteration.
It should be noted that, in the actual model training process, either of the loss functions in relation (1) or relation (2) can be used to address the mode collapse problem of the prior art.
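Putting the pieces together, relation (2) can be sketched by reusing the helper functions above; the stochastic gate realizes the balance factor p through the random coefficient μ ~ U(0,1), with p determined per iteration as described next. As in the consistency sketch, D is assumed to return a (score, penultimate-features) pair.

import torch

def total_loss(D, G, real, z, delta, lam1, lam2, lam3, p):
    def score(t):                # D is assumed to return (score, features)
        return D(t)[0]
    base = score(G(z)).mean() - score(real).mean()   # basic GAN loss
    gp = gradient_penalty(score, real, G(z).detach())
    ct = consistency_penalty(D, real)
    loss = base + lam1 * gp + lam2 * ct
    mu = torch.rand(()).item()   # random coefficient mu ~ U(0, 1)
    if mu <= p:                  # diversity penalty active with probability p
        loss = loss - lam3 * diversity_penalty(G, z, delta)
    return loss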
In an alternative embodiment, for the loss function shown in relation (2), specific values of the balance factor and the random coefficient are determined during each iterative training. In a possible embodiment they may be determined as follows:
in each iterative training, the value of the balance factor for that iteration is determined from the preset probability curve information between the iteration number and the balance factor, and the random coefficient is obtained by random sampling from a preset distribution model.
Referring to fig. 5, fig. 5 is a probability curve information diagram between the iteration number and the balance factor provided by the embodiment of the present invention. According to fig. 5, the balance factor corresponding to each iteration number can be determined; the preset distribution model may be the uniform distribution U(0, 1), and random sampling from it yields the random coefficient.
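The curve of fig. 5 is available only as a figure, so the schedule below assumes a linear decay purely for illustration; only the idea of reading p off a preset iteration-to-probability curve comes from the text.

def balance_factor(step, total_steps, p_start=0.9, p_end=0.1):
    # Assumed shape: p decays linearly with the iteration number.
    frac = min(step / max(total_steps, 1), 1.0)
    return p_start + (p_end - p_start) * frac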
Based on the loss function constructed above, the training method of the image generation model provided by the embodiment of the present invention is described below.
Referring to fig. 6, fig. 6 is a schematic flowchart of an image generation model training method according to an embodiment of the present invention, where an execution subject of the method may be the electronic device shown in fig. 2, and the method includes:
S601, acquiring a random noise sample.
In a possible implementation, a Gaussian distribution model may first be determined, for example the Gaussian distribution N(0, 1) with mean 0 and variance 1, and the random noise sample is obtained by randomly sampling from N(0, 1).
The image sample may be any real image obtained from the image dataset, and the image sample is subsequently represented by x in the embodiment of the present invention.
S602, performing iterative training on the initial generative adversarial network according to the random noise sample and the image sample until the pre-constructed loss function reaches a set condition, and taking the trained generative adversarial network as the image generation model.
the generation countermeasure network is provided with a generator and an arbiter, and in the training process, the network parameters of the arbiter can be fixed, and the network parameters of the generator are updated based on the loss value of the loss function constructed in the embodiment of the application until the set conditions are met.
In the embodiment of the present invention, the set condition may be that the loss function has converged, or that the number of iterations reaches a preset number of iterations, which is not limited here.
In the embodiment of the present invention, a generated image is obtained after the random noise sample is input to the generator, and the discrimination information is obtained by inputting the generated image and the image sample into the discriminator.
It can be understood that the construction of the loss function uses distance information between generated images and discrimination information between the generated images and the image samples. During iterative training the loss function is optimized through this distance and discrimination information, and the generator's network parameters are continually updated based on the loss value, so the generator gradually reduces the similarity between generated images, the generated images trend toward being more and more different, and the distance between the distribution modes corresponding to the generated images is enlarged, thereby solving the mode collapse problem.
In an alternative implementation manner, in order to facilitate understanding of the training process, step S602 provided in this embodiment of the present invention may include:
step 1, inputting a random noise sample into a generator to obtain a first generated image and a second generated image;
step 2, inputting the first generated image and the image sample into a discriminator to obtain discrimination information;
step 3, inputting the distance information of the first generated image and the second generated image into the distance information loss model and the discrimination information into the discrimination information loss model to obtain the loss value of the loss function;
step 4, keeping the network parameters of the discriminator fixed, and updating the network parameters of the generator based on the loss value;
and 5, when the set condition is not reached, returning to execute the step of inputting the random noise sample into the generator to obtain a first generated image and a second generated image until the loss function reaches the set condition, and taking the trained generation countermeasure network as an image generation model.
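Steps 1 to 5 can be condensed into a short training loop. This is a hedged sketch reusing the helpers above: it follows the steps literally (discriminator parameters frozen, generator updated on the constructed loss), while the alternating discriminator update from the Background section, data loading and device placement are left abstract, and the noise dimension is an assumed parameter.

import torch

def train(G, D, opt_G, data_loader, delta, lam1, lam2, lam3,
          total_steps, noise_dim=128):
    for step, real in enumerate(data_loader):     # yields image batches
        if step >= total_steps:                   # set condition: iteration cap
            break
        z = torch.randn(real.size(0), noise_dim)  # random noise sample
        p = balance_factor(step, total_steps)     # fig. 5 schedule
        # Step 4: discriminator parameters fixed, generator updated.
        for param in D.parameters():
            param.requires_grad_(False)
        opt_G.zero_grad()
        # Steps 1-3 happen inside total_loss: generate G(z) and G(z + delta),
        # discriminate, and evaluate the loss of relation (2).
        loss = total_loss(D, G, real, z, delta, lam1, lam2, lam3, p)
        loss.backward()
        opt_G.step()
        for param in D.parameters():
            param.requires_grad_(True)
    return G  # trained generator, i.e. the image generation model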
In an alternative embodiment, in order to obtain the first generated image and the second generated image, the step 1 may include the following steps:
A1, determining a noise threshold.
In an optional implementation, in order to maximize the benefit of the diversity penalty term, a noise threshold δ needs to be determined, limiting the penalty term to be effective only for noise within the threshold range around the sampling point; the random disturbance sample corresponding to the random noise sample is then determined according to the noise threshold, and the second generated image is obtained.
An embodiment of determining the noise threshold is given below, which may include the following steps:
A1-1, randomly sampling from a preset data distribution model to obtain a plurality of random values, and obtaining the random disturbance samples corresponding to the random noise sample from the random values;
A1-2, obtaining, with the generator, the generated image corresponding to the random noise sample and the generated image corresponding to each random disturbance sample;
A1-3, determining the similarity between the generated image corresponding to the random noise sample and the generated image corresponding to each random disturbance sample;
A1-4, determining the generated image with the minimum similarity as the target generated image, and taking the random value corresponding to the target generated image as the noise threshold.
In an optional implementation, a sample z is randomly drawn from the distribution N(0, 1) and input to the model to obtain the first generated image Fake1 = G(z). A disturbance noise δ is then randomly sampled from the distribution N(0, x), where x is a real number greater than 0, and added to the clean input z to give the disturbed input z + δ, which is fed to the generator to obtain the second generated image Fake2 = G(z + δ). It therefore suffices to observe how the generator's output changes as the disturbance noise δ varies and to compare the similarity between the generated images Fake1 and Fake2, in order to determine the noise critical value at which the generated mode changes, and thus the disturbance noise threshold δ. The invention designs a comparison experiment to explore the noise threshold δ: with a trained generator G as the base model and Fake1 = G(z), Fake2 = G(z + δ), the experiment tests how a series of x values affects the difference between the outputs Fake1 and Fake2, and finally determines the threshold δ.
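The threshold search of steps A1-1 to A1-4 might look as follows; the similarity measure and the reading of N(0, x) with x as the standard deviation are not fixed by the text and are assumptions here.

import torch
import torch.nn.functional as F

def find_noise_threshold(G, z, scales=(0.01, 0.05, 0.1, 0.2, 0.5, 1.0)):
    # For each candidate x, perturb z with noise of scale x, generate, and
    # keep the x whose image is least similar to G(z) (steps A1-1..A1-4).
    base = G(z).flatten(1)
    best_x, lowest_sim = None, float("inf")
    for x in scales:                     # the series of x values tested
        delta = torch.randn_like(z) * x  # disturbance noise from N(0, x)
        pert = G(z + delta).flatten(1)
        sim = F.cosine_similarity(base, pert, dim=1).mean().item()
        if sim < lowest_sim:
            lowest_sim, best_x = sim, x
    return best_x                        # noise threshold delta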
A2, based on the noise threshold, determining a random disturbance sample of the random noise samples.
And A3, obtaining a first generated image according to the random noise sample, and obtaining a second generated image based on the random disturbance sample.
Referring to fig. 7, fig. 7 is an exemplary diagram of the training method provided by an embodiment of the present invention. In each training pass, some random noise z is first input to the generator of the GAN, producing the outputs Fake1 = G(z) and Fake2 = G(z + δ), where G(z) represents the image generated after the random noise z passes through the generator and δ is the noise threshold, representing a slight disturbance. Fig. 8 shows the first generated image and the second generated image; fig. 8 is a schematic diagram of the effect of the training method provided by the embodiment of the present invention. The distance information between the generator outputs Fake1 = G(z) and Fake2 = G(z + δ) optimizes the loss function, increases the distance between Fake1 and Fake2, and forces the generator not to produce completely identical results for similar noise inputs, avoiding the mode collapse phenomenon. Meanwhile, Fake1 is fed to the discriminator and judged together with the real image; the generator and the discriminator confront each other, and under the influence of the diversity penalty mechanism this cycle repeatedly yields high-quality images.
The present invention evaluates this method with two quantitative metrics: Inception Score (IS) and Frechet Inception Distance (FID). IS measures sample quality and diversity via the entropy of the predicted labels; the higher the IS, the better the model's performance. FID evaluates the generation effect by computing a "distance value" between generated and real images: it fits a multivariate Gaussian (MVG) model to intermediate representations of the real and generated samples and measures the similarity between them. For FID, the lower the score, the better the model performance.
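For reference, once a feature extractor has produced intermediate representations of the real and generated samples, the FID described here reduces to a closed-form distance between the two fitted Gaussians. A standard numpy/scipy sketch is given below; the feature extractor itself is assumed.

import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    # FID between multivariate Gaussian fits of two (N x d) feature sets.
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))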
Table 1 shows the IS and FID scores of the training method proposed by the invention on the CIFAR-10 data set, together with the scores of some common GAN models. It can be seen that the training method provided by the invention still achieves a certain improvement over the CT-GAN baseline; the model's generation quality reaches the current advanced level, and, more importantly, a large amount of extra computational cost is saved while obtaining the better result.
Table 1. IS score comparison of different models on the CIFAR-10 dataset
[Table 1 is provided as an image in the original publication.]
In order to visually observe the effect of the improved loss function on alleviating mode collapse, assume the model takes two noise inputs N1 and N2, with N2 = N1 + δ, and produces the corresponding images; the difference between the two generated images G(N1) and G(N2) is then compared.
Fig. 9 is a test-effect diagram of the image generation model provided by the embodiment of the present invention, showing a comparison of generation effects when the diversity penalty term is applied in training on the CIFAR-10 dataset. In fig. 9, (a) is the image generation effect of the prior art and (b) is that of the training method provided by the embodiment of the present invention; in each panel the upper image is the first generated image G(N1) and the lower image is the second generated image G(N2). From the comparison, for the prior art the generated images G(N1) and G(N2) are very close, with no obvious mode change between them; with the optimization that adds the diversity penalty, G(N1) and G(N2) show an obvious mode difference, demonstrating that the diversity penalty really widens the generated modes within the noise threshold range and thus strengthens the model's resistance to mode collapse.
Based on the obtained image generation model, an embodiment of the present invention further provides an image generation method. Referring to fig. 10, fig. 10 is a schematic flowchart of the image generation method provided by an embodiment of the present invention; the method includes:
S701, acquiring a random noise sample;
S702, inputting the random noise sample into the image generation model, and generating a first image and a second image, wherein the first image and the second image are different.
In the embodiment of the present invention, that the first image and the second image are different can also be understood as the two images being dissimilar, i.e. the similarity between them is smaller than a similarity threshold. The similarity threshold measures how similar the two are; it can be customized according to the actual situation, and the smaller the threshold, the more dissimilar the two images.
The image generation model is obtained by the image generation model training method provided by the embodiment of the invention.
Referring to fig. 11, fig. 11 is a functional block diagram of an image generation model training apparatus according to an embodiment of the present invention, where the image generation model training apparatus 300 includes:
an obtaining module 310, configured to obtain a random noise sample and an image sample;
the training module 320 is configured to perform iterative training on the initial generated countermeasure network according to the random noise sample and the image sample until the pre-constructed loss function reaches a set condition, and use the trained generated countermeasure network as an image generation model;
in each iteration training, a loss function is constructed on the basis of distance information between generated images, discrimination information of the generated images and image samples and preset parameter information, wherein the distance information is obtained by random noise samples; iterative training is used to reduce the similarity between the generated images.
In an alternative embodiment, the image generative model training apparatus 300 may further comprise a construction module for performing the steps of fig. 3 to achieve the corresponding technical effect.
In an alternative embodiment, the training module 320 may be specifically configured to perform steps 1 to 5, and each step included in step 1 to achieve the corresponding technical effect.
In an optional embodiment, the training module 320 may be further configured to, in each iterative training, determine the value of the balance factor corresponding to that iteration according to preset probability curve information between the iteration number and the balance factor, and to obtain the random coefficient by random sampling from a preset distribution model.
Referring to fig. 12, fig. 12 is a functional block diagram of an image generating apparatus according to an embodiment of the present invention, where the image generating apparatus 400 includes:
an obtaining module 410 is configured to obtain random noise samples.
A generating module 420, configured to input the random noise sample into the image generation model and generate a first image and a second image, the first image and the second image being different; the image generation model is obtained by the image generation model training method provided by the embodiment of the present invention.
Embodiments of the present invention further provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the image generation model training method or the image generation method according to any one of the foregoing embodiments. The computer readable storage medium may be, but is not limited to, various media that can store program codes, such as a usb disk, a removable hard disk, a ROM, a RAM, a PROM, an EPROM, an EEPROM, a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention and are not intended to limit it; various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in its protection scope.

Claims (11)

1. A method for training an image generation model, the method comprising:
acquiring a random noise sample and an image sample;
performing iterative training on an initial generative adversarial network according to the random noise sample and the image sample until a pre-constructed loss function reaches a set condition, and taking the trained generative adversarial network as an image generation model;
wherein the loss function is constructed based on distance information between generated images obtained from the random noise sample, discrimination information of the generated images and the image sample, and preset parameter information; and the iterative training is used to reduce the similarity between the generated images.
2. The image generative model training method of claim 1, wherein the loss function is constructed by:
constructing a distance information loss model according to a first generated image and a second generated image obtained from the random noise sample; wherein the first generated image is generated based on the random noise sample, and the second generated image is generated based on a random disturbance sample of the random noise sample;
constructing a discrimination information loss model according to discrimination information of the first generated image and the image sample and first weight information;
constructing the loss function based on the distance information loss model, second weight information corresponding to the distance information loss model, and the discrimination information loss model; or,
constructing the loss function based on the distance information loss model, the second weight information corresponding to the distance information loss model, the discrimination information loss model, a balance factor and a random coefficient.
3. The image generative model training method of claim 2, further comprising:
in each iterative training, determining the value of the balance factor corresponding to that iteration according to preset probability curve information between the iteration number and the balance factor, and randomly sampling from a preset distribution model to obtain the random coefficient.
4. The image generative model training method of claim 2, wherein the generative adversarial network comprises a generator and a discriminator;
and performing iterative training on the initial generative adversarial network according to the random noise sample and the image sample until the pre-constructed loss function reaches the set condition, and taking the trained generative adversarial network as the image generation model, comprises:
inputting the random noise samples into the generator to obtain the first generated image and the second generated image;
inputting the first generated image and the image sample into the discriminator to obtain the discrimination information;
inputting the distance information of the first generated image and the second generated image into the distance information loss model and inputting the discrimination information into the discrimination information loss model to obtain the loss value of the loss function;
keeping the network parameters of the discriminator fixed, and updating the network parameters of the generator based on the loss value;
and when the set condition is not reached, returning to the step of inputting the random noise sample into the generator to obtain the first generated image and the second generated image, until the loss function reaches the set condition, and taking the trained generative adversarial network as the image generation model.
5. The method of claim 4, wherein inputting the random noise sample into the generator to obtain the first generated image and the second generated image comprises:
determining a noise threshold;
determining a random perturbation sample of the random noise samples based on the noise threshold;
and obtaining the first generated image according to the random noise sample, and obtaining the second generated image based on the random disturbance sample.
6. The image generation model training method of claim 5, wherein determining a noise threshold comprises:
randomly sampling from a preset data distribution model to obtain a plurality of random values, and obtaining each random disturbance sample corresponding to the random noise sample according to each random value;
according to the generator, obtaining a generated image corresponding to the random noise sample and a generated image corresponding to each random disturbance sample;
determining the similarity between the generated images corresponding to the random noise samples and the generated images corresponding to each random disturbance sample;
and determining the generated image with the minimum similarity as a target generated image, and determining the random value corresponding to the target generated image as the noise threshold value.
7. An image generation method, characterized in that the method comprises:
obtaining a random noise sample;
inputting the random noise sample into an image generation model, and generating a first image and a second image, wherein the first image and the second image are different; the image generation model is obtained by the image generation model training method according to any one of claims 1 to 6.
8. An image generative model training apparatus, comprising:
the acquisition module is used for acquiring a random noise sample and an image sample;
a training module, configured to perform iterative training on an initial generative adversarial network according to the random noise sample and the image sample until a pre-constructed loss function reaches a set condition, and to take the trained generative adversarial network as an image generation model;
wherein, in each iterative training, the loss function is constructed based on distance information between generated images obtained from the random noise sample, discrimination information of the generated images and the image samples, and preset parameter information; and the iterative training is used to reduce the similarity between the generated images.
9. An image generation apparatus, comprising:
an obtaining module, configured to obtain a random noise sample;
the generation module is used for inputting the random noise samples into an image generation model to generate a first image and a second image; the first image and the second image are different; the image generation model is obtained by the image generation model training method according to any one of claims 1 to 6.
10. An electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being operable to execute the computer program to implement the method of any one of claims 1 to 6 or the method of claim 7.
11. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202210625806.4A 2022-06-02 2022-06-02 Image generation model training method, image generation method and related device Pending CN115019128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210625806.4A CN115019128A (en) 2022-06-02 2022-06-02 Image generation model training method, image generation method and related device

Publications (1)

Publication Number Publication Date
CN115019128A (en) 2022-09-06

Family

ID=83072829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210625806.4A Pending CN115019128A (en) 2022-06-02 2022-06-02 Image generation model training method, image generation method and related device

Country Status (1)

Country Link
CN (1) CN115019128A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147321A1 (en) * 2017-10-26 2019-05-16 Preferred Networks, Inc. Image generation method, image generation apparatus, and image generation program
US20200311558A1 (en) * 2019-03-29 2020-10-01 Peking University Generative Adversarial Network-Based Optimization Method And Application
CN110263192A (en) * 2019-06-06 2019-09-20 西安交通大学 A kind of abrasive grain topographic data base establishing method generating confrontation network based on condition
US20210012188A1 (en) * 2019-07-09 2021-01-14 Baidu Usa Llc Systems and methods for defense against adversarial attacks using feature scattering-based adversarial training
CN111563841A (en) * 2019-11-13 2020-08-21 南京信息工程大学 High-resolution image generation method based on generation countermeasure network
CN112164008A (en) * 2020-09-29 2021-01-01 中国科学院深圳先进技术研究院 Training method of image data enhancement network, and training device, medium, and apparatus thereof
CN113379593A (en) * 2021-06-25 2021-09-10 深圳市安软科技股份有限公司 Image generation method, system and related equipment
CN113537467A (en) * 2021-07-15 2021-10-22 南京邮电大学 Anti-disturbance image generation method based on WGAN-GP
CN114494489A (en) * 2022-01-05 2022-05-13 电子科技大学长三角研究院(衢州) Self-supervision attribute controllable image generation method based on depth twin network
CN114419348A (en) * 2022-03-18 2022-04-29 武汉大学 Method for generating confrontation network discriminator and text generation image

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HEYRANI NOBARI A et al.: "PcDGAN: A Continuous Conditional Diverse Generative Adversarial Network for Inverse Design", Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 14 August 2021, pages 606-616, XP059152549, DOI: 10.1145/3447548.3467414 *
LIN ZINAN et al.: "PacGAN: The Power of Two Samples in Generative Adversarial Networks", IEEE Journal on Selected Areas in Information Theory, vol. 01, no. 01, 24 March 2020, pages 324-335 *
LIU M et al.: "Express Construction for GANs from Latent Representation to Data Distribution", Applied Sciences, vol. 12, no. 08, 13 April 2022, page 3910 *
LI Xiang et al.: "Generative Adversarial Network Based on Multi-Condition Adversarial and Gradient Optimization", Journal of University of Electronic Science and Technology of China, vol. 50, no. 05, 9 October 2021, pages 754-760 *
TAN Hongwei et al.: "Generative Adversarial Network Based on a Conditional Entropy Distance Penalty", Journal of Software, vol. 32, no. 04, 8 April 2021, pages 1116-1128 *
DENG Pan: "Loss Function Optimization of Generative Adversarial Networks Based on Gradient Penalty", China Masters' Theses Full-text Database (Information Science and Technology), no. 01, 15 January 2023, pages 140-674 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681790A (en) * 2023-07-18 2023-09-01 脉得智能科技(无锡)有限公司 Training method of ultrasound contrast image generation model and image generation method
CN116681790B (en) * 2023-07-18 2024-03-22 脉得智能科技(无锡)有限公司 Training method of ultrasound contrast image generation model and image generation method
CN117058493A (en) * 2023-10-13 2023-11-14 之江实验室 Image recognition security defense method and device and computer equipment
CN117058493B (en) * 2023-10-13 2024-02-13 之江实验室 Image recognition security defense method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination