CN114038055A - Image generation method based on contrastive learning and generative adversarial networks - Google Patents

Image generation method based on contrastive learning and generative adversarial networks

Info

Publication number
CN114038055A
CN114038055A
Authority
CN
China
Prior art keywords
image
network
discriminator
generator
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111254371.9A
Other languages
Chinese (zh)
Inventor
张亮
王博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
University of Electronic Science and Technology of China
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China and Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202111254371.9A
Publication of CN114038055A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/088 - Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image generation method based on contrastive learning and generative adversarial networks, and belongs to the field of computer vision. The method first selects a generative adversarial network as the basic framework. When training it, positive and negative samples are constructed for the query objects of the discriminator and of the generator respectively. The aim is to use the discriminator itself to map images into a representation space, so that the discriminator learns a reasonable representation of the images under self-supervision while avoiding the introduction of additional model parameters, and so that the generator, under self-supervision, maps similar input random vectors to similar images and dissimilar random vectors to different images, thereby improving the diversity of the generated images. After the generative adversarial network is trained, an image can be generated by feeding noise into the generator. The method fully exploits the advantages of contrastive learning and generative adversarial networks and improves the diversity of the images produced by existing generation methods.

Description

Image generation method based on contrastive learning and generative adversarial networks
Technical Field
The invention belongs to the field of computer vision and mainly addresses the problem of improving the diversity of generated images; it is mainly applied to film and television entertainment, human-computer interaction, machine vision understanding, and the like.
Background
At present, the demand for image generation keeps growing in fields such as film and television entertainment and computer vision understanding. For example, in a role-playing game a player can control parameters to generate a character avatar according to personal preference, and in early education matching images can be generated from text to guide infants in recognizing a wide and varied world. Common image generation models include autoregressive models, variational autoencoders, Generative Adversarial Networks (GAN), and flow models. GANs have the advantages of a small computational load, high quality of the generated images, and a simple model structure, and are therefore widely applied to image generation. In recent years, model improvements for GANs have focused mainly on two directions: structural improvements and loss function improvements.
Many methods improve the structure of the GAN, mainly to strengthen its modeling capability. For example, SAGAN introduces a self-attention module into the model so that it can balance the modeling of long-range dependencies, efficiency, and computational cost, improving the generation capability of the model. Reference: Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019, May). Self-attention generative adversarial networks. In International Conference on Machine Learning (pp. 7354-7363). PMLR.
Improvements to the GAN loss function mainly address unstable training, vanishing gradients, mode collapse, and similar problems. Arjovsky et al. use the Wasserstein distance to construct the loss function and impose a Lipschitz continuity constraint on the discriminator, significantly improving the quality and richness of the images generated by the GAN. Such loss-based improvements are effective and easy to adopt, and the present method follows this line by combining the idea of contrastive learning to improve the GAN loss function. Reference: Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. (2017). Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028.
Contrastive learning is a common self-supervised method: it uses supervisory signals generated from the data itself to train the model in a supervised fashion, and has become an effective approach to deep representation learning. Aiming at the insufficient diversity of the images generated by conventional GANs, the invention combines contrastive learning to improve the GAN loss function and obtains excellent results.
Disclosure of Invention
The invention discloses an image generation method based on contrastive learning and generative adversarial networks, which solves the problem of insufficient diversity of generated images in the prior art.
The method first selects a generative adversarial network as the basic framework, normalizes and scales the training pictures to 32 × 32 × 3, and samples a normal distribution to obtain the input random vectors. It also draws on the core idea of contrastive learning and its InfoNCE loss function. When training the generative adversarial network, positive and negative samples are constructed for the query objects of the discriminator and of the generator respectively. The aim is to use the discriminator itself to map images into a representation space, so that the discriminator learns a reasonable representation of the images under self-supervision while avoiding the introduction of additional model parameters, and so that the generator maps similar input random vectors to similar images and dissimilar random vectors to different images, thereby improving the diversity of the generated images. After the generative adversarial network is trained, an image can be generated by feeding noise into the generator. The method fully exploits the advantages of contrastive learning and generative adversarial networks and improves the diversity of the images produced by existing generation methods. The general structure of the algorithm is shown schematically in Fig. 1.
For convenience in describing the present disclosure, certain terms are first defined.
Definition 1: normal distribution. Also called the Gaussian distribution, it is a probability distribution of great importance in mathematics, physics, engineering, and other fields, and has a significant influence on many aspects of statistics. A random variable x follows a normal distribution if its probability density function satisfies

f(x) = 1 / (√(2π) σ) · exp( -(x - μ)² / (2σ²) )

where μ is the mathematical expectation (mean) of the distribution and σ² is its variance; the distribution is commonly denoted N(μ, σ²).
Definition 2: generative adversarial network. A generative adversarial network comprises two different neural networks, a generator G and a discriminator D, which oppose each other during training: the discriminator tries to distinguish the real data distribution p_r from the generated data distribution p_g, while the generator tries to make the two distributions indistinguishable to the discriminator, so that finally the generated data distribution matches the real one: p_r = p_g.
Definition 3: contrastive learning. Contrastive learning learns representations of samples by making comparisons between inputs. Instead of learning a signal from a single data sample at a time, it learns by comparing different samples: a positive pair of "similar" inputs (the query object and its positive sample) is contrasted with negative pairs of "different" inputs (the query object and its negative samples).
Definition 4: batch normalization layer. A technique for training deep neural networks that normalizes each batch of data. It speeds up model convergence and, more importantly, alleviates the gradient dispersion problem in deep networks to some extent, making deep models easier and more stable to train.
Definition 5: ReLU activation layer. Also called the rectified linear unit, it is an activation function commonly used in artificial neural networks, usually a nonlinear function represented by a ramp function and its variants, expressed as f(x) = max(0, x).
Definition 6: Tanh activation layer. Defined as

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
Definition 7: global average pooling layer. Pooling layers are often used in convolutional networks to reduce model size, speed up computation, and improve the robustness of the extracted features. The global average pooling layer averages each input feature map and is generally used to replace the final fully connected layer of a neural network, achieving dimensionality reduction directly and greatly reducing the number of network parameters.
Definition 8: IS index. The IS index is mainly used to measure the quality and diversity of generated images. When an image is sufficiently clear, its class can be identified with confidence; the InceptionV3 model is used to predict the probability that a generated image x belongs to class y, i.e. the conditional probability p(y|x). The InceptionV3 model divides images into 1000 classes, so the higher the probability that image x belongs to a single class, the better its quality is considered to be. Similarly, to account for the richness of the images, the classes of the generated images should be uniformly distributed, which is captured by the marginal probability

p(y) = (1/N) · Σ_{i=1}^{N} p(y | x_i)

where N is the number of generated images, and the number of classes of y depends on the number of image classes N_class in the training data set; ideally p(y) = 1 / N_class. The IS index can then be computed as:

IS = exp( E_{x~p_g} [ D_KL( p(y|x) || p(y) ) ] ).
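For illustration, a minimal NumPy sketch of this computation; `probs` is assumed to be an N × 1000 array of InceptionV3 class probabilities p(y|x_i) for the generated images (producing it with the model is outside the sketch):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x[D_KL(p(y|x) || p(y))]) for rows of class probabilities."""
    p_y = probs.mean(axis=0)                                        # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))                                 # higher is better
```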
Definition 9: FID index. Computing the FID index also uses the InceptionV3 model: the classification head is discarded and the 2048-dimensional activations of the final pooling layer are taken as features, hereinafter called n-dimensional features. For real images, the n-dimensional features follow a certain distribution; likewise, the n-dimensional features of the images generated by the GAN follow another distribution. The FID index is the Fréchet distance between the two distributions and can be computed as:

FID = ||μ_data - μ_g||² + Tr( Σ_data + Σ_g - 2 (Σ_data Σ_g)^(1/2) )

where Tr denotes the trace of a matrix (the sum of its diagonal elements), and μ_data, μ_g and Σ_data, Σ_g are the means and covariances of the n-dimensional features of the real images and of the generated images, respectively. A lower FID means the two distributions are closer, which means higher quality and better diversity of the generated pictures.
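For illustration, a minimal NumPy/SciPy sketch of this computation, assuming `feat_real` and `feat_gen` are arrays of n-dimensional InceptionV3 features (rows are samples):

```python
import numpy as np
from scipy import linalg

def fid(feat_real, feat_gen):
    """Frechet distance between Gaussians fitted to two feature sets."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_g = np.cov(feat_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):                          # drop tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```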
Definition 10: InceptionV3 model. The InceptionV3 model is a deep neural network used to extract image features; the features taken after its final pooling layer are fed to a classification layer that outputs the class of the image.
Therefore, the technical scheme of the invention is an image generation method based on contrastive learning and generative adversarial networks, comprising the following steps:
Step 1: preprocess the data set;
acquire real images, classify and label them according to the objects shown in the images, and normalize the pixel values of all pictures;
Step 2: construct the generator network and the discriminator network of the generative adversarial network;
1) the input of the generator network is a random vector and its output is a picture; the first layer of the generator network is a linear layer, followed by three upsampling residual network blocks, then a batch normalization layer, a ReLU activation layer and a convolution layer in turn, and finally a Tanh activation layer;
2) the input of the discriminator network is a picture and its outputs are a scalar and a vector; the discriminator network is divided into three modules: a feature extraction module D1, an adversarial loss module D2, and a representation mapping module H. The input of D1 is a picture and its output is the feature vector of the picture; the first layer of D1 is an optimized down-sampling residual network block, followed by three standard down-sampling residual network blocks, then a ReLU activation layer and a global average pooling layer in turn. The input of D2 is the output of D1 and its output is a scalar: real or fake; D2 consists of a linear layer. The input of H is the output of D1 and its output is a representation vector; H also consists of a linear layer. The overall network structure is shown in Fig. 2 and the residual network block structure in Fig. 5; a code sketch of these networks is given below;
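For illustration, a minimal PyTorch sketch of this layout. Only the module order comes from the text; the residual-block internals (depicted only in Fig. 5), the channel widths, kernel sizes, shortcut convolutions, and the latent dimension of 128 are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """Upsampling residual block (internals assumed)."""
    def __init__(self, cin, cout):
        super().__init__()
        self.bn1, self.bn2 = nn.BatchNorm2d(cin), nn.BatchNorm2d(cout)
        self.c1 = nn.Conv2d(cin, cout, 3, padding=1)
        self.c2 = nn.Conv2d(cout, cout, 3, padding=1)
        self.sc = nn.Conv2d(cin, cout, 1)

    def forward(self, x):
        h = F.interpolate(F.relu(self.bn1(x)), scale_factor=2)
        h = self.c2(F.relu(self.bn2(self.c1(h))))
        return h + self.sc(F.interpolate(x, scale_factor=2))

class DownBlock(nn.Module):
    """Down-sampling residual block; `first=True` approximates the 'optimized' variant."""
    def __init__(self, cin, cout, first=False):
        super().__init__()
        self.c1 = nn.Conv2d(cin, cout, 3, padding=1)
        self.c2 = nn.Conv2d(cout, cout, 3, padding=1)
        self.sc = nn.Conv2d(cin, cout, 1)
        self.first = first

    def forward(self, x):
        h = self.c1(x) if self.first else self.c1(F.relu(x))
        h = F.avg_pool2d(self.c2(F.relu(h)), 2)
        return h + self.sc(F.avg_pool2d(x, 2))

class Generator(nn.Module):
    """Linear -> 3 upsampling residual blocks -> BN -> ReLU -> conv -> Tanh (step 2.1)."""
    def __init__(self, z_dim=128, ch=256):
        super().__init__()
        self.fc = nn.Linear(z_dim, 4 * 4 * ch)
        self.blocks = nn.Sequential(UpBlock(ch, ch), UpBlock(ch, ch), UpBlock(ch, ch))
        self.out = nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU(),
                                 nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh())

    def forward(self, z):
        h = self.fc(z).view(z.size(0), -1, 4, 4)
        return self.out(self.blocks(h))          # (B, 3, 32, 32)

class Discriminator(nn.Module):
    """D1 (feature extractor) + D2 (adversarial head) + H (representation head) (step 2.2)."""
    def __init__(self, ch=128, rep_dim=128):
        super().__init__()
        self.d1 = nn.Sequential(DownBlock(3, ch, first=True), DownBlock(ch, ch),
                                DownBlock(ch, ch), DownBlock(ch, ch))
        self.d2 = nn.Linear(ch, 1)
        self.h = nn.Linear(ch, rep_dim)

    def forward(self, x):
        f = F.relu(self.d1(x)).mean(dim=[2, 3])  # ReLU + global average pooling
        return self.d2(f), self.h(f), f          # scalar, representation, feature
```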
Step 3: construct positive and negative samples for the query objects (a code sketch of both constructions follows this list);
1) when constructing positive and negative samples for the discriminator, apply a random image transformation (image rotation; image flipping; adjustment of image saturation, contrast and brightness) to the real query image x to obtain its positive sample x+, and randomly draw other real images as its negative samples x_i^-, i = 1, …, N, where N is the number of negative samples; the construction is illustrated in Fig. 3.
2) when constructing positive and negative samples for the generator, define a hypersphere of radius R centered on the query variable z; a positive sample is obtained by random sampling inside the hypersphere, z+ : ||z+ - z||_2 ≤ R, and negative samples are obtained by random sampling outside the hypersphere, z_i^- : ||z_i^- - z||_2 > R, i = 1, …, N, where N is the number of negative samples; the construction is illustrated in Fig. 4.
Step 4: design the loss functions;
1) for the loss function of the discriminator network, let x_g ~ p_g be an image randomly produced by the generator, and let x_r be a query image randomly drawn from the real images. For the query image x_r, construct a positive sample x+ and negative samples x_i^-, i = 1, …, N, according to method 1) of step 3. Use the feature extraction module D1 of the discriminator to extract features, obtaining the query image feature f, the positive sample feature f+, and the negative sample features f_i^-, i = 1, …, N. The query feature f is sent to the adversarial loss module D2 of the discriminator for adversarial training, and all image features are mapped into the representation space through the representation mapping module H, giving h = H(f), h+ = H(f+), h_i^- = H(f_i^-), i = 1, …, N. The loss function of the discriminator L_D is:

L_D = L_adv^D + α_d · L_con^D

L_adv^D = E_{x_g~p_g}[D(x_g)] - E_{x_r~p_r}[D(x_r)] + λ · E_{x̂~p_x̂}[ ( ||∇_x̂ D(x̂)||_2 - 1 )² ]

L_con^D = -E[ log( exp(h·h+ / τ) / ( exp(h·h+ / τ) + Σ_{i=1}^{N} exp(h·h_i^- / τ) ) ) ]

where L_adv^D is the adversarial loss function of the discriminator of the generative adversarial network, α_d is the weight of the contrastive term, and L_con^D is the contrastive loss function; D(x_g) is the output value of the discriminator for a generated image (the larger the output value, the more real the discriminator judges the input to be), and E[·] denotes the expectation of the output value; x̂ = ε·x_r + (1 - ε)·x_g, i.e. the distribution p_x̂ is a linear mixture of the data set distribution p_r and the generated image distribution p_g, with ε the linear mixing coefficient; ∇_x̂ D(x̂) is the gradient of the discriminator function with respect to the mixed image; the last term is a gradient penalty that constrains the discriminator parameters to satisfy the Lipschitz continuity condition, λ is the gradient penalty coefficient, and τ is the temperature coefficient;
2) for the loss function of the generator network, let z be a random vector drawn from the standard multivariate normal distribution N(0, I), with I the identity matrix, and construct for it a positive sample z+ and negative samples z_i^-, i = 1, …, N, according to method 2) of step 3. Input these vectors into the generator to obtain the corresponding generated images G(z), G(z+), G(z_i^-). The generated images are then input to the feature extraction module D1 of the discriminator to obtain the corresponding generated image features f_g = D1(G(z)), f_g^+ = D1(G(z+)), f_{g,i}^- = D1(G(z_i^-)); the feature f_g is sent to the adversarial loss module D2 of the discriminator for adversarial training, and all generated image features are mapped into the representation space through the representation mapping module H, giving h_g = H(f_g), h_g^+ = H(f_g^+), h_{g,i}^- = H(f_{g,i}^-). The loss function of the generator L_G is:

L_G = L_adv^G + α_g · L_con^G

L_adv^G = -E_{z~N(0,I)}[D(G(z))]

L_con^G = -E[ log( exp(h_g·h_g^+ / τ) / ( exp(h_g·h_g^+ / τ) + Σ_{i=1}^{N} exp(h_g·h_{g,i}^- / τ) ) ) ]

where L_adv^G is the adversarial loss function of the generator of the generative adversarial network, α_g is the weight of the contrastive term, and L_con^G is the contrastive loss function introduced by the invention; G(z) is the image generated from the random variable z, D(G(z)) is the output value of the discriminator for the generated image, and E_{z~N(0,I)}[D(G(z))] is the mathematical expectation of that output value when the input random variable z of the generator follows the standard multivariate normal distribution;
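For illustration, a PyTorch sketch of the two non-standard pieces of these losses: the InfoNCE contrastive term shared by L_con^D and L_con^G, and the gradient penalty term of L_adv^D. It assumes the Discriminator sketched under step 2, which returns (score, representation, feature); whether the representations are L2-normalized before the dot products is not stated in the text and is omitted here:

```python
import torch
import torch.nn.functional as F

def info_nce(h, h_pos, h_neg, tau=0.3):
    """-log( exp(h.h+/tau) / (exp(h.h+/tau) + sum_i exp(h.h_i^-/tau)) ), batch-averaged.
    h: (B, d), h_pos: (B, d), h_neg: (B, N, d)."""
    pos = (h * h_pos).sum(dim=1, keepdim=True) / tau      # (B, 1)
    neg = torch.einsum('bd,bnd->bn', h, h_neg) / tau      # (B, N)
    logits = torch.cat([pos, neg], dim=1)                 # positive sits at index 0
    target = torch.zeros(h.size(0), dtype=torch.long, device=h.device)
    return F.cross_entropy(logits, target)

def gradient_penalty(D, x_real, x_fake, lam=10.0):
    """WGAN-GP term of L_adv^D on mixed images x_hat = eps*x_r + (1-eps)*x_g."""
    eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (eps * x_real + (1.0 - eps) * x_fake).requires_grad_(True)
    d_hat, _, _ = D(x_hat)
    grad = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)[0]
    return lam * ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
```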
Step 5: train the generative adversarial network constructed in step 2 using the loss functions constructed in step 4; when updating the generator network G, fix the parameters of the discriminator network D, and when updating the discriminator network D, fix the parameters of the generator network G; in each iteration the discriminator is updated 5 times, then the generator is updated once;
Step 6: the trained generator network G is used to generate images.
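Putting steps 4 and 5 together, a condensed training-loop sketch under the assumptions of the earlier code blocks. `data_iter` (yielding preprocessed CIFAR-10 batches), the use of the other images in the batch as the discriminator's negatives, the radius R = 1.0, and the optimizer settings beyond the stated learning rate are illustrative assumptions; the linear learning-rate decay of the implementation section is omitted:

```python
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
alpha_d = alpha_g = 1.0                                   # contrastive loss weights

def batchmate_negatives(h):
    """Treat every other sample in the batch as a negative (an assumption): (B, B-1, d)."""
    B = h.size(0)
    idx = torch.tensor([[j for j in range(B) if j != i] for i in range(B)])
    return h[idx]

for it in range(100_000):
    for _ in range(5):                                    # 5 D updates per G update (step 5)
        x_r = next(data_iter)                             # real images scaled to [-1, 1]
        with torch.no_grad():
            x_g = G(torch.randn(x_r.size(0), 128))
        d_real, h, _ = D(x_r)
        d_fake, _, _ = D(x_g)
        _, h_pos, _ = D(augment(x_r))                     # positives: transformed queries
        loss_d = (d_fake.mean() - d_real.mean()
                  + gradient_penalty(D, x_r, x_g)
                  + alpha_d * info_nce(h, h_pos, batchmate_negatives(h)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    z = torch.randn(64, 128)
    z_pos, z_neg = make_latent_pairs(z, R=1.0, n_neg=64)  # R is a hypothetical value
    d_fake, h_g, _ = D(G(z))
    _, h_gp, _ = D(G(z_pos))
    _, h_gn, _ = D(G(z_neg.flatten(0, 1)))
    loss_g = -d_fake.mean() + alpha_g * info_nce(h_g, h_gp, h_gn.view(64, 64, -1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```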
The innovations here are:
1) the image is projected into the representation space directly by the discriminator, without introducing an additional mapping model; this also pushes the discriminator to distinguish images by learning their features rather than by memorizing them directly.
2) constructing positive and negative samples in the input vector space makes the generator map vectors that are far apart to different images, improving the diversity of the generated images, while the image features generated from very close input vectors become more similar, which also improves the robustness of the generator.
3) the idea of contrastive learning is used to improve the loss function of the generative adversarial network, achieving excellent experimental results: an IS score of 8.046 and an FID score of 15.60 on the CIFAR-10 dataset.
Drawings
FIG. 1 is a main flow chart of the method of the present invention.
Fig. 2 is a diagram of the main network structure of the method of the present invention.
FIG. 3 is a schematic diagram of constructing positive and negative samples for a query image according to the present invention.
FIG. 4 is a diagram illustrating the construction of positive and negative samples for a query vector according to the present invention.
Fig. 5 is a schematic diagram of an upsampled residual network block, a standard downsampled residual network block, and an optimized downsampled residual network block of the present invention.
Detailed Description
Step 1: preprocess the data set;
The CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) is obtained; it contains 60000 RGB natural images of size 32 × 32 in 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. 50000 images are used as the training set, the training order is randomly shuffled, and finally the picture pixel values are normalized to the range [-1, 1].
Step 2: construct the generator network and the discriminator network of the generative adversarial network;
1) the input of the generator network is a random vector and its output is a picture; the first layer of the generator network is a linear layer, followed by three upsampling residual network blocks, then a batch normalization layer, a ReLU activation layer and a convolution layer in turn, and finally a Tanh activation layer;
2) the input of the discriminator network is a picture and its outputs are a scalar and a vector; the discriminator network is divided into three modules: a feature extraction module D1, an adversarial loss module D2, and a representation mapping module H. The input of the feature extraction module is a picture and its output is the feature vector of the picture; the first layer of D1 is an optimized down-sampling residual network block, followed by three standard down-sampling residual network blocks, then a ReLU activation layer and a global average pooling layer. The input of the adversarial loss module D2 is a feature vector and its output is a scalar; D2 consists of a linear layer. The input of the representation mapping module H is a feature vector and its output is a representation vector; H consists of a linear layer. The overall network structure is shown in Fig. 2 and the residual network block structure in Fig. 5.
Step 3: construct positive and negative samples for the query objects;
1) when constructing positive and negative samples for the discriminator part, a random image transformation (image rotation; image flipping; adjustment of image saturation, contrast and brightness) is applied to the queried real image x to obtain its positive sample x+, and other real images are randomly drawn from the training data set as the negative samples of the query image, x_i^-, i = 1, …, N, where N is the number of negative samples; the construction is illustrated in Fig. 3.
2) when constructing positive and negative samples for the generator part, a hypersphere of radius R centered on the query variable z ~ N(0, I) is defined; a positive sample is obtained by random sampling inside the hypersphere, z+ : ||z+ - z||_2 ≤ R, and negative samples are obtained by random sampling outside the hypersphere, z_i^- : ||z_i^- - z||_2 > R, i = 1, …, N, where N is the number of negative samples; the construction is illustrated in Fig. 4.
Step 4: design the loss functions;
1) for the loss function designed for the discriminator, let x_r ~ p_r be a real image randomly drawn from the training data set distribution p_r, and let x_g ~ p_g be an image randomly produced by the generator. For the query real image x_r, construct positive and negative samples x+ and x_i^-, i = 1, …, N, according to method 1) of step 3. Use the feature extraction module D1 of the discriminator to extract image features, obtaining the query image feature f, the positive sample feature f+, and the negative sample features f_i^-, i = 1, …, N. The query feature f is sent to the adversarial loss module D2 of the discriminator for adversarial training, and all image features are mapped into the representation space through the representation mapping module H, giving h = H(f), h+ = H(f+), h_i^- = H(f_i^-), i = 1, …, N. The loss function of the discriminator can be described as:

L_D = L_adv^D + α_d · L_con^D

L_adv^D = E_{x_g~p_g}[D(x_g)] - E_{x_r~p_r}[D(x_r)] + λ · E_{x̂~p_x̂}[ ( ||∇_x̂ D(x̂)||_2 - 1 )² ]

L_con^D = -E[ log( exp(h·h+ / τ) / ( exp(h·h+ / τ) + Σ_{i=1}^{N} exp(h·h_i^- / τ) ) ) ]

where L_adv^D is the adversarial loss function of the discriminator of the generative adversarial network, x̂ = ε·x_r + (1 - ε)·x_g, i.e. the distribution p_x̂ is a linear mixture of the data set distribution p_r and the generated image distribution p_g, and λ is the gradient penalty coefficient, generally set to 10. L_con^D is the contrastive loss function introduced by the invention, where α_d is the weight of this term and τ is the temperature coefficient.
2) for the loss function designed for the generator, let z be a random vector drawn from the standard multivariate normal distribution N(0, I), and construct for it positive and negative samples z+ and z_i^-, i = 1, …, N, according to method 2) of step 3. Input these vectors into the generator to obtain the corresponding generated images G(z), G(z+), G(z_i^-). The generated images are then input to the feature extraction module D1 of the discriminator to obtain the corresponding generated image features f_g = D1(G(z)), f_g^+ = D1(G(z+)), f_{g,i}^- = D1(G(z_i^-)); the feature of the image G(z) is sent to the adversarial loss module D2 of the discriminator for adversarial training, and all generated image features are mapped into the representation space through the representation mapping module H, giving h_g = H(f_g), h_g^+ = H(f_g^+), h_{g,i}^- = H(f_{g,i}^-), i = 1, …, N. The loss function of the generator can be described as:

L_G = L_adv^G + α_g · L_con^G

L_adv^G = -E_{z~N(0,I)}[D(G(z))]

L_con^G = -E[ log( exp(h_g·h_g^+ / τ) / ( exp(h_g·h_g^+ / τ) + Σ_{i=1}^{N} exp(h_g·h_{g,i}^- / τ) ) ) ]

where L_adv^G is the adversarial loss function of the generator of the generative adversarial network, and L_con^G is the contrastive loss function introduced by the invention, with α_g the weight of this term and τ the temperature coefficient.
Step 5: train the generative adversarial network constructed in step 2 using the loss functions constructed in step 4; when updating G, fix the parameters of D, and when updating D, fix the parameters of G; in each iteration the discriminator is updated 5 times, then the generator is updated once.
Step 6: in the testing stage, after the model has been trained in step 5, only the generator network of the generative adversarial network is used; 50000 random vectors are sampled from the standard multivariate normal distribution N(0, I) and input to the generator network to obtain 50000 generated images, and the IS index and FID index of the generated images are computed to evaluate the generation capability of the generator network.
The experimental settings are as follows:
Picture size: 32 × 32 × 3
Learning rate: 0.0002, decayed linearly with the number of iterations
Training batch size: 64
Number of iterations: 100000
Contrastive loss weight of the discriminator α_d: 1
Contrastive loss weight of the generator α_g: 1
Temperature coefficient τ in the contrastive loss: 0.3
Number of negative samples N: 64.
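A short sketch of the step 6 evaluation under the assumptions of the earlier code blocks (the InceptionV3 forward pass that produces `probs` and the pooled features is outside the sketch):

```python
with torch.no_grad():
    z = torch.randn(50_000, 128)
    imgs = torch.cat([G(z[i:i + 100]) for i in range(0, 50_000, 100)])  # generate in batches
# run `imgs` (and the real images) through InceptionV3 to obtain `probs`,
# `feat_real`, `feat_gen`, then score with the earlier helpers:
# inception_score(probs); fid(feat_real, feat_gen)
```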

Claims (1)

1. An image generation method based on contrastive learning and generative adversarial networks, the method comprising:
step 1: preprocessing the data set;
acquiring real images, classifying and labeling the real images according to objects displayed by the images, and normalizing pixel values of all pictures;
step 2: constructing the generator network and the discriminator network of the generative adversarial network;
1) the input of the generator network is a random vector and its output is a picture; the first layer of the generator network is a linear layer, followed by three upsampling residual network blocks, then a batch normalization layer, a ReLU activation layer and a convolution layer in turn, and finally a Tanh activation layer;
2) the input of the discriminator network is a picture and its outputs are a scalar and a vector; the discriminator network is divided into three modules: a feature extraction module D1, an adversarial loss module D2, and a representation mapping module H; the input of D1 is a picture and its output is the feature vector of the picture; the first layer of D1 is an optimized down-sampling residual network block, followed by three standard down-sampling residual network blocks, then a ReLU activation layer and a global average pooling layer in turn; the input of D2 is the output of D1 and its output is a scalar: real or fake; D2 consists of a linear layer; the input of H is the output of D1 and its output is a representation vector; H consists of a linear layer;
step 3: constructing positive and negative samples for the query objects;
1) when constructing positive and negative samples for the discriminator, applying a random image transformation (image rotation; image flipping; adjustment of image saturation, contrast and brightness) to the real query image x to obtain its positive sample x+, and randomly drawing other real images as its negative samples x_i^-, i = 1, …, N, where N is the number of negative samples;
2) when constructing positive and negative samples for the generator, defining a hypersphere of radius R centered on the query variable z; a positive sample is obtained by random sampling inside the hypersphere, z+ : ||z+ - z||_2 ≤ R, and negative samples are obtained by random sampling outside the hypersphere, z_i^- : ||z_i^- - z||_2 > R, i = 1, …, N, where N is the number of negative samples;
step 4: designing the loss functions;
1) designing a loss function for the discriminator network: let x_g ~ p_g be an image randomly produced by the generator, and let x_r be a query image randomly drawn from the real images; for the query image x_r, construct a positive sample x+ and negative samples x_i^-, i = 1, …, N, according to method 1) of step 3; use the feature extraction module D1 of the discriminator to extract features, obtaining the query image feature f, the positive sample feature f+, and the negative sample features f_i^-, i = 1, …, N; the query feature f is sent to the adversarial loss module D2 of the discriminator for adversarial training, and all image features are mapped into the representation space through the representation mapping module H, giving h = H(f), h+ = H(f+), h_i^- = H(f_i^-), i = 1, …, N; the loss function of the discriminator L_D is:

L_D = L_adv^D + α_d · L_con^D

L_adv^D = E_{x_g~p_g}[D(x_g)] - E_{x_r~p_r}[D(x_r)] + λ · E_{x̂~p_x̂}[ ( ||∇_x̂ D(x̂)||_2 - 1 )² ]

L_con^D = -E[ log( exp(h·h+ / τ) / ( exp(h·h+ / τ) + Σ_{i=1}^{N} exp(h·h_i^- / τ) ) ) ]

wherein L_adv^D is the adversarial loss function of the discriminator of the generative adversarial network, α_d is the weight of the contrastive term, and L_con^D is the contrastive loss function; D(x_g) is the output value of the discriminator for a generated image (the larger the output value, the more real the discriminator judges the input to be), and E[·] denotes the expectation of the output value; x̂ = ε·x_r + (1 - ε)·x_g, i.e. the distribution p_x̂ is a linear mixture of the data set distribution p_r and the generated image distribution p_g, with ε the linear mixing coefficient; ∇_x̂ D(x̂) is the gradient of the discriminator function with respect to the mixed image; the last term is a gradient penalty constraining the discriminator parameters to satisfy the Lipschitz continuity condition, λ is the gradient penalty coefficient, and τ is the temperature coefficient;
2) designing a loss function for the generator network: let z be a random vector drawn from the standard multivariate normal distribution N(0, I), with I the identity matrix, and construct for it a positive sample z+ and negative samples z_i^-, i = 1, …, N, according to method 2) of step 3; input these vectors into the generator to obtain the corresponding generated images G(z), G(z+), G(z_i^-); the generated images are then input to the feature extraction module D1 of the discriminator to obtain the corresponding generated image features f_g = D1(G(z)), f_g^+ = D1(G(z+)), f_{g,i}^- = D1(G(z_i^-)); the feature f_g is sent to the adversarial loss module D2 of the discriminator for adversarial training, and all generated image features are mapped into the representation space through the representation mapping module H, giving h_g = H(f_g), h_g^+ = H(f_g^+), h_{g,i}^- = H(f_{g,i}^-); the loss function of the generator L_G is:

L_G = L_adv^G + α_g · L_con^G

L_adv^G = -E_{z~N(0,I)}[D(G(z))]

L_con^G = -E[ log( exp(h_g·h_g^+ / τ) / ( exp(h_g·h_g^+ / τ) + Σ_{i=1}^{N} exp(h_g·h_{g,i}^- / τ) ) ) ]

wherein L_adv^G is the adversarial loss function of the generator of the generative adversarial network, α_g is the weight of the contrastive term, and L_con^G is the contrastive loss function introduced by the invention; G(z) is the image generated by the generator from the random variable z, D(G(z)) is the output value of the discriminator for the generated image, and E_{z~N(0,I)}[D(G(z))] is the mathematical expectation of that output value when the input random variable z of the generator follows the standard multivariate normal distribution;
step 5: training the generative adversarial network constructed in step 2 using the loss functions constructed in step 4, fixing the parameters of the discriminator network D when updating the generator network G and fixing the parameters of the generator network G when updating the discriminator network D, with the discriminator updated 5 times and then the generator updated once in each iteration;
step 6: the trained generator network G is used to generate images.
CN202111254371.9A 2021-10-27 2021-10-27 Image generation method based on contrastive learning and generative adversarial networks Pending CN114038055A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111254371.9A CN114038055A (en) 2021-10-27 2021-10-27 Image generation method based on contrastive learning and generative adversarial networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111254371.9A CN114038055A (en) 2021-10-27 2021-10-27 Image generation method based on contrastive learning and generative adversarial networks

Publications (1)

Publication Number Publication Date
CN114038055A true CN114038055A (en) 2022-02-11

Family

ID=80135466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111254371.9A Pending CN114038055A (en) 2021-10-27 2021-10-27 Image generation method based on contrastive learning and generative adversarial networks

Country Status (1)

Country Link
CN (1) CN114038055A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943322A (en) * 2022-04-11 2022-08-26 山东大学 Automatic generation method and system from layout to scene image based on deep learning
CN115063862A (en) * 2022-06-24 2022-09-16 电子科技大学 Age estimation method based on feature contrast loss
CN115063862B (en) * 2022-06-24 2024-04-23 电子科技大学 Age estimation method based on feature contrast loss
CN114863225A (en) * 2022-07-06 2022-08-05 腾讯科技(深圳)有限公司 Image processing model training method, image processing model generation device, image processing equipment and image processing medium
CN114863225B (en) * 2022-07-06 2022-10-04 腾讯科技(深圳)有限公司 Image processing model training method, image processing model generation device, image processing model equipment and image processing model medium
CN115860113A (en) * 2023-03-03 2023-03-28 深圳精智达技术股份有限公司 Training method and related device for self-antagonistic neural network model
CN116309913A (en) * 2023-03-16 2023-06-23 沈阳工业大学 Method for generating image based on ASG-GAN text description of generation countermeasure network
CN116309913B (en) * 2023-03-16 2024-01-26 沈阳工业大学 Method for generating image based on ASG-GAN text description of generation countermeasure network
CN116503502A (en) * 2023-04-28 2023-07-28 长春理工大学重庆研究院 Unpaired infrared image colorization method based on contrast learning

Similar Documents

Publication Publication Date Title
CN114038055A (en) Image generation method based on contrastive learning and generative adversarial networks
Batson et al. Noise2self: Blind denoising by self-supervision
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
CN108399428B (en) Triple loss function design method based on trace ratio criterion
Besserve et al. Counterfactuals uncover the modular structure of deep generative models
CN106845529B (en) Image feature identification method based on multi-view convolution neural network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN114398961B (en) Visual question-answering method based on multi-mode depth feature fusion and model thereof
Wen et al. Image recovery via transform learning and low-rank modeling: The power of complementary regularizers
CN111429340A (en) Cyclic image translation method based on self-attention mechanism
CN112115967B (en) Image increment learning method based on data protection
CN114998602B (en) Domain adaptive learning method and system based on low confidence sample contrast loss
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN112464004A (en) Multi-view depth generation image clustering method
Chen et al. Automated design of neural network architectures with reinforcement learning for detection of global manipulations
CN114494489A (en) Self-supervision attribute controllable image generation method based on depth twin network
CN116704079B (en) Image generation method, device, equipment and storage medium
Gangloff et al. Deep parameterizations of pairwise and triplet Markov models for unsupervised classification of sequential data
CN116109656A (en) Interactive image segmentation method based on unsupervised learning
CN113450313B (en) Image significance visualization method based on regional contrast learning
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN114936890A (en) Counter-fact fairness recommendation method based on inverse tendency weighting method
CN112446345A (en) Low-quality three-dimensional face recognition method, system, equipment and storage medium
EP4386657A1 (en) Image optimization method and apparatus, electronic device, medium, and program product
Saaim et al. Generative Models for Data Synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination