Disclosure of Invention
In view of the above, the present invention is directed to a seasonal style conversion model of images named MSGAN and a method thereof, which can be trained on unpaired images of different seasons and achieves a better seasonal style conversion effect.
The invention is realized by adopting the following scheme: a model for seasonal style conversion of an image, named MSGAN, comprises a generator G, a generator F, a first true/false discriminator D_G, a second true/false discriminator D_F, and a season discriminator D_S;
The input of the generator G comprises an input image and a condition vector carrying the input seasonal style information, and the generator G converts the input image into a seasonal style image determined by the condition vector; the first true/false discriminator D_G discriminates whether the image converted by the generator G is a composite image, and feeds the result back to the generator G to provide guidance for the generator G; the season discriminator D_S performs seasonal classification on each composite or real image, and feeds the result back to the generator G to provide guidance for the generator G;
the generator F converts the image generated by the generator G back into a composite image similar to the original input image, and the second true/false discriminator D_F discriminates whether the image converted by the generator F is a composite image.
Further, the loss function of the generator G during training is:

L(G, D_G, D_S) = L_cGAN(G, D_G, X, Y) + α·L_cyc(G) + β·L_color + γ·L_ssim + δ·L_style(G, D_S);

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(G, D_G, X, Y) denotes the adversarial loss function of the generator G and the first discriminator D_G, L_cyc(G) denotes the cycle consistency loss function of the generator G, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, L_style(G, D_S) denotes the seasonal style loss function, and α, β, γ, δ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(G, D_G, X, Y) = E_{(x,y)~p_data}[log D_G(x, y)] + E_{x~p_data}[log(1 − D_G(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_G(x, y) is the result output when the first true/false discriminator discriminates a real image, and D_G(x, G(x|c)) is the result output when the first true/false discriminator D_G discriminates an image synthesized by the generator G;

L_cyc(G) = E_{x~p_data}[||F(G(x|c)) − x||_1];

wherein F(G(x|c)) represents the result of the generator F converting the output of the generator G under the condition c;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein the sum runs over sliding windows w, G(x|c)_w denotes the hue of the output of the generator G within the window w when the condition c is received, and y_w denotes the hue of the real image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

wherein N is the number of pixels p in the window and SSIM(·) is the structural similarity function;

L_style(G, D_S) = E_{y~p_data}[H(p, D_S(y))] + E_{x~p_data}[H(p, D_S(G(x|c)))];

wherein E_{y~p_data}[·] represents the mathematical expectation when y obeys the real data distribution, H(·, ·) is the cross entropy against the ground-truth season distribution p, D_S(G(x|c)) represents the judgment result of the season discriminator D_S on the output of the generator G under the condition c, and D_S(y) represents the discrimination result of the season discriminator D_S on the real image.
Further, the loss function of the generator F during training is:

L(F, D_F) = L_cGAN(F, D_F, X, Y) + α·L_cyc(F) + β·L_color + γ·L_ssim;

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(F, D_F, X, Y) denotes the adversarial loss function of the generator F and the second discriminator D_F, L_cyc(F) denotes the cycle consistency loss function of the generator F, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, and α, β, γ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(F, D_F, X, Y) = E_{(x,y)~p_data}[log D_F(x, y)] + E_{x~p_data}[log(1 − D_F(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_F(x, y) denotes the discrimination result of the second true/false discriminator D_F on a real data pair, and D_F(x, G(x|c)) denotes the discrimination result of the second true/false discriminator D_F on synthesized data;

L_cyc(F) = E_{y~p_data}[||G(F(y)) − y||_1];

wherein G(F(y)) represents the output result when the input of the generator G is the output of the generator F;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein G(x|c)_w represents the hue, within the window w, of the data synthesized by the generator G when receiving the condition c, and y_w represents the hue of the real image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

where N is the number of pixels p in the window and SSIM(·) is the structural similarity function.
Further, in order to improve the visual effect of the network output image, the saliency information of the image is used as a reference to guide the optimization of the network; specifically, the saliency information of the image is used to set the proportional weights of the network loss values.
The setting of the proportional weights of the network loss values according to the saliency information of the image specifically comprises the following steps:
step S1: performing multi-scale superpixel segmentation and saliency segmentation on the input original image;
step S2: judging whether each region obtained by the multi-scale superpixel segmentation lies in the salient region; if so, setting the weights of the network loss values to a group of preset values; otherwise, setting each weight of the network loss values to half of the corresponding weight used for regions in the salient region.
Preferably, in the present invention, if the current region is in the salient region, the values of α, β, γ, δ are 10, 4, 2, 1 respectively, and if the current region is in a non-salient region, the values of α, β, γ, δ are 5, 2, 1, 0.5 respectively.
Further, the structure of the generator G and the generator F is a symmetric convolutional neural network with 7 layers of residual blocks, comprising, sequentially from input to output: an auto-encoder, the residual blocks, and an auto-decoder.
Further, the season discriminator D_S is structured as the classical AlexNet network plus a softmax classifier.
Further, before a picture is input to the generator G, the input image is converted into a grayscale map so that it carries no seasonal features.
The invention also provides an image seasonal style conversion method based on the MSGAN model, which specifically comprises the following steps:
step S1: constructing the MSGAN model and training the MSGAN model;
step S2: after the training is finished, using the generator G in the trained model as the conversion model;
step S3: preprocessing the picture to be converted, and inputting the preprocessed picture together with the set seasonal condition into the conversion model to obtain the converted picture of the corresponding season.
Further, in step S3, the preprocessing specifically comprises: performing grayscale processing on the picture to be converted so that the picture carries no seasonal features.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a new GAN-based MSGAN model, which can perform seasonal conversion on input images. The MSGAN model of the invention can be trained on unpaired images of different seasons, so it is very convenient to use.
2. In order to improve the quality of the generated images, the invention proposes a new loss suitable for seasonal style conversion: the hue loss, which guides the optimization direction of the network according to the visual characteristics of color, so that the hue of the output result is closer to that of a given reference image.
3. The method uses the saliency information of the image to guide the seasonal style conversion task, so that different image contents have different optimization weights in the MSGAN, thereby improving the effect of seasonal style conversion, shortening the training time, and making the result output by the network more consistent with human visual experience.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in FIG. 1, the present embodiment provides a seasonal style conversion model of an image, named MSGAN, comprising a generator G, a generator F, a first true/false discriminator D_G, a second true/false discriminator D_F, and a season discriminator D_S;
The input of the generator G comprises an input image and a condition vector carrying the input seasonal style information, and the generator G converts the input image into a seasonal style image determined by the condition vector; the first true/false discriminator D_G discriminates whether the image converted by the generator G is a composite image, and feeds the result back to the generator G to provide guidance for the generator G; the season discriminator D_S performs seasonal classification on each composite or real image, and feeds the result back to the generator G to provide guidance for the generator G;
the generator F converts the image generated by the generator G back into a composite image similar to the original input image, and the second true/false discriminator D_F discriminates whether the image converted by the generator F is a composite image.
In this embodiment, the loss function of the generator G during training is:

L(G, D_G, D_S) = L_cGAN(G, D_G, X, Y) + α·L_cyc(G) + β·L_color + γ·L_ssim + δ·L_style(G, D_S);

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(G, D_G, X, Y) denotes the adversarial loss function of the generator G and the first discriminator D_G, L_cyc(G) denotes the cycle consistency loss function of the generator G, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, L_style(G, D_S) denotes the seasonal style loss function, and α, β, γ, δ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(G, D_G, X, Y) = E_{(x,y)~p_data}[log D_G(x, y)] + E_{x~p_data}[log(1 − D_G(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_G(x, y) is the result output when the first true/false discriminator discriminates a real image, and D_G(x, G(x|c)) is the result output when the first true/false discriminator D_G discriminates an image synthesized by the generator G.

L_cyc(G) = E_{x~p_data}[||F(G(x|c)) − x||_1];

wherein F(G(x|c)) represents the result of the generator F converting the output of the generator G under the condition c;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein the sum runs over sliding windows w, G(x|c)_w denotes the hue of the result output by the generator G within the window w under the condition c, and y_w denotes the hue of the reference image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

wherein N is the number of pixels p in the window and SSIM(·) is the structural similarity function;

L_style(G, D_S) = E_{y~p_data}[H(p, D_S(y))] + E_{x~p_data}[H(p, D_S(G(x|c)))];

wherein E_{y~p_data}[·] represents the mathematical expectation when y obeys the real data distribution, H(·, ·) is the cross entropy against the ground-truth season distribution p, D_S(G(x|c)) represents the judgment result of the season discriminator D_S on the output of the generator G under the condition c, and D_S(y) represents the discrimination result of the season discriminator D_S on the real image.
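The weighted combination above can be illustrated with a minimal numeric sketch; the default weights below are the salient-region values of this embodiment, and the per-term loss values are placeholders:

```python
def generator_g_loss(l_cgan, l_cyc, l_color, l_ssim, l_style,
                     alpha=10.0, beta=4.0, gamma=2.0, delta=1.0):
    """L(G, D_G, D_S) = L_cGAN + alpha*L_cyc + beta*L_color
    + gamma*L_ssim + delta*L_style (weights default to the
    salient-region setting of this embodiment)."""
    return (l_cgan + alpha * l_cyc + beta * l_color
            + gamma * l_ssim + delta * l_style)
```

With all five per-term losses equal to 1, the salient-region weights give a total of 1 + 10 + 4 + 2 + 1 = 18.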
In this embodiment, the loss function of the generator F during training is:

L(F, D_F) = L_cGAN(F, D_F, X, Y) + α·L_cyc(F) + β·L_color + γ·L_ssim;

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(F, D_F, X, Y) denotes the adversarial loss function of the generator F and the second discriminator D_F, L_cyc(F) denotes the cycle consistency loss function of the generator F, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, and α, β, γ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(F, D_F, X, Y) = E_{(x,y)~p_data}[log D_F(x, y)] + E_{x~p_data}[log(1 − D_F(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_F(x, y) denotes the discrimination result of the second true/false discriminator D_F on a real data pair, and D_F(x, G(x|c)) denotes the discrimination result of the second true/false discriminator D_F on synthesized data;

L_cyc(F) = E_{y~p_data}[||G(F(y)) − y||_1];

wherein G(F(y)) represents the output result when the input of the generator G is the output of the generator F;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein G(x|c)_w represents the hue, within the window w, of the data synthesized by the generator G when receiving the condition c, and y_w represents the hue of the real image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

where N is the number of pixels p in the window and SSIM(·) is the structural similarity function.
Preferably, the similarity measure plays an important role in object matching. When converting the input image to other seasonal styles, an effort is made to maintain the similarity of the target feature structures. To ensure consistency of content between the input and output images, the present embodiment uses a structural similarity loss. Each pixel point p of the input image X and the composite image G(X|c) is filtered using a window of size 13 × 13. The SSIM function can be described as:

SSIM(x, y) = (2·μ_x·μ_y + c1)(2·σ_xy + c2) / ((μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2));

here, μ_x is the mean value of x, μ_y is the mean value of y, σ_x is the standard deviation of x, σ_y is the standard deviation of y, σ_xy is the covariance of x and y, c1 = 0.01², c2 = 0.03². The present embodiment calculates the loss between the input image X and the synthesized image G(X|c) as:

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

where N is the number of pixels p in the windows x and y.
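The per-window SSIM computation above can be sketched in plain NumPy; this is a sketch only, and window extraction and any data-range scaling are assumed to be handled by the caller:

```python
import numpy as np

def ssim_window(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM between two equal-size windows x, y with values in [0, 1]."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()          # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # sigma_xy
    return (((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) /
            ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

def ssim_loss(windows_x, windows_y):
    """L_ssim = (1/N) * sum_p (1 - SSIM(p)) over paired windows."""
    return sum(1 - ssim_window(a, b)
               for a, b in zip(windows_x, windows_y)) / len(windows_x)
```

Identical windows give SSIM = 1, so the loss contribution of an unchanged region is zero.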
Preferably, since color features in different seasons need to be learned, the embodiment uses an average filter over a sliding window to form the color loss, so that the color of the generated image is closer to the real situation. Similar to the SSIM loss function, the present embodiment selects a 13 × 13 sliding window to calculate the hue difference between the synthetic image G(x|c) and the real image y. Hue is a color attribute related to wavelength; it reflects the human perception of different colors. The formula for calculating the hue H of an image from an RGB image is as follows:

H = θ, if B ≤ G; H = 360° − θ, if B > G;

wherein the calculation formula of θ is as follows:

θ = arccos( ((R − G) + (R − B)) / (2 · sqrt((R − G)² + (R − B)(G − B))) );

thus, the hue loss function is described as:

L_color = Σ_w |G(x|c)_w − y_w|;

wherein the sum runs over sliding windows w, G(x|c)_w is the hue of the synthesized image within the window w, and y_w is the hue of the real image within the same window.
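A scalar sketch of the RGB-to-hue computation above, assuming the standard HSI-style hue formula; handling of achromatic pixels (R = G = B, where the denominator vanishes) is an assumption:

```python
import math

def rgb_hue(r, g, b):
    """Hue angle in degrees from RGB values via the HSI-style formula.
    Returns 0.0 for achromatic pixels (r == g == b)."""
    denom = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denom == 0:
        return 0.0
    theta = math.degrees(math.acos(((r - g) + (r - b)) / (2 * denom)))
    return theta if b <= g else 360.0 - theta
```

Pure red maps to 0°, pure green to 120°, and pure blue to 240°, matching the conventional hue circle.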
the overall data flow of the architecture proposed by the present embodiment is shown in fig. 4. Wherein (a): a data flow for converting the input original image into an image of other seasonal style; (b) the method comprises the following steps The synthesized image is converted into an original style image.
In this embodiment, different from the conventional cGAN, in order to improve the visual effect of the network output image, the saliency information of the image is used as a reference to guide the optimization of the network; specifically, the saliency information of the image is used to set the proportional weights of the network loss values, which gives the optimization a more definite direction and lets the network concentrate on the important regions during optimization. The biggest characteristic of the seasonal style conversion task is that it is impossible to convert all regions of an image with identical operators, because seasonal changes have a small effect on some scenes and a large effect on other regions. Therefore, the embodiment uses the saliency information of the image to guide the optimization direction of the network, so that the output result is more realistic and reliable.
The setting of the proportional weights of the network loss values according to the saliency information of the image specifically comprises the following steps:
step S1: performing multi-scale superpixel segmentation and saliency segmentation on the input original image;
step S2: judging whether each region obtained by the multi-scale superpixel segmentation lies in the salient region; if so, setting the weights of the network loss values to a group of preset values; otherwise, setting each weight of the network loss values to half of the corresponding weight used for regions in the salient region.
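Step S2 can be sketched as a small lookup; the preset values below are the salient-region weights stated later in this embodiment, and non-salient regions receive half of each:

```python
# Preset weights for regions inside the salient area (from this embodiment).
SALIENT_WEIGHTS = {"alpha": 10.0, "beta": 4.0, "gamma": 2.0, "delta": 1.0}

def region_weights(in_salient_region):
    """Return the loss weights for one superpixel region:
    the preset values if it lies in the salient area, else half of each."""
    if in_salient_region:
        return dict(SALIENT_WEIGHTS)
    return {k: v / 2 for k, v in SALIENT_WEIGHTS.items()}
```

For a non-salient region this yields 5, 2, 1, 0.5, consistent with the halving rule of step S2.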
In this embodiment, a classical superpixel segmentation algorithm, SLIC, is used to perform multi-scale superpixel segmentation on the original image. Given an H × W image pre-segmented into K superpixels of the same size, the size of each superpixel is (H × W)/K, and the distance between adjacent seeds can be approximated as S = sqrt((H × W)/K). The seed point is then reselected within the n × n neighborhood of the seed point (generally n = 3); the specific method is to calculate the gradient values of all pixel points in the neighborhood and move the seed point to the position with the minimum gradient in the neighborhood, which prevents the cluster center from lying on an image edge. Then, by calculating a distance metric, a class label (i.e. which cluster center the pixel belongs to) is assigned to each pixel point in the neighborhood around each seed point. The distance metric comprises a color distance d_c and a spatial distance d_s:

d_c = sqrt((l_j − l_i)² + (a_j − a_i)² + (b_j − b_i)²);
d_s = sqrt((x_j − x_i)² + (y_j − y_i)²);

in the formula, l_i, a_i, b_i respectively represent the color values of pixel point i in the LAB color space, and x_i, y_i its spatial coordinates.
In the formula, N_s represents the maximum spatial distance within a class, defined as N_s = S = sqrt((H × W)/K), and applied to each cluster. The maximum color distance N_c varies from image to image and from cluster to cluster, so it is replaced by a fixed constant m (in the range [1, 40], generally 10). The resulting distance measure D′ is as follows:

D′ = sqrt((d_c/m)² + (d_s/S)²);

Because each pixel point may be searched by several seed points, each pixel point has a distance to each of the surrounding seed points, and the seed point corresponding to the minimum distance is taken as the cluster center of that pixel point.
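The combined SLIC distance D′ described above can be sketched as follows; the (l, a, b, x, y) tuple layout is an assumption for illustration:

```python
import math

def slic_distance(pix_i, pix_j, m=10.0, s=20.0):
    """Combined SLIC distance D' between pixel i and seed j.
    Each pixel is a (l, a, b, x, y) tuple; m is the fixed color
    constant and s the seed spacing S = sqrt(H*W/K)."""
    l1, a1, b1, x1, y1 = pix_i
    l2, a2, b2, x2, y2 = pix_j
    dc = math.sqrt((l2 - l1) ** 2 + (a2 - a1) ** 2 + (b2 - b1) ** 2)
    ds = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    return math.sqrt((dc / m) ** 2 + (ds / s) ** 2)
```

Each pixel is then assigned to the seed with the smallest D′, as described above.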
The training data and the test images used in this embodiment are all of size 600 × 400. The input image is segmented by the SLIC algorithm into 300 small regions. At the same time, the image is saliency-segmented using the algorithm proposed by Guanghai Liu et al. For each of the 300 small regions obtained by the SLIC algorithm, if the region lies in the salient area, α, β, γ, δ are set to 10, 4, 2, 1 respectively; if it lies in a non-salient area, they are set to 5, 2, 1, 0.5 respectively. The pseudo code of the weight setting algorithm is shown in FIG. 3.
Preferably, in this embodiment, if the current region is in the salient region, the values of α, β, γ, δ are 10, 4, 2, 1 respectively, and if the current region is in a non-salient region, the values of α, β, γ, δ are 5, 2, 1, 0.5 respectively.
As shown in FIG. 2, in this embodiment, the structure of the generator G and the generator F is a symmetric convolutional neural network with 7 layers of residual blocks, comprising, sequentially from input to output: an auto-encoder, the residual blocks, and an auto-decoder.
The generator is a symmetric CNN with a 9-ResNet connection. The residual blocks preserve features of the previous network layer, such as size and shape, and pass them directly to the next layer. This structure can effectively reduce the computation of the network and prevent the vanishing-gradient problem during training. The structure of the decoder is symmetrical to that of the encoder, and it recovers from the feature map an image consistent with the size of the input image. Assume that the present embodiment uses an n × 1 vector as the condition vector c to carry the input seasonal style information. The condition vector c is connected into a condition map m that serves as a bias term. In order to avoid the influence of vector size imbalance, the condition map m is made consistent with the size of the input image. The input to the MSGAN can be expressed as:
x′=x+m。
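A minimal sketch of forming the condition map m and the biased input x′ = x + m; how c is tiled into m is an assumption, since the embodiment only requires m to match the size of the input image:

```python
import numpy as np

def add_condition_map(x, c):
    """Tile the condition vector c over the spatial dimensions of the
    input image x (shape (H, W, len(c))) to form the condition map m,
    and return x' = x + m."""
    h, w = x.shape[:2]
    m = np.tile(np.asarray(c, dtype=float), (h, w, 1))  # shape (H, W, len(c))
    return x + m
```

The tiled map keeps the bias identical at every spatial location, so only the channel pattern carries the seasonal condition.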
in this embodiment, the discriminator includes two types, one is a conventional binary discriminator for determining whether the image is a composite image, such as a first true-false discriminator and a second true-false discriminator in this embodiment. The other adopts a classic AlexNet classifier to judge the seasonal style of the image, namely a seasonal discriminator (also called style discriminator) in the embodiment. And a softmax activation function is adopted at the last layer of the season discriminator, so that the probability of the corresponding category can be output, and guidance is provided for the generator G, and the generator G can generate a more real simulation image. The j-class prediction for a given sample x and weight w is as follows:
the input image is converted into images of four different seasons, so K is 4. p (n) is the probability distribution of the nth class ground truth; q (n) is the probability distribution of the nth class prediction output P (y ═ n | x); h (p, q) is the cross entropy between p and q and can be expressed as follows:
Therefore, the present embodiment defines the style loss function L_style as follows:

L_style(G, D_S) = E_{y~p_data}[H(p, D_S(y))] + E_{x~p_data}[H(p, D_S(G(x|c)))];

Using the condition label c in the softmax classifier of the AlexNet structure, n classes of seasonal style images G(x|c) are generated. Although the condition vector c is given to the generator, the season discriminator D_S still helps the true/false discriminator distinguish the simulated images from the real images.
In the present embodiment, before a picture is input to the generator G, the input image is converted into a grayscale map so that it carries no seasonal features. Since the input image has its own seasonal style, it is difficult for the discriminator to make an accurate and objective judgment when the input image is converted into an image of its own seasonal style. Therefore, the input image needs to be initialized to a certain degree so that it carries no seasonal features. The simplest method is to convert the input RGB image into a grayscale map; the conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114;
in the formula, R, G, B respectively represent the R, G, B channels of the image.
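The per-pixel form of the grayscale conversion above is a one-liner:

```python
def to_gray(r, g, b):
    """Grayscale value from the embodiment's formula:
    Gray = R*0.299 + G*0.587 + B*0.114."""
    return r * 0.299 + g * 0.587 + b * 0.114
```

Since the three coefficients sum to 1, a pure-white pixel (1, 1, 1) maps to gray value 1.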
In summary, in the present embodiment, the goal of the generator is to convert the input image into another specific seasonal style. The role of the true/false discriminators is to distinguish whether an image is a composite image. The season discriminator seasonally classifies each synthesized and real image. Both kinds of discriminator provide guidance to the generator. In order to give the network the correct optimization direction, the MSGAN uses the style loss, the structural similarity loss, and the color loss to improve the generating capability of the generator. Moreover, the MSGAN is the first to guide the image style conversion task with the saliency information of the image, so that the result of the image style conversion is more consistent with human visual perception.
The embodiment also provides an image seasonal style conversion method based on the MSGAN model, which specifically comprises the following steps:
step S1: constructing the MSGAN model and training the MSGAN model;
step S2: after the training is finished, using the generator G in the trained model as the conversion model;
step S3: preprocessing the picture to be converted, and inputting the preprocessed picture together with the set seasonal condition into the conversion model to obtain the converted picture of the corresponding season.
In this embodiment, in step S3, the preprocessing specifically comprises: performing grayscale processing on the picture to be converted so that the picture carries no seasonal features.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention; other and further embodiments of the invention may be devised without departing from its basic scope, which is determined by the claims that follow. However, any simple modification, equivalent change, or refinement of the above embodiments made according to the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.