Disclosure of Invention
In view of the above, the present invention is directed to a seasonal style conversion model of images named MSGAN and a method thereof, which can be trained on unpaired images of different seasons and achieves a better seasonal style conversion effect.
The invention is realized by adopting the following scheme: a model for seasonal style conversion of an image, named MSGAN, comprises a generator G, a generator F, a first true/false discriminator D_G, a second true/false discriminator D_F, and a season discriminator D_S;
The input of the generator G comprises an input image and a condition vector carrying the input seasonal style information, and the generator G converts the input image into a seasonal style image determined by the condition vector; the first true/false discriminator D_G discriminates whether the image converted by the generator G is a composite image, and feeds the result back to the generator G to provide guidance for the generator G; the season discriminator D_S performs seasonal classification on each composite or real image, and feeds the result back to the generator G to provide guidance for the generator G;
the generator F converts the image generated by the generator G back into a composite image similar to the original input image, and the second true/false discriminator D_F discriminates whether the image converted by the generator F is a composite image.
Further, the loss function of the generator G during training is:

L(G, D_G, D_S) = L_cGAN(G, D_G, X, Y) + α·L_cyc(G) + β·L_color + γ·L_ssim + δ·L_style(G, D_S);

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(G, D_G, X, Y) denotes the adversarial loss function of the generator G and the first discriminator D_G, L_cyc(G) denotes the cycle consistency loss function of the generator G, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, L_style(G, D_S) denotes the seasonal style loss function, and α, β, γ, δ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(G, D_G, X, Y) = E_{(x,y)~p_data}[log D_G(x, y)] + E_{x~p_data}[log(1 − D_G(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_G(x, y) is the result output when the first true/false discriminator discriminates a real image, and D_G(x, G(x|c)) is the result output when the first true/false discriminator D_G discriminates an image synthesized by the generator G;

L_cyc(G) = E_{x~p_data}[||F(G(x|c)) − x||_1];

wherein F(G(x|c)) represents the result of the generator F converting the output of the generator G under the condition c;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein the sum runs over sliding windows w, G(x|c)_w denotes the hue of the output of the generator G within the window w when the condition c is received, and y_w denotes the hue of the real image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

wherein N is the number of pixels p in the window and SSIM(·) is the structural similarity function;

L_style(G, D_S) = E_{y~p_data}[H(p, D_S(y))] + E_{x~p_data}[H(p, D_S(G(x|c)))];

wherein E_{y~p_data}[·] represents the mathematical expectation when y obeys the real data distribution, H(·, ·) is the cross entropy against the ground-truth season distribution p, D_S(G(x|c)) represents the judgment result of the season discriminator D_S on the output of the generator G under the condition c, and D_S(y) represents the discrimination result of the season discriminator D_S on the real image.
Further, the loss function of the generator F during training is:

L(F, D_F) = L_cGAN(F, D_F, X, Y) + α·L_cyc(F) + β·L_color + γ·L_ssim;

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(F, D_F, X, Y) denotes the adversarial loss function of the generator F and the second discriminator D_F, L_cyc(F) denotes the cycle consistency loss function of the generator F, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, and α, β, γ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(F, D_F, X, Y) = E_{(x,y)~p_data}[log D_F(x, y)] + E_{x~p_data}[log(1 − D_F(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_F(x, y) denotes the discrimination result of the second true/false discriminator D_F on a real data pair, and D_F(x, G(x|c)) denotes the discrimination result of the second true/false discriminator D_F on synthesized data;

L_cyc(F) = E_{y~p_data}[||G(F(y)) − y||_1];

wherein G(F(y)) represents the output result when the input of the generator G is the output of the generator F;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein G(x|c)_w represents the hue, within the window w, of the data synthesized by the generator G when receiving the condition c, and y_w represents the hue of the real image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

where N is the number of pixels p in the window and SSIM(·) is the structural similarity function.
Further, in order to improve the visual effect of the network output image, the saliency information of the image is used as a reference to guide the optimization of the network; specifically, the saliency information of the image is used to set the proportional weights of the network loss values.
The setting of the proportional weights of the network loss values according to the saliency information of the image specifically comprises the following steps:
step S1: performing multi-scale superpixel segmentation and saliency segmentation on the input original image;
step S2: judging whether each region obtained by the multi-scale superpixel segmentation lies in the salient region; if so, setting the weights of the network loss values to a group of preset values; otherwise, setting each weight of the network loss values to half of the corresponding weight used for regions in the salient region.
Preferably, in the present invention, if the current region is in the salient region, the values of α, β, γ, δ are 10, 4, 2, 1 respectively, and if the current region is in a non-salient region, the values of α, β, γ, δ are 5, 2, 1, 0.5 respectively.
Further, the structure of the generator G and the generator F is a symmetric convolutional neural network with 7 layers of residual blocks, comprising, sequentially from input to output: an auto-encoder, the residual blocks, and an auto-decoder.
Further, the season discriminator D_S is structured as the classical AlexNet network plus a softmax classifier.
Further, before a picture is input to the generator G, the input image is converted into a grayscale map so that it carries no seasonal features.
The invention also provides an image seasonal style conversion method based on the MSGAN model, which specifically comprises the following steps:
step S1: constructing the MSGAN model and training the MSGAN model;
step S2: after the training is finished, using the generator G in the trained model as the conversion model;
step S3: preprocessing the picture to be converted, and inputting the preprocessed picture together with the set seasonal condition into the conversion model to obtain the converted picture of the corresponding season.
Further, in step S3, the preprocessing specifically comprises: performing grayscale processing on the picture to be converted so that the picture carries no seasonal features.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a new GAN-based MSGAN model, which can perform seasonal conversion on input images. The MSGAN model of the invention can be trained on unpaired images of different seasons, so it is very convenient to use.
2. In order to improve the quality of the generated images, the invention proposes a new loss suitable for seasonal style conversion: the hue loss, which guides the optimization direction of the network according to the visual characteristics of color, so that the hue of the output result is closer to that of a given reference image.
3. The method uses the saliency information of the image to guide the seasonal style conversion task, so that different image contents have different optimization weights in the MSGAN, thereby improving the effect of seasonal style conversion, shortening the training time, and making the result output by the network more consistent with human visual experience.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in FIG. 1, the present embodiment provides a seasonal style conversion model of an image, named MSGAN, comprising a generator G, a generator F, a first true/false discriminator D_G, a second true/false discriminator D_F, and a season discriminator D_S;
The input of the generator G comprises an input image and a condition vector carrying the input seasonal style information, and the generator G converts the input image into a seasonal style image determined by the condition vector; the first true/false discriminator D_G discriminates whether the image converted by the generator G is a composite image, and feeds the result back to the generator G to provide guidance for the generator G; the season discriminator D_S performs seasonal classification on each composite or real image, and feeds the result back to the generator G to provide guidance for the generator G;
the generator F converts the image generated by the generator G back into a composite image similar to the original input image, and the second true/false discriminator D_F discriminates whether the image converted by the generator F is a composite image.
In this embodiment, the loss function of the generator G during training is:

L(G, D_G, D_S) = L_cGAN(G, D_G, X, Y) + α·L_cyc(G) + β·L_color + γ·L_ssim + δ·L_style(G, D_S);

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(G, D_G, X, Y) denotes the adversarial loss function of the generator G and the first discriminator D_G, L_cyc(G) denotes the cycle consistency loss function of the generator G, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, L_style(G, D_S) denotes the seasonal style loss function, and α, β, γ, δ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(G, D_G, X, Y) = E_{(x,y)~p_data}[log D_G(x, y)] + E_{x~p_data}[log(1 − D_G(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_G(x, y) is the result output when the first true/false discriminator discriminates a real image, and D_G(x, G(x|c)) is the result output when the first true/false discriminator D_G discriminates an image synthesized by the generator G.

L_cyc(G) = E_{x~p_data}[||F(G(x|c)) − x||_1];

wherein F(G(x|c)) represents the result of the generator F converting the output of the generator G under the condition c;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein the sum runs over sliding windows w, G(x|c)_w denotes the hue of the result output by the generator G within the window w under the condition c, and y_w denotes the hue of the reference image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

wherein N is the number of pixels p in the window and SSIM(·) is the structural similarity function;

L_style(G, D_S) = E_{y~p_data}[H(p, D_S(y))] + E_{x~p_data}[H(p, D_S(G(x|c)))];

wherein E_{y~p_data}[·] represents the mathematical expectation when y obeys the real data distribution, H(·, ·) is the cross entropy against the ground-truth season distribution p, D_S(G(x|c)) represents the judgment result of the season discriminator D_S on the output of the generator G under the condition c, and D_S(y) represents the discrimination result of the season discriminator D_S on the real image.
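The weighted combination above can be illustrated with a minimal numeric sketch; the default weights below are the salient-region values of this embodiment, and the per-term loss values are placeholders:

```python
def generator_g_loss(l_cgan, l_cyc, l_color, l_ssim, l_style,
                     alpha=10.0, beta=4.0, gamma=2.0, delta=1.0):
    """L(G, D_G, D_S) = L_cGAN + alpha*L_cyc + beta*L_color
    + gamma*L_ssim + delta*L_style (weights default to the
    salient-region setting of this embodiment)."""
    return (l_cgan + alpha * l_cyc + beta * l_color
            + gamma * l_ssim + delta * l_style)
```

With all five per-term losses equal to 1, the salient-region weights give a total of 1 + 10 + 4 + 2 + 1 = 18.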
In this embodiment, the loss function of the generator F during training is:

L(F, D_F) = L_cGAN(F, D_F, X, Y) + α·L_cyc(F) + β·L_color + γ·L_ssim;

in the formula, X and Y represent the pictures input to the model during training, L_cGAN(F, D_F, X, Y) denotes the adversarial loss function of the generator F and the second discriminator D_F, L_cyc(F) denotes the cycle consistency loss function of the generator F, L_color denotes the hue loss function, L_ssim denotes the structural similarity loss function, and α, β, γ are the proportional weights of the network loss values; each loss function is as follows:

L_cGAN(F, D_F, X, Y) = E_{(x,y)~p_data}[log D_F(x, y)] + E_{x~p_data}[log(1 − D_F(x, G(x|c)))];

wherein G(x|c) represents the image generated by the generator G from the input image x and the condition c, E_{(x,y)~p_data}[·] represents the mathematical expectation when (x, y) obeys the real data distribution, E_{x~p_data}[·] represents the mathematical expectation when x obeys the real data distribution, D_F(x, y) denotes the discrimination result of the second true/false discriminator D_F on a real data pair, and D_F(x, G(x|c)) denotes the discrimination result of the second true/false discriminator D_F on synthesized data;

L_cyc(F) = E_{y~p_data}[||G(F(y)) − y||_1];

wherein G(F(y)) represents the output result when the input of the generator G is the output of the generator F;

L_color = Σ_w |G(x|c)_w − y_w|;

wherein G(x|c)_w represents the hue, within the window w, of the data synthesized by the generator G when receiving the condition c, and y_w represents the hue of the real image within the same window;

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

where N is the number of pixels p in the window and SSIM(·) is the structural similarity function.
Preferably, the similarity measure plays an important role in object matching. When converting the input image to other seasonal styles, an effort is made to maintain the similarity of the target feature structures. To ensure consistency of content between the input and output images, the present embodiment uses a structural similarity loss. Each pixel point p of the input image X and the composite image G(X|c) is filtered using a window of size 13 × 13. The SSIM function can be described as:

SSIM(x, y) = (2·μ_x·μ_y + c1)(2·σ_xy + c2) / ((μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2));

here, μ_x is the mean value of x, μ_y is the mean value of y, σ_x is the standard deviation of x, σ_y is the standard deviation of y, σ_xy is the covariance of x and y, c1 = 0.01², c2 = 0.03². The present embodiment calculates the loss between the input image X and the synthesized image G(X|c) as:

L_ssim = (1/N) · Σ_p (1 − SSIM(p));

where N is the number of pixels p in the windows x and y.
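The per-window SSIM computation above can be sketched in plain NumPy; this is a sketch only, and window extraction and any data-range scaling are assumed to be handled by the caller:

```python
import numpy as np

def ssim_window(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM between two equal-size windows x, y with values in [0, 1]."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()          # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # sigma_xy
    return (((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) /
            ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

def ssim_loss(windows_x, windows_y):
    """L_ssim = (1/N) * sum_p (1 - SSIM(p)) over paired windows."""
    return sum(1 - ssim_window(a, b)
               for a, b in zip(windows_x, windows_y)) / len(windows_x)
```

Identical windows give SSIM = 1, so the loss contribution of an unchanged region is zero.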
Preferably, since color features in different seasons need to be learned, the embodiment uses an average filter over a sliding window to form the color loss, so that the color of the generated image is closer to the real situation. Similar to the SSIM loss function, the present embodiment selects a 13 × 13 sliding window to calculate the hue difference between the synthetic image G(x|c) and the real image y. Hue is a color attribute related to wavelength; it reflects the human perception of different colors. The formula for calculating the hue H of an image from an RGB image is as follows:

H = θ, if B ≤ G; H = 360° − θ, if B > G;

wherein the calculation formula of θ is as follows:

θ = arccos( ((R − G) + (R − B)) / (2 · sqrt((R − G)² + (R − B)(G − B))) );

thus, the hue loss function is described as:

L_color = Σ_w |G(x|c)_w − y_w|;

wherein the sum runs over sliding windows w, G(x|c)_w is the hue of the synthesized image within the window w, and y_w is the hue of the real image within the same window.
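A scalar sketch of the RGB-to-hue computation above, assuming the standard HSI-style hue formula; handling of achromatic pixels (R = G = B, where the denominator vanishes) is an assumption:

```python
import math

def rgb_hue(r, g, b):
    """Hue angle in degrees from RGB values via the HSI-style formula.
    Returns 0.0 for achromatic pixels (r == g == b)."""
    denom = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denom == 0:
        return 0.0
    theta = math.degrees(math.acos(((r - g) + (r - b)) / (2 * denom)))
    return theta if b <= g else 360.0 - theta
```

Pure red maps to 0°, pure green to 120°, and pure blue to 240°, matching the conventional hue circle.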
the overall data flow of the architecture proposed by the present embodiment is shown in fig. 4. Wherein (a): a data flow for converting the input original image into an image of other seasonal style; (b) the method comprises the following steps The synthesized image is converted into an original style image.
In this embodiment, different from the conventional cGAN, in order to improve the visual effect of the network output image, the saliency information of the image is used as a reference to guide the optimization of the network; specifically, the saliency information of the image is used to set the proportional weights of the network loss values, which gives the optimization a more definite direction and lets the network concentrate on the important regions during optimization. The biggest characteristic of the seasonal style conversion task is that it is impossible to convert all regions of an image with identical operators, because seasonal changes have a small effect on some scenes and a large effect on other regions. Therefore, the embodiment uses the saliency information of the image to guide the optimization direction of the network, so that the output result is more realistic and reliable.
The setting of the proportional weights of the network loss values according to the saliency information of the image specifically comprises the following steps:
step S1: performing multi-scale superpixel segmentation and saliency segmentation on the input original image;
step S2: judging whether each region obtained by the multi-scale superpixel segmentation lies in the salient region; if so, setting the weights of the network loss values to a group of preset values; otherwise, setting each weight of the network loss values to half of the corresponding weight used for regions in the salient region.
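Step S2 can be sketched as a small lookup; the preset values below are the salient-region weights stated later in this embodiment, and non-salient regions receive half of each:

```python
# Preset weights for regions inside the salient area (from this embodiment).
SALIENT_WEIGHTS = {"alpha": 10.0, "beta": 4.0, "gamma": 2.0, "delta": 1.0}

def region_weights(in_salient_region):
    """Return the loss weights for one superpixel region:
    the preset values if it lies in the salient area, else half of each."""
    if in_salient_region:
        return dict(SALIENT_WEIGHTS)
    return {k: v / 2 for k, v in SALIENT_WEIGHTS.items()}
```

For a non-salient region this yields 5, 2, 1, 0.5, consistent with the halving rule of step S2.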
In this embodiment, a classical superpixel segmentation algorithm, SLIC, is used to perform multi-scale superpixel segmentation on the original image. Given an H × W image pre-segmented into K superpixels of the same size, the size of each superpixel is (H × W)/K, and the distance between adjacent seeds can be approximated as S = sqrt((H × W)/K). The seed point is then reselected within the n × n neighborhood of the seed point (generally n = 3); the specific method is to calculate the gradient values of all pixel points in the neighborhood and move the seed point to the position with the minimum gradient in the neighborhood, which prevents the cluster center from lying on an image edge. Then, by calculating a distance metric, a class label (i.e. which cluster center the pixel belongs to) is assigned to each pixel point in the neighborhood around each seed point. The distance metric comprises a color distance d_c and a spatial distance d_s:

d_c = sqrt((l_j − l_i)² + (a_j − a_i)² + (b_j − b_i)²);
d_s = sqrt((x_j − x_i)² + (y_j − y_i)²);

in the formula, l_i, a_i, b_i respectively represent the color values of pixel point i in the LAB color space, and x_i, y_i its spatial coordinates.
In the formula, N_s represents the maximum spatial distance within a class, defined as N_s = S = sqrt((H × W)/K), and applied to each cluster. The maximum color distance N_c varies from image to image and from cluster to cluster, so it is replaced by a fixed constant m (in the range [1, 40], generally 10). The resulting distance measure D′ is as follows:

D′ = sqrt((d_c/m)² + (d_s/S)²);

Because each pixel point may be searched by several seed points, each pixel point has a distance to each of the surrounding seed points, and the seed point corresponding to the minimum distance is taken as the cluster center of that pixel point.
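The combined SLIC distance D′ described above can be sketched as follows; the (l, a, b, x, y) tuple layout is an assumption for illustration:

```python
import math

def slic_distance(pix_i, pix_j, m=10.0, s=20.0):
    """Combined SLIC distance D' between pixel i and seed j.
    Each pixel is a (l, a, b, x, y) tuple; m is the fixed color
    constant and s the seed spacing S = sqrt(H*W/K)."""
    l1, a1, b1, x1, y1 = pix_i
    l2, a2, b2, x2, y2 = pix_j
    dc = math.sqrt((l2 - l1) ** 2 + (a2 - a1) ** 2 + (b2 - b1) ** 2)
    ds = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    return math.sqrt((dc / m) ** 2 + (ds / s) ** 2)
```

Each pixel is then assigned to the seed with the smallest D′, as described above.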
The training data and the test images used in this embodiment are all of size 600 × 400. The input image is segmented by the SLIC algorithm into 300 small regions. At the same time, the image is saliency-segmented using the algorithm proposed by Guanghai Liu et al. For each of the 300 small regions obtained by the SLIC algorithm, if the region lies in the salient area, α, β, γ, δ are set to 10, 4, 2, 1 respectively; if it lies in a non-salient area, they are set to 5, 2, 1, 0.5 respectively. The pseudo code of the weight setting algorithm is shown in FIG. 3.
Preferably, in this embodiment, if the current region is in the salient region, the values of α, β, γ, δ are 10, 4, 2, 1 respectively, and if the current region is in a non-salient region, the values of α, β, γ, δ are 5, 2, 1, 0.5 respectively.
As shown in FIG. 2, in this embodiment, the structure of the generator G and the generator F is a symmetric convolutional neural network with 7 layers of residual blocks, comprising, sequentially from input to output: an auto-encoder, the residual blocks, and an auto-decoder.
The generator is a symmetric CNN with a 9-ResNet connection. The residual blocks preserve features of the previous network layer, such as size and shape, and pass them directly to the next layer. This structure can effectively reduce the computation of the network and prevent the vanishing-gradient problem during training. The structure of the decoder is symmetrical to that of the encoder, and it recovers from the feature map an image consistent with the size of the input image. Assume that the present embodiment uses an n × 1 vector as the condition vector c to carry the input seasonal style information. The condition vector c is connected into a condition map m that serves as a bias term. In order to avoid the influence of vector size imbalance, the condition map m is made consistent with the size of the input image. The input to the MSGAN can be expressed as:
x′=x+m。
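A minimal sketch of forming the condition map m and the biased input x′ = x + m; how c is tiled into m is an assumption, since the embodiment only requires m to match the size of the input image:

```python
import numpy as np

def add_condition_map(x, c):
    """Tile the condition vector c over the spatial dimensions of the
    input image x (shape (H, W, len(c))) to form the condition map m,
    and return x' = x + m."""
    h, w = x.shape[:2]
    m = np.tile(np.asarray(c, dtype=float), (h, w, 1))  # shape (H, W, len(c))
    return x + m
```

The tiled map keeps the bias identical at every spatial location, so only the channel pattern carries the seasonal condition.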
in this embodiment, the discriminator includes two types, one is a conventional binary discriminator for determining whether the image is a composite image, such as a first true-false discriminator and a second true-false discriminator in this embodiment. The other adopts a classic AlexNet classifier to judge the seasonal style of the image, namely a seasonal discriminator (also called style discriminator) in the embodiment. And a softmax activation function is adopted at the last layer of the season discriminator, so that the probability of the corresponding category can be output, and guidance is provided for the generator G, and the generator G can generate a more real simulation image. The j-class prediction for a given sample x and weight w is as follows:
the input image is converted into images of four different seasons, so K is 4. p (n) is the probability distribution of the nth class ground truth; q (n) is the probability distribution of the nth class prediction output P (y ═ n | x); h (p, q) is the cross entropy between p and q and can be expressed as follows:
Therefore, the present embodiment defines the style loss function L_style as follows:

L_style(G, D_S) = E_{y~p_data}[H(p, D_S(y))] + E_{x~p_data}[H(p, D_S(G(x|c)))];

Using the condition label c in the softmax classifier of the AlexNet structure, n classes of seasonal style images G(x|c) are generated. Although the condition vector c is given to the generator, the season discriminator D_S still helps the true/false discriminator distinguish the simulated images from the real images.
In the present embodiment, before a picture is input to the generator G, the input image is converted into a grayscale map so that it carries no seasonal features. Since the input image has its own seasonal style, it is difficult for the discriminator to make an accurate and objective judgment when the input image is converted into an image of its own seasonal style. Therefore, the input image needs to be initialized to a certain degree so that it carries no seasonal features. The simplest method is to convert the input RGB image into a grayscale map; the conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114;
in the formula, R, G, B respectively represent the R, G, B channels of the image.
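The per-pixel form of the grayscale conversion above is a one-liner:

```python
def to_gray(r, g, b):
    """Grayscale value from the embodiment's formula:
    Gray = R*0.299 + G*0.587 + B*0.114."""
    return r * 0.299 + g * 0.587 + b * 0.114
```

Since the three coefficients sum to 1, a pure-white pixel (1, 1, 1) maps to gray value 1.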
In summary, in the present embodiment, the goal of the generator is to convert the input image into another specific seasonal style. The role of the true/false discriminators is to distinguish whether an image is a composite image. The season discriminator seasonally classifies each synthesized and real image. Both kinds of discriminator provide guidance to the generator. In order to give the network the correct optimization direction, the MSGAN uses the style loss, the structural similarity loss, and the color loss to improve the generating capability of the generator. Moreover, the MSGAN is the first to guide the image style conversion task with the saliency information of the image, so that the result of the image style conversion is more consistent with human visual perception.
The embodiment also provides an image seasonal style conversion method based on the MSGAN model, which specifically comprises the following steps:
step S1: constructing the MSGAN model and training the MSGAN model;
step S2: after the training is finished, using the generator G in the trained model as the conversion model;
step S3: preprocessing the picture to be converted, and inputting the preprocessed picture together with the set seasonal condition into the conversion model to obtain the converted picture of the corresponding season.
In this embodiment, in step S3, the preprocessing specifically comprises: performing grayscale processing on the picture to be converted so that the picture carries no seasonal features.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention; other and further embodiments of the invention may be devised without departing from its basic scope, which is determined by the claims that follow. However, any simple modification, equivalent change, or refinement of the above embodiments made according to the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.