CN111340720B - Color matching woodcut style conversion algorithm based on semantic segmentation - Google Patents

Color matching woodcut style conversion algorithm based on semantic segmentation

Info

Publication number
CN111340720B
CN111340720B (application CN202010091956.2A)
Authority
CN
China
Prior art keywords
image
style
content
woodcut
network
Prior art date
Legal status
Active
Application number
CN202010091956.2A
Other languages
Chinese (zh)
Other versions
CN111340720A (en)
Inventor
徐丹
李应涛
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN202010091956.2A
Publication of CN111340720A
Application granted
Publication of CN111340720B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing


Abstract

The invention provides a semantic-segmentation-based style conversion method for color-register woodcuts, which comprises the following steps. Step one: perform semantic segmentation on the content image and on the woodcut artistic style image respectively to obtain semantic segmentation result maps. Step two: binarize the semantic segmentation result maps to obtain image masks. Step three: using the semantic segmentation masks of the content image and of the woodcut artistic style image as guidance, perform region-wise style conversion on the content image and the woodcut artistic style image by adding spatial guidance channels, finally obtaining an artistic style conversion result with the woodcut artistic style. The color-register woodcut style conversion method effectively avoids problems that woodcut style conversion results are prone to, such as indistinct cut-mark texture and disordered cut-mark distribution. The cut-mark texture presented by the conversion results of the method is distinct and reasonably distributed, and the results are realistic and natural, closer to real woodcuts.

Description

Color matching woodcut style conversion algorithm based on semantic segmentation
Technical Field
The invention relates to the technical field of image processing, and in particular to a semantic-segmentation-based style conversion algorithm for color-register woodcuts.
Background
Neural style transfer is a technique that uses a neural network to render the style of an artistic style image onto a content image. The pioneering work of Gatys et al. [1] demonstrated the ability of convolutional neural networks (CNNs) to create artistic images; since then, neural style transfer has received increasing attention, and many methods have been proposed to improve or extend the original algorithm. Li et al. [2] enhance the details and edge contours of the conversion result by adding a Laplacian loss; Risser et al. [3] propose a method that improves the stability of style conversion by adding a histogram loss; Johnson et al. [4] train a model that achieves fast style conversion of images in a feed-forward pass; Chen et al. [5] achieve fast conversion of arbitrary styles based on a local patch-matching method; Li et al. [6] propose, in a data-driven way, a style conversion algorithm that learns a linear transformation matrix and can stylize arbitrary images and videos.
Current neural style transfer algorithms can convert arbitrary style images. A woodcut, however, differs from paintings on paper: it is obtained by rubbing from a carved medium, the cut-mark texture is prominent in the work, the types of cuts within a local region are largely consistent, and the texture distribution is roughly uniform. The results of existing neural style transfer algorithms therefore readily show indistinct cut marks, disordered cut-mark distribution, and destroyed semantic information of the content image. The reasons for these defects are as follows. Existing neural style transfer methods fall into two main classes: (1) online neural network methods based on image optimization; (2) offline neural network methods based on model optimization. In class (1) methods, the generated image is initialized to random noise, a VGG19 network is used as the feature extractor, the feature representations extracted by the higher layers of VGG19 serve as the content representation, and the correlations between the feature representations extracted by each convolution layer serve as the style representation; the Gram matrix is used to compute the correlations between the different feature representations. Because the Gram matrix can only extract globally averaged features of the image and does not constrain spatial object information, the above phenomena easily occur in woodcut-stylized images optimized from a white-noise initialization. Class (2) methods usually obtain a specific model or decoder by training; once trained, the model or decoder can stylize arbitrary artistic style images, but it contains no structure designed to highlight the woodcut cut-mark texture or to optimize the texture distribution for the characteristics of woodcuts, so the same phenomena easily appear in its stylized results.
[1] Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2414-2423.
[2] Li S, Xu X, Nie L, et al. Laplacian-steered neural style transfer[C]//Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017: 1716-1724.
[3] Risser E, Wilmot P, Barnes C. Stable and controllable neural texture synthesis and style transfer using histogram losses[OL]. arXiv preprint arXiv:1701.08893, 2017.
[4] Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision. Springer, Cham, 2016: 694-711.
[5] Chen T Q, Schmidt M. Fast patch-based style transfer of arbitrary style[OL]. arXiv preprint arXiv:1612.04337, 2016.
[6] Li X, Liu S, Kautz J, et al. Learning linear transformations for fast image and video style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3809-3817.
[7] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1529-1537.
[8] Gatys L A, Ecker A S, Bethge M, et al. Controlling perceptual factors in neural style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3985-3993.
[9] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[OL]. arXiv preprint arXiv:1409.1556, 2014.
Disclosure of Invention
The invention aims to solve the problems that, in woodcut style conversion, the resulting cut marks tend to be indistinct, the cut-mark texture distribution disordered, and the semantic information of the generated image destroyed, and provides a semantic-segmentation-based style conversion algorithm for color-register woodcuts, so that the generated cut-mark texture is reasonably distributed and the conversion result is realistic and natural.
A semantic-segmentation-based style conversion method for color-register woodcuts comprises the following steps:
Step one: perform semantic segmentation on the content image and on the woodcut artistic style image respectively to obtain semantic segmentation result maps;
specifically, the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map, and the woodcut artistic style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
Step two: binarize the two semantic segmentation result maps to obtain two complementary content-image masks of the content image and two complementary style-image masks of the woodcut artistic style image;
Step three: using the content-image masks and the style-image masks as guidance, perform region-wise style conversion on the content image and the woodcut artistic style image by adding spatial guidance channels, finally obtaining an artistic style conversion result with the woodcut style.
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, performing semantic segmentation on the content image with a CRF-RNN network in step one to obtain the semantic segmentation result map comprises:
Step 1: take the label X_i of each pixel of the content image as a random variable and the relationships between pixels as edges, forming a conditional random field, and let X be the vector formed by the random variables X_1, X_2, ..., X_N, where N is the number of pixels in the image; given a global observation I (the image), (I, X) can be modeled as a CRF characterized by a Gibbs distribution of the form

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where E(x) is the energy of the labeling x and Z(I) is the partition function.
In the CRF model, the energy of assigning a particular labeling x is computed from the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where ψ_u(x_i) is the unary energy component, which measures the probability of assigning the label x_i to pixel i, and ψ_p(x_i, x_j) is the binary (pairwise) energy component, which describes the association between two adjacent pixels i and j;
the unary energy component is computed by a CNN and only roughly predicts the pixel labels; the binary energy component provides an image-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where μ(x_i, x_j) is a label compatibility function that captures the compatibility between different pairs of labels; for each m = 1, 2, ..., M, k_G^{(m)} is a Gaussian kernel applied to the feature vectors, w^{(m)} is its weight, and f_i, f_j are the feature vectors of pixels i and j;
Step 2: the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference; it approximates the CRF distribution P(X) with a simpler distribution Q(X) that can be written as the product of independent marginal distributions, namely:

$$Q(x) = \prod_i Q_i(x_i)$$

where Q(X) denotes the mean-field approximation of the CRF and X_i denotes the label of a pixel in the image.
Step 3: modeling the single iteration process of the CRF average field obtained in the step 2 as a forward propagation process of a CNN layer, and iterating the CRF average field for a plurality of times until the iteration times are completed, wherein the iteration times are equivalent to treating the CRF average field reasoning as an RNN model, and the model is called CRF-RNN;
Step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;
Step 5: train the network with the PASCAL Context semantic segmentation dataset; after training, input the content image into the end-to-end network formed by the FCN and CRF-RNN to finally obtain the semantic segmentation result map of the content image.
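For illustration only, the following sketch produces a per-pixel label map for a content image using a generic pretrained segmentation model from torchvision as a stand-in for the trained FCN + CRF-RNN network of the invention; the model choice, file name and preprocessing are assumptions, not part of the invention.

```python
# Sketch: per-pixel semantic labels for the content image.
# fcn_resnet50 is only a stand-in for the trained FCN + CRF-RNN network.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

content = Image.open("content.jpg").convert("RGB")
x = preprocess(content).unsqueeze(0)     # 1 x 3 x H x W

with torch.no_grad():
    logits = model(x)["out"]             # 1 x C x H x W class scores
label_map = logits.argmax(dim=1)[0]      # H x W per-pixel label map
```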
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, step three comprises:
Step 1: use the content image as the initialization image of the generated map; the content image and the generated map each yield corresponding feature maps in every convolution layer of the network, the feature maps of each layer are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted by the higher layers of the VGG19 network are used as the content representation;
Step 2: the woodcut style image and the generated image likewise yield corresponding feature representations in every convolution layer of the network; the correlations between the feature maps of the channels of each layer are computed with a Gram matrix and used as the style representation;
Step 3: the mask images are input to the style conversion network, which, after re-encoding, generates a guidance channel T_l^r at each layer l from the mask images; this is equivalent to adding weight information to the feature maps: under the effect of these weights the activation values of the region of the feature map corresponding to the guidance channel are increased, and the image optimization process is carried out only within the corresponding spatial guidance channel region;
Step 4: after the content representations of the content image and of the generated map have been computed by the style conversion network in Step 1, each feature map in the content representation is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided content representation, and the content loss is defined with the Euclidean distance;
after the feature representations of the woodcut style image and of the generated image have been computed by the style conversion network in Step 2, each feature map is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided feature representation; the correlations between the spatially guided feature maps are computed with the Gram matrix to obtain the spatially guided Gram matrix, which serves as the spatially guided style representation, and the style loss is defined with the Euclidean distance;
Step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the generated-map initialization image by gradient descent for a set number of iterations, stop when that number is reached, and finally obtain a conversion result with the woodcut style.
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, the content loss function in Step 4 is:

$$L_c = \frac{1}{2} \sum_{r \in R} \sum_{i,j} \left( F_l^r(x)_{ij} - F_l^r(x_c)_{ij} \right)^2$$

where x denotes the generated-map initialization image and x_c the content image; F_l^r(x) and F_l^r(x_c) denote the spatially guided content representations of the initialization image and of the content image at layer l of the network, respectively; and r ∈ R indexes the guidance channels.
The style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r \in R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x)_{ij} - G_l^r(x_s)_{ij} \right)^2$$

where w_l is the weight factor of the feature representation of each layer in VGG-19, N_l is the number of filters and M_l the size of the feature maps at layer l, and G_l^r(x) and G_l^r(x_s) are the spatially guided style representations of the generated image and of the woodcut artistic style image at layer l of the network, respectively.
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where L_c denotes the loss function between the content map and the generated map, L_s denotes the loss function between the woodcut style image and the generated map, and α and β denote the weights of the content loss function and of the woodcut style loss function, respectively.
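For illustration only, the following is a minimal PyTorch sketch of these loss terms, assuming the spatially guided feature matrices of shape N_l x M_l have already been extracted; the function names and the example values of α and β are assumptions.

```python
import torch

def gram(F):
    """Gram matrix of a (spatially guided) feature matrix F of shape (N_l, M_l)."""
    return F @ F.t()

def content_loss(F_gen, F_content):
    """Euclidean content loss between guided content representations of one layer."""
    return 0.5 * torch.sum((F_gen - F_content) ** 2)

def layer_style_loss(F_gen, F_style):
    """Per-layer style loss between guided Gram matrices."""
    n_l, m_l = F_gen.shape
    return torch.sum((gram(F_gen) - gram(F_style)) ** 2) / (4.0 * n_l ** 2 * m_l ** 2)

def total_loss(L_c, L_s, alpha=1.0, beta=1e3):
    """L_total = alpha * L_c + beta * L_s (alpha and beta values are examples)."""
    return alpha * L_c + beta * L_s
```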
The beneficial effects are that:
the method for converting the color-set woodcut style effectively avoids the problems that woodcut texture is not obvious, the distribution of the notch texture is disordered and the like easily caused by the woodcut style conversion result. The wood engraving score texture presented by the wood engraving style conversion result of the method is obvious, the score texture distribution is reasonable, the conversion result is real and natural, and the wood engraving style conversion result is closer to the real wood engraving.
The conversion method provided by the invention builds on a neural network segmentation algorithm and the CNN image style conversion method: it takes the content image as the initialization image of the generated image, uses image masks as guidance, and performs color-register woodcut style conversion by adding spatial guidance channels, thereby avoiding problems such as indistinct cut-mark texture and disordered cut-mark distribution in woodcut stylization. The principle is as follows:
the invention provides a woodcut style conversion method, which belongs to an online neural network method based on image optimization, wherein each original input image is provided with two complementary mask images, the pixel value of each image mask is 0 or 1 (the pixel value of a black area in each mask image is 0, the pixel value of a white area in each mask image is 1), the image mask is used as a guide, a space guide channel is added for carrying out woodcut regional style conversion, and the space guide channel can be understood as an area with the pixel value of 1 in each mask image;
the content map, the style map and the corresponding mask images are used as inputs to the style conversion network; the network generates a guidance channel T_l^r at each layer from the mask images, and the guidance channel T_l^r is multiplied element-wise with the feature maps extracted by the network to obtain the spatially guided feature representation, which is equivalent to adding weight information to the feature maps: the activation values of the region corresponding to the guidance channel in the feature maps are increased by these weights, the Gram matrix computes feature correlations only within the guidance channel region, and when optimizing the style loss the network optimizes only the guidance channel region, so that the influence of style features from outside the guidance channel region is eliminated and disordered cut-mark texture distribution is avoided; the guidance channel region of the first mask image is optimized first, and after the set number of iterations is reached the guidance channel region corresponding to the second mask image is optimized, again until the set number of iterations, finally yielding the stylized image.
In the invention, the content image replaces the white-noise image as the initialization image of the generated image; compared with white-noise initialization, this preserves the semantic structure information of the image well, reduces the number of iterations, and makes it easier for the VGG19 network to extract the semantic features of the generated image. On a feature representation that carries semantic information, the Gram matrix more easily captures the correlations between the high-level semantic features of the image and the woodcut style features; optimizing an initialization image that already carries semantic information reduces noise interference, and, combined with the spatial guidance channels, the migration of woodcut style features is strengthened during image optimization, so the woodcut style features of the conversion result are more distinct.
the combination of the space guiding channel and the content image instead of white noise is used as an initialization image for generating the image, so that the problems that the grain characteristics of woodcut are not obvious, the grain distribution of the woodcut is disordered, the semantic information of the image is destroyed and the like in the stylization of woodcut are avoided.
Drawings
FIG. 1 is an overall flowchart of the color-register woodcut style conversion;
FIG. 2 is a schematic diagram of the CRF-RNN algorithm for content image segmentation;
FIG. 3 shows original images, their image semantic segmentation results, and their mask images;
FIG. 4 is a flow chart of the region-wise color-register woodcut style conversion;
FIG. 5 shows stylized results under different weights;
FIG. 6 is a comparison of woodcut style conversion results for portrait images;
FIG. 7 is a comparison of woodcut style conversion results for a landscape image;
FIG. 8 is a comparison of local texture details;
FIG. 9 shows the average visual-evaluation score statistics of the woodcut style conversion results.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below; obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Part one: image semantic segmentation
content image semantic segmentation
For semantic segmentation of the content image, the CRF-RNN algorithm [7] is used to obtain pixel-level semantic segmentation; it is an end-to-end neural network algorithm for image semantic segmentation.
In the pixel-wise labeling task of an image, a conditional random field (CRF) is typically used for label class prediction: the labels of the image pixels are taken as random variables and the relationships between pixels as edges, forming a conditional random field, and the CRF can model the labels once a global observation is obtained. Let the image I have N pixels, assign each pixel of I a label from a predefined label set L, and let the label assigned to pixel i be the random variable X_i, X_i ∈ L; let X be the random vector formed by X_1, X_2, ..., X_N. Consider the graph G = (V, E), where V = {X_1, X_2, ..., X_N} (V thus represents the set of pixel variables and is equivalent to X) and E is the set of pixel-to-pixel relationships. When a global observation I (the image) is obtained, (I, X) can be modeled as a CRF characterized by a Gibbs distribution of the form:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr) \qquad (1)$$

where E(x) is the energy of the labeling x and Z(I) is the partition function. In the fully connected CRF model, the energy of a labeling is defined as:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j) \qquad (2)$$
where ψ_u(x_i) is the unary energy component, which measures the probability of assigning the label x_i to pixel i, and ψ_p(x_i, x_j) is the binary energy component, which describes the association between two adjacent pixels i and j and makes neighbouring pixels with similar color values more likely to be assigned the same class label. The unary energy component is computed by a CNN and only roughly predicts the pixel labels; the binary energy component provides an image-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j) \qquad (3)$$

where μ(x_i, x_j) is a label compatibility function used to capture the compatibility between different pairs of labels; for m = 1, 2, ..., M, k_G^{(m)} is a Gaussian kernel applied to the feature vectors, w^{(m)} is its weight, and f_i, f_j are the feature vectors of pixels i and j.
Minimizing the CRF energy E(x) yields the most probable labeling for a given image. Since this exact minimization is intractable, the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference: the CRF distribution P(X) is approximated with a simpler distribution Q(X) that can be written as the product of independent marginal distributions, i.e. Q(x) = Π_i Q_i(x_i), where Q(X) denotes the mean-field approximation of the CRF. A single mean-field iteration is modeled as the forward pass of a stack of CNN layers and repeated until the set number of iterations (typically 10) is completed, which is equivalent to treating mean-field CRF inference as an RNN; the whole algorithm can therefore be expressed as an RNN computation.
The RNN structure so defined is called CRF-RNN: the CRF mean field is treated as an RNN computation, and the model is combined with fully convolutional networks (FCNs) to form an end-to-end network. The network is trained with the PASCAL Context semantic segmentation dataset; after training, the content image is input into the end-to-end network formed by the FCN and CRF-RNN, finally yielding the semantic segmentation result map of the content image. The structure combining CRF-RNN with the FCN is shown in FIG. 2.
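As a simplified, self-contained illustration of one mean-field update (not the exact CRF-RNN layer; a single Gaussian kernel over toy pixel features is assumed), the following numpy sketch performs the message passing, compatibility transform and addition of the unary term described above.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_kernel(feats, w=1.0, theta=1.0):
    """Weighted Gaussian kernel on the pixel feature vectors f_i (cf. formula (3))."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = w * np.exp(-d2 / (2.0 * theta ** 2))
    np.fill_diagonal(K, 0.0)          # no message from a pixel to itself
    return K

def mean_field_update(Q, unary, K, mu):
    """One mean-field update: message passing, label compatibility transform,
    adding the unary term, then renormalizing with a softmax."""
    pairwise = (K @ Q) @ mu.T
    return softmax(unary - pairwise, axis=1)

# toy example: 4 pixels, 2 labels
unary = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]))
feats = np.array([[0.0], [0.1], [1.0], [1.1]])   # one colour-like feature per pixel
mu = 1.0 - np.eye(2)                             # Potts label compatibility
K = gaussian_kernel(feats)

Q = softmax(unary, axis=1)                       # initialize from the unaries
for _ in range(10):                              # the set number of iterations
    Q = mean_field_update(Q, unary, K, mu)
labels = Q.argmax(axis=1)                        # per-pixel label prediction
```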
The segmentation of the woodcut artistic style image is described next:
accurate segmentation with the CRF-RNN semantic segmentation network presupposes training on a large set of annotated images; existing woodcut datasets are small, and after training the CRF-RNN semantic segmentation network on them it is difficult to obtain segmentation results that meet the conditions of woodcut artistic style conversion; therefore, the Labelme image annotation tool is used to semantically segment the woodcut artistic style image.
Part two: binarization of the semantic segmentation results
The region-wise semantic style conversion of color-register woodcuts requires the masks of the content image and of the woodcut artistic style image. The segmentation result of the content image obtained with the CRF-RNN image semantic segmentation algorithm and the segmentation result of the woodcut artistic style image are binarized to obtain the mask images of the content image and of the woodcut artistic style image, each original image having two complementary mask images. The original images, their image semantic segmentation results and their mask images are shown in FIG. 3.
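A minimal sketch of this binarization, assuming the segmentation result is available as a per-pixel label map (a numpy array) and the index of the foreground class is known; the class index used in the comment is an assumption.

```python
import numpy as np

def complementary_masks(label_map, foreground_class):
    """Binarize a per-pixel label map into two complementary 0/1 mask images."""
    mask_fg = (label_map == foreground_class).astype(np.float32)
    return mask_fg, 1.0 - mask_fg

# example: assuming class 15 marks the person region in the segmentation result
# person_mask, background_mask = complementary_masks(label_map, foreground_class=15)
```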
Part three: color-register woodcut style conversion
The region-wise style conversion of color-register woodcuts builds on the CNN image style conversion method [1] and on the image stylization method with spatial guidance channels [8]. With the semantic segmentation mask images as guidance, region-wise style conversion is performed on the content image and the woodcut artistic style image within the spatial guidance channel regions. The pretrained VGG-19 convolutional neural network model [9] is used as the feature extractor: the feature representations extracted by the higher layers of the convolutional neural network are used as the content representation, and the correlations between the channel feature representations of the convolution layers are used as the style representation. That is, the VGG19 network is capable of extracting high-level semantic information from images; after an image is input into the network it is re-encoded, each convolution layer of the network extracts the corresponding feature maps, and the feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation. The conversion method specifically comprises the following steps:
Step 1: use the content image as the initialization image of the generated map; the content image and the generated map each yield corresponding feature maps in every convolution layer of the network, the feature maps of each layer are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted by the higher layers of the VGG19 network are used as the content representation;
Specifically, a generated-map initialization image x and a content image x_c are defined (the generated image is the image being optimized; the content image and the style image serve as references, and optimizing this third image yields a conversion result that simultaneously carries the semantic information of the content image and the style features of the style image). The initialization image and the content image are re-encoded at every layer of the VGG-19 network; layer l has N_l convolution filters and feature maps of size M_l, where M_l is the product of the width and the height of the feature maps at layer l, so the feature maps output by each layer can be stored as a matrix

$$F^l(x) \in \mathbb{R}^{N_l \times M_l}$$

F^l(x) and F^l(x_c) denote the corresponding feature representations of the initialization image and of the content image at layer l of the network, and these feature representations are taken as the content representations.
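As an illustration, the following sketch extracts such per-layer feature matrices with torchvision's pretrained VGG-19; the layer indices, image size and preprocessing are assumptions about torchvision's layer layout, not values prescribed by the invention.

```python
from PIL import Image
from torchvision import models, transforms

vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# indices of the convolution layers of interest in torchvision's VGG-19
LAYERS = {0: "conv1_1", 5: "conv2_1", 10: "conv3_1",
          19: "conv4_1", 21: "conv4_2", 28: "conv5_1"}

preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def feature_matrices(img):
    """Return {layer name: F^l}, each feature map stored as an N_l x M_l matrix."""
    feats, h = {}, img
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in LAYERS:
            n_l = h.shape[1]
            feats[LAYERS[i]] = h.squeeze(0).reshape(n_l, -1)
    return feats

x_c = preprocess(Image.open("content.jpg").convert("RGB")).unsqueeze(0)
content_representation = feature_matrices(x_c)["conv4_2"]   # N_l x M_l matrix
```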
Step 2: the woodcut style image and the generated image can also obtain corresponding characteristic representation on each convolution layer of the network; calculating the correlation among the characteristic diagrams of each channel of each layer by using a Gram matrix to be used as style characterization;
Step 3: the mask images are input to the style conversion network, which, after re-encoding, generates a guidance channel T_l^r at each layer l from the mask images; this is equivalent to adding weight information to the feature maps: under the effect of these weights the activation values of the region of the feature map corresponding to the guidance channel are increased, and the image optimization process is carried out only within the corresponding spatial guidance channel region;
Specifically, to avoid indistinct cut-mark texture and disordered cut-mark distribution, spatial guidance channels are added. The mask image is taken as the spatial guidance channel, which can be understood as the region of the mask image whose pixel value is 1; the feature maps of each convolution layer are vectorized and multiplied element-wise with the vectorized spatial guidance channel T_l^r, and the spatially guided feature map is defined as:

$$F_l^r(x)[:, i] = T_l^r \circ F^l(x)[:, i] \qquad (4)$$

where F^l(x)[:, i] is the i-th column vector of F^l(x), T_l^r denotes the r-th guidance channel at layer l, r ∈ R, and ∘ denotes element-wise multiplication; the spatially guided feature representation of layer l is denoted F_l^r(x).
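A minimal sketch of this operation, assuming the guidance channel is obtained by resizing the 0/1 mask image to each layer's spatial resolution (nearest-neighbour resizing is an assumption):

```python
import torch
import torch.nn.functional as nnf

def guided_feature_matrix(feat, mask):
    """Multiply a layer's feature maps element-wise by a guidance channel.
    feat: (1, N_l, H_l, W_l) feature maps; mask: (H, W) 0/1 mask tensor.
    Returns the spatially guided feature matrix F_l^r(x) of shape (N_l, M_l)."""
    n_l, h_l, w_l = feat.shape[1], feat.shape[2], feat.shape[3]
    guide = nnf.interpolate(mask[None, None].float(), size=(h_l, w_l), mode="nearest")
    guided = feat * guide                       # broadcast over the N_l channels
    return guided.squeeze(0).reshape(n_l, -1)   # N_l x M_l
```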
Step 4: step 1, after a content image is obtained and a content representation of the image is generated through style conversion network calculation, performing corresponding element multiplication operation on a feature image in the content representation and a space guiding channel corresponding to the generated space guiding channel in step 3 to obtain a space guiding content representation, and defining content loss by using Euclidean distance;
after the feature representations of the woodcut style image and of the generated image have been computed by the style conversion network in Step 2, each feature map is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided feature representation; the correlations between the spatially guided feature maps are computed with the Gram matrix to obtain the spatially guided Gram matrix, which serves as the spatially guided style representation, and the style loss is defined with the Euclidean distance;
Specifically, the spatially guided content representation is obtained by the calculation of formula (4); F_l^r(x) and F_l^r(x_c) denote the spatially guided content representations of the initialization image and of the content image at layer l of the network, and the content loss function is defined as:

$$L_c = \frac{1}{2} \sum_{r \in R} \sum_{i,j} \left( F_l^r(x)_{ij} - F_l^r(x_c)_{ij} \right)^2 \qquad (5)$$
Specifically, the spatially guided feature representation is obtained by the calculation of formula (4); the Gram matrix is then used to compute the correlations between the spatially guided feature maps as the style representation of the spatial guidance channel region, and the spatially guided Gram matrix is defined as:

$$G_l^r(x) = F_l^r(x)\, F_l^r(x)^{\mathsf T} \qquad (6)$$

The generated-image initialization image x and the woodcut style image x_s are defined; G_l^r(x) and G_l^r(x_s) are the spatially guided style representations of the generated image and of the woodcut artistic style image at layer l of the network. The difference between the generated image and the woodcut artistic style image is defined with the mean square error, and the style loss function of layer l is defined as:

$$E_l = \sum_{r \in R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x)_{ij} - G_l^r(x_s)_{ij} \right)^2 \qquad (7)$$
The style loss function over all layers is:

$$L_s = \sum_l w_l E_l \qquad (8)$$

where w_l denotes the weight factor of the feature representation of each layer in VGG-19.
Step 5: and (3) weighting and combining the content loss and the style loss to obtain a total loss function, optimizing the generated image initialization image by using gradient descent, setting iteration times, stopping after the iteration times are reached, and finally obtaining a conversion result with woodcut style.
The content loss function L c And style loss function L s Weighted simultaneous, defining a total loss function:
L total =αL c +βL s (9)
wherein ,Lc Representing a loss function between a content graph and a generated graph, L s And (3) representing a loss function between the woodcut style image and the generated image, wherein alpha and beta respectively represent weights of the content loss function and the woodcut style loss function, different alpha/beta values are selected to control the stylization degree of the woodcut, and the generated image is obtained through gradient descent. The space guiding channel ensures that patterns are transferred between similar semantic areas in content and style images, so that the condition that the distribution of texture features of the whole image is disordered is avoided, and fig. 4 is a color-set woodcut zoned style conversion flow chart.
In the invention, conv4_2 of the VGG-19 network is selected as the content feature extraction layer, the five layers conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1 are selected as the style feature extraction layers, and the original content image is chosen as the initialization image of the generated image, which preserves the semantic structure of the image well, strengthens the cut-mark texture effect and reduces the number of iterations. Regarding the weights of the content and style loss functions, the larger α/β is, the lower the degree of woodcut stylization of the generated image, and conversely the higher it is. FIG. 5 shows results generated under different weights.
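To tie the pieces together, the following sketch outlines the region-wise optimization loop under these layer choices. It assumes that the helpers from the earlier sketches (vgg, LAYERS, guided_feature_matrix) are in scope, that the preprocessed tensors x_c (content) and x_s (woodcut style) have the same resolution, and that content_mask_1/2 and style_mask_1/2 are the complementary mask tensors from Part two; the optimizer, learning rate, iteration count and the α and β values are assumptions, not values prescribed by the invention.

```python
import torch

CONTENT_LAYER = "conv4_2"
STYLE_LAYERS = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
alpha, beta, w_l = 1.0, 1e4, 0.2          # assumed loss weights

def raw_feature_maps(img):
    """(1, N_l, H_l, W_l) feature maps of the layers of interest,
    using the vgg / LAYERS objects from the extraction sketch above."""
    feats, h = {}, img
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in LAYERS:
            feats[LAYERS[i]] = h
    return feats

def region_loss(x, f_c, f_s, mask_c, mask_s):
    """alpha * L_c + beta * L_s restricted to one pair of guidance channels."""
    feats = raw_feature_maps(x)
    Fg = guided_feature_matrix(feats[CONTENT_LAYER], mask_c)
    Fc = guided_feature_matrix(f_c[CONTENT_LAYER], mask_c)
    L_c = 0.5 * torch.sum((Fg - Fc) ** 2)                              # formula (5)
    L_s = 0.0
    for name in STYLE_LAYERS:
        A = guided_feature_matrix(feats[name], mask_c)
        B = guided_feature_matrix(f_s[name], mask_s)
        n_l, m_l = A.shape
        diff = A @ A.t() - B @ B.t()                                   # guided Gram difference, formula (6)
        L_s = L_s + w_l * torch.sum(diff ** 2) / (4 * n_l**2 * m_l**2)  # formulas (7)-(8)
    return alpha * L_c + beta * L_s                                    # formula (9)

f_c = raw_feature_maps(x_c)               # content image features (held fixed)
f_s = raw_feature_maps(x_s)               # woodcut style image features (held fixed)
x = x_c.clone().requires_grad_(True)      # initialize from the content image, not white noise
optimizer = torch.optim.Adam([x], lr=0.01)

# optimize the two complementary guidance-channel regions in turn
for mask_c, mask_s in ((content_mask_1, style_mask_1), (content_mask_2, style_mask_2)):
    for _ in range(300):                  # assumed iteration count per region
        optimizer.zero_grad()
        loss = region_loss(x, f_c, f_s, mask_c, mask_s)
        loss.backward()
        optimizer.step()
```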
Part four: comparison of style conversion results
The invention was applied to woodcut style conversion of different types of pictures, such as portraits and landscapes, black-and-white pictures and color pictures, and the results were compared with the style conversion results of Gatys [1], Johnson [6] and Li [8]; the experimental results are shown in FIGS. 6-8.
In the style conversion results of the black-and-white woodcut in row 1 of FIG. 6, the stylized results of Gatys [1], Johnson [6] and Li [8] show a disordered distribution of the cut-mark texture; in the color portrait style conversion results in row 2, the cut-mark texture of Gatys [1] and Johnson [6] is indistinct and its distribution disordered, and in the result of Li [8] the color distribution of the face region is uneven compared with the original woodcut style image. In the results of the invention for both the black-and-white and the color portrait style conversion, the cut-mark texture features are more distinct and the cut-mark texture and color distribution are reasonable.
As can be seen from the landscape image style conversion results in FIG. 7, the conversion result of Gatys [1] shows distortion and destroys semantic information to a certain extent; the result of Johnson [6] shows migration failure in relatively smooth regions, e.g. in the stylized result of FIG. 7 the sky region does not carry the style features of the same semantics in the woodcut style image, and the cut-mark texture features of the other semantic regions do not perform well either. The conversion result of Li [8] preserves the semantic structure information well, but its cut-mark texture features are not prominent. The stylized result of the invention maintains the semantic structure while displaying the cut-mark texture features, and is superior to the other methods.
The same region (the white-box region in FIG. 7) was selected from the conversion results of FIG. 7 to compare the local texture details, see FIG. 8. Compared with the other methods, the conversion result of the invention shows more prominent cut-mark texture features, is realistic and natural, and is close to the carving effect of real color-register woodcuts.
In addition to the comparison of the experimental results, a user evaluation of the visual quality of the woodcut style conversion results was carried out. The participants first viewed the content images and the woodcut artistic style images, then viewed the stylized images generated by the four methods in random order. Taking the original real woodcut artistic style images as the reference, they scored each stylized image on three aspects: overall visual quality, cut-mark texture quality, and rationality of the cut-mark texture distribution. Each aspect was rated on 5 levels (very poor, poor, fair, good, very good), represented by scores of 1 to 5, and the average score of each method was then computed from the scores given by the participants. The evaluation experiment invited 20 people working in the image processing field and 20 non-professionals to participate and score. FIG. 9 shows the average score statistics of the scores given by the experiment participants.
From the experimental scoring results in FIG. 9, the average scores of the invention are higher than those of the other three methods on all three aspects, indicating that the woodcut style conversion results of the invention are superior to the other methods in overall visual quality, cut-mark texture quality and rationality of texture distribution.
The color-register woodcut style conversion method provided by the invention, on the one hand, maintains the semantic structure of the content map well and, on the other hand, simulates well the cut-mark texture features of color-register woodcuts; the cut-mark distribution is uniform and reasonable, and the conversion result is realistic and natural, closer to a real woodcut.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the invention; although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (1)

1. A semantic-segmentation-based style conversion method for color-register woodcuts, characterized by comprising the following steps:
Step one: perform semantic segmentation on the content image and on the woodcut artistic style image respectively to obtain semantic segmentation result maps;
specifically, the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map, and the woodcut artistic style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
Step two: binarize the two semantic segmentation result maps to obtain two complementary content-image masks of the content image and two complementary style-image masks of the woodcut artistic style image;
Step three: using the content-image masks and the style-image masks as guidance, perform region-wise style conversion on the content image and the woodcut artistic style image by adding spatial guidance channels, finally obtaining an artistic style conversion result with the woodcut style;
wherein, in step one, performing semantic segmentation on the content image with a CRF-RNN network to obtain the semantic segmentation result map comprises:
Step 1: take the label X_i of each image pixel as a random variable and the relationships between pixels as edges, forming a conditional random field, and let X be the vector formed by the random variables X_1, X_2, ..., X_N, where N is the number of pixels in the image; when a global observation I is obtained, (I, X) can be modeled as a CRF characterized by a Gibbs distribution of the form

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where E(x) is the energy of the labeling x and Z(I) is the partition function;
in the CRF model, the energy of assigning a particular labeling x is computed from the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where ψ_u(x_i) is the unary energy component, which measures the probability of assigning the label x_i to pixel i, and ψ_p(x_i, x_j) is the binary energy component, which describes the association between two adjacent pixels i and j;
the unary energy component is computed by a CNN and only roughly predicts the pixel labels; the binary energy component provides an image-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where μ(x_i, x_j) is a label compatibility function used to capture the compatibility between different pairs of labels; for m = 1, 2, ..., M, k_G^{(m)} is a Gaussian kernel applied to the feature vectors, w^{(m)} is its weight, and f_i, f_j are the feature vectors of pixels i and j;
Step 2: the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference; it approximates the CRF distribution P(X) with a simpler distribution Q(X) that can be written as the product of independent marginal distributions, namely:

$$Q(x) = \prod_i Q_i(x_i)$$

where Q(X) denotes the mean-field approximation of the CRF and X_i denotes a pixel in the image;
Step 3: model the single mean-field iteration obtained in Step 2 as the forward pass of a stack of CNN layers and repeat it until the set number of iterations is completed; this is equivalent to treating mean-field CRF inference as an RNN, and the resulting model is called CRF-RNN;
Step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;
Step 5: train the network with the PASCAL Context semantic segmentation dataset; after training, input the content image into the network to finally obtain the semantic segmentation result map of the content image;
wherein step three comprises the following steps:
Step 1: use the content image as the initialization image of the generated map; the content image and the generated map each yield corresponding feature maps in every convolution layer of the network, the feature maps of each layer are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted by the higher layers of the VGG19 network are used as the content representation;
Step 2: the woodcut style image and the generated image likewise yield corresponding feature representations in every convolution layer of the network; the correlations between the feature maps of the channels of each layer are computed with a Gram matrix and used as the style representation;
Step 3: the mask images are input to the style conversion network, which, after re-encoding, generates a guidance channel T_l^r at each layer l from the mask images; this is equivalent to adding weight information to the feature maps: under the effect of these weights the activation values of the region of the feature map corresponding to the guidance channel are increased, and the image optimization process is carried out only within the corresponding spatial guidance channel region;
Step 4: after the content representations of the content image and of the generated map have been computed by the style conversion network in Step 1, each feature map in the content representation is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided content representation, and the content loss is defined with the Euclidean distance;
after the feature representations of the woodcut style image and of the generated image have been computed by the style conversion network in Step 2, each feature map is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided feature representation; the correlations between the spatially guided feature maps are computed with the Gram matrix to obtain the spatially guided Gram matrix, which serves as the spatially guided style representation, and the style loss is defined with the Euclidean distance;
Step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the generated-image initialization image by gradient descent for a set number of iterations, and, when that number is reached, finally obtain the conversion result with the woodcut style;
wherein the content loss function in Step 4 is:

$$L_c = \frac{1}{2} \sum_{r \in R} \sum_{i,j} \left( F_l^r(x)_{ij} - F_l^r(x_c)_{ij} \right)^2$$

where x denotes the generated-map initialization image and x_c the content image; F_l^r(x) and F_l^r(x_c) denote the spatially guided content representations of the initialization image and of the content image at layer l of the network, respectively; and r ∈ R indexes the guidance channels;
the style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r \in R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x)_{ij} - G_l^r(x_s)_{ij} \right)^2$$

where w_l is the weight factor of the feature representation of each layer in VGG-19, and G_l^r(x) and G_l^r(x_s) are the spatially guided style representations of the generated image and of the woodcut artistic style image at layer l of the network, respectively;
the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where L_c denotes the loss function between the content map and the generated map, L_s denotes the loss function between the woodcut style image and the generated map, and α and β denote the weights of the content loss function and of the woodcut style loss function, respectively.
CN202010091956.2A 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation Active CN111340720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B (en) 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B (en) 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN111340720A CN111340720A (en) 2020-06-26
CN111340720B true CN111340720B (en) 2023-05-19

Family

ID=71186865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010091956.2A Active CN111340720B (en) 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN111340720B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI762971B (en) * 2020-07-15 2022-05-01 宏碁股份有限公司 Method and computer program product for image style transfer
CN112288622B (en) * 2020-10-29 2022-11-08 中山大学 Multi-scale generation countermeasure network-based camouflaged image generation method
CN112967180B (en) * 2021-03-17 2023-12-22 福建库克智能科技有限公司 Training method for generating countermeasure network, image style conversion method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A (en) * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
CN110503716A (en) * 2019-08-12 2019-11-26 中国科学技术大学 A kind of automobile license plate generated data generation method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3072037B1 (en) * 2013-11-19 2019-08-14 Wacom Co., Ltd. Method and system for ink data generation, ink data rendering, ink data manipulation and ink data communication
CN108470320B (en) * 2018-02-24 2022-05-20 中山大学 Image stylization method and system based on CNN
CN108805803B (en) * 2018-06-13 2020-03-13 衡阳师范学院 Portrait style migration method based on semantic segmentation and deep convolution neural network
CN108898082B (en) * 2018-06-19 2020-07-03 Oppo广东移动通信有限公司 Picture processing method, picture processing device and terminal equipment
CN109697690A (en) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image Style Transfer method and system
CN109712068A (en) * 2018-12-21 2019-05-03 云南大学 Image Style Transfer and analogy method for cucurbit pyrography

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A (en) * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
CN110503716A (en) * 2019-08-12 2019-11-26 中国科学技术大学 A kind of automobile license plate generated data generation method

Also Published As

Publication number Publication date
CN111340720A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN111340720B (en) Color matching woodcut style conversion algorithm based on semantic segmentation
Jiang et al. Scfont: Structure-guided chinese font generation via deep stacked networks
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
CN106548208B (en) A kind of quick, intelligent stylizing method of photograph image
CN111310760B (en) Method for detecting alpha bone inscription characters by combining local priori features and depth convolution features
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN107679491A (en) A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
Liu et al. Structure-guided arbitrary style transfer for artistic image and video
CN113705579B (en) Automatic image labeling method driven by visual saliency
Wang et al. Evaluate and improve the quality of neural style transfer
Zhu et al. Learning dual transformation networks for image contrast enhancement
Fang et al. Stylized-colorization for line arts
CN112329803B (en) Natural scene character recognition method based on standard font generation
CN113901916A (en) Visual optical flow feature-based facial fraud action identification method
Subramanian et al. Strive: Scene text replacement in videos
CN113112397A (en) Image style migration method based on style and content decoupling
Tomar et al. An Effective Cartoonifying of an Image using Machine Learning
Bagwari et al. An edge filter based approach of neural style transfer to the image stylization
Li et al. Scribble-to-Painting Transformation with Multi-Task Generative Adversarial Networks.
Oh et al. A unified model for semi-supervised and interactive video object segmentation using space-time memory networks
Ashwini et al. Enhancing the Resolution of Ancient Artworks using Generative Adversarial Networks
CN110163927B (en) Single image re-coloring method based on neural network
Fan et al. SemiRefiner: Learning to Refine Semi-realistic Paintings
Manushree et al. XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant