CN111340720A - Color register woodcut style conversion algorithm based on semantic segmentation


Info

Publication number
CN111340720A
CN111340720A (application CN202010091956.2A)
Authority
CN
China
Prior art keywords
image
style
content
woodcut
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010091956.2A
Other languages
Chinese (zh)
Other versions
CN111340720B (en)
Inventor
徐丹 (Xu Dan)
李应涛 (Li Yingtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University (YNU)
Priority to CN202010091956.2A
Publication of CN111340720A
Application granted
Publication of CN111340720B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing


Abstract

The invention provides a method for color register woodcut style conversion based on semantic segmentation, comprising the following steps. Step one: perform semantic segmentation on the content image and the print art style image respectively to obtain semantic segmentation result maps. Step two: binarize the semantic segmentation result maps to obtain image masks. Step three: using the semantic segmentation masks of the content image and the print art style image as guidance, add spatial guidance channels to perform region-wise style conversion on the content image and the print art style image, finally obtaining an artistic style conversion result with the woodcut print style. The method effectively avoids the problems to which woodcut style conversion results are prone, such as indistinct carving-mark textures and disordered texture distribution. The conversion results of the method show distinct, reasonably distributed woodcut carving-mark textures, and are realistic, natural, and closer to a real woodcut print.

Description

Color register woodcut style conversion algorithm based on semantic segmentation
Technical Field
The invention relates to the technical field of image processing, in particular to a color register woodcut style conversion algorithm based on semantic segmentation.
Background
Neural network image style conversion is a technique for rendering the style of an artistic style image onto a content image using a neural network. The pioneering work of Gatys et al. [1] demonstrated the ability of convolutional neural networks (CNNs) to create artistic images; since then, neural network image style conversion has received increasing attention, and many methods have been proposed to improve or extend the original algorithm. Li et al. [2] enhanced the details and edge contours of conversion results by adding a Laplacian loss; Risser et al. [3] proposed a method that improves style conversion stability by adding a histogram loss; Johnson et al. [4], by training a model, achieved fast style conversion of images in a feed-forward pass; Chen et al. [5] achieved fast conversion of arbitrary styles based on a local patch matching method; Li et al. [6] proposed, in a data-driven manner, a style conversion algorithm that learns a linear transformation matrix and can perform style conversion on arbitrary images and videos.
At present, neural network image style conversion algorithms can handle images of arbitrary styles. Woodcut prints, however, differ from paintings on paper: they are obtained by rubbing from a carved medium, exhibit distinct woodcut carving-mark textures, and within a given local region the mark types are largely consistent and the texture distribution roughly uniform. For such artworks, the results of existing neural network image style conversion algorithms readily show indistinct carving marks, disordered carving-mark texture distribution, and damaged semantic information of the content image. The reasons for these defects are as follows. Existing neural network style conversion methods fall into two broad categories: (1) online neural network methods based on image optimization; (2) offline neural network methods based on model optimization. Methods of type (1) generally initialize the generated image as random noise, use the VGG19 network as a feature extractor, take the feature representations extracted from higher layers of the VGG19 network as the content representation, and take the correlations between the feature representations extracted at each convolutional layer as the style representation. These correlations are computed with a Gram matrix; since the Gram matrix captures only the global average features of an image and imposes no constraint on spatial object information, woodcut-stylized images obtained by optimizing a white-noise-initialized image readily exhibit the above phenomena. Methods of type (2) usually obtain specific model or decoder parameters by training a model or decoder, after which arbitrary artistic style images can be stylized; however, such models or decoders contain no structure designed to highlight the woodcut carving-mark texture or to optimize its distribution for the characteristics of woodcut prints, so their stylization results also readily exhibit the above phenomena.
[1] Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2414-2423.
[2] Li S, Xu X, Nie L, et al. Laplacian-steered neural style transfer[C]//Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017: 1716-1724.
[3] Risser E, Wilmot P, Barnes C. Stable and controllable neural texture synthesis and style transfer using histogram losses[OL]. arXiv preprint arXiv:1701.08893, 2017.
[4] Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision. Springer, Cham, 2016: 694-711.
[5] Chen T Q, Schmidt M. Fast patch-based style transfer of arbitrary style[OL]. arXiv preprint arXiv:1612.04337, 2016.
[6] Li X, Liu S, Kautz J, et al. Learning linear transformations for fast image and video style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3809-3817.
[7] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1529-1537.
[8] Gatys L A, Ecker A S, Bethge M, et al. Controlling perceptual factors in neural style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3985-3993.
[9] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[OL]. arXiv preprint arXiv:1409.1556, 2014.
Disclosure of Invention
The invention aims to solve the problems that, in woodcut print style conversion, the results readily show indistinct woodcut carving marks, disordered carving-mark texture distribution, and damaged semantic information of the generated image, and provides a color register woodcut print style conversion algorithm based on semantic segmentation, so that the carving-mark textures of the generated woodcut print are reasonably distributed and the conversion result is realistic and natural.
A method for color register woodcut style conversion based on semantic segmentation comprises the following steps:
step one: perform semantic segmentation on the content image and the print art style image respectively to obtain semantic segmentation result maps;
the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map; the print art style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
step two: binarize the two semantic segmentation result maps respectively, obtaining two complementary content image masks for the content image and two complementary style image masks for the print art style image;
step three: using the content image masks and the style image masks as guidance, add spatial guidance channels to perform region-wise style conversion on the content image and the print art style image, finally obtaining an artistic style conversion result with the woodcut print style.
Further, in the above method for color register woodcut style conversion based on semantic segmentation, performing semantic segmentation on the content image with a CRF-RNN network in step one to obtain a semantic segmentation result map comprises:

step 1: take the label of each content image pixel $X_i$ as a random variable and the relationships between pixels as edges, forming a conditional random field; let $X$ be the vector formed by the random variables $X_1, X_2, \ldots, X_N$, where $N$ is the number of pixels in the image. Given the global observation $I$ (the image), $(I, X)$ can be modeled as a CRF, characterized by a Gibbs distribution of the form:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where $E(x)$ is the energy when $x$ takes a certain value and $Z(I)$ is the partition function.

In the CRF model, the energy of assigning labels $x$ is computed by the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where $\psi_u(x_i)$ is the unary energy component, measuring the probability of assigning label $x_i$ to pixel $i$, and $\psi_p(x_i, x_j)$ is the pairwise energy component, describing the association between two adjacent pixels $i$ and $j$;

the unary energy component is computed by a CNN and gives only a rough prediction of the pixel label; the pairwise energy component provides an image-data-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where $\mu(x_i, x_j)$ is a label compatibility function used to capture the compatibility between different label pairs; for each $m = 1, 2, \ldots, M$, $k_G^{(m)}$ is a Gaussian kernel applied to feature vectors, $w^{(m)}$ is the weight of the $m$-th kernel, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$;
step 2: use the mean-field approximation of the CRF distribution for maximum a posteriori marginal inference; this approximates the CRF distribution $P(X)$ with a simpler distribution $Q(X)$ that can be written as the product of independent marginal distributions:

$$Q(X) = \prod_i Q_i(X_i)$$

where $Q(X)$ denotes the mean-field approximation of the CRF and $X_i$ denotes a pixel in the image.

step 3: model a single mean-field iteration from step 2 as one forward pass of a stack of CNN layers, and iterate the mean field until the set number of iterations is completed; this amounts to treating mean-field CRF inference as an RNN, and the resulting model is called CRF-RNN;

step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;

step 5: train the network on the PASCAL Context semantic segmentation dataset; after training, input the content image into the end-to-end FCN + CRF-RNN network to finally obtain the semantic segmentation result map of the content image (an illustrative sketch of this inference step follows).
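As an illustrative sketch of the inference stage of step 5: the snippet below obtains a per-pixel label map from a pretrained segmentation network. Since a pretrained FCN + CRF-RNN is not assumed to be available here, torchvision's FCN-ResNet50 stands in for it; the model choice, weights flag, and preprocessing are assumptions of this sketch, not part of the patent.

```python
# Sketch: per-pixel semantic labels from a pretrained segmentation network.
# torchvision's FCN-ResNet50 is a stand-in for the FCN + CRF-RNN network
# described above (a pretrained CRF-RNN is not assumed available here).
# Requires torchvision >= 0.13 for the `weights` argument.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()

img = Image.open("content.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)            # 1 x 3 x H x W
with torch.no_grad():
    logits = model(x)["out"]                # 1 x C x H x W class scores
label_map = logits.argmax(dim=1)[0]         # H x W per-pixel label ids
```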
Further, in the above method for color register woodcut style conversion based on semantic segmentation, step three comprises:

step 1: use the content image as the initialization image of the generated image; the content image and the generated image yield corresponding feature maps at each convolutional layer of the network, each layer's feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted at higher layers of the VGG19 network are used as the content representation;

step 2: the woodcut style image and the generated image likewise yield corresponding feature representations at each convolutional layer of the network; the correlations among the channel feature maps of each layer are computed with a Gram matrix and used as the style representation;

step 3: the mask images are re-encoded after being input to the style conversion network, and at each layer the network generates a guidance channel $T_l^r$ from the mask image; this amounts to adding weight information from the guidance channel $T_l^r$ to the feature maps, so that the activation values of the region covered by the guidance channel are increased under the action of the weights and the image optimization process proceeds only within the corresponding spatial guidance channel region;

step 4: after the content representations of the content image and the generated image are computed by the style conversion network in step 1, multiply the feature maps in the content representations element-wise with the corresponding spatial guidance channels generated in step 3 to obtain spatially guided content representations, and define the content loss with the Euclidean distance;

after the feature representations of the woodcut style image and the generated image are computed by the style conversion network in step 2, multiply each feature map element-wise with the corresponding spatial guidance channel generated in step 3 to obtain spatially guided feature representations; then compute the correlations between the spatially guided feature maps with a Gram matrix to obtain the spatially guided Gram matrix as the spatially guided style representation, and define the style loss with the Euclidean distance;

step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the initialized generated woodcut image by gradient descent, set the number of iterations, stop when it is reached, and finally obtain a conversion result with the woodcut style.
Further, in the above method for color register woodcut style conversion based on semantic segmentation, the content loss function in step 4 is:

$$L_c = \sum_{r=1}^{R} \sum_l \frac{1}{2} \sum_{i,j} \left( F_l^r(x) - F_l^r(x_c) \right)_{ij}^2$$

where $x$ denotes the generated-image initialization image and $x_c$ denotes the content image; $F_l^r(x)$ and $F_l^r(x_c)$ denote the spatially guided content representations of the initialization image and the content image at layer $l$ of the network, $r \in R$;

the style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r=1}^{R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x) - G_l^r(x_s) \right)_{ij}^2$$

where $w_l$ denotes the weighting factor of each layer's feature representation in VGG-19, and $G_l^r(x)$ and $G_l^r(x_s)$ denote the spatially guided style representations of the generated image and the woodcut art style image at layer $l$ of the network.
Further, in the above method for color register woodcut style conversion based on semantic segmentation, the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where $L_c$ denotes the loss function between the content image and the generated image, $L_s$ denotes the loss function between the print style image and the generated image, and $\alpha$ and $\beta$ denote the weights of the content loss function and the style loss function, respectively.
Advantageous effects:
the method for converting the style of the chromatic woodcut engraving effectively avoids the problems that woodcut nick textures are not obvious, nick textures are distributed in a disordered mode and the like easily caused by the style conversion result of the woodcut engraving. The wood engraving style conversion result of the method has obvious wood engraving nick texture, the nick texture is distributed reasonably, and the conversion result is real and natural and is closer to a real wood engraving.
The conversion method provided by the invention takes the content image as the generated image initialization image, uses the image mask as the guide, and performs style conversion of the chromatic woodcut engraving by adding the space guide channel based on the neural network segmentation algorithm and the CNN image style conversion method, thereby avoiding the problems of unobvious woodcut indentation texture, disordered indentation texture distribution and the like in the formatting of woodcut engraving. The principle is as follows:
the invention provides a woodcut style conversion method, which belongs to an online neural network method based on image optimization.A primary input image is provided with two complementary mask images, the pixel value of the image mask is 0 or 1 (the pixel value of a black area in the mask image is 0, and the pixel value of a white area in the mask image is 1), the image mask is used as guidance, a space guidance channel is added for carrying out woodcut regional style conversion, and the space guidance channel can be understood as an area of which the pixel value is 1 in the mask image;
the content map, the style map and the corresponding mask image are used as input of a style conversion network, and the network generates a guide channel T at each layer according to the mask imagel rTo guide a passage Tl rCarrying out corresponding element multiplication operation with the feature diagram extracted by the network to obtain a space guide feature representation, which is equivalent to that according to the guide channel Tl rAdding weight information to the feature map, wherein the activation value of a region corresponding to the guide channel in the feature map is increased under the action of the weight, the Gram matrix only calculates the feature correlation in the guide channel region, and the network only optimizes the guide channel region when the optimization style is lost, so that the influence of style features of a non-guide channel region is eliminated, and the phenomenon of disordered wood carving texture distribution is avoided; and optimizing the region of the guide channel by using the first mask image, optimizing the corresponding region of the guide channel by using the second mask image after the set iteration number is reached, and stopping until the set iteration number is reached to obtain a stylized image.
Compared with the method using white noise image initialization, the method using the content image to replace the white noise image as the initialization image for generating the image can well keep the semantic structure information in the image and reduce the iteration times, and the VGG19 network can more easily extract the semantic features of the generated image; the correlation between the characteristics of the high-level semantic information of the image and the woodcut style characteristics of the image can be easily obtained on the characteristic representation of the image with the semantic information by the Gram matrix, the initialized generated image with the semantic information is optimized, the noise interference is reduced, the migration of the woodcut style characteristics is enhanced in the image optimization process by combining the space guide channel, and the woodcut nicking characteristics of the conversion result are more obvious;
the space guide channel and the content graph replace white noise to be used as an initialization image for generating the graph, and the combination of the space guide channel and the content graph avoids the problems that the texture characteristics of the woodcut nick are not obvious, the nick texture distribution is disordered, the image semantic information is damaged and the like in the formatting of the woodcut picture.
Drawings
FIG. 1 is an overall flow chart of color register woodcut style conversion;
FIG. 2 is a schematic diagram of a CRF-RNN algorithm for content image segmentation;
FIG. 3 is an original image, image semantic segmentation result and its mask image;
FIG. 4 is a flowchart of color register woodcut region-wise style conversion;
FIG. 5 is a graph of stylized results for different weights;
FIG. 6 is a comparison of the conversion results of portrait image woodcut style;
FIG. 7 is a comparison of scene image woodcut style conversion results;
FIG. 8 is a comparison of local texture details;
FIG. 9 is a woodcut style conversion result visual evaluation average score statistic.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described clearly and completely below, and it is obvious that the described embodiments are some, not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Part I: Image semantic segmentation
content image semantic segmentation
For semantic segmentation of content images, the CRF-RNN algorithm [7] is used to obtain pixel-level semantic segmentation; it is an end-to-end neural network algorithm for image semantic segmentation.

In pixel-wise image labeling tasks, a CRF (conditional random field) is generally used to predict label categories: the labels of image pixels are taken as random variables and the relationships between pixels as edges, forming a conditional random field, so that once a global observation is obtained, the CRF can model the labels. Let image $I$ have $N$ pixels, each assigned a label from a preset label set $L$; the label assigned to pixel $i$ is a random variable $X_i \in L$, and $X$ is the random vector formed from $X_1, X_2, \ldots, X_N$. Let the graph $G = (V, E)$, where $V = \{X_1, X_2, \ldots, X_N\}$, so that $V$ is equivalent to $X$ (a graph is conventionally written $G(V, E)$, with $V$ here the set of pixel variables and $E$ the relationships between pixels). Given the global observation $I$ (the image), $(I, X)$ can be modeled as a CRF, characterized by a Gibbs distribution of the form:
$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr) \tag{1}$$

where $E(x)$ is the energy when $x$ takes a certain value and $Z(I)$ is the partition function. In a fully connected CRF model, the label energy is defined as:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j) \tag{2}$$
where $\psi_u(x_i)$ is the unary energy component, measuring the probability of assigning label $x_i$ to pixel $i$, and $\psi_p(x_i, x_j)$ is the pairwise energy component describing the association between two adjacent pixels $i$ and $j$; this term lets neighboring pixels with similar color values be assigned the same category label with higher probability. The unary energy component is computed by a CNN and gives only a rough prediction of the pixel label; the pairwise energy component provides an image-data-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j) \tag{3}$$

where $\mu(x_i, x_j)$ is a label compatibility function used to capture the compatibility between different label pairs; for each $m = 1, 2, \ldots, M$, $k_G^{(m)}$ is a Gaussian kernel applied to feature vectors, $w^{(m)}$ is the weight of the $m$-th kernel, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$.
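For concreteness, the following is a minimal sketch of the weighted Gaussian kernels of formula (3) for one pixel pair, using the common dense-CRF choice of an appearance (bilateral) kernel plus a smoothness kernel; the bandwidths and weights are illustrative assumptions, not values given in the patent.

```python
# Sketch: weighted Gaussian kernels of the pairwise term (formula (3))
# for a pixel pair (i, j), assuming the usual dense-CRF combination of an
# appearance (position + color) kernel and a smoothness (position) kernel.
# Bandwidths theta_* and weights w1, w2 are illustrative assumptions.
import numpy as np

def pairwise_kernel(p_i, p_j, c_i, c_j,
                    theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0,
                    w1=5.0, w2=3.0):
    """p_*: pixel positions, shape (2,); c_*: RGB colors, shape (3,)."""
    dp2 = np.sum((p_i - p_j) ** 2)          # squared position distance
    dc2 = np.sum((c_i - c_j) ** 2)          # squared color distance
    # Appearance kernel: nearby pixels with similar color interact strongly,
    # which pushes similar neighboring pixels toward the same label.
    k_app = np.exp(-dp2 / (2 * theta_alpha**2) - dc2 / (2 * theta_beta**2))
    # Smoothness kernel: depends on position only.
    k_smooth = np.exp(-dp2 / (2 * theta_gamma**2))
    return w1 * k_app + w2 * k_smooth       # sum_m w^(m) k_G^(m)(f_i, f_j)

k = pairwise_kernel(np.array([10.0, 10.0]), np.array([11.0, 10.0]),
                    np.array([120.0, 80.0, 40.0]), np.array([118.0, 82.0, 39.0]))
```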
Minimizing the CRF energy $E(x)$ yields the most probable labeling of a given image. Since this exact minimization is intractable, the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference: it approximates the CRF distribution $P(X)$ with a simpler distribution $Q(X)$ that can be written as the product of independent marginals, $Q(X) = \prod_i Q_i(X_i)$, where $Q(X)$ denotes the mean-field approximation of the CRF. A single mean-field iteration is modeled as one forward pass of a stack of CNN layers, and the mean field is iterated until the set number of iterations (generally 10) is completed; this amounts to treating mean-field inference as an RNN, so the whole algorithm can be represented as an RNN process (see the sketch below).

This RNN structure is defined as CRF-RNN: mean-field CRF inference is regarded as an RNN computation, and the model is combined with an FCN (fully convolutional network) to form an end-to-end network. The network is trained on the PASCAL Context semantic segmentation dataset; after training, the content image is input into the end-to-end FCN + CRF-RNN network, finally yielding the semantic segmentation result map of the content image. The combined CRF-RNN + FCN structure is shown in FIG. 2.
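The following is a small sketch of the unrolled mean-field loop that CRF-RNN treats as an RNN. It uses an explicit N x N kernel matrix, which is feasible only for tiny N and stands in for the fast high-dimensional filtering of the real algorithm; the toy sizes and Potts compatibility are assumptions of the sketch.

```python
# Sketch: dense-CRF mean-field inference unrolled for a fixed number of
# iterations, the loop that CRF-RNN treats as an RNN. K is an explicit
# N x N pairwise kernel matrix (tiny N only; real CRF-RNN uses efficient
# high-dimensional filtering). unary_logits holds -psi_u per pixel/label.
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary_logits, K, mu, n_iters=10):
    """unary_logits: N x L; K: N x N kernel matrix; mu: L x L compatibility."""
    Q = softmax(unary_logits, axis=1)            # initialize from unary term
    for _ in range(n_iters):                     # each pass = one "RNN" step
        msg = K @ Q                              # message passing / filtering
        pairwise = msg @ mu.T                    # compatibility transform
        Q = softmax(unary_logits - pairwise, axis=1)  # local update, normalize
    return Q                                     # approximate marginals Q_i

# Toy example: 4 pixels, 2 labels, Potts-model compatibility.
rng = np.random.default_rng(0)
unary = rng.normal(size=(4, 2))
K = np.exp(-rng.random((4, 4)))
np.fill_diagonal(K, 0.0)                         # no message to self
mu = 1.0 - np.eye(2)                             # Potts: penalize disagreement
Q = mean_field(unary, K, mu)
```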
The segmentation of the print art style image is explained below:

accurate segmentation with the CRF-RNN semantic segmentation network presupposes training on a large annotated image dataset; existing woodcut datasets are small, and after training it is difficult to obtain segmentation results that satisfy the conditions for woodcut art style conversion. Therefore the Labelme image annotation tool is used to semantically segment the print art style image.
Part II: Semantic segmentation result binarization
Semantic style conversion of the color register woodcut uses the masks of the content image and the woodcut art style image as guidance. The segmentation results of the content image and the woodcut art style image are obtained with the CRF-RNN image semantic segmentation algorithm and with Labelme, respectively; the segmentation results are then binarized to obtain the mask images of the content image and of the style image, each original image having two complementary mask images (a sketch of this binarization follows). The original images, their semantic segmentation results, and the corresponding mask images are shown in FIG. 3.
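A minimal sketch of this binarization step, assuming the segmentation result is stored as an integer label map; which label counts as foreground (here 15, the VOC 'person' id) is an illustrative assumption.

```python
# Sketch: turning a semantic segmentation result map into two complementary
# binary masks with pixel values 0/1, as described in this part.
# The foreground label id is an illustrative assumption.
import numpy as np
from PIL import Image

def complementary_masks(label_map, fg_label):
    """label_map: H x W integer labels -> (fg_mask, bg_mask), each in {0, 1}."""
    fg = (label_map == fg_label).astype(np.uint8)
    bg = 1 - fg                                  # complement: fg + bg == 1
    return fg, bg

seg = np.array(Image.open("content_seg.png"))    # segmentation result map
fg_mask, bg_mask = complementary_masks(seg, fg_label=15)   # e.g. VOC 'person'
Image.fromarray(fg_mask * 255).save("content_mask_fg.png")
Image.fromarray(bg_mask * 255).save("content_mask_bg.png")
```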
Part III: Color register woodcut style conversion
Region-wise style conversion of the color register woodcut is mainly based on the CNN image style conversion method [1] and the image stylization method with spatial guidance channels [8]. With the semantic segmentation mask images as guidance, region-wise style conversion is performed on the content image and the print art style image within the spatial guidance channel regions. A pre-trained VGG-19 convolutional neural network model [9] is used as the feature extractor; the feature representations extracted at higher layers of the convolutional neural network are used as the content representation, and the correlations among the per-channel feature representations of each convolutional layer are used as the style representation. That is: the VGG19 network can extract high-level semantic information from images; after an image is input, the network re-encodes it, extracting a corresponding feature map at each convolutional layer, and the feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation. The conversion method specifically comprises the following steps:
Step 1: use the content image as the initialization image of the generated image; the content image and the generated image yield corresponding feature maps at each convolutional layer of the network, each layer's feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted at higher layers of the VGG19 network are used as the content representation;

specifically, define the generated-image initialization image $x$ and the content image $x_c$ (the generated image is the image being optimized: with the content image and the style image as references, a third image is optimized to obtain a conversion result map that carries the semantic information of the content image and the style characteristics of the style image). The initialization image and the content image are re-encoded at each layer of the VGG-19 network; layer $l$ has $N_l$ convolution kernels and feature maps of size $M_l$, where $M_l$ is the product of the feature map width and height at layer $l$, so the feature maps output by each layer can be stored as a matrix

$$F_l \in \mathbb{R}^{N_l \times M_l}$$

and $F_l(x)$ and $F_l(x_c)$ denote the feature representations of the initialization image and the content image at layer $l$ of the network, taken as the content representations (see the extraction sketch below).
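A sketch of this re-encoding with torchvision's pretrained VGG-19, storing each selected layer's feature maps as an N_l x M_l matrix F_l; the index-to-name mapping follows torchvision's vgg19.features ordering, and the input is assumed to be a normalized 1 x 3 x H x W tensor.

```python
# Sketch: re-encode an image with pretrained VGG-19 and store each chosen
# layer's feature maps as an N_l x M_l matrix F_l (channels x width*height).
# Indices map torchvision's vgg19.features modules to conv-layer names.
import torch
from torchvision import models

VGG_LAYERS = {0: "conv1_1", 5: "conv2_1", 10: "conv3_1",
              19: "conv4_1", 21: "conv4_2", 28: "conv5_1"}

vgg = models.vgg19(weights="DEFAULT").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)                 # the extractor itself is frozen

def feature_matrices(img):
    """img: normalized 1 x 3 x H x W tensor -> {layer name: N_l x M_l matrix}."""
    feats, x = {}, img
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in VGG_LAYERS:
            n_l = x.shape[1]                # number of channels N_l
            feats[VGG_LAYERS[idx]] = x.reshape(n_l, -1)   # M_l = H_l * W_l
    return feats
```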
Step 2: the woodcut style image and the generated image can obtain corresponding characteristic representation at each convolution layer of the network; calculating the correlation among all channel characteristic graphs of each layer by using a Gram matrix to serve as style representation;
and step 3: the mask image is re-encoded after being input into the style conversion network, and the network generates a guide channel T in each layer according to the mask imagel rThis corresponds to the guidance of the channel Tl rAdding weight information to the feature map, wherein the activation value of a region corresponding to the guide channel in the feature map is increased under the action of the weight, and the image optimization process is only carried out in the corresponding spatial guide channel region;
in particular, in order to avoid the problems of unobvious texture and disordered distribution of the wood carving scores, the method is realized by adding a space guide conduction channel. Taking the mask image as a spatial guide channel, the spatial guide channel can be understood as a region with a pixel value of 1 in the mask image, and each convolution layer is vectorized into a feature map and a vectorized spatial guide channel Tl rPerforming corresponding element multiplication operation, and defining a space guide characteristic diagram as follows:
Figure BDA0002383976040000112
wherein ,
Figure BDA0002383976040000113
is composed of
Figure BDA0002383976040000114
Of the ith column vector, Tl rRepresenting the R-th guide channel on the l-th layer, R ∈ R, the spatial guide characteristic of the l-th layer is characterized as Fl r(x);
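A sketch of formula (4): the binary mask is resized to the layer's spatial resolution to form T_l^r, vectorized, and broadcast over the channel rows of F_l, so every column i is scaled by the guidance value at spatial position i.

```python
# Sketch of formula (4): masking a layer's feature matrix with a spatial
# guidance channel. The mask is downsampled to the layer's resolution,
# vectorized to length M_l, and broadcast across the N_l channel rows,
# scaling each column by the guidance value of its spatial position.
import torch
import torch.nn.functional as F

def guided_matrix(feat_matrix, mask, layer_hw):
    """feat_matrix: N_l x M_l; mask: 1 x 1 x H x W in {0, 1};
    layer_hw: (H_l, W_l) with H_l * W_l == M_l."""
    t = F.interpolate(mask, size=layer_hw, mode="nearest")   # T_l^r
    return feat_matrix * t.reshape(1, -1)    # zero outside the guided region
```

The resulting matrix is zero outside the guidance region, so any loss computed from it sees only that region.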
Step 4: after the content representations of the content image and the generated image are computed by the style conversion network in Step 1, multiply the feature maps in the content representations element-wise with the corresponding spatial guidance channels generated in Step 3 to obtain spatially guided content representations, and define the content loss with the Euclidean distance;

after the feature representations of the style image and the generated image are computed by the style conversion network in Step 2, multiply each feature map element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain spatially guided feature representations; then compute the correlations between the spatially guided feature maps with a Gram matrix to obtain the spatially guided Gram matrix as the spatially guided style representation, and define the style loss with the Euclidean distance;
in particular, the spatial guidance content representation can be obtained by the formula (4) calculation,
Figure BDA0002383976040000121
and
Figure BDA0002383976040000122
respectively representing the initialization image and the content image on the I layer of the network to guide the content characterization, and defining a content loss function as follows:
Figure BDA0002383976040000123
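A minimal sketch of content loss (5), summing the squared differences of the spatially guided content representations over guidance channels r and content layers l.

```python
# Sketch of content loss (5): squared Euclidean distance between the
# spatially guided content representations of the generated image and the
# content image, summed over guidance channels r and content layers l.
import torch

def content_loss(guided_gen, guided_content):
    """Both arguments: dict r -> dict l -> (N_l x M_l guided feature matrix)."""
    loss = torch.zeros(())
    for r in guided_gen:
        for l in guided_gen[r]:
            diff = guided_gen[r][l] - guided_content[r][l]
            loss = loss + 0.5 * (diff ** 2).sum()
    return loss
```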
Specifically, the spatially guided feature representations are obtained by formula (4); the correlations between the spatially guided feature maps are then computed with a Gram matrix as the style representation of the spatial guidance channel region, the spatially guided Gram matrix being defined as:

$$G_l^r(x) = F_l^r(x)\, F_l^r(x)^{\mathsf{T}} \tag{6}$$

Define the generated-image initialization image $x$ and the woodcut stylized image $x_s$; $G_l^r(x)$ and $G_l^r(x_s)$ denote the spatially guided style representations of the generated image and the woodcut art style image at layer $l$ of the network. The difference between the generated image and the woodcut art style image is defined with the mean squared error, and the style loss function of layer $l$ is defined as:

$$E_l = \sum_{r=1}^{R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x) - G_l^r(x_s) \right)_{ij}^2 \tag{7}$$

The style loss function over all layers is then:

$$L_s = \sum_l w_l E_l \tag{8}$$

where $w_l$ denotes the weighting factor of each layer's feature representation in VGG-19.
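A sketch of formulas (6) to (8): the guided Gram matrix of each layer, the per-layer loss E_l with its 1/(4 N_l^2 M_l^2) normalization, and the w_l-weighted sum over layers.

```python
# Sketch of formulas (6)-(8): spatially guided Gram matrices and the
# layer-weighted style loss.
import torch

def gram(feat):                              # formula (6): G = F F^T
    return feat @ feat.t()

def style_loss(guided_gen, guided_style, layer_weights):
    """guided_*: dict r -> dict l -> (N_l x M_l guided feature matrix)."""
    loss = torch.zeros(())
    for r in guided_gen:
        for l in guided_gen[r]:
            f_g, f_s = guided_gen[r][l], guided_style[r][l]
            n_l, m_l = f_g.shape
            e_l = ((gram(f_g) - gram(f_s)) ** 2).sum() / (4 * n_l**2 * m_l**2)
            loss = loss + layer_weights[l] * e_l    # formula (8): sum_l w_l E_l
    return loss
```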
Step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the initialized generated woodcut image by gradient descent, set the number of iterations, stop when it is reached, and finally obtain a conversion result with the woodcut style.

The content loss function $L_c$ and the style loss function $L_s$ are weighted and combined to define the total loss function:

$$L_{total} = \alpha L_c + \beta L_s \tag{9}$$

where $L_c$ denotes the loss function between the content image and the generated image, $L_s$ denotes the loss function between the print style image and the generated image, and $\alpha$ and $\beta$ denote the weights of the content loss function and the style loss function, respectively. Different $\alpha/\beta$ ratios are selected to control the degree of woodcut stylization, and the generated image is obtained by gradient descent.
In this method, conv4_2 of the VGG-19 network is selected as the content feature extraction layer, and the five layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 are selected as the style feature extraction layers; the original content image is selected as the initialization image of the generated image, which maintains the semantic structure of the image well while enhancing the woodcut carving-mark texture effect and reducing the number of iterations. A condensed end-to-end sketch under stated assumptions follows.
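Tying the pieces together, the sketch below condenses the whole guided optimization into one loop. Assumptions not fixed by the patent: Adam as the concrete gradient descent method, uniform layer weights w_l, the learning rate, the iteration counts, and the alpha/beta setting; `vgg` and `VGG_LAYERS` come from the feature-extraction sketch above, and inputs are assumed to be VGG-normalized tensors.

```python
# Sketch: the complete mask-guided optimization loop. Assumptions: Adam as
# the gradient descent optimizer, uniform w_l, illustrative alpha/beta and
# iteration counts; `vgg` and VGG_LAYERS are from the sketch above, and
# content/style images are normalized 1x3xHxW, masks binary 1x1xHxW tensors.
import torch
import torch.nn.functional as F

CONTENT_LAYERS = ["conv4_2"]
STYLE_LAYERS = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]

def guided_feats(img, mask):
    """Run VGG-19, masking each chosen layer per formula (4)."""
    out, x = {}, img
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in VGG_LAYERS:
            t = F.interpolate(mask, size=x.shape[-2:], mode="nearest")
            out[VGG_LAYERS[idx]] = (x * t).reshape(x.shape[1], -1)
    return out

def gram(f):
    return f @ f.t()

def transfer(content, style, content_masks, style_masks,
             alpha=1.0, beta=1e4, iters_per_mask=300):
    gen = content.clone().requires_grad_(True)   # content image initializes x
    opt = torch.optim.Adam([gen], lr=0.02)
    w_l = 1.0 / len(STYLE_LAYERS)                # uniform layer weights
    for c_mask, s_mask in zip(content_masks, style_masks):  # region by region
        with torch.no_grad():
            f_c = guided_feats(content, c_mask)  # guided content targets
            f_s = guided_feats(style, s_mask)    # guided style targets
        for _ in range(iters_per_mask):
            opt.zero_grad()
            f_g = guided_feats(gen, c_mask)
            l_content = sum(0.5 * ((f_g[l] - f_c[l]) ** 2).sum()
                            for l in CONTENT_LAYERS)         # formula (5)
            l_style = torch.zeros(())
            for l in STYLE_LAYERS:                           # formulas (6)-(8)
                n_l, m_l = f_g[l].shape
                e_l = ((gram(f_g[l]) - gram(f_s[l])) ** 2).sum() \
                      / (4 * n_l ** 2 * m_l ** 2)
                l_style = l_style + w_l * e_l
            (alpha * l_content + beta * l_style).backward()  # formula (9)
            opt.step()
    return gen.detach()
```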
Part IV: Comparison of style conversion results
The invention is applied to woodcut style conversion on different types of images, namely portrait and scenery pictures, in both black-and-white and color, and the results are compared with the style conversion results of Gatys [1], Johnson [4], and Li [6]; the experimental results are shown in FIGS. 6 to 8.
In FIG. 6, the style conversion results for the black-and-white woodcut in row 1 show that the stylized results of Gatys [1], Johnson [4], and Li [6] all exhibit disorderly distributed woodcut carving-mark textures. In the color portrait style conversion results of row 2, the stylized results of Gatys [1] and Johnson [4] show indistinct carving-mark texture features and disordered texture distribution, while the conversion result of Li [6] shows uneven color distribution in the face region compared with the original print style image. In the generated results of both black-and-white and color portrait style conversion, the invention shows more distinct carving-mark texture features, with reasonable distribution of texture and color.
As can be seen from the scene image style conversion results in FIG. 7, the conversion result of Gatys [1] is distorted and damages the semantic information to some extent; the result of Johnson [4] can fail to migrate the style in relatively smooth regions, for example the sky region of the stylized result in FIG. 7 does not carry the style features of the same semantics in the print style image, and the woodcut carving-mark texture features of the other semantic regions are not well represented; the conversion result of Li [6] preserves semantic structure information well, but its woodcut carving-mark texture features are not prominent. The stylized result of the invention is superior to the other methods both in preserving semantic structure and in representing the woodcut carving-mark texture features.
The same region (the white square region in FIG. 7) is selected in the conversion results of FIG. 7 to compare local texture details, see FIG. 8. The comparison shows that the conversion result of the invention has more prominent woodcut carving-mark texture features, and is realistic and natural, close to the carving-mark effect of a real color register woodcut print.
In addition to the comparison of the above experimental results, a user evaluation of visual quality was performed on the print style conversion results. Participants viewed the content image and the print art style image, then viewed the generated images stylized by the four methods in random order, and, taking the original real print art style image as the standard, scored each stylized generated image in three aspects: overall visual quality, woodcut carving-mark texture quality, and rationality of the texture distribution. Scores of 1 to 5 correspond to five grades from very poor to very good. The average score of each method was then computed from the scores given by the participants. Twenty people working in image processing and twenty non-professionals were invited to take part in the experiment and scoring. FIG. 9 shows the average score statistics of the ratings given by the participants.
From the experimental scoring results in FIG. 9, the average scores of the invention in all three aspects are higher than those of the other three methods, showing that the woodcut style conversion results of the invention are superior to the other methods in overall visual quality, woodcut carving-mark texture quality, and rationality of the texture distribution.
The color register woodcut style conversion method of the invention preserves the semantic structure of the content image well and convincingly simulates the carving-mark texture characteristics of color register woodcut prints; the carving-mark textures are distributed uniformly and reasonably, and the conversion result is realistic and natural, closer to a real woodcut print.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (5)

1. A method for color register woodcut style conversion based on semantic segmentation, characterized by comprising the following steps:
step one: perform semantic segmentation on the content image and the print art style image respectively to obtain semantic segmentation result maps;
the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map; the print art style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
step two: binarize the two semantic segmentation result maps respectively, obtaining two complementary content image masks for the content image and two complementary style image masks for the print art style image;
step three: using the content image masks and the style image masks as guidance, add spatial guidance channels to perform region-wise style conversion on the content image and the print art style image, finally obtaining an artistic style conversion result with the woodcut print style.
2. The method for color register woodcut style conversion based on semantic segmentation according to claim 1, wherein in step one, performing semantic segmentation on the content image with a CRF-RNN network to obtain a semantic segmentation result map comprises:

step 1: take the label of each image pixel $X_i$ as a random variable and the relationships between pixels as edges, forming a conditional random field; let $X$ be the vector formed by the random variables $X_1, X_2, \ldots, X_N$, where $N$ is the number of pixels in the image; given the global observation $I$ (the image), $(I, X)$ can be modeled as a CRF, characterized by a Gibbs distribution of the form:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where $E(x)$ is the energy when $x$ takes a certain value and $Z(I)$ is the partition function;

in the CRF model, the energy of assigning labels $x$ is computed by the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where $\psi_u(x_i)$ is the unary energy component, measuring the probability of assigning label $x_i$ to pixel $i$, and $\psi_p(x_i, x_j)$ is the pairwise energy component describing the association between two adjacent pixels $i$ and $j$;

the unary energy component is computed by a CNN and gives only a rough prediction of the pixel label; the pairwise energy component provides an image-data-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where $\mu(x_i, x_j)$ is a label compatibility function used to capture the compatibility between different label pairs; for each $m = 1, 2, \ldots, M$, $k_G^{(m)}$ is a Gaussian kernel applied to feature vectors, $w^{(m)}$ is the weight of the $m$-th kernel, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$;
step 2: use the mean-field approximation of the CRF distribution for maximum a posteriori marginal inference; this approximates the CRF distribution $P(X)$ with a simpler distribution $Q(X)$ that can be written as the product of independent marginal distributions:

$$Q(X) = \prod_i Q_i(X_i)$$

where $Q(X)$ denotes the mean-field approximation of the CRF and $X_i$ denotes a pixel in the image;

step 3: model the single mean-field iteration obtained in step 2 as one forward pass of a stack of CNN layers, and iterate the mean field until the set number of iterations is completed, which amounts to treating mean-field CRF inference as an RNN; this model is called CRF-RNN;

step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;

step 5: train the network on the PASCAL Context semantic segmentation dataset; after training, input the content image into the end-to-end FCN + CRF-RNN network, finally obtaining the semantic segmentation result map of the content image.
3. The method for color register woodcut style conversion based on semantic segmentation according to claim 1, wherein step three comprises:

step 1: use the content image as the initialization image of the generated image; the content image and the generated image yield corresponding feature maps at each convolutional layer of the network, each layer's feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted at higher layers of the VGG19 network are used as the content representation;

step 2: the woodcut style image and the generated image likewise yield corresponding feature representations at each convolutional layer of the network; the correlations among the channel feature maps of each layer are computed with a Gram matrix and used as the style representation;

step 3: the mask images are re-encoded after being input to the style conversion network, and at each layer the network generates a guidance channel $T_l^r$ from the mask image; this amounts to adding weight information from the guidance channel $T_l^r$ to the feature maps, so that the activation values of the region covered by the guidance channel are increased under the action of the weights and the image optimization process proceeds only within the corresponding spatial guidance channel region;

step 4: after the content representations of the content image and the generated image are computed by the style conversion network in step 1, multiply the feature maps in the content representations element-wise with the corresponding spatial guidance channels generated in step 3 to obtain spatially guided content representations, and define the content loss with the Euclidean distance;

after the feature representations of the woodcut style image and the generated image are computed by the style conversion network in step 2, multiply each feature map element-wise with the corresponding spatial guidance channel generated in step 3 to obtain spatially guided feature representations; then compute the correlations between the spatially guided feature maps with a Gram matrix to obtain the spatially guided Gram matrix as the spatially guided style representation, and define the style loss with the Euclidean distance;

step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the initialized generated woodcut image by gradient descent, set the number of iterations, stop when it is reached, and finally obtain a conversion result with the woodcut style.
4. The method for color register woodcut style conversion based on semantic segmentation according to claim 3, wherein the content loss function in step 4 is:

$$L_c = \sum_{r=1}^{R} \sum_l \frac{1}{2} \sum_{i,j} \left( F_l^r(x) - F_l^r(x_c) \right)_{ij}^2$$

where $x$ denotes the generated-image initialization image and $x_c$ denotes the content image; $F_l^r(x)$ and $F_l^r(x_c)$ denote the spatially guided content representations of the initialization image and the content image at layer $l$ of the network, $r \in R$;

the style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r=1}^{R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x) - G_l^r(x_s) \right)_{ij}^2$$

where $w_l$ denotes the weighting factor of each layer's feature representation in VGG-19, and $G_l^r(x)$ and $G_l^r(x_s)$ denote the spatially guided style representations of the generated image and the woodcut art style image at layer $l$ of the network.
5. The method for color register woodcut style conversion based on semantic segmentation according to claim 3, wherein the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where $L_c$ denotes the loss function between the content image and the generated image, $L_s$ denotes the loss function between the print style image and the generated image, and $\alpha$ and $\beta$ denote the weights of the content loss function and the style loss function, respectively.
CN202010091956.2A 2020-02-14 2020-02-14 Color register woodcut style conversion algorithm based on semantic segmentation Active (granted as CN111340720B)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B Color register woodcut style conversion algorithm based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B Color register woodcut style conversion algorithm based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN111340720A 2020-06-26
CN111340720B 2023-05-19

Family

ID=71186865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010091956.2A Active CN111340720B Color register woodcut style conversion algorithm based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN111340720B


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
US20160253090A1 * 2013-11-19 2016-09-01 Wacom Co., Ltd. Method and system for ink data generation, ink data rendering, ink data manipulation and ink data communication
CN108470320A * 2018-02-24 2018-08-31 中山大学 Image stylization method and system based on CNN
CN108805803A * 2018-06-13 2018-11-13 衡阳师范学院 Portrait style transfer method based on semantic segmentation and deep convolutional neural networks
CN108898082A * 2018-06-19 2018-11-27 Oppo广东移动通信有限公司 Image processing method, picture processing unit and terminal device
CN109697690A * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image style transfer method and system
CN109712068A * 2018-12-21 2019-05-03 云南大学 Image style transfer and simulation method for gourd pyrography
CN110503716A * 2019-08-12 2019-11-26 中国科学技术大学 Method for generating synthetic license plate image data


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
郑锐 (Zheng Rui) et al.: "Digital synthesis of embroidery style based on convolutional neural networks" (基于卷积神经网络的刺绣风格数字合成), Journal of Zhejiang University (Science Edition) (浙江大学学报(理学版)), vol. 46, no. 3, 15 May 2019 (2019-05-15), pages 270-278 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI762971B (en) * 2020-07-15 2022-05-01 宏碁股份有限公司 Method and computer program product for image style transfer
CN112288622A * 2020-10-29 2021-01-29 中山大学 Camouflage image generation method based on multi-scale generative adversarial networks
CN112967180A * 2021-03-17 2021-06-15 福建库克智能科技有限公司 Training method for a generative adversarial network, and image style conversion method and device
CN112967180B * 2021-03-17 2023-12-22 福建库克智能科技有限公司 Training method for a generative adversarial network, and image style conversion method and device

Also Published As

Publication number Publication date
CN111340720B 2023-05-19

Similar Documents

Publication Publication Date Title
Jiang et al. Scfont: Structure-guided chinese font generation via deep stacked networks
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
CN111340720B (en) Color matching woodcut style conversion algorithm based on semantic segmentation
CN106548208B A quick, intelligent stylization method for photographic images
CN110322416B (en) Image data processing method, apparatus and computer readable storage medium
CN111310760B Method for detecting oracle bone inscription characters by combining local prior features and deep convolutional features
Kumar et al. A comprehensive survey on non-photorealistic rendering and benchmark developments for image abstraction and stylization
Thasarathan et al. Automatic temporally coherent video colorization
CN112287941B (en) License plate recognition method based on automatic character region perception
CN111915522A (en) Image restoration method based on attention mechanism
CN111768335B (en) CNN-based user interactive image local clothing style migration method
CN111986125A (en) Method for multi-target task instance segmentation
CN113705579B (en) Automatic image labeling method driven by visual saliency
Rother et al. Interactive foreground extraction: Using graph cut
Lu et al. Sketch simplification based on conditional random field and least squares generative adversarial networks
CN117934688A Neural representation modeling method based on Gaussian splatting samples
Zhu et al. Sand painting conversion based on detail preservation
Zhang et al. CBA-GAN: Cartoonization style transformation based on the convolutional attention module
Fang et al. Stylized-colorization for line arts
CN113012079B (en) Low-brightness vehicle bottom image enhancement method and device and storage medium
Tomar et al. An Effective Cartoonifying of an Image using Machine Learning
CN112329803B (en) Natural scene character recognition method based on standard font generation
Satchidanandam et al. Enhancing Style Transfer with GANs: Perceptual Loss and Semantic Segmentation
CN113112397A (en) Image style migration method based on style and content decoupling
Parihar et al. UndarkGAN: Low-light Image Enhancement with Cycle-consistent Adversarial Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant