CN111340720B - Color matching woodcut style conversion algorithm based on semantic segmentation - Google Patents

Color matching woodcut style conversion algorithm based on semantic segmentation

Info

Publication number
CN111340720B
CN111340720B (application CN202010091956.2A)
Authority
CN
China
Prior art keywords
image
style
content
woodcut
network
Prior art date
Legal status
Active
Application number
CN202010091956.2A
Other languages
Chinese (zh)
Other versions
CN111340720A (en)
Inventor
徐丹
李应涛
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN202010091956.2A
Publication of CN111340720A
Application granted
Publication of CN111340720B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing


Abstract

The invention provides a semantic-segmentation-based style conversion method for color-register woodcuts, which comprises the following steps. Step one: perform semantic segmentation on the content image and on the woodcut artistic style image respectively to obtain semantic segmentation result maps. Step two: binarize the semantic segmentation result maps to obtain image masks. Step three: using the semantic segmentation masks of the content image and of the woodcut artistic style image as guidance, perform region-wise style conversion on the content image and the woodcut artistic style image by adding spatial guidance channels, finally obtaining an artistic style conversion result with the woodcut artistic style. The color-register woodcut style conversion method effectively avoids problems that woodcut style conversion results are prone to, such as indistinct cut-mark texture and disordered cut-mark distribution. The cut-mark texture presented by the conversion results of the method is distinct and reasonably distributed, and the results are realistic and natural, closer to real woodcuts.

Description

Color matching woodcut style conversion algorithm based on semantic segmentation
Technical Field
The invention relates to the technical field of image processing, and in particular to a semantic-segmentation-based style conversion algorithm for color-register woodcuts.
Background
Neural style transfer is a technique that uses a neural network to render the style of an artistic style image onto a content image. The pioneering work of Gatys et al. [1] demonstrated the ability of convolutional neural networks (CNNs) to create artistic images; since then, neural style transfer has received increasing attention, and many methods have been proposed to improve or extend the original algorithm. Li et al. [2] enhance the details and edge contours of the conversion result by adding a Laplacian loss; Risser et al. [3] propose a method that improves the stability of style conversion by adding a histogram loss; Johnson et al. [4] train a model that achieves fast style conversion of images in a feed-forward pass; Chen et al. [5] achieve fast conversion of arbitrary styles based on a local patch-matching method; Li et al. [6] propose, in a data-driven way, a style conversion algorithm that learns a linear transformation matrix and can stylize arbitrary images and videos.
Current neural style transfer algorithms can convert arbitrary style images. A woodcut, however, differs from paintings on paper: it is obtained by rubbing from a carved medium, the cut-mark texture is prominent in the work, the types of cuts within a local region are largely consistent, and the texture distribution is roughly uniform. The results of existing neural style transfer algorithms therefore readily show indistinct cut marks, disordered cut-mark distribution, and destroyed semantic information of the content image. The reasons for these defects are as follows. Existing neural style transfer methods fall into two main classes: (1) online neural network methods based on image optimization; (2) offline neural network methods based on model optimization. In class (1) methods, the generated image is initialized to random noise, a VGG19 network is used as the feature extractor, the feature representations extracted by the higher layers of VGG19 serve as the content representation, and the correlations between the feature representations extracted by each convolution layer serve as the style representation; the Gram matrix is used to compute the correlations between the different feature representations. Because the Gram matrix can only extract globally averaged features of the image and does not constrain spatial object information, the above phenomena easily occur in woodcut-stylized images optimized from a white-noise initialization. Class (2) methods usually obtain a specific model or decoder by training; once trained, the model or decoder can stylize arbitrary artistic style images, but it contains no structure designed to highlight the woodcut cut-mark texture or to optimize the texture distribution for the characteristics of woodcuts, so the same phenomena easily appear in its stylized results.
[1] Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2414-2423.
[2] Li S, Xu X, Nie L, et al. Laplacian-steered neural style transfer[C]//Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017: 1716-1724.
[3] Risser E, Wilmot P, Barnes C. Stable and controllable neural texture synthesis and style transfer using histogram losses[OL]. arXiv preprint arXiv:1701.08893, 2017.
[4] Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision. Springer, Cham, 2016: 694-711.
[5] Chen T Q, Schmidt M. Fast patch-based style transfer of arbitrary style[OL]. arXiv preprint arXiv:1612.04337, 2016.
[6] Li X, Liu S, Kautz J, et al. Learning linear transformations for fast image and video style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3809-3817.
[7] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1529-1537.
[8] Gatys L A, Ecker A S, Bethge M, et al. Controlling perceptual factors in neural style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3985-3993.
[9] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[OL]. arXiv preprint arXiv:1409.1556, 2014.
Disclosure of Invention
The invention aims to solve the problems that, in woodcut style conversion, the resulting cut marks tend to be indistinct, the cut-mark texture distribution disordered, and the semantic information of the generated image destroyed, and provides a semantic-segmentation-based style conversion algorithm for color-register woodcuts, so that the generated cut-mark texture is reasonably distributed and the conversion result is realistic and natural.
A semantic-segmentation-based style conversion method for color-register woodcuts comprises the following steps:
Step one: perform semantic segmentation on the content image and on the woodcut artistic style image respectively to obtain semantic segmentation result maps;
specifically, the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map, and the woodcut artistic style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
Step two: binarize the two semantic segmentation result maps to obtain two complementary content-image masks of the content image and two complementary style-image masks of the woodcut artistic style image;
Step three: using the content-image masks and the style-image masks as guidance, perform region-wise style conversion on the content image and the woodcut artistic style image by adding spatial guidance channels, finally obtaining an artistic style conversion result with the woodcut style.
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, performing semantic segmentation on the content image with a CRF-RNN network in step one to obtain the semantic segmentation result map comprises:
Step 1: take the label X_i of each pixel of the content image as a random variable and the relationships between pixels as edges, forming a conditional random field, and let X be the vector formed by the random variables X_1, X_2, ..., X_N, where N is the number of pixels in the image; given a global observation I (the image), (I, X) can be modeled as a CRF characterized by a Gibbs distribution of the form

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where E(x) is the energy of the labeling x and Z(I) is the partition function.
In the CRF model, the energy of assigning a particular labeling x is computed from the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where ψ_u(x_i) is the unary energy component, which measures the probability of assigning the label x_i to pixel i, and ψ_p(x_i, x_j) is the binary (pairwise) energy component, which describes the association between two adjacent pixels i and j;
the unary energy component is computed by a CNN and only roughly predicts the pixel labels; the binary energy component provides an image-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where μ(x_i, x_j) is a label compatibility function that captures the compatibility between different pairs of labels; for each m = 1, 2, ..., M, k_G^{(m)} is a Gaussian kernel applied to the feature vectors, w^{(m)} is its weight, and f_i, f_j are the feature vectors of pixels i and j;
Step 2: the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference; it approximates the CRF distribution P(X) with a simpler distribution Q(X) that can be written as the product of independent marginal distributions, namely:

$$Q(x) = \prod_i Q_i(x_i)$$

where Q(X) denotes the mean-field approximation of the CRF and X_i denotes the label of a pixel in the image.
Step 3: modeling the single iteration process of the CRF average field obtained in the step 2 as a forward propagation process of a CNN layer, and iterating the CRF average field for a plurality of times until the iteration times are completed, wherein the iteration times are equivalent to treating the CRF average field reasoning as an RNN model, and the model is called CRF-RNN;
Step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;
Step 5: train the network with the PASCAL Context semantic segmentation dataset; after training, input the content image into the end-to-end network formed by the FCN and CRF-RNN to finally obtain the semantic segmentation result map of the content image.
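For illustration only, the following sketch produces a per-pixel label map for a content image using a generic pretrained segmentation model from torchvision as a stand-in for the trained FCN + CRF-RNN network of the invention; the model choice, file name and preprocessing are assumptions, not part of the invention.

```python
# Sketch: per-pixel semantic labels for the content image.
# fcn_resnet50 is only a stand-in for the trained FCN + CRF-RNN network.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

content = Image.open("content.jpg").convert("RGB")
x = preprocess(content).unsqueeze(0)     # 1 x 3 x H x W

with torch.no_grad():
    logits = model(x)["out"]             # 1 x C x H x W class scores
label_map = logits.argmax(dim=1)[0]      # H x W per-pixel label map
```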
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, step three comprises:
Step 1: use the content image as the initialization image of the generated map; the content image and the generated map each yield corresponding feature maps in every convolution layer of the network, the feature maps of each layer are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted by the higher layers of the VGG19 network are used as the content representation;
Step 2: the woodcut style image and the generated image likewise yield corresponding feature representations in every convolution layer of the network; the correlations between the feature maps of the channels of each layer are computed with a Gram matrix and used as the style representation;
Step 3: the mask images are input to the style conversion network, which, after re-encoding, generates a guidance channel T_l^r at each layer l from the mask images; this is equivalent to adding weight information to the feature maps: under the effect of these weights the activation values of the region of the feature map corresponding to the guidance channel are increased, and the image optimization process is carried out only within the corresponding spatial guidance channel region;
Step 4: after the content representations of the content image and of the generated map have been computed by the style conversion network in Step 1, each feature map in the content representation is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided content representation, and the content loss is defined with the Euclidean distance;
after the feature representations of the woodcut style image and of the generated image have been computed by the style conversion network in Step 2, each feature map is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided feature representation; the correlations between the spatially guided feature maps are computed with the Gram matrix to obtain the spatially guided Gram matrix, which serves as the spatially guided style representation, and the style loss is defined with the Euclidean distance;
Step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the generated-map initialization image by gradient descent for a set number of iterations, stop when that number is reached, and finally obtain a conversion result with the woodcut style.
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, the content loss function in Step 4 is:

$$L_c = \frac{1}{2} \sum_{r \in R} \sum_{i,j} \left( F_l^r(x)_{ij} - F_l^r(x_c)_{ij} \right)^2$$

where x denotes the generated-map initialization image and x_c the content image; F_l^r(x) and F_l^r(x_c) denote the spatially guided content representations of the initialization image and of the content image at layer l of the network, respectively; and r ∈ R indexes the guidance channels.
The style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r \in R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x)_{ij} - G_l^r(x_s)_{ij} \right)^2$$

where w_l is the weight factor of the feature representation of each layer in VGG-19, N_l is the number of filters and M_l the size of the feature maps at layer l, and G_l^r(x) and G_l^r(x_s) are the spatially guided style representations of the generated image and of the woodcut artistic style image at layer l of the network, respectively.
Further, in the above semantic-segmentation-based style conversion method for color-register woodcuts, the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where L_c denotes the loss function between the content map and the generated map, L_s denotes the loss function between the woodcut style image and the generated map, and α and β denote the weights of the content loss function and of the woodcut style loss function, respectively.
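For illustration only, the following is a minimal PyTorch sketch of these loss terms, assuming the spatially guided feature matrices of shape N_l x M_l have already been extracted; the function names and the example values of α and β are assumptions.

```python
import torch

def gram(F):
    """Gram matrix of a (spatially guided) feature matrix F of shape (N_l, M_l)."""
    return F @ F.t()

def content_loss(F_gen, F_content):
    """Euclidean content loss between guided content representations of one layer."""
    return 0.5 * torch.sum((F_gen - F_content) ** 2)

def layer_style_loss(F_gen, F_style):
    """Per-layer style loss between guided Gram matrices."""
    n_l, m_l = F_gen.shape
    return torch.sum((gram(F_gen) - gram(F_style)) ** 2) / (4.0 * n_l ** 2 * m_l ** 2)

def total_loss(L_c, L_s, alpha=1.0, beta=1e3):
    """L_total = alpha * L_c + beta * L_s (alpha and beta values are examples)."""
    return alpha * L_c + beta * L_s
```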
The beneficial effects are that:
the method for converting the color-set woodcut style effectively avoids the problems that woodcut texture is not obvious, the distribution of the notch texture is disordered and the like easily caused by the woodcut style conversion result. The wood engraving score texture presented by the wood engraving style conversion result of the method is obvious, the score texture distribution is reasonable, the conversion result is real and natural, and the wood engraving style conversion result is closer to the real wood engraving.
The conversion method provided by the invention builds on a neural network segmentation algorithm and the CNN image style conversion method: it takes the content image as the initialization image of the generated image, uses image masks as guidance, and performs color-register woodcut style conversion by adding spatial guidance channels, thereby avoiding problems such as indistinct cut-mark texture and disordered cut-mark distribution in woodcut stylization. The principle is as follows:
the invention provides a woodcut style conversion method, which belongs to an online neural network method based on image optimization, wherein each original input image is provided with two complementary mask images, the pixel value of each image mask is 0 or 1 (the pixel value of a black area in each mask image is 0, the pixel value of a white area in each mask image is 1), the image mask is used as a guide, a space guide channel is added for carrying out woodcut regional style conversion, and the space guide channel can be understood as an area with the pixel value of 1 in each mask image;
the content map, the style map and the corresponding mask images are used as inputs to the style conversion network; the network generates a guidance channel T_l^r at each layer from the mask images, and the guidance channel T_l^r is multiplied element-wise with the feature maps extracted by the network to obtain the spatially guided feature representation, which is equivalent to adding weight information to the feature maps: the activation values of the region corresponding to the guidance channel in the feature maps are increased by these weights, the Gram matrix computes feature correlations only within the guidance channel region, and when optimizing the style loss the network optimizes only the guidance channel region, so that the influence of style features from outside the guidance channel region is eliminated and disordered cut-mark texture distribution is avoided; the guidance channel region of the first mask image is optimized first, and after the set number of iterations is reached the guidance channel region corresponding to the second mask image is optimized, again until the set number of iterations, finally yielding the stylized image.
In the invention, the content image replaces the white-noise image as the initialization image of the generated image; compared with white-noise initialization, this preserves the semantic structure information of the image well, reduces the number of iterations, and makes it easier for the VGG19 network to extract the semantic features of the generated image. On a feature representation that carries semantic information, the Gram matrix more easily captures the correlations between the high-level semantic features of the image and the woodcut style features; optimizing an initialization image that already carries semantic information reduces noise interference, and, combined with the spatial guidance channels, the migration of woodcut style features is strengthened during image optimization, so the woodcut style features of the conversion result are more distinct.
the combination of the space guiding channel and the content image instead of white noise is used as an initialization image for generating the image, so that the problems that the grain characteristics of woodcut are not obvious, the grain distribution of the woodcut is disordered, the semantic information of the image is destroyed and the like in the stylization of woodcut are avoided.
Drawings
FIG. 1 is an overall flowchart of the color-register woodcut style conversion;
FIG. 2 is a schematic diagram of the CRF-RNN algorithm for content image segmentation;
FIG. 3 shows original images, their image semantic segmentation results, and their mask images;
FIG. 4 is a flow chart of the region-wise color-register woodcut style conversion;
FIG. 5 shows stylized results under different weights;
FIG. 6 is a comparison of woodcut style conversion results for portrait images;
FIG. 7 is a comparison of woodcut style conversion results for a landscape image;
FIG. 8 is a comparison of local texture details;
FIG. 9 shows the average visual-evaluation score statistics of the woodcut style conversion results.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below; obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Part one: image semantic segmentation
content image semantic segmentation
For semantic segmentation of the content image, the CRF-RNN algorithm [7] is used to obtain pixel-level semantic segmentation; it is an end-to-end neural network algorithm for image semantic segmentation.
In the pixel-wise labeling task of an image, a conditional random field (CRF) is typically used for label class prediction: the labels of the image pixels are taken as random variables and the relationships between pixels as edges, forming a conditional random field, and the CRF can model the labels once a global observation is obtained. Let the image I have N pixels, assign each pixel of I a label from a predefined label set L, and let the label assigned to pixel i be the random variable X_i, X_i ∈ L; let X be the random vector formed by X_1, X_2, ..., X_N. Consider the graph G = (V, E), where V = {X_1, X_2, ..., X_N} (V thus represents the set of pixel variables and is equivalent to X) and E is the set of pixel-to-pixel relationships. When a global observation I (the image) is obtained, (I, X) can be modeled as a CRF characterized by a Gibbs distribution of the form:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr) \qquad (1)$$

where E(x) is the energy of the labeling x and Z(I) is the partition function. In the fully connected CRF model, the energy of a labeling is defined as:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j) \qquad (2)$$
where ψ_u(x_i) is the unary energy component, which measures the probability of assigning the label x_i to pixel i, and ψ_p(x_i, x_j) is the binary energy component, which describes the association between two adjacent pixels i and j and makes neighbouring pixels with similar color values more likely to be assigned the same class label. The unary energy component is computed by a CNN and only roughly predicts the pixel labels; the binary energy component provides an image-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j) \qquad (3)$$

where μ(x_i, x_j) is a label compatibility function used to capture the compatibility between different pairs of labels; for m = 1, 2, ..., M, k_G^{(m)} is a Gaussian kernel applied to the feature vectors, w^{(m)} is its weight, and f_i, f_j are the feature vectors of pixels i and j.
Minimizing the CRF energy E(x) yields the most probable labeling for a given image. Since this exact minimization is intractable, the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference: the CRF distribution P(X) is approximated with a simpler distribution Q(X) that can be written as the product of independent marginal distributions, i.e. Q(x) = Π_i Q_i(x_i), where Q(X) denotes the mean-field approximation of the CRF. A single mean-field iteration is modeled as the forward pass of a stack of CNN layers and repeated until the set number of iterations (typically 10) is completed, which is equivalent to treating mean-field CRF inference as an RNN; the whole algorithm can therefore be expressed as an RNN computation.
The RNN structure so defined is called CRF-RNN: the CRF mean field is treated as an RNN computation, and the model is combined with fully convolutional networks (FCNs) to form an end-to-end network. The network is trained with the PASCAL Context semantic segmentation dataset; after training, the content image is input into the end-to-end network formed by the FCN and CRF-RNN, finally yielding the semantic segmentation result map of the content image. The structure combining CRF-RNN with the FCN is shown in FIG. 2.
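As a simplified, self-contained illustration of one mean-field update (not the exact CRF-RNN layer; a single Gaussian kernel over toy pixel features is assumed), the following numpy sketch performs the message passing, compatibility transform and addition of the unary term described above.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_kernel(feats, w=1.0, theta=1.0):
    """Weighted Gaussian kernel on the pixel feature vectors f_i (cf. formula (3))."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = w * np.exp(-d2 / (2.0 * theta ** 2))
    np.fill_diagonal(K, 0.0)          # no message from a pixel to itself
    return K

def mean_field_update(Q, unary, K, mu):
    """One mean-field update: message passing, label compatibility transform,
    adding the unary term, then renormalizing with a softmax."""
    pairwise = (K @ Q) @ mu.T
    return softmax(unary - pairwise, axis=1)

# toy example: 4 pixels, 2 labels
unary = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]))
feats = np.array([[0.0], [0.1], [1.0], [1.1]])   # one colour-like feature per pixel
mu = 1.0 - np.eye(2)                             # Potts label compatibility
K = gaussian_kernel(feats)

Q = softmax(unary, axis=1)                       # initialize from the unaries
for _ in range(10):                              # the set number of iterations
    Q = mean_field_update(Q, unary, K, mu)
labels = Q.argmax(axis=1)                        # per-pixel label prediction
```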
The segmentation of the woodcut artistic style image is described next:
accurate segmentation with the CRF-RNN semantic segmentation network presupposes training on a large set of annotated images; existing woodcut datasets are small, and after training the CRF-RNN semantic segmentation network on them it is difficult to obtain segmentation results that meet the conditions of woodcut artistic style conversion; therefore, the Labelme image annotation tool is used to semantically segment the woodcut artistic style image.
Part two: binarization of the semantic segmentation results
The region-wise semantic style conversion of color-register woodcuts requires the masks of the content image and of the woodcut artistic style image. The segmentation result of the content image obtained with the CRF-RNN image semantic segmentation algorithm and the segmentation result of the woodcut artistic style image are binarized to obtain the mask images of the content image and of the woodcut artistic style image, each original image having two complementary mask images. The original images, their image semantic segmentation results and their mask images are shown in FIG. 3.
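A minimal sketch of this binarization, assuming the segmentation result is available as a per-pixel label map (a numpy array) and the index of the foreground class is known; the class index used in the comment is an assumption.

```python
import numpy as np

def complementary_masks(label_map, foreground_class):
    """Binarize a per-pixel label map into two complementary 0/1 mask images."""
    mask_fg = (label_map == foreground_class).astype(np.float32)
    return mask_fg, 1.0 - mask_fg

# example: assuming class 15 marks the person region in the segmentation result
# person_mask, background_mask = complementary_masks(label_map, foreground_class=15)
```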
Part three: color-register woodcut style conversion
The region-wise style conversion of color-register woodcuts builds on the CNN image style conversion method [1] and on the image stylization method with spatial guidance channels [8]. With the semantic segmentation mask images as guidance, region-wise style conversion is performed on the content image and the woodcut artistic style image within the spatial guidance channel regions. The pretrained VGG-19 convolutional neural network model [9] is used as the feature extractor: the feature representations extracted by the higher layers of the convolutional neural network are used as the content representation, and the correlations between the channel feature representations of the convolution layers are used as the style representation. That is, the VGG19 network is capable of extracting high-level semantic information from images; after an image is input into the network it is re-encoded, each convolution layer of the network extracts the corresponding feature maps, and the feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation. The conversion method specifically comprises the following steps:
Step 1: use the content image as the initialization image of the generated map; the content image and the generated map each yield corresponding feature maps in every convolution layer of the network, the feature maps of each layer are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted by the higher layers of the VGG19 network are used as the content representation;
Specifically, a generated-map initialization image x and a content image x_c are defined (the generated image is the image being optimized; the content image and the style image serve as references, and optimizing this third image yields a conversion result that simultaneously carries the semantic information of the content image and the style features of the style image). The initialization image and the content image are re-encoded at every layer of the VGG-19 network; layer l has N_l convolution filters and feature maps of size M_l, where M_l is the product of the width and the height of the feature maps at layer l, so the feature maps output by each layer can be stored as a matrix

$$F^l(x) \in \mathbb{R}^{N_l \times M_l}$$

F^l(x) and F^l(x_c) denote the corresponding feature representations of the initialization image and of the content image at layer l of the network, and these feature representations are taken as the content representations.
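As an illustration, the following sketch extracts such per-layer feature matrices with torchvision's pretrained VGG-19; the layer indices, image size and preprocessing are assumptions about torchvision's layer layout, not values prescribed by the invention.

```python
from PIL import Image
from torchvision import models, transforms

vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# indices of the convolution layers of interest in torchvision's VGG-19
LAYERS = {0: "conv1_1", 5: "conv2_1", 10: "conv3_1",
          19: "conv4_1", 21: "conv4_2", 28: "conv5_1"}

preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def feature_matrices(img):
    """Return {layer name: F^l}, each feature map stored as an N_l x M_l matrix."""
    feats, h = {}, img
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in LAYERS:
            n_l = h.shape[1]
            feats[LAYERS[i]] = h.squeeze(0).reshape(n_l, -1)
    return feats

x_c = preprocess(Image.open("content.jpg").convert("RGB")).unsqueeze(0)
content_representation = feature_matrices(x_c)["conv4_2"]   # N_l x M_l matrix
```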
Step 2: the woodcut style image and the generated image can also obtain corresponding characteristic representation on each convolution layer of the network; calculating the correlation among the characteristic diagrams of each channel of each layer by using a Gram matrix to be used as style characterization;
Step 3: the mask images are input to the style conversion network, which, after re-encoding, generates a guidance channel T_l^r at each layer l from the mask images; this is equivalent to adding weight information to the feature maps: under the effect of these weights the activation values of the region of the feature map corresponding to the guidance channel are increased, and the image optimization process is carried out only within the corresponding spatial guidance channel region;
Specifically, to avoid indistinct cut-mark texture and disordered cut-mark distribution, spatial guidance channels are added. The mask image is taken as the spatial guidance channel, which can be understood as the region of the mask image whose pixel value is 1; the feature maps of each convolution layer are vectorized and multiplied element-wise with the vectorized spatial guidance channel T_l^r, and the spatially guided feature map is defined as:

$$F_l^r(x)[:, i] = T_l^r \circ F^l(x)[:, i] \qquad (4)$$

where F^l(x)[:, i] is the i-th column vector of F^l(x), T_l^r denotes the r-th guidance channel at layer l, r ∈ R, and ∘ denotes element-wise multiplication; the spatially guided feature representation of layer l is denoted F_l^r(x).
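A minimal sketch of this operation, assuming the guidance channel is obtained by resizing the 0/1 mask image to each layer's spatial resolution (nearest-neighbour resizing is an assumption):

```python
import torch
import torch.nn.functional as nnf

def guided_feature_matrix(feat, mask):
    """Multiply a layer's feature maps element-wise by a guidance channel.
    feat: (1, N_l, H_l, W_l) feature maps; mask: (H, W) 0/1 mask tensor.
    Returns the spatially guided feature matrix F_l^r(x) of shape (N_l, M_l)."""
    n_l, h_l, w_l = feat.shape[1], feat.shape[2], feat.shape[3]
    guide = nnf.interpolate(mask[None, None].float(), size=(h_l, w_l), mode="nearest")
    guided = feat * guide                       # broadcast over the N_l channels
    return guided.squeeze(0).reshape(n_l, -1)   # N_l x M_l
```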
Step 4: step 1, after a content image is obtained and a content representation of the image is generated through style conversion network calculation, performing corresponding element multiplication operation on a feature image in the content representation and a space guiding channel corresponding to the generated space guiding channel in step 3 to obtain a space guiding content representation, and defining content loss by using Euclidean distance;
after the feature representations of the woodcut style image and of the generated image have been computed by the style conversion network in Step 2, each feature map is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided feature representation; the correlations between the spatially guided feature maps are computed with the Gram matrix to obtain the spatially guided Gram matrix, which serves as the spatially guided style representation, and the style loss is defined with the Euclidean distance;
Specifically, the spatially guided content representation is obtained by the calculation of formula (4); F_l^r(x) and F_l^r(x_c) denote the spatially guided content representations of the initialization image and of the content image at layer l of the network, and the content loss function is defined as:

$$L_c = \frac{1}{2} \sum_{r \in R} \sum_{i,j} \left( F_l^r(x)_{ij} - F_l^r(x_c)_{ij} \right)^2 \qquad (5)$$
Specifically, the spatially guided feature representation is obtained by the calculation of formula (4); the Gram matrix is then used to compute the correlations between the spatially guided feature maps as the style representation of the spatial guidance channel region, and the spatially guided Gram matrix is defined as:

$$G_l^r(x) = F_l^r(x)\, F_l^r(x)^{\mathsf T} \qquad (6)$$

The generated-image initialization image x and the woodcut style image x_s are defined; G_l^r(x) and G_l^r(x_s) are the spatially guided style representations of the generated image and of the woodcut artistic style image at layer l of the network. The difference between the generated image and the woodcut artistic style image is defined with the mean square error, and the style loss function of layer l is defined as:

$$E_l = \sum_{r \in R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x)_{ij} - G_l^r(x_s)_{ij} \right)^2 \qquad (7)$$
The style loss function over all layers is:

$$L_s = \sum_l w_l E_l \qquad (8)$$

where w_l denotes the weight factor of the feature representation of each layer in VGG-19.
Step 5: and (3) weighting and combining the content loss and the style loss to obtain a total loss function, optimizing the generated image initialization image by using gradient descent, setting iteration times, stopping after the iteration times are reached, and finally obtaining a conversion result with woodcut style.
The content loss function L c And style loss function L s Weighted simultaneous, defining a total loss function:
L total =αL c +βL s (9)
wherein ,Lc Representing a loss function between a content graph and a generated graph, L s And (3) representing a loss function between the woodcut style image and the generated image, wherein alpha and beta respectively represent weights of the content loss function and the woodcut style loss function, different alpha/beta values are selected to control the stylization degree of the woodcut, and the generated image is obtained through gradient descent. The space guiding channel ensures that patterns are transferred between similar semantic areas in content and style images, so that the condition that the distribution of texture features of the whole image is disordered is avoided, and fig. 4 is a color-set woodcut zoned style conversion flow chart.
In the invention, conv4_2 of the VGG-19 network is selected as the content feature extraction layer, the five layers conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1 are selected as the style feature extraction layers, and the original content image is chosen as the initialization image of the generated image, which preserves the semantic structure of the image well, strengthens the cut-mark texture effect and reduces the number of iterations. Regarding the weights of the content and style loss functions, the larger α/β is, the lower the degree of woodcut stylization of the generated image, and conversely the higher it is. FIG. 5 shows results generated under different weights.
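To tie the pieces together, the following sketch outlines the region-wise optimization loop under these layer choices. It assumes that the helpers from the earlier sketches (vgg, LAYERS, guided_feature_matrix) are in scope, that the preprocessed tensors x_c (content) and x_s (woodcut style) have the same resolution, and that content_mask_1/2 and style_mask_1/2 are the complementary mask tensors from Part two; the optimizer, learning rate, iteration count and the α and β values are assumptions, not values prescribed by the invention.

```python
import torch

CONTENT_LAYER = "conv4_2"
STYLE_LAYERS = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
alpha, beta, w_l = 1.0, 1e4, 0.2          # assumed loss weights

def raw_feature_maps(img):
    """(1, N_l, H_l, W_l) feature maps of the layers of interest,
    using the vgg / LAYERS objects from the extraction sketch above."""
    feats, h = {}, img
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in LAYERS:
            feats[LAYERS[i]] = h
    return feats

def region_loss(x, f_c, f_s, mask_c, mask_s):
    """alpha * L_c + beta * L_s restricted to one pair of guidance channels."""
    feats = raw_feature_maps(x)
    Fg = guided_feature_matrix(feats[CONTENT_LAYER], mask_c)
    Fc = guided_feature_matrix(f_c[CONTENT_LAYER], mask_c)
    L_c = 0.5 * torch.sum((Fg - Fc) ** 2)                              # formula (5)
    L_s = 0.0
    for name in STYLE_LAYERS:
        A = guided_feature_matrix(feats[name], mask_c)
        B = guided_feature_matrix(f_s[name], mask_s)
        n_l, m_l = A.shape
        diff = A @ A.t() - B @ B.t()                                   # guided Gram difference, formula (6)
        L_s = L_s + w_l * torch.sum(diff ** 2) / (4 * n_l**2 * m_l**2)  # formulas (7)-(8)
    return alpha * L_c + beta * L_s                                    # formula (9)

f_c = raw_feature_maps(x_c)               # content image features (held fixed)
f_s = raw_feature_maps(x_s)               # woodcut style image features (held fixed)
x = x_c.clone().requires_grad_(True)      # initialize from the content image, not white noise
optimizer = torch.optim.Adam([x], lr=0.01)

# optimize the two complementary guidance-channel regions in turn
for mask_c, mask_s in ((content_mask_1, style_mask_1), (content_mask_2, style_mask_2)):
    for _ in range(300):                  # assumed iteration count per region
        optimizer.zero_grad()
        loss = region_loss(x, f_c, f_s, mask_c, mask_s)
        loss.backward()
        optimizer.step()
```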
Part four: comparison of style conversion results
The invention was applied to woodcut style conversion of different types of pictures, such as portraits and landscapes, black-and-white pictures and color pictures, and the results were compared with the style conversion results of Gatys [1], Johnson [6] and Li [8]; the experimental results are shown in FIGS. 6-8.
In the style conversion results of the black-and-white woodcut in row 1 of FIG. 6, the stylized results of Gatys [1], Johnson [6] and Li [8] show a disordered distribution of the cut-mark texture; in the color portrait style conversion results in row 2, the cut-mark texture of Gatys [1] and Johnson [6] is indistinct and its distribution disordered, and in the result of Li [8] the color distribution of the face region is uneven compared with the original woodcut style image. In the results of the invention for both the black-and-white and the color portrait style conversion, the cut-mark texture features are more distinct and the cut-mark texture and color distribution are reasonable.
As can be seen from the landscape image style conversion results in FIG. 7, the conversion result of Gatys [1] shows distortion and destroys semantic information to a certain extent; the result of Johnson [6] shows migration failure in relatively smooth regions, e.g. in the stylized result of FIG. 7 the sky region does not carry the style features of the same semantics in the woodcut style image, and the cut-mark texture features of the other semantic regions do not perform well either. The conversion result of Li [8] preserves the semantic structure information well, but its cut-mark texture features are not prominent. The stylized result of the invention maintains the semantic structure while displaying the cut-mark texture features, and is superior to the other methods.
The same region (the white-box region in FIG. 7) was selected from the conversion results of FIG. 7 to compare the local texture details, see FIG. 8. Compared with the other methods, the conversion result of the invention shows more prominent cut-mark texture features, is realistic and natural, and is close to the carving effect of real color-register woodcuts.
In addition to the comparison of the experimental results, a user evaluation of the visual quality of the woodcut style conversion results was carried out. The participants first viewed the content images and the woodcut artistic style images, then viewed the stylized images generated by the four methods in random order. Taking the original real woodcut artistic style images as the reference, they scored each stylized image on three aspects: overall visual quality, cut-mark texture quality, and rationality of the cut-mark texture distribution. Each aspect was rated on 5 levels (very poor, poor, fair, good, very good), represented by scores of 1 to 5, and the average score of each method was then computed from the scores given by the participants. The evaluation experiment invited 20 people working in the image processing field and 20 non-professionals to participate and score. FIG. 9 shows the average score statistics of the scores given by the experiment participants.
From the experimental scoring results in FIG. 9, the average scores of the invention are higher than those of the other three methods on all three aspects, indicating that the woodcut style conversion results of the invention are superior to the other methods in overall visual quality, cut-mark texture quality and rationality of texture distribution.
The color-register woodcut style conversion method provided by the invention, on the one hand, maintains the semantic structure of the content map well and, on the other hand, simulates well the cut-mark texture features of color-register woodcuts; the cut-mark distribution is uniform and reasonable, and the conversion result is realistic and natural, closer to a real woodcut.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the invention; although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (1)

1. A semantic-segmentation-based style conversion method for color-register woodcuts, characterized by comprising the following steps:
Step one: perform semantic segmentation on the content image and on the woodcut artistic style image respectively to obtain semantic segmentation result maps;
specifically, the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map, and the woodcut artistic style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
Step two: binarize the two semantic segmentation result maps to obtain two complementary content-image masks of the content image and two complementary style-image masks of the woodcut artistic style image;
Step three: using the content-image masks and the style-image masks as guidance, perform region-wise style conversion on the content image and the woodcut artistic style image by adding spatial guidance channels, finally obtaining an artistic style conversion result with the woodcut style;
wherein, in step one, performing semantic segmentation on the content image with a CRF-RNN network to obtain the semantic segmentation result map comprises:
Step 1: take the label X_i of each image pixel as a random variable and the relationships between pixels as edges, forming a conditional random field, and let X be the vector formed by the random variables X_1, X_2, ..., X_N, where N is the number of pixels in the image; when a global observation I is obtained, (I, X) can be modeled as a CRF characterized by a Gibbs distribution of the form

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where E(x) is the energy of the labeling x and Z(I) is the partition function;
in the CRF model, the energy of assigning a particular labeling x is computed from the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where ψ_u(x_i) is the unary energy component, which measures the probability of assigning the label x_i to pixel i, and ψ_p(x_i, x_j) is the binary energy component, which describes the association between two adjacent pixels i and j;
the unary energy component is computed by a CNN and only roughly predicts the pixel labels; the binary energy component provides an image-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where μ(x_i, x_j) is a label compatibility function used to capture the compatibility between different pairs of labels; for m = 1, 2, ..., M, k_G^{(m)} is a Gaussian kernel applied to the feature vectors, w^{(m)} is its weight, and f_i, f_j are the feature vectors of pixels i and j;
Step 2: the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference; it approximates the CRF distribution P(X) with a simpler distribution Q(X) that can be written as the product of independent marginal distributions, namely:

$$Q(x) = \prod_i Q_i(x_i)$$

where Q(X) denotes the mean-field approximation of the CRF and X_i denotes a pixel in the image;
Step 3: model the single mean-field iteration obtained in Step 2 as the forward pass of a stack of CNN layers and repeat it until the set number of iterations is completed; this is equivalent to treating mean-field CRF inference as an RNN, and the resulting model is called CRF-RNN;
Step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;
Step 5: train the network with the PASCAL Context semantic segmentation dataset; after training, input the content image into the network to finally obtain the semantic segmentation result map of the content image;
wherein step three comprises the following steps:
Step 1: use the content image as the initialization image of the generated map; the content image and the generated map each yield corresponding feature maps in every convolution layer of the network, the feature maps of each layer are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted by the higher layers of the VGG19 network are used as the content representation;
Step 2: the woodcut style image and the generated image likewise yield corresponding feature representations in every convolution layer of the network; the correlations between the feature maps of the channels of each layer are computed with a Gram matrix and used as the style representation;
Step 3: the mask images are input to the style conversion network, which, after re-encoding, generates a guidance channel T_l^r at each layer l from the mask images; this is equivalent to adding weight information to the feature maps: under the effect of these weights the activation values of the region of the feature map corresponding to the guidance channel are increased, and the image optimization process is carried out only within the corresponding spatial guidance channel region;
Step 4: after the content representations of the content image and of the generated map have been computed by the style conversion network in Step 1, each feature map in the content representation is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided content representation, and the content loss is defined with the Euclidean distance;
after the feature representations of the woodcut style image and of the generated image have been computed by the style conversion network in Step 2, each feature map is multiplied element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain the spatially guided feature representation; the correlations between the spatially guided feature maps are computed with the Gram matrix to obtain the spatially guided Gram matrix, which serves as the spatially guided style representation, and the style loss is defined with the Euclidean distance;
Step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the generated-image initialization image by gradient descent for a set number of iterations, and, when that number is reached, finally obtain the conversion result with the woodcut style;
wherein the content loss function in Step 4 is:

$$L_c = \frac{1}{2} \sum_{r \in R} \sum_{i,j} \left( F_l^r(x)_{ij} - F_l^r(x_c)_{ij} \right)^2$$

where x denotes the generated-map initialization image and x_c the content image; F_l^r(x) and F_l^r(x_c) denote the spatially guided content representations of the initialization image and of the content image at layer l of the network, respectively; and r ∈ R indexes the guidance channels;
the style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r \in R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x)_{ij} - G_l^r(x_s)_{ij} \right)^2$$

where w_l is the weight factor of the feature representation of each layer in VGG-19, and G_l^r(x) and G_l^r(x_s) are the spatially guided style representations of the generated image and of the woodcut artistic style image at layer l of the network, respectively;
the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where L_c denotes the loss function between the content map and the generated map, L_s denotes the loss function between the woodcut style image and the generated map, and α and β denote the weights of the content loss function and of the woodcut style loss function, respectively.
CN202010091956.2A 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation Active CN111340720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B (en) 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B (en) 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN111340720A CN111340720A (en) 2020-06-26
CN111340720B true CN111340720B (en) 2023-05-19

Family

ID=71186865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010091956.2A Active CN111340720B (en) 2020-02-14 2020-02-14 Color matching woodcut style conversion algorithm based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN111340720B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI762971B (en) * 2020-07-15 2022-05-01 宏碁股份有限公司 Method and computer program product for image style transfer
CN112288622B (en) * 2020-10-29 2022-11-08 中山大学 Multi-scale generation countermeasure network-based camouflaged image generation method
CN112967180B (en) * 2021-03-17 2023-12-22 福建库克智能科技有限公司 Training method for generating countermeasure network, image style conversion method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A (en) * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
CN110503716A (en) * 2019-08-12 2019-11-26 中国科学技术大学 A kind of automobile license plate generated data generation method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3072037B1 (en) * 2013-11-19 2019-08-14 Wacom Co., Ltd. Method and system for ink data generation, ink data rendering, ink data manipulation and ink data communication
CN108470320B (en) * 2018-02-24 2022-05-20 中山大学 Image stylization method and system based on CNN
CN108805803B (en) * 2018-06-13 2020-03-13 衡阳师范学院 Portrait style migration method based on semantic segmentation and deep convolution neural network
CN108898082B (en) * 2018-06-19 2020-07-03 Oppo广东移动通信有限公司 Picture processing method, picture processing device and terminal equipment
CN109697690A (en) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image Style Transfer method and system
CN109712068A (en) * 2018-12-21 2019-05-03 云南大学 Image Style Transfer and analogy method for cucurbit pyrography

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A (en) * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
CN110503716A (en) * 2019-08-12 2019-11-26 中国科学技术大学 A kind of automobile license plate generated data generation method

Also Published As

Publication number Publication date
CN111340720A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN111340720B (en) Color matching woodcut style conversion algorithm based on semantic segmentation
Jiang et al. Scfont: Structure-guided chinese font generation via deep stacked networks
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
CN106548208B (en) A kind of quick, intelligent stylizing method of photograph image
CN111310760B (en) Method for detecting alpha bone inscription characters by combining local priori features and depth convolution features
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN107679491A (en) A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
Liu et al. Structure-guided arbitrary style transfer for artistic image and video
CN113705579B (en) Automatic image labeling method driven by visual saliency
Wang et al. Evaluate and improve the quality of neural style transfer
Zhu et al. Learning dual transformation networks for image contrast enhancement
Fang et al. Stylized-colorization for line arts
CN112329803B (en) Natural scene character recognition method based on standard font generation
CN113901916A (en) Visual optical flow feature-based facial fraud action identification method
Subramanian et al. Strive: Scene text replacement in videos
CN113112397A (en) Image style migration method based on style and content decoupling
Tomar et al. An Effective Cartoonifying of an Image using Machine Learning
Bagwari et al. An edge filter based approach of neural style transfer to the image stylization
Li et al. Scribble-to-Painting Transformation with Multi-Task Generative Adversarial Networks.
Oh et al. A unified model for semi-supervised and interactive video object segmentation using space-time memory networks
Ashwini et al. Enhancing the Resolution of Ancient Artworks using Generative Adversarial Networks
CN110163927B (en) Single image re-coloring method based on neural network
Fan et al. SemiRefiner: Learning to Refine Semi-realistic Paintings
Manushree et al. XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant