CN111340720A - Color register woodcut style conversion algorithm based on semantic segmentation


Info

Publication number
CN111340720A
CN111340720A (application CN202010091956.2A)
Authority
CN
China
Prior art keywords
image
style
content
woodcut
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010091956.2A
Other languages
Chinese (zh)
Other versions
CN111340720B (en)
Inventor
徐丹 (Xu Dan)
李应涛 (Li Yingtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University (YNU)
Priority to CN202010091956.2A
Publication of CN111340720A
Application granted
Publication of CN111340720B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing


Abstract

The invention provides a method for color register woodcut style conversion based on semantic segmentation, comprising the following steps. Step one: perform semantic segmentation on the content image and the print art style image respectively to obtain semantic segmentation result maps. Step two: binarize the semantic segmentation result maps to obtain image masks. Step three: using the semantic segmentation masks of the content image and the print art style image as guidance, add spatial guidance channels to perform region-wise style conversion on the content image and the print art style image, finally obtaining an artistic style conversion result with the woodcut print style. The method effectively avoids the problems to which woodcut style conversion results are prone, such as indistinct carving-mark textures and disordered texture distribution. The conversion results of the method show distinct, reasonably distributed woodcut carving-mark textures, and are realistic, natural, and closer to a real woodcut print.

Description

Color register woodcut style conversion algorithm based on semantic segmentation
Technical Field
The invention relates to the technical field of image processing, in particular to a color register woodcut style conversion algorithm based on semantic segmentation.
Background
Neural network image style conversion is a technique for rendering the style of an artistic style image onto a content image using a neural network. The pioneering work of Gatys et al. [1] demonstrated the ability of convolutional neural networks (CNNs) to create artistic images; since then, neural network image style conversion has received increasing attention, and many methods have been proposed to improve or extend the original algorithm. Li et al. [2] enhanced the details and edge contours of conversion results by adding a Laplacian loss; Risser et al. [3] proposed a method that improves style conversion stability by adding a histogram loss; Johnson et al. [4], by training a model, achieved fast style conversion of images in a feed-forward pass; Chen et al. [5] achieved fast conversion of arbitrary styles based on a local patch matching method; Li et al. [6] proposed, in a data-driven manner, a style conversion algorithm that learns a linear transformation matrix and can perform style conversion on arbitrary images and videos.
At present, neural network image style conversion algorithms can handle images of arbitrary styles. Woodcut prints, however, differ from paintings on paper: they are obtained by rubbing from a carved medium, exhibit distinct woodcut carving-mark textures, and within a given local region the mark types are largely consistent and the texture distribution roughly uniform. For such artworks, the results of existing neural network image style conversion algorithms readily show indistinct carving marks, disordered carving-mark texture distribution, and damaged semantic information of the content image. The reasons for these defects are as follows. Existing neural network style conversion methods fall into two broad categories: (1) online neural network methods based on image optimization; (2) offline neural network methods based on model optimization. Methods of type (1) generally initialize the generated image as random noise, use the VGG19 network as a feature extractor, take the feature representations extracted from higher layers of the VGG19 network as the content representation, and take the correlations between the feature representations extracted at each convolutional layer as the style representation. These correlations are computed with a Gram matrix; since the Gram matrix captures only the global average features of an image and imposes no constraint on spatial object information, woodcut-stylized images obtained by optimizing a white-noise-initialized image readily exhibit the above phenomena. Methods of type (2) usually obtain specific model or decoder parameters by training a model or decoder, after which arbitrary artistic style images can be stylized; however, such models or decoders contain no structure designed to highlight the woodcut carving-mark texture or to optimize its distribution for the characteristics of woodcut prints, so their stylization results also readily exhibit the above phenomena.
[1] Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2414-2423.
[2] Li S, Xu X, Nie L, et al. Laplacian-steered neural style transfer[C]//Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017: 1716-1724.
[3] Risser E, Wilmot P, Barnes C. Stable and controllable neural texture synthesis and style transfer using histogram losses[OL]. arXiv preprint arXiv:1701.08893, 2017.
[4] Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision. Springer, Cham, 2016: 694-711.
[5] Chen T Q, Schmidt M. Fast patch-based style transfer of arbitrary style[OL]. arXiv preprint arXiv:1612.04337, 2016.
[6] Li X, Liu S, Kautz J, et al. Learning linear transformations for fast image and video style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3809-3817.
[7] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1529-1537.
[8] Gatys L A, Ecker A S, Bethge M, et al. Controlling perceptual factors in neural style transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3985-3993.
[9] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[OL]. arXiv preprint arXiv:1409.1556, 2014.
Disclosure of Invention
The invention aims to solve the problems that, in woodcut print style conversion, the results readily show indistinct woodcut carving marks, disordered carving-mark texture distribution, and damaged semantic information of the generated image, and provides a color register woodcut print style conversion algorithm based on semantic segmentation, so that the carving-mark textures of the generated woodcut print are reasonably distributed and the conversion result is realistic and natural.
A method for color register woodcut style conversion based on semantic segmentation comprises the following steps:
step one: perform semantic segmentation on the content image and the print art style image respectively to obtain semantic segmentation result maps;
the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map; the print art style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
step two: binarize the two semantic segmentation result maps respectively, obtaining two complementary content image masks for the content image and two complementary style image masks for the print art style image;
step three: using the content image masks and the style image masks as guidance, add spatial guidance channels to perform region-wise style conversion on the content image and the print art style image, finally obtaining an artistic style conversion result with the woodcut print style.
Further, in the above method for color register woodcut style conversion based on semantic segmentation, performing semantic segmentation on the content image with a CRF-RNN network in step one to obtain a semantic segmentation result map comprises:

step 1: take the label of each content image pixel $X_i$ as a random variable and the relationships between pixels as edges, forming a conditional random field; let $X$ be the vector formed by the random variables $X_1, X_2, \ldots, X_N$, where $N$ is the number of pixels in the image. Given the global observation $I$ (the image), $(I, X)$ can be modeled as a CRF, characterized by a Gibbs distribution of the form:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where $E(x)$ is the energy when $x$ takes a certain value and $Z(I)$ is the partition function.

In the CRF model, the energy of assigning labels $x$ is computed by the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where $\psi_u(x_i)$ is the unary energy component, measuring the probability of assigning label $x_i$ to pixel $i$, and $\psi_p(x_i, x_j)$ is the pairwise energy component, describing the association between two adjacent pixels $i$ and $j$;

the unary energy component is computed by a CNN and gives only a rough prediction of the pixel label; the pairwise energy component provides an image-data-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where $\mu(x_i, x_j)$ is a label compatibility function used to capture the compatibility between different label pairs; for each $m = 1, 2, \ldots, M$, $k_G^{(m)}$ is a Gaussian kernel applied to feature vectors, $w^{(m)}$ is the weight of the $m$-th kernel, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$;
step 2: use the mean-field approximation of the CRF distribution for maximum a posteriori marginal inference; this approximates the CRF distribution $P(X)$ with a simpler distribution $Q(X)$ that can be written as the product of independent marginal distributions:

$$Q(X) = \prod_i Q_i(X_i)$$

where $Q(X)$ denotes the mean-field approximation of the CRF and $X_i$ denotes a pixel in the image.

step 3: model a single mean-field iteration from step 2 as one forward pass of a stack of CNN layers, and iterate the mean field until the set number of iterations is completed; this amounts to treating mean-field CRF inference as an RNN, and the resulting model is called CRF-RNN;

step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;

step 5: train the network on the PASCAL Context semantic segmentation dataset; after training, input the content image into the end-to-end FCN + CRF-RNN network to finally obtain the semantic segmentation result map of the content image (an illustrative sketch of this inference step follows).
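As an illustrative sketch of the inference stage of step 5: the snippet below obtains a per-pixel label map from a pretrained segmentation network. Since a pretrained FCN + CRF-RNN is not assumed to be available here, torchvision's FCN-ResNet50 stands in for it; the model choice, weights flag, and preprocessing are assumptions of this sketch, not part of the patent.

```python
# Sketch: per-pixel semantic labels from a pretrained segmentation network.
# torchvision's FCN-ResNet50 is a stand-in for the FCN + CRF-RNN network
# described above (a pretrained CRF-RNN is not assumed available here).
# Requires torchvision >= 0.13 for the `weights` argument.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()

img = Image.open("content.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)            # 1 x 3 x H x W
with torch.no_grad():
    logits = model(x)["out"]                # 1 x C x H x W class scores
label_map = logits.argmax(dim=1)[0]         # H x W per-pixel label ids
```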
Further, in the above method for color register woodcut style conversion based on semantic segmentation, step three comprises:

step 1: use the content image as the initialization image of the generated image; the content image and the generated image yield corresponding feature maps at each convolutional layer of the network, each layer's feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted at higher layers of the VGG19 network are used as the content representation;

step 2: the woodcut style image and the generated image likewise yield corresponding feature representations at each convolutional layer of the network; the correlations among the channel feature maps of each layer are computed with a Gram matrix and used as the style representation;

step 3: the mask images are re-encoded after being input to the style conversion network, and at each layer the network generates a guidance channel $T_l^r$ from the mask image; this amounts to adding weight information from the guidance channel $T_l^r$ to the feature maps, so that the activation values of the region covered by the guidance channel are increased under the action of the weights and the image optimization process proceeds only within the corresponding spatial guidance channel region;

step 4: after the content representations of the content image and the generated image are computed by the style conversion network in step 1, multiply the feature maps in the content representations element-wise with the corresponding spatial guidance channels generated in step 3 to obtain spatially guided content representations, and define the content loss with the Euclidean distance;

after the feature representations of the woodcut style image and the generated image are computed by the style conversion network in step 2, multiply each feature map element-wise with the corresponding spatial guidance channel generated in step 3 to obtain spatially guided feature representations; then compute the correlations between the spatially guided feature maps with a Gram matrix to obtain the spatially guided Gram matrix as the spatially guided style representation, and define the style loss with the Euclidean distance;

step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the initialized generated woodcut image by gradient descent, set the number of iterations, stop when it is reached, and finally obtain a conversion result with the woodcut style.
Further, in the above method for color register woodcut style conversion based on semantic segmentation, the content loss function in step 4 is:

$$L_c = \sum_{r=1}^{R} \sum_l \frac{1}{2} \sum_{i,j} \left( F_l^r(x) - F_l^r(x_c) \right)_{ij}^2$$

where $x$ denotes the generated-image initialization image and $x_c$ denotes the content image; $F_l^r(x)$ and $F_l^r(x_c)$ denote the spatially guided content representations of the initialization image and the content image at layer $l$ of the network, $r \in R$;

the style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r=1}^{R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x) - G_l^r(x_s) \right)_{ij}^2$$

where $w_l$ denotes the weighting factor of each layer's feature representation in VGG-19, and $G_l^r(x)$ and $G_l^r(x_s)$ denote the spatially guided style representations of the generated image and the woodcut art style image at layer $l$ of the network.
Further, in the above method for color register woodcut style conversion based on semantic segmentation, the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where $L_c$ denotes the loss function between the content image and the generated image, $L_s$ denotes the loss function between the print style image and the generated image, and $\alpha$ and $\beta$ denote the weights of the content loss function and the style loss function, respectively.
Advantageous effects:
the method for converting the style of the chromatic woodcut engraving effectively avoids the problems that woodcut nick textures are not obvious, nick textures are distributed in a disordered mode and the like easily caused by the style conversion result of the woodcut engraving. The wood engraving style conversion result of the method has obvious wood engraving nick texture, the nick texture is distributed reasonably, and the conversion result is real and natural and is closer to a real wood engraving.
The conversion method provided by the invention takes the content image as the generated image initialization image, uses the image mask as the guide, and performs style conversion of the chromatic woodcut engraving by adding the space guide channel based on the neural network segmentation algorithm and the CNN image style conversion method, thereby avoiding the problems of unobvious woodcut indentation texture, disordered indentation texture distribution and the like in the formatting of woodcut engraving. The principle is as follows:
the invention provides a woodcut style conversion method, which belongs to an online neural network method based on image optimization.A primary input image is provided with two complementary mask images, the pixel value of the image mask is 0 or 1 (the pixel value of a black area in the mask image is 0, and the pixel value of a white area in the mask image is 1), the image mask is used as guidance, a space guidance channel is added for carrying out woodcut regional style conversion, and the space guidance channel can be understood as an area of which the pixel value is 1 in the mask image;
the content map, the style map and the corresponding mask image are used as input of a style conversion network, and the network generates a guide channel T at each layer according to the mask imagel rTo guide a passage Tl rCarrying out corresponding element multiplication operation with the feature diagram extracted by the network to obtain a space guide feature representation, which is equivalent to that according to the guide channel Tl rAdding weight information to the feature map, wherein the activation value of a region corresponding to the guide channel in the feature map is increased under the action of the weight, the Gram matrix only calculates the feature correlation in the guide channel region, and the network only optimizes the guide channel region when the optimization style is lost, so that the influence of style features of a non-guide channel region is eliminated, and the phenomenon of disordered wood carving texture distribution is avoided; and optimizing the region of the guide channel by using the first mask image, optimizing the corresponding region of the guide channel by using the second mask image after the set iteration number is reached, and stopping until the set iteration number is reached to obtain a stylized image.
Compared with the method using white noise image initialization, the method using the content image to replace the white noise image as the initialization image for generating the image can well keep the semantic structure information in the image and reduce the iteration times, and the VGG19 network can more easily extract the semantic features of the generated image; the correlation between the characteristics of the high-level semantic information of the image and the woodcut style characteristics of the image can be easily obtained on the characteristic representation of the image with the semantic information by the Gram matrix, the initialized generated image with the semantic information is optimized, the noise interference is reduced, the migration of the woodcut style characteristics is enhanced in the image optimization process by combining the space guide channel, and the woodcut nicking characteristics of the conversion result are more obvious;
the space guide channel and the content graph replace white noise to be used as an initialization image for generating the graph, and the combination of the space guide channel and the content graph avoids the problems that the texture characteristics of the woodcut nick are not obvious, the nick texture distribution is disordered, the image semantic information is damaged and the like in the formatting of the woodcut picture.
Drawings
FIG. 1 is an overall flow chart of color register woodcut style conversion;
FIG. 2 is a schematic diagram of a CRF-RNN algorithm for content image segmentation;
FIG. 3 is an original image, image semantic segmentation result and its mask image;
FIG. 4 is a flowchart of color register woodcut region-wise style conversion;
FIG. 5 is a graph of stylized results for different weights;
FIG. 6 is a comparison of the conversion results of portrait image woodcut style;
FIG. 7 is a comparison of scene image woodcut style conversion results;
FIG. 8 is a comparison of local texture details;
FIG. 9 is a woodcut style conversion result visual evaluation average score statistic.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described clearly and completely below, and it is obvious that the described embodiments are some, not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Part I: Image semantic segmentation
content image semantic segmentation
For semantic segmentation of content images, the CRF-RNN algorithm [7] is used to obtain pixel-level semantic segmentation; it is an end-to-end neural network algorithm for image semantic segmentation.

In pixel-wise image labeling tasks, a CRF (conditional random field) is generally used to predict label categories: the labels of image pixels are taken as random variables and the relationships between pixels as edges, forming a conditional random field, so that once a global observation is obtained, the CRF can model the labels. Let image $I$ have $N$ pixels, each assigned a label from a preset label set $L$; the label assigned to pixel $i$ is a random variable $X_i \in L$, and $X$ is the random vector formed from $X_1, X_2, \ldots, X_N$. Let the graph $G = (V, E)$, where $V = \{X_1, X_2, \ldots, X_N\}$, so that $V$ is equivalent to $X$ (a graph is conventionally written $G(V, E)$, with $V$ here the set of pixel variables and $E$ the relationships between pixels). Given the global observation $I$ (the image), $(I, X)$ can be modeled as a CRF, characterized by a Gibbs distribution of the form:
$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr) \tag{1}$$

where $E(x)$ is the energy when $x$ takes a certain value and $Z(I)$ is the partition function. In a fully connected CRF model, the label energy is defined as:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j) \tag{2}$$
where $\psi_u(x_i)$ is the unary energy component, measuring the probability of assigning label $x_i$ to pixel $i$, and $\psi_p(x_i, x_j)$ is the pairwise energy component describing the association between two adjacent pixels $i$ and $j$; this term lets neighboring pixels with similar color values be assigned the same category label with higher probability. The unary energy component is computed by a CNN and gives only a rough prediction of the pixel label; the pairwise energy component provides an image-data-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j) \tag{3}$$

where $\mu(x_i, x_j)$ is a label compatibility function used to capture the compatibility between different label pairs; for each $m = 1, 2, \ldots, M$, $k_G^{(m)}$ is a Gaussian kernel applied to feature vectors, $w^{(m)}$ is the weight of the $m$-th kernel, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$.
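For concreteness, the following is a minimal sketch of the weighted Gaussian kernels of formula (3) for one pixel pair, using the common dense-CRF choice of an appearance (bilateral) kernel plus a smoothness kernel; the bandwidths and weights are illustrative assumptions, not values given in the patent.

```python
# Sketch: weighted Gaussian kernels of the pairwise term (formula (3))
# for a pixel pair (i, j), assuming the usual dense-CRF combination of an
# appearance (position + color) kernel and a smoothness (position) kernel.
# Bandwidths theta_* and weights w1, w2 are illustrative assumptions.
import numpy as np

def pairwise_kernel(p_i, p_j, c_i, c_j,
                    theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0,
                    w1=5.0, w2=3.0):
    """p_*: pixel positions, shape (2,); c_*: RGB colors, shape (3,)."""
    dp2 = np.sum((p_i - p_j) ** 2)          # squared position distance
    dc2 = np.sum((c_i - c_j) ** 2)          # squared color distance
    # Appearance kernel: nearby pixels with similar color interact strongly,
    # which pushes similar neighboring pixels toward the same label.
    k_app = np.exp(-dp2 / (2 * theta_alpha**2) - dc2 / (2 * theta_beta**2))
    # Smoothness kernel: depends on position only.
    k_smooth = np.exp(-dp2 / (2 * theta_gamma**2))
    return w1 * k_app + w2 * k_smooth       # sum_m w^(m) k_G^(m)(f_i, f_j)

k = pairwise_kernel(np.array([10.0, 10.0]), np.array([11.0, 10.0]),
                    np.array([120.0, 80.0, 40.0]), np.array([118.0, 82.0, 39.0]))
```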
Minimizing the CRF energy $E(x)$ yields the most probable labeling of a given image. Since this exact minimization is intractable, the mean-field approximation of the CRF distribution is used for maximum a posteriori marginal inference: it approximates the CRF distribution $P(X)$ with a simpler distribution $Q(X)$ that can be written as the product of independent marginals, $Q(X) = \prod_i Q_i(X_i)$, where $Q(X)$ denotes the mean-field approximation of the CRF. A single mean-field iteration is modeled as one forward pass of a stack of CNN layers, and the mean field is iterated until the set number of iterations (generally 10) is completed; this amounts to treating mean-field inference as an RNN, so the whole algorithm can be represented as an RNN process (see the sketch below).

This RNN structure is defined as CRF-RNN: mean-field CRF inference is regarded as an RNN computation, and the model is combined with an FCN (fully convolutional network) to form an end-to-end network. The network is trained on the PASCAL Context semantic segmentation dataset; after training, the content image is input into the end-to-end FCN + CRF-RNN network, finally yielding the semantic segmentation result map of the content image. The combined CRF-RNN + FCN structure is shown in FIG. 2.
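The following is a small sketch of the unrolled mean-field loop that CRF-RNN treats as an RNN. It uses an explicit N x N kernel matrix, which is feasible only for tiny N and stands in for the fast high-dimensional filtering of the real algorithm; the toy sizes and Potts compatibility are assumptions of the sketch.

```python
# Sketch: dense-CRF mean-field inference unrolled for a fixed number of
# iterations, the loop that CRF-RNN treats as an RNN. K is an explicit
# N x N pairwise kernel matrix (tiny N only; real CRF-RNN uses efficient
# high-dimensional filtering). unary_logits holds -psi_u per pixel/label.
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary_logits, K, mu, n_iters=10):
    """unary_logits: N x L; K: N x N kernel matrix; mu: L x L compatibility."""
    Q = softmax(unary_logits, axis=1)            # initialize from unary term
    for _ in range(n_iters):                     # each pass = one "RNN" step
        msg = K @ Q                              # message passing / filtering
        pairwise = msg @ mu.T                    # compatibility transform
        Q = softmax(unary_logits - pairwise, axis=1)  # local update, normalize
    return Q                                     # approximate marginals Q_i

# Toy example: 4 pixels, 2 labels, Potts-model compatibility.
rng = np.random.default_rng(0)
unary = rng.normal(size=(4, 2))
K = np.exp(-rng.random((4, 4)))
np.fill_diagonal(K, 0.0)                         # no message to self
mu = 1.0 - np.eye(2)                             # Potts: penalize disagreement
Q = mean_field(unary, K, mu)
```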
The segmentation of the print art style image is explained below:

accurate segmentation with the CRF-RNN semantic segmentation network presupposes training on a large annotated image dataset; existing woodcut datasets are small, and after training it is difficult to obtain segmentation results that satisfy the conditions for woodcut art style conversion. Therefore the Labelme image annotation tool is used to semantically segment the print art style image.
Part II: Semantic segmentation result binarization
Semantic style conversion of the color register woodcut uses the masks of the content image and the woodcut art style image as guidance. The segmentation results of the content image and the woodcut art style image are obtained with the CRF-RNN image semantic segmentation algorithm and with Labelme, respectively; the segmentation results are then binarized to obtain the mask images of the content image and of the style image, each original image having two complementary mask images (a sketch of this binarization follows). The original images, their semantic segmentation results, and the corresponding mask images are shown in FIG. 3.
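A minimal sketch of this binarization step, assuming the segmentation result is stored as an integer label map; which label counts as foreground (here 15, the VOC 'person' id) is an illustrative assumption.

```python
# Sketch: turning a semantic segmentation result map into two complementary
# binary masks with pixel values 0/1, as described in this part.
# The foreground label id is an illustrative assumption.
import numpy as np
from PIL import Image

def complementary_masks(label_map, fg_label):
    """label_map: H x W integer labels -> (fg_mask, bg_mask), each in {0, 1}."""
    fg = (label_map == fg_label).astype(np.uint8)
    bg = 1 - fg                                  # complement: fg + bg == 1
    return fg, bg

seg = np.array(Image.open("content_seg.png"))    # segmentation result map
fg_mask, bg_mask = complementary_masks(seg, fg_label=15)   # e.g. VOC 'person'
Image.fromarray(fg_mask * 255).save("content_mask_fg.png")
Image.fromarray(bg_mask * 255).save("content_mask_bg.png")
```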
Part III: Color register woodcut style conversion
Region-wise style conversion of the color register woodcut is mainly based on the CNN image style conversion method [1] and the image stylization method with spatial guidance channels [8]. With the semantic segmentation mask images as guidance, region-wise style conversion is performed on the content image and the print art style image within the spatial guidance channel regions. A pre-trained VGG-19 convolutional neural network model [9] is used as the feature extractor; the feature representations extracted at higher layers of the convolutional neural network are used as the content representation, and the correlations among the per-channel feature representations of each convolutional layer are used as the style representation. That is: the VGG19 network can extract high-level semantic information from images; after an image is input, the network re-encodes it, extracting a corresponding feature map at each convolutional layer, and the feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation. The conversion method specifically comprises the following steps:
Step 1: use the content image as the initialization image of the generated image; the content image and the generated image yield corresponding feature maps at each convolutional layer of the network, each layer's feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted at higher layers of the VGG19 network are used as the content representation;

specifically, define the generated-image initialization image $x$ and the content image $x_c$ (the generated image is the image being optimized: with the content image and the style image as references, a third image is optimized to obtain a conversion result map that carries the semantic information of the content image and the style characteristics of the style image). The initialization image and the content image are re-encoded at each layer of the VGG-19 network; layer $l$ has $N_l$ convolution kernels and feature maps of size $M_l$, where $M_l$ is the product of the feature map width and height at layer $l$, so the feature maps output by each layer can be stored as a matrix

$$F_l \in \mathbb{R}^{N_l \times M_l}$$

and $F_l(x)$ and $F_l(x_c)$ denote the feature representations of the initialization image and the content image at layer $l$ of the network, taken as the content representations (see the extraction sketch below).
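A sketch of this re-encoding with torchvision's pretrained VGG-19, storing each selected layer's feature maps as an N_l x M_l matrix F_l; the index-to-name mapping follows torchvision's vgg19.features ordering, and the input is assumed to be a normalized 1 x 3 x H x W tensor.

```python
# Sketch: re-encode an image with pretrained VGG-19 and store each chosen
# layer's feature maps as an N_l x M_l matrix F_l (channels x width*height).
# Indices map torchvision's vgg19.features modules to conv-layer names.
import torch
from torchvision import models

VGG_LAYERS = {0: "conv1_1", 5: "conv2_1", 10: "conv3_1",
              19: "conv4_1", 21: "conv4_2", 28: "conv5_1"}

vgg = models.vgg19(weights="DEFAULT").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)                 # the extractor itself is frozen

def feature_matrices(img):
    """img: normalized 1 x 3 x H x W tensor -> {layer name: N_l x M_l matrix}."""
    feats, x = {}, img
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in VGG_LAYERS:
            n_l = x.shape[1]                # number of channels N_l
            feats[VGG_LAYERS[idx]] = x.reshape(n_l, -1)   # M_l = H_l * W_l
    return feats
```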
Step 2: the woodcut style image and the generated image can obtain corresponding characteristic representation at each convolution layer of the network; calculating the correlation among all channel characteristic graphs of each layer by using a Gram matrix to serve as style representation;
and step 3: the mask image is re-encoded after being input into the style conversion network, and the network generates a guide channel T in each layer according to the mask imagel rThis corresponds to the guidance of the channel Tl rAdding weight information to the feature map, wherein the activation value of a region corresponding to the guide channel in the feature map is increased under the action of the weight, and the image optimization process is only carried out in the corresponding spatial guide channel region;
in particular, in order to avoid the problems of unobvious texture and disordered distribution of the wood carving scores, the method is realized by adding a space guide conduction channel. Taking the mask image as a spatial guide channel, the spatial guide channel can be understood as a region with a pixel value of 1 in the mask image, and each convolution layer is vectorized into a feature map and a vectorized spatial guide channel Tl rPerforming corresponding element multiplication operation, and defining a space guide characteristic diagram as follows:
Figure BDA0002383976040000112
wherein ,
Figure BDA0002383976040000113
is composed of
Figure BDA0002383976040000114
Of the ith column vector, Tl rRepresenting the R-th guide channel on the l-th layer, R ∈ R, the spatial guide characteristic of the l-th layer is characterized as Fl r(x);
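A sketch of formula (4): the binary mask is resized to the layer's spatial resolution to form T_l^r, vectorized, and broadcast over the channel rows of F_l, so every column i is scaled by the guidance value at spatial position i.

```python
# Sketch of formula (4): masking a layer's feature matrix with a spatial
# guidance channel. The mask is downsampled to the layer's resolution,
# vectorized to length M_l, and broadcast across the N_l channel rows,
# scaling each column by the guidance value of its spatial position.
import torch
import torch.nn.functional as F

def guided_matrix(feat_matrix, mask, layer_hw):
    """feat_matrix: N_l x M_l; mask: 1 x 1 x H x W in {0, 1};
    layer_hw: (H_l, W_l) with H_l * W_l == M_l."""
    t = F.interpolate(mask, size=layer_hw, mode="nearest")   # T_l^r
    return feat_matrix * t.reshape(1, -1)    # zero outside the guided region
```

The resulting matrix is zero outside the guidance region, so any loss computed from it sees only that region.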
Step 4: after the content representations of the content image and the generated image are computed by the style conversion network in Step 1, multiply the feature maps in the content representations element-wise with the corresponding spatial guidance channels generated in Step 3 to obtain spatially guided content representations, and define the content loss with the Euclidean distance;

after the feature representations of the style image and the generated image are computed by the style conversion network in Step 2, multiply each feature map element-wise with the corresponding spatial guidance channel generated in Step 3 to obtain spatially guided feature representations; then compute the correlations between the spatially guided feature maps with a Gram matrix to obtain the spatially guided Gram matrix as the spatially guided style representation, and define the style loss with the Euclidean distance;
in particular, the spatial guidance content representation can be obtained by the formula (4) calculation,
Figure BDA0002383976040000121
and
Figure BDA0002383976040000122
respectively representing the initialization image and the content image on the I layer of the network to guide the content characterization, and defining a content loss function as follows:
Figure BDA0002383976040000123
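A minimal sketch of content loss (5), summing the squared differences of the spatially guided content representations over guidance channels r and content layers l.

```python
# Sketch of content loss (5): squared Euclidean distance between the
# spatially guided content representations of the generated image and the
# content image, summed over guidance channels r and content layers l.
import torch

def content_loss(guided_gen, guided_content):
    """Both arguments: dict r -> dict l -> (N_l x M_l guided feature matrix)."""
    loss = torch.zeros(())
    for r in guided_gen:
        for l in guided_gen[r]:
            diff = guided_gen[r][l] - guided_content[r][l]
            loss = loss + 0.5 * (diff ** 2).sum()
    return loss
```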
Specifically, the spatially guided feature representations are obtained by formula (4); the correlations between the spatially guided feature maps are then computed with a Gram matrix as the style representation of the spatial guidance channel region, the spatially guided Gram matrix being defined as:

$$G_l^r(x) = F_l^r(x)\, F_l^r(x)^{\mathsf{T}} \tag{6}$$

Define the generated-image initialization image $x$ and the woodcut stylized image $x_s$; $G_l^r(x)$ and $G_l^r(x_s)$ denote the spatially guided style representations of the generated image and the woodcut art style image at layer $l$ of the network. The difference between the generated image and the woodcut art style image is defined with the mean squared error, and the style loss function of layer $l$ is defined as:

$$E_l = \sum_{r=1}^{R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x) - G_l^r(x_s) \right)_{ij}^2 \tag{7}$$

The style loss function over all layers is then:

$$L_s = \sum_l w_l E_l \tag{8}$$

where $w_l$ denotes the weighting factor of each layer's feature representation in VGG-19.
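A sketch of formulas (6) to (8): the guided Gram matrix of each layer, the per-layer loss E_l with its 1/(4 N_l^2 M_l^2) normalization, and the w_l-weighted sum over layers.

```python
# Sketch of formulas (6)-(8): spatially guided Gram matrices and the
# layer-weighted style loss.
import torch

def gram(feat):                              # formula (6): G = F F^T
    return feat @ feat.t()

def style_loss(guided_gen, guided_style, layer_weights):
    """guided_*: dict r -> dict l -> (N_l x M_l guided feature matrix)."""
    loss = torch.zeros(())
    for r in guided_gen:
        for l in guided_gen[r]:
            f_g, f_s = guided_gen[r][l], guided_style[r][l]
            n_l, m_l = f_g.shape
            e_l = ((gram(f_g) - gram(f_s)) ** 2).sum() / (4 * n_l**2 * m_l**2)
            loss = loss + layer_weights[l] * e_l    # formula (8): sum_l w_l E_l
    return loss
```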
Step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the initialized generated woodcut image by gradient descent, set the number of iterations, stop when it is reached, and finally obtain a conversion result with the woodcut style.

The content loss function $L_c$ and the style loss function $L_s$ are weighted and combined to define the total loss function:

$$L_{total} = \alpha L_c + \beta L_s \tag{9}$$

where $L_c$ denotes the loss function between the content image and the generated image, $L_s$ denotes the loss function between the print style image and the generated image, and $\alpha$ and $\beta$ denote the weights of the content loss function and the style loss function, respectively. Different $\alpha/\beta$ ratios are selected to control the degree of woodcut stylization, and the generated image is obtained by gradient descent.
In this method, conv4_2 of the VGG-19 network is selected as the content feature extraction layer, and the five layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 are selected as the style feature extraction layers; the original content image is selected as the initialization image of the generated image, which maintains the semantic structure of the image well while enhancing the woodcut carving-mark texture effect and reducing the number of iterations. A condensed end-to-end sketch under stated assumptions follows.
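Tying the pieces together, the sketch below condenses the whole guided optimization into one loop. Assumptions not fixed by the patent: Adam as the concrete gradient descent method, uniform layer weights w_l, the learning rate, the iteration counts, and the alpha/beta setting; `vgg` and `VGG_LAYERS` come from the feature-extraction sketch above, and inputs are assumed to be VGG-normalized tensors.

```python
# Sketch: the complete mask-guided optimization loop. Assumptions: Adam as
# the gradient descent optimizer, uniform w_l, illustrative alpha/beta and
# iteration counts; `vgg` and VGG_LAYERS are from the sketch above, and
# content/style images are normalized 1x3xHxW, masks binary 1x1xHxW tensors.
import torch
import torch.nn.functional as F

CONTENT_LAYERS = ["conv4_2"]
STYLE_LAYERS = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]

def guided_feats(img, mask):
    """Run VGG-19, masking each chosen layer per formula (4)."""
    out, x = {}, img
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in VGG_LAYERS:
            t = F.interpolate(mask, size=x.shape[-2:], mode="nearest")
            out[VGG_LAYERS[idx]] = (x * t).reshape(x.shape[1], -1)
    return out

def gram(f):
    return f @ f.t()

def transfer(content, style, content_masks, style_masks,
             alpha=1.0, beta=1e4, iters_per_mask=300):
    gen = content.clone().requires_grad_(True)   # content image initializes x
    opt = torch.optim.Adam([gen], lr=0.02)
    w_l = 1.0 / len(STYLE_LAYERS)                # uniform layer weights
    for c_mask, s_mask in zip(content_masks, style_masks):  # region by region
        with torch.no_grad():
            f_c = guided_feats(content, c_mask)  # guided content targets
            f_s = guided_feats(style, s_mask)    # guided style targets
        for _ in range(iters_per_mask):
            opt.zero_grad()
            f_g = guided_feats(gen, c_mask)
            l_content = sum(0.5 * ((f_g[l] - f_c[l]) ** 2).sum()
                            for l in CONTENT_LAYERS)         # formula (5)
            l_style = torch.zeros(())
            for l in STYLE_LAYERS:                           # formulas (6)-(8)
                n_l, m_l = f_g[l].shape
                e_l = ((gram(f_g[l]) - gram(f_s[l])) ** 2).sum() \
                      / (4 * n_l ** 2 * m_l ** 2)
                l_style = l_style + w_l * e_l
            (alpha * l_content + beta * l_style).backward()  # formula (9)
            opt.step()
    return gen.detach()
```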
Part IV: Comparison of style conversion results
The invention is applied to woodcut style conversion on different types of images, namely portrait and scenery pictures, in both black-and-white and color, and the results are compared with the style conversion results of Gatys [1], Johnson [4], and Li [6]; the experimental results are shown in FIGS. 6 to 8.
In FIG. 6, the style conversion results for the black-and-white woodcut in row 1 show that the stylized results of Gatys [1], Johnson [4], and Li [6] all exhibit disorderly distributed woodcut carving-mark textures. In the color portrait style conversion results of row 2, the stylized results of Gatys [1] and Johnson [4] show indistinct carving-mark texture features and disordered texture distribution, while the conversion result of Li [6] shows uneven color distribution in the face region compared with the original print style image. In the generated results of both black-and-white and color portrait style conversion, the invention shows more distinct carving-mark texture features, with reasonable distribution of texture and color.
As can be seen from the scene image style conversion results in FIG. 7, the conversion result of Gatys [1] is distorted and damages the semantic information to some extent; the result of Johnson [4] can fail to migrate the style in relatively smooth regions, for example the sky region of the stylized result in FIG. 7 does not carry the style features of the same semantics in the print style image, and the woodcut carving-mark texture features of the other semantic regions are not well represented; the conversion result of Li [6] preserves semantic structure information well, but its woodcut carving-mark texture features are not prominent. The stylized result of the invention is superior to the other methods both in preserving semantic structure and in representing the woodcut carving-mark texture features.
The same region (the white square region in FIG. 7) is selected in the conversion results of FIG. 7 to compare local texture details, see FIG. 8. The comparison shows that the conversion result of the invention has more prominent woodcut carving-mark texture features, and is realistic and natural, close to the carving-mark effect of a real color register woodcut print.
In addition to the comparison of the above experimental results, a user evaluation of visual quality was performed on the print style conversion results. Participants viewed the content image and the print art style image, then viewed the generated images stylized by the four methods in random order, and, taking the original real print art style image as the standard, scored each stylized generated image in three aspects: overall visual quality, woodcut carving-mark texture quality, and rationality of the texture distribution. Scores of 1 to 5 correspond to five grades from very poor to very good. The average score of each method was then computed from the scores given by the participants. Twenty people working in image processing and twenty non-professionals were invited to take part in the experiment and scoring. FIG. 9 shows the average score statistics of the ratings given by the participants.
From the experimental scoring results in FIG. 9, the average scores of the invention in all three aspects are higher than those of the other three methods, showing that the woodcut style conversion results of the invention are superior to the other methods in overall visual quality, woodcut carving-mark texture quality, and rationality of the texture distribution.
The color register woodcut style conversion method of the invention preserves the semantic structure of the content image well and convincingly simulates the carving-mark texture characteristics of color register woodcut prints; the carving-mark textures are distributed uniformly and reasonably, and the conversion result is realistic and natural, closer to a real woodcut print.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (5)

1. A method for color register woodcut style conversion based on semantic segmentation, characterized by comprising the following steps:
step one: perform semantic segmentation on the content image and the print art style image respectively to obtain semantic segmentation result maps;
the content image is semantically segmented with a CRF-RNN network to obtain its semantic segmentation result map; the print art style image is segmented with the semantic annotation tool Labelme to obtain its semantic segmentation result map;
step two: binarize the two semantic segmentation result maps respectively, obtaining two complementary content image masks for the content image and two complementary style image masks for the print art style image;
step three: using the content image masks and the style image masks as guidance, add spatial guidance channels to perform region-wise style conversion on the content image and the print art style image, finally obtaining an artistic style conversion result with the woodcut print style.
2. The method for color register woodcut style conversion based on semantic segmentation according to claim 1, wherein in step one, performing semantic segmentation on the content image with a CRF-RNN network to obtain a semantic segmentation result map comprises:

step 1: take the label of each image pixel $X_i$ as a random variable and the relationships between pixels as edges, forming a conditional random field; let $X$ be the vector formed by the random variables $X_1, X_2, \ldots, X_N$, where $N$ is the number of pixels in the image; given the global observation $I$ (the image), $(I, X)$ can be modeled as a CRF, characterized by a Gibbs distribution of the form:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(x \mid I)\bigr)$$

where $E(x)$ is the energy when $x$ takes a certain value and $Z(I)$ is the partition function;

in the CRF model, the energy of assigning labels $x$ is computed by the following energy function:

$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where $\psi_u(x_i)$ is the unary energy component, measuring the probability of assigning label $x_i$ to pixel $i$, and $\psi_p(x_i, x_j)$ is the pairwise energy component describing the association between two adjacent pixels $i$ and $j$;

the unary energy component is computed by a CNN and gives only a rough prediction of the pixel label; the pairwise energy component provides an image-data-dependent smoothing term, expressed as a weighted sum of Gaussian kernels:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j)$$

where $\mu(x_i, x_j)$ is a label compatibility function used to capture the compatibility between different label pairs; for each $m = 1, 2, \ldots, M$, $k_G^{(m)}$ is a Gaussian kernel applied to feature vectors, $w^{(m)}$ is the weight of the $m$-th kernel, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$;
step 2: use the mean-field approximation of the CRF distribution for maximum a posteriori marginal inference; this approximates the CRF distribution $P(X)$ with a simpler distribution $Q(X)$ that can be written as the product of independent marginal distributions:

$$Q(X) = \prod_i Q_i(X_i)$$

where $Q(X)$ denotes the mean-field approximation of the CRF and $X_i$ denotes a pixel in the image;

step 3: model the single mean-field iteration obtained in step 2 as one forward pass of a stack of CNN layers, and iterate the mean field until the set number of iterations is completed, which amounts to treating mean-field CRF inference as an RNN; this model is called CRF-RNN;

step 4: combine the CRF-RNN model with an FCN to form an end-to-end network;

step 5: train the network on the PASCAL Context semantic segmentation dataset; after training, input the content image into the end-to-end FCN + CRF-RNN network, finally obtaining the semantic segmentation result map of the content image.
3. The method for color register woodcut style conversion based on semantic segmentation according to claim 1, wherein step three comprises:

step 1: use the content image as the initialization image of the generated image; the content image and the generated image yield corresponding feature maps at each convolutional layer of the network, each layer's feature maps are stored as a two-dimensional matrix to obtain that layer's feature representation, and the feature representations extracted at higher layers of the VGG19 network are used as the content representation;

step 2: the woodcut style image and the generated image likewise yield corresponding feature representations at each convolutional layer of the network; the correlations among the channel feature maps of each layer are computed with a Gram matrix and used as the style representation;

step 3: the mask images are re-encoded after being input to the style conversion network, and at each layer the network generates a guidance channel $T_l^r$ from the mask image; this amounts to adding weight information from the guidance channel $T_l^r$ to the feature maps, so that the activation values of the region covered by the guidance channel are increased under the action of the weights and the image optimization process proceeds only within the corresponding spatial guidance channel region;

step 4: after the content representations of the content image and the generated image are computed by the style conversion network in step 1, multiply the feature maps in the content representations element-wise with the corresponding spatial guidance channels generated in step 3 to obtain spatially guided content representations, and define the content loss with the Euclidean distance;

after the feature representations of the woodcut style image and the generated image are computed by the style conversion network in step 2, multiply each feature map element-wise with the corresponding spatial guidance channel generated in step 3 to obtain spatially guided feature representations; then compute the correlations between the spatially guided feature maps with a Gram matrix to obtain the spatially guided Gram matrix as the spatially guided style representation, and define the style loss with the Euclidean distance;

step 5: weight and combine the content loss and the style loss to obtain the total loss function, optimize the initialized generated woodcut image by gradient descent, set the number of iterations, stop when it is reached, and finally obtain a conversion result with the woodcut style.
4. The method for color register woodcut style conversion based on semantic segmentation according to claim 3, wherein the content loss function in step 4 is:

$$L_c = \sum_{r=1}^{R} \sum_l \frac{1}{2} \sum_{i,j} \left( F_l^r(x) - F_l^r(x_c) \right)_{ij}^2$$

where $x$ denotes the generated-image initialization image and $x_c$ denotes the content image; $F_l^r(x)$ and $F_l^r(x_c)$ denote the spatially guided content representations of the initialization image and the content image at layer $l$ of the network, $r \in R$;

the style loss function is:

$$L_s = \sum_l w_l E_l, \qquad E_l = \sum_{r=1}^{R} \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_l^r(x) - G_l^r(x_s) \right)_{ij}^2$$

where $w_l$ denotes the weighting factor of each layer's feature representation in VGG-19, and $G_l^r(x)$ and $G_l^r(x_s)$ denote the spatially guided style representations of the generated image and the woodcut art style image at layer $l$ of the network.
5. The method for color register woodcut style conversion based on semantic segmentation according to claim 3, wherein the total loss function is:

$$L_{total} = \alpha L_c + \beta L_s$$

where $L_c$ denotes the loss function between the content image and the generated image, $L_s$ denotes the loss function between the print style image and the generated image, and $\alpha$ and $\beta$ denote the weights of the content loss function and the style loss function, respectively.
CN202010091956.2A 2020-02-14 2020-02-14 Color register woodcut style conversion algorithm based on semantic segmentation Active (granted as CN111340720B)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B Color register woodcut style conversion algorithm based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010091956.2A CN111340720B Color register woodcut style conversion algorithm based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN111340720A 2020-06-26
CN111340720B 2023-05-19

Family

ID=71186865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010091956.2A Active CN111340720B Color register woodcut style conversion algorithm based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN111340720B


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
US20160253090A1 * 2013-11-19 2016-09-01 Wacom Co., Ltd. Method and system for ink data generation, ink data rendering, ink data manipulation and ink data communication
CN108470320A * 2018-02-24 2018-08-31 中山大学 Image stylization method and system based on CNN
CN108805803A * 2018-06-13 2018-11-13 衡阳师范学院 Portrait style transfer method based on semantic segmentation and deep convolutional neural networks
CN108898082A * 2018-06-19 2018-11-27 Oppo广东移动通信有限公司 Image processing method, picture processing unit and terminal device
CN109697690A * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image style transfer method and system
CN109712068A * 2018-12-21 2019-05-03 云南大学 Image style transfer and simulation method for gourd pyrography
CN110503716A * 2019-08-12 2019-11-26 中国科学技术大学 Method for generating synthetic license plate image data


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
郑锐 (Zheng Rui) et al.: "Digital synthesis of embroidery style based on convolutional neural networks" (基于卷积神经网络的刺绣风格数字合成), Journal of Zhejiang University (Science Edition) (浙江大学学报(理学版)), vol. 46, no. 3, 15 May 2019 (2019-05-15), pages 270-278 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI762971B (en) * 2020-07-15 2022-05-01 宏碁股份有限公司 Method and computer program product for image style transfer
CN112288622A * 2020-10-29 2021-01-29 中山大学 Camouflage image generation method based on multi-scale generative adversarial networks
CN112967180A * 2021-03-17 2021-06-15 福建库克智能科技有限公司 Training method for a generative adversarial network, and image style conversion method and device
CN112967180B * 2021-03-17 2023-12-22 福建库克智能科技有限公司 Training method for a generative adversarial network, and image style conversion method and device

Also Published As

Publication number Publication date
CN111340720B 2023-05-19

Similar Documents

Publication Publication Date Title
Jiang et al. Scfont: Structure-guided chinese font generation via deep stacked networks
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
CN111340720B (en) Color matching woodcut style conversion algorithm based on semantic segmentation
CN106548208B A quick, intelligent stylization method for photographic images
CN110322416B (en) Image data processing method, apparatus and computer readable storage medium
CN111310760B Method for detecting oracle bone inscription characters by combining local prior features and deep convolutional features
Kumar et al. A comprehensive survey on non-photorealistic rendering and benchmark developments for image abstraction and stylization
Thasarathan et al. Automatic temporally coherent video colorization
CN112287941B (en) License plate recognition method based on automatic character region perception
CN111915522A (en) Image restoration method based on attention mechanism
CN111768335B (en) CNN-based user interactive image local clothing style migration method
CN111986125A (en) Method for multi-target task instance segmentation
CN113705579B (en) Automatic image labeling method driven by visual saliency
Rother et al. Interactive foreground extraction: Using graph cut
Lu et al. Sketch simplification based on conditional random field and least squares generative adversarial networks
CN117934688A Neural representation modeling method based on Gaussian splatting samples
Zhu et al. Sand painting conversion based on detail preservation
Zhang et al. CBA-GAN: Cartoonization style transformation based on the convolutional attention module
Fang et al. Stylized-colorization for line arts
CN113012079B (en) Low-brightness vehicle bottom image enhancement method and device and storage medium
Tomar et al. An Effective Cartoonifying of an Image using Machine Learning
CN112329803B (en) Natural scene character recognition method based on standard font generation
Satchidanandam et al. Enhancing Style Transfer with GANs: Perceptual Loss and Semantic Segmentation
CN113112397A (en) Image style migration method based on style and content decoupling
Parihar et al. UndarkGAN: Low-light Image Enhancement with Cycle-consistent Adversarial Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant