CN113160343A - Deep neural network-based simple stroke drawing process generation algorithm - Google Patents

Deep neural network-based simple stroke drawing process generation algorithm

Info

Publication number
CN113160343A
CN113160343A
Authority
CN
China
Prior art keywords
sequence
output
decoder
neural network
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638190.1A
Other languages
Chinese (zh)
Inventor
宋桂岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yuanu Information Technology Co ltd
Original Assignee
Jiangsu Yuanu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Yuanu Information Technology Co ltd filed Critical Jiangsu Yuanu Information Technology Co ltd
Priority to CN202011638190.1A
Publication of CN113160343A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/001 - Texturing; Colouring; Generation of texture or colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention designs and realizes a drawing process generation method guided by simple strokes. A variational autoencoder (VAE), one of the two major generative models currently in use, can generate data; a recurrent neural network (RNN) can learn time-based sequence characteristics; and an attention mechanism can retain important information. Therefore, an algorithm is designed based on the variational autoencoder, the recurrent neural network, and the attention mechanism, and this algorithm can generate the drawing process sequence of simple strokes more accurately. Through the attention mechanism and the prior knowledge of the encoder's hidden variable state, the algorithm can effectively capture the relations within the simple-stroke drawing process and can generate a simple-stroke sequence for which it is difficult for a person to tell whether it was generated by the machine or drawn by a human.

Description

Deep neural network-based simple stroke drawing process generation algorithm
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and also to the technical field of graphic images. In particular, it relates to a variational autoencoder, a recurrent neural network, an attention mechanism, and mixed density modeling.
Background
Research in the field of artificial intelligence has reached into many aspects of human life, such as playing chess, delivering meals, and selling goods. Art painting is also a research topic in the field of artificial intelligence, yet there is still no good solution for intelligently teaching students to paint, or for evaluating and correcting errors during a student's painting process. Although some artificial intelligence techniques have been applied to art, there is no detailed study based on the drawing process itself. Humans learn painting through the drawing process: teachers teach step by step, following the drawing process, rather than processing the whole picture at once the way current artificial intelligence does in the art field (for example, style transfer and image generation, whose outputs are whole-picture results). The research and application of generating painting process sequences with artificial intelligence techniques therefore remains an open problem.
At present, applications of neural networks to the painting art field fall mainly into two categories. The first category is convolutional neural networks based on computer vision, which mainly extract features from the image of the drawing result and then apply them to tasks such as transformation, detection, and search. Typical of this category is sketch-to-image technology, which converts a sketch into a natural scene image, such as Deep Sketch Hashing. This technology realizes the conversion from strokes to natural scene images by training two neural networks, one learning the features of the strokes and the other learning the features of real scene images, and then sharing some parameters between the two networks.
The second category is generative models, such as the VAE and GAN. The variational autoencoder (VAE) has an encoder and a decoder: the encoder encodes existing data, reducing its important features into a hidden variable Z, and the decoder obtains a new image by decoding Z. The GAN has a generator network and a discriminator network. The generator produces images, and the discriminator judges the difference between generated images and real images. Training proceeds as a continuing "game" between the two networks; if the images produced by the final generator can fool the discriminator, the generator can be used to generate images. GAN research in image generation imports image data sets directly into the network, for example to generate the MNIST data set.
However, the above methods still have disadvantages:
(1) The data processed by the above methods are all in image format. An image-format data set reflects only the drawing result, not the drawing process, so the relationship between strokes during drawing cannot be learned.
(2) Generative models such as GAN produce discontinuities in the generated image, for example sometimes generating images with three cats.
(3) At present, images are mostly processed with convolutional neural networks; recurrent neural networks are rarely used to learn the process sequence of a painting. That is, current research ignores the sequential relevance of the drawing process, which is the defining characteristic of drawing.
(4) Generative models such as the VAE and GAN are mostly used to generate and process pictures of natural scenes; a great gap still exists in the field of simple-stroke drawing.
Disclosure of Invention
The invention provides a method for generating a drawing process guided by simple strokes. The method trains on process sequences of simple-stroke drawing and can learn the relevance within the drawing process; most importantly, an attention mechanism is added so that the network model learns the important parts of the drawing process sequence, making both the drawing process and the drawing result more accurate. Because the method learns the drawing process rather than the final result, the machine can learn the human drawing process, eventually even surpass the human drawing level, and carry out drawing teaching independently.
The invention designs and realizes a drawing process generation method guided by simple strokes. A variational autoencoder (VAE), one of the two major generative models currently in use, can generate data; a recurrent neural network (RNN) can learn time-based sequence characteristics; and an attention mechanism can retain important information. Therefore, an algorithm is designed based on the variational autoencoder, the recurrent neural network, and the attention mechanism, and this algorithm can generate the drawing process sequence of simple strokes more accurately. The algorithm follows the framework of a variational autoencoder and comprises three parts: the first is the encoder network of the VAE, implemented as a multi-layer bidirectional recurrent neural network (LSTM); the second is the attention mechanism; the third is the decoder modeling network. The key idea is that the hidden states of the encoder's multi-layer recurrent network serve as prior information for the decoder's attention mechanism: the hidden state h obtained by the decoder at a given moment is compared with all hidden states of the encoder to obtain the relation between the encoder's input sequence and the decoder's generated sequence, and this relation participates in decoding as attention prior information. At the same time, the output state of the last layer of the encoder network serves both as the initial state of the decoder and as the decoder input at every moment, improving the decoder's prior information at each step. Finally, based on the generated sequence output by the attention-based decoder network, the final simple-stroke drawing process sequence is produced through mixture density network modeling. Experiments show that, through the attention mechanism and the prior knowledge of the encoder's hidden variable state, the algorithm can effectively capture the relations within the simple-stroke drawing process and can generate a simple-stroke sequence for which it is difficult for a person to tell whether it was generated by the machine or drawn by a human.
The invention is based on the structure of the existing variational autoencoder-decoder (VAE) and comprises two branch networks: one is an encoder network using a bidirectional recurrent neural network, used to learn the drawing process sequence features of each simple-stroke type; the other is the decoder part, which combines an attention mechanism with a recurrent neural network (LSTM) and is followed by an MDN mixture density network that models and generates the drawing process sequence.
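To make the two-branch structure concrete, the following is a minimal PyTorch sketch of the encoder-latent-decoder skeleton. It is an illustration under stated assumptions, not the patent's reference implementation: the class name SketchVAE, the sizes enc_rnn_size=256, dec_rnn_size=512, z_size=128, and the choice to derive only h0 from Z (with c0 zeroed) are all assumptions.

```python
import torch
import torch.nn as nn

class SketchVAE(nn.Module):
    def __init__(self, enc_rnn_size=256, dec_rnn_size=512, z_size=128):
        super().__init__()
        # Encoder branch: bidirectional LSTM over (dx, dy, p) triples.
        self.encoder = nn.LSTM(3, enc_rnn_size, batch_first=True,
                               bidirectional=True)
        # final_state (size 2 * enc_rnn_size) -> mean and log-variance of Z.
        self.to_mu = nn.Linear(2 * enc_rnn_size, z_size)
        self.to_logvar = nn.Linear(2 * enc_rnn_size, z_size)
        # Z -> initial hidden state h0 of the decoder (the patent derives a
        # state of size dec_rnn_size * 2; here only h0 is derived, c0 is zero).
        self.to_h0 = nn.Linear(z_size, dec_rnn_size)
        # Decoder branch: input is the previous point concatenated with Z.
        self.decoder = nn.LSTM(3 + z_size, dec_rnn_size, batch_first=True)

    def forward(self, seq):                     # seq: (batch, timesteps, 3)
        out, (h_n, _) = self.encoder(seq)       # out: (batch, T, 2*enc_rnn_size)
        final_state = torch.cat([h_n[0], h_n[1]], dim=-1)
        mu, logvar = self.to_mu(final_state), self.to_logvar(final_state)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        h0 = torch.tanh(self.to_h0(z)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        # Feed Z to the decoder at every time step, as in step (5) below.
        z_tiled = z.unsqueeze(1).expand(-1, seq.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([seq, z_tiled], dim=-1), (h0, c0))
        return dec_out, out, mu, logvar        # 'out' feeds the attention step
```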
The method for generating the simplified stroke drawing process based on the deep neural network specifically comprises the following steps:
(1) Sample the simple-stroke images in svg format in the data set and process them into a sequence (Δx, Δy, p), where (Δx, Δy) represents the offset between the sampled point and the x and y coordinates of the previous point, and p represents whether a stroke ends (an illustrative preprocessing sketch appears after these steps);
(2) Input the simple-stroke process sequence into the encoder part, which is based on a bidirectional recurrent network, once in positive order and once in negative order. The encoder part extracts the features of the simple stroke; the output of the recurrent neural network at every moment of the drawing process sequence has size (batch_size, timesteps, rnn_size × 2), and the output state final_state of the recurrent unit at the last moment has size (batch_size, rnn_size × 2), the factor of 2 arising because the encoder is a bidirectional recurrent network. Here, batch_size is the number of simple-stroke process sequences input at a time during training, timesteps is the length of the process sequence of one simple stroke, and rnn_size is the number of LSTM neurons in the recurrent network;
(3) Convert the final_state obtained by the encoder into a hidden variable Z of size (batch_size, 128), and then convert Z into the initial state h0 of the decoder, of size (batch_size, dec_rnn_size × 2);
(4) Apply the attention mechanism. Compute the similarity between the output of the decoder at the current moment and the outputs of the encoder's recurrent network at all moments, convert the scores into corresponding weights, and take the weighted sum over the encoder outputs at all moments to obtain the attention value C_t. The attention mechanism is computed according to the following formulas:
score(\bar{h}_t, h_i) = V^T \tanh(W_1 h_i + W_2 \bar{h}_t)

\alpha_{t,i} = \frac{\exp(\mathrm{score}(\bar{h}_t, h_i))}{\sum_{j=1}^{T} \exp(\mathrm{score}(\bar{h}_t, h_j))}

C_t = \sum_{i=1}^{T} \alpha_{t,i} h_i

where h_i denotes the output of the encoder at moment i, \bar{h}_t denotes the output of the decoder at moment t, and V^T, W_1, W_2 are trainable parameters;
(5) The hidden variable Z of the encoder network and the attention result C_t are also trained as inputs to the decoder. The hidden variable Z serves as the decoder input at every moment;
(6) The decoder network outputs decoder_output of size (batch_size, timesteps, dec_rnn_size), which in the present invention is (100, 250, 512). The output of the decoder network is then mapped to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, and input into the mixture density network to model (Δx, Δy, p); this is two-dimensional Gaussian mixture density modeling, and the final output is a sequence of the simple-stroke drawing process. The modeling follows equations (17)-(23) in reference [2]. Compared with the original data, the modeled results conform to the original distribution.
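A hedged sketch of the preprocessing referenced in step (1) follows. The input format (a list of strokes, each an array of absolute (x, y) points sampled along the svg path) and the helper name strokes_to_sequence are assumptions for illustration.

```python
import numpy as np

def strokes_to_sequence(strokes):
    """strokes: list of np.ndarray of shape (n_i, 2) with absolute coords."""
    points, prev = [], np.zeros(2)
    for stroke in strokes:
        for i, pt in enumerate(stroke):
            dx, dy = pt - prev                        # offset from last point
            p = 1.0 if i == len(stroke) - 1 else 0.0  # pen lifted at stroke end
            points.append((dx, dy, p))
            prev = pt
    return np.asarray(points, dtype=np.float32)       # shape (timesteps, 3)
```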
The invention has the beneficial effects that: the method for generating the simple stroke drawing process based on the deep neural network can effectively capture the relation between the simple stroke drawing process sequences and effectively generate the drawing process sequences of the simple strokes.
Drawings
FIG. 1 is the overall block diagram of the algorithm of the present invention, namely the block diagram of the attention-sketch-rnn simple-stroke drawing process generation algorithm;
FIG. 2 is the attention model used in the algorithm;
FIG. 3 is a diagram of a generated simple-stroke drawing sequence, namely a flower drawing sequence;
FIG. 4 is a graph comparing the (Δx, Δy) coordinate distributions of the original drawing and the generated simple-stroke drawing, in which dots represent the simple stroke generated by the algorithm and × represents the original drawing.
Detailed Description
The specific implementation mode of the invention is as follows:
Based on the structure of the existing variational autoencoder-decoder (VAE), the method comprises two branch networks: one is an encoder network using a bidirectional recurrent neural network, used to learn the drawing process sequence features of each simple-stroke type; the other is the decoder part, which combines an attention mechanism with a recurrent neural network (LSTM) and is followed by an MDN mixture density network that models and generates the drawing process sequence.
(1) Sample the simple-stroke images in svg format in the data set and process them into a sequence (Δx, Δy, p), where (Δx, Δy) represents the offset between the sampled point and the x and y coordinates of the previous point, and p represents whether a stroke ends.
(2) Input the simple-stroke process sequence into the encoder part, which is based on a bidirectional recurrent network, once in positive order and once in negative order. The encoder part extracts the features of the simple stroke; the output of the recurrent neural network at every moment of the drawing process sequence has size (batch_size, timesteps, rnn_size × 2), and the output state final_state of the recurrent unit at the last moment has size (batch_size, rnn_size × 2), the factor of 2 arising because the encoder is a bidirectional recurrent network. Here, batch_size is the number of simple-stroke process sequences input at a time during training, timesteps is the length of the process sequence of one simple stroke, and rnn_size is the number of LSTM neurons in the recurrent network.
(3) Convert the final_state obtained by the encoder into a hidden variable Z of size (batch_size, 128), and convert Z into the initial state h0 of the decoder, of size (batch_size, dec_rnn_size × 2).
(4) Apply the attention mechanism. Compute the similarity between the output of the decoder at the current moment and the outputs of the encoder's recurrent network at all moments, convert the scores into corresponding weights, and take the weighted sum over the encoder outputs at all moments to obtain the attention value C_t (an illustrative sketch of this computation appears after these steps). The attention mechanism is computed according to the following formulas:
score(\bar{h}_t, h_i) = V^T \tanh(W_1 h_i + W_2 \bar{h}_t)

\alpha_{t,i} = \frac{\exp(\mathrm{score}(\bar{h}_t, h_i))}{\sum_{j=1}^{T} \exp(\mathrm{score}(\bar{h}_t, h_j))}

C_t = \sum_{i=1}^{T} \alpha_{t,i} h_i

where h_i denotes the output of the encoder at moment i, \bar{h}_t denotes the output of the decoder at moment t, and V^T, W_1, W_2 are trainable parameters.
(5) The hidden variable Z of the encoder network and the attention result C_t are also trained as inputs to the decoder. The hidden variable Z serves as the decoder input at every moment.
(6) The decoder network outputs decoder_output of size (batch_size, timesteps, dec_rnn_size), which in the present invention is (100, 250, 512). The output of the decoder network is then mapped to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, and input into the mixture density network to model (Δx, Δy, p); this is two-dimensional Gaussian mixture density modeling, and the final output is a sequence of the simple-stroke drawing process. The modeling follows equations (17)-(23) in reference [2]. Compared with the original data, the modeled results conform to the original distribution; see fig. 4, where red indicates the simple stroke generated by the algorithm and blue indicates the original drawing.
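As referenced in step (4), here is a minimal sketch of the attention computation matching the formulas above. The module name Attention and attn_size=128 are assumptions; the exact score function used by the invention may differ in detail.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, enc_size, dec_size, attn_size=128):
        super().__init__()
        self.W1 = nn.Linear(enc_size, attn_size, bias=False)
        self.W2 = nn.Linear(dec_size, attn_size, bias=False)
        self.V = nn.Linear(attn_size, 1, bias=False)

    def forward(self, enc_outputs, dec_state):
        # enc_outputs: (batch, T, enc_size); dec_state: (batch, dec_size)
        scores = self.V(torch.tanh(self.W1(enc_outputs)
                                   + self.W2(dec_state).unsqueeze(1)))  # (B,T,1)
        alpha = torch.softmax(scores, dim=1)       # weights over encoder time
        c_t = (alpha * enc_outputs).sum(dim=1)     # attention value C_t: (B, enc_size)
        return c_t, alpha.squeeze(-1)
```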
The generative model used in the method is the variational autoencoder (VAE); the recurrent neural network is the long short-term memory network (LSTM); and the attention mechanism is based on the attention mechanism of reference [1]. Together they learn the simple-stroke drawing process, advancing the technical field of drawing process sequence generation. The method can also be applied to unsupervised sequence generation in the arts and in natural language processing. The algorithm extracts prior knowledge of the simple-stroke drawing process while accounting for the temporal character of the process sequence and for the influence of different moments on the sequence generated at the next moment, so the sequences generated by the trained model better match the real simple-stroke drawing process.
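For concreteness, a hedged sketch follows of how the 123-dimensional projection of step (6) could split into bivariate Gaussian mixture parameters in the manner of reference [2], assuming M = 20 mixture components (6 × 20 + 3 = 123); the split order and the three pen-state logits are assumptions for illustration.

```python
import torch

def split_mdn_params(y, num_mixtures=20):
    """y: (batch, 123) decoder projection -> GMM parameters for (dx, dy, p)."""
    pi, mu_x, mu_y, sigma_x, sigma_y, rho, pen = torch.split(
        y, [num_mixtures] * 6 + [3], dim=-1)
    pi = torch.softmax(pi, dim=-1)                  # mixture weights sum to 1
    sigma_x, sigma_y = torch.exp(sigma_x), torch.exp(sigma_y)  # positive stds
    rho = torch.tanh(rho)                           # correlation in (-1, 1)
    pen = torch.softmax(pen, dim=-1)                # pen-state probabilities
    return pi, mu_x, mu_y, sigma_x, sigma_y, rho, pen
```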
The overall block diagram of the algorithm of the invention is shown in fig. 1, and the attention mechanism model is shown in fig. 2. The generation algorithm of the invention can effectively capture the relations between simple-stroke drawing process sequences and effectively generate the drawing process sequences of simple strokes. Fig. 3 shows a generated simple-stroke drawing process sequence.
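Finally, an illustrative autoregressive generation loop tying the sketches above together: decode one step, split the projection into mixture parameters, sample the next (Δx, Δy, p) point, and feed it back. It assumes the hedged helpers defined earlier (an LSTM decoder taking [point, Z] as input, a linear projection to 123 dimensions, and split_mdn_params) and, for brevity, ignores the correlation ρ when sampling.

```python
import torch

@torch.no_grad()
def sample_sequence(decoder, project, z, max_steps=250):
    point = torch.zeros(1, 1, 3)                    # start token (0, 0, 0)
    state, sequence = None, []
    for _ in range(max_steps):
        inp = torch.cat([point, z.view(1, 1, -1)], dim=-1)
        out, state = decoder(inp, state)            # one decoding step
        pi, mx, my, sx, sy, rho, pen = split_mdn_params(project(out[:, -1]))
        k = torch.multinomial(pi[0], 1)             # choose a mixture component
        dx = mx[0, k] + sx[0, k] * torch.randn(1)   # sample dx and dy from the
        dy = my[0, k] + sy[0, k] * torch.randn(1)   # chosen Gaussian (rho dropped)
        p = pen[0].argmax().float()                 # most likely pen state
        point = torch.tensor([[[dx.item(), dy.item(), p.item()]]])
        sequence.append((dx.item(), dy.item(), p.item()))
        if p.item() == 2:                           # assumed end-of-drawing state
            break
    return sequence
```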
References
[1] Luong, M.-T., Pham, H., and Manning, C. D. Effective approaches to attention-based neural machine translation. In Conference on Empirical Methods in Natural Language Processing (2015).
[2] Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

Claims (8)

1. The deep neural network-based simple stroke drawing process generation algorithm, characterized by comprising the following steps:
(1) sampling the simple-stroke images in svg format in the data set and processing them into a sequence (Δx, Δy, p);
(2) inputting the simple-stroke process sequence into an encoder part based on a bidirectional recurrent network, once in positive order and once in negative order;
(3) converting the final_state obtained by the encoder twice, obtaining the hidden variable Z and the initial state h0 of the decoder;
(4) applying the attention mechanism to obtain the attention value C_t;
(5) training the hidden variable Z of the encoder network and the attention result C_t as inputs to the decoder;
(6) outputting from the decoder network, mapping the output of the decoder network to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, inputting it into the mixture density network to model (Δx, Δy, p), and finally outputting a sequence of the simple-stroke drawing process.
2. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: the sampling in step (1) produces a sequence (Δx, Δy, p), where (Δx, Δy) represents the offset between the sampled point and the x and y coordinates of the previous point, and p represents whether a stroke ends.
3. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: in step (2), the encoder part extracts the features of the simple stroke; the output of the recurrent neural network at every moment of the drawing process sequence has size (batch_size, timesteps, rnn_size × 2), and the output state final_state of the recurrent unit at the last moment has size (batch_size, rnn_size × 2), the factor of 2 arising because the encoder is a bidirectional recurrent network; batch_size is the number of simple-stroke process sequences input at a time during training, timesteps is the length of the process sequence of one simple stroke, and rnn_size is the number of LSTM neurons in the recurrent network.
4. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: the two conversions in step (3) are: first, converting final_state into the hidden variable Z of size (batch_size, 128), and then converting Z into the initial state h0 of the decoder, of size (batch_size, dec_rnn_size × 2).
5. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: the attention mechanism in step (4) is applied as follows: compute the similarity between the output of the decoder at the current moment and the outputs of the encoder's recurrent network at all moments, convert the scores into corresponding weights, and take the weighted sum over the encoder outputs at all moments to obtain the attention value C_t.
6. The deep neural network-based simple stroke drawing process generating algorithm of claim 5, wherein: the attention mechanism is calculated according to the following formula:
score(\bar{h}_t, h_i) = V^T \tanh(W_1 h_i + W_2 \bar{h}_t)

\alpha_{t,i} = \frac{\exp(\mathrm{score}(\bar{h}_t, h_i))}{\sum_{j=1}^{T} \exp(\mathrm{score}(\bar{h}_t, h_j))}

C_t = \sum_{i=1}^{T} \alpha_{t,i} h_i

where h_i denotes the output of the encoder at moment i, \bar{h}_t denotes the output of the decoder at moment t, and V^T, W_1, W_2 are trainable parameters.
7. The deep neural network-based simple stroke drawing process generating algorithm of claim 1, wherein: the hidden variable Z in step (5) is used as the input of the decoder at each moment.
8. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: in step (6), the decoder network outputs decoder_output of size (batch_size, timesteps, dec_rnn_size); the output of the decoder network is mapped to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, then input into the mixture density network to model (Δx, Δy, p); the modeling process is two-dimensional Gaussian mixture density modeling, and the final output is a sequence of the simple-stroke drawing process.
CN202011638190.1A 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm Pending CN113160343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638190.1A CN113160343A (en) 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011638190.1A CN113160343A (en) 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm

Publications (1)

Publication Number Publication Date
CN113160343A 2021-07-23

Family

ID=76878364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638190.1A Pending CN113160343A (en) 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm

Country Status (1)

Country Link
CN (1) CN113160343A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658291A (en) * 2021-08-17 2021-11-16 青岛鱼之乐教育科技有限公司 Automatic rendering method of simplified strokes



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination