CN113160343A - Deep neural network-based simple stroke drawing process generation algorithm - Google Patents

Deep neural network-based simple stroke drawing process generation algorithm

Info

Publication number
CN113160343A
CN113160343A
Authority
CN
China
Prior art keywords
sequence
output
decoder
neural network
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638190.1A
Other languages
Chinese (zh)
Inventor
宋桂岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yuanu Information Technology Co ltd
Original Assignee
Jiangsu Yuanu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Yuanu Information Technology Co ltd filed Critical Jiangsu Yuanu Information Technology Co ltd
Priority to CN202011638190.1A
Publication of CN113160343A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/001 - Texturing; Colouring; Generation of texture or colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention designs and realizes a drawing process generation method guided by simple strokes. A variational autoencoder (VAE), one of the two major generative models currently in use, can generate data; a recurrent neural network (RNN) can learn time-based sequence characteristics; and an attention mechanism can retain important information. Therefore, an algorithm is designed based on the variational autoencoder, the recurrent neural network, and the attention mechanism, and this algorithm can generate the drawing process sequence of simple strokes more accurately. Through the attention mechanism and the prior knowledge of the encoder's hidden variable state, the algorithm can effectively capture the relations within the simple-stroke drawing process and can generate a simple-stroke sequence for which it is difficult for a person to tell whether it was generated by the machine or drawn by a human.

Description

Deep neural network-based simple stroke drawing process generation algorithm
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and also to the technical field of graphic images. In particular, it relates to a variational autoencoder, a recurrent neural network, an attention mechanism, and mixed density modeling.
Background
Research in the field of artificial intelligence has reached into many aspects of human life, such as playing chess, delivering meals, and selling goods. Art painting is also a research topic in the field of artificial intelligence, yet there is still no good solution for intelligently teaching students to paint, or for evaluating and correcting errors during a student's painting process. Although some artificial intelligence techniques have been applied to art, there is no detailed study based on the drawing process itself. Humans learn painting through the drawing process: teachers teach step by step, following the drawing process, rather than processing the whole picture at once the way current artificial intelligence does in the art field (for example, style transfer and image generation, whose outputs are whole-picture results). The research and application of generating painting process sequences with artificial intelligence techniques therefore remains an open problem.
At present, applications of neural networks to the painting art field fall mainly into two categories. The first category is convolutional neural networks based on computer vision, which mainly extract features from the image of the drawing result and then apply them to tasks such as transformation, detection, and search. Typical of this category is sketch-to-image technology, which converts a sketch into a natural scene image, such as Deep Sketch Hashing. This technology realizes the conversion from strokes to natural scene images by training two neural networks, one learning the features of the strokes and the other learning the features of real scene images, and then sharing some parameters between the two networks.
The second category is generative models, such as the VAE and GAN. The variational autoencoder (VAE) has an encoder and a decoder: the encoder encodes existing data, reducing its important features into a hidden variable Z, and the decoder obtains a new image by decoding Z. The GAN has a generator network and a discriminator network. The generator produces images, and the discriminator judges the difference between generated images and real images. Training proceeds as a continuing "game" between the two networks; if the images produced by the final generator can fool the discriminator, the generator can be used to generate images. GAN research in image generation imports image data sets directly into the network, for example to generate the MNIST data set.
However, the above methods still have disadvantages:
(1) The data processed by the above methods are all in image format. An image-format data set reflects only the drawing result, not the drawing process, so the relationship between strokes during drawing cannot be learned.
(2) Generative models such as GAN produce discontinuities in the generated image, for example sometimes generating images with three cats.
(3) At present, images are mostly processed with convolutional neural networks; recurrent neural networks are rarely used to learn the process sequence of a painting. That is, current research ignores the sequential relevance of the drawing process, which is the defining characteristic of drawing.
(4) Generative models such as the VAE and GAN are mostly used to generate and process pictures of natural scenes; a great gap still exists in the field of simple-stroke drawing.
Disclosure of Invention
The invention provides a method for generating a drawing process guided by simple strokes. The method trains on process sequences of simple-stroke drawing and can learn the relevance within the drawing process; most importantly, an attention mechanism is added so that the network model learns the important parts of the drawing process sequence, making both the drawing process and the drawing result more accurate. Because the method learns the drawing process rather than the final result, the machine can learn the human drawing process, eventually even surpass the human drawing level, and carry out drawing teaching independently.
The invention designs and realizes a drawing process generation method guided by simple strokes. A variational autoencoder (VAE), one of the two major generative models currently in use, can generate data; a recurrent neural network (RNN) can learn time-based sequence characteristics; and an attention mechanism can retain important information. Therefore, an algorithm is designed based on the variational autoencoder, the recurrent neural network, and the attention mechanism, and this algorithm can generate the drawing process sequence of simple strokes more accurately. The algorithm follows the framework of a variational autoencoder and comprises three parts: the first is the encoder network of the VAE, implemented as a multi-layer bidirectional recurrent neural network (LSTM); the second is the attention mechanism; the third is the decoder modeling network. The key idea is that the hidden states of the encoder's multi-layer recurrent network serve as prior information for the decoder's attention mechanism: the hidden state h obtained by the decoder at a given moment is compared with all hidden states of the encoder to obtain the relation between the encoder's input sequence and the decoder's generated sequence, and this relation participates in decoding as attention prior information. At the same time, the output state of the last layer of the encoder network serves both as the initial state of the decoder and as the decoder input at every moment, improving the decoder's prior information at each step. Finally, based on the generated sequence output by the attention-based decoder network, the final simple-stroke drawing process sequence is produced through mixture density network modeling. Experiments show that, through the attention mechanism and the prior knowledge of the encoder's hidden variable state, the algorithm can effectively capture the relations within the simple-stroke drawing process and can generate a simple-stroke sequence for which it is difficult for a person to tell whether it was generated by the machine or drawn by a human.
The invention is based on the structure of the existing variational autoencoder-decoder (VAE) and comprises two branch networks: one is an encoder network using a bidirectional recurrent neural network, used to learn the drawing process sequence features of each simple-stroke type; the other is the decoder part, which combines an attention mechanism with a recurrent neural network (LSTM) and is followed by an MDN mixture density network that models and generates the drawing process sequence.
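To make the two-branch structure concrete, the following is a minimal PyTorch sketch of the encoder-latent-decoder skeleton. It is an illustration under stated assumptions, not the patent's reference implementation: the class name SketchVAE, the sizes enc_rnn_size=256, dec_rnn_size=512, z_size=128, and the choice to derive only h0 from Z (with c0 zeroed) are all assumptions.

```python
import torch
import torch.nn as nn

class SketchVAE(nn.Module):
    def __init__(self, enc_rnn_size=256, dec_rnn_size=512, z_size=128):
        super().__init__()
        # Encoder branch: bidirectional LSTM over (dx, dy, p) triples.
        self.encoder = nn.LSTM(3, enc_rnn_size, batch_first=True,
                               bidirectional=True)
        # final_state (size 2 * enc_rnn_size) -> mean and log-variance of Z.
        self.to_mu = nn.Linear(2 * enc_rnn_size, z_size)
        self.to_logvar = nn.Linear(2 * enc_rnn_size, z_size)
        # Z -> initial hidden state h0 of the decoder (the patent derives a
        # state of size dec_rnn_size * 2; here only h0 is derived, c0 is zero).
        self.to_h0 = nn.Linear(z_size, dec_rnn_size)
        # Decoder branch: input is the previous point concatenated with Z.
        self.decoder = nn.LSTM(3 + z_size, dec_rnn_size, batch_first=True)

    def forward(self, seq):                     # seq: (batch, timesteps, 3)
        out, (h_n, _) = self.encoder(seq)       # out: (batch, T, 2*enc_rnn_size)
        final_state = torch.cat([h_n[0], h_n[1]], dim=-1)
        mu, logvar = self.to_mu(final_state), self.to_logvar(final_state)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        h0 = torch.tanh(self.to_h0(z)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        # Feed Z to the decoder at every time step, as in step (5) below.
        z_tiled = z.unsqueeze(1).expand(-1, seq.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([seq, z_tiled], dim=-1), (h0, c0))
        return dec_out, out, mu, logvar        # 'out' feeds the attention step
```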
The method for generating the simplified stroke drawing process based on the deep neural network specifically comprises the following steps:
(1) Sample the simple-stroke images in svg format in the data set and process them into a sequence (Δx, Δy, p), where (Δx, Δy) represents the offset between the sampled point and the x and y coordinates of the previous point, and p represents whether a stroke ends (an illustrative preprocessing sketch appears after these steps);
(2) Input the simple-stroke process sequence into the encoder part, which is based on a bidirectional recurrent network, once in positive order and once in negative order. The encoder part extracts the features of the simple stroke; the output of the recurrent neural network at every moment of the drawing process sequence has size (batch_size, timesteps, rnn_size × 2), and the output state final_state of the recurrent unit at the last moment has size (batch_size, rnn_size × 2), the factor of 2 arising because the encoder is a bidirectional recurrent network. Here, batch_size is the number of simple-stroke process sequences input at a time during training, timesteps is the length of the process sequence of one simple stroke, and rnn_size is the number of LSTM neurons in the recurrent network;
(3) Convert the final_state obtained by the encoder into a hidden variable Z of size (batch_size, 128), and then convert Z into the initial state h0 of the decoder, of size (batch_size, dec_rnn_size × 2);
(4) Apply the attention mechanism. Compute the similarity between the output of the decoder at the current moment and the outputs of the encoder's recurrent network at all moments, convert the scores into corresponding weights, and take the weighted sum over the encoder outputs at all moments to obtain the attention value C_t. The attention mechanism is computed according to the following formulas:
score(\bar{h}_t, h_i) = V^T \tanh(W_1 h_i + W_2 \bar{h}_t)

\alpha_{t,i} = \frac{\exp(\mathrm{score}(\bar{h}_t, h_i))}{\sum_{j=1}^{T} \exp(\mathrm{score}(\bar{h}_t, h_j))}

C_t = \sum_{i=1}^{T} \alpha_{t,i} h_i

where h_i denotes the output of the encoder at moment i, \bar{h}_t denotes the output of the decoder at moment t, and V^T, W_1, W_2 are trainable parameters;
(5) The hidden variable Z of the encoder network and the attention result C_t are also trained as inputs to the decoder. The hidden variable Z serves as the decoder input at every moment;
(6) The decoder network outputs decoder_output of size (batch_size, timesteps, dec_rnn_size), which in the present invention is (100, 250, 512). The output of the decoder network is then mapped to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, and input into the mixture density network to model (Δx, Δy, p); this is two-dimensional Gaussian mixture density modeling, and the final output is a sequence of the simple-stroke drawing process. The modeling follows equations (17)-(23) in reference [2]. Compared with the original data, the modeled results conform to the original distribution.
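A hedged sketch of the preprocessing referenced in step (1) follows. The input format (a list of strokes, each an array of absolute (x, y) points sampled along the svg path) and the helper name strokes_to_sequence are assumptions for illustration.

```python
import numpy as np

def strokes_to_sequence(strokes):
    """strokes: list of np.ndarray of shape (n_i, 2) with absolute coords."""
    points, prev = [], np.zeros(2)
    for stroke in strokes:
        for i, pt in enumerate(stroke):
            dx, dy = pt - prev                        # offset from last point
            p = 1.0 if i == len(stroke) - 1 else 0.0  # pen lifted at stroke end
            points.append((dx, dy, p))
            prev = pt
    return np.asarray(points, dtype=np.float32)       # shape (timesteps, 3)
```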
The invention has the beneficial effects that: the method for generating the simple stroke drawing process based on the deep neural network can effectively capture the relation between the simple stroke drawing process sequences and effectively generate the drawing process sequences of the simple strokes.
Drawings
FIG. 1 is the overall block diagram of the algorithm of the present invention, namely the block diagram of the attention-sketch-rnn simple-stroke drawing process generation algorithm;
FIG. 2 is the attention model used in the algorithm;
FIG. 3 is a diagram of a generated simple-stroke drawing sequence, namely a flower drawing sequence;
FIG. 4 is a graph comparing the (Δx, Δy) coordinate distributions of the original drawing and the generated simple-stroke drawing, in which dots represent the simple stroke generated by the algorithm and × represents the original drawing.
Detailed Description
The specific implementation mode of the invention is as follows:
Based on the structure of the existing variational autoencoder-decoder (VAE), the method comprises two branch networks: one is an encoder network using a bidirectional recurrent neural network, used to learn the drawing process sequence features of each simple-stroke type; the other is the decoder part, which combines an attention mechanism with a recurrent neural network (LSTM) and is followed by an MDN mixture density network that models and generates the drawing process sequence.
(1) Sample the simple-stroke images in svg format in the data set and process them into a sequence (Δx, Δy, p), where (Δx, Δy) represents the offset between the sampled point and the x and y coordinates of the previous point, and p represents whether a stroke ends.
(2) Input the simple-stroke process sequence into the encoder part, which is based on a bidirectional recurrent network, once in positive order and once in negative order. The encoder part extracts the features of the simple stroke; the output of the recurrent neural network at every moment of the drawing process sequence has size (batch_size, timesteps, rnn_size × 2), and the output state final_state of the recurrent unit at the last moment has size (batch_size, rnn_size × 2), the factor of 2 arising because the encoder is a bidirectional recurrent network. Here, batch_size is the number of simple-stroke process sequences input at a time during training, timesteps is the length of the process sequence of one simple stroke, and rnn_size is the number of LSTM neurons in the recurrent network.
(3) Convert the final_state obtained by the encoder into a hidden variable Z of size (batch_size, 128), and convert Z into the initial state h0 of the decoder, of size (batch_size, dec_rnn_size × 2).
(4) Apply the attention mechanism. Compute the similarity between the output of the decoder at the current moment and the outputs of the encoder's recurrent network at all moments, convert the scores into corresponding weights, and take the weighted sum over the encoder outputs at all moments to obtain the attention value C_t (an illustrative sketch of this computation appears after these steps). The attention mechanism is computed according to the following formulas:
score(\bar{h}_t, h_i) = V^T \tanh(W_1 h_i + W_2 \bar{h}_t)

\alpha_{t,i} = \frac{\exp(\mathrm{score}(\bar{h}_t, h_i))}{\sum_{j=1}^{T} \exp(\mathrm{score}(\bar{h}_t, h_j))}

C_t = \sum_{i=1}^{T} \alpha_{t,i} h_i

where h_i denotes the output of the encoder at moment i, \bar{h}_t denotes the output of the decoder at moment t, and V^T, W_1, W_2 are trainable parameters.
(5) The hidden variable Z of the encoder network and the attention result C_t are also trained as inputs to the decoder. The hidden variable Z serves as the decoder input at every moment.
(6) The decoder network outputs decoder_output of size (batch_size, timesteps, dec_rnn_size), which in the present invention is (100, 250, 512). The output of the decoder network is then mapped to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, and input into the mixture density network to model (Δx, Δy, p); this is two-dimensional Gaussian mixture density modeling, and the final output is a sequence of the simple-stroke drawing process. The modeling follows equations (17)-(23) in reference [2]. Compared with the original data, the modeled results conform to the original distribution; see fig. 4, where red indicates the simple stroke generated by the algorithm and blue indicates the original drawing.
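As referenced in step (4), here is a minimal sketch of the attention computation matching the formulas above. The module name Attention and attn_size=128 are assumptions; the exact score function used by the invention may differ in detail.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, enc_size, dec_size, attn_size=128):
        super().__init__()
        self.W1 = nn.Linear(enc_size, attn_size, bias=False)
        self.W2 = nn.Linear(dec_size, attn_size, bias=False)
        self.V = nn.Linear(attn_size, 1, bias=False)

    def forward(self, enc_outputs, dec_state):
        # enc_outputs: (batch, T, enc_size); dec_state: (batch, dec_size)
        scores = self.V(torch.tanh(self.W1(enc_outputs)
                                   + self.W2(dec_state).unsqueeze(1)))  # (B,T,1)
        alpha = torch.softmax(scores, dim=1)       # weights over encoder time
        c_t = (alpha * enc_outputs).sum(dim=1)     # attention value C_t: (B, enc_size)
        return c_t, alpha.squeeze(-1)
```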
The generative model used in the method is the variational autoencoder (VAE); the recurrent neural network is the long short-term memory network (LSTM); and the attention mechanism is based on the attention mechanism of reference [1]. Together they learn the simple-stroke drawing process, advancing the technical field of drawing process sequence generation. The method can also be applied to unsupervised sequence generation in the arts and in natural language processing. The algorithm extracts prior knowledge of the simple-stroke drawing process while accounting for the temporal character of the process sequence and for the influence of different moments on the sequence generated at the next moment, so the sequences generated by the trained model better match the real simple-stroke drawing process.
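For concreteness, a hedged sketch follows of how the 123-dimensional projection of step (6) could split into bivariate Gaussian mixture parameters in the manner of reference [2], assuming M = 20 mixture components (6 × 20 + 3 = 123); the split order and the three pen-state logits are assumptions for illustration.

```python
import torch

def split_mdn_params(y, num_mixtures=20):
    """y: (batch, 123) decoder projection -> GMM parameters for (dx, dy, p)."""
    pi, mu_x, mu_y, sigma_x, sigma_y, rho, pen = torch.split(
        y, [num_mixtures] * 6 + [3], dim=-1)
    pi = torch.softmax(pi, dim=-1)                  # mixture weights sum to 1
    sigma_x, sigma_y = torch.exp(sigma_x), torch.exp(sigma_y)  # positive stds
    rho = torch.tanh(rho)                           # correlation in (-1, 1)
    pen = torch.softmax(pen, dim=-1)                # pen-state probabilities
    return pi, mu_x, mu_y, sigma_x, sigma_y, rho, pen
```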
The overall block diagram of the algorithm of the invention is shown in fig. 1, and the attention mechanism model is shown in fig. 2. The generation algorithm of the invention can effectively capture the relations between simple-stroke drawing process sequences and effectively generate the drawing process sequences of simple strokes. Fig. 3 shows a generated simple-stroke drawing process sequence.
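Finally, an illustrative autoregressive generation loop tying the sketches above together: decode one step, split the projection into mixture parameters, sample the next (Δx, Δy, p) point, and feed it back. It assumes the hedged helpers defined earlier (an LSTM decoder taking [point, Z] as input, a linear projection to 123 dimensions, and split_mdn_params) and, for brevity, ignores the correlation ρ when sampling.

```python
import torch

@torch.no_grad()
def sample_sequence(decoder, project, z, max_steps=250):
    point = torch.zeros(1, 1, 3)                    # start token (0, 0, 0)
    state, sequence = None, []
    for _ in range(max_steps):
        inp = torch.cat([point, z.view(1, 1, -1)], dim=-1)
        out, state = decoder(inp, state)            # one decoding step
        pi, mx, my, sx, sy, rho, pen = split_mdn_params(project(out[:, -1]))
        k = torch.multinomial(pi[0], 1)             # choose a mixture component
        dx = mx[0, k] + sx[0, k] * torch.randn(1)   # sample dx and dy from the
        dy = my[0, k] + sy[0, k] * torch.randn(1)   # chosen Gaussian (rho dropped)
        p = pen[0].argmax().float()                 # most likely pen state
        point = torch.tensor([[[dx.item(), dy.item(), p.item()]]])
        sequence.append((dx.item(), dy.item(), p.item()))
        if p.item() == 2:                           # assumed end-of-drawing state
            break
    return sequence
```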
References
[1] Luong, M.-T., Pham, H., and Manning, C. D. Effective approaches to attention-based neural machine translation. In Conference on Empirical Methods in Natural Language Processing (2015).
[2] Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

Claims (8)

1. The deep neural network-based simple stroke drawing process generation algorithm, characterized by comprising the following steps:
(1) sampling the simple-stroke images in svg format in the data set and processing them into a sequence (Δx, Δy, p);
(2) inputting the simple-stroke process sequence into an encoder part based on a bidirectional recurrent network, once in positive order and once in negative order;
(3) converting the final_state obtained by the encoder twice, obtaining the hidden variable Z and the initial state h0 of the decoder;
(4) applying the attention mechanism to obtain the attention value C_t;
(5) training the hidden variable Z of the encoder network and the attention result C_t as inputs to the decoder;
(6) outputting from the decoder network, mapping the output of the decoder network to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, inputting it into the mixture density network to model (Δx, Δy, p), and finally outputting a sequence of the simple-stroke drawing process.
2. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: the sampling in step (1) produces a sequence (Δx, Δy, p), where (Δx, Δy) represents the offset between the sampled point and the x and y coordinates of the previous point, and p represents whether a stroke ends.
3. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: in step (2), the encoder part extracts the features of the simple stroke; the output of the recurrent neural network at every moment of the drawing process sequence has size (batch_size, timesteps, rnn_size × 2), and the output state final_state of the recurrent unit at the last moment has size (batch_size, rnn_size × 2), the factor of 2 arising because the encoder is a bidirectional recurrent network; batch_size is the number of simple-stroke process sequences input at a time during training, timesteps is the length of the process sequence of one simple stroke, and rnn_size is the number of LSTM neurons in the recurrent network.
4. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: the two conversions in step (3) are: first, converting final_state into the hidden variable Z of size (batch_size, 128), and then converting Z into the initial state h0 of the decoder, of size (batch_size, dec_rnn_size × 2).
5. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: the attention mechanism in step (4) is applied as follows: compute the similarity between the output of the decoder at the current moment and the outputs of the encoder's recurrent network at all moments, convert the scores into corresponding weights, and take the weighted sum over the encoder outputs at all moments to obtain the attention value C_t.
6. The deep neural network-based simple stroke drawing process generating algorithm of claim 5, wherein: the attention mechanism is calculated according to the following formula:
score(\bar{h}_t, h_i) = V^T \tanh(W_1 h_i + W_2 \bar{h}_t)

\alpha_{t,i} = \frac{\exp(\mathrm{score}(\bar{h}_t, h_i))}{\sum_{j=1}^{T} \exp(\mathrm{score}(\bar{h}_t, h_j))}

C_t = \sum_{i=1}^{T} \alpha_{t,i} h_i

where h_i denotes the output of the encoder at moment i, \bar{h}_t denotes the output of the decoder at moment t, and V^T, W_1, W_2 are trainable parameters.
7. The deep neural network-based simple stroke drawing process generating algorithm of claim 1, wherein: the hidden variable Z in step (5) is used as the input of the decoder at each moment.
8. The deep neural network-based simple stroke drawing process generation algorithm of claim 1, wherein: in step (6), the decoder network outputs decoder_output of size (batch_size, timesteps, dec_rnn_size); the output of the decoder network is mapped to a sequence of dimension (-1, 123), where -1 denotes an unspecified size, then input into the mixture density network to model (Δx, Δy, p); the modeling process is two-dimensional Gaussian mixture density modeling, and the final output is a sequence of the simple-stroke drawing process.
CN202011638190.1A 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm Pending CN113160343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638190.1A CN113160343A (en) 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011638190.1A CN113160343A (en) 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm

Publications (1)

Publication Number Publication Date
CN113160343A 2021-07-23

Family

ID=76878364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638190.1A Pending CN113160343A (en) 2020-12-31 2020-12-31 Deep neural network-based simple stroke drawing process generation algorithm

Country Status (1)

Country Link
CN (1) CN113160343A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658291A (en) * 2021-08-17 2021-11-16 青岛鱼之乐教育科技有限公司 Automatic rendering method of simplified strokes



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination