CN110941960A - Keyword-based children picture story generation method, system and equipment - Google Patents

Keyword-based children picture story generation method, system and equipment

Info

Publication number
CN110941960A
CN110941960A
Authority
CN
China
Prior art keywords
story
children
model
seed
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911102388.5A
Other languages
Chinese (zh)
Inventor
徐叶强
王峰
窦任荣
吴云标
谢海博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Aixue Information Technology Co Ltd
Original Assignee
Guangzhou Aixue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Aixue Information Technology Co Ltd filed Critical Guangzhou Aixue Information Technology Co Ltd
Priority to CN201911102388.5A priority Critical patent/CN110941960A/en
Publication of CN110941960A publication Critical patent/CN110941960A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/04Electrically-operated educational appliances with audible presentation of the material to be studied

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a method, a system and equipment for generating a children's picture-book story based on keywords. The generation method comprises the following steps: collecting children's picture-book story corpus with a web crawler and cleaning it; training an LSTM-based children's picture-book story generation model with the cleaned corpus as training data; acquiring a seed keyword; obtaining sentences containing the keyword from the children's picture-book story corpus as seed sentences; and inputting the obtained seed sentence as a seed into the LSTM-based model to generate the children's picture-book story. The generation method is embedded into an intelligent desk-lamp reader, and the keywords and the generated story are input and output by voice to realize intelligent voice interaction. Compared with text, the voice interaction mode provided by the invention is friendlier to children and easier to accept; story generation is more accurate, and the generated stories conform better to grammar rules and are of higher overall quality.

Description

Keyword-based children picture story generation method, system and equipment
Technical Field
The invention relates to the field of natural language processing, and in particular to a method, a system, equipment and a storage medium for generating a children's picture-book story based on keywords.
Background
With the continuous development of artificial-intelligence technology in various application fields, more and more related techniques have appeared in the field of automatic text generation, the aim being to enable a computer to write creatively like a human and produce high-quality written works.
At present, automatic text generation has some applications in subdivided fields such as sports news bulletins and stock-market reports, but most are template-based content-filling methods whose content structure is relatively fixed. Automatic generation of poetry or lyrics also has some applications; the basic method is a character-level RNN, but because it cannot take the semantic relations of the context into account, the generated poetry or lyrics read stiffly on the whole.
Children's picture-book story generation also belongs to this subdivided field of automatic text generation, and at present text generation has no specific application in it. On the one hand, the audience of picture-book stories is children, and the direct, plain style of the stories distinguishes them from the metaphorical language of poetry and lyrics; on the other hand, picture-book stories are rich and varied in their demands, unlike the fully regular structure of news bulletins.
Picture-book stories let children read in a relaxed and pleasant atmosphere and are an important channel through which children absorb outside knowledge. With humorous, rich plots and richly illustrated modes of expression, they can improve children's language ability, enrich their emotional experience, cultivate good reading habits, and play a positive role in their healthy growth. Creators can apply a variety of subjects and artistic elements so that the stories present a better artistic effect and greater cultural charm, teaching in a lively way so that children learn while reading. Applying automatic text generation technology to the automatic generation of children's picture-book stories therefore has important practical significance.
Existing children's picture-book story generation technology is not good enough: it can only generate randomly in a similar style from an existing corpus, or generate sequentially from seed words, and cannot generate stories driven by keywords. Moreover, young children have limited reading ability and cannot read picture books independently, so some form of intelligent device is needed to read the stories to them.
Disclosure of Invention
In view of the above, the invention provides a method, a system, a device and a storage medium for generating a children's picture-book story based on keywords, so as to solve the problem that keyword-based story generation in the prior art performs poorly.
To remedy the defects of the prior art, the technical scheme of the invention is as follows:
a keyword-based children's picture-book story generation method comprises the following steps:
S1, offline model training: training an LSTM-based children's picture-book story generation model with children's picture-book story corpus as training data;
S2, online real-time story generation: extracting keywords; extracting sentences containing the keywords from the children's picture-book story corpus as seed sentences; and inputting the seed sentence as a seed into the LSTM-based model to generate a children's picture-book story.
Preferably, in step S2, the user performs the operation by voice.
Preferably, the step S1 is implemented by the following steps:
S11, corpus collection: obtaining children's picture books or children's stories;
S12, corpus cleaning;
S13, text word-vector representation;
S14, model construction: selecting an LSTM-based Encoder-Decoder framework model;
S15, model training: inputting the cleaned text as training samples, setting parameters including the learning rate and the number of iterations, and training until the best-performing Encoder-Decoder model is obtained.
Preferably, the corpus collection of step S11 proceeds as follows:
1) web-page crawling: use the nutch framework to perform full-site crawling of the specified children's picture-book and children's-story websites to obtain HTML pages;
2) content extraction: extract the text from the HTML pages to obtain the picture-book and story texts;
3) proxy-IP service: to avoid being throttled for frequent access to a site, requests are routed through proxy IPs.
Preferably, in step S13 the cleaned corpus is segmented into words and represented as word vectors with word2vec;
the model construction of step S14 comprises:
representing the cleaned children's picture-book story corpus as word vectors and inputting them to the LSTM-based Encoder-Decoder model, which encodes and then decodes the word-vector representation, specifically:
the encoder LSTM takes fixed-length vectors, i.e. the word-vector representation, as input; after the LSTM layer the output passes through a Dropout layer to prevent overfitting of the model, and finally a fixed-length semantic vector C is generated, meaning that the encoder compresses the information of the whole input sequence into one fixed-length vector;
the decoder LSTM takes the semantic vector C generated by the encoder as input and, after its LSTM layer and likewise a Dropout layer to prevent overfitting, outputs a variable-length sequence that serves as the output of the whole model.
Experiments show that two LSTM layers outperform a single-layer LSTM network and reduce model perplexity, while the Dropout layer prevents overfitting. With this training scheme, for a given amount of training data, the quality of the generated stories improves, training time is shortened, and training efficiency rises.
Preferably, the process of step S2 is:
(I) keyword extraction
The user issues a text-generation command by voice; the speech is first converted to text by ASR, and the keywords are then extracted from the text;
(II) seed-sentence acquisition
The keywords are converted into seed sentences containing them, and one seed sentence is randomly selected as the model input; the specific flow is:
(1) split the training samples into sentences;
(2) search for sentences containing the keywords, rank them by relevance and keep the TopN; relevance is the number of contained keywords divided by the sentence length, a larger value meaning higher relevance;
(3) randomly select one of the TopN sentences as the seed sentence, so that a different text is generated each time;
(III) text word-vector representation
Segment the seed sentence obtained in the previous step into words and represent them as word vectors with the pre-trained word-embedding open-source tool word2vec;
(IV) story generation
Feed the word vectors produced in the previous step into the Encoder-Decoder framework model; the model's final output is the generated story text;
(V) result output
Convert the generated story text to speech by TTS and broadcast it to the user.
A keyword-based children's picture-book story generation system comprises the following modules:
an offline model-training module: training an LSTM-based children's picture-book story generation model with children's picture-book story corpus as training data;
an online real-time story-generation module: acquiring a seed keyword; extracting sentences containing the keyword from the children's picture-book story corpus as seed sentences; and inputting the seed sentence as a seed into the LSTM-based model to generate a children's picture-book story.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the method.
A keyword-based children's picture-book story generation device comprises a memory, a processor and a computer program stored on the memory and runnable on the processor, the processor executing the program to implement the steps of the method.
Preferably, the device further comprises an intelligent desk lamp into which the memory and the processor are embedded, the desk lamp including a sound-pickup device.
Aimed at preschool and lower-grade children reading paper books, the invention applies the story generation model to an intelligent desk lamp that can recognize a picture book through a camera, intelligently broadcast its content, conduct interactive question-and-answer about the content, and recognize, judge and give feedback on the child's answers. With the intelligent desk-lamp reader as the application carrier, once a child speaks a keyword the reader generates a children's picture-book story from it and feeds the story back by voice.
Compared with the prior art, the invention has the following beneficial effects. Big data is obtained by web crawler; to eliminate interference data and improve computational efficiency, the crawled data is cleaned, and an LSTM Encoder-Decoder framework model is trained on the cleaned data, which improves the training effect. Finally, the generation method is embedded into an intelligent desk-lamp reader, with the keywords and the generated story input and output by voice, realizing intelligent voice interaction. Compared with text, the voice interaction mode is friendlier to children and easier to accept; story generation is more accurate, and the generated stories conform better to grammar rules and are of higher overall quality.
Drawings
Fig. 1 is a flowchart of a keyword-based children's picture-book story generation method according to an exemplary embodiment of the present invention.
Fig. 2 is a schematic diagram of an LSTM-based children's picture-book story generation model according to an exemplary embodiment of the present invention.
Fig. 3 is a schematic diagram of an LSTM structure according to an exemplary embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information as first information, without departing from the scope of the present invention. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination".
Fig. 1 is a flowchart illustrating a keyword-based children's picture-book story generation method according to an exemplary embodiment of the present invention; the method includes the following steps.
Step S101, corpus collection. The nutch web-crawler framework is used for corpus collection, obtaining one million children's picture books and children's stories; the specific flow is:
1. web-page crawling: use the nutch framework to perform full-site crawling of the specified children's picture-book and children's-story websites to obtain HTML pages;
2. content extraction: extract the text from the HTML pages with a template parser, a full-text extraction parser and an intelligent extraction parser to obtain the picture-book and story texts;
3. proxy-IP service: to avoid being throttled for frequent access, requests are routed through proxy IPs so the crawler task completes quickly and stably.
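The proxy rotation in item 3 can be sketched as a simple round-robin selector. This is a minimal sketch: the patent names no concrete HTTP library, so the function names and the request-configuration fields below are illustrative assumptions.

```python
import itertools

def make_proxy_cycle(proxy_addrs):
    """Return a round-robin iterator over proxy addresses.

    Rotating the proxy on every request spreads the crawl load so that
    no single IP triggers the target site's access limits.
    """
    if not proxy_addrs:
        raise ValueError("at least one proxy address is required")
    return itertools.cycle(proxy_addrs)

def next_request_config(url, proxy_cycle):
    """Build a request configuration using the next proxy in rotation."""
    proxy = next(proxy_cycle)  # advance the round-robin
    return {"url": url, "proxies": {"http": proxy, "https": proxy}}
```

A crawler worker would call `next_request_config` once per fetched page, so successive requests leave through different proxy IPs.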
Step S102, data cleaning. The collected corpus contains both noisy data and dirty data; if it is not cleaned, the semantic coherence of the generated text suffers. The collected children's picture-book story corpus is therefore cleaned, removing noise and dirty data such as the English fragments of mixed Chinese-English picture books and special characters.
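The cleaning step could, for instance, be a character-class filter. This is a sketch under the assumption that only CJK characters and common Chinese punctuation are retained; the patent only states that English fragments and special characters are removed.

```python
import re

# Characters to KEEP: CJK ideographs plus common Chinese punctuation
# (、 。 ， ？ ！ and curly quotes). The exact classes retained are an
# assumption; everything outside them is treated as noise/dirty data.
_NOISE = re.compile(
    r"[^\u4e00-\u9fff\u3001\u3002\uff0c\uff1f\uff01\u201c\u201d]"
)

def clean_corpus_line(line):
    """Strip English letters, digits and special symbols from one
    corpus line, keeping the Chinese text and its punctuation."""
    return _NOISE.sub("", line)
```

For example, a mixed line such as `"小老虎abc123在草地上!"` would be reduced to its Chinese content before word segmentation.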
Step S103, text word-vector representation. Segment the cleaned corpus into words, then represent the words as word vectors with the pre-trained word-embedding open-source tool word2vec.
Step S104, model construction. The text generation model adopts an LSTM-based Encoder-Decoder framework, as shown in FIG. 2. The framework is divided into an encoding part, which converts the input sequence into a fixed-length vector, and a decoding part, which converts that fixed vector into the output sequence; both parts are implemented with LSTM.
The Encoder-Decoder framework is a very common deep-learning model framework and performs well on Sequence2Sequence tasks such as text generation. LSTM, as a variant of the RNN, models long sentences and represents sentence context; at the same time, its gate mechanism (shown in FIG. 3) mitigates the gradient-vanishing and gradient-explosion problems that RNNs suffer over long dependencies, making it very suitable for the encoding and decoding parts of the framework:
(1) the encoder LSTM takes fixed-length vectors, i.e. the word-vector representation of the text, as input; after the LSTM layer the output passes through a Dropout layer to prevent overfitting, and finally a fixed-length semantic vector C is generated, meaning that the encoder compresses the information of the whole input sequence into one fixed-length vector;
(2) the decoder LSTM takes the semantic vector C generated by the encoder as input and, after its LSTM layer and likewise a Dropout layer to prevent overfitting, outputs a variable-length sequence that serves as the output of the whole model.
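The gate mechanism referred to above is, in the standard LSTM formulation (the patent does not spell out the equations, so the usual notation is assumed: $\sigma$ is the logistic sigmoid, $\odot$ the element-wise product, $x_t$ the input word vector and $h_t$ the hidden state at step $t$):

```latex
\begin{aligned}
f_t &= \sigma\bigl(W_f\,[h_{t-1}, x_t] + b_f\bigr) &&\text{(forget gate)}\\
i_t &= \sigma\bigl(W_i\,[h_{t-1}, x_t] + b_i\bigr) &&\text{(input gate)}\\
\tilde{C}_t &= \tanh\bigl(W_C\,[h_{t-1}, x_t] + b_C\bigr) &&\text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t &&\text{(cell state update)}\\
o_t &= \sigma\bigl(W_o\,[h_{t-1}, x_t] + b_o\bigr) &&\text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) &&\text{(hidden state)}
\end{aligned}
```

The additive cell-state update $C_t$ is what lets gradients flow over long dependencies without vanishing or exploding.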
Step S105, model training. After the model is built, it is trained with the open-source deep-learning library TensorFlow: the cleaned text is input as training samples, parameters such as the learning rate and the number of iterations are set, and training proceeds until the best-performing Encoder-Decoder framework model is obtained.
Step S106, keyword extraction. The user interacts with the desk lamp by natural-language voice and issues a text-generation instruction to the application, for example "tell a story about tigers and rabbits"; the speech is converted to text by ASR, and the keywords "tiger" and "rabbit" are extracted.
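As an illustration of this step, a minimal keyword extractor over the ASR transcript might simply drop function words. The stopword list and the whitespace tokenization below are illustrative assumptions; the actual system would operate on segmented Chinese and the patent does not name a specific extraction algorithm.

```python
# Small illustrative stopword list (an assumption, not from the patent).
STOPWORDS = {"tell", "a", "the", "story", "about", "and", "me", "please"}

def extract_keywords(transcript):
    """Return the content words of an ASR transcript, in order of first
    appearance and without duplicates."""
    seen, keywords = set(), []
    for token in transcript.lower().replace(",", " ").split():
        if token not in STOPWORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords
```

On the example command above, only the animal words survive the filter and become the seed keywords for the next step.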
Step S107, seed-sentence acquisition. The input of the text generation model is several keywords, but if the keywords were used directly as the model input, the generated text would have to start with them, so the same text would be produced every time; a seed sentence also provides the story generation model with more semantic information.
The method therefore first converts the keywords into seed sentences containing them and then randomly selects one seed sentence as the model input; the specific flow is:
(1) split the training samples into sentences;
(2) search for sentences containing the keywords, rank them by relevance and keep the TopN; relevance is the number of contained keywords divided by the sentence length, a larger value meaning higher relevance;
(3) randomly select one of the TopN sentences as the seed sentence, so that a different text is generated each time.
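Steps (1)–(3) can be sketched directly. This is a minimal sketch: the function names and the representation of sentences as token lists are assumptions, but the relevance formula (keyword count divided by sentence length) and the random TopN pick follow the flow above.

```python
import random

def relevance(sentence_tokens, keywords):
    """Relevance of step (2): number of keyword occurrences divided by
    sentence length, so long sentences are penalised."""
    if not sentence_tokens:
        return 0.0
    hits = sum(1 for tok in sentence_tokens if tok in keywords)
    return hits / len(sentence_tokens)

def pick_seed_sentence(sentences, keywords, top_n=5, rng=random):
    """Rank candidate sentences by relevance, keep the TopN, and pick one
    at random so repeated queries yield different stories (step (3))."""
    candidates = [s for s in sentences if any(k in s for k in keywords)]
    if not candidates:
        return None
    ranked = sorted(candidates,
                    key=lambda s: relevance(s, keywords),
                    reverse=True)
    return rng.choice(ranked[:top_n])
```

Because the final pick is uniform over the TopN, the seed sentence (and hence the generated story) varies across runs while staying relevant to the keywords.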
Step S108, text word-vector representation. The seed sentence obtained in the previous step is likewise segmented into words, which are then represented as word vectors with the pre-trained word-embedding open-source tool word2vec.
Step S109, story generation. The word vectors produced in the previous step are fed into the Encoder-Decoder framework model, whose final output is the generated story text. The story text is then converted to speech by TTS and broadcast to the user.
In an embodiment of the invention, the keywords "tiger", "rabbit" and "small animal" are input, and the seed sentences "a little tiger looks around on the grass", "how is the rabbit" and "a small animal flies over" are obtained. Story generation then yields the picture-book story: "How is the rabbit? All are somewhat bad... Suddenly a small animal flies over and lands on the ground in a single bound. A final call comes: let us play with you! A tiger goes out of the mountain to look for food, and the children roll about on the grassland. A tiger is seen all around the grass. At this moment they come to a beautiful place."
The embodiment of the invention collects children's picture-book story corpus with a crawler, cleans it, and trains an LSTM-based children's picture-book story generation model with the cleaned corpus as training data. It then obtains seed keywords, extracts sentences containing the keywords from the corpus as seed sentences, and inputs the seed sentences as seeds into the model to generate stories. The generation model is embedded into an intelligent desk-lamp reader, which serves as the carrier for inputting the keywords and outputting the stories, interacting by voice. By implementing this embodiment, children's picture-book stories can be generated from keywords, with keyword input and story output both by voice; compared with text, the voice interaction mode is friendlier to children and easier to accept.
An exemplary embodiment of the present invention provides a keyword-based children's picture-book story generation system, which includes:
and the corpus collection unit is used for collecting the corpus of the children's painted story by using the crawler. The nutch web crawler frame is used for corpus collection, so that 1 million pieces of children's picture books and children's stories are obtained, and the concrete flow is as follows:
1. webpage crawling: using a nutch frame to perform total-station crawling on the specified children picture-book website and the children story website to obtain html webpages;
2. extracting web page content: extracting texts in html webpages by constructing a template analyzer, a full text extraction analyzer and an intelligent extraction analyzer to obtain drawing text contents and story text contents;
3. proxy IP service: in order to prevent the limitation of frequently visiting the website, the crawler task can be quickly and stably completed by adding the proxy IP mode.
a data cleaning unit, used to clean the collected children's picture-book story corpus;
a model training unit, used to train the LSTM-based children's picture-book story generation model with the cleaned corpus as training data. The model structure is shown in FIG. 3:
the children's picture-book story corpus is used as training data and input to the LSTM-based model, converted into word vectors, and finally output through two LSTM-plus-Dropout layers and a Dense layer. The Tencent AI Lab open-source Chinese word vectors are used, with word-vector dimension 200. The model learning rate is 1e-3, batch_size is 100, and Dropout is 0.5.
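For orientation, the weight count of the stack just described (200-dimensional word vectors, two LSTM layers with Dropout, then a Dense layer over the vocabulary) can be estimated from the standard LSTM parameter formula. The hidden size of 512 and the vocabulary size in the usage example are hypothetical values; the patent does not state them.

```python
def lstm_params(input_dim, hidden):
    """Trainable parameters of one LSTM layer: four gates, each with a
    kernel over the concatenated [input, hidden-state] plus a bias."""
    return 4 * (hidden * (input_dim + hidden) + hidden)

def story_model_params(vocab, embed_dim=200, hidden=512):
    """Rough size of the described stack. Dropout layers add no weights;
    embed_dim=200 matches the Tencent word vectors, hidden=512 is an
    assumed hidden size."""
    total = lstm_params(embed_dim, hidden)   # first LSTM layer
    total += lstm_params(hidden, hidden)     # second LSTM layer
    total += hidden * vocab + vocab          # Dense output layer
    return total
```

Such an estimate helps size the GPU memory and training-time budget before running the TensorFlow training job.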
a model generation unit, used to obtain the children's picture-book story generation model through the above training process;
a seed-keyword acquisition unit, used to input the seed keywords from which the story is generated;
a seed-sentence acquisition unit, used to extract sentences containing the keywords from the corpus as seed sentences according to the seed keywords, providing the generation model with more semantic information;
a story generation unit, used to input the seed sentences as seeds into the LSTM-based children's picture-book story generation model to generate stories.
In an embodiment of the invention, the keywords "tiger", "rabbit" and "small animal" are input, and the seed sentences "a little tiger looks around on the grass", "how is the rabbit" and "a small animal flies over" are obtained. Story generation then yields the picture-book story: "How is the rabbit? All are somewhat bad... Suddenly a small animal flies over and lands on the ground in a single bound. A final call comes: let us play with you! A tiger goes out of the mountain to look for food, and the children roll about on the grassland. A tiger is seen all around the grass. At this moment they come to a beautiful place."
a device-embedding unit, used to embed the story generation model into an intelligent desk-lamp reader;
a terminal interaction unit, which takes the intelligent desk-lamp reader as the carrier for inputting the keywords and outputting the stories, interacting by voice.
The system collects children's picture-book story corpus with a crawler, cleans it, trains an LSTM-based story generation model with the cleaned corpus as training data, obtains seed keywords, extracts sentences containing the keywords from the corpus as seed sentences, and inputs the seed sentences as seeds into the model to generate stories; the model is embedded into an intelligent desk-lamp reader, which inputs the keywords and outputs the stories, interacting by voice. By implementing this embodiment, children's picture-book stories can be generated from keywords, with keyword input and story output both by voice; compared with text, the voice interaction mode is friendlier to children and easier to accept.
The above-described apparatus embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of the invention. A person of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A keyword-based children picture story generation method, characterized by comprising the following steps:
S1, off-line model training: training an LSTM-based children picture story generation model with children picture story corpus as training data;
S2, generating a children picture story online in real time: extracting keywords; extracting sentences containing the keywords from the children picture story corpus as seed sentences; and inputting the seed sentences as seeds into the LSTM-based children picture story generation model to generate a children picture story.
2. The generation method according to claim 1, wherein in step S2 the user operates by voice.
3. The generation method according to claim 2, wherein step S1 comprises the following steps:
S11, corpus collection: obtaining children picture books or children stories;
S12, corpus cleaning;
S13, text word vector representation;
S14, model construction: selecting an LSTM Encoder-Decoder framework model;
S15, model training: inputting the cleaned text as training samples, setting the parameters including the learning rate and the number of training iterations, and carrying out model training to finally obtain the Encoder-Decoder framework model with the best training performance.
4. The generation method according to claim 3, wherein step S11 comprises the following steps:
1) webpage crawling: using the nutch framework to crawl the specified children picture-book and children story websites site-wide to obtain html webpages;
2) webpage content extraction: extracting the text from the html webpages to obtain the picture-book and story text content;
3) proxy IP service: adding a proxy IP mechanism to prevent access from being restricted due to frequent visits to the websites.
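Claim 4 names the nutch framework for the crawl itself; purely as an illustration, the content-extraction and proxy-IP steps can be sketched with the Python standard library (every function and class name here is hypothetical, not from the patent):

```python
from html.parser import HTMLParser
import urllib.request

class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping script/style blocks —
    a minimal stand-in for the 'extract webpage content' step."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._stack = []   # open tags, so we know the current context
        self.chunks = []   # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not (self._stack and self._stack[-1] in self.SKIP):
            text = data.strip()
            if text:
                self.chunks.append(text)

def fetch_page(url, proxy=None):
    """Fetch one html page, optionally through a proxy IP (claim 4,
    step 3) so frequent visits are less likely to be blocked."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler(
            {"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    return opener.open(url, timeout=10).read().decode("utf-8", "ignore")

def extract_story_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```

A production crawler would rotate proxies and respect crawl delays; the sketch only shows where the proxy plugs into the fetch.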
5. The generation method according to claim 3, wherein in step S13, text word vector representation is performed by segmenting the cleaned corpus into words and representing the words as vectors with word2vec;
the model construction of step S14 comprises:
representing the cleaned children picture story corpus as word vectors and inputting them into the LSTM Encoder-Decoder framework model; encoding and then decoding the word vector representation, specifically:
the encoder LSTM takes fixed-length vectors as input; after the LSTM layer, the output passes through a Dropout layer to prevent the model from overfitting; finally a fixed-length semantic vector C is generated, meaning that the encoder compresses the information of the whole input sequence into one fixed-length vector;
the decoder LSTM takes the semantic vector C produced by the encoder as input and, after its LSTM layer, outputs a variable-length sequence; after a Dropout layer again guards against overfitting, the final sequence is obtained as the output result of the whole model.
6. The method according to any one of claims 2 to 5, wherein the process of step S2 is:
(I) extracting keywords
The user issues a text generation command by voice; ASR is first performed to convert the speech into text, and the keywords are then extracted from that text;
(II) seed sentence acquisition
Converting the keywords into seed sentences containing the keywords and randomly selecting one as the input of the model, with the following specific flow:
(1) splitting the training samples into sentences;
(2) finding the sentences that contain the keywords, ranking them by relevance, and selecting the TopN; the relevance is calculated as the number of contained keywords divided by the sentence length, a larger value indicating higher relevance;
(3) randomly selecting one of the TopN sentences as the seed sentence, so that a different text is generated each time;
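The seed-sentence selection in steps (1)–(3) can be sketched directly; the sentence-splitting pattern and the TopN size are illustrative assumptions, not values fixed by the claim:

```python
import random
import re

def split_sentences(corpus_text):
    """Step (1): split the training text on common sentence-ending marks
    (both Chinese and Western punctuation, as an assumption)."""
    return [s.strip() for s in re.split(r"[。！？.!?]", corpus_text) if s.strip()]

def rank_by_relevance(keywords, sentences):
    """Step (2): relevance = matched keywords / sentence length, so a short
    sentence that packs in more of the keywords ranks higher."""
    scored = []
    for s in sentences:
        hits = sum(1 for kw in keywords if kw in s)
        if hits:
            scored.append((hits / len(s), s))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [s for _, s in scored]

def choose_seed(keywords, sentences, top_n=5, rng=random):
    """Step (3): a random draw from the TopN keeps repeated requests from
    always yielding the same generated story."""
    top = rank_by_relevance(keywords, sentences)[:top_n]
    return rng.choice(top) if top else None
```

Sampling from the TopN rather than always taking the single best match is what gives the system its variety between runs, at the cost of occasionally starting from a slightly less relevant seed.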
(III) text word vector representation
Segmenting the seed sentence obtained in the previous step into words, and then performing word vector representation with the pretrained open-source word embedding tool word2vec;
(IV) Story Generation
Taking the word vectors generated in the previous step as the input of the Encoder-Decoder framework model; the final output of the model is the generated story text;
(V) result output
Converting the generated story text into speech by TTS and broadcasting it to the user.
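Putting the five stages of claim 6 together, a minimal end-to-end sketch follows; the ASR engine, the trained LSTM model, and the TTS engine are stubbed out, since the claim names no specific implementations, and every function here is illustrative:

```python
def asr(audio_bytes):
    """(I) ASR stub: a real deployment would call a speech recognizer;
    here the 'audio' is already text, purely for illustration."""
    return audio_bytes.decode("utf-8")

def extract_keywords(command):
    """(I) Toy keyword extraction: drop an assumed carrier phrase and
    keep the remaining words."""
    return command.replace("tell me a story about", "").split()

def pick_seed(keywords, corpus_sentences):
    """(II) Simplified seed pick: first sentence containing a keyword
    (the claim ranks by relevance and samples from the TopN)."""
    for s in corpus_sentences:
        if any(kw in s for kw in keywords):
            return s
    return None

def embed(sentence):
    """(III) Word-vector stand-in: tokens only; the patent uses word2vec."""
    return sentence.split()

def generate(tokens):
    """(IV) Stand-in for the trained Encoder-Decoder LSTM model."""
    return " ".join(tokens) + " ... and they lived happily ever after."

def tts(story_text):
    """(V) TTS stub: return the text that would be spoken aloud."""
    return story_text

def tell_story(audio_bytes, corpus_sentences):
    keywords = extract_keywords(asr(audio_bytes))
    seed = pick_seed(keywords, corpus_sentences)
    return tts(generate(embed(seed)))
```

The value of the decomposition is that each stage can be swapped independently: a different ASR engine, a better seed-ranking rule, or a retrained generation model all slot in behind the same interfaces.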
7. A keyword-based children picture story generation system, characterized by comprising:
an offline model training module: training to obtain an LSTM-based children picture story generation model by taking children picture story corpus as training data;
an online real-time children picture story generation module: obtaining seed keywords; extracting sentences containing the keywords from the children picture story corpus according to the seed keywords as seed sentences; and inputting the seed sentences as seeds into the LSTM-based children picture story generation model to generate a children picture story.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
9. A keyword-based children picture story generation apparatus, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the steps of the method of any one of claims 1 to 6.
10. The apparatus of claim 9, further comprising an intelligent desk lamp, wherein the memory and the processor are embedded in the intelligent desk lamp, and the intelligent desk lamp comprises a sound pickup device.
CN201911102388.5A 2019-11-12 2019-11-12 Keyword-based children picture story generation method, system and equipment Pending CN110941960A (en)


Publications (1)

Publication Number Publication Date
CN110941960A true CN110941960A (en) 2020-03-31


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737961A (en) * 2020-06-28 2020-10-02 网易(杭州)网络有限公司 Method and device for generating story, computer equipment and medium
CN111737961B (en) * 2020-06-28 2021-08-10 网易(杭州)网络有限公司 Method and device for generating story, computer equipment and medium
CN112992116A (en) * 2021-02-24 2021-06-18 北京中科深智科技有限公司 Automatic generation method and system of video content
CN115081402A (en) * 2022-07-13 2022-09-20 平安科技(深圳)有限公司 Story plot expansion method, device, equipment and medium based on traceability reasoning
CN115081402B (en) * 2022-07-13 2023-05-09 平安科技(深圳)有限公司 Storyline expansion method, device, equipment and medium based on traceability reasoning
CN117041627A (en) * 2023-09-25 2023-11-10 宁波均联智行科技股份有限公司 Vlog video generation method and electronic equipment
CN117041627B (en) * 2023-09-25 2024-03-19 宁波均联智行科技股份有限公司 Vlog video generation method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503255A (en) * 2016-11-15 2017-03-15 科大讯飞股份有限公司 Based on the method and system that description text automatically generates article
CN107193941A (en) * 2017-05-19 2017-09-22 北京金山安全软件有限公司 Story generation method and device based on picture content
CN108197294A (en) * 2018-01-22 2018-06-22 桂林电子科技大学 A kind of text automatic generation method based on deep learning
CN108509136A (en) * 2018-04-12 2018-09-07 山东音为爱智能科技有限公司 A kind of children based on artificial intelligence paint this aid reading method
CN110019827A (en) * 2017-08-24 2019-07-16 腾讯科技(北京)有限公司 A kind of corpus library generating method, device, equipment and computer storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU ZHENGYANG: "Research on Automatic Generation Algorithms for Chinese Poetry Based on Deep Learning", China Masters' Theses Full-text Database (Electronic Journal), Information Science and Technology Series *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200331