CN109062404B - Interaction system and method applied to intelligent early education machine for children

Interaction system and method applied to intelligent early education machine for children

Info

Publication number
CN109062404B
Authority
CN
China
Prior art keywords
layer
children
pronunciation
child
early education
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810799639.9A
Other languages
Chinese (zh)
Other versions
CN109062404A (en)
Inventor
吴成东
刘鑫
丁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201810799639.9A priority Critical patent/CN109062404B/en
Publication of CN109062404A publication Critical patent/CN109062404A/en
Application granted granted Critical
Publication of CN109062404B publication Critical patent/CN109062404B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G06V40/175: Static expression
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/08: Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • G09B5/14: Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations with provision for individual teacher-student communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention belongs to the technical field of early education of children, and provides an interactive system and method applied to an intelligent early education machine for children. The invention integrates expression recognition, handwritten character recognition, speech recognition, text processing and speech evaluation into the field of children's early education. It mainly comprises four aspects: first, a CNN realizes facial expression recognition to assist in teaching children; second, a CNN realizes handwritten character recognition to help children write characters with correct strokes and stroke order; third, speech recognition and SVM-based text processing realize voice interaction between the child and the early education machine; and fourth, speech evaluation teaches and corrects the child's pronunciation. Compared with existing early education machines, this machine changes the mode of early education, enriches its content and makes early education more engaging.

Description

Interaction system and method applied to intelligent early education machine for children
Technical Field
The invention belongs to the technical field of early education of children, and relates to an interactive system and method applied to an intelligent early education machine for children.
Background
An early education machine is an electronic educational product designed to promote children's interest in learning. Early education machines currently on the market are basically button-operated or point-and-read devices, and their human-machine interaction is traditional: the machine simply tells stories while the child passively listens. Such machines cannot capture the child's emotions and facial expressions; they lack warmth, cannot detect the child's current emotional state during teaching, and therefore cannot adapt the teaching to the situation. They also do not teach children to correctly write characters, strokes and foreign-language letters, and they cannot monitor or correct children's daily pronunciation.
Disclosure of Invention
Aiming at these defects of existing early education machines, the invention applies facial expression recognition, handwritten character recognition, voice interaction, text processing and related techniques to a children's early education machine, so as to solve the problem that such machines are not intelligent enough.
The technical scheme of the invention is as follows:
An interactive system applied to an intelligent early education machine for children comprises a facial expression recognition module; a recognition module for handwritten characters, strokes and foreign-language letters; a filtering module for children's inserted words (filler words such as "uh" and "um"); and a child pronunciation evaluation and correction module.
The facial expression recognition module realizes facial expression recognition through a CNN, judges the type of the expression, and selects the teaching action corresponding to that expression, so that the early education machine can assist in teaching. For example, as shown in fig. 2, when the child shows a confused expression during learning, the teaching content can be explained in further detail to help the child understand.
The recognition module for handwritten characters, strokes and foreign-language letters realizes their recognition through a CNN and detects whether a handwritten character or letter and its stroke order are correct, so that the early education machine can teach children to write characters and foreign-language letters correctly. As shown in fig. 3, when teaching a child to write a character or letter, the machine plays a writing animation of it and prompts the writing technique, and the child then writes. The machine detects whether the written character or letter and its stroke order are correct; if not, the machine plays the writing animation again, prompts the writing technique again, and the child writes again until the writing is correct.
The filtering module for children's inserted words judges a text to be a negative text when the input speech contains inserted words, and abandons the response to that text. This module improves the response efficiency of the early education machine during human-machine dialogue. As shown in fig. 4, when the child utters inserted words during dialogue with the machine, the machine judges the text to be negative and abandons the response to it.
The child pronunciation evaluation and correction module evaluates and corrects the child's daily pronunciation based on mispronounced or defectively pronounced words recorded during the child's daily conversations with the machine. As shown in fig. 5, the machine can correct the child's pronunciation: when the child enters the pronunciation correction module, the machine selects a problem word and, as shown in fig. 6, reads the word accurately, and the child reads it after the machine. If the child still mispronounces it, the machine reads the word again, prompts the pronunciation method and technique, and the child reads again until the pronunciation is correct.
The method for applying the interactive system is characterized by comprising the following steps:
Step 1, establishing databases: database A for facial expressions, handwritten characters, strokes and foreign-language letters; database B for children's daily conversation texts and inserted words; and database C, a problem-pronunciation database built by locating and collecting mispronounced or defective pronunciations from children's daily dialogue.
Step 1-1, collecting data: acquire a children's facial expression data set, handwritten pictures of simple characters and strokes, and handwritten pictures of foreign-language letters.
Step 1-2, classifying and labeling the data: classify the expression data into expressions such as anger, happiness, confusion, fear, disgust and sadness; label the handwritten characters and build the database; classify the handwritten character pictures into Chinese characters such as "mouth" (口), "hui" (回), "hand", "foot", "cat" and "dog"; label the handwritten stroke pictures as strokes such as dot, horizontal, vertical, left-falling and right-falling; and label the handwritten foreign-letter pictures as letters such as A, B, C, α, β and γ.
Step 2, specific implementation of the functional modules, comprising the facial expression recognition module, the recognition module for handwritten characters, strokes and foreign-language letters, the filtering module for children's inserted words, and the child pronunciation evaluation and correction module.
Step 2-1, realizing a facial expression recognition module:
Step 2-1-1, perform grayscale processing on the pictures and input them into the CNN for training; the specific steps are as follows:
Step 2-1-1-1, establish the CNN model and set the network structure. The CNN comprises an input layer, two hidden layers, a fully connected layer and an output layer; each hidden layer comprises a convolutional layer and a sub-sampling layer. The convolutional layers use the sigmoid activation function and provide local perception and parameter sharing; the sub-sampling layers reduce visual redundancy and the number of network parameters; the fully connected layer maps the learned distributed feature representation to the sample label space; the output layer, i.e. the classifier, uses softmax regression. The network structure, comprising input-layer nodes, hidden-layer nodes and output-layer nodes, is shown in figure 1.
Step 2-1-1-2, initialize the CNN parameters. Initializing the network weights means assigning initial values to all connection weights (including thresholds) in the network; if the initial weight vector lies in a relatively flat region of the error surface, network training may converge abnormally slowly. The connection weights and thresholds are initialized to random values in the interval [-0.30, +0.30]; the excitation function of the hidden layers is set, and the learning rate of the weights is set to a value in the interval [0, 1].
Step 2-1-1-3, at time k-1, propagate the grayed picture input data through the input-to-hidden-layer weights and the hidden-to-output-layer weights to obtain the output value of the output layer, and update these weights for time k.
Step 2-1-1-4, set a total-error threshold for stopping training, and judge whether the total error of the obtained predicted values is greater than this threshold. If so, adjust the hidden-to-output-layer weights and the input-to-hidden-layer weights according to the total error; otherwise, the CNN training is finished.
Step 2-1-2, use the trained CNN to predict the output values for children's expression pictures.
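For illustration, the following is a minimal Keras sketch of the expression-recognition CNN described in step 2-1-1: a grayscale input, two hidden layers (each a convolutional layer with sigmoid activation plus a sub-sampling layer), a fully connected layer, a softmax output layer, and connection weights initialized uniformly in [-0.30, +0.30]. The input size, filter counts, kernel sizes and the seven expression classes are illustrative assumptions, not values fixed by the patent.

```python
import tensorflow as tf

# uniform initialization of connection weights in [-0.30, +0.30] (step 2-1-1-2)
init = tf.keras.initializers.RandomUniform(minval=-0.30, maxval=0.30)

model = tf.keras.Sequential([
    # hidden layer 1: convolutional layer (sigmoid) + sub-sampling layer
    tf.keras.layers.Conv2D(16, (5, 5), activation="sigmoid",
                           kernel_initializer=init, bias_initializer=init,
                           input_shape=(48, 48, 1)),   # grayed input picture (assumed 48x48)
    tf.keras.layers.AveragePooling2D((2, 2)),
    # hidden layer 2: convolutional layer (sigmoid) + sub-sampling layer
    tf.keras.layers.Conv2D(32, (5, 5), activation="sigmoid",
                           kernel_initializer=init, bias_initializer=init),
    tf.keras.layers.AveragePooling2D((2, 2)),
    # fully connected layer: maps the learned features to the sample label space
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="sigmoid", kernel_initializer=init),
    # output layer: softmax classifier over the expression classes (assumed 7)
    tf.keras.layers.Dense(7, activation="softmax"),
])

# learning rate set to a value in [0, 1], as step 2-1-1-2 specifies
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```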
step 2-2, realizing a recognition module of handwritten characters, strokes and foreign letters;
Step 2-2-1, perform grayscale processing on the pictures and input them into the CNN for training; the specific steps are as follows:
Step 2-2-1-1, establish the CNN model and set the network structure. The CNN comprises an input layer, three hidden layers, a fully connected layer and an output layer; each hidden layer comprises a convolutional layer and a sub-sampling layer. The convolutional layers use the ReLU activation function and provide local perception and parameter sharing; the sub-sampling layers reduce visual redundancy and the number of network parameters; the fully connected layer maps the learned distributed feature representation to the sample label space; the output layer is a classifier using softmax regression.
Step 2-2-1-2, initialize the CNN parameters: the connection weights and thresholds are initialized to random values in the interval [-0.30, +0.30]; the excitation function of the hidden layers is set, and the learning rate of the weights is set to a value in the interval [0, 1].
Step 2-2-1-3, at time k-1, propagate the grayed picture input data through the input-to-hidden-layer weights and the hidden-to-output-layer weights to obtain the output value of the output layer, and update these weights for time k.
Step 2-2-1-4, set a total-error threshold for stopping training, and judge whether the total error of the obtained predicted values is greater than this threshold. If so, adjust the hidden-to-output-layer weights and the input-to-hidden-layer weights according to the total error; otherwise, the CNN training is finished.
Step 2-2-2, use the trained CNN to predict the output values for children's characters, strokes and foreign-language letters.
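As a companion sketch, the threshold-stopped training procedure of steps 2-1-1-3/2-2-1-3 and 2-1-1-4/2-2-1-4 might look as follows, reusing the `model` from the previous sketch (or its three-hidden-layer ReLU variant from step 2-2-1-1); the batch of grayed pictures, the labels and the 0.05 error threshold are placeholders.

```python
import numpy as np
import tensorflow as tf

x_train = np.random.rand(32, 48, 48, 1).astype("float32")   # placeholder grayed input pictures
y_train = np.random.randint(0, 7, size=32)                  # placeholder class labels

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)      # learning rate in [0, 1]
ERROR_THRESHOLD = 0.05                                      # preset total-error stopping threshold

total_error = float("inf")
while total_error > ERROR_THRESHOLD:                        # stop once the total error is small enough
    with tf.GradientTape() as tape:
        predictions = model(x_train, training=True)         # output value of the output layer
        loss = loss_fn(y_train, predictions)                # total error of the predicted values
    # adjust the input-to-hidden and hidden-to-output weights from the error
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    total_error = float(loss)
```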
step 2-3, realizing a filtering module of the child insert words;
Step 2-3-1, label the text data and divide it into positive and negative texts: positive samples are normal texts, and negative samples are inserted-word texts.
Step 2-3-2, monitor the sound in the environment. If there is no sound, continue monitoring; otherwise intercept the sound, using endpoint detection based on short-time energy and short-time zero-crossing rate, and perform speech recognition on the intercepted sound to obtain its corresponding text.
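A minimal sketch of the short-time-energy and zero-crossing-rate endpoint detection named in step 2-3-2 is shown below; the frame length, hop and thresholds are illustrative assumptions.

```python
import numpy as np

def endpoint_detect(signal, frame_len=400, hop=160,
                    energy_thresh=1e-3, zcr_thresh=0.25):
    """Flag each frame as speech or silence from its energy and zero-crossing rate."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))                        # short-time energy
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)  # short-time zero-crossing rate
        # voiced speech shows high energy and unvoiced speech a high
        # zero-crossing rate, so either test marks the frame as speech
        flags.append(energy > energy_thresh or zcr > zcr_thresh)
    return flags

audio = np.random.randn(16000).astype("float32")   # placeholder: 1 s of 16 kHz audio
speech_flags = endpoint_detect(audio)              # intercept the flagged span for recognition
```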
Step 2-3-3, build an SVM model for binary classification of the text data, specifically comprising the following steps:
Step 2-3-3-1, perform word segmentation on all training documents and represent the texts using the segmented words as the dimensions of vectors.
Step 2-3-3-2, count all words appearing in the documents of each class and their frequencies, then filter out stop words and single characters.
Step 2-3-3-3, count the total word frequency of the words appearing in each category, and take the highest-frequency words as that category's feature word set.
Step 2-3-3-4, remove words that appear in every category, and merge the feature word sets of all categories into a total feature word set; the resulting set is the feature set, which is used to screen features in the test set.
Step 2-3-3-5, train the SVM with the screened features to obtain the trained model.
Step 2-3-4, use the trained SVM to predict the output value for the child's utterance: if the SVM predicts a positive (normal) text, respond to it; otherwise, abandon the response.
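A minimal end-to-end sketch of this filtering module, using jieba for word segmentation and scikit-learn for the SVM; the toy texts, labels and feature-set size are placeholder assumptions (label 1 marks a normal positive text, label 0 an inserted-word negative text).

```python
import jieba                                           # Chinese word segmentation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

texts = ["给我讲一个白雪公主的故事", "呃 嗯 啊", "我想练习写字", "呀 哦 呃"]  # placeholder corpus
labels = [1, 0, 1, 0]                                  # 1 = normal text, 0 = inserted-word text

def segment(text):
    return " ".join(jieba.cut(text))                   # step 2-3-3-1: word segmentation

# steps 2-3-3-2 to 2-3-3-4 (frequency counting, filtering, keeping the
# highest-frequency words as the feature set) are approximated here by
# CountVectorizer's vocabulary cap; the token pattern keeps single characters
vectorizer = CountVectorizer(max_features=1000, token_pattern=r"(?u)\b\w+\b")
features = vectorizer.fit_transform(segment(t) for t in texts)

svm = SVC(kernel="linear")
svm.fit(features, labels)                              # step 2-3-3-5: train the SVM

# step 2-3-4: respond only when the SVM predicts a positive (normal) text
utterance = vectorizer.transform([segment("呃 嗯")])
print("respond" if svm.predict(utterance)[0] == 1 else "abandon the response")
```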
step 2-4, implementing a child pronunciation evaluation and correction module;
Step 2-4-1, the early education machine selects a problem word, pronounces it correctly, and at the same time prompts the pronunciation method and technique for the word.
Step 2-4-2, the child reads the word after the early education machine, and the machine evaluates the child's pronunciation and judges whether it is correct.
Step 2-4-3, if the child's pronunciation is correct, the pronunciation teaching of this problem word ends; otherwise, repeat steps 2-4-1 and 2-4-2 until the pronunciation is correct.
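The correction loop of steps 2-4-1 to 2-4-3 can be sketched as follows; every helper here is a hypothetical stub standing in for the machine's text-to-speech, recording and pronunciation-evaluation components (none of these names come from the patent).

```python
def play_correct_pronunciation(word):     # stub for the machine's accurate reading (step 2-4-1)
    print(f"machine reads aloud: {word}")

def prompt_pronunciation_tips(word):      # stub prompting pronunciation method and technique
    print(f"machine prompts mouth shape and technique for: {word}")

def record_child_reading():               # stub recorder for the child's follow-up reading
    return b""                            # placeholder audio bytes

def evaluate_pronunciation(word, audio):  # stub evaluator judging the pronunciation (step 2-4-2)
    return True                           # placeholder verdict

def teach_problem_word(problem_word, max_rounds=5):
    """Steps 2-4-1 to 2-4-3: read, follow-read, evaluate, repeat until correct."""
    for _ in range(max_rounds):
        play_correct_pronunciation(problem_word)
        prompt_pronunciation_tips(problem_word)
        audio = record_child_reading()
        if evaluate_pronunciation(problem_word, audio):
            return True                   # step 2-4-3: pronunciation correct, teaching ends
    return False                          # still incorrect after max_rounds (bounded here)

teach_problem_word("狮子")                 # placeholder problem word
```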
The invention makes full use of mature image recognition, voice interaction and text processing methods from modern artificial intelligence, and turns the traditional early education machine into an intelligent one with functions such as facial expression recognition and human-machine voice interaction. Bringing artificial intelligence into the field of children's early education machines adds fun to the early education process and improves children's learning efficiency.
Face recognition and expression recognition from the field of artificial intelligence are applied to the intelligent early education machine: the machine can recognize different owners and capture the child's current emotion. When it captures an angry expression, it can tell a joke to cheer the child up; when it captures a confused expression, it can explain the current teaching content carefully and repeatedly until the child learns and understands. Handwritten character recognition likewise gives the machine the ability to teach children writing and stroke order.
Voice interaction and text processing methods are applied to the intelligent early education machine, so that its interaction mode is no longer the traditional button or point-and-read style but the most common and convenient mode: human-machine conversation. Text processing simply filters out the filler words of children at the babbling stage, i.e. inserted words such as "uh", "um", "oh" and "eh" uttered during conversation with the machine, reducing erroneous responses during question-and-answer between the machine and the child.
Speech evaluation technology is applied to the intelligent early education machine: intelligent speech technology automatically evaluates the child's pronunciation level and locates and analyzes pronunciation errors and defects. During conversations between the child and the machine, the machine evaluates the child's pronunciation level, monitors pronunciation errors, locates pronunciation defects, records the mispronounced words, and then analyzes the problems. A pronunciation practice part can be added to the teaching process: the machine reads the mispronounced words aloud to the child, and the child reads after it, thereby correcting the child's pronunciation.
A CNN realizes facial expression recognition and handwritten character recognition, an SVM realizes simple text processing, and speech technology realizes the voice interaction and speech evaluation functions.
Drawings
Fig. 1 is a schematic diagram of a convolutional neural network structure.
Fig. 2 is a schematic diagram of the expression recognition application of the intelligent early education machine.
Fig. 3 is a schematic diagram of the intelligent early education machine teaching children writing and writing order.
Fig. 4 is a flow chart of a voice interaction and a child insertion filtering process between a child and an intelligent early education machine.
Fig. 5 is a flow chart of the intelligent early education machine recording the mispronunciation words.
Fig. 6 is a flowchart of pronunciation correction and teaching of the intelligent early education machine.
Detailed Description
The following detailed description is given to specific embodiments with reference to the accompanying drawings.
As shown in fig. 1, when a child lights up the screen of the early education machine, the machine starts face recognition and matches the result against the faces in the database. If the match succeeds, the child is allowed into the system; otherwise, face registration can be performed, and the registered face is added to the face database so that the next match succeeds. Because the machine can identify the current user, it can call up different databases for different users and thus teach each child individually.
As shown in fig. 4, after the child enters the system, the voice instruction "tell me the story of Snow White" takes the machine into the story-telling submodule, where it searches for the keyword "Snow White". In this process, the machine screens the voice instruction and filters out children's inserted words such as "uh", "yah" and "oh", reducing the false-response rate to voice instructions. Meanwhile, as shown in fig. 5, the child's pronunciation level can be monitored and evaluated: unqualified pronunciations undergo error and defect localization and problem analysis and are recorded in the database, and the child's pronunciation is then trained in a targeted way in the pronunciation training submodule; the specific process is shown in fig. 6.
During teaching, the machine monitors the child's expression in real time. If the child looks confused while answering and the answering time runs out, the machine concludes that the child has hit a point of confusion or difficulty, and it analyzes and explains the question in detail; otherwise, the answering process ends automatically. The specific process is shown in fig. 2.
When the child enters the stroke and stroke-order practice submodule through the voice command "writing practice", the machine decomposes characters into strokes and stroke orders, and the child practices these separately. After the practice, the machine presents an arbitrary Chinese character for the child to write. If every stroke and the stroke order of the whole writing process are correct, the machine judges the writing correct; otherwise it points out the error and demonstrates the correct way to write the character, and the child writes repeatedly until correct. The specific process is shown in fig. 3.
SVM filtering of children's inserted words:
Children's daily conversations were recorded to obtain 1000 text samples (normal texts and meaningless inserted-word texts, 50% each). The 1000 human-machine dialog texts were numbered 1 to 1000; texts 1-800 serve as training texts and texts 801-1000 as test texts.
An inserted-text filtering step is built with the SVM model. Training and testing with an SVM implemented in Python yields a comparison of the true values of the human-machine dialog texts against the SVM decisions, where "1" denotes a normal child text and "0" denotes an inserted-word text, as shown in the following table:
[Table: true values of the human-machine dialog texts versus the SVM decisions; image not reproduced.]
As the table shows, the intelligent early education machine answers only the child utterances judged "1" by the SVM screening step. Experiments show that the machine, which originally did not screen inserted-word texts at all, now identifies them with 98.8% accuracy. In short, during dialogue with children the intelligent early education machine filters out meaningless request texts, reducing its error response rate.
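For reference, the 800/200 evaluation protocol described above could be run as sketched below, reusing `segment` from the filtering-module sketch; the corpus here is a toy stand-in, since the 1000 recorded dialog texts are not reproduced in the patent.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

all_texts = ["给我讲一个白雪公主的故事", "呃 嗯 啊", "我想练习写字", "呀 哦 呃"] * 250  # toy stand-in corpus
all_labels = [1, 0, 1, 0] * 250                      # 50% normal, 50% inserted-word texts

vectorizer = CountVectorizer(max_features=1000, token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(segment(t) for t in all_texts)

X_train, y_train = X[:800], all_labels[:800]         # texts 1-800: training
X_test, y_test = X[800:], all_labels[800:]           # texts 801-1000: testing

svm = SVC(kernel="linear").fit(X_train, y_train)
print(f"screening accuracy on the test texts: {accuracy_score(y_test, svm.predict(X_test)):.1%}")
```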

Claims (1)

1. A method applied to an interactive system of an intelligent early education machine for children, characterized in that the system comprises a facial expression recognition module; a recognition module for handwritten characters, strokes and foreign-language letters; a filtering module for children's inserted words; and a child pronunciation evaluation and correction module;
the facial expression recognition module realizes facial expression recognition through a CNN, judges the type of the expression, and selects the teaching action corresponding to that expression, so that the early education machine can assist in teaching;
the recognition module for handwritten characters, strokes and foreign-language letters realizes their recognition through a CNN and detects whether a handwritten character or letter and its stroke order are correct, so that the early education machine can teach children to write characters and foreign-language letters correctly;
the filtering module for children's inserted words judges a text containing inserted words to be a negative text when the input speech contains inserted words, and abandons the response to that text;
the child pronunciation evaluation and correction module works on the mispronounced or defective words recorded during the child's daily conversations with the early education machine: the machine reads such a word correctly, and the child then reads it after the machine;
the method of the interactive system comprises the following steps:
step 1, establishing databases: database A for facial expressions, handwritten characters, strokes and foreign-language letters; database B for children's daily conversation texts and inserted words; and database C, a problem-pronunciation database built by locating and collecting mispronounced or defective pronunciations from children's daily dialogue;
step 1-1, collecting data: acquire a children's facial expression data set, handwritten pictures of simple characters and strokes, and handwritten pictures of foreign-language letters;
step 1-2, classify and label the expression data, label the handwritten characters and establish the database, label the handwritten stroke pictures, and label the handwritten foreign-letter pictures;
step 2, specific implementation of the functional modules, comprising the facial expression recognition module, the recognition module for handwritten characters, strokes and foreign-language letters, the filtering module for children's inserted words, and the child pronunciation evaluation and correction module;
step 2-1, realizing a facial expression recognition module:
step 2-1-1, perform grayscale processing on the pictures and input them into the CNN for training; the specific steps are as follows:
step 2-1-1-1, establish the CNN model and set the network structure: the CNN comprises an input layer, two hidden layers, a fully connected layer and an output layer; each hidden layer comprises a convolutional layer and a sub-sampling layer; the convolutional layers use the sigmoid activation function and provide local perception and parameter sharing; the sub-sampling layers reduce visual redundancy and the number of network parameters; the fully connected layer maps the learned distributed feature representation to the sample label space; the output layer, i.e. the classifier, uses softmax regression; the network comprises input-layer nodes, hidden-layer nodes and output-layer nodes;
step 2-1-1-2, initialize the CNN parameters: the connection weights and thresholds are initialized to random values in the interval [-0.30, +0.30]; the excitation function of the hidden layers is set, and the learning rate of the weights is set to a value in the interval [0, 1];
step 2-1-1-3, at time k-1, propagate the grayed picture input data through the input-to-hidden-layer weights and the hidden-to-output-layer weights to obtain the output value of the output layer, and update these weights for time k;
step 2-1-1-4, set a total-error threshold for stopping training, and judge whether the total error of the obtained predicted values is greater than this threshold; if so, adjust the hidden-to-output-layer weights and the input-to-hidden-layer weights according to the total error; otherwise, the CNN training is finished;
step 2-1-2, use the trained CNN to predict the output values for children's expression pictures;
step 2-2, realizing a recognition module of handwritten characters, strokes and foreign letters;
step 2-2-1, perform grayscale processing on the pictures and input them into the CNN for training; the specific steps are as follows:
step 2-2-1-1, establish the CNN model and set the network structure: the CNN comprises an input layer, three hidden layers, a fully connected layer and an output layer; each hidden layer comprises a convolutional layer and a sub-sampling layer; the convolutional layers use the ReLU activation function and provide local perception and parameter sharing; the sub-sampling layers reduce visual redundancy and the number of network parameters; the fully connected layer maps the learned distributed feature representation to the sample label space; the output layer is a classifier using softmax regression;
step 2-2-1-2, initialize the CNN parameters: the connection weights and thresholds are initialized to random values in the interval [-0.30, +0.30]; the excitation function of the hidden layers is set, and the learning rate of the weights is set to a value in the interval [0, 1];
step 2-2-1-3, at time k-1, propagate the grayed picture input data through the input-to-hidden-layer weights and the hidden-to-output-layer weights to obtain the output value of the output layer, and update these weights for time k;
step 2-2-1-4, set a total-error threshold for stopping training, and judge whether the total error of the obtained predicted values is greater than this threshold; if so, adjust the hidden-to-output-layer weights and the input-to-hidden-layer weights according to the total error; otherwise, the CNN training is finished;
step 2-2-2, use the trained CNN to predict the output values for children's characters, strokes and foreign-language letters;
step 2-3, realizing a filtering module of the child insert words;
step 2-3-1, label the text data and divide it into positive and negative texts: positive samples are normal texts, and negative samples are inserted-word texts;
step 2-3-2, monitor the sound in the environment: if there is no sound, continue monitoring; otherwise intercept the sound, using endpoint detection based on short-time energy and short-time zero-crossing rate, and perform speech recognition on the intercepted sound to obtain its corresponding text;
step 2-3-3, build an SVM model for binary classification of the text data, specifically comprising the following steps:
step 2-3-3-1, perform word segmentation on all training documents and represent the texts using the segmented words as the dimensions of vectors;
step 2-3-3-2, count all words appearing in the documents of each class and their frequencies, then filter out stop words and single characters;
step 2-3-3-3, count the total word frequency of the words appearing in each category, and take the highest-frequency words as that category's feature word set;
step 2-3-3-4, remove words that appear in every category, and merge the feature word sets of all categories into a total feature word set; the resulting set is the feature set, which is used to screen features in the test set;
step 2-3-3-5, train the SVM with the screened features to obtain the trained model;
step 2-3-4, use the trained SVM to predict the output value for the child's utterance: if the SVM predicts a positive (normal) text, respond to it; otherwise, abandon the response;
step 2-4, implementing a child pronunciation evaluation and correction module;
step 2-4-1, the early education machine selects a problem word, pronounces it correctly, and at the same time prompts the pronunciation method and technique for the word;
step 2-4-2, the child reads the word after the early education machine, and the machine evaluates the child's pronunciation and judges whether it is correct;
step 2-4-3, if the child's pronunciation is correct, the pronunciation teaching of this problem word ends; otherwise, repeat steps 2-4-1 and 2-4-2 until the pronunciation is correct.
CN201810799639.9A 2018-07-20 2018-07-20 Interaction system and method applied to intelligent early education machine for children Active CN109062404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810799639.9A CN109062404B (en) 2018-07-20 2018-07-20 Interaction system and method applied to intelligent early education machine for children

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810799639.9A CN109062404B (en) 2018-07-20 2018-07-20 Interaction system and method applied to intelligent early education machine for children

Publications (2)

Publication Number Publication Date
CN109062404A CN109062404A (en) 2018-12-21
CN109062404B (en) 2020-03-24

Family

ID=64817601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810799639.9A Active CN109062404B (en) 2018-07-20 2018-07-20 Interaction system and method applied to intelligent early education machine for children

Country Status (1)

Country Link
CN (1) CN109062404B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108847238B (en) * 2018-08-06 2022-09-16 东北大学 Service robot voice recognition method
CN109841122A (en) * 2019-03-19 2019-06-04 深圳市播闪科技有限公司 A kind of intelligent robot tutoring system and student's learning method
CN111078098B (en) * 2019-05-10 2021-11-05 广东小天才科技有限公司 Dictation control method and device
CN111078025A (en) * 2019-07-29 2020-04-28 广东小天才科技有限公司 Method and terminal equipment for determining correctness of input Chinese characters
CN111326030A (en) * 2019-09-10 2020-06-23 西安掌上盛唐网络信息有限公司 Reading, dictation and literacy integrated learning system, device and method
WO2021087752A1 (en) * 2019-11-05 2021-05-14 山东英才学院 Paperless early education machine for children based on wireless transmission technology
CN111652316A (en) * 2020-06-04 2020-09-11 上海仙剑文化传媒股份有限公司 AR Chinese character recognition system based on multimedia application scene
CN112215175B (en) * 2020-10-19 2024-01-30 北京乐学帮网络技术有限公司 Handwritten character recognition method, device, computer equipment and storage medium
CN112927566B (en) * 2021-01-27 2023-01-03 读书郎教育科技有限公司 System and method for student to rephrase story content
CN114245194A (en) * 2021-12-23 2022-03-25 深圳市优必选科技股份有限公司 Video teaching interaction method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080166689A1 (en) * 2007-01-05 2008-07-10 Timothy Gerard Joiner Words
CN104778867B (en) * 2013-05-15 2016-05-25 张绪伟 Multifunctional children intercommunication early learning machine

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975804A (en) * 2006-12-15 2007-06-06 华南理工大学 Education robot with character-learning and writing function and character recognizing method thereof
CN103377568A (en) * 2013-06-20 2013-10-30 浙江大学软件学院(宁波)管理中心(宁波软件教育中心) Multifunctional child somatic sensation educating system
CN106228982A (en) * 2016-07-27 2016-12-14 华南理工大学 A kind of interactive learning system based on education services robot and exchange method
CN107123418A (en) * 2017-05-09 2017-09-01 广东小天才科技有限公司 The processing method and mobile terminal of a kind of speech message
CN107392109A (en) * 2017-06-27 2017-11-24 南京邮电大学 A kind of neonatal pain expression recognition method based on deep neural network
CN107346629A (en) * 2017-08-22 2017-11-14 贵州大学 A kind of intelligent blind reading method and intelligent blind reader system

Also Published As

Publication number Publication date
CN109062404A (en) 2018-12-21

Similar Documents

Publication Publication Date Title
CN109062404B (en) Interaction system and method applied to intelligent early education machine for children
CN110110585B (en) Intelligent paper reading implementation method and system based on deep learning and computer program
CN107993665B (en) Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
CN105845134B (en) Spoken language evaluation method and system for freely reading question types
CN107221318B (en) English spoken language pronunciation scoring method and system
US20070055523A1 (en) Pronunciation training system
CN106782603B (en) Intelligent voice evaluation method and system
EP0549265A2 (en) Neural network-based speech token recognition system and method
CN110797010A (en) Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
CN113657168B (en) Student learning emotion recognition method based on convolutional neural network
CN113592251B (en) Multi-mode integrated teaching state analysis system
CN109461441A (en) A kind of Activities for Teaching Intellisense method of adaptive, unsupervised formula
CN110675292A (en) Child language ability evaluation method based on artificial intelligence
CN111078010B (en) Man-machine interaction method and device, terminal equipment and readable storage medium
CN112560429A (en) Intelligent training detection method and system based on deep learning
Najeeb et al. Gamified smart mirror to leverage autistic education-aliza
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
CN112464664B (en) Multi-model fusion Chinese vocabulary repeated description extraction method
CN112700796B (en) Voice emotion recognition method based on interactive attention model
WO2012152290A1 (en) A mobile device for literacy teaching
Stoianov et al. Modelling the phonotactic structure of natural language words with Simple Recurrent Networks
CN111899581A (en) Word spelling and reading exercise device and method for English teaching
CN111950472A (en) Teacher grinding evaluation method and system
CN116631452B (en) Management system is read in drawing book record broadcast based on artificial intelligence
TWI833328B (en) Reality oral interaction evaluation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant