CN109559576B - Child accompanying learning robot and early education system self-learning method thereof - Google Patents

Info

Publication number
CN109559576B
Authority
CN
China
Prior art keywords
learning
new
neural network
layer
image
Prior art date
Legal status
Active
Application number
CN201811367002.9A
Other languages
Chinese (zh)
Other versions
CN109559576A (en)
Inventor
罗青
邹逸群
郭璠
唐琎
李凡
覃若彬
Current Assignee
Central South University
Original Assignee
Central South University
Priority date
Filing date
Publication date
Application filed by Central South University
Priority to CN201811367002.9A
Publication of CN109559576A
Application granted
Publication of CN109559576B

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/06 - Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 - Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a child accompanying learning robot and a self-learning method for its early education system, wherein the self-learning method comprises the following steps: step A10, training a convolutional neural network; step A20, extracting feature vectors from an input image with the convolutional neural network; step A30, grouping and quantizing the feature vectors by product quantization; step A40, generating a reference alphabet from the Imagenet data set; step A50, acquiring the image and category of an unknown new object, extracting the feature vector of the new object image, grouping and quantizing it, and looking up the matching new-object character string in the reference alphabet, then matching and connecting the character string of the new object with the category in the associative memory model, so that the new object is learned into the early education system; and step A60, acquiring the image of an object to be recognized, whose category the early education system then recognizes. The invention can learn new knowledge together with children and compete with them in learning, thereby raising children's interest in learning.

Description

Child accompanying learning robot and early education system self-learning method thereof
Technical Field
The invention relates to intelligent equipment, and in particular to a child accompanying learning robot and a self-learning method for its early education system.
Background
Preschool children are not yet fully developed intellectually or physically and need long-term adult companionship and care. Moreover, childhood is the sensitive period in which motor, language, mathematical and other abilities develop fastest, and its importance is self-evident, so parents must invest considerable effort and cost in accompanying and educating their children. Most existing early education systems offer only audio and video playback; although some allow simple voice or touch interaction, the interactive content must already exist in the system database. In other words, existing early education systems mainly provide an "on demand" function over existing material. Because they have no learning ability, they cannot learn new knowledge encountered in practical use that the system database does not cover, cannot achieve basic functions such as learning objects and counting together with children through lively teaching, and cannot satisfy the need to grow up together with children.
Disclosure of Invention
To address the technical problem that existing children's early education systems lack a self-learning function and therefore cannot learn new knowledge not covered by the system, the invention provides a self-learning method for the early education system of a child accompanying learning robot.
In order to achieve the above technical purpose, the invention adopts the following technical scheme:

A self-learning method of an early education system of a child accompanying learning robot comprises the following steps:
step A10, training a convolutional neural network;
constructing a convolutional neural network model, taking all sample images of the Imagenet data set as input, taking the types of the sample images as labels, and training a convolutional neural network;
step A20, extracting image characteristic information;
performing feature extraction on the input image by adopting the convolutional neural network obtained by training in the step A10, and outputting a feature vector;
step A30, grouping and quantizing the feature vectors;
grouping and quantizing the feature vectors by adopting a product quantization technology to form m sub-feature vectors;
step A40, generating a reference alphabet;
processing all sample images of the Imagenet data set according to the steps A20 and A30 to obtain m sub-feature vectors of each sample image;
for all sample images, sub-feature vectors with the same sequence are taken to form 1 grouped data set, and m grouped data sets are counted in total;
calculating each grouped data set with the K-means algorithm to obtain k_s cluster centers; recording the k_s class centers of each grouped data set as 1 class set, the m class sets together forming a reference alphabet;
presetting a reference alphabet in an input layer of an associative memory model;
step A50, learning new things;
acquiring images and categories of unknown new things from the outside of the early education system;
processing the acquired new object image according to the step A20 and the step A30 to obtain m sub-feature vectors of the new object; traversing m sub-feature vectors of the new object, and searching matched letters in a class set of a reference alphabet, which is the same as the current sub-feature vector sequence of the new object, to obtain a character string of the new object with the length of m;
activating nodes of each letter of a character string of the new object in an input layer of an associative memory model, wherein the activated nodes of the input layer are connected with nodes of an output layer representing the category of the new object in a matching manner by the associative memory model; the associative memory model is a binary neural network comprising an input layer and an output layer, and the nodes of the output layer are preset with object types;
step A60, identifying an object to be identified;
acquiring an image of an object to be identified from the outside of the early education system;
processing the acquired image of the object to be identified according to the step A20 and the step A30 to obtain m sub-feature vectors of the object to be identified; traversing m sub-feature vectors of the object to be recognized, and searching matched letters in a class set of a reference alphabet, which is the same as the current sub-feature vector sequence of the object to be recognized, so as to obtain an object to be recognized character string with the length of m;
and activating nodes of all letters of the character string of the object to be recognized in an input layer of the associative memory model, searching nodes of an output layer matched and connected with the activated nodes of the input layer by the associative memory model, outputting object types corresponding to the nodes of the output layer, and recognizing to obtain the types of the object to be recognized.
For new objects not yet in the early education system, provided an image of the new object is acquired and other children or a teacher tell the system its category, this scheme matches and connects the image of the new object with the category through the self-learning method, updating the matching relationship between object images and categories stored in the associative memory model, so that the child accompanying learning robot achieves incremental learning of objects. Thus, under the teaching of children or other people, new objects and new characters are learned and recognized, and new knowledge is learned together with children in friendly competition, thereby raising children's interest in learning.
Further, the output layer of the associative memory model allocates at least 2 nodes for each object category; when learning a new thing, the output layer node that is connected in match with the active input layer node is the first node under the current thing category that is not connected to the input layer node.
Providing several nodes for each thing category in the output layer of the associative memory model allows things of the same category to be learned multiple times, recording the connection relationships between several instances and the category and thereby improving recognition accuracy.
Further, the associative memory model is a two-layer neural network.
Further, the method for finding, in the m class sets of the reference alphabet, the letters matching the m sub-feature vectors of an object is as follows: compute the distances between the current sub-feature vector and the k_s class centers in the corresponding class set, take the 1 closest class center as 1 letter corresponding to the object, and obtain m letters from the m sub-feature vectors of the object, thereby obtaining the character string of length m corresponding to the object.
Further, the category of a thing and the real name of the thing are stored as key-value pairs in a txt-format file; when a new thing is learned, the real name of the new thing is acquired from outside, and the category corresponding to that real name is looked up in the key-value pair file; if no such category exists, a new key-value pair is inserted to represent the newly learned category, and the category of the thing is input into the associative memory model; when a thing to be recognized is recognized, the associative memory model outputs the category of the thing, and the early education system looks up the real name corresponding to that category in the key-value pair file and outputs it.
Storing key-value pairs in this way saves storage space.
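As an illustration, a minimal Python sketch of such a key-value store follows; the tab-separated "category / real name" layout and the file name are assumptions, since the patent only specifies a txt-format key-value file.

```python
# A minimal sketch of the txt-format key-value pair store described above.
# The tab-separated layout and the file name "categories.txt" are assumptions.

def load_pairs(path="categories.txt"):
    """Load category -> real-name pairs from the txt file."""
    pairs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            category, name = line.rstrip("\n").split("\t", 1)
            pairs[int(category)] = name
    return pairs

def add_pair(path, category, name):
    """Insert a new key-value pair for a newly learned category."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{category}\t{name}\n")
```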
Further, the convolutional neural network model is trained by using the cross entropy as a loss function, and the weight matrix of each layer of the convolutional neural network is updated according to the calculated value L of the loss function;
wherein the loss function is as shown in equation 1:
$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log y'_i \qquad (1)$$
where N denotes the number of sample images input to the convolutional neural network at a time, y_i is the true category label of the i-th sample image, and y'_i is the convolutional neural network's prediction for the i-th sample image.
Further, the convolutional neural network comprises an input layer, convolutional layers, skip connections, pooling layers, a fully connected layer and a classification layer; when training the convolutional neural network, the loss value is calculated from the predicted value output by the classification layer; when the convolutional neural network is used to extract image feature information, the features output by the fully connected layer form the feature vector of the image.
Corresponding to the above self-learning method of the early education system, the invention also provides a child accompanying learning robot, which comprises:
the function selection module is used for starting a new object learning function or an object identification function in the early education system function;
the storage module is used for storing the key-value pair file of thing categories and real names, and for storing the reference alphabet;
the information input module is used for acquiring the image and the real name of the new object when the learning function of the new object is started, or acquiring the image of the object to be identified when the identification function of the object is started;
the information processing module is used, when the new-thing learning function is started, for looking up the category corresponding to the acquired real name of the new thing in the key-value pair file, and then learning the new thing from its image and category according to the above method; when the thing recognition function is started, it recognizes the thing to be recognized from the acquired image according to the above method, and then looks up the real name corresponding to the recognized category in the key-value pair file;
and the information output module is used for outputting the real name of the object to be identified when the object identification function is started.
Further, the information input module comprises a camera and a voice input unit.
Further, the information output module comprises a display screen and a voice output unit.
Advantageous effects
The invention provides a child accompanying learning robot and a self-learning method for its early education system. For a new object that does not exist in the early education system, provided its image is acquired and other children or an instructor tell the system its real name or category, the scheme activates input-layer nodes of the associative memory model according to the character string obtained from the new object image after convolutional neural network feature extraction and product quantization, and connects them with the output-layer nodes of the corresponding category; the matching relationship between object images and categories stored in the associative memory model is updated, and the child accompanying learning robot achieves incremental learning of objects. Thus, under the teaching of children or other people, new objects, characters, arithmetic expressions and the like are learned and recognized, and new knowledge is learned together with children in friendly competition, thereby raising children's interest in learning.
Drawings
A specific embodiment of the invention will be described in detail hereinafter by way of example and not by way of limitation with reference to the accompanying drawings. The attached drawings are as follows:
fig. 1 is a schematic configuration diagram of a child companion robot according to an embodiment of the present invention;
fig. 2 is a block diagram showing the block configuration of a child companionship robot according to an embodiment of the present invention;
FIG. 3 is a schematic view of a child companion robot according to an embodiment of the present invention for recognizing objects, characters, and equations;
FIG. 4 is a schematic diagram of a learning framework of the learning module;
FIG. 5 is an exemplary diagram of alphabet generation in the learning module;
FIG. 6 is an exemplary diagram of the results of feature product quantization in the learning module;
FIG. 7 is a schematic diagram of a learning or recognition process of a companion robot;
FIG. 8 is a schematic diagram of a convolutional neural network model, in which (a) is a schematic diagram of a residual block, (b) is a schematic diagram of a convolutional layer, (c) is a schematic diagram of a max-pooling layer, and (d) is a schematic diagram of a fully connected layer;
FIG. 9 is a convolutional neural network training flow diagram;
FIG. 10 is an exemplary simplified neural network model of the present invention.
Reference numerals: 1-camera, 2-interactive function selection area, 3-voice receiving and output area and 4-display screen.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
The invention discloses a child companion robot whose structure is shown schematically in Fig. 1, comprising a camera 1, an interactive function selection area 2, a voice receiving and output area 3 and a display screen 4. The camera 1 is located at the top of the companion robot and is mainly used for acquiring image information for object recognition from the surrounding environment. The interactive function selection area 2 is located at the front of the robot's body; the user selects the robot's various functions interactively by clicking, dragging, sliding and so on in this area. The voice receiving and output area 3 is provided at the back of the robot's head and is mainly used for recognizing and receiving the user's voice from the surrounding environment, outputting reply voice in response to voice messages, and giving feedback sounds in response to system operations. The display screen 4 is positioned at the front of the robot's head and is mainly used for displaying the robot's response information and expressions. In this embodiment a touch screen serves as both the interactive function selection area and the display screen. For example, if the user answers a question correctly, a smiling face, or similar emotional expressions, may be displayed on the companion robot's display screen.
According to the structure of the child accompanying learning robot, a functional block diagram of an early education system is shown in fig. 2. The early education system of the child companion robot comprises: the device comprises a function selection module, a storage module, an information input module, an information processing module and an information output module.
The early education system of the child companion robot has multiple functions: learning objects, learning characters, counting, singing and so on. The user selects the desired function in the interactive function selection area, and the function selection module of the early education system starts it. The interactive interface of the function selection module can take the form of a touch screen, and functions can also conveniently be selected by voice command.
The early education system has a memory serving as the storage module, used for storing the parameters of the convolutional neural network, the reference alphabet learned from the Imagenet data set, the connection pattern of the binary associative memory model, the categories corresponding to the output nodes of the binary associative memory model (i.e., the category information of things, stored as key-value pairs), and so on.
And the information input module is used for inputting voice information of a user and image information for system identification, such as objects, texts, equations and the like extracted from the surrounding environment of the accompanying robot.
The information processing module mainly processes the voice and image information acquired by the information input module to complete the learning and recognition of target things in the input image; through its learning mechanism, the learning module enables the companion robot to acquire information about objective entities outside the system. Processing of voice information includes speech recognition and understanding; processing of image information includes image preprocessing, image feature extraction and image recognition. Two cases arise:

First, simple recognition: the system directly calls a model trained on the existing database to recognize the image and returns the recognition result; it can recognize common objects and characters and perform simple four-operation arithmetic.

Second, learning new things: for objects, characters and the like not in the system database, provided other children or instructors tell it the category or text information of the new thing, the accompanying learning robot updates the old model with the new data through the learning algorithm of the learning module, achieving incremental learning. After several rounds of learning the new things can be recognized, so that the robot learns and progresses together with the child user, raising children's interest in learning by accompanying their study.
The information output module comprises a voice output module and a display output module. The voice output module outputs the robot's response voice or the system's feedback voice, and plays music in the singing function. The display output module outputs images matching the voice information and response voice, the companion robot's expressions, warning information sent to the guardian, and so on.
The information processing module comprises a learning module and a warning module.
The warning module monitors the time the child spends interacting with the accompanying learning robot; when the child has not interacted with the robot for a certain period, it judges that the child may have slipped away for a long time, and promptly sends warning information to the guardian by voice prompt or on the display screen.
The learning module learns, in a self-learning manner, new things that do not exist in the early education system database, where new things means objects or text information. Using the learning module, the early education system of the invention can learn to recognize new objects, new characters and the like under others' teaching, learning new knowledge together with children, competing and progressing with them, and thereby raising children's interest in learning.
The early education system realizes its learning function through the learning module; it can serve as a child's playmate and, through education in entertainment, learn basic skills such as object recognition, characters and arithmetic together with the child, growing up alongside the child. The method therefore targets new objects or new characters for which no data existed when the recognition model was trained. A learning framework is constructed in the learning module of the system's information processing module, as shown in Fig. 4. The learning module combines a convolutional neural network (CNN) with a binary associative memory model: first a pre-trained convolutional neural network extracts features from a new sample; then product quantization maps the extracted features into a finite alphabet; finally the binary associative memory model stores the new sample, realizing the learning of new samples. During recognition and prediction, the binary associative memory model performs a nearest-neighbour search, realizing the recognition of new categories. The robot can thus learn new objects and text information in a self-learning manner, studying and growing together with children. As Fig. 4 shows, the learning framework mainly comprises the convolutional neural network (CNN) and the binary associative memory model. Convolutional neural networks have strong learning and expressive abilities; features extracted by a convolutional network pre-trained on the millions of images of the Imagenet data set represent an image well.
The invention discloses a self-learning method for the early education system of a child accompanying learning robot, comprising the following steps:

Step A10, training the convolutional neural network:

First, a convolutional neural network model is constructed. The model used in this embodiment is adapted from ResNet and uses 50 residual blocks in total. The structure of a residual block is shown in Fig. 8(a): each residual block comprises three convolutional layers and a skip connection; the convolutional layer is shown schematically in Fig. 8(b), and the skip connection effectively alleviates the vanishing-gradient problem during training. Every 10 residual blocks are followed by a max-pooling layer, shown in Fig. 8(c), which reduces the feature size and extracts the most salient features. Finally come the fully connected layers, shown in Fig. 8(d): the network used here has three fully connected layers in total, the first with 1024 nodes, the second with 2048 nodes and the last with 1000 nodes (the 1000 classes of the Imagenet data set); the result of the last fully connected layer passes through softmax to give the probabilities of the 1000 Imagenet classes. When the network is subsequently used to extract features, the output of the second fully connected layer is used as the feature.
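For illustration, the following PyTorch sketch shows a network of this shape. The three-convolution residual block with a skip connection, the 50-block layout with pooling every 10 blocks, and the 1024/2048/1000 fully connected sizes follow the description above; the channel width, kernel sizes and 224x224 input are assumptions, not the patent's exact configuration.

```python
# A hedged PyTorch sketch of the network shape described above, not the
# patent's exact model. Channel width, kernel sizes and the 224x224 input
# are assumptions; the block layout and fully connected sizes follow the text.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Three convolutional layers plus a skip connection (Fig. 8(a))."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 1),
        )

    def forward(self, x):
        # The skip connection adds the input to the block output,
        # alleviating vanishing gradients during training.
        return torch.relu(self.body(x) + x)

class EarlyEduCNN(nn.Module):
    def __init__(self, ch=64, num_classes=1000):
        super().__init__()
        layers = [nn.Conv2d(3, ch, 3, padding=1)]
        for _ in range(5):                        # 5 x 10 = 50 residual blocks
            layers += [ResidualBlock(ch) for _ in range(10)]
            layers.append(nn.MaxPool2d(2))        # max pooling after every 10 blocks
        self.backbone = nn.Sequential(*layers)
        self.fc1 = nn.Linear(ch * 7 * 7, 1024)    # 224 -> 7 after five 2x poolings
        self.fc2 = nn.Linear(1024, 2048)          # its output is the feature vector
        self.fc3 = nn.Linear(2048, num_classes)   # classification layer (1000 classes)

    def forward(self, x, return_features=False):
        h = self.backbone(x).flatten(1)
        feat = torch.relu(self.fc2(torch.relu(self.fc1(h))))
        if return_features:
            return feat               # d = 2048 feature used in steps A20-A30
        return self.fc3(feat)         # logits; softmax is applied in the loss
```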
The convolutional neural network model is then trained using cross entropy as the loss function, shown in equation 1, where y_i is the true category label of the i-th sample image and y'_i is the convolutional neural network's prediction for the i-th sample image. Training compares the predicted value with the true label and updates the weight matrix of each layer of the convolutional neural network according to the difference between them; in this embodiment, the cross entropy of equation 1 measures this difference, and the weight matrices of the model's layers are then updated according to the loss value.
$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log y'_i \qquad (1)$$
Where N is the batch size, i.e., the number of sample images fed into the training at one time.
Specifically, training the convolutional neural network comprises three parts: forward propagation, loss computation and back-propagation. Forward propagation passes the input image through the convolutional, pooling and fully connected layers to obtain the predicted value y'. Since the true label y of a sample is known, the loss value L can be computed from y and y' according to equation 1; gradient descent then back-propagates the loss and updates the weight matrices of the convolutional neural network.
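A minimal training step under these definitions might look as follows; the optimizer choice and learning rate are illustrative assumptions.

```python
# A minimal sketch of one training step: forward propagation, the
# cross-entropy loss of equation 1, back-propagation and a gradient-descent
# weight update. Optimizer choice and learning rate are assumptions.
import torch
import torch.nn as nn

model = EarlyEduCNN()                      # from the sketch above
criterion = nn.CrossEntropyLoss()          # softmax + cross entropy (equation 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(images, labels):
    optimizer.zero_grad()
    logits = model(images)                 # forward propagation
    loss = criterion(logits, labels)       # loss value L
    loss.backward()                        # back-propagation (chain rule)
    optimizer.step()                       # gradient-descent weight update
    return loss.item()
```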
FIG. 10 shows a neural network consisting of two fully connected layers, where i is the input, h the middle layer, o the output layer and w the weights. Forward propagation is computed as follows:
$$h_1 = w_1 i_1 + w_3 i_2 + w_5 i_3 \qquad (2)$$
$$h_2 = w_2 i_1 + w_4 i_2 + w_6 i_3 \qquad (3)$$
$$o_1 = w_7 h_1 + w_9 h_2 \qquad (4)$$
$$o_2 = w_8 h_1 + w_{10} h_2 \qquad (5)$$
can be known as o ═ o1,o2]Obtaining a predicted value of y ═ y 'after the activation of the softmax function'1,y'2]Specifically, the following formula is calculated:
Figure GDA0002522685670000072
Figure GDA0002522685670000081
suppose the true label of the sample is y ═ y1,y2]The loss values are as follows:
Figure GDA0002522685670000082
the above is the process of forward propagation and calculating the loss value, and the process of backward propagation and updating the weight parameter is described in detail below.
By the chain rule, the gradient of the loss value L with respect to the parameter w_1 is:

$$\frac{\partial L}{\partial w_1} = \left(\frac{\partial L}{\partial o_1}\frac{\partial o_1}{\partial h_1} + \frac{\partial L}{\partial o_2}\frac{\partial o_2}{\partial h_1}\right)\frac{\partial h_1}{\partial w_1} \qquad (9)$$

The gradients of the other weight parameters are computed similarly, and the parameters are updated by:

$$w_i \leftarrow w_i - \alpha\,\frac{\partial L}{\partial w_i} \qquad (10)$$
Here α is the learning rate. The above completes one round of learning of the neural network's weight parameters. With continued learning, the loss value L decreases steadily until it converges to some value or training ends; this completes the training of the convolutional neural network and yields the optimized weight matrices. The convolutional neural network with these weight matrices is the trained convolutional neural network; when it is subsequently used to output feature vectors, the feature vector is output by the fully connected layer preceding the classification layer.
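The derivation above can be checked numerically; the following NumPy sketch implements the toy network of Fig. 10 with assumed random weights and an assumed one-hot label.

```python
# A runnable NumPy check of equations (2)-(10) for the toy two-layer
# network of Fig. 10 (3 inputs, 2 hidden units, 2 outputs, no biases).
# The random weights and the one-hot label are assumptions.
import numpy as np

rng = np.random.default_rng(0)
i = rng.normal(size=3)                  # input [i1, i2, i3]
W1 = rng.normal(size=(2, 3))            # w1..w6 as a 2x3 matrix
W2 = rng.normal(size=(2, 2))            # w7..w10 as a 2x2 matrix
y = np.array([1.0, 0.0])                # assumed true label [y1, y2]
alpha = 0.1                             # learning rate

h = W1 @ i                              # equations (2)-(3)
o = W2 @ h                              # equations (4)-(5)
y_pred = np.exp(o) / np.exp(o).sum()    # softmax, equations (6)-(7)
L = -(y * np.log(y_pred)).sum()         # cross-entropy loss, equation (8)

# Back-propagation: for softmax with cross entropy, dL/do = y_pred - y;
# the chain rule of equation (9) carries this back to every weight.
d_o = y_pred - y
d_W2 = np.outer(d_o, h)                 # dL/dW2
d_h = W2.T @ d_o
d_W1 = np.outer(d_h, i)                 # dL/dW1

W1 -= alpha * d_W1                      # parameter update, equation (10)
W2 -= alpha * d_W2
```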
Step A20, extracting image feature information: the trained convolutional neural network performs feature extraction on the input image, forming a feature vector of dimension d. The extracted feature vector is the output of the fully connected layer preceding the classification layer of the convolutional neural network.
Step A30, grouping and quantizing the feature vectors: the d-dimensional feature vector is grouped and quantized by product quantization into m sub-feature vectors, each of dimension d/m.
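The grouping itself is a simple split of the feature vector, as in the following sketch (d is assumed divisible by m):

```python
# A one-line sketch of step A30: split a d-dimensional feature vector into
# m sub-feature vectors of dimension d/m (d is assumed divisible by m).
import numpy as np

def split_feature(x, m):
    """Return the m sub-feature vectors of feature vector x."""
    return np.split(np.asarray(x), m)

# e.g. a d = 2048 CNN feature split into m = 4 sub-vectors of dimension 512
subs = split_feature(np.zeros(2048), m=4)
```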
Step A40, generating a reference alphabet:

Acquire the n sample images of the Imagenet data set and process each one according to steps A20 and A30, so that each of the n sample images yields m sub-feature vectors.

For all sample images, take the sub-feature vectors with the same sequence number: the same-numbered sub-feature vectors of the n sample images form 1 grouped data set, and since each sample has m sub-feature vectors there are m grouped data sets in total. "Same sequence number" means that the m sub-feature vectors of each sample image are numbered in order, and the sub-feature vectors bearing a given number are taken from every sample image to form the grouped data set with that number. For example, the 1st sub-feature vector of each sample image is taken, and the 1st sub-feature vectors of all sample images form the 1st grouped data set; then the 2nd sub-feature vectors of all sample images form the 2nd grouped data set; and so on, until the m grouped data sets are obtained.
The K-means algorithm computes k_s class centers for each grouped data set; the k_s class centers of a grouped data set are recorded as 1 class set, and the m class sets together form the reference alphabet.
In the example shown in Fig. 5, assume there are n samples in total, the extracted feature dimension is d = 12, the number of groups is m = 4, and the number of class centers per group is k_s = 4. The quantization result is shown in Fig. 6: the resulting alphabet comprises 4 letter sets (i.e., the m class sets, j = 1, 2, 3, 4), each containing 4 letters (i.e., the k_s class centers, denoted C_ij for the j-th class center of the i-th group of sub-feature vectors). The alphabet obtained from the Imagenet data set is used as the reference alphabet; Imagenet contains thousands of classes and millions of pictures, so the letters learned from this data effectively summarize image features, improving the recognition rate of things.
The reference alphabet is then preset in the input layer of the associative memory model.
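A hedged sketch of step A40 follows, using scikit-learn's KMeans for illustration; the patent does not prescribe a particular K-means implementation.

```python
# A sketch of step A40: run K-means independently on each of the m grouped
# data sets and collect the k_s class centers of each group into a class
# set; the m class sets form the reference alphabet.
import numpy as np
from sklearn.cluster import KMeans

def build_reference_alphabet(features, m, k_s):
    """features: (n, d) array of CNN features of all sample images.
    Returns m arrays of shape (k_s, d/m): one class set per group."""
    groups = np.split(features, m, axis=1)   # i-th sub-vectors of all samples
    return [KMeans(n_clusters=k_s, n_init=10).fit(g).cluster_centers_
            for g in groups]

# e.g. with d = 12, m = 4, k_s = 4 as in the Fig. 5 example
alphabet = build_reference_alphabet(np.random.rand(1000, 12), m=4, k_s=4)
```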
Step A50, learning new things:

When the early education system starts the new-thing learning function, as shown in Fig. 4, the companion robot acquires through its camera an image of a new object unknown to the system, and another child or an instructor tells the robot system the real name of the unknown new object by voice.

The acquired new object image is processed according to step A20 to obtain the feature vector x_d of the new object, and then according to step A30 to obtain the m sub-feature vectors x_d^1, x_d^2, ..., x_d^m, where the superscript i in x_d^i denotes the sequence number of the sub-feature vector.

Among the k_s class centers of the i-th class set of the reference alphabet, the letter best matching the i-th sub-feature vector of the new object is sought, until all m sub-feature vectors of the new object have found their best-matching letters, yielding a new object character string of length m. Since each class set is obtained from the sample sub-feature vectors with the same sequence number, the matched letter is looked up in the class set with the same sequence number when learning a new thing.
The best-matching letter is found as follows: compute the distances between the i-th sub-feature vector and the k_s class centers of the i-th class set, and take the class center at minimum distance as the 1 letter corresponding to the new object; the m sub-feature vectors of the new object thus yield m letters, giving the character string of length m that corresponds to the new object and is output to the associative memory model. For example, in Fig. 7 the 1st sub-feature vector finds C_11 as the closest of its 4 class centers, the 2nd sub-feature vector finds C_23, the 3rd finds C_34 and the 4th finds C_42, so the character string obtained by product quantization of the new object is C11 C23 C34 C42.
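This nearest-class-center lookup can be sketched as follows; representing a letter C_ij by the index pair (i, j) is an implementation assumption.

```python
# A sketch of the string lookup: each sub-feature vector is replaced by
# its nearest class center in the class set of the same sequence number.
# Representing a letter C_ij as the index pair (i, j) is an assumption.
import numpy as np

def encode_string(x, alphabet):
    """x: d-dim feature vector; alphabet: list of m (k_s, d/m) arrays.
    Returns the length-m string as (group, letter) index pairs."""
    m = len(alphabet)
    string = []
    for i, sub in enumerate(np.split(np.asarray(x), m)):
        dists = np.linalg.norm(alphabet[i] - sub, axis=1)  # distance to each center
        string.append((i, int(np.argmin(dists))))          # nearest letter
    return string

# e.g. [(0, 0), (1, 2), (2, 3), (3, 1)] corresponds (with 0-based indices)
# to the string C11 C23 C34 C42 of the Fig. 7 example
```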
The category corresponding to the real name of the new object is looked up in the txt-format key-value pair file stored in the storage module, and the node corresponding to that category is found in the output layer of the associative memory model. If the category is not found in the key-value pair file, the thing has not been learned before, and a new key-value pair is inserted into the file.
Each letter of the new object character string activates a corresponding node in an input layer of the associative memory model, and the associative memory model connects the input layer node activated by the new object character string with an output layer node representing a new object category in a matching manner.
In this embodiment, the associative memory model is a binary neural network including an input layer and an output layer, and the nodes of the output layer are preset with object types.
The output layer of the associative memory model allocates at least 2 nodes for each object category. When a new object is learned, the output-layer node matched and connected with the activated input-layer nodes is the first node under the current category that is not yet connected to input-layer nodes; once all the nodes of a category have been allocated, objects of that category have been fully learned into the system. By learning objects of the same category several times and recording the connection relationships between several instances and the category, recognition accuracy is improved.
The associative memory model in this embodiment is implemented as a two-layer neural network: the input layer consists of m groups of nodes, each group containing k_s nodes, and stores the k_s class centers of each of the m class sets of the reference alphabet; the output layer comprises R·C nodes, where C is the total number of new-object categories that can be learned and R is the number of nodes the associative memory model allocates to each category (C = 9 and R = 2 in Fig. 7). In practice R and C can take very large values, so incremental learning over very many categories and large data volumes is achieved with only a small storage requirement.
For example, in Fig. 7 the letters C11, C23, C34, C42 obtained by product quantization of the new thing to be learned are input into the associative memory model and activate the corresponding nodes of the input layer (since the reference alphabet is preset in the input-layer nodes, "corresponding nodes" means the input-layer nodes that store the letters C11, C23, C34 and C42 respectively). The associative memory model assigns the new-object category (assume category "6") nodes at the output layer and activates the first unconnected node of category 6, namely η_61; then the input-layer nodes C11, C23, C34, C42 are each matched and connected with the new-object category node η_61. Since the associative memory model in the invention is a binary memory model, there are no connection weights; a connection either exists or it does not. When all R nodes of a category have been allocated, objects of that category have been fully learned into the system, thereby realizing the learning of new samples.
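The binary associative memory can be sketched as a boolean connection matrix; the node numbering and the per-category bookkeeping below are illustrative assumptions.

```python
# A hedged sketch of the binary associative memory model: a boolean matrix
# over m*k_s input nodes and R*C output nodes. Node numbering and the
# per-category bookkeeping are assumptions for illustration.
import numpy as np

class BinaryAssociativeMemory:
    def __init__(self, m, k_s, num_categories, nodes_per_category):
        self.k_s = k_s
        self.R, self.C = nodes_per_category, num_categories
        # conn[p, q] is True iff input node p is connected to output node q;
        # binary connections only, no weights
        self.conn = np.zeros((m * k_s, self.R * self.C), dtype=bool)
        self.used = np.zeros(self.C, dtype=int)  # output nodes allocated per category

    def _input_nodes(self, string):
        # letter (i, j) occupies input node i*k_s + j (group i holds k_s nodes)
        return [i * self.k_s + j for i, j in string]

    def learn(self, string, category):
        """Connect the activated input nodes to the first unconnected
        output node of the given category (step A50)."""
        if self.used[category] >= self.R:
            return                               # category fully learned
        out = category * self.R + self.used[category]
        self.conn[self._input_nodes(string), out] = True
        self.used[category] += 1

    def recognize(self, string):
        """Nearest-neighbour search: return the category whose output node
        matches the most activated input nodes (step A60)."""
        votes = self.conn[self._input_nodes(string)].sum(axis=0)
        return int(np.argmax(votes)) // self.R
```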
The companion robot's learning of new things resembles a child's own learning process: with repeated learning, its recognition accuracy gradually improves, creating a situation of learning and progressing together with the child. Accompanying the child's study in the role of a classmate, rather than teaching in the role of a teacher, dispels children's fear and weariness of learning and raises their interest in it.
Existing systems store content such as objects and characters internally in various forms and call it up when needed, so they lack physical objects. The things recognized by the accompanying learning robot are external objective entities: their images are acquired by the camera using computer vision, objects, characters and arithmetic expressions outside the system are recognized through self-learning, the names of things are spoken aloud, and the results of characters and expressions are read out, which improves on the interest and realism of existing systems.
Step A60, recognizing the object to be recognized:

When the object recognition function of the early education system is started, the companion robot acquires an image of the object to be recognized from outside the system through the camera.

The acquired image of the object to be recognized is processed according to step A20 to obtain its feature vector x_d, and then according to step A30 to obtain its m sub-feature vectors x_d^1, x_d^2, ..., x_d^m, where the superscript i denotes the sequence number of the sub-feature vector.

Among the k_s class centers of the i-th class set of the reference alphabet, the letter best matching the i-th sub-feature vector of the object to be recognized is sought, until all m sub-feature vectors have found their best-matching letters, yielding a character string of length m for the object to be recognized. The best-matching letter is found as follows: compute the distances between the i-th sub-feature vector and the k_s class centers, take the class center at minimum distance as the 1 letter corresponding to the object to be recognized, and obtain m letters from its m sub-feature vectors, giving the length-m character string that is output to the associative memory model.
Each letter of the character string of the object to be recognized activates the corresponding node of the associative memory model's input layer; the associative memory model searches for the output-layer nodes matched and connected with the activated input-layer nodes and outputs the object category corresponding to those output-layer nodes.
The real name corresponding to the category of the object to be recognized is looked up in the txt-format key-value pair file stored in the storage module, and the child companion robot outputs the real name through the display screen or by voice.
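Putting the pieces together, recognition can be sketched end to end as follows; every name here is an assumption carried over from the earlier sketches.

```python
# A hedged end-to-end sketch of step A60 composed from the sketches above.
# `extract_feature`, `alphabet`, `memory` and `names` (the category-to-name
# key-value mapping) are assumed to have been built as described earlier.

def recognize_thing(image):
    feat = extract_feature(image)             # step A20: CNN feature vector
    string = encode_string(feat, alphabet)    # step A30 + alphabet lookup
    category = memory.recognize(string)       # associative-memory search
    return names.get(category, "unknown")     # real name from key-value file
```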
In the present invention, things may take various forms such as objects, characters and arithmetic expressions, and the category of a thing refers to the code assigned to each distinct thing. When the thing is an object, its real name is the object's Chinese name, pinyin and/or English name; when the thing is a character, its real name is the character's pinyin, definition and so on; when the thing is an arithmetic expression, its real name is the result of the calculation.
Fig. 3 illustrates the child companion robot of the invention recognizing objects, characters and arithmetic expressions. As shown in Fig. 3, when the companion robot sees a thing through the overhead camera, it automatically recognizes it and tells the user its name. For example, with the early education system in the object recognition function, when the companion robot sees an apple placed in front of it, it automatically recognizes that the object is an apple and the display screen actively shows the object's name; alternatively, it gives the answer when the user asks what it is. The real name can be output by voice, and the two characters for "apple", their pinyin, the English word, or both can be displayed on the screen. Similarly, in the character recognition function, when the companion robot sees a Chinese character placed in front of it (such as the character for "chess"), it automatically recognizes it and actively outputs its pinyin and definition, or answers when asked. In addition to recognizing objects and characters, the child companion robot can perform the four arithmetic operations of addition, subtraction, multiplication and division. For example, in the arithmetic function, when the companion robot sees a card reading "6 + 4" placed in front of it, or hears the user ask "what is 6 + 4?", it tells the user by voice that 6 + 4 = 10 while displaying "6 + 4 = 10" on the screen.
If the target of recognition is an object category already present in the Imagenet data set pictures or one already learned as a new thing, the companion robot recognizes it immediately, telling the user the thing's real name and related information as soon as it is recognized. Object categories present in the image data set are pre-learned into the associative memory model by the companion robot system, using the same learning method as the incremental learning of step A50. If the target has not been learned by the system, the companion robot can start the early education system's new-thing learning function; once the user tells it the new thing's category, it learns that category, so that it correctly recognizes the thing the next time it sees it.
For example, if the pictures of the original Imagenet data set contain data for pears, the child companion robot can immediately recognize a pear and tell the user that it is a pear. For an apple not in the initial training database, a parent or other instructor can tell the companion robot that it is an apple when its camera sees one, and the robot achieves incremental learning through the new-thing learning function, so that it recognizes the apple together with the child user the next time.
The companion robot also has a supervision function: it monitors the time since the child last interacted with it, and if the child has not interacted for a certain period it judges that the child may have slipped away for a long time, promptly sending warning information to the guardian by voice prompt or on the display screen to prevent unsafe situations; at the same time, the voice prompts and screen display attract children's interest, encouraging more interaction with the robot.
The child accompanying learning robot and child early education system apply robotics, image processing and pattern recognition to the companionship of preschool children, responding intelligently to the user's actions and states and giving a good user experience. Compared with existing early education robot systems, the system of the invention has the following advantages: 1) it has a learning function and can learn to recognize new objects and new characters under the guidance of relevant persons, achieving the goal of learning and growing together with children and saving parents time and energy in early education; 2) it positions itself as the child's classmate rather than the child's teacher, dispelling children's fear and weariness of learning; 3) it recognizes external objective entities from images acquired by the camera, giving better interest and realism; 4) it has a supervision function and can send warning information to the guardian when the child's behaviour is abnormal, for example if the child has been absent for a long time.
Accordingly, those skilled in the art should understand that, although exemplary embodiments of the invention have been illustrated and described in detail herein, many other variations and modifications consistent with the principles of the invention may be made from this disclosure without departing from its spirit and scope. The scope of the invention should therefore be understood to cover all such variations and modifications.

Claims (10)

1. A self-learning method of an early education system of a child accompanying learning robot, characterized by comprising the following steps:
step A10, training a convolutional neural network;
constructing a convolutional neural network model, taking all sample images of the Imagenet data set as input, taking the types of the sample images as labels, and training a convolutional neural network;
step A20, extracting image characteristic information;
performing feature extraction on the input image by adopting the convolutional neural network obtained by training in the step A10, and outputting a feature vector;
step A30, grouping and quantizing the feature vectors;
grouping and quantizing the feature vectors by adopting a product quantization technology to form m sub-feature vectors;
step A40, generating a reference alphabet;
processing all sample images of the Imagenet data set according to the steps A20 and A30 to obtain m sub-feature vectors of each sample image;
for all sample images, sub-feature vectors with the same sequence are taken to form 1 grouped data set, and m grouped data sets are counted in total;
calculating each grouped data set with the K-means algorithm to obtain k_s cluster centers; recording the k_s class centers of each grouped data set as 1 class set, the m class sets together forming a reference alphabet;
presetting a reference alphabet in an input layer of an associative memory model;
step A50, learning new things;
acquiring images and categories of unknown new things from the outside of the early education system;
processing the acquired new object image according to the step A20 and the step A30 to obtain m sub-feature vectors of the new object; traversing m sub-feature vectors of the new object, and searching matched letters in a class set of a reference alphabet, which is the same as the current sub-feature vector sequence of the new object, to obtain a character string of the new object with the length of m;
activating nodes of each letter of a character string of the new object in an input layer of an associative memory model, wherein the activated nodes of the input layer are connected with nodes of an output layer representing the category of the new object in a matching manner by the associative memory model; the associative memory model is a binary neural network comprising an input layer and an output layer, and the nodes of the output layer are preset with object types;
step A60, identifying an object to be identified;
acquiring an image of an object to be identified from the outside of the early education system;
processing the acquired image of the object to be identified according to the step A20 and the step A30 to obtain m sub-feature vectors of the object to be identified; traversing m sub-feature vectors of the object to be recognized, and searching matched letters in a class set of a reference alphabet, which is the same as the current sub-feature vector sequence of the object to be recognized, so as to obtain an object to be recognized character string with the length of m;
and activating nodes of all letters of the character string of the object to be recognized in an input layer of the associative memory model, searching nodes of an output layer matched and connected with the activated nodes of the input layer by the associative memory model, outputting object types corresponding to the nodes of the output layer, and recognizing to obtain the types of the object to be recognized.
2. The method of claim 1, wherein the output layer of the associative memory model assigns at least 2 nodes for each transaction category; when learning a new thing, the output layer node that is connected in match with the active input layer node is the first node under the current thing category that is not connected to the input layer node.
3. The method of claim 1, wherein the associative memory model is a two-layer neural network.
4. The method of claim 1, wherein the method of finding the letters matching the m sub-feature vectors of the thing in the m class sets of the reference alphabet comprises: computing the distances between the current sub-feature vector and the k_s class centers in the corresponding class set, taking the 1 closest class center as 1 letter corresponding to the thing, and obtaining m letters from the m sub-feature vectors of the thing, thereby obtaining the character string of length m corresponding to the thing.
5. The method of claim 1, wherein the category of the thing and the true name of the thing are stored in a txt format key-value pair file; when a new object is learned, acquiring the real name of the new object from the outside, and then searching the category corresponding to the real name in the key value pair file; if the category information does not exist, inserting a new key value pair to represent the newly learned category, and inputting the category of the matter into the associative memory model; when the object to be recognized is recognized, the associative memory model outputs the class of the object to be recognized, and the early education system outputs the real name of the object to be recognized by searching the real name corresponding to the class in the key value pair file.
6. The method of claim 1, wherein the convolutional neural network model is trained using cross entropy as a loss function, and the weight matrices for the layers of the convolutional neural network are updated according to the calculated values L of the loss function;
wherein the loss function is as shown in equation 1:
$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log y'_i \qquad (1)$$
where N denotes the number of sample images input to the convolutional neural network at a time, y_i is the true category label of the i-th sample image, and y'_i is the convolutional neural network's prediction for the i-th sample image.
7. The method of claim 6, wherein the convolutional neural network comprises an input layer, a convolutional layer, a skip layer connection, a pooling layer, a full connection layer, and a classification layer; when the convolutional neural network is trained, calculating a loss function value according to a predicted value output by the classification layer; when the convolutional neural network is used for extracting the image feature information, the feature vector of the image is formed by the features output by the full connection layer.
8. A child accompanying learning robot, comprising:
the function selection module, used to start either the new-object learning function or the object recognition function of the early education system;
the storage module, used to store the key-value pair file of object categories and real names, and to store the reference alphabet;
the information input module, used to acquire the image and real name of a new object when the new-object learning function is started, or to acquire the image of the object to be recognized when the object recognition function is started;
the information processing module, used, when the new-object learning function is started, to look up in the key-value pair file the category corresponding to the acquired real name of the new object and then to learn the new object from its image and category according to the method of claim 1; and, when the object recognition function is started, to recognize the object to be recognized from its acquired image according to the method of claim 1 and then to look up in the key-value pair file the real name corresponding to the recognized category;
and the information output module, used to output the real name of the recognized object when the object recognition function is started.
9. The robot of claim 8, wherein the information input module comprises a camera and a voice input unit.
10. The robot of claim 8, wherein the information output module comprises a display screen and a voice output unit.
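To close the picture, here is a speculative sketch of how the modules of claims 8-10 might fit together at the top level: the function selection module routes to either new-object learning or object recognition, with the camera/voice units behind the input and output modules. Every class, method, and parameter name here is an assumption made for illustration, not the patent's implementation.

```python
# A hedged top-level sketch of the claims 8-10 module flow.
class EarlyEducationRobot:
    def __init__(self, learner, recognizer, kv_pairs):
        self.learner = learner          # learn(image, category), per claim 1
        self.recognizer = recognizer    # recognize(image) -> category, per claim 1
        self.kv_pairs = kv_pairs        # category -> real name (claim 5 file)

    def run(self, function, image, real_name=None):
        if function == "learn":         # new-object learning function
            # Look up (or create) the category for the real name, as in claim 5.
            category = next((c for c, n in self.kv_pairs.items()
                             if n == real_name), None)
            if category is None:
                category = f"C{len(self.kv_pairs):04d}"   # assumed id scheme
                self.kv_pairs[category] = real_name
            self.learner(image, category)
            return f"Learned a new object: {real_name}"
        if function == "recognize":     # object recognition function
            category = self.recognizer(image)
            name = self.kv_pairs.get(category, "something I don't know yet")
            return f"I think this is a {name}"  # sent to screen / voice output
        raise ValueError(f"unknown function {function!r}")

robot = EarlyEducationRobot(
    learner=lambda image, category: None,    # stand-in for claim 1 learning
    recognizer=lambda image: "C0000",        # stand-in for claim 1 recognition
    kv_pairs={"C0000": "giraffe"},
)
print(robot.run("recognize", image=None))    # -> I think this is a giraffe
```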
CN201811367002.9A 2018-11-16 2018-11-16 Child accompanying learning robot and early education system self-learning method thereof Active CN109559576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811367002.9A CN109559576B (en) 2018-11-16 2018-11-16 Child accompanying learning robot and early education system self-learning method thereof

Publications (2)

Publication Number Publication Date
CN109559576A (en) 2019-04-02
CN109559576B (en) 2020-07-28

Family

ID=65866326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811367002.9A Active CN109559576B (en) 2018-11-16 2018-11-16 Child accompanying learning robot and early education system self-learning method thereof

Country Status (1)

Country Link
CN (1) CN109559576B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036208B (en) * 2019-05-17 2021-10-08 深圳市希科普股份有限公司 Artificial intelligence sprite based on interactive learning system
CN110781861A (en) * 2019-11-06 2020-02-11 上海谛闲工业设计有限公司 Electronic equipment and method for universal object recognition
CN111078008B (en) * 2019-12-04 2021-08-03 东北大学 Control method of early education robot
CN113673795A (en) * 2020-05-13 2021-11-19 百度在线网络技术(北京)有限公司 Method and device for acquiring online teaching material content and intelligent screen equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160140438A1 (en) * 2014-11-13 2016-05-19 Nec Laboratories America, Inc. Hyper-class Augmented and Regularized Deep Learning for Fine-grained Image Classification
US10423874B2 (en) * 2015-10-02 2019-09-24 Baidu Usa Llc Intelligent image captioning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729459A (en) * 2014-01-10 2014-04-16 北京邮电大学 Method for establishing sentiment classification model
CN104866855A (en) * 2015-05-07 2015-08-26 华为技术有限公司 Image feature extraction method and apparatus
CN105512685A (en) * 2015-12-10 2016-04-20 小米科技有限责任公司 Object identification method and apparatus
CN105913087A (en) * 2016-04-11 2016-08-31 天津大学 Object identification method based on optimal pooled convolutional neural network
CN106409290A (en) * 2016-09-29 2017-02-15 深圳市唯特视科技有限公司 Infant intelligent voice education method based on image analysis
CN107918782A (en) * 2016-12-29 2018-04-17 中国科学院计算技术研究所 A kind of method and system for the natural language for generating description picture material
CN108304846A (en) * 2017-09-11 2018-07-20 腾讯科技(深圳)有限公司 Image-recognizing method, device and storage medium
CN108460399A (en) * 2017-12-29 2018-08-28 华南师范大学 A kind of child building block builds householder method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Online associative memory learning with self-organizing decision trees*** et al.; 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence); 2017-01-31; Vol. 30, No. 1; pp. 21-30 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant