CN112230875A - Artificial intelligence following reading method and following reading robot

Info

Publication number: CN112230875A
Application number: CN202011087529.3A
Authority: CN (China)
Prior art keywords: instruction, read, text content, audio, user
Legal status: Pending (the status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 朱定局 (Zhu Dingju)
Current assignee: South China Normal University
Original assignee: South China Normal University
Application filed by South China Normal University; priority to CN202011087529.3A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/008 Artificial life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/04 Electrically-operated educational appliances with audible presentation of the material to be studied

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Robotics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An artificial intelligence read-after method and a read-after robot comprise: an instruction acquisition step; an instruction mode step; an instruction interaction step; a data acquisition step; and a learning step. In the method, system, and robot, external text content is obtained through image recognition and used as the read-after content, instructions control the text range and mode of the read-after, and misread content is associated, through intelligent recommendation, with the learning materials stored in the robot. Associated learning is thereby carried out through reading: reading ability is improved, mastery of the knowledge in the learning materials is promoted, and reading and learning assist and promote each other.

Description

Artificial intelligence following reading method and following reading robot
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an artificial intelligence following reading method and a following reading robot.
Background
In the process of implementing the invention, the inventor found that the prior art has at least the following problems: an existing read-after device can only play content stored in its memory in advance, such as poems or English passages; it gives the user no feedback at all during reading and is essentially a player. Once the contents of the memory are fixed, nothing outside the memory can be played, and even when content is added to the memory, the user can only manually select one item to play. The content of existing read-after devices is therefore very limited, and even if massive content were stored, the need for manual selection would make it difficult for users to find suitable content to play.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
Therefore, it is necessary to provide an artificial intelligence read-after method and a read-after robot to overcome the defects of the prior art, in which the read-after device can only repeat mechanically, cannot detect the user's reading errors, and cannot help the user correct them.
In a first aspect, an embodiment of the present invention provides an artificial intelligence read-after method, where the method includes:
an instruction acquisition step: acquiring an instruction of a user, the instruction including indications of the mode, the interaction mode, and the range;
an instruction mode step: determining the text content to be read and the first audio according to the mode and range of the instruction;
an instruction interaction step: determining the second audio and the third audio according to the interaction mode of the instruction;
a data acquisition step: acquiring learning materials;
a learning step: if the instruction is a first or second preset instruction, searching the learning-material text for content matching the most recent text content misread at least once by the user, or matching the text content to be read whose user error rate is higher than a preset proportion, and taking the matched content as the text content to be read (see the retrieval sketch below); if the instruction is a third preset instruction, continuing with the text content that follows the matched content as the text content to be read.
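As an illustration of the learning step, the following is a minimal sketch of matching misread text against stored learning materials. It uses standard-library string similarity as a stand-in for the patent's "intelligent recommendation"; the material list, function name, and threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory store of learning materials; in the patent the
# materials are obtained in the data acquisition step.
LEARNING_MATERIALS = [
    "A soldier serves the people. The common people support the army.",
    "Reading aloud every day improves pronunciation and fluency.",
    "Paragraphs group sentences that develop a single idea.",
]

def match_learning_material(misread_text: str, materials=LEARNING_MATERIALS,
                            threshold: float = 0.1):
    """Return the material most similar to the misread text, or None.

    SequenceMatcher stands in for the 'retrieve and match' operation; a
    production system might use embeddings or full-text search instead.
    """
    best, best_score = None, threshold
    for m in materials:
        score = SequenceMatcher(None, misread_text.lower(), m.lower()).ratio()
        if score > best_score:
            best, best_score = m, score
    return best

if __name__ == "__main__":
    # The matched material becomes the next "text content to be read".
    print(match_learning_material("the common people"))
```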
Preferably, the instruction mode step comprises:
a mode selection step: if the instruction carries a read-after meaning, executing the read-after mode step; if no instruction is acquired and the position pointed to by the user in the image is recognized to be outside the text content to be read, executing the default mode step; if the instruction carries a continuation meaning, executing the continuation mode step;
a read-after mode step: recognizing, according to the instruction, the text content within the indicated range at the position pointed to by the user in the image, taking it as the text content to be read, and converting it into the first audio;
a default mode step: taking the previous instruction as the current instruction and returning to the instruction mode step;
a continuation mode step: acquiring the text in a preset range following the text content to be read as the new text content to be read, and converting it into the first audio.
Preferably, the instruction interaction step comprises:
an interaction selection step: if the instruction concerns "I read with you", executing the "I read with you" step; if it concerns "you read with me", executing the "you read with me" step; if it concerns "we read together", executing the "we read together" mode step;
an "I read with you" step: playing the first audio; after the first audio finishes, acquiring the audio read by the user as the second audio; obtaining, from the first and second audio, the text content misread by the user, taking it as the text content to be read, converting it into the third audio, and playing the third audio;
a "you read with me" step: acquiring the user's audio as the second audio; after the second audio is obtained, playing the first audio; obtaining, from the first and second audio, the text content misread by the user, taking it as the text content to be read, converting it into the third audio, and playing the third audio;
a "we read together" mode step (see the sketch after this list): acquiring the unit of the audio, and acquiring the user's current unit of audio in real time as the current unit part of the second audio; acquiring, as the first content, the unit of content immediately after the unit in the text content to be read that corresponds to the previous unit part of the second audio, and converting the first content into audio as the current part of the first audio; playing the current part of the first audio while acquiring the current part of the user's second audio in real time; and obtaining, from the first and second audio, the text content misread by the user, converting the text content to be read into the third audio, and playing it.
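A minimal sketch of the "we read together" interaction described above, assuming the text has already been split into units and that `hear_unit`/`speak_unit` are stand-ins for the sound pickup and player. A real implementation would overlap playback and capture; here they alternate for simplicity.

```python
from typing import Callable, List

def read_together(units: List[str],
                  hear_unit: Callable[[], str],
                  speak_unit: Callable[[str], None]) -> List[int]:
    """Read each unit together with the user and collect misread positions.

    At step i the robot speaks the unit adjacent to the one matching the
    user's previous unit (i.e. unit i) while the user's current unit is
    captured; positions where they disagree are the misread units.
    """
    errors = []
    for i, expected in enumerate(units):
        speak_unit(expected)   # robot's current part of the first audio
        heard = hear_unit()    # user's current part of the second audio
        if heard != expected:
            errors.append(i)   # remember the misread position
    return errors

# Demo with canned "speech": the user misreads the second unit.
script = iter(["we", "reed", "together"])
misread = read_together(["we", "read", "together"],
                        hear_unit=lambda: next(script),
                        speak_unit=lambda u: None)
print(misread)  # [1]
```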
Preferably, the instruction mode step further comprises:
an image acquisition step: acquiring an unoccluded first image and a corresponding second image in which pointing occurs, and taking the second image as the image used in the instruction mode step;
a display step: displaying the text content to be read;
a text recognition step: during training and testing, taking the unoccluded first image and the corresponding pointed second image as input and the text content within the indicated range at the position pointed to by the user in the second image as output, and training and testing a deep learning model to obtain an occluded-image text recognition deep learning model; in use, taking the unoccluded first image and the corresponding pointed second image as input, and taking the output computed by the occluded-image text recognition deep learning model as the text content within the indicated range at the position pointed to by the user in the second image (a model sketch follows below).
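A sketch of how the occluded-image text recognition model's two-image input could be wired up, using PyTorch. The architecture, tensor sizes, and vocabulary are illustrative assumptions; the patent specifies only the inputs, outputs, and the train/test procedure.

```python
import torch
import torch.nn as nn

class OccludedTextRecognizer(nn.Module):
    """Takes the unoccluded first image and the pointed (partly occluded)
    second image, concatenated along the channel axis, and emits a sequence
    of character logits for the text in the indicated range."""
    def __init__(self, vocab_size: int = 5000, max_len: int = 32):
        super().__init__()
        self.max_len, self.vocab_size = max_len, vocab_size
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # 6 = two RGB images
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, max_len * vocab_size)

    def forward(self, unoccluded: torch.Tensor, occluded: torch.Tensor) -> torch.Tensor:
        x = torch.cat([unoccluded, occluded], dim=1)  # fuse the two views
        return self.head(self.encoder(x)).view(-1, self.max_len, self.vocab_size)

# One illustrative training step (inputs and labels are random stand-ins).
model = OccludedTextRecognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img_a = torch.randn(4, 3, 64, 64)          # unoccluded first image
img_b = torch.randn(4, 3, 64, 64)          # pointed, partly occluded second image
target = torch.randint(0, 5000, (4, 32))   # character indices of the target text
loss = nn.functional.cross_entropy(model(img_a, img_b).reshape(-1, 5000),
                                   target.reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()
```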
Preferably, the method further comprises:
a language selection step: acquiring the user's choice among preset options, the options being to select the language according to the instruction, according to the user, or according to the text content (see the dispatch sketch below);
selecting the language according to the instruction: taking the language of the instruction as the target language and the language of the text content within the indicated range as the source language; if the two differ, translating the text content within the indicated range into the target language as the text content to be read;
selecting the language according to the user: taking the language of the second audio as the target language and the language of the text content within the indicated range as the source language; if the two differ, translating the text content into the target language as the text content to be read;
selecting the language according to the text content: taking the text content within the indicated range as the text content to be read.
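The language selection logic reduces to a small dispatch, sketched below; `translate` is a placeholder for any machine translation service, not a specific API.

```python
def select_text_to_read(option: str, indicated_text: str, text_lang: str,
                        instruction_lang: str, user_audio_lang: str,
                        translate) -> str:
    """Pick the target language per the preset option and translate if needed."""
    if option == "by_instruction":
        target = instruction_lang
    elif option == "by_user":
        target = user_audio_lang      # language of the second audio
    else:                             # "by_text": read the text as-is
        return indicated_text
    if target != text_lang:
        return translate(indicated_text, src=text_lang, dst=target)
    return indicated_text

# Example: an English instruction over Chinese text triggers translation.
print(select_text_to_read("by_instruction", "老百姓", "zh", "en", "zh",
                          translate=lambda t, src, dst: f"<{src}->{dst}> {t}"))
```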
Preferably,
the read-after mode step further comprises: the range indicated by the instruction is a word, a sentence, a paragraph, or a page; if the current instruction indicates no range, the range indicated by the instruction in the most recent read-after mode step is taken as the range of the current instruction;
the continuation mode step further comprises: the preset range is the range indicated by the instruction in the most recent read-after step.
Preferably, the instruction mode step further comprises:
the mode selection step further comprises: if the instruction concerns translated read-after, executing the translation read-after mode step;
a translation read-after mode step: recognizing, according to the instruction, the text content within the indicated range at the position pointed to by the user in the image, translating it into the target language required by the instruction as the text content to be read, and converting the text content to be read into the first audio.
In a second aspect, an embodiment of the present invention provides an artificial intelligence read-after apparatus, where the apparatus includes:
an instruction acquisition module: acquiring an instruction of a user, the instruction including indications of the mode, the interaction mode, and the range;
an instruction mode module: determining the text content to be read and the first audio according to the mode and range of the instruction;
an instruction interaction module: determining the second audio and the third audio according to the interaction mode of the instruction;
a data acquisition module: acquiring learning materials;
a learning module: if the instruction is a first or second preset instruction, searching the learning-material text for content matching the most recent text content misread at least once by the user, or matching the text content to be read whose user error rate is higher than a preset proportion, and taking the matched content as the text content to be read; if the instruction is a third preset instruction, continuing with the text content that follows the matched content as the text content to be read.
Preferably, the instruction mode module comprises:
a mode selection module: if the instruction carries a read-after meaning, invoking the read-after mode module; if no instruction is acquired and the position pointed to by the user in the image is recognized to be outside the text content to be read, invoking the default mode module; if the instruction carries a continuation meaning, invoking the continuation mode module;
a read-after mode module: recognizing, according to the instruction, the text content within the indicated range at the position pointed to by the user in the image, taking it as the text content to be read, and converting it into the first audio;
a default mode module: taking the previous instruction as the current instruction and returning to the instruction mode module;
a continuation mode module: acquiring the text in a preset range following the text content to be read as the new text content to be read, and converting it into the first audio.
Preferably, the instruction interaction module comprises:
an interaction selection module: if the instruction concerns "I read with you", invoking the "I read with you" module; if it concerns "you read with me", invoking the "you read with me" module; if it concerns "we read together", invoking the "we read together" mode module;
an "I read with you" module: playing the first audio; after the first audio finishes, acquiring the audio read by the user as the second audio; obtaining, from the first and second audio, the text content misread by the user, taking it as the text content to be read, converting it into the third audio, and playing the third audio;
a "you read with me" module: acquiring the user's audio as the second audio; after the second audio is obtained, playing the first audio; obtaining, from the first and second audio, the text content misread by the user, taking it as the text content to be read, converting it into the third audio, and playing the third audio;
a "we read together" mode module: acquiring the unit of the audio, and acquiring the user's current unit of audio in real time as the current unit part of the second audio; acquiring, as the first content, the unit of content immediately after the unit in the text content to be read that corresponds to the previous unit part of the second audio, and converting the first content into audio as the current part of the first audio; playing the current part of the first audio while acquiring the current part of the user's second audio in real time; and obtaining, from the first and second audio, the text content misread by the user, converting the text content to be read into the third audio, and playing it.
Preferably, the instruction mode module further comprises:
an image acquisition module: acquiring an unoccluded first image and a corresponding second image in which pointing occurs, and taking the second image as the image used in the instruction mode module;
a display module: displaying the text content to be read;
a text recognition module: during training and testing, taking the unoccluded first image and the corresponding pointed second image as input and the text content within the indicated range at the position pointed to by the user in the second image as output, and training and testing a deep learning model to obtain an occluded-image text recognition deep learning model; in use, taking the unoccluded first image and the corresponding pointed second image as input, and taking the output computed by the occluded-image text recognition deep learning model as the text content within the indicated range at the position pointed to by the user in the second image.
Preferably, the apparatus further comprises:
a language selection module: acquiring the user's choice among preset options, the options being to select the language according to the instruction, according to the user, or according to the text content;
a module for selecting the language according to the instruction: taking the language of the instruction as the target language and the language of the text content within the indicated range as the source language; if the two differ, translating the text content within the indicated range into the target language as the text content to be read;
a module for selecting the language according to the user: taking the language of the second audio as the target language and the language of the text content within the indicated range as the source language; if the two differ, translating the text content into the target language as the text content to be read;
a module for selecting the language according to the text content: taking the text content within the indicated range as the text content to be read.
Preferably,
the read-after mode module further comprises: the range indicated by the instruction is a word, a sentence, a paragraph, or a page; if the current instruction indicates no range, the range indicated by the instruction in the most recent invocation of the read-after mode module is taken as the range of the current instruction;
the continuation mode module further comprises: the preset range is the range indicated by the most recently read instruction.
Preferably, the instruction mode module further comprises:
the mode selection module further comprises: if the instruction concerns translated read-after, invoking the translation read-after mode module;
a translation read-after mode module: recognizing, according to the instruction, the text content within the indicated range at the position pointed to by the user in the image, translating it into the target language required by the instruction as the text content to be read, and converting the text content to be read into the first audio.
In a third aspect, an embodiment of the present invention provides an artificial intelligence read-after system, where the system includes the modules of the apparatus in any one of the embodiments of the second aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method according to any one of the embodiments of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a robot, including a memory, a processor, and an artificial intelligence robot program stored on the memory and executable on the processor; the robot may serve as the artificial intelligence apparatus of the second aspect, and the processor implements the steps of the method in any one of the embodiments of the first aspect when executing the program.
The artificial intelligence read-after method and read-after robot provided by the embodiments above comprise: an instruction acquisition step; an instruction mode step; an instruction interaction step; a data acquisition step; and a learning step. In the method, system, and robot, external text content is obtained through image recognition and used as the read-after content, instructions control the text range and mode of the read-after, and misread content is associated, through intelligent recommendation, with the learning materials stored in the robot. Associated learning is thereby carried out through reading: reading ability is improved, mastery of the knowledge in the learning materials is promoted, and reading and learning assist and promote each other.
Drawings
FIG. 1 is a flow diagram of an artificial intelligence method provided by one embodiment of the invention;
FIG. 2 is a flow diagram of an artificial intelligence method provided by one embodiment of the invention;
FIG. 3 is a flow diagram of an artificial intelligence method provided by one embodiment of the invention;
FIG. 4 is a flow diagram of an artificial intelligence method provided by one embodiment of the invention;
FIG. 5 is a flow diagram of an artificial intelligence method provided by one embodiment of the invention;
FIG. 6 is a flow diagram of an artificial intelligence method provided by one embodiment of the invention;
FIG. 7 is a schematic diagram of a robot provided in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram of a robot provided in accordance with an embodiment of the present invention;
FIG. 9 is a schematic diagram of a robot provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Basic embodiment of the invention
In a first aspect, an embodiment of the present invention provides an artificial intelligence read-after method, as shown in FIG. 1, comprising: an instruction acquisition step; an instruction mode step; an instruction interaction step; a data acquisition step; and a learning step. The beneficial technical effects are as follows. Through the instruction mode step and the instruction interaction step, the user can adopt different read-after modes: the user can read after the robot, the robot can read after the user, or the two can read together. This makes read-after practice engaging; compared with the traditional single repeat mode, in which the user reads a sentence and the repeater plays it back once, it offers great advantages and improves the user's read-after experience. At the same time, the robot can follow the user's reading, find the content read incorrectly, mark it on the display, and read it aloud again, improving the user's reading level. Most importantly, through the learning step the method retrieves learning materials relevant to the misread content, so that studying those materials corrects the misreadings, raises the reading level, and promotes mastery of the knowledge in the materials.
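Read as a program, the five steps compose into a single pipeline. The sketch below shows only the wiring implied by FIG. 1; every callable is a placeholder, not part of the patent's specification.

```python
def read_after_pipeline(get_instruction, run_mode, run_interaction,
                        get_materials, run_learning):
    """Minimal wiring of the five steps; each argument stands in for one step."""
    instruction = get_instruction()                   # instruction acquisition step
    text, first_audio = run_mode(instruction)         # instruction mode step
    second_audio, third_audio = run_interaction(      # instruction interaction step
        instruction, text, first_audio)
    materials = get_materials()                       # data acquisition step
    return run_learning(instruction, text,            # learning step
                        second_audio, materials)
```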
In a preferred embodiment, as shown in FIG. 2, the instruction mode step comprises: a mode selection step; a read-after mode step; a default mode step; and a continuation mode step. The beneficial technical effects are as follows. The combination of the read-after, default, and continuation modes greatly reduces the complexity of the user's instructions: after giving a first instruction, the user needs only a simple continuation instruction to obtain the effect of the earlier, more complex one. The user can therefore concentrate on reading without repeating complex instructions, and even when no instruction is issued during reading, the system controls the read-after according to the previous instruction. This greatly improves read-after efficiency, lowers the difficulty of controlling the read-after, and makes the experience more convenient and trouble-free.
In a preferred embodiment, as shown in FIG. 3, the instruction interaction step comprises: an interaction selection step; an "I read with you" step; a "you read with me" step; and a "we read together" mode step. The beneficial technical effects are as follows. Completely unlike a traditional repeater, read-after here is not merely the system replaying the user's audio: the two sides interact as equals, and, most novel of all, the user and the robot can read synchronously. This greatly helps users improve their reading ability, achieves a true read-after effect, and raises the user's reading level faster.
In a preferred embodiment, as shown in FIG. 4, the instruction mode step further comprises: an image acquisition step; a display step; and a text recognition step. The beneficial technical effects are as follows. The method not only conducts the read-after through audio but also displays the content to be read on the screen and marks the misread content within it, so the user can read against the displayed text and achieve a better read-after effect.
In a preferred embodiment, as shown in FIG. 5, the method further comprises: a language selection step; selecting the language according to the instruction; selecting the language according to the user; and selecting the language according to the text content. The beneficial technical effects are as follows. Through language selection, the user can read along in multiple languages without being bound to the language of the text being viewed. This increases the flexibility of the read-after, makes the method applicable to many languages and to users from many countries, and can also improve the user's foreign-language level, serving several purposes at once.
In a preferred embodiment, the read-after mode step further comprises: the range indicated by the instruction is a word, a sentence, a paragraph, or a page; if the current instruction indicates no range, the range indicated by the instruction in the most recent read-after mode step is used as the range of the current instruction. The continuation mode step further comprises: the preset range is the range indicated by the instruction in the most recent read-after step. The beneficial technical effects are as follows. Specifying the read-after range makes the read-after more personalized, covering exactly the part the user wants; and the range need not be specified every time, because when it is omitted the previous range is simply carried over, reducing the complexity of the interaction between the user and the system.
In a preferred embodiment, as shown in FIG. 6, the instruction mode step further comprises a translation read-after mode step. The beneficial technical effects are as follows. Through the translation read-after mode step, the user can read along in any language, improving the reading level, the foreign-language level, and the translation level at the same time, three purposes served at once.
In a second aspect, an embodiment of the present invention provides an artificial intelligence apparatus that implements the steps of the method according to any one of the embodiments of the first aspect.
In a third aspect, an embodiment of the present invention provides an artificial intelligence read-after system, where the system includes the modules of the apparatus in any one of the embodiments of the second aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method according to any one of the embodiments of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a robot, including a memory, a processor, and an artificial intelligence robot program stored in the memory and executable on the processor, where the processor executes the program to implement the steps of the method according to any one of the embodiments of the first aspect.
The artificial intelligence read-after method and read-after robot provided by the embodiments comprise: an instruction acquisition step; an instruction mode step; an instruction interaction step; a data acquisition step; and a learning step. External text content is obtained through image recognition and used as the read-after content, instructions control the text range and mode of the read-after, and misread content is associated with the learning materials stored in the robot through intelligent recommendation, so that reading and learning assist and promote each other.
PREFERRED EMBODIMENTS OF THE PRESENT INVENTION
1. The robot has a camera, a display, a sound pickup, and a player. (The robot here may have no physical body.)
Preferably, the camera, display, sound pickup, and player can be mounted on the robot, or on a pen, a lamp, a bed head, or other positions such as glasses or a hat.
Preferably, a communication module is provided: data may (or may not) be sent through it to the cloud or a server for computation, and the result is returned through it. For example, the acquired image is sent to the cloud or server for image recognition and the recognized text content is returned; or the learning materials related to the most recently read text content are retrieved and matched remotely and the matched material content is returned. (A client sketch follows below.)
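A hedged sketch of the communication module's client side. The server address, routes, and response fields here are hypothetical, invented purely for illustration; the patent does not define a wire protocol.

```python
import requests  # any HTTP client would do

SERVER = "http://example-cloud-server/api"  # hypothetical endpoint

def recognize_remotely(image_bytes: bytes) -> str:
    """Send the captured image to the server for recognition and return the
    recognized text content (route and response shape are assumptions)."""
    resp = requests.post(f"{SERVER}/ocr", files={"image": image_bytes}, timeout=10)
    resp.raise_for_status()
    return resp.json()["text"]

def match_materials_remotely(recent_text: str) -> str:
    """Ask the server for learning materials matching the most recently read
    text content (again a hypothetical route)."""
    resp = requests.post(f"{SERVER}/match", json={"text": recent_text}, timeout=10)
    resp.raise_for_status()
    return resp.json()["material"]
```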
2. The camera and the robot are connected by a telescopic, rotatable connecting device (for example, a telescopic, rotatable connecting rod, possibly made of metal; the robot's arm can also serve as the connecting device, in which case the camera is mounted on the robot's hand, since the arm can likewise extend and rotate). (Here too the robot may have no physical body, so the camera may instead be attached to another object, e.g., a table.)
3. Images are acquired by the camera, and the images acquired in real time are displayed on the robot's display, as shown in FIG. 7.
4. Identify the page image bearing characters from the image and store it. (If there are 2 pages, save 2 page images.)
Preferably, recognition of the character-bearing page uses a deep learning model: during training, the first image is the input and the character-bearing page image within it is the expected output; the model is trained and tested to obtain a page recognition deep learning model. In use, the first image is fed to the page recognition deep learning model, and its computed output is taken as the character-bearing page image in the first image.
5. Judge whether the character-bearing page image is complete; if not, remind the user to adjust the relative position of the camera and the object bearing the characters, or automatically adjust the camera position so that the camera is aligned with the page (the adjustment can be made through the connecting device or through camera parameters such as focal length; this step can also be performed manually by the user and is therefore a further preferred step), then return to step 3.
Preferably, this judgment uses a deep learning model: during training, complete character-bearing page images are inputs with a label representing "complete" as expected output, and incomplete page images are inputs with a label representing "incomplete" as expected output; the trained and tested model is the page completeness judgment deep learning model. In use, the page image is fed to the model; if the computed output is the "complete" label the page image is complete, otherwise it is incomplete.
6. Judge whether the character-bearing page image is clear; if not, remind the user to adjust the relative position of the camera and the object bearing the characters, or automatically adjust the camera position as in step 5, then return to step 3.
Preferably, this judgment likewise uses a deep learning model: during training, clear page images are inputs with a label representing "clear" as expected output, and unclear page images are inputs with a label representing "unclear" as expected output; the trained and tested model is the page clarity judgment deep learning model. In use, the page image is fed to the model; if the computed output is the "clear" label the page image is clear, otherwise it is not. (Both classifiers share the same shape; see the sketch below.)
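Steps 5 and 6 both reduce to a binary image classifier trained on labeled page images. A minimal PyTorch sketch follows, with an illustrative architecture (the patent fixes only the inputs, labels, and train/test procedure):

```python
import torch
import torch.nn as nn

class PageQualityClassifier(nn.Module):
    """Binary classifier used twice: once with complete/incomplete labels
    (step 5) and once with clear/unclear labels (step 6). Sizes are
    illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),   # label 0 = incomplete/unclear, 1 = complete/clear
        )

    def forward(self, page_image: torch.Tensor) -> torch.Tensor:
        return self.net(page_image)

# In use: a negative prediction triggers the reminder/camera adjustment and a
# return to step 3.
model = PageQualityClassifier()
page = torch.randn(1, 3, 128, 128)               # character-bearing page image
is_ok = model(page).argmax(dim=1).item() == 1
```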
7. The sound pickup acquires the user's voice, and the instruction in the voice is recognized.
An instruction consists of a mode (e.g., read-after, default, or continue), an interaction mode, and a range (where a range is a unit of length, such as a word, sentence, paragraph, page, or another custom range); a parsing sketch follows below.
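A minimal sketch of parsing such an instruction, including the inheritance of the previous range, and of the whole previous instruction on "continue" (steps 8.2 and 8.3 below). The keyword matching is a stand-in for real speech understanding; the phrases follow the examples in this description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instruction:
    mode: str                   # "read_after", "default", or "continue"
    interaction: Optional[str]  # "i_after_you", "you_after_me", "together"
    range_unit: Optional[str]   # "word", "sentence", "paragraph", "page"

RANGES = ("word", "sentence", "paragraph", "page")

def parse_instruction(text: str, previous: Optional[Instruction]) -> Instruction:
    text = text.lower()
    if "continue" in text and previous is not None:
        return previous                     # inherit the whole previous instruction
    if "together" in text:
        interaction = "together"            # "we read ... together"
    elif "with you" in text:
        interaction = "i_after_you"         # "I read ... with you"
    elif "with me" in text:
        interaction = "you_after_me"        # "you read ... with me"
    else:
        interaction = None
    range_unit = next((r for r in RANGES if r in text), None)
    if range_unit is None and previous is not None:
        range_unit = previous.range_unit    # carry over the most recent range
    return Instruction("read_after", interaction, range_unit)

# Example: a bare "continue" inherits the previous sentence-level instruction.
first = parse_instruction("I read this sentence with you", None)
second = parse_instruction("continue", first)
assert second.range_unit == "sentence"
```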
8. Determine the text content to be read and the first audio according to the mode of the instruction.
8.1. If the instruction is a read-after instruction (read-after instructions include "I read with you", "you read with me", "we read together", and the like), recognize, according to the instruction, the text content within the indicated range at the position pointed to by the user in the image, take it as the text content to be read, and convert it into the first audio.
8.2. If no instruction is obtained and the position pointed to by the user in the image is recognized to be outside the text content to be read, take the previous instruction as the current instruction and return to step 8. Note that if the current instruction is "continue" and the previous instruction was "I read this sentence with you", the current instruction becomes "I read this sentence with you"; if the next instruction is again "continue", its previous instruction is now "I read this sentence with you", so it too becomes "I read this sentence with you".
8.3. If the instruction carries a continuation meaning (such as "continue"), acquire the text in the range following the text content to be read as the new text content to be read, and convert it into the first audio. Preferably, the most recent reading range is used as the range; for example, if the last reading range was a sentence (e.g., sentence 1), the current reading range is also a sentence (e.g., sentence 2), and if reading continues again, the next range is still a sentence (e.g., sentence 3).
9. Determine the second audio and the third audio according to the interaction mode of the instruction.
9.1. If the instruction concerns "I read with you", play the first audio; preferably, display the text content to be read while the first audio plays. After the first audio finishes, acquire the audio read by the user as the second audio. From the first and second audio, obtain the text content to be read with the error positions marked, and display it; obtain the text content misread by the user, take it as the text content to be read, convert it into the third audio, and play the third audio.
9.2. If the instruction concerns "you read with me", acquire the user's audio as the second audio; after the second audio is obtained, play the first audio (preferably displaying the text content to be read at the same time). From the first and second audio, obtain the text content to be read with the error positions marked, and display it; obtain the text content misread by the user, convert the text content to be read into the third audio, and play it.
9.3. If the instruction concerns "we read together", obtain the unit of the audio and acquire the user's current unit of audio in real time as the current unit part of the second audio (when the unit is a character, the unit audio is the audio of one character; when the unit is a word, it is the audio of one word). Acquire, as the first content, the unit of content immediately after the unit in the text content to be read that corresponds to the previous unit part of the second audio (when the unit is a character, the unit content is one character; when the unit is a word, one word), and convert the first content into audio as the current part of the first audio. Play the current part of the first audio while acquiring the current part of the user's second audio in real time; preferably, display the text content to be read while the first audio plays. After the user finishes reading, obtain, from the first and second audio, the text content to be read with the error positions marked, display it, obtain the text content misread by the user, convert the text content to be read into the third audio, and play it.
Obtaining the unit of content immediately after the unit corresponding to the previous unit part of the second audio can be realized by deep learning: during training and testing, the text content to be read and a preset number of unit parts preceding the current unit part of the second audio are the inputs, and the adjacent next unit of content is the output; the trained and tested model is the synchronous reading deep learning model. In use, the text content to be read and the preset preceding unit parts of the second audio are fed in, and the model's computed output is taken as the adjacent next unit of content.
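As a stand-in for the synchronous reading deep learning model, the sketch below predicts the next unit from the previous one using simple bigram counts; a real system would use the trained deep model described above.

```python
from collections import defaultdict

class NextUnitPredictor:
    """Bigram stand-in for the synchronous reading model: given the unit
    corresponding to the user's previous audio part, predict the adjacent
    next unit of the text content to be read."""
    def __init__(self):
        self.table = defaultdict(dict)

    def train(self, units):
        # Count which unit follows which in the text to be read.
        for prev, nxt in zip(units, units[1:]):
            self.table[prev][nxt] = self.table[prev].get(nxt, 0) + 1

    def predict(self, prev_unit):
        candidates = self.table.get(prev_unit)
        if not candidates:
            return None
        return max(candidates, key=candidates.get)  # most frequent successor

p = NextUnitPredictor()
p.train("we read together and we read aloud".split())
print(p.predict("we"))   # 'read'
```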
For example, if the instruction is "I read this sentence with you", after the robot reads the text content of the sentence containing the word pointed to by the user, the user reads it; if the instruction is "I read this paragraph with you", after the robot reads the text content of the paragraph containing the pointed word, the user reads it; if the instruction is "you read this sentence with me", after the user reads the text content of the sentence containing the pointed word, the robot reads it; if the instruction is "you read this paragraph with me", after the user reads the text content of the paragraph containing the pointed word, the robot reads it; if the instruction is "we read this sentence together", as the user reads each word of the sentence containing the pointed word, the robot reads each word synchronously; if the instruction is "we read this paragraph together", as the user reads each word of the paragraph containing the pointed word, the robot reads each word synchronously.
Obtaining the text content to be read with the error positions marked from the first and second audio, displaying it, obtaining the text content misread by the user, taking it as the text content to be read, converting it into the third audio, and playing the third audio specifically comprises: comparing the first audio with the second audio to find the text content misread by the user, marking it in the text content to be read (with a special color or underline), and displaying it (for example, on the robot's display screen); converting the misread text content into the third audio, then playing the audio corresponding to the text content to be read through the player from top to bottom and left to right. The text content to be read with the marked error positions is taken as the text content to be displayed and shown, for example, on a display screen, on the user's mobile phone, or on the robot's display screen. The read text content is synchronized with the displayed text content: when the word being read appears on the display, a preset range around it (a word or a sentence; the range indicated by the instruction may also serve as the preset range) is given a preset identification (such as highlighting or bold). By default the preset range is a word, because whatever the reading range indicated by the instruction (word, sentence, or paragraph), it is always a word that is being read at any given moment.
For example, the text content to be read is "我是一个兵，来自老百姓" ("I am a soldier, from the common people"); the first audio is "wo shi yi ge bing, lai zi lao bai xing"; the second audio is "wo shi yi ge bing, lai zi lao ba xin"; the text content to be read with the error positions marked is the sentence with "百姓" marked, and the text content misread by the user is "百姓" ("the common people").
For example, if the instruction is "I read this sentence with you", the text content within the indicated range at the position pointed to by the user is the text content of the sentence containing the pointed word; if the instruction is "I read this character with you", it is the pointed character; if the instruction is "I read this word with you", it is the text content of the word containing the pointed character; if the instruction is "I read this paragraph with you", it is the text content of the paragraph containing the pointed word; if the instruction is "I read this page with you", it is the text content of the page containing the pointed word.
Preferably, comparing the first audio with the second audio to find the text content misread by the user can be realized by deep learning: during training and testing, the text content to be read, the first audio, and the second audio are the inputs, and the text content to be read with the error positions marked is the output; the trained and tested model is the misreading recognition deep learning model. In use, the text content to be read, the first audio, and the second audio are fed in, and the model's computed output is taken as the text content to be read with the error positions marked. The text content misread by the user is then extracted from the marked error positions.
Preferably, the first and second audio are converted into a first phonetic text and a second phonetic text; the two phonetic texts are compared, the positions in the text content to be read that correspond to differing phonetic symbols are marked as error positions, and the marked text is taken as the text content to be read with the error positions marked; the text content misread by the user is extracted from those positions (see the alignment sketch below).
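A concrete sketch of this phonetic comparison: the two phonetic texts are aligned and the differing positions are mapped back to characters. The one-syllable-per-character correspondence is a simplifying assumption for illustration.

```python
from difflib import SequenceMatcher

def find_misread(chars, first_phonemes, second_phonemes):
    """Return indices into `chars` where the two phonetic texts disagree.

    `chars` is the text to be read split per character; the phoneme lists
    are the first/second audio converted to phonetic text, one syllable per
    character. Differing alignment spans are the error positions to mark.
    """
    errors = []
    matcher = SequenceMatcher(None, first_phonemes, second_phonemes)
    for op, i1, i2, _, _ in matcher.get_opcodes():
        if op != "equal":
            errors.extend(range(i1, i2))   # positions in the reference reading
    return errors

# The example above: "lao bai xing" read as "lao ba xin".
chars = list("我是一个兵来自老百姓")
first = ["wo", "shi", "yi", "ge", "bing", "lai", "zi", "lao", "bai", "xing"]
second = ["wo", "shi", "yi", "ge", "bing", "lai", "zi", "lao", "ba", "xin"]
print([chars[i] for i in find_misread(chars, first, second)])  # ['百', '姓']
```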
Preferably, converting audio into phonetic text can be realized through deep learning: during training and testing, audio is the input and phonetic text the output; the trained and tested model is the audio recognition deep learning model. In use, audio is fed in and the model's computed output is taken as the phonetic text.
Preferably, comparing the first phonetic text with the second phonetic text and marking the error positions can also be realized by deep learning: during training and testing, the first phonetic text, the second phonetic text, and the text to be read are the inputs, and the text content to be read with the error positions marked is the output; the trained and tested model is the phonetic error recognition deep learning model. In use, the first phonetic text, the second phonetic text, and the text to be read are fed in, and the output computed by the phonetic error recognition deep learning model is taken as the text content to be read with the error positions marked.
Preferably, the position pointed to by the user includes a position pointed to by the user's finger or by another object such as a pen, as shown in FIG. 8.
Preferably, recognizing, according to the instruction, the text content within the indicated range at the position pointed to by the user in the image can be realized by deep learning: during training and testing, the image is the input and the text content within the indicated range at the pointed position is the output; the trained and tested model is the pointed-range text recognition deep learning model. In use, the image is fed in and the model's computed output is taken as the text content within the indicated range at the pointed position.
Preferably, the occluded part of the text content is recovered from the unoccluded text content. Specifically: during training and testing, an unoccluded image and an occluded image (whose text contents are identical) are the inputs, and the text content within the indicated range at the position pointed to by the user in the occluded image is the output; the trained and tested model is the occluded-image text recognition deep learning model. In use, an unoccluded image and an occluded image are fed in, and the model's computed output is taken as the text content within the indicated range at the pointed position.
Preferably, the language of the instruction is taken as the target language and the language of the text content within the indicated range as the source language; if they differ, the text content is translated into the target language as the text content to be read, converted into the first audio, and the first audio corresponding to the text content to be read is played through the player from top to bottom and left to right.
Preferably, the language of the second audio is taken as the target language and the language of the text content within the indicated range as the source language; if they differ, the text content is translated into the target language as the text content to be read, converted into the first audio, and played in the same way.
preferably, if the range of the instruction is a page (for example, "i read this page with you"), the text content in the page where the position pointed by the finger is located is identified from the image, and is used as the text content to be read, the text content to be read is converted into audio, and then the audio corresponding to the text content to be read is played through the player from top to bottom and from left to right;
preferably, the step of identifying from the image the text content in the page where the position pointed to by the finger is located may be implemented by deep learning; during training and testing, the images and the text content in the page where the position pointed by the user is located are respectively used as input and output, the deep learning model is trained and tested, and the obtained deep learning model is used as a text recognition deep learning model of the page where the position pointed by the user is located; when the system is used, the image is used as input, and the output obtained through calculation of the text recognition deep learning model of the page where the position pointed by the user is located is used as the text content of the page where the position pointed by the user is located.
Preferably, if the range of the instruction is a paragraph (for example, "I read this paragraph with you"), the text content in the paragraph where the position pointed to by the finger is located is identified from the image and taken as the text content to be read; the text content to be read is converted into audio, and the audio corresponding to the text content to be read is then played through the player from top to bottom and from left to right.
Preferably, the step of identifying from the image the text content in the paragraph where the pointed position is located can be realized by deep learning. During training and testing, the image and the text content in the paragraph containing the position pointed to by the user are used as input and output respectively; the deep learning model is trained and tested, and the resulting model serves as the text recognition deep learning model for the paragraph containing the pointed position. In use, the image is taken as input, and the output computed by this model is taken as the text content in the paragraph containing the position pointed to by the user.
Preferably, if the range of the instruction is a sentence (for example, "I read this sentence with you"), the text content of the sentence where the position pointed to by the finger is located is identified from the image and taken as the text content to be read; the text content to be read is converted into audio, and the audio corresponding to the text content to be read is then played through the player from top to bottom and from left to right.
Preferably, the step of identifying from the image the sentence content in the paragraph where the pointed position is located can be realized by deep learning. During training and testing, the image and the sentence content in the paragraph containing the position pointed to by the user are used as input and output respectively; the deep learning model is trained and tested, and the resulting model serves as the sentence recognition deep learning model for the paragraph containing the pointed position. In use, the image is taken as input, and the output computed by this model is taken as the sentence content in the paragraph containing the position pointed to by the user.
Preferably, if the range of the instruction is a word (for example, "I read this word with you"), the text content of the word at the position pointed to by the finger is identified from the image and taken as the text content to be read; the text content to be read is converted into audio, and the audio corresponding to the text content to be read is then played through the player from top to bottom and from left to right.
Preferably, the step of identifying from the image the word content in the paragraph where the pointed position is located can be realized by deep learning. During training and testing, the image and the word content in the paragraph containing the position pointed to by the user are used as input and output respectively; the deep learning model is trained and tested, and the resulting model serves as the word recognition deep learning model for the paragraph containing the pointed position. In use, the image is taken as input, and the output computed by this model is taken as the word content in the paragraph containing the position pointed to by the user.
Preferably, if the range of the instruction is a single character (for example, "I read this character with you"), the text content of the character at the position pointed to by the finger is identified from the image and taken as the text content to be read; the text content to be read is converted into audio, and the audio corresponding to the text content to be read is then played through the player from top to bottom and from left to right.
Preferably, the step of identifying from the image the character content in the paragraph where the pointed position is located can be realized by deep learning. During training and testing, the image and the character content in the paragraph containing the position pointed to by the user are used as input and output respectively; the deep learning model is trained and tested, and the resulting model serves as the character recognition deep learning model for the paragraph containing the pointed position. In use, the image is taken as input, and the output computed by this model is taken as the character content in the paragraph containing the position pointed to by the user.
10. If the instruction is an instruction in the aspect of translation follow-reading, the text content in the range indicated by the instruction at the position pointed to by the user in the image is recognized according to the instruction, the text content in that range is translated into the target language required by the instruction and taken as the text content to be read, the text content to be read is converted into a first audio, and the first audio corresponding to the text content to be read is then played through the player from top to bottom and from left to right. (This function is optional: follow-reading is the basic function, and translation is a somewhat more advanced function built on top of it.)
10.1, identifying, according to the instruction, the text content in the range indicated by the instruction at the position pointed to by the user in the image;
10.2, calling a translation engine interface, such as the Baidu translation engine or the Google translation engine, to translate the recognized text content; alternatively, translating the recognized text content through a deep learning model.
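A hedged sketch of step 10.2 as a generic HTTP call. The endpoint URL and payload fields below are placeholders, not the real Baidu or Google translation API schemas (both of which require API keys and signed requests):

```python
import requests

def translate(text: str, source_lang: str, target_lang: str) -> str:
    """Call an external translation service; the schema here is hypothetical."""
    resp = requests.post(
        "https://example-translate.invalid/api/v1/translate",  # placeholder URL
        json={"q": text, "from": source_lang, "to": target_lang},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["translation"]  # assumed response field name
```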
11. Acquiring learning materials. Preferably, the learning materials are stored in the cloud and may be in text or audio format. Specifically, this includes acquiring learning materials input by the user, learning materials obtained by the user through the Internet, learning materials automatically acquired through the Internet, and learning materials acquired from a database or a knowledge base.
11.1, acquiring a learning material text uploaded by a user;
11.2, acquiring a learning material text selected by a user from the pre-stored learning material texts;
11.3, if the user does not select the pre-stored learning material texts, acquiring all the pre-stored learning material texts;
12. If the instruction is a first preset instruction (including an instruction about chatting or an instruction about explaining) or a second preset instruction (including an instruction about changing to another passage), the learning material text is searched for text content matching the text content most recently misread by the user or/and the text content to be read whose user error rate is higher than a preset proportion. If the instruction is a third preset instruction (including an instruction with a continuation meaning, for example "continue"), reading continues with the text content following the matched text content, as shown in fig. 9. The attributes of the user can also be considered during search and recommendation; for example, the user's gender, age, etc. are acquired through the camera, or the user's attributes are taken directly from the user's registration information, so as to obtain learning materials relevant to the user (for example, learning materials for the user's age group), and the text content matching the most recent text content to be read is then searched from the user-relevant learning material text.
12.1, if the instruction is a first preset instruction (including an instruction about chatting or explaining) or a second preset instruction (including an instruction about changing to another passage), executing 12.2; if the instruction indicates continuation (e.g., "continue"), executing 12.3;
12.2, searching the learning material texts for the text content, not yet read within the most recent preset time (for example, within one day), that best matches the user's most recent preset number of misread text contents or/and the text contents to be read whose user error rate is higher than the preset proportion (preferably, selecting a passage of text), taking it as the text content to be read, converting the text content to be read into audio, and then playing the audio corresponding to the text content to be read through the player from top to bottom and from left to right. The user error rate is the proportion that the text content misread by the user occupies in the text content to be read, the misread text content being a part of the text content to be read (a minimal sketch of this bookkeeping follows 12.3);
12.3, acquiring a text following the text content to be read (preferably, selecting a passage of text) as the new text content to be read, converting the text content to be read into audio, and then playing the audio corresponding to the text content to be read through the player from top to bottom and from left to right;
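A minimal sketch of the bookkeeping referenced in 12.2, using a character-level error rate and a character-overlap match score as one plausible reading of the description (the disclosure does not fix either metric):

```python
def error_rate(to_read: str, misread: str) -> float:
    """Share of the text to be read that the user read incorrectly."""
    return len(misread) / len(to_read) if to_read else 0.0

def best_match(misread_parts: list[str], materials: list[str]) -> str:
    """Pick the learning-material passage sharing the most characters
    with the user's recently misread content."""
    misread_chars = set("".join(misread_parts))
    return max(materials, key=lambda m: len(misread_chars & set(m)))
```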
13. The content to be read is displayed, and the word currently being read is marked (e.g., with a particular color, brightness, shape, etc.).
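A small sketch of step 13, using an ANSI color code on a terminal as a stand-in for the color/brightness/shape marking mentioned above:

```python
def display_with_marker(words: list[str], current: int) -> str:
    """Render the text with the word at index `current` highlighted."""
    return " ".join(
        f"\033[93m{w}\033[0m" if i == current else w  # yellow = being read
        for i, w in enumerate(words)
    )

print(display_with_marker("the cat sat on the mat".split(), 2))
```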
Other embodiments of the invention
The grade level and region of each user are collected, and the teaching materials and examination knowledge points corresponding to that grade and region are retrieved to form a teaching knowledge base.
The user's speech is recognized to determine the indicated range: for example, if the user says "read this word after me" or another sentence containing "word", the word pointed to by the user is read; if the user says "read this sentence after me" or another sentence containing "sentence", the sentence pointed to is read; if the user says "read this paragraph after me" or another sentence containing "paragraph", the paragraph pointed to is read; and if the user says "read this page" or another sentence containing "page", the page pointed to is read. The content pointed to by the user is then read aloud (during reading, the user's hand may leave the pointed content), and the content read is displayed on the display and recorded in the user reading knowledge base.
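The indicated range could be parsed, for example, by keyword matching on the recognized utterance. The English keyword table below is illustrative only; the described system presumably matches the corresponding Chinese range words:

```python
RANGE_KEYWORDS = {"word": "word", "sentence": "sentence",
                  "paragraph": "paragraph", "page": "page"}

def parse_range(utterance: str, default: str = "sentence") -> str:
    """Map the recognized speech to the range it names."""
    for keyword, range_name in RANGE_KEYWORDS.items():
        if keyword in utterance.lower():
            return range_name
    return default  # fall back when no range word is present

assert parse_range("please read this paragraph with me") == "paragraph"
```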
According to the content the user has read along with, especially the content read with errors, related knowledge is retrieved from the teaching knowledge base. During retrieval, knowledge from the current school term is retrieved first, then previously learned knowledge; that is, results are sorted by relevance and by time, with results of equal relevance further sorted by time. After the follow-reading is finished, the retrieved related knowledge is played. Before playing it, a guidance phrase is played: "What you just read is very relevant to the knowledge you learned in school in the past; do you want to listen?" If the user agrees, playback starts. Knowledge the student is likely to learn later in the school period can also be linked, with the guidance phrase: "What you just read is also very relevant to the knowledge you will learn during the school period; do you want to listen?"
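A sketch of the retrieval ordering described above, with current-term knowledge first and ties broken by relevance and then recency; the field names on the knowledge entries are assumptions for illustration:

```python
def order_knowledge(entries, current_term):
    # entries: list of dicts like {"term": ..., "relevance": ..., "timestamp": ...}
    return sorted(
        entries,
        key=lambda e: (
            e["term"] != current_term,  # False < True: current-term items first
            -e["relevance"],            # then by descending relevance
            -e["timestamp"],            # then by descending recency
        ),
    )
```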
In this way, the knowledge students have learned in school can be consolidated while they follow-read.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An artificial intelligence method, the method comprising:
an instruction acquisition step: acquiring an instruction of a user, the instruction comprising indications of a mode, an interaction mode, and a range;
an instruction mode step: determining the text content to be read and a first audio according to the mode and range of the instruction;
an instruction interaction step: determining a second audio and a third audio according to the interaction mode of the instruction;
a data acquisition step: acquiring learning materials;
a learning step: if the instruction is a first preset instruction or a second preset instruction, searching the learning material text for text content matching the text content most recently misread by the at least one user or the text content to be read whose user error rate is higher than a preset proportion, and taking it as the text content to be read; and if the instruction is a third preset instruction, taking the text content following the matched text content as the text content to be read.
2. The artificial intelligence method of claim 1 wherein the instruction pattern step comprises:
a mode selection step: if the instruction includes a follow-reading meaning, executing the follow-reading mode step; if no instruction is acquired and the position pointed to by the user in the image is identified as lying outside the text content to be read, executing the default mode step; if the instruction includes a continuation meaning, executing the continuation mode step;
a follow-reading mode step: recognizing, according to the instruction, the text content in the range indicated by the instruction at the position pointed to by the user in the image, taking it as the text content to be read, and converting the text content to be read into the first audio;
a default mode step: taking the previous instruction as the current instruction, and then returning to the instruction mode step to continue execution;
a continuation mode step: acquiring a text in a preset range following the text content to be read as the text content to be read, and converting the text content to be read into the first audio.
3. The artificial intelligence method of claim 2, wherein the instruction interaction step comprises:
an interaction selection step: if the instruction is an instruction in the aspect of "I read with you", executing the "I read with you" step; if the instruction is an instruction in the aspect of "you read with me", executing the "you read with me" step; if the instruction is an instruction in the aspect of "we read together", executing the "we read together" mode step;
an "I read with you" step: playing the first audio; after the first audio is played, acquiring the audio read by the user as the second audio; obtaining the text content misread by the user according to the first audio and the second audio, taking it as the text content to be read, converting the text content to be read into a third audio, and playing the third audio;
a "you read with me" step: acquiring the audio of the user as the second audio; after the second audio is obtained, playing the first audio; obtaining the text content misread by the user according to the first audio and the second audio, taking it as the text content to be read, converting the text content to be read into a third audio, and playing the third audio;
a "we read together" mode step: acquiring, unit by unit and in real time, the user's current unit of audio as the current unit part of the second audio; acquiring, as first content, the next unit of content adjacent to the unit of the text content to be read that corresponds to the previous unit part of the second audio, and converting the first content into audio as the current part of the first audio; playing the current part of the first audio while acquiring the current part of the user's second audio in real time; and obtaining the text content misread by the user according to the first audio and the second audio, taking it as the text content to be read, converting it into a third audio, and playing the third audio.
4. The artificial intelligence method of claim 2 wherein the instruction pattern step further comprises:
an image acquisition step: acquiring an unoccluded first image and a corresponding pointed second image; taking the second image as the image in the instruction mode step;
a display step: displaying the text content to be read;
a text recognition step: during training and testing, taking an unoccluded first image and the pointed second image corresponding to it as input, taking the text content in the range indicated by the instruction at the position pointed to by the user in the second image as output, and training and testing the deep learning model to obtain the occluded-image text recognition deep learning model; in use, taking an unoccluded first image and the pointed second image corresponding to it as input, and taking the output computed by the occluded-image text recognition deep learning model as the text content in the range indicated by the instruction at the position pointed to by the user in the second image.
5. The artificial intelligence method of claim 1, wherein the method further comprises:
a language selection step: acquiring the user's selection among preset options, the preset options comprising selecting the language according to the instruction, selecting the language according to the user, and selecting the language according to the text content;
selecting the language according to the instruction: taking the language of the instruction as the target language and the language of the text content in the range indicated by the instruction as the source language; if the target language differs from the source language, translating the text content in the range indicated by the instruction into the target language as the text content to be read;
selecting the language according to the user: taking the language of the second audio as the target language and the language of the text content in the range indicated by the instruction as the source language; if the target language differs from the source language, translating the text content into the target language as the text content to be read;
selecting the language according to the text content: taking the text content in the range indicated by the instruction as the text content to be read.
6. The artificial intelligence method of claim 1,
the follow-reading mode step further comprises: the range indicated by the instruction comprises a word, a sentence, a paragraph, or a page; if the current instruction has no indicated range, acquiring the range indicated by the instruction in the most recent follow-reading mode step as the range indicated by the current instruction;
the continuation mode step further comprises: the preset range includes the range indicated by the instruction in the most recent follow-reading step.
7. The artificial intelligence method of claim 2 wherein the instruction pattern step further comprises:
the mode selection step further comprises: if the instruction is an instruction in the aspect of translation follow-reading, executing the translation follow-reading mode step;
a translation follow-reading mode step: recognizing, according to the instruction, the text content in the range indicated by the instruction at the position pointed to by the user in the image, translating the text content in that range into the target language required by the instruction as the text content to be read, and converting the text content to be read into the first audio.
8. An artificial intelligence device, wherein the device is configured to implement the steps of the method of any one of claims 1 to 7.
9. A robot comprising a memory, a processor and an artificial intelligence robot program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 7 are carried out when the program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202011087529.3A 2020-10-13 2020-10-13 Artificial intelligence following reading method and following reading robot Pending CN112230875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011087529.3A CN112230875A (en) 2020-10-13 2020-10-13 Artificial intelligence following reading method and following reading robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011087529.3A CN112230875A (en) 2020-10-13 2020-10-13 Artificial intelligence following reading method and following reading robot

Publications (1)

Publication Number Publication Date
CN112230875A true CN112230875A (en) 2021-01-15

Family

ID=74112119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011087529.3A Pending CN112230875A (en) 2020-10-13 2020-10-13 Artificial intelligence following reading method and following reading robot

Country Status (1)

Country Link
CN (1) CN112230875A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010133072A1 (en) * 2009-05-21 2010-11-25 无敌科技(西安)有限公司 Pronunciation evaluating device and method
CN106297841A (en) * 2016-07-29 2017-01-04 广东小天才科技有限公司 Audio follow-up reading guiding method and device
CN108257615A (en) * 2018-01-15 2018-07-06 北京物灵智能科技有限公司 A kind of user language appraisal procedure and system
CN110610627A (en) * 2019-09-29 2019-12-24 苏州思必驰信息科技有限公司 Heuristic poetry learning method and device
CN110890095A (en) * 2019-12-26 2020-03-17 北京大米未来科技有限公司 Voice detection method, recommendation method, device, storage medium and electronic equipment
CN111079494A (en) * 2019-06-09 2020-04-28 广东小天才科技有限公司 Learning content pushing method and electronic equipment
CN111524507A (en) * 2019-01-16 2020-08-11 北京字节跳动网络技术有限公司 Voice information feedback method, device, equipment, server and storage medium

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN110750959B (en) Text information processing method, model training method and related device
US8793118B2 (en) Adaptive multimodal communication assist system
CN104199834B (en) The method and system for obtaining remote resource from information carrier surface interactive mode and exporting
CN110362671B (en) Topic recommendation method, device and storage medium
CN110956138B (en) Auxiliary learning method based on home education equipment and home education equipment
CN109376612B (en) Method and system for assisting positioning learning based on gestures
KR102101496B1 (en) Ar-based writing practice method and program
CN111415537A (en) Symbol-labeling-based word listening system for primary and secondary school students
CN111610901B (en) AI vision-based English lesson auxiliary teaching method and system
CN113360608B (en) Man-machine combined Chinese composition correcting system and method
CN113641837A (en) Display method and related equipment thereof
CN112163513A (en) Information selection method, system, device, electronic equipment and storage medium
KR101794547B1 (en) System and Method for Automatically generating of personal wordlist and learning-training word
CN112085980A (en) Intelligent education machine starting system
CN112230875A (en) Artificial intelligence following reading method and following reading robot
CN111078982A (en) Electronic page retrieval method, electronic device and storage medium
CN115168534A (en) Intelligent retrieval method and device
CN111681467B (en) Vocabulary learning method, electronic equipment and storage medium
CN111582281B (en) Picture display optimization method and device, electronic equipment and storage medium
CN111159433B (en) Content positioning method and electronic equipment
CN112364700A (en) Content marking method and terminal equipment
CN113569112A (en) Tutoring strategy providing method, system, device and medium based on question
CN112230876A (en) Artificial intelligence reading accompanying method and reading accompanying robot
CN114078470A (en) Model processing method and device, and voice recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination