CN112567456A - Learning aid - Google Patents

Learning aid

Info

Publication number
CN112567456A
CN112567456A (application CN201980051140.2A)
Authority
CN
China
Prior art keywords
text
utterance
computer
personal computing
incorrect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980051140.2A
Other languages
Chinese (zh)
Inventor
艾德里安·德维特 (Adrian DeWitt)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wanjuan Intelligent Co., Ltd.
Original Assignee
Wanjuan Intelligent Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2018902567A0 (external priority)
Application filed by Wanjuan Intelligent Co., Ltd.
Publication of CN112567456A
Legal status: Pending

Classifications

    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech-to-text systems
    • G10L 15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G10L 2015/221: Announcement of recognition results
    • G10L 2015/225: Feedback of the input speech
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G09B 5/04: Electrically-operated educational appliances with audible presentation of the material to be studied
    • G09B 5/06: Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B 17/003: Teaching reading; electrically-operated apparatus or devices
    • G09B 19/04: Teaching speaking
    • G09B 19/06: Teaching foreign languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A method for learning to read text is described. The method comprises the following steps: receiving audio comprising voiced text; detecting an incorrect utterance included in the voiced text; and providing one or more visual indicia to indicate that an incorrect utterance has been detected, thereby assisting learning to read the text. The method may further include transmitting the received audio over one or more computer networks and/or providing the correct utterance over one or more computer networks. The detection may be performed using a personal computing device or using a remote computer. Additionally, the method may further comprise producing a sound embodying the correct utterance. An apparatus, computer system and computer program product for learning to read text are also described.

Description

Learning aid
Technical Field
The invention relates to a learning aid. More particularly, the present invention relates to a learning aid for learning to read text. The text may be read in order to learn to read or to learn a language.
Background
Once a person can read, they can access information and learn many other things. Reading becomes more difficult when the written characters of a word are not pronounced, or are not pronounced according to regular pronunciation rules.
U.S. Patent 6,405,167 to MaryAnn Cogliano, and U.S. Patents 7,110,945 and 7,366,664 to the same inventor, disclose an interactive electronic book. The electronic book includes a microphone, a voice recognition unit, and a highlighting device. When a particular word is pronounced correctly, the highlighting device (e.g., a light-emitting diode) lights up. Audio output of the words may also be provided. The highlighting device is stated to be able to assist a child in pronouncing a word by audibly pronouncing a portion of the word while simultaneously highlighting that portion.
There remains a need for improved or alternative learning aids.
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge.
Disclosure of Invention
In general, embodiments of the invention relate to learning aids. The present invention is directed to a method and apparatus for learning to read text.
In a broad sense, the present invention relates to detecting incorrect utterances of text and providing correct utterances.
In a first form, although it need not be the only or indeed the broadest form, the invention resides in a method for learning to read text, the method comprising:
receiving audio comprising voiced text;
detecting an incorrect utterance included in the voiced text; and
providing one or more visual indicia to indicate that an incorrect utterance has been detected, thereby assisting learning to read the text.
The method may further comprise: the received audio is sent over one or more computer networks and/or the correct utterance is provided over one or more computer networks.
Detection according to the first form may be performed using a personal computing device or using a remote computer. The personal computing device may be the device that receives the audio, detects the incorrect utterance, and provides the one or more visual indicia. The personal computing device may include one or more processors for the detection.
The method of the first form may further comprise: displaying the text to be spoken. The display may be on a screen.
The method of the first form may further comprise: providing, through a speaker, a sound embodying the correct utterance. The speaker may be included on the personal computing device.
In a second form, the present invention provides an apparatus for learning to read text, the apparatus comprising:
a screen for displaying text to be read;
an input for receiving an utterance of text to be read;
one or more computer processors to detect an incorrect utterance included in a received utterance; and
a visual output to provide one or more visual indicia to indicate that an incorrect utterance has been detected.
The apparatus of the second form may further detect the incorrect utterance on the apparatus itself. The detection may use the one or more processors.
The device of the second form may further comprise one or more transmitters, transceivers or receivers to communicate the utterance of text and/or a correct utterance of text over one or more computer networks.
The second form of apparatus may comprise an audio output for producing a sound embodying the correct utterance of the text associated with the detected incorrect utterance.
The device of the second form may comprise a personal computing device. The personal computing device may include a smartphone or tablet computer. The personal computing device may include: an Android-family device; an iOS (or iPhone OS) device; a Microsoft Windows Phone device; a virtual reality device, such as a Microsoft HoloLens; a digital media player device, such as an Apple TV, an Android TV device, or a Roku device; or a micro-console, such as a Nexus Player or SHIELD Android TV. The tablet computer may comprise an Android tablet, such as a Galaxy Note, or an iOS tablet, such as an iPad. The smartphone may comprise an Android smartphone, such as a Galaxy smartphone, or an iOS smartphone, such as an iPhone. The personal computing device may project the text to be read onto, for example, a paper book or other surface, or may use a camera together with a book.
In a third form, the present invention provides a computer system for learning to read text, the computer system comprising:
one or more personal computing devices comprising: a screen for displaying a text to be read; an input for receiving a vocalization of a text to be read; and a visual output for providing one or more visual indicia to indicate that an incorrect utterance has been detected; and
one or more server computers, including one or more computer processors, for receiving utterances of text from one or more personal computing devices and detecting incorrect utterances included in the received utterances.
The third form of one or more server computers may further provide the correct utterance of text.
The one or more personal computing devices of the third form may further include an audio output for producing a sound embodying a correct utterance of text associated with the detected incorrect utterance.
A third form of computer system may comprise one or more databases containing correct utterances of text.
In a fourth form, the present invention provides a computer program product comprising:
a computer usable medium and computer readable program code embodied on said computer usable medium for learning to read text, the computer readable program code comprising:
computer readable program code devices (i) configured to cause a personal computing device to receive audio comprising voiced text;
computer readable program code devices (ii) configured to cause a personal computing device to detect an incorrect utterance included in the voiced text; and
computer readable program code devices (iii) configured to cause the personal computing device to generate one or more visual indicia to indicate that an incorrect utterance has been detected.
The computer program product according to the fourth form may further include:
computer readable program code device (iv) configured to cause the personal computing device to provide a correct utterance of text associated with the detected incorrect utterance.
The computer program product according to the fourth form may further include:
computer readable program code devices (v) configured to cause a personal computing device to transmit received audio over one or more computer networks.
The computer program product according to the fourth form may further include:
computer readable program code devices (vi) configured to cause a personal computing device to process received audio using one or more processors to detect incorrect utterances. One or more processors may be included in the personal computing device that receive audio, detect incorrect utterances, and cause the generation of one or more visual indicia.
The computer program product according to the fourth form may further include:
computer readable program code devices (vii) configured to cause the personal computing device to provide and/or receive correct utterances over one or more computer networks.
The computer program product according to the fourth form may further include:
computer readable program code means (viii) configured to cause a personal computing device to display text to be spoken. The display may be on a screen.
The computer program product according to the fourth form may further include:
computer readable program code devices (ix) configured to cause the personal computing device to produce the correct utterance through a speaker.
According to any of the above aspects, the detection of an incorrect utterance may comprise a streaming method, a wake-word (or stop-word) method, or a phonemic sound method.
The streaming method may include: detecting a homonym match between a spoken word and the next word in the voiced text.
The streaming method may include: matching one or more minimal pairs.
The streaming method may further include a curated mismatch list. This list can be used to control stringency.
The wake-word method may include rotating the wake word. Rotating the wake word may include replacing the current wake word with the next wake word in the text. The wake word may comprise a wake syllable, a wake sound, a set of wake sounds, or a wake sentence. When the wake word comprises a wake syllable, the wake syllable may comprise a plurality of sounds to account for different accents.
Wake-word detection may include a comparison to linearized text. The linearized text may be represented as one or more tokens. The comparison may include a predictive model that compares the received audio to the linearized text, and the resulting match probability may be compared to a probability threshold. The probability threshold may be adjustable. If the audio does not match the linearized text, one or more visual markers may be provided. If the audio matches the linearized text, the next token may be retrieved from the linearized text for comparison.
Streaming detection may include converting received audio to text data output. The text data output may include multiple interpretations of the received audio.
Streaming detection may further include matching the text data output to the linearized text. The matching may include matching against any of the multiple interpretations. If the text data output does not match the linearized text, one or more visual markers may be provided. If the text data output matches the linearized text, the next received audio is converted to a text data output.
The phonemic sound method may include classification of phoneme sounds. Classification may be performed by a neural network. The phoneme sounds may be based on a phonemic alphabet. In particular embodiments, the phoneme sounds may be based on the ARPABET phonemic alphabet. The classification may include labelling the received audio with one or more phoneme labels and probabilities for those phonemes. The classifying may further include querying a lookup table that includes words and their phonemes. In the lookup table, a word may have multiple phoneme strings. These multiple strings may correspond to various pronunciations of the word in different accents or other pronunciation variants.
The phonemic sound method may further include: querying the lookup table using the next word included in the received audio. As phonemes are voiced and classified, each phoneme included in the received audio may be matched against the phoneme strings. Once one of the word's phoneme strings matches the vocalized phonemes, the word is deemed read. The next word to be read can then be looked up.
The phonemic sound method may further include: looking up variations in accent and pronunciation for the current region or regions.
In accordance with any of the above aspects, the one or more computer processors may include a remote computer processor connected to the input, the visual output, and/or the audio output over a computer network. The remote computer processor may be included in a server computer.
According to any of the above aspects, the input may comprise one or more microphones.
According to any of the above aspects, the audio output may comprise one or more speakers.
According to any of the above aspects, the incorrect utterance may include a mispronunciation, an unqualified accented utterance, an incorrect word, and/or an incorrect tone. An unqualified accented utterance may include an utterance in an accent that is not the correct utterance of the text. The correct utterance of the text may include an utterance in a selected accent. Incorrect tones may occur in tonal languages such as Mandarin, Cantonese, another Chinese dialect, or Vietnamese.
The selected accent may comprise an American, English, Australian, New Zealand, Canadian, or South African English accent. The selected accent may alternatively comprise High German, standard Japanese (hyoujungo), or Standard Mandarin.
According to any of the above aspects, the displayed text may include an original-language display and a converted display. The converted display may be above or below the original display. The original display may include one or more original-language characters. The one or more original-language characters may include one or more letters or pictograms. The one or more pictograms may include Chinese, Japanese, or other-language characters. In one embodiment, the original display may include one or more Chinese characters together with the corresponding romanization. The corresponding romanization may include pinyin (Hanyu Pinyin romanization). The converted display may indicate tone.
According to any of the above aspects, the one or more visual indicia may include highlighting the text associated with the detected incorrect utterance. The text associated with the detected incorrect utterance may include one or more words. The one or more words may include one or more characters. The one or more characters may include one or more letters or pictograms. The one or more pictograms may include Chinese, Japanese, or other-language characters.
According to any of the above aspects, learning to read text may be learning to read or learning a language. Learning to read may be the initial learning of reading written text that is commonly performed by children. Learning a language may be the learning of a second or other language by a person who may already be able to read their native language.
According to any of the above aspects, the text may be a translation from another language.
According to any of the above aspects, the screen may comprise a touch screen.
According to any of the above aspects, touching the highlighted display may initiate the generation of a sound comprising the correct utterance of the text. The sound may be produced one, two, three, four, five, six, seven, eight, nine, ten or more times. In one embodiment, the sound may be generated three times.
If the highlighted display is touched, a recording of the desired pronunciation of the highlighted word, term or phrase is played one or more times. The text may be part or all of a book or other work, which may be selected from a library. The library may contain original, freely available or licensed works. Recommendations may be made to the user based on interests, level, or targeted learning.
According to any of the above aspects, a filter may be included to determine the level of the user. The filter may include a set or subset of standardized text or varying difficulty levels.
Also in accordance with any of the above aspects, one or more stickers, rewards, logos, or badges may be provided. One or more stickers, rewards, logos, or badges may be provided when a sentence, page, chapter, book, or number of books is complete.
According to any of the above aspects, the progress of the user may be tracked.
According to any of the above aspects, a library may be included. The library may be included in a database connected to one or more personal computing devices and/or server computers over a computer network. The library may include a plurality of books. The library may further include linearized text for each book.
Other aspects and/or features of the present invention will become apparent from the following detailed description.
Drawings
For the present invention to be readily understood and readily put into practical effect, reference will now be made to the embodiments of the present invention with respect to the accompanying drawings, in which like reference numerals refer to like elements. These drawings are provided by way of example only, in which:
FIGS. 1A, 1B, and 1C are flow diagrams illustrating one embodiment of a method according to the present invention.
FIG. 1D is a flow chart illustrating a method of detecting incorrect utterances according to the present invention.
FIG. 1E is a flow chart illustrating another method of detecting an incorrect utterance in accordance with the present invention.
FIG. 2A is a block diagram illustrating one embodiment of a personal computing device and one embodiment of a computer system, in accordance with the present invention.
FIG. 2B is a block diagram illustrating a computer processor and memory according to one embodiment of the invention.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the relative dimensions of some of the elements in the figures may be distorted to help improve the understanding of the embodiments of the present invention.
Detailed Description
Embodiments of the present invention relate to a learning aid that may be used to learn to read text. A significant advantage of the invention is that it can be used not only to learn to read but also to learn a language, for example a second language.
The present invention is based at least in part on the unexpected finding that a problem with speech-to-text conversion is that everyday speech contains ambiguities that are invisible or routinely undetectable. Examples of such ambiguities include various accents (e.g., in a New Zealand accent the number after five may sound closer to "sux" than "six"), homonyms, and the statistical probability that a group of sounds in a sentence is misinterpreted as different words, because there are no gaps between sounds in speech the way there are spaces in written language. The reason a person can understand speech that a machine cannot is that the person possesses and applies context. People have memory and comprehension of what has been said and can adjust for the accent of the speaker.
One possible accent resource is the Speech Accent Archive, which can be viewed online at http://accent.gmu.edu/browse_language.php.
Although explained primarily with reference to English, the present invention is also applicable to other languages, including Mandarin, Cantonese, and Japanese. Indeed, the present invention may be applied to any language having a written form.
This highlights another advantage of the invention, namely that it can be applied to learning to read or to learning a language. Learning to read may be the initial learning of reading written text that is commonly performed by children. Learning a language may be the learning of a second or another language by someone who may already be able to read their native language, such as a Mandarin speaker learning English.
The present invention is particularly advantageous when applied to English, which presents a particular challenge to learners because it is not a phonetic language and has many exceptions to its recognized rules.
The invention is also particularly advantageous when applied to languages whose written forms are not phonetic, such as English and Chinese, as well as other Asian languages and dialects.
In one embodiment, the present invention uses speech recognition to assist the user in reading. The user touches a line in an electronic book or other electronic text display; the line is highlighted and the rest of the page is darkened. The user reads the line aloud, and the e-book responds by highlighting only the portions of the line that were not read correctly. If the user pronounces a word incorrectly or does not pronounce it, the word is further highlighted to help the user focus on it.
The present invention may use a variety of speech recognition methods, or a combination thereof. The first takes spoken words and converts them to text in a streaming method. This method produces multiple conversions simultaneously: it carries extra data for high-probability words and sentences as well as for lower-probability alternatives. The second method uses a so-called "stop word" or "wake word". For example, someone says "hey Siri", which signals speech recognition to start; any words that follow are converted using the previous system. In general, a wake-word system uses less processing than the streaming approach. Typically, a wake-word system requires three syllables to recognize that the wake word has been spoken. The third is a phonemic sound method using a phonemic alphabet.
In one embodiment, the wake-word method may be most compatible with the present invention. However, existing off-the-shelf wake-word systems have a limited vocabulary.
The wake-word method may include alternating wake words, with the current wake word being replaced by the next word in the book as each word is read.
In the wake-word approach, different granularities or conventions may be adopted. For example, there may be a wake syllable, a wake sound, a set of wake sounds, or a wake sentence. A wake syllable may have multiple sounds to account for different accents.
In another embodiment, a streaming approach is used. Other techniques may be used to enhance the streaming approach, improving its accuracy and addressing its limitations.
Streaming means that conventional speech recognition with two streams is used: an audio data stream serving as input to a speech processor, and an output stream of speech text data. The text data output comprises words assembled from phoneme strings. The text output may include variant interpretations of the recognized speech. This processing may occur over the internet or in the computing device.
The streaming method may include detecting a homonym match between the spoken word and the next word. One or more minimal pairs can then be matched.
A minimal pair is a pair of words or phrases in a particular language that differ in only one phonological element (e.g., a phoneme, toneme, or chroneme) and have distinct meanings. A minimal pair can be used to demonstrate that two phones (distinct speech sounds or gestures) constitute two separate phonemes in the language.
The streaming method may also include a manually curated mismatch list, which can be used to set the desired stringency. This is needed because it is sometimes necessary to distinguish between two close words. Minimal pairs allow a single sound to change, e.g., "she" becomes "he", "sea", "see", or "chi". One example of a curated mismatch is "she" and "he": readers may confuse these words, so a higher degree of strictness is expected for them. Another common error involves "sit" and "sits": a learner may say "sits" where the text reads "sit". For this reason, in English many minimal pairs contain an "s" at the end of the word, and many of these will be curated mismatches, though not all, because of exceptions.
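By way of illustration only, the following Python sketch shows how a curated mismatch list might control stringency when comparing a recognized word with the expected next word. The word lists here are assumptions made for the example, not data from the disclosure.

```python
# Minimal sketch (assumed data): accept minimal-pair near-misses unless the
# pair appears on a curated mismatch list that demands strict matching.

# Minimal-pair variants a recognizer may plausibly return for a word.
MINIMAL_PAIRS = {
    "she": {"he", "sea", "see", "chi"},
    "sit": {"sits"},
}

# Curated mismatches: near-misses that are real reading errors and must
# therefore NOT be accepted as equivalent.
CURATED_MISMATCHES = {
    ("she", "he"),
    ("sit", "sits"),
}

def is_acceptable(expected: str, recognized: str) -> bool:
    """Return True if the recognized word counts as a correct utterance."""
    if recognized == expected:
        return True
    if (expected, recognized) in CURATED_MISMATCHES:
        return False  # strictness: treat this near-miss as a reading error
    # Otherwise tolerate known minimal-pair confusions of the recognizer.
    return recognized in MINIMAL_PAIRS.get(expected, set())
```

Raising stringency is then simply a matter of adding pairs to the mismatch list.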
Fig. 1A shows an embodiment of a method 100 according to the invention. The method 100 includes receiving 110 audio that includes voiced text. An incorrect utterance included in the voiced text is then detected 120. One or more visual indicia are then provided 130 to indicate that an incorrect utterance has been detected, thereby assisting learning to read the text.
The detection 120 may be performed with the user's personal computing device 201 or by a remote server computer 291.
As shown in fig. 1B, the method 100 may further include sending 140 the received audio over one or more computer networks and/or providing the correct utterance over one or more computer networks.
As shown in fig. 1C, the method 100 may further include providing 150 sound embodying the correct sound production through a speaker.
Although not shown, the method may further include displaying 160 the text to be spoken. The display may be on a screen, such as a touch screen.
The detection 120 of an incorrect utterance may include a streaming method or a wake-word (stop-word) method. Advantageously, the streaming method may further include a curated mismatch list providing a distinction between two close words.
FIG. 1D illustrates one embodiment of the detection 120 including a wake word method 170, the wake word method 170 including a comparison 172 with linearized text. The comparison may include a predictive model that compares 174 the received audio to the linearized text. The linearized text may be represented 176 as one or more tokens.
The comparison 174 may be evaluated against a probability threshold 178. The probability threshold may be adjusted or adjustable 180 (not shown). If the audio does not match the linearized text, one or more visual markers may be provided 130. If the audio matches the linearized text, the next token may be retrieved 182 from the linearized text for comparison 172.
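A minimal Python sketch of this flow follows. The predictive model and the visual-marker call are stand-ins (assumptions), since the disclosure does not specify an implementation for either.

```python
import random
from typing import Iterable, List

def predict_match(frame: bytes, token: str) -> float:
    """Stand-in for the predictive model (174): probability that the audio
    frame is an utterance of the given token. Placeholder logic only."""
    return random.random()

def mark_incorrect(token: str) -> None:
    """Stand-in for providing a visual marker (130)."""
    print(f"highlight: {token!r}")

def wake_word_detect(frames: Iterable[bytes], tokens: List[str],
                     threshold: float = 0.7) -> None:
    """Compare received audio to linearized text one token at a time."""
    index = 0
    for frame in frames:
        if index >= len(tokens):
            break                                  # all text has been read
        if predict_match(frame, tokens[index]) >= threshold:
            index += 1                             # match: next token (182)
        else:
            mark_incorrect(tokens[index])          # mismatch: marker (130)
```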
The phonemic sound method may include classification of phoneme sounds. Classification may be performed by a neural network. The phoneme sounds may be based on a phonemic alphabet, such as the ARPABET phonemic alphabet described at the following Uniform Resource Locator (URL): https://en.wikipedia.org/wiki/ARPABET. Other suitable phonemic alphabets may be readily selected by the skilled artisan in light of the teachings and examples herein.
The classifying may include labelling the received audio with one or more phoneme labels and probabilities for those phonemes. The classifying may further include querying a lookup table that includes words and their phonemes. In the lookup table, a word may have multiple phoneme strings. These multiple strings may correspond to various pronunciations of the word in different accents or other pronunciation variants.
The lookup table is then queried using the next word contained in the received audio. As phonemes are voiced and classified, each phoneme included in the received audio may be matched against the phoneme strings. Once one of the word's phoneme strings matches the vocalized phonemes, the word is deemed read. The next word to be read can then be looked up.
The phonemic sound method may further comprise looking up variations in accent and pronunciation for the current region or regions.
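The lookup-table idea might be sketched as follows, using ARPABET-style phoneme strings. The table contents and the classifier's (phoneme, probability) output format are assumptions made for the example.

```python
from typing import Iterable, Tuple

# Assumed lookup table: each word maps to one or more ARPABET phoneme
# strings, covering accent and other pronunciation variants.
PHONEME_TABLE = {
    "tomato": [
        ("T", "AH", "M", "EY", "T", "OW"),   # one common pronunciation
        ("T", "AH", "M", "AA", "T", "OW"),   # accent variant
    ],
}

def word_read(word: str,
              classified: Iterable[Tuple[str, float]]) -> bool:
    """Consume classified phonemes until a full variant string matches."""
    variants = PHONEME_TABLE.get(word, [])
    heard = []
    for phoneme, probability in classified:   # classifier output (assumed)
        heard.append(phoneme)
        t = tuple(heard)
        if any(t == v for v in variants):
            return True                       # word read; look up next word
        if not any(v[:len(t)] == t for v in variants):
            return False                      # no variant can still match
    return False
```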
Fig. 1E illustrates another embodiment of the detection 120 including streaming detection 184, the streaming detection 184 including converting 186 the received audio into text data output. The text data output may include multiple interpretations of the received audio.
Streaming detection may further include matching 188 the text data output with the linearized text. The match 188 may include matching against any of the multiple interpretations. If the text data output does not match the linearized text, one or more visual markers may be provided 130. If the text data output matches the linearized text, the next received audio is converted 186 to a text data output.
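In outline, the streaming flow of Fig. 1E might look like the following sketch; `speech_to_text` stands in for a streaming recognizer and is assumed to return several alternative interpretations of each audio chunk.

```python
from typing import Iterable, List

def speech_to_text(chunk: bytes) -> List[str]:
    """Stand-in (assumption) for a streaming recognizer that returns
    multiple interpretations of one audio chunk, most probable first."""
    return []

def streaming_detect(chunks: Iterable[bytes], tokens: List[str]) -> None:
    """Match recognizer output against the linearized text (Fig. 1E)."""
    index = 0
    for chunk in chunks:
        if index >= len(tokens):
            break
        interpretations = speech_to_text(chunk)    # convert audio (186)
        if tokens[index] in interpretations:        # match found (188)
            index += 1                              # next token of the text
        else:
            print(f"highlight: {tokens[index]!r}")  # visual marker (130)
```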
The one or more computer processors may include a remote computer processor connected to the input, visual output, and/or audio output through a computer network. The remote computer processor may be included in a server computer.
Incorrect utterances may include mispronunciations, unqualified accented utterances, incorrect words, or incorrect tones. An unqualified accented utterance may include an utterance in an accent that is not the correct utterance of the text. The correct utterance of the text may include an utterance in a selected accent. Incorrect tones may occur in tonal languages such as Mandarin, Cantonese, another Chinese dialect, or Vietnamese.
The selected accent may comprise an American, English, Australian, New Zealand, Canadian, or South African English accent. The selected accent may alternatively comprise High German, standard Japanese (hyoujungo), or Standard Mandarin.
The displayed text may include an original-language display and a converted display. The converted display may be above or below the original display. The original display may include one or more original-language characters. The one or more original-language characters may include one or more letters or pictograms. The one or more pictograms may include Chinese, Japanese, or other-language characters. In one embodiment, the original display may include one or more Chinese characters together with the corresponding romanization. The corresponding romanization may include pinyin (Hanyu Pinyin romanization). The converted display may indicate tone.
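As a concrete illustration, a converted display for Chinese might place Hanyu Pinyin (which carries tone marks) above the original characters. The character-to-pinyin mapping below is an assumed toy dictionary; a real system would use a full lexicon, and the column alignment here is only approximate.

```python
# Toy character-to-pinyin dictionary (an assumption for this sketch).
PINYIN = {"你": "nǐ", "好": "hǎo"}

def converted_display(text: str) -> str:
    """Render a pinyin line (with tone marks) above the original characters."""
    pinyin_line = " ".join(PINYIN.get(ch, "?") for ch in text)
    original_line = " ".join(text)
    return pinyin_line + "\n" + original_line

print(converted_display("你好"))
# nǐ hǎo
# 你 好
```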
According to any of the above aspects, the one or more visual indicia may include highlighting the text associated with the detected incorrect utterance. The text associated with the detected incorrect utterance may include one or more words. The one or more words may include one or more characters. The one or more characters may include one or more letters or pictograms. The one or more pictograms may include Chinese, Japanese, or other-language characters.
Touching the highlighted display may initiate the generation of a sound comprising the correct utterance of the text. The sound may be produced one, two, three, four, five, six, seven, eight, nine, ten or more times. In one embodiment, the sound may be generated three times.
If the highlighted display is touched, a recording of the desired pronunciation of the highlighted word, term or phrase is played one or more times. The text may be part or all of a book or other work, which may be selected from a library. The library may contain original, freely available or licensed works. Recommendations may be made to the user based on interests, level, or targeted learning.
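For example, the touch handler might simply replay a stored recording a fixed number of times, three by default. The audio call and the recording store are assumptions of this sketch, not part of the disclosure.

```python
from typing import Dict, Optional

def play_audio(recording: bytes) -> None:
    """Stand-in (assumption) for the device's audio output."""

def on_touch_highlighted(word: str, recordings: Dict[str, bytes],
                         repeats: int = 3) -> None:
    """Play the desired pronunciation of the touched word several times."""
    recording: Optional[bytes] = recordings.get(word)
    if recording is None:
        return                      # no recording available for this word
    for _ in range(repeats):
        play_audio(recording)
```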
A filter may be provided to determine the level of the user. The filter may include a set or subset of standardized text or varying difficulty levels.
One or more stickers, rewards, logos, or badges may be provided. One or more stickers, rewards, logos, or badges may be provided when a sentence, page, chapter, book, or number of books is complete.
The progress of the user may be tracked.
Although not shown, the library may be included, for example, in a database connected to one or more personal computing devices and/or server computers over a computer network. The library may include a plurality of books. The library may further include linearized text for each book.
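One way to hold the linearized text is to precompute a flat token list for each book when it is added to the library. The tokenization rule below (lowercasing, keeping letters and apostrophes) is an assumed simplification.

```python
import re
from typing import List

def linearize(book_text: str) -> List[str]:
    """Flatten book text into the token sequence used for comparison."""
    return re.findall(r"[a-z']+", book_text.lower())

# A library entry stores the display text alongside its linearized form.
library = {
    "example-book": {
        "text": "The cat sat on the mat.",
        "tokens": linearize("The cat sat on the mat."),
        # -> ['the', 'cat', 'sat', 'on', 'the', 'mat']
    },
}
```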
One embodiment of a computer system 200 and personal computing device 201 suitable for use with the present invention is shown in fig. 2A and 2B.
In the embodiment shown in fig. 2A and 2B, the computer system 200 includes a personal computing device 201 with input devices and output devices. The input devices include a keyboard 202, a mouse pointer device 203, a scanner 226, an external hard drive 227, and a microphone 280; the output devices include a printer 215, a display device 214, and speakers 217. In some embodiments, the display device 214 may comprise a touch screen.
A modulator-demodulator (modem) transceiver device 216 may be used by the personal computing device 201 to communicate with the communication network 220 via connection 221. The network 220 may be a Wide Area Network (WAN), such as the internet, a cellular telecommunications network, or a private WAN. The personal computing device 201 may be connected to other similar personal devices 290 or server computers 291 via the network 220. Where connection 221 is a telephone line, modem 216 may be a conventional "dial-up" modem. Alternatively, where connection 221 is a high capacity (e.g., cable) connection, modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 220.
The personal computing device 201 typically includes at least one processor 205 and memory 206 formed, for example, from semiconductor Random Access Memory (RAM) and semiconductor Read Only Memory (ROM). The personal computing device 201 also includes a plurality of input/output (I/O) interfaces including: an audio-video interface 207 coupled to the video display 214, speaker 217, and microphone 280; an I/O interface 213 for a keyboard 202, a mouse 203, a scanner 226, and an external hard disk drive 227; and an interface 208 for an external modem 216 and printer 215. In some implementations, the modem 216 may be incorporated within the personal computing device 201, such as within the interface 208. The personal computing device 201 also has a local network interface 211, which local network interface 211 permits coupling of the personal device 200 to a local computer network 222, referred to as a Local Area Network (LAN), via a connection 223.
As also shown, local network 222 may also be coupled to wide area network 220 via connection 224, which connection 224 will typically include a so-called "firewall" device or device with similar functionality. Interface 211 may be formed by an ethernet circuit card, a bluetooth wireless device, or an IEEE 802.11 wireless device, or other suitable interface.
I/O interfaces 208 and 213 may provide one or both of serial and parallel connections, the former typically implemented according to the Universal Serial Bus (USB) standard and having corresponding USB connectors (not shown).
Storage devices 209 are provided, which typically include Hard Disk Drives (HDDs) 210. Other storage devices, such as external HD 227, disk drives (not shown), and tape drives (not shown) may also be used. The optical disc drive 212 is typically provided to serve as a non-volatile source of data. Portable storage devices such as optical disks (e.g., CD-ROM, DVD, blu-ray), USB-RAM, external hard drives and floppy disks may be used, for example, as suitable data sources to personal device 200. At least one server computer 291 provides another source of data to the personal device 200 via the network 220.
The components 205-213 of the personal computing device 201 typically communicate via the interconnected bus 204 in a manner that results in a conventional mode of operation of the personal device 200. In the embodiment shown in fig. 2A and 2B, the processor 205 is coupled to the system bus 204 by connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connection 219. Examples of personal devices 200 on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple computers, smartphones, tablet computers, or similar devices comprising a computer module such as personal computing device 201. It should be understood that when the personal device 200 comprises a smartphone or tablet computer, the display device 214 may comprise a touch screen, while other input and output devices, such as the mouse pointer device 203, keyboard 202, scanner 226, and printer 215, may not be included.
Fig. 2B is a detailed schematic block diagram of the processor 205 and the memory 234. Memory 234 represents a logical collection of all memory modules, including memory device 209 and semiconductor memory 206, that may be accessed by personal computing device 201 in fig. 2A.
The method of the present invention may be implemented using personal device 200, where the method may be implemented as one or more software applications 233 executable within personal computing device 201. In particular, the steps of the method of the present invention may be implemented by instructions 231 in software executed within the personal computing device 201.
The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software 233 may also be divided into two separate parts, a first part and corresponding code modules that perform the method of the invention, and a second part and corresponding code modules that manage the graphical user interface between the first part and the user.
The software 233 may be stored in a computer readable medium, including a storage device of the type described herein. The software is loaded into the personal device 200 from a computer-readable medium or through the network 221 or 223 and then executed by the personal device 200. In one example, the software 233 is stored on a storage medium 225 that is read by the optical disk drive 212. The software 233 is typically stored in the HDD 210 or the memory 206.
A computer readable medium having such software 233 or a computer program recorded on it is a computer program product. The use of the computer program product in the personal device 200 preferably effects a device or apparatus for implementing the method of the invention.
In some cases, a software application 233 may be supplied to the user encoded on one or more disk storage media 225, such as a CD-ROM, DVD, or Blu-ray disc, and read via the corresponding drive 212, or may alternatively be read by the user from the network 220 or 222. The software may also be loaded into the personal device 200 from other computer-readable media. Computer-readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the personal computing device 201 or personal device 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray discs, hard disk drives, ROM or integrated circuits, USB memory, magneto-optical disks, or computer-readable cards such as PCMCIA cards, whether such devices are internal or external to the personal computing device 201. Examples of transitory or non-tangible computer-readable transmission media that may also participate in the provision of the software application 233, instructions 231, and/or data to the personal computing device 201 include radio or infra-red transmission channels and network connections 221, 223, 224 to another computer or networked device 290, 291, as well as the internet or an intranet, including e-mail transmissions and information recorded on websites and the like.
The second portion of the application 233 and corresponding code modules described above can be executed to implement one or more Graphical User Interfaces (GUIs) to be rendered or otherwise represented on the display 214. Generally, by manipulating the keyboard 202, mouse 203, and/or screen 214 (including a touch screen), a user of the personal device 200 and the method of the present invention can manipulate the interface in a functionally suitable manner to provide control commands and/or inputs to an application associated with the GUI. Other forms of functionally adaptive user interfaces may also be implemented, such as an audio interface utilizing voice prompts output via the speaker 217 and user spoken commands input via the microphone 280. These operations include mouse clicks, screen touches, voice prompts and/or user spoken commands that may be sent via the network 220 or 222.
When the personal computing device 201 is initially powered up, a power-on self-test (POST) program 250 may execute. The POST program 250 is typically stored in the ROM 249 of the semiconductor memory 206. A program permanently stored in a hardware device such as the ROM 249 is sometimes referred to as firmware. The POST program 250 examines the hardware within the personal computing device 201 to ensure proper functioning, and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output system software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system-level application, executable by the processor 205, which fulfils various high-level functions, including processor management, memory management, device management, storage management, a software application programming interface, and a generic user interface.
The operating system 253 manages the memory 234(209, 206) to ensure that each process or application running on the personal computing device 201 has sufficient execution memory without conflicting with memory allocated to another process. Furthermore, the different types of memory available in personal device 200 must be used appropriately so that each process can run efficiently. Thus, the aggregate memory 234 is not intended to illustrate how particular segments of memory are allocated, but rather is intended to provide an overall view of memory accessible to the personal computing device 201 and how that memory is used.
The processor 205 includes a number of functional modules, including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244, 245, 246 in a register section, which stores data 247. One or more internal buses 241 functionally interconnect these functional modules. The processor 205 also typically has one or more interfaces 242 for communicating with external devices via the system bus 204, using the connection 218. The memory 234 is coupled to the bus 204 using the connection 219.
The application 233 includes a sequence of instructions 231, which may include conditional branch and loop instructions. The program 233 may also include data 232 that is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending on the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in memory location 230. Alternatively, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in memory locations 228 and 229.
Typically, the processor 205 is given a set of instructions 243, which it then executes. The processor 205 then waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including: data generated by one or more of the input devices 202, 203, or 214 when a touch screen is included; data received from an external source over one of the networks 220, 222; data retrieved from one of the storage devices 206, 209; or data retrieved from a storage medium 225 inserted into the corresponding reader 212. The execution of a set of instructions may in some cases result in the output of data. Execution may also involve storing data or variables in the memory 234.
The disclosed arrangement uses input variables 254, which input variables 254 are stored in respective memory locations 255, 256, 257, 258 in the memory 234. The depicted arrangement produces an output variable 261, which output variable 261 is stored in a respective memory location 262, 263, 264, 265 in the memory 234. Intermediate variable 268 may be stored in memory locations 259, 260, 266, and 267.
The register sections 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 of the processor 205 work together to perform the sequences of micro-operations needed to perform the "fetch, decode, and execute" cycle for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
(a) a fetch operation that fetches or reads instruction 231 from memory locations 228, 229, 230;
(b) a decode operation in which the control unit 239 determines which instruction has been fetched; and
(c) an operation is performed in which control unit 239 and/or ALU 240 execute instructions.
Thereafter, further fetch, decode, and execute cycles of the next instruction may be performed. Similarly, a storage cycle may be performed by which the control unit 239 stores or writes values to the memory locations 232.
Each step or sub-process in the method of the present invention may be associated with one or more segments of the program 233, and may be performed by the register sections 244-246, the ALU 240, and the control unit 239 in the processor 205 working together to carry out the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
As shown in fig. 2A, one or more other computers 290 may be connected to the communication network 220. Each such computer 290 may have a similar configuration as the personal computing device 201 and corresponding peripheral devices.
One or more other server computers 291 may be connected to the communication network 220. These server computers 291 provide information in response to requests from personal devices or other server computers.
The method 100 may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include a graphics processor, a digital signal processor, or one or more microprocessors and associated memory.
It will be appreciated that in order to practice the method of the present invention as described above, the processors and/or memories of the processing machines need not be physically located in the same geographical location. That is, each processor and memory used in the present invention may be located in a geographically different location and connected to communicate in any suitable manner. Additionally, it will be appreciated that each processor and/or memory may be comprised of different physical device components. Thus, the processor need not be a single piece of equipment in one location and the memory need not be another single piece of equipment in another location. That is, it is contemplated that the processor may be two pieces of equipment located at two different physical locations. Two different device components may be connected in any suitable manner. In addition, the memory may include two or more portions of memory in two or more physical locations.
As described above, the processing is performed by various components and various memories. However, it will be appreciated that, according to another embodiment of the invention, processing performed by two distinct components may instead be performed by a single component, and processing performed by one distinct component may be performed by two distinct components. In a similar manner, memory storage performed by two distinct memory portions may be performed by a single memory portion, and memory storage performed by one distinct memory portion may be performed by two memory portions.
Further, various techniques may be used to provide communications between the various processors and/or memories, as well as to allow the processors and/or memories of the present invention to communicate with any other entity, i.e., to, for example, obtain further instructions or access and use remote memory. Such technologies for providing such communications may include, for example, a network, the internet, an intranet, an extranet, a LAN, an ethernet, a telecommunications network (e.g., a cellular or wireless network), or any client server system that provides communications. Such communication techniques may use any suitable protocol, such as TCP/IP, UDP, or OSI.
In one embodiment, a personal computing device 201 for learning to read text includes: a screen 214 for displaying text to be read, an input 216 for receiving an utterance of text to be read, one or more computer processors 205 for detecting an incorrect utterance included in the received utterance, and a visual output 214 for providing one or more visual indicia to indicate the detected incorrect utterance.
In the embodiment shown in fig. 2A and 2B, one or more transmitters, transceivers, or receivers are provided by modem 216, and modem 216 communicates the utterance of text and/or the correct utterance of text over one or more computer networks 221, 223.
The audio output is provided by speaker 217, and speaker 217 produces sounds that embody the correct utterance of text associated with the detected incorrect utterance.
The personal computing device 201 may be in the form of a smartphone or tablet computer. The personal computing device may include: an Android-family device; an iOS (or iPhone OS) device; a Microsoft Windows Phone device; a virtual reality device, such as a Microsoft HoloLens; a digital media player device, such as an Apple TV, an Android TV device, or a Roku device; or a micro-console, such as a Nexus Player or SHIELD Android TV. The tablet computer may comprise an Android tablet, such as a Galaxy Note, or an iOS tablet, such as an iPad. The smartphone may comprise an Android smartphone, such as a Galaxy smartphone, or an iOS smartphone, such as an iPhone. The personal computing device may project the text to be read onto, for example, a paper book or other surface, or may use a camera together with a book.
In one embodiment, a computer system 200 for learning to read text includes one or more personal computing devices 201 and one or more server computers 291, the server computers 291 including one or more computer processors for receiving utterances of text from the one or more personal computing devices and detecting incorrect utterances contained in the received utterances.
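A minimal server-side counterpart to the client sketch above might look as follows. Purely for illustration, it assumes that speech-to-text transcription of the received audio happens elsewhere, and that the server receives the expected text and the transcribed utterance as JSON, returning the positions of mismatched words.

```python
# Illustrative server sketch: receives expected text plus a transcription
# of the utterance and returns indices of incorrectly voiced words.
# Speech-to-text itself is assumed to happen elsewhere and is not shown.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def detect_incorrect_utterances(expected: str, spoken: str) -> list[int]:
    """Return indices of words in `expected` that were not voiced correctly."""
    expected_words = expected.lower().split()
    spoken_words = spoken.lower().split()
    return [
        i for i, word in enumerate(expected_words)
        if i >= len(spoken_words) or spoken_words[i] != word
    ]

class DetectionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        payload = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        errors = detect_incorrect_utterances(payload["expected"], payload["spoken"])
        body = json.dumps({"incorrect_word_indices": errors}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), DetectionHandler).serve_forever()
```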
One or more server computers 291 may further provide the correct utterance of text.
Computer system 200 may include one or more databases (not shown) that include the correct utterances of text.
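Such a database could be as simple as a table keyed by word. The following sketch assumes a hypothetical SQLite table named correct_utterances storing audio of each word's correct utterance; the schema and path are illustrative only.

```python
# Sketch of a lookup against a hypothetical "correct_utterances" table;
# the schema (word -> audio blob) is an illustrative assumption.
import sqlite3

def fetch_correct_utterance(db_path: str, word: str) -> bytes | None:
    """Return stored audio for the correct utterance of a word, if any."""
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT audio FROM correct_utterances WHERE word = ?",
            (word.lower(),),
        ).fetchone()
    return row[0] if row else None
```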
The present invention also provides a computer program product comprising a computer usable medium and computer readable program code embodied on the computer usable medium for learning to read text, the computer readable program code comprising: computer readable program code means (i) configured to cause the personal computing device 201 to receive audio comprising spoken text; computer readable program code means (ii) configured to cause the personal computing device 201 to detect an incorrect utterance included in the spoken text; and computer readable program code means (iii) configured to cause the personal computing device 201 to generate one or more visual indicia to indicate that an incorrect utterance has been detected.

The computer program product may further comprise computer readable program code means (iv) configured to cause the personal computing device 201 to provide a correct utterance of the text associated with the detected incorrect utterance.

The computer program product may further comprise computer readable program code means (v) configured to cause the personal computing device 201 to transmit the received audio over one or more computer networks.

The computer program product may further comprise computer readable program code means (vi) configured to cause the personal computing device 201 to provide and/or receive correct utterances over one or more computer networks.

The computer program product may further comprise computer readable program code means (vii) configured to cause the personal computing device 201 to display the text to be spoken. The display may be on a screen.

The computer program product may further comprise computer readable program code means (viii) configured to cause the personal computing device 201 to provide the correct utterance through a speaker.
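As an illustration of code means (iii), the visual indicia could be produced by wrapping flagged words in highlight markup before rendering. The sketch below is an assumption; the specification does not prescribe HTML or any particular form of marker.

```python
# Sketch of generating visual indicia: flagged words are wrapped in
# <mark> tags for display. HTML is an illustrative choice only.
import html

def mark_incorrect_words(text: str, incorrect_indices: set[int]) -> str:
    """Return display text with detected errors visually marked."""
    out = []
    for i, word in enumerate(text.split()):
        safe = html.escape(word)
        out.append(f"<mark>{safe}</mark>" if i in incorrect_indices else safe)
    return " ".join(out)

# Example: mark_incorrect_words("the cat sat", {1}) -> 'the <mark>cat</mark> sat'
```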
Advantageously, the present invention enhances learner confidence by listening for mispronounced words and highlighting them when they are detected.
In this specification, the terms "comprises," "comprising," and similar terms are intended to denote a non-exclusive inclusion, such that a device comprising a list of elements does not include only those elements but may include other elements not listed.
Throughout the specification, the aim has been to describe the invention without limiting it to any one embodiment or specific collection of features. Those skilled in the relevant art will recognize variations of the specific embodiments that nevertheless fall within the scope of the invention.

Claims (28)

1. A method for learning to read text, the method comprising:
receiving audio comprising spoken text;
detecting an incorrect utterance included in the spoken text; and
providing one or more visual indicia to indicate that the incorrect utterance has been detected, so as to learn to read text.
2. The method of claim 1, further comprising: sending the received audio over one or more computer networks and/or providing a correct utterance over one or more computer networks.
3. The method of claim 1 or 2, wherein the detecting is performed using a personal computing device or using a remote computer; optionally, a single personal computing device receives the audio, detects the incorrect utterance, and provides the one or more visual indicia.
4. The method of any of claims 1 to 3, further comprising: displaying the text to be spoken.
5. The method of any of claims 1 to 4, further comprising: providing, through a speaker, sound embodying a correct utterance.
6. An apparatus for learning to read text, the apparatus comprising:
a screen for displaying a text to be read;
an input for receiving an utterance of the text to be read;
one or more computer processors to detect an incorrect utterance included in a received utterance; and
a visual output to provide one or more visual indicia to indicate that the incorrect utterance has been detected.
7. The device of claim 6, further comprising one or more processors to detect the incorrect utterance on the device.
8. The device of claim 6, further comprising one or more transmitters, transceivers, or receivers to communicate the utterance of text and/or a correct utterance of text over one or more computer networks.
9. The device of any of claims 6-8, further comprising an audio output for producing a sound embodying a correct utterance of the text associated with the detected incorrect utterance.
10. A computer system for learning to read text, the computer system comprising:
one or more personal computing devices comprising a screen, an input, and a visual output, the screen for displaying a text to be read, the input for receiving an utterance of the text to be read, and the visual output for providing one or more visual indicia to indicate that an incorrect utterance has been detected; and
one or more server computers comprising one or more computer processors for receiving utterances of the text from the one or more personal computing devices and detecting incorrect utterances included in the received utterances.
11. The computer system of claim 10, wherein the one or more server computers further provide a correct utterance of the text.
12. The computer system of claim 10 or 11, wherein the one or more personal computing devices further comprise an audio output for producing sounds embodying a correct utterance of the text associated with the detected incorrect utterance.
13. The computer system of any of claims 10-12, further comprising one or more databases that include correct utterances of the text.
14. A computer program product, comprising:
a computer usable medium and computer readable program code embodied on said computer usable medium for learning to read text, said computer readable program code comprising:
computer readable program code means (i) configured to cause a personal computing device to receive audio comprising spoken text;
computer readable program code means (ii) configured to cause the personal computing device to detect an incorrect utterance included in the spoken text; and
computer readable program code means (iii) configured to cause the personal computing device to generate one or more visual indicia to indicate that the incorrect utterance has been detected.
15. The computer program product of claim 14, further comprising:
computer readable program code means (iv) configured to cause the personal computing device to provide a correct utterance of the text associated with the detected incorrect utterance.
16. The computer program product of claim 14 or 15, further comprising:
computer readable program code means (v) configured to cause the personal computing device to transmit the received audio over one or more computer networks.
17. The computer program product of claim 14, further comprising:
computer readable program code means (vi) configured to cause the personal computing device to process the received audio using one or more processors to detect incorrect utterances.
18. The computer program product of any of claims 14 to 16, further comprising:
computer readable program code means (vii) configured to cause the personal computing device to provide and/or receive the correct utterance over one or more computer networks.
19. The computer program product of any of claims 14 to 18, further comprising:
computer readable program code means (viii) configured to cause the personal computing device to provide the correct utterance through a speaker.
20. A method, apparatus, system, or product according to any of claims 1-19, wherein detection of the incorrect utterance may include a streaming method, a wake word or stop-word method, or a phoneme sound method.
21. A method, apparatus, system, or product according to claim 20, wherein the streaming method includes: detecting a homonym match between the spoken word and a next word in the text being read.
22. A method, apparatus, system, or product according to claim 20, wherein the streaming method includes: matching one or more minimal pairs.
23. A method, apparatus, system, or product according to claim 20, wherein the streaming method further includes a curated mismatch list.
24. A method, apparatus, system, or product according to claim 20, wherein the wake word method includes a rolling wake word.
25. A method, apparatus, system, or product according to claim 20, wherein the wake word detection includes a comparison to linearized text.
26. The method, apparatus or system of claim 20, wherein said phoneme sound method comprises classification of phoneme sounds.
27. A method, apparatus or system according to claim 26, wherein the classification is by a neural network.
28. The method, apparatus or system of claim 26, wherein the phoneme sounds are based on phonemic letters.
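Claims 20 to 28 name three families of detection technique. Purely as a sketch of the streaming idea of claims 21 to 23, the comparison of a recognized word against the next expected word could consult homophone, minimal-pair, and curated-mismatch tables; the toy tables and function below are assumptions for illustration, not the claimed implementation.

```python
# Toy sketch of the streaming method of claims 21-23; the tables are
# illustrative assumptions, not data from the specification.
HOMOPHONES = {"red": {"read"}, "there": {"their", "they're"}}
MINIMAL_PAIRS = {"ship": {"sheep"}, "bat": {"pat"}}
CURATED_MISMATCHES = {("the", "a")}  # curated mismatch list (claim 23)

def streaming_check(expected_word: str, recognized_word: str) -> str:
    """Classify a recognized word against the next expected word."""
    expected, recognized = expected_word.lower(), recognized_word.lower()
    if recognized == expected or recognized in HOMOPHONES.get(expected, set()):
        return "match"  # exact match or homonym/homophone match (claim 21)
    if recognized in MINIMAL_PAIRS.get(expected, set()):
        return "minimal-pair error"  # known near-miss pronunciation (claim 22)
    if (expected, recognized) in CURATED_MISMATCHES:
        return "known mismatch"  # explicitly curated confusion (claim 23)
    return "mismatch"
```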
CN201980051140.2A 2018-07-16 2019-07-16 Learning aid Pending CN112567456A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AU2018902567 2018-07-16
AU2018902567A AU2018902567A0 (en) 2018-07-16 Learning aid
PCT/AU2019/000085 WO2020014730A1 (en) 2018-07-16 2019-07-16 Learning aid

Publications (1)

Publication Number Publication Date
CN112567456A true CN112567456A (en) 2021-03-26

Family

ID=69163452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980051140.2A Pending CN112567456A (en) 2018-07-16 2019-07-16 Learning aid

Country Status (2)

Country Link
CN (1) CN112567456A (en)
WO (1) WO2020014730A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040002043A1 (en) * 2002-06-10 2004-01-01 Peter Dowrick Diagnostically and audibly responsive computer learning memory game and system provided therefor
WO2006031536A2 (en) * 2004-09-10 2006-03-23 Soliloquy Learning, Inc. Intelligent tutoring feedback
CN1804934A (en) * 2006-01-13 2006-07-19 黄中伟 Computer-aided Chinese language phonation learning method
CN1879146A (en) * 2003-11-05 2006-12-13 皇家飞利浦电子股份有限公司 Error detection for speech to text transcription systems

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005099414A2 (en) * 2004-04-12 2005-10-27 Burlingtonspeech Limited Comprehensive spoken language learning system
US8272874B2 (en) * 2004-11-22 2012-09-25 Bravobrava L.L.C. System and method for assisting language learning

Also Published As

Publication number Publication date
WO2020014730A1 (en) 2020-01-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210326