CN102509479B - Portable character recognition voice reader and method for reading characters

Info

Publication number: CN102509479B
Application number: CN2011102964173A
Authority: CN
Inventors: 李豫; 沈沾俊; 张书强; 但果
Original assignee: Individual
Current assignee: Individual
Priority date: 2011-10-08
Filing date: 2011-10-08
Publication date: 2013-09-25
Anticipated expiration: 2031-10-08
Also published as: CN102509479A

Abstract

The invention discloses a portable character recognition voice reader and a method for reading characters. The portable character recognition voice reader comprises a housing case and a control circuit, wherein the housing case is suitable for being carried around by visually impaired people and is provided with more than one button, and the control circuit can be operated by the visually impaired people for obtaining information of print media through voice and is arranged in the housing case. The control circuit is composed of a control module, a character and image collection module, an OCR (Optical Character Recognition) character recognition editing module, a TTS (Text To Speech) module, a storage module and an audio playing module; and the method for reading the characters by using the portable character recognition voice reader, provided by the invention, comprises a step of collecting character and image information carried by the print media. The portable character recognition voice reader can be independently operated by the visually impaired people, and according to the method, the content of the character information carried by the print media can be easily obtained through voice by the visually impaired people.

Description

Portable character recognition voice reader and method for reading characters

Technical Field

The present invention relates to a text reader and a method for reading text, and more particularly to a text reader for identifying information carried on a flat medium and a method for reading text by the same, which are convenient for visually impaired people.

Background

Reading the text carried by print media such as newspapers, magazines, books, periodicals, invoices and machine-printed receipts is the best way for people to acquire information and learn; with the progress of science and technology, electronic devices with screens, such as computers and e-book readers, have become a further source of information. However, for visually impaired people, including blind people and those whose vision is severely reduced by serious eye disease or high myopia, reading books and newspapers is extremely difficult. Reader products designed for visually impaired people have therefore appeared, such as e-book devices with TTS voice output and indoor readers for the blind that scan, recognize and read text aloud. Although these products help visually impaired people read and learn, they still have disadvantages: the former can only convert stored TXT-format text into speech and cannot read aloud text captured in real time; the latter relies on an image acquisition device combined with a computer to scan and read aloud text captured in real time, so the equipment is complicated and unsuitable for mobile use.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a portable character recognition phonation reader and a method for reading characters, which are convenient for visually impaired people to know character information carried by a plane medium through sound at any time and any place.

To solve this technical problem, the portable character recognition voice reader comprises a casing that is suitable for visually impaired people to carry and is provided with more than one key, and a control circuit that is arranged in the casing, is operated by the visually impaired user, and enables the user to obtain print media information through voice.

The control circuit comprises a control module, a character image acquisition module, an OCR character recognition editing module, a TTS character-to-speech module, a storage module and an audio playing module, wherein:

1) the control module receives key input and calls the corresponding function module according to the input instruction; it is the coordination and scheduling center among the function modules, drives them to operate according to instructions, and manages the operation, reading and writing of related files;

2) the character image acquisition module acquires the character and image information of the print medium using a camera or scanning component; the camera's photosensitive chip is an OV9653, or a CMOS or CCD photosensitive chip of another brand;

3) the OCR character recognition editing module is used for recognizing the obtained character and image information and storing the character and image information as a TXT format file;

4) the TTS text-to-speech module is used for converting the obtained TXT text into an audio file which can be played;

5) the storage module comprises onboard SDRAM (model MT48LC16M16A2) or NOR flash (model AM29LV160B) and an external SD card device, wherein the SD card uses the SPI data transmission mode;

6) the audio playing module converts the audio file into an audio signal and outputs the audio signal;

7) the power module uses a lithium battery charge and discharge management chip, model SC806I, manufactured by SEMTECH.

The control circuit also comprises an intelligent character splicing and typesetting algorithm module which splices the character and image information which are collected for many times and are mutually overlapped into correct and coherent character and image information.

The shell is also provided with a positioning mechanism required for reading the plane media information, and the positioning mechanism consists of a longitudinal positioning rod and a transverse positioning rope.

The control module is a central processing unit that uses the ADI BF533 chipset produced by Analog Devices (ADI).

The OCR character recognition editing module uses an OCR dynamic link library developed by Beijing Wintone.

The TTS text-to-speech module uses the ejTTS6.0 speech synthesis module produced by Jietong Huasheng (SinoVoice).

The character image acquisition module can be connected to an external high-definition wide-field camera device or scanning device through an HDMI (High-Definition Multimedia Interface) port.

The method of the invention for reading characters with the portable character recognition voice reader includes collecting the character image information carried by a print medium; the method is operated independently by a visually impaired person, who obtains the content of the text carried by the print medium through voice. The operation steps are as follows:

first, open a book, set the longitudinal positioning rod of the positioning mechanism of the portable character recognition voice reader to the first row, press the flange at the front end of the longitudinal positioning rod against the upper edge of the book, pull the pull ring of the transverse positioning rope with the left hand and hold it against the left edge of the book, press the UP or DOWN key to select the 'reading books and newspapers' function, press the QUERY key to start that function, hold the reader in the right hand, and photograph the opened page continuously from left to right to form the first column;

second, when photographing reaches the end of the first column, the transverse positioning rope cannot be pulled out any further; at this point, relax the left hand slightly so that the rope retracts a little, then continue pulling it out, and, holding the reader in the right hand, continue photographing the second column from left to right;

third, when the right side of the page has been photographed, set the longitudinal positioning rod to the second row and continue scanning the page from left to right in the same way, repeating the column-changing and row-changing steps until the whole opened page has been photographed;

fourth, press the QUERY key; the system plays the text image information collected from the whole page as sound through the OCR character recognition editing module, the intelligent character splicing and typesetting algorithm module, the TTS text-to-speech module and the audio playing module.

The invention sets the control circuit for collecting, identifying and sounding the information carried by the plane media in the casing which is small and convenient for people to carry, and sets a unique positioning mechanism on the casing, thereby enabling the visually impaired to know the character information recorded by the plane media by operating the invention at any time and any place.

Description of the Drawings

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a schematic view of the present invention.

FIG. 2 is a schematic diagram of the positioning of the collected information according to the present invention.

FIG. 3 is a functional flow diagram of the present invention.

FIG. 4 is a block diagram of the present invention.

FIG. 5 is a connection diagram of the control circuit functional modules according to the present invention.

Detailed Description

Example 1

As shown in FIG. 1, the portable character recognition voice reader of the present invention comprises a rectangular casing 1 (the casing 1 may alternatively be circular or elliptical) made of metal or PVC. The casing is small, 100 mm long, 50 mm wide and 40 mm high, so that visually impaired people can carry it easily. The casing 1 is provided with eight keys 2 for inputting instructions, which are electrically connected to the control circuit built into the casing 1, a charging interface 6 for connecting a charger, and an earphone jack 7. The eight keys 2 are the UP, DOWN, QUERY, EXIT, PHOTO, VOL+, VOL- and HELP keys, and each key 2 carries braille or specific tactile marks that blind users can recognize. When a visually impaired person wants to know the text carried by a print medium, they only need to hold the casing 1, operate the corresponding keys 2, and place the casing 1 on the print medium for positioning and photographing; the text content is then read to them through an external earphone or the loudspeaker 5 of the casing 1.

As shown in FIGS. 3, 4 and 5, the control circuit is composed of a control module, a character image acquisition module, an OCR (Optical Character Recognition) character recognition editing module, a TTS (Text To Speech, i.e. speech synthesis technology) text-to-speech module, a storage module and an audio playing module, wherein:

1) Control module

The control module is a central processing unit that handles input requests and invokes the related functions. It may be implemented with the ADI BF533 chipset manufactured by Analog Devices (ADI), with other processor chipsets of the same family from the same company, or with chips from other companies, such as ARM-core chipsets or the OMAP family chipsets manufactured by TI.

2) Character image acquisition module

This module collects the text and image information of the print medium using a camera component or a scanning component. The camera's photosensitive chip is an OV9653, although CMOS or CCD photosensitive chips of other brands can also be used, and the lens is a 1.8 mm wide-angle lens; the scanning component is a customized handheld scanner. The module is connected to the parallel peripheral bus of the central processing unit. It transfers the collected images containing the text to be recognized to the storage module; the central processing unit then processes the images and calls the OCR character recognition editing module to recognize them, and the resulting TXT (text document) text is stored in the storage module. In addition, the module can be connected to an external high-definition wide-field camera device or scanning device through an HDMI (High-Definition Multimedia Interface, a digital video/audio interface technology) port.

3) OCR character recognition editing module

This module recognizes the obtained character and image information and stores it as a TXT-format file. It uses the TH_embedded_OCR.a dynamic link library developed by Beijing Wintone; similar products or modules from other companies, such as the character recognition cores of Hanwang, Wenjing and others, can also be used. The module runs on the central processing unit.

4) TTS character-to-speech module

This module converts the obtained TXT text into a playable audio file. The TTS conversion can select the voice type, including male and female voices and different languages such as Chinese, English and Cantonese. The module uses the ejTTS6.0 speech synthesis module from Jietong Huasheng (SinoVoice); the InterPhonic or ViviVoice series voice libraries and speech synthesis chips from iFLYTEK can also be used. The module runs on the central processing unit.

5) Storage module

This module stores data transmitted from the central processing unit and the other modules. It comprises onboard SDRAM (model MT48LC16M16A2), NOR flash (model AM29LV160B) and an external SD (Secure Digital) card device; the SD card uses the SPI data transmission mode. The module is connected to the external bus interface of the central processing unit.

6) Audio playing module

The module is connected with the synchronous serial interface of the central processing unit, converts the audio file into an audio signal and outputs sound through an external earphone or a loudspeaker 5.

The casing 1 of the present invention is further provided with a USB interface 8 for exchanging data with an external computer and for charging the device's battery. The power management chip is a lithium battery charge and discharge management chip from SEMTECH, model SC806I; charge and discharge management chips with the same function from other companies can also be used. The battery is a 3.7 V rechargeable lithium battery.

As shown in fig. 1, braille labels are etched on the surface of each of the eight keys 2 on the casing 1 of the present invention, so as to facilitate the operation of visually impaired people, and each of the keys 2 has the following functions:

After the system starts, it plays a start-up tune and an operation prompt, then enters the main menu for selecting a function and announces the name of the first function. In the main menu, pressing the UP and DOWN keys switches the announced function name, and pressing the QUERY (confirm) key enters the function currently announced. The built-in functions are, in order: 1) reading books and newspapers; 2) system settings; 3) other applications. The 'other applications' entry covers additional functions that can be customized.

UP key: when the system is in the main menu, pressing the UP key announces the name of the function preceding the one currently announced. For example, if 'system settings' is currently announced, pressing the UP key announces 'reading books and newspapers', which is the function preceding 'system settings'.

DOWN key: when the system is in the main menu, pressing the DOWN key announces the name of the function following the one currently announced. For example, if 'reading books and newspapers' is currently announced, pressing the DOWN key announces 'system settings', which is the function following 'reading books and newspapers'.

QUERY key: the confirmation key. When the system is in the main menu, the UP or DOWN key is used to select a function by announcement, and pressing QUERY enters the function currently announced; for example, after 'reading books and newspapers' has been selected with the UP or DOWN key, pressing QUERY starts that function. In the 'reading books and newspapers' function the image acquisition module is active. The device is placed on the book, newspaper or magazine carrying the text, and pressing the PHOTO key starts the text image acquisition stage. If the QUERY key is then pressed, the system plays the collected text as sound through the OCR character recognition editing module, the TTS text-to-speech module and the audio playing module.

EXIT key: exits from the current state to the previous state. For example, in the audio playing state, pressing EXIT leaves audio playback and prepares for the next text image acquisition; pressing EXIT once more returns to the main menu for selecting a function.

PHOTO key: the photographing key, used to start the collection of text image information.

VOL+ key: each press raises the volume by one step.

VOL- key: each press lowers the volume by one step.

HELP key: can be pressed at any time to play a description of the current function and prompts for the necessary operation steps.
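The key behaviour described above can be illustrated as a small state machine. The following is only a minimal sketch and not the device firmware: the class name `ReaderUI`, the state names, the volume range of 0 to 10, the wrap-around at the ends of the menu, the requirement of at least one capture before playback, and the `spoken` list that stands in for voice prompts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Built-in functions in the order the description lists them.
FUNCTIONS = ["reading books and newspapers", "system settings", "other applications"]
READING = FUNCTIONS[0]


@dataclass
class ReaderUI:
    """Hypothetical model of the eight-key interface; not the device firmware."""

    state: str = "MENU"   # MENU, CAPTURE, PLAYING or OTHER (assumed state names)
    index: int = 0        # menu entry currently announced
    volume: int = 5       # assumed range 0..10
    photos: int = 0       # captures taken in the current session
    spoken: List[str] = field(default_factory=list)  # stands in for voice prompts

    def press(self, key: str) -> None:
        if key == "VOL+":
            self.volume = min(10, self.volume + 1)
        elif key == "VOL-":
            self.volume = max(0, self.volume - 1)
        elif key == "HELP":
            # HELP works in every state and describes the current one.
            self.spoken.append(f"help for {self.state}")
        elif self.state == "MENU":
            self._menu(key)
        elif self.state == "CAPTURE":
            if key == "PHOTO":
                self.photos += 1
            elif key == "QUERY" and self.photos > 0:
                # Assumption: playback needs at least one capture.
                self.state = "PLAYING"
            elif key == "EXIT":
                self.state = "MENU"
        elif self.state == "PLAYING" and key == "EXIT":
            # The first EXIT leaves playback and prepares for the next capture.
            self.state, self.photos = "CAPTURE", 0
        elif self.state == "OTHER" and key == "EXIT":
            self.state = "MENU"

    def _menu(self, key: str) -> None:
        if key in ("UP", "DOWN"):
            step = -1 if key == "UP" else 1
            # Wrap-around at the ends of the menu is an assumption.
            self.index = (self.index + step) % len(FUNCTIONS)
            self.spoken.append(FUNCTIONS[self.index])
        elif key == "QUERY":
            self.state = "CAPTURE" if FUNCTIONS[self.index] == READING else "OTHER"
```

For example, pressing QUERY in the main menu, then PHOTO, then QUERY moves the model from `MENU` through `CAPTURE` to `PLAYING`, and two presses of EXIT return it to `MENU`, matching the EXIT example in the description.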

The method for reading characters by the portable character recognition phonation reader comprises the following steps:

As shown in FIGS. 1 and 2, a positioning mechanism is arranged on the casing 1 of the present invention. The positioning mechanism consists of a longitudinal positioning rod 3, a transverse positioning rope 4 and a tensioning device arranged in the casing 1.

The longitudinal positioning rod 3 is arranged at the bottom of the front side of the casing 1. It is a telescopic rod that can be pulled out or pushed back and is divided into three sections of equal length (or more than three sections; its length is chosen to cover the longitudinal height of the information carried by the print medium being read). The rod is made of aluminum or PVC plastic. Its front end is bent downward by 90 degrees to form a skirt, or a flange 31 is provided on the lower surface of the front end; the flange 31 hooks over the edge of the book or periodical being read and moves horizontally along that edge. Pulling the longitudinal positioning rod 3 out to the first section distance is called the first row, pulling it out to the second section distance is called the second row, and moving from one distance to the next is called changing rows. Each time the rod is pulled out by one section, a mechanical click is heard as a prompt, and the visually impaired user can also tell how many sections have been pulled out by feeling the number of sections extending from the longitudinal positioning rod 3.

The transverse positioning rope 4 is arranged at the bottom of the left side of the casing 1. It is a thin cord that can be pulled out or retracted, made of nylon, cotton thread or metal wire, with a pull ring 41 at its front end for easy pulling. The cord can be pulled out in three equal sections (or more than three sections; its length is chosen to cover the transverse distance of the information carried by the print medium being read). When the first section has been pulled out, the cord cannot be pulled further because of a tension spring arranged in the casing 1. To pull out the second section, the user relaxes the cord so that it retracts slightly; this releases the spring, and the second section can then be pulled out immediately. The third section is pulled out in the same way. Pulling the transverse positioning rope 4 out by the first section is called the first column, pulling it out by the second section is called the second column, and moving from one distance to the next is called changing columns. After use, the pull ring is released and the cord retracts automatically into the casing 1.

When a visually impaired person wants to read a book or newspaper, they hold the casing 1 in the right hand and hook the skirt or flange 31 at the front end of the longitudinal positioning rod 3 tightly against the upper edge of the book, newspaper or magazine. With the left hand they hold the pull ring 41 of the transverse positioning rope 4 against the left edge of the publication. The lens of the camera or scanning component on the bottom of the casing 1 faces downward, close to the page being read. They then press the key 2 that corresponds to the text image acquisition function and collect text image information from left to right. When the first section of the transverse positioning rope 4 has been pulled to its end, the user can tell that the casing 1 is at the position of the second column of the page. To continue reading, they change to the second section of the transverse positioning rope 4 as described above, and so on until the last section of the rope. To continue further, they change rows by extending the longitudinal positioning rod 3 to the second section, keep the skirt or flange 31 hooked tightly against the upper edge of the publication, and again collect text image information from left to right. Column changes and row changes are repeated in this way until the whole page has been read.

Throughout reading, the visually impaired user can determine the current position on the page from the row and column numbers indicated by how far the longitudinal positioning rod 3 and the transverse positioning rope 4 have been pulled out, and, if reading is resumed after an interruption, can quickly find the position reached before the interruption.
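The row and column scheme described above amounts to visiting a grid of capture positions in reading order. A minimal sketch follows, assuming three rod sections and three rope sections as in the embodiment; the function names `scan_positions` and `resume_from` are hypothetical and used only for illustration.

```python
from typing import Iterator, List, Tuple

Position = Tuple[int, int]  # (row set by the rod, column set by the rope), 1-based


def scan_positions(rows: int = 3, cols: int = 3) -> Iterator[Position]:
    """Yield capture positions in the order described: every column of a row
    from left to right, then the next row."""
    for row in range(1, rows + 1):
        for col in range(1, cols + 1):
            yield (row, col)


def resume_from(row: int, col: int, rows: int = 3, cols: int = 3) -> List[Position]:
    """Positions still to be captured when reading resumes at (row, col).

    Tuples compare lexicographically, which matches the row-major scan order.
    """
    return [pos for pos in scan_positions(rows, cols) if pos >= (row, col)]
```

Because the user can feel how many rod sections and rope sections are extended, the pair `(row, col)` is exactly the information needed to resume after an interruption; `resume_from(2, 2)` lists the positions that remain from the second row, second column onward.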

During left-to-right collection with column and row changes, the image acquisition module transfers the collected images containing text to the storage module. The central processing unit processes the images, the OCR character recognition editing module recognizes them, and the resulting TXT text is stored in the storage module.

The stored text may come from separate, independent captures or from continuous, overlapping captures; that is, when the visually impaired user collects information, the device can photograph once and store once, or photograph continuously and store in real time. When the captured item is a small receipt or similar text that fits within the camera's field of view, all the information is obtained in a single shot. If the generated text has several paragraphs, the UP or DOWN key on the casing 1 switches to the paragraph to be played, and the TTS text-to-speech module generates the playable audio file.

Example 2

As shown in FIG. 3, the control circuit of the present invention further includes an intelligent character splicing and typesetting algorithm module, which runs on the central processing unit. When the text to be captured covers a large area and the camera cannot photograph it in one shot, the device photographs continuously and stores the images in real time; after all the text image information has been photographed, the module automatically splices the collected information.

While the visually impaired user is reading, the module splices the overlapping character and image information that has been photographed continuously and stored in real time into correct and coherent character and image information.

In use, the positioning mechanism on the casing 1 is employed: in the 'reading books and newspapers' function, the PHOTO key is pressed once at each positioning step, and the text images of the whole page are collected through repeated positioning and pressing of the PHOTO key.

After the text image information carried by the book or newspaper has been collected continuously, each capture is recognized by the OCR character recognition editing module and stored separately; the text from the multiple captures is then spliced together line by line according to the overlapping characters and the semantics of each line, so that the text reads coherently.
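One simple way to splice recognized text by its overlapping part is to look for the longest suffix of the text so far that is also a prefix of the next fragment. The sketch below illustrates only that idea and is not the algorithm of the patent: it requires the overlap to match exactly, so it does not model OCR errors or the semantic check mentioned above, and the names `overlap_length`, `stitch_line` and the `min_overlap` threshold are assumptions.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int = 2) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Overlaps shorter than `min_overlap` are ignored to avoid accidental
    single-character matches (the threshold is an assumption).
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch_line(fragments: List[str], min_overlap: int = 2) -> str:
    """Join the fragments of one text line captured left to right,
    dropping the part of each fragment that repeats the previous one."""
    if not fragments:
        return ""
    merged = fragments[0]
    for fragment in fragments[1:]:
        merged += fragment[overlap_length(merged, fragment, min_overlap):]
    return merged
```

For instance, `stitch_line(["The quick bro", "ck brown fox", "n fox jumps"])` returns `"The quick brown fox jumps"`. A practical implementation would need approximate matching, since the same characters may be recognized differently in two captures.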

The intelligent character splicing and typesetting algorithm module can also realize the real-time voice reading function through the TTS character-to-voice module and the audio playing module.

The other modules, the arrangement and functions of the casing 1, and the method of using the control circuit in this embodiment are the same as in Example 1.

Depending on the area occupied by the text image information carried by print media such as books, newspapers and magazines, the invention collects, recognizes and plays the text in the following three ways:

1) Reading small-area text such as business cards and shopping receipts: after the image is photographed, the OCR character recognition editing module recognizes it and typesets the text from top to bottom; the text is then processed by the TTS text-to-speech module and output through the audio playing module.

2) Reading large-area text such as books, newspapers and magazines: the positioning mechanism is used to collect the text image information; the continuously collected, mutually overlapping captures are then spliced into correct and coherent text by the OCR character recognition editing module and the intelligent character splicing and typesetting algorithm module, processed by the TTS text-to-speech module, and output through the audio playing module.

3) Connecting to an external high-definition wide-field camera device or scanning device through the HDMI interface on the control circuit: the information carried by the whole page being read is captured in a single acquisition. The captured images are then recognized and typeset by the OCR character recognition editing module, processed by the TTS text-to-speech module, and output as audio through the audio playing module.
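The choice among the three ways can be summarized as a small decision rule. The sketch below is illustrative only: the function name `plan_stages`, the stage labels, and the idea of comparing the text area with the camera's field of view in millimetres are assumptions, since the description does not state how the device decides.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width_mm, height_mm)


def plan_stages(content: Size, field_of_view: Size, external_device: bool = False) -> List[str]:
    """Return the processing stages for one page under the three ways above."""
    fits = content[0] <= field_of_view[0] and content[1] <= field_of_view[1]
    if external_device:
        # Way 3: an external wide-field camera or scanner captures the page at once.
        return ["capture_external", "ocr", "tts", "audio"]
    if fits:
        # Way 1: a single shot covers a business card or receipt.
        return ["capture_once", "ocr", "tts", "audio"]
    # Way 2: positioned, overlapping captures must be spliced after OCR.
    return ["capture_positioned", "ocr", "splice", "tts", "audio"]
```

Only the second way includes a splicing stage, which reflects the role of the intelligent character splicing and typesetting algorithm module in Example 2.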

Claims

1. A portable character recognition voice reader, characterized in that: it comprises a casing (1) that is suitable for visually impaired people to carry and is provided with more than one key (2), and a control circuit that is arranged in the casing (1), is operated by the visually impaired user and enables the user to obtain print media information through voice, wherein the length × width × height of the casing is 100 mm × 50 mm × 40 mm; the casing (1) is further provided with a positioning mechanism required for reading the print media information, the positioning mechanism comprising a longitudinal positioning rod (3), a transverse positioning rope (4) and a tensioning device arranged in the casing (1), the longitudinal positioning rod (3) being a telescopic rod that can be pulled out or pushed back and the transverse positioning rope (4) being a thin cord that can be pulled out or retracted; and the control circuit comprises a control module, a character image acquisition module, an OCR character recognition editing module, a TTS text-to-speech module, a storage module and an audio playing module, wherein:

1) the control module receives the input of the keys (2) and calls the corresponding function module according to the input instruction; it is the coordination and scheduling center among the function modules, drives them to operate according to instructions, and manages the operation, reading and writing of related files;

5) the storage module comprises onboard SDRAM (synchronous dynamic random access memory, model MT48LC16M16A2) or NOR flash (model AM29LV160B) and an external SD card device, and the SD card uses the SPI data transmission mode;

2. The portable character recognition voice reader of claim 1, wherein: the control circuit further comprises an intelligent character splicing and typesetting algorithm module, which splices character and image information that has been collected in multiple, mutually overlapping captures into correct and coherent character and image information.

3. The portable character recognition voice reader of claim 1 or 2, wherein: the control module is a central processing unit that uses the ADI BF533 chipset produced by Analog Devices (ADI).

4. The portable character recognition voice reader of claim 1 or 2, wherein: the OCR character recognition editing module uses an OCR dynamic link library developed by Beijing Wintone.

5. The portable character recognition voice reader of claim 1 or 2, wherein: the TTS text-to-speech module uses the ejTTS6.0 speech synthesis module produced by Jietong Huasheng (SinoVoice).

6. The portable character recognition voice reader of claim 1 or 2, wherein: the character image acquisition module is connected to an external high-definition wide-field camera device or scanning device through an HDMI (High-Definition Multimedia Interface) port.

7. A method for reading characters with a portable character recognition voice reader, comprising collecting the character image information carried by a print medium, characterized in that: the method is operated independently by a visually impaired person, who obtains the content of the text carried by the print medium through voice, and the operation steps are as follows:

a first step of opening a book, setting the longitudinal positioning rod (3) of the positioning mechanism of the portable character recognition voice reader as claimed in any one of claims 2-6 to the first row, pressing the flange (31) at the front end of the longitudinal positioning rod against the upper edge of the book, pulling the pull ring (41) of the transverse positioning rope (4) with the left hand and holding it against the left edge of the book, pressing the UP or DOWN key to select the 'reading books and newspapers' function, pressing the QUERY key to start that function, holding the reader in the right hand, and photographing the first column of the opened page continuously from left to right;

a second step in which, when photographing reaches the end of the first column, the transverse positioning rope (4) cannot be pulled out any further; at this point the left hand is relaxed slightly so that the rope (4) retracts a little, the rope is then pulled out further, and the right hand holds the reader to continue photographing the second column from left to right;

a third step in which, when the right side of the page has been photographed, the longitudinal positioning rod (3) is set to the second row and the page is scanned from left to right in the same way, the column-changing and row-changing steps being repeated until the whole opened page has been photographed;