CN109377834B - Text conversion method and system for assisting blind person in reading - Google Patents


Info

Publication number
CN109377834B
CN109377834B (application CN201811129893.4A)
Authority
CN
China
Prior art keywords
reading
character
read
voice
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811129893.4A
Other languages
Chinese (zh)
Other versions
CN109377834A (en
Inventor
李宏亮 (Li Hongliang)
孙旭 (Sun Xu)
Current Assignee
Chengdu Kuaiyan Technology Co ltd
Original Assignee
Chengdu Kuaiyan Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Kuaiyan Technology Co ltd filed Critical Chengdu Kuaiyan Technology Co ltd
Priority to CN201811129893.4A priority Critical patent/CN109377834B/en
Publication of CN109377834A publication Critical patent/CN109377834A/en
Application granted granted Critical
Publication of CN109377834B publication Critical patent/CN109377834B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00: Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001: Teaching or communicating with blind persons
    • G09B21/006: Teaching or communicating with blind persons using audible presentation of the information
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition


Abstract

The invention relates to the field of reading assistance and discloses a text conversion method for assisting blind people in reading. The method comprises the following steps: training a text position detection network and a recognition network separately; locating the text to be read with the trained detection network, and guiding the blind user via a voice guidance algorithm to move the field of view so as to capture text at different positions; recognizing the text at each position with the trained recognition network, stitching the recognition results at different positions into semantically complete content with a text stitching algorithm, and converting that content into speech. Because deep learning is used for text detection and recognition, the scheme is fast and remains accurate in complex scenes. Guided by voice prompts, the system automatically stitches together the complete content of a whole page and reads it aloud, so page size is not limited and the blind user is not confused by incomplete reading information. The invention also discloses a corresponding text conversion system for assisting blind people in reading.

Description

Text conversion method and system for assisting blind person in reading
Technical Field
The invention relates to the technical field of reading assistance, and in particular to a text conversion method and a text conversion system for assisting blind people in reading.
Background
Existing printed reading materials are designed for sighted people. Because of their visual impairment, blind people cannot read them and can only obtain information and knowledge from materials translated into braille or from audio books. However, such materials are very limited in number, reading them is laborious, and illiteracy among blind people remains high; deprived of the most direct way of acquiring information, blind people become marginalized and struggle to integrate into society. With the development of computer science, many products have been designed to let blind people read printed matter like sighted people. A typical example is a finger-worn reader developed at Zhejiang University: the reader is worn on the index finger and touched directly against the text of books and other materials; a scanner automatically scans and recognizes the text it passes over and converts it into raised braille through a dot matrix, which the blind user can then read by touch to understand the corresponding text. However, such readers suffer from several problems: the user cannot see where the finger lands; a wrong movement direction produces logical errors that the user is unaware of; the finger must be repositioned constantly, which is inefficient; and the character size that can be read is severely limited, since text that cannot be touched cannot be read. These limitations greatly restrict practicality.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems above, a text conversion method and a text conversion system for assisting blind people in reading are provided.
The technical scheme adopted by the invention is as follows: a text conversion method for assisting blind people in reading comprises the following steps:
step 1, training a text position detection network and a recognition network separately;
step 2, locating the text to be read with the trained text position detection network, and guiding the blind user via a voice guidance algorithm to move the field of view so as to capture text at different positions;
step 3, recognizing the text at each position with the trained recognition network, stitching the recognition results at different positions into semantically complete content with a text stitching algorithm, and converting that content into speech.
Further, the voice guidance algorithm proceeds as follows:
A. detect text positions in the received video frame, compute text features, and obtain bounding boxes for all text-line regions in the frame;
B. post-process the detected bounding boxes to remove boxes that are misdetections or contain incomplete information;
C. based on the relative position of the bounding boxes in the video frame, guide the blind user by voice to move the reading material so that the camera's field of view falls on the upper-left corner of the page;
D. continue to guide the user by voice to move the printed material, shifting the field of view from the upper-left corner toward the right until the right edge of the page is reached, completing one scan pass;
E. based on the relative position of the detected bounding boxes during the left-to-right movement, guide the user to move the material so that the field of view falls on the upper-left corner of the unscanned region;
F. repeat steps D and E until the entire page of the material has been scanned.
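The guidance loop in steps A-F can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name and prompt strings are hypothetical, only the left/top edge checks of step C are shown, and the 50-pixel margin is the value used in the detailed embodiment.

```python
# Sketch of the voice guidance loop (steps A-F). A full system would call
# this on every frame and hand the returned string to a TTS engine.

EDGE_MARGIN = 50  # pixels; value from the detailed embodiment

def guidance_prompt(boxes):
    """Return a voice prompt steering the camera toward the page's
    upper-left corner, or None when no correction is needed.
    Each box is a list of 4 (x, y) vertices."""
    for box in boxes:
        for x, y in box:
            if x < EDGE_MARGIN:   # text clipped at the left edge of the frame
                return "move the book right"
            if y < EDGE_MARGIN:   # text clipped at the top edge of the frame
                return "move the book down"
    return None  # field of view rests on the upper-left corner
```

The same skeleton, with the margins and prompts changed, covers steps D and E of the loop.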
Further, the text stitching algorithm proceeds as follows:
a. initialize a string array for storing the stitched result; the array is empty initially;
b. feed the current video frame and its detected bounding boxes into the recognition network to obtain a multi-line recognized text result;
c. extract the first 5 characters of each recognized line to obtain the multi-line text to be matched;
d. compare the similarity of each line to be matched against the substrings of the strings already stored, one by one, to find the positions of substrings that satisfy the similarity condition;
e. append the text result of the current frame at the corresponding positions;
f. repeat steps b through e until the entire page of the material has been scanned.
The invention also discloses a text conversion system for assisting blind people in reading, comprising:
a detection network training unit and a recognition network training unit, used to train the text position detection network and the recognition network respectively;
a text information acquisition unit, used to locate the text to be read with the trained text position detection network and to guide the blind user via a voice guidance algorithm to move the field of view, thereby capturing text at different positions;
a voice conversion unit, used to recognize the text at each position with the trained recognition network, stitch the recognition results at different positions into semantically complete content with a text stitching algorithm, and convert that content into speech.
Further, the text information acquisition unit further comprises a voice guidance algorithm unit configured to:
A. detect text positions in the received video frame, compute text features, and obtain bounding boxes for all text-line regions in the frame;
B. post-process the detected bounding boxes to remove boxes that are misdetections or contain incomplete information;
C. based on the relative position of the bounding boxes in the video frame, guide the blind user by voice to move the reading material so that the camera's field of view falls on the upper-left corner of the page;
D. continue to guide the user by voice to move the printed material, shifting the field of view from the upper-left corner toward the right until the right edge of the page is reached, completing one scan pass;
E. based on the relative position of the detected bounding boxes during the left-to-right movement, guide the user to move the material so that the field of view falls on the upper-left corner of the unscanned region;
F. repeat steps D and E until the entire page of the material has been scanned.
Further, the voice conversion unit further comprises a text stitching algorithm unit configured to:
a. initialize a string array for storing the stitched result, the array being empty initially;
b. feed the current video frame and its detected bounding boxes into the recognition network to obtain a multi-line recognized text result;
c. extract the first 5 characters of each recognized line to obtain the multi-line text to be matched;
d. compare the similarity of each line to be matched against the substrings of the strings already stored, one by one, to find the positions of substrings that satisfy the similarity condition;
e. append the text result of the current frame at the corresponding positions;
f. repeat steps b through e until the entire page of the material has been scanned.
Compared with the prior art, the technical scheme has the following beneficial effects: deep learning is used for text detection and recognition, so detection and recognition are fast and remain accurate in complex scenes; guided by voice prompts, the blind user moves the page while the system automatically stitches together its complete content, so page size is not limited; the page's text is read aloud only once its semantics are complete, avoiding the confusion caused by fragmentary reading information; and the complete system can run on a highly portable mobile device.
Drawings
FIG. 1 is a flow chart of the text conversion method for assisting blind people in reading according to the invention.
FIG. 2 is a flow chart of the voice guidance algorithm of the invention.
FIG. 3 is a flow chart of the text stitching algorithm of the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
Unless expressly stated otherwise, any feature disclosed in this specification (including the accompanying drawings) may be replaced by an alternative feature serving an equivalent or similar purpose; that is, each feature is only one example of a generic series of equivalent or similar features.
Example 1
As shown in fig. 1, a text conversion method for assisting blind people in reading includes the following steps:
step 1, training a text position detection network and a recognition network separately;
step 2, locating the text to be read with the trained text position detection network, and guiding the blind user via a voice guidance algorithm to move the field of view so as to capture text at different positions;
step 3, recognizing the text at each position with the trained recognition network, stitching the recognition results at different positions into semantically complete content with a text stitching algorithm, and converting that content into speech.
On the basis of text position detection, the method of embodiment 1 uses voice interaction to guide the blind user to move the field of view and capture text at different positions, assembles the recognition results into semantically complete content with a text stitching algorithm, and then converts that content into speech. The method can run on a highly portable mobile device and maintains high recognition accuracy at low latency.
Example 2
Preferably, on the basis of embodiment 1, as shown in fig. 2, the voice guidance algorithm proceeds as follows:
A. A camera image acquisition module captures 640 x 480-pixel video frames in real time at a stable frame rate. Text position detection is performed on each 640 x 480 frame: text features are computed and bounding boxes are obtained for all text-line regions in the frame, each box described by the coordinates of its 4 vertices.
B. The detected bounding boxes are post-processed. Boxes whose short side is shorter than a threshold (20 pixels in this embodiment) are removed. In addition, the 4 vertices of each box are checked in turn; if any vertex lies within 50 pixels of the edge of the input image, the box is removed. This eliminates boxes that are misdetections or carry incomplete information.
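A sketch of this step-B post-processing under the stated thresholds (20-pixel short side, 50-pixel edge margin). Representing a box as an ordered list of 4 vertices is an assumption of this sketch:

```python
# Step-B post-processing: drop boxes whose short side is under 20 px,
# and boxes with any vertex within 50 px of the image edge.

import math

SHORT_SIDE_MIN = 20   # pixels
EDGE_MARGIN = 50      # pixels

def short_side(box):
    """Length of the shorter side of a 4-vertex box (vertices in order)."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    return min(d(box[0], box[1]), d(box[1], box[2]))

def filter_boxes(boxes, frame_w, frame_h):
    kept = []
    for box in boxes:
        if short_side(box) < SHORT_SIDE_MIN:
            continue  # misdetection: text line too thin
        near_edge = any(
            x < EDGE_MARGIN or y < EDGE_MARGIN or
            x > frame_w - EDGE_MARGIN or y > frame_h - EDGE_MARGIN
            for x, y in box
        )
        if near_edge:
            continue  # box may be clipped: information incomplete
        kept.append(box)
    return kept
```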
C. Based on the relative position of the bounding boxes in the video frame, the blind user is guided by voice to move the reading material so that the camera's field of view falls on the upper-left corner of the page. The relative position is judged from the size of the text boxes and the distances of their 4 vertices from the frame boundary. Box size is judged by height: if any box is shorter than 20 pixels, the camera is judged to be too far from the material, and the voice prompt "too far" reminds the user to move closer to the page; likewise, if any box is taller than 100 pixels, the camera is judged to be too close, and a voice prompt reminds the user to move away from the page. Once the distance is in the proper range, the distances of each box's 4 vertices from the frame boundary are checked in turn: if any vertex is within 50 pixels of the left edge of the image, the field of view is judged not to cover the left part of the page, and the voice prompt "move the book to the right" reminds the user to shift the field of view to the left side of the page. Similarly, if any vertex is within 50 pixels of the top edge, the field of view is judged not to cover the top of the page, and the voice prompt "move the book down" reminds the user to shift the field of view to the top of the page. Prompting continues in this way until the user brings the field of view to the upper-left corner of the page.
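The too-far / too-close judgment of step C can be isolated into a small helper. The 20- and 100-pixel heights are the values stated above; the prompt strings are illustrative:

```python
# Step-C distance check: text-box height indicates whether the camera is
# too far from or too close to the page.

HEIGHT_TOO_FAR = 20     # boxes shorter than this -> camera too far
HEIGHT_TOO_CLOSE = 100  # boxes taller than this -> camera too close

def distance_prompt(box_heights):
    """Return the voice prompt for the camera-to-page distance, or None
    when the distance is in the proper range."""
    if any(h < HEIGHT_TOO_FAR for h in box_heights):
        return "too far, move closer"
    if any(h > HEIGHT_TOO_CLOSE for h in box_heights):
        return "too close, move away"
    return None
```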
D. After the field of view reaches the upper-left corner of the page, the voice continues to guide the blind user to move the printed material, shifting the field of view from the upper-left corner toward the right until the right edge of the page is reached. During the movement, the distances of each bounding box's 4 vertices from the frame boundary are checked in turn; if every vertex is more than 50 pixels from the right edge of the image, the field of view is judged to have reached the right edge of the page. While moving from left to right, the text stitching algorithm is invoked to assemble the complete text of the scanned band. One scan pass is then complete.
E. Based on the relative position of the detected bounding boxes during the left-to-right movement, the blind user is guided to move the material so that the camera's field of view falls on the upper-left corner of the unscanned region. The difference from step C is that, when judging whether the field of view covers the top of the remaining page, the distance to the top edge is no longer checked for all text boxes; only the box of the last text line already read is checked. If its 4 vertices are all more than 50 pixels from the top edge of the image, the field of view is judged not to cover the top of the region, and the voice prompt "move the book up" reminds the user to shift the field of view upward. Prompting continues until the user brings the field of view to the upper-left corner of the unscanned region.
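Step E's modified top-edge test, which checks only the box of the last already-read line, might look like this sketch (function name and prompt wording are assumptions):

```python
# Step-E band advance: prompt the user to move the view up until the box
# of the previously read last line re-enters the 50 px top margin.

EDGE_MARGIN = 50  # pixels

def next_band_prompt(last_line_box):
    """last_line_box: 4 (x, y) vertices of the last text line already read.
    Return the prompt, or None once that line is near the top of the frame."""
    if all(y > EDGE_MARGIN for _, y in last_line_box):
        return "move the book up"
    return None
```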
F. Steps D and E are repeated until the entire page of the material has been scanned.
Example 3
Preferably, on the basis of embodiments 1 and 2, the text stitching algorithm proceeds as follows:
a. Initialize a string array all_text for storing the stitched result; all_text is empty initially.
b. Feed the current video frame and its detected bounding boxes into the recognition network to obtain a multi-line recognized text result, and store it in a string array text.
c. Extract the first 5 characters of each string in text to obtain the array of strings to be matched, match_text.
d. Compare each string in match_text, one by one, against every 5-character substring of each string in the result array all_text; if the similarity between a string in match_text and some substring of a string in all_text exceeds 0.7, record the position of that substring.
e. Append the strings of the current frame's text array, in order, at the corresponding positions of the result array all_text, completing one stitching step.
f. Repeat steps b through e until the entire page of the material has been scanned.
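The embodiment does not name its similarity measure; assuming a ratio such as difflib's SequenceMatcher, steps a through e can be sketched as follows, with the 5-character prefixes and 0.7 threshold taken from the text. Choosing the best-scoring window when several substrings exceed the threshold is a design choice this sketch makes, which the patent leaves open:

```python
# Sketch of the text stitching algorithm of embodiment 3.
# difflib.SequenceMatcher stands in for the unspecified similarity function.

from difflib import SequenceMatcher

PREFIX_LEN = 5
SIM_THRESHOLD = 0.7

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def stitch(all_text, frame_lines):
    """Merge one frame's recognized lines into the running result all_text,
    overlapping on matched PREFIX_LEN-character prefixes."""
    for line in frame_lines:
        prefix = line[:PREFIX_LEN]
        best = None  # (score, stored_index, offset)
        for i, stored in enumerate(all_text):
            # slide a PREFIX_LEN window over each stored string
            for j in range(len(stored) - PREFIX_LEN + 1):
                s = similarity(prefix, stored[j:j + PREFIX_LEN])
                if s > SIM_THRESHOLD and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            all_text.append(line)  # no overlap found: start a new entry
        else:
            _, i, j = best
            # splice: keep stored text up to the overlap, then the new line
            all_text[i] = all_text[i][:j] + line
    return all_text
```

Called once per frame during the left-to-right scan, this accumulates each text band into a semantically complete string.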
The invention also discloses a text conversion system for assisting blind people in reading, corresponding to the method above. The system can run on a highly portable mobile device and comprises:
a detection network training unit and a recognition network training unit, used to train the text position detection network and the recognition network respectively;
a text information acquisition unit, used to locate the text to be read with the trained text position detection network and to guide the blind user via a voice guidance algorithm to move the field of view, thereby capturing text at different positions;
a voice conversion unit, used to recognize the text at each position with the trained recognition network, stitch the recognition results at different positions into semantically complete content with a text stitching algorithm, and convert that content into speech.
The invention is not limited to the foregoing embodiments. It extends to any novel feature, or any novel combination of features, disclosed in this specification, and to any novel method or process step or combination of steps disclosed. Those skilled in the art will appreciate that insubstantial changes or modifications can be made without departing from the spirit of the invention as defined by the appended claims.

Claims (2)

1. A text conversion method for assisting blind people in reading, characterized by comprising the following steps:
step 1, training a text position detection network and a recognition network separately;
step 2, locating the text to be read with the trained text position detection network, and guiding the blind user via a voice guidance algorithm to move the field of view so as to capture text at different positions;
step 3, recognizing the text at each position with the trained recognition network, stitching the recognition results at different positions into semantically complete content with a text stitching algorithm, and converting that content into speech;
the voice guidance algorithm comprising:
A. detecting text positions in the received video frame, computing text features, and obtaining bounding boxes for all text-line regions in the frame;
B. post-processing the detected bounding boxes to remove boxes that are misdetections or contain incomplete information;
C. based on the relative position of the bounding boxes in the video frame, guiding the blind user by voice to move the reading material so that the camera's field of view falls on the upper-left corner of the page;
D. continuing to guide the user by voice to move the printed material, shifting the field of view from the upper-left corner toward the right until the right edge of the page is reached, completing one scan pass;
E. based on the relative position of the detected bounding boxes during the left-to-right movement, guiding the user to move the material so that the field of view falls on the upper-left corner of the unscanned region;
F. repeating steps D and E until the entire page of the material has been scanned;
and the text stitching algorithm comprising:
a. initializing a string array for storing the stitched result, the array being empty initially;
b. feeding the current video frame and its detected bounding boxes into the recognition network to obtain a multi-line recognized text result;
c. extracting the first 5 characters of each recognized line to obtain the multi-line text to be matched;
d. comparing the similarity of each line to be matched against the substrings of the strings already stored, one by one, to find the positions of substrings that satisfy the similarity condition;
e. appending the text result of the current frame at the corresponding positions;
f. repeating steps b through e until the entire page of the material has been scanned.
2. A text conversion system for assisting blind people in reading, comprising:
a detection network training unit and a recognition network training unit, used to train the text position detection network and the recognition network respectively;
a text information acquisition unit, used to locate the text to be read with the trained text position detection network and to guide the blind user via a voice guidance algorithm to move the field of view, thereby capturing text at different positions;
a voice conversion unit, used to recognize the text at each position with the trained recognition network, stitch the recognition results at different positions into semantically complete content with a text stitching algorithm, and convert that content into speech;
the text information acquisition unit further comprising a voice guidance algorithm unit configured to:
A. detect text positions in the received video frame, compute text features, and obtain bounding boxes for all text-line regions in the frame;
B. post-process the detected bounding boxes to remove boxes that are misdetections or contain incomplete information;
C. based on the relative position of the bounding boxes in the video frame, guide the blind user by voice to move the reading material so that the camera's field of view falls on the upper-left corner of the page;
D. continue to guide the user by voice to move the printed material, shifting the field of view from the upper-left corner toward the right until the right edge of the page is reached, completing one scan pass;
E. based on the relative position of the detected bounding boxes during the left-to-right movement, guide the user to move the material so that the field of view falls on the upper-left corner of the unscanned region;
F. repeat steps D and E until the entire page of the material has been scanned;
and the voice conversion unit further comprising a text stitching algorithm unit configured to:
a. initialize a string array for storing the stitched result, the array being empty initially;
b. feed the current video frame and its detected bounding boxes into the recognition network to obtain a multi-line recognized text result;
c. extract the first 5 characters of each recognized line to obtain the multi-line text to be matched;
d. compare the similarity of each line to be matched against the substrings of the strings already stored, one by one, to find the positions of substrings that satisfy the similarity condition;
e. append the text result of the current frame at the corresponding positions;
f. repeat steps b through e until the entire page of the material has been scanned.
CN201811129893.4A 2018-09-27 2018-09-27 Text conversion method and system for assisting blind person in reading Active CN109377834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811129893.4A CN109377834B (en) 2018-09-27 2018-09-27 Text conversion method and system for assisting blind person in reading


Publications (2)

Publication Number Publication Date
CN109377834A CN109377834A (en) 2019-02-22
CN109377834B true CN109377834B (en) 2021-02-09

Family

ID=65401921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811129893.4A Active CN109377834B (en) 2018-09-27 2018-09-27 Text conversion method and system for assisting blind person in reading

Country Status (1)

Country Link
CN (1) CN109377834B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458158B (en) * 2019-06-11 2022-02-11 中南大学 Text detection and identification method for assisting reading of blind people
CN111399638B (en) * 2020-02-29 2023-06-30 浙江工业大学 Blind computer and intelligent mobile phone auxiliary control method suitable for blind computer
CN111832567B (en) * 2020-05-22 2022-06-10 浙江大学 Blind person friendly book character reading detection interaction method
CN112446297B (en) * 2020-10-31 2024-03-26 浙江工业大学 Electronic vision aid and intelligent mobile phone text auxiliary reading method applicable to same
CN113342997B (en) * 2021-05-18 2022-11-11 成都快眼科技有限公司 Cross-image text book reading method based on text line matching

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201340677Y (en) * 2008-12-18 2009-11-04 文俊燃 Reading and writing machine for blind people
CN102509479A (en) * 2011-10-08 2012-06-20 沈沾俊 Portable character recognition voice reader and method for reading characters
CN102800225A (en) * 2011-05-27 2012-11-28 陈新明 Blind reading method and device
CN103514784A (en) * 2012-06-21 2014-01-15 国基电子(上海)有限公司 Electronic equipment and method for assisting blind person in reading
CN204303218U (en) * 2014-12-18 2015-04-29 威海华海知识产权代理有限公司 Scanning reading pen for the visually impaired
CN105842879A (en) * 2016-04-03 2016-08-10 姜泺源 Multifunctional blindman's glasses
CN106406445A (en) * 2016-09-09 2017-02-15 华南理工大学 Smart-glasses-based Chinese text reading system for the visually impaired
CN107274884A (en) * 2017-02-15 2017-10-20 赵思聪 Information acquisition method based on text parsing and speech synthesis
KR20170116775A (en) * 2016-04-12 2017-10-20 울산과학기술원 Character recognition devices for visually impaired
CN108242195A (en) * 2017-12-22 2018-07-03 昆山遥矽微电子科技有限公司 Blind reading machine capable of automatically recognizing characters

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9009078B2 (en) * 2005-08-17 2015-04-14 Kurzweil/Intellitools, Inc. Optical character recognition technique for protected viewing of digital files


Also Published As

Publication number Publication date
CN109377834A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
CN109377834B (en) Text conversion method and system for assisting blind person in reading
Mithe et al. Optical character recognition
US8014603B2 (en) System and method for characterizing handwritten or typed words in a document
KR101588890B1 (en) Method of character recongnition and translation based on camera image
US20120114245A1 (en) Online Script Independent Recognition of Handwritten Sub-Word Units and Words
Chung et al. A computationally efficient pipeline approach to full page offline handwritten text recognition
CN110245606B (en) Text recognition method, device, equipment and storage medium
CN111079641A (en) Answering content identification method, related device and readable storage medium
CN110795918B (en) Method, device and equipment for determining reading position
Thillou et al. An embedded application for degraded text recognition
US8731298B2 (en) Character recognition apparatus, character recognition method, and computer readable medium storing program
Zaman et al. Python based portable virtual text reader
KR20190121593A (en) Sign language recognition system
CN111079736B (en) Dictation content identification method and electronic equipment
Hairuman et al. OCR signage recognition with skew & slant correction for visually impaired people
CN111553356B (en) Character recognition method and device, learning device and computer readable storage medium
Chang et al. An image-based automatic Arabic translation system
JP6710893B2 (en) Electronics and programs
Somyat et al. Thai Lottery Number Reader App for Blind Lottery Ticket Sellers
NS et al. Smart Reader for Visually Impaired
JP7285018B2 (en) Program, erroneous character detection device, and erroneous character detection method
Deshmukh et al. CORTES (Characters Optically Recognized as Text and Speech)
KR102673900B1 (en) Table data extraction system and the method of thereof
US20240257549A1 (en) Information processing system, document type identification method, and model generation method
Faruque et al. Bangla optical character recognition from printed text using Tesseract Engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant