WO2021159729A1 - Image text broadcasting method and device thereof, electronic circuit and storage medium - Google Patents

Image text broadcasting method and device thereof, electronic circuit and storage medium

Info

Publication number
WO2021159729A1
Authority
WO
WIPO (PCT)
Prior art keywords
broadcast
text
data
line
storage space
Prior art date
Application number
PCT/CN2020/123195
Other languages
English (en)
French (fr)
Inventor
封宣阳
蔡海蛟
冯歆鹏
周骥
Original Assignee
上海肇观电子科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海肇观电子科技有限公司
Priority to US17/164,744 (US11776286B2)
Publication of WO2021159729A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4888Data services, e.g. news ticker for displaying teletext characters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10Transforming into visible information
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4332Content storage operation, e.g. storage operation in response to a pause request, caching operations by placing content in organized collections, e.g. local EPG data repository
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4348Demultiplexing of additional data and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems

Definitions

  • the present disclosure relates to the technical field of image processing and text broadcasting, and in particular to an image text broadcasting method and its equipment, electronic circuits and storage media.
  • an image text broadcasting method, including: receiving a designated broadcast instruction; in response to the designated broadcast instruction, determining the current broadcast progress of the broadcast data; and obtaining the next piece of broadcast data from the first text according to the current broadcast progress and the designated broadcast instruction, where the first text is composed of text data recognized and stored for the text in the text area of an image.
  • an image text broadcasting device including: a receiving device configured to receive a designated broadcasting instruction; and a broadcasting device configured to determine the current broadcasting progress of the broadcasting data in response to the designated broadcasting instruction
  • the processor is configured to obtain, according to the current broadcast progress and the designated broadcast instruction, the next piece of broadcast data from the first text for the broadcast device to broadcast, wherein the first text is composed of text data recognized and stored by the character recognition device for the text in the text area of the image.
  • an electronic circuit including: a circuit configured to perform the steps of the above-mentioned method.
  • a reading device including: the above-mentioned electronic circuit; and a circuit configured to broadcast text data.
  • an electronic device including: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the electronic device to execute the foregoing methods.
  • a non-transitory computer-readable storage medium storing a program, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to execute the foregoing methods.
  • Fig. 1 is a flowchart showing an image text broadcasting method according to an exemplary embodiment of the present disclosure
  • Figure 2 shows an exemplary image, which includes a text area containing multiple text lines
  • Figs. 3A, 3B, and 3C are diagrams illustrating a method and process of determining associated information according to another exemplary embodiment of the present disclosure
  • Fig. 4 shows a flowchart of an image text broadcasting method according to another exemplary embodiment of the present disclosure
  • FIG. 5A shows a process of preparing broadcast data according to an exemplary embodiment of the present disclosure
  • FIG. 5B shows an exemplary process of sequential broadcast according to an exemplary embodiment of the present disclosure
  • FIG. 6 is a flowchart showing an image text broadcasting method according to another exemplary embodiment of the present disclosure.
  • FIG. 7A, FIG. 7B, FIG. 7C, and FIG. 7D are diagrams illustrating a process of storing broadcast data and determining associated information according to an exemplary embodiment of the present disclosure
  • FIG. 8 is a flowchart showing preparation of designated broadcast data according to an exemplary embodiment of the present disclosure.
  • FIG. 9 is an exemplary form showing a designated broadcast instruction according to an exemplary embodiment of the present disclosure.
  • FIG. 10 is a flowchart illustrating preparation of designated broadcast data in response to a designated broadcast instruction according to an exemplary embodiment of the present disclosure
  • FIG. 11 is a flowchart illustrating preparation of designated broadcast data in response to a designated broadcast instruction according to an exemplary embodiment of the present disclosure
  • FIG. 12 is a flowchart illustrating preparation of designated broadcast data in response to a designated broadcast instruction according to an exemplary embodiment of the present disclosure
  • FIG. 13 is a flowchart illustrating preparation of designated broadcast data in response to a designated broadcast instruction according to an exemplary embodiment of the present disclosure
  • FIGS. 14A, 14B, and 14C are diagrams illustrating a method of constructing associated information for the next broadcast data and storing the associated information in a second storage space in response to a designated broadcast instruction according to an exemplary embodiment of the present disclosure
  • Fig. 15 is a flowchart showing an image text broadcasting method according to another exemplary embodiment of the present disclosure.
  • FIG. 16 is a block diagram showing an image text broadcasting device according to an exemplary embodiment of the present disclosure.
  • FIG. 17 is a block diagram showing an electronic device according to an exemplary embodiment of the present disclosure.
  • the use of first, second, etc. to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of these elements; such terms are only used to distinguish one element from another.
  • first element and the second element may refer to the same instance of the element, and in some cases, based on the description of the context, they may also refer to different instances.
  • TTS (Text To Speech) is a kind of speech synthesis application: it can convert stored text data into natural speech output, and its voice broadcast function is very convenient for users.
  • However, the TTS voice broadcast function is instant: it broadcasts immediately after detection and recognition, and moving the broadcast forward or backward is currently not supported.
  • the broadcasting supported by the present disclosure is not limited to voice broadcasting such as TTS, but can also support more types of broadcasting functions, such as vibration broadcasting for visually impaired and hearing impaired users, such as deaf-mute people.
  • FIG. 1 is a flowchart illustrating an image text broadcasting method according to an exemplary embodiment of the present disclosure.
  • a text line refers to a sequence of characters whose adjacent character spacing is less than a threshold spacing, that is, a continuous line of characters.
  • the distance between adjacent characters refers to the distance between the coordinates of the corresponding positions of adjacent characters, such as the distance between the coordinates of the upper left corner, the coordinates of the lower right corner, or the coordinates of the centroid of adjacent characters. If the distance between adjacent characters is not greater than the threshold distance, then the adjacent characters can be considered continuous, so that they are divided into the same text line. If the distance between adjacent characters is greater than the threshold distance, then the adjacent characters may be considered discontinuous (for example, they may belong to different paragraphs or belong to the left and right columns respectively), so that they are divided into different text lines.
  • the threshold spacing can be set according to the size of the text; for example, the threshold spacing for adjacent characters whose font size is larger than size four (such as size three or size two) is greater than the threshold spacing for adjacent characters whose font size is size four or smaller (such as small four or size five).
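The line-splitting rule above can be sketched in Python. The coordinates, the reference point, and the threshold value below are illustrative assumptions, not values from the disclosure:

```python
# Sketch of the text-line segmentation rule: adjacent characters whose
# spacing is at most a threshold belong to the same text line; a larger
# gap starts a new line (e.g. a new paragraph or a new column).

def split_into_text_lines(char_positions, threshold):
    """char_positions: coordinates of a reference point (e.g. the
    upper-left corner) of consecutive characters."""
    lines = [[char_positions[0]]]
    for prev, cur in zip(char_positions, char_positions[1:]):
        if cur - prev > threshold:   # discontinuous: start a new text line
            lines.append([cur])
        else:                        # continuous: same text line
            lines[-1].append(cur)
    return lines

# Characters at 0, 10, 20 are continuous; the jump to 80 starts a new line.
print(split_into_text_lines([0, 10, 20, 80, 90], threshold=30))
# → [[0, 10, 20], [80, 90]]
```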
  • the image shown in Figure 2 includes a text area containing 12 text lines (the 1st to 12th text lines), and these 12 text lines are divided into 2 paragraphs ("paragraph" herein may refer to a paragraph or a natural paragraph): the first paragraph has 5 lines, and the second paragraph has 7 lines. It is understandable that an image is not limited to only one text area; there can also be multiple text areas, and each text area in the image can be processed using the image text broadcasting method of the present disclosure.
  • the image text broadcasting method includes: step S101, performing character recognition on a text line to be recognized in the text area of the image to obtain text data; step S102, storing the text data in the first storage space as a line of data in the first text for the text area; step S103, storing the broadcast data in the third storage space; and step S104, storing in the second storage space the associated information for the broadcast data, where the associated information is used to correspond the broadcast data in the third storage space with the corresponding data in the first text in the first storage space with respect to position.
  • step S101 character recognition is performed on the text line to be recognized in the text area in the image to obtain text data.
  • one or more text regions may be included in the image.
  • Each text area may contain at least two lines of text (at least two text lines), and the contained text may be, for example, various forms of text (including various characters, numbers, etc.).
  • the image may also include pictures and the like.
  • the image may be a pre-screened image, for example, a clearer image selected after multiple shots.
  • the image may directly be an image captured by a camera, or may be an image that has undergone certain or some pre-processing based on the image captured by the camera.
  • the pre-processing may include, for example, denoising, contrast enhancement, resolution processing, grayscale processing, blur removal, etc.
  • the camera may be set on a wearable device or glasses of the user, for example.
  • the camera used to capture images can perform static or dynamic image capture. It can be an independent device (such as a camera, a video camera, etc.), or it can be included in various types of electronic equipment (such as mobile phones, computers, personal digital assistants, broadcast equipment, tablet computers, reading aids, wearable devices, etc.).
  • character recognition may be performed on a text line by, for example, an optical character recognition (OCR) method to obtain the text data of the text line.
  • text line detection may be performed after the image is acquired and before the character recognition.
  • each text line to be recognized in a text area can be sequentially detected and recognized, and the text data of the text line can be obtained.
  • character recognition can be performed on the first line of text to obtain the text data of the first line of text (["Zhaoguan", that is, "open vision".]). Then, character recognition can be performed on subsequent text lines in turn to obtain the corresponding text data.
  • step S102 the text data is stored in the first storage space as a line of data in the first text for the text area. That is, the recognized text data can be stored line by line in the first storage space, so that each line of text data in the first storage space is in a one-to-one correspondence with each text line in the text area.
  • the text data of the recognized text line may be stored in the first storage space, and during storage, may be stored as a line of data in the first text. That is, the recognized text data may be stored in rows as in the presentation form in the text area of the image. For example, when the text data of one text line in the recognized text area is stored in the first storage space, it is also stored as one line of data to facilitate subsequent processing.
  • the text data of each text line identified for the text area can be stored in the first storage space as the data of one line of the first text. Therefore, each line of data in the first text corresponds to each text line in the text area.
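Steps S101 and S102 can be sketched as follows. The `recognize` function and the image identifiers are hypothetical stand-ins, since the disclosure does not fix an OCR interface:

```python
# Minimal sketch of steps S101-S102: each recognized text line is appended
# as one row of the "first text", so row i of the first text corresponds
# to text line i of the text area in the image.

def build_first_text(text_line_images, recognize):
    first_text = []                      # first storage space
    for line_image in text_line_images:  # step S101: character recognition
        text_data = recognize(line_image)
        first_text.append(text_data)     # step S102: store as one line of data
    return first_text

# With a stand-in recognizer, three line "images" yield three rows.
fake_ocr = {"img1": "Zhaoguan", "img2": "that is", "img3": "open vision"}
print(build_first_text(["img1", "img2", "img3"], fake_ocr.get))
# → ['Zhaoguan', 'that is', 'open vision']
```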
  • steps S101 and S102 are described separately as different steps; in other examples, steps S101 and S102 may also be combined into one step.
  • step S103 the broadcast data is stored in the third storage space.
  • the third storage space is used to store broadcast data, where the broadcast data in the third storage space may be broadcast data for sequential broadcasts, or broadcast data for designated broadcasts.
  • the broadcast data in the third storage space does not necessarily need to be stored in its original line form; that is, it may or may not be stored line by line, which is not limited in the present disclosure.
  • each piece of broadcast data can be stored for a longer period of time, and one piece of broadcast data can continue to be retained even if it has been broadcast, so as to maintain the integrity of the broadcast data during the entire broadcast period. It is also possible to remove or cover the broadcast data after the broadcast is completed to save storage space.
  • step S104 the associated information for the broadcast data is stored in the second storage space.
  • the association information is used to correlate the broadcast data in the third storage space with corresponding data in the first text in the first storage space with respect to positions.
  • the broadcast data corresponds to the corresponding data in the first text with respect to the position, which may mean that the broadcast data and the corresponding data in the first text have a corresponding relationship in position.
  • the position of the corresponding data in the first text is the position of the broadcast data in the first text.
  • the broadcast data and the corresponding data may be different, but through the associated information, the positional correspondence between them can be established, thereby facilitating data management and retrieval.
  • the broadcast data in the third storage space is made to correspond in position to the corresponding data in the first text in the first storage space, making it possible to realize the forward and backward functions of TTS broadcast. That is, through the above steps S101 to S104, sufficient preparation is made to support the forward and backward functions of TTS broadcast.
  • FIG. 1 exemplarily shows the above-mentioned storage steps
  • the present disclosure is not limited to the execution order of the steps shown in the drawings of the specification, especially when subsequent broadcasts are involved, because storage and broadcast are operations performed in parallel; in many cases there is no required time sequence or specific execution order, and the steps can be handled flexibly according to the actual situation.
  • the above-mentioned first, second, and third storage spaces are storage areas named to distinguish the storage of different data.
  • these storage spaces can be located in the same storage device (for example, a memory), or each can be located in a different storage device, or two of the storage spaces can be located in the same storage device while the other storage space is separately located in another storage device.
  • the associated information for the broadcast data may at least include:
  • the cutoff proportion of each row of data in the corresponding data, which is determined as the ratio of the cumulative number of characters from the start row of the corresponding data through that row of data to the total number of characters in the entire corresponding data.
  • the position of each row of data in the corresponding data in the first text may include the row number, segment number, etc. of the row of data.
  • a value such as "00*" can be used to represent the row number of a row of data, for example, "002" can indicate that the row of data is in the second row in the first text.
  • the segment number can also be expressed similarly.
  • other methods for indicating the position are also included, which will not be repeated here.
  • in an example, the broadcast data is the same as the corresponding data, and the corresponding data and the broadcast data contain the text shown in FIG. 3A (the text data of the 2nd, 3rd, and 4th text lines in the text area are used as the data of the 1st, 2nd, and 3rd lines of the broadcast data, respectively).
  • the number of characters can be used to represent the length of a line of text. Assuming that 1 Chinese character counts as 2 characters, and 1 English letter, 1 number, or 1 punctuation mark each counts as 1 character, the number of characters in a row of data can be determined, for example, by the following method:
  • the number of characters in a line of data = the number of Chinese characters in the line × 2 + the number of English letters in the line + the number of numeric symbols in the line + the number of punctuation marks in the line.
  • take the text data in the first line in Fig. 3A (["Zhaoguan", that is, "open vision".]) as an example of this calculation.
  • the cut-off proportion of the last line is generally 100%, so the cut-off proportion of the last line may not be calculated but directly determined as 100%.
  • the obtained cut-off proportions can be stored in the second storage space together with the corresponding positions (002, 003, 004), so that the associated information for the broadcast data stored in the second storage space can be as shown in Fig. 3B.
  • the method of using the number of characters to calculate the cut-off proportion is exemplified above.
  • the cut-off proportion of each row of data can also be determined based on other parameters, which will not be repeated here.
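The character-counting rule and the cut-off proportions above can be sketched as follows. The Unicode east-asian-width property is used here as a stand-in test for "is a Chinese character", and the sample rows are invented; both are assumptions, not part of the disclosure:

```python
import unicodedata

def char_count(line):
    """Character count per the rule above: a Chinese character counts
    as 2, while a letter, digit, or punctuation mark counts as 1."""
    total = 0
    for ch in line:
        # Wide/fullwidth (CJK) characters count double; others count once.
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total

def cutoff_proportions(rows):
    """Cumulative character count through each row divided by the total,
    as a percentage; the last row's proportion is therefore 100%."""
    counts = [char_count(r) for r in rows]
    total = sum(counts)
    running, result = 0, []
    for c in counts:
        running += c
        result.append(round(100 * running / total))
    return result

rows = ["你好ab", "cd,", "你好你好"]   # hypothetical rows of corresponding data
print(cutoff_proportions(rows))
# → [35, 53, 100]
```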
  • the corresponding data corresponding to the location of the broadcast data currently being broadcast may also be stored in the second storage space.
  • when the broadcast data is the same as the corresponding data, as shown in FIG. 3C, not only can the correspondence between the two be established more easily, but the specific information of the broadcast data can also be accurately obtained.
  • the broadcast data stored in the third storage space may be broadcast data that supports sequential broadcast (for example, line-by-line broadcast); it may also be broadcast data that supports designated broadcast (for example, broadcast supporting the aforementioned forward and backward functions), that is, broadcast data organized based on the broadcast location required for the designated broadcast.
  • the organization and storage of the broadcast data can be performed as needed, which will be described in detail with reference to FIG. 5A later.
  • Fig. 4 shows a flowchart of an image text broadcasting method supporting designated broadcasting according to another exemplary embodiment of the present disclosure.
  • step S1 a designated broadcast instruction is received.
  • the designated broadcast instruction is used to indicate that the user wants the broadcast device to broadcast designated data, for example, a designated text unit (such as the previous few lines or the next few lines).
  • step S2 in response to the received designated broadcast instruction, the current broadcast progress of the broadcast data is determined.
  • when the user wants the broadcast device to perform a designated broadcast (for example, broadcasting the previous line), the broadcast device may be in the middle of broadcasting; therefore, in order to determine the location of the designated data desired by the user, it is necessary to determine the current broadcast progress of the broadcast device.
  • step S3 the next piece of broadcast data is obtained from the first text according to the current broadcast progress and the designated broadcast instruction.
  • according to the current broadcast progress and the designated broadcast instruction, the starting position of the next broadcast data can be determined; therefore, the broadcast data that the user wants, that is, the next piece of broadcast data, can be obtained from the first text used to store the text data.
  • the first text may be composed of text data recognized and stored for the text in the text area of the image.
  • the present disclosure can realize support for designated broadcasts.
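A minimal sketch of steps S1 to S3, assuming the first text is a list of recognized lines, the broadcast progress is the index of the line currently being broadcast, and the instruction names are illustrative (the disclosure does not fix any of these):

```python
# Resolve a designated broadcast instruction against the current progress
# to find the next piece of broadcast data in the first text.

def next_broadcast_data(first_text, current_line, instruction):
    if instruction == "prev_line":
        start = max(current_line - 1, 0)                  # clamp at the top
    elif instruction == "next_line":
        start = min(current_line + 1, len(first_text) - 1)  # clamp at the end
    else:
        raise ValueError("unknown designated broadcast instruction")
    # The next piece of broadcast data is taken from the first text
    # beginning at the designated line.
    return start, first_text[start]

first_text = ["line 1", "line 2", "line 3"]
print(next_broadcast_data(first_text, 2, "prev_line"))
# → (1, 'line 2')
```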
  • the storing broadcast data in the third storage space in step S103 may include steps S1031 and S1032.
  • in step S1031, in the sequential broadcast mode, while the current broadcast data is stored in the third storage space, the text data of a newly recognized text line is stored in the third storage space as at least a part of the next piece of broadcast data.
  • This step involves the storage of broadcast data for sequential broadcast.
  • the "current broadcast data” mentioned here is the piece of broadcast data immediately before the "next broadcast data”.
  • Sequential broadcasting includes broadcasting by row.
  • the recognized text data is stored in the first storage space to form the first text.
  • each newly recognized and stored piece of text data in the first text can also be stored in the third storage space, where the broadcast device can obtain it by itself. Since the broadcast speed is usually slower than the character recognition speed, the text data of at least one text line can be obtained as one piece of broadcast data each time; after the previous piece of broadcast data has been broadcast or is about to finish, the broadcast device continues to obtain broadcast data from the third storage space, so the data can be obtained and broadcast in a timely manner.
  • the text data of each newly recognized text line may first be stored in the first text, and then obtained in batches from the first text as needed and stored in the third storage space; each batch of text data in the third storage space can be used as a piece of broadcast data for broadcast.
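The batching described above can be sketched as follows. The data structures are illustrative assumptions, since the disclosure does not prescribe a concrete layout for the storage spaces:

```python
from collections import deque

# Recognition appends lines to the first text as they arrive, while the
# (slower) broadcaster takes all currently unbroadcast lines as one batch,
# i.e. one piece of broadcast data.

first_text = []            # first storage space: one entry per text line
unbroadcast = deque()      # indices of recognized but not-yet-broadcast lines

def on_line_recognized(text):
    first_text.append(text)
    unbroadcast.append(len(first_text) - 1)

def fetch_next_broadcast_data():
    """Take every pending line as one batch (one piece of broadcast data)."""
    batch = [first_text[i] for i in unbroadcast]
    unbroadcast.clear()
    return batch

for t in ["line 1", "line 2", "line 3"]:
    on_line_recognized(t)
print(fetch_next_broadcast_data())   # all three pending lines form one piece
# → ['line 1', 'line 2', 'line 3']
```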
  • the associated information for the next broadcast data stored in the second storage space is constructed and/or updated.
• each time new text data is stored, the stored broadcast data changes. As can be seen from the above, the newly stored text data is used as part or all of the next broadcast data, so it does not affect the current broadcast data but affects the next broadcast data.
• the broadcast status of the broadcast device is usually determined by the progress of the broadcast, for example, whether the current broadcast data has been completely broadcast, when to broadcast the next broadcast data, and so on. Therefore, for the broadcast data stored in the third storage space, if the broadcast device actively initiates a next-broadcast-data request and, in response to that request, the next broadcast data is prepared and obtained from the first text, then all of the recognized but not yet broadcast text data can be stored in the third storage space as the next broadcast data. Correspondingly, the associated information for the next broadcast data can be constructed once, with no need to update it afterwards; that is, the storage of the next broadcast data and the construction of its associated information need to be executed only once, which saves considerable processing resources. However, this approach may result in a slower broadcast speed, because the broadcast must wait for the next broadcast data to be obtained and stored, and for its associated information to be constructed, on the spot.
  • association information has been described with reference to FIGS. 3A, 3B, and 3C, and the update of the association information will be illustrated later with reference to FIGS. 7A, 7B, 7C, and 7D.
• the text data obtained by performing character recognition on the first text line to be recognized in the text area can be used alone as one piece of broadcast data, so that the text data of the first text line can be broadcast quickly, thereby improving the response speed of the broadcast and the user experience.
  • the first text line of the text area is stored separately as a piece of broadcast data.
• the first text line to be recognized may be the first line of the entire text area, or it may not be the first line of the entire text area but rather the first text line to be recognized in a part of the text area (some of the lines of the whole area).
• the image text broadcasting method may further include step S110 of judging whether there is a next text line to be recognized in the text area. If there is, the method returns to step S101 to perform the character recognition operation of step S101 on the next text line to be recognized, and then continues with the operation of step S102, looping in this way until all the text lines to be recognized in the text area have been recognized and stored in the first storage space. If there is no next text line to be recognized, the character recognition for the text area can end.
  • Step S110 may be executed after step S102, or may be executed after step S103 or step S104.
  • step S102 may be executed after step S103 or step S104.
  • Fig. 5B shows an example of the process of sequential identification, storage, and sequential broadcasting.
• the present disclosure can, for example, broadcast the currently stored text data in the order of storage while recognizing and storing the text data in order, thereby realizing sequential broadcast of the characters in the text area. For example, if the text in a text area is arranged in rows, the text data can be detected and recognized row by row, each recognized row of text data can be stored in sequence, and the currently stored text data can be broadcast in the order of storage.
• in FIG. 5B, unlike FIG. 2, it is assumed that the text area has only the five text lines shown.
  • the five text lines are recognized line by line, and the obtained text data is stored in the first storage space line by line to form each line of data of the first text in turn.
  • the broadcast data (three pieces of broadcast data are shown in FIG. 5B) are stored in the third storage space for sequential broadcast.
• FIG. 5B does not show the second storage space and the associated information stored therein, because the second storage space and the associated information therein are used to establish a positional correspondence between the first storage space and the third storage space and are not directly used for broadcast.
  • the first broadcast data can be broadcast immediately.
• meanwhile, the subsequent text lines are recognized and stored, thereby realizing the beneficial technical effect of recognizing and storing while broadcasting.
• after the broadcast of the text data of the first text line is completed, the second broadcast data continues to be broadcast. Assuming that the second and third lines of text are recognized and stored while the text data of the first text line is being broadcast, the second broadcast data contains the text data of the second and third text lines.
• the broadcast of the two pieces of broadcast data has semantic cohesion and context, which overcomes the rigid, mechanical gaps or stalls that occur in the prior art when broadcasting word by word or line by line.
  • the subsequent text lines are still recognized and stored, thereby achieving the beneficial technical effect of recognizing and storing while broadcasting.
• a method of splicing and storing text data is adopted to facilitate multi-line broadcasting. Because recognition and storage are faster than broadcasting, multiple lines may have been recognized and stored during the broadcast of one line. By recognizing and storing while broadcasting, the stored text data is always sufficient for the broadcasting device, and there is no need to wait for all text lines to be recognized and stored before broadcasting as in the prior art. The broadcast waiting time can therefore be greatly reduced, the broadcast speed and efficiency improved, and a more coherent and smooth broadcast achieved.
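• under the assumptions of the FIG. 5B example (five text lines; the first line is broadcast alone for a fast first response, and two further lines are recognized while each piece is being broadcast), the grouping into pieces can be sketched as follows; the function name and the fixed recognition rate are illustrative assumptions:

```python
def plan_broadcast_pieces(lines, recognized_per_piece=2):
    """Group text lines into broadcast pieces: the first line alone for a fast
    first response, then each subsequent piece splices the lines recognized
    while the previous piece was being broadcast."""
    if not lines:
        return []
    pieces = [[lines[0]]]
    i = 1
    while i < len(lines):
        pieces.append(lines[i:i + recognized_per_piece])
        i += recognized_per_piece
    return pieces
```

with five lines this yields three pieces, matching the three pieces of broadcast data shown in FIG. 5B.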
• the method and related equipment of the present disclosure can help, for example, visually impaired users, young or elderly users, and dyslexic users to more easily obtain and understand the information automatically broadcast from the text area by, for example, a reading aid device.
  • each piece of broadcast data can be removed or covered after the broadcast is completed, or it can be retained for a longer period of time.
• storing batches of text data (at least one line of data) in the third storage space as broadcast data can overcome problems in the prior art caused by word-by-word or single-line broadcast, such as a lack of semantic cohesion, incoherence, and excessive pauses. Because the batched broadcast data (splicing at least one line of text data) provides more semantic connection, the broadcast is more coherent and smooth, stalling is greatly reduced, and the broadcast effect is improved.
• the following illustrates, with reference to FIG. 7A, FIG. 7B, FIG. 7C, and FIG. 7D and specific examples, how the associated information is correspondingly stored in the second storage space in the case where the text data of recognized text lines is sequentially stored into the third storage space.
• in the case of batch storage, the corresponding associated information can be calculated at one time without multiple updates, which is relatively simpler in calculation and processing. Therefore, no separate example is given to further illustrate the calculation of the associated information in the case of batch storage into the third storage space.
  • the associated information is stored in the second storage space.
  • the corresponding data corresponding to the broadcast data may be (but not necessarily) stored in the second storage space.
• the broadcast data in this example is the same as the corresponding data, so for a more intuitive understanding, the calculation and update of the cut-off proportion are described directly with respect to the broadcast data (although the broadcast data is not necessarily stored in rows, because the broadcast data here is the same as the corresponding data, one line of data in the corresponding data can be regarded as "one line of data" in the broadcast data).
• the respective text data (which constitute a part or all of the corresponding data) are shown in Fig. 7A to Fig. 7D.
  • related information such as location and cut-off proportion may also be stored in the first storage space to facilitate information collection and management.
• the associated information related to the first text stored in the first storage space is not necessarily the same as the associated information stored in the second storage space, because the associated information in the second storage space is often calculated specifically for part of the data in the first text. If the associated information were stored in the first storage space, it would often need to be calculated for the first text that is gradually recognized over the entire text area and finally composed of the text data of all text lines (or be calculated and updated in real time as text lines are recognized, finally yielding the associated information for the entire first text).
  • the associated information is not stored in the first storage space.
• the text data obtained by recognizing the first line of text in the text area (storing it is optional) and the associated information including the position (the line number in this example) and the cut-off proportion are stored in the second storage space,
  • the stored information is for example as follows:
• the cut-off proportion can be directly determined as 100%; in other words, the cut-off proportion can be determined directly, without calculation.
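• the cut-off proportion described above can be computed as each line's cumulative character share within its piece of broadcast data; as noted, the last line can simply be assigned 100% without calculation. A minimal sketch (names illustrative, not from the disclosure):

```python
def cutoff_proportions(piece_lines):
    """Cumulative character share (percent) of each line within one piece of
    broadcast data; the last line's cut-off proportion is directly 100%."""
    total = sum(len(line) for line in piece_lines)
    proportions, cumulative = [], 0
    for line in piece_lines:
        cumulative += len(line)
        proportions.append(round(100 * cumulative / total))
    proportions[-1] = 100  # assigned directly, no calculation needed
    return proportions
```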
  • the text data of this line can be separately stored as a broadcast data, which can improve the response speed of the broadcast.
• alternatively, it can be stored together with the text data of subsequent text lines as one piece of broadcast data.
• in step S110, it can be determined whether there is a next text line to be recognized.
• if the next text line to be recognized is, for example, the second text line, the method returns to step S101 to continue recognizing the second line of text, as shown in FIG. 7B, and the recognized text data and associated information are stored in the first storage space, the second storage space, and the third storage space, respectively.
  • the text data and associated information of the second line of characters stored in the second storage space may be:
• since the first broadcast data, corresponding to the first line of text, is being broadcast, the second broadcast data is now being prepared; so far, it contains the text data of the second line of text.
• in step S110, it is determined whether there is a next text line to be recognized.
• if the next text line to be recognized is, for example, the third text line, the method returns to step S101 to continue recognizing the third line of text, and the recognized text data and associated information are stored into the first storage space, the second storage space, and the third storage space, respectively.
• at this time, the second broadcast data contains the text data of the second and third text lines, so the associated information of the second broadcast data now needs to be updated.
  • the cut-off proportion of the text in the third line is 100%.
  • the text data and associated information for the next broadcast data stored in the second storage space are as follows:
• in step S110, it is determined whether there is a next text line to be recognized.
• if the next text line to be recognized is, for example, the fourth text line, the method returns to step S101 to continue recognizing the fourth line of text, and the recognized text data and associated information are stored into the first, second, and third storage spaces.
• accordingly, the cut-off proportion of the text in the third line changes.
  • the text data and associated information for the next broadcast data stored in the second storage space are as follows:
  • the text data of the second, third, and fourth lines can be used as the second broadcast data to continue the broadcast (sequential broadcast).
• in the process of sequentially broadcasting the second broadcast data, it can likewise be judged whether there is a next text line to be recognized. If it is determined that the next text line to be recognized is, for example, the fifth text line, preparation of the next broadcast data (the third broadcast data) continues.
• the cut-off proportion of the next broadcast data can also be calculated in real time in response to the broadcast device initiating acquisition of the next broadcast data. In this way, it needs to be calculated only once, rather than being updated each time the text data of a text line is stored.
• the reason for calculating and/or updating the associated information during sequential broadcasting is that a designated broadcast operation initiated by the user is often unpredictable and may occur at any time during sequential broadcasting, or at any time during a designated broadcast. Therefore, the associated information necessary for identifying the current broadcast position can be kept ready at all times during broadcasting.
  • the above describes the example situation where the image contains one text area.
• the above recognition and storage operations can be performed for each text area, until all the text lines in the text area, or those text lines of interest, are recognized and stored.
  • the text data of the multiple text regions can be stored together, or they can be stored separately, which does not affect the essence of the present disclosure.
• through the above operations, the present disclosure supports designated broadcast for, for example, TTS broadcast. There is no need to wait until all text lines are recognized and stored before starting the broadcast, as in the prior art; instead, recognition and storage can proceed while broadcasting. Broadcasting does not affect the recognition and storage of text data, which greatly improves the broadcast speed and realizes efficient, fast broadcasting. Moreover, by supporting designated broadcast without re-recognizing and re-storing the text, considerable processing time and processing resources are saved (no re-recognition is needed at all), further improving the broadcast speed.
  • step S1032 in response to receiving a designated broadcast instruction, the next broadcast data is obtained from the first text (combination of steps S2 and S3).
• this step involves obtaining the broadcast data for a designated broadcast.
• for users such as visually impaired and hearing impaired users, the present disclosure can support this function, which is referred to as the "designated broadcast" function in the present disclosure.
  • the designated broadcast may not only include forward and backward broadcasts, but may also include, for example, a broadcast at a location designated by the user.
  • the broadcast data (designated broadcast data) can be obtained from the first text.
  • each line of data in the first text corresponds to each text line in the text area.
• the first text will eventually contain the text data of the entire text area. Therefore, both the sequential broadcast data and the designated broadcast data for the text area can be obtained from the first text.
• in addition, as described above, during sequential broadcast the newly recognized text data can be stored directly in the third storage space, rather than being obtained from the first text.
  • the current broadcast progress may be determined by the ratio of the number of characters that have been broadcast to the number of characters of the broadcast data.
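• as a sketch of this progress measure (illustrative names, not from the disclosure):

```python
def broadcast_progress(chars_broadcast, total_chars):
    """Current broadcast progress: characters already broadcast as a
    percentage of the total characters of the current broadcast data."""
    return round(100 * chars_broadcast / total_chars)
```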
  • obtaining broadcast data from the first text may include steps S10301, S10302, S10303, and S10304, as shown in FIG. 8.
  • step S10301 in response to receiving a designated broadcast instruction, determine the current broadcast position in the broadcast data as the current broadcast progress.
  • users may need to re-listen to the previous content, such as re-listening to the previous paragraph or the previous line of the current broadcast position.
  • a designated broadcast instruction is received during the broadcast process.
  • the current broadcast progress can be determined by the current broadcast position in the broadcast data.
  • step S10302 a position in the first text corresponding to the current broadcasting progress is determined as the current broadcasting position based on the current broadcasting progress and the associated information for the broadcasting data in the second storage space.
• the associated information for the broadcast data in the second storage space, such as the position of each row of the broadcast data in the first text and the cut-off proportion of that row of data, as shown in FIG. 3B, may be combined to obtain the current broadcast position.
• in step S10302, determining, as the current broadcast position, the position in the first text corresponding to the current broadcast progress based on the current broadcast progress and the associated information for the broadcast data in the second storage space may include the following.
  • the current broadcast progress can be compared with the cut-off proportion of the broadcast data stored in the second storage space.
  • the broadcast progress obtained above is 49%
  • the cut-off proportions of each row of data in the associated information about the piece of broadcast data stored in the second storage space are: 32%, 69%, and 100%, respectively.
• by searching the associated information, it can be found that the row of data whose cut-off proportion is 69% is the third line of the first text. It can be seen that the current broadcast position is the third line of the text area corresponding to the first text.
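• the lookup in this example (progress 49% against cut-off proportions 32%, 69%, and 100%) amounts to finding the first row whose cut-off proportion has not yet been passed; a minimal sketch, where `line_numbers` holds each row's line number in the first text (here assumed to be lines 2, 3, and 4, so that the 69% row is the third line):

```python
def locate_current_line(progress, cutoffs, line_numbers):
    """Return the first-text line number of the row currently being broadcast:
    the first row whose cut-off proportion is >= the broadcast progress."""
    for cutoff, line_no in zip(cutoffs, line_numbers):
        if progress <= cutoff:
            return line_no
    return line_numbers[-1]
```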
  • the designated broadcast instruction may include a designated broadcast request and a designated broadcast type, as shown in FIG. 9.
• the specified broadcast type may include broadcasting the previous line, broadcasting the next line, broadcasting the previous paragraph, broadcasting the next paragraph, and may even include broadcasting the previous sentence, broadcasting the next sentence, or broadcasting a specified segment.
• since what is obtained in step S10302 is only the current broadcast position, to prepare broadcast data for the designated broadcast, the starting position of the designated broadcast, that is, the position to be broadcast, needs to be known.
  • step S10303 a position to be broadcast is determined in the first text based on the current broadcast position and the specified broadcast type in the specified broadcast instruction.
• in step S10304, using the position to be broadcast as the starting position, the next piece of broadcast data is obtained from the first text and stored in the third storage space, and the associated information for the next broadcast data is correspondingly stored in the second storage space.
  • the broadcast data starting from the position to be broadcast (that is, the next broadcast data) can be prepared for the designated broadcast.
• the next broadcast data can include the data from the position to be broadcast to the last line of the first text, or the data from the position to be broadcast to the last line of the paragraph in which it is located, or the first several rows of data in the first text starting from the position to be broadcast, numbering fewer than a certain threshold number of rows. For example, if the threshold number of rows is 4, then 3 rows of data can be prepared, these 3 rows being the 3 rows starting from the row where the position to be broadcast is located. Assuming the position to be broadcast is the second line and the threshold number of lines is 4, it can be determined that the next broadcast data to be prepared is the 3 lines of data starting from the second line, that is, the data of the second, third, and fourth rows in the first text.
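• the threshold example above (threshold of 4, position to be broadcast at line 2, so the data of lines 2 to 4 is prepared) can be sketched as follows (illustrative names, 1-indexed lines):

```python
def next_piece_by_threshold(first_text, start_line, threshold_rows):
    """Prepare fewer than `threshold_rows` lines as the next broadcast data,
    starting from the 1-indexed line where the position to be broadcast lies."""
    count = threshold_rows - 1  # "fewer than the threshold number of rows"
    return first_text[start_line - 1:start_line - 1 + count]
```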
• after the next piece of broadcast data is obtained from the first text, it is stored in the third storage space to replace the previous broadcast data for broadcast, thereby supporting the designated broadcast.
• the next broadcast data can also be stored in the second storage space to replace the previous broadcast data, as shown in FIG. 3C, to facilitate obtaining accurate information about the current broadcast data.
• the associated information for the next broadcast data is stored in the second storage space, that is, the position and cut-off proportion information for the next broadcast data, as shown in FIG. 3B.
• the newly stored associated information of the next piece of broadcast data may replace the previous associated information of the broadcast data, or may be stored additively in the second storage space. If stored additively, the status of the associated information of the previous broadcast data needs to be changed; for example, the status of the associated information can be clarified by setting a status identifier.
• for example, the status identifier "00" can indicate that the broadcast data in the third storage space corresponding to the associated information is in a "to-be-broadcast" state, the status identifier "01" can indicate that its state is "broadcasting", the status identifier "10" can indicate that its state is "broadcast finished", and so on.
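• the status identifiers described above could be modeled as follows; this is only a sketch, and the record layout is an assumption for illustration:

```python
# Status identifiers for associated-information records, as described above.
STATUS_TO_BE_BROADCAST = "00"
STATUS_BROADCASTING = "01"
STATUS_BROADCAST_DONE = "10"

def mark_superseded(records):
    """When new associated information is stored additively, mark every
    earlier record's broadcast data as already broadcast."""
    for record in records[:-1]:
        record["status"] = STATUS_BROADCAST_DONE
    return records
```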
• through the above operations, the present disclosure can support designated broadcast functions such as TTS broadcast, especially for visually impaired or hearing impaired users, improving the user's reading experience.
  • the specified broadcasting type in the specified broadcasting instruction may include broadcasting adjacent text units.
  • the adjacent text unit is a text unit adjacent to the text unit where the current text line is located.
• one text unit can be one text line or one text segment, or several text lines or several text segments. Therefore, the designated broadcast operation that the present disclosure can support may include broadcasting the text unit adjacent to the text unit where the current text line is located, and the adjacent text unit can accordingly be one line or one paragraph, or several lines or several paragraphs. That is to say, the present disclosure can support designated broadcast at the granularity of text units.
• the adjacent text unit may include several lines immediately before the currently broadcast text line, several lines immediately after it, several paragraphs immediately before the paragraph where the currently broadcast text line is located, or several paragraphs immediately after that paragraph.
• in the case where the adjacent text unit to be broadcast includes the line before the current broadcast, and the position in the first text of each line of data in the broadcast data stored in the second storage space includes the line number of the text line corresponding to that data,
• in step S103, in response to receiving the designated broadcast instruction, obtaining the next broadcast data from the first text (corresponding to step S1032 in step S103) may include the following steps.
  • step S10311 in response to receiving a designated broadcast instruction, determine the current broadcast position in the broadcast data as the current broadcast progress.
  • This step is similar to the aforementioned step S10301, and will not be repeated here.
• in step S10312, based on the current broadcast progress and the associated information for the broadcast data stored in the second storage space, the line number, in the first text, of the text line corresponding to the row of the broadcast data that corresponds to the current broadcast progress is determined as the current broadcast line number.
• this step is used to determine the current broadcast line number as the current broadcast position. Similar to step S10302, by comparing the current broadcast progress with the line numbers in the associated information recorded in the second storage space (as described above, the position in the associated information includes the line number), the current broadcast line number can be determined.
• in step S10313, based on the specified broadcast type of broadcasting the previous line, the current broadcast line number minus 1 is used as the line number to be broadcast.
  • the line number to be broadcast is the current broadcast line number minus one.
  • step S10314 the line where the line number to be broadcast is located in the first text is used as the starting position, and at least one line of text data is acquired as the next broadcast data.
• for example, if the specified broadcast type in the designated broadcast instruction is to broadcast the previous line, and the current broadcast line number serving as the current broadcast position is the third line in the first text (that is, the third text line in the text area), then the starting position of the broadcast should be the second line of the first text, and several lines of data starting from the second line can be obtained as the next broadcast data. The specific number of rows of data that the next broadcast data can include has been described in detail above and is not repeated here.
• the present disclosure can thus support specifying the previous line to be broadcast, which overcomes the defect in the prior art that broadcasts such as TTS broadcasts cannot support moving forward and backward.
  • obtaining the next broadcast data from the first text may include the following steps.
  • step S10321 in response to receiving a designated broadcast instruction, determine the current broadcast position in the broadcast data as the current broadcast progress.
• in step S10322, based on the current broadcast progress and the associated information for the broadcast data stored in the second storage space, the line number, in the first text, of the text line corresponding to the row of the broadcast data that corresponds to the current broadcast progress is determined as the current broadcast line number.
• in step S10323, based on the specified broadcast type of broadcasting the next line, the current broadcast line number plus 1 is used as the line number to be broadcast.
  • step S10324 the line where the line number to be broadcast is located in the first text is used as the starting position, and at least one line of text data is acquired as the next broadcast data.
• for example, if the specified broadcast type in the designated broadcast instruction is to broadcast the next line, and the current broadcast line number serving as the current broadcast position is the third line in the first text (that is, the third text line in the text area), then the starting position of the broadcast should be the fourth line of the first text, and several lines of data starting from the fourth line can be obtained as the next broadcast data.
• the present disclosure can thus support specifying the next line to be broadcast, which overcomes the defect in the prior art that broadcasts such as TTS broadcasts cannot support moving forward and backward.
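• steps S10311 to S10314 and S10321 to S10324 differ only in the offset applied to the current broadcast line number; a combined sketch (illustrative names, 1-indexed lines, with a clamp added so the offset stays within the first text):

```python
def prepare_adjacent_line(first_text, current_line, direction, max_lines=3):
    """Designated broadcast of the previous/next line: offset the current
    broadcast line number by -1 or +1 to get the line to be broadcast, then
    take up to `max_lines` lines from there as the next broadcast data."""
    target = current_line - 1 if direction == "previous" else current_line + 1
    target = max(1, min(target, len(first_text)))  # keep within valid lines
    return target, first_text[target - 1:target - 1 + max_lines]
```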
• in the case where designated broadcast by segment is not involved, it is not necessary to consider whether each row of data in the broadcast data belongs to the same segment when preparing the next broadcast data, which maintains the continuity and fluency of the broadcast.
  • the amount or length of the next broadcast data can also be prepared according to actual needs, and the present disclosure does not need to limit this.
• in the case where the adjacent text unit to be broadcast includes the previous paragraph, and the position in the first text of each line of data in the broadcast data stored in the second storage space includes the segment number of the text line corresponding to that data,
• obtaining the next broadcast data from the first text may include steps S10331 to S10334.
• characters such as "[00*]" can be used to represent the segment number.
• "[001]" can represent the first paragraph of the text area.
• other ways of representing the segment number, such as "#00*", can also be used.
• "00*00*" can represent "segment number + line number", that is, the preceding "00*" is the segment number and the following "00*" is the line number.
• the present disclosure is not limited to this way of using special characters to represent segment numbers; other representations can also be used, and the same applies to line numbers. As long as the line number and segment number can be identified and distinguished, the two will not be confused.
• the position-related information in the associated information may also include both line number and segment number information, as long as the line number information and the segment number information can be distinguished from each other. This makes designated broadcast more convenient.
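• one of the encodings mentioned above, "00*00*" (segment number followed by line number), could be parsed as follows; the fixed three-digit width of each field is an assumption for illustration:

```python
def parse_position_tag(tag):
    """Split a 'segment number + line number' tag such as '002005' into its
    segment and line numbers (assumes three digits each, for illustration)."""
    return int(tag[:3]), int(tag[3:])
```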
  • step S10331 in response to receiving a designated broadcast instruction, determine the current broadcast position in the broadcast data as the current broadcast progress. This step is similar to the aforementioned step S10301, and will not be repeated here.
• in step S10332, based on the current broadcast progress and the associated information for the broadcast data stored in the second storage space, the segment number, in the first text, of the text line corresponding to the row of the broadcast data that corresponds to the current broadcast progress is determined as the current broadcast segment number.
  • the current broadcast segment number is the second paragraph.
• in step S10333, based on the specified broadcast type of broadcasting the previous segment, the current broadcast segment number minus 1 is used as the segment number to be broadcast.
  • the segment number to be broadcast is the first segment.
  • step S10334 the segment corresponding to the segment number to be broadcast is obtained from the first text as the next broadcast data.
  • the segment number to be broadcast is the first segment
  • the first segment is obtained from the first text as the next broadcast data.
• the present disclosure can thus support the designated broadcast of the previous paragraph, which overcomes the defect in the prior art that broadcasts such as TTS broadcasts cannot support moving forward and backward.
• in the case where the adjacent text unit to be broadcast includes the next paragraph, and the position in the first text of each line of data in the broadcast data stored in the second storage space includes the segment number of the text line corresponding to that data,
• in step S103, in response to receiving the designated broadcast instruction, obtaining the next broadcast data from the first text may include the following steps.
  • step S10341 in response to receiving a designated broadcast instruction, determine the current broadcast position in the broadcast data as the current broadcast progress.
• in step S10342, based on the current broadcast progress and the associated information for the broadcast data stored in the second storage space, the segment number, in the first text, of the text line corresponding to the row of the broadcast data that corresponds to the current broadcast progress is determined as the current broadcast segment number.
  • step S10343: since the designated broadcast type is "broadcast the next paragraph", the current broadcast segment number plus 1 is used as the segment number to be broadcast.
  • step S10344 the segment corresponding to the segment number to be broadcast is obtained from the first text as the next broadcast data.
  • the present disclosure can support designating the next segment for broadcast, which overcomes the defect in the prior art that broadcasts such as TTS broadcasts cannot jump forward or backward.
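As a minimal sketch (in Python, with hypothetical names; the disclosure itself does not prescribe any implementation), the segment lookup in steps S10331 to S10344 can be expressed as:

```python
# Hypothetical sketch of steps S10331-S10344: mapping the current broadcast
# progress to a segment number and stepping one segment backward or forward.
# `assoc` holds, per row of the broadcast data, its (segment, line) position
# in the first text; `first_text` maps segment numbers to lists of text lines.

def next_designated_segment(assoc, current_row, broadcast_type):
    """Return the segment number to broadcast for a 'previous segment' or
    'next segment' designated-broadcast instruction."""
    current_segment = assoc[current_row][0]      # steps S10332 / S10342
    if broadcast_type == "previous_segment":
        return current_segment - 1               # step S10333
    elif broadcast_type == "next_segment":
        return current_segment + 1               # step S10343
    raise ValueError("unsupported designated broadcast type")

def fetch_segment(first_text, segment_number):
    """Steps S10334 / S10344: obtain the segment from the first text as the
    next broadcast data (joined into one string for the broadcast device)."""
    return "\n".join(first_text[segment_number])

# Example: row 0 of the broadcast data lies in segment 2 of the first text.
assoc = {0: (2, 1)}
first_text = {1: ["line a", "line b"], 2: ["line c"]}
target = next_designated_segment(assoc, 0, "previous_segment")  # segment 1
data = fetch_segment(first_text, target)
```

The function names and data shapes above are illustrative assumptions chosen to mirror the numbered steps, not part of the disclosed method itself.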
  • the next broadcast data is obtained from the first text, and the associated information for the next broadcast data is stored in the second storage space.
  • an example of obtaining the next broadcast data and storing associated information during sequential reading was described above with reference to Figs. 7A to 7D.
  • the following describes an example of obtaining the next broadcast data and storing associated information in the case of designated broadcasts with reference to Figs. 14A to 14C.
  • if the user initiates a designated broadcast request to read the previous line, then according to the foregoing description, it can be determined from the association information about the current broadcast data stored in the second storage space that the current reading position is the fourth line of the text area. It can then be determined that the position to be broadcast is the third line of the text area. Therefore, the position to be broadcast can be used as the starting position from which to organize the next broadcast data.
  • At least one line of text data can be obtained from the first text as the next piece of broadcast data.
  • the associated information for the next piece of broadcast data is established in the second storage space, and the method for establishing the associated information is similar to that shown in FIGS. 3A to 3C.
  • the cut-off proportion of the last row of data can also be directly assigned as 100% without calculation.
  • the associated information for the next broadcast data stored in the second storage space is constructed.
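A minimal Python sketch of this construction (field names and data shapes are illustrative assumptions) stores, for each row, its position in the first text and its cut-off proportion, with the last row assigned 100% directly:

```python
# Hypothetical sketch of building the associated information for the next
# piece of broadcast data (cf. FIGS. 3A-3C): each row's cut-off proportion is
# the cumulative character count up to and including that row divided by the
# total character count; the last row is assigned 100% without calculation.

def build_associated_info(rows, positions):
    total = sum(len(r) for r in rows)
    info, cumulative = [], 0
    for i, (row, pos) in enumerate(zip(rows, positions)):
        cumulative += len(row)
        if i == len(rows) - 1:
            cutoff = 100.0                      # last row: assigned directly
        else:
            cutoff = round(cumulative / total * 100, 2)
        info.append({"position": pos, "cutoff_percent": cutoff})
    return info

# Rows 3-5 of the text area, e.g. positions [001003], [001004], [001005].
rows = ["abcdefghij", "abcdefghij", "abcdefghij"]  # 10 characters each
info = build_associated_info(rows, ["001003", "001004", "001005"])
```

With three equal-length rows the cut-off proportions come out as roughly 33.33%, 66.67%, and 100%.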
  • the text data in the paragraph starting from the position to be broadcast may be regarded as the next broadcast data (in the case where the first text has already stored enough text data), or several lines starting from the position to be broadcast may be selected as the next broadcast data.
  • the several lines may be in the same paragraph (natural paragraph) or not in the same paragraph (that is, they may span paragraphs).
  • correspondingly, storing the associated information for the next broadcast data in the second storage space includes:
  • the position, in the first text, of each row of data in the data corresponding to the next piece of broadcast data includes the row number of the row of data, or the segment number and line number of the row of data.
  • the position information of each row of data in the corresponding data in the first text is a row number.
  • the location information may also include not only line numbers, but also segment numbers.
  • the three pieces of location information shown in FIG. 14C may also be [001003], [001004], and [001005], respectively.
  • 001 represents the first paragraph
  • 003, 004, and 005 represent the third, fourth, and fifth rows, respectively.
  • [001003], [001004], and [001005] respectively represent the third line of the first paragraph, the fourth line of the first paragraph, and the fifth line of the first paragraph.
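The [001003]-style codes above concatenate a three-digit segment number and a three-digit line number. A tiny Python sketch (hypothetical helper names) of encoding and decoding such codes:

```python
# Hypothetical sketch of the position codes described above: the first three
# digits are the segment (paragraph) number in the first text, and the last
# three digits are the line number.

def encode_position(segment, line):
    return f"{segment:03d}{line:03d}"

def decode_position(code):
    return int(code[:3]), int(code[3:])

assert encode_position(1, 3) == "001003"    # third line of the first paragraph
assert decode_position("001005") == (1, 5)  # fifth line of the first paragraph
```

The fixed three-digit widths are an assumption taken from the figure's examples; any unambiguous encoding of (segment, line) would serve the same purpose.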
  • the user can indicate that he wants to perform a designated broadcast, and what type of designated broadcast he wants to perform, by operating a corresponding button or performing a sliding operation on the touch screen. Therefore, the judgment can be made by detecting the corresponding operation, and when the corresponding operation is detected, the corresponding designated broadcast instruction is generated.
  • the above-mentioned image text broadcast method may further include: step S110, in response to detecting an operation on the touch screen, generating the designated broadcast instruction. That is, a specific designated broadcast instruction can be generated for a specific operation on the touch screen.
  • generating the designated broadcast instruction in response to detecting an operation on the touch screen, includes:
  • a designated broadcast instruction whose designated broadcast type is the next segment of the broadcast is generated.
  • the various touch screen operations described above may include sliding operations on the touch screen.
  • the first touch screen operation may be, for example, a leftward sliding operation
  • the second touch screen operation may be, for example, a right sliding operation.
  • the third touch-screen operation may be, for example, an upward sliding operation
  • the fourth touch-screen operation may be, for example, a downward sliding operation.
  • touch-screen operations may also include operations such as tapping and long-pressing on the touch screen.
  • Different types of touch screen operations can be combined to set the corresponding designated broadcast type.
  • the corresponding designated broadcast type can also be set in conjunction with the occurrence of touch screen operations at different positions on the touch screen.
  • the corresponding operation and its meaning may be as follows, for example.
  • the text data in the first text may not always meet the needs.
  • prompt information can be provided. This will be explained in detail below.
  • the user can perform a sliding operation on the touch screen to indicate their reading intention.
  • the horizontal direction of the touch screen of the reading device is used as a reference for the left and right directions, and the longitudinal direction of the touch screen is used as a reference for the up and down directions, in describing the meaning of the exemplified sliding operations.
  • the user may be prompted: "It is already the first line”.
  • the user can be prompted: "it is the last line”.
  • a swipe-up operation can mean "broadcast the previous paragraph"
  • the user may be prompted: "It is the last paragraph".
  • the user may be prompted: "recognizing, please wait.”
  • this facilitates use by users such as visually impaired users and hearing-impaired users (for whom broadcasts can be provided in the form of vibration, for example).
  • gestures can be used to specify the broadcast.
  • the user's actions can be captured by using, for example, a camera; then the image captured by the camera can be analyzed to determine whether the user wants to perform a designated broadcast and what type of designated broadcast the user wants to perform.
  • the present disclosure does not limit this.
  • the actions used by the user to indicate that they want to perform the specified broadcast and the type of the specified broadcast that they want to perform are similar to the case of using the user's operation above, and will not be repeated here.
  • the present disclosure can greatly improve the reading experience of users (such as visually impaired and hearing impaired users) by providing designated broadcast functions.
  • the position of each line of data in the first text may also be stored in the first storage space, so that each time broadcast data is prepared, the required position information can be obtained directly from the first storage space.
  • each segment can be indicated by setting a specific segment indicator mark at the corresponding position in the first text, or the first text can be stored in the first storage space in a manner following the text segments of the text area, so that the first text itself also carries the segment-position information corresponding to the text segments of the text area.
  • the broadcast device may actively obtain it, or the processing device may obtain it from the third storage space and provide it to the broadcast device.
  • the number of characters of the corresponding text line can also be stored similarly to the storage of the position information and the cut-off percentage, for example, the number of characters of the corresponding text line can be stored in the first storage space and/or the second storage space.
  • the required cut-off ratio can be calculated more quickly and/or the position of the text data can be located more quickly.
  • the associated information may include the number of characters in addition to the position and the cut-off ratio.
  • the designated broadcast may occur during the sequential broadcast, that is, during the sequential broadcast the user may need to listen again (as described above). At this time, the designated broadcast can be initiated; the sequential broadcast in progress may be interrupted or terminated, and the required designated broadcast is started.
  • a specific type identifier used to indicate the type of the text line is stored, and based on the specific type identifier, a prompt is issued to the user during broadcasting.
  • a specific type identifier for indicating the type of the text line may be stored.
  • a corresponding prompt can be issued to the user. For example, if it is determined that a text line to be broadcast is a title line, the user can be prompted with information such as "this is a title line”. If it is determined that a line of text to be broadcast is an obscure line, the user can be prompted with messages such as "This line of text cannot be recognized, please understand.”
  • the above-mentioned prompts may include one of a sound prompt, a vibration prompt, a text prompt, an image prompt, and a video prompt, or a combination thereof, so as to facilitate the use of users with various needs.
  • the specific type of text line includes:
  • a first type of text line, wherein the first type of text line is determined by the size of the text; and
  • a second type of text line, wherein the second type of text line is determined by the clarity of the text line.
  • the first type of text line can be a title line, a header, a footer, etc., and the text size of these lines is often different from other text lines.
  • the second type of text line refers to a text line that cannot be clearly recognized, that is, a text line with low text clarity (for example, lower than a preset text clarity threshold).
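A minimal Python sketch of this two-way classification (the size-deviation ratio, clarity threshold, and field names are illustrative assumptions; the disclosure only states that size and clarity are the criteria):

```python
# Hypothetical sketch: a line whose text size deviates markedly from the body
# size is treated as the first type (title, header, footer), and a line whose
# recognition clarity falls below a preset threshold is treated as the second
# type (an obscure line that triggers a "cannot be recognized" prompt).

def classify_line(text_height, body_height, clarity, clarity_threshold=0.5):
    if clarity < clarity_threshold:
        return "second_type"   # obscure line
    if abs(text_height - body_height) / body_height > 0.3:
        return "first_type"    # title/header/footer line
    return "normal"

assert classify_line(text_height=40, body_height=20, clarity=0.9) == "first_type"
assert classify_line(text_height=20, body_height=20, clarity=0.2) == "second_type"
assert classify_line(text_height=21, body_height=20, clarity=0.9) == "normal"
```

On detecting a non-normal type, the device would store the corresponding type identifier alongside the line and issue the matching prompt when broadcasting, as described above.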
  • the text lines may be arranged in a horizontal direction, a vertical direction, or an oblique direction.
  • the present disclosure provides an image text broadcasting device 100.
  • the image text broadcasting device 100 may include a receiving device 101, a broadcasting device 102, and a processor 103.
  • the receiving device 101 may be configured to receive a designated broadcast instruction; the broadcast device 102 may be configured to determine the current broadcast progress of the broadcast data in response to the designated broadcast instruction; and the processor 103 may be configured to obtain, according to the current broadcast progress and the designated broadcast instruction, the next piece of broadcast data from the first text for the broadcast device to broadcast.
  • the first text is composed of text data recognized and stored by the character recognition device for text lines in the text area of the image.
  • the image text broadcasting apparatus 100 may support designated broadcasting.
  • the image text broadcasting device 100 may further include a character recognition device 104 and at least one memory 105.
  • the character recognition device 104 may be configured to perform character recognition for a text line to be recognized in the text area of the image to obtain text data.
  • the at least one memory 105 may be configured to: store the text data of the text line in the first storage space of the at least one memory as a line of data in the first text for the text area; store broadcast data in the third storage space of the at least one memory; and store associated information for the broadcast data in the second storage space of the at least one memory, the associated information being used to correspond, with respect to position, the broadcast data in the third storage space with the corresponding data in the first text in the first storage space.
  • the broadcasting device 102 may obtain broadcasting data from the third storage space, and perform sequential broadcasting or designated broadcasting on the text area.
  • the processor 103 may, in response to receiving the designated broadcast instruction and the current broadcast progress from the broadcast device 102, obtain the next broadcast data from the first text in the first storage space and store it in the third storage space.
  • the above-mentioned image text broadcasting device 100 may further include a detecting device 106, which may be configured to generate the specified broadcasting instruction in response to detecting a specified broadcasting operation, and send the specified broadcasting instruction to the processor.
  • the detection device may be directly an input device, or may be another detection component for detecting input or operation.
  • the designated broadcast operation may include various touch screen operations (such as the aforementioned first, second, third, and fourth touch screen operations, etc.). More specifically, for example, sliding to the left on the touch screen, and sliding to the right on the touch screen.
  • the designated broadcast operation may also include: sliding upward on the touch screen and sliding downward on the touch screen.
  • the associated information for the broadcast data may at least include: the position, in the first text, of each row of data in the data corresponding to the broadcast data; and the cut-off proportion of each row of data in the corresponding data.
  • the cut-off proportion of each row of data in the corresponding data is calculated and determined by the processor 103 as the proportion of the number of characters from the initial row of the corresponding data up to and including that row, in the total number of characters of the entire corresponding data.
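Conversely, the stored cut-off proportions let the processor map a progress percentage reported by the broadcast device back to a row, and thus to a position in the first text. A hypothetical Python sketch (the list-of-dicts shape is an assumption carried over from the construction described earlier):

```python
# Hypothetical sketch: find the row currently being broadcast as the first
# row whose cut-off proportion covers the reported progress percentage, and
# return that row's position in the first text.

def locate_row(associated_info, progress_percent):
    for entry in associated_info:
        if progress_percent <= entry["cutoff_percent"]:
            return entry["position"]
    return associated_info[-1]["position"]

info = [
    {"position": "001003", "cutoff_percent": 33.33},
    {"position": "001004", "cutoff_percent": 66.67},
    {"position": "001005", "cutoff_percent": 100.0},
]
assert locate_row(info, 50.0) == "001004"   # 50% falls within the second row
assert locate_row(info, 10.0) == "001003"
```

With the position in hand, the processor can then offset by one line or one segment to serve the designated broadcast, as described above.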
  • the processor 103 may, in response to each storage performed in the third storage space, correspondingly construct and/or update the associated information stored in the second storage space for the next piece of broadcast data.
  • the broadcasting device 102 may, in response to the first touch-screen operation on the touch screen (for example, a leftward sliding operation), broadcast the previous line of the text line currently being broadcast; and may, in response to the second touch-screen operation on the touch screen (for example, a rightward sliding operation), broadcast the next line of the text line currently being broadcast.
  • in response to a third touch-screen operation on the touch screen (such as an upward sliding operation), the broadcasting device can broadcast the previous paragraph of the text segment currently being broadcast; and, in response to a fourth touch-screen operation on the touch screen (for example, a downward sliding operation), the broadcasting device broadcasts the next paragraph of the text segment currently being broadcast.
  • the broadcast device in response to the designated broadcast operation, may send a prompt to the user indicating that "recognizing" or "the designated location does not exist".
  • the detection device 106 in the device 100 will detect the designated broadcasting operation of the user, and then generate a designated broadcasting instruction, and send it to, for example, the processor 103 or the broadcasting device 102.
  • when the processor receives the designated broadcast instruction, it parses the designated broadcast instruction and requests the current broadcast progress information from the broadcast device, so as to start preparing the designated broadcast data required for the designated broadcast.
  • when the broadcasting device 102 receives the designated broadcasting instruction, it determines the current broadcasting progress and may or may not parse the designated broadcasting instruction; it then sends the current broadcasting progress together with the designated broadcasting instruction to the processor 103. After receiving the designated broadcast instruction and the current broadcast progress, the processor 103 prepares the next broadcast data (the designated broadcast data).
  • the process of preparing the next piece of broadcast data is similar to the process described above in connection with step S103.
  • the processor 103 obtains, based on the current broadcast progress and the associated information for the current broadcast data stored in the second storage space of the memory 105, the corresponding current broadcast position (the position in the first text), and then determines the position to be broadcast based on the designated broadcast instruction.
  • after the processor determines the position to be broadcast, it acquires several lines of text data starting from that position from the first text as the next piece of broadcast data, which the memory 105 stores in the third storage space for the broadcast device 102 to access when performing the required designated broadcast.
  • the above-mentioned image and text broadcasting device 100 may provide a display function in addition to broadcasting in the form of, for example, voice or vibration. Therefore, it may also include a display device for displaying, for example, the data currently being broadcast or the current broadcast progress (for example, the broadcast position) and so on.
  • an electronic circuit which may include: a circuit configured to perform the steps of the above-mentioned method.
  • a reading device including: the above-mentioned electronic circuit; and a circuit configured to broadcast text data.
  • the reading device in response to a user's operation or a user's action, performs sequential broadcasting or designated broadcasting through the circuit for broadcasting text data.
  • the user's operations may refer to some operations performed by the user on, for example, a reading device, such as operations on switches, buttons, screens, and so on.
  • the user's actions can refer to certain actions that the user makes with the hands, head, or other body parts to trigger the reading device to broadcast. For example, one tap on the head indicates a sequential broadcast command, two taps within a short time interval indicate a designated broadcast command, and so on.
  • the meaning of the user's operation or action can be designed according to actual needs.
  • the parameters in the above instructions can also be designed according to actual needs.
  • an electronic device, including: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the electronic device to execute the foregoing methods.
  • a non-transitory computer-readable storage medium storing a program, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to execute the foregoing methods.
  • FIG. 17 is a block diagram showing an example of an electronic device according to an exemplary embodiment of the present disclosure. It should be noted that the structure shown in FIG. 17 is only an example. According to specific implementations, the electronic device of the present disclosure may only include one or more of the components shown in FIG. 17.
  • the electronic device 2000 may be, for example, a general-purpose computer (such as a laptop computer, a tablet computer, etc.), a mobile phone, or a personal digital assistant. According to some embodiments, the electronic device 2000 may be a reading aid device (or simply referred to as a reading device).
  • the electronic device 2000 may be configured to capture images, process the captured images, and provide corresponding broadcast services or prompts in response to the processing.
  • the electronic device 2000 can be configured to take an image, perform text detection and recognition on the image to obtain text data, convert the text data into sound data, and output the sound data for the user to listen to and/or output the text data for the user to view on, for example, a display device (such as a normal display screen or a touch screen).
  • the electronic device 2000 may be configured to include a spectacle frame or be configured to be detachably mountable to a spectacle frame (for example, a rim of the spectacle frame, a connector connecting two rims, a temple, or any other part), so that it is possible to capture an image approximately covering the user's field of view.
  • the electronic device 2000 may also be installed on other wearable devices, or integrated with other wearable devices.
  • the wearable device may be, for example, a head-mounted device (such as a helmet or a hat, etc.), a device that can be worn on the ear, and the like.
  • the electronic device can be implemented as an accessory that can be attached to a wearable device, for example, can be implemented as an accessory that can be attached to a helmet or a hat, or the like.
  • the electronic device 2000 may also have other forms.
  • the electronic device 2000 may be a mobile phone, a general-purpose computing device (such as a laptop computer, a tablet computer, etc.), a personal digital assistant, and so on.
  • the electronic device 2000 may also have a base so that it can be placed on a desktop.
  • the electronic device 2000 can be used as a reading auxiliary device (reading device) or an image and text broadcasting device to assist reading.
  • the electronic device 2000 is sometimes referred to as an "e-reader" or "reading aid".
  • with it, users who cannot read on their own can adopt a posture similar to a reading posture to realize "reading" of books, magazines, and the like.
  • the electronic device 2000 can obtain an image, perform character recognition on the text line in the image, obtain text data and store the obtained text data, so as to facilitate the rapid broadcast of the text data and make the broadcast There are semantic cohesion and context in the text data to avoid blunt stuttering caused by line-by-line or word-by-word broadcast.
  • the electronic device 2000 can support designated reading: by detecting the user's operation on the electronic device or the action presented by the user during the reading process, it determines the user's designated reading needs and broadcasts the corresponding content, which is more convenient for users and greatly improves the user experience.
  • the electronic device 2000 may include a camera 2004 for shooting and acquiring images.
  • the camera 2004 can take static or dynamic images, and may include, but is not limited to, a camera head, a still camera, a video camera, etc., and is configured to obtain an initial image including an object to be recognized.
  • the electronic device 2000 may also include an electronic circuit 2100 that includes a circuit configured to perform the steps of the method as previously described.
  • the electronic device 2000 may further include a character recognition circuit 2005, which is configured to perform text detection and recognition (for example, OCR processing) on the text in the image, so as to obtain text data.
  • the character recognition circuit 2005 can be implemented by a dedicated chip, for example.
  • the electronic device 2000 may further include a sound conversion circuit 2006 configured to convert the text data into sound data.
  • the sound conversion circuit 2006 may be realized by a dedicated chip, for example.
  • the electronic device 2000 may further include a sound output circuit 2007 configured to output the sound data.
  • the sound output circuit 2007 may include, but is not limited to, earphones, speakers, or vibrators, etc., and their corresponding driving circuits.
  • the electronic device 2000 may further include an image processing circuit 2008, and the image processing circuit 2008 may include a circuit configured to perform various image processing on an image.
  • the image processing circuit 2008 may, for example, include, but is not limited to, one or more of the following: a circuit configured to denoise an image, a circuit configured to deblur an image, a circuit configured to perform geometric correction on an image, a circuit configured to perform feature extraction on an image, a circuit configured to perform target detection and recognition on a target object in an image, a circuit configured to perform text detection on text contained in an image, a circuit configured to extract text lines from an image, a circuit configured to extract text coordinates from an image, and so on.
  • the electronic circuit 2100 may further include a word processing circuit 2009, which may be configured to perform various processing based on the extracted text-related information (such as text data, text boxes, paragraph coordinates, text line coordinates, text coordinates, etc.) to obtain processing results such as paragraph sorting, text semantic analysis, and layout analysis results.
  • one or more of the above-mentioned various circuits may be implemented with customized hardware, and/or may be implemented in hardware, software, firmware, middleware, microcode, a hardware description language, or any combination thereof.
  • for example, one or more of the above-mentioned various circuits can be implemented by programming hardware (for example, a programmable logic circuit including a field programmable gate array (FPGA) and/or a programmable logic array (PLA)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, C++), using the logic and algorithms according to the present disclosure.
  • the electronic device 2000 may further include a communication circuit 2010, which may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, infrared communication devices, wireless communication devices and/or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices and/or the like.
  • the electronic device 2000 may further include an input device 2011.
  • the input device 2011 may be any type of device that can input information to the electronic device 2000, and may include, but is not limited to, various sensors, a mouse, a keyboard, a touch screen, buttons, a joystick, a microphone and/or a remote control, etc.
  • the electronic device 2000 may further include an output device 2012.
  • the output device 2012 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer, etc.
  • when the electronic device 2000 is used as a reading aid device according to some embodiments, the vision-based output device can facilitate the user's family members or maintenance personnel in obtaining output information from the electronic device 2000.
  • the electronic device 2000 may further include a processor 2001.
  • the processor 2001 may be any type of processor, and may include, but is not limited to, one or more general-purpose processors and/or one or more special-purpose processors (for example, special processing chips).
  • the processor 2001 may be, but is not limited to, a central processing unit CPU or a microprocessor MPU, for example.
  • the electronic device 2000 may also include a working memory 2002, which may store programs (including instructions) and/or data (such as images, text, sounds, and other intermediate data, etc.) useful for the work of the processor 2001, and may include, but is not limited to, random access memory and/or read-only memory devices.
  • the electronic device 2000 may also include a storage device 2003.
  • the storage device 2003 may include any non-transitory storage device.
  • the non-transitory storage device may be any storage device that is non-transitory and can realize data storage, and may include, but is not limited to, disk drives, optical storage devices, solid-state memory, floppy disks, flexible disks, hard disks, tapes or any other magnetic media, optical disks or any other optical media, ROM (read-only memory), RAM (random access memory), cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions and/or code.
  • the working memory 2002 and the storage device 2003 may be collectively referred to as "memory", and in some cases may be used with each other.
  • the memory may store the aforementioned first text stored in the first storage space, associated information stored in the second storage space (and related data corresponding to the broadcast data), and broadcast data stored in the third storage space.
  • the present disclosure does not limit whether the first storage space, the second storage space, and the third storage space are in the same storage device, as long as the required functions can be realized.
  • the processor 2001 can perform control and scheduling of at least one of the camera 2004, the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, the communication circuit 2010, the electronic circuit 2100, the input device 2011, the output device 2012, and the other various devices and circuits included in the electronic device 2000. According to some embodiments, at least some of the various components described in FIG. 17 may be connected and/or communicate with each other through the line 2013.
  • Software elements may be located in the working memory 2002, including but not limited to an operating system 2002a, one or more application programs 2002b, drivers, and/or other data and codes.
  • instructions for performing the aforementioned control and scheduling may be included in the operating system 2002a or one or more application programs 2002b.
  • the instructions for executing the method steps described in the present disclosure may be included in one or more application programs 2002b, and each module of the above electronic device 2000 may be implemented by the processor 2001 reading and executing the instructions of the one or more application programs 2002b.
  • The electronic device 2000 may include a processor 2001 and a memory (for example, the working memory 2002 and/or the storage device 2003) storing a program.
  • The program includes instructions that, when executed by the processor 2001, cause the processor 2001 to execute the methods described in the various embodiments of the present disclosure.
  • Part or all of the operations performed by at least one of the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, the communication circuit 2010, the electronic circuit 2100, the input device 2011, and the output device 2012 may be implemented by the processor 2001 reading and executing the instructions of one or more application programs 2002b.
  • The executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (such as the storage device 2003), and may be compiled and/or installed into the working memory 2002 when executed. Accordingly, the present disclosure provides a computer-readable storage medium storing a program.
  • The program includes instructions that, when executed by a processor of an electronic device (for example, a reading device), cause the electronic device to perform the methods described in the various embodiments of the present disclosure.
  • The executable code or source code of the instructions of the software elements (programs) may also be downloaded from a remote location.
  • The processor 2001 in the electronic device 2000 may be distributed over a network.
  • For example, one processor may perform some processing while another processor remote from it performs other processing.
  • Other modules of the electronic device 2000 may be similarly distributed. In this way, the electronic device 2000 can be regarded as a distributed computing system that performs processing in multiple locations.

Abstract

An image text broadcast method, a device therefor, an electronic circuit, and a storage medium are provided. The image text broadcast method includes: receiving a specified broadcast instruction; in response to the specified broadcast instruction, determining a current broadcast progress with respect to broadcast data; and acquiring a next piece of broadcast data from a first text according to the current broadcast progress and the specified broadcast instruction, wherein the first text is composed of text data recognized from text in a text region of an image and stored.

Description

Image Text Broadcast Method and Device Therefor, Electronic Circuit, and Storage Medium
Technical Field
The present disclosure relates to the technical field of image processing and text broadcasting, and in particular to an image text broadcast method and a device therefor, an electronic circuit, and a storage medium.
Background
In recent years, image processing and broadcast technologies have been widely applied in various fields, among which technologies for broadcasting image text have remained a focus of industry attention.
The methods described in this section are not necessarily methods that have been previously conceived or employed. Unless otherwise indicated, it should not be assumed that any method described in this section is considered prior art merely because it is included in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered to have been recognized in any prior art.
Summary
According to one aspect of the present disclosure, an image text broadcast method is provided, including: receiving a specified broadcast instruction; in response to the specified broadcast instruction, determining a current broadcast progress with respect to broadcast data; and acquiring a next piece of broadcast data from a first text according to the current broadcast progress and the specified broadcast instruction, wherein the first text is composed of text data recognized from text in a text region of an image and stored.
According to one aspect of the present disclosure, an image text broadcast device is provided, including: a receiving apparatus configured to receive a specified broadcast instruction; a broadcast apparatus configured to determine, in response to the specified broadcast instruction, a current broadcast progress with respect to broadcast data; and a processor configured to acquire, according to the current broadcast progress and the specified broadcast instruction, a next piece of broadcast data from a first text for the broadcast apparatus to broadcast, wherein the first text is composed of text data recognized by a character recognition apparatus from text in a text region of an image and stored.
According to another aspect of the present disclosure, an electronic circuit is provided, including: a circuit configured to execute the steps of the above method.
According to another aspect of the present disclosure, a reading device is further provided, including: the above electronic circuit; and a circuit configured to broadcast text data.
According to another aspect of the present disclosure, an electronic device is further provided, including: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the electronic device to execute the above method.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing a program is further provided, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to execute the above method.
Brief Description of the Drawings
The accompanying drawings exemplarily illustrate embodiments, constitute a part of the specification, and together with the textual description serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.
Fig. 1 is a flowchart illustrating an image text broadcast method according to an exemplary embodiment of the present disclosure;
Fig. 2 shows an exemplary image including a text region that contains a plurality of text lines;
Figs. 3A, 3B, and 3C illustrate a method and process for determining association information according to another exemplary embodiment of the present disclosure;
Fig. 4 is a flowchart of an image text broadcast method according to another exemplary embodiment of the present disclosure;
Fig. 5A illustrates a process of preparing broadcast data according to an exemplary embodiment of the present disclosure, and Fig. 5B illustrates an example process of sequential broadcasting according to an exemplary embodiment of the present disclosure;
Fig. 6 is a flowchart illustrating an image text broadcast method according to another exemplary embodiment of the present disclosure;
Figs. 7A, 7B, 7C, and 7D illustrate processes of storing broadcast data and determining association information according to exemplary embodiments of the present disclosure;
Fig. 8 is a flowchart illustrating preparation of specified broadcast data according to an exemplary embodiment of the present disclosure;
Fig. 9 illustrates an example form of a specified broadcast instruction according to an exemplary embodiment of the present disclosure;
Figs. 10, 11, 12, and 13 are flowcharts illustrating preparation of specified broadcast data in response to a specified broadcast instruction according to exemplary embodiments of the present disclosure;
Figs. 14A, 14B, and 14C illustrate a method and process for constructing, in response to a specified broadcast instruction, association information for a next piece of broadcast data and storing the association information in a second storage space according to an exemplary embodiment of the present disclosure;
Fig. 15 is a flowchart illustrating an image text broadcast method according to another exemplary embodiment of the present disclosure;
Fig. 16 is a block diagram illustrating an image text broadcast device according to an exemplary embodiment of the present disclosure; and
Fig. 17 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In the present disclosure, unless otherwise stated, the use of the terms "first", "second", etc. to describe various elements is not intended to define a positional, temporal, or importance relationship among them; such terms are only used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of the element, while in some cases, based on the context, they may also refer to different instances.
The terms used in the description of the various examples in the present disclosure are for the purpose of describing particular examples only and are not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. Furthermore, the term "and/or" as used in the present disclosure covers any one of the listed items and all possible combinations thereof.
Although image processing technologies related to character recognition have been widely applied in various fields, some challenges remain in the broadcasting of image text.
For example, during broadcasting, a user (for example, a visually or hearing impaired user) may need to re-listen to certain earlier content. Although e-book and audio-playback scenarios support moving the broadcast forward and backward, text-to-speech (TTS) broadcast scenarios currently do not support moving the voice broadcast forward and backward. TTS is a speech synthesis application that converts stored text data into natural speech output, and its voice broadcast function is very convenient for users. However, TTS voice broadcasting is instantaneous: content is broadcast as soon as it is detected and recognized, so forward and backward broadcasting cannot be supported.
In addition, the broadcasting supported by the present disclosure is not limited to voice broadcasting such as TTS, but can also support more types of broadcast functions, such as vibration broadcasting for visually and hearing impaired users, for example deaf-mute users.
The present disclosure provides an image text broadcast method. Fig. 1 is a flowchart illustrating an image text broadcast method according to an exemplary embodiment of the present disclosure.
In the present disclosure, a text line refers to a sequence of characters whose adjacent-character spacing is smaller than a threshold spacing, that is, a continuous line of characters. The adjacent-character spacing refers to the distance between the coordinates of corresponding positions of adjacent characters, for example the distance between their upper-left corner coordinates, lower-right corner coordinates, or centroid coordinates. If the adjacent-character spacing is not greater than the threshold spacing, the adjacent characters may be considered continuous and assigned to the same text line. If the adjacent-character spacing is greater than the threshold spacing, the adjacent characters may be considered discontinuous (for example, they may belong to different paragraphs or to left and right columns) and assigned to different text lines. The threshold spacing may be set according to the character size: for example, the threshold spacing set for adjacent characters with a font size larger than size 4 (such as size 3 or size 2) is larger than that set for adjacent characters with a font size below size 4 (such as small 4 or size 5).
For example, the image shown in Fig. 2 includes one text region containing 12 text lines (text lines 1 to 12), divided into 2 paragraphs ("paragraph" herein may refer to a paragraph or a natural paragraph): the first paragraph has 5 lines and the second has 7 lines. It can be understood that an image is not limited to a single text region but may also contain multiple text regions, and the image text broadcast method of the present disclosure may be applied to each text region in the image.
As shown in Fig. 1, the image text broadcast method according to an exemplary embodiment of the present disclosure includes: step S101, performing character recognition on a to-be-recognized text line of a text region in an image to obtain text data; step S102, storing the text data in a first storage space as one line of data in a first text for the text region; step S103, storing broadcast data in a third storage space; and step S104, storing, in a second storage space, association information for the broadcast data, the association information being used to establish a positional correspondence between the broadcast data in the third storage space and the corresponding data in the first text in the first storage space.
在步骤S101,针对图像中文本区域的待识别文本行进行字符识别,获得文本数据。
根据一些实施例,如上所述,图像中可包含一个或多个文本区域。每个文本区域可以包含至少两行文字(至少2个文本行),所包含的文字例如可以是各种形式的文字(包括各种字符、数字等)。另外,所述图像中除了文本区域之外,还可以包含图等。
根据一些实施例,所述图像可以是经过预先筛选的图像,例如经过多次拍摄,选取的其中较为清楚的图像。
根据一些实施例,所述图像可以直接是由摄像机拍摄所得的图像,也可以是在摄像机拍摄的图像基础上经过了某种或一些预先处理的图像,所述预先处理例如可以包括去噪、对比度增强、分辨率处理、灰度处理、模糊去除等等。
根据一些实施例,摄像机例如可以设置于用户的可穿戴设备或眼镜等设备上。
这里,用于拍摄图像的摄像机能够进行静态或动态的图像拍摄,其可以是独立装置(例如照相机、视频摄像机、摄像头等),也可以包括在各类电子设备(例如移动电话、计算机、个人数字助理、播报设备、平板计算机、阅读辅助设备、可穿戴设备等)中。
根据一些实施例,可以通过例如光学字符识别OCR方法来对文本行进行字符识别,得到该文本行的文本数据。
根据一些实施例,可以在获取图像之后并且在字符识别之前,进行文本行检测。
根据一些实施例,可以顺序检测和识别一个文本区域中的每个待识别文本行,得到该文本行的文本数据。
以图2所示的图像为例,可以先针对第1行文字进行字符识别,从而得到第1行文字的文本数据([“肇观”,即“开启视觉”。])。然后可以依次对后续文本行进行字符识别,得到相应的文本数据。
可以理解的是,不是必须从文本区域的第一行开始检测和识别,也可以直接从其他行开始。
在步骤S102,在第一存储空间中存储所述文本数据,作为针对所述文本区域的第一文本中的一行数据。即,可以在第一存储空间中逐行存储所识别的文本数据,使得第一存储空间中的每行文本数据与文本区域中的每个文本行是一一对应的。
根据一些实施例,可以将识别出的文本行的文本数据存储至第一存储空间,并且在存储时,可以作为第一文本中的一行数据来进行存储。即,可以按照在图像的所述文本区域中的呈现形式那样,也按行来存储识别的文本数据。例如,在将识别出的文本区域中的1个文本行的文本数据存储至第一存储空间时,也作为一行数据来存储,以方便后续的处理。
在本文中,可以将针对该文本区域所识别的各文本行的文本数据作为第一文本的一行数据存储在第一存储空间中。因此,第一文本中的每行数据与该文本区域中的每个文本行相对应。
上面为了更容易描述起见,将步骤S101与S102作为不同的步骤分开描述,在其它的示例中,步骤S101与S102也可以是在同一个步骤中。
在步骤S103,在第三存储空间中存储播报数据。
根据一些实施例,第三存储空间用于存储播报数据,其中第三存储空间中的播报数据可以是顺序播报用的播报数据,也可以是指定播报用的播报数据。这里,播报数据在第三存储空间中的存储不一定需要按照原有行的形式来存储,即,可以不按行存储,也可以按行存储,本公开对此不作限制。
另外,对于播报数据在第三存储空间中存储的时长,可以有多种处理方式。比如可以较长时间地存储每条播报数据,一条播报数据即便已经播报过也可以继续保留,从而保持在整个播报期间的播报数据的完整性。也可以采用将播报完毕之后的播报数据移除或加以覆盖的方式,以节省存储空间。
在步骤S104,在第二存储空间中存储针对所述播报数据的关联信息。
根据一些实施例,所述关联信息用于将第三存储空间中的播报数据与第一存储空间中的第一文本中的相应数据关于位置进行对应。
这里播报数据与第一文本中的相应数据关于位置进行对应,可以指播报数据与第一文本中的相应数据在位置上具有对应关系。例如,在播报数据与所述相应数据相同的情况下,所述相应数据在第一文本中的位置即为所述播报数据在第一文本中的位置。当然,播报数据与相应数据可以不相同,但是通过所述关联信息,可以建立它们之间的在位置上的对应关系,从而方便数据的管理与检索。
如前所述,目前TTS播报无法像电子书或者音频播报场景那样支持前进后退功能。在本公开中,通过设置第二存储空间并在存储空间中存储上述关联信息,以便通过所述关联信息,将第三存储空间中的播报数据与第一存储空间中的第一文本中的相应数据在位置上进行对应,使得实现TTS播报的前进后退功能成为可能。即,通过上述的步骤S101~S104,为支持TTS播报的前进后退功能进行了充足的准备。
可以理解的是,虽然图1示例性地示出了上述的各存储步骤,但是本公开不限于各说明书附图中示出的步骤执行顺序,尤其在后续涉及到播报时,因为本公开中的存储与播报是并行执行的操作,很多时候并不需要时间上的先后关系或者特定的执行顺序,而是可以根据实际情况进行灵活处理。
根据一些实施例,上述的第一、第二、第三存储空间是为了区分不同数据的存储而命名的存储区域,实际这些存储空间可以位于同一个存储装置(例如存储器)中,也可以各自位于不同的存储装置中,或者其中两个存储空间位于同一个存储装置中,另一个存储空间单独位于另一个存储装置中。另外,在数据的存储上,不一定要使用三个不同的存储区域(在同一个存储装置或者不在同一存储装置中)来存储上述数据,虽然将其命名为名字不同的三个存储空间。
According to some embodiments, the association information for the broadcast data may include at least:
the position, in the first text, of each line of the corresponding data that positionally corresponds to the broadcast data; and
the cutoff ratio of each line of the corresponding data within the corresponding data.
Here, the cutoff ratio of a line of the corresponding data is determined as the proportion of the character count from the starting line of the corresponding data up to and including that line, relative to the total character count of the entire corresponding data.
According to some embodiments, the position of each line of the corresponding data in the first text may include the line number, paragraph number, and so on of that line. For example, a value such as "00*" may represent the line number of a line of data; for instance, "002" may indicate that the line is the second line of the first text. Paragraph numbers may be represented similarly. Other ways of representing positions are also encompassed by the present disclosure and are not described in detail here.
For ease of understanding and description, assume now that the broadcast data is identical to the corresponding data, that line numbers are used to represent positions as above, and that the corresponding data and the broadcast data contain the text data of text lines 2, 3, and 4 of the text region shown in Fig. 3A (the three lines marked with rectangular boxes, serving respectively as lines 1, 2, and 3 of the broadcast data).
The character count of a text line represents the number of characters in that line. Assuming that one Chinese character equals 2 characters and that one English letter, one digit, or one punctuation mark each equals 1 character, the character count of a line of data may be determined, for example, as follows:
character count of a line = number of Chinese characters in the line × 2 (one Chinese character may equal 2 characters) + number of English characters in the line + number of digits in the line + number of punctuation marks in the line.
Note that the character count of a line of data may also be calculated in other ways, without being limited to the exemplary way shown here.
For example, the text data of line 1 in Fig. 3A, [“肇观”，即“开启视觉”。], has 20 characters: 7 Chinese characters × 2 + 6 punctuation marks = 20 characters.
With the above character counting method, the character counts of the 3 lines of the corresponding data (i.e., the broadcast data) can be determined to be 26, 30, and 25 respectively, so the total character count of the entire corresponding data is 26 + 30 + 25 = 81.
The cutoff ratio of each line of the corresponding data can thus be calculated as follows:
the cutoff ratio of line 1 of the corresponding data is 26/81 = 32% (corresponding to text line 2 of the text region);
the cutoff ratio of line 2 of the corresponding data is (26+30)/81 = 69%; and
the cutoff ratio of line 3 of the corresponding data is (26+30+25)/81 = 100%.
It can be understood that the cutoff ratio of the last line is generally 100%, so it may be set to 100% directly without calculation.
The obtained cutoff ratios may be stored in the second storage space together with the corresponding positions (002, 003, 004); the association information thus stored in the second storage space for this piece of broadcast data may be as shown in Fig. 3B.
The above example, which uses character counts to compute cutoff ratios, was given to explain the meaning of "cutoff ratio"; the cutoff ratio of each line may also be determined based on other parameters, which will not be detailed here.
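The character-counting rule and cutoff-ratio calculation described above can be sketched in Python. This is a minimal illustrative sketch: the rule that one CJK character counts as two characters follows the example in the text, the Unicode range used to detect CJK characters is an assumption, and the function names are hypothetical.

```python
def char_count(line: str) -> int:
    """Character count of a line: each Chinese (CJK) character counts as 2,
    everything else (letters, digits, punctuation) counts as 1."""
    # The CJK Unified Ideographs range U+4E00..U+9FFF is an assumption.
    return sum(2 if '\u4e00' <= ch <= '\u9fff' else 1 for ch in line)


def cutoff_ratios(lines: list[str]) -> list[int]:
    """Cutoff ratio of each line: cumulative character count up to and
    including that line, as a percentage of the total character count."""
    counts = [char_count(line) for line in lines]
    total = sum(counts)
    ratios, cumulative = [], 0
    for count in counts:
        cumulative += count
        ratios.append(round(cumulative / total * 100))
    ratios[-1] = 100  # the last line is always 100% and need not be computed
    return ratios
```

For the three example lines with character counts 26, 30, and 25, this yields the cutoff ratios 32%, 69%, and 100% computed above.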
另外,根据一些实施例,还可以在第二存储空间中存储与当前正在播报的播报数据在位置上对应的所述相应数据。在播报数据与所述相应数据相同的情况下,则如图3C所示,由此不仅能够更容易地建立两者之间的对应关系,而且还能准确地获知播报数据的具体信息。
如上所述,第三存储空间中存储的播报数据可以是支持顺序播报(例如逐行播报)的播报数据;也可以是支持指定播报(例如支持前述的前进后退功能的播报)的播报数据,即,可以是基于指定播报所需的播报位置而组织的播报数据。
由于播报数据可能需要不断更新或者变换,在本公开中,可根据需要来执行播报数据的组织和存储,后面将参照图5A来具体描述。
本公开不仅可以支持顺序播报,还可以支持指定播报。图4示出了根据本公开的另一个示例性实施例的支持指定播报的图像文本播报方法的流程图。
如图4所示,首先,在步骤S1,接收指定播报指示。
所述指定播报指示用于表明用户需要播报装置进行指定数据的播报,例如指定文本单元(例如前几行、后几行等)的播报。
在步骤S2,响应于接收到的所述指定播报指示,确定关于播报数据的当前播报进度。
在用户想要播报装置进行指定播报(例如播报前1行)时,有可能播报装置处于正在播报的过程中。因此,为了确定用户想要的指定数据的位置,需要先确定播报装置当前播报的进度。
在步骤S3,根据所述当前播报进度和所述指定播报指示,从第一文本获取下一条播报数据。
基于当前播报进度和指定播报指示,可以确定下一条播报数据的起始位置,由此,可以从用于存储文本数据的第一文本获取用户想要的播报数据,即下一条播报数据。
其中,所述第一文本可以是由针对图像的文本区域中的文本所识别并存储的文本数据组成的。
由此,本公开可以实现对于指定播报的支持。
下面,将对于本公开支持的顺序播报与指定播报过程进行更详细的描述。
根据一些实施例,如图5A所示,步骤S103中的所述在第三存储空间中存储播报数据可以包括步骤S1031、S1032。
在步骤S1031,在第三存储空间中存储着当前播报数据的情况下,在顺序播报模式中,将新识别的文本行的文本数据存储至第三存储空间作为下一条播报数据的至少一部分。
本步骤涉及顺序播报用的播报数据的存储。这里所述的“当前播报数据”,是紧挨在所述“下一条播报数据”之前的那条播报数据。
顺序播报包括按行依次播报。在按行识别和存储文本区域中的每个文本行的情况下,所识别的文本数据存储在所述第一存储空间中,组成第一文本。那么,在顺序播报的情况下,可以使得每个新识别和存储进第一文本的文本数据都同样存储进第三存储空间,由播报装置自行获取。由于通常播报速度比字符识别速度慢,所以每次可以获取至少一个文本行的文本数据作为一条播报数据,在上一条播报数据播报完毕之后或者即将播报完毕,播报装置继续从第三存储空间获取播报数据,由此可以方便及时获取与播报数据。
根据一些实施例,也可以使得每个新识别的文本行的文本数据先在第一文本中存储,再视情况从第一文本依次分批获取而存储到第三存储空间,并且存储到第三存储空间的每批文本数据可以作为一条播报数据,以供播报。
无论是随着逐行识别而依次存储每个新识别的文本行的文本数据还是分批存储新识别的文本行的文本数据,都是随着识别的进行而将新识别的文本行的文本数据存储进第三存储空间以便形成下一条播报数据,从而在顺序播报模式中,保持播报数据的更新。
根据一些实施例,响应于每次在第三存储空间中执行存储,构建和/或更新存储在所述第二存储空间中的针对所述下一条播报数据的所述关联信息。
每次在第三存储空间中执行存储,便使得存储的播报数据发生变化。由上可知,新存储的文本数据将要作为下一条播报数据的一部分或者作为整个播报数据,因此其不影响当前播报数据,但是影响下一条播报数据。
由于播报装置的播报状态通常由播报的进展来决定,比如,当前播报数据是否播报完毕,何时播报下一条播报数据等等。因此,对于第三存储空间中存储的播报数据而言,如果播报装置主动发起下一条播报数据请求并且响应于下一条播报数据请求才开始准备从第一文本获取下一条播报数据,那么已识别的且尚未播报的文本数据可全部作为下一条播报数据存储进第三存储空间,响应于此,可以构建针对下一条播报数据的关联信息,而且无需对下一条播报数据的关联信息进行更新,即,下一条播报数据的存储与其关联信息的构建只需执行一次,所以非常节省处理资源。但是这种情况可能导致播报的速度慢,因为需要等待下一条播报数据的即时获取与存储、以及关联信息的即时构建。
对于每新识别一个文本行便将其文本数据存储进第三存储空间的方式,在播报装置来获取下一条播报数据之前,需要不断地执行每个新识别的文本行的文本数据的存储,而且,还需要响应于每次这样的存储,更新针对作为下一条播报数据的一部分或者整个下一条播报数据(因为不确定播报装置何时来获取下一条播报数据)的关联信息。这种方式比前一种方式需要的处理资源多,但是能够加快播报速度,因为不需要像前一种方式那样等待播报数据的即时获取与存储、以及关联信息的即时构建。
前面结合图3A、图3B和图3C描述过关联信息的构建,稍后将参考图7A、7B、7C、7D举例说明关联信息的更新。
根据一些实施例,可以将对于所述文本区域的待识别首个文本行进行字符识别得到的文本数据单独作为一条播报数据,以便能够快速播报首个文本行的文本数据,从而提高播报的响应速度,提升用户体验。如图5B所示,该文本区域的首个文本行被单独存储作为一条播报数据。
这里,上述的“待识别首个文本行”可以是整个文本区域的第1行文字,也可以不是整个文本区域的第1行文字,而是整个文本区域中的一部分行(全部行中的一部分)中的待识别第1行文字。
将识别的首行数据单独作为一条播报数据进行播报,可以无需等待后续文本行的识别和存储,从而大大减少了播报等待时间,有效提升了播报速度,尤其感兴趣的第1行文字的播报速度非常有助于表现播报设备的性能。
根据一些实施例,如图6所示,根据本公开的图像文本播报方法还可以包括:步骤S110,用于判断文本区域中是否还有下一个待识别的文本行,如果有,则转到步骤S101,对于该下一个待识别的文本行进行步骤S101的字符识别操作,并继续依次进行步骤S102的操作,如此循环,直到文本区域中的待识别文本行全部识别并存储进第一存储空间。如果已经没有下一个待识别的文本行,则可以结束对于该文本区域的字符识别。
步骤S110可以在步骤S102之后执行,也可以在步骤S103或步骤S104之后执行。在本公开中,对于步骤的执行顺序没有限制,只要能够完成所需的功能即可。
为了便于理解,图5B给出了一个依次识别、存储、顺序播报的过程示例。
本公开可以例如在一边按照顺序识别和存储文本数据的情况下,一边按照存储的顺序来播报当前存储的文本数据,由此可实现对于文本区域中文字的顺序播报。比如,如果一个文本区域中的文字是按行排列的,则可以在按行来检测和识别文本数据、并且依次存储所识别的每行文本数据的同时,按照存储的顺序来播报出当前存储的各行文本数据即可。
为了简化描述,在图5B中,与图2不同,假定该文本区域仅有示出的这5个文本行。逐行识别这5个文本行,并且将得到的文本数据按行存储在第一存储空间中,依次组成第一文本的每行数据。而且,在第三存储空间中存储播报数据(图5B中示出了3条播报数据),以供进行顺序播报。图5B中没有示出第二存储空间以及其中存储的关联信息,因为第二存储空间以及其中的关联信息是用于在第一存储空间与第三存储空间之间建立位置对应的关联关系,并不直接用于播报。
下面将参照图5B来具体描述该示例。例如,对于图5B中所示的文本区域,可以在针对首个文本行进行识别并将识别的该行的文本数据存储为第1条播报数据之后,便立刻开始播报该第1条播报数据,以缩短播报等待时间,提升播报速度;并且,在针对首个文本行进行识别和存储之后,也就是在播报首个文本行的文本数据的同时,继续针对后续文本行进行识别和存储,由此实现一边识别和存储,一边播报的有益技术效果。
在首个文本行的文本数据播报完成之后,继续播报第2条播报数据。假设在播报首个文本行的文本数据的过程中,经历了第2行与第3行文字的识别与存储,则第2条播报数据包含第2行文字与第3行文字,由此使得第2条播报数据的播报具有语义衔接和上下文语境,克服了现有技术中逐字或逐行播报时产生的生硬且机械的间隔或卡顿。
并且,在播报第2条播报数据的过程中,仍然在针对后续文本行进行识别和存储操作,由此实现一边识别和存储,一边播报的有益技术效果。
在播报第2条播报数据之后,可以继续播报第3条播报数据(比如包含第4行、第5行文字)。如此循环,直到将整个文本区域中的文本行播报完毕,由此完成对该文本区域的顺序播报过程。
在本公开中,采用了文本数据拼接存储的方式,以方便多行播报。因为在识别和存储速度比播报速度快的情况下,在播报1行期间,可能已经识别与存储了多行,所以,采用一边识别与存储、一边播报的方式,存储的文本数据足够播报装置使用,无需像现有技术中那样需要等待全部文本行识别与存储完毕之后才能播报,所以能够大大减少播 报等待时间,提升播报速度和效率,并且实现更连贯流畅的播报。本公开的方法和相关设备可以帮助例如视障用户、低龄或老年用户、阅读障碍用户等更容易听懂与理解例如阅读辅助设备从文本区域中自动播报的信息。
需注意,尽管图5B中示出了全部3条播报数据,但是如前所述,每条播报数据可以在播报完毕之后去除或者被覆盖,也可以较长时间地保留。
本公开中,在第三存储空间中存储批量的文本数据(至少一行数据)作为播报数据,能够克服现有技术中逐字播报或者单行播报导致的语义缺乏衔接、不连贯、出现过多的卡顿等问题。因为批量(将至少一行文本数据拼接起来)的播报数据可以使得语义之间的衔接更多、播报更连贯更流畅,大大减少卡顿现象,提升播报效果。
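上述“一边识别与存储、一边播报”的拼接缓冲过程可以用如下草图示意（仅为示意性草图：类名与方法名均为假设；新识别的文本行累积到第一文本中，首行单独作为一条播报数据以便立即开始播报，之后每次请求时将已识别但尚未播报的各行拼接为一条播报数据）。

```python
from typing import Optional


class BroadcastBuffer:
    """Splices newly recognized text lines into broadcast items (a sketch)."""

    def __init__(self) -> None:
        self.first_text: list[str] = []  # first storage space: one entry per text line
        self.next_start = 0              # index of the first line not yet handed out

    def add_recognized_line(self, line: str) -> None:
        """Store a newly recognized text line as one line of the first text."""
        self.first_text.append(line)

    def next_broadcast_item(self) -> Optional[str]:
        """Return all recognized but not-yet-broadcast lines as one item."""
        if self.next_start >= len(self.first_text):
            return None  # nothing new recognized yet ("please wait")
        # The very first line is emitted alone to shorten the initial wait.
        end = 1 if self.next_start == 0 else len(self.first_text)
        item = "".join(self.first_text[self.next_start:end])
        self.next_start = end
        return item
```

按此草图，识别出第1行后立即可取得第1条播报数据；在其播报期间识别出的第2、3行将在下一次请求时被拼接为第2条播报数据。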
以下将参照图7A、图7B、图7C、图7D,结合具体的例子来说明在逐行识别的文本行的文本数据依次存储到第三存储空间的情况下,要相应地在第二存储空间中存储的对应的关联信息的建立、存储与更新。对于批量存储到第三存储空间的情况,可以一次性计算得到相应的关联信息,而无需多次更新,相对而言计算和处理上更简单,因此,将不再单独举例说明批量存储到第三存储空间的情况下关于关联信息的计算的更多细节。
如上所述,在第二存储空间中存储关联信息,另外,可以(但不是必要的)在第二存储空间中存储与播报数据对应的所述相应数据。为了方便描述和理解,在本例中播报数据与所述相应数据相同,所以下面为了更加直观地理解,在描述截止占比的计算和更新时直接针对播报数据(虽然播报数据不一定按行存储,但是由于这里播报数据与相应数据相同,因此可以认为相应数据中的一行数据即播报数据中的“一行数据”)来说明。另外,为了方便描述和容易理解起见,在图7A、图7B、图7C、图7D中均示出了各自的文本数据(其构成所述相应数据的一部分或者全部),以便与关联信息对照查看,但是如上所述,这些文本数据不是必须存储在第二存储空间中,因此,在图中用虚线框来表示其不是必要的。
顺便提及,根据一些实施例,也可以在第一存储空间中存储例如位置与截止占比之类的关联信息,方便信息的收集与管理。但是需要注意,在第一存储空间中存储的第一文本相关的关联信息与第二存储空间中存储的关联信息不一定相同,因为第二存储空间中的关联信息往往是针对第一文本中的一部分数据进行计算得到的,而第一存储空间中如果存储关联信息,其往往需要针对整个文本区域中逐渐识别出的、最终由所有文本行的文本数据构成的第一文本来计算得到(也可以随着文本行的识别而实时计算、更新,最终得到针对整个第一文本的关联信息)。当然,在第一存储空间中不存储所述关联信息也是可以的。
在本例中虽然主要涉及第二存储空间中的关于关联信息的操作,在必要时也会描述第一和第三存储空间中的相应操作。
在这里举出的例子中,首先,将文本区域的第1行文字进行字符识别并存储在第一存储空间中(未示出),并将其作为一条播报数据(第1条播报数据)单独存储至第三存储空间以供例如顺序播报时播报(即,无需等待与其他行文字进行拼接存储),如图7A所示。
并且,还将对于该文本区域的第1行文字进行识别得到的文本数据(不是必要的)与包括位置(本例子中是行号)和截止占比的关联信息存储在第二存储空间中,存储的信息例如如下:
[“肇观”,即“开启视觉”。]、[001]、[100%]。
如上所述,对于最新识别的文本行,可以直接确定其截止占比为100%,换言之,也可以不计算而直接确定其截止占比。
如上所述,该行的文本数据可以单独存储为一个播报数据,能够提高播报的响应速度。当然,也可以与后续的文本行的文本数据存储为一个播报数据。
此时,可以开始播报第1条播报数据。
然后,在第1条播报数据的播报期间,可以在步骤S110判断是否有下一个待识别文本行。在判断有下一个待识别文本行(例如是第2文本行)的情况下,则转到步骤S101,继续识别第2行文字,如图7B所示,并将识别的文本数据及其相关信息分别存储进第一存储空间、第二存储空间与第三存储空间,此时在第二存储空间中存储的第2行文字的文本数据与关联信息可以为:
[肇观电子致力于计算机视觉处]、[002]、[100%]。
由于第1行文字对应的第1条播报数据正在播报,所以现在是在准备第2条播报数据,至此,其中包含了第2行文字的文本数据。
接着,在步骤S110判断是否有下一个待识别文本行。在判断有下一个待识别文本行(例如是第3文本行)的情况下,转到步骤S101,继续识别第3行文字,并将识别的文本数据及其相关信息分别存储进第一存储空间、第二存储空间与第三存储空间。
由于第1行文字对应的第1条播报数据正在播报,所以现在仍然是在准备第2条播报数据,至此,第2条播报数据中包含第2、3文本行的文本数据,现在需要更新第二存储空间中存储的它们的截止占比。此时,第2条播报数据中在文本区域中的位置在第2行的文本数据的截止占比从之前的100%更新为“第2行文字的字符数量/(第2行+第3行文字的字符数量)”,即26/56=46%,另外,第3行文字的截止占比为100%。
此时,如图7C所示,第二存储空间中存储的针对下一条播报数据的文本数据和关联信息如下:
[肇观电子致力于计算机视觉处]、[002]、[46%]
[理器和人工智能应用产品的创新和]、[003]、[100%]。
接着，在步骤S110判断是否有下一个待识别文本行。在判断有下一个待识别文本行例如是第4文本行的情况下，转到步骤S101，继续识别第4行文字，并将识别的文本数据及其相关信息存储进第一、第二、第三存储空间。
然后,更新第二存储空间中针对下一条播报数据的关联信息,首先计算针对下一条播报数据的总字符数量,即文本区域中第2行、第3行、第4行文字的字符数量,即26+30+25=81,及其相应的截止占比。此时,可以将第2行文字的截止占比更新为(第2行文字的字符数量/针对下一条播报数据的总字符数量),即26/81=32%,可以将第3行文字的截止占比更新为((第2行文字的字符数量+第3行文字的字符数量)/针对下一条播报数据的总字符数量),即(26+30)/81=69%,并且,可以直接将新识别的第4行文字的截止占比确定为100%。
此时,如图7D所示,第二存储空间中存储的针对下一条播报数据的文本数据和关联信息如下:
[肇观电子致力于计算机视觉处]、[002]、[32%]
[理器和人工智能应用产品的创新和]、[003]、[69%]
[研发,为机器人、无人机、无人]、[004]、[100%]。
此时,第1条播报数据播报完毕,则可以将上述的第2、3、4行文字的文本数据共同作为第2条播报数据继续进行播报(顺序播报)。
在顺序播报第2条播报数据的过程中,还可以接着判断是否有下一个待识别文本行。在判断有下一个待识别文本行例如是第5文本行的情况下,继续准备下一个播报数据(第3条播报数据)。
上面针对较为复杂的“响应于每次在第三存储空间中执行存储,更新存储在所述第二存储空间中的针对所述下一条播报数据的所述关联信息”结合一个实际的例子进行了详细的描述。
对于文本数据批量存储进第三存储空间作为下一条播报数据的情况,只需要计算一次针对该下一条播报数据的关联信息并存储到第二存储空间中即可,计算的方式与图3A~图3C、以及图7C和图7D相关的计算方式类似,在此不再赘述。
另外,关于上述的截止占比的计算与更新,也可以响应于播报装置发起获取下一条播报数据,实时计算该下一条播报数据的截止占比,这样只需要计算一次,不需要每次存储进一个文本行的文本数据便进行更新。
顺序播报时也进行关联信息的计算和/或更新的原因在于,用户发起的指定播报操作往往是不可预料的,可能在顺序播报的过程中随时发生,也可能在指定播报的过程中随时发生,所以可在每一次播报时,为识别当前播报位置准备所需的关联信息。
上面描述了图像中包含一个文本区域的示例情况,对于一个图像中包含多个文本区域的情况,可以分别针对每个文本区域进行上述的识别与存储操作,直到将该文本区域中的所有文本行或者感兴趣的那些文本行识别和存储完成。
根据一些实施例,在一个图像中包含多个文本区域时,可以将多个文本区域的文本数据存储在一起,也可以将它们分开存储,这些都不影响本公开的实质。
通过设置第二存储空间,利用第二存储空间中的针对播报数据的关联信息在第三存储空间的播报数据与第一存储空间中的相应数据在位置上建立对应关系,本公开能够使得在文字的识别和存储与播报并行地进行的情况下,支持例如TTS播报的指定播报。而无需像现有技术那样等到全部文本行都识别和存储完毕才能开始播报,而是可以一边识别与存储,一边播报,播报不影响文本数据的识别和存储,由此大大提升了播报速度,实现高效快速播报,而且通过支持无需再重新进行文本识别与存储的指定播报,大大节约了处理时间和处理资源,并且极大提升了播报速度(完全不用再重新识别)。
根据一些实施例,如图5A所示,在步骤S1032,响应于接收到指定播报指示,从第一文本获取下一条播报数据(步骤S2与S3的结合)。
本步骤涉及指定播报用的播报数据。如上所述,在播报的过程中,用户(例如视障和听障用户)可能需要重听之前的某些内容。本公开能够支持这个功能,如上所述,本公开中将该功能称为“指定播报”功能。这里,指定播报不仅可以包括前进后退播报,也可以包括例如用户指定位置的播报。
如前所述,在接收到指定播报指示的情况下,需要为指定播报准备相应的播报数据,也可称为“指定播报数据”。由此,在本步骤,响应于接收到指定播报指示,可以从第一文本获取播报数据(指定播报数据)。
如上所述,第一文本中的每行数据与所述文本区域中的每个文本行相对应。随着字符识别与存储的推进,第一文本中最终将包含整个文本区域的文本数据。因此,无论是针对该文本区域的顺序播报数据还是指定播报数据,都可以从第一文本中获得。当然,也可以如前所述的那样,顺序播报时直接将新识别得到的文本数据存储进第三存储空间,而不是从第一文本中获取。
通过本步骤,能够支持实现例如面向TTS播报之类的指定播报功能。
According to some embodiments, the current broadcast progress may be determined as the ratio of the number of characters already broadcast to the character count of the broadcast data.
For example, the piece of broadcast data containing 3 lines shown in Fig. 3B has 81 characters in total. Assuming broadcasting has reached the characters "人工智能", 40 characters have been broadcast, so the current broadcast progress can be determined as 40/81 = 49%.
Alternatively, the current broadcast progress may be determined from other parameters; the present disclosure is not limited to computing the current broadcast progress from character counts.
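The progress rule above amounts to a one-line computation (an illustrative sketch; the function name is hypothetical):

```python
def current_progress(chars_broadcast: int, total_chars: int) -> int:
    """Current broadcast progress: characters already broadcast as a
    percentage of the character count of the broadcast data."""
    return round(chars_broadcast / total_chars * 100)
```

For the example above, `current_progress(40, 81)` gives 49.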
根据一些实施例,所述响应于接收到指定播报指示,从第一文本获取播报数据可以包括步骤S10301、S10302、S10303、S10304,如图8所示。
在步骤S10301,响应于接收到指定播报指示,确定所述播报数据中的当前已播报位置作为当前播报进度。
如前所述,由于在播报的过程中,用户(例如视障和听障用户)可能需要重听之前的内容,例如重听当前播报位置的前一段或前一行等。
通常,在播报过程中接收到指定播报指示。在接收到指定播报指示的情况下,可以通过所述播报数据中当前已播报位置来确定当前播报进度。
在步骤S10302,基于当前播报进度与第二存储空间中的针对所述播报数据的关联信息,确定与当前播报进度对应的所述第一文本中的位置作为当前播报位置。
在确定当前播报进度之后,需要进一步确定与当前播报进度对应的所述第一文本中的位置,即当前播报位置。此时,可以基于得到的当前播报进度,结合第二存储空间中的针对所述播报数据的关联信息,例如播报数据中的每行数据在第一文本中的位置、以及该行数据的截止占比,如图3B所示,来获得当前播报位置。
According to some embodiments, determining, in step S10302, the position in the first text corresponding to the current broadcast progress as the current broadcast position, based on the current broadcast progress and the association information for the broadcast data in the second storage space, includes:
comparing the current broadcast progress with the cutoff ratios of the broadcast data stored in the second storage space, and determining as the current broadcast position the position, in the first text, stored in the second storage space for the line of the broadcast data whose cutoff ratio is the smallest of those cutoff ratios that are greater than the current broadcast progress.
As an example, the current broadcast progress may first be compared with the cutoff ratios of the broadcast data stored in the second storage space. For instance, with the broadcast progress of 49% obtained above and per-line cutoff ratios of 32%, 69%, and 100% in the association information stored for this piece of broadcast data, it can be determined that the currently broadcast data "人工智能" lies in the line whose cutoff ratio is 69% (the smallest cutoff ratio among those greater than the current broadcast progress). Looking up the association information shows that the line with a 69% cutoff ratio is line 3 of the first text; therefore, the current broadcast position is line 3 of the text region corresponding to the first text.
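The lookup described above, i.e. taking the line with the smallest cutoff ratio strictly greater than the current progress, can be sketched as follows (association records are represented here as (line number, cutoff ratio) pairs; this representation is an assumption):

```python
def current_position(progress: int, assoc: list[tuple[int, int]]) -> int:
    """Return the first-text line number of the line currently being broadcast.

    assoc holds one (line number in first text, cutoff ratio %) pair per
    line of the broadcast data, with cutoff ratios in increasing order.
    """
    for line_no, cutoff in assoc:
        if cutoff > progress:
            return line_no
    return assoc[-1][0]  # progress at 100%: still within the last line
```

With a progress of 49 and records [(2, 32), (3, 69), (4, 100)], this returns line 3, matching the worked example.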
根据一些实施例,指定播报指示可以包括指定播报请求和指定播报类型,如图9所示。
根据一些实施例,所述指定播报类型可以包括指定播报前一行、指定播报后一行、指定播报前一段、指定播报后一段,甚至可以包括指定播报前一句、指定播报后一句等,以及甚至可以包括指定播报某个片段。
由于在步骤S10302得到的仅是当前播报位置,想要为指定播报准备播报数据,需要知道指定播报的起始位置,即所述待播报位置。
因此,在步骤S10303,基于所述当前播报位置与所述指定播报指示中的指定播报类型,在所述第一文本中确定待播报位置。
在本步骤,通过由所述指定播报类型所决定的当前播报位置与待播报位置的关系,基于当前播报位置,得到待播报位置。例如,如果指定播报类型为播报前一行,则当前播报位置与待播报位置的关系为:待播报位置=当前播报位置-1。再例如,如果指定播报类型为播报后一行,则当前播报位置与待播报位置的关系为:待播报位置=当前播报位置+1。关于这一点,稍后将通过例子来更加具体地描述。
在步骤S10304,以所述待播报位置作为起始位置,从第一文本获取下一条播报数据存储至第三存储空间,并在第二存储空间中相应地存储针对所述下一播报数据的关联信息。
在通过本步骤得到当前播报的数据在第一文本中所处的位置即待播报位置之后,可以为指定播报准备从该待播报位置开始的播报数据(即下一条播报数据)。该下一条播报数据可以包含从该待播报位置开始直到第一文本的最后一行,也可以包含该段中从待播报位置开始直到该段的最后一行,或者为第一文本中的从待播报位置开始的小于某个阈值行数的若干行数据。比如如果阈值行数为4,则可以准备3行数据,这3行数据为从待播报位置所在的行开始的3行数据。假设待播报位置为第2行、阈值行数为4的情况下,则可以确定要准备的下一条播报数据为从第2行开始的3行数据,即,第一文本中的第2行、第3行、第4行数据。
在从第一文本获取下一条播报数据之后,将其存储至第三存储空间,以代替之前的所述播报数据进行播报,从而支持指定播报。另外,所述下一条播报数据也可以存储至第二存储空间,代替之前的播报数据,如图3C所示的那样,以方便获知当前播报数据的准确信息。
而且,需要计算并在第二存储空间中存储针对所述下一条播报数据的关联信息,即,针对该下一条播报数据的位置与截止占比信息,如图3B所示的那样。新存储的该下一条播报数据的关联信息可以代替之前的所述播报数据的关联信息,或者可以是以增加的方式存储在第二存储空间中。如果是以增加的方式存储,那么之前的所述播报数据的关联信息的状态需要发生改变,例如,可以通过设置状态标识来明确关联信息的状态。比如,可以使得状态标识为“00”表示与该关联信息对应的第三存储空间中的播报数据为“待播报状态”,状态标识为“01”表示其状态为“正在播报”,状态标识为“10”表示其状态为“已播报”等等。由此,方便通过状态标识来识别所需的关联信息。
由此,通过上述的步骤,使得本公开能够支持例如TTS播报之类的尤其面向视障或听障人士的指定播报功能,提升用户的阅读体验。
根据一些实施例,在所述指定播报指示中的指定播报类型可以包括播报相邻文本单元。
其中,所述相邻文本单元为与当前文本行所在的文本单元相邻的文本单元。
这里,一个文本单元可以为1个文本行或1个文本段,也可以为若干文本行或者若干文本段。由此,本公开能够支持的指定播报操作可以包括播报与当前文本行所在的文本单元相邻的文本单元。因此,其相邻的文本单元可以为1行或1段,也可以为若干行或者若干段。也就是说,本公开可以支持针对文本单元的指定播报。
其中,所述相邻的文本单元可以包括紧接在当前播报的文本行之前的若干行、紧接在当前播报的文本行之后的若干行、紧接在当前播报的文本行所在的段之前的若干段、或者紧接在当前播报的文本行所在的段之后的若干段。
根据一些实施例,在所述播报相邻文本单元包括播报前一行、存储在第二存储空间中的所述播报数据中的每行数据在所述第一文本中的位置包括该行数据所对应的文本行的行号的情况下,如图10所示,步骤S103中的所述响应于接收到指定播报指示,从第一文本获取下一条播报数据(对应步骤S103中的步骤S1032)包括如下的步骤。
在步骤S10311,响应于接收到指定播报指示,确定所述播报数据中的当前已播报位置作为当前播报进度。
本步骤与前述的步骤S10301类似,在此不再赘述。
在步骤S10312,基于当前播报进度和存储在所述第二存储空间中的针对所述播报数据的所述关联信息,确定所述播报数据中的与当前播报进度对应的那行数据在所述第一文本中所对应的文本行的行号作为当前播报行号。
本步骤是用于确定作为当前播报位置的当前播报行号,与步骤S10302类似,通过当前播报进度与第二存储空间中记载的所述关联信息中的行号(如上所述,所述关联信息中的位置包括行号),可以确定当前播报行号。
在步骤S10313,基于所述播报前一行的指定播报类型,将所述当前播报行号减1作为待播报行号。
由于指定播报指示中的指定播报类型是播报前一行,因此,可以确定待播报行号为当前播报行号减1。
在步骤S10314,以第一文本中所述待播报行号所在的行作为起始位置,获取至少一行文本数据作为下一条播报数据。
以下将举例进行说明。假设指定播报指示中的指定播报类型为播报前一行,则如前面的例子所述,在作为当前播报位置的当前播报行号为第一文本中的第3行(即该文本区域的第3文本行)的情况下,指定播报的起始位置应为第一文本的第2行,那么可以获取从该第2行起的若干行数据作为下一条播报数据。关于下一条播报数据中具体可以包含多少行数据,在上面已经详细描述过,在此不再赘述。
通过上述的步骤S10311~S10314,本公开能够支持指定播报前一行,克服了现有技术中类似TTS播报之类的播报不能支持前进后退的缺陷。
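步骤S10311~S10314可以合并为如下草图（仅为示意：关联信息的(行号, 截止占比)表示、1起始的行号约定、以及下一条播报数据最多取3行均为基于正文示例的假设）。

```python
def previous_line_item(progress: int,
                       assoc: list[tuple[int, int]],
                       first_text: list[str],
                       max_lines: int = 3) -> str:
    """Build the next broadcast item for a 'broadcast previous line' request."""
    # S10312: current line = smallest cutoff ratio strictly above the progress.
    current_line = next(line for line, cutoff in assoc if cutoff > progress)
    # S10313: the to-be-broadcast line is one before the current line.
    start = max(current_line - 1, 1)  # line numbers are 1-based
    # S10314: splice up to max_lines lines starting at the to-be-broadcast line.
    return "".join(first_text[start - 1:start - 1 + max_lines])
```

例如，当前播报进度为49%、关联信息为[(2, 32), (3, 69), (4, 100)]时，当前播报行号为第3行，待播报行号为第2行，草图返回从第2行起的3行数据。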
根据一些实施例,在播报相邻文本单元包括播报后一行、存储在所述第二存储空间中的所述播报数据中的每行数据在所述第一文本中的位置包括该行数据的行号的情况下,如图11所示,步骤S103中的所述响应于接收到指定播报指示,从第一文本获取下一条播报数据(对应步骤S103中的步骤S1032)可以包括如下步骤。
在步骤S10321,响应于接收到指定播报指示,确定所述播报数据中的当前已播报位置作为当前播报进度。
在步骤S10322,基于当前播报进度和存储在所述第二存储空间中的针对所述播报数据的所述关联信息,确定所述播报数据中的与当前播报进度对应的那行数据在所述第一文本中所对应的文本行的行号作为当前播报行号。
在步骤S10323,基于所述播报后一行的指定播报类型,将所述当前播报行号加1作为待播报行号。
在步骤S10324,以第一文本中所述待播报行号所在的行作为起始位置,获取至少一行文本数据作为下一条播报数据。
上述的步骤S10321~S10324与前述的步骤S10311~S10314类似,在此不再赘述。
以下将举例进行说明。假设指定播报指示中的指定播报类型为播报后一行,则如前面的例子所述,在作为当前播报位置的当前播报行号为第一文本中的第3行(即该文本区域的第3文本行)的情况下,指定播报的起始位置应为第一文本的第4行,那么可以获取从该第4行起的若干行数据作为下一条播报数据。
通过上述的步骤S10321~S10324,本公开能够支持指定播报后一行,克服了现有技术中类似TTS播报之类的播报不能支持前进后退的缺陷。
这里,在不涉及到对段的指定播报的情况下,在准备下一条播报数据时可以不考虑播报中的每行数据是否在同一段,以保持播报的连贯性和流畅性。当然,也可以根据实际需求来准备下一条播报数据的多少或者长短,本公开无需对此加以限制。
根据一些实施例,在所述播报相邻文本单元包括播报前一段、存储在所述第二存储空间中的所述播报数据中的每行数据在所述第一文本中的位置包括该行数据的段号的情况下,如图12所示,步骤S103中的所述响应于接收到指定播报指示,从第一文本获取下一条播报数据(对应步骤S103中的步骤S1032)可以包括步骤S10331~S10334。
这里,可以使用例如“[00*]”之类的字符来表示段号。比如,“[001]”可以表示文本区域的第1段。或者,也可以使用其他方式来表示段号,比如“#00*”之类。甚至,在既包括行号又包括段号的情况下,可以使用“00*00*”来表示“段号+行号”,即,前面的“00*”是段号,后面的“00*”是行号。可以理解的是,本公开不局限于这种使用特殊字符来表示段号的方式,而是也可以使用其他的方式来表示,行号也是类似。只要能够识别和区分出行号与段号,不会将两者混淆即可。
根据一些实施例,所述关联信息中有关位置的信息除了单独包含行号或者单独包含段号之外,也可以既包含行号又包含段号的信息,只要行号信息与段号信息之间能够区分即可。这样可以更方便指定播报。
在步骤S10331,响应于接收到指定播报指示,确定所述播报数据中的当前已播报位置作为当前播报进度。本步骤与前述的步骤S10301类似,在此不再赘述。
在步骤S10332,基于当前播报进度和存储在所述第二存储空间中的针对所述播报数据的所述关联信息,确定所述播报数据中的与当前播报进度对应的那行数据在所述第一文本中所对应的文本行的段号作为当前播报段号。
比如,假设前面的例子中得到的当前播报进度对应的第3文本行在所在的文本区域中是第2段,则当前播报段号为第2段。
在步骤S10333,基于所述播报前一段的指定播报类型,将所述当前播报段号减1作为待播报段号。
比如,在当前播报段号为第2段的情况下,待播报段号为第1段。
在步骤S10334,从所述第一文本中获取所述待播报段号对应的段作为下一条播报数据。
比如,在待播报段号为第1段的情况下,从所述第一文本中获取第1段作为下一条播报数据。
当然,还可以将第1段之后的一些文本数据与第一段一起作为下一条播报数据。
通过上述的步骤S10331~S10334,本公开能够支持指定播报前一段,克服了现有技术中类似TTS播报之类的播报不能支持前进后退的缺陷。
根据一些实施例,在所述播报相邻文本单元包括播报后一段、存储在所述第二存储空间中的所述播报数据中的每行数据在所述第一文本中的位置包括该行数据的段号的情况下,如图13所示,步骤S103中的所述响应于接收到指定播报指示,从第一文本获取下一条播报数据(对应步骤S103中的步骤S1032)可以包括以下步骤。
在步骤S10341,响应于接收到指定播报指示,确定所述播报数据中的当前已播报位置作为当前播报进度。
在步骤S10342,基于当前播报进度和存储在所述第二存储空间中的针对所述播报数据的所述关联信息,确定所述播报数据中的与当前播报进度对应的那行数据在所述第一文本中所对应的文本行的段号作为当前播报段号。
在步骤S10343,基于所述播报后一段的指定播报类型,将所述当前播报段号加1作为待播报段号。
在步骤S10344,从所述第一文本中获取所述待播报段号对应的段作为下一条播报数据。
上述的步骤S10341~S10344与前述的步骤S10331~S10334类似,在此不再赘述。
通过上述的步骤S10341~S10344,本公开能够支持指定播报后一段,克服了现有技术中类似TTS播报之类的播报不能支持前进后退的缺陷。
请注意,在准备好指定播报所需的下一条播报数据之后,同样需要建立和/或更新第二存储空间中的针对该下一条播报数据的关联信息,以便为下一次指定播报识别当前播报位置。
如上所述,在用户发起指定播报请求的情况下,响应于接收到指定播报请求,从第一文本获取下一条播报数据,并在第二存储空间中存储针对所述下一条播报数据的关联信息。前面结合图7A~图7D描述了顺序读取时获取下一条播报数据与存储关联信息的示例,下面将参考图14A~图14C来描述在指定播报的情况下获取下一条播报数据与存储关联信息的示例。
假设当前读取到图7D所示的第2条播报数据(当前播报数据)中的第3行时,用户发起读取上一行的指定播报请求,则根据前面的描述,由第二存储空间中存储的关于当前播报数据的关联信息确定当前读取位置为该文本区域的第4行。然后,可以确定待播报位置为该文本区域的第3行。因此,可以将该待播报位位置作为起始位置,组织下一条播报数据。
如上所述,关于下一条播报数据的组织,可以从第一文本中获取至少一行文本数据作为所述下一条播报数据。
假设如图14A所示,当前第一文本中已经存储了图3A所示的至少5行文本数据,并且这5行文本数据为该文本区域中的一段文字。则例如可以从第一文本中获取第3行(作为待播报位置的起始位置)至第5行(即待播报位置所在的该段的最后1行)的文本数据作为所述下一条播报数据存储在第三存储空间中,如图14B所示。
在确定了下一条播报数据之后,在第二存储空间中建立针对所述下一条播报数据的关联信息,建立关联信息的方法与图3A~3C所示的类似。
第一文本中的与图14B所示的下一条播报数据对应的相应数据共有3行，该相应数据的第1行是第一文本中的第3行文本数据，即“理器和人工智能应用产品的创新和”，可计算出其字符数量为30；该相应数据的第2行是第一文本中的第4行文本数据，即“研发，为机器人、无人机、无人”，可计算出其字符数量为25；该相应数据的第3行是第一文本中的第5行文本数据，即“车、安防监控等专业领域提供端到端的解决方案”，可计算出其字符数量为41，由此，该相应数据的总字符数量为(30+25+41)=96。则该相应数据中的第1行数据的截止占比为30/96=31%，第2行数据的截止占比为(30+25)/96=57%，第3行数据的截止占比为(30+25+41)/96=100%。如上所述，最后一行数据的截止占比也可以不经计算而直接赋值为100%。
由此,构建完成存储在第二存储空间中的针对所述下一条播报数据的关联信息。
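上述关联信息的构建过程可以用一个函数来示意（仅为草图：(行号, 截止占比)的记录格式与“1个汉字计2个字符”的规则均取自正文示例，函数名为假设）。

```python
def build_association_info(first_text: list[str],
                           start_line: int,
                           end_line: int) -> list[tuple[int, int]]:
    """Association info for a next broadcast item spanning first-text lines
    start_line..end_line (1-based, inclusive)."""
    def count(line: str) -> int:
        # Each CJK character counts as 2 characters (the range is an assumption).
        return sum(2 if '\u4e00' <= ch <= '\u9fff' else 1 for ch in line)

    lines = first_text[start_line - 1:end_line]
    total = sum(count(line) for line in lines)
    info, cumulative = [], 0
    for offset, line in enumerate(lines):
        cumulative += count(line)
        info.append((start_line + offset, round(cumulative / total * 100)))
    info[-1] = (info[-1][0], 100)  # the last line is 100% by definition
    return info
```

对上例中第一文本第3~5行（字符数量分别为30、25、41）调用该函数，得到截止占比31%、57%、100%。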
对于指定播报下一行、指定播报上一段或指定播报下一段的情形,与指定播报上一行的情形类似,在此不再赘述。
另外,对于下一条播报数据的组织,可以如上述的例子中那样,将待播报位置开始的本段中的文本数据均作为下一条播报数据(在第一文本中已经存储了足够的文本数据的情况下),或者也可以选择待播报位置开始的若干行作为下一条播报数据,该若干行可以在同一段(自然段),也可以不在同一段中(即,可以跨段)。
根据一些实施例,所述在第二存储空间中相应地存储针对所述下一播报数据的关联信息包括:
存储与所述下一条播报数据关于位置进行对应的所述相应数据中的每行数据在所述第一文本中的位置;以及
存储所述相应数据中的每行数据在所述相应数据中的截止占比。
根据一些实施例,所述与下一条播报数据关于位置进行对应的所述相应数据中的每行数据在所述第一文本中的位置包括该行数据的行号、或者该行数据的段号与行号。
在图14C所示的例子中,所述相应数据中的每行数据在第一文本中的位置信息为行号。这里,如上所述,该位置信息也可以包括不仅包括行号,还可以包括段号,例如图14C中的示出的三条位置信息还可以分别为[001003]、[001004]、[001005],其中,001表示第1段,003、004、005分别表示第3、4、5行。由此,[001003]、[001004]、[001005]分别表示第1段的第3行、第1段的第4行、第1段的第5行。
根据一些实施例,用户可以通过操作相应的按钮或者在触摸屏上的滑动操作等,表示想要进行指定播报以及想要进行哪种类型的指定播报。由此,可以通过检测相应的操作来进行判断,并且在检测到相应的操作的情况下,生成相应的指定播报指示。
根据一些实施例,如图15所示,上述的图像文本播报方法还可以包括:步骤S110,响应于检测到触屏上的操作,生成所述指定播报指示。即,可以针对触屏上的特定操作,生成特定的指定播报指示。
例如,根据一些实施例,所述响应于检测到触屏上的操作,生成所述指定播报指示包括:
响应于检测到触屏上的第一触屏操作,生成指定播报类型为播报前一行的指定播报指示;以及
响应于检测到触屏上的第二触屏操作,生成指定播报类型为播报后一行的指定播报指示。
再例如,根据一些实施例,所述响应于检测到触屏上的操作,生成所述指定播报指示包括:
响应于检测到触屏上的第三触屏操作,生成指定播报类型为播报前一段的指定播报指示;以及
响应于检测到触屏上的第四触屏操作,生成指定播报类型为播报后一段的指定播报指示。
其中,上述的各种触屏操作可以包括在触屏上的滑动操作,例如第一触屏操作可以例如为向左滑动操作,第二触屏操作可以例如为向右滑动操作,第三触屏操作可以例如为向上滑动操作,第四触屏操作可以例如为向下滑动操作。另外,触屏操作也可以包括在触屏上进行的点击、长按等操作。可以结合不同形式的触屏操作来设定对应的指定播报类型。当然,也可以结合触屏操作在触屏上的不同位置的发生来设定对应的指定播报类型。或者,将不同形式的触屏操作与触屏上不同的位置进行组合来设定对应的指定播报类型也是可以的。
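上述触屏操作与指定播报类型的对应关系可以用一个小映射表来示意（手势名与类型名均为假设；左/右/上/下滑动的指派遵循正文中的示例）。

```python
# Illustrative mapping of touch-screen gestures to specified-broadcast types.
GESTURE_TO_BROADCAST_TYPE = {
    "swipe_left": "previous_line",      # first touch operation
    "swipe_right": "next_line",         # second touch operation
    "swipe_up": "previous_paragraph",   # third touch operation
    "swipe_down": "next_paragraph",     # fourth touch operation
}


def make_broadcast_instruction(gesture: str) -> dict:
    """Build a specified broadcast instruction (request + type) for a gesture."""
    return {"request": "specified_broadcast",
            "type": GESTURE_TO_BROADCAST_TYPE[gesture]}
```

实际实现中也可以结合点击、长按或触屏上的不同位置来扩展该映射表。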
下面将举例来描述。例如,在滑动操作触摸屏的情况下,相应的操作及其含义例如可以如下。另外,对于前进后退的指定播报请求,第一文本中的文本数据不一定都能够符合需要,在这种情况下,可以提供提示信息。以下将具体说明。
例如,对于一个没有显示屏,但有触屏的阅读设备,用户可以在该触屏上进行滑动操作,用以表明自己的阅读意愿。下面以阅读设备的触屏的横向方向作为左右方向的参照,以该触屏的纵向方向作为上下方向的参照,来描述例示的滑动操作的含义。
●向左滑动操作,可以表示“播报前一行”
如果当前播报的文本数据已经是第一文本的前端(比如第一行)的数据,则可以向用户提示:“已经是第一行”。
●向右滑动操作,可以表示“播报后一行”
如果文本区域已经识别完并且当前播报的文本数据是第一文本的末端(也即文本区域的末端)的数据,则可以向用户提示:“已经是最后一行”。
如果尚未识别完并且当前播报的文本数据是第一文本的末端但不是文本区域的末端的数据,即,文本区域中还有一些文本数据尚未识别和存储进已存储文本,则可以向用户提示:“正在识别,请稍等”。
●向上滑动操作,可以表示“播报前一段”
如果当前播报的文本数据已经是第一文本的第一段,则可以向用户提示:“已经是第一段”。
●向下滑动操作,可以表示“播报后一段”
如果文本区域已经识别完并且当前播报的文本数据是第一文本的最后一段(也即文本区域的最后一段),则可以向用户提示:“已经是最后一段”。
如果该文本区域尚未识别完并且当前播报的文本数据是第一文本的最后一段但不是该文本区域的最后一段,则可以向用户提示:“正在识别,请稍等”。
总之,上述的提示也可以概括成“正在识别”或者“不存在指定位置”的提示。
以上例举了在滑动操作的情况下指定播报的几种例示情形,需注意本公开不限于这里例举的例示情形,而是还可以包括其他更多的指定播报操作。另外,上面例举的向左、向右、向上、向下滑动也只是示例,在实际实现时,并不必须采用这里示例的形式,而是可以采用各种替换、变型或扩展的形式。
关于上述的文本行与段,可以通过各种版面分析方法来判断文本区域的一个文本行以及文本区域的一个文本段,在此不再详述。
另外,根据一些实施例,用户(例如视障用户与听障用户等)可以通过做出相应的动作,来表示想要进行指定播报以及想要进行哪种类型的指定播报。例如,对于视障用户、以及对于听障用户(可以通过例如震动的形式来进行播报),在看不到或者看不清文本区域中的内容的情况下,可以通过手势作为想要进行指定播报的信号或通知;或者,在能够看清文本区域的内容的情况下,可以将指引物(例如手指等)放到文本区域上想要播报的位置,来作为想要进行指定播报的信号或通知。在这种情况下,可以通过使用例如摄像机等来拍摄到用户的动作,然后,可以基于摄像机拍摄到的图像,对该图像进行分析,以确定用户是否想要进行指定播报,以及想要进行哪种类型的指定播报。
鉴于用户的动作可以包括很多种,因此本公开对此不作限制。另外,用户采用动作来表示想要进行指定播报以及想要进行的指定播报的类型与上面采用用户的操作的情形类似,在此不再赘述。
本公开通过提供指定播报功能,由此能够大大提升用户(例如视障和听障用户)的阅读体验。
根据一些实施例,也可以在第一存储空间中存储第一文本中每行数据的位置,这样每次准备播报数据时,可以直接从第一存储空间中获得所存储的所需位置的信息。但是,在第一存储空间中不存储第一文本中每行数据的行位置也是可以的,因为本来第一文本中的每行数据就是与文本区域的每个文本行对应的,也就是说,第一文本自身已经具有了文本区域的相应文本行的行位置的信息。
另外,对于文本段位置,如果不在第一存储空间中存储每行数据的段位置,那么可以通过在第一文本中相应位置处设置特定的段指示标记来表示每个段,或者,也可以在第一存储空间中按照文本区域中的文本段的方式存储第一文本,使得第一文本自身也能够具有与文本区域的文本段对应的段位置的信息。
如上所述,对于播报数据的获取,可以由播报装置主动获取,也可以由处理装置从第三存储空间中获取并提供给播报装置。
根据一些实施例,还可以与位置信息和截止占比的存储类似地来存储相应文本行的字符数量,比如在第一存储空间和/或第二存储空间中存储相应文本行的字符数量。在存储了相应文本行的字符数量的情况下,能够更快速地计算出所需的截止占比和/或更快速地定位到文本数据的位置。
由此,所述关联信息除了位置和截止占比之外,还可以包括字符数量。
基于实际需求,指定播报可能发生在顺序播报进行的过程中,即,在顺序播报时,用户可能需要重听(如前所述),此时可以发起指定播报,那么指定播报可能会中断或终止正在进行中的顺序播报而开始所需的指定播报。
根据一些实施例,对于特定类型文本行,存储用于表示该文本行类型的特定类型位置标识,并且基于所述特定类型标识,在播报时向用户发出提示。
对于上述这样的特定类型的文本行,可以存储用于表示该文本行的类型的特定类型标识。在播报时,如果确定要播报的某个文本行对应一个这样的特定类型标识,便可在向用户发出相应的提示。比如,如果确定要播报的一个文本行是标题行,便可以提示用户例如“这是一个标题行”等的信息。如果确定要播报的一个文本行是模糊行,便可以提示用户例如“无法识别该行文字,请谅解”等的信息。
根据一些实施例,上述的提示可以包括声音提示、震动提示、文字提示、图像提示、视频提示中的一种或者它们的组合,以方便各种需求的用户使用。
根据一些实施例,所述特定类型文本行包括:
第一类型文本行,其中,通过文字大小来确定该第一类型文本行;以及
第二类型文本行,其中,通过文本行清晰度来确定该第二类型文本行。
例如,第一类型文本行可以是标题行、页眉、页脚等,这些行的文字大小往往与其他文本行有所不同。
另外,第二类型文本行指无法清楚地识别的文本行,即文本清晰度不高(例如低于预设的文本清晰度阈值)的文本行。
根据一些实施例,所述文本行可以沿横向、竖向、或者斜向排列。
根据一些实施例,如图16所示,本公开提供一种图像文本播报设备100,该图像文本播报设备100可以包括接收装置101、播报装置102、处理器103。
其中,所述接收装置101可以被配置为接收指定播报指示;所述播报装置102可以被配置为响应于所述指定播报指示,确定关于播报数据的当前播报进度;所述处理器103可以被配置为根据所述当前播报进度和所述指定播报指示,从第一文本获取下一条播报数据供播报装置播报。
其中,如前所述,所述第一文本由字符识别装置针对图像的文本区域中的文本行识别并存储的文本数据组成。
由此,根据本公开的示例性实施例的图像文本播报设备100可以支持指定播报。
根据一些实施例,图像文本播报设备100还可以包括字符识别装置104、至少一个存储器105。
其中,字符识别装置104可以被配置为针对图像中文本区域的一个待识别文本行进行字符识别,获得文本数据。
所述至少一个存储器105可以被配置用于:在所述至少一个存储器的第一存储空间中存储该文本行的所述文本数据,作为针对所述文本区域的第一文本中的一行数据,还可以用于在所述至少一个存储器的第三存储空间中存储播报数据;以及还可以用于在所 述至少一个存储器的第二存储空间中存储针对所述播报数据的关联信息,所述关联信息用于将第三存储空间中的播报数据与第一存储空间中的第一文本中的相应数据关于位置进行对应。
根据一些实施例,所述播报装置102可以从所述第三存储空间获取播报数据,进行关于所述文本区域的顺序播报或指定播报。
根据一些实施例,所述处理器103可以响应于接收到指定播报指示与来自播报装置102的当前播报进度,从第一存储空间中的第一文本获取下一条播报数据并存储至第三存储空间。
根据一些实施例,上述图像文本播报设备100还可以包括检测装置106,检测装置106可以被配置用于响应于检测到指定播报操作,生成所述指定播报指示,并发送给处理器。这里,检测装置可以直接是输入设备,也可以是另外的用于检测输入或者操作的检测部件。
根据一些实施例,所述指定播报操作可以包括各种触屏操作(比如前述的第一、第二、第三、第四触屏操作等)。更具体地,例如触屏上的向左滑动、触屏上的向右滑动。所述指定播报操作还可以包括:触屏上的向上滑动、触屏上的向下滑动。
根据一些实施例,针对所述播报数据的关联信息至少可以包括:与所述播报数据关于位置进行对应的所述相应数据中的每行数据在所述第一文本中的位置;以及所述相应数据中的每行数据在所述相应数据中的截止占比。
其中,所述相应数据中的每行数据在所述相应数据中的截止占比由处理器103通过从所述相应数据的起始行数据到该行数据的字符数量占整个所述相应数据的总字符数量的比例来计算确定。
根据一些实施例,所述处理器103可以响应于每次在第三存储空间中执行的存储,构建和/或更新存储在所述第二存储空间中的针对所述下一条播报数据的所述关联信息。
根据一些实施例,播报装置102可以响应于触屏上的第一触屏操作(例如向左滑动操作),播报当前播报的文本行的前一行;并且,还可以响应于触屏上的第二触屏操作(例如向右滑动操作),播报装置播报当前播报的文本行的后一行。并且,还可以响应于触屏上的第三触屏操作(例如向上滑动操作),播报装置播报当前播报的文本段的前一段;以及,还可以响应于触屏上的第四触屏操作(例如向下滑动操作),播报装置播报当前播报的文本段的后一段。
根据一些实施例,响应于指定播报操作,播报装置可以向用户发出表示“正在识别”或“不存在指定位置”的提示。
对于上述的图像文本播报设备100而言,假如用户发起了指定播报操作,那么该设备100中的检测装置106将检测到该用户的指定播报操作,随即将生成指定播报指示,发送给例如处理器103或者播报装置102。
在处理器接收到指定播报指示的情况下,解析指定播报指示,并向播报装置请求当前播报进度信息,以开始准备指定播报所需的指定播报数据。
而在播报装置102收到指定播报指示的情况下,播报装置确定当前播报进度,并可以解析或者不解析指定播报指示,然后将当前播报进度连同指定播报指示发送给处理器103,处理器103接收到指定播报指示与当前播报进度后,开始准备下一条播报数据(指定播报数据)。
准备下一条播报数据的过程与前面结合步骤S103描述的过程类似,处理器103基于当前播报进度,结合存储器105中的第二存储空间中存储的针对当前播报数据的关联信息,获得与当前播报进度对应的当前播报位置(第一文本中的位置),然后基于指定播报指示,确定待播报位置。
在处理器确定待播报位置之后,从第一文本中获取以待播报位置开始的若干行文本数据作为下一条播报数据,使得存储器105存储至第三存储空间,供播报装置102取用,以进行所需的指定播报。
另外,根据一些实施例,上述图像文本播报设备100在提供例如语音或者振动形式的播报之外,还可以提供显示功能。由此,其还可以包括显示装置,用于显示例如当前正在播报的数据或者当前播报进度(例如播报位置)等等。
图像文本播报设备中各装置和/或部件的操作与前述的图像文本播报方法中执行的各步骤类似,在此不再赘述。
根据本公开的另一方面,还提供一种电子电路,可以包括:被配置为执行上述的方法的步骤的电路。
根据本公开的另一方面,还提供一种阅读设备,包括:上述的电子电路;被配置为播报文本数据的电路。
根据一些实施例,所述阅读设备响应于用户的操作或者用户的动作,通过所述播报文本数据的电路进行顺序播报或者指定播报。
用户的操作可以指用户在例如阅读设备上进行的一些操作,例如对于开关、按钮、屏幕等等的操作。
用户的动作可以指用户通过手或头等身体部位做出的某些用于触发阅读设备进行播报的动作,例如,点一下头表示顺序播报的命令,短时间间隔内点两下头表示指定播报的命令等。
可以根据实际需求,来设计用户的操作或动作所指代的含义。另外,还可以根据实际需求,来设计上述的指示中的参数。
根据本公开的另一方面,还提供一种电子设备,包括:处理器;以及存储程序的存储器,所述程序包括指令,所述指令在由所述处理器执行时使所述电子设备执行上述的方法。
根据本公开的另一方面,还提供一种存储程序的非暂态计算机可读存储介质,所述程序包括指令,所述指令在由电子设备的处理器执行时,致使所述电子设备执行上述的方法。
图17是示出根据本公开的示例性实施例的电子设备的示例的框图。要注意的是,图17所示出的结构仅是一个示例,根据具体的实现方式,本公开的电子设备可以仅包括图17所示出的组成部分中的一种或多个。
电子设备2000例如可以是通用计算机(例如膝上型计算机、平板计算机等等各种计算机)、移动电话、个人数字助理。根据一些实施例,电子设备2000可以是阅读辅助设备(或者简称为阅读设备)。
电子设备2000可被配置为拍摄图像,对所拍摄的图像进行处理,并且响应于所述处理而提供相应的播报服务或者进行提示。例如,电子设备2000可被配置为拍摄图像,对该图像进行文字检测和识别以获得文字数据,将文字数据转换成声音数据,并且可以输出声音数据供用户聆听,和/或输出文字数据供用户在例如显示装置(例如普通显示屏或触摸显示屏等)上观看。
根据一些实施方式,所述电子设备2000可以被配置为包括眼镜架或者被配置为能够可拆卸地安装到眼镜架(例如眼镜架的镜框、连接两个镜框的连接件、镜腿或任何其他部分)上,从而能够拍摄到近似包括用户的视野的图像。
根据一些实施方式,所述电子设备2000也可被安装到其它可穿戴设备上,或者与其它可穿戴设备集成为一体。所述可穿戴设备例如可以是:头戴式设备(例如头盔或帽子等)、可佩戴在耳朵上的设备等。根据一些实施例,所述电子设备可被实施为可附接到可穿戴设备上的配件,例如可被实施为可附接到头盔或帽子上的配件等。
根据一些实施方式,所述电子设备2000也可具有其他形式。例如,电子设备2000可以是移动电话、通用计算设备(例如膝上型计算机、平板计算机等)、个人数字助理,等等。电子设备2000也可以具有底座,从而能够被安放在桌面上。
根据一些实施方式,所述电子设备2000作为阅读辅助设备(阅读设备)或者图像文本播报设备可以用于辅助阅读,在这种情况下,所述电子设备2000有时也被称为“电子阅读器”或“阅读辅助设备”。借助于电子设备2000,无法自主阅读的用户(例如视力障碍人士、存在阅读障碍的人士、听力障碍人士等)可以采用类似阅读姿势的姿势即可实现对常规读物(例如书本、杂志等)的“阅读”。在“阅读”过程中,所述电子设备2000可以获取图像,并对所述图像中的文本行进行字符识别,得到文本数据并存储得到的文本数据,以方便快速播报文本数据,并且使得播报的文本数据中有语义衔接和上下文语境,避免逐行或逐字播报引起的生硬的卡顿。而且,所述电子设备2000可以支持指定阅读,通过检测到用户在读取过程中对电子设备的操作或用户呈现出的动作,判断用户的指定阅读需要,为用户播报所述指定阅读所需的内容,从而更加方便用户的使用,大大提升用户体验。
电子设备2000可以包括摄像机2004,用于拍摄和获取图像。摄像机2004可以拍摄静态的图像,也可以拍摄动态的图像,可以包括但不限于摄像头、照相机、视频摄像机等,被配置为获取包括待识别对象的初始图像。电子设备2000还可以包括电子电路2100,所述电子电路2100包括被配置为执行如前所述的方法的步骤的电路。电子设备2100还可以包括文字识别电路2005,所述文字识别电路2005被配置为对所述图像中的文字进行文字检测和识别(例如OCR处理),从而获得文字数据。所述文字识别电路2005例如可以通过专用芯片实现。电子设备2000还可以包括声音转换电路2006,所述声音转换电路2006被配置为将所述文字数据转换成声音数据。所述声音转换电路2006例如可以通过专用芯片实现。电子设备2000还可以包括声音输出电路2007,所述声音输出电路2007被配置为输出所述声音数据。所述声音输出电路2007可以包括但不限于耳机、扬声器、或振动器等,及其相应驱动电路。
According to some implementations, the electronic device 2000 may further include an image processing circuit 2008, which may include circuits configured to perform various kinds of image processing on an image. The image processing circuit 2008 may include, for example, but is not limited to, one or more of the following: a circuit configured to denoise an image, a circuit configured to deblur an image, a circuit configured to geometrically correct an image, a circuit configured to extract features from an image, a circuit configured to perform object detection and recognition on a target object in an image, a circuit configured to perform text detection on the text contained in an image, a circuit configured to extract text lines from an image, a circuit configured to extract character coordinates from an image, and so on.
According to some implementations, the electronic circuit 2100 may further include a word processing circuit 2009, which may be configured to perform various kinds of processing based on the extracted text-related information (e.g., text data, text boxes, paragraph coordinates, text line coordinates, character coordinates, etc.), so as to obtain processing results such as paragraph ordering, semantic analysis of text, and layout analysis results.
One or more of the various circuits described above (e.g., the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, and the electronic circuit 2100) may be implemented using custom hardware, and/or in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, one or more of the circuits described above may be implemented by programming hardware (e.g., a programmable logic circuit including a field programmable gate array (FPGA) and/or a programmable logic array (PLA)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using the logic and algorithms according to the present disclosure.
According to some implementations, the electronic device 2000 may further include a communication circuit 2010, which may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
According to some implementations, the electronic device 2000 may further include an input device 2011, which may be any type of device capable of inputting information into the electronic device 2000, and may include, but is not limited to, various sensors, a mouse, a keyboard, a touch screen, buttons, a joystick, a microphone, and/or a remote controller, etc.
According to some implementations, the electronic device 2000 may further include an output device 2012, which may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer, etc. Although according to some embodiments the electronic device 2000 is used as a reading assistance device, a vision-based output device can make it convenient for the user's family members or maintenance personnel, etc., to obtain output information from the electronic device 2000.
According to some implementations, the electronic device 2000 may further include a processor 2001. The processor 2001 may be any type of processor and may include, but is not limited to, one or more general-purpose processors and/or one or more dedicated processors (e.g., special processing chips). The processor 2001 may be, for example, but is not limited to, a central processing unit (CPU) or a microprocessor unit (MPU). The electronic device 2000 may further include a working memory 2002, which may store programs (including instructions) and/or data (e.g., images, text, sound, and other intermediate data) useful to the work of the processor 2001, and may include, but is not limited to, a random access memory and/or a read-only memory device. The electronic device 2000 may further include a storage device 2003, which may include any non-transitory storage device, i.e., any storage device that is non-transitory and capable of storing data, and may include, but is not limited to, a disk drive, an optical storage device, solid-state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape, or any other magnetic medium; an optical disc or any other optical medium; a ROM (read-only memory), a RAM (random access memory), a cache memory, and/or any other memory chip or cartridge; and/or any other medium from which a computer can read data, instructions, and/or code. The working memory 2002 and the storage device 2003 may be collectively referred to as "memory", and in some cases each may serve as the other. The memory may store, among other things, the aforementioned first text stored in the first storage space, the association information stored in the second storage space (together with the related data corresponding to the broadcast data), and the broadcast data stored in the third storage space. As stated above, the present disclosure does not limit whether the first storage space, the second storage space, and the third storage space reside in the same storage device, as long as the required functions can be achieved.
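As a rough illustration of how the three storage spaces could cooperate, the sketch below models the first space as a list of recognized lines (the first text), the third space as the current broadcast data, and the second space as association information holding each line's position together with its cumulative ("cutoff") character ratio, which lets a broadcast progress be mapped back to a line of the first text. All class, method, and variable names here are hypothetical; this is a sketch under assumed data layouts, not the claimed implementation:

```python
# Hypothetical model of the three storage spaces and the association info
# that links broadcast data back to line positions in the first text.

class BroadcastStore:
    def __init__(self):
        self.first_text = []        # first storage space: one recognized line per entry
        self.broadcast_data = ""    # third storage space: current broadcast data
        self.association = []       # second storage space: (line_number, cutoff_ratio)

    def add_line(self, line_text):
        """Store a newly recognized text line as a line of the first text."""
        self.first_text.append(line_text)
        return len(self.first_text) - 1  # its line number in the first text

    def set_broadcast(self, line_numbers):
        """Assemble broadcast data from the given first-text lines and build
        the association info: each line's position plus its cumulative
        ("cutoff") character ratio within the broadcast data."""
        lines = [self.first_text[n] for n in line_numbers]
        self.broadcast_data = "".join(lines)
        total = len(self.broadcast_data)
        self.association = []
        cumulative = 0
        for n, text in zip(line_numbers, lines):
            cumulative += len(text)
            self.association.append((n, cumulative / total))

    def locate(self, progress):
        """Map a broadcast progress (fraction of characters already read)
        back to a line number in the first text: pick the position whose
        cutoff ratio is the smallest one exceeding the progress."""
        for line_no, cutoff in self.association:
            if cutoff > progress:
                return line_no
        return self.association[-1][0]
```

For example, after storing two lines and broadcasting both, a progress of 0.3 locates the first line while a progress of 0.7 locates the second; combined with a designated broadcast type such as "previous line", the located line number can then be decremented to fetch the next broadcast data.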
According to some implementations, the processor 2001 may control and schedule at least one of the camera 2004, the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, the communication circuit 2010, the electronic circuit 2100, the input device 2011, the output device 2012, and the other various apparatuses and circuits included in the electronic device 2000. According to some implementations, at least some of the components described in FIG. 17 may be interconnected and/or communicate with one another via a line 2013.
Software elements (programs) may reside in the working memory 2002, including but not limited to an operating system 2002a, one or more application programs 2002b, drivers, and/or other data and code.
According to some implementations, the instructions for the aforementioned control and scheduling may be included in the operating system 2002a or in the one or more application programs 2002b.
According to some implementations, the instructions for performing the method steps described in the present disclosure may be included in the one or more application programs 2002b, and the modules of the electronic device 2000 described above may be implemented by the processor 2001 reading and executing the instructions of the one or more application programs 2002b. In other words, the electronic device 2000 may include the processor 2001 and a memory storing a program (e.g., the working memory 2002 and/or the storage device 2003), the program comprising instructions that, when executed by the processor 2001, cause the processor 2001 to perform the methods according to the various embodiments of the present disclosure.
According to some implementations, some or all of the operations performed by at least one of the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, the communication circuit 2010, the electronic circuit 2100, the input device 2011, the output device 2012, and the electronic device 2000 may be implemented by the processor 2001 reading and executing the instructions of the one or more application programs 2002b.
The executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (e.g., the storage device 2003) and, upon execution, may be loaded into the working memory 2002 (possibly after being compiled and/or installed). Accordingly, the present disclosure provides a computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device (e.g., a reading device), cause the electronic device to perform the methods according to the various embodiments of the present disclosure. According to another implementation, the executable code or source code of the instructions of the software elements (programs) may also be downloaded from a remote location.
It should also be understood that various modifications may be made according to specific requirements. For example, custom hardware may also be used, and/or the various circuits, units, modules, or elements may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the circuits, units, modules, or elements included in the disclosed methods and devices may be implemented by programming hardware (e.g., a programmable logic circuit including a field programmable gate array (FPGA) and/or a programmable logic array (PLA)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using the logic and algorithms according to the present disclosure.
According to some implementations, the processor 2001 in the electronic device 2000 may be distributed over a network. For example, some processing may be performed by one processor while other processing is performed by another processor remote from that one processor. Other modules of the electronic device 2000 may be similarly distributed. In this way, the electronic device 2000 can be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and the scope of the present disclosure is not limited by these embodiments or examples but is defined only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by equivalent elements. Furthermore, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (41)

  1. An image text broadcasting method, comprising:
    receiving a designated broadcast instruction;
    in response to the designated broadcast instruction, determining a current broadcast progress with respect to broadcast data; and
    according to the current broadcast progress and the designated broadcast instruction, acquiring a next piece of broadcast data from a first text, wherein the first text is composed of text data recognized from, and stored for, the text in a text region of an image.
  2. The image text broadcasting method according to claim 1, further comprising:
    performing character recognition on a to-be-recognized text line in the text region of the image to obtain text data, and storing the text data of the text line in a first storage space as a line of data in the first text;
    storing the broadcast data in a third storage space; and
    storing, in a second storage space, association information for the broadcast data, the association information being used to establish a positional correspondence between the broadcast data in the third storage space and corresponding data in the first text in the first storage space.
  3. The method according to claim 2, wherein the association information for the broadcast data comprises at least:
    the position, in the first text, of each line of data in the corresponding data that positionally corresponds to the broadcast data; and
    the cutoff ratio of each line of data within the corresponding data,
    wherein the cutoff ratio of each line of data within the corresponding data is determined as the ratio of the number of characters from the starting line of the corresponding data up to and including that line of data to the total number of characters of the entire corresponding data.
  4. The method according to claim 2, wherein the storing the broadcast data in a third storage space comprises:
    in a case where current broadcast data is stored in the third storage space, in a sequential broadcast mode, storing the text data of a newly recognized text line into the third storage space as at least a part of a next piece of broadcast data.
  5. The method according to claim 4, wherein, in response to each storing performed in the third storage space, the association information, stored in the second storage space, for the next piece of broadcast data is constructed and/or updated.
  6. The method according to claim 2, wherein the text data obtained by performing character recognition on the first to-be-recognized text line of the text region is taken alone as one piece of broadcast data.
  7. The method according to claim 2, further comprising:
    determining whether a next to-be-recognized text line exists in the text region; and
    in response to determining that a next to-be-recognized text line exists, performing character recognition on the next to-be-recognized text line.
  8. The method according to claim 2, wherein the storing the broadcast data in a third storage space comprises:
    in a case where current broadcast data is stored in the third storage space, in response to receiving a designated broadcast instruction, acquiring a next piece of broadcast data from the first text and storing it in the third storage space.
  9. The method according to claim 2, wherein the current broadcast progress is determined as the ratio of the number of already-broadcast characters in the broadcast data to the number of characters of the broadcast data.
  10. The method according to claim 2, wherein the acquiring a next piece of broadcast data from the first text comprises:
    in response to receiving a designated broadcast instruction, determining the currently broadcast position in the current broadcast data as the current broadcast progress;
    based on the current broadcast progress and the association information, in the second storage space, for the current broadcast data, determining the position in the first text corresponding to the current broadcast progress as the current broadcast position;
    based on the current broadcast position and a designated broadcast type in the designated broadcast instruction, determining a to-be-broadcast position in the first text; and
    taking the to-be-broadcast position as a starting position, acquiring the next piece of broadcast data from the first text and storing it into the third storage space, and calculating and correspondingly storing, in the second storage space, the association information for the next piece of broadcast data.
  11. The method according to claim 10, wherein the determining, based on the current broadcast progress and the association information, in the second storage space, for the current broadcast data, the position in the first text corresponding to the current broadcast progress as the current broadcast position comprises:
    comparing the current broadcast progress with the cutoff ratios, stored in the second storage space, for the current broadcast data, and determining, as the current broadcast position, the position, stored in the second storage space for the current broadcast data, that corresponds to the smallest of those cutoff ratios that are greater than the current broadcast progress.
  12. The method according to claim 2, wherein the designated broadcast type in the designated broadcast instruction comprises broadcasting an adjacent text unit, and
    wherein the adjacent text unit is a text unit adjacent to the text unit in which the currently broadcast text line is located.
  13. The method according to claim 12, wherein the broadcasting an adjacent text unit comprises broadcasting the previous line, and the position in the first text of each line of data in the current broadcast data stored in the second storage space comprises the line number of that line of data,
    wherein the acquiring a next piece of broadcast data from the first text comprises:
    in response to receiving a designated broadcast instruction, determining the currently broadcast position in the current broadcast data as the current broadcast progress;
    based on the current broadcast progress and the association information, stored in the second storage space, for the current broadcast data, determining, as the current broadcast line number, the line number of the text line in the first text that corresponds to the line of data in the current broadcast data corresponding to the current broadcast progress;
    based on the designated broadcast type of broadcasting the previous line, taking the current broadcast line number minus 1 as the to-be-broadcast line number; and
    taking the line at the to-be-broadcast line number as a starting position, acquiring at least one line of data from the first text as the next piece of broadcast data.
  14. The method according to claim 12, wherein the broadcasting an adjacent text unit comprises broadcasting the next line, and the position in the first text of each line of data in the current broadcast data stored in the second storage space comprises the line number of that line of data,
    wherein the acquiring a next piece of broadcast data from the first text comprises:
    in response to receiving a designated broadcast instruction, determining the currently broadcast position in the current broadcast data as the current broadcast progress;
    based on the current broadcast progress and the association information, stored in the second storage space, for the current broadcast data, determining, as the current broadcast line number, the line number of the text line in the first text that corresponds to the line of data in the current broadcast data corresponding to the current broadcast progress;
    based on the designated broadcast type of broadcasting the next line, taking the current broadcast line number plus 1 as the to-be-broadcast line number; and
    taking the line at the to-be-broadcast line number as a starting position, acquiring at least one line of data from the first text as the next piece of broadcast data.
  15. The method according to claim 12, wherein the broadcasting an adjacent text unit comprises broadcasting the previous paragraph, and the position in the first text of each line of data in the current broadcast data stored in the second storage space comprises the paragraph number of that line of data,
    wherein the acquiring a next piece of broadcast data from the first text comprises:
    in response to receiving a designated broadcast instruction, determining the currently broadcast position in the current broadcast data as the current broadcast progress;
    based on the current broadcast progress and the association information, stored in the second storage space, for the current broadcast data, determining, as the current broadcast paragraph number, the paragraph number of the text line in the first text that corresponds to the line of data in the current broadcast data corresponding to the current broadcast progress;
    based on the designated broadcast type of broadcasting the previous paragraph, taking the current broadcast paragraph number minus 1 as the to-be-broadcast paragraph number; and
    acquiring, from the first text, the paragraph corresponding to the to-be-broadcast paragraph number as the next piece of broadcast data.
  16. The method according to claim 12, wherein the broadcasting an adjacent text unit comprises broadcasting the next paragraph, and the position in the first text of each line of data in the current broadcast data stored in the second storage space comprises the paragraph number of that line of data,
    wherein the acquiring a next piece of broadcast data from the first text comprises:
    in response to receiving a designated broadcast instruction, determining the currently broadcast position in the current broadcast data as the current broadcast progress;
    based on the current broadcast progress and the association information, stored in the second storage space, for the current broadcast data, determining, as the current broadcast paragraph number, the paragraph number of the text line in the first text that corresponds to the line of data in the current broadcast data corresponding to the current broadcast progress;
    based on the designated broadcast type of broadcasting the next paragraph, taking the current broadcast paragraph number plus 1 as the to-be-broadcast paragraph number; and
    acquiring, from the first text, the paragraph corresponding to the to-be-broadcast paragraph number as the next piece of broadcast data.
  17. The method according to claim 10, wherein the calculating and correspondingly storing, in the second storage space, the association information for the next piece of broadcast data comprises:
    storing, in the second storage space, the position in the first text of each line of data in the corresponding data that positionally corresponds to the next piece of broadcast data; and
    calculating, and storing in the second storage space, the cutoff ratio of each line of data within the corresponding data.
  18. The method according to claim 17, wherein the position in the first text of each line of data in the corresponding data that positionally corresponds to the next piece of broadcast data comprises the line number of that line of data, or the paragraph number and line number of that line of data.
  19. The method according to claim 1, further comprising:
    in response to detecting an operation on a touch screen, generating the designated broadcast instruction.
  20. The method according to claim 19, wherein the generating the designated broadcast instruction in response to detecting an operation on a touch screen comprises:
    in response to detecting a first touch-screen operation on the touch screen, generating a designated broadcast instruction whose designated broadcast type is broadcasting the previous line; and
    in response to detecting a second touch-screen operation on the touch screen, generating a designated broadcast instruction whose designated broadcast type is broadcasting the next line.
  21. The method according to claim 19, wherein the generating the designated broadcast instruction in response to detecting an operation on a touch screen comprises:
    in response to detecting a third touch-screen operation on the touch screen, generating a designated broadcast instruction whose designated broadcast type is broadcasting the previous paragraph; and
    in response to detecting a fourth touch-screen operation on the touch screen, generating a designated broadcast instruction whose designated broadcast type is broadcasting the next paragraph.
  22. The method according to claim 2, wherein, for a text line of a specific type, a specific type identifier representing the type of that text line is stored, and, based on the specific type identifier, a prompt is issued to the user during broadcasting.
  23. The method according to claim 22, wherein the text line of a specific type comprises one of the following:
    a text line of a first type, wherein the text line of the first type is determined by character size; and
    a text line of a second type, wherein the text line of the second type is determined by text line clarity.
  24. The method according to any one of claims 1 to 23, wherein the text lines are arranged horizontally, vertically, or obliquely.
  25. An image text broadcasting device, comprising:
    a receiving apparatus configured to receive a designated broadcast instruction;
    a broadcasting apparatus configured to determine, in response to the designated broadcast instruction, a current broadcast progress with respect to broadcast data;
    a processor configured to acquire, according to the current broadcast progress and the designated broadcast instruction, a next piece of broadcast data from a first text for the broadcasting apparatus to broadcast, wherein the first text is composed of text data recognized from, and stored for, the text in a text region of an image by a character recognition apparatus.
  26. The image text broadcasting device according to claim 25, further comprising:
    the character recognition apparatus, configured to perform character recognition on a to-be-recognized text line of the text region to obtain text data;
    at least one memory configured to:
    store, in a first storage space of the at least one memory, the text data of the text line as a line of data in the first text;
    store the broadcast data in a third storage space of the at least one memory; and
    store, in a second storage space of the at least one memory, association information for the broadcast data, the association information being used to establish a positional correspondence between the broadcast data in the third storage space and corresponding data in the first text in the first storage space.
  27. The device according to claim 26, wherein the broadcasting apparatus is configured to acquire broadcast data from the third storage space and perform sequential broadcasting or designated broadcasting with respect to the text region.
  28. The device according to claim 26, wherein the processor is configured to, in response to receiving a designated broadcast instruction, acquire the next piece of broadcast data from the first text in the first storage space and store it into the third storage space.
  29. The device according to claim 25, further comprising:
    a detection apparatus configured to, in response to detecting a designated broadcast operation, generate the designated broadcast instruction and send it to the processor.
  30. The device according to claim 29, wherein the designated broadcast operation comprises one of the following operations:
    a first touch-screen operation on a touch screen representing broadcasting the previous line, and a second touch-screen operation on the touch screen representing broadcasting the next line.
  31. The device according to claim 29, wherein the designated broadcast operation comprises one of the following operations:
    a third touch-screen operation on a touch screen representing broadcasting the previous paragraph, and a fourth touch-screen operation on the touch screen representing broadcasting the next paragraph.
  32. The device according to claim 26, wherein the association information for the broadcast data comprises at least:
    the position, in the first text, of each line of data in the corresponding data that positionally corresponds to the broadcast data; and
    the cutoff ratio of each line of data within the corresponding data,
    wherein the cutoff ratio of each line of data within the corresponding data is calculated and determined by the processor as the ratio of the number of characters from the starting line of the corresponding data up to and including that line of data to the total number of characters of the entire corresponding data.
  33. The device according to claim 26, wherein the processor is configured to, in response to each storing performed in the third storage space, construct and/or update the association information, stored in the second storage space, for the next piece of broadcast data.
  34. The device according to claim 30, wherein the broadcasting apparatus is configured to:
    in response to the first touch-screen operation on the touch screen, broadcast the line preceding the currently broadcast text line; and
    in response to the second touch-screen operation on the touch screen, broadcast the line following the currently broadcast text line.
  35. The device according to claim 31, wherein the broadcasting apparatus is configured to:
    in response to the third touch-screen operation on the touch screen, broadcast the paragraph preceding the currently broadcast text paragraph; and
    in response to the fourth touch-screen operation on the touch screen, broadcast the paragraph following the currently broadcast text paragraph.
  36. The device according to claim 29, wherein the broadcasting apparatus is configured to, in response to a designated broadcast operation, issue a prompt to the user indicating that recognition is in progress or that the designated position does not exist.
  37. An electronic circuit, comprising:
    a circuit configured to perform the steps of the method according to any one of claims 1 to 24.
  38. A reading device, comprising:
    the electronic circuit according to claim 37; and
    a circuit configured to broadcast text data.
  39. The reading device according to claim 38, wherein the reading device performs, in response to an operation of the user or an action of the user, sequential broadcasting or designated broadcasting through the circuit configured to broadcast text data.
  40. An electronic device, comprising:
    a processor; and
    a memory storing a program, the program comprising instructions that, when executed by the processor, cause the electronic device to perform the method according to any one of claims 1 to 24.
  41. A non-transitory computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 24.
PCT/CN2020/123195 2020-02-11 2020-10-23 Image text broadcasting method and device, electronic circuit and storage medium WO2021159729A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/164,744 US11776286B2 (en) 2020-02-11 2021-02-01 Image text broadcasting

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010085671.8 2020-02-11
CN202010085671.8A CN110991455B (zh) 2020-02-11 2020-02-11 Image text broadcasting method and device, electronic circuit and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/164,744 Continuation US11776286B2 (en) 2020-02-11 2021-02-01 Image text broadcasting

Publications (1)

Publication Number Publication Date
WO2021159729A1 true WO2021159729A1 (zh) 2021-08-19

Family

ID=70081367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123195 WO2021159729A1 (zh) 2020-02-11 2020-10-23 Image text broadcasting method and device, electronic circuit and storage medium

Country Status (5)

Country Link
EP (1) EP3866475A1 (zh)
JP (1) JP2021129299A (zh)
KR (1) KR102549570B1 (zh)
CN (1) CN110991455B (zh)
WO (1) WO2021159729A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991455B (zh) * 2020-02-11 2023-05-05 NextVPU (Shanghai) Co., Ltd. Image text broadcasting method and device, electronic circuit and storage medium
US11776286B2 (en) 2020-02-11 2023-10-03 NextVPU (Shanghai) Co., Ltd. Image text broadcasting
CN113487542B (zh) * 2021-06-16 2023-08-04 Chengdu Tangyuan Electric Co., Ltd. Method for extracting wear regions of catenary contact wires
WO2023136605A1 (en) * 2022-01-11 2023-07-20 Samsung Electronics Co., Ltd. Method and electronic device for intelligently reading displayed contents

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022332A (zh) * 2016-04-15 2016-10-12 Guangzhou Alibaba Literature Information Technology Co., Ltd. Terminal device, and apparatus and method for converting a paper publication into a to-be-listened publication for playback
US20180239512A1 (en) * 2013-01-28 2018-08-23 Nook Digital, Llc Context based gesture delineation for user interaction in eyes-free mode
CN108665742A (zh) * 2018-05-11 2018-10-16 HiScene (Shanghai) Information Technology Co., Ltd. Method and device for reading by means of a reading device
CN110245606A (zh) * 2019-06-13 2019-09-17 Guangdong Genius Technology Co., Ltd. Text recognition method, apparatus, device and storage medium
CN110991455A (zh) * 2020-02-11 2020-04-10 NextVPU (Shanghai) Co., Ltd. Image text broadcasting method and device, electronic circuit and storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7336890B2 (en) * 2003-02-19 2008-02-26 Microsoft Corporation Automatic detection and segmentation of music videos in an audio/video stream
KR101351555B1 (ko) * 2012-04-05 2014-01-16 RSN Co., Ltd. Semantics-based classification and extraction system for text mining of large-volume data
KR20140019167A (ko) * 2012-08-06 2014-02-14 Samsung Electronics Co., Ltd. Method for providing a voice guidance function and electronic device therefor
JP2014127197A (ja) * 2012-12-26 2014-07-07 Toshio Itabashi Application software for reading aloud, by voice, characters recognized by a smartphone camera
US9378727B2 (en) * 2013-04-27 2016-06-28 Tencent Technology (Shenzhen) Company Limited Method and apparatus for audio playing
JP6243071B1 (ja) * 2017-04-03 2017-12-06 Senzo Tashiro Communication content translation processing method, communication content translation processing program, and recording medium
CN107393356A (zh) * 2017-04-07 2017-11-24 Shenzhen Youyue Robot Technology Co., Ltd. Control method, control apparatus and early-education machine
JP2019040005A (ja) * 2017-08-24 2019-03-14 OTON GLASS, Inc. Read-aloud system and read-aloud method
CN107885826B (zh) * 2017-11-07 2020-04-10 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Multimedia file playback method and apparatus, storage medium, and electronic device
CN108182432A (zh) * 2017-12-28 2018-06-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Information processing method and apparatus
CN108366182B (zh) * 2018-02-13 2020-07-07 BOE Technology Group Co., Ltd. Calibration method and apparatus for synchronized text-and-speech broadcasting, and computer storage medium
CN108874356B (zh) * 2018-05-31 2020-10-23 Gree Electric Appliances, Inc. of Zhuhai Voice broadcasting method and apparatus, mobile terminal, and storage medium
CN110111612A (zh) * 2019-04-11 2019-08-09 Shenzhen Xuezhiyou Technology Co., Ltd. Photographing-based point-reading method, ***, and point-reading device
US11715485B2 (en) * 2019-05-17 2023-08-01 Lg Electronics Inc. Artificial intelligence apparatus for converting text and speech in consideration of style and method for the same
CN109934210B (zh) * 2019-05-17 2019-08-09 NextVPU (Shanghai) Co., Ltd. Layout analysis method, reading assistance device, circuit, and medium
CN110287830A (zh) * 2019-06-11 2019-09-27 Guangzhou Xiaozhuan Technology Co., Ltd. Smart wearable terminal, cloud server, and data processing method
CN110277092A (zh) * 2019-06-21 2019-09-24 Beijing Orion Star Technology Co., Ltd. Voice broadcasting method and apparatus, electronic device, and readable storage medium


Also Published As

Publication number Publication date
EP3866475A1 (en) 2021-08-18
JP2021129299A (ja) 2021-09-02
KR20210102832A (ko) 2021-08-20
CN110991455B (zh) 2023-05-05
KR102549570B1 (ko) 2023-06-28
CN110991455A (zh) 2020-04-10

Similar Documents

Publication Publication Date Title
WO2021159729A1 (zh) Image text broadcasting method and device, electronic circuit and storage medium
CN110276007B (zh) 用于提供信息的装置和方法
US10921979B2 (en) Display and processing methods and related apparatus
US10642574B2 (en) Device, method, and graphical user interface for outputting captions
US10452777B2 (en) Display apparatus and character correcting method thereof
US20200394356A1 (en) Text information processing method, device and terminal
JP2021129299A5 (zh)
CN103886025A (zh) Method and apparatus for displaying pictures in a web page
JP2006107048A (ja) Gaze-responsive control apparatus and gaze-responsive control method
US10671795B2 (en) Handwriting preview window
US20180283873A1 (en) Calibration method based on dead reckoning technology and portable electronic device
WO2016152200A1 (ja) Information processing system and information processing method
US20140288916A1 (en) Method and apparatus for function control based on speech recognition
US20170357568A1 (en) Device, Method, and Graphical User Interface for Debugging Accessibility Information of an Application
US11776286B2 (en) Image text broadcasting
US10915778B2 (en) User interface framework for multi-selection and operation of non-consecutive segmented information
US20230274565A1 (en) Image, pattern and character recognition
US11163378B2 (en) Electronic device and operating method therefor
US20240194182A1 (en) Text reading method and device
CN110969161B (zh) Image processing method, circuit, visual-impairment assistance device, electronic device, and medium
RU2636673C2 (ru) Method and device for saving a string
CA3003002C (en) Systems and methods for using image searching with voice recognition commands
WO2023020334A1 (zh) Image orientation adjustment method and apparatus, storage medium, and electronic device
WO2015190061A1 (en) Information processor, information processing method, and program
US20140229837A1 (en) Information processing apparatus and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918587

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918587

Country of ref document: EP

Kind code of ref document: A1