CN113936638A - Text audio playing method and device and terminal equipment


Info

Publication number
CN113936638A
Authority
CN
China
Prior art keywords
text
information
audio information
character
symbol
Prior art date
Legal status
Pending
Application number
CN202010603452.4A
Other languages
Chinese (zh)
Inventor
罗义
王守诚
谢鲁冰
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010603452.4A
Publication of CN113936638A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04 - Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047 - Architecture of speech synthesisers
    • G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L 2013/083 - Special characters, e.g. punctuation marks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The application provides a text audio playing method and apparatus and a terminal device, relating to the field of terminal technologies. The method comprises: identifying non-character information in a target text; determining audio information corresponding to the non-character information; and audibly playing the target text according to the audio information corresponding to the non-character information. With the text audio playing method provided by the embodiments, the non-character information in the target text is fully expressed while the target text is played audibly, which improves the terminal device's expression of the text information and the user experience.

Description

Text audio playing method and device and terminal equipment
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a text audio playing method and apparatus, and a terminal device.
Background
Terminal devices typically have a text reading function. At present, a terminal device reads a text aloud by playing the pronunciation audio corresponding to the characters in the text. However, the text may include not only characters but also non-character information, such as punctuation marks, emoticons, and underlined fonts, which express sentence breaks, tone, or emotion, or highlight part of the text. Because current terminal devices read only the characters when reading a text aloud, the non-character information in a text that contains it cannot be fully expressed, and the expression of the text information is poor.
Disclosure of Invention
The application provides a text audio playing method, a text audio playing apparatus, and a terminal device, to solve the problem in the prior art that the expression of text information is poor when a terminal device reads a text aloud.
To this end, the following technical solutions are adopted:
In a first aspect, the present application provides a text audio playing method, comprising: identifying non-character information in a target text; determining audio information corresponding to the non-character information; and audibly playing the target text according to the audio information corresponding to the non-character information.
With the text audio playing method provided by this embodiment, the audio information corresponding to the non-character information in the target text is played while the target text is played audibly, so that the non-character information in the text is fully expressed, improving the expression of the text information and the user experience.
With reference to the first aspect, in some embodiments, the non-character information includes an emoticon, a composition control symbol, a punctuation mark, a mathematical symbol, an annotation symbol, or a characteristic font style of characters.
With reference to the first aspect, in some embodiments, when the target text includes characters and the non-character information, determining the audio information corresponding to the non-character information includes:
if the non-character information is a first symbol, determining the application type of the first symbol according to the semantics of the target text, where the first symbol has audio information corresponding to at least two application types; and
determining the audio information of the first symbol according to the application type of the first symbol.
For a first symbol that is read differently in different language scenarios, the audio information matching the scenario can be determined by identifying the semantics of the target text, which prevents the terminal device from playing audio information unsuited to the scenario and improves the expression of the text information.
With reference to the first aspect, in some embodiments, when the target text includes characters and the non-character information, audibly playing the target text according to the audio information corresponding to the non-character information includes:
if the non-character information is an emoticon, a composition control symbol, a punctuation mark, a mathematical symbol, or an annotation symbol, sequentially playing the audio information corresponding to the characters and the audio information corresponding to the non-character information according to the order in which the characters and the non-character information are arranged in the target text.
With reference to the first aspect, in some embodiments, when the target text includes characters and the non-character information, audibly playing the target text according to the audio information corresponding to the non-character information includes:
if the non-character information is an annotation symbol, identifying the annotation text corresponding to the annotation symbol; and
sequentially playing the audio information corresponding to the characters and the audio information corresponding to the annotation text according to the order in which the characters and the annotation symbol are arranged in the target text.
With reference to the first aspect, in some embodiments, when the target text includes characters and the non-character information, audibly playing the target text according to the audio information corresponding to the non-character information includes:
if the non-character information is an annotation symbol, identifying the annotation text corresponding to the annotation symbol; and
playing the audio information corresponding to the annotation text after the audio information of all the characters of the sentence containing the non-character information has been played.
By playing the audio information corresponding to the annotation text during playback of the character audio, the user can understand the text information in more detail.
With reference to the first aspect, in some embodiments, when the target text includes characters and non-character information, audibly playing the target text according to the audio information corresponding to the non-character information includes:
if the non-character information is a characteristic font style, playing the audio information corresponding to the characteristic font style as background sound while playing the audio information corresponding to the characters having that characteristic font style.
With reference to the first aspect, in some embodiments, determining the audio information corresponding to the non-character information includes:
determining the audio information corresponding to the non-character information from a preset audio information library according to the identification information of the non-character information.
In a second aspect, the present application provides a text audio playing apparatus, comprising:
an identification unit, configured to identify non-character information in a target text;
a determining unit, configured to determine audio information corresponding to the non-character information; and
a playing control unit, configured to audibly play the target text according to the audio information corresponding to the non-character information.
In a third aspect, the present application provides a terminal device, comprising a speaker, a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the text audio playing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the text audio playing method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product that, when run on a terminal device, causes the terminal device to execute the text audio playing method according to the first aspect.
In a sixth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor coupled with a memory, and the processor executes a computer program stored in the memory to implement the text audio playing method according to the first aspect. The chip system may be a single chip or a chip module formed by a plurality of chips.
It can be understood that the beneficial effects of the second to sixth aspects follow from the description of the first aspect and are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of a mobile phone to which a text audio playing method provided in an embodiment of the present application is applied;
fig. 2 is a schematic structural diagram of a processor to which the text audio playing method according to the embodiment of the present application is applied;
fig. 3 is a schematic diagram of a software architecture to which a text audio playing method according to an embodiment of the present application is applied;
fig. 4 is a first flowchart illustrating a text audio playing method according to an embodiment of the present application;
fig. 5a is a schematic view illustrating a user control for audio playback of a text according to an embodiment of the present application;
fig. 5b is a schematic diagram illustrating a user control for audio playback of a text according to an embodiment of the present application;
fig. 6a is a schematic diagram of an emoji emoticon provided in an embodiment of the present application;
fig. 6b is a schematic diagram of an emoji emoticon provided in the embodiment of the present application;
fig. 6c is a schematic diagram of an emoji emoticon provided in the embodiment of the present application;
fig. 6d is a schematic diagram of an emoji emoticon provided in the embodiment of the present application;
FIG. 7 is a first illustration of a program code display provided in an embodiment of the present application;
FIG. 8 is a second display diagram of program code provided in an embodiment of the present application;
fig. 9 is a flowchart illustrating a second method for playing a text with sound according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a text audio playing apparatus according to an embodiment of the present application.
Detailed Description
The method for playing the text with sound provided by the embodiment of the application can be applied to terminal equipment with an audio information playing function, such as a mobile phone, a tablet computer, an electronic reader, a notebook computer, a netbook, wearable equipment and the like, and the embodiment of the application does not limit the specific type of the terminal equipment at all.
Take a mobile phone as an example of the terminal device. Fig. 1 is a block diagram of a partial structure of a mobile phone according to an embodiment of the present application. Referring to fig. 1, the mobile phone includes a radio frequency (RF) circuit 110, a memory 120, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180, and a power supply 190. Those skilled in the art will appreciate that the mobile phone structure shown in fig. 1 is not limiting; the mobile phone may include more or fewer components than shown, combine some components, or arrange the components differently.
The following describes each component of the mobile phone in detail with reference to fig. 1:
the RF circuit 110 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 180; in addition, the data for designing uplink is transmitted to the base station. Typically, the RF circuitry includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 110 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE)), e-mail, Short Messaging Service (SMS), and the like.
The memory 120 may be used to store software programs and modules, and the processor 180 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as an audio information playing function and a text display function); the data storage area may store data created according to the use of the mobile phone (such as audio information and a phonebook). Further, the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 130 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 130 may include a touch panel 131 and other input devices 132. The touch panel 131, also referred to as a touch screen, may collect touch operations of a user on or near the touch panel 131 (e.g., operations of the user on or near the touch panel 131 using any suitable object or accessory such as a finger or a stylus pen), and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 131 may include two parts, i.e., a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 180, and can receive and execute commands sent by the processor 180. In addition, the touch panel 131 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 130 may include other input devices 132 in addition to the touch panel 131. In particular, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 140 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The Display unit 140 may include a Display panel 141, and optionally, the Display panel 141 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 131 can cover the display panel 141, and when the touch panel 131 detects a touch operation on or near the touch panel 131, the touch operation is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although the touch panel 131 and the display panel 141 are shown as two separate components in fig. 1 to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 131 and the display panel 141 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 150, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 141 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer, tapping), and the like. As for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The audio circuit 160, the speaker 161, and the microphone 162 may provide an audio interface between the user and the mobile phone. The audio circuit 160 may convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output. Conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; after being processed by the processor 180, the audio data is transmitted via the RF circuit 110 to, for example, another mobile phone, or output to the memory 120 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the mobile phone can help the user receive and send e-mails, browse web pages, access streaming media, and the like, providing wireless broadband Internet access. Although fig. 1 shows the WiFi module 170, it is not an essential part of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 180 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby integrally monitoring the mobile phone. Alternatively, processor 180 may include one or more processing units; preferably, the processor 180 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.
Referring to fig. 2, the processor 180 may include: a text control unit (Text View) 1801, a text drawing unit (Draw Text) 1802, a text rendering engine 1803, a reading trigger 1804, and a text-to-speech (TTS) engine 1805. The text control unit 1801 is configured to determine display information of a text, such as its content, font size, and display shape. The text drawing unit 1802 controls the layout style of the text. The text rendering engine 1803 is configured to determine the final display image of the text in the display interface according to the display information and the layout style. The reading trigger 1804 is configured to select a text according to the user's touch operation, determine it as the target text, and control the mobile phone to start playing the target text audibly. The text-to-speech engine 1805 is configured to cooperate with the speaker 161 to play the audio information corresponding to the target text.
The handset also includes a power supply 190 (e.g., a battery) for powering the various components, and preferably, the power supply may be logically connected to the processor 180 via a power management system, such that functions such as managing charging, discharging, and power consumption are performed via the power management system.
Although not shown, the handset may also include a camera. Optionally, the position of the camera on the mobile phone may be front-located or rear-located, which is not limited in this embodiment of the present application. Optionally, the mobile phone may include a single camera, a dual camera, or a triple camera, which is not limited in this embodiment. For example, a cell phone may include three cameras, one being a main camera, one being a wide camera, and one being a tele camera. When the mobile phone includes a plurality of cameras, the plurality of cameras may be all disposed in front, or all disposed in back, or a part of the cameras may be disposed in front, and another part of the cameras may be disposed in back, which is not limited in this application.
In addition, although not shown, the mobile phone may further include a bluetooth module, etc., which will not be described herein.
Fig. 3 is a schematic diagram of a software structure of a mobile phone according to an embodiment of the present application. Taking a mobile phone operating system as an Android system as an example, in some embodiments, the Android system is divided into four layers, which are an application layer, an application Framework (FWK) layer, a system layer and a hardware abstraction layer, and the layers communicate with each other through a software interface.
As shown in fig. 3, the application layer may be a series of application packages, which may include short message, calendar, camera, video, navigation, gallery, call, and other applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer.
As shown in FIG. 3, the application framework layers may include a window manager, a resource manager, and a notification manager, among others.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without user interaction, such as notifications of download completion or message alerts. The notification manager may also present notifications as a chart or scrolling text in the system's top status bar (for example, a notification of an application running in the background) or as a dialog window on the screen, for example by showing prompt text in the status bar, sounding a prompt tone, vibrating the electronic device, or flashing an indicator light.
The application framework layer may further include:
a view system that includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system may be used to build applications. A display interface may be composed of one or more views. For example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying pictures.
The telephone manager is used for providing the communication function of the mobile phone. Such as management of call status (including on, off, etc.).
The system layer may include a plurality of functional modules. For example: a sensor service module, a physical state identification module, a three-dimensional graphics processing library (such as OpenGL ES), and the like.
The sensor service module is used for monitoring sensor data uploaded by various sensors in a hardware layer and determining the physical state of the mobile phone;
the physical state recognition module is used for analyzing and recognizing user gestures, human faces and the like;
the three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The system layer may further include:
the surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The hardware abstraction layer is a layer between hardware and software. The hardware abstraction layer may include a display driver, a camera driver, a sensor driver, etc. for driving the relevant hardware of the hardware layer, such as a display screen, a camera, a sensor, etc.
The following embodiments may be implemented on a terminal device having the above-described hardware structure/software structure. The following embodiments will describe a text audio playing method provided in the embodiments of the present application, taking a mobile phone as an example.
Mobile phones generally have a text reading function. At present, a mobile phone reads a text aloud by playing only the pronunciation audio corresponding to the characters in the text. However, the text may include not only characters but also non-character information, such as punctuation marks, emoticons, and underlines, which express sentence breaks, tone, or emotion, or highlight part of the text. Therefore, for a text containing non-character information, this reading method cannot fully express the non-character information, which affects the expression of the text information.
For example, for the text "Happy to know you ^_^", the mobile phone currently reads the text by playing only the pronunciation audio of the words "happy to know you"; the non-character information "^_^" is not expressed, which affects the expression of the text information.
Therefore, an embodiment of the present application provides a text audio playing method that can express the non-character information in a text and improve the expression of the text information.
Referring to fig. 4, a flowchart of a text audio playing method provided in an embodiment of the present application is schematically illustrated. The method comprises the following steps S401-S403.
S401, the mobile phone identifies non-character information in the target text.
In this embodiment, the target text refers to a text that the user selects to play with sound by the mobile phone, and may include all the content in one text (for example, a novel) or may be a part of the text (for example, some paragraphs in the novel), which is not limited in this embodiment.
In some embodiments, referring to fig. 5a, when the mobile phone displays text, the user may select a part of the text as a target text by touch, and click a play icon to input a play instruction to the mobile phone to control the mobile phone to audibly play the target text. In fig. 5a, the text selected by the user is shaded. In other embodiments, referring to fig. 5b, the user may also directly click the play icon to control the mobile phone to take all of the text as the target text, and play the text audibly.
The target text stored in the mobile phone includes identification information of each character and of each piece of non-character information, where the identification information uniquely indicates one character or one piece of non-character information. Through the identification information, the mobile phone can identify the characters and the non-character information in the target text.
Illustratively, the identification information may be a character code, such as Unicode. For example, the 16-bit Unicode code point of the Chinese character "王" is U+738B, that of the comma "," is U+002C, that of the tab symbol is U+0009, and that of the underline symbol is U+2381, etc.
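As a minimal sketch of this identification step (the class name, set contents, and output format are hypothetical, not from the patent), the following Java code checks each Unicode code point of a text against a preset set of identifiers registered as non-character information:

```java
import java.util.Set;

public class NonCharacterScanner {
    // Hypothetical identifier set: code points registered as non-character
    // information, e.g. tab (U+0009), an underline mark (U+2381), and an emoji.
    private static final Set<Integer> NON_CHARACTER_CODE_POINTS =
            Set.of(0x0009, 0x2381, 0x1F602);

    // Prints each code point of the target text and whether it matches a
    // registered piece of non-character information.
    public static void scan(String targetText) {
        targetText.codePoints().forEach(cp -> {
            String label = NON_CHARACTER_CODE_POINTS.contains(cp)
                    ? "non-character information" : "character";
            System.out.printf("U+%04X -> %s%n", cp, label);
        });
    }

    public static void main(String[] args) {
        scan("王\t好");   // the tab is recognized; the Chinese characters are not
    }
}
```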
The characters in this embodiment may be in various languages, such as Chinese, English, Japanese, and French, for example, "happy", "お", "Bonjour", and the like. The non-character information may be an emoticon, a punctuation mark, a mathematical symbol, a composition control symbol, an annotation symbol, or a characteristic font style of characters.
Emoticons are typically used to represent certain emotions, such as happiness, tearfulness, boredom, and cheering. However, in this embodiment emoticons are not limited thereto and may also represent objects such as the moon, a Christmas tree, a house, or a flower.
In some embodiments, an emoticon may be a combination of several non-character symbols, or a combination of non-character symbols and characters. Illustratively, the non-character symbols may be "^", "_", "(", "﹏", "{", "→", ">", "<", "|", "\", "/", and the like. The combined emoticons may be "^_^", "(>﹏<)", ">_<|", "Y(^_^)Y", etc., each representing one expression, as shown in table 1.
Table 1: correspondence between emoticons and the expressions they represent (reproduced as an image in the original publication)
In other embodiments, the emoticon may be an emoji emoticon. Illustratively, the emoji emoticon shown in fig. 6a is used to represent an emoticon that is laughing. Fig. 6b shows an emoji emoticon used to represent lacrimation. The emoji emoticon shown in fig. 6c is used to represent an inattentive emoticon. The emoji emoticon shown in fig. 6d is used to represent the expression of panic.
In this embodiment, punctuation marks include ",", "。", ";", ":", "!", "(", ")", "{", "}", "……", "?", "-", and other punctuation marks that represent a sentence break, a particular tone, or a scene.
In this embodiment, mathematical symbols include "+", "−", "×", "÷", "log", "m" (meters), "mm" (millimeters), "/" (ratio), "°C", and other symbols used for mathematical operations or units.
Composition control symbols may be tab symbols, line-break symbols, paragraph symbols, and the like. They control how characters, punctuation marks, emoticons, and other symbols are laid out on the display interface, so that the structure of the text is expressed clearly and is easy for the user to read. Depending on the user's settings, composition control symbols may be shown or hidden in the display interface. Illustratively, when a tab symbol is displayed, it may be represented by "→"; when a line break is displayed, it may be represented by "CRLF"; and when a space is displayed, it may be represented by a "·" lighter in color than the text.
An annotation symbol may be a superscript or subscript attached to a word or phrase, corresponding to annotation text associated with that word or phrase. For example, in the text "不知天上宫阙①，今夕是何年" ("I wonder what year it is tonight in the palace in the sky"), the annotation symbol is "①", and the corresponding annotation text explains that "宫阙" refers to a palace, so called because of the pair of watchtowers outside the palace gate.
An annotation symbol may also be a symbol that marks a comment in program code, for example a first annotation symbol "/*", a second annotation symbol "*/", and a third annotation symbol "//". In program code, "/*" is used together with "*/", and the text between them is annotation text, which may occupy multiple lines. The text following "//" is annotation text, which usually occupies only one line.
A characteristic font style may be an italic font, an underlined font, a strikethrough font, a bold font, a font with a background color, a colored font, or the like. For example, in the text "You should ensure that the information is all accurate and error-free", where "accurate and error-free" is underlined, the characteristic font style is an underlined font.
S402, the mobile phone determines the audio information corresponding to the non-character information.
An audio information library is preset in the mobile phone and includes the audio information corresponding to characters and to non-character information. For example, the correspondence between characters or non-character information and audio information may be as shown in table 2.
Table 2: correspondence between characters/non-character information and audio information (reproduced as an image in the original publication)
For a character, its audio information typically corresponds to the character's pronunciation. For example, the audio information of the word "you" is pronounced [you], and that of the word "good" is pronounced [good].
For non-character information, the emotion expressed by the audio information corresponding to an emoticon is the same as the emotion the emoticon expresses. For example, for an emoticon expressing laughter (such as the emoji shown in fig. 6 a), the corresponding audio information may be pronounced [haha]. The tone expressed by the audio information of a punctuation mark should likewise match the tone the punctuation mark expresses; for example, the audio information corresponding to the question mark "?" may be a questioning interjection. The audio information corresponding to a characteristic font style should help the user clearly understand the information that style conveys; for example, the audio information corresponding to an underlined font may be the sound of a pencil drawing a line. This embodiment does not limit the specific content of the audio information corresponding to non-character information.
It should be noted that, in the audio information library, each character or piece of non-character information corresponds to at least one piece of audio information. For example, the audio information of the word "you" may be a female voice or a male voice pronouncing [you]. Likewise, the audio information corresponding to an underline may be the sound of drawing a line with a pencil or with a pen. Similarly, the sound corresponding to the audio information of "/" may be [or] (e.g., in the text "s/he") or [divided by] (e.g., in the text "if a = 100 and b = 20, please calculate the value of a/b").
Further, punctuation marks that merely represent sentence breaks, such as "," and "。", occur too frequently, so no corresponding audio information is set for them in the audio information library. This avoids inserting audio information too frequently during text playback, which would degrade the user experience.
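The library described above can be pictured as a map from identification information to one or more candidate audio clips, where a symbol such as "/" carries one clip per application type and plain sentence-break punctuation is deliberately left unmapped. A minimal Java sketch, with all names and file names hypothetical:

```java
import java.util.List;
import java.util.Map;

public class AudioInfoLibrary {
    // Hypothetical preset library: identification info -> candidate audio clips.
    // "/" has one clip per application type; "," has no entry at all, so no
    // audio is inserted for plain sentence-break punctuation.
    private static final Map<String, List<String>> LIBRARY = Map.of(
            "^_^",       List.of("haha.wav"),
            "/",         List.of("or.wav", "divided_by.wav"),
            "underline", List.of("pencil_line.wav"));

    // Returns the candidate clips for a symbol, or an empty list if none is set.
    public static List<String> lookup(String identification) {
        return LIBRARY.getOrDefault(identification, List.of());
    }

    public static void main(String[] args) {
        System.out.println(lookup("/"));   // [or.wav, divided_by.wav]
        System.out.println(lookup(","));   // [] -> nothing is played
    }
}
```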
S403, the mobile phone audibly plays the target text according to the audio information corresponding to the non-character information.
In the target text, both the characters and the non-character information have determined position information, which indicates their order of arrangement in the target text. For example, for a character (including a word, a punctuation mark, a mathematical symbol, an emoticon, or an annotation symbol), position information 5 means that it is the 5th character in the target text. For a characteristic font style (e.g., a bold font, an italic font, or an underline), position information 10 to 15 means that the style spans the positions of the 10th to 15th characters.
For convenience of description, position information a to b is written below as [a, b], where a is less than or equal to b and both are integers. For example, position information 5 is written as [5,5], and position information 10 to 15 as [10,15].
It should be noted that, in this embodiment, each unit that is displayed or pronounced as a whole occupies one display position, and each display position corresponds to one piece of position information. For example, a Chinese character (e.g., "您", "好"), an English word (e.g., "happy", "a"), a punctuation mark (e.g., "，", "。"), an emoticon (e.g., "^_^" or an emoji), an annotation symbol (e.g., "①", "②"), and a composition control symbol (e.g., a space, tab, or line-break symbol) each occupy one display position and each have corresponding position information. Of course, the position information may be determined in other ways, which is not limited in this embodiment.
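A piece of position information [a, b] can be carried by a small value type. The sketch below is a hypothetical illustration, not part of the patent; it also shows the single-position case a == b used for individual characters:

```java
// Hypothetical value type for the position information [a, b]: a single
// character has start == end, while a characteristic font style such as an
// underline spans a range of display positions.
public record PositionInfo(int start, int end) {
    public PositionInfo {
        if (start < 1 || start > end) {
            throw new IllegalArgumentException("invalid position information");
        }
    }

    // True if this span covers the given display position.
    public boolean covers(int position) {
        return position >= start && position <= end;
    }

    public static void main(String[] args) {
        PositionInfo comma = new PositionInfo(6, 6);        // [6,6]
        PositionInfo underline = new PositionInfo(8, 11);   // [8,11]
        System.out.println(underline.covers(9));            // true
        System.out.println(comma.covers(9));                // false
    }
}
```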
Take the target text "王老师您好，^_^很高兴认识您" ("Hello, Teacher Wang, ^_^ nice to meet you") as an example: the position information of "王老师您好" in the target text is [1,5], that of the comma "，" is [6,6], that of the emoticon "^_^" is [7,7], and that of "很高兴认识您" is [8,13].
With the target text "you should ensure information allIs accurate and error-free"for example, the position information of the underlined font in the target text is [8,11]]。
Take the target text "不知天上宫阙①，今夕是何年" as an example: the annotation symbol "①" corresponds to annotation text explaining that "宫阙" refers to a palace, so called because of the pair of watchtowers outside the palace gate. The position information of the annotation symbol "①" in the target text is [7,7].
In some embodiments, the target text includes only non-character information. In that case, the mobile phone directly plays the audio information corresponding to the non-character information during audible playback. For example, in a chat scenario, the target text may consist of a single emoji emoticon, such as the one shown in fig. 6a; the mobile phone then directly plays the audio information corresponding to that emoji.
In other embodiments, the target text includes both characters and non-character information, and the mobile phone may play it in different ways. The playing manners provided by the present application are described below using specific texts as examples.
Take the target text "王老师您好，^_^很高兴认识您" as an example. During audible playback, the mobile phone may sequentially play the audio information corresponding to the characters and the audio information corresponding to the non-character information according to their order of arrangement in the target text, as follows.
First, the mobile phone recognizes that the position information of "王老师您好" in the target text is [1,5], that of the comma "，" is [6,6], that of the emoticon "^_^" is [7,7], and that of "很高兴认识您" is [8,13]. Then, the mobile phone obtains, from the preset audio information library, the audio information corresponding to each of the characters "王", "老", "师", "您", "好", "很", "高", "兴", "认", "识", "您" and to the emoticon "^_^". Finally, the mobile phone plays the audio information of "王" first, "老" second, "师" third, "您" fourth, and "好" fifth; at the sixth playing position (the position corresponding to the comma "，") it pauses for a preset time (for example, 0.5 s) to represent the sentence break; it then plays the audio information of "^_^" seventh and "很" eighth, and so on through "高", "兴", "认", "识", "您", thereby completing the audible playback of the target text.
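The sequential playing manner just described reduces to a single loop over the display positions: pause at sentence-break punctuation, otherwise play whatever clip the library returned. A hedged Java sketch, where the Element record and the printed "playing" call are stand-ins for the real audio pipeline (not specified by the patent):

```java
import java.util.List;

public class SequentialPlayback {
    // Hypothetical element: one display position with its clip (null when the
    // library holds no audio for it) and a sentence-break flag.
    record Element(String symbol, String audioClip, boolean sentenceBreak) {}

    static void play(List<Element> targetText) throws InterruptedException {
        for (Element e : targetText) {
            if (e.sentenceBreak()) {
                Thread.sleep(500);  // pause (e.g. 0.5 s) to represent the sentence break
            } else if (e.audioClip() != null) {
                System.out.println("playing " + e.audioClip()); // stand-in for TTS/speaker
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        play(List.of(new Element("王", "wang.wav", false),
                     new Element("，", null, true),
                     new Element("^_^", "haha.wav", false)));
    }
}
```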
It should be noted that, in a chat scenario for example, when the target text includes an emoticon, adding the audio information corresponding to the emoticon to the played text sound makes the text information more vivid.
With the target text "you should ensure that information is allIs accurate and error-freeFor example, the mobile phone can play the text accurately in the process of playing the target text with soundIn the process of error-free audio information, the audio information corresponding to underlines is played at the same time and is used as the background sound of the accurate audio information. The details are as follows.
Firstly, the mobile phone recognizes that the position information of 'you should ensure that the information is accurate' in the target text is [1,11] in sequence, and the underlined position information is [8,11 ]. Then, the mobile phone obtains the audio information of "you", "should", "ok", "guarantee", "information", "all", "accurate", "ok", "none", "wrong", and the underlined audio information from the preset audio information library, respectively. And finally, playing the audio information of 'you' for the first time and the audio information of 'answer' for the second time according to the position information, and analogizing to play the audio information of 'OK', 'guarantee', 'information', 'all', 'accurate', 'confirmed', 'no', 'wrong'. And when the mobile phone starts to play the 'quasi' audio information, the underlined audio information (such as the sound drawn by a pencil) starts to be played until the 'wrong' audio information is played. That is, in the process of playing the audio information of the characters "correct", "exact", "none", and "wrong", the audio information corresponding to the underline is played.
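The background-sound behaviour for a characteristic font style can be sketched as follows; printing stands in for starting and stopping a second audio track, which the patent does not specify, and all file names are hypothetical. The [8,11] span matches the underlined example above:

```java
public class BackgroundSoundPlayback {
    // Plays character clips in order; while positions start..end (the underlined
    // span, 1-indexed) are playing, a background clip for the characteristic
    // font style is mixed in.
    static void playWithBackground(String[] characterClips, int start, int end,
                                   String backgroundClip) {
        for (int pos = 1; pos <= characterClips.length; pos++) {
            if (pos == start) {
                System.out.println("start background: " + backgroundClip);
            }
            System.out.println("play " + characterClips[pos - 1]);
            if (pos == end) {
                System.out.println("stop background: " + backgroundClip);
            }
        }
    }

    public static void main(String[] args) {
        String[] clips = new String[11];
        for (int i = 0; i < 11; i++) {
            clips[i] = "char" + (i + 1) + ".wav";   // the eleven character clips
        }
        playWithBackground(clips, 8, 11, "pencil_line.wav");
    }
}
```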
Take the target text "不知天上宫阙①，今夕是何年" as an example. During audible playback, the mobile phone may insert, into the playback of the text's own audio information, the audio information of the annotation text corresponding to the annotation symbol "①" (which explains that "宫阙" refers to a palace, so called because of the pair of watchtowers outside the palace gate), as follows.
First, the mobile phone recognizes that the position information of "不知天上宫阙" in the target text is [1,6], that of the annotation symbol "①" is [7,7], that of the comma "，" is [8,8], and that of "今夕是何年" is [9,13]. Then, the mobile phone obtains, from the preset audio information library, the audio information of each character in the text and of each character in the annotation text. Finally, the mobile phone plays the target text together with the audio information of the annotation text corresponding to "①".
In one possible implementation, the mobile phone sequentially plays the audio information corresponding to the characters and to the annotation text according to their order of arrangement in the target text: it first plays the audio information of "不知天上宫阙", then plays the audio information of the annotation text, and finally plays the audio information of "今夕是何年".
In another possible implementation, the mobile phone plays the audio information corresponding to the annotation text after the audio information of all the characters of the sentence containing the annotation symbol has been played: after playing the audio information of "不知天上宫阙，今夕是何年" in order, it plays the audio information of the annotation text.
In this embodiment, playing the audio information corresponding to the annotation text during playback of the character audio lets the user understand the text information in more detail.
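Both annotation playing manners amount to choosing where the annotation text's clips are spliced into the sentence's clips. A minimal sketch under that assumption (all names hypothetical), building the playback order for either implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotationPlayback {
    // Builds the playback order for a sentence with one annotation symbol.
    // inline == true inserts the annotation clips right after the annotated
    // position; inline == false defers them to the end of the sentence.
    static List<String> playbackOrder(List<String> sentenceClips,
                                      int annotationPosition,
                                      List<String> annotationClips,
                                      boolean inline) {
        List<String> order = new ArrayList<>();
        for (int pos = 1; pos <= sentenceClips.size(); pos++) {
            order.add(sentenceClips.get(pos - 1));
            if (inline && pos == annotationPosition) {
                order.addAll(annotationClips);
            }
        }
        if (!inline) {
            order.addAll(annotationClips);
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("s1.wav", "s2.wav", "s3.wav");
        List<String> note = List.of("note.wav");
        System.out.println(playbackOrder(sentence, 2, note, true));  // inline
        System.out.println(playbackOrder(sentence, 2, note, false)); // deferred
    }
}
```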
Take the case where the target text is the program code shown in fig. 7, which contains a plurality of composition control symbols. During audible playback, the mobile phone may insert the audio information corresponding to the composition control symbols into the playback of the audio information of the words in the code.
It should be noted that, in general, the mobile phone does not display composition control symbols when displaying program code, to avoid affecting the user's reading experience. For ease of description, however, fig. 7 shows the program code with its composition control symbols displayed.
During playback of the seventh line of the program code, the mobile phone first recognizes that the position information of the tab symbol, "dependencies", the space, "{", and the line break contained in it is 1, 2, 3, 4, and 5, respectively. Then, the mobile phone obtains the audio information of the tab symbol, "dependencies", the space, "{", and the line break from the preset audio information library, and plays them in order according to the position information, thereby expressing the layout information of the text during audible playback.
In an example, the audio information of the composition control symbols, such as the tab, space, and line-break symbols, may correspond to different keystroke sounds; this embodiment does not limit the specific content of the audio information.
It should be noted that, in text, a space is usually used only to separate two words or symbols. Therefore, although a space occupies one display position, the mobile phone may skip playing audio information for it during audible playback.
Take the case where the target text is the program code shown in fig. 8, which contains several pieces of annotation information. The text in lines 4 to 6 is annotation information, comprising the annotation symbols "/*" and "*/" and the annotation text "here, a specific url of jcenter is to be specified". Line 11 is annotation information comprising the annotation symbol "//" and the annotation text "version must be 3.2.1 or above".
In one possible implementation, when the terminal device encounters annotation information while audibly playing the program code, it may skip the audio information corresponding to the annotation symbols and the annotation text. For example, when playing the program code shown in fig. 8, the audio information of the text in lines 4 to 6 and line 11 may not be played.
In another possible implementation, when the terminal device encounters annotation information while audibly playing the program code, it may sequentially play the audio information corresponding to the annotation symbol and the annotation text according to their position information. The audio information corresponding to the annotation symbol may be [ding-dong], [code annotation], or the like, where [ding-dong] denotes a sound effect.
Illustratively, when the terminal device audibly plays lines 4 to 6 of fig. 8, it may play: [ding-dong] "here, a specific url of jcenter is to be specified", or: [code annotation] "here, a specific url of jcenter is to be specified". When playing line 11 of fig. 8, it may play: [ding-dong] "version must be 3.2.1 or above", or: [code annotation] "version must be 3.2.1 or above".
In addition, the non-character information of the target text may include certain first symbols that are pronounced differently in different language scenarios. Illustratively, the first symbol may be "/": in the text "s/he" it reads [or], while in the text "if a = 100 and b = 20, please calculate the value of a/b" it reads [divided by].
To enable the mobile phone to accurately play the first symbol in the target text, referring to fig. 9, this embodiment further provides a text audio playing method comprising the following steps S901-S904.
S901, the mobile phone identifies a first symbol in the target text.
In this embodiment, the mobile phone maintains a first symbol list containing the identification information of a plurality of first symbols. For each character in the text, the mobile phone compares its identification information against the first symbol list; if the identification information is found in the list, the character is determined to be a first symbol.
S902, the mobile phone determines the application type of the first symbol according to the semantic meaning of the target text.
In this embodiment, the application type indicates whether a symbol is used as a punctuation mark, as a mathematical symbol, as an annotation symbol, or the like, and a first symbol has audio information corresponding to at least two application types.
In some embodiments, the mobile phone may identify the semantics of the target text from keywords in it and thereby determine the application type of the first symbol. For example, by identifying words such as "calculate", "numerical", "compare", and "absolute", the mobile phone can determine that the target text describes information related to mathematical operations and that the application type of the first symbol is a mathematical symbol. Taking the target text "if a = 100 and b = 20, please calculate the value of a/b" as an example, the mobile phone may determine the application type of "/" to be a mathematical symbol from words such as "calculate" and "value".
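A keyword-driven determination of the application type might look like the following sketch. The keyword list and the type names returned here are illustrative assumptions, since the patent only states that semantics are identified from keywords in the target text:

```java
import java.util.List;

public class ApplicationTypeResolver {
    // Hypothetical cue words indicating that the text describes a mathematical
    // operation, so that "/" should be read as [divided by] rather than [or].
    private static final List<String> MATH_KEYWORDS =
            List.of("calculate", "numerical", "compare", "absolute", "value");

    static String resolveSlash(String targetText) {
        for (String keyword : MATH_KEYWORDS) {
            if (targetText.contains(keyword)) {
                return "mathematical symbol";   // -> audio information 8-2 [divided by]
            }
        }
        return "punctuation mark";              // default -> audio information 8-1 [or]
    }

    public static void main(String[] args) {
        System.out.println(resolveSlash("if a = 100 and b = 20, please calculate the value of a/b"));
        System.out.println(resolveSlash("s/he will attend"));
    }
}
```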
S903, the mobile phone determines the audio information corresponding to the first symbol according to the application type of the first symbol.
In the preset audio information library, a first symbol has audio information corresponding to at least two application types. Taking the first symbol "/" as an example, in conjunction with table 2, its audio information includes audio information 8-1: [or], and audio information 8-2: [divided by].
For the target text "if a is 100, b is 20, please calculate the value of a/b", since the application type of "/" is a mathematical symbol, the corresponding audio information is audio information 8-2: [ divide ] by.
S904, the mobile phone audibly plays the target text according to the audio information corresponding to the first symbol.
Taking the target text "if a is 100, b is 20, please calculate the value of a/b" as an example, the mobile phone can identify the position information of each character in the target text during the process of playing the target text audibly, wherein the position information of the punctuation mark "/" is [17,17 ]. In the process of playing the audio information of each character in sequence, the mobile phone plays the audio information 8-2 of "/" at the playing time corresponding to the 17 th character: [ divide ] by. Thus, rather than reciting the target text "if a equals 100, b equals 20, please calculate the value of a/b" as [ if a equals one hundred b equals twenty please calculate the value of a divided by b ], the target text is read incorrectly as [ if a equals one hundred b equals twenty please calculate the value of a or b ].
It should be noted that, in the above embodiments, the order in which the mobile phone acquires the audio information from the audio information library and determines the position information is not limited. That is, the mobile phone may determine the position information first and then acquire the audio information, or may acquire the audio information first and then determine the position information.
In summary, the method for playing text with sound provided by this embodiment plays the audio information corresponding to the non-character information in the process of playing the target text with sound, so as to fully express the non-character information in the text, improve the expression effect of the terminal device on the text information, and improve the user experience.
Corresponding to the method for playing the text with sound shown in the above embodiments, the present embodiment also provides a device for playing the text with sound. For convenience of explanation, only portions related to the embodiments of the present application are shown.
Referring to fig. 10, the audio playback apparatus of a text provided in the present embodiment includes a recognition unit 1001, a determination unit 1002, and a playback control unit 1003.
The recognition unit 1001 is configured to recognize non-character information in the target text.
The determining unit 1002 is configured to determine audio information corresponding to the non-text information.
And a playing control unit 1003, configured to play the target text in a voiced manner according to the audio information corresponding to the non-character information.
Optionally, the non-text information includes emoticons, composition control symbols, punctuation symbols, mathematical symbols, annotation symbols, or characteristic font styles of the text.
Optionally, the determining unit 1002 is further configured to determine, according to the identification information of the non-character information, audio information corresponding to the non-character information from a preset audio information library.
Optionally, when the target text includes text and non-text information, the determining unit 1002 is further configured to determine, if the non-text information is a first symbol, an application type of the first symbol according to semantics of the target text, where the first symbol includes audio information corresponding to at least two application types; and determining the audio information of the first symbol according to the application type of the first symbol.
Optionally, when the target text includes a text and the non-text information, the play control unit 1003 is further configured to, if the non-text information is an emoticon, a composition control symbol, a punctuation mark, or a mathematical symbol, sequentially play the audio information corresponding to the text and the audio information corresponding to the non-text information according to the arrangement order of the text and the non-text information in the target text.
Optionally, when the target text includes a text and the non-text information, the play control unit 1003 is further configured to identify an annotation text corresponding to the annotation symbol if the non-text information is the annotation symbol; and sequentially playing the audio information corresponding to the characters and the audio information corresponding to the annotation characters according to the arrangement sequence of the characters and the annotation symbols in the target text.
Optionally, when the target text includes a text and the non-text information, the play control unit 1003 is further configured to identify an annotation text corresponding to the annotation symbol if the non-text information is the annotation symbol; and after the audio information of all the characters of the sentence where the non-character information is located is played, the audio information corresponding to the annotation character is played.
Optionally, when the target text includes text and non-text information, the playing control unit 1003 is further configured to, if the non-text information is the characteristic font style, play the audio information corresponding to the characteristic font style as a background sound while playing the audio information corresponding to the characters having the characteristic font style.
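As a rough sketch of such background-sound playback, the following mixes an attenuated style cue under the character audio, assuming both clips are available as NumPy sample arrays at the same sampling rate (the clips, gain, and mixing scheme are illustrative assumptions):

```python
import numpy as np

def mix_with_background(character_audio: np.ndarray,
                        style_audio: np.ndarray,
                        background_gain: float = 0.3) -> np.ndarray:
    """Overlay the style cue, attenuated, under the character audio."""
    # Tile or truncate the background so the lengths match.
    reps = -(-len(character_audio) // len(style_audio))  # ceiling division
    background = np.tile(style_audio, reps)[:len(character_audio)]
    return character_audio + background_gain * background

# Toy one-second clips at 16 kHz standing in for real audio information.
rate = 16000
t = np.arange(rate) / rate
speech = 0.5 * np.sin(2 * np.pi * 220 * t)  # stands in for character audio
cue = 0.5 * np.sin(2 * np.pi * 880 * t)     # stands in for the style's cue
mixed = mix_with_background(speech, cue)
```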
The present embodiment also provides a terminal device, which includes a speaker, a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for playing text with sound provided in the foregoing embodiment. Illustratively, the terminal device may be as shown in fig. 1.
The present embodiment also provides a computer-readable storage medium, which stores a computer program that, when being executed by a processor, implements the steps of the above-described method embodiments.
The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the text audio playing device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunication signal.
The embodiment of the application also provides a computer program product containing instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
An embodiment of the present application provides a chip system, where the chip system includes a processor, the processor is coupled with a memory, and the processor executes a computer program stored in the memory to implement the method for playing a text with sound as provided in the embodiment of the present application. In this embodiment, the chip system may be a single chip or a chip module formed by a plurality of chips.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM and RAM.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for audio playback of text, comprising:
identifying non-word information in the target text;
determining audio information corresponding to the non-character information;
and according to the audio information corresponding to the non-character information, the target text is played in a sound mode.
2. The method of claim 1, wherein the non-textual information includes emoticons, composition control symbols, punctuation symbols, mathematical symbols, annotation symbols, or characteristic font styles of text.
3. The method of claim 1 or 2, wherein when the target text comprises words and the non-word information, determining the audio information corresponding to the non-word information comprises:
if the non-character information is a first symbol, determining the application type of the first symbol according to the semantics of the target text; the first symbol comprises audio information corresponding to at least two application types;
and determining the audio information of the first symbol according to the application type of the first symbol.
4. The method of claim 2, wherein when the target text includes words and the non-word information, audibly playing the target text according to audio information corresponding to the non-word information, comprises:
and if the non-character information is the expression symbol, the typesetting control symbol, the punctuation symbol, the mathematic symbol or the annotation symbol, sequentially playing the audio information corresponding to the characters and the audio information corresponding to the non-character information according to the arrangement sequence of the characters and the non-character information in the target text.
5. The method of claim 2, wherein when the target text includes words and the non-word information, audibly playing the target text according to audio information corresponding to the non-word information, comprises:
if the non-character information is the annotation symbol, identifying the annotation character corresponding to the annotation symbol;
and sequentially playing the audio information corresponding to the characters and the audio information corresponding to the annotation characters according to the arrangement sequence of the characters and the annotation symbols in the target text.
6. The method of claim 2, wherein when the target text includes words and the non-word information, audibly playing the target text according to audio information corresponding to the non-word information, comprises:
if the non-character information is the annotation symbol, identifying the annotation character corresponding to the annotation symbol;
and after the audio information of all the characters of the sentence where the non-character information is located is played, the audio information corresponding to the annotation character is played.
7. The method of claim 2, wherein when the target text includes text and non-text information, audibly playing the target text according to audio information corresponding to the non-text information, comprises:
and if the non-character information is the characteristic font style, playing the audio information corresponding to the character with the characteristic font style while playing the audio information corresponding to the character with the characteristic font style as the background sound of the audio information corresponding to the character with the characteristic font style.
8. The method according to any one of claims 1-7, wherein the determining the audio information corresponding to the non-textual information comprises:
and determining the audio information corresponding to the non-character information from a preset audio information library according to the identification information of the non-character information.
9. An apparatus for audio playback of text, the apparatus comprising:
the identification unit is used for identifying non-character information in the target text;
the determining unit is used for determining the audio information corresponding to the non-character information;
and the playing control unit is used for playing the target text in a sound mode according to the audio information corresponding to the non-character information.
10. A terminal device comprising a speaker, a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202010603452.4A 2020-06-29 2020-06-29 Text audio playing method and device and terminal equipment Pending CN113936638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010603452.4A CN113936638A (en) 2020-06-29 2020-06-29 Text audio playing method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010603452.4A CN113936638A (en) 2020-06-29 2020-06-29 Text audio playing method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN113936638A true CN113936638A (en) 2022-01-14

Family

ID=79272731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010603452.4A Pending CN113936638A (en) 2020-06-29 2020-06-29 Text audio playing method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN113936638A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination