CN113779958A - Text processing method and device and reading assisting method and device - Google Patents

Text processing method and device and reading assisting method and device

Info

Publication number
CN113779958A
CN113779958A (Application CN202110909261.5A)
Authority
CN
China
Prior art keywords
text
audio data
data
read
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110909261.5A
Other languages
Chinese (zh)
Inventor
张微微
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202110909261.5A priority Critical patent/CN113779958A/en
Publication of CN113779958A publication Critical patent/CN113779958A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L 13/10 Prosody rules derived from text; Stress or intonation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a text processing method and device and a reading assistance method and device, and relates to the technical field of text processing. The text processing method comprises the following steps: determining text parsing data corresponding to a text to be read based on the text to be read, wherein the text parsing data comprises data capable of representing text content characteristics of the text to be read; determining reference audio data corresponding to the text to be read based on the text to be read; and generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, wherein the reading audio data is used for assisting a reader of the text to be read in reading the text to be read. The method and the device generate reading audio data that better matches the specific text content of the text to be read, thereby bringing the reader an immersive reading experience based on the reading audio data and improving the reader's sense of immersion and the user experience.

Description

Text processing method and device and reading assisting method and device
Technical Field
The invention relates to the technical field of text processing, in particular to a text processing method and device, a computer readable storage medium and electronic equipment.
Background
In recent years, with the rapid development of text processing technology, text-based reading modes have become increasingly diversified, such as the emerging listen-to-book (audiobook) mode. Existing audiobook modes mainly fall into two types: the first is audio recorded in advance by a human narrator (anchor), and the second converts text into synthesized speech using Text-To-Speech (TTS) technology.
However, although the first type can record voices with different timbres for different characters, it is costly and inefficient. The second type can only read in a single timbre, cannot match different timbres to different characters, and cannot provide corresponding background sound effects. Therefore, the existing audiobook modes find it difficult to bring readers an immersive reading experience while ensuring low cost and high efficiency, and the user experience is poor.
Disclosure of Invention
The present invention has been made to solve the above-mentioned problems. Embodiments of the invention provide a text processing method and device, a reading assistance method and device, a computer-readable storage medium, and an electronic device.
In a first aspect, an embodiment of the present invention provides a text processing method, the method comprising: determining text parsing data corresponding to a text to be read based on the text to be read, wherein the text parsing data comprises data capable of representing text content characteristics of the text to be read; determining reference audio data corresponding to the text to be read based on the text to be read; and generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, wherein the reading audio data is used for assisting a reader of the text to be read in reading the text to be read.
In an embodiment of the present invention, the text to be read includes a character dialogue sentence, the character dialogue sentence corresponds to a character, and the text parsing data includes character feature data and character dialogue content data corresponding to the character. Generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data comprises: determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data; generating dialogue audio data corresponding to the character based on the pronunciation feature information and the character dialogue content data; and generating the reading audio data based on the dialogue audio data.
In an embodiment of the present invention, the character feature data includes character identity data, and the reference audio data includes film-and-television audio data extracted from film/TV video data corresponding to the text to be read. Determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data comprises: extracting audio material data corresponding to the character based on the character identity data and the film-and-television audio data; and determining the pronunciation feature information corresponding to the character based on the audio material data.
In an embodiment of the present invention, the character feature data includes at least one of age data, gender data, and occupation data, and the reference audio data includes a plurality of character samples and pronunciation feature information corresponding to each of the character samples. Determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data comprises: determining a character sample that matches the character based on the character feature data and the plurality of character samples; and determining the pronunciation feature information corresponding to the character based on the pronunciation feature information corresponding to the matched character sample.
In an embodiment of the present invention, determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data comprises: acquiring character pronunciation selection information issued by the reader based on the character feature data and the reference audio data; and determining the pronunciation feature information corresponding to the character based on the character pronunciation selection information and the reference audio data.
In an embodiment of the present invention, the text to be read includes voice-over text, and the reference audio data includes voice-over pronunciation feature information. Generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data comprises: generating voice-over audio data corresponding to the voice-over text based on the voice-over pronunciation feature information and the voice-over text; and generating the reading audio data based on the voice-over audio data.
In an embodiment of the present invention, the text parsing data includes atmosphere text data, and the reference audio data includes atmosphere sound-effect data. Generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data comprises: determining atmosphere reading tag information corresponding to the text to be read based on the atmosphere text data; and generating the reading audio data based on the atmosphere reading tag information and the atmosphere sound-effect data.
In a second aspect, an embodiment of the present invention provides a reading assistance method, comprising: acquiring reading audio data corresponding to a text to be read determined by a reader, wherein the reading audio data is determined based on the text processing method of the first aspect; and playing the reading audio data to assist the reader in reading the text to be read.
In an embodiment of the present invention, the method further comprises: acquiring pronunciation switching information issued by the reader, wherein the pronunciation switching information includes voice-over pronunciation switching information and/or character pronunciation switching information; and updating the reading audio data based on the pronunciation switching information to obtain updated reading audio data. Playing the reading audio data to assist the reader in reading the text to be read then comprises: switching to playing the updated reading audio data from the update time point, so as to assist the reader in reading the text to be read.
In a third aspect, an embodiment of the present invention provides a text processing apparatus, comprising a first determining module, a second determining module, and a generating module. The first determining module is configured to determine, based on a text to be read, text parsing data corresponding to the text to be read, the text parsing data comprising data capable of representing text content characteristics of the text to be read; the second determining module is configured to determine, based on the text to be read, reference audio data corresponding to the text to be read. The generating module is configured to generate, based on the reference audio data and the text parsing data, reading audio data corresponding to the text to be read, the reading audio data being used for assisting a reader of the text to be read in reading the text to be read.
In an embodiment of the present invention, the text to be read includes a character dialogue sentence, the character dialogue sentence corresponds to a character, and the text parsing data includes character feature data and character dialogue content data corresponding to the character. The generating module is further configured to determine pronunciation feature information corresponding to the character based on the character feature data and the reference audio data, generate dialogue audio data corresponding to the character based on the pronunciation feature information and the character dialogue content data, and generate the reading audio data based on the dialogue audio data.
In an embodiment of the present invention, the character feature data includes character identity data, and the reference audio data includes film-and-television audio data extracted from film/TV video data corresponding to the text to be read. The generating module is further configured to extract audio material data corresponding to the character based on the character identity data and the film-and-television audio data, and to determine the pronunciation feature information corresponding to the character based on the audio material data.
In an embodiment of the present invention, the character feature data includes at least one of age data, gender data, and occupation data, and the reference audio data includes a plurality of character samples and pronunciation feature information corresponding to each of the character samples. The generating module is further configured to determine a character sample that matches the character based on the character feature data and the plurality of character samples, and to determine the pronunciation feature information corresponding to the character based on the pronunciation feature information corresponding to the matched character sample.
In an embodiment of the present invention, the generating module is further configured to obtain character pronunciation selection information issued by the reader based on the character feature data and the reference audio data, and to determine the pronunciation feature information corresponding to the character based on the character pronunciation selection information and the reference audio data.
In an embodiment of the present invention, the text to be read includes voice-over text, and the reference audio data includes voice-over pronunciation feature information. The generating module is further configured to generate voice-over audio data corresponding to the voice-over text based on the voice-over pronunciation feature information and the voice-over text, and to generate the reading audio data based on the voice-over audio data.
In an embodiment of the present invention, the text parsing data includes atmosphere text data, and the reference audio data includes atmosphere sound-effect data. The generating module is further configured to determine atmosphere reading tag information corresponding to the text to be read based on the atmosphere text data, and to generate the reading audio data based on the atmosphere reading tag information and the atmosphere sound-effect data.
In a fourth aspect, an embodiment of the present invention provides a reading assistance apparatus, comprising a first acquiring module and a playing module. The first acquiring module is configured to acquire, based on a text to be read determined by a reader, reading audio data corresponding to the text to be read, wherein the reading audio data is determined based on the text processing method of the first aspect. The playing module is configured to play the reading audio data to assist the reader in reading the text to be read.
In an embodiment of the present invention, the apparatus further comprises: a second acquiring module configured to acquire pronunciation switching information issued by the reader, the pronunciation switching information including voice-over pronunciation switching information and/or character pronunciation switching information; and an updating module configured to update the reading audio data based on the pronunciation switching information to obtain updated reading audio data. The playing module is further configured to switch to playing the updated reading audio data from the update time point, so as to assist the reader in reading the text to be read.
In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of the first aspect and/or the second aspect.
In a sixth aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory for storing computer executable instructions. The processor is configured to execute computer-executable instructions to implement the method of the first aspect and/or the second aspect.
According to the text processing method provided by the embodiment of the invention, text parsing is performed on the text to be read to obtain text parsing data, and reading audio data corresponding to the text to be read is then generated based on the text parsing data, so that reading audio data better matching the specific text content of the text to be read is generated. In other words, the reference audio data provides rich audio material for generating the reading audio data, and the text parsing data improves the degree of fit between the generated reading audio data and the text to be read. Compared with the existing approach of having a narrator record the reading audio in advance, the embodiment of the invention requires no manual recording of audio data, and is therefore low-cost and efficient; compared with the existing approach of generating reading audio with TTS technology alone, the embodiment of the invention can not only match different characters with different timbres according to the actual needs of readers, but also match voice-over text with corresponding voice-over audio and atmosphere text with corresponding atmosphere sound effects, thereby truly bringing readers an immersive reading experience based on the reading audio data and improving the reader's sense of immersion and the user experience.
Drawings
Fig. 1 is a schematic view of a book-listening scenario based on a text to be read according to an embodiment of the present invention.
Fig. 2 is a schematic view of a book-listening scenario based on a text to be read according to another embodiment of the present invention.
Fig. 3 is a flowchart illustrating a text processing method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating a path for generating reading audio data corresponding to a text to be read according to an embodiment of the present invention.
Fig. 5 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to an embodiment of the present invention.
Fig. 6 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention.
Fig. 7 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention.
Fig. 8 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention.
Fig. 9 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention.
Fig. 10 is a flowchart illustrating a reading assistance method according to an embodiment of the invention.
Fig. 11 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present invention.
Fig. 12 is a schematic structural diagram of a reading assistance apparatus according to an embodiment of the present invention.
Fig. 13 is a schematic structural diagram of an apparatus for performing the text processing method and the reading assistance method according to an embodiment of the present invention.
Fig. 14 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
Hereinafter, example embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the invention, not all of them, and that the invention is not limited to the example embodiments described herein.
It is well known that reading, as an activity for obtaining information and knowledge, is extraordinarily important. However, the conventional reading method that relies entirely on text not only places high demands on the reading environment, but may also damage the reader's eyesight when reading for too long. Although the emerging audiobook mode can effectively alleviate the shortcomings of purely text-based reading, its user experience remains poor.
Specifically, the existing audiobook mode mainly relies on audio data recorded in advance by a narrator or reader; that is, listening-based reading is achieved by playing pre-recorded audio data. Such audio data is monotonous in form and cannot bring readers an immersive reading experience. In addition, readers cannot adjust the audio data according to their preferences, nor adjust related content in real time during reading (such as the pronunciation timbre of the characters in the text to be read) according to their actual needs; these shortcomings are especially obvious when the text to be read involves multiple characters and multiple scenes.
In order to solve the above problems, embodiments of the present invention provide a text processing method and apparatus and a reading assistance method and apparatus, so as to bring readers an immersive reading experience and improve the user experience.
A brief description of the book-listening scenario based on a text to be read is provided below with reference to fig. 1 and 2.
Fig. 1 is a schematic view of a book-listening scenario based on a text to be read according to an embodiment of the present invention. The book-listening scenario includes a server 110 and a user terminal 120 communicatively coupled to the server 110, where the user terminal 120 stores the text to be read. The server 110 is configured to execute the text processing method according to the embodiment of the present invention, and the user terminal 120 is configured to execute the reading assistance method according to the embodiment of the present invention.
Illustratively, in actual use, the user terminal 120 receives a reading assistance request issued by a reader and, based on that request, sends the locally stored text to be read to the server 110; the server 110 determines reading audio data corresponding to the text to be read and returns it to the user terminal 120, which receives and plays the reading audio data to assist the reader in reading the text to be read.
Fig. 2 is a schematic view of a book-listening scenario based on a text to be read according to another embodiment of the present invention. The main difference from the scenario shown in fig. 1 is that in fig. 2 the text to be read is stored in the server 110. In actual use, the user terminal 120 receives a reading assistance request issued by a reader together with selection information for the text to be read, and sends the selection information to the server 110 based on the request; the server 110 executes the text processing method on the text selected by the reader and sends the generated reading audio data to the user terminal 120. It is to be understood that the server 110 may also send the reading audio data and the text to be read together to the user terminal 120.
The user terminal 120 mentioned above is, for example, the reader's mobile terminal, such as a tablet computer or a mobile phone.
As can be seen from the scenarios shown in fig. 1 and fig. 2, the text processing method provided by the present invention may be executed on a server, and the reading assistance method may correspondingly be executed on a user terminal, so as to implement a book-listening scenario based on a text to be read. It can be understood that the text processing method may also be executed directly on the user terminal, thereby avoiding data interaction between the server and the user terminal and improving the real-time performance of book listening.
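As an illustration only, the exchange in the fig. 1 scenario might look like the following sketch on the user terminal side; the endpoint URL and payload format are assumptions made for illustration and are not part of the disclosed embodiments.

```python
# Illustrative sketch of the fig. 1 interaction; names are hypothetical.
import json
import urllib.request

SERVER_URL = "http://server.example/read-aloud"  # hypothetical endpoint

def request_reading_audio(text_to_read: str) -> bytes:
    """User terminal side: send the text to be read to the server and
    receive the generated reading audio data back for local playback."""
    payload = json.dumps({"text": text_to_read}).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # e.g. WAV/MP3 bytes to be played locally
```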
The text processing method of the present invention will be briefly described with reference to fig. 3 to 9.
Fig. 3 is a flowchart illustrating a text processing method according to an embodiment of the present invention. As shown in fig. 3, the text processing method provided in the embodiment of the present invention includes the following steps.
Step S100, determining text parsing data corresponding to the text to be read based on the text to be read.
In some embodiments, the text to be read is a novel text, such as a romance novel or a martial-arts (wuxia) novel.
Illustratively, the text parsing data includes data capable of characterizing the text content of the text to be read, such as character feature data and character dialogue content data corresponding to characters in the text to be read. Character feature data refers to data capable of representing a character's features, and character dialogue content data refers to the dialogue sentence data corresponding to the character. For example, if the text to be read is the novel "Legend of Zhen Huan", the text parsing data may include parsing data for the character "Zhen Huan": the character is "Zhen Huan", the character feature data is the name "Zhen Huan", and the character dialogue content data is the dialogue sentences spoken by "Zhen Huan".
Step S200, determining reference audio data corresponding to the text to be read based on the text to be read.
In some embodiments, the reference audio data corresponding to the text to be read refers to audio data of the same type as the text to be read. For example, if the text to be read is a martial-arts novel, the reference audio data is audio data extracted from martial-arts film and television works. This arrangement makes the reference audio data fit the text to be read better. For example, fight-scene audio in the reference audio data can be bound to the relevant paragraphs of the text to be read, so as to generate more vivid and rich reading audio data. The binding referred to here may be understood as a timing binding.
For example, suppose the text to be read is a martial-arts novel and the reference audio data is audio data extracted from martial-arts film and television works. If the text to be read includes paragraphs describing a fight, those paragraphs can be tagged using natural language processing (NLP) technology, the fight-scene audio in the reference audio data can be extracted, and the fight-scene audio can be bound to the tagged paragraphs; accordingly, when those paragraphs are reached during the actual playback of the reading audio data, the fight-scene audio is automatically played.
For another example, suppose the text to be read is a romance novel and the reference audio data is audio data extracted from romance film and television works. If the text to be read contains weather-related paragraphs, the paragraphs describing weather (such as rain) can be tagged using NLP technology, the weather-related audio (such as the sound of rain) in the reference audio data can be extracted and bound to the tagged paragraphs; accordingly, when those paragraphs are reached during the actual playback of the reading audio data, the weather-related audio (such as the sound of rain) is automatically played.
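The paragraph tagging and timing binding described above might be sketched as follows; the keyword lexicon stands in for a real NLP tagger, and all names are illustrative assumptions rather than the disclosed implementation.

```python
# Sketch of binding scene audio to tagged paragraphs of the text to be read.
from dataclasses import dataclass

SCENE_KEYWORDS = {                      # hypothetical tag lexicon
    "fight": ["sword", "strike", "duel"],
    "rain":  ["rain", "drizzle", "downpour"],
}

@dataclass
class ParagraphBinding:
    paragraph_index: int   # which paragraph of the text to be read
    scene_tag: str         # e.g. "fight" or "rain"
    audio_clip: str        # id of the extracted reference audio clip

def bind_scene_audio(paragraphs, scene_clips):
    """Tag paragraphs by scene and bind each tag to a reference audio clip,
    so the clip plays when that paragraph is reached (a timing binding)."""
    bindings = []
    for i, para in enumerate(paragraphs):
        text = para.lower()
        for tag, words in SCENE_KEYWORDS.items():
            if any(w in text for w in words) and tag in scene_clips:
                bindings.append(ParagraphBinding(i, tag, scene_clips[tag]))
    return bindings
```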
Step S300, generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data.
Illustratively, the reading audio data is used to assist a reader of the text to be read in reading the text to be read. For example, the reading audio data is audio data completely corresponding to all the text contents in the text to be read, so that the reader can know the related contents recorded in the text to be read only by listening to the reading audio data without watching the text to be read any more, that is, the purpose of listening to the book is achieved.
Illustratively, in practical application, text parsing data corresponding to the text to be read is first determined based on the text to be read, reference audio data corresponding to the text to be read is determined, and reading audio data corresponding to the text to be read is then generated from both, as sketched below.
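A minimal sketch of the S100, S200, S300 flow follows, with stub helpers standing in for the concrete embodiments described below; the function names and data shapes are assumptions, not the disclosed implementation.

```python
def parse_text(text: str) -> dict:
    """S100 (stub): extract text parsing data, e.g. character dialogue,
    voice-over and atmosphere text; a real system would use an NLP model."""
    return {"dialogue": [], "voice_over": text, "atmosphere": []}

def select_reference_audio(text: str) -> dict:
    """S200 (stub): pick reference audio of the same type as the text."""
    return {"timbres": {}, "ambience": {}}

def synthesize(reference: dict, parsing: dict) -> bytes:
    """S300 (stub): combine parsing data and reference audio into
    reading audio data (here: no real synthesis, empty bytes)."""
    return b""

def generate_reading_audio(text_to_read: str) -> bytes:
    parsing_data = parse_text(text_to_read)
    reference_audio = select_reference_audio(text_to_read)
    return synthesize(reference_audio, parsing_data)
```

In the embodiments below, each stub would be replaced by the concrete processing of the corresponding figure.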
According to the text processing method provided by the embodiment of the invention, text parsing is performed on the text to be read to obtain text parsing data, and reading audio data corresponding to the text to be read is then generated based on the text parsing data, so that reading audio data better matching the specific text content of the text to be read is generated. The reference audio data provides rich audio material for generating the reading audio data, and the text parsing data improves the fit between the generated reading audio data and the text to be read, thereby bringing the reader an immersive reading experience based on the reading audio data and improving the reader's sense of immersion and the user experience.
In some embodiments, the execution subject of the embodiment shown in fig. 3 is a server connected to a user terminal, and the text to be read is stored in the user terminal. Correspondingly, the specific implementation manner of step S100 is: and determining text analysis data corresponding to the text to be read based on the text to be read acquired from the user terminal. After step S300, the server further needs to perform the following steps: and sending the read audio data to the user terminal.
A specific implementation manner of generating reading audio data corresponding to a text to be read is illustrated in detail below with reference to fig. 4 to 7.
Fig. 4 is a schematic diagram illustrating paths for generating reading audio data corresponding to a text to be read according to an embodiment of the present invention. As shown in fig. 4, embodiments of the present invention provide three paths for generating reading audio data corresponding to a text to be read, described as follows.
Path one: film and television video data corresponding to the text to be read. That is, the reading audio data corresponding to the text to be read is generated from the film/TV video data corresponding to the text to be read.
Illustratively, if the text to be read is the novel "The Legend of the Condor Heroes", the film/TV video data corresponding to it is the video data of the TV drama "The Legend of the Condor Heroes". For another example, if the text to be read is the novel "Demi-Gods and Semi-Devils", the corresponding film/TV video data is the video data of the TV drama "Demi-Gods and Semi-Devils".
Generating the reading audio data from the film/TV video data corresponding to the text to be read can greatly improve the fit between the text to be read and the generated reading audio data, further improving the user experience. In particular, when the audio data includes a character's dialogue audio data, the film/TV video data corresponding to the text to be read can be used to optimize the reading effect; for example, the "Guo Jing" character in the novel "The Legend of the Condor Heroes" is matched with the pronunciation timbre of the "Guo Jing" character in the TV drama video data of "The Legend of the Condor Heroes".
Exemplarily, the audio data of the "Guo Jing" character is extracted from the TV drama video data of "The Legend of the Condor Heroes", and the extracted audio data is then processed using speech recognition technology to obtain the pronunciation timbre of the "Guo Jing" character; that is, the pronunciation timbre is extracted from the character's audio data by means of speech recognition. Alternatively, the audio data of the "Guo Jing" character is extracted and input into a pre-trained timbre extraction model to obtain the character's pronunciation timbre. The timbre extraction model is a neural network model whose training data comprise audio data samples to be extracted and the pronunciation timbre parameters corresponding to those samples. In practice, an initial network model is first established and then trained on the training data, yielding a timbre extraction model capable of extracting pronunciation timbre.
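A minimal sketch of training such a timbre extraction model, assuming fixed-length mel-spectrogram inputs and an annotated timbre parameter vector as the regression target; PyTorch and the network shape are illustrative choices, not the disclosed implementation.

```python
import torch
import torch.nn as nn

class TimbreExtractor(nn.Module):
    """Maps a mel-spectrogram to a timbre parameter vector."""
    def __init__(self, n_mels=80, timbre_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mels, 256), nn.ReLU(),
            nn.Linear(256, timbre_dim))

    def forward(self, mel_frames):               # (batch, frames, n_mels)
        return self.net(mel_frames).mean(dim=1)  # average over time

def train(model, loader, epochs=10):
    """`loader` yields (mel_frames, timbre_params) pairs built from the
    audio data samples and their annotated pronunciation timbre
    parameters, as described in the paragraph above."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for mel, target in loader:
            opt.zero_grad()
            loss = loss_fn(model(mel), target)
            loss.backward()
            opt.step()
```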
Path two: audio data of the same type as the text to be read. That is, the reading audio data corresponding to the text to be read is generated from audio data of the same type as the text to be read.
Illustratively, the text to be read is the novel "White Deer Plain", and the corresponding audio data is audio data extracted from the video data of the TV drama "Ordinary World". It can be understood that "White Deer Plain" and "Ordinary World" both belong to the realist category of rural subject matter.
Generating the reading audio data from audio data of the same type as the text to be read not only makes full use of the background audio and/or atmosphere audio in that audio data, but also greatly expands the range from which audio data can be acquired, avoiding the situation where no reading audio data can be generated because the text to be read has no exactly corresponding film/TV video data. In addition, it can bring readers a novel experience and improve the user experience.
Path three: character pronunciation selection information uttered by the reader. That is, reading audio data corresponding to the text to be read is generated according to the character pronunciation selection information uttered by the reader. Based on the path three, the user experience good sensitivity can be greatly improved.
Illustratively, the text to be read is the novel "Legend of Zhen Huan", and the audio data includes a character's dialogue audio data. The reader issues character pronunciation selection information for the "Imperial Physician Wen" character in "Legend of Zhen Huan", requesting that this character be matched with the pronunciation timbre (also called pronunciation feature) of the singer "Andy Lau (Liu Dehua)". Based on this requirement, the dialogue audio data of the "Imperial Physician Wen" character is generated with the pronunciation timbre of "Andy Lau", and the final reading audio data is then generated. It is understood that the pronunciation timbre of "Andy Lau" can be acquired in advance from songs he has sung.
Path one and path two are illustrated below with reference to fig. 5 and fig. 6, and path three with reference to fig. 7, to further clarify specific implementations of generating reading audio data corresponding to a text to be read.
Fig. 5 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to an embodiment of the present invention. The embodiment shown in fig. 5 of the present invention is extended from the embodiment shown in fig. 3 of the present invention, and the differences between the embodiment shown in fig. 5 and the embodiment shown in fig. 3 will be emphasized below, and the descriptions of the same parts will not be repeated.
Specifically, in the embodiment of the present invention, the text to be read includes a character dialogue sentence, and the character dialogue sentence corresponds to at least one character. That is, a character dialogue sentence may be a sentence uttered by a single character or a dialogue between different characters. The text parsing data includes character feature data and character dialogue content data corresponding to the characters.
As shown in fig. 5, in the text processing method according to the embodiment of the present invention, the step of generating the reading audio data based on the reference audio data and the text parsing data includes the following steps.
Step S221, determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data.
As described above, character feature data refers to data that can characterize a character, and character dialogue content data refers to the dialogue sentence data corresponding to the character. Illustratively, the pronunciation feature information corresponding to a character refers to pronunciation timbre feature information.
It is understood that the reference audio data mentioned in step S221 may be film/TV audio data corresponding to the text to be read (path one in the embodiment of fig. 4) or audio data of the same type as the text to be read (path two in the embodiment of fig. 4). If the reference audio data is film/TV audio data corresponding to the text to be read, the character feature data may include character identity data. In that case, step S221 may be performed as follows: extract audio material data corresponding to the character based on the character identity data and the film/TV audio data, then determine the pronunciation feature information corresponding to the character based on the audio material data. The audio material data may be, for example, a segment of audio spoken by the character.
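A sketch of extracting a character's audio material from film/TV audio follows, under the simplifying assumption that subtitle lines already carry speaker names; a real system would need speaker identification, and all names here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SubtitleLine:
    speaker: str     # character identity, e.g. "Guo Jing"
    start_s: float   # start time within the audio track, in seconds
    end_s: float     # end time within the audio track

def extract_character_clips(subtitles, character_name):
    """Return the (start, end) spans of the named character's speech;
    the spans can then be cut from the film/TV audio track and fed to
    a timbre extractor such as the one sketched above."""
    return [(s.start_s, s.end_s) for s in subtitles
            if s.speaker == character_name]
```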
In step S222, dialogue audio data corresponding to the character is generated based on the pronunciation feature information and the character dialogue content data.
The meanings of character feature data and character dialogue content data are exemplified below.
For example, the novel "Legend of Zhen Huan" contains the following passage.
Consort Duan smiled faintly and said to the person opposite: "I happened to see you standing alone by the pavilion at the lake, chatting and laughing happily with the attendant, without a trace of worry." Then, with a smile as light as drifting clouds, she turned to the maid at her side and said: "As you wish."
Correspondingly, the character feature data extracted from this passage includes "Consort Duan", and the character dialogue content data includes the two quoted dialogue sentences above.
It is understood that the dialogue audio data corresponding to the character refers to audio data, generated with the character's pronunciation feature information, that corresponds exactly to the character dialogue content data.
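A sketch of extracting character dialogue content data from such a passage follows, assuming dialogue is enclosed in straight double quotes and the speaker is named in the narration preceding the quote; a production system would use an NLP model for speaker attribution.

```python
import re

def extract_dialogue(passage: str, character: str):
    """Return the quoted sentences naively attributable to `character`."""
    quotes = re.findall(r'"([^"]+)"', passage)
    lines, cursor = [], 0
    for q in quotes:
        idx = passage.find('"' + q + '"', cursor)
        # Naive attribution: keep the quote if the character's name
        # appears in the narration since the previous quote.
        if character in passage[cursor:idx]:
            lines.append(q)
        cursor = idx + len(q) + 2  # move past the closing quote
    return lines
```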
In step S223, reading audio data is generated based on the dialogue audio data.
In some embodiments, the conversation audio data is read audio data. That is, step S223 is specifically performed in such a manner that the dialogue audio data is regarded as the reading audio data.
In other embodiments, the reading audio data may include not only the dialogue audio data but also the voice-over audio data and the atmosphere effect data, etc., as mentioned in the following embodiments.
The embodiment of the invention can match more appropriate pronunciation timbres to the characters in the text to be read. In particular, when the text to be read includes several different characters, more reasonable and vivid pronunciation timbres can be matched to the different characters, improving the interest of the generated reading audio data and enhancing the reader's sense of immersion while listening.
Fig. 6 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention. The embodiment shown in fig. 6 of the present invention is extended from the embodiment shown in fig. 5 of the present invention, and the differences between the embodiment shown in fig. 6 and the embodiment shown in fig. 5 will be emphasized below, and the descriptions of the same parts will not be repeated.
Specifically, in an embodiment of the present invention, the character feature data includes at least one of age data, gender data, and occupation data. The reference audio data may include a plurality of character samples and pronunciation characteristic information corresponding to each of the plurality of character samples.
As shown in fig. 6, in the text processing method according to the embodiment of the present invention, the step of generating the reading audio data based on the reference audio data and the text parsing data includes the following steps.
Step S224, determining a character sample that matches the character based on the character feature data and the plurality of character samples.
In some embodiments, the character characteristic data is derived by analyzing character dialog content data corresponding to the character.
Step S225 determines pronunciation feature information corresponding to the character based on pronunciation feature information corresponding to the character sample matched with the character.
Illustratively, the pronunciation feature information corresponding to the matched character sample is taken as the pronunciation feature information corresponding to the character.
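Steps S224/S225 might be sketched as follows, assuming character feature data and character samples are simple dictionaries; the scoring rule is an illustrative assumption, not the disclosed matching method.

```python
def match_character_sample(features, samples):
    """Pick the character sample sharing the most feature values with the
    character (age/gender/occupation), then reuse that sample's
    pronunciation feature information for the character (S225)."""
    def score(sample):
        return sum(1 for k in ("age", "gender", "occupation")
                   if k in features and sample.get(k) == features[k])
    best = max(samples, key=score)        # S224: best-matching sample
    return best["pronunciation_features"]
```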
According to the embodiment of the invention, a more reasonable pronunciation timbre can be matched to a character according to the character's actual features, improving the effect of reading assistance with the reading audio data. Especially when the text to be read includes several different characters, the reading audio data generated by the embodiment of the invention enables an interactive book-listening mode.
Fig. 7 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention. As shown in fig. 7, in the text processing method according to the embodiment of the present invention, the step of generating the reading audio data based on the reference audio data and the text parsing data includes the following steps.
In step S226, character pronunciation selection information issued by the reader based on the character feature data and the reference audio data is acquired.
Step S227, based on the character pronunciation selection information and the reference audio data, determines pronunciation feature information corresponding to the character.
That is to say, the embodiment of the invention allows the reader to select pronunciation timbres for the characters in the text to be read according to the reader's own preferences, and thus to generate reading audio data that meets the reader's personalized needs. This arrangement can further improve the user experience.
Fig. 8 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention. The embodiment shown in fig. 8 of the present invention is extended from the embodiment shown in fig. 3 of the present invention, and the differences between the embodiment shown in fig. 8 and the embodiment shown in fig. 3 will be emphasized below, and the descriptions of the same parts will not be repeated.
Specifically, in the embodiment of the present invention, the text to be read includes voice-over text, and the reference audio data includes voice-over pronunciation feature information. Correspondingly, the voice-over text in the text to be read can be obtained through text parsing (i.e., from the text parsing data). Illustratively, text in the text to be read that contains no character dialogue content is classified as voice-over text. As shown in fig. 8, the step of generating reading audio data based on the reference audio data and the text parsing data includes the following steps.
Step S310, generating voice-over audio data corresponding to the voice-over text based on the voice-over pronunciation feature information and the voice-over text.
Step S320, generating reading audio data based on the voice-over audio data.
In some embodiments, step S320 is implemented by determining the voice-over audio data as the reading audio data. In other embodiments, step S320 is implemented by combining the voice-over audio data with the dialogue audio data mentioned in the above embodiments to generate the reading audio data.
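A sketch of steps S310/S320 under the assumption of a generic, caller-supplied TTS function; `tts` is a placeholder, not a specific library API, and the concatenation rule is one illustrative way to combine segments.

```python
def generate_voice_over_audio(voice_over_text, voice_over_features, tts):
    """S310: synthesize the voice-over with the given pronunciation
    (timbre) features; `tts` is a caller-supplied synthesis function."""
    return tts(voice_over_text, timbre=voice_over_features)

def assemble_reading_audio(segments):
    """S320: `segments` is a list of (position_in_text, audio_bytes) for
    voice-over and dialogue pieces; concatenating in text order yields
    reading audio data covering the whole text to be read."""
    ordered = sorted(segments, key=lambda s: s[0])
    return b"".join(audio for _, audio in ordered)
```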
It should be noted that the pronunciation timbre of the voice-over text can also be switched according to the reader's actual needs; for example, the reader may designate a particular announcer's timbre for the voice-over text. This is not described in detail here.
The embodiment of the invention can further refine the reader's immersive reading experience by means of the voice-over audio data.
Fig. 9 is a schematic flow chart illustrating a process of generating reading audio data based on reference audio data and text parsing data according to another embodiment of the present invention. The embodiment shown in fig. 9 of the present invention is extended from the embodiment shown in fig. 3 of the present invention, and the differences between the embodiment shown in fig. 9 and the embodiment shown in fig. 3 will be emphasized below, and the descriptions of the same parts will not be repeated.
Specifically, in the embodiment of the present invention, the text parsing data includes atmosphere text data, and the reference audio data includes atmosphere sound-effect data. The atmosphere text data may include, for example, background-atmosphere text data and scene-atmosphere text data. As shown in fig. 9, the step of generating reading audio data based on the reference audio data and the text parsing data includes the following steps.
Step S410, determining atmosphere reading tag information corresponding to the text to be read based on the atmosphere text data.
Exemplarily, text parsing is performed on the text to be read to obtain the atmosphere text data, and corresponding atmosphere reading tag information is bound to the atmosphere text data. The atmosphere reading tag information marks the playback timing nodes of the atmosphere sound-effect data within the reading audio data.
Step S420, generating reading audio data based on the atmosphere reading tag information and the atmosphere sound-effect data.
For example, the atmosphere text data in the text to be read includes "a bustling city, with streams of traffic and surging crowds"; analysis shows that this atmosphere text data corresponds to the atmosphere reading tag information "cars, horns", and the corresponding atmosphere sound-effect data is car-horn audio data. Then, during actual playback, when this atmosphere text data is reached, the car-horn audio data can be played.
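Steps S410/S420 might be sketched as follows, assuming the atmosphere text spans have already been located in the text to be read; the tag lexicon and file names are illustrative assumptions.

```python
ATMOSPHERE_TAGS = {"traffic": "car_horn.wav"}   # tag -> sound-effect clip

def schedule_atmosphere(atmosphere_spans):
    """`atmosphere_spans`: list of (start_char, end_char, tag) tuples.
    Returns a playback schedule binding each sound-effect clip to its
    text span, so the clip plays while that span of the reading audio
    data is being played (the timing nodes of S410)."""
    return [(start, end, ATMOSPHERE_TAGS[tag])
            for start, end, tag in atmosphere_spans
            if tag in ATMOSPHERE_TAGS]
```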
The embodiment of the invention can further enrich the generated reading audio data with the atmosphere sound-effect data, further improving the reader's immersive experience.
In some embodiments, the reading audio data may be generated jointly from the atmosphere sound-effect data, the voice-over audio data, and the dialogue audio data; that is, the above embodiments may be freely combined according to actual needs to optimize the listening experience. Compared with the existing approach of having a narrator record the reading audio in advance, the embodiment of the invention requires no manual recording of audio data, and is low-cost and efficient; compared with the existing approach of generating reading audio with TTS technology alone, it can match different characters with different timbres according to readers' actual needs, match voice-over text with corresponding voice-over audio, and match atmosphere text with corresponding atmosphere sound effects, truly bringing readers an immersive reading experience based on the reading audio data and improving the reader's sense of immersion and the user experience.
Fig. 10 is a flowchart illustrating a reading assistance method according to an embodiment of the invention. Illustratively, the reading assistance method mentioned in the embodiment of the present invention may be executed on the reader's mobile terminal. As shown in fig. 10, the reading assistance method provided by the embodiment of the invention includes the following steps.
Step S500, acquiring reading audio data corresponding to the text to be read based on the text to be read determined by the reader.
For example, the reading audio data corresponding to the text to be read mentioned in step S500 may be determined based on the text processing method mentioned in any of the above embodiments.
Step S600, playing the reading audio data to assist the reader in reading the text to be read.
According to the reading assistance method provided by the embodiment of the invention, reading audio data corresponding to the text to be read determined by the reader is acquired and played, thereby assisting the reader in reading the text to be read. The embodiment of the invention brings readers an immersive reading experience based on the reading audio data, improving the reader's sense of immersion and the user experience.
In some embodiments, the execution subject of the embodiment shown in fig. 10 is a user terminal connected to the server, and the text to be read is stored in the user terminal. Correspondingly, the specific implementation manner of step S500 is: and acquiring reading audio data corresponding to the text to be read from the server based on the text to be read determined by the reader.
Another embodiment of the present invention extends the embodiment shown in fig. 10. The reading assistance method provided by this embodiment further comprises: acquiring pronunciation switching information issued by the reader, the pronunciation switching information including voice-over pronunciation switching information and/or character pronunciation switching information; and updating the reading audio data based on the pronunciation switching information to obtain updated reading audio data. The step of playing the reading audio data to assist the reader in reading the text to be read then comprises: switching to playing the updated reading audio data from the update time point, so as to assist the reader in reading the text to be read, as in the sketch below.
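A sketch of switching to the updated reading audio data at the update time point, assuming playback position is tracked in seconds; the player interface is hypothetical, not a disclosed API.

```python
class ReadingPlayer:
    """Minimal player model for the pronunciation-switching flow."""
    def __init__(self, audio):
        self.audio = audio       # current reading audio data
        self.position_s = 0.0    # current playback position, seconds

    def apply_switch(self, updated_audio):
        """On pronunciation switching information, swap in the updated
        reading audio data and resume it from the update time point,
        i.e. the position reached when the reader issued the switch."""
        update_point = self.position_s
        self.audio = updated_audio
        self.seek(update_point)

    def seek(self, t_s):
        self.position_s = t_s    # stub: a real player repositions the stream
```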
That is to say, the embodiment of the invention allows the reader to switch pronunciation timbres according to the specifics of the text while listening, which can further improve the user experience.
The method embodiment of the present invention is described in detail above with reference to fig. 3 to 10, and the apparatus embodiment of the present invention is described in detail below with reference to fig. 11 to 14. It is to be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, and therefore reference may be made to the preceding method embodiments for parts not described in detail.
Fig. 11 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present invention. As shown in fig. 11, the text processing apparatus according to the embodiment of the present invention includes a first determining module 100, a second determining module 200, and a generating module 300. The first determining module 100 is configured to determine text parsing data corresponding to the text to be read based on the text to be read. The second determining module 200 is configured to determine reference audio data corresponding to the text to be read based on the text to be read. The generating module 300 is configured to generate reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data.
In some embodiments, the generating module 200 is further configured to determine pronunciation feature information corresponding to the character based on the character feature data and the reference audio data, generate dialogue audio data corresponding to the character based on the pronunciation feature information and the character dialogue content data, and generate reading audio data based on the dialogue audio data.
In some embodiments, the generating module 300 is further configured to determine a character sample matching the character based on the character feature data and the plurality of character samples, and determine pronunciation feature information corresponding to the character based on the pronunciation feature information corresponding to the matched character sample.
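As a purely illustrative reading of this matching step, the sketch below scores stored character samples by how many feature fields (age, gender, occupation, per claim 4) agree with the character's feature data; the scoring rule and data layout are assumptions, since the embodiment only requires that some match be determined.

    # Assumed data layout: characters and samples are dicts with optional
    # "age"/"gender"/"occupation" fields; the counting rule is illustrative.
    def match_character_sample(character: dict, samples: list) -> dict:
        """Return the pronunciation features of the best-matching character sample."""
        def score(sample: dict) -> int:
            return sum(1 for key in ("age", "gender", "occupation")
                       if character.get(key) is not None
                       and character.get(key) == sample.get(key))
        best = max(samples, key=score)          # sample agreeing on the most fields
        return best["pronunciation_features"]   # reused as the character's voice profile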
In some embodiments, the generating module 300 is further configured to acquire character pronunciation selection information issued by the reader based on the character feature data and the reference audio data, and determine pronunciation feature information corresponding to the character based on the character pronunciation selection information and the reference audio data.
In some embodiments, the generating module 300 is further configured to generate voice-over audio data corresponding to the voice-over text based on the voice-over text pronunciation feature information and the voice-over text, and generate the reading audio data based on the voice-over audio data.
In some embodiments, the generating module 300 is further configured to determine, based on the atmosphere text data, atmosphere reading tag information corresponding to the text to be read, and generate the reading audio data based on the atmosphere reading tag information and the atmosphere sound effect data.
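One hypothetical realization of the atmosphere step is sketched below: atmosphere text data is mapped to an atmosphere reading tag by a toy keyword rule, and the tag selects a sound effect to mix under the narration. The tag vocabulary, keywords, and file names are invented for illustration and are not part of the patent.

    # All keywords, tags and file names below are invented for illustration.
    from typing import Optional

    ATMOSPHERE_EFFECTS = {"storm": "rain_loop.wav", "battle": "war_drums.wav", "calm": "birdsong.wav"}
    KEYWORD_TAGS = {"thunder": "storm", "rain": "storm", "sword": "battle", "meadow": "calm"}

    def atmosphere_tag(sentence: str) -> Optional[str]:
        """Derive an atmosphere reading tag from atmosphere text data (toy keyword rule)."""
        for word, tag in KEYWORD_TAGS.items():
            if word in sentence.lower():
                return tag
        return None

    def sound_effect_for(sentence: str) -> Optional[str]:
        """Select the atmosphere sound effect to mix under this sentence, if any."""
        tag = atmosphere_tag(sentence)
        return ATMOSPHERE_EFFECTS.get(tag) if tag else None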
Fig. 12 is a schematic structural diagram of an assistive reading device according to an embodiment of the present invention. As shown in fig. 12, the assistive reading device provided by the embodiment of the present invention includes a first obtaining module 500 and a playing module 600. The first obtaining module 500 is configured to obtain, based on a text to be read determined by a reader, reading audio data corresponding to the text to be read. The playing module 600 is configured to play the reading audio data to assist the reader in reading the text to be read.
In some embodiments, the assistive reading device provided by the embodiments of the present invention further includes a second acquisition module and an updating module. The second acquisition module is configured to acquire pronunciation switching information sent by the reader, where the pronunciation switching information includes voice-over text pronunciation switching information and/or character pronunciation switching information. The updating module is configured to update the reading audio data based on the pronunciation switching information to obtain updated reading audio data. Correspondingly, the playing module 600 is further configured to switch to play the updated reading audio data based on the update time point, so as to assist the reader in reading the text to be read.
Fig. 13 is a schematic structural diagram of an apparatus for the text processing method and the assistive reading method according to an embodiment of the present invention. For example, the apparatus 700 may be a robot, a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 13, the apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls overall operation of the apparatus 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions so as to perform all or part of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operation of the apparatus 700. Examples of such data include instructions for any application or method operating on the apparatus 700, the text to be read, the reading audio data, and so forth. The memory 704 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 706 provides power to the various components of the apparatus 700. The power component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 700.
The multimedia component 708 includes a screen that provides an output interface between the apparatus 700 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 700 is in an operation mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a microphone (MIC) configured to receive external audio signals when the apparatus 700 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, the audio component 710 further includes a speaker for outputting audio signals, such as the reading audio data.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 714 includes one or more sensors for providing status assessments of various aspects of the apparatus 700. For example, the sensor component 714 may detect the open/closed state of the apparatus 700 and the relative positioning of components, such as the display and keypad of the apparatus 700. The sensor component 714 may also detect a change in position of the apparatus 700 or a component of the apparatus 700, the presence or absence of user contact with the apparatus 700, the orientation or acceleration/deceleration of the apparatus 700, and a change in temperature of the apparatus 700. The sensor component 714 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the apparatus 700 and other devices. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 704 including instructions, which are executable by the processor 720 of the apparatus 700 to perform the above-described methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 14 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 800, which may vary significantly depending on configuration or performance, may include one or more central processing units (CPUs) 822 (e.g., one or more processors), memory 832, and one or more storage media 830 (e.g., one or more mass storage devices) storing applications 842 or data 844. The memory 832 and the storage medium 830 may be transitory or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 822 may be configured to communicate with the storage medium 830 and execute, on the server 800, the series of instruction operations in the storage medium 830.
The server 800 may also include one or more power supplies 824, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, one or more keyboards 854, and/or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The server 800 may be configured to perform the text processing method and/or the assistive reading method described in the above embodiments.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium may be at least one of the following media capable of storing program code: a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that the embodiments in this specification are described in a progressive manner: identical or similar parts among the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; reference may be made to the corresponding descriptions of the method embodiments for relevant details. The apparatus and system embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of a given embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
The above description is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

1. A method of text processing, comprising:
determining text parsing data corresponding to a text to be read based on the text to be read, wherein the text parsing data comprises data capable of representing text content characteristics of the text to be read;
determining reference audio data corresponding to the text to be read based on the text to be read;
and generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, wherein the reading audio data is used for assisting a reader of the text to be read in reading the text to be read.
2. The text processing method according to claim 1, wherein the text to be read includes a character dialogue sentence, the character dialogue sentence corresponds to a character, and the text parsing data includes character feature data and character dialogue content data corresponding to the character;
generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, including:
determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data;
generating dialogue audio data corresponding to the character based on the pronunciation feature information and the character dialogue content data;
and generating the reading audio data based on the dialogue audio data.
3. The text processing method according to claim 2, wherein the character feature data includes character identity data, and the reference audio data includes video audio data extracted based on video data corresponding to the text to be read;
the determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data includes:
extracting audio material data corresponding to the character based on the character identity data and the video audio data;
and determining pronunciation feature information corresponding to the character based on the audio material data.
4. The text processing method of claim 2, wherein the character feature data comprises at least one of age data, gender data and occupation data, and the reference audio data comprises a plurality of character samples and pronunciation feature information corresponding to each of the plurality of character samples;
the determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data includes:
determining a character sample matching the character based on the character feature data and the plurality of character samples;
and determining pronunciation feature information corresponding to the character based on the pronunciation feature information corresponding to the character sample matching the character.
5. The text processing method of claim 2, wherein the determining pronunciation feature information corresponding to the character based on the character feature data and the reference audio data comprises:
acquiring character pronunciation selection information sent by the reader based on the character feature data and the reference audio data;
and determining pronunciation feature information corresponding to the character based on the character pronunciation selection information and the reference audio data.
6. The text processing method according to any one of claims 1 to 5, wherein the text to be read comprises a voice-over text, and the reference audio data comprises voice-over text pronunciation feature information;
generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, including:
generating voice-over audio data corresponding to the voice-over text based on the voice-over text pronunciation feature information and the voice-over text;
and generating the reading audio data based on the voice-over audio data.
7. The text processing method according to any one of claims 1 to 5, wherein the text parsing data includes atmosphere text data, and the reference audio data includes atmosphere sound effect data;
generating reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, including:
determining atmosphere reading tag information corresponding to the text to be read based on the atmosphere text data;
and generating the reading audio data based on the atmosphere reading tag information and the atmosphere sound effect data.
8. An assistive reading method, comprising:
acquiring reading audio data corresponding to a text to be read based on the text to be read determined by a reader, wherein the reading audio data is determined based on the text processing method of any one of claims 1 to 7;
and playing the reading audio data to assist the reader in reading the text to be read.
9. The assistive reading method of claim 8, further comprising:
acquiring pronunciation switching information sent by the reader, wherein the pronunciation switching information comprises voice-over text pronunciation switching information and/or character pronunciation switching information;
updating the reading audio data based on the pronunciation switching information to obtain updated reading audio data;
wherein the playing the reading audio data to assist the reader in reading the text to be read includes:
and switching to play the updated reading audio data based on the update time point, so as to assist the reader in reading the text to be read.
10. A text processing apparatus, comprising:
a first determining module, configured to determine text parsing data corresponding to a text to be read based on the text to be read, wherein the text parsing data comprises data capable of representing text content characteristics of the text to be read;
a second determining module, configured to determine reference audio data corresponding to the text to be read based on the text to be read;
and a generating module, configured to generate reading audio data corresponding to the text to be read based on the reference audio data and the text parsing data, wherein the reading audio data is used for assisting a reader of the text to be read in reading the text to be read.
11. The text processing apparatus according to claim 10, wherein the text to be read includes a character dialogue sentence, the character dialogue sentence corresponds to a character, and the text parsing data includes character feature data and character dialogue content data corresponding to the character;
the generating module is further configured to determine pronunciation feature information corresponding to the character based on the character feature data and the reference audio data, generate dialogue audio data corresponding to the character based on the pronunciation feature information and the character dialogue content data, and generate the reading audio data based on the dialogue audio data.
12. The text processing apparatus according to claim 11, wherein the character feature data includes character identity data, and the reference audio data includes video audio data extracted based on video data corresponding to the text to be read;
the generating module is further configured to extract audio material data corresponding to the character based on the character identity data and the video audio data, and determine pronunciation feature information corresponding to the character based on the audio material data.
13. The text processing apparatus according to claim 11, wherein the character feature data includes at least one of age data, gender data, and occupation data, and the reference audio data includes a plurality of character samples and pronunciation feature information corresponding to each of the plurality of character samples;
the generating module is further configured to determine a character sample matching the character based on the character feature data and the plurality of character samples, and determine pronunciation feature information corresponding to the character based on pronunciation feature information corresponding to the character sample matching the character.
14. The text processing apparatus according to claim 11, wherein the generating module is further configured to acquire character pronunciation selection information issued by the reader based on the character feature data and the reference audio data, and determine pronunciation feature information corresponding to the character based on the character pronunciation selection information and the reference audio data.
15. The text processing apparatus according to any one of claims 10 to 14, wherein the text to be read comprises a voice-over text, and the reference audio data comprises voice-over text pronunciation feature information;
the generating module is further configured to generate voice-over audio data corresponding to the voice-over text based on the voice-over text pronunciation feature information and the voice-over text, and generate the reading audio data based on the voice-over audio data.
16. The text processing apparatus according to any one of claims 10 to 14, wherein the text parsing data comprises atmosphere text data, and the reference audio data comprises atmosphere sound effect data;
the generating module is further configured to determine atmosphere reading tag information corresponding to the text to be read based on the atmosphere text data, and generate the reading audio data based on the atmosphere reading tag information and the atmosphere sound effect data.
17. An assistive reading device, comprising:
a first obtaining module, configured to obtain, based on a text to be read determined by a reader, reading audio data corresponding to the text to be read, where the reading audio data is determined based on the text processing method according to any one of claims 1 to 7;
and the playing module is used for playing the reading audio data so as to assist the reader to read the text to be read.
18. The assistive reading device of claim 17, further comprising:
a second acquisition module, configured to acquire pronunciation switching information sent by the reader, wherein the pronunciation switching information comprises voice-over text pronunciation switching information and/or character pronunciation switching information;
an updating module, configured to update the reading audio data based on the pronunciation switching information to obtain updated reading audio data;
wherein the playing module is further configured to switch to play the updated reading audio data based on the update time point, so as to assist the reader in reading the text to be read.
19. A computer-readable storage medium, characterized in that the storage medium stores instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 9.
20. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing computer executable instructions;
the processor being configured to execute the computer-executable instructions to implement the method of any one of claims 1 to 9.
CN202110909261.5A 2021-08-09 2021-08-09 Text processing method and device and reading assisting method and device Pending CN113779958A (en)

Priority Applications (1)

CN202110909261.5A (priority and filing date 2021-08-09): Text processing method and device and reading assisting method and device

Publications (1)

CN113779958A (published 2021-12-10)

Citations (3)

(* cited by examiner, † cited by third party)

CN111443890A * (priority 2020-01-19, published 2020-07-24), 托普朗宁(北京)教育科技有限公司: Reading assisting method and device, storage medium and electronic equipment
CN112562430A * (priority 2019-09-26, published 2021-03-26), 阿里巴巴集团控股有限公司 (Alibaba Group Holding Ltd.): Auxiliary reading method, video playing method, device, equipment and storage medium
CN112908292A * (priority 2019-11-19, published 2021-06-04), 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.): Text voice synthesis method and device, electronic equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination