CN109658917A - E-book reading-aloud method, apparatus, computer device and storage medium - Google Patents

E-book reading-aloud method, apparatus, computer device and storage medium

Info

Publication number
CN109658917A
CN109658917A
Authority
CN
China
Prior art keywords
information
emotion
data
text field
book
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910044763.9A
Other languages
Chinese (zh)
Inventor
张誉怀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd filed Critical OneConnect Smart Technology Co Ltd
Priority to CN201910044763.9A priority Critical patent/CN109658917A/en
Publication of CN109658917A publication Critical patent/CN109658917A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose an e-book reading-aloud method, apparatus, computer device and storage medium. Before an e-book is read aloud, the text field to be read is first converted into an array matrix. Then a neural network model, trained to a convergence state to classify the emotion of matrices converted from text fields, classifies the matrix and obtains the emotion the text field expresses. Finally, the classified emotion is used to search an audio database for a speech library with the same emotional expression, and the audio data in that library performs the audio conversion of the text field, so that the converted voice carries emotional expression and the content of the e-book can be read aloud with feeling. When the e-book is played as speech, its reading can therefore be kept identical to the emotion the text expresses, realistically simulating the voice and emotion of a human reader; the emotional expression is rich and engaging and improves the user's listening experience.

Description

E-book reading-aloud method, apparatus, computer device and storage medium
Technical field
Embodiments of the present invention relate to the field of audio playback, and in particular to an e-book reading-aloud method, apparatus, computer device and storage medium.
Background technique
The eyes and the ears are the human body's most direct sensory organs. Text is a way of recording and transmitting information received through the visual sense of the eyes, while spoken language is an information channel received through the auditory sense of the ears. Because text is easier to record than spoken language, it is people's most direct way of acquiring information. In some reading scenarios, however, people need text information to be converted into speech.
In the prior art, when reading an e-book, a user can choose to convert the text information in the e-book into speech for listening. The conversion works by recording a pronunciation for each character; after entering conversion mode, the pronunciation corresponding to each character of the text to be converted is called up, the individual pronunciations are combined in the typographical order of the text to produce a pronunciation sequence, and the sequence is played back, completing the conversion of text information into voice information.
The inventor of the present invention found in research that, in the prior art, text information converted into voice information is pronounced in a single tone and timbre from beginning to end. The converted voice is delivered in a stereotyped way that cannot express the emotion the text is meant to convey; listening to it for a long time causes discomfort and a poor user experience.
Summary of the invention
Embodiments of the present invention provide an e-book reading-aloud method, apparatus, computer device and storage medium that identify the emotion expressed by a text field and, according to that emotion, use an audio file with the corresponding emotion to perform the speech conversion through which the text field is read aloud.
To solve the above technical problem, the technical solution adopted by embodiments of the present invention is to provide an e-book reading-aloud method, comprising:
obtaining a text field to be read aloud;
converting the text field into an array matrix according to a preset conversion scheme, and inputting the matrix into a preset emotion recognition model, wherein the emotion recognition model is a neural network model trained in advance to a convergence state for analysing the emotion information characterized by data;
reading the classification information, output by the emotion recognition model, that characterizes the emotion expressed by the text field;
searching a preset audio database for a speech library having a mapping relationship with the classification information, and calling the audio data in the speech library to perform speech conversion on the text field, wherein the emotion expressed by the audio data in the speech library is the same as or similar to the emotion information characterized by the classification information.
Optionally, before the searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field, the method further comprises:
obtaining character information of the character who states the text field;
searching, according to the character information, a preset character database for role-setting information having a mapping relationship with the character information, wherein the role-setting information includes the gender information of the character who states the text field.
Optionally, the searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field comprises:
searching the audio database for a speech library having a mapping relationship with both the classification information and the gender information, and calling the audio data in the speech library to perform speech conversion on the text field, wherein the voice tone of the audio data in the speech library matches the character gender characterized by the gender information.
Optionally, after the searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field, the method further comprises:
searching a preset music database for background music having a mapping relationship with the classification information, wherein the emotion expressed by the background music is the same as or similar to the emotion information characterized by the classification information;
playing the background music while playing the voice data converted from the text field, wherein the playback volume of the background music is lower than the playback volume of the voice data.
Optionally, after the searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field, the method further comprises:
obtaining body-surface parameters of a target user, wherein the body-surface parameters are parameter data obtained by detecting one or more body-surface parameters of the human body;
converting the parameter data into a parameter matrix, and inputting the parameter matrix into a preset body-state judgment model, wherein the body-state judgment model is a neural network model trained in advance to a convergence state for judging body state from body-surface parameters;
reading the state classification data of the target user output by the body-state judgment model;
adjusting, according to the state classification data, the playback state of the voice data converted from the text field, wherein the playback state includes the playback volume and whether to continue playing the voice data.
Optionally, after the searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field, the method further comprises:
obtaining an ambient-volume parameter of the target user's environment;
searching a preset volume data list for a playback volume having a mapping relationship with the ambient-volume parameter;
playing the voice data converted from the text field at that playback volume.
Optionally, before the searching a preset volume data list for a playback volume having a mapping relationship with the ambient-volume parameter, the method further comprises:
comparing the ambient-volume parameter with a preset volume threshold;
when the ambient-volume parameter is greater than the volume threshold, calling preset prompt information for display, and otherwise confirming the search of the volume data list for the playback volume having a mapping relationship with the ambient-volume parameter, wherein the prompt information prompts that playback of the voice data is paused.
To solve the above technical problem, an embodiment of the present invention further provides an e-book reading-aloud apparatus, comprising:
an obtaining module for obtaining a text field to be read aloud;
a processing module for converting the text field into an array matrix according to a preset conversion scheme and inputting the matrix into a preset emotion recognition model, wherein the emotion recognition model is a neural network model trained in advance to a convergence state for analysing the emotion information characterized by data;
a reading module for reading the classification information, output by the emotion recognition model, that characterizes the emotion expressed by the text field;
an execution module for searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field, wherein the emotion expressed by the audio data in the speech library is the same as or similar to the emotion information characterized by the classification information.
Optionally, the e-book reading-aloud apparatus further comprises:
a first obtaining submodule for obtaining character information of the character who states the text field;
a first processing submodule for searching, according to the character information, a preset character database for role-setting information having a mapping relationship with the character information, wherein the role-setting information includes the gender information of the character who states the text field.
Optionally, the e-book reading-aloud apparatus further comprises:
a first processing submodule for searching the audio database for a speech library having a mapping relationship with both the classification information and the gender information, and calling the audio data in the speech library to perform speech conversion on the text field, wherein the voice tone of the audio data in the speech library matches the character gender characterized by the gender information.
Optionally, the e-book reading-aloud apparatus further comprises:
a second processing submodule for searching a preset music database for background music having a mapping relationship with the classification information, wherein the emotion expressed by the background music is the same as or similar to the emotion information characterized by the classification information;
a second execution submodule for playing the background music while playing the voice data converted from the text field, wherein the playback volume of the background music is lower than the playback volume of the voice data.
Optionally, the e-book reading-aloud apparatus further comprises:
a second obtaining submodule for obtaining body-surface parameters of a target user, wherein the body-surface parameters are parameter data obtained by detecting one or more body-surface parameters of the human body;
a third processing submodule for converting the parameter data into a parameter matrix and inputting the parameter matrix into a preset body-state judgment model, wherein the body-state judgment model is a neural network model trained in advance to a convergence state for judging body state from body-surface parameters;
a first reading submodule for reading the state classification data of the target user output by the body-state judgment model;
a third execution submodule for adjusting, according to the state classification data, the playback state of the voice data converted from the text field, wherein the playback state includes the playback volume and whether to continue playing the voice data.
Optionally, the e-book reading-aloud apparatus further comprises:
a third obtaining submodule for obtaining an ambient-volume parameter of the target user's environment;
a fourth processing submodule for searching a preset volume data list for a playback volume having a mapping relationship with the ambient-volume parameter;
a fourth execution submodule for playing the voice data converted from the text field at that playback volume.
Optionally, the e-book reading-aloud apparatus further comprises:
a first comparison submodule for comparing the ambient-volume parameter with a preset volume threshold;
a fifth execution submodule for calling preset prompt information for display when the ambient-volume parameter is greater than the volume threshold, and otherwise confirming the search of the volume data list for the playback volume having a mapping relationship with the ambient-volume parameter, wherein the prompt information prompts that playback of the voice data is paused.
To solve the above technical problem, an embodiment of the present invention further provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the e-book reading-aloud method described above.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the e-book reading-aloud method described above.
The beneficial effects of the embodiments of the present invention are as follows. Before an e-book is read aloud, the text field to be read is first converted into an array matrix; then a neural network model, trained to a convergence state to classify the emotion of matrices converted from text fields, classifies the matrix and obtains the emotion the text field expresses; finally, the classified emotion is used to search the audio database for a speech library with the same emotional expression, and the audio data in that library performs the audio conversion of the text field, so that the converted voice carries emotional expression and the content of the e-book can be read aloud with feeling. When the e-book is played as speech, its reading therefore matches the emotion the text expresses, realistically simulating the voice and emotion of a human reader; the emotional expression is rich and engaging and improves the user's listening experience.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a basic flow schematic of the e-book reading-aloud method of an embodiment of the present invention;
Fig. 2 is a method flowchart of confirming the gender information of the role stating the text field in an embodiment of the present invention;
Fig. 3 is a flow schematic of adding background music according to the emotion expressed by the text field in an embodiment of the present invention;
Fig. 4 is a flow schematic of adjusting the e-book reading state according to the user's body state in an embodiment of the present invention;
Fig. 5 is a flow schematic of adjusting the playback volume through the external volume parameter in an embodiment of the present invention;
Fig. 6 is a flow schematic of pausing the e-book reading under control of the ambient-volume parameter in an embodiment of the present invention;
Fig. 7 is a basic structural schematic of the e-book reading-aloud apparatus of an embodiment of the present invention;
Fig. 8 is a basic structural block diagram of the computer device of an embodiment of the present invention.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings.
Some of the flows described in the specification, the claims and the drawings above contain multiple operations that occur in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or executed in parallel. Operation numbers such as 101 and 102 serve only to distinguish the different operations; the numbers themselves do not imply any execution order. In addition, these flows may include more or fewer operations, and the operations may be executed in order or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not imply a sequence, nor do they restrict "first" and "second" to different types.
The technical solutions in the embodiments of the present invention are described below clearly and completely in conjunction with the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of the present invention.
Those skilled in the art will appreciate that "terminal" and "terminal device" as used herein include both devices having only a wireless signal receiver, with no transmitting capability, and devices having receiving and transmitting hardware capable of two-way communication over a bidirectional communication link. Such a device may include: a cellular or other communication device, with or without a single-line or multi-line display; a PCS (Personal Communications Service) terminal, which may combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio-frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio-frequency receiver. "Terminal" and "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or adapted and/or configured to operate locally and/or to operate in distributed form at any other location on the earth and/or in space. The "terminal" or "terminal device" may also be a communication terminal, an Internet terminal or a music/video playback terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with a music/video playback function, or a device such as a smart television or a set-top box.
Referring specifically to Fig. 1, Fig. 1 is a basic flow schematic of the e-book reading-aloud method of this embodiment.
As shown in Fig. 1, an e-book reading-aloud method comprises:
S1100: obtaining a text field to be read aloud;
When the terminal receives the user's instruction to play the e-book content as speech, it reads the text field shown in the current display area, or the first paragraph of the text field in the display area. In this embodiment the content of the text field is not limited to this: the text field can be an article of any length specified by the user, or a single sentence.
S1200: converting the text field into an array matrix according to a preset conversion scheme, and inputting the matrix into a preset emotion recognition model, wherein the emotion recognition model is a neural network model trained in advance to a convergence state for analysing the emotion information characterized by data;
After the text field has been read, it is converted into an array matrix, for example by a MATLAB software application. Each character or word of the text field is mapped to one element of the matrix, and the arrangement order of the elements is consistent with the arrangement order of the text field.
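As an illustration only, the following is a minimal Python sketch of such a preset conversion scheme; the toy vocabulary and the index 0 reserved for unknown characters are assumptions made for the example, not the patented scheme.

    # Minimal sketch: map each character of the text field to one integer
    # element, preserving text order. The vocabulary is an illustrative assumption.
    def text_to_array(text, vocab):
        """vocab: dict mapping each known character to an integer index."""
        return [vocab.get(ch, 0) for ch in text]   # 0 = unknown character

    vocab = {ch: i + 1 for i, ch in enumerate("今天气真好")}  # toy vocabulary
    print(text_to_array("今天真好", vocab))                    # -> [1, 2, 4, 5]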
The matrix obtained from the conversion is input into the preset emotion recognition model, a neural network model trained in advance to a convergence state for analysing the emotion information characterized by data. The emotion recognition model can be a convolutional neural network model (CNN) trained to the convergence state; however, it is not limited thereto, and can also be a deep neural network model (DNN), a recurrent neural network model (RNN), or a variant of the above three network models.
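Since the description leaves the concrete architecture open, the following is a minimal PyTorch sketch of a CNN text-emotion classifier of the kind contemplated; the vocabulary size, embedding width, convolution widths and the mood categories listed under S1300 are illustrative assumptions, not the patented model.

    import torch
    import torch.nn as nn

    # The six example mood categories named under S1300 below.
    EMOTIONS = ["sad", "grieved", "pained", "happy", "angry", "shy"]

    class EmotionCNN(nn.Module):
        """Hypothetical CNN over a text field already converted to an index array."""
        def __init__(self, vocab_size=8000, embed_dim=128, num_classes=len(EMOTIONS)):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # Convolutions of several widths pick out emotion-bearing words/phrases.
            self.convs = nn.ModuleList(
                [nn.Conv1d(embed_dim, 64, kernel_size=k) for k in (2, 3, 4)]
            )
            self.fc = nn.Linear(64 * 3, num_classes)

        def forward(self, ids):                       # ids: (batch, seq_len) int64
            x = self.embed(ids).transpose(1, 2)       # -> (batch, embed_dim, seq_len)
            pooled = [c(x).relu().max(dim=2).values for c in self.convs]
            return self.fc(torch.cat(pooled, dim=1))  # raw scores per mood category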
During training, the initial neural network model that will become the emotion recognition model uses matrices converted from a large corpus of text documents as training samples; each training sample is manually calibrated after the original text has been read (the calibration is the emotion the training sample truly expresses). A training sample is input into the initial neural network model and the classification result output by the model (the model's emotion classification of the sample) is obtained; the loss function of the neural network model computes the distance between the classification result and the calibration (for example the Euclidean distance, Mahalanobis distance or cosine distance), and the computed result is compared with the set distance threshold. If the computed result is less than or equal to the distance threshold, the sample passes verification and training continues with the next training sample; if the computed result is greater than the distance threshold, the loss function computes the difference between the two and back-propagation corrects the weights in the neural network model, enabling the model to increase the weights of the elements corresponding to words that accurately express the text's emotion, thereby increasing judgment accuracy. After the above scheme has been executed cyclically over a large number of training samples, when the trained neural network model's emotion-judgment accuracy on matrices exceeds a set figure, for example 97%, the neural network model has been trained to the convergence state, and the network trained to convergence is the emotion recognition model.
An emotion recognition model trained to the convergence state can accurately extract the emotion characterized by an array matrix.
S1300: reading the classification information, output by the emotion recognition model, that characterizes the emotion expressed by the text field;
The matrix converted from the text field is input into the emotion recognition model. Because the emotion recognition model has been trained in advance to the convergence state, it can accurately output the emotion information expressed by the text field that the matrix characterizes; this emotion information is the classification information of the emotion recognition model. The classification layer of the emotion recognition model distinguishes mood categories such as sad, grieved, pained, happy, angry and shy. The emotion recognition model extracts the emotion-bearing words or sentences from the matrix, computes a confidence between those words or sentences and each of the above classification categories, and determines the mood category with the highest confidence to be the final classification information of the matrix.
S1400: searching a preset audio database for a speech library having a mapping relationship with the classification information, and calling the audio data in the speech library to perform speech conversion on the text field, wherein the emotion expressed by the audio data in the speech library is the same as or similar to the emotion information characterized by the classification information.
After the emotion classification information expressed by the text field has been obtained, a speech library having a mapping relationship with the emotion characterized by the classification information is searched for in the preset audio database. A speech library is a collection of recordings of an announcer or voice actor pronouncing commonly used Chinese characters, words or phrases in a particular mood. For example, the announcer reads each commonly used Chinese character aloud in a happy mood and the readings are recorded as audio data; the set of audio data of those pronunciations is the speech library characterizing happiness. By the same principle, one speech library can be set up for each mood. The content of a speech library is not limited to this: in some embodiments, a speech library can collect audio data of the full pronunciation of Chinese in a certain mood, or audio data of the pronunciations of all the words of another language in that mood.
Then, after the text field has been obtained, the audio data of each Chinese character is extracted from the corresponding speech library according to the arrangement order of the characters in the text field, the audio data are arranged in that same order to form the converted voice information of the text field, and finally the voice information is played, completing the reading aloud of the text field.
In some embodiments, when no corresponding speech library has been set up for a particular emotional expression, the text field can be rendered using the speech library of an emotion similar to that emotion. For example, once a speech library expressing sadness has been built, if the emotion expressed by a text field is identified as pain and no speech library expressing pain has been set up, an approximate conversion can be performed using the speech library expressing sadness.
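A minimal sketch of the look-up-and-splice conversion of S1400, together with the same-or-similar fallback just described, follows; the library layout (emotion to character-to-clip mapping) and the fallback table are illustrative assumptions.

    # Minimal sketch: pick the speech library for the classified emotion
    # (falling back to a similar emotion) and splice per-character clips
    # in text order.
    SIMILAR_EMOTION = {"pained": "sad"}        # approximate substitutes

    def convert_to_speech(text, emotion, speech_libraries):
        if emotion not in speech_libraries:                    # no dedicated library,
            emotion = SIMILAR_EMOTION.get(emotion, emotion)    # use a similar emotion
        library = speech_libraries[emotion]                    # char -> audio clip (PCM bytes)
        clips = [library[ch] for ch in text if ch in library]
        return b"".join(clips)                                 # clips in text order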
Before the e-book is read aloud, the above embodiment first converts the text field to be read into an array matrix; then a neural network model, trained to a convergence state to classify the emotion of matrices converted from text fields, classifies the matrix and obtains the emotion the text field expresses; finally, the classified emotion is used to search the audio database for a speech library with the same emotional expression, and the audio data in that library performs the audio conversion of the text field, so that the converted voice carries emotional expression and the content of the e-book can be read aloud with feeling. When the e-book is played as speech, its reading can therefore be kept identical to the emotion the text expresses, realistically simulating the voice and emotion of a human reader; the emotional expression is rich and engaging and improves the user's listening experience.
In some embodiments, besides reading with a voice matching the emotion of the text field, the gender designed for a role likewise strongly influences the reading of the e-book. If a role is set as male but the e-book is read using a female speech library, it is difficult to convey the feel of the role's speech; the listening effect is poor and the user experience suffers. The gender information of the role stating the text field therefore needs to be determined before reading. Refer to Fig. 2, which is a method flowchart of confirming the gender information of the role stating the text field in this embodiment.
As shown in Fig. 2, before S1400 the method further includes:
S1311: obtaining the character information of the character who states the text field;
After the text field has been obtained, the character who states it is identified through the writing structure of the text field in the e-book. If the writing structure of the text field is first person, the speaker of the text field is the protagonist of the e-book. If the writing structure of the text field is third person, the text preceding the statement indicates which role states the field, and the character information is obtained by extracting the stated character's name. Character information here means the name or code name of a character in the e-book. In this embodiment, an e-book means a literary or academic work of any genre.
S1312: searching, according to the character information, a preset character database for role-setting information having a mapping relationship with the character information, wherein the role-setting information includes the gender information of the character who states the text field.
Role-setting information having a mapping relationship with the character information is searched for in the preset character database according to the obtained character information. In this embodiment, before each e-book is stored, the role-setting information of all the roles in the e-book, stating the gender of each role, needs to be collected; only after this collection is complete can the e-book be successfully entered into the database. Role-setting information is not limited to character gender: depending on the concrete application scenario, in some embodiments the role-setting information also includes information such as the role's age, personality and state of mind. By building speech libraries matched to this information, the role's voice can be simulated more vividly from it, bringing the reading closer to that of a real person.
In some embodiments, more finely divided speech libraries are built by distinguishing the same emotional expression by gender; for example, the speech library expressing sadness is subdivided into a male-voice version and a female-voice version, so that when reading aloud the version corresponding to the role's gender can be called.
Specifically, S1400 further includes:
S1411: searching the audio database for a speech library having a mapping relationship with both the classification information and the gender information, and calling the audio data in the speech library to perform speech conversion on the text field, wherein the voice tone of the audio data in the speech library matches the character gender characterized by the gender information.
After the emotion classification information expressed by the text field and the gender of the role stating it have been obtained, a speech library having a mapping relationship with both the emotion characterized by the classification information and the gender information is searched for in the preset audio database. In this embodiment each speech library carries an emotion attribute, i.e. each library expresses one emotion, and also carries a gender attribute, i.e. the libraries expressing one emotion are subdivided into the two genders. The emotion classification information and the gender of the role stating the text field therefore determine one unique speech library, satisfying the reading requirement of expressing both the emotion and the gender information.
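A minimal sketch of this joint (emotion, gender) selection follows; extract_speaker is a hypothetical helper, and both database layouts are illustrative assumptions.

    # Minimal sketch: one unique speech library is determined by the
    # classified emotion plus the speaking role's gender.
    def pick_speech_library(text_field, emotion, role_db, speech_libraries):
        character = extract_speaker(text_field)     # hypothetical: name of the speaker
        gender = role_db[character]["gender"]       # role-setting info collected at ingest
        return speech_libraries[(emotion, gender)]  # e.g. ("sad", "male") -> one library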
In some embodiments, reading only the text field aloud makes it difficult to express the environment the article describes and cannot give the listener an engaging storytelling experience. Refer to Fig. 3, which is a flow schematic of adding background music according to the emotion expressed by the text field in this embodiment.
As shown in Fig. 3, after S1400 the method further includes:
S1421: searching a preset music database for background music having a mapping relationship with the classification information, wherein the emotion expressed by the background music is the same as or similar to the emotion information characterized by the classification information;
In this embodiment, a music database is set up in which a number of background music tracks are included. Before being included, each background music track is labelled according to the emotion its musical content expresses, i.e. every background music track in the music database carries at least one affective tag expressing the emotion of its musical content.
After the emotion expressed by the text field has been obtained, the music database is retrieved using that emotion as the qualifying condition, searching for background music having a mapping relationship with the classification information.
In some embodiments, when no corresponding background music has been included for a particular emotional expression, background music of an emotion similar to that emotion can be substituted. For example, once celebratory music has been included, if the emotion expressed by the text field is identified as happiness but no music expressing happiness is available, celebratory background music can express it instead.
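A minimal sketch of the tag-based retrieval of S1421 with that same-or-similar fallback follows; the track names, tags and fallback table are illustrative assumptions.

    # Minimal sketch: select a background track whose affective tags match
    # the classified emotion, falling back to a similar emotion's tag.
    SIMILAR_MUSIC_EMOTION = {"happy": "celebratory"}

    def pick_background_music(emotion, music_db):
        """music_db: dict mapping track name -> set of affective tags."""
        matches = [t for t, tags in music_db.items() if emotion in tags]
        if not matches and emotion in SIMILAR_MUSIC_EMOTION:
            substitute = SIMILAR_MUSIC_EMOTION[emotion]
            matches = [t for t, tags in music_db.items() if substitute in tags]
        return matches[0] if matches else None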
S1422: playing the background music while playing the voice data converted from the text field, wherein the playback volume of the background music is lower than the playback volume of the voice data.
After background music having the same or a similar emotional expression as the text field has been obtained, the background music is played at the same time as the voice data converted from the text field. So that the user's listening to the content of the text field is not affected, the playback volume of the background music is kept lower than the playback volume of the voice data; when the background music is played while the user listens to the voice data, the volume of the background music does not prevent the user from clearly hearing the content the text field states.
By playing background music having the same or a similar emotional expression as the text field while the voice data of the text field is played, the listener can experience the environment described by the text field as if present in person; the result is engaging and improves the user experience.
In some embodiments, body-surface parameters of the user are obtained through a wearable device, the user's physical condition is detected, and it is determined from the physical condition whether to adjust the reading state of the e-book, thereby adjusting the e-book's reading state according to the user's body state. Refer to Fig. 4, which is a flow schematic of adjusting the e-book reading state according to the user's body state in this embodiment.
As shown in Fig. 4, after S1400 the method includes:
S1431: obtaining body-surface parameters of a target user, wherein the body-surface parameters are parameter data obtained by detecting one or more body-surface parameters of the human body;
The user's body-surface parameters are collected through the user's wearable device. The user's body-surface parameters can be parameter data such as EEG, ECG, EOG, EMG, respiration, movement, snoring, pulse, pulse wave or respiratory rate.
In this embodiment, the body-surface parameter is a set of continuously collected parameter data, for example the user's continuously collected respiratory rate.
S1432: converting the parameter data into a parameter matrix, and inputting the parameter matrix into a preset body-state judgment model, wherein the body-state judgment model is a neural network model trained in advance to a convergence state for judging body state from body-surface parameters;
After the body-surface parameters have been read, they are converted into an array matrix, for example by a MATLAB software application. Each group of collected body-surface parameters is one element of the matrix, and the arrangement order of the elements is consistent with the acquisition order of the body-surface parameters.
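As an illustration, a minimal sketch of assembling such a parameter matrix follows; the group size of 30 samples is an illustrative assumption.

    # Minimal sketch: consecutive groups of samples become matrix rows,
    # preserving acquisition order, e.g. for a continuously sampled pulse signal.
    import numpy as np

    def to_parameter_matrix(readings, group_size=30):
        n = len(readings) // group_size
        return np.asarray(readings[: n * group_size],
                          dtype=np.float32).reshape(n, group_size)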
The matrix obtained from the conversion is input into the preset body-state judgment model, a neural network model trained in advance to a convergence state for analysing the human state from body-surface parameters. The body-state judgment model can be a convolutional neural network model (CNN) trained to the convergence state; however, it is not limited thereto, and can also be a deep neural network model (DNN), a recurrent neural network model (RNN), or a variant of the above three network models.
The training process of the body-state judgment model is illustrated by training a neural network model to learn the human pulse rate and identify whether the user is asleep.
During training, the initial neural network model that will become the body-state judgment model uses a large amount of collected parameter data of human pulse beats, each collection recording which state the body was in under that pulse condition; the parameter matrices converted from the collected pulse data serve as training samples, and each training sample is manually calibrated after the corresponding body state has been observed (the calibration is the body state the training sample truly expresses, i.e. whether the person is asleep). A training sample is then input into the initial neural network model and the classification result output by the model is obtained (the classification result is the model's state classification of the training sample, i.e. the model's judgment of whether the body the sample represents is asleep); the loss function of the neural network model computes the distance between the classification result and the calibration (for example the Euclidean distance, Mahalanobis distance or cosine distance), and the computed result is compared with the set distance threshold. If the computed result is less than or equal to the distance threshold, the sample passes verification and training continues with the next training sample; if the computed result is greater than the distance threshold, the loss function computes the difference between the two and back-propagation corrects the weights in the neural network model, enabling the model to increase the weights of the elements of the pulse rate that accurately express the user's state and thereby increasing judgment accuracy. After the above scheme has been executed cyclically over a large number of training samples, when the trained neural network model's body-state judgment accuracy on parameter matrices exceeds a set figure, for example 95%, the neural network model has been trained to the convergence state, and the network trained to convergence is the body-state judgment model.
A body-state judgment model trained to the convergence state can accurately judge whether the body characterized by a parameter matrix is asleep.
The body-state judgment model is not limited to being trained to judge the sleep state of the body: the same training method can train it to judge the body's motion state, whether the body is at rest, or whether the body is seriously ill, and the training parameter data used are not limited to the pulse rate but can be (without limitation) parameter data such as EEG, ECG, EOG, EMG, respiration, movement, snoring or respiratory rate.
S1433: reading the state classification data of the target user output by the body-state judgment model;
The parameter matrix is input into the body-state judgment model; because the body-state judgment model has been trained in advance to the convergence state, it can accurately output the state information characterized by the parameter matrix. The classification layer of the body-state judgment model distinguishes body states such as awake, sleeping, moving, quiet, excited and listless. The body-state judgment model extracts the feature data expressing body state from the parameter matrix, computes a confidence between the feature data and each of the above classification categories, and determines the state category with the highest confidence to be the final classification data of the parameter matrix.
S1434: adjusting, according to the state classification data, the playback state of the voice data converted from the text field, wherein the playback state includes the playback volume and whether to continue playing the voice data.
The playback state of the voice data is adjusted according to the body state characterized by the classification data. For example: when the body is asleep, playback of the voice data is paused; when the body is in motion, the playback volume is increased; when the body is quiet, the playback volume is reduced; when the body is excited, the playback volume is increased; when the body is listless, the playback volume is reduced.
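A minimal sketch of the state-to-playback mapping just listed follows; the player object and the 0.1 volume step are illustrative assumptions.

    # Minimal sketch: adjust the playback state per classified body state.
    def adjust_playback(state, player):
        if state == "sleeping":
            player.pause()                                 # pause the voice data
        elif state in ("moving", "excited"):
            player.volume = min(1.0, player.volume + 0.1)  # increase playback volume
        elif state in ("quiet", "listless"):
            player.volume = max(0.0, player.volume - 0.1)  # reduce playback volume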
By collecting the user's body-surface parameters, identifying the body state, and then adjusting the playback state according to the body state, the reading of the e-book is kept consistent with the body state; this improves the body's receptiveness to the e-book reading and the user experience.
In some embodiments, the acoustic environment strongly affects the user's listening while the e-book is being read aloud, so the playback volume of the e-book needs to be adjusted according to the audio parameters of the external environment. Refer to Fig. 5, which is a flow schematic of adjusting the playback volume through the external volume parameter in this embodiment.
As shown in Fig. 5, after S1400 the method further includes:
S1441: obtaining an ambient-volume parameter of the target user's environment;
The ambient-volume parameter of the target user's environment is collected through a sound sensor provided in the terminal or in communication connection with the terminal. The ambient-volume parameter takes the conventional volume unit, the decibel, as its unit.
S1442: searching a preset volume data list for a playback volume having a mapping relationship with the ambient-volume parameter;
The volume at which the e-book is read is proportional to the value of the ambient-volume parameter. Based on this proportionality, a volume data list is built, recording the playback volume corresponding to each ambient-volume interval. Therefore, after the ambient-volume parameter has been obtained, the playback volume having a mapping relationship with the ambient-volume parameter is looked up in the volume data list.
S1443: playing the voice data converted from the text field at that playback volume.
After the playback volume having a mapping relationship with the ambient-volume parameter has been obtained, the volume of the e-book reading is adjusted to that playback volume.
By adjusting the playback volume of the voice data through the ambient-volume parameter collected from the environment, the playback volume can be adjusted in real time when the ambient volume changes, so that the user can always hear the voice data clearly, improving listening efficiency.
In some embodiments, once the ambient volume characterized by the ambient-volume parameter exceeds a certain value, continuing to increase the playback volume would damage the user's hearing; at that point, playback of the voice data needs to be paused to protect the user's health. Refer to Fig. 6, which is a flow schematic of pausing the e-book reading under control of the ambient-volume parameter in this embodiment.
As shown in Fig. 6, before S1442 the method further includes:
S1451: comparing the ambient-volume parameter with a preset volume threshold;
The obtained ambient-volume parameter is compared with a preset volume threshold. The volume threshold is the set maximum ambient volume for reading the e-book: in an environment above that volume, the reading volume of the e-book is not raised further and the reading stops. The volume threshold can be set according to the average tolerance of the human body, for example 100 dB; however, the setting of the volume threshold is not limited to this, and in some embodiments the volume threshold is a value customized by the user.
S1452: when the ambient-volume parameter is greater than the volume threshold, calling preset prompt information for display; otherwise, confirming the search of the volume data list for the playback volume having a mapping relationship with the ambient-volume parameter, wherein the prompt information prompts that playback of the voice data is paused.
According to the comparison result, when the ambient-volume parameter is greater than the volume threshold, the preset prompt information is called and displayed; otherwise, the search of the volume data list for the playback volume having a mapping relationship with the ambient-volume parameter is confirmed, the prompt information prompting that playback of the voice data is paused. In this embodiment the content of the preset prompt information is a prompt that playback of the voice data is paused, for example: "Dear XXX user, the current ambient volume exceeds XX dB. To protect your health, the reading of the e-book has to be paused."
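A minimal sketch combining the threshold check of S1451/S1452 with the volume-list lookup of S1442 follows; the interval boundaries and volume values are illustrative assumptions, with 100 dB as the example threshold above.

    # Minimal sketch: pause above the threshold, otherwise map the ambient
    # dB reading through the (interval -> playback volume) list.
    VOLUME_DATA_LIST = [(0, 40, 0.3), (40, 70, 0.6), (70, 100, 0.9)]  # (low dB, high dB, volume)

    def playback_volume(ambient_db, threshold_db=100):
        if ambient_db > threshold_db:
            return None                        # caller displays the pause prompt
        for low, high, volume in VOLUME_DATA_LIST:
            if low <= ambient_db < high:
                return volume                  # proportional to the ambient volume
        return VOLUME_DATA_LIST[-1][2]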
When the ambient volume exceeds a certain value, stopping the reading playback spares the human eardrum any sound beyond the ambient noise, which helps protect health.
To solve the above technical problem, an embodiment of the present invention further provides an e-book reading-aloud apparatus.
Referring specifically to Fig. 7, Fig. 7 is a basic structural schematic of the e-book reading-aloud apparatus of this embodiment.
As shown in Fig. 7, an e-book reading-aloud apparatus comprises: an obtaining module 2100, a processing module 2200, a reading module 2300 and an execution module 2400. The obtaining module 2100 is for obtaining a text field to be read aloud; the processing module 2200 is for converting the text field into an array matrix according to a preset conversion scheme and inputting the matrix into a preset emotion recognition model, wherein the emotion recognition model is a neural network model trained in advance to a convergence state for analysing the emotion information characterized by data; the reading module 2300 is for reading the classification information, output by the emotion recognition model, that characterizes the emotion expressed by the text field; the execution module 2400 is for searching a preset audio database for a speech library having a mapping relationship with the classification information and calling the audio data in the speech library to perform speech conversion on the text field, wherein the emotion expressed by the audio data in the speech library is the same as or similar to the emotion information characterized by the classification information.
Before the e-book is read aloud, the e-book reading-aloud apparatus first converts the text field to be read into an array matrix; then a neural network model, trained to a convergence state to classify the emotion of matrices converted from text fields, classifies the matrix and obtains the emotion the text field expresses; finally, the classified emotion is used to search the audio database for a speech library with the same emotional expression, and the audio data in that library performs the audio conversion of the text field, so that the converted voice carries emotional expression and the content of the e-book can be read aloud with feeling. When the e-book is played as speech, its reading therefore matches the emotion the text expresses, realistically simulating the voice and emotion of a human reader; the emotional expression is rich and engaging and improves the user's listening experience.
In some embodiments, e-book chants device further include: the first acquisition submodule and the first processing submodule. Wherein, the first acquisition submodule is used to obtain the people information of statement literal field;First processing submodule is used for according to personage Information has role's set information of mapping relations in preset character data library lookup and people information, wherein role's setting It include the gender information of the personage of statement literal field in information.
In some embodiments, e-book chants device further include: the first processing submodule, in audio database Middle lookup has the sound bank of mapping relations with classification information and gender information, and calls the audio data in sound bank to text Field carries out voice conversion, wherein the tone of personage's gender of the tone of the audio data in sound bank and gender information's characterization It is identical.
In some embodiments, e-book chants device further include: second processing submodule and the second implementation sub-module. Wherein, second processing submodule, which is used to search in preset musical database, has mapping relation context sound with classification information It is happy, wherein emotion expressed by background music and the emotion information that classification data characterizes are same or similar;Second implementation sub-module For playing background music when playing the voice data of literal field conversion, wherein the broadcast sound volume of background music is less than language The broadcast sound volume of sound data.
In some embodiments, e-book chants device further include: the second acquisition submodule, third processing submodule, First reading submodule and third implementation sub-module.Wherein, the second acquisition submodule is used to obtain the body surface parameter of target user, Wherein, body surface parameter is to detect the supplemental characteristic of the one or more body surface parameter of human body;Third processing submodule is used for will Supplemental characteristic is converted into parameter matrix, and parameter matrix is input in preset body state judgment models, wherein human body shape State judgment models are the neural network model that training is used to judge according to body surface parameter body state to convergence state in advance; First reading submodule is used to read the state classification data of the target user of body state judgment models output;Third executes son The broadcast state for the voice data that module is used to be converted according to state classification data point reuse literal field, wherein broadcast state packet It includes broadcast sound volume and whether continues to play voice data.
In some embodiments, the e-book chanting device further includes a third acquisition submodule, a fourth processing submodule, and a fourth implementation submodule. The third acquisition submodule obtains the environmental volume parameter of the environment in which the target user is located; the fourth processing submodule looks up, in a preset volume data list, the playback volume having a mapping relation with the environmental volume parameter; the fourth implementation submodule plays the voice data converted from the text field at that playback volume.
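A minimal sketch of the volume-data-list lookup, with illustrative decibel bands and volumes:

    VOLUME_LIST = [(40.0, 0.4), (60.0, 0.6), (80.0, 0.9)]  # (max ambient dB, volume)

    def playback_volume(ambient_db):
        for max_db, volume in VOLUME_LIST:
            if ambient_db <= max_db:      # first matching band wins
                return volume
        return 1.0                        # very noisy environment: full volume

    print(playback_volume(55.0))          # -> 0.6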
In some embodiments, the e-book chanting device further includes a first comparison submodule and a fifth implementation submodule. The first comparison submodule compares the environmental volume parameter with a preset volume threshold. When the environmental volume parameter is greater than the volume threshold, the fifth implementation submodule calls preset prompt information for display, the prompt suggesting that playback of the voice data be paused; otherwise, it confirms the lookup, in the volume data list, of the playback volume having a mapping relation with the environmental volume parameter.
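A sketch of this threshold check; the 85 dB threshold, the prompt wording, and the fallback band lookup are assumptions, not values from the patent:

    VOLUME_THRESHOLD_DB = 85.0

    def handle_ambient(ambient_db):
        if ambient_db > VOLUME_THRESHOLD_DB:
            # Above the threshold, show the prompt instead of playing.
            return "PROMPT: environment too loud; pause playback?"
        volume = 0.4 if ambient_db <= 40 else (0.6 if ambient_db <= 60 else 0.9)
        return "play at volume %.1f" % volume  # volume-data-list stand-in

    print(handle_ambient(90.0))
    print(handle_ambient(50.0))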
To solve the above technical problems, an embodiment of the present invention also provides a computer device. Refer to Fig. 8, a block diagram of the basic structure of the computer device of this embodiment.
Fig. 8 shows the internal structure of the computer device, which includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium stores an operating system, a database, and computer-readable instructions; the database may store a sequence of control information, and when the computer-readable instructions are executed by the processor, the processor is caused to implement an e-book chanting method. The processor provides the computing and control capability that supports the operation of the entire computer device. The memory may store computer-readable instructions which, when executed by the processor, cause the processor to perform an e-book chanting method. The network interface is used for communication with a terminal. Those skilled in the art will understand that the structure shown in Fig. 8 is only a block diagram of the parts relevant to the present solution and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
In this embodiment the processor executes the specific functions of the acquisition module 2100, processing module 2200, read module 2300, and execution module 2400 of Fig. 7, and the memory stores the program code and data needed to execute these modules. The network interface handles data transmission with a user terminal or server. The memory in this embodiment stores the program code and data needed to execute all submodules of the e-book chanting device, and the server can invoke that program code and data to execute the functions of all submodules.
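A hedged sketch of how the four modules of Fig. 7 might cooperate; the class and method names are illustrative only, not the patent's code:

    class EbookChantingDevice:
        def acquire(self):                        # acquisition module 2100
            return "text field to be chanted"

        def process(self, text_field):            # processing module 2200
            return "happy"                        # stand-in classification

        def read(self, emotion):                  # read module 2300
            return "classification information: " + emotion

        def execute(self, text_field, emotion):   # execution module 2400
            return "voice(%r, bank_for=%s)" % (text_field, emotion)

    device = EbookChantingDevice()
    text = device.acquire()
    emotion = device.process(text)
    print(device.read(emotion))
    print(device.execute(text, emotion))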
Before chanting an e-book, the computer device likewise first converts the text field to be chanted into an array matrix; the neural network model trained to a convergence state for emotion classification of the converted data matrix then classifies the array to obtain the emotion expressed by the text field; finally, the resulting classification is used to look up, in the audio database, a sound bank with matching emotional expression, and the audio data in that sound bank is called to convert the text field into audio. The speech converted from the text field thus carries emotional expression, and the content of the e-book can be chanted with feeling: the chanting matches the emotion expressed by the text, realistically simulating the voice and emotion of a human reader, and the rich, engaging emotional expression improves the user's listening experience.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the e-book chanting method of any of the above embodiments.
Those of ordinary skill in the art will appreciate that all or part of the processes of the above methods can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages; these are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential: they may be executed in turn, or alternately, with at least part of the sub-steps or stages of other steps.

Claims (10)

1. An e-book chanting method, characterized by comprising:
obtaining a text field to be chanted;
converting the text field into an array according to a preset conversion scheme, and inputting the array into a preset emotion recognition model, wherein the emotion recognition model is a neural network model trained in advance to a convergence state to analyze the emotion information characterized by data;
reading classification information, output by the emotion recognition model, that characterizes the emotion expressed by the text field;
looking up, in a preset audio database, a sound bank having a mapping relation with the classification information, and calling the audio data in the sound bank to perform voice conversion on the text field, wherein the emotion expressed by the audio data in the sound bank is the same as or similar to the emotion information characterized by the classification information.
2. The e-book chanting method according to claim 1, characterized in that, before the looking up, in the preset audio database, of the sound bank having a mapping relation with the classification information and the calling of the audio data in the sound bank to perform voice conversion on the text field, the method further comprises:
obtaining person information of the character stating the text field;
looking up, in a preset character database according to the person information, role setting information having a mapping relation with the person information, wherein the role setting information includes gender information of the character stating the text field.
3. The e-book chanting method according to claim 2, characterized in that the looking up, in the preset audio database, of the sound bank having a mapping relation with the classification information and the calling of the audio data in the sound bank to perform voice conversion on the text field comprise:
looking up, in the audio database, a sound bank having a mapping relation with both the classification information and the gender information, and calling the audio data in the sound bank to perform voice conversion on the text field, wherein the tone of the audio data in the sound bank matches the tone of the gender characterized by the gender information.
4. The e-book chanting method according to claim 1, characterized in that, after the looking up, in the preset audio database, of the sound bank having a mapping relation with the classification information and the calling of the audio data in the sound bank to perform voice conversion on the text field, the method further comprises:
looking up, in a preset music database, background music having a mapping relation with the classification information, wherein the emotion expressed by the background music is the same as or similar to the emotion information characterized by the classification information;
playing the background music while playing the voice data converted from the text field, wherein the playback volume of the background music is lower than the playback volume of the voice data.
5. The e-book chanting method according to claim 1, characterized in that, after the looking up, in the preset audio database, of the sound bank having a mapping relation with the classification information and the calling of the audio data in the sound bank to perform voice conversion on the text field, the method further comprises:
obtaining body surface parameters of a target user, wherein the body surface parameters are parameter data obtained by detecting one or more body surface measurements of the human body;
converting the parameter data into a parameter matrix, and inputting the parameter matrix into a preset body state judgment model, wherein the body state judgment model is a neural network model trained in advance to a convergence state to judge body state according to the body surface parameters;
reading state classification data of the target user output by the body state judgment model;
adjusting, according to the state classification data, the playback state of the voice data converted from the text field, wherein the playback state includes the playback volume and whether to continue playing the voice data.
6. The e-book chanting method according to claim 1, characterized in that, after the looking up, in the preset audio database, of the sound bank having a mapping relation with the classification information and the calling of the audio data in the sound bank to perform voice conversion on the text field, the method further comprises:
obtaining an environmental volume parameter of the environment in which the target user is located;
looking up, in a preset volume data list, a playback volume having a mapping relation with the environmental volume parameter;
playing the voice data converted from the text field at the playback volume.
7. The e-book chanting method according to claim 6, characterized in that, before the looking up, in the preset volume data list, of the playback volume having a mapping relation with the environmental volume parameter, the method further comprises:
comparing the environmental volume parameter with a preset volume threshold;
when the environmental volume parameter is greater than the volume threshold, calling preset prompt information for display, wherein the prompt information suggests pausing playback of the voice data; otherwise, confirming the looking up, in the volume data list, of the playback volume having a mapping relation with the environmental volume parameter.
8. An e-book chanting device, characterized by comprising:
an acquisition module, configured to obtain a text field to be chanted;
a processing module, configured to convert the text field into an array matrix according to a preset conversion scheme and to input the array matrix into a preset emotion recognition model, wherein the emotion recognition model is a neural network model trained in advance to a convergence state to analyze the emotion information characterized by data;
a read module, configured to read classification information, output by the emotion recognition model, that characterizes the emotion expressed by the text field;
an execution module, configured to look up, in a preset audio database, a sound bank having a mapping relation with the classification information, and to call the audio data in the sound bank to perform voice conversion on the text field, wherein the emotion expressed by the audio data in the sound bank is the same as or similar to the emotion information characterized by the classification information.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the e-book chanting method according to any one of claims 1 to 7.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the e-book chanting method according to any one of claims 1 to 7.
CN201910044763.9A 2019-01-17 2019-01-17 E-book chants method, apparatus, computer equipment and storage medium Pending CN109658917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910044763.9A CN109658917A (en) 2019-01-17 2019-01-17 E-book chants method, apparatus, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109658917A true CN109658917A (en) 2019-04-19

Family

ID=66120418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910044763.9A Pending CN109658917A (en) 2019-01-17 2019-01-17 E-book chants method, apparatus, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109658917A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222500A (en) * 2011-05-11 2011-10-19 北京航空航天大学 Extracting method and modeling method for Chinese speech emotion combining emotion points
CN102893326A (en) * 2011-05-11 2013-01-23 北京航空航天大学 Chinese voice emotion extraction and modeling method combining emotion points
GB201314175D0 (en) * 2013-08-07 2013-09-18 Samsung Electronics Co Ltd Text to Speech Conversion
US20150279347A1 (en) * 2014-03-27 2015-10-01 International Business Machines Corporation Text-to-Speech for Digital Literature
CN105280179A (en) * 2015-11-02 2016-01-27 小天才科技有限公司 Text-to-speech processing method and system
CN108091324A (en) * 2017-12-22 2018-05-29 北京百度网讯科技有限公司 Tone recognition methods, device, electronic equipment and computer readable storage medium
CN108573260A (en) * 2018-03-29 2018-09-25 广东欧珀移动通信有限公司 Information processing method and device, electronic equipment, computer readable storage medium
CN108831435A (en) * 2018-06-06 2018-11-16 安徽继远软件有限公司 A kind of emotional speech synthesizing method based on susceptible sense speaker adaptation
CN108831436A (en) * 2018-06-12 2018-11-16 深圳市合言信息科技有限公司 A method of text speech synthesis after simulation speaker's mood optimization translation
KR20180078197A (en) * 2018-06-27 2018-07-09 조은형 E-voice book editor and player
CN108962219A (en) * 2018-06-29 2018-12-07 百度在线网络技术(北京)有限公司 Method and apparatus for handling text

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765971A (en) * 2019-11-05 2021-05-07 北京火山引擎科技有限公司 Text-to-speech conversion method and device, electronic equipment and storage medium
CN112765971B (en) * 2019-11-05 2023-11-17 北京火山引擎科技有限公司 Text-to-speech conversion method and device, electronic equipment and storage medium
CN112233648A (en) * 2019-12-09 2021-01-15 北京来也网络科技有限公司 Data processing method, device, equipment and storage medium combining RPA and AI
CN112233648B (en) * 2019-12-09 2024-06-11 北京来也网络科技有限公司 Data processing method, device, equipment and storage medium combining RPA and AI
CN111415650A (en) * 2020-03-25 2020-07-14 广州酷狗计算机科技有限公司 Text-to-speech method, device, equipment and storage medium
CN111667815A (en) * 2020-06-04 2020-09-15 上海肇观电子科技有限公司 Method, apparatus, chip circuit and medium for text-to-speech conversion
CN111667815B (en) * 2020-06-04 2023-09-01 上海肇观电子科技有限公司 Method, apparatus, chip circuit and medium for text-to-speech conversion
CN111739509A (en) * 2020-06-16 2020-10-02 掌阅科技股份有限公司 Electronic book audio generation method, electronic device and storage medium
CN111739509B (en) * 2020-06-16 2022-03-22 掌阅科技股份有限公司 Electronic book audio generation method, electronic device and storage medium
WO2021259300A1 (en) * 2020-06-23 2021-12-30 北京字节跳动网络技术有限公司 Sound effect adding method and apparatus, storage medium, and electronic device
CN113851106A (en) * 2021-08-17 2021-12-28 北京百度网讯科技有限公司 Audio playing method and device, electronic equipment and readable storage medium
CN113851106B (en) * 2021-08-17 2023-01-06 北京百度网讯科技有限公司 Audio playing method and device, electronic equipment and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination