CN108615524A - Speech synthesis method, system and terminal device - Google Patents

Speech synthesis method, system and terminal device

Info

Publication number
CN108615524A
CN108615524A (Application CN201810456213.3A)
Authority
CN
China
Prior art keywords
sentence
data
feature words
sound
tone feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810456213.3A
Other languages
Chinese (zh)
Inventor
朱坤 (Zhu Kun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810456213.3A priority Critical patent/CN108615524A/en
Priority to PCT/CN2018/097560 priority patent/WO2019218481A1/en
Publication of CN108615524A publication Critical patent/CN108615524A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G10L2013/105 Duration
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques using neural networks
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention is applicable to the technical field of data processing and provides a speech synthesis method, system and terminal device, comprising: obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words; synthesizing the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model; and performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data. By extracting the tone feature words of every sentence in the text data to analyze each sentence's emotion attribute, adjusting the synthesized speech through the preset voice pronunciation model combined with the sentence's emotion attribute, and then performing prosodic feature adjustment on the basic speech data, target speech data with a higher degree of personification is obtained. Emotionally charged words are pronounced with more feeling and better match how real users speak, effectively improving the quality of the synthesized speech data.

Description

Speech synthesis method, system and terminal device
Technical field
The invention belongs to the technical field of data processing, and more particularly relates to a speech synthesis method, system and terminal device.
Background technology
An audiobook is a work recorded from a manuscript by one or more people, using different voices and recording formats. At present, audiobooks on the market are all recorded manually in advance, saved, and played back directly when used, which consumes substantial human resources. To save labor cost, speech data can instead be synthesized by speech synthesis technology. Speech synthesis refers to generating artificial speech by mechanical or electronic means: it converts text information, whether generated by the computer itself or input externally, into audible, intelligible speech output. When synthesizing speech, current techniques first analyze the text data to obtain its words and characters, then fetch the basic speech data corresponding to those words and characters from a speech library, and finally combine the fetched basic speech data in order into the final speech data. Speech data obtained this way has a low degree of personification, so its quality is poor.
In summary, the speech data synthesized by existing speech synthesis techniques is of poor quality.
Summary of the invention
In view of this, the embodiments of the present invention provide a speech synthesis method, system and terminal device, to solve the problem that the speech data synthesized by existing speech synthesis techniques is of poor quality.
The first aspect of the present invention provides a speech synthesis method, comprising:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
The second aspect of the present invention provides a speech synthesis system, comprising:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
The third aspect of the present invention provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
The fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
In the speech synthesis method, system and terminal device provided by the present invention, the tone feature words of every sentence in the text data are extracted to analyze each sentence's emotion attribute; the basic speech data is obtained by adjustment through a preset voice pronunciation model combined with the sentence's emotion attribute; prosodic feature adjustment is then performed on the basic speech data to obtain target speech data with a higher degree of personification. Emotionally charged words are pronounced with more feeling and better match how real users speak, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis techniques is of poor quality.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of the speech synthesis method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of step S101 of Embodiment 1, as detailed by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of step S102 of Embodiment 1, as detailed by Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of step S103 of Embodiment 1, as detailed by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of the speech synthesis system provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the sentiment analysis module 101 of Embodiment 5, as detailed by Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of the speech synthesis module 102 of Embodiment 5, as detailed by Embodiment 7 of the present invention;
Fig. 8 is a schematic structural diagram of the speech adjustment module 103 of Embodiment 5, as detailed by Embodiment 8 of the present invention;
Fig. 9 is a schematic diagram of the terminal device provided by Embodiment 9 of the present invention.
Detailed description
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits and methods are omitted, so that unnecessary detail does not obscure the description of the present invention.
To solve the problem that the speech data synthesized by existing speech synthesis techniques is of poor quality, the embodiments of the present invention provide a speech synthesis method, system and terminal device. The tone feature words of every sentence in the text data are extracted to analyze each sentence's emotion attribute; basic speech data is obtained by adjustment through a preset voice pronunciation model combined with the sentence's emotion attribute; prosodic feature adjustment is then performed on the basic speech data to obtain target speech data with a higher degree of personification. Emotionally charged words are pronounced with more feeling and better match how real users speak, which effectively improves the quality of the synthesized speech data.
In order to illustrate the technical solutions of the present invention, the following description proceeds by way of specific embodiments.
Embodiment 1:
As shown in Fig. 1, this embodiment provides a speech synthesis method, which specifically comprises:
Step S101: obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words.
In a particular application, text data carrying text information is obtained through the terminal. The format of the text data may be plain text (txt), Rich Text Format (RTF) or a document (DOC), or it may be a file containing text information such as a Portable Document Format (PDF) file or a picture, in which case the PDF or picture is first converted into a file from which the text data can be read directly; no limitation is imposed here.
In a particular application, after the text data is obtained, the tone feature words in each sentence are extracted sentence by sentence. Tone feature words are words, symbols or word combinations that carry emotion, for example words expressing mood and tone such as "happy", "oh my", "excellent" and "fine", as well as tone-carrying punctuation. Because tone feature words reflect the user's emotional tendency, they carry different prosodic features when pronounced. Therefore the tone feature words in each sentence are extracted, and the emotion attribute of each sentence is analyzed according to the tone feature words.
In a particular application, a tone feature word database is preset, and the tone feature words in each sentence that match entries in the database are extracted. When expressing a mood, a user may combine several words to express it. To enrich the tone feature word database and extract the tone feature words in each sentence accurately, combination rules for words are set according to grammatical rules, and when tone feature words are extracted, word groups that satisfy a combination rule are extracted together as one tone feature word. Illustratively, the combination rules include, but are not limited to, the following (a matching sketch in code follows the list):
A: degree adverb + emotion word, e.g. "rather + good", "very + good", "especially + good";
B: negation word + emotion word, e.g. "not + good", "not + bad";
C: negation word + degree adverb + emotion word, e.g. "not + too + good", "not + too + bad";
D: degree adverb + negation word + emotion word, e.g. "very + not + good", "still + not + bad".
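A minimal sketch of this extraction step follows, assuming a small hand-built lexicon; the word lists, the tokenizer and the example sentences are illustrative (the patent targets Chinese text; English words are used here for readability) and are not taken from the patent.

```python
# Sketch: sentence splitting and rule-based tone-feature-word extraction.
import re

DEGREE_ADVERBS = {"very", "too", "rather", "especially", "still"}
NEGATION_WORDS = {"not", "no"}
EMOTION_WORDS = {"good", "bad", "happy", "excellent"}

def split_sentences(text):
    # Split on sentence-ending punctuation (simplistic placeholder).
    return [s for s in re.split(r"[.!?。！？]", text) if s.strip()]

def extract_tone_feature_words(sentence):
    tokens = sentence.lower().split()
    found, i = [], 0
    while i < len(tokens):
        # Try the longest rules first: C (negation+degree+emotion)
        # and D (degree+negation+emotion).
        tri = tokens[i:i + 3]
        if len(tri) == 3 and tri[2] in EMOTION_WORDS and (
                (tri[0] in NEGATION_WORDS and tri[1] in DEGREE_ADVERBS) or
                (tri[0] in DEGREE_ADVERBS and tri[1] in NEGATION_WORDS)):
            found.append(" ".join(tri)); i += 3; continue
        # Rules A (degree+emotion) and B (negation+emotion).
        duo = tokens[i:i + 2]
        if len(duo) == 2 and duo[1] in EMOTION_WORDS and (
                duo[0] in DEGREE_ADVERBS or duo[0] in NEGATION_WORDS):
            found.append(" ".join(duo)); i += 2; continue
        if tokens[i] in EMOTION_WORDS:  # a bare emotion word
            found.append(tokens[i])
        i += 1
    return found

for s in split_sentences("The show was very good. The food was not too bad!"):
    print(s.strip(), "->", extract_tone_feature_words(s))
```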
In a particular application, to guarantee the synthesis quality of every sentence, this embodiment extracts tone feature words sentence by sentence and analyzes each sentence's emotion attribute from that sentence's own tone feature words.
In a particular application, each sentence of the text data is first split into multiple words and word combinations, which are classified into neutral words and tone feature words, the tone feature words comprising positive words and negative words. The emotion attribute of the sentence can then be obtained from the proportions and grade scores of the neutral, positive and negative words in the sentence.
Step S102: synthesizing the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model.
In a particular application, for a sentence that has been split into multiple words and word combinations, the speech data of each word is obtained from the preset speech database word by word, and the speech data of the multiple words is then combined into the speech data of the whole sentence.
In a particular application, after the speech data of the whole sentence is obtained, its acoustic features are adjusted according to the sentence's emotion attribute based on the preset voice pronunciation model, so as to obtain basic speech data corresponding to the sentence's emotion attribute and make the pronunciation closer to that of a real user. In a particular application, the acoustic features include sound intensity, speech rate and pitch.
In a particular application, the preset voice pronunciation model is built as follows: a large amount of real users' speech data is collected as training samples, each sentence of the speech data is labelled with an emotion attribute, and a neural network is trained on the labelled data to obtain the acoustic features of the pronunciation corresponding to each emotion attribute.
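One possible reading of this training setup is a regression from emotion labels to per-sentence acoustic features; a minimal sketch under that assumption is given below. The network shape, layer sizes and placeholder data are illustrative and are not specified by the patent.

```python
# Sketch: train a small network mapping emotion attributes to acoustic
# features (intensity, speech rate, pitch), as one reading of the text.
import torch
import torch.nn as nn

NUM_EMOTIONS = 6           # happy, sad, angry, afraid, doubtful, normal
NUM_ACOUSTIC_FEATURES = 3  # intensity, speech rate, pitch

model = nn.Sequential(
    nn.Embedding(NUM_EMOTIONS, 16),  # emotion attribute -> vector
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, NUM_ACOUSTIC_FEATURES),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder data: emotion labels per sentence and the acoustic
# features measured from the corresponding recorded speech.
emotion_ids = torch.randint(0, NUM_EMOTIONS, (256,))
acoustic_targets = torch.randn(256, NUM_ACOUSTIC_FEATURES)

for epoch in range(100):
    optimizer.zero_grad()
    pred = model(emotion_ids)
    loss = loss_fn(pred, acoustic_targets)
    loss.backward()
    optimizer.step()
```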
Step S103: performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
In a particular application, the basic speech data reflects the emotion attribute of the whole sentence. To better match a real user's pronunciation under the corresponding emotion, the basic speech data of the whole sentence is further adjusted around the tone feature words in terms of prosodic features.
In a particular application, the prosodic features include loudness, pitch and duration. Loudness covers the strength of stressed and unstressed syllables; pitch covers word tones and sentence intonation; duration covers the rhythm and tempo of the speech.
In a particular application, tone feature words of different emotional tendencies express different user moods, and the prosodic features of the corresponding speech can differ considerably; for example, a happy tone is noticeably higher-pitched than a sad one. Each tone feature word thus corresponds to one prosodic feature (or one class of prosodic features). The prosodic features corresponding to each tone feature word are therefore obtained first, and the tone feature words in the basic speech data are adjusted according to those prosodic features; if a sentence contains multiple tone feature words, all of them are adjusted, yielding speech data that better matches a real user's pronunciation.
In a particular application, the prosodic feature adjustment may use preset prosodic feature parameters for each class of tone feature words; for example, the prosodic feature parameters of happy tone feature words are set to loudness 1, pitch 1 and duration 1, while those of sad tone feature words are set to loudness 2, pitch 2 and duration 2. Alternatively, the adjustment may be a percentage change applied to the prosodic feature parameters of the basic speech data; for example, when adjusting a happy tone feature word, the pitch of that word is raised by 10% relative to the basic speech data and its duration is shortened by 15%.
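A minimal sketch of the two adjustment modes just described (absolute preset parameters versus percentage changes relative to the basic speech data) follows; the Prosody record and all parameter values are illustrative, not taken from the patent.

```python
# Sketch: the two prosodic-feature adjustment modes.
from dataclasses import dataclass

@dataclass
class Prosody:
    loudness: float  # e.g. relative gain
    pitch: float     # e.g. fundamental frequency in Hz
    duration: float  # e.g. seconds

# Mode 1: absolute preset parameters per tone-feature-word class.
PRESET_PARAMS = {
    "happy": Prosody(loudness=1.2, pitch=220.0, duration=0.25),
    "sad":   Prosody(loudness=0.8, pitch=180.0, duration=0.40),
}

# Mode 2: percentage changes relative to the basic speech data,
# matching the "+10% pitch, -15% duration" example in the text.
PERCENT_RULES = {
    "happy": {"pitch": +0.10, "duration": -0.15},
}

def adjust_by_percent(base: Prosody, word_class: str) -> Prosody:
    rule = PERCENT_RULES.get(word_class, {})
    return Prosody(
        loudness=base.loudness * (1 + rule.get("loudness", 0.0)),
        pitch=base.pitch * (1 + rule.get("pitch", 0.0)),
        duration=base.duration * (1 + rule.get("duration", 0.0)),
    )

base = Prosody(loudness=1.0, pitch=200.0, duration=0.30)
print(PRESET_PARAMS["happy"])            # mode 1: look up preset parameters
print(adjust_by_percent(base, "happy"))  # mode 2: scale the basic speech data
```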
The speech synthesis method provided by this embodiment extracts the tone feature words of every sentence in the text data to analyze each sentence's emotion attribute, obtains personalized basic speech data by combining the preset voice pronunciation model with the sentence's emotion attribute, and then performs prosodic feature adjustment on the basic speech data to obtain target speech data closer to a real user's pronunciation. Emotionally charged words are pronounced with more feeling and better match how real users speak, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis techniques is of poor quality.
Embodiment 2:
As shown in Fig. 2, in this embodiment, step S101 of Embodiment 1 specifically comprises:
Step S201: obtaining sentiment analysis parameters of the sentence over multiple preset dimensions according to the tone feature words.
In a particular application, after the sentence is split, each word is scored over three preset dimensions, [positive, neutral, negative]; the sentence's aggregate scores over the three preset dimensions are obtained, and the proportion of each dimension's score is then calculated. Illustratively, after splitting, a sentence contains neutral words and tone feature words, and the tone feature words can be classified into positive words and negative words. When classifying a tone feature word, the word is assigned both a class and a grade score corresponding to its level within that class.
For example, "happy" is set as a positive word with a grade score of +2;
"excellent" is set as a positive word with a grade score of +5;
"bad" is set as a negative word with a grade score of -2;
"very bad" is set as a negative word with a grade score of -5. It should be noted that the above classification and scoring of tone feature words can be implemented by a classification and grading module based on a neural network structure; the specific implementation means are not repeated here.
Step S202: determining the emotion attribute of the sentence from the ratio of each preset dimension's sentiment analysis parameter to the total over all dimensions.
In a particular application, the score of each preset dimension is calculated from the grade scores of the tone feature words, and the ratio of each preset dimension's score to the total is then calculated. For example, if the scores of a sentence's three preset dimensions are [+10, 4, -6], the corresponding score ratios are [+0.5, 0.2, -0.3].
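A minimal sketch of this ratio computation follows; note that the total is taken as the sum of absolute scores so that [+10, 4, -6] maps to [+0.5, 0.2, -0.3], a normalization inferred from the example rather than stated explicitly in the text.

```python
# Sketch: per-dimension score ratios from the grade-score totals.
def dimension_ratios(scores):
    total = sum(abs(s) for s in scores)
    if total == 0:
        return [0.0 for _ in scores]  # no scored words: treat as neutral
    return [s / total for s in scores]

print(dimension_ratios([10, 4, -6]))  # [0.5, 0.2, -0.3]
```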
In a particular application, a text emotion classification model based on a support vector machine is used to evaluate the ratio values of each sentence's scores over the three preset dimensions, and the corresponding emotion attribute of the sentence is thereby determined. The text emotion classification model is trained in advance on a large amount of varied text data to obtain the text emotion analysis model. The emotion attribute can be divided into a number of different states, thereby quantifying the user's emotion, and a quantized emotion attribute is obtained for each sentence. Illustratively, the quantized emotion attributes include, but are not limited to: happy, sad, angry, afraid, doubtful and normal.
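A minimal sketch of such an SVM-based classifier, assuming scikit-learn as the implementation vehicle, is given below; the training vectors and labels are placeholders, not data from the patent.

```python
# Sketch: SVM classifier from [positive, neutral, negative] score ratios
# to a quantized emotion attribute.
from sklearn.svm import SVC

X_train = [
    [0.7, 0.2, -0.1],
    [0.5, 0.2, -0.3],
    [0.1, 0.8, -0.1],
    [0.0, 0.3, -0.7],
    [-0.1, 0.2, -0.7],
    [0.1, 0.9, 0.0],
]
y_train = ["happy", "happy", "normal", "sad", "angry", "normal"]

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
print(clf.predict([[0.5, 0.2, -0.3]]))  # likely ['happy'] on this toy data
```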
Embodiment 3:
As shown in Fig. 3, in this embodiment, step S102 of Embodiment 1 specifically comprises:
Step S301: obtaining speech data corresponding to each word of the sentence from the preset speech database.
In a particular application, each sentence is split into multiple words and word combinations, and the speech data of each word is obtained from the preset speech database word by word.
Step S302: synthesizing the speech data into the electronic speech data of the sentence.
In a particular application, the speech data of the multiple words is combined into the speech data of the whole sentence, yielding the electronic speech data of the sentence.
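A minimal sketch of steps S301 and S302 (per-word waveform lookup followed by concatenation) follows; the database dictionary and the placeholder waveforms are illustrative assumptions.

```python
# Sketch: look up each word's waveform, then concatenate in order.
import numpy as np

SAMPLE_RATE = 16000
# Preset speech database: word -> waveform (short placeholder tones here).
SPEECH_DB = {
    word: np.sin(2 * np.pi * 220.0 * np.arange(int(0.2 * SAMPLE_RATE))
                 / SAMPLE_RATE).astype(np.float32)
    for word in ["we", "go", "play", "today"]
}

def synthesize_sentence(words):
    # S301: fetch each word's speech data; S302: concatenate in order.
    return np.concatenate([SPEECH_DB[w] for w in words])

electronic_speech = synthesize_sentence(["we", "go", "play", "today"])
print(electronic_speech.shape)  # (12800,): 4 words x 0.2 s at 16 kHz
```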
Step S303: adjusting the pitch, loudness and speech rate of the electronic speech data according to the sentence's emotion attribute through the voice pronunciation model, to obtain the basic speech data of the sentence.
In a particular application, after the electronic speech data is obtained, its pitch, loudness and speech rate are adjusted according to the sentence's emotion attribute based on the preset voice pronunciation model, so as to obtain basic speech data corresponding to the sentence's emotion attribute and make the pronunciation closer to that of a real user.
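A minimal sketch of step S303 follows, assuming the pronunciation model has already mapped the emotion attribute to target adjustments; librosa is used here as one possible way to realize pitch and rate changes, and the EMOTION_ACOUSTICS table is a hypothetical stand-in for the model's output, neither being named in the patent.

```python
# Sketch: apply emotion-dependent pitch, rate and loudness adjustments.
import librosa
import numpy as np

# Hypothetical pronunciation-model output:
# emotion attribute -> (gain, pitch shift in semitones, rate factor).
EMOTION_ACOUSTICS = {
    "happy": (1.2, +2.0, 1.10),  # louder, higher, faster
    "sad":   (0.8, -2.0, 0.85),  # softer, lower, slower
}

def to_basic_speech(y, sr, emotion):
    gain, semitones, rate = EMOTION_ACOUSTICS[emotion]
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)  # pitch
    y = librosa.effects.time_stretch(y, rate=rate)                # speech rate
    return (gain * y).astype(np.float32)                          # loudness

y = np.random.randn(16000).astype(np.float32)  # placeholder 1 s waveform
basic_speech = to_basic_speech(y, sr=16000, emotion="happy")
```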
Embodiment 4:
As shown in Fig. 4, in this embodiment, step S103 of Embodiment 1 specifically comprises:
Step S401: obtaining the prosodic feature adjustment rule of the tone feature words in each sentence, comprising the adjustment rules for prosodic feature parameters such as pitch, loudness and duration.
In a particular application, the tone feature words are classified and the grades of the tone feature words in each class are set; the corresponding prosodic feature adjustment rule is then obtained according to the class and grade of each tone feature word, specifically the adjustment rules for that tone feature word's prosodic feature parameters such as pitch, loudness and duration.
In a particular application, the prosodic feature parameters of tone feature words of different classes and grades are preset; each tone feature word is then classified and graded, and its corresponding prosodic feature parameters are looked up to obtain its prosodic feature adjustment rule.
Step S402: adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rule of the tone feature words, to obtain target speech data.
In a particular application, after the prosodic feature adjustment rule corresponding to the tone feature words is obtained, the pitch, loudness and duration of each tone feature word in the basic speech data are adjusted according to that rule; after the adjustment, target speech data closer to a real user's pronunciation is obtained.
In a particular application, the adjustment process may calculate the tone feature word's prosodic feature parameters according to the prosodic feature adjustment rule and then set the prosodic features of the corresponding tone feature word in the basic speech data to those parameters; alternatively, the basic speech data may be adjusted in percentage form according to the prosodic feature adjustment rule. No limitation is imposed here.
In one embodiment, the following steps are further included after step S402:
obtaining the prosodic feature parameters of the target speech data;
calculating the average value of each sentence's prosodic feature parameters according to the prosodic feature parameters of the target speech data;
adjusting the pitch, loudness and duration of the words of each sentence according to the average value, to obtain smoothly transitioning speech data.
In a particular application, since the above prosodic feature adjustment targets only the tone feature words, abrupt changes in the speech may result, making the pronunciation of a tone feature word sound jarring and inconsistent next to the pronunciation of the adjacent words. To avoid this problem, the prosodic feature parameters of the target speech data can be adjusted again after the prosodic feature adjustment, taking the whole sentence as the unit, so that the sentence transitions smoothly. Specifically: the average value of each sentence's prosodic feature parameters is obtained from the prosodic feature parameters of the target speech data, and for the words adjacent to a tone feature word, the pitch, loudness and duration of those words are adjusted using this average value. In a particular application, when several tone feature words are adjacent to one another, only the pitch, loudness and duration of the words adjoining the first and the last tone feature word need to be adjusted.
Illustratively, in "Let's go to the amusement park to play this afternoon, how about it!", "how about it" serves as the tone feature word, so its pitch and tone are both raised, while the pitch and tone of the adjacent "play" are not; the tone and pitch may therefore jump abruptly from "play" to "how about it", making the pronunciation transition unnatural. Accordingly, the prosodic feature average of the whole sentence is calculated from its prosodic feature parameters, and the prosodic feature parameters of "play" are adjusted to that average, which effectively reduces the loudness and tone gap between "play" and "how about it" and achieves a smooth transition.
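A minimal sketch of this smoothing step follows: words adjacent to a tone feature word are pulled to the sentence-average prosody. The Word record and all numeric values are illustrative assumptions.

```python
# Sketch: smooth the transition around tone feature words.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    pitch: float
    loudness: float
    duration: float
    is_tone_feature: bool = False

def smooth_sentence(words):
    n = len(words)
    avg_pitch = sum(w.pitch for w in words) / n
    avg_loudness = sum(w.loudness for w in words) / n
    avg_duration = sum(w.duration for w in words) / n
    for i, w in enumerate(words):
        adjoins_tone_word = (
            (i > 0 and words[i - 1].is_tone_feature) or
            (i + 1 < n and words[i + 1].is_tone_feature))
        # Adjust only plain words that adjoin a tone feature word.
        if adjoins_tone_word and not w.is_tone_feature:
            w.pitch, w.loudness, w.duration = (
                avg_pitch, avg_loudness, avg_duration)
    return words

sentence = [
    Word("lets", 200, 1.0, 0.2), Word("go", 200, 1.0, 0.2),
    Word("play", 200, 1.0, 0.2),
    Word("how about it", 260, 1.3, 0.18, is_tone_feature=True),
]
smooth_sentence(sentence)
print(sentence[2])  # "play" pulled toward the sentence average
```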
Embodiment 5:
As shown in Fig. 5, this embodiment provides a speech synthesis system 100 for performing the method steps of Embodiment 1, comprising a sentiment analysis module 101, a speech synthesis module 102 and a speech adjustment module 103.
The sentiment analysis module 101 is configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
the speech synthesis module 102 is configured to synthesize the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model;
the speech adjustment module 103 is configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
It should be noted that, since the speech synthesis system provided by the embodiment of the present invention is based on the same conception as the method embodiment shown in Fig. 1 of the present invention, it brings the same technical effects as that method embodiment; for details, reference can be made to the description of the method embodiment shown in Fig. 1, which is not repeated here.
Therefore, the speech synthesis system provided by this embodiment can likewise extract the tone feature words of every sentence in the text data to analyze each sentence's emotion attribute, obtain personalized basic speech data by combining the preset voice pronunciation model with the sentence's emotion attribute, and then perform prosodic feature adjustment on the basic speech data to obtain target speech data closer to a real user's pronunciation. Emotionally charged words are pronounced with more feeling and better match how real users speak, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis techniques is of poor quality.
Embodiment 6:
As shown in Fig. 6, in this embodiment, the sentiment analysis module 101 of Embodiment 5 includes structures for performing the method steps of the embodiment corresponding to Fig. 2; it comprises a parameter acquisition unit 201 and a sentiment analysis unit 202.
The parameter acquisition unit 201 is configured to obtain sentiment analysis parameters of the sentence over multiple preset dimensions according to the tone feature words.
The sentiment analysis unit 202 is configured to determine the emotion attribute of the sentence from the ratio of each preset dimension's sentiment analysis parameter to the total over all dimensions.
Embodiment 7:
As shown in Fig. 7, in this embodiment, the speech synthesis module 102 of Embodiment 5 includes structures for performing the method steps of the embodiment corresponding to Fig. 3; it comprises a speech data acquisition unit 301, a speech data synthesis unit 302 and an acoustic feature adjustment unit 303.
The speech data acquisition unit 301 is configured to obtain speech data corresponding to each word of the sentence from the preset speech database.
The speech data synthesis unit 302 is configured to synthesize the speech data into the electronic speech data of the sentence.
The acoustic feature adjustment unit 303 is configured to adjust the pitch, loudness and speech rate of the electronic speech data according to the sentence's emotion attribute through the voice pronunciation model, to obtain the basic speech data of the sentence.
Embodiment 8:
As shown in Fig. 8, in this embodiment, the speech adjustment module 103 of Embodiment 5 includes structures for performing the method steps of the embodiment corresponding to Fig. 4; it comprises a prosodic feature rule acquisition unit 401 and a prosodic feature adjustment unit 402.
The prosodic feature rule acquisition unit 401 is configured to obtain the prosodic feature adjustment rule of the tone feature words in each sentence, comprising the adjustment rules for prosodic feature parameters such as pitch, loudness and duration.
The prosodic feature adjustment unit 402 is configured to adjust the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rule of the tone feature words, to obtain target speech data.
In one embodiment, the speech adjustment module 103 further comprises a feature parameter acquisition unit, a calculation unit and a smooth transition adjustment unit.
The feature parameter acquisition unit is configured to obtain the prosodic feature parameters of the target speech data.
The calculation unit is configured to calculate the average value of each sentence's prosodic feature parameters according to the prosodic feature parameters of the target speech data.
The smooth transition adjustment unit is configured to adjust the pitch, loudness and duration of the words of each sentence according to the average value, to obtain smoothly transitioning speech data.
Embodiment 9:
Fig. 9 is a schematic diagram of the terminal device provided by Embodiment 9 of the present invention. As shown in Fig. 9, the terminal device 9 of this embodiment comprises a processor 90, a memory 91, and a computer program 92, such as a program, stored in the memory 91 and executable on the processor 90. When executing the computer program 92, the processor 90 implements the steps in each of the above speech synthesis method embodiments, such as steps S101 to S103 shown in Fig. 1. Alternatively, when executing the computer program 92, the processor 90 implements the functions of each module/unit in the above system embodiments, such as the functions of modules 101 to 103 shown in Fig. 5.
Illustratively, the computer program 92 can be divided into one or more modules/units, which are stored in the memory 91 and executed by the processor 90 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution of the computer program 92 in the terminal device 9. For example, the computer program 92 can be divided into a sentiment analysis module, a speech synthesis module and a speech adjustment module, with the specific functions of each module as follows:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
The terminal device 9 can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud management server. The terminal device may include, but is not limited to, the processor 90 and the memory 91. Those skilled in the art will understand that Fig. 9 is only an example of the terminal device 9 and does not limit the terminal device 9, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device may further include input/output devices, network access devices, buses and the like.
The processor 90 can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor can be a microprocessor, or the processor can be any conventional processor or the like.
The memory 91 can be an internal storage unit of the terminal device 9, such as a hard disk or internal memory of the terminal device 9. The memory 91 can also be an external storage device of the terminal device 9, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 9. Further, the memory 91 can include both the internal storage unit and an external storage device of the terminal device 9. The memory 91 is used to store the computer program and other programs and data needed by the terminal device. The memory 91 can also be used to temporarily store data that has been output or is about to be output.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is taken as an example; in practical applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the system can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit; the above integrated units can be realized in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above terminal, reference can be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference can be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed system/terminal device and method can be implemented in other ways. For example, the system/terminal device embodiments described above are only illustrative; for example, the division of the modules or units is only a logical functional division, and there can be other divisions in actual implementation: multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed can be indirect couplings or communication connections through some interfaces, systems or units, and can be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; they can be located in one place or distributed over multiple network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment's solution.
In addition, the functional units in each embodiment of the present invention can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above integrated units can be realized in the form of hardware or in the form of software functional units.
If the integrated module/unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention realizes all or part of the flows in the above method embodiments, which can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, the computer program can realize the steps of each of the above method embodiments. The computer program comprises computer program code, which can be in source code form, object code form, an executable file, certain intermediate forms, and the like. The computer-readable medium may include: any entity or system capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims (10)

1. A speech synthesis method, characterized by comprising:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
2. The speech synthesis method according to claim 1, characterized in that analyzing the emotion attribute of each sentence according to the tone feature words comprises:
obtaining sentiment analysis parameters of the sentence over multiple preset dimensions according to the tone feature words;
determining the emotion attribute of the sentence from the ratio of each preset dimension's sentiment analysis parameter to the total over all dimensions.
3. The speech synthesis method according to claim 1, characterized in that synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model, comprises:
obtaining speech data corresponding to each word of the sentence from the preset speech database;
synthesizing the speech data into the electronic speech data of the sentence;
adjusting the pitch, loudness and speech rate of the electronic speech data according to the emotion attribute of the sentence through the voice pronunciation model, to obtain the basic speech data of the sentence.
4. The speech synthesis method according to claim 1, characterized in that performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data, comprises:
obtaining the prosodic feature adjustment rule of the tone feature words in each sentence, comprising the adjustment rules for prosodic feature parameters such as pitch, loudness and duration;
adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rule of the tone feature words, to obtain target speech data.
5. The speech synthesis method according to claim 4, characterized in that, after adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rule of the tone feature words to obtain target speech data, the method further comprises:
obtaining the prosodic feature parameters of the target speech data;
calculating the average value of each sentence's prosodic feature parameters according to the prosodic feature parameters of the target speech data;
adjusting the pitch, loudness and duration of the words of each sentence according to the average value, to obtain smoothly transitioning speech data.
6. A speech synthesis system, characterized by comprising:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
7. The speech synthesis system according to claim 6, characterized in that the sentiment analysis module comprises:
a parameter acquisition unit, configured to obtain sentiment analysis parameters of the sentence over multiple preset dimensions according to the tone feature words;
a sentiment analysis unit, configured to determine the emotion attribute of the sentence from the ratio of each preset dimension's sentiment analysis parameter to the total over all dimensions.
8. The speech synthesis system according to claim 6, characterized in that the speech synthesis module comprises:
a speech data acquisition unit, configured to obtain speech data corresponding to each word of the sentence from the preset speech database;
a speech data synthesis unit, configured to synthesize the speech data into the electronic speech data of the sentence;
an acoustic feature adjustment unit, configured to adjust the pitch, loudness and speech rate of the electronic speech data according to the emotion attribute of the sentence through the voice pronunciation model, to obtain the basic speech data of the sentence.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201810456213.3A 2018-05-14 2018-05-14 Speech synthesis method, system and terminal device Pending CN108615524A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810456213.3A CN108615524A (en) 2018-05-14 2018-05-14 Speech synthesis method, system and terminal device
PCT/CN2018/097560 WO2019218481A1 (en) 2018-05-14 2018-07-27 Speech synthesis method, system, and terminal apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810456213.3A CN108615524A (en) 2018-05-14 2018-05-14 Speech synthesis method, system and terminal device

Publications (1)

Publication Number Publication Date
CN108615524A true CN108615524A (en) 2018-10-02

Family

ID=63663006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810456213.3A Pending CN108615524A (en) 2018-05-14 2018-05-14 A kind of phoneme synthesizing method, system and terminal device

Country Status (2)

Country Link
CN (1) CN108615524A (en)
WO (1) WO2019218481A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109461435A * 2018-11-19 2019-03-12 北京光年无限科技有限公司 Speech synthesis method and device for an intelligent robot
CN109545245A * 2018-12-21 2019-03-29 斑马网络技术有限公司 Speech processing method and device
CN109599094A * 2018-12-17 2019-04-09 海南大学 Method for voice beautification and emotion modification
CN109710748A * 2019-01-17 2019-05-03 北京光年无限科技有限公司 Picture-book reading interaction method and system for an intelligent robot
CN110379409A * 2019-06-14 2019-10-25 平安科技(深圳)有限公司 Speech synthesis method, system, terminal device and readable storage medium
CN111031386A (en) * 2019-12-17 2020-04-17 腾讯科技(深圳)有限公司 Video dubbing method and device based on voice synthesis, computer equipment and medium
CN111091810A (en) * 2019-12-19 2020-05-01 佛山科学技术学院 VR game character expression control method based on voice information and storage medium
CN111108549A (en) * 2019-12-24 2020-05-05 深圳市优必选科技股份有限公司 Speech synthesis method, speech synthesis device, computer equipment and computer readable storage medium
CN111128118A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Speech synthesis method, related device and readable storage medium
CN112349272A (en) * 2020-10-15 2021-02-09 北京捷通华声科技股份有限公司 Speech synthesis method, speech synthesis device, storage medium and electronic device
CN113539230A (en) * 2020-03-31 2021-10-22 北京奔影网络科技有限公司 Speech synthesis method and device
CN113990286A (en) * 2021-10-29 2022-01-28 北京大学深圳研究院 Speech synthesis method, apparatus, device and storage medium
CN114783402A (en) * 2022-06-22 2022-07-22 广东电网有限责任公司佛山供电局 Variation method and device for synthetic voice, electronic equipment and storage medium
US11545135B2 (en) * 2018-10-05 2023-01-03 Nippon Telegraph And Telephone Corporation Acoustic model learning device, voice synthesis device, and program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003271172A (en) * 2002-03-15 2003-09-25 Sony Corp Method and apparatus for voice synthesis, program, recording medium and robot apparatus
US20050187772A1 (en) * 2004-02-25 2005-08-25 Fuji Xerox Co., Ltd. Systems and methods for synthesizing speech using discourse function level prosodic features
CN102103856A (en) * 2009-12-21 2011-06-22 盛大计算机(上海)有限公司 Voice synthesis method and system
US20130211838A1 (en) * 2010-10-28 2013-08-15 Acriil Inc. Apparatus and method for emotional voice synthesis

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8073696B2 (en) * 2005-05-18 2011-12-06 Panasonic Corporation Voice synthesis device
CN101064103B (en) * 2006-04-24 2011-05-04 中国科学院自动化研究所 Chinese voice synthetic method and system based on syllable rhythm restricting relationship
KR20080060909A (en) * 2006-12-27 2008-07-02 엘지전자 주식회사 Method for synthesing voice according to text and voice synthesis using the same
CN101000765B (en) * 2007-01-09 2011-03-30 黑龙江大学 Speech synthetic method based on rhythm character
CN101452699A (en) * 2007-12-04 2009-06-10 株式会社东芝 Rhythm self-adapting and speech synthesizing method and apparatus
KR101203188B1 (en) * 2011-04-14 2012-11-22 한국과학기술원 Method and system of synthesizing emotional speech based on personal prosody model and recording medium
CN103366731B (en) * 2012-03-31 2019-02-01 上海果壳电子有限公司 Phoneme synthesizing method and system
CN103198827B (en) * 2013-03-26 2015-06-17 合肥工业大学 Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter
US20150046164A1 (en) * 2013-08-07 2015-02-12 Samsung Electronics Co., Ltd. Method, apparatus, and recording medium for text-to-speech conversion
US9824681B2 (en) * 2014-09-11 2017-11-21 Microsoft Technology Licensing, Llc Text-to-speech with emotional content
CN105355193B (en) * 2015-10-30 2020-09-25 百度在线网络技术(北京)有限公司 Speech synthesis method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003271172A (en) * 2002-03-15 2003-09-25 Sony Corp Method and apparatus for voice synthesis, program, recording medium and robot apparatus
US20050187772A1 (en) * 2004-02-25 2005-08-25 Fuji Xerox Co., Ltd. Systems and methods for synthesizing speech using discourse function level prosodic features
CN102103856A (en) * 2009-12-21 2011-06-22 盛大计算机(上海)有限公司 Voice synthesis method and system
US20130211838A1 (en) * 2010-10-28 2013-08-15 Acriil Inc. Apparatus and method for emotional voice synthesis

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11545135B2 (en) * 2018-10-05 2023-01-03 Nippon Telegraph And Telephone Corporation Acoustic model learning device, voice synthesis device, and program
CN109461435A (en) * 2018-11-19 2019-03-12 北京光年无限科技有限公司 A kind of phoneme synthesizing method and device towards intelligent robot
CN109599094A (en) * 2018-12-17 2019-04-09 海南大学 The method of sound beauty and emotion modification
CN109545245A (en) * 2018-12-21 2019-03-29 斑马网络技术有限公司 Method of speech processing and device
CN109710748A (en) * 2019-01-17 2019-05-03 北京光年无限科技有限公司 It is a kind of to draw this reading exchange method and system towards intelligent robot
CN110379409B (en) * 2019-06-14 2024-04-16 平安科技(深圳)有限公司 Speech synthesis method, system, terminal device and readable storage medium
CN110379409A (en) * 2019-06-14 2019-10-25 平安科技(深圳)有限公司 Phoneme synthesizing method, system, terminal device and readable storage medium storing program for executing
CN111031386A (en) * 2019-12-17 2020-04-17 腾讯科技(深圳)有限公司 Video dubbing method and device based on voice synthesis, computer equipment and medium
CN111031386B (en) * 2019-12-17 2021-07-30 腾讯科技(深圳)有限公司 Video dubbing method and device based on voice synthesis, computer equipment and medium
CN111091810A (en) * 2019-12-19 2020-05-01 佛山科学技术学院 VR game character expression control method based on voice information and storage medium
CN111108549A (en) * 2019-12-24 2020-05-05 深圳市优必选科技股份有限公司 Speech synthesis method, speech synthesis device, computer equipment and computer readable storage medium
WO2021127979A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech synthesis method and apparatus, computer device, and computer readable storage medium
CN111108549B (en) * 2019-12-24 2024-02-02 深圳市优必选科技股份有限公司 Speech synthesis method, device, computer equipment and computer readable storage medium
CN111128118A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Speech synthesis method, related device and readable storage medium
CN111128118B (en) * 2019-12-30 2024-02-13 科大讯飞股份有限公司 Speech synthesis method, related device and readable storage medium
CN113539230A (en) * 2020-03-31 2021-10-22 北京奔影网络科技有限公司 Speech synthesis method and device
CN112349272A (en) * 2020-10-15 2021-02-09 北京捷通华声科技股份有限公司 Speech synthesis method, speech synthesis device, storage medium and electronic device
CN113990286A (en) * 2021-10-29 2022-01-28 北京大学深圳研究院 Speech synthesis method, apparatus, device and storage medium
CN114783402A (en) * 2022-06-22 2022-07-22 广东电网有限责任公司佛山供电局 Variation method and device for synthetic voice, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2019218481A1 (en) 2019-11-21

Similar Documents

Publication Publication Date Title
CN108615524A (en) A kind of phoneme synthesizing method, system and terminal device
Li et al. Controllable emotion transfer for end-to-end speech synthesis
Morrison et al. Ensemble methods for spoken emotion recognition in call-centres
CN109271493A (en) A kind of language text processing method, device and storage medium
CN107464555A (en) Background sound is added to the voice data comprising voice
Xue et al. Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space
CN101156196A (en) Hybrid speech synthesizer, method and use
Zhang et al. Pre-trained deep convolution neural network model with attention for speech emotion recognition
Pinto-Coelho et al. On the development of an automatic voice pleasantness classification and intensity estimation system
CN107221344A (en) A kind of speech emotional moving method
Deb et al. Fourier model based features for analysis and classification of out-of-breath speech
Pravena et al. Development of simulated emotion speech database for excitation source analysis
Proutskova et al. Breathy, resonant, pressed–automatic detection of phonation mode from audio recordings of singing
Pauletto et al. Exploring expressivity and emotion with artificial voice and speech technologies
CN114927126A (en) Scheme output method, device and equipment based on semantic analysis and storage medium
Alías et al. Towards high-quality next-generation text-to-speech synthesis: A multidomain approach by automatic domain classification
Bozkurt et al. Affective synthesis and animation of arm gestures from speech prosody
Alessandri et al. A critical ear: analysis of value judgments in reviews of Beethoven's piano sonata recordings
Arnhold Complex prosodic focus marking in Finnish: Expanding the data landscape
Worrall et al. Intelligible sonifications
CN110390097A (en) A kind of sentiment analysis method and system based on the interior real time data of application
CN112017668A (en) Intelligent voice conversation method, device and system based on real-time emotion detection
He et al. Automatic generation algorithm analysis of dance movements based on music–action association
Walther et al. Towards a conversational expert system for rhetorical and vocal quality assessment in call center talks.
Kawahara Temporally variable multi attribute morphing of arbitrarily many voices for exploratory research of speech prosody

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20181002)