CN1787072A - Method for synthesizing pronunciation based on rhythm model and parameter selecting voice - Google Patents

Method for synthesizing pronunciation based on rhythm model and parameter selecting voice Download PDF

Info

Publication number
CN1787072A
CN1787072A CNA2004100969685A CN200410096968A CN1787072B
Authority
CN
China
Prior art keywords
syllable
cost
parameters
acoustic
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100969685A
Other languages
Chinese (zh)
Other versions
CN1787072B (en)
Inventor
陈明
吕士楠
张连毅
武卫东
肖娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing InfoQuick SinoVoice Speech Technology Corp.
Original Assignee
JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd filed Critical JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd
Priority to CN2004100969685A priority Critical patent/CN1787072B/en
Publication of CN1787072A publication Critical patent/CN1787072A/en
Application granted granted Critical
Publication of CN1787072B publication Critical patent/CN1787072B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Electrophonic Musical Instruments (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a speech synthesis method based on a prosody model and parameter-driven unit selection. Acoustic prosody parameters are first planned to obtain the target value of each acoustic parameter expected for every syllable. Maximum matching is then performed, and the candidates whose parameters differ least from the targets are selected as the samples actually used. After maximum matching, the unmatched segments are processed by single-syllable matching: a synthesis cost is computed for every path running through the candidate samples of all syllables, where the cost is determined jointly by the difference between each candidate's acoustic parameters and their planned values and by the difference between the candidate samples of adjacent syllables on the path. The path of lowest synthesis cost is found by a dynamic programming algorithm. Once samples have been selected for all syllables, their data are fetched from the speech database and concatenated at the waveform level to produce the final synthesis result.

Description

Speech synthesis method based on a prosody model and parameter-driven unit selection
Technical field
The present invention relates to the field of speech synthesis technology, and in particular to speech synthesis methods.
Background technology
At present, the main direction of Chinese speech synthesis is waveform concatenation based on a large-scale corpus of natural recordings. Such a corpus contains a large number of naturally spoken recordings whose coverage essentially spans the pronunciations that occur in most contexts; for each context, the system picks the best-matching raw speech fragments and splices them together. Because the corpus is large, a near-optimal natural speech primitive can be found in almost every situation without further adjustment by other techniques, so the synthesized speech stays consistent with the original recordings. Moreover, the selected fragments can be longer than a syllable (multi-character words or even whole phrases), which further guarantees the naturalness of the synthesized speech.
The current shortcoming of this method is that splicing generally selects syllables by matching prosodic levels, that is, according to a syllable's position in the sentence to be synthesized, in the prosodic phrase, and in the word, choosing the corpus samples whose positions match as closely as possible. Although the actual acoustic parameters of a syllable (pitch, duration, loudness) do depend on its position (for example, pitch is generally higher at the head of a prosodic phrase and lower at its end; the syllable at the end of a prosodic phrase is longer; and in a three-syllable word the middle syllable is shortest), this relation is not absolute, and more importantly nothing guarantees that the many natural sentences recorded in the corpus share a consistent pitch or duration. Discontinuities therefore appear at the splice points. For example, if two consecutive syllables of a sentence are taken from two different recorded sentences, the pair may violate the variation pattern of real speech even though each syllable was selected by position, because the actual acoustic parameters were never considered. The result is an audible pitch jump, or mismatched durations, which reduces the naturalness of the speech.
The object of the invention is to address these defects and shortcomings of waveform-concatenation synthesis based on a large recording corpus by a dynamic Chinese speech synthesis method that selects units using a prosody model and acoustic parameters, so that the spliced syllable samples satisfy a prosody model in their actual acoustic parameters. The variation of the acoustic parameters then becomes controllable, which eliminates the loss of naturalness caused by mismatched syllable selection during splicing.
Summary of the invention
In view of this, the invention first performs acoustic parameter planning based on a prosody model, obtaining the target value of the desired acoustic parameters for each syllable. Maximum matching is then carried out, selecting the candidates with the smallest deviation as the samples actually used. After maximum matching, the unmatched segments are processed by single-syllable matching: for every path running through the candidate samples of all syllables a total cost is computed, determined jointly by the deviation of each candidate's acoustic parameters from their planned values and by the deviation between the candidates of adjacent syllables on the path. The minimum-cost path is found by dynamic programming. Once samples have been chosen for all syllables, their data are fetched from the speech database and concatenated at the waveform level to obtain the final synthesis result.
The speech synthesis method based on a prosody model and parameter-driven unit selection provided by the invention comprises the following steps:
(a) building a prosody model database, a large-scale recording corpus, and an index database;
(b) preprocessing the text to be synthesized, including sentence segmentation, text normalization, word segmentation, part-of-speech tagging, syntactic analysis, prosodic hierarchy analysis, and pinyin conversion;
(c) according to the attributes of each syllable, namely its position in the word, in the prosodic phrase, and in the sentence together with its phone-adjacency and tone-adjacency attributes, looking up in the prosody model database the acoustic parameter values the syllable should have, thereby completing the planning of each syllable's acoustic parameters; the acoustic parameters comprise pitch, duration, and loudness;
(d) for each syllable, obtaining from the index database all candidate samples of that syllable in the large-scale recording corpus;
(e) computing, for each matching string, the cost C_j between its acoustic and location parameters and the planned acoustic and location parameters, and finding the matching string whose minimum cost C_min is below the threshold C_th, thereby obtaining the maximum matching length among all candidate samples of the current syllable;
(f) performing single-syllable matching on the unmatched segments of the text:
computing the node cost between the acoustic and location parameters of every candidate sample of each syllable and the planned acoustic and location parameters;
computing the concatenation cost between all candidate samples of every pair of adjacent syllables;
using a dynamic programming algorithm to find, among all paths, the path of minimum overall cost, the overall cost of a path being the sum of all node costs on the path plus the concatenation costs between adjacent nodes;
setting the chosen sample of each syllable to the candidate node the optimal path passes through;
(g) fetching the waveform data of the selected samples from the large-scale recording corpus and splicing them.
The method provided by the invention solves the splicing discontinuity of existing waveform-concatenation speech synthesis based on large recording corpora and improves the naturalness of the synthesized speech.
Description of drawings
Fig. 1 is the flow of speech synthesis;
Fig. 2 is the flow of the maximum matching step;
Fig. 3 is an example of the single-syllable selection step.
Embodiment
Before synthesis proper, the following resource bases are built:
Large-scale recording corpus: the speech waveform data, together with each syllable's start position in the waveform and its acoustic parameter data (pitch, duration, loudness).
Index database: for every syllable, the serial numbers of all its samples in the large-scale recording corpus; looking up the corpus by serial number quickly retrieves the data related to that syllable.
Prosody model database: the prosody model obtained by statistical training, that is, what pitch, duration, and loudness each syllable of a sentence should have. These acoustic parameter values are closely related to factors such as sentence pattern, part-of-speech sequence, and the lengths of the sentence and of the prosodic phrases.
The flow of speech synthesis is shown in Fig. 1.
The detailed description is as follows:
1. Preprocessing
The text of the speech to be synthesized first goes through a preprocessing step comprising sentence segmentation, text normalization, word segmentation, part-of-speech tagging, syntactic analysis, prosodic hierarchy analysis, pinyin conversion, and so on. The final results are:
the pinyin of each syllable of the sentence;
the position of each syllable in its word, in its prosodic phrase, and in the sentence;
the part of speech of each word (for example noun, verb, adjective) and its syntactic role (subject, predicate, object, etc.).
2. Parameter planning
Using a number of attributes, the acoustic parameters each syllable should have (that is, its pitch, duration, and loudness) are looked up in the prosody model database, completing the planning of each syllable's acoustic parameters. The attributes include: whether the syllable is word-initial, word-medial, word-final, or a monosyllabic word; whether its word is sentence-initial, sentence-medial, or sentence-final; the tones before and after it (the tone-adjacency attribute); the final before it and the initial after it (the phone-adjacency attribute); its pre- and post-cliticization attributes; the position of its prosodic phrase and the intonation pattern of its sentence; the part of speech and the syntactic role of its word; and so on.
Suppose a sentence has K syllables in all (1 to K). The planned acoustic parameters of each syllable are X_k = {H_k, L_k, T_k, A_k} (k = 1, ..., K), respectively the high pitch point, low pitch point, duration, and loudness planned for the k-th syllable. Its location parameters are Y_k = {S_k, P_k, W_k}, the syllable's position in the sentence, in the prosodic phrase, and in the word respectively, where sentence-initial, phrase-initial, or word-initial is coded 0, sentence-medial, phrase-medial, or word-medial is coded 1, and sentence-final, phrase-final, or word-final is coded 2.
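The planned parameters above can be held in small records; a minimal sketch (the class and field names are illustrative, not from the patent):

```python
from dataclasses import dataclass

# Position codes from the text: 0 = initial, 1 = medial, 2 = final
INITIAL, MEDIAL, FINAL = 0, 1, 2

@dataclass
class AcousticTarget:      # X_k = {H_k, L_k, T_k, A_k}
    high_pitch: float      # H_k: planned high pitch point
    low_pitch: float       # L_k: planned low pitch point
    duration: float        # T_k: planned duration
    loudness: float        # A_k: planned loudness

@dataclass
class Location:            # Y_k = {S_k, P_k, W_k}
    in_sentence: int       # S_k: position in the sentence
    in_phrase: int         # P_k: position in the prosodic phrase
    in_word: int           # W_k: position in the word

# e.g. a sentence-medial syllable that starts both its phrase and its word
x1 = AcousticTarget(high_pitch=220.0, low_pitch=180.0, duration=0.21, loudness=0.8)
y1 = Location(in_sentence=MEDIAL, in_phrase=INITIAL, in_word=INITIAL)
```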
3. Obtaining all candidate samples
For each syllable, all samples of that syllable in the large-scale recording corpus, called candidate samples, are obtained from the index database.
The index database lists all samples of all syllables, ordered by syllable. For each syllable it records the total number of samples and the serial number of each sample in the large-scale recording corpus; a sample is identified by its serial number in the corpus. Given a syllable, all of its samples in the corpus can therefore be retrieved quickly.
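Such an index can be sketched as a mapping from syllable (pinyin) to the serial numbers of its samples; the syllables and numbers below are invented for illustration:

```python
# Hypothetical index database: pinyin syllable -> serial numbers of all
# samples of that syllable in the large-scale recording corpus.
index_db = {
    "zhong1": [17, 512, 2048],
    "guo2":   [33, 907],
}

def candidates(syllable: str) -> list:
    """Return all candidate sample serial numbers for a syllable."""
    return index_db.get(syllable, [])
```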
4. Maximum matching
As shown in Fig. 2, processing starts from the first syllable: set n = 1; (S4.1)
For every candidate sample of the current (n-th) syllable, check whether the syllables following the candidate in its original sentence match the syllables that follow it in the sentence to be synthesized, and record the matching length; if no following syllable can be matched, the matching length is 1, meaning only the syllable itself matches; (S4.2)
Compute the maximum matching length over all candidate samples of the current syllable; let L be this maximum; (S4.3)
If L is 1, there is no multi-syllable match; go to S4.10; (S4.4)
For the current syllable, take every candidate sample with matching length ≥ L together with its L-1 following syllables as a matching string; one or more matching strings may be found here. Suppose J matching strings are found, and that the acoustic and location parameters of the samples of string j in their original sentence are X′_{j,k} = {H′_{j,k}, L′_{j,k}, T′_{j,k}, A′_{j,k}} and Y′_{j,k} = {S′_{j,k}, P′_{j,k}, W′_{j,k}} (j = 1, ..., J; k = 0, ..., L-1); (S4.5)
Compute the cost C_j between the acoustic and location parameters of each matching string and the planned acoustic and location parameters:
C_j = (1/L) · Σ_{k=0}^{L-1} f(X_{n+k}, X′_{j,k}, Y_{n+k}, Y′_{j,k})   (S4.6)
where:
f(X_i, X′_j, Y_i, Y′_j) = g(X_i, X′_j) + h(Y_i, Y′_j)
g(X_i, X′_j) = ω_H(H_i - H′_j)² + ω_L(L_i - L′_j)² + ω_T(T_i - T′_j)² + ω_A(A_i - A′_j)²
h(Y_i, Y′_j) = ω_S|S_i - S′_j| + ω_P|P_i - P′_j| + ω_W|W_i - W′_j|
and each ω is the weight of the corresponding parameter.
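Under these definitions the cost functions can be sketched directly; parameters are passed as (H, L, T, A) and (S, P, W) tuples, and the weight values here are arbitrary placeholders since the patent does not give them:

```python
# Weights for the acoustic (g) and location (h) cost terms; the values
# below are placeholders, not taken from the patent.
W_H = W_L = W_T = W_A = 1.0
W_S = W_P = W_W = 1.0

def g(x, xp):
    """Acoustic cost: weighted squared differences of H, L, T, A."""
    return (W_H * (x[0] - xp[0]) ** 2 + W_L * (x[1] - xp[1]) ** 2
            + W_T * (x[2] - xp[2]) ** 2 + W_A * (x[3] - xp[3]) ** 2)

def h(y, yp):
    """Location cost: weighted absolute differences of S, P, W."""
    return (W_S * abs(y[0] - yp[0]) + W_P * abs(y[1] - yp[1])
            + W_W * abs(y[2] - yp[2]))

def f(x, xp, y, yp):
    """Per-syllable cost f = g + h."""
    return g(x, xp) + h(y, yp)

def match_string_cost(planned, candidate):
    """C_j: mean of f over the L syllables of one matching string.
    planned and candidate are equal-length lists of (X, Y) pairs."""
    L = len(planned)
    return sum(f(x, xp, y, yp)
               for (x, y), (xp, yp) in zip(planned, candidate)) / L
```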
Find the matching string of minimum cost, and let C_min be its cost:
C_min = min_j(C_j), j = 1, ..., J   (S4.7)
If the minimum cost C_min is greater than the threshold C_th, the acoustic parameters of even the best matching string differ too much from the planned ones, and no matching string of this length can give a result close to the ideal values (S4.8a); in that case shorten the matching length, L = L - 1, and go back to S4.4; (S4.8b)
Otherwise mark the samples of the minimum-cost matching string as the chosen samples of the syllables to be synthesized, marking L consecutive syllables in all; (S4.9)
Set n = n + L: coming from step S4.4 no maximum match was made and L = 1, so this simply advances to the next syllable; coming from step S4.9 it skips the L syllables covered by the maximum match; (S4.10)
If the current syllable is not the last one, jump back to S4.2; otherwise exit the maximum matching step. (S4.11)
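The control flow of steps S4.1 through S4.11 can be condensed into a short loop; `match_len` and `cost` here are hypothetical callbacks standing in for the corpus lookup and the C_j computation described above, so this is a sketch of the loop structure only:

```python
def max_match(K, match_len, cost, c_th):
    """Greedy maximum matching over K syllables (S4.1-S4.11, simplified).

    match_len(n) -> longest match length L available at syllable n (>= 1)
    cost(n, L)   -> minimum cost C_min over matching strings of length L
                    starting at syllable n
    Returns matched spans as (start, length) pairs; syllables not covered
    are left to the single-syllable selection step.
    """
    spans, n = [], 0
    while n < K:
        L = match_len(n)
        while L > 1 and cost(n, L) > c_th:   # S4.8: shorten until acceptable
            L -= 1
        if L > 1:
            spans.append((n, L))             # S4.9: mark L syllables chosen
        n += L                               # S4.10: advance (L == 1 if none)
    return spans
```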
5. Single-syllable selection
After the maximum matching step, some syllables of the sentence have designated chosen samples while others do not. For example, in the sentence "Jietong Huasheng Speech Technology Co., Ltd. has newly released a speech synthesis product", suppose maximum matching has assigned chosen samples to "Technology Co., Ltd.", "speech", and "product"; the remaining syllables form three parts, "Jietong Huasheng speech", "newly released", and "synthesis", none of whose syllables has a chosen sample yet. Single-syllable selection performs sample selection for the syllables of these parts, each of which is called a "treatment region".
Each treatment region is processed as follows.
Suppose the region consists of N syllables with sequence numbers C to C+N-1. Each syllable has several candidate samples; let the n-th syllable have M_n candidates (n = C, ..., C+N-1), and denote candidate j of syllable i by W_{i,j} (i = C, ..., C+N-1; j = 1, ..., M_i). The candidates thus form a lattice, as shown in Fig. 3, in which each candidate sample is a node, and any path running through the lattice is a possible selection result.
First, the cost between the acoustic and location parameters of every candidate sample of each syllable and the planned parameters, called the node cost, is computed. Suppose the acoustic and location parameters of the j-th candidate of the n-th syllable are X′_{n,j} = {H′_{n,j}, L′_{n,j}, T′_{n,j}, A′_{n,j}} and Y′_{n,j} = {S′_{n,j}, P′_{n,j}, W′_{n,j}}. Its node cost is then D_{n,j} = f(X_n, X′_{n,j}, Y_n, Y′_{n,j}), with f defined as before.
Next, the concatenation cost between all candidate samples of each pair of adjacent syllables is computed; for example, the concatenation cost between the j-th candidate of the n-th syllable and the k-th candidate of the (n+1)-th syllable is E_{n,j,k} = g(X′_{n,j}, X′_{n+1,k}), with g defined as before.
The overall cost of a path is defined as the sum of all node costs on the path plus the concatenation costs between adjacent nodes.
Thus, for a path through the node lattice of this treatment region from a node of the first syllable to a node of the last, let p(n) denote the node the path passes through for the n-th syllable. The overall cost of the path is:
C_path = Σ_{n=C}^{C+N-1} D_{n,p(n)} + Σ_{n=C}^{C+N-2} E_{n,p(n),p(n+1)}
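The overall-cost formula can be written down directly; D and E below are plain nested lists standing in for the node and concatenation costs, and the exhaustive search is included only as a reference against which a dynamic programming implementation can be checked:

```python
from itertools import product

def path_cost(path, D, E):
    """C_path: sum of node costs D[n][p(n)] plus concatenation costs
    E[n][p(n)][p(n+1)] along a path (a tuple of node indices)."""
    node = sum(D[n][j] for n, j in enumerate(path))
    conn = sum(E[n][path[n]][path[n + 1]] for n in range(len(path) - 1))
    return node + conn

def best_path_bruteforce(D, E):
    """Exhaustive search over all paths; exponential, for reference only."""
    all_paths = product(*(range(len(row)) for row in D))
    return min(all_paths, key=lambda p: path_cost(p, D, E))
```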
A dynamic programming algorithm is used to find the optimal path, that is, the path of minimum overall cost, among all possible paths; in Fig. 3, for example, the path drawn with the thick line is the one chosen. The concrete steps of the dynamic programming are as follows:
First compute the locally optimal paths from the 1st syllable to the 2nd syllable of the region (the syllable with sequence number C+1): for each node W_{i,j} of the 2nd syllable (i = C+1, j = 1, ..., M_{C+1}), compute the cost from every node of the previous syllable to this node, consisting of the node cost of that previous-syllable node plus the concatenation cost from it to this node of the 2nd syllable. As shown in Fig. 3, for the 2nd node of the 2nd syllable the cost from each node of the 1st syllable is computed as follows:
Cost(W_{C,1}, W_{C+1,2}) = 21 + 6 = 27
Cost(W_{C,2}, W_{C+1,2}) = 32 + 10 = 42
Cost(W_{C,3}, W_{C+1,2}) = 24 + 12 = 36
Cost(W_{C,4}, W_{C+1,2}) = 18 + 8 = 26
The path of minimum cost is the locally optimal path, namely the path from W_{C,4} to W_{C+1,2}, whose local optimal path cost is 26. The 1st and 3rd nodes of the 2nd syllable likewise each have a locally optimal path from some node of the first syllable; suppose their local path costs are 16 and 20 respectively, as shown in Fig. 3.
Next compute the locally optimal paths of the 3rd syllable (sequence number C+2): for each node W_{i,j} of this syllable (i = C+2, j = 1, ..., M_{C+2}), compute the best local path from the nodes of the first syllable to this node. Since the best local paths from the first syllable to each node of the 2nd syllable have already been computed, only the cost of extending from the 2nd syllable to the 3rd needs to be added to the cost of those best local paths. For the 2nd node of the 3rd syllable, for example, the costs are:
Cost(W_{C+1,1}, W_{C+2,2}) = 16 + 18 + 27 = 61
Cost(W_{C+1,2}, W_{C+2,2}) = 26 + 22 + 10 = 58
Cost(W_{C+1,3}, W_{C+2,2}) = 20 + 34 + 11 = 65
The locally optimal path from the 2nd syllable to the 2nd node of the 3rd syllable is therefore the one from W_{C+1,2} to W_{C+2,2}; backtracking from W_{C+1,2}, the optimal path from the first syllable to the 2nd node of the 3rd syllable is the path from W_{C,4} through W_{C+1,2} to W_{C+2,2}.
Continue in this way until the optimal path to every node of the last syllable has been computed; then compare the local optimal-path Cost values of all these nodes, take the node of minimum Cost as the final node of the overall optimal path, and recover the whole optimal path by backtracking through the locally optimal paths.
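The recursion described above is the standard Viterbi search over the candidate lattice. A minimal sketch using the same D/E conventions as the brute-force formula (nested lists of node and concatenation costs) follows; one bookkeeping difference from the worked example is that a node's cost is added on arrival at it rather than on departure, which yields the same path totals:

```python
def best_path_dp(D, E):
    """Minimum-overall-cost path through the candidate lattice.

    D[n][j]    -- node cost of candidate j of syllable n
    E[n][j][k] -- concatenation cost between candidate j of syllable n
                  and candidate k of syllable n+1
    Returns (total_cost, path), path being the chosen candidate indices.
    """
    N = len(D)
    best = list(D[0])              # best[j]: cost of best partial path to j
    back = [[None] * len(D[0])]    # backpointers for path recovery
    for n in range(1, N):
        new_best, bp = [], []
        for k in range(len(D[n])):
            # extend the cheapest partial path ending at syllable n-1
            j = min(range(len(D[n - 1])),
                    key=lambda j: best[j] + E[n - 1][j][k])
            new_best.append(best[j] + E[n - 1][j][k] + D[n][k])
            bp.append(j)
        best, back = new_best, back + [bp]
    # pick the cheapest final node and backtrack
    k = min(range(len(D[-1])), key=lambda k: best[k])
    path = [k]
    for n in range(N - 1, 0, -1):
        k = back[n][k]
        path.append(k)
    return min(best), path[::-1]
```

Unlike exhaustive search, the work grows linearly with the number of syllables and quadratically with the candidates per syllable, which is what makes large treatment regions tractable.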
The chosen sample of each syllable of this treatment region is set to the candidate node the optimal path passes through.
The next treatment region is then processed, until no treatment region remains.
6. Waveform concatenation
Through the steps above, a sample has been selected for every syllable. Selecting a sample in fact determines its serial number in the large-scale recording corpus; looking that serial number up in the corpus yields the start position of the sample's waveform data and its length, obtained from the duration among its acoustic parameters. With these values the corresponding waveform data can be read out of the corpus. Connecting the waveform data of all chosen samples completes the waveform concatenation and yields the final speech synthesis result.
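Reading and joining the selected samples can be sketched as follows, assuming the corpus is one contiguous PCM byte buffer plus a table mapping each serial number to a (start, length) pair; all names and numbers are illustrative:

```python
# Hypothetical corpus: one long byte buffer of PCM data plus a table
# mapping each sample's serial number to (start, length) in bytes.
corpus_pcm = bytes(20_000)
sample_table = {17: (0, 3200), 33: (3200, 2800), 512: (6000, 3500)}

def splice(serial_numbers):
    """Concatenate the waveform data of the chosen samples in order."""
    return b"".join(corpus_pcm[s:s + n]
                    for s, n in (sample_table[sn] for sn in serial_numbers))
```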

Claims (1)

1. A speech synthesis method based on a prosody model and parameter-driven unit selection, comprising the steps of:
(a) building a prosody model database, a large-scale recording corpus, and an index database;
(b) preprocessing the text to be synthesized, including sentence segmentation, text normalization, word segmentation, part-of-speech tagging, syntactic analysis, prosodic hierarchy analysis, and pinyin conversion;
(c) according to the attributes of each syllable, namely its position in the word, in the prosodic phrase, and in the sentence together with its phone-adjacency and tone-adjacency attributes, looking up in the prosody model database the acoustic parameter values the syllable should have, thereby completing the planning of each syllable's acoustic parameters, the acoustic parameters comprising pitch, duration, and loudness;
(d) for each syllable, obtaining from the index database all candidate samples of that syllable in the large-scale recording corpus;
(e) computing, for each matching string that the candidate samples of adjacent syllables may form, the cost C_j between its acoustic and location parameters and the planned acoustic and location parameters, finding the matching string whose minimum cost C_min is below the threshold C_th, and setting the chosen samples of those adjacent syllables to the candidate samples of that matching string;
(f) performing single-syllable matching on the unmatched segments of the text:
computing the node cost between the acoustic and location parameters of every candidate sample of each syllable and the planned acoustic and location parameters;
computing the concatenation cost between all candidate samples of every pair of adjacent syllables;
using a dynamic programming algorithm to find, among all paths, the path of minimum overall cost, the overall cost being the sum of all node costs on the path plus the concatenation costs between adjacent nodes;
setting the chosen sample of each syllable to the candidate node the optimal path passes through;
(g) fetching the waveform data of the selected samples from the large-scale recording corpus and splicing them.
CN2004100969685A 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice Active CN1787072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2004100969685A CN1787072B (en) 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2004100969685A CN1787072B (en) 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

Publications (2)

Publication Number Publication Date
CN1787072A true CN1787072A (en) 2006-06-14
CN1787072B CN1787072B (en) 2010-06-16

Family

ID=36784491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2004100969685A Active CN1787072B (en) 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

Country Status (1)

Country Link
CN (1) CN1787072B (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69028072T2 (en) * 1989-11-06 1997-01-09 Canon Kk Method and device for speech synthesis
JPH1039895A (en) * 1996-07-25 1998-02-13 Matsushita Electric Ind Co Ltd Speech synthesising method and apparatus therefor
JP2003108178A (en) * 2001-09-27 2003-04-11 Nec Corp Voice synthesizing device and element piece generating device for voice synthesis
TW556150B (en) * 2002-04-10 2003-10-01 Ind Tech Res Inst Method of speech segment selection for concatenative synthesis based on prosody-aligned distortion distance measure
CN1238805C (en) * 2002-07-25 2006-01-25 摩托罗拉公司 Method and apparatus for compressing voice library

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1945692B (en) * 2006-10-16 2010-05-12 安徽中科大讯飞信息科技有限公司 Intelligent method for improving prompting voice matching effect in voice synthetic system
CN101000766B (en) * 2007-01-09 2011-02-02 黑龙江大学 Chinese intonation base frequency contour generating method based on intonation model
CN104464717A (en) * 2013-09-25 2015-03-25 三菱电机株式会社 Voice Synthesizer
CN104464717B (en) * 2013-09-25 2017-11-03 三菱电机株式会社 Speech synthesizing device
CN104575487A (en) * 2014-12-11 2015-04-29 百度在线网络技术(北京)有限公司 Voice signal processing method and device
CN104916284A (en) * 2015-06-10 2015-09-16 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN104916284B (en) * 2015-06-10 2017-02-22 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN105489216A (en) * 2016-01-19 2016-04-13 百度在线网络技术(北京)有限公司 Voice synthesis system optimization method and device
CN105489216B (en) * 2016-01-19 2020-03-03 百度在线网络技术(北京)有限公司 Method and device for optimizing speech synthesis system
CN106356052B (en) * 2016-10-17 2019-03-15 腾讯科技(深圳)有限公司 Phoneme synthesizing method and device
CN106356052A (en) * 2016-10-17 2017-01-25 腾讯科技(深圳)有限公司 Voice synthesis method and device
US10832652B2 (en) 2016-10-17 2020-11-10 Tencent Technology (Shenzhen) Company Limited Model generating method, and speech synthesis method and apparatus
CN108573692A (en) * 2017-03-14 2018-09-25 谷歌有限责任公司 Phonetic synthesis Unit selection
CN108573692B (en) * 2017-03-14 2021-09-14 谷歌有限责任公司 Speech synthesis unit selection
WO2020088006A1 (en) * 2018-10-29 2020-05-07 阿里巴巴集团控股有限公司 Speech synthesis method, device, and apparatus
TWI731382B (en) * 2018-10-29 2021-06-21 開曼群島商創新先進技術有限公司 Method, device and equipment for speech synthesis
CN110047462A (en) * 2019-01-31 2019-07-23 北京捷通华声科技股份有限公司 A kind of phoneme synthesizing method, device and electronic equipment
CN110047462B (en) * 2019-01-31 2021-08-13 北京捷通华声科技股份有限公司 Voice synthesis method and device and electronic equipment
CN110797006A (en) * 2020-01-06 2020-02-14 北京海天瑞声科技股份有限公司 End-to-end speech synthesis method, device and storage medium
CN110797006B (en) * 2020-01-06 2020-05-19 北京海天瑞声科技股份有限公司 End-to-end speech synthesis method, device and storage medium

Also Published As

Publication number Publication date
CN1787072B (en) 2010-06-16

Similar Documents

Publication Publication Date Title
CN107464559B (en) Combined prediction model construction method and system based on Chinese prosody structure and accents
JP4328698B2 (en) Fragment set creation method and apparatus
US7979280B2 (en) Text to speech synthesis
CN1169115C (en) Prosodic databases holding fundamental frequency templates for use in speech synthesis
Gonzalvo et al. Recent advances in Google real-time HMM-driven unit selection synthesizer
JP3910628B2 (en) Speech synthesis apparatus, speech synthesis method and program
US20050182629A1 (en) Corpus-based speech synthesis based on segment recombination
CN1889170A (en) Method and system for generating synthesized speech base on recorded speech template
US10235991B2 (en) Hybrid phoneme, diphone, morpheme, and word-level deep neural networks
US8626510B2 (en) Speech synthesizing device, computer program product, and method
CN1755796A (en) Distance defining method and system based on statistic technology in text-to speech conversion
CN101075432A (en) Speech synthesis apparatus and method
JP4406440B2 (en) Speech synthesis apparatus, speech synthesis method and program
CN1835075A (en) Speech synthetizing method combined natural sample selection and acaustic parameter to build mould
CN101064103A (en) Chinese voice synthetic method and system based on syllable rhythm restricting relationship
CN1787072A (en) Method for synthesizing pronunciation based on rhythm model and parameter selecting voice
US20090216537A1 (en) Speech synthesis apparatus and method thereof
CN1811912A (en) Minor sound base phonetic synthesis method
CN1956057A (en) Voice time premeauring device and method based on decision tree
CN1032391C (en) Chinese character-phonetics transfer method and system edited based on waveform
CN1661673A (en) Speech synthesizer,method and recording medium for speech recording synthetic program
JP4829605B2 (en) Speech synthesis apparatus and speech synthesis program
KR101201913B1 (en) Voice Synthesizing Method and System Based on User Directed Candidate-Unit Selection
JP5268731B2 (en) Speech synthesis apparatus, method and program
JP2008015424A (en) Pattern specification type speech synthesis method, pattern specification type speech synthesis apparatus, its program, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: 100193, No. two, building 10, Zhongguancun Software Park, 8 northeast Wang Xi Road, Beijing, Haidian District, 206-1

Patentee after: Beijing InfoQuick SinoVoice Speech Technology Corp.

Address before: E101 development building, 12, information road, Haidian District, Beijing, Zhongguancun

Patentee before: Jietong Huasheng Speech Technology Co., Ltd.