EP0144731A2 - Speech synthesizer - Google Patents

Speech synthesizer

Info

Publication number
EP0144731A2
Authority
EP
European Patent Office
Prior art keywords
speech
waveform
articulation
unit
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP84113186A
Other languages
German (de)
French (fr)
Other versions
EP0144731A3 (en)
EP0144731B1 (en)
Inventor
Katsunobu Fushikida
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Publication of EP0144731A2
Publication of EP0144731A3
Application granted
Publication of EP0144731B1
Expired

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00


Abstract

The invention relates to a speech synthesizer comprising a converting means (20, 30) for converting the input sequence of characters to a sequence of articulation symbols corresponding to a unit speech waveform which is obtained by dividing a diphone, a memory (80) for storing said unit speech waveform corresponding to the predetermined articulation symbols, and a synthesizing means (60, 70) for reading said unit speech waveforms corresponding to said articulation symbols of the converted sequence of articulation symbols from said memory (80) and synthesizing them. The speech synthesizer requires a comparatively small memory capacity and provides synthesized speech of high quality.

Description

    BACKGROUND OF THE INVENTION:
  • This invention relates to a speech synthesizer.
  • In conventional speech synthesis, appropriate syllable waveforms represented by combinations of vowel - consonant - vowel (VCV) are prepared in advance and connected together. However, since the number of phonemes represented by VCV is very large, an enormous memory capacity is required to store them. On the other hand, there has been proposed a method in which the waveforms corresponding to combinations of consonant - vowel (CV) or vowel - consonant (VC), namely demisyllables or diphones, which have a time length of about half that of a single syllable, are prepared in advance, and the waveforms corresponding to the CV or VC required for the synthesized speech are selected and connected together (compiled and synthesized). According to this method, a greater reduction in memory capacity is possible than in the case of preparing VCV waveforms, but a relatively large memory capacity is still required because of the large quantity of speech waveform information corresponding to CV and VC.
  • SUMMARY OF THE INVENTION:
  • Accordingly, it is an object of the invention to provide a speech synthesizer which requires a comparatively small memory capacity in respect of speech data such as speech waveforms to be prepared in advance.
  • It is another object of the invention to provide a speech synthesizer which has the above advantage and by which synthesized speech of high quality can be obtained.
  • According to the present invention, there is provided a speech synthesizer comprising a converting means for converting the input sequence of characters to a sequence of articulation symbols corresponding to a unit speech waveform which is obtained by dividing a diphone, a memory for storing the unit speech waveform corresponding to the predetermined articulation symbols and a synthesizing means for reading the unit speech waveforms corresponding to the articulation symbols of the converted sequence of articulation symbols from the memory and synthesizing them.
  • These and other objects and features of the present invention will become clear from the following description of a preferred embodiment of the present invention with reference to the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS:
    • Fig. 1 is a block diagram showing the structure of an embodiment of a speech synthesizer according to the invention;
    • Fig. 2 is a table of the information stored in the memory 32 of the phoneme symbol/articulation symbol converting part 30 of the synthesizer shown in Fig. 1;
    • Fig. 3 illustrates the concept of the articulatory organs of the human body for explaining the principle of the invention;
    • Figs. 4A and 4B show examples of articulatory segments for explaining the principle of the invention;
    • Fig. 5 shows waveforms interpolated by a synchronous pitch method used in the present invention; and
    • Fig. 6 is a waveform of synthesized speech formed by compiling and synthesizing waveforms of articulation element pieces.
    DESCRIPTION OF THE PREFERRED EMBODIMENT:
  • Referring to Fig. 1, a speech to be synthesized is first indicated by a keyboard 10. From the keyboard 10, a character sequence signal, a stress strength signal (in this embodiment, three-levelled) and a boundary signal between speeches are generated. Hereinafter, the structure and operation of the speech synthesizer shown in Fig. 1 will be described on the assumption that the speech to be synthesized is "kite".
  • Now, the alphabetical character sequence signal for "kite" is generated by pushing the keys "K", "I", "T" and "E". The boundary signal B, indicating a boundary such as the beginning, ending or a pause of the word "kite", and the stress strength signal ST are also supplied, together with the character sequence signal, to a converting circuit 20. The stress strength is determined based on the pitch and strength of each syllable; for example, a high stress strength indicates a high pitch frequency. The converting circuit 20 has a processing part 21 and a memory 22. In the memory 22, the phoneme symbols corresponding to the speech, prepared in advance, are stored. For example, the phoneme symbol /kait/ is stored in correspondence with the word "kite". The processing part 21 supplies address information to the memory 22 in response to the input character sequence signal. Then the phoneme symbol signal /kait/ is read from the memory 22 and supplied to a phoneme symbol/articulation symbol converting circuit 30. The converting circuit 30, like the converting circuit 20, has a processing part 31 and a memory 32. In the memory 32 is stored an articulation symbol (determined by the phonemes located before and after it), which is determined in advance in correspondence with the phoneme symbol by the method peculiar to the present invention described in the following.
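  • The lookup performed by the converting circuit 20 can be pictured as a simple table access. The following is a minimal Python sketch (not part of the patent disclosure): the memory 22 is modeled as a dictionary, only the "kite" to /kait/ entry comes from the embodiment, and the names are illustrative.

```python
# Memory 22 modeled as a dictionary: word -> stored phoneme symbol.
# Only the "kite" entry is given in the text; the structure is illustrative.
PHONEME_MEMORY_22 = {
    "KITE": "/kait/",
}

def characters_to_phonemes(word: str) -> str:
    """Processing part 21: address the memory 22 with the character sequence."""
    return PHONEME_MEMORY_22[word.upper()]

print(characters_to_phonemes("kite"))  # -> /kait/
```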
  • The articulatory organs of the human being include the vocal cords, tongue, lips, velum palatinum, etc., as shown in Fig. 3, and various speech sounds are generated by controlling these articulatory organs in accordance with nerve pulse signals. Therefore, if two articulations of the articulatory organs are similar, two similar speech waveforms are generated. Further, it is apparent that if the articulation parameter values representing the movements of these articulatory organs are close to each other, the generated speech waveforms are analogous. As described above, in the conventional synthesizing method based on the CV, VC waveform connecting type, many speech waveforms corresponding to CV and VC are prepared, but from the viewpoint of the movement of an articulation parameter, considerably redundant waveforms are included therein. For example, in the CV, VC waveform connecting type method, the speech waveform corresponding to the phoneme /ka/ and that corresponding to the phoneme /ga/ are prepared separately. However, the movements of the articulatory organs for /ka/ and for /ga/ are very similar. The relationship between the tongue, palate, etc. is almost the same, and the main difference is whether the vocal cords are vibrating or not (voiced or unvoiced) in the consonant parts. Therefore, in the voiced section after the unvoiced section of the consonant part /k/ in /ka/ (the section shifting to the normal part of the vowel /a/, which corresponds to C in Fig. 4A), the articulation parameter is almost the same as that of /ga/ (C in Fig. 4B), which can take the place of the partial waveform of /ka/ in that section with a fairly good approximation. It is clear that in the pairs /kV/ - /gV/, /tV/ - /dV/ and /pV/ - /bV/ (V represents a vowel) also, the waveforms in the part shifting to the vowels can be shared. In Figs. 4A and 4B, part A is the silent part at the beginning of /ka/ or /ga/ (represented as *), part B is the waveform of "k" in /ka/ or "g" in /ga/, B' is the waveform of the part affected by the phoneme following "g" in /ga/, and C and D are, as described above, the speech waveforms of the vowel "a" following the consonants of /ka/ and /ga/.
  • Here, a time section which is determined in consideration of the manner of articulation, which is shorter than a CV or VC waveform, and which can be substituted by a speech waveform based on a different phoneme series, as shown in Figs. 4A and 4B, is called an articulation segment, and a speech waveform in the articulation segment is called an articulation element piece waveform. That is, the syllables /ka/ and /ga/ are divided into the time sections B and C so that the transient parts of those syllables can be used for the synthesis of other speech.
  • As described above, articulation segments whose manner of articulation is the same are represented by the same articulation symbol, and the articulation element piece waveform corresponding to each articulation symbol is stored in the articulation waveform memory 80 in advance. In the memory 32, the articulation symbols corresponding to each sequence of phoneme symbols are stored in advance. Fig. 2 shows the classified articulation symbols, in which * represents the silent part which is placed at the beginning of speech or immediately before an explosive; "p", "t" and "k" represent explosive parts; and (b)a, (d)a and (g)a represent transient parts of the vowel "a" which follow the consonants "b", "d" and "g". On the other hand, i(b), i(d) and i(g) represent the transient parts of the vowel "i" which precede the consonants "b", "d" and "g", and ai, au and ao represent the transient parts where the vowel "a" is followed by the vowels "i", "u" and "o".
  • Now returning to Fig. 1, in response to an address corresponding to the phoneme signal /kait/ sent from the processing part 21, a sequence of articulation symbols /* - k - (g)a - ai - i(d) - * - t/ corresponding to the phoneme signal /kait/ is read as an articulation signal from the memory 32 in the phoneme symbol/articulation symbol converting circuit 30. Here, * represents a silent part as described above (#1 in Fig. 2), "k" and "t" the explosive parts of /k/ and /t/ (#2, #6 in Fig. 2), "(g)a" a transient part shifting from the consonant to the vowel of /ga/ (#3), "ai" a transient part of the vowel link /ai/ (#4), and "i(d)" a transient part shifting from the vowel to the consonant of /id/ (#5). In this example, /ka/ in the phoneme symbol /kait/ is substituted by the unvoiced explosive "k" and "(g)a", the transient part shifting from the consonant to the vowel of the phoneme symbol /ga/ resembling /ka/. The phoneme symbol /it/ is substituted by the transient part i(d) shifting from the vowel to the consonant of the phoneme symbol /id/ resembling /it/, and a silent part * is placed immediately before the unvoiced explosive "t".
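  • The converting circuit 30 can be sketched in the same way (again an illustrative sketch, not the patent's implementation). The memory 32 is modeled as a dictionary from phoneme symbols to articulation-symbol sequences; the /kait/ entry is the one read out above, with comments marking the Fig. 2 classes.

```python
# Memory 32 modeled as a dictionary: phoneme symbol -> articulation symbols.
ARTICULATION_MEMORY_32 = {
    "/kait/": [
        "*",     # silent part at the beginning of speech (#1 in Fig. 2)
        "k",     # explosive part of /k/ (#2)
        "(g)a",  # consonant-to-vowel transient borrowed from /ga/ (#3)
        "ai",    # transient part of the vowel link /ai/ (#4)
        "i(d)",  # vowel-to-consonant transient borrowed from /id/ (#5)
        "*",     # silent part immediately before the explosive "t" (#1)
        "t",     # explosive part of /t/ (#6)
    ],
}

def phonemes_to_articulation(phonemes: str) -> list[str]:
    """Processing part 31: address the memory 32 with the phoneme symbol."""
    return ARTICULATION_MEMORY_32[phonemes]

print(phonemes_to_articulation("/kait/"))
```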
  • As described above, synthesizing speech by using, in place of /ka/ and /it/, waveforms taken from /ga/ and /id/, whose phoneme sequences are different from, but whose articulations are similar to, those of /ka/ and /it/, dispenses with the need to store the transient parts of /ka/ and /it/ in advance and enables a reduction in the memory capacity. These articulation element piece waveforms can be easily obtained from, for example, waveforms of uttered speech.
  • The articulation signal thus obtained is supplied to a waveform address generation circuit 50. The waveform address generation circuit 50 reads the articulation element piece waveform corresponding to each articulation symbol contained in the articulation signal from an articulation waveform memory 80; the memory bank is selected from among the memories 80a, 80b and 80c included in the memory 80 by the stress signal ST. In other words, the articulation element piece waveform is generated from the memory 80 on the basis of the address corresponding to each articulation symbol. The stress signal ST from the processing part 21 is detected in a stress strength detection circuit 40, and the articulation element piece waveform whose strength corresponds to the detected stress strength is read from the memory 80. In the articulation waveform memory 80, the articulation element piece waveforms corresponding to the articulation symbols shown in Fig. 2 are stored.
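  • The stress-dependent read-out can be sketched as follows (illustrative only: the random arrays stand in for the stored element piece waveforms, and the bank/symbol layout is an assumption based on the description of the memories 80a to 80c).

```python
import numpy as np

rng = np.random.default_rng(0)
SYMBOLS = ["*", "k", "(g)a", "ai", "i(d)", "t"]

# Memory 80 with banks 80a/80b/80c, one per stress level; each bank maps an
# articulation symbol to its element piece waveform (placeholders here).
WAVEFORM_MEMORY_80 = {
    level: {sym: rng.standard_normal(160) for sym in SYMBOLS}
    for level in range(3)
}

def read_element_pieces(articulation_signal: list[str], stress_level: int):
    """Waveform address generation circuit 50: for each articulation symbol,
    read the element piece waveform from the bank selected by the stress
    signal ST (detected by the stress strength detection circuit 40)."""
    bank = WAVEFORM_MEMORY_80[stress_level]
    return [bank[symbol] for symbol in articulation_signal]

pieces = read_element_pieces(["*", "k", "(g)a", "ai", "i(d)", "*", "t"], 1)
```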
  • An interpolation method selection circuit 60 judges whether the articulation symbols (two continuous waveforms) from the phoneme symbol/articulation symbol converting circuit 30 are voiced or unvoiced. The interpolation circuit 70 is controlled by this judgment result to perform the following interpolation: when the articulation symbol is unvoiced (including silence), the two continuous articulation element piece waveforms read from the memory 80 are directly connected, and when the articulation symbol is voiced, these waveforms are interpolated, for example, pitch-synchronously.
  • Generally, direct connection of the articulation waveforms makes the synthesis unnatural because of discontinuous changes of pitch or spectrum. To eliminate this drawback, in this invention, a spoken word is synthesized by connecting articulation waveforms having several levels of pitch through a pitch-synchronous interpolation process between the waveforms. For example, as shown in Fig. 5, if one pitch period of the waveform (element piece waveform) at the connected ending part of the temporally preceding unit speech waveform is f(n) with time length (pitch period) Nf, the element piece waveform at the connected beginning part of the succeeding unit speech waveform is g(n) with time length (pitch period) Ng, and the element piece waveform in the i-th of the k pitch sections of the interpolation waveform is hi(n), then hi(n) is generated on the basis of the following formulae:
  • Figure imgb0001
    Figure imgb0002
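  • The two formulae appear in the original only as figure images (imgb0001, imgb0002) and are not reproduced above. A plausible reconstruction, consistent with the surrounding text (a linear cross-fade from f(n) to g(n) across the k pitch sections, with the section length Ni interpolated in the same way), is:

```latex
% Hypothetical reconstruction of the unreproduced formula images imgb0001/imgb0002.
h_i(n) = \frac{k-i}{k}\, f(n) + \frac{i}{k}\, g(n),
  \qquad n = 0, 1, \ldots, N_i - 1, \quad i = 1, \ldots, k,
\qquad\text{with}\qquad
N_i = \frac{k-i}{k}\, N_f + \frac{i}{k}\, N_g .
```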
  • Ni, namely the time length of hi(n), is assumed to be the value obtained by interpolating Nf and Ng. In this case, when Nf and Ng are shorter than Ni, the final sample value of the waveform may be repeated, and when Nf and Ng are longer than Ni, the surplus waveform may be discarded.
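  • Under the reconstruction above, the selection circuit 60 and interpolation circuit 70 can be sketched as follows (a sketch under the assumed cross-fade weights, not the patented implementation), including the length adjustment just described.

```python
import numpy as np

def adjust_length(w: np.ndarray, n: int) -> np.ndarray:
    """Length adjustment from the text: repeat the final sample when the
    pitch period is shorter than Ni, discard the surplus when longer."""
    if len(w) >= n:
        return w[:n]
    return np.concatenate([w, np.full(n - len(w), w[-1])])

def interpolate_pitch_sections(f: np.ndarray, g: np.ndarray, k: int) -> np.ndarray:
    """Voiced case: generate k pitch sections h_1(n)..h_k(n) bridging the
    adjoining pitch periods f(n) and g(n) (assumed linear cross-fade)."""
    sections = []
    for i in range(1, k + 1):
        a = i / k                                    # assumed weight i/k
        n_i = round((1 - a) * len(f) + a * len(g))   # interpolated length Ni
        h_i = (1 - a) * adjust_length(f, n_i) + a * adjust_length(g, n_i)
        sections.append(h_i)
    return np.concatenate(sections)

def connect(w1: np.ndarray, n_f: int, w2: np.ndarray, n_g: int,
            voiced: bool, k: int = 4) -> np.ndarray:
    """Unvoiced (including silence) pieces are connected directly; voiced
    pieces are bridged by k interpolated pitch sections, where n_f and n_g
    are the pitch periods Nf and Ng at the joint."""
    if not voiced:
        return np.concatenate([w1, w2])
    bridge = interpolate_pitch_sections(w1[-n_f:], w2[:n_g], k)
    return np.concatenate([w1, bridge, w2])

# Toy usage: two sinusoidal "pieces" with pitch periods of 80 and 96 samples.
w1 = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 400, endpoint=False))
w2 = np.sin(2 * np.pi * 4 * np.linspace(0, 1, 384, endpoint=False))
out = connect(w1, 80, w2, 96, voiced=True)
```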
  • In this way, a continuous articulation element piece waveform (in this example, a digital waveform) corresponding to the input sequence of characters is supplied to a D/A converter 90, where the interpolated synthesized articulation waveform is converted to an analogue waveform and output as synthesized speech. The waveform of a synthesized speech obtained in this way is shown in Fig. 6.
  • As described above, this invention, which uses a unit of speech that is shorter in time than a unit speech waveform such as the CV and VC waveforms of the CV, VC waveform compiling type synthesizing method, not only requires a small memory capacity for waveforms but also reflects exactly the articulation of the articulatory organs, so that a synthesized speech of high quality is obtained.
  • In the embodiment described above, an articulation element piece waveform corresponding to an articulation symbol is compiled and synthesized, but it is clear that a reduction in memory capacity is also possible when this invention is applied to a synthesizing method using what is called a "characteristic parameter", such as a formant parameter.

Claims (12)

1. A speech synthesizer comprising:
a converting means for converting the input sequence of characters to a sequence of articulation symbols corresponding to a unit speech waveform which is obtained by dividing a diphone;
a memory for storing said unit speech waveform corresponding to the predetermined articulation symbols; and
a synthesizing means for reading said unit speech waveforms corresponding to said articulation symbols of the converted sequence of articulation symbols from said memory and synthesizing them.
2. A speech synthesizer according to claim 1, wherein said unit speech waveform of the vowel part which follows a consonant and is influenced by said consonant is stored in said memory.
3. A speech synthesizer according to claim 1 or 2, wherein said unit speech waveform of the vowel part preceding a consonant is stored in said memory.
4. A speech synthesizer according to any of claims 1 to 3, further comprising an input means for outputting said input sequence of characters.
5. A speech synthesizer according to claim 4, wherein said input means is a keyboard.
6. A speech synthesizer according to any of claims 1 to 5, wherein said converting means includes: a first converting circuit for converting said input sequence of characters to a sequence of phonemes; and a second converting circuit for converting the converted sequence of phonemes to said sequence of articulation symbols.
7. A speech synthesizer according to any of claims 4 to 6, wherein said input means also generates a stress signal which represents the stress strength of the unit speech waveform corresponding to said articulation symbols.
8. A speech synthesizer according to claim 7, wherein said unit speech waveform is stored for each unit of said stress strength in said memory, and said synthesizing means reads said unit speech waveform corresponding to said stress signal from said memory and synthesizes them.
9. A speech synthesizer according to any of claims 1 to 8, further comprising: an interpolation method determining means for determining an interpolation method on the basis of the speech part of input characters corresponding to the output of said converting means; and an interpolating means for interpolating the unit speech waveform read from said memory on the basis of the determined interpolation method.
10. A speech synthesizer according to claim 9, wherein said interpolation method determining means directly connects the two read unit speech waveforms when said speech part of the input characters is unvoiced (including silence), and determines a predetermined first interpolation method when said speech part of the input characters is voiced.
11. A speech synthesizer according to claim 9 or 10, wherein said first interpolation method executed by said interpolation means determines the interpolation waveform hi(n) by the following formulae on the basis of one pitch period of the waveform at the connected ending part of a temporally preceding unit speech waveform f(n), its time length Nf, the element piece waveform at the connected beginning part of a succeeding unit speech waveform g(n), and its time length Ng:
Figure imgb0003
Figure imgb0004
and determines the time length of said interpolation waveform Ni by interpolating Nf and Ng.
12. A speech synthesizer according to any of claims 1 to 11, further comprising a D/A converting means which is connected to the output of said synthesizing means.
EP84113186A 1983-11-01 1984-11-02 Speech synthesizer Expired EP0144731B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP205227/83 1983-11-01
JP58205227A JPH0642158B2 (en) 1983-11-01 1983-11-01 Speech synthesizer

Publications (3)

Publication Number Publication Date
EP0144731A2 true EP0144731A2 (en) 1985-06-19
EP0144731A3 EP0144731A3 (en) 1985-07-03
EP0144731B1 EP0144731B1 (en) 1988-09-07

Family

ID=16503507

Family Applications (1)

Application Number Title Priority Date Filing Date
EP84113186A Expired EP0144731B1 (en) 1983-11-01 1984-11-02 Speech synthesizer

Country Status (3)

Country Link
EP (1) EP0144731B1 (en)
JP (1) JPH0642158B2 (en)
DE (1) DE3473956D1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02141054A (en) * 1988-11-21 1990-05-30 Nec Home Electron Ltd Terminal equipment for personal computer communication
JP5782751B2 (en) * 2011-03-07 2015-09-24 ヤマハ株式会社 Speech synthesizer


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5331561A (en) * 1976-09-04 1978-03-24 Mitsukawa Shiyouichi Method of manufacturing S-shaped springs
JPS5868099A (en) * 1981-10-19 1983-04-22 富士通株式会社 Voice synthesizer

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE2531006A1 (en) * 1975-07-11 1977-01-27 Deutsche Bundespost Speech synthesis system from diphthongs and phonemes - uses time limit for stored diphthongs and their double application
EP0058130A2 (en) * 1981-02-11 1982-08-18 Eberhard Dr.-Ing. Grossmann Method for speech synthesizing with unlimited vocabulary, and arrangement for realizing the same
DE3220281A1 (en) * 1981-05-29 1982-12-23 Matsushita Electric Industrial Co., Ltd., Kadoma, Osaka System for composing a voice through compilation of phoneme components
DE3246712A1 (en) * 1981-12-17 1983-06-30 Matsushita Electric Industrial Co., Ltd., Kadoma, Osaka METHOD FOR COMPOSING A VOICE ANALYSIS
EP0087199A1 (en) * 1982-02-24 1983-08-31 Koninklijke Philips Electronics N.V. Device for generating audio information of individual characters
EP0107945A1 (en) * 1982-10-19 1984-05-09 Kabushiki Kaisha Toshiba Speech synthesizing apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ICASSP 80, PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Vol. 2, April 9-11, 1980, Fairmont Hotel, DENVER, COLORADO, (US), pages 557-560, IEEE, NEW YORK, (US) S. IMAI et al.:"Cepstral synthesis of Japanese from CV syllable parameters". *
ICASSP 82, PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Vol. 2, May 3-5, 1982, Palais des Congres, PARIS, (FR), pages 936-939, IEEE, NEW YORK, (US) E.R. GROSSMANN:"Speech synthesis in the time domain from text". *
PROCEEDINGS OF THE SEMINAR ON PATTERN RECOGNITION, Vol. 1, November 19-20, 1977, University Sart-Tilman, LIEGE, (BE) pages 4.4.1 - 4.4.6, Sitel, OPHAIN, (BE) D. TEIL:"Un peripherique a reponse vocale: L'ICOPHONE 5". *
THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, Vol. 25, No. 1, January 1953, pages 105-113, (US) A.S. HOUSE et al.:"The influence of consonant environment upon the secondary acoustical characteristics of vowels". *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5463713A (en) * 1991-05-07 1995-10-31 Kabushiki Kaisha Meidensha Synthesis of speech from text
WO1997034291A1 (en) * 1996-03-14 1997-09-18 G Data Software Gmbh Microsegment-based speech-synthesis process
DE19610019A1 (en) * 1996-03-14 1997-09-18 Data Software Gmbh G Digital speech synthesis process
DE19610019C2 (en) * 1996-03-14 1999-10-28 Data Software Gmbh G Digital speech synthesis process
US6308156B1 (en) 1996-03-14 2001-10-23 G Data Software Gmbh Microsegment-based speech-synthesis process
EP1617408A2 (en) 2004-07-15 2006-01-18 Yamaha Corporation Voice synthesis apparatus and method
EP1617408A3 (en) * 2004-07-15 2007-06-20 Yamaha Corporation Voice synthesis apparatus and method
US7552052B2 (en) 2004-07-15 2009-06-23 Yamaha Corporation Voice synthesis apparatus and method

Also Published As

Publication number Publication date
EP0144731A3 (en) 1985-07-03
DE3473956D1 (en) 1988-10-13
EP0144731B1 (en) 1988-09-07
JPH0642158B2 (en) 1994-06-01
JPS6097396A (en) 1985-05-31


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

17P Request for examination filed

Effective date: 19841102

AK Designated contracting states

Designated state(s): DE FR GB

AK Designated contracting states

Designated state(s): DE FR GB

17Q First examination report despatched

Effective date: 19861003

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 3473956

Country of ref document: DE

Date of ref document: 19881013

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20021030

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20021107

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20021108

Year of fee payment: 19

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20031102

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040602

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20031102

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040730

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST