EP0144731A2 - Speech synthesizer - Google Patents

Publication number
EP0144731A2
EP0144731A2 EP84113186A EP84113186A EP0144731A2 EP 0144731 A2 EP0144731 A2 EP 0144731A2 EP 84113186 A EP84113186 A EP 84113186A EP 84113186 A EP84113186 A EP 84113186A EP 0144731 A2 EP0144731 A2 EP 0144731A2
Authority
EP
European Patent Office
Prior art keywords
speech
waveform
articulation
unit
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP84113186A
Other languages
English (en)
French (fr)
Other versions
EP0144731A3 (en)
EP0144731B1 (de)
Inventor
Katsunobu Fushikida
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of EP0144731A2
Publication of EP0144731A3
Application granted
Publication of EP0144731B1
Expired

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00

Definitions

  • This invention relates to a speech synthesizer.
  • VCV: vowel-consonant-vowel
  • a speech synthesizer comprising a converting means for converting the input sequence of characters to a sequence of articulation symbols corresponding to a unit speech waveform which is obtained by dividing a diphone, a memory for storing the unit speech waveform corresponding to the predetermined articulation symbols and a synthesizing means for reading the unit speech waveforms corresponding to the articulation symbols of the converted sequence of articulation symbols from the memory and synthesizing them.
  • the speech to be synthesized is first entered on a keyboard 10. The keyboard 10 generates a character sequence signal, a stress strength signal (three-level in this embodiment) and a boundary signal marking the boundaries between utterances.
  • the structure and performance of the speech synthesizer shown in Fig. 1 will be described on the assumption that the speech to be synthesized is "kite".
  • the alphabetical character sequence signals making up "kite" are generated by pressing the keys "K", "I", "T" and "E".
  • the boundary signal B, which indicates boundaries such as the beginning, end and pauses of the word "kite", and the stress strength signal ST are also supplied to a phoneme symbol/articulation symbol converting circuit 20 together with the character sequence signal.
  • the stress strength is determined from the pitch and strength of each syllable; for example, a high stress strength indicates a high pitch frequency.
  • the converting circuit 20 has a processing part 21 and a memory 22. The memory 22 stores phoneme symbols, prepared in advance, that correspond to the speech. For example, the phoneme symbol /kait/ is stored in correspondence with the word "kite".
  • the processing part 21 supplies address information to the memory 22 in response to the input character sequence signal. The phoneme symbol signal /kait/ is then read from the memory 22 and supplied to a phoneme symbol/articulation symbol converting circuit 30.
  • like the converting circuit 20, the converting circuit 30 has a processing part 31 and a memory 32. The memory 32 stores articulation symbols (each determined by the phonemes located before and after it), which are determined in advance for each phoneme symbol by the method peculiar to the present invention described in the following.
  • the articulatory organs of the human being include the vocal cords, tongue, lips, velum palatinum, etc., as shown in Fig. 3, and various speech sounds are generated by controlling these articulatory organs in accordance with nerve pulse signals. Therefore, if two articulations of the articulatory organs are similar, two similar speech waveforms are generated. Further, it is apparent that if the articulation parameter values representing the movements of these articulatory organs are close to each other, the generated speech waveforms are similar. As described above, in the conventional CV, VC waveform connecting type of synthesizing method, many speech waveforms corresponding to CV and VC are prepared, but from the viewpoint of the movement of the articulation parameters they include considerably redundant waveforms.
  • the speech waveform corresponding to a phoneme /ka/ and that corresponding to a phoneme /ga/ are prepared separately.
  • the movement of the articulatory organs for /ka/ and that for /ga/ are very similar.
  • the relationship between the tongue, palate, etc. is almost the same, and the main difference is whether the vocal cords are vibrating or not (voiced or unvoiced) in the consonant parts. Therefore, in the voiced section after the unvoiced section of the consonant part /k/ in /ka/ (the section shifting to the normal part of the vowel /a/ which corresponds to (C in Fig.
  • the articulation parameter is almost the same as that of /ga/ (C in Fig. 4B), which can take the place of the partial waveform of /ka/ in that section with a fairly good approximation. It is clear that in the pairs /kV/ - /gV/, /tV/ - /dV/ and /pV/ - /bV/ (V represents a vowel) also, the waveforms in the part shifting to the vowels can be shared. In Figs.
  • part A is the silent part at the beginning of /ka/ or /ga/ (represented as *)
  • part B is the waveform of "k" in /ka/ or "g" in /ga/
  • B' is the waveform of the part affected by the phoneme following "g" in /ga/
  • C and D are, as described above, the speech waveforms of the vowel "a" following the consonant of /ka/ and /ga/.
  • a time section which is determined in consideration of the manner of articulation, is shorter than a CV or VC waveform, and can be substituted by a speech waveform based on a different phoneme series, as shown in Figs. 4A and 4B, is called an articulation segment, and a speech waveform in the articulation segment is called an articulation element piece waveform. That is, the syllables /ka/ and /ga/ are divided into the time sections B and C so that the transient parts of those syllables can be reused in the synthesis of other speech.
  • articulation segments whose manner of articulation is the same are represented by the same articulation symbol, and the articulation element piece waveform corresponding to this articulation symbol is stored in the memory 32 in advance.
  • the articulation symbols corresponding to a sequence of phoneme symbols are stored in advance.
  • Fig. 2 shows the classified articulation symbols, in which * represents the silent part placed at the beginning of speech or immediately before an explosive; "p", "t", "k" represent explosive parts; and (b)a, (d)a, (g)a represent the transient parts of the vowel "a" which follow the consonants "b", "d", "g".
  • i(b), i(d), i(g) represent the transient parts of the vowel "i" which precede the consonants "b", "d", "g", and ai, au, ao represent the transient parts where the vowel "a" is followed by the vowels "i", "u" and "o".
  • the phoneme symbol /it/ is substituted by a transient part i(d), which shifts from the vowel to the consonant of the phoneme symbol /id/ resembling /it/, and a silent part * is placed immediately before the silent explosive "t".
  • speech synthesis that uses, in place of /ka/ and /it/, waveforms taken from /ga/ and /id/, whose phoneme sequences differ from but whose articulation is similar to those of /ka/ and /it/, dispenses with the need to store the transient parts of /ka/ and /it/ in advance and so reduces the required memory capacity.
  • These articulation element piece waveforms can be easily obtained from, for example, waveforms of uttered speech.
  • the waveform address generation circuit 50 reads the articulation element piece waveform corresponding to each articulation symbol contained in the articulation signal, and to the stress signal ST, from an articulation waveform memory which is selected by the stress signal ST from among the memories 80a, 80b and 80c included in an articulation waveform memory 80.
  • the articulation element piece waveform is generated on the basis of the address corresponding to each articulation symbol from the memory 80.
  • the stress signal ST from the processing part 21 is detected in a stress strength detection circuit 40, and the articulation element piece waveform corresponding to the detected stress strength is read from the memory 80.
  • the articulation element piece waveforms corresponding to the articulation symbols shown in Fig. 2 are stored.
  • An interpolation method selection circuit 60 judges whether the articulation symbol (two continuous waveforms) from the phoneme symbol/articulation symbol converting circuit 30 is voiced or unvoiced.
  • the interpolation circuit 70 is controlled by the result of this judgment as follows: when the articulation symbol is unvoiced (or silent), the two consecutive articulation element piece waveforms read from the memory 80 are connected directly; when the articulation symbol is voiced, the waveforms are interpolated, for example synchronously with the pitch.
  • any spoken word is synthesized by connecting articulation waveforms having several pitch levels through pitch-synchronous interpolation between the waveforms. For example, as shown in Fig.
  • N_i, namely the time length of h_i(n)
  • N_i is assumed to be the value obtained by interpolating N_f and N_g.
  • when N_f and N_g are shorter than N_i, the final sample value of the waveform may be repeated, and when N_f and N_g are longer than N_i, the surplus waveform may be discarded.
  • a continuous articulation element piece waveform (in this example, a digital waveform) corresponding to the input sequence of characters is supplied to a D/A converter 90, where the interpolated synthesized articulation waveform is converted to an analogue waveform and output as synthesized speech.
  • the symbol waveform of a synthesized speech obtained in this way is shown in Fig. 6.
  • this invention uses a unit of speech that is shorter in time than the unit speech waveforms, such as CV and VC waveforms, of the CV, VC waveform compiling type of synthesizing method; it therefore not only requires a small waveform memory capacity but also accurately reflects the articulation of the articulatory organs, so that synthesized speech of high quality is obtained.
  • an articulation element piece waveform corresponding to an articulation symbol is compiled and synthesized, but it is clear that the reduction in memory capacity is also possible when this invention is applied to a synthesizing method using what is called a "characteristic parameter", such as a formant parameter.
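
The symbol conversion described above for "kite" can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the names WORD_TO_PHONEMES, VOICED_PARTNER and to_articulation_symbols are hypothetical, the rules cover only the unvoiced explosives and vowel transitions named in the text, and the symbol sequence produced for /kait/ is an inference from the description rather than a sequence quoted from it.

```python
# Hypothetical stand-in for memory 22: word -> phoneme symbols.
WORD_TO_PHONEMES = {"kite": ["k", "a", "i", "t"]}

# Unvoiced explosives borrow vowel transitions from their voiced partners
# (/kV/-/gV/, /tV/-/dV/, /pV/-/bV/ in the description).
VOICED_PARTNER = {"p": "b", "t": "d", "k": "g"}
VOWELS = {"a", "i", "u", "e", "o"}
SILENCE = "*"


def to_articulation_symbols(phonemes):
    """Map phoneme symbols to articulation symbols (toy rule set)."""
    symbols = []
    for idx, ph in enumerate(phonemes):
        prev = phonemes[idx - 1] if idx > 0 else None
        nxt = phonemes[idx + 1] if idx + 1 < len(phonemes) else None
        if ph in VOICED_PARTNER:
            # A silent part precedes the explosive part itself.
            symbols += [SILENCE, ph]
        elif ph in VOWELS:
            if prev in VOICED_PARTNER:
                # Transition into the vowel, taken from the voiced partner.
                symbols.append(f"({VOICED_PARTNER[prev]}){ph}")
            elif prev in VOWELS:
                # Vowel-to-vowel transition such as "ai".
                symbols.append(prev + ph)
            if nxt in VOICED_PARTNER:
                # Transition out of the vowel, e.g. /it/ -> i(d).
                symbols.append(f"{ph}({VOICED_PARTNER[nxt]})")
    return symbols


print(to_articulation_symbols(WORD_TO_PHONEMES["kite"]))
```

Because /ka/ and /it/ reuse (g)a and i(d), no separate transition waveform needs to be stored for the unvoiced consonants, which is the memory saving the description refers to.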

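The connection of consecutive articulation element piece waveforms can be illustrated in a similar way. The sketch below assumes that each waveform is held as a list of pitch periods, that intermediate periods h_i(n) are linearly weighted between the last period f of one waveform and the first period g of the next, and that each length N_i is linearly interpolated between N_f and N_g. The linear weighting and the names fit_length, interpolate_periods and connect are assumptions, since the exact interpolation formula is not contained in this excerpt; only the padding and truncation rule is taken from the text.

```python
def fit_length(period, n):
    """Repeat the final sample or discard the surplus so the result has n samples."""
    if len(period) >= n:
        return period[:n]
    return period + [period[-1]] * (n - len(period))


def interpolate_periods(f, g, steps):
    """Return `steps` intermediate pitch periods between f and g (linear weights assumed)."""
    periods = []
    for i in range(1, steps + 1):
        w = i / (steps + 1)
        # N_i: length interpolated between N_f = len(f) and N_g = len(g).
        n_i = round((1 - w) * len(f) + w * len(g))
        f_i, g_i = fit_length(f, n_i), fit_length(g, n_i)
        periods.append([(1 - w) * a + w * b for a, b in zip(f_i, g_i)])
    return periods


def connect(left, right, voiced, steps=2):
    """Join two waveforms given as lists of pitch periods."""
    if not voiced:
        # Unvoiced or silent: connect the two waveforms directly.
        return left + right
    # Voiced: bridge the junction with pitch-synchronous interpolated periods.
    return left + interpolate_periods(left[-1], right[0], steps) + right


left = [[0.0, 0.0, 0.0, 0.0]]
right = [[1.0] * 8]
joined = connect(left, right, voiced=True, steps=1)
print([len(p) for p in joined])
```

Flattening the returned list of periods into one sample sequence would give the digital waveform that is passed on for D/A conversion.
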
Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)
EP84113186A 1983-11-01 1984-11-02 Speech synthesizer Expired EP0144731B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP205227/83 1983-11-01
JP58205227A JPH0642158B2 (ja) 1983-11-01 1983-11-01 Speech synthesis device

Publications (3)

Publication Number Publication Date
EP0144731A2 (de) 1985-06-19
EP0144731A3 (en) 1985-07-03
EP0144731B1 (de) 1988-09-07

Family

ID=16503507

Family Applications (1)

Application Number Title Priority Date Filing Date
EP84113186A Expired EP0144731B1 (de) 1983-11-01 1984-11-02 Speech synthesizer

Country Status (3)

Country Link
EP (1) EP0144731B1 (de)
JP (1) JPH0642158B2 (de)
DE (1) DE3473956D1 (de)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5463713A (en) * 1991-05-07 1995-10-31 Kabushiki Kaisha Meidensha Synthesis of speech from text
WO1997034291A1 (de) * 1996-03-14 1997-09-18 G Data Software Gmbh Microsegment-based speech synthesis method
EP1617408A2 (de) 2004-07-15 2006-01-18 Yamaha Corporation Method and device for speech synthesis

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02141054A (ja) * 1988-11-21 1990-05-30 Nec Home Electron Ltd Terminal device for personal computer communication
JP5782751B2 (ja) * 2011-03-07 2015-09-24 Yamaha Corporation Speech synthesis device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE2531006A1 (de) * 1975-07-11 1977-01-27 Deutsche Bundespost System for synthesizing speech in the time domain from diphones and sound elements
EP0058130A2 (de) * 1981-02-11 1982-08-18 Eberhard Dr.-Ing. Grossmann Method for synthesizing speech with an unlimited vocabulary and circuit arrangement for carrying out the method
DE3220281A1 (de) * 1981-05-29 1982-12-23 Matsushita Electric Industrial Co., Ltd., Kadoma, Osaka System for composing a voice by compiling phoneme pieces
DE3246712A1 (de) * 1981-12-17 1983-06-30 Matsushita Electric Industrial Co., Ltd., Kadoma, Osaka Method for composing a voice analysis
EP0087199A1 (de) * 1982-02-24 1983-08-31 Koninklijke Philips Electronics N.V. Device for generating acoustic information for individual characters
EP0107945A1 (de) * 1982-10-19 1984-05-09 Kabushiki Kaisha Toshiba Speech synthesis device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5331561A (en) * 1976-09-04 1978-03-24 Mitsukawa Shiyouichi Method of manufacturing S-shaped springs
JPS5868099A (ja) * 1981-10-19 1983-04-22 Fujitsu Ltd Speech synthesis device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ICASSP 80, PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Vol.2, April 9-11, 1980, Fairmont Hotel, DENVER, COLORADO, (US), pages 557-560, IEEE, NEW YORK, (US) S. IMAI et al.:"Cepstral synthesis of Japanese from CV syllable parameters". *
ICASSP 82, PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Vol. 2, May 3-5, 1982, Palais des Congres, PARIS, (FR), pages 936-939, IEEE, NEW YORK, (US) E.R. GROSSMANN:"Speech synthesis in the time domain from text". *
PROCEEDINGS OF THE SEMINAR ON PATTERN RECOGNITION, Vol. 1, November 19-20, 1977, University Sart-Tilman, LIEGE, (BE) pages 4.4.1 - 4.4.6, Sitel, OPHAIN, (BE) D. TEIL:"Un peripherique a reponse vocale: L'ICOPHONE 5". *
THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, Vol. 25, No. 1, January 1953, pages 105-113, (US) A.S. HOUSE et al.:"The influence of consonant environment upon the secondary acoustical characteristics of vowels". *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5463713A (en) * 1991-05-07 1995-10-31 Kabushiki Kaisha Meidensha Synthesis of speech from text
WO1997034291A1 (de) * 1996-03-14 1997-09-18 G Data Software Gmbh Microsegment-based speech synthesis method
DE19610019A1 (de) * 1996-03-14 1997-09-18 Data Software Gmbh G Digital speech synthesis method
DE19610019C2 (de) * 1996-03-14 1999-10-28 Data Software Gmbh G Digital speech synthesis method
US6308156B1 (en) 1996-03-14 2001-10-23 G Data Software Gmbh Microsegment-based speech-synthesis process
EP1617408A2 (de) 2004-07-15 2006-01-18 Yamaha Corporation Method and device for speech synthesis
EP1617408A3 (de) * 2004-07-15 2007-06-20 Yamaha Corporation Method and device for speech synthesis
US7552052B2 (en) 2004-07-15 2009-06-23 Yamaha Corporation Voice synthesis apparatus and method

Also Published As

Publication number Publication date
EP0144731A3 (en) 1985-07-03
DE3473956D1 (en) 1988-10-13
EP0144731B1 (de) 1988-09-07
JPH0642158B2 (ja) 1994-06-01
JPS6097396A (ja) 1985-05-31

Similar Documents

Publication Publication Date Title
US4862504A (en) Speech synthesis system of rule-synthesis type
US6778962B1 (en) Speech synthesis with prosodic model data and accent type
US7240005B2 (en) Method of controlling high-speed reading in a text-to-speech conversion system
EP0831460B1 (de) Speech synthesis using auxiliary information
JP3361066B2 (ja) Speech synthesis method and apparatus
US5463713A (en) Synthesis of speech from text
EP0427485A2 (de) Method and device for speech synthesis
US6035272A (en) Method and apparatus for synthesizing speech
EP0239394B1 (de) Speech synthesis system
KR20000005183A (ko) Image synthesis method and apparatus
US5212731A (en) Apparatus for providing sentence-final accents in synthesized american english speech
US6970819B1 (en) Speech synthesis device
EP0144731B1 (de) Speech synthesizer
EP0107945B1 (de) Speech synthesis device
JPS6050600A (ja) Rule-based synthesis method
JP3060276B2 (ja) Speech synthesis device
Furtado et al. Synthesis of unlimited speech in Indian languages using formant-based rules
JP3771565B2 (ja) Fundamental frequency pattern generation device, fundamental frequency pattern generation method, and program recording medium
JP3081300B2 (ja) Residual-driven speech synthesis device
JP3318290B2 (ja) Speech synthesis method and apparatus
JP3086333B2 (ja) Speech synthesis device and speech synthesis method
JP2573586B2 (ja) Rule-based speech synthesis device
JPH11161297A (ja) Speech synthesis method and apparatus
JPS58168096A (ja) Multilingual speech synthesis device
Eady et al. Pitch assignment rules for speech synthesis by word concatenation

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

17P Request for examination filed

Effective date: 19841102

AK Designated contracting states

Designated state(s): DE FR GB

AK Designated contracting states

Designated state(s): DE FR GB

17Q First examination report despatched

Effective date: 19861003

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 3473956

Country of ref document: DE

Date of ref document: 19881013

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20021030

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20021107

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20021108

Year of fee payment: 19

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20031102

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040602

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20031102

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040730

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST