JPH03200299A

JPH03200299A - Voice synthesizer

Info

Publication number: JPH03200299A
Application number: JP1343204A
Authority: JP
Inventors: Tetsuo Nishimoto; 西元　哲夫
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1989-12-28
Filing date: 1989-12-28
Publication date: 1991-09-02
Anticipated expiration: 2016-01-31
Also published as: JP3130305B2

Abstract

PURPOSE: To synthesize speech by performing formant frequency control and amplitude control in synchronization with each other after a set delay time has elapsed. CONSTITUTION: The synthesizer is provided with a means for setting a delay time, a means for generating, after the delay time has elapsed, the amplitude envelope of the speech to be produced, and a means for generating frequency control information in synchronization with the generation of the amplitude envelope. The synthesizer is further provided with a sound source whose formant frequency is controlled by the frequency control information to generate a formant sound, and a speech output means that controls the amplitude of the formant sound according to the amplitude envelope and outputs the result as speech. Within the sounding period of each sound, control of the formant frequency and control of the amplitude are performed synchronously. Consequently, speech in which several sounds are produced in succession, as in everyday speech, can be reproduced faithfully.

Description

DETAILED DESCRIPTION OF THE INVENTION "Field of Industrial Application" The present invention relates in particular to a speech synthesis device using a formant synthesis method.

"Prior Art" Speech synthesis devices are known that include a formant generator consisting of a periodic waveform generator that generates a periodic waveform (for example, a sine wave) of constant frequency, a window function generator that generates a window function at every predetermined pitch, and a multiplier that multiplies the periodic waveform by the window function and outputs the result.

With this formant generator, the frequency of the periodic waveform becomes the formant center frequency, and a formant sound whose spectral distribution has the spectrum of the window function arranged on both sides of the formant center frequency is generated at every predetermined pitch.
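The formant generator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Hann window, the sample rate, the frequencies, and the choice to restart the carrier phase with each window are not taken from the patent, and the names `hann` and `formant_sound` are hypothetical.

```python
import math

def hann(n, length):
    """Sample n of a Hann window that is `length` samples long (assumed window shape)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)

def formant_sound(center_hz, pitch_hz, sample_rate, num_samples):
    """Multiply a sine at center_hz by a window that restarts every pitch period."""
    period = int(round(sample_rate / pitch_hz))  # samples per pitch period
    out = []
    for i in range(num_samples):
        n = i % period  # position inside the current window
        # Assumption: the carrier phase restarts together with the window.
        carrier = math.sin(2.0 * math.pi * center_hz * n / sample_rate)
        out.append(carrier * hann(n, period))  # the multiplier stage
    return out

# Two pitch periods of a formant centered at 800 Hz, repeated at 100 Hz.
samples = formant_sound(center_hz=800.0, pitch_hz=100.0,
                        sample_rate=8000, num_samples=160)
```

Because the window is zero at the start of each period, the output returns to zero once per pitch period, which is what places the window spectrum around the carrier frequency.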

In general, each speech sound has several formants of its own, so a speech synthesis device is provided with a plurality of the formant generators described above, and each formant generator generates one of the representative formant sounds that make up the desired speech. Each generated formant sound is multiplied by an amplitude envelope generated by an amplitude envelope generator (hereinafter, amplitude EG), and the multiplication results are added together and output as speech.

In addition, to achieve more faithful timbre control, a pitch envelope generator (hereinafter abbreviated as pitch EG) is provided in the speech synthesis device, and the generation pitch of the formant sound is controlled according to the amplitude of the pitch envelope output from the pitch EG. Such pitch control gives the formant center frequency a temporal change, so that the change in timbre over time seen in natural speech is realized. A speech synthesis device of this type can reproduce speech, especially vowel sounds, with a fair degree of fidelity. A similar configuration can also synthesize the sustained portions of the tones of natural musical instruments.

As a device for synthesizing the consonant part of speech or the rising part of a natural instrument sound, the applicant of the present invention has already filed an application (Japanese Patent Application No. Hei 1-91762, title of invention: "Noise Sound Generator") for a system in which white noise is band-limited by a low-pass filter to generate a noise sound, and the noise sound is multiplied by a periodic waveform of constant frequency to generate an unvoiced sound. With this device, a formant sound is obtained whose formant center frequency is the frequency of the periodic waveform and which has, on both sides of that frequency, a continuous spectrum corresponding to the passband characteristic of the low-pass filter. By generating various formant sounds of this kind and superimposing them, it is possible to synthesize sounds containing inharmonic components, such as the consonants of speech, a whistle, or the attack part of a natural instrument sound.
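The noise-based formant described above can be illustrated with the following Python sketch. The text only says "low-pass filter", so the one-pole filter used here is an assumption, as are the sample rate, the frequencies, and the hypothetical function name `noise_formant`.

```python
import math
import random

def noise_formant(center_hz, cutoff_hz, sample_rate, num_samples, seed=0):
    """Band-limit white noise, then multiply it by a sine at center_hz."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    # Coefficient of a one-pole low-pass filter (assumed filter type).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = 0.0
    out = []
    for i in range(num_samples):
        white = rng.uniform(-1.0, 1.0)    # white noise sample
        state += alpha * (white - state)  # band-limited noise sound
        carrier = math.sin(2.0 * math.pi * center_hz * i / sample_rate)
        out.append(state * carrier)       # spectrum shifted around center_hz
    return out

# 20 ms of an unvoiced formant centered at 4 kHz with a 500 Hz noise bandwidth.
burst = noise_formant(center_hz=4000.0, cutoff_hz=500.0,
                      sample_rate=16000, num_samples=320)
```

Multiplying by the sine moves the low-pass-shaped noise spectrum so that it sits on both sides of the carrier frequency, matching the continuous spectrum the paragraph describes.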

"Problems to Be Solved by the Invention" When everyday speech is to be reproduced, it would be very convenient if a plurality of different kinds of sounds could be produced one after the other in time.

For example, when generating the speech sound [SA], which has a consonant part [S] and a vowel part [A], if the consonant part [S] can be sounded first and the vowel part [A] sounded after a predetermined time has elapsed, as illustrated in FIG. 3, then [SA] can be pronounced very smoothly. It is also very advantageous if, within the sounding period of each of the sounds produced one after the other, control of the formant frequency and control of the amplitude of each formant proceed in synchronization. For example, close observation of how [SA] is pronounced shows that the mouth is closed while the consonant [S] is being sounded, and that on moving to the sounding period of the vowel [A] the mouth gradually goes from closed to open, so the vowel actually produced starts as [U] and gradually changes to [A]. Therefore, if the formant frequency can be changed continuously so that the sound changes from [U] to [A] in synchronization with the rise of the amplitude of the second sound [A], the continuous sounds of everyday speech can be reproduced very faithfully.

The present invention has been made in view of the circumstances described above, and its object is to provide a speech synthesis device that can synthesize speech by performing formant frequency control and amplitude control in synchronization after a set delay time has elapsed.

"Means for Solving the Problems" The present invention is characterized by comprising: a means for setting a delay time; a means for generating, when an instruction to start sound generation is given, the amplitude envelope of the speech to be produced after the delay time has elapsed; a means for generating frequency control information in synchronization with the generation of the amplitude envelope; a sound source whose formant frequency is controlled by the frequency control information and which generates a formant sound; and a speech output means that controls the amplitude of the formant sound according to the amplitude envelope and outputs the result as speech.

"Operation" With the above configuration, once the set delay time has elapsed from the moment the instruction to start sound generation is given, the formant frequency and the amplitude are controlled in synchronization and speech is generated.

"Embodiment" An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a speech synthesis device according to an embodiment of the present invention. In FIG. 1, reference numerals 1 to 4 denote vowel formant generating units, which respectively generate the characteristic first to fourth formants that make up the vowel part of speech. Reference numerals 5 to 8 denote consonant formant generating units, which generate the first to fourth consonant formants of speech. Each of the formant generating units 1 to 8 has a circuit that generates a periodic waveform, a circuit that generates a predetermined window function, and a multiplier that multiplies the periodic waveform by the window function and outputs a formant sound.

Reference numerals 11 to 18 denote pitch EGs, which respectively generate pitch envelopes PE1 to PE8 specifying the generation pitches of the first to fourth vowel formants and the first to fourth consonant formants, and supply them to the vowel formant generating units 1 to 4 and the consonant formant generating units 5 to 8, respectively. Pitch EGs 11 and 15 each have a built-in timer, and prior to sound generation, timing data is set in each timer by a control means (not shown). When a sound generation instruction signal KON is given from the control means, the timers in pitch EGs 11 and 15 start timing.

When the timer of pitch EG 11 finishes timing, pitch EG 11 outputs the pitch envelope PE1 together with a drive signal KD1. Likewise, when the timer of pitch EG 15 finishes timing, pitch EG 15 outputs the pitch envelope PE5 and a drive signal KD5.

Pitch EGs 12 to 14 and 16 to 18, by contrast, have no such built-in timer. As soon as a drive signal is input from the preceding stage, each of them starts generating its pitch envelope and outputs a drive signal of its own (for example, pitch EG 12 starts outputting the pitch envelope PE2 at the moment the drive signal KD1 from the preceding pitch EG 11 is input, and at the same time outputs the drive signal KD2).
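The timer and drive-signal chain can be sketched as a small Python function. It is a simplified model, not the circuit itself: it assumes a drive signal propagates with no delay, treats times as seconds, and uses the hypothetical name `start_times`; the delay values are placeholders.

```python
def start_times(kon_time, vowel_delay, consonant_delay):
    """Return {pitch EG number: time at which its envelope starts}."""
    starts = {}
    # Pitch EGs 11 and 15 contain timers and wait for their own delay after KON.
    for head, delay in ((11, vowel_delay), (15, consonant_delay)):
        starts[head] = kon_time + delay
        # The next three EGs have no timer: each starts the moment the drive
        # signal from the preceding stage arrives (assumed instantaneous).
        for follower in range(head + 1, head + 4):
            starts[follower] = starts[follower - 1]
    return starts

# Placeholder delays: consonant group after 10 ms, vowel group after 50 ms.
times = start_times(kon_time=0.0, vowel_delay=0.05, consonant_delay=0.01)
```

In this model all four EGs in a group start together, so only the two timer values decide when the vowel group and the consonant group begin.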

Reference numerals 21 to 28 denote amplitude EGs. The amplitude EGs 21 to 28 output amplitude envelopes AE1 to AE8, respectively, when the drive signals KD1 to KD8 are input.

FIG. 2 illustrates the pitch envelope PE1 and the amplitude envelope AE1; the horizontal axis is time and the vertical axis is the amplitude of each envelope. DT is the delay time from when the sound generation instruction signal KON is given until each envelope is generated, and is determined by the timing data set in the timer of pitch EG 11. The waveform of the pitch envelope PE1 is set to model the temporal transition of the formant center frequency of the first vowel formant of the desired speech, and the waveform of the amplitude envelope AE1 is set to model the temporal transition of the amplitude of the first vowel formant. The same applies to the other pitch envelopes PE2 to PE8 and amplitude envelopes AE2 to AE8.
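A minimal Python sketch of the relationship shown in FIG. 2 follows. It assumes piecewise-linear envelopes, times in seconds, and a pitch envelope expressed directly in hertz; none of these details, nor the breakpoint values or the name `envelope`, come from the patent.

```python
def envelope(t, delay, points):
    """Evaluate a piecewise-linear envelope at time t measured from KON.

    points: list of (time after envelope start, level), times ascending from 0.
    """
    local = t - delay
    if local < 0.0:
        return 0.0  # before DT has elapsed, no envelope has started
    if local >= points[-1][0]:
        return points[-1][1]  # hold the final level
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= local < t1:
            return v0 + (v1 - v0) * (local - t0) / (t1 - t0)
    return points[0][1]

DT = 0.25                                       # placeholder delay time
PE1 = [(0.0, 300.0), (0.5, 700.0)]              # placeholder center frequency in Hz
AE1 = [(0.0, 0.0), (0.125, 1.0), (0.5, 0.5)]    # placeholder amplitude

# Both envelopes share the same delay, so they advance in step.
frequency_now = envelope(0.5, DT, PE1)
amplitude_now = envelope(0.5, DT, AE1)
```

Passing the same `delay` to both calls is what keeps frequency control and amplitude control synchronized in this sketch.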

Multipliers 31 to 34 multiply the first to fourth vowel formants by the amplitude envelopes AE1 to AE4, and the multiplication results are added by adder 41. Multipliers 35 to 38 multiply the first to fourth consonant formants by the amplitude envelopes AE5 to AE8, and the multiplication results are added by adder 42. The outputs of adders 41 and 42 are then added by adder 50 and output as speech.
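The multiplier and adder stage can be written out as follows. This is a sketch that treats every signal as a list of samples; the function name `mix` and the toy input values are hypothetical.

```python
def mix(vowel_formants, consonant_formants, vowel_envs, consonant_envs):
    """Each argument is a list of four sample lists of equal length."""
    num_samples = len(vowel_formants[0])
    out = []
    for i in range(num_samples):
        # Multipliers 31 to 34, summed by adder 41.
        vowel_sum = sum(f[i] * e[i] for f, e in zip(vowel_formants, vowel_envs))
        # Multipliers 35 to 38, summed by adder 42.
        consonant_sum = sum(f[i] * e[i]
                            for f, e in zip(consonant_formants, consonant_envs))
        out.append(vowel_sum + consonant_sum)  # adder 50
    return out

# Toy two-sample signals: vowel audible in sample 0, consonant in sample 1.
speech = mix(
    vowel_formants=[[1.0, 1.0]] * 4,
    consonant_formants=[[2.0, 2.0]] * 4,
    vowel_envs=[[0.5, 0.0]] * 4,
    consonant_envs=[[0.0, 0.25]] * 4,
)
```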

The operation of this speech synthesis device is described below, taking the synthesis of the speech sound [SA] as an example. Prior to sound generation, the control means (not shown) sets pitch envelope parameters corresponding to the consonant [S] in pitch EGs 15 to 18, and sets in pitch EGs 11 to 14 pitch envelope parameters for obtaining a vowel part that is initially the sound [U] and gradually changes to the sound [A]. Timing data specifying the sounding times of the vowel and the consonant are set in pitch EGs 11 and 15, respectively. Since the vowel is sounded slightly later than the consonant, the timing data set in pitch EG 11 is slightly larger than that set in pitch EG 15. Amplitude envelope parameters corresponding to the desired speech are set in amplitude EGs 21 to 28.

Then, when the sound generation instruction signal KON is output from the control means (not shown), pitch EGs 11 and 15 each count out their timing data. Timing finishes first in pitch EG 15, and generation of the pitch envelope PE5 starts.

At this time, pitch EG 15 outputs the drive signal KD5, which in turn causes the drive signals KD6 to KD8 to be generated. As a result, pitch EGs 15 to 18 generate the pitch envelopes PE5 to PE8 and, at the same time, amplitude EGs 25 to 28 generate the amplitude envelopes AE5 to AE8.

Thereafter, control of the formant center frequencies and control of the amplitudes of the first to fourth consonant formants proceed in synchronization as time passes. The signals obtained by controlling the amplitudes of the first to fourth consonant formants according to the amplitude envelopes AE5 to AE8 are output from multipliers 35 to 38, added by adder 42, and output via adder 50 as the consonant [S].

Next, when timing finishes in pitch EG 11, the pitch envelopes PE1 to PE4 and the amplitude envelopes AE1 to AE4 are generated in the same way as described above, and the formant center frequencies and amplitudes of the first to fourth vowel formants are controlled. The signals obtained by controlling the amplitudes of the first to fourth vowel formants according to the amplitude envelopes AE1 to AE4 are output from multipliers 31 to 34, added by adder 41, and output via adder 50 as the vowel part of the speech. In this case, [U] is generated as the vowel part at the start of sounding; thereafter the center frequency of each of the first to fourth vowel formants shifts as time passes, and the timbre changes so as to move gradually to the sound [A].
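The shift from [U] to [A] can be pictured as moving four center frequencies together. The sketch below uses linear interpolation, which is an assumption (the text only says the frequencies shift over time), and the frequency values are rough placeholders rather than figures from the patent; `glide`, `U_HZ`, and `A_HZ` are hypothetical names.

```python
def glide(start_hz, end_hz, progress):
    """Interpolate each formant center frequency; progress runs from 0.0 to 1.0."""
    p = min(max(progress, 0.0), 1.0)  # clamp so the glide stops at the target
    return [s + (e - s) * p for s, e in zip(start_hz, end_hz)]

# Placeholder center frequencies for formants 1 to 4 (not measured values).
U_HZ = [300.0, 900.0, 2200.0, 3300.0]
A_HZ = [700.0, 1100.0, 2500.0, 3500.0]

halfway = glide(U_HZ, A_HZ, 0.5)  # timbre midway between [U] and [A]
```

In the device this role is played by the shapes of the pitch envelopes PE1 to PE4, which is why their parameters are set before sound generation.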

It goes without saying that the speech synthesis device of the present invention can also be used for musical tone synthesis in electronic musical instruments. In that case, the generation period of the window function in each formant generating unit may be varied according to the pitch of the musical tone to be produced.

"Effects of the Invention" As described above, the present invention provides a means for setting a delay time, a means for generating, when an instruction to start sound generation is given, the amplitude envelope of the speech to be produced after the delay time has elapsed, a means for generating frequency control information in synchronization with the generation of the amplitude envelope, a sound source whose formant frequency is controlled by the frequency control information and which generates a formant sound, and a speech output means that controls the amplitude of the formant sound according to the amplitude envelope and outputs the result as speech. Formant frequency control and amplitude control can therefore be performed in synchronization within the sounding period of each sound, with the effect that speech in which several sounds are produced in succession, as heard in everyday life, can be reproduced faithfully.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a speech synthesis device according to an embodiment of the present invention, FIG. 2 is a waveform diagram illustrating the pitch envelope PE1 and the amplitude envelope AE1 in the same embodiment, and FIG. 3 is a waveform diagram showing the amplitude envelopes when the speech sound [SA] is produced.

11 to 14: pitch EGs (for vowels); 15 to 18: pitch EGs (for consonants); 1 to 4: vowel formant generating units; 5 to 8: consonant formant generating units; 21 to 28: amplitude EGs; 31 to 38: multipliers; 41, 42, and 50: adders.

Claims

[Scope of Claims] A speech synthesis device characterized by comprising: a means for setting a delay time; a means for generating, when an instruction to start sound generation is given, the amplitude envelope of the speech to be produced after the delay time has elapsed; a means for generating frequency control information in synchronization with the generation of the amplitude envelope; a sound source whose formant frequency is controlled by the frequency control information and which generates a formant sound; and a speech output means that controls the amplitude of the formant sound according to the amplitude envelope and outputs the result as speech.