JP2624708B2 - Speech synthesizer

Info

Publication number: JP2624708B2
Application number: JP62247555A
Authority: JP
Inventors: 成利斉藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1987-09-30
Filing date: 1987-09-30
Publication date: 1997-06-25
Anticipated expiration: 2012-06-25
Also published as: JPS6490498A

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Field of Industrial Application) The present invention relates to a speech synthesizer that generates synthesized speech by rule-based synthesis.

(Prior Art) Various speech synthesis techniques have been proposed, one of which is the rule-based synthesis method.

The rule-based synthesis method analyzes an arbitrary input character string to obtain its phonemic information and prosodic information, and outputs synthesized speech corresponding to the input character string on the basis of predetermined rules.

With the rule-based synthesis method, synthesized speech for any word or phrase can be generated easily.

However, although synthesized speech generated by rule-based synthesis is satisfactory in terms of intelligibility, when synthesized speech is output for a word or sentence containing a devoiced vowel, the speech power of that vowel portion may be inappropriate, making the speech unpleasant to listen to.

Conventionally, the speech power parameter of a devoiced vowel in the speech unit parameter file was set on the assumption that the vowel is used word-initially, and the same speech power was applied regardless of the vowel's actual position. An examination of natural speech, however, shows that a devoiced vowel in word-initial position has greater speech power than one in word-medial or word-final position.

Consequently, when a speech unit parameter file prepared on the assumption of word-initial use is employed, the speech power becomes excessive whenever a devoiced vowel is used word-medially or word-finally.

(Problems to Be Solved by the Invention) As described above, conventional speech synthesizers employing the rule-based synthesis method are poor at synthesizing speech containing devoiced vowels, and an improvement has been strongly desired.

The present invention has been made in view of these circumstances. Its object is to provide a speech synthesizer that produces synthesized speech that is easy to listen to whether a devoiced vowel is located in word-initial, word-medial, or word-final position.

[Constitution of the Invention] (Means for Solving the Problems) To achieve this object, the present invention provides a speech synthesizer configured to analyze an input character string to obtain phonemic information and prosodic information, to refer to a file in which predetermined speech unit parameters are stored so as to generate a speech parameter sequence corresponding to the phonemic information and a prosodic parameter sequence corresponding to the prosodic information, and to synthesize speech by rule in accordance with the speech parameter sequence and the prosodic parameter sequence, wherein the speech parameter sequence is generated such that the speech power of a devoiced vowel in the speech to be output differs according to the position of that vowel.

(Operation) According to the speech synthesizer of the present invention, when speech containing a devoiced vowel is synthesized, the speech power of that vowel changes according to whether the vowel is in word-initial, word-medial, or word-final position, so the resulting synthesized speech is easier to listen to.

(Embodiments) Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an apparatus according to one embodiment of the present invention.

In FIG. 1, reference numeral 1 denotes a character string analysis unit that analyzes an input character string to obtain its phoneme symbol string and prosodic information. The phoneme symbol string obtained by the character string analysis unit 1 is input to a speech parameter sequence generator 2, which generates a speech parameter sequence by referring to a speech unit parameter file 3.

Meanwhile, the prosodic information obtained by the character string analysis unit 1 is supplied to a prosodic parameter sequence generator 4, which generates a prosodic parameter sequence.

A speech synthesizer 5 generates and outputs synthesized speech corresponding to the input character string, in accordance with the speech parameter sequence and the prosodic parameter sequence thus obtained and on the basis of predetermined synthesis rules.

In this embodiment, "word-initial" refers to the syllable at the beginning of a sentence, or to the syllable immediately after a pause when a pause is inserted at a phrase boundary or at a punctuation mark within a sentence.

"Word-final" refers to the syllable immediately before a pause when a pause is inserted, or to the syllable immediately before a period. The same applies to compound words.
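
The position definitions above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the token representation, the `PAUSE` marker, the function name `classify_positions`, and the choice to treat a lone syllable between two boundaries as word-initial are all assumptions made for the example.

```python
PAUSE = "<pause>"   # hypothetical marker for an inserted pause
PERIOD = "。"        # sentence-final period

def classify_positions(tokens):
    """Label each syllable as 'initial', 'medial', or 'final'.

    tokens: list of syllable strings, with PAUSE and PERIOD as boundary markers.
    Returns a list of (syllable, position) pairs; boundary markers are skipped.
    """
    boundaries = {PAUSE, PERIOD}
    result = []
    for i, tok in enumerate(tokens):
        if tok in boundaries:
            continue
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        if prev_tok is None or prev_tok in boundaries:
            # Start of the sentence, or immediately after a pause or period.
            position = "initial"
        elif next_tok is None or next_tok in boundaries:
            # Immediately before a pause or period (or the end of the input).
            position = "final"
        else:
            position = "medial"
        result.append((tok, position))
    return result
```

For example, `classify_positions(["キ", "カ", "イ", PAUSE, "セ", "キ", "タ", "ン", PERIOD])` labels the first "キ" as word-initial and the second as word-medial.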

In this embodiment, the speech power component of a devoiced vowel in the speech unit parameter file is set on the basis of word-medial or word-final use. When a word or sentence containing a devoiced vowel is synthesized, the speech power parameter preset in the speech unit parameter file is increased by adding a constant value, depending on the position of the vowel.

When CV-syllable cepstrum parameters are used, a constant value α is added to the zeroth-order cepstrum term C0, as shown in FIG. 2.

For example, when the phrase "機械製作所" (kikai seisakusho) is synthesized and output, the devoiced-vowel syllable "ki" (キ) is word-initial. The constant value α is therefore added to the C0 value of the zeroth-order term in the speech unit parameter file for "ki", and the result C0′ is computed for every frame. For the terms other than C0 (C1 to C25), the values in the speech unit parameter file are used unchanged, and the result is generated as the speech parameter sequence for "ki".

On the other hand, when synthesized speech is generated for words such as "石炭" (sekitan) or "霞ケ関" (kasumigaseki), the devoiced-vowel syllable "ki" (キ) is not word-initial, so the speech parameter sequence is generated using the values in the speech unit parameter file unchanged.
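
A minimal Python sketch of this adjustment, assuming each frame is a list of cepstrum coefficients ordered as C0, C1, and so on. The function name `boost_initial_power` and the default value of `alpha` are hypothetical; the patent gives no numeric value for α.

```python
ALPHA = 0.5  # hypothetical constant; the patent does not specify a value for α

def boost_initial_power(frames, is_word_initial, alpha=ALPHA):
    """Return a new frame list with C0' = C0 + alpha when the syllable is word-initial.

    frames: list of frames, each a list of cepstrum coefficients [C0, C1, ...].
    Coefficients other than C0 are copied unchanged, and the input is not modified.
    """
    if not is_word_initial:
        # Word-medial or word-final: use the stored parameters as they are.
        return [list(frame) for frame in frames]
    # Word-initial: raise the zeroth-order term in every frame.
    return [[frame[0] + alpha] + list(frame[1:]) for frame in frames]
```

With shortened three-coefficient frames for brevity, `boost_initial_power([[1.0, 0.2, 0.3]], True)` raises only the first coefficient, while passing `False` returns an unchanged copy, as in the "sekitan" case.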

Next, another embodiment of the present invention is described with reference to FIG. 3.

In the preceding embodiment, when a devoiced vowel was word-initial, the speech was made louder by adding a constant value to the speech power parameter. In this embodiment, a word-initial parameter file 6, prepared on the assumption of word-initial use, is provided separately from the speech unit parameter file 3, which assumes word-medial or word-final use.

The word-initial parameter file 6 stores speech power parameters, analyzed for word-initial use, for about ten devoiced-vowel syllables such as "ki" (キ), "ku" (ク), "shi" (シ), "su" (ス), "chi" (チ), "shi" (シ), "hi" (ヒ), "fu" (フ), "pi" (ピ), and "pu" (プ).

For example, when the phrase "機械製作所" (kikai seisakusho) is synthesized and output, the devoiced-vowel syllable "ki" (キ) is word-initial, so the speech power parameter for this "ki" is taken from the word-initial parameter file 6, while the speech power parameters for "ka" (カ), "i" (イ), "se" (セ), "i" (イ), "sa" (サ), "ku" (ク), and "jo" (ジョ) are taken from the speech unit parameter file 3.

When the speech parameter sequence is generated for the words "石炭" (sekitan) or "霞ケ関" (kasumigaseki), the devoiced-vowel syllable "ki" (キ) is not word-initial, so it suffices to take the unit parameters from the speech unit parameter file 3 alone and generate the speech parameter sequence from them.
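
The file selection in this second embodiment can be sketched as a lookup over two tables. This is an illustrative sketch only: both files are modeled as Python dictionaries mapping a kana syllable to a single power value, and the function name, tuple layout, and numbers in the usage note are assumptions rather than anything given in the patent.

```python
def power_parameters(syllables, general_file, initial_file):
    """Pick a speech power parameter for each syllable.

    syllables: list of (kana, position, devoiced) tuples, where position is
        'initial', 'medial', or 'final' and devoiced is a bool.
    general_file: dict standing in for speech unit parameter file 3.
    initial_file: dict standing in for word-initial parameter file 6.
    """
    selected = []
    for kana, position, devoiced in syllables:
        if devoiced and position == "initial" and kana in initial_file:
            # Devoiced vowel at word-initial position: use the dedicated file.
            selected.append(initial_file[kana])
        else:
            # Everything else comes from the general speech unit file.
            selected.append(general_file[kana])
    return selected
```

With `general_file = {"キ": 1.0, "カ": 3.0}` and `initial_file = {"キ": 1.6}`, a word-initial devoiced "キ" yields 1.6, whereas the same syllable in word-medial position yields 1.0.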

Thus, according to the apparatus of this embodiment, even when a sentence containing a devoiced vowel is synthesized, the speech power is adjusted appropriately according to the position of the vowel, so the resulting synthesized speech is far easier to listen to than before.

In each of the above embodiments, the parameter indicating the speech power of a devoiced vowel was set on the basis of word-medial or word-final use. Alternatively, the parameter may be set on the basis of word-initial use: when the syllable containing the devoiced vowel is word-initial, the speech parameter sequence is generated using the speech unit parameter file unchanged, whereas when the vowel is word-medial or word-final, a constant value is subtracted from the parameter representing the speech power component in the speech unit parameter file.
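
The two baseline conventions are mirror images of each other, which a short Python sketch can make explicit. The function name `adjusted_c0`, the baseline labels, and the `delta` argument are hypothetical names introduced for illustration.

```python
def adjusted_c0(c0, position, baseline, delta):
    """Return the power term C0 adjusted for the syllable position.

    baseline: 'medial_final' if the stored value assumes word-medial/final use
              (a constant is added for word-initial syllables), or
              'initial' if the stored value assumes word-initial use
              (a constant is subtracted for word-medial/final syllables).
    """
    is_initial = position == "initial"
    if baseline == "medial_final":
        return c0 + delta if is_initial else c0
    if baseline == "initial":
        return c0 if is_initial else c0 - delta
    raise ValueError(f"unknown baseline: {baseline!r}")
```

Either convention yields a larger power value for the word-initial case than for the other positions; only the stored reference value differs.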

It is also possible to set the constant value to be added or subtracted individually for each of the devoiced-vowel syllables "ki" (キ), "ku" (ク), "shi" (シ), "su" (ス), "chi" (チ), "shi" (シ), "hi" (ヒ), "fu" (フ), "pi" (ピ), and "pu" (プ), fine-tuning each value through listening tests.

Furthermore, in each of the embodiments the power of a devoiced vowel was set to the same level for word-medial and word-final positions. If these are instead treated separately, so that the speech power is divided into three cases together with the word-initial case, even more natural synthesized speech can be obtained.
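
Per-syllable constants and a three-way position split could both be held in one lookup table, as in the Python sketch below. Every number here is a placeholder chosen for illustration; the patent states only that such values may be tuned by listening tests and gives no figures. The table layout and function names are likewise assumptions.

```python
# Hypothetical offsets added to C0, keyed by syllable and then by position.
# All numbers are placeholders, not values from the patent.
OFFSETS = {
    "キ": {"initial": 0.6, "medial": 0.0, "final": -0.1},
    "ス": {"initial": 0.4, "medial": 0.0, "final": -0.2},
}
# Fallback used for syllables without an individually tuned entry.
DEFAULT_OFFSETS = {"initial": 0.5, "medial": 0.0, "final": 0.0}

def power_offset(kana, position, table=OFFSETS, default=DEFAULT_OFFSETS):
    """Look up the offset for a syllable at 'initial', 'medial', or 'final' position."""
    return table.get(kana, default)[position]

def adjust_frames(frames, kana, position):
    """Apply the looked-up offset to C0 of every frame; other terms are unchanged."""
    delta = power_offset(kana, position)
    return [[frame[0] + delta] + list(frame[1:]) for frame in frames]
```

Keeping the offsets in data rather than in code means that values found in listening tests can be updated without touching the synthesis logic.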

[Effects of the Invention] As described above, when the speech synthesizer of the present invention synthesizes speech containing a devoiced vowel, it varies the speech power parameter according to whether the vowel is in word-initial, word-medial, or word-final position. It can therefore synthesize, by rule, speech that is easy to listen to, with the loudness of the devoiced vowel suitably adjusted.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an apparatus according to one embodiment of the present invention; FIG. 2 is a diagram explaining the modification of the speech power component of the speech unit parameters in that embodiment; and FIG. 3 is a block diagram showing the configuration of an apparatus according to another embodiment of the present invention.
1: character string analysis unit; 2: speech parameter sequence generator; 3: speech unit parameter file; 4: prosodic parameter sequence generator; 5: speech synthesizer; 6: word-initial parameter file.

Claims

(57) [Claims]

[Claim 1] A speech synthesizer configured to analyze an input character string to obtain phonemic information and prosodic information, to refer to a file in which predetermined speech unit parameters are stored so as to generate a speech parameter sequence corresponding to the phonemic information and a prosodic parameter sequence corresponding to the prosodic information, and to synthesize speech by rule in accordance with the speech parameter sequence and the prosodic parameter sequence, characterized in that the speech parameter sequence is generated such that the speech power of a devoiced vowel in the speech to be output differs according to the position of that vowel.

[Claim 2] The speech synthesizer according to claim 1, wherein a parameter indicating the speech power for the case where a devoiced vowel is word-medial or word-final is stored in the file in advance, and when the vowel is word-initial the parameter is modified so that the speech power increases.

[Claim 3] The speech synthesizer according to claim 1, wherein a parameter indicating the speech power for the case where a devoiced vowel is word-initial is stored in the file in advance, and when the vowel is word-medial or word-final the parameter is modified so that the speech power decreases.

[Claim 4] The speech synthesizer according to claim 1, wherein a parameter indicating the speech power for the case where a devoiced vowel is word-initial and a parameter indicating the speech power for the case where the vowel is word-medial or word-final are stored separately in the file in advance, and one of the parameters is selected according to the position of the vowel.