JP3078073B2

JP3078073B2 - Basic frequency pattern generation method

Info

Publication number: JP3078073B2
Application number: JP03344627A
Authority: JP
Inventors: 隆矢頭
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-12-26
Filing date: 1991-12-26
Publication date: 2000-08-21
Anticipated expiration: 2015-08-21
Also published as: JPH05173590A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a method for generating the fundamental frequency pattern used in speech synthesis.

[0002]

[Prior Art] A speech synthesizer that takes text as input, converts it into speech, and outputs it places no restriction on the output vocabulary, and is therefore expected to replace recording-and-playback speech synthesis. In this kind of synthesizer, generating the fundamental frequency (pitch) pattern of vocal fold vibration, which expresses the accent and intonation of speech, is a very important factor in obtaining natural-sounding synthesized speech.

[0003] The pitch pattern of speech is strongly influenced not only by the accent type of each word but also by sentence structure, meaning, and so on. It is therefore difficult to realize by preparing various fundamental frequency patterns extracted from real speech and combining them into a pattern for the whole sentence, and appropriate modeling is indispensable.

[0004] Several methods have been proposed as models for generating the fundamental frequency pattern of speech. One model in wide general use expresses the fundamental frequency pattern on a logarithmic axis as the sum of a phrase component, which corresponds to the gradual fall from the beginning to the end of a sentence (intonation), and an accent component, which corresponds to local rises and falls (accent). It is formulated on the approximation that the phrase component is the response of a critically damped second-order linear system to an impulse-like phrase command, and that the accent component is the response of a critically damped second-order linear system to a step-like accent command (see Keikichi Hirose, Hiroya Fujisaki, Hisashi Kawai, and Mikio Yamaguchi, "Synthesis of Sentence Speech Based on a Model of the Fundamental Frequency Pattern Generation Process," Transactions of the Institute of Electronics, Information and Communication Engineers, January 1989, Vol. J72-A, No. 1).

[0005] FIG. 3 is a block diagram of the conventional fundamental frequency pattern generation model. In this model, the logarithmic fundamental frequency ln F0(t) is given as a function of time t by the following equation.

[0006]

[Equation 1]

[0007] Here F_min is the baseline frequency, A_pi is the magnitude of the i-th phrase command in the sentence, A_aj is the magnitude of the j-th accent command in the sentence, I is the number of phrase commands in one sentence, J is the number of accent commands in one sentence, T_0i is the onset time of the i-th phrase command, and T_1j and T_2j are the onset and end times of the j-th accent command, respectively.
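The image for Equation (1) is not reproduced in this text. A reconstruction based on the symbol definitions in [0007], the response functions G_p and G_a introduced in [0008], and the step-command form described in [0004] is given here as an assumption, not as the published formula:

```latex
\ln F_0(t) = \ln F_{\min}
  + \sum_{i=1}^{I} A_{pi}\, G_p(t - T_{0i})
  + \sum_{j=1}^{J} A_{aj} \left\{ G_a(t - T_{1j}) - G_a(t - T_{2j}) \right\}
  \tag{1}
```

The difference of two step responses in the last term corresponds to an accent command that switches on at T_1j and off at T_2j.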

[0008] G_p(t) and G_a(t) are the impulse response function of the phrase control mechanism and the step response function of the accent control mechanism, respectively. For t ≥ 0 they are given by equations (2) and (3) below (both are 0 for t < 0).

[0009]

$$G_p(t) = \alpha t \exp(-\alpha t) \tag{2}$$

$$G_a(t) = \min\left[\,1 - (1 + \beta t)\exp(-\beta t),\ \theta\,\right] \tag{3}$$

[0010] Here α and β are constants that determine the response speed of the phrase control mechanism and of the accent control mechanism, respectively; values of about α = 3.0 and β = 20.0 are used. θ is the upper limit of the accent component and is usually set to a value such as θ = 0.9.
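Equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, time is assumed to be in seconds, and equation (2) is coded exactly as printed in this text.

```python
import math

ALPHA = 3.0   # phrase response speed, "about 3.0" in the text (assumed unit: 1/s)
BETA = 20.0   # accent response speed, "about 20.0" in the text (assumed unit: 1/s)
THETA = 0.9   # upper limit of the accent component


def g_p(t, alpha=ALPHA):
    """Equation (2): impulse response of the phrase control mechanism."""
    if t < 0.0:
        return 0.0
    return alpha * t * math.exp(-alpha * t)


def g_a(t, beta=BETA, theta=THETA):
    """Equation (3): step response of the accent control mechanism, capped at theta."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)
```

With these constants the accent response reaches the cap θ well within one second of the command onset, while the phrase response peaks at t = 1/α and then decays slowly.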

[0011] In human speech, even when an utterance is spoken flatly, the pitch is high at the start of the utterance and then falls naturally, for example because of the drop in expiratory pressure. The phrase control mechanism described above models this natural declination of pitch.

[0012] As for accent, in standard Japanese a word's accent always shows a marked rise or fall in pitch from the first mora to the second mora, and a marked fall in pitch occurs at only one place within a word. Consequently, a word consisting of n morae has (n + 1) possible accent types. Focusing on the position where the pitch falls, the accent types are called type 0, type 1, type 2, type 3, and so on (type i is the one in which the pitch falls markedly between the i-th mora and the (i+1)-th mora; type 0 is also called the flat accent). The rise and fall in pitch correspond to the start point and end point of the accent command in the fundamental frequency generation model described above.
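The (n + 1) accent types can be illustrated with a short Python sketch. The two-level high/low ("H"/"L") labeling is a common simplification assumed here for illustration; it does not appear in the patent, and the function name is hypothetical.

```python
def accent_pattern(n_morae, accent_type):
    """Return one 'H' or 'L' label per mora for a word of n_morae morae.

    accent_type 0 is the flat accent; accent_type i >= 1 places the marked
    pitch fall between mora i and mora i + 1.
    """
    if n_morae < 1:
        raise ValueError("a word needs at least one mora")
    if not 0 <= accent_type <= n_morae:
        raise ValueError("accent type must be between 0 and n_morae")
    if accent_type == 1:
        # Type 1: high first mora, marked fall immediately after it.
        return ["H"] + ["L"] * (n_morae - 1)
    # All other types: low first mora, rise to the second mora.
    pattern = ["L"] + ["H"] * (n_morae - 1)
    # Morae after the accent nucleus are low (no effect for type 0).
    if accent_type >= 2:
        for k in range(accent_type, n_morae):
            pattern[k] = "L"
    return pattern


# The four accent types of a three-mora word.
for i in range(4):
    print(i, "".join(accent_pattern(3, i)))
```

In this sketch type 0 and type n give the same labels inside the word, because for type n the fall occurs after the final mora.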

[0013]

[Problems to Be Solved by the Invention] FIG. 4 shows the shape of the accent component produced by the conventional accent control mechanism. FIG. 4 is an example using the commonly used response speed (β = 20): the component rises markedly from the onset of the accent command, reaches the upper limit, and holds that value thereafter. From the end of the command it falls back to 0 at the same response speed as the rise.

[0014] However, observation of the pitch patterns of real speech shows that the response speeds of the rise and the fall of the accent component are not necessarily the same. Using the conventional constant value for the rise response speed rarely causes a problem, but the speed of the fall differs greatly depending on the accent type of the word, the number of morae, the type of syllable of the mora that follows the accent nucleus, and so on. If a pattern is generated with the pitch fall speed set equal to the rise speed, without taking these properties into account, the accent rises and falls too sharply and the synthesized speech sounds unnatural, lacking smoothness overall.

[0015] No speech synthesis using the conventional accent control mechanism has explicitly addressed these properties, and with the conventional accent control mechanism, which assumes a step response, it is difficult to control the fall response speed separately.

[0016] An object of the present invention is to solve the problems described above and to provide a fundamental frequency pattern generation method that can adapt to various situations and generate natural pitch patterns.

[0017]

[Means for Solving the Problems] To solve the above problems, the present invention is a fundamental frequency pattern generation method that receives an accent command and a phrase command for generating prosody, calculated by analyzing an input sentence, and expresses the fundamental frequency pattern on a logarithmic axis as the sum of a phrase component corresponding to intonation and an accent component corresponding to accent. The method is characterized in that the accent component is expressed as the sum of two impulse response functions of critically damped second-order linear systems on the logarithmic frequency axis: one that describes the rise of the accent using a constant representing the accent rise response speed, and one that describes the fall of the accent using a constant representing the accent fall response speed. It is further characterized in that the rise response speed and the fall response speed of these impulse response functions are controlled independently of each other.

[0018] In practicing the invention, it is preferable to control the fall response speed of the impulse response function that describes the fall of the accent adaptively, according to the accent type of the word, the number of morae, and the type of syllable of the mora following the mora that carries the accent nucleus.

[0019]

[Operation] According to the method of the invention, the accent component of the fundamental frequency pattern of speech is handled as separate rise and fall components, each of which is approximated by an impulse response function of a critically damped second-order linear system. In this case, the generating equation for the logarithmic fundamental frequency is expressed as equation (4) below.

[0020]

[Equation 2]

[0021] Here F_min is the baseline frequency, A_pi is the magnitude of the i-th phrase command in the sentence, A_aj is the magnitude of the j-th accent command in the sentence, I is the number of phrase commands in one sentence, J is the number of accent commands in one sentence, T_0i is the onset time of the i-th phrase command, and T_1j and T_2j are the onset and end times of the j-th accent command, respectively. G_a1j(t) and G_a2j(t) are, for the j-th accent command, the impulse response function describing the rise of the accent and the impulse response function describing the fall of the accent, respectively. For t ≥ 0 they are given by equations (5) and (6) below (both are 0 for t < 0).
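The image for Equation (4) is not reproduced in this text. A reconstruction based on the symbol definitions in [0021] and on the statement that the accent component is the sum of the rise and fall response functions is given here as an assumption, not as the published formula:

```latex
\ln F_0(t) = \ln F_{\min}
  + \sum_{i=1}^{I} A_{pi}\, G_p(t - T_{0i})
  + \sum_{j=1}^{J} A_{aj} \left\{ G_{a1j}(t - T_{1j}) + G_{a2j}(t - T_{2j}) \right\}
  \tag{4}
```

The two accent terms are added rather than subtracted because the factor (-1) is already part of G_a2j in equation (6).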

[0022]

$$G_{a1j}(t) = \min\left[\,1 - (1 + \varepsilon_j t)\exp(-\varepsilon_j t),\ \theta\,\right] \tag{5}$$

$$G_{a2j}(t) = (-1)\cdot\min\left[\,1 - (1 + \zeta_j t)\exp(-\zeta_j t),\ \theta\,\right] \tag{6}$$

[0023] ε_j and ζ_j are constants that determine the speed of the accent rise response and the speed of the accent fall response, respectively, for the j-th accent command.
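The separate rise and fall responses of equations (5) and (6) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, time is assumed to be in seconds, and θ = 0.9 is carried over from the conventional model in [0010] because no separate value is given for these equations.

```python
import math

THETA = 0.9  # upper limit, assumed to be the same value as in the conventional model


def _capped_response(t, rate, theta):
    """Capped response 1 - (1 + rate*t) * exp(-rate*t), zero for t < 0."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + rate * t) * math.exp(-rate * t), theta)


def g_a1(t, eps, theta=THETA):
    """Equation (5): accent rise response with rate eps."""
    return _capped_response(t, eps, theta)


def g_a2(t, zeta, theta=THETA):
    """Equation (6): accent fall response with rate zeta (note the factor -1)."""
    return -_capped_response(t, zeta, theta)


def accent_component(t, a_a, t1, t2, eps, zeta, theta=THETA):
    """Contribution of one accent command: A_a * (G_a1(t - T1) + G_a2(t - T2))."""
    return a_a * (g_a1(t - t1, eps, theta) + g_a2(t - t2, zeta, theta))
```

Because eps and zeta are separate arguments, a smaller zeta slows the fall without changing the rise. For example, 0.1 s after the fall command, `accent_component(0.6, 0.5, 0.1, 0.5, 20.0, 5.0)` is still larger than `accent_component(0.6, 0.5, 0.1, 0.5, 20.0, 20.0)`.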

[0024] Regarding the relation between the number of morae in a word or bunsetsu (phrasal unit) and the accent type, Japanese word accent patterns tend to complete the pitch fall within the time remaining until the end of the word. When there is ample time for the fall, the pitch often falls more gently than when there is not. That is, among words with the same number of morae, the fall response speed (decay speed) is slower when the accent nucleus is nearer the front (smaller accent type), and among words of the same accent type, the decay speed is slower the larger the number of morae in the word.

[0025] As for the relation between the decay speed of the accent fall and the type of phoneme, the influence of the syllable type of the mora that follows the mora with the accent nucleus is particularly marked. In a word with type-i accent (i ≠ 0), this is the (i+1)-th mora counted from the beginning of the word.

[0026] In view of these properties, the invention controls the response speed of the impulse response function of equation (6), which describes the accent fall (that is, ζ_j), independently of the response speed of the impulse response function of equation (5), which describes the accent rise (that is, ε_j). It also varies ζ_j adaptively according to the accent type of the word, the number of morae, and the syllable type of the mora following the mora with the accent nucleus. This makes it possible to generate a natural fundamental frequency pattern close to that of real speech. FIG. 2 shows an example of accent component patterns according to the invention: as ζ_j increases, the decay of the accent fall becomes faster.
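The text states only the direction in which ζ_j should change, not a formula or numerical values. The following Python sketch is therefore entirely hypothetical: the function name, the inverse dependence on the number of morae after the accent nucleus, the bounds, and the `syllable_factor` multiplier are all assumptions chosen to reproduce the stated tendencies.

```python
def fall_rate(n_morae, accent_type, syllable_factor=1.0,
              zeta_max=20.0, zeta_min=5.0):
    """Hypothetical rule for the fall rate zeta of a type-i word (1 <= i <= n_morae).

    More morae after the accent nucleus means more time for the fall,
    so the returned rate is lower (slower decay).
    """
    if not 1 <= accent_type <= n_morae:
        raise ValueError("accent type must be between 1 and n_morae")
    # Morae that remain after the accent nucleus (assumed measure of available time).
    remaining = n_morae - accent_type
    # Assumed inverse law; syllable_factor stands in for the unspecified
    # dependence on the syllable type of the mora after the nucleus.
    zeta = zeta_max / (1 + remaining) * syllable_factor
    # Keep the rate inside the assumed bounds.
    return max(zeta_min, min(zeta_max, zeta))


print(fall_rate(4, 1), fall_rate(4, 3))  # same mora count: earlier nucleus, slower fall
print(fall_rate(5, 2), fall_rate(3, 2))  # same accent type: more morae, slower fall
```

Type 0 (flat accent) is left out of this sketch because the text does not say how its fall rate is chosen.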

[0027]

[Embodiment] FIG. 1 is a functional block diagram of an apparatus according to one embodiment of the invention. The apparatus comprises a character string input unit 10, a character string analysis unit 11, a phrase control mechanism 12, an accent rise control mechanism 13, an accent fall control mechanism 14, a constant setting unit 15, a vocal cord vibration mechanism 16, and a fundamental frequency output terminal 17. The method of generating the fundamental frequency in this embodiment is described below.

[0028] First, a kana character string with prosodic symbols, representing the sentence to be converted into speech, is input from the character string input unit 10. A kana character string with prosodic symbols is, for example, the string shown in Table 1. It consists of a kana string corresponding to the reading of the sentence to be converted into speech, together with prosody control symbols such as phrase symbols, accent symbols, and pause symbols (delimiters).

[0029]

[Table 1]

[0030] Based on the input kana string with prosodic symbols, the character string analysis unit 11 determines the phrase commands, which are the input to the phrase control mechanism 12, and the accent commands, which are the input to the accent rise control mechanism 13 and the accent fall control mechanism 14. The character string analysis unit 11 also determines from the input string, for each word in the sentence, the number of morae, the accent type, and the syllable type of the mora following the mora with the accent nucleus, and outputs these to the constant setting unit 15.

[0031] The magnitude of a phrase command is determined as shown in Table 2, according to the type of phrase symbol inserted in the input character string.

[0032]

[Table 2]

[0033] The onset time of a phrase command is likewise determined from the input character string, according to the position at which the phrase symbol is inserted in the kana string. For example, in the string shown in Table 1, the onset for the phrase symbol P1 of the first phrase is normally set 100 to 200 ms before the beginning of the first syllable "キ" (ki) of that phrase.

[0034] The accent command, on the other hand, consists of a pair of impulse commands: a rise command and a fall command. Because the rise command and the fall command have the same magnitude, this is no different from the conventional step-like accent command. However, the accent fall command is accompanied by the constant that determines the fall response speed (ζ_j in equation (6)), which the constant setting unit 15 sets on the basis of the number of morae in the word, the accent type, and the syllable type of the mora following the mora with the accent nucleus.

[0035] The magnitude and timing of an accent command are determined according to the positions and types of the pause symbols and accent symbols. An accent symbol is inserted immediately after the syllable that carries the accent nucleus; it indicates the accent position of the word, and its type indicates the strength of the accent. Table 3 shows the accent command magnitude for each type of accent symbol. A word without an accent symbol is regarded as having the flat (type 0) accent and is likewise given the command magnitude shown in Table 3.

[0036]

[Table 3]

[0037] The position of the accent rise command varies with the accent type, but it is determined relative to the vowel onset of the first or second mora of the word. The time of the fall command is determined from the position of the accent symbol, relative to the vowel onset of the mora that follows the accent nucleus. For the flat type, the final mora of the word serves as the reference.

[0038] The phrase commands, accent rise commands, and accent fall commands determined in this way are sent to the phrase control mechanism 12, the accent rise control mechanism 13, and the accent fall control mechanism 14, respectively.

[0039] The phrase control mechanism 12 computes equation (2), the impulse response function, for each given phrase command to generate the phrase component.

[0040] The accent rise control mechanism 13 computes equation (5), the rise response function, for the accent rise command. The accent fall control mechanism 14 computes equation (6), the fall response function, for the accent fall command, using the time constant ζ_j supplied by the constant setting unit 15. The accent component is generated as the sum of the two responses.

[0041] The phrase components and accent components for all commands are summed in this way and output to the vocal cord vibration mechanism 16. The vocal cord vibration mechanism 16 adds the lower limit of the logarithmic fundamental frequency, ln F_min, to the phrase and accent components to obtain the logarithmic fundamental frequency, then applies an exponential transformation to obtain the fundamental frequency F0(t), which is output from the fundamental frequency output terminal 17.
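The processing of paragraphs [0039] to [0041] can be sketched end to end in Python. This is an illustration under stated assumptions, not the embodiment itself: the function names and the tuple layout of the commands are hypothetical, time is in seconds, α = 3.0 and θ = 0.9 are the values quoted for the conventional model, and the command magnitudes in the usage lines are made-up numbers because Tables 2 and 3 are not reproduced in this text.

```python
import math


def _second_order(t, rate):
    """Uncapped response 1 - (1 + rate*t) * exp(-rate*t)."""
    return 1.0 - (1.0 + rate * t) * math.exp(-rate * t)


def phrase_response(t, alpha=3.0):
    """Equation (2), as computed by the phrase control mechanism 12."""
    return alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def rise_response(t, eps, theta=0.9):
    """Equation (5), as computed by the accent rise control mechanism 13."""
    return min(_second_order(t, eps), theta) if t >= 0.0 else 0.0


def fall_response(t, zeta, theta=0.9):
    """Equation (6), as computed by the accent fall control mechanism 14."""
    return -min(_second_order(t, zeta), theta) if t >= 0.0 else 0.0


def f0(t, f_min, phrase_cmds, accent_cmds):
    """Fundamental frequency at time t.

    phrase_cmds: iterable of (A_p, T_0)
    accent_cmds: iterable of (A_a, T_1, T_2, eps, zeta)
    """
    log_f0 = math.log(f_min)  # ln F_min added by the vocal cord vibration mechanism 16
    for a_p, t0 in phrase_cmds:
        log_f0 += a_p * phrase_response(t - t0)
    for a_a, t1, t2, eps, zeta in accent_cmds:
        log_f0 += a_a * (rise_response(t - t1, eps) + fall_response(t - t2, zeta))
    return math.exp(log_f0)  # exponential transformation back to frequency


# Illustrative values only: one phrase command and one accent command.
phrases = [(0.5, 0.0)]
accents = [(0.4, 0.2, 0.6, 20.0, 10.0)]
contour = [f0(k * 0.01, 80.0, phrases, accents) for k in range(150)]
```

Sampling `f0` on a fixed time grid, as in the last line, gives a contour that a synthesizer could use as its pitch target.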

[0042]

[Effects of the Invention] As described in detail above, the fundamental frequency pattern generation method of the invention expresses the accent component in the logarithmic fundamental frequency pattern of speech as the sum of the responses of two impulse response functions of critically damped second-order linear systems, one describing the rise of the accent and one describing the fall of the accent. The response speed of the function describing the fall is set adaptively according to the accent type of the word, the number of morae, and the syllable type of the mora following the mora with the accent nucleus. Accent components close to those of natural speech can therefore be generated for words or bunsetsu in a variety of situations. Consequently, speech synthesis using the fundamental frequency pattern generation method of the invention can produce synthesized speech with more natural intonation.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram of an apparatus according to one embodiment of the invention.

FIG. 2 is a diagram showing an example of accent component patterns according to the invention.

FIG. 3 is a block diagram showing the conventional fundamental frequency pattern generation model.

FIG. 4 is a diagram showing the shape of the accent component produced by the conventional accent control mechanism.

[Explanation of Reference Numerals]

10 character string input unit
11 character string analysis unit
12 phrase control mechanism
13 accent rise control mechanism
14 accent fall control mechanism
15 constant setting unit
16 vocal cord vibration mechanism
17 fundamental frequency output terminal

Continuation of front page

(56) References: JP-A-64-28695 (JP, A); JP-A-62-138898 (JP, A); JP-A-2-129700 (JP, A); Fujisaki and Sudo, "A Model of the Fundamental Frequency Pattern of Japanese Word Accent and Its Generation Mechanism," Journal of the Acoustical Society of Japan, Vol. 27, No. 9, pp. 445-453 (1971)

(58) Fields searched (Int. Cl.7, DB name): G10L 11/00 - 21/06; JICST file (JOIS)

Claims

(57) [Claims]

[Claim 1] A fundamental frequency pattern generation method that receives an accent command and a phrase command for generating prosody, calculated by analyzing an input sentence, and expresses the fundamental frequency pattern on a logarithmic axis as the sum of a phrase component corresponding to intonation and an accent component corresponding to accent, characterized in that the accent component is expressed as the sum of an impulse response function of a critically damped second-order linear system on the logarithmic frequency axis that describes the rise of the accent using a constant representing the accent rise response speed, and an impulse response function of a critically damped second-order linear system on the logarithmic frequency axis that describes the fall of the accent using a constant representing the accent fall response speed, and in that the rise response speed and the fall response speed of the impulse response functions are each controlled independently.

[Claim 2] The fundamental frequency pattern generation method according to claim 1, characterized in that the fall response speed of the impulse response function describing the fall of the accent is controlled adaptively according to the accent type of the word, the number of morae, and the type of syllable of the mora following the mora that carries the accent nucleus.