JPS6017120B2

JPS6017120B2 - Phoneme piece-based speech synthesis method

Info

Publication number: JPS6017120B2
Application number: JP8264581A
Authority: JP
Inventors: 朋明阿部; 英雄渋谷; 文夫小菅; 四郎水谷; 正宏浜田; 大輔森
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1981-05-29
Filing date: 1981-05-29
Publication date: 1985-05-01
Also published as: DE3220281A1; JPS57197600A

Abstract

In this phoneme-piece editing type speech synthesis method, the desired speech signal is generated by sequentially outputting phoneme piece data from a memory. The number of samples of the two phoneme pieces to be interpolated is first made equal to a predetermined number. Interpolated phoneme piece data are then computed from the identically numbered samples of the preceding and the following phoneme piece, and are output at a sampling period obtained by interpolating between the sampling periods of those two pieces. Smoothing the amplitude and frequency transitions between consecutive phoneme pieces in this way yields synthesized speech free of interference and noise.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a phoneme-piece editing type speech synthesis method that mainly uses a microcomputer and a D/A converter. Its object is to enable smooth interpolation of the amplitude, pitch period, and formant frequencies of the output speech.

In the conventional phoneme-piece editing type speech synthesis method, whose basic components are a microcomputer, a ROM, and a D/A converter, a specific phoneme piece stored in the ROM, consisting of one pitch period of speech, is repeated several times to form one phonological unit. These phonological units are connected in sequence to obtain a complete spoken word.

This is shown in Figure 1. Phoneme piece group 1 repeats phoneme piece ph1 twice, group 2 repeats ph2 four times, and group 3 repeats ph3 twice. Because the amplitude, pitch period, and other properties of the synthesized sound are determined by the phoneme pieces in the ROM, the amplitude, pitch period, and formant frequencies of the synthesized sound change abruptly at the boundaries between phonological units.
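The conventional scheme described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the single-sine stand-ins for stored phoneme pieces, and the sample counts and amplitudes are all assumptions chosen for the example. Only the repetition counts (2, 4, 2) come from the Figure 1 description.

```python
import math

def make_piece(amplitude, n_samples):
    """One pitch period of a sine wave, a stand-in for a stored phoneme piece."""
    return [amplitude * math.sin(2 * math.pi * k / n_samples)
            for k in range(n_samples)]

def concatenate(groups):
    """Conventional editing: repeat each piece `count` times and join the groups."""
    out = []
    for piece, count in groups:
        out.extend(piece * count)
    return out

def peak(piece):
    """Largest absolute sample value of a piece."""
    return max(abs(v) for v in piece)

# Hypothetical pieces with different amplitudes and period lengths.
ph1 = make_piece(1.0, 16)
ph2 = make_piece(0.4, 20)
ph3 = make_piece(0.8, 12)

# Repetition counts follow the Figure 1 description: 2, 4 and 2.
signal = concatenate([(ph1, 2), (ph2, 4), (ph3, 2)])

# Peak amplitude and period length both jump at each group boundary.
for a, b in [(ph1, ph2), (ph2, ph3)]:
    print(f"peak {peak(a):.2f} -> {peak(b):.2f}, period {len(a)} -> {len(b)} samples")
```

Nothing in this scheme smooths the step from one group to the next, which is the discontinuity the description goes on to address.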

This is shown in Figure 2; Figures 3 and 4 show details of the circled portion of Figure 2. In Figure 3, F11 to F31 are the formant frequencies of phoneme piece group 1, and F12 to F32 and F13 to F33 are those of groups 2 and 3. Each formant frequency is discontinuous where the groups join. Figure 4 shows the P-th harmonic of each phoneme piece as Af11 to Af31, Af12 to Af32, and Af13 to Af33.

Because of these discontinuities in amplitude, pitch period, and formant frequency, the phoneme-piece editing method generates periodic noise and cannot achieve the natural continuity of speech obtained with other methods such as PARCOR synthesis. One conceivable remedy is to compute output amplitude values that differ from those of the phoneme pieces stored in the ROM. Such approaches, however, demand considerable arithmetic capability from the microprocessor, so they could often not be adopted in speech synthesizers designed primarily for low cost.

The present invention eliminates these drawbacks. It provides a phoneme-piece editing type speech synthesis method that places little burden on the microprocessor. To reduce the abrupt changes in characteristics at the boundaries between phonological units, it processes the phoneme piece data so that the amplitude, pitch period, and formant frequencies vary continuously within each phonological unit, which makes interpolation between phoneme pieces easy.

For simplicity, the following explanation is limited to phoneme pieces that can be expressed as a sum of sine waves with fixed phases. First, Helmholtz's phase law ("the ear is insensitive to the phase of musical tones") is assumed to hold for speech as well. The phase of each frequency component of a phoneme piece can then be changed, and every component replaced by a sine wave starting at 0° or 180°. This is expressed by Equation 1.

$ph_1$ denotes phoneme piece 1.

$\omega_1$ denotes the fundamental angular frequency of phoneme piece 1.

$i$ denotes the i-th harmonic of the fundamental angular frequency (pitch frequency).

$A_{1,i}$ denotes the amplitude of the i-th harmonic.

Rewriting each phoneme piece according to Equation 1, every phoneme piece can be expressed as in Equation 2.

The subscript $n$ denotes phoneme piece $n$.
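Equations 1 and 2 are not legible in this text. A reconstruction consistent with the symbol definitions above, offered as an assumption rather than as the published formulas, is given below. A component that starts at 180° corresponds to a negative amplitude.

```latex
\begin{align}
ph_1(t) &= \sum_{i} A_{1,i}\,\sin(i\,\omega_1 t) \tag{1} \\
ph_n(t) &= \sum_{i} A_{n,i}\,\sin(i\,\omega_n t) \tag{2}
\end{align}
```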

The difference between two adjacent phoneme pieces $ph_n$ and $ph_{n-1}$ is given by Equation 3.

To expand the expression for $ph_n - ph_{n-1}$, the amplitudes and fundamental angular frequencies of the two phoneme pieces are related by Equations 4 and 5.

$A_{n,i} = k_{n-1,i} \cdot A_{n-1,i}$ ... (Equation 4)

$\omega_n = \ell_{n-1}^{-1} \cdot \omega_{n-1}$ ... (Equation 5)

Here $k_{n-1,i}$ is the amplitude ratio of the i-th harmonic, and $\ell_{n-1}$ is the ratio of the fundamental angular frequencies.

Substituting Equations 4 and 5 into Equation 3 gives Equation 6.

Setting $\ell_{n-1} = 1$ in Equation 6 (the two phoneme pieces have the same fundamental angular frequency) reduces Equation 6 to Equation 7.

Using Equation 7, a new phoneme piece $ph_{n/n-1}$ is defined by Equation 8.

Substituting Equation 7 into Equation 8 and rearranging gives Equation 9, which represents the average of the amplitudes of each harmonic component of the phoneme pieces $ph_n$ and $ph_{n-1}$.
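Equations 3 and 6 to 9 are not legible in this text. The following derivation is a reconstruction, not the published formulas. It assumes that each phoneme piece is $ph_n(t) = \sum_i A_{n,i}\sin(i\,\omega_n t)$, that Equation 4 reads $A_{n,i} = k_{n-1,i} A_{n-1,i}$, that Equation 5 reads $\omega_n = \omega_{n-1}/\ell_{n-1}$, and that Equation 8 adds half of the difference to the preceding piece. Under those assumptions the stated result, averaged harmonic amplitudes, follows directly.

```latex
\begin{align}
ph_n - ph_{n-1}
  &= \sum_i \bigl[ A_{n,i}\sin(i\,\omega_n t) - A_{n-1,i}\sin(i\,\omega_{n-1} t) \bigr] \tag{3} \\
  &= \sum_i A_{n-1,i}\bigl[ k_{n-1,i}\sin(i\,\omega_{n-1} t/\ell_{n-1}) - \sin(i\,\omega_{n-1} t) \bigr] \tag{6} \\
\intertext{With $\ell_{n-1} = 1$:}
ph_n - ph_{n-1}
  &= \sum_i (k_{n-1,i} - 1)\,A_{n-1,i}\sin(i\,\omega_{n-1} t) \tag{7} \\
ph_{n/n-1}
  &= ph_{n-1} + \tfrac{1}{2}\,(ph_n - ph_{n-1}) \tag{8} \\
  &= \sum_i \frac{1 + k_{n-1,i}}{2}\,A_{n-1,i}\sin(i\,\omega_{n-1} t)
   = \sum_i \frac{A_{n-1,i} + A_{n,i}}{2}\,\sin(i\,\omega_{n-1} t) \tag{9}
\end{align}
```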

Figure 5 shows, in correspondence with Figure 4, how the amplitude changes with this new phoneme piece $ph_{n/n-1}$. The new phoneme piece $ph_{2/1}$ shown in Figure 5 is an interpolated phoneme piece between $ph_2$ and $ph_1$; it interpolates the amplitude of each harmonic of phoneme pieces that share the same fundamental angular frequency. Figure 6 shows the resulting change in the phoneme piece waveform. Figure 7 shows the waveforms obtained when the phoneme pieces $ph_1$ and $ph_2$ of Figure 6, which have the same fundamental angular frequency, are sampled with the same sampling period $\tau_1$ [sec].

The time $t$ of the i-th sample can be expressed by Equation 10:

$t = i \cdot \tau_1$ [sec] ... (Equation 10)

The sample values of phoneme pieces $ph_1$ and $ph_2$ can therefore be expressed by Equations 11 and 12, respectively.

The sampling period $\tau_1$ [sec] is given by Equation 13.

$T_1$ is the period corresponding to the fundamental angular frequency of $ph_1$ and $ph_2$.

$N$ is the number of samples within one period of $ph_1$ and $ph_2$.

Equations 11 and 12 are rearranged using Equation 13.

The difference between the samples $ph_{1,i}$ and $ph_{2,i}$ is obtained from Equations 14 and 15, and Equation 16 is then used to obtain the i-th sample value $ph_{2/1,i}$ of the new phoneme piece.

Next, the new phoneme piece obtained from Equation 9 is sampled.

If the sampling period is $\tau_{n-1}$ [sec], it is given by Equation 18.

$T_{n-1}$ is the period corresponding to the fundamental angular frequency of the phoneme piece $ph_{n/n-1}$.

$N$ is the number of samples obtained when one period of $ph_{n/n-1}$ is sampled at intervals of $\tau_{n-1}$ [sec]. Using Equation 18, the i-th sample value $ph_{n/n-1,i}$ is expressed by Equation 19.

Equation 17 is therefore equivalent to Equation 19.

Thus the new phoneme piece $\{ph_{2/1,i} \mid i = 1, 2, \ldots, N\}$, obtained by averaging the i-th sample values $ph_{1,i}$ and $ph_{2,i}$, is a phoneme piece whose frequency components each have the average amplitude of the corresponding components of $ph_1$ and $ph_2$. This new phoneme piece is shown in Figure 8.

In other words, to interpolate between two phoneme pieces that have the same fundamental angular frequency, both pieces are first sampled with the same sampling period $\tau$ [sec], that is, with the same number of samples $N$:

$N = T / \tau$ ... (Equation 20)

Averaging the j-th sample values of the two sampled phoneme pieces then interpolates the amplitude of each harmonic component.

Next, consider the case $\ell_{n-1} \neq 1$ in Equation 6 (the fundamental angular frequencies differ).

Let the fundamental period of phoneme piece $ph_1$ be $T_1$ [sec] and that of $ph_2$ be $T_2$ [sec]. Then the relation of Equation 21 holds:

$T_2 = \ell_1 \cdot T_1$ ... (Equation 21)

Each of the phoneme pieces $ph_1$ and $ph_2$ is now sampled with the same number of samples $N$.

That is, instead of keeping the sampling period constant as in the conventional method, sampling is performed with the number of samples per phoneme piece held constant. The j-th sample values of $ph_1$ and $ph_2$ are expressed by Equations 22 and 23, respectively.

$\tau_1$ is the sampling period of phoneme piece $ph_1$:

$\tau_1 = T_1 / N$ [sec] ... (Equation 24)

$\tau_2$ is the sampling period of phoneme piece $ph_2$:

$\tau_2 = T_2 / N$ [sec] ... (Equation 25)

The sampling periods $\tau_1$ and $\tau_2$ are therefore related by

$\tau_2 = \ell_1 \cdot \tau_1$ ... (Equation 26)

Substituting Equations 24 and 25 into Equations 22 and 23 and rearranging gives Equations 27 and 28.

Because Equations 27 and 28 are exactly equivalent to Equations 11 and 12, the new phoneme piece $\{ph_{2/1,j}\}$ formed from the averages of the j-th sample values interpolates the amplitudes of $ph_1$ and $ph_2$. In other words, computing the average of the j-th samples of the two phoneme pieces is equivalent to virtually matching the fundamental angular frequencies of $ph_1$ and $ph_2$.

The sampling period of the new phoneme piece is therefore given by Equation 29:

$\tau_{2/1} = \dfrac{\tau_1 + \tau_2}{2} = \dfrac{1 + \ell_1}{2} \cdot \tau_1$ ... (Equation 29)

Equation 29 expresses frequency interpolation between the phoneme pieces $ph_1$ and $ph_2$.
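A small worked example of the sampling-period interpolation follows. It assumes that Equation 29 reads $\tau_{2/1} = (\tau_1 + \tau_2)/2$ and that each sampling period is the pitch period divided by N. The pitch periods of 10 ms and 8 ms and N = 40 are illustrative values, not figures from the patent.

```python
N = 40        # samples per pitch period, the same for both pieces (assumed)
T1 = 0.010    # pitch period of ph1 in seconds (assumed: 100 Hz)
T2 = 0.008    # pitch period of ph2 in seconds (assumed: 125 Hz)

tau1 = T1 / N        # sampling period used to play ph1
tau2 = T2 / N        # sampling period used to play ph2
ell = tau2 / tau1    # ratio relating the two sampling periods

# Sampling period of the interpolated piece, written in two equivalent ways.
tau_interp = (tau1 + tau2) / 2
tau_interp_alt = (1 + ell) / 2 * tau1

# Playing N samples at tau_interp gives the pitch of the interpolated piece.
pitch_interp = 1 / (N * tau_interp)
print(tau_interp, pitch_interp)
```

With these values the interpolated piece lasts 9 ms, so its pitch lies between the 100 Hz and 125 Hz of its neighbours. The stored samples are unchanged; only the rate at which they are sent to the D/A converter differs.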

Figure 9 shows interpolation between phoneme pieces with different pitch periods. When such phoneme pieces are sampled with the same number of samples $N$, their fundamental angular frequencies are normalized and treated as identical, and, as expressed by Equation 17, the result is an interpolation of the amplitude of each harmonic of the two phoneme pieces.

However, normalization discards the fundamental angular frequency information. As expressed by Equation 29, the average of the fundamental frequencies of the two phoneme pieces is therefore taken to obtain the fundamental angular frequency information of the interpolated phoneme piece. In the above explanation, both the data values and the sampling frequency of the interpolated phoneme piece inserted between two phoneme pieces are obtained by linear interpolation, but an interpolation operation other than linear interpolation may also be used.

Interpolation of the amplitude, pitch period, and formant frequencies between adjacent phoneme pieces $ph_n$ and $ph_{n-1}$ can therefore be realized by sampling the two phoneme pieces with the same number of samples $N$, computing interpolated data values from the identically numbered sample values of the two pieces, and outputting the interpolated data at a sampling frequency obtained by interpolating between the sampling frequencies of the two pieces.

The above explanation dealt with phoneme pieces that can be expressed as a sum of sine waves with fixed phases. In speech, however, the phases of neighbouring signals can generally be assumed to be continuous, so a similar effect can be expected for general speech pieces as well.
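The three steps summarized above can be sketched as a single routine. This is a hedged illustration rather than the flowchart of the patent: the function names, the (samples, sampling period) tuple layout, the `steps` parameter for inserting more than one intermediate piece, and the test pieces are all assumptions. Linear weights are used, which the description names as one admissible choice.

```python
import math

def interpolate_pieces(prev, nxt, steps=1):
    """Return `steps` intermediate pieces between two stored pieces.

    Each piece is a tuple (samples, sampling_period): exactly N samples that
    cover one pitch period, and the time in seconds between output samples.
    """
    prev_samples, prev_period = prev
    next_samples, next_period = nxt
    if len(prev_samples) != len(next_samples):
        raise ValueError("both pieces must hold the same number of samples N")
    result = []
    for m in range(1, steps + 1):
        w = m / (steps + 1)  # weight of the following piece
        # Interpolate identically numbered sample values.
        samples = [(1 - w) * a + w * b
                   for a, b in zip(prev_samples, next_samples)]
        # Interpolate the output sampling period with the same weight.
        period = (1 - w) * prev_period + w * next_period
        result.append((samples, period))
    return result

def sine_piece(amplitude, n):
    """One period of a sine wave, a stand-in for a stored phoneme piece."""
    return [amplitude * math.sin(2 * math.pi * j / n) for j in range(n)]

N = 16
ph1 = (sine_piece(1.0, N), 0.010 / N)   # louder piece, 10 ms pitch period
ph2 = (sine_piece(0.5, N), 0.008 / N)   # quieter piece, 8 ms pitch period

bridge_samples, bridge_period = interpolate_pieces(ph1, ph2, steps=1)[0]
print(max(bridge_samples), bridge_period * N)
```

With `steps=1` the routine reduces to the plain average described in the text. Each returned piece would be written to the D/A converter at its own sampling period, so the run-time cost per sample is a few multiplications and additions.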

Figure 10 shows a flowchart for implementing this on a microprocessor.

As described above, according to the present invention, the use of interpolated phoneme pieces yields synthesized speech that is close to natural speech and free of noise.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a waveform diagram showing part of a waveform synthesized by phoneme-piece editing type speech synthesis.
Figure 2 is a characteristic diagram showing a three-dimensional display of the speech spectrum.
Figure 3 is a characteristic diagram showing changes in formant frequency.
Figure 4 is a characteristic diagram showing changes in amplitude at the same frequency.
Figure 5 is a characteristic diagram showing amplitude interpolation at the same pitch period in a phoneme-piece editing type speech synthesis method according to an embodiment of the present invention.
Figure 6 is a waveform diagram showing phoneme-piece editing type speech synthesis with same-pitch-period interpolation.
Figure 7 is a waveform diagram showing phoneme piece waveforms with the same pitch frequency.
Figure 8 is a waveform diagram showing the new phoneme piece obtained from $ph_{2/1,i}$.
Figure 9 is a characteristic diagram showing interpolation between phoneme pieces with different pitch frequencies.
Figure 10 is a flowchart showing an embodiment of the present invention.

Claims

1. A phoneme-piece editing type speech synthesis method in which phoneme piece data are stored in a memory and a desired speech signal is obtained by sequentially reading the phoneme piece data from the memory in accordance with speech synthesis control information and connecting them, characterized by comprising: a step of making the number of samples of the two phoneme piece data between which interpolation is to be performed equal to a predetermined number of samples; a step of creating interpolated phoneme piece data by an interpolation operation on the identically numbered data values of the preceding phoneme piece and the following phoneme piece; and a step of outputting the interpolated phoneme piece data at a sampling period obtained by an interpolation operation on the sampling period of the preceding phoneme piece and the sampling period of the following phoneme piece.

2. The phoneme-piece editing type speech synthesis method according to claim 1, characterized in that the phoneme piece data stored in the memory can be expressed as a sum of sine waves starting from fixed phases.