JP2000172286A

JP2000172286A - Simultaneous articulation processor for chinese voice synthesis

Info

Publication number: JP2000172286A
Application number: JP10342796A
Authority: JP
Inventors: Shunkitsu Kaku; 俊桔郭
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1998-12-02
Filing date: 1998-12-02
Publication date: 2000-06-23
Also published as: SG77275A1; TW451183B; CN1257271A

Abstract

PROBLEM TO BE SOLVED: To obtain highly natural synthesized speech by retrieving the simultaneous articulation segment of a word string, overlapping its waveform with the waveforms of the preceding and succeeding syllables of the word string, and weighting them. SOLUTION: A word analysis device 11 analyzes an input sentence on the basis of a dictionary in a dictionary memory 12, which stores word strings and their phonetic transcription data, divides the sentence into a plurality of word strings, and labels the positions between adjacent word strings. A syllable analysis device 14 decides which word strings are to undergo simultaneous articulation processing on the basis of the data in a pitch data storage device 15 for VC (VV) simultaneous articulation segments and CV syllables and a label data storage device 16 for VC (VV) simultaneous articulation segments and CV syllables, and retrieves the pitch data and label data of the selected CV syllables and VC (VV) simultaneous articulation segments. A waveform overlap-and-add device 19 overlaps and adds the waveforms of the CV syllables and the VC (VV) simultaneous articulation segments.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a simultaneous articulation processing device for Chinese speech synthesis that obtains a smooth transition from one syllable to the next syllable.

[0002]

[Description of the Related Art] In Chinese speech synthesis, processing that smooths the connection between adjacent syllables, so that a sequence of syllables is pronounced more fluently and smoothly, is called simultaneous articulation (coarticulation) processing or liaison processing. Simultaneous articulation means that, while one sound is being articulated, another, secondary articulation takes place at the same time. To obtain a smooth transition from one syllable of a word string to the next syllable of that word string, simultaneous articulation processing is required that overlaps part of the phonemes constituting the preceding syllable with part of the phonemes constituting the succeeding syllable. FIG. 3 is a waveform diagram, shown as a wideband spectrogram, of the utterance “中文” produced by a human, and FIG. 4 is the corresponding diagram for the utterance “中文” produced by a conventional Chinese speech synthesis system. FIG. 3 makes clear that the simultaneous articulation phenomenon exists. However, as FIG. 4 shows, most conventional Chinese speech synthesis systems simply join the adjacent phonemes of the two syllables of the word string “中文” without any simultaneous articulation processing. The result is synthesized speech with a low degree of naturalness.

[0003] In addition, the simultaneous articulation processing technique used in conventional Chinese speech synthesis systems performs a time-domain simulation on the simultaneous articulation segment of a word string. In other words, the optimal simultaneous articulation segment is first retrieved from a large body of recorded simultaneous articulation segment speech data, and is then interpolated between the preceding syllable and the succeeding syllable. The key points of this processing are determining the optimal simultaneous articulation segment and retrieving it from the recorded simultaneous articulation segment speech data. The prior art document “中文連音二字詞之語音合成”, R.O.C. Computational Linguistics Conference IX, 1996, is incorporated herein by reference.

[0004] FIG. 5 is a block diagram showing the conventional Chinese speech synthesis system described above. In FIG. 5, an input device 100 is a device with which an operator inputs the phonetic transcription sentence to be synthesized. A word string storage device 110 stores a large amount of recorded word string speech data. A monosyllable storage device 180 stores recorded monosyllable speech data. A word string search device 120 searches the word string storage device 110, according to the input phonetic transcription sentence, for a word string that requires simultaneous articulation processing, and analyzes the retrieved word string to determine the simultaneous articulation segment. A center position search device 130 finds the center position of the simultaneous articulation segment of the word string. An evaluation device 140 evaluates the speech duration of the simultaneous articulation segment. A preceding-syllable synthesis device 150 retrieves the recorded monosyllable speech data of the preceding syllable from the monosyllable storage device 180 according to the input phonetic transcription sentence, and synthesizes the retrieved data. A simultaneous articulation segment synthesis device 160 combines the synthesized speech data output from the preceding-syllable synthesis device 150 with the simultaneous articulation segment. A succeeding-syllable synthesis device 170 retrieves the recorded monosyllable speech data of the succeeding syllable from the monosyllable storage device 180 according to the input phonetic transcription sentence, and combines the synthesized speech data output from the simultaneous articulation segment synthesis device 160 with the retrieved data. A synthesized speech output device 190 outputs the resulting synthesized speech data as speech.

[0005] It is clear from FIG. 5 that the conventional Chinese speech synthesis system described above retrieves the optimal simultaneous articulation segment from the word string storage device 110, retrieves recorded monosyllable speech data from the monosyllable storage device 180, and combines them to improve the naturalness and intelligibility of the synthesized speech output.

[0006] For example, when the Chinese word string “中文”, which is subject to simultaneous articulation processing, is synthesized with the conventional system shown in FIG. 5, the operator first inputs the phonetic transcription sentence corresponding to “中文” through the input device 100. The word string storage device 110 is then searched for recorded word string speech data corresponding to “中文”. Assuming that such data exists in the word string storage device 110, the word string search device 120 retrieves it from the word string storage device 110. The retrieved word string speech data is analyzed to determine the simultaneous articulation segment of “中文”. The center position of that segment is found by the center position search device 130, and the phoneme duration of the segment is evaluated by the evaluation device 140. The preceding-syllable synthesis device 150 searches the monosyllable storage device 180 for the recorded monosyllable speech data corresponding to the word “中”. The simultaneous articulation segment synthesis device 160 combines the retrieved monosyllable speech data for “中” with the simultaneous articulation segment. Next, the succeeding-syllable synthesis device 170 retrieves the recorded monosyllable speech data corresponding to the word “文” from the monosyllable storage device 180 and combines the synthesized speech data output from the simultaneous articulation segment synthesis device 160 with it. Finally, the synthesized speech output device 190 outputs the resulting synthesized speech data as speech.

[0007]

[Problems to Be Solved by the Invention] However, when the word string storage device 110 does not contain recorded word string speech data corresponding to the word string “中文”, the most similar simultaneous articulation segment, for example that of “通問” ([external character 3]), is selected according to the vowel of the preceding syllable of “中文” ([external character 1]) and the first phoneme of its succeeding syllable ([external character 2]), and is synthesized as described above. The result is synthesized speech with a low degree of naturalness. Furthermore, the system described above needs about 55 MB of memory to store the large amount of recorded word string speech data, which wastes valuable memory space. In addition, because recorded speech data is used as the basic unit of synthesis, its frequency and phoneme duration cannot be changed, and retrieving and combining the recorded speech data is time-consuming.

[0008] The prior art described above therefore has the following drawbacks.
(1) A large amount of recorded monosyllable speech data and recorded word string speech data must be stored.
(2) If the desired recorded word string speech data is not in the word string storage device, the resulting synthesized speech has a low degree of naturalness.
(3) Because recorded speech data is used, the phoneme duration and the prosody cannot be changed.
(4) Searching the recorded speech data is time-consuming.

[0009] Accordingly, an object of the present invention is to solve the above problems and to provide a simultaneous articulation processing device for Chinese speech synthesis that obtains a smooth transition from one syllable to the succeeding syllable in Chinese speech synthesis.

[0010]

[Means for Solving the Problems] According to the present invention, a simultaneous articulation processing device for Chinese speech synthesis comprises: a dictionary memory that stores a plurality of Chinese word strings and their corresponding phonetic transcription data; a storage device that stores pitch data of various Chinese syllables and simultaneous articulation segments, phonetic transcription data corresponding to those syllables and segments, and the start and end points of the consonants and vowels of those syllables and segments; a word analysis device that analyzes an input phonetic transcription sentence to be synthesized on the basis of the dictionary in the dictionary memory and divides the sentence into a plurality of word strings; a syllable analysis device that determines, on the basis of the data in the storage device, which word strings from the word analysis device are to undergo simultaneous articulation processing, and retrieves the simultaneous articulation segments of the word strings so selected; and a speech synthesis device that interpolates the retrieved simultaneous articulation segments between the syllables of the word strings in the input phonetic transcription sentence to generate synthesized speech.

[0011] In the above simultaneous articulation processing device for Chinese speech synthesis, the storage device stores 409 Chinese syllables in the first of the four Chinese tones.

[0012] Further, in the above simultaneous articulation processing device for Chinese speech synthesis, the simultaneous articulation segment stored in the storage device is the first phoneme, as defined in FIG. 6, of the succeeding syllable of a Chinese word string.

[0013] The CV-VC (VV) simultaneous articulation processing device for Chinese speech synthesis of the present invention, configured as described above, first divides the phonetic transcription sentence input by the user into a plurality of word strings on the basis of the dictionary in the dictionary memory. The syllable analysis device then determines the preceding and succeeding syllables that are to undergo simultaneous articulation processing. The pitch data of each syllable and the start and end points of its consonant and vowel are then retrieved from the syllable data storage device. Finally, the speech synthesis device evaluates the phoneme duration and frequency so that they can be modified, and synthesizes and outputs the speech.

[0014] Other features and advantages of the present invention will become apparent from the following detailed description of a preferred embodiment, which refers to the accompanying drawings.

[0015]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 is a block diagram showing a simultaneous articulation processing device for Chinese speech synthesis according to an embodiment of the present invention. In FIG. 1, an input device 10 includes, for example, a keyboard, with which the operator can input the phonetic transcription sentence to be synthesized as speech. A word analysis device 11 analyzes the input sentence on the basis of a dictionary in a dictionary memory (storage device) 12, which stores a plurality of word strings and their corresponding phonetic transcription data, divides the sentence into a plurality of word strings, and labels the position between each pair of adjacent word strings. A syllable analysis device 14 determines which word strings are to undergo simultaneous articulation processing on the basis of the data in a pitch data storage device 15 for VC (VV) simultaneous articulation segments and CV syllables and a label data storage device 16 for VC (VV) simultaneous articulation segments and CV syllables, and retrieves the pitch data and label data of the selected CV syllables and VC (VV) simultaneous articulation segments. A syllable duration search device 17 and a syllable frequency search device 18 retrieve the duration and frequency of each syllable on the basis of prosodic rules. Here, C denotes a consonant and V denotes a vowel.

[0017] A register device (storage device) 13 stores the duration, frequency, tone, and phonetic symbols of each syllable. Here, a tone is the accent attached to a Chinese syllable. A waveform overlap-and-add device 19 overlaps and adds the waveforms of the CV syllables and the VC (VV) simultaneous articulation segments. A synthesized speech output device 20 outputs the resulting synthesized speech.

[0018] An application of the embodiment of the present invention is described as follows. For example, in the phonetic transcription sentence

[Equation 1] [tai2 wan1 shi4 yi2 ge5 mei3 li4 de5 bao3 dao3 (台灣是一個美麗的寶島)]

the phonetic transcription data of each word is followed by a digit indicating the tone of that word. FIG. 6 shows the types of the first phoneme of the succeeding syllable that are used in this embodiment to decide whether simultaneous articulation processing should be performed on a word string.
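The tone-numbered transcription format described above can be illustrated with a minimal sketch. The helper name `parse_transcription` and the regular expression are hypothetical and only illustrate the "syllable followed by a tone digit" convention; the digit 5, which appears on `ge5` and `de5` in the example, is assumed to be a valid tone label alongside 1 to 4.

```python
import re

# Hypothetical pattern: lowercase letters followed by one tone digit (1-5 assumed).
_SYLLABLE = re.compile(r"([a-z]+)([1-5])")

def parse_transcription(text):
    """Return [(base, tone), ...] for a string such as 'tai2 wan1'."""
    pairs = []
    for token in text.split():
        match = _SYLLABLE.fullmatch(token)
        if match is None:
            raise ValueError(f"not a tone-numbered syllable: {token!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs

sentence = "tai2 wan1 shi4 yi2 ge5 mei3 li4 de5 bao3 dao3"
syllables = parse_transcription(sentence)
print(syllables[:2])  # [('tai', 2), ('wan', 1)]
```

Keeping the tone as a separate integer makes it easy to look up a stored first-tone syllable by its base form and apply the target tone afterwards.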

[0019] The simultaneous articulation processing of this embodiment is performed as follows. First, a phonetic transcription sentence such as the one above is input by the operator using the input device 10. The word analysis device 11 analyzes the input sentence on the basis of the dictionary in the dictionary memory 12, divides it into a plurality of word strings, and labels the position between each pair of adjacent word strings. As a result, a list of phonetic transcription data containing label data is generated:

[Equation 2] “tai2 wan1 @ shi4 @ yi2 ge5 @ mei3 li4 @ de5 @ bao3 dao3”

Here, @ is the label data that marks the position between two adjacent word strings. Next, the syllable analysis device 14 determines, according to the phonemes shown in FIG. 6, which word strings are to undergo simultaneous articulation processing. As a result, it is found that simultaneous articulation processing must be performed on the phonemes in the word strings

[Equation 3] “tai2 wan1” and

[Equation 4] “mei3 li4”.

Following the table shown in FIG. 6, the syllable analysis device 14 searches the pitch data storage device 15 for VC (VV) simultaneous articulation segments and CV syllables and the label data storage device 16 for VC (VV) simultaneous articulation segments and CV syllables for the pitch data and label data of the VC (VV) simultaneous articulation segments and CV syllables, and stores that pitch data and label data in the register device 13. That is,

[Equation 5] “tai2 「aiwan」 wan1 @ shi4 @ yi2 ge5 @ mei3 「eil」 li4 @ de5 @ bao3 dao3”

is retrieved from the storage devices 15 and 16, where 「aiwan」 ([external character 4]) is a VV simultaneous articulation segment and 「eil」 ([external character 5]) is a VC simultaneous articulation segment. The syllable duration search device 17 and the syllable frequency search device 18 determine the duration and frequency of each syllable and store them in the register device 13.
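The decision step described above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the table of FIG. 6 is not reproduced in this text, so `COARTICULATING_INITIALS` is a hypothetical stand-in chosen only so that the worked example (segments for “tai2 wan1” and “mei3 li4”, none for “yi2 ge5” or “bao3 dao3”) comes out as stated. The segment naming rule (preceding final plus following consonant for VC, preceding final plus following vowel part for VV) is likewise inferred from the two labels 「eil」 and 「aiwan」.

```python
# Pinyin initials; two-letter initials come first so "zh" wins over "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

# ASSUMPTION: stand-in for the FIG. 6 table, which is not available here.
# "" means the following syllable has no consonant initial (VV case).
COARTICULATING_INITIALS = {"", "m", "n", "l", "r"}

def split_syllable(token):
    """Split 'mei3' into (initial, final, tone) = ('m', 'ei', '3')."""
    base, tone = token[:-1], token[-1]
    for initial in INITIALS:
        if base.startswith(initial):
            return initial, base[len(initial):], tone
    return "", base, tone

def insert_segments(labeled):
    """Insert segment markers between the syllables of each word string."""
    out_words = []
    for word in labeled.split("@"):
        tokens = word.split()
        out = tokens[:1]
        for prev, nxt in zip(tokens, tokens[1:]):
            _, prev_final, _ = split_syllable(prev)
            nxt_initial, nxt_final, _ = split_syllable(nxt)
            if nxt_initial in COARTICULATING_INITIALS:
                # VC: preceding final + following consonant.
                # VV: preceding final + following vowel part.
                out.append(f"「{prev_final}{nxt_initial or nxt_final}」")
            out.append(nxt)
        out_words.append(" ".join(out))
    return " @ ".join(out_words)

labeled = "tai2 wan1 @ shi4 @ yi2 ge5 @ mei3 li4 @ de5 @ bao3 dao3"
print(insert_segments(labeled))
```

With the assumed table, the call prints the labeled list with 「aiwan」 between tai2 and wan1 and 「eil」 between mei3 and li4, and leaves the other word strings unchanged. Segments are only considered inside a word string, never across an @ label.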

[0020] FIG. 2 shows the detailed data on the CV syllables and VC (VV) simultaneous articulation segments stored in the register device 13 of FIG. 1. The register device 13 stores 409 Chinese syllables in the first of the four Chinese tones. The waveform overlap-and-add device 19 overlaps and adds the waveforms of the CV syllables and the VC (VV) simultaneous articulation segments on the basis of the detailed data on them from the register device 13. As shown in FIG. 2, this data includes the duration of the syllable, the duration of the syllable's consonant, the start point of the syllable, the end point of the syllable, the frequencies of the eight sections of the syllable, the tone type of the syllable, the consonant type of the syllable, the vowel type of the syllable, the position of the syllable in the word string, the serial number of the syllable's CV syllable, the serial number of the syllable's VC (VV) simultaneous articulation segment, and so on. Finally, the synthesized speech output device 20 outputs the synthesized speech obtained by the synthesis.
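One way to picture a register entry holding the items listed for FIG. 2 is the sketch below. The class name, field names, units, and sample values are all hypothetical; the text names the items but does not specify how they are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SyllableRecord:
    """One register entry; field names are illustrative, not from the patent."""
    duration: int               # duration of the syllable (unit assumed: samples)
    consonant_duration: int     # duration of the syllable's consonant
    start: int                  # start point of the syllable
    end: int                    # end point of the syllable
    section_freqs: List[float]  # frequencies of the eight sections
    tone_type: int
    consonant_type: str
    vowel_type: str
    position_in_word: int       # position of the syllable in its word string
    cv_serial: int              # serial number of the CV syllable
    vc_vv_serial: Optional[int] = None  # None if no segment is attached (assumed)

    def __post_init__(self):
        if len(self.section_freqs) != 8:
            raise ValueError("expected frequencies for exactly eight sections")

# Placeholder values for illustration only.
record = SyllableRecord(
    duration=4000, consonant_duration=600, start=0, end=4000,
    section_freqs=[220.0] * 8, tone_type=2, consonant_type="t",
    vowel_type="ai", position_in_word=0, cv_serial=1, vc_vv_serial=1,
)
```

Checking the eight-section constraint at construction time keeps malformed entries out of the later overlap-and-add step.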

[0021] FIG. 7 illustrates the simultaneous articulation processing for the word string “台灣” in an embodiment of the present invention. First, the pitch data and label data of each CV syllable and its VV simultaneous articulation segment are stored in the register device 13. Next, the frequencies and durations of the syllable “台”, the syllable “灣”, and the simultaneous articulation segment ([external character 6]) are evaluated on the basis of prosodic rules, and the waveforms of the syllables and the simultaneous articulation segment are overlapped and added to generate the waveform of the word string “台灣”. Because the word string “台灣” is synthesized from pitch data, its duration and frequency can be changed and valuable memory space can be saved.
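The overlap-and-add step can be sketched as a weighted cross-fade between neighboring waveforms. This is a minimal illustration under assumptions: the text does not give the weighting function, so a linear ramp is used here, waveforms are plain lists of samples, and pitch alignment and the prosodic modification of duration and frequency are ignored. The function names are hypothetical.

```python
def overlap_add(left, right, overlap):
    """Join two sample lists, cross-fading the last/first `overlap` samples."""
    if not 0 <= overlap <= min(len(left), len(right)):
        raise ValueError("overlap must fit inside both waveforms")
    head = left[:len(left) - overlap]
    tail = right[overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming waveform (assumed linear)
        mixed.append((1.0 - w) * left[len(left) - overlap + i] + w * right[i])
    return head + mixed + tail

def join_word(first_syllable, segment, second_syllable, overlap):
    """Chain syllable -> segment -> syllable with the same overlap at each joint."""
    return overlap_add(overlap_add(first_syllable, segment, overlap),
                       second_syllable, overlap)

# Toy constant waveforms standing in for "台", the segment, and "灣".
word = join_word([1.0] * 6, [1.0] * 4, [1.0] * 6, overlap=2)
print(len(word))  # 12
```

Because the two weights at each overlapped sample sum to one, a signal that is identical on both sides of a joint passes through unchanged, which is the property that avoids an audible level jump at the boundary.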

[0022] As described above, the embodiment of the present invention overcomes the simultaneous articulation problems of the prior art. By retrieving the simultaneous articulation segment of a word string and overlapping and adding its waveform to the waveforms of the preceding and succeeding syllables of the word string, synthesized speech with a high degree of naturalness is obtained. Furthermore, the duration and frequency of a word string can be changed, so that word strings with different tones and durations can be generated and valuable memory space can be saved.

[0023] While the present invention has been described in connection with what is considered to be the most practical and preferred embodiment, it will be understood that the invention is not limited to the disclosed embodiment but is intended to cover the various arrangements included within the spirit and scope of the broadest interpretation, so as to encompass all such modifications and equivalent arrangements.

[0024]

[Effects of the Invention] As described in detail above, the simultaneous articulation processing device for Chinese speech synthesis according to the present invention comprises: a dictionary memory that stores a plurality of Chinese word strings and their corresponding phonetic transcription data; a storage device that stores pitch data of various Chinese syllables and simultaneous articulation segments, phonetic transcription data corresponding to those syllables and segments, and the start and end points of the consonants and vowels of those syllables and segments; a word analysis device that analyzes an input phonetic transcription sentence to be synthesized on the basis of the dictionary in the dictionary memory and divides the sentence into a plurality of word strings; a syllable analysis device that determines, on the basis of the data in the storage device, which word strings from the word analysis device are to undergo simultaneous articulation processing, and retrieves the simultaneous articulation segments of the word strings so selected; and a speech synthesis device that interpolates the retrieved simultaneous articulation segments between the syllables of the word strings in the input phonetic transcription sentence to generate synthesized speech. Preferably, the storage device stores 409 Chinese syllables in the first of the four Chinese tones. Also preferably, the simultaneous articulation segment stored in the storage device is the first phoneme of the succeeding syllable of a Chinese word string.

[0025] Accordingly, because the simultaneous articulation segment of a word string is retrieved and its waveform is overlapped and added to the waveforms of the preceding and succeeding syllables of the word string, synthesized speech with a high degree of naturalness is obtained. Furthermore, the duration and frequency of a word string can be changed, so that word strings with different tones and durations can be generated and valuable memory space can be saved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a simultaneous articulation processing device for Chinese speech synthesis according to an embodiment of the present invention.

FIG. 2 is a diagram showing the contents of the syllables stored in the register device 13 of FIG. 1.

FIG. 3 is a waveform diagram, shown as a wideband spectrogram, of the human-produced utterance “中文” used in this embodiment.

FIG. 4 is a waveform diagram, shown as a wideband spectrogram, of the utterance “中文” produced by a conventional Chinese speech synthesis system.

FIG. 5 is a block diagram showing a conventional Chinese speech synthesis system.

FIG. 6 is a diagram showing the types of the first phoneme of the succeeding syllable that are used in this embodiment to decide whether simultaneous articulation processing should be performed on a word string.

FIG. 7 is a diagram illustrating the simultaneous articulation processing for the word string “台灣” in this embodiment.

[Explanation of Symbols]

10: input device; 11: word analysis device; 12: dictionary memory; 13: register device; 14: syllable analysis device; 15: pitch data storage device for VC (VV) simultaneous articulation segments and CV syllables; 16: label data storage device for VC (VV) simultaneous articulation segments and CV syllables; 17: syllable duration search device; 18: syllable frequency search device; 19: waveform overlap-and-add device; 20: synthesized speech output device.

Claims

[Claims]

[Claim 1] A simultaneous articulation processing device for Chinese speech synthesis, comprising: a dictionary memory that stores a plurality of Chinese word strings and their corresponding phonetic transcription data; a storage device that stores pitch data of various Chinese syllables and simultaneous articulation segments, phonetic transcription data corresponding to those syllables and segments, and the start and end points of the consonants and vowels of those syllables and segments; a word analysis device that analyzes an input phonetic transcription sentence to be synthesized on the basis of the dictionary in the dictionary memory and divides the sentence into a plurality of word strings; a syllable analysis device that determines, on the basis of the data in the storage device, which word strings from the word analysis device are to undergo simultaneous articulation processing, and retrieves the simultaneous articulation segments of the word strings so selected; and a speech synthesis device that interpolates the retrieved simultaneous articulation segments between the syllables of the word strings in the input phonetic transcription sentence to generate synthesized speech.

[Claim 2] The simultaneous articulation processing device for Chinese speech synthesis according to claim 1, wherein the storage device stores 409 Chinese syllables in the first of the four Chinese tones.

[Claim 3] The simultaneous articulation processing device for Chinese speech synthesis according to claim 1 or 2, wherein the simultaneous articulation segment stored in the storage device is the first phoneme of the succeeding syllable of a Chinese word string.