JPS63231397A

JPS63231397A - Evaluation system of parameter for voice synthesization

Info

Publication number: JPS63231397A
Application number: JP62064136A
Authority: JP
Inventors: 加世田　光子
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-03-20
Filing date: 1987-03-20
Publication date: 1988-09-27

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]
[Summary] In a method for evaluating speech synthesis parameters for synthesis by rule, the distance between the parameters at the joint of the speech synthesis parameter time series of synthesis units that may be concatenated or interpolated is calculated. When the distance is equal to or greater than a threshold, one or both members of the related pair of parameter time series are judged to be defective or defect candidates.

[Field of Industrial Application]

The present invention relates to a method for evaluating speech synthesis parameters, and more particularly to an evaluation method for synthesis by rule, in which a computer controls the parameters needed for synthesis.

[Prior Art and Problems to Be Solved by the Invention]

Synthesis by rule is the method used when a computer is to synthesize arbitrary speech: short segments of speech below the word level are treated as units, and arbitrary words or conversational speech are produced by combining them.

The smallest unit of speech used in this synthesis is called the basic synthesis unit. Examples include a one-pitch-period waveform and syllable-like units made of consonants (C) and vowels (V), such as CV, CVV, VCV, and CVC. Acoustic parameters for these units are obtained by methods such as PARCOR or LSP analysis and stored as a speech synthesis parameter time series for each synthesis unit. The series of parameter time series corresponding to the reading of an input character string is then supplied to the synthesizer to produce speech. The parameters for each synthesis unit are all or part of the acoustic parameter time series obtained by analyzing natural speech recorded beforehand, syllable by syllable or word by word.
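The storage scheme described above can be sketched as a table from synthesis unit to parameter time series, with a reading turned into one frame sequence by concatenation. This is a minimal sketch under stated assumptions: each frame is a short vector of acoustic parameters (for example PARCOR or LSP coefficients), the reading has already been split into unit labels, and the names `UNIT_STORE` and `build_parameter_sequence` as well as all numbers are hypothetical placeholders, not data from the patent.

```python
from typing import Dict, List

# One analysis frame: a vector of acoustic parameters (e.g. PARCOR or LSP coefficients).
Frame = List[float]

# Hypothetical parameter store: synthesis-unit label -> parameter time series.
# The labels and numbers are illustrative placeholders, not data from the patent.
UNIT_STORE: Dict[str, List[Frame]] = {
    "ka": [[0.10, 0.40], [0.20, 0.50], [0.30, 0.55]],
    "ta": [[0.32, 0.57], [0.35, 0.60]],
    "na": [[0.50, 0.20], [0.45, 0.30]],
}

def build_parameter_sequence(units: List[str]) -> List[Frame]:
    """Concatenate the stored time series of each unit, in reading order."""
    sequence: List[Frame] = []
    for unit in units:
        if unit not in UNIT_STORE:
            raise KeyError(f"no parameter time series stored for unit {unit!r}")
        sequence.extend(UNIT_STORE[unit])
    return sequence

# The reading "kata", assumed to be already split into the units "ka" + "ta".
frames = build_parameter_sequence(["ka", "ta"])
print(len(frames))  # 5
```

In this sketch the joint between the two units sits between `frames[2]` (the last frame of "ka") and `frames[3]` (the first frame of "ta"); that boundary is where parameter mismatches would appear.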

FIG. 3 shows the configuration of an apparatus for evaluating the quality of speech synthesis parameters by the conventional method. Reference numeral 32 denotes a storage unit for speech synthesis parameters, 33 a control unit for speech synthesis, 36 a correction unit for speech synthesis parameters used after trial listening, and 31 a main control unit that controls the whole apparatus, including these units. As described above, the speech synthesis parameter storage unit 32 stores the acoustic parameters of each synthesis unit as a speech synthesis parameter time series. For example, speech is synthesized by the synthesis control unit 33 from a character string entered on the keyboard 35 and is output from the speaker 38 through the digital signal processing unit 34 and the bandpass filter 37.

Synthesized speech is thus obtained by concatenating synthesis units. Differences in the acoustic parameters at the connection points can make the synthesized speech indistinct or introduce abnormal sounds. This stems from the limited memory capacity of the speech synthesis parameter storage unit 32: even for the same consonant, a consonant taken from a different word has to be used, and its acoustic parameters differ slightly.
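The patent does not state how the difference in acoustic parameters at a connection point is measured. As an assumption for illustration only, let the preceding unit A have frames a_1, ..., a_T and the following unit B have frames b_1, ..., b_U, each a p-dimensional parameter vector. The mismatch at the joint can then be taken as the Euclidean distance between the last frame of A and the first frame of B, compared against a threshold θ:

```latex
d(A, B) = \lVert \mathbf{a}_T - \mathbf{b}_1 \rVert_2
        = \sqrt{\sum_{k=1}^{p} \left( a_{T,k} - b_{1,k} \right)^2},
\qquad \text{pair flagged if } d(A, B) \ge \theta
```

Other measures (for example a weighted distance on LSP frequencies, or a distance averaged over several frames on each side of the joint) would fit the same scheme; the choice here is only the simplest one.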

The usual remedy for such indistinctness and abnormal sounds is to create the parameters for each synthesis unit, then synthesize speech of about one to three syllables and listen to it. This reveals the quality degradation caused by parameter differences at the connection points, and on that basis the parameters of each synthesis unit are corrected, reanalyzed, or recreated.

However, there are generally several hundred to several thousand synthesis units, and synthesizing and listening to every possible combination requires a great deal of effort. On the other hand, if no listening or correction is done at all, the slight parameter differences described above make it difficult to obtain high-quality synthesized speech.

[Means and Effects for Solving the Problems] The present invention provides a method for evaluating speech synthesis parameters that solves the problems above. Basically, the distance (a measure of similarity) between the acoustic parameters at the connection points of synthesis units that may be joined is calculated. Either the pairs whose distance is equal to or greater than a predetermined threshold are listened to, or a threshold at which indistinctness or abnormal sounds occur is first determined by listening to part of the data, and pairs whose distance exceeds that threshold are treated as defective. The parameters of those pairs are then corrected, reanalyzed, or recreated, which makes the acoustic parameters easy to correct.

Accordingly, the present invention provides a method for evaluating speech synthesis parameters for synthesis by rule, comprising a speech synthesis parameter evaluation unit 13 that contains an inter-parameter distance calculation unit 13a for calculating the distance between acoustic parameters at the connection of synthesis units. The evaluation unit 13 retrieves pairs of speech synthesis parameter time series that may be concatenated from the speech synthesis parameter storage unit 12 and evaluates the synthesized speech by comparing the distance calculated by the distance calculation unit 13a with a predetermined threshold. When the distance is equal to or greater than the threshold, one or both members of the related pair of speech parameter time series are judged to be defective or defect candidates.
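The judgment performed by the evaluation unit 13 and the distance calculation unit 13a can be sketched as follows, together with one possible way of deriving the threshold from a pre-listened subset of pairs. This is a hedged sketch, not the patent's implementation: the Euclidean distance between boundary frames is an assumption (the patent names no measure), and `calibrate_threshold` is a hypothetical rule that takes the smallest distance a listener judged bad, so every known-bad pair in the subset would be flagged.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]

def junction_distance(left: Sequence[Frame], right: Sequence[Frame]) -> float:
    """Distance between the last frame of `left` and the first frame of `right`.

    Assumption: Euclidean distance; the patent does not name a measure.
    """
    return math.dist(left[-1], right[0])

def judge_pair(left: Sequence[Frame], right: Sequence[Frame],
               threshold: float) -> Tuple[float, bool]:
    """Return (distance, flagged); flagged means 'defective or defect candidate'."""
    d = junction_distance(left, right)
    return d, d >= threshold

def calibrate_threshold(listened: List[Tuple[float, bool]]) -> Optional[float]:
    """Hypothetical rule for deriving the threshold from a pre-listened subset.

    `listened` holds (junction distance, judged bad by the listener) entries.
    Returns the smallest distance a listener judged bad, or None if no entry
    was judged bad.
    """
    bad = [d for d, judged_bad in listened if judged_bad]
    return min(bad) if bad else None

# Illustrative listening results and unit parameters (placeholders).
threshold = calibrate_threshold([(0.05, False), (0.30, True), (0.22, True), (0.10, False)])
print(threshold)  # 0.22
unit_a = [[0.10, 0.40], [0.30, 0.55]]
unit_b = [[0.31, 0.56], [0.33, 0.57]]
unit_c = [[0.90, 0.10]]
print(judge_pair(unit_a, unit_b, threshold)[1])  # False (distance about 0.014)
print(judge_pair(unit_a, unit_c, threshold)[1])  # True (distance about 0.75)
```

Taking the minimum bad distance errs on the side of flagging: some pairs that would sound acceptable may also exceed it, which matches the patent's framing of flagged pairs as defect candidates to be checked rather than confirmed defects.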

[Embodiment] FIG. 1 is a block diagram of an apparatus implementing the speech synthesis parameter evaluation method according to the present invention, and FIG. 2 is a processing flowchart of the evaluation method.

In FIG. 1, the speech synthesis parameter storage unit 12 stores, as described above, the speech synthesis parameter time series for each synthesis unit, such as the syllables CV, VCV, or CVC made of consonants (C) and vowels (V). The speech synthesis parameter evaluation unit 13 retrieves pairs of parameter time series that may be concatenated from the storage unit 12, and the inter-parameter distance calculation unit 13a inside it calculates the distance (similarity) between the acoustic parameters at the joint. As described below, when the distance is equal to or greater than a preset threshold, the evaluator is informed that one or both members of the parameter pair are defective or defect candidates. The synthesis control unit 14 builds syllable- or word-level speech synthesis parameter time series from the pairs judged to be defect candidates. These syllable- and word-level parameters pass through the digital processing unit 15 and the bandpass filter 18 and are played through the speaker 19, where the evaluator listens to and assesses them.

In the processing flowchart of FIG. 2, a pair of acoustic parameters to be evaluated is first selected (step 21). Next, the distance between the acoustic parameters of the selected pair is calculated (step 22).

It is then determined whether the calculated distance is equal to or greater than the predetermined threshold (step 23). If so, the pair is judged defective or a defect candidate (step 24), the acoustic parameters are corrected or recreated (step 25), and the result is fed back so that the distance is calculated again (step 23). If the distance is below the threshold, it is determined whether distances have been calculated for all target pairs (step 26); if not, the process returns to step 21.
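The loop of steps 21 to 26 can be sketched as below. In the patent, step 25 is a correction, reanalysis, or re-creation of parameters by the evaluator; here it is modeled as a callback, and `pull_boundary_halfway` is a purely hypothetical automatic stand-in used only to make the sketch runnable. The `max_rounds` cap is also an addition of this sketch, so that a pair that never drops below the threshold is reported instead of looping forever.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]
Pair = Tuple[str, str]
Store = Dict[str, List[Frame]]

def evaluate_pairs(store: Store, pairs: Sequence[Pair], threshold: float,
                   correct: Callable[[Store, Pair], None],
                   max_rounds: int = 5) -> List[Pair]:
    """Loop over steps 21-26; return pairs still at or above the threshold."""
    unresolved: List[Pair] = []
    for pair in pairs:                         # step 21: select a pair to evaluate
        left, right = pair
        rounds = 0
        # Steps 22-23: compute the junction distance and compare with the threshold.
        while math.dist(store[left][-1], store[right][0]) >= threshold:
            # Step 24: the pair is judged defective or a defect candidate.
            if rounds == max_rounds:           # give up after max_rounds corrections
                unresolved.append(pair)
                break
            correct(store, pair)               # step 25: correct / recreate parameters
            rounds += 1                        # then loop back and re-measure
    return unresolved                          # step 26: every pair has been visited

def pull_boundary_halfway(store: Store, pair: Pair) -> None:
    """Hypothetical automatic stand-in for the manual correction in step 25:
    move the right unit's first frame halfway toward the left unit's last frame."""
    left, right = pair
    last, first = store[left][-1], store[right][0]
    store[right][0] = [(a + b) / 2 for a, b in zip(last, first)]

store = {"aka": [[0.0, 0.0]], "ata": [[1.0, 0.0]]}
print(evaluate_pairs(store, [("aka", "ata")], 0.3, pull_boundary_halfway))  # []
print(store["ata"][0])  # [0.25, 0.0]
```

In the usage example the junction distance goes from 1.0 to 0.5 to 0.25, so two correction rounds bring the pair below the threshold of 0.3.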

[Effects of the Invention]

As described above, according to the present invention, indistinctness or abnormal sounds at the joints are not assessed by listening to every combination of synthesis units that may be connected. Instead, they are evaluated from the distance between the speech synthesis parameters at the joint, or the distance is used to reduce the number of cases to be evaluated. This reduces the effort required for evaluation and improves sound quality.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an apparatus implementing the speech synthesis parameter evaluation method of the present invention, FIG. 2 is a processing flowchart of the apparatus of FIG. 1, and FIG. 3 is a block diagram of a conventional configuration.

(Explanation of reference numerals)
11, 31: main control unit
12, 32: speech synthesis parameter storage unit
13: speech synthesis parameter evaluation unit
13a: inter-parameter distance calculation unit
14, 33: synthesis control unit
17, 36: speech parameter correction unit
19, 38: speaker

[FIG. 2: Processing flowchart of the present invention]

Claims

[Claims]

1. A method for evaluating speech synthesis parameters for synthesis by rule, comprising a speech synthesis parameter evaluation unit that contains an inter-parameter distance calculation unit for calculating the distance between acoustic parameters at the connection of synthesis units, wherein the evaluation unit retrieves pairs of speech synthesis parameter time series that may be concatenated from a speech synthesis parameter storage unit and evaluates the synthesized speech by comparing the distance calculated by the inter-parameter distance calculation unit with a predetermined threshold, and wherein, when the distance is equal to or greater than the threshold, one or both members of the related pair of speech parameter time series are judged to be defective or defect candidates.