JPH0389400A - Formant locus extracting system - Google Patents

Formant locus extracting system

Info

Publication number
JPH0389400A
Authority
JP
Japan
Prior art keywords
formant
locus
maximum likelihood
data
likelihood method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1226592A
Other languages
Japanese (ja)
Inventor
Tetsuya Sakayori
哲也 酒寄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP1226592A priority Critical patent/JPH0389400A/en
Publication of JPH0389400A publication Critical patent/JPH0389400A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To extract the formant locus simply and with high accuracy after learning, by using the pole sequence obtained by the maximum likelihood method as input, giving a formant locus drawn by an expert as teacher data, and deriving the formant locus with a neural net trained by backpropagation.

CONSTITUTION: From the waveform of a speech signal 1, the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained by the maximum likelihood method 2. These time series are used as input data, a formant locus 8 drawn by an expert is given as teacher data, and the formant locus is derived from the output data of a neural net 4 trained by backpropagation. In this way, a smooth formant locus can be obtained from the pole sequence produced by the maximum likelihood method 2.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a formant locus extraction method, and more particularly to a method for extracting formant trajectories from speech waveforms.

Prior Art

Various methods for extracting formant trajectories from speech waveforms have long been known; representative ones include the Abs method, the method of moments, and the maximum likelihood method. The Abs method is known to extract formants with very high accuracy, but has the drawback of requiring an enormous amount of computation. The method of moments has a very simple algorithm and is suited to real-time processing, but produces large extraction errors owing to, for example, the overall slope of the spectrum. The maximum likelihood method, finally, allows formants to be extracted relatively simply and accurately by means of linear prediction techniques.

However, extracting formant trajectories with the maximum likelihood method described above raises the following problems.

(1) In the maximum likelihood method, the analysis is carried out so that the number of estimated poles exceeds the number of formants. A post-processing step that selects the formants from among the estimated poles is therefore required.

(2) As shown in Fig. 2, the estimated pole trajectories may vanish or appear on the frequency-time plane, so even connecting them selectively does not yield a smooth formant trajectory.

Regarding problem (1), a method has been proposed that extracts formant trajectories by dynamic programming, taking the bandwidth and continuity of the formants into consideration. In this method, however, the setting of the weighting coefficients used in the cost function is delicate and the amount of computation is large. Moreover, problem (2) is not addressed.

Objective

The present invention was made with the aim of solving the problems described above and of obtaining a smooth formant trajectory from the sequence of poles produced by the maximum likelihood method.

Constitution

To achieve the above objective, the present invention is characterized in that the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained from the speech waveform by the maximum likelihood method; these time series are used as input data, formant trajectories drawn by an expert are given as teacher data, and the formant trajectories are obtained from the output data of a neural network trained by backpropagation. The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining an embodiment of the present invention. In the figure, 1 is the input speech, 2 the maximum likelihood analysis section, 3 the pole sequence section, 4 the neural network section, 5 a formant trajectory, 6 the FFT, 7 the terminal section, and 8 a formant trajectory. To overcome the drawbacks described above, the present invention uses the pole sequence obtained by the maximum likelihood method as input data, gives formant trajectories drawn by an expert as teacher data, and obtains formant trajectories from the output data of a neural network trained by backpropagation.

In FIG. 1 it is assumed that the speech waveform has been A/D-converted and stored in a file beforehand. The actual analysis system does not include the part of the figure enclosed by the dotted line in its lower half.

Before the actual analysis can be carried out, however, the weighting coefficients of the neural network must be learned; at that stage the system does include the dotted-line part (that is, the dotted-line part is needed only during training).

First, the training procedure is explained. The speech is analysed frame by frame with the maximum likelihood method, and the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained. The number of poles is made larger than the number of formants to be extracted. The resulting time series of poles is used as the input to the neural network; at the same time, it is displayed on the terminal screen.
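The patent gives no formulas for this frame-wise analysis. As a point of reference only, pole frequencies and bandwidths of a rational (all-pole) spectrum model are commonly obtained from the roots of a linear prediction polynomial, a formulation closely related to the maximum likelihood spectral estimation referred to here. A minimal sketch, assuming a mono, windowed frame and a hypothetical analysis order of 14:

```python
import numpy as np
from scipy.linalg import toeplitz

def frame_poles(frame, fs, order=14):
    """Estimate pole frequencies and bandwidths (both in Hz) for one frame
    using the autocorrelation method of linear prediction."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.linalg.solve(toeplitz(r[:-1]), r[1:])    # prediction coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))   # poles of the model 1/A(z)
    roots = roots[np.imag(roots) > 0]               # one of each conjugate pair
    freq = np.angle(roots) * fs / (2.0 * np.pi)     # pole frequency in Hz
    bw = -np.log(np.abs(roots)) * fs / np.pi        # 3 dB bandwidth in Hz
    idx = np.argsort(freq)
    return freq[idx], bw[idx]
```

Because the analysis order is well above twice the number of formants of interest, each frame yields more pole candidates than formants, which is exactly the situation the text describes.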

Waveform data and the results of FFT and other spectrum analyses are also displayed as required. Looking at this display, the expert enters the formant trajectory with an input device such as a keyboard, mouse, or light pen. Alternatively, the spectral information described above could be printed on paper and the formant trajectory traced with a digitizer. The formant trajectory (time series) produced in this way is given to the neural network as teacher data. Once the input data and teacher data are available, the neural network learns its weighting coefficients by backpropagation.
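The patent does not specify the network topology. As an illustration of the training step only, the sketch below implements backpropagation for a small two-layer network that maps a fixed-length vector of pole frequencies and bandwidths to formant frequencies; the sizes P (poles per frame), F (formants), and H (hidden units) are assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, not from the patent: P pole (frequency, bandwidth) pairs
# per input vector, F output formants, H hidden units.
P, F, H = 10, 3, 32
W1 = rng.normal(0.0, 0.1, (2 * P, H))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (H, F))
b2 = np.zeros(F)

def forward(X):
    """X: (batch, 2P) pole features -> (batch, F) formant frequencies."""
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2                      # linear output layer

def train_step(X, T, lr=0.01):
    """One backpropagation step on a batch (X: inputs, T: expert targets)."""
    global W1, b1, W2, b2
    h, Y = forward(X)
    dY = (Y - T) / len(X)                      # dL/dY for L = mean 0.5*|Y-T|^2
    gW2, gb2 = h.T @ dY, dY.sum(axis=0)
    dh = (dY @ W2.T) * (1.0 - h ** 2)          # backpropagate through tanh
    gW1, gb1 = X.T @ dh, dh.sum(axis=0)
    W2 -= lr * gW2
    b2 -= lr * gb2
    W1 -= lr * gW1
    b1 -= lr * gb1
    return float(0.5 * np.mean(np.sum((Y - T) ** 2, axis=1)))
```

In the patent the network receives the pole time series and outputs a trajectory, so a practical version would stack several neighbouring frames into each input vector; the single-frame mapping above only illustrates the backpropagation mechanics.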

Next, the actual analysis procedure is explained. The speech is analysed frame by frame with the maximum likelihood method to obtain the pole frequencies and bandwidths, again making the number of poles larger than the number of formants to be extracted. The resulting time series of poles is used as the input to the neural network. Since the network has already learned its weighting coefficients, it outputs a formant trajectory close to the one an expert would draw.
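Hypothetical glue code for this analysis phase, reusing frame_poles, forward, and P from the sketches above; the zero-padding for frames that yield fewer than P complex poles and the normalisation by the Nyquist frequency are likewise assumptions:

```python
def pole_vector(freq, bw, fs, P):
    """Pack up to P pole (frequency, bandwidth) pairs into a fixed-length,
    zero-padded vector, normalised by the Nyquist frequency."""
    v = np.zeros(2 * P)
    n = min(len(freq), P)
    v[:n] = freq[:n] / (fs / 2.0)
    v[P:P + n] = bw[:n] / (fs / 2.0)
    return v

def extract_trajectory(frames, fs):
    """Analyse each frame and map its pole series to formant frequencies."""
    track = []
    for frame in frames:
        freq, bw = frame_poles(frame, fs)
        x = pole_vector(freq, bw, fs, P)
        _, y = forward(x[None, :])
        track.append(y[0] * (fs / 2.0))        # back from normalised units to Hz
    return np.array(track)                     # shape: (num_frames, F)
```

A call such as track = extract_trajectory(frames, fs=10000) would then return one row of F formant frequencies per frame.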

Effects

As is clear from the above description, according to the present invention the pole sequence obtained by the maximum likelihood method is used as input data, formant trajectories drawn by an expert are given as teacher data, and formant trajectories are obtained from the output data of a neural network trained by backpropagation. After training, formant trajectories can therefore be extracted relatively simply and with good accuracy. Moreover, since this is not a method that selectively connects the estimated poles, it can also cope with the vanishing and appearance of pole trajectories on the frequency-time plane.

[Brief Description of the Drawings]

FIG. 1 is a block diagram for explaining an embodiment of the present invention, and FIG. 2 is a diagram for explaining an example of formant trajectory extraction. 1: input speech; 2: maximum likelihood analysis section; 3: pole sequence section; 4: neural network section; 5: formant trajectory; 6: FFT; 7: terminal section; 8: formant trajectory.

Claims (1)

[Claims]

1. A formant locus extraction method characterized in that the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained from the speech waveform by the maximum likelihood method, these time series are used as input data, formant trajectories drawn by an expert are given as teacher data, and the formant trajectories are obtained from the output data of a neural network trained by backpropagation.
JP1226592A 1989-09-01 1989-09-01 Formant locus extracting system Pending JPH0389400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1226592A JPH0389400A (en) 1989-09-01 1989-09-01 Formant locus extracting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1226592A JPH0389400A (en) 1989-09-01 1989-09-01 Formant locus extracting system

Publications (1)

Publication Number Publication Date
JPH0389400A true JPH0389400A (en) 1991-04-15

Family

ID=16847600

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1226592A Pending JPH0389400A (en) 1989-09-01 1989-09-01 Formant locus extracting system

Country Status (1)

Country Link
JP (1) JPH0389400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006112760A (en) * 2004-10-18 2006-04-27 Tdk Corp Baking furnace

Similar Documents

Publication Publication Date Title
Virtanen Sound source separation using sparse coding with temporal continuity objective
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN108597496B (en) Voice generation method and device based on generation type countermeasure network
KR100745976B1 (en) Method and apparatus for classifying voice and non-voice using sound model
Schuller et al. Emotion recognition in the noise applying large acoustic feature sets
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
US6055498A (en) Method and apparatus for automatic text-independent grading of pronunciation for language instruction
US4489435A (en) Method and apparatus for continuous word string recognition
US4481593A (en) Continuous speech recognition
US8676574B2 (en) Method for tone/intonation recognition using auditory attention cues
CN110767210A (en) Method and device for generating personalized voice
Rammo et al. Detecting the speaker language using CNN deep learning algorithm
Seetharaman et al. Music/voice separation using the 2d fourier transform
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
CN110782915A (en) Waveform music component separation method based on deep learning
US11611581B2 (en) Methods and devices for detecting a spoofing attack
CN111128211A (en) Voice separation method and device
Sunny et al. Recognition of speech signals: an experimental comparison of linear predictive coding and discrete wavelet transforms
Jin et al. Speech separation and emotion recognition for multi-speaker scenarios
CN113539243A (en) Training method of voice classification model, voice classification method and related device
CN112863485A (en) Accent voice recognition method, apparatus, device and storage medium
Südholt et al. Pruning deep neural network models of guitar distortion effects
CN116994600A (en) Method and system for driving character mouth shape based on audio frequency
Sunny et al. Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
JPH0389400A (en) Formant locus extracting system