JPH0389400A - Formant locus extracting system - Google Patents

Formant locus extracting system

Info

Publication number
JPH0389400A
Authority
JP
Japan
Prior art keywords
formant
locus
maximum likelihood
data
likelihood method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1226592A
Other languages
Japanese (ja)
Inventor
Tetsuya Sakayori
哲也 酒寄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP1226592A priority Critical patent/JPH0389400A/en
Publication of JPH0389400A publication Critical patent/JPH0389400A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To extract the formant locus simply and with high accuracy after learning, by using the pole sequence obtained by the maximum likelihood method as input, giving a formant locus drawn by an expert as teacher data, and deriving the formant locus with a neural net trained by backpropagation.

CONSTITUTION: From the waveform of a speech signal 1, the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained by the maximum likelihood method 2. These time series are used as input data, a formant locus 8 drawn by an expert is given as teacher data, and the formant locus is derived from the output data of a neural net 4 trained by backpropagation. In this way, a smooth formant locus can be obtained from the pole sequence produced by the maximum likelihood method 2.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a formant locus extraction method, and more particularly to a method for extracting formant trajectories from speech waveforms.

Prior Art

Various methods for extracting formant trajectories from speech waveforms have long been known; representative ones include the Abs method, the method of moments, and the maximum likelihood method. The Abs method is known to extract formants with very high accuracy, but has the drawback of requiring an enormous amount of computation. The method of moments has a very simple algorithm and is suited to real-time processing, but produces large extraction errors owing to, for example, the overall slope of the spectrum. The maximum likelihood method, finally, allows formants to be extracted relatively simply and accurately by means of linear prediction techniques.

However, extracting formant trajectories with the maximum likelihood method described above raises the following problems.

(1) In the maximum likelihood method, the analysis is carried out so that the number of estimated poles exceeds the number of formants. A post-processing step that selects the formants from among the estimated poles is therefore required.

(2) As shown in Fig. 2, the estimated pole trajectories may vanish or appear on the frequency-time plane, so even connecting them selectively does not yield a smooth formant trajectory.

Regarding problem (1), a method has been proposed that extracts formant trajectories by dynamic programming, taking the bandwidth and continuity of the formants into consideration. In this method, however, the setting of the weighting coefficients used in the cost function is delicate and the amount of computation is large. Moreover, problem (2) is not addressed.

Objective

The present invention was made with the aim of solving the problems described above and of obtaining a smooth formant trajectory from the sequence of poles produced by the maximum likelihood method.

Constitution

To achieve the above objective, the present invention is characterized in that the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained from the speech waveform by the maximum likelihood method; these time series are used as input data, formant trajectories drawn by an expert are given as teacher data, and the formant trajectories are obtained from the output data of a neural network trained by backpropagation. The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining an embodiment of the present invention. In the figure, 1 is the input speech, 2 the maximum likelihood analysis section, 3 the pole sequence section, 4 the neural network section, 5 a formant trajectory, 6 the FFT, 7 the terminal section, and 8 a formant trajectory. To overcome the drawbacks described above, the present invention uses the pole sequence obtained by the maximum likelihood method as input data, gives formant trajectories drawn by an expert as teacher data, and obtains formant trajectories from the output data of a neural network trained by backpropagation.

In FIG. 1 it is assumed that the speech waveform has been A/D-converted and stored in a file beforehand. The actual analysis system does not include the part of the figure enclosed by the dotted line in its lower half.

Before the actual analysis can be carried out, however, the weighting coefficients of the neural network must be learned; at that stage the system does include the dotted-line part (that is, the dotted-line part is needed only during training).

First, the training procedure is explained. The speech is analysed frame by frame with the maximum likelihood method, and the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained. The number of poles is made larger than the number of formants to be extracted. The resulting time series of poles is used as the input to the neural network; at the same time, it is displayed on the terminal screen.
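The patent gives no formulas for this frame-wise analysis. As a point of reference only, pole frequencies and bandwidths of a rational (all-pole) spectrum model are commonly obtained from the roots of a linear prediction polynomial, a formulation closely related to the maximum likelihood spectral estimation referred to here. A minimal sketch, assuming a mono, windowed frame and a hypothetical analysis order of 14:

```python
import numpy as np
from scipy.linalg import toeplitz

def frame_poles(frame, fs, order=14):
    """Estimate pole frequencies and bandwidths (both in Hz) for one frame
    using the autocorrelation method of linear prediction."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.linalg.solve(toeplitz(r[:-1]), r[1:])    # prediction coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))   # poles of the model 1/A(z)
    roots = roots[np.imag(roots) > 0]               # one of each conjugate pair
    freq = np.angle(roots) * fs / (2.0 * np.pi)     # pole frequency in Hz
    bw = -np.log(np.abs(roots)) * fs / np.pi        # 3 dB bandwidth in Hz
    idx = np.argsort(freq)
    return freq[idx], bw[idx]
```

Because the analysis order is well above twice the number of formants of interest, each frame yields more pole candidates than formants, which is exactly the situation the text describes.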

Waveform data and the results of FFT and other spectrum analyses are also displayed as required. Looking at this display, the expert enters the formant trajectory with an input device such as a keyboard, mouse, or light pen. Alternatively, the spectral information described above could be printed on paper and the formant trajectory traced with a digitizer. The formant trajectory (time series) produced in this way is given to the neural network as teacher data. Once the input data and teacher data are available, the neural network learns its weighting coefficients by backpropagation.
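The patent does not specify the network topology. As an illustration of the training step only, the sketch below implements backpropagation for a small two-layer network that maps a fixed-length vector of pole frequencies and bandwidths to formant frequencies; the sizes P (poles per frame), F (formants), and H (hidden units) are assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, not from the patent: P pole (frequency, bandwidth) pairs
# per input vector, F output formants, H hidden units.
P, F, H = 10, 3, 32
W1 = rng.normal(0.0, 0.1, (2 * P, H))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (H, F))
b2 = np.zeros(F)

def forward(X):
    """X: (batch, 2P) pole features -> (batch, F) formant frequencies."""
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2                      # linear output layer

def train_step(X, T, lr=0.01):
    """One backpropagation step on a batch (X: inputs, T: expert targets)."""
    global W1, b1, W2, b2
    h, Y = forward(X)
    dY = (Y - T) / len(X)                      # dL/dY for L = mean 0.5*|Y-T|^2
    gW2, gb2 = h.T @ dY, dY.sum(axis=0)
    dh = (dY @ W2.T) * (1.0 - h ** 2)          # backpropagate through tanh
    gW1, gb1 = X.T @ dh, dh.sum(axis=0)
    W2 -= lr * gW2
    b2 -= lr * gb2
    W1 -= lr * gW1
    b1 -= lr * gb1
    return float(0.5 * np.mean(np.sum((Y - T) ** 2, axis=1)))
```

In the patent the network receives the pole time series and outputs a trajectory, so a practical version would stack several neighbouring frames into each input vector; the single-frame mapping above only illustrates the backpropagation mechanics.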

Next, the actual analysis procedure is explained. The speech is analysed frame by frame with the maximum likelihood method to obtain the pole frequencies and bandwidths, again making the number of poles larger than the number of formants to be extracted. The resulting time series of poles is used as the input to the neural network. Since the network has already learned its weighting coefficients, it outputs a formant trajectory close to the one an expert would draw.
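Hypothetical glue code for this analysis phase, reusing frame_poles, forward, and P from the sketches above; the zero-padding for frames that yield fewer than P complex poles and the normalisation by the Nyquist frequency are likewise assumptions:

```python
def pole_vector(freq, bw, fs, P):
    """Pack up to P pole (frequency, bandwidth) pairs into a fixed-length,
    zero-padded vector, normalised by the Nyquist frequency."""
    v = np.zeros(2 * P)
    n = min(len(freq), P)
    v[:n] = freq[:n] / (fs / 2.0)
    v[P:P + n] = bw[:n] / (fs / 2.0)
    return v

def extract_trajectory(frames, fs):
    """Analyse each frame and map its pole series to formant frequencies."""
    track = []
    for frame in frames:
        freq, bw = frame_poles(frame, fs)
        x = pole_vector(freq, bw, fs, P)
        _, y = forward(x[None, :])
        track.append(y[0] * (fs / 2.0))        # back from normalised units to Hz
    return np.array(track)                     # shape: (num_frames, F)
```

A call such as track = extract_trajectory(frames, fs=10000) would then return one row of F formant frequencies per frame.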

Effects

As is clear from the above description, according to the present invention the pole sequence obtained by the maximum likelihood method is used as input data, formant trajectories drawn by an expert are given as teacher data, and formant trajectories are obtained from the output data of a neural network trained by backpropagation. After training, formant trajectories can therefore be extracted relatively simply and with good accuracy. Moreover, since this is not a method that selectively connects the estimated poles, it can also cope with the vanishing and appearance of pole trajectories on the frequency-time plane.

[Brief Description of the Drawings]

FIG. 1 is a block diagram for explaining an embodiment of the present invention, and FIG. 2 is a diagram for explaining an example of formant trajectory extraction. 1: input speech; 2: maximum likelihood analysis section; 3: pole sequence section; 4: neural network section; 5: formant trajectory; 6: FFT; 7: terminal section; 8: formant trajectory.

Claims (1)

[Claims]

1. A formant locus extraction method characterized in that the pole frequencies and bandwidths in a rational spectrum approximation of the speech spectrum are obtained from the speech waveform by the maximum likelihood method, these time series are used as input data, formant trajectories drawn by an expert are given as teacher data, and the formant trajectories are obtained from the output data of a neural network trained by backpropagation.
JP1226592A 1989-09-01 1989-09-01 Formant locus extracting system Pending JPH0389400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1226592A JPH0389400A (en) 1989-09-01 1989-09-01 Formant locus extracting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1226592A JPH0389400A (en) 1989-09-01 1989-09-01 Formant locus extracting system

Publications (1)

Publication Number Publication Date
JPH0389400A true JPH0389400A (en) 1991-04-15

Family

ID=16847600

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1226592A Pending JPH0389400A (en) 1989-09-01 1989-09-01 Formant locus extracting system

Country Status (1)

Country Link
JP (1) JPH0389400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006112760A (en) * 2004-10-18 2006-04-27 Tdk Corp Baking furnace

Similar Documents

Publication Publication Date Title
Virtanen Sound source separation using sparse coding with temporal continuity objective
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN108597496B (en) Voice generation method and device based on generation type countermeasure network
KR100745976B1 (en) Method and apparatus for classifying voice and non-voice using sound model
Schuller et al. Emotion recognition in the noise applying large acoustic feature sets
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
US6055498A (en) Method and apparatus for automatic text-independent grading of pronunciation for language instruction
US4489435A (en) Method and apparatus for continuous word string recognition
US4481593A (en) Continuous speech recognition
US8676574B2 (en) Method for tone/intonation recognition using auditory attention cues
CN110767210A (en) Method and device for generating personalized voice
Rammo et al. Detecting the speaker language using CNN deep learning algorithm
Seetharaman et al. Music/voice separation using the 2d fourier transform
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
CN110782915A (en) Waveform music component separation method based on deep learning
US11611581B2 (en) Methods and devices for detecting a spoofing attack
CN111128211A (en) Voice separation method and device
Sunny et al. Recognition of speech signals: an experimental comparison of linear predictive coding and discrete wavelet transforms
Jin et al. Speech separation and emotion recognition for multi-speaker scenarios
CN113539243A (en) Training method of voice classification model, voice classification method and related device
CN112863485A (en) Accent voice recognition method, apparatus, device and storage medium
Südholt et al. Pruning deep neural network models of guitar distortion effects
CN116994600A (en) Method and system for driving character mouth shape based on audio frequency
Sunny et al. Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
JPH0389400A (en) Formant locus extracting system