JPS59102296A

JPS59102296A - Pitch extraction

Info

Publication number: JPS59102296A
Application number: JP21293582A
Authority: JP
Inventors: 泰助渡辺; 平岡　省二; 達也木村
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1982-12-03
Filing date: 1982-12-03
Publication date: 1984-06-13

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a pitch extraction method for accurately extracting the pitch period of speech, which is one of the important basic parameters in a speech analysis-synthesis system that digitizes speech and compresses its bandwidth.

Configuration of the Conventional Example and Its Problems

The autocorrelation method is widely known as a way of extracting the pitch period of the voiced parts of speech in a speech analysis-synthesis system. This method cuts a certain time interval (called a frame) out of the digitized speech signal, performs an autocorrelation calculation over that interval, detects the point at which the autocorrelation coefficient is largest, and takes it as the pitch period. However, this method has the drawback that it can mistakenly extract a component at double or half the correct pitch period. Extraction of an incorrect pitch period appears as a marked deterioration of sound quality during speech synthesis.
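The conventional method described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the lag search range (`min_lag`, `max_lag`), and the synthetic test frame are all assumptions introduced here.

```python
import math

def autocorr(frame, lag):
    """Unnormalized autocorrelation of one frame at a single lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def pitch_by_max_autocorr(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorr(frame, lag))

# Synthetic voiced-like frame (assumed data): period of 40 samples plus a weaker harmonic.
frame = [math.sin(2 * math.pi * n / 40) + 0.3 * math.sin(4 * math.pi * n / 40)
         for n in range(160)]
estimated_period = pitch_by_max_autocorr(frame, min_lag=20, max_lag=100)
```

On real speech the largest peak can sit at a multiple or a fraction of the true period, which is the failure the invention addresses.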

Object of the Invention

An object of the present invention is to eliminate the above drawback of the conventional method and to provide a pitch extraction method that reliably extracts the correct pitch period.

Structure of the Invention

To achieve the above object, the present invention is a pitch extraction method that computes the autocorrelation coefficient W(τ) from the speech data X(n) within one frame; detects the first local maximum W(τ)max1, which is the largest value of W(τ), and the second local maximum W(τ)max2, which is the peak of second-largest height; obtains the values AW1 and AW2 by normalizing W(τ)max1 and W(τ)max2, respectively, by the sum ISIG of the absolute values of the amplitudes of the speech data in the frame; and, on the basis of AW1, AW2, the pitch period PPIT already extracted in the preceding frame, and the like, selects either PIT1 or PIT2, the pitch periods that give AW1 and AW2, as the pitch period of the frame.

Description of the Embodiment

The present invention is described in detail below with reference to an embodiment.

FIG. 1 is a block diagram showing the configuration of an apparatus that carries out the method of the present invention. A speech signal input from a microphone or the like through input terminal 1 is sampled and quantized by AD converter 2 into digital data, and one frame's worth of data, about 10 ms to 20 ms, is accumulated as a unit in data buffer 3. FIG. 2 shows the relationship between the speech waveform 21 and the frames. When frame 1 is analyzed, the autocorrelator 4 calculates the autocorrelation coefficient W(τ) from the speech data X(n) stored in data buffer 3 according to equation (1), where N is the number of samples in the frame. FIG. 3a illustrates W(τ) for frame 1.
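Equation (1) is not reproduced in this text. A common definition of the short-time autocorrelation that is consistent with the surrounding description is given below as an assumption; the summation limits and any scale factor used in the original equation are not known.

```latex
W(\tau) = \sum_{n=0}^{N-1-\tau} X(n)\, X(n+\tau),
\qquad N = \text{number of samples in the frame}
```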

From W(τ), the local-maximum detector 6 finds W(τ)max1 of the first local maximum 301, which has the largest value, and W(τ)max2 of the second local maximum 302, which is the second largest, and defines the values of τ that give these maxima, 303 and 304, as the first candidate PIT1 and the second candidate PIT2 for the pitch period.
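The peak search can be sketched as below. This is an illustrative reading rather than the patent's circuit: the function name `two_largest_peaks` is hypothetical, and excluding lag 0 (where any autocorrelation has its global maximum) is an assumption, since the text only speaks of local maxima.

```python
def two_largest_peaks(w, min_lag=1):
    """Return [(value, lag), ...] for the two largest local maxima of w
    at lags >= min_lag, largest first."""
    peaks = [(w[t], t) for t in range(max(min_lag, 1), len(w) - 1)
             if w[t - 1] < w[t] >= w[t + 1]]
    peaks.sort(reverse=True)
    return peaks[:2]

# Assumed toy autocorrelation sequence; index = lag tau.
w = [9.0, 4.0, 1.0, 3.0, 6.0, 3.0, 1.0, 2.0, 5.0, 2.0, 0.5]
(w_max1, pit1), (w_max2, pit2) = two_largest_peaks(w)
```

Here `pit1` and `pit2` play the roles of PIT1 and PIT2, and `w_max1` and `w_max2` those of W(τ)max1 and W(τ)max2.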

Meanwhile, the absolute value adder 5 obtains the sum ISIG of the absolute values of the amplitudes of the speech data, given by equation (2).

Using this ISIG, the local-maximum detector 6 normalizes the local maxima W(τ)max1 and W(τ)max2 according to equation (3) to obtain AW1 and AW2.
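Equations (2) and (3) are not reproduced in this text. The sketch below assumes the simplest reading of the description: the absolute-value sum is Σ|X(n)| over the frame, and each peak value is divided by that sum. The exact form of the original normalization may differ, and the function names are hypothetical.

```python
def abs_sum(frame):
    """Sum of absolute sample amplitudes over one frame (assumed form of equation (2))."""
    return sum(abs(x) for x in frame)

def normalize_peaks(w_max1, w_max2, frame):
    """Divide both peak values by the frame's absolute-value sum
    (assumed form of equation (3)). Returns (aw1, aw2)."""
    s = abs_sum(frame)
    if s == 0:
        return 0.0, 0.0  # silent frame: nothing to normalize against
    return w_max1 / s, w_max2 / s

aw1, aw2 = normalize_peaks(6.0, 5.0, [1, -2, 3, -4])
```

Normalizing by a frame-level quantity makes the later comparison of AW1 and AW2 less dependent on the loudness of the frame.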

Next, the two data pairs (AW1, PIT1) and (AW2, PIT2) are temporarily stored in the first candidate buffer 7 and the second candidate buffer 8. The selector 9 judges which of the outputs of the first candidate buffer 7 and the second candidate buffer 8 is the correct pitch, selects it, and outputs the extracted pitch to output terminal 10.

The selection in selector 9 is made by one of the following two selection methods.

(Selection method 1)

The correct pitch 22 and the half-period pitch 23 in frame 1 of FIG. 2 correspond to 303 and 304, respectively, in FIG. 3. Because the difference 305 between AW1 of local maximum 301 and AW2 of local maximum 302 is large, the first candidate PIT1 is judged to be the pitch.

(Selection method 2)

Next, in frame 2, the half-period pitch 308 in FIG. 3b becomes the first candidate and the correct pitch 309 becomes the second candidate, so the conventional method would extract an incorrect pitch. However, the difference 310 between AW1 of local maximum 306 and AW2 of local maximum 307 is small, so a further condition is added: the candidate closer to the pitch PPIT extracted in the preceding frame is selected. In FIG. 3, PPIT is the pitch 303 extracted in frame 1. The difference 311 between pitch 303 and PIT1 (308) is larger than the difference 312 between pitch 303 and PIT2 (309), so the second candidate PIT2 (309) is selected.
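The two selection methods can be sketched as one decision function. The single threshold `small_gap` is a hypothetical stand-in for the patent's criterion for a "large" or "small" difference, which is defined by regions in FIG. 4; the fallback to the first candidate when no previous pitch exists is also an assumption.

```python
def select_pitch(aw1, pit1, aw2, pit2, prev_pitch, small_gap=0.1):
    """Choose between two pitch candidates.

    aw1, aw2   : normalized peak values (aw1 >= aw2)
    pit1, pit2 : lags giving those peaks
    prev_pitch : pitch chosen in the previous frame, or None
    small_gap  : assumed threshold separating 'large' from 'small' differences
    """
    if prev_pitch is None or aw1 - aw2 > small_gap:
        return pit1  # selection method 1: trust the largest peak
    # Selection method 2: take the candidate closer to the previous pitch.
    return pit1 if abs(pit1 - prev_pitch) <= abs(pit2 - prev_pitch) else pit2

first = select_pitch(0.90, 40, 0.50, 20, prev_pitch=None)
second = select_pitch(0.80, 20, 0.78, 40, prev_pitch=41)
```

In the second call the two peaks are nearly equal, so the candidate at lag 40, which is closer to the previous pitch of 41, is preferred over the half-period candidate at lag 20.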

Reference numeral 11 in FIG. 1 is a buffer that holds the pitch PPIT extracted in the preceding frame, which is needed for the above decision.
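The role of buffer 11 can be illustrated by running the decision over a sequence of frames while carrying the previous pitch forward. This is a sketch under assumptions: the per-frame candidate tuples are made-up data, and `small_gap` is a hypothetical threshold standing in for the regions of FIG. 4.

```python
def track_pitch(candidates, small_gap=0.1):
    """candidates: iterable of (aw1, pit1, aw2, pit2), one tuple per frame.
    Returns the list of pitch periods chosen frame by frame."""
    prev_pitch = None  # plays the role of buffer 11 (PPIT)
    chosen = []
    for aw1, pit1, aw2, pit2 in candidates:
        if prev_pitch is None or aw1 - aw2 > small_gap:
            pitch = pit1  # clear winner: take the first candidate
        else:
            # Ambiguous peaks: take the candidate closer to the previous pitch.
            pitch = min((pit1, pit2), key=lambda p: abs(p - prev_pitch))
        chosen.append(pitch)
        prev_pitch = pitch  # update the buffer for the next frame
    return chosen

frames = [(0.90, 40, 0.50, 20), (0.80, 20, 0.78, 41), (0.85, 42, 0.40, 21)]
pitches = track_pitch(frames)
```

The second frame has a half-period first candidate, and the stored pitch from the first frame steers the choice to the second candidate.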

FIG. 4 shows which selection method is to be used depending on the values of AW1 and AW2. In region 45, the difference between AW1 and AW2 is judged to be large and selection method 1 is adopted; in region 41, the difference between AW1 and AW2 is judged to be small and selection method 2 is adopted.

The three functions in the figure are: straight line 41, AW2 = AW1; straight line 42, AW2 = AW1 × [illegible] − 0.035; and straight line 43, AW2 = AW1 × [illegible] − 0.235.

Effects of the Invention

As described above, when selecting one of the two pitch candidates that give the first and second local maxima of the autocorrelation coefficient of the speech signal within a frame, the present invention applies different selection methods depending on the values of the candidates' normalized autocorrelation coefficients. This prevents the erroneous extraction of half-period or double-period pitches that conventional pitch extraction apparatus is prone to, and makes it possible to extract the correct pitch reliably.

Brief Description of the Drawings

FIG. 1 is a block diagram showing the configuration of an apparatus that carries out the method of the present invention. FIG. 2 is a waveform diagram showing the relationship between the speech signal and the frames, together with the pitch period that should be extracted and the pitch period that is prone to being extracted by mistake. FIGS. 3a and 3b show the normalized autocorrelation coefficients used for pitch extraction by the method of the present invention. FIG. 4 is a decision diagram that determines how one of the two pitch candidates is chosen in the method of the present invention.

1: input terminal; 2: AD converter; 3: data buffer; 4: autocorrelator; 5: absolute value adder; 6: local-maximum detector; 7, 8: buffers; 9: selector; 10: output terminal.

Name of agent: Patent attorney Toshio Nakao and one other

Claims

[Scope of Claims]

A pitch extraction method characterized by: dividing a speech waveform into sections each about several times the pitch period; finding a first local maximum, at which the autocorrelation coefficient in the section is largest, a second local maximum, and first and second pitch period candidates that give the respective local maxima; when the difference between the first and second local maxima is large, determining the first pitch period candidate to be the pitch period of the section; and when the difference between the local maxima is small, selecting whichever of the first and second candidates is closer to the pitch period already determined in the immediately preceding frame, and determining it to be the pitch period of the section.