JP3465941B2

JP3465941B2 - Pitch extraction device

Info

Publication number: JP3465941B2
Application number: JP32897793A
Authority: JP
Inventors: 裕久田崎; 正山浦
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1993-01-07
Filing date: 1993-12-24
Publication date: 2003-11-10
Anticipated expiration: 2018-11-10
Also published as: JPH06282296A

Abstract

PURPOSE: To provide a pitch extraction device for a speech signal that extracts the pitch with few extraction errors even with a small processing load and a short delay. CONSTITUTION: The pitch extraction device has a window means 19 that determines a window width, which is the time span from the start to the end of sampling in each frame of an input speech signal 1; a thinning-out means 5 that thins out the input samples and outputs the resulting data; a pitch period calculating means 6 that calculates the pitch period of the input speech signal 1; and a control means 4 that widens or narrows the window width and changes the thinning-out rate of the thinning-out means 5 by comparing the average pitch period over past frames with a prescribed value. The device also has a window position determining means, a noise removing means, and a correcting means that selects and outputs the pitch period.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a pitch extraction device that extracts, in real time, the pitch period, or the pitch frequency that is its reciprocal, from a digital speech signal.

[0002]

[Prior Art] In high-efficiency coding, in which a digital speech signal is compressed to a small amount of information for transmission or storage, and in speech synthesis, the speech signal is first divided into frames of a predetermined length and processed frame by frame. The accuracy with which the pitch period of each frame is extracted is critical to the quality of such coding and synthesis, and various high-accuracy pitch period extraction methods have been proposed for this reason. Speech transmission in particular requires real-time processing, so an extraction method is needed that has a small processing load and a short delay and is also highly accurate.

[0003] One such high-accuracy pitch extraction method is described in, for example, Japanese Patent Laid-Open No. 57-82897. FIG. 17 shows the configuration of this conventional pitch extraction method. In the figure, 1 is a speech signal, 5 is a thinning means, 6 is a pitch period evaluation function calculation means, 13 is a pitch period, 14 is a maximum value detection means, and 15 is a high-accuracy pitch period extraction means.

[0004] The operation of this conventional pitch extraction method is as follows. First, a digital speech signal that has been sampled at, for example, 8 kHz and low-pass filtered at 800 Hz is input to the thinning means 5 as the speech signal 1. The thinning means 5 performs a weighted addition over every two or more samples of the speech signal 1 and outputs the resulting thinned speech signal. The pitch period evaluation function calculation means 6 calculates the autocorrelation function of the thinned speech signal and outputs it to the maximum value detection means 14 as the pitch period evaluation function. The maximum value detection means 14 searches for the maximum of the input pitch period evaluation function and outputs the maximum value, its position, and the values of the pitch period evaluation function at the periods immediately before and after it to the high-accuracy pitch period extraction means 15. The high-accuracy pitch period extraction means 15 uses these three values of the pitch period evaluation function to calculate a more precise period, for example by parabolic approximation, and outputs it as the pitch period 13. With this configuration, a high-accuracy pitch period evaluation function need not be calculated over all periods, so the pitch period can be extracted with less memory and less processing.
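The chain described in paragraph [0004] (thinning by weighted addition, autocorrelation as the evaluation function, maximum search, three-point parabolic refinement) can be sketched as follows. This is a minimal illustration, not the cited publication's implementation: the function names, the equal weights used for the weighted addition, and the unnormalized autocorrelation are all assumptions made for the example.

```python
def thin(x, factor, weights=None):
    """Thin x by `factor`, replacing each group of samples by a weighted sum."""
    if weights is None:
        weights = [1.0 / factor] * factor  # assumption: plain averaging
    n_out = len(x) // factor
    return [
        sum(w * x[i * factor + k] for k, w in enumerate(weights))
        for i in range(n_out)
    ]


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[lag] for lag = 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def refine_peak(r, min_lag, max_lag):
    """Find the largest r[lag] in [min_lag, max_lag] and refine it by
    fitting a parabola through the peak and its two neighbours."""
    peak = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    if peak == 0 or peak + 1 >= len(r):
        return float(peak)  # no neighbour on one side: keep the integer lag
    a, b, c = r[peak - 1], r[peak], r[peak + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(peak)
    return peak + 0.5 * (a - c) / denom  # vertex of the fitted parabola
```

The refined lag is expressed in thinned samples, so it has to be multiplied by the thinning factor to obtain the period at the original sampling rate.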

[0005] Another prior-art method, reported to achieve higher extraction accuracy than the above, is described in Japanese Patent Laid-Open No. 62-194300. FIG. 18 shows the configuration of this conventional pitch extraction method. In the figure, 8 is a pitch period candidate calculation means, 16 is a partial evaluation function calculation means, 17 is a weight control means, and 18 is a determination means. Parts identical to those in FIG. 17 bear the same reference numerals.

[0006] The operation of this other conventional pitch extraction method is as follows. First, a digital speech signal sampled at, for example, 8 kHz and low-pass filtered at 800 Hz is input as the speech signal 1 to the thinning means 5 and the partial evaluation function calculation means 16. The thinning means 5 thins the speech signal 1 to, for example, 1/4 and outputs the thinned speech signal. The pitch period evaluation function calculation means 6 calculates the autocorrelation function of the thinned speech signal and outputs it to the pitch period candidate calculation means 8 as the pitch period evaluation function. The weight control means 17 determines a first control parameter P1 and a second control parameter P2 according to the past pitch period 13. The pitch period candidate calculation means 8 weights the input pitch period evaluation function on the basis of the first control parameter P1 and extracts a plurality of pitch period candidates from the maxima of the weighted pitch period evaluation function. The partial evaluation function calculation means 16 calculates, for a part of the speech signal 1, the pitch period evaluation function at each pitch period candidate and outputs it to the determination means 18 as the partial evaluation function. The determination means 18 weights the partial evaluation function on the basis of the second control parameter P2 and then outputs the period that gives its maximum value as the pitch period 13. With this configuration, the result is well continuous with the previously extracted pitch period sequence, and the pitch period can be extracted more accurately than with the method disclosed in Japanese Patent Laid-Open No. 57-82897.

[0007] A method that extracts a highly accurate and well-continuous pitch period by allowing a small delay is described in D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder" (IEEE Trans. Acoust., Speech, Signal Processing, Aug. 1988, pp. 1223-1235). FIG. 19 shows the configuration of this conventional pitch extraction method. In the figure, 101 is a speech signal and 102 is the extracted pitch period. 103 is a pitch period evaluation function calculation means, 104 is a buffer that stores the pitch period evaluation function, and 105 is a backward prediction means. 106 is a delay circuit that delays the pitch period evaluation function by one frame and outputs it, and 107 is a delay circuit that delays the extracted pitch by one frame and outputs it. 108 is a pitch period evaluation function extraction means, 109 is a buffer that stores the value of the pitch period evaluation function at the pitch period extracted in each past frame, and 110 is a forward prediction means. 111 is a correction means.

[0008] The operation of this conventional pitch extraction device is as follows. The pitch period evaluation function calculation means 103 calculates a pitch period evaluation function for each frame from the speech signal 101 and outputs it to the buffer 104. The buffer 104 stores the pitch period evaluation functions for N frames, beginning with the frame targeted for pitch extraction. It outputs the pitch period evaluation functions of these N frames to the backward prediction means 105, and outputs the pitch period evaluation function of the target frame to the forward prediction means 110 and the delay circuit 106.

[0009] From the pitch period evaluation functions of the N frames input from the buffer 104, the backward prediction means 105 obtains the backward reliability CE_B(P_0) of a pitch period P_0 of the target frame, for example according to equation (1). Here, E_n(P_n) is the pitch period evaluation function at period P_n of the frame n frames after the target frame, and P_n (n = 1, 2, ..., N-1) are determined so as to maximize CE_B(P_0). The range of each P_n (n = 1, 2, ..., N-1) is restricted, for example according to equation (2), so that the periods are continuous across the N frames.

[0010]

[Equation 1]

[0011] Next, the P_0 that maximizes this backward reliability is searched for and taken as the backward predicted pitch period P_B. The backward predicted pitch period P_B and the corresponding backward reliability CE_B(P_B) are then output to the correction means 111.

[0012] The delay circuit 106 outputs the pitch period evaluation function E_-1 of the frame one frame before the target frame to the pitch period evaluation function extraction means 108. The delay circuit 107 outputs the pitch period P_-1 extracted one frame before the target frame to the pitch period evaluation function extraction means 108 and the forward prediction means 110. From the pitch period evaluation function E_-1 input from the delay circuit 106 and the pitch period P_-1 input from the delay circuit 107, the pitch period evaluation function extraction means 108 obtains the value E_-1(P_-1) of the pitch period evaluation function at the pitch period P_-1 extracted in the previous frame, and outputs it to the buffer 109. The buffer 109 stores the values E_-m(P_-m) (m = 1, 2, ..., M-1) input from the pitch period evaluation function extraction means 108, that is, the evaluation function values at the pitch periods P_-m extracted in past frames, for the M-1 frames immediately preceding the target frame, and outputs these M-1 values to the forward prediction means 110. The forward prediction means 110 obtains the forward predicted pitch period P_F from the pitch period evaluation function E_0 of the target frame input from the buffer 104, for example as the period that gives its maximum value. The range of P_F is restricted, for example according to equation (3), so that P_F is continuous with the pitch period P_-1 of the immediately preceding frame input from the delay circuit 107. Next, the forward reliability CE_F(P_F) is obtained, for example according to equation (4). The forward predicted pitch period P_F and the corresponding forward reliability CE_F(P_F) are then output to the correction means 111.

[0013]

[Equation 2]

[0014] The correction means 111 compares the backward reliability CE_B(P_B) input from the backward prediction means 105 with the forward reliability CE_F(P_F) input from the forward prediction means 110. For example, if CE_B(P_B)/N > CE_F(P_F)/M, it selects the backward predicted pitch period P_B input from the backward prediction means 105; otherwise it selects the forward predicted pitch period P_F input from the forward prediction means 110. The selected period is output as the final pitch period 102 and is also output to the delay circuit 107.
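The example decision rule of paragraph [0014] is small enough to write out directly. The sketch below assumes the two reliabilities have already been computed by equations (1) and (4), which are not reproduced in this text; the function and parameter names are hypothetical.

```python
def select_pitch(p_b, ce_b, n_frames, p_f, ce_f, m_frames):
    """Return P_B when CE_B(P_B)/N > CE_F(P_F)/M, otherwise P_F.

    p_b, ce_b : backward predicted pitch period and its reliability
    p_f, ce_f : forward predicted pitch period and its reliability
    n_frames  : N, the number of frames behind the backward reliability
    m_frames  : M, the number of frames behind the forward reliability
    """
    if ce_b / n_frames > ce_f / m_frames:
        return p_b
    return p_f
```

Dividing by N and M puts the two reliabilities on a per-frame footing before they are compared, which matters because they are accumulated over different numbers of frames.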

[0015]

[Problems to Be Solved by the Invention] Pitch extraction devices using the conventional methods disclosed in Japanese Patent Laid-Open Nos. 57-82897 and 62-194300 apply a fixed thinning process even though the distribution of pitch periods differs considerably from speaker to speaker. To maintain high extraction accuracy for all speakers, the thinning rate cannot be made very high, so the processing load is not reduced sufficiently. When the sampling frequency of the speech signal is 8 kHz, the conventional examples thin to about 1/4, but for a female speech signal with a short pitch period, thinning to 1/4 causes extraction errors to increase sharply. Thinning to 1/2, on the other hand, does not reduce the processing load enough.

[0016] Pitch extraction devices using the conventional methods disclosed in Japanese Patent Laid-Open Nos. 57-82897 and 62-194300 also produce a very large number of extraction errors when noise is mixed into the speech signal, particularly periodic noise whose period lies within the pitch period search range. Furthermore, in the device using the method of Japanese Patent Laid-Open No. 62-194300, the first control parameter P1 is used to extract pitch period candidates that are highly continuous with previously extracted pitch periods, and the second control parameter P2 applies a weighting that makes it easier to select a period that is an integer fraction of a pitch period candidate as the final pitch period 13. However, the peaks of a correlation function are generally said to occur readily at periods that are an integer multiple of an integer fraction of the actual pitch period, so the weighting by the second control parameter causes extraction errors with a frequency that is by no means small. The effect of such an extraction error then propagates to the next frame through the weighting by the first control parameter.

[0017] The conventional pitch extraction device that allows a small delay extracts the pitch by evaluating several frames before and after the target frame together with the target frame. However, to maintain the continuity of the pitch period and extract the pitch stably, the numbers of preceding and following frames M and N included in the evaluation must be large, which increases the delay required for pitch extraction. If the number of following frames N is made small to reduce the delay, pitch extraction errors become more likely. Moreover, because the pitch is always extracted with the continuity of the pitch period taken into account, once an error occurs it propagates to the extraction results of subsequent frames. Finally, because the pitch period is always calculated using only evaluation values taken across several frames, frames unsuitable for pitch extraction, such as unvoiced portions at the beginning or end of a word, may be included in the evaluation; when the reliability is calculated across both voiced and unvoiced portions, the extraction result may be entirely unrelated to the actual pitch period.

[0018] The present invention has been made to solve these problems. Its object is to realize a pitch extraction device that, compared with conventional devices, extracts a more accurate pitch period for the same processing load, requires less processing for the same extraction accuracy, and extracts the pitch stably with few propagated errors even under heavy noise.

[0019]

[Means for Solving the Problems] A pitch extraction device according to the present invention comprises: window means for determining the width of a window, which is the time span from the start to the end of sampling in each frame of the input speech signal; pitch period calculating means for calculating the pitch period of the speech signal within the window; and window position determining means for shifting the window of the window means in the time direction within the frame and controlling the window position so that the power of the speech signal within the window is maximized.
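The window position determining means of paragraph [0019] can be illustrated with a sliding-sum search. This is only a sketch under assumptions: the text does not say how the search is carried out, so the one-sample step, the exhaustive scan over the frame, and the function name are choices made for the example.

```python
def best_window_start(frame, width):
    """Return the start index of the `width`-sample window whose
    signal power (sum of squared samples) is largest within `frame`."""
    if width > len(frame):
        raise ValueError("window is wider than the frame")
    power = sum(s * s for s in frame[:width])
    best_start, best_power = 0, power
    for start in range(1, len(frame) - width + 1):
        # Slide by one sample: add the newest sample, drop the oldest.
        power += frame[start + width - 1] ** 2 - frame[start - 1] ** 2
        if power > best_power:
            best_start, best_power = start, power
    return best_start
```

Updating the running sum instead of recomputing it keeps the search linear in the frame length, which fits the stated aim of a small processing load.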

[0020] The device further comprises: thinning means for thinning the sampled data of the input speech signal within the window and outputting the result; and control means that, when the average pitch period over past frames is larger than a predetermined value, widens the window and increases the thinning of the thinning means so that the output is coarser, and, when the average pitch period is smaller than the predetermined value, narrows the window and reduces the thinning so that the output is finer.
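A minimal sketch of the control rule in paragraph [0020] follows. The concrete settings are assumptions chosen for illustration: the window widths 256 and 512 are the sample counts mentioned as examples in the embodiment, and the thinning factors 2 and 4 mirror the 1/2 and 1/4 rates discussed for the prior art. The text does not specify what happens when the average equals the threshold, so the sketch arbitrarily treats that case as "small".

```python
def control_window_and_thinning(past_periods, threshold,
                                wide=(512, 4), narrow=(256, 2)):
    """Return (window_width, thinning_factor) from the mean of the
    pitch periods extracted in past frames.

    wide / narrow are hypothetical (width, factor) settings.
    """
    mean_period = sum(past_periods) / len(past_periods)
    if mean_period > threshold:
        return wide    # long periods: wider window, coarser sampling
    return narrow      # short periods: narrower window, finer sampling
```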

[0021] A pitch extraction device according to the invention also comprises: pitch period evaluation function calculating means for performing correlation analysis on the sampled values of each frame of the input speech signal and outputting the result as a pitch period evaluation function; speech state determining means for analyzing the input speech signal and classifying it into a plurality of categories including voiced sound, unvoiced sound, and silence; and noise removing means for calculating the average of the pitch period evaluation functions of the frames judged silent by the speech state determining means, taking this average as a noise evaluation function, and subtracting the noise evaluation function from the pitch period evaluation function of the speech signal.
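The noise removing means of paragraph [0021] amounts to a lag-by-lag average followed by a subtraction. The sketch below assumes each evaluation function is stored as a list indexed by lag and that the silent frames have already been identified by some speech state classifier; both function names are hypothetical.

```python
def noise_evaluation(silent_frame_functions):
    """Average the evaluation functions of the silent frames, lag by lag,
    to obtain the noise evaluation function."""
    count = len(silent_frame_functions)
    return [sum(values) / count for values in zip(*silent_frame_functions)]


def remove_noise(evaluation, noise):
    """Subtract the noise evaluation function from one frame's
    pitch period evaluation function."""
    return [e - n for e, n in zip(evaluation, noise)]
```

A periodic noise component produces the same correlation peaks in silent frames as in speech frames, so subtracting the silent-frame average suppresses those peaks before the maximum search.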

[0022] A pitch extraction device according to the invention also comprises: pitch period evaluation function calculating means for performing correlation analysis on the sampled values of each frame of the input speech signal and outputting the result as a pitch period evaluation function; pitch period candidate calculating means for extracting, as pitch period candidates, the periods at which the pitch period evaluation function has peaks; prediction means for predicting the pitch period of the current frame from the pitch periods extracted in past frames; corrected pitch period candidate calculating means for calculating corrected pitch period candidates using the predicted pitch period output by the prediction means and the pitch period evaluation function; and correction means for selecting and outputting a desirable pitch period from among the outputs of the pitch period candidate calculating means, the prediction means, and the corrected pitch period candidate calculating means.

[0023] A pitch extraction device according to the invention also comprises: speech state determining means for analyzing the input speech signal and classifying it into a plurality of categories including voiced sound and silence; and reliability determining means for outputting the likelihood that the extracted pitch period is correct, based on the determinations made by the speech state determining means for the speech signals of a plurality of frames and on the final pitch period selection result.

[0024] A pitch extraction device according to the invention further comprises: pitch period evaluation function calculating means for performing correlation analysis on the sampled values of each frame of the input speech signal and outputting the result as a pitch period evaluation function; pitch period candidate calculating means for extracting, as pitch period candidates, the periods at which the pitch period evaluation function has peaks; forward prediction means for calculating a forward predicted pitch period of the current frame from the pitch periods extracted in past frames and the pitch period evaluation function of the current frame; backward prediction means for calculating a backward predicted pitch period of the current frame from the pitch period evaluation functions of the current and future frames; power calculating means for calculating power information of the input speech signal; and correction means for selecting and outputting, on the basis of this power information, a desirable pitch period from among the outputs of the pitch period candidate calculating means, the forward prediction means, and the backward prediction means.

[0025] A pitch extraction device according to the invention also comprises: pitch period evaluation function calculating means for performing correlation analysis on the sampled values of each frame of the input speech signal and outputting the result as a pitch period evaluation function; pitch prediction means for calculating a predicted pitch period of the current frame using the pitch period evaluation functions of an arbitrary number M of past frames, the current frame, and an arbitrary number N of future frames; and frame number control means for determining the speech state of the input speech signal and controlling, according to the result, the numbers of frames M and N input to the pitch prediction means.

[0026] A pitch extraction device according to the invention also comprises: pitch extracting means for calculating the pitch period of the input speech signal; power calculating means for calculating the power of the input speech signal for each extracted pitch period; and erroneous pitch determining means for judging the pitch correct when the change in the power information output by the power calculating means is within a predetermined value, and judging the pitch period erroneous when the change is equal to or greater than the predetermined value.
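One possible reading of the erroneous pitch determination in paragraph [0026] is sketched below. The text only states that the change in per-period power is compared with a predetermined value; measuring that change as the relative difference between consecutive period-length segments is an assumption of this example, as are the function and parameter names.

```python
def pitch_is_plausible(signal, period, max_change):
    """Judge an extracted pitch period by how steadily the signal power
    evolves from one period-length segment to the next."""
    powers = [
        sum(s * s for s in signal[i:i + period]) / period
        for i in range(0, len(signal) - period + 1, period)
    ]
    for prev, cur in zip(powers, powers[1:]):
        scale = max(prev, cur)
        # Assumed measure of "change": relative difference of adjacent powers.
        if scale > 0.0 and abs(cur - prev) / scale >= max_change:
            return False  # power jumps between cycles: treat as an error
    return True
```

When the period is correct, every segment covers exactly one cycle and the segment powers change smoothly; a wrong period makes the segments drift across cycles, so their powers fluctuate.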

[0027]

[Operation] In the pitch extraction device of the present invention, the average of the pitch periods extracted in past frames is calculated. When this average is larger than a set value, that is, when the frequency of the input speech signal is judged to be low, thinning is applied and the sampling becomes coarse. Conversely, when the average of the extracted pitch periods is smaller than the set value, as with a female voice for example, the analysis window is narrowed and the sampling is made relatively fine.

[0028] In the pitch extraction device of the invention, the analysis window is also moved, within a predetermined range on the time axis, so that a given evaluation value such as the signal power is maximized, that is, in the direction that avoids extraction errors.

[0029] In the pitch extraction device of the invention, the average of the pitch period evaluation functions of frames whose input is judged silent is calculated. This estimated correlation function of the noise signal is subtracted from the correlation function of the noisy speech signal, which removes the influence of the noise.

[0030] In the pitch extraction device of the invention, several pitch periods are predicted from the pitches of past frames, and a desirable pitch period is selected on the basis of a defined evaluation value.

[0031] In the pitch extraction device of the invention, the state of the input speech signal is classified by category, and the reliability of the predicted pitch period is judged from this classification and the final pitch period selection result. This reliability value influences the selection of the pitch period.

[0032] In the pitch extraction device of the invention, the forward, normal, and backward predicted pitch periods obtained from past, current, and future frames are combined on the basis of the power of the input speech signal, that is, on the basis of the effective portion of the speech, and the pitch period is selected.

[0033] In the pitch extraction device of the invention, the state of the input speech is examined when the predicted pitch period is calculated, and the number of frames used to calculate the evaluation function is chosen according to its category.

【００３４】Further, in the pitch extracting device of the present invention, the power of the input voice signal is calculated for each period on the basis of the extracted pitch period, and whether the extracted pitch is correct is judged from the change of the calculated power over time.

【００３５】[0035]

【実施例】EMBODIMENTS. Embodiment 1. An embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram of a pitch extracting device according to this embodiment. The new parts in the figure are the window width / thinning control means 4 and the window means 19. The thinning means 5, the pitch period evaluation function calculation means 6, and the maximum value detection means 14 are equivalent to the conventional ones, except that the thinning interval of the thinning means 5 is variable under command from the control means 4. The window means 19 cuts out of x(n), the input voice signal 1, a signal y(n) = w(n)·x(n), n = 1 to J, of J samples, positioned for example so that its center coincides with that of the current frame of length K. In other words, it determines the time width over which the input voice signal is sampled. For a signal sampled with an 8 kHz clock, for example, this width is set by whether J is 256 samples or 512 samples. w(n) is called the window function: w(n) = 1 is a rectangular window, and w(n) = α + (1 − α) cos(2π(n − K/2)/J) with α = 0.54 is a Hamming window.
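
The windowing step above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names and the `center` parameter are assumptions made for illustration. The text writes the cosine offset as K/2, and since it does not fix where n = 1 falls relative to the frame, the sketch takes the offset as an argument and, in the usage lines, assumes a window centered on itself (J/2).

```python
import math

def window_coefficients(J, center, alpha=0.54):
    """Return w(1)..w(J) with w(n) = alpha + (1 - alpha) * cos(2*pi*(n - center) / J)."""
    return [alpha + (1.0 - alpha) * math.cos(2.0 * math.pi * (n - center) / J)
            for n in range(1, J + 1)]

def cut_out(x, w):
    """Return y(n) = w(n) * x(n) for n = 1..J, where J = len(w) and x has at least J samples."""
    return [w_n * x_n for w_n, x_n in zip(w, x)]

# Hypothetical usage: J = 256 samples, as in the 8 kHz example of the text.
rectangular = [1.0] * 256                        # w(n) = 1
tapered = window_coefficients(256, center=128)   # assumed centering at J/2
```

With `center` equal to J/2 the taper peaks at 1.0 in the middle of the window and falls to 0.08 at its end.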

【００３６】The operation of the embodiment shown in the figure is described below. First, digital speech that has been sampled at, for example, 8 kHz and low-pass filtered at 800 Hz is input to the window means 19 as the voice signal 1. The window means 19 may use either a rectangular window or a Hamming window. It fixes a width, that is, a suitable sample length, and cuts samples 1 to J of y(n) out of the input voice signal 1. The cut-out samples y(1) to y(J) enter the thinning means 5 and are resampled at a thinning rate of 1/N. The output signal z(n) of the thinning means 5 therefore has J/N samples, n = 1 to J/N, with z(n) = y((n − 1)N + 1). The thinning means 5 may instead use weighted addition. In that case, with weighting coefficients h(i) and addition number a, the output is z(n) = Σ h(i) y((n − 1)a + i). The input voice signal 1, made coarser by this thinning, undergoes correlation analysis such as autocorrelation in the pitch period evaluation function calculation means 6. The maximum value detection means 14 then searches for the maximum of the pitch period evaluation function and calculates the pitch period 13 from the position of that maximum.
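
As a rough illustration of the chain described above (thinning, correlation analysis, maximum search), the following Python sketch may help. It is a sketch under assumptions, not the patent's implementation: plain autocorrelation is used as the evaluation function (the text only says "such as autocorrelation"), the summation range of the weighted form is taken to be i = 1 to len(h), and the lag search limits `min_lag` and `max_lag` are hypothetical parameters.

```python
def decimate(y, N):
    """z(n) = y((n-1)N + 1), n = 1..J/N in the text's 1-based indexing."""
    return y[::N][: len(y) // N]

def weighted_decimate(y, h, a):
    """z(n) = sum_i h(i) * y((n-1)a + i), assuming i runs over 1..len(h)."""
    count = (len(y) - len(h)) // a + 1
    return [sum(h[i] * y[n * a + i] for i in range(len(h))) for n in range(count)]

def autocorrelation(z, max_lag):
    """Evaluation function r(lag) = sum_n z(n) * z(n + lag) for lag = 0..max_lag."""
    return [sum(z[n] * z[n + lag] for n in range(len(z) - lag))
            for lag in range(max_lag + 1)]

def pitch_period(z, min_lag, max_lag, N):
    """Lag of the largest evaluation value, converted back to original samples."""
    r = autocorrelation(z, max_lag)
    best = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return best * N
```

Because the search runs on the thinned signal, the number of lags and the length of each correlation sum both shrink by roughly a factor of N, which is where the processing reduction comes from.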

【００３７】The window width / thinning control means 4 stores these past pitch periods and calculates their average. When the average is at or above a predetermined threshold, for example 60 samples, it commands the window means 19 to lengthen the analysis window, for example to 512 samples, and commands the thinning means 5 to raise the thinning rate, for example to 1/4, that is, keeping one sample in four. When the average is below the threshold, it commands a shorter analysis window, for example 256 samples, and a thinning rate of 1/2. It has been confirmed experimentally that for speakers with a long pitch period, for example men, the extraction accuracy of the pitch period does not deteriorate even when the voice signal is thinned considerably. Likewise, for speakers with a short pitch period, for example women, it has been confirmed that the extraction accuracy does not deteriorate even when the analysis window used for sampling is shortened. FIG. 2 shows the results of these experiments, which used five male and five female speakers with 81 sentences each. The control means 4 may store and update the past pitch periods either as a moving average or as a simple arithmetic mean of the past values. With this configuration, the amount of processing needed to determine the pitch period is reduced for the same accuracy.
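
The control rule above can be sketched in a few lines of Python. The numeric values (threshold 60, windows of 512 and 256 samples, thinning steps 4 and 2) are the examples given in the text; the function name and the `history` length of the moving average are assumptions for illustration only.

```python
def choose_settings(past_periods, threshold=60, history=8):
    """Return (window_length, thinning_step) from a moving average of past pitch periods.

    past_periods must be non-empty and ordered oldest to newest.
    """
    recent = past_periods[-history:]
    mean = sum(recent) / len(recent)
    if mean >= threshold:
        return 512, 4   # long pitch period: longer window, keep 1 sample in 4
    return 256, 2       # short pitch period: shorter window, keep 1 sample in 2
```

Either setting leaves 128 samples after thinning, so the cost of the correlation analysis stays roughly constant across speakers.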

【００３８】Embodiment 2. Another embodiment of the pitch extracting device of the present invention is described. FIG. 3 is a block diagram of the pitch extracting device of this embodiment. The new part in the figure is the window position determining means 2. The window means 19, control means 4, thinning means 5, pitch period evaluation function calculation means 6, and maximum value detection means 14 are the same elements as in Embodiment 1. In general, extraction errors tend to occur when the evaluation function is calculated over a span that includes the onset or the ending of a voiced sound, so avoiding such transient portions avoids extraction errors. Embodiment 2 exploits this by shifting the analysis window along the time axis toward where the signal power is large, thereby reducing extraction errors.

【００３９】Next, the operation of the device of this embodiment is described with reference to FIG. 3. There are two ways of determining the window position. As concrete values, consider a frame length K = 160 samples and an analysis window length J = 240 to 480 samples. In the first method, the range over which the window may move is fixed. The range spans the current frame and the frames before and after it, for example a fixed L = 3K = 480 samples, and J samples of the signal are taken with the window starting at a position shifted by some number of samples from the first sample of the range. The power within the window (for example the sum of absolute amplitudes or the peak value of the correlation function) is obtained, and the position that gives the maximum power is adopted as the window position. When J = L, that is, when the analysis window length equals the range of movement, only one window position exists and the window need not be moved. In the second method, the number of samples by which the window may move is fixed. For example, the movable amount L' = 120 samples is fixed regardless of the analysis window length, and sampling is performed at window positions shifted forward and backward by some number of samples from the position whose center coincides with that of the current frame. The position that gives the maximum power within the window is adopted. The subsequent operation is the same as described in Embodiment 1 and is not repeated here. With the window position determining means, compared with a fixed window position, the average error rate improved from 1.19% to 1.13% for men and from 0.60% to 0.32% for women.
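
A minimal Python sketch of the first method (fixed search range) follows. It assumes the sum of absolute amplitudes as the power measure, one of the two examples named in the text; the function name and the `step` parameter are hypothetical.

```python
def best_window_start(segment, J, step=1):
    """Return the 0-based start, within the L-sample search range `segment`,
    of the J-sample window with the largest sum of absolute amplitudes."""
    L = len(segment)
    best_start, best_power = 0, -1.0
    for start in range(0, L - J + 1, step):
        power = sum(abs(v) for v in segment[start:start + J])
        if power > best_power:
            best_start, best_power = start, power
    return best_start
```

When J equals L the loop runs once and returns 0, matching the remark that the window then has only one position. The second method differs only in the set of start positions tried: offsets of up to L' samples on either side of the frame-centered position instead of every position in a fixed range.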

【００４０】Embodiment 3. Another embodiment of the pitch extracting device of the present invention is described. FIG. 4 is a block diagram of the pitch extracting device of this embodiment. The new parts in the figure are the voice state determination means 3 and the noise removal means 7. The window means 19, control means 4, thinning means 5, pitch period evaluation function calculation means 6, and maximum value detection means 14 are the same elements as in Embodiment 1. In general, the cross-correlation between the voice signal and the noise signal is negligible in many cases. The correlation function of a voice signal with superimposed noise is then the sum of the correlation function of the voice signal and the correlation function of the noise signal. Therefore, if the correlation function of the noise signal can be estimated, subtracting the estimate from the correlation function of the noisy input voice signal removes the influence of the noise. This embodiment is based on this observation.
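
The reasoning above can be written out as a short derivation. The symbols are assumptions introduced here for illustration, not notation from the patent: $s(n)$ is the clean voice signal, $v(n)$ the noise, $y(n)$ the observed input, and $R(\tau)$ a correlation function at lag $\tau$.

```latex
\begin{align*}
y(n) &= s(n) + v(n) \\
R_{y}(\tau) &= \sum_{n} y(n)\,y(n+\tau)
             = R_{s}(\tau) + R_{v}(\tau) + R_{sv}(\tau) + R_{vs}(\tau) \\
R_{y}(\tau) &\approx R_{s}(\tau) + R_{v}(\tau)
             \qquad \text{when the cross terms } R_{sv},\, R_{vs} \text{ are negligible} \\
R_{s}(\tau) &\approx R_{y}(\tau) - \hat{R}_{v}(\tau)
             \qquad \text{with } \hat{R}_{v} \text{ an estimate of the noise correlation}
\end{align*}
```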

【００４１】Next, the operation of the device of this embodiment is described with reference to FIG. 4. The voice state determination means 3 analyzes the input voice signal 1 and classifies the current frame into one of several states, for example speech present and silence. A frame judged to contain a voice signal is speech present, and a frame judged to contain only a noise signal other than a voice signal is silence. A finer classification into voiced, noisy voiced, unvoiced, and silence may also be used. The classification result is output to the noise removal means 7. A noisy voiced sound is speech with characteristics intermediate between voiced and unvoiced sound. The processing of the voice signal 1 through the window means 19, the thinning means 5, and the pitch period evaluation function calculation means 6 is the same as in Embodiment 1. The noise removal means 7 treats the signal from the voice state determination means 3 as one of, for example, three cases: silence, unvoiced, or voiced or noisy voiced. For silence, the noise removal means 7 updates its internally stored noise evaluation function by a moving average. The noise evaluation function is a function of the period, as is the pitch period evaluation function. For voiced or noisy voiced sound, the stored noise evaluation function is subtracted from the pitch period evaluation function supplied by the pitch period evaluation function calculation means 6, and the result is output to the maximum value detection means 14. For unvoiced sound, the noise removal means 7 does not update the noise evaluation function, and since the pitch period is normally not extracted, it performs no subtraction either.
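
The three-way handling above can be sketched as a small Python class. This is an illustrative sketch only: the class name, the state labels, and the moving-average length `history` are assumptions, and the evaluation function is represented as a list indexed by lag.

```python
from collections import deque

class NoiseRemover:
    """Keeps a moving-average noise evaluation function and subtracts it on voiced frames."""

    def __init__(self, num_lags, history=4):
        self.recent = deque(maxlen=history)   # evaluation functions of recent silent frames
        self.noise = [0.0] * num_lags         # current noise evaluation function

    def process(self, state, evaluation):
        if state == "silence":
            # Update the noise estimate; no pitch is extracted for this frame.
            self.recent.append(list(evaluation))
            self.noise = [sum(col) / len(self.recent) for col in zip(*self.recent)]
            return None
        if state in ("voiced", "noisy_voiced"):
            # Subtract the stored noise estimate, lag by lag.
            return [e - n for e, n in zip(evaluation, self.noise)]
        return None   # unvoiced: neither update nor subtraction
```

The returned list would then go to the maximum search, while `None` signals that no pitch is extracted for the frame.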

【００４２】The influence of noise can be removed in this way. The noise evaluation function may also be updated by a simple arithmetic mean. In an experiment with automobile driving noise at an average S/N of 10 dB, the average error rate fell from 17.6% without the noise removal means to 4.7% with this embodiment.

【００４３】Embodiment 4. Another embodiment of the pitch extracting device of the present invention is described. FIG. 5 is a block diagram of the pitch extracting device of this embodiment. The new parts in the figure are the pitch period candidate calculation means 8, the prediction means 9, the corrected pitch period candidate calculation means 11, and the correction means 12. The window means 19, control means 4, thinning means 5, and pitch period evaluation function calculation means 6 are the same elements as in Embodiment 1. In this embodiment, the period that gives the maximum of the pitch period evaluation function is not simply taken as the pitch period. A prediction is also made from past values, several pitch period candidates are considered, a corrected pitch period candidate that differs little from the predicted value is found among them, and the final pitch period is chosen, according to certain conditions, from among the pitch period candidate, the corrected pitch period candidate, and the predicted pitch period.

【００４４】FIG. 6 is a flowchart explaining the operation of the new components of FIG. 5. The operation of the device of this embodiment is described with reference to FIGS. 5 and 6. Parts that operate as in Embodiment 1 are not described again. The pitch period evaluation function calculation means 6 obtains the pitch period evaluation function of the voice signal 1. As shown in FIG. 6, the pitch period candidate calculation means 8 searches this function in step S1 and, in step S2, outputs the maximum value X and its position (the pitch period candidate) to the correction means 12. In step S3, the prediction means 9 calculates the predicted pitch period of the current frame from the series of pitch periods of the past M frames, and outputs it to the corrected pitch period candidate calculation means 11 and to the correction means 12. The predicted pitch period can be calculated as the average of the M pitch periods or as an n-th order approximate prediction with n no greater than M.

【００４５】In step S4, the corrected pitch period candidate calculation means 11 searches the pitch period evaluation function from the pitch period evaluation function calculation means 6 for its local maxima P_1, P_2, ... and their periods q_1, q_2, .... In step S5, for each local maximum it calculates the distortion d_i = |q − q_i| / P_i^2 with respect to the predicted pitch period q from the prediction means 9. In step S6, it finds the position that minimizes this distortion (the corrected pitch period candidate). In step S7, the correction means 12 judges whether the maximum value X is larger or smaller than a predetermined threshold 1. In step S8, it calculates the difference between the corrected pitch period candidate and the predicted pitch period, and in step S9 it judges whether this difference is larger or smaller than a predetermined threshold 2. According to these judgments, one of the pitch periods is selected in steps S10 to S12, and in step S13 the finally determined value is output as the pitch period 13. This value is also stored in step S14 for use in calculating the predicted pitch period.
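
Steps S4 to S6 can be sketched in Python as follows. The sketch covers only the search for the corrected pitch period candidate; the selection in steps S10 to S12 depends on FIG. 6 and is not reproduced. The function names are hypothetical, and skipping local maxima with non-positive values is an assumption added here to avoid division by zero, not something the text states.

```python
def local_maxima(evaluation):
    """Return (lag, value) pairs where the evaluation function has a local maximum."""
    return [(lag, evaluation[lag])
            for lag in range(1, len(evaluation) - 1)
            if evaluation[lag] > evaluation[lag - 1]
            and evaluation[lag] >= evaluation[lag + 1]]

def corrected_candidate(evaluation, predicted):
    """Return the local-maximum lag q_i minimizing d_i = |q - q_i| / P_i**2, or None."""
    best_lag, best_distortion = None, float("inf")
    for lag, peak in local_maxima(evaluation):
        if peak <= 0.0:          # assumption: ignore non-positive peaks
            continue
        distortion = abs(predicted - lag) / (peak * peak)
        if distortion < best_distortion:
            best_lag, best_distortion = lag, distortion
    return best_lag
```

Dividing by the squared peak value means a strong peak somewhat farther from the prediction can win over a weak peak that is closer.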

【００４６】Embodiment 5. Another embodiment of the pitch extracting device of the present invention is described. FIG. 7 is a block diagram of the pitch extracting device of this embodiment. The new part in the figure is the reliability determination means 10. The other components are the same as in the preceding embodiments, so their content and operation are not described again. The processing of the correction means 12, however, differs slightly from that of FIGS. 5 and 6, and the processing in this embodiment is shown as a flowchart in FIG. 8. In general, when past extraction results are used in extracting the current frame, knowing how reliable the pitch periods extracted in past frames are, and how likely the pitch period candidate of the current frame is to be correct, allows both to be used together, so that good pitch extraction with little propagation of extraction errors can be expected.

【００４７】The operation of the device of this embodiment is described with reference to FIGS. 7 and 8. Parts that operate as in the preceding embodiments are not described again. The voice state determination means 3 analyzes the voice signal 1, classifies the current frame as voiced, noisy voiced, unvoiced, or silence, and outputs the result to the reliability determination means 10. The reliability determination means 10 receives the classification result from the voice state determination means 3 together with the final selection result and the selection process of the correction means 12. From these inputs it judges the reliability of the pitch period 13 and outputs the judgment to the prediction means 9. For example, the reliability of the pitch period 13 is judged low in three cases: when processing proceeded from step S23 to step S24 in FIG. 8, described later; when the voice state determination means 3 judged the voice signal 1 of the immediately preceding frame to be unvoiced or silence; and when it judged the voice signal 1 of the current frame to be noisy voiced. In all other cases the reliability is judged high. Alternatively, the reliability may simply be judged low when the power of the voice signal 1 is at or below a predetermined value.

【００４８】The prediction means 9 calculates the predicted pitch period of the current frame from the series of pitch periods of the past M frames and the series of reliabilities of those pitch periods as judged by the reliability determination means 10, and outputs the predicted pitch period to the corrected pitch period candidate calculation means 11 and to the correction means 12. For example, if the past M frames contain a pitch period of high reliability, the predicted pitch period is calculated as the value of the most recent highly reliable pitch period itself, as the average of all highly reliable pitch periods, or by an n-th order (n no greater than M) approximate prediction. If the past M frames contain no highly reliable pitch period, the predicted pitch period is calculated as the average of the M pitch periods. The correction means 12 selects, for example according to the flowchart of FIG. 8, one of the pitch period candidate input from the pitch period candidate calculation means 8, the predicted pitch period input from the prediction means 9, and the corrected pitch period candidate input from the corrected pitch period candidate calculation means 11, and outputs it as the pitch period 13.
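
The reliability-aware prediction above might be sketched as below. The sketch implements two of the three options named in the text (most recent reliable value, and mean of reliable values) plus the fallback; the n-th order approximate prediction is omitted because the text does not specify it. The function name and the `mode` argument are hypothetical.

```python
def predict_pitch(past_periods, past_reliable, mode="nearest"):
    """Predict the current pitch period from the past M frames.

    past_periods and past_reliable are ordered oldest to newest and have equal,
    non-zero length; past_reliable holds True for a highly reliable pitch period.
    """
    reliable = [p for p, ok in zip(past_periods, past_reliable) if ok]
    if not reliable:
        # No reliable frame: fall back to the plain average of all M periods.
        return sum(past_periods) / len(past_periods)
    if mode == "nearest":
        return reliable[-1]                  # most recent reliable period
    return sum(reliable) / len(reliable)     # mean of the reliable periods
```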

【００４９】Next, the processing within the correction means 12 shown in FIG. 8 is described. In step S21, the reliability of the predicted pitch period is calculated from the reliabilities of the pitch periods of the past M frames stored in the prediction means 9. For example, once the reliabilities of all M frames have become high, the reliability of the predicted pitch period is treated as high until all of them become low; conversely, once the reliabilities of all M frames have become low, it is treated as low until all of them become high. Alternatively, the reliability of the predicted pitch period may be set high when the number of highly reliable frames is at or above a predetermined value and low when it is below that value. In step S22, the thresholds a, b, and c used in the following steps are calculated from the reliability of the predicted pitch period and the number of consecutive voiced frames up to and including the current frame, and processing proceeds to step S23. The formulas for the thresholds are set so that the pitch period candidate is more likely to be the final selection when the run of voiced frames is short and when the reliability of the predicted pitch period is low. In step S23, the power-normalized maximum value of the pitch period evaluation function input from the pitch period candidate calculation means 8 is compared with threshold a, and the error rate between the pitch period candidate and the predicted pitch period is compared with threshold b. If the power-normalized maximum is larger than threshold a and the error rate is smaller than threshold b, processing proceeds to step S29; otherwise it proceeds to step S24. The error rate between two periods X and Y is calculated by dividing the absolute value of the difference between X and Y by the smaller of X and Y.

【００５０】In step S24, the reliability of the predicted pitch period is examined. Processing proceeds to step S25 if the reliability is high, to step S26 if it is medium, and to step S29 if it is low. In step S25, the error rate between the corrected pitch period candidate and the predicted pitch period is compared with threshold c; processing proceeds to step S28 if the error rate is smaller than threshold c and to step S27 otherwise. In step S26, the error rate between the corrected pitch period candidate and the predicted pitch period is likewise compared with threshold c; processing proceeds to step S28 if the error rate is smaller than threshold c and to step S29 otherwise. In step S27, the predicted pitch period is selected and output as the pitch period 13. In step S28, the corrected pitch period candidate is selected and output as the pitch period 13. In step S29, the pitch period candidate is selected and output as the pitch period 13.
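
The branching of steps S23 to S29 can be condensed into a short Python sketch. The step numbers in the comments follow the text; the function names, argument names, and the string labels for the reliability level are assumptions for illustration. The thresholds a, b, and c are taken as inputs because the text does not give the formulas used in step S22.

```python
def error_rate(x, y):
    """|X - Y| divided by the smaller of X and Y, as defined for step S23."""
    return abs(x - y) / min(x, y)

def select_pitch(candidate, corrected, predicted, norm_peak, reliability, a, b, c):
    """Return the pitch period chosen by steps S23 to S29.

    reliability is "high", "medium", or "low" (reliability of the predicted period);
    norm_peak is the power-normalized maximum of the evaluation function.
    """
    # S23: a strong peak that agrees with the prediction is accepted directly.
    if norm_peak > a and error_rate(candidate, predicted) < b:
        return candidate                      # S29
    # S24: otherwise branch on the reliability of the prediction.
    if reliability == "high":
        if error_rate(corrected, predicted) < c:   # S25
            return corrected                  # S28
        return predicted                      # S27
    if reliability == "medium":
        if error_rate(corrected, predicted) < c:   # S26
            return corrected                  # S28
        return candidate                      # S29
    return candidate                          # low reliability: S29
```

Note that the predicted pitch period itself is output only when its reliability is high and the corrected candidate disagrees with it.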

【００５１】Embodiment 6. In Embodiments 1 to 5, the maximum value detection means 14 or the pitch period candidate calculation means 8 searches the input pitch period evaluation function for its maximum and determines the maximum value and its position. A configuration is also possible in which, at a local maximum of the pitch period evaluation function, a curve approximation, for example a parabolic approximation, using the values of the pitch period evaluation function at the neighboring periods yields a more accurate maximum value and position.
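
One common way to carry out such a parabolic approximation is three-point interpolation, sketched below in Python. The text does not give a formula, so this particular form is an assumption: a parabola is fitted through the evaluation values at lag − 1, lag, and lag + 1, and its vertex is returned. The function name is hypothetical.

```python
def refine_peak(evaluation, lag):
    """Return (position, value) of the vertex of the parabola through
    (lag-1, y0), (lag, y1), (lag+1, y2); lag must not be the first or last index."""
    y0, y1, y2 = evaluation[lag - 1], evaluation[lag], evaluation[lag + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature >= 0.0:
        # Not a strict local maximum (flat or convex): keep the integer position.
        return float(lag), y1
    offset = 0.5 * (y0 - y2) / curvature          # lies in [-0.5, 0.5] at a local maximum
    return lag + offset, y1 - 0.25 * (y0 - y2) * offset
```

For an evaluation function that is exactly parabolic near its peak, the vertex is recovered exactly; otherwise the result is an approximation whose quality depends on how parabolic the peak is.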

【００５２】Embodiment 7. In Embodiments 4 and 5, the corrected pitch period candidate calculation means 11 calculates, for each period that gives a local maximum of the pitch period evaluation function, a distortion determined by the difference between that period and the predicted pitch period input from the prediction means 9 and by the value of the pitch period evaluation function at that period. Instead, a curve approximation, for example a parabolic approximation, using the values of the pitch period evaluation function at the neighboring periods may be used to obtain the period of each local maximum more accurately, and the distortion may be calculated for that period.

【００５３】Embodiment 8. In Embodiment 1, the window width / thinning control means 4 compares the average of the past pitch periods with a single threshold and selects one of two combinations of analysis window length and thinning rate according to the result. Instead, several thresholds may be provided so that three or more combinations of analysis window length and thinning rate are available.

【００５４】Embodiment 9. In Embodiment 1, the window width / thinning control means 4 compares the average of the past pitch periods with a threshold and selects one of two combinations of analysis window length and thinning rate according to the result. The device may also be configured so that the comparison result additionally selects the cutoff frequency of the low-pass filter applied to the voice signal 1, or the type of analysis window.

【００５５】実施例１０．本実施例ではパワー情報をピッチ抽出の判定に用いるこ
とで、ピッチの誤抽出を軽減する例を説明する。これ
は、フレームパワーが小さいところは、入力音声信号が
不安定であるためである。具体的には、前後フレームと
の連続性を一切考慮していないピッチ周期を候補とする
ことで、連続誤りを軽減する。図９はこの発明の他の実
施例の構成図である。図９において図１９と同一の部分
については同一の符号を付し、説明を省略する。図９に
おいて、新規な部分として、１１２はパワー計算手段、
１１３は音声信号のパワーを記憶するバッファであり、
１１４はピッチ周期候補算出手段、１１５は補正手段で
ある。ここで前向予測手段１１０は、過去フレームで抽
出されたピッチ周期との連続性を失わないように現フレ
ームのピッチ周期を予測するものである。即ち、まずフ
レーム-1で抽出されたピッチ周期P_-1に対して、フレー
ム0では0.8*P_-1〜1.2*P_-1の範囲でピッチ周期を求め
る。これから前向予測ピッチ周期P_Fが求まる。次いで、
フレーム-M+1〜0 で抽出されたピッチ周期におけるピッ
チ周期評価関数の総和を求める。これから前向信頼度CE
_F(P_F)が求まる。また、後向予測手段１０５は、未来フ
レームで予測されるピッチ周期との連続性を失わないよ
うに現フレームのピッチ周期を予測するものである。即
ち、まずフレーム0 の全てのピッチ周期に対して、前向
予測手段１１０と同様にフレーム間でピッチ周期を求め
る範囲を限定しながら、フレーム0〜N-1のピッチ周期を
求める。次いで、フレーム0の全てのピッチ周期に対し
て、フレーム0〜N-1 で抽出されたピッチ周期における
ピッチ周期評価関数の総和を求める。更に上記２つから
求められた値の最大値及びその最大値をとるピッチ周期
を求め、後向信頼度CE_B(P_B)と、後向予測ピッチ周期P_B
を求める。また、ピッチ周期候補算出手段１１４は、前
後フレームとの連続性を一切考慮せず、現フレームだけ
でピッチ周期を算出するものである。即ち、フレーム0
で範囲制限せずにピッチ周期を求め、ピッチ周期候補P_C
を得る。図１０（ａ）で前向予測手段１１０の動作の様
子を、図１０（ｂ）で後向予測手段１０５の動作の様子
を、図１０（ｃ）でピッチ周期候補手段１１４の動作の
様子を示す。Example 10. This example uses power information in the pitch extraction decision to reduce erroneous pitch extraction, because the input audio signal is unstable where the frame power is small. Specifically, a pitch period that takes no account of continuity with the preceding and following frames is added as a candidate, which reduces consecutive errors. FIG. 9 is a block diagram of another embodiment of the present invention. In FIG. 9, parts identical to those in FIG. 19 carry the same reference numerals and their description is omitted. The new parts in FIG. 9 are power calculation means 112, buffer 113 for storing the audio signal power, pitch period candidate calculation means 114, and correction means 115. Forward prediction means 110 predicts the pitch period of the current frame so as not to lose continuity with the pitch periods extracted in past frames. That is, given the pitch period P_-1 extracted in frame -1, the pitch period in frame 0 is searched within the range 0.8*P_-1 to 1.2*P_-1, which yields the forward predicted pitch period P_F. Next, the sum of the pitch period evaluation function at the pitch periods extracted in frames -M+1 to 0 is computed, which yields the forward reliability CE_F(P_F). Backward prediction means 105 predicts the pitch period of the current frame so as not to lose continuity with the pitch periods predicted in future frames. That is, for every pitch period of frame 0, the pitch periods of frames 0 to N-1 are obtained while restricting the search range from frame to frame in the same way as forward prediction means 110. Then, for every pitch period of frame 0, the sum of the pitch period evaluation function at the pitch periods extracted in frames 0 to N-1 is computed. The maximum of these sums and the pitch period that attains it give the backward reliability CE_B(P_B) and the backward predicted pitch period P_B. Pitch period candidate calculation means 114 calculates the pitch period from the current frame alone, taking no account of continuity with the preceding and following frames. That is, the pitch period is obtained in frame 0 without restricting the range, which gives the pitch period candidate P_C. FIG. 10(a) shows the operation of forward prediction means 110, FIG. 10(b) that of backward prediction means 105, and FIG. 10(c) that of pitch period candidate calculation means 114.
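The difference between the range-restricted forward search and the unrestricted candidate search can be sketched as below. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, periods are assumed to be integer sample counts, `eval_fn[p]` is assumed to hold the pitch period evaluation value for period `p`, and the previous period is assumed to lie inside the search range.

```python
import math


def argmax_period(eval_fn, lo, hi):
    """Return the integer period in [lo, hi] with the largest evaluation value."""
    return max(range(lo, hi + 1), key=lambda p: eval_fn[p])


def unconstrained_candidate(eval_fn, p_min, p_max):
    """P_C: search the whole range, ignoring neighbouring frames."""
    return argmax_period(eval_fn, p_min, p_max)


def forward_prediction(eval_fn, prev_period, p_min, p_max):
    """P_F: search only 0.8*P_-1 .. 1.2*P_-1, clipped to the overall range.

    Assumes p_min <= prev_period <= p_max, so the clipped range is never empty.
    """
    lo = max(p_min, math.ceil(0.8 * prev_period))
    hi = min(p_max, math.floor(1.2 * prev_period))
    return argmax_period(eval_fn, lo, hi)
```

When the evaluation function has a spurious peak at half the true period, `unconstrained_candidate` may pick it while `forward_prediction` stays near the previous frame's period; the correction means then chooses between them.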

【００５６】以下、本発明の一実施例の動作について説
明する。パワー計算手段１１２は、音声信号１０１より
フレーム毎に音声信号パワーを計算し、これをバッファ
１１３に出力する。バッファ１１３は前記音声信号パワ
ーをピッチ抽出対象としているフレームの１フレーム過
去からそれ以降を記憶し、ピッチ抽出対象フレームの音
声信号パワーPW₀及びその前後１フレームずつの音声信
号パワーPW_-1,PW₁ を補正手段１１５に出力する。ピッ
チ周期候補算出手段１１４はバッファ１０４より入力さ
れたピッチ周期抽出対象フレームのピッチ周期評価関数
より、例えばその最大値よりピッチ周期候補P_Cを抽出
し、これを補正手段１１５に出力する。The operation of this embodiment is described below. Power calculation means 112 calculates the audio signal power of each frame from audio signal 101 and outputs it to buffer 113. Buffer 113 stores the audio signal power from one frame before the pitch extraction target frame onward, and outputs to correction means 115 the power PW₀ of the pitch extraction target frame and the powers PW_-1 and PW₁ of the frames immediately before and after it. Pitch period candidate calculation means 114 extracts the pitch period candidate P_C from the pitch period evaluation function of the pitch extraction target frame supplied by buffer 104, for example as the period at which the function is maximal, and outputs it to correction means 115.

【００５７】補正手段１１５は、例えば図１１のフロー
チャートに従って、後向予測手段１０５より入力された
後向信頼度CE_B(P_B) 、前向予測手段１１０より入力され
た前向信頼度CE_F(P_F)と、前記バッファ１１３から入力
されたパワー情報PW_-1,PW₀,PW₁ を用いて以下の選択を
する。即ち、前記後向予測手段１０５より入力された後
向予測ピッチ周期P_Bと前記前向予測手段１１０より入力
された前向予測ピッチ周期P_Fと前記ピッチ周期候補算出
手段１１４より入力されたピッチ周期候補P_Cのいずれか
を選択し、ピッチ周期１０２として出力する。ここで
は、P_Cは、現フレームで求められるピッチ周期で、ピッ
チ周期候補手段１１４の出力であり、P_Fは、過去フレー
ムから予測されたピッチ周期で、前向予測手段１１０の
出力であり、P_Bは、未来フレームから予測されたピッチ
周期で、後向予測手段１０５の出力である。またＳ３１
は、P_CがP_F、P_Bとほぼ等しい場合は、P_Cをピッチとして
選択し、Ｓ３２はP_FとP_Bの信頼度に差異がない場合は、
P_C,P_F,P_Bのうちフレームパワーが大きいものを選ぶこと
を意味する。Ｓ３３は、P_FとP_Bのうち信頼度が大きいも
のを選ぶことを意味する。図１１のＳ３１のステップで
は、前記ピッチ周期候補算出手段１１４より入力された
ピッチ周期候補P_Cを、後向予測手段１０５より入力され
た後向予測ピッチ周期P_B及び前向予測手段１１０より入
力された前向予測ピッチ周期P_Fと比較する。そして、差
異が小さいときはピッチ周期候補P_Cを最終的なピッチ周
期１０２とする。Correction means 115 makes the following selection, for example according to the flowchart of FIG. 11, using the backward reliability CE_B(P_B) input from backward prediction means 105, the forward reliability CE_F(P_F) input from forward prediction means 110, and the power information PW_-1, PW₀, PW₁ input from buffer 113. That is, it selects one of the backward predicted pitch period P_B input from backward prediction means 105, the forward predicted pitch period P_F input from forward prediction means 110, and the pitch period candidate P_C input from pitch period candidate calculation means 114, and outputs it as pitch period 102. Here P_C is the pitch period obtained from the current frame and is the output of pitch period candidate calculation means 114; P_F is the pitch period predicted from past frames and is the output of forward prediction means 110; and P_B is the pitch period predicted from future frames and is the output of backward prediction means 105. In outline, S31 selects P_C as the pitch when P_C is approximately equal to P_F and P_B; S32, when the reliabilities of P_F and P_B do not differ, selects among P_C, P_F, and P_B the one whose frame power is larger; and S33 selects whichever of P_F and P_B has the higher reliability. In step S31 of FIG. 11, the pitch period candidate P_C input from pitch period candidate calculation means 114 is compared with the backward predicted pitch period P_B input from backward prediction means 105 and the forward predicted pitch period P_F input from forward prediction means 110. When the differences are small, P_C is taken as the final pitch period 102.

【００５８】図１１のＳ３２のステップでは、前記Ｓ３
１のステップで最終的なピッチ周期１０２が決まらない
場合、前記後向予測手段１０５より入力された後向信頼
度CE_B (P_B)と前記前向予測手段１１０より入力された前
向信頼度CE_F(P_F) を比較する。そして、この差異が小さ
いときは、前記バッファ１１３より入力された音声信号
パワーPW_-1,PW₀,PW₁を比較する。その結果、ピッチ抽出
対象フレームの音声信号パワー PW₀がその前後のフレー
ムの音声信号パワーPW_-1,PW₀に比較して十分に大きい場
合は、ピッチ周期候補P_Cを最終的なピッチ周期１０２と
する。それ以外の場合は、ピッチ抽出対象フレームの前
フレームの音声信号パワーPW_-1と後フレームの音声信号
パワーPW₁を比較する。そして、PW_-1 が十分に大きい場
合は前記前向予測ピッチ周期P_Fを最終的なピッチ周期１
０２とし、PW₁ が十分に大きい場合は前記後向予測ピッ
チ周期P_Bを最終的なピッチ周期１０２とする。図１１の
Ｓ３３のステップでは、前記Ｓ３１、Ｓ３２のステップ
で最終的なピッチ周期１０２が決まらない場合、前記後
向信頼度CE_B(P_B)と前記前向信頼度CE_F(P_F)を比較する。
そして、CE_B(P_B)が大きい場合は前記後向予測ピッチ周
期P_B を最終的なピッチ周期１０２とし、それ以外の場
合は前記前向予測ピッチ周期P_Fを最終的なピッチ周期１
０２とする。In step S32 of FIG. 11, when the final pitch period 102 was not determined in step S31, the backward reliability CE_B(P_B) input from backward prediction means 105 and the forward reliability CE_F(P_F) input from forward prediction means 110 are compared. When the difference is small, the audio signal powers PW_-1, PW₀, PW₁ input from buffer 113 are compared. If the power PW₀ of the pitch extraction target frame is sufficiently larger than the powers PW_-1 and PW₁ of the preceding and following frames, the pitch period candidate P_C is taken as the final pitch period 102. Otherwise, the power PW_-1 of the preceding frame and the power PW₁ of the following frame are compared: if PW_-1 is sufficiently large, the forward predicted pitch period P_F is taken as the final pitch period 102, and if PW₁ is sufficiently large, the backward predicted pitch period P_B is taken as the final pitch period 102. In step S33 of FIG. 11, when the final pitch period 102 was not determined in steps S31 and S32, the backward reliability CE_B(P_B) and the forward reliability CE_F(P_F) are compared. If CE_B(P_B) is larger, the backward predicted pitch period P_B is taken as the final pitch period 102; otherwise the forward predicted pitch period P_F is taken as the final pitch period 102.
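The three-step selection S31 to S33 can be sketched as a single function. This is a hedged illustration: the function name `select_pitch` and the tolerance parameters `period_tol`, `ce_tol`, and `power_ratio` are assumptions, since the text only says "approximately equal", "the difference is small", and "sufficiently large" without giving thresholds.

```python
def select_pitch(p_c, p_f, p_b, ce_f, ce_b, pw_prev, pw_cur, pw_next,
                 period_tol=0.1, ce_tol=0.05, power_ratio=2.0):
    """Choose the final pitch period from P_C, P_F, and P_B (steps S31-S33)."""
    # S31: the unconstrained candidate agrees with both predictions.
    if (abs(p_c - p_f) <= period_tol * p_f
            and abs(p_c - p_b) <= period_tol * p_b):
        return p_c

    # S32: reliabilities are about equal, so let frame power decide.
    if abs(ce_f - ce_b) <= ce_tol * max(abs(ce_f), abs(ce_b)):
        if pw_cur >= power_ratio * max(pw_prev, pw_next):
            return p_c   # current frame dominates
        if pw_prev >= power_ratio * pw_next:
            return p_f   # past frame dominates: trust forward prediction
        if pw_next >= power_ratio * pw_prev:
            return p_b   # future frame dominates: trust backward prediction

    # S33: otherwise take the prediction with the higher reliability.
    return p_b if ce_b > ce_f else p_f
```

Interpreting "sufficiently large" as a fixed power ratio is one possible choice; a difference in decibels would serve equally well.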

【００５９】実施例１１．上記実施例１０では、バッファ１１３にピッチ周期対象
フレームの過去１フレーム以降の音声信号パワーを記憶
し、また補正手段１１５においては、ピッチ周期対象フ
レーム及びその前後１フレームずつの音声信号パワー情
報のみを用いていた。これを、バッファ１１３はピッチ
周期対象フレーム及びその過去Ｍフレーム、未来Ｎフレ
ームの音声信号パワーを記憶し、これを補正手段１１５
に出力するようにする。そして、前記補正手段１１５で
は、前記ピッチ抽出対象フレーム及びその過去Ｍフレー
ム、未来Ｎフレームの音声信号パワーの情報を用いて、
例えばピッチ抽出対象フレームのパワーと過去Ｍフレー
ムにおける平均パワーと未来Ｎフレームにおける平均パ
ワーとの大小関係を用いて最終的なピッチ周期を選択し
てもよい。Example 11. In Example 10 above, buffer 113 stores the audio signal power from one frame before the pitch extraction target frame onward, and correction means 115 uses only the power of the target frame and of one frame on each side of it. Instead, buffer 113 may store the audio signal powers of the target frame, its past M frames, and its future N frames and output them to correction means 115. Correction means 115 may then select the final pitch period using this information, for example from the magnitude relationship among the power of the pitch extraction target frame, the average power over the past M frames, and the average power over the future N frames.
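One way to use the averaged powers of Example 11 is sketched below. The function name, the `ratio` threshold, and the convention of returning `None` when power does not decide are assumptions made for illustration; the patent only states that the magnitude relationship among the three powers may be used.

```python
def select_by_average_power(p_c, p_f, p_b, powers, m, n, ratio=2.0):
    """Pick a pitch period from frame powers.

    powers = [PW_-M, ..., PW_-1, PW_0, PW_1, ..., PW_N], with m >= 1 and n >= 1.
    Returns None when power alone does not decide (hypothetical convention).
    """
    if len(powers) != m + n + 1:
        raise ValueError("expected m + n + 1 frame powers")
    past = sum(powers[:m]) / m          # average over the past M frames
    current = powers[m]                 # pitch extraction target frame
    future = sum(powers[m + 1:]) / n    # average over the future N frames

    if current >= ratio * max(past, future):
        return p_c
    if past >= ratio * future:
        return p_f
    if future >= ratio * past:
        return p_b
    return None
```

A caller could fall back to the reliability comparison of step S33 when `None` is returned.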

【００６０】実施例１２．本実施例では、ピッチ抽出に用いるフレームに無声部が
含まれないようにフレーム数を制御することにより、ピ
ッチの誤抽出を軽減する例を説明する。即ち、有声部の
立ち上がり、立ち下がりの部分を含まないようにする。
図１２はこの発明の更に他の実施例を示す構成図であ
る。図１２において、図９と同一の部分については同一
の符号を付し、説明を省略する。図１２において、新規
な部分は以下の通りである。１１６はピッチ周期評価関
数を記憶するバッファ、１１７はフレーム数制御手段、
１１８はピッチ予測手段である。実施例は、有声部立ち
上がりのフレームのピッチ周期を抽出するとき、隣接す
る無声部からの予測はできないので前向予測区間のフレ
ーム数を0 として、後向予測区間だけからピッチを求め
る。図１３は、この様子を説明する図である。Example 12. This example reduces erroneous pitch extraction by controlling the number of frames so that the frames used for pitch extraction contain no unvoiced portion. That is, the rising and falling portions of a voiced segment are kept out of the evaluation. FIG. 12 is a block diagram showing yet another embodiment of the present invention. In FIG. 12, parts identical to those in FIG. 9 carry the same reference numerals and their description is omitted. The new parts in FIG. 12 are as follows: 116 is a buffer storing the pitch period evaluation function, 117 is frame number control means, and 118 is pitch prediction means. In this example, when the pitch period of a frame at the onset of a voiced segment is extracted, no prediction is possible from the adjacent unvoiced segment, so the number of frames in the forward prediction interval is set to 0 and the pitch is obtained from the backward prediction interval alone. FIG. 13 illustrates this.

【００６１】以下、図１２に示した本発明の一実施例の
動作について説明する。バッファ１１６はピッチ周期評
価関数をピッチ抽出対象フレーム及びその直前Ｍフレー
ム、直後Ｎフレーム分記憶し、この（Ｍ＋Ｎ＋１）フレ
ームのピッチ周期評価関数をピッチ予測手段１１８に出
力する。The operation of the embodiment shown in FIG. 12 is described below. Buffer 116 stores the pitch period evaluation function for the pitch extraction target frame, the M frames immediately before it, and the N frames immediately after it, and outputs the pitch period evaluation functions of these (M + N + 1) frames to pitch prediction means 118.

【００６２】フレーム数制御手段１１７は音声信号１０
１を分析する。例えば無声、無声→有声過渡、有声、有
声→無声過渡の４状態に判別し、この判別結果に基づき
ピッチ評価に用いる過去フレーム数Ｍ’と未来フレーム
数Ｎ’を、例えば以下に示すように決めて、ピッチ予測
手段１１８に出力する。
無声部：Ｍ’＝０，Ｎ’＝０
無声→有声過渡部：Ｍ’＝０，Ｎ’＝Ｎ
有声部：Ｍ’＝Ｍ，Ｎ’＝Ｎ
有声→無声過渡部：Ｍ’＝Ｍ，Ｎ’＝０
Frame number control means 117 analyzes audio signal 101. For example, it classifies the signal into four states (unvoiced, unvoiced-to-voiced transition, voiced, and voiced-to-unvoiced transition) and, based on the result, determines the number of past frames M' and the number of future frames N' used for pitch evaluation, for example as shown below, and outputs them to pitch prediction means 118.
Unvoiced: M' = 0, N' = 0
Unvoiced-to-voiced transition: M' = 0, N' = N
Voiced: M' = M, N' = N
Voiced-to-unvoiced transition: M' = M, N' = 0
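The mapping from voicing state to frame counts above can be written as a small lookup. The names `VoicingState` and `frame_counts` are hypothetical; the four table entries follow the text.

```python
from enum import Enum


class VoicingState(Enum):
    UNVOICED = "unvoiced"
    ONSET = "unvoiced-to-voiced transition"
    VOICED = "voiced"
    OFFSET = "voiced-to-unvoiced transition"


def frame_counts(state, m, n):
    """Return (M', N'): past and future frames to include in pitch evaluation."""
    table = {
        VoicingState.UNVOICED: (0, 0),  # no usable neighbours
        VoicingState.ONSET: (0, n),     # past frames are unvoiced: look ahead only
        VoicingState.VOICED: (m, n),    # use both sides
        VoicingState.OFFSET: (m, 0),    # future frames are unvoiced: look back only
    }
    return table[state]
```

How the four states are detected from the audio signal is left open here, as it is in the text.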

【００６３】ピッチ予測手段１１８は前記バッファ１１
６より入力されたピッチ周期評価関数から、前記フレー
ム数制御手段１１７より入力されたピッチ評価に用いる
過去Ｍ’フレーム、未来Ｎ’フレームのフレーム数に基
づき、ピッチ抽出対象フレームの予測ピッチ周期P₀の信
頼度C_E(P₀)を、例えば式（５）に従って求める。ここ
で、E_n(P_n)はピッチ抽出対象フレームからｎフレーム離
れたフレームの周期P_nにおけるピッチ周期評価関数であ
り、P_n(n = -M’, ..., -1, 1, ..., N’)はC_E(P₀)を最
大にするものとして決定する。ただし、P_n(n = -M’,
..., N’)は（Ｍ’＋Ｎ’＋１）フレーム間で連続的であ
るように、例えば式（６）に従ってその存在範囲を制限
する。Pitch prediction means 118 obtains, from the pitch period evaluation functions input from buffer 116 and based on the numbers of past frames M' and future frames N' input from frame number control means 117, the reliability C_E(P₀) of a predicted pitch period P₀ of the pitch extraction target frame, for example according to equation (5). Here E_n(P_n) is the pitch period evaluation function at period P_n of the frame n frames away from the pitch extraction target frame, and P_n (n = -M', ..., -1, 1, ..., N') is chosen so as to maximize C_E(P₀). However, the range of P_n (n = -M', ..., N') is restricted, for example according to equation (6), so that the pitch period is continuous across the (M' + N' + 1) frames.

【００６４】[0064]

【数３】 [Equation 3]

【００６５】次にこの信頼度が最大となるP₀を探索し、
これを抽出結果であるピッチ周期１０２として出力す
る。Next, the P₀ that maximizes this reliability is searched for and output as pitch period 102, the extraction result.
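The search for the P₀ with maximal reliability can be sketched with dynamic programming. Equations (5) and (6) are not reproduced in this text, so two assumptions are made and should be read as such: the reliability is taken to be the sum of E_n(P_n) over the frames used, and the continuity restriction is taken to be the 0.8 to 1.2 ratio between adjacent frames that Example 10 uses for forward prediction. All function names are hypothetical, and periods are integer sample counts indexing each evaluation function.

```python
import math


def neighbours(p, p_min, p_max, lo=0.8, hi=1.2):
    """Periods allowed in an adjacent frame when this frame has period p."""
    return range(max(p_min, math.ceil(lo * p)),
                 min(p_max, math.floor(hi * p)) + 1)


def chain_scores(frames, p_min, p_max):
    """Best summed evaluation of a continuous track through `frames`.

    `frames` lists evaluation functions from the nearest frame to the
    farthest. The result maps each target-frame period p to the best sum.
    """
    tail = {p: 0.0 for p in range(p_min, p_max + 1)}
    for e in reversed(frames):
        score = {p: e[p] + tail[p] for p in tail}
        tail = {p: max(score[q] for q in neighbours(p, p_min, p_max))
                for p in score}
    return tail


def best_pitch(e0, past, future, p_min, p_max):
    """Return (P_0, reliability) maximizing the assumed form of C_E(P_0)."""
    back = chain_scores(past, p_min, p_max)
    fwd = chain_scores(future, p_min, p_max)
    reliability = {p: e0[p] + back[p] + fwd[p]
                   for p in range(p_min, p_max + 1)}
    p0 = max(reliability, key=reliability.get)
    return p0, reliability[p0]
```

Passing empty `past` or `future` lists corresponds to M' = 0 or N' = 0, so the same routine covers the transition cases selected by the frame number control means.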

【００６６】実施例１３．上記実施例１２では、予測手段を最終的なピッチ周期を
求めるものとしている。これを例えば、評価に含める後
続フレーム数Ｎ＝０として従来のピッチ抽出装置の前向
予測手段とする、あるいは評価に含める先行フレーム数
Ｍ＝０として後向予測手段とするなど、ピッチ抽出装置
の一部として最終的なピッチ周期を求めるための候補を
算出する手段として用いてもよい。Example 13. In Example 12 above, the prediction means determines the final pitch period. Alternatively, it may be used as part of a pitch extraction device, as a means of calculating candidates from which the final pitch period is obtained. For example, with the number of following frames included in the evaluation set to N = 0 it serves as the forward prediction means of the conventional pitch extraction device, and with the number of preceding frames included in the evaluation set to M = 0 it serves as the backward prediction means.

【００６７】実施例１４．上記実施例１２では、有声無声判定結果に基づき予測手
段におけるピッチ抽出評価に用いるフレーム数を制御し
ているが、従来のピッチ抽出装置において有声無声判定
を行い、その判定結果により例えば無声→有声過渡部で
あれば前向予測手段を用いない、有声→無声過渡部では
後向予測手段を用いないように切り換えるとしても同様
の効果がある。Example 14. In Example 12 above, the number of frames used for pitch evaluation in the prediction means is controlled based on the voiced/unvoiced decision. The same effect is obtained if a voiced/unvoiced decision is made in the conventional pitch extraction device and the device switches accordingly, for example by not using the forward prediction means in an unvoiced-to-voiced transition and not using the backward prediction means in a voiced-to-unvoiced transition.

【００６８】実施例１５．普通ピッチ抽出は周期２．５〜１６ｍｓ程度の範囲で探
索する。しかし、男性では希に１６〜２５ｍｓのピッチ
周期をとる場合がある。ピッチ抽出の範囲を２．５〜２
５ｍｓとすれば探索洩れは無くなるが、探索範囲が広い
ために抽出誤りが発生し易くなるので、通常はこれを行
なわない。このため、ピッチ周期が探索範囲を越えて長
い男性の場合には、ピッチ周期を必ず短く間違えてしま
う。本実施例では、この抽出誤りが発生しているかを判
別する。例えば、ボコーダでは、音源信号を生成するた
めに用いるピッチ周期が実際のピッチ周期と大きく異な
ると異音を発生する。そこで、ピッチ正誤フラグにより
生成する音源信号を以下のように定める。フラグ正：抽
出されたピッチ周期のインパルス列を音源信号とする。
フラグ誤：予め設定しておく最大ピッチ周期のインパル
ス列を音源信号とする。こうすることで、ピッチ抽出が
誤っても合成音声が大きく劣化することを防ぐことがで
きる。図１４はこの発明の更に他の実施例を示す構成図
である。図１４において、２０１は音声信号、２０２は
抽出されたピッチ、２０３はピッチ正誤フラグ、２０４
はピッチ抽出手段、２０５はパワー計算手段、２０６は
誤ピッチ判定手段である。また図１５はその動作を説明
する図である。Example 15. Pitch extraction normally searches periods in the range of about 2.5 to 16 ms. Male speakers, however, occasionally have pitch periods of 16 to 25 ms. Setting the search range to 2.5 to 25 ms would eliminate missed periods, but the wider range makes extraction errors more likely, so this is not normally done. Consequently, for a male speaker whose pitch period exceeds the search range, the extracted pitch period is always erroneously short. This example determines whether such an extraction error has occurred. In a vocoder, for example, an abnormal sound is produced if the pitch period used to generate the sound source signal differs greatly from the actual pitch period. The sound source signal to be generated is therefore chosen according to the pitch correctness flag as follows. Flag correct: an impulse train at the extracted pitch period is used as the sound source signal. Flag erroneous: an impulse train at a preset maximum pitch period is used as the sound source signal. In this way the synthesized speech is kept from degrading badly even when pitch extraction fails. FIG. 14 is a block diagram showing yet another embodiment of the present invention. In FIG. 14, 201 is the audio signal, 202 is the extracted pitch, 203 is the pitch correctness flag, 204 is pitch extraction means, 205 is power calculation means, and 206 is erroneous pitch determination means. FIG. 15 illustrates the operation.

【００６９】以下、図１４に示した本発明の一実施例の
動作について説明する。ピッチ抽出手段２０４は入力音
声よりピッチ周期を抽出し、このピッチ周期をパワー計
算手段２０５に出力する。パワー計算手段２０５は前記
ピッチ抽出手段２０４により入力されたピッチ周期毎に
入力音声のパワーを計算し、これを誤ピッチ判定手段２
０６に出力する。図１５にピッチ周期が正しく抽出され
た場合（ａ）のピッチ周期毎のパワーの変遷と、入力音
声のピッチ周期がピッチ探索範囲を越えているため誤ピ
ッチ抽出された場合（ｂ）のピッチ周期毎のパワーの変
遷を示す。図１５に示すように誤ピッチ抽出された場合
にはピッチ周期毎のパワーの変化が大きくなる。誤ピッ
チ判定手段２０６は、前記ピッチ周期毎のパワーの変化
に基づいて、例えば連続するピッチ周期間のパワーの比
が予め定めた閾値よりも大きい場合は誤ピッチと判定す
るとして前記ピッチ周期の正誤を判定し、この判定結果
をピッチ正誤フラグ２０３として出力する。なお、図１
６は、ピッチ正誤フラグを切換信号としてボコーダに適
用した例を示す図である。The operation of the embodiment shown in FIG. 14 is described below. Pitch extraction means 204 extracts the pitch period from the input speech and outputs it to power calculation means 205. Power calculation means 205 calculates the power of the input speech for each pitch period supplied by pitch extraction means 204 and outputs it to erroneous pitch determination means 206. FIG. 15 shows the transition of the power per pitch period (a) when the pitch period is extracted correctly and (b) when it is extracted erroneously because the pitch period of the input speech exceeds the pitch search range. As FIG. 15 shows, erroneous extraction makes the power vary widely from one pitch period to the next. Erroneous pitch determination means 206 judges the pitch period correct or erroneous from this variation, for example judging it erroneous when the power ratio between consecutive pitch periods exceeds a predetermined threshold, and outputs the result as pitch correctness flag 203. FIG. 16 shows an example in which the pitch correctness flag is applied to a vocoder as a switching signal.
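The power-ratio check and the resulting choice of sound source period can be sketched as follows. This is an illustration under stated assumptions: the function names, the `max_ratio` threshold value, and the small `eps` guard against division by zero are not from the patent, and the pitch period is an integer number of samples.

```python
def power_per_period(signal, period):
    """Mean power of each consecutive block of `period` samples."""
    blocks = [signal[i:i + period]
              for i in range(0, len(signal) - period + 1, period)]
    return [sum(x * x for x in block) / period for block in blocks]


def pitch_is_erroneous(signal, period, max_ratio=2.0, eps=1e-12):
    """Flag the pitch as erroneous when adjacent-period power varies too much."""
    powers = power_per_period(signal, period)
    for a, b in zip(powers, powers[1:]):
        if max(a, b) / (min(a, b) + eps) > max_ratio:
            return True
    return False


def excitation_period(extracted, erroneous, max_period):
    """Period of the impulse train used as the sound source signal."""
    return max_period if erroneous else extracted
```

For a pulse train with a true period of 8 samples, measuring power over blocks of 8 gives equal values, while blocks of 4 (an erroneously short period) alternate between high and near-zero power, which is the behaviour FIG. 15 describes.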

【００７０】実施例１６．上記実施例１５では、ピッチ正誤フラグを出力するだけ
であるが、誤ピッチと判定された場合には、例えばピッ
チ周期探索範囲を変更して再度ピッチ抽出を行い、正し
いと判定されるまでピッチ周期を求め直すとしても良
い。Example 16. Example 15 above only outputs the pitch correctness flag. When the pitch is judged erroneous, however, the pitch period search range may be changed and pitch extraction performed again, re-deriving the pitch period until it is judged correct.
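The retry scheme of Example 16 can be sketched as a loop over candidate search ranges. The function `extract_with_retry`, its callable parameters `extract` and `is_erroneous`, and the idea of passing an explicit list of ranges are assumptions for illustration; the patent does not say how the search range is changed.

```python
def extract_with_retry(signal, extract, is_erroneous, ranges):
    """Try each (p_min, p_max) search range until the result passes the check.

    extract(signal, p_min, p_max) -> period      (hypothetical extractor)
    is_erroneous(signal, period)  -> bool        (hypothetical check)
    Returns (period, ok); ok is False if every range was judged erroneous.
    """
    period = None
    for p_min, p_max in ranges:
        period = extract(signal, p_min, p_max)
        if not is_erroneous(signal, period):
            return period, True
    return period, False
```

Bounding the loop by a finite list of ranges keeps the delay predictable, which matters for the real-time use the patent targets.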

【００７１】[0071]

【発明の効果】以上説明したように請求項１の発明は、
過去のフレームで抽出したピッチ周期の平均値が大きい
ときに分析窓長を長く、間引き率を高くし、逆に前記平
均値が小さいときに分析窓長を短く、間引き率を低くし
て分析するようにしたので、少ない処理量で精度の高い
ピッチ周期の抽出ができる効果がある。As described above, the invention of claim 1 lengthens the analysis window and raises the thinning rate when the average of the pitch periods extracted in past frames is large, and conversely shortens the analysis window and lowers the thinning rate when that average is small, so the pitch period can be extracted with high accuracy at a small processing cost.

【００７２】請求項２の発明は、所定の評価値が最大に
なるように分析窓の位置を決定するようにしたので、抽
出がより安定な信号だけを用いた分析となり、信号の過
渡部分でも精度の高いピッチ周期の抽出ができる効果が
ある。The invention of claim 2 determines the position of the analysis window so that a predetermined evaluation value is maximized, so the analysis uses only the portion of the signal from which extraction is more stable, and the pitch period can be extracted with high accuracy even in transient portions of the signal.

【００７３】請求項３の発明は、無音フレームのピッチ
周期評価関数の平均値を算出して音声のピッチ周期評価
関数から減算するようにしたので、雑音下でも正確なピ
ッチ周期の抽出ができる効果がある。According to the invention of claim 3, the average value of the pitch period evaluation function of the silent frame is calculated and subtracted from the pitch period evaluation function of the voice, so that the pitch period can be accurately extracted even in the presence of noise. There is.

【００７４】請求項４の発明は、過去に抽出されたピッ
チ周期を用いて算出した現在のフレームの予測ピッチ周
期と、ピッチ周期評価関数から算出したピッチ周期候補
と、歪を最小とする補正ピッチ周期候補とから、ピッチ
周期を選択するようにしたので、連続性の高い安定した
ピッチ周期を抽出することができる効果がある。According to a fourth aspect of the present invention, the predicted pitch period of the current frame calculated using the pitch period extracted in the past, the pitch period candidate calculated from the pitch period evaluation function, and the corrected pitch that minimizes distortion. Since the pitch cycle is selected from the cycle candidates, there is an effect that a stable pitch cycle with high continuity can be extracted.

【００７５】請求項５の発明は、音声信号を有声音、無
声音、無音を含む複数のカテゴリに判別し、現在と過去
のフレームの音声信号のカテゴリ判定結果と、補正手段
内の判定結果を用いて信頼度を判定するようにしたの
で、予測ピッチ周期を算出するのに適した信頼度の判定
が成され、連鎖誤りが少ないピッチ周期の抽出ができる
効果がある。The invention of claim 5 classifies the audio signal into a plurality of categories including voiced, unvoiced, and silent, and judges the reliability using the category decisions for the current and past frames together with the decision results within the correction means. The reliability judgment is therefore suited to calculating the predicted pitch period, and pitch periods can be extracted with few chained errors.

【００７６】請求項６の発明は、前向予測ピッチ周期と
後向ピッチ周期の他に前後フレームとの連続性を考えず
に抽出するピッチ周期候補も最終的なピッチ周期選択対
象とし、また、音声信号のパワー情報を用いて最終的な
ピッチ周期を選択するようにしたので、連鎖誤りが少な
く精度の高いピッチ周期の抽出ができる効果がある。According to the invention of claim 6, in addition to the forward predicted pitch cycle and the backward pitch cycle, pitch cycle candidates extracted without considering continuity with preceding and following frames are also the final pitch cycle selection targets, and Since the final pitch cycle is selected using the power information of the voice signal, there is an effect that the pitch cycle can be extracted with high accuracy and with little chain error.

【００７７】請求項７の発明は、入力音声の状態に基づ
き評価に用いるフレーム数を変更するようにしたので、
ピッチ評価に不適当なフレームを評価範囲から除外で
き、過渡部でも精度の高いピッチ周期の抽出ができる効
果がある。The invention of claim 7 changes the number of frames used for evaluation according to the state of the input speech, so frames unsuitable for pitch evaluation can be excluded from the evaluation range and the pitch period can be extracted with high accuracy even in transient portions.

【００７８】請求項８の発明は、ピッチ抽出結果を入力
音声のピッチ周期毎のパワーにより再評価するようにし
たので、精度の高いピッチ周期の抽出ができる効果があ
る。According to the invention of claim 8, the pitch extraction result is re-evaluated by the power for each pitch cycle of the input voice, so that there is an effect that the pitch cycle can be extracted with high accuracy.

【図面の簡単な説明】[Brief description of drawings]

【図１】この発明の実施例１の装置の構成図である。FIG. 1 is a configuration diagram of an apparatus according to a first embodiment of the present invention.

【図２】実施例１に基づく実験結果を示す図である。FIG. 2 is a diagram showing an experimental result based on Example 1.

【図３】この発明の実施例２の装置の構成図である。FIG. 3 is a configuration diagram of an apparatus according to a second embodiment of the present invention.

【図４】この発明の実施例３の装置の構成図である。FIG. 4 is a configuration diagram of an apparatus according to a third embodiment of the present invention.

【図５】この発明の実施例４の装置の構成図である。FIG. 5 is a configuration diagram of an apparatus of Embodiment 4 of the present invention.

【図６】この発明の実施例４の装置の構成要素の動作を
説明するフローチャート図である。FIG. 6 is a flow chart for explaining the operation of the constituent elements of the device according to the fourth embodiment of the present invention.

【図７】この発明の実施例５の装置の構成図である。FIG. 7 is a configuration diagram of an apparatus of Example 5 of the present invention.

【図８】実施例５での補正手段の処理フローチャート図
である。FIG. 8 is a process flow chart of the correction means in the fifth embodiment.

【図９】この発明の実施例１０の装置の構成図である。FIG. 9 is a configuration diagram of an apparatus of Example 10 of the present invention.

【図１０】図９の各要素の動作の様子を説明する図であ
る。FIG. 10 is a diagram illustrating the manner of operation of each element in FIG. 9;

【図１１】図９の装置の処理フローチャート図である。FIG. 11 is a processing flowchart of the device of FIG. 9.

【図１２】この発明の実施例１２の装置の構成図であ
る。FIG. 12 is a configuration diagram of an apparatus of Embodiment 12 of the present invention.

【図１３】図１２の装置の動作を説明する図である。FIG. 13 is a diagram explaining the operation of the device of FIG. 12.

【図１４】この発明の実施例１５の装置の構成図であ
る。FIG. 14 is a configuration diagram of an apparatus of Embodiment 15 of the present invention.

【図１５】図１４の装置の動作を説明する図である。FIG. 15 is a diagram explaining the operation of the device of FIG. 14.

【図１６】実施例１５の出力の適用例を示す図である。FIG. 16 is a diagram showing an application example of the output of the fifteenth embodiment.

【図１７】従来のピッチ抽出装置を示す構成図である。FIG. 17 is a configuration diagram showing a conventional pitch extraction device.

【図１８】従来のピッチ抽出装置を示す構成図である。FIG. 18 is a configuration diagram showing a conventional pitch extraction device.

【図１９】従来のピッチ抽出装置を示す構成図である。FIG. 19 is a block diagram showing a conventional pitch extraction device.

【符号の説明】[Explanation of symbols]

１音声信号２窓位置決定手段３音声状態判定手段４制御手段５間引き手段６ピッチ周期評価関数計算手段７雑音除去手段８ピッチ周期候補算出手段９予測手段１０信頼度判定手段１１補正ピッチ周期候補算出手段１２補正手段１３ピッチ周期１４最大値検出手段１５ピッチ周期高精度抽出手段１６部分評価関数計算手段１７重み制御手段１８判定手段１９窓判定手段１０１音声信号１０２ピッチ周期１０３ピッチ周期評価関数計算手段１０４バッファ１０５後向予測手段１０６遅延回路１０７遅延回路１０８ピッチ周期評価値算出手段１０９バッファ１１０前向予測手段１１１補正手段１１２パワー計算手段１１３バッファ１１４ピッチ周期候補算出手段１１５補正手段１１６バッファ１１７フレーム数制御手段１１８ピッチ予測手段２０１音声信号２０２ピッチ周期２０３ピッチ正誤フラグ２０４ピッチ抽出手段２０５パワー計算手段２０６誤ピッチ判定手段
1 audio signal
2 window position determining means
3 audio state determination means
4 control means
5 thinning means
6 pitch period evaluation function calculation means
7 noise removal means
8 pitch period candidate calculation means
9 prediction means
10 reliability determination means
11 corrected pitch period candidate calculation means
12 correction means
13 pitch period
14 maximum value detection means
15 high-precision pitch period extraction means
16 partial evaluation function calculation means
17 weight control means
18 determination means
19 window determination means
101 audio signal
102 pitch period
103 pitch period evaluation function calculation means
104 buffer
105 backward prediction means
106 delay circuit
107 delay circuit
108 pitch period evaluation value calculation means
109 buffer
110 forward prediction means
111 correction means
112 power calculation means
113 buffer
114 pitch period candidate calculation means
115 correction means
116 buffer
117 frame number control means
118 pitch prediction means
201 audio signal
202 pitch period
203 pitch correctness flag
204 pitch extraction means
205 power calculation means
206 erroneous pitch determination means

フロントページの続き (56)参考文献特開昭56−126895（ＪＰ，Ａ) 特開昭59−99497（ＪＰ，Ａ) 特開昭60−195599（ＪＰ，Ａ) 特開昭62−194300（ＪＰ，Ａ) 特開昭59−152496（ＪＰ，Ａ) 特開平１−315798（ＪＰ，Ａ) 特開昭54−124605（ＪＰ，Ａ) 特開昭63−124100（ＪＰ，Ａ) 特開平１−238698（ＪＰ，Ａ) 実開平２−89500（ＪＰ，Ｕ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 11/00 - 11/06
Continuation of the front page
(56) References: JP-A-S56-126895 (JP, A); JP-A-S59-99497 (JP, A); JP-A-S60-195599 (JP, A); JP-A-S62-194300 (JP, A); JP-A-S59-152496 (JP, A); JP-A-H01-315798 (JP, A); JP-A-S54-124605 (JP, A); JP-A-S63-124100 (JP, A); JP-A-H01-238698 (JP, A); JP-U-H02-89500 (JP, U)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 11/00 - 11/06

Claims

(57)【特許請求の範囲】(57) [Claims]

【請求項１】入力である音声信号のフレーム毎のサン
プリング開始から終了までの時間幅である窓の幅を決め
る窓手段と、上記窓内の音声信号のピッチ周期を算出するピッチ周期
算出手段と、上記フレーム中において上記窓手段における窓を時間方
向にシフトさせて窓内の音声信号のパワーが最大になる
よう窓位置を制御する窓位置決定手段を備えたピッチ抽
出装置。1. A pitch extraction device comprising: window means for determining, for each frame of an input audio signal, a window width that is the time span from the start to the end of sampling; pitch period calculating means for calculating the pitch period of the audio signal within the window; and window position determining means for shifting the window of the window means in the time direction within the frame and controlling the window position so that the power of the audio signal within the window is maximized.

【請求項２】窓内の入力音声信号のサンプリング・デ
ータに対し間引きサンプリングしてデータ出力する間引
き手段と、過去のフレーム毎のピッチ周期平均値が所定の値より大
きいと上記窓の幅を拡げ、かつ上記間引き手段の間引き
を多くして粗くし、ピッチ周期の平均値が所定の値より
小さいと上記窓の幅を狭め、かつ上記間引きを少なくし
て細かく出力するよう制御する制御手段を備えたことを
特徴とする請求項１記載のピッチ抽出装置。2. The pitch extraction device according to claim 1, further comprising: thinning means for thinning out the sampling data of the input audio signal within the window and outputting the resulting data; and control means which, when the average of the pitch periods of past frames is larger than a predetermined value, widens the window and increases the thinning of the thinning means so that the output is coarser, and which, when the average of the pitch periods is smaller than the predetermined value, narrows the window and reduces the thinning so that the output is finer.