JP3559485B2 - Post-processing method and device for audio signal and recording medium recording program - Google Patents


Info

Publication number
JP3559485B2
JP3559485B2 (application JP33138899A)
Authority
JP
Japan
Prior art keywords
pitch
region
audio signal
signal
reference position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP33138899A
Other languages
Japanese (ja)
Other versions
JP2001147700A (en)
Inventor
Hitoshi Ohmuro
Kazunori Mano
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP33138899A
Publication of JP2001147700A
Application granted
Publication of JP3559485B2
Anticipated expiration
Expired - Fee Related

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a post-processing method for audio signals that aims to improve quality by emphasizing the pitch component of an audio signal input frame by frame. In particular, it relates to a post-processing method and apparatus for audio signals, and a recording medium storing a program therefor, that are effective when applied to speech decoding that reproduces a high-quality speech signal from an encoded bit sequence in predictive coding and decoding schemes which synthesize speech by driving a filter representing the spectral envelope characteristics of the speech signal with an excitation vector.
[0002]
[Prior Art]
In digital mobile communication, high-efficiency speech coding methods are used to make efficient use of radio spectrum, and to make efficient use of communication lines and storage media in voice or music storage services.
A widely used approach to efficient speech coding divides the original speech into segments of fixed length, about 5 to 50 msec, called frames or subframes (hereinafter collectively, frames), and separates the speech of each frame into two pieces of information: the characteristics of a linear filter representing the envelope of the frequency spectrum, and an excitation signal that drives the filter; each is then encoded. In this approach, a known way to encode the excitation signal is to separate it into a periodic component, considered to correspond to the pitch period (fundamental frequency) of the speech, and the remaining components, and to encode each. A representative example of such excitation coding is Code-Excited Linear Prediction (CELP), described in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", IEEE Proc. ICASSP-85, pp. 937-940, 1985.
[0003]
FIG. 1 shows an example configuration of the encoder 1.
For the speech X applied to the input terminal, the linear prediction analysis unit 2 computes linear prediction parameters representing the envelope of the frequency spectrum of the input speech. The obtained linear prediction parameters are quantized and encoded by the linear prediction parameter encoding unit 3, and the quantized parameters are sent to the synthesis filter 12 as synthesis filter coefficients k.
[0004]
Details of linear prediction analysis and examples of encoding linear prediction parameters are described, for example, in Sadaoki Furui, "Digital Speech Processing" (Tokai University Press). The linear prediction analysis unit 2, the linear prediction parameter encoding unit 3, and the synthesis filter 12 may be replaced with nonlinear counterparts.
The excitation vector generation unit 5 generates candidate excitation vectors one frame in length and sends them to the synthesis filter 12. It generally consists of an adaptive codebook 6 and a fixed codebook 7. From the adaptive codebook 6, the most recent past excitation (the already-quantized excitation vectors of the preceding one to several frames held in a buffer) is cut out with a length corresponding to a certain period, and the cut-out vector is repeated until it reaches the frame length, yielding candidate time-series vectors corresponding to the periodic component of the speech. The "certain period" is chosen so as to reduce the distortion d computed in the distortion calculation unit 13, and the selected period generally corresponds to the pitch period of the speech. From the fixed codebook 7, candidate time-series code vectors one frame in length, corresponding to the aperiodic component of the speech, are output; a number of candidate vectors determined in advance by the number of coding bits is stored, independently of the input speech. The candidate time-series vectors output from the adaptive codebook 6 and the fixed codebook 7 are multiplied in the multipliers 8 and 9 by the weights produced in the weight generation unit 10, and summed in the adder 11 to form the candidate excitation vector c. The excitation vector generation unit 5 may also be configured with the fixed codebook 7 alone, without the adaptive codebook 6; this configuration is often used when encoding signals with little pitch periodicity, such as consonants or background noise.
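The adaptive-codebook operation described above can be sketched as follows. This is an illustration only, not the patent's implementation; the function name and the toy parameter values are assumptions made for the example.

```python
import numpy as np

def adaptive_codebook_vector(past_excitation, period, frame_length):
    """Cut the most recent `period` samples from the past excitation
    and repeat them until one frame length is filled, producing the
    candidate time-series vector for the periodic (pitch) component."""
    segment = past_excitation[-period:]
    reps = int(np.ceil(frame_length / period))
    return np.tile(segment, reps)[:frame_length]
```

In an actual encoder, `period` would be searched over a range of candidate lags so as to minimize the distortion d.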
[0005]
The synthesis filter 12 is a linear filter whose coefficients are the quantized linear prediction parameters; it takes a candidate excitation vector c as input and outputs a candidate reproduced speech y. The order of the synthesis filter, i.e. of the linear prediction analysis, is generally around 10 to 16. As noted above, the synthesis filter 12 may be a nonlinear filter.
The distortion calculation unit 13 computes the distortion d between the candidate reproduced speech y output by the synthesis filter and the input speech X. This distortion may be computed taking the synthesis filter coefficients, or the unquantized linear prediction coefficients, into account, for example through perceptual weighting.
[0006]
The codebook search control unit 14 selects the excitation code that minimizes the distortion d between each candidate reproduced speech y and the input speech X, thereby determining the excitation vector for the frame.
The excitation code n2 determined by the codebook search control unit 14 and the linear prediction parameter code n1 output by the linear prediction parameter encoding unit 3 are sent to the code transmission unit 4 and, depending on the application, stored on a storage device or transmitted to the receiving side over a communication channel.
[0007]
FIG. 2 shows an example configuration of the CELP decoder 20 corresponding to the above encoding method.
Of the codes received from the transmission channel or storage medium, the linear prediction parameter code n1 is decoded into synthesis filter coefficients by the linear prediction parameter decoding unit 21 and sent to the synthesis filter 22 and, as needed, to the post-processing unit 30. The excitation code n2 is sent to the excitation vector generation unit 25, which generates the excitation vector corresponding to the code; its configuration corresponds to that of the excitation vector generation unit 5 of the encoder 1 shown in FIG. 1. The synthesis filter 22 takes the excitation vector as input and reproduces the speech s'. The post-processing unit 30, also called a postfilter, performs processing that perceptually reduces the noisiness of the reproduced speech.
[0008]
FIG. 3 shows an example configuration of the postfilter.
A postfilter generally performs enhancement of the spectral envelope and pitch enhancement with a comb filter.
In FIG. 3, the decoded speech signal is passed through an MA (moving-average) filter 32 having the inverse characteristic of the spectral envelope to extract the excitation waveform, then through a comb filter 38 with a tap at the pitch lag to emphasize the pitch periodicity, and finally through an AR (autoregressive) filter 39 that emphasizes the spectral envelope, yielding a perceptually improved speech signal. The comb filter used to emphasize the pitch periodicity can be realized either as an MA type or as an AR type. When the pitch period is an integer, the filter characteristics are, respectively:
MA type: H(z) = 1 + a z^(-t)
AR type: H(z) = 1 / (1 - a z^(-t))
where a and the b_i below are constants and t is the pitch period. In practice the pitch period is often not an integer, so an upsampling technique is used and the filters usually take the form:
MA type: H(z) = 1 + a Σ_i b_i z^(-t+i)
AR type: H(z) = 1 / (1 - a Σ_i b_i z^(-t+i))
[0009]
In the above equations, the constants a and b_i express the degree of pitch periodicity.
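For the integer-lag case, the two comb-filter transfer functions above correspond to simple difference equations: the MA type adds a scaled copy of the input delayed by the pitch period t, while the AR type feeds back a scaled copy of its own past output. A minimal sketch (an illustration, not the patent's code; function names are assumptions):

```python
import numpy as np

def comb_ma(x, t, a):
    """MA-type comb filter: y[n] = x[n] + a * x[n - t]."""
    y = np.array(x, dtype=float)
    y[t:] += a * x[:-t]
    return y

def comb_ar(x, t, a):
    """AR-type comb filter: y[n] = x[n] + a * y[n - t]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (a * y[n - t] if n >= t else 0.0)
    return y
```

Feeding an impulse through each makes the difference visible: the MA type produces one echo at lag t, while the AR type produces a decaying train of echoes at every multiple of t.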
[0010]
[Problems to Be Solved by the Invention]
The problem with the postfilter in the CELP scheme is that pitch enhancement, like the encoding and decoding themselves, is performed frame by frame (in the application field of speech coding this is unavoidable). Because processing proceeds on the assumption that the acoustic characteristics of the signal are constant within a frame, when the frame is relatively long (for example, 10 msec or more), processing that assumes a constant pitch period and a constant degree of pitch periodicity within the frame cannot deliver sufficient quality in transient portions where these characteristics change within the frame.
[0011]
The object of the present invention is to provide higher-quality reproduced speech in the field of speech encoding and decoding without breaking the framework of frame-by-frame processing.
[0012]
To solve the above problem, the invention of claim 1 is a post-processing method for an audio signal comprising: a step of storing the audio signal input for each frame in storage means; a step of passing the audio signals of the current frame and past frames held in the storage means through a linear filter using linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal, to obtain a filtered signal; a step of computing the average pitch length in the current frame from the filtered signal; a step of detecting a waveform peak position in the filtered signal as the first pitch reference position and cutting out a waveform of one average pitch length around the first pitch reference position; a step of computing the cross-correlation between the cut-out one-pitch waveform and the filtered signal and sequentially searching for the peaks of the cross-correlation as the second and subsequent pitch reference positions; and a step of determining, from the first pitch reference position and the second and subsequent pitch reference positions found in the search step, regions of the filtered signal each corresponding to exactly one pitch waveform, analyzing the pitch periodicity of each region to obtain an accurate pitch period and an accurate pitch correlation for each region, computing comb filter coefficients from the per-region pitch period and pitch correlation, and passing the filtered signal, region by region, through a comb filter using the per-region coefficients to obtain the output audio signal.
[0013]
The invention of claim 2 is the post-processing method of claim 1, further comprising a step of multiplying the audio signal obtained for each region by a window function and overlap-adding it with the windowed audio signal of the immediately preceding region.
The invention of claim 3 is a post-processing apparatus for an audio signal comprising: storage means for storing the audio signal input for each frame; a filter that passes the audio signals of the current frame and past frames held in the storage means through a linear filter using linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal, to obtain a filtered signal; an average pitch length calculation unit that computes the average pitch length in the current frame from the filtered signal; a peak position detection unit that detects a waveform peak position in the filtered signal as the first pitch reference position; a signal waveform cutting unit that cuts out a waveform of one average pitch length around the first pitch reference position; a pitch reference position search unit that computes the cross-correlation between the cut-out one-pitch waveform and the filtered signal and sequentially searches for the peaks of the cross-correlation as the second and subsequent pitch reference positions; a boundary determination unit that determines, from the first pitch reference position and the second and subsequent pitch reference positions found by the search, regions of the filtered signal each corresponding to exactly one pitch waveform; and a pitch correlation value calculation unit that analyzes the pitch periodicity of each region to obtain an accurate pitch period and an accurate pitch correlation for each region and computes comb filter coefficients from them; the filtered signal being passed, region by region, through a comb filter using the per-region coefficients to obtain the output audio signal.
[0014]
The invention of claim 4 is the post-processing apparatus of claim 3, further comprising a multiplier that multiplies the audio signal obtained for each region by a window function, and a superposition unit that overlap-adds it with the windowed audio signal of the immediately preceding region.
[0015]
The invention of claim 5 is a recording medium storing a program that causes a computer to execute: a first procedure of storing the audio signal input for each frame in storage means; a second procedure of passing the audio signals of the current frame and past frames held in the storage means through a linear filter using linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal, to obtain a filtered signal; a third procedure of computing the average pitch length in the current frame from the filtered signal; a fourth procedure of detecting a waveform peak position in the filtered signal as the first pitch reference position and cutting out a waveform of one average pitch length around the first pitch reference position; a fifth procedure of computing the cross-correlation between the cut-out one-pitch waveform and the filtered signal and sequentially searching for the peaks of the cross-correlation as the second and subsequent pitch reference positions; and a sixth procedure of determining, from the first pitch reference position and the second and subsequent pitch reference positions found in the search, regions of the filtered signal each corresponding to exactly one pitch waveform, analyzing the pitch periodicity of each region to obtain an accurate pitch period and an accurate pitch correlation for each region, computing comb filter coefficients from the per-region pitch period and pitch correlation, and passing the filtered signal, region by region, through a comb filter using the per-region coefficients to obtain the output audio signal.
[0016]
The invention of claim 6 is the recording medium of claim 5, the program further comprising a procedure of multiplying the audio signal obtained for each region by a window function and overlap-adding it with the windowed audio signal of the immediately preceding region.
[0017]
With the above configuration, the present invention permits a fixed delay inside the postfilter and realizes pitch-synchronous postfiltering in which, over an interval of one frame length plus the delay time, pitch positions are detected and enhancement is performed one pitch at a time. As a result, even when the pitch period fluctuates within a frame, pitch enhancement can track and respond to small fluctuations in pitch; moreover, compared with the conventional method, only a slight additional delay is incurred, and the method fits within the frame-processing framework of the encoder and decoder.
[0018]
[Embodiments of the Invention]
Embodiment
An embodiment of the present invention is described below with reference to the drawings.
FIG. 4 shows an example postfilter configuration according to the invention.
The MA filter unit and the AR filter unit may be the same as in the conventional postfilter; the processing between them is the distinctive feature of this invention. (As described later, it is even better to adapt the MA and AR filters to the processing of this invention.) Since the invention introduces a fixed delay, a delay buffer 31 is provided.
[0019]
FIGS. 5 to 7 schematically illustrate the processing of each unit in FIG. 4. In FIG. 5, for example, the input frame position holds the one-frame signal s' output from the synthesis filter 22.
The waveforms in FIGS. 5 to 7 are assumed to be the waveform e after passing through the MA filter 32. The frame actually postfiltered is the interval labeled the processing frame; the time difference between the input frame and the processing frame is the delay. The average pitch length within the processing frame is assumed to be available: it may be taken from the period code of the adaptive codebook, or computed from the autocorrelation function of the synthesized signal. If the average pitch length is longer than the frame length, the invention has little effect and the conventional postfiltering may be kept as is; the invention is effective when the pitch period is shorter than the frame length.
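As a rough sketch of the autocorrelation route mentioned above (an illustration under assumed parameters, not the patent's prescribed implementation; the search range of 20 to 160 samples is an arbitrary choice for the example):

```python
import numpy as np

def average_pitch_length(e, min_lag=20, max_lag=160):
    """Estimate the average pitch lag (in samples) as the lag that
    maximizes the autocorrelation of the residual signal e over a
    plausible range of pitch lags."""
    best_lag, best_corr = min_lag, -np.inf
    for lag in range(min_lag, min(max_lag, len(e) - 1) + 1):
        corr = np.dot(e[lag:], e[:-lag])
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_lag
```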
【0020】
以下、ピッチ周期はフレーム長よりも短いものと仮定して説明する。
まず、フレーム内ピーク位置検出部33では、処理フレーム内における信号のピーク位置(振幅の最大点)を探索して検出する。例えば、これを図5においてPとする。(図5の例では、処理フレームの右側境界付近に位置することになる。フレーム位置と信号波形の相対的な位置関係はランダムであるので、ピーク位置が処理フレームのどこに位置するかはその都度任意である。)次にピーク位置Pを基準にして、信号波形から、信号波形切り出し部34において、平均ピッチ長の波形を切り出す。この様子を図6に示す。切り出し位置は、ピーク位置を中心に、前後2分の1ピッチ長ずつの領域から切り出すとよい。また、図6のように、処理フレームの境界をまたいでかまわないが、入力フレーム位置の右端を越えることができないので、切り出し位置が入力フレーム位置の右端を越える場合は、切り出し位置の右端は入力フレーム位置の右端とする。次に、ピッチ基準位置探索部35において、この切り出し波形を左右にシフトしながら切り出し波形と信号の相互相関を計算し、Pから平均ピッチ程度離れた位置付近で、相互相関が最大となる位置を探索し、次のピッチ基準位置を決定する。このピッチ基準位置探索処理を繰り返すことによって、処理フレームと入力フレームをあわせた領域で、ピッチ基準位置を決定する。図6の例では、処理フレーム内にピッチ基準位置が3箇所、処理フレーム外に2箇所決まったことになる。次に、領域の境界決定部36において、図7に示すように、ピッチ基準位置をもとにして、信号を正確な1ピッチ波形に対応するように、領域の境界を決める。境界点の決め方は、例えば、(P+P)/2、(P+P)/2のように、基準位置の中間点を境界としてもよいが中間点よりも少し右寄り、例えばP+(P−P)*2/3のように、2:1の内分点を境界点としてもよい。中間点よりも少し右寄りのほうがよいのは、一般的に、1ピッチ波形は急激に立ち上がって、ゆっくり収束するような波形になることが多く観察されるためである。
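The search for the next pitch reference position can be sketched as follows. This is a minimal illustration, not the patent's code; the function name and the +-5-sample search window are assumptions, and the cut-out one-pitch waveform is passed in as `template`.

```python
import numpy as np

def next_pitch_reference(e, template, p_prev, avg_pitch, search=5):
    """Return the position near one average pitch after p_prev where the
    cross-correlation between the cut-out one-pitch template (centered on
    its reference position) and the signal e is maximal."""
    half = len(template) // 2
    best_pos, best_corr = None, -np.inf
    for p in range(p_prev + avg_pitch - search, p_prev + avg_pitch + search + 1):
        lo, hi = p - half, p - half + len(template)
        if lo < 0 or hi > len(e):
            continue  # cut may not extend past the available signal
        corr = np.dot(template, e[lo:hi])
        if corr > best_corr:
            best_corr, best_pos = corr, p
    return best_pos
```

Repeating this call, each time with the newly found position as `p_prev`, yields the sequence of pitch reference positions over the processing and input frames.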
[0021]
Pitch enhancement is performed by analyzing the pitch periodicity region by region. First, the pitch correlation value calculation unit 37 uses the cross-correlation between region 1 and the past signal waveform to obtain an accurate pitch period t1 and pitch correlation value a1; next, an accurate pitch period t2 and pitch correlation value a2 are obtained for region 2, and so on. Unlike the conventional postfilter method, the pitch period and pitch correlation values obtained in this way can track the signal even when it changes transiently within the frame.
[0022]
Finally, using the pitch period and pitch correlation values obtained for each region, the comb filter 38 described above is applied region by region to perform pitch enhancement.
With this kind of processing, the coefficients of the pitch enhancement filter change from one small region to the next, which can produce audible discontinuities and thus degrade quality instead. To prevent this, the regions determined by the region boundary determination unit 36 are overlapped slightly (for example, about 10 samples, or 1.25 msec), and the overlapping parts are each multiplied by a triangular window and added together so that the filter characteristics change gradually.
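The triangular-window overlap at a region boundary can be sketched as a linear crossfade between the tail of the previous filtered region and the head of the next one (an illustration only; the function name is an assumption):

```python
import numpy as np

def crossfade(prev_tail, next_head):
    """Blend the overlapping samples of two adjacent filtered regions with
    complementary triangular (linear) windows, so the comb-filter
    characteristics change gradually across the region boundary."""
    n = len(prev_tail)
    fade_out = np.linspace(1.0, 0.0, n, endpoint=False)
    fade_in = 1.0 - fade_out
    return prev_tail * fade_out + next_head * fade_in
```

Because the two windows sum to one at every sample, a signal that is identical in both regions passes through the overlap unchanged.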
[0023]
The examples of FIGS. 5 to 7 assumed that the MA and AR filters remain as in the conventional postfilter; it is even better, after determining the regions as in FIG. 7, to re-analyze the MA and AR filter coefficients for each region as well and to filter so that the spectral envelope is emphasized region by region.
The embodiment above describes application to a speech decoding method that reproduces a high-quality speech signal from an encoded bit sequence in code-excited linear predictive coding and decoding; however, the input to be processed by this invention is not limited to decoded speech signals, and the invention can also be applied to audio signals input frame by frame for which it is unknown whether they are decoded speech.
[0024]
Furthermore, a system to which the present invention is applied may be composed of a computer having a CPU, memory, and the like, a terminal device, and a machine-readable recording medium such as a CD-ROM, magnetic disk drive, or semiconductor memory; a program that executes the steps of the audio signal post-processing method, stored on the recording medium, is read by the computer and controls its operation, realizing each of the elements of the embodiment described above.
[0025]
[Effects of the Invention]
The postfilter according to the invention was applied to the decoded signal of a 4 kbit/s speech coding scheme and evaluated for subjective quality. The frame length serving as the postfilter's processing unit was 10 msec, and the delay time was set to 5 msec. Compared with the conventional postfilter method, a marked improvement was observed, particularly in the quality of female speech: the noisiness of passages where vowels change in quick succession, as in the phrase "minami ni wa", was reduced and the speech sounded clearer. The reason the improvement was large for female speech while male speech changed little relative to the conventional method is that the invention is effective when the pitch period is sufficiently short compared with the frame length. A frame length of 10 msec corresponds to 100 Hz, and the pitch frequency of male speech is generally at or below 100 Hz (a pitch length longer than 10 msec), whereas the pitch frequency of female speech is roughly 200 to 400 Hz, so that, as in the examples of FIGS. 5 to 7, two to three pitch periods fit in one frame. Thus, unlike male speech, the pitch length of female speech is sufficiently shorter than the frame length, and a large improvement in subjective quality was obtained.
[Brief Description of the Drawings]
[FIG. 1] Block diagram showing an example configuration of a CELP encoder.
[FIG. 2] Block diagram showing an example configuration of a CELP decoder.
[FIG. 3] Block diagram showing an example configuration of a postfilter.
[FIG. 4] Block diagram of the postfilter processing unit according to the invention.
[FIG. 5] Diagram illustrating detection of the in-frame peak position.
[FIG. 6] Diagram illustrating waveform cutting and the search for pitch reference positions.
[FIG. 7] Diagram illustrating determination of region boundaries.
[Description of Reference Numerals]
1 CELP encoder
2 Linear prediction analysis unit
3 Linear prediction parameter encoding unit
4 Code transmission unit
5 Excitation vector generation unit
6 Adaptive codebook
7 Fixed codebook
8, 9 Multipliers
10 Weight generation unit
11 Adder
12, 22 Synthesis filter
13 Distortion calculation unit
14 Codebook search control unit
21 Linear prediction parameter decoding unit
25 Excitation vector generation unit
30 Post-processing unit
31 Buffer
32 MA filter
33 In-frame peak position detection unit
34 Signal waveform cutting unit
35 Pitch reference position search unit
36 Region boundary determination unit
37 Pitch correlation value calculation unit
38 Comb filter
39 AR filter
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a post-processing method of an audio signal aiming at quality improvement by emphasizing a pitch component of an audio signal inputted for each frame, and in particular, by driving a filter representing a spectral envelope characteristic of the audio signal by a sound source vector, In a predictive encoding and decoding for synthesizing an audio signal, a post-processing method and apparatus effective for an audio signal applied to audio decoding for reproducing a high-quality audio signal from an encoded bit sequence, and a recording medium recording a program About.
[0002]
[Prior art]
2. Description of the Related Art In digital mobile communication, a high-efficiency voice encoding method is used in order to efficiently use radio waves or to efficiently use a communication line or a storage medium for a voice or music storage service.
At present, as a method for encoding speech efficiently, original speech is divided into frames or subframes (hereinafter collectively referred to as frames) at intervals of about 5 to 50 msec, and the speech of one frame is divided into frequencies. A method has been proposed in which the information is separated into two pieces of information, that is, a characteristic of a linear filter representing an envelope characteristic of a spectrum and a drive excitation signal for driving the filter, and each of the information is encoded. In this method, as a method of encoding a drive excitation signal, there is known a method in which a periodic component considered to correspond to a pitch period (fundamental frequency) of voice and a component other than the periodic component are encoded. As an example of the coding method of the drive excitation information, there is code-driven linear prediction coding (CELP). For details of the above, see the document MR. Schroeder and B.S. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", IEEE Proc. ICASP-85, pp. 937-940, 1985.
[0003]
FIG. 1 shows a configuration example of the encoding unit 1.
For the speech X input to the input terminal, the linear prediction analysis unit 2 calculates a linear prediction parameter representing a frequency spectrum envelope characteristic of the input speech. The obtained linear prediction parameters are quantized and coded in the linear prediction parameter coding unit 3, and the quantized parameters are sent to the synthesis filter 12 as synthesis filter coefficients k.
[0004]
The details of the linear prediction analysis and examples of encoding the linear prediction parameters are described in, for example, “Digital Speech Processing” by Sadahiro Furui (Tokai University Press). Here, the linear prediction analysis unit 2, the linear prediction parameter encoding unit 3, and the synthesis filter 12 may be replaced with non-linear ones.
The driving sound source vector generation unit 5 generates a driving sound source vector candidate having a length of one frame and sends the candidate to the synthesis filter 12. The driving sound source vector generation unit 5 generates a driving sound source vector candidate having a length of one frame and sends the candidate to the synthesis filter 12. Driving excitation vector generating section 5 generally includes adaptive codebook 6 and fixed codebook 7 in many cases. From the adaptive codebook 6, the immediately preceding drive excitation vector stored in the buffer (the drive excitation vector for one to several frames immediately before quantization) is cut out at a length corresponding to a certain period, and the cut out is performed. By repeating this vector until the length of the frame is reached, a time-series vector candidate corresponding to the periodic component of the audio is output. As the “certain period”, a period that reduces the distortion d in the distortion calculator 13 is selected, and the selected period generally corresponds to the pitch period of voice in many cases. From the fixed codebook 7, a candidate for a time-series code vector having a length of one frame corresponding to the non-periodic component of speech is output. For these candidates, a predetermined number of candidate vectors are stored in accordance with the number of bits for encoding independently of the input speech. The time series vector candidates output from the adaptive codebook 6 and the fixed codebook 7 are multiplied by the weights created by the weight creation unit 10 in the multiplication units 8 and 9, respectively, added by the addition unit 11, and added to the driving sound source. This is a vector candidate c. 
In the configuration example of the driving excitation vector generation unit 5, the adaptive codebook 6 may not be used and only the fixed codebook 7 may be used, and a signal having a small pitch periodicity such as a consonant part or background noise is encoded. Sometimes, the configuration does not use the adaptive codebook 6.
[0005]
The synthesis filter 12 is a linear filter that uses the quantization value of the linear prediction parameter as a filter coefficient, and outputs a driving sound source vector candidate c as an input and outputs a reproduced sound candidate y. In general, the order of the synthesis filter, that is, the order of the linear prediction analysis, is generally about 10 to 16 order. As described above, the synthesis filter 12 may be a non-linear filter.
The distortion calculator 13 calculates a distortion d between the reproduced voice candidate y output from the synthesis filter and the input voice X. The calculation of the distortion may be performed in consideration of the coefficients of the synthesis filter or the unpredicted linear prediction coefficients, for example, perceptual weighting.
[0006]
The codebook search control unit 14 selects a driving excitation code that minimizes the distortion d between each reproduced speech candidate y and the input speech x, and determines a driving excitation vector in the frame.
The excitation code n2 determined by the codebook search control unit 14 and the linear prediction parameter code n1, which is the output of the linear prediction parameter encoding unit 3, are sent to the code transmission unit 4, and are stored in a storage device according to the use mode. Or sent to the receiving side via a communication path.
[0007]
FIG. 2 shows a configuration example of the CELP decoding unit 20 corresponding to the above encoding method.
Among the codes received from the transmission path or the storage medium, the linear prediction parameter code n1 is decoded into a synthesis filter coefficient in the linear prediction parameter decoding unit 21, and is sent to the synthesis filter 22 and, if necessary, the post-processing unit 30. The driving excitation code n2 is sent to the driving excitation vector generation unit 25, and an excitation vector corresponding to the code is generated. The configuration of the driving excitation vector generation unit 25 corresponds to the configuration of the driving excitation vector generation unit 5 of the encoding unit 1 illustrated in FIG. The synthesis filter 22 reproduces the sound s ′ using the driving sound source vector as an input. The post-processing unit 30 is also called a post-filter, and performs a process of reducing the sense of noise of the reproduced sound.
[0008]
FIG. 3 shows a configuration example of the post filter.
In the post filter, generally, the envelope of the spectrum is enhanced, and the pitch is enhanced by a comb filter.
In FIG. 3, a sound source waveform is extracted from a decoded audio signal through an MA (moving average) filter 32 having an inverse characteristic of a spectral envelope, and is passed through a comb filter 38 having a tap at a position of a pitch length. By passing through an AR (autoregressive) filter 39 that emphasizes the periodicity and finally the envelope characteristic of the spectrum, an audio signal that is auditorily improved is obtained. Comb filters for emphasizing the periodicity of pitch may be realized by MA type or AR type, and the filter characteristics when the pitch period is an integer value are expressed by the following equations, respectively. Become like Here, a and bi are constants, and t is a pitch period.
MA type: H (z) = 1 + aZ− t
In case of AR type: H (z) = 1 / (1-aZ- t )
Actually, the pitch period is often not an integer value.
In the case of MA type: H (z) = 1 + aΣ i biZ -t + i
In the case of AR type: H (z) = 1 / (1-aΣ i biZ -t + i)
Often in the form.
[0009]
In the above equation, the constants a, The pitch periodicity is indicated by bi.
[0010]
[Problems to be solved by the invention]
The problem with the post filter in the CELP method is that the pitch enhancement processing is performed in units of frames, as in the case of encoding / decoding (in the application field of voice coding, it must be performed in units of frames). is there. That is, since the processing is performed on the assumption that the acoustic characteristics of the signal are constant within the frame, when the frame length is somewhat long (for example, 10 msec or more), the pitch period and the characteristics of the pitch period change within the frame. In such transient characteristics, there is a problem that sufficient quality cannot be obtained by processing that assumes that the pitch period and the degree of pitch periodicity in a frame are constant.
[0011]
SUMMARY OF THE INVENTION An object of the present invention is to provide a higher-quality reproduced sound without breaking the framework of processing on a frame-by-frame basis in a field of audio encoding / decoding.
[0012]
In order to solve the above-mentioned problem, the invention according to claim 1 is a post-processing method of an audio signal, wherein an audio signal input for each frame is stored in a storage unit, and a current audio signal stored in the storage unit is stored in the storage unit. Passing a speech signal of a frame and a past frame through a linear filter using a linear prediction coefficient indicating an inverse characteristic of the spectrum envelope of the speech signal to obtain a linear filter passing signal; and Calculating an average pitch length, detecting a peak position of the waveform from the linear filter passing signal, setting the peak position as a first pitch reference position, and calculating a waveform having an average of one pitch length based on the first pitch reference position. a step of cutting out, the cross-correlation between the linear filter passing signal and the cutout average one-pitch length of the waveform is calculated, the cross-correlation A step of sequentially searching for a chromatography click as the pitch reference position of the second and subsequent, the linear filter based on a pitch reference position of the second and subsequent determined in the course of the search and the first-th pitch reference position A region corresponding to an accurate one-pitch waveform in the passing signal is determined , a pitch periodicity of each region is analyzed for each of the regions, and an accurate pitch period of each region and an accurate pitch correlation of each region are obtained. Calculate the comb filter coefficient based on the exact pitch period and the pitch correlation for each area, and pass the linear filter passing signal for each area to the comb filter using the comb filter coefficient for each area. Obtaining an audio signal output.
[0013]
According to a second aspect of the present invention, in the post-processing method of the first aspect, the audio signal obtained by multiplying the audio signal obtained for each area by a window function and the window function of the immediately preceding area is multiplied. And superimposing.
According to a third aspect of the present invention, in the post-processing apparatus for the audio signal, the storage means for storing the audio signal input for each frame in the storage means, and the current frame and the past frame audio stored in the storage means. A filter that passes the signal through a linear filter that uses a linear prediction coefficient indicating the inverse characteristic of the spectral envelope of the audio signal to obtain a linear filter passing signal, and an average that calculates an average pitch length in a current frame from the linear filter passing signal A pitch length calculation unit, a peak position detection unit that detects a peak position of the waveform from the linear filter passing signal and uses the peak position as a first pitch reference position , and an average of one pitch based on the first pitch reference position. phase signal waveform cutout section for cutting out the length of the waveform, the cutout average one-pitch length of the waveform and the linear filter passing signal The correlation is calculated, the second determined the pitch reference position searching unit for sequentially searching for a peak of the cross-correlation as the pitch reference position of the second and subsequent, in the 1st step of the search and the pitch reference position A boundary determining unit that determines an area corresponding to an accurate one-pitch waveform in the linear filter passing signal based on a subsequent pitch reference position ; and analyzing a periodicity of pitch for each of the areas to obtain an accurate pitch for each area. 
obtains an accurate pitch correlation for each cycle and the region, and the pitch correlation value calculating unit for calculating a comb filter coefficient based exact pitch period of each of said areas and the pitch correlation of each of said areas, said linear for each of the regions It is characterized in that an audio signal output is obtained by passing a filter-passed signal through a comb filter using a comb filter coefficient for each region .
[0014]
According to a fourth aspect of the present invention, in the post-processing apparatus for an audio signal according to the third aspect, a multiplication unit that multiplies the audio signal obtained for each area by a window function and a window function of the immediately preceding area are obtained. And a superimposing unit that superimposes the reproduced audio signal.
[0015]
The invention according to claim 5, wherein in a recording medium on which a program is recorded, a first procedure of storing an audio signal input for each frame in a storage means, and a current frame and a past frame stored in the storage means. A second step of passing the audio signal through a linear filter using a linear prediction coefficient indicating an inverse characteristic of the spectral envelope of the audio signal to obtain a linear filter-passed signal; and an average pitch in a current frame from the linear filter-passed signal. A third step of calculating the length, and detecting the peak position of the waveform from the linear filter passing signal to determine the peak position as the first pitch reference position, and averaging one pitch length based on the first pitch reference position . a fourth step of cutting out the waveforms, the correlation between the linear filter passing signal and the cutout average one-pitch length of the waveform is calculated, the mutual Based and fifth steps, a pitch reference position of the second and subsequent determined in the course of the search and the first-th pitch reference position for sequentially searching for a peak of about a pitch reference position of the second and subsequent Determining a region corresponding to an accurate one-pitch waveform in the linear filter passing signal , analyzing the periodicity of pitch for each of the regions to determine an accurate pitch period for each region and an accurate pitch correlation for each region, A comb filter that calculates a comb filter coefficient based on an accurate pitch period for each area and a pitch correlation for each area, and uses the linear filter passing signal for each area using the comb filter coefficient for each area. And a sixth procedure for obtaining an audio signal output by passing through the
[0016]
According to a sixth aspect of the present invention, in the recording medium storing the program according to the fifth aspect , the audio signal obtained for each area is multiplied by a window function, and the audio signal obtained by multiplying the window signal immediately before is superimposed. It is characterized by having a procedure to perform.
[0017]
According to the present invention, by providing the above-described configuration, a delay of a fixed time is allowed in the post-filter, and pitch synchronization in which a pitch position is detected and enhancement processing is performed in units of one pitch in a section of frame length + delay time. Implements type post-filtering. As a result, even when the pitch period fluctuates within one frame, pitch enhancement processing can be performed while detecting and responding to small fluctuations in pitch, and a slight delay compared to the conventional method can be achieved. With only an increase in time, it is applicable to the framework of frame processing of the encoding and decoding parts.
[0018]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment of the present invention will be described below with reference to the drawings.
FIG. 4 shows a configuration example of a post filter according to the present invention.
The MA filter unit and the AR filter unit may be common, and the processing inside them is a feature of the present invention. (As will be described later, it is more preferable to change the MA filter and the AR filter according to the processing of the present invention.) In the present invention, a delay buffer 31 is provided in order to provide a fixed time delay.
[0019]
5 to 7 are diagrams schematically illustrating the processing of each processing unit in FIG. For example, in FIG. 5, the input frame position is the signal s ′ for one frame output from the synthesis filter 22.
5 to 7 are waveforms e after passing through the MA filter 32. The frame that is actually subjected to post-filter processing is a section described as a processing frame, and the time difference between the input frame and the processing frame is delayed. It is assumed that the average pitch length in the processing frame has been obtained. As the average pitch length, the periodic code of the adaptive codebook may be used, or the autocorrelation function of the synthesized signal may be calculated again to calculate the average pitch length. If the average pitch length is longer than the frame length, there is almost no effect of the present invention, and the conventional post-filter processing may be used. The present invention is effective when the pitch period is shorter than the frame length.
[0020]
Hereinafter, the description will be made assuming that the pitch period is shorter than the frame length.
First, the in-frame peak position detection unit 33 searches the processing frame and detects the peak position (maximum-amplitude point) of the signal. In FIG. 5 this is P0. (In the example of FIG. 5, P0 happens to lie near the right boundary of the processing frame, but since the relative position of the frame and the signal waveform is random, the peak may fall anywhere in the processing frame.) Next, using the peak position P0 as a reference, the signal waveform cutout unit 34 cuts out one average pitch length of the waveform. This is shown in FIG. 6. The cutout may be taken, for example, from a region extending half a pitch length on either side of the peak position. As shown in FIG. 6, the cutout may cross the boundary of the processing frame, but it cannot cross the right end of the input frame; if the cutout would extend beyond the right end of the input frame, the right end of the cutout is set to the right end of the input frame. The pitch reference position search unit 35 then computes the cross-correlation between the cut-out waveform and the signal while shifting the cut-out waveform left and right, and determines the next pitch reference position as the position where the cross-correlation is maximum, in the vicinity of a point approximately one average pitch away from P0. By repeating this search, pitch reference positions are determined over the region formed by the processing frame and the input frame together. In the example of FIG. 6, three pitch reference positions are determined inside the processing frame and two outside it. Next, as shown in FIG. 7, the region boundary determination unit 36 determines the region boundaries based on the pitch reference positions so that each region corresponds to an accurate one-pitch waveform. The boundary points may be set at the midpoints of the reference positions, for example at (P1 + P2)/2 and (P0 + P1)/2, but it is better to place them slightly to the right of the midpoints, for example at the 2:1 dividing point P2 + (P1 - P2) * 2/3. The reason a point slightly to the right of the midpoint is better is that a one-pitch waveform is generally observed to rise rapidly and then converge slowly.
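The peak detection, one-pitch cutout, cross-correlation search, and 2:1 boundary rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names, the +/-`search` sample window, and the unnormalised correlation are simplifying assumptions.

```python
import numpy as np

def find_pitch_references(x, frame_start, frame_end, avg_pitch, search=4):
    """Detect the in-frame peak P0, cut out one average pitch length of
    waveform around it, then locate further pitch reference positions by
    sliding the cut-out waveform roughly one pitch at a time and taking
    the cross-correlation maximum."""
    # peak (maximum-amplitude point) inside the processing frame
    p0 = frame_start + int(np.argmax(np.abs(x[frame_start:frame_end])))
    # cut out one pitch, half a pitch on either side of the peak,
    # clipped so it cannot run past the end of the available signal
    half = avg_pitch // 2
    lo = max(p0 - half, 0)
    hi = min(lo + avg_pitch, len(x))
    template = x[lo:hi]
    refs, pos = [p0], p0
    while pos + avg_pitch + search + half < len(x):
        best, best_c = None, -np.inf
        for d in range(-search, search + 1):      # shift left and right
            s = pos + avg_pitch + d - half
            seg = x[s:s + len(template)]
            if len(seg) < len(template):
                continue
            c = float(np.dot(template, seg))      # cross-correlation
            if c > best_c:
                best_c, best = c, pos + avg_pitch + d
        if best is None:
            break
        refs.append(best)
        pos = best
    return refs

def region_boundaries(refs):
    """Place each region boundary at the 2:1 dividing point between
    neighbouring pitch reference positions, slightly right of the
    midpoint, matching the observation that a one-pitch waveform rises
    sharply and decays slowly."""
    return [a + (b - a) * 2 // 3 for a, b in zip(refs, refs[1:])]
```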
[0021]
The pitch emphasis processing is performed by analyzing the pitch periodicity of each region. First, the pitch correlation value calculation unit 37 obtains an accurate pitch period t1 and pitch correlation value a1 for region 1, using the cross-correlation between region 1 and the past signal waveform. Next, an accurate pitch period t2 and pitch correlation value a2 are obtained for region 2 in the same way. Unlike the conventional post-filter method, the accurate pitch period and pitch correlation value obtained in this way can follow the change even when the signal changes transiently within the frame.
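The per-region pitch analysis can be sketched as a normalised cross-correlation search against the signal immediately preceding the region. The function name and the lag range are illustrative assumptions.

```python
import numpy as np

def region_pitch(x, start, end, lag_min, lag_max):
    """For the region x[start:end], search lags lag_min..lag_max and
    return (pitch period, pitch correlation): the lag maximising the
    normalised cross-correlation with the signal that precedes the
    region by that lag.  Assumes lag_max past samples are available."""
    seg = x[start:end]
    best_t, best_a = lag_min, -1.0
    for t in range(lag_min, lag_max + 1):
        past = x[start - t:end - t]
        denom = float(np.sqrt(np.dot(seg, seg) * np.dot(past, past)))
        a = float(np.dot(seg, past)) / denom if denom > 0.0 else 0.0
        if a > best_a:
            best_t, best_a = t, a
    return best_t, best_a
```

A strongly periodic region returns a correlation near 1; an unvoiced region returns a value near 0, so the subsequent pitch emphasis can be made weaker there.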
[0022]
Finally, using the pitch period and the pitch correlation value obtained for each region, the above-mentioned comb filter 38 is applied to each region to perform the pitch emphasis processing.
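A per-region pitch emphasis of this kind can be sketched as a single-tap comb filter. The exact filter form and the weight `eps` are assumptions; the text specifies only that the comb filter coefficients are derived from the per-region pitch period and pitch correlation.

```python
import numpy as np

def pitch_emphasis(x, start, end, period, corr, eps=0.5):
    """Single-tap comb (pitch emphasis) filter for one region:
    y[n] = (x[n] + g*x[n - period]) / (1 + g) with g = eps*max(corr, 0),
    so strongly periodic regions are emphasised more while the overall
    gain stays roughly unchanged.  `eps` and the normalisation are
    illustrative choices, not from the patent."""
    g = eps * max(corr, 0.0)
    y = x[start:end] + g * x[start - period:end - period]
    return y / (1.0 + g)
```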
With such processing, the coefficients of the pitch emphasis filter change for each small region, so a discontinuous sound may be generated and cause quality degradation. To prevent this problem, it is good to overlap the regions determined by the region boundary determination unit 36 slightly (for example, by about 10 samples, about 1.25 msec), and to add the overlapping parts with a triangular window so that the filter characteristic changes gradually.
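The triangular-window overlap-add described above amounts to a linear cross-fade of the overlapping samples, which can be sketched as follows; the function name and ramp details are illustrative.

```python
import numpy as np

def crossfade(prev_tail, cur_head):
    """Add the overlapping parts of two neighbouring regions with
    complementary triangular (linear) windows so the comb-filter
    characteristic changes gradually.  An overlap of about 10 samples
    (about 1.25 ms at 8 kHz) follows the example in the text."""
    n = len(prev_tail)
    w = np.arange(1, n + 1) / (n + 1)   # rising ramp for the new region
    return (1.0 - w) * prev_tail + w * cur_head
```

Because the two windows sum to one at every sample, a constant signal passes through the overlap unchanged.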
[0023]
In the examples of FIGS. 5 to 7, the MA filter and the AR filter are assumed to be the same as in the conventional method, but it is more preferable to re-analyze the coefficients of the MA filter and the AR filter for each region after the regions are determined as shown in FIG. 7, and to apply the filters so as to emphasize the spectral envelope for each region.
Although the above embodiment describes an example in which the present invention is applied to an audio decoding method for reproducing a high-quality audio signal from an encoded bit sequence in code-excited linear prediction encoding and decoding, the input is not limited to a decoded audio signal; the invention can be applied to any audio signal input frame by frame, whether or not it is a decoded audio signal.
[0024]
Further, a system to which the present invention is applied is constituted by a computer having a CPU and a memory, a terminal device, and a machine-readable recording medium such as a CD-ROM, a magnetic disk device, or a semiconductor memory. The computer reads, from the recording medium, a program for executing the procedure of the audio signal post-processing method, and the program controls the operation of the computer to realize each element of the above-described embodiment.
[0025]
【The invention's effect】
The post filter according to the present invention was applied to the decoded signal of a 4 kbit/s audio coding method, and a subjective quality evaluation was performed. The frame length as the processing unit of the post filter was set to 10 msec, and the delay time to 5 msec. As a result, a remarkable improvement over the conventional post-filter method was observed, particularly in the quality of female voices. More specifically, the sense of noise in portions where vowels change continuously within a short time, as in "in the south", is reduced, and the sound is heard clearly. The reason a significant improvement was observed for female voices while male voices changed little compared with the conventional method is that the present invention is effective when the pitch period is sufficiently short compared with the frame length. A frame length of 10 msec corresponds to 100 Hz, and the pitch frequency of male voices is in many cases 100 Hz or less (a pitch length longer than 10 msec). On the other hand, since the pitch frequency of female voices is about 200 Hz to 400 Hz, two to three pitch periods are included in one frame, as in the examples of FIGS. 5 to 7. That is, unlike male voices, the pitch length of female voices is sufficiently shorter than the frame length, so that a significant improvement in subjective quality was obtained.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration example of a CELP encoding unit.
FIG. 2 is a block diagram showing a configuration example of a CELP decoding unit.
FIG. 3 is a block diagram showing a configuration example of a post filter.
FIG. 4 is a block diagram of a post-filter processing unit according to the present invention.
FIG. 5 is a view for explaining detection of a peak position in a frame.
FIG. 6 is a diagram illustrating waveform extraction and search for a pitch reference position.
FIG. 7 is a view for explaining determination of an area boundary.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS
1 CELP encoding unit
2 Linear prediction analysis unit
3 Linear prediction parameter encoding unit
4 Code transmission unit
5 Driving excitation vector generation unit
6 Adaptive codebook
7 Fixed codebook
8, 9 Multiplication units
10 Weight creation unit
11 Addition unit
12, 22 Synthesis filters
13 Distortion calculation unit
14 Codebook search control unit
21 Linear prediction parameter decoding unit
25 Driving excitation vector generation unit
30 Post-processing unit
31 Buffer
32 MA filter
33 In-frame peak position detection unit
34 Signal waveform cutout unit
35 Pitch reference position search unit
36 Region boundary determination unit
37 Pitch correlation value calculation unit
38 Comb filter
39 AR filter

Claims (6)

1. A post-processing method for an audio signal, comprising:
accumulating an audio signal input for each frame in storage means;
passing the audio signals of the current frame and past frames stored in the storage means through a linear filter using linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal, to obtain a linear-filtered signal;
calculating an average pitch length in the current frame from the linear-filtered signal;
detecting the peak position of the waveform from the linear-filtered signal as a first pitch reference position, and cutting out a waveform of one average pitch length based on the first pitch reference position;
calculating the cross-correlation between the cut-out waveform of one average pitch length and the linear-filtered signal, and sequentially searching for peaks of the cross-correlation as the second and subsequent pitch reference positions; and
determining, based on the first pitch reference position and the second and subsequent pitch reference positions determined in the searching step, regions of the linear-filtered signal each corresponding to an accurate one-pitch waveform, analyzing the pitch periodicity for each region to obtain an accurate pitch period and an accurate pitch correlation for each region, calculating comb filter coefficients based on the accurate pitch period and the pitch correlation for each region, and passing the linear-filtered signal for each region through a comb filter using the comb filter coefficients of that region, to obtain an audio signal output.
2. The post-processing method for an audio signal according to claim 1, further comprising multiplying the audio signal obtained for each region by a window function and superimposing it on the audio signal obtained by multiplying the immediately preceding region by a window function.
3. A post-processing device for an audio signal, comprising:
storage means for accumulating an audio signal input for each frame;
a filter that passes the audio signals of the current frame and past frames stored in the storage means through a linear filter using linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal, to obtain a linear-filtered signal;
an average pitch length calculation unit that calculates an average pitch length in the current frame from the linear-filtered signal;
a peak position detection unit that detects the peak position of the waveform from the linear-filtered signal as a first pitch reference position;
a signal waveform cutout unit that cuts out a waveform of one average pitch length based on the first pitch reference position;
a pitch reference position search unit that calculates the cross-correlation between the cut-out waveform of one average pitch length and the linear-filtered signal, and sequentially searches for peaks of the cross-correlation as the second and subsequent pitch reference positions;
a region boundary determination unit that determines, based on the first pitch reference position and the second and subsequent pitch reference positions determined in the search, regions of the linear-filtered signal each corresponding to an accurate one-pitch waveform;
a pitch correlation value calculation unit that analyzes the pitch periodicity for each region to obtain an accurate pitch period and an accurate pitch correlation for each region, and calculates comb filter coefficients based on the accurate pitch period and the pitch correlation for each region; and
a comb filter that passes the linear-filtered signal for each region through the comb filter coefficients of that region, to obtain an audio signal output.
4. The post-processing device for an audio signal according to claim 3, further comprising:
a multiplication unit that multiplies the audio signal obtained for each region by a window function; and
a superimposition unit that superimposes the result on the audio signal obtained by multiplying the immediately preceding region by a window function.
5. A recording medium recording a program that causes a computer to execute:
a first procedure of accumulating an audio signal input for each frame in storage means;
a second procedure of passing the audio signals of the current frame and past frames stored in the storage means through a linear filter using linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal, to obtain a linear-filtered signal;
a third procedure of calculating an average pitch length in the current frame from the linear-filtered signal;
a fourth procedure of detecting the peak position of the waveform from the linear-filtered signal as a first pitch reference position, and cutting out a waveform of one average pitch length based on the first pitch reference position;
a fifth procedure of calculating the cross-correlation between the cut-out waveform of one average pitch length and the linear-filtered signal, and sequentially searching for peaks of the cross-correlation as the second and subsequent pitch reference positions; and
a sixth procedure of determining, based on the first pitch reference position and the second and subsequent pitch reference positions determined in the searching procedure, regions of the linear-filtered signal each corresponding to an accurate one-pitch waveform, analyzing the pitch periodicity for each region to obtain an accurate pitch period and an accurate pitch correlation for each region, calculating comb filter coefficients based on the accurate pitch period and the pitch correlation for each region, and passing the linear-filtered signal for each region through a comb filter using the comb filter coefficients of that region, to obtain an audio signal output.
6. The recording medium according to claim 5, wherein the program further comprises a procedure of multiplying the audio signal obtained for each region by a window function and superimposing it on the audio signal obtained by multiplying the immediately preceding region by a window function.
JP33138899A 1999-11-22 1999-11-22 Post-processing method and device for audio signal and recording medium recording program Expired - Fee Related JP3559485B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP33138899A JP3559485B2 (en) 1999-11-22 1999-11-22 Post-processing method and device for audio signal and recording medium recording program


Publications (2)

Publication Number Publication Date
JP2001147700A JP2001147700A (en) 2001-05-29
JP3559485B2 true JP3559485B2 (en) 2004-09-02

Family

ID=18243145

Family Applications (1)

Application Number Title Priority Date Filing Date
JP33138899A Expired - Fee Related JP3559485B2 (en) 1999-11-22 1999-11-22 Post-processing method and device for audio signal and recording medium recording program

Country Status (1)

Country Link
JP (1) JP3559485B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782575A (en) * 2011-06-01 2017-05-31 三星电子株式会社 Audio coding method and equipment, audio-frequency decoding method and equipment

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4666485B2 (en) * 2005-08-18 2011-04-06 Kddi株式会社 Segment boundary matching method for audio signals
US20100010810A1 (en) 2006-12-13 2010-01-14 Panasonic Corporation Post filter and filtering method
SG10201604880YA (en) * 2010-07-02 2016-08-30 Dolby Int Ab Selective bass post filter
US11270719B2 (en) * 2017-12-01 2022-03-08 Nippon Telegraph And Telephone Corporation Pitch enhancement apparatus, pitch enhancement method, and program




Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040204

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20040427

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20040521

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090528

Year of fee payment: 5


FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100528

Year of fee payment: 6


FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110528

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120528

Year of fee payment: 8

LAPS Cancellation because of no payment of annual fees