JP3815323B2 - Frequency conversion block length adaptive conversion apparatus and program

Info

Publication number: JP3815323B2
Application number: JP2001400181A
Authority: JP
Inventors: 孝朗山辺
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2001-12-28
Filing date: 2001-12-28
Publication date: 2006-08-30
Anticipated expiration: 2021-12-28
Also published as: JP2003195881A

Description

【０００１】
【発明の属する技術分野】
本発明は周波数変換ブロック長適応変換装置及びプログラムに係り、特に周波数領域信号の時間的変化量を基にアタック音であるか否かを判定して、周波数変換ブロックのブロック長を適応的に切り替える周波数変換ブロック長適応変換装置及びプログラムに関する。
【０００２】
【従来の技術】
代表的なオーディオ圧縮アルゴリズムは適応変換符号化方式が用いられており、既に様々な分野に利用されている。著名な例ではＩＳＯ／ＩＥＣ１１１７２−３のＭＰＥＧ−１オーディオ・レイヤIIIや、ＩＳＯ／ＩＥＣ１３８１８−７のＭＰＥＧ−２ＡＡＣ(Advanced Audio Coding)、ミニディスクの圧縮方式であるＡＴＲＡＣ(Adaptive Transform Acoustic Coding)がこれに相当する。
【０００３】
この適応変換符号化は、時間領域の信号であるＰＣＭ信号を、直交変換（ＭＤＣＴ:Modified Discrete Cosine Transform）を用いて、周波数領域へ展開し、音楽的に重要な周波数帯の重み付けに従い、周波数領域にて情報を削減し符号化を行うものである。
【０００４】
図９はＭＤＣＴ及びＩＭＤＣＴの処理の流れを示す図である。ＭＤＣＴはＤＣＴの一種であり、変換幅の半分ずつ隣り合う変換ブロックと常にオーバーラップがかかりながら周波数領域に展開されるという特徴を持っている。すなわち、１ブロック当たり２Ｎ個のサンプルの時間領域信号が、ＭＤＣＴによりＮ個のＭＤＣＴ係数信号に変換される。また、ＩＭＤＣＴでは、Ｎ個のＭＤＣＴ係数信号を逆変換し、２Ｎ個の時間領域信号にした後に、隣接するブロックでこの時間領域信号を加算して元の時間領域信号を得る。上記のように、変換にはオーバーラップがかかる変換ブロック同士が対称形を成すウィンドウ処理を行うことにより、相互に情報を補完しあっている。
【０００５】
また、図１０に示すように周波数領域への展開は上記の圧縮アルゴリズムの例では２種類の変換幅が用意され、変換ブロック内の信号の特徴に応じ選択可能なようになっている。長い方の変換幅のブロックをロングブロック、短い方の変換幅のブロックをショートブロックと呼び、両者間で遷移する中間ウィンドウのことをスタートウィンドウ、ストップウィンドウと呼ぶ。変換幅の違いによってウィンドウの形状も異なるが、オーバーラップする領域では先に述べたが、ウィンドウは左右対称形を成していなければならない。
【０００６】
図１０（Ａ）、（Ｂ）に示したウィンドウの形状はＭＰＥＧ−２ＡＡＣのものであり、ＭＰＥＧ−１レイヤIIIもこれにほぼ準じている。ＡＴＲＡＣは中間ウィンドウが無くロングウィンドウとショートウィンドウのみである。ロングウィンドウはちょうど図中のスタートウィンドウの右側とストップウィンドウの左側とを合わせた形状である。
【０００７】
通常、周波数変換のブロックが長い程、周波数分解能が向上し、更に補助情報も節約できるため符号化効率が向上するが、オーディオ変換符号化に特有のプリエコーノイズの問題があり、ただ単に周波数変換のブロックを長くすれば良いというものではない。
【０００８】
図１１は人間の聴覚特性の時間マスキング効果を示す図である。同図中、縦軸は音圧（エネルギー）、横軸は時間を示す。これは大きな音の直前と直後の音はマスクされ聞き取り難くなるという現象を示しているが、プリエコーノイズの発生はこのマスキング効果が働く領域外に存在する、ブロック変換幅内の大きな信号によって生じるブロック内に拡散された一様な量子化ノイズが原因である。そこで、この拡散される量子化ノイズを短い変換ブロックを用いることで時間マスキングが働く領域内に閉じ込め、聴感上の改善を図っている。
【０００９】
ブロック長の変換幅判定方法は、時間領域による判定や周波数領域による判定が幾つか紹介されている。時間領域による判定の代表例として１９９２年９月発行の文献「ＭＤｓｙｓｔｅｍ」がある。この文献では、単位ブロック毎に求めたＰＣＭサンプルピーク値による隣接するブロックにおける比率からアタック音が入力したか否かを判断する方法を開示している。
【００１０】
また周波数領域による判定法としては、特開２０００−１３４１０６号公報（発明の名称「オーディオ変換符号化のための周波数領域でのブロックサイズ判定適応方法」）が知られている。この公報記載のブロックサイズ判定適応方法では、単位時間毎に周波数領域に展開された信号からなるサブバンドのエネルギーを、隣り合うフレーム間で比較し、定められたしきい値を超えた場合、つまりエネルギーの変動が大きかったときはショートブロックを、そうでなかったときはロングブロックを適用する手法が示されている。
【００１１】
【発明が解決しようとする課題】
しかしながら、上述の時間領域によるブロック長の変換幅判定方法では、静寂な状態から急激に立ち上がる信号には適しているが、定常的状態からアタック音が入る場合、例えばピアノ曲のように残響音が次第に弱くなりつつも信号が存在している部分に、次の比較的大きな音が重なりあった状態では、時間軸上前方に位置するブロックにおけるピーク値が小さくならず、正しくアタック音を検出できない。
【００１２】
また、周波数領域によるブロック長の変換幅判定方法では、サブバンドのエネルギー量の時間的変化量は、解析ブロックが固定長のため、帯域によって、または解析窓内の信号形態によっては必ずしも正確な周波数成分のエネルギー値を導出しているとは限らない。また、アタック音は基音となる周波数があり、サブバンドに統合したエネルギー量で換算すると、他の近傍の帯域の周波数成分の影響を受けるため、純粋なアタック音によるエネルギー量の推移を把握できてはいない。
【００１３】
本発明は以上の点に鑑みてなされたもので、周波数領域に展開されたスペクトルの時間的な変化量を、誤検出防止のために少なくとも複数の周波数ポイントにて求め、その変化量に基づきアタック音による急激なエネルギー変化があるか否かを判定して周波数変換ブロックのブロック長を適応的に切り替えることにより、精度の高いアタック音検出を行い、プリエコーを抑え、音質の向上を図り得る周波数変換ブロック長適応変換装置及びプログラムを提供することを目的とする。
【００１４】
【課題を解決するための手段】
上記目的を達成するため、本発明の周波数変換ブロック長適応変換装置は、オーディオ変換符号化における周波数変換ブロックのブロック長を適応的に切り替える周波数変換ブロック長適応変換装置であって、入力オーディオ信号を所定のサンプル数で分割し周波数解析する解析ブロック同士が隣接する解析ブロックと半分ずつ重なり合いながら時間的にシフトする複数の解析ブロックのそれぞれについて、解析ブロック間の個々の周波数スペクトルの時間的な変化量を取得する変化量取得手段と、変化量が最大である周波数スペクトルを基準とする、その近傍の周波数成分と倍音成分を含む所定の周波数適合範囲において、変化量取得手段により取得された周波数スペクトルの時間的な変化量と、予め設定したしきい値とを比較する比較手段と、比較手段により変化量がしきい値を超えた個々の周波数スペクトルの本数が、所定の設定値を越えたか否かを検出し、その検出結果によってブロック長を決定するブロック変換幅決定手段とを有する構成としたものである。
【００１５】
この発明では、解析ブロック毎に個々の周波数スペクトルの時間的な変化量がしきい値を越えた周波数スペクトルの本数が所定の設定値を超えたときには、アタック音が入力されたものと判断し、上記の合計が上記の所定の設定値を超えないときにはアタック音ではないと判断して、それぞれに対応したブロック長に決定することができる。
【００１６】
また、上記の変化量取得手段は、入力オーディオ信号を所定のサンプル数で分割し周波数解析する解析ブロック同士が隣接する解析ブロックと半分ずつ重なり合いながら時間的にシフトする複数の解析ブロックのそれぞれについて、解析ブロック間の個々の周波数スペクトルの時間的な変化量を取得するようにしているため、元の解析ブロック境界付近において、シフトした後の複数の解析ブロック間の周波数スペクトルのエネルギー変化量を求めることができ、これにより元の解析ブロック境界付近で発生したアタック音を検出できる。
【００１７】
また、本発明は上記の変化量取得手段を、入力オーディオ信号を所定のサンプル数ずつ解析ブロックとして分割するブロック分割部と、ブロック分割部からの解析ブロック毎に周波数スペクトルを算出する周波数解析部と、周波数スペクトルに基づき周波数解析ポイント毎のエネルギーを求めるスペクトルエネルギー算出部と、スペクトルエネルギー算出部から出力される、時間的に異なる解析ブロックの同一周波数におけるエネルギーの変化量を算出するスペクトルエネルギー算出手段とからなる構成としたものである。この発明では、周波数解析部として、オーディオ変換符号化装置に含まれる周波数解析部を利用できる。
【００１８】
また、本発明は、上記の比較手段を、変化量取得手段により取得された周波数スペクトルの時間的な変化量と、予め設定したしきい値とを比較するときの、変化量が最大である周波数スペクトルを基準とする、その近傍の周波数成分と倍音成分を含む所定の周波数適合範囲を、最大の変化量が小さいほど拡張するように変化させることを特徴とする。
【００１９】
アタック音の周波数成分が隣接する解析ブロック間の境界付近にある場合、二つの解析ブロック間でそのアタック音による成分が分散されるため、エネルギー変化量は必ずしも大きくなるとは限らない。そこで、この発明では、周波数スペクトルの時間的な変化量が最大である周波数スペクトルを基準とする、その近傍の周波数成分と倍音成分を含む所定の周波数適合範囲を、最大の変化量が小さいほど拡張することにより、周波数スペクトルの時間的な変化量が小さいときには、しきい値を超えたスペクトル本数を増加させることでアタック音を検出する。
【００２０】
なお、これは、変化量に対するしきい値を段階的に複数用意し、このときの最大変化量を示す周波数スペクトルのエネルギー量に応じてしきい値を選択し、そのしきい値を超えるスペクトル本数を取得することと等価である。
【００２１】
また、上記の目的を達成するため、本発明のプログラムは、コンピュータを、上記の本発明の周波数変換ブロック長適応変換装置を構成する各手段として機能させることを特徴とする。
【００２２】
【発明の実施の形態】
次に、本発明の一実施の形態について、図面と共に説明する。図１は本発明になる周波数変換ブロック長適応変換装置の一実施の形態の構成図を示す。同図において、入力ＰＣＭ信号はブロック分割部１１に供給され、ここで所定のサンプル数ずつのブロックに分割されて、所定の数のブロックにまとめられる。一つのブロックのサンプル数（ブロックの長さ）は、そのオーディオ符号化方式が持つフレーム長以下とし、かつ、符号化するフレームの範囲と一致させなくてはならない。なぜなら、アタック音検出結果が該当するフレーム、つまりショートウィンドウにすべきフレームを特定できないからである。
【００２３】
アタック音は前後のブロック間でスペクトルのパワー比が急激に変化する。従って、演算量の増加が許す範囲で的確にその変移を捉えるには、ある程度解析ブロックが短い方が、その時間における信号波形の特徴を示さない部分の影響を受けずに済み、より信頼度が増加する。
【００２４】
そこで、本実施の形態ではフレーム長を４分割し、更に図１０（Ａ）のようなスタートウィンドウ及びストップウィンドウと、ロングウィンドウ、ショートウィンドウの４種類のウィンドウを持つ符号化方式について説明する。言うまでもないが、本実施の形態を応用することによって、ロングウィンドウとショートウィンドウの２種類のウィンドウしか持たない符号化方式についても簡単に適応可能である。なお、後者の２種類のウィンドウしか持たない符号化方式のロングウィンドウは台形のような形状をしており、直接ショートウィンドウと接続可能なようになっている。ショートウィンドウの形状は前者の符号化方式と同じである。
【００２５】
図２はフレーム内の解析ブロックを示す。１フレームは４分割されて４つの解析ブロックであるブロック１〜ブロック４からなる。ショートブロックが適用される範囲Iはロングブロックの中間を基準として、ロングブロック変換長IIの５０％強の長さである。従って、スペクトルのパワー変化量を求めるにはブロック１対ブロック２、及びブロック２対ブロック３を比較すればよい。
【００２６】
以上のような動作を行うブロック分割部１１から取り出されたブロックは、図１の周波数解析部１２に供給されて周波数スペクトルが算出される。周波数解析部１２は、例えば一般的な高速フーリエ変換（ＦＦＴ）等の周波数変換法によって周波数スペクトルの算出を行う。この周波数解析部１２は、オーディオ変換符号化装置に含まれる周波数解析部を利用できる。周波数解析部１２で得られた周波数スペクトルは、スペクトルエネルギー算出部１３に供給されて周波数スペクトル毎のエネルギーが求められる。
【００２７】
続いて、前ブロックと現ブロックから個々の周波数スペクトル毎のエネルギー変化量を算出するため、スペクトルエネルギー算出部１３にて一旦求められたエネルギーは、スペクトルエネルギーバッファ１４に送られる一方、直ちにエネルギー変化量の比較を行うため、スペクトルエネルギー変化量算出部１５にも直接に供給される。
【００２８】
スペクトルエネルギー変化量算出部１５は、スペクトルエネルギーバッファ１４からのスペクトルエネルギーと、スペクトルエネルギー算出部１３からのスペクトルエネルギーとを比較することで、前述のように図２に示したブロック１対ブロック２、ブロック２対ブロック３の組み合わせで同一周波数におけるエネルギー変化量を測定する。しきい値比較部１６は、スペクトルエネルギー変化量算出部１５により測定されたエネルギー変化量と、予め定めておいたしきい値とを比較し、エネルギー変化量がしきい値を超えたか否か判定する。この判定は個々の周波数スペクトル毎において行い、その判定結果は条件適合ポイント測定部１７に送られる。
【００２９】
条件適合ポイント測定部１７では誤検出を防止するため、少なくとも複数の周波数スペクトルにてエネルギー変化量がしきい値を超えた事が認められた場合のみ、ショートブロックへの切り替えを許可する。条件適合ポイント測定部１７で得られた測定結果は、ブロック変換幅判定部１８に供給され、ここで最終的なブロック長切り替え判断が行われる。ブロック変換幅判定は前後のフレームの変換長によって制限が生じるため、そのような制御信号を加味しながら判定し、判定結果をブロック長情報として出力する。
【００３０】
例えば、図３（Ａ）に示すような信号波形の場合、定常的な波形であるにも関わらず前後のブロック間で得られる周波数成分が異なってしまう。なお、図３及び後述の図４は、縦軸は振幅、横軸は時間（サンプル）を示す。図３（Ａ）に示す信号は明らかに定常信号であるが、それを拡大した同図（Ｂ）においてブロック２とブロック３との間で周波数成分のエネルギー変化量を測定すると、あたかもアタック音が入力されたかのようなデータを得ることになる。周波数解析幅が固定長でシフトする以上、ブロック長を増減させても信号波形によって必ず同じ現象が起きてしまう。
【００３１】
アタック音の周波数成分はその基音となる周波数とその近傍の周波数、及び倍音成分がほぼ同時に発生する。図４のような残響音が残る中でアタック音が入った信号波形について、信号スペクトルのパワーの時間的な変化量を示したグラフを図５に示す。図５に示すように、基音となる周波数成分Ａとその近傍の周波数成分Ｂ群、及び倍音成分Ｃが新たに発生している。
【００３２】
一方、図３の定常的波形は一部の周波数において前ブロックとのパワー比が著しく増加するが、図５で見られた信号成分Ｂや倍音成分Ｃが存在しない。この両者（アタック音と定常的波形）の差は、基音（パワー比が最も高いところ）に対する近傍の周波数成分と倍音成分の有無にある。
【００３３】
従来の手法は基音のみでアタック音か否かを判定し、基音によって生じる他の周波数成分を考慮していなかった。これに対し、本発明ではアタック音と定常的波形の差は、基音に対する近傍の周波数成分と倍音成分の有無にある点に着目し、基音の検出及び、基音によって生じたと考えられる帯域のエネルギー増加量から総合的に判断し、アタック音検出を行う点に特徴がある。
【００３４】
その方法は、図１と共に説明したように、図１のブロック分割部１１からスペクトルエネルギー変化量算出部１５までの回路部で前ブロックと現ブロックとの周波数スペクトルのエネルギーの増加量を取得し、しきい値比較部１６で所定のしきい値を超えたか否かを検出する。
【００３５】
条件適合ポイント測定部１７では、しきい値比較部１６の所定のしきい値を超えた場合は増加量の最大値を持つ周波数スペクトルに対し、図６にIIIで示すように、基音Ａとその近傍の周波数成分Ｂと倍音成分Ｃを含む適合範囲を定め、しきい値IVを超えたスペクトルの本数をカウントする。ブロック変換幅判定部１８は、この本数が、予め定めておいた値を超えたか否かを検出し、超えた場合は該ブロックにおいてアタック音が入力したものと判断し、ショートブロックへのスイッチに移行する。所定の本数を超えない場合は、アタック音の誤検出であったか、アタック音のエネルギーが小さく、プリエコーが問題になるレベルではないものと判断し、ショートブロックへのスイッチを見送る。
【００３６】
隣接する周波数解析ブロックのスペクトルのエネルギー増加量は、例えば図２のようなブロック間で得ることができる。通常、周波数解析はブロック境界における歪み低減のため、ハミングウィンドウ等の両側の信号成分の重み付けを低下させた左右対称ウィンドウをかけて行われる。従って、ブロック境界においてアタック音が生じた場合、ウィンドウによって信号成分が低下するので、図２のような解析ブロックでは正しく判定できない（検出漏れ）おそれがある。
【００３７】
しかし、解析ブロック幅の１／２の長さを基準にシフトしながら隣接するブロック間のエネルギー変化量を求めれば、上記のような検出漏れを防止することができる。図７は、この１／２シフトを用いた周波数解析ブロック図を示す。エネルギー変化量を求めるには、図７のブロック１対ブロック３、ブロック２対ブロック４のような組み合わせで同一周波数におけるエネルギー変化量を測定することにより、ブロック境界における周波数成分も正しく測定できる。
【００３８】
また、変換される周波数成分は変換法（ブロック長）に従って幾つかのサブバンドに分けられるが（例えば、０〜５ｋＨｚ、５ｋＨｚ〜１０ｋＨｚ、１０ｋＨｚ〜１５ｋＨｚのように）、サブバンド境界の周波数成分を持つ信号（上記の場合、５ｋＨｚ付近、あるいは１０ｋＨｚ付近の信号）は、隣り合うバンド間で信号エネルギーが分散されてしまう。このため、アタック音の周波数成分が周波数解析法の隣接するバンド間の境界付近にある場合、二つのバンド間でそのアタック音による成分が分散されるため、エネルギー変化量は必ずしも大きくなるとは限らない（図８）。
【００３９】
そこで、使用する周波数解析法のバンド境界におけるアタック音の検出精度を高めるため、スペクトルのエネルギー増加量に応じて変化するしきい値を超えたスペクトル本数を設定する。これは前述の条件適合範囲を基音のエネルギー変化量に応じて変化させることを意味する。従って、図６に比べて基音Ａのエネルギー変化量が小さい図８の場合は、適合範囲を図６にIIIで示す適合範囲から図８にVで示すように拡張し、しきい値を超えるポイント数の条件を増加させることで、バンド境界のアタック音の検出を可能にすると共に誤検出を防止する。
【００４０】
バンド境界付近の周波数を基音とするアタック音は、その近傍の周波数帯のエネルギー成分を一時的に大きく増加させる。そこで、基音のエネルギー増加量が比較的小さな場合、近傍のエネルギー成分を観察し、周囲のスペクトルにおいて同じく増加傾向が見られたときはアタック音であったと見なしショートブロックへの変換を促し、そうでなかった場合はアタック音の誤検出、もしくはアタック音のエネルギーが小さく、プリエコーが問題になるレベルではないものと判断し、ショートブロックへのスイッチを見送る。
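[0041]
By setting the number of spectra adaptively according to the energy increase of the fundamental frequency component as described above, attack sounds can be detected accurately even for frequency components at the band boundaries determined by the frequency analysis method used.
[0042]
Furthermore, for the reason given above, the method of detecting the energy increase in the frequency spectrum does not need to prescribe a particular frequency analysis method. General audio coding uses a psychoacoustic model and compresses by weighting the components in the frequency domain. The psychoacoustic model analysis section uses a frequency analysis method such as the FFT, and its analysis width is approximately equal to the width of the determined block length (long block or short block).
[0043]
The frequency analysis method used in the present invention is not specifically prescribed. The frequency analysis data that the audio coding apparatus already has can be used, and as a result a reduction in computation and in memory capacity can be expected.
[0044]
A method whose sequential steps are the operations of the blocks constituting the frequency conversion block length adaptive conversion apparatus of the embodiment of FIG. 1 can also be executed by a computer by means of a computer program. In addition, although the above embodiment was described as detecting attack sounds based on the energy of the frequency spectrum, attack sounds can be detected in the same way based on the amplitude values of the frequency spectrum.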
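[0045]
[Effects of the Invention]
As described above, according to the present invention, when the total number of frequency spectra whose temporal change amount exceeds the threshold exceeds the predetermined set value for each of the plurality of analysis blocks, it is judged that an attack sound has been input; when that number does not exceed the predetermined set value, it is judged that no attack sound is present; and the block length corresponding to each case is selected. This prevents erroneous detection and missed detection of attack sounds and allows attack sounds to be detected with high accuracy, which suppresses pre-echo and improves sound quality compared with the prior art.
[0046]
Further, according to the present invention, the temporal change amount of each individual frequency spectrum between analysis blocks is acquired for each of a plurality of analysis blocks that divide the input audio signal by a predetermined number of samples for frequency analysis and that shift in time while each overlaps its adjacent analysis block by half. Attack sounds occurring near the original analysis block boundaries can thereby be detected, so missed detection of attack sounds is reduced still further.
[0047]
Furthermore, according to the present invention, the predetermined frequency conforming range, which is referenced to the frequency spectrum having the maximum temporal change amount and includes its neighboring frequency components and harmonic components, is expanded as the maximum change amount becomes smaller, so that the number of spectra exceeding the threshold is increased when the temporal change amount of the frequency spectrum is small. This makes it possible to detect attack sounds at band boundaries while preventing erroneous detection of attack sounds.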
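[Brief Description of the Drawings]
[FIG. 1] A block diagram of an embodiment of the frequency conversion block length adaptive conversion apparatus according to the present invention.
[FIG. 2] A diagram showing an example of the analysis blocks in a frame.
[FIG. 3] A waveform diagram showing an example of erroneous attack sound detection for a steady waveform.
[FIG. 4] A waveform diagram showing an example of an attack sound input while reverberation is present.
[FIG. 5] A diagram showing the energy change amount of the frequency components for the waveform of FIG. 4.
[FIG. 6] A diagram showing an example of the conforming range of the spectrum caused by an attack sound.
[FIG. 7] A diagram showing the analysis blocks in a frame when the 1/2 shift is used.
[FIG. 8] A diagram showing the condition conforming range that changes according to the energy change amount of the fundamental.
[FIG. 9] A diagram showing how the MDCT and IMDCT work.
[FIG. 10] A diagram showing the window shapes in AAC block transforms.
[FIG. 11] A diagram showing the temporal masking effect of human hearing.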
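[Explanation of Reference Signs]
11 Block dividing unit
12 Frequency analysis unit
13 Spectral energy calculation unit
14 Spectral energy buffer
15 Spectral energy change amount calculation unit
16 Threshold comparison unit
17 Condition conforming point measurement unit
18 Block conversion width determination unit
[0001]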
BACKGROUND OF THE INVENTION
The present invention relates to a frequency conversion block length adaptive conversion apparatus and program, and in particular to an apparatus and program that determine whether an attack sound is present based on the amount of temporal change of a frequency-domain signal and adaptively switch the block length of the frequency transform block.
[0002]
[Prior art]
Typical audio compression algorithms use adaptive transform coding and are already used in various fields. Well-known examples are MPEG-1 Audio Layer III (ISO/IEC 11172-3), MPEG-2 AAC (Advanced Audio Coding, ISO/IEC 13818-7), and ATRAC (Adaptive Transform Acoustic Coding), the compression scheme of the MiniDisc.
[0003]
Adaptive transform coding expands a PCM signal, which is a time-domain signal, into the frequency domain using an orthogonal transform (MDCT: Modified Discrete Cosine Transform), reduces the information in the frequency domain according to weights assigned to musically important frequency bands, and encodes the result.
[0004]
FIG. 9 shows the flow of MDCT and IMDCT processing. The MDCT is a type of DCT characterized in that each transform block always overlaps its adjacent transform blocks by half the transform width as it is expanded into the frequency domain. That is, a time-domain signal of 2N samples per block is converted by the MDCT into N MDCT coefficients. In the IMDCT, the N MDCT coefficients are inversely transformed into 2N time-domain samples, and these time-domain signals are then added across adjacent blocks to recover the original time-domain signal. As described above, the overlapping transform blocks complement each other's information because the transform applies windows whose shapes are mirror-symmetric between the overlapping blocks.
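The 2N-to-N mapping and the overlap-add recovery described above can be sketched in a few lines. This is a minimal illustration, not the codec's implementation: it assumes a sine window (one window that satisfies the symmetry requirement) and a 2/N scaling in the inverse transform, and the function names are hypothetical.

```python
import math

def sine_window(two_n):
    # Symmetric window with w[n]^2 + w[n+N]^2 = 1 (assumed choice).
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(block):
    # 2N windowed time samples -> N MDCT coefficients.
    two_n = len(block)
    n_half = two_n // 2
    w = sine_window(two_n)
    return [
        sum(w[n] * block[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]

def imdct(coeffs):
    # N coefficients -> 2N windowed, time-aliased samples.
    n_half = len(coeffs)
    two_n = 2 * n_half
    w = sine_window(two_n)
    return [
        w[n] * (2.0 / n_half)
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(two_n)
    ]

def overlap_add_roundtrip(signal, n_half):
    # Transform blocks that overlap by half, invert them, and add the results.
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        y = imdct(mdct(signal[start:start + 2 * n_half]))
        for i, v in enumerate(y):
            out[start + i] += v
    return out

signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
restored = overlap_add_roundtrip(signal, 8)
```

Only samples covered by two overlapping blocks (indices 8 to 23 in this example) are recovered; the first and last half blocks lack a partner and remain aliased.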
[0005]
Further, as shown in FIG. 10, the compression algorithms given above provide two transform widths for the expansion into the frequency domain, and one is selected according to the characteristics of the signal in the transform block. A block with the longer transform width is called a long block, a block with the shorter transform width is called a short block, and the intermediate windows used for transitions between the two are called the start window and the stop window. The window shape differs with the transform width but, as noted above, the windows must be mirror-symmetric in the region where they overlap.
[0006]
The window shapes shown in FIGS. 10(A) and 10(B) are those of MPEG-2 AAC, and MPEG-1 Layer III largely follows them. ATRAC has no intermediate windows, only a long window and a short window. Its long window has exactly the shape obtained by combining the right side of the start window with the left side of the stop window in the figure.
[0007]
In general, the longer the frequency transform block, the higher the frequency resolution and the less side information is needed, so coding efficiency improves. However, audio transform coding has a characteristic pre-echo noise problem, so simply lengthening the frequency transform block is not sufficient.
[0008]
FIG. 11 shows the temporal masking effect of human hearing. The vertical axis represents sound pressure (energy) and the horizontal axis represents time. The figure illustrates the phenomenon that sounds immediately before and after a loud sound are masked and become hard to hear. Pre-echo noise arises because a large signal within the transform block spreads quantization noise uniformly across the block, and part of that noise falls outside the region where the masking effect operates. A short transform block is therefore used to confine the spread quantization noise to the region where temporal masking operates, which improves the perceived quality.
[0009]
Several methods for determining the block length (transform width) have been proposed, some operating in the time domain and some in the frequency domain. A representative time-domain example is the document "MD system" published in September 1992. It discloses a method that judges whether an attack sound has been input from the ratio between the PCM sample peak values obtained for adjacent unit blocks.
[0010]
As a frequency-domain determination method, Japanese Patent Laid-Open No. 2000-134106 (title of invention: "Block size determination adaptation method in the frequency domain for audio transform coding") is known. In the method described in this publication, the energy of subbands composed of signals expanded into the frequency domain per unit time is compared between adjacent frames. A short block is applied when a predetermined threshold is exceeded, that is, when the energy fluctuation is large, and a long block is applied otherwise.
[0011]
[Problems to be solved by the invention]
However, the time-domain determination method described above is suited to signals that rise abruptly from silence but fails when an attack sound enters from a steady state. In piano music, for example, when the next relatively loud note overlaps a portion where the reverberation is gradually decaying but a signal is still present, the peak value of the temporally preceding block does not become small, and the attack sound cannot be detected correctly.
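The limitation described above can be illustrated with a toy peak-ratio detector. This is only a sketch of the general idea of comparing peak values of adjacent blocks; the ratio threshold of 4.0, the block length, and the test signals are assumptions chosen for illustration and do not come from the cited document.

```python
import math

def block_peaks(samples, block_len):
    # Peak absolute PCM value of each consecutive block.
    return [
        max(abs(s) for s in samples[i:i + block_len])
        for i in range(0, len(samples) - block_len + 1, block_len)
    ]

def peak_ratio_attacks(samples, block_len, ratio_threshold=4.0, floor=1e-6):
    # True where the peak rises by more than ratio_threshold between blocks.
    peaks = block_peaks(samples, block_len)
    return [
        cur / max(prev, floor) > ratio_threshold
        for prev, cur in zip(peaks, peaks[1:])
    ]

# Case 1: a note starting from silence.
from_silence = [0.0] * 64 + [math.sin(0.5 * i) for i in range(64)]

# Case 2: a new note on top of a still-audible decaying tail.
over_reverb = (
    [0.5 * math.sin(0.5 * i) for i in range(64)]
    + [0.4 * math.sin(0.5 * (i + 64)) + math.sin(0.9 * i) for i in range(64)]
)

silence_result = peak_ratio_attacks(from_silence, 64)
reverb_result = peak_ratio_attacks(over_reverb, 64)
```

In the second case the preceding block already has a peak of about 0.5, so the ratio stays below the threshold and the onset is missed even though a new note clearly begins.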
[0012]
In the frequency-domain determination method, because the analysis block has a fixed length, the temporal change of the subband energy does not necessarily reflect the true energy of the frequency components, depending on the band or on the form of the signal within the analysis window. In addition, an attack sound has a fundamental frequency, and when its energy is aggregated into a subband it is affected by the frequency components of other nearby bands, so the evolution of the energy due to the attack sound alone cannot be captured.
[0013]
The present invention has been made in view of the above points. Its object is to provide a frequency conversion block length adaptive conversion apparatus and program that obtain the temporal change amount of the spectrum expanded into the frequency domain at a plurality of frequency points, in order to prevent erroneous detection, determine from that change amount whether an abrupt energy change due to an attack sound is present, and adaptively switch the block length of the frequency transform block, thereby detecting attack sounds with high accuracy, suppressing pre-echo, and improving sound quality.
[0014]
[Means for Solving the Problems]
In order to achieve the above object, the frequency conversion block length adaptive conversion apparatus of the present invention adaptively switches the block length of a frequency transform block in audio transform coding and comprises: change amount acquisition means that, for each of a plurality of analysis blocks which divide the input audio signal by a predetermined number of samples for frequency analysis and which shift in time while each overlaps its adjacent analysis block by half, acquires the temporal change amount of each individual frequency spectrum between analysis blocks; comparison means that, within a predetermined frequency conforming range which is referenced to the frequency spectrum having the maximum change amount and includes its neighboring frequency components and harmonic components, compares the temporal change amount of the frequency spectrum acquired by the change amount acquisition means with a preset threshold; and block transform width determination means that detects whether the number of individual frequency spectra whose change amount exceeded the threshold in the comparison means exceeds a predetermined set value, and determines the block length according to the detection result.
[0015]
In the present invention, when the number of frequency spectra whose temporal change amount exceeds the threshold exceeds the predetermined set value for an analysis block, it is judged that an attack sound has been input; when that number does not exceed the predetermined set value, it is judged that no attack sound is present. The block length corresponding to each case can then be selected.
[0016]
In addition, because the change amount acquisition means acquires the temporal change amount of each individual frequency spectrum between analysis blocks that shift in time while each overlaps its adjacent analysis block by half, the energy change amount of the frequency spectrum between the shifted analysis blocks can be obtained near the boundaries of the original analysis blocks. An attack sound occurring near an original analysis block boundary can therefore be detected.
[0017]
Further, in the present invention the change amount acquisition means comprises: a block dividing unit that divides the input audio signal into analysis blocks of a predetermined number of samples; a frequency analysis unit that calculates a frequency spectrum for each analysis block from the block dividing unit; a spectral energy calculation unit that obtains the energy at each frequency analysis point from the frequency spectrum; and spectral energy calculation means that calculates the change amount of the energy at the same frequency between temporally different analysis blocks output from the spectral energy calculation unit. In the present invention, the frequency analysis unit included in the audio transform coding apparatus can be used as the frequency analysis unit.
[0018]
Further, the present invention is characterized in that the comparison means changes the predetermined frequency conforming range, which is referenced to the frequency spectrum having the maximum change amount, includes its neighboring frequency components and harmonic components, and is used when comparing the temporal change amount acquired by the change amount acquisition means with the preset threshold, so that the range expands as the maximum change amount becomes smaller.
[0019]
When the frequency component of an attack sound lies near the boundary between adjacent analysis blocks, the component due to the attack sound is distributed between the two analysis blocks, so the energy change amount is not necessarily large. Therefore, in the present invention, the predetermined frequency conforming range, which is referenced to the frequency spectrum having the maximum temporal change amount and includes its neighboring frequency components and harmonic components, is expanded as the maximum change amount becomes smaller. When the temporal change amount of the frequency spectrum is small, the attack sound is thus detected by increasing the number of spectra exceeding the threshold.
[0020]
Note that this is equivalent to preparing a plurality of stepwise thresholds for the change amount, selecting a threshold according to the energy of the frequency spectrum showing the maximum change amount at that time, and obtaining the number of spectra exceeding the selected threshold.
[0021]
In order to achieve the above object, a program according to the present invention causes a computer to function as each means constituting the frequency conversion block length adaptive conversion apparatus according to the present invention.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 shows a block diagram of an embodiment of the frequency conversion block length adaptive conversion apparatus according to the present invention. In the figure, the input PCM signal is supplied to a block dividing unit 11, where it is divided into blocks of a predetermined number of samples each, which are grouped into a predetermined number of blocks. The number of samples in one block (the block length) must be no greater than the frame length of the audio coding scheme, and the blocks must coincide with the range of the frame to be encoded. Otherwise, the frame to which the attack sound detection result applies, that is, the frame that should use the short window, could not be identified.
[0023]
With an attack sound, the spectral power ratio changes abruptly between the preceding and following blocks. Therefore, to capture this transition accurately within the range that the increase in computation allows, a reasonably short analysis block is preferable: it is less affected by portions that do not represent the signal waveform at that time, and reliability increases.
[0024]
Therefore, in this embodiment the frame length is divided into four, and an encoding scheme with four types of windows is described, namely the start window and stop window shown in FIG. 10(A), the long window, and the short window. Needless to say, this embodiment can easily be adapted to an encoding scheme that has only two types of windows, a long window and a short window. The long window of such a two-window scheme has a roughly trapezoidal shape so that it can be connected directly to a short window. The shape of the short window is the same as in the four-window scheme.
[0025]
FIG. 2 shows the analysis blocks in a frame. One frame is divided into four analysis blocks, block 1 to block 4. The range I to which short blocks are applied is centered on the middle of the long block and is slightly more than 50% of the long block transform length II. Therefore, to obtain the change amount of the spectral power it suffices to compare block 1 with block 2 and block 2 with block 3.
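The frame layout above can be written down as a small helper. This is a sketch under assumptions: the frame length of 1024 samples is only an example value, and the function and constant names are hypothetical.

```python
def split_frame(frame, num_blocks=4):
    # Divide one frame into equally sized analysis blocks.
    if len(frame) % num_blocks != 0:
        raise ValueError("frame length must be a multiple of num_blocks")
    size = len(frame) // num_blocks
    return [frame[i * size:(i + 1) * size] for i in range(num_blocks)]

# Zero-based index pairs for "block 1 vs. block 2" and "block 2 vs. block 3".
COMPARISON_PAIRS = [(0, 1), (1, 2)]

frame = list(range(1024))      # placeholder samples, assumed frame length
blocks = split_frame(frame)
```

Each pair in `COMPARISON_PAIRS` names the (previous, current) blocks whose spectra are compared in the later stages.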
[0026]
The blocks output from the block dividing unit 11 operating as described above are supplied to the frequency analysis unit 12 in FIG. 1, which calculates the frequency spectrum, for example by a common frequency transform such as the fast Fourier transform (FFT). The frequency analysis unit included in the audio transform coding apparatus can be used as the frequency analysis unit 12. The frequency spectrum obtained by the frequency analysis unit 12 is supplied to the spectral energy calculation unit 13, which obtains the energy of each frequency spectrum.
[0027]
Subsequently, to calculate the energy change amount of each individual frequency spectrum from the previous block and the current block, the energy obtained by the spectral energy calculation unit 13 is sent to the spectral energy buffer 14 and, so that the energy change amounts can be compared immediately, is also supplied directly to the spectral energy change amount calculation unit 15.
[0028]
The spectral energy change amount calculation unit 15 compares the spectral energy from the spectral energy buffer 14 with the spectral energy from the spectral energy calculation unit 13 and thereby measures the energy change amount at the same frequency for the combinations described above: block 1 versus block 2 and block 2 versus block 3 in FIG. 2. The threshold comparison unit 16 compares the energy change amount measured by the spectral energy change amount calculation unit 15 with a predetermined threshold and determines whether the energy change amount exceeds the threshold. This determination is made for each individual frequency spectrum, and the result is sent to the condition conforming point measurement unit 17.
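The chain from frequency analysis to the per-spectrum threshold test can be sketched as follows. This is a rough illustration under assumptions: a plain DFT stands in for the FFT, the change amount is taken as an energy ratio (the text speaks of a power ratio but gives no formula), and the threshold of 8.0, the floor value, and all names are hypothetical.

```python
import cmath
import math

def bin_energies(block):
    # Energy |X[k]|^2 of each DFT bin from 0 up to the Nyquist bin.
    n = len(block)
    return [
        abs(sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]

def energy_change(prev_energy, cur_energy, floor=1e-9):
    # Ratio current/previous per bin; the floor avoids division by zero.
    return [c / max(p, floor) for p, c in zip(prev_energy, cur_energy)]

def exceeds_threshold(changes, threshold):
    # One flag per individual frequency spectrum.
    return [c > threshold for c in changes]

n = 64
prev_block = [0.1 * math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
cur_block = [prev_block[t] + math.sin(2 * math.pi * 10 * t / n)
             for t in range(n)]

buffered = bin_energies(prev_block)   # plays the role of buffer 14
changes = energy_change(buffered, bin_energies(cur_block))
flags = exceeds_threshold(changes, threshold=8.0)
```

Here only bin 10, where a new component appears, is flagged; bin 4 keeps a ratio close to 1 because its energy is unchanged between the two blocks.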
[0029]
To prevent erroneous detection, the condition conforming point measurement unit 17 permits switching to the short block only when the energy change amount is found to exceed the threshold in a plurality of frequency spectra. The measurement result obtained by the condition conforming point measurement unit 17 is supplied to the block conversion width determination unit 18, which makes the final block length switching decision. Because the block conversion width decision is constrained by the transform lengths of the preceding and following frames, the decision takes such control signals into account, and the result is output as block length information.
[0030]
For example, for a signal waveform such as that in FIG. 3(A), the frequency components obtained in the preceding and following blocks differ even though the waveform is steady. In FIG. 3 and in FIG. 4 described later, the vertical axis represents amplitude and the horizontal axis represents time (samples). The signal in FIG. 3(A) is clearly a steady signal, but when the energy change amount of the frequency components is measured between block 2 and block 3 in the enlarged view of FIG. 3(B), data are obtained as if an attack sound had been input. As long as the frequency analysis window shifts with a fixed length, the same phenomenon inevitably occurs for some signal waveforms, whether the block length is increased or decreased.
[0031]
In an attack sound, the fundamental frequency, frequencies in its vicinity, and harmonic components arise almost simultaneously. FIG. 5 is a graph of the temporal change amount of the signal spectrum power for a waveform, such as that of FIG. 4, in which an attack sound enters while reverberation remains. As shown in FIG. 5, a frequency component A serving as the fundamental, a group of frequency components B in its vicinity, and a harmonic component C newly appear.
[0032]
On the other hand, for the steady waveform of FIG. 3, the power ratio relative to the previous block increases markedly at some frequencies, but the neighboring components B and the harmonic component C seen in FIG. 5 are absent. The difference between the two (attack sound and steady waveform) lies in the presence or absence of neighboring frequency components and harmonic components relative to the fundamental (the point where the power ratio is highest).
[0033]
The conventional method judges whether an attack sound is present from the fundamental alone and does not consider the other frequency components produced by the fundamental. In contrast, the present invention focuses on the fact that the difference between an attack sound and a steady waveform lies in the presence or absence of neighboring frequency components and harmonic components relative to the fundamental. It is characterized in that it detects attack sounds by a comprehensive judgment based on detection of the fundamental and on the energy increase in the bands considered to have been produced by the fundamental.
[0034]
In this method, as described with reference to FIG. 1, the circuit section from the block dividing unit 11 to the spectral energy change amount calculation unit 15 in FIG. 1 acquires the increase in frequency spectrum energy between the previous block and the current block, and the threshold comparison unit 16 detects whether it exceeds a predetermined threshold.
[0035]
When the predetermined threshold of the threshold comparison unit 16 is exceeded, the condition conforming point measurement unit 17 defines, for the frequency spectrum having the maximum increase, a conforming range that includes the fundamental A, its neighboring frequency components B, and the harmonic component C, as indicated by III in FIG. 6, and counts the number of spectra exceeding the threshold IV. The block conversion width determination unit 18 detects whether this count exceeds a predetermined value. If it does, the unit judges that an attack sound has been input in that block and switches to the short block. If the count does not exceed the predetermined number, the unit judges either that the attack sound was detected erroneously or that the energy of the attack sound is small and pre-echo would not reach a problematic level, and it does not switch to the short block.
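The counting rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the width of the neighborhood around the fundamental, the number of harmonics included, the threshold, and the required count are all assumed values, and the function names are hypothetical.

```python
def conforming_range(peak_bin, num_bins, neighbor_width=2, max_harmonic=3):
    # Bins near the fundamental (A and B) plus its harmonic bins (C).
    bins = set()
    for b in range(peak_bin - neighbor_width, peak_bin + neighbor_width + 1):
        if 0 <= b < num_bins:
            bins.add(b)
    for h in range(2, max_harmonic + 1):
        if peak_bin * h < num_bins:
            bins.add(peak_bin * h)
    return sorted(bins)

def decide_block(changes, threshold, min_count):
    # Return "short" when enough spectra in the range exceed the threshold.
    peak_bin = max(range(len(changes)), key=lambda b: changes[b])
    if changes[peak_bin] <= threshold:
        return "long"
    in_range = conforming_range(peak_bin, len(changes))
    count = sum(1 for b in in_range if changes[b] > threshold)
    return "short" if count > min_count else "long"

# Attack-like pattern: fundamental, neighbors and harmonics all rise.
attack = [1.0] * 40
for b, v in {8: 50.0, 7: 12.0, 9: 15.0, 16: 10.0, 24: 9.0}.items():
    attack[b] = v

# Steady-waveform pattern: a single isolated bin rises.
steady = [1.0] * 40
steady[8] = 50.0

attack_decision = decide_block(attack, threshold=8.0, min_count=2)
steady_decision = decide_block(steady, threshold=8.0, min_count=2)
```

The isolated rise in `steady` has the same peak value as the attack but fails the count, which is how the rule separates the two cases described for FIG. 3 and FIG. 5.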
[0036]
The energy increase of the spectrum between adjacent frequency analysis blocks can be obtained, for example, between blocks arranged as in FIG. 2. Frequency analysis is normally performed after applying a mirror-symmetric window, such as a Hamming window, that reduces the weight of the signal components at both ends in order to reduce distortion at block boundaries. Consequently, when an attack sound occurs at a block boundary, its signal components are attenuated by the window, and analysis blocks arranged as in FIG. 2 may fail to judge it correctly (a missed detection).
[0037]
However, such missed detection can be prevented by obtaining the amount of energy change between adjacent blocks while shifting the analysis blocks by ½ of the analysis block width. FIG. 7 shows the frequency analysis blocks using this ½ shift. By measuring the amount of energy change at the same frequency for the combinations of block 1 with block 3 and block 2 with block 4 in FIG. 7, frequency components at a block boundary can be measured correctly.
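The ½ shift can be illustrated with the following Python sketch. It assumes a Hamming window and a direct DFT purely for illustration (the description does not fix the analysis method), and the function names are hypothetical. With a hop of half the block width, block i and block i+2 are adjacent and non-overlapping, which corresponds to the pairings of block 1 with block 3 and block 2 with block 4.

```python
import cmath
import math


def half_shift_blocks(samples, block_len):
    """Split samples into blocks of block_len that advance by block_len // 2."""
    hop = block_len // 2
    return [samples[s:s + block_len]
            for s in range(0, len(samples) - block_len + 1, hop)]


def bin_energies(block):
    """Energy per frequency bin of a Hamming-windowed block (direct DFT)."""
    n = len(block)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [w * s for w, s in zip(window, block)]
    energies = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        energies.append(abs(acc) ** 2)
    return energies


def energy_changes(samples, block_len):
    """Per-bin energy change between block i and block i + 2 for every i."""
    spectra = [bin_energies(b) for b in half_shift_blocks(samples, block_len)]
    return [[c - p for p, c in zip(spectra[i], spectra[i + 2])]
            for i in range(len(spectra) - 2)]
```

Because every sample position falls near the center of one of the shifted blocks, an onset located at the edge of one block is seen at full window weight in its half-shifted neighbor.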
[0038]
Depending on the conversion method (block length), the frequency components to be converted are divided into several subbands (for example, 0 to 5 kHz, 5 kHz to 10 kHz, and 10 kHz to 15 kHz). A signal near a band boundary (in the above case, near 5 kHz or 10 kHz) has its energy dispersed between the adjacent bands. For this reason, if the frequency component of the attack sound lies near the boundary between adjacent bands of the frequency analysis method, the component due to the attack sound is dispersed between the two bands, and the amount of energy change does not necessarily become large (FIG. 8).
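The dispersion of energy at a band boundary can be reproduced with a small Python sketch. Here DFT bins stand in for the subbands of the description, which is an assumption made only for illustration, and the helper names are hypothetical. A tone centered on a bin concentrates its energy in that bin, whereas a tone halfway between two bins splits its energy between them, so the largest single-bin energy is smaller.

```python
import cmath
import math


def bin_energies(block):
    """Energy per frequency bin of a block (direct DFT, no window)."""
    n = len(block)
    return [abs(sum(block[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))) ** 2
            for k in range(n // 2 + 1)]


def tone(cycles_per_block, n=32):
    """Sine wave with the given number of cycles in a block of n samples."""
    return [math.sin(2 * math.pi * cycles_per_block * i / n) for i in range(n)]


on_bin = bin_energies(tone(4.0))    # centered on bin 4
between = bin_energies(tone(4.5))   # halfway between bins 4 and 5

# The two strongest bins of the in-between tone are the two adjacent bins.
strongest_two = sorted(sorted(range(len(between)), key=lambda k: between[k])[-2:])
```

A fixed threshold applied to a single bin could therefore miss an attack whose frequency sits at the boundary, which is the situation the following paragraphs address.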
[0039]
Therefore, in order to improve the detection accuracy for attack sounds at the band boundaries of the frequency analysis method in use, the number of spectra exceeding the threshold is set adaptively according to the amount of increase in spectral energy. That is, the condition conforming range described above is changed according to the energy change amount of the fundamental tone. Thus, in the case of FIG. 8, where the energy change amount of the fundamental tone A is smaller than in FIG. 6, the conforming range is expanded from the range indicated by III in FIG. 6 to the range indicated by V in FIG. 8. Increasing the number of conforming points in this way makes it possible to detect an attack sound at a band boundary and to prevent erroneous detection.
[0040]
An attack sound based on a frequency near a band boundary temporarily and greatly increases the energy components of the neighboring frequency bands. Therefore, when the energy increase of the fundamental tone is relatively small, the neighboring energy components are also observed. If an increasing tendency is also seen in the surrounding spectra, the sound is regarded as an attack sound and the transition to the short block is made. If not, it is determined that the attack sound was erroneously detected, or that its energy is low and the pre-echo is not at a problematic level, and the switch to the short block is skipped.
[0041]
By adaptively setting the number of spectra according to the amount of energy increase of the frequency component serving as the fundamental tone, as described above, an attack sound can be detected accurately even for frequency components at the band boundaries determined by the frequency analysis method in use.
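One possible way to widen the conforming range as the increase of the fundamental tone becomes smaller is sketched below in Python. The linear mapping, the parameter names, and the default values are assumptions chosen for illustration; the description states only that the range expands when the energy change of the fundamental tone is smaller. Harmonic bins are omitted here for brevity.

```python
import math


def adaptive_neighbor_width(peak_increase, reference_increase=10.0,
                            base_width=1, max_width=4):
    """Neighbor width (in bins) of the conforming range around the fundamental.

    Returns base_width when the peak increase reaches reference_increase and
    grows toward max_width as the peak increase gets smaller.
    """
    if peak_increase >= reference_increase:
        return base_width
    if peak_increase <= 0.0:
        return max_width
    shortfall = 1.0 - peak_increase / reference_increase  # 0 (strong) .. 1 (weak)
    return base_width + math.ceil(shortfall * (max_width - base_width))


def count_conforming(increase, peak_bin, width, count_threshold):
    """Count bins within width of peak_bin whose increase exceeds the threshold."""
    lo = max(0, peak_bin - width)
    hi = min(len(increase) - 1, peak_bin + width)
    return sum(1 for b in range(lo, hi + 1) if increase[b] > count_threshold)
```

With a weak fundamental whose energy is spread over several bins, the wider range lets more of those bins contribute to the count, so the attack can still reach the predetermined number.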
[0042]
Further, for the above reason, the method of detecting the amount of energy increase in the frequency spectrum does not need to prescribe a particular frequency analysis method. General audio coding uses a psychoacoustic model and performs compression by weighting components in the frequency domain. The psychoacoustic model analysis unit uses a frequency analysis method such as the FFT, and its analysis width is substantially equal to the determined block length (long block or short block).
[0043]
The frequency analysis method used in the present invention is not particularly specified, and the frequency analysis data that the audio encoding device already holds can be used. As a result, the amount of calculation and the memory capacity can be expected to be reduced.
[0044]
The operation of each block constituting the frequency conversion block length adaptive conversion apparatus of the embodiment of FIG. 1 can also be executed by a computer program. In the above embodiment, the attack sound is detected based on the energy of the frequency spectrum, but it can be detected in the same way based on the amplitude value of the frequency spectrum.
[0045]
[Effects of the invention]
As described above, according to the present invention, for each of the plurality of analysis blocks, it is determined that an attack sound has been input when the total number of frequency spectra whose temporal change amount exceeds the threshold exceeds a predetermined set value, and that the sound is not an attack sound when the number does not exceed the set value, and the block length is determined accordingly. This prevents erroneous detection and missed detection of attack sounds and allows them to be detected with high accuracy, so that pre-echo is suppressed and sound quality is improved compared with the prior art.
[0046]
Further, according to the present invention, the temporal change amount of each individual frequency spectrum is acquired between analysis blocks for each of a plurality of analysis blocks that divide the input audio signal by a predetermined number of samples for frequency analysis and that shift in time while overlapping adjacent analysis blocks by half. An attack sound occurring near an original analysis block boundary can therefore be detected, which further suppresses missed detection of attack sounds.
[0047]
Furthermore, according to the present invention, the predetermined frequency adaptation range, which is referenced to the frequency spectrum having the maximum temporal change amount and includes its nearby frequency components and harmonic components, is extended as the maximum change amount becomes smaller. The number of spectra exceeding the threshold thus increases when the temporal change amount of the frequency spectrum is small, so attack sounds at band boundaries can be detected and erroneous detection of attack sounds can be prevented.
[Brief description of the drawings]
FIG. 1 is a block diagram of an embodiment of a frequency conversion block length adaptive conversion apparatus according to the present invention.
FIG. 2 is a diagram illustrating an example of an analysis block in a frame.
FIG. 3 is a waveform diagram showing an example of erroneous detection of an attack sound in a steady waveform.
FIG. 4 is a waveform diagram showing an example of an attack sound input in the presence of a reverberant sound.
FIG. 5 is a diagram showing an energy change amount of a frequency component in the waveform diagram of FIG.
FIG. 6 is a diagram illustrating an example of a spectrum matching range caused by an attack sound.
FIG. 7 is a diagram illustrating an analysis block in a frame using a ½ shift.
FIG. 8 is a diagram illustrating a condition conforming range that changes in accordance with an energy change amount of a fundamental tone.
FIG. 9 is a diagram showing a mechanism of MDCT and IMDCT conversion.
FIG. 10 is a diagram illustrating a window shape in AAC block conversion.
FIG. 11 is a diagram showing a time masking effect in auditory characteristics.
[Explanation of symbols]
11 Block division unit
12 Frequency analysis unit
13 Spectral energy calculation unit
14 Spectral energy buffer
15 Spectral energy change calculation unit
16 Threshold comparison unit
17 Condition conforming point measurement unit
18 Block conversion width determination unit

Claims

A frequency conversion block length adaptive conversion apparatus that adaptively switches the block length of a frequency conversion block in audio transform coding, comprising:
change amount acquisition means for acquiring, for each of a plurality of analysis blocks that divide an input audio signal by a predetermined number of samples for frequency analysis and that shift in time while overlapping adjacent analysis blocks by half, the temporal change amount of each individual frequency spectrum between analysis blocks;
comparison means for comparing, within a predetermined frequency adaptation range that is referenced to the frequency spectrum having the maximum change amount and includes its nearby frequency components and harmonic components, the temporal change amount of the frequency spectrum acquired by the change amount acquisition means with a preset threshold; and
block conversion width determination means for detecting whether the number of individual frequency spectra whose change amount is found by the comparison means to exceed the threshold exceeds a predetermined set value, and determining the block length according to the detection result.

The frequency conversion block length adaptive conversion apparatus according to claim 1, wherein the change amount acquisition means comprises: a block division unit that divides the input audio signal into analysis blocks of a predetermined number of samples each; a frequency analysis unit that calculates a frequency spectrum for each analysis block from the block division unit; a spectral energy calculation unit that obtains the energy of each individual frequency spectrum from the frequency spectrum; and spectral energy calculation means for calculating the change amount of the energy, at the same frequency, between temporally different analysis blocks output from the spectral energy calculation unit.

The frequency conversion block length adaptive conversion apparatus according to claim 1 or 2, wherein the comparison means changes the predetermined frequency adaptation range, which is referenced to the frequency spectrum having the maximum change amount, includes its nearby frequency components and harmonic components, and is used when comparing the temporal change amount of the frequency spectrum acquired by the change amount acquisition means with the preset threshold, so that the range is extended as the maximum change amount becomes smaller.

A program that causes a computer to function as:
change amount acquisition means for acquiring, for each of a plurality of analysis blocks that divide an input audio signal by a predetermined number of samples for frequency analysis and that shift in time while overlapping adjacent analysis blocks by half, the temporal change amount of each individual frequency spectrum between analysis blocks;
comparison means for comparing, within a predetermined frequency adaptation range that is referenced to the frequency spectrum having the maximum change amount and includes its nearby frequency components and harmonic components, the temporal change amount of the frequency spectrum acquired by the change amount acquisition means with a preset threshold; and
block conversion width determination means for detecting whether the number of individual frequency spectra whose change amount is found by the comparison means to exceed the threshold exceeds a predetermined set value, and determining the block length according to the detection result.