JP2005010337A

JP2005010337A - Audio signal compression method and apparatus

Info

Publication number: JP2005010337A
Application number: JP2003173046A
Authority: JP
Inventors: Takushi Iwasaki; 拓史岩崎
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-06-18
Filing date: 2003-06-18
Publication date: 2005-01-13

Abstract

PROBLEM TO BE SOLVED: To speed up psychoacoustic analysis processing while conforming to MODEL1 of the psychoacoustic model of ISO/IEC 11172-3.

SOLUTION: An effective band estimation part 11 estimates the band that is effectively used for psychoacoustic analysis processing (a valid band variable) on the basis of the number of subbands in use, which is determined according to the bit rate and sampling frequency of an audio signal. A critical subband number determination part 12 determines, according to the estimated valid band variable, the critical number of subbands sufficient for use (a critical band variable) among the nonlinear subbands used for noise selection. After an effective band correction part 13 corrects the valid band variable, the psychoacoustic analysis processing is carried out using the critical band variable and the valid band variable.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声信号圧縮方法及び音声信号圧縮装置に関し、特にサブバンド符号化及び人間の聴覚の特性を用いた心理聴覚分析処理により音声信号を圧縮する音声信号圧縮方法及び音声信号圧縮装置に関する。
【０００２】
【従来の技術】
デジタルの音声信号（以下オーディオ信号と記載する）の圧縮符号化方式の１つであるＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＣｏｄｉｎｇＥｘｐｅｒｔｓＧｒｏｕｐ）１Ａｕｄｉｏは、国際標準方式のＩＳＯ／ＩＥＣ１１１７２−３で規定されており、特に、ＭＰＥＧ１Ａｕｄｉｏｌａｙｅｒ２（以下ＭＰ２と記載する）は、デジタルビデオカメラなどの音声記録の際に用いられている。
【０００３】
図７は、ＩＳＯ／ＩＥＣ１１１７２−３で規定されるＭＰＥＧ１Ａｕｄｉｏ（ｌａｙｅｒ１及びｌａｙｅｒ２）のオーディオ信号圧縮回路の構成を示すブロック図である。
【０００４】
図のように、オーディオ信号圧縮回路１００は、デジタルのオーディオ信号を入力するオーディオ信号入力部１０１と、オーディオ信号をサブバンドに分割するフィルタバンク適用部１０２と、スケールファクタを計算するスケールファクタ計算部１０３と、量子化を行う量子化器１０４と、人間の聴覚の特性に基づいたマスキング（心理聴覚分析処理）を行ってデータ量を削減する心理聴覚分析部１０５ｂと、ビット割り当てを計算するビット割り当て計算部１０６と、ＭＰＥＧ１Ａｕｄｉｏのビットストリームを生成するビットストリーム生成部１０７と、から構成されている。
【０００５】
オーディオ信号入力部１０１に、外部Ａ／Ｄコンバータなどによりデジタル化されたオーディオ信号がフレーム単位で入力されると、入力されたオーディオ信号はフィルタバンク適用部１０２及び心理聴覚分析部１０５ｂに渡される。フィルタバンク適用部１０２では、入力されたオーディオ信号が３２個のサブバンドに線形に分割される。分割されたサブバンドは、スケールファクタ計算部１０３に入力される。スケールファクタ計算部１０３では、各サブバンドにおいて最大絶対値となるサンプルを検出し、その値を対数に変換して量子化したスケールファクタを計算する。このスケールファクタを用いて、最大振幅が１．０になるように正規化し、各サブバンドのダイナミックレンジをそろえる。
【０００６】
一方、心理聴覚分析部１０５ｂでは、入力されたオーディオ信号からサブバンドごとに、人間の聴覚での音声の感知限界閾値（以下マスキング閾値と呼ぶ）を計算する。ビット割り当て計算部１０６では、計算されたマスキング閾値を用いてビットの割り当てを計算する。
【０００７】
量子化器１０４では、スケールファクタ計算部１０３で計算されたスケールファクタと、ビット割り当て計算部１０６の出力を用いて、サブバンドが量子化される。最終的には、ビットストリーム生成部１０７で、ＭＰＥＧ１Ａｕｄｉｏのビットストリームを生成する。
【０００８】
図８は、心理聴覚分析部での処理の一例を示すフローチャートである。
この手順は、ＩＳＯ／ＩＥＣ１１１７２−３で規定されている心理聴覚モデルのＭＯＤＥＬ１によるものであり、８つの手順からなる。
【０００９】
ステップＳ１００：フーリエ変換
オーディオ信号が入力されると心理聴覚分析部１０５ｂでは、まず、高速フーリエ変換（ＦＦＴ）などを行い、オーディオ信号を周波数成分に変換する。なお、ＭＰ２では１フレームあたり１１５２オーディオサンプルを扱うが、ＦＦＴの性質により２の階乗個の１０２４オーディオサンプルに対して周波数変換を行う。
【００１０】
また、ここでは、隣接２周波数成分の２乗和をとることで、パワースペクトルを求めている。隣接２周波数成分をまとめることで、帯域幅は１０２４から５１２に変わる。
【００１１】
ステップＳ１０１：音圧レベル計算
ここでは、ステップＳ１００の処理で算出した、５１２帯域のパワースペクトルを３２個の線形なサブバンドに分割する。分割したサブバンドはそれぞれ１６個のパワースペクトルを含んでいる。このパワースペクトルの総和をとることで、サブバンドの音圧レベルを計算する。従ってここでは、１６回の和算を３２回繰り返す。
【００１２】
ステップＳ１０２：音選択
ここでは、５１２帯域のパワースペクトルの中から音としてマークする成分を探す。探索法は、まず下位帯域から上位帯域へ全てのパワースペクトルを探索し、変分点となるところ、すなわちスペクトルが上に凸になる場所を、音成分の可能性があるところとしてマークする。そしてマークした成分が周りの成分よりも７ｄＢ大きなところを、最終的に音成分として決定する。
【００１３】
ステップＳ１０３：ノイズ選択
ここでは、５１２帯域のパワースペクトルを非線形なサブバンドに分割し、ノイズ成分の検出を行う。非線形サブバンドの分け方はサンプリング周波数に依存し、サンプリング周波数３２ｋＨｚの場合は２５個に分けられる。それ以外は２７個に分けられる。ノイズの検出は非線形サブバンドごとに行われ、各サブバンドに属する全てのパワースペクトルを重み付けしながら足し合わせることでノイズか否かを判断する。この処理は積和処理になる。
【００１４】
ステップＳ１０４：マスカー選択
ステップＳ１０２、Ｓ１０３で抽出された音、ノイズ成分（これらをマスカーと呼ぶ）を所定の間隔で間引き、マスカーを選択する。間引き方はＩＳＯ／ＩＥＣ１１１７２−３で規定されている。
【００１５】
ステップＳ１０５：大域的マスク閾値計算
５１２帯域のパワースペクトルを非線形な１３３個のサブバンドに分割し、それぞれのサブバンド領域に対して、ステップＳ１０４の処理で選択したマスカーを用いてマスク閾値を計算する。なお、この１３３個の非線形なサブバンドを以下周波数バンドと呼ぶ。周波数バンドの分け方はＩＳＯ／ＩＥＣ１１１７２−３で規定されている。
【００１６】
ステップＳ１０６：最小マスク閾値計算
ステップＳ１０１で算出した線形な３２分割のサブバンドに対する最小マスク閾値を求める。これはステップＳ１０５で求めた非線形な周波数バンドごとのマスク閾値を線形な３２分割のサブバンドに対応させる処理である。
【００１７】
ステップＳ１０７：ＳＭＲ計算
ここでは、３２個のサブバンドごとに、ステップＳ１０６で求めた最小マスク閾値と、各サブバンドの信号の最大値の差（ＳＭＲ：ＳｉｇｎａｌＭａｓｋＲａｔｉｏ）を計算する。
【００１８】
上記のような、心理聴覚分析部１０５ｂでの処理は、演算量が多く、計算に非常に時間がかかる欠点がある。
ＭＰ２の符号化ではオーディオ信号は、サブバンド符号化（帯域分割符号化ともいう）により、３２個のサブバンドに分割され符合化される。しかしながら分割したサブバンドは３２個全て使われることはない。サブバンドの使用数はビットレート及びサンプリング周波数に依存し、最大でも３０個までしか使用しない。ＩＳＯ／ＩＥＣ１１１７２−３によれば、ビットレート３２ｋｂｐｓ、サンプリング周波数３２ｋＨｚの条件では最大のサブバンドの使用数は１２個と非常に少なくなる。従来、心理聴覚分析部１０５ｂ以外の処理については、ＩＳＯ／ＩＥＣ１１１７２−３においてサブバンドの使用・未使用による高速化が図られていた。
【００１９】
心理聴覚分析部１０５ｂの演算量を軽減する方法も、いつくか提案されている（例えば、特許文献１参照）。
【００２０】
【特許文献１】
特開２００２−１８９４９９号公報（段落番号〔００１０〕〜〔００１４〕，第１図）
【００２１】
【発明が解決しようとする課題】
しかし、上記の従来技術では、ＩＳＯ／ＩＥＣ１１１７２−３による心理聴覚モデルのＭＯＤＥＬ１とは異なる方法を用いており、国際標準の規格に準拠しない方法であるという問題があった。
【００２２】
本発明はこのような点に鑑みてなされたものであり、ＩＳＯ／ＩＥＣ１１１７２−３による心理聴覚モデルのＭＯＤＥＬ１に準拠しながらも、心理聴覚分析部の処理を高速化することが可能な、音声信号圧縮方法及び音声信号圧縮装置を提供することを目的とする。
【００２３】
【課題を解決するための手段】
本発明では上記課題を解決するために、サブバンド符号化及び人間の聴覚の特性を用いた心理聴覚分析処理により音声信号を圧縮する音声信号圧縮方法において、前記音声信号のビットレート及びサンプリング周波数に応じて決定されるサブバンド使用数をもとに、前記心理聴覚分析処理で使用するのに有効な帯域を線形近似により見積もり、見積もった前記有効な帯域に応じて、前記心理聴覚分析処理でノイズ選択の際に用いられる非線形なサブバンドにおいて、使用に足りる臨界のサブバンド数を決定し、前記心理聴覚分析処理で使用する線形または非線形なサブバンドに応じて前記有効な帯域を補正し、決定した前記臨界のサブバンド数及び補正した前記有効な帯域を用いて、前記心理聴覚分析処理を行うことを特徴とする音声信号圧縮方法が提供される。
【００２４】
上記の方法によれば、音声信号のビットレート及びサンプリング周波数に応じて決定されるサブバンド使用数をもとに、心理聴覚分析処理で使用するのに有効な帯域を見積もり、見積もった有効な帯域に応じてノイズ選択の際に用いられる非線形なサブバンドにおいて、使用に足りる臨界のサブバンド数を決定し、さらに、有効な帯域を補正した後、臨界のサブバンド数、有効な帯域を用いて心理聴覚分析処理を行うので、演算に使用する帯域が限定され、演算量を削減し、処理を高速化する。
【００２５】
【発明の実施の形態】
以下本発明の実施の形態を図面を参照して説明する。
図１は、本発明の実施の形態の音声信号圧縮装置の主要部を示す機能ブロック図である。
【００２６】
本発明の実施の形態の音声信号圧縮装置１０は、心理聴覚分析部１０５ａにおける、図８で示した、ＩＳＯ／ＩＥＣ１１１７２−３で規定される心理聴覚モデルのＭＯＤＥＬ１で行う処理を高速化するためのものであり、有効帯域見積もり部１１と、臨界サブバンド数決定部１２と、有効帯域補正部１３と、を有する。
【００２７】
なお、図１において、図７で示したＩＳＯ／ＩＥＣ１１１７２−３で規定されるＭＰＥＧ１Ａｕｄｉｏ（ｌａｙｅｒ１及びｌａｙｅｒ２）のオーディオ信号圧縮回路の構成のうち、従来の心理聴覚分析部１０５ｂは、図１の心理聴覚分析部１０５ａに対応している。それ以外の構成要素は図７に示したものと同様であるので図示を省略している。
【００２８】
有効帯域見積もり部１１は、オーディオ信号のビットレート及びサンプリング周波数に応じて決定されるサブバンド使用数をもとに、心理聴覚分析部１０５ａで使用するのに有効な帯域（以下バリッドバンド（ｖａｌｉｄｂａｎｄ）変数と呼ぶ）を線形近似により見積もる。
【００２９】
臨界サブバンド数決定部１２は、見積もったバリッドバンド変数に応じて、心理聴覚分析部１０５ａでの図８のステップＳ１０３のノイズ選択の際に用いられる非線形なサブバンドにおいて、使用に足りる臨界のサブバンド数（以下クリティカルバンド（ｃｒｉｔｉｃａｌｂａｎｄ）変数と呼ぶ）を決定する。
【００３０】
有効帯域補正部１３は、心理聴覚分析部１０５ａで使用する線形または非線形なサブバンドに応じてバリッドバンド変数を補正する。
心理聴覚分析部１０５ａで使用する線形なサブバンドとしては、図８で示した、ＩＳＯ／ＩＥＣ１１１７２−３で規定される心理聴覚モデルのＭＯＤＥＬ１で行う処理のうち、ステップＳ１０１の音圧レベル計算で使用する５１２帯域を３２個に線形に分割したサブバンドがある。
【００３１】
非線形なサブバンドとしては、前述したノイズ選択の際に用いられる５１２帯域を２５または２７個に非線形に分割したサブバンドや、図８のステップＳ１０５の処理で大域的マスク閾値計算の際に用いられる１３３個に非線形に分割したサブバンドがある（詳しくは後述する）。
【００３２】
以下、図１で示した音声信号圧縮装置１０の動作を説明する。
まず、有効帯域見積もり部１１においてバリッドバンド変数を見積もる。
音声信号圧縮装置１０において、図７で示した構成のうち、心理聴覚分析部１０５ａ以外では１１５２オーディオサンプルを用いている。そのため、心理聴覚分析部１０５ａ以外のサブバンドは、１１５２オーディオサンプルのオーディオ信号を３２個の周波数帯に分割したものである。オーディオ信号のビットレート及びサンプリング周波数が決まると、ＩＳＯ／ＩＥＣ１１１７２−３の仕様により、サブバンド使用数が決まる。例えば、ビットレート３２ｋｂｐｓ、サンプリング周波数３２ｋＨｚの場合、使用するサブバンド数は１２個である。
【００３３】
心理聴覚分析部１０５ａでの処理ではオーディオ信号は５１２帯域で表される。そこで、有効帯域見積もり部１１は、サブバンド使用数をもとに線形近似により心理聴覚分析部１０５ａで使用する５１２帯域時のうち有効である帯域、すなわちバリッドバンド変数を見積もる。
【００３４】
図２は、バリッドバンド変数の見積もり方を示す図である。
図では、１１５２オーディオサンプルのオーディオデータを３２個の周波数帯に分割したものと、心理聴覚分析部１０５ａで使用する５１２帯域を、線形な３２個の周波数帯域に分割したものとの対応を示している。
【００３５】
心理聴覚分析部１０５ａ以外での１１５２オーディオサンプルにおけるサブバンド使用数を“ｓｂｌｉｍｉｔ”とし、バリッドバンド変数を“ｖａｌｉｄ＿ｂａｎｄ”と表記すると、単純比例計算になるので、バリッドバンド変数は、次のように概算できる。
【００３６】
【数１】
ｖａｌｉｄ＿ｂａｎｄ＝ｓｂｌｉｍｉｔ×５１２／３２ ……（１）
サブバンド使用数“ｓｂｌｉｍｉｔ”は、オーディオ信号のビットレート及びサンプリング周波数に応じて決定される。例えば、ビットレート３２ｋｂｐｓ、サンプリング周波数３２ｋＨｚの場合、サブバンド使用数“ｓｂｌｉｍｉｔ＝１２”となる。これを図２で示すように、心理聴覚分析部１０５ａで使用する５１２帯域に対応させると、（１）式より、“ｖａｌｉｄ＿ｂａｎｄ＝１９２”となり、バリッドバンド変数を見積もることができる。
【００３７】
次に、臨界サブバンド数決定部１２では、見積もったバリッドバンド変数に応じて、心理聴覚分析部１０５ａで図８のステップＳ１０３のノイズ選択の際に用いるためのクリティカルバンド変数を決定する。
【００３８】
図３は、サンプリング周波数３２ｋＨｚの場合のクリティカルバンド変数と、クリティカルバンド変数の値に対応する帯域幅の図である。
非線形のサブバンドの分け方はＩＳＯ／ＩＥＣ１１１７２−３で規定されており、サンプリング周波数に依存する。サンプリング周波数３２ｋＨｚの場合は５１２帯域を２５個に分け、最大のクリティカルバンド変数は“２４”となる。それ以外は５１２帯域を２７個に分け、最大のクリティカルバンド変数は“２６”となる。図では、３２ｋＨｚの場合、最大のクリティカルバンド変数が“２４”の場合について示している。
【００３９】
従来では、どの帯域まで使用されているかにかかわらず（サブバンドの使用数にかかわらず）クリティカルバンド変数が最大の“２４”まで、帯域幅でいうと４８０までを計算していた。本発明の実施の形態では、有効帯域見積もり部１１で見積もったバリッドバンド変数に応じて、ノイズ選択の際に使用に足りるだけのクリティカルバンド変数を決定する。例えば、前述したようにビットレート３２ｋｂｐｓで、サンプリング周波数３２ｋＨｚの場合、バリッドバンド変数は“ｖａｌｉｄ＿ｂａｎｄ＝１９２”となるから、図３を参照すると、バリッドバンド変数に対応したクリティカルバンド変数は、“１９”となる。クリティカルバンド変数とバリッドバンド変数の関係は、以下の式のように示される。
【００４０】
【数２】
ｂａｎｄ＿ｗｉｄｔｈ［ｃｒｉｔ＿ｂａｎｄ−１］＜ｖａｌｉｄ＿ｂａｎｄ≦ｂａｎｄ＿ｗｉｄｔｈ［ｃｒｉｔ＿ｂａｎｄ］ ……（２）
上式において、クリティカルバンド変数は“ｃｒｉｔ＿ｂａｎｄ”、バリッドバンド変数は“ｖａｌｉｄ＿ｂａｎｄ”と表記しており、“ｂａｎｄ＿ｗｉｄｔｈ”は帯域幅を示し、例えば、“ｂａｎｄ＿ｗｉｄｔｈ［ｃｒｉｔ＿ｂａｎｄ］”はクリティカルバンド変数“ｃｒｉｔ＿ｂａｎｄ”の帯域幅を示す。
【００４１】
式（２）のようにして、“ｖａｌｉｄ＿ｂａｎｄ”の値が、“ｃｒｉｔ＿ｂａｎｄ−１”の帯域幅と“ｃｒｉｔ＿ｂａｎｄ”の帯域幅の間に収まるクリティカルバンド変数を探す。
【００４２】
このようにして、クリティカルバンド変数を、見積もったバリッドバンド変数に応じて、使用に足りる分だけ計算すればよいので、演算量を減らすことができる。
【００４３】
次に有効帯域補正部１３にて、バリッドバンドの補正を行う。バリッドバンド変数の補正は、まず、臨界サブバンド数決定部１２において決定したクリティカルバンド変数に応じて行う。具体的には、以下の式に従って補正する。
【００４４】
【数３】
ｖａｌｉｄ＿ｂａｎｄ＝ｂａｎｄ＿ｗｉｄｔｈ［ｃｒｉｔ＿ｂａｎｄ］……（３）
式（３）のように、バリッドバンド変数を、クリティカルバンド変数の帯域幅に合わせる。例えば、図３のようにクリティカルバンド変数“１９”の帯域幅に合わせるために“Δｗｄ”だけバリッドバンド変数を引き上げる。
【００４５】
心理聴覚分析部１０５ａの処理では、ＩＳＯ／ＩＥＣ１１１７２−３による心理聴覚モデルのＭＯＤＥＬ１の処理を示す図８のステップＳ１０５の大域的マスク閾値計算において、５１２帯域を１３３個に分割した非線形なサブバンド（周波数バンド）を用いる。以下の処理では、この周波数バンドに対応させるために、バリッドバンド変数を補正する。
【００４６】
図４は、サンプリング周波数３２ｋＨｚの場合の周波数バンドと、帯域幅の関係を示す図である。
周波数バンドの分け方はＩＳＯ／ＩＥＣ１１１７２−３で規定されている。
【００４７】
ここで示した周波数バンドに応じて、前述のクリティカルバンド変数に応じた補正と同様にして、バリッドバンド変数を補正することで、１３３個の非線形な周波数バンドに対応させることができる。具体的には、以下の式に従って補正する。
【００４８】
【数４】
ｂａｎｄ＿ｗｉｄｔｈ［ｆｒｅｑｕｅｎｃｙ＿ｂａｎｄ−１］＜ｖａｌｉｄ＿ｂａｎｄ≦ｂａｎｄ＿ｗｉｄｔｈ［ｆｒｅｑｕｅｎｃｙ＿ｂａｎｄ］……（４）
なお、ここで、周波数バンドは“ｆｒｅｑｕｅｎｃｙ＿ｂａｎｄ”と表記している。“ｂａｎｄ＿ｗｉｄｔｈ［ｆｒｅｑｕｅｎｃｙ＿ｂａｎｄ］”は、周波数バンドの帯域幅となる。
【００４９】
式（４）において周波数バンドの帯域幅を決定し、決定した周波数バンドの帯域幅が補正されたバリッドバンド変数となる。すなわち次式のようになる。
【００５０】
【数５】
ｖａｌｉｄ＿ｂａｎｄ＝ｂａｎｄ＿ｗｉｄｔｈ［ｆｒｅｑｕｅｎｃｙ＿ｂａｎｄ］ ……（５）
上記のように、始めに、５１２帯域を３２個に線形に分割したサブバンドから、バリッドバンド変数を補正し、次に５１２帯域を１３３個に非線形に分割したサブバンドで補正することで精度のよい補正を行うことができる。
【００５１】
心理聴覚分析部１０５ａの処理では、ＩＳＯ／ＩＥＣ１１１７２−３による心理聴覚モデルのＭＯＤＥＬ１の処理を示す図８のステップＳ１０１の音圧レベル計算において、５１２帯域を３２個のサブバンドへ線形に対応させるために１６サンプルごとに演算する必要がある。これに対応するため、バリッドバンド変数をさらに１６の倍数になるように補正する。例えば、以下の式に従って補正する。
【００５２】
【数６】
ｖａｌｉｄ＿ｂａｎｄ＝ｖａｌｉｄ＿ｂａｎｄ＿ｏｌｄ−（ｖａｌｉｄ＿ｂａｎｄ＿ｏｌｄ％１６）＋１６ ……（６）
ここで、“ｖａｌｉｄ＿ｂａｎｄ＿ｏｌｄ”は、１６の倍数に補正する前のバリッドバンド変数、“ｖａｌｉｄ＿ｂａｎｄ＿ｏｌｄ％１６”は、“ｖａｌｉｄ＿ｂａｎｄ＿ｏｌｄ”を１６で割ったときの余りを表している。
【００５３】
以上のようにして、補正したバリッドバンド変数と、クリティカルバンド変数を用いて図８で示したようなＩＳＯ／ＩＥＣ１１１７２−３による心理聴覚モデルのＭＯＤＥＬ１の処理を行う。
【００５４】
以下、上記の処理の流れをフローチャートでまとめる。
図５は、本発明の実施の形態の音声信号圧縮方法の処理の流れを説明するフローチャートである。
【００５５】
Ｓ１：バリッドバンド変数見積もり
オーディオ信号のビットレート及びサンプリング周波数に応じて決定されるサブバンド使用数をもとに、心理聴覚分析処理で使用するのに有効なバリッドバンドを線形近似により見積もる。
【００５６】
Ｓ２：クリティカルバンド変数決定
見積もったバリッドバンド変数に応じて、心理聴覚分析部１０５ａで図８のステップＳ１０３のノイズ選択の際に用いるためのクリティカルバンド変数を決定する。
【００５７】
Ｓ３：バリッドバンド変数補正
心理聴覚分析処理で使用する線形または非線形なサブバンドに応じてバリッドバンド変数を補正する。具体的には、以下の３段階で行う。すなわち、１．決定したクリティカルバンド変数の帯域に合わせるように補正する。２．５１２帯域を非線形に１３３個に分割したサブバンドである周波数バンドに応じて補正する。３．１６の倍数になるように補正する。
【００５８】
Ｓ４：心理聴覚分析処理
ステップＳ２の処理で決定したクリティカルバンド変数及び、ステップＳ３の処理で補正したバリッドバンド変数を用いて、図８で示したＩＳＯ／ＩＥＣ１１１７２−３による心理聴覚モデルのＭＯＤＥＬ１の処理を行う。
【００５９】
上記のようにして算出したバリッドバンド変数と、クリティカルバンド変数を用いて心理聴覚分析処理を行うことで、以下のような効果が期待できる。
まず、図８のステップＳ１００のフーリエ変換処理の後のパワースペクトルを求める際、従来では５１２帯域全てにわたって２乗和を計算していたが、本発明の実施の形態の処理により決定したバリッドバンド変数を用いることにより、５１２帯域全てについて演算を行う必要がなくなり、バリッドバンド変数まで演算すればよい。
【００６０】
また、ステップＳ１０１の音圧レベル計算においても、５１２帯域全てについて音圧計算をする必要がなくなる。バリッドバンド変数は補正して１６の倍数になっているため、計算頻度は、１６回の和算をバリッドバンド変数／１６回、繰り返すだけで済むようになる。
【００６１】
ステップＳ１０２の音選択においては、バリッドバンド変数を導入することで、５１２帯域から変分点を探す処理、すなわち５１２回の隣接３成分間の比較処理が、バリッドバンド変数までの領域で変分点を探す処理、すなわちバリッドバンド変数回の隣接３成分間の比較処理で済むようになる。
【００６２】
また、ステップＳ１０３のノイズ選択においては、ステップＳ２の処理で決定したクリティカルバンド変数を導入することで、例えば２５個の非線形サブバンド全てについて、それぞれ積和処理をするのではなく、クリティカルバンド変数までの非線形サブバンドまで、それぞれ積和処理をするだけで済むようになる。
【００６３】
以上により、心理聴覚分析部１０５ａでの演算量を大幅に削減することができ、処理を高速化することが可能になる。
次に、本発明の実施の形態の音声信号圧縮装置を適用した具体的なハードウェア構成例を示す。
【００６４】
図６は、オーディオ信号を記録するオーディオ信号記録装置の概略の構成図である。
図のように、オーディオ信号記録装置２０は、入力されたアナログのオーディオ信号をデジタル信号に変換するＡ／Ｄ変換器２１と、オーディオ信号をＭＰ２形式で圧縮符号化するＭＰ２エンコーダ２２とからなる。本発明の実施の形態の音声信号圧縮装置１０は、ここで示したＭＰ２エンコーダ２２により実現できる。
【００６５】
図６で示したようなオーディオ信号記録装置２０は、例えば、デジタルビデオカメラに搭載される。
オーディオ信号記録装置２０の動作について簡単に説明する。
【００６６】
アナログのオーディオ信号が入力されると、Ａ／Ｄ変換器２１は、オーディオ信号をデジタルのオーディオ信号に変換する。変換後、オーディオ信号は、ＭＰ２エンコーダ２２に入力される。ＭＰ２エンコーダ２２では、図７に示したような各部での処理や、図１に示した各機能により演算量が削減された心理聴覚分析部１０５ａでの処理により、オーディオ信号をＭＰ２形式に圧縮符号化し、記録メディア３０に記録する。
【００６７】
記録メディア３０としては、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリなどがある。磁気記録装置には、ハードディスク装置（ＨＤＤ）、フレキシブルディスク（ＦＤ）、磁気テープなどがある。光ディスクには、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）、ＣＤ−Ｒ（Ｒｅｃｏｒｄａｂｌｅ）／ＲＷ（ＲｅＷｒｉｔａｂｌｅ）などがある。光磁気記録媒体には、ＭＯ（Ｍａｇｎｅｔｏ−Ｏｐｔｉｃａｌｄｉｓｃ）などがある。
【００６８】
上記のように、本発明によれば、心理聴覚分析処理の演算量を削減することができ、処理時間を大幅に短縮することができる。また、処理時間の短縮は、ＩＳＯ／ＩＥＣ１１１７２−３に準拠していながら達成されている。処理時間が短縮されることにより、図６で示したようなオーディオ信号記録装置２０をより低い低周波数で駆動することができるようになり、オーディオ信号記録装置２０の低消費電力化、電力供給装置の小型化、オーディオ信号記録装置２０自体の小型化が期待できる。
【００６９】
なお、本発明は、ＩＳＯ／ＩＥＣ１１１７２−３に準拠していることを特徴としているが、上記の規格に限定されるものではない。
【００７０】
【発明の効果】
以上説明したように本発明では、音声信号のビットレート及びサンプリング周波数に応じて決定されるサブバンド使用数をもとに、心理聴覚分析処理で使用するのに有効な帯域（バリッドバンド変数）を見積もり、見積もったバリッドバンドに応じてノイズ選択の際に用いられる非線形なサブバンドにおいて、使用に足りる臨界のサブバンド数（クリティカルバンド変数）を決定し、さらに、バリッドバンド変数を補正した後、クリティカルバンド変数、バリッドバンド変数を用いて心理聴覚分析処理を行うので、演算に使用する帯域が限定され、演算量を削減することができる。これにより、従来では演算量が多く処理に時間がかかった心理聴覚分析処理を高速化することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態の音声信号圧縮装置の主要部を示す機能ブロック図である。
【図２】バリッドバンド変数の見積もり方を示す図である。
【図３】サンプリング周波数３２ｋＨｚの場合のクリティカルバンド変数と、クリティカルバンド変数の値に対応する帯域幅の図である。
【図４】サンプリング周波数３２ｋＨｚの場合の周波数バンドと、帯域幅の関係を示す図である。
【図５】本発明の実施の形態の音声信号圧縮方法の処理の流れを説明するフローチャートである。
【図６】オーディオ信号を記録するオーディオ信号記録装置の概略の構成図である。
【図７】ＩＳＯ／ＩＥＣ１１１７２−３で規定されるＭＰＥＧ１Ａｕｄｉｏ（ｌａｙｅｒ１及びｌａｙｅｒ２）のオーディオ信号圧縮回路の構成を示すブロック図である。
【図８】心理聴覚分析部での処理の一例を示すフローチャートである。
【符号の説明】
１０……音声信号圧縮装置，１１……有効帯域見積もり部，１２……臨界サブバンド決定部，１３……有効帯域補正部，１０５ａ……心理聴覚分析部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an audio signal compression method and an audio signal compression device, and more particularly, to an audio signal compression method and an audio signal compression device for compressing an audio signal by subband coding and psychoacoustic analysis processing using human auditory characteristics.
[0002]
[Prior art]
MPEG (Moving Picture Coding Experts Group) 1 Audio, which is one of the compression coding systems for digital audio signals (hereinafter referred to as audio signals), is defined by ISO / IEC 11172-3 of the international standard system. In particular, MPEG1 Audio layer 2 (hereinafter referred to as MP2) is used for audio recording of a digital video camera or the like.
[0003]
FIG. 7 is a block diagram showing a configuration of an audio signal compression circuit of MPEG 1 Audio (layer 1 and layer 2) defined by ISO / IEC 11172-3.
[0004]
As illustrated, the audio signal compression circuit 100 includes an audio signal input unit 101 that inputs a digital audio signal, a filter bank application unit 102 that divides the audio signal into subbands, a scale factor calculation unit 103 that calculates scale factors, a quantizer 104 that performs quantization, a psychoacoustic analysis unit 105b that performs masking (psychoacoustic analysis processing) based on the characteristics of human hearing to reduce the amount of data, a bit allocation calculation unit 106 that calculates the bit allocation, and a bitstream generation unit 107 that generates an MPEG1 Audio bitstream.
[0005]
When an audio signal digitized by an external A/D converter or the like is input to the audio signal input unit 101 in units of frames, the input audio signal is passed to the filter bank application unit 102 and the psychoacoustic analysis unit 105b. The filter bank application unit 102 linearly divides the input audio signal into 32 subbands. The divided subbands are input to the scale factor calculation unit 103. The scale factor calculation unit 103 detects the sample with the maximum absolute value in each subband and calculates a scale factor by converting that value to a logarithm and quantizing it. Using this scale factor, each subband is normalized so that its maximum amplitude becomes 1.0, which makes the dynamic range of the subbands uniform.
[0006]
On the other hand, the psychoacoustic analysis unit 105b calculates a perception threshold value (hereinafter referred to as a masking threshold value) for human hearing for each subband from the input audio signal. The bit allocation calculation unit 106 calculates bit allocation using the calculated masking threshold.
[0007]
In the quantizer 104, the subband is quantized using the scale factor calculated by the scale factor calculation unit 103 and the output of the bit allocation calculation unit 106. Finally, the bitstream generation unit 107 generates an MPEG1 Audio bitstream.
[0008]
FIG. 8 is a flowchart illustrating an example of processing in the psychoacoustic analysis unit.
This procedure follows MODEL1 of the psychoacoustic model defined in ISO/IEC 11172-3 and consists of eight steps.
[0009]
Step S100: Fourier transform
When an audio signal is input, the psychoacoustic analysis unit 105b first performs a fast Fourier transform (FFT) or the like to convert the audio signal into frequency components. Note that MP2 handles 1152 audio samples per frame, but because of the nature of the FFT the frequency transform is performed on 1024 audio samples, a power of two.
[0010]
Here, the power spectrum is obtained by taking the sum of squares of two adjacent frequency components. Combining two adjacent frequency components reduces the number of bands from 1024 to 512.
[0011]
Step S101: Sound pressure level calculation
Here, the 512-band power spectrum calculated in step S100 is divided into 32 linear subbands. Each subband contains 16 power spectrum values. The sound pressure level of a subband is calculated by summing these values. This step therefore repeats a 16-term summation 32 times.
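The grouping described in step S101 can be sketched as follows. This is a minimal illustration that follows only the description above (16 consecutive power spectrum values summed per linear subband); the function name `subband_sums` and its parameter are hypothetical, and the sketch does not reproduce the normative level calculation of the standard.

```python
def subband_sums(power, bands_per_subband=16):
    """Sum consecutive groups of power spectrum values, one sum per linear subband.

    `power` is assumed to be a sequence whose length is a multiple of
    `bands_per_subband` (512 values give 32 subbands of 16 values each).
    """
    if len(power) % bands_per_subband != 0:
        raise ValueError("spectrum length must be a multiple of the subband size")
    return [
        sum(power[start:start + bands_per_subband])
        for start in range(0, len(power), bands_per_subband)
    ]


# A flat 512-band spectrum yields 32 subband sums.
levels = subband_sums([1.0] * 512)
print(len(levels))
```

Passing a shorter spectrum whose length is still a multiple of 16 reduces the number of summations proportionally, which is the property the later correction of the valid band variable relies on.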
[0012]
Step S102: Sound selection
Here, components to be marked as sounds are searched for in the 512-band power spectrum. The search first scans all power spectrum values from the lowest band to the highest and marks each local maximum, that is, each place where the spectrum is convex upward, as a possible sound component. A marked component that is 7 dB larger than its surrounding components is finally determined to be a sound component.
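The two-stage search of step S102 can be sketched as below. This is a simplified illustration, not the procedure of the standard: the spectrum is assumed to be given in dB, and the set of "surrounding components" is assumed to be the neighbours at a fixed offset of 2 bands (`neighbour_offsets`), which is a placeholder choice. The function name `select_sound_components` is hypothetical.

```python
def select_sound_components(spectrum_db, neighbour_offsets=(2,), margin_db=7.0):
    """Return indices of local maxima that exceed their neighbours by margin_db.

    ASSUMPTION: `neighbour_offsets` stands in for the set of surrounding
    components; the text above does not say which neighbours are compared.
    """
    reach = max(neighbour_offsets)
    selected = []
    # Scan from the lowest band to the highest, skipping edges without neighbours.
    for k in range(reach, len(spectrum_db) - reach):
        value = spectrum_db[k]
        # Stage 1: mark local maxima (spectrum convex upward at k).
        if not (value > spectrum_db[k - 1] and value >= spectrum_db[k + 1]):
            continue
        # Stage 2: keep the candidate only if it stands out by margin_db.
        if all(
            value - spectrum_db[k - j] >= margin_db
            and value - spectrum_db[k + j] >= margin_db
            for j in neighbour_offsets
        ):
            selected.append(k)
    return selected


spectrum = [0.0] * 12
spectrum[3], spectrum[4], spectrum[5] = 10.0, 20.0, 10.0  # strong peak at index 4
spectrum[9] = 3.0                                          # weak peak at index 9
print(select_sound_components(spectrum))
```

The loop bound is the only thing that changes when the search is restricted to a shorter spectrum, so the number of three-point comparisons scales with the length of the input.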
[0013]
Step S103: Noise selection
Here, the 512-band power spectrum is divided into nonlinear subbands to detect noise components. The way the nonlinear subbands are divided depends on the sampling frequency: there are 25 subbands at a sampling frequency of 32 kHz and 27 otherwise. Noise is detected for each nonlinear subband by adding up all the power spectrum values belonging to that subband with weighting and judging from the result whether the subband contains noise. This is a multiply-accumulate operation.
[0014]
Step S104: Masker selection
The sound and noise components (these are called maskers) extracted in steps S102 and S103 are thinned out at predetermined intervals to select a masker. The thinning method is defined in ISO / IEC 11172-3.
[0015]
Step S105: Global mask threshold calculation
The 512-band power spectrum is divided into non-linear 133 subbands, and a mask threshold is calculated for each subband region using the masker selected in the process of step S104. The 133 non-linear subbands are hereinafter referred to as frequency bands. The frequency band division method is defined in ISO / IEC 11172-3.
[0016]
Step S106: Minimum mask threshold calculation
The minimum mask threshold value for the linear 32-divided subband calculated in step S101 is obtained. In this process, the mask threshold value for each nonlinear frequency band obtained in step S105 is made to correspond to a linear sub-band of 32 divisions.
[0017]
Step S107: SMR calculation
Here, for each of the 32 subbands, a difference (SMR: Signal Mask Ratio) between the minimum mask threshold obtained in step S106 and the maximum value of each subband signal is calculated.
[0018]
The processing in the psychoacoustic analysis unit 105b as described above has a drawback that the calculation amount is large and the calculation takes a very long time.
In MP2 encoding, an audio signal is divided into 32 subbands and encoded by subband coding (also called band division coding). However, the 32 subbands are never all used. The number of subbands in use depends on the bit rate and the sampling frequency, and is at most 30. According to ISO/IEC 11172-3, at a bit rate of 32 kbps and a sampling frequency of 32 kHz the maximum number of subbands in use is only 12. Conventionally, processing other than that of the psychoacoustic analysis unit 105b has been speeded up in ISO/IEC 11172-3 by distinguishing used from unused subbands.
[0019]
Some methods for reducing the amount of computation of the psychoacoustic analysis unit 105b have been proposed (see, for example, Patent Document 1).
[0020]
[Patent Document 1]
Japanese Patent Laid-Open No. 2002-189499 (paragraph numbers [0010] to [0014], FIG. 1)
[0021]
[Problems to be solved by the invention]
However, the above-described conventional technique uses a method different from the psycho-acoustic model MODEL 1 according to ISO / IEC 11172-3, and has a problem that it is a method that does not comply with international standards.
[0022]
The present invention has been made in view of the above points, and its object is to provide an audio signal compression method and an audio signal compression apparatus that can speed up the processing of the psychoacoustic analysis unit while conforming to MODEL1 of the psychoacoustic model of ISO/IEC 11172-3.
[0023]
[Means for Solving the Problems]
In order to solve the above problem, the present invention provides an audio signal compression method for compressing an audio signal by subband coding and psychoacoustic analysis processing using the characteristics of human hearing, the method comprising: estimating, by linear approximation, a band that is effective for use in the psychoacoustic analysis processing, based on the number of subbands in use determined according to the bit rate and sampling frequency of the audio signal; determining, according to the estimated effective band, the critical number of subbands sufficient for use among the nonlinear subbands used for noise selection in the psychoacoustic analysis processing; correcting the effective band according to the linear or nonlinear subbands used in the psychoacoustic analysis processing; and performing the psychoacoustic analysis processing using the determined critical number of subbands and the corrected effective band.
[0024]
According to the above method, a band effective for use in the psychoacoustic analysis processing is estimated based on the number of subbands in use determined according to the bit rate and sampling frequency of the audio signal; the critical number of subbands sufficient for use among the nonlinear subbands used for noise selection is determined according to the estimated effective band; and, after the effective band is corrected, the psychoacoustic analysis processing is performed using the critical number of subbands and the effective band. The band used for computation is therefore limited, which reduces the amount of computation and speeds up the processing.
[0025]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a functional block diagram showing the main part of an audio signal compression apparatus according to an embodiment of the present invention.
[0026]
The audio signal compression apparatus 10 according to the embodiment of the present invention speeds up the processing that the psychoacoustic analysis unit 105a performs under MODEL1 of the psychoacoustic model defined in ISO/IEC 11172-3, shown in FIG. 8, and has an effective band estimation unit 11, a critical subband number determination unit 12, and an effective band correction unit 13.
[0027]
Note that the conventional psychoacoustic analysis unit 105b in the MPEG1 Audio (layer 1 and layer 2) audio signal compression circuit defined by ISO/IEC 11172-3 and shown in FIG. 7 corresponds to the psychoacoustic analysis unit 105a in FIG. 1. The other components are the same as those shown in FIG. 7 and are omitted from the figure.
[0028]
The effective band estimation unit 11 estimates, by linear approximation, the band that is effective for use in the psychoacoustic analysis unit 105a (hereinafter called the valid band variable), based on the number of subbands in use determined according to the bit rate and sampling frequency of the audio signal.
[0029]
The critical subband number determination unit 12 determines, according to the estimated valid band variable, the critical number of subbands sufficient for use (hereinafter called the critical band variable) among the nonlinear subbands used for the noise selection of step S103 of FIG. 8 in the psychoacoustic analysis unit 105a.
[0030]
The effective band correction unit 13 corrects the valid band variable according to the linear or non-linear subband used in the psychoacoustic analysis unit 105a.
The linear subbands used in the psychoacoustic analysis unit 105a are the subbands obtained by linearly dividing the 512 bands into 32, which are used in the sound pressure level calculation of step S101 within the MODEL1 processing of the psychoacoustic model defined in ISO/IEC 11172-3 and shown in FIG. 8.
[0031]
The nonlinear subbands are the subbands obtained by nonlinearly dividing the 512 bands into 25 or 27, used in the noise selection described above, and the subbands obtained by nonlinearly dividing the 512 bands into 133, used in the global mask threshold calculation of step S105 in FIG. 8 (details are described later).
[0032]
Hereinafter, the operation of the audio signal compression apparatus 10 shown in FIG. 1 will be described.
First, the effective band estimation unit 11 estimates the valid band variable.
In the audio signal compression apparatus 10, the parts of the configuration shown in FIG. 7 other than the psychoacoustic analysis unit 105a use 1152 audio samples. The subbands used outside the psychoacoustic analysis unit 105a are therefore obtained by dividing an audio signal of 1152 audio samples into 32 frequency bands. Once the bit rate and sampling frequency of the audio signal are fixed, the number of subbands in use is determined by the ISO/IEC 11172-3 specification. For example, at a bit rate of 32 kbps and a sampling frequency of 32 kHz, 12 subbands are used.
[0033]
In the processing by the psychoacoustic analysis unit 105a, the audio signal is represented by 512 bands. Therefore, the effective band estimation unit 11 estimates a valid band, that is, a valid band variable among the 512 bands used in the psychoacoustic analysis unit 105a by linear approximation based on the number of subbands used.
[0034]
FIG. 2 is a diagram illustrating how to estimate a valid band variable.
The figure shows the correspondence between the audio data of 1152 audio samples divided into 32 frequency bands and the 512 bands used in the psychoacoustic analysis unit 105a divided linearly into 32 frequency bands.
[0035]
If the number of subbands in use for the 1152 audio samples outside the psychoacoustic analysis unit 105a is written “sblimit” and the valid band variable is written “valid_band”, the relationship is a simple proportion, so the valid band variable can be estimated as follows.
[0036]
[Equation 1]
valid_band = sblimit × 512 / 32 ... (1)
The subband usage number “sblimit” is determined according to the bit rate and sampling frequency of the audio signal. For example, when the bit rate is 32 kbps and the sampling frequency is 32 kHz, the number of subbands used is “sblimit = 12”. As shown in FIG. 2, when this is made to correspond to the 512 band used in the psychoacoustic analysis unit 105a, “valid_band = 192” is obtained from the equation (1), and a valid band variable can be estimated.
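The proportional estimate of Equation (1) can be written as a short function. This is a minimal sketch; the function name `estimate_valid_band` and its default arguments are chosen here for illustration, and the value of sblimit is assumed to be supplied by the caller from the bit rate and sampling frequency.

```python
def estimate_valid_band(sblimit, num_bands=512, num_subbands=32):
    """Equation (1): map the number of subbands in use onto the 512-band domain."""
    if not 0 <= sblimit <= num_subbands:
        raise ValueError("sblimit must lie between 0 and the number of subbands")
    # 512 / 32 = 16 spectral bands correspond to each linear subband.
    return sblimit * num_bands // num_subbands


print(estimate_valid_band(12))  # 192, the example given in the text
```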
[0037]
Next, the critical subband number determination unit 12 determines a critical band variable to be used in the noise selection in step S103 of FIG. 8 by the psychoacoustic analysis unit 105a according to the estimated valid band variable.
[0038]
FIG. 3 is a diagram of the critical band variable when the sampling frequency is 32 kHz and the bandwidth corresponding to the value of the critical band variable.
The way the nonlinear subbands are divided is defined in ISO/IEC 11172-3 and depends on the sampling frequency. At a sampling frequency of 32 kHz the 512 bands are divided into 25, and the maximum critical band variable is “24”. Otherwise the 512 bands are divided into 27, and the maximum critical band variable is “26”. The figure shows the 32 kHz case, in which the maximum critical band variable is “24”.
[0039]
Conventionally, regardless of how far up the band is actually used (that is, regardless of the number of subbands in use), the calculation was carried out up to the maximum critical band variable of “24”, or up to 480 in terms of bandwidth. In the embodiment of the present invention, only as large a critical band variable as is needed for noise selection is determined, according to the valid band variable estimated by the effective band estimation unit 11. For example, as described above, at a bit rate of 32 kbps and a sampling frequency of 32 kHz the valid band variable is “valid_band = 192”, so, referring to FIG. 3, the critical band variable corresponding to the valid band variable is “19”. The relationship between the critical band variable and the valid band variable is expressed by the following equation.
[0040]
[Equation 2]
band_width[crit_band-1] < valid_band ≦ band_width[crit_band] ... (2)
In the above equation, the critical band variable is written “crit_band” and the valid band variable is written “valid_band”. “band_width” denotes a bandwidth; for example, “band_width[crit_band]” is the bandwidth of the critical band variable “crit_band”.
[0041]
As in Expression (2), a critical band variable in which the value of “valid_band” falls between the bandwidth of “crit_band-1” and the bandwidth of “crit_band” is searched.
[0042]
In this way, the critical band variable only needs to be calculated as far as is required by the estimated valid band variable, so the amount of calculation can be reduced.
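The search expressed by Equation (2) amounts to finding the first table entry that is not smaller than valid_band. The sketch below uses a binary search for this. The table `TOY_BAND_WIDTH` contains made-up placeholder values chosen only to make the example runnable; it is not the table of FIG. 3 or of ISO/IEC 11172-3, and the function name `find_crit_band` is hypothetical.

```python
from bisect import bisect_left

# HYPOTHETICAL upper band edges, in units of the 512-band spectrum.
# These are placeholder values, not the table defined by the standard.
TOY_BAND_WIDTH = [8, 16, 32, 64, 128, 256, 480]


def find_crit_band(valid_band, band_width):
    """Return the index c with band_width[c-1] < valid_band <= band_width[c].

    `band_width` is assumed to be sorted in increasing order. For c == 0
    only the upper bound of Equation (2) applies.
    """
    crit_band = bisect_left(band_width, valid_band)
    if crit_band == len(band_width):
        raise ValueError("valid_band exceeds the largest band edge in the table")
    return crit_band


crit_band = find_crit_band(192, TOY_BAND_WIDTH)
print(crit_band, TOY_BAND_WIDTH[crit_band])
```

With a real table the same lookup would be performed once per encoder configuration, since valid_band depends only on the bit rate and sampling frequency.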
[0043]
Next, the effective band correction unit 13 corrects the valid band variable. The valid band variable is first corrected according to the critical band variable determined by the critical subband number determination unit 12. Specifically, the correction follows the equation below.
[0044]
[Equation 3]
valid_band = band_width[crit_band] ... (3)
As shown in Expression (3), the valid band variable is matched with the bandwidth of the critical band variable. For example, as shown in FIG. 3, the valid band variable is raised by “Δwd” to match the bandwidth of the critical band variable “19”.
[0045]
In the process of the psychoacoustic analysis unit 105a, in the global mask threshold calculation in step S105 of FIG. 8 showing the process of MODEL1 of the psychoacoustic model by ISO / IEC 11172-3, a non-linear subband obtained by dividing the 512 band into 133 pieces. (Frequency band) is used. In the following processing, the valid band variable is corrected in order to correspond to this frequency band.
[0046]
FIG. 4 is a diagram showing the relationship between the frequency band and the bandwidth when the sampling frequency is 32 kHz.
The frequency band division method is defined in ISO / IEC 11172-3.
[0047]
According to the frequency band shown here, it is possible to correspond to 133 non-linear frequency bands by correcting the valid band variable in the same manner as the correction according to the critical band variable described above. Specifically, correction is performed according to the following equation.
[0048]
[Equation 4]
band_width[frequency_band-1] < valid_band ≦ band_width[frequency_band] ... (4)
Here, the frequency band is written “frequency_band”. “band_width[frequency_band]” is the bandwidth of the frequency band.
[0049]
In Equation (4), the bandwidth of the frequency band is determined, and the determined bandwidth of the frequency band is a corrected valid band variable. That is, the following equation is obtained.
[0050]
[Equation 5]
valid_band = band_width[frequency_band] ... (5)
As described above, the valid band variable is first corrected starting from the subbands obtained by linearly dividing the 512 bands into 32, and is then corrected with the subbands obtained by nonlinearly dividing the 512 bands into 133, which yields an accurate correction.
[0051]
In the processing of the psychoacoustic analysis unit 105a, the sound pressure level calculation in step S101 of FIG. 8, which shows the MODEL1 processing of the psychoacoustic model of ISO/IEC 11172-3, associates the 512 bands linearly with the 32 subbands, so the calculation must be performed in units of 16 samples. To accommodate this, the valid band variable is further corrected to a multiple of 16. For example, the correction is performed according to the following equation.
[0052]
[Equation 6]
valid_band = valid_band_old - (valid_band_old % 16) + 16 (6)
Here, “valid_band_old” represents a valid band variable before correction to a multiple of 16, and “valid_band_old% 16” represents a remainder when “valid_band_old” is divided by 16.
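Equation (6) can be checked with a few lines of code. The sketch below simply transcribes the formula; the function name `correct_to_multiple_of_16` is a hypothetical label used only for this illustration.

```python
def correct_to_multiple_of_16(valid_band_old):
    """Transcription of Equation (6)."""
    # Drop the remainder modulo 16, then advance by one full group of 16.
    return valid_band_old - (valid_band_old % 16) + 16
```

For example, 100 becomes 100 - 4 + 16 = 112. Note that, as written, the formula also advances a value that is already a multiple of 16 (112 becomes 128), so the corrected value is always strictly larger than the input.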
[0053]
Using the valid band variable corrected as described above and the critical band variable, the MODEL1 processing of the psychoacoustic model of ISO/IEC 11172-3 shown in FIG. 8 is performed.
[0054]
Hereinafter, the flow of the above processing is summarized in a flowchart.
FIG. 5 is a flowchart for explaining the processing flow of the audio signal compression method according to the embodiment of the present invention.
[0055]
S1: Valid band variable estimation
Based on the number of subbands used, which is determined by the bit rate and sampling frequency of the audio signal, the band that is effective for use in the psychoacoustic analysis processing (the valid band variable) is estimated by linear approximation.
[0056]
S2: Critical band variable determination
In accordance with the estimated valid band variable, the psychoacoustic analysis unit 105a determines a critical band variable to be used at the time of noise selection in step S103 of FIG.
[0057]
S3: Valid band variable correction
The valid band variable is corrected according to the linear or nonlinear subbands used in the psychoacoustic analysis processing. Specifically, the following three steps are performed.
1. Correct the variable to match the band of the determined critical band variable.
2. Correct the variable in accordance with the frequency bands, which are the subbands obtained by nonlinearly dividing the 512 bands into 133.
3. Correct the variable to a multiple of 16.
[0058]
S4: Psychoacoustic analysis processing
Using the critical band variable determined in the process of step S2 and the valid band variable corrected in the process of step S3, the psychoacoustic model MODEL1 process according to ISO / IEC 11172-3 shown in FIG. 8 is performed.
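Steps S2 and S3 can be chained as in the following sketch, which takes the estimate from step S1 as an input because the linear approximation itself depends on FIG. 2. This is an illustration under stated assumptions, not the patented implementation: both tables contain placeholder edge values rather than the values of ISO/IEC 11172-3, the band indices are zero-based, and the helper names `snap_up` and `prepare_bands` are hypothetical.

```python
from bisect import bisect_left

# Placeholder upper band edges in spectral lines; NOT the standard's tables.
CRITICAL_EDGES = [4, 10, 20, 40, 80, 160, 320, 500]
FREQUENCY_EDGES = [2, 4, 6, 8, 12, 16, 24, 32, 48, 64,
                   96, 128, 168, 224, 336, 512]


def snap_up(value, edges):
    """Return (index, edge) of the first edge that is >= value."""
    index = bisect_left(edges, value)
    if index == len(edges):
        raise ValueError("value lies above the last band edge")
    return index, edges[index]


def prepare_bands(valid_band_estimate):
    """Run S2 and the three corrections of S3 on the S1 estimate."""
    # S2 and S3 step 1: pick the critical band and match its bandwidth.
    crit_band, valid_band = snap_up(valid_band_estimate, CRITICAL_EDGES)
    # S3 step 2: match the finer nonlinear frequency bands.
    _, valid_band = snap_up(valid_band, FREQUENCY_EDGES)
    # S3 step 3: Equation (6), correct to a multiple of 16.
    valid_band = valid_band - (valid_band % 16) + 16
    return crit_band, valid_band
```

With these placeholder tables, an estimate of 100 is raised to 160 by the critical band edge, to 168 by the frequency band edge, and finally to 176 by the multiple-of-16 correction. The two returned values are the inputs that step S4 needs.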
[0059]
The following effects can be expected by performing the psychoacoustic analysis process using the valid band variable calculated as described above and the critical band variable.
First, when the power spectrum is obtained after the Fourier transform processing in step S100 of FIG. 8, the sum of squares is conventionally calculated over all 512 bands. By using the valid band variable determined by the processing of the embodiment of the present invention, the calculation need not be performed for all 512 bands; it suffices to calculate up to the valid band variable.
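The saving in step S100 can be sketched as follows. The function name and the representation of the spectrum as a list of Python complex numbers are assumptions made for illustration; only the idea of stopping at the valid band variable comes from the text.

```python
def power_spectrum(spectrum, valid_band):
    """Sum of squares (re^2 + im^2) for the first valid_band lines only."""
    # Lines at or above valid_band are never touched, which is where
    # the reduction in computation comes from.
    return [c.real * c.real + c.imag * c.imag for c in spectrum[:valid_band]]
```

With a 512-line spectrum and a valid band variable of, say, 176, the loop body runs 176 times instead of 512.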
[0060]
Likewise, in the sound pressure level calculation in step S101, the sound pressure need not be calculated for all 512 bands. Because the valid band variable has been corrected to a multiple of 16, it suffices to repeat the calculation over 16 samples (valid band variable / 16) times.
[0061]
In the sound selection in step S102, variation points are conventionally searched for over the 512 bands, that is, the comparison of three adjacent components is performed 512 times. By introducing the valid band variable, this comparison is performed only in the region up to the valid band variable. In other words, it suffices to perform the search over three adjacent components as many times as the value of the valid band variable.
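A bounded three-point search of this kind might look like the sketch below. The exact comparison used to define a variation point (here a local maximum that is greater than its left neighbour and not smaller than its right neighbour) is an assumption, as is the function name.

```python
def find_variation_points(power, valid_band):
    """Indices k < valid_band where power[k] is a local maximum."""
    # Stay inside the array so that power[k + 1] always exists.
    limit = min(valid_band, len(power) - 1)
    return [
        k for k in range(1, limit)
        if power[k] > power[k - 1] and power[k] >= power[k + 1]
    ]
```

Lowering `valid_band` shortens the loop directly: candidates at or above the valid band variable are never compared.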
[0062]
In addition, in the noise selection in step S103, introducing the critical band variable determined in the processing of step S2 means that product-sum processing need not be performed on, for example, all 25 nonlinear subbands; it suffices to perform product-sum processing on the nonlinear subbands up to the critical band variable.
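The following sketch shows how the critical band variable bounds the per-band accumulation in step S103. It is an illustration only: the band edges are placeholders rather than the values of ISO/IEC 11172-3, the band index is zero-based, a plain sum of power values stands in for the product-sum processing, and the input is assumed to contain only the components that remain after sound selection.

```python
def noise_power_per_band(power, critical_edges, crit_band):
    """Accumulate power in each nonlinear subband up to crit_band inclusive."""
    sums = []
    start = 0
    # Subbands above crit_band are skipped entirely.
    for end in critical_edges[:crit_band + 1]:
        sums.append(sum(power[start:end]))
        start = end
    return sums
```

With edges `[2, 4, 8, 16]` and `crit_band = 1`, only the first two subbands are accumulated, and the remaining ones cost nothing.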
[0063]
As described above, the amount of calculation in the psychoacoustic analysis unit 105a can be greatly reduced, and the processing can be speeded up.
Next, a specific hardware configuration example to which the audio signal compression apparatus according to the embodiment of the present invention is applied will be described.
[0064]
FIG. 6 is a schematic configuration diagram of an audio signal recording apparatus for recording an audio signal.
As shown in the figure, the audio signal recording apparatus 20 includes an A/D converter 21 that converts an input analog audio signal into a digital signal, and an MP2 encoder 22 that compresses and encodes the audio signal in the MP2 format. The audio signal compression apparatus 10 according to the embodiment of the present invention can be realized by the MP2 encoder 22 shown here.
[0065]
The audio signal recording device 20 as shown in FIG. 6 is mounted on, for example, a digital video camera.
The operation of the audio signal recording device 20 will be briefly described.
[0066]
When an analog audio signal is input, the A/D converter 21 converts it into a digital audio signal, which is then input to the MP2 encoder 22. In the MP2 encoder 22, the audio signal is compressed into the MP2 format by the processing in each unit as shown in FIG. 7 and the processing in the psychoacoustic analysis unit 105a in which the calculation amount is reduced by each function shown in FIG. The compressed audio signal is then recorded on the recording medium 30.
[0067]
Examples of the recording medium 30 include a magnetic recording device, an optical disc, a magneto-optical recording medium, and a semiconductor memory. Examples of the magnetic recording device include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape. Examples of the optical disk include a DVD (Digital Versatile Disc), a CD-R (Recordable) / RW (ReWriteable), and the like. Magneto-optical recording media include MO (Magneto-Optical disc).
[0068]
As described above, according to the present invention, the amount of computation in the psychoacoustic analysis processing can be reduced, and the processing time can be greatly shortened. Moreover, the processing time is shortened while remaining in conformity with ISO/IEC 11172-3. Because the processing time is shortened, the audio signal recording apparatus 20 as shown in FIG. 6 can be driven at a lower frequency, so that its power consumption can be reduced and its power supply can be made smaller. As a result, the audio signal recording apparatus 20 itself can be reduced in size.
[0069]
Although the present invention is characterized by conforming to ISO/IEC 11172-3, it is not limited to that standard.
[0070]
[Effect of the invention]
As described above, in the present invention, a band that is effective for use in the psychoacoustic analysis processing (the valid band variable) is estimated on the basis of the number of subbands used, which is determined according to the bit rate and sampling frequency of the audio signal. In accordance with the estimated valid band, the critical number of subbands that is sufficient for use (the critical band variable) is determined among the nonlinear subbands used for noise selection. After the valid band variable is corrected, the psychoacoustic analysis processing is performed using the critical band variable and the valid band variable. The bands used in the calculation are therefore limited, and the amount of calculation can be reduced. This makes it possible to speed up the psychoacoustic analysis processing, which conventionally required a large amount of calculation and a long processing time.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing a main part of an audio signal compression apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method of estimating a valid band variable.
FIG. 3 is a diagram of a critical band variable in the case of a sampling frequency of 32 kHz and a bandwidth corresponding to the value of the critical band variable.
FIG. 4 is a diagram showing a relationship between a frequency band and a bandwidth when a sampling frequency is 32 kHz.
FIG. 5 is a flowchart illustrating a processing flow of the audio signal compression method according to the embodiment of the present invention.
FIG. 6 is a schematic configuration diagram of an audio signal recording apparatus for recording an audio signal.
FIG. 7 is a block diagram showing the configuration of an audio signal compression circuit of MPEG-1 Audio (Layer 1 and Layer 2) defined by ISO/IEC 11172-3.
FIG. 8 is a flowchart illustrating an example of processing in a psychoacoustic analysis unit.
[Explanation of symbols]
10 ... Audio signal compression apparatus, 11 ... Effective band estimation unit, 12 ... Critical subband number determination unit, 13 ... Effective band correction unit, 105a ... Psychoacoustic analysis unit

Claims

1. An audio signal compression method for compressing an audio signal by subband coding and psychoacoustic analysis processing using characteristics of human hearing, the method comprising:
estimating, by linear approximation, a band effective for use in the psychoacoustic analysis processing, based on the number of subbands used, which is determined according to the bit rate and sampling frequency of the audio signal;
determining, in accordance with the estimated effective band, the critical number of subbands sufficient for use among the nonlinear subbands used for noise selection in the psychoacoustic analysis processing;
correcting the effective band in accordance with the linear or nonlinear subbands used in the psychoacoustic analysis processing; and
performing the psychoacoustic analysis processing using the determined critical number of subbands and the corrected effective band.

2. The audio signal compression method according to claim 1, wherein the correction is performed so that the effective band matches the band at the determined critical number of subbands.

3. The audio signal compression method according to claim 2, wherein the correction is further performed in accordance with subbands obtained by nonlinearly dividing the 512 bands into 133.

4. The audio signal compression method according to claim 3, wherein the correction further corrects the effective band to a multiple of 16.

5. The audio signal compression method according to claim 1, wherein the psychoacoustic analysis processing conforms to ISO/IEC 11172-3.

6. An audio signal compression apparatus that compresses an audio signal by subband coding and psychoacoustic analysis processing using characteristics of human hearing, the apparatus comprising:
an effective band estimation unit that estimates, by linear approximation, a band effective for use in the psychoacoustic analysis processing, based on the number of subbands used, which is determined according to the bit rate and sampling frequency of the audio signal;
a critical subband number determination unit that determines, in accordance with the estimated effective band, the critical number of subbands sufficient for use among the nonlinear subbands used for noise selection in the psychoacoustic analysis processing; and
an effective band correction unit that corrects the effective band in accordance with the linear or nonlinear subbands used in the psychoacoustic analysis processing.