JP2004252068A

JP2004252068A - Device and method for encoding digital audio signal

Info

Publication number: JP2004252068A
Application number: JP2003041260A
Authority: JP
Inventors: Akira Usami (宇佐見 陽)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-02-19
Filing date: 2003-02-19
Publication date: 2004-09-09

Abstract

PROBLEM TO BE SOLVED: To address the problem that, when spectral components are lost as a result of quantization and encoding in a digital audio signal encoding device, the reproduced signal obtained by decoding the encoded signal sounds more distorted and sound quality deteriorates.

SOLUTION: A bit reservoir control means 16 sets the amount of bits usable for encoding. An intensity stereo band setting means 17A widens the partial frequency region in which intensity stereo encoding is performed when the amount of usable bits for a frame is smaller than a specified threshold ATH (a positive integer).

COPYRIGHT: (C)2004, JPO & NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、複数のチャンネルのデジタルオーディオ信号を伝送又は記憶する際に情報量を減少するデジタルオーディオ信号の符号化装置に関する。
【０００２】
【従来の技術】
近年のデジタルオーディオの分野では、従来のコンパクトディスク（ＣＤ）に比べて１０分の１以下の低いビットレートで、高品位の音質を伝送又は記憶を可能にする様々なデジタルオーディオ信号の符号化技術が多く使われている。
【０００３】
これらのオーディオ信号符号化技術には、ミニディスク（ＭＤ）で採用されているＡＴＲＡＣ（ＡｄａｐｔｉｖｅＴｒａｎｓｆｏｒｍＡｃｏｕｓｔｉｃＣｏｄｉｎｇ）方式や、衛星デジタル放送で採用されている国際標準化機構（ＩＳＯ：ＩｎｔｅｒｎａｔｉｏｎａｌＯｒｇａｎｉｚａｔｉｏｎｆｏｒＳｔａｎｄａｒｄｉｚａｔｉｏｎ）のＭＰＥＧ（ＭｏｔｉｏｎＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）で規格化されているＭＰＥＧ２−ＡＡＣなどの各種方式がある。
【０００４】
これらのデジタルオーディオ信号の符号化技術を用いた符号化装置では、まず入力されるデジタルオーディオ信号の時系列で連続するｎ個（ｎは正整数）のサンプルをまとめた単位をフレームとするとき、このフレームを単位として周波数軸上の成分を表すサブバンド信号あるいはスペクトルに変換する。変換にはＱＭＦ（ＱｕａｄｒａｔｕｒｅＭｉｒｒｏｒＦｉｌｔｅｒ）などの帯域分割フィルタ処理や、ＭＤＣＴ（ＭｏｄｉｆｉｅｄＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍ）などの周波数変換処理といった公知のフィルタバンクあるいは変換プロセスが使用される。
【０００５】
次に、最小可聴しきい値や同時マスキングなどの人間の聴覚心理特性に基づいて聴感上知覚されない、もしくは知覚され難い量子化雑音のレベルを許容して、サブバンド信号あるいはスペクトルを量子化するために量子化ビット数が割当てられる。このようなサブバンド信号あるいはスペクトルは、割当てられた量子化ビット数で量子化され、復号化時の量子化雑音は知覚されない、もしくは知覚され難いレベルに抑えられる。これにより音質を高品位に保ちながら、聴感上はあまり知覚されない部分の情報量を削減し、ビットレートの大幅な低減を実現している。
【０００６】
量子化されたサブバンド信号あるいはスペクトルは、符号化されて所定の語長からなるビット列に変換されるとともに、量子化ビット数などの符号化補助情報が多重化されて、符号化ビット列として伝送又は記憶される。更に、符号化において公知のエントロピー符号化である可変長のハフマン符号を使用して、符号化効率を向上させる場合もある。符号化装置から出力、即ち伝送又は記憶された符号化ビット列は、復号化装置もしくは復号化方法で逆の処理を行って、デジタルオーディオ信号に復元されて出力される。
【０００７】
このように人間の聴覚心理特性を利用することは、全体のビットレートの低減には大きな効果をもたらす。しかしながら、フレーム毎においては人間の聴覚心理特性に対して入力されるデジタルオーディオ信号の様態は異なるために、削減される情報量も異なる。すなわち、知覚され難い量子化雑音のレベルを許容するために必要な量子化ビット数が大きくて削減できる情報量の少ないフレームと、逆に必要な量子化ビット数が小さくて削減できる情報量の多いフレームとが存在する。このため、フレーム毎の符号化ビット列の情報量は一定にならない。
【０００８】
その一方で、符号化ビット列を伝送もしくは記憶する装置及び方法、あるいは符号化ビット列を受信し復号する装置及び方法に対して、前記の両方の信号処理を簡単にするために符号化ビット列のビットレートを一定にすることが多い。この要求を満たすとともに、所定のビットレートに基づくフレーム毎のビットの量に対して、少ないビットの量で符号化される場合には余剰ビットの量を蓄えておく。また所定のビットレートに基づくフレーム毎のビットの量に対して、符号化に必要なビットの量が多い場合には、蓄えている余剰ビットの量を加え、使用可能なビットの量を部分的に増大して音質の劣化を低減する方法がある。このような方法のために、余剰ビットを蓄えるバッファを設ける場合もある。
【０００９】
更に、上記のようなデジタルオーディオ信号の符号化技術において、符号化効率を改善する方法として複数のチャンネルからなるデジタルオーディオ信号の双対となる（例えば左と右のチャンネルの）ステレオ信号に対して、ステレオ符号化と呼ばれる技術が用いられる。最も一般的なステレオ符号化は、センター／サイド（ＭＳ）ステレオ符号化と、インテンシティ（強度）ステレオ符号化である。
【００１０】
ＭＳステレオ符号化は、ステレオ信号の和で求められるセンター信号と、差で求められるサイド信号とを生成し、ステレオ信号の代わりにセンター信号とサイド信号とを符号化する。これにより、ステレオ信号が同符号で類似する場合にはサイド信号の振幅が小さくなり、逆に異符号で類似する場合にはセンター信号の振幅が小さくなる。このような方法によれば、符号化時の量子化ビット数が小さくなるために、ステレオ信号を直接符号化する場合に比べて、符号化効率を改善することができる。（例えば、非特許文献１参照。）
【００１１】
一方、インテンシティ（強度）ステレオ符号化は、ステレオ信号から統合信号を生成し、左／右の分布を示す付加的な強度情報を生成するものである。これにより、統合信号を符号化するので、ステレオ信号を符号化する場合に比べて大幅に符号化効率を改善することができる。以下の説明では、「インテンシティ（強度）ステレオ」を単にインテンシティステレオと呼ぶ。（例えば、非特許文献２参照。）
【００１２】
図７は、インテンシティステレオ符号化と、余剰ビットを蓄えるバッファ（以下の説明では「ビットリザーバー」と呼ぶ）を用いた従来のデジタルオーディオ信号の符号化装置の構成を示すブロック図である。
【００１３】
図７において、一連の符号化プロセスは、時系列上のｎ個（ｎは正整数）のサンプルをまとめたフレームを単位として実行される。時間／周波数変換手段１０及び１１は、時間軸上で連続するサンプルの左／右チャンネルの入力信号ＳＩＮＬとＳＩＮＲを、各々スペクトルＳＰＣＬ１とＳＰＣＲ１とに変換するものである。
【００１４】
変換ブロック長設定手段１２及び１３は、フレーム毎の時系列のサンプルの個数をｍ（ｍは正整数）とするとき、ｍの値をｍ＝ｎ及びｍ＜ｎのいずれか一方となるようサブブロックを単位として変換するものである。以下の説明では、ｍ＝ｎとなるサブブロックを「長ブロック」と呼び、ｍ＜ｎとなるサブブロックを「短ブロック」と呼ぶ。例えばｍはｎ＝ｍ×ｋ（ｋは正整数）式で成り立ち、短ブロックに設定された場合は１フレームの期間にｍ個のサンプルから成るサブブロックをｋ個包含する。変換ブロック長の設定は入力信号の様態に応じて設定される。
【００１５】
このような変換ブロック長の切り替えにおいて、例えば振幅の変化量が著しく増大する入力信号を長ブロックでスペクトルに変換し、量子化及び符号化を行なったときには、復号化時に量子化雑音がフレームの全てのサンプルに重畳されて出現する。この場合、信号振幅が小さいときに量子化雑音が知覚される。
【００１６】
この現象を防ぐために、短ブロックを選択することにより、量子化雑音がブロック内のサンプルのみに重畳される。この場合、継時マスキングと呼ばれる人間の聴覚心理特性により、量子化雑音が知覚され難くなる。但し、時間軸の分解能が高い短ブロックでスペクトルに変換することにより、全周波数領域に亘るスペクトルの総数は少なくなり、すなわち周波数軸の分解能は低くなる。
【００１７】
インテンシティステレオ符号化手段１４は、スペクトルＳＰＣＬ１とＳＰＣＲ１とから統合モノラルスペクトルＳＰＣｉ１を生成するものである。ＭＰＥＧ２−ＡＡＣ方式のインテンシティステレオ符号化では、複数のスペクトルをまとめたバンド毎の左／右チャンネルにおけるスペクトルの２乗の累算で求められるエネルギーの比に基づいて、左／右の分布を示す付加的な強度情報を求める。そして、左／右のスペクトルの和にエネルギーの比の２乗根を乗じて統合スペクトルを求める。（非特許文献３を参照。）
【００１８】
インテンシティステレオ符号化は、所定の部分周波数領域のスペクトルを包含するバンドに対して行なわれる。左チャンネルのスペクトルＳＰＣＬ１のインテンシティステレオ符号化を行なうバンドのスペクトルは、統合スペクトルＳＰＣｉ１に置換されてスペクトルＳＰＣＬ２として出力される。また、右チャンネルのスペクトルＳＰＣＲ１のインテンシティステレオ符号化を行なうバンドのスペクトルは、ゼロに置換されてスペクトルＳＰＣＲ２として出力される。これにより、インテンシティステレオ符号化によりゼロに置換された右チャンネルのスペクトルは、削除され伝送又は記憶されない。このために、情報量の大幅な削減が期待される。
【００１９】
ＭＰＥＧ２−ＡＡＣ方式の規格書では、標準的な信号に対してはインテンシティステレオ符号化を行なう部分周波数領域の下限周波数を６ｋＨｚとするのが適当であると記載されている。但し、インテンシティステレオ符号化が行なわれるためには、全周波数領域に亘る左／右チャンネルのスペクトルが同一の数で構成されていなければならない。すなわち、時間／周波数変換を行なう際の左／右チャンネルの変換ブロック長は同一でなければならない。左／右チャンネルの変換ブロック長が異なる場合は、インテンシティステレオ符号化は行なわれない。
【００２０】
更に、付加的な強度情報を複数のスペクトルをまとめたバンド毎に算出し、これを符号化補助情報に包含して伝送又は記憶する場合を考える。短ブロックで時間／周波数変換が行われる場合においては、複数のサブブロックをグループ化して共通の付加的な強度情報として情報量の削減をして符号化効率の向上を図る。この場合には、左／右チャンネルにおけるサブブロックのグループ化の構成がやはり同一でなければならない。
【００２１】
量子化及び符号化手段１５は、左／右チャンネルのスペクトルＳＰＣＬ２とＳＰＣＲ２とを人間の聴覚心理特性と、後述のビットリザーバー制御手段１６により設定されるフレーム毎の使用可能なビットの量に基づいて量子化と符号化を行ない、ビット列に変換する。そして量子化及び符号化手段１５は、符号化補助情報を多重化して符号化ビット列を生成する。
【００２２】
ビットリザーバー制御手段１６は、所定のビットレートに基づくフレーム毎のビットの量に対して少ないビットの量で符号化される場合には余剰ビットの量を蓄えておき、所定のビットレートに基づくフレーム毎のビットの量に対して符号化に必要なビットの量が多い場合には蓄えている余剰ビットの量を加え、使用可能なビットの量を部分的に増大して音質の劣化を低減するための余剰ビットを蓄える。ビットリザーバー制御手段１６はビットリザーバーを備え、全体の符号化ビット列が所定のビットレートになるようにビットリザーバーに蓄えるビットの量を制御する。
【００２３】
図８は、インテンシティステレオ符号化による符号化効率の改善を表す説明図である。図８の縦軸はスペクトルの振幅を示し、横軸はスペクトルの周波数を示す。また、説明を簡単にするために全周波数に亘るスペクトルの総数は２４本とし、隣接するスペクトルを４本ずつまとめて６つのバンドから構成されるとしている。インテンシティステレオ符号化は、横軸に示す周波数ｆ０より高い周波数の領域のスペクトルに対して行なうものとする。
【００２４】
図８において、（ａ１）は左チャンネルのスペクトルＳＰＣＬ１を示し、（ａ２）は右チャンネルのスペクトルＳＰＣＲ１を示す。また図９の（ａ３）は（ａ１）の左チャンネルのスペクトルＳＰＣＬ１と（ａ２）の右チャンネルのスペクトルＳＰＣＲ１とを量子化し、符号化した場合の情報量の総和を示す模式図である。
【００２５】
図９の（ａ３）において、ＤＬ１は（ａ１）の左チャンネルにおけるスペクトルＳＰＣＬ１中の周波数ｆ０より低い周波数領域のスペクトルの情報量を示す。ＤＬ２は図８の（ａ１）の左チャンネルにおけるスペクトルＳＰＣＬ１中の周波数ｆ０より高い周波数領域のスペクトルの情報量を示す。ＤＲ１は図８の（ａ２）の右チャンネルＳＰＣＲ１のスペクトル中の周波数ｆ０より低い周波数領域のスペクトルの情報量を示す。ＤＲ２は図８の（ａ２）の右チャンネルにおけるスペクトルＳＰＣＲ１中の周波数ｆ０より高い周波数領域のスペクトルの情報量を示す。これらの情報量は各スペクトルを量子化及び符号化した場合の情報量である。情報量の総和Ｔ１は、図９の（ａ３）に示すようにＴ１＝ＤＬ１＋ＤＬ２＋ＤＲ１＋ＤＲ２で表わされる。
【００２６】
また、図８の（ｂ１）はインテンシティステレオ符号化を行なった場合のスペクトルＳＰＣＬ２を示す。（ｂ２）は右チャンネルのスペクトルＳＰＣＲ２を示す。インテンシティステレオ符号化を行なうことにより、（ｂ１）の左チャンネルにおけるスペクトルＳＰＣＬ２中の周波数ｆ０より高い周波数のインテンシティステレオ符号化が施されるスペクトルは、統合スペクトルＳＰＣｉ１に置換されている。
【００２７】
また、図８の（ｂ２）に示すように、右チャンネルＳＰＣＲ２の周波数ｆ０より高い周波数のインテンシティステレオ符号化が施されるスペクトルはゼロに置換されている。また、図９の（ｂ３）は図８の（ｂ１）の左チャンネルのスペクトルＳＰＣＬ２と、図８の（ｂ２）の右チャンネルＳＰＣＲ２のスペクトルを量子化し、符号化した場合の情報量の総和を示す。
【００２８】
図９の（ｂ３）において、ＤＬ１は左チャンネルにおけるスペクトルＳＰＣＬ２中の周波数ｆ０より低い周波数領域のスペクトルの情報量を示す。Ｄｉ１は（ｂ１）の左チャンネルにおけるスペクトルＳＰＣＬ２中の周波数ｆ０より高い周波数の領域の統合スペクトルＳＰＣｉ１に置換されたスペクトルの情報量を示す。ＤＲ１は（ｂ２）の右チャンネルにおけるスペクトルＳＰＣＲ２中の周波数ｆ０より低い周波数領域のスペクトルの情報量を示す。これらの情報量は各スペクトルを量子化及び符号化した場合の情報量である。情報量の総和Ｔ２は、図９の（ｂ３）に示すようにＴ２＝ＤＬ１＋ＤＲ１＋Ｄｉ１で表わされる。
【００２９】
図８の（ｂ２）に示すように、右チャンネルの周波数ｆ０より高い周波数領域においては、インテンシティステレオ符号化が施されるスペクトルはゼロに置換される。このため、伝送又は記憶されないので情報量として加算されない。これにより、インテンシティステレオ符号化を行なう場合の情報量の総和Ｔ２は、インテンシティステレオ符号化を行なわない場合の情報量の総和Ｔ１に比べて少なくなり、符号化効率を改善することができる。但し、この例では図９の（ｂ３）に示すように、インテンシティステレオを行なわない周波数ｆ０より低い周波数の領域では、この領域のスペクトルを量子化及び符号化した場合の情報量ＤＬ１が、（ａ３）の同じ符号で表されるＤＬ１と同じ情報量となるようにしている。
【００３０】
また、図９の（ｂ３）に示すインテンシティステレオを行なわない周波数ｆ０より低い周波数の領域では、この領域のスペクトルを量子化及び符号化した場合の情報量ＤＲ１が、（ａ３）の同じ符号で表される情報量ＤＲ１と同じになるようにしている。また、図８の（ａ１）に示す左チャンネルの周波数ｆ０より高い周波数領域のスペクトルを量子化及び符号化による情報量ＤＬ２と、（ｂ１）に示すインテンシティステレオ符号化を行なう周波数ｆ０より高い周波数の統合スペクトルＳＰＣｉ１を量子化及び符号化した場合の情報量Ｄｉ１とが同じになるようにしている。
【００３１】
図１０は、図７に示す符号化装置でインテンシティステレオ符号化を用いて生成された符号化ビット列を復号する復号化装置の構成を示すブロック図である。図１０において、復号化及び逆量子化手段６１は入力される符号化ビット列に包含される符号化補助情報を分解するとともに、復号化及び逆量子化を行って左チャンネルのスペクトルＳＰＣＬ３と、右チャンネルのスペクトルＳＰＣＲ３を出力するものである。復号化及び逆量子化手段６１から出力され、左チャンネルにおけるスペクトルＳＰＣＬ３の中のインテンシティステレオ符号化が施されている部分周波数領域のスペクトルは、統合化スペクトルＳＰＣｉ２からなる。
【００３２】
インテンシティステレオ処理手段６２は、左チャンネルにおけるスペクトルＳＰＣＬ３の中の統合スペクトルＳＰＣｉ２と、分解された符号化補助情報の中の左／右の分布を示す付加的な強度情報とを用いて、右チャンネルにおけるスペクトルＳＰＣＲ３の中のインテンシティステレオ符号化を行なう周波数ｆ０より高い部分の伝送又は記憶されない周波数領域のスペクトルを生成する。またインテンシティステレオ処理手段６２は、スペクトルＳＰＣＲ３のインテンシティステレオ符号化を行う周波数ｆ０より低い周波数のスペクトルと周波数ｆ０より高い周波数のスペクトルとを合成し、スペクトルＳＰＣＲ４を出力する。
【００３３】
また、インテンシティステレオ処理手段６２は、復号化及び逆量子化手段６１から出力される左チャンネルのスペクトルＳＰＣＬ３をスペクトルＳＰＣＬ４として出力する。周波数／時間変換手段６３と６４は、周波数軸上の左チャンネルのスペクトルＳＰＣＬ４と、右チャンネルのスペクトルＳＰＣＲ４とをそれぞれ時系列のデジタルオーディオ信号のサンプルＳＯＵＴＬ及びＳＯＵＴＲに変換して出力する。
【００３４】
図１１は、図１０に示す復号化装置における各手段のスペクトルを示す説明図である。図１１では、縦軸はスペクトルの振幅を示し、横軸はスペクトルの周波数を示す。また、説明を簡単にするために、全周波数に亘るスペクトルの総数は２４本とし、隣接するスペクトルを４本ずつまとめて６つのバンドから構成されるとしている。インテンシティステレオ符号化は周波数ｆ０より高い周波数領域のスペクトルに対して行なうものとする。
【００３５】
図１１において、（ｃ１）は復号化及び逆量子化手段６１から出力される統合スペクトルＳＰＣｉ２を包含する左チャンネルのスペクトルＳＰＣＬ３を示す。（ｃ２）は復号化及び逆量子化手段６１から出力される右チャンネルのスペクトルＳＰＣＲ３を示す。（ｄ１）はインテンシティステレオ処理手段６２から出力される左チャンネルのスペクトルＳＰＣＬ４であり、（ｃ１）のＳＰＣＬ３と同じである。（ｄ２）はインテンシティステレオ処理手段６２から出力される右チャンネルのスペクトルＳＰＣＲ４を示す。（ｄ２）は（ｃ２）のＳＰＣＲ３に対して（ｃ１）の左チャンネルのスペクトルＳＰＣＬ３に包含される統合スペクトルＳＰＣｉ２を変換すべく、左／右の分布を示す付加的な強度情報を用いて擬似スペクトルを生成した後に結合して出力されたものである。
【００３６】
【非特許文献１】
Ｊ．Ｄ．Ｊｏｈｎｓｔｏｎ著「広帯域ステレオ信号の知覚変換符号化（ＰｅｒｃｅｐｔｕａｌＴｒａｎｓｆｏｒｍＣｏｄｉｎｇｏｆＷｉｄｅｂａｎｄＳｔｅｒｅｏＳｉｇｎａｌｓ）」ＩＥＥＥ／ＩＣＡＳＳＰ、１９８９年、ｐ．１９９３−１９９３
【非特許文献２】
Ｒ．Ｖ．Ｒ．Ｖａｎｄｅｒｗａａｌ著「ステレオフォニックデジタル信号のサブバンド符号化（ＳｕｂｂａｎｄＣｏｄｉｎｇｏｆｓｔｅｒｅｏｐｈｏｎｉｃｄｉｇｉｔａｌｓｉｇｎａｌｓ）」ＩＥＥＥ／ＩＣＡＳＳＰ、１９９１年、ｐ．３６０１−３６０４
【非特許文献３】
「ＡｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇＡＮＮＥＸＢＩｎｆｏｒｍａｔｉｖｅＰａｒｔ」ＩＳＯ／ＩＥＣ、ＩＳ−１３８１８−７
【００３７】
【発明が解決しようとする課題】
しかしながら、上記の従来の技術におけるデジタルオーディオ信号の符号化装置及び方法では、符号化に必要なビットの量が多いフレームや、短ブロックの変換ブロック長で時間／周波数変換を行うフレームが短い期間の間に集中する場合において、ビットリザーバーに蓄えている余剰ビットの量が著しく低減し、符号化に使用可能なビットの量が低減する。このような状態では、所定のビットレートを保ちながら使用可能なビットの量を増大させることができなくなる。このような場合、量子化及び符号化によりスペクトルが欠落して、符号化信号を復号化したときの再生信号の歪み感が増大して音質が劣化するという問題があった。
【００３８】
本発明は、このような従来の問題点に鑑みてなされたものであって、符号化に必要なビットの量が多いフレームや、短ブロックの変換ブロック長で時間／周波数変換を行うフレームが短い期間の間に集中し、符号化に使用可能なビットの量が低減する場合に、量子化及び符号化によりスペクトルが欠落することを阻止し、符号化信号を復号化したときの再生信号の歪み感を無くし、音質が劣化しないデジタルオーディオ信号の符号化装置及び方法を実現することを目的とする。
【００３９】
【課題を解決するための手段】
本願の請求項１の発明は、Ｎチャンネル（Ｎは２以上の正整数）のデジタルオーディオ信号の時系列のｎ個（ｎは正整数）のサンプルをまとめたフレーム毎に周波数軸上のスペクトルに変換し、スペクトルを量子化した後にエントロピー符号化により符号化し、符号化情報を多重化して符号化ビット列を生成して出力する際に、スペクトルにインテンシティステレオ符号化を行なって情報量を減少するデジタルオーディオ信号の符号化装置であって、各チャンネルの時系列のサンプルを周波数領域のスペクトルに変換する時間／周波数変換手段と、前記時間／周波数変換手段で時系列のサンプルを、サブブロックを単位として周波数領域のスペクトルに変換するとき、サンプルの個数ｍ（ｍは正整数）をｍ＝ｎ及びｍ＜ｎのいずれか一方となるよう設定する変換ブロック長設定手段と、Ｎチャンネルのスペクトルの部分周波数領域に対してインテンシティステレオ符号化を行なうインテンシティステレオ処理手段と、前記インテンシティステレオ処理手段から出力されたスペクトルを量子化し、フレーム毎の使用可能なビット量に基づいて符号化し、符号化補助情報を多重して符号化ビット列を生成する量子化及び符号化手段と、フレーム毎の余剰ビット量をビットリザーバーに蓄えると共に、前記ビットリザーバーに蓄えられているビット量と出力ビットレートとを用いてフレーム毎の使用可能なビット量を設定するビットリザーバー制御手段と、フレーム毎の使用可能なビット量が所定のしきい値ＡＴＨ（ＡＴＨは正整数）より小さい場合に、インテンシティステレオ符号化を行なう部分周波数領域を増大させるインテンシティステレオ帯域設定手段と、を具備することを特徴とするものである。
【００４０】
本願の請求項２の発明は、請求項１のデジタルオーディオ信号の符号化装置において、前記インテンシティステレオ帯域設定手段は、時系列の連続するＬ個（Ｌは２以上の正整数）のフレームの使用可能なビット量が所定のしきい値ＡＴＨ（ＡＴＨは正整数）より小さいフレームを帯域制御対象フレームとするとき、前記帯域制御対象フレームの数が所定のしきい値ＢＴＨ（ＢＴＨはＬ以下の正整数）より大きい場合に、インテンシティステレオ符号化を行なう部分周波数領域を増大させることを特徴とするものである。
【００４１】
本願の請求項３の発明は、請求項１のデジタルオーディオ信号の符号化装置において、前記インテンシティステレオ帯域設定手段は、時系列の連続するＬ個（Ｌは２以上の正整数）のフレームの前記変換ブロック長設定手段の設定する変換ブロック長を記憶保持し、記憶保持するＬ個の変換ブロック長のうちｍ＜ｎのサブブロックで変換するフレームの数が所定のしきい値ＣＴＨ（ＣＴＨはＬ以下の正整数）より大きい場合に、インテンシティステレオ符号化を行なう部分周波数領域を増大させることを特徴とするものである。
【００４２】
本願の請求項４の発明は、Ｎチャンネル（Ｎは２以上の正整数）のデジタルオーディオ信号の時系列のｎ個（ｎは正整数）のサンプルをまとめたフレーム毎に周波数軸上のスペクトルに変換し、スペクトルを量子化した後にエントロピー符号化により符号化し、符号化情報を多重化して符号化ビット列を生成して出力する際に、スペクトルにインテンシティステレオ符号化を行なって情報量を減少するデジタルオーディオ信号の符号化方法であって、各チャンネルの時系列のサンプルを、サブブロックを単位として周波数領域のスペクトルに変換するとき、変換するサンプルの個数ｍ（ｍは正整数）をｍ＝ｎ及びｍ＜ｎのいずれか一方となるよう設定し、Ｎチャンネルのスペクトルの部分周波数領域にインテンシティステレオ符号化を行ない、前記インテンシティステレオ処理により出力されたスペクトルを量子化し、フレーム毎の使用可能なビット量に基づいて符号化すると共に、符号化補助情報を多重して符号化ビット列を生成し、フレーム毎の余剰ビット量をビットリザーバーに蓄えると共に、前記ビットリザーバーに蓄えられているビット量と出力ビットレートとを用いてフレーム毎の使用可能なビット量を設定し、フレーム毎の使用可能なビット量が所定のしきい値ＡＴＨ（ＡＴＨは正整数）より小さい場合に、インテンシティステレオ符号化を行なう部分周波数領域を増大させることを特徴とするものである。
【００４３】
本願の請求項５の発明は、請求項４のデジタルオーディオ信号の符号化方法において、前記インテンシティステレオ符号化における帯域設定は、時系列の連続するＬ個（Ｌは２以上の正整数）のフレームの使用可能なビット量が所定のしきい値ＡＴＨ（ＡＴＨは正整数）より小さいフレームを帯域制御対象フレームとするとき、前記帯域制御対象フレームの数が所定のしきい値ＢＴＨ（ＢＴＨはＬ以下の正整数）より大きい場合に、インテンシティステレオ符号化を行なう部分周波数領域を増大させることを特徴とするものである。
【００４４】
本願の請求項６の発明は、請求項４のデジタルオーディオ信号の符号化方法において、前記インテンシティステレオ符号化における帯域設定は、時系列の連続するＬ個（Ｌは２以上の正整数）のフレームにおける変換ブロック長を記憶保持し、記憶保持するＬ個の変換ブロック長のうちｍ＜ｎのサブブロックで変換するフレームの数が、所定のしきい値ＣＴＨ（ＣＴＨはＬ以下の正整数）より大きい場合に、インテンシティステレオ符号化を行なう部分周波数領域を増大させることを特徴とするものである。
【００４５】
【発明の実施の形態】
本発明の各実施の形態におけるデジタルオーディオ信号の符号化装置及び方法について、図面を参照しつつ説明する。以下の説明において一連の符号化プロセスは、時系列のデジタルオーディオ信号に対し、ｎ個（ｎは正整数）のサンプルをまとめたフレームを単位として実行される。
【００４６】
（実施の形態１）
図１は、本発明の実施の形態１におけるデジタルオーディオ信号の符号化装置の構成を示すブロック図である。以下に、各構成ブロックとその動作について説明する。時間／周波数変換手段１０及び１１は、時系列のｎ個のサンプルで構成される左チャンネルの入力信号ＳＩＮＬと右チャンネルの入力信号ＳＩＮＲとを、各々左チャンネルのスペクトルＳＰＣＬ１と右チャンネルのスペクトルＳＰＣＲ１とに変換する。時間／周波数変換手段１０及び１１は、左／右チャンネルにおけるフレーム毎のｎ個のサンプルを、後述する変換ブロック長設定手段１２及び１３から通知されるフラグＦＬ又はＦＲに従って、短ブロック又は長ブロックのサブブロック毎にスペクトルＳＰＣＬ１とＳＰＣＲ１に変換して出力する。
【００４７】
変換ブロック長設定手段１２及び１３は、フレーム毎の時系列のサンプルを個数ｍ（ｍは正整数）の値をｍ＝ｎ及びｍ＜ｎのいずれか一方となるようサブブロックを単位として変換する。以下の説明では、ｍ＝ｎとなるサブブロックを「長ブロック」と呼び、ｍ＜ｎとなるサブブロックを「短ブロック」と呼ぶ。例えばｍはｎ＝ｍ×ｋ（ｋは正整数）式で成り立ち、短ブロックに設定された場合は１フレームの期間にｍ個のサンプルから成るサブブロックをｋ個包含する。変換ブロック長の設定は入力信号の様態に応じて設定される。このような変換ブロック長の切り替えについては、従来のデジタルオーディオ信号の符号化装置の構成を示す図７の変換ブロック長設定手段１２及び１３と同じである。
【００４８】
変換ブロック長設定手段１２及び１３は、入力されるデジタルオーディオ信号の時系列の左／右チャンネルにおけるフレーム毎のｎ個のサンプルで構成される左チャンネルの入力信号ＳＩＮＬと、右チャンネルの入力信号ＳＩＮＲとを解析する。そして、振幅の変化量が大きい信号（例えば、カスタネットなどの打楽器による信号）は短ブロックで時間／周波数変換するように、あるいはそれ以外の振幅の変化量が小さい信号は長ブロックで時間／周波数変換するように、時間／周波数変換手段１０及び１１に対して変換ブロック長を設定するフラグＦＬ又はＦＲを出力する。
【００４９】
インテンシティステレオ処理手段１４は、左チャンネルのスペクトルＳＰＣＬ１と右チャンネルのスペクトルＳＰＣＲ１とから統合モノラルスペクトルＳＰＣｉ１を生成する。インテンシティステレオ処理手段を実現する方法の例は、従来のデジタルオーディオ信号の符号化装置を示す図７のインテンシティステレオ処理手段１４と同じである。左チャンネルのスペクトルＳＰＣＬ１と右チャンネルのスペクトルＳＰＣＲ１のうち、設定され得る周波数ｆｉより高い周波数の領域のスペクトルに対して、インテンシティステレオ符号化が行なわれる。
【００５０】
また、左チャンネルにおけるスペクトルＳＰＣＬ１中の周波数ｆｉより高い周波数の領域は、統合スペクトルＳＰＣｉ１に置換されてスペクトルＳＰＣＬ２として出力される。また、右チャンネルにおけるスペクトルＳＰＣＲ１中の周波数ｆｉより高い周波数の領域のスペクトルは、ゼロに置換されてスペクトルＳＰＣＲ２として出力される。更に、インテンシティステレオ符号化が行われる部分周波数領域の右／左の分布を示す付加的な強度情報も算出される。但し、インテンシティステレオが行なわれるためには、従来のデジタルオーディオ信号の符号化装置を示す図７のインテンシティステレオ処理手段１４と同様の条件でなければならない。
【００５１】
また、上記のインテンシティステレオ処理手段１４の動作の説明は、設定され得る周波数ｆｉより高い周波数のスペクトルに対してインテンシティステレオ符号化を行なう例を対象としている。しかし、インテンシティステレオ符号化を行なう部分周波数領域の設定に別の方法を用いることも可能である。
【００５２】
量子化及び符号化手段１５は、インテンシティステレオ処理手段１４から出力される左チャンネルのスペクトルＳＰＣＬ２と右チャンネルのスペクトルＳＰＣＲ２とに対して、人間の聴覚心理特性と、後述のビットリザーバー制御手段１６により設定されるフレーム毎の使用可能なビットの量とに基づいて量子化と符号化を行なう。そして量子化及び符号化手段１５は量子化と符号化の結果をビット列に変換するとともに、符号化補助情報を多重化して符号化ビット列を生成する。
【００５３】
ビットリザーバー制御手段１６は、所定のビットレートに基づくフレーム毎のビットの量に対して、少ないビットの量で符号化される場合には余剰ビットの量ＲＢＩＴをビットリザーバーに蓄えておく。またビットリザーバー制御手段１６は、所定のビットレートに基づくフレーム毎のビットの量に対して、符号化に必要なビットの量が多い場合にはビットリザーバーに蓄えている余剰ビットの量を加え、使用可能なビットの量ＡＢＩＴを部分的に増大して音質の劣化を低減する。
こうしてビットリザーバー制御手段１６は、全体の符号化ビット列が所定のビットレートになるように、ビットリザーバーに蓄えるビットの量を制御する。
【００５４】
インテンシティステレオ帯域設定手段１７Ａは、ビットリザーバー制御手段１６の出力するフレーム毎の使用可能ビットの量ＡＢＩＴを用いて、インテンシティステレオ符号化を行なう部分周波数領域を設定する。ビットリザーバー制御手段１６から出力されるフレーム毎の使用可能なビットの量ＡＢＩＴが所定のしきい値ＡＴＨ（ＡＴＨは正整数）より大きい場合には、インテンシティステレオ帯域設定手段１７Ａはインテンシティステレオ符号化を行なう部分周波数領域の下限の周波数ｆｉを所定のｆ０に設定する。逆にフレーム毎の使用可能なビットの量ＡＢＩＴが所定のしきい値ＡＴＨ（ＡＴＨは正整数）より小さいフレームを帯域制御対象フレームとすると、帯域制御対象フレームの場合には、インテンシティステレオ帯域設定手段１７Ａはインテンシティステレオ符号化を行なう部分周波数領域の下限の周波数ｆｉを、インテンシティステレオ符号化を行う部分周波数範囲を増大させるべく、所定のｆ１（ｆ１＜ｆ０）に設定する。
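The per-frame rule of the intensity stereo band setting means 17A described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `select_lower_limit`, the example bit amounts, and the value of f1 are hypothetical, and the case ABIT = ATH (which the text does not cover) is treated here as the normal case.

```python
def select_lower_limit(abit, ath, f0, f1):
    """Return the lower limit frequency fi of the intensity stereo region.

    abit: usable bits for the current frame (ABIT in the text)
    ath:  threshold ATH
    f0:   normal lower limit; f1: lowered limit, with f1 < f0
    """
    if f1 >= f0:
        raise ValueError("f1 must be lower than f0")
    # Assumption: abit == ath keeps the normal setting f0.
    return f1 if abit < ath else f0


# Hypothetical per-frame usable bits; frames 3 and 5 fall below ATH,
# loosely mirroring the pattern described for FIG. 4.
abits = [1500, 1200, 700, 1100, 600]
limits = [select_lower_limit(a, ath=1000, f0=6000, f1=4000) for a in abits]
```

Lowering fi from f0 to f1 widens the region handled by intensity stereo coding, which is how the bit demand of a tight frame is reduced.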
【００５５】
インテンシティステレオ帯域設定手段１７Ａは、次のような方法で帯域を設定してもよい。すなわち、インテンシティステレオ帯域設定手段１７Ａは、時系列の連続するＬ個（Ｌは２以上の正整数）のフレーム毎にビットリザーバー制御手段１６から出力される使用可能なビットの量をＡＢＩＴとすると、このＡＢＩＴを記憶保持するバッファを備えるものとする。そしてこのビット量ＡＢＩＴが所定のしきい値ＡＴＨより小さいフレームの数が、所定のしきい値ＢＴＨ（ＢＴＨはＬ以下の正整数）より小さい場合には、インテンシティステレオ符号化を行なう部分周波数領域の下限の周波数ｆｉを所定のｆ０に設定する。またビット量ＡＢＩＴが所定のしきい値ＡＴＨより小さいフレームの数が、所定のしきい値ＢＴＨより大きい場合には、インテンシティステレオ符号化を行なう部分周波数領域の下限の周波数ｆｉを所定のｆ１（ｆ１＜ｆ０）に設定する。
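The windowed variant described above, which counts band control target frames over the last L frames and compares the count with BTH, might look like the following. The class name `WindowedBandSetter`, the numeric values, and the behaviour when the count equals BTH (kept at f0 here) are assumptions made for illustration.

```python
from collections import deque


class WindowedBandSetter:
    """Keep the last L values of ABIT and choose fi from how many are below ATH."""

    def __init__(self, window, ath, bth, f0, f1):
        self.history = deque(maxlen=window)  # buffer holding ABIT for L frames
        self.ath, self.bth = ath, bth
        self.f0, self.f1 = f0, f1

    def update(self, abit):
        """Record ABIT for the current frame and return fi for that frame."""
        self.history.append(abit)
        low_frames = sum(1 for a in self.history if a < self.ath)
        # Assumption: a count equal to BTH keeps the normal setting f0.
        return self.f1 if low_frames > self.bth else self.f0


setter = WindowedBandSetter(window=3, ath=1000, bth=1, f0=6000, f1=4000)
limits = [setter.update(a) for a in [1500, 700, 600, 1200, 1300]]
```

Because the decision depends on a short history rather than a single frame, fi stays at f1 for one more frame after the bit shortage ends in this example.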
【００５６】
上記のインテンシティステレオ帯域設定手段１７Ａの動作の説明では、設定され得る周波数ｆｉより高い周波数のスペクトルに対してインテンシティステレオ符号化を行なう例を対象とした。しかし、インテンシティステレオ符号化を行なう部分周波数領域を増大させるには別の方法を用いることも可能である。また、上記のインテンシティステレオ帯域設定手段１７Ａは、設定され得る周波数ｆｉをｆ０又はｆ１の２つの設定のどちらかに切り替えるようにした。しかし、２つ以上の複数の設定値を用意して、この中のいずれかの設定値を選択するように切り替える方法を用いることも可能である。
【００５７】
図２は、図１のデジタルオーディオ信号の符号化装置において、部分周波数領域のスペクトルに対してインテンシティステレオ符号化を行なった場合の符号化効率の改善を表す説明図である。詳しくはビットリザーバー制御手段１６から出力されるフレーム毎の使用可能なビットの量ＡＢＩＴと所定のしきい値ＡＴＨとの大小関係に基づき、インテンシティステレオ帯域設定手段１７Ａで設定される所定の周波数をｆ０あるいはｆ１（ｆ１＜ｆ０）に設定する例を示している。また図３は左チャンネルのスペクトルＳＰＣＬ１と右チャンネルのスペクトルＳＰＣＲ１とを量子化し、符号化した場合の情報量の総和を示す模式図である。
【００５８】
図２において、縦軸はスペクトルの振幅を示し、横軸はスペクトルの周波数を示す。また、説明を簡単にするために全周波数に亘るスペクトルの総数は２４本とし、隣接するスペクトルを４本ずつまとめ、６つのバンドから構成されるものとする。図２の（ｂ１）及び（ｂ２）と図３の（ｂ３）は、インテンシティステレオ符号化による符号化効率の改善を表す図８及び図９の説明図と内容が同じである。
【００５９】
また、図２において（ｅ１）は、インテンシティステレオ帯域設定手段１７Ａで設定される所定の周波数ｆ１（ｆ１＜ｆ０）より高い部分周波数領域のスペクトルに対して、インテンシティステレオ符号化を行なった場合のスペクトルＳＰＣＬ２を示し、（ｅ２）は右チャンネルのスペクトルＳＰＣＲ２を示す。本実施の形態のインテンシティステレオ符号化を行なうことにより、（ｅ１）の左チャンネルのスペクトルＳＰＣＬ２において、周波数ｆ１より高い周波数のインテンシティステレオ符号化を施されるスペクトルが統合スペクトルＳＰＣｉ１に置換されている。
【００６０】
また、（ｅ２）の右チャンネルにおけるスペクトルＳＰＣＲ２において、周波数ｆ１より高い周波数のインテンシティステレオ符号化を施されるスペクトルがゼロに置換されている。また図３の（ｅ３）は、図２における（ｅ１）の左チャンネルのスペクトルと（ｅ２）の右チャンネルのスペクトルとを量子化して符号化した場合の情報量の総和を示す。
【００６１】
図３の（ｅ３）において、ＤＬ３は左チャンネルにおけるスペクトルＳＰＣＬ２の中の周波数ｆ１より低い周波数領域のスペクトルの情報量である。Ｄｉ２は（ｅ１）の左チャンネルにおけるスペクトルＳＰＣＬ２の中の周波数ｆ１より高い周波数領域の統合スペクトルに置換されたスペクトルの情報量である。ＤＲ３は（ｅ２）の右チャンネルにおけるスペクトルＳＰＣＲ２の中の周波数ｆ１より低い周波数領域のスペクトルの情報量である。これらの情報量は各スペクトルが量子化及び符号化された場合の値である。これらの情報量の総和Ｔ３は、Ｔ３＝ＤＬ３＋ＤＲ３＋Ｄｉ２で示される。
【００６２】
図２の（ｅ２）に示すように、右チャンネルのスペクトルＳＰＣＲ２の周波数ｆ１より高い周波数領域において、インテンシティステレオ符号化を施されるスペクトルはゼロに置換され、伝送又は記憶されないので、情報量として加算されない。これにより、インテンシティステレオ符号化を行なう場合の情報量の総和Ｔ３は、図９に示すインテンシティステレオ符号化を行なわない場合の情報量の総和Ｔ１に比べて少なくすることができ、符号化効率を更に改善することができる。
【００６３】
また、周波数ｆ０より高い部分周波数領域のスペクトルに対してインテンシティステレオ符号化を行なう場合と比較して、周波数ｆ１より高い部分周波数領域のスペクトルにインテンシティステレオ符号化を行なうと、右チャンネルのインテンシティステレオ符号化によりゼロに置換されるスペクトルの数が増大する。このために、右チャンネルのスペクトルの情報量ＤＲ３が情報量ＤＲ１より低減される。これにより、周波数ｆ１より高い部分周波数領域のスペクトルに対して、インテンシティステレオ符号化を行なう場合の情報量の総和Ｔ３は、周波数ｆ０より高い部分周波数領域のスペクトルにインテンシティステレオ符号化を行なう場合の情報量の総和Ｔ２に比べて少なくなる。このため、符号化効率を改善することができる。
【００６４】
図４は、本発明の実施の形態１において、フレーム毎のインテンシティステレオ符号化を行う部分周波数帯域の設定範囲の変化を示すタイムチャートである。図４の横軸は時系列に連続するフレーム番号を示し、上段のタイムチャートの縦軸は、ビットリザーバー制御手段１６から出力されるフレーム毎の使用可能なビットの量ＡＢＩＴを表す。ＡＴＨは、インテンシティステレオ帯域設定手段１７Ａにより、インテンシティステレオ符号化を行う部分周波数帯域の下限の周波数をｆ０もしくはｆ１に切り替える基準となる所定のしきい値である。下段のタイムチャートの縦軸は周波数を表す。
【００６５】
図４に示すように、インテンシティステレオ帯域設定手段１７Ａは、ビットリザーバー制御手段１６から出力される使用可能なビットの量ＡＢＩＴの値が所定のしきい値ＡＴＨより小さくなる第３フレーム及び第５フレームにおいて、インテンシティステレオ符号化を行う部分周波数帯域の下限の周波数をｆ１に設定するようにしている。
【００６６】
これにより、フレーム毎の使用可能なビットの量ＡＢＩＴが所定のしきい値ＡＴＨ（ＡＴＨは正整数）より小さい場合で、使用可能なビットの量ＡＢＩＴが著しく減少して量子化及び符号化でスペクトルが欠落する可能性があるとき、インテンシティステレオ符号化を行なう部分周波数領域の範囲を増大させる。すなわちインテンシティステレオを行なう部分周波数領域の下限の周波数を、ｆ０からｆ１（ｆ１＜ｆ０）に移すことで、符号化効率が改善され少ないビットの量で量子化及び符号化を行うことができる。この場合、量子化及び符号化でスペクトルが欠落する可能性を低減でき、音質の劣化を阻止することができる。
【００６７】
尚、上記の実施の形態１で述べられた一連の符号化プロセスは、ソフトウェアプログラム言語によってコンピュータ又はデジタルシグナルプロセッサ（ＤＳＰ）上で実現することも可能である。
【００６８】
（実施の形態２）
次に本発明の実施の形態２におけるデジタルオーディオ信号の符号化装置について説明する。図５は実施の形態２におけるデジタルオーディオ信号の符号化装置の構成を示すブロック図である。なお、実施の形態１と同一のブロックは同一の符号を付け、各ブロックの動作について説明する。
【００６９】
図５において、時間／周波数変換手段１０及び１１、変換ブロック長設定手段１２及び１３、インテンシティステレオ符号化手段１４、量子化及び符号化手段１５、及びビットリザーバー制御手段１６は、図１のデジタルオーディオ信号の符号化装置における各手段と機能が同じである。
【００７０】
インテンシティステレオ帯域設定手段１７Ｂは、時系列の連続するＬ個（Ｌは２以上の正整数）のフレーム毎の変換ブロック長設定手段１２及び１３から出力される変換ブロック長を設定するフラグＦＬ又はＦＲを記憶保持し、記憶保持したＬ個のフレーム毎のフラグＦＬ又はＦＲに基づいて、インテンシティステレオ符号化を行なう部分周波数領域を設定する。インテンシティステレオ帯域設定手段１７Ｂは、変換ブロック長設定手段１２及び１３から出力されるＬ個のフレーム毎の変換ブロック長を設定するフラグＦＬ又はＦＲを記憶保持するためのバッファを備える。インテンシティステレオ帯域設定手段１７Ｂは、バッファに記憶保持されるＬ個のフレーム毎の変換ブロック長を設定するフラグＦＬ又はＦＲを参照し、短ブロックが設定されるフレーム数が、所定のしきい値ＣＴＨ（ＣＴＨはＬ以下の正整数）より小さい場合には、インテンシティステレオ符号化を行なう部分周波数領域の下限の周波数ｆｉを所定のｆ０に設定する。またインテンシティステレオ帯域設定手段１７Ｂはバッファに記憶保持されるＬ個のフレーム毎の変換ブロック長を設定するフラグＦＬ又はＦＲを参照し、短ブロックが設定されるフレーム数が所定のしきい値ＣＴＨより大きい場合には、インテンシティステレオ符号化を行なう部分周波数領域の下限の周波数ｆｉを所定のｆ１（ｆ１＜ｆ０）に設定する。
【００７１】
上記のインテンシティステレオ帯域設定手段１７Ｂの動作説明では、設定され得る周波数ｆｉより高い周波数のスペクトルに対してインテンシティステレオ符号化を行なう例を対象としたが、インテンシティステレオ符号化を行なう部分周波数領域を増大させるには、別の方法を用いることも可能である。また、上記のインテンシティステレオ帯域設定手段１７Ｂは、設定され得る周波数ｆｉをｆ０又はｆ１の２つの設定のどちらかに切り替えたが、２つ以上の複数の設定値を用意して、この中のいずれかの設定値を選択するように切り替える方法を用いることもできる。
【００７２】
変換ブロック長の切り替えは、図７の従来のデジタルオーディオ信号の符号化装置における変換ブロック長設定手段１２及び１３と同様の方法で行なわれる。複数の隣接するスペクトルをまとめたバンド毎に量子化及び符号化が行なわれるために、フレーム毎の量子化及び符号化されるバンド毎の符号化補助情報が伝送又は記憶されることになる。
【００７３】
ここで、各バンドに包含されるスペクトルの数は予め定められている。一般的には低い周波数のスペクトルを包含するバンドのスペクトルの本数に比べて、高い周波数のスペクトルを包含するバンドのスペクトルの本数の方が多く設定される。また、各バンドのスペクトルの本数は人間の聴覚特性に基づいて設定される。伝送又は記憶される有効な周波数領域の範囲を一定にすると、短ブロックが選択される場合の伝送又は記憶されるバンドの数が長ブロックが選択される場合に比べて大きくなるために、符号化補助情報の伝送に多くのビットを要する。
【００７４】
また、振幅の変化量が大きい信号に対しても量子化ノイズを知覚されないレベルに抑えるために量子化ビット数が大きくなる。このため、短ブロックのフレームは長ブロックのフレームに比べて多くのビットを消費する。このため、短ブロックのフレームが連続して出現する場合には、使用可能なビットの量ＡＢＩＴが著しく減少する可能性がある。
【００７５】
図６は、本発明の実施の形態２におけるフレーム毎のインテンシティステレオ符号化を行う部分周波数帯域の設定範囲の変化を示すタイムチャートである。図６の横軸は時系列に連続するフレームを示す。上段のタイムチャートは、インテンシティステレオ帯域設定手段１７Ｂに記憶保持されるフレーム毎の時間／周波数変換を行う際の変換ブロック長を示す。

【００７６】
図６に示す例では、時系列に連続する３つのフレームの変換ブロック長を記憶保持し、Ｆ０が第ｎフレームの変換ブロック長であり、Ｆ１が第（ｎ−１）フレームの変換ブロック長であり、Ｆ２が第（ｎ−２）フレームの変換ブロック長であるとする。例えば、第４フレームの符号化処理を行う場合には、Ｆ０に第４フレームの変換ブロック長として短ブロックを示すフラグを記憶保持し、Ｆ１に第３フレームの変換ブロック長として短ブロックを示すフラグを記憶保持し、Ｆ２に第２フレームの変換ブロック長として長ブロックを示すフラグを記憶保持する。
【００７７】
図６の下段のタイムチャートにおける縦軸は周波数を表す。図６に示すように、インテンシティステレオ帯域設定手段１７Ｂは、変換ブロック長設定手段１２及び１３から出力される時系列で連続する３つのフレームの変換ブロック長を記憶保持する。インテンシティステレオ帯域設定手段１７Ｂは記憶保持した３つのフレームの変換ブロック長のうち、所定のしきい値ＣＴＨ（図６においては、ＣＴＨ＝２としている。）が短ブロックとなる第４フレーム及び第５フレームにおいて、インテンシティステレオ符号化を行う部分周波数帯域の下限の周波数をｆ１に設定する。
【００７８】
変換ブロック長が時系列に連続して、あるいは短い期間の間に集中して短ブロックになることで、使用可能なビットの量ＡＢＩＴが著しく減少し、量子化及び符号化でスペクトルが欠落する場合を考える。このとき、インテンシティステレオ帯域設定手段１７Ｂは時系列の連続するＬ個のフレーム毎の変換ブロック長を記憶保持し、記憶保持した変換ブロック長に基づいてインテンシティステレオを行なう部分周波数領域の下限の周波数をｆ０からｆ１（ｆ１＜ｆ０）に移す。そして、インテンシティステレオ符号化を行なう部分周波数領域の範囲を増大することにより、符号化効率を改善し、少ないビットの量で量子化及び符号化を可能にする。こうすれば量子化及び符号化でスペクトルが欠落する可能性を低減でき、音質の劣化を阻止することができる。
【００７９】
尚、上記の実施の形態２の説明にある一連の符号化プロセスは、ソフトウェアプログラム言語によってコンピュータ又はデジタルシグナルプロセッサ（ＤＳＰ）上で実現することも可能である。
【００８０】
【発明の効果】
本発明のデジタルオーディオ信号の符号化装置及び方法によれば、量子化及び符号化に使用可能なビットの量、及び時間／周波数変換の変換ブロック長に基づいて、インテンシティステレオ符号化を行なう部分周波数領域の範囲を適応的に変化させることにより、使用可能なビットの量が著しく減少した場合でも量子化及び符号化によりスペクトルが欠落するのを防ぐことができる。そしてスペクトルが欠落することによる歪み感の増大や、音質の劣化を阻止することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態１におけるデジタルオーディオ信号の符号化装置の構成を示すブロック図である。
【図２】実施の形態１におけるインテンシティステレオ符号化による符号化効率の改善効果を表す説明図（その１）である。
【図３】実施の形態１におけるインテンシティステレオ符号化による符号化効率の改善効果を表す説明図（その２）である。
【図４】実施の形態１において、フレーム毎のインテンシティステレオ符号化を行う部分周波数帯域の設定範囲の変化を示すタイムチャートである。
【図５】本発明の実施の形態２におけるデジタルオーディオ信号の符号化装置の構成を示すブロック図である。
【図６】実施の形態２において、フレーム毎のインテンシティステレオ符号化を行う部分周波数帯域の設定範囲の変化を示すタイムチャートである。
【図７】従来のデジタルオーディオ信号の符号化装置の構成を示すブロック図である。
【図８】従来のインテンシティステレオ符号化による符号化効率の改善効果を表す説明図（その１）である。
【図９】従来のインテンシティステレオ符号化による符号化効率の改善効果を表す説明図（その２）である。
【図１０】インテンシティステレオ符号化を用いて生成された符号化ビット列を復号する復号化装置の構成を示すブロック図である。
【図１１】インテンシティステレオ符号化を用いて生成された符号化ビット列を復号する復号化装置において、各ブロックのスペクトルを示す説明図である。
【符号の説明】
１０、１１時間／周波数変換手段
１２、１３変換ブロック長設定手段
１４インテンシティステレオ処理手段
１５量子化及び符号化手段
１６ビットリザーバー制御手段
１７Ａ、１７Ｂインテンシティステレオ帯域設定手段
６１復号化及び逆量子化手段
６２インテンシティステレオ処理手段
６３、６４周波数／時間変換手段

[0001]
[Technical field of the invention]
The present invention relates to a digital audio signal encoding device that reduces the amount of information when transmitting or storing digital audio signals of a plurality of channels.
[0002]
[Prior art]
In the field of digital audio in recent years, various digital audio signal encoding technologies have come into wide use that can transmit or store high-quality sound at a bit rate of one tenth or less of that of a conventional compact disc (CD).
[0003]
These audio signal coding technologies include the ATRAC (Adaptive Transform Acoustic Coding) method used in the MiniDisc (MD), and MPEG2-AAC, which is standardized by the MPEG (Moving Picture Experts Group) of the International Organization for Standardization (ISO) and is used in satellite digital broadcasting.
[0004]
An encoding device using these digital audio signal encoding techniques first takes, as a frame, a unit of n consecutive samples (n is a positive integer) in the time series of the input digital audio signal, and converts each frame into subband signals or a spectrum representing components on the frequency axis. The conversion uses a known filter bank or transform process, such as band-splitting filtering with a QMF (Quadrature Mirror Filter) or a frequency transform such as the MDCT (Modified Discrete Cosine Transform).
[0005]
Next, quantization bits are allocated for quantizing the subband signals or spectrum, allowing a level of quantization noise that is not perceived, or is hard to perceive, on the basis of human psychoacoustic characteristics such as the minimum audible threshold and simultaneous masking. The subband signals or spectrum are quantized with the allocated number of quantization bits, and the quantization noise at decoding is held to a level that is not perceived or is hardly perceived. This reduces the amount of information in portions that are hardly perceived by hearing while keeping sound quality high, and achieves a large reduction in bit rate.
[0006]
The quantized subband signals or spectrum are encoded and converted into a bit string of a predetermined word length, encoding auxiliary information such as the number of quantization bits is multiplexed with it, and the result is transmitted or stored as an encoded bit string. In some cases, variable-length Huffman codes, a known form of entropy coding, are used in the encoding to improve coding efficiency. The encoded bit string output from the encoding device, that is, transmitted or stored, is restored to a digital audio signal and output by a decoding device or decoding method that performs the reverse processing.
[0007]
Exploiting human psychoacoustic characteristics in this way is highly effective in reducing the overall bit rate. However, because the character of the input digital audio signal in relation to those characteristics differs from frame to frame, the amount of information that can be removed also differs. In other words, there are frames in which the number of quantization bits needed to keep quantization noise at a hardly perceptible level is large, so that little information can be removed, and conversely frames in which the number needed is small, so that much information can be removed. As a result, the amount of information in the encoded bit string is not constant from frame to frame.
[0008]
On the other hand, the bit rate of the encoded bit string is often held constant in order to simplify signal processing both in devices and methods that transmit or store the encoded bit string and in devices and methods that receive and decode it. To satisfy this requirement, there is a method in which the surplus bits are saved when a frame is encoded with fewer bits than the per-frame amount of bits given by the predetermined bit rate. When a frame needs more bits than that per-frame amount, the saved surplus bits are added, locally increasing the amount of usable bits and reducing the deterioration of sound quality. A buffer for storing the surplus bits may be provided for this purpose.
[0009]
Further, in digital audio signal encoding techniques such as those above, a technique called stereo coding is applied, as a way of improving coding efficiency, to a stereo signal formed by a pair of channels (for example, the left and right channels) of a multichannel digital audio signal. The most common forms of stereo coding are center/side (MS) stereo coding and intensity (strength) stereo coding.
[0010]
The MS stereo coding generates a center signal obtained by a sum of stereo signals and a side signal obtained by a difference, and codes the center signal and the side signal instead of the stereo signal. Accordingly, when the stereo signals are similar with the same sign, the amplitude of the side signal is reduced, and when the stereo signals are similar with different signs, the amplitude of the center signal is reduced. According to such a method, since the number of quantization bits at the time of encoding is reduced, encoding efficiency can be improved as compared with a case where a stereo signal is directly encoded. (For example, see Non-Patent Document 1.)
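The sum and difference construction can be illustrated with a short Python sketch. The function names `ms_encode` and `ms_decode` and the sample values are hypothetical, and the factor of 1/2 in the decoder is an assumption that follows from using an unscaled sum and difference in the encoder; the text itself does not state a scaling.

```python
def ms_encode(left, right):
    """Return (center, side): the per-sample sum and difference of a stereo pair."""
    center = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return center, side


def ms_decode(center, side):
    """Invert ms_encode and recover the left and right channels."""
    left = [(c + s) / 2 for c, s in zip(center, side)]
    right = [(c - s) / 2 for c, s in zip(center, side)]
    return left, right


# Two nearly identical channels with the same sign: the side signal is almost zero.
left = [4.0, -2.0, 6.0]
right = [4.0, -2.0, 5.0]
center, side = ms_encode(left, right)
decoded = ms_decode(center, side)
```

With similar channels of the same sign, the side signal has a small amplitude and therefore needs few quantization bits, which is the efficiency gain the paragraph describes.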
[0011]
Intensity (strength) stereo coding, on the other hand, generates an integrated signal from the stereo signal, together with additional intensity information indicating the left/right distribution. Because only the integrated signal is encoded, coding efficiency can be improved greatly compared with encoding the stereo signal itself. In the following description, "intensity (strength) stereo" is referred to simply as intensity stereo. (For example, see Non-Patent Document 2.)
[0012]
FIG. 7 is a block diagram showing a configuration of a conventional digital audio signal encoding device using intensity stereo encoding and a buffer (hereinafter referred to as a “bit reservoir”) for storing surplus bits.
[0013]
In FIG. 7, a series of encoding processes is executed in units of a frame in which n (n is a positive integer) samples in a time series are collected. The time / frequency conversion means 10 and 11 convert the input signals SINL and SINR of the left / right channels of the continuous samples on the time axis into spectra SPCL1 and SPCR1, respectively.
[0014]
The transform block length setting means 12 and 13 set the number m (m is a positive integer) of time-series samples per sub-block to either m = n or m < n, so that each frame is transformed in units of sub-blocks. In the following description, a sub-block with m = n is referred to as a "long block", and a sub-block with m < n is referred to as a "short block". For example, m satisfies n = m × k (k is a positive integer), and when short blocks are set, one frame period contains k sub-blocks of m samples each. The transform block length is set according to the state of the input signal.
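The relation n = m × k between a frame and its sub-blocks can be sketched as follows. The function name `split_frame` and the values of n and k are hypothetical, and the sketch only partitions the samples; a real transform such as the MDCT also applies overlapping windows, which is left out here.

```python
def split_frame(frame, short, k):
    """Split one frame of n samples into sub-blocks.

    Long block:  a single sub-block with m = n.
    Short block: k sub-blocks with m = n // k (requires n = m * k).
    """
    n = len(frame)
    if not short:
        return [list(frame)]
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    m = n // k
    return [list(frame[i * m:(i + 1) * m]) for i in range(k)]


frame = list(range(16))                       # n = 16 samples (toy size)
long_blocks = split_frame(frame, short=False, k=4)
short_blocks = split_frame(frame, short=True, k=4)
```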
[0015]
In such switching of the transform block length, for example, when an input signal whose amplitude changes sharply is converted into a spectrum with a long block and then quantized and encoded, the quantization noise appears at decoding superimposed on all samples of the frame. In this case, the quantization noise is perceived where the signal amplitude is small.
[0016]
To prevent this phenomenon, a short block is selected so that the quantization noise is superimposed only on the samples within that block. In this case, the quantization noise becomes harder to perceive because of the human psychoacoustic property called temporal masking. However, transforming with short blocks, which have high time resolution, reduces the total number of spectral lines over the entire frequency range; that is, the frequency resolution becomes lower.
[0017]
The intensity stereo encoding means 14 generates an integrated monaural spectrum SPCi1 from the spectra SPCL1 and SPCR1. In intensity stereo coding in the MPEG2-AAC scheme, the additional intensity information indicating the left/right distribution is obtained, for each band grouping several spectral lines, from the ratio of the energies of the left and right channels, where each energy is the accumulated sum of squared spectral values. The integrated spectrum is then obtained by multiplying the sum of the left and right spectra by the square root of an energy ratio. (See Non-Patent Document 3.)
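A per-band calculation of this kind might look like the sketch below. The text does not say which energies enter the ratio under the square root, so two assumptions are made and marked in the code: the sum spectrum is rescaled so that its energy matches that of the left channel, and the intensity information is carried as the raw left/right energy ratio rather than a quantized value. The function names are hypothetical.

```python
import math


def band_energy(band):
    """Energy of a band: accumulated sum of squared spectral values."""
    return sum(x * x for x in band)


def intensity_encode_band(left_band, right_band):
    """Return (integrated_band, energy_ratio) for one band."""
    e_left = band_energy(left_band)
    e_right = band_energy(right_band)
    summed = [l + r for l, r in zip(left_band, right_band)]
    e_sum = band_energy(summed)
    # Assumption: scale the sum so its energy equals the left-channel energy.
    scale = math.sqrt(e_left / e_sum) if e_sum > 0 else 0.0
    integrated = [scale * s for s in summed]
    # Assumption: the intensity information is the unquantized ratio E_L / E_R.
    ratio = e_left / e_right if e_right > 0 else math.inf
    return integrated, ratio


left_band = [2.0, 2.0, 2.0, 2.0]
right_band = [1.0, 1.0, 1.0, 1.0]
integrated, ratio = intensity_encode_band(left_band, right_band)
```

A decoder can then rebuild an approximate right-channel band by scaling the integrated band according to the transmitted ratio.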
[0018]
Intensity stereo coding is performed on the bands that contain the spectrum of a predetermined partial frequency region. In the left-channel spectrum SPCL1, the spectrum of each band subject to intensity stereo coding is replaced with the integrated spectrum SPCi1, and the result is output as spectrum SPCL2. In the right-channel spectrum SPCR1, the spectrum of each band subject to intensity stereo coding is replaced with zero, and the result is output as spectrum SPCR2. The right-channel spectrum replaced with zero by intensity stereo coding is thus dropped and is neither transmitted nor stored, so a large reduction in the amount of information can be expected.
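The substitution step can be sketched on its own, taking the integrated spectrum as given. The function name `substitute_bands`, the spectral values, and the index at which the intensity stereo region starts are hypothetical; the toy size of 24 spectral lines in six bands of four follows the simplified example used for FIG. 8.

```python
def substitute_bands(left, right, integrated, first_line):
    """Build SPCL2 and SPCR2 from SPCL1, SPCR1 and the integrated spectrum.

    Lines below first_line are kept; from first_line upward the left channel
    carries the integrated spectrum and the right channel is set to zero.
    """
    spcl2 = left[:first_line] + integrated[first_line:]
    spcr2 = right[:first_line] + [0.0] * (len(right) - first_line)
    return spcl2, spcr2


left = [float(i) for i in range(24)]     # SPCL1 (toy values)
right = [1.0] * 24                       # SPCR1 (toy values)
integrated = [9.0] * 24                  # SPCi1, assumed already computed
spcl2, spcr2 = substitute_bands(left, right, integrated, first_line=12)
nonzero_right = sum(1 for x in spcr2 if x != 0.0)
```

Only the nonzero right-channel lines still have to be quantized and encoded, so in this example half of the right channel no longer contributes to the amount of information.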
[0019]
The MPEG2-AAC standard states that, for typical signals, it is appropriate to set the lower limit frequency of the partial frequency region for intensity stereo coding to 6 kHz. However, for intensity stereo coding to be performed, the left and right channels must have the same number of spectral lines over the entire frequency range. That is, the transform block length used for the time/frequency conversion must be the same for the left and right channels. If the transform block lengths of the left and right channels differ, intensity stereo coding is not performed.
[0020]
Further, consider the case where additional intensity information is calculated for each band in which a plurality of spectra are put together and is included in the encoding auxiliary information to be transmitted or stored. When time/frequency conversion is performed with short blocks, a plurality of sub-blocks are grouped and share common additional intensity information, which reduces the amount of information and improves coding efficiency. In this case, the grouping configuration of the sub-blocks must also be the same in the left and right channels.
[0021]
The quantization and encoding means 15 quantizes and encodes the left- and right-channel spectra SPCL2 and SPCR2 based on human psychoacoustic characteristics and on the amount of usable bits for each frame set by the bit reservoir control means 16 described later, and converts the result into a bit string. The quantization and encoding means 15 then multiplexes the encoding auxiliary information to generate an encoded bit string.
[0022]
When a frame is encoded with fewer bits than the per-frame bit amount based on a predetermined bit rate, the bit reservoir control means 16 stores the surplus bits. When the amount of bits required for encoding a frame is larger than the per-frame bit amount based on the predetermined bit rate, it adds the stored surplus bits, partially increasing the amount of usable bits in order to reduce sound quality degradation. To store the surplus bits, the bit reservoir control means 16 includes a bit reservoir, and it controls the amount of bits stored in the bit reservoir so that the encoded bit string as a whole has the predetermined bit rate.
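The bookkeeping performed by the bit reservoir can be sketched as below. The class name, method names, and the `capacity` limit on stored bits are hypothetical and introduced only for illustration; the text does not specify an upper limit or an interface.

```python
class BitReservoir:
    """Tracks surplus bits carried over between frames (illustrative only)."""

    def __init__(self, bits_per_frame, capacity):
        self.bits_per_frame = bits_per_frame  # per-frame amount from the bit rate
        self.capacity = capacity              # assumed upper limit on stored bits
        self.stored = 0                       # surplus bits currently held

    def available_bits(self):
        # Usable bits for the next frame: nominal amount plus stored surplus.
        return self.bits_per_frame + self.stored

    def commit(self, used_bits):
        # Record how many bits the frame actually consumed.
        if used_bits > self.available_bits():
            raise ValueError("frame exceeds the available bit amount")
        self.stored = min(self.capacity, self.available_bits() - used_bits)
```

A frame that uses fewer bits than the nominal amount raises the available amount for the following frame; a run of expensive frames drains the reservoir back to the nominal amount.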
[0023]
FIG. 8 is an explanatory diagram illustrating an improvement in coding efficiency by intensity stereo coding. The vertical axis of FIG. 8 shows the amplitude of the spectrum, and the horizontal axis shows the frequency of the spectrum. For the sake of simplicity, it is assumed that the total number of spectra over all frequencies is 24, and that four adjacent spectra are grouped into six bands. The intensity stereo coding is performed on a spectrum in a frequency range higher than the frequency f0 shown on the horizontal axis.
[0024]
In FIG. 8, (a1) shows the spectrum SPCL1 of the left channel, and (a2) shows the spectrum SPCR1 of the right channel. Further, (a3) of FIG. 9 is a schematic diagram showing the total sum of information amounts when the spectrum SPCL1 of the left channel of (a1) and the spectrum SPCR1 of the right channel of (a2) are quantized and coded.
[0025]
In (a3) of FIG. 9, DL1 indicates the information amount of the spectrum in the frequency domain lower than the frequency f0 in the spectrum SPCL1 in the left channel of (a1). DL2 indicates the information amount of the spectrum in the frequency domain higher than the frequency f0 in the spectrum SPCL1 in the left channel of (a1) in FIG. DR1 indicates the information amount of the spectrum in the frequency region lower than the frequency f0 in the spectrum of the right channel SPCR1 in (a2) of FIG. DR2 indicates the information amount of the spectrum in the frequency domain higher than the frequency f0 in the spectrum SPCR1 in the right channel of (a2) in FIG. These information amounts are the information amounts when each spectrum is quantized and coded. The total information amount T1 is represented by T1 = DL1 + DL2 + DR1 + DR2 as shown in (a3) of FIG.
[0026]
(b1) of FIG. 8 shows the left-channel spectrum SPCL2 when intensity stereo coding is performed, and (b2) shows the right-channel spectrum SPCR2. As a result of the intensity stereo coding, the spectra at frequencies higher than the frequency f0 in the left-channel spectrum SPCL2 of (b1), which are the target of intensity stereo coding, are replaced with the integrated spectrum SPCI1.
[0027]
Also, as shown in (b2) of FIG. 8, the spectra at frequencies higher than the frequency f0 in the right-channel spectrum SPCR2, which are the target of intensity stereo coding, are replaced with zero. (b3) of FIG. 9 shows the sum of the information amounts when the left-channel spectrum SPCL2 of (b1) of FIG. 8 and the right-channel spectrum SPCR2 of (b2) of FIG. 8 are quantized and coded.
[0028]
In (b3) of FIG. 9, DL1 indicates the information amount of the spectrum in the frequency domain lower than the frequency f0 in the left-channel spectrum SPCL2. Di1 indicates the information amount of the spectrum replaced with the integrated spectrum SPCI1 in the frequency region higher than the frequency f0 in the left-channel spectrum SPCL2 of (b1). DR1 indicates the information amount of the spectrum in the frequency domain lower than the frequency f0 in the right-channel spectrum SPCR2 of (b2). These are the information amounts obtained when each spectrum is quantized and coded. The total information amount T2 is represented by T2 = DL1 + DR1 + Di1, as shown in (b3) of FIG. 9.
[0029]
As shown in (b2) of FIG. 8, in the frequency region higher than the frequency f0 of the right channel, the spectra subjected to intensity stereo coding are replaced with zero. They are therefore not transmitted or stored and do not contribute to the information amount. As a result, the sum T2 of the information amounts when intensity stereo coding is performed becomes smaller than the sum T1 when it is not performed, and the coding efficiency can be improved. Note that in this example, for the region of frequencies lower than the frequency f0, where intensity stereo coding is not performed, the information amount DL1 shown in (b3) of FIG. 9 is assumed to be the same as the information amount DL1 denoted by the same symbol in (a3).
[0030]
Likewise, for the region of frequencies lower than the frequency f0, where intensity stereo coding is not performed, the information amount DR1 shown in (b3) of FIG. 9 is assumed to be the same as the information amount DR1 denoted by the same symbol in (a3). Further, the information amount DL2 obtained by quantizing and coding the spectra in the frequency region higher than the frequency f0 of the left channel shown in (a1) of FIG. 8 is assumed to be equal to the information amount Di1 obtained by quantizing and coding the integrated spectrum SPCI1 at frequencies higher than the frequency f0, where intensity stereo coding is performed, shown in (b1).
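Under the assumptions stated above (DL1 and DR1 unchanged, DL2 = Di1), the saving can be written out explicitly. The bits needed for the additional intensity information are not counted here, as in the description:

```latex
\begin{align}
T_1 &= DL_1 + DL_2 + DR_1 + DR_2 \\
T_2 &= DL_1 + DR_1 + Di_1 \\
T_1 - T_2 &= DL_2 + DR_2 - Di_1 = DR_2 \qquad (\text{since } DL_2 = Di_1)
\end{align}
```

That is, the reduction equals the information amount of the right-channel spectra above f0 that are no longer transmitted or stored.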
[0031]
FIG. 10 is a block diagram illustrating the configuration of a decoding device that decodes an encoded bit string generated using intensity stereo coding in the encoding device illustrated in FIG. 7. In FIG. 10, the decoding and inverse quantization means 61 separates the encoding auxiliary information included in the input encoded bit string, performs decoding and inverse quantization, and outputs a left-channel spectrum SPCL3 and a right-channel spectrum SPCR3. In the left-channel spectrum SPCL3 output from the decoding and inverse quantization means 61, the spectra in the partial frequency domain subjected to intensity stereo coding consist of the integrated spectrum SPCI2.
[0032]
The intensity stereo processing means 62 uses the integrated spectrum SPCI2 in the left-channel spectrum SPCL3 and the additional intensity information indicating the left/right distribution, contained in the separated encoding auxiliary information, to generate the spectra of the right-channel spectrum SPCR3 in the frequency region higher than the frequency f0, which were subjected to intensity stereo coding and therefore not transmitted or stored. The intensity stereo processing means 62 then combines the spectra of SPCR3 at frequencies lower than the frequency f0 with the generated spectra at frequencies higher than the frequency f0, and outputs the result as spectrum SPCR4.
[0033]
Further, the intensity stereo processing means 62 outputs the left-channel spectrum SPCL3 output from the decoding and inverse quantization means 61 as a spectrum SPCL4. The frequency/time conversion means 63 and 64 convert the left-channel spectrum SPCL4 and the right-channel spectrum SPCR4 on the frequency axis into time-series digital audio signal samples SOUTL and SOUTR, respectively, and output them.
[0034]
FIG. 11 is an explanatory diagram showing the spectrum at each means in the decoding device shown in FIG. 10. In FIG. 11, the vertical axis indicates the amplitude of the spectrum, and the horizontal axis indicates the frequency of the spectrum. For the sake of simplicity, the total number of spectra over all frequencies is assumed to be 24, and four adjacent spectra are grouped into six bands. The intensity stereo coding is performed on a spectrum in a frequency region higher than the frequency f0.
[0035]
In FIG. 11, (c1) shows the left-channel spectrum SPCL3, including the integrated spectrum SPCI2, output from the decoding and inverse quantization means 61. (c2) shows the right-channel spectrum SPCR3 output from the decoding and inverse quantization means 61. (d1) shows the left-channel spectrum SPCL4 output from the intensity stereo processing means 62, which is the same as SPCL3 of (c1). (d2) shows the right-channel spectrum SPCR4 output from the intensity stereo processing means 62. In (d2), a pseudo-spectrum is generated from the integrated spectrum SPCI2 included in the left-channel spectrum SPCL3 of (c1) by using the additional intensity information indicating the left/right distribution, and is combined with SPCR3 of (c2) and output.
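The generation of the right-channel pseudo-spectrum can be sketched as follows. The function name and the representation of the additional intensity information as a per-band left/right energy ratio are assumptions for this example, as is the premise that the integrated spectrum carries the left-channel energy; the actual MPEG2-AAC syntax encodes the distribution differently.

```python
import math

def intensity_decode(left3, right3, band_size, ratios):
    """Rebuild the right-channel bands that were not transmitted.

    ratios maps a band index to an assumed left/right energy ratio.
    Returns (left4, right4).
    """
    left4, right4 = list(left3), list(right3)
    for band, ratio in ratios.items():
        lo, hi = band * band_size, (band + 1) * band_size
        # A right-channel energy of zero (infinite ratio) yields silence.
        if math.isinf(ratio) or ratio <= 0:
            gain = 0.0
        else:
            gain = 1.0 / math.sqrt(ratio)
        right4[lo:hi] = [gain * x for x in left3[lo:hi]]  # pseudo-spectrum
    return left4, right4
```

The left channel is passed through unchanged, matching the statement that SPCL4 is the same as SPCL3. Only the level relationship between the channels is restored in the decoded bands, since the right-channel pseudo-spectrum is a scaled copy of the integrated spectrum.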
[0036]
[Non-Patent Document 1]
J. D. Johnston, "Perceptual Transform Coding of Wideband Stereo Signals", IEEE / ICASSP, 1989, p. 1993-1993
[Non-Patent Document 2]
R. V. R. Vanderwaal, "Subband Coding of Stereophonic Digital Signals", IEEE / ICASSP, 1991, p. 3601-3604
[Non-Patent Document 3]
"Advanced Audio Coding ANNEX B Informative Part" ISO / IEC, IS-13818-7
[0037]
[Problems to be solved by the invention]
However, in the conventional apparatus and method for encoding a digital audio signal described above, when frames that require a large amount of bits for encoding, or frames whose time/frequency conversion uses the short-block conversion block length, are concentrated in a short period, the amount of surplus bits stored in the bit reservoir decreases significantly and the amount of bits usable for encoding is reduced. In such a state, the amount of usable bits cannot be increased while maintaining the predetermined bit rate. As a result, spectra are lost through quantization and encoding, the sense of distortion in the reproduced signal obtained by decoding the encoded signal increases, and the sound quality deteriorates.
[0038]
The present invention has been made in view of this conventional problem. Its object is to realize a digital audio signal encoding apparatus and method that, when the amount of bits usable for encoding is reduced because frames requiring a large amount of bits for encoding, or frames whose time/frequency conversion uses the short-block conversion block length, are concentrated in a short period, prevents the loss of spectra through quantization and encoding, eliminates the sense of distortion in the reproduced signal obtained by decoding the encoded signal, and does not degrade the sound quality.
[0039]
[Means for Solving the Problems]
The invention according to claim 1 of the present application is a digital audio signal encoding device that converts a time-series digital audio signal of N channels (N is a positive integer of 2 or more) into a spectrum on the frequency axis for each frame, a frame being a unit of n (n is a positive integer) time-series samples, quantizes the spectrum, encodes it by entropy coding, multiplexes the encoded information to generate and output an encoded bit string, and performs intensity stereo coding on the spectrum to reduce the amount of information. The device comprises: time/frequency conversion means for converting the time-series samples of each channel into a frequency-domain spectrum; conversion block length setting means for setting the number m of samples to be converted (m is a positive integer) to either m = n or m < n when the time/frequency conversion means converts the time-series samples into a frequency-domain spectrum in units of sub-blocks; intensity stereo processing means for performing intensity stereo coding on a partial frequency region of the N-channel spectrum; quantization and encoding means for quantizing the spectrum output from the intensity stereo processing means, encoding it based on the amount of usable bits for each frame, and multiplexing encoding auxiliary information to generate an encoded bit string; bit reservoir control means for storing the surplus bit amount of each frame in a bit reservoir and setting the amount of usable bits for each frame using the bit amount stored in the bit reservoir and the output bit rate; and intensity stereo band setting means for increasing the partial frequency region in which intensity stereo coding is performed when the amount of usable bits for each frame is smaller than a predetermined threshold ATH (ATH is a positive integer).
[0040]
According to a second aspect of the present invention, in the digital audio signal encoding apparatus according to the first aspect, the intensity stereo band setting means defines, among L (L is a positive integer of 2 or more) frames consecutive in time series, a frame whose usable bit amount is smaller than a predetermined threshold ATH (ATH is a positive integer) as a band control target frame, and increases the partial frequency region in which intensity stereo coding is performed when the number of band control target frames is larger than a predetermined threshold BTH (BTH is a positive integer equal to or less than L).
[0041]
According to a third aspect of the present invention, in the digital audio signal encoding apparatus according to the first aspect, the intensity stereo band setting means stores and holds the conversion block lengths set by the conversion block length setting means for L (L is a positive integer of 2 or more) frames consecutive in time series, and increases the partial frequency region in which intensity stereo coding is performed when, among the L stored conversion block lengths, the number of frames converted with sub-blocks of m < n is larger than a predetermined threshold CTH (CTH is a positive integer equal to or less than L).
[0042]
The invention according to claim 4 of the present application is a digital audio signal encoding method that converts a time-series digital audio signal of N channels (N is a positive integer of 2 or more) into a spectrum on the frequency axis for each frame, a frame being a unit of n (n is a positive integer) time-series samples, quantizes the spectrum, encodes it by entropy coding, multiplexes the encoded information to generate and output an encoded bit string, and performs intensity stereo coding on the spectrum to reduce the amount of information. In the method, when the time-series samples of each channel are converted into a frequency-domain spectrum in units of sub-blocks, the number m of samples to be converted (m is a positive integer) is set to either m = n or m < n; intensity stereo coding is performed on a partial frequency region of the N-channel spectrum; the spectrum output by the intensity stereo processing is quantized and encoded based on the amount of usable bits for each frame, and encoding auxiliary information is multiplexed to generate an encoded bit string; the surplus bit amount of each frame is stored in a bit reservoir, and the amount of usable bits for each frame is set using the bit amount stored in the bit reservoir and the output bit rate; and when the amount of usable bits for each frame is smaller than a predetermined threshold ATH (ATH is a positive integer), the partial frequency region in which intensity stereo coding is performed is increased.
[0043]
According to a fifth aspect of the present invention, in the digital audio signal encoding method according to the fourth aspect, the band setting in the intensity stereo coding is performed such that, among L (L is a positive integer of 2 or more) frames consecutive in time series, a frame whose usable bit amount is smaller than a predetermined threshold ATH (ATH is a positive integer) is defined as a band control target frame, and the partial frequency region in which intensity stereo coding is performed is increased when the number of band control target frames is larger than a predetermined threshold BTH (BTH is a positive integer equal to or less than L).
[0044]
According to a sixth aspect of the present invention, in the digital audio signal encoding method according to the fourth aspect, the band setting in the intensity stereo coding is performed such that the conversion block lengths of L (L is a positive integer of 2 or more) frames consecutive in time series are stored and held, and the partial frequency region in which intensity stereo coding is performed is increased when, among the L stored conversion block lengths, the number of frames converted with sub-blocks of m < n is larger than a predetermined threshold CTH (CTH is a positive integer equal to or less than L).
[0045]
[Best Mode for Carrying Out the Invention]
A digital audio signal encoding apparatus and method according to each embodiment of the present invention will be described with reference to the drawings. In the following description, a series of encoding processes is performed on a time-series digital audio signal in units of frames in which n (n is a positive integer) samples are collected.
[0046]
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a digital audio signal encoding device according to Embodiment 1 of the present invention. Each constituent block and its operation are described below. The time/frequency conversion means 10 and 11 convert the left-channel input signal SINL and the right-channel input signal SINR, each consisting of n time-series samples, into a left-channel spectrum SPCL1 and a right-channel spectrum SPCR1, respectively. In accordance with the flag FL or FR notified from the conversion block length setting means 12 and 13 described later, the time/frequency conversion means 10 and 11 convert the n samples of each frame in the left and right channels with either short blocks or a long block, and output the spectra SPCL1 and SPCR1 for each sub-block.
[0047]
The conversion block length setting means 12 and 13 set the number m (m is a positive integer) of samples converted per sub-block, for the time-series samples of each frame, to either m = n or m < n. In the following description, a sub-block with m = n is referred to as a “long block”, and a sub-block with m < n is referred to as a “short block”. For example, when m satisfies n = m × k (k is a positive integer) and the short block is set, one frame period contains k sub-blocks of m samples each. The conversion block length is set according to the state of the input signal. This switching of the conversion block length is the same as in the conversion block length setting means 12 and 13 in FIG. 7, which shows the configuration of a conventional digital audio signal encoding device.
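The relation n = m × k can be illustrated with a simple partition of one frame. The function name is hypothetical, and the sketch only divides the samples; an actual MDCT-based converter uses overlapping windows, which are omitted here.

```python
def split_into_subblocks(frame, use_short_block, k):
    """Return the sub-blocks of one frame of n samples.

    Long block: one sub-block with m = n.
    Short block: k sub-blocks with m = n / k samples each.
    """
    n = len(frame)
    if not use_short_block:
        return [list(frame)]
    if k < 1 or n % k != 0:
        raise ValueError("n must be a positive multiple of k")
    m = n // k
    return [list(frame[i:i + m]) for i in range(0, n, m)]
```

For example, with n = 1024 and k = 8 (values chosen only for illustration), the short-block setting yields eight sub-blocks of 128 samples.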
[0048]
The conversion block length setting means 12 and 13 analyze the left-channel input signal SINL and the right-channel input signal SINR, each consisting of the n time-series samples of one frame of the input digital audio signal. They output the flag FL or FR, which sets the conversion block length, to the time/frequency conversion means 10 and 11 so that a signal with a large change in amplitude (for example, the sound of a percussion instrument such as castanets) undergoes time/frequency conversion with short blocks and a signal with a small change in amplitude undergoes time/frequency conversion with a long block.
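One simple way to detect a large change in amplitude is to compare the energy of consecutive segments within the frame. The criterion below, including the function name, the segment count, and the ratio threshold, is an assumption made purely for illustration; the description does not specify how the input signal is analyzed.

```python
def choose_short_block(frame, segments=8, ratio_threshold=10.0):
    """Return True if the frame contains a sudden rise in energy.

    The frame is divided into equal segments; a short block is chosen when
    any segment has more than ratio_threshold times the energy of the
    segment before it (hypothetical criterion).
    """
    seg_len = len(frame) // segments
    energies = [
        sum(x * x for x in frame[i * seg_len:(i + 1) * seg_len])
        for i in range(segments)
    ]
    for prev, cur in zip(energies, energies[1:]):
        # The small floor avoids treating digital silence as an attack.
        if cur > ratio_threshold * max(prev, 1e-12):
            return True
    return False
```

The returned value plays the role of the flag FL or FR: a steady signal keeps the long block, while an abrupt onset selects short blocks.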
[0049]
The intensity stereo processing means 14 generates an integrated monaural spectrum SPCI1 from the left channel spectrum SPCL1 and the right channel spectrum SPCR1. An example of a method of implementing the intensity stereo processing means is the same as the intensity stereo processing means 14 of FIG. 7 showing a conventional digital audio signal encoding device. Intensity stereo coding is performed on the spectrum of a region having a frequency higher than the settable frequency fi with respect to the left channel spectrum SPCL1 and the right channel spectrum SPCR1.
[0050]
Further, the region of frequencies higher than the frequency fi in the left-channel spectrum SPCL1 is replaced with the integrated spectrum SPCI1 and output as the spectrum SPCL2. The region of frequencies higher than the frequency fi in the right-channel spectrum SPCR1 is replaced with zero and output as the spectrum SPCR2. In addition, additional intensity information indicating the left/right distribution in the partial frequency domain where intensity stereo coding is performed is calculated. To perform intensity stereo coding, however, the same conditions as for the intensity stereo processing means 14 in FIG. 7, which shows a conventional digital audio signal encoding apparatus, must be satisfied.
[0051]
The above description of the operation of the intensity stereo processing means 14 is directed to an example in which intensity stereo coding is performed on a spectrum having a frequency higher than the settable frequency fi. However, it is also possible to use another method for setting the partial frequency domain for performing the intensity stereo coding.
[0052]
The quantization and encoding means 15 quantizes and encodes the left-channel spectrum SPCL2 and the right-channel spectrum SPCR2 output from the intensity stereo processing means 14, based on human psychoacoustic characteristics and on the amount of usable bits for each frame set by the bit reservoir control means 16 described later. The quantization and encoding means 15 then converts the result of the quantization and encoding into a bit string and multiplexes the encoding auxiliary information to generate an encoded bit string.
[0053]
When a frame is encoded with fewer bits than the per-frame bit amount based on a predetermined bit rate, the bit reservoir control means 16 stores the surplus bit amount RBIT in the bit reservoir. When the amount of bits necessary for encoding is large, the bit reservoir control means 16 adds the surplus bits stored in the bit reservoir to the per-frame bit amount based on the predetermined bit rate, partially increasing the amount of available bits ABIT to reduce sound quality degradation.
Thus, the bit reservoir control means 16 controls the amount of bits stored in the bit reservoir so that the entire coded bit string has a predetermined bit rate.
[0054]
The intensity stereo band setting means 17A sets the partial frequency region in which intensity stereo coding is performed, using the amount of available bits ABIT per frame output from the bit reservoir control means 16. If the amount of available bits ABIT per frame is larger than a predetermined threshold ATH (ATH is a positive integer), the intensity stereo band setting means 17A sets the lower limit frequency fi of the partial frequency region for intensity stereo coding to a predetermined frequency f0. Conversely, a frame in which the amount of available bits ABIT is smaller than the predetermined threshold ATH is defined as a band control target frame. For a band control target frame, the intensity stereo band setting means 17A sets the lower limit frequency fi to a predetermined frequency f1 (f1 < f0) in order to widen the partial frequency region in which intensity stereo coding is performed.
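The per-frame decision above reduces to a single comparison, sketched below. The function name is hypothetical, and the handling of ABIT exactly equal to ATH (treated here as not a band control target frame) is an assumption, since the description covers only the larger and smaller cases.

```python
def lower_limit_frequency(abit, ath, f0, f1):
    """Return the lower limit frequency fi for one frame.

    abit: available bit amount ABIT for the frame
    ath:  threshold ATH
    f0:   normal lower limit frequency
    f1:   lowered limit (f1 < f0) that widens the intensity stereo region
    """
    if f1 >= f0:
        raise ValueError("f1 must be lower than f0")
    # Band control target frame: ABIT below ATH selects the wider region.
    return f1 if abit < ath else f0
```

For instance, with f0 = 6000 Hz and an illustrative f1 = 4000 Hz, a frame with ABIT = 500 against ATH = 800 is a band control target frame and receives fi = 4000 Hz.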
[0055]
The intensity stereo band setting means 17A may also set the band by the following method. The intensity stereo band setting means 17A includes a buffer that stores the amount of available bits ABIT output from the bit reservoir control means 16 for each of L (L is a positive integer of 2 or more) frames consecutive in time series. If the number of frames whose bit amount ABIT is smaller than the predetermined threshold ATH is smaller than a predetermined threshold BTH (BTH is a positive integer equal to or less than L), the lower limit frequency fi of the partial frequency domain for intensity stereo coding is set to the predetermined frequency f0. If the number of such frames is larger than the predetermined threshold BTH, the lower limit frequency fi is set to the predetermined frequency f1 (f1 < f0).
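The buffered variant can be sketched with a fixed-length history of ABIT values. The class and method names are hypothetical. Two points are assumptions: a count exactly equal to BTH selects f0, and the buffer is evaluated even before L frames have been received.

```python
from collections import deque

class WindowedBandSetter:
    """Chooses fi from the ABIT values of the last L frames (illustrative)."""

    def __init__(self, L, ath, bth, f0, f1):
        if L < 2 or not 1 <= bth <= L or f1 >= f0:
            raise ValueError("invalid parameters")
        self.history = deque(maxlen=L)  # buffer of the most recent L ABIT values
        self.ath, self.bth, self.f0, self.f1 = ath, bth, f0, f1

    def update(self, abit):
        # Store the new frame's ABIT; the oldest entry drops out automatically.
        self.history.append(abit)
        low_frames = sum(1 for b in self.history if b < self.ath)
        return self.f1 if low_frames > self.bth else self.f0
```

Compared with the per-frame decision, this variant switches to f1 only after several frames in the window have fallen below ATH, and it keeps f1 until enough of those frames have left the window.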
[0056]
The above description of the operation of the intensity stereo band setting means 17A is directed to an example in which intensity stereo coding is performed on a spectrum having a frequency higher than the settable frequency fi. However, other methods can be used to increase the partial frequency domain in which the intensity stereo coding is performed. In addition, the intensity stereo band setting means 17A switches the settable frequency fi to one of two settings, f0 and f1. However, it is also possible to use a method of preparing two or more set values and switching to select any one of the set values.
[0057]
FIG. 2 is an explanatory diagram showing the improvement in coding efficiency when intensity stereo coding is performed on the spectra in a partial frequency domain in the digital audio signal encoding device of FIG. 1. More specifically, it shows an example in which the predetermined frequency set by the intensity stereo band setting means 17A is f0 or f1 (f1 < f0), depending on the magnitude relationship between the amount of available bits ABIT per frame output from the bit reservoir control means 16 and the predetermined threshold ATH. FIG. 3 is a schematic diagram showing the sum of the information amounts when the left-channel spectrum SPCL1 and the right-channel spectrum SPCR1 are quantized and coded.
[0058]
In FIG. 2, the vertical axis represents the amplitude of the spectrum, and the horizontal axis represents the frequency of the spectrum. For the sake of simplicity, the total number of spectra over all frequencies is assumed to be 24, and adjacent spectra are grouped four at a time into six bands. (b1) and (b2) of FIG. 2 and (b3) of FIG. 3 have the same contents as the corresponding parts of FIGS. 8 and 9, which illustrate the improvement in coding efficiency by intensity stereo coding.
[0059]
In FIG. 2, (e1) shows the left-channel spectrum SPCL2 when intensity stereo coding is performed on the spectra in the partial frequency region higher than the predetermined frequency f1 (f1 < f0) set by the intensity stereo band setting means 17A, and (e2) shows the right-channel spectrum SPCR2. As a result of the intensity stereo coding of the present embodiment, the spectra at frequencies higher than the frequency f1 in the left-channel spectrum SPCL2 of (e1), which are the target of intensity stereo coding, are replaced with the integrated spectrum SPCI1.
[0060]
In the spectrum SPCR2 in the right channel of (e2), the spectrum to be subjected to intensity stereo coding at a frequency higher than the frequency f1 is replaced with zero. Further, (e3) of FIG. 3 shows the total sum of information amounts when the spectrum of the left channel of (e1) and the spectrum of the right channel of (e2) in FIG. 2 are quantized and coded.
[0061]
In (e3) of FIG. 3, DL3 is the information amount of the spectrum in the frequency domain lower than the frequency f1 in the spectrum SPCL2 in the left channel. Di2 is the information amount of the spectrum replaced with the integrated spectrum in the frequency region higher than the frequency f1 in the spectrum SPCL2 in the left channel of (e1). DR3 is the information amount of the spectrum in the frequency domain lower than the frequency f1 in the spectrum SPCR2 in the right channel of (e2). These information amounts are values when each spectrum is quantized and coded. The total sum T3 of these information amounts is represented by T3 = DL3 + DR3 + Di2.
[0062]
As shown in (e2) of FIG. 2, in the frequency region higher than the frequency f1 of the right-channel spectrum SPCR2, the spectra subjected to intensity stereo coding are replaced with zero; they are not transmitted or stored and do not contribute to the information amount. Thereby, the total information amount T3 when this intensity stereo coding is performed is smaller than the total information amount T1 when intensity stereo coding is not performed (T1 is shown in (a3) of FIG. 9), and the coding efficiency can be further improved.
[0063]
Also, compared with the case where intensity stereo coding is performed on the spectra of the partial frequency region higher than the frequency f0, performing it on the spectra of the partial frequency region higher than the frequency f1 increases the number of right-channel spectra that are replaced with zero. For this reason, the information amount DR3 of the right-channel spectrum is smaller than the information amount DR1. Accordingly, the total information amount T3 when intensity stereo coding is performed on the spectra in the partial frequency region higher than the frequency f1 is smaller than the total information amount T2 when it is performed on the spectra in the partial frequency region higher than the frequency f0, and the coding efficiency can be improved.
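The comparison between T2 and T3 can be written out as follows. The step from the second line to the third assumes that the total left-channel information amount is the same for both settings (DL1 + Di1 = DL3 + Di2). This assumption is not stated in the description and is made here by analogy with the earlier assumption DL2 = Di1:

```latex
\begin{align}
T_2 &= DL_1 + DR_1 + Di_1, \qquad T_3 = DL_3 + DR_3 + Di_2 \\
T_2 - T_3 &= (DL_1 + Di_1 - DL_3 - Di_2) + (DR_1 - DR_3) \\
          &= DR_1 - DR_3 > 0 \qquad (\text{assuming } DL_1 + Di_1 = DL_3 + Di_2)
\end{align}
```

Under this assumption, the additional saving equals the information amount of the right-channel spectra between f1 and f0.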
[0064]
FIG. 4 is a time chart showing a change in a setting range of a partial frequency band for performing intensity stereo encoding for each frame in the first embodiment of the present invention. The horizontal axis in FIG. 4 shows frame numbers that are continuous in time series, and the vertical axis in the upper time chart shows the amount of available bits ABIT per frame output from the bit reservoir control means 16. ATH is a predetermined threshold value serving as a reference for switching the lower limit frequency of the partial frequency band for performing intensity stereo coding to f0 or f1 by the intensity stereo band setting unit 17A. The vertical axis of the lower time chart represents frequency.
[0065]
As shown in FIG. 4, in the third and fifth frames, in which the available bit amount ABIT output from the bit reservoir control means 16 is smaller than the predetermined threshold ATH, the intensity stereo band setting means 17A sets the lower limit frequency of the partial frequency band for intensity stereo coding to f1.
[0066]
As a result, when the available bit amount ABIT per frame is smaller than the predetermined threshold ATH (ATH is a positive integer), that is, when the available bit amount has decreased significantly and the likelihood that spectra are lost through quantization and encoding has increased, the range of the partial frequency domain for intensity stereo coding is widened. Shifting the lower limit frequency of the partial frequency domain for intensity stereo coding from f0 to f1 (f1 < f0) improves the coding efficiency, so that quantization and encoding can be performed with a small number of bits. This reduces the possibility that spectra are lost through quantization and encoding and prevents deterioration of sound quality.
[0067]
Note that the series of encoding processes described in the first embodiment can be implemented on a computer or a digital signal processor (DSP) using a software programming language.
[0068]
(Embodiment 2)
Next, a digital audio signal encoding apparatus according to Embodiment 2 of the present invention will be described. FIG. 5 is a block diagram showing a configuration of a digital audio signal encoding device according to the second embodiment. The same blocks as in the first embodiment are denoted by the same reference numerals, and the operation of each block will be described.
[0069]
In FIG. 5, the time / frequency conversion means 10 and 11, the conversion block length setting means 12 and 13, the intensity stereo coding means 14, the quantization and coding means 15, and the bit reservoir control means 16 are the same, in both configuration and function, as the corresponding means of the digital audio signal encoding device described above.
[0070]
The intensity stereo band setting means 17B stores and holds the flags FL and FR, output from the conversion block length setting means 12 and 13, that set the conversion block length for each of L frames consecutive in time series (L is a positive integer of 2 or more), and sets the partial frequency region in which intensity stereo coding is performed based on the stored flag FL or FR of each of the L frames. For this purpose, the intensity stereo band setting means 17B includes a buffer that stores and holds the flag FL or FR for each of the L frames. The intensity stereo band setting means 17B refers to the flags FL or FR held in the buffer and, if the number of frames in which short blocks are set is smaller than a predetermined threshold CTH (CTH is a positive integer equal to or less than L), sets the lower limit frequency fi of the partial frequency region for performing intensity stereo coding to a predetermined value f0. If the number of frames in which short blocks are set is larger than the predetermined threshold CTH, it sets the lower limit frequency fi of the partial frequency region for performing intensity stereo coding to a predetermined value f1 (f1 < f0).
[0071]
The above description of the operation of the intensity stereo band setting means 17B takes as an example the case in which intensity stereo coding is performed on the spectrum at frequencies higher than the settable frequency fi, but other methods may be used to increase the partial frequency region in which intensity stereo coding is performed. In addition, although the intensity stereo band setting means 17B switches the settable frequency fi between the two set values f0 and f1, a method may be used in which two or more set values are prepared and any one of them is selected.
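The variant with several set values can be sketched as a small lookup. This is only an illustration under stated assumptions: the description does not say how one of several set values would be chosen, so the mapping from the short-block count to a frequency, the function name, and every number below are hypothetical.

```python
from typing import Sequence, Tuple


def select_lower_limit(short_count: int, levels: Sequence[Tuple[int, float]]) -> float:
    """Pick a lower limit frequency fi from several prepared set values.

    levels: (minimum short-block count, lower limit frequency in Hz) pairs.
    Assumption: the entry with the largest minimum count that does not
    exceed short_count is selected.
    """
    chosen = None
    for min_count, freq_hz in sorted(levels):
        if short_count >= min_count:
            chosen = freq_hz
    if chosen is None:
        raise ValueError("no level covers this short-block count")
    return chosen


# Hypothetical set values: more short-block frames select a lower fi.
LEVELS = [(0, 8000.0), (2, 6000.0), (3, 4000.0)]

for count in range(4):
    print(count, select_lower_limit(count, LEVELS))
```

Lower frequencies are paired with higher counts so that the intensity stereo band widens as short blocks become more frequent, in line with the two-value case.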
[0072]
Switching of the conversion block length is performed by the same method as the conversion block length setting means 12 and 13 in the conventional digital audio signal encoding apparatus of FIG. Since quantization and coding are performed for each band in which a plurality of adjacent spectra are put together, coding auxiliary information for each band to be quantized and coded for each frame is transmitted or stored.
[0073]
Here, the number of spectra included in each band is predetermined. Generally, a band containing high-frequency spectra is set to contain more spectra than a band containing low-frequency spectra. The number of spectra in each band is set based on human auditory characteristics. If the range of the effective frequency region to be transmitted or stored is fixed, the number of bands to be transmitted or stored is larger when a short block is selected than when a long block is selected. Consequently, transmitting the coding auxiliary information requires more bits.
[0074]
Further, for a signal with a large change in amplitude, the number of quantization bits is also increased in order to suppress the quantization noise to a level that is not perceptible. For this reason, a short block frame consumes more bits than a long block frame. Therefore, when short block frames appear consecutively, the amount of available bits ABIT may be significantly reduced.
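How a run of short block frames can drain the available bits may be illustrated with a toy bit reservoir model. The model is an assumption, not the patent's method: ABIT is taken as the per-frame bit budget plus the reservoir content, the surplus is stored up to a capacity, and all numbers (budget, capacity, per-frame demand) are hypothetical.

```python
from typing import List, Tuple


def simulate_reservoir(
    demand_bits: List[int], bits_per_frame: int, capacity: int
) -> List[Tuple[int, int]]:
    """Return (ABIT, shortfall) for each frame under a simple reservoir model.

    Assumed model: ABIT = bits_per_frame + reservoir; the encoder uses
    min(demand, ABIT); unused bits go back to the reservoir, capped at capacity.
    """
    reservoir = 0
    history = []
    for demand in demand_bits:
        abit = bits_per_frame + reservoir
        used = min(demand, abit)
        shortfall = demand - used  # bits the frame wanted but could not get
        reservoir = min(capacity, abit - used)
        history.append((abit, shortfall))
    return history


# Hypothetical demand: long block frames ask for 700 bits, short block frames for 1600.
demand = [700, 700, 1600, 1600, 1600, 700]
results = simulate_reservoir(demand, bits_per_frame=1000, capacity=2000)
for frame_no, (abit, shortfall) in enumerate(results, start=1):
    print(f"frame {frame_no}: ABIT={abit}, shortfall={shortfall}")
```

In this toy run the reservoir is emptied by the first short block frame, so the following short block frames cannot be given the bits they ask for. That is the situation in which spectra risk being lost.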
[0075]
FIG. 6 is a time chart showing a change in a setting range of a partial frequency band for performing intensity stereo coding for each frame according to Embodiment 2 of the present invention. The horizontal axis in FIG. 6 shows frames that are continuous in time series. The upper time chart shows the conversion block length used for time / frequency conversion of each frame, as stored and held in the intensity stereo band setting means 17B.
[0076]
In the example shown in FIG. 6, the conversion block lengths of three frames continuous in time series are stored and held, where F0 is the conversion block length of the nth frame, F1 is the conversion block length of the (n-1)th frame, and F2 is the conversion block length of the (n-2)th frame. For example, when the encoding process of the fourth frame is performed, a flag indicating a short block as the conversion block length of the fourth frame is stored and held in F0, a flag indicating a short block as the conversion block length of the third frame is stored and held in F1, and a flag indicating a long block as the conversion block length of the second frame is stored and held in F2.
[0077]
The vertical axis in the lower time chart of FIG. 6 represents the frequency. As shown in FIG. 6, the intensity stereo band setting means 17B stores and holds the conversion block lengths of three frames continuous in time series output from the conversion block length setting means 12 and 13. In the fourth frame and the fifth frame, in which the number of short blocks among the three stored conversion block lengths reaches the predetermined threshold value CTH (CTH = 2 in FIG. 6), the intensity stereo band setting means 17B sets the lower limit frequency of the partial frequency band for performing intensity stereo coding to f1.
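The buffer behavior in the FIG. 6 example can be sketched in Python as follows. Several points are assumptions: the class and parameter names are hypothetical, a single block-length flag per frame stands in for the flags FL and FR, the frequencies are illustrative, and the block lengths of frames 1, 5 and 6 are invented (the text only states frames 2 to 4). The comparison uses "count >= CTH" so that the example with CTH = 2 switches in the fourth and fifth frames; the description elsewhere words the condition as "larger than CTH", so the boundary case should be treated as an assumption.

```python
from collections import deque


class IntensityStereoBandSetter:
    """Keeps the block-length flags of the last L frames and picks f0 or f1."""

    def __init__(self, num_frames: int, cth: int, f0_hz: float, f1_hz: float):
        if not 1 <= cth <= num_frames:
            raise ValueError("CTH must be a positive integer not exceeding L")
        # Index 0 holds the newest frame (F0), index 1 the previous one (F1), ...
        self.flags = deque(maxlen=num_frames)
        self.cth = cth
        self.f0_hz = f0_hz
        self.f1_hz = f1_hz

    def update(self, is_short_block: bool) -> float:
        """Store the flag of the current frame and return its lower limit frequency."""
        self.flags.appendleft(is_short_block)  # oldest flag drops off the right end
        short_count = sum(self.flags)
        # Assumption: reaching CTH is enough to widen the intensity stereo band.
        return self.f1_hz if short_count >= self.cth else self.f0_hz


# True = short block, False = long block; frames 1, 5 and 6 are hypothetical.
block_is_short = [False, False, True, True, False, False]
setter = IntensityStereoBandSetter(num_frames=3, cth=2, f0_hz=8000.0, f1_hz=4000.0)
limits = [setter.update(flag) for flag in block_is_short]
for frame_no, limit in enumerate(limits, start=1):
    print(f"frame {frame_no}: lower limit {limit} Hz")
```

A bounded deque is used so that storing the newest flag automatically discards the flag of the frame that has left the L-frame window.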
[0078]
When short blocks are set as the conversion block length in consecutive frames, or are concentrated within a short period, it is conceivable that the amount of available bits ABIT is significantly reduced and that a spectrum is lost in quantization and coding. In that case, the intensity stereo band setting means 17B stores and holds the conversion block length for each of the L frames consecutive in time series and, based on the stored conversion block lengths, shifts the lower limit frequency of the partial frequency region for performing intensity stereo coding from f0 to f1 (f1 < f0). By increasing the range of the partial frequency region in which intensity stereo coding is performed, coding efficiency is improved, and quantization and coding can be performed with a small number of bits. In this way, it is possible to reduce the possibility that a spectrum is lost due to quantization and encoding, and to prevent deterioration of sound quality.
[0079]
Note that the series of encoding processes described in the second embodiment can be realized on a computer or a digital signal processor (DSP) using a software programming language.
[0080]
[Effect of the Invention]
According to the apparatus and method for encoding a digital audio signal of the present invention, the range of the partial frequency region in which intensity stereo coding is performed is adaptively changed based on the amount of bits usable for quantization and encoding and on the conversion block length of the time / frequency conversion. This makes it possible to prevent a spectrum from being lost due to quantization and coding even when the amount of usable bits is significantly reduced. In addition, it prevents the increase in perceived distortion and the deterioration in sound quality caused by missing spectra.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a digital audio signal encoding device according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram (part 1) illustrating an effect of improving coding efficiency by intensity stereo coding according to the first embodiment.
FIG. 3 is an explanatory diagram (part 2) illustrating an effect of improving coding efficiency by intensity stereo coding according to the first embodiment.
FIG. 4 is a time chart showing a change in a setting range of a partial frequency band for performing intensity stereo coding for each frame in the first embodiment.
FIG. 5 is a block diagram illustrating a configuration of a digital audio signal encoding device according to Embodiment 2 of the present invention.
FIG. 6 is a time chart showing a change in a setting range of a partial frequency band for performing intensity stereo coding for each frame in the second embodiment.
FIG. 7 is a block diagram showing a configuration of a conventional digital audio signal encoding device.
FIG. 8 is an explanatory diagram (part 1) illustrating an effect of improving coding efficiency by conventional intensity stereo coding.
FIG. 9 is an explanatory diagram (part 2) illustrating an effect of improving coding efficiency by conventional intensity stereo coding.
FIG. 10 is a block diagram illustrating a configuration of a decoding device that decodes an encoded bit sequence generated using intensity stereo encoding.
FIG. 11 is an explanatory diagram showing a spectrum of each block in a decoding device that decodes a coded bit sequence generated using intensity stereo coding.
[Explanation of symbols]
10, 11 time / frequency conversion means
12, 13 conversion block length setting means
14 intensity stereo processing means
15 quantization and coding means
16 bit reservoir control means
17A, 17B intensity stereo band setting means
61 decoding and inverse quantization means
62 intensity stereo processing means
63, 64 frequency / time conversion means

Claims

A digital audio signal encoding device that converts a digital audio signal of N channels (N is a positive integer of 2 or more) into a spectrum on the frequency axis for each frame grouping n time-series samples (n is a positive integer), quantizes the spectrum, encodes it by entropy coding, and multiplexes the encoded information to generate and output an encoded bit sequence, performing intensity stereo coding on the spectrum to reduce the amount of information, the device comprising:
time / frequency conversion means for converting the time-series samples of each channel into a frequency-domain spectrum;
conversion block length setting means for setting, when the time / frequency conversion means converts the time-series samples into a frequency-domain spectrum in units of sub-blocks, the number m of samples (m is a positive integer) to one of m = n and m < n;
intensity stereo processing means for performing intensity stereo coding on a partial frequency region of the N-channel spectrum;
quantization and coding means for quantizing the spectrum output from the intensity stereo processing means, encoding it based on the available bit amount for each frame, and multiplexing coding auxiliary information to generate an encoded bit sequence;
bit reservoir control means for storing the surplus bit amount of each frame in a bit reservoir and setting the available bit amount for each frame using the bit amount stored in the bit reservoir and the output bit rate; and
intensity stereo band setting means for increasing the partial frequency region in which intensity stereo coding is performed when the available bit amount per frame is smaller than a predetermined threshold value ATH (ATH is a positive integer).

The digital audio signal encoding device according to claim 1, wherein the intensity stereo band setting means,
taking as band control target frames those frames, among L frames consecutive in time series (L is a positive integer of 2 or more), whose available bit amount is smaller than the predetermined threshold value ATH (ATH is a positive integer), increases the partial frequency region in which intensity stereo coding is performed when the number of band control target frames is larger than a predetermined threshold value BTH (BTH is a positive integer equal to or less than L).

The digital audio signal encoding device according to claim 1, wherein the intensity stereo band setting means
stores and holds the conversion block lengths set by the conversion block length setting means for L frames consecutive in time series (L is a positive integer of 2 or more), and increases the partial frequency region in which intensity stereo coding is performed when, among the L stored conversion block lengths, the number of frames converted in sub-blocks with m < n is larger than a predetermined threshold value CTH (CTH is a positive integer equal to or less than L).

A method for encoding a digital audio signal that converts a digital audio signal of N channels (N is a positive integer of 2 or more) into a spectrum on the frequency axis for each frame grouping n time-series samples (n is a positive integer), quantizes the spectrum, encodes it by entropy coding, and multiplexes the encoded information to generate and output an encoded bit sequence, performing intensity stereo coding on the spectrum to reduce the amount of information, the method comprising:
setting, when the time-series samples of each channel are converted into a frequency-domain spectrum in units of sub-blocks, the number m of samples to be converted (m is a positive integer) to one of m = n and m < n;
performing intensity stereo coding on a partial frequency region of the N-channel spectrum;
quantizing the spectrum output by the intensity stereo processing, encoding it based on the available bit amount for each frame, and multiplexing coding auxiliary information to generate an encoded bit sequence;
storing the surplus bit amount of each frame in a bit reservoir and setting the available bit amount for each frame using the bit amount stored in the bit reservoir and the output bit rate; and
increasing the partial frequency region in which intensity stereo coding is performed when the available bit amount per frame is smaller than a predetermined threshold value ATH (ATH is a positive integer).

The method for encoding a digital audio signal according to claim 4, wherein, in the band setting for the intensity stereo coding,
taking as band control target frames those frames, among L frames consecutive in time series (L is a positive integer of 2 or more), whose available bit amount is smaller than the predetermined threshold value ATH (ATH is a positive integer), the partial frequency region in which intensity stereo coding is performed is increased when the number of band control target frames is larger than a predetermined threshold value BTH (BTH is a positive integer equal to or less than L).

The method for encoding a digital audio signal according to claim 4, wherein, in the band setting for the intensity stereo coding,
the conversion block lengths of L frames consecutive in time series (L is a positive integer of 2 or more) are stored and held, and the partial frequency region in which intensity stereo coding is performed is increased when, among the L stored conversion block lengths, the number of frames converted in sub-blocks with m < n is larger than a predetermined threshold value CTH (CTH is a positive integer equal to or less than L).