JP2006504993A

JP2006504993A - Digital audio encoding method and apparatus using improved psychoacoustic model

Info

Publication number: JP2006504993A
Application number: JP2004548132A
Authority: JP
Inventors: マシュー，マニュ
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2002-10-30
Filing date: 2003-10-24
Publication date: 2006-02-09
Also published as: WO2004040554A1; EP1556856A1; EP1556856A4; AU2003272128A1

Abstract

The present invention relates to a digital audio encoding method using an improved psychoacoustic model. The audio data encoding method according to the present invention includes: determining a window type according to characteristics of an input audio signal; generating a CMDCT spectrum from the input audio signal according to the determined window type; generating an FFT spectrum from the input audio signal using the determined window type; and performing psychoacoustic model analysis using the generated CMDCT spectrum and FFT spectrum.

Description

The present invention relates to an encoding method and apparatus for digital audio encoding and, more particularly, to an encoding method and apparatus that use an improved psychoacoustic model to reduce the amount of computation and the complexity of encoding without degrading sound quality.

An MPEG audio encoder achieves a high compression ratio while keeping the quantization noise generated during encoding imperceptible to the listener. The MPEG-1 audio encoder standardized by MPEG encodes audio signals at bit rates from 32 kbps to 448 kbps. The MPEG-1 audio standard has three different algorithms for encoding.

The MPEG-1 encoder has three modes: Layers 1, 2, and 3. Layer 1 implements the most basic algorithm, and Layers 2 and 3 are refinements of Layer 1. The higher the layer, the higher the quality and compression ratio that are achieved, but the larger the hardware becomes.

To reduce the perceptual redundancy of a signal, an MPEG audio encoder uses a psychoacoustic model that closely reflects human auditory characteristics. MPEG-1 and MPEG-2, both standardized by MPEG, adopt a perceptual coding scheme that uses a psychoacoustic model to reflect human perceptual characteristics and remove perceptual redundancy, so that good sound quality is maintained even after encoding.

Perceptual coding is a technique derived from analysis of a human psychoacoustic model, and it exploits the threshold of audibility and the masking effect. The masking effect is the phenomenon in which a quiet sound below a certain threshold is rendered inaudible by a loud sound; masking between signals that are present at the same time is also called frequency masking. The threshold below which a sound is masked varies with the frequency band.

A psychoacoustic model can be used to determine the maximum inaudible noise level in each subband of the filter bank. This per-subband noise level, that is, the masking threshold, can then be used to obtain the signal-to-mask ratio (SMR) for each subband.
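As a minimal illustration of the SMR described above, the sketch below computes a per-subband SMR in decibels as the ratio of signal energy to masking threshold. The function name `smr_db` and the numeric values are hypothetical and chosen only for illustration; the patent text does not give a formula here, so the usual dB definition is assumed.

```python
import math


def smr_db(signal_energy, masking_threshold):
    """Signal-to-mask ratio in dB for one subband (inputs are linear energies)."""
    return 10.0 * math.log10(signal_energy / masking_threshold)


# Hypothetical per-subband signal energies and masking thresholds.
energies = [1.0e6, 2.5e4, 3.0e2]
thresholds = [1.0e3, 2.5e4, 3.0e3]

smr = [smr_db(e, t) for e, t in zip(energies, thresholds)]
```

A positive SMR means the signal rises above the masking threshold in that subband, so quantization noise there needs more bits to stay inaudible; a negative SMR means the signal itself lies below the threshold.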

An encoding method using a psychoacoustic model is disclosed in US Pat. No. 6,092,041, assigned to Motorola, Inc. and entitled “System and method of encoding and decoding a layered bitstream by re-applying psychoacoustic analysis in the decoder”.

FIG. 1 shows a general MPEG audio encoder. Here, an MPEG-1 Layer 3 encoder, that is, an MP3 audio encoder, is described as an example of an MPEG audio encoder.

The MP3 audio encoder includes a filter bank 110, a modified discrete cosine transform (MDCT) unit 120, a fast Fourier transform (FFT) unit 130, a psychoacoustic model unit 140, a quantization and Huffman encoding unit 150, and a bitstream formatting unit 160.

To remove statistical redundancy from the audio signal, the filter bank 110 subdivides the input time-domain audio signal into 32 frequency-domain subbands.

To improve frequency resolution, the MDCT unit 120 uses the window switching information input from the psychoacoustic model unit 140 to divide the subbands produced by the filter bank 110 into finer frequency bands. For example, when the window switching information indicates a long window, a 36-point MDCT is used to divide the frequency band more finely than the 32 subbands; when it indicates a short window, a 12-point MDCT is used instead.

The FFT unit 130 converts the input audio signal into a frequency-domain spectrum and outputs the spectrum to the psychoacoustic model unit 140.

To remove perceptual redundancy arising from human auditory characteristics, the psychoacoustic model unit 140 uses the frequency spectrum output from the FFT unit 130 to determine the masking threshold, which is the inaudible noise level for each subband, and hence the SMR. The SMR value determined by the psychoacoustic model unit 140 is input to the quantization and Huffman encoding unit 150.

The psychoacoustic model unit 140 also calculates perceptual energy to decide whether to switch windows, and outputs window switching information to the MDCT unit 120.

Based on the SMR value input from the psychoacoustic model unit 140, the quantization and Huffman encoding unit 150 performs bit allocation to remove perceptual redundancy, and quantization for audio encoding, on the MDCT-transformed frequency-domain data input from the MDCT unit 120.

The bitstream formatting unit 160 formats the encoded audio signal input from the quantization and Huffman encoding unit 150 into the bitstream defined by MPEG and outputs it.

As described above, the conventional psychoacoustic model shown in FIG. 1 uses the FFT spectrum obtained from the input audio signal to calculate the masking threshold. However, the filter bank introduces aliasing, and the values obtained from the aliased components are what the quantization step uses. Consequently, when the psychoacoustic model derives the SMR from the FFT spectrum and that SMR is used in the quantization step, an optimal result cannot be obtained.

The present invention addresses the above problem. Its object is to provide a digital audio encoding method and apparatus that use a modified psychoacoustic model to improve the sound quality of the output audio stream and to reduce the amount of computation in the digital audio encoding step, compared with a conventional MPEG audio encoder.

To solve the above problem, a digital audio encoding method according to the present invention includes: determining a window type according to characteristics of an input audio signal; generating a CMDCT (Complex Modified Discrete Cosine Transform) spectrum from the input audio signal according to the determined window type; generating an FFT spectrum from the input audio signal using the determined window type; and performing psychoacoustic model analysis using the generated CMDCT spectrum and FFT spectrum.

In a more preferable digital audio encoding method according to the present invention, when the determined window type is a long window, a long window is applied to generate a long CMDCT spectrum, a short window is applied to generate a short FFT spectrum, and psychoacoustic model analysis is performed based on the generated long CMDCT spectrum and short FFT spectrum.

To solve the above problem, a digital audio encoding apparatus according to the present invention includes: a window switching unit that determines a window type according to characteristics of an input audio signal; a CMDCT unit that generates a CMDCT spectrum from the input audio signal according to the window type determined by the window switching unit; an FFT unit that generates an FFT spectrum from the input audio signal using the window type determined by the window switching unit; and a psychoacoustic model unit that performs psychoacoustic model analysis using the CMDCT spectrum generated by the CMDCT unit and the FFT spectrum generated by the FFT unit.

In a more preferable digital audio encoding apparatus according to the present invention, when the window type determined by the window switching unit is a long window, the CMDCT unit applies a long window to generate a long CMDCT spectrum, the FFT unit applies a short window to generate a short FFT spectrum, and the psychoacoustic model unit performs psychoacoustic model analysis based on the long CMDCT spectrum generated by the CMDCT unit and the short FFT spectrum generated by the FFT unit.

To solve the above problem, a digital audio encoding method according to the present invention includes: generating a CMDCT spectrum from an input audio signal; and performing psychoacoustic model analysis using the generated CMDCT spectrum.

A more preferable digital audio encoding method according to the present invention further includes performing the CMDCT on the input audio signal with a long window and a short window applied, to generate a long CMDCT spectrum and a short CMDCT spectrum.

In a more preferable digital audio encoding method according to the present invention, psychoacoustic model analysis is performed using the generated long CMDCT spectrum and short CMDCT spectrum.

In a more preferable digital audio encoding method according to the present invention, when the determined window type is a long window, the long MDCT spectrum is quantized and encoded based on the result of the psychoacoustic model analysis, and when the determined window type is a short window, the short MDCT spectrum is quantized and encoded based on the result of the psychoacoustic model analysis.

To solve the above problem, a digital audio encoding apparatus according to the present invention includes: a CMDCT unit that generates a CMDCT spectrum from an input audio signal; and a psychoacoustic model unit that performs psychoacoustic model analysis using the CMDCT spectrum generated by the CMDCT unit.

In a more preferable digital audio encoding apparatus according to the present invention, the CMDCT unit performs the CMDCT on the input audio signal with a long window and a short window applied, to generate a long CMDCT spectrum and a short CMDCT spectrum.

In a more preferable digital audio encoding apparatus according to the present invention, the psychoacoustic model unit performs psychoacoustic model analysis using the long CMDCT spectrum and the short CMDCT spectrum generated by the CMDCT unit.

A more preferable digital audio encoding apparatus according to the present invention further includes a quantization and encoding unit. When the determined window type is a long window, the quantization and encoding unit quantizes and encodes the long MDCT spectrum based on the result of the psychoacoustic model analysis; when the determined window type is a short window, it quantizes and encodes the short MDCT spectrum based on the result of the psychoacoustic model analysis.

An MPEG audio encoder requires a very large amount of computation, which makes it difficult to apply to real-time processing. The encoding algorithm can be simplified by lowering the quality of the output audio, but reducing the amount of computation without degrading sound quality is very difficult.

In addition, the filter bank used in a conventional MPEG audio encoder produces aliasing. Because the values obtained from the aliased components are used in the quantization step, it is desirable to apply the psychoacoustic model to the spectrum in which the aliasing has occurred.

Also, as expressed in equation (2) below, the MDCT spectrum gives the magnitude and phase at the frequencies 2π(k + 0.5)/N, k = 0, 1, ..., N/2 - 1. It is therefore desirable to compute the spectrum at these frequencies and apply the psychoacoustic model to it.

Furthermore, by applying the CMDCT to the output of the filter bank to compute the spectrum of the input signal and applying the psychoacoustic model to that spectrum, the computation required for the FFT can be reduced, or the FFT can be omitted altogether, compared with a conventional MPEG audio encoder.

The present invention is based on these observations. The audio encoding method and apparatus according to the present invention can reduce the complexity of an MPEG audio encoding processor without degrading the sound quality of the output MPEG audio stream.

The algorithm used in the present invention is described in detail below with reference to equations (1) to (4).

The filter bank divides the input signal at a resolution of π/32. As described below, the spectrum of the input signal can be computed by applying the CMDCT to the output values of the filter bank. The transform length in this case is much shorter than when the CMDCT is applied directly to the input signal without using the filter bank output. Applying such short transforms to the filter bank output has the advantage of requiring less computation than using a long transform.

The CMDCT can be calculated by the following equation (1).

[Equation (1)]

where k = 0, 1, 2, ..., N/2 - 1.

Here, X_C(k) is the MDCT and X_S(k) is the MDST (Modified Discrete Sine Transform). The following derivation explains the relationship between the CMDCT and the FFT.

[Equation]

where

[Equation]

and k = 0, 1, ..., N/2 - 1. Similarly to the MDCT, the MDST is given by

[Equation]

where k = 0, 1, ..., N/2 - 1.

Also, as in the following equation (4), if

[Equation]

denotes the complex conjugate of the CMDCT, then

[Equation (4)]

where

[Equation]

and k = 0, 1, 2, ..., N/2 - 1.

As equation (4) shows, the complex conjugate of the CMDCT computes the spectrum at frequencies between those of the DFT spectrum, that is, at the frequencies 2π(k + 0.5)/N, k = 0, 1, ..., N/2 - 1.

The phase of the CMDCT is a shifted version of the phase of X′(k), and this phase shift does not affect the calculation of the unpredictability measure in the MPEG-1 Layer 3 psychoacoustic model.
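The relationship between the CMDCT and a DFT evaluated at the half-bin frequencies 2π(k + 0.5)/N can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the commonly used MDCT and MDST definitions with a phase offset n0 = (N/2 + 1)/2 and no window, which may differ in detail from the exact form of equations (1) to (4). The names `cmdct`, `half_bin_dft`, and `phase_shift` are hypothetical.

```python
import cmath
import math


def cmdct(x):
    """Return X_C(k) + j*X_S(k) for k = 0..N/2-1 (x is assumed already windowed)."""
    n = len(x)
    n0 = (n / 2 + 1) / 2  # assumed MDCT phase offset
    out = []
    for k in range(n // 2):
        w = 2 * math.pi * (k + 0.5) / n  # bin frequency 2*pi*(k + 0.5)/N
        xc = sum(v * math.cos(w * (i + n0)) for i, v in enumerate(x))  # MDCT
        xs = sum(v * math.sin(w * (i + n0)) for i, v in enumerate(x))  # MDST
        out.append(complex(xc, xs))
    return out


def half_bin_dft(x):
    """X'(k): DFT evaluated halfway between the usual DFT bin frequencies."""
    n = len(x)
    return [
        sum(v * cmath.exp(-1j * 2 * math.pi * (k + 0.5) * i / n)
            for i, v in enumerate(x))
        for k in range(n // 2)
    ]


def phase_shift(n, k):
    """Phase factor relating the conjugate CMDCT to X'(k) under the assumed n0."""
    n0 = (n / 2 + 1) / 2
    return cmath.exp(-1j * 2 * math.pi * (k + 0.5) * n0 / n)


# Arbitrary 12-sample test block (values are illustrative only).
x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 1.7, -0.9, 0.4, 0.0, -1.1, 0.6]
X = cmdct(x)
Xp = half_bin_dft(x)

# conj(CMDCT) should equal X'(k) multiplied by a pure phase factor,
# so the magnitudes of the two spectra should agree.
max_err = max(abs(X[k].conjugate() - phase_shift(len(x), k) * Xp[k])
              for k in range(len(X)))
max_mag_err = max(abs(abs(X[k]) - abs(Xp[k])) for k in range(len(X)))
```

Because the two spectra differ only by a unit-magnitude phase factor in this sketch, their magnitudes coincide, which is consistent with using the CMDCT spectrum in place of an FFT spectrum for the magnitude-based parts of the analysis.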

With this in mind, the psychoacoustic model according to the present invention uses the CMDCT spectrum instead of the FFT spectrum when performing psychoacoustic model analysis, or uses the long CMDCT spectrum or the short CMDCT spectrum in place of the long FFT spectrum or the short FFT spectrum. This reduces the amount of computation spent on the FFT.

The present invention is described in detail below on the basis of embodiments.

FIG. 2 is a block diagram of an audio encoding apparatus according to an embodiment of the present invention.

To remove statistical redundancy from the input audio signal, the filter bank 210 divides the input time-domain audio signal into frequency-domain subbands. In this embodiment, the signal is divided into 32 subbands, each with a bandwidth of π/32. Although this embodiment uses a 32-band polyphase filter bank, another filter capable of subband coding may optionally be used.

The window switching unit 220 determines the window type to be used by the CMDCT unit 230 and the FFT unit 240 based on the characteristics of the input audio signal, and supplies information about the determined window type to the CMDCT unit 230 and the FFT unit 240.

The window types include a short window and a long window. MPEG-1 Layer 3 defines a long window, a start window, a short window, and a stop window. The start window and the stop window are used for the transitions between the long window and the short window. Although this embodiment is described using the window types defined in MPEG-1 as an example, the window switching algorithm may optionally be carried out with other window types. The window switching algorithm according to the present invention is described in detail later with reference to FIGS. 3 and 4.

Based on the window type information input from the window switching unit 220, the CMDCT unit 230 performs the CMDCT by applying a long window or a short window to the output data of the filter bank 210.

The real part of the CMDCT calculated by the CMDCT unit 230, that is, the MDCT value, is input to the quantization and encoding unit 260. The CMDCT unit 230 also adds the calculated subband spectra to obtain the full spectrum and sends the full spectrum to the psychoacoustic model unit 250. The process of obtaining the full spectrum from the subband spectra is described later with reference to FIG. 5.

Optionally, the LAME algorithm may be used for fast MDCT execution. In the LAME algorithm, the MDCT is optimized by expanding equation (1). By exploiting the symmetry of the trigonometric coefficients involved in the calculation, repeated multiplications by the same coefficient are replaced with additions. This reduces the operation count to 244 multiplications and 324 additions and cuts the MDCT time for a 36-point MDCT by about 70%. The same algorithm can also be applied to the MDST.

Based on the window type information from the window switching unit 220, the FFT unit 240 performs an FFT on the input audio signal using a long window or a short window, and outputs the resulting long FFT spectrum or short FFT spectrum to the psychoacoustic model unit 250. When the window type used by the CMDCT unit 230 is a long window, the FFT unit 240 uses a short window. That is, when the output of the CMDCT unit 230 is a long CMDCT spectrum, the output of the FFT unit 240 is a short FFT spectrum. Likewise, when the output of the CMDCT unit 230 is a short CMDCT spectrum, the output of the FFT unit 240 is a long FFT spectrum.

The psychoacoustic model unit 250 combines the CMDCT spectrum from the CMDCT unit 230 with the FFT spectrum from the FFT unit 240 to calculate the unpredictability measure used in the psychoacoustic model.

For example, when a long window is used in the CMDCT, the long spectrum is calculated from the results of the long MDCT and the long MDST, and the short spectrum is calculated using the FFT. Using the CMDCT spectrum calculated by the CMDCT unit 230 for the long spectrum relies on the fact that, as equations (3) and (4) show, the magnitudes of the FFT and the MDCT are similar.

Likewise, when a short window is used in the CMDCT, the short spectrum is calculated from the results of the short MDCT and the short MDST, and the long spectrum is calculated using the FFT spectrum.
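The complementary choice of windows described above can be summarized in a few lines. The sketch below is illustrative only; the function names and the string labels "long", "short", "CMDCT", and "FFT" are hypothetical and not taken from the patent.

```python
def fft_window_for(cmdct_window):
    """Return the FFT window length that complements the CMDCT window."""
    if cmdct_window == "long":
        return "short"
    if cmdct_window == "short":
        return "long"
    raise ValueError(f"unknown window type: {cmdct_window!r}")


def spectrum_sources(cmdct_window):
    """Map each spectrum length to the transform that supplies it."""
    return {
        cmdct_window: "CMDCT",               # computed from MDCT + MDST
        fft_window_for(cmdct_window): "FFT",  # computed from the input signal
    }


long_case = spectrum_sources("long")
short_case = spectrum_sources("short")
```

In either case the psychoacoustic model receives both a long and a short spectrum, but only one of them requires an FFT.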

The CMDCT spectrum calculated by the CMDCT unit 230 has a length of 1152 (32 subbands × 36 sub-subbands) when a long window is applied and a length of 384 (32 subbands × 12 sub-subbands) when a short window is applied. The psychoacoustic model unit 250, however, requires a spectrum of length 1024 or 256.

Therefore, before psychoacoustic model analysis is performed, the CMDCT spectrum is resampled by linear mapping from a length of 1152 (or 384) to a length of 1024 (or 256).
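The text does not specify the linear mapping in detail. One plausible reading, shown here purely as an assumption, is linear interpolation between neighbouring bins so that the first and last bins of the input map to the first and last bins of the output. The function name `resample_linear` is hypothetical.

```python
def resample_linear(spectrum, out_len):
    """Resample a spectrum to out_len bins by linear interpolation (assumed scheme)."""
    in_len = len(spectrum)
    if out_len == 1:
        return [spectrum[0]]
    scale = (in_len - 1) / (out_len - 1)  # input bins per output bin
    out = []
    for i in range(out_len):
        pos = i * scale                   # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, in_len - 1)      # clamp at the last input bin
        frac = pos - lo
        out.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out


# Illustrative input: a ramp, so the interpolated values are easy to verify.
long_in = [float(i) for i in range(1152)]
short_in = [float(i) for i in range(384)]

long_out = resample_linear(long_in, 1024)   # 1152 -> 1024
short_out = resample_linear(short_in, 256)  # 384 -> 256
```

Other mappings, such as nearest-bin selection or averaging, would also fit the description; the choice mainly affects how much smoothing is applied to the spectrum before the psychoacoustic analysis.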

The psychoacoustic model unit 250 then uses the calculated unpredictability measure to obtain the SMR value and outputs it to the quantization and encoding unit 260.

The quantization and encoding unit 260 determines the scale factor and, based on the SMR value calculated by the psychoacoustic model unit 250, the quantization coefficient. It performs quantization based on the determined quantization coefficient and Huffman-encodes the quantized data.

The bitstream formatting unit 270 converts the data input from the quantization and encoding unit 260 into a specific format and outputs it. When the audio encoding apparatus is an MPEG audio encoding apparatus, the data is converted into the format defined by the MPEG standard.

FIG. 3 illustrates the transient detection scheme used by the window switching algorithm of the window switching unit 220 of FIG. 2, which operates on the output of the filter bank.

According to the MPEG audio standard, the actual window type is determined from the window type of the current frame and the window switching flag of the next frame. The psychoacoustic model determines the window switching flag based on perceptual entropy. For this reason, the psychoacoustic model had to be run at least one frame ahead of the frame being processed by the filter bank and the MDCT.

In contrast, the psychoacoustic model according to the present invention uses the CMDCT spectrum, as described above, so the window type must be determined before the CMDCT is applied. For this reason, the window switching flag is determined from the output of the filter bank, and the filter bank and window switching operate one frame ahead of quantization and the psychoacoustic model.

As shown in FIG. 3, the input signal from the filter bank is divided into three time bands and two frequency bands, that is, six bands in total. In FIG. 3, the horizontal axis covers the 36 samples of each frame, divided into three time bands of 12 samples each. The vertical axis covers the 32 subbands of each frame, divided into two frequency bands of 16 subbands each. The 36 samples and 32 subbands correspond to an input of 1152 samples.

The hatched portions are those used for transient detection; for convenience, they are referred to as regions (1), (2), (3), and (4). If the energies of these regions are E1, E2, E3, and E4, then the energy ratio E1/E2 between regions (1) and (2) and the energy ratio E3/E4 between regions (3) and (4) are transient indicators that show whether a transient is present.

非遷移信号の場合、遷移表示子の値は、一定範囲内にある。したがって、遷移表示子が一定範囲を逸脱する場合、ウィンドウスイッチングアルゴリズムは、ショートウィンドウが必要であるということを表示する。 In the case of a non-transition signal, the value of the transition indicator is within a certain range. Thus, if the transition indicator deviates from a certain range, the window switching algorithm indicates that a short window is required.
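A minimal sketch of this decision, assuming the region energies E1 to E4 have already been computed. The bounds `LOW` and `HIGH` are placeholder values chosen purely for illustration, since the source does not state the actual range, and `EPS` is an added guard against division by zero.

```python
LOW, HIGH = 0.1, 10.0   # hypothetical bounds of the "certain range"
EPS = 1e-12             # guard for silent regions (not from the source)


def needs_short_window(e1, e2, e3, e4, low=LOW, high=HIGH):
    """Return True when either transition indicator leaves [low, high]."""
    indicators = ((e1 + EPS) / (e2 + EPS), (e3 + EPS) / (e4 + EPS))
    return any(r < low or r > high for r in indicators)
```

Checking both directions of the range catches a sudden rise in energy as well as a sudden drop.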

FIG. 4 is a flowchart of the window switching algorithm performed by the window switching unit 220 shown in FIG. 2.

ステップ４１０では、３２個のサブバンドと、各サブバンド当り３６個の出力サンプルとを有する１フレームのフィルタバンクの出力が入力される。 In step 410, the output of a 1-frame filter bank having 32 subbands and 36 output samples per subband is input.

In step 420, as shown in FIG. 3, the filter bank output is divided into three time bands of 12 samples each and two frequency bands of 16 subbands each.

ステップ４３０では、遷移信号を検出するために使われるバンドのエネルギーＥ１、Ｅ２、Ｅ３及びＥ４が計算される。 In step 430, the band energies E1, E2, E3 and E4 used to detect the transition signal are calculated.

In step 430, the calculated energies of neighboring bands are compared to determine whether the input signal contains a transition; that is, E1/E2 and E3/E4 are calculated.

In step 440, whether the input signal contains a transition is decided from the calculated energy ratios of the neighboring bands. If it does, a window switching flag indicating a short window is generated; otherwise, a window switching flag indicating a long window is generated.

ステップ４５０では、ステップ４４０で生成されたウィンドウスイッチングフラグと以前フレームで使われたウィンドウとに基づいて、実際適用されるウィンドウタイプを決定する。適用されるウィンドウタイプは、ＭＰＥＧ−１標準で使われている“ショート”、“ロングストップ”、“ロングスタート”、または“ロング”のうち何れか一つでありうる。 In step 450, a window type to be actually applied is determined based on the window switching flag generated in step 440 and the window used in the previous frame. The applied window type may be any one of “short”, “long stop”, “long start”, and “long” used in the MPEG-1 standard.
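The source does not spell out the rule that combines the flag with the previous window. The sketch below assumes the usual Layer III block-type sequence, in which a long window passes through a long-start window before short windows and through a long-stop window afterwards. The transition table and the name `next_window_type` are assumptions for illustration.

```python
# (previous window, short window requested?) -> window to apply now
NEXT_WINDOW = {
    ("long", False): "long",
    ("long", True): "long start",
    ("long start", False): "short",
    ("long start", True): "short",
    ("short", False): "long stop",
    ("short", True): "short",
    ("long stop", False): "long",
    ("long stop", True): "long start",
}


def next_window_type(previous, short_flag):
    """Pick the window type from the previous window and the new flag."""
    return NEXT_WINDOW[(previous, bool(short_flag))]


# Walk a short flag sequence starting from a long window.
window = "long"
history = []
for flag in [False, True, False, False, False]:
    window = next_window_type(window, flag)
    history.append(window)
```

A lookup table keeps every allowed transition visible in one place, which makes it easy to compare against whatever rule an actual encoder uses.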

図５は、本発明によるサブバンドスペクトルから全体スペクトルを求める方法を示す図である。以下では、図５を参照して、サブバンドフィルタバンクの出力から計算されたスペクトルから信号スペクトルを近似的に計算するための方法を説明する。 FIG. 5 is a diagram illustrating a method for obtaining an entire spectrum from a subband spectrum according to the present invention. In the following, with reference to FIG. 5, a method for approximately calculating the signal spectrum from the spectrum calculated from the output of the subband filter bank will be described.

As shown in FIG. 5, the input signal is filtered by analysis filters H_0(Z), H_1(Z), H_2(Z), ..., H_{M-1}(Z) and downsampled. The downsampled signals y_0(n), y_1(n), y_2(n), ..., y_{M-1}(n) are then upsampled, filtered by synthesis filters G_0(Z), G_1(Z), G_2(Z), ..., G_{M-1}(Z), and summed to reconstruct the signal.

In the frequency domain, this process corresponds to replicating each band's spectrum, multiplying it by the frequency response of the corresponding filter, and summing the spectra of all bands. If the filters are ideal, the result is therefore identical to the sum of the spectra Y_m(k) of the individual bands, which yields the FFT spectrum of the input. When the filters are only close to ideal, an approximate spectrum is still obtained, and the psychoacoustic model according to the present invention makes use of this.
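Written as a formula, with X(k) denoting the FFT spectrum of the input signal (a symbol introduced here for illustration; the source names only Y_m(k)) and M the number of bands, the relation is roughly:

```latex
X(k) \approx \sum_{m=0}^{M-1} Y_m(k)
```

Equality holds in the ideal-filter case; for filters that are only close to ideal, the sum is an approximation.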

Experiments showed that, even though the filter bank used in MPEG-1 Layer 3 does not consist of ideal band-pass filters, the spectrum obtained by this method is similar to the actual spectrum.

このように、入力信号のスペクトルは、全ての帯域でのＣＭＤＣＴスペクトルを加算することによって得ることができる。ＣＭＤＣＴを使用して得られたスペクトルは、１１５２ポイントである一方、心理音響モデルに必要なスペクトルは、１０２４ポイントである。したがって、ＣＭＤＣＴスペクトルは、簡単な線形マッピングを使用して再サンプリングされた後、心理音響モデルで使用されうる。 Thus, the spectrum of the input signal can be obtained by adding the CMDCT spectrum in all bands. The spectrum obtained using CMDCT is 1152 points, while the spectrum required for the psychoacoustic model is 1024 points. Thus, the CMDCT spectrum can be used in a psychoacoustic model after being resampled using a simple linear mapping.
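The source says only that a simple linear mapping is used. One plausible reading is linear interpolation between neighboring bins, sketched below. The function name `resample_linear` and the choice to map the first and last bins onto each other are assumptions.

```python
def resample_linear(spectrum, out_len):
    """Resample a sequence of spectral values to out_len points
    by linear interpolation between neighboring input bins."""
    in_len = len(spectrum)
    if in_len < 2 or out_len < 2:
        raise ValueError("need at least two points on each side")
    scale = (in_len - 1) / (out_len - 1)    # endpoints map onto endpoints
    out = []
    for i in range(out_len):
        pos = i * scale                     # fractional input position
        lo = min(int(pos), in_len - 2)      # left neighbor, kept in range
        frac = pos - lo
        out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return out


# Example: a 1152-point ramp resampled to 1024 points.
ramp = [float(i) for i in range(1152)]
resampled = resample_linear(ramp, 1024)
```

The same function works on complex values, since it uses only addition and scaling.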

図６は、本発明のさらに他の実施形態によるオーディオ符号化方法を示すフローチャートである。 FIG. 6 is a flowchart illustrating an audio encoding method according to another embodiment of the present invention.

In step 610, the filter bank receives the audio signal and divides the input time-domain audio signal into frequency-domain subbands in order to remove its statistical redundancy.

ステップ６２０では、入力オーディオ信号の特性に基づいて、ウィンドウタイプを決定する。入力信号が遷移信号である場合には、ステップ６３０に進み、入力信号が遷移信号ではない場合には、ステップ６４０に進む。 In step 620, a window type is determined based on the characteristics of the input audio signal. If the input signal is a transition signal, the process proceeds to step 630. If the input signal is not a transition signal, the process proceeds to step 640.

ステップ６３０では、ステップ６１０で処理されたオーディオデータについて、ショートウィンドウを適用してショートＣＭＤＣＴを行い、それと同時に、ロングウィンドウを適用してロングＦＦＴを行う。この結果、ショートＣＭＤＣＴスペクトル及びロングＦＦＴスペクトルを得る。 In step 630, a short window is applied to the audio data processed in step 610 to perform short CMDCT, and at the same time, a long window is applied to perform long FFT. As a result, a short CMDCT spectrum and a long FFT spectrum are obtained.

ステップ６４０では、ステップ６１０で処理されたオーディオデータについて、ロングウィンドウを適用してロングＣＭＤＣＴを行い、それと同時に、ショートウィンドウを適用してショートＦＦＴを行う。この結果、ロングＣＭＤＣＴスペクトル及びショートＦＦＴスペクトルを得る。 In step 640, a long window is applied to the audio data processed in step 610 to perform a long CMDCT, and at the same time, a short window is applied to perform a short FFT. As a result, a long CMDCT spectrum and a short FFT spectrum are obtained.

ステップ６５０では、ステップ６２０で決定されたウィンドウタイプがショートウィンドウである場合には、ステップ６３０で得られたショートＣＭＤＣＴスペクトル及びロングＦＦＴスペクトルを利用して、心理音響モデルで使われる非予測度を計算し、ステップ６２０で決定されたウィンドウタイプがロングウィンドウである場合には、ステップ６４０で得られたロングＣＭＤＣＴスペクトル及びショートＦＦＴスペクトルを利用して、非予測度を計算する。また、計算された非予測度に基づいて、ＳＭＲ値を計算する。 In step 650, when the window type determined in step 620 is a short window, the non-prediction degree used in the psychoacoustic model is calculated using the short CMDCT spectrum and the long FFT spectrum obtained in step 630. If the window type determined in step 620 is a long window, the non-prediction degree is calculated using the long CMDCT spectrum and the short FFT spectrum obtained in step 640. Further, the SMR value is calculated based on the calculated non-prediction degree.
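The pairing in steps 620 to 650 can be summarized in a few lines: the CMDCT follows the chosen window type, while the FFT supplies the other resolution. The function name and the string labels below are hypothetical and serve only to make the branching explicit.

```python
def spectra_for_psychoacoustic_model(is_transition):
    """Return the (CMDCT, FFT) spectrum kinds passed to step 650."""
    if is_transition:
        # Step 630: short window chosen, long FFT computed alongside.
        return ("short CMDCT", "long FFT")
    # Step 640: long window chosen, short FFT computed alongside.
    return ("long CMDCT", "short FFT")
```

Either way the psychoacoustic model receives one long and one short spectrum, and only one of them needs a separate FFT.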

ステップ６６０では、ステップ６１０で得られたオーディオデータについて、ステップ６５０で計算されたＳＭＲ値によって量子化を行い、量子化されたデータについてハフマン符号化を行う。 In step 660, the audio data obtained in step 610 is quantized using the SMR value calculated in step 650, and the quantized data is subjected to Huffman coding.

ステップ６７０では、ステップ６６０で符号化されたデータを特定フォーマットに変換して出力する。前記オーディオ符号化方法がＭＰＥＧオーディオ符号化方法である場合には、ＭＰＥＧ標準で定めたフォーマットに変換して出力する。 In step 670, the data encoded in step 660 is converted into a specific format and output. If the audio encoding method is an MPEG audio encoding method, the audio encoding method is converted into a format defined by the MPEG standard and output.

図７は、本発明のさらに他の実施形態によるオーディオ符号化器を説明する図である。図７に示されたオーディオ符号化器は、フィルタバンク部７１０、ウィンドウスイッチング部７２０、ＣＭＤＣＴ部７３０、心理音響モデル部７４０、量子化及び符号化部７５０及びビットストリームフォーマッティング部７６０で形成される。 FIG. 7 is a diagram illustrating an audio encoder according to still another embodiment of the present invention. The audio encoder shown in FIG. 7 includes a filter bank unit 710, a window switching unit 720, a CMDCT unit 730, a psychoacoustic model unit 740, a quantization and coding unit 750, and a bit stream formatting unit 760.

ここで、フィルタバンク部７１０、量子化及び符号化部７５０、及びビットストリームフォーマッティング部７６０は、図２のフィルタバンク部２１０、量子化及び符号化部２６０及びビットストリームフォーマッティング部２７０と類似した機能を行うので、説明の簡単のために、詳細な説明は省略する。 Here, the filter bank unit 710, the quantization and encoding unit 750, and the bit stream formatting unit 760 have functions similar to the filter bank unit 210, the quantization and encoding unit 260, and the bit stream formatting unit 270 of FIG. For the sake of simplicity, detailed description is omitted.

ウィンドウスイッチング部７２０は、入力オーディオ信号の特性に基づいて、ＣＭＤＣＴ部７３０で使われるウィンドウタイプを決定し、決定されたウィンドウタイプ情報をＣＭＤＣＴ部７３０に伝送する。 The window switching unit 720 determines a window type used in the CMDCT unit 730 based on the characteristics of the input audio signal, and transmits the determined window type information to the CMDCT unit 730.

The CMDCT unit 730 calculates both a long CMDCT spectrum and a short CMDCT spectrum. In this embodiment, the long CMDCT spectrum used by the psychoacoustic model unit 740 is obtained by performing 36-point CMDCTs, adding all of the results, and resampling the resulting 1152-length spectrum to a 1024-length spectrum. Likewise, the short CMDCT spectrum used by the psychoacoustic model unit 740 is obtained by performing 12-point CMDCTs, adding all of the results, and resampling the resulting 384-length spectrum to a 256-length spectrum.
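These lengths are consistent with 32 subbands each contributing 36 (long) or 12 (short) points, although the source does not state this breakdown explicitly, so it is treated here as an assumption. A quick check of the arithmetic and of the resulting resampling ratios:

```python
from fractions import Fraction

SUBBANDS = 32  # assumed: one CMDCT per filter bank subband


def combined_length(points_per_subband, subbands=SUBBANDS):
    """Length of the spectrum built from all subband CMDCTs."""
    return points_per_subband * subbands


long_len = combined_length(36)     # long CMDCT spectrum length
short_len = combined_length(12)    # short CMDCT spectrum length

long_ratio = Fraction(1024, long_len)    # resampling ratio, long spectrum
short_ratio = Fraction(256, short_len)   # resampling ratio, short spectrum
```

The two ratios differ, so the long and short spectra cannot share a single resampling table.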

ＣＭＤＣＴ部７３０は、計算されたロングＣＭＤＣＴスペクトル及びショートＣＭＤＣＴスペクトルを心理音響モデル部７４０に出力する。また、ＣＭＤＣＴ部７３０は、ウィンドウスイッチング部７２０から入力されたウィンドウタイプがロングウィンドウである場合には、ロングＭＤＣＴスペクトルを量子化及び符号化部７５０に入力し、入力されたウィンドウタイプがショートウィンドウである場合には、ショートＭＤＣＴスペクトルを量子化及び符号化部７５０に入力する。 The CMDCT unit 730 outputs the calculated long CMDCT spectrum and short CMDCT spectrum to the psychoacoustic model unit 740. The CMDCT unit 730 inputs a long MDCT spectrum to the quantization and encoding unit 750 when the window type input from the window switching unit 720 is a long window, and the input window type is a short window. In some cases, the short MDCT spectrum is input to the quantization and encoding unit 750.

心理音響モデル部７４０は、ＣＭＤＣＴ部７３０から伝送されたロングスペクトル及びショートスペクトルによって非予測度を計算し、計算された非予測度に基づいて、ＳＭＲ値を計算して、量子化及び符号化部７５０に伝送する。 The psychoacoustic model unit 740 calculates a non-prediction degree based on the long spectrum and the short spectrum transmitted from the CMDCT unit 730, calculates an SMR value based on the calculated non-prediction degree, and performs a quantization and encoding unit. 750.

量子化及び符号化部７５０は、ＣＭＤＣＴ部７３０から伝送されたロングＭＤＣＴスペクトル及びショートＭＤＣＴスペクトルと、心理音響モデル部から入力されたＳＭＲ情報に基づいて、スケールファクタ及び量子化係数を決定する。決定された量子化係数に基づいて、量子化を行い、量子化されたデータについてハフマン符号化を行う。 The quantization and coding unit 750 determines a scale factor and a quantization coefficient based on the long MDCT spectrum and the short MDCT spectrum transmitted from the CMDCT unit 730 and the SMR information input from the psychoacoustic model unit. Quantization is performed based on the determined quantization coefficient, and Huffman coding is performed on the quantized data.

ビットストリームフォーマッティング部７６０は、量子化及び符号化部７５０から入力されたデータを特定フォーマットに変換して出力する。前記オーディオ符号化装置がＭＰＥＧオーディオ符号化装置である場合には、ＭＰＥＧ標準で定めたフォーマットに変換して出力する。 The bit stream formatting unit 760 converts the data input from the quantization and encoding unit 750 into a specific format and outputs the data. If the audio encoding device is an MPEG audio encoding device, it is converted into a format defined by the MPEG standard and output.

図８は、本発明のさらに他の実施形態によるオーディオ符号化方法を示すフローチャートである。 FIG. 8 is a flowchart illustrating an audio encoding method according to another embodiment of the present invention.

ステップ８１０では、フィルタバンクでオーディオ信号を入力され、入力されたオーディオ信号の統計的な重複性を除去するために、入力された時間領域のオーディオ信号を周波数領域のサブバンドに分割する。 In step 810, an audio signal is input through a filter bank, and the input time-domain audio signal is divided into frequency-domain subbands in order to remove statistical redundancy of the input audio signal.

ステップ８２０では、入力オーディオ信号の特性に基づいて、ウィンドウタイプを決定する。 In step 820, the window type is determined based on the characteristics of the input audio signal.

ステップ８３０では、ステップ８１０で処理されたオーディオデータについて、ショートウィンドウを適用してショートＣＭＤＣＴを行い、それと同時に、ロングウィンドウを適用してロングＣＭＤＣＴを行う。この結果、ショートＣＭＤＣＴスペクトル及びロングＣＭＤＣＴスペクトルを得る。 In step 830, short CMDCT is performed on the audio data processed in step 810 by applying a short window, and at the same time, long CMDCT is performed by applying a long window. As a result, a short CMDCT spectrum and a long CMDCT spectrum are obtained.

ステップ８４０では、ステップ８３０で得られたショートＣＭＤＣＴスペクトル及びロングＣＭＤＣＴスペクトルを利用して、心理音響モデルで使われる非予測度を計算する。また、計算された非予測度に基づいて、ＳＭＲ値を計算する。 In step 840, the non-prediction level used in the psychoacoustic model is calculated using the short CMDCT spectrum and the long CMDCT spectrum obtained in step 830. Further, the SMR value is calculated based on the calculated non-prediction degree.

ステップ８５０では、ステップ８２０で決定されたウィンドウタイプがロングウィンドウである場合には、ステップ８３０で得られたスペクトルのうち、ロングＭＤＣＴ値を入力されて、これについて、ステップ８４０で計算されたＳＭＲ値によって量子化を行い、量子化されたデータについてハフマン符号化を行う。 In step 850, if the window type determined in step 820 is a long window, the long MDCT value of the spectrum obtained in step 830 is input, and the SMR value calculated in step 840 is calculated. Quantization is performed, and Huffman coding is performed on the quantized data.

ステップ８６０では、ステップ８５０で符号化されたデータを特定フォーマットに変換して出力する。前記オーディオ符号化装置がＭＰＥＧオーディオ符号化装置である場合には、ＭＰＥＧ標準で定めたフォーマットに変換して出力する。 In step 860, the data encoded in step 850 is converted into a specific format and output. If the audio encoding device is an MPEG audio encoding device, it is converted into a format defined by the MPEG standard and output.

本発明は、前述した実施形態に限定されず、本発明の思想内で当業者による変形が可能である。特に、本発明は、ＭＰＥＧ−１レイヤ３だけでなく、ＭＤＣＴ及び心理音響モデルを使用するＭＰＥＧ−２ＡＡＣ（アドバンストオーディオコーディング）、ＭＰＥＧ４、ＷＭＡ（ウインドウズメディアオーディオ）のような全てのオーディオ符号化装置及び方法に適用されうる。 The present invention is not limited to the above-described embodiments, and can be modified by those skilled in the art within the spirit of the present invention. In particular, the present invention applies not only to MPEG-1 layer 3, but also to all audio encoding devices such as MPEG-2 AAC (Advanced Audio Coding), MPEG4, WMA (Windows Media Audio) using MDCT and psychoacoustic models. And can be applied to methods.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. Computer-readable recording media include every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, hard disks, floppy (registered trademark) disks, flash memory, and optical data storage devices, as well as media embodied in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

By applying the improved psychoacoustic model according to the present invention and using the CMDCT spectrum in place of the FFT spectrum, the computation spent on the FFT and the complexity of the MPEG audio encoder can be reduced without degrading the sound quality of the output audio stream relative to the input audio signal.

FIG. 1 is a block diagram showing a conventional MPEG audio encoding apparatus.
FIG. 2 is a block diagram showing an MPEG audio encoding apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram showing the transition signal detection scheme used in the window switching algorithm according to the present invention.
FIG. 4 is a flowchart showing the window switching algorithm used in the present invention.
FIG. 5 is a diagram showing a method of obtaining the entire spectrum from subband spectra according to the present invention.
FIG. 6 is a flowchart showing an MPEG audio encoding method according to an embodiment of the present invention.
FIG. 7 is a block diagram showing an MPEG audio encoding apparatus according to an embodiment of the present invention.
FIG. 8 is a flowchart showing an MPEG audio encoding method according to an embodiment of the present invention.

Claims

A digital audio encoding method comprising:
(a) determining a window type according to characteristics of an input audio signal;
(b) generating a CMDCT spectrum from the input audio signal according to the determined window type;
(c) generating an FFT spectrum from the input audio signal using the determined window type; and
(d) performing a psychoacoustic model analysis using the generated CMDCT spectrum and FFT spectrum.

The method of claim 1, wherein step (a) further includes (a1) filtering the input audio signal to divide it into a plurality of subbands, and the determining of the window type is performed on the input audio signal divided into the subbands.

前記（ａ１）ステップは、多相フィルタバンクによって行われることを特徴とする請求項２に記載の方法。 The method of claim 2, wherein the step (a1) is performed by a polyphase filter bank.

前記（ａ）ステップで決定されたウィンドウタイプがロングウィンドウである場合、前記（ｂ）ステップでは、ロングウィンドウを適用してロングＣＭＤＣＴスペクトルを生成し、前記（ｃ）ステップでは、ショートウィンドウを適用してショートＦＦＴスペクトルを生成し、前記（ｄ）ステップでは、前記生成されたロングＣＭＤＣＴスペクトル及びショートＦＦＴスペクトルに基づいて、心理音響モデルの分析を行うことを特徴とする請求項１に記載の方法。 When the window type determined in step (a) is a long window, a long CMDCT spectrum is generated by applying a long window in step (b), and a short window is applied in step (c). The method according to claim 1, wherein a short FFT spectrum is generated, and in step (d), a psychoacoustic model is analyzed based on the generated long CMDCT spectrum and short FFT spectrum.

前記（ａ）ステップで決定されたウィンドウタイプがショートウィンドウである場合、前記（ｂ）ステップでは、ショートウィンドウを適用してショートＣＭＤＣＴスペクトルを生成し、前記（ｃ）ステップでは、ロングウィンドウを適用してロングＦＦＴスペクトルを生成し、前記（ｄ）ステップでは、前記生成されたショートＣＭＤＣＴスペクトル及びロングＦＦＴスペクトルに基づいて、心理音響モデルの分析を行うことを特徴とする請求項１に記載の方法。 When the window type determined in step (a) is a short window, a short CMDCT spectrum is generated by applying a short window in step (b), and a long window is applied in step (c). The method according to claim 1, wherein a long FFT spectrum is generated, and in step (d), a psychoacoustic model is analyzed based on the generated short CMDCT spectrum and long FFT spectrum.

前記（ａ）ステップは、入力オーディオ信号が遷移信号である場合には、ウィンドウタイプをショートウィンドウと決定し、非遷移信号である場合には、ウィンドウタイプをロングウィンドウと決定することを特徴とする請求項１に記載の方法。 In the step (a), when the input audio signal is a transition signal, the window type is determined as a short window, and when the input audio signal is a non-transition signal, the window type is determined as a long window. The method of claim 1.

（ｅ）前記（ｄ）ステップで行われた心理音響モデルの分析結果に基づいて、量子化及び符号化を行うステップをさらに含むことを特徴とする請求項１に記載の方法。 The method according to claim 1, further comprising: (e) performing quantization and encoding based on the analysis result of the psychoacoustic model performed in the step (d).

The method of claim 1, wherein the psychoacoustic model is a psychoacoustic model used in any one of the group including MPEG-1 Layer 3, MPEG-2 AAC, MPEG4, and WMA.

A digital audio data encoding apparatus comprising:
a window switching unit that determines a window type according to characteristics of an input audio signal;
a CMDCT unit that generates a CMDCT spectrum from the input audio signal according to the window type determined by the window switching unit;
an FFT unit that generates an FFT spectrum from the input audio signal using the window type determined by the window switching unit; and
a psychoacoustic model unit that performs a psychoacoustic model analysis using the CMDCT spectrum generated by the CMDCT unit and the FFT spectrum generated by the FFT unit.

The apparatus of claim 9, further comprising a filter unit that filters the input audio signal and divides it into a plurality of subbands, wherein the window switching unit determines the window type based on output data of the filter unit.

前記フィルタ部は、多相フィルタバンクであることを特徴とする請求項１０に記載の装置。 The apparatus of claim 10, wherein the filter unit is a polyphase filter bank.

The apparatus of claim 9, wherein, when the window type determined by the window switching unit is a long window, the CMDCT unit applies a long window to generate a long CMDCT spectrum, the FFT unit applies a short window to generate a short FFT spectrum, and the psychoacoustic model unit analyzes the psychoacoustic model based on the long CMDCT spectrum generated by the CMDCT unit and the short FFT spectrum generated by the FFT unit.

The apparatus of claim 9, wherein, when the window type determined by the window switching unit is a short window, the CMDCT unit applies a short window to generate a short CMDCT spectrum, the FFT unit applies a long window to generate a long FFT spectrum, and the psychoacoustic model unit analyzes the psychoacoustic model based on the short CMDCT spectrum generated by the CMDCT unit and the long FFT spectrum generated by the FFT unit.

The apparatus of claim 9, wherein the window switching unit determines the window type to be a short window when the input audio signal is a transition signal and to be a long window when the input audio signal is a non-transition signal.

前記ＣＭＤＣＴ部からのオーディオデータ及び前記心理音響モデル部からの結果値に基づいて、量子化及び符号化を行う量子化及び符号化部をさらに含むことを特徴とする請求項９に記載の装置。 The apparatus of claim 9, further comprising a quantization and encoding unit that performs quantization and encoding based on audio data from the CMDCT unit and a result value from the psychoacoustic model unit.

The apparatus of claim 9, wherein the psychoacoustic model is a psychoacoustic model used in any one of the group including MPEG-1 Layer 3, MPEG-2 AAC, MPEG4, and WMA.

A digital audio encoding method comprising:
(a) generating a CMDCT spectrum from an input audio signal; and
(b) performing a psychoacoustic model analysis using the generated CMDCT spectrum.

The method of claim 17, wherein step (a) further includes (a1) performing CMDCT on the input audio signal by applying a long window and a short window to generate a long CMDCT spectrum and a short CMDCT spectrum.

前記（ｂ）ステップは、前記（ａ１）ステップで生成されたロングＣＭＤＣＴスペクトル及びショートＣＭＤＣＴスペクトルを使用して、心理音響モデルの分析を行うことを特徴とする請求項１８に記載の方法。 The method according to claim 18, wherein the step (b) performs an analysis of a psychoacoustic model using the long CMDCT spectrum and the short CMDCT spectrum generated in the step (a1).

The method of claim 17, wherein step (a) further includes (a2) filtering the input audio signal to divide it into a plurality of subbands, and the generating of the CMDCT spectrum is performed on the input audio signal divided into the subbands.

前記符号化方法は、（ａ３）前記入力オーディオ信号の特性によって、ウィンドウタイプを決定するステップをさらに含むことを特徴とする請求項１７に記載の方法。 The method of claim 17, wherein the encoding method further comprises: (a3) determining a window type according to characteristics of the input audio signal.

前記（ａ３）ステップは、入力オーディオ信号が遷移信号である場合には、ウィンドウタイプをショートウィンドウと決定し、非遷移信号である場合には、ウィンドウタイプをロングウィンドウと決定することを特徴とする請求項２１に記載の方法。 In the step (a3), when the input audio signal is a transition signal, the window type is determined as a short window, and when the input audio signal is a non-transition signal, the window type is determined as a long window. The method of claim 21.

The method of claim 20, wherein step (a2) is performed by a polyphase filter bank.

The method of claim 22, further comprising: when the window type determined in step (a3) is a long window, quantizing and encoding the long MDCT spectrum based on the analysis result of the psychoacoustic model performed in step (b); and, when the window type determined in step (a2) is a short window, quantizing and encoding the short MDCT spectrum based on the analysis result of the psychoacoustic model performed in step (b).

The method of claim 17, wherein the psychoacoustic model is a psychoacoustic model used in any one of the group including MPEG-1 Layer 3, MPEG-2 AAC, MPEG4, and WMA.

A digital audio encoding apparatus comprising:
a CMDCT unit that generates a CMDCT spectrum from an input audio signal; and
a psychoacoustic model unit that performs a psychoacoustic model analysis using the CMDCT spectrum generated by the CMDCT unit.

前記ＣＭＤＣＴ部は、前記入力オーディオ信号について、ロングウィンドウ及びショートウィンドウを適用してＣＭＤＣＴを行い、ロングＣＭＤＣＴスペクトル及びショートＣＭＤＣＴスペクトルを生成することを特徴とする請求項２６に記載の装置。 The apparatus of claim 26, wherein the CMDCT unit performs CMDCT on the input audio signal by applying a long window and a short window to generate a long CMDCT spectrum and a short CMDCT spectrum.

前記心理音響モデル部は、前記ＣＭＤＣＴ部で生成されたロングＣＭＤＣＴスペクトル及びショートＣＭＤＣＴスペクトルを使用して、心理音響モデルの分析を行うことを特徴とする請求項２７に記載の装置。 The apparatus according to claim 27, wherein the psychoacoustic model unit analyzes a psychoacoustic model using a long CMDCT spectrum and a short CMDCT spectrum generated by the CMDCT unit.

The apparatus of claim 26, further comprising a filter unit that filters the input audio signal and divides it into a plurality of subbands, wherein the CMDCT unit performs CMDCT on the data divided into the subbands.

The apparatus of claim 26, further comprising a window type determination unit that determines a window type according to characteristics of the input audio signal.

前記ウィンドウタイプ決定部は、入力オーディオ信号が遷移信号である場合には、ウィンドウタイプをショートウィンドウと決定し、非遷移信号である場合には、ウィンドウタイプをロングウィンドウと決定することを特徴とする請求項３０に記載の装置。 The window type determination unit determines a window type as a short window when the input audio signal is a transition signal, and determines a window type as a long window when the input audio signal is a non-transition signal. The apparatus of claim 30.

The apparatus of claim 29, wherein the filter unit is a polyphase filter bank.

The apparatus of claim 31, further comprising a quantization and encoding unit, wherein, when the window type determined by the window type determination unit is a long window, the quantization and encoding unit quantizes and encodes the long MDCT spectrum based on the analysis result of the psychoacoustic model performed by the psychoacoustic model unit, and, when the window type determined by the window type determination unit is a short window, the quantization and encoding unit quantizes and encodes the short MDCT spectrum based on the analysis result of the psychoacoustic model performed by the psychoacoustic model unit.

The apparatus of claim 26, wherein the psychoacoustic model is a psychoacoustic model used in any one of the group including MPEG-1 Layer 3, MPEG-2 AAC, MPEG4, and WMA.

A computer-readable recording medium on which is recorded computer program code for performing a digital audio encoding method, the method comprising:
(a) determining a window type according to characteristics of an input audio signal;
(b) generating a CMDCT spectrum from the input audio signal according to the determined window type;
(c) generating an FFT spectrum from the input audio signal using the determined window type; and
(d) performing a psychoacoustic model analysis using the generated CMDCT spectrum and FFT spectrum.

The recording medium of claim 35, wherein step (a) further includes (a1) filtering the input audio signal to divide it into a plurality of subbands, and the determining of the window type is performed on the input audio signal divided into the subbands.

前記（ａ１）ステップは、多相フィルタバンクによって行われることを特徴とする請求項３６に記載の記録媒体。 The recording medium according to claim 36, wherein the step (a1) is performed by a polyphase filter bank.

前記（ａ）ステップで決定されたウィンドウタイプがロングウィンドウである場合、前記（ｂ）ステップでは、ロングウィンドウを適用してロングＣＭＤＣＴスペクトルを生成し、前記（ｃ）ステップでは、ショートウィンドウを適用してショートＦＦＴスペクトルを生成し、前記（ｄ）ステップでは、前記生成されたロングＣＭＤＣＴスペクトル及びショートＦＦＴスペクトルに基づいて、心理音響モデルの分析を行うことを特徴とする請求項３５に記載の記録媒体。 When the window type determined in step (a) is a long window, a long CMDCT spectrum is generated by applying a long window in step (b), and a short window is applied in step (c). 36. The recording medium according to claim 35, wherein a short FFT spectrum is generated, and in step (d), a psychoacoustic model is analyzed based on the generated long CMDCT spectrum and short FFT spectrum. .

前記（ａ）ステップで決定されたウィンドウタイプがショートウィンドウである場合、前記（ｂ）ステップでは、ショートウィンドウを適用してショートＣＭＤＣＴスペクトルを生成し、前記（ｃ）ステップでは、ロングウィンドウを適用してロングＦＦＴスペクトルを生成し、前記（ｄ）ステップでは、前記生成されたショートＣＭＤＣＴスペクトル及びロングＦＦＴスペクトルに基づいて、心理音響モデルの分析を行うことを特徴とする請求項３５に記載の記録媒体。 When the window type determined in step (a) is a short window, a short CMDCT spectrum is generated by applying a short window in step (b), and a long window is applied in step (c). 36. The recording medium according to claim 35, wherein a long FFT spectrum is generated, and in step (d), a psychoacoustic model is analyzed based on the generated short CMDCT spectrum and long FFT spectrum. .

前記（ａ）ステップは、入力オーディオ信号が遷移信号である場合には、ウィンドウタイプをショートウィンドウと決定し、非遷移信号である場合には、ウィンドウタイプをロングウィンドウと決定することを特徴とする請求項３５に記載の記録媒体。 In the step (a), when the input audio signal is a transition signal, the window type is determined as a short window, and when the input audio signal is a non-transition signal, the window type is determined as a long window. The recording medium according to claim 35.

The recording medium of claim 35, wherein the method further includes (e) performing quantization and encoding based on the analysis result of the psychoacoustic model performed in step (d).