WO2007107046A1 - A coding/decoding method of rapidly-changing audio-frequency signals - Google Patents

A coding/decoding method of rapidly-changing audio-frequency signals

Info

Publication number
WO2007107046A1
WO2007107046A1 PCT/CN2006/000474 CN2006000474W WO2007107046A1 WO 2007107046 A1 WO2007107046 A1 WO 2007107046A1 CN 2006000474 W CN2006000474 W CN 2006000474W WO 2007107046 A1 WO2007107046 A1 WO 2007107046A1
Authority
WO
WIPO (PCT)
Prior art keywords
modulation
frequency
scale factor
domain
coefficients
Prior art date
Application number
PCT/CN2006/000474
Other languages
French (fr)
Chinese (zh)
Inventor
Xingde Pan
Lei Wang
Original Assignee
Beijing Ori-Reu Technology Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ori-Reu Technology Co., Ltd
Priority to PCT/CN2006/000474
Publication of WO2007107046A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G10L19/035 Scalar quantisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation

Definitions

  • The invention relates to a method for encoding and decoding fast-changing audio signals, and in particular to a method that does so by organizing scale factor bands effectively.
  • To obtain a high-fidelity digital audio signal, the digital audio signal is audio-encoded or audio-compressed for storage and transmission.
  • The purpose of encoding an audio signal is to achieve a transparent representation of the signal with as few bits as possible, for example so that there is almost no difference between the original input audio signal and the output audio signal after coding.
  • Patent ZL02122099.9 and patent applications PCT/CN2004/001034, PCT/CN2005/000440, PCT/CN2005/000441, 200410046153 and 200410046154 all propose audio coding methods and devices based on a new multi-resolution analysis technique, and propose an encoding method that organizes frequency-modulation domain signals.
  • These patent applications take the frequency domain coefficients obtained by a transform or filter such as the Modified Discrete Cosine Transform (MDCT) or Fast Fourier Transform (FFT) and apply a further multi-resolution transform or filter, for example a wavelet transform, short MDCT or DCT, to obtain a new two-dimensional data structure. This structure differs from the existing time-frequency two-dimensional data structure: it is a frequency-modulation domain signal representation. The method is called "pseudo-wavelet" filtering to distinguish it from traditional wavelet filtering.
  • MDCT Modified Discrete Cosine Transform
  • FFT Fast Fourier Transform
  • Patent CN1141699, "Encoding and Decoding of Wideband Digital Information Signals", provides a coding strategy that adaptively organizes scale factor bands in time order, in units of filter sub-bands (a time-frequency representation of the signal), and then quantizes and codes them. In that patent, however, the time-frequency two-dimensional data structure can be organized only in time order.
  • Patent CN1100850, "Methods and apparatus for recording, reproducing, transmitting and/or accepting compressed data, as well as recording media used", provides a more general coding strategy that adaptively organizes the time-frequency two-dimensional data structure in both time and frequency.
  • However, these scale factor band techniques mainly target the time-frequency representation of a signal. In an audio encoder based on the "pseudo-wavelet" technique, the distribution of the modulation domain signal obtained by "pseudo-wavelet" filtering differs fundamentally from the earlier time domain-frequency domain coefficient distribution, so the scale factor organization and coding strategy must be redesigned according to the distribution characteristics of the frequency-modulation domain coefficients.
  • The object of the present invention is to address the defects of the prior art by providing a codec method for fast-changing audio signals that organizes scale factor bands more thoroughly, so that coefficients with the same or similar properties are coded together. This effectively improves the coding efficiency of fast-changing signals and significantly improves the performance of the codec.
  • The present invention provides a coding method for a fast-changing audio signal, the method comprising the following steps:
  • Step 1. Perform time-frequency mapping on the fast-changing audio signal to obtain the frequency domain coefficients of the audio signal;
  • Step 2. Perform frequency domain-modulation domain mapping on the frequency domain coefficients to obtain modulation domain coefficients;
  • Step 3. Obtain modulation windows from the modulation domain coefficients and, within each modulation window, organize the modulation domain coefficients in frequency order;
  • Step 4. Divide each modulation window into scale factor bands;
  • Step 5. Regroup the scale factor bands of the modulation windows to form large scale factor bands;
  • Step 6. Quantize and entropy code the modulation domain coefficients to obtain the encoded audio code stream.
  • The time-frequency mapping in step 1 is a modified discrete cosine transform or a fast Fourier transform.
  • Between step 1 and step 2, the method further includes organizing the frequency domain coefficients in frequency order.
  • The frequency domain-modulation domain mapping in step 2 is a short modified discrete cosine transform or a wavelet transform.
  • Step 4 may specifically be: in each modulation window, dividing scale factor bands according to the frequency resolution of human hearing.
  • Step 5 may specifically be: merging scale factor bands with the same or similar properties into a large scale factor band.
  • The quantization in step 6 may be scalar quantization, which specifically includes: nonlinearly companding the modulation domain coefficients in all scale factor bands; using the scale factor of each large scale factor band to quantize the modulation domain coefficients of every sub-band that band contains, giving a quantized spectrum represented by integers; selecting the first scale factor in each frame as the common scale factor; and differentially coding every other scale factor against the preceding scale factor.
  • The entropy coding comprises: entropy coding the quantized spectrum and the differentially coded scale factors to obtain codebook numbers, scale factor coded values and a losslessly coded quantized spectrum; and entropy coding the codebook numbers to obtain codebook number coded values.
  • After step 6, the organization of the scale factor bands, the quantizer parameters and the entropy coding mode information are transmitted as control information.
  • The present invention also provides a decoding method for a fast-changing audio signal, the method comprising the following steps:
  • Step 1. Entropy decode the audio signal according to the scale factor band organization and the entropy coding mode information in the control information of the audio code stream, to obtain the quantized values of the modulation domain coefficients;
  • Step 2. Inverse quantize the quantized values of the modulation domain coefficients according to the scale factor band organization and the scale factor information in the control information, to obtain inverse-quantized modulation domain coefficients;
  • Step 3. Regroup the inverse-quantized modulation domain coefficients in the time-frequency plane, then perform modulation domain-frequency domain mapping to obtain frequency domain coefficients;
  • Step 4. Perform frequency-time mapping on the frequency domain coefficients to obtain the time domain audio signal.
  • The regrouping in step 3 specifically includes: first organizing the modulation domain coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arranging the organized coefficients in the order of modulation window and scale factor band.
  • By organizing scale factor bands more thoroughly, the invention codes coefficients with the same or similar properties together, which effectively improves the coding efficiency of fast-changing signals and significantly improves the performance of the codec.
  • FIG. 1 is a flow chart of the method for encoding a fast-changing audio signal according to the present invention
  • FIG. 2 is a schematic diagram of organizing modulation domain coefficients in the method for encoding a fast-changing audio signal according to the present invention
  • FIG. 3 is a schematic diagram of dividing scale factor bands in the method for encoding a fast-changing audio signal according to the present invention
  • FIG. 4 is a schematic diagram of regrouping scale factor bands in the method for encoding a fast-changing audio signal according to the present invention
  • FIG. 5 is a flow chart of the method for decoding a fast-changing audio signal according to the present invention.
  • The invention improves the coding and decoding of fast-changing signals: each modulation window is divided into several scale factor bands according to the frequency resolution of human hearing, and the scale factor bands are regrouped into large scale factor bands according to the characteristics of the signal being coded.
  • When the modulation domain coefficients are quantized and entropy coded, the bands within one large scale factor band share a quantizer and/or an entropy coder, which makes full use of the modulation domain characteristics of the signal and increases the compression ratio.
  • As shown in FIG. 1, a flowchart of the method for encoding a fast-changing audio signal according to the present invention, the method includes the following steps:
  • Step 1. Perform time-frequency mapping on the fast-changing audio signal to obtain the frequency domain coefficients of the audio signal;
  • Step 2. Perform frequency domain-modulation domain mapping on the frequency domain coefficients to obtain modulation domain coefficients;
  • Step 3. Obtain modulation windows from the modulation domain coefficients and, within each modulation window, organize the modulation domain coefficients in frequency order;
  • Step 4. Divide each modulation window into scale factor bands;
  • Step 5. Regroup the scale factor bands of the modulation windows to form large scale factor bands;
  • Step 6. Quantize and entropy code the modulation domain coefficients to obtain the encoded audio code stream.
  • The time-frequency mapping can use a time domain-frequency domain transform such as the MDCT, FFT, discrete Fourier transform (DFT) or wavelet transform.
  • In this embodiment a 2048-point MDCT is used (the input includes the 1024 samples of the current frame).
  • It converts the linear PCM signal to the frequency domain and yields 1024 frequency domain coefficients.
  • the 2048 point MDCT transform can be defined as:
  • w(k) is the k-th window function coefficient; the formula of the window function is as follows: w(2N-1-k)
  • The MDCT input sequence and the IMDCT output sequence each have 50% overlap, i.e. N samples.
  • The overlapping portions of adjacent IMDCT output blocks are added together.
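The transform formulas are missing from this text, so the following is only a minimal sketch of the 50% overlap-add behaviour described above. It assumes the standard MDCT/IMDCT definition and a sine window; the patent's actual window and normalization are not recoverable here. The function names and the small block size used for checking are illustrative.

```python
import math


def sine_window(two_n):
    """Sine window (assumed); symmetric, with w(k)^2 + w(k+N)^2 = 1."""
    return [math.sin(math.pi * (k + 0.5) / two_n) for k in range(two_n)]


def mdct(block, window):
    """Standard forward MDCT: 2N windowed input samples -> N coefficients."""
    n = len(block) // 2
    n0 = (n + 1) / 2.0  # phase offset 1/2 + N/2 of the standard definition
    return [
        sum(window[i] * block[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs, window):
    """Standard inverse MDCT: N coefficients -> 2N windowed, aliased samples."""
    n = len(coeffs)
    n0 = (n + 1) / 2.0
    return [
        window[i] * (2.0 / n) * sum(
            coeffs[k] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
            for k in range(n))
        for i in range(2 * n)
    ]


def analysis_synthesis(signal, n):
    """Round-trip a signal (length a multiple of n) with 50% overlap-add."""
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        coeffs = mdct(padded[start:start + 2 * n], window)
        for i, v in enumerate(imdct(coeffs, window)):
            out[start + i] += v  # overlapping halves of adjacent blocks are summed
    return out[n:n + len(signal)]
```

With N = 1024 the same functions give the 2048-point transform of the embodiment; the direct O(N²) sums are kept for readability, whereas a practical codec would use an FFT-based algorithm.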
  • The multi-resolution analysis includes transforming and regrouping the frequency domain coefficients: the frequency domain coefficients are transformed into modulation domain coefficients by a frequency domain-modulation domain mapping, and the modulation domain coefficients are coefficients of a time-frequency plane;
  • the regrouping groups the modulation domain coefficients according to certain rules.
  • The frequency domain-modulation domain mapping can be performed by a frequency domain wavelet transform or a frequency domain short MDCT.
  • The short MDCT is used below to illustrate the frequency domain-modulation domain mapping.
  • The frequency domain coefficients are first organized in frequency order, and then 128 16-point short MDCTs (with 8-point overlap) are applied in frequency order as the frequency domain multi-resolution transform, giving 1024 frequency-modulation domain coefficients.
  • The 16-point short MDCT is applied at intervals of 8 coefficients.
  • The first 8 spectral coefficients input to each short MDCT are the last 8 spectral coefficients input to the previous transform, and each short MDCT yields 8 modulation domain coefficients.
  • the formula for calculating short MDCT is as follows:
  • The modulation domain coefficients are organized into modulation windows, and within each modulation window the modulation domain coefficients are organized in frequency order.
  • As shown in the schematic diagram of organizing the modulation domain coefficients, the 1024 modulation domain coefficients obtained by the short MDCT are organized into eight modulation windows, the eight modulation windows of the short MDCT transform. Each modulation window contains 128 modulation domain coefficients, and within each window the 128 coefficients are organized in frequency order.
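The mapping and window organization above can be sketched as follows. This is a hedged illustration, not the patent's formula (which is missing from this text): it assumes the standard sine-windowed MDCT, assumes 8 zeros are padded in front of the spectrum so that 128 blocks fit, and assumes that modulation window m collects output m of every short MDCT. The names `short_mdct` and `to_modulation_windows` are hypothetical.

```python
import math


def short_mdct(block):
    """Sine-windowed MDCT of a 2N-point block -> N coefficients (standard form, assumed)."""
    n = len(block) // 2
    return [
        sum(math.sin(math.pi * (i + 0.5) / (2 * n)) * block[i]
            * math.cos(math.pi / n * (i + 0.5 + n / 2.0) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def to_modulation_windows(freq_coeffs, half=8):
    """Map frequency-ordered coefficients to `half` modulation windows.

    Assumes len(freq_coeffs) is a multiple of `half`.
    """
    # Assumption: zero-pad in front so the first block has an overlap partner.
    padded = [0.0] * half + list(freq_coeffs)
    # One 2*half-point block every `half` coefficients, stepping up in frequency.
    blocks = [short_mdct(padded[s:s + 2 * half])
              for s in range(0, len(freq_coeffs), half)]
    # Window m takes output m of each block; blocks are already in frequency order.
    return [[blk[m] for blk in blocks] for m in range(half)]
```

For 1024 input coefficients this produces 128 blocks of 8 outputs each, rearranged into 8 windows of 128 coefficients, which matches the counts given in the text.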
  • Each scale factor band is a basic coding unit, and the coefficients in the band share a quantizer and an entropy encoder.
  • FIG. 4 is a schematic diagram of regrouping the scale factor bands.
  • The scale factor bands obtained above are regrouped: scale factor bands with the same or similar properties are organized together to form one or more large scale factor bands, and each scale factor band inside a large scale factor band is called a sub-band of that large scale factor band.
  • The sub-bands of each large scale factor band share a quantizer and/or an entropy encoder.
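The text does not say how "same or similar properties" is measured, so the sketch below makes an explicit assumption: two scale factor bands are treated as similar when their mean energies fall in the same 6 dB bucket. The criterion, the `step_db` parameter and the function names are all hypothetical and only illustrate the regrouping idea.

```python
import math
from collections import defaultdict


def band_energy_db(coeffs):
    """Mean energy of a scale factor band in dB (small floor avoids log of zero)."""
    energy = sum(c * c for c in coeffs) / len(coeffs)
    return 10.0 * math.log10(energy + 1e-12)


def regroup_bands(bands, step_db=6.0):
    """Group scale factor bands into large scale factor bands.

    bands: dict mapping (modulation_window, band_index) -> list of coefficients.
    Returns a list of large bands, each a list of (window, band) keys, i.e. its sub-bands.
    """
    groups = defaultdict(list)
    for key in sorted(bands):
        # Assumed similarity rule: same energy bucket of width step_db.
        bucket = math.floor(band_energy_db(bands[key]) / step_db)
        groups[bucket].append(key)
    return [groups[b] for b in sorted(groups)]
```

Each returned group would then share one quantizer and/or entropy coder, and the grouping itself would have to be sent to the decoder as control information.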
  • The quantization can be scalar quantization or vector quantization.
  • Scalar quantization mainly uses scale factors to quantize the modulation domain coefficients.
  • All sub-bands in a large scale factor band use the same scale factor.
  • Scalar quantization includes the following steps: nonlinearly companding the modulation domain coefficients in all scale factor bands; then using the scale factor of each large scale factor band to quantize the modulation domain coefficients of every sub-band that band contains, giving a quantized spectrum represented by integers; selecting the first scale factor in each frame as the common scale factor; and differentially coding every other scale factor against the preceding scale factor.
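A minimal sketch of these scalar quantization steps follows. The companding law is not given in the text; the 3/4 power law and the step size 2^(sf/4) used here are borrowed from AAC-style quantizers as an assumption, and all function names are illustrative.

```python
import math


def quantize_band(coeffs, scale_factor):
    """Compand with an assumed 3/4 power law, then round to signed integers."""
    step = 2.0 ** (scale_factor / 4.0)  # assumed step-size rule
    return [int(math.copysign(math.floor((abs(c) / step) ** 0.75 + 0.5), c))
            for c in coeffs]


def dequantize_band(values, scale_factor):
    """Invert the companding: |q|^(4/3) scaled back by the step size."""
    step = 2.0 ** (scale_factor / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0) * step, q) for q in values]


def quantize_large_band(sub_bands, scale_factor):
    """Every sub-band of one large scale factor band uses the same scale factor."""
    return [quantize_band(sb, scale_factor) for sb in sub_bands]


def diff_encode(scale_factors):
    """First value is kept as the common scale factor; the rest become differences."""
    return [scale_factors[0]] + [b - a for a, b in zip(scale_factors, scale_factors[1:])]


def diff_decode(deltas):
    """Rebuild the scale factors by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out
```

Differential coding pays off because neighbouring scale factors tend to be close, so the differences cluster around zero and entropy code compactly.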
  • Vector quantization includes the following steps: forming the modulation domain coefficients into a number of multi-dimensional vectors; flattening each vector according to a flattening factor; and searching the codebook, under a subjective perceptual distance measure, for the codeword with the smallest distance to the vector being quantized, which gives the codeword index. The sub-bands belonging to the same large scale factor band share one codeword.
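The codeword search can be sketched as below. The subjective perceptual distance measure is not defined in the text, so a weighted squared error stands in for it, and flattening is modelled as division by a single factor; both are assumptions, as are the function names.

```python
def nearest_codeword(vector, codebook, weights=None):
    """Return the index of the codeword with the smallest weighted squared error."""
    if weights is None:
        weights = [1.0] * len(vector)  # unweighted Euclidean distance by default

    def distance(code):
        return sum(w * (v - c) ** 2 for w, v, c in zip(weights, vector, code))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def vector_quantize(coeffs, codebook, flatten_factor=1.0):
    """Split coefficients into vectors, flatten them, and look up codeword indices."""
    dim = len(codebook[0])
    indices = []
    for start in range(0, len(coeffs) - dim + 1, dim):
        vec = [c / flatten_factor for c in coeffs[start:start + dim]]
        indices.append(nearest_codeword(vec, codebook))
    return indices
```

Only the indices (plus the flattening factor) need to be transmitted; the decoder looks the codewords up in the same codebook.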
  • Entropy coding is a source coding technique. The basic idea is to assign shorter codewords to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability, so that the average codeword length is as short as possible. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length $\bar{n}$ satisfies $H(X) \le \bar{n} < H(X) + \frac{1}{N}$, where $H(X)$ is the entropy of the source and $x$ is the symbol variable.
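As a small worked illustration of this bound, the sketch below builds a Huffman code (one of the entropy coders the method may use) for a list of quantized values and compares its average codeword length with the source entropy. For symbol-by-symbol coding the average length L satisfies H(X) ≤ L < H(X) + 1. The helper names are illustrative and are not taken from the patent.

```python
import heapq
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical entropy of the symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def huffman_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_length(symbols):
    """Average Huffman codeword length in bits per symbol."""
    lengths = huffman_lengths(symbols)
    counts = Counter(symbols)
    return sum(counts[s] * lengths[s] for s in counts) / len(symbols)
```

When the symbol probabilities are powers of 1/2 the Huffman code meets the entropy exactly; otherwise it stays within one bit per symbol of it.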
  • The entropy coding step comprises: entropy coding the quantized spectrum and the differentially coded scale factors to obtain codebook numbers, scale factor coded values and a losslessly coded quantized spectrum; and entropy coding the codebook numbers to obtain codebook number coded values.
  • The quantized spectrum within one large scale factor band uses the same code table, which saves side-information bits.
  • The entropy coding may use any existing method such as Huffman coding, arithmetic coding or run-length coding.
  • The encoded audio code stream is thus obtained. The organization of the scale factor bands, the scale factors and the entropy coding mode (such as the Huffman codebook number) are transmitted as control information, and the audio code stream is combined with the control information to form the compressed audio stream.
  • Slowly changing signals in the code stream can still be handled with the methods of patent applications PCT/CN2004/001034, PCT/CN2005/000440, PCT/CN2005/000441, 200410046153 and 200410046154.
  • Fast-changing signals need to be decoded with the fast-changing signal decoding method of the present invention. As shown in FIG. 5, a flowchart of the method for decoding a fast-changing audio signal according to the present invention, the method includes the following steps:
  • Step A. Entropy decode the audio signal according to the scale factor band organization and the entropy coding mode information in the control information of the audio code stream, to obtain the quantized values of the modulation domain coefficients;
  • The organization of the scale factor bands, the scale factors and the entropy coding mode (such as the Huffman codebook number) are transmitted as control information.
  • The entropy decoding mode is first determined from the organization of the scale factor bands and the entropy coding mode, and the code stream is then entropy decoded.
  • Step B. Inverse quantize the quantized values of the modulation domain coefficients according to the scale factor band organization and the scale factor information in the control information, to obtain inverse-quantized modulation domain coefficients;
  • Step C. Regroup the inverse-quantized modulation domain coefficients in the time-frequency plane, then perform modulation domain-frequency domain mapping to obtain frequency domain coefficients;
  • When the encoder provides several frequency domain-modulation domain mapping methods, the decoder should use the corresponding inverse mapping. The mapping method used in encoding can be coded into the control information, and the decoder selects the corresponding inverse mapping according to the control information to obtain the frequency domain coefficients.
  • The modulation domain coefficients are first regrouped in the time-frequency plane according to certain rules, and the modulation domain-frequency domain mapping is then applied to them to obtain the frequency domain coefficients.
  • The regrouping may include: first organizing the modulation domain coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arranging the organized coefficients in the order of modulation window and scale factor band.
  • the modulation domain-frequency domain processing should use the short IMDCT method, according to the following formula:
  • Step D. Perform frequency-time mapping on the frequency domain coefficients to obtain the time domain audio signal.
  • When the encoder provides several time-frequency mapping methods, the decoder should use the corresponding inverse mapping. The time-frequency mapping method used in encoding can be coded into the control information of the code stream, and the decoder selects the corresponding inverse mapping according to the control information to obtain the time domain signal.
  • the frequency-time mapping should use the IMDCT method, according to the following formula:
  • The output sequence of the IMDCT also has 50% overlap, i.e. N samples.
  • The overlapping portions of adjacent output blocks are added together to obtain the time domain audio signal.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A coding method for rapidly-changing audio signals comprises performing time-frequency mapping on the rapidly-changing audio signal to obtain frequency-domain coefficients; performing frequency-domain to modulation-domain mapping on the frequency-domain coefficients to obtain modulation-domain coefficients; forming modulation windows from the modulation-domain coefficients and organizing the modulation-domain coefficients in frequency order; dividing each modulation window into scale factor bands; regrouping the scale factor bands to form large scale factor bands; and quantizing and entropy coding the modulation-domain coefficients to form an encoded audio code stream. A decoding method for rapidly-changing audio signals comprises performing entropy decoding and inverse quantization of the audio signal according to the control information of the encoded audio code stream; regrouping the modulation-domain coefficients; and performing modulation-domain to frequency-domain mapping and frequency-time mapping to obtain the decoded audio signal.

Description

Codec method for fast-changing audio signals
Technical Field
The invention relates to a method for encoding and decoding fast-changing audio signals, and in particular to a method that encodes and decodes fast-changing audio signals by organizing scale factor bands effectively.
Background Art
To obtain a high-fidelity digital audio signal, the digital audio signal needs to be audio-encoded or audio-compressed for storage and transmission. The purpose of encoding an audio signal is to achieve a transparent representation of the signal with as few bits as possible, for example so that there is almost no difference between the original input audio signal and the output audio signal after coding.
Patent ZL02122099.9 and patent applications PCT/CN2004/001034, PCT/CN2005/000440, PCT/CN2005/000441, 200410046153 and 200410046154 all propose audio coding methods and devices based on a new multi-resolution analysis technique, and propose an encoding method that organizes frequency-modulation domain signals. These patent applications take the frequency domain coefficients obtained by a transform or filter such as the Modified Discrete Cosine Transform (MDCT) or Fast Fourier Transform (FFT) and apply a further multi-resolution transform or filter, for example a wavelet transform, short MDCT or DCT, to obtain a new two-dimensional data structure. This structure differs from the existing time-frequency two-dimensional data structure: it is a frequency-modulation domain signal representation. The method is called "pseudo-wavelet" filtering to distinguish it from traditional wavelet filtering.
Patent CN1141699, "Encoding and Decoding of Wideband Digital Information Signals", provides a coding strategy that adaptively organizes scale factor bands in time order, in units of filter sub-bands (a time-frequency representation of the signal), and then quantizes and codes them; in that patent, however, the time-frequency two-dimensional data structure can be organized only in time order. Patent CN1100850, "Methods and apparatus for recording, reproducing, transmitting and/or accepting compressed data, as well as recording media used", provides a more general coding strategy that adaptively organizes the time-frequency two-dimensional data structure in both time and frequency. However, these scale factor band techniques mainly target the time-frequency representation of a signal. In an audio encoder based on the "pseudo-wavelet" technique, the distribution of the modulation domain signal obtained by "pseudo-wavelet" filtering differs fundamentally from the earlier time domain-frequency domain coefficient distribution, so the scale factor organization and coding strategy must be redesigned according to the distribution characteristics of the frequency-modulation domain coefficients.
Summary of the Invention
The object of the present invention is to address the defects of the prior art by providing a codec method for fast-changing audio signals that organizes scale factor bands more thoroughly, so that coefficients with the same or similar properties are coded together, which effectively improves the coding efficiency of fast-changing signals and significantly improves the performance of the codec.
To achieve the above object, the present invention provides a coding method for a fast-changing audio signal, the method comprising the following steps:
Step 1. Perform time-frequency mapping on the fast-changing audio signal to obtain the frequency domain coefficients of the audio signal;
Step 2. Perform frequency domain-modulation domain mapping on the frequency domain coefficients to obtain modulation domain coefficients;
Step 3. Obtain modulation windows from the modulation domain coefficients and, within each modulation window, organize the modulation domain coefficients in frequency order;
Step 4. Divide each modulation window into scale factor bands;
Step 5. Regroup the scale factor bands of the modulation windows to form large scale factor bands;
Step 6. Quantize and entropy code the modulation domain coefficients to obtain the encoded audio code stream.
The time-frequency mapping in step 1 is a modified discrete cosine transform or a fast Fourier transform. Between step 1 and step 2, the method further includes organizing the frequency domain coefficients in frequency order. The frequency domain-modulation domain mapping in step 2 is a short modified discrete cosine transform or a wavelet transform. Step 4 may specifically be: in each modulation window, dividing scale factor bands according to the frequency resolution of human hearing. Step 5 may specifically be: merging scale factor bands with the same or similar properties into a large scale factor band. The quantization in step 6 may be scalar quantization, which specifically includes: nonlinearly companding the modulation domain coefficients in all scale factor bands; using the scale factor of each large scale factor band to quantize the modulation domain coefficients of every sub-band that band contains, giving a quantized spectrum represented by integers; selecting the first scale factor in each frame as the common scale factor; and differentially coding every other scale factor against the preceding scale factor.
The entropy coding comprises: entropy coding the quantized spectrum and the differentially coded scale factors to obtain codebook numbers, scale factor coded values and a losslessly coded quantized spectrum; and entropy coding the codebook numbers to obtain codebook number coded values.
After step 6, the organization of the scale factor bands, the quantizer parameters and the entropy coding mode information are transmitted as control information.
To achieve the above object, the invention also provides a method for decoding a fast-changing audio signal, comprising the following steps:
Step 1: entropy-decode the audio signal according to the scale factor band organization and the entropy coding mode information in the control information of the audio bitstream, to obtain the quantized values of the modulation domain coefficients;
Step 2: inverse-quantize the quantized values of the modulation domain coefficients according to the scale factor band organization and the scale factor information in the control information, to obtain inverse-quantized modulation domain coefficients;
Step 3: regroup the inverse-quantized modulation domain coefficients in the time-frequency plane, then apply the modulation domain-frequency domain mapping to obtain frequency domain coefficients;
Step 4: apply the frequency-time mapping to the frequency domain coefficients to obtain the time domain audio signal.
The regrouping in step 3 specifically comprises: first organizing the modulation domain coefficients along the frequency direction, with the coefficients in each frequency band organized along the time direction, and then arranging the organized coefficients in the order of modulation window and scale factor band.
By organizing the scale factor bands more thoroughly, so that coefficients with identical or similar properties are coded together, the invention effectively improves coding efficiency for fast-changing signals and significantly improves codec performance.
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method for encoding a fast-changing audio signal according to the invention;
Fig. 2 is a schematic diagram of organizing the modulation domain coefficients in the encoding method;
Fig. 3 is a schematic diagram of dividing the scale factor bands in the encoding method;
Fig. 4 is a schematic diagram of regrouping the scale factor bands in the encoding method;
Fig. 5 is a flowchart of the method for decoding a fast-changing audio signal according to the invention.
Detailed Description
The invention improves the encoding and decoding of fast-changing signals. Each modulation window is divided into several scale factor bands according to the frequency resolution of human hearing, and the scale factor bands are then regrouped into large scale factor bands according to the characteristics of the signal being coded. When the modulation domain coefficients are quantized and entropy-coded, all bands in the same large scale factor band share one quantizer and/or entropy coder, which exploits the modulation domain characteristics of the signal and raises the compression ratio.
As shown in Fig. 1, the method for encoding a fast-changing audio signal according to the invention comprises the following steps:
Step 1: apply a time-frequency mapping to the fast-changing audio signal to obtain its frequency domain coefficients;
Step 2: apply a frequency domain-modulation domain mapping to the frequency domain coefficients to obtain modulation domain coefficients;
Step 3: form modulation windows from the modulation domain coefficients and, within each modulation window, organize the modulation domain coefficients in frequency order;
Step 4: in each modulation window, divide scale factor bands;
Step 5: regroup the scale factor bands of the modulation windows to form large scale factor bands;
Step 6: quantize and entropy-code the modulation domain coefficients to obtain the encoded audio bitstream.
The implementation of these steps is described in detail below.
The time-frequency mapping may use any time domain-frequency domain transform, such as the MDCT, the FFT, the discrete Fourier transform (DFT), or a wavelet transform. The mapping is illustrated below using the MDCT as an example.
If each frame holds 1024 samples, a 2048-point MDCT is used (covering the 1024 samples of the current frame and the 1024 samples of the previous frame) to transform the linear PCM signal to the frequency domain, yielding 1024 frequency domain coefficients. The 2048-point MDCT can be defined as:
[Formula (1): image imgf000007_0001 in the original publication]
where:
M: frame length, M = 1024
X(m): MDCT spectral coefficient
m: spectral coefficient index
x(k): input sequence
k: sample index
w(k): the k-th window function coefficient; the window function is given by the following formula, which involves the term w(2N − 1 − k):
[Window function formula: image imgf000007_0002 in the original publication]
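The formula images are not reproduced in this text. As a point of reference only, a common textbook form of the windowed MDCT that is consistent with the variable list above is sketched below; the normalization and the choice of a sine window are assumptions, not taken from the source.

$$X(m) = \sum_{k=0}^{2M-1} w(k)\,x(k)\,\cos\!\left[\frac{\pi}{M}\left(k + \frac{1}{2} + \frac{M}{2}\right)\left(m + \frac{1}{2}\right)\right], \qquad m = 0, 1, \dots, M-1$$

$$w(k) = \sin\!\left[\frac{\pi}{2M}\left(k + \frac{1}{2}\right)\right], \qquad w(k) = w(2M-1-k), \qquad k = 0, 1, \dots, 2M-1$$

With M = 1024, each transform consumes 2048 windowed samples and produces 1024 coefficients.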
The MDCT input sequence and the IMDCT output sequence overlap by 50%, that is, by 1024 samples for the 2048-point transform above. In the IMDCT, the overlapping parts of adjacent output blocks are added together.
Multi-resolution analysis is then applied to the frequency domain coefficients: the input frequency domain data are reorganized in the time-frequency plane, trading frequency precision for higher time resolution, so that the representation adapts automatically to the time-frequency characteristics of fast-changing signals.
Multi-resolution analysis comprises a transform of the frequency domain coefficients and a regrouping. The frequency domain-modulation domain mapping transforms the frequency domain coefficients into modulation domain coefficients, which are coefficients in the time-frequency plane; the regrouping then groups the modulation domain coefficients according to a given rule.
The frequency domain-modulation domain mapping may use a frequency domain wavelet transform or a frequency domain short MDCT. The mapping is illustrated below using the short MDCT.
The frequency domain coefficients are first organized in frequency order. Then 128 short MDCTs of 16 points each (overlapping by 8 points) are applied in frequency order as a frequency domain multi-resolution transform, yielding 1024 frequency-modulation domain coefficients. The 16-point short MDCTs advance in steps of 8: the first 8 input spectral coefficients of each short MDCT are shared with the input of the previous transform, and each short MDCT yields 8 modulation domain coefficients. The short MDCT is computed as follows:
[Formula (2): image imgf000008_0001 in the original publication]
$n_0 = (N_s/2 + 1)/2$
$N_s = 16$
where:
n: sample index;
m: index of the short MDCT (the m-th MDCT);
k: spectral index within this group of coefficients;
$N_s$: number of samples;
$win_m(n)$: sine window coefficients, determined as follows:
(1) $win_m(n) = \sin\left(\frac{\pi}{N_s}(n + 0.5)\right)$, $n = 0, \dots, 15$, $0 < m < 127$
(2) $win_0(0 \dots 3) = 0$
$win_0(4 \dots 7) = 1$
$win_0(n) = \sin\left(\frac{\pi}{N_s}(n + 0.5)\right)$, $n = 8, \dots, 15$
(3) $win_{127}(n) = \sin\left(\frac{\pi}{N_s}(n + 0.5)\right)$, $n = 0, \dots, 7$
$win_{127}(8 \dots 11) = 1$
$win_{127}(12 \dots 15) = 0$
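The window rules above can be exercised with a small sketch. Formula (2) itself is not recoverable from this text, so the code below substitutes the textbook MDCT kernel with the phase term n_0 = (N_s/2 + 1)/2. Two further assumptions are made: that win_0(4…7) = 1 (by symmetry with win_127), and that the first transform starts 4 positions before the first frequency coefficient, with zero padding, which is what the zeroed window edges suggest. All function names are hypothetical.

```python
import math

NS = 16                 # short MDCT length N_s
HOP = NS // 2           # 8 new inputs and 8 outputs per transform
N0 = (NS / 2 + 1) / 2   # phase term n_0 from the text (4.5)

def short_window(m, num_blocks):
    """Window of the m-th short MDCT: sine inside, flat/zero edges at the ends."""
    sine = [math.sin(math.pi / NS * (n + 0.5)) for n in range(NS)]
    if m == 0:
        # assumption: win_0(4..7) = 1, mirroring win_127(8..11) = 1
        return [0.0] * 4 + [1.0] * 4 + sine[8:]
    if m == num_blocks - 1:
        return sine[:8] + [1.0] * 4 + [0.0] * 4
    return sine

def _basis(n, k):
    # textbook MDCT kernel (assumed stand-in for formula (2))
    return math.cos(2.0 * math.pi / NS * (n + N0) * (k + 0.5))

def short_mdct_frame(freq_coeffs):
    """Map frequency coefficients to blocks[m][k], 8 modulation coefficients each."""
    num_blocks = len(freq_coeffs) // HOP
    padded = [0.0] * 4 + list(freq_coeffs) + [0.0] * 4   # assumed 4-point offset
    blocks = []
    for m in range(num_blocks):
        w = short_window(m, num_blocks)
        seg = padded[m * HOP:m * HOP + NS]
        blocks.append([sum(w[n] * seg[n] * _basis(n, k) for n in range(NS))
                       for k in range(HOP)])
    return blocks

def short_imdct_frame(blocks):
    """Inverse mapping: windowed short IMDCT plus overlap-add along frequency."""
    num_blocks = len(blocks)
    acc = [0.0] * (num_blocks * HOP + 8)
    for m, coeffs in enumerate(blocks):
        w = short_window(m, num_blocks)
        for n in range(NS):
            y = (2.0 / HOP) * sum(coeffs[k] * _basis(n, k) for k in range(HOP))
            acc[m * HOP + n] += w[n] * y
    return acc[4:4 + num_blocks * HOP]
```

For a 1024-coefficient frame, `short_mdct_frame` yields 128 blocks of 8 coefficients. Under these assumptions the edge windows let the first and last 4 coefficients pass without aliasing, while every other position is covered by two overlapping sine-windowed transforms.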
After the modulation domain coefficients have been computed, they are organized into modulation windows, and within each modulation window the coefficients are organized in frequency order. Fig. 2 illustrates this organization: the 1024 modulation domain coefficients produced by the short MDCTs (8 output coefficients each) are organized into 8 modulation windows, each containing 128 modulation domain coefficients, and within each modulation window the 128 coefficients are arranged in frequency order.
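One plausible reading of this organization is a transpose: modulation window k collects the k-th output of each of the 128 short MDCTs, in frequency order. The sketch below illustrates that reading; the function names and the `blocks[m][k]` layout are assumptions made for illustration.

```python
def to_modulation_windows(blocks):
    """blocks[m][k]: k-th output of the m-th short MDCT (m in frequency order).

    Returns windows[k][m]: modulation window k, coefficients in frequency order.
    """
    num_windows = len(blocks[0])
    return [[block[k] for block in blocks] for k in range(num_windows)]

def from_modulation_windows(windows):
    """Inverse reorganization, as a decoder would need before the inverse mapping."""
    num_blocks = len(windows[0])
    return [[window[m] for window in windows] for m in range(num_blocks)]
```

With 128 blocks of 8 coefficients, `to_modulation_windows` returns 8 windows of 128 coefficients each, matching the counts given in the text.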
Fig. 3 illustrates the division into scale factor bands. In each modulation window, the frequency axis is divided into several scale factor bands according to the frequency resolution of human hearing, for example on the Bark scale or the ERB scale. Each scale factor band is a basic coding unit, and the coefficients in a band share one quantizer and one entropy coder.
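A minimal sketch of a Bark-based division follows. It assumes a 44.1 kHz sampling rate, the Zwicker-Terhardt approximation of the Bark scale, and one scale factor band per integer Bark interval; none of these choices is specified in the text, and the function names are hypothetical.

```python
import math

def hz_to_bark(f):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def scale_factor_bands(num_coeffs=128, sample_rate=44100.0):
    """Split one modulation window into (start, end) index ranges, end exclusive.

    Each of the num_coeffs coefficients is assumed to cover an equal slice of
    0..sample_rate/2; a new band starts whenever the integer Bark value changes.
    """
    bin_width = (sample_rate / 2.0) / num_coeffs
    bands = []
    start = 0
    current = int(hz_to_bark(0.5 * bin_width))
    for i in range(1, num_coeffs):
        z = int(hz_to_bark((i + 0.5) * bin_width))
        if z != current:
            bands.append((start, i))
            start, current = i, z
    bands.append((start, num_coeffs))
    return bands
```

The resulting bands are narrow at low frequencies and wide at high frequencies, which mirrors the coarser frequency resolution of hearing toward the top of the spectrum.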
Fig. 4 illustrates the regrouping of scale factor bands. After the scale factor bands have been divided, they are regrouped according to characteristics of the signal being coded, such as energy, variance, and stationarity (also called kurtosis): bands with identical or similar properties are grouped together into one or more large scale factor bands, and each scale factor band inside a large scale factor band is called a sub-band of that large scale factor band. When the modulation domain coefficients are quantized and entropy-coded, the sub-bands of each large scale factor band share one quantizer and/or entropy coder. Organizing the scale factor bands more thoroughly in this way, so that coefficients with identical or similar properties are coded together, effectively improves coding efficiency for fast-changing signals and significantly improves codec performance.
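The text does not fix a similarity criterion, so the sketch below uses a deliberately simple, hypothetical one: scale factor bands whose energies fall into the same 6 dB bucket are merged into one large scale factor band. The bucket width, the key layout, and the function names are all assumptions.

```python
import math

def band_energy(coeffs):
    """Sum of squared coefficients in one scale factor band."""
    return sum(c * c for c in coeffs)

def regroup_bands(bands, step_db=6.0):
    """Group scale factor bands with similar energy.

    bands maps (window_index, band_index) -> list of coefficients.
    Returns a list of groups; each group lists the keys of its sub-bands.
    """
    groups = {}
    for key, coeffs in bands.items():
        energy_db = 10.0 * math.log10(band_energy(coeffs) + 1e-12)
        bucket = math.floor(energy_db / step_db)
        groups.setdefault(bucket, []).append(key)
    return list(groups.values())
```

Each returned group plays the role of a large scale factor band whose sub-bands would then share one scale factor or code table.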
The quantization may be scalar quantization or vector quantization. Scalar quantization mainly uses scale factors to quantize the modulation domain coefficients, with all sub-bands of a large scale factor band using the same scale factor. Scalar quantization comprises the following steps: applying nonlinear companding to the modulation domain coefficients in all scale factor bands; quantizing the modulation domain coefficients of every sub-band of each large scale factor band with that band's scale factor, to obtain a quantized spectrum represented by integers; selecting the first scale factor of each frame as the common scale factor; and differentially coding every other scale factor against its predecessor.
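The scalar quantization steps can be sketched as follows. The companding law (a 3/4 power) and the step size 2^(sf/4) are borrowed from AAC-style quantizers as an assumption; the text does not specify either, and the function names are hypothetical.

```python
def quantize_band(coeffs, scale_factor):
    """Compand, then quantize with the (large) band's scale factor."""
    step = 2.0 ** (scale_factor / 4.0)      # assumed step-size law
    out = []
    for c in coeffs:
        companded = abs(c) ** 0.75          # assumed nonlinear companding
        q = int(companded / step + 0.5)     # round to the nearest integer
        out.append(-q if c < 0 else q)
    return out

def dequantize_band(quantized, scale_factor):
    """Approximate inverse of quantize_band."""
    step = 2.0 ** (scale_factor / 4.0)
    return [(-1.0 if q < 0 else 1.0) * (abs(q) * step) ** (4.0 / 3.0)
            for q in quantized]

def differential_scale_factors(scale_factors):
    """First scale factor is the common one; the rest become differences."""
    common = scale_factors[0]
    diffs = [b - a for a, b in zip(scale_factors, scale_factors[1:])]
    return common, diffs

def restore_scale_factors(common, diffs):
    """Undo the differential coding of the scale factors."""
    out = [common]
    for d in diffs:
        out.append(out[-1] + d)
    return out
```

Because neighbouring scale factors tend to be close, the differences are small integers, which the subsequent entropy coder can represent with short codewords.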
Vector quantization comprises the following steps: forming a number of multi-dimensional vectors from the modulation domain coefficients; spectrally flattening each vector according to a flattening factor; and searching the codebook, under a subjective perceptual distance measure, for the codeword closest to the vector to be quantized and obtaining its codeword index. The sub-bands belonging to the same large scale factor band share one codeword.
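The codebook search can be sketched as a nearest-neighbour lookup. The perceptual distance measure is not defined in the text, so a weighted squared error stands in for it here; the weights, the division-based flattening, and the function names are assumptions.

```python
def flatten(vector, flatten_factor):
    """Normalize a vector by its flattening factor (assumed to be a plain gain)."""
    return [v / flatten_factor for v in vector]

def nearest_codeword(vector, codebook, weights=None):
    """Return the index of the codeword with the smallest weighted squared error."""
    if weights is None:
        weights = [1.0] * len(vector)

    def distance(codeword):
        return sum(w * (v - c) ** 2 for w, v, c in zip(weights, vector, codeword))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))
```

For example, with the codebook `[[0, 0], [1, 1], [2, 0]]`, the flattened vector `flatten([2.2, 1.8], 2.0)` maps to index 1; only that index is passed on to the entropy coder.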
After the quantization above, entropy coding is used to remove the remaining statistical redundancy of the quantized modulation domain coefficients. Entropy coding is a source coding technique whose basic idea is to give shorter codewords to symbols that occur with higher probability and longer codewords to symbols that occur with lower probability, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length satisfies the following relation, where H(x) denotes the entropy of the source and x denotes the symbol variable:
[Bound on the average codeword length: image imgf000010_0001 in the original publication]
Since the entropy H(x) is the lower limit of the average codeword length, the relation above shows that the average codeword length is then very close to its lower bound H(x); this is why such variable-length coding is called "entropy coding".
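The relation between entropy and average codeword length can be checked numerically. The sketch below builds Huffman code lengths for a small symbol distribution and compares their average with the entropy; for a Huffman code over single symbols the standard bound is H ≤ L < H + 1. The function names are chosen for illustration only.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman codeword length of each symbol."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to these symbols
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def average_length(probs, lengths):
    """Expected codeword length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))
```

For the dyadic distribution (0.5, 0.25, 0.125, 0.125) the code lengths are 1, 2, 3, 3 and the average length equals the entropy, 1.75 bits.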
The entropy coding step comprises: entropy-coding the quantized spectrum and the differentially coded scale factors to obtain codebook indices, coded scale factor values, and the losslessly coded quantized spectrum; and entropy-coding the codebook indices to obtain coded codebook index values. During entropy coding, the quantized spectra within the same large scale factor band use the same code table, which saves side information bits.
Alternatively, one-dimensional or multi-dimensional entropy coding is applied to the codeword indices to obtain their coded values. The entropy coding above may use any existing method, such as Huffman coding, arithmetic coding, or run-length coding.
After quantization and entropy coding, the encoded audio bitstream is obtained. The organization of the scale factor bands, the scale factors, and the entropy coding mode (such as the Huffman codebook index) are transmitted as control information, and the audio bitstream is multiplexed with the control information to obtain the compressed audio bitstream.
When fast-changing audio signals have been encoded with the method above, the slowly varying signals in the bitstream can still be decoded by the methods of patent applications PCT/CN2004/001034, PCT/CN2005/000440, PCT/CN2005/000441, 200410046153 and 200410046154, whereas the fast-changing signals must be decoded by the decoding method of the invention. As shown in Fig. 5, the method for decoding a fast-changing audio signal according to the invention comprises the following steps:
Step A: entropy-decode the audio signal according to the scale factor band organization and the entropy coding mode information in the control information of the audio bitstream, to obtain the quantized values of the modulation domain coefficients.
The organization of the scale factor bands, the scale factors, and the entropy coding mode (such as the Huffman codebook index) are transmitted as control information. For entropy decoding, the decoding mode is first determined from the scale factor band organization and the entropy coding mode, and the bitstream is then entropy-decoded.
Step B: inverse-quantize the quantized values of the modulation domain coefficients according to the scale factor band organization and the scale factor information in the control information, to obtain inverse-quantized modulation domain coefficients.
Step C: regroup the inverse-quantized modulation domain coefficients in the time-frequency plane, then apply the modulation domain-frequency domain mapping to obtain frequency domain coefficients.
When the encoder offers several frequency domain-modulation domain mapping methods, the decoder must apply the corresponding inverse mapping. The mapping method used during encoding can be coded into the control information of the bitstream; the decoder then selects the corresponding inverse mapping according to this control information and obtains the frequency domain coefficients.
The modulation domain coefficients are first regrouped in the time-frequency plane according to a given rule, and the modulation domain-frequency domain mapping is then applied to them to obtain the frequency domain coefficients. The regrouping may comprise: first organizing the modulation domain coefficients along the frequency direction, with the coefficients in each frequency band organized along the time direction, and then arranging the organized coefficients in the order of modulation window and scale factor band. When formula (2) is used for the frequency domain-modulation domain mapping, the modulation domain-frequency domain mapping uses the short IMDCT, computed by the following formula:
[Short IMDCT formula: image imgf000011_0002 in the original publication]
The variables in this formula are defined as in formula (2).
Step D: apply the frequency-time mapping to the frequency domain coefficients to obtain the time domain audio signal.
When the encoder offers several time-frequency mapping methods, the decoder must apply the corresponding inverse mapping. The time-frequency mapping method used during encoding can be coded into the control information of the bitstream; the decoder then selects the corresponding inverse mapping according to this control information and obtains the time domain signal.
When formula (1) is used for the time-frequency mapping, the frequency-time mapping uses the IMDCT, computed by the following formula:
[IMDCT formula: image imgf000011_0003 in the original publication]
The variables in this formula are defined as in formula (1).
Because the MDCT input sequences overlap by 50%, the IMDCT output sequences also overlap by 50%, that is, by N samples. In the IMDCT, the overlapping parts of adjacent output blocks are added together to obtain the time domain audio signal.
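The overlap-add reconstruction described above can be sketched with a textbook MDCT/IMDCT pair. The kernel, the sine window, and the 2/M synthesis scaling are standard choices assumed here because formula (1) and the IMDCT formula are not recoverable from this text; the function names are hypothetical, and the signal length is assumed to be a multiple of the frame length M.

```python
import math

def sine_window(M):
    """Length-2M sine window; w[k]**2 + w[k + M]**2 == 1 for k in 0..M-1."""
    return [math.sin(math.pi * (k + 0.5) / (2 * M)) for k in range(2 * M)]

def _kernel(k, m, M):
    return math.cos(math.pi / M * (k + 0.5 + M / 2.0) * (m + 0.5))

def mdct(block, w):
    """2M windowed samples (previous frame + current frame) -> M coefficients."""
    M = len(block) // 2
    return [sum(w[k] * block[k] * _kernel(k, m, M) for k in range(2 * M))
            for m in range(M)]

def imdct(coeffs, w):
    """M coefficients -> 2M windowed output samples (still time-aliased)."""
    M = len(coeffs)
    return [w[k] * (2.0 / M) * sum(coeffs[m] * _kernel(k, m, M) for m in range(M))
            for k in range(2 * M)]

def analysis_synthesis(signal, M):
    """Encode and decode frame by frame; overlap-add cancels the aliasing."""
    w = sine_window(M)
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - M, M):
        y = imdct(mdct(padded[start:start + 2 * M], w), w)
        for k in range(2 * M):
            out[start + k] += y[k]   # add the overlapping halves of adjacent blocks
    return out[M:M + len(signal)]
```

Each output block alone contains time-domain aliasing; only the sum of the second half of one block and the first half of the next restores the original samples, which is why the decoder must keep the previous block's tail.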
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that the specific embodiments may still be modified, or some technical features replaced by equivalents, without departing from the spirit of the technical solution of the invention; all such modifications and replacements fall within the scope of the technical solution claimed by the invention.

Claims

1. A method for encoding a fast-changing audio signal, characterized in that the method comprises the following steps:
Step 1: applying a time-frequency mapping to the fast-changing audio signal to obtain its frequency domain coefficients;
Step 2: applying a frequency domain-modulation domain mapping to the frequency domain coefficients to obtain modulation domain coefficients;
Step 3: forming modulation windows from the modulation domain coefficients and, within each modulation window, organizing the modulation domain coefficients in frequency order;
Step 4: in each modulation window, dividing scale factor bands;
Step 5: regrouping the scale factor bands of the modulation windows to form large scale factor bands;
Step 6: quantizing and entropy-coding the modulation domain coefficients to obtain the encoded audio bitstream.
2. The method according to claim 1, characterized in that the time-frequency mapping in step 1 is a modified discrete cosine transform or a fast Fourier transform.
3. The method according to claim 1 or 2, characterized in that, between step 1 and step 2, the method further comprises: organizing the frequency domain coefficients in frequency order.
4. The method according to any one of claims 1 to 3, characterized in that the frequency domain-modulation domain mapping is a short modified discrete cosine transform or a wavelet transform.
5. The method according to any one of claims 1 to 4, characterized in that step 4 is specifically: in each modulation window, dividing scale factor bands according to the frequency resolution of human hearing.
6. The method according to any one of claims 1 to 5, characterized in that step 5 is specifically: merging scale factor bands with identical or similar properties into large scale factor bands.
7. The method according to any one of claims 1 to 6, characterized in that the quantization in step 6 is scalar quantization, specifically comprising: applying nonlinear companding to the modulation domain coefficients in all scale factor bands; quantizing the modulation domain coefficients of every sub-band of each large scale factor band with that band's scale factor, to obtain a quantized spectrum represented by integers; selecting the first scale factor of each frame as the common scale factor; and differentially coding every other scale factor against its predecessor;
the entropy coding comprises: entropy-coding the quantized spectrum and the differentially coded scale factors to obtain codebook indices, coded scale factor values, and the losslessly coded quantized spectrum, wherein during entropy coding the quantized spectra within the same large scale factor band use the same code table; and entropy-coding the codebook indices to obtain coded codebook index values.
8. The method according to any one of claims 1 to 7, characterized in that, after step 6, the organization of the scale factor bands, the scale factors, and the entropy coding mode information are transmitted as control information.
9. A method for decoding a fast-changing audio signal, characterized in that the method comprises the following steps:
Step 1: entropy-decoding the audio signal according to the scale factor band organization and the entropy coding mode information in the control information of the audio bitstream, to obtain the quantized values of the modulation domain coefficients;
Step 2: inverse-quantizing the quantized values of the modulation domain coefficients according to the scale factor band organization and the scale factor information in the control information, to obtain inverse-quantized modulation domain coefficients;
Step 3: regrouping the inverse-quantized modulation domain coefficients in the time-frequency plane, then applying the modulation domain-frequency domain mapping to obtain frequency domain coefficients;
Step 4: applying the frequency-time mapping to the frequency domain coefficients to obtain the time domain audio signal.
10. The method according to claim 9, characterized in that the regrouping in step 3 specifically comprises: first organizing the modulation domain coefficients along the frequency direction, with the coefficients in each frequency band organized along the time direction, and then arranging the organized coefficients in the order of modulation window and scale factor band.
PCT/CN2006/000474 2006-03-23 2006-03-23 A coding/decoding method of rapidly-changing audio-frequency signals WO2007107046A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2006/000474 WO2007107046A1 (en) 2006-03-23 2006-03-23 A coding/decoding method of rapidly-changing audio-frequency signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2006/000474 WO2007107046A1 (en) 2006-03-23 2006-03-23 A coding/decoding method of rapidly-changing audio-frequency signals

Publications (1)

Publication Number Publication Date
WO2007107046A1 true WO2007107046A1 (en) 2007-09-27

Family

ID=38522007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2006/000474 WO2007107046A1 (en) 2006-03-23 2006-03-23 A coding/decoding method of rapidly-changing audio-frequency signals

Country Status (1)

Country Link
WO (1) WO2007107046A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318926A (en) * 2014-09-29 2015-01-28 四川九洲电器集团有限责任公司 IntMDCT-based lossless audio encoding method and decoding method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004042722A1 (en) * 2002-11-07 2004-05-21 Samsung Electronics Co., Ltd. Mpeg audio encoding method and apparatus
US20050075861A1 (en) * 2003-09-29 2005-04-07 Jeongnam Youn Method for grouping short windows in audio encoding
US20050144017A1 (en) * 2003-09-15 2005-06-30 Stmicroelectronics Asia Pacific Pte Ltd Device and process for encoding audio data
CN1677493A (en) * 2004-04-01 2005-10-05 北京宫羽数字技术有限责任公司 Intensified audio-frequency coding-decoding device and method
CN1677492A (en) * 2004-04-01 2005-10-05 北京宫羽数字技术有限责任公司 Intensified audio-frequency coding-decoding device and method



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06722126

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC, EPO FORM 1205A DATED 22.01.09

122 Ep: pct application non-entry in european phase

Ref document number: 06722126

Country of ref document: EP

Kind code of ref document: A1