WO2015096789A1 - Vector quantization codec method and apparatus for audio signals - Google Patents

Vector quantization codec method and apparatus for audio signals

Info

Publication number
WO2015096789A1
Authority
WO
WIPO (PCT)
Prior art keywords
quantized
time
frequency
vector
dividing
Prior art date
Application number
PCT/CN2014/095012
Other languages
English (en)
French (fr)
Inventor
潘兴德
吴超刚
李靓
Original Assignee
北京天籁传音数字技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京天籁传音数字技术有限公司
Publication of WO2015096789A1


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components
    • G10L19/038Vector quantisation, e.g. TwinVQ audio

Definitions

  • the present invention relates to a vector quantization codec method and apparatus for an audio signal.
  • In the audio coding standards in common use, transform-domain audio signals are mostly quantized with scalar quantization schemes; for example, standards such as MPEG-1 Layer 3 (MP3), MPEG-2/4 AAC, and AVS apply scalar quantization to the MDCT data and then use Huffman coding for entropy coding.
  • In the AC-3 coding scheme, the MDCT data is decomposed into an exponent and a mantissa, and the mantissa is quantized with a variable number of bits according to a bit-allocation model. Because scalar quantization cannot effectively exploit the redundancy between adjacent samples of the transform-domain signal, it is difficult to obtain an ideal coding result.
  • The transform-domain weighted interleave vector quantization (TWINVQ) scheme is an audio coding method that applies vector quantization: after the MDCT of the signal, the vectors to be quantized are constructed by interleaved selection of the spectral parameters, and efficient vector quantization then yields good audio coding quality. However, TWINVQ does not make effective use of auditory perceptual characteristics to control the quantization noise, nor does it make full use of the signal characteristics to guide the organization of the vectors, so further improvement is needed.
  • In a first aspect, the present invention provides a vector quantization encoding method for an audio signal, comprising: performing auditory perceptual analysis on the transform-domain spectrum of the audio signal, and adjusting the amplitude of the transform-domain spectrum according to the analysis result (the amplitude-adjusted transform-domain spectrum is referred to as the weighted spectrum) to obtain a weighted spectrum to be quantized;
  • organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized;
  • and quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data.
  • Preferably, the step of organizing the weighted spectrum to be quantized comprises: constructing a time-frequency plane of the weighted spectrum to be quantized; dividing the time-frequency plane according to the type of the audio signal and its tonality; and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result.
  • Preferably, the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result comprises one of the following: division and organization based on frequency extraction, in which the audio signal is determined from its type and tonality to be a stationary signal with a harmonic structure, the time-frequency plane is divided along the time direction, the weighted spectrum is frequency-extracted in units of harmonics, and the weighted spectrum is organized into a plurality of vectors to be quantized; division and organization along the time direction, in which the audio signal is determined from its type and tonality to be a stationary signal, the time-frequency plane is divided along the time direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; division and organization along the frequency direction, in which the audio signal is determined from its type and tonality to have fast-varying characteristics in the time domain, the time-frequency plane is divided along the frequency direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or division and organization by time-frequency region, in which the audio signal is determined from its tonality and type to be a complex signal, the time-frequency plane is divided into a plurality of time-frequency regions, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result.
  • Preferably, the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result further comprises: selecting, according to the rule of maximizing the coding gain, the one of the above modes (division and organization based on frequency extraction, along the time direction, along the frequency direction, or by time-frequency region), or the combination of several of them, that yields the largest coding gain, and performing the division and organization accordingly.
  • Preferably, the step of quantizing and encoding the plurality of vectors to be quantized comprises: performing vector quantization encoding on the plurality of vectors to be quantized; or performing scalar quantization on the plurality of vectors to be quantized followed by entropy coding.
  • In a second aspect, the present invention provides a vector quantization decoding method for an audio signal, comprising: decoding vector-quantized encoded data to obtain inverse-quantized vectors; performing vector reconstruction on the inverse-quantized vectors according to vector division information to obtain an inverse-quantized weighted spectrum; and performing amplitude adjustment on the inverse-quantized weighted spectrum to obtain decoded data.
  • In a third aspect, the present invention provides a vector quantization encoding apparatus for audio, comprising: an amplitude adjustment module for performing auditory perceptual analysis on the transform-domain spectrum of an audio signal and adjusting the amplitude of the transform-domain spectrum according to the analysis result to obtain a weighted spectrum to be quantized; a vector organization module for organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized; and a quantization encoding module for quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data.
  • Preferably, the vector organization module is configured to: construct a time-frequency plane of the weighted spectrum to be quantized; divide the time-frequency plane according to the type of the audio signal and its tonality; and organize the weighted spectrum into a plurality of vectors to be quantized according to the division result.
  • Preferably, dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result comprises one of the following: division and organization based on frequency extraction, in which the audio signal is determined from its type and tonality to be a stationary signal with a harmonic structure, the time-frequency plane is divided along the time direction, the weighted spectrum is frequency-extracted in units of harmonics, and the weighted spectrum is organized into a plurality of vectors to be quantized; division and organization along the time direction, in which the audio signal is determined from its type and tonality to be a stationary signal, the time-frequency plane is divided along the time direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; division and organization along the frequency direction, in which the audio signal is determined from its type and tonality to have fast-varying characteristics in the time domain, the time-frequency plane is divided along the frequency direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or division and organization by time-frequency region, in which the audio signal is determined from its tonality and type to be a complex signal, the time-frequency plane is divided into a plurality of time-frequency regions, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result.
  • Preferably, the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result further comprises: selecting, according to the rule of maximizing the coding gain, the one of the above modes (division and organization based on frequency extraction, along the time direction, along the frequency direction, or by time-frequency region), or the combination of several of them, that yields the largest coding gain, and performing the division and organization accordingly.
  • Preferably, the quantization encoding module is configured to: perform vector quantization encoding on the plurality of vectors to be quantized; or perform scalar quantization on the plurality of vectors to be quantized followed by entropy coding.
  • In a fourth aspect, the present invention provides a vector quantization decoding apparatus for an audio signal, comprising: a quantization decoding module for decoding vector-quantized encoded data to obtain inverse-quantized vectors; a vector reconstruction module for performing vector reconstruction on the inverse-quantized vectors according to vector division information to obtain an inverse-quantized weighted spectrum; and a spectrum reconstruction module for performing amplitude adjustment on the inverse-quantized weighted spectrum to obtain decoded data.
  • The invention provides a vector quantization codec scheme for audio signals. The scheme adjusts the amplitude of the transform-domain signal with reference to auditory perceptual characteristics, which removes perceptual redundancy and improves coding efficiency; through analysis of the signal characteristics, the time-frequency plane of the audio signal is reasonably divided and organized into vectors to be quantized; and the time-frequency plane division and vector organization that maximize the coding gain can be selected, which facilitates efficient quantization and coding of the signal.
  • FIG. 1 is a block diagram of a vector quantization encoding apparatus according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of four vector divisions in accordance with an embodiment of the present invention.
  • FIG. 3 is a block diagram of a vector quantization decoding apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of a mono audio vector quantization coding apparatus according to an embodiment of the present invention.
  • FIG. 5 is a structural block diagram of a mono audio vector quantization decoding apparatus according to an embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a mono band extended audio vector quantization coding apparatus according to an embodiment of the present invention.
  • FIG. 7 is a structural block diagram of a mono band extended audio vector quantization decoding apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of a mono audio vector quantization coding apparatus according to an embodiment of the present invention.
  • The mono audio vector quantization coding apparatus includes: a resampling module 401, a signal type judging module 402, an MDCT transform module 403, a vector quantization encoding module 404, and a bit stream multiplexing module 405.
  • Although this embodiment is described taking the MDCT as an example, the apparatus and method are also applicable to the coding of other types of data, such as the MDFT domain, FFT domain, QMF domain, and the like.
  • the resampling module 401 is configured to convert the input digital sound signal from the original sampling rate to the target sampling rate, and output the resampled signal to the signal type determining module and the MDCT transform module in units of frames. It should be noted that if the input digital sound signal itself has a target sampling rate, the encoding device in accordance with the principles of the present invention may not include the module.
  • The signal type judging module 402 is configured to perform signal type analysis on the resampled sound signal frame by frame and output the result of the signal type analysis. Because of the complexity of the signal itself, the signal type can take a variety of representations. For example, if the frame signal is a slowly varying signal, an identifier indicating that the frame is a slowly varying signal is output directly; if it is a fast-varying signal, the position of the fast-change point is further calculated, and an identifier indicating that the frame is a fast-varying signal together with the position of the fast-change point is output.
  • The MDCT transform module 403 is configured to map the resampled sound signal to the MDCT transform domain using MDCT transforms of different lengths (orders) according to the signal type analysis result output from the signal type judging module 402, and to output the MDCT-domain coefficients of the sound signal to the vector quantization encoding module 404. Specifically, if the frame signal is a slowly varying signal, the MDCT transform is performed in units of frames and a longer-order MDCT transform is selected; if it is a fast-varying signal, the frame is divided into subframes, the MDCT transform is performed in units of subframes, and a shorter-order MDCT transform is selected.
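As an illustration of this transform stage, here is a minimal Python/NumPy sketch of a direct-form MDCT with the long/short switch driven by the signal-type flag. The sine window, the 50% overlap handling, and the subframe count n_sub are assumptions of the sketch (the source only specifies that slowly varying frames use a longer-order MDCT and fast-varying frames use shorter, subframe-level MDCTs); transition windows between long and short blocks are omitted.

```python
import numpy as np

def mdct(x):
    """Direct-form MDCT: 2N real samples -> N coefficients."""
    two_n = len(x)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    # X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    basis = np.cos(np.pi / n_half * np.outer(n + 0.5 + n_half / 2, k + 0.5))
    return x @ basis

def sine_window(two_n):
    n = np.arange(two_n)
    return np.sin(np.pi / two_n * (n + 0.5))

def transform_frame(frame, prev_frame, is_fast_varying, n_sub=8):
    """Map one frame to the MDCT domain with a long or short transform.

    frame, prev_frame: consecutive frames of equal length (50% overlap).
    is_fast_varying:   signal-type flag from the signal type judging module.
    Returns a (num_blocks, coeffs_per_block) array of MDCT coefficients.
    """
    buf = np.concatenate([prev_frame, frame])            # 2N samples
    if not is_fast_varying:
        # Slowly varying frame: one long, high-order MDCT.
        return mdct(buf * sine_window(len(buf)))[None, :]
    # Fast-varying frame: split into subframes and use short MDCTs.
    sub_len = len(frame) // n_sub
    blocks = []
    for i in range(n_sub):
        start = len(prev_frame) - sub_len + i * sub_len
        seg = buf[start:start + 2 * sub_len]
        blocks.append(mdct(seg * sine_window(len(seg))))
    return np.stack(blocks)
```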
  • The vector quantization encoding module 404 is configured to receive the MDCT spectral coefficients of the sound signal from the MDCT transform module 403, perform redundancy-removal processing on them, perform vector quantization encoding on the redundancy-processed spectrum to obtain the MDCT spectrum encoded data, and output the data to the bit stream multiplexing module.
  • FIG. 1 is a block diagram of a vector quantization encoding apparatus according to an embodiment of the present invention.
  • the vector quantization coding apparatus includes an amplitude adjustment module 101, a vector organization module 102, and a quantization coding module 103.
  • The amplitude adjustment module 101 performs auditory perceptual analysis on the signal according to a psychoacoustic model and accordingly adjusts the amplitude of the MDCT spectrum to obtain the amplitude-adjusted weighted spectrum to be quantized.
  • Using the psychoacoustic model to adjust the MDCT spectrum can effectively control the distribution of quantization error and improve the perceived quality of reconstructed audio.
  • the amplitude adjustment module 101 can perform amplitude adjustment on the MDCT spectrum according to the spectrum envelope curve.
  • The amplitude adjustment module 101 can obtain the envelope curve by various methods, for example a spectral envelope represented by line spectral pair (LSP) parameters, a spectral envelope represented by piecewise straight lines, a spectral envelope fitted with spline curves, or a spectral envelope represented by a Taylor expansion.
  • In the following example, the spectral envelope curve is represented by piecewise straight lines.
  • Taking a block with an MDCT spectrum length of 512 as an example, the frequency axis is divided according to the array {0, 7, 16, 23, 33, 39, 46, 55, 65, 79, 93, 110, 130, 156, 186, 232, 278, 360, 512}. First, the amplitudes at the two end points 0 and 512 are calculated to represent the whole spectrum with a single segment. The segment is then split at point 46 into two segments, the amplitudes at the three points are calculated, and the spectral envelope is approximated by the two segments. Proceeding in the same way, the segments are successively split in the order 46, 186, 16, 33, 65, 93, 130, 278, 7, 23, 39, 55, 79, 110, 156, 232, 360, finally yielding an 18-segment polyline that represents the whole spectral envelope.
  • To further compress this representation, only the values at the two ends are coded as absolute values, while the intermediate values are coded differentially by prediction.
  • The envelope of the entire spectrum is obtained by linear interpolation of the 18-segment polyline and is used for the amplitude adjustment of the MDCT spectrum.
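A minimal sketch of this envelope construction and of the amplitude adjustment it feeds, using the boundary array quoted above. How the amplitude at each boundary point is measured (here the mean magnitude of a few neighbouring bins), the small floor constant, and the use of division for the adjustment are illustrative assumptions; the source only fixes the boundary points, the split order, the differential coding, and the linear interpolation.

```python
import numpy as np

# Boundary points of the 18-segment polyline for a 512-line MDCT block.
BOUNDS = [0, 7, 16, 23, 33, 39, 46, 55, 65, 79, 93, 110, 130,
          156, 186, 232, 278, 360, 512]

def knot_amplitude(spectrum, idx, radius=3):
    """Amplitude at a boundary point: mean magnitude around the bin (assumed)."""
    lo = max(0, idx - radius)
    hi = min(len(spectrum), idx + radius + 1)
    return np.mean(np.abs(spectrum[lo:hi])) + 1e-9

def spectral_envelope(spectrum, bounds=BOUNDS):
    """Linear interpolation of the polyline knots over all bins."""
    knots = np.array([knot_amplitude(spectrum, min(b, len(spectrum) - 1))
                      for b in bounds])
    bins = np.arange(len(spectrum))
    return np.interp(bins, bounds, knots)

def weight_spectrum(mdct_block):
    """Amplitude-adjust the MDCT block by its envelope to get the weighted spectrum."""
    env = spectral_envelope(mdct_block)
    return mdct_block / env, env   # the decoder multiplies by env to undo this

# Example: a random 512-line block
block = np.random.randn(512)
weighted, env = weight_spectrum(block)
```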
  • the vector organization module 102 arranges and divides the weighted spectrum to be quantized after the amplitude adjustment, and organizes it into a plurality of vectors to be quantized.
  • the time-frequency plane of the MDCT spectrum is constructed, which may be an MDCT spectrum of each block in the frame or an MDCT spectrum between frames.
  • the time-frequency plane is divided according to the result of the signal type judgment and the tonality of the signal, and the MDCT spectrum is organized into a plurality of to-be quantized vectors according to the division.
  • The time-frequency plane division and the organization of the vectors to be quantized can be carried out in the following ways: division and organization along the time direction, i.e., for stationary signals with strong tonality the vectors can be divided and organized along the time direction; division and organization along the frequency direction, i.e., for signals with fast-varying characteristics in the time domain the vectors can be divided and organized along the frequency direction; division and organization based on frequency extraction, i.e., for stationary signals with a harmonic structure the vectors can be organized by frequency extraction; and division and organization by time-frequency region, i.e., for more complex audio signals the vectors can be organized by time-frequency region.
  • Preferably, one of the above division and vector organization methods, or a combination of several of them, may be selected according to the principle of maximizing the coding gain.
  • Suppose the length of the signal's frequency coefficients is N, the resolution in the time direction on the time-frequency plane is L, the resolution in the frequency direction is K, and K*L = N. When the vectors are divided along the time direction, the frequency resolution K is kept unchanged and the time axis is divided; when the vectors are divided along the frequency direction, the time resolution L is kept unchanged and the frequency axis is divided; when the vectors are divided by time-frequency region, the numbers of divisions in the time and frequency directions are arbitrary, and the divided time-frequency regions may be of the same, regular size and shape or of different, irregular sizes and shapes; when the vectors are divided by frequency extraction, the MDCT spectrum is extracted (decimated) in units of harmonics.
  • FIG. 2 is a schematic diagram of four vector divisions in accordance with an embodiment of the present invention.
  • Assume the time-frequency plane is divided in the form K*L = 64*16, where K = 64 is the frequency-direction resolution and L = 16 is the time-direction resolution, and that the vector dimension is D = 8; the plane can then be combined and organized into vectors in different ways, as shown in Figs. 2-a to 2-d.
  • In Fig. 2-a, the vectors are divided along the frequency direction into 8*16 8-dimensional vectors.
  • In Fig. 2-b, the vectors are divided along the time direction, giving a total of 64*2 8-dimensional vectors.
  • In Fig. 2-c, the vectors are organized by time-frequency region, giving 16*8 8-dimensional vectors.
  • In Fig. 2-d, assuming the first-harmonic frequency is 8, the frequency axis is extracted at intervals of 8 to obtain 8*16 groups of data, each group containing 8 spectral lines and forming one vector, for a total of 8*16 8-dimensional vectors. Assuming the first-harmonic frequency is 4, the frequency axis is extracted at intervals of 4 to obtain 4*16 groups of 16 spectral lines, each group further split into two 8-dimensional vectors, again giving 8*16 8-dimensional vectors. Frequency extraction may also be performed at intervals of the second harmonic or the N-th harmonic; for example, when the first-harmonic frequency is 4, extracting the frequency axis at intervals of 4*2 yields 8*16 groups of 8 spectral lines, each group forming one vector, for a total of 8*16 8-dimensional vectors. It should be noted that when the division and vector organization are performed by the above methods or a combination of them, the vector dimension can be changed flexibly, and different regions of the time-frequency plane can be organized into vectors of different dimensions to improve coding efficiency.
  • Preferably, the division and vector organization may be performed by selecting one of the above division and organization methods, or a combination of several of them, according to the principle of maximizing the coding gain. For example, when the signal has a harmonic structure and the first-harmonic frequency is assumed to be 8, a combination of frequency-direction division and frequency extraction can be chosen: the data at each harmonic position are extracted to obtain 1*16 groups of data, each group containing 8 spectral lines and split into two 4-dimensional vectors, for a total of 2*16 4-dimensional vectors; the data at the remaining positions are divided and organized along the frequency direction into 7*16 groups of 8 spectral lines, each group forming one 8-dimensional vector, for a total of 7*16 8-dimensional vectors.
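To make the counts in the preceding examples concrete, the sketch below organizes a 64x16 weighted time-frequency plane (K = 64, L = 16, vector dimension D = 8) into vectors by frequency-direction division, time-direction division, harmonic frequency extraction, and the combined harmonic-plus-frequency-direction mode. The array layout (frequency along rows, time along columns) and the choice of bin 0 as the first harmonic position are assumptions for illustration.

```python
import numpy as np

K, L, D = 64, 16, 8            # frequency resolution, time resolution, vector dim
plane = np.random.randn(K, L)  # weighted spectrum on the time-frequency plane

def split_frequency_direction(tf):
    """Fig. 2-a: each time column is cut into K/D groups of D lines -> 8*16 vectors."""
    k, l = tf.shape
    return tf.T.reshape(l * (k // D), D)            # (128, 8)

def split_time_direction(tf):
    """Fig. 2-b: each frequency line is cut into L/D groups of D samples -> 64*2 vectors."""
    k, l = tf.shape
    return tf.reshape(k * (l // D), D)              # (128, 8)

def split_harmonic(tf, pitch=8):
    """Fig. 2-d: decimate the frequency axis at harmonic intervals -> 8*16 vectors."""
    vectors = []
    for offset in range(pitch):                     # one comb per offset
        comb = tf[offset::pitch, :]                 # (K/pitch, L)
        for t in range(tf.shape[1]):
            vectors.append(comb[:, t])              # 8 lines per vector
    return np.stack(vectors)                        # (128, 8)

def split_harmonic_plus_frequency(tf, pitch=8):
    """Combined mode from the example: harmonic bins as 4-dim vectors,
    the remaining bins divided along the frequency direction as 8-dim vectors."""
    harmonic = tf[0::pitch, :]                      # (8, 16) harmonic positions (assumed at bin 0)
    rest = np.delete(tf, np.s_[0::pitch], axis=0)   # (56, 16) remaining lines
    harm_vecs = harmonic.T.reshape(-1, 4)           # 2*16 vectors of dimension 4
    rest_vecs = rest.T.reshape(-1, D)               # 7*16 vectors of dimension 8
    return harm_vecs, rest_vecs

assert split_frequency_direction(plane).shape == (8 * 16, 8)
assert split_time_direction(plane).shape == (64 * 2, 8)
assert split_harmonic(plane).shape == (8 * 16, 8)
```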
  • The quantization encoding module 103 quantizes and encodes each obtained vector to be quantized to obtain the vector-quantized encoded data and outputs the data to the bit stream multiplexing module. The vectors to be quantized may be encoded by vector quantization, or by scalar quantization followed by entropy coding.
  • For example, with the vector quantization method, the codebook used for quantization can be obtained with the classical LBG algorithm (Linde Y., Buzo A., and Gray R. M., "An algorithm for vector quantizer design," IEEE Trans. on Communication, 1980, 28(1): 84-95), or a structured codebook may be constructed, such as lattice vector quantization (F. Chen, Z. Gao, and J. Villasenor, "Lattice vector quantization of generalized Gaussian sources," IEEE Trans. Inform. Theory, vol. 43, no. 1, pp. 92-103, 1997; A. D. Subramaniam and B. D. Rao, "PDF optimized parametric vector quantization of speech line spectral frequencies," IEEE Trans. Speech Audio Process., vol. 11, no. 2, pp. 130-142, 2003).
  • First, all the vectors to be quantized are divided into different partitions. Each partition has a classification number that indicates which vector quantization codebook is used for quantization; each vector in the partition is then vector-quantized with that codebook to obtain its codeword index, and the index is encoded.
  • The classification numbers also need to be quantized and encoded, using either scalar quantization or vector quantization.
  • The spectral vector-quantized encoded data thus contains the encoded data of the codeword indices and the classification numbers.
  • When the scalar quantization plus entropy coding method is adopted, the data to be quantized can first be scalar-quantized and then entropy-coded with Huffman coding (ISO/IEC 14496-3 (Audio), Advanced Audio Coding (AAC)).
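A minimal sketch of the vector-quantization path described above: the vectors to be quantized are grouped into partitions, each partition carries a classification number that selects one of several codebooks, and every vector is replaced by the index of its nearest codeword. The fixed-size partitioning, the random example codebooks, and the distortion-based codebook selection are assumptions; the source leaves these details to the implementation.

```python
import numpy as np

def nearest_codeword(vec, codebook):
    """Index of the codeword with minimum Euclidean distance to vec."""
    return int(np.argmin(np.sum((codebook - vec) ** 2, axis=1)))

def vq_encode(vectors, codebooks, vectors_per_partition=16):
    """Encode vectors as (classification number, codeword indices) per partition.

    vectors:   (num_vectors, dim) array of vectors to be quantized
    codebooks: list of (codebook_size, dim) arrays, one per classification number
    """
    encoded = []
    for start in range(0, len(vectors), vectors_per_partition):
        part = vectors[start:start + vectors_per_partition]
        # Pick the codebook (classification number) with the lowest total distortion.
        best_cls, best_idx, best_err = None, None, np.inf
        for cls, cb in enumerate(codebooks):
            idx = [nearest_codeword(v, cb) for v in part]
            err = sum(np.sum((cb[i] - v) ** 2) for i, v in zip(idx, part))
            if err < best_err:
                best_cls, best_idx, best_err = cls, idx, err
        encoded.append((best_cls, best_idx))
    return encoded

def vq_decode(encoded, codebooks, dim):
    """Rebuild the inverse-quantized vectors from classification numbers and indices."""
    out = [codebooks[cls][i] for cls, idx in encoded for i in idx]
    return np.array(out).reshape(-1, dim)

# Example: 128 8-dimensional vectors, two codebooks of 256 codewords each.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((128, 8))
codebooks = [rng.standard_normal((256, 8)) for _ in range(2)]
code = vq_encode(vectors, codebooks)
recon = vq_decode(code, codebooks, dim=8)
```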
  • After vector quantization encoding, the obtained MDCT spectrum encoded data is output to the bitstream multiplexing module 405.
  • The bit stream multiplexing module 405 is configured to multiplex the encoded data and side information output from the signal type judging module and the vector quantization encoding module to form an encoded sound bitstream.
  • FIG. 5 is a structural block diagram of a mono audio vector quantization decoding apparatus according to an embodiment of the present invention.
  • The monophonic sound decoding apparatus includes: a bit stream demultiplexing module 501, a vector quantization decoding module 502, an IMDCT transform module 503, and a resampling module 504.
  • The bitstream demultiplexing module 501 is configured to demultiplex the received encoded sound bitstream to obtain the encoded data and side information of the corresponding data frame, output the corresponding encoded data and side information to the vector quantization decoding module 502, and output the corresponding side information to the IMDCT transform module 503.
  • The vector quantization decoding module 502 is configured to decode the frame's vector-quantized encoded data, perform inverse redundancy processing on the decoded data according to the redundancy-processing side information, obtain the decoded spectrum data of the MDCT domain, and output it to the IMDCT transform module.
  • Figure 3 is a block diagram of a vector quantization decoding apparatus in accordance with an embodiment of the present invention.
  • the vector quantization decoding module includes a quantization decoding module 301, a vector reconstruction module 302, and a spectrum reconstruction module 303.
  • the quantization decoding module 301 receives signal type analysis information and spectral vector quantization coded data from the bit stream demultiplexing module.
  • the vector quantization codebook used for decoding is determined according to the decoded classification number, and the inverse quantized vector is obtained according to the codebook and the decoded codeword sequence number.
  • the vector reconstruction module 302 performs vector reconstruction on the inverse quantized vector according to the decoded vector partition information to obtain an inverse quantized weighted spectrum.
  • the spectrum reconstruction module 303 performs amplitude adjustment on the inverse quantized weighted spectrum according to the decoded envelope curve to obtain a reconstructed spectrum.
  • the IMDCT transform module 503 is configured to perform IMDCT transform on the spectrum of the MDCT domain.
  • the IMDCT transform uses IMDCT transforms of different length orders according to the signal type side information, and performs time domain aliasing cancellation processing to obtain the reconstructed time domain signal of the frame.
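A minimal sketch of this synthesis step, assuming a sine window that satisfies the Princen-Bradley condition: each block of N decoded coefficients is expanded to 2N samples by a direct-form IMDCT, windowed again, and overlap-added with its neighbours so that the time-domain aliasing cancels. Transform-length switching and transition windows are omitted.

```python
import numpy as np

def imdct(X):
    """Direct-form IMDCT: N coefficients -> 2N aliased time samples."""
    N = len(X)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, k + 0.5))
    return (2.0 / N) * (basis @ X)

def sine_window(two_n):
    n = np.arange(two_n)
    return np.sin(np.pi / two_n * (n + 0.5))

def overlap_add(blocks):
    """Time-domain aliasing cancellation by 50% overlap-add of windowed IMDCT outputs."""
    N = len(blocks[0])
    out = np.zeros((len(blocks) + 1) * N)
    w = sine_window(2 * N)
    for i, X in enumerate(blocks):
        out[i * N:(i + 2) * N] += w * imdct(X)
    # The first and last half-blocks have no partner block to cancel their aliasing.
    return out[N:len(blocks) * N]
```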
  • The resampling module 504 is configured to convert the sampling frequency of the frame's time-domain signal output by the IMDCT module 503 to a sampling frequency suitable for sound playback. It should be noted that if the sampling frequency of the signal output by the IMDCT module 503 is already suitable for sound playback, the sound decoding apparatus according to the present invention may omit this module.
  • FIG. 6 is a structural block diagram of a mono band extended audio vector quantization coding apparatus according to an embodiment of the present invention.
  • The mono band-extension audio vector quantization coding apparatus includes: a resampling module 601, a signal type judging module 602, an MDCT transform module 603, a low frequency vector quantization coding module 604, an MDCT-to-MDFT conversion module 605, a high frequency parameter encoding module 606, and a bit stream multiplexing module 607.
  • Although the present embodiment is described taking the MDCT as an example, the apparatus and method are also applicable to the coding of other types of data, such as the MDFT domain, FFT domain, QMF domain, and the like.
  • the resampling module 601 is configured to convert the input digital sound signal from the original sampling rate to the target sampling rate, and output the resampled signal to the signal type determining module and the MDCT transform module in units of frames. It should be noted that if the input digital sound signal itself has a target sampling rate, the encoding device in accordance with the principles of the present invention may not include the module.
  • The signal type judging module 602 is configured to perform signal type analysis on the resampled sound signal frame by frame and output the result of the signal type analysis. Because of the complexity of the signal itself, the signal type can take a variety of representations. For example, if the frame signal is a slowly varying signal, an identifier indicating that the frame is a slowly varying signal is output directly; if it is a fast-varying signal, the position of the fast-change point is further calculated, and an identifier indicating that the frame is a fast-varying signal together with the position of the fast-change point is output.
  • The MDCT transform module 603 is configured to map the resampled sound signal to the MDCT transform domain using MDCT transforms of different lengths (orders) according to the signal type analysis result output from the signal type judging module 602, and to output the MDCT-domain coefficients of the sound signal to the low frequency vector quantization coding module 604 and the MDCT-to-MDFT conversion module 605. Specifically, if the frame signal is a slowly varying signal, the MDCT transform is performed in units of frames and a longer-order MDCT transform is selected; if it is a fast-varying signal, the frame is divided into subframes, the MDCT transform is performed in units of subframes, and a shorter-order MDCT transform is selected.
  • The low frequency vector quantization coding module 604 is configured to receive the low frequency portion of the MDCT spectral coefficients of the sound signal from the MDCT transform module 603, perform redundancy-removal processing on it, perform vector quantization coding on the redundancy-processed low-band spectrum to obtain the low frequency encoded data, and output the data to the bit stream multiplexing module.
  • the MDCT to MDFT conversion module 605 is configured to receive MDCT domain coefficients of the sound signal from the MDCT transform module 603, convert the MDCT domain coefficients into MDFT domain coefficients including phase information, and convert the MDFT The domain coefficients are output to the high frequency parameter encoding module 606.
  • The high frequency parameter encoding module 606 is configured to receive the MDFT-domain coefficients from the MDCT-to-MDFT conversion module 605, extract the required high frequency parameters such as gain parameters and tonality parameters, quantize and encode these parameters, and output them to the bit stream multiplexing module 607.
  • the bit stream multiplexing module 607 is configured to multiplex the encoded data and the side information output from the signal type determining module, the low frequency vector quantization encoding module, and the high frequency parameter encoding module to form a voice encoded code stream.
  • the low frequency vector quantization coding module 604 includes an amplitude adjustment module, a vector organization module, and a quantization coding module, as shown in FIG.
  • the amplitude adjustment module performs audio perceptual analysis on the signal according to the psychoacoustic model, and accordingly adjusts the amplitude of the MDCT low spectrum to obtain the amplitude-adjusted low-frequency weighting spectrum to be quantized.
  • Using the psychoacoustic model to adjust the low spectrum can effectively control the distribution of quantization error and improve the perceived quality of reconstructed audio.
  • the amplitude adjustment module adjusts the amplitude of the MDCT spectrum according to the spectral envelope curve.
  • The envelope curve can be obtained by various methods, for example a spectral envelope represented by line spectral pair (LSP) parameters, a spectral envelope represented by piecewise straight lines, a spectral envelope fitted with spline curves, or a spectral envelope represented by a Taylor expansion.
  • In the following example, the spectral envelope curve is represented by piecewise straight lines.
  • Taking a block with an MDCT spectrum length of 512 as an example, the frequency axis is divided according to the array {0, 7, 16, 23, 33, 39, 46, 55, 65, 79, 93, 110, 130, 156, 186, 232, 278, 360, 512}. First, the amplitudes at the two end points 0 and 512 are calculated to represent the whole spectrum with a single segment. The segment is then split at point 46 into two segments, the amplitudes at the three points are calculated, and the spectral envelope is approximated by the two segments. Proceeding in the same way, the segments are successively split in the order 46, 186, 16, 33, 65, 93, 130, 278, 7, 23, 39, 55, 79, 110, 156, 232, 360, finally yielding an 18-segment polyline that represents the whole spectral envelope. To further compress this representation, only the values at the two ends are coded as absolute values, while the intermediate values are coded differentially by prediction. The envelope of the entire spectrum is obtained by linear interpolation of the 18-segment polyline and is used for the amplitude adjustment of the MDCT spectrum.
  • The vector organization module arranges and divides the amplitude-adjusted low frequency weighted spectrum to be quantized and organizes it into a plurality of vectors to be quantized.
  • the time-frequency plane of the MDCT spectrum is constructed, which may be an MDCT spectrum of each block in the frame or an MDCT spectrum between frames.
  • the time-frequency plane is divided according to the result of the signal type judgment and the tonality of the signal, and the MDCT spectrum is organized into a plurality of to-be quantized vectors according to the division.
  • The time-frequency plane division and the organization of the vectors to be quantized can be carried out in the following ways: division and organization along the time direction, i.e., for stationary signals with strong tonality the vectors can be evenly divided and organized along the time direction; division and organization along the frequency direction, i.e., for signals with fast-varying characteristics in the time domain the vectors can be divided and organized along the frequency direction; division and organization based on frequency extraction, i.e., for stationary signals with a harmonic structure the vectors can be organized by frequency extraction; and division and organization by time-frequency region, i.e., for more complex audio signals the vectors can be organized by time-frequency region.
  • Preferably, one of the above division and vector organization methods, or a combination of several of them, may be selected according to the principle of maximizing the coding gain.
  • Suppose the length of the signal's frequency coefficients is N, the resolution in the time direction on the time-frequency plane is L, the resolution in the frequency direction is K, and K*L = N. When the vectors are divided along the time direction, the frequency resolution K is kept unchanged and the time axis is divided; when the vectors are divided along the frequency direction, the time resolution L is kept unchanged and the frequency axis is divided; when the vectors are divided by frequency extraction, the MDCT spectrum is extracted in units of harmonics; when the vectors are divided by time-frequency region, the numbers of divisions in the time and frequency directions are arbitrary, and the divided time-frequency regions may be of the same, regular size and shape or of different, irregular sizes and shapes.
  • Assuming the time-frequency plane is of the form K*L = 64*16 (frequency resolution K = 64, time resolution L = 16) and the vector dimension is D = 8, the vectors organized by time-frequency region as in Fig. 2-c give 16*8 8-dimensional vectors.
  • For frequency extraction as in Fig. 2-d, assuming the first-harmonic frequency is 8, the frequency axis is extracted at intervals of 8 to obtain 8*16 groups of data, each group containing 8 spectral lines and forming one vector, for a total of 8*16 8-dimensional vectors; assuming the first-harmonic frequency is 4, the frequency axis is extracted at intervals of 4 to obtain 4*16 groups of 16 spectral lines, each group further split into two 8-dimensional vectors, again giving 8*16 8-dimensional vectors; frequency extraction can also be performed at intervals of the second harmonic or the N-th harmonic. For example, when the first-harmonic frequency is 4, extracting the frequency axis at intervals of 4*2 yields 8*16 groups of 8 spectral lines, each group forming one vector, for a total of 8*16 8-dimensional vectors. It should be pointed out that when the division and vector organization are performed by the above methods or a combination of them, the vector dimension can be changed flexibly, and different regions of the time-frequency plane can be organized into vectors of different dimensions to improve coding efficiency.
  • Preferably, the division and vector organization may be performed by selecting one of the above division and organization methods, or a combination of several of them, according to the principle of maximizing the coding gain. For example, when the signal has a harmonic structure and the first-harmonic frequency is assumed to be 8, a combination of frequency-direction division and frequency extraction can be chosen: the data at each harmonic position are extracted to obtain 1*16 groups of data, each group containing 8 spectral lines and split into two 4-dimensional vectors, for a total of 2*16 4-dimensional vectors; the data at the remaining positions are divided and organized along the frequency direction into 7*16 groups of 8 spectral lines, each group forming one 8-dimensional vector, for a total of 7*16 8-dimensional vectors.
  • the quantization coding module 103 performs quantization coding on each obtained vector to be quantized to obtain vector quantized coded data, and outputs the data to the bit stream multiplexing module.
  • The vectors to be quantized may be encoded by vector quantization, or by scalar quantization plus entropy coding.
  • the codebook used for quantization can be obtained by a conventional LBG algorithm or the like; or a structured codebook constructed, such as a lattice vector quantization technique.
  • First, all the vectors to be quantized are divided into different partitions. Each partition has a classification number that indicates which vector quantization codebook is used for quantization; each vector in the partition is then vector-quantized with that codebook to obtain its codeword index, and the index is encoded.
  • the classification number also needs to be quantized, and scalar quantization or vector quantization can be used.
  • the spectral vector quantized encoded data includes coded data of a codeword number and a classification number. When the scalar quantization plus entropy coding method is adopted, the quantized data may be scalar-quantized first, and then subjected to entropy coding by Huffman coding.
  • FIG. 7 is a structural block diagram of a mono band extended audio vector quantization decoding apparatus according to an embodiment of the present invention.
  • The mono band-extension sound decoding apparatus includes: a bit stream demultiplexing module 701, a low frequency vector quantization decoding module 702, an MDCT-to-MDFT conversion module 703, a high frequency parameter decoding module 704, an IMDFT transform module 705, and a resampling module 706.
  • The bitstream demultiplexing module 701 is configured to demultiplex the received encoded sound bitstream to obtain the encoded data and side information of the corresponding data frame, output the corresponding encoded data and side information to the low frequency vector quantization decoding module 702, and output the corresponding side information to the high frequency parameter decoding module 704 and the IMDFT transform module 705.
  • The low frequency vector quantization decoding module 702 is configured to decode the frame's low frequency vector-quantized encoded data, perform inverse redundancy processing on the decoded data according to the redundancy-processing side information, obtain the decoded low-band spectrum data of the MDCT domain, and output it to the MDCT-to-MDFT conversion module.
  • the MDCT to MDFT conversion module 703 is configured to receive the output of the low frequency vector quantization decoding module 702, convert the low spectral decoding coefficients from the MDCT domain to the MDFT domain, and output the low spectral data of the MDFT domain to the high frequency parameter decoding module 704.
  • The high frequency parameter decoding module 704 is configured to map part of the spectral data from the low-band spectrum of the frame's MDFT domain to the high frequency portion, and then adjust its gain and tonality according to the high frequency parameter encoded data output by the bit stream demultiplexing module 701 (including the gain adjustment and tonality adjustment side information) to obtain the decoded high-band spectrum data.
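A rough sketch of the mapping-and-adjustment idea in this step, under strong simplifying assumptions: the decoded low-band MDFT spectrum is copied into the high band and each high-band subband is rescaled so that its RMS matches a transmitted gain. The subband layout, the interpretation of the gain parameter as a target RMS, and the omission of the tonality adjustment are all assumptions, since the source does not fix these details at this level.

```python
import numpy as np

def reconstruct_high_band(low_spec, band_edges, gains):
    """Copy part of the decoded low-band MDFT spectrum into the high band and
    rescale each high-band subband to a transmitted target RMS (gain).

    low_spec:   complex low-band spectrum of length N_low
    band_edges: subband boundaries inside the high band, e.g. [0, 32, 64, 96, 128]
    gains:      one decoded target RMS per subband (len(band_edges) - 1 values)
    Returns the high-band spectrum of length band_edges[-1].
    """
    n_high = band_edges[-1]
    # Source patch: the low band, tiled if the high band is wider than it.
    reps = int(np.ceil(n_high / len(low_spec)))
    high = np.tile(low_spec, reps)[:n_high].astype(complex)
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        band = high[lo:hi]
        rms = np.sqrt(np.mean(np.abs(band) ** 2)) + 1e-12
        high[lo:hi] = band * (gains[b] / rms)      # match the transmitted energy
    return high

# Example: a 256-line low band extended by a 128-line high band in 4 subbands.
rng = np.random.default_rng(1)
low = rng.standard_normal(256) + 1j * rng.standard_normal(256)
high = reconstruct_high_band(low, [0, 32, 64, 96, 128], gains=[0.8, 0.5, 0.3, 0.2])
full = np.concatenate([low, high])   # combined spectrum passed to the IMDFT
```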
  • the IMDFT transform module 705 is used to combine the low spectrum and the high spectrum for IMDFT transform.
  • the IMDFT transform uses the IMDFT transform of different length orders according to the signal type side information to obtain the time domain signal of the frame.
  • The resampling module 706 is configured to convert the sampling frequency of the frame's time-domain signal output by the IMDFT module 705 to a sampling frequency suitable for sound playback. It should be noted that if the sampling frequency of the signal output by the IMDFT module 705 is already suitable for sound playback, the sound decoding apparatus according to the present invention may omit this module.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both.
  • The software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A vector quantization encoding/decoding method and apparatus for audio signals. The encoding method comprises: performing auditory perceptual analysis on the transform-domain spectrum of an audio signal, and adjusting the amplitude of the transform-domain spectrum of the audio signal according to the analysis result to obtain a weighted spectrum to be quantized; organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized; and quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data. The method and apparatus adjust the amplitude of the transform-domain signal with reference to auditory perceptual characteristics, which removes perceptual redundancy and improves coding efficiency; through analysis of the signal characteristics, the time-frequency plane of the audio signal is reasonably divided and organized into vectors to be quantized; and the time-frequency plane division and vector organization that maximize the coding gain can be selected, which facilitates efficient quantization and coding of the signal.

Description

Vector quantization codec method and apparatus for audio signals
Technical Field
The present invention relates to a vector quantization encoding and decoding (codec) method and apparatus for audio signals.
Background Art
In the audio coding standards in common use today, transform-domain audio signals are mostly quantized and coded with scalar quantization schemes. For example, in standards such as MPEG-1 Layer 3 (MP3), MPEG-2/4 AAC, and AVS, the MDCT data are quantized by scalar quantization and then entropy-coded with Huffman coding, while in the AC-3 coding scheme the MDCT data are decomposed into an exponent and a mantissa, and the mantissa is quantized and coded with a variable number of bits according to a bit-allocation model. Because scalar quantization cannot effectively exploit the redundancy between adjacent samples of the transform-domain signal, it is difficult to obtain an ideal coding result. Vector quantization is a reasonable way to solve this problem. The transform-domain weighted interleave vector quantization (TWINVQ) scheme is an audio coding method that applies vector quantization: after the MDCT of the signal, the vectors to be quantized are constructed by interleaved selection of the spectral parameters, and efficient vector quantization then yields good audio coding quality. However, TWINVQ does not make effective use of auditory perceptual characteristics to control the quantization noise, nor does it make full use of the signal characteristics to guide the organization of the vectors, so further improvement is needed.
Summary of the Invention
The object of the present invention is to provide a vector quantization codec method and apparatus for audio signals that can overcome the above drawbacks.
In a first aspect, the present invention provides a vector quantization encoding method for an audio signal, comprising: performing auditory perceptual analysis on the transform-domain spectrum of the audio signal, and adjusting the amplitude of the transform-domain spectrum of the audio signal according to the analysis result (the amplitude-adjusted transform-domain spectrum is referred to as the weighted spectrum) to obtain a weighted spectrum to be quantized; organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized; and quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data.
Preferably, the step of organizing the weighted spectrum to be quantized comprises: constructing a time-frequency plane of the weighted spectrum to be quantized; and dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result.
Preferably, the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result comprises: division and organization based on frequency extraction, in which the audio signal is determined from the type and tonality to be a stationary signal with a harmonic structure, the time-frequency plane is then divided along the time direction, the weighted spectrum is frequency-extracted in units of harmonics, and the weighted spectrum is organized into a plurality of vectors to be quantized; or division and organization along the time direction, in which the audio signal is determined from the type and tonality to be a stationary signal, the time-frequency plane is then divided along the time direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or division and organization along the frequency direction, in which the audio signal is determined from the type and tonality to have fast-varying characteristics in the time domain, the time-frequency plane is then divided along the frequency direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or division and organization by time-frequency region, in which the audio signal is determined from the tonality and type to be a complex signal, the time-frequency plane is then divided into a plurality of time-frequency regions, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result.
Preferably, the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result further comprises: selecting, according to the rule of maximizing the coding gain, the one of the modes of division and organization based on frequency extraction, along the time direction, along the frequency direction, and by time-frequency region, or the combination of several of them, that yields the largest coding gain, and performing said division and organization accordingly.
Preferably, the step of quantizing and encoding the plurality of vectors to be quantized comprises: performing vector quantization encoding on the plurality of vectors to be quantized; or performing scalar quantization on the plurality of vectors to be quantized followed by entropy coding.
In a second aspect, the present invention provides a vector quantization decoding method for an audio signal, comprising: decoding vector-quantized encoded data to obtain inverse-quantized vectors; performing vector reconstruction on the inverse-quantized vectors according to vector division information to obtain an inverse-quantized weighted spectrum; and performing amplitude adjustment on the inverse-quantized weighted spectrum to obtain decoded data.
In a third aspect, the present invention provides a vector quantization encoding apparatus for audio, comprising: an amplitude adjustment module for performing auditory perceptual analysis on the transform-domain spectrum of an audio signal and adjusting the amplitude of the transform-domain spectrum of the audio signal according to the analysis result to obtain a weighted spectrum to be quantized; a vector organization module for organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized; and a quantization encoding module for quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data.
Preferably, the vector organization module is configured to: construct a time-frequency plane of the weighted spectrum to be quantized; and divide the time-frequency plane according to the type of the audio signal and its tonality and organize the weighted spectrum into a plurality of vectors to be quantized according to the division result.
Preferably, dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result comprises: division and organization based on frequency extraction, in which the audio signal is determined from the type and tonality to be a stationary signal with a harmonic structure, the time-frequency plane is then divided along the time direction, the weighted spectrum is frequency-extracted in units of harmonics, and the weighted spectrum is organized into a plurality of vectors to be quantized; or division and organization along the time direction, in which the audio signal is determined from the type and tonality to be a stationary signal, the time-frequency plane is then divided along the time direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or division and organization along the frequency direction, in which the audio signal is determined from the type and tonality to have fast-varying characteristics in the time domain, the time-frequency plane is then divided along the frequency direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or division and organization by time-frequency region, in which the audio signal is determined from the tonality and type to be a complex signal, the time-frequency plane is then divided into a plurality of time-frequency regions, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result.
Preferably, dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result further comprises: selecting, according to the rule of maximizing the coding gain, the one of the modes of division and organization based on frequency extraction, along the time direction, along the frequency direction, and by time-frequency region, or the combination of several of them, that yields the largest coding gain, and performing said division and organization accordingly.
Preferably, the quantization encoding module is configured to: perform vector quantization encoding on the plurality of vectors to be quantized; or perform scalar quantization on the plurality of vectors to be quantized followed by entropy coding.
In a fourth aspect, the present invention provides a vector quantization decoding apparatus for an audio signal, comprising: a quantization decoding module for decoding vector-quantized encoded data to obtain inverse-quantized vectors; a vector reconstruction module for performing vector reconstruction on the inverse-quantized vectors according to vector division information to obtain an inverse-quantized weighted spectrum; and a spectrum reconstruction module for performing amplitude adjustment on the inverse-quantized weighted spectrum to obtain decoded data.
The present invention proposes a vector quantization codec scheme for audio signals. The scheme adjusts the amplitude of the transform-domain signal with reference to auditory perceptual characteristics, which removes perceptual redundancy and improves coding efficiency; through analysis of the signal characteristics, the time-frequency plane of the audio signal is reasonably divided and organized into vectors to be quantized; and the time-frequency plane division and vector organization that maximize the coding gain can be selected, which facilitates efficient quantization and coding of the signal.
Brief Description of the Drawings
Fig. 1 is a block diagram of a vector quantization encoding apparatus according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of four vector divisions according to an embodiment of the present invention.
Fig. 3 is a block diagram of a vector quantization decoding apparatus according to an embodiment of the present invention.
Fig. 4 is a structural block diagram of a mono audio vector quantization encoding apparatus according to an embodiment of the present invention.
Fig. 5 is a structural block diagram of a mono audio vector quantization decoding apparatus according to an embodiment of the present invention.
Fig. 6 is a structural block diagram of a mono band-extension audio vector quantization encoding apparatus according to an embodiment of the present invention.
Fig. 7 is a structural block diagram of a mono band-extension audio vector quantization decoding apparatus according to an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 4 is a structural block diagram of a mono audio vector quantization encoding apparatus according to an embodiment of the present invention.
As shown in Fig. 4, the mono audio vector quantization encoding apparatus according to an embodiment of the present invention includes: a resampling module 401, a signal type judging module 402, an MDCT transform module 403, a vector quantization encoding module 404, and a bit stream multiplexing module 405. It should be noted that although this embodiment takes the MDCT as an example, the apparatus and method are also applicable to the coding of other types of data, such as the MDFT domain, FFT domain, QMF domain, and the like.
The resampling module 401 is configured to convert the input digital sound signal from the original sampling rate to the target sampling rate and output the resampled signal, in units of frames, to the signal type judging module and the MDCT transform module. It should be noted that if the input digital sound signal already has the target sampling rate, the encoding apparatus according to the principles of the present invention may omit this module.
The signal type judging module 402 is configured to perform signal type analysis on the resampled sound signal frame by frame and output the result of the signal type analysis. Because of the complexity of the signal itself, the signal type can take a variety of representations. For example, if the frame signal is a slowly varying signal, an identifier indicating that the frame is a slowly varying signal is output directly; if it is a fast-varying signal, the position at which the fast-change point occurs is further calculated, and an identifier indicating that the frame is a fast-varying signal together with the position of the fast-change point is output.
The MDCT transform module 403 is configured to map the resampled sound signal to the MDCT transform domain using MDCT transforms of different lengths (orders) according to the signal type analysis result output from the signal type judging module 402, and to output the MDCT-domain coefficients of the sound signal to the vector quantization encoding module 404. Specifically, if the frame signal is a slowly varying signal, the MDCT transform is performed in units of frames and a longer-order MDCT transform is selected; if it is a fast-varying signal, the frame is divided into subframes, the MDCT transform is performed in units of subframes, and a shorter-order MDCT transform is selected.
The vector quantization encoding module 404 is configured to receive the MDCT spectral coefficients of the sound signal from the MDCT transform module 403, perform redundancy-removal processing on them, perform vector quantization encoding on the redundancy-processed spectrum to obtain the MDCT spectrum encoded data, and output the data to the bit stream multiplexing module.
Turning now to Fig. 1.
Fig. 1 is a block diagram of a vector quantization encoding apparatus according to an embodiment of the present invention.
The vector quantization encoding apparatus according to an embodiment of the present invention includes an amplitude adjustment module 101, a vector organization module 102, and a quantization encoding module 103.
The amplitude adjustment module 101 performs auditory perceptual analysis on the signal according to a psychoacoustic model and accordingly adjusts the amplitude of the MDCT spectrum to obtain the amplitude-adjusted weighted spectrum to be quantized. Adjusting the MDCT spectrum with a psychoacoustic model can effectively control the distribution of the quantization error and improve the perceived quality of the reconstructed audio.
For example, the amplitude adjustment module 101 can adjust the amplitude of the MDCT spectrum according to a spectral envelope curve. The amplitude adjustment module 101 can obtain the envelope curve by various methods, for example a spectral envelope represented by line spectral pair (LSP) parameters, a spectral envelope represented by piecewise straight lines, a spectral envelope fitted with spline curves, or a spectral envelope represented by a Taylor expansion.
The following takes the piecewise-straight-line representation of the spectral envelope as an example. Consider a block with an MDCT spectrum length of 512. The frequency axis is divided according to the array {0, 7, 16, 23, 33, 39, 46, 55, 65, 79, 93, 110, 130, 156, 186, 232, 278, 360, 512}. First, the amplitudes at the two end points 0 and 512 are calculated to represent the whole spectrum. The segment is then split at point 46 into two segments, the amplitudes at the three points are calculated, and the spectral envelope is approximated by the two segments. Proceeding in the same way, the segments are split in the order 46, 186, 16, 33, 65, 93, 130, 278, 7, 23, 39, 55, 79, 110, 156, 232, 360, finally yielding an 18-segment polyline that represents the whole spectral envelope. To further compress this representation, only the values at the two ends are coded as absolute values, while the intermediate values are coded differentially by prediction. The envelope curve of the entire spectrum is obtained by linear interpolation of the 18-segment polyline and is used for the amplitude adjustment of the MDCT spectrum.
The vector organization module 102 arranges and divides the amplitude-adjusted weighted spectrum to be quantized and organizes it into a number of vectors to be quantized.
First, the time-frequency plane of the MDCT spectrum is constructed; it may consist of the MDCT spectra of the blocks within a frame or of the MDCT spectra across frames. The time-frequency plane is divided according to the result of the signal type judgment and information such as the tonality of the signal, and the MDCT spectrum is organized into a plurality of vectors to be quantized according to this division. The time-frequency plane division and the organization of the vectors to be quantized can be carried out in the following ways: division and organization along the time direction, i.e., for stationary signals with strong tonality the vectors can be divided and organized along the time direction; division and organization along the frequency direction, i.e., for signals with fast-varying characteristics in the time domain the vectors can be divided and organized along the frequency direction; division and organization based on frequency extraction, i.e., for stationary signals with a harmonic structure the vectors can be organized by frequency extraction; and division and organization by time-frequency region, i.e., for more complex audio signals the vectors can be organized by time-frequency region. Preferably, one of these division and vector organization methods, or a combination of several of them, can be selected according to the principle of maximizing the coding gain.
Suppose the length of the signal's frequency coefficients is N, the resolution in the time direction on the time-frequency plane is L, the resolution in the frequency direction is K, and K*L = N. When the vectors are divided along the time direction, the frequency-direction resolution K is kept unchanged and the time axis is divided; when the vectors are divided along the frequency direction, the time-direction resolution L is kept unchanged and the frequency axis is divided; when the vectors are divided by time-frequency region, the numbers of divisions in the time and frequency directions are arbitrary, and the divided time-frequency regions may be of the same, regular size and shape or of different, irregular sizes and shapes; when the vectors are divided by frequency extraction, the MDCT spectrum is extracted in units of harmonics.
Turning now to Fig. 2.
Fig. 2 is a schematic diagram of four vector divisions according to an embodiment of the present invention.
Fig. 2 illustrates embodiments in which the vectors are divided by time, frequency, time-frequency region, and frequency extraction. Assume the time-frequency plane is divided in the form K*L = 64*16, where K = 64 is the frequency-direction resolution and L = 16 is the time-direction resolution. Assume the vector dimension is D = 8; the time-frequency plane can then be combined and the vectors extracted in different ways, as shown in Figs. 2-a, 2-b, 2-c, and 2-d.
In Fig. 2-a, the vectors are divided along the frequency direction into 8*16 8-dimensional vectors. In Fig. 2-b, the vectors are divided along the time direction, giving a total of 64*2 8-dimensional vectors. In Fig. 2-c, the vectors are organized by time-frequency region, giving a total of 16*8 8-dimensional vectors. In Fig. 2-d, assuming the first-harmonic frequency is 8, the frequency axis is extracted at intervals of 8 to obtain 8*16 groups of data, each group containing 8 spectral lines and forming one vector, for a total of 8*16 8-dimensional vectors; assuming the first-harmonic frequency is 4, the frequency axis is extracted at intervals of 4 to obtain 4*16 groups of 16 spectral lines, each group further split into two 8-dimensional vectors, again giving 8*16 8-dimensional vectors; frequency extraction can also be performed at intervals of the second harmonic or the N-th harmonic, for example when the first-harmonic frequency is 4, extracting the frequency axis at intervals of 4*2 yields 8*16 groups of 8 spectral lines, each group forming one vector, for a total of 8*16 8-dimensional vectors. It should be pointed out that when the division and vector organization are performed by the above methods or a combination of them, the vector dimension can be changed flexibly, and different regions of the time-frequency plane can be organized into vectors of different dimensions to improve coding efficiency.
To improve coding efficiency, one of the above division and vector organization methods, or a combination of several of them, can be selected according to the principle of maximizing the coding gain. For example, when the signal has a harmonic structure and the first-harmonic frequency is assumed to be 8, a combination of frequency-direction division and frequency extraction can be chosen for the vector organization: the data at each harmonic position are extracted to obtain 1*16 groups of data, each group containing 8 spectral lines and split into two 4-dimensional vectors, for a total of 2*16 4-dimensional vectors; the data at the remaining positions are divided and organized along the frequency direction into 7*16 groups of 8 spectral lines, each group forming one 8-dimensional vector, for a total of 7*16 8-dimensional vectors.
Returning now to Fig. 1.
The quantization encoding module 103 quantizes and encodes each obtained vector to be quantized to obtain the vector-quantized encoded data and outputs the data to the bit stream multiplexing module. The vectors to be quantized can be encoded by vector quantization, or by scalar quantization plus entropy coding. For example, with the vector quantization method, the codebook used for quantization can be obtained with the classical LBG algorithm (Linde Y., Buzo A., and Gray R. M., "An algorithm for vector quantizer design," IEEE Trans. on Communication, 1980, 28(1): 84-95), or a structured codebook can be constructed, such as lattice vector quantization (F. Chen, Z. Gao, and J. Villasenor, "Lattice vector quantization of generalized Gaussian sources," IEEE Trans. Inform. Theory, vol. 43, no. 1, pp. 92-103, 1997; A. D. Subramaniam and B. D. Rao, "PDF optimized parametric vector quantization of speech line spectral frequencies," IEEE Trans. Speech Audio Process., vol. 11, no. 2, pp. 130-142, 2003). First, all the vectors to be quantized are divided into different partitions; each partition has a classification number that indicates which vector quantization codebook is used for quantization, and each vector in the partition is then vector-quantized with that codebook to obtain its codeword index, which is encoded. The classification numbers also need to be quantized and encoded, using either scalar quantization or vector quantization. The spectral vector-quantized encoded data contains the encoded data of the codeword indices and the classification numbers. When scalar quantization plus entropy coding is used, the data to be quantized can first be scalar-quantized and then entropy-coded with Huffman coding (ISO/IEC 14496-3 (Audio), Advanced Audio Coding (AAC)).
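Because the paragraph above leaves the codebook design to the LBG algorithm, here is a compact sketch of that algorithm in its basic generalized-Lloyd form (random initialization, nearest-codeword partition, centroid update, stop when the distortion no longer improves). The random initialization and the stopping threshold are assumptions; the cited paper describes the splitting-based initialization in full.

```python
import numpy as np

def lbg_codebook(training_vectors, codebook_size, iters=50, tol=1e-6, seed=0):
    """Train a VQ codebook with the basic LBG / generalized Lloyd iteration."""
    rng = np.random.default_rng(seed)
    # Initialize with randomly chosen training vectors (assumption; the original
    # LBG paper uses a splitting procedure instead).
    codebook = training_vectors[rng.choice(len(training_vectors),
                                           codebook_size, replace=False)].copy()
    prev_dist = np.inf
    for _ in range(iters):
        # Nearest-codeword partition of the training set.
        d2 = ((training_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        dist = d2[np.arange(len(training_vectors)), assign].mean()
        if prev_dist - dist < tol:
            break
        prev_dist = dist
        # Centroid update; empty cells keep their previous codeword.
        for j in range(codebook_size):
            members = training_vectors[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

# Example: train a 256-word codebook for 8-dimensional weighted-spectrum vectors.
train = np.random.default_rng(2).standard_normal((5000, 8))
cb = lbg_codebook(train, 256)
```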
Returning now to Fig. 4.
After vector quantization encoding, the obtained MDCT spectrum encoded data is output to the bit stream multiplexing module 405.
The bit stream multiplexing module 405 is configured to multiplex the encoded data and side information output from the signal type judging module and the vector quantization encoding module to form an encoded sound bitstream.
Fig. 5 is a structural block diagram of a mono audio vector quantization decoding apparatus according to an embodiment of the present invention.
As shown in Fig. 5, the mono sound decoding apparatus according to a preferred embodiment of the present invention includes: a bit stream demultiplexing module 501, a vector quantization decoding module 502, an IMDCT transform module 503, and a resampling module 504.
The connections between the modules shown in Fig. 5 and their respective functions are outlined below.
The bit stream demultiplexing module 501 is configured to demultiplex the received encoded sound bitstream to obtain the encoded data and side information of the corresponding data frame, output the corresponding encoded data and side information to the vector quantization decoding module 502, and output the corresponding side information to the IMDCT transform module 503.
The vector quantization decoding module 502 is configured to decode the frame's vector-quantized encoded data, perform inverse redundancy processing on the decoded data according to the redundancy-processing side information, obtain the decoded spectrum data of the MDCT domain, and output it to the IMDCT transform module.
Turning now to Fig. 3, which is a block diagram of a vector quantization decoding apparatus according to an embodiment of the present invention.
As shown in Fig. 3, the vector quantization decoding module includes a quantization decoding module 301, a vector reconstruction module 302, and a spectrum reconstruction module 303.
The quantization decoding module 301 receives the signal type analysis information and the spectral vector-quantized encoded data from the bit stream demultiplexing module. The vector quantization codebook used for decoding is determined according to the decoded classification number, and the inverse-quantized vectors are obtained from that codebook and the decoded codeword indices. The vector reconstruction module 302 performs vector reconstruction on the inverse-quantized vectors according to the decoded vector division information to obtain the inverse-quantized weighted spectrum. The spectrum reconstruction module 303 performs amplitude adjustment on the inverse-quantized weighted spectrum according to the decoded envelope curve to obtain the reconstructed spectrum.
Returning now to Fig. 5.
The IMDCT transform module 503 is configured to apply the IMDCT transform to the spectrum of the MDCT domain. The IMDCT transform uses IMDCT transforms of different lengths (orders) according to the signal type side information and performs time-domain aliasing cancellation to obtain the reconstructed time-domain signal of the frame.
The resampling module 504 is configured to convert the sampling frequency of the frame's time-domain signal output by the IMDCT module 503 to a sampling frequency suitable for sound playback. It should be noted that if the sampling frequency of the signal output by the IMDCT module 503 is already suitable for sound playback, the sound decoding apparatus of the present invention may omit this module.
Fig. 6 is a structural block diagram of a mono band-extension audio vector quantization encoding apparatus according to an embodiment of the present invention.
As shown in Fig. 6, the mono band-extension audio vector quantization encoding apparatus of a preferred embodiment of the present invention includes: a resampling module 601, a signal type judging module 602, an MDCT transform module 603, a low frequency vector quantization encoding module 604, an MDCT-to-MDFT conversion module 605, a high frequency parameter encoding module 606, and a bit stream multiplexing module 607. It should be noted that although this embodiment takes the MDCT as an example, the apparatus and method are also applicable to the coding of other types of data, such as the MDFT domain, FFT domain, QMF domain, and the like.
The resampling module 601 is configured to convert the input digital sound signal from the original sampling rate to the target sampling rate and output the resampled signal, in units of frames, to the signal type judging module and the MDCT transform module. It should be noted that if the input digital sound signal already has the target sampling rate, the encoding apparatus according to the principles of the present invention may omit this module.
The signal type judging module 602 is configured to perform signal type analysis on the resampled sound signal frame by frame and output the result of the signal type analysis. Because of the complexity of the signal itself, the signal type can take a variety of representations. For example, if the frame signal is a slowly varying signal, an identifier indicating that the frame is a slowly varying signal is output directly; if it is a fast-varying signal, the position at which the fast-change point occurs is further calculated, and an identifier indicating that the frame is a fast-varying signal together with the position of the fast-change point is output.
The MDCT transform module 603 is configured to map the resampled sound signal to the MDCT transform domain using MDCT transforms of different lengths (orders) according to the signal type analysis result output from the signal type judging module 602, and to output the MDCT-domain coefficients of the sound signal to the low frequency vector quantization encoding module 604 and the MDCT-to-MDFT conversion module 605. Specifically, if the frame signal is a slowly varying signal, the MDCT transform is performed in units of frames and a longer-order MDCT transform is selected; if it is a fast-varying signal, the frame is divided into subframes, the MDCT transform is performed in units of subframes, and a shorter-order MDCT transform is selected.
The low frequency vector quantization encoding module 604 is configured to receive the low frequency portion of the MDCT spectral coefficients of the sound signal from the MDCT transform module 603, perform redundancy-removal processing on it, perform vector quantization encoding on the redundancy-processed low-band spectrum to obtain the low frequency encoded data, and output the data to the bit stream multiplexing module.
The MDCT-to-MDFT conversion module 605 is configured to receive the MDCT-domain coefficients of the sound signal from the MDCT transform module 603, convert the MDCT-domain coefficients into MDFT-domain coefficients containing phase information, and output the MDFT-domain coefficients to the high frequency parameter encoding module 606.
The high frequency parameter encoding module 606 is configured to receive the MDFT-domain coefficients from the MDCT-to-MDFT conversion module 605, extract the required high frequency parameters such as gain parameters and tonality parameters, quantize and encode the high frequency parameters, and output them to the bit stream multiplexing module 607.
The bit stream multiplexing module 607 is configured to multiplex the encoded data and side information output from the signal type judging module, the low frequency vector quantization encoding module, and the high frequency parameter encoding module to form an encoded sound bitstream.
The low frequency vector quantization encoding module 604 includes an amplitude adjustment module, a vector organization module, and a quantization encoding module, as shown in Fig. 1.
The amplitude adjustment module performs auditory perceptual analysis on the signal according to a psychoacoustic model and accordingly adjusts the amplitude of the low-band MDCT spectrum to obtain the amplitude-adjusted low frequency weighted spectrum to be quantized. Adjusting the low-band spectrum with a psychoacoustic model can effectively control the distribution of the quantization error and improve the perceived quality of the reconstructed audio.
The amplitude adjustment module adjusts the amplitude of the MDCT spectrum according to a spectral envelope curve. The envelope curve can be obtained by various methods, for example a spectral envelope represented by line spectral pair (LSP) parameters, a spectral envelope represented by piecewise straight lines, a spectral envelope fitted with spline curves, or a spectral envelope represented by a Taylor expansion.
The following takes the piecewise-straight-line representation of the spectral envelope as an example. Consider a block with an MDCT spectrum length of 512. The frequency axis is divided according to the array {0, 7, 16, 23, 33, 39, 46, 55, 65, 79, 93, 110, 130, 156, 186, 232, 278, 360, 512}. First, the amplitudes at the two end points 0 and 512 are calculated to represent the whole spectrum. The segment is then split at point 46 into two segments, the amplitudes at the three points are calculated, and the spectral envelope is approximated by the two segments. Proceeding in the same way, the segments are split in the order 46, 186, 16, 33, 65, 93, 130, 278, 7, 23, 39, 55, 79, 110, 156, 232, 360, finally yielding an 18-segment polyline that represents the whole spectral envelope. To further compress this representation, only the values at the two ends are coded as absolute values, while the intermediate values are coded differentially by prediction. The envelope curve of the entire spectrum is obtained by linear interpolation of the 18-segment polyline and is used for the amplitude adjustment of the MDCT spectrum.
The vector organization module arranges and divides the amplitude-adjusted low frequency weighted spectrum to be quantized and organizes it into a number of vectors to be quantized.
First, the time-frequency plane of the MDCT spectrum is constructed; it may consist of the MDCT spectra of the blocks within a frame or of the MDCT spectra across frames. The time-frequency plane is divided according to the result of the signal type judgment and information such as the tonality of the signal, and the MDCT spectrum is organized into a plurality of vectors to be quantized according to this division. The time-frequency plane division and the organization of the vectors to be quantized can be carried out in the following ways: division and organization along the time direction, i.e., for stationary signals with strong tonality the vectors can be evenly divided and organized along the time direction; division and organization along the frequency direction, i.e., for signals with fast-varying characteristics in the time domain the vectors can be divided and organized along the frequency direction; division and organization based on frequency extraction, i.e., for stationary signals with a harmonic structure the vectors can be organized by frequency extraction; and division and organization by time-frequency region, i.e., for more complex audio signals the vectors can be organized by time-frequency region. Preferably, one of these division and vector organization methods, or a combination of several of them, can be selected according to the principle of maximizing the coding gain.
Suppose the length of the signal's frequency coefficients is N, the resolution in the time direction on the time-frequency plane is L, the resolution in the frequency direction is K, and K*L = N. When the vectors are divided along the time direction, the frequency-direction resolution K is kept unchanged and the time axis is divided; when the vectors are divided along the frequency direction, the time-direction resolution L is kept unchanged and the frequency axis is divided; when the vectors are divided by frequency extraction, the MDCT spectrum is extracted in units of harmonics; when the vectors are divided by time-frequency region, the numbers of divisions in the time and frequency directions are arbitrary, and the divided time-frequency regions may be of the same, regular size and shape or of different, irregular sizes and shapes. Fig. 2 illustrates embodiments in which the vectors are divided by time, frequency, time-frequency region, and frequency extraction. Assume the time-frequency plane is divided in the form K*L = 64*16, where K = 64 is the frequency-direction resolution and L = 16 is the time-direction resolution, and assume the vector dimension is D = 8; the time-frequency plane can then be combined and the vectors extracted in different ways, as shown in Figs. 2-a, 2-b, 2-c, and 2-d. In Fig. 2-a, the vectors are divided along the frequency direction into 8*16 8-dimensional vectors. In Fig. 2-b, the vectors are divided along the time direction, giving a total of 64*2 8-dimensional vectors. In Fig. 2-c, the vectors are organized by time-frequency region, giving a total of 16*8 8-dimensional vectors. In Fig. 2-d, assuming the first-harmonic frequency is 8, the frequency axis is extracted at intervals of 8 to obtain 8*16 groups of data, each group containing 8 spectral lines and forming one vector, for a total of 8*16 8-dimensional vectors; assuming the first-harmonic frequency is 4, the frequency axis is extracted at intervals of 4 to obtain 4*16 groups of 16 spectral lines, each group further split into two 8-dimensional vectors, again giving 8*16 8-dimensional vectors; frequency extraction can also be performed at intervals of the second harmonic or the N-th harmonic, for example when the first-harmonic frequency is 4, extracting the frequency axis at intervals of 4*2 yields 8*16 groups of 8 spectral lines, each group forming one vector, for a total of 8*16 8-dimensional vectors. It should be pointed out that when the division and vector organization are performed by the above methods or a combination of them, the vector dimension can be changed flexibly, and different regions of the time-frequency plane can be organized into vectors of different dimensions to improve coding efficiency.
To improve coding efficiency, one of the above division and vector organization methods, or a combination of several of them, can be selected according to the principle of maximizing the coding gain. For example, when the signal has a harmonic structure and the first-harmonic frequency is assumed to be 8, a combination of frequency-direction division and frequency extraction can be chosen for the vector organization: the data at each harmonic position are extracted to obtain 1*16 groups of data, each group containing 8 spectral lines and split into two 4-dimensional vectors, for a total of 2*16 4-dimensional vectors; the data at the remaining positions are divided and organized along the frequency direction into 7*16 groups of 8 spectral lines, each group forming one 8-dimensional vector, for a total of 7*16 8-dimensional vectors.
The quantization encoding module 103 quantizes and encodes each obtained vector to be quantized to obtain the vector-quantized encoded data and outputs the data to the bit stream multiplexing module. The vectors to be quantized can be encoded by vector quantization, or by scalar quantization plus entropy coding. For example, with the vector quantization method, the codebook used for quantization can be obtained with the classical LBG algorithm, or a structured codebook can be constructed, such as lattice vector quantization. First, all the vectors to be quantized are divided into different partitions; each partition has a classification number that indicates which vector quantization codebook is used for quantization, and each vector in the partition is then vector-quantized with that codebook to obtain its codeword index, which is encoded. The classification numbers also need to be quantized and encoded, using either scalar quantization or vector quantization. The spectral vector-quantized encoded data contains the encoded data of the codeword indices and the classification numbers. When scalar quantization plus entropy coding is used, the data to be quantized can first be scalar-quantized and then entropy-coded with Huffman coding.
Fig. 7 is a structural block diagram of a mono band-extension audio vector quantization decoding apparatus according to an embodiment of the present invention.
As shown in Fig. 7, the mono band-extension sound decoding apparatus according to a preferred embodiment of the present invention includes: a bit stream demultiplexing module 701, a low frequency vector quantization decoding module 702, an MDCT-to-MDFT conversion module 703, a high frequency parameter decoding module 704, an IMDFT transform module 705, and a resampling module 706.
The connections between the modules shown in Fig. 7 and their respective functions are outlined below.
The bit stream demultiplexing module 701 is configured to demultiplex the received encoded sound bitstream to obtain the encoded data and side information of the corresponding data frame, output the corresponding encoded data and side information to the low frequency vector quantization decoding module 702, and output the corresponding side information to the high frequency parameter decoding module 704 and the IMDFT transform module 705.
The low frequency vector quantization decoding module 702 is configured to decode the frame's low frequency vector-quantized encoded data, perform inverse redundancy processing on the decoded data according to the redundancy-processing side information, obtain the decoded low-band spectrum data of the MDCT domain, and output it to the MDCT-to-MDFT conversion module.
The MDCT-to-MDFT conversion module 703 is configured to receive the output of the low frequency vector quantization decoding module 702, convert the decoded low-band spectral coefficients from the MDCT domain to the MDFT domain, and output the low-band spectral data of the MDFT domain to the high frequency parameter decoding module 704.
The high frequency parameter decoding module 704 is configured to map part of the spectral data from the low-band spectrum of the frame's MDFT domain to the high frequency portion, and then adjust its gain and tonality according to the high frequency parameter encoded data output by the bit stream demultiplexing module 701 (including the gain adjustment and tonality adjustment side information) to obtain the decoded high-band spectrum data.
The IMDFT transform module 705 is configured to combine the low-band and high-band spectra and apply the IMDFT transform. The IMDFT transform uses IMDFT transforms of different lengths (orders) according to the signal type side information to obtain the time-domain signal of the frame.
The resampling module 706 is configured to convert the sampling frequency of the frame's time-domain signal output by the IMDFT module 705 to a sampling frequency suitable for sound playback. It should be noted that if the sampling frequency of the signal output by the IMDFT module 705 is already suitable for sound playback, the sound decoding apparatus of the present invention may omit this module.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are implemented in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (12)

  1. A vector quantization encoding method for an audio signal, comprising:
    performing auditory perceptual analysis on the transform-domain spectrum of the audio signal, and adjusting the amplitude of the transform-domain spectrum of the audio signal according to the analysis result to obtain a weighted spectrum to be quantized;
    organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized; and
    quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data.
  2. The method according to claim 1, wherein the step of organizing the weighted spectrum to be quantized comprises:
    constructing a time-frequency plane of the weighted spectrum to be quantized;
    dividing the time-frequency plane according to the type of the audio signal and its tonality, and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result.
  3. The method according to claim 2, wherein the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result comprises:
    division and organization based on frequency extraction, in which the audio signal is determined from the type and tonality to be a stationary signal with a harmonic structure, the time-frequency plane is then divided along the time direction, the weighted spectrum is frequency-extracted in units of harmonics, and the weighted spectrum is organized into a plurality of vectors to be quantized; or
    division and organization along the time direction, in which the audio signal is determined from the type and tonality to be a stationary signal, the time-frequency plane is then divided along the time direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or
    division and organization along the frequency direction, in which the audio signal is determined from the type and tonality to have fast-varying characteristics in the time domain, the time-frequency plane is then divided along the frequency direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or
    division and organization by time-frequency region, in which the audio signal is determined from the tonality and type to be a complex signal, the time-frequency plane is then divided into a plurality of time-frequency regions, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result.
  4. The method according to claim 3, wherein the step of dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result further comprises:
    selecting, according to the rule of maximizing the coding gain, one of the modes of division and organization based on frequency extraction, division and organization along the time direction, division and organization along the frequency direction, and division and organization by time-frequency region, or a combination of several of them, and performing said division and organization.
  5. The method according to claim 1, wherein the step of quantizing and encoding the plurality of vectors to be quantized comprises:
    performing vector quantization encoding on the plurality of vectors to be quantized; or
    performing scalar quantization on the plurality of vectors to be quantized followed by entropy coding.
  6. A vector quantization decoding method for an audio signal, comprising:
    decoding vector-quantized encoded data to obtain inverse-quantized vectors;
    performing vector reconstruction on the inverse-quantized vectors according to vector division information to obtain an inverse-quantized weighted spectrum;
    performing amplitude adjustment on the inverse-quantized weighted spectrum to obtain decoded data.
  7. A vector quantization encoding apparatus for audio, comprising:
    an amplitude adjustment module for performing auditory perceptual analysis on the transform-domain spectrum of an audio signal and adjusting the amplitude of the transform-domain spectrum of the audio signal according to the analysis result to obtain a weighted spectrum to be quantized;
    a vector organization module for organizing the weighted spectrum to be quantized to obtain a plurality of vectors to be quantized; and
    a quantization encoding module for quantizing and encoding the plurality of vectors to be quantized to obtain vector-quantized encoded data.
  8. The apparatus according to claim 7, wherein the vector organization module is configured to:
    construct a time-frequency plane of the weighted spectrum to be quantized;
    divide the time-frequency plane according to the type of the audio signal and its tonality, and organize the weighted spectrum into a plurality of vectors to be quantized according to the division result.
  9. The apparatus according to claim 8, wherein dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result comprises:
    division and organization based on frequency extraction, in which the audio signal is determined from the type and tonality to be a stationary signal with a harmonic structure, the time-frequency plane is then divided along the time direction, the weighted spectrum is frequency-extracted in units of harmonics, and the weighted spectrum is organized into a plurality of vectors to be quantized; or
    division and organization along the time direction, in which the audio signal is determined from the type and tonality to be a stationary signal, the time-frequency plane is then divided along the time direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or
    division and organization along the frequency direction, in which the audio signal is determined from the type and tonality to have fast-varying characteristics in the time domain, the time-frequency plane is then divided along the frequency direction, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result; or
    division and organization by time-frequency region, in which the audio signal is determined from the tonality and type to be a complex signal, the time-frequency plane is then divided into a plurality of time-frequency regions, and the weighted spectrum is organized into a plurality of vectors to be quantized according to the division result.
  10. The apparatus according to claim 9, wherein dividing the time-frequency plane according to the type of the audio signal and its tonality and organizing the weighted spectrum into a plurality of vectors to be quantized according to the division result further comprises:
    selecting, according to the rule of maximizing the coding gain, one of the modes of division and organization based on frequency extraction, division and organization along the time direction, division and organization along the frequency direction, and division and organization by time-frequency region, or a combination of several of them, and performing said division and organization.
  11. The apparatus according to claim 7, wherein the quantization encoding module is configured to:
    perform vector quantization encoding on the plurality of vectors to be quantized; or
    perform scalar quantization on the plurality of vectors to be quantized followed by entropy coding.
  12. A vector quantization decoding apparatus for an audio signal, comprising:
    a quantization decoding module for decoding vector-quantized encoded data to obtain inverse-quantized vectors;
    a vector reconstruction module for performing vector reconstruction on the inverse-quantized vectors according to vector division information to obtain an inverse-quantized weighted spectrum;
    a spectrum reconstruction module for performing amplitude adjustment on the inverse-quantized weighted spectrum to obtain decoded data.
PCT/CN2014/095012 2013-12-25 2014-12-25 一种用于音频信号的矢量量化编解码方法及装置 WO2015096789A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310728959.2A CN104751850B (zh) 2013-12-25 2013-12-25 一种用于音频信号的矢量量化编解码方法及装置
CN201310728959.2 2013-12-25

Publications (1)

Publication Number Publication Date
WO2015096789A1 true WO2015096789A1 (zh) 2015-07-02

Family

ID=53477579

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/095012 WO2015096789A1 (zh) 2013-12-25 2014-12-25 一种用于音频信号的矢量量化编解码方法及装置

Country Status (2)

Country Link
CN (1) CN104751850B (zh)
WO (1) WO2015096789A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105070293B (zh) * 2015-08-31 2018-08-21 武汉大学 基于深度神经网络的音频带宽扩展编码解码方法及装置
US11528488B2 (en) 2020-04-09 2022-12-13 Jianghong Yu Image and video data processing method and system
US11503306B2 (en) 2020-04-09 2022-11-15 Jianghong Yu Image and video data processing method and system
CN113518227B (zh) * 2020-04-09 2023-02-10 于江鸿 数据处理的方法和***

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0684705A2 (en) * 1994-05-06 1995-11-29 Nippon Telegraph And Telephone Corporation Multichannel signal coding using weighted vector quantization
EP0577488B1 (en) * 1992-06-29 1997-04-09 Nippon Telegraph And Telephone Corporation Speech coding method and apparatus for the same
CN1677490A (zh) * 2004-04-01 2005-10-05 北京宫羽数字技术有限责任公司 一种增强音频编解码装置及方法
CN101110214A (zh) * 2007-08-10 2008-01-23 北京理工大学 一种基于多描述格型矢量量化技术的语音编码方法
WO2010000304A1 (en) * 2008-06-30 2010-01-07 Nokia Corporation Entropy - coded lattice vector quantization

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0577488B1 (en) * 1992-06-29 1997-04-09 Nippon Telegraph And Telephone Corporation Speech coding method and apparatus for the same
EP0684705A2 (en) * 1994-05-06 1995-11-29 Nippon Telegraph And Telephone Corporation Multichannel signal coding using weighted vector quantization
CN1677490A (zh) * 2004-04-01 2005-10-05 北京宫羽数字技术有限责任公司 一种增强音频编解码装置及方法
CN101110214A (zh) * 2007-08-10 2008-01-23 北京理工大学 一种基于多描述格型矢量量化技术的语音编码方法
WO2010000304A1 (en) * 2008-06-30 2010-01-07 Nokia Corporation Entropy - coded lattice vector quantization

Also Published As

Publication number Publication date
CN104751850A (zh) 2015-07-01
CN104751850B (zh) 2021-04-02

Similar Documents

Publication Publication Date Title
US7343287B2 (en) Method and apparatus for scalable encoding and method and apparatus for scalable decoding
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
AU2007208482B2 (en) Complex-transform channel coding with extended-band frequency coding
US8190425B2 (en) Complex cross-correlation parameters for multi-channel audio
US7953604B2 (en) Shape and scale parameters for extended-band frequency coding
AU2005337961B2 (en) Audio compression
CA2482427C (en) Apparatus and method for coding a time-discrete audio signal and apparatus and method for decoding coded audio data
CN105723452B (zh) 音频信号的频谱的频谱系数的解码方法及解码器
US20090112606A1 (en) Channel extension coding for multi-channel source
EP2279562B1 (en) Factorization of overlapping transforms into two block transforms
EP3165005B1 (en) Method and apparatus for decoding a compressed hoa representation, and method and apparatus for encoding a compressed hoa representation
US10403292B2 (en) Method and apparatus for encoding/decoding of directions of dominant directional signals within subbands of a HOA signal representation
US10194257B2 (en) Method and apparatus for encoding/decoding of directions of dominant directional signals within subbands of a HOA signal representation
WO2015096789A1 (zh) 一种用于音频信号的矢量量化编解码方法及装置
CN111210832B (zh) 基于频谱包络模板的带宽扩展音频编解码方法及装置
US9794714B2 (en) Method and apparatus for decoding a compressed HOA representation, and method and apparatus for encoding a compressed HOA representation
WO2009125588A1 (ja) 符号化装置および符号化方法
Lee et al. Progressive multi-stage neural audio coding with guided references
US9800986B2 (en) Method and apparatus for encoding/decoding of directions of dominant directional signals within subbands of a HOA signal representation
US8924202B2 (en) Audio signal coding system and method using speech signal rotation prior to lattice vector quantization

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14874486; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14874486; Country of ref document: EP; Kind code of ref document: A1)