WO2005055203A1 - Coding of audio signals - Google Patents

Coding of audio signals

Info

Publication number
WO2005055203A1
Authority
WO
WIPO (PCT)
Prior art keywords
type
frequency
decoder
granule
data samples
Prior art date
Application number
PCT/IB2004/052602
Other languages
English (en)
Inventor
Erik G. P. Schuijers
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Priority to EP04799284A (patent EP1692686A1)
Priority to JP2006542091A (patent JP2007515672A)
Publication of WO2005055203A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition

Definitions

  • the present invention relates to the encoding and decoding of data signals.
  • the invention relates particularly, but not exclusively, to apparatus for encoding and decoding MPEG-1 layer III data signals.
  • MPEG-1 layer III (commonly known as MP3) is a widely used audio codec.
  • the industry standard for MP3 is described in ISO/IEC JTC 1/SC29/WG11 MPEG, IS 11172- 3, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio, MPEG-1, 1992.
  • This standard is available from the International Organization for Standardization (ISO) (www.iso.ch) and is hereby incorporated herein by way of reference.
  • Figure 1 presents a simplified block diagram of a typical conventional MPEG-1 layer III encoder 10.
  • the encoder 10 is arranged to receive a PCM input signal comprising a series, or a frame, of 1152 audio samples.
  • the input signal is supplied to a (polyphase) analysis filterbank 12 which filters the input signal into 32 uniformly spaced, overlapping frequency bands to produce 32 downsampled subband signal components, each comprising 36 subband samples.
  • a windowed (forward) MDCT (Modified Discrete Cosine Transform) is applied to each subband signal by an MDCT unit 14.
  • Four window types are used to accommodate variable time segmentation.
  • For stationary parts of the signal, the so-called normal windows can be used, while for non-stationary parts of the signal a sequence of so-called short windows can be used.
  • Two transitory types of windows, the so-called start and stop windows, have been defined to prevent discontinuities when switching from normal to short windows and vice versa.
  • For normal, start and stop windows, the MDCT is performed on 36 inputs (i.e. 36 subband samples) and produces 18 output MDCT coefficients, which are commonly referred to as frequency lines.
  • For short windows, the MDCT is performed on three sets of 12 inputs (i.e. three sets of 12 subband samples) and produces three sets of 6 output MDCT coefficients, or frequency lines.
  • a set of 576 MDCT coefficients is known as a granule.
  • two granules are produced as a result of the overlapping nature of the encoding process.
  • 18 x 32 = 576 MDCT coefficients, or frequency lines, are produced for each 576 input samples.
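  • The block sizes quoted above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch; the constant names are illustrative and do not come from the patent or the standard.

```python
# Block sizes of one MPEG-1 layer III frame, as quoted in the text.
FRAME_SAMPLES = 1152        # PCM samples per frame
SUBBANDS = 32               # outputs of the polyphase analysis filterbank
LINES_PER_SUBBAND = 18      # MDCT coefficients per subband (36 inputs -> 18 lines)

samples_per_subband = FRAME_SAMPLES // SUBBANDS        # subband samples per frame
granule_lines = LINES_PER_SUBBAND * SUBBANDS           # frequency lines per granule
granules_per_frame = FRAME_SAMPLES // granule_lines    # granules per frame

# Short windows: three MDCTs of 12 inputs each give 3 x 6 lines per subband.
short_lines_per_subband = 3 * 6

print(samples_per_subband, granule_lines, granules_per_frame, short_lines_per_subband)
```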
  • the MDCT frequency lines are provided to anti-aliasing butterflies 16 to reduce the effect of aliasing caused by downsampling the partially overlapping filters of the filterbank 12.
  • a quantization and coding unit 18 performs appropriate quantization and coding of the frequency lines to produce an output signal in a prescribed bitstream format.
  • the quantization and coding is performed under the control of a bit-allocation unit 20 which performs a bit-allocation algorithm, typically steered by a psycho-acoustic model.
  • Figure 2 presents a simplified block diagram of a conventional MPEG-1 layer III decoder 30.
  • the decoder 30 is arranged to receive an input signal in the prescribed bitstream format.
  • a decoding and dequantizing unit 32 performs decoding and dequantization of the bitstream to produce frequency lines, or MDCT coefficients.
  • a respective 576 frequency lines are reproduced for each set of 576 MDCT frequency lines produced by the encoder 10.
  • the frequency lines are provided to a re-ordering unit 34 which re-orders the frequency lines, in case of short type of windows, within each granule. In case of normal, start or stop windows, the frequency lines are provided to aliasing butterflies 36 which perform the inverse of the anti-aliasing operation performed by the anti-aliasing butterflies 16.
  • An IMDCT unit 38 performs an IMDCT (Inverse Modified Discrete Cosine Transform) on the frequency lines to produce 32 subband signal components each comprising 36 subband samples. For those frequency lines corresponding to a normal, start or stop window MDCT, the IMDCT unit 38 takes as input 18 frequency lines and generates 36 subband samples. For those frequency lines corresponding to a short window MDCT, the IMDCT unit 38 takes as input 3 sets of 6 frequency lines and generates 3 sets of 12 subband samples. A windowing operation and standard overlapping and adding operations are performed on the subband samples by a windowing and overlap-add unit 40. Information on which type of window to use is carried in the associated side information of the bit stream.
  • the subband samples are provided to a (polyphase) synthesis filterbank 42, which also comprises upsampling by a factor of 32, to produce an output signal comprising PCM samples.
  • the filterbanks 12, 42 comprise a prototype low pass filter that is cosine modulated to form the higher frequency bands.
  • the serial combination of a subband filterbank and an MDCT unit is known as a hybrid filterbank, because it partially consists of a filterbank and partially consists of a transform.
  • the analysis filterbank 12 and the MDCT unit 14 together comprise a hybrid analysis filterbank while in the decoder 30, the IMDCT unit 38 and the synthesis filterbank 42 together comprise a hybrid synthesis filterbank.
  • a first aspect of the invention provides a decoder for data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the decoder comprising means for decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; means for performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; and means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples, wherein, in respect of at least a first type of window function, said inverse frequency transform means is arranged to perform a single inverse frequency transform on all frequency lines of a respective granule, and wherein the decoder further includes
  • a second aspect of the invention provides a method of decoding data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the method comprising decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; and constructing an output signal from said windowed data samples, wherein, in respect of at least a first type of window function, a single inverse frequency transform is performed on all frequency lines within a respective granule.
  • the first and second aspect of the invention each allow the output signal to be generated without the need for a filterbank.
  • the encoded data signals comprise MPEG-1 layer III data signals and the forward and inverse frequency transforms comprise the Modified Discrete Cosine Transform (MDCT) and the Inverse Modified Discrete Cosine Transform (IMDCT) respectively.
  • the forward frequency transform comprises the Modified Discrete Cosine Transform (MDCT) and the encoded data signals comprise MPEG-1 layer III data signals.
  • a third aspect of the invention provides an encoder for an input signal comprising a plurality of data samples, the encoder comprising means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; means for performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; and means for encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, said MDCT means is arranged to perform a single MDCT on all windowed data samples of the received data signal in respect of which a respective granule is produced.
  • a fourth aspect of the invention provides a method of encoding an input signal comprising a plurality of data samples, the method comprising applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, a single MDCT is performed on all windowed data samples of the received data signal in respect of which a respective granule is produced.
  • a fifth aspect of the invention provides a system, or codec, for encoding and decoding data signals, the system comprising an encoder of the third aspect of the invention and a decoder of the first aspect of the invention.
  • Figure 1 is a block diagram of a conventional MPEG-1 layer III encoder
  • Figure 2 is a block diagram of a conventional MPEG-1 layer III decoder
  • Figure 3 is a graphical representation of MDCT coefficients coming from the
  • Figure 6 is a block diagram of a decoder for MPEG-1 layer III signals, the decoder embodying one aspect of the present invention
  • Figure 7 shows the order of MDCT coefficients for short windows after reordering in the decoding apparatus of Figure 6
  • Figure 8 is a block diagram of an encoder for generating MPEG-1 layer III type signals embodying a third aspect of the invention.
  • a typical data frame comprises two granules of 576 frequency lines, or MDCT coefficients, each.
  • the 576 frequency lines comprise a respective set of 18 frequency lines for each of the 32 subbands.
  • each set of 18 frequency lines is comprised of 3 sets of 6 frequency lines.
  • the transformations are performed by the hybrid filterbank 12, 14.
  • the MDCT unit 14 performs one or more
  • the MDCTs performed by MDCT unit 14 may be said to comprise "short" MDCTs in that each MDCT is performed on only a respective (relatively small) portion of the frame data at a time.
  • For normal, start or stop windows, a single MDCT is performed on the 36 input samples of a subband to produce 18 frequency lines.
  • For short windows, three MDCT transforms are performed, each on a respective set of 12 input samples of a subband, to produce a respective set of 6 frequency lines.
  • the inverse MDCTs performed by IMDCT unit 38 may be said to comprise "short" inverse MDCTs since each inverse MDCT is performed on only a respective portion of the decoded and dequantized frequency lines produced in respect of the data frame.
  • For normal, start or stop windows, a single inverse MDCT is performed on the 18 frequency lines of a subband to produce 36 time domain samples.
  • For short windows, three inverse MDCT transforms are performed, each on a respective set of 6 frequency lines of a subband, to produce a respective set of 12 time domain samples.
  • a method of decoding MP3 data wherein one or more "long" inverse MDCTs are performed on the decoded and dequantized frequency lines, or MDCT coefficients, produced in respect of a whole data granule.
  • For a granule of 576 frequency lines, or MDCT coefficients, when a normal, start or stop type of window is required, a single "long" inverse MDCT is performed on all 576 frequency lines to produce 1152 time domain samples while, for a short type of window, three "long" inverse MDCTs are each performed on a respective set of 192 frequency lines to produce a respective set of 384 time domain samples.
  • one or more inverse MDCTs are performed on all of the frequency lines of a granule as a whole rather than being performed on the respective frequency lines associated with respective subbands. It is found that, with some pre-processing of the frequency lines and with appropriate windowing and overlap-add operations, the outputs of the "long" inverse MDCTs may be used to provide a perceptually close approximation of the desired PCM output signal, thereby removing the need for a filterbank in the decoder. Similar principles may be applied during the encoding process thereby removing the need for a filterbank in the encoder. This is described in more detail below. In arriving at the present invention, the following observations are made: an ideal filterbank consists of rectangular non-overlapping passbands.
  • the hybrid filterbank could be approximated quite accurately by a single "long" MDCT as described above.
  • the combination of filterbank and anti-aliasing butterflies gives a relatively good approximation of an ideal filterbank.
  • the hybrid filterbank in combination with anti-aliasing butterflies can be replaced by a single "long" MDCT.
  • n is a time index which, for conventional MP3 encoders, denotes subband sample index
  • N is the transform length or size
  • k is a frequency index
  • x[n] is the time domain signal which, in conventional MP3 encoders, comprises the subband time domain signal comprised of the subband samples
  • c[k] is the frequency domain MDCT spectrum.
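  • To make the symbols n, N, k, x[n] and c[k] concrete, here is a minimal direct-form MDCT. The cosine kernel is the form used for MPEG-1 layer III in ISO/IEC 11172-3 and is assumed here rather than taken from this text; windowing is left out and the function name mdct is illustrative.

```python
import math

def mdct(x):
    """Direct O(N^2) MDCT: N time samples x[n] -> N/2 frequency lines c[k]."""
    N = len(x)                       # transform length
    scale = 2.0 * math.pi / N
    return [
        sum(x[n] * math.cos(scale * (n + 0.5 + N / 4.0) * (k + 0.5))
            for n in range(N))
        for k in range(N // 2)
    ]

# Conventional MP3 sizes: one 36-point MDCT, or three 12-point MDCTs, per subband.
normal = mdct([1.0] * 36)
short = [mdct([1.0] * 12) for _ in range(3)]
print(len(normal), [len(s) for s in short])
```

  • A production implementation would use a fast algorithm; the quadratic loop is kept only because it mirrors the definition term by term.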
  • Figure 3 shows graphically the result of applying the hybrid analysis filterbank, followed by the anti-aliasing butterflies, to a delta pulse. It can be seen that the spectrum shown in Figure 3 is comprised of a cosine-type waveform with the waveform corresponding to odd, i.e. alternate or every second, subbands being negated (multiplied by -1). This is a characteristic shared with the output of a hybrid filterbank, which is known to comprise negated alternate subband components. Indeed, for every second subband of the synthesis filterbank 42 in the decoder 30, every second input value is negated (i.e. multiplied by -1) to compensate for the frequency inversion caused by the analysis filterbank 12 in the encoder 10.
  • the distortion that can be seen in Figure 4 is caused by the aliasing due to downsampling in the analysis filterbank which is only partially compensated by the anti-aliasing butterflies and by the fact that the analysis filter bank does not have an ideal linear phase characteristic.
  • the operation of a hybrid filterbank may be approximated by an MDCT.
  • one or more "long" inverse MDCTs are used to replace the operation of the hybrid synthesis filterbank 38, 42 of the decoder 30.
  • one or more "long" MDCTs may be used to replace the operation of the hybrid analysis filterbank 12, 14 of the encoder 10.
  • the decoding apparatus, or decoder 60, comprises a decoding and dequantization unit 62 arranged to receive a data signal in the form of an MPEG-1 layer III bitstream, or similarly encoded data signal.
  • the decoding and dequantization unit 62 performs appropriate decoding (typically Huffman decoding as prescribed by MP3) and re-quantization of the received bitstream to recover a plurality of frequency lines, or MDCT coefficients.
  • the decoding and dequantization unit 62 may perform standard MP3 decoding and re-quantization. Typically, for a frame comprising 1152 input audio samples, two granules of 576 frequency lines are recovered by the unit 62 (due to the overlap-add operation performed in the windowing, effectively 576 input samples deliver 576 MDCT coefficients and so the system is critically sampled).
  • the decoder 60 includes a re-ordering unit 64 for re-ordering, as necessary, the frequency lines produced by the decoding and dequantization unit 62. The re-ordering reverses the re-ordering that is normally performed by an encoder. This is described in more detail below.
  • the re-ordering unit 64 may determine what type of re-ordering is required from the side information associated with the respective frame.
  • An inverse MDCT (IMDCT) unit 68 is provided for performing one or more inverse MDCTs on the re-ordered frequency lines. As described above, the IMDCT unit 68 is arranged to operate on a whole granule of frequency lines at a time, performing either a single inverse MDCT on all frequency lines within the granule (when normal, start or stop type windows are required) or a plurality of inverse MDCTs on a corresponding number of subsets of all the frequency lines within the granule (when short type windows are required).
  • For an MP3 bitstream where a granule comprises 576 frequency lines, the IMDCT unit 68 performs a single inverse MDCT on the whole granule for normal, start or stop windows, resulting in 1152 time domain samples, and, for short windows, three inverse MDCTs, each on a respective one of 3 sub-sets of 192 frequency lines, resulting in three respective sequences, or sets, of 384 time domain samples.
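  • The "long" inverse transforms performed per granule can be sketched as follows. The kernel is the unscaled ISO/IEC 11172-3 form (an assumption, since the equation is not reproduced here), the helper long_imdct is hypothetical, window type 2 is taken to mean short windows, and the granule is assumed to be already re-ordered so that each window's 192 lines are contiguous.

```python
import math

def imdct(lines):
    """Direct O(M^2) inverse MDCT: M frequency lines -> 2*M time samples (unscaled)."""
    M = len(lines)
    N = 2 * M
    scale = 2.0 * math.pi / N
    return [
        sum(lines[k] * math.cos(scale * (n + 0.5 + N / 4.0) * (k + 0.5))
            for k in range(M))
        for n in range(N)
    ]

def long_imdct(granule, window_type):
    """Hypothetical helper: one whole-granule IMDCT, or three for short windows."""
    if window_type == 2:                       # short windows: three subsets
        third = len(granule) // 3
        return [imdct(granule[i * third:(i + 1) * third]) for i in range(3)]
    return [imdct(granule)]                    # normal (0), start (1) or stop (3)

granule = [1.0] + [0.0] * 575                  # arbitrary input: one non-zero line
long_out = long_imdct(granule, 0)
short_out = long_imdct(granule, 2)
print([len(s) for s in long_out], [len(s) for s in short_out])
```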
  • the output of the IMDCT unit 68 comprises a plurality (1152 in the present example) of recovered signal components, or samples, which may be used to construct a PCM output signal. In order to construct the PCM output signal, windowing and overlap-add operations are performed on the signal samples produced by the IMDCT unit 68.
  • the decoder 60 further includes a windowing and overlap-add unit 70, the operation of which is described in more detail below.
  • the synthesis filterbank 42 of a conventional MP3 decoder 30 negates alternate subband signal components, or subband channels, to compensate for the frequency inversion of the analysis filterbank 12 of the encoder 10.
  • the decoder 60 includes a negation unit 66 for negating, i.e. multiplying the relevant MDCT coefficients by -1, alternate subband signal components, or channels.
  • the negation unit 66 is shown in Figure 6 between the re-ordering unit 64 and the IMDCT unit 68 but may alternatively be located elsewhere, for example between the decoding and dequantization unit 62 and the re-ordering unit 64. It is also noted that the analysis filterbank 12 has overlapping subbands. The effects of this are normally reduced by the anti-aliasing butterflies 16 that are normally included in the encoder 10.
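  • One literal reading of the negation step is sketched below: every coefficient belonging to an odd-indexed subband is multiplied by -1. The layout is an assumption: for normal, start or stop windows subband sb is taken to occupy lines 18·sb to 18·sb+17, and for short windows in window-major order each subband contributes 6 consecutive lines per window. The function name is illustrative.

```python
def negate_alternate_subbands(lines, lines_per_subband=18):
    """Multiply by -1 every frequency line that falls in an odd-indexed subband."""
    return [
        -value if (k // lines_per_subband) % 2 == 1 else value
        for k, value in enumerate(lines)
    ]

granule = [float(k) for k in range(576)]
negated_long = negate_alternate_subbands(granule)          # 18 lines per subband
negated_short = negate_alternate_subbands(granule, 6)      # 6 lines per subband and window
print(negated_long[17], negated_long[18], negated_short[6])
```

  • With 32 subbands per window, the parity of k // 6 matches the parity of the subband index across all three windows, which is why the same function covers the short-window layout.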
  • conventional MP3 windowing is now described in more detail. Within MP3 four different window types (and accompanying lengths) are prescribed, namely 'normal', 'start', 'short' and 'stop'.
  • a particular type of window, or sequence of different window types, is selected to suit the characteristics of the portion of the data to which the window(s) are to be applied. For example, short type windows are usually applied to data portions corresponding to transients in the audio signal.
  • the side information associated with a given data frame indicates which window types are to be used with the granule.
  • the required window type affects both the length, or size, of the MDCT (and therefore inverse MDCT) and the windowing/overlap-add operations.
  • the window functions z(n) may be described as follows: For a normal type of window (type 0):
  • the 576 MDCT coefficients of each granule (32 subbands times 3 windows times 6 MDCT coefficients) are ordered to allow more efficient encoding.
  • corresponding re-ordering takes place to reverse the re-ordering performed by the encoder.
  • the MDCT coefficients, or frequency lines, of a granule are re-ordered, in increasing granularity, according to frequency line, then window index and then sub-band.
  • each frequency line, or MDCT coefficient may be accorded a respective frequency line index from 0 to 575.
  • the frequency lines are ordered in accordance with a subband index which denotes to which subband they belong and runs from 0 to 31.
  • the frequency lines are ordered in accordance with a window index which denotes which window is to be applied to the frequency lines and runs from 0 to 2.
  • the frequency lines are ordered in accordance with a frequency line sub-index which denotes the order in which the frequency lines are provided to the MDCT and runs from 0 to 5.
  • the re-ordering unit 64 is arranged to re-order the frequency lines of a granule in a manner different to that described above for a conventional decoder.
  • the re-ordering unit 64 re-orders the frequency lines, in increasing granularity, according to frequency line, then subband and finally window. This is illustrated in Figure 7 from which it may be seen that within a granule 50', the frequency lines are ordered at the highest level according to window index, then according to subband index and then according to frequency line sub-index.
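  • The difference between the two orderings is an index permutation. The sketch below assumes the input granule is in the conventional decoder's order (subband, then window, then frequency line sub-index) and produces the window-major order described for unit 64; the function name and the choice of input order are illustrative.

```python
SUBBANDS, WINDOWS, LINES = 32, 3, 6   # 32 x 3 x 6 = 576 lines per granule

def to_window_major(granule):
    """Permute [subband][window][line] order into [window][subband][line] order."""
    if len(granule) != SUBBANDS * WINDOWS * LINES:
        raise ValueError("expected a granule of 576 frequency lines")
    out = [None] * len(granule)
    for sb in range(SUBBANDS):
        for win in range(WINDOWS):
            for f in range(LINES):
                src = (sb * WINDOWS + win) * LINES + f    # conventional position
                dst = (win * SUBBANDS + sb) * LINES + f   # window-major position
                out[dst] = granule[src]
    return out

reordered = to_window_major(list(range(576)))
print(reordered[:8], reordered[192:194])
```

  • After this permutation each window's 192 lines are contiguous, so each of the three inverse MDCTs can read one slice of the granule.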
  • the construction of the PCM output signal by the windowing and overlap-add unit 70 in conjunction with the IMDCT unit 68 is now described. It is assumed in the following example that the original PCM signal comprises frames of 1152 audio samples, each frame being transformed into two granules of 576 frequency lines (or MDCT coefficients).
  • the IMDCT unit 68 operates on granules of 576 MDCT coefficients to produce a signal comprising 1152 samples which are then provided to the windowing and overlap-add unit 70.
  • the output signal v_0(n) is initialised to zero for all n.
  • the generation of the signal x_l(n) is dependent on the specified window type.
  • the window type for the l-th granule is 0, 1, or 3
  • the IMDCT unit 68 performs an inverse MDCT on 576 input coefficients provided by X_l(k) to produce a temporary signal x_l^tmp(n) of 1152 points as described in equation [9]:
  • the windowing and overlap-add unit 70 calculates the signal x_l(n) as:
  • the windowing and overlap-add unit 70 calculates the signal x_l(n) by first calculating three temporary signals:
  • the windowing and overlap-add unit 70 calculates the signal x_l(n) as:
  • divisor 1152 corresponds with the IMDCT length N and divisor 384 corresponds with N/3.
  • the respective window lengths of the window functions z(n) in equations [11], [12], [13] and [15] are longer in accordance with the respective transform length N and the respective divisors are correspondingly larger.
  • the window functions z(n) of equations [11], [12], [13] and [15] may be said to comprise up-sampled versions of the window functions z(n) described in equations [4], [5], [6] and [7] respectively, the extent of the up-sampling depending on the respective transform length/window length, N. It will also be noted that the window functions of equations [11], [12], [13] and [15] each comprises a single window function even though its application may involve the application of more than one window.
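  • The relationship between the conventional and the up-sampled windows can be illustrated by writing the four window shapes as functions of the transform length N. The shapes for N = 36 follow ISO/IEC 11172-3; treating the long windows as the same expressions with N = 1152 (so that the divisors become N and N/3) is an assumption consistent with the text, not a reproduction of equations [11] to [15]. The function name window is illustrative.

```python
import math

def window(window_type, N=36):
    """Window z(n): type 0 normal, 1 start, 2 short (length N // 3), 3 stop."""
    half, third, sixth = N // 2, N // 3, N // 6

    def long_sine(n):                 # sine with divisor N
        return math.sin(math.pi / N * (n + 0.5))

    def short_sine(n):                # sine with divisor N / 3
        return math.sin(math.pi / third * (n + 0.5))

    if window_type == 0:
        return [long_sine(n) for n in range(N)]
    if window_type == 2:
        return [short_sine(n) for n in range(third)]
    if window_type == 1:              # long rise, flat top, short fall, zeros
        return ([long_sine(n) for n in range(half)] + [1.0] * sixth
                + [short_sine(n - half) for n in range(half + sixth, half + third)]
                + [0.0] * sixth)
    if window_type == 3:              # zeros, short rise, flat top, long fall
        return ([0.0] * sixth
                + [short_sine(n - sixth) for n in range(sixth, third)]
                + [1.0] * sixth + [long_sine(n) for n in range(half, N)])
    raise ValueError("unknown window type")

conventional = window(1, 36)          # per-subband start window
upsampled = window(1, 1152)           # whole-granule start window
print(len(conventional), len(upsampled), len(window(2, 1152)))
```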
  • the windowing and overlap-add unit 70 makes only one application of the specified window type, i.e. applies only one window function, to the samples of a whole granule. This is in contrast to the conventional decoder 30 in which a window function is applied in respect of each subband.
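  • The overlap-add step itself is simple once each granule has produced one windowed block. The sketch below assumes blocks of 1152 samples advanced by 576 samples per granule (50% overlap); the short-window case, where three 384-sample signals are first combined into one block, is not covered. The function name is illustrative.

```python
def overlap_add(blocks, hop):
    """Sum equal-length windowed blocks, placing block l at offset l * hop."""
    if not blocks:
        return []
    length = hop * (len(blocks) - 1) + len(blocks[0])
    out = [0.0] * length                      # output starts at zero for all n
    for l, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[l * hop + n] += value
    return out

# Three granules' worth of dummy blocks: 1152 samples each, advanced by 576.
pcm = overlap_add([[1.0] * 1152 for _ in range(3)], hop=576)
print(len(pcm), pcm[0], pcm[576], pcm[1728])
```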
  • the PCM output signal produced by the windowing and overlap-add unit 70 is found to comprise a high quality audio signal although it is not fully conformant or bit-true with the MP3 standard.
  • some phase distortion and aliasing are present leading to relatively small spectral distortions and time-domain distortions in comparison with MP3 conformant signals.
  • these distortions or artefacts are found not to have a significant adverse effect on human perception of the audio signal.
  • Effectively, in decoder 60 the hybrid synthesis filterbank is replaced by a "long" phase-distorted inverse MDCT with some spectral aliasing.
  • the computational complexity of the decoder 60 is significantly reduced.
  • a typically optimized conventional MP3 decoder requires approximately 22.11 multiplications and 26.73 additions per output sample.
  • a correspondingly optimized decoder 60 requires just 8 multiplications and 20.5 additions per output sample.
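  • The size of the saving follows directly from the two operation counts quoted above. In this short calculation the dictionary names are illustrative, and adding multiplications and additions together assumes both cost the same, which depends on the target processor.

```python
# Operations per output sample, as quoted in the text.
conventional = {"multiplications": 22.11, "additions": 26.73}
proposed = {"multiplications": 8.0, "additions": 20.5}

# Fractional reduction for each kind of operation.
savings = {op: 1.0 - proposed[op] / conventional[op] for op in conventional}

# Combined reduction, assuming equal cost per operation (an assumption).
total_saving = 1.0 - sum(proposed.values()) / sum(conventional.values())

for op, saving in savings.items():
    print(f"{op}: {100 * saving:.1f}% fewer per output sample")
print(f"all operations: {100 * total_saving:.1f}% fewer per output sample")
```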
  • the decoder 60 offers a higher decoding efficiency, leading to lower power consumption or lower DSP requirements.
  • a further aspect of the invention provides an apparatus for encoding an audio signal to produce an MPEG-1 layer III type signal or bitstream. It is noted that the bitstream is not a standard MP3 bitstream, although it is conformant with MP3 - the resulting decoded signals vary from the MP3 standard in phase response and aliasing. In essence, a "long" phase distorted MDCT is used to replace the analysis hybrid filterbank 12, 14 of the conventional encoder 10.
  • Figure 8 shows a simplified block diagram of an encoder 80 embodying this aspect of the invention.
  • the encoder 80 includes a windowing unit 82 which performs windowing on the received PCM input samples.
  • the windowing functions are similar to those described in equations [4], [5], [6] and [7] although the window lengths are different in accordance with the required MDCT transform size.
  • an MDCT unit 84 performs a "long" MDCT on all 1152 input samples of a received frame to produce 576 frequency lines.
  • the MDCT unit 84 performs three "long" MDCTs on a respective one of 3 sets of 384 input samples to produce a respective set of 192 frequency lines.
  • the encoder 80 may include a conventional MP3 quantization and coding unit 86 and bit allocation unit 88.
  • a negation unit 85 may be provided between the MDCT unit 84 and the quantization and coding unit 86 for negating alternate, i.e. every second, subbands. It will be understood that the role of the negation unit 66 in decoder 60 is to compensate for the inherent negation of alternate subbands that occurs in conventional MP3 encoders. Correspondingly, the role of the negation unit 85 in encoder 80 is to create the negation of alternate subbands that would normally occur in a conventional encoder 10. However, the negation of alternate subbands is not essential and so, in alternative embodiments, the negation units 66, 85 may be omitted.
  • the decoder 60 is capable not only of decoding standard conformant MPEG-1 layer III data but also non-standard MPEG-1 layer III type data as produced by, for example, encoder 80.
  • the invention is not limited to MPEG-1 layer III data signals or to MDCTs.
  • a decoder embodying the first aspect of the invention may be arranged to operate on encoded data signals produced by an encoder (including non-MPEG-1 layer III encoders) which provides unencoded data signals (especially but not necessarily audio signals) to a subband filterbank and subsequently causes a respective forward frequency transform to be performed on each resulting subband signal, i.e. a hybrid filterbank.
  • the subsequent quantizing and encoding need not necessarily be in accordance with MP3 so long as corresponding dequantizing and decoding is performed at the decoder.
  • the forward frequency transform need not necessarily comprise the MDCT so long as a compatible inverse frequency transform is employed by the decoder.
  • the term "granule" is primarily an MP3 term but a skilled person will readily understand that, in the context of non-MP3 embodiments, the term "granule" as used herein may be interpreted as any equivalent grouping of frequency lines or coefficients (commonly the term "frame" is equivalent to "granule").
  • it is preferred that the subband filterbank and the frequency transform are critically sampled and, more preferably, real valued, and that the window functions overlap by 50% (so that the transform exhibits the Time Domain Aliasing Cancellation (TDAC) property). It is also preferred, but not essential, that aliasing reduction is performed, e.g. by anti-aliasing butterflies, on the transformed subband signals at the encoder.
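The transform sizes quoted in the points above (1152 samples to 576 frequency lines, or 384 samples to 192 frequency lines) can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the direct-form MDCT, the sine window and the choice of 18 contiguous lines per subband (576 lines / 32 subbands) are all assumptions made for illustration, as is the choice of which subbands get negated. The windows of equations [4] to [7] are not reproduced here.

```python
import math


def sine_window(length):
    """Sine window (assumed shape); satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct-form MDCT: 2N time samples in, N frequency lines out."""
    n_lines = len(block) // 2
    phase = 0.5 + n_lines / 2.0
    return [
        sum(x * math.cos(math.pi / n_lines * (n + phase) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_lines)
    ]


def window_and_transform(samples):
    """Window one set of samples, then transform it (1152 -> 576 or 384 -> 192)."""
    window = sine_window(len(samples))
    return mdct([s * w for s, w in zip(samples, window)])


def negate_alternate_subbands(lines, lines_per_subband=18):
    """Negate every second group of lines.

    Assumption: a subband is a contiguous group of `lines_per_subband` lines
    and the odd-numbered groups are the ones negated.
    """
    return [-v if (i // lines_per_subband) % 2 else v
            for i, v in enumerate(lines)]


if __name__ == "__main__":
    # Hypothetical test tone; any 1152-sample frame would do.
    frame = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(1152)]
    long_lines = window_and_transform(frame)         # one set of 576 lines
    short_lines = window_and_transform(frame[:384])  # one set of 192 lines
    print(len(long_lines), len(short_lines))
    print(len(negate_alternate_subbands(long_lines)))
```

The sketch transforms a single 384-sample set only, because the points above do not state how the three sets are positioned within a frame. For 192-line sets, a `lines_per_subband` of 6 would be the analogous assumption.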

Abstract

According to one aspect, the invention relates to a decoder for MPEG-1 layer III data signals. In the preferred embodiment, the decoder performs a single inverse MDCT on all 576 frequency lines of a respective granule for MP3 window functions of type 0, 1 and 3, and performs three inverse MDCTs on three sets of 192 frequency lines for window functions of type 2. It is found that using long inverse MDCTs provides a sufficient approximation of a hybrid filterbank comprising a plurality of short inverse MDCTs and a synthesis filterbank. Accordingly, an output signal can be constructed without the need for a filterbank. Another aspect of the invention relates to an encoder for producing MPEG-1 layer III type data signals in which long MDCTs are used to replace the hybrid filterbank. Accordingly, MPEG-1 layer III type data signals can be produced without the need for a filterbank.
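The decoder behaviour summarised in the abstract (one inverse MDCT over 576 lines for window types 0, 1 and 3, three inverse MDCTs over 192 lines each for type 2) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the decoder of the invention. The function names, the direct-form transforms, the sine synthesis window used for every window type and the 2/N scaling are assumptions. The caller supplies the three 192-line sets already separated, because the abstract does not say how they are laid out within a granule.

```python
import math


def sine_window(length):
    """Sine window (assumed shape for every window type in this sketch)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct-form MDCT, included only to demonstrate the round trip."""
    n_lines = len(block) // 2
    phase = 0.5 + n_lines / 2.0
    return [
        sum(x * math.cos(math.pi / n_lines * (n + phase) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_lines)
    ]


def imdct(lines):
    """Direct-form inverse MDCT: N lines in, 2N time-aliased samples out."""
    n_lines = len(lines)
    phase = 0.5 + n_lines / 2.0
    scale = 2.0 / n_lines  # suits a window applied at analysis and synthesis
    return [
        scale * sum(c * math.cos(math.pi / n_lines * (n + phase) * (k + 0.5))
                    for k, c in enumerate(lines))
        for n in range(2 * n_lines)
    ]


def decode_granule(line_sets, window_type):
    """Inverse-transform one granule into windowed time-domain blocks.

    line_sets: [576 lines] for types 0, 1, 3; three lists of 192 lines for type 2.
    """
    count, size = (3, 192) if window_type == 2 else (1, 576)
    if len(line_sets) != count or any(len(s) != size for s in line_sets):
        raise ValueError("unexpected number or size of frequency-line sets")
    blocks = []
    for lines in line_sets:
        block = imdct(lines)
        window = sine_window(len(block))
        blocks.append([b * w for b, w in zip(block, window)])
    return blocks


def overlap_add(blocks, hop):
    """Sum blocks that overlap by 50% (block length is 2 * hop)."""
    out = [0.0] * (hop * (len(blocks) + 1))
    for i, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[i * hop + n] += value
    return out


if __name__ == "__main__":
    # Small round trip: the interior samples should be recovered by TDAC.
    hop = 8
    signal = [math.sin(0.3 * n) for n in range(4 * hop)]
    win = sine_window(2 * hop)
    blocks = []
    for start in range(0, 3 * hop, hop):
        lines = mdct([s * w for s, w in zip(signal[start:start + 2 * hop], win)])
        blocks.append([b * w for b, w in zip(imdct(lines), win)])
    rebuilt = overlap_add(blocks, hop)
    print(max(abs(rebuilt[n] - signal[n]) for n in range(hop, 3 * hop)))
```

Each inverse transform yields twice as many time samples as frequency lines (1152 from 576, 384 from 192). Only the samples covered by two overlapping blocks are reconstructed, which is why the round trip compares the interior of the signal only.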
PCT/IB2004/052602 2003-12-04 2004-11-30 Audio signal coding WO2005055203A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP04799284A EP1692686A1 (fr) 2003-12-04 2004-11-30 Audio signal coding
JP2006542091A JP2007515672A (ja) 2003-12-04 2004-11-30 Audio signal coding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03104535 2003-12-04
EP03104535.4 2003-12-04

Publications (1)

Publication Number Publication Date
WO2005055203A1 (fr) 2005-06-16

Family

ID=34639327

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/052602 WO2005055203A1 (fr) 2003-12-04 2004-11-30 Audio signal coding

Country Status (5)

Country Link
EP (1) EP1692686A1 (fr)
JP (1) JP2007515672A (fr)
KR (1) KR20060131767A (fr)
CN (1) CN1890712A (fr)
WO (1) WO2005055203A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9552818B2 (en) 2012-06-14 2017-01-24 Dolby International Ab Smooth configuration switching for multichannel audio rendering based on a variable number of received channels
US10388287B2 (en) 2015-03-09 2019-08-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
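CN102243872A (zh) * 2010-05-10 2011-11-16 炬力集成电路设计有限公司 Method and *** for encoding and decoding digital audio signals
JP7385531B2 (ja) * 2020-06-17 2023-11-22 Toa株式会社 Acoustic communication system, acoustic transmission device, acoustic reception device, program, and acoustic signal transmission method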

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002101726A1 (fr) * 2001-06-08 2002-12-19 Stmicroelectronics Asia Pacific Pte Ltd Unified filter bank for audio coding

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"MPEG DIGITAL AUDIO CODING SETTING THE STANDARD FOR HIGH-QUALITY AUDIO COMPRESSION", IEEE SIGNAL PROCESSING MAGAZINE, IEEE INC. NEW YORK, US, September 1997 (1997-09-01), pages 59 - 81, XP000859051, ISSN: 1053-5888 *
BOSI M ET AL: "ISO/IEC MPEG-2 ADVANCED AUDIO CODING", JOURNAL OF THE AUDIO ENGINEERING SOCIETY, AUDIO ENGINEERING SOCIETY. NEW YORK, US, vol. 45, no. 10, October 1997 (1997-10-01), pages 789 - 812, XP000730161, ISSN: 0004-7554 *
BRANDENBURG K: "Low bitrate audio coding - state-of-the-art, challenges and future directions", COMMUNICATION TECHNOLOGY PROCEEDINGS, 2000. WCC - ICCT 2000. INTERNATIONAL CONFERENCE ON BEIJING, CHINA 21-25 AUG. 2000, PISCATAWAY, NJ, USA,IEEE, US, vol. 1, 21 August 2000 (2000-08-21), pages 594 - 597, XP010526818, ISBN: 0-7803-6394-9 *
INTERNATIONAL STANDARDS ORGANIZATION: "Final text for DIS 11172-3 (rev. 2) : Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media - Part 1 - Coding at up to about 1.5 Mbit/s (ISO/IEC JTC 1/SC 29/WG 11 N 0156) [MPEG 92] - Section 3: Audio", CODED REPRESENTATION OF AUDIO, PICTURE MULTIMEDIA AND HYPERMEDIA INFORMATION (TENTATIVE TITLE). APRIL 20, 1992. ISO/IEC JTC 1/SC 29 N 147. FINAL TEXT FOR DIS 11172-1 (REV. 2) : INFORMATION TECHNOLOGY - CODING OF MOVING PICTURES AND ASSOCIATED AUDIO F, 1992, pages III - V,174, XP002083108 *
PAN D: "A TUTORIAL ON MPEG/AUDIO COMPRESSION", IEEE MULTIMEDIA, IEEE COMPUTER SOCIETY, US, vol. 2, no. 2, 1995, pages 60 - 74, XP000525989, ISSN: 1070-986X *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9552818B2 (en) 2012-06-14 2017-01-24 Dolby International Ab Smooth configuration switching for multichannel audio rendering based on a variable number of received channels
US9601122B2 (en) 2012-06-14 2017-03-21 Dolby International Ab Smooth configuration switching for multichannel audio
US10388287B2 (en) 2015-03-09 2019-08-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
US10395661B2 (en) 2015-03-09 2019-08-27 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
US10777208B2 (en) 2015-03-09 2020-09-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
US11107483B2 (en) 2015-03-09 2021-08-31 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
US11238874B2 (en) 2015-03-09 2022-02-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
US11741973B2 (en) 2015-03-09 2023-08-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
US11881225B2 (en) 2015-03-09 2024-01-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal

Also Published As

Publication number Publication date
JP2007515672A (ja) 2007-06-14
CN1890712A (zh) 2007-01-03
KR20060131767A (ko) 2006-12-20
EP1692686A1 (fr) 2006-08-23

Similar Documents

Publication Publication Date Title
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
KR100892152B1 (ko) Apparatus and method for encoding a time-discrete audio signal and apparatus and method for decoding encoded audio data
US7343287B2 (en) Method and apparatus for scalable encoding and method and apparatus for scalable decoding
EP2264699B1 (fr) Device and method for post-processing spectral values, and encoder and decoder for audio signals
US8195730B2 (en) Apparatus and method for conversion into a transformed representation or for inverse conversion of the transformed representation
US7917564B2 (en) Device and method for processing a signal having a sequence of discrete values
EP1711938A1 (fr) Audio signal decoding using complex-valued data
WO2007052088A1 (fr) Audio compression
JP2010538314A (ja) Low-complexity spectral analysis/synthesis using switchable time resolution
JP3814611B2 (ja) Method and apparatus for processing time-discrete audio sample values
Geiger et al. IntMDCT-A link between perceptual and lossless audio coding
EP1692686A1 (fr) Audio signal coding
Lam et al. Digital filtering for audio coding
Chen et al. Fast time-frequency transform algorithms and their applications to real-time software implementation of AC-3 audio codec
Herre Audio Coding Based on Integer Transforms

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200480035931.X

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 2004799284

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 1020067010745

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2006542091

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2440/CHENP/2006

Country of ref document: IN

WWP Wipo information: published in national office

Ref document number: 2004799284

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020067010745

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 2004799284

Country of ref document: EP