WO2005096508A1 - Enhanced audio encoding and decoding apparatus and associated method - Google Patents

Enhanced audio encoding and decoding apparatus and associated method

Info

Publication number
WO2005096508A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
frequency
inverse
signal
spectrum
Prior art date
Application number
PCT/CN2004/001034
Other languages
English (en)
Chinese (zh)
Inventor
Xingde Pan
Dietz Martin
Andreas Ehret
Holger HÖRICH
Xiaoming Zhu
Michael Schug
Weimin Ren
Lei Wang
Hao Deng
Fredrik Henn
Original Assignee
Beijing Media Works Co., Ltd
Beijing E-World Technology Co., Ltd.
Coding Technologies Ab
Priority date
Filing date
Publication date
Application filed by Beijing Media Works Co., Ltd, Beijing E-World Technology Co., Ltd., Coding Technologies Ab filed Critical Beijing Media Works Co., Ltd
Publication of WO2005096508A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 - Speech enhancement using band spreading techniques
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 - Speech or audio signals analysis-synthesis techniques using subband decomposition
    • G10L19/0208 - Subband vocoders

Definitions

  • the present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec apparatus and method based on a perceptual model.
  • the digital audio signal is audio encoded or audio compressed for storage and transmission.
  • the purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, that is, with little audible difference between the originally input audio signal and the decoded output audio signal.
  • CDs demonstrated the many advantages of digital representation of audio signals, such as high fidelity, large dynamic range, and robustness.
  • these advantages are at the expense of a very high data rate.
  • a CD-quality stereo signal requires a sampling rate of 44.1 kHz, with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s.
  • Such a high data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost.
  • new network and wireless multimedia digital audio systems are required to reduce the rate of data without compromising the quality of the audio.
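The 1.41 Mb/s figure quoted above follows directly from the CD parameters; a quick arithmetic check (44.1 kHz, 16-bit, stereo):

```python
fs = 44100       # CD sampling rate, samples per second
bits = 16        # bits per sample
channels = 2     # stereo

# uncompressed data rate in Mb/s
rate_mbps = fs * bits * channels / 1e6  # 1.4112
```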
  • MPEG-1 and MPEG-2 BC are high-quality audio coding techniques mainly used for mono and stereo audio signals; multi-channel audio coding that achieves higher quality at lower bit rates was still needed.
  • MPEG-2 BC encoding emphasizes backward compatibility with MPEG-1, and as a result cannot achieve high-quality five-channel encoding at code rates below 540 kbps.
  • MPEG-2 AAC technology was therefore proposed, which can achieve higher quality coding of five-channel signals at a rate of 320 kbps.
  • Figure 1 shows a block diagram of an MPEG-2 AAC encoder, comprising a gain controller 101, a filter bank 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo (M/S) module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108.
  • the filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal adaptive: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals. For a 48 kHz sampled signal this gives a maximum frequency resolution of 23 Hz and a maximum time resolution of 2.6 ms.
  • either a sine window or a Kaiser-Bessel window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window when strong components of the input signal are spaced more than 220 Hz apart.
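The resolution figures above can be verified from the transform sizes (a sketch; the hop of half the window length is the standard 50% overlap of an MDCT, assumed here):

```python
fs = 48000.0                          # sampling rate in Hz

# 2048-point MDCT: spectral lines are spaced fs / 2048 apart
freq_resolution_hz = fs / 2048        # ~23.4 Hz, the "23 Hz" in the text

# 256-point MDCT: a new short block every 128 samples (50% overlap)
time_resolution_ms = 128 / fs * 1000  # ~2.67 ms, the "2.6 ms" in the text
```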
  • after the audio signal passes through the gain controller 101, it enters the filter bank 102, which filters according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time domain noise shaping module 103.
  • the time domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain and then uses the result to control the shape of the quantization noise in the time domain, thereby controlling pre-echo.
  • the intensity/coupling module 104 performs stereo encoding of signal intensity. For high frequency bands (above about 2 kHz), the sense of direction in hearing is related to changes in signal intensity (the signal envelope) rather than to the signal waveform; a constant-envelope signal has no influence on the sense of direction. This property, together with the correlation between channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.
  • the second-order backward adaptive predictor 105 is used to eliminate the redundancy of the steady-state signal and improve the coding efficiency.
  • the sum/difference stereo (M/S) module 106 operates on a channel pair, such as the left and right channels of a two-channel signal or the left and right surround channels of a multi-channel signal.
  • the M/S module 106 exploits the correlation between the two channels of the pair to reduce the code rate and improve coding efficiency.
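The sum/difference principle can be sketched as follows (a minimal illustration; the exact scaling used by a real codec may differ):

```python
def ms_encode(left, right):
    """Convert a left/right channel pair to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right from mid/side (exact inverse of ms_encode)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When the two channels are highly correlated, the side signal is near zero and very cheap to code, which is the source of the bit-rate saving.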
  • the bit allocation and quantization coding module 107 is implemented as a nested loop in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation.
  • the nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop uses the ratio of quantization noise to masking threshold to estimate the coding quality of the signal.
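As an illustration of the inner loop only, a toy sketch (the bit-cost model and the multiplicative step update are simplifying assumptions; real AAC uses a power-law quantizer with Huffman codebooks):

```python
import math

def quantize(coeffs, step):
    """Uniform quantization, standing in for AAC's non-uniform quantizer."""
    return [round(c / step) for c in coeffs]

def bits_needed(qvals):
    """Crude stand-in for the entropy-coding cost of the quantized values."""
    return sum(int(math.log2(abs(q) + 1)) + 1 for q in qvals)

def inner_loop(coeffs, bit_budget, step=0.5):
    """Inner loop: grow the quantizer step until the frame fits the budget."""
    while bits_needed(quantize(coeffs, step)) > bit_budget:
        step *= 1.25
    return step
```

The outer loop would then compare the per-band quantization noise against the masking threshold and, where the threshold is exceeded, amplify those bands and rerun the inner loop.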
  • the last encoded signal is passed through a bitstream multiplexing module 108 to form an encoded audio stream output.
  • the input signal is simultaneously split into four equal bands by a four-band polyphase quadrature filter bank (PQF), and each band uses an MDCT to generate 256 spectral coefficients, for a total of 1024.
  • a gain controller 101 is used in each frequency band.
  • the high frequency PQF bands can be discarded to obtain a low sampling rate signal.
  • FIG. 2 shows a block diagram of the corresponding MPEG-2 AAC decoder.
  • the decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time domain noise shaping module 208, a filter bank 209, and a gain control module 210.
  • the encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain a corresponding data stream and control stream.
  • after the above signal is decoded by the lossless decoding module 202, an integer representation of the scale factors and the quantized values of the signal spectrum are obtained.
  • the inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function, which converts the integer quantized values into a reconstructed spectrum. Since the scale factor module in the encoder differences each scale factor against the previous one and Huffman-codes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the corresponding difference values and then recovers the true scale factors. The M/S module 205 converts the sum and difference channels back to left and right channels under the control of the side information.
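The differential scale-factor recovery described above amounts to a running sum; a minimal sketch (whether the first decoded difference is applied to the common scale factor is an assumption here):

```python
def recover_scale_factors(common_sf, diffs):
    """Undo differential coding: accumulate each Huffman-decoded
    difference onto the running scale factor, starting from the
    common (first) scale factor of the frame."""
    sfs = []
    current = common_sf
    for d in diffs:
        current += d
        sfs.append(current)
    return sfs
```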
  • the intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the time domain noise shaping module 208 for time domain noise shaping decoding; synthesis filtering is finally performed by the filter bank 209, which adopts the inverse modified discrete cosine transform (IMDCT).
  • the high frequency PQF bands can be discarded by the gain control module 210 to obtain a low sampling rate signal.
  • MPEG-2 AAC codec technology is suitable for medium and high bit rate audio signals, but its coding quality for low bit rate signals is poor; at the same time, the technology involves many codec modules, and its high complexity is not conducive to real-time implementation.
  • FIG. 3 is a schematic structural diagram of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter bank 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bitstream multiplexing module 307.
  • the transient signal detection module 301 judges whether the audio signal is steady-state or transient, and the signal-adaptive MDCT filter bank 302 maps the time domain data to the frequency domain, applying a long 512-point window to steady-state signals and a pair of short windows to transient signals.
  • the spectral envelope/exponent encoding module 303 encodes the exponent portion of the signal in one of three modes according to the code rate and frequency resolution requirements, namely the D15, D25, and D45 coding modes.
  • the AC-3 technique differentially encodes the spectral envelope along frequency: at most ±2 increments are needed, each increment representing a 6 dB level change; the first (DC) term is encoded as an absolute value and the remaining exponents are differentially encoded.
  • in D15 spectral envelope coding each exponent needs about 2.33 bits, since three differentials are grouped and coded in one 7-bit word; the D15 mode thus provides fine frequency resolution at the expense of time resolution.
  • D15 envelopes are transmitted only occasionally, usually once every 6 audio blocks (one data frame).
  • the D25 coding mode provides a compromise between frequency resolution and time resolution, coding one differential for every two frequency coefficients, so that each exponent requires approximately 1.15 bits.
  • the D45 coding mode codes one differential for every four frequency coefficients, so that each exponent requires approximately 0.58 bits.
  • the D45 coding mode thus provides high time resolution and low frequency resolution, and is generally used for encoding transient signals.
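The per-exponent costs quoted for the three strategies follow from packing three differentials into one 7-bit group; a sketch of the arithmetic (the group sizes are inferred from the 2.33/1.15/0.58 figures in the text):

```python
def bits_per_exponent(mode):
    """Approximate cost per exponent for the AC-3 exponent strategies.

    A 7-bit group always carries three differentials; in D15 each
    differential covers 1 exponent, in D25 it covers 2, and in D45 it
    covers 4, so one group covers 3, 6, or 12 exponents respectively.
    """
    exponents_per_group = {"D15": 3, "D25": 6, "D45": 12}
    return 7 / exponents_per_group[mode]
```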
  • the forward-backward adaptive perceptual model 305 is used to estimate the masking threshold for each frame of the signal.
  • the forward adaptive part is only applied to the encoder end.
  • an optimal set of perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward adaptive part to estimate the masking threshold of each frame.
  • the backward adaptive part is applied to both the encoder side and the decoder side.
  • the parameter bit allocation module 306 analyzes the spectral envelope of the audio signal based on the masking criteria to determine the number of bits allocated to each mantissa.
  • the module 306 utilizes a bit pool for global bit allocation for all channels.
  • bits are cyclically extracted from the bit pool and allocated to all channels, and the quantization of the mantissa is adjusted according to the number of bits that can be obtained.
  • the AC-3 encoder also uses high frequency coupling technology: the high frequency part of the coupled signal is divided into 18 sub-bands according to the critical bands of the human ear, and some channels are then coupled starting from a certain sub-band.
  • an AC-3 audio stream output is formed by the bit stream multiplexing module 307.
  • Figure 4 shows the flow diagram for decoding with Dolby AC-3.
  • the bit stream produced by the AC-3 encoder is input, and data frame synchronization and error detection are performed on it; if a data error is detected, error concealment or muting is performed.
  • the bit stream is then unpacked to obtain the main information and the side information, and then exponentially decoded.
  • two kinds of side information are needed: one is the number of packed exponents; the other is the exponent strategy used, such as the D15, D25, or D45 mode.
  • the decoded exponents and the bit allocation side information are then used to perform bit allocation, indicating the number of bits used for each packed mantissa and producing a set of bit allocation pointers, one per coded mantissa.
  • the bit allocation pointer indicates the quantizer used for the mantissa and the number of bits occupied by each mantissa in the code stream.
  • each coded mantissa value is dequantized and converted to a dequantized value; a mantissa occupying zero bits is restored to zero or, under the control of the dither flag, replaced by a random dither value.
  • a decoupling operation is then performed: decoupling recovers the high frequency portion of each coupled channel, including exponents and mantissas, from the common coupling channel and the coupling factors.
  • if matrix processing was applied to a certain sub-band, the decoder needs to convert the sum and difference channel values of that sub-band back into left and right channel values through matrix recovery.
  • the dynamic range control value of each audio block is included in the code stream; dynamic range compression is applied with this value to change the amplitude of the coefficients, including exponents and mantissas.
  • the frequency domain coefficients are inversely transformed into time domain samples, which are then windowed, and adjacent blocks are overlapped and added to reconstruct the PCM audio signal.
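The window-and-overlap-add reconstruction in the last step can be sketched as follows (windowing omitted for brevity; the blocks are assumed already windowed):

```python
def overlap_add(blocks, hop):
    """Overlap-add equally sized time-domain blocks with the given hop
    (hop == len(block) // 2 gives the 50% overlap of an MDCT decoder)."""
    n = hop * (len(blocks) - 1) + len(blocks[0])
    out = [0.0] * n
    for i, block in enumerate(blocks):
        for j, sample in enumerate(block):
            out[i * hop + j] += sample
    return out
```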
  • if the number of channels output by the decoder is smaller than the number of channels in the encoded bit stream, the audio signal must also be downmixed before the final PCM stream is output.
  • Dolby AC-3 encoding technology is aimed mainly at high bit rate multi-channel surround sound signals; when the 5.1-channel encoding bit rate is at or below 384 kbps its coding quality is poor, and its coding efficiency for mono and two-channel stereo signals is also low.
  • in summary, the existing codec technologies cannot comprehensively address codec quality across the range from very low and low bit rates to high bit rate audio signals, or for mono and two-channel signals, and their implementations are complicated.
  • the technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio codec to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.
  • the enhanced audio coding apparatus of the present invention includes a frequency band extension module, a resampling module, a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, and a bit stream multiplexing module;
  • the frequency band extension module analyzes the original input audio signal over the entire frequency band, extracts the spectral envelope of the high frequency portion and its characteristics related to the low frequency portion, and outputs them to the bit stream multiplexing module;
  • the resampling module is used to resample the input audio signal, changing its sampling rate, and outputs the resampled audio signal to the psychoacoustic analysis module and the time-frequency mapping module;
  • the psychoacoustic analysis module is configured to calculate the masking threshold and signal-to-mask ratio of the input audio signal, which are output to the quantization and entropy coding module;
  • the time-frequency mapping module is configured to convert the time domain audio signal into frequency domain coefficients; and the quantization and entropy coding module is configured to quantize and entropy encode the frequency domain coefficients under the control of the signal-to-mask ratio and output the result to the bit stream multiplexing module;
  • the enhanced audio decoding device of the present invention comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group, a frequency-time mapping module, and a band extension module; the bit stream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data signals and control signals to the entropy decoding module and the band extension module;
  • the entropy decoding module is configured to decode the above signals and recover the quantized values of the spectrum, which are output to the inverse quantizer group;
  • the inverse quantizer group is configured to reconstruct the inverse-quantized spectrum and output it to the frequency-time mapping module;
  • the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients, obtaining a time domain audio signal of the low frequency band;
  • the frequency band extension module is configured to receive the band extension control information output by the bit stream demultiplexing module and the low frequency band time domain audio signal output by the frequency-time mapping module, and to reconstruct the high frequency components of the signal;
  • the invention is applicable to high-fidelity compression coding of audio signals at various sampling rates and channel configurations: it can support audio signals with sampling rates between 8 kHz and 192 kHz, all practical channel configurations, and a wide range of bit rates.
  • FIG. 1 is a block diagram of an MPEG-2 AAC encoder
  • FIG. 2 is a block diagram of an MPEG-2 AAC decoder
  • Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology
  • Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology
  • Figure 5 is a schematic structural view of an audio encoding device of the present invention.
  • FIG. 6 is a schematic structural diagram of an audio decoding device of the present invention.
  • FIG. 7 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention
  • FIG. 8 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention.
  • FIG. 9 is a schematic structural diagram of Embodiment 2 of an encoding apparatus according to the present invention.
  • Figure 10 is a schematic structural diagram of Embodiment 2 of the decoding device of the present invention.
  • Figure 11 is a schematic structural view of Embodiment 3 of the invention encoding device
  • Figure 12 is a schematic structural diagram of Embodiment 3 of the decoding device of the present invention.
  • Figure 13 is a schematic structural view of Embodiment 4 of the invention encoding device
  • FIG. 14 is a schematic diagram of a filtering structure using a Haar wavelet-based wavelet transform
  • Figure 15 is a schematic diagram of the time-frequency division obtained using a Haar wavelet-based wavelet transform
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of the invention decoding apparatus
  • Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention.
  • FIG. 18 is a schematic structural diagram of Embodiment 5 of the decoding device of the present invention.
  • Figure 19 is a schematic structural view of Embodiment 6 of the invention encoding device
  • FIG. 20 is a schematic structural diagram of Embodiment 6 of the decoding device of the present invention.
  • Figure 21 is a schematic structural view of Embodiment 7 of the invention encoding device
  • Figure 22 is a schematic structural diagram of Embodiment 7 of the decoding device of the present invention.
  • Figure 23 is a schematic structural view of Embodiment 8 of the inventive encoding device.
  • Figure 24 is a schematic structural view of Embodiment 9 of the coding apparatus of the present invention.
  • Figure 25 is a schematic structural view of Embodiment 10 of the encoding apparatus of the present invention.
  • Figure 26 is a schematic structural view of Embodiment 11 of the encoding apparatus of the present invention.
  • Figure 27 is a schematic structural view of Embodiment 12 of the encoding apparatus of the present invention.
  • Figure 28 is a schematic structural view of Embodiment 13 of the encoding apparatus of the present invention.
  • Figure 29 is a schematic structural view of Embodiment 14 of the encoding apparatus of the present invention.
  • Figure 30 is a schematic structural diagram of Embodiment 8 of the decoding apparatus of the present invention.
  • Figure 31 is a schematic structural view of Embodiment 15 of the inventive encoding device.
  • Figure 32 is a schematic structural view of Embodiment 16 of the encoding apparatus of the present invention.
  • Figure 33 is a schematic structural view of Embodiment 17 of the inventive encoding device.
  • Figure 34 is a schematic structural view of Embodiment 18 of the encoding apparatus of the present invention.
  • Figure 35 is a schematic structural view of Embodiment 19 of the encoding apparatus of the present invention.
  • Figure 36 is a schematic structural diagram of Embodiment 9 of the decoding apparatus of the present invention.
  • Figure 37 is a schematic structural diagram of Embodiment 10 of the decoding apparatus of the present invention.
  • Figure 38 is a schematic structural diagram of Embodiment 11 of the decoding device of the present invention.
  • FIG. 39 is a schematic structural diagram of Embodiment 12 of a decoding apparatus according to the present invention.
  • Figure 40 is a block diagram showing the structure of a thirteenth embodiment of the decoding apparatus of the present invention.
  • FIG. 1 to FIG. 4 are schematic structural diagrams of several encoders of the prior art, which have been introduced in the background art, and are not described herein again.
  • the audio encoding apparatus includes a resampling module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a quantization and entropy encoding module 53, a band extension module 54, and a bit stream multiplexing module 55;
  • the resampling module 50 is configured to resample the input audio signal;
  • the psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the resampled audio signal and analyzes the signal type;
  • the time-frequency mapping module 52 is configured to convert the time domain audio signal into frequency domain coefficients; the quantization and entropy coding module 53 quantizes and entropy encodes the frequency domain coefficients under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, and outputs the result to the bit stream multiplexing module 55.
  • the band extension module 54 is configured to analyze the input audio signal over the entire frequency band, extract the spectral envelope of the high frequency portion and its characteristics related to the low frequency portion, and output them to the bit stream multiplexing module 55; the bit stream multiplexing module 55 multiplexes the data output by the resampling module 50, the quantization and entropy coding module 53, and the band extension module 54 to form an audio coded stream.
  • the digital audio signal is resampled in the resampling module 50, changing its sampling rate, and the resampled signal is input to the psychoacoustic analysis module 51 and the time-frequency mapping module 52. The psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of each frame of the audio signal and passes them to the quantization and entropy coding module 53 as control signals; meanwhile, the time domain audio signal is converted into frequency domain coefficients by the time-frequency mapping module 52.
  • the above-described frequency domain coefficients are quantized and entropy encoded in the quantization and entropy coding module 53 under the control of the mask ratio output by the psychoacoustic analysis module 51.
  • the original digital audio signal is analyzed by the band extension module 54, which obtains the spectral envelope and spectral characteristic parameters of the high frequency portion and outputs them to the bit stream multiplexing module 55.
  • the encoded data and control signals are multiplexed in the bit stream multiplexing module 55 to form an enhanced audio coded stream.
  • the resampling module 50 is used to resample the input audio signal, and the resampling includes upsampling and downsampling.
  • downsampling is taken as an example below to illustrate resampling.
  • the resampling module 50 includes a low pass filter and a downsampler, where the low pass filter is used to band-limit the audio signal, eliminating the aliasing that downsampling might otherwise cause.
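A minimal sketch of the low-pass-then-decimate structure (the moving-average filter is a deliberately crude stand-in for a properly designed anti-aliasing FIR filter):

```python
def moving_average(x, taps=3):
    """Toy low-pass filter: centered moving average over `taps` samples."""
    half = taps // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def downsample(x, factor):
    """Band-limit the signal, then keep every `factor`-th sample."""
    return moving_average(x)[::factor]
```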
  • the psychoacoustic analysis module 51 is mainly used for calculating the masking value, the mask ratio and the perceptual entropy of the input audio signal, and analyzing the signal type.
  • the perceptual entropy calculated by the psychoacoustic analysis module 51 gives a dynamic estimate of the number of bits required to transparently encode the current frame, which is used to adjust the bit allocation between frames.
  • the psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to control the quantization and entropy coding module 53.
  • the time-frequency mapping module 52 is configured to implement the transformation of the audio signal from the time domain signal to the frequency domain coefficient, and is composed of a filter bank, and specifically may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, Modified discrete cosine transform (MDCT) filter bank, cosine modulated filter bank, wavelet transform filter bank, etc.
  • DFT discrete Fourier transform
  • DCT discrete cosine transform
  • MDCT Modified discrete cosine transform
  • the frequency domain coefficients obtained by the time-frequency mapping are output to the quantization and entropy coding module 53 for quantization and coding processing.
  • the quantization and entropy encoding module 53 further includes a non-linear quantizer group and an encoder, where the quantizer can be a scalar quantizer or a vector quantizer.
  • vector quantizers fall into two categories: memoryless and memory-based. In a memoryless vector quantizer each input vector is quantized independently of the previous vectors; a memory-based vector quantizer considers the previous vectors when quantizing the current one, i.e., it exploits the correlation between vectors.
  • the main memoryless vector quantizers include the full search, tree search, multistage, gain/shape, and split mean vector quantizers; the main memory-based vector quantizers include the predictive and finite state vector quantizers. If a scalar quantizer is employed, the non-linear quantizer group further comprises M sub-band quantizers. Each sub-band quantizer mainly uses a scale factor for quantization: non-linear companding is applied to all frequency domain coefficients in the M scale factor bands, the scale factor is then used to quantize the frequency domain coefficients of the sub-band, and the resulting integer quantized spectrum is output to the encoder; the first scale factor of each frame is output to the bit stream multiplexing module 55 as the common scale factor, and the other scale factors are differenced against their predecessors and output to the encoder.
  • the scale factor in the above steps is a constantly changing value, which is adjusted according to the bit allocation strategy.
  • the present invention provides a bit allocation strategy with minimal global perceptual distortion, as follows:
  • first, each sub-band quantizer is initialized, and the quantized values of the spectral coefficients in all sub-bands are set to zero.
  • the quantization noise of each sub-band is then equal to the energy of that sub-band.
  • the noise-to-mask ratio NMR of each sub-band is equal to its signal-to-mask ratio SMR, the number of bits consumed is 0, and the number of remaining bits equals the target number of bits.
  • next, the sub-band with the largest noise-to-mask ratio NMR is found, its scale factor is reduced by one unit, and the additional number of bits ΔB needed by that sub-band is calculated. If the number of remaining bits is not less than ΔB, the modification of the scale factor is confirmed, ΔB is subtracted from the remaining bits, the NMR of the sub-band is recalculated, and the search for the sub-band with the largest NMR continues, repeating these steps. If the number of remaining bits is less than ΔB, the modification is cancelled, the previous scale factor and remaining bit count are retained, the allocation result is finally output, and the bit allocation process ends.
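The loop above can be sketched as a greedy allocation (the fixed noise drop per scale-factor step and the per-step bit costs are illustrative assumptions, not values from the patent):

```python
def allocate_bits(smr, step_cost, budget, noise_drop_db=6.0):
    """Greedy bit allocation: repeatedly improve the sub-band with the
    largest noise-to-mask ratio (NMR) until no improvement is affordable.

    smr        -- per-sub-band signal-to-mask ratio in dB (initial NMR)
    step_cost  -- bits needed to lower a sub-band's noise by one step
    budget     -- target number of bits
    """
    nmr = list(smr)              # initially NMR equals SMR
    steps = [0] * len(smr)
    remaining = budget
    while True:
        i = max(range(len(nmr)), key=lambda k: nmr[k])
        if nmr[i] <= 0 or remaining < step_cost[i]:
            break                # fully masked, or modification unaffordable
        remaining -= step_cost[i]
        nmr[i] -= noise_drop_db  # one scale-factor step lowers the noise
        steps[i] += 1
    return steps, remaining
```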
• If a vector quantizer is employed, the frequency domain coefficients are grouped into a number of multi-dimensional vectors and input into the nonlinear quantizer group.
• Spectral flattening is performed on each vector according to the flattening factor, that is, the dynamic range of the spectrum is reduced, before the vector quantizer is applied.
• Using a subjective perceptual distance criterion, the codeword with the smallest distance to the vector being quantized is found in the codebook, and the corresponding codeword index is passed to the encoder.
• The flattening factor is adjusted according to the bit allocation strategy of the vector quantization, and the bit allocation of the vector quantization is controlled according to the perceived importance of the different sub-bands.
• Entropy coding is a source coding technique. Its basic idea is to assign shorter codewords to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability of occurrence, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, a code can be constructed whose average codeword length approaches the source entropy H(U), where U denotes the symbol variable. Since the entropy H(U) is the lower limit of the average codeword length, such variable-length coding, whose average codeword length comes very close to its lower bound H(U), is called "entropy coding". The main entropy coding methods are Huffman coding, arithmetic coding, and run-length coding; any of these methods may be employed in the present invention.
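As a concrete illustration of the entropy bound (a sketch using one of the methods named above, Huffman coding, built with the standard heap construction; not part of the patent text itself):

```python
import heapq
import math

def entropy(probs):
    """H(U) = -sum p*log2(p): the lower bound on average codeword length."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)   # merge the two least probable groups
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1            # every merge adds one bit to the group
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
avg_len = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
# For this dyadic distribution the average length equals the entropy, 1.75 bits.
```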
• The codebook serial numbers, the scale factor coded values and the lossless coded quantized spectrum are obtained; the codebook serial numbers are then entropy encoded to obtain the codebook serial number coded values, and the scale factor coded values, the codebook serial number coded values, and the lossless coded quantized spectrum are output to the bit stream multiplexing module 55.
• The codeword indices obtained by the vector quantizer are subjected to one-dimensional or multi-dimensional entropy coding in the encoder to obtain the coded values of the codeword indices, which are then output to the bit stream multiplexing module 55.
• The analysis is performed on the entire frequency band, and the spectral envelope of the high-frequency portion and its characteristics related to the low-frequency portion are extracted and output as the band extension control information to the bit stream multiplexing module 55.
• Band extension: for most audio signals, the characteristics of the high-frequency part are strongly correlated with those of the low-frequency part, so the high-frequency part of the audio signal can be effectively reconstructed from its low-frequency part. Thus the high-frequency component of the audio signal need not be transmitted; to ensure proper reconstruction of the high-frequency portion, only a small amount of band extension control information is transmitted in the compressed audio stream.
• The band extension module 54 includes a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; the spectral envelope extraction module then estimates the spectral envelope of the high-frequency portion of the signal at a suitable time-frequency resolution.
• To ensure that the time-frequency resolution is best suited to the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be chosen freely.
• The parameters describing the spectral characteristics of the input signal and the spectral envelope of the high-frequency portion are sent to the bit stream multiplexing module 55 for multiplexing as the output of the band extension.
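The kind of coarse high-band envelope this module transmits can be illustrated as follows (a simplified sketch with a fixed band grid; the module described above chooses its time-frequency grid adaptively, and the exact extraction procedure is not specified in the text):

```python
import math

def spectral_envelope(magnitudes, crossover_bin, num_bands):
    """Summarize the high-frequency part of a magnitude spectrum as the
    RMS level of a few coarse bands: low-rate side information from
    which a decoder can shape a reconstructed high band."""
    high = magnitudes[crossover_bin:]
    size = max(1, len(high) // num_bands)
    envelope = []
    for b in range(num_bands):
        chunk = high[b * size:(b + 1) * size] or [0.0]
        envelope.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return envelope
```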
  • the bitstream multiplexing module 55 receives the code stream including the common scale factor, the scale factor coded value, the codebook sequence number coded value, and the lossless coded quantized spectrum output by the quantization and entropy coding module 53 or the coded value of the codeword index and the band extension. After the information output by the module 54 is multiplexed, a compressed audio data stream is obtained.
• The encoding method based on the above encoder specifically includes: analyzing the input audio signal over the entire frequency band and extracting the high-frequency spectral envelope and the signal spectral characteristic parameters as the band extension control signal; resampling the input audio signal; calculating the signal-to-mask ratio of the resampled signal; time-frequency mapping the resampled signal to obtain the frequency domain coefficients of the audio signal; quantizing and entropy coding the frequency domain coefficients; and multiplexing the band extension control signal with the encoded audio code stream to obtain the compressed audio stream.
• The resampling process consists of two steps: limiting the frequency band of the audio signal, and down-sampling the band-limited audio signal.
• The time-frequency transform of the time-domain audio signal may be a discrete Fourier transform (DFT), a discrete cosine transform (DCT), a modified discrete cosine transform (MDCT), a cosine modulated filter bank, a wavelet transform, and so on.
• When the modified discrete cosine transform (MDCT) is used for the time-frequency transform, the windowed signal is MDCT-transformed to obtain the frequency domain coefficients.
• The impulse response of the MDCT analysis filter is h_k(n) = w(n) · cos[(π/M)(n + (M+1)/2)(k + 1/2)], 0 ≤ n ≤ 2M−1, where w(n) is the analysis window and M is the number of frequency domain coefficients; X(k) is the output frequency domain signal of the MDCT transform.
• The Sine window can be used as the window function.
• The above restriction on the window function can also be relaxed by using a biorthogonal transform with a specific analysis filter and synthesis filter.
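A direct-form sketch of this transform (sine window plus MDCT; the O(N²) loop is for clarity only, and production coders use FFT-based fast algorithms):

```python
import math

def sine_window(N):
    """Sine window w(n) = sin(pi*(n + 0.5)/N), which satisfies the
    perfect-reconstruction (Princen-Bradley) conditions."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def mdct(x, window):
    """MDCT of one block of N windowed samples into M = N/2 coefficients:
    X(k) = sum_n w(n) x(n) cos[(pi/M)(n + (M+1)/2)(k + 1/2)]."""
    N = len(x)
    M = N // 2
    xw = [xi * wi for xi, wi in zip(x, window)]
    return [sum(xw[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(N))
            for k in range(M)]
```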
• When a cosine modulated filter bank is used for the time-frequency transform, the time domain samples of the previous frame and the current frame are first selected, a windowing operation is applied to the time domain signals of these two frames of samples, and the windowed signals are then cosine-modulation transformed to obtain the frequency domain coefficients,
• where 0 ≤ k ≤ M−1 and 0 ≤ n ≤ 2KM−1, K being an integer greater than zero.
• The analysis window (analysis prototype filter) of the M sub-band cosine modulated filter bank has an impulse response of length N; the synthesis window (synthesis prototype filter) has an impulse response of length N_s.
• Calculating the masking threshold and the signal-to-mask ratio of the resampled signal includes the following steps:
• The first step is to map the signal from the time domain to the frequency domain.
• The second step is to determine the tonal and non-tonal (noise-like) components in the signal.
• The tonality of the signal is estimated by inter-frame prediction of each spectral line: the Euclidean distance between the predicted and actual values of each spectral line is mapped to an unpredictability measure, such that highly predictable spectral components are considered strongly tonal, while components of low predictability are considered noise-like.
• r_pred[k] = r_(t-1)[k] + (r_(t-1)[k] - r_(t-2)[k]), where r_t[k] denotes the coefficient of the current frame, r_(t-1)[k] the coefficient of the previous frame, and r_(t-2)[k] the coefficient of the frame two frames earlier.
• The unpredictability of each sub-band is the energy-weighted average of the unpredictability of all the spectral lines in the sub-band.
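The per-line unpredictability measure described above can be sketched as follows (complex spectra assumed; the linear extrapolation follows the predictor given in the text, while the normalization is an illustrative choice):

```python
def unpredictability(prev2, prev1, curr):
    """Per-line unpredictability c[k]: normalized distance between the
    spectral line extrapolated from the two previous frames and its
    actual value. c[k] near 0 means tonal (predictable); near 1 means
    noise-like."""
    c = []
    for r2, r1, r0 in zip(prev2, prev1, curr):
        pred = r1 + (r1 - r2)              # r_pred[k] from the two past frames
        denom = abs(r0) + abs(pred)
        c.append(abs(r0 - pred) / denom if denom > 0 else 0.0)
    return c
```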
  • the third step is to calculate the signal-to-noise ratio (SNR) of each sub-band.
• The final masking threshold is selected as the greater of the masking threshold calculated above and the static masking threshold (the threshold in quiet).
• The fourth step is to calculate the perceptual entropy, using the following formula: pe = -Σ_b cbwidth_b × log10(n[b] / (e[b] + 1)), where cbwidth_b represents the number of spectral lines included in sub-band b, e[b] is the energy of sub-band b, and n[b] is its masking threshold.
• The fifth step is to calculate the signal-to-mask ratio (SMR) of each sub-band signal.
  • the frequency domain coefficients are quantized and entropy encoded according to the mask ratio, wherein the quantization may be scalar quantization or vector quantization.
• The scalar quantization includes the following steps: nonlinearly compressing the frequency domain coefficients in all scale factor bands; quantizing the frequency domain coefficients of each sub-band using that sub-band's scale factor to obtain a quantized spectrum represented by integers; selecting the first scale factor in each frame signal as the common scale factor; and differentially coding every other scale factor against its preceding scale factor.
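A sketch of this scalar path (the 3/4-power companding and the 2^(sf/4) scale-factor step are assumptions borrowed from AAC-family coders; the text above specifies only "nonlinear compression" followed by scale-factor quantization):

```python
def quantize_band(coeffs, scale_factor, power=0.75):
    """Nonlinearly compress each frequency domain coefficient, scale it by
    the sub-band's scale factor, and round to an integer quantized value."""
    step = 2.0 ** (scale_factor / 4.0)        # assumed scale-factor granularity
    quantized = []
    for x in coeffs:
        mag = (abs(x) / step) ** power        # nonlinear compression
        quantized.append(int(round(mag)) * (1 if x >= 0 else -1))
    return quantized
```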
• The vector quantization comprises the following steps: grouping the frequency domain coefficients into a number of multi-dimensional vectors; performing spectral flattening on each vector according to the flattening factor; and finding, according to the subjective perceptual distance measure criterion, the codeword in the codebook with the smallest distance to the vector being quantized, obtaining its codeword index.
  • the entropy coding step comprises: entropy coding the quantized spectrum and the differentially processed scale factor to obtain a codebook serial number, a scale factor coded value, and a lossless coded quantized spectrum; entropy coding the codebook sequence number to obtain a codebook serial number coded value.
• One-dimensional or multi-dimensional entropy encoding of the codeword indices yields the coded values of the codeword indices.
• The above entropy coding may use any of the existing Huffman coding, arithmetic coding, or run-length coding methods.
• The encoded audio code stream is thus obtained, and it is multiplexed together with the common scale factor and the band extension control signal to obtain the compressed audio code stream.
  • FIG. 6 is a block diagram showing the structure of an audio decoding device of the present invention.
• The audio decoding apparatus includes a bit stream demultiplexing module 601, an entropy decoding module 602, an inverse quantizer group 603, a frequency-time mapping module 604, and a band extension module 605.
• After the compressed audio code stream is demultiplexed by the bit stream demultiplexing module 601, the corresponding data signals and control signals are obtained and output to the entropy decoding module 602 and the band extension module 605.
• The data signals and the control signals are decoded in the entropy decoding module 602 to recover the quantized values of the spectrum.
• The quantized values are reconstructed in the inverse quantizer group 603 to obtain the inverse quantized spectrum.
• The inverse quantized spectrum is output to the frequency-time mapping module 604, where the time domain audio signal is obtained through frequency-time mapping; in the band extension module 605 the high frequency signal portion is reconstructed to obtain the wide-band time domain audio signal.
  • the bit stream demultiplexing module 601 decomposes the compressed audio code stream to obtain corresponding data signals and control signals, and provides corresponding decoding information for other modules.
• The signals output to the entropy decoding module 602 include the common scale factor, the scale factor coded values, the codebook serial number coded values, and the lossless coded quantized spectrum, or the coded values of the codeword indices;
• the band extension control information is output to the band extension module 605.
• The entropy decoding module 602 receives the common scale factor, the scale factor coded values, the codebook serial number coded values, and the lossless coded quantized spectrum output by the bit stream demultiplexing module 601; it then decodes the codebook serial numbers, decodes the spectral coefficients, decodes the scale factors, reconstructs the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer group 603.
• The decoding method adopted by the entropy decoding module 602 corresponds to the entropy encoding method used in the encoding device, such as Huffman decoding, arithmetic decoding, or run-length decoding.
• After receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer group 603 inversely quantizes the transmitted quantized values into a non-scaled reconstructed spectrum (the inverse quantized spectrum) and outputs the inverse quantized spectrum to the frequency-time mapping module 604.
• The inverse quantizer group 603 may be a uniform quantizer group or a non-uniform quantizer group realized by a companding function.
• If the quantizer group in the encoding device employs a scalar quantizer, the inverse quantizer group 603 in the decoding device also employs a scalar inverse quantizer.
• The quantized values of the spectrum are first nonlinearly expanded, and each scale factor is then used to obtain all the spectral coefficients (the inverse quantized spectrum) in the corresponding scale factor band.
• If vector quantization was used, the entropy decoding module 602 receives the coded values of the codeword indices output by the bit stream demultiplexing module 601, and the coded values of the codeword indices are decoded by the entropy decoding method corresponding to the entropy encoding method used at the time of encoding, obtaining the corresponding codeword indices.
• The codeword indices are output to the inverse quantizer group 603, the quantized values (the inverse quantized spectrum) are obtained by querying the codebook, and the result is output to the frequency-time mapping module 604.
  • the inverse quantizer group 603 employs an inverse vector quantizer.
  • the inverse quantization spectrum is processed by the mapping of the frequency-time mapping module 604 to obtain a time domain audio signal of a low frequency band.
• The frequency-time mapping module 604 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, a cosine modulated filter bank, etc.
  • the band extension module 605 receives the band extension information output by the bit stream demultiplexing module 601 and the time domain audio signal of the low frequency band of the frequency-time mapping module 604, and reconstructs the high frequency signal part by spectrum shifting and high frequency adjustment. Output a wideband audio signal.
• The decoding method based on the above solution includes: demultiplexing the compressed audio code stream to obtain the data information and control information; entropy decoding this information to obtain the quantized values of the spectrum; inversely quantizing the quantized values of the spectrum to obtain the inverse quantized spectrum; frequency-time mapping the inverse quantized spectrum to obtain the low-frequency-band time domain audio signal; and, according to the band extension control signal, reconstructing the high-frequency portion of the time domain audio signal to obtain the wide-band audio signal.
• If the demultiplexed information includes the codebook serial number coded values, the common scale factor, the scale factor coded values, and the lossless coded quantized spectrum, the spectral coefficients were quantized using scalar quantization in the encoding device, and the entropy decoding step comprises: decoding the codebook serial number coded values to obtain the codebook serial numbers of all scale factor bands; decoding the quantized coefficients of all scale factor bands according to the codebooks corresponding to the codebook serial numbers; and decoding the scale factors of all scale factor bands, reconstructing the quantized spectrum.
  • the entropy decoding method adopted in the above process corresponds to an entropy coding method in the coding method, such as a run length decoding method, a Huffman decoding method, an arithmetic decoding method, and the like.
  • the process of entropy decoding is illustrated by using the run length decoding method to decode the code book number, using the Huffman decoding method to decode the quantized coefficients, and using the Huffman decoding method to decode the scale factor.
• The codebook serial numbers of all scale factor bands are obtained by run-length decoding.
• The decoded codebook serial number is an integer in a certain interval, here [0, 11]; only codebook serial numbers in the valid range, i.e. between 1 and 11, correspond to spectral coefficient Huffman codebooks, while for an all-zero sub-band the codebook serial number 0 is selected.
• The quantized coefficients of all scale factor bands are decoded using the spectral coefficient Huffman codebooks corresponding to the codebook serial numbers. If the codebook serial number of a scale factor band is within the valid range, in this embodiment between 1 and 11, the serial number corresponds to a spectral coefficient codebook; the codeword indices of the quantized coefficients of the scale factor band are decoded from the quantized spectrum with this codebook, and the quantized coefficients are then unpacked from the codeword indices.
• If the codebook serial number of the scale factor band is not between 1 and 11, the serial number does not correspond to any spectral coefficient codebook; the quantized coefficients of that scale factor band are not decoded and are all set to zero.
• The scale factors are used to reconstruct the spectral values from the inverse quantized spectral coefficients; if the codebook serial number of a scale factor band is in the valid range, the band has a scale factor to decode. When decoding the scale factors, the code stream occupied by the first scale factor is read first; Huffman decoding is then performed on the other scale factors, sequentially obtaining the difference between each scale factor and its predecessor, and adding this difference to the previous scale factor value to obtain each scale factor. If the quantized coefficients of the current sub-band are all zero, the scale factor of that sub-band need not be decoded.
• The inverse quantization process includes: nonlinearly expanding the quantized values of the spectrum, and obtaining all the spectral coefficients (the inverse quantized spectrum) in the corresponding scale factor bands according to each scale factor.
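These two steps can be sketched as the inverse of the scalar quantizer on the encoder side (the 4/3-power expansion and the 2^(sf/4) step are the same illustrative assumptions as before; the text specifies only "nonlinear expansion" and application of the scale factor):

```python
def dequantize_band(quantized, scale_factor, power=4.0 / 3.0):
    """Nonlinearly expand the integer quantized values and apply the
    band's scale factor to recover inverse quantized spectral coefficients."""
    step = 2.0 ** (scale_factor / 4.0)        # assumed scale-factor granularity
    return [(abs(q) ** power) * step * (1 if q >= 0 else -1) for q in quantized]
```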
• If the demultiplexed information includes the coded values of the codeword indices, the encoding device quantized the spectral coefficients using vector quantization, and the entropy decoding step includes: decoding the coded values of the codeword indices with the entropy decoding method corresponding to the entropy coding method in the encoding device, obtaining the codeword indices.
• The codeword indices are then inversely quantized to obtain the inverse quantized spectrum.
• The method of frequency-time mapping the inverse quantized spectrum corresponds to the time-frequency mapping method in the encoding method, and may be completed by an inverse discrete cosine transform (IDCT), an inverse discrete Fourier transform (IDFT), an inverse modified discrete cosine transform (IMDCT), an inverse wavelet transform, or other methods.
• The inverse modified discrete cosine transform (IMDCT) is taken as an example to illustrate the frequency-time mapping process.
  • the frequency-time mapping process consists of three steps: IMDCT transformation, time domain windowing, and time domain superposition.
• The IMDCT transform is performed on the prediction residual spectrum or the inverse quantized spectrum to obtain the transformed time domain signal x_(i,n). The IMDCT transform expression is x_(i,n) = (2/N) · Σ_(k=0)^(N/2−1) spec[i][k] · cos((2π/N)(n + n_0)(k + 1/2)), 0 ≤ n < N, where n represents the sample number, N the transform length, and n_0 = (N/2 + 1)/2.
  • the time domain signal obtained by the IMDCT transform is windowed in the time domain.
  • Typical window functions are Sine windows, Kaiser-Bessel windows, and the like.
• The above restrictions on the window function can also be relaxed by using a biorthogonal transform with a specific analysis filter and synthesis filter.
  • the windowed time domain signal is superimposed to obtain a time domain audio signal.
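The three steps (IMDCT transformation, time domain windowing, time domain superposition) can be sketched as follows; the direct O(N²) IMDCT form is for clarity, and the 1/M normalization is the conventional choice that yields perfect reconstruction with a Princen-Bradley window:

```python
import math

def imdct(X):
    """Direct IMDCT: M spectral coefficients -> 2M (aliased) time samples."""
    M = len(X)
    return [(1.0 / M) * sum(X[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

def window_and_overlap_add(prev_tail, block, window):
    """Window one IMDCT output block and overlap-add its first half with
    the windowed second half kept from the previous frame. Returns the
    finished output samples and the tail to carry into the next frame."""
    N = len(block)
    w = [b * wn for b, wn in zip(block, window)]
    out = [p + c for p, c in zip(prev_tail, w[:N // 2])]
    return out, w[N // 2:]
```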
• The high frequency portion of the audio signal is reconstructed based on the band extension control information and the time domain audio signal to obtain the wide-band audio signal.
  • FIG. 7 is a schematic illustration of a first embodiment of an encoding device of the present invention.
• This embodiment adds a sum and difference stereo (M/S) encoding module 56 on the basis of FIG. 5, located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy encoding module 53; it receives the signal type analysis results output by the psychoacoustic analysis module 51.
• The psychoacoustic analysis module 51 calculates, in addition to the masking threshold of each individual channel of the audio signal, the masking thresholds of the sum and difference channels, and outputs them to the quantization and entropy coding module 53.
• The sum and difference stereo encoding module 56 can also be located between the quantizer group and the encoder within the quantization and entropy encoding module 53.
• The sum and difference stereo encoding module 56 converts the frequency domain coefficients/residual sequences of the left and right channels into the sum and difference channel frequency domain coefficients/residual sequences, exploiting the correlation between the two channels of the channel pair in order to improve coding efficiency and the stereo image. It is therefore only applicable to channel-pair signals with consistent signal types; for a mono signal, or a channel-pair signal whose signal types are inconsistent, sum and difference stereo encoding is not performed.
• The encoding method based on the encoding apparatus shown in FIG. 7 is basically the same as that based on the encoding apparatus shown in FIG. 5, except that the following steps are added: before the quantization and entropy encoding of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if so, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding conditions.
• If they do, sum and difference stereo coding is performed and the sum and difference channel frequency domain coefficients are obtained; if not, sum and difference stereo coding is not performed; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, the frequency domain coefficients are not processed.
• The sum and difference stereo coding can also be applied after the quantization process and before the entropy coding, that is: after the quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if so, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding conditions, and if they do, sum and difference stereo encoding is performed; if not, no sum and difference stereo encoding is performed; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, no sum and difference stereo encoding is performed on the frequency domain coefficients.
• If the spectral coefficients of the corresponding scale factor band of the left channel are l(k) and those of the right channel are r(k), the correlation matrix entries are computed as C_lr = Σ_(k=0)^(N−1) l(k) · r(k) and C_rr = Σ_(k=0)^(N−1) r(k) · r(k), where N is the number of spectral lines in the scale factor band; these correlations are used to test whether the band satisfies the sum and difference stereo coding condition.
• If the conditions are satisfied, the frequency domain coefficients in the scale factor band corresponding to the left and right channels are replaced by the linearly transformed sum and difference channel frequency domain coefficients M = (L + R)/2 and S = (L − R)/2, where
• M represents the sum channel frequency domain coefficient,
• S represents the difference channel frequency domain coefficient,
• L represents the left channel frequency domain coefficient, and
• R represents the right channel frequency domain coefficient.
• If the sum and difference coding is applied after quantization, the quantized frequency domain coefficients of the left and right channels in the scale factor band are likewise replaced by the linearly transformed quantized sum and difference channel coefficients, where M̂ represents the quantized sum channel frequency domain coefficient, Ŝ the quantized difference channel frequency domain coefficient, L̂ the quantized left channel frequency domain coefficient, and R̂ the quantized right channel frequency domain coefficient.
• In this way the correlation between the left and right channels can be effectively removed, and since the transform operates on already quantized values, lossless coding can be achieved.
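Assuming the normalized sum/difference matrix M = (L + R)/2, S = (L − R)/2 (the standard M/S transform; the text states only that a linear transform replaces the left/right coefficients), the per-band operation and its exact inverse are:

```python
def ms_encode(left, right):
    """Per spectral line: sum channel M = (L+R)/2, difference S = (L-R)/2.
    Applied only to scale factor bands that satisfy the M/S condition."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

For a strongly correlated channel pair the side channel is nearly zero and cheap to code, and the round trip L, R to M, S and back is exact.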
  • Fig. 8 is a schematic diagram of the first embodiment of the decoding apparatus.
• The decoding apparatus adds a sum and difference stereo decoding module 606 to the decoding apparatus shown in FIG. 6, between the output of the inverse quantizer group 603 and the input of the frequency-time mapping module 604. It receives the signal type analysis results and the sum and difference stereo control signal output by the bit stream demultiplexing module 601 and, according to this control information, converts the inverse quantized spectrum of the sum and difference channels into the inverse quantized spectra of the left and right channels.
• The sum and difference stereo decoding module 606 determines, from the flag bit of each scale factor band, whether the inverse quantized spectrum/quantized spectrum values in that band require sum and difference stereo decoding. If sum and difference stereo coding was performed in the encoding device, the inverse quantized spectrum must undergo sum and difference stereo decoding in the decoding device.
• The sum and difference stereo decoding module 606 can also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603, receiving the sum and difference stereo control signal and the signal type analysis results output from the bit stream demultiplexing module 601.
• The decoding method based on the decoding apparatus shown in FIG. 8 is basically the same as that based on the decoding apparatus shown in FIG. 6, except that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether the inverse quantized spectrum requires sum and difference stereo decoding; if so, the flag bit of each scale factor band determines whether that band requires sum and difference stereo decoding, and where necessary the inverse quantized spectrum of the sum and difference channels in the scale factor band is converted into the inverse quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or sum and difference stereo decoding is not required, the inverse quantized spectrum is not so processed and goes directly to subsequent processing.
• The sum and difference stereo decoding can also be performed after the entropy decoding process and before the inverse quantization process, that is: when the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is judged according to the sum and difference stereo control signal whether the quantized values of the spectrum require sum and difference stereo decoding; if so, the flag bit of each scale factor band determines whether that band needs sum and difference stereo decoding, and where necessary the quantized values of the spectrum of the sum and difference channels in the scale factor band are converted into the quantized values of the spectra of the left and right channels before subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values are not so processed and go directly to subsequent processing.
• When decoding operates on quantized values, the quantized spectra of the left and right channels are obtained from the quantized sum and difference channel coefficients of the scale factor band by the operations L̂ = M̂ + Ŝ and R̂ = M̂ − Ŝ, where M̂ represents the quantized sum channel frequency domain coefficient, Ŝ the quantized difference channel frequency domain coefficient, L̂ the quantized left channel frequency domain coefficient, and R̂ the quantized right channel frequency domain coefficient.
• When decoding operates on the inverse quantized spectrum, the inverse quantized frequency domain coefficients of the left and right channels in the sub-band are obtained in the same way, L = M + S and R = M − S, where M represents the sum channel frequency domain coefficient, S the difference channel frequency domain coefficient, L the left channel frequency domain coefficient, and R the right channel frequency domain coefficient.
  • Fig. 9 is a view showing the construction of a second embodiment of the encoding apparatus of the present invention.
• This embodiment adds a frequency domain linear prediction and vector quantization module 57 to the apparatus of FIGS. 5 and 7, located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53.
• The residual sequence is output to the quantization and entropy encoding module 53, and the quantized codeword indices are output to the bit stream multiplexing module 55 as side information.
• The frequency domain coefficients output by the time-frequency mapping module 52 are transmitted to the frequency domain linear prediction and vector quantization module 57. If the prediction gain of the frequency domain coefficients satisfies a given condition, linear prediction filtering is performed on the frequency domain coefficients; the resulting prediction coefficients are converted into line spectrum frequency (LSF) coefficients, the codeword index in each codebook is then calculated using an optimal distortion measure, and the codeword indices are transmitted as side information to the bit stream multiplexing module 55.
  • the frequency domain linear prediction and vector quantization module 57 is composed of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer.
  • the frequency domain coefficients are input into a linear pre-conductor for prediction analysis, and the predicted soil and prediction coefficients are obtained. If the value of the prediction gain satisfies certain conditions, the frequency domain coefficients are output to the linear prediction filter to perform linear prediction error filtering.
  • The prediction residual sequence of the frequency domain coefficients is obtained and directly output to the quantization and entropy coding module 53, while the prediction coefficients are converted into line spectrum pair frequency coefficients LSF by the converter; the LSF parameters are then sent to the vector quantizer for multi-level vector quantization.
  • The resulting codeword index is transferred to the bitstream multiplexing module 55.
  • The envelope corresponding to the positive frequency component of the signal, that is, the Hilbert envelope of the signal, is related to the autocorrelation function of its spectrum. For a bandpass signal in each given frequency range, if its Hilbert envelope remains constant, the sequence of spectral coefficients is a stationary sequence with respect to frequency, so that the spectral values can be processed by predictive coding techniques and the signal can be represented by a common set of prediction coefficients.
  • The encoding method based on the encoding device shown in FIG. 9 is basically the same as the encoding method based on the encoding device shown in FIG. 5, except that the following steps are added: standard linear prediction analysis is performed on the frequency domain coefficients to obtain the prediction gain and prediction coefficients; it is judged whether the prediction gain exceeds a set threshold, and if so, frequency domain linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients to obtain the prediction residual sequence of the frequency domain coefficients; the prediction coefficients are converted into line spectrum pair frequency coefficients, and multi-level vector quantization is performed on the line spectrum pair frequency coefficients to obtain the side information; the prediction residual sequence is quantized and entropy encoded; if the prediction gain does not exceed the set threshold, the frequency domain coefficients themselves are quantized and entropy encoded.
  • Specifically, standard linear prediction analysis is first performed on the frequency domain coefficients, including calculating the autocorrelation matrix and applying the Levinson-Durbin recursive algorithm, to obtain the prediction gain and prediction coefficients. It is then judged whether the calculated prediction gain exceeds a preset threshold; if it does, linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients; otherwise, the frequency domain coefficients are left unprocessed and the next step, quantization and entropy encoding of the frequency domain coefficients, is performed.
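As a sketch of this analysis step — not the patent's implementation — the Levinson-Durbin recursion over autocorrelation values can be written as follows; the AR(2) test sequence and the model order are illustrative assumptions used only to exercise the routine:

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, e): filter coefficients a[0..order] with a[0] = 1 (so the
    predictor of X(k) is -sum_{i>=1} a[i] * X(k-i)) and the final prediction
    error power e.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current partial filter
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / e
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a = a_new
        e *= 1.0 - k * k
    return a, e

# Illustrative strongly correlated test sequence (an AR(2) process).
rng = np.random.default_rng(0)
x = np.zeros(2048)
noise = rng.standard_normal(2048)
for n in range(2, 2048):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + noise[n]
r = np.array([np.dot(x[:2048 - lag], x[lag:]) for lag in range(5)]) / 2048
a, e = levinson_durbin(r, 4)
prediction_gain = r[0] / e  # signal power over residual power
```

The prediction gain is the ratio of the input power r[0] to the residual power e; the threshold test described above would compare `prediction_gain` against a preset value.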
  • Linear prediction can be divided into forward prediction and backward prediction.
  • Forward prediction refers to the prediction of the current value by the value before a certain moment
  • backward prediction refers to the prediction of the current value by the value after a certain moment.
  • a_i represents the prediction coefficients and p is the prediction order. After the frequency domain coefficients produced by the time-frequency transform pass through the prediction error filter, the prediction error E(k), also called the residual sequence, is obtained, and the two satisfy E(k) = X(k) - a_1·X(k-1) - ... - a_p·X(k-p).
  • Thus the frequency domain coefficient X(k) output by the time-frequency transform module can be represented by the residual sequence E(k) and the set of prediction coefficients a_i.
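A minimal sketch of this error filtering, under the common convention E(k) = X(k) - sum over i of a_i·X(k-i) and assuming zero history before the first coefficient (the names and the first-order example are illustrative):

```python
import numpy as np

def lp_error_filter(X, a):
    """Compute the residual E(k) = X(k) - sum_{i=1}^{p} a[i] * X(k - i).

    a[1..p] are the prediction coefficients (a[0] is ignored); coefficients
    before the start of the block are taken as zero.
    """
    X = np.asarray(X, dtype=float)
    p = len(a) - 1
    E = np.empty_like(X)
    for k in range(len(X)):
        pred = sum(a[i] * X[k - i] for i in range(1, p + 1) if k - i >= 0)
        E[k] = X[k] - pred
    return E

# Illustrative first-order predictor: predict each coefficient by the previous one.
E = lp_error_filter([1.0, 2.0, 3.0, 5.0], a=[1.0, 1.0])
```

For this toy input the residual is nearly flat, which is exactly why a well-predicted spectrum is cheaper to quantize than the raw coefficients.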
  • The set of prediction coefficients a_i is converted into line spectrum pair frequency coefficients LSF, and multi-level vector quantization is performed on them.
  • Vector quantization selects the best distortion metric (such as the nearest-neighbor criterion), searches each codebook for the codeword that best matches the LSF parameter vector (or residual vector) to be quantized, and the corresponding codeword index is transmitted as side information to the bitstream multiplexing module 55.
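A toy sketch of multi-level (multi-stage) vector quantization with a nearest-neighbour (minimum squared error) search; the two tiny codebooks and the test vector are made-up illustrations, not values from the patent:

```python
import numpy as np

def msvq_encode(v, codebooks):
    """Multi-stage VQ: each stage picks the codeword nearest to the current
    residual (nearest-neighbour criterion) and passes the remainder on."""
    residual = np.asarray(v, dtype=float).copy()
    indices = []
    for cb in codebooks:
        dist = np.sum((cb - residual) ** 2, axis=1)
        best = int(np.argmin(dist))
        indices.append(best)
        residual = residual - cb[best]
    return indices, residual

def msvq_decode(indices, codebooks):
    """Reconstruction is the sum of the selected codewords of all stages."""
    return sum(cb[i] for i, cb in zip(indices, codebooks))

stage1 = np.array([[0.0, 0.0], [1.0, 1.0]])
stage2 = np.array([[0.0, 0.0], [0.25, -0.25]])
idx, rem = msvq_encode([1.25, 0.75], [stage1, stage2])
recon = msvq_decode(idx, [stage1, stage2])
```

Each later stage only has to cover the (much smaller) residual of the previous stage, which is what makes multi-level VQ cheaper than one large codebook of equal precision.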
  • the residual sequence is quantized and encoded.
  • The decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 607, between the output of the inverse quantizer group 603 and the input of the frequency-time mapping module 604, on the basis of the decoding apparatus shown in FIG. 6.
  • The bitstream demultiplexing module 601 outputs inverse frequency domain linear prediction vector quantization control information to it, for inverse quantization processing and inverse linear prediction filtering of the inverse quantization spectrum (residual spectrum).
  • The pre-prediction spectrum is thereby obtained and output to the frequency-time mapping module 604.
  • In the encoder, frequency domain linear prediction vector quantization techniques are employed to suppress pre-echo and obtain a larger coding gain. Therefore, in the decoder, the inverse quantization spectrum and the inverse frequency domain linear prediction vector quantization control information output by the bitstream demultiplexing module 601 are input to the inverse frequency domain linear prediction and vector quantization module 607 to recover the spectrum from the linear prediction error.
  • The inverse frequency domain linear prediction and vector quantization module 607 includes an inverse vector quantizer, an inverse converter, and an inverse linear prediction filter. The inverse vector quantizer is used to inversely quantize the codeword index to obtain the line spectrum pair frequency coefficients LSF; the inverse converter then converts the line spectrum pair frequency coefficients LSF into prediction coefficients; the inverse linear prediction filter is used to inversely filter the inverse quantization spectrum according to the prediction coefficients to obtain the pre-prediction spectrum, which is output to the frequency-time mapping module 604.
  • The decoding method based on the decoding device shown in FIG. 10 is basically the same as the decoding method based on the decoding device shown in FIG. 6, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged whether the control information contains inverse frequency domain linear prediction vector quantization information; if it does, inverse vector quantization is performed to obtain the prediction coefficients, the inverse quantization spectrum is linearly predicted and synthesized according to the prediction coefficients to obtain the spectrum before prediction, and the pre-prediction spectrum is then subjected to frequency-time mapping.
  • Specifically, after obtaining the inverse quantization spectrum, it is determined from the control information whether the frame signal was subjected to frequency domain linear prediction vector quantization; if so, the quantized codeword index is obtained from the control information, the quantized line spectrum pair frequency coefficients LSF are recovered from the codeword index, and the prediction coefficients are calculated; the inverse quantization spectrum is then linearly predicted and synthesized to obtain the spectrum before prediction.
  • The residual sequence and the calculated prediction coefficients are combined by frequency domain linear prediction synthesis to obtain the pre-prediction spectrum X(k), and the pre-prediction spectrum X(k) is subjected to frequency-time mapping processing.
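The synthesis step is the mirror image of the encoder's error filter: under the convention E(k) = X(k) - sum over i of a_i·X(k-i), the decoder accumulates the prediction back onto the residual. An illustrative sketch (not the patent's implementation):

```python
import numpy as np

def lp_synthesis(E, a):
    """Recover X(k) = E(k) + sum_{i=1}^{p} a[i] * X(k - i) from the residual.

    a[1..p] are the prediction coefficients (a[0] is ignored); history before
    the start of the block is taken as zero, matching the analysis filter.
    """
    E = np.asarray(E, dtype=float)
    p = len(a) - 1
    X = np.empty_like(E)
    for k in range(len(E)):
        acc = E[k]
        for i in range(1, p + 1):
            if k - i >= 0:
                acc += a[i] * X[k - i]
        X[k] = acc
    return X

# Round trip of the first-order example: residual [1, 1, 1, 2] with a1 = 1
# should rebuild the original coefficients [1, 2, 3, 5].
X = lp_synthesis([1.0, 1.0, 1.0, 2.0], a=[1.0, 1.0])
```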
  • If the control information indicates that the frame signal has not undergone frequency domain linear prediction vector quantization, the inverse frequency domain linear prediction vector quantization process is not performed, and the inverse quantized spectrum is directly subjected to frequency-time mapping processing.
  • FIG. 11 is a block diagram showing the structure of a third embodiment of the encoding apparatus of the present invention.
  • This embodiment, based on FIG. 9, adds a sum and difference stereo coding module 56 between the output of the frequency domain linear prediction and vector quantization module 57 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis result to it, and it outputs the masking threshold of the sum and difference channels to the quantization and entropy encoding module 53.
  • The sum and difference stereo coding module 56 may also be located between the quantizer group and the encoder in the quantization and entropy coding module 53, receiving the signal type analysis result output by the psychoacoustic analysis module 51.
  • The function and working principle of the sum and difference stereo coding module 56 are the same as those in FIG. 7, and will not be described here.
  • The encoding method based on the encoding apparatus shown in FIG. 11 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 9, except that the following steps are added: before the quantization and entropy encoding of the frequency domain coefficients, it is determined whether the audio signal is a multi-channel signal; if it is a multi-channel signal, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether each scale factor band satisfies the coding condition, and if satisfied, sum and difference stereo coding is performed on that scale factor band; if not satisfied, the sum and difference stereo coding is not performed; if it is a mono signal, or a multi-channel signal whose signal types are inconsistent, the sum and difference stereo coding is not performed.
  • In addition, the sum and difference stereo coding can also be applied after quantization and before entropy coding, that is, after the quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if it is a multi-channel signal, it is determined whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is determined whether the scale factor band satisfies the coding condition, and if so, sum and difference stereo coding is performed on the scale factor band; if not satisfied, the sum and difference stereo coding is not performed; if it is a mono signal, or a multi-channel signal whose signal types are inconsistent, the sum and difference stereo coding is not performed.
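A minimal sketch of per-scale-factor-band sum/difference encoding. The normalisation m = (l + r)/2, s = (l - r)/2 and the energy-based per-band decision rule are common conventions assumed here for illustration, not taken from the patent text:

```python
import numpy as np

def ms_encode(left, right, band_offsets):
    """Per scale-factor-band sum/difference transform.

    A band is converted only when the difference channel carries less energy
    than the sum channel (a simple illustrative coding condition); a flag per
    band records the decision for the decoder.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    ch_a, ch_b = left.copy(), right.copy()
    flags = []
    for band in range(len(band_offsets) - 1):
        lo, hi = band_offsets[band], band_offsets[band + 1]
        m = 0.5 * (left[lo:hi] + right[lo:hi])
        s = 0.5 * (left[lo:hi] - right[lo:hi])
        use_ms = np.sum(s * s) < np.sum(m * m)
        flags.append(bool(use_ms))
        if use_ms:
            ch_a[lo:hi], ch_b[lo:hi] = m, s
    return ch_a, ch_b, flags

l = np.array([2.0, 4.0, 1.0, -1.0])
r = np.array([0.0, 2.0, -1.0, 1.0])
a, b, flags = ms_encode(l, r, [0, 2, 4])
```

In the first band the channels are similar, so the difference is small and the band is converted; in the second band the channels are out of phase, so it is left as left/right.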
  • Figure 12 is a block diagram showing a third embodiment of the decoding apparatus.
  • The decoding apparatus adds a sum and difference stereo decoding module 606, on the basis of the decoding apparatus shown in FIG. 10, between the output of the inverse quantizer group 603 and the input of the inverse frequency domain linear prediction and vector quantization module 607; the bitstream demultiplexing module 601 outputs the sum and difference stereo control signals to it.
  • The sum and difference stereo decoding module 606 may also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603, receiving the sum and difference stereo control signals output by the bit stream demultiplexing module 601.
  • The function and working principle of the sum and difference stereo decoding module 606 are the same as those in FIG. 8, and are not described here again.
  • The decoding method based on the decoding apparatus shown in FIG. 12 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 10, except that the following steps are added: after the inverse quantization spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether the inverse quantization spectrum requires sum and difference stereo decoding; if necessary, it is judged according to the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding; if required, the inverse quantized spectra of the sum and difference channels in that scale factor band are converted into the inverse quantized spectra of the left and right channels, and then subjected to subsequent processing; if the signal types are inconsistent, or sum and difference stereo decoding is not needed, the inverse quantization spectrum is not processed and subsequent processing is performed directly.
  • The sum and difference stereo decoding can also be performed before the inverse quantization process, that is: after the quantized values are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether the quantized values of the spectrum require sum and difference stereo decoding; if necessary, it is judged according to the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding; if required, the quantized values of the spectra of the sum and difference channels in that scale factor band are converted into the quantized values of the left and right channels, and then subsequent processing is performed; if the signal types are inconsistent, or sum and difference stereo decoding is not needed, the quantized values of the spectrum are not processed and subsequent processing is performed directly.
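The decoder side of the per-band conversion might look like this sketch, using the same assumed m/s convention as the encoder side (l = m + s, r = m - s) and a per-band flag:

```python
import numpy as np

def ms_decode(spec_a, spec_b, band_offsets, ms_flags):
    """Convert sum/difference bands back to left/right.

    Bands whose flag is False are assumed to already hold left/right data
    and are passed through unchanged.
    """
    spec_a = np.asarray(spec_a, dtype=float)
    spec_b = np.asarray(spec_b, dtype=float)
    left, right = spec_a.copy(), spec_b.copy()
    for band, used in enumerate(ms_flags):
        if not used:
            continue
        lo, hi = band_offsets[band], band_offsets[band + 1]
        m, s = spec_a[lo:hi], spec_b[lo:hi]
        left[lo:hi] = m + s
        right[lo:hi] = m - s
    return left, right

left, right = ms_decode([1.0, 3.0, 5.0, 6.0], [1.0, 1.0, 7.0, 8.0],
                        [0, 2, 4], [True, False])
```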
  • Fig. 13 is a view showing the construction of a fourth embodiment of the encoding apparatus of the present invention.
  • This embodiment adds a multi-resolution analysis module 59 on the basis of FIG. 5, wherein the multi-resolution analysis module 59 is located between the output of the time-frequency mapping module 52 and the input of the quantization and entropy coding module 53; the psychoacoustic analysis module 51 outputs the signal type analysis result to it.
  • the encoding apparatus of the present invention increases the time resolution of the frequency-domain coefficients of the fast-changing signal by the multi-resolution analyzing module 59.
  • The frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 59. If the signal is a fast-changing type signal, a frequency domain wavelet transform or a frequency domain modified discrete cosine transform (MDCT) is performed, and the multi-resolution representation of the frequency domain coefficients is obtained and output to the quantization and entropy coding module 53. If it is a slowly varying type signal, the frequency domain coefficients are not processed and are directly output to the quantization and entropy coding module 53.
  • The multi-resolution analysis module 59 reorganizes the input frequency domain data in the time-frequency domain, improving the time resolution of the frequency domain data at the expense of frequency precision, thereby automatically adapting to the time-frequency characteristics of the fast-changing type signal and achieving the effect of suppressing pre-echo.
  • In this way, the form of the filter bank in the time-frequency mapping module 52 can remain unchanged.
  • The multi-resolution analysis module 59 includes a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform the frequency domain coefficients into time-frequency plane coefficients, and the recombination module is configured to reorganize the time-frequency plane coefficients according to certain rules.
  • the frequency domain coefficient transform module may use a frequency domain wavelet transform filter bank, a frequency domain MDCT transform filter bank, and the like.
  • The following takes the frequency domain wavelet transform and the frequency domain MDCT transform as examples to illustrate the working process of the multi-resolution analysis module 59.
  • the wavelet base of the frequency domain wavelet or wavelet packet transform may be fixed or adaptive.
  • Here the simplest Haar wavelet-based wavelet transform is taken as an example to illustrate the process of multi-resolution analysis of the frequency domain coefficients.
  • The wavelet transform is performed: Haar wavelet transforms are applied to the high-frequency part of the frequency domain coefficients to obtain coefficients X2(k), X3(k), etc., occupying different time-frequency intervals; the corresponding time-frequency plane division is shown in Figure 15.
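One level of the Haar transform over a block of spectral coefficients can be sketched as below; applying it only to the high-frequency half of the spectrum, as described above, is shown in the usage lines. The data values are illustrative:

```python
import numpy as np

def haar_step(c):
    """One level of the orthonormal Haar transform: pairwise sums (low-pass)
    and pairwise differences (high-pass), each scaled by 1/sqrt(2)."""
    c = np.asarray(c, dtype=float)
    low = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    high = (c[0::2] - c[1::2]) / np.sqrt(2.0)
    return low, high

spectrum = np.array([1.0, 1.0, 2.0, 2.0, 3.0, -3.0, 4.0, -4.0])
half = len(spectrum) // 2
# Transform only the high-frequency half of the frequency domain coefficients.
low, high = haar_step(spectrum[half:])
```

Each output pair covers twice the frequency span but half the "time" extent of the inputs, which is exactly the trade of frequency precision for time resolution described above; the transform is invertible, so no information is lost.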
  • Different wavelet bases can be selected, and different wavelet transform structures can be selected for processing, and other similar time-frequency plane partitions are obtained. Therefore, the time-frequency plane division of the signal analysis can be arbitrarily adjusted as needed to meet the analysis requirements of different time and frequency resolutions.
  • The time-frequency plane coefficients are then reorganized according to certain rules in the recombination module.
  • Specifically, the time-frequency plane coefficients may first be organized in the frequency direction, with the coefficients in each frequency band organized in the time direction, and the organized coefficients then arranged in the order of sub-window and scale factor bands.
  • MDCT transforms of different lengths can be used in different frequency ranges to obtain different time-frequency plane divisions, that is, different time and frequency precisions.
  • The recombination module reorganizes the time-frequency domain data output by the frequency domain MDCT transform filter bank.
  • One recombination method is to first organize the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arrange the organized coefficients in the order of sub-window and scale factor bands.
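For a frame stored as sub-windows by frequencies, this recombination amounts to a transpose. A sketch using the 8 × 128 layout that appears later in the text (the sizes are assumptions for illustration):

```python
import numpy as np

# 8 sub-windows of 128 frequency coefficients each (assumed frame layout).
coeffs = np.arange(1024).reshape(8, 128)   # [sub-window, frequency]

# Organize in the frequency direction: for each frequency band, the
# coefficients of all sub-windows are placed together in time order.
regrouped = coeffs.T.copy()                # [frequency, sub-window]
serialized = regrouped.reshape(-1)         # frequency-major serial order
```

After this step, the first 8 serialized values are the time trajectory of the lowest frequency band across the 8 sub-windows, the next 8 belong to the second band, and so on.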
  • The basic flow is the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added: if the signal is a fast-changing type signal, multi-resolution analysis is performed on the frequency domain coefficients, and the multi-resolution representation of the frequency domain coefficients is then quantized and entropy coded; if it is not a fast-changing type signal, the frequency domain coefficients are directly quantized and entropy encoded.
  • the multi-resolution analysis may use a frequency domain wavelet transform method or a frequency domain MDCT transform method.
  • the frequency domain wavelet analysis method comprises: performing wavelet transform on the frequency domain coefficients to obtain time-frequency plane coefficients; and recombining the above-mentioned time-frequency plane coefficients according to a certain rule.
  • the MDCT transform method includes: performing n times MDCT transformation on the frequency domain coefficients to obtain a time-frequency plane coefficient; and recombining the above-mentioned time-frequency plane coefficients according to a certain rule.
  • The method of recombination may include: first organizing the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arranging the organized coefficients in the order of sub-window and scale factor bands.
  • Figure 16 is a block diagram showing the structure of a fourth embodiment of the decoding apparatus.
  • The decoding device adds a multi-resolution synthesis module 609 on the basis of the decoding apparatus shown in FIG. 6.
  • The multi-resolution synthesis module 609 is located between the output of the inverse quantizer group 603 and the input of the frequency-time mapping module 604, for performing multi-resolution synthesis on the inverse quantized spectrum.
  • a multi-resolution filtering technique is employed for the fast-changing type signal to improve the temporal resolution of the encoded fast-changing type signal. Accordingly, in the decoder, the multi-resolution synthesis module 609 is required to recover the frequency domain coefficients before the multi-resolution analysis for the fast-changing signal.
  • The multi-resolution synthesis module 609 includes a coefficient recombination module and a coefficient transform module, wherein the coefficient transform module can adopt a frequency domain inverse wavelet transform filter bank or a frequency domain IMDCT transform filter bank.
  • The basic flow is the same as the decoding method based on the decoding apparatus shown in FIG. 6, except that the following steps are added: after the inverse quantization spectrum is obtained, multi-resolution synthesis is performed on the inverse quantization spectrum, and the result is then subjected to frequency-time mapping.
  • The process is described in detail below using 128 IMDCT transforms, each with 8 inputs and 16 outputs.
  • First, the inverse quantization spectral coefficients arranged in the order of sub-window and scale factor bands are recombined according to frequency order, so that the 128 coefficients of each sub-window are organized in frequency order.
  • The coefficients arranged by sub-window are then grouped in the frequency direction, 8 coefficients to a group, with the 8 coefficients of each group arranged in time order, so that there are 128 groups of coefficients in the frequency direction.
  • Each group of coefficients is transformed by a 16-point IMDCT, and the 16 coefficients output by adjacent IMDCTs are overlapped and added to obtain 8 frequency domain data per group. 128 such operations are performed from the low frequency to the high frequency direction, and 1024 frequency domain coefficients are obtained.
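The 8-input/16-output IMDCT with overlap-add can be sketched as below. The sine (Princen-Bradley) window on both analysis and synthesis sides is an assumption for illustration, since the patent text does not specify a window here; with it, overlap-added frames cancel the time-domain aliasing and reconstruct the interior samples exactly:

```python
import numpy as np

N = 8  # 8 spectral inputs -> 16 outputs per transform

def mdct(block):
    """N-point MDCT of a 2N-sample block (plain-sum forward transform)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ block

def imdct(spec):
    """2N-sample IMDCT of N spectral values (2/N scaling pairs with mdct)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ spec)

# Sine window satisfying the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

rng = np.random.default_rng(1)
x = rng.standard_normal(48)
out = np.zeros_like(x)
for start in range(0, len(x) - 2 * N + 1, N):  # 50% overlapped frames
    spec = mdct(win * x[start:start + 2 * N])
    out[start:start + 2 * N] += win * imdct(spec)  # overlap-add of 16 outputs
# Samples away from the first and last half-frames are reconstructed exactly.
recon_ok = np.allclose(out[N:-N], x[N:-N])
```

Each transform consumes 8 coefficients and emits 16 samples; after overlap-add, each hop contributes 8 net outputs, matching the "8 frequency domain data per group" count in the text.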
  • Fig. 17 is a view showing the configuration of a fifth embodiment of the encoding apparatus of the present invention.
  • A frequency domain linear prediction and vector quantization module 57 is added, which is located between the output of the multi-resolution analysis module 59 and the input of the quantization and entropy encoding module 53.
  • The psychoacoustic analysis module 51 outputs the signal type analysis result to it;
  • The frequency domain linear prediction and vector quantization module 57 is configured to perform linear prediction and multi-level vector quantization on the frequency domain coefficients subjected to multi-resolution analysis, outputting the residual sequence to the quantization and entropy encoding module 53 while outputting the quantized codeword index to the bitstream multiplexing module 55.
  • The frequency domain linear prediction and vector quantization module 57 needs to perform linear prediction and multi-level vector quantization on the frequency domain coefficients in each time period.
  • The frequency domain linear prediction and vector quantization module 57 may also be located between the output of the time-frequency mapping module 52 and the input of the multi-resolution analysis module 59, performing linear prediction and multi-level vector quantization on the frequency domain coefficients and outputting the residual sequence to the multi-resolution analysis module 59, while outputting the quantized codeword index to the bitstream multiplexing module 55.
  • The encoding method based on the encoding apparatus shown in FIG. 17 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 13, except that the following steps are added: after multi-resolution analysis is performed on the frequency domain coefficients, standard linear prediction analysis is performed on the frequency domain coefficients in each time period; it is judged whether the prediction gain exceeds the set threshold; if it does, frequency domain linear prediction error filtering is performed on the frequency domain coefficients to obtain the prediction coefficients and the residual sequence of the frequency domain coefficients.
  • The prediction coefficients are converted into line spectrum pair frequency coefficients, and multi-level vector quantization is performed on the line spectrum pair frequency coefficients to obtain side information; the residual sequence is quantized and entropy encoded; if the prediction gain does not exceed the set threshold, the frequency domain coefficients are quantized and entropy encoded. Alternatively, before the multi-resolution analysis, the frequency domain coefficients are linearly predicted and multi-level vector quantized, and the residual sequence is then subjected to multi-resolution analysis.
  • Figure 18 is a block diagram showing the structure of a fifth embodiment of the decoding apparatus.
  • The decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 607 on the basis of the decoding apparatus shown in FIG. 16, located between the output of the inverse quantizer group 603 and the input of the multi-resolution synthesis module 609; the bitstream demultiplexing module 601 outputs inverse frequency domain linear prediction vector quantization control information to it, for inverse quantization processing and linear prediction synthesis of the inverse quantization spectrum to obtain the pre-prediction spectrum, which is output to the multi-resolution synthesis module 609.
  • The inverse frequency domain linear prediction and vector quantization module 607 can also be located between the output of the multi-resolution synthesis module 609 and the input of the frequency-time mapping module 604, for performing linear prediction synthesis on the multi-resolution synthesized inverse quantization spectrum.
  • The decoding method based on the decoding device shown in FIG. 18 is basically the same as the decoding method based on the decoding device shown in FIG. 16, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged whether the control information contains inverse frequency domain linear prediction vector quantization information; if it does, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is performed on the inverse quantization spectrum to obtain the spectrum before prediction; multi-resolution synthesis is then performed on the pre-prediction spectrum.
  • Alternatively, the inverse vector quantization process obtains the prediction coefficients and linear prediction synthesis is performed on the multi-resolution synthesized inverse quantization spectrum to obtain the spectrum before prediction; frequency-time mapping is then performed on the pre-prediction spectrum.
  • Fig. 19 is a view showing the construction of a sixth embodiment of the encoding apparatus of the present invention.
  • This embodiment is based on the encoding apparatus shown in FIG. 17, with a sum and difference stereo encoding module 56 added between the output of the frequency domain linear prediction and vector quantization module 57 and the input of the quantization and entropy encoding module 53, which receives the signal type analysis result from the psychoacoustic analysis module 51.
  • the sum and difference stereo coding module 56 may also be located between the quantizer and the encoder in the quantization and entropy coding module 53.
  • The functions and working principles of the sum and difference stereo coding module 56 are the same as those in FIG. 11, and are not described here again.
  • The encoding method based on the encoding device shown in FIG. 19 is basically the same as the encoding method based on the encoding device shown in FIG. 17, except that the following steps are added: after the residual sequence is obtained, it is determined whether to perform sum and difference stereo encoding according to whether the audio signal is a multi-channel signal of consistent signal type that satisfies the encoding conditions; subsequent processing is then performed.
  • the specific process has been introduced above, and will not be described here.
  • Fig. 20 is a block diagram showing the structure of the sixth embodiment of the decoding apparatus.
  • The decoding apparatus is based on the decoding apparatus shown in FIG. 18, and a sum and difference stereo decoding module 606 is added between the output of the inverse quantizer group 603 and the input of the inverse frequency domain linear prediction and vector quantization module 607.
  • the sum and difference stereo decoding module 606 can also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603.
  • The function and working principle of the sum and difference stereo decoding module 606 in this embodiment are the same as those in FIG. 12, and are not described here again.
  • The decoding method based on the decoding apparatus shown in FIG. 20 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 18, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged according to the signal type analysis result and the sum and difference stereo control information whether the inverse quantization spectrum needs sum and difference stereo decoding; the subsequent processing has been described above and will not be repeated here.
  • Figure 21 shows a seventh embodiment of the encoding apparatus of the present invention, which, on the basis of FIG. 13, adds a sum and difference stereo encoding module 56 located between the output of the multi-resolution analysis module 59 and the input of the quantization and entropy encoding module 53. The sum and difference stereo encoding module 56 can also be located between the quantizer group and the encoder in the quantization and entropy encoding module 53. The sum and difference stereo encoding module 56 has been described in detail previously and will not be repeated here.
  • The encoding method based on the encoding apparatus shown in FIG. 21 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 13, except that the following steps are added: after the multi-resolution analysis of the frequency domain coefficients, it is determined whether to perform sum and difference stereo encoding according to whether the audio signal is a multi-channel signal of consistent signal type that satisfies the encoding conditions; subsequent processing is then performed.
  • the specific process has been introduced above, and will not be described here.
  • Figure 22 is a diagram showing the seventh embodiment of the decoding apparatus.
  • The decoding apparatus adds a sum and difference stereo decoding module 606 between the output of the inverse quantizer group 603 and the input of the multi-resolution synthesis module 609, based on the decoding apparatus shown in FIG. 16.
  • The sum and difference stereo decoding module 606 can also be located between the output of the entropy decoding module 602 and the input of the inverse quantizer group 603.
  • The sum and difference stereo decoding module 606 has been described in detail previously and will not be repeated here.
  • The decoding method based on the decoding device shown in FIG. 22 is basically the same as the decoding method based on the decoding device shown in FIG. 16, except that the following steps are added: after the inverse quantization spectrum is obtained, it is judged according to the signal type analysis result and the sum and difference stereo control information whether the inverse quantization spectrum needs sum and difference stereo decoding; subsequent processing is then performed. The specific process has been introduced above and will not be described here.
  • Figure 23 is a diagram showing an eighth embodiment of the encoding apparatus of the present invention.
  • This embodiment is based on the encoding apparatus shown in Figure 13, with a signal property analysis module 510 added, which performs signal type analysis on the signal output by the resampling module 50, outputs the resampled signal to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and outputs the signal type analysis result to the bit stream multiplexing module 55.
  • the signal property analysis module 510 performs front and back masking effect analysis based on the adaptive threshold and the waveform prediction to determine whether the signal type is a slow-changing signal or a fast-changing signal, and if it is a fast-changing type signal, continues to calculate related parameter information of the abrupt component. Such as the location of the mutation signal and the strength of the mutation signal.
  • the encoding method based on the encoding apparatus shown in Fig. 23 is basically the same as the encoding method based on the encoding apparatus shown in Fig. 13, except that the following step is added:
  • the type of the resampled signal is analyzed, and the analysis result is multiplexed into the bit stream.
  • the signal type is determined by forward and backward masking analysis based on an adaptive threshold and waveform prediction.
  • the specific steps are: divide the input audio data into frames; divide each frame into multiple subframes and search each subframe for the local maximum of the absolute value of the PCM data; select the subframe peak from the local maxima of each subframe; for a given subframe peak, use several (typically 3) preceding subframe peaks to predict a typical sample value over several (typically 4) subframes of forward delay relative to that subframe; calculate the difference and the ratio between the subframe peak and the predicted typical sample value; if both the difference and the ratio are greater than the set thresholds, a sudden signal is judged to be present in the subframe and the subframe is confirmed to contain a local peak whose backward masking can cover the pre-echo, in which case the frame signal belongs to the fast-changing type; if the difference and the ratio are not both greater than the set thresholds, the above steps are repeated for the next subframe until the frame is judged to be of the fast-changing type or the last subframe is reached; if the last subframe is reached without such a determination, the frame signal belongs to the slowly-changing type.
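The frame classification procedure above can be sketched as follows. The threshold values and the simple mean predictor are illustrative assumptions, not the patent's exact predictor:

```python
import numpy as np

def classify_frame(frame, n_subframes=8, n_predict=3,
                   diff_thresh=0.3, ratio_thresh=2.0):
    """Classify one PCM frame as 'fast' (transient) or 'slow'.

    Sketch of the subframe-peak test described in the text; thresholds
    and the mean-of-previous-peaks predictor are assumptions.
    """
    subframes = np.array_split(np.abs(np.asarray(frame, dtype=float)),
                               n_subframes)
    peaks = [s.max() for s in subframes]   # peak of |PCM| in each subframe
    for i in range(n_predict, n_subframes):
        # predict a typical sample value from the preceding subframe peaks
        predicted = float(np.mean(peaks[i - n_predict:i]))
        diff = peaks[i] - predicted
        ratio = peaks[i] / predicted if predicted > 0 else float('inf')
        if diff > diff_thresh and ratio > ratio_thresh:
            return 'fast'                  # sudden component found in this subframe
    return 'slow'                          # no subframe triggered the test
```

A steady-amplitude frame never exceeds the thresholds and is classified as slowly-changing, while a frame with a sharp attack triggers the test at the subframe containing the jump.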
  • Figures 24 to 28 are schematic views showing the ninth to thirteenth embodiments of the encoding apparatus of the present invention.
  • these embodiments are based on the coding apparatus shown in FIG. 17, FIG. 19, FIG. 21, FIG. 9 and FIG. 11, respectively, with a signal property analysis module 510 added to perform signal type analysis on the signal output by the resampling module 50, output the resampled signal to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and output the signal type analysis result to the bit stream multiplexing module 55.
  • the encoding methods based on the encoding apparatus shown in FIGS. 24 to 28 are basically the same as the encoding methods based on the encoding apparatus shown in FIGS. 17, 19, 21, 9, and 11, respectively, except that the following step is added: the type of the resampled signal is analyzed, and the analysis result is multiplexed into the bit stream.
  • Figure 29 is a diagram showing a fourteenth embodiment of the encoding apparatus of the present invention.
  • a gain control module 511 is added, which receives the audio signal output by the signal property analysis module 510, controls the dynamic range of fast-changing signals, and suppresses the pre-echo in audio processing. Its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51.
  • the gain control module 511 processes only fast-changing signals; slowly-changing signals are not processed and are output directly.
  • the gain control module 511 adjusts the time-domain energy envelope of the signal, increasing the gain of the signal before the fast-change point so that the time-domain signal amplitudes before and after the fast-change point become relatively close; the time-domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, and the gain adjustment amount is output to the bit stream multiplexing module 55.
  • the encoding method based on the encoding apparatus shown in Fig. 29 is basically the same as the encoding method based on the encoding apparatus shown in Fig. 23, except that the following step is added: gain control is performed on the signal after signal type analysis.
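The encoder-side gain adjustment described above can be sketched as follows. The scaling rule (peak matching) and the function name are assumptions; what matters is that the pre-transient segment is raised and the applied gain is returned so it can be transmitted as the gain adjustment amount:

```python
import numpy as np

def gain_control(frame, change_point):
    """Raise the segment before the fast-change point so the amplitudes
    before and after it become close; return the adjusted frame and the
    gain so the decoder can undo it. Illustrative sketch only."""
    frame = np.asarray(frame, dtype=float)
    pre_peak = np.max(np.abs(frame[:change_point]))
    post_peak = np.max(np.abs(frame[change_point:]))
    gain = post_peak / pre_peak if pre_peak > 0 else 1.0
    out = frame.copy()
    out[:change_point] *= gain             # flatten the time-domain energy envelope
    return out, gain
```

Flattening the envelope before time-frequency mapping is what limits the audible pre-echo: the quantization noise is shaped relative to a signal whose pre-transient part is no longer much quieter than its attack.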
  • FIG. 30 is a schematic structural diagram of Embodiment 8 of the decoding apparatus.
  • in this embodiment an inverse gain control module 610 is added between the output of the frequency-time mapping module 604 and the band expansion module 605.
  • it receives the signal type analysis result and the gain adjustment amount information output by the bit stream demultiplexing module 601, and adjusts the gain of the time-domain signal to control the pre-echo.
  • the inverse gain control module 610 processes only fast-changing signals; slowly-changing signals are output directly to the band extension module 605 without processing.
  • the inverse gain control module 610 adjusts the energy envelope of the reconstructed time-domain signal according to the gain adjustment amount information, reducing the amplitude of the signal before the fast-change point and restoring the envelope to its original low-before, high-after shape, so that the quantization noise before the fast-change point is reduced together with the signal amplitude, thereby controlling the pre-echo.
  • the decoding method based on the decoding apparatus shown in FIG. 30 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 16, except that the following step is added: before band expansion of the reconstructed time-domain signal, inverse gain control is performed on the reconstructed time-domain signal.
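The decoder-side operation is the exact inverse of the encoder's gain adjustment; a hypothetical sketch, where `gain` stands for the transmitted gain adjustment amount:

```python
import numpy as np

def inverse_gain_control(frame, change_point, gain):
    """Scale the pre-transient segment back down, restoring the original
    low-before/high-after envelope so that the quantization noise before
    the fast-change point shrinks with the signal. Illustrative sketch."""
    out = np.asarray(frame, dtype=float).copy()
    if gain > 0:
        out[:change_point] /= gain
    return out
```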
  • Figures 31 to 35 are views showing the fifteenth to nineteenth embodiments of the encoding apparatus of the present invention.
  • these five embodiments add, to the encoding devices shown in FIG. 24 to FIG. 28 respectively, a gain control module 511 for controlling the dynamic range of the audio signal after signal type analysis and suppressing the pre-echo in audio processing. Its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51.
  • the encoding methods based on these five encoding apparatus are basically the same as the encoding methods based on the encoding apparatus shown in Figs. 24 to 28, except that the following step is added: gain control is performed on the signal after signal type analysis.
  • Figures 36 to 40 are diagrams showing the configurations of Embodiments 9 to 13 of the decoding apparatus; these five decoding apparatus are based on the decoding apparatus shown in Figs. 18, 20, 22, 10 and 12, respectively.
  • an inverse gain control module 610 is added between the output of the frequency-time mapping module 604 and the input of the band expansion module 605; it receives the signal type analysis result output by the bit stream demultiplexing module 601, adjusts the gain of the time-domain signal, and controls the pre-echo.
  • the decoding methods based on the above five decoding devices are likewise basically the same as the decoding methods based on the decoding devices shown in Figs. 18, 20, 22, 10, and 12, respectively, except that the following step is added: inverse gain control is performed on the reconstructed time-domain signal before band expansion of the reconstructed time-domain signal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an improved audio encoding apparatus comprising a band extension module, a resampling module, a psychoacoustic analysis module, a time-frequency mapping module, a quantization module, an entropy coding module and a bit stream multiplexing module. The band extension module analyzes the original input audio signals over the full bandwidth, extracts the spectral envelope of the high-frequency part and the parameters characterizing the dependence between the lower and upper parts of the frequency spectrum, and outputs them to the bit stream multiplexing module; the resampling module resamples the input audio signals, changes the sampling rate, and outputs them to the psychoacoustic analysis module and the time-frequency mapping module. The invention enables high-fidelity compression coding of audio signals with all kinds of sampling rates and channel configurations, supports audio signals with sampling rates ranging from 8 kHz to 192 kHz as well as all available channel configurations, and supports audio encoding/decoding over a wide range of target bit rates.
PCT/CN2004/001034 2004-04-01 2004-09-09 Equipement de codage et de decodage audio ameliore, procede associe WO2005096508A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200410030947.3 2004-04-01
CN200410030947 2004-04-01

Publications (1)

Publication Number Publication Date
WO2005096508A1 true WO2005096508A1 (fr) 2005-10-13

Family

ID=35064123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2004/001034 WO2005096508A1 (fr) 2004-04-01 2004-09-09 Equipement de codage et de decodage audio ameliore, procede associe

Country Status (1)

Country Link
WO (1) WO2005096508A1 (fr)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1213935A (zh) * 1997-09-03 1999-04-14 松下电器产业株式会社 Hierarchical image encoding/decoding, digital broadcast signal recording, and image/audio decoding
EP1351218A2 (fr) * 2002-03-06 2003-10-08 Kabushiki Kaisha Toshiba Apparatus and method for reproducing an audio signal


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136202B (zh) * 2006-08-29 2011-05-11 华为技术有限公司 Audio signal processing system, method, and audio signal transceiver apparatus
CN101512639B (zh) * 2006-09-13 2012-03-14 艾利森电话股份有限公司 Method and apparatus for a speech/audio transmitter and receiver
WO2008031458A1 (fr) * 2006-09-13 2008-03-20 Telefonaktiebolaget Lm Ericsson (Publ) Procédés et dispositifs pour émetteur/récepteur de voix/audio
CN102150202A (zh) * 2008-07-14 2011-08-10 三星电子株式会社 Method and apparatus for encoding and decoding an audio/speech signal
US8532982B2 (en) 2008-07-14 2013-09-10 Samsung Electronics Co., Ltd. Method and apparatus to encode and decode an audio/speech signal
US9355646B2 (en) 2008-07-14 2016-05-31 Samsung Electronics Co., Ltd. Method and apparatus to encode and decode an audio/speech signal
US9728196B2 (en) 2008-07-14 2017-08-08 Samsung Electronics Co., Ltd. Method and apparatus to encode and decode an audio/speech signal
US11887611B2 (en) 2013-07-22 2024-01-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Noise filling in multichannel audio coding
US11594235B2 (en) 2013-07-22 2023-02-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Noise filling in multichannel audio coding
US20210343299A1 (en) * 2019-01-13 2021-11-04 Huawei Technologies Co., Ltd. High resolution audio coding
US11735193B2 (en) * 2019-01-13 2023-08-22 Huawei Technologies Co., Ltd. High resolution audio coding
CN110022463A (zh) * 2019-04-11 2019-07-16 重庆紫光华山智安科技有限公司 Intelligent encoding method and system for video regions of interest in dynamic scenes
CN110956970A (zh) * 2019-11-27 2020-04-03 广州市百果园信息技术有限公司 Audio resampling method, apparatus, device, and storage medium
CN110956970B (zh) * 2019-11-27 2023-11-14 广州市百果园信息技术有限公司 Audio resampling method, apparatus, device, and storage medium
CN116866972B (zh) * 2023-08-01 2024-01-30 江苏苏源杰瑞科技有限公司 Communication monitoring system and method based on dual-mode communication
CN116866972A (zh) * 2023-08-01 2023-10-10 江苏苏源杰瑞科技有限公司 Communication monitoring system and method based on dual-mode communication

Similar Documents

Publication Publication Date Title
WO2005096274A1 (fr) Dispositif et procede de codage/decodage audio ameliores
JP6518361B2 (ja) オーディオ/音声符号化方法およびオーディオ/音声符号化装置
JP5539203B2 (ja) 改良された音声及びオーディオ信号の変換符号化
JP4950210B2 (ja) オーディオ圧縮
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
CN101276587B (zh) 声音编码装置及其方法和声音解码装置及其方法
CA2608030C (fr) Train de bits audio a compression echelonnee ; codeur/decodeur utilisant un banc de filtre hierarchique et codage conjoint multicanal
EP2308045B1 (fr) Compression de facteurs d'échelle audio par transformation bidimensionnelle
US6253165B1 (en) System and method for modeling probability distribution functions of transform coefficients of encoded signal
EP1914724B1 (fr) Codage à double transformation de signaux audio
CN103329197B (zh) 用于反相声道的改进的立体声参数编码/解码
JP4081447B2 (ja) 時間離散オーディオ信号を符号化する装置と方法および符号化されたオーディオデータを復号化する装置と方法
US9037454B2 (en) Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
WO2006003891A1 (fr) Dispositif de decodage du signal sonore et dispositif de codage du signal sonore
EP1873753A1 (fr) Ameliorations apportees a un procede et un dispositif de codage/decodage audio
CN103366750B (zh) 一种声音编解码装置及其方法
US9230551B2 (en) Audio encoder or decoder apparatus
CN114365218A (zh) 空间音频参数编码和相关联的解码的确定
JP5629319B2 (ja) スペクトル係数コーディングの量子化パラメータを効率的に符号化する装置及び方法
IL305626B1 (en) Harmonic-inverse harmonic exchanger combination for high-frequency reproduction of audio signals
WO2005096508A1 (fr) Equipement de codage et de decodage audio ameliore, procede associe
EP2212883A1 (fr) Codeur
RU2409874C2 (ru) Сжатие звуковых сигналов
US20100280830A1 (en) Decoder
WO2006056100A1 (fr) Procede et dispositif de codage/decodage utilisant la redondance des signaux intra-canal

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC

122 Ep: pct application non-entry in european phase

Ref document number: 04762168

Country of ref document: EP

Kind code of ref document: A1