MXPA06004049A - Method for encoding a digital signal into a scalable bitstream; method for decoding a scalable bitstream - Google Patents

Method for encoding a digital signal into a scalable bitstream; method for decoding a scalable bitstream

Info

Publication number
MXPA06004049A
Authority
MX
Mexico
Prior art keywords
signal
bit
digital signal
bitstream
perceptual
Application number
MXPA/A/2006/004049A
Other languages
Spanish (es)
Inventor
Rongshan Yu
Xiao Lin
Susanto Rahardja
Original Assignee
Agency For Science Technology And Research
Xiao Lin
Susanto Rahardja
Rongshan Yu
Application filed by Agency For Science Technology And Research, Xiao Lin, Susanto Rahardja, Rongshan Yu
Publication of MXPA06004049A

Abstract

A method for encoding a digital signal into a scalable bitstream, comprising: quantizing the digital signal and encoding the quantized signal to form a core-layer bitstream; performing an error mapping based on the digital signal and the core-layer bitstream to remove information that has already been encoded into the core-layer bitstream, resulting in an error signal; bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information of the digital signal is determined using a perceptual model; and multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream. A method for decoding a scalable bitstream into a digital signal, comprising: de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream; decoding and de-quantizing the core-layer bitstream to generate a core-layer signal; bit-plane decoding the enhancement-layer bitstream based on perceptual information of the digital signal; and performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal, resulting in a reconstructed transformed signal, wherein the reconstructed transformed signal is the digital signal.

Description

METHOD FOR ENCODING A DIGITAL SIGNAL INTO A SCALABLE BITSTREAM; METHOD FOR DECODING A SCALABLE BITSTREAM

BACKGROUND OF THE INVENTION

Recently, with advances in computers, networking and communications, streaming audio content over networks such as the Internet, wireless local area networks, home networks and cellular phone systems is becoming one of the main means of audio service delivery. It is believed that with the progress of broadband network infrastructures such as xDSL, fiber optics and broadband wireless access, the bit rates of these channels are rapidly approaching those required to deliver lossless audio signals with high sampling rate and high amplitude resolution (for example, 96 kHz, 24 bits/sample). On the other hand, there are still application areas where highly compressed digital audio formats such as MPEG-4 AAC (described in [1]) are required. As a result, interoperable solutions that cover both current channels and the rapidly developing broadband channels are in high demand.

In addition, even when broadband channels are widely available and the bandwidth constraint is removed, a bit-rate-scalable coding system that is capable of producing a hierarchical bitstream whose bit rate can be changed dynamically during transmission is still highly favorable. For example, in applications where packet loss occurs occasionally because of accidents or resource-sharing requirements, current waveform representations such as PCM (Pulse Code Modulation) and lossless coding formats can suffer severe distortion in a streaming situation. This problem can be solved if packet priorities can be set when network resources change dynamically.
Finally, a bit-rate-scalable coding system also benefits servers that provide audio streaming services, because a graceful degradation of the QoS can be achieved if an excessive number of requests is received from client sites.

Many lossless audio coding algorithms have been proposed previously (see [2] - [8]). Most approaches rely on a prediction filter to remove redundancy from the original audio signals, while the residuals are entropy coded (as described in [5] - [12]). Because of the predictive filters, the bitstreams generated by these prediction-based approaches are difficult and inefficient (see [5], [6]), if not impossible, to scale in order to achieve bit-rate scalability. Other approaches, as described in [3], build the lossless audio coder through a two-layer approach in which the original audio signals are first encoded with a lossy encoder and the residuals are then encoded with a residual encoder. Although this two-layer design provides some measure of bit-rate scalability, its granularity is too coarse to be useful for audio streaming applications. Audio codecs that provide fine-grained bit-rate scalability were proposed previously in [4] and [18]. However, unlike the system discussed in this document, these codecs do not provide backward compatibility, because the lossy bitstreams produced by both codecs are incompatible with any existing audio codec.

Perceptual models are described in [21], [22] and [23].

The object of the invention is to provide a method for encoding a digital signal into a scalable bitstream in which backward compatibility is maintained.
SUMMARY OF THE INVENTION

A method for encoding a digital signal into a scalable bitstream is provided, the method comprising:
- quantizing the digital signal, and encoding the quantized signal to form a core-layer bitstream;
- performing an error mapping based on the digital signal and the core-layer bitstream to remove the information that has been encoded into the core-layer bitstream, resulting in an error signal;
- bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information is determined using a perceptual model; and
- multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream.

In addition, an encoder for encoding a digital signal into a scalable bitstream, a computer-readable medium, a computer program element, a method for decoding a scalable bitstream into a digital signal, a decoder for decoding a scalable bitstream into a digital signal, a further computer-readable medium and a further computer program element according to the method described above are provided.

In one embodiment, a lossless audio codec is presented that achieves fine-grained bit-rate scalability (FGBS) with the following characteristics:
- Backward compatibility: a highly compressed core-layer bitstream, such as an MPEG-4 AAC bitstream, is embedded in the lossless bitstream.
- Perceptually embedded lossless bitstream: the lossless bitstream can be truncated to any lossy rate without losing the perceptual optimality of the reconstructed audio.
- Low complexity: it adds only very limited computation on top of AAC (binary arithmetic codec), as well as very limited memory.
The broad functionality provided by the presented audio codec suggests its ability to serve as a "universal" audio format that meets the various rate/quality requirements of different streaming or storage applications. For example, an MPEG-4 AAC compatible bitstream, which is used as the core-layer bitstream, can easily be extracted from the bitstream generated by the codec for conventional MPEG-4 AAC audio services. On the other hand, lossless compression is also provided by the codec for audio editing or storage applications that require lossless reconstruction. In audio streaming applications where FGBS is required, the lossless bitstream of the codec can also be truncated to lower bit rates at the encoder, at the decoder or in the communication channel, to satisfy any rate/fidelity/complexity constraint that may arise in practical systems.

In one embodiment, a method is provided for encoding a digital signal to form a scalable bitstream, in which the scalable bitstream can be truncated at any point to produce a lower-quality (lossy) signal when decoded by a decoder. The method can be used to encode any type of digital signal, such as audio, image or video signals. The digital signal, which corresponds to a measured physical signal, can be generated by sampling at least one characteristic feature of a corresponding analog signal (for example, the chrominance and luminance values of a video signal, the amplitude of an analog sound signal, or the analog sensing signal of a sensor). For example, a microphone can be used to capture an analog audio signal, which is then converted into a digital audio signal by sampling and quantizing the captured analog audio signal. A video camera can be used to capture an analog video signal, which is then converted into a digital video signal using a suitable analog-to-digital converter.
Alternatively, a digital camera can be used to capture the image or video signal directly on an image sensor (CMOS or CCD) as a digital signal.

The digital signal is then quantized and encoded to form a core-layer bitstream. The core-layer bitstream forms the minimum bit rate/quality of the scalable bitstream. An enhancement-layer bitstream is used to provide additional bit rate/quality to the scalable bitstream. According to the invention, the enhancement-layer bitstream is formed by performing an error mapping based on the transformed signal and the core-layer bitstream to generate an error signal. The purpose of the error mapping is to remove the information that has already been encoded into the core-layer bitstream. The error signal is bit-plane coded to form the enhancement-layer bitstream.

The bit-plane coding of the error signal is performed based on perceptual information, that is, the perceived or perceptual importance of the digital signal. The perceptual information used in the present invention refers to information related to the human sensory system, for example the human visual system (i.e., the human eye) and the human auditory system (i.e., the human ear). Such perceptual information for the digital signal (video or audio) is obtained using a perceptual model, for example Psychoacoustic Model I or II in MPEG-1 audio (described in [22]) or the Spatio-Temporal Model used in video (described in [23]). The psychoacoustic model is based on the effect that the human ear is only able to perceive sounds within a certain band of frequencies, depending on various environmental conditions. Similarly, the HVM (Human Visual Model) is based on the effect that the human eye is more attentive to certain motion, colors and contrast.
The core-layer bitstream and the enhancement-layer bitstream are multiplexed to form the scalable bitstream. The scalable bitstream can be decoded to reconstruct the digital signal losslessly. As mentioned above, the core-layer bitstream is an embedded bitstream which forms the minimum bit rate/quality of the scalable bitstream, and the enhancement-layer bitstream forms the lossy-to-lossless part of the scalable bitstream. Because the enhancement-layer bitstream is perceptually bit-plane coded, it can be truncated in such a way that the data in the enhancement-layer bitstream that are perceptually less important are truncated first, which provides perceptual scalability of the scalable bitstream. In other words, the scalable bitstream can be scaled by truncating the enhancement-layer bitstream, so that the enhancement-layer bitstream, and hence the scalable bitstream, remains perceptually optimized even when truncated to a lower bit rate/quality.

The method according to the invention can be used as a lossless encoder for a digital signal, such as an image, video or audio signal, in broadband or high-fidelity systems. When the bandwidth requirement changes, the bit rate of the bitstream generated by the encoder can be changed accordingly to accommodate the change. Such a method can be implemented in many applications and systems, such as MPEG audio and JPEG 2000 image and video compression.

In accordance with one embodiment of the invention, the digital signal is transformed into a suitable domain before being quantized to form the quantized signal.
The digital signal can be transformed within the same domain, or from one domain to another, in order to represent the digital signal better and thereby allow easy and efficient quantization and coding of the digital signal to form the core-layer bitstream. Such domains may include, but are not limited to, the time domain, the frequency domain, and a hybrid of the time and frequency domains. The transformation of the digital signal may even be performed by the identity matrix, I.

In one embodiment, the digital signal is transformed into a transformed signal using an integer Modified Discrete Cosine Transform (intMDCT). The intMDCT is a reversible approximation of the Modified Discrete Cosine Transform (MDCT) filter bank normally used in an MPEG-4 AAC encoder. Other transforms may also be used to transform the digital signal into a suitable domain, including, but not limited to, the Discrete Cosine Transform, the Discrete Sine Transform, the Fast Fourier Transform and the Discrete Wavelet Transform.

When the intMDCT is used to transform the digital signal into the transformed signal, the transformed signal (specifically the intMDCT coefficients which describe the transformed signal) is preferably normalized or scaled to approximate the output of an MDCT filter bank. The normalization of the intMDCT-transformed signal can be useful when the quantizer used to quantize the transformed signal, for example an AAC quantizer, has an MDCT filter bank whose overall gain differs from the gain of the intMDCT filter bank. Such a normalization process approximates the intMDCT-transformed signal to the output of the MDCT filter bank, so that it is suitable for direct quantization and coding by the quantizer to form the core-layer bitstream.
To encode a digital audio signal, the digital/transformed signal is preferably quantized and encoded according to the MPEG AAC specification to generate the core-layer bitstream. This is because AAC is one of the most efficient perceptual audio coding algorithms for generating a low-bit-rate but high-quality audio bitstream. Therefore, the core-layer bitstream generated using AAC (referred to as the AAC bitstream) has a low bit rate, and even when the scalable bitstream is truncated down to the core-layer bitstream, the perceptual quality of the truncated bitstream is still high. Other quantization and coding algorithms/methods, for example MPEG-1 Audio Layer 3 (MP3) or other proprietary coding/quantization methods, can also be used to generate the core-layer bitstream.

The error mapping, which removes the information that has already been encoded into the core-layer bitstream and which generates a residual signal (or error signal), is performed by subtracting from the transformed signal the lower quantization threshold (the one closer to zero) of each quantized value of the quantized signal. Such an error mapping procedure based on the quantization threshold has the advantage that the values of the residual signal are always positive, and that the amplitude of the residual signal is independent of the quantization threshold. This allows an embedded coding scheme of low complexity and high efficiency to be implemented. However, it is also possible to subtract a reconstructed transformed signal from the transformed signal to generate the residual signal.

To determine the perceptual information of the digital signal for bit-plane coding of the error signal, a psychoacoustic model can be used as the perceptual model.
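The threshold-based error mapping described above can be illustrated with a minimal sketch. It assumes a hypothetical uniform integer quantizer with step size `step` in place of the AAC quantizer, and all function names are illustrative only. In this sketch the residual carries the sign of the coefficient, and its magnitude always stays below the quantization step.

```python
# Minimal sketch of threshold-based error mapping.
# Assumption: a uniform quantizer with integer step size stands in for the
# core-layer (e.g. AAC) quantizer; names are hypothetical.

def quantize(c, step):
    """Quantize integer coefficient c by truncating its magnitude toward zero."""
    sign = -1 if c < 0 else 1
    return sign * (abs(c) // step)

def lower_threshold(i, step):
    """Quantization threshold nearer to zero for quantized index i."""
    return i * step

def error_map(coeffs, step):
    """Return (quantized indices, residual) for a list of integer coefficients."""
    indices = [quantize(c, step) for c in coeffs]
    residual = [c - lower_threshold(i, step) for c, i in zip(coeffs, indices)]
    return indices, residual

def inverse_error_map(indices, residual, step):
    """Decoder side: add the lower threshold back to the decoded residual."""
    return [lower_threshold(i, step) + e for i, e in zip(indices, residual)]

coeffs = [37, -37, 5, 0, -128]
indices, residual = error_map(coeffs, step=16)
restored = inverse_error_map(indices, residual, step=16)
```

Because the residual is bounded by the step size regardless of how large the coefficient is, the enhancement layer only has to code the part of each coefficient that the core layer left out.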
The psychoacoustic model may be based on Psychoacoustic Model I or II used in MPEG-1 audio (as described in [21]), or on the psychoacoustic model in MPEG-4 audio (as described in [19]). When a perceptual quantizer, such as the one used in AAC, is used to quantize and encode the digital/transformed signal, the perceptual model used in the perceptual quantizer can also be used to determine the perceptual information for bit-plane coding of the error signal. In other words, a separate perceptual model is not needed in this case to provide the perceptual information for bit-plane coding of the error signal.

The perceptual information for bit-plane coding of the error signal is preferably also multiplexed with the core-layer and enhancement-layer bitstreams, as side information, to form the scalable bitstream. The side information can be used by a decoder to reconstruct the error signal.

The error signal is arranged in a plurality of bit planes, where each bit plane has a plurality of bit-plane symbols. In one embodiment of the invention, the arrangement or order of the bit planes of the error signal is changed or shifted, and the bit planes are subsequently scanned and coded in a consecutive sequential manner. The bit planes are shifted in such a way that, when bit-plane coding is performed on the shifted bit planes, the bit planes comprising the perceptually more important bit-plane symbols are scanned and coded first. In this embodiment, all the bit-plane symbols in a bit plane are coded before the bit-plane symbols of a subsequent adjacent bit plane are coded.

In another embodiment of the invention, the bit-plane symbols of the bit planes are scanned and coded in a sequence based on the perceptual information. In other words, not all bit-plane symbols in one bit plane are coded before the bit-plane symbols of another bit plane are coded.
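As a rough illustration of how an error signal is arranged in bit planes, the following sketch splits non-negative residual magnitudes into planes from the most significant to the least significant bit, and rebuilds them from however many planes were received. It is only a sketch under stated assumptions: entropy coding of the bit-plane symbols, sign handling and the perceptual shifting are omitted, and the function names are hypothetical.

```python
# Minimal sketch of bit-plane decomposition of residual magnitudes.
# Assumption: magnitudes are non-negative integers; entropy coding is omitted.

def to_bit_planes(magnitudes, top_plane):
    """Return planes ordered from bit `top_plane` (MSB) down to bit 0 (LSB)."""
    return [[(m >> b) & 1 for m in magnitudes]
            for b in range(top_plane, -1, -1)]

def from_bit_planes(planes, top_plane):
    """Rebuild magnitudes from the first len(planes) planes (possibly truncated)."""
    if not planes:
        return []
    values = [0] * len(planes[0])
    for offset, plane in enumerate(planes):
        b = top_plane - offset
        for k, bit in enumerate(plane):
            values[k] |= bit << b
    return values

magnitudes = [5, 3, 0, 6]
planes = to_bit_planes(magnitudes, top_plane=2)
full = from_bit_planes(planes, top_plane=2)         # all planes: lossless
coarse = from_bit_planes(planes[:2], top_plane=2)   # last plane truncated: lossy
```

Dropping trailing planes yields a coarser approximation of every value, which is what makes a bit-plane coded stream truncatable.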
The sequence in which the bit-plane symbols from the plurality of bit planes are scanned and coded is determined based on the perceptual information, in such a way that the bit-plane symbols that are perceptually more important are coded first.

The perceptual information of the digital signal determined by the perceptual model can include the first (or maximum) bit plane M(s) (that is, a number (index) which specifies the first bit plane) of the plurality of bit planes for bit-plane coding of the error signal, and/or the Just Noticeable Distortion (JND) level of the digital signal. It should be noted that the perceptual information relates to the digital signal for each different domain characteristic (e.g., frequency, time, signal amplitude, etc.) or for a range of domain characteristics. For example, when the digital signal is transformed into the frequency domain, the values of the perceptual information of the digital signal at each frequency or in each frequency band (frequency band s or, more generally, domain band s) may be different, indicating that the signal may be perceptually more important at certain frequencies.

In one embodiment of the invention, the perceptual significance Ps(s) of the digital signal, corresponding to each frequency band s, is determined as the perceptual information. In this embodiment, the JND level τ(s) of the digital signal, expressed as the corresponding bit plane of the error signal, is determined. The bit plane corresponding to the JND level τ(s) is subtracted from the index M(s) of the first bit plane of the plurality of bit planes for bit-plane coding of the error signal, resulting in the perceptual significance Ps(s). The perceptual significance Ps(s) can be used to control the shifting of the bit planes, so that the bit planes comprising the perceptually more important bit-plane symbols are scanned and coded first.
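The relation between the first bit plane, the JND level and the perceptual significance can be sketched as follows. This is an illustration under assumptions that are not taken from the text: the JND level is given as an amplitude, its bit plane is taken as the floor of its base-2 logarithm, and each (band, bit plane) pair is ranked by how far the plane lies above the JND plane of its band. All names are hypothetical.

```python
import math

# Minimal sketch: perceptual significance and a perceptually ordered scan.
# Assumptions: JND given as an amplitude >= 1; JND bit plane = floor(log2(JND)).

def jnd_bit_plane(jnd_amplitude):
    """Bit plane assumed to correspond to the JND level of a band."""
    return math.floor(math.log2(jnd_amplitude))

def perceptual_significance(first_plane, jnd_amplitude):
    """Perceptual significance: first bit plane M(s) minus the JND bit plane."""
    return first_plane - jnd_bit_plane(jnd_amplitude)

def scan_order(first_planes, jnd_amplitudes):
    """List (band, plane) pairs, planes furthest above their JND plane first."""
    entries = []
    for s, (m, t) in enumerate(zip(first_planes, jnd_amplitudes)):
        tau = jnd_bit_plane(t)
        for b in range(m, -1, -1):
            entries.append((b - tau, s, b))
    # Highest level above JND first; ties broken by band index.
    entries.sort(key=lambda e: (-e[0], e[1]))
    return [(s, b) for _, s, b in entries]

first_planes = [3, 2]      # M(s) for two bands
jnd = [4.0, 1.0]           # hypothetical JND amplitudes per band
ps = [perceptual_significance(m, t) for m, t in zip(first_planes, jnd)]
order = scan_order(first_planes, jnd)
```

In this example the second band starts lower (plane 2) but has a lower JND, so its top plane is scanned before the top plane of the first band.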
More advantageously, the perceptual significance Ps(s) can be used to control the sequence in which the bit-plane symbols from the plurality of bit planes are scanned and coded, such that the bit-plane symbols that are perceptually more important are coded first.

In a further embodiment of the invention, the perceptual significance Ps(s) is normalized to form a normalized perceptual significance Ps'(s). In this embodiment, a common perceptual significance Ps_common of the digital signal is defined based on a function of the perceptual significance Ps(s). Examples of such a function of the perceptual significance Ps(s) include the average value, the maximum value, the minimum value or a normalized value of the perceptual significance Ps(s). The common perceptual significance Ps_common is subtracted from the perceptual significance Ps(s) of each frequency band s, resulting in the normalized perceptual significance Ps'(s). When the frequency band s contains at least one non-zero quantized value, the frequency band s is a significant band. Otherwise, the frequency band s is a non-significant band. For a significant band, the value of the corresponding perceptual significance Ps(s) is set to the value of the common perceptual significance Ps_common. For a non-significant band, the normalized perceptual significance Ps'(s) is multiplexed with the core-layer bitstream and the enhancement-layer bitstream to generate the scalable bitstream for transmission. This normalized perceptual significance Ps'(s) is transmitted in the scalable bitstream as side information for decoding the scalable bitstream in a decoder.

Normalizing the perceptual significance Ps(s) by defining a common perceptual significance Ps_common has the advantage of reducing the amount of perceptual information to be transmitted in the scalable bitstream, by using information obtained when quantizing the digital/transformed signal to generate the core-layer bitstream.
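The normalization described above can be sketched in a few lines. The sketch assumes, for illustration only, that Ps_common is the integer (floor) mean of Ps(s) over all bands and that Ps_common itself is known to the decoder; the text also allows the maximum, minimum or another function. The function names are hypothetical.

```python
# Minimal sketch of normalizing perceptual significance.
# Assumptions: Ps_common is the floor mean of Ps(s); the decoder knows
# Ps_common and which bands are significant (from the core layer).

def normalize_significance(ps, significant):
    """Return (ps_common, side_info) where side_info maps each
    non-significant band s to Ps'(s) = Ps(s) - Ps_common."""
    ps_common = sum(ps) // len(ps)
    side_info = {s: p - ps_common
                 for s, (p, sig) in enumerate(zip(ps, significant))
                 if not sig}
    return ps_common, side_info

def restore_significance(ps_common, side_info, significant):
    """Decoder side: significant bands use Ps_common, others add Ps'(s) back."""
    return [ps_common if sig else ps_common + side_info[s]
            for s, sig in enumerate(significant)]

ps = [3, 5, 4, 6]
significant = [True, False, True, False]
ps_common, side_info = normalize_significance(ps, significant)
decoded_ps = restore_significance(ps_common, side_info, significant)
```

Only the two non-significant bands contribute side information here; the significant bands fall back to the common value, which is why their individual values are not reproduced exactly.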
Therefore, the perceptual information, in particular the normalized perceptual significance Ps'(s), only needs to be transmitted to the decoder side for non-significant bands, since the perceptual information for significant bands can easily be regenerated by the decoder.

The index M(s) of the first (or maximum) bit plane of the plurality of bit planes for bit-plane coding of the error signal, which forms part of the perceptual information of the digital signal, can be determined from the maximum quantization interval used to quantize the digital/transformed signal. For a significant band, the maximum quantization interval (the difference between the higher and the lower quantization threshold corresponding to each quantized value of the quantized signal) is determined, and said first bit plane (specified by M(s)) is determined accordingly. Such a maximum quantization interval can also be determined on the decoder side and, therefore, said first bit plane (specified by M(s)) need not be transmitted as part of the scalable bitstream in this case (for a significant band).

Although the encoding of a digital signal into a scalable bitstream is described, it should be understood that the invention also includes the decoding of the scalable bitstream into a decoded digital signal by the reverse of the method described above.

In one embodiment of the invention, a method is provided for decoding the scalable bitstream into the digital signal, which includes de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream, decoding and de-quantizing the core-layer bitstream to generate a core-layer signal, bit-plane decoding the enhancement-layer bitstream based on the perceptual information of the digital signal, and performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal to generate a reconstructed transformed signal, in which the reconstructed transformed signal is the digital signal. It should be noted that the method for decoding the scalable bitstream can be used in combination with, or separately from, the method for encoding a digital signal into the scalable bitstream described above.

The reconstructed transformed signal can be transformed to generate the digital signal if the digital signal is in a domain different from that of the reconstructed transformed signal. The exact implementation of the decoding of the scalable bitstream to generate the digital signal depends on how the scalable bitstream was encoded by the encoder. In one example, the reconstructed transformed signal can be transformed using the intMDCT to generate the digital signal. The core-layer bitstream can be decoded and de-quantized according to the MPEG AAC specification. The error mapping is performed by adding the lower quantization threshold used for de-quantizing the transformed signal to the bit-plane decoded enhancement-layer bitstream to generate the reconstructed transformed signal. The advantages and other implementations of the decoder are similar to those of the encoder, which have already been described above.

The perceptual information of the digital signal can be obtained by de-multiplexing the scalable bitstream, if the perceptual information has been multiplexed into the scalable bitstream as side information.
Alternatively, if the core-layer bitstream is perceptually coded, the perceptual information obtained by decoding and de-quantizing the core-layer bitstream can be used for the bit-plane decoding of the enhancement-layer bitstream.

In one embodiment of the invention, the enhancement-layer bitstream is bit-plane decoded in a consecutive sequence to generate a plurality of bit planes comprising a plurality of bit-plane symbols, and the bit planes are shifted based on the perceptual information of the digital signal to generate the bit-plane decoded enhancement-layer bitstream.

In another embodiment of the invention, the enhancement-layer bitstream is bit-plane decoded in a sequence based on the perceptual information of the digital signal to generate a plurality of bit planes comprising a plurality of bit-plane symbols, thereby generating the bit-plane decoded enhancement-layer bitstream.

The perceptual information of the digital signal can be at least one of the following:
- the bit plane M(s) which corresponds to the enhancement-layer bitstream when the bit-plane decoding of the enhancement-layer bitstream starts; and
- the Just Noticeable Distortion (JND) level of the digital signal,
in which s corresponds to a frequency band of the digital signal.

The bit plane M(s) which corresponds to the enhancement-layer bitstream when the bit-plane decoding of the enhancement-layer bitstream starts is determined from the maximum quantization interval used for de-quantizing the core-layer bitstream.
The second aspect of the invention relates not only to a method for decoding a scalable bitstream into a digital signal, but also includes a computer program, a computer-readable medium and a device for implementing said method.

DETAILED DESCRIPTION OF THE INVENTION

Various embodiments and implementations of the invention are described in detail below with reference to the figures, in which:
Figure 1 shows an encoder according to an embodiment of the invention.
Figure 2 shows a decoder according to an embodiment of the invention.
Figure 3 illustrates a structure of a bit-plane coding process.
Figure 4 shows an encoder according to an embodiment of the invention.
Figure 5 shows a decoder according to an embodiment of the invention.
Figure 6 shows an encoder according to an embodiment of the invention.
Figure 7 shows a decoder according to an embodiment of the invention.

Figure 1 shows an encoder 100 according to one embodiment of the invention. The encoder 100 serves to generate a scalable bitstream, and comprises two distinct layers, namely a core layer, which generates the core-layer bitstream, and a Lossless Enhancement (LLE) layer, which generates the enhancement-layer bitstream.

The encoder comprises a domain transformer 101, a quantizer 102, an error mapping unit 103, a perceptual bit-plane encoder 104 and a multiplexer 105.

In the encoder 100, the digital signal is first transformed by the domain transformer 101 into a suitable domain, such as the frequency domain, resulting in a transformed signal. The coefficients of the transformed signal are quantized by the quantizer 102 and encoded to generate the core-layer bitstream. The error mapping is performed by the error mapping unit 103, which corresponds to the LLE layer, to remove from the coefficients of the transformed signal the information that has been used or encoded in the core layer to form the core-layer bitstream.
The resulting residual or error signal, specifically the error coefficients, is bit-plane coded by the bit-plane encoder 104 to generate the embedded LLE bitstream. This embedded bitstream can be further truncated to lower bit rates at the encoder 100, at a corresponding decoder (such as the decoder 200 shown in Figure 2 and described below), or in the communication channel, to meet the rate/fidelity requirements. A perceptual model 106 is used to control the bit-plane coding of the error coefficients, so that the error coefficients that are perceptually more important are coded first.

Finally, the LLE-layer bitstream is multiplexed with the core-layer bitstream by the multiplexer 105 to generate the scalable bitstream. Additionally, the perceptual information for controlling the bit-plane coding of the error coefficients can also be transmitted as side information, so that a corresponding bit-plane decoder is able to reconstruct the error coefficients in the correct order.

When the LLE bitstream is truncated to a lower rate, the decoded signal is a lossy version of the original input signal.

Figure 2 shows a decoder 200 according to one embodiment of the invention. The decoder 200 decodes a scalable bitstream generated by the encoder 100 to reconstruct the digital signal that was encoded by the encoder 100.

The decoder 200 comprises a domain transformer 201, a de-quantizer 202, an error mapping unit 203, a perceptual bit-plane decoder 204 and a de-multiplexer 205.

The de-multiplexer 205 receives the scalable bitstream as input and splits the scalable bitstream into the core-layer bitstream and the enhancement-layer bitstream as generated by the encoder 100. The core-layer bitstream is decoded and de-quantized by the de-quantizer 202 to form the core-layer signal.
The enhancement-layer bitstream is perceptually bit-plane decoded by the perceptual bit-plane decoder 204, based on the perceptual information provided by a perceptual model 206. An error mapping is subsequently performed by the error mapping unit 203 with the core-layer signal to generate an enhancement-layer signal. The enhancement-layer signal is finally transformed back to the domain of the digital signal by the domain transformer 201, resulting in an enhancement-layer transformed signal, which is the reconstructed digital signal.

The processing performed by the encoder 100 and the decoder 200 is explained in detail below.

The input signal is normally transformed to the frequency domain by the domain transformer 101 before being quantized by the quantizer 102 (which is part of the core-layer encoder) to generate the core-layer bitstream. Various transforms can be used to transform the input signal to the frequency domain, such as the Discrete Cosine Transform (DCT), the Modified Discrete Cosine Transform (MDCT), the integer MDCT (IntMDCT) or the Fast Fourier Transform (FFT).

When an MPEG-4 AAC encoder is used as the core-layer encoder (for audio signals), the MDCT is normally used to transform the input audio signal to the frequency domain, as described in [1]. In [13], the integer MDCT (IntMDCT) is proposed as a reversible approximation to the MDCT filter bank used with the MPEG-4 AAC encoder. A commonly used way of implementing the IntMDCT is to factorize the MDCT filter bank into a cascade of Givens rotations of the form

$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix},$$

which is further factorized into three lifting steps:

$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \sin\alpha & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix}.$$

Each lifting step can be approximated by a reversible integer-to-integer mapping using a rounding-to-the-nearest-integer operation $r: \mathbb{R} \to \mathbb{Z}$.
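The reversibility of a rotation built from rounded lifting steps can be checked with a minimal sketch. It is illustrative only: the function names are hypothetical, Python's built-in `round` stands in for the rounding operation r, the lifting coefficients are the standard ones for a Givens rotation, (cos α − 1)/sin α and sin α, and α is assumed not to be a multiple of π.

```python
import math


def lifting_rotate(x1, x2, alpha):
    """Integer approximation of a Givens rotation using three rounded lifting steps."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)  # coefficient of steps 1 and 3
    s = math.sin(alpha)                            # coefficient of step 2
    x1 = x1 + round(a * x2)  # first lifting step
    x2 = x2 + round(s * x1)  # second lifting step
    x1 = x1 + round(a * x2)  # third lifting step
    return x1, x2


def inverse_lifting_rotate(y1, y2, alpha):
    """Undo lifting_rotate exactly by subtracting the same rounded terms in reverse order."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    y1 = y1 - round(a * y2)
    y2 = y2 - round(s * y1)
    y1 = y1 - round(a * y2)
    return y1, y2


if __name__ == "__main__":
    y = lifting_rotate(3, 4, math.pi / 5)
    print(y, inverse_lifting_rotate(*y, math.pi / 5))
```

Each step changes only one component by a rounded function of the other, so the inverse can recompute the identical rounded value and subtract it. This is why the integer input is recovered exactly even though the forward result only approximates the true rotation.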
For example, the last lifting step is approximated by

$$(x_1, x_2) \to \left( x_1 + r\!\left( \frac{\cos\alpha - 1}{\sin\alpha}\, x_2 \right),\; x_2 \right),$$

which can be losslessly reversed by

$$(x_1', x_2') \to \left( x_1' - r\!\left( \frac{\cos\alpha - 1}{\sin\alpha}\, x_2' \right),\; x_2' \right).$$

In this way, the IntMDCT is obtained by implementing all Givens rotations with the reversible integer mapping described above.

In the decoder, the IntMDCT can be used again by the domain transformer 201 to transform the enhancement-layer signal to the (reconstructed) digital signal.

In the core layer, the coefficients c(k) of the transformed signal, where k = 1, ..., 1024 and 1024 is the frame length of the core-layer bitstream, are quantized by the quantizer 102 and coded into the core-layer bitstream. In the context of an input audio signal, the coefficients of the transformed signal can be quantized according to the quantization values of an MPEG-4 encoder, an MPEG-1 Layer 3 Audio (MP3) encoder or any proprietary audio encoder.

When an MPEG-4 AAC encoder is used together with the IntMDCT, the coefficients of the transformed signal (also known as the IntMDCT coefficients), c(k), are first normalized as

$$c'(k) = \alpha \cdot c(k)$$

so that the normalized outputs approximate the outputs of the MDCT filter bank. The normalized IntMDCT coefficients c'(k) are then quantized and coded, for example, according to an AAC quantizer (see [19]), which is given as follows:

$$i(k) = \mathrm{sgn}[c'(k)] \left\lfloor \left( \frac{|c'(k)|}{\sqrt[4]{2^{scale\_factor(s)}}} \right)^{3/4} + 0.4054 \right\rfloor$$

Here $\lfloor \cdot \rfloor$ denotes the floor operation, which truncates a floating-point operand to an integer, i(k) are the AAC quantized coefficients, and scale_factor(s) is the scale factor of the scale factor band s to which the coefficient c(k) belongs. The scale factors can be adjusted adaptively by a noise shaping procedure so that the quantization noise is best masked by the masking threshold of the human auditory system. A widely adopted approach for this noise shaping procedure is the nested quantization and coding loop, as described in detail in [1].
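As a rough illustration of the core-layer quantization step, the sketch below applies the commonly cited AAC power-law quantization rule to a single normalized coefficient. The function name and the example values are hypothetical, and the rule itself is taken as an assumption here: sign times the floor of (|c'| / 2^(scale_factor/4))^(3/4) + 0.4054.

```python
import math


def aac_quantize(c_norm, scale_factor):
    """Quantize one normalized coefficient c'(k) with an AAC-style power-law rule.

    Assumed rule: i = sgn(c') * floor((|c'| / 2**(scale_factor / 4)) ** 0.75 + 0.4054)
    """
    step = 2.0 ** (scale_factor / 4.0)          # 2^(scale_factor/4)
    magnitude = math.floor((abs(c_norm) / step) ** 0.75 + 0.4054)
    return magnitude if c_norm >= 0 else -magnitude


if __name__ == "__main__":
    for c, sf in [(16.0, 0), (-16.0, 0), (16.0, 4), (0.1, 0)]:
        print(c, sf, aac_quantize(c, sf))
```

Raising the scale factor by 4 doubles the effective step size, which is why the quantized index shrinks for the same input; a noise shaping loop would adjust the scale factor per band along these lines.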
The quantized coefficients i(k) are noiselessly coded (in this example, by the quantizer 102), for example, using Huffman coding or Bit-Sliced Arithmetic Coding (BSAC) as described in [17]. BSAC is preferred if bit-rate scalability is additionally required in the core-layer bitstream. The scale factors are differentially coded, for example, by the DPCM coding process described in [1], or using Huffman coding. The core-layer bitstream can then be generated by multiplexing all the coded information according to the AAC bitstream syntax.

It should be noted that although a mechanism for embedding an MPEG-4 AAC compatible bitstream is described, it is also possible to use bitstreams that are compatible with other codecs, such as MPEG-1/2 Layer I, II, III (MP3), Dolby AC3, or Sony's proprietary ATRAC coders as described in [20].

When the quantizer 102 operates in accordance with the MPEG AAC encoder, the dequantizer 202 preferably operates in accordance with an MPEG AAC decoder to decode and dequantize the core-layer bitstream in the decoder 200. Specifically, the dequantizer 202 is used to generate the core-layer signal, which is subsequently used by the error mapping unit 203 in the decoder 200 to perform the error mapping that generates the enhancement-layer signal, as described below. However, it should be noted that dequantizers according to other specifications, such as MP3 or other proprietary decoders, can also be used in the decoder 200.

In the LLE layer, an error mapping procedure is used to remove the information that has already been coded in the core-layer bitstream. One possible approach to constructing such an error mapping procedure is to subtract the lowest quantization threshold (closest to zero) of each quantized coefficient from the corresponding coefficient of the transformed input signal.
This can be illustrated as:

$$e(k) = c(k) - thr(k),$$

where thr(k) is the lowest quantization threshold (closest to zero) for c(k), and e(k) is the error coefficient that represents the error signal.

When the MPEG-4 AAC encoder is used as the quantizer, thr(k) follows from inverting the AAC quantization rule:

$$thr(k) = \mathrm{sgn}[c(k)] \left\lfloor \sqrt[4]{2^{scale\_factor(s)}} \, \bigl[ |i(k)| - 0.4054 \bigr]^{4/3} \right\rfloor \text{ for } i(k) \neq 0, \qquad thr(k) = 0 \text{ for } i(k) = 0.$$

In practical applications, to ensure a robust reconstruction, the mapping from the integer i(k) to the integer thr(k) can be performed by means of a lookup table. As can be clearly seen from the above formula, a total of 4 tables is required for the different values of scale_factor (since the same table can be shared, by bit-shifting, among values of scale_factor that have the same modulus 4), in which each table contains the mapping between all possible values of i(k) and the corresponding thr(k) for any scale_factor from the set of those with the same modulus 4.

It is also possible to perform the error mapping procedure by subtracting a reconstructed coefficient of the transformed input signal from the coefficient of the transformed signal, as described in [3], which can be illustrated as:

$$e(k) = c(k) - \hat{c}(k),$$

where $\hat{c}(k)$ is the coefficient of the reconstructed transformed signal.

In general, it is also possible to perform the error mapping procedure using:

$$e(k) = c(k) - f(k),$$

where f(k) is any function corresponding to c(k), such as: [equation not recoverable from the source]

Clearly, for c(k) that is already significant in the core layer (thr(k) ≠ 0), the sign of the IntMDCT residual e(k) can be determined from the core-layer reconstruction, and therefore only its amplitude needs to be coded in the LLE layer. Additionally, it is well known that for most audio signals c(k) can be approximated by Laplacian random variables with the probability density function (pdf):

$$f(c(k)) = \frac{1}{\sqrt{2\sigma^2}} \, e^{-|c(k)| \sqrt{2/\sigma^2}},$$

where $\sigma^2$ is the variance of c(k).
From the "memoryless" property of a Laplacian pdf, it is easy to verify that the amplitude of e(k) is geometrically distributed as:

$$f(|e(k)|) = \beta \cdot \theta(k)^{|e(k)|},$$

where $\beta$ is a normalization constant and the distribution parameter $\theta(k)$ is determined by the variance of c(k) and the quantizer step size of the core layer. This property allows a very efficient bit-plane coding scheme, such as the bit-plane Golomb code (BPGC), to be applied for coding the error signal.

In the decoder 200, the coefficients of the transformed signal can be reconstructed by the error mapping procedure performed by the error mapping unit 203 according to the following equation:

$$c(k) = e'(k) + thr(k),$$

where e'(k) are the decoded error coefficients that describe the bit-plane decoded enhancement-layer bitstream and correspond to the error coefficients e(k) in the encoder 100. It can therefore be seen that the transformed coefficients c(k) can be regenerated in the decoder from the decoded error coefficients e'(k) (possibly a lossy version if the LLE bitstream is truncated to lower rates) and the quantization threshold thr(k), which is generated in the same way as in the encoder from the quantization index i(k) contained in the embedded core-layer (AAC) bitstream.

Similarly to the encoder 100, the coefficients of the transformed signal c(k) in the decoder 200 can also be generated by summing the decoded error coefficients e'(k) and the reconstructed coefficients of the core-layer bitstream. In addition, the coefficients of the transformed signal c(k) can be generated by summing the decoded error coefficients e'(k) and a function of c(k).

To produce the scalable-to-lossless portion of the final embedded lossless bitstream, the residual or error signal is further coded by the perceptual bit-plane coder 104 in the LLE layer using bit-plane coding, an embedded coding technology that has been widely adopted in audio coding [3] and in image coding [5].

A description of the general bit-plane coding procedure can be found in [4] and [15].
Consider an input n-dimensional data vector $x^n = \{x_1, \ldots, x_n\}$, where $x_i$ is drawn from some random source over an alphabet $A \subset \mathbb{R}$. Clearly, $x_i$ can be represented in binary format as

$$x_i = (2 s_i - 1) \cdot \sum_{j} b_{i,j} \cdot 2^{j}, \quad i = 1, \ldots, n,$$

by cascading binary bit-plane symbols that comprise a sign symbol

$$s_i = \begin{cases} 1, & x_i \ge 0 \\ 0, & x_i < 0 \end{cases}$$

and amplitude symbols $b_{i,j} \in \{0, 1\}$. In practice, bit-plane coding can start from the maximum bit-plane M of the vector $x^n$, where M is an integer that satisfies

$$2^{M-1} \le \max\{|x_i|\} < 2^{M}, \quad i = 1, \ldots, n,$$

and stop at bit-plane 0 if $x^n$ is an integer vector.

The bit-plane coding and decoding process according to one embodiment of the invention, as performed for example by the perceptual bit-plane coder 104 and the perceptual bit-plane decoder 204, is explained below with reference to Figure 3.

Figure 3 illustrates a structure of the above bit-plane coding (BPC) process, in which each input vector is first decomposed into binary sign and amplitude symbols, which are then scanned, in a desired order, by a bit-plane scanning unit 301 and coded by an entropy coder 302 (for example, an arithmetic code, a Huffman code or a run-length code). Additionally, a statistical model 303, for example based on the Laplacian distribution of the input signal, is normally used to determine the probability assignment for each binary symbol to be coded. In the corresponding decoder, the data flow is reversed, that is, the output of the entropy coder 302 is decoded by an entropy decoder 303 using a corresponding statistical model 304, and the result is used by a bit-plane reconstruction unit 304 to reconstruct the bit-planes, where the sign and amplitude symbols that are decoded to reconstruct the bit-planes of the data vector follow the same scanning order as in the encoder.
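The decomposition of an integer vector into sign and amplitude bit-plane symbols can be sketched as follows. The function names are hypothetical; the sign convention (1 for non-negative, 0 for negative) and the definition of the maximum bit-plane M as the integer with 2^(M−1) ≤ max|x_i| < 2^M are assumed from the description above.

```python
def max_bitplane(x):
    """Return M with 2**(M-1) <= max|x_i| < 2**M (M = 0 for an all-zero vector)."""
    return max(abs(v) for v in x).bit_length()


def decompose(x):
    """Split an integer vector into sign symbols and bit-planes M-1, ..., 0."""
    m = max_bitplane(x)
    signs = [1 if v >= 0 else 0 for v in x]
    planes = [[(abs(v) >> j) & 1 for v in x] for j in range(m - 1, -1, -1)]
    return m, signs, planes


def recompose(signs, planes):
    """Rebuild x_i = (2*s_i - 1) * sum_j b_ij * 2**j from the symbols."""
    m = len(planes)
    magnitudes = [0] * len(signs)
    for index, plane in enumerate(planes):
        j = m - 1 - index  # planes are listed from the most significant downward
        for i, bit in enumerate(plane):
            magnitudes[i] += bit << j
    return [(2 * s - 1) * mag for s, mag in zip(signs, magnitudes)]


if __name__ == "__main__":
    m, signs, planes = decompose([9, -7, 14, 2])
    print(m, signs, planes, recompose(signs, planes))
```

Dropping the trailing entries of `planes` before calling `recompose` would not work as written, because the plane index is derived from the list length; a truncating decoder has to carry M explicitly, as in the scanning procedure.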
The most significant advantage of a bit-plane coding system such as the one above is that the resulting compressed bitstream can easily be truncated to any desired rate, where a reproduction data vector $\hat{x}$ can still be obtained from the partially reconstructed bit-planes decoded from this truncated bitstream. To obtain the best coding performance, an embedded principle (see [24]) is usually adopted in BPC, according to which the bit-plane symbols are coded in order of decreasing rate-distortion slope, so that the symbols with the most significant contribution to the distortion per unit rate are always coded first.

The selection of the bit-plane scanning order depends on the desired distortion measure. When the mean square error (MSE), or the expectation of the squared error function, is used as the distortion measure:

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2,$$

where $d(x^n, \hat{x}^n)$ is the distortion value, $x^n$ is the original data vector, and $\hat{x}^n$ is the reconstructed vector of $x^n$ in the decoder, the results of [24] show that the embedded principle is well satisfied by a sequential bit-plane scanning and coding procedure for most sources, except those with a very skewed bit-plane symbol distribution.

An example of a simple sequential bit-plane scanning and coding procedure consists of the following steps:

1. Start from the most significant bit-plane j = M − 1;
2. Code only those $b_{i,j}$ with $b_{i,M-1} = b_{i,M-2} = \cdots = b_{i,j+1} = 0$. If $b_{i,j} = 1$ in the significance scan, code $s_i$ (significance pass);
3. Code the $b_{i,j}$ that were not coded in the significance pass (refinement pass);
4. Proceed to bit-plane j − 1.

List 1. Bit-plane scanning and coding procedure

The above procedure is iterated until a certain termination criterion is reached, which is usually a predefined rate/distortion constraint.
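The procedure of List 1 can be sketched as a small routine that emits the raw binary symbols before entropy coding. This is one reading of the list, not a definitive implementation: the function name is hypothetical, elements are scanned in index order within each pass, a sign bit (1 for non-negative, 0 for negative) is emitted immediately after the amplitude bit that makes an element significant, and no termination criterion other than reaching bit-plane 0 is applied.

```python
def encode_bitplanes(x):
    """Sequential bit-plane scan of an integer vector; returns (M, symbol string)."""
    m = max(abs(v) for v in x).bit_length()   # maximum bit-plane M
    significant = [False] * len(x)
    out = []
    for j in range(m - 1, -1, -1):            # steps 1 and 4: planes M-1 down to 0
        newly = set()
        # Step 2, significance pass: elements whose higher bits are all zero.
        for i, v in enumerate(x):
            if not significant[i]:
                bit = (abs(v) >> j) & 1
                out.append(bit)
                if bit:
                    out.append(1 if v >= 0 else 0)  # sign symbol
                    significant[i] = True
                    newly.add(i)
        # Step 3, refinement pass: elements that were significant before this plane.
        for i, v in enumerate(x):
            if significant[i] and i not in newly:
                out.append((abs(v) >> j) & 1)
    return m, "".join(str(b) for b in out)


if __name__ == "__main__":
    print(encode_bitplanes([5, -3, 0, 6]))
```

Because the symbols of each higher bit-plane precede those of lower planes, any prefix of the returned string still describes a coarser version of the vector.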
In addition, further adjustment of the coding sequence may be necessary within a significance pass if the bit-plane symbols are found to have unequal distributions.

An example of the above sequential coding procedure is illustrated by considering a data vector x of dimension 4, say (9, −7, 14, 2). It is bit-plane coded starting from its most significant bit-plane (M = 4). The significance pass starts first, since all the elements are still insignificant (in the accompanying table, X denotes bypassed symbols). The sign is coded as follows: positive is coded as 1, and negative is coded as 0.

Thus, the output binary stream is 11011010001001111110, which is then entropy coded and sent to the decoder. In the decoder, the bit-plane structure of the original data vector is reconstructed. If the complete binary stream is received by the decoder, the bit-planes of the original data vector can be restored, and thus a lossless reconstruction of the original data vector is obtained. If only a subset (the most significant part) of the binary stream is received, the decoder is still able to restore partial bit-planes of the original data vector, so that a coarse (quantized) reconstruction of the original data vector can be obtained.

The above is only a simple example of the bit-plane scanning and coding procedure. In practice, the significance pass can be further split to exploit the statistical correlation of the elements in the data vector, as in the bit-plane coding process of JPEG2000 or in that of the embedded audio coder (EAC) described in [4].

The above sequential bit-plane scanning and coding procedure only provides an effort to optimize the MSE performance. In the area of audio, image or video coding, minimizing the perceptual distortion instead of the MSE is normally a more efficient coding method for obtaining optimal perceptual quality in the reconstructed audio, image or video signal.
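The decoding side of the example, including the effect of truncation, can be sketched as follows. The sketch assumes one particular reading of the sequential procedure (index-order scan, sign bit directly after the amplitude bit that makes an element significant, refinement pass after the significance pass in each plane); the function name is hypothetical, and the vector length n and maximum bit-plane M are assumed to be known to the decoder.

```python
def decode_bitplanes(bits, n, m):
    """Rebuild an n-element integer vector from a (possibly truncated) symbol string."""
    magnitude = [0] * n
    negative = [False] * n
    significant = [False] * n
    stream = iter(bits)
    try:
        for j in range(m - 1, -1, -1):
            newly = set()
            # Significance pass: one bit per still-insignificant element, plus a sign.
            for i in range(n):
                if not significant[i] and next(stream) == "1":
                    negative[i] = next(stream) == "0"
                    significant[i] = True
                    magnitude[i] |= 1 << j
                    newly.add(i)
            # Refinement pass: one bit per element that was significant before this plane.
            for i in range(n):
                if significant[i] and i not in newly and next(stream) == "1":
                    magnitude[i] |= 1 << j
    except StopIteration:
        pass  # truncated stream: keep the partial bit-planes decoded so far
    return [-mag if neg else mag for mag, neg in zip(magnitude, negative)]


if __name__ == "__main__":
    quoted = "11011010001001111110"
    print(decode_bitplanes(quoted, 4, 4))       # full stream
    print(decode_bitplanes(quoted[:11], 4, 4))  # two most significant planes only
```

Under this reading, the full 20-bit stream quoted above decodes to (9, −7, 14, 1), and keeping only its first 11 bits gives the coarse vector (8, −4, 12, 0). The last element therefore differs from the example vector (9, −7, 14, 2), for which the same scan would produce 11011010001110111100; which of the two the original worked table used cannot be determined from the text.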
Therefore, sequential bit-plane coding of the error signal is definitely a sub-optimal option.

In the encoder 100, the error coefficients are preferably grouped into frequency bands so that each frequency band s contains a number of error coefficients in consecutive order. (The scale factor band grouping can be based on the band grouping adopted in the quantizer 102 if a perceptual coder is used as the quantizer 102. However, other band groupings are also possible.)

A frequency band s is said to be significant if there exists an error coefficient in the band s such that the quantized coefficient thr(k) from the quantizer is not zero. In other words, if e(k) is an error coefficient in the frequency band s:

$$e(k) = c(k) - thr(k),$$

the frequency band s is significant if thr(k) ≠ 0 for some k in s. Since thr(k) = 0 when i(k) = 0, in which case e(k) = c(k), the band s is otherwise considered insignificant.

The perceptual significance of the bits of the error coefficients can be determined by the level of Just Noticeable Distortion (JND) at a frequency location i. This JND level, $T_i$, can be determined from a perceptual model such as the psychoacoustic model (I or II) or any proprietary perceptual model. When a perceptual quantizer is used to form the core-layer bitstream, the perceptual model used in the quantizer can also be used to generate the JND for the perceptual bit-plane coding of the error coefficients.

For simplicity, the perceptual significance of the bits of the error coefficients in the same frequency band s can be set to the same value.

Next, a possible implementation of perceptual bit-plane coding is explained with reference to Figure 4.

Figure 4 shows an encoder 400 according to one embodiment of the invention. Analogously to the encoder 100, the encoder 400 comprises a domain transformer 401, a quantizer 402, an error mapping unit 403, a perceptual bit-plane coder 404 (using a perceptual model 406) and a multiplexer 405.
The perceptual BPC block, that is, the perceptual bit-plane coder 404, comprises a bit-plane shifting block 407 and a conventional BPC block 408. In the bit-plane shifting block 407, the bit-planes are perceptually shifted, and the perceptually shifted bit-planes are coded in the BPC block 408 in a conventional sequential scanning and coding manner.

Consider the following (modified) perceptually weighted distortion measure:

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2 \, w_i(x_i).$$

In the context of perceptual audio coding, the audio signal is normally quantized and coded in the frequency domain, so that the data vector $x^n$ is the transformed audio signal and the weighting function $w_i(x_i)$ is the importance of $x_i$ at the different frequency locations i, that is,

$$w_i(x_i) = \frac{1}{T_i}.$$

The above perceptually weighted distortion function can be rewritten as follows:

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i}{\sqrt{T_i}} - \frac{\hat{x}_i}{\sqrt{T_i}} \right)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i' - \hat{x}_i')^2,$$

where

$$x_i' = \frac{x_i}{\sqrt{T_i}}, \quad i = 1, \ldots, n.$$

The weighted squared error function therefore becomes a squared error function on the scaled vector $x'^n = \{x_1', \ldots, x_n'\}$. Hence, perceptually optimized coding of $x^n$ can be achieved simply by performing sequential bit-plane coding on $x'^n$. In the corresponding decoder, each element of the bit-plane decoded data vector $\hat{x}'^n$ can be scaled back to obtain a reconstructed data vector as follows:

$$\hat{x}_i = \sqrt{T_i} \cdot \hat{x}_i', \quad i = 1, \ldots, n.$$

Clearly, the weights $T_i$ are preferably transmitted to the decoder as side information if they are unknown at the decoder.

$T_i$ is further quantized to an even integer power of 2, so that it becomes

$$T_i = 2^{2 t_i}, \quad \text{where } t_i = \tfrac{1}{2} \log_2 T_i,$$

and the scaled vector can therefore be obtained by bit-shifting each element of the original data vector as follows:

$$x_i' = 2^{-t_i} x_i,$$

which is easily obtained by right-shifting $x_i$ by $t_i$ bits. For example, if $x_i$ = 00010011 and $t_i$ = −2, the element of the scaled data vector $x_i'$ is 01001100; if $t_i$ = 2, it becomes 00000100.11.

In this way, the bit-planes of the error coefficients are perceptually shifted in such a manner that when sequential bit-plane coding is performed on the shifted bit-planes, the bits that are perceptually more important (instead of those having the highest MSE contribution) are coded first.

Clearly, if each element in the original data vector is an integer with a limited word length, for example, if each element in x has a maximum bit-plane of L, lossless coding of x can be achieved if each $x_i'$ in the scaled vector is bit-plane coded from bit-plane $-t_i$ to bit-plane $L - t_i$.

As mentioned above, information on perceptual significance, such as the JND level, can be provided to the bit-plane shifting block from a perceptual model.

In the bit-plane coding process, a maximum bit-plane M(s) can be used to specify the starting bit-plane at which the bit-plane scanning and coding should begin. The maximum bit-plane M(s) and $t_i$ should preferably be transmitted as side information in the scalable bitstream to the corresponding decoder, so that the decoder is able to decode the bitstream correctly. To reduce the amount of side information, M(s) and $t_i$ can be constrained to the same value for the same scale factor band s in the encoder.

The value of the maximum bit-plane M(s) in each frequency band s can be determined from the error coefficients e(k) using the following expression:

$$2^{M(s)-1} \le \max_{k \in s} |e(k)| < 2^{M(s)}.$$

In addition, the maximum absolute value of the error coefficients, $\max(|e(k)|)$, in each significant frequency band s is bounded by the quantizer interval of the perceptual quantizer:

$$\max(|e(k)|) < thr(i(k) + 1) - thr(i(k)).$$
Therefore, the maximum bit-plane M(s) for each significant frequency band s can be determined from the following expression:

$$2^{M(s)-1} \le \max_{k \in s} \bigl( thr(i(k) + 1) - thr(i(k)) \bigr) < 2^{M(s)}.$$

Since the quantized coefficients i(k) of the perceptual quantizer are known to the decoder, it is not necessary to transmit the maximum bit-plane M(s) as side information to the decoder for the significant frequency bands s.

The value of the maximum bit-plane M(s) can also be predefined in the encoder and the decoder, in which case it does not need to be transmitted as side information.

Figure 5 shows a decoder 500 according to an embodiment of the invention. The decoder 500 implements a perceptual bit-plane decoder that comprises bit-plane shifting and conventional (sequential) bit-plane decoding.

Analogously to the decoder 200, the decoder 500 comprises a domain transformer 501, a dequantizer 502, an error mapping unit 503, a perceptual bit-plane decoder 504 (using a perceptual model 506) and a demultiplexer 505.

Similarly to the perceptual bit-plane coder 404, the perceptual bit-plane decoder 504 comprises a bit-plane shifting block 507 and a conventional BPC block 508.

The enhancement-layer bitstream generated by the encoder 400 is bit-plane decoded by the decoder 500 in a sequential manner (using the same sequential bit-plane scanning procedure as the encoder 400) to reconstruct the bit-planes. The reconstructed bit-planes are shifted in the reverse manner to the encoder 400, based on the received or regenerated value $t_i$, to generate the decoded error coefficients e'(k), which describe the bit-plane decoded enhancement-layer bitstream.

Figure 6 shows an encoder 600 according to one embodiment of the invention. The encoder 600 uses perceptual bit-plane coding.
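The bit-plane shifting performed in the encoder 400 and undone in the decoder 500 can be sketched with integer arithmetic. The sketch is an illustration under stated assumptions: the function names are hypothetical, the shifts t_i are integers, and the scaled values x'_i = 2^(−t_i) x_i are held on a common fixed-point grid with t_max = max(t_i) fractional bits so that no bits are lost to a right shift.

```python
def shift_bitplanes(x, t):
    """Return (y, t_max) with y_i = x_i * 2**(t_max - t_i), i.e. x'_i scaled by 2**t_max."""
    t_max = max(t)
    return [v * (1 << (t_max - ti)) for v, ti in zip(x, t)], t_max


def unshift_bitplanes(y, t, t_max):
    """Inverse shift in sign-magnitude form; exact when y is complete, truncating otherwise."""
    out = []
    for v, ti in zip(y, t):
        magnitude = abs(v) >> (t_max - ti)
        out.append(-magnitude if v < 0 else magnitude)
    return out


if __name__ == "__main__":
    # The example from the text: x_i = 00010011 with t_i = -2 and with t_i = 2.
    y, t_max = shift_bitplanes([0b00010011, 0b00010011], [-2, 2])
    print([bin(v) for v in y], unshift_bitplanes(y, [-2, 2], t_max))
```

With two fractional bits on the grid, the first element becomes 0100110000 (that is, 01001100 followed by two zero fraction bits) and the second stays 10011 (that is, 100.11), so a sequential bit-plane scan of `y` reaches the bits of the element with the lower JND level first.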
The encoder 600 comprises a domain transformer (IntMDCT) 601, a quantizer (AAC quantizer and coder) 602, an error mapping unit 603, a perceptual significance calculation unit 604 (using a psychoacoustic model 605), a perceptual bit-plane coding unit 606 and a multiplexer 607.

In this implementation, the scanning order of the bit-planes and the bit-plane symbols need not be sequential, but is based on the perceptual importance of the bit-plane symbols corresponding to the different frequency bands. The perceptual importance of the bit-plane symbols is determined by calculating parameters related to the perceptual information, such as the perceptual significance and the first (maximum) bit-plane for bit-plane decoding. The calculation of the perceptual information parameters is represented by the perceptual significance calculation block, that is, the perceptual significance calculation unit 604.

There are numerous ways of determining the perceptual importance, or specifically the perceptual significance, of the bit-plane symbols corresponding to the different frequency bands. A widely adopted way is to apply a psychoacoustic model, such as the Psychoacoustic Model 2 described in [19], to the digital input signal. The Just Noticeable Distortion (JND) level T(s) determined by the psychoacoustic model for each frequency band s can be converted to the unit of bit-plane level t(s) as follows:

$$t(s) = \tfrac{1}{2} \log_2 T(s).$$

However, this invention does not restrict the method by which T(s) or t(s) can be obtained.

Now, let Ps(s) represent the perceptual significance of the frequency band s, which can be determined by the distance from M(s) to t(s) as:

$$Ps(s) = M(s) - t(s).$$

It should also be noted that the noise level, or the level of the IntMDCT error coefficients e(k), tends to be flat with respect to the JND level for the significant bands (as a result of the noise shaping mechanism in the core coder).
In other words, the value of Ps(s) would be very close, if not identical, for the significant frequency bands. This fact can be exploited in the method according to the invention by sharing a common factor Ps_common for all significant bands. Possible selections of Ps_common are the average value, the maximum value, the minimum value, or any other reasonable function of Ps(s) over all s that are significant. Ps(s) can then be normalized as follows:

$$Ps'(s) = Ps(s) - Ps\_common.$$

Since it is known that Ps'(s) would be zero for a significant band s, it does not need to be transmitted to the decoder. Otherwise, for an insignificant band s, Ps'(s) should preferably be transmitted to the corresponding decoder as side information.

In some other examples, when there is no significant band, Ps_common can be set to 0.

It is also possible to rely on the noise shaping procedure in the core coder to satisfy the need for perceptual coding. In that case there is no need to implement any further noise shaping, or perceptual significance identification, in the enhancement layer. In such cases, Ps'(s) = 0 can be set for all s. Normally, these values do not need to be transmitted to the decoder if the decoder knows that they are all zero.

A possible implementation of the perceptual bit-plane coding mechanism can be described by the pseudo code below, in which the total number of frequency bands is denoted as s_total.

1. Find the frequency band s with the largest Ps'(s).
2. Code the bit-plane symbols of bit-plane M(s) for e(k) in band s.
3. M(s) = M(s) − 1; Ps'(s) = Ps'(s) − 1.
4. If there exists a band s for which M(s) ≥ 0, go to step 1.

A method for obtaining the maximum bit-plane M(s) is described here.

For a significant band, M(s) can be determined from the maximum quantization interval of the quantizer if a perceptual quantizer such as an AAC quantizer is used.
Specifically, M(s) is an integer that satisfies:

$$2^{M(s)-1} \le \max_{k \in s} \bigl( thr(i(k) + 1) - thr(i(k)) \bigr) < 2^{M(s)}.$$

In this case, M(s) does not need to be transmitted to the decoder, since i(k) is known to the decoder.

For insignificant bands, M(s) can be calculated from e(k) as follows:

$$2^{M(s)-1} \le \max_{k \in s} |e(k)| < 2^{M(s)},$$

and for those bands M(s) should preferably be sent to the decoder as side information, since such information is not contained in the core-layer bitstream.

The value of the maximum bit-plane M(s) can also be predefined in the encoder 600 and in the corresponding decoder, in which case it does not need to be transmitted as side information.

Other alternative approaches are also possible for exploiting the parameter Ps(s) in a bit-plane coding approach directed toward some desired noise shaping goal. In general, Ps(s) can also be obtained by any function of M(s) and t(s), for example the following:

$$Ps(s) = M(s) - 2\,t(s), \quad \text{or} \quad Ps(s) = \frac{M(s) - t(s)}{2}.$$

Figure 7 shows a decoder 700 according to one embodiment of the invention. The decoder 700 is the decoder corresponding to the encoder 600, in which perceptual bit-plane decoding is implemented using the perceptual bit-plane scanning procedure described above.

The decoder 700 accordingly comprises a domain transformer (inverse IntMDCT) 701, a dequantizer (AAC dequantizer and decoder) 702, an error mapping unit 703, a perceptual significance calculation unit 704, a perceptual bit-plane decoding unit 706 and a demultiplexer 707.

In the decoder 700, for a significant band, Ps'(s) is set to zero, and M(s) can be calculated from the AAC quantization index i(k) in the same way as in the encoder, that is:

$$2^{M(s)-1} \le \max_{k \in s} \bigl( thr(i(k) + 1) - thr(i(k)) \bigr) < 2^{M(s)}.$$

For an insignificant band, Ps(s) and M(s) can simply be recovered from the transmitted side information.
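The per-band quantities used by the perceptual scan can be sketched as follows for bands whose M(s) is computed from the error coefficients. The names and the sample numbers are hypothetical, and the conversion t(s) = ½·log2 T(s) is an assumption chosen to agree with the relation T = 2^(2t) used for bit-plane shifting.

```python
import math


def max_bitplane(errors):
    """M(s): integer with 2**(M-1) <= max|e(k)| < 2**M over the band (0 if all zero)."""
    return max(abs(e) for e in errors).bit_length()


def bitplane_level(jnd):
    """t(s): JND level T(s) expressed in bit-plane units (assumed t = 0.5 * log2(T))."""
    return 0.5 * math.log2(jnd)


def perceptual_significance(bands, jnd_levels):
    """Return the lists M(s) and Ps(s) = M(s) - t(s) for every band."""
    m = [max_bitplane(errors) for errors in bands]
    ps = [ms - bitplane_level(T) for ms, T in zip(m, jnd_levels)]
    return m, ps


if __name__ == "__main__":
    bands = [[5, -3, 2], [40, -17]]   # error coefficients e(k) grouped by band
    jnd = [16.0, 4.0]                 # hypothetical JND levels T(s)
    print(perceptual_significance(bands, jnd))
```

A band with large residuals and a low JND level gets a high Ps(s), so its bit-planes are visited earlier by a scan that always picks the band with the largest remaining significance.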
Once Ps(s) and M(s) have been recovered for all frequency bands, the IntMDCT error coefficients $\hat{e}(k)$ can easily be reconstructed by decoding the received bitstream and reconstructing its bit-plane symbols in an order that is exactly the same as in the encoder 600. For example, the decoding process for the above coding example would be:

1. Find the frequency band s with the largest Ps'(s).
2. Decode the bit-plane symbols of bit-plane M(s) for $\hat{e}(k)$ in band s.
3. M(s) = M(s) − 1; Ps'(s) = Ps'(s) − 1.
4. If there exists a band s for which M(s) ≥ 0, go to step 1.

Determining the maximum bit-plane for bit-plane coding of the error coefficients.

For a significant band s (that is, a band in which the error coefficient e(k) ≠ c(k), or $\exists k \in s$, i(k) ≠ 0), the maximum absolute value of e(k) is bounded by the quantizer interval in the AAC quantizer as:

$$\max(|e(k)|) < thr(i(k) + 1) - thr(i(k)).$$

Therefore, the maximum bit-plane M(k) can be determined using:

$$2^{M(k)-1} \le \max_{k \in s} \bigl( thr(i(k) + 1) - thr(i(k)) \bigr) < 2^{M(k)}.$$

Since i(k) is known to the decoder, M(k) does not need to be transmitted to the decoder, because the decoder is capable of regenerating thr(k), and therefore M(k), from i(k) for the significant band s.

For an insignificant band, M(k) can be calculated from e(k) as follows:

$$2^{M(k)-1} \le \max_{k \in s} |e(k)| < 2^{M(k)},$$

and the calculated M(s) is preferably transmitted with the enhancement-layer bitstream as side information so that the enhancement-layer bitstream can be bit-plane decoded correctly.

To reduce the amount of side information, M(k) can be constrained to the same value for all k in the same scale factor band s of the core-layer quantizer. Therefore, M(k) can also be denoted as M(s).

In the decoder 700, the error coefficients corresponding to the error signal can be reconstructed by bit-plane decoding the enhancement-layer bitstream using the same bit-plane scanning procedure as the encoder, based on M(s).
For a significant band, M(s) can be regenerated using the following:

$$2^{M(s)-1} \le \max_{k \in s} \bigl( thr(i(k) + 1) - thr(i(k)) \bigr) < 2^{M(s)}.$$

For an insignificant band, the decoder makes use of M(s) as transmitted by the encoder as side information.
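The band-scanning loop given in the pseudo code for the encoder 600 and the decoder 700 can be sketched as a routine that only computes the visiting order of (band, bit-plane) pairs. Several points are assumptions of this sketch rather than statements of the source: the function name is hypothetical, the termination test is read as M(s) ≥ 0, bands whose M(s) has dropped below zero are skipped when searching for the largest Ps'(s), and ties are resolved in favour of the lowest band index.

```python
def scan_order(max_planes, significance):
    """Return the list of (band, bit_plane) pairs in perceptual scanning order."""
    m = list(max_planes)     # working copies of M(s)
    ps = list(significance)  # working copies of Ps'(s)
    order = []
    while any(plane >= 0 for plane in m):
        # Step 1: band with the largest Ps'(s) among bands that still have planes left.
        active = [s for s in range(len(m)) if m[s] >= 0]
        s = max(active, key=lambda band: ps[band])
        # Step 2: this is where bit-plane M(s) of band s would be coded or decoded.
        order.append((s, m[s]))
        # Step 3: move to the next lower plane of that band.
        m[s] -= 1
        ps[s] -= 1
    return order


if __name__ == "__main__":
    print(scan_order([2, 1], [0, 1]))
```

Because the order depends only on M(s) and Ps'(s), an encoder and a decoder that hold the same values (regenerated from the core layer or received as side information) derive the same sequence, which is what allows the enhancement-layer bitstream to be truncated at any point and still be decoded consistently.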
REFERENCES

[1] M. Bosi et al., "ISO/IEC MPEG-2 Advanced Audio Coding", J. Audio Eng. Soc., Vol. 45, No. 10, pp. 789-814, Oct. 1997.
[2] J. R. Stuart et al., "MLP lossless compression", AES 9th Regional Convention, Tokyo.
[3] R. Geiger, J. Herre, J. Koller, and K. Brandenburg, "INTMDCT - A link between perceptual and lossless audio coding", IEEE Proc. ICASSP 2002.
[4] J. Li, "Embedded audio coding (EAC) with implicit auditory masking", ACM Multimedia 2002, Nice, France, Dec. 2002.
[5] T. Moriya, N. Iwakami, T. Mori, and A. Jin, "A design of lossy and lossless scalable audio coding", IEEE Proc. ICASSP 2000.
[6] T. Moriya et al., "Lossless Scalable Audio Coder and Quality Enhancement", Proc. of ICASSP 2002.
[7] M. Hans and R. W. Schafer, "Lossless Compression of Digital Audio", IEEE Signal Processing Magazine, Vol. 18, No. 4, pp. 21-32, 2001.
[8] Lin Xiao, Li Gang, Li Zhengguo, Chia Thien King, Yoh Ai Ling, "A Novel Prediction Scheme for Lossless Compression of Audio Waveform", Proc. IEEE ICME2001, Aug., Japan.
[9] Shorten: http://www.softsound.com/Shorten.html
[10] WaveZip: http://www.gadgetlabs.com/wavezip.html
[11] LPAC: http://www-ft.ee.tu-berlin.de/~liebchen/
[12] Wave Archiver: www.ecf.utoronto.ca/~denlee/wavarc.html
[13] R. Geiger, T. Sporer, J. Koller, and K. Brandenburg, "Audio Coding based on Integer Transforms", AES Convention, Sep. 2001.
[14] J. Johnston, "Estimation of Perceptual Entropy", Proc. ICASSP 1988.
[15] R. Yu, C. C. Ko, X. Lin and S. Rahardja, "Bit-plane Golomb code for sources with Laplacian distributions", Proc. ICASSP 2003.
[16] Monkey's Audio, http://www.monkeysaudio.com
[17] S. H. Park et al., "Multi-Layer Bit-Sliced Bit Rate Scalable MPEG-4 Audio Coder", presented at the 103rd AES Convention, New York, Sep. 1997 (preprint 4520).
[18] Ralf Geiger et al., "Fine Grain Scalable Perceptual and Lossless Audio Coding Based on IntMDCT", Proc. of ICASSP 2003.
[19] ISO/IEC 14496-3 Subpart 4, Information Technology - Coding of Audiovisual Objects, Part 3: Audio, Subpart 4: Time/Frequency Coding, ISO/JTC 1/SC 29/WG11, 1998.
[20] T. Painter, A. Spanias, "Perceptual Coding of Digital Audio", Proceedings of the IEEE, Vol. 88, No. 4, Apr. 2000.
[21] ISO/IEC 11172-3, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s", Part 3: Audio.
[22] S. J. P. Westen, R. L. Lagendijk, and J. Biemond, "Optimization of JPEG color image coding using a human visual system model", SPIE Conference on Human Vision and Electronic Imaging.
[23] S. J. P. Westen, R. L. Lagendijk, and J. Biemond, "Spatio-Temporal Model of Human Vision For Digital Video Compression", Proc. of SPIE Electronic Imaging 97.
[24] J. Li and S. Lei, "An embedded still image coder with rate-distortion optimization", IEEE Trans. on Image Processing, Vol. 8, No. 7, pp. 913-924, July 1999.

Claims (30)

  1. A method for encoding a digital signal into a scalable bitstream, comprising:
     - quantizing the digital signal and encoding the quantized signal to form a core-layer bitstream;
     - performing an error mapping based on the digital signal and the core-layer bitstream to remove information that has already been encoded into the core-layer bitstream, resulting in an error signal;
     - bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information of the digital signal is determined using a perceptual model; and
     - multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream.
  2. The method according to claim 1, further comprising transforming the digital signal into a suitable domain, wherein the transformed signal is quantized to form the quantized signal before the quantized signal is encoded.
  3. The method according to claim 1 or 2, wherein the perceptual information of the digital signal is further multiplexed with the core-layer bitstream and the enhancement-layer bitstream to generate the scalable bitstream.
  4. The method according to claim 2, wherein the digital signal is transformed into a transformed digital signal using an integer Modified Discrete Cosine Transform (IntMDCT).
  5. The method according to claim 4, wherein the transformed signal is normalized to approximate the output of an MDCT filterbank.
  6. The method according to any of claims 1 to 5, wherein the digital signal or the transformed digital signal is quantized and encoded according to the Moving Picture Experts Group (MPEG) Advanced Audio Coding (AAC) specification.
  7. The method according to any of claims 1 to 6, wherein the error mapping is performed by subtracting the lower quantization threshold corresponding to each quantized value of the quantized signal from the digital signal or the transformed digital signal, thereby generating the error signal.
  8. The method according to any of claims 1 to 7, wherein a psychoacoustic model is used as the perceptual model to determine the perceptual information of the digital signal.
  9. The method according to any of claims 1 to 8, wherein the error signal is represented in bit-planes comprising a plurality of bit-plane symbols, and wherein the bit-planes are shifted based on the perceptual information of the digital signal, such that the perceptually more important bit-planes are coded first when the bit-planes are scanned and coded in a consecutive sequence during the bit-plane coding of the error signal.
  10. The method according to any of claims 1 to 8, wherein the error signal is represented in bit-planes comprising a plurality of bit-plane symbols, and wherein the bit-planes and the bit-plane symbols are scanned and coded during the bit-plane coding of the error signal in a sequence based on the perceptual information of the digital signal, such that the bit-plane symbols of the perceptually more important bit-planes are coded first.
  11. The method according to claim 9 or 10, wherein at least one of the following is determined by the perceptual model as the perceptual information of the digital signal:
     - the bit-plane M(s) of the error signal at which the bit-plane coding of the error signal starts; and
     - the Just Noticeable Distortion (JND) level of the digital signal,
     wherein s corresponds to a frequency band of the digital signal or of the transformed digital signal.
  12. The method according to claim 11, wherein a perceptual significance Ps(s) of the digital signal is further determined as the perceptual information, the perceptual significance being determined by:
     - determining the bit-plane t(s) of the error signal corresponding to the JND level of the digital signal; and
     - subtracting the bit-plane t(s) of the error signal corresponding to the JND level of the digital signal from the bit-plane M(s) of the error signal at which the bit-plane coding of the error signal starts, thereby determining the perceptual significance Ps(s),
     wherein the perceptual significance Ps(s) is used to control the sequence of scanning and coding of at least the bit-planes or the bit-plane symbols of the bit-planes.
  13. The method according to claim 12, wherein the perceptual significance Ps(s) is normalized by:
     - defining a common perceptual significance Ps_common based on a function of the perceptual significance Ps(s); and
     - subtracting the common perceptual significance Ps_common from the perceptual significance Ps(s), thereby generating the normalized perceptual significance Ps'(s),
     wherein, for a frequency band s for which the quantized values are not all zero, the value of the perceptual significance Ps(s) is set to the value of the common perceptual significance Ps_common, and wherein, for a frequency band s for which the quantized values are all zero, the normalized perceptual significance Ps'(s) is multiplexed with the core-layer bitstream and the enhancement-layer bitstream to generate the scalable bitstream.
  14. The method according to claim 11, wherein the bit-plane of the error signal at which the bit-plane coding of the error signal starts is determined from the maximum quantization interval used in the frequency band s for quantizing the digital signal or the transformed signal.
  15. An encoder for encoding a digital signal into a scalable bitstream, comprising:
     - a quantization unit for quantizing the digital signal and encoding the quantized signal to form a core-layer bitstream;
     - an error mapping unit for performing an error mapping based on the digital signal and the core-layer bitstream to remove information that has already been encoded into the core-layer bitstream, resulting in an error signal;
     - a perceptual bit-plane coding unit for bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information of the digital signal is determined using a perceptual model; and
     - a multiplexing unit for multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream.
  16. A computer-readable medium having a program recorded thereon, wherein the program, when executed by a computer, causes the computer to perform a procedure for encoding a digital signal into a scalable bitstream, the procedure comprising:
     - quantizing the digital signal and encoding the quantized signal to form a core-layer bitstream;
     - performing an error mapping based on the digital signal and the core-layer bitstream to remove information that has already been encoded into the core-layer bitstream, resulting in an error signal;
     - bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information of the digital signal is determined using a perceptual model; and
     - multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream.
  17. A computer program element which, when executed by a computer, causes the computer to perform a procedure for encoding a digital signal into a scalable bitstream, the procedure comprising:
     - quantizing the digital signal and encoding the quantized signal to form a core-layer bitstream;
     - performing an error mapping based on the digital signal and the core-layer bitstream to remove information that has already been encoded into the core-layer bitstream, resulting in an error signal;
     - bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information of the digital signal is determined using a perceptual model; and
     - multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream.
  18. A method for decoding a scalable bitstream into a digital signal, comprising:
     - de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream;
     - decoding and de-quantizing the core-layer bitstream to generate a core-layer signal;
     - bit-plane decoding the enhancement-layer bitstream based on perceptual information of the digital signal; and
     - performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal, resulting in a reconstructed transformed signal, wherein the reconstructed transformed signal is the digital signal.
  19. The method according to claim 18, further comprising transforming the reconstructed transformed signal into a reconstructed signal, wherein the reconstructed signal is the digital signal.
  20. The method according to claim 18 or 19, wherein the perceptual information of the digital signal is obtained from the de-multiplexing of the scalable bitstream.
  21. The method according to claim 19 or 20, wherein the core-layer signal and the enhancement-layer signal are transformed using a Modified Discrete Cosine Transform (MDCT).
  22. The method according to any of claims 18 to 21, wherein the core-layer bitstream is decoded and de-quantized according to the Moving Picture Experts Group (MPEG) Advanced Audio Coding (AAC) specification.
  23. The method according to any of claims 18 to 22, wherein the error mapping is performed by adding the lower quantization threshold used for de-quantizing the transformed signal and the bit-plane decoded enhancement-layer bitstream, thereby generating the enhancement-layer signal.
  24. The method according to any of claims 18 to 23, wherein the enhancement-layer bitstream is bit-plane decoded to generate a plurality of bit-planes comprising a plurality of bit-plane symbols in a consecutive sequence, and the bit-planes are shifted based on the perceptual information of the digital signal to generate the bit-plane decoded enhancement-layer bitstream.
  25. The method according to any of claims 18 to 23, wherein the enhancement-layer bitstream is bit-plane decoded to generate a plurality of bit-planes comprising a plurality of bit-plane symbols in a sequence based on the perceptual information of the digital signal, thereby generating the bit-plane decoded enhancement-layer bitstream.
  26. The method according to claim 24 or 25, wherein at least one of the following is received as the perceptual information of the digital signal:
     - the bit-plane of the enhancement-layer bitstream at which the bit-plane decoding of the enhancement-layer bitstream starts, which bit-plane is specified by a number M(s); and
     - the Just Noticeable Distortion (JND) level of the digital signal,
     wherein s corresponds to a frequency band of the digital signal.
  27. The method according to claim 26, wherein the bit-plane M(s) of the enhancement-layer bitstream at which the bit-plane decoding of the enhancement-layer bitstream starts is determined from the maximum quantization interval used in the frequency band s for de-quantizing the core-layer bitstream.
  28. A decoder for decoding a scalable bitstream into a digital signal, comprising:
     - a de-multiplexing unit for de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream;
     - a de-quantization unit for decoding and de-quantizing the core-layer bitstream to generate a core-layer signal;
     - a bit-plane decoding unit for bit-plane decoding the enhancement-layer bitstream based on perceptual information of the digital signal; and
     - an error mapping unit for performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal, resulting in a reconstructed transformed signal, wherein the reconstructed transformed signal is the digital signal.
  29. A computer-readable medium having a program recorded thereon, wherein the program, when executed by a computer, causes the computer to perform a procedure for decoding a scalable bitstream into a digital signal, the procedure comprising:
     - de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream;
     - decoding and de-quantizing the core-layer bitstream to generate a core-layer signal;
     - bit-plane decoding the enhancement-layer bitstream based on perceptual information of the digital signal; and
     - performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal, resulting in a reconstructed transformed signal, wherein the reconstructed transformed signal is the digital signal.
  30. A computer program element which, when executed by a computer, causes the computer to perform a procedure for decoding a scalable bitstream into a digital signal, the procedure comprising:
     - de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream;
     - decoding and de-quantizing the core-layer bitstream to generate a core-layer signal;
     - bit-plane decoding the enhancement-layer bitstream based on perceptual information of the digital signal; and
     - performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal, resulting in a reconstructed transformed signal, wherein the reconstructed transformed signal is the digital signal.
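The perceptual ordering in the claims on bit-plane scanning and perceptual significance can be illustrated with a small sketch. It is not the claimed implementation: the function names, the convention that band s has bit-planes M(s) down to 0, and the tie-breaking by band index are all assumptions made for illustration. The sketch computes Ps(s) = M(s) - t(s) and then lists (band, bit-plane) pairs so that planes lying further above the JND plane t(s) come first.

```python
from typing import List, Tuple


def perceptual_significance(m: List[int], t: List[int]) -> List[int]:
    """Ps(s) = M(s) - t(s) for every frequency band s."""
    if len(m) != len(t):
        raise ValueError("m and t must have one entry per band")
    return [ms - ts for ms, ts in zip(m, t)]


def scan_order(m: List[int], t: List[int]) -> List[Tuple[int, int]]:
    """Return (band, bit_plane) pairs in a perceptually ordered scan.

    Assumption: band s owns bit-planes M(s), M(s)-1, ..., 0.
    Each plane b is shifted by the JND plane of its band, giving the
    priority b - t(s). Higher priority is scanned first; ties are
    broken by the lower band index (also an assumption).
    """
    planes = [(s, b) for s, ms in enumerate(m) for b in range(ms, -1, -1)]
    return sorted(planes, key=lambda sb: (-(sb[1] - t[sb[0]]), sb[0]))


if __name__ == "__main__":
    start_planes = [3, 2]   # hypothetical M(s) for two bands
    jnd_planes = [1, 2]     # hypothetical t(s) for the same bands
    print(perceptual_significance(start_planes, jnd_planes))
    print(scan_order(start_planes, jnd_planes))
```

With these hypothetical inputs, band 0 starts two planes above its JND plane while band 1 starts exactly at it, so the first two planes scanned both belong to band 0.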
SUMMARY OF THE INVENTION

A method for encoding a digital signal into a scalable bitstream, comprising quantizing the digital signal and encoding the quantized signal to form a core-layer bitstream; performing an error mapping based on the digital signal and the core-layer bitstream to remove information that has already been encoded into the core-layer bitstream, resulting in an error signal; bit-plane coding the error signal based on perceptual information of the digital signal, resulting in an enhancement-layer bitstream, wherein the perceptual information of the digital signal is determined using a perceptual model; and multiplexing the core-layer bitstream and the enhancement-layer bitstream, thereby generating the scalable bitstream. A method for decoding a scalable bitstream into a digital signal, comprising de-multiplexing the scalable bitstream into a core-layer bitstream and an enhancement-layer bitstream; decoding and de-quantizing the core-layer bitstream to generate a core-layer signal; bit-plane decoding the enhancement-layer bitstream based on perceptual information of the digital signal; and performing an error mapping based on the bit-plane decoded enhancement-layer bitstream and the de-quantized core-layer signal, resulting in a reconstructed transformed signal, wherein the reconstructed transformed signal is the digital signal.
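The error mapping summarized above, and its inverse in the decoder, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the patent: a uniform integer quantizer with a hypothetical step size stands in for the AAC quantizer, and all function names are invented for the example. The encoder subtracts from each coefficient the quantization threshold closer to zero for its quantized value; the decoder adds that threshold back to the decoded error, which recovers the coefficient exactly.

```python
from typing import List, Tuple


def quantize(c: int, step: int) -> int:
    """Uniform quantizer that truncates toward zero (assumed, not AAC)."""
    sign = -1 if c < 0 else 1
    return sign * (abs(c) // step)


def lower_threshold(i: int, step: int) -> int:
    """Boundary of quantization interval i that lies closer to zero."""
    return i * step


def error_map(coeffs: List[int], step: int) -> Tuple[List[int], List[int]]:
    """Encoder side: return core-layer indices and the residual error."""
    indices = [quantize(c, step) for c in coeffs]
    errors = [c - lower_threshold(i, step) for c, i in zip(coeffs, indices)]
    return indices, errors


def inverse_error_map(indices: List[int], errors: List[int], step: int) -> List[int]:
    """Decoder side: add the threshold back to the decoded error."""
    return [lower_threshold(i, step) + e for i, e in zip(indices, errors)]


if __name__ == "__main__":
    coefficients = [37, -13, 3, 0]   # hypothetical integer transform coefficients
    idx, err = error_map(coefficients, step=8)
    print(idx, err)
    print(inverse_error_map(idx, err, step=8) == coefficients)
```

Because the residual always has the same sign as the coefficient and a magnitude smaller than the step, it needs fewer bit-planes than the coefficient itself, which is what makes it suitable as input to the enhancement layer.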
MXPA/A/2006/004049A 2003-10-10 2006-04-10 Method for encoding a digital signal into a scalable bitstream;method for decoding a scalable bitstream MXPA06004049A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US60/510,629 2003-10-10

Publications (1)

Publication Number Publication Date
MXPA06004049A (en) 2007-04-10

Similar Documents

Publication Publication Date Title
JP4849466B2 (en) Method for encoding a digital signal into a scalable bitstream and method for decoding a scalable bitstream
US6122618A (en) Scalable audio coding/decoding method and apparatus
EP0869622B1 (en) Scalable audio coding/decoding method and apparatus
EP1960999B1 (en) Method and apparatus encoding an audio signal
US6092041A (en) System and method of encoding and decoding a layered bitstream by re-applying psychoacoustic analysis in the decoder
JP4081447B2 (en) Apparatus and method for encoding time-discrete audio signal and apparatus and method for decoding encoded audio data
US7774205B2 (en) Coding of sparse digital media spectral data
US20090006103A1 (en) Bitstream syntax for multi-process audio decoding
EP1422694A2 (en) A progressive to lossless embedded audio coder (PLEAC) with multiple factorization reversible transform
USRE46082E1 (en) Method and apparatus for low bit rate encoding and decoding
KR19990041073A (en) Audio encoding / decoding method and device with adjustable bit rate
KR19990041072A (en) Stereo Audio Encoding / Decoding Method and Apparatus with Adjustable Bit Rate
US20080140393A1 (en) Speech coding apparatus and method
KR20050022160A (en) Method for scalable video coding and decoding, and apparatus for the same
KR20100113065A (en) Rounding noise shaping for integer transfrom based encoding and decoding
KR20050006028A (en) Scale factor based bit shifting in fine granularity scalability audio coding
Yu et al. A scalable lossy to lossless audio coder for MPEG-4 lossless audio coding
MXPA06004049A (en) Method for encoding a digital signal into a scalable bitstream;method for decoding a scalable bitstream
US20170206905A1 (en) Method, medium and apparatus for encoding and/or decoding signal based on a psychoacoustic model
KR100975522B1 (en) Scalable audio decoding/ encoding method and apparatus
KR101107318B1 (en) Scalabel video encoding and decoding, scalabel video encoder and decoder
KR20040051369A (en) Method and apparatus for encoding/decoding audio data with scalability
KR20060085117A (en) Apparatus for scalable speech and audio coding using tree structured vector quantizer
De Meuleneire et al. Algebraic quantization of transform coefficients for embedded audio coding
KR101449432B1 (en) Method and apparatus for encoding and decoding signal