WO2009048239A2 - Encoding and decoding method using variable subband analysis and apparatus thereof

Info

Publication number
WO2009048239A2
Authority
WO
WIPO (PCT)
Prior art keywords
sub
band
variable
information
bands
Prior art date
Application number
PCT/KR2008/005824
Other languages
French (fr)
Other versions
WO2009048239A3 (en)
Inventor
Jeong-Il Seo
Seungkwon Beack
Inseon Jang
Kyeongok Kang
Jinwoo Hong
Jinwoong Kim
Chieteuk Ahn
Original Assignee
Electronics And Telecommunications Research Institute
Priority claimed from KR1020080095541A (KR20090037806A)
Application filed by Electronics And Telecommunications Research Institute
Publication of WO2009048239A2
Publication of WO2009048239A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition

Definitions

  • the present invention relates to an encoding and decoding method and apparatus; and, more particularly, to an encoding and decoding method and apparatus based on variable sub-band analysis.
  • This work was supported by the IT R&D program of MIC/IITA [2007-S-004-01, "Development of Glassless Single-User 3D Broadcasting Technologies"].
  • AAC Advanced Audio Coding
  • MPS MPEG Surround
  • multi-channel audio signals are encoded into down-mixed mono-channel signals or down-mixed stereo-channel signals and spatial cue information, and high-quality multi-channel signals are transmitted even at a low bit rate.
  • audio signals are analyzed for each sub-band, and original multi-channel audio signals are recovered from the down-mixed mono-channel or stereo-channel signals based on spatial cue information corresponding to each sub-band.
  • the spatial cue information includes information to be used for recovering the original signals during a decoding process and decides the sound quality of audio signals restored in an SAC decoding apparatus.
  • MPEG is working on standardization of SAC technology under the name of MPEG Surround (MPS) and uses Channel Level Difference (CLD) as a spatial cue.
  • CLD Channel Level Difference
  • multi-channel and multi-object audio signals, which are audio signals of diverse audio objects including multiple channels such as mono channel, stereo channel, and 5.1 channel, cannot be encoded and decoded.
  • BCC Binaural Cue Coding
  • SAOC Spatial Audio Object Coding
  • Conventional audio services generally have a functional limitation in that users are passive consumers of provided audio contents.
  • the audio encoding method for each object provides more active services to users.
  • the method can not only control each audio object according to a request from a user but also create diverse audio services and contents out of one content combination.
  • a mixer or renderer, which provides such functions as panning, attenuation and suppression, can be applied to the SAOC.
  • the SAOC scheme can flexibly control audio objects through interaction with a user.
  • An embodiment of the present invention is directed to providing an encoding and decoding apparatus and method that can improve sound quality by dividing a sub-band structure into smaller sub-bands while minimizing an increase of bit rate.
  • an encoding method based on variable sub-band analysis which includes: generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and encoding the variable sub-band information and the parameter information.
  • an encoding apparatus based on variable sub-band analysis, which includes: an audio encoder for generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; a frequency transformer for transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; a sub-band constructor for dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; a parameter generator for generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and an encoder for encoding the variable sub-band information and the parameter information.
  • a decoding method based on variable sub-band analysis which includes: decoding down-mixed signals on multiple audio objects, variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the multiple audio objects, and the parameter information for recovering the down-mixed signals based on the variable sub-bands from inputted bitstream; transforming the decoded down-mixed signals into a frequency domain; re-constructing the sub-band based on the decoded variable sub-band information; recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and transforming the recovered audio objects into a time domain.
  • a decoding apparatus based on variable sub-band analysis, which includes: a decoder for decoding down-mixed signals on multiple audio objects, variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the multiple audio objects, and parameter information for recovering the down-mixed signals based on the variable sub-bands from inputted bitstream; a frequency transformer for transforming the decoded down-mixed signals into a frequency domain to thereby produce frequency-domain signals; a sub-band re-constructor for re-constructing the sub-band based on the decoded variable sub-band information; a recovery unit for recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and a time transformer for transforming the multiple audio objects into a time domain.
  • an encoding method based on variable sub-band analysis which includes: transforming an audio object into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; and generating parameter information used for recovering the audio object based on the variable sub-bands.
  • an encoding apparatus based on variable sub-band analysis, which includes: a frequency transformer for transforming an audio object into a frequency domain to thereby produce frequency-domain signals; a sub-band constructor for dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; and a parameter generator for generating parameter information used for recovering the audio objects based on the variable sub-bands.
  • a decoding method based on variable sub-band analysis which includes: receiving bitstream including variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the sub-band of an audio object, and parameter information for recovering the audio objects based on the variable sub-bands; re-constructing the sub-band based on the variable sub-band information; and recovering the audio object by using the parameter information.
  • a decoding apparatus based on variable sub-band analysis which includes: a receiver for receiving bitstream including variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the sub-band of an audio object, and parameter information for recovering the audio objects based on the variable sub-bands; a sub-band re-constructor for reconstructing the sub-band based on the variable sub-band information; and a recovery unit for recovering the audio objects by using the parameter information.
  • the present invention can improve sound quality by dividing a sub-band structure for an audio object.
  • Fig. 1 illustrates an audio encoding/decoding process in accordance with an embodiment of the present invention.
  • Fig. 2 is a block view showing a multi-object audio encoder in accordance with an embodiment of the present invention.
  • Fig. 3 is a block view showing a multi-object audio decoder in accordance with an embodiment of the present invention.
  • Fig. 4 illustrates a structure of a variable sub-band in accordance with an embodiment of the present invention.
  • Fig. 5 illustrates a re-constructing of a variable sub-band in accordance with an embodiment of the present invention.
  • Fig. 6 is a view describing quantization using a variable bit level in accordance with an embodiment of the present invention.
  • Fig. 7 is a view describing dequantization using a variable bit level in accordance with an embodiment of the present invention.
  • In a general multi-object/multi-channel audio signal encoding process, audio signals are down-mixed into signals of one audio object.
  • As a result, audio objects cannot be perfectly recovered during a decoding process.
  • When the power of the audio signals of one audio object is decreased greatly, as in a Karaoke mode, the degradation of sound quality is remarkable.
  • the present invention suggests a technology that extracts more accurate parameters by variably increasing the number of sub-bands for analyzing parameters during a process of encoding/decoding multi-object/multi-channel audio signals, and clearly recovers audio objects out of the down-mixed signals.
  • it is possible to minimize an increase of a bit rate by applying a different quantization level according to a frequency characteristic of signals.
  • Fig. 1 illustrates an audio encoding/decoding process in accordance with an embodiment of the present invention.
  • An encoder 101 receives an audio object.
  • the number of inputted audio objects is not limited.
  • the encoder 101 may receive a plurality of audio objects (Object #1, Object #2, Object #3, ...).
  • the encoder 101 generates down-mixed signals by using the inputted audio objects, and extracts the parameters required during a decoding process.
  • the parameters may include side information shown in Fig. 1.
  • a decoder 102 performs decoding. It outputs audio objects recovered by using the down-mixed signals and the parameters transmitted from the encoder 101.
  • the recovered audio objects go through position/level interaction control in a mixer/renderer 103 and are outputted through channels (Channel #1, Channel #2, Channel #3, ...).
  • the encoder 101 and the decoder 102 may employ a Spatial Audio Object Coding (SAOC) scheme.
  • Fig. 2 is a block view showing a multi-object audio encoder in accordance with an embodiment of the present invention.
  • the multi-object audio encoding of the present invention includes analyzing the frequency band characteristic of signals, defining a sub-band structure used to analyze parameters, and applying a different parameter quantization method according to the frequency characteristic.
  • the defined sub-band structure is re-constructed for recovery during a decoding process.
  • Audio objects (1, 2, ... , M) are inputted to an audio encoder 201 and a frequency transformer 202.
  • the audio encoder 201 down-mixes the audio objects (1, 2, ..., M) and encodes the resulting down-mixed signals.
  • the frequency transformer 202 transforms the audio objects (1, 2, ..., M) into a frequency domain.
  • a sub-band constructor 203 divides a sub-band of frequency-transformed signals into variable sub-bands according to the characteristic of the sub-band.
  • a parameter generator 205 extracts parameters needed to recover audio objects from down-mixed signals during the decoding process based on the variable sub-bands.
  • Parameters for a sub-band may include Inter-Object Level Difference (IOLD) information.
  • IOLD is a parameter for calculating a power ratio of two audio objects for each sub-band. The IOLD is expressed as the following Equation 1.
  • M denotes the number of sub-bands
  • k denotes a frequency coefficient
  • b denotes a sub-band index
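Equation 1 itself is not reproduced in this text. As a minimal sketch of the description above (a power ratio of two audio objects per sub-band), the following assumes that each sub-band b covers the frequency bins from `band_edges[b]` up to, but not including, `band_edges[b + 1]`. The function name, the `band_edges` layout, and the small `eps` guard are assumptions made for illustration, not part of the patent.

```python
import numpy as np

def iold_per_band(spec1, spec2, band_edges, eps=1e-12):
    """Power ratio of object 1 to object 2 in each sub-band (illustrative only).

    spec1, spec2 : frequency-domain coefficients of the two audio objects.
    band_edges   : assumed layout; sub-band b spans bins
                   band_edges[b] .. band_edges[b + 1] - 1.
    eps          : guard against division by zero (an assumption).
    """
    p1 = np.abs(np.asarray(spec1)) ** 2  # per-bin power of object 1
    p2 = np.abs(np.asarray(spec2)) ** 2  # per-bin power of object 2
    ratios = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        ratios.append(p1[lo:hi].sum() / (p2[lo:hi].sum() + eps))
    return np.array(ratios)

# Toy input: four bins grouped into two sub-bands of two bins each.
s1 = np.array([2.0, 2.0, 1.0, 1.0])
s2 = np.array([1.0, 1.0, 1.0, 1.0])
iold = iold_per_band(s1, s2, [0, 2, 4])
```

Finer sub-bands shorten each summation range, so the ratio tracks local spectral differences between the two objects more closely.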
  • a sub-band may be fixed according to the encoding method. For example, Moving Picture Experts Group (MPEG) Surround applies 20 to 28 fixed sub-bands to one audio signal frame.
  • MPEG Moving Picture Experts Group
  • the sub-band constructor 203 forms a sub-band structure of variable sub-bands. The sub-band constructor 203 will be described in detail with reference to Fig. 4.
  • a first encoder 204 encodes variable sub-band information generated in the sub-band constructor 203.
  • a second encoder 206 encodes parameter information including the parameters generated in a parameter generator 205.
  • the first and second encoders 204 and 206 may use a lossless coding method.
  • a bitstream formatter 207 formats the encoded variable sub-band information, parameter information, and audio objects into bitstreams.
  • the generated bitstreams may be SAOC bitstreams.
  • Fig. 4 illustrates a structure of a variable sub-band in accordance with an embodiment of the present invention.
  • the sub-band constructor 203 of Fig. 2 may include a spectrum analyzer 401 shown in Fig. 4.
  • Object #1: an object to be freely controlled by a user
  • Object #2: the other objects
  • Object #3: the other objects
  • the spectrum analyzer 401 analyzes power of the frequency band of each signal and outputs new sub-band information, which is variable sub-band information.
  • the basic structure of a sub-band used for analyzing parameters follows the 28 bands used in MPEG Surround. When the power ratio of two signals within a sub-band fluctuates, that band is divided into smaller bands.
  • the condition can be expressed as the following Equations 2 and 3.
  • avrg_b denotes an average power ratio of two signals within the b-th sub-band
  • var_b denotes a dispersion coefficient indicating the extent of change of the power ratio of the two signals.
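Equations 2 and 3 are not reproduced in this text. One reading consistent with the two definitions above treats the average as the mean of the per-bin power ratio inside the band and the dispersion coefficient as its variance. The symbols S_1(k), S_2(k), r(k), N_b, the bin set B_b, and the threshold T are assumptions introduced for this sketch; the patent may define the quantities differently.

```latex
% Assumed notation: S_1(k), S_2(k) are the two objects' frequency coefficients,
% B_b is the set of bins in sub-band b, N_b = |B_b|, T is a preset threshold.
r(k) = \frac{|S_1(k)|^2}{|S_2(k)|^2}, \qquad
\mathrm{avrg}_b = \frac{1}{N_b} \sum_{k \in B_b} r(k), \qquad
\mathrm{var}_b = \frac{1}{N_b} \sum_{k \in B_b} \bigl( r(k) - \mathrm{avrg}_b \bigr)^2

% Division rule as described in the text: sub-band b is divided when
\mathrm{var}_b \ge T
```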
  • the analyzed b-th sub-band is divided into smaller sub-bands.
  • a parameter indicating the structure of variable sub-bands is additionally transmitted so that the variable sub-bands can be easily re-constructed during the decoding process.
  • a parameter indicating the structure of sub-bands is marked as 0 or 1 for each band.
  • A sub-band marked as 1 needs to be divided into smaller bands.
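The marking step can be sketched as follows. This assumes the dispersion coefficient is the variance of the per-bin power ratio inside each band and that a band is marked 1 when that variance reaches a threshold. The function name, the variance choice, `band_edges`, and the threshold value are all hypothetical.

```python
import numpy as np

def split_flags(spec1, spec2, band_edges, threshold, eps=1e-12):
    """Return one 0/1 flag per sub-band; 1 means 'divide this band'.

    Assumption: the dispersion coefficient is modeled as the variance of
    the per-bin power ratio |S1|^2 / |S2|^2 inside the band.
    """
    ratio = np.abs(np.asarray(spec1)) ** 2 / (np.abs(np.asarray(spec2)) ** 2 + eps)
    flags = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        var_b = np.var(ratio[lo:hi])  # spread of the power ratio in band b
        flags.append(1 if var_b >= threshold else 0)
    return flags

# Toy input: the ratio is flat in the first band and jumps in the second.
s1 = np.array([1.0, 1.0, 1.0, 3.0])
s2 = np.ones(4)
flags = split_flags(s1, s2, [0, 2, 4], threshold=1.0)
```

Because only one bit per base band is produced, the side information added by the variable structure stays small compared with the parameters themselves.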
  • Fig. 3 is a block view showing a multi-object audio decoder in accordance with an embodiment of the present invention.
  • a bitstream demultiplexer 301 receives a bitstream, separates a signal for an audio object, a signal for parameter information, and a signal for variable sub-band information, and outputs the signals to decoders 302, 304 and 305, respectively.
  • the bitstream may be an SAOC bitstream.
  • the signal for an audio object is decoded in the audio decoder 302 and outputted as a down-mixed signal.
  • the down-mixed signal goes through frequency transform in a frequency transformer 303.
  • the signal for parameter information is decoded in the parameter decoder 304 and outputted to a recovery unit 307.
  • the signal for variable sub-band information is decoded in the variable sub-band decoder 305 and outputted to a sub-band re-constructor 306.
  • the signal for parameter information and the signal for variable sub-band information may be decoded using a lossless decoding method.
  • the recovery unit 307 recovers an audio object based on the sub-band, which is re-constructed using the variable sub-band information in the sub-band re-constructor 306, the parameter information, and the frequency-transformed down-mixed signal.
  • the parameter information may be a spatial parameter including spatial cue information.
  • the recovered audio object is transformed into the time domain in a time transformer and finally outputted as an audio object. For example, two audio objects are recovered from one down-mixed signal by using an IOLD parameter based on the following Equation 4.
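Equation 4 is not reproduced in this text. The sketch below shows one common way to split a down-mixed spectrum into two objects from a per-band power ratio: an energy-preserving gain pair derived from the ratio. This is a stand-in technique chosen for illustration, not necessarily the patent's equation; the function name and the `band_edges` layout are assumptions.

```python
import numpy as np

def recover_two_objects(downmix, iold, band_edges):
    """Split a down-mixed spectrum into two objects using per-band gains.

    Assumption: iold[b] is the power ratio P1/P2 in sub-band b, and the
    gains g1, g2 are chosen so that g1^2 / g2^2 = iold[b] and
    g1^2 + g2^2 = 1 (the band energy of the down-mix is preserved).
    """
    downmix = np.asarray(downmix, dtype=complex)
    obj1 = np.zeros_like(downmix)
    obj2 = np.zeros_like(downmix)
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        r = iold[b]
        g1 = np.sqrt(r / (1.0 + r))    # share of the band given to object 1
        g2 = np.sqrt(1.0 / (1.0 + r))  # share of the band given to object 2
        obj1[lo:hi] = g1 * downmix[lo:hi]
        obj2[lo:hi] = g2 * downmix[lo:hi]
    return obj1, obj2

# Toy input: flat down-mix, object 1 dominant in the first band only.
downmix = np.ones(4, dtype=complex)
obj1, obj2 = recover_two_objects(downmix, [3.0, 1.0], [0, 2, 4])
```

Since both outputs are scaled copies of the same down-mix within a band, separation quality depends on how well one ratio describes the whole band, which is the motivation the text gives for dividing bands whose ratio fluctuates.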
  • Fig. 5 illustrates a re-construction of a variable sub-band in accordance with an embodiment of the present invention.
  • Fig. 5 describes a process of re-constructing the sub-bands used during the encoding process in the sub-band re-constructor 306.
  • 28 sub-bands are marked as 0 or 1 individually according to variable sub-band information. Bands marked as 0 are not changed, and bands marked as 1 are divided into a predetermined number of smaller bands and used.
  • a block 501 represents the 28-sub-band partition information used in MPEG Surround based on FFT.
  • An output A(k) represents the partition of the k-th band.
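The re-construction described above can be sketched as a walk over the base partition that keeps bands flagged 0 and cuts bands flagged 1 into a fixed number of parts. The function name, the edge-list representation, and the default of two parts per divided band are assumptions; the text only says "a predetermined number".

```python
def rebuild_band_edges(base_edges, flags, parts=2):
    """Rebuild the variable sub-band partition from 0/1 flags.

    base_edges : assumed layout; band b spans bins
                 base_edges[b] .. base_edges[b + 1] - 1.
    flags      : one flag per base band; 1 means the band is divided.
    parts      : hypothetical 'predetermined number' of smaller bands.
    """
    edges = [base_edges[0]]
    for (lo, hi), flag in zip(zip(base_edges[:-1], base_edges[1:]), flags):
        if flag == 1:
            width = hi - lo
            n = min(parts, width)  # a band cannot be cut finer than one bin
            for i in range(1, n):
                edges.append(lo + (i * width) // n)  # interior cut points
        edges.append(hi)
    return edges

# Toy partition with three base bands; only the middle one is divided.
new_edges = rebuild_band_edges([0, 4, 8, 16], [0, 1, 0])
```

Encoder and decoder must agree on the base partition and on `parts`; with that shared, the flags alone are enough to restore the same band layout on both sides.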
  • Fig. 6 is a view describing quantization using a variable bit level in accordance with an embodiment of the present invention.
  • a variable level quantizer 601 may be included in the parameter generator 205 shown in Fig. 2.
  • the variable level quantizer 601 analyzes a frequency band feature of an inputted parameter, performs variable bit quantization based on the feature, and outputs quantized parameters. Since the parameter generator performs variable bit quantization, it is possible to minimize an increase of bit rate.
  • Fig. 7 is a view describing dequantization using a variable bit level in accordance with an embodiment of the present invention.
  • a variable level dequantizer 701 may be included in the variable sub-band decoder 305 of Fig. 3.
  • the variable level dequantizer 701 receives a quantized parameter, performs variable bit dequantization based on a frequency band characteristic, and outputs dequantized parameters.
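The text does not state how the bit level is chosen or which quantizer is used. The sketch below is one hypothetical arrangement: a uniform quantizer over a fixed dB range, with more bits for the lower half of the bands and fewer for the upper half. The bit counts, the range of -24 dB to 24 dB, and all function names are assumptions.

```python
def bits_for_band(b, num_bands, low_bits=5, high_bits=3):
    """Hypothetical rule: lower-frequency bands get more bits."""
    return low_bits if b < num_bands // 2 else high_bits

def quantize(value_db, bits, lo=-24.0, hi=24.0):
    """Uniform quantizer: map a dB value to an index in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clipped = min(max(value_db, lo), hi)  # keep the value inside the range
    return int(round((clipped - lo) / (hi - lo) * levels))

def dequantize(index, bits, lo=-24.0, hi=24.0):
    """Inverse mapping; needs the same bit level that the quantizer used."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)

# Round trip for one parameter in the lowest of 28 bands.
bits = bits_for_band(0, 28)
idx = quantize(6.0, bits)
restored = dequantize(idx, bits)
```

The dequantizer has to derive the same bit level as the quantizer for each band, which is why the sketch computes it from the band index rather than transmitting it.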
  • the method of the present invention may be realized as a program and stored in a computer-readable recording medium such as CD-ROM, RAM, ROM, floppy disks, hard disks, magneto-optical disks and the like. Since the process can be easily implemented by those skilled in the art to which the present invention pertains, further description will not be provided herein. While the present invention has been described with respect to the specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
  • Functions of various devices illustrated in the drawings including a functional block expressed as a processor or a similar concept can be provided not only by using hardware dedicated to the functions, but also by using hardware capable of running proper software for the functions.
  • When a function is provided by a processor, the function may be provided by a single dedicated processor, a single shared processor, or a plurality of individual processors, part of which can be shared.
  • DSP digital signal processor
  • an element expressed as a means for performing a function described in the detailed description is intended to include all methods for performing the function including all formats of software, such as combinations of circuits for performing the intended function, firmware/microcode and the like.
  • the element cooperates with a proper circuit for executing the software.
  • the present invention defined by claims includes diverse means for performing particular functions, and the means are connected with each other in a method requested in the claims. Therefore, any means that can provide the function should be understood to be an equivalent to what is figured out from the present specification.
  • the encoding of the present invention based on variable sub-band analysis can produce high-quality sound while minimizing an increase of a bit rate by dividing a sub-band of an audio object transformed into a frequency domain according to a characteristic of the sub-band.
  • variable sub-band analysis transforms an audio object into a frequency domain, divides a sub-band into variable sub-bands according to the characteristic of the sub-band of a signal obtained after the transformation into the frequency domain, and generates variable sub-band information including information on the variable sub-band obtained after the sub-band division.
  • Parameter information used for recovering the audio object is generated based on the variable sub-band.
  • Variable sub-band information and parameter information are encoded.
  • the encoding may be lossless encoding.
  • An audio object becomes down-mixed audio signals and the audio signals are encoded.
  • Audio encoding may be performed in a conventional audio encoding method.
  • the variable sub-band information, the parameter information, and the audio object are encoded into bitstream.
  • Characteristics of a sub-band include a dispersion coefficient characteristic of a power ratio of each sub-band.
  • When a dispersion coefficient of a power ratio of a specific sub-band is equal to or higher than a predetermined threshold value, the sub-band is divided.
  • When the dispersion coefficient is lower than the threshold value, the sub-band is not divided and the existing sub-band is maintained.
  • parameter information can be quantized using a variable bit level, and the parameter information may include spatial parameter information. The increase in a bit rate caused by an increase in the number of sub-bands can be minimized by quantizing the parameter information based on a variable bit level.
  • the decoding of the present invention based on variable sub-band analysis can produce high-quality sound while minimizing an increase of a bit rate by recovering an audio object based on sub-band division information including information related to the division of the sub- band of audio object transformed into a frequency domain according to characteristics of the sub-band.
  • the decoding based on variable sub-band analysis includes: receiving bitstream including parameter information for recovering an audio object based on a variable sub-band and variable sub-band information having information on a variable sub-band acquired from division according to the characteristic of a sub-band of an audio object, re-constructing sub-band based on the variable sub-band information, and recovering the audio object based on the parameter information.
  • Variable sub-band information and parameter information are decoded.
  • the decoding may be lossless decoding.
  • The bitstream may include a bitstream for an audio object, and the audio object goes through audio decoding.
  • Audio decoding may be performed in a conventional audio decoding method.
  • the decoded audio object goes through frequency transform.
  • the audio object is recovered by using the frequency-transformed audio object and the reconstructed sub-band.
  • the recovered audio object goes through a time-domain transform and is outputted.
  • Characteristics of a sub-band include a dispersion coefficient characteristic of a power ratio of each sub-band. To be specific, when a dispersion coefficient of a power ratio of a specific sub-band is equal to or higher than a predetermined threshold value, the sub-band is divided. When the dispersion coefficient is lower than the threshold value, the sub-band is not divided and an existing sub-band is maintained.
  • the variable sub-band information can include sub-band characteristic information, and the sub-band can be re-constructed using the sub-band characteristic information.
  • parameter information can be dequantized using a variable bit level, and the parameter information may include spatial parameter information.
  • the increase in a bit rate caused by an increase in the number of sub-bands can be minimized by dequantizing the parameter information based on a variable bit level.
  • the encoding method of the present invention based on variable sub-band analysis includes: generating down-mixed signals out of a plurality of inputted audio objects and encoding the down-mixed signals; transforming a plurality of audio objects into a frequency domain; dividing a sub-band of a signal acquired from the transform into the frequency domain into variable sub-bands according to the characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; generating parameter information used for recovering the down-mixed signal based on the variable sub-bands; and encoding the variable sub-band information and the parameter information.
  • the characteristic of the sub-band may include a dispersion coefficient characteristic of a power ratio of sub-bands.
  • the parameter information may include spatial parameter information. Meanwhile, in the generation of the parameter information, the parameter information may be quantized using a variable bit level.
  • An encoding apparatus based on variable sub-band analysis includes an audio encoder, a frequency transformer, a sub-band constructor, a parameter generator, and an encoder.
  • the audio encoder generates down-mixed signals out of inputted multiple audio objects and encodes the down-mixed signals.
  • the frequency transformer transforms the multiple audio objects into a frequency domain.
  • the sub-band constructor divides a sub-band of the signals obtained from the transform into the frequency domain into variable sub-bands according to the characteristics of the sub-band, and generates variable sub-band information including information on the variable sub-bands.
  • the parameter generator generates parameter information used for recovering the down-mixed signals based on the variable sub-bands.
  • the encoder encodes the variable sub-band information and the parameter information.
  • the characteristics of the sub-band may include a dispersion coefficient characteristic of a power ratio of the sub-band.
  • the parameter information may include spatial parameter information.
  • the parameter generator may include a quantizer for quantizing the parameter information by using a variable bit level.
  • the decoding method of the present invention based on variable sub-band analysis includes: decoding down-mixed signals on a plurality of audio objects, variable sub-band information including information on the variable sub-bands acquired from sub-band division according to the characteristics of the multiple audio objects, and parameter information for recovering the down-mixed signals based on the variable sub-bands from inputted bitstream; transforming the decoded down-mixed signals into a frequency domain; re-constructing the sub-band based on the decoded variable sub-band information; recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and transforming the recovered audio objects into a time domain.
  • the characteristic of the sub-band may include a dispersion coefficient characteristic of a power ratio of sub-bands.
  • the parameter information may include spatial parameter information.
  • the decoding may include dequantizing the parameter information by using a variable bit level.
  • a decoding apparatus based on variable sub-band analysis includes a decoder, a frequency transformer, a sub-band re-constructor, a recovery unit, and a time transformer.
  • the decoder decodes the down-mixed signals on the multiple audio objects, the variable sub-band information including information on the variable sub-bands acquired from sub-band division according to the characteristics of the multiple audio objects, and the parameter information for recovering the down-mixed signals based on the variable sub-bands from inputted bitstream.
  • the frequency transformer transforms the decoded down-mixed signals into the frequency domain.
  • the sub-band re-constructor re-constructs the sub-band based on the decoded variable sub-band information.
  • the recovery unit recovers the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band.
  • the time transformer transforms the multiple audio objects into a time domain.
  • the characteristics of the sub-band may include a dispersion coefficient characteristic of a power ratio of sub-bands.
  • the parameter information may include spatial parameter information.
  • the decoder may include a dequantizer for dequantizing the parameter information by using a variable bit level.
  • the present invention is applied to encoding and decoding of audio signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Disclosed are encoding and decoding apparatuses and methods based on variable sub-band analysis. The encoding method based on variable sub-band analysis includes: generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and encoding the variable sub-band information and the parameter information.

Description

DESCRIPTION
ENCODING AND DECODING METHOD USING VARIABLE SUBBAND ANALYSIS AND APPARATUS THEREOF
TECHNICAL FIELD
The present invention relates to an encoding and decoding method and apparatus; and, more particularly, to an encoding and decoding method and apparatus based on variable sub-band analysis. This work was supported by the IT R&D program of MIC/IITA [2007-S-004-01, "Development of Glassless Single-User 3D Broadcasting Technologies"].
BACKGROUND ART
According to conventional technology, audio objects of diverse channels cannot be combined diversely according to the demand from a user. Accordingly, one audio content cannot be consumed in diverse forms, and this makes users mere passive consumers of audio content. A Moving Picture Experts Group (MPEG) audio subgroup, which is one of the standardization groups, has been developing audio coding standards such as Advanced Audio Coding (AAC) and MPEG Surround (MPS). AAC is a high-quality audio encoding technology for mono or stereo channel signals, and MPS is a technology appropriate for multi-channel audio encoding. Conventional audio encoding apparatuses based on these technologies have focused on channel-based audio signals. An example of such technologies is Spatial Audio Coding (SAC) technology, which is an audio encoding method recently developed based on spatial cues.
According to conventional SAC technology, multi-channel audio signals are encoded into down-mixed mono-channel signals or down-mixed stereo-channel signals and spatial cue information, and high-quality multi-channel signals are transmitted even at a low bit rate. According to the SAC technology, audio signals are analyzed for each sub-band, and original multi-channel audio signals are recovered from the down-mixed mono-channel or stereo-channel signals based on spatial cue information corresponding to each sub-band. The spatial cue information includes information to be used for recovering the original signals during a decoding process and decides the sound quality of audio signals restored in an SAC decoding apparatus. MPEG is working on standardization of SAC technology under the name of MPEG Surround (MPS) and uses Channel Level Difference (CLD) as a spatial cue.
According to conventional SAC technology, only one audio object can be encoded into multi-channel audio signals and decoded. Thus, multi-channel and multi-object audio signals, which are audio signals of diverse audio objects including multiple channels such as mono channel, stereo channel, and 5.1 channel, cannot be encoded and decoded.
Another conventional technology, Binaural Cue Coding (BCC), is capable of encoding and decoding multi-object audio signals that include only a mono channel. The BCC technology cannot encode or decode multi-object audio signals formed of channels other than the mono channel.
Therefore, when multi-object audio signals of mono channel signals, stereo channel signals, or multi-channel signals are to be transmitted, the use of a conventional audio encoding scheme inevitably runs into a bit rate problem.
To resolve the problem, yet another audio encoding scheme called Spatial Audio Object Coding (SAOC) has been introduced. The SAOC scheme is a method for encoding multi-object audio signals, whereas the conventional SAC scheme is a method focusing on multi-channel audio encoding. Multi-object audio encoding is a technology for compressing and transmitting different audio objects, and a spatial cue representing features of each object can also be extracted. While a conventional audio encoding scheme compresses each object separately, the SAOC scheme processes one or more objects simultaneously. According to the SAOC, one or more audio objects are represented by one stereo down-mixed signal and side information. Thus, the SAOC scheme can remarkably reduce the bit rate in comparison with a conventional audio encoding scheme.
Conventional audio services generally have a functional limitation in that users are passive consumers of provided audio content. The audio encoding method for each object, however, provides more active services to users. The method not only controls each audio object according to a request from a user but also creates diverse audio services and content out of one content combination. Meanwhile, a mixer or renderer, which provides such functions as panning, attenuation and suppression, can be applied to the SAOC. Thus, the SAOC scheme can flexibly control audio objects through interaction with a user.
However, since the SAC-based SAOC scheme cannot perfectly recover a sound source, there is a limitation in controlling major functions such as karaoke or solo-representation. Particularly, an SAOC system has a problem in that sound quality is seriously degraded due to a down-mix process based on a limited number of sub-bands. The number of sub-bands should be increased to solve this problem. However, when the number of sub-bands is increased indiscriminately, the amount of side information to be transmitted increases, too, which is problematic.
DISCLOSURE
TECHNICAL PROBLEM
An embodiment of the present invention is directed to providing an encoding and decoding apparatus and method that can improve sound quality by dividing a sub-band structure into smaller sub-bands while minimizing an increase of bit rate.
Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.
TECHNICAL SOLUTION
In accordance with an aspect of the present invention, there is provided an encoding method based on variable sub-band analysis, which includes: generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and encoding the variable sub-band information and the parameter information.
In accordance with another aspect of the present invention, there is provided an encoding apparatus based on variable sub-band analysis, which includes: an audio encoder for generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; a frequency transformer for transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; a sub-band constructor for dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; a parameter generator for generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and an encoder for encoding the variable sub-band information and the parameter information.
In accordance with another aspect of the present invention, there is provided a decoding method based on variable sub-band analysis, which includes: decoding down-mixed signals on multiple audio objects, variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the multiple audio objects, and the parameter information for recovering the down-mixed signals based on the variable sub-bands from an input bitstream; transforming the decoded down-mixed signals into a frequency domain; re-constructing the sub-band based on the decoded variable sub-band information; recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and transforming the recovered audio objects into a time domain.
In accordance with another aspect of the present invention, there is provided a decoding apparatus based on variable sub-band analysis, which includes: a decoder for decoding down-mixed signals on multiple audio objects, variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the multiple audio objects, and parameter information for recovering the down-mixed signals based on the variable sub-bands from an input bitstream; a frequency transformer for transforming the decoded down-mixed signals into a frequency domain to thereby produce frequency-domain signals; a sub-band re-constructor for re-constructing the sub-band based on the decoded variable sub-band information; a recovery unit for recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and a time transformer for transforming the multiple audio objects into a time domain.
In accordance with another aspect of the present invention, there is provided an encoding method based on variable sub-band analysis, which includes: transforming an audio object into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; and generating parameter information used for recovering the audio object based on the variable sub-bands.
In accordance with another aspect of the present invention, there is provided an encoding apparatus based on variable sub-band analysis, which includes: a frequency transformer for transforming an audio object into a frequency domain to thereby produce frequency-domain signals; a sub-band constructor for dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; and a parameter generator for generating parameter information used for recovering the audio objects based on the variable sub-bands.
In accordance with another aspect of the present invention, there is provided a decoding method based on variable sub-band analysis, which includes: receiving a bitstream including variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the sub-band of an audio object, and parameter information for recovering the audio objects based on the variable sub-bands; re-constructing the sub-band based on the variable sub-band information; and recovering the audio object by using the parameter information.
In accordance with another aspect of the present invention, there is provided a decoding apparatus based on variable sub-band analysis, which includes: a receiver for receiving bitstream including variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the sub-band of an audio object, and parameter information for recovering the audio objects based on the variable sub-bands; a sub-band re-constructor for reconstructing the sub-band based on the variable sub-band information; and a recovery unit for recovering the audio objects by using the parameter information.
Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art to which the present invention pertains that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof. When it is considered that detailed description on a related art may obscure a point of the present invention, the description will not be provided. Hereinafter, specific embodiments of the present invention will be described with reference to the accompanying drawings .
ADVANTAGEOUS EFFECTS
The present invention can improve sound quality by dividing a sub-band structure for an audio object.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates an audio encoding/decoding process in accordance with an embodiment of the present invention.
Fig. 2 is a block view showing a multi-object audio encoder in accordance with an embodiment of the present invention.
Fig. 3 is a block view showing a multi-object audio decoder in accordance with an embodiment of the present invention.
Fig. 4 illustrates a structure of a variable sub-band in accordance with an embodiment of the present invention.
Fig. 5 illustrates a re-constructing of a variable sub-band in accordance with an embodiment of the present invention.
Fig. 6 is a view describing quantization using a variable bit level in accordance with an embodiment of the present invention.
Fig. 7 is a view describing dequantization using a variable bit level in accordance with an embodiment of the present invention.
MODE FOR THE INVENTION
Specific embodiments of the invention will be described with reference to the accompanying drawings. Since audio signals are down-mixed into signals of one audio object in a general multi-object/multi-channel audio signal encoding process, audio objects cannot be perfectly recovered during a decoding process. Particularly, when the power of audio signals of one audio object is decreased greatly, such as a Karaoke mode, the degradation of sound quality is remarkable.
Therefore, the present invention suggests a technology that extracts more accurate parameters by variably increasing the number of sub-bands for analyzing parameters during a process of encoding/decoding multi-object/multi-channel audio signals, and clearly recovers audio objects out of the down-mixed signals. During the process, it is possible to minimize an increase of a bit rate by applying a different quantization level according to a frequency characteristic of signals.
Fig. 1 illustrates an audio encoding/decoding process in accordance with an embodiment of the present invention. An encoder 101 receives an audio object. The number of input audio objects is not limited. Thus, the encoder 101 may receive a plurality of audio objects (Object #1, Object #2, Object #3,...). The encoder 101 generates down-mixed signals by using the inputted audio objects, and extracts the parameters required during a decoding process. The parameters may include side information shown in Fig. 1. A decoder 102 performs decoding. It outputs audio objects recovered by using the down-mixed signals and the parameters transmitted from the encoder 101. The recovered audio objects go through position/level interaction control in a mixer/renderer 103 and they are outputted through channels (Channel #1, Channel #2, Channel #3, ...). Herein, the encoder 101 and the decoder 102 may employ the Spatial Audio Object Coding (SAOC) scheme.
Fig. 2 is a block view showing a multi-object audio encoder in accordance with an embodiment of the present invention. The multi-object audio encoding of the present invention includes analyzing the frequency band characteristic of signals, defining a sub-band structure used to analyze parameters, and applying a different parameter quantization method according to the frequency characteristic. The defined sub-band structure is re-constructed for recovery during a decoding process.
Audio objects (1, 2, ... , M) are inputted to an audio encoder 201 and a frequency transformer 202. The audio encoder 201 down-mixes the audio objects (1, 2, ..., M) to encode the audio objects (1, 2, ..., M). The frequency transformer 202 transforms the audio objects (1, 2, ..., M) into a frequency domain.
A sub-band constructor 203 divides a sub-band of frequency-transformed signals into variable sub-bands according to the characteristic of the sub-band. A parameter generator 205 extracts parameters needed to recover audio objects from down-mixed signals during the decoding process based on the variable sub-bands. Parameters for a sub-band may include Inter-Object Level Difference (IOLD) information. IOLD is a parameter for calculating a power ratio of two audio objects for each sub-band. The IOLD is expressed as the following Equation 1.
[Equation 1 (image imgf000012_0001 not reproduced)]
where M denotes the number of sub-bands; k denotes a frequency coefficient; and b denotes a sub-band index.
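Equation 1 survives in this text only as an image reference. A form consistent with the surrounding definitions (a per-sub-band power ratio of two audio objects) and with the decibel scaling implied by Equation 4 might read as follows. The band-edge function A(b) is borrowed from Equations 2 and 3, and the squared-magnitude power measure and exact notation are assumptions, not a transcription of the original figure.

```latex
\mathrm{IOLD}(b) = 10 \log_{10}
  \frac{\sum_{k=A(b)}^{A(b+1)-1} \left|\mathrm{Object\#1}(k)\right|^{2}}
       {\sum_{k=A(b)}^{A(b+1)-1} \left|\mathrm{Object\#2}(k)\right|^{2}},
  \qquad 0 \le b \le M-1
```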
Also, a numerator term and a denominator term may be defined being switched with each other. A sub-band may be a fixed sub-band fixed according to an encoding method. For example, Moving Picture Experts Group (MPEG) Surround applies 20 to 28 fixed sub-bands to one audio signal frame. When calculation is performed with respect to each fixed sub-band and the resolution power of a band analyzing a fixed sub-band is low, there is a problem in that two signals are not separated out during the decoding process. To improve the separation performance and analyze parameters more accurately, the sub-band constructor 203 forms a sub-band of variable sub-bands. The sub-band constructor 203 will be described in detail with reference to Fig. 4.
A first encoder 204 encodes variable sub-band information generated in the sub-band constructor 203. A second encoder 206 encodes parameter information including the parameters generated in a parameter generator 205. The first and second encoders 204 and 206 may use a lossless coding method. A bitstream formatter 207 generates encoded variable sub-band information, parameter information, and audio objects into bitstreams. The generated bitstreams may be SAOC bitstreams.
Fig. 4 illustrates a structure of a variable sub-band in accordance with an embodiment of the present invention. The sub-band constructor 203 of Fig. 2 may include a spectrum analyzer 401 shown in Fig. 4. In Fig. 4, an object to be freely controlled by a user is referred to as Object #1, and the other objects are referred to as Object #2, Object #3, .... Herein, the spectrum analyzer 401 analyzes power of the frequency band of each signal and outputs new sub-band information, which is variable sub-band information.
The basic structure of a sub-band used for analyzing parameters follows 28 bands used in the MPEG Surround. When the power ratio of two signals within each sub-band fluctuates, a specific band is divided in smaller bands. The condition can be expressed as the following Equations 2 and 3.
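$$\mathrm{avrg}_b = \frac{\sum_{k=A(b)}^{A(b+1)-1} \left[ \log(\mathrm{power}(\mathrm{Object\#1}(k))) - \log(\mathrm{power}(\mathrm{Object\#2}(k))) \right]}{A(b+1) - A(b)}$$
Eq. 2
$$\mathrm{var}_b = \frac{\sum_{k=A(b)}^{A(b+1)-1} \left[ \log(\mathrm{power}(\mathrm{Object\#1}(k))) - \log(\mathrm{power}(\mathrm{Object\#2}(k))) - \mathrm{avrg}_b \right]^{2}}{A(b+1) - A(b)}$$
Eq. 3
where avrg_b denotes an average power ratio of two signals within a bth sub-band; and var_b denotes a dispersion coefficient indicating the extent of change of the power ratio of the two signals.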
When the dispersion coefficient acquired from Equation 3 exceeds a predetermined threshold value, the analyzed bth sub-band is divided into smaller sub-bands. A parameter indicating the structure of variable sub-bands is additionally transmitted so that the variable sub-bands can be easily re-constructed during the decoding process. For example, a parameter indicating the structure of sub-bands is marked as 0 or 1 for each band. Sub-bands marked as 1 signify that the band needs to be divided into smaller bands.
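The split decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `eps` floor that guards against log(0), the band-edge list `edges` (standing in for A(b)), and the squared deviation used for the dispersion coefficient are all hypothetical choices made for the example.

```python
import math

def band_stats(p1, p2, edges, b, eps=1e-12):
    """Mean (Eq. 2) and dispersion (Eq. 3) of the log power ratio in sub-band b.

    p1, p2 : per-bin power of Object #1 and Object #2 (assumed inputs)
    edges  : band edges, so band b covers bins edges[b] .. edges[b+1]-1
    eps    : hypothetical floor to avoid taking log(0)
    """
    lo, hi = edges[b], edges[b + 1]
    ratios = [math.log(max(p1[k], eps)) - math.log(max(p2[k], eps))
              for k in range(lo, hi)]
    avrg = sum(ratios) / len(ratios)
    var = sum((r - avrg) ** 2 for r in ratios) / len(ratios)
    return avrg, var

def split_flags(p1, p2, edges, threshold):
    """Return one 0/1 flag per sub-band; 1 marks a band to be divided."""
    flags = []
    for b in range(len(edges) - 1):
        _, var = band_stats(p1, p2, edges, b)
        flags.append(1 if var > threshold else 0)
    return flags
```

A band whose two objects keep a constant power ratio yields a dispersion of zero and is left alone, while a band in which the ratio swings between bins is flagged for division. The resulting flag list corresponds to the 0/1 structure parameter that is transmitted alongside the other side information.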
Fig. 3 is a block view showing a multi-object audio decoder in accordance with an embodiment of the present invention. A bitstream demultiplexer 301 receives a bitstream, separates a signal for an audio object, a signal for parameter information, and a signal for variable sub-band information, and outputs the signals to decoders 302, 304 and 305, respectively. Herein, the bitstream may be an SAOC bitstream. The signal for an audio object is decoded in the audio decoder 302 and outputted as a down-mixed signal. The down-mixed signal goes through frequency transform in a frequency transformer 303. The signal for parameter information is decoded in the parameter decoder 304 and outputted to a recovery unit 307. The signal for variable sub-band information is decoded in the variable sub-band decoder 305 and outputted to a sub-band re-constructor 306. The signal for parameter information and the signal for variable sub-band information may be decoded using a lossless decoding method. The recovery unit 307 recovers an audio object based on the sub-band, which is re-constructed using the variable sub-band information in the sub-band re-constructor 306, and the parameter information and the down-mixed signal of the frequency-transformed audio object. Herein, the parameter information may be a spatial parameter including spatial cue information. The recovered audio object is transformed into the time domain in a time transformer and finally outputted as an audio object. For example, two audio objects are recovered from one down-mixed signal by using an IOLD parameter based on the following Equation 4.
$$\mathrm{Object\#1} = \mathrm{Downmix} \times \frac{10^{\mathrm{IOLD}/10}}{1 + 10^{\mathrm{IOLD}/10}}$$
$$\mathrm{Object\#2} = \mathrm{Downmix} \times \frac{1}{1 + 10^{\mathrm{IOLD}/10}}$$
Eq. 4
Fig. 5 illustrates a re-construction of a variable sub-band in accordance with an embodiment of the present invention. Fig. 5 describes a process of re-constructing the sub-bands used during the encoding process in the sub-band re-constructor 306. 28 sub-bands are marked as 0 or 1 individually according to variable sub-band information. Bands marked as 0 are not changed, and bands marked as 1 are divided into a predetermined number of smaller bands and used. A block 501 represents 28-sub-band partition information used in the MPEG Surround based on FFT. An output A(k) represents partition of a kth band.
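The decoder-side steps, rebuilding the band partition from the 0/1 flags and then applying Equation 4 within each band, might be sketched as follows. The function names, the default of two smaller bands per flagged band (`parts=2`), and the equal-width split are assumptions made for illustration; the text only states that flagged bands are divided into a predetermined number of smaller bands.

```python
def rebuild_band_edges(edges, flags, parts=2):
    """Re-construct band edges: bands flagged 1 are split into `parts`
    equal-width pieces (hypothetical rule); bands flagged 0 are kept."""
    new_edges = [edges[0]]
    for b, flag in enumerate(flags):
        lo, hi = edges[b], edges[b + 1]
        width = hi - lo
        if flag == 1 and width >= parts:
            # Integer arithmetic keeps the inner edges strictly increasing.
            for i in range(1, parts):
                new_edges.append(lo + (i * width) // parts)
        new_edges.append(hi)
    return new_edges

def split_downmix(downmix, iold_db):
    """Apply Eq. 4: share the down-mix between two objects from an IOLD in dB."""
    ratio = 10.0 ** (iold_db / 10.0)
    obj1 = downmix * ratio / (1.0 + ratio)
    obj2 = downmix * 1.0 / (1.0 + ratio)
    return obj1, obj2
```

Because the two weights in Equation 4 sum to one, the recovered components always add back up to the down-mix value, which is a convenient sanity check when experimenting with the sketch.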
Fig. 6 is a view describing quantization using a variable bit level in accordance with an embodiment of the present invention. A variable level quantizer 601 may be included in the parameter generator 205 shown in Fig. 2. The variable level quantizer 601 analyzes a frequency band feature of an inputted parameter, performs variable bit quantization based on the feature, and outputs quantized parameters. Since the parameter generator performs variable bit quantization, it is possible to minimize an increase of bit rate.
Fig. 7 is a view describing dequantization using a variable bit level in accordance with an embodiment of the present invention. A variable level dequantizer 701 may be included in the variable sub-band decoder 305 of Fig. 3. The variable level dequantizer 701 receives a quantized parameter, performs variable bit dequantization based on a frequency band characteristic, and outputs a dequantized parameter.
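The text does not specify how the bit level is chosen or what quantizer is used, so the following is only a sketch of one way variable bit quantization and dequantization could fit together. The uniform quantizer, the ±30 dB parameter range, and the rule in `bits_for_band` (more bits for the lower half of the bands) are all hypothetical and serve only to show an encoder and decoder agreeing on a per-band bit count.

```python
def bits_for_band(b, num_bands, high_bits=5, low_bits=3):
    """Hypothetical rule: spend more bits on the lower half of the bands."""
    return high_bits if b < num_bands // 2 else low_bits

def quantize(value, bits, lo=-30.0, hi=30.0):
    """Uniform quantizer: map a parameter in [lo, hi] to an index of `bits` bits."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)

def dequantize(index, bits, lo=-30.0, hi=30.0):
    """Inverse mapping: recover the parameter value from its index."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)
```

In this sketch the bit count is derived from the band index alone, so the decoder can recompute it without extra side information. With `bits` bits, the round-trip error stays within half of the step size (hi - lo) / (2^bits - 1).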
As described above, the method of the present invention may be realized as a program and stored in a computer-readable recording medium such as CD-ROM, RAM, ROM, floppy disks, hard disks, magneto-optical disks and the like. Since the process can be easily implemented by those skilled in the art to which the present invention pertains, further description will not be provided herein. While the present invention has been described with respect to the specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
BEST MODE
The following description exemplifies only the principles of the present invention. Even if they are not described or illustrated clearly in the present specification, one of ordinary skill in the art can embody the principles of the present invention and invent various apparatuses within the concept and scope of the present invention. The use of the conditional terms and embodiments presented in the present specification are intended only to make the concept of the present invention understood, and they are not limited to the embodiments and conditions mentioned in the specification.
In addition, all the detailed description on the principles, viewpoints and embodiments and particular embodiments of the present invention should be understood to include structural and functional equivalents to them. The equivalents include not only currently known equivalents but also those to be developed in future, that is, all devices invented to perform the same function, regardless of their structures. For example, block diagrams of the present invention should be understood to show a conceptual viewpoint of an exemplary circuit that embodies the principles of the present invention. Similarly, all the flowcharts, state conversion diagrams, pseudo codes and the like can be expressed substantially in a computer-readable medium, and whether or not a computer or a processor is described distinctively, they should be understood to express various processes operated by a computer or a processor. Functions of various devices illustrated in the drawings including a functional block expressed as a processor or a similar concept can be provided not only by using hardware dedicated to the functions, but also by using hardware capable of running proper software for the functions. When a function is provided by a processor, the function may be provided by a single dedicated processor, single shared processor, or a plurality of individual processors, part of which can be shared.
The apparent use of a term such as 'processor', 'control' or a similar concept should not be understood to exclusively refer to a piece of hardware capable of running software, but should be understood to implicitly include a digital signal processor (DSP), hardware, and ROM, RAM and non-volatile memory for storing software. Other known and commonly used hardware may be included therein, too.
In the claims of the present specification, an element expressed as a means for performing a function described in the detailed description is intended to include all methods for performing the function including all formats of software, such as combinations of circuits for performing the intended function, firmware/microcode and the like.
To perform the intended function, the element is cooperated with a proper circuit for performing the software. The present invention defined by claims includes diverse means for performing particular functions, and the means are connected with each other in a method requested in the claims. Therefore, any means that can provide the function should be understood to be an equivalent to what is figured out from the present specification.
Other objects and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. The same reference numeral is given to the same element, although the element appears in different drawings. In addition, if detailed description on the related arts is considered to obscure a point of the present invention, the description is omitted. Hereafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
The encoding of the present invention based on variable sub-band analysis can produce high-quality sound while minimizing an increase of a bit rate by dividing a sub-band of an audio object transformed into a frequency domain according to a characteristic of the sub-band.
To be specific, the encoding based on variable sub-band analysis transforms an audio object into a frequency domain, divides a sub-band into variable sub-bands according to the characteristic of the sub-band of a signal obtained after the transformation into the frequency domain, and generates variable sub-band information including information on the variable sub-band obtained after the sub-band division. Parameter information used for recovering the audio object is generated based on the variable sub-band. By dividing the sub-band, it is possible to provide a high-quality sound, and an increase in the bit rate can be suppressed by selectively dividing the sub-band.
Variable sub-band information and parameter information are encoded. The encoding may be lossless encoding. An audio object becomes down-mixed audio signals and the audio signals are encoded. Audio encoding may be performed in a conventional audio encoding method. The variable sub-band information, the parameter information, and the audio object are encoded into bitstream.
Characteristics of a sub-band include a dispersion coefficient characteristic of a power ratio of each sub-band. To be specific, when a dispersion coefficient of a power ratio of a specific sub-band is equal to or higher than a predetermined threshold value, the sub-band is divided. When the dispersion coefficient is lower than the threshold value, the sub-band is not divided and an existing sub-band is maintained.
Meanwhile, parameter information can be quantized using a variable bit level, and the parameter information may include spatial parameter information. The increase in a bit rate caused by an increase in the number of sub-bands can be minimized by quantizing the parameter information based on a variable bit level.
The decoding of the present invention based on variable sub-band analysis can produce high-quality sound while minimizing an increase of a bit rate by recovering an audio object based on sub-band division information including information related to the division of the sub-band of an audio object transformed into a frequency domain according to characteristics of the sub-band.
To be specific, the decoding based on variable sub-band analysis includes: receiving a bitstream including parameter information for recovering an audio object based on a variable sub-band and variable sub-band information having information on a variable sub-band acquired from division according to the characteristic of a sub-band of an audio object, re-constructing the sub-band based on the variable sub-band information, and recovering the audio object based on the parameter information. By dividing the sub-band, it is possible to provide a high-quality sound, and an increase in the bit rate can be suppressed by selectively dividing the sub-band.
Variable sub-band information and parameter information are decoded. The decoding may be lossless decoding. The bitstream may include a bitstream on an audio object, and the audio object goes through audio decoding. Audio decoding may be performed in a conventional audio decoding method. The decoded audio object goes through frequency transform. The audio object is recovered by using the frequency-transformed audio object and the re-constructed sub-band. The recovered audio object goes through a time-domain transform and is outputted.
Characteristics of a sub-band include a dispersion coefficient characteristic of a power ratio of each sub-band. To be specific, when a dispersion coefficient of a power ratio of a specific sub-band is equal to or higher than a predetermined threshold value, the sub-band is divided. When the dispersion coefficient is lower than the threshold value, the sub-band is not divided and an existing sub-band is maintained. The variable sub-band information can include sub-band characteristic information, and the sub-band can be re-constructed using the sub-band characteristic information.
Meanwhile, the parameter information can be dequantized using a variable bit level, and the parameter information may include spatial parameter information. Dequantizing the parameter information based on a variable bit level minimizes the increase in the bit rate caused by the increase in the number of sub-bands.
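A minimal Python sketch of dequantization with a variable bit level is given below. It assumes a uniform quantization grid over a fixed value range and assumes that the bit level of each parameter is known to the decoder; the range of -24 to 24, the function name `dequantize`, and the (index, bits) pairing are hypothetical and are not taken from the description.

```python
def dequantize(index, bits, lo=-24.0, hi=24.0):
    """Map an integer index back to a value on a uniform grid with 2**bits levels."""
    if bits < 1:
        raise ValueError("bit level must be at least 1")
    max_index = (1 << bits) - 1
    if not 0 <= index <= max_index:
        raise ValueError("index out of range for the given bit level")
    return lo + (hi - lo) * index / max_index


# Each parameter carries its own bit level (assumed to be known at the decoder).
coded = [(0, 2), (3, 2), (8, 4), (15, 4)]
print([dequantize(index, bits) for index, bits in coded])
```

A parameter coded with fewer bits is restored on a coarser grid, which is how a lower bit level trades precision for bit rate in this sketch.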
Hereafter, specific embodiments of the present invention will be described.
<Encoding>
The encoding method of the present invention based on variable sub-band analysis includes: generating down-mixed signals out of a plurality of inputted audio objects and encoding the down-mixed signals; transforming the plurality of audio objects into a frequency domain; dividing a sub-band of the signals acquired from the transform into the frequency domain into variable sub-bands according to the characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and encoding the variable sub-band information and the parameter information. Herein, the characteristic of the sub-band may include a dispersion coefficient characteristic of a power ratio of sub-bands. The parameter information may include spatial parameter information. Meanwhile, in the generation of the parameter information, the parameter information may be quantized using a variable bit level.
An encoding apparatus based on variable sub-band analysis includes an audio encoder, a frequency transformer, a sub-band constructor, a parameter generator, and an encoder. The audio encoder generates down-mixed signals out of inputted multiple audio objects and encodes the down-mixed signals. The frequency transformer transforms the multiple audio objects into a frequency domain. The sub-band constructor divides a sub-band of the signals obtained from the transform into the frequency domain into variable sub-bands according to the characteristics of the sub-band, and generates variable sub-band information including information on the variable sub-bands. The parameter generator generates parameter information used for recovering the down-mixed signals based on the variable sub-bands. The encoder encodes the variable sub-band information and the parameter information. Herein, the characteristics of the sub-band may include a dispersion coefficient characteristic of a power ratio of the sub-band. The parameter information may include spatial parameter information.
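The parameter generation step of the encoding method can be illustrated with a short Python sketch. It assumes that the parameter for each sub-band is each object's share of the total power in that band, which is one plausible form of spatial parameter but is not stated in the description. The names `band_power` and `power_ratios`, and the representation of spectra as plain lists of bin values, are hypothetical.

```python
def band_power(spectrum, lo, hi):
    """Sum of squared magnitudes over bins lo .. hi-1."""
    return sum(abs(x) ** 2 for x in spectrum[lo:hi])


def power_ratios(object_spectra, edges):
    """For each sub-band, each object's share of the total power in that band."""
    ratios = []
    for lo, hi in zip(edges, edges[1:]):
        powers = [band_power(s, lo, hi) for s in object_spectra]
        total = sum(powers)
        ratios.append([p / total if total > 0 else 0.0 for p in powers])
    return ratios


# Two audio objects, four frequency bins, two sub-bands: bins 0-1 and bins 2-3.
obj_a = [1.0, 1.0, 0.0, 0.0]
obj_b = [1.0, 1.0, 2.0, 2.0]
print(power_ratios([obj_a, obj_b], [0, 2, 4]))
```

Passing a finer `edges` list, such as one produced after dividing a sub-band, yields one parameter set per variable sub-band, which is the reason the number of parameters grows with the number of divisions.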
Meanwhile, the parameter generator may include a quantizer for quantizing the parameter information by using a variable bit level.
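The quantizer with a variable bit level might look like the following Python sketch. The description does not state how the bit level is chosen, so the sketch assumes a hypothetical policy in which sub-bands created by a division are coded with fewer bits than undivided ones. The value range, the bit counts, and the names `quantize` and `quantize_bands` are likewise assumptions.

```python
def quantize(value, bits, lo=-24.0, hi=24.0):
    """Map a value to the nearest index on a uniform grid with 2**bits levels."""
    if bits < 1:
        raise ValueError("bit level must be at least 1")
    max_index = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * max_index)


def quantize_bands(values, was_split, coarse_bits=3, fine_bits=5):
    """Return (index, bits) pairs; divided bands get the coarse bit level (assumed policy)."""
    coded = []
    for value, split in zip(values, was_split):
        bits = coarse_bits if split else fine_bits
        coded.append((quantize(value, bits), bits))
    return coded


params = [-6.0, 3.0, 3.5, 12.0]
split = [False, True, True, False]  # the two middle bands came from one division
coded = quantize_bands(params, split)
print(coded)
print(sum(bits for _, bits in coded))  # total bits spent on these parameters
```

In this example the four parameters cost 16 bits instead of the 20 bits that a fixed 5-bit level would require, which illustrates how a variable bit level can offset the cost of additional sub-bands.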
<Decoding>
The decoding method of the present invention based on variable sub-band analysis includes: decoding, from an inputted bitstream, down-mixed signals on a plurality of audio objects, variable sub-band information including information on the variable sub-bands acquired from sub-band division according to the characteristics of the multiple audio objects, and parameter information for recovering the down-mixed signals based on the variable sub-bands; transforming the decoded down-mixed signals into a frequency domain; re-constructing the sub-band based on the decoded variable sub-band information; recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and transforming the recovered audio objects into a time domain. Herein, the characteristic of the sub-band may include a dispersion coefficient characteristic of a power ratio of sub-bands. The parameter information may include spatial parameter information.
Meanwhile, the decoding may include dequantizing the parameter information by using a variable bit level.
A decoding apparatus based on variable sub-band analysis includes a decoder, a frequency transformer, a sub-band re-constructor, a recovery unit, and a time transformer. The decoder decodes, from an inputted bitstream, the down-mixed signals on the multiple audio objects, the variable sub-band information including information on the variable sub-bands acquired from sub-band division according to the characteristics of the multiple audio objects, and the parameter information for recovering the down-mixed signals based on the variable sub-bands. The frequency transformer transforms the decoded down-mixed signals into the frequency domain. The sub-band re-constructor re-constructs the sub-band based on the decoded variable sub-band information. The recovery unit recovers the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band. The time transformer transforms the multiple audio objects into a time domain. Herein, the characteristics of the sub-band may include a dispersion coefficient characteristic of a power ratio of sub-bands. The parameter information may include spatial parameter information.
Meanwhile, the decoder may include a dequantizer for dequantizing the parameter information by using a variable bit level.
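The recovery step of the decoding described above can be sketched in Python as follows. The sketch assumes that the decoded parameter for each sub-band is each object's share of the down-mix power in that band, and that an object is estimated by scaling the frequency-domain down-mix with the square root of that share. Both the parameter form and the name `recover_objects` are assumptions made for illustration; the frequency and time transforms are omitted.

```python
import math


def recover_objects(downmix, edges, ratios):
    """Estimate each object's spectrum by scaling the down-mix per sub-band.

    downmix: frequency-domain down-mix, one value per bin.
    edges:   re-constructed sub-band boundaries (bin indices).
    ratios:  ratios[b][k] is object k's assumed power share in sub-band b.
    """
    n_objects = len(ratios[0])
    objects = [[0.0] * len(downmix) for _ in range(n_objects)]
    for b, (lo, hi) in enumerate(zip(edges, edges[1:])):
        for k in range(n_objects):
            gain = math.sqrt(ratios[b][k])  # power share -> amplitude gain
            for i in range(lo, hi):
                objects[k][i] = gain * downmix[i]
    return objects


downmix = [2.0, 2.0, 2.0, 2.0]
edges = [0, 2, 4]                    # two re-constructed sub-bands
ratios = [[0.25, 0.75], [1.0, 0.0]]  # per-band power shares of two objects
for spectrum in recover_objects(downmix, edges, ratios):
    print(spectrum)
```

The finer the re-constructed sub-bands, the more closely the per-band gains can follow each object's spectrum, which is the quality benefit that the variable division is meant to provide.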
INDUSTRIAL APPLICABILITY
The present invention is applied to encoding and decoding of audio signals.

Claims

WHAT IS CLAIMED IS
1. An encoding method based on variable sub-band analysis, comprising: generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and encoding the variable sub-band information and the parameter information.
2. The encoding method of claim 1, wherein the characteristic of the sub-band includes a dispersion coefficient characteristic of a power ratio of the sub-band.
3. The encoding method of claim 1, wherein said generating parameter information includes: quantizing the parameter information by using a variable bit level.
4. The encoding method of claim 1, wherein the parameter information is spatial parameter information.
5. An encoding apparatus based on variable sub-band analysis, comprising: an audio encoder for generating down-mixed signals out of inputted multiple audio objects and encoding the down-mixed signals; a frequency transformer for transforming the multiple audio objects into a frequency domain to thereby produce frequency-domain signals; a sub-band constructor for dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; a parameter generator for generating parameter information used for recovering the down-mixed signals based on the variable sub-bands; and an encoder for encoding the variable sub-band information and the parameter information.
6. The encoding apparatus of claim 5, wherein the characteristic of the sub-band includes a dispersion coefficient characteristic of a power ratio of the sub-band.
7. The encoding apparatus of claim 5, wherein the parameter generator includes: a quantizer for quantizing the parameter information by using a variable bit level.
8. The encoding apparatus of claim 5, wherein the parameter information is spatial parameter information.
9. A decoding method based on variable sub-band analysis, comprising: decoding down-mixed signals on multiple audio objects, variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the multiple audio objects, and the parameter information for recovering the down-mixed signals based on the variable sub-bands from inputted bitstream; transforming the decoded down-mixed signals into a frequency domain; re-constructing the sub-band based on the decoded variable sub-band information; recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the re-constructed sub-band; and transforming the recovered audio objects into a time domain.
10. The decoding method of claim 9, wherein the characteristic of the sub-band includes a dispersion coefficient characteristic of a power ratio of the sub-band.
11. The decoding method of claim 9, wherein said decoding includes: dequantizing the parameter information by using a variable bit level.
12. The decoding method of claim 9, wherein the parameter information is spatial parameter information.
13. A decoding apparatus based on variable sub-band analysis, comprising: a decoder for decoding down-mixed signals on multiple audio objects, variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the multiple audio objects, and parameter information for recovering the down-mixed signals based on the variable sub-bands from inputted bitstream; a frequency transformer for transforming the decoded down-mixed signals into a frequency domain to thereby produce frequency-domain signals; a sub-band re-constructor for re-constructing the sub-band based on the decoded variable sub-band information; a recovery unit for recovering the multiple audio objects by using the decoded parameter information, the frequency-domain down-mixed signals, and the reconstructed sub-band; and a time transformer for transforming the multiple audio objects into a time domain.
14. The decoding apparatus of claim 13, wherein the characteristic of the sub-band includes a dispersion coefficient characteristic of a power ratio of the sub-band.
15. The decoding apparatus of claim 13, wherein the decoder includes: a dequantizer for dequantizing the parameter information by using a variable bit level.
16. The decoding apparatus of claim 13, wherein the parameter information is spatial parameter information.
17. An encoding method based on variable sub-band analysis, comprising: transforming an audio object into a frequency domain to thereby produce frequency-domain signals; dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; and generating parameter information used for recovering the audio object based on the variable sub-bands.
18. An encoding apparatus based on variable sub-band analysis, comprising: a frequency transformer for transforming an audio object into a frequency domain to thereby produce frequency-domain signals; a sub-band constructor for dividing a sub-band of the frequency-domain signals into variable sub-bands based on a characteristic of the sub-band and generating variable sub-band information including information on the variable sub-bands; and a parameter generator for generating parameter information used for recovering the audio objects based on the variable sub-bands.
19. A decoding method based on variable sub-band analysis, comprising: receiving bitstream including variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the sub-band of an audio object, and parameter information for recovering the audio objects based on the variable sub-bands; re-constructing the sub-band based on the variable sub-band information; and recovering the audio object by using the parameter information.
20. A decoding apparatus based on variable sub-band analysis, comprising: a receiver for receiving bitstream including variable sub-band information including information on variable sub-bands acquired by dividing a sub-band based on a characteristic of the sub-band of an audio object, and parameter information for recovering the audio objects based on the variable sub-bands; a sub-band re-constructor for re-constructing the sub-band based on the variable sub-band information; and a recovery unit for recovering the audio objects by using the parameter information.
PCT/KR2008/005824 2007-10-12 2008-10-02 Encoding and decoding method using variable subband analysis and apparatus thereof WO2009048239A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20070103184 2007-10-12
KR10-2007-0103184 2007-10-12
KR10-2008-0095541 2008-09-29
KR1020080095541A KR20090037806A (en) 2007-10-12 2008-09-29 Encoding and decoding method using variable subband aanlysis and apparatus thereof

Publications (2)

Publication Number Publication Date
WO2009048239A2 true WO2009048239A2 (en) 2009-04-16
WO2009048239A3 WO2009048239A3 (en) 2009-05-28

Family

ID=40549727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/005824 WO2009048239A2 (en) 2007-10-12 2008-10-02 Encoding and decoding method using variable subband analysis and apparatus thereof

Country Status (1)

Country Link
WO (1) WO2009048239A2 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2693431A1 (en) * 2012-08-01 2014-02-05 Nintendo Co., Ltd. Data compression apparatus, data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
WO2014147441A1 (en) * 2013-03-20 2014-09-25 Nokia Corporation Audio signal encoder comprising a multi-channel parameter selector
US9031852B2 (en) 2012-08-01 2015-05-12 Nintendo Co., Ltd. Data compression apparatus, computer-readable storage medium having stored therein data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
US9420375B2 (en) 2012-10-05 2016-08-16 Nokia Technologies Oy Method, apparatus, and computer program product for categorical spatial analysis-synthesis on spectrum of multichannel audio signals
US9672837B2 (en) 2013-09-12 2017-06-06 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US9820077B2 (en) 2014-07-25 2017-11-14 Dolby Laboratories Licensing Corporation Audio object extraction with sub-band object probability estimation
RU2646337C1 (en) * 2014-03-28 2018-03-02 Самсунг Электроникс Ко., Лтд. Method and device for rendering acoustic signal and machine-readable record media
US9911423B2 (en) 2014-01-13 2018-03-06 Nokia Technologies Oy Multi-channel audio signal classifier
EP2599081B1 (en) * 2010-07-30 2020-12-23 Qualcomm Incorporated(1/3) Systems, methods, apparatus, and computer-readable media for dynamic bit allocation
CN113314132A (en) * 2021-05-17 2021-08-27 武汉大学 Audio object coding method, decoding method and device applied to interactive audio system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007089131A1 (en) * 2006-02-03 2007-08-09 Electronics And Telecommunications Research Institute Method and apparatus for control of randering multiobject or multichannel audio signal using spatial cue

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007089131A1 (en) * 2006-02-03 2007-08-09 Electronics And Telecommunications Research Institute Method and apparatus for control of randering multiobject or multichannel audio signal using spatial cue

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
'19th International Congress on Acoustics, Madrid, 2-7 September 2007', article JEROEN BREEBAART ET AL.: 'Spatial psychoacoustics as the basis for innovations in the field of audio coding and processing' *
CHRISTOF FALLER ET AL.: 'Parametric coding of spatial audio' ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE THESE POUR L'OBTENTION DU GRADE DE DOCTEUR ES SCIENCES no. 3062, 2004, pages 84 - 89 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2599081B1 (en) * 2010-07-30 2020-12-23 Qualcomm Incorporated(1/3) Systems, methods, apparatus, and computer-readable media for dynamic bit allocation
US10229688B2 (en) 2012-08-01 2019-03-12 Nintendo Co., Ltd. Data compression apparatus, computer-readable storage medium having stored therein data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
US9031852B2 (en) 2012-08-01 2015-05-12 Nintendo Co., Ltd. Data compression apparatus, computer-readable storage medium having stored therein data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
EP2693431A1 (en) * 2012-08-01 2014-02-05 Nintendo Co., Ltd. Data compression apparatus, data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
US9420375B2 (en) 2012-10-05 2016-08-16 Nokia Technologies Oy Method, apparatus, and computer program product for categorical spatial analysis-synthesis on spectrum of multichannel audio signals
US10199044B2 (en) 2013-03-20 2019-02-05 Nokia Technologies Oy Audio signal encoder comprising a multi-channel parameter selector
WO2014147441A1 (en) * 2013-03-20 2014-09-25 Nokia Corporation Audio signal encoder comprising a multi-channel parameter selector
US11838798B2 (en) 2013-09-12 2023-12-05 Dolby International Ab Method and apparatus for audio decoding based on dequantization of quantized parameters
US11297533B2 (en) 2013-09-12 2022-04-05 Dolby International Ab Method and apparatus for audio decoding based on dequantization of quantized parameters
US10057808B2 (en) 2013-09-12 2018-08-21 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US9672837B2 (en) 2013-09-12 2017-06-06 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10694424B2 (en) 2013-09-12 2020-06-23 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10383003B2 (en) 2013-09-12 2019-08-13 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US9911423B2 (en) 2014-01-13 2018-03-06 Nokia Technologies Oy Multi-channel audio signal classifier
US10687162B2 (en) 2014-03-28 2020-06-16 Samsung Electronics Co., Ltd. Method and apparatus for rendering acoustic signal, and computer-readable recording medium
US10382877B2 (en) 2014-03-28 2019-08-13 Samsung Electronics Co., Ltd. Method and apparatus for rendering acoustic signal, and computer-readable recording medium
US10149086B2 (en) 2014-03-28 2018-12-04 Samsung Electronics Co., Ltd. Method and apparatus for rendering acoustic signal, and computer-readable recording medium
RU2646337C1 (en) * 2014-03-28 2018-03-02 Самсунг Электроникс Ко., Лтд. Method and device for rendering acoustic signal and machine-readable record media
US10638246B2 (en) 2014-07-25 2020-04-28 Dolby Laboratories Licensing Corporation Audio object extraction with sub-band object probability estimation
US9820077B2 (en) 2014-07-25 2017-11-14 Dolby Laboratories Licensing Corporation Audio object extraction with sub-band object probability estimation
CN113314132A (en) * 2021-05-17 2021-08-27 武汉大学 Audio object coding method, decoding method and device applied to interactive audio system
CN113314132B (en) * 2021-05-17 2022-05-17 武汉大学 Audio object coding method, decoding method and device in interactive audio system

Also Published As

Publication number Publication date
WO2009048239A3 (en) 2009-05-28

Similar Documents

Publication Publication Date Title
USRE49492E1 (en) Audio or video encoder, audio or video decoder and related methods for processing multi-channel audio or video signals using a variable prediction direction
JP6170520B2 (en) Audio and / or speech signal encoding and / or decoding method and apparatus
RU2710949C1 (en) Device and method for stereophonic filling in multichannel coding
EP2947653B1 (en) Multi-channel audio coding using complex prediction and window shape information
KR101303441B1 (en) Audio coding using downmix
WO2009048239A2 (en) Encoding and decoding method using variable subband analysis and apparatus thereof
KR101452722B1 (en) Method and apparatus for encoding and decoding signal
CN105957532B (en) Method and apparatus for encoding and decoding audio/speech signal
US20080077412A1 (en) Method, medium, and system encoding and/or decoding audio signals by using bandwidth extension and stereo coding
KR20100087661A (en) Method of coding/decoding audio signal and apparatus for enabling the method
KR20090095009A (en) Method and apparatus for encoding/decoding multi-channel audio using plurality of variable length code tables
KR20170024581A (en) Method and apparatus for decoding a compressed hoa representation, and method and apparatus for encoding a compressed hoa representation
JP2015528926A (en) Generalized spatial audio object coding parametric concept decoder and method for downmix / upmix multichannel applications
KR20170028886A (en) Method and apparatus for decoding a compressed hoa representation, and method and apparatus for encoding a compressed hoa representation
KR20100063639A (en) Decoder and decoding method for multichannel audio coder using sound source location cue
KR101434209B1 (en) Apparatus for encoding audio/speech signal
KR101434207B1 (en) Method of encoding audio/speech signal
KR101434206B1 (en) Apparatus for decoding a signal
KR20090037806A (en) Encoding and decoding method using variable subband aanlysis and apparatus thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08837059

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08837059

Country of ref document: EP

Kind code of ref document: A2