CN100364235C - Apparatus and methods for multichannel digital audio coding - Google Patents


Publication number
CN100364235C
Authority
CN
China
Prior art keywords
resolution
code book
index
sub
transient state
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2005100958986A
Other languages
Chinese (zh)
Other versions
CN1848690A (en)
Inventor
游余立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGSHENG DIGITAL TECHNOLOGY Co Ltd GUANGZHOU
Digital Rise Technology Co Ltd
Original Assignee
GUANGSHENG DIGITAL TECHNOLOGY Co Ltd GUANGZHOU

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A low bit rate digital audio coding system includes an encoder which assigns codebooks to groups of quantization indexes based on their local properties resulting in codebook application ranges that are independent of block quantization boundaries. The invention also incorporates a resolution filter bank, or a tri-mode resolution filter bank, which is selectively switchable between high and low frequency resolution modes or high, low and intermediate modes such as when detecting transient in a frame. The result is a multichannel audio signal having a significantly lower bit rate for efficient transmission or storage. The decoder is essentially an inverse of the structure and methods of the encoder, and results in a reproduced audio signal that cannot be audibly distinguished from the original signal.

Description

Apparatus and methods for multichannel digital audio coding
Related application
This application claims priority to U.S. Provisional Application No. 60/610,674, filed on September 17, 2004.
Background of the invention
The present invention relates generally to methods and systems for encoding and decoding multichannel digital audio signals. More particularly, it relates to a low bit rate digital audio coding system that greatly reduces the bit rate of multichannel audio signals for efficient transmission or storage while achieving transparent audio reproduction, i.e., even expert listeners cannot distinguish the signal reconstructed at the decoder from the original.
A multichannel digital audio coding system usually consists of the following elements: a time-frequency analysis filter bank, which generates a frequency representation of the input PCM (pulse code modulation) samples, called subband samples or subband signals; a psychoacoustic model, which calculates, based on the perceptual properties of the human ear, a masking threshold below which quantization noise is unlikely to be audible; a global bit allocator, which allocates bit resources to each group of subband samples so that the resulting quantization noise power is below the masking threshold; multiple quantizers, which quantize the subband samples according to the allocated bits; multiple entropy coders, which reduce the statistical redundancy in the quantization indexes; and finally a multiplexer, which packs the entropy codes of the quantization indexes and other side information into a whole bit stream.
For example, Dolby AC-3 maps input PCM samples into the frequency domain using a high frequency resolution MDCT (modified discrete cosine transform) filter bank with a switchable window size. Stationary signals are analyzed with a 512-point window and transient signals with a 256-point window. The subband signals from the MDCT are represented as exponent/mantissa pairs and then quantized. A forward-backward adaptive psychoacoustic model is used to optimize quantization and to reduce the bits needed to encode the bit allocation information. Entropy coding is not used, in order to reduce decoder complexity. Finally, the quantization indexes and other side information are multiplexed into a whole AC-3 bit stream. The frequency resolution of the adaptive MDCT as configured in AC-3 is not well matched to the characteristics of the input signal, so its compression performance is very limited. The absence of entropy coding is another factor that limits its compression performance.
MPEG-1 and MPEG-2 Layer III (MP3) uses a 32-band polyphase filter bank in which each subband filter is followed by an adaptive MDCT that switches between 6 and 18 points. A sophisticated psychoacoustic model guides its bit allocation and nonuniform scalar quantization. Huffman codes are used to encode the quantization indexes and most of the other side information. The poor frequency isolation of the hybrid filter bank significantly limits its compression performance, and its algorithmic complexity is high.
DTS Coherent Acoustics uses a 32-band polyphase filter bank to obtain a low resolution frequency representation of the input signal. To compensate for this poor frequency resolution, ADPCM (adaptive differential pulse code modulation) is optionally applied in each subband. Uniform scalar quantization is applied either directly to the subband samples or, if ADPCM produces a favorable coding gain, to the prediction residue. Vector quantization may optionally be applied to high frequency subbands. Huffman codes may optionally be applied to the scalar quantization indexes and other side information. Because the polyphase filter bank plus ADPCM structure provides neither good time resolution nor good frequency resolution, its compression performance is low.
MPEG-2 AAC and MPEG-4 AAC use an adaptive MDCT filter bank whose window size can switch between 256 and 2048. The masking threshold generated by a psychoacoustic model guides its nonuniform scalar quantization and bit allocation. Huffman codes are used to encode the quantization indexes and most of the other side information. Many other tools, such as TNS (temporal noise shaping), gain control (a hybrid filter bank similar to that of MP3), and spectral prediction (linear prediction within a subband), are employed to further enhance its compression performance at the cost of significantly increased algorithmic complexity.
Accordingly, there is a continuing need for a low bit rate audio coding system that significantly reduces the bit rate of multichannel audio signals for efficient transmission or storage while still achieving transparent audio signal reproduction. The present invention fulfills this need and provides other related advantages.
Summary of the invention
In the following discussion, the term "analysis/synthesis filter bank" and the like refer to an apparatus or method that performs time-frequency analysis/synthesis. It may include, but is not limited to, the following:
Unitary transforms;
Time-invariant or time-varying banks of critically sampled, uniform or nonuniform bandpass filters;
Harmonic or sinusoidal analyzers/synthesizers.
Polyphase filter banks, the DFT (discrete Fourier transform), the DCT (discrete cosine transform), and the MDCT are some widely used filter banks. The term "subband signal or subband samples" and the like refer to the signals or samples that come out of an analysis filter bank and go into a synthesis filter bank.
An object of the present invention is to provide low bit rate coding of multichannel audio signals with the same level of compression performance as the prior art but with lower algorithmic complexity.
On the encoding side, this is accomplished by an encoder that comprises:
1) A framer that segments the input PCM samples into quasi-stationary frames whose size is an integer multiple of the number of subbands of the analysis filter bank and whose duration ranges from 2 to 50 ms.
2) A transient detector that detects the existence of a transient in the frame. One embodiment thresholds a subband distance measure obtained from the subband samples of the analysis filter bank in its low frequency resolution mode.
3) A variable resolution analysis filter bank that transforms the input PCM samples into subband samples. It can be implemented with one of the following:
A) A filter bank that can switch its operation among high, intermediate, and low frequency resolution modes.
The high frequency resolution mode is for stationary frames, and the intermediate and low frequency resolution modes are for frames with transients. Within a transient frame, the low frequency resolution mode is applied to the transient segment and the intermediate resolution mode to the rest of the frame. Under this framework there are three kinds of frames:
I) stationary frames, which the filter bank processes in the high frequency resolution mode only;
II) transient frames, which the filter bank processes in the intermediate and high temporal resolution modes;
III) slow transient frames, which the filter bank processes in the intermediate resolution mode only.
Two preferred embodiments are given:
I) a DCT implementation, in which the three levels of resolution correspond to three DCT block lengths;
II) an MDCT implementation, in which the three levels of resolution correspond to three MDCT block lengths or window lengths. A variety of window types are defined to bridge the transitions between these windows.
B) A hybrid filter bank built on a filter bank that can switch its operation between high and low resolution modes:
I) When there is no transient in the current frame, it switches to the high frequency resolution mode to ensure high compression performance for stationary segments.
II) When there is a transient in the current frame, it switches to the low frequency resolution/high temporal resolution mode to avoid pre-echo artifacts. This low frequency resolution mode is followed by a transient segmentation stage, which divides the subband samples into stationary segments, and then optionally by either an arbitrary resolution filter bank or an ADPCM in each subband which, if selected, provides a frequency resolution suited to each stationary segment.
Two embodiments are given, one based on the DCT and the other on the MDCT. Two embodiments of transient segmentation are also given, one based on thresholding and the other on the k-means algorithm; both use the subband distance measure.
4) A psychoacoustic model that calculates the masking threshold.
5) An optional sum/difference encoder that converts the subband samples of left-right channel pairs into sum/difference channel pairs.
6) An optional joint intensity coder that extracts the intensity scale factor (steering vector) of the joint channel relative to the source channel, merges the joint channel into the source channel, and discards the corresponding subband samples in the joint channel.
7) A global bit allocator that allocates bit resources to groups of subband samples so that their quantization noise power is below the masking threshold.
8) A scalar quantizer that quantizes all subband samples using the step sizes supplied by the bit allocator.
9) An optional interleaver that may be used, when a transient is present in the frame, to rearrange the quantization indexes so as to reduce the total number of bits.
10) An entropy codebook selector that assigns the optimal codebook from a library of codebooks to groups of quantization indexes based on their local statistical characteristics. It comprises the following steps:
A) Assign the optimal codebook to each quantization index, thereby in effect converting quantization indexes into codebook indexes.
B) Segment these codebook indexes into large segments whose boundaries define the application ranges of the codebooks.
A preferred embodiment is:
C) Block the quantization indexes into granules, each containing a fixed number of quantization indexes.
D) Determine the largest codebook requirement of each granule.
E) Assign to each granule the smallest codebook that can accommodate its largest codebook requirement.
F) Eliminate isolated pockets of codebook indexes that are smaller than their immediate neighbors. Isolated pockets that correspond to zero quantization indexes may be exempt from this processing.
A preferred embodiment for encoding the codebook application ranges uses run-length codes.
11) An entropy coder that codes all quantization indexes using the codebooks and the application ranges determined by the entropy codebook selector.
12) A multiplexer that packs all entropy codes of the quantization indexes and the side information into a whole bit stream, structured so that the quantization indexes come before the indexes for the quantization step sizes. This structure makes it unnecessary to pack the number of quantization units of each transient segment into the bit stream, because that number can be recovered from the unpacked quantization indexes.
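By way of illustration only, the granule-based codebook selection, the elimination of isolated pockets, and the run-length form of the application ranges described in the list above may be sketched as follows. The codebook table CODEBOOK_MAX_ABS, the granule size, the pocket length limit max_pocket, and the rule of raising a pocket to the smaller of its two neighbors are assumptions made for this sketch and are not taken from the text; the sketch also assumes that a larger codebook can always hold what a smaller one holds.

```python
from itertools import groupby

# Hypothetical codebook library: codebook i can represent magnitudes up to CODEBOOK_MAX_ABS[i].
CODEBOOK_MAX_ABS = [0, 1, 2, 4, 8, 16, 32]

def smallest_codebook(q):
    """Index of the smallest codebook that can hold quantization index q."""
    for book, limit in enumerate(CODEBOOK_MAX_ABS):
        if abs(q) <= limit:
            return book
    raise ValueError("quantization index exceeds the largest codebook")

def granule_codebooks(indexes, granule_size):
    """One codebook per granule, sized for the largest demand inside the granule."""
    return [
        max(smallest_codebook(q) for q in indexes[i:i + granule_size])
        for i in range(0, len(indexes), granule_size)
    ]

def run_lengths(books):
    """Run-length form of the application ranges: (codebook, number of granules)."""
    return [(book, len(list(group))) for book, group in groupby(books)]

def remove_pockets(books, max_pocket=1, exempt_zero=True):
    """Raise short runs that sit below both neighbors; repeat until nothing changes."""
    books = list(books)
    changed = True
    while changed:
        changed = False
        runs = run_lengths(books)
        pos = 0
        for j, (book, length) in enumerate(runs):
            interior = 0 < j < len(runs) - 1
            if interior and length <= max_pocket and not (exempt_zero and book == 0):
                left, right = runs[j - 1][0], runs[j + 1][0]
                if book < left and book < right:
                    books[pos:pos + length] = [min(left, right)] * length
                    changed = True
                    break  # the runs are stale now, so rescan from the start
            pos += length
    return books

# Illustrative quantization indexes, six granules of four indexes each.
indexes = [0, 0, 0, 0, 3, -4, 1, 0, 1, 0, 0, 1, 7, 2, -5, 1, 0, 0, 0, 0, 9, 1, 0, 2]
per_granule = granule_codebooks(indexes, 4)
smoothed = remove_pockets(per_granule)
ranges = run_lengths(smoothed)
```

In this example the third granule would need only a small codebook, but it is raised to match its left neighbor so that the two granules share one application range, which leaves fewer ranges to signal. The all-zero granule is left untouched because zero pockets are exempt.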
The decoder of this invention comprises:
1) A demultiplexer that unpacks the various code words from the bit stream.
2) A quantization index codebook decoder that decodes from the bit stream the entropy codebooks and their respective application ranges for the quantization indexes.
3) An entropy decoder that decodes the quantization indexes from the bit stream.
4) An optional deinterleaver that rearranges the quantization indexes when a transient is present in the current frame.
5) A number-of-quantization-units reconstructor that rebuilds, from the quantization indexes, the number of quantization units of each transient segment with the following steps:
A) For each transient segment, find the largest subband that has a nonzero quantization index.
B) Find the smallest critical band that can accommodate this subband. This is the number of quantization units of this transient segment.
6) A step size unpacker that unpacks the quantization step sizes of all quantization units.
7) An inverse quantizer that reconstructs subband samples from the quantization indexes and step sizes.
8) An optional joint intensity decoder that reconstructs the subband samples of the joint channel from the subband samples of the source channel using the joint intensity scale factors (steering vectors).
9) An optional sum/difference decoder that reconstructs the left and right channel subband samples from the sum/difference channel subband samples.
10) A variable resolution synthesis filter bank that reconstructs the audio PCM samples from the subband samples. It can be implemented with one of the following:
A) A synthesis filter bank that can switch its operation among high, intermediate, and low resolution modes.
B) A hybrid synthesis filter bank built on a synthesis filter bank that can switch between high and low resolution modes:
I) When the bit stream indicates that the current frame was encoded with the switchable resolution analysis filter bank in its low frequency resolution mode, this synthesis filter bank is a two-stage hybrid filter bank in which the first stage is either an arbitrary resolution synthesis filter bank or an inverse ADPCM, and the second stage is the low frequency resolution mode of an adaptive synthesis filter bank that can switch between high and low frequency resolution modes.
II) When the bit stream indicates that the current frame was encoded with the switchable resolution analysis filter bank in its high frequency resolution mode, this synthesis filter bank is simply the switchable resolution synthesis filter bank in its high frequency resolution mode.
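As an illustration of item 5 of the decoder list above, the recovery of the number of quantization units from the decoded quantization indexes may be sketched as follows. The table CRITICAL_BAND_EDGES is hypothetical and chosen only for this example, and the sketch assumes one quantization unit per critical band and one quantization index per subband in the segment.

```python
from bisect import bisect_right

# Hypothetical table: CRITICAL_BAND_EDGES[i] is the first subband beyond critical band i.
CRITICAL_BAND_EDGES = [4, 8, 12, 16, 24, 32, 48, 64]

def num_quantization_units(segment_indexes):
    """segment_indexes[m] is the quantization index of subband m in one transient segment."""
    largest = -1
    for m, q in enumerate(segment_indexes):
        if q != 0:
            largest = m  # keeps the highest subband seen so far with a nonzero index
    if largest < 0:
        return 0  # only zeros: no quantization unit is needed
    if largest >= CRITICAL_BAND_EDGES[-1]:
        raise ValueError("subband lies beyond the last critical band")
    # bisect_right counts the critical bands that end at or before `largest`;
    # adding one includes the band that contains it.
    return bisect_right(CRITICAL_BAND_EDGES, largest) + 1

segment = [0] * 64
segment[9] = -2  # highest nonzero index sits in the third critical band (subbands 8 to 11)
units = num_quantization_units(segment)
```

Because the decoder can run this search on the unpacked quantization indexes, the encoder does not need to transmit the count, which is consistent with the bit stream ordering described for the multiplexer.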
Finally, the invention provides a low coding delay mode, which is enabled when the encoder forbids the high frequency resolution mode of the switchable resolution analysis filter bank and the frame size is subsequently reduced to the block length of the switchable resolution filter bank in its low frequency resolution mode, or to an integer multiple of it.
A method for encoding and decoding multichannel digital audio signals provided by the invention comprises the following steps:
A) segmenting the input PCM samples into quasi-stationary frames;
B) transforming the PCM samples into subband samples using a variable resolution analysis filter bank;
C) quantizing the subband samples block by block into a plurality of quantization indexes;
D) providing a library of pre-designed codebooks;
E) assigning codebooks to groups of quantization indexes based on the local properties of the quantization indexes, so that the codebook application ranges are independent of the block quantization boundaries;
F) encoding the codebook indexes and their respective application ranges;
G) creating a whole encoded data stream that comprises the quantization indexes coded with the assigned codebooks and the encoded codebook indexes and their respective application ranges;
H) transmitting the whole encoded data stream;
I) receiving the encoded data stream and unpacking it;
J) decoding the quantization indexes from the data stream;
K) reconstructing subband samples from the decoded quantization indexes; and
L) reconstructing audio PCM samples from the reconstructed subband samples.
In accordance with the present invention, a method for encoding a multichannel digital audio signal generally comprises the steps of creating PCM samples from the multichannel digital audio signal and transforming the PCM samples into subband samples. A plurality of quantization indexes having boundaries is created by quantizing the subband samples. The quantization indexes are converted to codebook indexes by assigning to each quantization index the smallest codebook, from a library of pre-designed codebooks, that can accommodate it. The codebook indexes are segmented and encoded before an encoded data stream is created for storage or transmission.
Typically, the PCM samples are input in quasi-stationary frames of between 2 and 50 milliseconds (ms) in duration. A masking threshold may be calculated, for example with a psychoacoustic model. A bit allocator allocates bit resources to groups of subband samples so that the quantization noise power is below the masking threshold.
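The relation between a quantization step size and the masking threshold may be illustrated with a minimal sketch. It relies on the textbook high-resolution approximation that a uniform scalar quantizer with step size Δ produces noise power of about Δ²/12; this approximation, the function names, and the sample values are assumptions made for illustration and do not describe the bit allocation procedure of the invention.

```python
import math

def max_step_size(masking_threshold_power):
    """Largest step whose approximate noise power (step**2 / 12) equals the threshold."""
    return math.sqrt(12.0 * masking_threshold_power)

def quantize(samples, step):
    """Uniform scalar quantization: subband sample -> integer quantization index."""
    return [int(round(x / step)) for x in samples]

def dequantize(indexes, step):
    """Inverse quantization: quantization index -> reconstructed subband sample."""
    return [q * step for q in indexes]

step = max_step_size(0.03)               # illustrative masking threshold power
samples = [0.0, 0.29, 0.31, -1.0, 2.5]   # illustrative subband samples
indexes = quantize(samples, step)
rebuilt = dequantize(indexes, step)
```

Each reconstructed sample differs from the original by at most half a step, so a smaller masking threshold forces a smaller step and therefore larger quantization indexes and more bits.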
The transforming step includes using a resolution filter bank that is selectively switchable between high and low frequency resolution modes. Transients are detected: when no transient is detected, the high frequency resolution mode is used; when a transient is detected, the resolution filter bank is switched to the low frequency resolution mode. Once the resolution filter bank has been switched to the low frequency resolution mode, the subband samples are segmented into stationary segments. The frequency resolution of each stationary segment is then tailored using either an arbitrary resolution filter bank or adaptive differential pulse code modulation.
The quantization indexes may be rearranged when a transient is present in the frame to reduce the total number of bits. A run-length encoder can be used to encode the application boundaries of the optimal entropy codebooks, and a segmentation algorithm may be employed.
A sum/difference encoder may be used to convert the subband samples of left-right channel pairs into sum/difference channel pairs. In addition, a joint intensity coder may be used to extract the intensity scale factor of a joint channel relative to a source channel, merge the joint channel into the source channel, and discard all relevant subband samples in the joint channel.
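A minimal sketch of sum/difference coding of a left-right channel pair, together with its inverse, is given below. The scaling by one half in the encoder is an assumption of this sketch, since the text does not state the normalization, and the function names are hypothetical.

```python
def sum_difference_encode(left, right):
    """Convert left/right subband samples to sum/difference samples (assumed 1/2 scaling)."""
    total = [(l + r) / 2 for l, r in zip(left, right)]
    diff = [(l - r) / 2 for l, r in zip(left, right)]
    return total, diff

def sum_difference_decode(total, diff):
    """Recover left/right subband samples from sum/difference samples."""
    left = [s + d for s, d in zip(total, diff)]
    right = [s - d for s, d in zip(total, diff)]
    return left, right

left = [1.0, 0.5, -2.0]
right = [1.0, 0.25, 2.0]
total, diff = sum_difference_encode(left, right)
decoded_left, decoded_right = sum_difference_decode(total, diff)
```

When the two channels are similar, the difference channel is close to zero and costs few bits, which is the usual motivation for this transform.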
Typically, the combining step that creates the whole bit stream is performed with a multiplexer before the encoded digital audio signal is stored or sent to a decoder.
The method for decoding the audio data bit stream comprises receiving the encoded audio data stream and unpacking it, for example with a demultiplexer. The entropy codebook indexes and their respective application ranges are decoded; this may involve run-length and entropy decoders, which are also used to decode the quantization indexes.
When a transient is detected in the current frame, the quantization indexes are rearranged, for example with a deinterleaver. Subband samples are then reconstructed from the decoded quantization indexes. Audio PCM samples are reconstructed from the reconstructed subband samples using a variable resolution synthesis filter bank that is switchable between low and high frequency resolution modes. When the data stream indicates that the current frame was encoded with the switchable resolution analysis filter bank in its low frequency resolution mode, the variable resolution synthesis filter bank acts as a two-stage hybrid filter bank in which the first stage comprises either an arbitrary resolution synthesis filter bank or an inverse adaptive differential pulse code modulation, and the second stage is the low frequency resolution mode of the variable synthesis filter bank. When the data stream indicates that the current frame was encoded with the switchable resolution analysis filter bank in its high frequency resolution mode, the variable resolution synthesis filter bank operates in the high frequency resolution mode.
A joint intensity decoder may be used to reconstruct joint channel subband samples from the source channel subband samples using the joint intensity scale factors. In addition, a sum/difference decoder may be used to reconstruct the left and right channel subband samples from the sum/difference channel subband samples.
The result of the present invention is a low bit rate digital audio coding system that significantly reduces the bit rate of the multichannel audio signal for efficient transmission while achieving transparent audio signal reproduction, so that the reproduced signal is difficult to distinguish from the original.
Other features and advantages of the present invention will become apparent from the following more detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of the invention.
Description of drawings
The accompanying drawings illustrate the invention. In these drawings:
Fig. 1 is a schematic diagram depicting the encoding and decoding of multichannel digital audio signals according to the present invention;
Fig. 2 is a schematic diagram illustrating an exemplary encoder used in accordance with the present invention;
Fig. 3 is a schematic diagram of a variable resolution analysis filter bank with arbitrary resolution filter banks;
Fig. 4 is a schematic diagram of a variable resolution analysis filter bank with ADPCM;
Fig. 5 is a schematic diagram of the window types for a switchable MDCT according to the present invention;
Fig. 6 is a schematic diagram of transient segmentation according to the present invention;
Fig. 7 is a schematic diagram of the application of a switchable filter bank with two resolution modes according to the present invention;
Fig. 8 is a schematic diagram of the application of a switchable filter bank with three resolution modes according to the present invention;
Fig. 9 is a schematic diagram, similar to Fig. 5, of additional window types for a switchable MDCT with three resolution modes according to the present invention;
Fig. 10 depicts a set of examples of window sequences for a switchable MDCT with three resolution modes according to the present invention;
Fig. 11 is a schematic diagram of the determination of entropy codebooks according to the present invention as compared with the prior art;
Fig. 12 is a schematic diagram of the segmentation of codebook indexes into large segments, or the elimination of isolated pockets of codebook indexes, according to the present invention;
Fig. 13 is a schematic diagram of a decoder embodying the present invention;
Fig. 14 is a schematic diagram of a variable resolution synthesis filter bank with arbitrary resolution filter banks according to the present invention;
Fig. 15 is a schematic diagram of a variable resolution synthesis filter bank with inverse ADPCM;
Fig. 16 is a schematic diagram of the bit stream structure when the half hybrid filter bank or the switchable filter bank plus ADPCM is used according to the present invention;
Fig. 17 is a schematic diagram of the advantage of the short-to-short transition long window when handling transients that are one frame apart; and
Fig. 18 is a schematic diagram of the bit stream structure when the tri-mode switchable filter bank is used according to the present invention.
Embodiment
As shown in the accompanying drawings, for purposes of illustration, the present invention relates to a low bit rate digital audio encoding and decoding system that significantly reduces the bit rate of multichannel audio signals for efficient transmission or storage while achieving transparent audio reproduction. That is, the bit rate of the multichannel encoded audio signal is reduced using a system of low algorithmic complexity, yet even expert listeners cannot distinguish the signal reconstructed at the decoder from the original.
As shown in Fig. 1, the encoder 5 of the present invention takes multichannel audio signals as input and encodes them into a bit stream whose bit rate is significantly reduced, making it suitable for transmission or storage on media with limited channel capacity. Upon receiving the bit stream generated by encoder 5, the decoder 10 decodes it and reconstructs multichannel audio signals that even expert listeners cannot distinguish from the original signals.
Inside the encoder 5 and decoder 10, multichannel audio signals are processed as discrete channels. That is, each channel is treated in the same way as the other channels unless joint channel coding 2 is explicitly specified. This is illustrated in Fig. 1 with a highly simplified encoder and decoder structure.
With this highly simplified encoder structure, the encoding process is as follows. The audio signal from each channel is first decomposed into subband signals in the analysis filter bank stage 1. Subband signals from all channels are optionally fed to the joint channel coder 2, which exploits the perceptual properties of the human ear to reduce the bit rate by combining subband signals that correspond to the same frequency band from different channels. The subband signals, which may have been jointly coded in 2, are then quantized and entropy encoded in 3. The quantization indexes or their entropy codes, together with the side information from all channels, are then multiplexed in 4 into a whole bit stream for transmission or storage.
On the decoding side, the bit stream is first demultiplexed in 6 into side information and quantization indexes or their entropy codes. The entropy codes are decoded in 7 (note that entropy decoding of prefix codes, such as Huffman codes, and demultiplexing are usually performed in a single integrated step). The subband signals are reconstructed in 7 from the quantization indexes and the step sizes carried in the side information. If joint channel coding was used in the encoder, joint channel decoding is performed in 8. The audio signal of each channel is then reconstructed from the subband signals in the synthesis stage 9.
The highly simplified encoder and decoder structure above serves solely to illustrate the discrete nature of the encoding and decoding methods presented in this invention. The encoding and decoding methods actually applied to each channel of the audio signal are quite different and much more complex. Unless otherwise stated, these methods are described below in the context of one channel of the audio signal.
Encoder
The general method for encoding one channel of the audio signal is depicted in Fig. 2 and described as follows:
The framer 11 segments the input PCM samples into quasi-stationary frames ranging from 2 to 50 ms in duration. The exact number of PCM samples in a frame must be an integer multiple of the maximum of the numbers of subbands of the various filter banks used in the variable resolution time-frequency analysis filter bank 13. Assuming that this maximum number of subbands is N, the number of PCM samples in a frame is
L = k·N
where k is a positive integer.
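The frame size constraint above may be illustrated with a short sketch that lists every frame length L = k·N whose duration falls within the 2 to 50 ms range. The sampling rate of 44100 Hz and the value N = 1024 are illustrative assumptions, not values taken from the text, and the function name is hypothetical.

```python
def valid_frame_lengths(sample_rate_hz, max_subbands, min_ms=2.0, max_ms=50.0):
    """Return all frame lengths L = k * N (in samples) lasting between min_ms and max_ms."""
    lengths = []
    k = 1
    while True:
        length = k * max_subbands
        duration_ms = 1000.0 * length / sample_rate_hz
        if duration_ms > max_ms:
            break  # every larger k gives an even longer frame
        if duration_ms >= min_ms:
            lengths.append(length)
        k += 1
    return lengths

# Assumed example: 44.1 kHz sampling and at most N = 1024 subbands.
frames = valid_frame_lengths(44100, 1024)
```

With these assumed values only k = 1 and k = 2 qualify, giving frames of roughly 23 ms and 46 ms.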
The transient analysis 12 detects the existence of transients in the current input frame and passes this information to the variable resolution analysis filter bank 13.
Any known transient detection method may be employed here. In one embodiment of the invention, the input frame of PCM samples is fed to the low frequency resolution mode of the variable resolution analysis filter bank. Let s(m, n) denote the output samples of this filter bank, where m is the subband index and n is the temporal index in the subband domain. In the following discussion, the term "transient detection distance" and the like refer to a distance measure defined for each temporal index as
$$E(n) = \sum_{m=0}^{M-1} |s(m, n)|$$
or
$$E(n) = \sum_{m=0}^{M-1} s^2(m, n)$$
where M is the number of subbands of the filter bank. Other types of distance measures can be applied in a similar way. Let
$$E_{\max} = \max_n E(n) \quad \text{and} \quad E_{\min} = \min_n E(n)$$
be the maximum and minimum values of this distance. A transient is declared to exist if
$$\frac{E_{\max} - E_{\min}}{E_{\max} + E_{\min}} > \text{Threshold}$$
where the threshold may be set to 0.5.
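The transient test above may be sketched as follows, with subband_samples[m][n] holding s(m, n). The function names and the handling of an all-zero frame are assumptions of this sketch; the distance measures and the threshold of 0.5 follow the text.

```python
def transient_distance(subband_samples, squared=False):
    """E(n) for each temporal index n; subband_samples[m][n] holds s(m, n)."""
    num_times = len(subband_samples[0])
    distances = []
    for n in range(num_times):
        column = [row[n] for row in subband_samples]
        if squared:
            distances.append(sum(x * x for x in column))   # sum of squares
        else:
            distances.append(sum(abs(x) for x in column))  # sum of magnitudes
    return distances

def has_transient(subband_samples, threshold=0.5, squared=False):
    """Declare a transient when (Emax - Emin) / (Emax + Emin) exceeds the threshold."""
    e = transient_distance(subband_samples, squared)
    e_max, e_min = max(e), min(e)
    if e_max + e_min == 0:
        return False  # silent frame: treated as stationary (assumption of this sketch)
    return (e_max - e_min) / (e_max + e_min) > threshold

steady = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]   # two subbands, four time slots
attack = [[0.1, 0.1, 2.0, 0.1], [0.1, 0.1, 1.0, 0.1]]   # energy burst in the third slot
steady_flag = has_transient(steady)
attack_flag = has_transient(attack)
```

The ratio is bounded between 0 and 1 for nonnegative distances, so the threshold does not depend on the overall signal level.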
The present invention uses a variable resolution analysis filter bank 13. There are many known methods of implementing a variable resolution analysis filter bank. A prominent one is to use a filter bank that can switch its operation between high and low frequency resolution modes, with the high frequency resolution mode handling stationary segments of the audio signal and the low frequency resolution mode handling transients. Due to theoretical and practical constraints, however, this switching of resolution cannot occur at arbitrary times. Instead, it usually occurs at frame boundaries, i.e., a frame is processed with either the high frequency resolution mode or the low frequency resolution mode. As shown in Fig. 7, for the transient frame 131 the filter bank has switched to the low frequency resolution mode to avoid pre-echo artifacts. Because the transient 132 itself is very short, whereas the pre-transient segment 133 and post-transient segment 134 of the frame are much longer, the filter bank in the low frequency resolution mode is clearly a mismatch for these stationary segments. This significantly limits the overall coding gain that can be achieved for the whole frame.
The present invention proposes three methods to address this problem. The basic idea is to provide the stationary majority of a transient frame with a higher frequency resolution within the switchable resolution structure.
Half hybrid filter bank
As shown in Fig. 3, this is essentially a hybrid filter bank comprising a switchable-resolution analysis filter bank 28 that can switch between high and low frequency-resolution modes. In the low-frequency-resolution mode 24 it is followed by a transient segmentation unit 25 and then by an optional arbitrary-resolution analysis filter bank 26 in each subband.
When the transient detector 12 does not detect a transient, the switchable-resolution analysis filter bank 28 enters the low-temporal-resolution mode 27, which ensures high frequency resolution and thereby high coding gain for audio signals with strong tonal components.
When the transient detector 12 detects a transient, the switchable-resolution analysis filter bank 28 enters the high-temporal-resolution mode 24. This ensures that the transient is handled with good temporal resolution to prevent pre-echo. The subband samples so produced are divided into quasi-stationary segments by the transient segmentation unit 25, as shown in Fig. 6. In the following discussion, the term "transient segment" and the like refer to these quasi-stationary segments. This is followed by the arbitrary-resolution analysis filter bank 26 in each subband, whose number of subbands equals the number of subband samples of each transient segment in each subband.
The switchable-resolution analysis filter bank 28 can be implemented with any filter bank that can switch its operation between high and low frequency-resolution modes. One embodiment of the invention uses a pair of DCTs with small and large transform lengths, corresponding to low and high frequency resolution respectively. Assuming a transform length of M, the subband samples of the type-4 DCT are obtained as:
$$s(m, n) = \sqrt{\frac{2}{M}} \sum_{k=0}^{M-1} \cos\left[\frac{\pi}{M}(k + 0.5)(n + 0.5)\right] \cdot x(mM + k)$$
where x(·) denotes the input PCM samples. Other forms of DCT can be used in place of the type-4 DCT.
Because the DCT tends to cause blocking artifacts, a preferred embodiment of the invention uses the modified DCT (MDCT):
$$s(m, n) = \sqrt{\frac{2}{M}} \sum_{k=0}^{2M-1} \cos\left[\frac{\pi}{M}\left(k + 0.5 + \frac{M}{2}\right)(n + 0.5)\right] \cdot w(k) \cdot x(mM - M + k)$$
where w(·) is a window function.
The window function must be power-symmetric in each half of the window:
$$w^2(k) + w^2(M - 1 - k) = 1, \quad k = 0, \dots, M-1$$
$$w^2(k + M) + w^2(2M - 1 - k) = 1, \quad k = 0, \dots, M-1$$
in order to guarantee perfect reconstruction.
Although any window satisfying the above conditions can be used, only the following sine window
$$w(k) = \pm \sin\left[(k + 0.5)\frac{\pi}{2M}\right], \quad k = 0, \dots, 2M-1$$
has the desirable property that the DC component of the input signal is concentrated in the first transform coefficient.
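The MDCT and sine window described above can be checked numerically with a short Python sketch. It assumes the orthonormal scaling sqrt(2/M) on both the forward and inverse transforms, applies the window on analysis and synthesis, and zero-pads the input by M samples on each side so that every sample is covered by two overlapping blocks. All function names are hypothetical, and the direct O(M²) summation is for illustration only.

```python
import math


def sine_window(M):
    """w(k) = sin((k + 0.5) * pi / (2M)) for k = 0 .. 2M-1."""
    return [math.sin((k + 0.5) * math.pi / (2 * M)) for k in range(2 * M)]


def mdct(block, w):
    """Forward MDCT of one 2M-sample block (M coefficients)."""
    M = len(block) // 2
    scale = math.sqrt(2.0 / M)
    return [scale * sum(math.cos(math.pi / M * (k + 0.5 + M / 2) * (n + 0.5))
                        * w[k] * block[k] for k in range(2 * M))
            for n in range(M)]


def imdct(coeffs, w):
    """Windowed inverse MDCT; each output block still contains aliasing."""
    M = len(coeffs)
    scale = math.sqrt(2.0 / M)
    return [scale * w[k] * sum(math.cos(math.pi / M * (k + 0.5 + M / 2) * (n + 0.5))
                               * coeffs[n] for n in range(M))
            for k in range(2 * M)]


def analysis_synthesis(x, M):
    """MDCT then overlap-add; len(x) is assumed to be a multiple of M."""
    w = sine_window(M)
    padded = [0.0] * M + list(x) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        y = imdct(mdct(padded[start:start + 2 * M], w), w)
        for k in range(2 * M):
            out[start + k] += y[k]  # aliasing cancels between neighbours
    return out[M:M + len(x)]
```

With a power-symmetric window the aliasing terms of adjacent blocks cancel in the overlap-add, so `analysis_synthesis` returns the input up to rounding error.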
To preserve perfect reconstruction when the MDCT switches between high and low frequency-resolution modes, that is, between long and short windows, the overlapping parts of the long and short windows must have the same shape.
Depending on the transient characteristics of the input PCM samples, the encoder may select a long window (the first window 61 in Fig. 5), switch to a sequence of short windows (the fourth window 64 in Fig. 5), and switch back. The long-to-short transition long window 62 and the short-to-long transition long window 63 in Fig. 5 are needed to bridge such switches. The short-to-short transition long window 65 in Fig. 5 is useful when two transients are close to each other but not close enough to warrant continuous application of short windows. The encoder must convey to the decoder the window type used for each frame so that the same window is used to reconstruct the PCM samples.
The advantage of the short-to-short transition long window is that it can handle transients only one frame apart. As shown at the top 67 of Fig. 17, the prior-art MDCT can only handle transients that are at least two frames apart. As shown at the bottom 68 of Fig. 17, using the short-to-short transition long window reduces this to one frame.
The invention then performs transient segmentation 25. Transient segments can be represented by a binary function that indicates the transient positions, or segment boundaries, through changes of its value from 0 to 1 or from 1 to 0. For example, the quasi-stationary segments in Fig. 6 can be expressed as follows:
$$T(n) = \begin{cases} 0, & n = 0, 1, 2, 3, 4 \\ 1, & n = 5, 6, 7, 8, 9 \\ 0, & n = 10, 11, 12, 13, 14, 15, 16 \end{cases}$$
Note that T(n) = 0 does not necessarily mean that the energy of the audio signal at time index n is high, and vice versa. The function T(n) is referred to as the "transient segmentation function" and the like throughout the following discussion. The information carried by this segmentation function must be conveyed to the decoder, directly or indirectly. Run-length coding of the lengths of the zero and one runs is an efficient choice. For the specific example above, T(n) can be conveyed to the decoder with the run-length codes 5, 5 and 7. The run-length codes can additionally be entropy coded.
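The run-length representation of T(n) can be sketched as follows. The text only states that the run lengths are transmitted; returning the value of the first run alongside them is an assumption of this sketch, as are the function names.

```python
def run_lengths(t):
    """Return (value of the first run, lengths of the alternating runs)."""
    if not t:
        return None, []
    runs = [1]
    for prev, cur in zip(t, t[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return t[0], runs


def from_run_lengths(first_value, runs):
    """Rebuild T(n) from run lengths, alternating between 0 and 1."""
    t, value = [], first_value
    for length in runs:
        t.extend([value] * length)
        value = 1 - value
    return t
```

For the example above, `run_lengths` yields the lengths 5, 5 and 7, and `from_run_lengths` restores the 17-entry function on the decoder side.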
The transient segmentation unit 25 can be implemented with any known transient segmentation method. In one embodiment of the invention, transient segmentation is accomplished by simply thresholding the transient detection distance.
Figure C20051009589800251
The threshold may be set to
$$\text{Threshold} = k \cdot \frac{E_{\max} + E_{\min}}{2}$$
where k is an adjustable constant.
A more advanced embodiment of the invention is based on the k-means clustering algorithm and comprises the following steps:
1) The transient segmentation function T(n) is initialized, for example with the result of the thresholding method above.
2) The centroid of each cluster is calculated:
for the cluster associated with T(n) = 0;
Figure C20051009589800261
for the cluster associated with T(n) = 1.
3) The transient segmentation function T(n) is reassigned based on the following rule:
Figure C20051009589800262
4) Go to step 2.
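The clustering steps above can be sketched in Python under several stated assumptions, since the formula images are not reproduced in this text: the initial labelling sets T(n) = 1 where E(n) exceeds the threshold (the labels themselves are arbitrary), each centroid is taken as the mean of E(n) over its cluster, each time index is reassigned to the nearer centroid, and iteration stops when the labelling no longer changes. None of these details is confirmed by the source, and the function names are hypothetical.

```python
def threshold_segmentation(e, k=1.0):
    """Initial T(n): 1 where E(n) > k * (Emax + Emin) / 2 (assumed rule)."""
    threshold = k * (max(e) + min(e)) / 2
    return [1 if value > threshold else 0 for value in e]


def kmeans_segmentation(e, k=1.0, max_iter=100):
    """Two-cluster k-means refinement of T(n) on the distance E(n)."""
    t = threshold_segmentation(e, k)
    for _ in range(max_iter):
        class0 = [v for v, label in zip(e, t) if label == 0]
        class1 = [v for v, label in zip(e, t) if label == 1]
        if not class0 or not class1:  # one cluster is empty: nothing to refine
            break
        c0 = sum(class0) / len(class0)  # centroid of the T(n) = 0 cluster
        c1 = sum(class1) / len(class1)  # centroid of the T(n) = 1 cluster
        new_t = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in e]
        if new_t == t:  # converged (stopping rule assumed for this sketch)
            break
        t = new_t
    return t
```

The resulting T(n) can then be run-length coded for transmission to the decoder.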
The arbitrary-resolution analysis filter bank 26 is essentially a transform, such as a DCT, whose block length equals the number of samples in each subband segment. Suppose each subband has 32 subband samples in a frame and they are segmented into (9, 3, 20); then three transforms with block lengths 9, 3 and 20 are applied to the subband samples of the three subband segments, respectively. In the following discussion, the term "subband segment" and the like refer to the subband samples of one transient segment within a subband. The transform of the last segment of the segmentation (9, 3, 20) in the m-th subband can be described with the type-4 DCT as follows:
$$u(m, n) = \sqrt{\frac{2}{20}} \sum_{k=0}^{20-1} \cos\left[\frac{\pi}{20}(k + 0.5)(n + 0.5)\right] \cdot s(m, 12 + k)$$
This transform increases the frequency resolution within each transient segment, so a favorable coding gain can be expected. In many cases, however, the coding gain is less than 1 or too small, and the advantageous decision is to discard the transform result and notify the decoder of this decision via side information. Because of the overhead associated with side information, the overall coding gain can be improved if the decision on whether to discard the transform result is made for a group of subband segments, that is, one bit conveys the decision for a group of subband segments rather than for each subband segment.
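The per-segment transform can be illustrated with the (9, 3, 20) example. The sketch below assumes the orthonormal scaling sqrt(2/N) for a type-4 DCT of length N; the function names and the flat-list layout of one subband are hypothetical.

```python
import math


def dct4(x):
    """Orthonormal type-4 DCT of a block of arbitrary length."""
    size = len(x)
    scale = math.sqrt(2.0 / size)
    return [scale * sum(math.cos(math.pi / size * (k + 0.5) * (n + 0.5)) * x[k]
                        for k in range(size))
            for n in range(size)]


def segment_transform(subband, segment_lengths):
    """Apply one DCT per subband segment; lengths must sum to len(subband)."""
    assert sum(segment_lengths) == len(subband)
    out, start = [], 0
    for length in segment_lengths:
        out.extend(dct4(subband[start:start + length]))
        start += length
    return out
```

With this scaling the type-4 DCT is its own inverse, so a decoder could in principle call `segment_transform` with the same segment lengths to recover the subband samples.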
In the following discussion, the term "quantization unit" and the like refer to a contiguous group of subband samples within a transient segment that belong to the same psychoacoustic critical band. A quantization unit can be a good grouping of subband segments for the above decision. If it is used, the total coding gain is calculated over all subband segments in the quantization unit. If the coding gain is greater than 1, or some other higher threshold, the transform results are kept for all subband segments in the quantization unit; otherwise they are discarded. Only one bit is needed to convey this decision for all subband segments in the quantization unit to the decoder.
Switchable filter bank + ADPCM
As shown in Fig. 4, this is essentially the same as Fig. 3, except that the arbitrary-resolution analysis filter bank 26 is replaced by ADPCM 29. The decision on whether to apply ADPCM is again made for a group of subband segments, such as a quantization unit, to reduce the cost of side information. The group of subband segments may even share one set of prediction coefficients. Known methods for quantizing prediction coefficients can be used here, including LAR (log area ratio), IS (inverse sine) and LSP (line spectrum pairs).
Tri-mode switchable filter bank
Unlike the usual switchable filter bank, which has only high- and low-resolution modes, this filter bank can switch its operation among high, medium and low resolution modes. The high and low frequency-resolution modes are used for stationary and transient frames respectively, following the same principle as the dual-mode switchable filter bank. The main purpose of the medium-resolution mode is to provide better frequency resolution for the stationary segments within a transient frame. Thus, within a transient frame, the low-frequency-resolution mode is used for the transient segment and the medium-resolution mode for the rest of the frame. Unlike the prior art, the switchable filter bank of the invention can therefore operate in two resolution modes on the audio data of a single frame. The medium-resolution mode can also be used to handle frames with smooth transients.
In the following discussion, the term "long block" and the like refer to one block of samples output by the filter bank at each time instant in the high-frequency-resolution mode; the term "medium block" and the like refer to one block of samples output at each time instant in the medium-frequency-resolution mode; the term "short block" and the like refer to one block of samples output at each time instant in the low-frequency-resolution mode. With these three definitions, the three kinds of frames can be described as follows:
Stationary frames, which the filter bank processes in the high-frequency-resolution mode; each such frame usually contains one or more long blocks;
Frames with transients, which the filter bank processes in the high and medium temporal-resolution modes; each such frame contains several medium blocks and several short blocks, and the total number of samples of all the short blocks equals the number of samples of one medium block;
Frames with smooth transients, which the filter bank processes in the medium-resolution mode; each such frame contains several medium blocks.
The advantage of this new method is illustrated in Fig. 8. Fig. 8 is essentially the same as Fig. 7, except that many segments (141, 142 and 143) that were processed in the low-frequency-resolution mode in Fig. 7 are now processed in the medium-frequency-resolution mode. Because these segments are stationary, the medium-frequency-resolution mode is clearly a better match than the low-frequency-resolution mode, so a higher coding gain can be expected.
One embodiment of the invention uses a triple of DCTs with small, medium and large block lengths, corresponding to the low, medium and high frequency-resolution modes respectively.
A better embodiment of the invention (free of blocking artifacts) uses a triple of MDCTs with small, medium and large block lengths. With the introduction of the medium-resolution mode, the window types shown in Fig. 9 are provided in addition to those in Fig. 5. These windows are described as follows:
Medium window 151.
Long-to-medium transition long window 152: a long window that bridges the transition from a long window to a medium window.
Medium-to-long transition long window 153: a long window that bridges the transition from a medium window to a long window.
Medium-to-medium transition long window 154: a long window that bridges the transition from one medium window to another medium window.
Medium-to-short transition medium window 155: a medium window that bridges the transition from a medium window to a short window.
Short-to-medium transition medium window 156: a medium window that bridges the transition from a short window to a medium window.
Medium-to-short transition long window 157: a long window that bridges the transition from a medium window to a short window.
Short-to-medium transition long window 158: a long window that bridges the transition from a short window to a medium window.
Note: like the short-to-short transition long window 65 in Fig. 5, the medium-to-medium transition long window 154, the medium-to-short transition long window 157 and the short-to-medium transition long window 158 enable the tri-mode MDCT to handle transients that are one frame apart.
Figure 10 illustrates some examples of window sequences. Example 161 demonstrates the ability of these embodiments to handle slow transients with medium resolution 167, while examples 162 to 166 demonstrate the ability to assign fine temporal resolution 168 to transients, medium temporal resolution 169 to stationary segments in the same frame, and high frequency resolution 170 to stationary frames.
A conventional sum/difference coding method 14 can be applied here. For example, a simple approach is as follows:
Sum channel = 0.5 · (left channel + right channel)
Difference channel = 0.5 · (left channel - right channel)
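The sum/difference example can be sketched together with the matching reconstruction given later in the decoder description (left = sum + difference, right = sum - difference). The function names are hypothetical, and the channels are plain lists of subband samples.

```python
def sum_difference_encode(left, right):
    """Return (sum channel, difference channel) for one quantization unit."""
    total = [0.5 * (l + r) for l, r in zip(left, right)]
    diff = [0.5 * (l - r) for l, r in zip(left, right)]
    return total, diff


def sum_difference_decode(total, diff):
    """Rebuild (left, right) from the sum and difference channels."""
    left = [s + d for s, d in zip(total, diff)]
    right = [s - d for s, d in zip(total, diff)]
    return left, right
```

When the left and right channels are strongly correlated, the difference channel is close to zero and costs few bits, which is the motivation for the one-bit per-unit decision carried in the bit stream.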
A conventional joint intensity coding method 15 can be applied here. A simple approach is to:
Replace the source channel with the sum of the source and joint channels.
Adjust it to the same energy level as the original source channel within the quantization unit.
Discard the subband samples of the joint channel in the quantization unit and convey to the decoder only the quantization index of a scale factor (referred to as the "steering vector" or, in the present invention, the "scale factor"), which is defined as:
Non-uniform quantization of the steering vector, such as logarithmic quantization, is used to match the perceptual characteristics of the human ear. Entropy coding can be applied to the quantization indices of the steering vector.
To avoid cancellation between the source and joint channels when their phase difference is close to 180 degrees, a polarity can be applied when they are summed to form the sum channel:
Sum channel = source channel + polarity · joint channel
The polarity must also be conveyed to the decoder.
The psychoacoustic model 23 calculates, based on the perceptual characteristics of the human ear, the masking threshold for the current input frame of audio samples; quantization noise below the masking threshold is unlikely to be audible. Any conventional psychoacoustic model can be used here, but the invention requires that the psychoacoustic model output a masking threshold for each quantization unit.
The global bit allocator 16 globally allocates the bit resource available to a frame among the quantization units so that the quantization noise power in each quantization unit is below its respective masking threshold. It controls the quantization noise power of each quantization unit by adjusting its quantization step size. All subband samples within a quantization unit are quantized with the same step size.
Any known bit allocation method can be used here. One such method is the well-known water-filling algorithm. Its basic idea is to find the quantization unit with the highest QNMR (quantization-noise-to-mask ratio) and decrease the step size assigned to that quantization unit to reduce its quantization noise. This process is repeated until the QNMR of all quantization units is less than 1 (or any other threshold) or the bit resource of the current frame is exhausted.
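The water-filling loop can be sketched as a greedy refinement of step sizes. The sketch rests on assumptions that do not come from the source: the noise power of a unit is modelled as step²/12 (the usual uniform-quantizer approximation), each refinement multiplies the step size by a fixed `shrink` factor, and `bit_cost` is a hypothetical caller-supplied estimate of the bits needed for a given set of step sizes.

```python
def water_filling(masks, initial_step, bit_cost, bit_budget,
                  shrink=0.5, qnmr_target=1.0):
    """Return one step size per quantization unit.

    masks[i] is the (positive) masking threshold power of unit i.
    bit_cost(steps) estimates the bits needed to code the frame.
    """
    steps = [initial_step] * len(masks)
    while True:
        # QNMR of each unit under the step**2 / 12 noise model.
        qnmr = [(s * s / 12.0) / m for s, m in zip(steps, masks)]
        worst = max(range(len(masks)), key=lambda i: qnmr[i])
        if qnmr[worst] < qnmr_target:
            return steps  # all quantization noise is below the target
        trial = list(steps)
        trial[worst] *= shrink  # finer step for the noisiest unit
        if bit_cost(trial) > bit_budget:
            return steps  # bit resource of the frame is exhausted
        steps = trial
```

A real encoder would replace the toy noise model and cost estimate with measurements taken from actual quantization and entropy coding of the frame.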
The quantization step size itself must be quantized so that it can be packed into the bit stream. Non-uniform quantization, such as logarithmic quantization, is used to match the perceptual characteristics of the human ear. Entropy coding can be applied to the quantization indices of the step sizes.
The invention uses the step sizes provided by the global bit allocation 16 to quantize all subband samples within each quantization unit 17. Any quantization scheme, linear or nonlinear, uniform or non-uniform, can be used here.
Interleaving 18 may optionally be invoked only when a transient is present in the current frame. Let x(m, n, k) be the k-th quantization index in the m-th quasi-stationary segment and the n-th subband. (m, n, k) is the order in which the quantization indices are normally arranged. The interleaving unit 18 reorders the quantization indices so that they are arranged as (n, m, k). The motivation is that this rearrangement may reduce the number of bits needed to encode the indices compared with the non-interleaved arrangement. The decision on whether interleaving is invoked must be conveyed to the decoder as side information.
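The reordering from (m, n, k) to (n, m, k) can be sketched as below. The nested-list layout `x[m][n]`, the function names, and the assumption that the decoder knows the per-segment lengths (for example from the transient positions) are choices made for this sketch.

```python
def interleave(x):
    """x[m][n] lists the indices of segment m, subband n.

    Returns all indices flattened in (n, m, k) order.
    """
    num_segments, num_subbands = len(x), len(x[0])
    return [q for n in range(num_subbands)
            for m in range(num_segments)
            for q in x[m][n]]


def deinterleave(flat, lengths):
    """Inverse of interleave; lengths[m] = indices per subband in segment m."""
    num_subbands = len(flat) // sum(lengths)
    x = [[None] * num_subbands for _ in lengths]
    pos = 0
    for n in range(num_subbands):
        for m, length in enumerate(lengths):
            x[m][n] = flat[pos:pos + length]
            pos += length
    return x
```

After interleaving, all indices of one subband sit next to each other across the segments of the frame, which tends to group indices of similar magnitude.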
In prior audio coding algorithms, the application range of an entropy codebook is the same as the quantization unit, so the entropy codebook is determined by the quantization indices within the quantization unit (see the top of Fig. 11). There is therefore no room for optimization.
The present invention is completely different in this respect. It ignores the existence of quantization units when selecting codebooks. Instead, it assigns the best codebook to each quantization index, thereby essentially converting the quantization indices into codebook indices. It then groups these codebook indices into larger segments, whose boundaries define the application ranges of the codebooks. Obviously, these codebook application ranges differ greatly from those determined by the quantization units. They are based solely on the merits of the quantization indices, so the selected codebooks fit the quantization indices better. Consequently, fewer bits are needed to convey the quantization indices to the decoder.
The advantage of this method over the prior art is illustrated in Fig. 11. Consider the largest quantization index in the figure. It belongs to quantization unit d, for which the previous method would select a large codebook; this large codebook is clearly not optimal, because most indices in quantization unit d are much smaller. With the new method of the invention, on the other hand, the same quantization index falls into segment C, so it shares a codebook with other large quantization indices. Moreover, all quantization indices in segment D are small, so a small codebook is selected. Fewer bits are therefore needed to encode the quantization indices.
Referring now to Fig. 12, a prior-art system only needs to convey the codebook indices to the decoder as side information, because their application ranges coincide with the predetermined quantization units. The method of the invention, however, must convey the codebook application ranges as side information in addition to the codebook indices, because they are independent of the quantization units. If not handled properly, this additional overhead may end up requiring more bits in total for side information and quantization indices. Segmenting the codebook indices into large segments is therefore critical for controlling this overhead, because large segments mean that fewer codebook indices and application ranges need to be conveyed to the decoder.
One embodiment of the present of invention are finished new departure that this code book is selected with the following step:
1) quantification index is blocked into district's group, each district's group comprises P quantification index.
2) determine that each district organizes maximum code book demand.For symmetrical quantizer, this is represented by absolute quantification index maximum in each district's group usually:
I max ( n ) = max k = 0 P - 1 | I ( nP + k ) | , N ∈ { all district's groups }
Wherein I (.) is a quantification index;
3) minimum code book is distributed to district's group that can hold maximum code book demand:
4) method of the minimum value of the code book index by the code book index of those code book indexes isolated pocket littler than its neighbour being risen to its neighbour and these isolated pockets are disposed.This is illustrated to 80 by mapping 71 to 72,73 to 74,77 to 78 and 79 in Figure 12.Deeply can from handling, this be removed, because this code book indication does not have code to be transmitted corresponding to the isolated pocket in the code book index of zero quantification index.This is described to 75 to 76 mapping in Figure 12.This step has reduced the number and the range of application thereof of the code book index that need be sent to decoder significantly.
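The four steps above can be sketched as follows. Several details are assumptions of this sketch rather than statements of the source: codebooks are described only by a hypothetical `capacities` list (largest absolute index each codebook can code, in increasing order, with codebook 0 meaning "all zero"), a pocket counts as isolated when it spans at most `max_pocket` blocks, and pockets that use codebook 0 are always left untouched.

```python
def select_codebooks(indices, block_size, capacities, max_pocket=1):
    """Return one codebook index per block of quantization indices."""
    # Steps 1-3: block the indices, find each block's largest absolute
    # index, and pick the smallest codebook that can hold it.
    books = []
    for start in range(0, len(indices), block_size):
        peak = max(abs(q) for q in indices[start:start + block_size])
        books.append(next(c for c, cap in enumerate(capacities) if cap >= peak))

    # Collect runs of equal codebook index: [codebook, first block, count].
    runs = []
    for i, c in enumerate(books):
        if runs and runs[-1][0] == c:
            runs[-1][2] += 1
        else:
            runs.append([c, i, 1])

    # Step 4: raise narrow pockets that are lower than both neighbours to
    # the smaller neighbouring codebook; zero-codebook pockets are kept.
    for j in range(1, len(runs) - 1):
        c, first, count = runs[j]
        left, right = runs[j - 1][0], runs[j + 1][0]
        if c != 0 and count <= max_pocket and c < left and c < right:
            books[first:first + count] = [min(left, right)] * count
    return books
```

The returned list can then be run-length coded, so that only a few (codebook index, application range) pairs are sent as side information.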
One embodiment of the present of invention adopt the run length code code book range of application of encoding, and the run length code can also be encoded with the entropy code.
All quantification indexes all use the code book determined by entropy code book selector 19 and and their ranges of application separately encode 20.
Entropy coding can be implemented with a variety of Huffman codebooks. When the number of quantization levels in a codebook is small, multiple quantization indices are blocked together to form a larger Huffman codebook. When the number of quantization levels is too large (for example, more than 200), recursive indexing is used. For this, a large quantization index q is represented as
q = m·M + r
where M is the modulus, m is the quotient and r is the remainder. Only m and r need to be conveyed to the decoder. Either or both of them can be encoded with Huffman codes.
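The decomposition q = m·M + r can be sketched with Python's built-in `divmod`. The function names are hypothetical, and the sketch assumes a non-negative index (the sign, if any, would have to be handled separately).

```python
def split_index(q, modulus):
    """Split a non-negative index q into (m, r) with q = m * modulus + r."""
    return divmod(q, modulus)


def join_index(m, r, modulus):
    """Rebuild q from the quotient m and the remainder r."""
    return m * modulus + r
```

For example, with a modulus of 200 the index 523 becomes the pair (2, 123), so neither part requires a codebook with more than 200 entries.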
Entropy coding can also be implemented with a variety of arithmetic codebooks. When the number of quantization levels is too large (for example, more than 200), recursive indexing is likewise used.
Other types of entropy coding can also be used in place of the Huffman and arithmetic coding above.
Packing all or part of the quantization indices directly, without entropy coding, is also a good option.
Because the statistical properties of the quantization indices differ markedly between the low- and high-resolution modes of the variable-resolution filter bank, one embodiment of the invention uses two libraries of entropy codebooks to encode the quantization indices in these two modes respectively. A third library can be used for the medium-resolution mode, or that mode can share a library with the high- or low-resolution mode.
The invention multiplexes 21 all quantization indices and other side information into a complete bit stream. The side information includes the quantization step sizes, sample rate, speaker configuration, frame length, lengths of the quasi-stationary segments, entropy codebook codes, and so on. Other side information, such as time code, can also be packed into the bit stream.
A prior-art system needs to convey the number of quantization units of each transient segment to the decoder, because the unpacking of the quantization step sizes, of the codebooks of the quantization indices and of the quantization indices themselves all depend on it. In the present invention, however, because the selection of the quantization-index codebooks and their application ranges is decoupled from the quantization units by the particular method of entropy codebook selection 19, the bit stream can be structured so that the quantization indices can be unpacked before the number of quantization units is needed. Once the quantization indices are unpacked, they can be used to reconstruct the number of quantization units. This is explained in the decoder description.
With the above considerations, one embodiment of the invention uses the bit stream structure shown in Fig. 16 when the half hybrid filter bank or the switchable filter bank + ADPCM is used. It essentially comprises the following parts:
Sync word 81: indicates the start of an audio data frame;
Frame header 82: contains information about the audio signal, such as the sample rate, number of normal channels, number of LFE (low-frequency effects) channels, speaker configuration, and so on;
Channels 1, 2, ..., N (83, 84, 85): all audio data of each channel is packed here;
Auxiliary data 86: contains auxiliary data such as time code;
Error detection 87: an error detection code is inserted here to detect errors in the current frame, so that error handling can be invoked when a bit stream error is detected.
The audio data of each channel is further structured as follows:
Window type 90: indicates which window, such as those shown in Fig. 5, was used in the encoder so that the decoder can use the same window;
Transient positions 91: present only for frames with transients; indicates the position of each transient segment. If run-length codes are used, this is where the length of each transient segment is packed;
Interleaving decision 92: one bit, present only in transient frames, indicating whether the quantization indices of each transient segment were interleaved, so that the decoder knows whether to de-interleave them;
Codebook indices and application ranges 93: conveys all information about the entropy codebooks and their application ranges over the quantization indices. It comprises the following parts:
○ Number of codebooks 101: conveys the number of entropy codebooks for each transient segment of the current channel;
○ Application ranges 102: conveys the application range of each entropy codebook in units of quantization indices or blocks; they can additionally be entropy coded;
○ Codebook indices 103: conveys the indices of the entropy codebooks; they can additionally be entropy coded;
Quantization indices 94: conveys the entropy codes of all quantization indices of the current channel;
Quantization step sizes 95: conveys the indices of the quantization step sizes of each quantization unit; they can additionally be entropy coded. As explained earlier, the number of step-size indices, that is, the number of quantization units, is reconstructed by the decoder from the quantization indices, as shown at 49;
Arbitrary-resolution filter bank decision 96: one bit per quantization unit, present only when the switchable-resolution analysis filter bank 28 is in the low-frequency-resolution mode; it instructs the decoder whether to perform arbitrary-resolution filter bank reconstruction (51 or 55) on all subband segments in the quantization unit;
Sum/difference coding decision 97: one bit for each quantization unit that is sum/difference coded. It is optional and present only when sum/difference coding is used; it instructs the decoder whether to perform sum/difference decoding 47;
Joint intensity coding decision and steering vector 98: conveys information on whether the decoder should perform joint intensity decoding. It is optional, applies only to the quantization units of joint channels that are joint-intensity coded, and is present only when the encoder uses joint intensity coding. It comprises the following parts:
○ Decision 121: one bit per joint quantization unit, indicating to the decoder whether to perform joint intensity decoding on the subband samples in the quantization unit;
○ Polarity 122: one bit per joint quantization unit, representing the polarity of the joint channel relative to the source channel:
Figure C20051009589800371
○ Steering vector 123: one scale factor per joint quantization unit; it can be entropy coded;
Auxiliary data 99: contains auxiliary information such as dynamic range control.
When the tri-mode switchable filter bank is used, the bit stream structure is essentially the same as above, except for the following:
Window type 90: indicates which window, such as those shown in Fig. 5 and Fig. 9, was used in the encoder so that the decoder can use the same window. Note that for a frame with transients, this window type refers only to the last window in the frame, because the remaining windows can be inferred from this window type, the transient positions and the last window used in the previous frame;
Transient positions 91: present only for frames with transients. It first indicates whether the frame has a slow transient 171. If not, it then indicates the transient position in terms of medium blocks 172 and then in terms of short blocks 173;
Arbitrary-resolution filter bank decision 96: this is irrelevant and therefore not used.
Decoder
The decoder of the invention essentially implements the inverse of the encoder's processing; it is illustrated in Fig. 13 and explained as follows.
A demultiplexer 41 decodes from the bit stream the quantization indices as well as side information such as the quantization step sizes, sample rate, speaker configuration and time code. When a prefix entropy code such as a Huffman code is used, this is a single step combined with entropy decoding.
The quantization-index codebook decoder 42 decodes from the bit stream the entropy codebooks of the quantization indices and their respective application ranges.
The entropy decoder 43 decodes the quantization indices from the bit stream based on the entropy codebooks and their respective application ranges provided by the quantization-index codebook decoder 42.
De-interleaving 44 is optionally applied only when a transient is present in the current frame. If the decision bit unpacked from the bit stream indicates that interleaving 18 was invoked in the encoder, the quantization indices are de-interleaved. Otherwise, the quantization indices are passed through unmodified.
The invention reconstructs the number of quantization units from the non-zero quantization indices of each transient segment 49. Let q(m, n) be the quantization index of the n-th subband of the m-th transient segment (if there is no transient in the frame, there is only one transient segment), and find the largest subband with a non-zero quantization index for each transient segment m:
$$\text{Band}_{\max}(m) = \max_n \{\, n \mid q(m, n) \neq 0 \,\}$$
Recall that a quantization unit is defined by a critical band in frequency and a transient segment in time, so the number of quantization units of each transient segment is given by the smallest critical band that accommodates Band_max(m). Let Band(Cb) be the largest subband of the Cb-th critical band; the number of quantization units of each transient segment m can then be expressed as:
$$N(m) = \min_{Cb} \{\, Cb \mid \text{Band}(Cb) \geq \text{Band}_{\max}(m) \,\}$$
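The reconstruction of the number of quantization units can be sketched as below. The data layout is an assumption of this sketch: `q_segment[n]` lists the quantization indices of subband n in one transient segment, `band_last_subband[cb]` plays the role of Band(Cb) with critical bands stored from position 0 but counted from 1 in the result, and an all-zero segment is taken to need no quantization units.

```python
def num_quantization_units(q_segment, band_last_subband):
    """Return N(m) for one transient segment."""
    nonzero = [n for n, values in enumerate(q_segment) if any(values)]
    if not nonzero:
        return 0  # assumption: an all-zero segment has no quantization units
    band_max = max(nonzero)  # largest subband with a non-zero index
    for cb, last in enumerate(band_last_subband):
        if last >= band_max:
            return cb + 1  # smallest critical band that reaches band_max
    raise ValueError("subband index lies beyond the last critical band")
```

Because this count is derived from already-decoded indices, the encoder does not need to spend bits on transmitting it.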
Quantization step unpacks 50 quantization steps that unpack each quantifying unit from bit stream.
Re-quantization 45 utilizes each quantization step of each quantifying unit to rebuild sub-band samples from quantification index.
If called combined strength coding 15 in the bit stream indication encoder, then combined strength decoding 46 is duplicated sub-band samples and it be multiply by polarity and boot vector to rebuild the sub-band samples of associating sound channel from the source sound channel:
$$\text{Joint channel} = \text{Polarity} \cdot \text{Steering vector} \cdot \text{Source channel}$$
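The reconstruction rule above amounts to an element-wise product. In the sketch below the polarity (+1 or -1) and the steering factor are assumed to have already been expanded to one value per subband sample; how these values are grouped and transmitted is not described in this passage, and the function name is hypothetical.

```python
def joint_intensity_decode(source, polarity, steering):
    """Joint channel = polarity * steering vector * source channel, per sample."""
    if not (len(source) == len(polarity) == len(steering)):
        raise ValueError("source, polarity and steering must have equal length")
    return [p * s * x for p, s, x in zip(polarity, steering, source)]


source = [1.0, -2.0, 4.0]      # reconstructed source-channel subband samples
polarity = [1, -1, 1]          # assumed sign per sample
steering = [0.5, 0.5, 0.25]    # assumed intensity scale per sample
joint = joint_intensity_decode(source, polarity, steering)
```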
If the bit stream indicates that sum/difference coding 14 was invoked in the encoder, sum/difference decoder 47 reconstructs the left and right channels from the sum and difference channels. Corresponding to the sum/difference coding example explained under sum/difference coding 14, the left and right channels can be reconstructed as:
Left channel = Sum channel + Difference channel
Right channel = Sum channel - Difference channel
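A short round-trip sketch of these two formulas follows. The encoder side shown here, with sum = (L + R) / 2 and difference = (L - R) / 2, is an assumption, since the encoder example referred to above lies outside this passage; under that assumption the decoder formulas recover the left and right channels exactly.

```python
def sum_difference_encode(left, right):
    """Assumed encoder-side mapping: sum = (L + R) / 2, difference = (L - R) / 2."""
    sum_ch = [(l + r) / 2 for l, r in zip(left, right)]
    diff_ch = [(l - r) / 2 for l, r in zip(left, right)]
    return sum_ch, diff_ch


def sum_difference_decode(sum_ch, diff_ch):
    """Decoder mapping from the text: L = sum + difference, R = sum - difference."""
    left = [s + d for s, d in zip(sum_ch, diff_ch)]
    right = [s - d for s, d in zip(sum_ch, diff_ch)]
    return left, right


left_in, right_in = [1.0, 2.0], [3.0, -2.0]
sum_ch, diff_ch = sum_difference_encode(left_in, right_in)
left_out, right_out = sum_difference_decode(sum_ch, diff_ch)
```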
The decoder of the invention incorporates a variable-resolution synthesis filter bank 48, which is essentially the inverse of the analysis filter bank used to encode the signal.
If the tri-mode switchable-resolution analysis filter bank is used in the encoder, the operation of its corresponding synthesis filter bank is uniquely determined and requires that the same sequence of windows be used in the synthesis process.
If the half hybrid filter bank or the switchable filter bank plus ADPCM is used in the encoder, the decoding process is as follows:
If the bit stream indicates that the current frame was encoded with the switchable-resolution analysis filter bank 28 in the high frequency resolution mode, the switchable-resolution synthesis filter bank 52 enters the high frequency resolution mode accordingly and reconstructs the PCM samples from the subband samples (see Figures 14 and 15).
If the bit stream indicates that the current frame was encoded with the switchable-resolution analysis filter bank 28 in the low frequency resolution mode, the subband samples are first passed to the arbitrary-resolution synthesis filter bank 51 (Figure 14) or the inverse ADPCM 55 (Figure 15), depending on which was used in the encoder, and undergo the respective synthesis process. The PCM samples are then reconstructed from these synthesized subband samples by the switchable-resolution synthesis filter bank in the low frequency resolution mode 53.
Synthesis filter banks 52, 51 and 55 are the inverses of analysis filter banks 28, 26 and 29, respectively. Their structure and operation are uniquely determined by the analysis filter banks. Therefore, whatever analysis filter bank is used in the encoder, its corresponding synthesis filter bank must be used in the decoder.
Low coding delay mode
When the high frequency resolution mode of the switchable-resolution analysis filter bank is disabled by the encoder, the frame length can be reduced to the block length of the switchable-resolution filter bank in the low frequency resolution mode, or to an integer multiple of it. This yields a much smaller frame length and therefore a much lower delay required for encoder operation. This is the low coding delay mode of the invention.
Although several embodiments have been described in detail for purposes of illustration, various modifications may be made without departing from the scope and spirit of the invention. Accordingly, the invention is limited only by the appended claims.

Claims (34)

1. A method for encoding and decoding a multichannel digital audio signal, comprising the following steps:
a) segmenting the input PCM samples into quasi-stationary frames;
b) transforming the PCM samples into subband samples using a variable-resolution analysis filter bank;
c) quantizing the subband samples in blocks into a plurality of quantization indexes;
d) providing a library of pre-designed codebooks;
e) assigning codebooks to groups of quantization indexes based on the local characteristics of the quantization indexes, so that the codebook application ranges are independent of the block quantization boundaries;
f) encoding the codebook indexes and their respective application ranges;
g) creating a complete encoded data stream that comprises the quantization indexes to which codebooks have been assigned and the encoded codebook indexes and their respective application ranges;
h) transmitting the complete encoded data stream;
i) receiving the encoded data stream and unpacking it;
j) decoding the quantization indexes from the data stream;
k) reconstructing subband samples from the decoded quantization indexes; and
l) reconstructing audio PCM samples from the reconstructed subband samples.
2. The method of claim 1, wherein the codebook assigning step comprises: converting the quantization indexes into codebook indexes by assigning to each quantization index the index of the smallest available codebook that can accommodate that quantization index, and segmenting the codebook indexes into a plurality of application ranges.
3. The method of claim 1, wherein the duration of a quasi-stationary frame is between 2 and 50 milliseconds.
4. The method of claim 1, wherein the transforming step comprises: using a resolution filter bank that can be selectively switched between high and low frequency resolution modes.
5. The method of claim 4, comprising a step of detecting transients, wherein the high frequency resolution mode is used when no transient is detected, and the filter bank is switched to the low frequency resolution mode when a transient is detected.
6. The method of claim 5, wherein, when the resolution filter bank is switched to the low frequency resolution mode, the subband samples are segmented into quasi-stationary segments.
7. The method of claim 4, wherein the resolution filter bank is configured to include a long window that can bridge the transition from one short window directly to another short window, so as to handle transients that are only one long window apart.
8. The method of claim 1, wherein the transforming step comprises using a resolution filter bank that can be selectively switched among a high resolution mode, a low resolution mode and a medium resolution mode, so that multiple resolutions can be used within a single frame.
9. The method of claim 8, wherein the resolution filter bank is configured to include a window that can bridge the transition from one shorter window directly to another shorter window, so as to handle transients that are only one such window apart.
10. The method of claim 6, comprising: using an arbitrary-resolution filter bank or adaptive differential pulse code modulation (ADPCM) to tailor the frequency resolution of each stationary segment.
11. The method of claim 1, further comprising, between steps b) and c), a step of calculating a noise masking threshold.
12. The method of claim 11, wherein the calculating step is performed with a psychoacoustic model.
13. The method of claim 1, wherein step c) comprises: quantizing the subband samples using quantization step sizes supplied by a bit allocator that allocates bit resources to groups of subband samples, so that the quantization noise power is below a masking threshold.
14. The method of claim 1, further comprising, between steps b) and c), a step of converting the subband samples of a left/right channel pair into the subband samples of a sum/difference channel pair.
15. The method of claim 14, wherein the converting step is performed with a sum/difference encoder.
16. The method of claim 1, further comprising, between steps b) and c): extracting intensity factors of the joint channels relative to a source channel, merging the joint channels into the source channel, and discarding all subband samples in the joint channels.
17. The method of claim 16, wherein the extracting and merging steps are performed with a joint intensity encoder.
18. The method of claim 1, wherein step c) further comprises: rearranging the quantization indexes to reduce the total number of bits when a transient is present in the frame.
19. The method of claim 1, comprising: providing a run-length encoder for encoding the application ranges of the codebooks.
20. The method of claim 1, wherein step a) further comprises: using a transient segmentation algorithm when a transient is detected.
21. The method of claim 1, wherein the step of creating the encoded data stream is performed with a multiplexer.
22. The method of claim 1, wherein the codebook indexes and their respective application ranges comprise the number of codebooks, the application ranges, and the codebook indexes.
23. The method of claim 1, wherein step l) comprises: when the encoded data stream indicates that the current frame was encoded with the switchable-resolution analysis filter bank in the low frequency resolution mode, operating the variable-resolution synthesis filter bank as a two-stage hybrid filter bank, in which the first stage comprises an arbitrary-resolution synthesis filter bank or inverse adaptive differential pulse code modulation (ADPCM), and the second stage is the low frequency resolution mode of the variable synthesis filter bank.
24. The method of claim 1, wherein step l) comprises: when the data stream indicates that the current frame was encoded with the switchable-resolution analysis filter bank in the high frequency resolution mode, operating the variable-resolution synthesis filter bank in the high frequency resolution mode.
25. The method of claim 1, wherein the step of unpacking the data stream is performed with a demultiplexer.
26. The method of claim 1, wherein the decoding step uses a quantization index codebook decoder to decode, from the data stream, the entropy codebooks of the quantization indexes and their respective application ranges, and uses an entropy decoder to decode the quantization indexes from the data stream based on the decoded entropy codebooks and their respective application ranges.
27. The method of claim 1, wherein the decoding step further comprises decoding the quantized subband indexes from the data stream with an entropy decoder.
28. The method of claim 27, comprising reconstructing the number of quantization units from the decoded quantization indexes.
29. The method of claim 1, wherein step a) further comprises: rearranging the quantization indexes when a transient is detected in the current frame.
30. The method of claim 29, wherein the rearranging step is performed with a deinterleaver.
31. The method of claim 16, further comprising, between steps k) and l): reconstructing the subband samples of the joint channels from the subband samples of the source channel using the intensity factors of the joint channels.
32. The method of claim 31, wherein the reconstructing step is performed with a joint intensity decoder.
33. The method of claim 14, further comprising, between steps k) and l): reconstructing the subband samples of the left/right channel pair from the subband samples of the sum/difference channel pair.
34. The method of claim 33, wherein the reconstructing step is performed with a sum/difference decoder.
CNB2005100958986A 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding Active CN100364235C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US61067404P 2004-09-17 2004-09-17
US60/610,674 2004-09-17
US11/029,722 2005-01-04

Related Child Applications (7)

Application Number Title Priority Date Filing Date
CN2007101051439A Division CN101055721B (en) 2004-09-17 2005-09-07 Multi-sound channel digital audio encoding device and its method
CN2007101051462A Division CN101312041B (en) 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding
CN2007101051443A Division CN101055719B (en) 2004-09-17 2005-09-07 Method for encoding and transmitting multi-sound channel digital audio signal
CN2008100034638A Division CN101246689B (en) 2004-09-17 2005-09-07 Audio encoding system
CN2008100034572A Division CN101247129B (en) 2004-09-17 2005-09-07 Signal processing method
CN2007101051458A Division CN101046963B (en) 2004-09-17 2005-09-07 Method for decoding encoded audio frequency data stream
CN2008100034623A Division CN101241701B (en) 2004-09-17 2005-09-07 Method and equipment used for audio signal decoding

Publications (2)

Publication Number Publication Date
CN1848690A CN1848690A (en) 2006-10-18
CN100364235C true CN100364235C (en) 2008-01-23

Family

ID=37078085

Family Applications (8)

Application Number Title Priority Date Filing Date
CN2008100034638A Active CN101246689B (en) 2004-09-17 2005-09-07 Audio encoding system
CN2007101051443A Active CN101055719B (en) 2004-09-17 2005-09-07 Method for encoding and transmitting multi-sound channel digital audio signal
CNB2005100958986A Active CN100364235C (en) 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding
CN2008100034623A Active CN101241701B (en) 2004-09-17 2005-09-07 Method and equipment used for audio signal decoding
CN2007101051462A Active CN101312041B (en) 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding
CN2007101051458A Active CN101046963B (en) 2004-09-17 2005-09-07 Method for decoding encoded audio frequency data stream
CN2008100034572A Active CN101247129B (en) 2004-09-17 2005-09-07 Signal processing method
CN2007101051439A Active CN101055721B (en) 2004-09-17 2005-09-07 Multi-sound channel digital audio encoding device and its method

Country Status (1)

Country Link
CN (8) CN101246689B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8054969B2 (en) * 2007-02-15 2011-11-08 Avaya Inc. Transmission of a digital message interspersed throughout a compressed information signal
CN101453643B (en) * 2007-12-04 2011-05-18 华为技术有限公司 Quantitative mode, image encoding, decoding method, encoder, decoder and system
US8630848B2 (en) * 2008-05-30 2014-01-14 Digital Rise Technology Co., Ltd. Audio signal transient detection
CN101577116B (en) * 2009-02-27 2012-07-18 北京中星微电子有限公司 Extracting method of MFCC coefficients of voice signal, device and Mel filtering method
CN101615911B (en) 2009-05-12 2010-12-08 华为技术有限公司 Coding and decoding methods and devices
EP3349360B1 (en) 2011-01-14 2019-09-04 GE Video Compression, LLC Entropy encoding and decoding scheme
US10580417B2 (en) * 2013-10-22 2020-03-03 Industry-Academic Cooperation Foundation, Yonsei University Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain
WO2015144587A1 (en) * 2014-03-25 2015-10-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder device and an audio decoder device having efficient gain coding in dynamic range control
CN104050968B (en) * 2014-06-23 2017-02-15 东南大学 Embedded type audio acquisition terminal AAC audio coding method
CN104240712B (en) * 2014-09-30 2018-02-02 武汉大学深圳研究院 A kind of three-dimensional audio multichannel grouping and clustering coding method and system
CN105261373B (en) * 2015-09-16 2019-01-08 深圳广晟信源技术有限公司 Adaptive grid configuration method and apparatus for bandwidth extension encoding
CN107895580B (en) * 2016-09-30 2021-06-01 华为技术有限公司 Audio signal reconstruction method and device
CN108461086B (en) * 2016-12-13 2020-05-15 北京唱吧科技股份有限公司 Real-time audio switching method and device
US10354668B2 (en) * 2017-03-22 2019-07-16 Immersion Networks, Inc. System and method for processing audio data
US10699723B2 (en) * 2017-04-25 2020-06-30 Dts, Inc. Encoding and decoding of digital audio signals using variable alphabet size
CN109286922B (en) * 2018-09-27 2021-09-17 珠海市杰理科技股份有限公司 Bluetooth prompt tone processing method, system, readable storage medium and Bluetooth device
EP3751567B1 (en) * 2019-06-10 2022-01-26 Axis AB A method, a computer program, an encoder and a monitoring device
CN110970039A (en) * 2019-11-28 2020-04-07 北京蜜莱坞网络科技有限公司 Audio transmission method and device, electronic equipment and storage medium
CN115691521A (en) * 2021-07-29 2023-02-03 华为技术有限公司 Audio signal coding and decoding method and device
CN115691514A (en) * 2021-07-29 2023-02-03 华为技术有限公司 Coding and decoding method and device for multi-channel signal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010038643A1 (en) * 1998-07-29 2001-11-08 British Broadcasting Corporation Method for inserting auxiliary data in an audio data stream
JP2001325000A (en) * 2000-05-15 2001-11-22 Nippon Columbia Co Ltd Audio signal coding device
JP2003280695A (en) * 2002-03-19 2003-10-02 Sanyo Electric Co Ltd Method and apparatus for compressing audio

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
AU763060B2 (en) * 1998-03-16 2003-07-10 Koninklijke Philips Electronics N.V. Arithmetic encoding/decoding of a multi-channel information signal
JP4373006B2 (en) * 1998-05-27 2009-11-25 マイクロソフト コーポレーション Scalable speech coder and decoder
JP3346398B2 (en) * 2000-10-27 2002-11-18 日本ビクター株式会社 Audio encoding method and audio decoding method
US6636830B1 (en) * 2000-11-22 2003-10-21 Vialta Inc. System and method for noise reduction using bi-orthogonal modified discrete cosine transform
KR100472442B1 (en) * 2002-02-16 2005-03-08 삼성전자주식회사 Method for compressing audio signal using wavelet packet transform and apparatus thereof
GB2388502A (en) * 2002-05-10 2003-11-12 Chris Dunn Compression of frequency domain audio signals
CN100481733C (en) * 2002-08-21 2009-04-22 广州广晟数码技术有限公司 Coder for compressing coding of multiple sound track digital audio signal
CN100339886C (en) * 2003-04-10 2007-09-26 联发科技股份有限公司 Coding device capable of detecting transient position of sound signal and its coding method
CN1460992A (en) * 2003-07-01 2003-12-10 北京阜国数字技术有限公司 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104798131A (en) * 2012-10-05 2015-07-22 弗朗霍夫应用科学研究促进协会 Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding
US9734833B2 (en) 2012-10-05 2017-08-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoder, decoder and methods for backward compatible dynamic adaption of time/frequency resolution spatial-audio-object-coding
US10152978B2 (en) 2012-10-05 2018-12-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding

Also Published As

Publication number Publication date
CN101046963B (en) 2011-03-23
CN101241701B (en) 2012-06-27
CN101312041B (en) 2011-05-11
CN101055719B (en) 2011-02-02
CN101247129B (en) 2012-05-23
CN101246689A (en) 2008-08-20
CN101055719A (en) 2007-10-17
CN101312041A (en) 2008-11-26
CN1848690A (en) 2006-10-18
CN101241701A (en) 2008-08-13
CN101246689B (en) 2011-09-14
CN101247129A (en) 2008-08-20
CN101055721A (en) 2007-10-17
CN101055721B (en) 2011-06-01
CN101046963A (en) 2007-10-03

Similar Documents

Publication Publication Date Title
CN100364235C (en) Apparatus and methods for multichannel digital audio coding
JP5695714B2 (en) Multi-channel digital speech coding apparatus and method
US9361894B2 (en) Audio encoding using adaptive codebook application ranges
US6529604B1 (en) Scalable stereo audio encoding/decoding method and apparatus
CN100546233C (en) Method and apparatus for supporting multichannel audio extension
US7620554B2 (en) Multichannel audio extension
US20020049586A1 (en) Audio encoder, audio decoder, and broadcasting system
JP2012163969A5 (en)
WO2002043291A2 (en) Perceptual audio signal compression system and method
JPH03167927A (en) Bit allocation device for transform-coded digital audio broadcast signals adaptively quantized on a psychoacoustic basis
KR20040019889A (en) Method and apparatus for encoding or decoding an audio signal that is processed using multiple subbands and overlapping window functions
KR20040086880A (en) Method and apparatus for encoding/decoding digital data
CN101065796A (en) Method and apparatus for coding/decoding using inter-channel redundance
JPH07106977A (en) Information decoding device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20061018

Assignee: Shenzhen Sheng Digital Technology Co., Ltd.

Assignor: Guangsheng Digital Technology Co., Ltd., Guangzhou

Contract record no.: 2010990000326

Denomination of invention: Multi-sound channel digital audio encoding device and its method

Granted publication date: 20080123

License type: Common License

Record date: 20100602

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model