CN101247129B - Signal processing method - Google Patents

Signal processing method

Info

Publication number
CN101247129B
Authority
CN
China
Prior art keywords
code book
index
resolution
quantification
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008100034572A
Other languages
Chinese (zh)
Other versions
CN101247129A (en)
Inventor
游余立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGSHENG DIGITAL TECHNOLOGY Co Ltd GUANGZHOU
Digital Rise Technology Co Ltd
Original Assignee
GUANGSHENG DIGITAL TECHNOLOGY Co Ltd GUANGZHOU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=37078085&utm_source=***_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=CN101247129(B) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Priority claimed from US11/029,722 (external priority; patent US7630902B2)
Application filed by GUANGSHENG DIGITAL TECHNOLOGY Co Ltd GUANGZHOU
Publication of CN101247129A
Application granted
Publication of CN101247129B
Current legal status: Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a signal processing method for audio signal coding, comprising: receiving subband samples of a transformed signal; quantizing the subband samples block by block into a plurality of quantization indexes; and assigning codebooks to groups of quantization indexes based on the local characteristics of the quantization indexes, so that the application ranges of the codebooks are independent of the block quantization boundaries. The method can be applied to the processing of signals such as multichannel audio signals and video signals. Because the codebooks are assigned according to the characteristics of the quantization indexes when the quantization indexes are converted into codebook indexes, the selected codebooks better match the quantization indexes, and the quantization indexes can be transmitted with fewer bits.

Description

Codebook assignment method for audio signal coding
This application is a divisional application of application No. 200510095898.6, filed on September 17, 2005, entitled "Multichannel audio encoding device and method thereof".
Related application
This application claims priority to U.S. Provisional Application 60/610,674, filed on September 17, 2004.
Technical field
The present invention relates generally to methods and systems for encoding and decoding multichannel digital audio signals. More specifically, it relates to a low-bit-rate digital audio coding system that greatly reduces the bit rate of multichannel audio signals for efficient transmission or storage while achieving transparent audio reproduction, so that even expert listeners cannot distinguish the audio signal reconstructed at the decoder from the original signal.
Background art
A multichannel digital audio coding system generally includes the following elements: a time-frequency analysis filter bank, which produces a frequency representation of the input PCM (pulse code modulation) samples, called subband samples or subband signals; a psychoacoustic model, which computes a masking threshold based on the perceptual properties of the human ear, such that quantization noise below this threshold is unlikely to be heard; a global bit allocator, which allocates bit resources to each group of subband samples so that the resulting quantization noise power is below the masking threshold; a plurality of quantizers, which quantize the subband samples according to the allocated bits; a plurality of entropy coders, which reduce the statistical redundancy in the quantization indexes; and finally a multiplexer, which packs the entropy codes of the quantization indexes and other side information into a complete bitstream.
For example, Dolby AC-3 maps the input PCM samples into the frequency domain with a high-frequency-resolution MDCT (modified discrete cosine transform) filter bank with switchable window size. Stationary signals are analyzed with a 512-sample window and transient signals with a 256-sample window. The subband signals from the MDCT are represented in exponent/mantissa form and then quantized. A forward-backward adaptive psychoacoustic model is used to optimize quantization and to reduce the bits needed to code the bit allocation information. Entropy coding is not used, in order to reduce decoder complexity. Finally, the quantization indexes and other side information are multiplexed into a complete AC-3 bitstream. The frequency resolution of the adaptive MDCT used in AC-3 is not well matched to the characteristics of the input signal, which severely limits its compression performance. The lack of entropy coding is another factor limiting its compression performance.
MPEG-1 & 2 Layer III (MP3) uses a 32-band polyphase filter bank, each subband filter of which is followed by an adaptive MDCT that switches between 6 and 18 spectral lines. An advanced psychoacoustic model guides its bit allocation and scalar nonuniform quantization. Huffman codes are used to code most of the quantization indexes and other side information. The poor frequency isolation of the hybrid filter bank greatly limits its compression performance, and its algorithmic complexity is very high.
DTS Coherent Acoustics uses a 32-band polyphase filter bank to obtain a low-resolution frequency representation of the input signal. To compensate for the poor frequency resolution, ADPCM (adaptive differential pulse code modulation) is optionally applied to each subband. Uniform scalar quantization is applied either directly to the subband samples or, if ADPCM yields a good coding gain, to the prediction residual. Vector quantization can optionally be applied to the high-frequency subbands. Huffman codes can optionally be applied to the scalar quantization indexes and other side information. Because the polyphase filter bank + ADPCM structure fundamentally cannot provide good time and frequency resolution, its compression performance is low.
MPEG-2 AAC and MPEG-4 AAC use an adaptive MDCT filter bank whose window size can be switched between 256 and 2048. The masking threshold produced by the psychoacoustic model guides its scalar nonuniform quantization and bit allocation. Huffman codes are used to code most of the quantization indexes and other side information. Many other tools, such as TNS (temporal noise shaping), gain control (similar to the hybrid filter bank of MP3) and spectral prediction (linear prediction within subbands), are used to further improve its compression performance, at the cost of greatly increased algorithmic complexity.
Therefore, there is still a need for a low-bit-rate audio coding system that greatly reduces the bit rate of multichannel audio signals for efficient transmission or storage while still achieving transparent audio reproduction. The present invention fulfills these needs and provides other related advantages.
Summary of the invention
In the following discussion, the term "analysis/synthesis filter bank" and similar terms refer to any device or method that performs time-frequency analysis/synthesis. It may include, without limitation:
● unitary transforms;
● critically sampled, uniform or nonuniform, time-varying or time-invariant bandpass filter banks;
● harmonic or sinusoidal analyzers/synthesizers.
Polyphase filter banks, the DFT (discrete Fourier transform), the DCT (discrete cosine transform) and the MDCT are some widely used filter banks. The term "subband signals" or "subband samples" and similar terms refer to the signals or samples that come out of the analysis filter bank and go into the synthesis filter bank.
An object of the present invention is to provide low-bit-rate coding of multichannel audio signals with compression performance at the same level as the prior art but with reduced algorithmic complexity.
This is accomplished at the encoding side by an encoder comprising:
1) a framer, which divides the input PCM samples into quasi-stationary frames whose size is an integer multiple of the number of subbands of the analysis filter bank and whose duration is 2 to 50 ms;
2) a transient detector, which detects the presence of transients in the frame. One embodiment applies a threshold to a subband distance measure computed on the subband samples produced by the analysis filter bank in its low frequency resolution mode;
3) a variable-resolution analysis filter bank, which converts the input PCM samples into subband samples and can be implemented in one of the following ways:
A) A filter bank that can switch its operation among high, medium and low frequency resolution modes. The high frequency resolution mode is used for stationary frames, and the medium and low frequency resolution modes are used for frames containing transients. Within a transient frame, the low frequency resolution mode is used for the transient segment and the medium resolution mode for the remainder of the frame. Under this framework there are three types of frames:
i) stationary frames, which the filter bank processes only in the high frequency resolution mode;
ii) transient frames, which the filter bank processes in the medium resolution mode and the high time resolution (low frequency resolution) mode;
iii) slow transient frames, which the filter bank processes only in the medium resolution mode.
Two preferred embodiments are presented:
i) a DCT implementation, in which the three resolution levels correspond to three DCT block lengths;
ii) an MDCT implementation, in which the three resolution levels correspond to three MDCT block lengths or window lengths. Multiple window types are defined to bridge the transitions between these windows.
B) A hybrid filter bank built on a filter bank that can switch its operation between high and low frequency resolution modes:
i) when there is no transient in the current frame, it switches to the high frequency resolution mode to ensure high compression performance for stationary segments;
ii) when there is a transient in the current frame, it switches to the low frequency resolution/high time resolution mode to avoid pre-echo. This low frequency resolution mode is followed by a transient segmentation stage, which divides the subband samples into stationary segments; optionally, each subband is then followed by an arbitrary-resolution filter bank or ADPCM, which, if selected, provides a suitable frequency resolution for each stationary segment.
Two embodiments are given, one based on the DCT and the other on the MDCT. Two transient segmentation embodiments are also given, one based on thresholding and the other on the k-means algorithm; both use the subband distance measure.
4) a psychoacoustic model, which computes the masking threshold;
5) an optional sum/difference encoder, which converts the subband samples of a left/right channel pair into a sum/difference channel pair;
6) an optional joint intensity encoder, which extracts the intensity scale factors (steering vector) of a joint channel relative to a source channel, merges the joint channel into the source channel, and discards all subband samples of the joint channel;
7) a global bit allocator, which allocates bit resources to groups of subband samples so that their quantization noise power is below the masking threshold;
8) a scalar quantizer, which quantizes all subband samples with the step sizes provided by the bit allocator;
9) an optional interleaver, which, when there is a transient in the frame, can be used to rearrange the quantization indexes so as to reduce the total number of bits;
10) an entropy codebook selector, which, based on the local statistical characteristics of the quantization indexes, assigns the best codebooks from a codebook library to groups of quantization indexes through the following steps:
A) Assign the best codebook to each quantization index, which in effect converts the quantization indexes into codebook indexes.
B) Segment these codebook indexes into segments that are as large as possible; the segment boundaries define the application ranges of the codebooks.
A preferred embodiment is:
C) Divide the quantization indexes into granules, each containing a fixed number of quantization indexes.
D) Determine the maximum codebook requirement of each granule.
E) Assign to each granule the smallest codebook that can accommodate its maximum codebook requirement.
F) Eliminate isolated pockets of codebook indexes that are smaller than their neighboring codebook indexes; isolated pockets of the codebook index that corresponds to zero quantization indexes may be left unprocessed.
A preferred embodiment for encoding the codebook application ranges uses run-length codes. The codebook assignment method proposed by the present invention is applicable not only to multichannel audio signal processing but also to other signal processing, including video.
11) an entropy coder, which encodes all quantization indexes with the codebooks and application ranges determined by the entropy codebook selector;
12) a multiplexer, which packs all entropy codes of the quantization indexes and the side information into a complete bitstream, structured so that the quantization indexes appear before the indexes of the quantization step sizes. This structure makes it unnecessary to pack the number of quantization units of each transient segment into the bitstream, because it can be recovered from the unpacked quantization indexes.
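The granule-based codebook assignment of steps C) to F), together with the run-length coding of application ranges, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the codebook capacities in the hypothetical `CAPACITY` table, the granule size of 4, and the pocket-removal rule (raise a short run that lies below both neighboring runs to the lower neighbor) are assumptions chosen for clarity.

```python
from itertools import groupby

# Hypothetical codebook library (assumption): codebook c can represent any
# quantization index q with |q| <= CAPACITY[c]. Codebook 0 covers all-zero granules.
CAPACITY = [0, 1, 2, 4, 8, 16, 32, 64, 128]


def granule_codebooks(q, granule=4):
    """Steps C)-E): split the indexes into fixed-size granules and give each
    granule the smallest codebook that can hold its largest magnitude."""
    books = []
    for start in range(0, len(q), granule):
        need = max(abs(v) for v in q[start:start + granule])
        book = next((c for c, cap in enumerate(CAPACITY) if cap >= need), None)
        if book is None:
            raise ValueError(f"index magnitude {need} exceeds the codebook library")
        books.append(book)
    return books


def to_runs(books):
    """Collapse consecutive equal codebook indexes into [codebook, run_length]."""
    return [[b, len(list(g))] for b, g in groupby(books)]


def remove_pockets(books, max_len=1):
    """Step F) under an assumed rule: a run of at most max_len granules whose
    codebook is smaller than both neighbouring runs is raised to the smaller
    neighbour. Runs of codebook 0 (all-zero indexes) are left untouched."""
    runs = to_runs(books)
    for i in range(1, len(runs) - 1):
        book, length = runs[i]
        left, right = runs[i - 1][0], runs[i + 1][0]
        if book != 0 and length <= max_len and book < left and book < right:
            runs[i][0] = min(left, right)
    return [b for b, n in runs for _ in range(n)]


def application_ranges(books):
    """Run-length code of the codebook indexes: (codebook, granule count)
    pairs, each pair being one codebook application range."""
    return [(b, n) for b, n in to_runs(books)]


q = [0, 0, 0, 0, 5, 3, 1, 0, 1, 0, 1, 1, 7, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
books = granule_codebooks(q)          # [0, 4, 1, 4, 0, 0]
smoothed = remove_pockets(books)      # [0, 4, 4, 4, 0, 0]
print(application_ranges(smoothed))   # [(0, 1), (4, 3), (0, 2)]
```

In this sketch the application ranges follow the values of the indexes rather than any quantization-unit boundary: raising the isolated codebook 1 granule spends a few extra bits on its indexes but saves the side information for two extra ranges.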
The decoder of the present invention comprises:
1) a demultiplexer, which unpacks the various codewords from the bitstream;
2) a quantization index codebook decoder, which decodes from the bitstream the entropy codebooks for the quantization indexes and their respective application ranges;
3) an entropy decoder, which decodes the quantization indexes from the bitstream;
4) an optional deinterleaver, which, when there is a transient in the current frame, rearranges the quantization indexes;
5) a quantization unit number reconstructor, which reconstructs the number of quantization units of each transient segment from the quantization indexes through the following steps:
A) for each transient segment, find the highest subband that has a nonzero quantization index;
B) find the smallest number of critical bands that covers this subband; this is the number of quantization units of the transient segment;
6) a step size unpacker, which unpacks the quantization step sizes of all quantization units;
7) an inverse quantizer, which reconstructs the subband samples from the quantization indexes and step sizes;
8) an optional joint intensity decoder, which reconstructs the subband samples of the joint channel from those of the source channel using the joint intensity scale factors (steering vector);
9) an optional sum/difference decoder, which reconstructs the left/right channel subband samples from the sum/difference channel subband samples;
10) a variable-resolution synthesis filter bank, which reconstructs the audio PCM samples from the subband samples and can be implemented as follows:
A) a synthesis filter bank that can switch its operation among high, medium and low resolution modes;
B) a hybrid synthesis filter bank based on a synthesis filter bank that can switch between high and low resolution modes:
i) when the bitstream indicates that the current frame was encoded by the switchable-resolution analysis filter bank in the low frequency resolution mode, this synthesis filter bank is a two-stage hybrid filter bank, in which the first stage is an arbitrary-resolution synthesis filter bank or an inverse ADPCM, and the second stage is the low frequency resolution mode of the adaptive synthesis filter bank that can switch between high and low frequency resolution modes;
ii) when the bitstream indicates that the current frame was encoded by the switchable-resolution analysis filter bank in the high frequency resolution mode, this synthesis filter bank is simply the switchable-resolution synthesis filter bank operating in the high frequency resolution mode.
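Steps A) and B) of the quantization unit number reconstructor in item 5) can be sketched as below. The critical band edges in the hypothetical `BAND_EDGES` table are illustrative values in subband units; the real table depends on the filter bank and sampling rate and is not given in this section.

```python
from bisect import bisect_right

# Hypothetical critical band upper edges, in subband units (exclusive):
# band 0 covers subbands 0-3, band 1 covers subbands 4-7, and so on.
BAND_EDGES = [4, 8, 12, 16, 20, 24, 32, 40, 48, 64, 80, 96, 128]


def quantization_unit_count(segment):
    """segment[m][n]: quantization index of subband m at time n within one
    transient segment. Returns how many critical bands (quantization units)
    are needed to cover every nonzero index."""
    # Step A): highest subband that carries a nonzero quantization index.
    highest = max((m for m, row in enumerate(segment) if any(row)), default=None)
    if highest is None:
        return 0  # all-zero segment: no quantization units to reconstruct
    if highest >= BAND_EDGES[-1]:
        raise ValueError("subband lies beyond the critical band table")
    # Step B): smallest number of critical bands whose span covers that subband.
    return bisect_right(BAND_EDGES, highest) + 1


seg = [[0, 0] for _ in range(9)]
seg[1] = [1, 0]
seg[5] = [0, 2]
print(quantization_unit_count(seg))  # highest nonzero subband is 5 -> 2 units
```

Because the decoder derives this count from the indexes themselves, the encoder does not need to transmit it, which is why the multiplexer places the quantization indexes ahead of the step sizes.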
Finally, the invention provides a low coding delay mode, which is activated when the high frequency resolution mode of the switchable-resolution analysis filter bank is disabled in the encoder; the frame length is then reduced to the block length of the switchable-resolution filter bank in the low frequency resolution mode, or an integer multiple thereof.
A method for encoding and decoding multichannel digital audio signals provided by the invention comprises the following steps:
A) dividing the input PCM samples into quasi-stationary frames;
B) converting the PCM samples into subband samples using a variable-resolution analysis filter bank;
C) quantizing the subband samples block by block into a plurality of quantization indexes;
D) providing a predesigned codebook library;
E) assigning codebooks to groups of quantization indexes based on the local characteristics of the quantization indexes, so that the codebook application ranges are independent of the block quantization boundaries;
F) encoding the codebook indexes and their respective application ranges;
G) creating a complete encoded data stream that contains the quantization indexes coded with the assigned codebooks, together with the encoded codebook indexes and their respective application ranges;
H) transmitting this complete encoded data stream;
I) receiving the encoded data stream and unpacking it;
J) decoding the quantization indexes from the data stream;
K) reconstructing the subband samples from the decoded quantization indexes; and
L) reconstructing the audio PCM samples from the reconstructed subband samples.
According to the present invention, the method for coding multichannel digital audio signal generally includes from multichannel digital audio signal creates PCM sample and the step that becomes this PCM sample conversion sub-band samples.A plurality of quantification indexes with border are created through quantizing sub-band samples.Through distributing to each quantification index to the code book that can hold the minimum of quantification index in the code book storehouse of design in advance, quantification index is converted into the code book index.Before the encoded data stream that is used to store or send in establishment, the code book index is by cluster segmentation and coding.
In general, the PCM sample is imported in the quasi-stable state frame of duration between 2 to 50 milliseconds (ms).Masking threshold for example can use, and a psychoacoustic model calculates.Bit distributor the bit resource allocation in many groups sub-band samples, so that quantization noise power is lower than masking threshold.
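One common way to meet the condition that quantization noise stays below the masking threshold, assumed here because this section does not fix the quantizer rule, is to model uniform scalar quantization noise as Δ²/12 for step size Δ and pick the largest step satisfying Δ²/12 ≤ T, where T is the masking threshold power of the group. The function names are illustrative, not from the patent.

```python
import math


def step_for_mask(mask_power):
    """Largest uniform step whose modelled noise power (step**2 / 12)
    does not exceed the masking threshold power of the group."""
    return math.sqrt(12.0 * mask_power)


def quantize(samples, step):
    """Uniform scalar quantization, rounding half up to the nearest index."""
    return [math.floor(x / step + 0.5) for x in samples]


def dequantize(indexes, step):
    """Inverse quantizer: reconstruct subband samples from the indexes."""
    return [i * step for i in indexes]


# A group whose masking threshold power is 0.75 gets step 3.0.
step = step_for_mask(0.75)
idx = quantize([1.2, 4.9, -6.7], step)   # [0, 2, -2]
rec = dequantize(idx, step)              # [0.0, 6.0, -6.0]
```

A practical coder would also map the step sizes onto a table of step-size indexes and trade them against the available bits; that bookkeeping is omitted here.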
The conversion step includes using a resolution filter bank that selectively switches between high and low frequency resolution modes. Transients are detected: when no transient is detected, the high frequency resolution mode is used, but when a transient is detected, the resolution filter bank is switched to the low frequency resolution mode. Once the filter bank has switched to the low frequency resolution mode, the subband samples are divided into stationary segments. The frequency resolution of each stationary segment is tailored with an arbitrary-resolution filter bank or with adaptive differential pulse code modulation.
When a transient is present in the frame, the quantization indexes can be rearranged to reduce the total number of bits. A run-length encoder can be used to encode the application boundaries of the optimal entropy codebooks, which can be determined with a segmentation algorithm.
A sum/difference encoder can be used to transform the subband samples of a left/right channel pair into a sum/difference channel pair. In addition, a joint intensity encoder can be used to extract the intensity scale factors of a joint channel relative to a source channel, merge the joint channel into the source channel, and discard all associated subband samples of the joint channel.
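A minimal sketch of the two optional joint channel tools follows. The 1/2 scaling of the sum/difference transform, the merging rule (the joint channel is simply added to the source channel), and a single energy-ratio scale factor per band are assumptions; the section does not specify them.

```python
import math


def sum_diff_encode(left, right):
    """Left/right pair -> sum/difference pair (assumed 1/2 scaling)."""
    s = [(l + r) / 2 for l, r in zip(left, right)]
    d = [(l - r) / 2 for l, r in zip(left, right)]
    return s, d


def sum_diff_decode(s, d):
    """Inverse transform: recovers left/right up to floating-point rounding."""
    return [a + b for a, b in zip(s, d)], [a - b for a, b in zip(s, d)]


def intensity_encode(source, joint):
    """Merge the joint channel into the source channel for one band and keep
    only an energy-ratio scale factor; the joint samples are discarded."""
    merged = [a + b for a, b in zip(source, joint)]
    e_merged = sum(x * x for x in merged)
    e_joint = sum(x * x for x in joint)
    scale = math.sqrt(e_joint / e_merged) if e_merged > 0 else 0.0
    return merged, scale


def intensity_decode(merged, scale):
    """Rebuild the joint channel band from the merged source channel."""
    return [scale * x for x in merged]


s, d = sum_diff_encode([1.0, 3.0], [1.0, -1.0])   # [1.0, 1.0], [0.0, 2.0]
merged, scale = intensity_encode([1.0, 2.0], [1.0, 2.0])
joint_hat = intensity_decode(merged, scale)        # [1.0, 2.0]
```

Sum/difference coding is lossless before quantization, whereas intensity coding keeps only the band energy of the joint channel; the joint channel is recovered exactly in this example only because the two channels are proportional.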
In general, the step of combining everything into a complete bitstream is performed with a multiplexer before the coded digital audio signal is stored or sent to the decoder.
A method of decoding an audio data bitstream includes receiving the coded audio data stream and unpacking it, for example with a demultiplexer. The entropy codebook indexes and their respective application ranges are decoded, which may involve run-length and entropy decoders. These decoders are also used to decode the quantization indexes.
When a transient is detected in the current frame, the quantization indexes are rearranged, for example with a deinterleaver. The subband samples are then reconstructed from the decoded quantization indexes. Audio PCM samples are reconstructed from the reconstructed subband samples using a variable-resolution synthesis filter bank that can switch between low and high frequency resolution modes. When the data stream indicates that the current frame was encoded by the switchable-resolution analysis filter bank in the low frequency resolution mode, the variable-resolution synthesis filter bank acts as a two-stage hybrid filter bank, in which the first stage comprises an arbitrary-resolution synthesis filter bank or inverse adaptive differential pulse code modulation, and the second stage is the low frequency resolution mode of the variable synthesis filter bank. When the data stream indicates that the current frame was encoded by the switchable-resolution analysis filter bank in the high frequency resolution mode, the variable-resolution synthesis filter bank operates in the high frequency resolution mode.
A joint intensity decoder can be used to reconstruct joint channel subband samples from source channel subband samples using the joint intensity scale factors. In addition, a sum/difference decoder can be used to reconstruct left/right channel subband samples from sum/difference channel subband samples.
According to another aspect of the present invention, a codebook assignment method for audio signal coding is provided, comprising: receiving subband samples of a transformed signal; quantizing the subband samples block by block into a plurality of quantization indexes; and assigning codebooks to groups of quantization indexes based on the local characteristics of the quantization indexes, so that the codebook application ranges are independent of the block quantization boundaries.
The result of the present invention is a low-bit-rate digital audio coding system that greatly reduces the bit rate of multichannel audio signals for efficient transmission while achieving transparent audio reproduction, so that the reproduced signal is difficult to distinguish from the original.
Other features and advantages of the present invention will become apparent from the following detailed description, taken with the accompanying drawings, which illustrate by way of example the principles of the invention.
Description of the drawings
The accompanying drawings illustrate the invention. In these drawings:
Fig. 1 is a schematic diagram of the encoding and decoding of a multichannel digital audio signal according to the present invention;
Fig. 2 is a schematic diagram of an example encoder according to the present invention;
Fig. 3 is a schematic diagram of a variable-resolution analysis filter bank with an arbitrary-resolution filter bank;
Fig. 4 is a schematic diagram of a variable-resolution analysis filter bank with ADPCM;
Fig. 5 is a schematic diagram of the window types for a switchable MDCT according to the present invention;
Fig. 6 is a schematic diagram of transient segmentation according to the present invention;
Fig. 7 is a schematic diagram of the application of a switchable filter bank with two resolution modes according to the present invention;
Fig. 8 is a schematic diagram of the application of a switchable filter bank with three resolution modes according to the present invention;
Fig. 9, similar to Fig. 5, is a schematic diagram of additional window types for a switchable MDCT with three resolution modes according to the present invention;
Fig. 10 shows a set of example window sequences of a switchable MDCT with three resolution modes according to the present invention;
Fig. 11 is a schematic diagram comparing entropy codebook determination according to the present invention with the prior art;
Fig. 12 is a schematic diagram of segmenting codebook indexes into segments that are as large as possible, or eliminating isolated pockets of codebook indexes, according to the present invention;
Fig. 13 is a schematic diagram of a decoder according to the present invention;
Fig. 14 is a schematic diagram of a variable-resolution synthesis filter bank with an arbitrary-resolution filter bank according to the present invention;
Fig. 15 is a schematic diagram of a variable-resolution synthesis filter bank with inverse ADPCM; and
Fig. 16 is a schematic diagram of the bitstream structure when a partial hybrid filter bank or a switchable filter bank + ADPCM is used according to the present invention;
Fig. 17 is a schematic diagram showing the advantage of short-to-long window transitions when handling transients that are one frame apart;
Fig. 18 is a schematic diagram of the bitstream structure when a three-mode switchable filter bank is used according to the present invention.
Detailed description of embodiments
As shown in the drawings for purposes of illustration, the present invention relates to a low-bit-rate digital audio encoding and decoding system that greatly reduces the bit rate of multichannel audio signals for efficient transmission or storage while achieving transparent audio reproduction. That is, the bit rate of the encoded multichannel audio signal is reduced by a system of relatively low algorithmic complexity, and even expert listeners cannot distinguish the audio signal reconstructed at the decoder from the original signal.
As shown in Fig. 1, the encoder 5 of the present invention takes a multichannel audio signal as input and encodes it into a bitstream with a greatly reduced bit rate, suitable for transmission or storage on media with limited channel capacity. Upon receiving the bitstream produced by encoder 5, decoder 10 decodes it and reconstructs a multichannel audio signal that even expert listeners cannot distinguish from the original.
Inside encoder 5 and decoder 10, the multichannel audio signal is processed as discrete channels. That is, each channel is treated in the same way as the other channels, unless joint channel coding 2 is explicitly specified. This is illustrated by the extremely simplified encoder/decoder structure in Fig. 1.
With this extremely simplified encoder structure, the encoding process is as follows. The audio signal of each channel is first decomposed into subband signals in the analysis filter bank stage 1. The subband signals from all channels are optionally sent to the joint channel encoder 2, which reduces the bit rate by exploiting the perceptual properties of the human ear, combining subband signals of the same frequency band from different channels. The subband signals, possibly jointly coded in 2, are then quantized and entropy coded in 3. The quantization indexes or their entropy codes, together with the side information from all channels, are then multiplexed in 4 into a complete bitstream for transmission or storage.
At the decoding end, the bitstream is first demultiplexed in 6 into side information and quantization indexes or their entropy codes. The entropy codes are decoded in 7 (note: entropy decoding and demultiplexing of prefix codes such as Huffman codes are usually performed in a single step). The subband signals are reconstructed in 7 from the quantization indexes and the step sizes carried in the side information. If joint channel coding was used in the encoder, joint channel decoding is performed in 8. Then the audio signal of each channel is reconstructed from the subband signals in the synthesis stage 9.
The extremely simplified encoder/decoder structure above serves only to illustrate the discrete nature of the encoding and decoding methods provided by the present invention. The methods actually applied to each channel of the audio signal differ greatly from it and are considerably more complex. Unless otherwise stated, these methods are described below in the context of a single audio channel.
Encoder
The general method for encoding one channel of the audio signal is illustrated in Fig. 2 as follows:
Framer 11 divides the input PCM samples into quasi-stationary frames with a duration of 2 to 50 ms. The exact number of PCM samples in a frame must be an integer multiple of the maximum number of subbands of the various filter banks used in the variable-resolution time-frequency analysis filter bank 13. Assuming the maximum number of subbands is N, the number of PCM samples in a frame is
L=k·N
Wherein, k is a positive integer.
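The framing rule L = k·N can be sketched as follows. Zero-padding of the last partial frame, and the example values N = 128, k = 8 and a 48 kHz sampling rate, are assumptions for illustration; the duration check enforces the 2 to 50 ms range stated above.

```python
def frame_length(n_subbands, k, sample_rate):
    """Return L = k * N and check that the frame lasts 2 to 50 ms."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    length = k * n_subbands
    duration_ms = 1000.0 * length / sample_rate
    if not 2.0 <= duration_ms <= 50.0:
        raise ValueError(f"frame of {duration_ms:.2f} ms is outside 2-50 ms")
    return length


def split_frames(pcm, n_subbands, k, sample_rate):
    """Divide PCM samples into frames of L samples, zero-padding the last one."""
    length = frame_length(n_subbands, k, sample_rate)
    frames = []
    for start in range(0, len(pcm), length):
        frame = list(pcm[start:start + length])
        frame.extend([0] * (length - len(frame)))
        frames.append(frame)
    return frames


# N = 128 subbands and k = 8 give L = 1024 samples, about 21.3 ms at 48 kHz.
frames = split_frames(list(range(2500)), n_subbands=128, k=8, sample_rate=48000)
```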
Transient analysis 12 detects the presence of transients in the current input frame and passes this information to the variable-resolution analysis filter bank 13.
Any known transient detection method can be used here. In one embodiment of the invention, the input frame of PCM samples is fed to the variable-resolution analysis filter bank operating in its low frequency resolution mode. Let s(m, n) denote the output samples of this filter bank, where m is the subband index and n is the time index in the subband domain. In the following discussion, the term "transient detection distance" and similar terms refer to a distance measure defined for each time index:
E ( n ) = Σ m = 0 M - 1 | s ( m , n ) |
Or
E ( n ) = Σ m = 0 M - 1 s 2 ( m , n )
Wherein, M is the subband number of bank of filters.The range measurement of other type also can be used with similar method.Letting is the minimum and maximum value of this distance, if
Figure GSB00000629169300163
Then there is transient state in statement, and wherein, threshold value can be set to 0.5.
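A minimal sketch of the transient detection distance, assuming s is given as a list of subband rows (s[m][n]). The original decision condition is only available as an image, so the normalized range test (E_max − E_min)/(E_max + E_min) > threshold used here is an assumption chosen for illustration, not the patent's exact rule.

```python
def transient_distance(s, squared=False):
    """E(n) for every temporal index n, with s[m][n] = subband m, time n."""
    M = len(s)
    n_times = len(s[0])
    if squared:
        return [sum(s[m][n] ** 2 for m in range(M)) for n in range(n_times)]
    return [sum(abs(s[m][n]) for m in range(M)) for n in range(n_times)]

def is_transient(E, threshold=0.5):
    # ASSUMPTION: normalized range test; the original formula is an image.
    e_max, e_min = max(E), min(E)
    if e_max + e_min == 0:
        return False  # silent frame: no transient
    return (e_max - e_min) / (e_max + e_min) > threshold
```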
The invention uses a variable-resolution analysis filter bank 13. There are many known ways to implement such a filter bank. A prominent one is a filter bank that can switch its operation between high and low frequency resolution modes: the high frequency resolution mode is used for stationary segments of the audio signal, and the low frequency resolution mode is used to handle transients. However, because of theoretical and practical constraints, resolution switching cannot occur at arbitrary times. Instead, it usually occurs at frame boundaries, i.e., a whole frame is processed either in the high or in the low frequency resolution mode. As shown in Fig. 7, for transient frame 131 the filter bank has switched to the low frequency resolution mode to avoid pre-echo. Because transient 132 itself is very short, while the segments 133 before and 134 after the transient in this frame are much longer, the low frequency resolution mode clearly does not match these stationary segments. This severely limits the overall coding gain achievable for the frame.
The invention proposes three methods to solve this problem. The basic idea is to provide a higher frequency resolution for the stationary majority of a transient frame within a switchable-resolution structure.
Hybrid filter bank
As shown in Fig. 3, this is essentially a hybrid filter bank. It comprises a switchable-resolution analysis filter bank 28 that can switch between high and low frequency resolution modes; in the low frequency resolution mode 24, it is followed by a transient segmentation unit 25 and then by an optional arbitrary-resolution analysis filter bank 26 in each subband.
When transient detector 12 detects no transient, the switchable-resolution analysis filter bank 28 enters the low time resolution mode 27, which provides high frequency resolution and thus high coding gain for audio signals with strong tonal components.
When transient detector 12 detects a transient, the switchable-resolution analysis filter bank 28 enters the high time resolution mode 24. This ensures that the transient is handled with good time resolution to prevent pre-echo. The resulting subband samples are divided into quasi-stationary segments by transient segmentation unit 25, as shown in Fig. 6. In the following discussion, the term "transient segment" refers to these quasi-stationary segments. This is followed by the arbitrary-resolution analysis filter bank 26 in each subband, whose number of subbands equals the number of subband samples of each transient segment in that subband.
The switchable-resolution analysis filter bank 28 can be implemented with any filter bank that can switch its operation between high and low frequency resolution modes. One embodiment of the invention uses a pair of DCTs corresponding to low and high frequency resolution, with small and large transform lengths, respectively. If the transform length is M, the subband samples obtained with a type-IV DCT are
$$s(m,n) = \sqrt{\frac{2}{M}} \sum_{k=0}^{M-1} \cos\left[\frac{\pi}{M}(k+0.5)(n+0.5)\right] x(mM+k)$$
where x(·) is the input PCM sample. Other forms of the DCT can be used in place of the type-IV DCT.
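The type-IV DCT above can be sketched directly from its definition. This is an O(M²) illustration, not an optimized implementation; it assumes the sqrt(2/M) scaling, under which the transform is orthonormal and is its own inverse.

```python
import math

def dct_iv(block):
    """Orthonormal DCT-IV of one block of length M (direct O(M^2) form)."""
    M = len(block)
    scale = math.sqrt(2.0 / M)
    return [
        scale * sum(
            math.cos(math.pi / M * (k + 0.5) * (n + 0.5)) * block[k]
            for k in range(M)
        )
        for n in range(M)
    ]

def dct_iv_analysis(x, M):
    """s[m][n]: coefficient n of the m-th block x(mM + k), k = 0..M-1."""
    if len(x) % M:
        raise ValueError("len(x) must be a multiple of M")
    return [dct_iv(x[m * M:(m + 1) * M]) for m in range(len(x) // M)]
```

Because the orthonormal DCT-IV matrix is symmetric and orthogonal, applying dct_iv twice returns the original block, and the block energy is preserved.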
Because the DCT tends to cause blocking artifacts, a preferred embodiment of the invention uses the modified DCT (MDCT):
$$s(m,n) = \sqrt{\frac{2}{M}} \sum_{k=0}^{2M-1} \cos\left[\frac{\pi}{M}\left(k+0.5+\frac{M}{2}\right)(n+0.5)\right] w(k)\, x(mM-M+k)$$
where w(·) is a window function.
To guarantee perfect reconstruction, the window function must be power-symmetric within each half-window:
$$w^2(k) + w^2(M-1-k) = 1, \quad k = 0, \ldots, M-1$$
$$w^2(k+M) + w^2(2M-1-k) = 1, \quad k = 0, \ldots, M-1$$
Although any window satisfying the above conditions can be used, only the following sine window
$$w(k) = \sin\left[\frac{(k+0.5)\pi}{2M}\right], \quad k = 0, \ldots, 2M-1$$
has the desirable property that the DC component of the input signal is concentrated in the first transform coefficient.
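The following sketch implements the MDCT formula with the sine window and checks perfect reconstruction by overlap-adding the windowed inverse transforms (time-domain aliasing cancellation). It is a direct O(M²) illustration; treating samples outside the signal as zero and requiring len(x) to be a multiple of M are assumptions of this sketch.

```python
import math

def sine_window(M):
    return [math.sin((k + 0.5) * math.pi / (2 * M)) for k in range(2 * M)]

def mdct_block(x, m, M, w):
    """s(m, n), n = 0..M-1, from x(mM - M + k), k = 0..2M-1 (zero outside x)."""
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0.0
    scale = math.sqrt(2.0 / M)
    return [
        scale * sum(
            math.cos(math.pi / M * (k + 0.5 + M / 2) * (n + 0.5))
            * w[k] * sample(m * M - M + k)
            for k in range(2 * M)
        )
        for n in range(M)
    ]

def imdct_block(coeffs, M, w):
    """Windowed inverse MDCT: 2M samples to be overlap-added."""
    scale = math.sqrt(2.0 / M)
    return [
        scale * w[k] * sum(
            math.cos(math.pi / M * (k + 0.5 + M / 2) * (n + 0.5)) * coeffs[n]
            for n in range(M)
        )
        for k in range(2 * M)
    ]

def mdct_roundtrip(x, M):
    """Analyze and resynthesize x; every sample is covered by two blocks."""
    if len(x) % M:
        raise ValueError("len(x) must be a multiple of M")
    w = sine_window(M)
    n_blocks = len(x) // M + 1
    y = [0.0] * ((n_blocks + 1) * M)
    for m in range(n_blocks):
        out = imdct_block(mdct_block(x, m, M, w), M, w)
        start = m * M - M  # position of k = 0 of block m in x
        for k in range(2 * M):
            if 0 <= start + k < len(y):
                y[start + k] += out[k]
    return y[:len(x)]
```

The sine window satisfies both half-window conditions, so the aliasing terms of neighboring blocks cancel and the round trip reproduces x up to rounding error.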
To maintain perfect reconstruction when the MDCT switches between high and low frequency resolution modes, i.e., between long and short windows, the overlapping parts of the long and short windows must have the same shape.
Depending on the transient characteristics of the input PCM samples, the encoder can select a long window (first window 61 in Fig. 5), switch to a series of short windows (fourth window 64 in Fig. 5), and switch back. The long-to-short transition long window 62 and the short-to-long transition long window 63 in Fig. 5 are needed for this kind of bridging. When two transients are close together, but not close enough to justify continuous use of short windows, the short-to-short transition long window 65 in Fig. 5 is useful. The encoder must send the window type used for each frame to the decoder, so that the same windows are used to reconstruct the PCM samples.
The advantage of the short-to-short transition long window is that it can handle transients that are only one frame apart. As shown at the top 67 of Fig. 17, the prior-art MDCT can only handle transients that are at least two frames apart. As shown at the bottom 68 of Fig. 17, the short-to-short transition long window reduces this to one frame.
The invention then performs transient segmentation 25. The transient segments can be represented by a binary function, or by segment boundaries, that indicates the transient positions through changes of the function value from 0 to 1 or from 1 to 0. For example, the quasi-stationary segments in Fig. 6 can be represented as
$$T(n) = \begin{cases} 0, & n = 0,1,2,3,4 \\ 1, & n = 5,6,7,8,9 \\ 0, & n = 10,11,\ldots,16 \end{cases}$$
Note that T(n) = 0 does not necessarily mean that the energy of the audio signal at time index n is high, and vice versa. In the following discussion, T(n) is called the "transient segmentation function". The information it carries must be conveyed to the decoder directly or indirectly. Run-length coding of the runs of zeros and ones is an efficient choice: for the example above, T(n) can be sent to the decoder as the run lengths 5, 5 and 7. The run lengths can also be entropy coded.
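A minimal sketch of the run-length coding of T(n) described above. It assumes, as in the example, that the first run has the value 0, so the decoder only needs the run lengths; that convention is an assumption of this sketch.

```python
def run_lengths(T):
    """Lengths of the runs of equal values in a binary segmentation function."""
    runs = []
    for i, v in enumerate(T):
        if i > 0 and v == T[i - 1]:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs

def from_run_lengths(runs, first_value=0):
    """Rebuild T(n); ASSUMPTION: the first run has value first_value and runs alternate."""
    T, v = [], first_value
    for r in runs:
        T.extend([v] * r)
        v = 1 - v
    return T
```

For the Fig. 6 example, run_lengths yields [5, 5, 7].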
Transient segmentation unit 25 can be implemented with any known transient segmentation method. In one embodiment of the invention, transient segmentation is accomplished by simply thresholding the transient detection distance:
[Formula shown as an image in the original; not recoverable from the text.]
The threshold can be set to
$$\text{Threshold} = k \cdot \frac{E_{\max} + E_{\min}}{2}$$
where k is an adjustable constant.
A more advanced embodiment of the invention is based on the k-means clustering algorithm and comprises the following steps:
1) Initialize the transient segmentation function T(n) with the result of the thresholding method above.
2) Compute the centroid of each class:
[Formula shown as an image in the original; not recoverable from the text.]
for the class associated with T(n) = 0;
[Formula shown as an image in the original; not recoverable from the text.]
for the class associated with T(n) = 1.
3) Reassign the transient segmentation function T(n) according to the following rule:
[Formula shown as an image in the original; not recoverable from the text.]
4) Return to step 2.
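Because the thresholding rule, the centroid formulas and the reassignment rule are only available as images, the sketch below fills them in with textbook one-dimensional k-means choices, all of which are assumptions: samples above the threshold are labeled 1, each centroid is the mean of E(n) over its class, each n is reassigned to the nearer centroid, and iteration stops when T(n) no longer changes.

```python
def threshold_segmentation(E, k=1.0):
    e_max, e_min = max(E), min(E)
    threshold = k * (e_max + e_min) / 2
    # ASSUMPTION: distances above the threshold are labeled 1
    return [1 if e > threshold else 0 for e in E]

def kmeans_segmentation(E, k=1.0, max_iter=100):
    T = threshold_segmentation(E, k)  # step 1: initialize from thresholding
    for _ in range(max_iter):
        class0 = [e for e, t in zip(E, T) if t == 0]
        class1 = [e for e, t in zip(E, T) if t == 1]
        if not class0 or not class1:
            break  # only one class: nothing to refine
        # step 2 (ASSUMPTION): centroid = mean distance of each class
        c0 = sum(class0) / len(class0)
        c1 = sum(class1) / len(class1)
        # step 3 (ASSUMPTION): assign each n to the nearer centroid
        new_T = [0 if abs(e - c0) <= abs(e - c1) else 1 for e in E]
        if new_T == T:
            break  # converged
        T = new_T  # step 4: repeat from step 2
    return T
```

With a poorly chosen k, the initial thresholding can mislabel low-energy samples; the k-means iterations move them back to the correct class.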
The arbitrary-resolution analysis filter bank 26 is essentially a transform, such as a DCT, whose block length equals the number of samples in each subband segment. Suppose each subband contains 32 subband samples in a frame and they are segmented as (9, 3, 20); then three transforms with block lengths 9, 3 and 20 are applied to the subband samples of the three subband segments, respectively. In the following discussion, the term "subband segment" refers to the subband samples of one transient segment within a subband. Using the type-IV DCT, the transform of the last segment of (9, 3, 20) in the m-th subband can be written as
$$u(m,n) = \sqrt{\frac{2}{20}} \sum_{k=0}^{20-1} \cos\left[\frac{\pi}{20}(k+0.5)(n+0.5)\right] s(m, 12+k)$$
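A minimal sketch of the arbitrary-resolution filter bank for one subband: one orthonormal DCT-IV per subband segment, with block lengths taken from the segmentation. The function names are hypothetical; the last segment of (9, 3, 20) starts at sample 12, matching s(m, 12 + k) in the formula above.

```python
import math

def dct_iv(block):
    """Orthonormal DCT-IV, block length = number of samples in the segment."""
    M = len(block)
    scale = math.sqrt(2.0 / M)
    return [
        scale * sum(math.cos(math.pi / M * (k + 0.5) * (n + 0.5)) * block[k]
                    for k in range(M))
        for n in range(M)
    ]

def arbitrary_resolution_transform(subband, segment_lengths):
    """Apply one DCT-IV to each subband segment of a single subband."""
    if sum(segment_lengths) != len(subband):
        raise ValueError("segment lengths must cover the subband exactly")
    out, start = [], 0
    for length in segment_lengths:
        out.append(dct_iv(subband[start:start + length]))
        start += length
    return out
```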
This transform increases the frequency resolution within each transient segment, so a good coding gain can be expected. In many cases, however, the coding gain is less than 1 or too small; it is then better to discard the transform results and notify the decoder of this decision through side information. Because of the overhead associated with side information, the overall coding gain can be improved by making the decision to discard the transform results per group of subband segments, i.e., by transmitting one bit for a group of subband segments rather than one bit for each subband segment.
In the following discussion, the term "quantization unit" refers to a group of contiguous subband samples that belong to the same psychoacoustic critical band and the same transient segment. A quantization unit is a good grouping of subband segments for the decision above. If it is used, the total coding gain is computed over all subband segments in the quantization unit. If the coding gain is greater than 1 or some other, higher threshold, the transform results are kept for all subband segments in the quantization unit; otherwise, they are discarded. Only one bit then needs to be sent to the decoder to convey this decision for all subband segments in the quantization unit.
Switchable filter bank + ADPCM
As shown in Fig. 4, this is basically the same as Fig. 3, except that the arbitrary-resolution analysis filter bank 26 is replaced by ADPCM 29. The decision whether to use ADPCM is again made per group of subband segments, such as a quantization unit, to reduce the side-information cost. Such a group of subband segments can even share one set of prediction coefficients. Known methods for quantizing prediction coefficients can be used here, including LAR (log area ratio), IS (inverse sine) and LSP (line spectral pairs).
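The sketch below is a simplified stand-in for ADPCM 29, not the patent's scheme: a first-order predictor is estimated once per group of subband segments (so the group shares one coefficient, as the text allows) and the prediction residual is quantized in a closed loop. The least-squares predictor, the uniform residual quantizer and the unquantized coefficient are all assumptions; a real coder would also quantize the coefficient, e.g. with LAR, IS or LSP.

```python
def lpc1_coefficient(samples):
    """Least-squares first-order predictor shared by a group of subband segments."""
    num = sum(samples[i] * samples[i - 1] for i in range(1, len(samples)))
    den = sum(samples[i - 1] ** 2 for i in range(1, len(samples)))
    return num / den if den else 0.0

def dpcm_encode(samples, a, step):
    """Closed-loop DPCM: predict from the decoder-side reconstruction."""
    indices, prev = [], 0.0
    for x in samples:
        pred = a * prev
        q = round((x - pred) / step)
        indices.append(q)
        prev = pred + q * step  # same value the decoder will compute
    return indices

def dpcm_decode(indices, a, step):
    out, prev = [], 0.0
    for q in indices:
        prev = a * prev + q * step
        out.append(prev)
    return out
```

Because the encoder predicts from its own reconstruction, encoder and decoder stay in step, and every reconstructed sample is within half a step of the input.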
Three-mode switchable filter bank
Unlike a conventional switchable filter bank with only high and low resolution modes, this filter bank can switch its operation among high, medium and low resolution modes. The high and low frequency resolution modes are used for stationary and transient frames, respectively, following the same principles as a two-mode switchable filter bank. The main purpose of the medium resolution mode is to provide better frequency resolution for the stationary segments within a transient frame. Thus, in a transient frame, the low frequency resolution mode is used for the transient segments and the medium resolution mode for the rest of the frame. Unlike the prior art, the switchable filter bank of the invention therefore operates in two resolution modes within a single frame of audio data. The medium resolution mode can also be used to process frames with smooth transients.
In the following discussion, the term "long block" refers to the block of samples the filter bank outputs at each time instant in the high frequency resolution mode; "medium block" refers to the block of samples output at each time instant in the medium frequency resolution mode; and "short block" refers to the block of samples output at each time instant in the low frequency resolution mode. With these definitions, the three kinds of frames can be described as follows:
● Stationary frames, which the filter bank processes in the high frequency resolution mode; each such frame usually contains one or more long blocks;
● Frames with transients, which the filter bank processes in the high and medium time resolution modes; each such frame contains several medium blocks and several short blocks, and the total number of samples in all short blocks equals the number of samples in one medium block;
● Frames with smooth transients, which the filter bank processes in the medium resolution mode; each such frame contains several medium blocks.
The advantage of this new method is illustrated in Fig. 8. Fig. 8 is essentially the same as Fig. 7, except that many of the segments (141, 142 and 143) processed in the low frequency resolution mode in Fig. 7 are now processed in the medium frequency resolution mode. Because these segments are stationary, the medium frequency resolution mode clearly matches them better than the low frequency resolution mode, so a higher coding gain can be expected.
One embodiment of the invention uses a DCT triplet with small, medium and large block lengths, corresponding to the low, medium and high frequency resolution modes, respectively.
A preferred embodiment of the invention, free of blocking artifacts, uses an MDCT triplet with small, medium and large block lengths. Because of the medium resolution mode, the window types shown in Fig. 9 are provided in addition to those in Fig. 5. These windows are described as follows:
● Medium window 151;
● Long-to-medium transition long window 152: a long window that bridges the transition from a long window to a medium window.
● Medium-to-long transition long window 153: a long window that bridges the transition from a medium window to a long window.
● Medium-to-medium transition long window 154: a long window that bridges the transition from one medium window to another medium window.
● Medium-to-short transition medium window 155: a medium window that bridges the transition from a medium window to a short window.
● Short-to-medium transition medium window 156: a medium window that bridges the transition from a short window to a medium window.
● Medium-to-short transition long window 157: a long window that bridges the transition from a medium window to a short window.
● Short-to-medium transition long window 158: a long window that bridges the transition from a short window to a medium window.
Note: like the short-to-short transition long window 65 in Fig. 5, the medium-to-medium transition long window 154, the medium-to-short transition long window 157 and the short-to-medium transition long window 158 allow the three-mode MDCT to handle transients that are only one frame apart.
Fig. 10 shows some examples of window sequences. Sequence 161 illustrates the ability of these embodiments to handle slow transients with medium resolution 167, while sequences 162 to 166 illustrate the ability to assign fine time resolution 168 to transients, medium time resolution 169 to the stationary segments in the same frame, and high frequency resolution 170 to stationary frames.
Conventional sum/difference coding 14 can be applied here. For example, a simple method is:
Sum channel = 0.5 · (left channel + right channel)
Difference channel = 0.5 · (left channel − right channel)
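The sum/difference formulas above, together with their inverse (left = sum + difference, right = sum − difference), can be sketched as follows; the function names are illustrative only.

```python
def sum_difference_encode(left, right):
    """Sum = 0.5 (L + R), difference = 0.5 (L - R), sample by sample."""
    s = [0.5 * (l + r) for l, r in zip(left, right)]
    d = [0.5 * (l - r) for l, r in zip(left, right)]
    return s, d

def sum_difference_decode(s, d):
    """Inverse: L = S + D, R = S - D."""
    left = [a + b for a, b in zip(s, d)]
    right = [a - b for a, b in zip(s, d)]
    return left, right
```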
Conventional joint intensity coding 15 can be applied here. A simple method is:
● Replace the source channel with the sum of the source channel and the joint channel.
● Scale this sum to the same energy level as the original source channel in each quantization unit.
● Discard the subband samples of the joint channel in the quantization unit and send only the quantization index of the scale factor (called the "steering vector", or "scale factor" in the present invention) to the decoder. It is defined as:
[Formula shown as an image in the original; not recoverable from the text.]
Non-uniform quantization of the steering vector, such as logarithmic quantization, should be used to match the auditory properties of the human ear. Entropy coding can be applied to the quantization indices of the steering vector.
To avoid cancellation when the phase difference between the source and joint channels is close to 180 degrees, a polarity can be applied when they are summed to form the combined channel:
Sum channel = source channel + polarity · joint channel
The polarity must also be sent to the decoder.
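A minimal per-quantization-unit sketch of the three steps above and of the matching decoder operation (joint channel = polarity · steering vector · source channel). The steering-vector formula in the original is an image, so two details here are assumptions: the polarity is taken from the sign of the inner product of the two channels, and the steering vector is the RMS ratio of the joint channel to the transmitted combined channel. Quantization of the steering vector is omitted.

```python
import math

def energy(x):
    return sum(v * v for v in x)

def intensity_encode(source, joint):
    """Returns (combined channel, polarity, steering vector) for one quantization unit."""
    # ASSUMPTION: polarity follows the sign of the channels' inner product
    polarity = -1 if sum(a * b for a, b in zip(source, joint)) < 0 else 1
    combined = [a + polarity * b for a, b in zip(source, joint)]
    e_comb = energy(combined)
    gain = math.sqrt(energy(source) / e_comb) if e_comb else 0.0
    combined = [gain * v for v in combined]  # same energy as the original source
    e_comb = energy(combined)
    # ASSUMPTION: steering vector = RMS ratio of joint to combined channel
    steering = math.sqrt(energy(joint) / e_comb) if e_comb else 0.0
    return combined, polarity, steering

def intensity_decode(combined, polarity, steering):
    """Joint channel = polarity * steering vector * source channel."""
    return [polarity * steering * v for v in combined]
```

With these choices, the decoded joint channel has the energy of the original joint channel and, thanks to the polarity, the same overall sign relationship to the source channel.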
Psychoacoustic model 23 computes the masking threshold of the current input frame of audio samples based on the auditory properties of the human ear; quantization noise below the masking threshold is unlikely to be heard. Any conventional psychoacoustic model can be used here, but the invention requires the model to output one masking threshold for each quantization unit.
Global bit allocator 16 globally distributes the bits available to a frame among the quantization units so that the quantization noise power in each quantization unit is below its respective masking threshold. It controls the quantization noise power of each quantization unit by adjusting its quantization step size. All subband samples in a quantization unit are quantized with the same step size.
Any known bit allocation method can be used here. One such method is the well-known water-filling algorithm. Its basic idea is to find the quantization unit with the highest QNMR (quantization noise-to-mask ratio) and reduce the step size assigned to that unit to lower its quantization noise. This process is repeated until the QNMR of every quantization unit is less than 1 (or some other threshold), or until the bits available to the current frame are exhausted.
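The water-filling loop can be sketched as below. Two models in it are assumptions made only to keep the sketch self-contained: the quantization noise of a uniform quantizer is taken as step²/12, and the bit cost of a unit is estimated with the high-rate approximation 0.5·log2(signal power / noise power) bits per sample. The step-size grid of 2^(−1/4) per iteration is likewise a hypothetical choice.

```python
import math

def water_filling(variances, masks, sizes, bit_budget,
                  step_ratio=2 ** -0.25, max_iter=10000):
    """Greedy step-size allocation: always refine the unit with the highest QNMR.

    variances[u]: mean power of the subband samples in quantization unit u
    masks[u]:     masking threshold (allowed noise power) of unit u
    sizes[u]:     number of subband samples in unit u
    """
    def noise(step):
        return step * step / 12.0  # ASSUMPTION: uniform-quantizer noise model

    def bits(u, step):
        # ASSUMPTION: high-rate estimate, 0.5*log2(signal/noise) bits per sample
        if variances[u] <= 0:
            return 0.0
        return sizes[u] * max(0.0, 0.5 * math.log2(variances[u] / noise(step)))

    # Start coarse: noise equals signal power, so no bits are spent yet.
    steps = [math.sqrt(12.0 * v) if v > 0 else 1.0 for v in variances]
    used = sum(bits(u, steps[u]) for u in range(len(steps)))
    for _ in range(max_iter):
        qnmr = [noise(steps[u]) / masks[u] for u in range(len(steps))]
        u = max(range(len(steps)), key=lambda i: qnmr[i])
        if qnmr[u] < 1.0:
            break  # every unit's noise is below its mask
        new_step = steps[u] * step_ratio
        new_used = used - bits(u, steps[u]) + bits(u, new_step)
        if new_used > bit_budget:
            break  # bits for this frame are exhausted
        steps[u], used = new_step, new_used
    return steps, used
```

With a generous budget every unit ends below its mask; with a tight budget the loop stops early and some units remain above their masks.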
The quantization step size itself must be quantized so that it can be packed into the bitstream. Non-uniform quantization, such as logarithmic quantization, should be used to match the auditory properties of the human ear. Entropy coding can be applied to the quantization indices of the step sizes.
Using the step sizes provided by global bit allocation 16, the invention quantizes all subband samples in each quantization unit in 17. Any quantization scheme, linear or nonlinear, uniform or non-uniform, can be used here.
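A minimal sketch of logarithmic step-size quantization combined with uniform quantization of all samples in a quantization unit. The resolution of four step-size indices per octave (about 1.5 dB) and the mid-tread rounding quantizer are hypothetical choices; the text allows any linear or nonlinear, uniform or non-uniform scheme.

```python
import math

STEPS_PER_OCTAVE = 4  # ASSUMPTION: step sizes on a 2**(1/4) grid (about 1.5 dB)

def quantize_step(step):
    """Logarithmic quantization of a step size to an integer index."""
    return round(STEPS_PER_OCTAVE * math.log2(step))

def dequantize_step(index):
    return 2.0 ** (index / STEPS_PER_OCTAVE)

def quantize_unit(samples, step_index):
    """Uniform mid-tread quantization: one step size for the whole unit."""
    step = dequantize_step(step_index)
    return [round(x / step) for x in samples]

def dequantize_unit(indices, step_index):
    """Decoder side (inverse quantization): index times step size."""
    step = dequantize_step(step_index)
    return [q * step for q in indices]
```

The encoder quantizes with the dequantized step size, so the encoder and decoder use exactly the same step for each unit.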
Interleaving 18 is optionally invoked only when a transient is present in the current frame. Let x(m, n, k) be the k-th quantization index in the m-th quasi-stationary segment and the n-th subband. The quantization indices are normally arranged in the order (m, n, k). Interleaving unit 18 reorders them so that they are arranged as (n, m, k). The motivation is that this rearrangement can reduce the number of bits needed to encode the indices compared with leaving them uninterleaved. The decision whether to interleave must be sent to the decoder as side information.
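A minimal sketch of the reordering, assuming the indices are held as nested lists x[m][n][k], where segment m contributes the same number of indices to every subband. The helper names are hypothetical; deinterleave corresponds to the decoder's deinterleaving step 44.

```python
def natural_order(x):
    """Flatten x[m][n][k] in the usual (m, n, k) order."""
    return [v for m in range(len(x)) for n in range(len(x[m])) for v in x[m][n]]

def interleave(x):
    """Flatten x[m][n][k] in (n, m, k) order: subband outer, segment inner."""
    n_subbands = len(x[0])
    return [v for n in range(n_subbands) for m in range(len(x)) for v in x[m][n]]

def deinterleave(flat, segment_lengths, n_subbands):
    """Rebuild x[m][n][k] from the interleaved order."""
    x = [[None] * n_subbands for _ in segment_lengths]
    pos = 0
    for n in range(n_subbands):
        for m, length in enumerate(segment_lengths):
            x[m][n] = flat[pos:pos + length]
            pos += length
    return x
```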
In earlier audio coding algorithms, the application range of an entropy codebook is the same as a quantization unit, so the entropy codebook is determined by the quantization indices within that quantization unit (see the top of Fig. 11). This leaves no room for optimization.
The invention is entirely different in this respect. It ignores the existence of quantization units when selecting codebooks. Instead, it assigns the best codebook to each quantization index, in effect converting quantization indices into codebook indices. It then groups these codebook indices into larger segments, whose boundaries define the codebook application ranges. These codebook application ranges clearly differ greatly from the ranges determined by quantization units. They are based solely on the properties of the quantization indices, so the selected codebooks fit the quantization indices better. As a result, fewer bits are needed to send the quantization indices to the decoder.
The advantage of this method over the prior art is illustrated in Fig. 11. Consider the largest quantization index in the figure. It belongs to quantization unit d, and the previous method would select a large codebook for it. That codebook is clearly not optimal, because most of the indices in quantization unit d are much smaller. With the new method, by contrast, the same quantization index is placed in segment C, where it shares a codebook with other large quantization indices. In addition, all quantization indices in segment D are very small, so a small codebook is selected for them. Therefore, fewer bits are needed to encode the quantization indices.
Referring now to Fig. 12, prior-art systems only need to send the codebook indices to the decoder as side information, because their application ranges coincide with the predetermined quantization units. The method of the invention, however, must also send the codebook application ranges to the decoder as side information, because they are independent of the quantization units. If handled improperly, this overhead may end up costing more bits for the side information and quantization indices as a whole. Grouping the codebook indices into large segments is therefore critical to controlling this overhead, because large segments mean that fewer codebook indices and application ranges need to be sent to the decoder.
One embodiment of the invention implements this new codebook selection scheme with the following steps:
1) Block the quantization indices into granules, each containing P quantization indices.
2) Determine the maximum codebook requirement of each granule. For a symmetric quantizer, this is usually given by the largest absolute quantization index in the granule:
$$\max_{k \in \text{granule}} |I(k)|$$
where I(·) is the quantization index;
3) Assign to each granule the smallest codebook that can accommodate its maximum codebook requirement:
[Formula shown as an image in the original; not recoverable from the text.]
4) Eliminate isolated pockets of codebook indices that are smaller than their neighbors by raising them to the smaller of their neighbors' codebook indices. This is illustrated in Fig. 12 by the mappings 71 to 72, 73 to 74, 77 to 78 and 79 to 80. Notably, isolated pockets whose codebook index corresponds to zero quantization indices can be excluded from this step, because that codebook indicates that no codes are transmitted. This is illustrated by the mapping 75 to 76 in Fig. 12. This step significantly reduces the number of codebook indices, and of their application ranges, that must be sent to the decoder.
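The four steps can be sketched as follows. The codebook table MAX_ABS is hypothetical (codebook c is assumed to cover indices with |q| ≤ MAX_ABS[c], codebook 0 meaning all zeros), and the largest codebook is assumed to handle anything bigger through recursive indexing. A "pocket" is taken to be a maximal run of equal codebook indices with a larger neighbor on both sides; runs at the edges are not treated as pockets, and elimination is repeated until no pocket remains. These are interpretations, not details given in the text.

```python
# ASSUMPTION: hypothetical codebook table; codebook c covers |q| <= MAX_ABS[c].
MAX_ABS = (0, 1, 2, 4, 8, 16, 32)

def granule_codebooks(indices, P):
    """Steps 1-3: smallest codebook covering the peak |index| of each granule."""
    cbs = []
    for start in range(0, len(indices), P):
        peak = max(abs(q) for q in indices[start:start + P])
        # the last codebook is assumed to cover larger values via recursive indexing
        cb = next((c for c, lim in enumerate(MAX_ABS) if lim >= peak),
                  len(MAX_ABS) - 1)
        cbs.append(cb)
    return cbs

def _runs(cbs):
    runs = []
    for cb in cbs:
        if runs and runs[-1][0] == cb:
            runs[-1][1] += 1
        else:
            runs.append([cb, 1])
    return runs

def eliminate_pockets(cbs):
    """Step 4: raise isolated pockets to the smaller neighboring codebook index."""
    runs = _runs(cbs)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(runs) - 1):
            cb, left, right = runs[i][0], runs[i - 1][0], runs[i + 1][0]
            # zero-codebook pockets are excluded: they transmit no codes
            if cb != 0 and cb < left and cb < right:
                runs[i][0] = min(left, right)
                runs = _runs([c for c, n in runs for _ in range(n)])
                changed = True
                break
    return [c for c, n in runs for _ in range(n)]
```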
One embodiment of the invention encodes the codebook application ranges with run-length codes, and the run-length codes can further be entropy coded.
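A minimal sketch of turning per-granule codebook indices into (codebook index, application range) pairs, with each range expressed as a run length in granules. Measuring ranges in granules rather than in quantization indices is one of the two options the bitstream description mentions; the function names are illustrative.

```python
def to_application_ranges(granule_cbs):
    """Collapse per-granule codebook indices into (codebook, run length) pairs."""
    pairs = []
    for cb in granule_cbs:
        if pairs and pairs[-1][0] == cb:
            pairs[-1] = (cb, pairs[-1][1] + 1)
        else:
            pairs.append((cb, 1))
    return pairs

def from_application_ranges(pairs):
    """Decoder side: expand the pairs back into one codebook index per granule."""
    return [cb for cb, run in pairs for _ in range(run)]
```

The number of pairs is the number of codebook indices and application ranges that have to be transmitted, which is why eliminating pockets beforehand pays off.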
All quantization indices are encoded in 20 using the codebooks determined by entropy codebook selector 19, each within its own application range.
Entropy coding can be implemented with various Huffman codebooks. When the number of quantization levels in a codebook is small, several quantization indices are blocked together to form a large Huffman codebook. When the number of quantization levels is too large (e.g., more than 200), recursive indexing is used instead. A large quantization index q is then represented as
$$q = m \cdot M + r$$
where M is the modulus, m is the quotient and r is the remainder. Only m and r need to be sent to the decoder; either or both of them can be Huffman coded.
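Both techniques can be sketched briefly. For blocking, this sketch assumes a symmetric codebook with an odd number of levels, so that a group of indices in [−(levels//2), levels//2] maps to one base-`levels` symbol; for the split, it uses divmod, so the remainder satisfies 0 ≤ r < M. The Huffman tables themselves are not shown, and the function names are hypothetical.

```python
def block_indices(indices, levels):
    """Combine small indices into one symbol (base-`levels` number)."""
    offset = levels // 2
    symbol = 0
    for q in indices:
        if abs(q) > offset:
            raise ValueError("index outside the codebook range")
        symbol = symbol * levels + (q + offset)
    return symbol

def unblock_indices(symbol, levels, count):
    offset = levels // 2
    out = []
    for _ in range(count):
        symbol, digit = divmod(symbol, levels)
        out.append(digit - offset)
    return out[::-1]

def split_large_index(q, M):
    """q = m*M + r with 0 <= r < M; m and r are then coded separately."""
    return divmod(q, M)

def join_large_index(m, r, M):
    return m * M + r
```

For example, four indices from a 3-level codebook form one symbol out of 3⁴ = 81, and the index 437 with modulus 64 splits into quotient 6 and remainder 53.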
Entropy coding can also be implemented with various arithmetic codebooks. When the number of quantization levels is too large (e.g., more than 200), recursive indexing is used as well.
Other types of entropy coding can also be used in place of the Huffman and arithmetic coding above.
Packing all or some of the quantization indices directly, without entropy coding, is also a good option.
Because the statistical properties of the quantization indices differ markedly between the low and high resolution modes of the variable-resolution filter bank, one embodiment of the invention uses two entropy codebook libraries to encode the quantization indices in these two modes, respectively. A third library can be used for the medium resolution mode, or that mode can share a library with the high or low resolution mode.
The invention multiplexes 21 all quantization indices and other side information into a complete bitstream. The side information includes the quantization step sizes, sample rate, speaker configuration, frame length, lengths of the quasi-stationary segments, entropy codebook codes, etc. Other side information, such as time codes, can also be packed into the bitstream.
Prior-art systems need to send the number of quantization units of each transient segment to the decoder, because the quantization step sizes, the codebooks of the quantization indices, and the unpacking of the quantization indices themselves all depend on it. In the invention, however, the particular method of entropy codebook selection 19 decouples the selection of quantization index codebooks and their application ranges from the quantization units. The bitstream can therefore be structured so that the quantization indices are unpacked before the number of quantization units is needed. Once the quantization indices are unpacked, they can be used to reconstruct the number of quantization units. This is explained in the decoder section.
With these considerations in mind, when the hybrid filter bank or the switchable filter bank + ADPCM is used, one embodiment of the invention uses the bitstream structure shown in Fig. 16, which essentially comprises the following parts:
● Sync word 81: indicates the beginning of an audio data frame;
● Frame header 82: contains information about the audio signal, such as sample rate, number of normal channels, number of LFE (low-frequency effects) channels, speaker configuration, etc.;
● Channels 1, 2, ..., N (83, 84, 85): all audio data of each channel are packed here;
● Auxiliary data 86: contains auxiliary data such as time codes;
● Error detection 87: an error-detection code is inserted here to detect errors in the current frame, so that error handling can be started when a bitstream error is detected;
The audio data of each channel are in turn structured as follows:
● Window type 90: indicates which of the windows shown in Fig. 5 was used by the encoder, so that the decoder can use the same window;
● Transient position 91: present only for frames with transients; indicates the position of each transient segment. If run-length codes are used, this is where the length of each transient segment is packed;
● Interleaving decision 92: one bit, present only in frames with transients, indicating whether the quantization indices of each transient segment were interleaved, so that the decoder knows whether to deinterleave them;
● Codebook indices and application ranges 93: convey all information about the entropy codebooks and their application ranges for the quantization indices. This part comprises:
○ Number of codebooks 101: the number of entropy codebooks for each transient segment of the current channel;
○ Application ranges 102: the application range of each entropy codebook, expressed in quantization indices or granules; these can further be entropy coded;
○ Codebook indices 103: the index of each entropy codebook; these can further be entropy coded;
● Quantization indices 94: the entropy codes of all quantization indices of the current channel;
● Quantization step sizes 95: the index of the quantization step size of each quantization unit, which can also be entropy coded. As explained earlier, the number of step-size indices, i.e., the number of quantization units, is reconstructed by the decoder from the quantization indices in 49;
● Arbitrary-resolution filter bank decision 96: one bit per quantization unit, present only when the switchable-resolution analysis filter bank 28 is in the low frequency resolution mode. It tells the decoder whether to perform arbitrary-resolution filter bank reconstruction (51 or 55) on all subband segments in the quantization unit;
● Sum/difference coding decision 97: one bit per quantization unit coded with sum/difference coding. It is optional and present only when sum/difference coding is used; it tells the decoder whether to perform sum/difference decoding 47;
● Joint intensity coding decisions and steering vectors 98: convey whether the decoder should perform joint intensity decoding. This part is optional, applies only to quantization units of joint channels coded with joint intensity coding, and is present only when the encoder uses joint intensity coding. It comprises:
○ Decision 121: one bit per joint quantization unit, telling the decoder whether to perform joint channel decoding on the subband samples in the quantization unit;
○ Polarity 122: one bit per joint quantization unit, giving the polarity of the joint channel relative to the source channel:
[Mapping shown as an image in the original; not recoverable from the text.]
○ Steering vector 123: one scale factor per joint quantization unit, which can be entropy coded;
● Auxiliary data 99: contains side information such as dynamic range control.
When the three-mode switchable filter bank is used, the bitstream structure is essentially the same as above, except for the following:
● Window type 90: indicates which of the windows shown in Fig. 5 and Fig. 9 was used by the encoder, so that the decoder can use the same window. For frames with transients, this window type refers only to the last window in the frame, because the remaining windows can be inferred from this window type, the transient positions and the last window of the previous frame;
● Transient position 91: present only for frames with transients. It first indicates whether the frame has a slow transient 171. If not, it then indicates the transient positions, first in units of medium blocks 172 and then in units of short blocks 173;
● Arbitrary-resolution filter bank decision 96: not applicable, and therefore not used.
Decoder
Decoder of the present invention has been realized the contrary processing of encoder basically, and it is illustrated in Figure 13 and is explained as follows.
Demultiplexer 41 decodes the quantization indices from the bitstream, together with side information such as quantization step sizes, sample rate, speaker configuration and time codes. When prefix entropy codes such as Huffman codes are used, this step is combined with entropy decoding into a single step.
Quantization index codebook decoder 42 decodes the entropy codebooks of the quantization indices and their respective application ranges from the bitstream.
The entropy decoder 43 decodes the quantization indices from the bitstream using the entropy codebooks and application ranges provided by the quantization index codebook decoder 42.
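The range-by-range switching of codebooks can be illustrated with a small prefix-code decoder. This is a minimal sketch, not the codebooks or bitstream syntax of the invention: the codebooks are hypothetical dictionaries from bit strings to index values, an empty codebook is assumed to stand for an all-zero range that costs no bits, and each application range is assumed to be given as a `(codebook_id, count)` pair.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix-free codebooks: codeword (bit string) -> quantization index.
# An empty codebook stands for an all-zero range that costs no bits (assumption).
Codebook = Dict[str, int]


def decode_indices(bits: str,
                   ranges: List[Tuple[int, int]],
                   codebooks: Dict[int, Codebook]) -> List[int]:
    """Decode quantization indices range by range.

    ranges holds (codebook_id, count) pairs in bitstream order; each pair says
    that the next `count` indices were coded with codebook `codebook_id`.
    """
    indices: List[int] = []
    pos = 0
    for cb_id, count in ranges:
        book = codebooks[cb_id]
        if not book:
            indices.extend([0] * count)
            continue
        for _ in range(count):
            word = ""
            # Extend the candidate codeword bit by bit; the prefix-free property
            # guarantees the first match is the intended codeword.
            while word not in book:
                if pos >= len(bits):
                    raise ValueError("bitstream ended in the middle of a codeword")
                word += bits[pos]
                pos += 1
            indices.append(book[word])
    return indices
```

For example, with `codebooks = {0: {}, 1: {"0": 0, "10": 1, "11": -1}}`, decoding `"1001110"` over the ranges `[(1, 3), (0, 2), (1, 1)]` yields `[1, 0, -1, 0, 0, 1]`; the zero range consumes no bits.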
Deinterleaving 44 is applied only when a transient is present in the current frame. If the decision bit unpacked from the bitstream indicates that interleaving 18 was invoked in the encoder, the quantization indices are deinterleaved; otherwise, they are passed through unmodified.
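The conditional deinterleaving step can be sketched as follows. The interleave layout here is an assumption made for illustration: the encoder is taken to have written the short blocks of a transient segment in frequency-major order, so that entry `k * num_blocks + b` holds subband `k` of block `b`. The actual layout of interleaving 18 is defined elsewhere in the patent and may differ.

```python
from typing import List


def deinterleave(indices: List[int], num_blocks: int) -> List[int]:
    """Invert a frequency-major interleave.

    Assumed encoder layout: interleaved[k * num_blocks + b] == block b, subband k.
    Output is block-major: all subbands of block 0, then block 1, and so on.
    """
    if num_blocks <= 0 or len(indices) % num_blocks:
        raise ValueError("length must be a positive multiple of num_blocks")
    per_block = len(indices) // num_blocks
    return [indices[k * num_blocks + b]
            for b in range(num_blocks)
            for k in range(per_block)]


def maybe_deinterleave(indices: List[int], num_blocks: int,
                       frame_has_transient: bool, interleave_bit: bool) -> List[int]:
    """Deinterleave only for transient frames whose decision bit is set."""
    if frame_has_transient and interleave_bit:
        return deinterleave(indices, num_blocks)
    return list(indices)  # pass through unmodified
```

Under this assumed layout, two blocks `[1, 2, 3]` and `[4, 5, 6]` arrive as `[1, 4, 2, 5, 3, 6]` and are restored to `[1, 2, 3, 4, 5, 6]`.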
The number of quantization units in each transient segment is reconstructed (49) from the non-zero quantization indices. Let $q(m, n)$ be the quantization index of the $n$-th subband in the $m$-th transient segment (if the frame contains no transient, there is only one transient segment). The highest subband with a non-zero quantization index in each transient segment $m$ is found as:
$$\mathrm{Band}_{\max}(m) = \max_{n}\{\, n \mid q(m, n) \neq 0 \,\}$$
Recall that quantization units are defined by critical bands in frequency and by transient segments in time, so the number of quantization units in each transient segment is the smallest number of critical bands that accommodates $\mathrm{Band}_{\max}(m)$. Letting $\mathrm{Band}(Cb)$ denote the highest subband of the $Cb$-th critical band, the number of quantization units in transient segment $m$ is:
$$N(m) = \min_{Cb}\{\, Cb \mid \mathrm{Band}(Cb) \geq \mathrm{Band}_{\max}(m) \,\}$$
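The two formulas translate directly into a short routine. In this sketch, subbands are 0-indexed, critical bands are 1-indexed as $Cb$, and `band_top` is a hypothetical table in which `band_top[cb - 1]` is $\mathrm{Band}(Cb)$. Returning zero units for a segment whose indices are all zero is an assumption, since the formula leaves $\mathrm{Band}_{\max}(m)$ undefined in that case.

```python
from typing import Optional, Sequence


def band_max(q_segment: Sequence[int]) -> Optional[int]:
    """Band_max(m): highest subband n with q(m, n) != 0, or None if all are zero."""
    nonzero = [n for n, q in enumerate(q_segment) if q != 0]
    return max(nonzero) if nonzero else None


def num_quant_units(q_segment: Sequence[int], band_top: Sequence[int]) -> int:
    """N(m): smallest critical-band count Cb with Band(Cb) >= Band_max(m).

    band_top[cb - 1] is Band(cb), the highest subband of critical band cb (1-based).
    """
    highest = band_max(q_segment)
    if highest is None:
        return 0  # assumption: a segment with no non-zero index carries no units
    for cb, top in enumerate(band_top, start=1):
        if top >= highest:
            return cb
    raise ValueError("Band_max(m) lies above the last critical band")
```

With the hypothetical table `band_top = [3, 7, 15, 31]` and a segment whose last non-zero index sits in subband 6, $\mathrm{Band}_{\max}(m) = 6$ and $N(m) = 2$, because the second critical band (ending at subband 7) is the first one that reaches it.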
Quantization step size unpacking 50 unpacks the quantization step size of each quantization unit from the bitstream.
Inverse quantization 45 reconstructs the subband samples from the quantization indices using the quantization step size of each quantization unit.
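A minimal sketch of per-unit inverse quantization follows, assuming a uniform quantizer whose reconstruction value is simply index × step size. The quantizer actually used by the invention may apply a different reconstruction rule. Quantization unit boundaries are passed as a hypothetical `unit_edges` list, and subbands above the last unit are left at zero, consistent with their indices being zero.

```python
from typing import List, Sequence


def dequantize(indices: Sequence[int], unit_edges: Sequence[int],
               step_sizes: Sequence[float]) -> List[float]:
    """Rebuild subband samples as index * step size, one step per quantization unit.

    Quantization unit u covers subbands unit_edges[u] .. unit_edges[u + 1] - 1.
    Subbands above the last unit stay at zero.
    """
    if len(unit_edges) != len(step_sizes) + 1:
        raise ValueError("need one more edge than step sizes")
    samples = [0.0] * len(indices)
    for u, step in enumerate(step_sizes):
        for n in range(unit_edges[u], unit_edges[u + 1]):
            samples[n] = indices[n] * step
    return samples
```

For instance, indices `[2, -1, 0, 3]` with units `[0, 2, 4]` and step sizes `[0.5, 2.0]` reconstruct to `[1.0, -0.5, 0.0, 6.0]`.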
If the bitstream indicates that joint intensity coding 15 was invoked in the encoder, joint intensity decoding 46 copies the subband samples from the source channel and multiplies them by the polarity and the steering vector to reconstruct the subband samples of the joint channel:
Joint channel = polarity × steering vector × source channel
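Joint intensity decoding, applied per joint quantization unit, might look like the sketch below. Two details are assumptions: the mapping from the polarity bit to a sign (here bit 1 means negative), because the original mapping table did not survive, and the treatment of units whose decision bit is clear, which here simply keep the joint channel's own decoded samples.

```python
from typing import List, Sequence


def joint_intensity_decode(source: Sequence[float], joint: Sequence[float],
                           unit_edges: Sequence[int], decision: Sequence[int],
                           polarity_bit: Sequence[int],
                           steering: Sequence[float]) -> List[float]:
    """Rebuild jointly coded units of the joint channel from the source channel.

    For each quantization unit u with decision[u] set:
        joint[n] = polarity * steering[u] * source[n]
    Units without the decision bit keep their own decoded samples.
    """
    out = list(joint)
    for u in range(len(unit_edges) - 1):
        if not decision[u]:
            continue
        polarity = -1.0 if polarity_bit[u] else 1.0  # assumed bit-to-sign mapping
        for n in range(unit_edges[u], unit_edges[u + 1]):
            out[n] = polarity * steering[u] * source[n]
    return out
```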
If the bitstream indicates that sum/difference coding 14 was invoked in the encoder, the sum/difference decoder 47 reconstructs the left and right channels from the sum and difference channels. Corresponding to the sum/difference coding example explained under sum/difference coding 14, the left and right channels can be reconstructed as:
Left channel = sum channel + difference channel
Right channel = sum channel − difference channel
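The reconstruction above is an exact inverse of an encoder that forms sum = (L + R) / 2 and difference = (L − R) / 2, since (L + R)/2 + (L − R)/2 = L and (L + R)/2 − (L − R)/2 = R. The sketch below pairs the decoder formulas with that assumed encoder so that a round trip can be checked; the encoder's exact scaling is an assumption, not taken from this section.

```python
from typing import List, Sequence, Tuple


def sum_diff_decode(sum_ch: Sequence[float],
                    diff_ch: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Left = sum + difference, right = sum - difference, sample by sample."""
    if len(sum_ch) != len(diff_ch):
        raise ValueError("channels must have equal length")
    left = [s + d for s, d in zip(sum_ch, diff_ch)]
    right = [s - d for s, d in zip(sum_ch, diff_ch)]
    return left, right


def sum_diff_encode(left: Sequence[float],
                    right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed encoder counterpart: sum = (L + R) / 2, difference = (L - R) / 2."""
    sum_ch = [(lt + rt) / 2 for lt, rt in zip(left, right)]
    diff_ch = [(lt - rt) / 2 for lt, rt in zip(left, right)]
    return sum_ch, diff_ch
```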
The decoder of the present invention incorporates a variable-resolution synthesis filter bank 48, which is essentially the inverse of the analysis filter bank used to encode the signal.
If the three-mode switchable-resolution analysis filter bank was used in the encoder, the operation of the corresponding synthesis filter bank is uniquely determined, and the same window sequence must be used for synthesis.
If the hybrid filter bank or the switchable filter bank + ADPCM was used in the encoder, decoding proceeds as follows:
● If the bitstream indicates that the current frame was encoded by the switchable-resolution analysis filter bank 28 in high frequency resolution mode, the switchable-resolution synthesis filter bank 52 enters high frequency resolution mode and reconstructs the PCM samples from the subband samples (see Figure 14 and Figure 15).
● If the bitstream indicates that the current frame was encoded by the switchable-resolution analysis filter bank 28 in low frequency resolution mode, the subband samples are first sent to the arbitrary-resolution synthesis filter bank 51 (Figure 14) or to inverse ADPCM 55 (Figure 15), depending on which one was used in the encoder, which completes its respective synthesis processing. The switchable-resolution synthesis filter bank, in low frequency resolution mode 53, then reconstructs the PCM samples from these synthesized subband samples.
Synthesis filter banks 52, 51, and 55 are the inverses of analysis filter banks 28, 26, and 29, respectively. Their structure and operation are uniquely determined by the corresponding analysis filter banks. Therefore, whichever analysis filter bank is used in the encoder, its corresponding synthesis filter bank must be used in the decoder.
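The routing decisions described for the hybrid and switchable + ADPCM cases can be summarized as a small dispatch function. The filter banks themselves are not implemented here: `switchable_synthesis` and the entries of `inner_synthesis` are hypothetical callables standing in for banks 52 (with low-resolution mode 53), 51, and 55. Only the analysis-to-synthesis pairing is taken from the text.

```python
from typing import Callable, Dict, List

# Pairings stated in the text: analysis filter bank -> its inverse synthesis bank.
SYNTHESIS_FOR: Dict[int, int] = {28: 52, 26: 51, 29: 55}


def synthesize_frame(subbands: List,
                     high_freq_res: bool,
                     inner_analysis: int,
                     switchable_synthesis: Callable[[List, bool], List],
                     inner_synthesis: Dict[int, Callable[[List], List]]) -> List:
    """Route one frame through the synthesis stages.

    switchable_synthesis(x, high_res) stands in for bank 52 (its low-resolution
    mode is 53); inner_synthesis maps 51 and 55 to callables; inner_analysis is
    26 or 29, whichever the bitstream says the encoder used.
    """
    if high_freq_res:
        return switchable_synthesis(subbands, True)
    intermediate = inner_synthesis[SYNTHESIS_FOR[inner_analysis]](subbands)
    return switchable_synthesis(intermediate, False)
```

Passing the banks in as callables keeps the control flow separate from the filter-bank math, which is determined entirely by the analysis filter bank the encoder chose.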
Low coding delay pattern
When the high frequency resolution mode of the switchable-resolution analysis filter bank is disabled in the encoder, the frame length can be reduced to the block length of the switchable-resolution filter bank in low frequency resolution mode, or to an integer multiple of it. This yields a much shorter frame and therefore a much smaller delay for codec operation. This is the low coding delay mode of the present invention.
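A quick calculation shows how much the frame-buffering part of the delay shrinks. The block lengths below are hypothetical, chosen only for illustration: a 1024-sample frame versus a 128-sample low-resolution block at 48 kHz. The actual lengths of the invention are not given in this section.

```python
def frame_duration_ms(frame_len: int, sample_rate_hz: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz


def low_delay_frame_len(short_block_len: int, multiple: int = 1) -> int:
    """Low-delay frame length: an integer multiple of the low-resolution block length."""
    if multiple < 1:
        raise ValueError("multiple must be a positive integer")
    return short_block_len * multiple


# Hypothetical numbers: 1024-sample frame vs. 128-sample block at 48 kHz.
normal_ms = frame_duration_ms(1024, 48000)                      # about 21.33 ms
low_delay_ms = frame_duration_ms(low_delay_frame_len(128), 48000)  # about 2.67 ms
```

Under these assumed lengths, the frame-buffering component of the delay drops by a factor of eight.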
Although certain embodiments have been described in detail for purposes of illustration, various modifications may be made without departing from the scope and spirit of the invention. Accordingly, the invention is limited only by the appended claims.

Claims (8)

1. A codebook allocation method for audio signal coding, comprising:
receiving transformed subband samples of a signal;
quantizing the subband samples block by block into a plurality of quantization indices; and
allocating codebooks to groups of quantization indices based on local characteristics of the quantization indices, such that the codebook application ranges are independent of the quantization block boundaries.
2. The codebook allocation method of claim 1, wherein the codebooks are selected from a pre-designed codebook library.
3. The codebook allocation method of claim 1, further comprising:
encoding the codebook indices and their respective application ranges; and
creating a complete encoded data stream, including the encoded codebook indices and application ranges, for transmission.
4. The codebook allocation method of claim 3, wherein receiving the subband samples of the signal comprises:
dividing the input PCM samples into quasi-stationary frames; and
converting the PCM samples into subband samples using a variable-resolution analysis filter bank.
5. The codebook allocation method of claim 3 or 4, wherein the codebook allocation step comprises: converting the quantization indices into codebook indices by assigning to each quantization index the index of the smallest available codebook that can accommodate it, and segmenting the codebook indices into a plurality of application ranges by clustering.
6. The codebook allocation method of claim 5, wherein the codebook indices are segmented into large segments, and the segment boundaries are defined as the application ranges of the codebooks.
7. The codebook allocation method of claim 6, wherein the codebook indices and their respective application ranges comprise the number of codebooks, the application ranges, and the codebook indices.
8. The codebook allocation method of claim 1, wherein the codebook allocation comprises:
dividing the quantization indices into groups, each group containing a fixed number of quantization indices;
determining the maximum codebook requirement of each group;
assigning to each group the smallest codebook that can accommodate its maximum codebook requirement; and
eliminating isolated pockets of codebook indices that are smaller than their neighboring codebook indices.
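The allocation steps of claims 5, 6, and 8 can be sketched end to end. Everything concrete here is hypothetical: the codebook library `MAX_ABS` (codebook k codes any index with |q| ≤ MAX_ABS[k]), the `max_pocket_len` threshold, and the pocket-removal rule itself, which is one possible reading of "eliminating isolated pockets" (raise a short run to the smaller of its two larger neighbors). The claims do not specify these details.

```python
from typing import List, Tuple

# Hypothetical codebook library: codebook k codes any index with |q| <= MAX_ABS[k].
MAX_ABS = [0, 1, 2, 4, 8, 16, 32]


def smallest_codebook(q: int) -> int:
    """Claim 5: index of the smallest codebook that accommodates q."""
    for k, limit in enumerate(MAX_ABS):
        if abs(q) <= limit:
            return k
    raise ValueError(f"index {q} exceeds the largest codebook")


def group_codebooks(indices: List[int], group_size: int) -> List[int]:
    """Claim 8, first three steps: fixed-size groups, each given the smallest
    codebook that holds its largest requirement."""
    return [max(smallest_codebook(q) for q in indices[i:i + group_size])
            for i in range(0, len(indices), group_size)]


def remove_pockets(cbs: List[int], max_pocket_len: int = 1) -> List[int]:
    """Claim 8, last step (one possible reading): raise each short interior run
    whose codebook is smaller than both neighbouring runs to the smaller neighbour.
    In this hypothetical library a larger codebook always covers a smaller one's range."""
    cbs = list(cbs)
    changed = True
    while changed:
        changed = False
        runs = []  # [codebook, start, length]
        for i, c in enumerate(cbs):
            if runs and runs[-1][0] == c:
                runs[-1][2] += 1
            else:
                runs.append([c, i, 1])
        for r in range(1, len(runs) - 1):
            value, start, length = runs[r]
            left, right = runs[r - 1][0], runs[r + 1][0]
            if length <= max_pocket_len and value < left and value < right:
                cbs[start:start + length] = [min(left, right)] * length
                changed = True
                break  # runs are stale now; rebuild them
    return cbs


def application_ranges(cbs: List[int], group_size: int,
                       n: int) -> List[Tuple[int, int, int]]:
    """Claim 6, loosely: merge neighbouring groups with the same codebook into
    (codebook, first_index, end_index) application ranges."""
    ranges: List[Tuple[int, int, int]] = []
    for g, c in enumerate(cbs):
        start, end = g * group_size, min((g + 1) * group_size, n)
        if ranges and ranges[-1][0] == c:
            ranges[-1] = (c, ranges[-1][1], end)
        else:
            ranges.append((c, start, end))
    return ranges
```

For the indices `[3, -2, 1, 0, 0, 0, 9, -7, 0, 1, 2, 0]` with groups of two, the per-group codebooks are `[3, 1, 0, 5, 1, 2]`. Pocket removal turns them into `[3, 1, 1, 5, 2, 2]`, and the resulting ranges are `[(3, 0, 2), (1, 2, 6), (5, 6, 8), (2, 8, 12)]`. These ranges do not follow any fixed block boundary, which is the property claim 1 describes.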

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US61067404P 2004-09-17 2004-09-17
US60/610,674 2004-09-17
US11/029,722 2005-01-04
US11/029,722 US7630902B2 (en) 2004-09-17 2005-01-04 Apparatus and methods for digital audio coding using codebook application ranges

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100958986A Division CN100364235C (en) 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding

Publications (2)

Publication Number Publication Date
CN101247129A CN101247129A (en) 2008-08-20
CN101247129B true CN101247129B (en) 2012-05-23

Family

ID=37078085

Family Applications (8)

Application Number Title Priority Date Filing Date
CN2008100034638A Active CN101246689B (en) 2004-09-17 2005-09-07 Audio encoding system
CN2007101051443A Active CN101055719B (en) 2004-09-17 2005-09-07 Method for encoding and transmitting multi-sound channel digital audio signal
CNB2005100958986A Active CN100364235C (en) 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding
CN2008100034623A Active CN101241701B (en) 2004-09-17 2005-09-07 Method and equipment used for audio signal decoding
CN2007101051462A Active CN101312041B (en) 2004-09-17 2005-09-07 Apparatus and methods for multichannel digital audio coding
CN2007101051458A Active CN101046963B (en) 2004-09-17 2005-09-07 Method for decoding encoded audio frequency data stream
CN2008100034572A Active CN101247129B (en) 2004-09-17 2005-09-07 Signal processing method
CN2007101051439A Active CN101055721B (en) 2004-09-17 2005-09-07 Multi-sound channel digital audio encoding device and its method

Country Status (1)

Country Link
CN (8) CN101246689B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8054969B2 (en) * 2007-02-15 2011-11-08 Avaya Inc. Transmission of a digital message interspersed throughout a compressed information signal
CN101453643B (en) * 2007-12-04 2011-05-18 华为技术有限公司 Quantitative mode, image encoding, decoding method, encoder, decoder and system
US8630848B2 (en) * 2008-05-30 2014-01-14 Digital Rise Technology Co., Ltd. Audio signal transient detection
CN101577116B (en) * 2009-02-27 2012-07-18 北京中星微电子有限公司 Extracting method of MFCC coefficients of voice signal, device and Mel filtering method
CN101615911B (en) 2009-05-12 2010-12-08 华为技术有限公司 Coding and decoding methods and devices
EP3349360B1 (en) 2011-01-14 2019-09-04 GE Video Compression, LLC Entropy encoding and decoding scheme
EP2717262A1 (en) 2012-10-05 2014-04-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding
US10580417B2 (en) * 2013-10-22 2020-03-03 Industry-Academic Cooperation Foundation, Yonsei University Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain
WO2015144587A1 (en) * 2014-03-25 2015-10-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder device and an audio decoder device having efficient gain coding in dynamic range control
CN104050968B (en) * 2014-06-23 2017-02-15 东南大学 Embedded type audio acquisition terminal AAC audio coding method
CN104240712B (en) * 2014-09-30 2018-02-02 武汉大学深圳研究院 A kind of three-dimensional audio multichannel grouping and clustering coding method and system
CN105261373B (en) * 2015-09-16 2019-01-08 深圳广晟信源技术有限公司 Adaptive grid configuration method and apparatus for bandwidth extension encoding
CN107895580B (en) * 2016-09-30 2021-06-01 华为技术有限公司 Audio signal reconstruction method and device
CN108461086B (en) * 2016-12-13 2020-05-15 北京唱吧科技股份有限公司 Real-time audio switching method and device
US10354668B2 (en) * 2017-03-22 2019-07-16 Immersion Networks, Inc. System and method for processing audio data
US10699723B2 (en) * 2017-04-25 2020-06-30 Dts, Inc. Encoding and decoding of digital audio signals using variable alphabet size
CN109286922B (en) * 2018-09-27 2021-09-17 珠海市杰理科技股份有限公司 Bluetooth prompt tone processing method, system, readable storage medium and Bluetooth device
EP3751567B1 (en) * 2019-06-10 2022-01-26 Axis AB A method, a computer program, an encoder and a monitoring device
CN110970039A (en) * 2019-11-28 2020-04-07 北京蜜莱坞网络科技有限公司 Audio transmission method and device, electronic equipment and storage medium
CN115691521A (en) * 2021-07-29 2023-02-03 华为技术有限公司 Audio signal coding and decoding method and device
CN115691514A (en) * 2021-07-29 2023-02-03 华为技术有限公司 Coding and decoding method and device for multi-channel signal

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
AU763060B2 (en) * 1998-03-16 2003-07-10 Koninklijke Philips Electronics N.V. Arithmetic encoding/decoding of a multi-channel information signal
JP4373006B2 (en) * 1998-05-27 2009-11-25 マイクロソフト コーポレーション Scalable speech coder and decoder
GB2340351B (en) * 1998-07-29 2004-06-09 British Broadcasting Corp Data transmission
JP2001325000A (en) * 2000-05-15 2001-11-22 Nippon Columbia Co Ltd Audio signal coding device
JP3346398B2 (en) * 2000-10-27 2002-11-18 日本ビクター株式会社 Audio encoding method and audio decoding method
US6636830B1 (en) * 2000-11-22 2003-10-21 Vialta Inc. System and method for noise reduction using bi-orthogonal modified discrete cosine transform
KR100472442B1 (en) * 2002-02-16 2005-03-08 삼성전자주식회사 Method for compressing audio signal using wavelet packet transform and apparatus thereof
JP2003280695A (en) * 2002-03-19 2003-10-02 Sanyo Electric Co Ltd Method and apparatus for compressing audio
GB2388502A (en) * 2002-05-10 2003-11-12 Chris Dunn Compression of frequency domain audio signals
CN100481733C (en) * 2002-08-21 2009-04-22 广州广晟数码技术有限公司 Coder for compressing coding of multiple sound track digital audio signal
CN100339886C (en) * 2003-04-10 2007-09-26 联发科技股份有限公司 Coding device capable of detecting transient position of sound signal and its coding method
CN1460992A (en) * 2003-07-01 2003-12-10 北京阜国数字技术有限公司 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding

Also Published As

Publication number Publication date
CN101046963B (en) 2011-03-23
CN101241701B (en) 2012-06-27
CN101312041B (en) 2011-05-11
CN101055719B (en) 2011-02-02
CN101246689A (en) 2008-08-20
CN101055719A (en) 2007-10-17
CN101312041A (en) 2008-11-26
CN1848690A (en) 2006-10-18
CN101241701A (en) 2008-08-13
CN101246689B (en) 2011-09-14
CN101247129A (en) 2008-08-20
CN100364235C (en) 2008-01-23
CN101055721A (en) 2007-10-17
CN101055721B (en) 2011-06-01
CN101046963A (en) 2007-10-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant