CN101800050A

CN101800050A - Audio fine scalable coding method and system based on perception self-adaption bit allocation

Info

Publication number: CN101800050A
Application number: CN201010107402A
Authority: CN
Inventors: 胡瑞敏; 杨玉红; 刘元元; 陈冰; 高丽; 项慨; 周超群; 杭波
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2010-02-03
Filing date: 2010-02-03
Publication date: 2010-08-11
Anticipated expiration: 2030-02-03
Also published as: CN101800050B

Abstract

The invention relates to the technical field of audio coding, in particular to a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation. The method comprises the following steps: preprocessing the input signal; dividing the frequency-domain signal into subbands; calculating the perceptual importance of each subband; sorting all subbands together in descending order of perceptual importance; extracting the subband with the largest perceptual importance for scalable vertical (longitudinal) vector quantization; and then adaptively adjusting the vector-quantized subband with the largest perceptual importance. The system comprises a preprocessing module, a subband division module, a perceptual importance calculation, sorting and extraction module, a scalable quantization coding module, an adaptive adjustment module and a scalable-coding termination judgment module. The invention realizes efficient fine-grained scalable audio coding, better reconciles quantization precision with quantization efficiency, and meets the demand for high sound quality while improving coding efficiency.

Description

Audio fine scalable coding method and system based on perception self-adaption bit allocation

Technical field

The present invention relates to the technical field of audio, and in particular to a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation.

Background technology

Scalable audio coding divides the bitstream into a core layer and several enhancement layers. The core layer guarantees the minimum reconstruction quality of the signal, while the enhancement layers progressively improve the reconstruction quality by raising the signal-to-noise ratio (SNR) or extending the bandwidth: the more enhancement layers are received, the higher the decoded sound quality.

A scalable coder can adapt to fluctuations in network bandwidth simply by discarding enhancement-layer bitstream, and the finer the layering granularity, the more effectively it can follow those fluctuations. On the other hand, the objective criterion for evaluating scalable audio coding performance is the perceptual signal-to-noise ratio of each layer, and the subjective criterion is likewise the perceived quality of the signal decoded at each layer. A perceptually adaptive bit allocation scheme that ensures a steady improvement in perceived quality from layer to layer is therefore crucial to scalable coding performance.
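The adaptation by discarding enhancement layers can be illustrated with a minimal sketch. The function name `truncate_layers`, the layer sizes and the bit budget are hypothetical and chosen only for illustration; the policy of always keeping the core layer is likewise an assumption.

```python
def truncate_layers(layer_bits, budget_bits):
    """Return how many layers (core layer first) fit into budget_bits.

    layer_bits[0] is the core layer; the remaining entries are enhancement
    layers in decoding order. The core layer is always kept (assumed policy).
    """
    kept = 1                      # the core layer guarantees minimum quality
    used = layer_bits[0]
    for bits in layer_bits[1:]:
        if used + bits > budget_bits:
            break                 # this and all later enhancement layers are dropped
        used += bits
        kept += 1
    return kept


# Hypothetical frame: a 160-bit core layer plus ten 20-bit enhancement layers.
layers = [160] + [20] * 10
print(truncate_layers(layers, 250))  # 5: the core layer plus 4 enhancement layers
```

With finer enhancement layers the kept bitstream can track the available budget more closely, which is the motivation for fine granularity given above.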

Representative existing fine-grained scalable audio coding methods are the optimal bit allocation method adopted by MPEG-1 in 1994 and the frequency-domain subband scalable method of G.729EV, the new-generation speech codec standard proposed by ITU-T in 2006.

The optimal bit allocation method divides the frequency-domain signal evenly into several subbands and sorts them by perceptual importance. It codes the perceptually most important subband with a step-by-step 5-bit quantization method, adjusts the perceptual importance of that subband, and feeds the result back so that sorting and step-by-step quantization continue until the bit allocation is exhausted or all subbands have been coded. The step-by-step 5-bit quantization used here is scalar quantization, applied to the most important information of the subband. The method guarantees an improvement in coding quality, but because of the inherent compression inefficiency of scalar quantization it limits quantization efficiency to some extent and is unsuitable for low and medium bit rates.

The G.729EV enhancement layer divides the signal into 32 subbands, sorts the subbands according to a perceptual importance measure, determines the bit allocation from the sorting result and the number of available bits, and applies split spherical vector quantization to the MDCT coefficients of each subband. The bit allocation used by the G.729EV enhancement layer is not optimal, however, and the number of bits spent on each coded subband is large. When few bits are available, the allocation can direct the encoder to quantize only a few subbands, and the information in most subbands is lost entirely. Although the method has high quantization efficiency, bits are allocated unevenly and wastefully: some subbands receive far more bits than they need while others are starved, which ultimately limits the improvement in sound quality.

As the above shows, current fine-grained scalable audio coding sits at two extremes, and quantization efficiency and layering granularity are not reasonably reconciled: methods with high quantization efficiency have coarse granularity, and methods with fine granularity have low quantization efficiency.

Summary of the invention

The purpose of the invention is to provide a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation that combines a perceptually adaptive bit-block allocation scheme with an efficient scalable vector quantization technique, so as to achieve efficient fine-grained scalable audio coding and to better reconcile quantization precision with quantization efficiency.

To achieve the above purpose, the invention adopts the following technical scheme:

A fine-grained scalable audio coding method based on perceptually adaptive bit allocation comprises the following steps:

Step 1: preprocess the input signal, where the preprocessing comprises perceptual weighting and time-frequency transformation of the input signal and yields the frequency-domain representation of the signal;

Step 2: divide the frequency-domain signal obtained from the preprocessing into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

Step 3: calculate the perceptual importance of each subband, sort all subbands together in descending order of perceptual importance, and extract the subband with the largest perceptual importance;

Step 4: apply scalable vertical vector quantization to the subband with the largest perceptual importance;

Step 5: adaptively adjust the vector-quantized subband with the largest perceptual importance;

Step 6: judge whether the number of scalable quantization passes in the whole quantization process has reached the maximum; if not, return to step 3; if so, end the scalable coding.

In step 3, if subband energy is used as the perceptual importance criterion of each subband, the spectral energy contained in each subband is calculated; if amplitude is used as the criterion, the spectral amplitude contained in each subband is calculated.
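The two criteria named above can be sketched as follows. The function name `subband_importance` is hypothetical, and taking the "spectral amplitude" of a subband as the sum of absolute coefficient values is an assumption, since the text does not give a formula for it.

```python
def subband_importance(coeffs, criterion="energy"):
    """Perceptual importance ip(k) of one subband under the chosen criterion."""
    if criterion == "energy":
        # Spectral energy: sum of the squared coefficients.
        return sum(c * c for c in coeffs)
    if criterion == "amplitude":
        # Spectral amplitude, taken here as the sum of absolute values (assumption).
        return sum(abs(c) for c in coeffs)
    raise ValueError("criterion must be 'energy' or 'amplitude'")


band = [3.0, -4.0]
print(subband_importance(band, "energy"))     # 25.0
print(subband_importance(band, "amplitude"))  # 7.0
```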

In step 4, VQ_rank (k) is defined as the quantization level of the k-th subband and is initialized as:

VQ_rank(0) = VQ_rank(1) = ... = VQ_rank(N-1) = 0

where k = 0, 1, ..., N-1 and N is the total number of subbands, N ≥ 1;

The subband k with the largest perceptual importance is vector-quantized at level VQ_rank (k): R bits are allocated to its spectral vector Y_{k}, giving the quantized vector {\hat{Y}}_{k},

where the value of R is determined by the layering granularity S of the scalable coder.
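One quantization level with an R-bit budget can be pictured as a nearest-neighbour search in a codebook of 2^R codevectors. The sketch below is only an illustration: the text does not specify the codebooks, so `make_codebook` builds a hypothetical random one, and the names `quantize`, `r_bits` and `dim` are likewise assumptions.

```python
import random


def make_codebook(r_bits, dim, seed=0):
    """Hypothetical random codebook with 2**r_bits codevectors of length dim."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(2 ** r_bits)]


def quantize(vector, codebook):
    """Return (index, codevector) of the codevector nearest in squared error."""
    def distance(i):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[i]))
    index = min(range(len(codebook)), key=distance)
    return index, codebook[index]


codebook = make_codebook(r_bits=4, dim=2)   # 16 codevectors, so the index fits in 4 bits
y = [0.3, -0.7]
index, y_hat = quantize(y, codebook)
residual = [a - b for a, b in zip(y, y_hat)]  # what a later level would refine
```

Only `index` needs to be transmitted for this level, which is why the bit cost of the level is R. A small R is used here so that the exhaustive search stays cheap.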

In step 5, Q_{max} is defined as the maximum number of scalable quantization passes in the signal quantization process, and the pass counter is initialized to Q = 1. The perceptual importance \hat{ip} (k) of the quantized vector {\hat{Y}}_{k} is calculated, and Y_{k}, VQ_rank (k) and ip (k) are adaptively updated as follows:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k) = VQ_rank(k) + 1

ip (k) = ip (k) - \hat{ip} (k)

Q = Q + 1

where 0 ≤ k ≤ N-1.

A fine-grained scalable audio coding system based on perceptually adaptive bit allocation comprises:

A preprocessing module, used to preprocess the input signal, where the preprocessing comprises perceptual weighting and time-frequency transformation of the input signal and yields the frequency-domain representation of the signal;

A subband division module, used to divide the frequency-domain signal obtained from the preprocessing module into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

A subband perceptual importance calculation, sorting and extraction module, used to calculate the perceptual importance of each subband, sort all subbands together in descending order of perceptual importance, and extract the subband with the largest perceptual importance;

A scalable quantization coding module, used to apply scalable vertical vector quantization to the subband with the largest perceptual importance;

An adaptive adjustment module, used to adaptively adjust the subband with the largest perceptual importance after it has been vector-quantized by the scalable quantization coding module;

A scalable-coding termination judgment module, used to judge whether the number of scalable quantization passes in the whole quantization process has reached the maximum and to decide whether to end the scalable coding.

The preprocessing module further comprises:

A perceptual weighting submodule, used to apply perceptual weighting to the input signal;

A time-frequency transformation submodule, used to apply the time-frequency transformation to the perceptually weighted signal.

The subband perceptual importance calculation, sorting and extraction module further comprises:

A subband perceptual importance calculation and sorting submodule, used to calculate the perceptual importance of each subband and sort all subbands together in descending order of perceptual importance;

A perceptual importance extraction submodule, used to extract the subband with the largest perceptual importance from the subbands sorted by the calculation and sorting submodule.

The invention has the following advantages and positive effects:

1) By combining a perceptually adaptive bit-block allocation scheme with an efficient scalable vector quantization technique, it achieves efficient fine-grained scalable audio coding and better reconciles quantization precision with quantization efficiency;

2) Starting from the perceptual characteristics of the human ear, it applies scalable vector quantization to the subbands with perceptual importance as the criterion, which improves layering precision and meets the demand for high sound quality while improving coding efficiency.

Description of drawings

Fig. 1 is a flow chart of the fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention.

Fig. 2 is a first schematic diagram of subband division in the fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention.

Fig. 3 is a second schematic diagram of subband division in the fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention.

Fig. 4 is a schematic diagram of an application of the fine-grained scalable audio coding system based on perceptually adaptive bit allocation provided by the invention.

Embodiment

The invention proposes a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation, with the perceptual importance of the subbands as the main criterion.

Compared with allocating the bits to the perceptually most important subband all at once, the invention increases layering precision; compared with bit-by-bit allocation, it improves coding efficiency. Starting from the perceptual characteristics of the human ear and taking perceptual importance as the criterion, it applies scalable vector quantization to the subbands and thereby improves layering precision. The invention is described in detail below with reference to the accompanying drawings.

The fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention, shown in Fig. 1, comprises the following steps:

Step 1: preprocess the input signal, where the preprocessing comprises perceptual weighting and time-frequency transformation of the input signal and yields the frequency-domain representation of the signal;

Step 2: divide the frequency-domain signal obtained from the preprocessing into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

Step 3: calculate the perceptual importance of each subband, sort all subbands together in descending order of perceptual importance, and extract the subband with the largest perceptual importance;

The perceptual importance criterion depends on the particular signal: if subband energy is used as the perceptual importance criterion of each subband, the spectral energy contained in each subband is calculated; if amplitude is used as the criterion, the spectral amplitude contained in each subband is calculated;

The perceptual importance of each subband is defined as ip (k), k = 0, 1, ..., N-1. According to the calculated values, all subbands are sorted together by perceptual importance, and the subband k with the largest perceptual importance is extracted, i.e. ip (k) = E (k) = Max (ip (j)), where k = 0, 1, ..., N-1, j = 0, 1, 2, ..., N-1, and N is the total number of subbands;

Step 4: apply scalable vertical vector quantization to the subband with the largest perceptual importance. This step may further comprise the following substeps:

1. Define VQ_rank (k) as the quantization level of the k-th subband, initialized as:

VQ_rank(0) = VQ_rank(1) = ... = VQ_rank(N-1) = 0

where k = 0, 1, ..., N-1 and N is the total number of subbands, N ≥ 1;

2. Vector-quantize the subband k with the largest perceptual importance at level VQ_rank (k): allocate R bits to its spectral vector Y_{k} to obtain the quantized vector {\hat{Y}}_{k},

where the value of R is determined by the layering granularity S of the scalable coder;

Step 5: adaptively adjust the vector-quantized subband with the largest perceptual importance. The operations of this step are as follows:

Define Q_{max} as the maximum number of scalable quantization passes in the signal quantization process, with the pass counter initialized to Q = 1. Calculate the perceptual importance \hat{ip} (k) of the quantized vector {\hat{Y}}_{k}, and adaptively update Y_{k}, VQ_rank (k) and ip (k) as follows:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k) = VQ_rank(k) + 1

ip (k) = ip (k) - \hat{ip} (k)

Q = Q + 1

where 0 ≤ k ≤ N-1;

Step 6: judge whether the number of scalable quantization passes in the whole quantization process has reached the maximum; if not, return to step 3; if so, end the scalable coding.
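Steps 3 to 6 above form a loop that can be sketched end to end. The sketch below is not the patented quantizer: `sign_gain_quantize` is a deliberately simple stand-in (a sign pattern scaled by the mean absolute value) used only so that the loop runs, and the names `encode`, `energy` and the example subbands are hypothetical. The energy criterion is assumed for ip(k).

```python
def energy(vec):
    """Energy criterion: sum of squared coefficients."""
    return sum(v * v for v in vec)


def sign_gain_quantize(vec):
    """Stand-in quantizer (assumption): gain times sign pattern, gain = mean |v|."""
    gain = sum(abs(v) for v in vec) / len(vec)
    return [gain if v >= 0 else -gain for v in vec]


def encode(subbands, q_max):
    """Run q_max refinement passes over the subbands (steps 3 to 6)."""
    residual = [list(sb) for sb in subbands]   # Y_k, refined pass by pass
    ip = [energy(sb) for sb in residual]       # ip(k)
    vq_rank = [0] * len(residual)              # VQ_rank(k) = 0 for all k
    layers = []                                # one entry per enhancement layer
    q = 1
    while q <= q_max:                          # step 6: stop once Q exceeds Q_max
        k = max(range(len(ip)), key=ip.__getitem__)   # step 3: most important subband
        y_hat = sign_gain_quantize(residual[k])       # step 4: quantize at level VQ_rank(k)
        layers.append((k, vq_rank[k], y_hat))
        residual[k] = [y - h for y, h in zip(residual[k], y_hat)]  # step 5: Y_k -= Y_hat_k
        vq_rank[k] += 1
        ip[k] -= energy(y_hat)                 # ip(k) -= ip_hat(k)
        q += 1
    return layers, residual, vq_rank


layers, residual, vq_rank = encode([[4.0, -2.0], [1.0, 1.0], [0.5, -3.0]], q_max=4)
print([(k, level) for k, level, _ in layers])  # [(0, 0), (2, 0), (2, 1), (0, 1)]
print(vq_rank)                                 # [2, 0, 2]
```

In this run the loudest subband is refined first, the second pass moves to another subband once the first one's remaining importance has dropped, and subbands that never become the most important receive no bits. For this particular stand-in quantizer, subtracting the energy of the quantized vector from ip(k) gives the same value as recomputing the energy of the residual.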

The fine-grained scalable audio coding system based on perceptually adaptive bit allocation provided by the invention comprises the following modules:

1. A preprocessing module, used to preprocess the input signal, where the preprocessing comprises perceptual weighting and time-frequency transformation of the input signal and yields the frequency-domain representation of the signal;

The preprocessing module further comprises a perceptual weighting submodule and a time-frequency transformation submodule:

The perceptual weighting submodule is used to apply perceptual weighting to the input signal;

The time-frequency transformation submodule is used to apply the time-frequency transformation to the perceptually weighted signal;

2. A subband division module, used to divide the frequency-domain signal obtained from the preprocessing module into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

3. A subband perceptual importance calculation, sorting and extraction module, used to calculate the perceptual importance of each subband, sort all subbands together in descending order of perceptual importance, and extract the subband with the largest perceptual importance;

This module further comprises a subband perceptual importance calculation and sorting submodule and a perceptual importance extraction submodule:

The subband perceptual importance calculation and sorting submodule is used to calculate the perceptual importance of each subband and sort all subbands together in descending order of perceptual importance;

The perceptual importance extraction submodule is used to extract the subband with the largest perceptual importance from the subbands sorted by the calculation and sorting submodule;

4. A scalable quantization coding module, used to apply scalable vertical vector quantization to the subband with the largest perceptual importance;

5. An adaptive adjustment module, used to adaptively adjust the subband with the largest perceptual importance after it has been vector-quantized by the scalable quantization coding module;

6. A scalable-coding termination judgment module, used to judge whether the number of scalable quantization passes in the whole quantization process has reached the maximum and to decide whether to end the scalable coding.

The invention is further described below with reference to the accompanying drawings and a specific embodiment:

Step 1: preprocess the input signal; the preprocessing comprises two processes, perceptual weighting and time-frequency transformation;

1. The input signal is fed into the perceptual weighting filter W_{LB} (z), and the three coefficients γ_{1}', γ_{2}' and γ_{3}' (0 < γ_{1}', γ_{2}', γ_{3}' < 1) are adjusted accordingly to smooth the quantization noise spectrum:

W_{LB} (z) = \frac{\hat{A} (z / γ_{1}')}{\hat{A} (z / γ_{2}')} (1 + Σ_{i = 1}^{2} a_{i} {γ_{3}'}^{i} z^{- i})

where γ_{1}', γ_{2}' and γ_{3}' are adjustment parameters, a_{i} are the linear prediction analysis coefficients, i is the linear prediction order index, and

\hat{A} (z) = {\hat{a}}_{0} + {\hat{a}}_{1} z^{- 1} + \cdots + {\hat{a}}_{10} z^{- 10} .

2. The time-frequency transformation converts the time-domain signal into the frequency domain to obtain the spectral representation of the audio signal; this embodiment uses the MDCT.
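The two preprocessing operations of step 1 can be sketched in plain Python. The sketch makes several assumptions that are not in the text: the function names are hypothetical, the filter is applied in direct recursive form, the MDCT is the textbook O(N²) definition without any analysis window, and the coefficient and γ values in the example are illustrative only.

```python
import math


def expand(coeffs, gamma):
    """Coefficients of P(z / gamma): the i-th coefficient is scaled by gamma**i."""
    return [c * gamma ** i for i, c in enumerate(coeffs)]


def convolve(p, q):
    """Multiply two polynomials in z**-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def weighting_filter(a_hat, a, g1, g2, g3):
    """Return (numerator, denominator) coefficient lists of W_LB(z).

    a_hat holds a_hat_0 .. a_hat_10; a holds a_1 and a_2.
    """
    third_term = [1.0] + [a_i * g3 ** (i + 1) for i, a_i in enumerate(a[:2])]
    return convolve(expand(a_hat, g1), third_term), expand(a_hat, g2)


def apply_filter(num, den, x):
    """Filter x by num(z) / den(z) in direct recursive form."""
    y = []
    for n in range(len(x)):
        acc = sum(b * x[n - j] for j, b in enumerate(num) if n - j >= 0)
        acc -= sum(d * y[n - j] for j, d in enumerate(den) if j >= 1 and n - j >= 0)
        y.append(acc / den[0])
    return y


def mdct(frame):
    """Direct MDCT: 2N time samples in, N spectral coefficients out (no window)."""
    n_out = len(frame) // 2
    return [sum(x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n, x in enumerate(frame))
            for k in range(n_out)]


# Illustrative values only: a short predictor and hypothetical gamma settings.
num, den = weighting_filter([1.0, -0.9, 0.2], [-0.5, 0.1], 0.94, 0.6, 0.5)
weighted = apply_filter(num, den, [1.0, 0.5, -0.25, 0.0, 0.3, -0.1, 0.0, 0.2])
spectrum = mdct(weighted)   # 8 weighted samples give 4 MDCT coefficients
```

A practical codec would window overlapping frames before the MDCT and use a fast algorithm; both are omitted here to keep the sketch short.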

Step 2: divide the spectrum of the frequency-domain signal obtained from the time-frequency transformation into subbands; here the whole spectrum is assumed to be divided uniformly into 64 subbands;

Fig. 2 is a schematic of uniform division into 8 subbands. The horizontal axis shows the frequency range of the subbands and the vertical axis the frequency-domain energy amplitude. The low-frequency core layer coding is the basis of the invention and is outside its scope. The subbands computed from the residual are labelled "1" to "8" in the figure: subbands 1, 2, 3 and 4 are low-frequency audio subbands, and subbands 5, 6, 7 and 8 are high-frequency audio subbands. The division into 64 subbands follows the same principle as the division into 8;
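Uniform subband division amounts to cutting the coefficient vector into N equal-width slices. In the sketch below the function name `split_uniform` and the frame size of 320 coefficients are hypothetical; the text fixes only the number of subbands.

```python
def split_uniform(spectrum, n_subbands):
    """Split a list of spectral coefficients into n_subbands equal-width subbands."""
    if n_subbands < 1:
        raise ValueError("N must be at least 1")
    if len(spectrum) % n_subbands != 0:
        raise ValueError("spectrum length must be a multiple of N in this sketch")
    width = len(spectrum) // n_subbands
    return [spectrum[i * width:(i + 1) * width] for i in range(n_subbands)]


spectrum = [float(i) for i in range(320)]   # hypothetical frame of 320 coefficients
subbands = split_uniform(spectrum, 64)
print(len(subbands), len(subbands[0]))      # 64 5
```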

Step 3: here the energy of each subband is taken as the measure of subband perceptual importance. The energy contained in each of the 64 subbands is calculated, the subbands are sorted in descending order of energy, and the subband with the largest perceptual importance is extracted. The implementation is:

1. Define ip (k) as the perceptual importance of the k-th subband and E (k) as the spectral energy contained in the k-th subband; the energy of each subband is calculated as:

ip (k) = E (k) = \| Y_{k} \|^{2}

where k = 0, 1, ..., 63 and Y_{k} is the vector of MDCT spectral coefficients contained in the k-th subband;

2. With the subband energies calculated by the above formula as the measure of perceptual importance, all subbands are sorted together, and the subband with the largest perceptual importance is extracted and passed to step 4 for vector quantization, expressed as:

ip (k) = Max (ip (j))

where 0 ≤ k ≤ 63 and j = 0, ..., 63;

Step 4: the subband with the largest perceptual importance obtained in step 3 is vertically vector-quantized. Suppose the k-th subband is the subband with the largest perceptual importance; the implementation is:

1. Define VQ_rank (k) as the quantization level of the k-th subband, initialized as:

VQ_rank(0) = VQ_rank(1) = ... = VQ_rank(63) = 0

where k = 0, 1, ..., 63 and N = 64 is the total number of subbands;

2. The subband k with the largest perceptual importance is vector-quantized at level VQ_rank (k) = 0, and R bits are allocated to the vector Y_{k} of this subband. R is adjusted according to the required layering granularity, trading quantization efficiency against granularity. For example, with a frame length of 20 ms and a granularity of 1 kbps, R is 20 bits. The result is the quantized vector {\hat{Y}}_{k};
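The value R = 20 in this example is consistent with multiplying the granularity by the frame duration. That relation is inferred from the numbers given rather than stated as a formula in the text, and the function name `bits_per_pass` is hypothetical.

```python
def bits_per_pass(granularity_bps, frame_ms):
    """Bits available to one quantization pass: granularity times frame duration."""
    return granularity_bps * frame_ms // 1000


r = bits_per_pass(1000, 20)   # 1 kbps granularity, 20 ms frames
print(r)                      # 20
```

Under the same assumption, a coarser granularity of 2 kbps would give 40 bits per pass, so each pass could quantize more precisely but the bitstream could only be truncated in larger steps.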

Step 5: the subband k vector-quantized in step 4 is adaptively adjusted, as follows:

Suppose Q_{max} = 10 is the maximum number of scalable quantization passes in the signal quantization process, with the pass counter initialized to Q = 1;

Calculate the perceptual importance \hat{ip} (k) of the quantized vector {\hat{Y}}_{k}, and adaptively update Y_{k}, VQ_rank (k) and ip (k) as follows:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k) = VQ_rank(k) + 1

ip (k) = ip (k) - \hat{ip} (k)

Q = Q + 1

where 0 ≤ k ≤ 63;

Step 6: judge whether the number of scalable quantization passes Q after step 5 is greater than Q_{max}; if so, end the scalable coding; otherwise continue with step 3.

Fig. 3 is a schematic of bit allocation over 8 subbands. The horizontal axis shows the frequency range of the subbands and the vertical axis the frequency-domain energy amplitude. The low-frequency core layer coding is the basis of the invention and is outside its scope. The enhancement layer is divided uniformly into 8 subbands. Comparing the energy amplitudes of the subbands shows that the 6th subband has the largest energy, so vector block 1 is coded for this subband and the energy of the 6th subband is adjusted. The subband energy amplitudes are then re-sorted; the 1st subband now has the largest energy, so vector block 2 is coded for it. Proceeding in the same way, vector blocks 1 to 18 are coded in turn.

In Fig. 4, the input two-channel signal is processed by modules such as downmixing, preprocessing, and low-pass and high-pass filtering to obtain the low-band residual signal and the high-band signal. These two signals are input to the scalable coding module and, after scalable quantization by the method provided by the invention, yield the output bitstream.

Fig. 4 shows the application of the invention within a complete audio coding framework, where scalable coding vector quantization 30 is the position at which the invention realizes fine-granularity scalable coding. The invention is applied to the scalable vector quantization of the coding framework to guide audio coding and improve quantization efficiency and quantization precision.

Claims

1. A fine-grained scalable audio coding method based on perceptually adaptive bit allocation, characterized in that it comprises the following steps:

Step: divide the frequency-domain signal obtained from the preprocessing into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

2. The fine-grained scalable audio coding method based on perceptually adaptive bit allocation according to claim 1, characterized in that:

3. The fine-grained scalable audio coding method based on perceptually adaptive bit allocation according to claim 1, characterized in that step 4 further comprises the following substeps:

VQ_rank(0) = VQ_rank(1) = ... = VQ_rank(N-1) = 0

where k = 0, 1, ..., N-1 and N is the total number of subbands, N ≥ 1;

where the value of R is determined by the layering granularity S of the scalable coder.

4. The fine-grained scalable audio coding method based on perceptually adaptive bit allocation according to claim 3, characterized in that step 5 further comprises the following substeps:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k) = VQ_rank(k) + 1

ip (k) = ip (k) - \hat{ip} (k)

Q = Q + 1

where 0 ≤ k ≤ N-1.

5. A fine-grained scalable audio coding system based on perceptually adaptive bit allocation, characterized in that it comprises:

6. The fine-grained scalable audio coding system based on perceptually adaptive bit allocation according to claim 5, characterized in that

the preprocessing module further comprises:

7. The fine-grained scalable audio coding system based on perceptually adaptive bit allocation according to claim 5 or 6, characterized in that the subband perceptual importance calculation, sorting and extraction module further comprises: