CN101430879B - Multi-rate audio encoding method

Info

Publication number: CN101430879B
Application number: CN 200710169619
Authority: CN
Inventors: 刘泽新; 马付伟; 肖玮
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2007-11-05
Filing date: 2007-11-05
Publication date: 2011-08-10
Anticipated expiration: 2027-11-05
Also published as: WO2009059564A1; CN101430879A

Abstract

An embodiment of the invention discloses a multi-rate audio encoding method. The method comprises: for each adjacent lattice point, computing a first ratio between the spectrum of a perceptual weighting filter and the spectrum of the first two layers of synthesized speech, obtained by encoding and then decoding the input signal; and writing the index values of the lattice points corresponding to the first ratios into the bitstream in descending order of the first ratio. The method improves the quality of the output speech or audio signal when bits are insufficient in transform-domain coding.

Description

A multi-rate audio encoding method

Technical field

The present invention relates to coding technology, and in particular to a multi-rate audio encoding method.

Background art

In current multi-rate audio coding, transform-domain coding is generally used when the input signal has music-like characteristics or the bit rate is relatively high (for example, in AMR-WB+, G.729.1 and G.VBR). That is, the time-domain signal is transformed to the frequency domain by a transform such as the modified discrete cosine transform (MDCT) or the fast Fourier transform (FFT). When the transform-domain coding parameters are quantized, lattice vector quantization is commonly used. Fig. 1 is a flowchart of a prior-art lattice vector quantization method. As shown in Fig. 1, lattice vector quantization generally comprises the following steps:

Step 101: Find the lattice point corresponding to the frequency-domain signal according to the nearest-neighbor principle.

That is, for the input spectrum vector, find the nearest lattice point C_k in the 8-dimensional Gosset lattice (called the RE8 lattice), the Z8 lattice, the Z16 lattice, or the like.

Step 102: Determine the index value of each lattice point according to the spectral energy corresponding to that lattice point and the total number of bits.

When C_k is in the base codebook, the index value comprises the codebook index value n_k and the corresponding codeword index value I_k. When C_k is not in the base codebook, a Voronoi extension is applied to C_k; in that case the index value comprises, in addition to the codebook index value n_k and the codeword index value I_k, the extension codebook index vector kv.

Step 103: When the number of bits is insufficient, all elements of the lattice points corresponding to the lower-energy parts of the spectrum are forced to 0.

Step 104: Write the index values of these lattice points into the bitstream in order from low frequency to high frequency.

Step 105: At the decoder, decode the quantized spectrum sequence from the decoded index values, in order from low frequency to high frequency.
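The nearest-neighbor search of step 101 can be illustrated for the RE8 lattice. The sketch below relies on the standard decomposition of RE8 into 2D8 and 2D8 shifted by (1, ..., 1); the function names `nearest_2d8` and `nearest_re8` are hypothetical, and the text above does not prescribe this particular search procedure.

```python
import math

def nearest_2d8(x):
    """Nearest point of 2*D8 (even coordinates whose sum is a multiple of 4)."""
    y = [v / 2.0 for v in x]
    r = [math.floor(v + 0.5) for v in y]          # round each coordinate
    if sum(r) % 2 != 0:
        # Parity violated: re-round the coordinate with the largest
        # rounding error in the opposite direction.
        j = max(range(len(y)), key=lambda i: abs(y[i] - r[i]))
        r[j] += 1 if y[j] > r[j] else -1
    return [2 * v for v in r]

def nearest_re8(x):
    """Nearest RE8 point: the better of the two cosets of 2*D8."""
    a = nearest_2d8(x)
    b = [v + 1 for v in nearest_2d8([v - 1 for v in x])]
    dist_a = sum((p - q) ** 2 for p, q in zip(x, a))
    dist_b = sum((p - q) ** 2 for p, q in zip(x, b))
    return a if dist_a <= dist_b else b
```

For example, `nearest_re8([0.9] * 8)` selects the all-ones point, which belongs to the shifted coset.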

From the masking effect of signals it is known that a signal with high sound intensity can mask nearby signals with low sound intensity, so that the human auditory system cannot perceive the masked signal. In addition, the human auditory system itself has a masking effect: when the sound intensity of a signal is below a certain threshold, the auditory system cannot perceive the signal even without masking by other signals. This is called absolute masking. Tests show that in the range 0 to 200 Hz the absolute masking threshold decreases as frequency increases, while in the range 200 to 2000 Hz it is almost constant.

When lattice vector quantization is applied to the spectrum of the input signal, the elements of some lattice points (for example, the quantized values corresponding to the spectrum of the input signal) may be forced to 0 because the total number of bits is limited. If lattice points corresponding to spectral blocks that carry important information are forced to 0, coding quality drops sharply. A criterion is therefore needed to decide which lattice points are set to 0 and which are kept.

In the above coding method, the encoder ranks lattice points by spectral energy but writes the parameter index values of the transform-domain coding into the bitstream in order from low frequency to high frequency. When the number of bits is insufficient, all elements of the lattice points corresponding to the parts of the spectrum with relatively low energy are therefore forced to 0. The decoder, in turn, can decode only part of the bitstream when bits are insufficient, and because the encoder wrote the parameters in order from low to high frequency, only a small amount of the low-frequency signal can be recovered. As a result, the quality of the output speech or audio signal does not improve noticeably even when the bit rate rises.

In addition, when the index values of the lattice points are computed, the importance of a lattice point is decided by its spectral energy, and the elements of lattice points with lower spectral energy are set to 0 when bits are insufficient. However, a part of the spectrum with high energy is not necessarily an important component, so this decision rule may set to 0 the elements of lattice points that correspond to important components, which degrades the output signal. Furthermore, according to the principle of the masking effect, whether a signal is masked does not depend only on spectral energy; to some extent it also depends on the difference between the masking signal and the masked signal. The above coding method does not take this difference into account.

Summary of the invention

In view of this, the main purpose of the embodiments of the invention is to provide a multi-rate audio encoding method that improves the quality of the output speech (audio) signal when bits are insufficient in transform-domain coding.

To achieve this purpose, the technical solution of the embodiments of the invention is implemented as follows:

A multi-rate audio encoding method, comprising:

For each adjacent lattice point, computing a first ratio between the spectrum of the perceptual weighting filter and the spectrum of the first two layers of synthesized speech, obtained by encoding and then decoding the input signal; and writing the index values of the lattice points corresponding to the first ratios into the bitstream in descending order of the first ratio.

In summary, the embodiments of the invention provide a multi-rate audio encoding method. With this method, the boot entries corresponding to signals that are less likely to be masked are not forced to 0. When bits are insufficient, the more important information is therefore quantized more finely and written into the bitstream first, while unimportant information is quantized coarsely. When the decoder has too few bits, it can still decode the more important information, which improves the quality of the decoded speech.

Brief description of the drawings

Fig. 1 is a flowchart of a prior-art lattice vector quantization method.

Fig. 2 is a flowchart of the multi-rate audio encoding method in an embodiment of the invention.

Fig. 3 is a flowchart of the multi-rate audio encoding method in another embodiment of the invention.

Fig. 4 is a flowchart of the multi-rate audio encoding method in yet another embodiment of the invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

Fig. 2 is a flowchart of the multi-rate audio encoding method in an embodiment of the invention. As shown in Fig. 2, the method comprises the following steps:

Step 201: The encoder receives the input data.

In this step, the nearest lattice point C_k in the RE8 lattice can be found for the input spectrum vector according to the nearest-neighbor principle. The spectral energy corresponding to the lattice point C_k and the total number of bits serve as the input data. The encoder receives this input data and processes it as follows.

Step 202: Apply CELP coding to the input signal.

During CELP coding, the input signal can be encoded in two layers (for example, an L1 layer and an L2 layer).

Step 203: Decode the CELP-encoded signal.

Step 204: Compute a spectrum from the decoded signal of step 203. Specifically, compute the spectrum Freq_R2 of the synthesized speech (audio) signal of the first two layers from the decoded signal.

Step 205: Compute the difference signal between the original input signal and the first-two-layer synthesized speech (audio) signal decoded in step 203, and compute the MDCT coefficients of this difference signal (that is, the spectrum Freq_err).
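Step 205 can be sketched as follows. This is a minimal illustration that assumes a direct (unwindowed, O(N²)) MDCT of one frame; the window, frame length, and overlap used by an actual codec are not given in the text, and the function names `mdct` and `residual_spectrum` are hypothetical.

```python
import math

def mdct(frame):
    """Direct MDCT: 2N time samples -> N coefficients (no window applied)."""
    n = len(frame) // 2
    return [
        sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]

def residual_spectrum(original, synthesized):
    """Freq_err: MDCT of (original input - locally decoded synthesis)."""
    diff = [o - s for o, s in zip(original, synthesized)]
    return mdct(diff)
```

If the CELP layers reproduce the input exactly, the difference signal is zero and so is every coefficient of Freq_err.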

Step 206: After an operation (for example, rounding) is applied to the MDCT coefficients from step 205, 35 boot entries are obtained.

Step 207: Compute the ratio Ratio[k] between the spectrum Freq_R2 from step 204 and Freq_err from step 205. The formula is as follows:

$$\mathrm{Ratio}[k] = \sum_{i=0}^{7} \left( \frac{\mathrm{Freq\_R2}[l] - \mathrm{Freq\_err}[l]}{\mathrm{Freq\_R2}[l]} \right)^{2} \tag{1}$$

where l = 8*k + i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7.
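Formula (1) can be evaluated block by block as in the sketch below. The function name `first_ratio` and the `eps` guard against a zero Freq_R2 bin are assumptions added for illustration; the text does not say how zero bins are handled.

```python
def first_ratio(freq_r2, freq_err, num_blocks=35, block_size=8, eps=1e-12):
    """Ratio[k] of formula (1) for k = 0 .. num_blocks - 1."""
    ratios = []
    for k in range(num_blocks):
        total = 0.0
        for i in range(block_size):
            l = block_size * k + i
            # Assumed guard: avoid dividing by a (near-)zero spectral bin.
            denom = freq_r2[l] if abs(freq_r2[l]) > eps else eps
            total += ((freq_r2[l] - freq_err[l]) / denom) ** 2
        ratios.append(total)
    return ratios
```

With 35 blocks of 8 bins each, both input spectra need at least 280 entries.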

Applying formula (1) yields an array Ratio[k] of 35 ratios, each of which corresponds to exactly one boot entry from step 206.

Step 208: Sort the boot entries according to the array.

Specifically, Ratio[k] is divided into two regions: the first n ratios form the first region, and the remaining (35-n) ratios form the second region. The first region is sorted in ascending order of ratio, and the second region is sorted in descending order of ratio. An array R[k] of 35 elements is then created. The sequence numbers of the boot entries of the lattice points corresponding to the (35-n) sorted ratios of the second region are stored, in their sorted order, in the first (35-n) elements of R[k]. The sequence numbers of the boot entries of the lattice points corresponding to the n sorted ratios of the first region are stored, in their sorted order, in the last n elements of R[k]. The value of n can be preset according to the application; in this embodiment, n = 4 can be used.
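The two-region ordering of step 208 might look like the following sketch. It assumes that the second region fills the front of R[k] and the first region fills the remaining n positions at the end; the function name `order_boot_entries` is hypothetical.

```python
def order_boot_entries(ratio, n=4):
    """Build R[k]: second region (descending ratio) first, then first region (ascending)."""
    first_region = sorted(range(n), key=lambda k: ratio[k])
    second_region = sorted(range(n, len(ratio)),
                           key=lambda k: ratio[k], reverse=True)
    return second_region + first_region
```

For instance, with ratios `[3, 1, 2, 0, 5, 9, 7]` and n = 4, the result is `[5, 6, 4, 3, 1, 2, 0]`.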

Step 209: Compute the spectrum W_Freq of the perceptual weighting filter.

The perceptual weighting filter H(z) satisfies H(z) = A(z/γ)/(1 - βz^-1), where A is formed from the linear prediction coefficients, z is the z-domain variable, and β and γ are weighting factors that are generally constants.
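One way to obtain W_Freq is to evaluate the magnitude of H(z) on the unit circle, as sketched below. The sketch assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, takes W_Freq as the magnitude response, and uses illustrative defaults for γ, β, and the number of frequency bins; none of these choices is stated in the text, and the function name is hypothetical.

```python
import cmath
import math

def weighting_filter_spectrum(lpc, gamma=0.92, beta=0.68, num_bins=280):
    """|H(e^{jw})| for H(z) = A(z/gamma) / (1 - beta * z^-1)."""
    spectrum = []
    for b in range(num_bins):
        w = math.pi * (b + 0.5) / num_bins        # assumed bin-centre frequency
        z_inv = cmath.exp(-1j * w)
        # A(z/gamma) = 1 + sum_i a_i * gamma^i * z^-i  (assumed LPC convention)
        numerator = 1.0 + sum(a * gamma ** (i + 1) * z_inv ** (i + 1)
                              for i, a in enumerate(lpc))
        denominator = 1.0 - beta * z_inv          # never zero while |beta| < 1
        spectrum.append(abs(numerator / denominator))
    return spectrum
```

With no prediction coefficients and β = 0 the filter reduces to H(z) = 1, which gives a flat spectrum.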

Step 210: Compute the ratio Rat[k] between the spectrum Freq_R2 from step 204 and W_Freq from step 209. The formula is as follows:

$$\mathrm{Rat}[k] = \frac{\sum_{i=0}^{7} \log\left[ (\mathrm{Freq\_R2}[l])^{2} + (\mathrm{Freq\_R2}[l+1])^{2} \right]}{\sum_{i=0}^{7} \log\left[ (\mathrm{W\_Freq}[l])^{2} + (\mathrm{W\_Freq}[l+1])^{2} \right]} \tag{2}$$

where l = 8*k + i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7. Then proceed to step 213.
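Formula (2) can be sketched in the same block-wise manner. Because each term uses bins l and l+1, the sketch assumes both spectra hold at least 8*35 + 1 entries; the small `eps` inside the logarithm is likewise an assumption, and a zero denominator sum is not handled. The function name `second_ratio` is hypothetical.

```python
import math

def second_ratio(freq_r2, w_freq, num_blocks=35, block_size=8, eps=1e-12):
    """Rat[k] of formula (2); inputs need num_blocks * block_size + 1 bins."""
    rat = []
    for k in range(num_blocks):
        num = 0.0
        den = 0.0
        for i in range(block_size):
            l = block_size * k + i
            # eps (assumed) keeps log() defined for an all-zero bin pair.
            num += math.log(freq_r2[l] ** 2 + freq_r2[l + 1] ** 2 + eps)
            den += math.log(w_freq[l] ** 2 + w_freq[l + 1] ** 2 + eps)
        rat.append(num / den)   # raises ZeroDivisionError if den == 0
    return rat
```

Since Rat[k] is a ratio of two logarithm sums, its value does not depend on the logarithm base.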

Step 211: Determine whether the total number of bits is sufficient. If so, go directly to step 213; otherwise, perform step 212 and then step 213.

Step 212: Depending on the total number of bits, set to 0 all elements of the lattice points corresponding to the boot entries stored in the last m elements of the array R[k] from step 208; the corresponding index values are then also 0. The value of m can be preset according to the total number of bits.
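The zeroing in step 212 amounts to clearing the lattice points named by the tail of R[k]. The sketch below assumes that lattice points are stored as lists indexed by boot entry sequence number; the function name `zero_last_m` is hypothetical.

```python
def zero_last_m(r_order, lattice_points, m):
    """Return a copy of lattice_points with the last m entries of R[k] zeroed."""
    result = [list(p) for p in lattice_points]
    if m > 0:                                   # r_order[-0:] would select everything
        for k in r_order[-m:]:
            result[k] = [0] * len(result[k])
    return result
```

The input list is left unmodified, so the unquantized values remain available to the caller.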

Step 213: Compute the index values of the L3 and L4 layer boot entries in descending order of the ratio. Then proceed to step 214.

Step 214: Write the index values of the corresponding lattice points c_k into the bitstream in descending order of Rat[k]. That is, the index values of lattice points with large Rat[k] (the more important signals) are written first, and the index values of lattice points with small Rat[k] are written afterwards.
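The ordering rule of step 214 can be sketched as a sort on Rat[k]. In this illustration the index values are opaque placeholders and the bitstream is modelled as a list of (sequence number, index value) pairs; both the representation and the function name `pack_by_rat` are assumptions.

```python
def pack_by_rat(rat, index_values):
    """Emit (k, index value) pairs in descending order of Rat[k]."""
    order = sorted(range(len(rat)), key=lambda k: rat[k], reverse=True)
    return [(k, index_values[k]) for k in order]
```

A decoder that receives only the first few pairs therefore obtains the lattice points with the largest Rat[k].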

It follows that the decision rule of formula (1) used in this embodiment conforms better to the principle of the masking effect. According to that principle, the smaller the difference between Freq_R2 and Freq_err, the closer their spectra are, and so the less likely Freq_err is to be masked. In addition, for the same difference, the larger the ratio of that difference to the locally decoded Freq_R2, the less likely Freq_err is to be masked. With the method of this embodiment, the boot entries corresponding to signals that are less likely to be masked are therefore not forced to 0. When bits are insufficient, the more important information is quantized more finely and written into the bitstream first, while unimportant information is quantized coarsely.

In addition, in this embodiment the order of the boot entry index values in the bitstream is determined by the ratio Rat between the spectrum Freq_R2 of the locally decoded CELP-coded synthesized speech (audio) and the spectrum W_Freq of the perceptual weighting filter. The reason is as follows. The perceptual weighting filter allocates more distortion where the spectral energy of the input signal is large and keeps distortion as low as possible where the spectral energy is small. For a CELP-coded signal, quantization is therefore relatively coarse where Rat is large, and this is exactly where the lattice vector quantization of this embodiment concentrates. Placing the index values of boot entries with larger Rat nearer the front of the bitstream, and those with smaller Rat nearer the back, lets the decoder recover the more important information when it has too few bits, which improves the quality of the decoded speech.

In another embodiment of the invention, the order of the lattice codebooks in the bitstream can also be determined by a fixed value. Fig. 3 is a flowchart of the multi-rate audio encoding method in this embodiment. As shown in Fig. 3, the method comprises the following steps:

Step 301: Compute the MDCT coefficients of the R3 and R4 layers. The 35 spectral blocks corresponding to these MDCT coefficients are also divided by spectral range into two parts, 0 to 2 kHz and 2 to 7 kHz. For example, the first 10 spectral blocks cover 0 to 2 kHz, and the following 25 spectral blocks cover 2 to 7 kHz.

Step 302: Compute the boot entries in the 2 to 7 kHz range.

Step 303: Determine whether the total number of bits is sufficient. If so, go to step 306; otherwise, go to step 304.

Step 304: Set to 0 the values of the boot entries with lower spectral energy in the 2 to 7 kHz range. That is, sort the boot entries in the 2 to 7 kHz range in descending order of spectral energy, and set to 0 the values of the last n boot entries, which have the lowest spectral energy. The value of n can be preset according to the application.

Step 305: Compute the index values of the boot entries in the 2 to 7 kHz range whose values are not 0, and write the resulting index values into the bitstream. Then go to step 312.

Step 306: Compute the index values of the boot entries in the 2 to 7 kHz range, and write the resulting index values into the bitstream.

Step 307: Compute the boot entries in the 0 to 2 kHz range.

Step 308: Determine whether the total number of bits is sufficient. If so, go to step 311; otherwise, go to step 309.

Step 309: Set to 0 the values of the boot entries with lower spectral energy in the 0 to 2 kHz range. That is, sort the boot entries in the 0 to 2 kHz range in descending order of spectral energy, and set to 0 the values of the last n boot entries, which have the lowest spectral energy. The value of n can be preset according to the application.

Step 310: Compute the index values of the boot entries in the 0 to 2 kHz range whose values are not 0, and write the resulting index values into the bitstream. Then go to step 312.

Step 311: Compute the index values of the boot entries in the 0 to 2 kHz range, and write the resulting index values into the bitstream.

Step 312: End the encoding.

With the above method, the index values of the boot entries corresponding to the 2 to 7 kHz MDCT spectrum are placed at the front of the bitstream, and the index values of the boot entries corresponding to the 0 to 2 kHz MDCT spectrum are placed after them, forming a complete bitstream. That is, when the energies of the 35 spectral blocks are sorted, the energies of the last 25 blocks (covering 2 to 7 kHz) are first sorted in descending order, and the sequence numbers of the corresponding boot entries are stored in an array. The energies of the first 10 spectral blocks (covering 0 to 2 kHz) are then sorted in descending order, and the sequence numbers of the corresponding boot entries are placed in the 10 positions after the first 25 sequence numbers.
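The flow of steps 301 to 312 and the resulting ordering can be summarized in the sketch below. It returns the sequence numbers of the blocks whose index values reach the bitstream, in writing order. The two boolean flags stand in for the bit-sufficiency checks of steps 303 and 308, whose exact criterion is not given in the text; the function name `band_priority_order` is hypothetical.

```python
def band_priority_order(energy, enough_for_high, enough_for_low, n, num_low=10):
    """Blocks written to the bitstream: 2-7 kHz first, then 0-2 kHz."""
    high = sorted(range(num_low, len(energy)),
                  key=lambda k: energy[k], reverse=True)
    if not enough_for_high:
        # Steps 304-305: drop the n weakest high-band blocks, then stop (step 312).
        return high[:max(0, len(high) - n)]
    low = sorted(range(num_low), key=lambda k: energy[k], reverse=True)
    if not enough_for_low:
        # Steps 309-310: drop the n weakest low-band blocks.
        return high + low[:max(0, len(low) - n)]
    return high + low                            # steps 306 and 311
```

With 35 blocks and both checks passing, the first 25 positions hold the 2 to 7 kHz blocks and the last 10 positions hold the 0 to 2 kHz blocks.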

The above ordering method is suited to embedded multi-rate speech coding algorithms in which the lower layers use CELP coding and the higher layers use transform coding. In this method, 2 kHz is chosen as the dividing point because CELP coding handles the low-frequency signal from 0 to 2 kHz very well. At the same time, the signal processed by the higher layers is the spectrum of the difference between the original input signal and the locally decoded lower-layer signal, so within the signal that the higher layers must process, the spectrum above 2 kHz is the more important information. When the spectrum of the difference signal is coded, the spectrum above 2 kHz should therefore be coded first. This ensures that when bits are insufficient, the decoder preferentially decodes the more important information above 2 kHz rather than the less important low-frequency information.

In the above method, the coding of the CELP-coded part is the same as that of the CELP-coded part shown in Fig. 2 and is not repeated here.

In addition, the method of steps 301 to 312 can also be combined with the method of the first embodiment to better achieve the purpose of the invention.

In addition, Fig. 4 is a flowchart of the multi-rate audio encoding method in yet another embodiment of the invention. As shown in Fig. 4, the method of this embodiment can:

1) According to the requirements of the actual encoder, switch modes to select either a global index-order determination mode or a block index-order determination mode.

2) According to the requirements of the actual encoder, switch the index-ordering mode to choose between static-mode ordering and dynamic-mode ordering.

3) Use as the decision criterion the difference between the spectrum of the locally decoded lower-layer signal and the spectrum of the difference between the original input signal and the locally decoded lower-layer signal, divided by the spectrum of the locally decoded lower-layer signal.

4) Use the signal-to-noise ratio available at both the encoder and the decoder, or some non-zero value, to determine the order of the lattice codebook indices in the bitstream.

The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A multi-rate audio encoding method, characterized in that the method comprises:

2. The method according to claim 1, characterized in that, when the total number of bits is insufficient, writing the index values of the lattice points corresponding to the first ratio into the bitstream comprises:

Computing the spectrum of the difference between the input signal and the first-two-layer synthesized signal obtained by encoding and then decoding the input signal; computing a second ratio between the spectrum of the difference and the spectrum of the first-two-layer synthesized speech obtained by encoding and then decoding the input signal; and writing the index values of the lattice points corresponding to the second ratio into the bitstream in descending order of the second ratio.

3. The method according to claim 1, characterized in that the first ratio Ratio[k] is computed as:

$$\mathrm{Ratio}[k] = \sum_{i=0}^{7} \left( \frac{\mathrm{Freq\_R2}[l] - \mathrm{Freq\_err}[l]}{\mathrm{Freq\_R2}[l]} \right)^{2}$$

where l = 8*k + i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7; Freq_R2[l] is the spectrum of the first-two-layer synthesized speech signal in the decoded signal; and Freq_err[l] is the discrete cosine transform coefficients of the difference signal between the original input signal and the first-two-layer synthesized speech signal of the decoded signal.

4. The method according to claim 2, characterized in that the second ratio Rat[k] is computed as:

$$\mathrm{Rat}[k] = \frac{\sum_{i=0}^{7} \log\left[ (\mathrm{Freq\_R2}[l])^{2} + (\mathrm{Freq\_R2}[l+1])^{2} \right]}{\sum_{i=0}^{7} \log\left[ (\mathrm{W\_Freq}[l])^{2} + (\mathrm{W\_Freq}[l+1])^{2} \right]}$$

where l = 8*k + i, k = 0, 1, 2, ..., 34, and i = 0, 1, 2, ..., 7.

5. The method according to claim 1, characterized in that the method further comprises:

Dividing the spectral signal into a first part and a second part by spectral range; computing the boot entries of the spectral signal of the second part and their index values; and writing the index values into the bitstream.