CN101180676B - Methods and apparatus for quantization of spectral envelope representation

Info

Publication number: CN101180676B
Application number: CN2006800181405A
Authority: CN
Inventors: Koen Bernard Vos
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2005-04-01
Filing date: 2006-04-03
Publication date: 2011-12-14
Anticipated expiration: 2026-04-03
Also published as: CN101185127A; CN101184979B; UA92341C2; ES2350494T3; CN101185120B; UA95776C2; CN101180677B; CN101185126A; CN101185125B; UA94041C2; CN101185120A; CN101185127B; CN101180676A; UA93677C2; CN101185125A; CN101185126B; CN101180677A; UA91853C2; UA92742C2; CN101185124A

Abstract

A quantizer according to an embodiment is configured to quantize a smoothed value of an input value (e.g., a vector of line spectral frequencies) to produce a corresponding output value, where the smoothed value is based on a scale factor and a quantization error of a previous output value.

Description

Methods and apparatus for vector quantization of spectral envelope representation

This application claims the benefit of U.S. Provisional Patent Application No. 60/667,901, entitled "CODING THE HIGH-FREQUENCY BAND OF WIDEBAND SPEECH," filed April 1, 2005. This application also claims the benefit of U.S. Provisional Patent Application No. 60/673,965, entitled "PARAMETER CODING IN A HIGH-BAND SPEECH CODER," filed April 22, 2005.

Technical field

The present invention relates to signal processing.

Background

A speech encoder sends a characterization of the spectral envelope of a speech signal to a decoder in the form of a vector of line spectral frequencies (LSFs) or a similar representation. For efficient transmission, these LSFs are quantized.

Summary of the invention

According to one embodiment, a quantizer is configured to quantize a smoothed value of an input value (e.g., a vector of line spectral frequencies or a portion thereof) to produce a corresponding output value, where the smoothed value is based on a scale factor and a quantization error of a previous output value.

Brief description of the drawings

Fig. 1a shows a block diagram of a speech encoder E100 according to an embodiment.

Fig. 1b shows a block diagram of a speech decoder E200.

Fig. 2 shows an example of a one-dimensional mapping typically performed by a scalar quantizer.

Fig. 3 shows one simple example of a multidimensional mapping as performed by a vector quantizer.

Fig. 4a shows one example of a one-dimensional signal, and Fig. 4b shows an example of a version of this signal after quantization.

Fig. 4c shows an example of the signal of Fig. 4a as quantized by a quantizer 230b as shown in Fig. 6.

Fig. 4d shows an example of the signal of Fig. 4a as quantized by a quantizer 230a as shown in Fig. 5.

Fig. 5 shows a block diagram of an implementation 230a of a quantizer 230 according to an embodiment.

Fig. 6 shows a block diagram of an implementation 230b of a quantizer 230 according to an embodiment.

Fig. 7a shows an example of a plot of log amplitude versus frequency for a speech signal.

Fig. 7b shows a block diagram of a basic linear prediction coding system.

Fig. 8 shows a block diagram of an implementation A122 of narrowband encoder A120 (as shown in Fig. 10a).

Fig. 9 shows a block diagram of an implementation B112 of narrowband decoder B110 (as shown in Fig. 11a).

Fig. 10a is a block diagram of a wideband speech encoder A100.

Fig. 10b is a block diagram of an implementation A102 of wideband speech encoder A100.

Fig. 11a is a block diagram of a wideband speech decoder B100 corresponding to wideband speech encoder A100.

Fig. 11b is an example of a wideband speech decoder B102 corresponding to wideband speech encoder A102.

Detailed description

Due to quantization error, the spectral envelope reconstructed in the decoder may exhibit excessive fluctuations. These fluctuations may produce an objectionable "warbly" quality in the decoded signal. Embodiments include systems, methods, and apparatus configured to perform high-quality wideband speech coding using temporal noise shaping quantization of spectral envelope parameters. Features include fixed or adaptive smoothing of coefficient representations such as highband LSFs. Particular applications described herein include a wideband speech coder that combines a narrowband signal with a highband signal.

Unless expressly limited by its context, the term "calculating" is used herein to indicate any of its ordinary meanings, such as computing, generating, and selecting from a list of values. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or operations. The term "A is based on B" is used to indicate any of its ordinary meanings, including the cases (i) "A is equal to B" and (ii) "A is based on at least B." The term "Internet Protocol" includes version 4, as described in IETF (Internet Engineering Task Force) RFC (Request for Comments) 791, and subsequent versions such as version 6.

A speech encoder may be implemented according to a source-filter model that encodes the input speech signal as a set of parameters that describe a filter. For example, the spectral envelope of a speech signal is characterized by a number of peaks that represent resonances of the vocal tract and are called formants. Fig. 7a shows an example of such a spectral envelope. Most speech coders encode at least this coarse spectral structure as a set of parameters such as filter coefficients.

Fig. 1a shows a block diagram of a speech encoder E100 according to an embodiment. As shown in this example, the analysis module may be implemented as a linear prediction coding (LPC) analysis module 210 that encodes the spectral envelope of the speech signal S1 as a set of linear prediction (LP) coefficients (e.g., coefficients of an all-pole filter 1/A(z)). The analysis module typically processes the input signal as a series of nonoverlapping frames, with a new set of coefficients being calculated for each frame. The frame period is generally a period over which the signal may be expected to be locally stationary; one common example is 20 milliseconds (equivalent to 160 samples at a sampling rate of 8 kHz). One example of a lowband LPC analysis module (e.g., LPC analysis module 210 as shown in Fig. 8) is configured to calculate a set of ten LP filter coefficients to characterize the formant structure of each 20-millisecond frame of narrowband signal S20, and one example of a highband LPC analysis module (e.g., in highband encoder A200 as shown in Fig. 10a) is configured to calculate a set of six (alternatively, eight) LP filter coefficients to characterize the formant structure of each 20-millisecond frame of highband speech signal S30. It is also possible to implement the analysis module to process the input signal as a series of overlapping frames.
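As a rough illustration of the framing arithmetic above, the following sketch splits a signal into nonoverlapping 20-millisecond frames. The function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions made for illustration, not details taken from the description.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sample sequence into nonoverlapping frames.

    Any trailing samples that do not fill a whole frame are dropped
    (an assumption of this sketch).
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 8000 * 20 / 1000 = 160
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


signal = [0.0] * 8000            # one second of silence at 8 kHz
frames = split_into_frames(signal)
print(len(frames), len(frames[0]))  # 50 160
```

One second of input at 8 kHz yields 50 frames of 160 samples each, which matches the 20-millisecond example in the text.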

The analysis module may be configured to analyze the samples of each frame directly, or the samples may be weighted first according to a windowing function (e.g., a Hamming window). The analysis may also be performed over a window that is larger than the frame, such as a 30-millisecond window. This window may be symmetric (e.g., 5-20-5, such that it includes the 5 milliseconds immediately before and after the 20-millisecond frame) or asymmetric (e.g., 10-20, such that it includes the last 10 milliseconds of the preceding frame). An LPC analysis module is typically configured to calculate the LP filter coefficients using a Levinson-Durbin recursion or the Leroux-Gueguen algorithm. In another implementation, the analysis module may be configured to calculate a set of cepstral coefficients for each frame instead of a set of LP filter coefficients.
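The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a textbook form of the recursion operating on autocorrelation values; the function name, the sign convention (A(z) = 1 + a[1]z^-1 + ...), and the sample autocorrelation sequence are assumptions of this sketch rather than details from the description.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err), where a[0] == 1.0 and A(z) = sum(a[k] * z**-k) is the
    prediction error filter, and err is the final prediction error energy.
    Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update coefficients 1..i-1 using the previous stage's values.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of a first-order process whose neighboring samples
# have correlation 0.5 (illustrative values).
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, order=2)
```

For this first-order example the recursion recovers a single nonzero predictor coefficient (a[1] = -0.5), and the second-order coefficient vanishes.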

The output bit rate of a speech encoder may be reduced significantly, with relatively little effect on reproduction quality, by quantizing the filter parameters. Linear prediction filter coefficients are difficult to quantize efficiently and are usually mapped by the speech encoder into another representation, such as line spectral pairs (LSPs) or line spectral frequencies (LSFs), for quantization and/or entropy encoding. Speech encoder E100 as shown in Fig. 1a includes an LP filter coefficient-to-LSF transform 220 configured to transform the set of LP filter coefficients into a corresponding vector of LSFs S3. Other one-to-one representations of LP filter coefficients include parcor coefficients; log-area-ratio values; immittance spectral pairs (ISPs); and immittance spectral frequencies (ISFs), which are used in the GSM (Global System for Mobile Communications) AMR-WB (Adaptive Multirate-Wideband) codec. Typically a transform between a set of LP filter coefficients and a corresponding set of LSFs is reversible, but embodiments also include implementations of the speech encoder in which the transform is not reversible without error.

A speech encoder typically includes a quantizer configured to quantize the set of narrowband LSFs (or other coefficient representation) and to output the result of this quantization as the filter parameters. Quantization is typically performed using a vector quantizer that encodes the input vector as an index to a corresponding vector entry in a table or codebook. Such a quantizer may also be configured to perform classified vector quantization. For example, such a quantizer may be configured to select one of a set of codebooks based on information that has already been coded within the same frame (e.g., in the lowband channel and/or in the highband channel). Such a technique typically provides increased coding efficiency at the expense of additional codebook storage.

Fig. 1b shows a block diagram of a corresponding speech decoder E200 that includes an inverse quantizer 310 configured to dequantize the quantized LSFs S3, and an LSF-to-LP filter coefficient transform 320 configured to transform the dequantized LSF vector into a set of LP filter coefficients. A synthesis filter 330, configured according to the LP filter coefficients, is typically driven by an excitation signal to produce a synthesized reproduction of the input speech signal as decoded speech signal S5. The excitation signal may be based on a random noise signal and/or on a quantized representation of the residual as sent by the encoder. In some multiband coders, such as wideband speech encoder A100 and decoder B100 (as described herein with reference to, e.g., Figs. 10a, 10b and 11a, 11b), the excitation signal for one band is derived from the excitation signal for another band.

Quantization of the LSFs introduces a random error that is usually uncorrelated from one frame to the next. This error may cause the quantized LSFs to be less smooth than the unquantized LSFs and may reduce the perceptual quality of the decoded signal. Independent quantization of LSF vectors generally increases the amount of frame-to-frame spectral fluctuation compared to the unquantized LSF vectors, and these spectral fluctuations may cause the decoded signal to sound unnatural.

One complicated solution was proposed by Knagenhjelm and Kleijn, "Spectral Dynamics is More Important than Spectral Distortion," 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), vol. 1, pp. 732-735, 9-12 May 1995, in which a smoothing of the dequantized LSF parameters is performed in the decoder. This reduces the spectral fluctuations, but at the cost of additional delay. This application describes methods that use temporal noise shaping on the encoder side, such that spectral fluctuations may be reduced without additional delay.

A quantizer is typically configured to map an input value to one of a set of discrete output values. A limited number of output values are available, such that a range of input values is mapped to a single output value. Quantization increases coding efficiency because an index that indicates the corresponding output value may be transmitted in fewer bits than the original input value. Fig. 2 shows an example of a one-dimensional mapping typically performed by a scalar quantizer.
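A minimal sketch of such a one-dimensional mapping follows. The four output levels and the function name `scalar_quantize` are hypothetical; they simply show how a range of inputs collapses to one transmitted index.

```python
def scalar_quantize(x, levels):
    """Return the index of the output level nearest to x."""
    return min(range(len(levels)), key=lambda i: abs(x - levels[i]))


levels = [-1.5, -0.5, 0.5, 1.5]      # hypothetical 2-bit quantizer
idx = scalar_quantize(0.62, levels)  # index sent to the decoder
reconstructed = levels[idx]          # value recovered by the decoder
```

Here any input between 0.0 and 1.0 maps to the same index, so only two bits are needed to identify the output value.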

The quantizer could equally well be a vector quantizer, and LSFs are typically quantized using a vector quantizer. Fig. 3 shows one simple example of a multidimensional mapping as performed by a vector quantizer. In this example, the input space is divided into a number of Voronoi regions (e.g., according to a nearest-neighbor criterion). The quantization maps each input value to a value that represents the corresponding Voronoi region (typically, the centroid), shown here as a point. In this example, the input space is divided into six regions, such that any input value may be represented by an index having only six different states.
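The nearest-neighbor mapping can be sketched as below. The six two-dimensional codebook entries are made-up values chosen only to mirror the six-region example; a real LSF codebook would be trained and of higher dimension.

```python
def vq_encode(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    (squared Euclidean distance, i.e. the Voronoi region it falls in)."""
    def sq_dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def vq_decode(index, codebook):
    """Look up the representative vector for an index."""
    return codebook[index]


# Hypothetical six-entry codebook: one representative point per region.
codebook = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
            (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
index = vq_encode((0.9, 0.8), codebook)
```

The encoder transmits only `index`, which takes one of six states; the decoder recovers the representative point with `vq_decode`.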

If the input signal is very smooth, the quantized output may sometimes be much less smooth than the original, owing to the minimum step between values in the output space of the quantization. Fig. 4a shows one example of a smooth one-dimensional signal that varies only within one quantization level (only one such level is shown here), and Fig. 4b shows an example of this signal after quantization. Even though the input in Fig. 4a varies over only a small range, the resulting output in Fig. 4b contains more abrupt transitions and is much less smooth. Such an effect may lead to audible artifacts, and it may be desirable to reduce this effect for LSFs (or other representations of the spectral envelope to be quantized). For example, LSF quantization performance may be improved by incorporating temporal noise shaping.

In a method according to one embodiment, a vector of spectral envelope parameters is estimated once for every frame (or other block) of speech in the encoder. The parameter vector is quantized for efficient transmission to the decoder. After quantization, the quantization error (defined as the difference between the quantized and unquantized parameter vectors) is stored. The quantization error of frame N-1 is reduced by a scale factor and added to the parameter vector of frame N before the parameter vector of frame N is quantized. It may be desirable for the value of the scale factor to be smaller when the difference between the current and previous estimated spectral envelopes is relatively large.

In a method according to one embodiment, the LSF quantization error vector is computed for each frame and multiplied by a scale factor b having a value less than 1.0. Before quantization, the scaled quantization error of the previous frame is added to the LSF vector (input value V10). A quantization operation of such a method may be described by an expression such as the following:

y(n) = Q(s(n) + b[y(n-1) - s(n-1)]),

where s(n) is the smoothed LSF vector pertaining to frame n, y(n) is the quantized LSF vector pertaining to frame n, Q(·) is a nearest-neighbor quantization operation, and b is the scale factor.

A quantizer 230 according to an embodiment is configured to produce a quantized output value V30 of a smoothed value V20 of an input value V10 (e.g., an LSF vector), where the smoothed value V20 is based on a scale factor V40 and a quantization error of a previous output value V30. Such a quantizer may be used to reduce spectral fluctuations without additional delay. Fig. 5 shows a block diagram of one implementation 230a of quantizer 230, in which values that may be particular to this implementation are indicated by the index a. In this example, a quantization error is computed by using adder A10 to subtract the current input value V10 from the current output value V30a as dequantized by inverse quantizer Q20. The error is stored to a delay element DE10. Smoothed value V20a is the sum of the current input value V10 and the quantization error of the previous frame as scaled (e.g., multiplied in multiplier M10) by scale factor V40. Alternatively, quantizer 230a may be implemented such that scale factor V40 is applied before the quantization error is stored to delay element DE10.

Fig. 4d shows an example of a (dequantized) sequence of output values V30a as produced by quantizer 230a in response to the input signal of Fig. 4a. In this example, the value of scale factor V40 is fixed at 0.5. It may be seen that the signal of Fig. 4d is smoother than the fluctuating signal of Fig. 4a.

It may be desirable to use a recursive function to calculate the feedback amount. For example, the quantization error may be calculated with respect to the current input value rather than with respect to the current smoothed value. Such a method may be described by an expression such as the following:

y(n) = Q[s(n)], s(n) = x(n) + b[y(n-1) - s(n-1)],

where x(n) is the input LSF vector pertaining to frame n.

Fig. 6 shows a block diagram of an implementation 230b of quantizer 230, in which values that may be particular to this implementation are indicated by the index b. In this example, a quantization error is computed by using adder A10 to subtract the current value of smoothed value V20b from the current output value V30b as dequantized by inverse quantizer Q20. The error is stored to delay element DE10. Smoothed value V20b is the sum of the current input value V10 and the quantization error of the previous frame as scaled (e.g., multiplied in multiplier M10) by scale factor V40. Alternatively, quantizer 230b may be implemented such that scale factor V40 is applied before the quantization error is stored to delay element DE10. It is also possible to use a different value of scale factor V40 in implementation 230a than in implementation 230b.

Fig. 4c shows an example of a (dequantized) sequence of output values V30b as produced by quantizer 230b in response to the input signal of Fig. 4a. In this example, the value of scale factor V40 is fixed at 0.5. It may be seen that the signal of Fig. 4c is smoother than the fluctuating signal of Fig. 4a.
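The two feedback arrangements can be sketched in a few lines. This is an illustrative reading of the block-diagram descriptions, not the implementation of the embodiments: it uses scalars instead of LSF vectors, a uniform rounding quantizer in place of a codebook search, and hypothetical names (`uniform_quantizer`, `noise_feedback_quantize`, `error_reference`).

```python
def uniform_quantizer(step):
    """Return Q(.) that rounds to the nearest multiple of `step`.

    A stand-in for the codebook search followed by inverse quantizer Q20.
    """
    return lambda v: step * round(v / step)


def noise_feedback_quantize(inputs, quantize, b=0.5, error_reference="input"):
    """Quantize a sequence, adding the scaled previous error before each step.

    error_reference="input":    error = y - x  (as described for quantizer 230a)
    error_reference="smoothed": error = y - s  (as described for quantizer 230b)
    """
    if error_reference not in ("input", "smoothed"):
        raise ValueError("error_reference must be 'input' or 'smoothed'")
    outputs = []
    stored_error = 0.0                      # contents of delay element DE10
    for x in inputs:
        s = x + b * stored_error            # smoothed value V20
        y = quantize(s)                     # dequantized output value V30
        stored_error = y - (x if error_reference == "input" else s)
        outputs.append(y)
    return outputs


q = uniform_quantizer(1.0)
x = [0.6, 0.4, 0.6, 0.4]                    # hovers around a decision boundary
plain = [q(v) for v in x]
shaped = noise_feedback_quantize(x, q, b=0.5, error_reference="input")
```

On this toy input, plain quantization toggles between 1 and 0 on every frame, while the feedback version holds the output at 1, which is the kind of reduction in frame-to-frame fluctuation the figures illustrate. Setting `b=0.0` reduces either variant to plain quantization.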

It is noted that embodiments as shown herein may be implemented by replacing or augmenting an existing quantizer Q10 according to an arrangement as shown in Fig. 5 or Fig. 6. For example, quantizer Q10 may be implemented as a predictive vector quantizer, a multi-stage quantizer, a split vector quantizer, or according to any other scheme for LSF quantization.

In one example, the value of scale factor V40 is fixed at a desired value between 0 and 1. Alternatively, it may be desirable to adjust the value of the scale factor dynamically. For example, it may be desirable to adjust the value of the scale factor according to the degree of fluctuation already present in the unquantized LSF vectors. When the difference between the current and previous LSF vectors is large, the scale factor is close to zero and almost no noise shaping results. When the current LSF vector differs only slightly from the previous LSF vector, the scale factor is close to 1.0. In this manner, transitions in the spectral envelope over time may be retained, minimizing spectral distortion when the speech signal is changing, while spectral fluctuations may be reduced when the speech signal is relatively constant from one frame to the next.

The value of scale factor V40 may be made proportional to the distance between consecutive LSFs, and any of various distances between vectors may be used to determine the change between LSFs. The Euclidean norm is typically used, but other distances may also be used, including the Manhattan distance (1-norm), the Chebyshev distance (infinity norm), the Mahalanobis distance, and the Hamming distance.
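One way to realize a dynamically adjusted scale factor is sketched below. The text does not give a specific mapping from distance to scale factor, so the linear ramp, the parameters `b_max` and `d_ref`, and the function names are all assumptions. The sketch follows the behavior described above: a factor near its maximum for nearly identical frames and near zero for large changes.

```python
import math


def euclidean_distance(u, v):
    """Euclidean norm of the difference between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def adaptive_scale_factor(lsf, prev_lsf, b_max=0.9, d_ref=0.05):
    """Hypothetical mapping from frame-to-frame LSF change to scale factor.

    Returns b_max when the frames are identical and falls linearly to 0.0
    once the distance reaches d_ref (both parameters are illustrative).
    """
    d = euclidean_distance(lsf, prev_lsf)
    return b_max * max(0.0, 1.0 - d / d_ref)


steady = adaptive_scale_factor([0.10, 0.20], [0.10, 0.20])
transition = adaptive_scale_factor([0.10, 0.20], [0.30, 0.40])
```

With these illustrative parameters, a steady frame receives the full feedback weight and a strong spectral transition receives none, so transitions pass through unsmoothed.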

It may be desirable to use a weighted distance measure to determine the change between consecutive LSF vectors. For example, the distance d may be calculated according to an expression such as the following:

d = \sum_{i=1}^{P} c_i (l_i - \hat{l}_i)^2

where l indicates the current LSF vector, \hat{l} indicates the previous LSF vector, P indicates the number of elements in each LSF vector, the index i indicates the LSF vector element, and c indicates a vector of weighting factors. The values of c may be selected to emphasize lower-frequency components that are more perceptually significant. In one example, c_i has the value 1.0 for i from 1 to 8, 0.8 for i = 9, and 0.4 for i = 10.
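The weighted distance can be computed directly from the expression above. The weights below are the example values given in the text for a ten-element LSF vector; the function name and the test vectors are hypothetical.

```python
# Example weighting from the text: 1.0 for i = 1..8, 0.8 for i = 9, 0.4 for i = 10.
C_WEIGHTS = [1.0] * 8 + [0.8, 0.4]


def weighted_lsf_distance(lsf, prev_lsf, c=C_WEIGHTS):
    """d = sum_i c_i * (l_i - lhat_i)^2 over the elements of the LSF vectors."""
    if not (len(lsf) == len(prev_lsf) == len(c)):
        raise ValueError("lsf, prev_lsf and c must have the same length")
    return sum(ci * (li - pi) ** 2 for ci, li, pi in zip(c, lsf, prev_lsf))


base = [0.0] * 10
low_change = [0.1] + [0.0] * 9     # change in the lowest-frequency element
high_change = [0.0] * 9 + [0.1]    # same change in the highest element
d_low = weighted_lsf_distance(low_change, base)
d_high = weighted_lsf_distance(high_change, base)
```

The same change counts more when it occurs in a low-frequency element (`d_low`) than in the highest one (`d_high`), reflecting the perceptual emphasis described above.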

In another example, the distance d between consecutive LSF vectors may be calculated according to an expression such as the following:

d = \sum_{i=1}^{P} c_i w_i (l_i - \hat{l}_i)^2

where w indicates a vector of variable weighting factors. In one such example, w_i has the value P(f_i)^r, where P denotes the LPC power spectrum evaluated at the corresponding frequency f_i, and r is a constant having a typical value of, e.g., 0.15 or 0.3. In another example, the values of w are selected according to a corresponding weight function used in the ITU-T G.729 standard.

For the lowest and highest elements of w, boundary values close to 0 and 0.5 are selected in place of l_{i-1} and l_{i+1}, respectively. In such cases, c_i may have values as indicated above. In another example, c_i has the value 1.0, except for c_4 and c_5, which have the value 1.2.

It may be appreciated from Figs. 4a to 4d that, on a frame-by-frame basis, a temporal noise shaping method as described herein may increase the quantization error. Although the absolute squared error of the quantization operation may increase, however, a potential advantage is that the quantization error may be moved to a different part of the spectrum. For example, the quantization error may be moved to lower frequencies, thus becoming smoother. As the input signal is also smooth, a smoother output signal may be obtained as the sum of the input signal and the smoothed quantization error.

Fig. 7b shows an example of a basic source-filter arrangement as applied to coding of the spectral envelope of narrowband signal S20. An analysis module 710 calculates a set of parameters that characterize a filter corresponding to the speech sound over a period of time (typically 20 milliseconds). A whitening filter 760 (also called an analysis or prediction error filter) configured according to those filter parameters removes the spectral envelope to spectrally flatten the signal. The resulting whitened signal (also called a residual) has less energy, and thus less variance, than the original speech signal and is easier to encode. Errors resulting from coding of the residual signal may also be spread more evenly over the spectrum. The filter parameters and residual are typically quantized for efficient transmission over the channel. At the decoder, a synthesis filter 780 configured according to the filter parameters is excited by a signal based on the residual to produce a synthesized version of the original speech sound. The synthesis filter is typically configured to have a transfer function that is the inverse of the transfer function of the whitening filter. Fig. 8 shows a block diagram of a basic implementation A122 of narrowband encoder A120 (as shown in Fig. 10a).
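The inverse relationship between the whitening and synthesis filters can be illustrated with a small sketch. It assumes an FIR prediction error filter A(z) with a[0] = 1 and an all-pole synthesis filter 1/A(z); the function names and the one-tap example are hypothetical, and no quantization of the residual is modeled.

```python
def whiten(x, a):
    """FIR prediction error filter: e[n] = sum_k a[k] * x[n-k], with a[0] == 1."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def synthesize(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


a = [1.0, -0.5]                      # illustrative one-tap predictor
x = [1.0, 0.5, 0.25, 0.125]          # decaying signal this predictor fits exactly
residual = whiten(x, a)
rebuilt = synthesize(residual, a)
```

For this input the residual carries energy only in its first sample, and passing it through the synthesis filter reproduces the original signal, showing that the two transfer functions are inverses of each other.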

As seen in Fig. 8, narrowband encoder A122 also generates a residual signal by passing narrowband signal S20 through a whitening filter 260 (also called an analysis or prediction error filter) that is configured according to the set of filter coefficients. In this particular example, whitening filter 260 is implemented as an FIR filter, although IIR implementations may also be used. This residual signal will typically contain perceptually important information of the speech frame, such as long-term structure relating to pitch, that is not represented in narrowband filter parameters S40. Quantizer 270 is configured to calculate a quantized representation of this residual signal for output as encoded narrowband excitation signal S50. Such a quantizer typically includes a vector quantizer that encodes the input vector as an index to a corresponding vector entry in a table or codebook. Alternatively, such a quantizer may be configured to send one or more parameters from which the vector may be generated dynamically at the decoder, rather than retrieved from storage, as in a sparse codebook method. Such a method is used in coding schemes such as algebraic CELP (codebook excited linear prediction) and codecs such as the 3GPP2 (Third Generation Partnership Project 2) EVRC (Enhanced Variable Rate Codec).

It is desirable for narrowband encoder A120 to generate the encoded narrowband excitation signal according to the same filter parameter values that will be available to the corresponding narrowband decoder. In this manner, the resulting encoded narrowband excitation signal may already account to some extent for nonidealities in those parameter values, such as quantization error. Accordingly, it is desirable to configure the whitening filter using the same coefficient values that will be available at the decoder. In the basic example of encoder A122 as shown in Fig. 8, inverse quantizer 240 dequantizes narrowband filter parameters S40, LSF-to-LP filter coefficient transform 250 maps the resulting values back to a corresponding set of LP filter coefficients, and this set of coefficients is used to configure whitening filter 260 to generate the residual signal that is quantized by quantizer 270.

Some implementations of narrowband encoder A120 are configured to calculate encoded narrowband excitation signal S50 by identifying one among a set of codebook vectors that best matches the residual signal. It is noted, however, that narrowband encoder A120 may also be implemented to calculate a quantized representation of the residual signal without actually generating the residual signal. For example, narrowband encoder A120 may be configured to use a number of codebook vectors to generate corresponding synthesized signals (e.g., according to a current set of filter parameters), and to select the codebook vector associated with the generated signal that best matches the original narrowband signal S20 in a perceptually weighted domain.

Fig. 9 shows a block diagram of an implementation B112 of narrowband decoder B110. Inverse quantizer 310 dequantizes narrowband filter parameters S40 (in this case, to a set of LSFs), and LSF-to-LP filter coefficient transform 320 transforms the LSFs into a set of filter coefficients (for example, as described above with reference to inverse quantizer 240 and transform 250 of narrowband encoder A122). Inverse quantizer 340 dequantizes encoded narrowband excitation signal S50 to produce a narrowband excitation signal S80. Based on the filter coefficients and narrowband excitation signal S80, narrowband synthesis filter 330 synthesizes narrowband signal S90. In other words, narrowband synthesis filter 330 is configured to spectrally shape narrowband excitation signal S80 according to the dequantized filter coefficients to produce narrowband signal S90. As shown in Fig. 11a, narrowband decoder B112 (in the form of narrowband decoder B110) also provides narrowband excitation signal S80 to highband decoder B200, which uses it to derive a highband excitation signal. In some implementations, narrowband decoder B110 may be configured to provide additional information that relates to the narrowband signal, such as spectral tilt, pitch gain and lag, and speech mode, to highband decoder B200. The system of narrowband encoder A122 and narrowband decoder B112 is a basic example of an analysis-by-synthesis speech codec.

Voice communications over the public switched telephone network (PSTN) have traditionally been limited in bandwidth to the frequency range of 300-3400 Hz. New networks for voice communications, such as cellular telephony and voice over IP (VoIP), may not have the same bandwidth limits, and it may be desirable to transmit and receive voice communications that include a wideband frequency range over such networks. For example, it may be desirable to support an audio frequency range that extends down to 50 Hz and/or up to 7 or 8 kHz. It may also be desirable to support other applications, such as high-quality audio or audio/video conferencing, that may have audio speech content in ranges outside the traditional PSTN limits.

One approach to wideband speech coding involves scaling a narrowband speech coding technique (e.g., one configured to encode the range of 0-4 kHz) to cover the wideband spectrum. For example, a speech signal may be sampled at a higher rate to include components at high frequencies, and a narrowband coding technique may be reconfigured to use more filter coefficients to represent this wideband signal. Narrowband coding techniques such as CELP (codebook excited linear prediction) are computationally intensive, however, and a wideband CELP coder may consume too many processing cycles to be practical for many mobile and other embedded applications. Encoding the entire spectrum of a wideband signal to a desired quality using such a technique may also lead to an unacceptably large increase in bandwidth. Moreover, transcoding of such an encoded signal would be required before even its narrowband portion could be transmitted into and/or decoded by a system that only supports narrowband coding.

Fig. 10a shows a block diagram of a wideband speech encoder A100 that includes separate narrowband and highband speech encoders A120 and A200, respectively. Either or both of narrowband and highband speech encoders A120 and A200 may be configured to perform quantization of LSFs (or another coefficient representation) using an implementation of quantizer 230 as disclosed herein. Fig. 11a shows a block diagram of a corresponding wideband speech decoder B100. In Fig. 10a, filter bank A110 may be implemented to produce narrowband signal S20 and highband signal S30 from a wideband speech signal S10 according to the principles and implementations disclosed in the U.S. patent application "SYSTEMS, METHODS, AND APPARATUS FOR SPEECH SIGNAL FILTERING" filed herewith (U.S. Pub. No. 2007/0088558), and the disclosure of such filter banks therein is hereby incorporated by reference. As shown in Fig. 11a, filter bank B120 may be similarly implemented to produce a decoded wideband speech signal S110 from a decoded narrowband signal S90 and a decoded highband signal S100. Fig. 11a also shows a narrowband decoder B110 configured to decode narrowband filter parameters S40 and encoded narrowband excitation signal S50 to produce a narrowband signal S90 and a narrowband excitation signal S80, and a highband decoder B200 configured to produce a highband signal S100 based on highband coding parameters S60 and narrowband excitation signal S80.

It may be desirable to implement wideband speech coding such that at least the narrowband portion of the encoded signal can be sent through a narrowband channel (for example, a PSTN channel) without transcoding or other significant modification. Efficiency of the wideband coding extension may also be desirable, for example, to avoid a significant reduction in the number of users that can be served in applications such as wireless cellular telephony and broadcasting over wired and wireless channels.

Another approach to wideband speech coding involves extrapolating the highband spectral envelope from the encoded narrowband spectral envelope. Although such an approach can be implemented without any increase in bandwidth and without a need for transcoding, the coarse spectral envelope or formant structure of the highband portion of a speech signal generally cannot be predicted accurately from the spectral envelope of the narrowband portion.

One particular example of wideband speech encoder A100 is configured to encode wideband speech signal S10 at a rate of about 8.55 kbps (kilobits per second), with about 7.55 kbps being used for narrowband filter parameters S40 and encoded narrowband excitation signal S50, and about 1 kbps being used for highband coding parameters (for example, filter parameters and/or gain parameters) S60.
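As a rough illustration of this bit allocation, the sketch below converts the stated rates into bits per frame. The 20 ms frame duration and the names used are assumptions made only for the example; the passage above does not fix a frame length.

```python
# Hypothetical frame duration in milliseconds; not taken from the description.
FRAME_MS = 20

def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Bits available per frame at a given bit rate (integer arithmetic)."""
    return rate_bps * frame_ms // 1000

total_bits = bits_per_frame(8550)       # whole encoded wideband signal
narrowband_bits = bits_per_frame(7550)  # S40 and S50 together
highband_bits = bits_per_frame(1000)    # S60
```

Under that assumed frame length, the highband parameters take 20 of the 171 bits in each frame, and the narrowband parameters take the remaining 151.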

It may be desirable to combine the encoded lowband and highband signals into a single bitstream. For example, it may be desirable to multiplex the encoded signals together for transmission (for example, over a wired, optical, or wireless transmission channel) or for storage, as an encoded wideband speech signal. Figure 10b shows a block diagram of a wideband speech encoder A102 that includes a multiplexer A130 configured to combine narrowband filter parameters S40, encoded narrowband excitation signal S50, and highband coding parameters S60 into a multiplexed signal S70. Figure 11b shows a block diagram of a corresponding implementation B102 of wideband speech decoder B100. Decoder B102 includes a demultiplexer B130 configured to demultiplex multiplexed signal S70 to obtain narrowband filter parameters S40, encoded narrowband excitation signal S50, and highband coding parameters S60.

It may be desirable for multiplexer A130 to be configured to embed the encoded lowband signal (including narrowband filter parameters S40 and encoded narrowband excitation signal S50) as a separable substream of multiplexed signal S70, such that the encoded lowband signal can be recovered and decoded independently of another portion of multiplexed signal S70, such as a highband and/or very-low-band signal. For example, multiplexed signal S70 may be arranged such that the encoded lowband signal can be recovered by stripping away highband coding parameters S60. One potential advantage of such a feature is that it avoids the need to transcode the encoded wideband signal before passing it to a system that supports decoding of the lowband signal but does not support decoding of the highband portion.
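One way to picture such a separable substream is sketched below. The frame layout (a header of three 16-bit lengths followed by S40, S50, and S60 in that order) and the function names are hypothetical and chosen only for illustration; the description does not specify a bitstream format.

```python
import struct

HEADER = ">HHH"  # hypothetical header: byte lengths of S40, S50, S60
HEADER_SIZE = struct.calcsize(HEADER)

def multiplex(s40: bytes, s50: bytes, s60: bytes) -> bytes:
    """Combine S40, S50, and S60 into one frame of S70, lowband substream first."""
    return struct.pack(HEADER, len(s40), len(s50), len(s60)) + s40 + s50 + s60

def demultiplex(s70: bytes):
    """Recover (S40, S50, S60) from one frame of S70."""
    n40, n50, n60 = struct.unpack(HEADER, s70[:HEADER_SIZE])
    body = s70[HEADER_SIZE:]
    return body[:n40], body[n40:n40 + n50], body[n40 + n50:n40 + n50 + n60]

def strip_high_band(s70: bytes) -> bytes:
    """Drop S60 without decoding or re-encoding the lowband parameters."""
    n40, n50, _ = struct.unpack(HEADER, s70[:HEADER_SIZE])
    lowband = s70[HEADER_SIZE:HEADER_SIZE + n40 + n50]
    return struct.pack(HEADER, n40, n50, 0) + lowband
```

Because the lowband bytes are copied unchanged, a narrowband-only system receiving the stripped frame sees the same S40 and S50 that the encoder produced, which is the sense in which no transcoding is required.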

An apparatus including a noise-shaping quantizer and/or a lowband, highband, and/or wideband speech encoder as described herein may also include circuitry configured to transmit the encoded signal into a transmission channel such as a wired, optical, or wireless channel. Such an apparatus may also be configured to perform one or more channel encoding operations on the signal, such as error correction encoding (for example, rate-compatible convolutional encoding), and/or error detection encoding (for example, cyclic redundancy encoding), and/or one or more layers of network protocol encoding (for example, Ethernet, TCP/IP, cdma2000).

It may be desirable to implement lowband speech encoder A120 as an analysis-by-synthesis speech encoder. Codebook excited linear prediction (CELP) coding is one popular family of analysis-by-synthesis coding, and implementations of such coders may perform waveform encoding of the residual, including such operations as selection of entries from fixed and adaptive codebooks, error minimization operations, and/or perceptual weighting operations. Other implementations of analysis-by-synthesis coding include mixed excitation linear prediction (MELP), algebraic CELP (ACELP), relaxation CELP (RCELP), regular pulse excitation (RPE), multi-pulse CELP (MPE), and vector-sum excited linear prediction (VSELP) coding. Related coding methods include multi-band excitation (MBE) and prototype waveform interpolation (PWI) coding. Examples of standardized analysis-by-synthesis speech codecs include: the ETSI (European Telecommunications Standards Institute)-GSM full rate codec (GSM 06.10), which uses residual excited linear prediction (RELP); the GSM enhanced full rate codec (ETSI-GSM 06.60); the ITU (International Telecommunication Union) standard 11.8 kb/s G.729 Annex E coder; the IS (Interim Standard)-641 codecs for IS-136 (a time-division multiple access scheme); the GSM adaptive multirate (GSM-AMR) codecs; and the 4GV(TM) (Fourth-Generation Vocoder(TM)) codec (QUALCOMM Incorporated, San Diego, California). Existing implementations of RCELP coders include the Enhanced Variable Rate Codec (EVRC), as described in Telecommunications Industry Association (TIA) IS-127, and the Third Generation Partnership Project 2 (3GPP2) Selectable Mode Vocoder (SMV). The various lowband, highband, and wideband encoders described herein may be implemented according to any of these technologies, or according to any other speech coding technology (whether known or yet to be developed) that represents a speech signal as (A) a set of parameters that describe a filter and (B) a quantized representation of a residual signal that provides at least part of an excitation used to drive the described filter to reproduce the speech signal.

As mentioned above, embodiments as described herein include implementations that may be used to perform embedded coding, supporting compatibility with narrowband systems and avoiding a need for transcoding. Support for highband coding may also serve to differentiate, on a cost basis, between chips, chipsets, devices, and/or networks having wideband support with backward compatibility and those having narrowband support only. Support for highband coding as described herein may also be used in conjunction with a technique for supporting lowband coding, and a system, method, or apparatus according to such an embodiment may support coding of frequency components from, for example, about 50 or 100 Hz up to about 7 or 8 kHz.

As mentioned above, adding highband support to a speech coder may improve intelligibility, especially with regard to differentiation of fricatives. Although a human listener can usually derive such differentiation from the particular context, highband support may serve as an enabling feature in speech recognition and other machine interpretation applications, such as systems for automated voice menu navigation and/or automatic call processing.

An apparatus according to an embodiment may be embedded into a portable device for wireless communications, such as a cellular telephone or personal digital assistant (PDA). Alternatively, such an apparatus may be included in another communications device, such as a VoIP handset, a personal computer configured to support VoIP communications, or a network device configured to route telephonic or VoIP communications. For example, an apparatus according to an embodiment may be implemented in a chip or chipset for a communications device. Depending upon the particular application, such a device may also include such features as analog-to-digital and/or digital-to-analog conversion of a speech signal, circuitry for performing amplification and/or other signal processing operations on a speech signal, and/or radio-frequency circuitry for transmission and/or reception of the coded speech signal.

It is expressly contemplated and disclosed that embodiments may include, and/or be used with, any one or more of the other features disclosed in U.S. Provisional Patent Application No. 60/667,901 (now U.S. Pub. No. 2007/0088542). Such features include modifying highband signal S30 and/or highband excitation signal S120 according to a regularization or other variation of narrowband excitation signal S80 and narrowband residual signal S50. Such features include adaptive smoothing of LSFs, which may be performed prior to a quantization as described herein. Such features also include fixed or adaptive smoothing of a gain envelope, and adaptive attenuation of a gain envelope.

The foregoing presentation of the described embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments are possible, and the generic principles presented herein may be applied to other embodiments as well. For example, an embodiment may be implemented in part or in whole as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a microprocessor or other digital signal processing unit. The data storage medium may be an array of storage elements such as semiconductor memory (which may include, without limitation, dynamic or static RAM (random-access memory), ROM (read-only memory), and/or flash RAM), or ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; or a disk medium such as a magnetic or optical disk. The term "software" should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples.

The various elements of implementations of a noise-shaping quantizer, highband speech encoder A200, wideband speech encoders A100 and A102, and arrangements including one or more such apparatus may be implemented as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset, although other arrangements without such limitation are also contemplated. One or more elements of such an apparatus may be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements (for example, transistors, gates), such as microprocessors, embedded processors, IP cores, digital signal processors, field-programmable gate arrays (FPGAs), application-specific standard products (ASSPs), and application-specific integrated circuits (ASICs). It is also possible for one or more such elements to have structure in common (for example, a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times). Moreover, one or more such elements may be used to perform tasks, or to execute other sets of instructions, that are not directly related to an operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded.

Embodiment also comprises speech processes and the additional method of voice coding and the method that highband burst suppresses by the description of the structure embodiment of method as described in being configured to carry out is clearly disclosed as this paper (for example).In these methods each all also (for example can be implemented expressly, in one or more cited as mentioned data storage mediums) be one or more instruction set, it can be read and/or execution by the machine that comprises logic element (for example, processor, microprocessor, microcontroller or other finite state machine) array.Therefore, do not wish the embodiment of the present invention shown in being limited to above, and wish that the present invention meets and the principle and the novel feature the widest corresponding to scope that disclose by any way herein.

Claims

1. A method for signal processing, said method comprising:

encoding a first frame and a second frame of a speech signal to produce corresponding first and second vectors, wherein said first vector represents a spectral envelope of said speech signal during said first frame, and said second vector represents a spectral envelope of said speech signal during said second frame;

generating a first quantized vector, said generating including quantizing a third vector that is based on at least a portion of said first vector;

calculating a quantization error of said first quantized vector;

calculating a fourth vector, said calculating including adding a scaled version of said quantization error to at least a portion of said second vector; and

quantizing said fourth vector.

2. The method according to claim 1, wherein said calculating a quantization error includes calculating a difference between said first quantized vector and said third vector.

3. The method according to claim 1, wherein said calculating a quantization error includes calculating a difference between said first quantized vector and at least a portion of said first vector.

4. The method according to claim 1, said method comprising calculating a scaled quantization error, said calculating including multiplying said quantization error by a scale factor,

wherein said scale factor is based on a distance between at least a portion of said first vector and a corresponding portion of said second vector.

5. The method according to claim 4, wherein each of said first vector and said second vector includes a plurality of line spectral frequencies.

6. The method according to claim 1, wherein each of said first vector and said second vector includes a representation of a plurality of linear prediction filter coefficients.

7. The method according to claim 1, wherein each of said first vector and said second vector includes a plurality of line spectral frequencies.

8. The method according to claim 1, wherein said second frame immediately follows said first frame in said speech signal.

9. The method according to claim 1, wherein each of said first and second vectors represents an adaptively smoothed coefficient representation of spectral envelope parameters.

10. The method according to claim 1, wherein said method comprises:

dequantizing said fourth vector; and

calculating an excitation signal based on said dequantized fourth vector.

11. The method according to claim 1, wherein said method comprises: filtering a wideband speech signal to obtain a narrowband speech signal and a highband speech signal; and

wherein said first vector represents a spectral envelope of said narrowband speech signal during said first frame;

wherein said second vector represents a spectral envelope of said narrowband speech signal during said second frame.

12. The method according to claim 1, wherein said method comprises: filtering a wideband speech signal to obtain a narrowband speech signal and a highband speech signal; and

wherein said first vector represents a spectral envelope of said highband speech signal during said first frame;

wherein said second vector represents a spectral envelope of said highband speech signal during said second frame.

13. The method according to claim 1, wherein said method comprises:

filtering a wideband speech signal to obtain a narrowband speech signal and a highband speech signal, wherein said first vector represents a spectral envelope of said narrowband speech signal during said first frame, and said second vector represents a spectral envelope of said narrowband speech signal during said second frame;

dequantizing said fourth vector;

calculating an excitation signal of said narrowband speech signal based on said dequantized fourth vector; and

deriving an excitation signal of said highband speech signal based on said excitation signal of said narrowband speech signal.

14. The method according to claim 1, wherein said quantizing said fourth vector includes performing a split vector quantization of said fourth vector.

15. An apparatus for signal processing, said apparatus comprising:

a speech encoder configured to encode a first frame of a speech signal into at least a first vector and to encode a second frame of the speech signal into at least a second vector, wherein said first vector represents a spectral envelope of said speech signal during said first frame, and said second vector represents a spectral envelope of said speech signal during said second frame;

a quantizer configured to quantize a third vector that is based on at least a portion of said first vector, to produce a first quantized vector;

a first adder configured to calculate a quantization error of said first quantized vector; and

a second adder configured to add a scaled version of said quantization error to at least a portion of said second vector, to calculate a fourth vector;

wherein said quantizer is configured to quantize said fourth vector.

16. The apparatus according to claim 15, wherein said first adder is configured to calculate said quantization error based on a difference between said first quantized vector and said third vector.

17. The apparatus according to claim 15, wherein said first adder is configured to calculate said quantization error based on a difference between said first quantized vector and at least a portion of said first vector.

18. The apparatus according to claim 15, said apparatus comprising a multiplier configured to calculate a scaled quantization error based on a product of said quantization error and a scale factor,

wherein said apparatus comprises a calculation unit configured to calculate said scale factor based on a distance between at least a portion of said first vector and a corresponding portion of said second vector.

19. The apparatus according to claim 18, wherein each of said first vector and said second vector includes a plurality of line spectral frequencies.

20. The apparatus according to claim 15, wherein each of said first vector and said second vector includes a representation of a plurality of linear prediction filter coefficients.

21. The apparatus according to claim 15, wherein each of said first vector and said second vector includes a plurality of line spectral frequencies.

22. The apparatus according to claim 15, said apparatus comprising a device for wireless communications.

23. The apparatus according to claim 15, said apparatus comprising a device configured to transmit a plurality of packets compliant with a version of the Internet Protocol, wherein said plurality of packets describes said first quantized vector.

24. The apparatus according to claim 15, wherein said second frame immediately follows said first frame in said speech signal.

25. The apparatus according to claim 15, wherein each of said first and second vectors represents an adaptively smoothed coefficient representation of spectral envelope parameters.

26. The apparatus according to claim 15, wherein said apparatus comprises:

an inverse quantizer configured to dequantize said fourth vector; and

a whitening filter configured to calculate an excitation signal based on said dequantized fourth vector.

27. The apparatus according to claim 15, wherein said apparatus comprises: a filter bank configured to filter a wideband speech signal to obtain a narrowband speech signal and a highband speech signal; and

28. The apparatus according to claim 15, wherein said apparatus comprises: a filter bank configured to filter a wideband speech signal to obtain a narrowband speech signal and a highband speech signal; and

29. The apparatus according to claim 15, wherein said apparatus comprises:

a filter bank configured to filter a wideband speech signal to obtain a narrowband speech signal and a highband speech signal, wherein said first vector represents a spectral envelope of said narrowband speech signal during said first frame, and said second vector represents a spectral envelope of said narrowband speech signal during said second frame;

a whitening filter configured to calculate an excitation signal of said narrowband speech signal based on said dequantized fourth vector; and

a highband encoder configured to derive an excitation signal of said highband speech signal based on said excitation signal of said narrowband speech signal.

30. The apparatus according to claim 15, wherein said quantizer is configured to quantize said fourth vector by performing a split vector quantization of said fourth vector.
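For illustration only, the quantization recited in claims 1, 2, and 4 may be sketched as follows. The uniform scalar quantizer, the step size, the sign convention of the quantization error (quantized vector minus quantizer input), the mapping from distance to scale factor, and all function names are assumptions made for this sketch; they are not taken from the claims, which recite only a difference and a scale factor based on a distance.

```python
import numpy as np

def quantize(vector, step=0.05):
    """Stand-in uniform quantizer (hypothetical); a real coder might use a VQ codebook."""
    return np.round(np.asarray(vector, dtype=float) / step) * step

def scale_from_distance(prev_vec, cur_vec, max_scale=0.5, ref=0.1):
    """Hypothetical mapping: closer consecutive vectors give a larger scale factor."""
    diff = np.asarray(cur_vec, dtype=float) - np.asarray(prev_vec, dtype=float)
    return max_scale / (1.0 + float(np.linalg.norm(diff)) / ref)

def noise_shaping_quantize(frames, scale=0.5, step=0.05):
    """Quantize each frame's vector after adding the scaled error of the previous frame."""
    error = np.zeros_like(np.asarray(frames[0], dtype=float))
    outputs = []
    for x in frames:
        # Third vector (first frame) or fourth vector (later frames) of claim 1.
        smoothed = np.asarray(x, dtype=float) + scale * error
        y = quantize(smoothed, step)
        # Assumed sign convention: quantized output minus quantizer input (claim 2).
        error = y - smoothed
        outputs.append(y)
    return outputs
```

With `scale=0.0` the loop reduces to quantizing each vector independently; a nonzero scale factor carries part of the previous frame's quantization error into the next quantizer input. A caller could obtain a per-frame scale factor from `scale_from_distance` applied to consecutive input vectors, in the manner of claim 4.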