CA2037899C - Digital speech coder having improved long-term predictor - Google Patents

Digital speech coder having improved long-term predictor

Info

Publication number
CA2037899C
CA2037899C CA002037899A CA2037899A
Authority
CA
Canada
Prior art keywords
speech
vector
samples
circuitry
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA002037899A
Other languages
French (fr)
Other versions
CA2037899A1 (en)
Inventor
Ira A. Gerson
Mark A. Jasiuk
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed. "Global patent litigation dataset" by Darts-ip (https://patents.darts-ip.com/?family=23590969&utm_source=***_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=CA2037899(C)) is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by Motorola Inc filed Critical Motorola Inc
Publication of CA2037899A1 publication Critical patent/CA2037899A1/en
Application granted granted Critical
Publication of CA2037899C publication Critical patent/CA2037899C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001: Codebooks
    • G10L2019/0011: Long term prediction filters, i.e. pitch estimation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001: Codebooks
    • G10L2019/0012: Smoothing of parameters of the decoder interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Complex Calculations (AREA)
  • Analogue/Digital Conversion (AREA)
  • Near-Field Transmission Systems (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

A digital speech coder includes a long-term filter (124) having an improved sub-sample resolution long-term predictor (Figure 5) which allows sub-sample resolution for the lag parameter L. A frame of N samples of input speech vector s(n) is applied to an adder (510). The output of the adder (510) produces the output vector b(n) for the long-term filter (124). The output vector b(n) is fed back to a delayed vector generator block (530) of the long-term predictor. The nominal long-term predictor lag parameter L is also input to the delayed vector generator block (530). The long-term predictor lag parameter L can take on non-integer values, which may be multiples of one half, one third, one fourth or any other rational fraction. The delayed vector generator (530) includes a memory which holds past samples of b(n). In addition, interpolated samples of b(n) are also calculated by the delayed vector generator (530) and stored in its memory, at least one interpolated sample being calculated and stored between each past sample of b(n). The delayed vector generator (530) provides output vector q(n) to the long-term multiplier block (520), which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to the adder (510) to complete the feedback loop of the recursive filter (124).

Description

DIGITAL SPEECH CODER HAVING IMPROVED
SUB-SAMPLE RESOLUTION LONG-TERM PREDICTOR

Background of the Invention

Code-excited linear prediction (CELP) is a speech coding technique which has the potential of producing high quality synthesized speech at low bit rates, i.e., 4.8 to 9.6 kilobits-per-second (kbps). This class of speech coding, also known as vector-excited linear prediction or stochastic coding, will most likely be used in numerous speech communications and speech synthesis applications. CELP may prove to be particularly applicable to digital speech encryption and digital radiotelephone communication systems wherein speech quality, data rate, size, and cost are significant issues.
The term "code-excited" or "vector-excited" is derived from the fact that the excitation sequence for the speech coder is vector quantized, i.e., a single codeword is used to represent a sequence, a vector, of excitation samples. In this way, data rates of less than one bit per sample are possible for coding the excitation sequence. The stored excitation code vectors generally consist of independent random white Gaussian sequences. One code vector from the codebook is chosen to represent each block of N excitation samples. Each stored code

vector is represented by a codeword, i.e., the address of the code vector memory location. It is this codeword that is subsequently sent over a communications channel to the speech synthesizer to reconstruct the speech frame at the receiver. See M.R. Schroeder and B.S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, pp. 937-40, March 1985, for a more detailed explanation of CELP.
In a CELP speech coder, the excitation code vector from the codebook is applied to two time-varying linear filters which model the characteristics of the input speech signal. The first filter includes a long-term predictor in its feedback loop, which has a long delay, i.e., 2 to 15 milliseconds, used to introduce the pitch periodicity of voiced speech. The second filter includes a short-term predictor in its feedback loop, which has a short delay, i.e., less than 2 msec, used to introduce a spectral envelope or formant structure. For each frame of speech, the speech coder applies each individual code vector to the filters to generate a reconstructed speech signal, then compares the original input speech signal to the reconstructed signal to create an error signal. The error signal is then weighted by passing it through a weighting filter having a response based on human auditory perception. The optimum excitation signal is determined by selecting the code vector which produces the weighted error signal having the minimum energy for the current frame. The codeword for the optimum code vector is then transmitted over a communications channel.
In a CELP speech synthesizer, the codeword received from the channel is used to address the codebook of excitation vectors. The single code vector is then multiplied by a gain factor, and filtered by the long-term and short-term filters to obtain a reconstructed speech vector. The gain factor and the predictor parameters are also obtained from the channel. It has been found that a better quality synthesized signal can be obtained if the actual parameters used by the synthesizer are used in the analysis stage, thus minimizing the quantization errors. Hence, the use of these synthesis parameters in the CELP speech analysis stage to produce higher quality speech is referred to as analysis-by-synthesis speech coding.
The short-term predictor attempts to predict the current output sample s(n) by a linear combination of the immediately preceding output samples s(n-i), according to the equation:

s(n) = a1 s(n-1) + a2 s(n-2) + ... + ap s(n-p) + e(n)

where p is the order of the short-term predictor, and e(n) is the prediction residual, i.e., that part of s(n) that cannot be represented by the weighted sum of p previous samples. The predictor order p typically ranges from 8 to 12, assuming an 8 kiloHertz (kHz) sampling rate. The weights a1, a2, ..., ap in this equation are called the predictor coefficients. The short-term predictor coefficients are determined from the speech signal using conventional linear predictive coding (LPC) techniques.
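The prediction equation above can be sketched in a few lines (an illustrative fragment, not from the patent; the coefficient values passed in are arbitrary placeholders rather than LPC-derived values):

```python
def short_term_predict(s, a):
    """Return (predicted samples, residual e(n) = s(n) - prediction).

    s: list of speech samples; a: predictor coefficients a1..ap.
    History before n=0 is taken as zero for simplicity.
    """
    p = len(a)
    pred, resid = [], []
    for n in range(len(s)):
        # weighted sum of up to p previous samples
        est = sum(a[i] * s[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        pred.append(est)
        resid.append(s[n] - est)
    return pred, resid
```

For a signal generated exactly by a first-order predictor (a1 = 0.5), the residual is zero after the first sample, illustrating how well-chosen coefficients remove short-term redundancy.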
The output response of the short-term filter may be expressed in z-transform notation as:

A(z) = 1 / (1 - Σ a_i z^-i), the sum running from i = 1 to p

Refer to the article entitled "Predictive Coding of Speech at Low Bit Rates", IEEE Trans. Commun., Vol. COM-30, pp. 600-14, April 1982, by B.S. Atal, for further discussion of the short-term filter parameters.

The long-term filter, on the other hand, must predict the next output sample from preceding samples that extend over a much longer time period. If only a single past sample is used in the predictor, then the predictor is a single-tap predictor. Typically, one to three taps are used. The output response for a long-term filter incorporating a single-tap, long-term predictor is given in z-transform notation as:

B(z) = 1 / (1 - β z^-L)

Note that this output response is a function of only the delay or lag L of the filter and the filter coefficient β. For voiced speech, the lag L would typically be the pitch period of the speech, or a multiple of it. At a sampling rate of 8 kHz, a suitable range for the lag L would be between 16 and 143, which corresponds to a pitch range between 500 Hz and 56 Hz.
The long-term predictor lag L and long-term predictor coefficient β can be determined from either an open-loop or a closed-loop configuration. Using the open-loop configuration, the lag L and coefficient β are computed from the input signal (or its residual) directly. In the closed-loop configuration, the lag L and the coefficient β are computed at the frame rate from coded data representing the past output of the long-term filter and the input speech signal. In using the coded data, the long-term predictor lag determination is based on the actual long-term filter state that will exist at the synthesizer. Hence, the closed-loop configuration gives better performance than the open-loop method, since the pitch filter itself contributes to the minimization of the error signal. Moreover, a single-tap predictor works very well in the closed-loop configuration.
Using the closed-loop configuration, the long-term filter output response b(n) is determined from only past output samples from the long-term filter, and from the current input speech samples s(n), according to the equation:

b(n) = s(n) + β b(n-L)

This technique is straightforward for pitch lags L which are at least as large as the frame length N, i.e., when L ≥ N, since the term b(n-L) will always represent a past sample for all sample numbers n, 0 ≤ n ≤ N-1. Furthermore, in the case of L ≥ N, the excitation gain factor γ and the long-term predictor coefficient β can be simultaneously optimized for given values of lag L and codeword i. It has been found that this joint optimization technique yields a noticeable improvement in speech quality.
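For the straightforward case of L ≥ N, the recursion b(n) = s(n) + β b(n-L) can be sketched as follows (an illustrative fragment, not the patent's implementation; `history` stands in for the filter state carried over from previous frames):

```python
def pitch_filter_frame(s, beta, L, history):
    """One frame of the single-tap long-term filter when L >= N.

    history: past output samples, history[-1] the most recent;
    len(history) must be at least L.
    """
    N = len(s)
    assert L >= N, "simple form valid only when the lag spans the whole frame"
    # b(n) = s(n) + beta * b(n - L); since L >= N, b(n - L) is always
    # a pre-frame sample, reachable by a negative Python index.
    return [s[n] + beta * history[n - L] for n in range(N)]
```

With zero input, the output is simply a scaled copy of the state one pitch period back, which is exactly the pitch periodicity the filter is meant to introduce.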
If, however, long-term predictor lags L of less than the frame length N must be accommodated, the closed-loop approach fails. This problem can readily occur in the case of high-pitched female speech. For example, a female voice corresponding to a pitch frequency of 250 Hz may require a long-term predictor lag L equal to 4 milliseconds (msec). A pitch of 250 Hz at an 8 kHz sampling rate corresponds to a long-term predictor lag L of 32 samples. It is not desirable, however, to employ a frame length N of less than 4 msec, since the CELP excitation vector can be coded more efficiently when longer frame lengths are used. Accordingly, utilizing a frame length time of 7.5 msec at a sampling rate of 8 kHz, the frame length N would be equal to 60 samples. This means only 32 past samples would be available to predict the next 60 samples of the frame. Hence, if the long-term predictor lag L is less than the frame length N, only L past samples of the required N samples are defined.
Several alternative approaches have been taken in the prior art to address the problem of pitch lags L being less than frame length N. In attempting to jointly optimize the long-term predictor lag L and coefficient β, the first approach would be to attempt to solve the equations directly, assuming no excitation signal is present. This approach is explained in the article entitled "Regular-Pulse Excitation - A Novel Approach to Effective and Efficient Multipulse Coding of Speech" by Kroon, et al., IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1054-1063. However, in following this approach, a nonlinear equation in the single parameter β must be solved. The solution of the quadratic or cubic in β is computationally expensive. Moreover, jointly optimizing the coefficient β with the gain factor γ is still not possible with this approach.
A second solution, that of limiting the long-term predictor delay L to be greater than the frame length N, is proposed by Singhal and Atal in the article "Improving Performance of Multi-Pulse LPC Coders at Low Bit Rates", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, March 19-21, 1984, pp. 1.3.1-1.3.4. This artificial constraint on the pitch lag L often does not accurately represent the pitch information. Accordingly, using this approach, the voice quality is degraded for high-pitched speech.
A third solution is to reduce the size of the frame length N. With a shorter frame length, the long-term predictor lag L can always be determined from past samples. This approach, however, suffers from a severe bit rate penalty. With a shorter frame length, a greater number of long-term predictor parameters and excitation vectors must be coded, and accordingly, the bit rate of the channel must be greater to accommodate the extra coding.
A second problem exists for high-pitched speakers. The sampling rate used in the coder places an upper limit on the performance of a single-tap pitch predictor. For example, if the pitch frequency is actually 485 Hz, the closest lag value would be 16, which corresponds to 500 Hz. This results in an error of 15 Hz for the fundamental pitch frequency, which degrades voice quality. This error is multiplied for the harmonics of the pitch frequency, causing further degradation.
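The size of this error is easy to reproduce (a hypothetical calculation; the round-to-nearest-lag convention is an assumption made here for illustration):

```python
FS = 8000                               # sampling rate in Hz

def nearest_integer_lag(pitch_hz):
    # The true (fractional) lag is FS/pitch; an integer-lag predictor
    # must round it to the nearest whole sample.
    return round(FS / pitch_hz)

lag = nearest_integer_lag(485.0)        # -> 16 (true lag is about 16.49)
error_hz = FS / lag - 485.0             # -> 15.0 Hz at the fundamental

# With half-sample lag resolution the error shrinks dramatically:
half_lag = round(2 * FS / 485.0) / 2    # -> 16.5
half_error_hz = FS / half_lag - 485.0   # -> about -0.15 Hz
```

This also illustrates why the error compounds at harmonics: the k-th harmonic of the represented pitch is off by k times the fundamental error.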
A need, therefore, exists to provide an improved method for determining the long-term predictor lag L. The optimum solution must address both the problems of computational complexity and voice quality for the coding of high-pitched speech.
Summary of the Invention

Accordingly, a general object of the present invention is to provide an improved digital speech coding technique that produces high quality speech at low bit rates.
A more specific object of the present invention is to provide a method to determine long-term predictor parameters using the closed-loop approach.
Another object of the present invention is to provide an improved method for determining the output response of a long-term predictor in the case when the long-term predictor lag parameter L is a non-integer number.
A further object of the present invention is to provide an improved CELP speech coder which permits joint optimization of the gain factor γ and the long-term predictor coefficient β during the codebook search for the optimum excitation code vector.
According to a novel aspect of the invention, the resolution of the lag parameter L is increased by allowing L to take on values which are not integers. This is achieved by the use of interpolating filters to provide interpolated samples of the long-term predictor state. In a closed-loop implementation, future samples of the long-term predictor state are not available to the interpolating filters. This problem is circumvented by pitch-synchronously extending the long-term predictor state into the future for use by the interpolation filter.

When the actual excitation samples for the next frame become available, the long-term predictor state is updated to reflect the actual excitation samples (replacing those based on the pitch-synchronously extended samples). For example, the interpolation can be used to interpolate one sample between each existing sample, thus doubling the resolution of L to half a sample. A higher interpolation factor could also be chosen, such as three or four, which would increase the resolution of L to a third or a fourth of a sample.
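The addressing of the predictor state at a fractional lag can be sketched as follows (plain linear interpolation stands in here for the patent's interpolating filter, purely to show the half-sample indexing; the function and variable names are illustrative, not from the patent):

```python
def read_delayed(history, lag):
    """Read the predictor state at a possibly non-integer lag.

    history: past output samples, history[-1] the most recent;
    lag: e.g. 16 or 16.5 (a multiple of one half here).
    """
    i = int(lag)            # integer part of the lag
    frac = lag - i          # fractional part: 0.0 or 0.5 here
    if frac == 0.0:
        return history[-i]  # ordinary integer-lag read
    # interpolate between the two bracketing past samples
    return (1 - frac) * history[-i] + frac * history[-(i + 1)]
```

A real coder would replace the two-point interpolation with a longer FIR interpolating filter, but the indexing (one stored or interpolated value between each pair of past samples) is the same idea.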

Brief Description of the Drawings
The features of the present invention which are believed to be novel are set forth with particularity in the appended claims. The invention, together with further objects and advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in the several figures of which like-referenced numerals identify like elements, and in which:
Figure 1 is a general block diagram of a code-excited linear predictive speech coder, illustrating the location of a long-term filter for use with the present invention;
Figure 2A is a detailed block diagram of an embodiment of the long-term filter of Figure 1, illustrating the long-term predictor response where filter lag L is an integer;
Figure 2B is a simplified diagram of a shift register which can be used to illustrate the operation of the long-term predictor in Figure 2A;
Figure 2C is a detailed block diagram of another embodiment of the long-term filter of Figure 1, illustrating the long-term predictor response where filter lag L is an integer;
Figure 3 is a detailed flowchart diagram illustrating the operations performed by the long-term filter of Figure 2A;

Figure 4 is a general block diagram of a speech synthesizer for use in accordance with the present invention;
Figure 5 is a detailed block diagram of the long-term filter of Figure 1, illustrating the sub-sample resolution long-term predictor response in accordance with the present invention;
Figures 6A and 6B are detailed flowchart diagrams illustrating the operations performed by the long-term filter of Figure 5; and
Figure 7 is a detailed block diagram of a pitch post filter interconnecting the short-term filter and D/A converter of the speech synthesizer in Figure 4.
Detailed Description of the Preferred Embodiment

Referring now to Figure 1, there is shown a general block diagram of code-excited linear predictive speech coder 100 utilizing the long-term filter in accordance with the present invention. An acoustic input signal to be analyzed is applied to speech coder 100 at microphone 102. The input signal, typically a speech signal, is then applied to filter 104. Filter 104 generally will exhibit bandpass filter characteristics. However, if the speech bandwidth is already adequate, filter 104 may comprise a direct wire connection.
The analog speech signal from filter 104 is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is then represented by a digital code in analog-to-digital (A/D) converter 108, as known in the art. The sampling rate is determined by sample clock SC, which represents an 8.0 kHz rate in the preferred embodiment. The sample clock SC is generated along with the frame clock FC via clock 112.
The digital output of A/D 108, which may be represented as input speech vector s(n), is then applied to coefficient analyzer 110. This input speech vector s(n) is repetitively obtained in separate frames, i.e., blocks of time, the length of which is determined by the frame clock FC. In the preferred embodiment, input speech vector s(n), 0 ≤ n ≤ N-1, represents a 7.5 msec frame containing N=60 samples, wherein each sample is represented by 12 to 16 bits of a digital code. In this embodiment, for each block of speech, a set of linear predictive coding (LPC) parameters is produced by coefficient analyzer 110 in an open-loop configuration. The short-term predictor parameters ai, long-term predictor coefficient β, nominal long-term predictor lag parameter L, weighting filter parameters WFP, and excitation gain factor γ (along with the best excitation codeword I as described later) are applied to multiplexer 150 and sent over the channel for use by the speech synthesizer. Refer to the article entitled "Predictive Coding of Speech at Low Bit Rates," IEEE Trans. Commun., Vol. COM-30, pp. 600-14, April 1982, by B.S. Atal, for representative methods of generating these parameters for this embodiment. The input speech vector s(n) is also applied to subtractor 130, the function of which will subsequently be described.
Codebook ROM 120 contains a set of M excitation vectors ui(n), wherein 1 ≤ i ≤ M, each comprised of N samples, wherein 0 ≤ n ≤ N-1. Codebook ROM 120 is preferably implemented as described in US Patent No. 4,817,157. Codebook ROM 120 generates these excitation vectors in response to a particular one of a set of M excitation codewords i. Each of the M excitation vectors is comprised of a series of random white Gaussian samples, although other types of excitation vectors may be used with the present invention. If the excitation signal were coded at a rate of 0.2 bits per sample for each of the 60 samples, then there would be 4096 codewords i corresponding to the possible excitation vectors.

For each individual excitation vector ui(n), a reconstructed speech vector s'i(n) is generated for comparison to the input speech vector s(n). Gain block 122 scales the excitation vector ui(n) by the excitation gain factor γ, which is constant for the frame. The excitation gain factor γ may be pre-computed by coefficient analyzer 110 and used to analyze all excitation vectors as shown in Figure 1, or may be optimized jointly with the search for the best excitation codeword I and determined by codebook search controller 140.
The scaled excitation signal γ ui(n) is then filtered by long-term filter 124 and short-term filter 126 to generate the reconstructed speech vector s'i(n). Filter 124 utilizes the long-term predictor parameters β and L to introduce voice periodicity, and filter 126 utilizes the short-term predictor parameters ai to introduce the spectral envelope, as described above. Long-term filter 124 will be described in detail in the following figures. Note that blocks 124 and 126 are actually recursive filters which contain the long-term predictor and short-term predictor in their respective feedback paths.
The reconstructed speech vector s'i(n) for the i-th excitation code vector is compared to the same block of the input speech vector s(n) by subtracting these two signals in subtractor 130. The difference vector ei(n) represents the difference between the original and the reconstructed blocks of speech. The difference vector is perceptually weighted by weighting filter 132, utilizing the weighting filter parameters WFP generated by coefficient analyzer 110. Refer to the preceding reference for a representative weighting filter transfer function. Perceptual weighting accentuates those frequencies where the error is perceptually more important to the human ear, and attenuates other frequencies.
Energy calculator 134 computes the energy of the weighted difference vector e'i(n), and applies this error signal Ei to codebook search controller 140. The search controller

Figure 1 illustrates one PmhorlimPnt. of the invention for a code-excited linear predictive speech coder. In this ...ho.l;,.,~,.t, the long-term filter parameters L and 13 are ~Pie. .,,;l,Pd in an open-loop cullfii,ula~ion by roP~ Pnt analyzer 110. AlLdlllaLi~ly, the long-term filter parameters 15 can be determined in a closed-loop configuration as described in the tlru~ nPd Singhal and Atal reference. Generally, p~,.r~ uàl~ce of the speech coder is improved using long-term filter p~l6lll~cLel~ dptprminpd in the closed-loop configuration.
The nûvel structure of the long-term predictor according to the 20 present invention greatly f irilit~tPR the use of the closed-loop ~1.-1,.. ",;..~1 ~n of these parameters for lags L less than the frame length N.
Figure 2A illustrates an embodiment of long-term filter 124 of Figure 1, where L is constrained to be an integer. Although Figure 1 shows the scaled excitation vector γ ui(n) from gain block 122 as being input to long-term filter 124, a representative input speech vector s(n) has been used in Figure 2A for purposes of explanation. Hence, a frame of N samples of input speech vector s(n) is applied to adder 210. The output of adder 210 produces the output vector b(n) for the long-term filter 124. The output vector b(n) is fed back to delay block 230 of the long-term predictor. The nominal long-term predictor lag parameter L is also input to delay block 230. The long-term predictor delay block provides output vector q(n) to
long-term predictor multiplier block 220, which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to adder 210 to complete the feedback loop of the recursive filter.
The output response Hn(z) of long-term filter 124 is defined in z-transform notation as:

Hn(z) = 1 / (1 - β z^-⌊(n+L)/L⌋L)

wherein n represents a sample number of a frame containing N samples, 0 ≤ n ≤ N-1, wherein β represents a filter coefficient, wherein L represents the nominal lag or delay of the long-term predictor, and wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L. The long-term predictor delay ⌊(n+L)/L⌋L varies as a function of the sample number n. Thus, according to the present invention, the actual long-term predictor delay becomes kL, wherein L is the basic or nominal long-term predictor lag, and wherein k is an integer chosen from the set {1, 2, 3, 4, ...} as a function of the sample number n. Accordingly, the long-term filter output response b(n) is a function of the nominal long-term predictor lag parameter L and the filter state FS which exists at the beginning of the frame. This statement holds true for all values of L, even for the problematic case where the pitch lag L is less than the frame length N.
The function of the long-term predictor delay block 230 is to store the current input samples in order to predict future samples. Figure 2B represents a simplified diagram of a shift register, which may be helpful in understanding the operation of long-term predictor delay block 230 of Figure 2A. For sample number Q such that n=Q, the current output sample b(n) is applied to the input of the shift register, which is shown on the right in Figure 2B. For the next sample n=Q+1, the previous sample b(n) is shifted left into the shift register. This sample now becomes the first past sample b(n-1). For the next sample n=Q+2, another sample of b(n) is shifted into the register, and the original sample is again shifted left to become the second past sample b(n-2). After L samples have been shifted in, the original sample has been shifted left L number of times such that it may be represented as b(n-L).
As mentioned above, the lag L would typically be the pitch period of voiced speech or a multiple of it. If the lag L is at least as long as the frame length N, a sufficient number of past samples have been shifted in and stored to predict the next frame of speech. Even in the extreme case where L=N, and where n=N-1, b(n-L) will be b(-1), which is indeed a past sample. Hence, the sample b(n-L) would be output from the shift register as the output sample q(n).
If, however, the long-term predictor lag parameter L is shorter than the frame length N, then an insufficient number of samples would have been shifted into the shift register by the beginning of the next frame. Using the above example of a 250 Hz pitch period, the pitch lag L would be equal to 32. Thus, where L=32 and N=60, and where n=N-1=59, b(n-L) would normally be b(27), which represents a future sample with respect to the beginning of the frame of 60 samples. In other words, not enough past samples have been stored to provide a complete long-term predictor response. The complete long-term predictor response is needed at the beginning of the frame such that closed-loop analysis of the predictor parameters can be performed. According to the invention, in that case, the same stored samples b(n-L), 0 ≤ n < L, are repeated such that the output response of the long-term predictor is always a function of samples which have been input into the long-term predictor delay block prior to the start of the current frame. In terms of Figure 2B, the shift register has thus effectively been extended to store another kL samples, without actually modifying the structure of the long-term predictor delay block 230. Hence, as the shift register fills with new samples b(n), k must be chosen such that b(n-kL) represents a sample which existed in the shift register prior to the start of the frame. Using the previous example of L=32 and N=60, output sample q(32) would be a repeat of sample q(0), which is b(0-L)=b(32-2L) or b(-32).
Hence, the output response q(n) of the long-term predictor delay block 230 would correspond to:
q(n) = b(n-kL) wherein 0 ≤ n ≤ N-1, where k is chosen as the smallest integer such that (n-kL) is negative. More specifically, if a frame of N samples of s(n) is input into long-term predictor filter 124, each sample number n is j ≤ n ≤ N+j-1 where j is the index for the first sample of a frame of N samples. Hence, the variable k would vary such that (n-kL) is always less than j.
This ensures that the long-term predictor utilizes only samples available prior to the beginning of the frame to predict the output response.
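As a concrete illustration of the repetition rule described above, the sketch below (hypothetical helper name; a plain Python list stands in for the shift register of Figure 2B) picks k as the smallest integer making (n-kL) negative, so only pre-frame samples are ever read:

```python
def delayed_sample(history, n, L):
    """q(n) = b(n - k*L), with k the smallest integer making n - k*L
    negative, so only samples stored before the current frame are read.
    history[-1] holds b(-1), history[-2] holds b(-2), and so on."""
    k = n // L + 1                 # smallest k with n - k*L < 0
    return history[n - k * L]      # negative index reaches into the past

# Example from the text: L=32, N=60; q(32) repeats q(0), i.e. both read b(-32).
past = list(range(-143, 0))        # stand-in history where b(i) = i for i < 0
assert delayed_sample(past, 32, 32) == delayed_sample(past, 0, 32) == -32
```

Note that k = ⌊n/L⌋ + 1 here is the same quantity as the ⌊(n+L)/L⌋ factor used in the equations that follow.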
The operation of long-term filter 124 of Figure 2A will now be described in accordance with the flowchart of Figure 3.
Starting at step 350, the sample number n is initialized to zero at step 351. The nominal long-term predictor lag parameter L
and the long-term predictor coefficient β are input from coefficient analyzer 110 in step 352. In step 353, the sample number n is tested to see if an entire frame has been output. If n ≥ N, operation ends at step 361. If all samples have not yet been computed, a signal sample s(n) is input in step 354. In step 355, the output response of long-term predictor delay block 230 is calculated according to the equation:
q(n) = b(n - ⌊(n+L)/L⌋L) wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L. For example, if n=56 and L=32, then
⌊(n+L)/L⌋L becomes ⌊(56+32)/32⌋L, which is ⌊2.75⌋L or 2L. In step 356, the output response b(n) of the long-term filter is computed according to the equation:
b(n) = βq(n) + s(n)
This represents the function of multiplier 220 and adder 210.
In step 357, the samples in the shift register are shifted left one position, for all register locations between b(n-2) and b(n-LMAX), where LMAX represents the maximum long-term predictor lag that can be assigned. In the preferred embodiment, LMAX would be equal to 143. In step 358, the output sample b(n) is input into the first location b(n-1) of the shift register. Step 359 outputs the filtered sample b(n). The sample number n is then incremented in step 360, and then tested in step 353. When all N samples have been computed, the process ends at step 361.
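The loop of steps 353 through 360 can be sketched as follows (a hypothetical list-based shift register; β, L and the input frame are assumed given):

```python
LMAX = 143  # maximum long-term predictor lag in the preferred embodiment

def long_term_filter_frame(s, history, beta, L):
    """Recursive long-term filter of Figure 2A over one frame.
    For each n: q(n) = b(n - floor((n+L)/L)*L), b(n) = beta*q(n) + s(n).
    Because the delay index is always negative, q(n) depends only on
    `history` (samples stored before the frame), never on this frame's
    own outputs.  Returns the output frame and the updated history."""
    out = []
    for n, x in enumerate(s):
        k = (n + L) // L               # floor((n+L)/L), so n - k*L < 0
        q = history[n - k * L]         # pre-frame sample only
        out.append(beta * q + x)
    return out, (history + out)[-LMAX:]   # shift register update (steps 357-358)
```

With an all-zero history the output frame equals the input frame, which is the zero-state property exploited later to remove the long-term filter from the codebook search.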
Figure 2C is an alternative embodiment of a long-term filter incorporating the present invention. Filter 124' is the feedforward inverse version of the recursive filter configuration of Figure 2A. Input vector s(n) is applied to both subtractor 240 and long-term predictor delay block 260.
Delayed vector q(n) is output to multiplier 250, which scales the vector by the long-term predictor coefficient β. The output response Hn(z) of digital filter 124' is given in z-transform notation as:
Hn(z) = 1 - βz^(-⌊(n+L)/L⌋L)
wherein n represents the sample number of a frame containing N samples, 0 ≤ n ≤ N-1, wherein β represents the long-term filter coefficient, wherein L represents the nominal lag or delay of the long-term predictor, and wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L.
The output signal b(n) of filter 124' may also be defined in terms of the input signal s(n) as:
b(n) = s(n) - βs(n - ⌊(n+L)/L⌋L)
for 0 ≤ n ≤ N-1. As can be appreciated by those skilled in the art, the structure of the long-term predictor has again been modified so as to repeatedly output the same stored samples of the long-term predictor when the long-term predictor lag L is less than the frame length N.
Referring next to Figure 5, there is illustrated the preferred embodiment of the long-term filter 124 of Figure 1 which allows for sub-sample resolution for the lag parameter L. A frame of N samples of input speech vector s(n) is applied to adder 510. The output of adder 510 produces the output vector b(n) for the long-term filter 124. The output vector b(n) is fed back to delayed vector generator block 530 of the long-term predictor. The nominal long-term predictor lag parameter L is also input to delayed vector generator block 530. The long-term predictor lag parameter L can take on non-integer rational number values. The preferred embodiment allows L to take on values which are a multiple of one half. Alternate implementations of the sub-sample resolution long-term predictor of the present invention could allow values which are multiples of one third or one fourth or any other rational fraction.
In the preferred embodiment the delayed vector generator 530 includes a memory which holds past samples of b(n). In addition, interpolated samples of b(n) are also calculated by delayed vector generator 530 and stored in its memory. In the preferred embodiment, the state of the long-term predictor which is contained in delayed vector generator 530 has two samples for every stored sample of b(n). One sample is for b(n) and the other sample represents an interpolated sample between two consecutive b(n) samples. In this way, samples of b(n) can be obtained from delayed vector generator 530 which correspond to integer delays or multiples of half sample delays. The interpolation is done using interpolating finite impulse response filters as described in Multirate Digital Signal Processing by R. Crochiere and L. Rabiner, published by Prentice-Hall, 1983. The operation of delayed vector generator 530 is described in further detail hereinbelow in conjunction with the flowcharts in Figures 6A and 6B.
Delayed vector generator 530 provides output vector q(n) to long-term multiplier block 520, which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to adder 510 to complete the feedback loop of the recursive filter 124 in Figure 5.
Referring to Figures 6A and 6B, there are illustrated detailed flowchart diagrams detailing the operations performed by the long-term filter of Figure 5. According to the preferred embodiment of the present invention, the resolution of the long-term predictor memory is extended by mapping an N point sequence b(n) onto a 2N point vector ex(i). The negative indexed samples of ex(i) contain the extended resolution past values of the long-term filter output b(n), or excitation, also called the extended resolution long-term history. The mapping process doubles the temporal resolution of the long-term predictor memory each time it is applied. Here for simplicity single stage mapping is described, although additional stages may be implemented in other embodiments of the present invention.
Entering at START step 602 in Figure 6A, the flowchart proceeds to step 604, where L, β and s(n) are inputted. At step 608, vector q(n) is constructed according to the equation:
q(n) = ex(2n - 2L⌊(n+L)/L⌋) for 0 ≤ n ≤ N-1
wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L and wherein L is the long-term predictor lag.
For voiced speech, long-term predictor lag L may be the pitch period or a multiple of the pitch period. L may be an integer or a real number whose fractional part is 0.5 in the preferred embodiment. When the fractional part of L is 0.5, L has an effective resolution of half a sample.
In step 610, vector b(n) of the long-term filter is computed according to the equation:
b(n) = βq(n) + s(n) for 0 ≤ n ≤ N-1
In step 612, vector b(n) of the long-term filter is outputted. In step 614, the extended resolution state ex(n) is updated to generate and store the interpolated values of q(n) in the memory of delayed vector generator 530. Step 614 is illustrated in more detail in Figure 6B. Next, at step 616 the process has been completed and stops.
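Step 608 can be sketched as follows (hypothetical helper name; ex_past is the doubled-resolution history, so negative indices hold the past stored and interpolated samples, and 2L is always an integer because L is a multiple of one half):

```python
def delayed_vector(ex_past, N, L):
    """Step 608: q(n) = ex(2n - 2L*floor((n+L)/L)) for 0 <= n <= N-1.
    Even indices of ex hold stored b samples, odd indices hold the
    interpolated half-sample values; the computed index is always
    negative, so only history stored before the frame is referenced."""
    two_L = int(round(2 * L))        # integer because L is a multiple of 1/2
    q = []
    for n in range(N):
        k = int((n + L) // L)        # floor((n+L)/L)
        q.append(ex_past[2 * n - two_L * k])
    return q

# With L = 32.5 the first two outputs read ex(-65) and ex(-63):
hist = list(range(-300, 0))          # stand-in history: ex(i) = i for i < 0
assert delayed_vector(hist, 2, 32.5) == [-65, -63]
```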
Entering at START step 622 in Figure 6B, the flowchart proceeds to step 624, where the samples in ex(i) to be calculated in this subframe are zeroed out, ex(i) = 0 for i = -M, -M+2, ..., 2N-1, where M is chosen to be odd for a filter of order 2M+1.
For example, if the order of the filter is 39, M is 19. Although M has been chosen to be odd for simplicity, M may also be even. At step 626, every other sample of ex(i) for i = 0, 2, ..., 2(N-1) is initialized with samples of b(n) according to the equation:
ex(2i) = b(i) for i = 0, 1, ..., N-1.
Thus ex(i) for i = 0, 2, ..., 2(N-1) now holds the output vector b(n) for the current subframe mapped onto its even indices, while the odd indices of ex(i) for i = 1, 3, ..., 2(N-1)+1 are initialized with zeros.
At step 628, the interpolated samples of ex(i) initialized to zero are reconstructed through FIR interpolation, using a symmetric, zero-phase shift filter, assuming that the order of such FIR filter is 2M+1 as explained hereinabove. The FIR filter coefficients are a(j), where j = -M, ..., M-1, M and where a(-j) = a(j). Only even samples pointed to by the FIR filter taps are used in sample reconstruction, since odd samples have been set to zero. As a result, M+1 samples instead of 2M+1 samples are actually weighted and summed for each reconstructed sample. The FIR interpolation is performed according to the equation:
ex(i) = Σ (j=1 to (M+1)/2) a(2j-1) [ex(i-2j+1) + ex(i+2j-1)],
for i = -M, -M+2, ..., 2(N-1)-M-2, 2(N-1)-M
Note that the first sample to be reconstructed is ex(-M), not ex(1) as one might expect. This is because interpolated samples at indices -M, -M+2, ..., -1 were reconstructed at the previous frame using an estimate of the excitation in the current frame, since the actual excitation samples were then undefined. At the current frame those samples are known (we have b(n)), and thus the samples of ex(i), for i = -M, -M+2, ..., -1 are now reconstructed again, with the filter taps pointing to the actual and not estimated values b(n).
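A sketch of the step-628 interpolation under the stated assumptions (symmetric taps a(2j-1) with a(-j)=a(j); a dict is used for ex so negative indices are natural; the tap values in the check are illustrative, not the patent's):

```python
def interpolate_missing(ex, a, M, lo, hi):
    """Reconstruct the zeroed odd-indexed samples ex(i), i = lo, lo+2, ..., hi:
        ex(i) = sum_{j=1..(M+1)/2} a(2j-1) * (ex(i-2j+1) + ex(i+2j-1))
    Only the even-offset (known) neighbours contribute, so M+1 samples
    are weighted and summed per output instead of 2M+1."""
    for i in range(lo, hi + 1, 2):
        ex[i] = sum(a[2 * j - 1] * (ex[i - 2 * j + 1] + ex[i + 2 * j - 1])
                    for j in range(1, (M + 1) // 2 + 1))
    return ex

# Toy check with M=1 and a(1)=0.5 (a simple two-point average): a linear
# ramp stored at even indices interpolates exactly at the odd indices.
ex = {i: float(i) for i in range(0, 12, 2)}
interpolate_missing(ex, {1: 0.5}, 1, 1, 9)
assert ex[5] == 5.0
```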
The largest value of i in the above equation is 2(N-1)-M.
This means that (M+1)/2 odd samples of ex(i), for i = 2N-M, 2N-M+2, ..., 2(N-1)+1, still are to be reconstructed. However, for those values of index i, the upper taps of the interpolating filter point to the future samples of the excitation which are as yet undefined. To calculate the values of ex(i) for those indices, the future state of ex(i) for i = 2N, 2N+2, ..., 2N+M-1 is extended by evaluating at step 630:
ex(i) = α ex(i-2L), for i = 2N, 2N+2, ..., 2N+M-1
The minimum value of 2L to be used in this scheme is 2M+1.
This constraint may be lifted if we define:
ex(i) = α ex(F(i-2L)), for i = 2N, 2N+2, ..., 2N+M-1;
where F(i-2L) for i-2L equal to odd numbers is given by:
F(i-2L) = i-2L, for i-2L ≤ 2(N-1)-M
F(i-2L) = i-2L - 2L⌊(i-2(N-1)+M-2)/2L⌋, for i-2L > 2(N-1)-M
and where F(i-2L) for i-2L equal to even numbers is given by:
F(i-2L) = i-2L, for i-2L ≤ 2(N-1)
F(i-2L) = i-2L - 2L⌊(i-2(N-1)-2)/2L⌋, for i-2L > 2(N-1)
The parameter α, the history extension scaling factor, may be set equal to β, which is the pitch predictor coefficient, or set to unity.
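The folding function F can be sketched as a loop that subtracts whole periods of 2L until the index lands in the already-reconstructed part of the history. This loop form is equivalent to the closed-form floor expressions above; the function name and signature are illustrative:

```python
def F(x, N, M, two_L):
    """Fold an index x = i - 2L back by multiples of 2L so the filter
    taps point at known samples: odd indices are reconstructed only up
    to 2(N-1)-M, while even (stored) indices are known up to 2(N-1)."""
    limit = 2 * (N - 1) - M if x % 2 else 2 * (N - 1)
    while x > limit:
        x -= two_L
    return x

# N=10, M=3, 2L=10: the odd limit is 15 and the even limit is 18.
assert F(21, 10, 3, 10) == 11   # odd index: 21 -> 11 (<= 15)
assert F(20, 10, 3, 10) == 10   # even index: 20 -> 10 (<= 18)
assert F(7, 10, 3, 10) == 7     # already in range, returned unchanged
```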
At step 632, with the excitation history thus extended, the last (M+1)/2 zeroed samples of the current extended resolution subframe are calculated using:
ex(i) = Σ (j=1 to (M+1)/2) a(2j-1) [ex(i-2j+1) + ex(i+2j-1)],
for i = 2N-M, 2N-M+2, ..., 2(N-1)+1
These samples will be recalculated at the next subframe, once the actual excitation samples for ex(i), i = 2N, 2N+2, ..., 2N+M-1 become available.
Thus b(n), for n = 0, ..., N-1 has been mapped onto vector ex(i), i = 0, 2, ..., 2(N-1). The missing zeroed samples have been reconstructed using an FIR interpolating filter. Note that the FIR interpolation is applied only to the missing samples. This ensures that no distortion is unnecessarily introduced into the known samples, which are stored at even indices of ex(i). An additional benefit of interpolating only the missing samples is that the computation associated with the interpolation is halved.
At step 634, finally the long-term predictor history is updated by shifting down the contents of the extended resolution excitation vector ex(i) by 2N points:
ex(i) = ex(i+2N), for i = -2Max_L, ..., -1
where Max_L is the maximum long-term predictor delay used. Next, at step 636 the process has been completed and stops.

Referring now to Figure 4, a speech synthesizer block diagram is illustrated using the long-term filter of the present invention. Synthesizer 400 obtains the short-term predictor parameters ai, long-term predictor parameters β and L, excitation gain factor γ, and the codeword I received from the channel, via de-multiplexer 450. The codeword I is applied to codebook ROM 420 to address the codebook of excitation vectors.
Codebook ROM 420 is preferably implemented as described in US Patent No. 4,817,157. The single excitation vector uI(n) is then multiplied by the gain factor γ in block 422, filtered by long-term predictor filter 424 and short-term predictor filter 426 to obtain reconstructed speech vector s'I(n). This vector, which represents a frame of reconstructed speech, is then applied to digital-to-analog (D/A) converter 408 to produce a reconstructed analog signal, which is then low pass filtered to reduce aliasing by filter 404, and applied to an output transducer such as speaker 402.
Hence, the CELP synthesizer utilizes the same codebook, gain block, long-term filter, and short-term filter as the CELP
analyzer of Figure 1.
Figure 7 is a detailed block diagram of a pitch post filter interposed between the short-term filter 426 and D/A converter 408 of the speech synthesizer in Figure 4. A pitch post filter enhances the speech quality by removing noise introduced by the filters 424 and 426. A frame of N samples of reconstructed speech vector s'I(n) is applied to adder 710. The output of adder 710 produces the output vector s''(n) for the pitch post filter.
The output vector s''(n) is fed back to delayed sample generator block 730 of the pitch post filter. The nominal long-term predictor lag parameter L is also input to delayed sample generator block 730. L may take on non-integer values for the present invention. If L is a non-integer, an interpolating FIR
filter is used to generate the fractional sample delay needed.
Delayed sample generator 730 provides output vector q(n) to
multiplier block 720, which scales the pitch post filter response by coefficient R, which is a function of the long-term predictor coefficient β. The scaled output Rq(n) is then applied to adder 710 to complete the feedback loop of the pitch post filter in Figure 7.
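For the integer-L case, one frame of the pitch post filter feedback loop might look like the sketch below (hypothetical helper name; R and the past s'' samples are assumed given, and the fractional-delay FIR path for non-integer L is omitted):

```python
def pitch_postfilter_frame(s_rec, history, R, L):
    """s''(n) = s'(n) + R * s''(n - L): adder 710 plus the scaled,
    delayed feedback from blocks 730 and 720.  history[-1] = s''(-1);
    within the frame the delayed term reads the fed-back output."""
    buf = list(history)                  # past post-filter outputs
    for x in s_rec:
        buf.append(x + R * buf[-L])      # buf[-L] is s''(n - L)
    return buf[len(history):]            # this frame's s'' samples

# With R=0.5, L=1, s''(-1)=1: s''(0) = 2 + 0.5*1 = 2.5,
# then s''(1) = 0 + 0.5*2.5 = 1.25.
assert pitch_postfilter_frame([2.0, 0.0], [1.0], 0.5, 1) == [2.5, 1.25]
```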
In utilizing the long-term predictor response according to the present invention, the excitation gain factor γ and the long-term predictor coefficient β can be simultaneously optimized for all values of L in a closed-loop configuration.
This joint optimization technique was heretofore impractical for values of L < N, since the joint optimization equations would become nonlinear in the single parameter β. The present invention modifies the structure of the long-term predictor to allow a linear joint optimization equation. In addition, the present invention allows the long-term predictor lag to have better resolution than one sample, thereby enhancing its performance.
Moreover, the codebook search procedure has been further simplified, since the zero state response of the long-term filter becomes zero for lags less than the frame length. This additional feature permits those skilled in the art to remove the effect of the long-term filter from the codebook search procedure. Hence, a CELP speech coder has been shown which can provide higher quality speech for all pitch rates while retaining the advantages of practical implementation and low bit rate.
While specific embodiments of the present invention have been shown and described herein, further modifications and improvements may be made without departing from the invention in its broader aspects. For example, any type of speech coding (e.g., RELP, multipulse, RPE, LPC, etc.) may be used with the sub-sample resolution long-term predictor filtering technique described herein. Moreover, additional equivalent configurations of the sub-sample resolution long-term predictor structure may be made which perform the same function as those illustrated above.

Claims (16)

Claims
1. A method of reconstructing speech comprising the steps of:
receiving from a communication channel a set of speech parameters including codeword I and a delay parameter L, where L may have a value in a predetermined range including integer and non-integer values related to a speech pitch period;
generating an excitation vector having a plurality of samples in response to the codeword I;
filtering the excitation vector based on at least the delay parameter L and stored filter state samples, the step of filtering comprising the steps of:
computing interpolated filter state samples from the stored filter state samples using a non-integer L to determine the appropriate interpolation parameters; and combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
and processing the filter output vector to produce reconstructed speech.
2. A method of reconstructing speech in accordance with claim 1 wherein the step of filtering further comprises the step of combining, responsive to L being an integer, the excitation vector with the stored filter state samples, thereby forming filter state output samples.
3. A method of reconstructing speech in accordance with claim 1 wherein the step of filtering further comprises the step of updating the stored filter state samples using the filter output samples.
4. A method of reconstructing speech in accordance with claim 1 further comprising the steps of:
converting the reconstructed speech to an analog voice signal; and transducing the analog voice signal into a perceptible audio output, such that the speech pitch periods are more accurately predicted.
5. Apparatus for reconstructing speech comprising:
receiving circuitry for receiving from a communication channel a set of speech parameters including codeword I and a delay parameter L, where L may have a value in a predetermined range including integer and non-integer values related to a speech pitch period;
generating circuitry for generating an excitation vector having a plurality of samples in response to the codeword I;
filtering circuitry for filtering the excitation vector based on at least the delay parameter L and stored filter state samples, the filtering circuitry comprising:
computing circuitry for computing interpolated filter state samples from the stored filter state samples using a non-integer L to determine the appropriate interpolation parameters; and combining circuitry for combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples; and processing circuitry for processing the filter output vector to produce reconstructed speech.
6. Apparatus for reconstructing speech in accordance with claim 5 wherein the combining circuitry further comprises combining, responsive to L
being an integer, the excitation vector with the stored filter state samples, thereby forming filter state output samples.
7. Apparatus for reconstructing speech in accordance with claim 5 wherein the filtering circuitry further comprises updating circuitry for updating the stored filter state samples using the filter output samples.
8. Apparatus for reconstructing speech in accordance with claim 5 further comprising:
converting circuitry for converting the reconstructed speech to an analog voice signal; and transducer circuitry for transducing the analog voice signal into a perceptible audio output, such that the speech pitch periods are more accurately predicted.
9. A method of reconstructing speech comprising the steps of:
receiving from a communication channel a set of speech parameters including codeword I and a delay parameter L, where L may have a value in a predetermined range including integer and non-integer values related to a speech pitch period;
generating an excitation vector having a plurality of samples in response to the codeword I;
filtering the excitation vector based on at least the delay parameter L, a set of stored filter state samples and at least one set of stored interpolated filter state samples, the step of filtering comprises the steps of:
choosing a chosen set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the step of choosing using at least the delay parameter L, and combining the excitation vector with the chosen filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
and processing the filter output vector to produce reconstructed speech.
10. A method of reconstructing speech in accordance with claim 9 further comprising the steps of:
converting the reconstructed speech to an analog voice signal; and transducing the analog voice signal into a perceptible audio output, such that the speech pitch periods are more accurately predicted.
11. Apparatus for reconstructing speech comprising:
receiving circuitry for receiving from a communication channel a set of speech parameters including codeword I and a delay parameter L, where L may have a value in a predetermined range including integer and non-integer values related to a speech pitch period;
generating circuitry for generating an excitation vector having a plurality of samples in response to the codeword I;
filtering circuitry for filtering the excitation vector based on at least the delay parameter L, a set of stored filter state samples and at least one set of stored interpolated filter state samples, the filtering circuitry comprising:
choosing circuitry for choosing a chosen set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the step of choosing using at least the delay parameter L, and combining circuitry for combining the excitation vector with the chosen filter state samples, thereby forming a filter output vector having a plurality of filter output samples; and processing circuitry for processing the filter output vector to produce reconstructed speech.
12. Apparatus for reconstructing speech in accordance with claim 11 further comprising:
converting circuitry for converting the reconstructed speech to an analog voice signal; and transducing circuitry for transducing the analog voice signal into a perceptible audio output, such that the speech pitch periods are more accurately predicted.
13. A method of encoding speech into sets of speech parameters for transmission on a communication channel, each set of speech parameters, the method comprising the steps of:
sampling a voice signal a plurality of times to provide a plurality of samples forming a present speech vector;
generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to a speech pitch period of the present speech vector;
searching excitation vectors to determine a codeword I that best matches the present speech vector, the step of searching comprising the steps of:
generating excitation vectors in response to corresponding codewords;
filtering each excitation vector comprising the steps of:
computing interpolated filter state samples from the stored filter state samples using a non-integer L to determine the appropriate interpolation parameters, and combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
processing the filter output vector to produce a reconstructed speech vector;
comparing the reconstructed speech vector to the present speech vector to determine the difference therebetween; and selecting the codeword I of the excitation vector for which the reconstructed speech vector differs the least from the present speech vector; and transmitting the selected codeword I and delay parameter L together with preselected speech parameters for the present speech vector on the communications channel, such that the speech pitch periods are more accurately predicted.
14. Apparatus for encoding speech into sets of speech parameters for transmission on a communication channel, each set of speech parameters, the apparatus comprising:
sampling circuitry for sampling a voice signal a plurality of times to provide a plurality of samples forming a present speech vector;
generating circuitry for generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to a speech pitch period of the present speech vector;
searching circuitry for searching excitation vectors to determine a codeword I that best matches the present speech vector, the searching circuitry comprising:
generating circuitry for generating excitation vectors in response to corresponding codewords;
filtering circuitry for filtering each excitation vector, the filtering circuitry comprising:
computing circuitry for computing interpolated filter state samples from the stored filter state samples using a non-integer L to determine the appropriate interpolation parameters, and combining circuitry for combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
processing circuitry for processing the filter output vector to produce a reconstructed speech vector;
comparing circuitry for comparing the reconstructed speech vector to the present speech vector to determine the difference therebetween; and selecting circuitry for selecting the codeword I of the excitation vector for which the reconstructed speech vector differs the least from the present speech vector; and transmitting circuitry for transmitting the selected codeword I and delay parameter L together with pre-selected speech parameters for the present speech vector on the communications channel, such that the speech pitch periods are more accurately predicted.
15. A method of encoding speech into sets of speech parameters for transmission on a communication channel, each set of speech parameters, the method comprising the steps of:
sampling a voice signal a plurality of times to provide a plurality of samples forming a present speech vector;
generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to a speech pitch period of the present speech vector;
searching excitation vectors to determine a codeword I that best matches the present speech vector, the step of searching comprising the steps of:
generating excitation vectors in response to corresponding codewords; filtering each excitation vector based on at least the delay parameter L, a set of stored filter state samples and at least one set of stored interpolated filter state samples, the step of filtering comprising:
choosing a chosen set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the step of choosing using at least the delay parameter L, and combining the excitation vector with the chosen filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
processing the filter output vector to produce a reconstructed speech vector;
comparing the reconstructed speech vector to the present speech vector to determine the difference therebetween; and selecting the codeword I of the excitation vector for which the reconstructed speech vector differs the least from the present speech vector; and transmitting the selected codeword I and delay parameter L together with preselected speech parameters for the present speech vector on the communications channel, such that the speech pitch periods are more accurately predicted.
16. Apparatus for encoding speech into sets of speech parameters for transmission on a communication channel, each set of speech parameters, the apparatus comprising:
sampling circuitry for sampling a voice signal a plurality of times to provide a plurality of samples forming a present speech vector;
generating circuitry for generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to a speech pitch period of the present speech vector;
searching circuitry for searching excitation vectors to determine a codeword I that best matches the present speech vector, the searching circuitry comprising:
generating circuitry for generating excitation vectors in response to corresponding codewords;
filtering circuitry for filtering each excitation vector based on at least the delay parameter L, a set of stored filter state samples and at least one set of stored interpolated filter state samples, the filtering circuitry comprising:
choosing circuitry for choosing a chosen set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the choosing circuitry using at least the delay parameter L, and combining circuitry for combining the excitation vector with the chosen filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
processing circuitry for processing the filter output vector to produce a reconstructed speech vector;
comparing circuitry for comparing the reconstructed speech vector to the present speech vector to determine the difference therebetween; and selecting circuitry for selecting the codeword I of the excitation vector for which the reconstructed speech vector differs the least from the present speech vector; and transmitting circuitry for transmitting the selected codeword I and delay parameter L together with pre-selected speech parameters for the present speech vector on the communications channel, such that the speech pitch periods are more accurately predicted.
CA002037899A 1989-09-01 1990-06-25 Digital speech coder having improved long-term predictor Expired - Lifetime CA2037899C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US40220689A 1989-09-01 1989-09-01
US402,206 1989-09-01

Publications (2)

Publication Number Publication Date
CA2037899A1 CA2037899A1 (en) 1991-03-02
CA2037899C true CA2037899C (en) 1996-09-17

Family

ID=23590969

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002037899A Expired - Lifetime CA2037899C (en) 1989-09-01 1990-06-25 Digital speech coder having improved long-term predictor

Country Status (12)

Country Link
EP (1) EP0450064B2 (en)
JP (1) JP3268360B2 (en)
CN (1) CN1026274C (en)
AT (1) ATE191987T1 (en)
AU (1) AU634795B2 (en)
CA (1) CA2037899C (en)
DE (1) DE69033510T3 (en)
DK (1) DK0450064T4 (en)
ES (1) ES2145737T5 (en)
MX (1) MX167644B (en)
SG (1) SG47028A1 (en)
WO (1) WO1991003790A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5327520A (en) * 1992-06-04 1994-07-05 At&T Bell Laboratories Method of use of voice message coder/decoder
FR2702590B1 (en) * 1993-03-12 1995-04-28 Dominique Massaloux Device for digital coding and decoding of speech, method for exploring a pseudo-logarithmic dictionary of LTP delays, and method for LTP analysis.
FI96248C (en) 1993-05-06 1996-05-27 Nokia Mobile Phones Ltd Method for providing a synthetic filter for long-term interval and synthesis filter for speech coder
DE4421853A1 (en) * 1994-06-22 1996-01-04 Philips Patentverwaltung Mobile terminal
GB9408037D0 (en) * 1994-04-22 1994-06-15 Philips Electronics Uk Ltd Analogue signal coder
JP2970407B2 (en) * 1994-06-21 1999-11-02 日本電気株式会社 Speech excitation signal encoding device
FR2729246A1 (en) * 1995-01-06 1996-07-12 Matra Communication Analysis-by-synthesis speech coding method
FR2729247A1 (en) * 1995-01-06 1996-07-12 Matra Communication Analysis-by-synthesis speech coding method
FR2729244B1 (en) * 1995-01-06 1997-03-28 Matra Communication Analysis-by-synthesis speech coding method
US5708757A (en) * 1996-04-22 1998-01-13 France Telecom Method of determining parameters of a pitch synthesis filter in a speech coder, and speech coder implementing such method
JP4857468B2 (en) * 2001-01-25 2012-01-18 ソニー株式会社 Data processing apparatus, data processing method, program, and recording medium
JP4857467B2 (en) 2001-01-25 2012-01-18 ソニー株式会社 Data processing apparatus, data processing method, program, and recording medium
GB2466674B (en) 2009-01-06 2013-11-13 Skype Speech coding
GB2466673B (en) 2009-01-06 2012-11-07 Skype Quantization
GB2466671B (en) 2009-01-06 2013-03-27 Skype Speech encoding
GB2466675B (en) 2009-01-06 2013-03-06 Skype Speech coding
GB2466672B (en) * 2009-01-06 2013-03-13 Skype Speech coding
CN104025191A (en) * 2011-10-18 2014-09-03 爱立信(中国)通信有限公司 An improved method and apparatus for an adaptive multi-rate codec
FR3015754A1 (en) 2013-12-20 2015-06-26 Orange Resampling of an audio signal sampled at a sampling frequency that varies by frame

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL168669C (en) * 1974-09-16 1982-04-16 Philips Nv Interpolating digital filter with input buffer.
US4080660A (en) * 1975-07-11 1978-03-21 James Nickolas Constant Digital signal time scale inversion
US4020332A (en) * 1975-09-24 1977-04-26 Bell Telephone Laboratories, Incorporated Interpolation-decimation circuit for increasing or decreasing digital sampling frequency
NL8105801A (en) * 1981-12-23 1983-07-18 Philips Nv RECURSIVE DIGITAL FILTER.
US4573135A (en) * 1983-04-25 1986-02-25 Rca Corporation Digital lowpass filter having controllable gain
JPS60116000A (en) * 1983-11-28 1985-06-22 ケイディディ株式会社 Voice encoding system
JPS63214032A (en) * 1987-03-02 1988-09-06 Fujitsu Ltd Coding transmitter
JPS63249200A (en) * 1987-04-06 1988-10-17 日本電信電話株式会社 Vector quantization system
JPH01177227A (en) * 1988-01-05 1989-07-13 Toshiba Corp Sound coder and decoder

Also Published As

Publication number Publication date
JP3268360B2 (en) 2002-03-25
ES2145737T5 (en) 2007-03-01
DE69033510D1 (en) 2000-05-25
DK0450064T3 (en) 2000-10-02
CN1050633A (en) 1991-04-10
AU634795B2 (en) 1993-03-04
CN1026274C (en) 1994-10-19
EP0450064B1 (en) 2000-04-19
DK0450064T4 (en) 2006-09-04
JPH04502675A (en) 1992-05-14
ATE191987T1 (en) 2000-05-15
AU5952590A (en) 1991-04-08
CA2037899A1 (en) 1991-03-02
DE69033510T3 (en) 2007-06-06
EP0450064A4 (en) 1995-04-05
EP0450064B2 (en) 2006-08-09
MX167644B (en) 1993-03-31
DE69033510T2 (en) 2000-11-23
SG47028A1 (en) 1998-03-20
EP0450064A1 (en) 1991-10-09
ES2145737T3 (en) 2000-07-16
WO1991003790A1 (en) 1991-03-21

Similar Documents

Publication Publication Date Title
CA2037899C (en) Digital speech coder having improved long-term predictor
US5359696A (en) Digital speech coder having improved sub-sample resolution long-term predictor
EP0443548B1 (en) Speech coder
EP0409239B1 (en) Speech coding/decoding method
US5485581A (en) Speech coding method and system
US5127053A (en) Low-complexity method for improving the performance of autocorrelation-based pitch detectors
US5396576A (en) Speech coding and decoding methods using adaptive and random code books
US5826224A (en) Method of storing reflection coefficients in a vector quantizer for a speech coder to provide reduced storage requirements
US8271274B2 (en) Coding/decoding of a digital audio signal, in CELP technique
NO302849B1 (en) Method and apparatus for digital speech encoding
JP2002202799A (en) Voice code conversion apparatus
JPH04270398A (en) Voice encoding system
EP0577809A1 (en) Double mode long term prediction in speech coding.
CA2132006C (en) Method for generating a spectral noise weighting filter for use in a speech coder
JP3089769B2 (en) Audio coding device
CA2005115C (en) Low-delay code-excited linear predictive coder for speech or audio
US5873060A (en) Signal coder for wide-band signals
Taniguchi et al. Pitch sharpening for perceptually improved CELP, and the sparse-delta codebook for reduced computation
US5692101A (en) Speech coding method and apparatus using mean squared error modifier for selected speech coder parameters using VSELP techniques
US7337110B2 (en) Structured VSELP codebook for low complexity search
JP3168238B2 (en) Method and apparatus for increasing the periodicity of a reconstructed audio signal
JPH086597A (en) Device and method for coding exciting signal of voice
US4908863A (en) Multi-pulse coding system
US6041298A (en) Method for synthesizing a frame of a speech signal with a computed stochastic excitation part
EP0483882B1 (en) Speech parameter encoding method capable of transmitting a spectrum parameter with a reduced number of bits

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry