WO1993006592A1 - A linear prediction speech coding device - Google Patents

Publication number
WO1993006592A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
words
inputted
value
input
Application number
PCT/BE1991/000066
Other languages
French (fr)
Inventor
Gao Yang
Original Assignee
Lernout & Hauspie Speechproducts
Priority date
1991-09-20
Filing date
1991-09-20
Publication date
1993-04-01
Application filed by Lernout & Hauspie Speechproducts
Priority to PCT/BE1991/000066
Publication of WO1993006592A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A Linear Prediction coding device having an input for receiving first words obtained by sampling an inputted speech signal and a first unit provided for determining second words representing an estimation of said inputted first words, a second unit having a first, respectively a second, input for receiving said first, respectively said second, word and provided for determining a coding error word from said inputted first and second word, and a weighting filter which is connected in a closed loop with said first unit and which has a transfer function A(z)/H(z/γ), wherein $A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$ and $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient, $a_i$ a prediction coefficient obtained from an LPC analysis and γ an empirical value, 0 < γ ≤ 1.

Description

A LINEAR PREDICTION SPEECH CODING DEVICE
The invention relates to a Linear Prediction coding device having an input for receiving first words obtained by sampling an inputted speech signal and a first unit provided for determining second words representing an estimation of said inputted first words, a second unit having a first, respectively a second, input for receiving said first, respectively said second, word and provided for determining a coding error word from said inputted first and second word, and a weighting filter unit having a third input for receiving said coding error word, an output of said filter unit being connected in a closed loop with said first unit.
Such a coding device is known from the article by B.S. Atal and S.L. Hanauer entitled "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave", published in the Journal of the Acoustical Society of America, Volume 50, 1972, pp. 637-655. A speech signal, which is for example created by a human voice, is sampled, for example at an 8 kHz sampling frequency. First words are formed on the basis of the speech samples thus obtained. Those first words can be either the samples themselves or values obtained by filtering or otherwise processing the samples. Besides those first words, second words are formed which represent an estimation of said first words.
Since the filter unit is connected in a closed loop with the first unit, the second words are determined, for example, by minimising the signal outputted by the filter. The weighting filter unit operates on a coding error word obtained, for example, by subtracting each time the second word from the first. The use of such a filter makes it possible to mask sounds which would normally not be perceived by the human ear. A common choice for a perceptual weighting filter is W(z) = A(z)/A(z/α), wherein A(z) is a polynomial function defining the short term prediction and α is determined empirically. For the common weighting filter α is chosen between 0.8 and 1.
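The common weighting filter W(z) = A(z)/A(z/α) mentioned above can be illustrated with a minimal sketch. It assumes A(z) = 1 + Σ a_i z^-i, so that replacing z by z/α scales each coefficient a_i by α^i. The function names, the default α and the direct-form implementation are illustrative assumptions, not taken from the cited article.

```python
def bandwidth_expand(a, alpha):
    """Coefficients of A(z/alpha), given A(z) = 1 + sum a[i-1] * z^-i (i = 1..n)."""
    return [a_i * alpha ** (i + 1) for i, a_i in enumerate(a)]


def weight_common(error, a, alpha=0.9):
    """Filter `error` through W(z) = A(z)/A(z/alpha), starting from zero state."""
    den = bandwidth_expand(a, alpha)
    out = []
    for n in range(len(error)):
        # Numerator A(z): non-recursive part acting on the input samples.
        acc = error[n] + sum(
            a[i] * error[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0
        )
        # Denominator A(z/alpha): recursive part acting on past outputs.
        # It needs one multiply-add per LPC coefficient for every sample.
        acc -= sum(
            den[i] * out[n - 1 - i] for i in range(len(den)) if n - 1 - i >= 0
        )
        out.append(acc)
    return out
```

With α = 1 the numerator and denominator cancel and the filter passes the error unchanged; smaller values of α broaden the formant peaks of 1/A(z/α).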
Instead of using the first and second words, the filtering operation can also be applied to excitation error values determined, for example, by the difference between an ideal excitation and an estimated excitation. In the latter case an ideal excitation value is obtained from said first words by applying a short time prediction filtering operation to them, and the first unit is provided for determining an estimated excitation value. An excitation error is then outputted by the second unit, while the filter unit has a transfer function 1/A(z/α).
A drawback of the known coding device is that the amount of computation required is rather large. Reducing this amount implies that certain constraints imposed on the transfer function of the filter unit have to be violated, which would then have negative consequences for the quality of the reconstructed speech.
It is an object of the invention to realise an LP coding device wherein the amount of computation is reduced with respect to the known device without negative consequences for the quality of the outputted value.
An LP coding device according to the invention is therefore characterized in that said weighting filter unit has a transfer function A(z)/H(z/γ) wherein
$A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$ and $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient, $a_i$ a prediction coefficient obtained from an LPC analysis and γ an empirical value with 0 < γ < 1. The spectrum of the transfer function 1/H(z/γ) is the image of the average slope of the synthesis filter 1/A(z/α). The particular choice of this filter having a transfer function 1/H(z/γ) thus saves computation time without negative consequences for the quality of the outputted signal. The transfer function 1/H(z/γ) is a first order function, as can be seen from the expression $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, whereas the transfer function A(z/α) which is used in the known filter unit is of order 8 to 14, depending on the order chosen in the LPC (Linear Predictive Coding) analysis. The gist of the present invention resides in the particular choice of such a 1/H(z/γ) filter.
A first embodiment of an LP coding device according to the invention is characterized in that 0.5 ≤ γ < 1, in particular γ = 0.85. With those values of γ a faster response is obtained, since the spectrum value decreases to zero more rapidly. The particular choice of γ thus makes it possible to further reduce the calculation time without affecting the outputted value.
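The faster response obtained with γ < 1 can be illustrated with a minimal sketch. Since H(z/γ) = 1 + μ(z/γ)^-1 = 1 + μγz^-1, the filter 1/H(z/γ) obeys y[n] = x[n] - μγ·y[n-1] and its impulse response is (-μγ)^n, which dies out sooner as γ decreases. The function names, the threshold and any numerical value of μ used with them are assumptions chosen for illustration only.

```python
def impulse_response_inv_h(mu, gamma, length):
    """First `length` samples of the impulse response of 1/H(z/gamma),
    where H(z/gamma) = 1 + mu * gamma * z^-1."""
    h = []
    prev = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = x - mu * gamma * prev  # y[n] = x[n] - mu*gamma*y[n-1]
        h.append(y)
        prev = y
    return h


def samples_until_below(mu, gamma, threshold=1e-3):
    """Number of samples after which |h[n]| has dropped below `threshold`."""
    pole = -mu * gamma
    if abs(pole) >= 1.0:
        raise ValueError("1/H(z/gamma) is not stable for these values")
    n, value = 0, 1.0
    while abs(value) >= threshold:
        value *= pole
        n += 1
    return n
```

For a given μ, `samples_until_below(mu, 0.85)` is smaller than `samples_until_below(mu, 1.0)`, so fewer samples of the response need to be computed.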
The invention will now be described in detail by means of the figures showing examples of LP coding devices according to the invention. In the figures:
Figure 1 shows a first embodiment of an LP coding device according to the invention;
Figure 2 shows a second embodiment of an LP coding device according to the invention;
Figure 3 shows a third embodiment of an LP coding device according to the invention.
In the figures the same reference number has been assigned to the same or an analogous element.
The class of Linear Predictive (LP) coding devices comprises a large number of members, such as for example CELP (Code Excited LP), BCELP (Binary), SBCELP (Simplified), VELP (Vector) and VSELP (Vector Sum). The present invention is applicable to all the members of this class as long as the coding device uses a filter as will be described hereunder.
The LP coding device shown in figure 1 comprises an input 2 for receiving first words obtained by sampling a supplied speech signal originating for example from a human voice. The LP coding device further comprises a first unit 1 provided for determining second words representing an estimation of said inputted first words.
The input 2 and an output of the first unit 1 are connected to a first and a second input, respectively, of a second unit 3. This second unit 3, which is for example formed by a subtraction unit, is provided for determining a coding error word, for example by subtracting the second word from the first word. The coding error word thus obtained is inputted to a weighting filter unit, whose output is connected to an element 5 provided for determining the energy value, which is the sum of the squares of the inputted words. The energy value is supplied to a further element 6 provided for minimising that energy value. An output of the further element 6 is connected to an input of the first unit 1. Thus the filter unit is connected in a closed loop with the first unit 1, which makes it possible to determine said second words in an iterative manner and so to reduce the energy value. The weighting filter unit is formed by a first order filter having a transfer function A(z)/H(z/γ). The pre-emphasis function H(z/γ) is defined as:
$H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$
wherein μ = R(1)/R(0) or μ = -k_1, R(i) being the autocorrelation of the speech signal and k_1 being the first Parcor coefficient, and γ being an empirically determined constant with 0 < γ ≤ 1.
$A(z) = 1 + \sum_{i=1}^{P} a_i z^{-i}$
is a polynomial function wherein $a_i$ is a coefficient determined by an LPC (Linear Predictive Coding) element. Such an LPC element is for example described in the book by L.R. Rabiner and R.W. Schafer, "Digital Processing of Speech Signals", pp. 395-453, published by Prentice Hall, 1978. In the expression for A(z), P is an order value, for example P = 10. The form of the transfer function of the weighting filter is not critical as long as it satisfies the following constraints:
(1) the synthesized speech signal has the same spectral behaviour as the original signal
(2) the spectrum of the coding error is shaped according to the speech signal spectrum
(3) the spectrum of the coding error exhibits a flatter average envelope.
For the common weighting filter, the above constraints are satisfied with α = 0.8 to 1; however, the amount of computation required is very large. An immediate simplification would be to use W(z) = A(z) (α = 0, so-called "open loop"); with such a choice, however, the third constraint is not met owing to much larger coding errors in the lower frequency region, which has a great influence on the quality of the reconstructed speech.
The use of a filter with a transfer function A(z)/H(z/γ) enables a substantial reduction of the computation requirements because 1/H(z/γ) is, as can be seen from the relevant expression, a first order function, whereas the A(z/α) function used in the known filter units is of order 8 to 14, depending on the LPC analysis used. The value of γ is 0 < γ < 1 and preferably 0.85. A value of γ < 1 has the advantage that a faster response is obtained, since the spectrum value decreases to zero more rapidly, thus obtaining a further reduction of the calculation time without affecting the outputted value. Values of γ < 0.5 are possible but give a lower quality of the obtained values.
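The weighting by A(z)/H(z/γ) and the energy value formed by element 5 can be sketched as follows. The sketch takes μ = R(1)/R(0) exactly as written in the description; sign conventions for the Parcor coefficient differ between texts, so μ is kept as an explicit parameter. All function names are hypothetical and the autocorrelation estimate is a plain, unwindowed sum chosen as an assumption.

```python
def first_order_mu(speech):
    """mu = R(1)/R(0), with R(i) a plain autocorrelation sum over the frame."""
    r0 = sum(s * s for s in speech)
    r1 = sum(speech[n] * speech[n - 1] for n in range(1, len(speech)))
    return r1 / r0 if r0 else 0.0


def weight_proposed(error, a, mu, gamma=0.85):
    """Filter `error` through A(z)/H(z/gamma), H(z/gamma) = 1 + mu*gamma*z^-1."""
    out = []
    prev = 0.0
    for n in range(len(error)):
        # Numerator A(z) = 1 + sum a_i z^-i acting on the coding error.
        fir = error[n] + sum(
            a[i] * error[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0
        )
        # Denominator: a single recursive tap, whatever the LPC order.
        y = fir - mu * gamma * prev
        out.append(y)
        prev = y
    return out


def energy(words):
    """Energy value: the sum of the squares of the inputted words."""
    return sum(w * w for w in words)
```

Compared with a denominator A(z/α) of order 8 to 14, the recursive part here costs one multiply-add per sample.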
Figure 2 shows a second embodiment of an LP coding device according to the invention. This second embodiment has a more conventional architecture in comparison with the first embodiment illustrated in figure 1, which has a more theoretical architecture. The main difference between the first and the second embodiment is that in the second embodiment the filter unit 7 has a transfer function 1/H(z/γ) and operates on an excitation error value. The numerator A(z) present in the first embodiment is no longer used in the filter unit but in a short time prediction filter 8 as well as in the first unit 9. This short time prediction filter, which receives the first words supplied at input 2, is provided for determining an ideal excitation value from the inputted first words. To this end the inputted first words are multiplied by the function
$A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$. This operation is performed by the short term prediction filter 8.
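In the time domain, multiplying by A(z) amounts to computing e[n] = s[n] + Σ a_i·s[n-i] for each first word s[n]. A minimal sketch of this short term prediction filtering is given below; the function name is hypothetical and zero history before the first sample is an assumption.

```python
def ideal_excitation(first_words, a):
    """Apply A(z) = 1 + sum a[i-1] * z^-i to the first words (zero history)."""
    out = []
    for n in range(len(first_words)):
        acc = first_words[n]
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * first_words[n - i]
        out.append(acc)
    return out
```

For a signal that follows s[n] = 0.5·s[n-1] exactly, the coefficient a_1 = -0.5 leaves only the first sample in the excitation.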
Since in the second embodiment the filter unit operates on an excitation error value, the first unit 9 produces an estimated excitation value for the ideal excitation value outputted by the short term prediction filter 8. The second embodiment makes it possible to reduce the computation time of the filter unit 7 even further, since the latter now only has to perform the function 1/H(z/γ).
Figure 3 shows a third embodiment of an LP coding device according to the invention. According to this embodiment the first unit 9 comprises an adaptive codebook 12 or a Long Term Prediction unit for forming the ideal excitation value. A gain factor β supplied by unit 11 is assigned to the value originating from that adaptive codebook. The use of such an adaptive codebook makes it possible to determine an estimation of the Long Term excitation. The element 10 is an LPC analysis device provided for determining the $a_i$ coefficients.
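The closed-loop determination of the estimated excitation in the second and third embodiments can be sketched as a search over candidate excitations: each candidate is scaled by a gain, the excitation error is filtered by 1/H(z/γ), and the candidate with the least error energy is retained. The candidate list, the least-squares gain and all names below are assumptions made for illustration; the description does not specify the search procedure.

```python
def inv_h_filter(x, mu, gamma):
    """Filter x through 1/H(z/gamma), H(z/gamma) = 1 + mu*gamma*z^-1."""
    out, prev = [], 0.0
    for v in x:
        prev = v - mu * gamma * prev
        out.append(prev)
    return out


def closed_loop_search(ideal, candidates, mu, gamma=0.85):
    """Return (index, gain, energy) of the candidate excitation whose scaled,
    filtered version is closest to the filtered ideal excitation."""
    target = inv_h_filter(ideal, mu, gamma)
    best = None
    for idx, cand in enumerate(candidates):
        fc = inv_h_filter(cand, mu, gamma)
        denom = sum(v * v for v in fc)
        if denom == 0.0:
            continue  # an all-zero candidate carries no information
        # Least-squares gain: the filter is linear, so the filtered error
        # equals target - gain * fc.
        gain = sum(t * v for t, v in zip(target, fc)) / denom
        err = sum((t - gain * v) ** 2 for t, v in zip(target, fc))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best
```

Because the filter is linear, each candidate is filtered once and the gain follows in closed form, so only the first order recursion is run inside the loop.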
In this description the term "word" means a sequence of binary samples.

Claims

1. A Linear Prediction coding device having an input for receiving first words obtained by sampling an inputted speech signal and a first unit provided for determining second words representing an estimation of said inputted first words, a second unit having a first, respectively a second input for receiving said first respectively said second word and provided for determining a coding error word from said inputted first and second word, a weighting filter unit having a third input for receiving said coding error word, an output of said filter unit being closed looped with said first unit, characterized in that said weighting filter unit has a transfer function A(z)/H(z/γ) wherein
$A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$ and $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient, $a_i$ a prediction coefficient obtained from an LPC analysis and γ an empirical value, 0 < γ ≤ 1.
2. An LP coding device having an input for receiving first words obtained by sampling an inputted speech signal, a short time prediction filter provided for determining an ideal excitation value from inputted first words, a first unit provided for determining an estimated excitation value for said ideal excitation value, a second unit having a first respectively a second input for receiving said ideal respectively said estimated excitation value and provided for determining an excitation error value from said inputted values, a filter unit having a third input for receiving said excitation error value, an output of said filter being closed looped with said first unit, characterized in that said filter unit has a transfer function 1/H(z/γ) wherein $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient and γ an empirical value, 0 < γ ≤ 1.
3. An LP coding device as claimed in claim 1 or 2, characterized in that γ = 0.85.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/BE1991/000066 WO1993006592A1 (en) 1991-09-20 1991-09-20 A linear prediction speech coding device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/BE1991/000066 WO1993006592A1 (en) 1991-09-20 1991-09-20 A linear prediction speech coding device

Publications (1)

Publication Number Publication Date
WO1993006592A1 true WO1993006592A1 (en) 1993-04-01

Family

ID=3885299

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/BE1991/000066 WO1993006592A1 (en) 1991-09-20 1991-09-20 A linear prediction speech coding device

Country Status (1)

Country Link
WO (1) WO1993006592A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0331858A1 (en) * 1988-03-08 1989-09-13 International Business Machines Corporation Multi-rate voice encoding method and device
US4932061A (en) * 1985-03-22 1990-06-05 U.S. Philips Corporation Multi-pulse excitation linear-predictive speech coder

Similar Documents

Publication Publication Date Title
CN101180676B (en) Methods and apparatus for quantization of spectral envelope representation
KR100417635B1 (en) A method and device for adaptive bandwidth pitch search in coding wideband signals
TW448417B (en) Speech encoder adaptively applying pitch preprocessing with continuous warping
KR100421226B1 (en) Method for linear predictive analysis of an audio-frequency signal, methods for coding and decoding an audiofrequency signal including application thereof
DE69934608T2 (en) ADAPTIVE COMPENSATION OF SPECTRAL DISTORTION OF A SYNTHETIZED LANGUAGE RESIDUE
DE69934320T2 (en) LANGUAGE CODIER AND CODE BOOK SEARCH PROCEDURE
US7191123B1 (en) Gain-smoothing in wideband speech and audio signal decoder
KR101039343B1 (en) Method and device for pitch enhancement of decoded speech
TW454171B (en) Speech encoder using gain normalization that combines open and closed loop gains
JP4390803B2 (en) Method and apparatus for gain quantization in variable bit rate wideband speech coding
KR100956877B1 (en) Method and apparatus for vector quantizing of a spectral envelope representation
EP1105871B1 (en) Speech encoder and method for a speech encoder
CA2382575A1 (en) Variable bit-rate celp coding of speech with phonetic classification
US6665638B1 (en) Adaptive short-term post-filters for speech coders
US6205423B1 (en) Method for coding speech containing noise-like speech periods and/or having background noise
WO1993006592A1 (en) A linear prediction speech coding device
JP3153075B2 (en) Audio coding device
Stegmann et al. CELP coding based on signal classification using the dyadic wavelet transform

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA