WO1993006592A1 - A linear prediction speech coding device

Info

Publication number: WO1993006592A1
Application number: PCT/BE1991/000066
Authority: WO
Inventors: Gao Yang
Original assignee: Lernout & Hauspie Speechproducts
Priority date: 1991-09-20
Filing date: 1991-09-20
Publication date: 1993-04-01

Abstract

A Linear Prediction coding device having an input for receiving first words obtained by sampling an inputted speech signal and a first unit provided for determining second words representing an estimation of said inputted first words, a second unit having a first, respectively a second input for receiving said first respectively said second word and provided for determining a coding error word from said inputted first and second word, a weighting filter which is closed looped with said first unit and which has a transfer function A(z)/H(z/γ) wherein $A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$ and $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient, a_i a prediction coefficient obtained from an LPC analysis and γ an empirical value with 0 < γ ≤ 1.

Description

A LINEAR PREDICTION SPEECH CODING DEVICE

The invention relates to a Linear Prediction coding device having an input for receiving first words obtained by sampling an inputted speech signal and a first unit provided for determining second words representing an estimation of said inputted first words, a second unit having a first, respectively a second input for receiving said first respectively said second word and provided for determining a coding error word from said inputted first and second word, a weighting filter unit having a third input for receiving said coding error word, an output of said filter unit being closed looped with said first unit.

Such a coding device is known from the article of B.S. Atal and S.L. Hanauer entitled "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" published in Journal of the Acoustical Society of America, Volume 50, 1972 pp. 637-655. A speech signal, which is for example created by a human voice, is sampled, for example with an 8 kHz sampling frequency. On the basis of the thus formed speech samples first words are formed. Those first words can be either the samples themselves or words obtained by applying a filtering or another processing operation to the samples. Besides those first words, second words are formed which represent an estimation of said first words.

Since the filter unit is closed looped with the first unit, the second words are for example determined by minimisation of the outputted filter signal. The weighting filter unit operates on a coding error word obtained for example by deducting each time the second word from the first. The use of such a filter makes it possible to mask sounds which would normally not be perceived by the human ear. A common choice for a perceptual weighting filter is W(z) = A(z)/A(z/α), wherein A(z) is a polynomial function defining the short term prediction and α is empirically determined. For the common weighting filter α is chosen between 0.8 and 1.
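The common weighting filter W(z) = A(z)/A(z/α) can be illustrated with a minimal sketch. It assumes that A(z/α) is obtained by replacing each coefficient a_i with a_i·α^i, and that samples before the start of the signal are zero; the function names are hypothetical and are not taken from the cited article.

```python
def bandwidth_expand(a, alpha):
    """Coefficients of A(z/alpha): a_i is replaced by a_i * alpha**i.

    `a` holds [a_1, ..., a_P]; the leading 1 of A(z) is implicit.
    """
    return [a_i * alpha ** (i + 1) for i, a_i in enumerate(a)]


def weighting_filter(x, a, alpha):
    """Filter x with W(z) = A(z) / A(z/alpha) (zero initial state)."""
    a_exp = bandwidth_expand(a, alpha)
    p = len(a)
    y = []
    for n in range(len(x)):
        # Numerator A(z): x[n] + sum_i a_i * x[n-i]
        acc = x[n] + sum(a[i] * x[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        # Denominator A(z/alpha): subtract sum_i a_i * alpha**i * y[n-i]
        acc -= sum(a_exp[i] * y[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        y.append(acc)
    return y
```

Each output sample of this filter costs P multiplications for the numerator and P more for the denominator, P being the order of the LPC analysis.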

Instead of using the first and second words, the filtering operation can also be applied to excitation error values determined for example by the difference between an ideal excitation and an estimated excitation. In the latter case an ideal excitation value is obtained from said first words by applying thereon a short time prediction filtering operation and the first unit is provided for determining an estimated excitation value. An excitation error is then outputted by the second unit, while the filter unit has a transfer function 1/A(z/α).

A drawback of the known coding device is that the amount of required computation is rather large. Reducing this amount of computation implies that certain constraints imposed on the transfer function of the filter unit have to be violated, which would then have negative consequences for the quality of the reconstructed speech.

It is an object of the invention to realise an LP coding device wherein the amount of computation is reduced with respect to the known device without negative consequences for the quality of the outputted value.

An LP coding device according to the invention is therefore characterized in that said weighting filter unit has a transfer function A(z)/H(z/γ) wherein

$A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$ and $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient, a_i a prediction coefficient obtained from an LPC analysis and γ an empirical value with 0 < γ < 1. The spectrum of the transfer function 1/H(z/γ) is the image of the average slope of the synthesis filter 1/A(z/α). The particular choice of this filter having a transfer function 1/H(z/γ) thus makes it possible to save computation time without negative consequences for the quality of the outputted signal. The transfer function 1/H(z/γ) is a first order function, as can be seen from the expression $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, whereas the transfer function A(z/α) which is used in the known filter unit is of order 8 to 14, depending on the order chosen in the LPC (Linear Predictive Coding) analysis. The gist of the present invention resides in the particular choice of such a 1/H(z/γ) filter.
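As a worked illustration of why the filter 1/H(z/γ) is of first order, the substitution can be written out in the time domain. The signal names x[n] and y[n] and the order symbol P are introduced here for illustration only.

```latex
\begin{aligned}
\left(\frac{z}{\gamma}\right)^{-1} = \gamma z^{-1}
  &\;\Rightarrow\; H(z/\gamma) = 1 + \mu\gamma z^{-1}, \\
Y(z) = \frac{X(z)}{H(z/\gamma)}
  &\;\Rightarrow\; y[n] = x[n] - \mu\gamma\, y[n-1], \\
Y(z) = \frac{X(z)}{A(z/\alpha)}
  &\;\Rightarrow\; y[n] = x[n] - \sum_{i=1}^{P} a_i \alpha^{i}\, y[n-i].
\end{aligned}
```

The first recursion needs one multiplication per output sample, whereas the second needs P multiplications, with P between 8 and 14 according to the text.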

A first embodiment of an LP coding device according to the invention is characterized in that 0.5 ≤ γ < 1, in particular γ = 0.85. With those values of γ a faster response is obtained since there is a more rapid decrease to zero of the spectrum value. The particular choice of γ thus makes it possible to further reduce the calculation time, without affecting the outputted value.
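One possible reading of the faster response is that the impulse response of 1/H(z/γ) dies out more quickly when γ is smaller. The sketch below illustrates this reading under the assumption that the filter starts from a zero state; the value μ = -0.9 in the usage note and the function names are hypothetical.

```python
def impulse_response(mu, gamma, length):
    """First `length` samples of the impulse response of 1/H(z/gamma).

    Uses the recursion y[n] = x[n] - mu * gamma * y[n-1] with a unit impulse x.
    """
    h = []
    prev = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        prev = x - mu * gamma * prev
        h.append(prev)
    return h


def samples_until_below(mu, gamma, threshold=1e-3, max_len=1000):
    """Index of the first impulse-response sample with magnitude below threshold.

    Returns None if the response has not decayed within max_len samples.
    """
    for n, value in enumerate(impulse_response(mu, gamma, max_len)):
        if abs(value) < threshold:
            return n
    return None
```

Calling `samples_until_below(-0.9, 0.85)` and `samples_until_below(-0.9, 1.0)` shows that the response with γ = 0.85 falls below the threshold after fewer samples than the response with γ = 1.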

The invention will now be described in detail by means of the figures showing examples of LP coding devices according to the invention. In the figures:

Figure 1 shows a first embodiment of an LP coding device according to the invention;

Figure 2 shows a second embodiment of an LP coding device according to the invention;

Figure 3 shows a third embodiment of an LP coding device according to the invention.

In the figures the same reference number has been assigned to the same or an analogous element.

The class of Linear Predictive (LP) coding devices comprises a large number of elements such as for example CELP (Code Excited LP), BCELP (Binary), SBCELP (Simplified), VELP (Vector), VSELP (Vector Sum). The present invention is applicable to all the elements of this class as long as the coding device uses a filter as will be described hereunder.

The LP coding device shown in figure 1 comprises an input 2 for receiving first words obtained by sampling a supplied speech signal originating for example from a human voice. The LP coding device further comprises a first unit 1 provided for determining second words representing an estimation of said inputted first words.

The input 2 and an output of the first unit 1 are connected to a first and a second input, respectively, of a second unit 3. This second unit 3, which is for example formed by a subtraction unit, is provided for determining a coding error word, for example by deducting the second word from the first word. The thus obtained coding error word is inputted to a weighting filter unit, the output of which is connected to an element 5 provided for determining the energy value, which is the sum of the squares of the inputted words. The energy value is supplied to a further element 6 provided for minimising that energy value. An output of the further element 6 is connected to an input of the first unit 1. Thus the filter unit is closed looped with the first unit 1, which makes it possible to determine said second words in an iterative manner and so to reduce the energy value. The weighting filter unit is formed by a first order filter having a transfer function A(z)/H(z/γ). The pre-emphasis function H(z/γ) being defined:

$H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$

wherein μ = R(1)/R(0) or μ = -k_1; R(i) being the autocorrelation of the speech signal and k_1 being the first Parcor coefficient. γ being an empirically determined constant, with 0 < γ ≤ 1.

$A(z) = 1 + \sum_{i=1}^{P} a_i z^{-i}$ being a polynomial function wherein a_i is a coefficient determined by an LPC (Linear Predictive Coding) element. Such an LPC element is for example described in the book of L.R. Rabiner and R.W. Schafer "Digital Processing of Speech Signals" pp. 395-453, published by Prentice Hall, 1978. In the expression for A(z), P is an order value, for example P = 10. The form of the transfer function of the weighting filter is not critical as long as it satisfies the following constraints:

(1) the synthesized speech signal has the same spectral behaviour as the original signal

(2) the spectrum of the coding error is shaped according to the speech signal spectrum

(3) the spectrum of the coding error exhibits a flatter average envelope.

For the common weighting filter, the above constraints are satisfied with α = 0.8 to 1; however, the amount of required computation is very large. An immediate simplification would be to use W(z) = A(z) (α = 0, so called "open loop"); with such a choice however, the third constraint is not met due to much larger coding errors in the lower frequency region, which has a great influence on the quality of the reconstructed speech.

The use of a filter with a transfer function A(z)/H(z/γ) enables a substantial reduction of the computation requirements because this transfer function is, as can be seen from the relevant expression, a first order function, whereas the A(z/α) function used in the known filter units is of order 8 to 14 depending on the LPC analysis used. The value of γ satisfies 0 < γ < 1 and is preferably 0.85. A value of γ < 1 has the advantage that a faster response is obtained since there is a more rapid decrease to zero of the spectrum value, thus obtaining a further reduction of the calculation time, without affecting the outputted value. Values of γ < 0.5 are possible but yield a lower quality of the obtained values.
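The closed loop of the first embodiment can be sketched as follows. The sketch assumes that the minimising element tries a finite list of candidate second words exhaustively, which the text does not specify, and it computes μ as R(1)/R(0) exactly as printed above; the sign convention of μ may differ between LPC formulations. All function names are hypothetical.

```python
def autocorr(x, lag):
    """Autocorrelation R(lag) = sum over n of x[n] * x[n - lag]."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def weight_error(e, a, mu, gamma):
    """Filter e with A(z)/H(z/gamma), where H(z/gamma) = 1 + mu*gamma*z^-1."""
    out, prev = [], 0.0
    for n in range(len(e)):
        # Numerator A(z): e[n] + sum_i a_i * e[n-i]
        fir = e[n] + sum(a[i] * e[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        # First-order recursive part 1/H(z/gamma)
        prev = fir - mu * gamma * prev
        out.append(prev)
    return out


def energy(w):
    """Sum of the squares of the filtered error samples."""
    return sum(v * v for v in w)


def select_second_word(first, candidates, a, mu, gamma=0.85):
    """Return (index, energy) of the candidate with the lowest weighted error energy."""
    best_index, best_energy = None, float("inf")
    for k, cand in enumerate(candidates):
        # Coding error word: first word minus candidate second word
        err = [s - c for s, c in zip(first, cand)]
        en = energy(weight_error(err, a, mu, gamma))
        if en < best_energy:
            best_index, best_energy = k, en
    return best_index, best_energy
```

A call such as `select_second_word(first, candidates, a, autocorr(first, 1) / autocorr(first, 0))` returns the index of the candidate that would be fed back to the first unit.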

Figure 2 shows a second embodiment of an LP coding device according to the invention. This second embodiment has a more conventional architecture in comparison with the first embodiment illustrated in figure 1, which has a more theoretical architecture. The main difference between the first and the second embodiment is that in the second embodiment the filter unit 7 has a transfer function 1/H(z/γ) and operates on an excitation error value. The numerator A(z) present in the first embodiment is no longer used in the filter unit but in a short time prediction filter 8 as well as in the first unit 9. This short time prediction filter, which receives the first words supplied at input 2, is provided for determining an ideal excitation value from the inputted first words. To this end the inputted first words are multiplied by the function

$A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$. This operation is performed by the short term prediction filter 8.
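In the time domain, multiplication by A(z) is a finite impulse response filtering of the first words. A minimal sketch, assuming that samples before the start of the signal are zero and using a hypothetical function name:

```python
def short_term_prediction_filter(s, a):
    """Apply A(z) = 1 + sum_i a_i z^-i to the samples s.

    `a` holds [a_1, ..., a_n]; the result is the ideal excitation
    r[k] = s[k] + sum_i a_i * s[k-i].
    """
    p = len(a)
    return [
        s[k] + sum(a[i] * s[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        for k in range(len(s))
    ]
```

For a signal generated by the synthesis filter 1/A(z) from a single impulse, this filter recovers the impulse, which is the sense in which its output is the ideal excitation.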

Since in the second embodiment the filter unit operates on an excitation error value, the first unit 9 produces an estimated excitation value for the ideal excitation value outputted by the short term prediction filter 8. The second embodiment makes it possible to reduce the computation time of the filter unit 7 even further, since the latter now only has to perform the function 1/H(z/γ).

Figure 3 shows a third embodiment of an LP coding device according to the invention. According to this embodiment the first unit 9 comprises an adaptive codebook 12 or a Long Term Prediction unit for forming the ideal excitation value. To the value originating from that adaptive codebook there is assigned a gain factor 8 supplied by unit 11. The use of such an adaptive codebook makes it possible to determine an estimation of the Long Term excitation. The element 10 is an LPC analysis device provided for determining the a_i coefficients.
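The search in the excitation domain can be sketched as follows. The sketch assumes a finite codebook of candidate excitation vectors and a least-squares gain per candidate; the least-squares gain is a common choice in codebook searches but is not stated in the text, and all names are hypothetical. Because the filter is linear and starts from a zero state, filtering the ideal and the candidate excitation separately is equivalent to filtering their difference.

```python
def inverse_h_filter(x, mu, gamma):
    """Filter x with 1/H(z/gamma): y[n] = x[n] - mu * gamma * y[n-1]."""
    y, prev = [], 0.0
    for v in x:
        prev = v - mu * gamma * prev
        y.append(prev)
    return y


def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(u, v))


def search_excitation(ideal, codebook, mu, gamma=0.85):
    """Return (index, gain, energy) minimising the filtered excitation error energy.

    The index is None if no candidate lowers the energy of the filtered target.
    """
    target = inverse_h_filter(ideal, mu, gamma)
    best = (None, 0.0, dot(target, target))
    for k, vec in enumerate(codebook):
        shaped = inverse_h_filter(vec, mu, gamma)
        denom = dot(shaped, shaped)
        if denom == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        gain = dot(target, shaped) / denom  # least-squares gain (assumption)
        err = [t - gain * s for t, s in zip(target, shaped)]
        en = dot(err, err)
        if en < best[2]:
            best = (k, gain, en)
    return best
```

Only one multiplication per sample is spent on filtering each candidate, which is where the saving over a filter of order 8 to 14 arises.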

In this description the term "word" means a sequence of binary samples.

Claims

1. A Linear Prediction coding device having an input for receiving first words obtained by sampling an inputted speech signal and a first unit provided for determining second words representing an estimation of said inputted first words, a second unit having a first, respectively a second input for receiving said first respectively said second word and provided for determining a coding error word from said inputted first and second word, a weighting filter unit having a third input for receiving said coding error word, an output of said filter unit being closed looped with said first unit, characterized in that said weighting filter unit has a transfer function A(z)/H(z/γ) wherein

$A(z) = 1 + \sum_{i=1}^{n} a_i z^{-i}$ and $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient, a_i a prediction coefficient obtained from an LPC analysis and γ an empirical value with 0 < γ ≤ 1.

2. An LP coding device having an input for receiving first words obtained by sampling an inputted speech signal, a short time prediction filter provided for determining an ideal excitation value from inputted first words, a first unit provided for determining an estimated excitation value for said ideal excitation value, a second unit having a first respectively a second input for receiving said ideal respectively said estimated excitation value and provided for determining an excitation error value from said inputted values, a filter unit having a third input for receiving said excitation error value, an output of said filter being closed looped with said first unit, characterized in that said filter unit has a transfer function 1/H(z/γ) wherein $H(z/\gamma) = 1 + \mu (z/\gamma)^{-1}$, μ being the first Parcor coefficient and γ an empirical value with 0 < γ ≤ 1.

3. An LP coding device as claimed in claim 1 or 2, characterized in that γ= 0.85.