WO2001065543A1

WO2001065543A1 - Compensation for linear filtering using frequency weighting factors

Info

Publication number: WO2001065543A1
Application number: PCT/SE2001/000393
Authority: WO
Inventors: Anders Karlsson; Jonas Lundberg; Arne Steinarson
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2000-02-29
Filing date: 2001-02-22
Publication date: 2001-09-07
Also published as: AU2001236293A1

Abstract

Measuring the quality of speech signals in a telecommunications system is more effectively accomplished by first estimating and eliminating the effects of linear filtering associated with the telecommunications system. To do so, frequency domain representations of a test speech signal and of a reference speech signal are generated. The frequency domain representations may take the form of bark-time frame matrices. Each of a plurality of frequency weights is then estimated based on the frequency domain representations of the test and reference speech signals, where each frequency weight corresponds to a different frequency band and the frequency weights reflect the linear filter characteristics of the telecommunications system. The frequency domain representation of the reference signal is then adjusted by applying each frequency weight to the corresponding component of the frequency domain representation of the reference signal. The frequency domain representation of the test signal and the adjusted frequency domain representation of the reference signal are compared, and speech quality is measured based on the comparison.

Description

COMPENSATION FOR LINEAR FILTERING USING FREQUENCY WEIGHTING FACTORS

FIELD OF INVENTION

The present invention relates to the field of telecommunications. More particularly, the present invention involves estimating the quality of speech signals.

BACKGROUND

In conventional telecommunication systems, the transmission chain over which a speech signal (e.g., a signal carrying a spoken sentence) must pass may include a number of elements, such as speech encoders, speech decoders, an air interface, public switched telephone network (PSTN) links, computer network links, receive buffering, and signal processing logic. As one skilled in the art will readily appreciate, any one or more of these elements may distort and/or delay the speech signal. Accordingly, it is important to periodically measure the quality of speech signals to ensure that, despite distortion and/or delay, the speech quality exceeds minimum acceptable standards, so that speech signals can be heard and understood by a listener.

Typically, measuring speech quality involves transmitting a reference signal across the transmission chain to a receiving entity. The received speech signal, having been distorted and/or delayed by the various elements that make up the transmission chain, is herein referred to as the test signal. The test signal and the original reference signal are then forwarded to a speech quality measurement algorithm, which measures speech quality by comparing the test signal with the reference signal.

Telecommunications systems typically employ filters. Filters play a particularly important role in heterogeneous systems, that is, systems that contain both analog and digital subsystems. When speech stimuli are fed to a digital system through an analog device, in particular a hands-free device such as a microphone, filters are generally used to reduce environmental noise. In another example, speech encoders and decoders (codecs) in digital systems use band-pass filters to attenuate undesirable energy above and/or below a given frequency band, which presumably contains the desired speech information.

In the frequency domain, a filter may, in general, be defined by equation (1) below:

Y(w) = H(w)X(w) (1)

where Y(w) represents the frequency content of a time domain output signal y(t), X(w) represents the frequency content of a time domain input signal x(t), and H(w) is the filter transfer function. Because the output spectrum is the input spectrum scaled by a transfer function that does not depend on the input, such a filter is a linear filter, and |H(w)| (i.e., the magnitude of the transfer function) describes how strongly the filter passes or attenuates each frequency w. For the purpose of the present invention, |H(w)| is assumed to be stationary in time, that is, the filter remains constant over the duration of a corresponding speech connection (e.g., a telephone call). The aforementioned environmental noise reduction filters and band-pass filters are examples of linear filters.
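Equation (1) can be checked numerically. The sketch below filters a signal once by time-domain convolution and once by multiplying spectra as in equation (1), then confirms the two outputs agree. The 3-tap impulse response and the random input are illustrative assumptions, not part of the original description. Zero-padding both FFTs to the full convolution length is what makes the spectral product equal to ordinary (linear) convolution rather than circular convolution.

```python
import numpy as np

def filter_time_domain(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Filter x with the FIR impulse response h by direct linear convolution."""
    return np.convolve(x, h)  # full output length: len(x) + len(h) - 1

def filter_frequency_domain(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Filter x with h by multiplying spectra, Y(w) = H(w) X(w), as in equation (1)."""
    n = len(x) + len(h) - 1           # zero-pad so the product matches linear convolution
    X = np.fft.rfft(x, n)             # X(w)
    H = np.fft.rfft(h, n)             # H(w), the transfer function
    return np.fft.irfft(H * X, n)     # back to the time domain: y(t)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)           # stand-in input signal x(t)
h = np.array([0.5, 0.3, 0.2])         # hypothetical 3-tap low-pass impulse response
y_time = filter_time_domain(x, h)
y_freq = filter_frequency_domain(x, h)
print(np.allclose(y_time, y_freq))    # True
```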

Subjective speech quality, that is, the quality of a speech signal as perceived by the listener, is affected by filtering. Filtering can both improve and degrade speech quality. For instance, filtering may improve speech quality by removing undesirable signal components, such as noise, from a speech signal. In contrast, filtering can also degrade speech quality by delaying and/or distorting a speech signal. Ideally, the overall effect of filtering is beneficial. Nevertheless, it is important to take filtering into consideration when measuring speech quality. The problem, however, is that known speech quality algorithms do not effectively assess the influence that filtering, and in particular linear filtering, has on perceived speech quality.

U.S. Patent No. 4,352,182 describes a technique for measuring the quality of digital speech transmission equipment that attempts to eliminate the effect that linear components have on the measurement. Fig. 1 illustrates this technique. As shown, a signal x is passed through the "system under test" 101, which produces an output signal y. The same signal x is also passed through an adaptive filtering algorithm 105, which attempts to mimic the "system under test" 101 by adjusting a set of filter coefficients C_1, ..., C_m. The adaptive filtering algorithm 105 produces a first corrected signal x' and continues to adjust the coefficients C_1, ..., C_m until the difference between the output signal y and the first corrected signal x' is minimized. The adjusted coefficients are then used by a filter 110 to produce a reference signal r', which is compared to a test signal s. The speech quality algorithm 115 then generates a signal quality measurement SQ based on a signal-to-noise ratio obtained by dividing the reference signal r' by the difference between the test signal s and the reference signal r' (i.e., the noise).
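The prior-art loop of Fig. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of U.S. Patent No. 4,352,182: the least-mean-squares (LMS) update, the step size `mu`, the hypothetical 3-tap system and the decibel power-ratio form of SQ are all choices made here for the example. The sketch adapts coefficients until x' tracks y, reuses them to build r', and reports the power of r' relative to the power of s - r'.

```python
import numpy as np

def lms_identify(x, y, num_taps, mu=0.01):
    """Adapt FIR coefficients C_1..C_m so that the filtered input x' tracks y.

    Uses the least-mean-squares (LMS) update, an assumed stand-in for the
    adaptive filtering algorithm 105.
    """
    c = np.zeros(num_taps)
    buf = np.zeros(num_taps)            # x[n], x[n-1], ..., newest sample first
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        x_corr = c @ buf                # first corrected signal x'[n]
        err = y[n] - x_corr             # mismatch between system output and x'
        c += mu * err * buf             # gradient step on the instantaneous squared error
    return c

def sq_snr_db(r, s):
    """SQ as reference power over the power of the noise s - r', in dB (assumed form)."""
    noise = s - r
    return 10.0 * np.log10(np.sum(r ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(1)
x = rng.standard_normal(20000)
h_sys = np.array([0.8, -0.3, 0.1])      # hypothetical linear part of the system under test 101
s = np.convolve(x, h_sys)[: len(x)] + 0.01 * rng.standard_normal(len(x))  # output y, used as test signal s
c = lms_identify(x, s, num_taps=3)
r = np.convolve(x, c)[: len(x)]         # filter 110 builds the reference r' from the adapted coefficients
print(np.round(c, 2), round(sq_snr_db(r, s), 1))
```

Because the whole loop runs sample by sample on time-domain signals, it illustrates why this style of compensation pairs naturally with a time-domain quality measure.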

Fig. 2 further illustrates the technique described in U.S. Patent No. 4,352,182. As Fig. 2 shows, this prior technique operates in the time domain, where the filter coefficients C_1, ..., C_m are estimated on a frame-by-frame basis. While this technique may adequately support the speech quality algorithm 115, which also operates in the time domain, it would not adequately support a speech quality algorithm operating in the frequency domain. Accordingly, a need exists for a frequency domain based speech quality algorithm that takes the effects of linear filtering into account, so that a more accurate measure of speech quality can be provided.

SUMMARY OF THE INVENTION

It is an objective of the present invention to provide an effective, frequency domain based speech quality algorithm.

It is also an objective of the present invention to provide a frequency domain based, and more specifically a perceptual domain based, speech quality algorithm that is capable of adjusting the results of the speech quality assessment to compensate for the effects of linear filtering in the transmission chain.

In accordance with a first aspect of the present invention, the above-identified and other objectives are achieved through a frequency domain based method and/or apparatus that estimates the linear filtering effects in a telecommunications system. This aspect of the present invention involves generating a frequency domain representation of a first time domain based signal and a frequency domain representation of a second time domain based signal. A frequency amplitude associated with a frequency vector is then estimated as a function of the frequency domain representations of the first and second time domain based signals, where the frequency vector represents a transfer function associated with the linear filtering effects in the telecommunications system.

In accordance with a second aspect of the present invention, the above-identified and other objectives are achieved through a method of measuring speech quality in a telecommunications system. This aspect of the present invention involves estimating each of a plurality of frequency weights based on a frequency domain representation of a test signal and on a frequency domain representation of a reference signal, where each of the frequency weights is associated with a different frequency band, and where the frequency weights reflect the linear filter characteristics of the telecommunications system. An adjusted frequency domain representation of the reference signal is then generated by applying each of the frequency weights to a corresponding component of the frequency domain representation of the reference signal. The frequency domain representation of the test signal and the adjusted frequency domain representation of the reference signal are then compared in the frequency domain, and speech quality is measured based on the comparison.

BRIEF DESCRIPTION OF THE FIGURES

The objectives and advantages of the present invention will be understood by reading the following detailed description in conjunction with the drawings, in which:

Fig. 1 illustrates a prior speech quality measurement technique;

Fig. 2 further illustrates the prior speech quality measurement technique;

Fig. 3 illustrates the process of estimating frequency weights |H(f)| in accordance with exemplary embodiments of the present invention; and

Fig. 4 illustrates a speech quality assessment technique in accordance with exemplary embodiments of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention involves a speech quality assessment technique that estimates and eliminates the effects of linear filtering. Furthermore, the speech quality assessment technique operates in the frequency domain, and more particularly, in the perceptual domain. The perceptual domain, as one skilled in the art will readily appreciate, is a variant of the frequency domain that characterizes a signal as it relates specifically to the human auditory system. Thus, where a signal in the frequency domain may be defined by a series of amplitudes and frequency bands, the same signal in the perceptual domain may be defined by a series of "loudness" values and "bark" bins. A bark bin is, essentially, a frequency band whose limits are defined to correspond with the human auditory system.
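The description does not say which Bark mapping the perceptual model uses. As one common choice, assumed here, the Zwicker and Terhardt (1980) approximation converts frequency in Hz to Bark; dividing the result by the 1/3-Bark resolution given for process 401 then yields a bark bin index. The function names below are illustrative, not from the original.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt (1980) approximation of the critical-band (Bark) scale."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_bin_index(f_hz, resolution=1.0 / 3.0):
    """Map a frequency to a bark bin of width `resolution` Bark (1/3 Bark by default)."""
    return np.floor(hz_to_bark(f_hz) / resolution).astype(int)

freqs = np.array([100.0, 1000.0, 4000.0])
print(np.round(hz_to_bark(freqs), 2))
print(bark_bin_index(freqs))
```

Bins are narrow in Hz at low frequencies and wide at high frequencies, which is what distinguishes a bark bin from an ordinary, uniformly spaced frequency band.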

In general, the speech quality assessment technique compares a test signal and a reference signal in the perceptual domain in order to estimate the amplitudes (i.e., the frequency weights) associated with a linear filtering transfer function |H(f)|. In accordance with exemplary embodiments of the present invention, the test and reference signals are compared over their entire length, such that a single frequency weight is estimated for each bark bin. Fig. 3 illustrates this technique of estimating the frequency weight |H(f)| for each bark bin over the entire length of the test and reference signals.

Fig. 4 illustrates, in greater detail, the speech quality assessment technique in accordance with a preferred embodiment of the present invention. As shown, the technique involves several processes: the process 401 of applying a perceptual model to the test signal y(i) and the reference signal x(i); the process 405 of estimating each frequency weight |H(f)|; the process 410 of applying each estimated frequency weight |H(f)| to the reference signal X(f,j) in the perceptual domain; and the process 415 of measuring speech quality, in the perceptual domain, based on the test signal Y(f,j) and an adjusted reference signal X'(f,j). The speech quality assessment technique illustrated in Fig. 4 is described in greater detail below.

As shown in Fig. 4, the first process 401 involves transforming the test signal y(i) and the reference signal x(i) from the time domain to the perceptual domain. In accordance with the preferred embodiment, the test signal y(i) and the reference signal x(i) are first divided into time frames indexed by j, where each time frame comprises, for example, 256 samples with a 50% overlap. The samples of each time frame are then transformed into values for bark bins indexed by f, where f, for example, ranges in value from 0 to 55 with a bark resolution of 1/3. Accordingly, the transformation of the reference signal x(i) results in a bark-time frame matrix X(f,j), where the value of each matrix element (f,j) is the "loudness" value of the reference signal in bark bin f at time (i.e., time frame) j. Similarly, the transformation of the test signal y(i) results in a bark-time frame matrix Y(f,j), where the value of each matrix element (f,j) is the "loudness" value of the test signal in bark bin f at time j. Fig. 3, which was described previously, illustrates an exemplary bark-time frame matrix.
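A bark-time frame matrix of the kind described for process 401 can be sketched as follows. The 256-sample frames, 50% overlap, 56 bins and 1/3-Bark resolution come from the text; the 16 kHz sampling rate, the Hann window, the Zwicker and Terhardt Bark mapping and the use of summed band energy in place of a true loudness model are assumptions made here, since the perceptual model itself is not specified.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark scale (an assumed choice)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_time_matrix(signal, fs=16000, frame_len=256, num_bins=56, resolution=1.0 / 3.0):
    """Return a (num_bins, num_frames) matrix; element (f, j) is the energy of bin f in frame j.

    Band energy is a stand-in for the loudness produced by the unspecified perceptual model.
    """
    hop = frame_len // 2                                  # 50% overlap
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)        # frequency of each FFT bin in Hz
    bins = np.floor(hz_to_bark(freqs) / resolution).astype(int)
    keep = bins < num_bins                                # drop FFT bins above the top bark bin
    out = np.zeros((num_bins, num_frames))
    for j in range(num_frames):
        frame = signal[j * hop : j * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the power of all FFT bins that fall into each bark bin.
        out[:, j] = np.bincount(bins[keep], weights=power[keep], minlength=num_bins)
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal(16000)   # one second of noise standing in for a reference signal x(i)
X = bark_time_matrix(x)
print(X.shape)                   # (56, 124)
```

With 256-sample frames at the assumed 16 kHz rate, the FFT resolution is 62.5 Hz, so a few of the narrow low-frequency bark bins (bin 2, for example) receive no FFT bins and stay at zero; a real perceptual model would typically use longer frames or spreading to avoid such gaps.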

The second process 405, illustrated in Fig. 4, involves estimating a vector H(f) based on the loudness values of the two bark-time frame matrices X(f,j) and Y(f,j). The vector H(f) comprises one frequency weight |H(f)| for each bark bin f. Again, f may range in value from 0 to 55 in accordance with a preferred embodiment of the present invention.

Further in accordance with the preferred embodiment of the present invention, the vector H(f) may be estimated using equation (2) below:

$$E(f) = \sum_{j=1}^{z} \bigl[\, |H(f)|\,X(f,j) - Y(f,j) \,\bigr]^{2} \qquad (2)$$

where z represents the total number of time frames. More particularly, for each bark bin f, equation (2) is used to derive the value of the frequency weight |H(f)| that minimizes E(f). As one skilled in the art will readily appreciate, E(f) is minimized for a given bark bin f when the corresponding frequency weight |H(f)| multiplied by the reference signal X(f,j) is as close as possible to the test signal Y(f,j) over all z time frames. Because the deviation in equation (2) is squared, this is a least-squares fit, and it can be said that a maximum correlation is used to estimate each of the frequency weights |H(f)| of the vector H(f). Equation (3) below represents the solution for each frequency weight |H(f)| that minimizes E(f), where the loudness values of the bark-time frame matrices X(f,j) and Y(f,j) are known.

$$|H(f)| = \frac{\sum_{j=1}^{z} X(f,j)\,Y(f,j)}{\sum_{j=1}^{z} X(f,j)^{2}} \qquad (3)$$
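The closed form in equation (3) follows from equation (2). For a fixed bark bin f, E(f) is a quadratic in the single unknown |H(f)|, so setting its derivative to zero gives:

$$\frac{\partial E(f)}{\partial |H(f)|} = 2\sum_{j=1}^{z}\bigl[\,|H(f)|\,X(f,j) - Y(f,j)\,\bigr]\,X(f,j) = 0 \;\Longrightarrow\; |H(f)|\sum_{j=1}^{z} X(f,j)^{2} = \sum_{j=1}^{z} X(f,j)\,Y(f,j)$$

Dividing both sides by $\sum_{j} X(f,j)^{2}$ yields equation (3). The second derivative, $2\sum_{j} X(f,j)^{2}$, is positive whenever the reference signal has nonzero loudness in bin f, so this stationary point is the minimum; if the reference has no loudness at all in bin f, equation (3) is undefined and the weight for that bin has to be handled separately.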

Referring back to Fig. 4, the third process 410 involves the application of each frequency weight |H(f)| to the reference signal X(f,j). This may be achieved by multiplying each frequency weight by each of the loudness values in the corresponding bark bin. Thus, for a given bark bin, the process 410 may be implemented in accordance with equation (4) below:

$$X'(f,j) = |H(f)|\,X(f,j) \qquad (4)$$

where f represents the bark bin, j represents the time frame, and X'(f,j) represents the reference signal that has been adjusted for linear filter effects.
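Equations (3) and (4) reduce to a few array operations. The sketch below assumes the bark-time matrices are NumPy arrays of shape (bins, frames); the synthetic data, the hypothetical per-bin gains `H_true`, and the `eps` guard for bins where the reference is silent are assumptions for illustration. When the test matrix is exactly a per-bin scaling of the reference, the estimate recovers that scaling and the adjusted reference matches the test matrix.

```python
import numpy as np

def estimate_weights(X, Y, eps=1e-12):
    """Equation (3): least-squares weight |H(f)| for every bark bin f.

    X, Y: (num_bins, num_frames) bark-time matrices of the reference and test signals.
    Bins where the reference is silent get weight 0 (the eps guard is an assumption).
    """
    num = np.sum(X * Y, axis=1)            # sum over j of X(f,j) * Y(f,j)
    den = np.sum(X ** 2, axis=1)           # sum over j of X(f,j)^2
    return num / np.maximum(den, eps)

def apply_weights(H, X):
    """Equation (4): X'(f,j) = |H(f)| X(f,j), broadcasting one weight across each row."""
    return H[:, np.newaxis] * X

rng = np.random.default_rng(3)
X = rng.random((56, 200))                   # stand-in reference loudness matrix X(f,j)
H_true = np.linspace(0.5, 1.5, 56)          # hypothetical per-bin gains of the channel
Y = H_true[:, np.newaxis] * X               # test matrix Y(f,j): reference scaled per bin
H_est = estimate_weights(X, Y)
X_adj = apply_weights(H_est, X)
print(np.allclose(H_est, H_true), np.allclose(X_adj, Y))   # True True
```

Comparing Y(f,j) against X_adj rather than X means a purely linear, per-bin gain in the channel no longer registers as a difference, which is the intent of process 410.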

The last process 415 illustrated in Fig. 4 involves measuring speech quality. In accordance with exemplary embodiments of the present invention, the speech quality algorithm compares, in the perceptual domain, the test signal Y(f,j) and the reference signal X'(f,j), where the reference signal X'(f,j) has been adjusted to account for linear filtering effects.

In accordance with an alternative embodiment of the present invention, the second process 405 may be modified by using fewer than all z time frames when estimating each frequency weight |H(f)| for each bark bin. For example, time frames that contain no speech content may be excluded when estimating each frequency weight |H(f)|. Alternatively, only those time frames known to have been correctly transmitted may be used in estimating each frequency weight |H(f)|. In another alternative embodiment, a determination is made as to whether the transmission chain includes linear filtering. For instance, if, in estimating each frequency weight |H(f)| during the process 405, it is established that each of the frequency weights equals or, due to noise, approximately equals "1", it may be reasonable to set all frequency weights equal to "1", as there may be a high probability that the transmission chain contains no linear filtering.

It should be noted that the present invention has been described in accordance with exemplary embodiments, which are intended to be illustrative in all aspects, rather than restrictive. For example, it is not intended that the present invention be limited to a specific domain, such as the perceptual domain; the present invention is equally applicable to the time-mel domain or any other time-frequency based domain. Thus, the present invention is capable of many variations in detailed implementation, which may be derived from the description contained herein by a person of ordinary skill in the art. All such variations are considered to be within the scope and spirit of the present invention as defined by the following claims.
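The alternative embodiments of process 405, restricting the estimate to selected time frames and snapping all weights to "1" when no linear filtering is detected, can be sketched as below. The energy-threshold speech detector, its -30 dB threshold and the 0.05 tolerance are hypothetical choices, not taken from the description; a mask of correctly transmitted frames (for example, from frame-erasure flags) could be passed in the same way.

```python
import numpy as np

def estimate_weights(X, Y, frames):
    """Equation (3) evaluated only over the selected time frames (boolean mask over j)."""
    Xs, Ys = X[:, frames], Y[:, frames]
    return np.sum(Xs * Ys, axis=1) / np.maximum(np.sum(Xs ** 2, axis=1), 1e-12)

def speech_frames(X, rel_threshold_db=-30.0):
    """Hypothetical energy-based speech detector: keep frames within
    rel_threshold_db of the most energetic reference frame."""
    frame_energy = X.sum(axis=0)
    return frame_energy >= frame_energy.max() * 10.0 ** (rel_threshold_db / 10.0)

def snap_to_unity(H, tol=0.05):
    """If every weight is within tol of 1, assume no linear filtering and use all ones."""
    if np.all(np.abs(H - 1.0) <= tol):
        return np.ones_like(H)
    return H

rng = np.random.default_rng(4)
X = rng.random((56, 100))
X[:, :20] *= 1e-6                                      # 20 near-silent leading frames
Y = X * (1.0 + 0.01 * rng.standard_normal(X.shape))    # unfiltered channel with mild noise
mask = speech_frames(X)
H = snap_to_unity(estimate_weights(X, Y, mask))
print(int(mask.sum()), H.min(), H.max())
```

Snapping to exactly 1 avoids "correcting" the reference for small, noise-induced deviations when the chain most likely contains no linear filter at all.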

Claims

WHAT IS CLAIMED IS:

1. Frequency domain based method for estimating linear filtering effects in a telecommunications system comprising the steps of: generating a frequency domain representation of a first time domain based signal; generating a frequency domain representation of a second time domain based signal; and estimating a frequency amplitude associated with a frequency vector as a function of the frequency domain representations of the first and second time domain based signals, wherein the frequency vector represents a transfer function associated with the linear filtering effects in the telecommunications system.

2. The method of claim 1 wherein the frequency domain representations of the first and second time domain based signals are bark-time frame matrices.

3. The method of claim 1 wherein said step of estimating the frequency amplitude associated with the frequency vector comprises the step of: minimizing an error value, wherein the error value is a function of the frequency domain representation of the first time domain based signal and a product of the frequency domain representation of the second time domain based signal and the frequency amplitude associated with the frequency vector.

4. The method of claim 3, wherein said step of minimizing the error value comprises the steps of: deriving the product of the frequency domain representation of the first time domain based signal and the frequency domain representation of the second time domain based signal, for each of a number of time periods associated with the first and second time domain based signals; and summing the products, wherein each product is associated with a different one of the time periods.

5. The method of claim 1 further comprising the step of: estimating each of a number of additional frequency amplitudes associated with the frequency vector, wherein each frequency amplitude corresponds with a different frequency band, and wherein each frequency amplitude is estimated based on samples from a plurality of time periods associated with the first and second time domain based signals.

6. A method for measuring speech quality in a telecommunications system comprising the steps of: estimating each of a plurality of frequency weights based on a frequency domain representation of a test signal and on a frequency domain representation of a reference signal, wherein each of the plurality of frequency weights is associated with a different frequency band, and wherein the frequency weights reflect the linear filter characteristics of said telecommunications system; generating an adjusted frequency domain representation of the reference signal by applying each of the plurality of frequency weights to a corresponding component of the frequency domain representation of the reference signal; comparing, in the frequency domain, the frequency representation of the test signal and the adjusted, frequency domain representation of the reference signal; and measuring speech quality based on said comparison of the frequency representation of the test signal and the adjusted, frequency domain representation of the reference signal.

7. The method of claim 6 further comprising the step of: generating a frequency domain representation of the test signal and a frequency domain representation of the reference signal.

8. The method of claim 7, wherein said step of generating a frequency domain representation of the test signal and a frequency domain representation of the reference signal comprises the steps of: applying a perceptual domain model to the test signal; and applying the perceptual domain model to the reference signal.

9. The method of claim 6, wherein the frequency domain representation of the test signal and the frequency domain representation of the reference signal are bark-time frame matrices.

10. The method of claim 9, wherein each of the different frequency bands, with which each frequency weight is associated, corresponds to a different bark bin.

11. The method of claim 10, wherein said step of estimating each of the plurality of frequency weights based on the frequency domain representation of the test signal and on the frequency domain representation of the reference signal comprises the step of: approximating one of said frequency weights based on loudness values associated with the test signal and the reference signal at each of a number of time frames in a corresponding one of the different bark bins.

12. The method of claim 11, wherein only time frames containing speech information are employed in approximating the frequency weight.

13. The method of claim 6, wherein said step of estimating each of the plurality of frequency weights based on the frequency domain representation of the test signal and on the frequency domain representation of the reference signal comprises the step of: minimizing an error value, wherein the error value is a function of the frequency domain representation of the test signal, the frequency domain representation of the reference signal and a corresponding one of the frequency weights.

14. The method of claim 13, wherein said step of minimizing the error value comprises the steps of: deriving the product of the frequency domain representation of the test signal and the frequency domain representation of the reference signal, for each of a number of time frames associated with the test and reference signals; and summing the products associated with each of the time frames.

15. The method of claim 14 further comprising the steps of: deriving a reference signal value by raising a frequency domain representation of the reference signal, associated with a given one of the time frames, to an exponential power equivalent to a corresponding time frame number; repeating said step of deriving a reference signal value for each of the time frames; and dividing the sum of products by the sum of the reference signal values.

16. The method of claim 6, wherein said step of generating an adjusted frequency domain representation of the reference signal, by applying each of the plurality of frequency weights to a corresponding component of the frequency domain representation of the reference signal, comprises the step of: applying one of the plurality of frequency weights to a corresponding component of the reference signal, wherein the corresponding component of the reference signal entails a number of amplitudes, each being associated with a corresponding time frame in a given frequency band.

17. The method of claim 16, wherein said given frequency band is a bark bin.

18. The method of claim 6 further comprising the step of: determining whether the telecommunications system exhibits any linear filtering characteristics.

19. The method of claim 18, wherein said step of determining whether the telecommunications system exhibits any linear filtering characteristics comprises the step of: determining whether the estimated value associated with each of the frequency weights is equal to, or substantially equal to, "1".

20. The method of claim 19 further comprising the step of: setting the value of each frequency weight to "1", prior to the step of generating an adjusted frequency domain representation of the reference signal by applying each of the plurality of frequency weights to a corresponding component of the frequency domain representation of the reference signal, if it is determined that the estimated value associated with each of the frequency weights is equal to, or substantially equal to, "1".

21. Apparatus for estimating linear filtering effects in a telecommunications system comprising: means for generating a frequency domain representation of a first time domain based signal; means for generating a frequency domain representation of a second time domain based signal; and means for estimating a frequency amplitude associated with a frequency vector as a function of the frequency domain representations of the first and second time domain based signals, wherein the frequency vector represents a transfer function associated with the linear filtering effects in the telecommunications system.

22. The apparatus of claim 21, wherein the frequency domain representations of the first and second time domain based signals are bark-time frame matrices.

23. The apparatus of claim 21, wherein said means for estimating the frequency amplitude associated with the frequency vector comprises: means for minimizing an error value, wherein the error value is a function of the frequency domain representation of the first time domain based signal and a product of the frequency domain representation of the second time domain based signal and the frequency amplitude associated with the frequency vector.

24. The apparatus of claim 23, wherein said means for minimizing the error value comprises: means for deriving the product of the frequency domain representation of the first time domain based signal and the frequency domain representation of the second time domain based signal, for each of a number of time periods associated with the first and second time domain based signals; and means for summing the products, wherein each product is associated with a different one of the time periods.

25. The apparatus of claim 21 further comprising: means for estimating each of a number of additional frequency amplitudes associated with the frequency vector, wherein each frequency amplitude corresponds with a different frequency band, and wherein each frequency amplitude is estimated based on samples from a plurality of time periods associated with the first and second time domain based signals.