COMPENSATION FOR LINEAR FILTERING USING FREQUENCY WEIGHTING FACTORS
FIELD OF INVENTION
The present invention relates to the field of telecommunications. More particularly, the present invention involves estimating the quality of speech signals.
BACKGROUND
In conventional telecommunication systems, the transmission chain over which a speech signal (e.g., a signal carrying a spoken sentence) must pass may include a number of elements, such as speech encoders and decoders, an air interface, public switched telephone network (PSTN) links, computer network links, receive buffering, and signal processing logic. As one skilled in the art will readily appreciate, any one or more of these elements may distort and/or delay the speech signal. Accordingly, it is important to periodically measure the quality of speech signals to ensure that, despite distortion and/or delay, the speech quality exceeds minimum acceptable standards, so that speech signals can be heard and understood by a listener.
Typically, measuring speech quality involves transmitting a reference signal across the transmission chain to a receiving entity. The received speech signal, having been distorted and/or delayed by the various elements that make up the transmission chain, is herein referred to as the test signal. The test signal and the original reference signal are then forwarded to a speech quality measurement algorithm, which measures speech quality by comparing the test signal with the reference signal.
Telecommunication systems typically employ filters. Filters play a particularly important role in heterogeneous systems, that is, systems that contain both analog and digital subsystems. When speech stimuli are fed to a digital system through an analog device, in particular a hands-free device such as a microphone, filters are generally used to reduce environmental noise. In another example, speech encoders and decoders (codecs) in digital systems use band-pass filters to attenuate undesirable energy above and/or below a given frequency band that presumably contains the desired speech information.
In the frequency domain, a filter may, in general, be defined by equation (1) below:
$$Y(w) = H(w)\,X(w) \qquad (1)$$
where Y(w) represents the frequency content of a time domain output signal y(t), X(w) represents the frequency content of a time domain input signal x(t), and H(w) is the filter transfer function. Because the output is the input scaled, frequency by frequency, by a transfer function H(w) that does not depend on the input, such a filter is said to be a linear filter. For the purpose of the present invention, |H(w)| (i.e., the magnitude of the transfer function) is assumed to be stationary in time, that is, the filter remains constant over the duration of a corresponding speech connection (e.g., a telephone call). The aforementioned environmental noise reduction filters and band-pass filters are examples of linear filters.
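Equation (1) can be illustrated numerically. The Python sketch below evaluates Y(w) = H(w)X(w) on a discrete Fourier transform (DFT) grid, using an ideal, zero-phase 300 to 3400 Hz band-pass magnitude as a hypothetical H(w). The band edges, the 8 kHz sampling rate, and the helper names `dft`, `idft`, `band_pass` and `apply_filter` are assumptions for illustration only. Because a DFT is used, the filtering is circular over the block, which is adequate for a demonstration but not how a codec filter would be implemented.

```python
import cmath
import math

FS = 8000.0  # assumed sampling rate in Hz (not specified in the text)


def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex time-domain samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_pass(k, n, low=300.0, high=3400.0):
    """Hypothetical ideal band-pass H: 1 inside [low, high] Hz, 0 outside."""
    freq = min(k, n - k) * FS / n  # bins k and n-k share one physical frequency
    return 1.0 if low <= freq <= high else 0.0


def apply_filter(x, transfer):
    """Equation (1), Y(w) = H(w) X(w), evaluated on the DFT frequency grid."""
    n = len(x)
    big_x = dft(x)
    big_y = [transfer(k, n) * big_x[k] for k in range(n)]
    return [v.real for v in idft(big_y)]  # H is real and symmetric, so y(t) is real


n = 80  # 80 samples at 8 kHz gives a 100 Hz bin spacing
x = [math.sin(2 * math.pi * 100 * t / FS) + math.sin(2 * math.pi * 1000 * t / FS)
     for t in range(n)]
y = apply_filter(x, band_pass)  # the 100 Hz component is removed, 1000 Hz is kept
```

Both test tones fall exactly on DFT bins here, so the output equals the 1000 Hz tone alone up to rounding error.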
Subjective speech quality, that is, the quality of a speech signal as perceived by the listener, is affected by filtering. Filtering can both improve and degrade speech quality. For instance, filtering may improve speech quality by removing undesirable signal components, such as noise, from a speech signal. In contrast, filtering can also degrade speech quality by delaying and/or distorting a speech signal. Ideally, the overall effect of filtering is beneficial. Nevertheless, it is important to take filtering into consideration when measuring speech quality. The problem, however, is that known speech quality algorithms do not effectively assess the influence that filtering, and in particular linear filtering, has on perceived speech quality.
U.S. Patent No. 4,352,182 describes a technique for measuring the quality of digital speech transmission equipment, where the technique attempts to eliminate the effect that linear components have on the measurement. Fig. 1 illustrates this technique. As shown, a signal x is passed through the "system under test" 101, which produces an output signal y. The same signal x is also passed through an adaptive filtering algorithm 105, which attempts to mimic the "system under test" 101 by adjusting a set of filter coefficients C1...Cm. The adaptive filtering algorithm 105 produces a first corrected signal x' and continues to adjust the coefficients C1...Cm until the difference between the output signal y and the first corrected signal x' is minimized. The adjusted coefficients are then used by a filter 110 to produce a reference signal r', which is compared to a test signal s. The speech quality algorithm 115 then generates a signal quality measurement SQ based on a signal-to-noise ratio that is derived by dividing the reference signal r' by the difference between the test signal s and the reference signal r' (i.e., the noise).
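The prior-art arrangement of Fig. 1 can be sketched in the time domain. The sketch below uses a normalized least-mean-squares (NLMS) update as the adaptive filtering algorithm 105; NLMS is our assumption, since the text only says that the coefficients C1...Cm are adjusted until the difference between y and x' is minimized. The three-tap "system under test", the noise level, and all helper names are hypothetical. For simplicity, the input x also serves as the reference r, the output y serves as the test signal s, and SQ is expressed as a power ratio in dB.

```python
import math
import random


def fir(coeffs, signal):
    """Apply FIR taps coeffs to signal, assuming zero initial state."""
    m = len(coeffs)
    return [sum(coeffs[i] * signal[t - i] for i in range(m) if t - i >= 0)
            for t in range(len(signal))]


def nlms_identify(x, y, m, mu=0.5, eps=1e-8):
    """Adapt taps C1..Cm so that the filtered input x' tracks the output y (NLMS)."""
    c = [0.0] * m
    for t in range(len(x)):
        window = [x[t - i] if t - i >= 0 else 0.0 for i in range(m)]  # x(t), x(t-1), ...
        x_prime = sum(ci * wi for ci, wi in zip(c, window))             # corrected signal x'
        err = y[t] - x_prime                                            # y - x'
        norm = eps + sum(wi * wi for wi in window)
        c = [ci + mu * err * wi / norm for ci, wi in zip(c, window)]
    return c


def snr_db(reference, test):
    """Power of r' divided by power of the noise (s - r'), expressed in dB."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((s - r) ** 2 for s, r in zip(test, reference))
    return 10.0 * math.log10(signal_power / noise_power)


random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
system_under_test = [0.6, 0.3, -0.1]  # hypothetical linear part of system 101
# Small additive noise stands in for the distortion that the SNR should capture.
y = [v + random.gauss(0.0, 0.01) for v in fir(system_under_test, x)]
c = nlms_identify(x, y, m=3)
r_prime = fir(c, x)  # filter 110: reference signal r' built with the adapted taps
sq = snr_db(r_prime, y)
```

Because the adapted taps absorb the linear part of the system, only the additive noise (plus a small adaptation error) remains in s - r', which is what keeps linear filtering out of SQ in this prior technique.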
Fig. 2 further illustrates the technique described in U.S. Patent No. 4,352,182. As Fig. 2 shows, this prior technique operates in the time domain, where the filter coefficients C1...Cm are estimated on a frame-by-frame basis. While this technique may adequately support the speech quality algorithm 115, which also operates in the time domain, it would not adequately support a speech quality algorithm operating in the frequency domain. Accordingly, a need exists for a frequency domain based speech quality algorithm that takes the effects of linear filtering into account, so that a more accurate measure of speech quality can be provided.
SUMMARY OF THE INVENTION
It is an objective of the present invention to provide an effective, frequency domain based speech quality algorithm.
It is also an objective of the present invention to provide a frequency domain based, and more specifically a perceptual domain based, speech quality algorithm that is capable of adjusting the results of the speech quality assessment to compensate for the effects of linear filtering in the transmission chain.
In accordance with a first aspect of the present invention, the above-identified and other objectives are achieved through a frequency domain based method and/or apparatus that estimates the linear filtering effects in a telecommunications system. This aspect of the present invention involves generating a frequency domain representation of a first time domain based signal. A frequency domain representation of a second time domain based signal is then generated. A frequency amplitude associated with a frequency vector is then generated as a function of the frequency domain representations of the first and second time domain based signals, where the frequency vector represents a transfer function associated with the linear filtering effects in the telecommunications system.
In accordance with a second aspect of the present invention, the above-identified and other objectives are achieved through a method of measuring speech quality in a telecommunications system. This aspect of the present invention involves estimating each of a plurality of frequency weights based on a frequency domain representation of a test signal and on a frequency domain representation of a reference signal, where each of the frequency weights is associated with a different frequency band, and where the frequency weights reflect the linear filter characteristics of the telecommunications system. An adjusted frequency domain representation of the reference signal is then generated by applying each of the frequency weights to a corresponding component of the frequency domain representation of the reference signal. The frequency domain representation of the test signal and the adjusted frequency domain representation of the reference signal are then compared in the frequency domain, and speech quality is measured based on the comparison.
BRIEF DESCRIPTION OF THE FIGURES
The objectives and advantages of the present invention will be understood by reading the following detailed description in conjunction with the drawings, in which:
Fig. 1 illustrates a prior speech quality measurement technique;
Fig. 2 further illustrates the prior speech quality measurement technique;
Fig. 3 illustrates the process of estimating frequency weights | H(f) | in accordance with exemplary embodiments of the present invention; and
Fig. 4 illustrates a speech quality assessment technique in accordance with exemplary embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention involves a speech quality assessment technique that estimates and eliminates the effects of linear filtering. Furthermore, the speech quality assessment technique operates in the frequency domain, and more particularly in the perceptual domain. The perceptual domain, as one skilled in the art will readily appreciate, is a variant of the frequency domain that is tailored to the human auditory system. Thus, whereas a signal in the frequency domain may be defined by a series of amplitudes and frequency bands, the same signal in the perceptual domain may be defined by a series of "loudness" values and "bark" bins. A bark bin is essentially the same as a frequency band, but with limits that are defined to correspond with the human auditory system.
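The mapping from a frequency in Hz to a bark bin can be sketched as follows. The text does not say which Hz-to-bark formula is used; the Zwicker and Terhardt approximation below is our assumption, while the 1/3-bark resolution and the 56 bins (f = 0 to 55) are the example values given for the preferred embodiment described below. The names `hz_to_bark` and `bark_bin` are hypothetical.

```python
import math

BARK_RESOLUTION = 1.0 / 3.0  # from the text: 1/3-bark resolution
NUM_BINS = 56                # from the text: f ranges from 0 to 55


def hz_to_bark(freq_hz):
    """Critical-band rate in bark (Zwicker and Terhardt approximation; an assumption)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def bark_bin(freq_hz):
    """Index f of the 1/3-bark bin that contains freq_hz, clamped to 0..NUM_BINS-1."""
    index = int(hz_to_bark(freq_hz) / BARK_RESOLUTION)
    return min(max(index, 0), NUM_BINS - 1)


# Show the bark value and bin index for a few frequencies.
for hz in (250, 500, 1000, 2000, 4000):
    print(hz, round(hz_to_bark(hz), 2), bark_bin(hz))
```

Because the bark scale compresses high frequencies, each 1/3-bark bin spans more Hz at high frequencies than at low ones, which is what makes the bins correspond to the resolution of human hearing.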
In general, the speech quality assessment technique compares a test signal and a reference signal in the perceptual domain in order to estimate the amplitudes (i.e., the frequency weights) associated with a linear filtering transfer function |H(f)|. In accordance with exemplary embodiments of the present invention, the test and reference signals are compared over their entire length, such that a single frequency weight is estimated for each bark bin. Fig. 3 illustrates this technique of estimating the frequency weight |H(f)| for each bark bin over the entire length of the test and reference signals.
Fig. 4 illustrates, in greater detail, the speech quality assessment technique in accordance with a preferred embodiment of the present invention. As shown, the technique involves several processes: the process 401 of applying a perceptual model to the test signal y(i) and the reference signal x(i); the process 405 of estimating each frequency weight | H(f) | ; the process 410 of applying each estimated frequency weight | H(f) | to the reference signal X(f,j) in the perceptual domain; and the process 415 of measuring speech quality, in the perceptual domain, based on the test signal Y(f,j) and an adjusted reference signal X'(f,j). The speech quality assessment technique illustrated in Fig. 4 is now described in greater detail herein below.
As shown in Fig. 4, the first process 401 involves the transformation of the test signal y(i) and the reference signal x(i) from the time domain to the perceptual domain. In accordance with the preferred embodiment, the test signal y(i) and the reference signal x(i) are first divided into time frames j, where each time frame comprises, for example, 256 samples with a 50% overlap. The samples of each time frame are then transformed into loudness values for f bark bins, where f, for example, ranges in value from 0 to 55 with a bark resolution of 1/3. Accordingly, the transformation of the reference signal x(i) results in the generation of a bark-time frame matrix X(f,j), where the value associated with each matrix element (f,j) is the "loudness" value associated with the reference signal in bark bin f at time (i.e., time frame) j. Similarly, the transformation of the test signal y(i) results in the generation of a bark-time frame matrix Y(f,j), where the value associated with each matrix element (f,j) represents the "loudness" value associated with the test signal in bark bin f at time j. Fig. 3, which was described previously, illustrates an exemplary bark-time frame matrix.
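The framing step and the layout of the resulting bark-time frame matrix can be sketched as below. The 256-sample frames with 50% overlap are the example values from the text; dropping a partial trailing frame is our assumption. The perceptual model itself (spectrum, 1/3-bark binning, loudness mapping) is not specified here, so `frame_to_loudness` is a hypothetical callback, and the toy model in the usage lines merely checks the [f][j] orientation.

```python
FRAME_LEN = 256         # samples per frame (example value from the text)
HOP = FRAME_LEN // 2    # 50% overlap: each frame starts 128 samples after the last


def split_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split a time-domain signal x(i) or y(i) into overlapping frames j = 0, 1, ...

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    if len(signal) < frame_len:
        return []
    count = 1 + (len(signal) - frame_len) // hop
    return [signal[j * hop: j * hop + frame_len] for j in range(count)]


def bark_time_matrix(frames, frame_to_loudness):
    """Build a bark-time frame matrix M[f][j] from one loudness vector per frame.

    frame_to_loudness is a hypothetical stand-in for the perceptual model
    (spectrum, 1/3-bark binning, loudness mapping), which is not reproduced here.
    """
    columns = [frame_to_loudness(frame) for frame in frames]  # column j per frame
    return [list(row) for row in zip(*columns)]               # transpose to [f][j]


frames = split_frames(list(range(1024)))
toy = bark_time_matrix(frames, lambda fr: [sum(fr), max(fr)])  # toy 2-"bin" model
```

With 1024 samples this yields 1 + (1024 - 256) / 128 = 7 frames, so a real 56-bin model would produce a 56 x 7 matrix.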
The second process 405, illustrated in Fig. 4, involves estimating a vector H(f) based on the loudness values associated with the two bark-time frame matrices X(f,j) and Y(f,j). The vector H(f) actually comprises f frequency weights, where each of the f frequency weights is associated with a corresponding one of the f bark bins. Again, f may range in value from 0-55 in accordance with a preferred embodiment of the present invention.
Further in accordance with the preferred embodiment of the present invention, the vector H(f) may be estimated using equation (2) below:
$$E(f) = \sum_{j=1}^{z} \bigl[\,|H(f)|\,X(f,j) - Y(f,j)\,\bigr]^{2} \qquad (2)$$
where z represents the total number of time frames. More particularly, for each bark bin f, equation (2) is used to derive the value of the frequency weight |H(f)| that minimizes E(f). As one skilled in the art will readily appreciate, given equation (2), E(f) is minimized for a given bark bin f when the corresponding frequency weight |H(f)| multiplied by the reference signal X(f,j) is as close as possible to the test signal Y(f,j) over all z time frames. Because the error in equation (2) is squared, this is a least-squares estimate, and it can be said that a maximum correlation is used to estimate each of the frequency weights |H(f)| associated with the vector H(f). Equation (3) below represents the solution for each frequency weight |H(f)| that minimizes E(f), where the loudness values associated with the bark-time frame matrices X(f,j) and Y(f,j) are known.
$$|H(f)| = \frac{\sum_{j=1}^{z} X(f,j)\,Y(f,j)}{\sum_{j=1}^{z} X(f,j)^{2}} \qquad (3)$$
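Equation (3) follows from equation (2) by setting the derivative of E(f) with respect to |H(f)| to zero. The short derivation below assumes the squared-error form of equation (2) and that X(f,j) is not zero in every frame of bin f (otherwise the denominator vanishes):

$$\frac{\partial E(f)}{\partial |H(f)|} = 2\sum_{j=1}^{z} X(f,j)\,\bigl[\,|H(f)|\,X(f,j) - Y(f,j)\,\bigr] = 0$$

$$|H(f)|\sum_{j=1}^{z} X(f,j)^{2} = \sum_{j=1}^{z} X(f,j)\,Y(f,j) \quad\Longrightarrow\quad |H(f)| = \frac{\sum_{j=1}^{z} X(f,j)\,Y(f,j)}{\sum_{j=1}^{z} X(f,j)^{2}}$$

The second derivative, $2\sum_{j=1}^{z} X(f,j)^{2}$, is positive, so this stationary point is a minimum. Since loudness values are typically non-negative, the resulting weight is also non-negative, as a magnitude should be.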
Referring back to Fig. 4, the third process 410 involves the application of each frequency weight | H(f) | to the reference signal X(f,j). This may be achieved by multiplying each of the frequency weights with each of the loudness values associated with the corresponding bark bin. Thus, for a given bark bin, the process 410 may be implemented in accordance with equation (4) below:
$$X'(f,j) = |H(f)|\,X(f,j) \qquad (4)$$
where f represents the bark bin, and X'(f,j) represents the reference signal that has been adjusted for linear filter effects.
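Equations (3) and (4) translate directly into code. A minimal Python sketch, assuming the bark-time frame matrices are stored as nested lists indexed [f][j]; the function names are hypothetical, and giving weight 1 (no correction) to a bin with no reference loudness is our assumption, since equation (3) is undefined there.

```python
def estimate_weights(x_mat, y_mat):
    """Equation (3): least-squares frequency weight |H(f)| for each bark bin f.

    x_mat[f][j] is the reference loudness X(f,j); y_mat[f][j] is the test loudness Y(f,j).
    """
    weights = []
    for x_row, y_row in zip(x_mat, y_mat):
        numerator = sum(xv * yv for xv, yv in zip(x_row, y_row))
        denominator = sum(xv * xv for xv in x_row)
        # Assumption: a bin with no reference energy gets weight 1 (no correction).
        weights.append(numerator / denominator if denominator > 0 else 1.0)
    return weights


def apply_weights(weights, x_mat):
    """Equation (4): X'(f,j) = |H(f)| X(f,j), one weight per bark bin (row)."""
    return [[w * xv for xv in x_row] for w, x_row in zip(weights, x_mat)]


X = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 0.0, 0.0]]  # 3 bark bins x 3 frames
Y = [[0.5, 1.0, 1.5], [2.0, 2.0, 2.0], [0.0, 0.0, 0.0]]  # bin 0 attenuated by half
weights = estimate_weights(X, Y)
X_adjusted = apply_weights(weights, X)
```

In this toy case the first bin is recovered as a pure attenuation of 0.5, so the adjusted reference matches the test signal exactly in that bin.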
The last process 415 illustrated in Fig. 4 involves measuring speech quality. In accordance with exemplary embodiments of the present invention, the speech quality algorithm compares, in the perceptual domain, the test signal Y(f,j) and the reference signal X'(f,j), where the reference signal X'(f,j) has been adjusted to account for linear filtering effects.
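The text does not specify the comparison performed in process 415. Purely to illustrate why the adjustment matters, the sketch below uses a hypothetical placeholder measure, the mean absolute loudness difference between the test signal and a reference: a pure linear attenuation in one bark bin counts as disturbance against X(f,j) but not against X'(f,j). The measure, the toy matrices, and the already-estimated weights are all assumptions.

```python
def mean_abs_disturbance(test, reference):
    """Hypothetical placeholder measure: mean |Y(f,j) - reference(f,j)| over all cells."""
    diffs = [abs(yv - xv)
             for y_row, x_row in zip(test, reference)
             for yv, xv in zip(y_row, x_row)]
    return sum(diffs) / len(diffs)


reference = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]  # X(f,j): 2 bark bins x 3 frames
test = [[0.5, 1.0, 1.5], [2.0, 2.0, 2.0]]       # Y(f,j): bin 0 halved by a linear filter
weights = [0.5, 1.0]                            # |H(f)| as estimated by process 405
adjusted = [[w * v for v in row] for w, row in zip(weights, reference)]  # X'(f,j), eq. (4)

before = mean_abs_disturbance(test, reference)  # linear filtering counted as disturbance
after = mean_abs_disturbance(test, adjusted)    # linear filtering compensated
```

Here `before` is 0.5 and `after` is 0.0: once the reference is adjusted, the purely linear attenuation no longer penalizes the quality score.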
In accordance with an alternative embodiment of the present invention, the second process 405 may be modified by using fewer than all z time frames when estimating each frequency weight |H(f)| for each bark bin. For example, time frames that contain no speech content may be excluded when estimating each frequency weight |H(f)|. Alternatively, only those time frames that are known to have been correctly transmitted may be used in estimating each frequency weight |H(f)|. In another alternative embodiment, a determination is made as to whether the transmission chain includes linear filtering. For instance, if, in estimating each frequency weight |H(f)| during the process 405, it is established that each of the frequency weights equals, or due to noise approximately equals, the value "1", it may be reasonable to set all frequency weights equal to "1", as there may be a high probability that the transmission chain contains no linear filtering.

It should be noted that the present invention has been described in accordance with exemplary embodiments, which are intended to be illustrative in all aspects rather than restrictive. For example, it is not intended that the present invention be limited to a specific domain, such as the perceptual domain; the present invention is equally applicable to the time-mel domain or any other time-frequency based domain. Thus, the present invention is capable of many variations in detailed implementation, which may be derived from the description contained herein by a person of ordinary skill in the art. All such variations are considered to be within the scope and spirit of the present invention as defined by the following claims.
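The two alternative embodiments of process 405 described above, restricting the estimate to selected time frames and snapping near-unity weights to exactly "1", can be sketched as follows. How frames are classified (speech activity, correct reception) is outside this sketch, so the boolean mask `use_frame` is assumed to be supplied; the tolerance of 0.05 is a hypothetical value, as the text gives none.

```python
def estimate_weights_masked(x_mat, y_mat, use_frame):
    """Equation (3) restricted to frames j with use_frame[j] True
    (e.g. frames with speech content, or frames known to be correctly received)."""
    weights = []
    for x_row, y_row in zip(x_mat, y_mat):
        pairs = [(xv, yv) for xv, yv, keep in zip(x_row, y_row, use_frame) if keep]
        numerator = sum(xv * yv for xv, yv in pairs)
        denominator = sum(xv * xv for xv, _ in pairs)
        # Assumption: weight 1 (no correction) when no selected frame has energy.
        weights.append(numerator / denominator if denominator > 0 else 1.0)
    return weights


def snap_to_unity(weights, tolerance=0.05):
    """If every weight is approximately 1, assume no linear filtering and use exactly 1.

    The tolerance is a hypothetical value; the text does not give one.
    """
    if all(abs(w - 1.0) <= tolerance for w in weights):
        return [1.0] * len(weights)
    return list(weights)


X = [[1.0, 2.0, 3.0, 4.0]]        # one bark bin, four frames
Y = [[0.5, 1.0, 1.5, 40.0]]       # frame 3 is corrupted in transmission
keep = [True, True, True, False]  # exclude the corrupted frame
masked = estimate_weights_masked(X, Y, keep)
```

Excluding the corrupted frame lets the estimate recover the true attenuation of 0.5 instead of being pulled upward by the outlier.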