EP0720146A1 - A method for measuring speech masking properties - Google Patents

A method for measuring speech masking properties Download PDF

Info

Publication number
EP0720146A1
Authority
EP
European Patent Office
Prior art keywords
signal
noise
subband
power
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP95309003A
Other languages
German (de)
French (fr)
Inventor
Yair Shoham
Casimir Wierzynski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp
Publication of EP0720146A1
Withdrawn

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G10L19/0208Subband vocoders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method measures the masking properties of subband components of a signal and determines a noise level vector for the signal. In the preferred embodiment, a signal is separated to yield a set of subband signal components. Bandpass noise components are also generated. For each combination of bandpass noise and subband signal component, the value of the noise-to-signal ratio that meets a specified masking criterion is determined. The values from the combinations are stored. Then, a noise level vector for any other signal can be determined by filtering the signal into a set of components, accessing the stored values and combining the values to yield a measure of the masking properties of the other signal.

Description

    Technical Field
  • The invention relates to a method for measuring masking properties of components of a signal and for determining a noise level vector for the signal.
  • Background of the Invention
  • Advances in digital networks such as ISDN (Integrated Services Digital Network) have rekindled interest in the transmission of high quality image and sound. In an age of compact discs and high-definition television, the trend toward higher and higher fidelity has come to include the telephone as well.
  • Aside from pure listening pleasure, there is a need for better sounding telephones, especially in the business world. Traditional telephony, with its limited bandwidth of 300-3000 Hz for transmission of narrowband speech, tends to strain listeners over the length of a telephone conversation. Wideband speech in the 50-7000 Hz range, on the other hand, offers listeners a feeling of more presence (by reason of transmission of signals in the 50-300 Hz range) and more intelligibility (by reason of transmission of signals in the 3000-7000 Hz range) and is more easily tolerated over longer periods. Thus, wider bandwidth speech transmission is a natural choice for improving the quality of telephone service.
  • In order to transmit speech (either wideband or narrowband) over the telephone network, an input speech signal, which can be characterized as a continuous function of a continuous time variable, must be converted to a digital signal -- a signal that is discrete in both time and amplitude. The conversion is a two-step process. First, the input speech signal is sampled periodically in time (i.e. at a particular rate) to produce a sequence of samples where the samples take on a continuum of values. Then the values are quantized to a finite set of values, represented by binary digits (bits), to yield the digital signal. The digital signal is characterized by a bit rate, i.e. a specified number of bits per second that reflects how often the input speech signal was sampled and how many bits were used to quantize the sampled values.
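  • As an illustrative example (these particular figures are not taken from the description above), wideband speech sampled at 16,000 samples per second and quantized with 16 bits per sample yields a bit rate of 16,000 × 16 = 256,000 bits per second (256 kbit/s); halving either the sampling rate or the number of bits per sample halves the bit rate.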
  • The improved quality of telephone service made possible through transmission of wideband speech, unfortunately, typically requires higher bit rate transmission unless the wideband signal is properly coded, i.e. such that the wideband signal can be compressed into representation by a fewer number of bits without introducing obvious distortion due to quantization errors. Recently, high fidelity coders of speech and audio have relied on the notion that mean-squared-error measures of distortion (e.g. measures of the energy difference between a signal and the same signal after it is coded and decoded) do not necessarily accurately describe the perceptual quality of a coded signal. In short, not all kinds of distortion are equally perceptible to the human ear. M. R. Schroeder, B. S. Atal and J. L. Hall, "Optimizing Digital Speech Coders by Exploiting Masking Properties of the Human Ear," J. Acous. Soc. Am., Vol. 66, 1647-1652, 1979; N. Jayant, J. Johnston and R. Safranek, "Signal Compression Based on Models of Human Perception," Proc. IEEE, Vol. 81, No. 10, pp. 1385-1422, October 1993; J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE J. Sel. Areas Comm., Vol. 6, pp. 314-323, 1988. Thus, given some knowledge of how the human auditory system tolerates different kinds of noise, it has been possible to design coders that reduce the audibility -- though not necessarily the energy -- of quantization errors. More specifically, these coders exploit a phenomenon of the auditory system known as masking.
  • Masking is a term describing the phenomenon of human hearing wherein one sound obscures or drowns out another. A common example is where the sound of a car engine is drowned out if the volume of the car radio is high enough. Similarly, if one is in the shower and misses a telephone call, it is because the sound of the shower masked the sound of the telephone ring; if the shower had not been running, the ring would have been heard.
  • The masking properties of a signal are typically measured as a noise-to-signal ratio determined with respect to a masking criterion. For example, one masking criterion is the just-noticeable-distortion (JND) level, i.e. the noise-to-signal ratio where the noise just becomes audible to a listener. Alternatively, another masking criterion is the audible-but-not-annoying level, i.e. the point where a listener may hear the noise, but the noise level is not sufficiently high as to irritate the listener.
  • Experiments in the area of psychoacoustics have focused on the masking properties of pure tones (i.e. single frequencies) and of narrow band noise. See, e.g., J. P. Egan and H. W. Hake, "On the Masking Pattern of a Simple Auditory Stimulus," J. Acous. Soc. Am., Vol. 22, pp. 622-630, 1950; R. L. Wegel and C. E. Lane, "The Masking of One Pure Tone by Another and its Probable Relation to the Dynamics of the Inner Ear," Phys. Rev., Vol. 23, No. 2, pp. 266-285, 1924. Psychoacoustic data gathered during these experiments has demonstrated that: (1) when a first tone is used to mask a second tone, the masking ability of the first tone is maximized when the frequency of the first tone is near the frequency of the second tone, and the ability of narrowband noise to mask the second tone is likewise maximized when the narrowband noise is centered at a frequency near that of the second tone; and (2) a lower frequency tone can mask a higher frequency tone more readily than a higher frequency tone can mask a lower frequency tone.
    The masking properties of more complex signals (such as wideband speech), however, are more difficult to determine, in part, because they are not readily decomposed into the tones and narrowband noise whose masking properties have been studied.
  • Thus, there is a need for a method to a priori measure the masking properties of complex signals, i.e. to determine a priori the level of noise which may be tolerated based on a selected masking criterion. Such measurements may then be used to improve speech coding as described in our co-pending and commonly assigned application "Method for Noise Weighting Filtering," filed concurrently herewith and incorporated by reference.
  • Summary of the Invention
  • Central to the invention is a recognition that the masking properties of a signal, such as wideband speech, may be determined from the masking properties of its subband components. Accordingly, the invention provides a method for determining the masking properties of a signal in which the signal is decomposed into a set of subband components, as for example by a filterbank. In one embodiment, the noise power spectrum that can be masked by each subband component is identified and the noise spectra are combined to yield the noise power spectrum that can be masked by the signal. In a further embodiment, output signals are generated based on the power in each subband signal and on a masking matrix, and the noise power spectrum that can be masked by the input signal is determined from the output signals.
  • Brief Description of the Drawings
  • Advantages of the present invention will become apparent from the following detailed description taken together with the drawings in which:
    • FIG. 1 illustrates the inventive method for determining a noise level vector of a speech signal.
    • FIG. 2A illustrates the elements q i,j of a masking matrix Q.
    • FIG. 2B illustrates the elements of a noise level vector.
    • FIG. 3 illustrates a system for determining the values of elements q i,j in masking matrix Q in the inventive method.
    • FIG. 4 is a flow chart for determining the values of the elements q i,j in masking matrix Q in the inventive method.
    Detailed Description
  • FIG. 1 illustrates a flow chart of the inventive method in which for a frame (or segment) of an input signal, a noise level vector, i.e. the spectrum of noise which may be added to the frame without exceeding a masking criterion, is determined a priori. The method involves three main steps. In step 120 the input signal frame is broken down, as for example by a filterbank, into subband components whose masking properties are known or can be determined. In step 140 the masking properties for each component are identified or accessed, e.g. from a database or a library, and in step 160 the masking properties are combined to determine the noise level vector, i.e. the spectrum of noise power that can be masked by the input signal.
  • Note that the method represents the frame of the input signal as a sum of subband components each of whose masking properties has already been measured. However, in order to determine the noise level vector of an input speech signal, the masking properties of the components required in step 140 must first be determined. Once the library of component masking properties is determined and advantageously stored in a database, the component masking properties can always be accessed, and optionally adapted, to determine the noise level vector of any input signal.
  • The inventive method of FIG. 1 recognizes that the masking property of a speech signal, i.e. the spectrum of noise that the speech signal can mask, can be based on the masking property of components of the speech. For example, in order to determine the masking properties of speech, a segment or frame of a first speech input signal is split into subband components, as for example by using a filterbank comprising a plurality of subband (bandpass) filters. In order to determine the spectrum of noise that can be masked by the first speech input signal in a first embodiment, the spectrum of noise that can be masked by each subband component of the speech input signal is determined and then the spectra for all subband components are combined to find the noise level vector for the first speech input signal.
  • In another embodiment, for each subband component a measurement is taken to determine how much narrowband noise in each subband can be masked. Thus, the measurement could be summarized as a method consisting of two nested steps:
    for every subband of speech i:
        for every subband of white noise j:
            adjust the noise in subband j to the point where sufficient noise is added so that the masking criterion is met;
            measure the noise-to-signal ratio at this point;
        repeat for next subband j
    repeat for next subband i
    The noise-to-signal measurements for each combination of i and j, q i,j , represent the ratio of noise power in band j that can be masked by the first speech input signal in band i. The elements q i,j form a matrix Q. An example of such a Q matrix is illustrated in FIG. 2A where, for convenience, the entries have been converted to decibels. The Q matrix of FIG. 2A illustrates the results of an experiment in which narrowband speech masked narrowband noise. The row numbers correspond to noise bands; the column numbers correspond to speech bands. Each element q i,j represents the maximum power ratio that can be maintained between noise in band j and the first speech input signal in band i so that the noise is masked. Note that not all q i,j have an associated value, i.e. some entries in the Q matrix are blank, because, as explained below, it typically is not necessary to determine every value in the Q matrix in order to determine the noise level vector. As explained below, the subbands in the Q matrix are not uniform in bandwidth. Instead, the bandwidth of each subband increases with frequency. For example, as shown in Table 2 below, subband 1 covers a frequency range of 80 Hz, from 0 to 80 Hz, while subband 20 covers a frequency range of 770 Hz, from 6230 Hz to 7000 Hz. If the power in each subband of the input frame of the first speech signal is represented as a column vector p = [p_1, p_2, ..., p_n]^T, the noise level vector d_NLV may be found based on the Q matrix and on the p vector:

        d_{NLV} = Q p,

    i.e. the noise level vector is also a column vector, obtained by multiplying the n×n Q matrix by the n-element column vector of the power in each subband of the input frame of speech, as shown in FIG. 2B.
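  • The nested measurement procedure above can be organized as in the following Python sketch. This is an illustration only: the callback meets_masking_criterion stands in for the listening test, the candidate NSR grid is arbitrary, and the |i - j| ≤ 2 restriction anticipates the discussion later in this description.

        import numpy as np

        N_BANDS = 20      # number of quasi-critical subbands (see Table 1)
        BAND_LIMIT = 2    # only measure combinations with |i - j| <= 2 (see below)

        def measure_masking_matrix(speech_bands, noise_bands, meets_masking_criterion):
            """Fill the masking matrix Q: for each (speech band i, noise band j), find the
            largest noise-to-signal power ratio q that still satisfies the masking criterion."""
            Q = np.zeros((N_BANDS, N_BANDS))
            candidate_nsr_db = np.arange(0.0, -60.0, -1.0)    # trial segmental NSRs, largest first
            for i in range(N_BANDS):                          # speech subband
                for j in range(N_BANDS):                      # noise subband
                    if abs(i - j) > BAND_LIMIT:
                        continue                              # assume no masking; q_ij stays 0
                    for nsr_db in candidate_nsr_db:
                        q = 10.0 ** (nsr_db / 10.0)           # dB -> power ratio
                        if meets_masking_criterion(speech_bands[i], noise_bands[j], q):
                            Q[i, j] = q                       # largest maskable ratio found
                            break
            return Q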
  • In either embodiment, once either the spectrum of noise masked by each subband component or the elements in the Q matrix have been determined for a given input signal, they can be used to determine the spectrum of noise that can be masked not only by the given input signal but also by other input signals. For example, if the power in each subband of a second input signal is p_2 = [p_1, p_2, ..., p_n]^T, then d_NLV2 = Q p_2, with Q as determined by the given input signal.
  • Note that each q i ,j is a power ratio determined for a particular masking criterion. This definition makes sense for stationary stimuli (i.e. signals whose statistical properties are invariant to time translation), but in the case of dynamic stimuli, such as speech, care must be taken in adding noise power to a signal whose level varies rapidly. In this instance, this problem is advantageously avoided by arranging for the noise power level to vary with the speech power level so that within a given segment or frame, the ratio of speech to noise power is a predetermined constant. In other words, the level of the added noise is dynamically adjusted in order to achieve a constant signal-to-noise ratio (SNR) throughout the frame. Measuring the amount of masking between one subband component of speech and another subband of noise therefore consists of listening to an ensemble of frames of bandpassed speech with a range of segmental SNRs to determine which SNR value meets the masking criterion. Different frame sizes may advantageously be used for different subbands as described below.
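  • For example, within one frame the added noise can be rescaled so that its power is exactly q times the measured speech power; the short Python helper below is an illustration with names chosen here, not taken from the patent text.

        import numpy as np

        def scale_noise_to_nsr(speech_frame: np.ndarray, noise_frame: np.ndarray, q: float) -> np.ndarray:
            """Return the noise frame rescaled so that (noise power)/(speech power) equals q."""
            target_power = q * np.sum(speech_frame ** 2)
            current_power = np.sum(noise_frame ** 2)
            return noise_frame * np.sqrt(target_power / current_power)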
  • In the paragraphs that follow a more rigorous presentation is given of the method described above. A method for determining the masking properties of the component signals required for step 140 is presented below first, and then a method of combining the component masking properties in step 160 is presented. The presentation concludes with a short discussion of other potential uses for the inventive method.
  • The more rigorous presentation begins by assuming that an input speech signal s(n) is divided via a bank of filters into N subbands s_1(n), ..., s_N(n), and that the noise maskee d(n) is similarly split into subband components d_1(n), ..., d_N(n). For each pair of subbands (i, j), measure the maximum segmental noise-to-signal ratio (NSR) between d_j(n) and s_i(n) such that the combination d_j(n) + s_i(n) meets a given masking threshold, e.g. such that the combination d_j(n) + s_i(n) is aurally indistinguishable (i.e. meets the just noticeable distortion level) from s_i(n) alone. Define the NSR to be the reciprocal of the traditional SNR, i.e.

        NSR_{ij} \equiv \frac{1}{SNR_{ij}} \equiv q_{ij} = \frac{|d_j|^2}{|s_i|^2} = \frac{\sum_k d_j^2(k)}{\sum_k s_i^2(k)},

    where the summation limits span the current frame of speech.
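  • A direct numerical counterpart of this definition (an illustrative helper, not part of the patent text) is:

        import numpy as np

        def segmental_nsr(noise_frame: np.ndarray, speech_frame: np.ndarray) -> float:
            """q_ij = sum_k d_j^2(k) / sum_k s_i^2(k) over the current frame."""
            return float(np.sum(noise_frame ** 2) / np.sum(speech_frame ** 2))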
  • To split the speech and noise into subbands, a non-uniform, quasi-critical band filterbank is designed. The term quasi-critical is used in recognition that the human cochlea may be represented as a collection of bandpass filters where the bandwidth of each bandpass filter is termed a critical band. See, H. Fletcher, "Auditory Patterns," Rev. Mod. Phy., Vol. 12, pp. 47-65, 1940. Thus, the characteristics and parameters of the filters in the filterbank may incorporate knowledge from auditory experiments as, for example, in determining the bandwidth of the filters in the filterbank. Note that it is advantageous that the filterbank used to produce the library of masking properties of components be the same as the filterbank used in step 120 of FIG. 1. However, some constraints on the filterbank may be advantageously imposed to make measurements obtained with one set of filterbank subbands more readily applicable to filterbanks with other subbands. In particular:
    Each filter should be as rectangular as possible, although significant passband ripple can be accepted in exchange for greater stopband attenuation, and overlap between adjacent filters should be minimized. In this respect the filterbank is not completely faithful to the human ear, since experimentally measured cochlear filter responses are not rectangular and tend to overlap a great deal. These conditions are imposed, however, since the ultimate interest is in the problem of coding, and splitting an input signal into (nearly) orthogonal subbands prevents coding the same information twice.
    The composite response of the filters should be nearly flat. Although perfect reconstruction is not required, the combined output should advantageously be perceptually indistinguishable from the input; this quality of the filterbank may be verified by listening tests.
    To avoid audible distortions due to different group delays, linear phase filters may be used, although because of the asymmetry of forward and backward masking it would be preferable to use minimum phase filters. This last point is illustrated by considering the case when the speech signal consists of a single spike: the combined output of a linear-phase filterbank would consist of the same spike delayed by half of the filter length, but the combined filtered noise would be dispersed equally before and after the spike. Since forward masking extends much farther in time than backward masking, it would be preferable if more noise came after the spike instead of before; this might be achieved with a more complicated minimum-phase filter design.
  • In order to model the constant-Q, critical band nature of the cochlea, the following constraints may also advantageously be imposed: N = 20 total subbands, corresponding roughly to the number of critical bands between 0 and 7 kHz as found in prior experimental methods, and bandwidths that form an increasing geometric series. If the first band spans the frequencies [0, a] and b is the ratio between successive bandwidths, then these last two conditions may be summarized as

        f_{20} = a \frac{b^{20} - 1}{b - 1},

    where f_{20} is the highest frequency to be included, typically 7 kHz in the speech case. Setting a = 100, corresponding to previous measurements of the first critical band, the equation is solved for b using Newton's iterative approximation. This value of b is then used to generate an ideal set of band edges as shown in Table 1.
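  • A minimal numerical sketch of this step is given below, assuming a = 100 Hz and f_20 = 7000 Hz as in the text; the function name and starting guess are choices made here, not part of the patent.

        def ideal_band_edges(a=100.0, f_top=7000.0, n_bands=20, tol=1e-9):
            """Solve f_top = a*(b**n - 1)/(b - 1) for the bandwidth ratio b by Newton's method,
            then accumulate the geometric bandwidths into ideal band edges (cf. Table 1)."""
            g = lambda b: a * (b ** n_bands - 1.0) / (b - 1.0) - f_top
            dg = lambda b: a * (n_bands * b ** (n_bands - 1) * (b - 1.0) - (b ** n_bands - 1.0)) / (b - 1.0) ** 2
            b = 1.1                                  # starting guess; any ratio somewhat above 1 works
            while abs(g(b)) > tol:
                b -= g(b) / dg(b)                    # Newton iteration
            edges, width = [0.0], a
            for _ in range(n_bands):
                edges.append(edges[-1] + width)      # next band edge
                width *= b                           # geometric increase of bandwidth
            return b, edges

        # b comes out near 1.12, and the edges approximate Table 1 (0, 100, 212, 337, ...)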
  • Using these ideal band edges as a starting point, filters may be designed. In one embodiment of the invention, twenty 512-point, min-max optimal filters were designed using the well-known Remez exchange algorithm. Table 2 lists the parameters for each filter. Typically, it may be necessary to adjust the band edges so that the composite filterbank response is flatter, but the filterbank's combined output should sound identical to the input.
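  • As an illustrative sketch only, one such bandpass filter could be designed with SciPy's remez routine (used here as a stand-in for the min-max design described above), assuming a 16 kHz sampling rate; the band edges and transition widths below are those of band 6 in Table 2.

        import numpy as np
        from scipy.signal import remez, freqz

        FS = 16000.0    # assumed sampling rate for 0-7 kHz wideband speech

        def design_band_filter(lower, upper, trans_low, trans_high, numtaps=512):
            """Equiripple (min-max optimal) bandpass filter via the Remez exchange algorithm."""
            if lower <= 0.0:
                bands = [0.0, upper, upper + trans_high, FS / 2.0]    # lowest band: a lowpass
                desired = [1.0, 0.0]
            else:
                bands = [0.0, lower - trans_low, lower, upper, upper + trans_high, FS / 2.0]
                desired = [0.0, 1.0, 0.0]
            # large numtaps with narrow transitions may need extra exchange iterations
            return remez(numtaps, bands, desired, maxiter=100, fs=FS)

        h6 = design_band_filter(660.0, 806.0, 85.0, 85.0)             # band 6 of Table 2
        w, H6 = freqz(h6, worN=8192, fs=FS)
        # summing the magnitude responses of all twenty filters and checking flatness is one way
        # to verify the composite response discussed above; listening remains the final check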
  • Since the human cochlea exhibits increasing time resolution at higher frequencies, the frame size for each band is advantageously chosen according to the length of the impulse response of the band filter. For higher bands, the energy of the impulse response becomes more concentrated in time, leading to a choice of a smaller frame size. Table 3 shows the relationship between the noise band number and frame size.
  • Despite the well-known dependence of masking on stimulus level, no precise restrictions on loudness during the experiments typically need be imposed. It is usually sufficient to measure masking effects under the normal operating conditions of an actual speech coder. Thus the volume control may be set to a comfortable level for listening to the full-bandwidth speech and left in the same position when listening to the constituent subbands, which as a result sound much softer than the full speech signal. Listening tests are advantageously carried out in a soundproof booth using headphones, with the same signal presented to both ears.
  • As mentioned above, the level of the noise should be adjusted on a frame-by-frame basis in order to maintain a constant local NSR, q ij . FIG. 3 is a block diagram of a system to achieve this for each frame of speech. FIG. 4 is a flowchart illustrating steps carried out by the system of FIG. 3. The operation of the system of FIG. 3 is advantageously described on a step-by-step basis:
    Generate a frame of unit variance noise: Unit variance Gaussian random noise generator 305 is used to produce u(n) in step 405, which is then scaled according to

        u(n) \leftarrow u(n) \sqrt{ \frac{N}{\sum_{k=mN}^{mN+N-1} u^2(k)} },

    where N is the frame size and m is the number of the current frame, starting from m = 0. This ensures noise with unit variance on a frame-by-frame basis.
    Filter speech: Input the current frame of speech in step 410. In step 415 the speech is filtered through filter j 315 of the filterbank to produce s_j(n).
    Measure energy of bandpass speech: The output of filter 315 is then passed through delay 317. The delay allows the system of FIG. 3 to "look ahead" to maintain a constant local NSR as described below. To compute how much noise to inject in this frame, in step 420 calculate the energy p_j of the speech as

        p_j = \sum_{k=mN}^{mN+N-1} s_j^2(k-L),

    using energy measurer 320, where L is the amount of delay as explained in more detail below.
    Measure look-ahead energy of bandpass speech: Because of the inherent delay imposed by the filterbank, adjustments to the noise level at the filter input are not immediately registered at the output. Therefore some measure of the speech power in the near future is needed to help decide how to adjust the noise level in the present. The look-ahead energy \hat{p}_j is defined as the energy of one frame of s_j(n):

        \hat{p}_j = \sum_{k=mN}^{mN+N-1} s_j^2(k).

    Typically L = 320 samples yields the best results for 512 point filters. Note that this problem would be easier to solve if the filters were minimum-phase rather than linear phase.
    Compute desired narrowband noise power: In step 430 multiply the speech power by the desired noise-to-signal ratio q_{ij} in adaptive controller 330 to yield a desired noise power \Delta:

        \Delta = p_j q_{ij}.

    Estimate required broadband noise power: To approximate the desired noise power at the filter output, it is noted that for a filter of bandwidth \omega_i Hz, the filtered unit-variance noise has a variance of \omega_i / S, where S is the Nyquist frequency. Linearity may therefore be exploited to try to achieve the desired noise power \Delta at the filter output. Because of the filter delays described above, instead of using the speech power in the current frame to compute \Delta, a look-ahead desired noise energy \hat{\Delta} is defined:

        \hat{\Delta} = \hat{p}_j q_{ij}.

    Then the noise is scaled in pre-adjuster 340 in order to try to achieve the look-ahead energy as follows:

        e(n) = u(n) \sqrt{ \frac{S \hat{\Delta}}{\omega_i} }.

    Filter the adjusted noise: The adjusted noise e(n) is filtered through band i using filter 350, to yield e_i(n), and then applied to delay 355 so that the noise is again synchronous with the input frame of speech.
    Measure the energy of the bandpass noise: Next measure the actual bandpass noise power d_i in measurer 360:

        d_i = \sum_{k=mN}^{mN+N-1} e_i^2(k-L).

    Fine-tune the noise: To adjust the noise so that the desired NSR is achieved exactly, apply at multiplier 380 a time-varying gain g_i at the filter output. To minimize smearing in the noise spectrum, it is advantageous to vary g_i smoothly so that it takes the form

        g_i(k-L) = \frac{A}{2}\left(1 + \cos\frac{\pi(k-L)}{W}\right) + \frac{B}{2}\left(1 - \cos\frac{\pi(k-L)}{W}\right) \quad for \quad mN \le k \le mN+W-1, \qquad g_i(k-L) = B \quad for \quad mN+W \le k \le mN+N-1,

    where A is the final value of g_i from the previous frame, W is the length of the smoothing window (which can be thought of as half of a Hann window), and B is the final value of g_i in the current frame. Thus, given A and W, one should be able to solve for B such that

        \sum_{k=mN}^{mN+N-1} \{ e_i(k-L) g_i(k-L) \}^2 = \Delta.

    Because g_i is linear in B, the above expression becomes a quadratic equation of the form

        \alpha_2 B^2 + \alpha_1 B + \alpha_0 = 0,

    where

        \alpha_2 = \frac{1}{4} \sum_{k=mN}^{mN+W-1} \left(1 - \cos\frac{\pi(k-L)}{W}\right)^2 e_i^2(k-L) + \sum_{k=mN+W}^{mN+N-1} e_i^2(k-L),
        \alpha_1 = \frac{A}{2} \sum_{k=mN}^{mN+W-1} \left(1 - \cos^2\frac{\pi(k-L)}{W}\right) e_i^2(k-L),
        \alpha_0 = \frac{A^2}{4} \sum_{k=mN}^{mN+W-1} \left(1 + \cos\frac{\pi(k-L)}{W}\right)^2 e_i^2(k-L) - \Delta.

    Thus a compromise is forced between a smooth transition using a long window and a crisp change to the desired noise level using a short window. Making the window too short smears the spectrum of the bandpass noise, an effect that typically is quite noticeable, leading to severe underestimates of masking power. Making the window too long, however, leads to more subtle clicks that emerge when the noise level lags behind the speech. Thus, an initial value of W = N/2 was chosen.
  • The quadratic equation for B usually has two real solutions; typically the solution that minimizes |A - B| is chosen in order to avoid drastic changes in gain and reduce spectral smearing. Sometimes, however, there is no real solution. This may occur at transitions from loud to soft frames, when reducing the gain gradually has the effect of including more noise at the beginning of the frame than is desired in the entire frame. In these cases W may be decremented until the longest possible window that allows an exact solution is found. In rare cases this search can lead to W = 0, but only during very soft passages when both speech and noise are below the threshold of hearing. In the W = 0 case, g_i has the form

        g_i(n-L) = \sqrt{ \frac{\Delta}{\sum_{k=mN}^{mN+N-1} e_i^2(k-L)} }.

    Since there are 20 sub-bands, potentially 400 combinations of i and j need to be measured. However, it is not typically necessary to carry out the experiment for every particular (i, j) combination because masking depends on how close the signal component and masker are in frequency. Thus, typically measurements should be taken for combinations of i and j such that |i - j| \le 2. Values of q_{ij} for |i - j| > 2 can typically be assumed to be zero, i.e. no masking takes place, with perhaps the exception of small values of i and j, where masking may sometimes extend over 3 bands.
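  • The gain fine-tuning described above can be sketched in Python as follows. This is an illustration under the assumptions stated earlier (raised-cosine transition of length W from the previous gain A to the new gain B, root chosen to minimize |A - B|, W decremented when no real root exists); the variable names are chosen here, and the noise frame is assumed already delay-aligned with the speech.

        import numpy as np

        def fine_tune_gain(e_i, delta, A, W):
            """Return (g, B, W_used): a smooth gain g for one frame of bandpass noise e_i such
            that sum((e_i * g)**2) equals delta, transitioning from A to B over W samples."""
            N = len(e_i)
            e2 = e_i ** 2
            while W > 0:
                c = np.cos(np.pi * np.arange(W) / W)
                up = 0.5 * (1.0 - c)                 # B's share: rises over the window
                down = 0.5 * (1.0 + c)               # A's share: falls over the window
                a2 = np.sum(up ** 2 * e2[:W]) + np.sum(e2[W:])
                a1 = 2.0 * A * np.sum(up * down * e2[:W])
                a0 = A ** 2 * np.sum(down ** 2 * e2[:W]) - delta
                disc = a1 ** 2 - 4.0 * a2 * a0
                if disc >= 0.0:
                    roots = ((-a1 + np.sqrt(disc)) / (2.0 * a2),
                             (-a1 - np.sqrt(disc)) / (2.0 * a2))
                    B = min(roots, key=lambda r: abs(A - r))   # avoid drastic gain changes
                    g = np.concatenate([A * down + B * up, np.full(N - W, B)])
                    return g, B, W
                W -= 1                               # no real solution: shorten the window
            B = float(np.sqrt(delta / np.sum(e2)))   # W == 0: constant gain over the frame
            return np.full(N, B), B, 0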
  • Recall that a noise level vector for a speech signal, i.e. the spectrum of noise masked by the input signal, may be calculated according to a three step process. The first two steps, analyzing the speech in terms of its constituent critical bands and determining the masking properties of each band, have already been demonstrated. Now the third step of the process, namely superposing the masking properties of the subbands to form a noise level vector, is discussed.
  • Given a vector of speech powers p = (p_1, ..., p_20), where p_i corresponds to the power of the speech in band i in the current frame, a noise level vector d = (d_1, ..., d_20) can be determined such that noise added at these levels or below does not exceed the masking threshold.
  • This calculation requires knowledge of how to add the masking effects of two or more maskers; here the effects are combined by simple addition, or, more formally:
    Linear superposition of noise power: If a signal S masks a noise power vector d = (d_1, ..., d_20)^T, where d_j is the power of the noise in band j in the current frame and "T" indicates the transpose, and another signal S', uncorrelated with S, masks a noise power vector d' = (d'_1, ..., d'_20)^T, then the combined signal S + S' will mask the noise power vector d + d' = (d_1 + d'_1, ..., d_20 + d'_20)^T.
    Simple addition is advantageously used instead of non-linear superposition rules because it typically leads to more conservative estimates of the masking properties of the signal.
  • Note generally that the superposition idea assumes that consecutive bands in the filterbank do not overlap, so that the noise level in one band can be adjusted without affecting the level in another, and so that the speech may be decomposed into uncorrelated subbands. Thus high-order, nearly rectangular filters were used in the filterbank.
  • Accordingly, the total noise level vector d_NLV can be found in a given frame if the masking property d_i is known for every band of speech i = 1, ..., 20. This involves a simple sum of noise powers:

        d_{NLV} = \sum_{i=1}^{20} d_i.     (4.2)

    To find the masked noise vector d_i for speech band i, use the measured threshold NSRs q_{ij}. Since the speech power p_i and the threshold noise-to-signal ratios q_{ij} are known, the maximum masked power in bands 1-20 can be computed using one column of the q_{ij} matrix:

        d_i = [p_i q_{i1}, p_i q_{i2}, ..., p_i q_{i20}]^T.     (4.3)

    In other words, the threshold noise power in each band is equal to the product of the signal power and the threshold noise-to-signal ratio.
  • Combining equations 4.2 and 4.3 to summarize the method as one matrix equation yields

        d_{NLV} = Q p,     (4.4)

    where Q = \{q_{ij}\}. (Note that whenever q_{ij} has not been measured, zero masking is assumed; q_{ij} = 0.) Equation 4.4 thus describes how the noise level vector for a given frame of speech can be determined based on the input power in the speech frame and on the masking properties of speech as represented by the masking matrix Q.
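  • As a brief illustrative sketch (the numbers below are placeholders rather than measured values), equation 4.4 reduces to a single matrix-vector product once Q has been filled with zeros in the unmeasured positions:

        import numpy as np

        n = 20
        Q = np.zeros((n, n))                          # unmeasured entries stay 0 (no masking assumed)
        for i in range(n):
            for j in range(n):
                if abs(i - j) <= 2:
                    Q[i, j] = 10.0 ** (-18.0 / 10.0)  # placeholder threshold NSR of -18 dB
        p = np.ones(n)                                # placeholder per-band speech powers for one frame
        d_nlv = Q @ p                                 # equation 4.4: noise level vector for the frame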
  • The above method is flexible in that new knowledge about masking effects in the human auditory system may be readily incorporated. The choice of a linear superposition rule, for example, can be easily changed to a more complex function based on future auditory experiments. The values in the Q matrix, moreover, need not be fixed. Each element in the matrix could be adaptive, e.g. a function of loudness since masking properties have been shown to change at high volume levels. It would also be easy to use different Q matrices depending on whether the current frame of speech consisted of voiced or unvoiced speech.
  • This disclosure describes a method for measuring the masking properties of components of speech signals and for determining the masking threshold of the speech signals. The method disclosed herein has been described without reference to specific hardware or software. Instead the method has been described in such a manner that those skilled in the art can readily adapt such hardware or software as may be available or preferable.
  • While the above teaching of the present invention has been in terms of determining the masking properties of speech signals, those skilled in the art of digital signal processing will recognize the applicability of these teachings to other specific contexts. Thus, for example, the masking properties of music, other audio signals, images and other signals may be determined using the present invention.
    TABLE 1
    Band number Lower edge Hz Upper edge Hz
    1 0 100
    2 100 212
    3 212 337
    4 337 476
    5 476 632
    6 632 806
    7 806 1001
    8 1001 1219
    9 1219 1462
    10 1462 1734
    11 1734 2038
    12 2038 2377
    13 2377 2756
    14 2756 3180
    15 3180 3654
    16 3654 4183
    17 4183 4775
    18 4775 5436
    19 5436 6174
    20 6174 7000
    TABLE 2
    Band number Lower edge Hz Upper edge Hz Δf low Hz Δf high Hz W Scale factor
    1 0 80 70 80 200.0 1.0
    2 120 195 75 75 450.0 0.9
    3 228 300 80 80 300.0 0.9
    4 337 435 75 75 300.0 0.9
    5 485 600 90 90 150.0 1.0
    6 660 806 85 85 150.0 1.0
    7 860 1000 85 85 150.0 1.0
    8 1060 1210 85 85 150.0 1.0
    9 1265 1460 85 85 150.0 1.0
    10 1515 1735 85 85 150.0 1.0
    11 1790 2038 85 85 150.0 1.0
    12 2095 2377 85 85 150.0 1.0
    13 2435 2756 85 85 150.0 1.0
    14 2815 3180 85 85 150.0 1.0
    15 3239 3654 85 85 150.0 1.0
    16 3712 4183 85 85 150.0 1.0
    17 4242 4775 85 85 150.0 1.0
    18 4835 5437 85 85 150.0 1.0
    19 5495 6174 85 85 150.0 1.0
    20 6230 7000 85 85 150.0 1.0
    TABLE 3
    Noise band# Frame size (samples)
    1-5 512
    6-14 256
    15-20 128

Claims (22)

  1. A method of determining the noise power spectrum that can be masked by a signal, the method comprising the steps of:
       separating said signal into a set of subband components,
       identifying the noise power spectrum that can be masked by each subband component in said set of subband components, and
       combining the identified noise power spectrum masked by each subband component to yield the noise power spectrum that can be masked by said signal.
  2. The method of claim 1 wherein the step of separating comprises the step of:
       applying said signal to a filterbank comprising a set of filters wherein the output of each filter in said set of filters is a subband component of the signal.
  3. The method of claim 1 wherein the step of combining comprises the step of:
       adding the noise power spectra masked by each subband component to yield the noise power spectrum masked by said signal.
  4. The method of claim 1 wherein said signal is wideband speech.
  5. A method comprising the steps of:
       separating an input signal into a set of subband signal components, and
       generating output signals based on the power in each subband signal component and on a masking matrix.
  6. The method of claim 5 wherein said masking matrix Q is an n×n matrix wherein each element q i ,j of said masking matrix is the ratio of the noise power in band j that can be masked by the power of the subband signal component in band i.
  7. The method of claim 5 wherein the input signal is a speech signal.
  8. The method of claim 5 wherein the step of separating comprises the step of:
       applying said input signal to a filterbank comprising a set of filters wherein the output of each filter in said set of filters is a subband component of the signal.
  9. A method comprising the steps of:
       separating a signal into a set of n subband signal components, wherein each subband signal component is characterized by a power level,
       generating a set of n subband noise components, and
       for combinations of one subband signal component i,i=1,2,...n and one subband noise component j,j=1,2,...n, measuring the ratio of the power level of the j th subband noise component that can be masked by the i th subband signal component to the power level of the i th subband signal component.
  10. The method of claim 9 wherein the power level of each subband noise component that can be masked by each subband signal component is determined according to a masking criterion.
  11. The method of claim 10 wherein said masking criterion is a just-noticeable-distortion level.
  12. The method of claim 10 wherein said masking criterion is an audible-but-not-annoying level.
  13. The method of claim 9 wherein said step of separating a signal into a set of n subband signal components comprises the step of applying said signal to a first filterbank comprising a first set of n filters, wherein the outputs of said first set of filters in said first filterbank are the set of n subband signal components.
  14. The method of claim 13 wherein said step of generating a set of n subband noise components comprises applying a wideband noise signal to a second filterbank comprising a second set of filters, said second filterbank having the same filter characteristics as said first filterbank, wherein the outputs of said second set of filters in the second filterbank are said set of n subband noise components.
  15. The method of claim 10 wherein
       the measured ratio is an element q i ,j of a masking matrix Q.
  16. The method of claim 15 further comprising the steps of:
       multiplying the masking matrix by a vector p whose elements p i are the power in each subband component of an input signal, to yield the noise power spectrum that can be masked by the signal.
  17. A method of determining the power of a filtered noise signal that can be masked by a filtered frame of speech, said method comprising the steps of:
       delaying said filtered frame of speech by a specified time,
       determining the power of said filtered frame of speech,
       measuring the power of said filtered noise signal,
       delaying said filtered noise signal by said specified time, and
       adjusting the power of said filtered noise signal as a function of the power of said filtered frame of speech and of a desired noise-to-signal ratio to yield the power of the filtered noise signal that is masked by the filtered frame of speech.
  18. The method of claim 17 further comprising the step of multiplying said filtered noise signal by a gain signal so as to achieve the desired noise-to-signal ratio.
  19. The method of claim 17 wherein said specified time is a function of the impulse response of said first filter.
  20. The method of claim 17 wherein said desired noise-to-signal ratio is determined according to a masking criterion.
  21. The method of claim 17 further comprising the steps of:
       generating a noise signal, said noise signal having unit variance; and
       applying said noise signal to a second filter to generate said filtered noise signal.
  22. A method comprising the steps of:
       applying an input speech signal to a filterbank, said filterbank comprising a set of n filters wherein the output of each filter is a respective subband signal component in a set of n subband signal components, and
       generating output signals based on the product of a masking matrix Q and a vector p, wherein said masking matrix Q is an n×n matrix in which each element q i ,j of said masking matrix is the ratio of power of the noise in filter j that can be masked by the power of the subband signal component in band i and wherein said vector p is a vector of length n in which each element p i is the power of the i th signal component.
EP95309003A 1994-12-30 1995-12-12 A method for measuring speech masking properties Withdrawn EP0720146A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US36737194A 1994-12-30 1994-12-30
US367371 1994-12-30

Publications (1)

Publication Number Publication Date
EP0720146A1 (en)

Family

ID=23446902

Family Applications (1)

Application Number Title Priority Date Filing Date
EP95309003A Withdrawn EP0720146A1 (en) 1994-12-30 1995-12-12 A method for measuring speech masking properties

Country Status (3)

Country Link
EP (1) EP0720146A1 (en)
JP (1) JPH08272391A (en)
CA (1) CA2165352A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002101727A1 (en) * 2001-06-12 2002-12-19 Globespan Virata Incorporated Method and system for determining filter gain and automatic gain control
CN108806660A (en) * 2017-04-26 2018-11-13 福特全球技术公司 Sensitivity is gone to the active sound of vehicle medium pitch noise

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107942152A (en) * 2017-11-15 2018-04-20 中国电子科技集团公司第四十研究所 A kind of noise-measuring system and measuring method of microwave radio front end

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0240330A2 (en) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation in speech recognition
EP0240329A2 (en) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation in speech recognition
EP0575815A1 (en) * 1992-06-25 1993-12-29 Atr Auditory And Visual Perception Research Laboratories Speech recognition method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0240330A2 (en) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation in speech recognition
EP0240329A2 (en) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation in speech recognition
EP0575815A1 (en) * 1992-06-25 1993-12-29 Atr Auditory And Visual Perception Research Laboratories Speech recognition method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002101727A1 (en) * 2001-06-12 2002-12-19 Globespan Virata Incorporated Method and system for determining filter gain and automatic gain control
US7013271B2 (en) 2001-06-12 2006-03-14 Globespanvirata Incorporated Method and system for implementing a low complexity spectrum estimation technique for comfort noise generation
CN108806660A (en) * 2017-04-26 2018-11-13 福特全球技术公司 Sensitivity is gone to the active sound of vehicle medium pitch noise
CN108806660B (en) * 2017-04-26 2023-12-01 福特全球技术公司 Active acoustic desensitization to tonal noise in vehicles

Also Published As

Publication number Publication date
CA2165352A1 (en) 1996-07-01
JPH08272391A (en) 1996-10-18

Similar Documents

Publication Publication Date Title
US5623577A (en) Computationally efficient adaptive bit allocation for encoding method and apparatus with allowance for decoder spectral distortions
US5825320A (en) Gain control method for audio encoding device
KR100913987B1 (en) Multi-channel synthesizer and method for generating a multi-channel output signal
JP3804968B2 (en) Apparatus and method for adaptive allocation encoding / decoding
US5414795A (en) High efficiency digital data encoding and decoding apparatus
US5301255A (en) Audio signal subband encoder
EP1939862B1 (en) Encoding device, decoding device, and method thereof
JP3033156B2 (en) Digital signal coding device
EP2991075B1 (en) Speech coding method and speech coding apparatus
DE69633633T2 (en) MULTI-CHANNEL PREDICTIVE SUBBAND CODIER WITH ADAPTIVE, PSYCHOACOUS BOOK ASSIGNMENT
KR100295217B1 (en) High efficiency encoding and/or decoding device
US6604069B1 (en) Signals having quantized values and variable length codes
JPH07273657A (en) Information coding method and device, information decoding method and device, and information transmission method and information recording medium
JPH10511243A (en) Apparatus and method for applying waveform prediction to subbands of a perceptual coding system
JP3277682B2 (en) Information encoding method and apparatus, information decoding method and apparatus, and information recording medium and information transmission method
US6199038B1 (en) Signal encoding method using first band units as encoding units and second band units for setting an initial value of quantization precision
EP1606797A1 (en) Processing of multi-channel signals
US5303346A (en) Method of coding 32-kb/s audio signals
JP3297050B2 (en) Computer-based adaptive bit allocation encoding method and apparatus for decoder spectrum distortion
EP0720146A1 (en) A method for measuring speech masking properties
JPH06242797A (en) Block size determining method of converting and encoding device
JPH08123488A (en) High-efficiency encoding method, high-efficiency code recording method, high-efficiency code transmitting method, high-efficiency encoding device, and high-efficiency code decoding method
EP2355094B1 (en) Sub-band processing complexity reduction
JP3033157B2 (en) Digital signal coding device
JP2002189499A (en) Method and device for compressing digital audio signal

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB IT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

17P Request for examination filed

Effective date: 19961211

18W Application withdrawn

Withdrawal date: 19970107