WO1991003042A1 - A method and an apparatus for classification of a mixed speech and noise signal - Google Patents


Info

Publication number
WO1991003042A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
speech
envelopes
synchronism
noise
Prior art date
Application number
PCT/DK1990/000214
Other languages
French (fr)
Inventor
Claus Elberling
Michael Ekelid
Carl Ludvigsen
Original Assignee
Otwidan Aps Forenede Danske Høreapparat Fabrikker
Priority date
1989-08-18
Filing date
1990-08-17
Publication date
1991-03-07
Application filed by Otwidan Aps Forenede Danske Høreapparat Fabrikker
Publication of WO1991003042A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

For classification of a mixed speech and noise signal (101), the signal is divided into separate, frequency limited subsignals (103), each of which contains at least two harmonic frequencies of the speech signal. The envelopes (105) of the subsignals (103) are formed, as well as a measure (107) of synchronism between the envelopes (105). The synchronism measure (107) is compared with a threshold value for classification of the mixed signal as being significantly or insignificantly affected by the speech signal. The classification takes place with unprecedented frequency and can therefore form the basis for a considerably more precise estimate of the noise signal than before, in particular when the noise has a speech-like nature.

Description

A method and an apparatus for classification of a mixed speech and noise signal
The invention concerns a method and an apparatus for classification of a mixed speech and noise signal as being significantly or insignificantly affected by the speech signal.
The time intervals where the mixed signal is insignificantly affected by the speech signal may be used for forming a running estimate of the noise signal with known methods, and the noise can then be suppressed on the basis of this estimate.
The invention may be used in electroacoustic systems for transmission and signal processing of speech signals (e.g. mobile telephones, speech recognition systems and hearing aids), where noise suppression and/or speech enhancement methods are used to eliminate or reduce the degradation of speech quality, speech recognition and speech perception caused by background noise.
Electroacoustic systems for transmission and signal processing of speech signals exist in numerous types and for many different purposes. The rapid development of digital electronics, in particular of digital signal processors, has made it possible to employ, in real time, a number of previously impractical methods for removing or suppressing background noise. This noise occurs either acoustically, simultaneously with the speech signal (e.g. in a helicopter cockpit, where machine and rotor noise affects the pilot's acoustic communication), or as an equivalent electric signal in the transmission system itself. Such methods are known from the literature as noise suppression or speech enhancement methods; examples are adaptive filtering and spectral subtraction, see e.g. (1) and (7). By improving the signal/noise ratio (the ratio of speech signal magnitude to noise magnitude), these methods aim to counteract the degradation, caused by the noise, of the reception and intelligibility of the transmitted speech signal. Several of the known methods are based on a running estimate of the statistical characteristics of the background noise, e.g. its intensity and frequency content. A speech or pause detector identifies time segments with and without the speech signal, respectively, and in the segments containing only background noise (speech pauses) the characteristics of the noise may be estimated by suitable signal analysis. Assuming that the background noise is reasonably stationary, this estimate may be used to adjust the noise suppression or speech enhancement method until the next time the noise can be estimated.
Several methods for distinguishing between voiced speech, unvoiced speech and pauses, both with and without background noise, are described in the literature; see e.g. (4), (5) and (8). Reference (9) includes, among other things, a survey of the most important methods used for classification of speech, in particular in connection with speech recognition systems.
Two of the known principles should be mentioned in particular: the energy histogram principle and the valley detector principle. In the noise suppression method of (3), the valley detector principle is used to point out the time intervals in which a mixed speech and noise signal consists exclusively of background noise (i.e. corresponding to pauses in the speech signal). In that invention, the method is incorporated in a type of feedback loop acting on the individual frequency bands of the output signal, with the purpose of extending the field of use of the speech/noise detector.
However, none of the known speech and pause detectors is particularly robust when the speech signal is subjected to e.g. considerable reverberation, or when the background noise is added at a poor signal/noise ratio (less than 0 dB) or has a speech-like nature, i.e. resembles the speech signal from one or more speakers. In these cases, detection with known methods is less reliable. Attempts have been made to reduce this problem by using a priori knowledge about the speech and noise signals; thus (1) and (2) exploit the fact that the amplitude fluctuations of speech and noise differ in certain cases. When the noise is speech-like, however, this difference is marginal.
So far, no speech detector has been developed which can operate reliably both at a poor signal/noise ratio and with speech-like noise. The object of the present invention is therefore to provide a method and an apparatus that solve this problem.
This object is achieved by the method stated in claim 1 and the apparatus stated in claim 8, which involve detecting the time segments of a mixed speech and noise signal that are dominated by the speech signal. This should be seen together with the well-known fact, described below, that a speech signal contains many time segments in which the speech signal contributes only insignificantly to the mixed signal. Such segments are not just speech pauses (between words and sentences, for breathing), but in particular also very short intervals, typically within a word, where the speech signal is so weak that it contributes only insignificantly to the mixed signal. These segments are detected, and on this basis the parameters of the background noise can be updated. This happens with unprecedented frequency and can therefore form the basis for a considerably more precise estimate of the background noise.
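The text leaves the noise update itself to "known methods". A minimal sketch, assuming per-frame noise-power values and simple recursive (exponential) averaging as that known method; the function name `update_noise_estimate` and the smoothing factor `alpha` are hypothetical:

```python
from typing import List, Sequence


def update_noise_estimate(frame_powers: Sequence[float],
                          speech_flags: Sequence[int],
                          alpha: float = 0.9,
                          initial: float = 0.0) -> List[float]:
    """Running noise-power estimate, one value per frame.

    The estimate is updated only in frames flagged 0 (mixed signal not
    dominated by speech) and held unchanged in frames flagged 1.
    """
    if len(frame_powers) != len(speech_flags):
        raise ValueError("need exactly one classification flag per frame")
    estimate = initial
    track: List[float] = []
    for power, flag in zip(frame_powers, speech_flags):
        if flag == 0:
            # Recursive averaging: keep the fraction alpha of the old estimate.
            estimate = alpha * estimate + (1.0 - alpha) * power
        track.append(estimate)
    return track
```

For example, `update_noise_estimate([1.0, 1.0, 9.0, 1.0], [0, 0, 1, 0], alpha=0.5)` returns `[0.5, 0.75, 0.75, 0.875]`: the speech-dominated third frame leaves the estimate unchanged. Because the frames not classified as speech-dominated include short gaps within words, such an update would run far more often than one driven only by speech pauses.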
In a speech signal, the energy can reach relatively high values in short time intervals corresponding to some of the voiced sounds (e.g. the open vowels) as well as some of the consonants (the fricatives and the plosives). The signal/noise ratio is therefore relatively high in time segments containing these speech sounds, and these segments are thus particularly useful for detecting the presence of speech in background noise. The energy is high in these speech sounds for the following reasons:
1) A vowel may be described as a (quasi)periodic time signal which, in the frequency domain, consists of a fundamental frequency and its harmonics, so that the speech energy occurs simultaneously over a wide frequency range.
2) A fricative or a plosive may be described as a short, noise-like time signal whose energy occurs simultaneously over a wide frequency range.
In the preferred embodiment of the invention, the frequency range of the speech signal is divided into a plurality of frequency bands, so that for each of the two types of speech sound the energy occurs with a certain simultaneity across the frequency bands. Further, vowels have a special property: since the difference between two consecutive harmonic frequencies always equals the fundamental frequency of the speech signal, the envelope of a frequency-limited subsignal containing two or more consecutive harmonics is always periodic and substantially synchronous with the fundamental frequency. The envelope represents a beat signal whose frequency equals the difference between the two harmonics, which is precisely the fundamental frequency. Because the beat signal picked up by envelope detection is caused by the same frequency in all subsignals, namely the fundamental frequency of the speech signal, the envelopes of the subsignals will be substantially synchronous with, or correlated with, each other.
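The beat argument can be made explicit with a short derivation. It assumes an idealized subsignal containing exactly two consecutive harmonics k·f0 and (k+1)·f0 with constant amplitudes a1, a2 and phases φ1, φ2, and takes the envelope as the magnitude of the analytic signal z(t) (one of the envelope options mentioned for the envelope detectors):

```latex
s(t) = a_1 \cos(2\pi k f_0 t + \varphi_1) + a_2 \cos(2\pi (k+1) f_0 t + \varphi_2)

z(t) = a_1 e^{j(2\pi k f_0 t + \varphi_1)} + a_2 e^{j(2\pi (k+1) f_0 t + \varphi_2)}

|z(t)|^2 = a_1^2 + a_2^2 + 2 a_1 a_2 \cos(2\pi f_0 t + \varphi_2 - \varphi_1)

|z(t)| = \sqrt{a_1^2 + a_2^2 + 2 a_1 a_2 \cos(2\pi f_0 t + \varphi_2 - \varphi_1)}
```

The harmonic number k drops out, so every band containing two consecutive harmonics yields an envelope with period 1/f0, whichever band it comes from. With more than two harmonics in a band, |z(t)|^2 contains only cosine terms at integer multiples of f0, so the period is still 1/f0.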
In order that this envelope, which is periodic with the fundamental frequency, can always be produced, each subsignal must have a bandwidth that always comprises at least two harmonic frequencies. This is obtained with a bandwidth of at least twice the fundamental frequency. If the fundamental frequency is e.g. 220 Hz, the bandwidth must be at least 440 Hz.
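The bandwidth rule can be checked numerically. In this sketch the helper names are hypothetical: `harmonics_in_band` lists the harmonics of f0 inside a band, and `min_bandwidth` applies the rule from the text. Any closed band at least 2·f0 wide contains at least two consecutive harmonics wherever it is placed, while a narrower band may contain only one, in which case no beat at f0 arises:

```python
import math
from typing import List


def harmonics_in_band(low_hz: float, high_hz: float, f0_hz: float) -> List[float]:
    """Harmonics k * f0 (k >= 1) that lie inside the closed band [low_hz, high_hz]."""
    if f0_hz <= 0 or high_hz < low_hz:
        raise ValueError("need f0_hz > 0 and high_hz >= low_hz")
    k_first = max(1, math.ceil(low_hz / f0_hz))
    k_last = math.floor(high_hz / f0_hz)
    return [k * f0_hz for k in range(k_first, k_last + 1)]


def min_bandwidth(f0_max_hz: float) -> float:
    """Bandwidth rule from the text: at least twice the highest expected fundamental."""
    return 2.0 * f0_max_hz
```

With f0 = 220 Hz, `min_bandwidth(220)` gives 440.0 and `harmonics_in_band(1000, 1440, 220)` gives `[1100, 1320]`. Shifting a 440 Hz band to start just above a harmonic (1100.5 to 1540.5 Hz) still captures two harmonics, whereas a 300 Hz band at the same position captures only one.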
It is well known from the literature, see e.g. (3), to examine a mixed speech and noise signal by dividing it into time intervals and splitting it into a number of subsignals by means of a filter bank of bandpass filters. In contrast to the previously described methods, however, this is done in a particular manner in the present invention: the invention realizes a filter bank of bandpass filters whose bandwidth depends specifically on general characteristics of the speech signal, together with a detector utilizing the correlation between the envelopes of the subsignals. Moreover, again in contrast to the previously described methods, the aim of the present invention is not to point out the time intervals in the mixed speech and noise signal which consist only of noise (i.e. corresponding to pauses in the speech signal), but to point out the intervals which are dominated by the speech signal.
The invention will be explained more fully by the following description of a preferred embodiment with reference to the drawing, in which
fig. 1 is a block diagram schematically showing an apparatus according to the invention,
fig. 2 shows an example of an input signal consisting of a portion of a speech signal without noise, and how this signal is processed in the apparatus in fig. 1,
fig. 2A shows the input signal,
fig. 2B shows the frequency limited subsignals originating from filtering of the input signal,
fig. 2C shows the envelope signals corresponding to the subsignals in fig. 2B,
fig. 2D shows the synchronism signal from the synchronism detector as well as a threshold value with which it is compared, and
fig. 2E shows the final classification signal from the threshold detector.
In fig. 1, an electric input signal 101 consisting of a speech signal mixed with a noise signal (traffic noise, cafeteria noise, speech from other persons or the like) is passed to a filter bank 102 consisting of a plurality of optionally overlapping bandpass filters with increasing center frequency, which in combination cover the entire frequency range of the speech signal or part thereof. Each bandpass filter has a bandwidth greater than twice the greatest expected value of the fundamental frequency of the speech signal, so that a subsignal 103 comprising at least two consecutive harmonics of the fundamental frequency can pass through each bandpass filter.
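A sketch of such a filter bank, assuming equal-width bands laid out with a fixed overlap, each realized as a single second-order band-pass section from the widely used Audio EQ Cookbook (constant 0 dB peak gain). The patent does not specify filter type, band spacing or overlap; `design_band_edges`, `bandpass_biquad`, the `margin` factor and the example values (300 to 4000 Hz, fundamental up to 250 Hz) are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def design_band_edges(f_low: float, f_high: float, f0_max: float,
                      margin: float = 1.1,
                      overlap: float = 0.5) -> List[Tuple[float, float]]:
    """Equal-width, overlapping bands from f_low upward until f_high is covered.

    Each band is margin * 2 * f0_max wide, i.e. wider than twice the highest
    expected fundamental; consecutive bands overlap by the fraction `overlap`.
    """
    if f0_max <= 0 or f_high <= f_low:
        raise ValueError("need f0_max > 0 and f_high > f_low")
    if margin <= 1.0 or not 0.0 <= overlap < 1.0:
        raise ValueError("need margin > 1 and 0 <= overlap < 1")
    width = margin * 2.0 * f0_max
    step = width * (1.0 - overlap)
    bands: List[Tuple[float, float]] = []
    lo = f_low
    while True:
        bands.append((lo, lo + width))
        if lo + width >= f_high:   # the last band may extend slightly past f_high
            return bands
        lo += step


def bandpass_biquad(x: Sequence[float], fs: float,
                    f_lo: float, f_hi: float) -> List[float]:
    """One second-order band-pass section (Audio EQ Cookbook, 0 dB peak gain).

    The -3 dB edges approximate (f_lo, f_hi); the bilinear transform warps
    them slightly.
    """
    fc = math.sqrt(f_lo * f_hi)          # geometric centre frequency
    q = fc / (f_hi - f_lo)               # Q from the desired bandwidth
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0     # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

`design_band_edges(300.0, 4000.0, f0_max=250.0)` yields 550 Hz wide bands starting at 300 Hz and spaced 275 Hz apart. A single biquad has gentle skirts, so a practical bank would probably use higher-order sections.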
The subsignals are passed to their respective envelope detectors 104, which form the time envelopes 105 of the subsignals 103, e.g. by means of rectification, squaring or analytic signals, optionally followed by low-pass filtering. This signal processing, in which the input signal is bandpass filtered and the envelopes of the resulting subsignals are generated and used, is known from other contexts in acoustics and audiology, see e.g. (6).
The envelope signals are passed to a synchronism detector 106, which produces a measure of synchronism between the envelope signals 105 for each time segment of the signals. Since one value is computed per time segment, the time course of the computed synchronism has the shape of a staircase curve; it is called the synchronism signal 107.
The synchronism detector 106 may be based e.g. on correlation, an artificial neural network or another computing method applied to all or a subset of the envelope signals 105. For example, a correlation can be computed by first computing the product sum of the signal values for each pair of signals, here the envelope signals from two adjacent bandpass filters, and then summing all the computed product sums.
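A minimal envelope detector sketch, assuming full-wave rectification followed by two cascaded one-pole low-pass filters (one of the options listed above); the name `envelope` and the cutoff choice are hypothetical. The cutoff should lie above the highest expected fundamental frequency, so that the beat at f0 survives, but well below the band's centre frequency, so that the rectified carrier ripple is suppressed:

```python
import math
from typing import List, Sequence


def envelope(x: Sequence[float], fs: float, cutoff_hz: float,
             stages: int = 2) -> List[float]:
    """Time envelope of a band-limited subsignal.

    Full-wave rectification followed by `stages` cascaded one-pole low-pass
    filters with the given cutoff frequency.
    """
    # One-pole low-pass: y[n] = y[n-1] + c * (x[n] - y[n-1]).
    c = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y = [abs(v) for v in x]              # full-wave rectification
    for _ in range(stages):
        state = 0.0
        smoothed = []
        for v in y:
            state += c * (v - state)
            smoothed.append(state)
        y = smoothed
    return y
```

Fed with a subsignal containing the 5th and 6th harmonics of a 200 Hz fundamental (1000 Hz and 1200 Hz at 16 kHz sampling) and a 400 Hz cutoff, the output repeats every 80 samples, i.e. at the fundamental frequency, as the beat argument predicts.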
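A sketch of the segment-wise synchronism measure, assuming the envelopes are stored as equal-length lists ordered by band centre frequency. The text only specifies product sums over adjacent pairs followed by summation; the optional mean removal and normalization (turning each product sum into a correlation coefficient, so the measure does not simply follow the signal level) are an added assumption, and `normalize=False` gives the plain product sum. All names are hypothetical:

```python
import math
from typing import List, Sequence


def pair_product_sum(a: Sequence[float], b: Sequence[float], normalize: bool) -> float:
    """Product sum of two envelope segments, optionally as a correlation coefficient."""
    if not normalize:
        return sum(x * y for x, y in zip(a, b))
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def synchronism_signal(envelopes: Sequence[Sequence[float]], seg_len: int,
                       normalize: bool = True) -> List[float]:
    """One synchronism value per segment, held over the segment (staircase).

    For each segment, product sums are formed for adjacent-band envelope
    pairs and then summed over all pairs.
    """
    n = min(len(e) for e in envelopes)
    out: List[float] = []
    for start in range(0, n - seg_len + 1, seg_len):
        seg = [e[start:start + seg_len] for e in envelopes]
        value = sum(pair_product_sum(seg[i], seg[i + 1], normalize)
                    for i in range(len(seg) - 1))
        out.extend([value] * seg_len)    # hold the value: staircase shape
    return out
```

Holding each segment's value over the whole segment reproduces the staircase shape of the synchronism signal 107. With identical envelopes in three bands the normalized measure is 2 (two adjacent pairs, each with correlation 1); with mutually uncorrelated envelopes it is close to 0.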
Finally, the synchronism signal 107 is passed to a threshold detector 108, where it is compared with a threshold value. If the synchronism signal 107 is greater than the threshold value, the time segment in question is classified as being dominated by speech, and the classification signal 109 is set to binary 1. If not, the classification signal 109 is set to binary 0.
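With a fixed threshold, the threshold detector 108 reduces to an element-wise comparison. A minimal sketch (the name `classify` is hypothetical); the strict `>` follows the wording "greater than":

```python
from typing import List, Sequence


def classify(sync: Sequence[float], threshold: float) -> List[int]:
    """Binary classification: 1 where the synchronism signal exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in sync]
```

For example, `classify([0.2, 1.5, 0.9, 1.0], 1.0)` returns `[0, 1, 0, 0]`; a value exactly at the threshold is classified as 0.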
The overall function of the synchronism detector 106 and the threshold detector 108 may also be implemented by means of either a trained, a self-organizing or other artificial neural network using the envelope signals 105 as input signals and forming the desired classification signal 109 as output signal for classification of the mixed signal.
The presence of a noise signal affects the classification to a greater or lesser extent, depending on the characteristics of the noise signal. If the noise signal is stochastic, speech-like noise, the speech detection will by and large not be affected, even at a very low signal/noise ratio. If, on the other hand, the noise signal has an inherent modulation like that of a speech signal, or if it is a real speech signal from one or more persons, the interplay between the actual signal/noise ratio and the design of the threshold detector 108 will be of decisive importance. If, e.g., the threshold detector 108 is arranged such that the threshold value 210 adapts itself, with a given time constant, to a given fraction of the size of the synchronism signal 107, then advantageously only the dominant speech signal will be detected. Removing the lowest frequency components of the synchronism signal provides the additional advantage that a continuous noise signal consisting of harmonic frequency components (e.g. acoustic noise from a rotating machine) will not erroneously be classified as a speech signal.
Fig. 2 shows an example of how a given input signal 201 is processed in the apparatus in fig. 1. To illustrate the fundamental principle of the invention, the input signal 201 is shown in fig. 2A as a short speech signal without noise, consisting first of a (voiced) vowel and then of an unvoiced fricative. Fig. 2B shows the frequency limited subsignals 203 formed in the filter bank 102. Fig. 2C illustrates the envelope signals 205 formed by the envelope detectors 104 from the subsignals 203 in fig. 2B. At the vowel, the envelope signals 205 in several frequency bands are correlated with each other and modulated at a frequency corresponding to the fundamental frequency. At the fricative, the envelope signals 205 show that short-term energy is present simultaneously in several frequency bands. Fig. 2D shows the synchronism signal 207 computed by the synchronism detector 106, together with the threshold value 210 with which it is compared. Finally, fig. 2E shows the obtained classification signal 209.
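One way the adaptive threshold and the removal of the lowest frequency components described above could be combined, as a sketch operating on one synchronism value per segment. The one-pole high-pass, the decaying peak follower and all parameter values (`fraction`, `tau_segments`, `hp_tau_segments`) are assumptions, not taken from the patent:

```python
import math
from typing import List, Sequence, Tuple


def adaptive_classify(sync: Sequence[float], fraction: float = 0.5,
                      tau_segments: float = 20.0,
                      hp_tau_segments: float = 50.0) -> Tuple[List[int], List[float]]:
    """Classify one synchronism value per segment against an adaptive threshold.

    1. A one-pole high-pass (time constant hp_tau_segments) removes the slowly
       varying part of the synchronism signal, e.g. from steady harmonic noise.
    2. A peak follower decaying with time constant tau_segments tracks the size
       of what remains; the threshold is `fraction` times that level.
    Returns the binary classification and the threshold used for each segment.
    """
    hp_coef = 1.0 - math.exp(-1.0 / hp_tau_segments)
    decay = math.exp(-1.0 / tau_segments)
    slow = None                          # running low-frequency component
    level = 0.0                          # decaying peak of the high-passed signal
    flags: List[int] = []
    thresholds: List[float] = []
    for s in sync:
        slow = s if slow is None else slow + hp_coef * (s - slow)
        fast = s - slow                  # high-passed synchronism value
        level = max(fast, decay * level)
        threshold = fraction * level
        thresholds.append(threshold)
        flags.append(1 if fast > threshold else 0)
    return flags, thresholds
```

A constant synchronism value, such as steady harmonic machine noise might produce, is removed entirely by the high-pass and never classified as speech, while periodic bursts on top of it are still flagged.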
An apparatus according to the invention may be implemented either in analog or digital hardware or in software or in combinations thereof.
References:
(1) US Patent No. 4 025 721
(2) US Patent No. 4 185 168
(3) US Patent No. 4 630 304
(4) Cox B.V. and Timothy L.M.K. 1980. Nonparametric rank-order statistics applied to robust voiced-unvoiced-silence classification. IEEE Trans. ASSP 28(5), 550-561.
(5) Gordos G. 1983. Speech detection in severe noise. Proc. 11 ICA, 91-94.
(6) Houtgast T. and Steeneken H.J.M. 1973. The modulation transfer function in room acoustics as a predictor of speech intelligibility. Acustica 28, 66-73.
(7) Lim J.S. 1986. Speech enhancement. Proc. ICASSP, 3135-3142.
(8) McAulay R.J. and Malpass M.L. 1980. Speech enhancement using soft-decision noise suppression filter. IEEE Trans. ASSP 28(2), 137-145.
(9) Savoji M.H. 1989. A robust algorithm for accurate endpointing of speech signals. Speech Comm. 8, 45-60.

Claims

P a t e n t C l a i m s
1. A method of classifying, in a selected time interval, a mixed speech and noise signal (101, 201) as being signi¬ ficantly or insignificantly affected by the speech signal, where the mixed signal is divided into a plurality of se¬ parate, frequency limited subsignals (103, 203), c h a ¬ r a c t e r i z e d in that
- each subsignal (103, 203) comprises at least two harmo¬ nic frequencies for a fundamental frequency of the speech signal,
- the time envelope (105, 205) is generated for the sub- signals (103, 203),
a measure (107, 207) of synchronism between these enve¬ lopes (105, 205) is generated, and
this measure (107, 207) is compared with a threshold value (210).
2. A method according to claim 1, c h a r a c t e r - i z e d in that the mixed signal is divided into a plu¬ rality of time intervals in which the signal is classified successively.
3. A method according to claim 1, c h a r a c t e r - i z e d in that the selected time interval is a running time window.
4. A method according to claims 1-3, c h a r a c t e r ¬ i z e d in that all envelopes are used for generating the measure (107, 207) of synchronism between the envelopes (105, 205).
5. A method according to claims 1-3, c h a r a c t e r ¬ i z e d in that one or more subsets of the envelopes (105, 205) are used for generating the measure (107, 207) of synchronism between the envelopes (105, 205).
6. A method according to claims 1-5, c h a r a c t e r ¬ i z e d in that the generation of the measure (107, 207) of synchronism between the envelopes (105, 205) is based on a correlation computation.
7. A method according to claims 1-5, c h a r a c t e r ¬ i z e d in that the envelopes (105, 205) are passed as input signals to an artificial neural network which clas¬ sifies the signal.
8. An apparatus for classification of a mixed speech and noise signal (101, 201), comprising filter means, each of which permits passage of a subsignal (103, 203), characterized in that
- each subsignal (103, 203) contains at least two harmonic frequencies of a fundamental frequency of the speech signal, and that the apparatus further comprises
- means (104) for generating the time envelopes (105, 205) of the subsignals,
- means (106) for generating a measure (107, 207) of synchronism between these envelopes, and
- means (108) for comparing the synchronism signal (107, 207) with a given threshold value (210).
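Claim 8 recasts the method as an apparatus: filter means, means for generating envelopes, means for measuring synchronism and a comparator. The first two stages are sketched below in Python as a software stand-in, assuming a second-order band-pass biquad (coefficients from Robert Bristow-Johnson's Audio EQ Cookbook, 0 dB peak gain) as the filter means and full-wave rectification followed by a one-pole low-pass as the envelope means. These filter types and all parameter values are our assumptions, not the patent's. The envelope cutoff should sit above the highest expected F0 so that the pitch-rate fluctuation survives smoothing.

```python
import math

def bandpass(signal, fs, center_hz, q):
    """Filter means (sketch): RBJ cookbook band-pass biquad with 0 dB gain at center_hz."""
    w0 = 2 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Coefficients normalised by a0; b1 is zero for this band-pass form.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        # Direct form I difference equation.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, fs, cutoff_hz):
    """Envelope means (sketch): full-wave rectification, then a one-pole low-pass."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    y = 0.0
    out = []
    for x in signal:
        y += a * (abs(x) - y)
        out.append(y)
    return out
```

For instance, at fs = 16000 Hz a band centred at 1200 Hz with Q = 2 is roughly 600 Hz wide and so holds at least two harmonics for any F0 up to about 300 Hz; feeding two harmonics of F0 = 200 Hz through `envelope` with a 500 Hz cutoff yields an envelope that rises and falls every 5 ms, while a single tone yields a nearly flat one.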

Applications Claiming Priority (2)

Application Number  Priority Date  Filing Date  Title
DK406189A  1989-08-18  1989-08-18  METHOD AND APPARATUS FOR CLASSIFYING A MIXED SPEECH AND NOISE SIGNAL
DK4061/89  1989-08-18

Publications (1)

Publication Number Publication Date
WO1991003042A1  1991-03-07

Family ID: 8129776

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK1990/000214 WO1991003042A1 (en) 1989-08-18 1990-08-17 A method and an apparatus for classification of a mixed speech and noise signal

Country Status (2)

Country Link
DK (1) DK406189A (en)
WO (1) WO1991003042A1 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5406635A (en) * 1992-02-14 1995-04-11 Nokia Mobile Phones, Ltd. Noise attenuation system
WO2000005923A1 (en) * 1998-07-24 2000-02-03 Siemens Audiologische Technik Gmbh Hearing aid having an improved speech intelligibility by means of frequency selective signal processing, and a method for operating such a hearing aid
WO2001047335A2 (en) 2001-04-11 2001-07-05 Phonak Ag Method for the elimination of noise signal components in an input signal for an auditory system, use of said method and a hearing aid
WO2005086536A1 (en) * 2004-03-02 2005-09-15 Oticon A/S Method for noise reduction in an audio device and hearing aid with means for reducing noise
EP2533550A1 (en) 2011-06-06 2012-12-12 Oticon A/s Diminishing tinnitus loudness by hearing instrument treatment
EP2560410A1 (en) 2011-08-15 2013-02-20 Oticon A/s Control of output modulation in a hearing instrument
EP2563044A1 (en) 2011-08-23 2013-02-27 Oticon A/s A method, a listening device and a listening system for maximizing a better ear effect
EP2563045A1 (en) 2011-08-23 2013-02-27 Oticon A/s A method and a binaural listening system for maximizing a better ear effect
EP2613567A1 (en) 2012-01-03 2013-07-10 Oticon A/S A method of improving a long term feedback path estimate in a listening device
EP2663094A1 (en) 2012-05-09 2013-11-13 Oticon A/s Methods and apparatus for processing audio signals
EP2677770A1 (en) 2012-06-21 2013-12-25 Oticon A/s Hearing aid comprising a feedback alarm
EP2840810A2 (en) 2013-04-24 2015-02-25 Oticon A/s A hearing assistance device with a low-power mode
EP2849462A1 (en) 2013-09-17 2015-03-18 Oticon A/s A hearing assistance device comprising an input transducer system
US9344817B2 (en) 2000-01-20 2016-05-17 Starkey Laboratories, Inc. Hearing aid systems
EP3068146B1 (en) 2015-03-13 2017-10-11 Sivantos Pte. Ltd. Method for operating a hearing device and hearing device
EP3048813B1 (en) 2015-01-22 2018-03-14 Sivantos Pte. Ltd. Method and device for suppressing noise based on inter-subband correlation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4277645A (en) * 1980-01-25 1981-07-07 Bell Telephone Laboratories, Incorporated Multiple variable threshold speech detector
US4382164A (en) * 1980-01-25 1983-05-03 Bell Telephone Laboratories, Incorporated Signal stretcher for envelope generator
DE2649259C2 (en) * 1976-10-29 1983-06-09 Felten & Guilleaume Fernmeldeanlagen GmbH, 8500 Nürnberg Method for the automatic detection of disturbed telephone speech
US4630304A (en) * 1985-07-01 1986-12-16 Motorola, Inc. Automatic background noise estimator for a noise suppression system
US4696039A (en) * 1983-10-13 1987-09-22 Texas Instruments Incorporated Speech analysis/synthesis system with silence suppression

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU666161B2 (en) * 1992-02-14 1996-02-01 Nokia Mobile Phones Limited Noise attenuation system for voice signals
US5406635A (en) * 1992-02-14 1995-04-11 Nokia Mobile Phones, Ltd. Noise attenuation system
WO2000005923A1 (en) * 1998-07-24 2000-02-03 Siemens Audiologische Technik Gmbh Hearing aid having an improved speech intelligibility by means of frequency selective signal processing, and a method for operating such a hearing aid
US6768801B1 (en) 1998-07-24 2004-07-27 Siemens Aktiengesellschaft Hearing aid having improved speech intelligibility due to frequency-selective signal processing, and method for operating same
US9344817B2 (en) 2000-01-20 2016-05-17 Starkey Laboratories, Inc. Hearing aid systems
US9357317B2 (en) 2000-01-20 2016-05-31 Starkey Laboratories, Inc. Hearing aid systems
WO2001047335A2 (en) 2001-04-11 2001-07-05 Phonak Ag Method for the elimination of noise signal components in an input signal for an auditory system, use of said method and a hearing aid
WO2005086536A1 (en) * 2004-03-02 2005-09-15 Oticon A/S Method for noise reduction in an audio device and hearing aid with means for reducing noise
US7489789B2 (en) 2004-03-02 2009-02-10 Oticon A/S Method for noise reduction in an audio device and hearing aid with means for reducing noise
EP2533550A1 (en) 2011-06-06 2012-12-12 Oticon A/s Diminishing tinnitus loudness by hearing instrument treatment
EP2560410A1 (en) 2011-08-15 2013-02-20 Oticon A/s Control of output modulation in a hearing instrument
EP2563045A1 (en) 2011-08-23 2013-02-27 Oticon A/s A method and a binaural listening system for maximizing a better ear effect
EP2563044A1 (en) 2011-08-23 2013-02-27 Oticon A/s A method, a listening device and a listening system for maximizing a better ear effect
EP2613567A1 (en) 2012-01-03 2013-07-10 Oticon A/S A method of improving a long term feedback path estimate in a listening device
EP2663094A1 (en) 2012-05-09 2013-11-13 Oticon A/s Methods and apparatus for processing audio signals
EP2677770A1 (en) 2012-06-21 2013-12-25 Oticon A/s Hearing aid comprising a feedback alarm
EP2840810A2 (en) 2013-04-24 2015-02-25 Oticon A/s A hearing assistance device with a low-power mode
EP2849462A1 (en) 2013-09-17 2015-03-18 Oticon A/s A hearing assistance device comprising an input transducer system
US9538296B2 (en) 2013-09-17 2017-01-03 Oticon A/S Hearing assistance device comprising an input transducer system
US10182298B2 (en) 2013-09-17 2019-01-15 Oticon A/S Hearing assistance device comprising an input transducer system
EP3048813B1 (en) 2015-01-22 2018-03-14 Sivantos Pte. Ltd. Method and device for suppressing noise based on inter-subband correlation
EP3068146B1 (en) 2015-03-13 2017-10-11 Sivantos Pte. Ltd. Method for operating a hearing device and hearing device

Also Published As

Publication number Publication date
DK406189A (en) 1991-02-19
DK406189D0 (en) 1989-08-18

Similar Documents

Publication Publication Date Title
WO1991003042A1 (en) A method and an apparatus for classification of a mixed speech and noise signal
Lim et al. Enhancement and bandwidth compression of noisy speech
Schroeder Vocoders: Analysis and synthesis of speech
McAulay et al. Speech enhancement using a soft-decision noise suppression filter
EP0719439B1 (en) Voice activity detector
Ibrahim et al. Preprocessing technique in automatic speech recognition for human computer interaction: an overview
US20050108004A1 (en) Voice activity detector based on spectral flatness of input signal
Kleinschmidt Methods for capturing spectro-temporal modulations in automatic speech recognition
JPH05346797A (en) Voiced sound discriminating method
CN110390957A (en) Method and apparatus for speech detection
Mourjopoulos et al. Modelling and enhancement of reverberant speech using an envelope convolution method
Pahar et al. Coding and decoding speech using a biologically inspired coding system
de-La-Calle-Silos et al. Synchrony-based feature extraction for robust automatic speech recognition
Lee et al. Cochannel speech separation
Shoba et al. Adaptive energy threshold for monaural speech separation
Kumar et al. A new pitch detection scheme based on ACF and AMDF
US3405237A (en) Apparatus for determining the periodicity and aperiodicity of a complex wave
Brown et al. A neural oscillator sound separator for missing data speech recognition
Alku et al. Linear predictive method for improved spectral modeling of lower frequencies of speech with small prediction orders
Ganapathy et al. Comparison of modulation features for phoneme recognition
Lin A New Frequency Coverage Metric and a New Subband Encoding Model, with an Application in Pitch Estimation.
Muhsina et al. Signal enhancement of source separation techniques
Logeshwari et al. A survey on single channel speech separation
Nikhil et al. Impact of ERB and bark scales on perceptual distortion based near-end speech enhancement
de Cheveigné A mixed speech F0 estimation algorithm.

Legal Events

Code  Title                                      Description
AK    Designated states                          Kind code of ref document: A1. Designated state(s): JP US
AL    Designated countries for regional patents  Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE