WO2010052749A1 - Noise Suppression Device - Google Patents
Noise Suppression Device
- Publication number
- WO2010052749A1 (international application PCT/JP2008/003162)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- spectrum
- noise suppression
- suppression
- unit
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Definitions
- The present invention relates to a noise suppression device that suppresses noise other than the target signal, such as a speech or acoustic signal, in voice communication systems, speech recognition systems, and similar systems used in a variety of noise environments.
- It improves the sound quality of voice communication and hands-free call systems such as mobile phones and of video conferencing systems, and improves the recognition rate of speech recognition systems.
- Spectral subtraction (SS) is a typical noise suppression technique for emphasizing the speech signal (the target signal) by suppressing noise (the non-target signal) in an input signal contaminated with noise.
- In SS, noise is suppressed by subtracting a separately estimated average noise spectrum from the amplitude spectrum of the input (see, for example, Non-Patent Document 1).
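The per-bin subtraction described above can be sketched as follows. This is a minimal illustration on magnitude spectra, assuming the estimated noise magnitude per bin is already available; the function name and the `floor` parameter are hypothetical and not taken from Non-Patent Document 1.

```python
def spectral_subtraction(input_mag, noise_mag, floor=0.0):
    """Subtract an estimated average noise magnitude |N(k)| from each input bin |S(k)|.

    floor is a hypothetical non-negative lower bound applied to every output bin.
    """
    if len(input_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    # Bins where the noise estimate exceeds the input are clipped to the floor.
    # Bins where the estimate is too small leave isolated residual peaks,
    # which are heard as musical noise.
    return [max(s - n, floor) for s, n in zip(input_mag, noise_mag)]
```

For example, `spectral_subtraction([3.0, 1.0, 0.5], [1.0, 1.5, 0.25])` clips the middle bin to zero and returns `[2.0, 0.0, 0.25]`.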
- However, errors in the noise spectrum estimate remain in the processed signal as distortion whose character differs markedly from the signal before processing, and as harsh artificial noise (also called musical noise), which can greatly degrade the subjective quality of the output signal.
- Patent Document 1 discloses a method for reducing this perceived degradation.
- Patent Document 1 aims to provide a noise suppression device that generates no musical noise in noise sections and no distortion in speech sections. It comprises a speech/noise determination unit that determines from the input signal whether each section is a target-signal (speech) section or a noise section.
- It further comprises a noise suppression unit that suppresses noise using the input signal and the estimated noise signal according to a first suppression coefficient, and a noise over-suppression unit that suppresses noise using the input signal and the estimated noise signal according to a second suppression coefficient larger than the first.
- Finally, a switching unit switches between the output of the noise suppression unit and the output of the noise over-suppression unit according to the determination result of the speech/noise determination unit.
- Because the conventional noise suppression device switches between the output of the noise suppression unit and the output of the noise over-suppression unit according to the speech/noise determination result, quality degradation caused by erroneous determinations cannot be avoided. Moreover, because speech and noise signals vary widely and change over time, it is difficult to make a 100% correct determination.
- For example, if a speech section is erroneously determined to be a noise section, the over-suppressed output is selected and the speech is attenuated.
- If such erroneous determinations occur frequently within the same speech section, the output fluctuates unstably, which degrades quality.
- The present invention has been made to solve these problems, and its object is to provide a high-quality noise suppression device that greatly reduces the occurrence of musical noise.
- A noise suppression device according to the present invention comprises a plurality of noise suppression units, each of which performs noise suppression processing on an input spectrum and outputs the resulting noise suppression spectrum, and a selection unit that compares the values of the plurality of noise suppression spectra for each frequency component, selects the noise suppression spectrum having the maximum value, and outputs it as the spectrum of that frequency component.
- Because the device selects, for each frequency component, the noise suppression spectrum with the maximum value, it picks the spectrum that is not over-suppressed. This greatly reduces musical noise and realizes a high-quality noise suppression device with fewer unstable fluctuations in speech sections.
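The per-frequency maximum selection at the core of the invention can be sketched as below. This is a minimal illustration, assuming each noise suppression spectrum is given as a list of non-negative magnitudes of equal length; the function name is hypothetical. It accepts any number of spectra, which also covers the variant with three or more noise suppression units.

```python
def select_max_per_bin(spectra):
    """For each frequency bin, return the largest value among all noise suppression spectra."""
    if not spectra:
        raise ValueError("need at least one noise suppression spectrum")
    n_bins = len(spectra[0])
    if any(len(s) != n_bins for s in spectra):
        raise ValueError("all spectra must have the same number of bins")
    # zip(*spectra) groups the k-th component of every spectrum together,
    # so the selection is made independently for each frequency.
    return [max(components) for components in zip(*spectra)]
```

For example, with `[0.2, 0.9, 0.1]` and `[0.5, 0.3, 0.0]` the result is `[0.5, 0.9, 0.1]`: different bins come from different units, rather than one unit being chosen for the whole frame.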
- FIG. 1 is a block diagram illustrating a configuration of a noise suppression device according to a first embodiment.
- FIG. 2 is a schematic diagram illustrating an example of the time transition of a spectral component in the first embodiment.
- FIG. 3 is a block diagram illustrating the configuration of a noise suppression device according to a second embodiment.
- FIG. 4 is a schematic diagram illustrating an example of the time transition of a spectral component in the second embodiment.
- FIG. 1 is a block diagram showing the configuration of the noise suppression apparatus according to the first embodiment.
- The noise suppression device comprises a time/frequency conversion unit 1, a speech likelihood analysis unit 2, a noise spectrum estimation unit 3, a first noise suppression unit 4, a second noise suppression unit 5, a maximum amplitude selection unit 6, and a frequency/time conversion unit 7.
- The first noise suppression unit 4 comprises an SN estimation unit 4a and a spectrum amplitude suppression unit 4b.
- The second noise suppression unit 5 comprises a spectrum subtraction unit 5a and a spectrum amplitude suppression unit 5b.
- The input signal 101 is sampled at a predetermined sampling frequency (for example, 8 kHz), divided into frames at a predetermined frame period (for example, 20 msec), and input to the time/frequency conversion unit 1 and the speech likelihood analysis unit 2.
- The time/frequency conversion unit 1 applies a window to each frame of the input signal 101, performs, for example, a 256-point FFT (Fast Fourier Transform) on the windowed signal, and converts it into the input spectrum 102, which consists of one spectral component per frequency. The input spectrum 102 is output to the speech likelihood analysis unit 2, the noise spectrum estimation unit 3, the SN estimation unit 4a, the spectrum amplitude suppression unit 4b, the spectrum subtraction unit 5a, and the spectrum amplitude suppression unit 5b.
- For the windowing, a known window such as a Hanning window or a trapezoidal window can be used.
- Since the FFT is a well-known method, its description is omitted.
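The framing, windowing, and transform steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's implementation: it uses the example values from the text (8 kHz, 20 ms, 256 points), assumes non-overlapping 160-sample frames zero-padded to 256 points (the text does not say whether analysis windows overlap), uses a symmetric Hann window, and replaces the FFT with a direct DFT for readability. All names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000                      # example sampling frequency from the text
FRAME_LEN = SAMPLE_RATE_HZ * 20 // 1000    # 20 ms frame period -> 160 samples
FFT_SIZE = 256                             # example FFT length from the text


def split_frames(signal, frame_len=FRAME_LEN):
    """Cut a signal into consecutive, non-overlapping frames; a short tail is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def hann_window(length):
    """Symmetric Hann (Hanning) window of the given length."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def input_spectrum(frame, fft_size=FFT_SIZE):
    """Window one frame, zero-pad it to fft_size and return its complex spectrum."""
    if len(frame) > fft_size:
        raise ValueError("frame is longer than the FFT size")
    window = hann_window(len(frame))
    padded = [x * w for x, w in zip(frame, window)] + [0.0] * (fft_size - len(frame))
    # Direct O(M^2) DFT for readability; a real implementation would call an FFT.
    return [sum(padded[t] * cmath.exp(-2j * math.pi * k * t / fft_size)
                for t in range(fft_size))
            for k in range(fft_size)]
```

A 1 kHz tone sampled at 8 kHz falls on bin 1000 × 256 / 8000 = 32, so the magnitude of `input_spectrum` peaks there.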
- The speech likelihood analysis unit 2 uses the input signal 101, the input spectrum 102 output from the time/frequency conversion unit 1, and the estimated noise spectrum 104 of the previous frame, stored in an internal memory of the noise spectrum estimation unit 3 described later, to calculate a speech likelihood evaluation value 103.
- This value expresses the degree to which the input signal of the current frame is speech or noise: for example, it is large when the frame is likely to be speech and small when it is unlikely to be speech.
- The speech likelihood evaluation value 103 calculated in this way is output to the noise spectrum estimation unit 3.
- As the speech likelihood evaluation value, for example, the maximum value of the autocorrelation analysis of the input signal 101 and the frame SN ratio, computed from the ratio of the power of the input spectrum 102 to the power of the estimated noise spectrum 104, can be used individually or in combination.
- The maximum autocorrelation value ACF_max of the input signal 101 is calculated by Equation (1).
- The frame SN ratio SNR_fr is calculated by Equation (2).
- Here, the estimated noise spectrum 104 is that of the previous frame, read out from the internal memory of the noise spectrum estimation unit 3 described later.
- x(t) is the input signal 101 divided into frames, at time t.
- N is the autocorrelation analysis length.
- S(k) is the k-th component of the input spectrum 102.
- N(k) is the k-th component of the estimated noise spectrum 104, and M is the number of FFT points.
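Equations (1) and (2) themselves are not reproduced on this page, so the following is only a hedged sketch of one common form consistent with the definitions above: ACF_max as the peak autocorrelation normalized by frame energy (clipped to 0 to 1, matching the stated range), and SNR_fr as the total input power over total estimated noise power in dB. The lower lag limit `min_lag` (2.5 ms at 8 kHz), the `eps` guard, and the dB scale are assumptions, not taken from the patent.

```python
import math


def acf_max(x, min_lag=20):
    """Hypothetical form of Eq. (1): peak normalized autocorrelation over lags >= min_lag."""
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0
    best = 0.0  # negative correlations are treated as 0, keeping the result in [0, 1]
    for lag in range(min_lag, len(x)):
        r = sum(x[t] * x[t + lag] for t in range(len(x) - lag))
        best = max(best, r / energy)
    return best


def frame_snr_db(s_mag, n_mag, eps=1e-12):
    """Hypothetical form of Eq. (2): sum of |S(k)|^2 over sum of |N(k)|^2, in dB."""
    p_input = sum(v * v for v in s_mag)
    p_noise = sum(v * v for v in n_mag)
    return 10.0 * math.log10((p_input + eps) / (p_noise + eps))
```

A sinusoid with a 40-sample period over a 160-sample frame gives ACF_max = 0.75 (at lag 40, 120 of the 160 samples overlap), and doubling the input power relative to the noise gives about 3.01 dB.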
- The speech likelihood evaluation value VAD is calculated from ACF_max obtained by Equation (1) and SNR_fr obtained by Equation (2) using the following Equation (3).
- SNR_norm is a predetermined value that normalizes SNR_fr into the range 0 to 1.
- w_ACF and w_SNR are predetermined weights.
- These values may be tuned in advance so that speech can be discriminated appropriately.
- By the nature of Equation (1), ACF_max takes a value in the range 0 to 1.
- the speech likelihood evaluation value 103 calculated by the above processing is output to the noise spectrum estimation unit 3.
- In Equation (3), setting either w_ACF or w_SNR to 0 makes it possible to calculate the speech likelihood evaluation value 103 from only the parameter whose weight is nonzero. Specifically, when w_SNR is set to 0, the speech likelihood evaluation value 103 is obtained from ACF_max alone.
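Equation (3) is also not shown on this page. The sketch below assumes the weighted-sum form suggested by the description, VAD = w_ACF · ACF_max + w_SNR · (SNR_fr / SNR_norm), and additionally clips the normalized SNR to 0 to 1, which is an assumption. The default weights and SNR_norm = 20 dB are hypothetical tuning values.

```python
def speech_likelihood(acf_max_value, snr_fr_db, w_acf=0.5, w_snr=0.5, snr_norm=20.0):
    """Hypothetical form of Eq. (3): weighted sum of ACF_max and the normalized frame SNR."""
    # Normalize SNR_fr by SNR_norm and keep it within 0..1.
    normalized_snr = min(max(snr_fr_db / snr_norm, 0.0), 1.0)
    return w_acf * acf_max_value + w_snr * normalized_snr
```

Setting `w_snr=0.0` reproduces the ACF-only variant described above: `speech_likelihood(0.3, 40.0, w_acf=1.0, w_snr=0.0)` returns `0.3`.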
- Analysis parameters other than those in Equation (3) can also be added to the speech likelihood evaluation value 103. For example, the SN ratio of each frequency component can be calculated from the input spectrum 102 and the estimated noise spectrum 104, and the sum of these per-frequency SN ratios (the larger the sum, the more likely the frame is speech) or their variance (the larger the variance, the more likely a harmonic speech structure is present, and thus the more likely the frame is speech) can be used, with changes made as appropriate.
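The two optional features named above, the sum and the variance of the per-frequency SN ratios, can be computed as follows. This is a sketch assuming magnitude spectra, per-bin SN ratios in dB, and population variance; none of these choices is specified in the text, and the function name is hypothetical.

```python
import math
import statistics


def per_bin_snr_features(s_mag, n_mag, eps=1e-12):
    """Return (sum, variance) of the per-frequency SN ratios in dB."""
    if not s_mag or len(s_mag) != len(n_mag):
        raise ValueError("spectra must be non-empty and of equal length")
    snr_db = [10.0 * math.log10((s * s + eps) / (n * n + eps))
              for s, n in zip(s_mag, n_mag)]
    # A larger sum suggests speech; a larger variance suggests harmonic peaks.
    return sum(snr_db), statistics.pvariance(snr_db)
```

With `s_mag = [10.0, 1.0]` and `n_mag = [1.0, 1.0]` the per-bin SN ratios are about 20 dB and 0 dB, giving a sum of about 20 and a variance of about 100.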
- The noise spectrum estimation unit 3 refers to the speech likelihood evaluation value 103 input from the speech likelihood analysis unit 2 and, when the current frame is unlikely to be speech, uses the input spectrum 102 of the current frame to update the estimated noise spectrum of the previous frame stored in an internal memory (not shown).
- The updated result is output as the estimated noise spectrum 104 to the SN estimation unit 4a and the spectrum subtraction unit 5a.
- The estimated noise spectrum is updated, for example, by reflecting the input spectrum according to the following Equation (4).
- n is the frame number.
- N(n−1, k) is the estimated noise spectrum before the update.
- S_noise(n, k) is the input spectrum of the current frame, judged unlikely to be speech.
- Ñ(n, k) is the estimated noise spectrum after the update.
- α(k) is a predetermined update speed coefficient between 0 and 1, preferably set relatively close to 0. In some cases it is better to increase the coefficient slightly at higher frequencies, and it should be adjusted according to the type of noise.
- When the input signal or the noise fluctuates strongly, an update speed coefficient that increases the update speed may be applied. Alternatively, when the fluctuation is large, the estimated noise spectrum can be replaced (reset) with the input spectrum of the frame whose power or speech likelihood evaluation value is smallest.
- Conversely, when the speech likelihood evaluation value 103 is sufficiently large, that is, when the input signal of the current frame is very likely to be speech, the estimated noise spectrum need not be updated.
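Equation (4) is not reproduced on this page. A common form consistent with the description, where a coefficient close to 0 means a slow update, is first-order recursive averaging: Ñ(n, k) = (1 − α(k)) · N(n−1, k) + α(k) · S_noise(n, k). The sketch below assumes that form and a hypothetical decision threshold on the speech likelihood evaluation value; the reset variant is not shown.

```python
def update_noise_spectrum(noise_prev, input_mag, vad, alpha, vad_threshold=0.5):
    """Assumed form of Eq. (4): recursive averaging, applied only to frames unlikely to be speech.

    alpha holds one update speed coefficient per bin, each between 0 and 1.
    vad_threshold is a hypothetical cut-off on the speech likelihood value.
    """
    if vad >= vad_threshold:
        # Probably speech: keep the previous estimate unchanged.
        return list(noise_prev)
    return [(1.0 - a) * n + a * s for n, s, a in zip(noise_prev, input_mag, alpha)]
```

For instance, `update_noise_spectrum([1.0, 2.0], [3.0, 2.0], 0.1, [0.5, 0.25])` returns `[2.0, 2.0]`, while the same call with `vad=0.9` returns the previous estimate unchanged.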
- The SN estimation unit 4a calculates an estimated SN ratio from the input spectrum 102 and the estimated noise spectrum 104. The spectrum amplitude suppression unit 4b calculates an amplitude suppression gain based on the estimated SN ratio, multiplies the input spectrum 102 by this gain, and outputs the result to the maximum amplitude selection unit 6 as the first noise suppression spectrum 105.
- The estimated SN ratio in the SN estimation unit 4a can be calculated, for example, in the same way as the frame SN ratio of Equation (2) described above. If the speech likelihood analysis unit 2 calculates the frame SN ratio, that value may be used directly as the estimated SN ratio, or after appropriate processing such as smoothing over time.
- The spectrum amplitude suppression unit 4b calculates the amplitude suppression gain so that it is large (little attenuation) in frames with a high estimated SN ratio and small (strong attenuation) in frames with a low estimated SN ratio.
- The amplitude suppression gain is also set larger than most of the amplitude suppression gains of the second noise suppression unit 5 (described later) in noise sections, where the gain of the second unit is the amplitude ratio of the second noise suppression spectrum 106 to the input spectrum 102.
- For example, the estimated SN ratio and the power of the input spectrum 102 can be used to estimate the speech power of the frame, that is, the power with the noise removed, and an amplitude suppression gain is obtained so that the power of the first noise suppression spectrum 105 matches it.
- If this amplitude suppression gain is at or below a predetermined lower limit, it is replaced by the lower limit.
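The power-matching gain described above can be sketched as follows. This is a minimal illustration, assuming the estimated SN ratio is a frame-level speech-to-noise power ratio in dB and that the input power is the sum of speech and noise power, so the noise-free power is P_in · ξ / (1 + ξ) and the matching amplitude gain is √(ξ / (1 + ξ)). The lower limit of 0.3 and the function name are hypothetical.

```python
import math


def first_suppression(input_mag, est_snr_db, gain_floor=0.3):
    """Sketch of units 4a/4b: one frame-level gain derived from the estimated SN ratio.

    Returns the first noise suppression spectrum and the gain that was applied.
    gain_floor is a hypothetical predetermined lower limit.
    """
    xi = 10.0 ** (est_snr_db / 10.0)       # dB -> linear power ratio
    gain = math.sqrt(xi / (1.0 + xi))      # output power matches the estimated speech power
    gain = max(gain, gain_floor)           # gains below the lower limit are replaced by it
    return [gain * s for s in input_mag], gain
```

The gain rises with the estimated SN ratio (about 0.71 at 0 dB, above 0.99 at 30 dB) and is held at the floor in low-SN frames, which keeps it above the typical noise-section gain of the second unit.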
- The spectrum subtraction unit 5a performs spectral subtraction on the input spectrum 102 based on the estimated noise spectrum 104, and the spectrum amplitude suppression unit 5b then applies spectral amplitude suppression that attenuates each frequency component. The result is output to the maximum amplitude selection unit 6 as the second noise suppression spectrum 106.
- The spectrum amplitude suppression unit 5b adaptively controls its attenuation so that the amplitude suppression gain of the second noise suppression unit 5 as a whole (the amplitude ratio of the second noise suppression spectrum 106 to the input spectrum 102) varies little.
- As the second noise suppression unit 5, for example, the device described in Japanese Patent No. 3454190, "Noise Suppression Device and Method", can be applied. The order of the two stages can also be reversed: the spectrum amplitude suppression unit 5b first attenuates each frequency component of the input spectrum 102, and the spectrum subtraction unit 5a then performs spectral subtraction based on the estimated noise spectrum 104 on the attenuated spectrum and outputs the result to the maximum amplitude selection unit 6 as the second noise suppression spectrum 106.
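The two-stage structure of the second noise suppression unit, in either order, can be sketched as below. This deliberately simplifies the described unit: it uses a fixed attenuation factor instead of the adaptive attenuation control of Japanese Patent No. 3454190, which is not reproduced here. The `attenuation`, `floor`, and `subtract_first` parameters are hypothetical.

```python
def second_suppression(input_mag, noise_mag, attenuation=0.8, floor=0.0, subtract_first=True):
    """Sketch of units 5a/5b: spectral subtraction combined with per-bin attenuation.

    A fixed attenuation factor stands in for the adaptive control described in the text.
    """
    def subtract(spec):
        # Unit 5a: subtract the estimated noise magnitude, clipping at the floor.
        return [max(s - n, floor) for s, n in zip(spec, noise_mag)]

    def attenuate(spec):
        # Unit 5b: scale every frequency component by the attenuation factor.
        return [attenuation * v for v in spec]

    if subtract_first:
        return attenuate(subtract(input_mag))
    return subtract(attenuate(input_mag))   # the reversed order mentioned in the text
```

The order matters: with input `[3.0, 1.0]`, noise `[1.0, 2.0]`, and attenuation 0.5, subtracting first gives `[1.0, 0.0]`, while attenuating first gives `[0.5, 0.0]`.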
- The maximum amplitude selection unit 6 compares the first noise suppression spectrum 105 with the second noise suppression spectrum 106, selects the larger spectral component at each frequency, and outputs the collected larger components to the frequency/time conversion unit 7 as the output spectrum 107.
- The frequency/time conversion unit 7 applies an inverse FFT to the output spectrum 107 input from the maximum amplitude selection unit 6 to return it to a time-domain signal, applies windowing for a smooth connection with the preceding and following frames, joins the frames, and outputs the resulting signal as the output signal 108.
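The final joining step can be illustrated with overlap-add. This sketch covers only the connection of frames: it assumes each frame has already been inverse-transformed and windowed by the caller, that all frames have equal length, and that the hop size is given. The inverse FFT and the choice of synthesis window are omitted, and the function name is hypothetical.

```python
def overlap_add(frames, hop):
    """Join equal-length time-domain frames by summing them at multiples of hop."""
    if hop <= 0:
        raise ValueError("hop must be positive")
    if not frames:
        return []
    frame_len = len(frames[0])
    if any(len(f) != frame_len for f in frames):
        raise ValueError("all frames must have the same length")
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for t, value in enumerate(frame):
            out[start + t] += value   # overlapping regions are summed
    return out
```

Two frames of four ones with a hop of 2 give `[1.0, 1.0, 2.0, 2.0, 1.0, 1.0]`. In practice the windows are chosen so that the overlapping parts sum to a constant, which gives the smooth connection described above.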
- FIG. 2 shows the time transition of a spectral component at a certain frequency.
- FIG. 2(a) shows the input spectrum.
- FIG. 2(b) shows the first noise suppression spectrum.
- FIG. 2(c) shows the second noise suppression spectrum.
- FIG. 2(d) shows the time transition of the output spectrum.
- the horizontal axis indicates time
- the vertical axis indicates amplitude
- the white bar graph indicates the amplitude of the noise
- the shaded bar graph indicates the amplitude of the voice.
- Along the time axis, the first five sections are noise sections, and the last three are speech sections with noise superimposed.
- The first noise suppression unit 4 calculates an amplitude suppression gain based on the estimated SN ratio and multiplies the input spectrum 102 shown in FIG. 2(a) by it, giving the first noise suppression spectrum shown in FIG. 2(b).
- In the noise sections, the estimated SN ratio is low, so a small amplitude suppression gain is calculated and the amplitude of the first noise suppression spectrum becomes small.
- In the speech sections, the estimated SN ratio is high, so a large amplitude suppression gain is calculated and the amplitude of the first noise suppression spectrum is not reduced much. Note that the estimated SN ratio tends to be low near the beginning of a speech section, so the gain there is also small, as shown in FIG. 2(b).
- The second noise suppression unit 5 performs subtraction and amplitude suppression based on the estimated noise spectrum 104 on the input spectrum 102 shown in FIG. 2(a). As shown in FIG. 2(c), this gives a second noise suppression spectrum 106 in which the amplitude in the noise sections is substantially reduced and the amplitude in the speech sections is close to that of the speech.
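- However, when the estimated noise spectrum 104 deviates from the actual noise because of noise fluctuations or errors in the speech likelihood evaluation value, the second noise suppression spectrum shown in FIG. 2(c) is over-suppressed in places, and isolated residual noise components remain in the noise sections, where they are heard as artificial noise (musical noise).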
- FIG. 2(d) shows the output spectrum 107, obtained by selecting at each time the larger of the first noise suppression spectrum 105 in FIG. 2(b) and the second noise suppression spectrum 106 in FIG. 2(c).
- Since the amplitude suppression gain in the first noise suppression unit 4 is set larger than most of the amplitude suppression gains of the second noise suppression unit 5 in the noise sections, the first noise suppression spectrum 105 has the larger amplitude in most of the noise sections and is selected as the output spectrum 107. This eliminates the island-like residual noise in the noise sections, and with it the musical noise. In the speech sections, the less over-suppressed of the two spectra is selected, so the output spectrum 107 has reduced over-suppression and the speech sounds less interrupted.
- Three or more noise suppression units may be provided.
- In that case, the maximum amplitude selection unit 6 may be configured to select, for each frequency, the maximum spectral component among the three or more noise suppression spectra.
- In the configuration above, the second noise suppression unit 5 comprises the spectrum subtraction unit 5a and the spectrum amplitude suppression unit 5b.
- However, the present invention is not limited to this.
- The second noise suppression unit 5 may comprise only the spectrum subtraction unit 5a.
- Likewise, the estimated noise spectrum 104 is estimated here by the speech likelihood analysis unit 2 and the noise spectrum estimation unit 3, but the means of obtaining it is not limited to this configuration. For example, the speech likelihood analysis unit 2 may be omitted by making the update speed of the noise spectrum estimation unit 3 very slow and updating it continuously. Alternatively, instead of estimating the estimated noise spectrum 104 from the input signal 101, a separate analysis/estimation method may be applied to a dedicated noise-estimation input signal into which only noise is input.
- As described above, the values of the first and second noise suppression spectra 105 and 106 output from the first and second noise suppression units 4 and 5 are compared for each frequency component, and the larger is selected as the value of that frequency component of the output spectrum 107. Selecting the spectrum that is not over-suppressed greatly reduces musical noise and realizes a high-quality noise suppression device with fewer unstable fluctuations in speech sections.
- Moreover, unlike the conventional technique, which selects one noise suppression unit's output for all frequency components at once based on a speech/noise determination, this device does not switch outputs wholesale. It therefore avoids large spectral fluctuations, prevents quality degradation due to speech/noise determination errors, and suppresses musical noise in bands where the noise component dominates, even within speech sections.
- Since the amplitude suppression gain of the first noise suppression unit 4 is set larger than most of the amplitude suppression gains of the second noise suppression unit 5 in noise sections, the output of the first noise suppression unit 4 is selected in almost all of the noise sections. Only amplitude suppression that generates no musical noise is therefore applied in noise sections, which improves quality. Furthermore, when a plurality of noise suppression units are provided, the other units can use methods that tolerate musical noise in noise sections but give good quality in speech sections, realizing high-quality noise suppression.
- The amplitude suppression gain of the first noise suppression unit 4 is configured to be large when the estimated SN ratio is high and small when it is low.
- As a result, the gain is small in noise sections, while in speech sections the output of the first noise suppression unit is selected whenever another noise suppression unit over-suppresses, which improves quality.
- The second noise suppression unit 5 is configured to generate its noise suppression spectrum by combining spectral subtraction and spectral amplitude suppression.
- The attenuation of its internal spectrum amplitude suppression unit 5b can therefore be adaptively controlled so that the overall amplitude suppression gain of the second noise suppression unit 5 fluctuates little, which makes it easy to configure the device so that the output of the first noise suppression unit is selected in almost all of the noise sections. This further suppresses musical noise in the noise sections.
- FIG. 3 is a block diagram showing the configuration of the noise suppression device according to Embodiment 2 of the present invention.
- In Embodiment 2, the first noise suppression unit consists only of a spectrum amplitude suppression unit.
- Components identical to those of Embodiment 1 carry the same reference numerals as in FIG. 1, and their description is omitted or simplified.
- The spectrum amplitude suppression unit 4b′ multiplies the input spectrum 102 input from the time/frequency conversion unit 1 by a fixed amplitude suppression gain and outputs the result to the maximum amplitude selection unit 6 as the first noise suppression spectrum 105′.
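Embodiment 2 can be illustrated by combining a fixed-gain first spectrum with the per-bin maximum selection. This is a minimal sketch on magnitude spectra; the gain value of 0.25 in the example and the function name are hypothetical, and the second noise suppression spectrum is taken as an input.

```python
def embodiment2_output(input_mag, second_spectrum, fixed_gain=0.3):
    """First spectrum = fixed gain x input (unit 4b'); output = per-bin maximum with the second."""
    if len(input_mag) != len(second_spectrum):
        raise ValueError("spectra must have the same number of bins")
    first = [fixed_gain * s for s in input_mag]
    return [max(a, b) for a, b in zip(first, second_spectrum)]
```

With input `[1.0, 2.0, 4.0]`, second spectrum `[0.0, 1.5, 0.5]`, and `fixed_gain=0.25`, the result is `[0.25, 1.5, 1.0]`. Bins that the second unit over-suppresses fall back to the uniformly scaled input, which contains no musical noise.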
- FIG. 4 shows the time transition of a spectral component at a certain frequency.
- FIG. 4(a) shows the input spectrum.
- FIG. 4(b) shows the first noise suppression spectrum.
- FIG. 4(c) shows the second noise suppression spectrum.
- FIG. 4(d) shows the time transition of the output spectrum.
- the horizontal axis indicates time
- the vertical axis indicates amplitude.
- the white bar graph indicates the amplitude of the noise
- the shaded bar graph indicates the amplitude of the voice.
- Along the time axis, the first five sections are noise sections, and the last three are speech sections with noise superimposed.
- The input spectrum in FIG. 4(a) is the same as that in FIG. 2(a) of Embodiment 1.
- Since the noise suppression device of Embodiment 2 includes the same second noise suppression unit 5 as Embodiment 1, the second noise suppression spectrum in FIG. 4(c) is the same as that in FIG. 2(c), and its description is omitted.
- The spectrum amplitude suppression unit 4b′ of the first noise suppression unit 4 multiplies the input spectrum 102 shown in FIG. 4(a) by a fixed amplitude suppression gain to obtain the first noise suppression spectrum 105′ shown in FIG. 4(b). Because the gain is fixed, no annoying artificial noise (musical noise) is generated; only the amplitude is reduced.
- FIG. 4(d) shows the output spectrum 107. Since the amplitude suppression gain in the first noise suppression unit 4 is set larger than most of the amplitude suppression gains of the second noise suppression unit 5 in the noise sections, the first noise suppression spectrum 105′ has the larger amplitude in most of the noise sections and is selected as the output spectrum 107. This eliminates the island-like residual noise in the noise sections, and with it the musical noise. In the speech sections, the second noise suppression spectrum 106 mostly has the larger amplitude and is selected as the output spectrum 107. Although not shown, when the amplitude of the second noise suppression spectrum 106 becomes extremely small in a speech section, the first noise suppression spectrum 105′ is selected instead, so speech of at least a certain level is output and the sense of interruption is reduced.
- In Embodiment 2 as well, three or more noise suppression units may be provided.
- In that case, the maximum amplitude selection unit 6 may be configured to select, for each frequency, the maximum spectral component among the three or more noise suppression spectra.
- In the configuration above, the second noise suppression unit 5 comprises the spectrum subtraction unit 5a and the spectrum amplitude suppression unit 5b.
- However, the present invention is not limited to this.
- The second noise suppression unit 5 may comprise only the spectrum subtraction unit 5a.
- Likewise, the estimated noise spectrum 104 is estimated here by the speech likelihood analysis unit 2 and the noise spectrum estimation unit 3, but the means of obtaining it is not limited to this configuration. For example, the speech likelihood analysis unit 2 may be omitted by making the update speed of the noise spectrum estimation unit 3 very slow and updating it continuously. Alternatively, instead of estimating the estimated noise spectrum 104 from the input signal 101, a separate analysis/estimation method may be applied to a dedicated noise-estimation input signal into which only noise is input.
- Unlike the conventional technique, which selects one noise suppression unit's output for all frequency components at once based on a speech/noise determination, this device does not switch outputs wholesale. It therefore avoids large spectral fluctuations, prevents quality degradation due to speech/noise determination errors, and suppresses musical noise in bands where the noise component dominates, even within speech sections.
- Since the amplitude suppression gain of the first noise suppression unit 4 is set larger than most of the amplitude suppression gains of the second noise suppression unit 5 in noise sections, the output of the first noise suppression unit 4 is selected in almost all of the noise sections. Only amplitude suppression that generates no musical noise is therefore applied in noise sections, which improves quality. Furthermore, when a plurality of noise suppression units are provided, the other units can use methods that tolerate musical noise in noise sections but give good quality in speech sections, realizing high-quality noise suppression.
- Since the second noise suppression unit 5 generates its noise suppression spectrum by combining spectral subtraction and spectral amplitude suppression, the attenuation of its internal spectrum amplitude suppression unit 5b can be adaptively controlled so that the overall amplitude suppression gain of the second noise suppression unit 5 fluctuates little in noise sections. This makes it easy to configure the device so that the output of the first noise suppression unit is selected in almost all of the noise sections, which further suppresses musical noise there.
- Embodiment 3.
- In Embodiments 1 and 2, the values of the plurality of noise suppression spectra 105 (105′) and 106 output by the noise suppression units 4 and 5 are compared for each frequency component, and the largest is selected as the value of that frequency component of the output spectrum 107.
- Alternatively, the noise suppression spectra may each be returned to a time-domain signal, and the largest of the resulting time-domain signals may be selected.
- For this conversion, a unit identical to the frequency/time conversion unit 7 can be used. The largest signal may also be selected before the windowing process that smoothly joins each frame to the preceding and following frames.
- When the plurality of noise suppression spectra output from the plurality of noise suppression units are converted back to time-domain signals and the largest of the resulting time-domain signals is selected, the noise suppression units are not switched over all frequency components at once, as they are in the conventional technique that selects one noise suppression unit's output based on a speech/noise decision. This suppresses large signal fluctuations and prevents quality degradation caused by speech/noise decision errors.
- The present invention reduces the generation of annoying noise (musical noise), provides high-quality noise suppression, and can be widely applied to voice communication systems and speech recognition systems used in various noise environments.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Noise Elimination (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
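Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a noise suppression device according to Embodiment 1.
The noise suppression device comprises a time/frequency conversion unit 1, a speech-likeness analysis unit 2, a noise spectrum estimation unit 3, a first noise suppression unit 4, a second noise suppression unit 5, a maximum amplitude selection unit 6, and a frequency/time conversion unit 7.
The first noise suppression unit 4 comprises an SN estimation unit 4a and a spectrum amplitude suppression unit 4b, and the second noise suppression unit 5 comprises a spectral subtraction unit 5a and a spectrum amplitude suppression unit 5b.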
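First, the input signal 101 is sampled at a predetermined sampling frequency (for example, 8 kHz), divided into frames of a predetermined frame period (for example, 20 msec), and input to the time/frequency conversion unit 1 and the speech-likeness analysis unit 2.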
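With the example values given here, each frame holds 8000 × 0.020 = 160 samples, so one second of input yields 50 frames. The following is a minimal framing sketch that assumes non-overlapping frames and zero-pads the last partial frame; the function name is hypothetical, and the overlap and windowing a real implementation would use to join frames smoothly are left out.

```python
import math

def split_into_frames(signal, sample_rate_hz=8000, frame_ms=20):
    """Split a sampled signal into consecutive, non-overlapping frames.

    frame_len = sample_rate_hz * frame_ms / 1000 (160 samples for 8 kHz, 20 ms).
    Assumption: a trailing partial frame is zero-padded to full length.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = math.ceil(len(signal) / frame_len)
    frames = []
    for i in range(n_frames):
        frame = list(signal[i * frame_len:(i + 1) * frame_len])
        frame += [0.0] * (frame_len - len(frame))  # zero-pad the last frame
        frames.append(frame)
    return frames
```

For example, `split_into_frames([0.0] * 8000)` returns 50 frames of 160 samples each.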
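For example, the speech power of the frame, that is, the power with the noise removed, is estimated from the estimated SN ratio and the power of the input spectrum 102. The amplitude suppression gain is then chosen so that the power of the first noise suppression spectrum 105 matches this estimate, and if the gain falls to or below a predetermined lower limit, it is replaced by that lower limit.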
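The gain rule above can be sketched per frequency bin as follows. This sketch assumes the estimated SN ratio is a linear power ratio S/N per bin and that the input power is the sum of speech and noise power, so the noise-free power is P·SNR/(1+SNR) and the amplitude gain that reproduces it is sqrt(SNR/(1+SNR)). The function name and the floor value 0.3 are illustrative, not taken from the patent. The gain rises with the estimated SN ratio, which is the behavior claim 3 describes.

```python
import math

def first_unit_gains(input_power, est_snr, gain_floor=0.3):
    """Per-bin amplitude suppression gains for a hypothetical first unit.

    input_power: |X(k)|^2 of the input spectrum for each frequency bin.
    est_snr:     estimated SN ratio per bin, as a linear power ratio S/N.
    gain_floor:  lower limit on the gain (illustrative value).

    Assumes input power = speech power + noise power, so the estimated
    speech power is P * snr / (1 + snr). The gain g satisfies
    g^2 * P = estimated speech power, i.e. g = sqrt(snr / (1 + snr)).
    """
    gains = []
    for p, snr in zip(input_power, est_snr):
        speech_power = p * snr / (1.0 + snr)
        g = math.sqrt(speech_power / p) if p > 0 else 0.0
        gains.append(max(g, gain_floor))  # replace too-small gains with the floor
    return gains
```

Multiplying each input spectral component by its gain gives the first noise suppression spectrum.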
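Here, the attenuation of the spectrum amplitude suppression unit 5b is adaptively controlled so that, in noise signal sections, the overall amplitude suppression gain of the second noise suppression unit 5 (the amplitude ratio of the second noise suppression spectrum 106 to the input spectrum 102) fluctuates little.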
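The text does not specify how the attenuation of unit 5b is adapted. One hypothetical rule, sketched below for a single noise-only frame, picks each bin's attenuation so that the combined gain of unit 5 lands near a common target, limited to the range [min_atten, 1]. The subtraction step is ordinary magnitude spectral subtraction with a spectral floor; `floor`, `target_gain` and `min_atten` are illustrative values. Keeping `target_gain` below the first unit's gain floor is what would let per-bin maximum selection favor the first unit in noise sections.

```python
def spectral_subtraction(mag, noise_mag, floor=0.05):
    """Magnitude spectral subtraction with a spectral floor."""
    return [max(x - n, floor * x) for x, n in zip(mag, noise_mag)]

def second_unit_noise_section(mag, noise_mag, target_gain=0.1, min_atten=0.01):
    """Hypothetical adaptive attenuation for unit 5b in a noise section.

    For each bin, the attenuation a(k) is chosen so that the overall gain
    a(k) * |Y_ss(k)| / |X(k)| is close to target_gain. a(k) is clamped to
    [min_atten, 1], so bins whose subtraction gain is already below the
    target keep a(k) = 1. Returns (output magnitudes, overall gains).
    """
    ss = spectral_subtraction(mag, noise_mag)
    out, gains = [], []
    for x, y in zip(mag, ss):
        g_ss = y / x if x > 0 else 0.0
        a = min(1.0, max(min_atten, target_gain / g_ss)) if g_ss > 0 else 1.0
        out.append(a * y)
        gains.append(out[-1] / x if x > 0 else 0.0)
    return out, gains
```

With `mag = [1.0, 1.5, 2.0, 1.2]` and a flat noise estimate of 1.0, subtraction alone gives per-bin gains from 0.05 to 0.5, while the adapted unit gives 0.05 in the first bin and about 0.1 in the others.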
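Alternatively, the order of the spectral subtraction unit 5a and the spectrum amplitude suppression unit 5b may be reversed. In that case, the spectrum amplitude suppression unit 5b first performs spectrum amplitude suppression on the input spectrum 102 by applying an attenuation to the spectral component of each frequency; the spectral subtraction unit 5a then performs spectral subtraction based on the estimated noise spectrum 104 on the amplitude-suppressed spectrum; and the result is output to the maximum amplitude selection unit 6 as the second noise suppression spectrum 106.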
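The two processing orders generally give different outputs, because in the reversed order the same noise estimate is subtracted from an already attenuated spectrum. A minimal sketch, assuming magnitude spectral subtraction with an illustrative spectral floor and a per-bin multiplicative attenuation (all function names hypothetical):

```python
def subtract(mag, noise_mag, floor=0.05):
    # Magnitude spectral subtraction with a spectral floor (illustrative value).
    return [max(x - n, floor * x) for x, n in zip(mag, noise_mag)]

def attenuate(mag, atten):
    # Spectrum amplitude suppression: multiply each bin by its attenuation.
    return [a * x for a, x in zip(atten, mag)]

def second_unit(mag, noise_mag, atten, subtract_first=True):
    """Second noise suppression unit in either processing order."""
    if subtract_first:
        return attenuate(subtract(mag, noise_mag), atten)  # 5a, then 5b
    return subtract(attenuate(mag, atten), noise_mag)      # 5b, then 5a
```

For a bin with magnitude 2.0, noise estimate 0.5 and attenuation 0.5, subtracting first gives 0.75, while attenuating first gives 0.5.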
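Although the second noise suppression unit 5 has been described as comprising the spectral subtraction unit 5a and the spectrum amplitude suppression unit 5b, it is not limited to this; for example, it may comprise only the spectral subtraction unit 5a.
For example, the speech-likeness analysis unit 2 may be omitted by making the update rate of the noise spectrum estimation unit 3 very slow and updating continuously. Alternatively, instead of estimating the estimated noise spectrum 104 from the input signal 101, it may be analyzed and estimated separately from a dedicated noise-estimation input signal that contains only noise.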
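A very slow, always-on update can be sketched as first-order recursive averaging of the noise power per bin. The averaging form and the smoothing constant `alpha = 0.01` are assumptions for illustration; the text only says the update rate is made very slow and updating is continuous.

```python
def update_noise_estimate(noise_power, input_power, alpha=0.01):
    """One frame of a first-order recursive noise power estimate.

    noise_power[k] <- (1 - alpha) * noise_power[k] + alpha * input_power[k]
    A small alpha makes the estimate follow stationary noise while
    responding only gradually to short, loud speech segments.
    """
    return [(1.0 - alpha) * n + alpha * p for n, p in zip(noise_power, input_power)]
```

Starting from an estimate of 1.0, ten consecutive frames at power 100 raise the estimate only to about 10.5, whereas a constant input converges to its own level over a few hundred frames.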
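Moreover, since the spectrum is selected by a magnitude comparison for each frequency component, the noise suppression units are not switched over all frequency components at once, as they are in the conventional technique that selects one noise suppression unit's output based on a speech/noise decision. This suppresses large spectral fluctuations, prevents quality degradation caused by speech/noise decision errors, and further suppresses musical noise in bands where the noise component is dominant within speech signal sections.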
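The per-frequency selection described above can be sketched as follows. This minimal version compares real-valued magnitude spectra; the function name and the example values are illustrative.

```python
def select_max_per_bin(spectra):
    """Per-frequency maximum selection across noise suppression spectra.

    spectra: equal-length magnitude spectra, one per noise suppression unit.
    For each bin k, the largest value is kept and the index of the unit
    that supplied it is recorded (ties go to the lower index).
    """
    n_bins = len(spectra[0])
    output, winners = [], []
    for k in range(n_bins):
        values = [s[k] for s in spectra]
        best = max(range(len(values)), key=values.__getitem__)
        output.append(values[best])
        winners.append(best)
    return output, winners
```

If unit 1 applies a constant gain of 0.3 to an input spectrum `[1.0, 2.0, 4.0, 8.0]`, giving `[0.3, 0.6, 1.2, 2.4]`, and unit 2 outputs `[0.1, 0.9, 1.0, 6.0]`, the output is `[0.3, 0.9, 1.2, 6.0]` with winners `[0, 1, 0, 1]`. The result is mixed bin by bin instead of switching the whole spectrum from one unit to the other.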
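In addition, when a plurality of noise suppression units are provided, the other noise suppression units can tolerate musical noise in noise signal sections and use a method that gives good quality in speech signal sections, so high-quality noise suppression can be achieved in speech signal sections as well.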
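Embodiment 2.
FIG. 3 is a block diagram showing the configuration of a noise suppression device according to Embodiment 2 of the present invention. In the noise suppression device according to Embodiment 2, the first noise suppression unit consists only of a spectrum amplitude suppression unit. Hereinafter, components identical to those of Embodiment 1 are given the same reference numerals as in FIG. 1, and their description is omitted or simplified.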
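Embodiment 3.
Embodiments 1 and 2 above showed a configuration in which the values of the plurality of noise suppression spectra 105 (105′) and 106 output by the plurality of noise suppression units 4 and 5 are compared for each frequency component, and the largest is taken as the value of that frequency component to obtain the output spectrum 107. Alternatively, the plurality of noise suppression spectra may each be converted back to a time-domain signal, and the largest of the resulting time-domain signals may be selected.
Since the signal is then selected by a magnitude comparison of the time-domain signals, the noise suppression units are not switched over all frequency components at once, as they are in the conventional technique that selects one noise suppression unit's output based on a speech/noise decision. This suppresses large signal fluctuations and prevents quality degradation caused by speech/noise decision errors.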
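The text does not define "largest" for time-domain signals. One reading, sketched here as an assumption, compares the units' output frames sample by sample by absolute value and keeps the chosen sample with its sign. The inputs are assumed to be frames already converted back to the time domain (for example by a unit like the frequency/time conversion unit 7), and the function name is hypothetical.

```python
def select_max_per_sample(signals):
    """Hypothetical time-domain variant of maximum selection.

    signals: equal-length time-domain frames, one per noise suppression unit.
    Assumption: for each sample position, the sample with the largest
    absolute value is kept, sign included (ties go to the earlier unit).
    """
    return [max(samples, key=abs) for samples in zip(*signals)]
```

For instance, `select_max_per_sample([[0.1, -0.5, 0.2], [-0.3, 0.4, 0.1]])` returns `[-0.3, -0.5, 0.2]`.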
Claims (4)
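- A noise suppression device comprising: a plurality of noise suppression units, each of which performs noise suppression processing on an input spectrum and outputs the resulting noise suppression spectrum; and
a selection unit that, for each frequency component, compares the values of the plurality of noise suppression spectra, selects the noise suppression spectrum having the maximum value, and outputs it as the spectrum of that frequency component.
- The noise suppression device according to claim 1, wherein the noise suppression units include a first noise suppression unit;
the first noise suppression unit generates a noise suppression spectrum by multiplying the input spectrum by an amplitude suppression gain; and
the amplitude suppression gain of the first noise suppression unit is larger than the amplitude suppression gains of the other noise suppression units in noise signal sections.
- The noise suppression device according to claim 2, wherein the first noise suppression unit sets the amplitude suppression gain to a large value when an estimated SN ratio, calculated from the input spectrum and a noise spectrum estimated from past frames, is high, and to a small value when the estimated SN ratio is low.
- The noise suppression device according to claim 2, wherein the noise suppression units include a second noise suppression unit, and
the second noise suppression unit comprises a subtraction unit that performs spectral subtraction processing and an amplitude suppression unit that suppresses spectral amplitude.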
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200880130856.3A CN102132343B (zh) | 2008-11-04 | 2008-11-04 | 噪声抑制装置 |
US13/054,589 US8737641B2 (en) | 2008-11-04 | 2008-11-04 | Noise suppressor |
EP08877945.9A EP2362389B1 (en) | 2008-11-04 | 2008-11-04 | Noise suppressor |
JP2010536590A JP5300861B2 (ja) | 2008-11-04 | 2008-11-04 | 雑音抑圧装置 |
PCT/JP2008/003162 WO2010052749A1 (ja) | 2008-11-04 | 2008-11-04 | 雑音抑圧装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2008/003162 WO2010052749A1 (ja) | 2008-11-04 | 2008-11-04 | 雑音抑圧装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010052749A1 true WO2010052749A1 (ja) | 2010-05-14 |
Family
ID=42152566
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2008/003162 WO2010052749A1 (ja) | 2008-11-04 | 2008-11-04 | 雑音抑圧装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8737641B2 (ja) |
EP (1) | EP2362389B1 (ja) |
JP (1) | JP5300861B2 (ja) |
CN (1) | CN102132343B (ja) |
WO (1) | WO2010052749A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011257643A (ja) * | 2010-06-10 | 2011-12-22 | Nippon Hoso Kyokai <Nhk> | 雑音抑圧装置およびプログラム |
JP2012132950A (ja) * | 2010-12-17 | 2012-07-12 | Fujitsu Ltd | 音声認識装置、音声認識方法および音声認識プログラム |
JP2014021438A (ja) * | 2012-07-23 | 2014-02-03 | Nippon Hoso Kyokai <Nhk> | 雑音抑圧装置およびそのプログラム |
JP2016038551A (ja) * | 2014-08-11 | 2016-03-22 | 沖電気工業株式会社 | 雑音抑圧装置、方法及びプログラム |
CN107786709A (zh) * | 2017-11-09 | 2018-03-09 | 广东欧珀移动通信有限公司 | 通话降噪方法、装置、终端设备及计算机可读存储介质 |
JP2018518696A (ja) * | 2015-06-26 | 2018-07-12 | インテル アイピー コーポレーション | 電子デバイスのノイズリダクション |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2546831B1 (en) * | 2010-03-09 | 2020-01-15 | Mitsubishi Electric Corporation | Noise suppression device |
DE112011105791B4 (de) * | 2011-11-02 | 2019-12-12 | Mitsubishi Electric Corporation | Störungsunterdrückungsvorrichtung |
JPWO2013111360A1 (ja) * | 2012-01-27 | 2015-05-11 | 三菱電機株式会社 | 高周波電流低減装置 |
JP6182895B2 (ja) * | 2012-05-01 | 2017-08-23 | 株式会社リコー | 処理装置、処理方法、プログラム及び処理システム |
JP2014145838A (ja) * | 2013-01-28 | 2014-08-14 | Honda Motor Co Ltd | 音響処理装置及び音響処理方法 |
US9601130B2 (en) * | 2013-07-18 | 2017-03-21 | Mitsubishi Electric Research Laboratories, Inc. | Method for processing speech signals using an ensemble of speech enhancement procedures |
CN103824563A (zh) * | 2014-02-21 | 2014-05-28 | 深圳市微纳集成电路与系统应用研究院 | 一种基于模块复用的助听器去噪装置和方法 |
CN108292501A (zh) * | 2015-12-01 | 2018-07-17 | 三菱电机株式会社 | 声音识别装置、声音增强装置、声音识别方法、声音增强方法以及导航系统 |
JP6668995B2 (ja) * | 2016-07-27 | 2020-03-18 | 富士通株式会社 | 雑音抑圧装置、雑音抑圧方法及び雑音抑圧用コンピュータプログラム |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3454190B2 (ja) | 1999-06-09 | 2003-10-06 | 三菱電機株式会社 | 雑音抑圧装置および方法 |
JP2004341339A (ja) * | 2003-05-16 | 2004-12-02 | Mitsubishi Electric Corp | 雑音抑圧装置 |
JP2004347956A (ja) * | 2003-05-23 | 2004-12-09 | Toshiba Corp | 音声認識装置、音声認識方法及び音声認識プログラム |
JP2005195955A (ja) | 2004-01-08 | 2005-07-21 | Toshiba Corp | 雑音抑圧装置及び雑音抑圧方法 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3327058A (en) * | 1963-11-08 | 1967-06-20 | Bell Telephone Labor Inc | Speech wave analyzer |
US5706395A (en) * | 1995-04-19 | 1998-01-06 | Texas Instruments Incorporated | Adaptive weiner filtering using a dynamic suppression factor |
JP2950260B2 (ja) * | 1996-11-22 | 1999-09-20 | 日本電気株式会社 | 雑音抑圧送話装置 |
US6122384A (en) * | 1997-09-02 | 2000-09-19 | Qualcomm Inc. | Noise suppression system and method |
US6088668A (en) | 1998-06-22 | 2000-07-11 | D.S.P.C. Technologies Ltd. | Noise suppressor having weighted gain smoothing |
FR2797343B1 (fr) * | 1999-08-04 | 2001-10-05 | Matra Nortel Communications | Procede et dispositif de detection d'activite vocale |
US7133825B2 (en) * | 2003-11-28 | 2006-11-07 | Skyworks Solutions, Inc. | Computationally efficient background noise suppressor for speech coding and speech recognition |
JP5086442B2 (ja) * | 2007-12-20 | 2012-11-28 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | 雑音抑圧方法及び装置 |
2008
- 2008-11-04 EP EP08877945.9A patent/EP2362389B1/en not_active Not-in-force
- 2008-11-04 CN CN200880130856.3A patent/CN102132343B/zh not_active Expired - Fee Related
- 2008-11-04 US US13/054,589 patent/US8737641B2/en not_active Expired - Fee Related
- 2008-11-04 WO PCT/JP2008/003162 patent/WO2010052749A1/ja active Application Filing
- 2008-11-04 JP JP2010536590A patent/JP5300861B2/ja active Active
Non-Patent Citations (1)
Title |
---|
STEVEN F. BOLL: "Suppression of Acoustic noise in speech using spectral subtraction", IEEE TRANS. ASSP, vol. ASSP-27, no. 2, April 1979 (1979-04-01) |
Also Published As
Publication number | Publication date |
---|---|
US8737641B2 (en) | 2014-05-27 |
US20110123045A1 (en) | 2011-05-26 |
EP2362389B1 (en) | 2014-03-26 |
JPWO2010052749A1 (ja) | 2012-03-29 |
JP5300861B2 (ja) | 2013-09-25 |
EP2362389A1 (en) | 2011-08-31 |
CN102132343A (zh) | 2011-07-20 |
EP2362389A4 (en) | 2012-07-25 |
CN102132343B (zh) | 2014-01-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP5300861B2 (ja) | 雑音抑圧装置 | |
JP5153886B2 (ja) | 雑音抑圧装置および音声復号化装置 | |
CN111899752B (zh) | 快速计算语音存在概率的噪声抑制方法及装置、存储介质、终端 | |
EP1252796B1 (en) | System and method for dual microphone signal noise reduction using spectral subtraction | |
JP4423300B2 (ja) | 雑音抑圧装置 | |
JP3454206B2 (ja) | 雑音抑圧装置及び雑音抑圧方法 | |
JP4836720B2 (ja) | ノイズサプレス装置 | |
CN110739005B (zh) | 一种面向瞬态噪声抑制的实时语音增强方法 | |
JP5435204B2 (ja) | 雑音抑圧の方法、装置、及びプログラム | |
KR101088627B1 (ko) | 잡음 억압 장치 및 잡음 억압 방법 | |
US8804980B2 (en) | Signal processing method and apparatus, and recording medium in which a signal processing program is recorded | |
US9454956B2 (en) | Sound processing device | |
CN104050971A (zh) | 声学回声减轻装置和方法、音频处理装置和语音通信终端 | |
KR101791444B1 (ko) | 동적 마이크로폰 신호 믹서 | |
KR101088558B1 (ko) | 잡음 억압 장치 및 잡음 억압 방법 | |
JP2008216721A (ja) | 雑音抑圧の方法、装置、及びプログラム | |
CN112151060B (zh) | 单通道语音增强方法及装置、存储介质、终端 | |
US8406430B2 (en) | Simulated background noise enabled echo canceller | |
JP4413205B2 (ja) | エコー抑圧方法、装置、エコー抑圧プログラム、記録媒体 | |
JP5413575B2 (ja) | 雑音抑圧の方法、装置、及びプログラム | |
JP2006113515A (ja) | ノイズサプレス装置、ノイズサプレス方法及び移動通信端末装置 | |
JP2003131689A (ja) | ノイズ除去方法及び装置 | |
JP2005250266A (ja) | エコー抑圧方法、この方法を実施する装置、プログラムおよび記録媒体 |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 200880130856.3; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 08877945; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2010536590; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 13054589; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2008877945; Country of ref document: EP |