JP2859634B2

JP2859634B2 - Noise removal device

Info

Publication number: JP2859634B2
Application number: JP1101141A
Authority: JP
Inventors: 敬有吉
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-04-19
Filing date: 1989-04-19
Publication date: 1999-02-17
Anticipated expiration: 2014-02-17
Also published as: JPH02278298A

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a noise removal device, and more particularly to a technique for removing noise from speech input in noisy conditions. It is well suited to speech recognition in offices, factories, automobiles, and homes.

Prior Art

Speech uttered in an environment with heavy ambient noise has noise superimposed on it, which lowers the recognition rate of speech recognition performed in such an environment. The noise component must therefore be removed as far as possible from the noisy speech information.

A conventional noise removal method for speech recognition devices that use a bandpass filter bank is spectral subtraction. This method holds the spectral pattern of a predetermined period, or of a period in which no speech is detected, as the noise spectral pattern, and removes the noise component contained in the input signal by subtracting that noise spectral pattern from the spectral pattern of the period in which speech is detected. However, it assumes that the noise spectral pattern outside the speech section and the noise spectral pattern within the speech section are stationary in time, so it cannot cope with non-stationary noise, and in particular with changes in the noise during a speech section.
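The conventional scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `spectral_subtraction`, the choice of taking the first `noise_frames` frames as the noise-only period, and the clamping of negative results to zero are not taken from the patent text.

```python
def spectral_subtraction(frames, noise_frames):
    """Subtract a fixed noise pattern from every later frame.

    frames: list of per-frame channel outputs, e.g. [[ch1, ch2, ...], ...]
    noise_frames: number of leading frames assumed to contain noise only
    (hypothetical parameter; the text only says "a predetermined period").
    """
    channels = len(frames[0])
    # Noise spectral pattern: per-channel mean over the noise-only frames.
    noise = [
        sum(frame[j] for frame in frames[:noise_frames]) / noise_frames
        for j in range(channels)
    ]
    # The same pattern is subtracted from every remaining frame, so the
    # result is only valid if the noise does not change afterwards.
    # Clamping at zero is an assumption added for this sketch.
    return [
        [max(frame[j] - noise[j], 0.0) for j in range(channels)]
        for frame in frames[noise_frames:]
    ]


cleaned = spectral_subtraction([[2, 4], [4, 6], [10, 5], [3, 20]], noise_frames=2)
```

Because `noise` is computed once and never refreshed, any change in the noise after the initial period goes straight into the output, which is the drawback the text points out.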

Methods for dealing with non-stationary noise are disclosed in JP-A-58-93100 and JP-A-58-123599. These consider a unit mesh pattern that has a certain extent along both the time axis and the frequency axis of the time-spectrum pattern. The frequency of occurrence of each unit mesh pattern is obtained in advance from a long stretch of noise-free speech information, and when a unit mesh pattern with a low probability of occurrence appears, it is judged to be caused by noise and is corrected. Applying these methods to a relatively small-scale device such as a speaker-dependent speech recognizer is problematic, however, because they require a large amount of information in advance and the unit mesh pattern matching takes a long processing time.

Object

The present invention has been made in view of the above drawbacks of the prior art. Its object is to improve the performance of a noise removal device against non-stationary noise, and in particular to estimate the noise component in bands that contain no speech component even during a speech section, so that the device can follow fluctuations in the noise. A further object is to remove noise accurately regardless of the composition of the noise contained in the input signal.

Configuration

To achieve the above object, the present invention comprises: a speech preprocessing unit that preprocesses a speech signal input from a microphone; a bandpass filter bank, made up of a plurality of channels, that obtains the spectrum of the output signal of the speech preprocessing unit; a speech section detection unit that divides the output signals of the plurality of channels of the bandpass filter bank into a plurality of bands and detects a speech section for each band; a noise estimation unit that, when no speech is detected in a band by the speech section detection unit, estimates the noise component of each channel corresponding to that band from the output signal of that channel of the bandpass filter bank; and a noise removal unit that subtracts, from the output of each channel of the bandpass filter bank, the noise component of the corresponding channel estimated by the noise estimation unit. Furthermore, the speech section detection unit has a threshold for detecting a speech section for each band, and the threshold for each band is determined by the noise estimate, produced by the noise estimation unit, of the channel corresponding to that band. The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram illustrating one embodiment of the present invention. In the figure, 1 is a speech preprocessing unit, 2 is a bandpass filter bank, 3 is an A/D conversion unit, 4 is a speech section detection unit, 5 is a noise estimation unit, and 6 is a noise removal unit. The speech preprocessing unit 1 performs preprocessing such as amplification (Mic.Amp), pre-emphasis (Pre-emf), and automatic gain control (AGC) on the speech signal input from the microphone. The bandpass filter bank 2 has 15 channels, each made up of a bandpass filter (BPF), a detector (DET), and a low-pass filter (LPF), and obtains the spectrum of the output signal of the speech preprocessing unit. The center frequencies of the bandpass filters are equally spaced on a logarithmic axis, from 250 Hz for channel 1 (ch.1) to 6.35 kHz for channel 15 (ch.15).
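The text gives only the two end frequencies and states that the spacing is equal on a logarithmic axis. As a rough illustration, the intermediate center frequencies can be derived from that statement alone; the function name `center_frequencies` is hypothetical, and the computed values are not quoted from the patent.

```python
def center_frequencies(f_first=250.0, f_last=6350.0, channels=15):
    """Center frequencies equally spaced on a logarithmic axis.

    Equal log spacing means a constant ratio between adjacent channels.
    """
    ratio = (f_last / f_first) ** (1.0 / (channels - 1))
    return [f_first * ratio ** k for k in range(channels)]


freqs = center_frequencies()
step = freqs[1] / freqs[0]  # ratio between adjacent channels

for ch, f in enumerate(freqs, start=1):
    print(f"ch.{ch:2d}: {f:7.1f} Hz")
```

With these end points the ratio between adjacent channels comes out very close to the cube root of 2, which would correspond to roughly one-third-octave spacing.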

The A/D conversion unit 3 converts the outputs of the 15-channel bandpass filter bank into 8-bit digital values at a frame period of 10 ms, yielding a time-spectrum pattern.

The speech section detection unit 4 detects a speech section for each of the 15 channels from the time-spectrum pattern. Alternatively, the 15 channels may be divided into groups of adjacent channels and a speech section detected for each group.

Let i be the frame number, j the channel number, X(i,j) the time-spectrum pattern for i and j, N(i,j) the current noise estimate for each channel, estimated beforehand, and Th(i,j) the current threshold for each channel, likewise estimated beforehand. The condition for a speech section is that frames satisfying X(i,j) − N(i,j) > Th(i,j) continue for about three or more consecutive frames. The threshold Th(i,j) of each channel is defined in terms of the constants Thmin(j) and C(j), which are determined in advance by experiment (the defining expression is missing from the source text).
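The per-channel speech section condition can be sketched as follows. This is a minimal offline illustration: the function name `detect_speech` is hypothetical, the threshold values are passed in as given because their defining expression is not available in this text, and treating runs shorter than `min_run` as non-speech is an assumption about how short runs are handled.

```python
def detect_speech(x, n, th, min_run=3):
    """Flag the frames of one channel that belong to a speech section.

    x, n, th: per-frame lists of X(i,j), N(i,j) and Th(i,j) for a fixed j.
    A frame is flagged only if it lies in a run of at least min_run
    consecutive frames with X(i,j) - N(i,j) > Th(i,j).
    """
    above = [xi - ni > ti for xi, ni, ti in zip(x, n, th)]
    flags = [False] * len(above)
    start = None
    # The trailing False is a sentinel that closes a run ending on the last frame.
    for i, is_above in enumerate(above + [False]):
        if is_above and start is None:
            start = i
        elif not is_above and start is not None:
            if i - start >= min_run:
                for k in range(start, i):
                    flags[k] = True
            start = None
    return flags


flags = detect_speech(
    x=[10, 30, 30, 10, 30, 30, 30, 10],
    n=[10] * 8,
    th=[5] * 8,
)
```

In this example the two-frame burst at frames 1 and 2 is rejected, while the three-frame run at frames 4 to 6 is accepted as a speech section.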

When speech section detection is performed on groups of adjacent channels, let js(g), ..., je(g) be the channels contained in group g. The speech section condition and the threshold are then defined per group (both expressions are missing from the source text).

The noise estimation unit 5 obtains the noise estimate N(i+1,j) of each channel for use in the next frame, and updates it, as follows.

When frames satisfying X(i,j) − N(i,j) < Th(i,j) continue for about six or more consecutive frames, the mean of X(i,j) over those frames becomes the new noise estimate. Otherwise the noise estimate of the previous frame is kept, that is, N(i+1,j) = N(i,j).
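The update rule above can be sketched as a small per-channel state machine. The class name `NoiseEstimator` is hypothetical, and averaging over the most recent `min_run` quiet frames is an assumption of this sketch: the text says only that the mean is taken over the run of frames, without fixing the window.

```python
from collections import deque


class NoiseEstimator:
    """Tracks the noise estimate N(i,j) of a single channel j."""

    def __init__(self, initial_noise, min_run=6):
        self.noise = float(initial_noise)
        self.min_run = min_run
        # Most recent quiet frames; the window length is an assumption.
        self._quiet = deque(maxlen=min_run)

    def update(self, x, th):
        """Take X(i,j) and Th(i,j) for frame i and return N(i+1,j)."""
        if x - self.noise < th:
            self._quiet.append(x)
            if len(self._quiet) >= self.min_run:
                # Enough consecutive quiet frames: adopt their mean.
                self.noise = sum(self._quiet) / len(self._quiet)
        else:
            # The run is broken, so the previous estimate is kept.
            self._quiet.clear()
        return self.noise


est = NoiseEstimator(initial_noise=10.0)
history = [est.update(12, th=5) for _ in range(6)]
after_loud_frame = est.update(40, th=5)
```

Running one estimator per channel, or per group of channels, is what lets the estimate keep adapting in a band that is quiet while another band carries speech.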

When speech section detection is performed on groups of adjacent channels, the condition for updating the noise estimate is defined per group in the same way (the expression is missing from the source text).

The noise removal unit 6 obtains the noise-removed speech pattern S(i,j) from the time-spectrum pattern X(i,j) of a frame i and channel j judged to be in a speech section and the noise estimate N(i,j) at that time, that is, S(i,j) = X(i,j) − N(i,j). For a frame i and channel j that is not in a speech section, or whose speech section has been cancelled, S(i,j) = 0.
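The subtraction step itself is short. The sketch below handles one channel and assumes the speech flags and noise estimates have already been produced by the detection and estimation steps; the function name `remove_noise` is hypothetical.

```python
def remove_noise(x, n, speech):
    """Compute S(i,j) for one channel j.

    x, n: per-frame lists of X(i,j) and N(i,j).
    speech: per-frame booleans, True where the frame is in a speech section.
    """
    # S = X - N inside a speech section, S = 0 everywhere else.
    return [xi - ni if is_speech else 0 for xi, ni, is_speech in zip(x, n, speech)]


s = remove_noise(x=[30, 12, 40], n=[10, 10, 12], speech=[True, False, True])
```

Since `n` is supplied per frame, the subtracted value follows whatever the noise estimate was at that frame rather than a single fixed pattern.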

The speech section detection unit, the noise estimation unit, and the noise removal unit are implemented in software, but they can also be built in hardware using a DSP or the like.

Effect

As is clear from the above description, in the invention of claim 1 the output signals of the bandpass filter bank are divided into a plurality of bands, a speech section is detected for each band, and, when no speech is detected in a band, the noise component of each corresponding channel is estimated from that channel's output signal. As a result, even during a speech section, or more precisely even when one band has no speech component while another band does, the noise estimate can be updated in the band without a speech component. This improves noise removal performance against non-stationary noise, which was difficult to handle in the past.

In the invention of claim 2, a threshold for detecting a speech section is provided for each band, and each threshold is determined by the noise estimate of the channel corresponding to that band. Speech sections can therefore be detected accurately in each band regardless of the composition of the noise in the input signal, whether the noise is dominated by low-frequency or by high-frequency components.

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating one embodiment of the present invention.

1: speech preprocessing unit, 2: bandpass filter bank, 3: A/D conversion unit, 4: speech section detection unit, 5: noise estimation unit, 6: noise removal unit.

Continuation of front page

(58) Fields searched (Int. Cl.6, DB name): G10L 3/00 513; G10L 3/02 301; G10L 7/08; JICST file (JOIS)

Claims

(57) [Claims]

[Claim 1] A noise removal device comprising: a speech preprocessing unit that preprocesses a speech signal input from a microphone; a bandpass filter bank, made up of a plurality of channels, that obtains the spectrum of the output signal of the speech preprocessing unit; a speech section detection unit that divides the output signals of the plurality of channels of the bandpass filter bank into a plurality of bands and detects a speech section for each band; a noise estimation unit that, when no speech is detected in a band by the speech section detection unit, estimates the noise component of each channel corresponding to that band from the output signal of that channel of the bandpass filter bank; and a noise removal unit that subtracts, from the output of each channel of the bandpass filter bank, the noise component of the corresponding channel estimated by the noise estimation unit.

[Claim 2] The noise removal device according to claim 1, wherein the speech section detection unit has a threshold for detecting a speech section for each band, and the threshold for each band is determined by the noise estimate, produced by the noise estimation unit, of the channel corresponding to that band.