JP2008209768A

JP2008209768A - Noise eliminator

Info

Publication number: JP2008209768A
Application number: JP2007047580A
Authority: JP
Inventors: Akio Horii; 昭男堀井; Tomohiro Narita; 知宏成田; Jun Ishii; 純石井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-02-27
Filing date: 2007-02-27
Publication date: 2008-09-11
Anticipated expiration: 2027-02-27
Also published as: JP4818955B2

Abstract

PROBLEM TO BE SOLVED: To obtain a noise eliminator capable of reliably eliminating noise in various environments.

SOLUTION: A speech section determination means 3 receives noise-superimposed speech data and determines whether the current section is a speech section or a noise section. When the speech section determination means 3 determines that the section is a noise section immediately after a speech section, a correction filter coefficient updating means 6 uses that noise section to update the coefficient of a correction filter that corrects the difference in frequency characteristics between the noise-superimposed speech data and noise data, and stores the coefficient in a correction filter memory 7. A noise eliminating means 8 uses the correction filter coefficient stored in the correction filter memory 7 to eliminate the noise data from the noise-superimposed speech data in the speech section.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a noise removal device that removes noise, that is, any signal other than the target signal contained in an observed signal, in order to extract the target signal from the observed signal.

A noise removal device removes noise, that is, any signal other than the target signal contained in an observed signal, in order to extract the target signal from the observed signal. Such devices are used in the fields of speech recognition and voice communication, where they are effective in improving the speech recognition rate and the voice quality during communication.
The two-input spectral subtraction method (hereinafter referred to as the two-input SS method) is a simple and effective technique for removing the noise contained in the carrier signal (superimposed noise) from such an observed signal. One conventional noise removal device using the two-input SS method is disclosed in Patent Document 1.

For example, in a telephone, the transmitted signal may leak into the received signal and degrade its sound quality. The conventional technique described above removes the transmitted signal that has leaked into the received signal by two-input SS in order to improve the sound quality of the received signal. The procedure is as follows. First, the transfer characteristic of the leakage is obtained from the power spectrum of the transmitted signal and the power spectrum of the received signal. Next, based on this transfer characteristic, the power spectrum of the leaked transmitted signal that should be removed from the received signal power spectrum is obtained from the transmitted signal power spectrum. Finally, the leaked transmitted signal power spectrum is removed from the received signal power spectrum to obtain a received signal of good quality. The document indicates that the transfer characteristic is estimated in (1) a section where the transmitted signal is present, (2) a section where the other party's voice is absent from the received signal, or (3) a section where the transmitted signal is present and the other party's voice is absent from the received signal.

Japanese Patent Laid-Open No. 9-252268

In a conventional noise removal device using the two-input SS method, when the audio input begins with noise, the correction filter coefficient can be updated to a coefficient that accurately represents the acoustic transfer characteristic. The noise superimposed on the speech can then be estimated accurately, so it can be removed relatively accurately. However, when the audio input begins with speech, or when it begins with noise but the noise section is too short to update the correction filter coefficient, the coefficient cannot be updated accurately. In such cases the noise superimposed on the speech cannot be estimated accurately from the noise data using the correction filter coefficient, so accurate noise removal is not possible and recognition performance deteriorates.

The present invention has been made to solve the above problem. Its object is to obtain a noise removal device that can remove noise accurately even in an environment where a noise section long enough to update the correction filter coefficient cannot be secured immediately before the speech section, or in an environment where the frequency characteristics of the transfer characteristic of the speech-superimposed noise between the noise-superimposed speech data and the noise data change greatly from immediately before utterance to after the start of utterance.

The noise removal device according to the present invention comprises: speech section determination means that receives noise-superimposed speech data and determines whether the current section is a speech section or a noise section; correction filter coefficient updating means that, when the speech section determination means determines that the section is a noise section immediately after a speech section, uses that noise section to update the coefficient of a correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data; and noise removing means that uses the correction filter coefficient to remove the noise data from the noise-superimposed speech data in the speech section.

Because the noise removal device of the present invention updates the coefficient of the correction filter, which corrects the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately after the speech section, it can remove noise reliably in various environments.

Embodiment 1.
FIG. 1 is a block diagram showing a noise removal device according to Embodiment 1 of the present invention.
In the figure, the noise removal device comprises noise-superimposed speech input means 1, noise input means 2, speech section determination means 3, noise-superimposed speech spectrum calculation means 4, noise spectrum calculation means 5, correction filter coefficient updating means 6, a correction filter memory 7, noise removing means 8, initial state determination means 9, a speech section power spectrum memory 10, and a correction filter coefficient updating spectrum memory 11.

The noise-superimposed speech input means 1 receives noise-superimposed speech generated by a user and other acoustic signal output devices, performs A/D conversion on it, and outputs noise-superimposed speech data. The noise input means 2 receives noise, performs A/D conversion on it, and outputs noise data. The speech section determination means 3 receives the noise-superimposed speech data output from the noise-superimposed speech input means 1, determines whether the current section is a speech section or a noise section, and outputs section information and section change information. The noise-superimposed speech spectrum calculation means 4 frequency-transforms the noise-superimposed speech data and outputs the noise-superimposed speech power spectrum as a time series. The noise spectrum calculation means 5 frequency-transforms the noise data and outputs the noise power spectrum as a time series. When the speech section determination means 3 determines that the current section is a noise section, the correction filter coefficient updating means 6 uses the noise section immediately after the speech section to update the correction filter coefficient, which corrects the difference in frequency characteristics of the transfer characteristic of the speech-superimposed noise between the noise-superimposed speech and the noise, and stores it in the correction filter memory 7.

The correction filter memory 7 is a memory that stores one correction filter coefficient. The noise removing means 8 uses the correction filter coefficient stored in the correction filter memory 7 to remove the estimated noise power spectrum from the noise-superimposed speech power spectrum, and outputs the noise-removed speech power spectrum as a time series. The initial state determination means 9 receives the section information output by the speech section determination means 3 and, when the input is a speech section, determines whether to store the noise-superimposed speech power spectrum and the noise power spectrum, and outputs the resulting determination signal. The speech section power spectrum memory 10 stores the noise-superimposed speech power spectrum and the noise power spectrum based on the determination signal of the initial state determination means 9. The correction filter coefficient updating spectrum memory 11 stores the noise-superimposed speech power spectrum and the noise power spectrum of the noise section.

Next, the operation of the noise removal device of Embodiment 1 is described.
FIG. 2 is a flowchart showing the operation of the noise removal device of Embodiment 1.
The noise-superimposed speech input means 1 is generally installed near the speaker, or held by the speaker, and receives the speaker's voice. Because noise is received together with the voice, the input is noise-superimposed speech. The noise-superimposed speech input means 1 performs A/D conversion on this input, for example at 11 kHz sampling, and outputs noise-superimposed speech data. The noise input means 2 is generally installed at a position away from the speaker, or connected directly to the noise source, and receives noise. Like the noise-superimposed speech input means 1, it performs A/D conversion, for example at 16 bits, and outputs noise data (step ST101). The noise removal device of the present invention is assumed to be configured for the case where leakage of speech into the noise input means 2 is small enough to be ignored.

The noise-superimposed speech spectrum calculation means 4 receives the noise-superimposed speech data output by the noise-superimposed speech input means 1. Using frames of a fixed length, for example 256 samples, shifted by a fixed width, for example 110 samples, it cuts out each frame with a Hamming window, a Hanning window, or the like, frequency-transforms it by Fourier transform, and outputs the noise-superimposed speech power spectrum as a time series. The Fourier transform is described later. The noise spectrum calculation means 5 receives the noise data output by the noise input means 2, frequency-transforms it by the same processing as the noise-superimposed speech spectrum calculation means 4, and outputs the noise power spectrum as a time series (step ST102).
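
The framing and transform step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the spectrum calculation means: the function names are hypothetical, a Hamming window is chosen from the options the text lists, and a direct DFT is used instead of an FFT to keep the sketch dependency-free.

```python
import cmath
import math

FRAME_LEN = 256    # samples per frame (example value from the text)
FRAME_SHIFT = 110  # samples between frame starts (example value from the text)


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum(frame):
    """Power spectrum |X|^2 for bins 0..N/2 via a direct DFT (O(N^2), kept simple)."""
    n = len(frame)
    spec = []
    for b in range(n // 2 + 1):
        x = sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
        spec.append(abs(x) ** 2)
    return spec


def power_spectra(samples, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Split samples into overlapping windowed frames; return one power spectrum per frame."""
    win = hamming(frame_len)
    out = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], win)]
        out.append(power_spectrum(frame))
    return out
```

The same function would be applied to both the noise-superimposed speech data and the noise data, so that the two power spectrum time series share the same frame timing and frequency bins.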

The speech section determination means 3 receives the noise-superimposed speech data output by the noise-superimposed speech input means 1, determines the speech section, and outputs section information indicating whether the frame being processed is a speech section or a noise section (step ST103). The section determination method is described later. If the result of the determination in step ST103 is a noise section, the process proceeds to step ST105; if it is a speech section, the process proceeds to step ST106 (step ST104).

In step ST105, the correction filter coefficient updating means 6 determines whether the frame being processed in the noise section lies within a specified number of frames of the frame immediately following the speech section. If so, the process proceeds to step ST107; otherwise it proceeds to step ST108. The specified number of frames is, for example, 30. In step ST107, the correction filter coefficient updating means 6 determines whether the specified number of frames, for example 30 frames, of the noise-superimposed speech power spectrum and the noise power spectrum have been stored in the correction filter coefficient updating spectrum memory 11. In step ST108, it clears the correction filter coefficient updating spectrum memory 11.

If the specified number of frames has been stored in step ST107, the correction filter coefficient updating means 6 updates the correction filter coefficient, which corrects the difference in frequency characteristics of the noise transfer characteristic between the noise-superimposed speech data and the noise data, by dividing the average of the noise-superimposed speech power spectrum over the specified number of frames by the average of the noise power spectrum over the specified number of frames, and stores one correction filter coefficient in the correction filter memory 7 (step ST109). The correction filter coefficient stored in the correction filter memory 7 is described later. If the specified number of frames has not yet been stored in step ST107, the noise-superimposed speech power spectrum and the noise power spectrum are stored in the correction filter coefficient updating spectrum memory 11, and processing of the current frame ends (step ST110).
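
One possible reading of steps ST105 to ST110 is sketched below. The class and method names are hypothetical. The sketch assumes that each noise frame is stored first and the count is checked afterwards, that a new speech frame discards any partially collected frames, and that a bin with zero average noise power yields a coefficient of 0; none of these details is specified in the text.

```python
class CorrectionFilterUpdater:
    """Collects noise-section frames after a speech section and updates H (hypothetical helper)."""

    def __init__(self, required_frames=30):
        self.required = required_frames
        self.y_frames = []        # stands in for spectrum memory 11 (noise-superimposed speech)
        self.n_frames = []        # stands in for spectrum memory 11 (noise)
        self.since_speech = None  # noise frames seen since the last speech frame
        self.h = None             # stands in for correction filter memory 7

    def on_speech_frame(self):
        # Assumption: a speech frame restarts collection for the noise section that follows.
        self.since_speech = 0
        self.y_frames.clear()
        self.n_frames.clear()

    def on_noise_frame(self, y_power, n_power):
        """Process one noise-section frame; return True when H was updated."""
        if self.since_speech is None:
            return False  # no speech section seen yet, so nothing to collect
        self.since_speech += 1
        if self.since_speech > self.required:   # ST105: outside the specified window
            self.y_frames.clear()                # ST108: clear the update memory
            self.n_frames.clear()
            return False
        self.y_frames.append(y_power)            # ST110: store both spectra
        self.n_frames.append(n_power)
        if len(self.y_frames) < self.required:   # ST107: not enough frames yet
            return False
        bins = len(y_power)
        avg_y = [sum(f[i] for f in self.y_frames) / self.required for i in range(bins)]
        avg_n = [sum(f[i] for f in self.n_frames) / self.required for i in range(bins)]
        # ST109: ratio of the averaged power spectra, one value per frequency bin.
        self.h = [a / b if b > 0 else 0.0 for a, b in zip(avg_y, avg_n)]
        return True
```

A caller would invoke `on_speech_frame()` for every frame judged to be speech and `on_noise_frame(...)` for every frame judged to be noise, reading `h` whenever noise removal is performed.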

If the determination result of the speech section determination means 3 in step ST104 is a speech section, then in step ST106 the initial state determination means 9 determines that the frame being processed is a speech section and stores the noise-superimposed speech power spectrum and the noise power spectrum in the speech section power spectrum memory 10.

The noise removing means 8 uses the correction filter coefficient stored in the correction filter memory 7 to estimate, from the noise power spectrum, the noise power spectrum superimposed on the noise-superimposed speech power spectrum, removes it, and outputs the noise-removed speech power spectrum as a time series (step ST111). The processing of steps ST101 to ST111 is repeated for each frame, and when the specified number of frames of the noise-superimposed speech power spectrum and the noise power spectrum have been stored in the correction filter coefficient updating spectrum memory 11, the coefficient of the correction filter is updated.

The noise-removed speech power spectrum |S_t(ω)|² output by the noise removing means 8 is expressed by equation (1).

In equation (1), ω is the frequency, S_t(ω) is the noise-removed speech amplitude spectrum at frame time t, Y_t(ω) is the noise-superimposed speech amplitude spectrum at frame time t, H(ω) is the correction filter coefficient, N_t(ω) is the noise amplitude spectrum at frame time t, α is a parameter that adjusts the amount subtracted from the power spectrum (subtraction coefficient), and β is a parameter that sets the lower limit of each frequency component of the power spectrum (flooring value). max{} is a function that returns the largest of the elements in the braces.
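
The image of equation (1) is not reproduced in this text. A form consistent with the symbol definitions above, and with the correction filter coefficient being a ratio of power spectra, would be the following; it is a reconstruction under those assumptions rather than the equation as published.

```latex
\left| S_t(\omega) \right|^2
  = \max\left\{ \left| Y_t(\omega) \right|^2
      - \alpha \, H(\omega) \, \left| N_t(\omega) \right|^2 ,\; \beta \right\}
\tag{1}
```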

FIG. 3 is an explanatory diagram showing, in time series, the signals used for determination by the speech section determination means 3 of Embodiment 1.
The operation of the speech section determination means 3 is described in detail below with reference to FIG. 3.
In FIG. 3, (1) is the noise-superimposed speech data output by the noise-superimposed speech input means 1, and (2) is the noise-superimposed speech data output by the speech section determination means 3.
To determine the speech section, for example, the average power of the noise-superimposed speech data over a specified number of frames is calculated, and the value obtained by adding a constant power p, for example 5 dB, to this average is set as the threshold th. The power of the noise-superimposed speech data at an arbitrary frame time t is then calculated. If the power stays above the threshold th for a fixed time t1 or longer, for example 0.5 sec, the frame time t is taken as the speech section start ts; if the power stays below the threshold th for a fixed time t2 or longer, the frame time t is taken as the noise section end te. The period until the speech section start ts is detected is judged to be a noise section; once ts is detected, the section is judged to be a speech section; and once te is detected, the section is judged to be a noise section.
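
The threshold-based determination described above could be sketched as follows. The function names and default values are hypothetical. The sketch assumes that the threshold is built from the first `init_frames` frames, that powers are averaged in dB, that the durations t1 and t2 are expressed as frame counts, and that a section change takes effect at the frame where the duration condition is first met.

```python
import math


def frame_power_db(frame):
    """Mean power of one frame in dB (small offset avoids log of zero)."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def label_sections(powers_db, init_frames, p_db=5.0, t1_frames=50, t2_frames=50):
    """Label each frame 'noise' or 'speech' from per-frame power in dB."""
    # Threshold th: average power over the initial frames plus the constant p.
    th = sum(powers_db[:init_frames]) / init_frames + p_db
    labels = []
    state = "noise"
    run = 0  # consecutive frames contradicting the current state
    for p in powers_db:
        contradicts = (p > th) if state == "noise" else (p <= th)
        run = run + 1 if contradicts else 0
        if state == "noise" and run >= t1_frames:
            state, run = "speech", 0   # start ts detected
        elif state == "speech" and run >= t2_frames:
            state, run = "noise", 0    # end te detected
        labels.append(state)
    return labels
```

With a 110-sample frame shift at 11 kHz sampling, one frame step is 0.01 sec, so the 0.5 sec example for t1 corresponds to the default of 50 frames used here.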

FIG. 4 is an explanatory diagram showing, in time series, the input signals when noise removal is performed in Embodiment 1. The noise removing means 8 is described in detail below with reference to FIG. 4.
In FIG. 4, (1) is the noise-superimposed speech data output by the noise-superimposed speech input means 1, (2) is the speech section determination signal issued using the noise-superimposed speech data (1) as the criterion, and (3) is the noise data output by the noise input means 2.

When the speech section determination signal (2) is not asserted and the current noise section is one in which the specified number of frames has elapsed since the speech section, the correction filter coefficient updating means 6 updates the correction filter coefficient from the ratio of the average spectra of the noise-superimposed speech (1) and the noise (3) over the correction filter creation frame section fh, and stores the updated coefficient in the correction filter memory 7.

When the speech section determination signal (2) indicates that the frame being processed is a speech section, the noise removing means 8 uses the correction filter coefficient stored in the correction filter memory 7 to correct the transfer characteristic of the noise power spectrum, that is, the power spectrum of the noise data (3), and calculates the estimated noise power spectrum. It then subtracts the estimated noise power spectrum from the noise-superimposed speech power spectrum stored in the speech section power spectrum memory 10, that is, the power spectrum of the noise-superimposed speech data (1).
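
The subtraction performed on each speech-section frame can be illustrated per frequency bin as below. This is a sketch only: the function name and the default values of `alpha` and `beta` are hypothetical, and the form of the subtraction follows the symbol definitions given for equation (1), whose image is not reproduced in this text.

```python
def remove_noise(y_power, n_power, h, alpha=1.0, beta=0.0):
    """Subtract the estimated noise power alpha * H * |N|^2 per bin, floored at beta.

    y_power: noise-superimposed speech power spectrum of one frame
    n_power: noise power spectrum of the same frame
    h:       correction filter coefficients, one per frequency bin
    """
    return [max(y - alpha * hk * n, beta) for y, n, hk in zip(y_power, n_power, h)]
```

For example, with `y_power = [10.0, 1.0]`, `n_power = [2.0, 2.0]`, `h = [2.0, 2.0]`, `alpha = 1.0` and `beta = 0.5`, the first bin becomes 10 - 4 = 6.0 and the second bin, which would otherwise be negative, is held at the floor 0.5.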

FIG. 5 is an explanatory diagram showing an example of the correction filter coefficients stored in the correction filter memory of Embodiment 1.
Each coefficient is the ratio, at each frequency, of the power spectrum of the noise superimposed on the speech to the noise power spectrum. The correction filter coefficients stored in the correction filter memory are described in detail below with reference to this figure.

In equation (2), k is time, ω is frequency, j is the imaginary unit, N is the number of samples, and x(k) is the time-series data.
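
The image of equation (2) is not reproduced in this text. Given the symbols defined above and the statement that the frequency transform is a Fourier transform, a standard discrete Fourier transform of the following form would be consistent; treating ω as a discrete frequency index is an assumption of this reconstruction.

```latex
X(\omega) = \sum_{k=0}^{N-1} x(k) \, e^{-j \, 2\pi \omega k / N}
\tag{2}
```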

For example, when the specified number of frames is 30, the correction filter coefficient can be expressed as in equation (3).

In equation (3), ω is frequency, H(ω) is the correction filter coefficient, |Y_n(ω)|² is the noise-superimposed speech power spectrum, and |N_n(ω)|² is the noise power spectrum. The correction filter coefficient H(ω) is in practice a coefficient that has a value for each correction filter point i corresponding to a frequency ω. For example, when the sampling frequency is 11 kHz and a 256-point Fourier transform is used, correction filter point 129 corresponds to a frequency of 5500 Hz.
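
The image of equation (3) is not reproduced in this text. Based on the description of step ST109, in which the average noise-superimposed speech power spectrum is divided by the average noise power spectrum, a form consistent with the definitions above for 30 frames would be the following; the frame index n running from 1 to 30 is an assumption of this reconstruction.

```latex
H(\omega) =
  \frac{\dfrac{1}{30} \displaystyle\sum_{n=1}^{30} \left| Y_n(\omega) \right|^2}
       {\dfrac{1}{30} \displaystyle\sum_{n=1}^{30} \left| N_n(\omega) \right|^2}
\tag{3}
```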

The figure shows one example of the result obtained by dividing the average, over the 30 frames immediately before the speech section, of the noise-superimposed speech power spectrum by the average over 30 frames of the noise spectrum obtained by processing the noise data in the same way. Here the noise-superimposed speech power spectrum is calculated by Fourier transform of noise-superimposed speech data analyzed with a frame length of 256 samples, a frame shift of 110 samples, and 11 kHz sampling. The power spectrum is obtained by calculating the spectrum X(ω) from equation (2) and then taking |X(ω)|².
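
With these analysis settings, the frequency of each correction filter point follows directly from the sampling frequency and the transform size. The small sketch below checks the stated correspondence between point 129 and 5500 Hz; the function name is hypothetical, and counting points from 1 is an assumption chosen because it makes that correspondence come out exactly.

```python
def point_frequency_hz(point, sample_rate_hz=11000, fft_size=256):
    """Frequency of a 1-based correction filter point for an fft_size-point transform."""
    return (point - 1) * sample_rate_hz / fft_size
```

Under this convention point 1 is 0 Hz and point 129 is 128 × 11000 / 256 = 5500 Hz, which is half the sampling frequency, so 129 points cover the whole usable band.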

The noise estimation and noise removal method has been described above for two-input SS, but it can also be applied to one-input SS. With one-input SS, the noise-removed speech power spectrum can be expressed as in equation (4).

In equation (4), N_A(ω) is the average noise power spectrum in the noise section. The present invention is characterized in that the noise section immediately after the speech section is used as that noise section. The other variables are the same as in equation (1), so their description is omitted.
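
The image of equation (4) is not reproduced in this text. By analogy with the two-input case, and with N_A(ω) defined as an average noise power spectrum, a consistent form would be the following; it is a reconstruction from the definitions, not the equation as published.

```latex
\left| S_t(\omega) \right|^2
  = \max\left\{ \left| Y_t(\omega) \right|^2 - \alpha \, N_A(\omega) ,\; \beta \right\}
\tag{4}
```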

As described above, the noise removal device of Embodiment 1 comprises: noise-superimposed speech input means that receives noise-superimposed speech and outputs noise-superimposed speech data; noise input means that receives noise and outputs noise data; speech section determination means that receives the noise-superimposed speech data and determines whether the current section is a speech section or a noise section; correction filter coefficient updating means that, when the speech section determination means determines that the section is a noise section immediately after a speech section, uses that noise section to update the coefficient of a correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data; and noise removing means that uses the correction filter coefficient to remove the noise data from the noise-superimposed speech data in the speech section. Accurate noise removal is therefore possible in various environments, for example when a noise section long enough to update the correction filter coefficient cannot be secured immediately before the speech section, or when the frequency characteristics of the transfer characteristic of the speech-superimposed noise between the noise-superimposed speech data and the noise data change greatly from immediately before utterance to after the start of utterance.

Embodiment 2.
In Embodiment 2, the correction filter coefficients obtained immediately before and immediately after the speech section are averaged.
FIG. 6 is a block diagram of the noise removal device of Embodiment 2.
In the figure, the noise removal device comprises noise-superimposed speech input means 1, noise input means 2, speech section determination means 3a, noise-superimposed speech spectrum calculation means 4, noise spectrum calculation means 5, correction filter coefficient updating means 6a, a correction filter memory 7a, noise removing means 8, initial state determination means 9, a speech section power spectrum memory 10, a correction filter coefficient updating spectrum memory 11, a pre-correction filter creation data memory 12, and correction filter coefficient synthesizing means 13.

The speech section determination means 3a determines the speech section, and also determines the noise section immediately before or immediately after the speech section, based on the noise-superimposed speech data output by the noise-superimposed speech input means 1 together with the noise-superimposed speech data stored in the pre-correction filter creation data memory 12. When the speech section determination means 3a determines that the section is a noise section immediately before a speech section, the correction filter coefficient updating means 6a uses that noise section to update the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data. When the section is determined to be a noise section immediately after a speech section, it uses that noise section to update the coefficient of a post-correction filter for correcting the same difference. The correction filter memory 7a is configured to store the pre-correction filter coefficient and the post-correction filter coefficient updated by the correction filter coefficient updating means 6a, and the average correction filter coefficient updated by the correction filter coefficient synthesizing means 13.

The pre-correction filter creation data memory 12 is a memory that buffers the input noise superimposed voice data for a fixed period of time. The correction filter coefficient synthesis means 13 calculates the average of the pre-correction filter coefficient and the post-correction filter coefficient stored in the correction filter memory 7a, and stores this average in the correction filter memory 7a as the average correction filter coefficient. The rest of the configuration is the same as in Embodiment 1, so its description is omitted here.

Next, the operation of the noise removal apparatus of Embodiment 2 will be described.
FIGS. 7 and 8 are flowcharts showing the operation of Embodiment 2, and the operation is described below with reference to them. Steps ST101 to ST111 in the figures are the same as steps ST101 to ST111 in Embodiment 1.

The noise superimposed voice input means 1 and the noise input means 2 receive the noise superimposed voice and the noise, respectively, and output noise superimposed voice data and noise data (step ST101). Next, the noise superimposed voice spectrum calculation means 4 and the noise spectrum calculation means 5 receive the noise superimposed voice data and the noise data, respectively, convert them to the frequency domain, and output the noise superimposed voice power spectrum and the noise power spectrum as time series (step ST102).

The noise superimposed voice data output by the noise superimposed voice input means 1 is also stored in the pre-correction filter creation data memory 12 (step ST201). The voice section determination means 3a takes the noise superimposed voice data stored in the pre-correction filter creation data memory 12 as input, determines the speech section, and outputs section information indicating whether the frame being processed belongs to a speech section or a noise section (step ST103). If the result of step ST103 is a noise section, processing moves to step ST105; if it is a speech section, processing moves to step ST106 (step ST104).

In step ST106, the noise superimposed voice power spectrum and the noise power spectrum are stored in the speech section power spectrum memory 10 according to the determination result of the initial state determination means 9, and the voice section determination means 3a determines whether the noise section held in the pre-correction filter creation data memory 12 extends for at least a prescribed number of frames immediately before the speech section (step ST202). If it does, the correction filter coefficient update means 6a updates the correction filter coefficient by dividing the average noise superimposed voice power spectrum by the average noise power spectrum, and stores one such coefficient in the correction filter memory 7a (step ST109). Next, the correction filter coefficient synthesis means 13 determines whether both the pre-correction filter coefficient and the post-correction filter coefficient are stored in the correction filter memory 7a (step ST203); if not, processing ends. For example, if only the pre-correction filter coefficient is stored, processing ends. If the noise section examined in step ST202 is shorter than the prescribed number of frames, processing likewise ends.

If the frame being processed is found to be a noise section in step ST104, the correction filter coefficient update means 6a determines whether that frame lies within a prescribed number of frames of the frame immediately following the speech section (step ST105). If so, processing moves to step ST107; otherwise it moves to step ST108. In step ST107, the correction filter coefficient update means 6a determines whether the prescribed number of frames of the noise superimposed voice power spectrum and the noise power spectrum have been stored in the correction filter coefficient update spectrum memory 11. In step ST108, it clears the correction filter coefficient update spectrum memory 11.

If the prescribed number of frames has been stored in step ST107, the correction filter coefficient update means 6a updates the correction filter coefficient by dividing the average of the noise superimposed voice power spectrum over the prescribed number of frames by the average of the noise power spectrum over the same number of frames, and stores one such coefficient in the correction filter memory 7a as the post-correction filter coefficient (step ST109). If the prescribed number of frames has not yet been stored in step ST107, the correction filter coefficient update means 6a stores the noise superimposed voice power spectrum and the noise power spectrum in the correction filter coefficient update spectrum memory 11 (step ST110).
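The coefficient update of step ST109 can be sketched as follows. This is only an illustration: the function names, the list-of-lists representation of the buffered spectra, and the `eps` guard against division by zero are assumptions, not details given in the description.

```python
from typing import List, Sequence


def mean_spectrum(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a set of power spectra (one per frame) bin by bin."""
    n_frames = len(frames)
    n_bins = len(frames[0])
    return [sum(frame[b] for frame in frames) / n_frames for b in range(n_bins)]


def update_correction_filter(
    noisy_frames: Sequence[Sequence[float]],
    noise_frames: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> List[float]:
    """Return mean(noisy power) / mean(noise power) for each frequency bin.

    `eps` avoids division by zero; it is an assumption of this sketch and
    does not appear in the source description.
    """
    if not noisy_frames or len(noisy_frames) != len(noise_frames):
        raise ValueError("both inputs need the same non-zero number of frames")
    mean_noisy = mean_spectrum(noisy_frames)
    mean_noise = mean_spectrum(noise_frames)
    return [y / max(n, eps) for y, n in zip(mean_noisy, mean_noise)]
```

Each element of the returned list is the correction filter coefficient for one frequency bin, computed only from frames that were classified as noise.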

Next, the correction filter coefficient synthesis means 13 determines whether both the pre-correction filter coefficient and the post-correction filter coefficient are stored in the correction filter memory 7a (step ST203). If they are, it calculates their average and stores it in the correction filter memory 7a as a single average correction filter coefficient (step ST204).
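Steps ST203 and ST204 amount to a presence check followed by a bin-by-bin mean. A minimal sketch, assuming each filter is held as a list of per-bin coefficients and that a missing filter is represented by `None` (both representation choices are hypothetical):

```python
from typing import List, Optional, Sequence


def synthesize_average_filter(
    pre: Optional[Sequence[float]],
    post: Optional[Sequence[float]],
) -> Optional[List[float]]:
    """Return the bin-by-bin mean of the two filters, or None if either is missing."""
    if pre is None or post is None:
        # Corresponds to the "not both stored" branch of step ST203.
        return None
    if len(pre) != len(post):
        raise ValueError("filters must have the same number of bins")
    return [(a + b) / 2.0 for a, b in zip(pre, post)]
```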

When the average correction filter coefficient has been updated in step ST204, the noise removal means 8 uses the average correction filter coefficient stored in the correction filter memory 7a to remove the estimated noise power spectrum from the noise superimposed voice power spectrum, and outputs the noise-removed voice power spectrum as a time series (step ST111). A predetermined initial value is stored in the correction filter memory 7a as the average correction filter coefficient, and this initial value is overwritten by the calculated average correction filter coefficient. If no average correction filter coefficient is obtained in the first speech section, the noise-removed voice is output using the initial value.
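The subtraction in step ST111 might look like the sketch below. It assumes the usual two-input spectral subtraction form, in which the estimated noise power is the correction filter coefficient multiplied by the reference noise power; the exact expression and the flooring of negative results at `floor` are assumptions of this sketch rather than statements from this section.

```python
from typing import List, Sequence


def remove_noise(
    noisy_power: Sequence[float],
    noise_power: Sequence[float],
    correction_filter: Sequence[float],
    floor: float = 0.0,
) -> List[float]:
    """Subtract filter * reference-noise power from the noisy power, per bin.

    Results below `floor` are clipped; the clipping rule is an assumption.
    """
    if not (len(noisy_power) == len(noise_power) == len(correction_filter)):
        raise ValueError("all inputs must have the same number of bins")
    return [
        max(y - h * n, floor)
        for y, n, h in zip(noisy_power, noise_power, correction_filter)
    ]
```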

In the embodiment above, both the pre-correction and post-correction filter coefficients are stored in the correction filter memory 7a, and the correction filter coefficient synthesis means 13 reads both from the correction filter memory 7a and averages them. Alternatively, when the correction filter coefficient update means 6a updates the post-correction filter coefficient, the pre-correction filter coefficient stored in the correction filter memory 7a may be read out and averaged with it directly. This configuration makes an area for storing the post-correction filter coefficient in the correction filter memory 7a unnecessary.

As described above, the noise removal apparatus of Embodiment 2 comprises: noise superimposed voice input means that receives noise superimposed voice and outputs noise superimposed voice data; noise input means that receives noise and outputs noise data; voice section determination means that receives the noise superimposed voice data and determines whether a section is a speech section or a noise section; correction filter coefficient update means that, when the voice section determination means determines the noise section immediately before a speech section, updates from that noise section the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise superimposed voice data and the noise data, and, when it determines the noise section immediately after a speech section, updates from that noise section the coefficient of a post-correction filter for correcting the same difference; correction filter coefficient synthesis means that calculates the average of the pre-correction and post-correction filter coefficients and outputs it as an average correction filter coefficient; and noise removal means that uses the average correction filter coefficient to remove the noise data from the noise superimposed voice data in the speech section. The correction filter coefficient can therefore be updated so that the noise superimposed on the speech is estimated more accurately, and noise can be removed with still higher accuracy.

Embodiment 3.
Embodiment 3 provides means by which the user designates the speech section through a button operation, and re-determines the speech section so that the correction filter coefficient is updated again.

FIG. 9 is a configuration diagram showing the noise removal apparatus of Embodiment 3.
The illustrated noise removal apparatus comprises noise superimposed voice input means 1, noise input means 2, voice section determination means 3b, noise superimposed voice spectrum calculation means 4, noise spectrum calculation means 5, correction filter coefficient update means 6, correction filter memory 7, noise removal means 8, initial state determination means 9, speech section power spectrum memory 10, correction filter coefficient update spectrum memory 11, button press timing input means 14, and noise-removed voice data calculation means 15.

The button press timing input means 14 receives the control signal issued when the user presses a button to begin voice input, and outputs a button press timing signal to the voice section determination means 3b. The noise-removed voice data calculation means 15 outputs noise-removed voice data obtained by converting the noise-removed voice power spectrum output by the noise removal means 8 to the time domain, for example by an inverse Fourier transform. The voice section determination means 3b has a timer (not shown). Based on the press timing supplied by the button press timing input means 14, it determines whether the noise superimposed voice data belongs to a speech section or a noise section, and when noise-removed voice data is input from the noise-removed voice data calculation means 15, it re-determines the speech section based on that data. The rest of the configuration is the same as in Embodiment 1, so its description is omitted here.

Next, the operation of the noise removal apparatus of Embodiment 3 will be described.
FIGS. 10 and 11 are flowcharts showing the operation of Embodiment 3, and the operation is described below with reference to them. Steps ST101 to ST111 in the figures are the same as steps ST101 to ST111 in Embodiment 1.

Next, it is determined whether the output of the button press timing input means 14 indicates a button press; if so, processing moves to step ST103, and if not, processing ends (step ST301). In step ST103, the voice section determination means 3b takes the noise superimposed voice data output by the noise superimposed voice input means 1 as input, determines the speech section, and outputs section information indicating whether the frame being processed belongs to a speech section or a noise section. When a button press is detected in step ST301, the timer in the voice section determination means 3b is set at that moment. If the frame being processed is found to be a speech section in step ST104, the initial state determination means 9 stores the noise superimposed voice power spectrum and the noise power spectrum in the speech section power spectrum memory 10, as in Embodiment 1 (step ST106), and processing ends.

If, on the other hand, the frame being processed is not a speech section in step ST104, the voice section determination means 3b determines the end of the speech section from the timer timeout (step ST302). The timeout period is preset to a value appropriate for the utterance duration expected in the equipment to which the noise removal apparatus is applied. Once the end of the speech section has been determined in step ST302, the correction filter coefficient update means 6 determines in step ST107 whether the prescribed number of frames of the noise superimposed voice power spectrum and the noise power spectrum have been stored in the correction filter coefficient update spectrum memory 11. If they have, it updates the correction filter coefficient by dividing the average of the noise superimposed voice power spectrum over the prescribed number of frames by the average of the noise power spectrum over the same number of frames, and stores one such coefficient in the correction filter memory 7 as the correction filter coefficient (step ST109). If the prescribed number of frames has not yet been stored in step ST107, the noise superimposed voice power spectrum and the noise power spectrum are stored in the correction filter coefficient update spectrum memory 11 (step ST110).
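The timer-based sectioning can be illustrated with a small sketch. The function name, the frame-index representation, and the choice to label every frame before the button press as noise are assumptions made for illustration; the description only states that the timer is set at the press and that its timeout fixes the end of the speech section.

```python
from typing import List, Optional


def label_frames_by_button(
    num_frames: int,
    press_frame: Optional[int],
    timeout_frames: int,
) -> List[str]:
    """Label frames as 'speech' from the button press until the timer expires."""
    labels = []
    for i in range(num_frames):
        in_speech = (
            press_frame is not None
            and press_frame <= i < press_frame + timeout_frames
        )
        labels.append("speech" if in_speech else "noise")
    return labels
```

With this labelling, the frames that follow the timeout are the ones that feed the coefficient update of steps ST107 to ST110.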

Next, when the correction filter coefficient has been stored in step ST109, the noise removal means 8 uses the correction filter coefficient stored in the correction filter memory 7 to remove the estimated noise power spectrum from the noise superimposed voice power spectrum, and outputs the noise-removed voice power spectrum as a time series (step ST111). The noise-removed voice data calculation means 15 then generates noise-removed voice data from the noise-removed voice power spectrum output by the noise removal means 8 and outputs it to the voice section determination means 3b (step ST303). Specifically, the noise-removed voice data calculation means 15 applies the phase information of the noise superimposed voice spectrum Y(ω) to the noise-removed voice power spectrum |S(ω)|² output by the noise removal means 8 to obtain the noise-removed voice spectrum S(ω), converts it to the time domain with an inverse Fourier transform, and outputs the noise-removed voice data. The noise-removed voice data s(k) is calculated by Equation (5).

In Equation (5), k is time, ω is frequency, N is the number of samples, and S(ω) is the noise-removed voice spectrum.
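Equation (5) itself is not reproduced in this text. Assuming it is the standard N-point inverse discrete Fourier transform implied by the definitions above, with the phase borrowed from Y(ω), it would read roughly as follows. The 1/N normalization, the summation range, and the sign convention of the exponent are assumptions.

```latex
S(\omega) = \sqrt{|S(\omega)|^{2}}\; e^{\,j \arg Y(\omega)},
\qquad
s(k) = \frac{1}{N} \sum_{\omega=0}^{N-1} S(\omega)\, e^{\,j 2\pi \omega k / N}
```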

When the noise-removed voice data has been output from the noise-removed voice data calculation means 15 to the voice section determination means 3b in step ST303, the voice section determination means 3b determines whether a speech section has already been detected from the noise-removed voice data (step ST304). If it has, processing ends. If not, it takes the noise-removed voice data as input, calculates its power, and uses a threshold to distinguish speech sections from noise sections (step ST305). Correction filter coefficient update processing is then performed based on the speech and noise sections determined from the noise-removed voice data (step ST306). Step ST306 is the same processing as steps ST104 to ST111 in Embodiment 1.
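The power-and-threshold decision of step ST305 can be sketched as below. The non-overlapping framing, the mean-square definition of power, and the parameter names are assumptions; the description says only that power is computed and compared with a threshold.

```python
from typing import List, Sequence


def frame_power(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square power of consecutive non-overlapping frames."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def classify_frames(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[bool]:
    """Return True for frames judged to be speech (power above threshold)."""
    return [p > threshold for p in frame_power(samples, frame_len)]
```

Because the input here is the noise-removed data rather than the raw observation, a fixed threshold is more likely to separate speech from residual noise than it would be on the noisy signal.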

Thus, in Embodiment 3, even when the ambient noise power is high, the speech section is determined from the timeout measured from the button press, regardless of the relative power of the noise and the utterance. Noise is removed with the correction filter based on this determination, and the speech section is then detected again. The correction filter coefficient can therefore be calculated accurately even when the noise power is high or the utterance power is low.

The noise estimation and noise removal method has been described above for two-input SS, but it can also be applied to one-input SS. In that case, application to one-input SS is achieved by calculating the average noise power spectrum after the timeout. Application to one-input SS has already been described, so its explanation is omitted here.

As described above, the noise removal apparatus of Embodiment 3 comprises: noise superimposed voice input means that receives noise superimposed voice and outputs noise superimposed voice data; noise input means that receives noise and outputs noise data; button press timing input means that acquires the timing at which a button is pressed for voice input; voice section determination means that determines, based on the press timing from the button press timing input means, whether the noise superimposed voice data belongs to a speech section or a noise section; correction filter coefficient update means that, when the voice section determination means determines the noise section immediately after a speech section, updates from that noise section the coefficient of a correction filter for correcting the difference in frequency characteristics between the noise superimposed voice data and the noise data; and noise removal means that uses the correction filter coefficient to remove the noise data from the noise superimposed voice data in the speech section. The voice section determination means re-determines the speech and noise sections using the noise-removed voice data obtained on the basis of the button press timing, and the correction filter coefficient update means updates the correction filter coefficient again based on the re-determined speech and noise sections. As a result, speech sections that could not be identified when section determination was performed on the noise superimposed voice data can now be identified, and speech sections that could already be identified can be identified more accurately.

Embodiment 4.
In Embodiment 4, the correlation between the pre-correction filter coefficient and the post-correction filter coefficient is calculated.
FIG. 12 is a configuration diagram of the noise removal apparatus of Embodiment 4.
In the figure, the noise removal apparatus comprises noise superimposed voice input means 1, noise input means 2, voice section determination means 3a, noise superimposed voice spectrum calculation means 4, noise spectrum calculation means 5, correction filter coefficient update means 6a, correction filter memory 7a, noise removal means 8, initial state determination means 9, speech section power spectrum memory 10, correction filter coefficient update spectrum memory 11, pre-correction filter creation data memory 12, and correction filter correlation calculation means 16. The correction filter correlation calculation means 16 calculates the correlation between the pre-correction filter coefficient and the post-correction filter coefficient stored in the correction filter memory 7a. The rest of the configuration is the same as in Embodiment 2, so corresponding parts are given the same reference numerals and their description is omitted.

Next, the operation of the noise removal apparatus of Embodiment 4 will be described.
FIGS. 13 and 14 are flowcharts showing the operation of Embodiment 4, and the operation is described below with reference to them. Steps ST101 to ST111 in the figures are the same as steps ST101 to ST111 in Embodiment 1, and steps ST201 to ST203 are the same as steps ST201 to ST203 in Embodiment 2.

The noise superimposed voice data output by the noise superimposed voice input means 1 is stored in the pre-correction filter creation data memory 12 (step ST201). The voice section determination means 3a takes the noise superimposed voice data stored in the pre-correction filter creation data memory 12 as input, determines the speech section, and outputs section information indicating whether the frame being processed belongs to a speech section or a noise section (step ST103). If the result of step ST103 is a noise section, processing moves to step ST105; if it is a speech section, processing moves to step ST106 (step ST104).

In step ST106, the initial state determination means 9 stores the noise superimposed voice power spectrum and the noise power spectrum in the speech section power spectrum memory 10, and the voice section determination means 3a determines whether the noise section held in the pre-correction filter creation data memory 12 extends for at least a prescribed number of frames immediately before the speech section (step ST202). If it does, the correction filter coefficient update means 6a updates the correction filter coefficient by dividing the average noise superimposed voice power spectrum by the average noise power spectrum, and stores one such coefficient in the correction filter memory 7a as the pre-correction filter coefficient (step ST109). Next, the correction filter correlation calculation means 16 determines whether both the pre-correction filter coefficient and the post-correction filter coefficient have been obtained (step ST203); if not, processing ends. For example, if the pre-correction filter coefficient is stored in the correction filter memory 7a but the post-correction filter coefficient has not yet been obtained, processing ends. If the noise section examined in step ST202 is shorter than the prescribed number of frames, processing likewise ends.
In Embodiment 4, when the correction filter coefficient has been stored in step ST109, the noise removal means 8 also removes, as background processing, the estimated noise power spectrum from the noise superimposed voice power spectrum based on this correction filter coefficient, and generates the noise-removed voice power spectrum.

If the frame being processed is found to be a noise section in step ST104, the correction filter coefficient update means 6a determines whether that frame lies within a prescribed number of frames of the frame immediately following the speech section (step ST105). If so, processing moves to step ST107; otherwise it moves to step ST108. In step ST107, the correction filter coefficient update means 6a determines whether the prescribed number of frames of the noise superimposed voice power spectrum and the noise power spectrum have been stored in the correction filter coefficient update spectrum memory 11. In step ST108, it clears the correction filter coefficient update spectrum memory 11.

If the prescribed number of frames has been stored in step ST107, the correction filter coefficient update means 6a updates the correction filter coefficient by dividing the average of the noise superimposed voice power spectrum over the prescribed number of frames by the average of the noise power spectrum over the same number of frames, and stores one such coefficient in the correction filter memory 7a as the post-correction filter coefficient (step ST109). In step ST109, the pre-correction filter coefficient and the post-correction filter coefficient are stored separately. If the prescribed number of frames has not yet been stored in step ST107, the noise superimposed voice power spectrum and the noise power spectrum are stored in the correction filter coefficient update spectrum memory 11 (step ST110).

Next, the correction filter correlation calculating means 16 determines whether both the pre-correction filter coefficient and the post-correction filter coefficient are stored in the correction filter memory 7a (step ST203). If they are, it determines whether the pre-correction filter coefficient and the post-correction filter coefficient are correlated (steps ST401 and ST402). The correlation coefficient r is calculated by equation (6).

In equation (6), i is the correction filter point, n is the maximum correction filter point, h₁(i) is the coefficient at the i-th point of the post-correction filter h₁, h₂(i) is the coefficient at the i-th point of the pre-correction filter h₂, h_a1 is the mean of the coefficients of the post-correction filter h₁, and h_a2 is the mean of the coefficients of the pre-correction filter h₂.
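Equation (6) itself is not reproduced in this text. Given the symbol definitions, it is presumably the standard sample correlation coefficient between the two filters. The following reconstruction is an assumption on that basis, including the summation range from 1 to n:

```latex
r = \frac{\sum_{i=1}^{n} \bigl(h_1(i) - h_{a1}\bigr)\bigl(h_2(i) - h_{a2}\bigr)}
         {\sqrt{\sum_{i=1}^{n} \bigl(h_1(i) - h_{a1}\bigr)^2}\;
          \sqrt{\sum_{i=1}^{n} \bigl(h_2(i) - h_{a2}\bigr)^2}}
```

Under this form, r lies between -1 and 1, and values near 1 indicate that the pre-correction and post-correction filters have nearly the same shape.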

If the correction filter correlation calculating means 16 determines in step ST402 that there is a correlation, the noise removal means 8 outputs the noise-removed speech power spectrum processed with the pre-correction filter coefficient (step ST403). If, on the other hand, it determines in step ST402 that there is no correlation, the correction filter correlation calculating means 16 replaces the pre-correction filter coefficient with the post-correction filter coefficient and stores the result in the correction filter memory 7a as the correction filter coefficient (step ST404). The noise removal means 8 then outputs the noise-removed speech power spectrum obtained with the updated correction filter coefficient (step ST405).
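The decision in steps ST402 to ST405 can be sketched as below. The text does not say how "there is a correlation" is decided, so the `threshold` value of 0.8 is purely a hypothetical choice, as are the function names; the correlation is computed as an ordinary sample correlation coefficient.

```python
import math
from typing import List, Sequence


def correlation(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sample correlation coefficient between two filters of equal length."""
    n = len(h1)
    m1 = sum(h1) / n
    m2 = sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = (math.sqrt(sum((a - m1) ** 2 for a in h1))
           * math.sqrt(sum((b - m2) ** 2 for b in h2)))
    # A constant filter has zero variance; treat that case as uncorrelated.
    return num / den if den > 0.0 else 0.0


def select_filter(pre: Sequence[float], post: Sequence[float],
                  threshold: float = 0.8) -> List[float]:
    """Pick the filter used for noise removal.

    threshold is an assumed value, not taken from the source.
    """
    if correlation(post, pre) >= threshold:
        return list(pre)   # ST403: keep using the pre-correction filter
    return list(post)      # ST404-ST405: replace it with the post-correction filter
```

When the two filters agree, output that was already produced with the pre-correction filter can be used as it is, which is why this embodiment can deliver noise-removed speech sooner.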

The noise estimation and noise removal method has been described above for two-input SS, but it can also be applied to one-input SS. In that case, the application is achieved by calculating the average noise power spectrum immediately before and immediately after the speech section. Application to one-input SS has already been described, so its description is omitted here.

In the above embodiment, both the pre-correction filter coefficient and the post-correction filter coefficient are stored in the correction filter memory 7a, and the correction filter correlation calculating means 16 reads both coefficients from the correction filter memory 7a to calculate the correlation. Alternatively, when the correction filter coefficient updating means 6a updates the post-correction filter coefficient, the pre-correction filter coefficient stored in the correction filter memory 7a may be read out and the correlation calculated at that point. With this configuration, no area for storing the post-correction filter coefficient is needed in the correction filter memory 7a.

As described above, the noise removal apparatus of Embodiment 4 comprises: noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data; noise input means for inputting noise and outputting noise data; speech section determination means for inputting the noise-superimposed speech data and determining whether it is a speech section or a noise section; correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately precedes a speech section, the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data from that noise section, and for updating, when it determines that a noise section immediately follows a speech section, the coefficient of a post-correction filter for correcting the same difference from that noise section; correction filter correlation calculating means for calculating the correlation between the pre-correction filter coefficient and the post-correction filter coefficient; and noise removal means for outputting, when the correction filter correlation calculating means finds a correlation between the two coefficients, noise-removed speech obtained by removing the noise data from the noise-superimposed speech data in the speech section using the pre-correction filter coefficient. Consequently, when the pre-correction filter coefficient and the post-correction filter coefficient are correlated, the noise-removed speech based on the pre-correction filter coefficient can be output, so noise-removed speech can be output sooner.

Embodiment 5.
In Embodiment 5, a weighted average is taken of a plurality of correction filter coefficients stored in the correction filter memory 7b.
FIG. 15 is a configuration diagram of the noise removal apparatus of Embodiment 5.
In the figure, the noise removal apparatus comprises noise-superimposed speech input means 1, noise input means 2, speech section determination means 3a, noise-superimposed speech spectrum calculating means 4, noise spectrum calculating means 5, correction filter coefficient updating means 6a, a correction filter memory 7b, noise removal means 8, initial state determination means 9, a speech section power spectrum memory 10, a correction filter coefficient updating spectrum memory 11, a pre-correction filter creation data memory 12, and correction filter coefficient weighted average calculating means 17. The correction filter memory 7b stores a plurality of pre-correction filter coefficients and post-correction filter coefficients in time series. The correction filter coefficient weighted average calculating means 17 calculates a weighted average of the time-series correction filter coefficients stored in the correction filter memory 7b. The rest of the configuration is the same as in Embodiment 2, so corresponding parts are given the same reference numerals and their description is omitted.

Next, the operation of the noise removal apparatus of Embodiment 5 will be described.
FIGS. 16 and 17 are flowcharts showing the operation of Embodiment 5, and the operation is described below with reference to them. Steps ST101 to ST111 in the figures are the same as steps ST101 to ST111 in Embodiment 1, and steps ST201 to ST202 in the figures are the same as steps ST201 to ST202 in Embodiment 2.

In step ST106, the initial state determination means 9 stores the noise-superimposed speech power spectrum and the noise power spectrum in the speech section power spectrum memory 10, and the speech section determination means 3a determines whether the noise section held in the pre-correction filter creation data memory 12 spans at least the specified number of frames immediately before the speech section (step ST202). If it does, the correction filter coefficient updating means 6a updates the correction filter coefficient by dividing the average of the noise-superimposed speech power spectrum by the average of the noise power spectrum, and stores one such coefficient in the correction filter memory 7b as the pre-correction filter coefficient (step ST109).

If the frame being processed is found in step ST104 to be in a noise section, the correction filter coefficient updating means 6a determines whether that frame lies within the specified number of frames from the frame immediately after the speech section (step ST105). If so, the process proceeds to step ST107; if not, it proceeds to step ST108. In step ST107, the correction filter coefficient updating means 6a determines whether the noise-superimposed speech power spectrum and the noise power spectrum have been stored in the correction filter coefficient updating spectrum memory 11 for the specified number of frames. In step ST108, it clears the correction filter coefficient updating spectrum memory 11.

If the specified number of frames has been stored in step ST107, the process proceeds to step ST109, where the correction filter coefficient updating means 6a updates the correction filter coefficient by dividing the average power spectrum of the noise-superimposed speech over the specified number of frames by the average power spectrum of the noise over the same number of frames, and stores one such coefficient in the correction filter memory 7b as the post-correction filter coefficient. In step ST109, the pre-correction filter coefficients and the post-correction filter coefficients are stored separately, a plurality of each, in time series. If, on the other hand, the specified number of frames has not yet been stored in step ST107, the noise-superimposed speech power spectrum and the noise power spectrum are stored in the correction filter coefficient updating spectrum memory 11 (step ST110).
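The frame handling of steps ST105 to ST110 can be sketched as a small buffer object. This is a simplified reading rather than the flowchart itself: the class and parameter names are hypothetical, `frames_since_speech` is assumed to count from 1 at the first noise frame after the speech section, and the sketch stores each frame before checking whether the buffer is full, which collapses the ST107/ST110 ordering into a single call.

```python
from typing import List, Optional, Sequence


class PostSpeechCollector:
    """Collects noise-section frames right after a speech section."""

    def __init__(self, required_frames: int) -> None:
        self.required_frames = required_frames
        self.noisy: List[List[float]] = []   # noise-superimposed speech spectra
        self.noise: List[List[float]] = []   # reference noise spectra

    def on_noise_frame(self, frames_since_speech: int,
                       noisy_spec: Sequence[float],
                       noise_spec: Sequence[float]) -> Optional[List[float]]:
        """Return a new post-correction filter once enough frames are held."""
        if frames_since_speech > self.required_frames:
            # ST105 "no" branch, then ST108: outside the window, clear the buffer.
            self.noisy.clear()
            self.noise.clear()
            return None
        # ST110: keep the spectra of this frame.
        self.noisy.append(list(noisy_spec))
        self.noise.append(list(noise_spec))
        if len(self.noisy) < self.required_frames:
            return None
        # ST109: the ratio of sums equals the ratio of means for equal frame counts.
        n_bins = len(noisy_spec)
        noisy_sum = [sum(f[k] for f in self.noisy) for k in range(n_bins)]
        noise_sum = [sum(f[k] for f in self.noise) for k in range(n_bins)]
        return [x / max(n, 1e-12) for x, n in zip(noisy_sum, noise_sum)]
```

A caller would invoke `on_noise_frame` once per noise frame and append any returned filter to the time-series store that plays the role of the correction filter memory.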

Next, the correction filter coefficient weighted average calculating means 17 weights the plurality of correction filter coefficients stored in the correction filter memory 7b, calculates their average, and stores the result in the correction filter memory 7b as the updated, average correction filter coefficient (step ST501). For example, older correction filter coefficients in the time series are multiplied by smaller weights, so that coefficients closer to the present are emphasized when the stored coefficients are averaged. The correction filter memory 7b then holds a correction filter coefficient H(i) weighted, for example, as in equation (7).

In equation (7), i is the correction filter point, a, b, and c are the weight constants for the respective correction filter coefficients, H₁(i) and H₂(i) are correction filter coefficients that were updated in the past and are stored in the correction filter memory, and H₃(i) is the newly updated correction filter coefficient. Equation (7) gives H(i) for the case where the correction filter memory holds up to two past correction filter coefficients.
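Equation (7) itself is not reproduced in this text. From the symbol definitions, a weighted average of three coefficients is the natural reading. The following form is an assumption on that basis; in particular, the normalization by a + b + c is a guess, and the original may instead assume weights that already sum to 1:

```latex
H(i) = \frac{a\,H_1(i) + b\,H_2(i) + c\,H_3(i)}{a + b + c},
\qquad a \le b \le c
```

The ordering of the weights reflects the statement that older coefficients receive smaller weights, with H₁(i) taken here to be the oldest.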

After the updated average correction filter coefficient has been stored in step ST501, the noise removal means 8 uses the average correction filter coefficient stored in the correction filter memory 7b to remove the estimated noise power spectrum from the noise-superimposed speech power spectrum, and outputs the noise-removed speech power spectrum in time series (step ST111).
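Steps ST501 and ST111 can be sketched together as follows. The function names are hypothetical, the weights are normalized by their sum (an assumption), and the `floor` that clips negative results is a common spectral-subtraction safeguard that the text here does not describe. The estimated noise is taken to be the reference noise spectrum scaled by the correction filter, in line with the filter being a ratio of noise-superimposed speech power to noise power.

```python
from typing import List, Sequence


def weighted_average_filter(history: Sequence[Sequence[float]],
                            weights: Sequence[float]) -> List[float]:
    """Weighted average of stored filters, oldest first (ST501)."""
    total = sum(weights)
    n_bins = len(history[0])
    return [sum(w * h[k] for w, h in zip(weights, history)) / total
            for k in range(n_bins)]


def remove_noise(noisy_spec: Sequence[float],
                 noise_spec: Sequence[float],
                 filt: Sequence[float],
                 floor: float = 0.0) -> List[float]:
    """Subtract the filter-scaled noise spectrum bin by bin (ST111).

    floor is an assumed lower bound that keeps power values non-negative.
    """
    return [max(x - h * n, floor)
            for x, n, h in zip(noisy_spec, noise_spec, filt)]
```

For instance, three stored filters with weights 1, 1, and 2 give the newest filter half of the total weight.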

As described above, the noise removal apparatus of Embodiment 5 comprises: noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data; noise input means for inputting noise and outputting noise data; speech section determination means for inputting the noise-superimposed speech data and determining whether it is a speech section or a noise section; correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately precedes a speech section, the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data from that noise section, and for updating, when it determines that a noise section immediately follows a speech section, the coefficient of a post-correction filter for correcting the same difference from that noise section; correction filter coefficient weighted average calculating means for weighting the pre-correction filter coefficients and the post-correction filter coefficients according to the time series in which they were obtained, calculating the average of the plurality of pre-correction and post-correction filter coefficients, and outputting it as an average correction filter coefficient; and noise removal means for removing the noise data from the noise-superimposed speech data in the speech section using the average correction filter coefficient. Because correction filter coefficients updated in the past are also used, a correction filter that estimates the noise superimposed on the speech more accurately can be created.

In Embodiment 5, the correction filter coefficients before and after the speech section are obtained, as in Embodiments 2 and 4. However, the weighted average may also be applied to a configuration in which only the post-correction filter coefficient is stored in the correction filter memory, as shown in Embodiment 1.

In Embodiment 3, the speech section is first determined on the basis of the press timing supplied by the button press timing input means, and is then re-determined using the resulting noise-removed speech data. Embodiments 1, 2, 4, and 5 may likewise be configured to re-determine the speech section using the resulting noise-removed speech data. That is, the noise-removed speech data calculating means of Embodiment 3 may be added to these embodiments, and the speech section determination means may re-determine the speech section from its output. With this configuration, Embodiments 1, 2, 4, and 5 can also detect speech sections more accurately and can therefore remove noise more accurately.

A configuration diagram showing the noise removal apparatus according to Embodiment 1 of the present invention.
A flowchart showing the operation of the noise removal apparatus according to Embodiment 1 of the present invention.
An explanatory diagram showing, in time series, the signals used for determination by the speech section determination means of the noise removal apparatus according to Embodiment 1 of the present invention.
An explanatory diagram showing, in time series, the input signals when the noise removal apparatus according to Embodiment 1 of the present invention performs noise removal.
An explanatory diagram showing an example of the correction filter coefficients of the noise removal apparatus according to Embodiment 1 of the present invention.
A configuration diagram showing the noise removal apparatus according to Embodiment 2 of the present invention.
A flowchart (part 1) showing the operation of the noise removal apparatus according to Embodiment 2 of the present invention.
A flowchart (part 2) showing the operation of the noise removal apparatus according to Embodiment 2 of the present invention.
A configuration diagram showing the noise removal apparatus according to Embodiment 3 of the present invention.
A flowchart (part 1) showing the operation of the noise removal apparatus according to Embodiment 3 of the present invention.
A flowchart (part 2) showing the operation of the noise removal apparatus according to Embodiment 3 of the present invention.
A configuration diagram showing the noise removal apparatus according to Embodiment 4 of the present invention.
A flowchart (part 1) showing the operation of the noise removal apparatus according to Embodiment 4 of the present invention.
A flowchart (part 2) showing the operation of the noise removal apparatus according to Embodiment 4 of the present invention.
A configuration diagram showing the noise removal apparatus according to Embodiment 5 of the present invention.
A flowchart (part 1) showing the operation of the noise removal apparatus according to Embodiment 5 of the present invention.
A flowchart (part 2) showing the operation of the noise removal apparatus according to Embodiment 5 of the present invention.

Explanation of symbols

1: noise-superimposed speech input means; 2: noise input means; 3, 3a, 3b: speech section determination means; 6, 6a: correction filter coefficient updating means; 7, 7a, 7b: correction filter memory; 8: noise removal means; 13: correction filter coefficient synthesizing means; 14: button press timing input means; 15: noise-removed speech data calculating means; 16: correction filter correlation calculating means; 17: correction filter coefficient weighted average calculating means.

Claims

A noise removal apparatus comprising:
noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data;
noise input means for inputting noise and outputting noise data;
speech section determination means for inputting the noise-superimposed speech data and determining whether it is a speech section or a noise section;
correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately follows a speech section, the coefficient of a correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately after that speech section; and
noise removal means for removing the noise data from the noise-superimposed speech data in the speech section using the correction filter coefficient.

A noise removal apparatus comprising:
noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data;
noise input means for inputting noise and outputting noise data;
speech section determination means for inputting the noise-superimposed speech data and determining whether it is a speech section or a noise section;
correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately precedes a speech section, the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately before that speech section, and for updating, when it determines that a noise section immediately follows a speech section, the coefficient of a post-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately after that speech section;
correction filter coefficient synthesizing means for calculating the average of the pre-correction filter coefficient and the post-correction filter coefficient and outputting it as an average correction filter coefficient; and
noise removal means for removing the noise data from the noise-superimposed speech data in the speech section using the average correction filter coefficient.

A noise removal apparatus comprising:
noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data;
noise input means for inputting noise and outputting noise data;
button press timing input means for acquiring the press timing of a button that is pressed at the time of speech input;
speech section determination means for determining, on the basis of the press timing from the button press timing input means, whether the noise-superimposed speech data is a speech section or a noise section;
correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately follows a speech section, the coefficient of a correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately after that speech section; and
noise removal means for removing the noise data from the noise-superimposed speech data in the speech section using the correction filter coefficient,
wherein the speech section determination means re-determines the speech section and the noise section using noise-removed speech data obtained on the basis of the button press timing, and the correction filter coefficient updating means updates the coefficient of the correction filter again on the basis of the speech section and the noise section obtained by the re-determination.

A noise removal apparatus comprising:
noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data;
noise input means for inputting noise and outputting noise data;
speech section determination means for inputting the noise-superimposed speech data and determining whether it is a speech section or a noise section;
correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately precedes a speech section, the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately before that speech section, and for updating, when it determines that a noise section immediately follows a speech section, the coefficient of a post-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately after that speech section;
correction filter correlation calculating means for calculating the correlation between the pre-correction filter coefficient and the post-correction filter coefficient; and
noise removal means for outputting, when the correction filter correlation calculating means finds a correlation between the pre-correction filter coefficient and the post-correction filter coefficient, noise-removed speech obtained by removing the noise data from the noise-superimposed speech data in the speech section using the pre-correction filter coefficient.

A noise removal apparatus comprising:
noise-superimposed speech input means for inputting noise-superimposed speech and outputting noise-superimposed speech data;
noise input means for inputting noise and outputting noise data;
speech section determination means for inputting the noise-superimposed speech data and determining whether it is a speech section or a noise section;
correction filter coefficient updating means for updating, when the speech section determination means determines that a noise section immediately precedes a speech section, the coefficient of a pre-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately before that speech section, and for updating, when it determines that a noise section immediately follows a speech section, the coefficient of a post-correction filter for correcting the difference in frequency characteristics between the noise-superimposed speech data and the noise data, from the noise section immediately after that speech section;
correction filter coefficient weighted average calculating means for weighting the pre-correction filter coefficients and the post-correction filter coefficients according to the time series in which they were obtained, calculating the average of the plurality of pre-correction filter coefficients and post-correction filter coefficients, and outputting it as an average correction filter coefficient; and
noise removal means for removing the noise data from the noise-superimposed speech data in the speech section using the average correction filter coefficient.