JP6763319B2

JP6763319B2 - Non-purpose sound determination device, program and method

Info

Publication number: JP6763319B2
Application number: JP2017035286A
Authority: JP
Inventors: Katsuyuki Takahashi (克之高橋)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-02-27
Filing date: 2017-02-27
Publication date: 2020-09-30
Anticipated expiration: 2037-02-27
Also published as: JP2018142819A

Description

The present invention relates to a non-purpose sound determination device, program, and method. It can be applied, for example, to determining whether a non-purpose sound other than the target sound (for example, background noise) is present, either in speech processing for telephone calls or video conferences or in speech recognition processing.

In recent years, devices that provide speech processing functions have become widespread. Examples include the voice call and speech recognition functions of smartphones and car navigation systems; these devices are hereinafter collectively referred to as "voice processing devices". As a result, voice processing devices are now used in harsher noise environments than before, such as crowded city streets and moving vehicles. There is therefore growing demand for voice processing devices that maintain call quality and speech recognition performance even in noisy environments.

When a conventional voice processing device extracts the target sound, it performs processing that suppresses non-purpose sounds other than the target sound. One example of a conventional voice processing device that suppresses non-purpose sounds is the technique described in Patent Document 1.

The apparatus described in Patent Document 1 applies delay-and-subtract processing to the input audio signals. This forms first and second directional signals that have blind spots in first and second predetermined directions, respectively, and the apparatus obtains the coherence of these two directional signals. It then compares the coherence with a determination threshold to decide whether the input signal belongs to a target speech section (speech arriving from the target direction) or to a non-purpose speech section. Finally, it sets a gain according to this result and multiplies the input signal by that gain to attenuate the non-purpose speech.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2013-182044
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2014-106337

The components of a non-purpose sound can usually be divided into two broad groups:
- background noise, for example crowd noise in a city or the running noise of a car;
- interfering speech, for example the voices of people other than the user of the voice processing device.

Background noise and interfering speech have entirely different characteristics and behavior. A conventional voice processing device therefore could not achieve a sufficient effect unless it switched its processing depending on whether background noise was present, for example by changing the suppression parameters for interfering speech. The same applies when the processed sound is fed to speech recognition: sufficient recognition performance cannot be obtained unless the recognition processing and the noise reduction characteristics are changed depending on whether background noise is present. It is therefore important to be able to determine (detect) the presence of background noise accurately.

However, the environments in which audio signal processing is used have expanded rapidly. As a result, it has become difficult to determine accurately whether background noise is present when various unknown sound sources exist. There is therefore a growing need for a method that accurately determines the presence of background noise, as a prerequisite for the advanced signal processing described above.

In view of the above problems, there is a need for a non-purpose sound determination device, program, and method that can accurately determine whether a non-purpose sound (for example, background noise) is present.

A first aspect of the present invention comprises the following four units:
(1) a front suppression signal generation unit that acquires frequency-domain input signals, obtained by converting the input signals from a plurality of microphones from the time domain to the frequency domain. Based on the difference between the acquired frequency-domain input signals of the microphones, it generates a first front suppression signal having a blind spot toward the front;
(2) a coherence calculation unit that calculates coherence from the input signals obtained from the plurality of microphones;
(3) a first feature quantity calculation unit that calculates the correlation coefficient between the first front suppression signal and the coherence. It also calculates a first feature quantity representing how strongly the slope of the amplitude of the correlation coefficient fluctuates between positive and negative;
(4) a background noise presence determination unit that determines whether background noise is present, based on the value of the first feature quantity calculated by the first feature quantity calculation unit.

A non-purpose sound determination program according to a second aspect of the present invention causes a computer to function as the following four units:
(1) a front suppression signal generation unit that acquires frequency-domain input signals, obtained by converting the input signals from a plurality of microphones from the time domain to the frequency domain. Based on the difference between the acquired frequency-domain input signals of the microphones, it generates a first front suppression signal having a blind spot toward the front;
(2) a coherence calculation unit that calculates coherence from the input signals obtained from the plurality of microphones;
(3) a first feature quantity calculation unit that calculates the correlation coefficient between the first front suppression signal and the coherence. It also calculates a first feature quantity representing how strongly the slope of the amplitude of the correlation coefficient fluctuates between positive and negative;
(4) a background noise presence determination unit that determines whether background noise is present, based on the value of the first feature quantity calculated by the first feature quantity calculation unit.

A third aspect of the present invention is a non-purpose sound determination method used in a non-purpose sound determination device. The device has a front suppression signal generation unit, a coherence calculation unit, a first feature quantity calculation unit, and a background noise presence determination unit, which operate as follows:
(1) The front suppression signal generation unit acquires frequency-domain input signals, obtained by converting the input signals from a plurality of microphones from the time domain to the frequency domain. Based on the difference between the acquired frequency-domain input signals of the microphones, it generates a first front suppression signal having a blind spot toward the front.
(2) The coherence calculation unit calculates coherence from the input signals obtained from the plurality of microphones.
(3) The first feature quantity calculation unit calculates the correlation coefficient between the first front suppression signal and the coherence. It also calculates a first feature quantity representing how strongly the slope of the amplitude of the correlation coefficient fluctuates between positive and negative.
(4) The background noise presence determination unit determines whether background noise is present, based on the value of the first feature quantity calculated by the first feature quantity calculation unit.

According to the present invention, whether a non-purpose sound (for example, background noise) is present can be determined accurately.

FIG. 1 is a block diagram showing the functional configuration of the non-purpose sound determination device according to the first embodiment.
FIG. 2 is an explanatory diagram showing an example of the microphone arrangement according to the first embodiment.
FIG. 3 is a diagram showing the characteristics of the directional signals used in the non-purpose sound determination device according to the first embodiment.
FIG. 4 is a flowchart (part 1) showing how the background noise presence determination unit according to the first embodiment determines whether background noise is present.
FIG. 5 is a flowchart (part 2) showing how the background noise presence determination unit according to the first embodiment determines whether background noise is present.
FIG. 6 is a block diagram showing the overall configuration of the non-purpose determination device according to the second embodiment.
FIG. 7 is a flowchart (part 1) showing the correlation calculation and how the interfering sound presence determination unit according to the second embodiment determines whether interfering speech is present.
FIG. 8 is a flowchart (part 2) showing the correlation calculation and how the interfering sound presence determination unit according to the second embodiment determines whether interfering speech is present.

(A) First Embodiment
The first embodiment of the non-purpose sound determination device, program, and method according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the non-purpose sound determination device 1 according to the first embodiment.

The non-purpose sound determination device 1 acquires input signals s1(n) and s2(n) from a pair of microphones m_1 and m_2, respectively, via AD converters (not shown). Here n is a positive integer index representing the input order of the samples. A smaller n denotes an older input sample and a larger n a newer one.

The non-purpose sound determination device 1 determines whether the input signals captured by the microphones m_1 and m_2 contain a non-purpose sound (background noise). It supplies the result to a voice processing device (not shown), which uses it when processing the input signals. The processing that the voice processing device applies to the input signals is not limited. For example, the result may be used in a communication device such as a video conferencing system or a mobile phone terminal, or in preprocessing for a speech recognition function. A typical use is suppressing non-purpose sounds such as background noise.

FIG. 2 is an explanatory diagram showing an example arrangement of the microphones m_1 and m_2.

As shown in FIG. 2, in this embodiment the two microphones m_1 and m_2 are arranged side by side, horizontally with respect to the direction from which the target sound arrives (the direction of the target sound source).

As shown in FIG. 2, directions are defined as seen from the position between the two microphones m_1 and m_2:
- the direction from which the target sound arrives is called the forward direction or front direction;
- the right, left, and rear directions are the corresponding directions when facing the direction of arrival of the target sound.

This embodiment assumes that the target sound arrives from the front of the microphones m_1 and m_2. Non-purpose sounds, including interfering speech, are assumed to arrive from the left and right (lateral) directions.

The non-purpose sound determination device 1 includes the following units:
- an FFT unit 10;
- a front suppression signal generation unit 20;
- a coherence calculation unit 30;
- a correlation and modGI calculation unit 40;
- a background noise presence determination unit 50.

The non-purpose sound determination device 1 may be implemented by installing a program on a computer having a processor, memory, and so on. This program includes the non-purpose sound determination program according to the embodiment. Even in that case, the device's functional configuration can be represented as in FIG. 1. The non-purpose sound determination device 1 may also be implemented partly or wholly in hardware.

The FFT unit 10 receives the input signal sequences s1 and s2 from the microphones m_1 and m_2 and applies a fast Fourier transform (or a discrete Fourier transform) to them. As a result, s1 and s2 are represented in the frequency domain. Before performing the transform, the FFT unit 10 forms analysis frames FRAME1(K) and FRAME2(K) from the input signals s1(n) and s2(n). Each frame consists of a predetermined number N of samples, where N is an arbitrary integer.

Equation (1) shows how FRAME1 is formed from the input signal s1. In equation (1), K is a positive integer index representing the frame order. A smaller K denotes an older analysis frame and a larger K a newer one. In the description that follows, unless otherwise noted, K denotes the latest analysis frame being analyzed.
FRAME1(1) = {s1(1), s1(2), ..., s1(i), ..., s1(N)}
FRAME1(K) = {s1(N×(K-1)+1), s1(N×(K-1)+2), ..., s1(N×(K-1)+i), ..., s1(N×(K-1)+N)} ...(1)
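The framing in equation (1) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation:
- frames do not overlap and no window function is applied, since equation (1) shows neither;
- frames are indexed so that FRAME(1) begins at s(1), as in the first line of equation (1);
- trailing samples that do not fill a whole frame are dropped;
- the function name make_frames is hypothetical.

```python
def make_frames(s, N):
    """Split samples s(1), s(2), ... into non-overlapping N-sample frames.

    `s` is a list whose element s[0] holds sample s(1). The returned dict maps
    the 1-based frame index K to FRAME(K) = [s(N*(K-1)+1), ..., s(N*(K-1)+N)].
    Trailing samples that do not fill a complete frame are dropped (assumption).
    """
    if N <= 0:
        raise ValueError("frame length N must be positive")
    frames = {}
    for K in range(1, len(s) // N + 1):
        start = N * (K - 1)  # 0-based list position of sample s(N*(K-1)+1)
        frames[K] = s[start:start + N]
    return frames
```

For example, ten samples with N = 4 yield FRAME(1) and FRAME(2), and the last two samples are discarded.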

By applying the fast Fourier transform to each analysis frame, the FFT unit 10 obtains two frequency-domain signals:
- X1(f, K), by transforming the analysis frame FRAME1(K) formed from the input signal s1;
- X2(f, K), by transforming the analysis frame FRAME2(K) formed from the input signal s2.

Here f is an index representing frequency. X1(f, K) is not a single value: as shown in equation (2), it consists of m spectral components (m is an arbitrary integer) at frequencies f1 to fm.
X1(f, K) = {X1(f1, K), X1(f2, K), ..., X1(fi, K), ..., X1(fm, K)} ...(2)

The FFT unit 10 supplies the frequency-domain signals X1(f, K) and X2(f, K) to the front suppression signal generation unit 20 and the coherence calculation unit 30.

Each component of X1(f, K) is a complex number with a real part and an imaginary part. The same applies to X2(f, K) and to N(f, K), which is described later for the front suppression signal generation unit 20.
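A per-frame transform producing the m complex spectral components of equation (2) might look like the sketch below. It has the following properties:
- it uses a naive O(N²) DFT from the standard library for readability, whereas a real implementation would use an FFT;
- by default it keeps the N//2 + 1 non-negative-frequency bins of a real-valued frame, so m = N//2 + 1. The source does not specify this choice; it is an assumption;
- the function name frame_spectrum is hypothetical.

```python
import cmath
import math


def frame_spectrum(frame, m=None):
    """Naive DFT of one analysis frame, returning X(f_1..f_m, K) as complex values.

    O(N^2) for readability; a practical implementation would use an FFT.
    By default keeps the N//2 + 1 non-negative-frequency bins of a real frame.
    """
    n_samples = len(frame)
    if n_samples == 0:
        raise ValueError("frame must not be empty")
    if m is None:
        m = n_samples // 2 + 1
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(frame))
        for k in range(m)
    ]
```

Applied to every frame K produced by the framing step, this gives the complex-valued X1(f, K) and X2(f, K) used in the rest of the section.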

Next, the front suppression signal generation unit 20 is described.

For each frequency, the front suppression signal generation unit 20 suppresses the front-direction component of the signals supplied from the FFT unit 10. In other words, it functions as a directional filter that suppresses components arriving from the front.

For example, as shown in FIG. 3, the front suppression signal generation unit 20 uses a figure-eight bidirectional filter with a blind spot toward the front. This filter suppresses the front-direction component of the signals supplied from the FFT unit 10.

Specifically, the front suppression signal generation unit 20 computes the front suppression signal N(f, K) for each frequency from the signals X1(f, K) and X2(f, K) supplied by the FFT unit 10, using equation (3) below. This computation corresponds to forming the figure-eight bidirectional filter with a blind spot toward the front shown in FIG. 3.
N(f, K) = X1(f, K) - X2(f, K) ...(3)
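A plane-wave model shows why equation (3) places a blind spot toward the front. The sketch rests on these assumptions:
- a source at angle θ from the front reaches m_2 with a delay τ = d·sin(θ)/c relative to m_1, so that X2(f) = X1(f)·exp(-j2πfτ);
- the 2 cm spacing d, the 340 m/s speed of sound c, and the function name are all illustrative.

The function returns |N(f)|/|X1(f)|, which equals 2|sin(πfτ)|. This is zero for a frontal source and, for frequencies below c/(2d) (about 8.5 kHz here), grows as the source moves toward either side. The result is the figure-eight pattern.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 340.0  # assumed speed of sound c
MIC_SPACING_M = 0.02        # assumed distance d between m_1 and m_2


def front_suppression_gain(freq_hz, angle_deg, d=MIC_SPACING_M, c=SPEED_OF_SOUND_M_S):
    """|N(f)| / |X1(f)| for a plane wave arriving from angle_deg.

    angle_deg = 0 is the front; +90 and -90 are the two lateral directions.
    Assumed model: X2(f) = X1(f) * exp(-j*2*pi*f*tau), tau = d*sin(theta)/c.
    """
    tau = d * math.sin(math.radians(angle_deg)) / c
    x1 = 1.0 + 0j                                        # unit component at m_1
    x2 = x1 * cmath.exp(-2j * math.pi * freq_hz * tau)  # delayed copy at m_2
    return abs(x1 - x2)                                  # equation (3): N = X1 - X2
```

At 1 kHz, for instance, the gain is 0 for θ = 0, larger at 30 degrees, larger still at 90 degrees, and identical at +90 and -90 degrees.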

The front suppression signal generation unit 20 then calculates the average front suppression signal AVE_N(K) by averaging N(f, K) over all frequencies (equation (4)).
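Equation (4) is not reproduced in this text, so the sketch below is only one plausible reading of "averaging N(f, K) over all frequencies". Because N(f, K) is complex, it averages the magnitudes |N(f_i, K)| over the m bins. Both the use of magnitudes and the function name are assumptions.

```python
def average_front_suppression(x1_spec, x2_spec):
    """AVE_N(K) as the mean of |N(f_i, K)| over all m bins, with N = X1 - X2.

    Averaging magnitudes rather than complex values is an assumption,
    since equation (4) itself is not given here.
    """
    if not x1_spec or len(x1_spec) != len(x2_spec):
        raise ValueError("spectra must be non-empty and of equal length")
    n_spec = [a - b for a, b in zip(x1_spec, x2_spec)]  # equation (3), per bin
    return sum(abs(v) for v in n_spec) / len(n_spec)
```

If identical spectra reach both microphones (a purely frontal source), AVE_N(K) is 0. Lateral energy raises it.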

Next, the processing performed by the coherence calculation unit 30 is described.

The coherence calculation unit 30 calculates the coherence COH(K) from the frequency-domain signals X1(f, K) and X2(f, K). The calculation uses two directional signals:
- the directional signal B1(f), processed by a filter with strong directivity toward the left (first direction);
- the directional signal B2(f), processed by a filter with strong directivity toward the right (second direction).

The directions of directivity of B1(f) and B2(f) may be any directions other than the front, provided that B1(f) and B2(f) point in different directions.

The specific process (for example, the formula) for calculating the coherence COH(K) is not limited. For example, the same process as in Patent Document 1 (the calculations of its equations (3) to (7)) can be applied, so the details are omitted here.
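Patent Document 1's equations are not reproduced here. The following is therefore a hedged sketch of one common way to compute such a coherence, not necessarily that document's formula. It assumes the following:
- delay-and-subtract directional signals B1(f) = X1(f) - X2(f)·exp(-j2πfτ0) and B2(f) = X2(f) - X1(f)·exp(-j2πfτ0), whose nulls lie on opposite sides;
- a per-bin coefficient |B1·B2*| / ((|B1|² + |B2|²)/2), averaged over frequency to give COH(K);
- the delay τ0 (for example d/c) and the function names are hypothetical.

```python
import cmath
import math


def directional_pair(x1, x2, freqs_hz, tau0):
    """Hypothetical delay-and-subtract beams B1(f), B2(f) with nulls on opposite sides.

    B1(f) = X1(f) - X2(f) * exp(-j*2*pi*f*tau0)
    B2(f) = X2(f) - X1(f) * exp(-j*2*pi*f*tau0)
    """
    b1, b2 = [], []
    for a, b, f in zip(x1, x2, freqs_hz):
        delay = cmath.exp(-2j * math.pi * f * tau0)
        b1.append(a - b * delay)
        b2.append(b - a * delay)
    return b1, b2


def coherence(b1, b2, eps=1e-12):
    """COH(K): mean over bins of |B1 B2*| / ((|B1|^2 + |B2|^2) / 2)."""
    if not b1 or len(b1) != len(b2):
        raise ValueError("B1 and B2 must be non-empty and of equal length")
    total = 0.0
    for u, v in zip(b1, b2):
        den = 0.5 * (abs(u) ** 2 + abs(v) ** 2)
        # Silent bins (both beams near zero) contribute 0 instead of dividing by ~0.
        total += abs(u * v.conjugate()) / den if den > eps else 0.0
    return total / len(b1)
```

With this form, the per-bin coefficient never exceeds 1. It equals 1 when |B1| = |B2|, which is what a frontal source produces. A lateral source falls into one beam's null and drives the coefficient toward 0. This is consistent with the behavior described below for COH(K).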

Next, the processing performed by the correlation and modGI calculation unit 40 is described.

First, the correlation and modGI calculation unit 40 calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K). The reason for calculating cor(K) is explained below. In short, observing the sign of cor(K) makes it easy to detect interfering speech even when the target sound is superimposed on it.

Here it is assumed that the target sound arrives from the front of the microphones m_1 and m_2. Non-purpose sounds, including interfering speech, are assumed to arrive from the left and right (lateral) directions. For example, suppose the microphones m_1 and m_2 are used as the microphone of a telephone handset (such as a mobile phone terminal). The voice of the speaker (user), which is the target sound, then arrives from the front of m_1 and m_2, while the voices of people other than the speaker arrive from the lateral directions.

Therefore, when no interfering speech is present and the target sound is present, the average front suppression signal AVE_N(K) takes a value proportional to the magnitude of the target sound component. This is because, as shown in FIG. 3, the directivity used to generate AVE_N(K) (that is, N(f, K)) still passes part of the signal component arriving from the front. However, this front-direction gain is very small compared with the lateral gain. Consequently, the level of N(f, K) is smaller in this case than when interfering speech is present.

Put simply, the coherence COH(K) is a feature quantity representing the correlation between the signal arriving from the first direction (left) and the signal arriving from the second direction (right). A small COH(K) therefore means that the two directional signals B1(f) and B2(f) are weakly correlated, and a large COH(K) means that they are strongly correlated. The correlation is small in two cases:
- when the incoming sound arrives from a direction strongly biased to the right or left;
- when, even without such a bias, the signal has little clear regularity, as with noise.

Moreover, when the microphones m_1 and m_2 are used in a telephone handset (such as a mobile phone terminal), the speaker's voice (target speech) arrives from the front, whereas interfering speech tends to arrive from other directions. COH(K) is thus a feature quantity closely related to the direction of arrival of the input signal. Accordingly, COH(K) tends to be large when no interfering speech is present and the target sound is present, and small when interfering speech is present.

Organizing the behavior of these values according to whether interfering speech is present yields conditions for deciding its presence. Two cases are distinguished below:
- the first condition: no interfering speech is present and the target sound is present;
- the second condition: interfering speech is present.

Under the first condition (no interfering speech, target sound present), the coherence COH(K) takes a relatively large value. The average front suppression signal AVE_N(K) takes a value proportional to the magnitude of the target sound component.

Under the second condition (interfering speech present), the coherence COH(K) tends to be small and the average front suppression signal AVE_N(K) tends to be large.

Therefore, if the correlation coefficient cor(K) between AVE_N(K) and COH(K) is introduced, cor(K) relates to the presence of interfering speech as follows.

When no interfering speech is present, cor(K) tends to be positive, that is, at or above a predetermined value indicating high correlation. When interfering speech is present, cor(K) tends to be negative, that is, below a predetermined value indicating low correlation. The reason lies in how the two values move:
- under the first condition, a louder target sound raises both AVE_N(K) and COH(K), so they move together;
- under the second condition, interfering speech raises AVE_N(K) while lowering COH(K), so they move in opposite directions.

In other words, introducing cor(K) makes it possible to determine whether interfering speech is present, that is, to detect sections containing interfering speech. This requires only a simple process such as checking the sign of cor(K).

The correlation and modGI calculation unit 40 of this embodiment therefore first obtains cor(K) and detects the sections in which interfering speech is present.

なお、相関及びｍｏｄＧＩ計算部４０が、相関係数ｃｏｒ（Ｋ）を求める際の具体的な計算方法については限定されないものであるが、例えば、相関及びｍｏｄＧＩ計算部４０は、以下の（５）式を用いて相関係数ｃｏｒ（Ｋ）を求めるようにしてもよい。なお、以下の（５）式において、Ｃｏｖ［ＡＶＥ＿Ｎ（Ｋ），ＣＯＨ（Ｋ）］は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）とコヒーレンスＣＯＨ（Ｋ）の共分散を示している。また、以下の（５）式において、σＮ（ｆ，Ｋ）は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）の標準偏差を示している。さらに、以下の（５）式において、σＣＯＨ（Ｋ）は、コヒーレンスＣＯＨ（Ｋ）の標準偏差を示している。以下の（５）式にて相関係数ｃｏｒ（Ｋ）を求める場合には、ＡＶＥ＿Ｎ（Ｋ）及びＣＯＨ（Ｋ）についてそれぞれ直近に処理した所定数i個のフレームの結果を用いて、標準偏差や共分散を求めるようにしてもよい。具体的には、以下の（５）式にて相関係数ｃｏｒ（Ｋ）を求める過程において、例えば、直近に処理したｉ個のフレーム（Ｋ−ｉ番目のフレーム、Ｋ−（ｉ−１）番目のフレーム、…、Ｋ−１番目のフレーム、Ｋ番目のフレームの）のそれぞれに係るＣＯＨ及びＡＶＥ＿Ｎを用いて、標準偏差（σＮ（ｆ，Ｋ）、及びσＣＯＨ（Ｋ））や共分散（Ｃｏｖ［ＡＶＥ＿Ｎ（Ｋ），ＣＯＨ（Ｋ）］）を求めるようにしてもよい。言い換えると、相関及びｍｏｄＧＩ計算部４０は、相関係数ｃｏｒ（Ｋ）を求める過程において、直近に求めたｉ個のＡＶＥ＿Ｎ及びＣＯＨをサンプルとして用いて、以下の（５）式における標準偏差や共分散を求めるようにしてもよい。

The method by which the correlation and modGI calculation unit 40 obtains the correlation coefficient cor (K) is not limited. For example, the correlation and modGI calculation unit 40 may obtain cor (K) using the following equation (5). In equation (5):

- Cov [AVE_N (K), COH (K)] denotes the covariance of the average front suppression signal AVE_N (K) and the coherence COH (K).
- σN (f, K) denotes the standard deviation of AVE_N (K).
- σCOH (K) denotes the standard deviation of COH (K).

When cor (K) is calculated with equation (5), the standard deviations and the covariance may be computed from the AVE_N (K) and COH (K) results of a predetermined number i of most recently processed frames. Specifically, σN (f, K), σCOH (K) and Cov [AVE_N (K), COH (K)] may be computed from the COH and AVE_N values of frame K−i, frame K−(i−1), …, frame K−1 and frame K. In other words, the correlation and modGI calculation unit 40 may use the i most recently obtained AVE_N and COH values as the samples for the standard deviations and covariance in equation (5).
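Equation (5) itself is not reproduced in this text. From the terms defined above, it is presumably the ordinary Pearson correlation:

cor (K) = Cov [AVE_N (K), COH (K)] / (σN (f, K) × σCOH (K))

evaluated over the i most recent frames. The following Python sketch computes this from two short histories.

Several details are assumptions, not part of the specification:
- the function name;
- the use of NumPy;
- returning 0 when either history is flat.

```python
import numpy as np

def correlation_coefficient(ave_n_hist, coh_hist):
    """Sketch of equation (5): Pearson correlation of recent AVE_N and COH values.

    ave_n_hist, coh_hist: equal-length sequences of the most recent frame values
    (oldest first, frame K last). Returning 0.0 for a flat history is an
    assumption; the specification does not cover that case.
    """
    a = np.asarray(ave_n_hist, dtype=float)
    c = np.asarray(coh_hist, dtype=float)
    if a.shape != c.shape or a.size < 2:
        raise ValueError("need two equal-length histories of at least 2 frames")
    # Population covariance divided by population standard deviations.
    cov = np.mean((a - a.mean()) * (c - c.mean()))
    denom = a.std() * c.std()
    if denom == 0.0:
        return 0.0
    return float(cov / denom)
```

The window length i is simply the length of the histories passed in. A ring buffer of the last i frames is a natural way to supply them.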

そして、相関及びｍｏｄＧＩ計算部４０は、（５）式により、算出したｃｏｒ（Ｋ）を特許文献２の（１３）式に代入することにより、相関のｍｏｄＧＩを求める（以下、求めた相関のｍｏｄＧＩを、「ｃｏｒ＿ｍｏｄＧＩ（Ｋ）」とする）。相関のｍｏｄＧＩを算出した理由、及び利用方法等については、後述する。 The correlation and modGI calculation unit 40 then substitutes the cor (K) calculated by equation (5) into equation (13) of Patent Document 2 to obtain the modGI of the correlation. Hereinafter, this modGI of the correlation is referred to as "cor_modGI (K)". Why the modGI of the correlation is calculated, and how it is used, are described later.

次に、背景雑音存在判定部５０の処理について説明する。 Next, the processing of the background noise presence determination unit 50 will be described.

背景雑音存在判定部５０は、相関及びｍｏｄＧＩ計算部４０で求めたｃｏｒ＿ｍｏｄＧＩ（Ｋ）を用いて、背景雑音が存在する区間を判定（検出）する。 The background noise presence determination unit 50 uses the cor_modGI (K) obtained by the correlation and modGI calculation unit 40 to determine (detect) the sections in which background noise exists.

ところで、背景雑音が存在する場合、先に述べた平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）とコヒーレンスＣＯＨ（Ｋ）の相関（相関係数ｃｏｒ（Ｋ））の挙動は、次のように変化する。 By the way, in the presence of background noise, the behavior of the correlation (correlation coefficient cor (K)) between the average front suppression signal AVE_N (K) and the coherence COH (K) described above changes as follows.

妨害音が存在すると相関係数ｃｏｒ（Ｋ）が負の値、妨害音が存在しなければ相関係数ｃｏｒ（Ｋ）が正値、というマクロな挙動はある程度維持される。ただし、背景雑音の影響を受けて、正面抑圧信号（平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ））の振幅の大小の変動の不規則さが増すのに対し、コヒーレンスＣＯＨ（Ｋ）はダイナミックレンジが小さくなる程度で、振幅の大小の不規則さは極端に変化しない。このため、正面抑圧信号の増加・減少と、コヒーレンスＣＯＨ（Ｋ）の増加・減少の同期性が損なわれ、相関係数ｃｏｒ（Ｋ）の増減の変動が激しくなる。また、相関係数ｃｏｒ（Ｋ）の正負の変動の頻度が増す。 The macroscopic behavior is maintained to some extent: cor (K) is negative when interfering sound is present and positive when it is absent. Under the influence of background noise, however, the amplitude fluctuations of the front suppression signal (the average front suppression signal AVE_N (K)) become more irregular. The coherence COH (K), by contrast, only loses some dynamic range, and the irregularity of its amplitude fluctuations does not change drastically. The rises and falls of the front suppression signal therefore fall out of step with those of COH (K), and cor (K) fluctuates more strongly. In addition, cor (K) changes sign more often.

以上より、背景雑音の影響が増すほど、相関係数ｃｏｒ（Ｋ）の増減の変動や、正負の変動頻度は増加する。このように、背景雑音が存在する場合には、相関係数ｃｏｒ（Ｋ）の増減の変動や正負の変動の頻度が増し、背景雑音の影響が増すほどこれらの変動は大きくなる。この挙動は、背景雑音にのみ由来するものである。よって、背景雑音存在判定部５０は、相関係数ｃｏｒ（Ｋ）の傾きの正負の変動の激しさを観測することで、目的音声や妨害音声の影響を受けずに背景雑音が存在するか否かを判定することができる。 From the above, background noise makes cor (K) fluctuate up and down and change sign more often, and these fluctuations grow as the influence of the background noise increases. This behavior is attributable only to background noise. The background noise presence determination unit 50 can therefore observe how violently the sign of the slope of cor (K) fluctuates. This lets it determine whether background noise exists without being affected by the target voice or the disturbing voice.

ところで、ｍｏｄＧＩ（特許文献２の（１３）式で定義されている）は、波形の傾きの正負が変動する頻度を表している。ｍｏｄＧＩは、信号の傾きの正負の変動が小さくなる程小さくなるのに対し、傾きの正負の変動が大きくなる程大きくなる、という特徴を有する。 By the way, modGI (defined by the equation (13) of Patent Document 2) represents the frequency with which the positive and negative of the slope of the waveform fluctuates. The modGI has a feature that the smaller the positive / negative fluctuation of the slope of the signal, the smaller the modGI, whereas the larger the positive / negative fluctuation of the slope, the larger the modGI.
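Equation (13) of Patent Document 2 is not reproduced here, so the exact modGI formula cannot be recovered from this text. The sketch below is a hypothetical stand-in that only mimics the property described above: the index grows as the slope of the waveform changes sign more often. It returns the fraction of consecutive non-flat slopes whose sign reverses within a window. It is not the modGI of Patent Document 2.

```python
import numpy as np

def slope_sign_change_rate(values):
    """Hypothetical modGI stand-in: fraction of slope sign reversals (0.0 to 1.0).

    NOT equation (13) of Patent Document 2. It only rises as the sign of the
    waveform slope flips more frequently.
    """
    x = np.asarray(values, dtype=float)
    if x.size < 3:
        return 0.0
    slope_sign = np.sign(np.diff(x))            # +1 rising, -1 falling, 0 flat
    slope_sign = slope_sign[slope_sign != 0]    # ignore flat steps
    if slope_sign.size < 2:
        return 0.0
    flips = np.count_nonzero(slope_sign[1:] != slope_sign[:-1])
    return flips / (slope_sign.size - 1)
```

A monotonic sequence yields 0.0. A sequence that alternates up and down at every step yields 1.0.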

従って、背景雑音存在判定部５０は、先述の相関及びｍｏｄＧＩ計算部４０で求めた相関のｍｏｄＧＩ（ｃｏｒ＿ｍｏｄＧＩ（Ｋ））を参照し、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）が大きければ背景雑音は存在し、反対にｃｏｒ＿ｍｏｄＧＩ（Ｋ）が小さければ背景雑音が存在しない、と判定することができる。 Therefore, the background noise presence determination unit 50 refers to the modGI of the correlation (cor_modGI (K)) obtained by the correlation and modGI calculation unit 40. If cor_modGI (K) is large, it can determine that background noise exists; if cor_modGI (K) is small, it can determine that background noise does not exist.

この実施形態の背景雑音存在判定部５０は、例えば、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）が所定の閾値より大きかった場合、背景雑音有りを示す値（例えば、「１」）を出力し、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）が閾値以下だった場合には背景雑音無しを示す値（例えば、「０」）を出力するようにしてもよい。閾値の値は、種々様々な値を適用でき、限定されないものであるが、例えば、種々様々なシミュレーション及び統計的な分析により最適な値が定まる。 For example, the background noise presence determination unit 50 of this embodiment may output a value indicating background noise (for example, "1") when cor_modGI (K) is larger than a predetermined threshold. It may output a value indicating no background noise (for example, "0") when cor_modGI (K) is less than or equal to the threshold. The threshold is not limited and various values can be used; for example, an optimal value may be determined through simulations and statistical analysis.
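The decision rule of the background noise presence determination unit 50 reduces to a single comparison. In this minimal sketch, the value passed as `theta` is a placeholder. The text leaves the threshold to be tuned by simulation and statistical analysis.

```python
def background_noise_flag(cor_modgi, theta):
    """Return Y(K): 1 = background noise present, 0 = absent.

    Follows the rule in the text: above the threshold -> 1, at or below it -> 0.
    `theta` is a placeholder threshold, not a value from the specification.
    """
    return 1 if cor_modgi > theta else 0
```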

また、背景雑音存在判定部５０は、判定結果を示す信号Ｙ（Ｋ）を出力する。信号Ｙ（Ｋ）の形式は限定されないものであるが、例えば、「背景雑音有り」を示す値（例えば、「１」）又は、「背景雑音無し」を示す値（例えば、「０」）を出力するようにしてもよい。なお、背景雑音存在判定部５０が信号Ｙ（Ｋ）を出力する方式や供給先については限定されないものである。 Further, the background noise presence determination unit 50 outputs a signal Y (K) indicating the determination result. The format of the signal Y (K) is not limited, and for example, a value indicating "with background noise" (for example, "1") or a value indicating "without background noise" (for example, "0") is used. It may be output. The method by which the background noise presence determination unit 50 outputs the signal Y (K) and the supply destination are not limited.

（Ａ−２）第１の実施形態の動作
次に、以上のような構成を有するこの実施形態の非目的音判定装置１の動作（実施形態の判定方法）を説明する。 (A-2) Operation of the First Embodiment Next, the operation (determination method of the embodiment) of the non-purpose sound determination device 1 of this embodiment having the above configuration will be described.

まず、非目的音判定装置１の全体の動作について図１を用いて説明する。 First, the overall operation of the non-purpose sound determination device 1 will be described with reference to FIG.

マイクｍ＿１、ｍ＿２のそれぞれから図示しないＡＤ変換器を介して、１フレーム分（１つの処理単位分）の入力信号ｓ１（ｎ）及びｓ２（ｎ）がＦＦＴ部１０に供給されたものとする。そして、ＦＦＴ部１０は、１フレーム分の入力信号ｓ１（ｎ）及びｓ２（ｎ）に基づく分析フレームＦＲＡＭＥ１（Ｋ）、ＦＲＡＭＥ２（Ｋ）についてフーリエ変換し、周波数領域で示される信号Ｘ１（ｆ，Ｋ）、Ｘ２（ｆ，Ｋ）を取得する。そして、ＦＦＴ部１０で生成された信号Ｘ１（ｆ，Ｋ）、Ｘ２（ｆ，Ｋ）が、正面抑圧信号生成部２０及びコヒーレンス計算部３０に供給される。 Assume that the microphones m_1 and m_2 each supply one frame (one processing unit) of input signal, s1 (n) and s2 (n), to the FFT unit 10 through AD converters (not shown). The FFT unit 10 builds the analysis frames FRAME1 (K) and FRAME2 (K) from these one-frame input signals. It Fourier-transforms them to obtain the frequency-domain signals X1 (f, K) and X2 (f, K). These signals are then supplied to the front suppression signal generation unit 20 and the coherence calculation unit 30.
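The FFT step can be sketched with NumPy's real FFT. The Hann window, the frame length and the function name are assumptions for illustration. This text does not specify how the analysis frames FRAME1 (K) and FRAME2 (K) are windowed or overlapped.

```python
import numpy as np

def frames_to_spectra(s1_frame, s2_frame):
    """Transform one analysis frame per microphone into X1(f,K) and X2(f,K).

    Assumptions: a Hann window and a real FFT (numpy.fft.rfft), which returns
    frame_length // 2 + 1 frequency bins per channel.
    """
    s1 = np.asarray(s1_frame, dtype=float)
    s2 = np.asarray(s2_frame, dtype=float)
    if s1.shape != s2.shape:
        raise ValueError("both channels must have the same frame length")
    window = np.hanning(s1.size)
    x1 = np.fft.rfft(s1 * window)
    x2 = np.fft.rfft(s2 * window)
    return x1, x2
```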

正面抑圧信号生成部２０は、供給されたＸ１（ｆ，Ｋ）、Ｘ２（ｆ，Ｋ）に基づいて、正面抑圧信号Ｎ（ｆ，Ｋ）を算出する。そして、正面抑圧信号生成部２０は、正面抑圧信号Ｎ（ｆ，Ｋ）に基づいて平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）を算出し、相関及びｍｏｄＧＩ計算部４０に供給する。 The front suppression signal generation unit 20 calculates the front suppression signal N (f, K) based on the supplied X1 (f, K) and X2 (f, K). Then, the front suppression signal generation unit 20 calculates the average front suppression signal AVE_N (K) based on the front suppression signal N (f, K) and supplies it to the correlation and modGI calculation unit 40.
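The exact definitions of N (f, K) and AVE_N (K) are given earlier in the specification and are not reproduced here. Claim 1 states only that the front suppression signal is based on the difference between the microphones' frequency-domain signals. The sketch below therefore rests on two assumptions:
- N (f, K) = X1 (f, K) − X2 (f, K);
- AVE_N (K) is the mean magnitude of N over the frequency bins.

With this choice, a source directly in front reaches both microphones at the same time and cancels, which gives the front blind spot.

```python
import numpy as np

def front_suppression(x1, x2):
    """Sketch of N(f,K) and AVE_N(K) under stated assumptions.

    Assumption: N(f,K) = X1(f,K) - X2(f,K), so components arriving at both
    microphones simultaneously (from the front) cancel.
    Assumption: AVE_N(K) is the mean magnitude of N(f,K) over all bins.
    """
    n = np.asarray(x1) - np.asarray(x2)
    ave_n = float(np.mean(np.abs(n)))
    return n, ave_n
```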

一方、コヒーレンス計算部３０は、供給されたＸ１（ｆ，Ｋ）、Ｘ２（ｆ，Ｋ）に基づいて、コヒーレンスＣＯＨ（Ｋ）を生成し、相関及びｍｏｄＧＩ計算部４０に供給する。 On the other hand, the coherence calculation unit 30 generates a coherence COH (K) based on the supplied X1 (f, K) and X2 (f, K) and supplies it to the correlation and modGI calculation unit 40.

相関及びｍｏｄＧＩ計算部４０は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）及びコヒーレンスＣＯＨ（Ｋ）に基づいて、相関係数ｃｏｒ（Ｋ）を算出し、算出した相関係数ｃｏｒ（Ｋ）に基づいて、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）を算出する。 The correlation and modGI calculation unit 40 calculates the correlation coefficient cor (K) based on the average frontal suppression signal AVE_N (K) and the coherence COH (K), and based on the calculated correlation coefficient cor (K). The cor_modGI (K) is calculated.

そして、背景雑音存在判定部５０は、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）に基づいて、背景雑音の有無を判定し、その判定結果を信号Ｙ（Ｋ）として出力する。 Then, the background noise presence determination unit 50 determines the presence or absence of background noise based on the cor_modGI (K), and outputs the determination result as a signal Y (K).
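The per-frame flow of units 40 and 50 can be sketched as a small stateful class. It keeps the recent AVE_N and COH values, computes cor (K) over them, and applies a sign-change index to the recent cor (K) values.

This is a sketch under assumptions:
- All parameter names and defaults are hypothetical.
- A slope sign-change rate stands in for the modGI of Patent Document 2, whose formula is not given in this text.

```python
from collections import deque

import numpy as np

class BackgroundNoiseDetector:
    """Per-frame sketch of units 40 and 50 (first embodiment).

    Hypothetical parameters: corr_window (the i frames used for cor(K)),
    gi_window (recent cor(K) values fed to the index) and theta (threshold).
    """

    def __init__(self, corr_window=16, gi_window=16, theta=0.5):
        self.ave_n = deque(maxlen=corr_window)
        self.coh = deque(maxlen=corr_window)
        self.cor = deque(maxlen=gi_window)
        self.theta = theta

    @staticmethod
    def _pearson(a, c):
        a = np.asarray(a, dtype=float)
        c = np.asarray(c, dtype=float)
        denom = a.std() * c.std()
        if denom == 0.0:
            return 0.0
        return float(np.mean((a - a.mean()) * (c - c.mean())) / denom)

    @staticmethod
    def _sign_change_rate(values):
        # Stand-in for modGI: fraction of slope sign reversals.
        s = np.sign(np.diff(np.asarray(values, dtype=float)))
        s = s[s != 0]
        if s.size < 2:
            return 0.0
        return np.count_nonzero(s[1:] != s[:-1]) / (s.size - 1)

    def process(self, ave_n_k, coh_k):
        """Consume AVE_N(K) and COH(K) for one frame; return Y(K) (1 = noise)."""
        self.ave_n.append(ave_n_k)
        self.coh.append(coh_k)
        if len(self.ave_n) < 2:
            return 0  # not enough history yet (assumption: report "no noise")
        self.cor.append(self._pearson(self.ave_n, self.coh))
        cor_modgi = self._sign_change_rate(self.cor)
        return 1 if cor_modgi > self.theta else 0
```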

次に、背景雑音存在判定部５０の動作詳細について図４、図５のフローチャートを用いて説明する。 Next, the operation details of the background noise presence determination unit 50 will be described with reference to the flowcharts of FIGS. 4 and 5.

図４は、背景雑音存在判定部５０が背景雑音の有無を判定する処理について示したフローチャートである。図５は、図４のフローチャートの一部の処理について示したフローチャートである。背景雑音存在判定部５０は、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）（１フレーム分のデータ）が供給されるごとに、図４、図５のフローチャートの処理により背景雑音の有無を判定し、信号Ｙ（Ｋ）を出力するものとする。 FIG. 4 is a flowchart showing a process in which the background noise presence determination unit 50 determines the presence or absence of background noise. FIG. 5 is a flowchart showing a part of the processing of the flowchart of FIG. The background noise presence determination unit 50 determines the presence or absence of background noise by processing the flowcharts of FIGS. 4 and 5 each time cor_modGI (K) (data for one frame) is supplied, and outputs a signal Y (K). It shall be output.

背景雑音存在判定部５０は、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）が供給されると（Ｓ１０１）、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）と閾値Θとに基づいて背景雑音の有無を判定し（Ｓ１０２）、その判定結果を示す信号Ｙ（Ｋ）を生成して出力する（Ｓ１０３）。 When the background noise presence determination unit 50 is supplied with the cor_modGI (K) (S101), the background noise presence determination unit 50 determines the presence or absence of the background noise based on the cor_modGI (K) and the threshold value Θ (S102), and the signal Y indicating the determination result. (K) is generated and output (S103).

次に、背景雑音存在判定部５０が上述のステップＳ１０２で行う判定処理の具体例について図５のフローチャートを用いて説明する。 Next, a specific example of the determination process performed by the background noise presence determination unit 50 in step S102 will be described with reference to the flowchart of FIG.

背景雑音存在判定部５０は、判定処理を開始すると、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）の値を確認し（Ｓ２０１）、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）の値に応じて背景雑音の有無を判定する。 When the determination process starts, the background noise presence determination unit 50 checks the value of cor_modGI (K) (S201). It then determines the presence or absence of background noise according to that value.

具体的には、背景雑音存在判定部５０は、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）が閾値Θより大きい場合には「背景雑音有り」と判定し（Ｓ２０２）、ｃｏｒ＿ｍｏｄＧＩ（Ｋ）が閾値Θ未満の場合には「背景雑音無し」と判定する（Ｓ２０３）。 Specifically, the background noise presence determination unit 50 determines that "there is background noise" when the cor_modGI (K) is larger than the threshold value Θ (S202), and when the cor_modGI (K) is less than the threshold value Θ, " It is determined that there is no background noise (S203).

（Ａ−３）第１の実施形態の効果
以上のように第１の実施形態によれば、非目的音判定装置１は、正面抑圧信号とコヒーレンスの相関のｍｏｄＧＩは背景雑音が存在する時には大きくなり、背景雑音が存在しないときには小さくなるという、特徴的な挙動に基づいて、非目的音（背景雑音）の有無を精度よく判定することができる。そして、判定結果の供給先で、背景雑音の有無に応じて最適な音声処理を実現することができる。すなわち、音声処理装置の音声処理（例えば、テレビ会議システムや携帯電話などの通信装置や音声認識機能の前処理）に、この実施形態の非目的音判定装置１の判定結果を適用することで、音声処理装置の性能向上（例えば、背景雑音等の非目的音の抑制性能の向上）が期待できる。 (A-3) Effects of the First Embodiment As described above, the non-purpose sound determination device 1 of the first embodiment can accurately determine the presence or absence of non-purpose sound (background noise). It relies on a characteristic behavior: the modGI of the correlation between the front suppression signal and the coherence becomes large when background noise is present and small when it is absent. The recipient of the determination result can then apply the voice processing best suited to whether background noise is present. Examples of such voice processing include communication devices such as video conference systems and mobile phones, and preprocessing for voice recognition. Applying the determination result of the device 1 to such processing is expected to improve the voice processing device, for example its suppression of non-purpose sounds such as background noise.

（Ｂ）第２の実施形態
以下、本発明による非目的音判定装置、プログラム及び方法の第２の実施形態を、図面を参照しながら詳述する。 (B) Second Embodiment Hereinafter, a second embodiment of the non-purpose sound determination device, program and method according to the present invention will be described in detail with reference to the drawings.

（Ｂ−１）第２の実施形態の構成
図６は、第２の実施形態に係る非目的音判定装置２の全体構成を示すブロック図であり、上述した図１との同一、対応部分には同一、対応符号を付して示している。第２の実施形態の非目的音判定装置は、その各部をハードウェアによって構成しても良く、また、一部の構成についてはソフトウェア的に構成しても良い。 (B-1) Configuration of the Second Embodiment FIG. 6 is a block diagram showing the overall configuration of the non-purpose sound determination device 2 according to the second embodiment. Parts identical or corresponding to those in FIG. 1 are given identical or corresponding reference numerals. Each part of the non-purpose sound determination device of the second embodiment may be implemented in hardware, or some parts may be implemented in software.

図６において、非目的音判定装置２は、図１に示した非目的音判定装置１と同様な、ＦＦＴ部１０、正面抑圧信号生成部２０、コヒーレンス計算部３０、相関及びｍｏｄＧＩ計算部４０、背景雑音存在判定部５０と、この実施形態に特有な相関計算及び妨害音存在判定部６０とを有する。 In FIG. 6, the non-purpose sound determination device 2 has the same FFT unit 10, front suppression signal generation unit 20, coherence calculation unit 30, correlation and modGI calculation unit 40, as in the non-purpose sound determination device 1 shown in FIG. It has a background noise presence determination unit 50, and a correlation calculation and interference sound presence determination unit 60 peculiar to this embodiment.

この実施形態の非目的音判定装置２は、図１の非目的音判定装置１と比較すると、相関計算及び妨害音存在判定部６０が追加されたものである。以下では、追加された妨害音存在判定部６０についてのみ説明する。 Compared with the non-purpose sound determination device 1 of FIG. 1, the non-purpose sound determination device 2 of this embodiment adds the correlation calculation and interfering sound presence determination unit 60. Only this added unit 60 is described below.

妨害音存在判定部６０は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）に長期平均化処理を施したＬｏｎｇ＿Ｎ（Ｋ）と、コヒーレンスＣＯＨ（Ｋ）とから、両者の相関（相関係数ｃｏｒ_ｌ（Ｋ））を算出して、妨害音声の有無を判定するものである。相関及びｍｏｄＧＩ計算部４０においても、相関係数ｃｏｒ（Ｋ）を算出していたが、妨害音存在判定部６０は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）では無く、長期平均化処理を施したＬｏｎｇ＿Ｎ（Ｋ）を用いている点が異なる。以下、長期平均化処理を施したＬｏｎｇ＿Ｎ（Ｋ）を用いている理由を説明する。 The interfering sound presence determination unit 60 applies long-term averaging to the average front suppression signal AVE_N (K) to obtain Long_N (K). It then calculates the correlation coefficient cor_l (K) between Long_N (K) and the coherence COH (K), and uses it to determine the presence or absence of disturbing voice. The correlation and modGI calculation unit 40 also calculates a correlation coefficient, cor (K). Unit 60 differs in that it uses the long-term-averaged Long_N (K) instead of AVE_N (K). The reason for this is explained below.

先に述べたように、相関係数ｃｏｒ（Ｋ）を導入することにより、例えば、相関係数ｃｏｒ（Ｋ）の正負判断というシンプルな処理で、妨害音声の有無を判定することができる。しかしながら、背景雑音の影響が増すほど、相関係数ｃｏｒ（Ｋ）の増減の変動や、正負の変動頻度は増加する。そのため、妨害音の有無とは無関係に相関係数ｃｏｒ（Ｋ）が負になる場合があり、誤判定が生じることにもなる。そのため、妨害音存在判定部６０は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）に変動を抑制する長期平均処理を施したＬｏｎｇ＿Ｎ（Ｋ）を用いた相関（ｃｏｒ_ｌ（Ｋ））を算出することにより、この課題に対処する。前述のように、背景雑音の影響により、正面抑圧信号の変動が不規則になることが相関の挙動の変化の原因でもあるからである。 As described above, cor (K) makes it possible to determine the presence or absence of disturbing voice by a simple process such as checking its sign. However, the stronger the influence of background noise, the more cor (K) fluctuates and the more often it changes sign. As a result, cor (K) may become negative whether or not disturbing sound is present, which can cause erroneous determinations. To address this problem, the interfering sound presence determination unit 60 first applies a long-term averaging process to AVE_N (K) to suppress its fluctuations, giving Long_N (K). It then calculates the correlation cor_l (K) using Long_N (K). This works because, as described above, the correlation's behavior changes precisely because background noise makes the front suppression signal fluctuate irregularly.

具体的には、妨害音存在判定部６０は、正面抑圧信号生成部２０から供給された平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）に基づいて以下の（６）式のような計算を行って、長期平均処理を施したＬｏｎｇ＿Ｎ（Ｋ）を生成する。 Specifically, the interfering sound presence determination unit 60 generates the long-term-averaged Long_N (K) from the average front suppression signal AVE_N (K) supplied by the front suppression signal generation unit 20, using the following equation (6).
Long_N (K) = λ × AVE_N (K) + (1 − λ) × AVE_N (K−1)   (0.0 < λ < 1.0)   … (6)
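Equation (6) and the λ selection of step S302 can be sketched as follows. The λ values are hypothetical. Following the text, the larger value is used when Y (K) indicates background noise.

Equation (6) as written blends AVE_N (K) with AVE_N (K−1). The sketch follows it literally; it does not substitute the recursive form that would use Long_N (K−1).

```python
def long_term_average(ave_n_k, ave_n_prev, y_k, lam_noisy, lam_quiet):
    """Long_N(K) per equation (6), with λ chosen from the noise flag Y(K).

    lam_noisy / lam_quiet are hypothetical tuning values; following the text,
    lam_noisy is the larger of the two. Both must lie strictly between 0 and 1.
    """
    lam = lam_noisy if y_k == 1 else lam_quiet
    if not 0.0 < lam < 1.0:
        raise ValueError("time constant must satisfy 0 < lambda < 1")
    return lam * ave_n_k + (1.0 - lam) * ave_n_prev
```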

（６）式において、時定数λは、長期平均値に対して瞬時入力値をどの程度反映するかをコントロールする役割を持つ値である。時定数λが大きいほど瞬時入力の影響が強くなり、小さければ瞬時入力の影響は薄れる。よって、時定数λを小さくするほど、正面抑圧信号の変動を抑制でき、この結果、相関の変動を抑制することができる。時定数λに設定される値は、限定されないものであるが、この実施形態では、背景雑音存在判定部５０の判定結果により変動する変動値とする。例えば、妨害音存在判定部６０は、背景雑音存在判定部５０により送付された背景雑音の存在の判定結果を示す信号Ｙ（Ｋ）が、「背景雑音有り」を示す値の場合には、背景雑音の影響が大きく正面抑圧信号の変動が大きいと判断し、時定数λには背景雑音無しに比べて大きな値を設定する。一方、妨害音存在判定部６０は、信号Ｙ（Ｋ）が、「背景雑音無し」を示す値の場合には、背景雑音の影響が小さく正面抑圧信号の変動が小さいと判断し、時定数λには背景雑音有りに比べて小さな値を設定する。 In the equation (6), the time constant λ is a value having a role of controlling how much the instantaneous input value is reflected with respect to the long-term average value. The larger the time constant λ, the stronger the influence of the instantaneous input, and the smaller the time constant λ, the less the influence of the instantaneous input. Therefore, the smaller the time constant λ is, the more the fluctuation of the front suppression signal can be suppressed, and as a result, the fluctuation of the correlation can be suppressed. The value set in the time constant λ is not limited, but in this embodiment, it is a variable value that varies depending on the determination result of the background noise presence determination unit 50. For example, when the signal Y (K) indicating the presence determination result of the background noise sent by the background noise presence determination unit 50 is a value indicating "with background noise", the disturbing sound presence determination unit 60 has a background. Judging that the influence of noise is large and the fluctuation of the front suppression signal is large, the time constant λ is set to a larger value than that without background noise. 
On the other hand, when the signal Y (K) has a value indicating "no background noise", the disturbing sound presence determination unit 60 determines that the influence of the background noise is small and the fluctuation of the front suppression signal is small, and the time constant λ Is set to a smaller value than with background noise.

妨害音存在判定部６０は、長期平均化処理済みの平均正面抑圧信号Ｌｏｎｇ＿Ｎ（Ｋ）と、コヒーレンス計算部３０から取得したコヒーレンスＣＯＨ（Ｋ）により、相関係数ｃｏｒ_ｌ（Ｋ）を算出する。なお、相関係数ｃｏｒ_ｌ（Ｋ）の算出方法は、先に述べた、相関係数ｃｏｒ（Ｋ）の算出方法と同様の手法により算出できるため、ここでは説明を省略する。 The disturbing sound presence determination unit 60 calculates the correlation coefficient cor_l (K) from the average front suppression signal Long_N (K) that has undergone long-term averaging processing and the coherence COH (K) acquired from the coherence calculation unit 30. Since the method for calculating the correlation coefficient cor_l (K) can be calculated by the same method as the method for calculating the correlation coefficient cor (K) described above, the description thereof is omitted here.

妨害音存在判定部６０は、相関係数ｃｏｒ_ｌ（Ｋ）が０以上の場合（相関係数ｃｏｒ_ｌ（Ｋ）が、０又は正の場合；ｃｏｒ_ｌ（Ｋ）≧０の場合）には妨害音声無しと判定し、相関係数ｃｏｒ_ｌ（Ｋ）が０未満の場合（相関係数ｃｏｒ_ｌ（Ｋ）が負の場合；ｃｏｒ_ｌ（Ｋ）＜０の場合）には妨害音声有りと判定するものとする。 The interfering sound presence determination unit 60 determines that there is no disturbing voice when cor_l (K) is 0 or positive (cor_l (K) ≥ 0). It determines that disturbing voice is present when cor_l (K) is negative (cor_l (K) < 0).
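The decision depends only on the sign of cor_l (K). Detecting the sections that contain disturbing voice therefore amounts to grouping consecutive negative frames. The helper below is a hypothetical illustration of that bookkeeping; the specification does not describe it.

```python
def interfering_voice_sections(cor_l_values):
    """Mark R(K) per frame and group frames with R(K) = 1 into sections.

    R(K) = 1 when cor_l(K) < 0 (disturbing voice), 0 when cor_l(K) >= 0.
    Returns (flags, sections); sections are inclusive (start, end) frame indices.
    """
    flags = [1 if c < 0.0 else 0 for c in cor_l_values]
    sections = []
    start = None
    for k, r in enumerate(flags):
        if r and start is None:
            start = k                       # a disturbing-voice section begins
        elif not r and start is not None:
            sections.append((start, k - 1))  # the section ended at the previous frame
            start = None
    if start is not None:
        sections.append((start, len(flags) - 1))
    return flags, sections
```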

また、妨害音存在判定部６０は、判定結果を示す信号Ｒ（Ｋ）を出力する。信号Ｒ（Ｋ）の形式は限定されないものであるが、例えば、「妨害音声有り」を示す値（例えば、「１」）又は、「妨害音声無し」を示す値（例えば、「０」）を出力するようにしてもよい。なお、妨害音存在判定部６０が信号Ｒ（Ｋ）を出力する方式や供給先については限定されないものである。 Further, the disturbing sound presence determination unit 60 outputs a signal R (K) indicating the determination result. The format of the signal R (K) is not limited, but for example, a value indicating "with disturbing voice" (for example, "1") or a value indicating "without disturbing voice" (for example, "0") is used. It may be output. The method by which the disturbing sound presence determination unit 60 outputs the signal R (K) and the supply destination are not limited.

（Ｂ−２）第２の実施形態の動作
次に、以上のような構成を有する第２の実施形態の非目的音判定装置２の動作（実施形態の判定方法）を説明する。なお、相関計算及び妨害音存在判定部６０以外の各部の動作は、第１の実施形態と同様であるので説明を省略し、以下では、相関計算及び妨害音存在判定部６０の詳細動作を図７、図８のフローチャートを用いて説明する。 (B-2) Operation of the Second Embodiment Next, the operation of the non-purpose sound determination device 2 of the second embodiment, configured as above, is described; this is the determination method of the embodiment. The units other than the correlation calculation and interfering sound presence determination unit 60 operate as in the first embodiment, so their description is omitted. The detailed operation of unit 60 is described below with reference to the flowcharts of FIGS. 7 and 8.

図７は、相関計算及び妨害音存在判定部６０が妨害音声の有無を判定する処理について示したフローチャートである。図８は、図７のフローチャートの一部の処理について示したフローチャートである。相関計算及び妨害音存在判定部６０は、Ｙ（Ｋ）（１フレーム分のデータ）が供給されるごとに、図８、図７のフローチャートの処理により妨害音声の有無を判定し、信号Ｒ（Ｋ）を出力するものとする。 FIG. 7 is a flowchart of the process in which the correlation calculation and interfering sound presence determination unit 60 determines the presence or absence of disturbing voice. FIG. 8 is a flowchart showing part of the processing in FIG. 7. Each time Y (K) (data for one frame) is supplied, unit 60 runs the processing of FIGS. 7 and 8 to determine the presence or absence of disturbing voice, and outputs a signal R (K).

相関計算及び妨害音存在判定部６０は、背景雑音の存在の判定結果を示す信号Ｙ（Ｋ）が供給されると（Ｓ３０１）、信号Ｙ（Ｋ）の値を確認し、時定数λに設定する値を制御する（Ｓ３０２）。 When the correlation calculation and interfering sound presence determination unit 60 receives the signal Y (K) indicating the background noise determination result (S301), it checks the value of Y (K) and sets the time constant λ accordingly (S302).

相関計算及び妨害音存在判定部６０は、供給された平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）と、先述のステップＳ３０２の処理により値が設定された時定数λとに基づいて長期平均処理を施した平均正面抑圧信号Ｌｏｎｇ＿Ｎ（Ｋ）を算出する（Ｓ３０３）。 The correlation calculation and interfering sound presence determination unit 60 then calculates the long-term-averaged average front suppression signal Long_N (K) (S303). It does so from the supplied AVE_N (K) and the time constant λ set in step S302.

相関計算及び妨害音存在判定部６０は、供給されたコヒーレンスＣＯＨ（Ｋ）と、長期平均処理を施した平均正面抑圧信号Ｌｏｎｇ＿Ｎ（Ｋ）とに基づいて相関係数ｃｏｒ_ｌ（Ｋ）を算出する（Ｓ３０４）。 The correlation calculation and interfering sound presence determination unit 60 calculates the correlation coefficient cor_l (K) from the supplied coherence COH (K) and the long-term-averaged Long_N (K) (S304).

相関計算及び妨害音存在判定部６０は、相関係数ｃｏｒ_ｌ（Ｋ）が０以上の場合（相関係数ｃｏｒ_ｌ（Ｋ）が０又は正の値の場合；ｃｏｒ_ｌ（Ｋ）≧０の場合）には「妨害音声無し」と判定し、相関係数ｃｏｒ_ｌ（Ｋ）が０未満の場合（相関係数ｃｏｒ_ｌ（Ｋ）が負の値の場合；ｃｏｒ_ｌ（Ｋ）＜０の場合）には「妨害音声有り」と判定する（Ｓ３０５）。 The correlation calculation and interfering sound presence determination unit 60 determines "no disturbing voice" when cor_l (K) is 0 or positive (cor_l (K) ≥ 0). It determines "disturbing voice present" when cor_l (K) is negative (cor_l (K) < 0) (S305).

相関計算及び妨害音存在判定部６０は、先述のステップＳ３０５の処理による判定結果を示す信号Ｒ（Ｋ）を生成して出力する（Ｓ３０６）。 The correlation calculation and interference sound presence determination unit 60 generates and outputs a signal R (K) indicating a determination result by the process of step S305 described above (S306).
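Steps S301 to S306 can be chained into one per-frame routine. This is a sketch under stated assumptions:
- The λ values and window length are hypothetical.
- The first frame reuses AVE_N (K) in place of the missing AVE_N (K−1).
- cor_l (K) is computed as a Pearson correlation over the recent window, mirroring a reading of equation (5).

```python
from collections import deque

import numpy as np

class InterferingVoiceDetector:
    """Per-frame sketch of unit 60 (steps S301-S306).

    Hypothetical parameters: lam_noisy and lam_quiet (λ values; the text uses
    the larger one when Y(K) = 1) and window (frames used for cor_l(K)).
    """

    def __init__(self, lam_noisy=0.8, lam_quiet=0.2, window=16):
        self.lam_noisy = lam_noisy
        self.lam_quiet = lam_quiet
        self.prev_ave_n = None
        self.long_n = deque(maxlen=window)
        self.coh = deque(maxlen=window)

    def process(self, y_k, ave_n_k, coh_k):
        """Consume Y(K), AVE_N(K), COH(K) for one frame; return R(K) (1 = voice)."""
        # S301-S302: choose λ from the background-noise flag Y(K).
        lam = self.lam_noisy if y_k == 1 else self.lam_quiet
        # S303: equation (6); on the first frame AVE_N(K) stands in for AVE_N(K-1).
        prev = ave_n_k if self.prev_ave_n is None else self.prev_ave_n
        long_n_k = lam * ave_n_k + (1.0 - lam) * prev
        self.prev_ave_n = ave_n_k
        self.long_n.append(long_n_k)
        self.coh.append(coh_k)
        # S304: cor_l(K) over the recent window (0.0 if either history is flat).
        a = np.asarray(self.long_n, dtype=float)
        c = np.asarray(self.coh, dtype=float)
        denom = a.std() * c.std()
        cor_l = 0.0 if denom == 0.0 else float(np.mean((a - a.mean()) * (c - c.mean())) / denom)
        # S305-S306: a negative correlation means disturbing voice is present.
        return 1 if cor_l < 0.0 else 0
```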

次に、相関計算及び妨害音存在判定部６０が上述のステップＳ３０２で行う判定処理の具体例について図８のフローチャートを用いて説明する。 Next, the flowchart of FIG. 8 illustrates a specific example of the processing that the correlation calculation and interfering sound presence determination unit 60 performs in step S302.

相関計算及び妨害音存在判定部６０は、信号Ｙ（Ｋ）の値（「背景雑音有り」を示す「１」か「背景雑音無し」を示す「０」）を確認し（Ｓ４０１）、信号Ｙ（Ｋ）が「背景雑音有り」を示す値の場合には、時定数λには大きな値を設定し（Ｓ４０２）、信号Ｙ（Ｋ）が、「背景雑音無し」を示す値の場合には、時定数λには小さな値を設定する（Ｓ４０３）。 The correlation calculation and interfering sound presence determination unit 60 checks the value of Y (K) (S401): "1" indicates background noise present and "0" indicates none. If Y (K) indicates background noise, it sets λ to a large value (S402). If Y (K) indicates no background noise, it sets λ to a small value (S403).

（Ｂ−３）第２の実施形態の効果
第２の実施形態によれば、第１の実施形態の効果に加えて以下のような効果を奏する。 (B-3) Effect of Second Embodiment According to the second embodiment, the following effects are obtained in addition to the effect of the first embodiment.

第２の実施形態の非目的音判定装置２では、相関係数ｃｏｒ_ｌ（Ｋ）の値に基づいて、妨害音声の有無を判定している。これにより、第２の実施形態の非目的音判定装置２では、精度よく妨害音声の有無を判定することができるので、判定結果の供給先で、妨害音声の有無に応じて最適な音声処理を実現することができる。すなわち、音声処理装置の音声処理（例えば、テレビ会議システムや携帯電話などの通信装置や音声認識機能の前処理）に、この実施形態の非目的音判定装置２の判定結果を適用することで、音声処理装置の性能向上（例えば、妨害音声等の非目的音の抑制性能の向上）が期待できる。 In the non-purpose sound determination device 2 of the second embodiment, the presence / absence of disturbing sound is determined based on the value of the correlation coefficient cor_l (K). As a result, the non-purpose sound determination device 2 of the second embodiment can accurately determine the presence or absence of the disturbing voice, so that the optimum voice processing can be performed at the supply destination of the determination result according to the presence or absence of the disturbing voice. It can be realized. That is, by applying the determination result of the non-purpose sound determination device 2 of this embodiment to the audio processing of the audio processing device (for example, preprocessing of a communication device such as a video conference system or a mobile phone or a voice recognition function), It is expected that the performance of the voice processing device will be improved (for example, the performance of suppressing non-purpose sounds such as disturbing voices will be improved).

相関係数ｃｏｒ_ｌ（Ｋ）の算出に用いられる平均正面抑圧信号Ｌｏｎｇ＿Ｎ（Ｋ）は、背景雑音の有無に応じて最適な時定数λが設定された上で、長期平均処理を施されているので、背景雑音の変動に頑健な妨害音声の存在判定処理を実現することができる。 The average front suppression signal Long_N (K) used to calculate cor_l (K) is long-term averaged with a time constant λ chosen according to whether background noise is present. Disturbing-voice detection is therefore robust against fluctuations caused by background noise.

（Ｃ）他の実施形態
上記実施形態に加えて、さらに、以下に例示するような変形実施形態も挙げることができる。 (C) Other Embodiments In addition to the above embodiments, modified embodiments as illustrated below can also be mentioned.

（Ｃ−１）第１の実施形態ではｍｏｄＧＩを適用する場合を示したが、修正される前のＧＩ（特許文献２の（４）式）も、信号波形の傾き方向が変化する回数とその大きさを測る指標であるので、第１の実施形態におけるｍｏｄＧＩに代えてＧＩを適用するようにしても良い。 (C-1) In the first embodiment, the case where modGI is applied is shown, but in the GI before modification (Equation (4) of Patent Document 2), the number of times the inclination direction of the signal waveform changes and the number thereof. Since it is an index for measuring the size, GI may be applied instead of modGI in the first embodiment.

（Ｃ−２）第２の実施形態において、平均正面抑圧信号Ｌｏｎｇ＿Ｎ（Ｋ）を算出する際に用いた時定数λは、予め設定された固定値でも良い。 (C-2) In the second embodiment, the time constant λ used when calculating the average front suppression signal Long_N (K) may be a preset fixed value.

１、２…非目的音判定装置、１０…ＦＦＴ部、２０…正面抑圧信号生成部、３０…コヒーレンス計算部、４０…ｍｏｄＧＩ計算部、５０…背景雑音存在判定部、６０…妨害音存在判定部、ＡＶＥ＿Ｎ、Ｌｏｎｇ＿Ｎ…平均正面抑圧信号、ＣＯＨ…コヒーレンス、ｃｏｒ、ｃｏｒ_ｌ…相関係数、Θ…閾値、λ…時定数。 1, 2 ... Non-purpose sound determination device, 10 ... FFT unit, 20 ... Front suppression signal generation unit, 30 ... Coherence calculation unit, 40 ... modGI calculation unit, 50 ... Background noise presence determination unit, 60 ... Interfering sound presence determination unit , AVE_N, Long_N ... average frontal suppression signal, COH ... coherence, cor, cor_l ... correlation coefficient, Θ ... threshold, λ ... time constant.

Claims

A non-purpose sound determination device comprising:
a front suppression signal generation unit that acquires frequency-domain input signals obtained by converting input signals from a plurality of microphones from the time domain to the frequency domain, and generates, based on the difference between the acquired frequency-domain input signals of the respective microphones, a first front suppression signal having a blind spot in the front direction;
a coherence calculation unit that calculates coherence from the input signals obtained from the plurality of microphones;
a first feature amount calculation unit that calculates a correlation coefficient between the first front suppression signal and the coherence, and a first feature amount representing how strongly the slope of the amplitude of the correlation coefficient fluctuates between positive and negative; and
a background noise presence determination unit that determines the presence or absence of background noise based on the value of the first feature amount calculated by the first feature amount calculation unit.
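The claim states only that the front suppression signal is based on the difference between the microphones' frequency-domain signals. The sketch below assumes two microphones and a plain bin-wise subtraction N(f) = X1(f) − X2(f): a source directly in front reaches both microphones at the same time, so its spectra are identical and cancel, while an off-axis source arrives with a delay and survives. The helper names `front_suppression` and `delayed` are hypothetical.

```python
import cmath
from typing import List, Sequence


def front_suppression(x1: Sequence[complex], x2: Sequence[complex]) -> List[complex]:
    """Assumed form: bin-wise difference X1(f) - X2(f) for one frame."""
    if len(x1) != len(x2):
        raise ValueError("spectra must have the same number of bins")
    return [a - b for a, b in zip(x1, x2)]


def delayed(spectrum: Sequence[complex], delay_samples: float, n_fft: int) -> List[complex]:
    """Hypothetical helper: apply a pure time delay by multiplying bin k by e^{-j 2 pi k tau / N}."""
    return [c * cmath.exp(-2j * cmath.pi * k * delay_samples / n_fft)
            for k, c in enumerate(spectrum)]


# Flat source spectrum over the first 4 bins of an 8-point FFT.
source = [1 + 0j] * 4
front = front_suppression(source, source)                # front source: same arrival time
side = front_suppression(source, delayed(source, 1, 8))  # side source: one-sample delay
print([abs(v) for v in front])
print([round(abs(v), 3) for v in side])
```

The front case gives zero in every bin (the blind spot), while the side case leaves energy in every bin except DC, where a pure delay has no effect.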

The non-purpose sound determination device according to claim 1, wherein
the first feature amount calculation unit calculates the modGI value of the correlation coefficient as the first feature amount, and
the background noise presence determination unit determines that background noise is present if the first feature amount is larger than a predetermined threshold, and that background noise is absent if the first feature amount is smaller than the predetermined threshold.
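As a minimal sketch, assume the correlation coefficient cor(K) is the Pearson correlation between the AVE_N and COH tracks over the most recent `window` frames, and that the threshold Θ is supplied by the caller. The window length, function names, and the choice to treat a feature exactly equal to Θ as "absent" are assumptions (the claim does not address equality). Computing the feature itself (modGI) is not shown.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def sliding_correlation(ave_n: Sequence[float], coh: Sequence[float], window: int) -> List[float]:
    """Hypothetical cor(K): correlation over the last `window` frames ending at frame K."""
    if len(ave_n) != len(coh) or window < 2:
        raise ValueError("tracks must match in length and window must be >= 2")
    return [pearson(ave_n[k - window + 1:k + 1], coh[k - window + 1:k + 1])
            for k in range(window - 1, len(ave_n))]


def background_noise_present(feature: float, theta: float) -> bool:
    # Claim 2: present if the feature exceeds the threshold, absent if it is smaller.
    return feature > theta


print(sliding_correlation([1, 2, 3, 4], [2, 4, 6, 8], window=3))
```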

The non-purpose sound determination device according to claim 1 or 2, further comprising:
a second feature amount calculation unit that calculates a second feature amount representing the relationship between the coherence and a second front suppression signal obtained by applying long-term averaging to the first front suppression signal; and
an interfering speech presence determination unit that determines the presence or absence of interfering speech based on the value of the second feature amount calculated by the second feature amount calculation unit.

The non-purpose sound determination device according to claim 3, wherein
the second feature amount calculation unit calculates the correlation coefficient between the second front suppression signal and the coherence as the second feature amount, and
the interfering speech presence determination unit determines the presence or absence of interfering speech based on whether the value of the second feature amount is positive or negative.

The non-purpose sound determination device according to claim 4, wherein the second feature amount calculation unit controls the time constant used in the long-term averaging for the second front suppression signal in accordance with the result of the background noise presence determination made by the background noise presence determination unit.
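A minimal sketch of this control, assuming the long-term average is a first-order recursive average and that the time constant is simply switched between two preset values according to the per-frame background noise decision. The values `lam_noise` and `lam_quiet`, the switching rule, and the function names are hypothetical; this section does not say which direction the time constant should move when noise is present.

```python
from typing import List, Sequence


def select_time_constant(noise_present: bool, lam_noise: float, lam_quiet: float) -> float:
    # Hypothetical rule: pick one of two preset time constants from the noise decision.
    return lam_noise if noise_present else lam_quiet


def long_term_average_controlled(ave_n: Sequence[float], noise_flags: Sequence[bool],
                                 lam_noise: float, lam_quiet: float,
                                 initial: float = 0.0) -> List[float]:
    """Assumed form: Long_N(K) = lam_K * Long_N(K-1) + (1 - lam_K) * AVE_N(K)."""
    if len(ave_n) != len(noise_flags):
        raise ValueError("one noise decision is needed per frame")
    state = initial
    out: List[float] = []
    for value, flag in zip(ave_n, noise_flags):
        lam = select_time_constant(flag, lam_noise, lam_quiet)
        state = lam * state + (1.0 - lam) * value
        out.append(state)
    return out


print(long_term_average_controlled([2.0, 4.0], [False, True], lam_noise=0.5, lam_quiet=0.0))
```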

A non-purpose sound determination program that causes a computer to function as:
a front suppression signal generation unit that acquires frequency-domain input signals obtained by converting input signals from a plurality of microphones from the time domain to the frequency domain, and generates, based on the difference between the acquired frequency-domain input signals of the respective microphones, a first front suppression signal having a blind spot in the front direction;
a coherence calculation unit that calculates coherence from the input signals obtained from the plurality of microphones;
a first feature amount calculation unit that calculates a correlation coefficient between the first front suppression signal and the coherence, and a first feature amount representing how strongly the slope of the amplitude of the correlation coefficient fluctuates between positive and negative; and
a background noise presence determination unit that determines the presence or absence of background noise based on the value of the first feature amount calculated by the first feature amount calculation unit.

A non-purpose sound determination method used in a non-purpose sound determination device that has a front suppression signal generation unit, a coherence calculation unit, a first feature amount calculation unit, and a background noise presence determination unit, wherein:
the front suppression signal generation unit acquires frequency-domain input signals obtained by converting input signals from a plurality of microphones from the time domain to the frequency domain, and generates, based on the difference between the acquired frequency-domain input signals of the respective microphones, a first front suppression signal having a blind spot in the front direction;
the coherence calculation unit calculates coherence from the input signals obtained from the plurality of microphones;
the first feature amount calculation unit calculates a correlation coefficient between the first front suppression signal and the coherence, and a first feature amount representing how strongly the slope of the amplitude of the correlation coefficient fluctuates between positive and negative; and
the background noise presence determination unit determines the presence or absence of background noise based on the value of the first feature amount calculated by the first feature amount calculation unit.