JP4859130B2

JP4859130B2 - Monitoring system

Info

Publication number: JP4859130B2
Application number: JP2007081091A
Authority: JP
Inventors: 栄治馬場
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2007-03-27
Filing date: 2007-03-27
Publication date: 2012-01-25
Anticipated expiration: 2027-03-27
Also published as: JP2008241991A

Abstract

PROBLEM TO BE SOLVED: To improve the monitoring efficiency of a monitoring system while recognizing a target environmental sound in an observed sound with high accuracy, even in a non-stationary noisy environment.

SOLUTION: Signals x₁(t) and x₂(t), which represent an observed sound in which sounds generated by a plurality of sound sources 9a and 9b are mixed, are obtained by observing the surrounding sound with a plurality of microphones 2a and 2b. A separated signal is generated for each of the sound sources 9a and 9b by inputting the signals x₁(t) and x₂(t) to a signal processing circuit 3. After noise removal in a noise removal circuit 4, a sound recognition circuit 5 determines whether the sound represented by the separated signal for each of the sound sources 9a and 9b is the target environmental sound. When a sound is recognized as the target environmental sound, a device control unit 6 controls a camera 80, a microphone 81, and a speaker 82 on the basis of the sound source direction of that sound obtained from the signal processing circuit 3.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a technique for determining whether a target environmental sound is included in an observed sound.

For example, monitoring systems are known that include a plurality of security cameras and monitor an outdoor environment based on the captured data. However, since many security cameras keep storing the captured data, the amount of data acquired becomes enormous. In addition, searching the large amount of accumulated data for the target data (the data for the location where a crime or the like occurred) takes a long time.

To solve this problem, one conceivable approach is to determine from the surrounding sound (the observed sound) whether an event to be verified (a crime or the like) has occurred, and to record images only when such an event is determined to have occurred. A technique has also been proposed for detecting a sound that is likely to be observed when the event to be verified occurs (a target environmental sound, such as a scream) (Patent Document 1).

In general, the sound observed at a given point is a mixture of sounds generated by various sound sources. If sound recognition is attempted directly on such an observed sound, every sound other than the one of interest acts as noise, which is undesirable. Specifically, the technique described in Patent Document 1 determines whether the observed sound contains a scream by analyzing the power and frequency spectrum of the observed sound, so it has difficulty recognizing the target environmental sound, particularly in outdoor environments with a lot of non-stationary noise.

Techniques for separating sound sources have therefore been proposed to improve the recognition accuracy for observed sounds (Patent Document 2).

Patent Document 1: Japanese Patent No. 3503717
Patent Document 2: Japanese Patent No. 3530035

However, although the technique described in Patent Document 2 uses a sound source separation method with relatively high filtering performance, it discriminates sounds using the level and phase differences between the microphone channels, so it still has difficulty coping with non-stationary noise. In other words, the conventional techniques have low recognition accuracy, which leads to false alarms and missed detections and lowers the reliability of the system.

The present invention has been made in view of the above problems, and an object thereof is to recognize a target environmental sound in an observed sound with high accuracy even in an environment with a lot of non-stationary noise. A further object is to improve the monitoring efficiency of a monitoring system.

To solve the above problems, the invention of claim 1 is a monitoring system that monitors the surrounding environment according to a target sound to be detected, comprising: a plurality of observation devices, each of which generates an observation signal representing the observed sound observed at its arrangement position; separation matrix calculation means for generating a separation matrix for the observation signals by dividing a frequency band into a plurality of divided bands and obtaining a partial separation matrix for each divided band; signal separation means for generating at least one separated signal from at least one of the plurality of observation signals using the separation matrix generated by the separation matrix calculation means; recognition means for recognizing, based on the separated signal generated by the signal separation means, whether the target sound is included in the observed sound; and direction specifying means for measuring a delay time of the observed sound that varies according to the distance between the observation devices, and for specifying the sound source direction of the target sound based on the delay time and the separation matrix generated by the separation matrix calculation means. The separation matrix calculation means includes band classification means for classifying each of the plurality of divided bands into a learning band group or an interpolation band group, interpolation means for calculating the partial separation matrices in the interpolation band group based on the sound source direction specified by the direction specifying means, and learning means for calculating the partial separation matrices in the learning band group by a learning process. The separation matrix calculation means generates the separation matrix based on the partial separation matrices in the learning band group obtained by the learning means and the partial separation matrices in the interpolation band group obtained by the interpolation means.

The invention of claim 2 is the monitoring system according to the invention of claim 1, further comprising at least one camera that records the surrounding environment by image capture, wherein the at least one camera is controlled according to the sound source direction specified by the direction specifying means.

The invention of claim 3 is the monitoring system according to the invention of claim 1, further comprising at least one recording device that records the surrounding environment by audio recording, wherein the at least one recording device is controlled according to the sound source direction specified by the direction specifying means.

The invention of claim 4 is the monitoring system according to any one of claims 1 to 3, further comprising reporting means for reporting to an operator based on the recognition result of the recognition means.

The invention of claim 5 is the monitoring system according to the invention of claim 4, further comprising storage means for storing the separated signal as sound data, wherein the reporting means reproduces the sound data.

According to the present invention, the frequency band is divided into a plurality of divided bands, a separation matrix for the observation signals is generated by obtaining a partial separation matrix for each divided band, at least one separated signal is generated from at least one of the plurality of observation signals using the generated separation matrix, and whether the target sound is included in the observed sound is recognized based on that separated signal. This improves recognition accuracy.
In addition, because the separation matrix is generated from the partial separation matrices in the learning band group obtained by the learning means and the partial separation matrices in the interpolation band group obtained by the interpolation means, the amount of computation is smaller than when the learning process is performed for all bands.
Furthermore, because the sound source direction is specified based on the separation matrix generated by the separation matrix calculation means, the learning result can be reflected, and the accuracy of specifying the sound source direction improves.

According to the invention of claim 2, the camera is controlled according to the sound source direction specified by the direction specifying means, so images can be captured efficiently.

According to the invention of claim 3, the recording device is controlled according to the sound source direction specified by the direction specifying means, so audio can be recorded efficiently.

According to the invention of claim 4, the system further includes reporting means for reporting to an operator based on the recognition result of the recognition means, so the operator can be informed, for example, that the target sound has been observed.

According to the invention of claim 5, the system further includes storage means for storing the separated signal as sound data, and the reporting means reproduces the sound data, so the operator can directly check the sound that was recognized as the target sound.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

<1. Embodiment>
FIG. 1 is a diagram showing a monitoring system 1 according to the present invention.

The monitoring system 1 includes two microphones 2a and 2b, a signal processing circuit 3, a noise removal circuit 4, a sound recognition circuit 5, a device control unit 6, and a storage device 7. The monitoring system 1 also includes a camera 80, a microphone 81, a speaker 82, and a playback device 83 as devices controlled by the device control unit 6.

The microphones 2a and 2b function as ordinary microphones, each converting the sound observed at its installed position (the observed sound) into an electrical signal (signals x₁(t) and x₂(t)). The two microphones 2a and 2b can observe the observed sound simultaneously. That is, the microphones 2a and 2b correspond to the plurality of observation devices of the present invention.

As shown in FIG. 1, the signals x₁(t) and x₂(t) generated by the microphones 2a and 2b are described using, as an example of the surrounding environment, a situation in which two sound sources 9a and 9b generate sounds s₁(t) and s₂(t), respectively.

Let the z-transforms of the impulse responses of the space between the sound sources 9a, 9b and the microphones 2a, 2b be a₁₁(z), a₂₁(z), a₁₂(z), and a₂₂(z). The observed sound at the microphone 2a is then a mixture of the sound a₁₁(z)·s₁(t) and the sound a₁₂(z)·s₂(t), and the microphone 2a generates the signal x₁(t) from this mixture. Similarly, the observed sound at the microphone 2b is a mixture of the sound a₂₁(z)·s₁(t) and the sound a₂₂(z)·s₂(t), and the microphone 2b generates the signal x₂(t) from this mixture.

That is, with the mixing matrix denoted A(z), the observed-sound signal X(t) acquired by the plurality of observation devices is expressed by Equation 1.
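Equation 1 itself is not reproduced in this text. Based only on the definitions in the preceding paragraph, and with S(t) introduced here as an assumed name for the vector of source sounds, it presumably has the following form:

```latex
X(t) = A(z)\,S(t), \qquad
\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}
=
\begin{pmatrix} a_{11}(z) & a_{12}(z) \\ a_{21}(z) & a_{22}(z) \end{pmatrix}
\begin{pmatrix} s_1(t) \\ s_2(t) \end{pmatrix}
```

Each row restates one microphone's mixture: x₁(t) combines a₁₁(z)·s₁(t) with a₁₂(z)·s₂(t), and x₂(t) combines a₂₁(z)·s₁(t) with a₂₂(z)·s₂(t).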

FIG. 2 is a diagram showing details of the signal processing circuit 3. As shown in FIG. 2, the signal processing circuit 3 includes an FFT 30, a signal separation unit 31, an IFFT 32, an ICA 33, and a correction unit 34. The signal processing circuit 3 generates at least one separated signal (signals y₁(t), y₂(t)) from the plurality of signals (signals x₁(t), x₂(t)) input from the microphones 2a and 2b and outputs it to the noise removal circuit 4. It also specifies the direction D_p (D₁, D₂) of each of the sound sources 9a and 9b and outputs it to the device control unit 6.

The FFT 30 is a circuit that performs a Fourier transform on its input. It takes the signals generated by the microphones 2a and 2b (signals x₁(t), x₂(t)) as input and outputs the Fourier-transformed signals (signals x₁(f,t), x₂(f,t)) to the signal separation unit 31 and the ICA 33.
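The two-argument notation x(f,t) suggests a frame-by-frame (short-time) transform. The following is a minimal sketch of that idea in plain Python. The frame length, hop size, Hann window, and the function name `stft` are all assumptions for illustration; the text does not specify how the FFT 30 frames its input.

```python
import cmath
import math

def stft(signal, frame_len=8, hop=4):
    """Return frames[t][f]: the DFT of each Hann-windowed frame of `signal`."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT; a real system would use an FFT for speed.
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                for n in range(frame_len))
            for f in range(frame_len)
        ]
        frames.append(spectrum)
    return frames

# A tone that completes exactly 2 cycles per 8-sample frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
X = stft(tone)
peak_bin = max(range(5), key=lambda f: abs(X[0][f]))
```

Here `X[t][f]` plays the role of x(f,t) for one microphone; running the same function on both microphone signals would give the pair that the later stages consume.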

The signal separation unit 31 performs sound source separation based on the signals input from the FFT 30 (signals x₁(f,t), x₂(f,t)) and the separation matrix W(f) input from the ICA 33, and generates the separated signals (signals y₁(f,t), y₂(f,t)). The signal separation unit 31 obtains the separated signals by evaluating Equation 2.

Note that the separation matrix W(f) is approximately the inverse of the mixing matrix A(z). The generated separated signals (signals y₁(f,t), y₂(f,t)) are output from the signal separation unit 31 to the IFFT 32 and the ICA 33.
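Equation 2 is not reproduced in this text. Given that W(f) approximates the inverse of the mixing matrix, it presumably amounts to Y(f,t) = W(f)·X(f,t) for each band. The sketch below applies that product band by band; the helper names and the sample mixing matrix are hypothetical.

```python
def matvec2(m, v):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse2(m):
    """Invert a 2x2 complex matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def separate(W, X):
    """Apply Y(f,t) = W(f) X(f,t) for every frame t and band f.

    W[f] is a 2x2 matrix; X[t][f] is the pair [x1(f,t), x2(f,t)].
    """
    return [[matvec2(W[f], frame[f]) for f in range(len(W))] for frame in X]

# Hypothetical single-band example: mix two source values with a known A(f),
# then separate them with W(f) = A(f)^-1.
A = [[1.0, 0.5j], [0.3, 1.0]]
S = [2.0 + 1.0j, -1.0]
X = [[matvec2(A, S)]]              # one frame, one band
Y = separate([inverse2(A)], X)
```

With an exact inverse the sources come back unchanged; in the system described here W(f) is learned, so the recovery is only approximate.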

The IFFT 32 is a circuit that performs an inverse Fourier transform on its input. It takes the signals from the signal separation unit 31 (signals y₁(f,t), y₂(f,t)) as input and outputs the inverse-Fourier-transformed signals (signals y₁(t), y₂(t)). The output of the IFFT 32 is the separated signal output by the signal processing circuit 3.

FIG. 3 is a diagram showing details of the signal separation unit 31 and the ICA 33. The correction unit 34 is omitted from FIG. 3.

The ICA 33 includes a separation matrix calculation unit 35, a direction specifying unit 36, and an interpolation unit 37. By applying independent component analysis (ICA), it obtains the separation matrix W(f) that the signal separation unit 31 uses for sound source separation.

The separation matrix calculation unit 35 divides the frequency band f of the input signals x₁(f,t) and x₂(f,t) into a plurality of divided bands and classifies each divided band into either a learning band group f_g or an interpolation band group f_h. The separation matrix calculation unit 35 sets the number of learning iterations and can perform this classification anew at each iteration. For example, the entire frequency band f may be assigned to the learning band group f_g for the first learning calculation, and only the divided bands that remain after thinning may be assigned to the learning band group f_g for the second and subsequent learning calculations. When thinning is performed, the separation matrix calculation unit 35 classifies the thinned-out divided bands into the interpolation band group f_h.

The separation matrix calculation unit 35 also obtains the partial separation matrix in the learning band group f_g (the learning separation matrix WG(f_g)) by a learning calculation and outputs it to the direction specifying unit 36. Conventional techniques can be applied as the learning calculation for obtaining the learning separation matrix WG(f_g), so the method is not described in detail here; for example, it can be obtained from the formula shown in Equation 3.
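Equation 3 is not reproduced in this text, and the text only says that a conventional learning rule can be used. One widely used rule for frequency-domain ICA is the natural-gradient update W ← W + η·(I − ⟨φ(Y)·Yᴴ⟩)·W. The sketch below implements one step of that rule for a single band as an illustration only; the nonlinearity φ, the step size η, and the function names are assumptions, not the patent's stated formula.

```python
def phi(y):
    """Nonlinearity phi(y) = y / |y| (the unit-modulus 'sign' of a complex value)."""
    mag = abs(y)
    return y / mag if mag > 0 else 0j

def ica_update(W, frames, eta=0.1):
    """One natural-gradient step for a single band.

    W is a 2x2 complex matrix; frames is a list of observation pairs
    [x1(f,t), x2(f,t)] for that band. Returns W + eta * (I - <phi(Y) Y^H>) W.
    """
    # Time average of phi(Y) Y^H over the frames.
    R = [[0j, 0j], [0j, 0j]]
    for x in frames:
        y = [W[0][0] * x[0] + W[0][1] * x[1],
             W[1][0] * x[0] + W[1][1] * x[1]]
        for r in range(2):
            for c in range(2):
                R[r][c] += phi(y[r]) * y[c].conjugate() / len(frames)
    # G = I - R
    G = [[(1 if r == c else 0) - R[r][c] for c in range(2)] for r in range(2)]
    # W_new = W + eta * G W
    return [[W[r][c] + eta * sum(G[r][k] * W[k][c] for k in range(2))
             for c in range(2)] for r in range(2)]

W0 = [[1 + 0j, 0j], [0j, 1 + 0j]]
W1 = ica_update(W0, [[1 + 0j, 0j]], eta=0.1)
```

Repeating the step drives the averaged term toward the identity, which is the sense in which the outputs become mutually independent; the iteration count i in the text would index such repeated steps.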

Furthermore, as described above, the separation matrix calculation unit 35 obtains the separation matrix W(f) over the entire frequency range f based on the learning separation matrix WG(f_g) it has obtained and the interpolation separation matrix WH(f_h) input from the interpolation unit 37. Specifically, it obtains the separation matrix W(f) by evaluating Equation 4.

As is clear from Equation 4, W(f) = WG(f_g) holds when all the divided bands are classified into the learning band group f_g.
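Equation 4 is not reproduced in this text. Because each divided band belongs to exactly one of the two groups, and W(f) reduces to WG(f_g) when no band is interpolated, it presumably combines the two sets of partial matrices band by band, in the spirit of W(f) = WG(f_g) + WH(f_h). A minimal sketch of that assembly follows; the function name and the dictionary representation are assumptions.

```python
def assemble_separation_matrix(n_bands, learned, interpolated):
    """Build the per-band list W[0..n_bands-1] from two disjoint dicts.

    `learned` maps band index -> partial matrix for the learning band group;
    `interpolated` maps band index -> partial matrix for the interpolation group.
    """
    overlap = set(learned) & set(interpolated)
    if overlap:
        raise ValueError(f"bands in both groups: {sorted(overlap)}")
    merged = {**learned, **interpolated}
    missing = [b for b in range(n_bands) if b not in merged]
    if missing:
        raise ValueError(f"bands with no partial matrix: {missing}")
    return [merged[b] for b in range(n_bands)]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
W_all_learned = assemble_separation_matrix(3, {0: I2, 1: I2, 2: I2}, {})
W_mixed = assemble_separation_matrix(3, {0: I2}, {1: Z2, 2: Z2})
```

The first call mirrors the special case in the text: with an empty interpolation group, the result is just the learned matrices.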

FIG. 4 is a diagram conceptually showing how the first separation matrix W₁(f) is obtained. Here, the separation matrix W₀(f) is assumed to be given in advance as an initial value. The partial separation matrices W₀(f₁), W₀(f₂), W₀(f₃), ..., W₀(f_n) are the partial separation matrices in the divided bands f₁, f₂, f₃, ..., f_n, respectively. That is, in this example the frequency band f is divided into n bands (n is a natural number).

In the present embodiment, when obtaining the first separation matrix W₁(f), the separation matrix calculation unit 35 classifies all the divided bands into the learning band group f_g. Therefore, as shown in FIG. 4, all the partial separation matrices W₀(f₁), W₀(f₂), W₀(f₃), ..., W₀(f_n) are subject to "learning", and Equation 3 is evaluated for each of them.

Next, all the partial separation matrices thus obtained (here, all of them are learning separation matrices WG(f_g)) are added together by evaluating Equation 4 to obtain the separation matrix W₁(f).

Thus, in the present embodiment, the first separation matrix W₁(f) is obtained by performing the learning calculation for all the divided bands based on the signals actually obtained, so the system can adapt flexibly even in a non-stationary environment. Consequently, sounds generated by different sound sources can be separated accurately in the processing described later.

FIG. 5 is a diagram conceptually showing how the (i+1)-th separation matrix W_{i+1}(f) is obtained. In the example shown in FIG. 5, the divided bands f_n satisfying n = 4m - 3 (m is a natural number) are classified into the learning band group f_g. That is, iterative learning is performed for only one quarter of all the divided bands f_n, and interpolation is performed for the other three quarters.
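The classification rule in this example can be sketched as follows. The function name `classify_bands` and the `stride` parameter are assumptions introduced for illustration; the rule itself (n = 4m - 3 goes to the learning group) is the one stated above.

```python
def classify_bands(n_bands, stride=4):
    """Split 1-based band indices into (learning, interpolation) groups.

    With stride=4 the learning group is n = 4m - 3 (1, 5, 9, ...), as in the
    FIG. 5 example; every other band goes to the interpolation group.
    """
    learning = [n for n in range(1, n_bands + 1) if (n - 1) % stride == 0]
    interpolation = [n for n in range(1, n_bands + 1) if (n - 1) % stride != 0]
    return learning, interpolation

learning, interpolation = classify_bands(12)
```

For 12 bands this puts bands 1, 5, and 9 in the learning group, which is one quarter of the total.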

Specifically, the partial separation matrices W_i(f₁), W_i(f₅), ... of the divided bands f₁, f₅, ... classified into the learning band group f_g are subject to the learning calculation. The separation matrix calculation unit 35 therefore evaluates Equation 3 to obtain the learning separation matrices WG_{i+1}(f₁), WG_{i+1}(f₅), .... In contrast, the partial separation matrices W_i(f₂), W_i(f₃), W_i(f₄), ... of the divided bands f₂, f₃, f₄, ... classified into the interpolation band group f_h are not subject to the learning calculation. Instead, the partial separation matrices in the interpolation band group f_h are input to the separation matrix calculation unit 35 from the interpolation unit 37, described later, as the interpolation separation matrices WH_{i+1}(f₂), WH_{i+1}(f₃), WH_{i+1}(f₄), .... The separation matrix calculation unit 35 evaluates Equation 4 based on the learning separation matrices WG_{i+1}(f₁), WG_{i+1}(f₅), ... obtained by the learning calculation and the interpolation separation matrices WH_{i+1}(f₂), WH_{i+1}(f₃), WH_{i+1}(f₄), ... input from the interpolation unit 37, and thereby obtains the (i+1)-th separation matrix W_{i+1}(f).

Thus, in the present embodiment, the separation matrix calculation unit 35 obtains the separation matrix W(f) through multiple learning iterations, which further improves accuracy. The learning calculation is relatively complex, and iterating it makes the amount of computation larger still. However, for the divided bands classified into the interpolation band group f_h, using the interpolation separation matrix WH(f_h) input from the interpolation unit 37 (described later) keeps down the amount of computation needed to obtain the separation matrix W(f).

Note that the monitoring system 1 in the present embodiment first classifies all the divided bands into the learning band group f_g and performs two learning iterations over all frequency bands. After that (2 < i < R, where R is the upper limit on the number of iterations), it classifies only one quarter of the divided bands into the learning band group f_g, as described above, and performs iterative learning only for that learning band group f_g. When the iteration count i reaches the upper limit R, iterative learning is stopped for the time being. When some trigger then occurs (for example, a change in the steady state due to observation of a new abnormal sound, or the passage of a predetermined time), the iteration count i is reset and iterative learning over all frequency bands starts again. However, the rule by which the separation matrix calculation unit 35 classifies the divided bands is not limited to this. For example, the number of divided bands subject to iterative learning may be reduced gradually.
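The schedule described above can be sketched as a small function of the iteration count. This reads the text literally: all bands for i ≤ 2, one quarter of the bands for 2 < i < R, and no learning once i reaches R. The function name and parameters are assumptions, and the trigger that resets i is left to the caller.

```python
def bands_for_iteration(i, n_bands, R, full_iterations=2, stride=4):
    """Return the 1-based indices of the bands to learn at iteration i (i >= 1)."""
    if i >= R:
        return []                                   # paused until a trigger resets i
    if i <= full_iterations:
        return list(range(1, n_bands + 1))          # every band
    return list(range(1, n_bands + 1, stride))      # n = 4m - 3 when stride is 4

schedule = {i: bands_for_iteration(i, n_bands=8, R=5) for i in range(1, 6)}
```

A caller that detects a trigger would simply restart from i = 1, which returns to full-band learning.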

The separation matrix W(f) obtained by the separation matrix calculation unit 35 is output from the ICA 33 to the correction unit 34.

The direction specifying unit 36 performs a calculation technique known as beamforming (DOA: Direction of Arrival). In outline, the direction specifying unit 36 specifies the sound source direction D_p of an incoming sound wave using the delay time τ of the observed sound, which varies with the positions of the microphones 2a and 2b, and the characteristics of the microphones 2a and 2b. Although not shown in detail, the direction specifying unit 36 therefore also functions as a timer that measures the delay time τ. The distance d between the microphones 2a and 2b is assumed to be obtained from setting data or the like stored in advance. Equations 5 to 8 are the expressions by which the direction specifying unit 36 obtains the sound source direction D_p.
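Equations 5 to 8 are not reproduced in this text. As background, the basic far-field relation between the inter-microphone delay τ, the spacing d, and the arrival angle θ is sin θ = c·τ/d, where c is the speed of sound. The sketch below applies only that relation; the function name, the nominal value of c, and the angle convention (0° is broadside) are assumptions, and the patent's own equations may differ.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed nominal value

def direction_from_delay(tau, d, c=SPEED_OF_SOUND):
    """Return the arrival angle in degrees (-90..90, 0 = broadside).

    tau: arrival-time difference between the two microphones in seconds.
    d:   microphone spacing in meters.
    """
    s = c * tau / d
    s = max(-1.0, min(1.0, s))   # clamp against measurement noise
    return math.degrees(math.asin(s))

broadside = direction_from_delay(0.0, 0.1)
oblique = direction_from_delay(0.05 / 340.0, 0.1)
endfire = direction_from_delay(0.1 / 340.0, 0.1)
```

A delay of zero means the source is directly in front of the pair; a delay equal to d/c means the sound travels along the line joining the microphones.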

Thus, the direction specifying unit 36 in the present embodiment uses the position information of the microphones 2a and 2b (the distance d) and, as the information indicating the characteristics of the microphones 2a and 2b (characteristic information), uses the learning separation matrix WG(f_g) transmitted from the separation matrix calculation unit 35.

As a result, the learning result can be reflected when the direction of the sound source is specified, so accuracy is higher than when the sound source direction D_p is specified based only on characteristic information given in advance. In addition, because values already obtained by the separation matrix calculation unit 35 are used, no new calculation is needed to obtain the characteristic information. In other words, the direction specifying unit 36 in the present embodiment can make effective use of the result of the learning calculation performed by the separation matrix calculation unit 35.
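One way a learned separation matrix can carry direction information is sketched below. Since W(f) approximates the inverse of the mixing matrix, the columns of W(f)⁻¹ estimate each source's steering vector, and the phase ratio inside a column gives that source's delay. This is a common approach in frequency-domain source separation and is offered as an illustration only; the sign convention, function names, and numeric values are assumptions.

```python
import cmath
import math

def inverse2(m):
    """Invert a 2x2 complex matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def steering(theta_deg, freq_hz, d, c=340.0):
    """Steering vector [1, exp(-2j*pi*f*tau)] with tau = d*sin(theta)/c."""
    tau = d * math.sin(math.radians(theta_deg)) / c
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * tau)]

def directions_from_separation_matrix(W, freq_hz, d, c=340.0):
    """Estimate one arrival angle (degrees) per source from a 2x2 W(f).

    Assumes microphone 2 receives each source later than microphone 1 by tau,
    and that f*d/c < 0.5 so the phase is unambiguous (no spatial aliasing).
    """
    A_est = inverse2(W)
    angles = []
    for p in range(2):
        ratio = A_est[1][p] / A_est[0][p]
        tau = -cmath.phase(ratio) / (2 * math.pi * freq_hz)
        s = max(-1.0, min(1.0, c * tau / d))
        angles.append(math.degrees(math.asin(s)))
    return angles

# Round trip with hypothetical values: sources at -30 and +45 degrees.
f, d = 1000.0, 0.04
a1, a2 = steering(-30.0, f, d), steering(45.0, f, d)
A = [[a1[0], a2[0]], [a1[1], a2[1]]]
est = directions_from_separation_matrix(inverse2(A), f, d)
```

Because only the ratio within each column is used, the estimate does not change if a row of W(f) is rescaled, which is one reason the learned matrix is usable before any level correction.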

Note that the direction specifying unit 36 in the present embodiment uses, as the characteristic information, the learning separation matrix WG(f_g) obtained when the iteration count is 2. As described above, in the present embodiment all frequency bands are classified into the learning band group f_g when the iteration count is 2. The learning separation matrix WG(f_g) at that point is therefore the separation matrix W₂(f). That is, the characteristic information is not limited to a partial separation matrix (the learning separation matrix WG(f_g)) and may be the separation matrix W(f) calculated by the separation matrix calculation unit 35. Alternatively, the characteristic information may be included in the setting data in advance, like the position information.

The sound source direction D_p obtained by the direction specifying unit 36 is output to the interpolation unit 37 and is also output to the device control unit 6 as an output of the signal processing circuit 3.

The interpolation unit 37 acquires the interpolation separation matrix WH(f_h), which is the partial separation matrix in the interpolation band group f_h, based on the sound source direction D_p input from the direction specifying unit 36. One way for the interpolation unit 37 to acquire the interpolation separation matrix WH(f_h) is to store an appropriate interpolation separation matrix WH(f_h) for each value of the sound source direction D_p in advance as a table (setting data), and then to look up the appropriate interpolation separation matrix WH(f_h) in that table using the value of the sound source direction D_p transmitted from the direction specifying unit 36 as the search key.

なお、この場合、テーブルを記憶するために必要とされる記憶容量を抑制するには、例えば、−９０°から９０°までの方向について、１０°刻み程度で記憶しておくことが好ましい。また、補間分離行列ＷＨ(ｆ_h)を取得する方法としては、予め記憶しておいたテーブルを参照する方法に限定されるものではなく、音源方向Ｄ_pに基づいて、演算により求めることもできる。 In this case, in order to suppress the storage capacity required for storing the table, for example, it is preferable to store in about 10 ° increments in the directions from −90 ° to 90 °. Further, the method for obtaining the interpolation separation matrix WH (f _h ) is not limited to the method for referring to a pre-stored table, and can be obtained by calculation based on the sound source direction D _p. .
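The table lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the names `quantize_direction`, `build_table`, and `lookup_wh` are hypothetical, the stored matrices are opaque placeholders, and the rule for snapping D_p to a table entry (nearest 10° step, clamped to the −90° to 90° range) is an assumption, since the text does not say how intermediate directions are handled.

```python
import math

STEP_DEG = 10  # table resolution suggested in the text (about 10 degrees)


def quantize_direction(direction_deg, step=STEP_DEG):
    """Clamp D_p to [-90, 90] and snap it to the nearest table key.

    Exact ties round toward +90 (assumption; the text does not specify this).
    """
    clamped = max(-90.0, min(90.0, direction_deg))
    return int(math.floor(clamped / step + 0.5)) * step


def build_table(make_matrix, step=STEP_DEG):
    """Precompute one placeholder WH(f_h) entry per stored direction."""
    return {d: make_matrix(d) for d in range(-90, 91, step)}


def lookup_wh(table, direction_deg):
    """Use the quantized direction as the search key into the table."""
    return table[quantize_direction(direction_deg)]
```

With a 10° step the table holds 19 entries, which is the storage trade-off the paragraph above refers to.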

The interpolation separation matrix WH(f_h) acquired by the interpolation unit 37 is output to the separation matrix calculation unit 35 and, as described above, is used when the separation matrix calculation unit 35 evaluates Equation 4.

Returning to FIG. 2, the correction unit 34 is a processing unit that performs permutation and level estimation on the separation matrix W(f) input from the separation matrix calculation unit 35. In general, independent component analysis leaves the channel order and the signal level arbitrary. As a result, a separated signal may contain components that belong to a different channel, or its level may differ from the original. The correction unit 34 has the function of correcting these problems.
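The arbitrariness mentioned above can be written compactly. The symbols Y, P, and Λ are notation assumed here for illustration and do not appear in the text: if W(f) separates the observed spectrum X(f,t) into independent outputs Y(f,t), then any reordered and rescaled version of W(f) does so as well, which is why a permutation and level correction is needed in each frequency band.

```latex
Y(f,t) = W(f)\,X(f,t), \qquad
\tilde{W}(f) = \Lambda(f)\,P(f)\,W(f)
```

Here P(f) is a permutation matrix (channel order) and Λ(f) is a nonsingular diagonal matrix (level); both may differ from one frequency band to the next.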

Returning to FIG. 1, the noise removal circuit 4 is a circuit that performs filtering to remove noise components contained in the separated signal. Various filters can be used in the noise removal circuit 4; this embodiment uses the spectral subtraction method. Spectral subtraction makes it easy to control the frequency characteristics with zero phase. The separated signal from which noise has been removed by the noise removal circuit 4 is output to the sound recognition circuit 5.
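A minimal sketch of magnitude spectral subtraction for a single frame is shown below, assuming the frame is already available as a complex spectrum and that a per-bin noise magnitude estimate exists. The function name `spectral_subtract` and the `floor` parameter are hypothetical; the text does not give the exact subtraction rule used by the noise removal circuit 4. Because each bin is scaled by a real, non-negative gain, the phase is left unchanged, which corresponds to the zero-phase property mentioned above.

```python
import cmath


def spectral_subtract(spectrum, noise_mag, floor=0.0):
    """Subtract an estimated noise magnitude from each bin, keeping the phase.

    spectrum  : list of complex bins for one frame
    noise_mag : list of estimated noise magnitudes, one per bin
    floor     : fraction of the original magnitude kept as a lower bound
    """
    cleaned = []
    for value, noise in zip(spectrum, noise_mag):
        magnitude, phase = cmath.polar(value)
        # Never let the magnitude go below the floor (or below zero).
        new_mag = max(magnitude - noise, floor * magnitude)
        cleaned.append(cmath.rect(new_mag, phase))
    return cleaned
```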

The sound recognition circuit 5 has the function of determining whether or not the sound represented by a separated signal contains the target environmental sound. The target environmental sound is a sound designated in advance as one that the monitoring system 1 should detect, such as the sound of a security buzzer, a scream, or a horn.

Various algorithms have been proposed for recognizing and identifying observed sounds. The monitoring system 1 in this embodiment uses a Gaussian mixture model (GMM), which performs well on non-speech sounds. Other methods, such as a hidden Markov model (HMM), may also be used.

Although not shown in FIG. 1, the sound recognition circuit 5 can refer to information on the target environmental sound stored in the storage device 7 (hereinafter, "environmental sound information") and to information on sounds other than the target environmental sound (hereinafter, "non-environmental sound information").

Because the environmental sound information and the non-environmental sound information are stored in the storage device 7 as a database, the monitoring system 1 in this embodiment allows both kinds of information to be rewritten freely. This makes it possible to monitor the surroundings appropriately for the purpose of the monitoring system 1, its installation location, and similar conditions.

For example, when the monitoring system 1 is installed to monitor a railroad crossing, it particularly needs to monitor (for example, photograph) the situation while the crossing gate is lowered. In that case, it is preferable to store the warning sound emitted when the gate is lowered as environmental sound information. On the other hand, when the monitoring system 1 is installed in a store that merely happens to be near a railroad crossing, there is no need to photograph the lowered gate; on the contrary, it is undesirable for the camera 80 to start shooting every time the gate is lowered. In that case, storing the warning sound as non-environmental sound information allows a non-stationary sound (one that is not always present) to be treated like a stationary sound (one that does not need to be detected).

The sound recognition circuit 5 extracts feature quantities from the input separated signal, compares them with the environmental sound information and the non-environmental sound information (likelihood determination), and determines, according to a preset threshold, whether or not the sound represented by the input separated signal is the target environmental sound. The sound recognition circuit 5 passes the recognition result (determination result) to the device control unit 6. The threshold used for the determination is stored in advance in the storage device 7 as setting data and can be changed as needed.
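The likelihood determination can be illustrated with a small sketch. It assumes diagonal-covariance GMMs for the environmental and non-environmental sound information and a decision based on the log-likelihood ratio compared with the threshold; the text only states that a likelihood comparison and a threshold are used, so this particular decision rule, the function names, and the model layout `(weights, means, variances)` are assumptions made for illustration.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over the mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def is_target_sound(x, target_gmm, non_target_gmm, threshold):
    """Return True when the log-likelihood ratio exceeds the threshold."""
    score = gmm_log_likelihood(x, *target_gmm) - gmm_log_likelihood(x, *non_target_gmm)
    return score > threshold
```

Raising the threshold makes the system less likely to trigger on ambiguous sounds, which matches the statement that the threshold is stored as setting data and can be adjusted.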

The device control unit 6 has the function of controlling the camera 80, the microphone 81, the speaker 82, and the playback device 83 of the monitoring system 1 by generating and transmitting control signals.

When the device control unit 6 receives from the sound recognition circuit 5 a determination result indicating that the target environmental sound has been observed, it causes the camera 80 to start shooting, the microphone 81 to start recording, and the speaker 82 to start playing an alarm.

The device control unit 6 also controls the actuators (not shown) of the camera 80, the microphone 81, and the speaker 82 based on the sound source direction D_p input from the signal processing circuit 3. This allows operations such as pan, tilt, and zoom of the camera 80, and orientation adjustment of the microphone 81 and the speaker 82, to be carried out appropriately.

By controlling each device based on the sound source direction D_p in this way, the camera 80 can, for example, be pointed in the direction in which a scream occurred (the sound source direction of the separated signal representing the sound recognized as a scream), so the subject that should be photographed is captured more reliably. Because more appropriate images can be recorded, video recording efficiency improves.

Likewise, pointing the microphone 81 in the direction in which the scream occurred allows sound to be collected with the directivity of the microphone 81 taken into account, so audio recording efficiency improves. Instead of the microphone 81, the microphones 2a and 2b may be controlled so that they face the sound source direction D_p.

Furthermore, pointing the speaker 82 in the direction in which the scream occurred improves the deterrent effect compared with simply playing a warning sound. The device used for deterrence is not limited to the speaker 82 that plays a warning sound; it may be a lighting device such as a searchlight or an emergency light. In that case the subject is illuminated effectively, so linking the lighting with the camera 80 not only provides a deterrent effect but also helps record good-quality image data.

The signal processing circuit 3 outputs a sound source direction for each of the plural sound sources. Of these sound source directions, the device control unit 6 performs the above control based on the direction of the sound (signal) that was recognized as the target environmental sound.
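The selection described in the preceding paragraph can be sketched as a simple filter. The `SeparatedSource` record and the function `directions_to_track` are hypothetical names introduced for illustration; the sketch only assumes that each separated signal carries its direction D_p from the signal processing circuit 3 and a yes/no decision from the sound recognition circuit 5.

```python
from dataclasses import dataclass


@dataclass
class SeparatedSource:
    direction_deg: float  # D_p reported for this separated signal
    is_target: bool       # decision from the sound recognition circuit


def directions_to_track(sources):
    """Keep only the directions of sources recognized as the target sound."""
    return [s.direction_deg for s in sources if s.is_target]
```

For two sources at −30° (traffic noise) and 45° (a scream recognized as the target), only 45° would be passed on to steer the camera, microphone, and speaker.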

The storage device 7 is an ordinary hard disk device and stores various data. The information it stores includes setting data such as parameters and initial values, the environmental sound information and non-environmental sound information referred to by the sound recognition circuit 5, image data captured by the camera 80, sound data collected by the microphone 81, and information on the warning sound played by the speaker 82.

The camera 80 is an ordinary digital camera equipped with a photoelectric conversion element (such as a CCD, not shown) and is controlled by the device control unit 6. Image data captured by the camera 80 is transferred to and stored in the storage device 7. The camera 80 in this embodiment can capture moving images, but it may of course be one that captures still images.

The microphone 81 functions as an ordinary microphone: it converts the sound observed at its installed position (the observation sound) into an electric signal (sound data) and transfers it to the storage device 7. As described above, the sound data transferred from the microphone 81 is stored in the storage device 7. In other words, in the monitoring system 1 the microphone 81 and the storage device 7 together constitute a recording device. The directivity characteristics of the microphone 81 are stored in advance in the storage device 7, and the device control unit 6 refers to them to adjust the microphone 81 to an appropriate orientation.

The speaker 82 is a device that plays, as sound, the warning sound information stored in the storage device 7 in response to a control signal from the device control unit 6. The directivity characteristics of the speaker 82 are also stored in advance in the storage device 7, and the device control unit 6 refers to them to adjust the speaker 82 to an appropriate orientation.

Although FIG. 1 shows only one camera 80, one microphone 81, and one speaker 82, none of them is limited to a single unit.

The playback device 83 mainly consists of output devices such as a display or lamp that shows various data, a printer that prints data, and a speaker that outputs sound; its type and functions are selected according to the purpose and installation conditions of the monitoring system 1. In the monitoring system 1, the playback device 83 functions as a notification device that reports the situation to an operator, a guard, or similar personnel.

The information reproduced by the playback device 83 is mainly the image data stored in the storage device 7 (data captured by the camera 80) and sound data (data transferred from the microphone 81, alarm sound information, and so on), but it may also be processed information such as history information, or predetermined information stored in advance.

In addition, the playback device 83 plays the sound represented by the separated signals produced by the signal processing circuit 3. Although the details are omitted in FIG. 1, the storage device 7 stores the separated signals generated by the signal processing circuit 3 as sound data, and the playback device 83 plays back the sound data stored in this way. In other words, the monitoring system 1 uses the microphones 2a and 2b not only as observation devices but also as part of a recording device. This lets the operator listen directly to the sound that was recognized as the target environmental sound and judge it.

Depending on how the monitoring system 1 is used, guards or other personnel may be stationed at a remote location away from the monitored (installation) site. In that case, part or all of the playback device 83 may be installed, via a network, at the place where those personnel are stationed (a guard station).

This concludes the description of the configuration and functions of the monitoring system 1. Next, the operation of the monitoring system 1 is described briefly.

In the monitoring state, the monitoring system 1 continuously acquires, through the microphones 2a and 2b, the signal X(t) representing the observation sound as information about the surrounding environment. The signal processing circuit 3 separates the acquired X(t) by sound source to generate separated signals, and, after these pass through the noise removal circuit 4, the sound recognition circuit 5 performs sound recognition on them.

In this way, the monitoring system 1 continuously performs recognition with the sound recognition circuit 5 while in the monitoring state, constantly checking whether or not the target environmental sound is included (that is, whether such a sound is occurring in the surrounding environment).

In the signal processing circuit 3, the monitoring system 1 separates the sound signals using the separation matrix W(f) obtained by iterative learning, as described above. As a result, even when installed in a non-stationary outdoor environment, the monitoring system 1 can separate the signal of each sound source more accurately than conventional systems. Because individual sounds can be separated and extracted with high accuracy from an observation sound in which sounds from plural sources are mixed, the recognition accuracy of the sound recognition circuit 5 improves, and so does the reliability of the system.

Until the sound recognition circuit 5 detects the target environmental sound, the device control unit 6 keeps the camera 80, the microphone 81, and the speaker 82 in the OFF state. During this time, therefore, the camera 80 does not shoot, the microphone 81 does not record, and the speaker 82 does not play a warning sound. This effectively limits the amount of data (image data and sound data) recorded in the storage device 7.

When the sound recognition circuit 5 determines that a separated signal contains the target environmental sound, it notifies the device control unit 6. The device control unit 6 then switches the camera 80, the microphone 81, and the speaker 82 to the ON state: the camera 80 starts shooting, the microphone 81 starts recording, and the speaker 82 starts playing a predetermined warning sound.

In parallel with this processing, the device control unit 6 controls the shooting adjustment of the camera 80 based on the sound source direction D_p input from the signal processing circuit 3, and operations such as pan, tilt, and zoom of the camera 80 are performed. Similarly, the device control unit 6 adjusts the pointing directions of the microphone 81 and the speaker 82 based on the sound source direction D_p input from the signal processing circuit 3.

The device control unit 6 also controls the playback device 83 so that it issues a predetermined notification. In response, the playback device 83 obtains the necessary information from the storage device 7 and issues the notification based on it. For example, it displays text or an image on the display screen indicating that the target environmental sound has been detected, turns on a warning lamp, or plays a predetermined warning sound. It also reproduces in real time what the camera 80 and the microphone 81 are recording. Real-time notification by the playback device 83 enables guards and other personnel to respond quickly.

When a target environmental sound that was detected has stopped, the device control unit 6 returns the camera 80, the microphone 81, and the speaker 82 to the OFF state after a predetermined time has elapsed. The playback device 83 is likewise switched to the OFF state as appropriate.
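The ON/OFF behavior described above can be sketched as a small state holder. The class name `DeviceController`, the `hold_seconds` parameter, and the choice to measure the "predetermined time" from the last detection are assumptions for illustration; the text does not specify the timing rule in more detail.

```python
class DeviceController:
    """Tracks whether the camera, microphone, and speaker should be ON."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds  # the "predetermined time" (assumed value)
        self.active = False
        self._last_detection = None

    def update(self, now, target_detected):
        """Feed one recognition result; return True while devices stay ON."""
        if target_detected:
            self._last_detection = now
            self.active = True
        elif self.active and now - self._last_detection >= self.hold_seconds:
            # The target sound has been absent long enough: return to OFF.
            self.active = False
        return self.active
```

Keeping the devices ON for a short time after the sound stops avoids cutting a recording the moment a scream or buzzer pauses.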

The device control unit 6 controls the playback device 83 not only when the target environmental sound is detected but also when an operator, a monitoring staff member, or similar personnel issues an instruction, in accordance with that instruction. This allows, for example, playback of image data and sound data recorded so far, or output (printout, screen display, and so on) of the history of detections of the target environmental sound. In other words, in addition to real-time notification, the monitoring system 1 can also report (confirm) past situations.

As described above, the monitoring system 1 in this embodiment generates a separation matrix for the observation signals by dividing the frequency band into plural divided bands and obtaining a partial separation matrix for each divided band. It then generates at least one separated signal from the observation signals using the generated separation matrix. Recognition accuracy improves because the system recognizes, based on the separated signal generated in this way, whether or not the sound it represents is the target environmental sound.

Because the camera 80 is controlled by the device control unit 6 according to the sound source direction specified by the direction specifying unit 36, shooting can be performed efficiently.

Because the microphone 81 (recording device) is controlled by the device control unit 6 according to the sound source direction specified by the direction specifying unit 36, recording can be performed efficiently.

Using the separation matrix W(f) generated by the separation matrix calculation unit 35 as the characteristic information allows the learning result to be reflected when the sound source direction is specified, which improves accuracy.

Generating the separation matrix from the partial separation matrix for the learning band group obtained by the separation matrix calculation unit 35 and the partial separation matrix for the interpolation band group obtained by the interpolation unit 37 makes effective use of the result computed by the direction specifying unit 36 (the sound source direction) while reducing the amount of computation compared with repeating the learning process over all frequency bands.
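The merging of the two kinds of partial separation matrices can be pictured with the sketch below. Equation 4 is not reproduced in this section, so this shows only the general idea under stated assumptions: each band is labeled "g" (learning band group) or "h" (interpolation band group), and the full matrix takes the learned entry for "g" bands and the interpolated entry for "h" bands. The function name and the data layout are hypothetical.

```python
def assemble_separation_matrix(band_groups, learned, interpolated):
    """Build the per-band list W(f) from learned and interpolated parts.

    band_groups  : list where entry k is "g" (learning) or "h" (interpolation)
    learned      : dict mapping band index -> learned partial matrix WG
    interpolated : dict mapping band index -> interpolated partial matrix WH
    """
    full = []
    for k, group in enumerate(band_groups):
        if group == "g":
            full.append(learned[k])
        elif group == "h":
            full.append(interpolated[k])
        else:
            raise ValueError(f"unknown band group {group!r} for band {k}")
    return full
```

Only the "g" bands require an iterative learning update, which is where the reduction in computation comes from.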

Because the playback device 83 notifies an operator, a guard, or similar personnel based on the recognition result of the sound recognition circuit 5, the fact that the target environmental sound has been observed can, for example, be communicated quickly and easily.

Furthermore, because the playback device 83 plays the sound represented by the separated signals produced by the signal processing circuit 3, the operator can judge directly from the played sound whether or not it is the target environmental sound.

<2. Modifications>
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment, and various modifications are possible.

For example, the embodiment above was described as classifying all frequency bands into the learning band group f_g in the first iteration, but thinning may of course be applied from the first learning pass. In that case, the first separation matrix W₁(f) may be obtained by interpolating with the initial value W₀(f) of the separation matrix W(f).

The device control unit 6 may also change how it controls the camera 80, the microphone 81, the speaker 82, and the playback device 83 according to the type of target environmental sound recognized by the sound recognition circuit 5. For example, the sound played from the speaker 82 may be selected according to the type of target environmental sound that was recognized.

The way in which the device control unit 6 controls the camera 80, the microphone 81, the speaker 82, and the playback device 83 in the embodiment above is only an example, and control is not limited to it. It may be changed as appropriate according to the purpose, installation conditions, configuration, and other aspects of the monitoring system 1.

In the embodiment above, the device control unit 6 was described as controlling the camera 80 and other devices based on the sound source direction D_p that the direction specifying unit 36 obtains during iterative learning in order to derive the interpolation separation matrix WH(f_h). However, the sound source direction D_p may instead be obtained anew from the separation matrix W(f) after iterative learning has finished, and the device control unit 6 may control the camera 80 and other devices based on that direction. In this case the number of calculations performed by the direction specifying unit 36 increases, but the accuracy of the sound source direction D_p improves.

A diagram showing the monitoring system according to the present invention.
A diagram showing details of the signal processing circuit.
A diagram showing the signal separation unit and details of the ICA.
A diagram conceptually showing how the first separation matrix W₁(f) is obtained.
A diagram conceptually showing how the (i+1)-th separation matrix W_{i+1}(f) is obtained.

Explanation of Symbols

1 Monitoring system
2a, 2b Microphones
3 Signal processing circuit
30 FFT
31 Signal separation unit
32 IFFT
33 ICA
34 Correction unit
35 Separation matrix calculation unit
36 Direction specifying unit
37 Interpolation unit
4 Noise removal circuit
5 Sound recognition circuit
6 Device control unit
7 Storage device
80 Camera
81 Microphone
82 Speaker
83 Playback device
9a, 9b Sound sources
D_p Sound source direction

Claims

A monitoring system that monitors the surrounding environment according to a target sound to be detected, comprising:
a plurality of observation devices, each of which generates an observation signal representing the observation sound observed at its placement position;
separation matrix calculation means for generating a separation matrix for the observation signals by dividing a frequency band into a plurality of divided bands and obtaining a partial separation matrix for each divided band;
signal separation means for generating at least one separated signal from at least one of the plurality of observation signals using the separation matrix generated by the separation matrix calculation means;
recognition means for recognizing, based on the separated signal generated by the signal separation means, whether or not the target sound is included in the observation sound; and
direction specifying means for measuring a delay time of the observation sound that varies according to the distance between the observation devices, and for specifying the sound source direction of the target sound based on the delay time and the separation matrix generated by the separation matrix calculation means,
wherein the separation matrix calculation means includes:
band classification means for classifying each of the plurality of divided bands into a learning band group or an interpolation band group;
interpolation means for calculating a partial separation matrix for the interpolation band group based on the sound source direction specified by the direction specifying means; and
learning means for calculating a partial separation matrix for the learning band group by a learning process,
and wherein the separation matrix calculation means generates the separation matrix based on the partial separation matrix for the learning band group obtained by the learning means and the partial separation matrix for the interpolation band group obtained by the interpolation means.

The monitoring system according to claim 1,
further comprising at least one camera that records the surrounding environment by shooting,
wherein the at least one camera is controlled according to the sound source direction specified by the direction specifying means.

The monitoring system according to claim 1,
further comprising at least one recording device that records the surrounding environment by audio recording,
wherein the at least one recording device is controlled according to the sound source direction specified by the direction specifying means.

The monitoring system according to any one of claims 1 to 3,
further comprising notification means for notifying an operator based on the recognition result of the recognition means.

The monitoring system according to claim 4,
further comprising storage means for storing the separated signal as sound data,
wherein the notification means plays back the sound data.