JP2005004518A

JP2005004518A - Monitoring device

Info

Publication number: JP2005004518A
Application number: JP2003167971A
Authority: JP
Inventors: Tadaya Miura; 肇也三浦
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-06-12
Filing date: 2003-06-12
Publication date: 2005-01-06

Abstract

PROBLEM TO BE SOLVED: To provide a monitoring device that copes appropriately with a variety of abnormalities while keeping costs low.
SOLUTION: When the human voice saying "Bye." is detected, a monitoring state is set. When the human voice saying "I'm home." is detected by a microphone 12 in the monitoring state, monitoring is terminated. In the monitoring state, when an ambient sound other than the human voice saying "I'm home." is detected by the microphone 12, the level of that ambient sound is at or above a threshold, and the ambient sound matches none of the normal-state sounds such as the noise of an automobile or a streetcar or the ringing of a telephone, an image of the interior of a room or the like photographed by a monitor camera 11 and the ambient sound detected by the microphone 12 are stored in a storage device 13.
COPYRIGHT: (C)2005, JPO & NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、店舗や住宅等を監視するための監視装置に関する。
【０００２】
【従来の技術】
周知の様に監視装置として、多様なものが提案されている。
【０００３】
例えば、特許文献１には、プラントに設けられた測定器等により異常が検出されると、マイクから入力された音声を録音すると共に、監視カメラにより撮影された画像を録画し、その後で音声及び画像を再生して、プラントで発生した異常を確認するという技術が開示されている。
【０００４】
また、特許文献２には、交差点に衝撃音センサー及び監視カメラを設けておき、交差点で発生した事故の衝撃音を衝撃音センサーにより検出し、衝撃音センサーの検出出力に応答して監視カメラを起動し、事故発生時の交差点の画像を監視カメラで撮影してビデオデッキに記録し、その後で事故の画像を再生するという技術が開示されている。
【０００５】
更に、特許文献３は、各種の異常音を抽出して検出し、これらの異常音のレベルに応じて監視カメラの向きを制御し、監視カメラにより異常発生箇所を撮影するという技術が開示されている。
【０００６】
【特許文献１】
特開平７−３２５９９０号公報
【特許文献２】
特開平８−１１６５２８号公報
【特許文献３】
特開２００２−１２３８７８号公報
【０００７】
【発明が解決しようとする課題】
しかしながら、店舗や住宅等の監視を前提とする場合は、特許文献１乃至３の技術には次の様な問題点があった。
【０００８】
特許文献１では、測定器等により異常を検出し、その検出出力に応答して監視カメラにより撮影された画像を録画している。従って、店舗や住宅等の監視を前提とするならば、店舗や住宅等の異常を検出するための測定器等（センサー）を必用とする。ところが、店舗や住宅等の異常が多様であって、多数の各種センサーを設置する必要があることから、これらのセンサーを含むシステムの規模が大きくなって、コストが高くなり、一般向きではなくなる。
【０００９】
また、特許文献２では、衝撃音センサーの検出出力に応答して監視カメラを起動している。従って、非常に大きな衝撃音が発生したときに、監視カメラを起動していることになる。ところが、店舗や住宅等では、異常の発生に際し、非常に大きな衝撃音が発生するとは限らず、多様な異常に適確に対応することができない。
【００１０】
更に、特許文献３では、各種の異常音を抽出して検出し、これらの異常音のレベルに応じて監視カメラの向きを制御している。従って、店舗や住宅等の監視を前提とするならば、異常の発生に際し、店舗や住宅等で如何なる異常音が発生するかを予測しておく必要ある。ところが、空き巣等が発生する異常音を事前に予測することは困難であり、適確に対応することができない。
【００１１】
尚、店舗や住宅等を撮影する監視カメラを設け、店舗や住宅の留守中に、監視カメラによる撮影を継続して、その画像を記録し続けることも考えられる。しかしながら、この場合は、画像の記録時間が長くなるために、記憶装置の容量を大きくせねばならず、コストが高くなる。また、画像の確認時間が長くなり、また画像の管理が煩雑となって、実用的ではない。
【００１２】
そこで、本発明は、上記従来の問題点に鑑みてなされたものであり、コストを低く抑えることができ、多様な異常に適確に対応することができる監視装置を提供することを目的とする。
【００１３】
【課題を解決するための手段】
上記課題を解決するために、本発明は、監視対象領域を撮影する撮影手段と、画像記憶手段と、周辺音声を検出する音声検出手段と、予め設定された除外音声を記憶した除外音声記憶手段と、音声検出手段により検出された周辺音声と除外音声記憶手段内の除外音声を比較し、周辺音声と除外音声が異なるときに、撮影手段により撮影された画像を画像記憶手段に記憶させる制御手段とを備えている。
【００１４】
この様な構成の本発明によれば、音声検出手段により検出された周辺音声と除外音声記憶手段内の除外音声が異なるときに、撮影手段により撮影された画像を画像記憶手段に記憶させている。除外音声としては、例えば平常時の店舗や住宅で検出し得る音声を設定する。これにより、周辺音声と除外音声が一致するときには、平常時とみなすことができ、周辺音声と除外音声が異なるときには、異常時とみなすことができる。そして、周辺音声と除外音声が異なる異常時には、撮影手段により撮影された画像を画像記憶手段に記憶させる。このため、画像記憶手段に記憶された画像を再生すれば、異常時の様子を知ることができる。
【００１５】
例えば、店舗や住宅を監視する場合は、除外音声として、来客の報知音、電話機の呼び出し音、近隣の自動車や電車の音を設定する。これにより、来客の報知音、電話機の呼び出し音、近隣の自動車や電車の音が検出された平常時には、音声検出手段により検出された周辺音声と除外音声記憶手段内の除外音声が一致して、撮影手段により撮影された画像が画像記憶手段に記憶されず、またそれ以外の音が検出された異常時には、音声検出手段により検出された周辺音声と除外音声記憶手段内の除外音声が異なり、撮影手段により撮影された画像が画像記憶手段に記憶され、この画像の再生により異常時の様子を知ることができる。
【００１６】
また、平常時の店舗や住宅で検出し得る音声は、除外音声として、除外音声記憶手段に予め記憶させ易く、多様な音声があったとしても、これらの音声を除外音声記憶手段に予め記憶しておくことが可能である。そして、除外音声として、多様な音声を除外音声記憶手段に予め記憶しておくことにより、異常時の判定精度を高めることができる。
【００１７】
これに対して異常時の音声は、平常時に発生しないため、予測し難く、予め記憶しておくことが困難である。従って、従来の様に異常時の音声に基づいて、異常時の判定精度を高めることは困難である。
【００１８】
また、本発明においては、予め設定されたコマンド音声を記憶するコマンド音声記憶手段を備え、制御手段は、音声検出手段により検出された周辺音声とコマンド音声記憶手段内のコマンド音声を比較し、周辺音声とコマンド音声が一致するときに、監視を開始もしくは終了している。
【００１９】
例えば、店舗や住宅を監視する場合は、「いってきます」や「ただいま」という人の音声をコマンド音声としてコマンド音声記憶手段に記憶しておき、音声検出手段により検出された周辺音声が「いってきます」というコマンド音声に一致したときに、監視を開始し、また音声検出手段により検出された周辺音声が「ただいま」というコマンド音声に一致したときに、監視を終了する。これにより、格別な操作をしなくても、監視を開始したり終了したりすることができ、監視状態が無闇に継続されたり、不用意に無監視状態になることを防止することができ、画像記憶手段の容量の有効利用を果たすことができる。
【００２０】
更に、本発明においては、制御手段は、周辺音声とコマンド音声が一致するときに、撮影手段により撮影された画像を画像記憶手段に記憶させない。
【００２１】
ここでは、周辺音声とコマンド音声が一致しても、画像記憶手段への画像の記憶を行なわないことを明確にしている。
【００２２】
また、本発明においては、音声検出手段により検出された周辺音声と除外音声記憶手段内の除外音声が異なるときに、該周辺音声を記録する周辺音声記録手段を備え、制御手段は、周辺音声と除外音声が異なるときに画像記憶手段に記憶された画像を消去することを指示されると、周辺音声記録手段内の該周辺音声を新たな除外音声として除外音声記憶手段に記憶させている。
【００２３】
周辺音声と除外音声が異なるときは、異常時とみなされて、撮影手段により撮影された画像が画像記憶手段に記憶される。ところが、画像記憶手段に記憶された画像を再生してみても、平常時の様子しか確認することができなければ、周辺音声が平常時のものであるにもかかわらず、この周辺音声が除外音声記憶手段に記憶されていなかったことから、異常時とみなされて、画像が画像記憶手段に記憶されたことになる。また、通常、平常時の様子を示す画像は消去される。そこで、周辺音声と除外音声が異なるときに、画像を画像記憶手段に記憶するだけではなく、周辺音声を周辺音声記録手段に記録しておき、この後で画像記憶手段内の画像の消去を指示されたときに、周辺音声記録手段内の周辺音声を新たな除外音声として除外音声記憶手段に記憶させる。以降、同一の周辺音声が再度発生したときに、周辺音声と除外音声が一致し、撮影手段により撮影された画像が画像記憶手段に記憶されることはなくなる。これにより、監視精度を高め、画像記憶手段の容量をより有効に利用することができる。
【００２４】
更に、本発明においては、複数の音声検出手段を設け、除外音声記憶手段は、各音声検出手段に対応する予め設定されたそれぞれの除外音声を記憶し、制御手段は、各音声検出手段別に、音声検出手段により検出された周辺音声と除外音声記憶手段内の該音声検出手段に対応する除外音声を比較し、周辺音声と除外音声が異なるときに、撮影手段により撮影された画像を画像記憶手段に記憶させている。
【００２５】
各音声検出手段は、店舗や住宅の各箇所に設置され、それぞれの箇所の音声を検出する。また、各音声検出手段の設置箇所で検出され得るそれぞれの除外音声を該各音声検出手段に対応付けて予め記憶しておく。そして、各音声検出手段別に、音声検出手段により検出された周辺音声と除外音声記憶手段内の該音声検出手段に対応する除外音声を比較する。この場合は、各音声検出手段の設置箇所別に、平常時の除外音声を特定することになり、平常時の除外音声の種類を減らして、監視精度を高めることができる。
【００２６】
【発明の実施の形態】
以下、本発明の実施形態を添付図面を参照して詳細に説明する。
【００２７】
図１は、本発明の監視装置の一実施形態を示すブロック図である。本実施形態の監視装置は、店舗や住宅に設置され、室内等を撮影する監視カメラ１１と、室内等の周辺音声を検出するマイクロホン１２と、監視カメラ１１によって撮影された画像及びマイクロホン１２により検出された周辺音声を記憶する記憶装置１３と、マイクロホン１２により検出された周辺音声を分析する音声分析比較装置１４と、現在の年月日並びに時刻を計時する時計１５と、異常発生を電話回線を通じて外部端末に通知する通報装置１６と、ＣＲＴや液晶表示装置等の表示装置１７と、音声再生装置１８と、キーボード等からなる操作パネル１９と、この監視装置を統括的に制御する主制御装置２１と、この監視装置の各部を相互接続するバス２２とを備えている。
【００２８】
音声分析比較装置１４は、図２に示す様な各番号、各名称、及び各除外音声データを対応付けた除去音声データテーブル３１を記憶している。各除外音声データは、例えば図３に示す様な室内で、平常時に、マイクロホン１２により検出し得る自動車の騒音、電車の騒音、電話機の呼び出し音等の周辺音声（以下除外音声とも称す）を示すものである。
【００２９】
また、音声分析比較装置１４は、図４に示す様な各番号、各名称、及び各コマンド音声データを対応付けたコマンド音声データテーブル３２を記憶している。各コマンド音声データは、図３に示す様な室内で、マイクロホン１２により検出し得る「いってきます」や「ただいま」という人の音声を示すものである。
【００３０】
除外音声データ及びコマンド音声データのいずれも、マイクロホン１２からの音声信号を変換したものである。
【００３１】
例えば、自動車の騒音が発生しているときに、操作パネル１９の操作により除去音声データのサンプリングが主制御装置２０に指示されると、主制御装置２０によりマイクロホン１２及び音声分析比較装置１４が起動され、自動車の騒音がサンプリングされて、自動車の騒音を示す除去音声データが生成される。このとき、マイクロホン１２は、自動車の騒音を検出し、自動車の騒音を示す音声信号を音声分析比較装置１４に出力する。音声分析比較装置１４は、音声信号をデジタル化して、自動車の騒音を示す除去音声データを生成し、除去音声データを除去音声データテーブル３１に登録する。
【００３２】
同様に、「いってきます」という人の音声が発生しているときに、操作パネル１９の操作によりコマンド音声のサンプリングが主制御装置２０に指示されると、主制御装置２０によりマイクロホン１２及び音声分析比較装置１４が起動され、「いってきます」という人の音声がサンプリングされて、コマンド音声データが生成される。このとき、マイクロホン１２は、「いってきます」という人の音声を検出し、この人の音声を示す音声信号を音声分析比較装置１４に出力する。音声分析比較装置１４は、音声信号をデジタル化して、「いってきます」という人の音声を示すコマンド音声データを生成し、このコマンド音声データをコマンド音声データテーブル３２に登録する。
【００３３】
次に、この様な構成の監視装置による室内の監視手順を図５に示すフローチャートに従って説明する。
【００３４】
まず、待機状態では、マイクロホン１２は、周辺音声を検出する度に、周辺音声を示す音声信号を音声分析比較装置１４に出力する。音声分析比較装置１４は、マイクロホン１２からの音声信号を入力する度に、音声信号をデジタル化して、周辺音声データを生成し、この周辺音声データがコマンド音声データテーブル３２内の「いってきます」という人の音声を示すコマンド音声データに一致するか否かを判定する（ステップＳ１０１）。これにより、周辺音声がコマンド音声データテーブル３２に登録されている「いってきます」という人の音声に一致するか否かが判定される。
【００３５】
そして、音声分析比較装置１４は、周辺音声が「いってきます」という人の音声に一致しなければ（ステップＳ１０１で「Ｎｏ」）、待機状態を維持し続ける。また、音声分析比較装置１４は、周辺音声が「いってきます」という人の音声に一致すれば（ステップＳ１０１で「Ｙｅｓ」）、この旨を主制御装置２１に通知する。
【００３６】
主制御装置２１は、周辺音声が「いってきます」という人の音声に一致すると、監視状態を設定する（ステップＳ１０２）。
【００３７】
この監視状態において、音声分析比較装置１４は、マイクロホン１２からの音声信号を周辺音声データに変換し、この周辺音声データがコマンド音声データテーブル３２内の「ただいま」という人の音声を示すコマンド音声データかに一致するか否かを判定する（ステップＳ１０３）。これにより、周辺音声がコマンド音声データテーブル３２に登録されている「ただいま」という人の音声に一致するか否かが判定される。
【００３８】
そして、音声分析比較装置１４は、周辺音声が「ただいま」という人の音声に一致すれば（ステップＳ１０３で「Ｙｅｓ」）、この旨を主制御装置２１に通知する。
【００３９】
主制御装置２１は、周辺音声が「ただいま」という人の音声に一致すると、監視状態を終了して（ステップＳ１０４）、ステップＳ１０１の待機状態に戻る。
【００４０】
また、音声分析比較装置１４は、周辺音声が「ただいま」という人の音声に一致しなければ（ステップＳ１０３で「Ｎｏ」）、周辺音声データに基づいて、周辺音声のレベルが予め設定された閾値以上であるか否かを判定する（ステップＳ１０５）。そして、音声分析比較装置１４は、周辺音声のレベルが閾値以上でなければ（ステップＳ１０５で「Ｎｏ」）、ステップＳ１０３に戻る。
【００４１】
また、音声分析比較装置１４は、周辺音声のレベルが閾値以上であれば（ステップＳ１０５で「Ｙｅｓ」）、周辺音声データが除外音声データテーブル３１内の各除外音声データのいずれかに一致するか否かを判定する（ステップＳ１０６）。これにより、周辺音声が除外音声データテーブル３１に登録されている各除外音声のいずれかに一致するか否かが判定される。
【００４２】
そして、音声分析比較装置１４は、周辺音声が各除外音声のいずれかに一致すれば（ステップＳ１０６で「Ｙｅｓ」）、つまり周辺音声が平常時の自動車の騒音、電車の騒音、電話機の呼び出し音等のいずれかに一致すれば、ステップＳ１０３に戻る。
【００４３】
また、音声分析比較装置１４は、周辺音声が各除外音声のいずれにも一致しなければ（ステップＳ１０６で「Ｎｏ」）、つまり周辺音声が平常時の自動車の騒音、電車の騒音、電話機の呼び出し音等のいずれにも一致しなければ、この旨を主制御装置２１に通知する。これに応答して主制御装置２１は、監視カメラ１１及び記憶装置１３を起動する。
【００４４】
監視カメラ１１は、起動されると、室内等を撮影し、その画像データを記憶装置１３に出力する（ステップＳ１０７）。記憶装置１３は、監視カメラ１１からの画像データを時計１５により計時されている現在の年月日並びに時刻と共に記憶する（ステップＳ１０８）。また、記憶装置１３は、各除外音声のいずれにも一致しなかった周辺音声を示す周辺音声データを音声分析比較装置１４から入力し、周辺声データを監視カメラ１１からの画像データと共に記憶する。
【００４５】
このとき、記憶装置１３は、時計１５により計時されている現在の年月日並びに時刻、監視カメラ１１からの室内等を示す画像データ、及び音声分析比較装置１４からの周辺音声を示す周辺音声データ等を対応付けて記憶し、これにより図６に示す様な監視データテーブル３３を形成する。
【００４６】
ここでは、一定周期毎に、複数の静止画像を監視カメラ１１により撮影し、各静止画像データを含む静止画像ファイルを監視データテーブル３３に記憶している。また、周辺音声を示す周辺音声データを音声ファイルとして監視データテーブル３３に記憶している。更に、静止画像ファイル及び音声ファイルは、番号、現在の年月日並びに時刻、静止画像ファイル及び音声ファイルが再生済みであるか否かを示す再生フラッグ、及びその他の情報等と共に記憶されている。
【００４７】
尚、複数の静止画像データの代わりに、監視カメラ１１により撮影された動画像データを記憶しても構わない。
【００４８】
この様に「いってきます」という人の音声が検出されると、監視状態が設定される。そして、監視状態では、「ただいま」という人の音声が検出されると、監視が終了となる。また、監視状態では、「ただいま」という人の音声ではない周辺音声が検出され、この周辺音声のレベルが閾値以上であり、この周辺音声が平常時の自動車の騒音、電車の騒音、電話機の呼び出し音等のいずれにも一致しなければ、室内等を示す画像及び周辺音声が記憶装置１３に記憶される。また、「ただいま」という人の音声ではない周辺音声が検出されても、この周辺音声のレベルが閾値未満であったり、この周辺音声が平常時の自動車の騒音、電車の騒音、電話機の呼び出し音等のいずれかに一致すると、室内等を示す画像及び周辺音声が記憶されない。
【００４９】
ここで、周辺音声が平常時の自動車の騒音、電車の騒音、電話機の呼び出し音等のいずれにも一致しないということは、周辺音声が異常時のものであると推定することができる。従って、記憶装置１３に記憶されている室内等の画像及び周辺音声等も、異常時のものと推定することができる。
【００５０】
次に、異常時のものと推定される記憶装置１３内の画像及び周辺音声を確認するための手順を図７に示すフローチャートに従って説明する。
【００５１】
まず、操作パネル１９の操作により記憶装置１３内の静止画像ファイル及び音声ファイルの再生が指示されると、これに応答して主制御装置２０は、静止画像ファイル及び音声ファイルが記憶装置１３内に記憶されているか否かを判定する（ステップＳ２０１）。そして、主制御装置２０は、静止画像ファイル及び音声ファイルが記憶されていなければ（ステップＳ２０１で「Ｎｏ」）、この処理を終了する（ステップＳ２０２）。
【００５２】
また、主制御装置２０は、静止画像ファイル及び音声ファイルが記憶されていれば（ステップＳ２０１で「Ｙｅｓ」）、番号ｎ＝１と設定し（ステップＳ２０３）、番号ｎ（＝１）に対応する年月日並びに時刻、静止画像ファイル、及び音声ファイルを記憶装置１３から読み出して、年月日並びに時刻と静止画像ファイルの各静止画像データを表示装置１７に与え、また音声ファイルの周辺音声データを音声再生装置１８に与える（ステップＳ２０４）。表示装置１７は、年月日並びに時刻を表示すると共に、各静止画像データによって示される室内のそれぞれの静止画像を一定周期で順次表示する。また、音声再生装置１８は、周辺音声データによって示される周辺音声を再生する。
【００５３】
先に述べた様に記憶装置１３に記憶されている室内等の画像及び周辺音声等が異常時のものと推定される。そこで、表示装置１７により表示された室内の各静止画像、及び音声再生装置１８により発音された周辺音声に基づいて、室内等に異常が発生していたか否かを確認する。また、室内等の異常発生が確認された場合は、表示装置１７により表示されている年月日並びに時刻を異常発生の年月日並びに時刻とみなす。
【００５４】
次に、操作パネル１９の操作により次の静止画像ファイル及び音声ファイルの再生が指示されると（ステップＳ２０５で「Ｙｅｓ」）、これに応答して主制御装置２０は、番号（ｎ＋１）の静止画像ファイル及び音声ファイルが記憶装置１３内に記憶されているか否かを判定し（ステップＳ２０６）、記憶されていなければ（ステップＳ２０６で「Ｎｏ」）、この処理を終了する（ステップＳ２０２）。
【００５５】
また、主制御装置２０は、番号（ｎ＋１）の静止画像ファイル及び音声ファイルが記憶されていれば（ステップＳ２０６で「Ｙｅｓ」）、番号ｎ＝（ｎ＋１）と更新してから（ステップＳ２０７）、ステップＳ２０４に戻る。これにより、番号ｎ（＝２）に対応する年月日並びに時刻、静止画像ファイル、及び音声ファイルが記憶装置１３から読み出され、年月日並びに時刻と静止画像ファイルの各静止画像データによって示されるそれぞれの静止画像が表示装置１７に表示され、音声ファイルの周辺音声データによって示される周辺音声が音声再生装置１８により再生される。
【００５６】
更に、操作パネル１９の操作により前回の静止画像ファイル及び音声ファイルの再生が指示されると（ステップＳ２０５で「Ｎｏ」、ステップＳ２０８で「Ｙｅｓ」）、これに応答して主制御装置２０は、番号（ｎ−１）＝０であるか否かを判定し（ステップＳ２０９）、番号（ｎ−１）＝０であれば（ステップＳ２０９で「Ｙｅｓ」）、この処理を終了する（ステップＳ２０２）。
【００５７】
また、主制御装置２０は、番号（ｎ−１）＝０なければ（ステップＳ２０９で「Ｎｏ」）、番号ｎ＝（ｎ−１）と更新してから（ステップＳ２１０）、ステップＳ２０３に戻る。これにより、番号ｎに対応する年月日並びに時刻、静止画像ファイル、及び音声ファイルが記憶装置１３から読み出され、年月日並びに時刻と静止画像ファイルの各静止画像データによって示されるそれぞれの静止画像が表示装置１７に表示され、音声ファイルの周辺音声データによって示される周辺音声が音声再生装置１８により再生される。
【００５８】
以降同様に、操作パネル１９の操作により、次の静止画像ファイル及び音声ファイルが指示されるか、前回の静止画像ファイル及び音声ファイルが指示されると、指示された静止画像ファイル及び音声ファイルが再生される。
【００５９】
また、指示された静止画像ファイル及び音声ファイルの再生に引き続いて、操作パネル１９の操作により該静止画像ファイル及び該音声ファイルの消去が指示されると（ステップＳ２０５で「Ｎｏ」、ステップＳ２０９で「Ｎｏ」、ステップＳ２１１で「Ｙｅｓ」）、これに応答して主制御装置２０は、該音声ファイルの周辺音声データを除外音声データとして音声分析比較装置１４の除去音声データテーブル３１に登録してから（ステップＳ２１２）、記憶装置１３内の該静止画像ファイル及び該音声ファイルを消去し（ステップＳ２１３）、この処理を終了する（ステップＳ２０２）。
【００６０】
ここで、静止画像ファイル及び音声ファイルの再生に引き続いて、これらのファイルの消去が指示されたときには、該静止画像ファイルの各静止画像データによって示されるそれぞれの静止画像及び該音声ファイルの周辺音声データによって示される周辺音声が異常時のものではなくて平常時のものであったとみなすことができる。
【００６１】
そこで、以降の監視状態で、該周辺音声が再度検出されたときに、この周辺音声が除外音声データテーブル３１に登録されている除外音声に一致すると判定されて、監視カメラ１１が起動されない様にするために、該周辺音声データを除外音声データとして音声分析比較装置１４の除去音声データテーブル３１に登録しておく。これにより、平常時の同一の周辺音声に応答して静止画像ファイル及び音声ファイルが記憶装置１３に記憶されることがなくなり、監視精度を高め、記憶装置１３の容量をより有効に利用することができる。
【００６２】
この様に本実施形態では、「いってきます」という人の音声に応答して監視状態が設定され、「ただいま」という人の音声に応答して監視が終了となる。また、監視状態では、周辺音声が平常時の自動車の騒音、電車の騒音、電話機の呼び出し音等のいずれにも一致しなければ、周辺音声が異常時のものであると推定されて、室内等を示す画像及び周辺音声等が記憶装置１３に記憶される。このため、記憶装置１３内の画像及び周辺音声を再生すれば、異常が発生していたか否かを確認することができる。
【００６３】
また、記憶装置１３内の画像及び周辺音声の再生に際しては、画像及び周辺音声の消去が指示されると、画像及び周辺音声が異常時のものではなくて平常時のものであるとみなして、この周辺音声を除外音声として登録しているので、以降の監視状態では、同一の周辺音声が異常時のものであると推定されることがなく、室内等を示す画像及び周辺音声等が無駄に記憶されずに済む。
【００６４】
尚、本発明は、上記実施形態に限定されるものではなく、多様に変形することができる。例えば、画像及び周辺音声の記憶装置として、周知の様々な装置を適用することができる。また、複数の監視カメラや複数のマイクロホンを設置しても良い。更に、複数のマイクロホンを設置する場合は、マイクロホンの設置位置により平常時の周辺音声が異なるため、各マイクロホン別に、除外音声データテーブルを設定しても良い。例えば、マイクロホンを道路沿いの住宅の窓に設置した場合は、自動車の騒音を除外音声として設定し、またマイクロホンを電話機近傍に設置した場合は、電話機の呼び出し音を除外音声として設定する。これにより、各マイクロホン毎に、平常時の周辺音声の種類を減少させることができ、異常時の周辺音声の検出精度を高めることができる。
【００６５】
【発明の効果】
以上説明した様に本発明によれば、周辺音声と除外音声が一致するときには、平常時とみなし、周辺音声と除外音声が異なるときには、異常時とみなしている。そして、周辺音声と除外音声が異なる異常時には、撮影手段により撮影された画像を画像記憶手段に記憶させる。このため、画像記憶手段に記憶された画像を再生すれば、異常時の様子を知ることができる。また、異常時の画像を記憶するだけであるから、画像記憶手段として容量の小さなものを適用することができる。
【００６６】
また、平常時の店舗や住宅で検出し得る音声は、除外音声として、除外音声記憶手段に予め記憶させ易く、多様な音声があったとしても、これらの音声を除外音声記憶手段に予め記憶しておくことが可能である。そして、除外音声として、多様な音声を除外音声記憶手段に予め記憶しておくことにより、異常時の判定精度を高めることができる。
【００６７】
これに対して異常時の音声は、平常時に発生しないため、予測し難く、予め記憶しておくことが困難である。従って、従来の様に異常時の音声に基づいて、異常時の判定精度を高めることは困難である。
【００６８】
また、多数の各種センサーを必用としないので、コストの低減を図ることができる。
【図面の簡単な説明】
【図１】本発明の監視装置の一実施形態を示すブロック図である。
【図２】図１の監視装置における除去音声データテーブルを概念的に示す図である。
【図３】図１の監視装置により監視される住宅の室内を例示する図である。
【図４】図１の監視装置におけるコマンド音声データテーブルを概念的に示す図である。
【図５】図１の監視装置による室内の監視手順を示すフローチャートである。
【図６】図１の監視装置における監視データテーブルを概念的に示す図である。
【図７】図６の監視データテーブルの画像及び周辺音声を確認するための手順を示すフローチャートである。
【符号の説明】
１１監視カメラ
１２マイクロホン
１３記憶装置
１４音声分析比較装置
１５時計
１６通報装置
１７表示装置
１８音声再生装置
１９操作パネル
２１主制御装置
２２バス
３１除去音声データテーブル
３２コマンド音声データテーブル
３３監視データテーブル
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a monitoring device for monitoring a store or a house.
[0002]
[Prior art]
As is well known, various monitoring devices have been proposed.
[0003]
For example, Patent Document 1 discloses a technique in which, when an abnormality is detected by a measuring instrument or the like installed in a plant, sound input from a microphone and images taken by a surveillance camera are recorded, and the sound and images are later played back to confirm the abnormality that occurred in the plant.
[0004]
Patent Document 2 discloses a technique in which an impact sound sensor and a surveillance camera are installed at an intersection; the impact sound of an accident at the intersection is detected by the impact sound sensor, the surveillance camera is activated in response to the sensor's detection output, an image of the intersection at the time of the accident is captured by the surveillance camera and recorded on a video deck, and the image of the accident is later played back.
[0005]
Furthermore, Patent Document 3 discloses a technique in which various abnormal sounds are extracted and detected, the direction of a surveillance camera is controlled according to the level of these abnormal sounds, and the location of the abnormality is photographed by the surveillance camera.
[0006]
[Patent Document 1]
JP 7-325990 A
[Patent Document 2]
JP 8-116528 A
[Patent Document 3]
JP 2002-123878 A
[0007]
[Problems to be solved by the invention]
However, when monitoring a store or a house is assumed, the techniques of Patent Documents 1 to 3 have the following problems.
[0008]
In Patent Document 1, an abnormality is detected by a measuring instrument or the like, and images taken by a surveillance camera are recorded in response to the detection output. Applying this to a store or a house therefore requires measuring instruments or the like (sensors) for detecting abnormalities there. However, because the abnormalities that can occur in stores and houses are diverse, a large number of sensors of various kinds must be installed; the system that includes these sensors becomes large and costly, and is not suitable for general use.
[0009]
In Patent Document 2, the surveillance camera is activated in response to the detection output of the impact sound sensor. Therefore, the monitoring camera is activated when a very loud impact sound is generated. However, in stores, houses, etc., when an abnormality occurs, a very loud impact sound is not always generated, and it is not possible to appropriately deal with various abnormalities.
[0010]
Further, in Patent Document 3, various abnormal sounds are extracted and detected, and the direction of the surveillance camera is controlled according to the level of these abnormal sounds. Therefore, to monitor a store or a house, it is necessary to predict what abnormal sounds will occur there when an abnormality happens. However, it is difficult to predict in advance the abnormal sounds that a burglar or the like will make, so this approach cannot respond appropriately.
[0011]
It is also conceivable to install a surveillance camera that photographs the store or house, keep it shooting while the premises are unoccupied, and record the images continuously. In this case, however, the recording time becomes long, so the capacity of the storage device must be increased and the cost rises. In addition, reviewing the images takes longer and managing them becomes cumbersome, which is not practical.
[0012]
The present invention has therefore been made in view of the above conventional problems, and its object is to provide a monitoring device that keeps costs low and copes appropriately with a variety of abnormalities.
[0013]
[Means for Solving the Problems]
In order to solve the above problems, the present invention comprises photographing means for photographing a monitoring target area; image storage means; sound detecting means for detecting ambient sound; excluded sound storage means for storing preset excluded sounds; and control means for comparing the ambient sound detected by the sound detecting means with the excluded sounds in the excluded sound storage means and, when the ambient sound differs from the excluded sounds, storing the image photographed by the photographing means in the image storage means.
[0014]
According to the present invention configured in this way, when the ambient sound detected by the sound detecting means differs from the excluded sounds in the excluded sound storage means, the image photographed by the photographing means is stored in the image storage means. The excluded sounds are set to, for example, sounds that can be detected in a store or house in normal times. Thus, when the ambient sound matches an excluded sound, the situation can be regarded as normal, and when it differs, as abnormal. In the abnormal case, the image photographed by the photographing means is stored in the image storage means, so that playing back the stored image reveals the state at the time of the abnormality.
[0015]
For example, when monitoring a store or a house, a visitor notification sound, a telephone ringing tone, and the sounds of nearby cars and trains are set as excluded sounds. In normal times, when a visitor notification sound, a telephone ringing tone, or the sound of a nearby car or train is detected, the ambient sound detected by the sound detecting means matches an excluded sound in the excluded sound storage means, and the image photographed by the photographing means is not stored in the image storage means. At the time of an abnormality, when any other sound is detected, the detected ambient sound differs from the excluded sounds, the image photographed by the photographing means is stored in the image storage means, and the state at the time of the abnormality can be learned by playing back this image.
[0016]
In addition, sounds that can be detected in a store or house in normal times are easy to store in advance in the excluded sound storage means as excluded sounds, and even if there are many kinds of such sounds, they can all be stored in advance. Storing a wide variety of sounds as excluded sounds in advance improves the accuracy with which an abnormality is determined.
[0017]
On the other hand, since the sound at the time of abnormality does not occur at normal times, it is difficult to predict and is difficult to store in advance. Therefore, it is difficult to improve the determination accuracy at the time of abnormality based on the sound at the time of abnormality as in the prior art.
[0018]
Further, the present invention includes command voice storage means for storing preset command voices, and the control means compares the ambient sound detected by the sound detecting means with the command voices in the command voice storage means and starts or ends monitoring when the ambient sound matches a command voice.
[0019]
For example, when monitoring a store or a house, a person's utterances of “Ittekimasu” (“I'm off”) and “Tadaima” (“I'm home”) are stored as command voices in the command voice storage means. Monitoring is started when the ambient sound detected by the sound detecting means matches the command voice “Ittekimasu”, and monitoring is ended when it matches the command voice “Tadaima”. This makes it possible to start and end monitoring without any special operation, prevents the monitoring state from continuing needlessly or the premises from being inadvertently left unmonitored, and makes effective use of the capacity of the image storage means.
[0020]
Furthermore, in the present invention, the control means does not store the image photographed by the photographing means in the image storage means when the ambient sound matches a command voice.
[0021]
This makes explicit that no image is stored in the image storage means when the ambient sound matches a command voice.
[0022]
Further, the present invention includes ambient sound recording means for recording the ambient sound when the ambient sound detected by the sound detecting means differs from the excluded sounds in the excluded sound storage means. When the control means is instructed to erase an image that was stored in the image storage means because the ambient sound differed from the excluded sounds, it stores that ambient sound from the ambient sound recording means in the excluded sound storage means as a new excluded sound.
[0023]
When the ambient sound differs from the excluded sounds, the situation is regarded as abnormal, and the image photographed by the photographing means is stored in the image storage means. However, if playing back the stored image shows nothing but a normal scene, then the ambient sound was in fact a normal one; it was treated as abnormal, and the image was stored, only because that sound had not been stored in the excluded sound storage means. Images showing a normal scene are usually erased. Therefore, when the ambient sound differs from the excluded sounds, not only is the image stored in the image storage means, but the ambient sound is also recorded in the ambient sound recording means; when erasure of the image is later instructed, the recorded ambient sound is stored in the excluded sound storage means as a new excluded sound. From then on, when the same ambient sound occurs again, it matches an excluded sound and the image photographed by the photographing means is no longer stored. This improves monitoring accuracy and makes more effective use of the capacity of the image storage means.
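The erase-and-learn behaviour described in this paragraph can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sound has already been reduced to a string label and that matching is plain equality; the specification does not say how sounds are compared, and the class and method names are hypothetical.

```python
class ExclusionLearningMonitor:
    """Toy model of the erase-and-learn rule; sounds are string labels (an assumption)."""

    def __init__(self, excluded):
        self.excluded = set(excluded)   # stands in for the excluded sound storage means
        self.records = []               # stored (sound, image) pairs

    def on_sound(self, sound, image):
        """Store the image and sound only when the sound is not an excluded sound."""
        if sound in self.excluded:
            return False
        self.records.append((sound, image))
        return True

    def erase(self, index):
        """Erase a stored record and register its sound as a new excluded sound."""
        sound, _image = self.records.pop(index)
        self.excluded.add(sound)
```

After a record is erased, the same sound no longer triggers storage, which is the capacity-saving effect the paragraph describes.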
[0024]
Further, in the present invention, a plurality of sound detecting means are provided, the excluded sound storage means stores preset excluded sounds corresponding to each sound detecting means, and the control means, for each sound detecting means separately, compares the ambient sound detected by that sound detecting means with the excluded sounds in the excluded sound storage means that correspond to it and, when the ambient sound differs from those excluded sounds, stores the image photographed by the photographing means in the image storage means.
[0025]
The sound detecting means are installed at various places in a store or a house, and each detects the sound at its own location. The excluded sounds that can be detected at the installation location of each sound detecting means are stored in advance in association with that sound detecting means. Then, for each sound detecting means separately, the ambient sound it detects is compared with the excluded sounds in the excluded sound storage means that correspond to it. In this case, the normal-time excluded sounds are specified per installation location, so the number of kinds of excluded sounds for each location can be reduced and monitoring accuracy improved.
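As a rough illustration of keeping a separate set of excluded sounds for each sound detecting means, the sketch below uses a dictionary keyed by a microphone identifier. The identifiers, the string labels for sounds, and the function name are assumptions made for the example and do not come from the specification.

```python
def is_abnormal(mic_id, sound, excluded_by_mic):
    """Return True when `sound` is not among the excluded sounds of that microphone."""
    return sound not in excluded_by_mic.get(mic_id, set())


# Hypothetical installation: one microphone by a roadside window, one near the telephone.
excluded_by_mic = {
    "window": {"car noise"},
    "telephone": {"telephone ring"},
}
```

With this layout, car noise picked up by the window microphone is ignored, while the same noise picked up near the telephone would be treated as abnormal.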
[0026]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
[0027]
FIG. 1 is a block diagram showing an embodiment of the monitoring device of the present invention. The monitoring device of this embodiment is installed in a store or a house and comprises a surveillance camera 11 that photographs a room or the like; a microphone 12 that detects ambient sound in the room or the like; a storage device 13 that stores the images taken by the surveillance camera 11 and the ambient sound detected by the microphone 12; a sound analysis and comparison device 14 that analyzes the ambient sound detected by the microphone 12; a clock 15 that keeps the current date and time; a notification device 16 that reports the occurrence of an abnormality to an external terminal over a telephone line; a display device 17 such as a CRT or liquid crystal display; an audio playback device 18; an operation panel 19 comprising a keyboard and the like; a main control device 21 that controls the monitoring device as a whole; and a bus 22 that interconnects the parts of the monitoring device.
[0028]
The sound analysis and comparison device 14 stores an excluded sound data table 31, shown in FIG. 2, in which numbers, names, and excluded sound data are associated with one another. Each item of excluded sound data represents an ambient sound (hereinafter also called an excluded sound) that the microphone 12 can detect in normal times in a room such as the one shown in FIG. 3, for example car noise, train noise, or a telephone ringing tone.
[0029]
The sound analysis and comparison device 14 also stores a command voice data table 32, shown in FIG. 4, in which numbers, names, and command voice data are associated with one another. Each item of command voice data represents a person's utterance, such as “Ittekimasu” (“I'm off”) or “Tadaima” (“I'm home”), that the microphone 12 can detect in a room such as the one shown in FIG. 3.
[0030]
Both the excluded sound data and the command voice data are obtained by converting a sound signal from the microphone 12.
[0031]
For example, when the main control device 21 is instructed, through operation of the operation panel 19, to sample excluded sound data while car noise is occurring, the main control device 21 activates the microphone 12 and the sound analysis and comparison device 14, the car noise is sampled, and excluded sound data representing the car noise is generated. At this time, the microphone 12 detects the car noise and outputs a sound signal representing it to the sound analysis and comparison device 14. The sound analysis and comparison device 14 digitizes the sound signal to generate excluded sound data representing the car noise and registers this data in the excluded sound data table 31.
[0032]
Similarly, when the main control device 21 is instructed, through operation of the operation panel 19, to sample a command voice while a person is saying “Ittekimasu”, the main control device 21 activates the microphone 12 and the sound analysis and comparison device 14, the utterance “Ittekimasu” is sampled, and command voice data is generated. At this time, the microphone 12 detects the utterance and outputs a sound signal representing it to the sound analysis and comparison device 14. The sound analysis and comparison device 14 digitizes the sound signal to generate command voice data representing the utterance “Ittekimasu” and registers this data in the command voice data table 32.
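The registration of sampled data into the two tables might be sketched as follows. The row layout (number, name, data) follows the table description above; the function name, the use of a list of dictionaries, and the sample values are assumptions for illustration only.

```python
def register_sample(table, name, samples):
    """Append a (number, name, data) row to a table and return the assigned number."""
    number = len(table) + 1
    table.append({"number": number, "name": name, "data": list(samples)})
    return number


excluded_table = []   # stands in for the excluded sound data table 31
command_table = []    # stands in for the command voice data table 32

# Hypothetical digitized samples; real data would come from the microphone.
register_sample(excluded_table, "car noise", [0.1, -0.2, 0.15])
register_sample(command_table, "Ittekimasu", [0.3, 0.1, -0.4])
```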
[0033]
Next, the indoor monitoring procedure by the monitoring apparatus having such a configuration will be described with reference to the flowchart shown in FIG.
[0034]
First, in the standby state, each time the microphone 12 detects ambient sound, it outputs a sound signal representing that sound to the sound analysis and comparison device 14. Each time the sound analysis and comparison device 14 receives a sound signal from the microphone 12, it digitizes the signal to generate ambient sound data and determines whether this data matches the command voice data for the utterance “Ittekimasu” in the command voice data table 32 (step S101). In this way it is determined whether the ambient sound matches the utterance “Ittekimasu” registered in the command voice data table 32.
[0035]
If the ambient sound does not match the utterance “Ittekimasu” (“No” in step S101), the sound analysis and comparison device 14 remains in the standby state. If the ambient sound matches the utterance “Ittekimasu” (“Yes” in step S101), the sound analysis and comparison device 14 notifies the main control device 21 of this.
[0036]
When the ambient sound matches the utterance “Ittekimasu”, the main control device 21 sets the monitoring state (step S102).
[0037]
In this monitoring state, the sound analysis and comparison device 14 converts the sound signal from the microphone 12 into ambient sound data and determines whether this data matches the command voice data for the utterance “Tadaima” in the command voice data table 32 (step S103). In this way it is determined whether the ambient sound matches the utterance “Tadaima” registered in the command voice data table 32.
[0038]
If the ambient sound matches the utterance “Tadaima” (“Yes” in step S103), the sound analysis and comparison device 14 notifies the main control device 21 of this.
[0039]
When the ambient sound matches the utterance “Tadaima”, the main control device 21 ends the monitoring state (step S104) and returns to the standby state of step S101.
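Steps S101 to S104 amount to a two-state machine driven by the two command voices. The sketch below assumes that each detected sound has already been recognized and reduced to a string label, which the specification does not describe; the enum, function, and label names are hypothetical.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    MONITORING = "monitoring"


def next_state(state, heard, start_cmd="ittekimasu", end_cmd="tadaima"):
    """Return the state after one recognized sound label is heard."""
    if state is State.STANDBY and heard == start_cmd:
        return State.MONITORING   # corresponds to step S102
    if state is State.MONITORING and heard == end_cmd:
        return State.STANDBY      # corresponds to step S104
    return state                  # any other sound leaves the state unchanged
```

Note that saying “Tadaima” in the standby state, or “Ittekimasu” while already monitoring, has no effect in this sketch.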
[0040]
If the ambient sound does not match the utterance “Tadaima” (“No” in step S103), the sound analysis and comparison device 14 determines, based on the ambient sound data, whether the level of the ambient sound is at or above a preset threshold (step S105). If the level is below the threshold (“No” in step S105), the process returns to step S103.
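The specification does not say how the level of the ambient sound is measured. One common choice, used here only as an assumption, is the root-mean-square (RMS) amplitude of the digitized samples; the function names and the threshold value are hypothetical.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def at_or_above_threshold(samples, threshold):
    """Step S105 as a predicate: True when the level reaches the threshold."""
    return bool(samples) and rms_level(samples) >= threshold
```

An empty sample sequence is treated as below the threshold so that the sketch never divides by zero.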
[0041]
If the level of the ambient sound is at or above the threshold (“Yes” in step S105), the sound analysis and comparison device 14 determines whether the ambient sound data matches any of the excluded sound data in the excluded sound data table 31 (step S106). In this way it is determined whether the ambient sound matches any of the excluded sounds registered in the excluded sound data table 31.
[0042]
If the surrounding sound matches any of the excluded sounds (“Yes” in step S106), the voice analysis/comparison device 14 regards the surrounding sound as normal automobile noise, train noise, telephone ringing, or the like, and returns to step S103.
[0043]
If, on the other hand, the surrounding sound does not match any of the excluded sounds (“No” in step S106), that is, if it matches none of the normal automobile noise, train noise, telephone ringing, and the like, the voice analysis/comparison device 14 notifies the main controller 21. In response, the main controller 21 activates the surveillance camera 11 and the storage device 13.
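The two checks of steps S105 and S106 can be illustrated with the following sketch. It is not the comparison method of the voice analysis/comparison device 14, which the embodiment leaves open: here the sound level is assumed to be a single number, each sound is assumed to be reduced to a label, and the name `should_record` is hypothetical.

```python
from typing import Collection


def should_record(level: float, threshold: float,
                  sound_label: str, excluded_sounds: Collection[str]) -> bool:
    """Return True when the camera and the storage device should be activated."""
    if level < threshold:
        return False   # "No" in step S105: the sound is too quiet
    if sound_label in excluded_sounds:
        return False   # "Yes" in step S106: a registered normal sound
    return True        # "No" in step S106: presumed abnormal


# Hypothetical contents of the excluded sound data table 31.
excluded = {"automobile noise", "train noise", "telephone ringing"}
print(should_record(0.8, 0.5, "glass breaking", excluded))
print(should_record(0.8, 0.5, "train noise", excluded))
```

Checking the level first means that quiet sounds are never compared with the excluded sounds, which mirrors the order of the steps in the flowchart.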
[0044]
When activated, the surveillance camera 11 photographs the room and outputs the image data to the storage device 13 (step S107). The storage device 13 stores the image data from the surveillance camera 11 together with the current date and time measured by the clock 15 (step S108). The storage device 13 also receives, from the voice analysis/comparison device 14, the surrounding sound data representing the surrounding sound that matched none of the excluded sounds, and stores it together with the image data from the surveillance camera 11.
[0045]
At this time, the storage device 13 stores the current date and time measured by the clock 15, the image data of the room from the surveillance camera 11, and the surrounding sound data from the voice analysis/comparison device 14 in association with one another, thereby forming a monitoring data table 33 as shown in FIG.
[0046]
Here, the surveillance camera 11 takes a plurality of still images at regular intervals, and a still image file containing the still image data is stored in the monitoring data table 33. The surrounding sound data is stored in the monitoring data table 33 as an audio file. The still image file and the audio file are stored together with a number, the current date and time, a reproduction flag indicating whether the still image file and the audio file have been reproduced, and other information.
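One entry of the monitoring data table 33 could be modelled as in the sketch below. The field names, the use of file names as strings, and the helper `append_record` are assumptions made for illustration; the embodiment only lists which items are stored together.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MonitoringRecord:
    number: int               # sequential number used when reproducing
    recorded_at: datetime     # date and time measured by the clock 15
    still_image_file: str     # file holding the still images taken at regular intervals
    audio_file: str           # file holding the surrounding sound data
    reproduced: bool = False  # reproduction flag, initially "not reproduced"


def append_record(table: List[MonitoringRecord], recorded_at: datetime,
                  still_image_file: str, audio_file: str) -> MonitoringRecord:
    """Add a record numbered one higher than the current number of records."""
    record = MonitoringRecord(len(table) + 1, recorded_at,
                              still_image_file, audio_file)
    table.append(record)
    return record
```

Numbering the records from 1 matches the reproduction procedure, which starts at number n = 1.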
[0047]
Note that moving image data captured by the surveillance camera 11 may be stored instead of the plurality of still images.
[0048]
In this way, the monitoring state is set when the voice of a person saying “Bye” is detected, and monitoring ends when the voice of a person saying “I'm home” is detected in the monitoring state. If, in the monitoring state, a surrounding sound other than the voice of a person saying “I'm home” is detected, its level is equal to or higher than the threshold, and it matches none of the normal automobile noise, train noise, telephone ringing, and the like, an image of the room and the surrounding sound are stored in the storage device 13. Conversely, even if a surrounding sound other than the voice of a person saying “I'm home” is detected, the image of the room and the surrounding sound are not stored if the level of the surrounding sound is below the threshold or if the surrounding sound matches normal automobile noise, train noise, telephone ringing, or the like.
[0049]
Here, because the surrounding sound matches none of the normal automobile noise, train noise, telephone ringing, and the like, it can be presumed to be abnormal. Accordingly, the image of the room and the surrounding sound stored in the storage device 13 can also be presumed to be abnormal.
[0050]
Next, the procedure for checking the image and the surrounding sound in the storage device 13 that are presumed to be abnormal will be described with reference to the flowchart shown in FIG.
[0051]
First, when reproduction of the still image file and the audio file in the storage device 13 is instructed through the operation panel 19, the main controller 21 determines whether a still image file and an audio file are stored in the storage device 13 (step S201). If none are stored (“No” in step S201), the main controller 21 ends this process (step S202).
[0052]
If a still image file and an audio file are stored (“Yes” in step S201), the main controller 21 sets the number n = 1 (step S203) and reads the date and time, the still image file, and the audio file corresponding to the number n (= 1) from the storage device 13. It supplies the date and time and the still image data of the still image file to the display device 17, and supplies the surrounding sound data of the audio file to the audio playback device 18 (step S204). The display device 17 displays the date and time and sequentially displays, at a constant period, the still images of the room represented by the still image data. The audio playback device 18 reproduces the surrounding sound represented by the surrounding sound data.
[0053]
As described above, the image of the room and the surrounding sound stored in the storage device 13 are presumed to be abnormal. Whether an abnormality has actually occurred in the room is therefore checked on the basis of the still images displayed on the display device 17 and the surrounding sound reproduced by the audio playback device 18. When an abnormality is confirmed, the date and time displayed on the display device 17 are regarded as the date and time at which the abnormality occurred.
[0054]
Next, when reproduction of the next still image file and audio file is instructed through the operation panel 19 (“Yes” in step S205), the main controller 21 determines whether the still image file and audio file of number (n + 1) are stored in the storage device 13 (step S206). If they are not stored (“No” in step S206), this process ends (step S202).
[0055]
If the still image file and audio file of number (n + 1) are stored (“Yes” in step S206), the main controller 21 updates the number to n = (n + 1) (step S207) and returns to step S204. As a result, the date and time, the still image file, and the audio file corresponding to the number n (= 2) are read from the storage device 13, the date and time and the still images represented by the still image data of the still image file are displayed on the display device 17, and the surrounding sound represented by the surrounding sound data of the audio file is reproduced by the audio playback device 18.
[0056]
When reproduction of the previous still image file and audio file is instructed through the operation panel 19 (“No” in step S205, “Yes” in step S208), the main controller 21 determines whether the number (n − 1) = 0 (step S209). If (n − 1) = 0 (“Yes” in step S209), this process ends (step S202).
[0057]
If (n − 1) = 0 is not satisfied (“No” in step S209), the main controller 21 updates the number to n = (n − 1) (step S210) and then returns to step S204. As a result, the date and time, the still image file, and the audio file corresponding to the number n are read from the storage device 13, the date and time and the still images represented by the still image data of the still image file are displayed on the display device 17, and the surrounding sound represented by the surrounding sound data of the audio file is reproduced by the audio playback device 18.
[0058]
Likewise, whenever the next or the previous still image file and audio file are requested through the operation panel 19, the requested still image file and audio file are reproduced.
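The selection of the next or previous record in steps S205 to S210 can be sketched as follows, under the assumption that the records are numbered consecutively from 1 up to the number of stored records. The function name `navigate` and the command strings are hypothetical.

```python
from typing import Optional


def navigate(n: int, stored_count: int, command: str) -> Optional[int]:
    """Return the record number to reproduce next, or None when the process ends."""
    if command == "next":
        # Step S206: is record (n + 1) stored? If so, step S207 updates n.
        return n + 1 if n + 1 <= stored_count else None
    if command == "previous":
        # Step S209: does (n - 1) equal 0? If not, step S210 updates n.
        return n - 1 if n - 1 != 0 else None
    raise ValueError(f"unknown command: {command}")


print(navigate(1, 3, "next"))      # moves from record 1 to record 2
print(navigate(1, 3, "previous"))  # no record 0 exists, so the process ends
```

Returning `None` corresponds to ending the process in step S202.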
[0059]
Further, when deletion of the still image file and audio file is instructed through the operation panel 19 following their reproduction (“No” in step S205, “No” in step S208, “Yes” in step S211), the main controller 21 registers the surrounding sound data of the audio file as excluded sound data in the excluded sound data table 31 of the voice analysis/comparison device 14 (step S212), deletes the still image file and the audio file from the storage device 13 (step S213), and ends this process (step S202).
[0060]
Here, when deletion of the still image file and the audio file is instructed following their reproduction, the still images represented by the still image data of the still image file and the surrounding sound represented by the surrounding sound data of the audio file can be regarded as normal rather than abnormal.
[0061]
Therefore, the surrounding sound data is registered as excluded sound data in the excluded sound data table 31 of the voice analysis/comparison device 14 so that, when the same surrounding sound is detected again in a subsequent monitoring state, it is judged to match an excluded sound registered in the excluded sound data table 31 and the surveillance camera 11 is not activated. As a result, no still image file or audio file is stored in the storage device 13 in response to the same normal surrounding sound, the monitoring accuracy improves, and the capacity of the storage device 13 can be used more effectively.
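The effect of steps S212 and S213 can be illustrated with a short sketch. The dictionary-based table, the `"sound"` key, and the function name `delete_and_exclude` are assumptions for illustration only; sounds are again reduced to labels rather than audio data.

```python
from typing import Dict, List


def delete_and_exclude(number: int,
                       monitoring_table: Dict[int, dict],
                       excluded_sounds: List[str]) -> None:
    """Register the record's sound as excluded, then delete the record."""
    record = monitoring_table[number]
    excluded_sounds.append(record["sound"])  # step S212: register as excluded sound
    del monitoring_table[number]             # step S213: delete the stored files


# Hypothetical example: the user judged the stored sound to be normal.
table = {1: {"image": "img1.dat", "sound": "air conditioner"}}
excluded = ["automobile noise"]
delete_and_exclude(1, table, excluded)
```

After the call, the same sound label is found among the excluded sounds, so a later detection of that sound would no longer cause an image to be stored.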
[0062]
As described above, in this embodiment, the monitoring state is set in response to the voice of a person saying “Bye”, and monitoring ends in response to the voice of a person saying “I'm home”. In the monitoring state, if the surrounding sound matches none of the normal automobile noise, train noise, telephone ringing, and the like, the surrounding sound is presumed to be abnormal, and an image of the room or the like and the surrounding sound are stored in the storage device 13. Whether an abnormality has occurred can therefore be checked by reproducing the image and the surrounding sound in the storage device 13.
[0063]
Further, when deletion of the image and the surrounding sound is instructed after their reproduction from the storage device 13, the image and the surrounding sound are regarded as normal rather than abnormal, and the surrounding sound is registered as an excluded sound. In subsequent monitoring states, the same surrounding sound is therefore no longer presumed to be abnormal, and images of the room and the surrounding sound are not stored needlessly.
[0064]
The present invention is not limited to the above embodiment and can be modified in various ways. For example, various known devices can be used as the storage device for images and surrounding sounds. A plurality of surveillance cameras and a plurality of microphones may be installed. When a plurality of microphones are installed, the normal surrounding sound varies with the installation position of each microphone, so an excluded sound data table may be set for each microphone. For example, automobile noise is set as an excluded sound for a microphone installed at a window of a house facing a road, and a telephone ringing tone is set as an excluded sound for a microphone installed near a telephone. As a result, the number of kinds of normal surrounding sounds for each microphone can be reduced, and the accuracy of detecting abnormal surrounding sounds can be increased.
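The per-microphone variation could be sketched as follows. The microphone identifiers, the sound labels, and the names `EXCLUDED_BY_MICROPHONE` and `is_abnormal` are hypothetical; the sketch only shows that each microphone is compared against its own excluded sounds.

```python
from typing import Dict, Set

# Hypothetical excluded sound data tables, one per microphone.
EXCLUDED_BY_MICROPHONE: Dict[str, Set[str]] = {
    "window": {"automobile noise"},          # microphone at a window facing a road
    "near_telephone": {"telephone ringing"}  # microphone installed near a telephone
}


def is_abnormal(microphone: str, sound_label: str) -> bool:
    """Compare a sound only with the excluded sounds of the detecting microphone."""
    excluded = EXCLUDED_BY_MICROPHONE.get(microphone, set())
    return sound_label not in excluded


print(is_abnormal("window", "automobile noise"))
print(is_abnormal("near_telephone", "automobile noise"))
```

With this arrangement, automobile noise picked up near the telephone is still treated as abnormal, because it is excluded only for the microphone at the window.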
[0065]
[Effect of the invention]
As described above, according to the present invention, the state is regarded as normal when the surrounding sound matches an excluded sound and as abnormal when the surrounding sound differs from the excluded sounds. When the surrounding sound differs from the excluded sounds, the image photographed by the photographing means is stored in the image storage means. Reproducing the image stored in the image storage means therefore reveals the state at the time of the abnormality. In addition, since only images taken at the time of an abnormality are stored, an image storage means with a small capacity can be used.
[0066]
In addition, sounds that can be detected in a store or house during normal times are easy to store in advance in the excluded sound storage means as excluded sounds, and even when there are many kinds of such sounds, they can all be stored in the excluded sound storage means in advance. Storing various sounds as excluded sounds in the excluded sound storage means in advance makes it possible to improve the accuracy of abnormality determination.
[0067]
On the other hand, sounds that occur at the time of an abnormality do not occur in normal times, so they are difficult to predict and difficult to store in advance. It is therefore difficult to improve the accuracy of abnormality determination on the basis of abnormal sounds, as the prior art attempts to do.
[0068]
Moreover, since a large number of different sensors are not required, cost can be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an embodiment of a monitoring device of the present invention.
FIG. 2 is a diagram conceptually showing an excluded sound data table in the monitoring device of FIG. 1.
FIG. 3 is a diagram illustrating the interior of a house monitored by the monitoring device of FIG. 1.
FIG. 4 is a diagram conceptually showing a command voice data table in the monitoring device of FIG. 1.
FIG. 5 is a flowchart showing an indoor monitoring procedure performed by the monitoring device of FIG. 1.
FIG. 6 is a diagram conceptually showing a monitoring data table in the monitoring device of FIG. 1.
FIG. 7 is a flowchart showing a procedure for checking an image and surrounding sound in the monitoring data table of FIG. 6.
[Explanation of symbols]
11 Surveillance camera
12 Microphone
13 Storage device
14 Voice analysis and comparison device
15 Clock
16 Reporting device
17 Display device
18 Audio playback device
19 Operation panel
21 Main controller
22 Bus
31 Excluded sound data table
32 Command voice data table
33 Monitoring data table

Claims

A monitoring apparatus comprising:
photographing means for photographing a monitored area;
image storage means;
sound detection means for detecting surrounding sound;
excluded sound storage means for storing preset excluded sounds; and
control means for comparing the surrounding sound detected by the sound detection means with the excluded sounds in the excluded sound storage means, and for causing the image storage means to store an image photographed by the photographing means when the surrounding sound differs from the excluded sounds.

The monitoring apparatus according to claim 1, further comprising command voice storage means for storing a preset command voice,
wherein the control means compares the surrounding sound detected by the sound detection means with the command voice in the command voice storage means, and starts or ends monitoring when the surrounding sound matches the command voice.

The monitoring apparatus according to claim 2, wherein the control means does not cause the image storage means to store the image photographed by the photographing means when the surrounding sound matches the command voice.

The monitoring apparatus according to claim 1, further comprising surrounding sound recording means for recording the surrounding sound when the surrounding sound detected by the sound detection means differs from the excluded sounds in the excluded sound storage means,
wherein, when instructed to erase an image that was stored in the image storage means when the surrounding sound differed from the excluded sounds, the control means causes the excluded sound storage means to store the surrounding sound in the surrounding sound recording means as a new excluded sound.

The monitoring apparatus according to claim 1, wherein a plurality of sound detection means are provided,
the excluded sound storage means stores preset excluded sounds corresponding to each of the sound detection means, and
the control means compares, for each sound detection means, the surrounding sound detected by that sound detection means with the excluded sounds corresponding to that sound detection means in the excluded sound storage means, and causes the image storage means to store the image photographed by the photographing means when the surrounding sound differs from the excluded sounds.