JP2004158919A

JP2004158919A - Network camera system, network camera thereof, and data transmission method

Info

Publication number: JP2004158919A
Application number: JP2002320146A
Authority: JP
Inventors: Yuji Arima; 祐二有馬; Tadashi Yoshigai; 規吉貝; Toshiyuki Kihara; 寿之木原
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-11-01
Filing date: 2002-11-01
Publication date: 2004-06-03

Abstract

PROBLEM TO BE SOLVED: To provide a network camera system, a network camera for it, and a data transmission method that can sufficiently synchronize images with audio without the audio being interrupted and without being affected by compression and decompression processing.

SOLUTION: Network cameras 2, 2a, 2b, and 2c select an insertion position in image file format data included in a web page, insert audio data into the header area of the image file format data, insert image data into the data area of the image file format data, and transmit the web page to a network terminal 1.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、カメラ及びマイクで取得した画像及び音声を所定の形式で送信するネットワークカメラシステム、及びそのとき使用するネットワークカメラ、またそのとき採用されるデータ送信方法に関する。
【０００２】
【従来の技術】
最近のデジタル技術とネットワーク技術の進歩は目覚しく、ネットワーク端末、例えばパソコン等をインターネットに接続して多くのウェブサーバから画像や音声を同時に受信して再生することが行われている。
【０００３】
ところで、このような従来のウェブサーバが画像と音声を送信する場合には、一般に画像と音声のそれぞれのデータにタイムスタンプ、すなわち時間情報による同期情報を付加して送信することが行われていた。例えば、テレビ会議システムを例として挙げているが、画像データと音声データに各々時間データを持たせて同期を取る端末装置が提案されている（特許文献１参照）。この端末装置は、音声、画像両データに時間制御による同期情報をもたせ、送信側ユーザの意図通り同期のとれたデータを作成し、受信側で同期情報を持ったデータを再生し、音声、画像両データを同期出力するものである。図７（ａ）は従来の同期情報による画像と音声の同期方式の受信側端末装置の構成図、図７（ｂ）は従来の同期情報による画像と音声の同期方式の送信側端末装置の構成図、図８は従来の送信側端末から送信する音声データと静止画データのタイムチャートである。
【０００４】
図７（ａ），（ｂ）において、この受信側は、通信部１０１と音声か画像かを判定するデータ分離部１０２と音声処理部１０３と音声格納部１０４と画像処理部１０５と画像格納部１０６と音声、画像データに付加された同期情報の抽出に基づき出力を制御する統合処理部１０７と同期情報記憶部１０８と時間情報の取得とデータの出力時間を管理する時間制御部１０９と音声出力部１１０と画像出力部１１１とを備えており、送信側は、音声入力部１１２と画像入力部１１３と入力データの合成部１１４と同期情報付加部１１５と同期情報入力部１１６と時間情報の制御部１１７とデータ送信部１１８と通信部１１９とを備えた構成である。
【０００５】
この（特許文献１）の送信側端末装置は、図８に示すように時間ｔ１で音声データが送信されると、時間ｔ２になるとｔ２の同期情報をもつ画像データ２，３を送信する。音声データ、画像データはどれぞれヘッダ情報、データ、終了コードから構成されており、ヘッダ情報にはデータの種類、データの大きさ、動機情報が収められる。
【０００６】
これを受信側端末装置が受信し、音声か画像かでデータを分離して同期情報の時間に合わせて出力を開始する。音声はデータの長さが決まっているが、画像データは出力時間が決まっていない。従って、ネットワークのトラフィック負荷が大きい場合、この端末装置では画像データと音声データのすべてを送信することが困難になる。そこでこのような場合、（特許文献１）の端末装置は、時間制御部１０９がデータを間引く処理を行っている。この結果、画像の一部とともに、音声の一部がカットされ、音声が途切れ途切れになってしまう。音声の途切れ途切れは聞き辛く、情報の伝達を大きく損なう。
【０００７】
以上説明した時間情報による同期の他に、（特許文献２）に示すようなフレーム番号を画像データと音声データに付加して同期をとるタイムスタンプ方式などが存在するが、タイムスタンプやフレーム番号を画像データ及び音声データに各々付加する必要があり構成が複雑となるばかりでなく、いずれもネットワークのトラフィック負荷が大きい場合、この端末装置ではすべての画像データと音声データを送信することは困難である。当然、音声は途切れ途切れとなるし、これら（特許文献１）（特許文献２）端末装置の同期をとる構成はいずれも複雑で、コスト高になるものであった。
【０００８】
【特許文献１】
特開平９−２７８７１号公報
【特許文献２】
特開平９−９３５５３号公報
【０００９】
【発明が解決しようとする課題】
以上説明したように、従来画像と音声を送信する場合に、各データに時間情報による同期情報を付加したり、フレーム番号を画像と音声のデータに付加して同期を取ったりして同期をとることが行われてきた。
【００１０】
しかし、ネットワークのトラフィック負荷が大きい場合、これらの同期をとる方式では画像データと音声データのすべてを送信することは困難になるものであった。すなわち、遅延が起こった場合データを間引く処理を行う必要があり、再生した画像の一部、音声の一部が同時にカットされ、音声が途切れ途切れになってしまうものであった。そしてこれらの同期をとる構成はいずれも複雑で、端末装置をコスト高になるものであった。
【００１１】
そこで本発明は、音声が途切れることがなく、圧縮伸長処理が影響を及ぼさず、画像と音声の十分な同期をとることができるネットワークカメラシステムを提供することを目的とする。
【００１２】
また本発明は、音声が途切れることがなく、圧縮伸長処理が影響を及ぼさず、画像と音声の十分な同期をとることができるネットワークカメラを提供することを目的とする。
【００１３】
さらに本発明は、音声が途切れることがなく、圧縮伸長処理が影響を及ぼさず、画像と音声の十分な同期をとることができるデータ送信方法を提供することを目的とする。
【００１４】
【課題を解決するための手段】
この課題を解決するために本発明のネットワークカメラシステムは、ネットワークカメラが、ウェブページに含まれる画像ファイル形式データの挿入位置を選択して、音声データを画像ファイル形式データのヘッダ域に挿入するとともに画像データを画像ファイル形式データのデータ域に挿入し、ネットワーク端末にウェブページを送信することを特徴とする。
【００１５】
これにより、音声が途切れることがなく、圧縮伸長処理が影響を及ぼさず、画像と音声の十分な同期をとることができる。
【００１６】
また本発明のネットワークカメラは、音声データを画像ファイル形式データのヘッダ域に挿入する画像ファイル形式データ生成手段が設けられ、該画像ファイル形式データ生成手段が音声データを画像ファイル形式データのヘッダ部分に挿入し、かつ画像ファイル形式データのデータ領域に画像データを挿入して画像ファイル形式データを生成して、ネットワーク端末に送信することを特徴とする。
【００１７】
これにより、音声が途切れることがなく、圧縮伸長処理が影響を及ぼさず、画像と音声の十分な同期をとることができる。
【００１８】
そして本発明のデータ送信方法は、ウェブページに音声データと画像データを添付して送信するとき、画像ファイル形式データのフォーマットにおける挿入位置を選択し、画像ファイル形式データのヘッダ域に音声データ、データ域に画像データを挿入して、このウェブページをネットワーク端末へ送信することにより通信の同期をとることを特徴とする。
【００１９】
これにより、音声が途切れることがなく、圧縮伸長処理が影響を及ぼさず、画像と音声の十分な同期をとることができる。
【００２０】
【発明の実施の形態】
上記課題を解決するために本発明の請求項１の発明は、カメラ部で撮影した画像データとマイクで集音した音声データを含むウェブページをネットワークに送信するネットワークカメラと、該ネットワークに接続され、受信したウェブページから映像と音声を再生することができるネットワーク端末とから構成されるネットワークカメラシステムであって、ネットワークカメラが、ウェブページに含まれる画像ファイル形式データの挿入位置を選択して、音声データを画像ファイル形式データのヘッダ域に挿入するとともに画像データを画像ファイル形式データのデータ域に挿入し、ネットワーク端末にウェブページを送信することを特徴とするネットワークカメラシステムであり、ＪＰＥＧ等の画像ファイル形式のデータに、画像ファイルだけではなく、音声データをも挿入することにより、ネットワークカメラから送信される音声データ及び画像データをネットワーク端末で容易に両者を十分に同期させて再生することができる。また、音声データはヘッダ領域（コメント部）に挿入されているため、通常の画像展開動作のみを行うネットワーク端末においても、画像だけを再生することができ、一方、ヘッダ領域のコメント部の音声データを分離再生することができるような構成とすれば、ネットワーク端末で音声も再生することができる。従って、同一の画像ファイル形式データで一般のブラウザであっても、少なくとも画像データは表示することができ、ネットワーク端末ごとに異なるファイルを生成する必要がなく、ネットワークカメラの処理負担が軽減され、安価なシステムを提供することができる。また、従来のように画像データと一緒に音声データが間引かれないので可能性として音声が途切れることがない。
【００２１】
請求項２の発明は、カメラ部を駆動する駆動制御部を備え、該カメラ部のカメラ位置情報及びまたはカメラ制御情報を画像ファイル形式データのヘッダ域に挿入するとともに画像データを画像ファイル形式データのデータ域に挿入して送信することを特徴とする請求項１記載のネットワークカメラシステムであり、カメラ部のカメラ位置情報及びまたはカメラ制御情報はヘッダ領域（コメント部）に挿入されているため、通常の画像展開動作のみを行うネットワーク端末においても、画像だけは再生することができ、一方、ヘッダ領域のコメント部のカメラ位置情報及びまたはカメラ制御情報を分離し処理することができるような構成とすれば、ネットワーク端末でカメラ位置情報及びまたはカメラ制御情報を取得することができる。従って、同一の画像ファイル形式データで一般のブラウザであっても、画像データを表示することができ、ネットワーク端末ごとに異なるファイルを生成する必要がなく、ネットワークカメラの処理負担が軽減され、安価なシステムを提供することができる。
【００２２】
請求項３の発明は、映像を撮影するカメラ部と、音声を集音するマイクと、カメラ部で得た画像情報を符号化して画像データを出力する映像制御部と、マイクからの音声情報を符号化して音声データを出力する音声制御部と、ネットワークと接続してネットワーク端末に画像データと音声データを送信できるネットワーク制御部を備えたネットワークカメラであって、音声データを画像ファイル形式データのヘッダ域に挿入する画像ファイル形式データ生成手段が設けられ、該画像ファイル形式データ生成手段が音声データを画像ファイル形式データのヘッダ部分に挿入し、かつ画像ファイル形式データのデータ領域に画像データを挿入して画像ファイル形式データを生成して、ネットワーク端末に送信することを特徴とするネットワークカメラであり、画像データが圧縮されているときにも音声データはヘッダ域（コメント部）に格納されており、伸長処理の際の互換性を保つことができる。従来のように画像データと一緒に音声データが間引かれないので可能性として音声が途切れることがない。画像データと音声データを同一の画像ファイル形式データとして同時に送信するため、画像と音声の十分な同期をとることができる。
【００２３】
請求項４の発明は、音声データと画像データを同時に送信する第１のモードと、音声データだけを送信する第２のモードと、画像データだけを送信する第３のモードとを切り替えるモード切り替え手段を備え、ネットワーク端末からのモード切り替え要求を受けると、モード切り替え手段がいずれかのモードに切り替え、通知生成手段が該モードに従って画像ファイル形式データを生成することを特徴とする請求項３記載のネットワークカメラであり、モードを切り替えることができるため、トラフィックの負荷が大きくなったときなどに音声データだけを送信することができ、トラフィックと画像と音声に対するニーズを反映した通信が行える。
【００２４】
請求項５の発明は、カメラ部を駆動する駆動制御部を備え、画像ファイル形式データ生成手段がカメラ位置情報及びまたはカメラ制御情報を画像ファイル形式データのヘッダ域に挿入することを特徴とする請求項３または４記載のネットワークカメラであり、カメラ位置情報やカメラ制御情報が画像ファイルのヘッダ部分に挿入されているから、ネットワーク端末では受信した画像ファイ形式データから容易にカメラ位置情報やカメラ制御情報を入手し、利用することができる。
【００２５】
請求項６の発明は、マイクからの音声入力が無音レベルの場合に、画像ファイル形式データ生成手段が、音声データを添付しないで画像データのみが挿入された画像ファイル形式データまたは無音レベルを表すデータと画像データが挿入された画像ファイル形式データを送信することを特徴とする請求項３〜５のいずれかに記載のネットワークカメラであり、無音レベルのときには無用なデータを送らず、データ量をおとすことができ、又トラフィックの混雑により再生する音声データが遅延するような場合でも、無音状態の音声データは間引くことができるから、同期の調整を補完することができる。
【００２６】
請求項７の発明は、音声データのデータ長が許容データ長を越えた場合、通知生成手段が該許容データ長の音声データを添付した画像ファイル形式データを送信することを特徴とする請求項３〜６のいずれかに記載のネットワークカメラであり、音声データが許容データ長を越えた場合にも送信が可能になる。
【００２７】
請求項８の発明は、音声データのデータ長が許容データ長を越えた場合、通知生成手段が該許容データ長の音声データを添付した画像ファイル形式データを送信した後に、音声データのみが添付された画像ファイル形式データを送信することを特徴とする請求項７記載のネットワークカメラであり、音声データが許容データ長を越えた場合、まず許容データ長のデータを送信し、次いで残りのデータを音声データのみの通知を行うから、音声が途切れることがない。
【００２８】
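The invention set forth in claim 9 is a data transmission method characterized in that, when audio data and image data are attached to a web page and transmitted, an insertion position in the format of the image file format data is selected, the audio data is inserted into the header area of the image file format data and the image data into its data area, and the web page is transmitted to a network terminal, whereby the communication is synchronized. By inserting not only the image but also audio data into data of an image file format such as JPEG, the audio data and image data transmitted from the network camera can easily be reproduced at the network terminal with the two sufficiently synchronized. In addition, because the audio data is inserted in the header area (comment part), even a network terminal that performs only ordinary image decoding can reproduce the image, while a network terminal configured to separate and reproduce the audio data in the comment part of the header area can reproduce the audio as well. Consequently, the same image file format data lets even a general browser display at least the image data, no separate file has to be generated for each network terminal, the processing load on the network camera is reduced, and an inexpensive system can be provided. Furthermore, because the audio data is not thinned out together with the image data as in the prior art, the audio is in principle not interrupted.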
【００２９】
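(Embodiment 1)
A network camera system, its network camera, and a data transmission method according to Embodiment 1 of the present invention are described below. FIG. 1 is a system configuration diagram of the network camera system according to Embodiment 1, FIG. 2 is a configuration diagram of the network camera, FIG. 3 shows the data format of the image file format data transmitted by the network camera, FIG. 4 is a time chart of the data transmitted by the network camera, and FIG. 5 is the time chart of FIG. 4 for a silent period.
【００３０】
In the overall configuration of the network camera system shown in FIG. 1, reference numeral 1 denotes a network terminal such as a personal computer that displays video on a display and can also reproduce audio. Reference numerals 2, 2a, 2b, and 2c denote network cameras with an image/audio server function that, in response to access from the network terminal 1, transmit a web page in which image and audio data are embedded (link information is added with tags or the like). Reference numeral 3 denotes a router with the network cameras 2a, 2b, and 2c under it, and 4 denotes a network such as the Internet. Reference numeral 5 denotes a DHCP server that assigns a global IP address to the network terminal 1 or the like in response to its IP address assignment request, and 6 denotes a DNS server that answers a name resolution query from the network terminal 1 by sending it the IP address for the queried host name. Reference numeral 7 denotes a web server from which plug-in software that extends the browser to reproduce audio and video can be downloaded to the network terminal 1. For example, the audio control means 14 and display control means 13 described later may be downloaded as plug-in software.
【００３１】
In such a network camera system, when the browser means 12 installed in the network terminal 1 accesses the network cameras 2a, 2b, and 2c over HTTP (requests a web page) with a destination URL made up of the camera's host name and port number, the network terminal 1 first obtains the global IP address from the DNS server 6. It then sends the router 3 a data packet with the obtained IP address in the destination field of the IP header and the port number in the TCP header. The router 3 performs port forwarding according to the port number in the TCP header, and the packet is delivered to the network camera 2a, 2b, or 2c. The network camera 2, for its part, responds to this access from the network terminal 1, and the web page with the image and audio attached (with the link destination written in it) is transferred to the network terminal 1 via the router 3 and the network 4.
【００３２】
The internal configuration of this distributed network camera system is described next, beginning with the network terminal 1. In FIG. 1, reference numeral 11 denotes a network control unit that controls communication between the network terminal 1 and the network 4. Reference numeral 12 denotes browser means that, on accessing a server connected to the network 4 such as the Internet via the network control unit 11, receives a page such as an HTML or XTML file with image files, audio files, and the like attached (the web page of the present invention) and reproduces it on the display and speaker.
【００３３】
Reference numeral 13 in FIG. 1 denotes display control means for showing received image files and other image files such as video on the display, and 14 denotes audio control means for reproducing received audio files and other audio files. The audio control means 14 and the display control means 13 may be plugged in from the web server 7 to extend the functions of the browser means 12. On receiving an HTML or XTML file, the browser means 12 operates the display control means 13 and the audio control means 14 to reproduce the image and audio. The audio control means 14 includes an amplifier; it decompresses audio data encoded in ASF (Advanced Streaming Format) sent from the network camera 2 or the like, performs D/A conversion, adjusts the volume with the amplifier, and outputs the result.
【００３４】
Reference numeral 15 denotes a storage unit that stores various control programs and data. Reference numeral 16 denotes a control unit that controls the network terminal 1. The control unit 16 is built around a central processing unit; it reads the control program for each function from the storage unit 15 and executes it, and so is configured as function realizing means. Reference numeral 17 denotes input control means for accepting input from a mouse or keyboard.
【００３５】
The network terminal 1 of Embodiment 1 also has the following configuration for extracting and reproducing audio data from the still image file format data sent continuously from the network camera 2.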
【００３６】
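In FIG. 1, reference numeral 18 denotes an audio function extension unit that extracts audio data from image file format data whose link information is embedded in an HTML file with an image attached. The audio function extension unit 18 comprises at least the means described below. In FIG. 2, reference numeral 18a denotes audio data extraction means that extracts the audio data held in the header portion of image file format data whose link information has been embedded in HTML by the camera device described below. Reference numeral 18b denotes decompression management means that manages decompression of the streaming data so that the extracted audio data can be reproduced by the audio control means 14.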
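The following is a minimal sketch of what the audio data extraction means 18a could do, assuming the image file format data is a JPEG stream with the audio held in a COM (comment) segment, as paragraphs [0038] and [0039] describe. The function name `extract_comments` and the choice to stop scanning at the SOS or EOI marker are assumptions made for illustration; the patent does not specify a parsing procedure.

```python
import struct


def extract_comments(frame: bytes) -> list:
    """Collect the payload of every COM segment found before the scan data."""
    if frame[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    payloads = []
    pos = 2
    while pos + 4 <= len(frame):
        if frame[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = frame[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: the header area has ended
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", frame[pos + 2:pos + 4])
        if marker == 0xFE:  # COM segment
            payloads.append(frame[pos + 4:pos + 2 + length])
        pos += 2 + length
    return payloads
```

A terminal that ignores comment segments still decodes the image normally, which is the compatibility property the description relies on.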
【００３７】
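The structure of the image file format data whose link information is embedded in the HTML file is as follows. As shown in FIG. 3, reference numeral 40 denotes image file format data that has its link information embedded in the HTML file, contains the audio data, and is sent to the network terminal 1. In this embodiment it is a JPEG file, but the format is not limited to JPEG: any image file format with an area in its header portion into which audio data can be inserted will do.
【００３８】
A JPEG file begins with a start identification code (SOI marker code) and ends with an end identification code (EOI marker code). To insert a comment, a comment code (COM marker code), the length information of the comment, and the comment data must be inserted between the start and end identification codes. In this embodiment, everything other than the image data portion is referred to as the header part (header area).
【００３９】
Reference numeral 41 denotes the header part of this image file format data and 42 the data part. Compressed image data is set in the data part 42. Reference numeral 43 denotes the start code placed at the head of the header part 41, and 44 to 46 make up the comment part: 44 is the comment code, 45 the data length field of the comment part, and 46 the audio data part (comment data part).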
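The layout of parts 43 to 46 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the name `embed_audio` is hypothetical, and placing the comment segment directly after SOI is an assumption (a strict JFIF reader expects the APP0 segment immediately after SOI, so a real encoder might insert the comment after APP0 instead).

```python
import struct

SOI = b"\xff\xd8"  # start code (43)
COM = b"\xff\xfe"  # comment code (44)
MAX_COMMENT_PAYLOAD = 0xFFFF - 2  # the 2-byte length field (45) counts itself


def embed_audio(jpeg: bytes, audio: bytes) -> bytes:
    """Return a copy of `jpeg` with `audio` stored in a COM segment after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream: missing SOI marker")
    if len(audio) > MAX_COMMENT_PAYLOAD:
        raise ValueError("audio does not fit in one comment segment")
    # Comment part = comment code (44) + length (45) + audio data (46).
    segment = COM + struct.pack(">H", len(audio) + 2) + audio
    return SOI + segment + jpeg[len(SOI):]
```

Passing a stream that holds only SOI and EOI yields an audio-only frame with an empty data part 42.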
【００４０】
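Because the data length field 45 of the comment part allows a 2-byte length value, the audio data part 46 can hold up to 64 kbytes of data. In Embodiment 1 the audio data part 46 holds ASF data produced by the audio control unit 25 described later. Audio is normally output at 8 kbytes/sec, so up to 8 sec of audio can be held. Since the network camera 2 normally transmits one piece of image file format data every 33 msec, the audio in Embodiment 1 is not interrupted even if a response notification arrives as much as 8 sec late. The HTML file sent to the network terminal 1 when it accesses the network camera 2 is written so as to request that the network camera 2 send the image file format data in Motion JPEG form. When the network terminal 1 accesses the network camera 2 as that HTML file directs, the network camera 2 sends image file format data to the network terminal 1 continuously at up to 30 frames per second.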
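The capacity figures in this paragraph can be checked with a few lines of arithmetic. The sketch below reads "8 kbytes/sec" as 8000 bytes per second, which is an assumption, and uses the fact that a JPEG segment length field counts its own two bytes, so the exact payload limit is 65,533 bytes rather than a round 64 kbytes. All constant and function names are illustrative.

```python
LENGTH_FIELD_BYTES = 2
# Largest value a 2-byte length field can express; it includes the field itself.
MAX_SEGMENT_LENGTH = 2 ** (8 * LENGTH_FIELD_BYTES) - 1
MAX_AUDIO_BYTES = MAX_SEGMENT_LENGTH - LENGTH_FIELD_BYTES

AUDIO_BYTES_PER_SEC = 8_000  # assumed reading of "8 kbytes/sec"
FRAME_INTERVAL_SEC = 1 / 30  # about 33 msec per frame


def buffered_seconds(audio_bytes: int = MAX_AUDIO_BYTES) -> float:
    """Seconds of audio that fit in one comment segment."""
    return audio_bytes / AUDIO_BYTES_PER_SEC


def audio_bytes_per_frame() -> float:
    """Audio produced between two consecutive frames at the normal rate."""
    return AUDIO_BYTES_PER_SEC * FRAME_INTERVAL_SEC
```

Under these assumptions one segment holds a little over 8 sec of audio, while a frame sent on schedule needs only a few hundred bytes, which is why a late frame can still carry all the audio accumulated in the meantime.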
【００４１】
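Similarly, the network camera 2 of Embodiment 1 includes a drive control unit 26 that drives the camera unit 22, and its orientation is controlled by operations from the network terminal 1. If the network terminal 1 can receive camera position information and camera control information for the camera unit 22 from the web server unit 21a, it becomes easy for the network terminal 1 to display the camera's operating state on screen and to control the camera based on that information. Attaching the camera position information and/or camera control information to the header area of the image file format data therefore makes it possible to deliver the camera position and control information corresponding to the image and audio in the image file format data while keeping it independent of the image data. The information is written to the comment data part in the same way as audio data, so the details are the same as for attaching audio data to the header area described above. Its data length, however, never exceeds 64 kbytes.
【００４２】
Next, the network camera 2 of the network camera system is described. In FIG. 2, reference numeral 21 denotes a network control unit that controls communication between the network camera 2 and the network 4 via the router 3, 21a a web server unit, 22 a camera unit, 23 a video control unit that compresses the image data captured by the camera unit 22, 24 a microphone for collecting sound, 24a a speaker for outputting audio input at the network terminal 1, and 25 an audio control unit that converts the analog audio signal from the microphone 24 into a data signal by A/D conversion, compresses and encodes it, and passes it to the network control unit 21. Reference numeral 26 denotes a drive control unit that controls pan, tilt, and other driving of the network camera 2. Reference numeral 27 denotes a storage unit that stores control programs and data, and 30 a control unit that controls the network camera 2 as a whole.
【００４３】
The network camera 2 shown in FIG. 2 also has a function for generating web pages with images and audio attached. Reference numeral 28 denotes image file format data generating means that creates image file format data made up of image data alone or of both image data and audio data. Reference numeral 29 denotes mode management means that switches among an image/audio mode (the first mode of the present invention), in which audio data is added to the image file format data along with the image data and both are sent together; an audio mode (the second mode), in which image file format data carrying audio only is sent; and an image mode (the third mode), in which image file format data carrying an image only is sent. Switching modes makes it possible to stop sending audio data when, for example, silence continues, so that meaningless silence is not mixed into the audio output. In addition, the user's needs can be reflected by having the network terminal 1 send a mode-specifying request notification to the network control unit 21.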
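The three modes handled by the mode management means 29 can be modelled as a small enumeration. This is a sketch under stated assumptions: the names `Mode` and `frame_contents` are hypothetical, and how a terminal encodes its mode request is not specified in the description.

```python
from enum import Enum


class Mode(Enum):
    IMAGE_AND_AUDIO = 1  # first mode: image and audio sent together
    AUDIO_ONLY = 2       # second mode: audio in the header, no image
    IMAGE_ONLY = 3       # third mode: image only, no audio


def frame_contents(mode: Mode, image: bytes, audio: bytes) -> tuple:
    """Return (header_audio, body_image) for the requested mode."""
    if mode is Mode.AUDIO_ONLY:
        return audio, b""
    if mode is Mode.IMAGE_ONLY:
        return b"", image
    return audio, image
```

Keeping the mode decision separate from the frame layout means the same generating step can serve all three modes.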
【００４４】
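The flow of processing when the network camera 2 sends image file format data carrying both image and audio data and the network terminal 1 reproduces it is now described over time. The upper row of FIG. 4 shows the packets sent by the network camera and the lower row shows the reproduction state at the terminal. In FIG. 4, in the image/audio mode, the image data captured at time t1 and the audio data for the interval t0 to t1 are sent to the network terminal 1 as one piece of image file format data. The network terminal 1 thus receives, as image file format data, the image captured immediately before transmission together with the audio that follows the audio sent with the previous image file format data and runs up to the moment immediately before transmission, and reproduces them as the output screen and output audio shown in the lower row.
【００４５】
If, because of traffic conditions such as delay of the image file format data, the network camera 2 does not send image file format data at time t2 and this state lasts until time t3 for 1 sec or more, for example 5 sec, the image data at t3 and the audio input between t1 and t3 are sent to the network terminal 1 as a single piece of image file format data. Based on this data, the network terminal 1 displays the image at the reception time t3 and reproduces the audio data for t1 to t3. In this case, however, audio from 1 sec or more earlier is received, so a synchronization offset of 1 sec may arise between the image display and the audio. To keep the offset below 1 sec, it is therefore preferable to let the user choose a restriction, applied when the image file format data is generated, under which audio of 1 sec or more is not sent as data. The web page may also show the audio delay state (for example, "1 sec delay") so that the user of the network terminal 1 can readily understand the delay, and may further provide an audio preset button (a button that, when pressed, causes any unplayed audio from the previous image file format data to be discarded once the next image file format data arrives, and the audio of the next image file format data to be reproduced). Alternatively, without relying on the audio preset button, the user may select a setting under which, whenever new image file format data is received, the unplayed audio from the previous image file format data is discarded and the audio of the new image file format data is reproduced immediately.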
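The terminal-side option of discarding unplayed audio when a new frame arrives can be sketched with a simple queue. The class name `AudioPlaybackQueue` and the `drop_stale` flag are hypothetical; the description only states the behaviour, not how the terminal stores pending audio.

```python
from collections import deque
from typing import Optional


class AudioPlaybackQueue:
    """Pending audio chunks, one per received frame."""

    def __init__(self, drop_stale: bool = False) -> None:
        self.drop_stale = drop_stale
        self._queue = deque()

    def on_frame(self, audio: bytes) -> None:
        """Register the audio carried by a newly received frame."""
        if self.drop_stale:
            self._queue.clear()  # discard audio from earlier frames
        if audio:
            self._queue.append(audio)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None when nothing is pending."""
        return self._queue.popleft() if self._queue else None
```

With `drop_stale` left off, audio is played back in arrival order and nothing is lost; turning it on trades completeness of the audio for tighter alignment with the newest image.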
【００４６】
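If the period from t1 to t3, that is, the state in which no image data is sent, lasts 8 sec or more, for example 10 sec, the mode is switched at time t3 from the image/audio mode to the audio mode, and image file format data carrying only audio data and no image data is sent. No image is sent during this period; only the audio is reproduced continuously, up to the audio data for time t4. Because image file format data carrying only audio is small, it suffers less delay than image file format data carrying image data even when the network is congested. At time t4 the mode switches back to the image/audio mode and image file format data carrying both audio and image data is sent. Time t5 is the point, after the switch to the audio mode, at which network congestion has fallen to a predetermined level; whether it has done so is judged from, for example, the response time with which the network terminal 1 receives data sent from the network camera 2.
【００４７】
FIG. 5 shows the processing when silence continues. The audio data for t6 to t7 is sent together with the image data of t7 as image file format data and is output at the receiving side. If the network camera 2 then determines that the audio for t7 to t8 is silent, it leaves that audio out and sends image file format data containing only the image data of t8. Thus, provided that the interval t7 to t8 is no longer than 66 msec, the audio for t7 to t8 can be omitted as silence, and the audio for t8 to t9 can be output at the network terminal 1 in sync with the image data of t9. Instead of sending image file format data without audio when silence continues, it is also suitable to send short data that represents silence. Silence is judged by the mode management means 29 against a threshold, decided in advance, that serves as the reference for the silence level.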
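One way to realize the threshold test for silence is to compare the RMS level of the captured samples with a fixed value. The description only says a predetermined threshold is used; the RMS measure, the 16-bit PCM input, the default threshold, and the name `is_silent` are all assumptions made for this sketch.

```python
import array
import math


def is_silent(pcm: bytes, threshold_rms: float = 200.0) -> bool:
    """True when 16-bit native-endian PCM has an RMS level below the threshold."""
    samples = array.array("h")
    # Ignore a trailing odd byte so the buffer divides into whole samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold_rms
```

A camera could call such a check on the audio gathered since the last frame and, when it returns true, send the frame with no audio or with a short marker that represents silence.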
【００４８】
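The specific processing that the network camera 2 of Embodiment 1 performs in each of the modes above is described next. FIG. 6 is a flowchart of the processing performed by the network camera in Embodiment 1. In FIG. 6, the waiting network camera 2 checks whether the network terminal 1 has requested a web page (step 1). If not, it continues to wait. If a web page has been requested, it checks whether a mode that sends audio is requested (step 2). If so, it further checks whether a mode that sends images is requested (step 3). If a mode that sends audio is not requested, it compresses the image data (step 4), inserts the compressed image data into the data area of the image file format data, stores the result in a predetermined storage area, and retrieves it from that storage area in order to send the HTML file in which the storage location is written (step 5).
【００４９】
If step 3 finds a mode that sends images, the image data and the audio data are each compressed (step 6); otherwise only the audio data is compressed (step 7). The data length of the audio compressed in step 6 is compared with the allowable data length to check whether it is within the permitted range (step 8). If it is shorter than the allowable data length and therefore within range, the image data is inserted into the data area of the image file format data and the audio data into the header area (step 9). If it is longer than the allowable data length, step 10 checks whether the excess data length is itself shorter than the allowable data length; if so, image file format data is created with audio data of a permitted length (the allowable data length or shorter) inserted (step 11). If the length checked in step 10 is not permitted, image file format data with audio data of the allowable data length inserted is created and processing returns to step 10 (step 12), and this is repeated. The audio data beyond the allowable length is inserted into subsequent image file format data, and it is possible at that point to insert only the audio data without image data.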
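The loop in steps 8 to 12 amounts to cutting the audio into pieces no longer than the allowable data length and sending the image only with the first piece. The sketch below illustrates that idea; the function names and the default limit of 65,533 bytes (the JPEG comment payload limit) are assumptions, and the patent's flowchart is not reproduced step for step.

```python
def split_audio(audio: bytes, max_len: int = 65533) -> list:
    """Cut `audio` into consecutive chunks of at most `max_len` bytes."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [audio[i:i + max_len] for i in range(0, len(audio), max_len)]


def plan_frames(image: bytes, audio: bytes, max_len: int = 65533) -> list:
    """List (audio_chunk, image) pairs; the image rides only with the first chunk."""
    chunks = split_audio(audio, max_len) or [b""]
    return [(chunk, image if i == 0 else b"") for i, chunk in enumerate(chunks)]
```

For example, with a limit of 3 bytes, 7 bytes of audio produce one frame with the image and two audio-only frames, so no audio is dropped.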
【００５０】
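Likewise, the data length of the audio compressed in step 7 is compared with the allowable data length to check whether it is permitted (step 13). If it is shorter than the allowable data length and therefore within range, the audio data is inserted into the header area of the response notification (step 14). If it is longer than the allowable data length, step 15 checks whether the excess data length is itself shorter than the allowable data length; if so, image file format data is created with audio data of a permitted length (the allowable data length or shorter) inserted (step 16). If the length checked in step 15 is not permitted, image file format data with audio data of the allowable data length inserted is created and processing returns to step 15 (step 17), and this is repeated. The image file format data created in steps 5, 9, 11, 14, and 16 is sent from the network control unit 21 (step 18), and the camera returns to the waiting state.
【００５１】
In this way the network camera 2 of Embodiment 1 checks the specified mode and the data length of the audio data. When the audio data is small it attaches the image data and all of the audio data; when the audio data exceeds the limit it attaches the image data and audio data of the allowable data length. It can therefore send image file format data in which image and audio are well synchronized. Camera position information and camera control information are handled in the same way as the audio data described above, so their description is omitted; their data volume is small, however, and never exceeds the allowable data length.
【００５２】
As described above, in Embodiment 1 the audio is reproduced almost without interruption unless silence continues for a long time. When silence does continue, data in the image mode or the image/audio mode is sent and audio is reproduced when audio data arrives, so the silent time can be compressed on playback. In addition, because the camera position information and/or camera control information of the camera unit is sent in the header area, it can be delivered to the network terminal independently of the image data even when the image data is compressed.
【００５３】
In this embodiment, audio sent on its own is carried in image file format data, but the audio data may instead be sent in an audio file format rather than an image file format, and received and reproduced as such.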
【００５４】
【発明の効果】
本発明のネットワークカメラシステムによれば、ＪＰＥＧ等の画像ファイル形式のデータに、画像ファイルだけではなく、音声データをも挿入することにより、ネットワークカメラから送信される音声データ及び画像データをネットワーク端末で容易に両者を十分に同期させて再生することができる。また、音声データはヘッダ領域の（コメント部）に挿入されているため、通常の画像展開動作のみを行うネットワーク端末においても、画像だけを再生することができ、一方、ヘッダ領域のコメント部の音声データを分離再生することができるような構成とすれば、ネットワーク端末で音声も再生することができる。従って、同一の画像ファイル形式データで一般のブラウザであっても、少なくとも画像データは表示することができ、ネットワーク端末ごとに異なるファイルを生成する必要がなく、ネットワークカメラの処理負担が軽減され、安価なシステムを提供することができる。また、従来のように画像データと一緒に音声データが間引かれないので可能性として音声が途切れることがない。
【００５５】
カメラ部のカメラ位置情報及びまたはカメラ制御情報はヘッダ領域（コメント部）に挿入されているため、通常の画像展開動作のみを行うネットワーク端末においても、画像だけは再生することができ、一方、ヘッダ領域のコメント部のカメラ位置情報及びまたはカメラ制御情報を分離し処理することができるような構成とすれば、ネットワーク端末でカメラ位置情報及びまたはカメラ制御情報を取得することができる。従って、同一の画像ファイル形式データで一般のブラウザであっても、画像データを表示することができ、ネットワーク端末ごとに異なるファイルを生成する必要がなく、ネットワークカメラの処理負担が軽減され、安価なシステムを提供することができる。
【００５６】
本発明のネットワークカメラによれば、画像データが圧縮されているときにも音声データはヘッダ域（コメント部）に格納されており、伸長処理の際の互換性を保つことができる。従来のように画像データと一緒に音声データが間引かれないので可能性として音声が途切れることがない。画像データと音声データを同一の画像ファイル形式データとして同時に送信するため、画像と音声の十分な同期をとることができる。
【００５７】
モードを切り替えることができるため、トラフィックの負荷が大きくなったときなどに音声データだけを送信することができ、トラフィックと画像と音声に対するニーズを反映した通信が行える。また、カメラ位置情報やカメラ制御情報が画像ファイルのヘッダ部分に挿入されているから、ネットワーク端末では受信した画像ファイ形式データから容易にカメラ位置情報やカメラ制御情報を入手し、利用することができる。
【００５８】
さらに、無音レベルのときには無用なデータを送らず、データ量をおとすことができ、又トラフィックの混雑により再生する音声データが遅延するような場合でも、無音状態の音声データは間引くことができるから、同期の調整を補完することができる。そして、音声データが許容データ長を越えた場合にも送信が可能になる。また、音声データが許容データ長を越えた場合、まず許容データ長のデータを送信し、次いで残りのデータを音声データのみの通知を行うから、音声が途切れることがない。
【００５９】
本発明のデータ送信方法によれば、ＪＰＥＧ等の画像ファイル形式のデータに、画像ファイルだけではなく、音声データをも挿入することにより、ネットワークカメラから送信される音声データ及び画像データをネットワーク端末で容易に両者を十分に同期させて再生することができる。また、音声データはヘッダ領域（コメント部）に挿入されているため、通常の画像展開動作のみを行うネットワーク端末においても、画像だけは再生することができ、一方、ヘッダ領域のコメント部の音声データを分離再生することができるような構成とすれば、ネットワーク端末で音声も再生することができる。従って、同一の画像ファイル形式データで一般のブラウザであっても、少なくとも画像データを表示することができ、ネットワーク端末ごとに異なるファイルを生成する必要がなく、ネットワークカメラの処理負担が軽減され、安価なシステムを提供することができる。また、従来のように画像データと一緒に音声データが間引かれないので可能性として音声が途切れることがない。
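[Brief description of the drawings]
[FIG. 1] System configuration diagram of the network camera system according to Embodiment 1 of the present invention
[FIG. 2] Configuration diagram of the network camera according to Embodiment 1 of the present invention
[FIG. 3] Packet configuration diagram of the response notification transmitted by the network camera according to Embodiment 1 of the present invention
[FIG. 4] Time chart of the data transmitted by the network camera according to Embodiment 1 of the present invention
[FIG. 5] The time chart of FIG. 4 for a silent period
[FIG. 6] Flowchart of the processing performed by the network camera according to Embodiment 1 of the present invention
[FIG. 7] (a) Configuration diagram of the receiving terminal device in the conventional scheme that synchronizes image and audio by synchronization information; (b) configuration diagram of the transmitting terminal device in that scheme
[FIG. 8] Time chart of the audio data and still image data transmitted from the conventional transmitting terminal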
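[Description of reference numerals]
1 Network terminal
2, 2a, 2b, 2c Network camera
3 Router
4 Network
5 DHCP server
6 DNS server
7 Web server
11 Network control unit
12 Browser means
13 Display control means
14 Audio control means
15 Storage unit
16 Control unit
17 Input control means
18 Audio function extension unit
18a Audio data extraction means
18b Timing management means
21 Network control unit
21a Web server unit
22 Camera unit
23 Video control unit
24 Microphone
24a Speaker
25 Audio control unit
26 Drive control unit
27 Storage unit
28 Image file format data generating means
29 Mode management means
40 Response notification
41 Header part
42 Data part
43 Start code
44 Comment part
45 Data length field
46 Audio data part (comment data part)
101 Communication unit
102 Data separation unit
103 Audio processing unit
104 Audio storage unit
105 Image processing unit
106 Image storage unit
107 Integrated processing unit
108 Synchronization information storage unit
109 Time control unit
110 Audio output unit
111 Image output unit
112 Audio input unit
113 Image input unit
114 Combining unit
115 Synchronization information adding unit
116 Synchronization information input unit
117 Control unit
118 Data transmission unit
119 Communication unit
[0001]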
[Technical field of the invention]
The present invention relates to a network camera system that transmits images and audio acquired by a camera and a microphone in a predetermined format, to a network camera used in that system, and to a data transmission method employed in it.
[0002]
[Prior art]
Digital and network technologies have advanced remarkably in recent years, and it is now common to connect a network terminal such as a personal computer to the Internet and to receive and reproduce images and audio simultaneously from many web servers.
[0003]
When such a conventional web server transmits images and audio, it generally adds a time stamp, that is, synchronization information based on time information, to the image data and to the audio data before sending them. For example, a terminal device has been proposed, with a video conference system as its example, that synchronizes image data and audio data by giving each of them time data (see Patent Document 1). This terminal device gives both the audio data and the image data time-controlled synchronization information, creates data synchronized as the sending user intends, reproduces the data carrying the synchronization information on the receiving side, and outputs the audio and image data in sync. FIG. 7(a) is a configuration diagram of the receiving terminal device in the conventional scheme that synchronizes image and audio by synchronization information, FIG. 7(b) is a configuration diagram of the transmitting terminal device in that scheme, and FIG. 8 is a time chart of the audio data and still image data sent from the conventional transmitting terminal.
[0004]
In FIGS. 7(a) and 7(b), the receiving side includes a communication unit 101, a data separation unit 102 that determines whether data is audio or image, an audio processing unit 103, an audio storage unit 104, an image processing unit 105, an image storage unit 106, an integrated processing unit 107 that controls output based on the synchronization information extracted from the audio and image data, a synchronization information storage unit 108, a time control unit 109 that acquires time information and manages data output times, an audio output unit 110, and an image output unit 111. The transmitting side includes an audio input unit 112, an image input unit 113, an input data combining unit 114, a synchronization information adding unit 115, a synchronization information input unit 116, a time information control unit 117, a data transmission unit 118, and a communication unit 119.
[0005]
As shown in FIG. 8, the transmitting terminal device of Patent Document 1 sends audio data at time t1 and then, at time t2, sends image data 2 and 3 carrying the synchronization information for t2. The audio data and the image data each consist of header information, data, and an end code, and the header information holds the data type, the data size, and the synchronization information.
[0006]
The receiving terminal device receives this, separates the data into audio and image, and starts output at the time given by the synchronization information. The length of audio data is fixed, but the output time of image data is not. When the network traffic load is heavy, it therefore becomes difficult for this terminal device to transmit all of the image data and audio data. In that case the time control unit 109 of the terminal device of Patent Document 1 thins out the data. As a result, part of the audio is cut along with part of the image, and the audio becomes choppy. Choppy audio is hard to listen to and seriously impairs the communication of information.
[0007]
Besides the synchronization by time information described above, there are schemes such as the one shown in Patent Document 2, in which a frame number is added to the image data and the audio data to synchronize them. In every case, however, a time stamp or frame number has to be added to both the image data and the audio data, which complicates the configuration, and when the network traffic load is heavy it is still difficult for the terminal device to transmit all of the image data and audio data. The audio naturally becomes choppy, and the synchronization arrangements of the terminal devices of Patent Documents 1 and 2 are both complex and costly.
[0008]
[Patent Document 1]
JP-A-9-27871
[Patent Document 2]
JP-A-9-93553
[0009]
[Problems to be solved by the invention]
As described above, images and audio have conventionally been synchronized for transmission either by adding synchronization information based on time information to each set of data or by adding frame numbers to the image and audio data.
[0010]
When the network traffic load is heavy, however, these synchronization schemes make it difficult to transmit all of the image data and audio data. That is, when a delay occurs the data has to be thinned out, so part of the reproduced image and part of the audio are cut at the same time and the audio becomes choppy. These synchronization arrangements are also complex and raise the cost of the terminal device.
[0011]
An object of the present invention is therefore to provide a network camera system in which the audio is not interrupted, compression and decompression processing has no effect, and the image and audio can be sufficiently synchronized.
[0012]
Another object of the present invention is to provide a network camera in which the audio is not interrupted, compression and decompression processing has no effect, and the image and audio can be sufficiently synchronized.
[0013]
A further object of the present invention is to provide a data transmission method in which the audio is not interrupted, compression and decompression processing has no effect, and the image and audio can be sufficiently synchronized.
[0014]
[Means for Solving the Problems]
To solve this problem, the network camera system of the present invention is characterized in that the network camera selects an insertion position in image file format data included in a web page, inserts audio data into the header area of the image file format data, inserts image data into the data area of the image file format data, and transmits the web page to the network terminal.
[0015]
As a result, the audio is not interrupted, the compression / decompression processing has no effect, and the image and the audio can be sufficiently synchronized.
[0016]
The network camera of the present invention is characterized in that it is provided with image file format data generating means for inserting audio data into the header area of image file format data, and in that the image file format data generating means inserts the audio data into the header portion of the image file format data, inserts image data into the data area of the image file format data to generate the image file format data, and transmits it to the network terminal.
[0017]
As a result, the audio is not interrupted, the compression / decompression processing has no effect, and the image and the audio can be sufficiently synchronized.
[0018]
The data transmission method of the present invention is characterized in that, when audio data and image data are attached to a web page and transmitted, an insertion position in the format of the image file format data is selected, the audio data is inserted into the header area of the image file format data and the image data into its data area, and the web page is transmitted to the network terminal, whereby the communication is synchronized.
[0019]
As a result, the audio is not interrupted, the compression / decompression processing has no effect, and the image and the audio can be sufficiently synchronized.
[0020]
[Embodiments of the invention]
To solve the above problem, the invention of claim 1 is a network camera system comprising a network camera that transmits to a network a web page containing image data captured by a camera unit and audio data collected by a microphone, and a network terminal that is connected to the network and can reproduce video and audio from the received web page, characterized in that the network camera selects an insertion position in image file format data included in the web page, inserts the audio data into the header area of the image file format data, inserts the image data into the data area of the image file format data, and transmits the web page to the network terminal. By inserting not only the image but also audio data into data of an image file format such as JPEG, the audio data and image data transmitted from the network camera can easily be reproduced at the network terminal with the two sufficiently synchronized. In addition, because the audio data is inserted in the header area (comment part), even a network terminal that performs only ordinary image decoding can reproduce the image, while a network terminal configured to separate and reproduce the audio data in the comment part of the header area can reproduce the audio as well. Consequently, the same image file format data lets even a general browser display at least the image data, no separate file has to be generated for each network terminal, the processing load on the network camera is reduced, and an inexpensive system can be provided. Furthermore, because the audio data is not thinned out together with the image data as in the prior art, the audio is in principle not interrupted.
[0021]
The invention of claim 2 is the network camera system according to claim 1, characterized in that it includes a drive control unit that drives the camera unit, and in that camera position information and/or camera control information of the camera unit is inserted into the header area of the image file format data and the image data is inserted into the data area of the image file format data for transmission. Because the camera position information and/or camera control information is inserted in the header area (comment part), even a network terminal that performs only ordinary image decoding can reproduce the image, while a network terminal configured to separate and process the camera position information and/or camera control information in the comment part of the header area can obtain that information. Consequently, the same image file format data lets even a general browser display the image data, no separate file has to be generated for each network terminal, the processing load on the network camera is reduced, and an inexpensive system can be provided.
[0022]
The invention of claim 3 is a network camera comprising a camera unit that captures video, a microphone that collects audio, a video control unit that encodes the image information obtained by the camera unit and outputs image data, an audio control unit that encodes the audio information from the microphone and outputs audio data, and a network control unit that connects to a network and can transmit the image data and audio data to a network terminal, characterized in that image file format data generating means for inserting the audio data into the header area of image file format data is provided, and in that the image file format data generating means inserts the audio data into the header portion of the image file format data, inserts the image data into the data area of the image file format data to generate the image file format data, and transmits it to the network terminal. Even when the image data is compressed, the audio data is stored in the header area (comment part), so compatibility is maintained at decompression. Because the audio data is not thinned out together with the image data as in the prior art, the audio is in principle not interrupted. Because the image data and audio data are transmitted together as the same image file format data, the image and audio can be sufficiently synchronized.
[0023]
The invention of claim 4 is the network camera according to claim 3, characterized in that it includes mode switching means for switching among a first mode in which audio data and image data are transmitted together, a second mode in which only audio data is transmitted, and a third mode in which only image data is transmitted, and in that, on receiving a mode switching request from the network terminal, the mode switching means switches to one of the modes and the notification generating means generates image file format data according to that mode. Because the mode can be switched, only audio data can be sent when, for example, the traffic load becomes heavy, and communication can reflect the traffic conditions and the need for images and audio.
[0024]
The invention according to claim 5 is the network camera according to claim 3 or 4, further comprising a drive control unit that drives the camera unit, wherein the image file format data generating means inserts camera position information and/or camera control information into the header area of the image file format data. Because the camera position information and camera control information are inserted into the header section of the image file, the network terminal can easily obtain and use them from the received image file format data.
[0025]
The invention according to claim 6 is the network camera according to any one of claims 3 to 5, wherein, when the audio input from the microphone is at the silence level, the image file format data generating means transmits either image file format data into which only the image data is inserted, with no audio data attached, or image file format data into which data representing the silence level and the image data are inserted. At the silence level no unnecessary data is sent, so the amount of data can be reduced. In addition, even when the audio data to be reproduced is delayed by traffic congestion, audio data in a silent state can be thinned out, which supplements the synchronization adjustment.
[0026]
The invention according to claim 7 is the network camera according to any one of claims 3 to 6, wherein, when the data length of the audio data exceeds the allowable data length, the notification generating means transmits image file format data to which audio data of the allowable data length is attached. Transmission is therefore possible even when the audio data exceeds the allowable data length.
[0027]
The invention according to claim 8 is the network camera according to claim 7, wherein, when the data length of the audio data exceeds the allowable data length, the notification generating means transmits image file format data to which audio data of the allowable data length is attached and then transmits image file format data to which only audio data is attached. Because data of the allowable data length is transmitted first and the remaining data is then sent as audio-only data, the audio is not interrupted.
[0028]
The invention according to claim 9 is a data transmission method characterized in that, when audio data and image data are attached to a web page and transmitted, an insertion position within the image file format data is selected, the audio data is inserted into the header area of the image file format data and the image data into its data area, and the web page is transmitted to a network terminal so that image and audio are communicated in synchronization. Because the image file format data carries the audio data as well as the image, the network terminal can easily reproduce the audio data and image data transmitted from the network camera in sufficient synchronization. Also, since the audio data is inserted in the header area (comment section), a network terminal that performs only the normal image decoding operation can still reproduce the image, while a network terminal configured to separate and reproduce the audio data in the comment section of the header area can reproduce the audio as well. Even a general browser can therefore display at least the image data from the same image file format data, there is no need to generate a different file for each network terminal, the processing load on the network camera is reduced, and a low-cost system can be provided. In addition, since the audio data is not thinned out together with the image data as in the related art, there is no possibility that the audio is interrupted.
[0029]
(Embodiment 1)
A network camera system, a network camera, and a data transmission method according to Embodiment 1 of the present invention will be described. FIG. 1 is a system configuration diagram of the network camera system according to Embodiment 1 of the present invention, FIG. 2 is a configuration diagram of the network camera according to Embodiment 1, FIG. 3 is a packet configuration diagram of a response notification transmitted by the network camera according to Embodiment 1, FIG. 4 is a time chart of data transmitted by the network camera according to Embodiment 1, and FIG. 5 is the time chart of FIG. 4 when there is no sound.
[0030]
In the overall configuration of the network camera system shown in FIG. 1, reference numeral 1 denotes a network terminal, and 2, 2a, 2b, and 2c denote network cameras having an image/audio server function for transmitting a web page in which image and audio data are embedded (link information is added by a tag or the like). Reference numeral 3 denotes a router under which the network cameras 2a, 2b, and 2c are placed, and 4 denotes a network such as the Internet. Reference numeral 5 denotes a DHCP server that assigns a global IP address to the network terminal 1 or the like in response to an IP address assignment request from it. Reference numeral 6 denotes a DNS server that, in response to a name resolution inquiry from the network terminal 1, returns the IP address corresponding to the queried host name to the network terminal 1. Reference numeral 7 denotes a web server from which the network terminal 1 can download plug-in software that extends the functions of the browser to reproduce audio and moving images. For example, the audio control means 14 and the display control means 13 described later may be downloaded as plug-in software.
[0031]
In this network camera system, the browser means 12 installed in the network terminal 1 accesses the network cameras 2a, 2b, and 2c over HTTP, using a destination URL made up of the host name and port number of the network camera. When a page is requested, the network terminal 1 first obtains the global IP address from the DNS server 6, sets the obtained IP address as the destination in the IP packet header, and sends a data packet with the port number embedded in the TCP header to the router 3. The router 3 port-forwards the packet according to the port number in the TCP header and delivers it to the network camera 2a, 2b, or 2c. In response to this access, the network camera 2 sends the web page to which the image and audio are attached (in which their link destinations are described) to the network terminal 1 via the router 3 and the network 4.
[0032]
The internal configuration of this distributed network camera system is described next, beginning with the network terminal 1. In FIG. 1, reference numeral 11 denotes a network control unit that controls communication between the network terminal 1 and the network 4. Reference numeral 12 denotes browser means that accesses a server connected to the network 4, such as the Internet, via the network control unit 11, receives a page (the web page of the present invention) such as an HTML file or an XML file to which an audio file or the like is attached, and reproduces it on a display or a speaker.
[0033]
Reference numeral 13 in FIG. 1 denotes display control means for displaying a received image file, or another image file such as a moving image, on a display, and 14 denotes audio control means for reproducing a received audio file or another audio file. The audio control means 14 and the display control means 13 may be obtained from the web server 7 as plug-ins that extend the functions of the browser means 12. On receiving the HTML file or XHTML file, the browser means 12 operates the display control means 13 and the audio control means 14 to reproduce the image and the audio. The audio control means 14 includes an amplifier; it expands the audio data encoded in ASF (Advanced Streaming Format) that is transmitted from the network camera 2 or the like, converts it into a digital signal, adjusts the volume with the amplifier, and outputs it.
[0034]
Reference numeral 15 denotes a storage unit that stores various control programs and various data, and 16 denotes a control unit that controls the network terminal 1. The control unit 16 is built around a central processing unit; it reads the control program for each function from the storage unit 15 and executes it, thereby acting as the means that realizes each function. Reference numeral 17 denotes input control means for input from a mouse or a keyboard.
[0035]
The network terminal 1 according to the first embodiment has the following configuration for extracting audio data from still image file format data continuously transmitted from the network camera 2 and reproducing the audio data.
[0036]
In FIG. 1, reference numeral 18 denotes an audio function extension unit that extracts audio data from the image file format data whose link information is embedded in the HTML file to which the image is attached. The audio function extension unit 18 includes at least the means described below. In FIG. 2, reference numeral 18a denotes audio data extracting means that extracts the audio data contained in the header section of the image file format data, whose link information is embedded in the HTML, sent from the camera device described below. Reference numeral 18b denotes decompression management means that manages decompression of the streaming data so that the audio control means 14 can reproduce the extracted audio data.
[0037]
Here, the structure of the image file format data whose link information is embedded in the HTML file will be described. In FIG. 3, reference numeral 40 denotes the image file format data, carrying audio data, whose link information is embedded in the HTML file and which is transmitted to the network terminal 1. The present embodiment uses the JPEG file format, but the image format is not limited to JPEG; any image file format whose header section has an area into which audio data can be inserted may be used.
[0038]
A JPEG file starts with a start identification code (SOI marker code) and ends with an end identification code (EOI marker code). To insert a comment, a comment code (COM marker code), the length information of the comment, and the comment data must be inserted between the start identification code and the end identification code. In the present embodiment, everything other than the image data section is referred to as the header section (header area).
[0039]
Reference numeral 41 denotes a header part of the image file format data, and reference numeral 42 denotes a data part. The compressed image data is set in the data section 42. 43 is a start code placed at the head of the header section 41, 44 to 46 are comment sections, 44 is a comment code, 45 is a data length description section of the comment section, and 46 is a voice data section (comment data section).
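The header layout described above (start code, comment code, data length description section, audio data section) can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `embed_audio` and `extract_audio` are hypothetical, the audio payload is treated as opaque bytes, and the comment segment is placed directly after the start code as in FIG. 3. Readers that strictly follow JFIF expect an APP0 segment immediately after SOI, so a real implementation might place the comment after it instead.

```python
import struct
from typing import Optional

SOI = b"\xff\xd8"  # start identification code (SOI marker)
EOI = b"\xff\xd9"  # end identification code (EOI marker)
COM = b"\xff\xfe"  # comment code (COM marker)
MAX_COMMENT = 0xFFFF - 2  # the 2-byte length field counts its own two bytes


def embed_audio(jpeg: bytes, audio: bytes) -> bytes:
    """Return a copy of `jpeg` with `audio` stored in a COM segment after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream: missing SOI marker")
    if len(audio) > MAX_COMMENT:
        raise ValueError("audio does not fit in one comment segment")
    segment = COM + struct.pack(">H", len(audio) + 2) + audio
    return SOI + segment + jpeg[2:]


def extract_audio(jpeg: bytes) -> Optional[bytes]:
    """Return the payload of the first COM segment in the header, or None."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS (image data begins) or EOI
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xFE:  # comment segment found
            return jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length  # skip this header segment
    return None
```

A terminal that only decodes the image skips the comment segment, while a terminal with something like `extract_audio` can recover the audio bytes and hand them to its audio decoder.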
[0040]
Because the data length description section 45 of the comment section is 2 bytes long, the audio data section 46 can hold up to 64 kbytes of data. In the first embodiment, the audio data section 46 holds ASF data produced by the audio control unit 25 described later. Since normal audio is output at 8 kbytes/sec, up to 8 seconds of audio can be accommodated, whereas the network camera 2 normally transmits image file format data every 33 msec. With the response notification of the first embodiment, therefore, the audio is not interrupted even if a response arrives up to 8 seconds late. The HTML file sent to the network terminal 1 when it accesses the network camera 2 describes a request for the network camera 2 to transmit the image file format data in Motion JPEG format. When the network terminal 1 accesses the network camera 2 according to the description in this file, the network camera 2 continuously transmits image file format data to the network terminal 1 at up to 30 frames/sec.
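The capacity figures in this paragraph can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the value of 8000 bytes per second is an assumed reading of "8 kbytes/sec". It also accounts for the fact that the 2-byte JPEG segment length field counts its own two bytes, so the usable payload is slightly under 64 kbytes.

```python
LENGTH_FIELD_MAX = 2**16 - 1  # largest value a 2-byte length field can hold
LENGTH_FIELD_BYTES = 2        # the length value includes the field itself


def max_comment_payload() -> int:
    """Largest number of audio bytes one comment segment can carry."""
    return LENGTH_FIELD_MAX - LENGTH_FIELD_BYTES


def buffered_seconds(audio_bytes_per_sec: int) -> float:
    """Seconds of audio that fit in a single comment segment."""
    return max_comment_payload() / audio_bytes_per_sec


def audio_bytes_per_frame(audio_bytes_per_sec: int, frames_per_sec: int) -> float:
    """Audio bytes accumulated between two consecutive frames."""
    return audio_bytes_per_sec / frames_per_sec


if __name__ == "__main__":
    rate = 8000  # assumed audio rate in bytes per second
    print(max_comment_payload())
    print(buffered_seconds(rate))
    print(audio_bytes_per_frame(rate, 30))
```

At 30 frames/sec each frame needs to carry only a few hundred bytes of audio, so the roughly 8 seconds of headroom is what absorbs delayed transmissions.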
[0041]
The network camera 2 according to the first embodiment also includes a drive control unit 26 that drives the camera unit 22, and its attitude is controlled by operating the network terminal 1. If the network terminal 1 can receive the camera position information and camera control information of the camera unit 22 from the web server unit 21a, it becomes easier to display the operating state of the camera on the screen and to control the camera from the network terminal 1 on the basis of that information. Attaching the camera position information and/or camera control information to the header area of the image file format data therefore allows the information to be delivered in unique association with the image and audio in the same image file format data. The information is written in the comment data section in the same way as the audio data, so the details are the same as for attaching audio data to the header area described above, except that its data length never exceeds 64 kbytes.
[0042]
Next, the network camera 2 that constitutes the network camera system will be described. In FIG. 2, reference numeral 21 denotes a network control unit that controls communication between the network camera 2 and the network 4 via the router 3, 21a denotes a web server unit, 22 denotes a camera unit, and 23 denotes a video control unit that compresses the image data taken by the camera unit 22. Reference numeral 24 denotes a microphone that collects audio, 24a denotes a speaker that outputs audio input from the network terminal 1, and 25 denotes an audio control unit that A/D-converts the analog audio signal from the microphone 24, compresses the resulting data, and sends it to the network control unit 21. Reference numeral 26 denotes a drive control unit that performs drive control such as panning and tilting of the network camera 2, 27 denotes a storage unit that stores a control program and various data, and 30 denotes a control unit that controls the entire network camera 2.
[0043]
The network camera 2 shown in FIG. 2 also has a function of generating a web page to which an image and audio are attached. Reference numeral 28 denotes image file format data generating means that generates image file format data composed of only the image data or of both the image data and the audio data. Reference numeral 29 denotes mode management means that switches among an image/audio mode, in which audio data is added to the image file format data together with the image data and both are transmitted simultaneously (the first mode of the present invention), an audio mode, in which image file format data containing only audio is transmitted (the second mode of the present invention), and an image mode, in which image file format data containing only an image is transmitted (the third mode of the present invention). By switching the mode, audio data is not transmitted while silence continues, which avoids mixing a meaningless silent state into the audio output. Further, the needs of the user can be reflected by transmitting a mode specification request notification from the network terminal 1 to the network control unit 21.
[0044]
The flow of processing when the network camera 2 transmits image file format data carrying both image and audio data and the network terminal 1 reproduces it is described next in time sequence. The upper part of FIG. 4 shows the packets transmitted by the network camera, and the lower part shows the reproduction state at the terminal. In FIG. 4, the image data captured at time t1 in the image/audio mode and the audio data between t0 and t1 are transmitted to the network terminal 1 as image file format data. The network terminal 1 thus receives, in one piece of image file format data, the image captured immediately before transmission together with the audio running from the end of the audio carried by the previously transmitted image file format data up to the moment just before transmission, and reproduces them as the output screen and output audio shown in the lower part.
[0045]
Suppose that, because of a traffic-induced delay or the like, image file format data is not transmitted from the network camera 2 at time t2, and that this state lasts 1 second or more, for example 5 seconds, until time t3. The image data at time t3 and the audio input between t1 and t3 are then transmitted to the network terminal 1 as a single piece of image file format data, and the network terminal 1 displays the image from time t3 and reproduces the audio data between t1 and t3. In this case the received audio dates from 1 second or more earlier, so the image display and the audio may be out of synchronization by 1 second or more. It is therefore preferable to let the user select a limit, applied when the image file format data is generated, under which audio of 1 second or more is not transmitted as data, so that no synchronization shift of 1 second or more arises. So that the user of the network terminal 1 can easily recognize the delay, the state of the audio delay may be displayed on the web page (for example, "1 second delay"), and an audio preset button may be provided (a button that, when pressed, causes the unplayed audio data of the previous image file format data to be discarded when the next image file format data arrives and the audio data of the next image file format data to be played instead). Alternatively, according to the user's selection and regardless of the audio preset button, whenever new image file format data is received the unplayed audio data of the previous image file format data may be discarded and the audio data of the new image file format data reproduced immediately.
[0046]
When the period from t1 to t3, that is, the state in which no image data is transmitted, lasts 8 seconds or more, for example 10 seconds, the mode is switched from the image/audio mode to the audio mode at time t3, and image file format data in which only audio data is set, with no image data, is transmitted. At this point no image is sent and only the audio continues to be reproduced, up to the audio data at time t4. Since image file format data containing only audio data is small, its delay is smaller than that of image file format data containing image data even when network traffic is congested. At time t4, the mode is switched back to the image/audio mode, and image file format data containing both audio data and image data is sent. Time t5 is the time at which network congestion has eased to a predetermined level after the switch to the audio mode; whether it has eased is judged from, for example, the response time with which the network terminal 1 receives the data transmitted from the network camera 2.
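The mode switching described for FIG. 4 can be sketched as a small state function. This is only an illustration under stated assumptions: the names `Mode`, `next_mode`, and `STALL_LIMIT_SEC` are hypothetical, the 8-second limit follows the example in the text, and the congestion test is reduced to a boolean that a real camera would derive from measured response times.

```python
from enum import Enum


class Mode(Enum):
    IMAGE_AUDIO = 1  # first mode: image and audio together
    AUDIO = 2        # second mode: audio only
    IMAGE = 3        # third mode: image only


STALL_LIMIT_SEC = 8.0  # assumed limit, taken from the 8-second example


def next_mode(current: Mode, stalled_sec: float, congestion_eased: bool) -> Mode:
    """Pick the transmission mode for the next image file format data."""
    if current is Mode.IMAGE_AUDIO and stalled_sec >= STALL_LIMIT_SEC:
        # No image data has gone out for too long: fall back to audio only.
        return Mode.AUDIO
    if current is Mode.AUDIO and congestion_eased:
        # Traffic has recovered: resume sending image and audio together.
        return Mode.IMAGE_AUDIO
    return current
```

Keeping the decision in one pure function makes it easy to add the user-requested mode change from the network terminal as a further input.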
[0047]
FIG. 5 shows the processing when a silent state continues. The audio data from time t6 to time t7 is transmitted as image file format data together with the image data at t7 and is output on the receiving side. Next, when the network camera 2 determines that the audio from time t7 to t8 is silent, it leaves that audio data out of the image file format data and transmits only the image data at time t8 in the image file format data. Accordingly, if the time from t7 to t8 is within 66 msec, the audio data between t7 and t8 can be omitted as a silent state, and the audio data between t8 and t9 is output from the network terminal 1 in synchronization with the image data at time t9. When silence continues, it is also preferable to set and transmit short data representing silence instead of transmitting image file format data without audio data. For the silence determination, a threshold value serving as the reference for the silence level is set in advance, and the mode management means 29 performs the determination.
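The silence handling of FIG. 5 can be illustrated as follows. This is a sketch under assumptions, not the patent's procedure: the source only says that a threshold is set in advance, so the peak-amplitude test on signed PCM samples, the value of `SILENCE_THRESHOLD`, and the one-byte `SILENCE_MARKER` are all hypothetical.

```python
from typing import Optional, Sequence

SILENCE_THRESHOLD = 500   # hypothetical amplitude threshold for 16-bit samples
SILENCE_MARKER = b"\x00"  # hypothetical short data representing silence


def is_silent(samples: Sequence[int], threshold: int = SILENCE_THRESHOLD) -> bool:
    """True when every sample in the interval stays below the threshold."""
    return all(abs(s) < threshold for s in samples)


def audio_for_frame(samples: Sequence[int], encoded: bytes,
                    send_marker: bool = False) -> Optional[bytes]:
    """Choose what goes into the header area for this frame.

    Returns the encoded audio, a short silence marker, or None when the
    frame should carry image data only.
    """
    if is_silent(samples):
        return SILENCE_MARKER if send_marker else None
    return encoded
```

Returning `None` corresponds to sending image-only data, and `send_marker=True` corresponds to the variant that transmits short data representing silence.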
[0048]
Next, the specific processing that the network camera 2 of the first embodiment performs in each mode described above will be described. FIG. 6 is a flowchart of the processing performed by the network camera according to Embodiment 1 of the present invention. In FIG. 6, the network camera 2 on standby checks whether there is a web page transmission request from the network terminal 1 (step 1). If there is no request, it continues to wait. When there is a request, it checks whether a mode that transmits audio is requested (step 2). If so, it checks whether a mode that transmits an image is also requested (step 3). If the mode does not transmit audio, the image data is compressed (step 4), the compressed image data is inserted into the data area of the image file format data and stored in a predetermined storage area, and the HTML file describing the storage location is taken out of its predetermined storage area for transmission (step 5).
[0049]
If step 3 finds that the mode transmits an image, the image data and the audio data are each compressed (step 6). The data length of the audio data compressed in step 6 is compared with the allowable data length to check whether it is within the allowable range (step 8). If it is, the image data is inserted into the data area of the image file format data and the audio data into the header area (step 9). If the data length exceeds the allowable data length in step 8, step 10 checks whether the excess is itself no longer than the allowable data length. If it is, image file format data into which audio data of the allowable data length (or shorter) is inserted is created (step 11). If it is not, image file format data into which audio data of the allowable data length is inserted is created and the process returns to step 10 (step 12); this is repeated. The audio data beyond the allowable length is inserted into subsequent image file format data, and at that point it is possible to insert only audio data without inserting image data.
[0050]
The data length of the audio data compressed in step 7 is likewise compared with the allowable data length to check whether it is within the allowable range (step 13). If it is, the audio data is inserted into the header area of the response notification (step 14). If the data length exceeds the allowable data length in step 13, step 15 checks whether the excess is itself no longer than the allowable data length. If it is, image file format data into which audio data of the allowable data length (or shorter) is inserted is created (step 16). If it is not, image file format data into which audio data of the allowable data length is inserted is created and the process returns to step 15 (step 17); this is repeated. The image file format data created in steps 5, 9, 11, 14, and 16 is transmitted from the network control unit 21 (step 18), and the network camera returns to the standby state.
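The length check and splitting in steps 8 to 17 can be summarized in a short sketch. It is an illustration only: the names `split_audio` and `plan_frames` are hypothetical, each planned frame is represented as an (image, audio) pair rather than real image file format data, and the follow-up frames are assumed to be audio-only, which the text allows but does not require.

```python
from typing import List, Optional, Tuple


def split_audio(audio: bytes, allowable: int) -> List[bytes]:
    """Cut `audio` into pieces no longer than the allowable data length."""
    if allowable <= 0:
        raise ValueError("allowable data length must be positive")
    chunks = [audio[i:i + allowable] for i in range(0, len(audio), allowable)]
    return chunks or [b""]  # keep one (empty) piece when there is no audio


def plan_frames(image: Optional[bytes], audio: bytes,
                allowable: int) -> List[Tuple[Optional[bytes], bytes]]:
    """Plan the sequence of image file format data to send.

    The first entry carries the image (if any) with the first audio piece;
    any remaining audio follows in audio-only entries.
    """
    chunks = split_audio(audio, allowable)
    frames: List[Tuple[Optional[bytes], bytes]] = [(image, chunks[0])]
    frames.extend((None, chunk) for chunk in chunks[1:])
    return frames
```

For example, `plan_frames(b"IMG", b"abcdefg", 3)` yields one entry with the image and the first three audio bytes, followed by two audio-only entries for the remainder.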
[0051]
As described above, the network camera 2 according to the first embodiment checks the mode designation and the data length of the audio data. When the audio data is short, it attaches all of the image data and the audio data; when the audio data exceeds the allowable length, it attaches the image data together with audio data of the allowable data length. Image file format data in which the image and the audio are sufficiently synchronized can therefore be transmitted. The camera position information and camera control information are handled in the same way as the audio data described above, so their description is omitted; their data amount is small and does not exceed the allowable data length.
[0052]
As described above, in the first embodiment the audio is reproduced almost without interruption unless a silent state continues for a long time. While a silent state continues, data is sent in the image mode or the image/audio mode, and the audio is reproduced once audio data arrives, so the duration of the silent state can be compressed at reproduction. Further, since the camera position information and/or camera control information of the camera unit is transmitted in the header area, the information reaches the network terminal unaltered even when the image data is compressed.
[0053]
In the present embodiment, when only audio is transmitted, the audio data is carried in image file format data. Alternatively, the audio data may be transmitted in an audio file format other than the image file format and then received and reproduced.
[0054]
[Effects of the Invention]
According to the network camera system of the present invention, data in an image file format such as JPEG carries the audio data as well as the image, so the network terminal can easily reproduce the audio data and image data transmitted from the network camera in sufficient synchronization. Also, since the audio data is inserted in the header area (comment section), a network terminal that performs only the normal image decoding operation can still reproduce the image, while a network terminal configured to separate and reproduce the audio data in the comment section of the header area can reproduce the audio as well. Even a general browser can therefore display at least the image data from the same image file format data, there is no need to generate a different file for each network terminal, the processing load on the network camera is reduced, and a low-cost system can be provided. In addition, since the audio data is not thinned out together with the image data as in the related art, there is no possibility that the audio is interrupted.
[0055]
Since the camera position information and/or camera control information of the camera unit is inserted in the header area (comment section), a network terminal that performs only the normal image decoding operation can still reproduce the image, while a network terminal configured to separate and process the camera position information and/or camera control information in the comment section of the header area can acquire that information. Even a general browser can therefore display the image data from the same image file format data, there is no need to generate a different file for each network terminal, the processing load on the network camera is reduced, and a low-cost system can be provided.
[0056]
According to the network camera of the present invention, even when the image data is compressed, the audio data is stored in the header area (comment part), so that the compatibility at the time of the decompression processing can be maintained. Since the audio data is not thinned out together with the image data as in the related art, there is no possibility that the audio is interrupted. Since the image data and the audio data are transmitted simultaneously as the same image file format data, the image and the audio can be sufficiently synchronized.
[0057]
Since the mode can be switched, only the audio data can be transmitted when the traffic load becomes large, and communication can reflect the traffic conditions and the need for images and audio. Also, since the camera position information and camera control information are inserted in the header section of the image file, the network terminal can easily obtain and use them from the received image file format data.
[0058]
Furthermore, at the silence level no unnecessary data is sent, so the amount of data can be reduced, and even when the audio data to be reproduced is delayed by traffic congestion, audio data in a silent state can be thinned out, which supplements the synchronization adjustment. Transmission is also possible even when the audio data exceeds the allowable data length. In that case, data of the allowable data length is transmitted first and the remaining data is then sent as audio-only data, so the audio is not interrupted.
[0059]
According to the data transmission method of the present invention, data in an image file format such as JPEG carries the audio data as well as the image, so the network terminal can easily reproduce the audio data and image data transmitted from the network camera in sufficient synchronization. Also, since the audio data is inserted in the header area (comment section), a network terminal that performs only the normal image decoding operation can still reproduce the image, while a network terminal configured to separate and reproduce the audio data in the comment section of the header area can reproduce the audio as well. Even a general browser can therefore display at least the image data from the same image file format data, there is no need to generate a different file for each network terminal, the processing load on the network camera is reduced, and a low-cost system can be provided. In addition, since the audio data is not thinned out together with the image data as in the related art, there is no possibility that the audio is interrupted.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram of a network camera system according to a first embodiment of the present invention.
FIG. 2 is a configuration diagram of a network camera according to Embodiment 1 of the present invention.
FIG. 3 is a packet configuration diagram of a response notification transmitted by the network camera according to the first embodiment of the present invention.
FIG. 4 is a time chart of data transmitted by the network camera according to the first embodiment of the present invention.
FIG. 5 is a chart at the time of silence in the time chart of FIG. 4;
FIG. 6 is a flowchart of a process performed by the network camera according to the first embodiment of the present invention.
FIG. 7A is a configuration diagram of a receiving terminal device of a conventional image and audio synchronization method based on synchronization information. FIG. 7B is a configuration diagram of a transmission terminal device of a conventional image and audio synchronization method based on synchronization information.
FIG. 8 is a time chart of conventional audio data and still image data transmitted from a transmitting terminal.
[Explanation of symbols]
1 network terminal
2,2a, 2b, 2c Network camera
3 router
4 Network
5 DHCP server
6 DNS server
7 Web server
11 Network control unit
12 Browser means
13 Display control means
14 Voice control means
15 Memory
16 control unit
17 Input control means
18 Voice Function Extension
18a Voice data extracting means
18b Timing management means
21 Network control unit
21a Web server section
22 Camera section
23 Video control unit
24 microphone
24a speaker
25 Voice control unit
26 Drive control unit
27 Memory
28 Image file format data generating means
29 Mode management means
40 Response notification
41 Header
42 Data section
43 Start code
44 Comment section
45 Data length description section
46 Audio data section (comment data section)
101 Communication unit
102 Data separation unit
103 Audio processing unit
104 Audio storage unit
105 Image processing unit
106 Image storage unit
107 Integration processing unit
108 Synchronization information storage unit
109 Time control unit
110 Audio output unit
111 Image output unit
112 Audio input unit
113 Image input unit
114 Synthesizing unit
115 Synchronization information addition unit
116 Synchronization information input unit
117 Control unit
118 Data transmission unit
119 Communication unit

Claims

A network camera system comprising: a network camera that transmits, to a network, a web page including image data captured by a camera unit and audio data collected by a microphone; and a network terminal that is connected to the network and can reproduce video and audio from the received web page,
wherein the network camera selects an insertion position of image file format data included in the web page, inserts the audio data into a header area of the image file format data, inserts the image data into a data area of the image file format data, and transmits the web page to the network terminal.

The network camera system according to claim 1, further comprising a drive control unit that drives the camera unit, wherein camera position information and/or camera control information of the camera unit is inserted into the header area of the image file format data, the image data is inserted into the data area of the image file format data, and the image file format data is transmitted.

A network camera comprising:
a camera unit that captures video;
a microphone that collects audio;
a video control unit that encodes image information obtained by the camera unit and outputs image data;
an audio control unit that encodes audio information from the microphone and outputs audio data; and
a network control unit that connects to a network and can transmit the image data and the audio data to a network terminal,
wherein image file format data generating means for inserting the audio data into a header area of image file format data is provided, and
the image file format data generating means generates the image file format data by inserting the audio data into the header portion of the image file format data and inserting the image data into a data area of the image file format data, and transmits the generated image file format data to the network terminal.

The network camera according to claim 3, further comprising mode switching means for switching among a first mode in which audio data and image data are transmitted simultaneously, a second mode in which only audio data is transmitted, and a third mode in which only image data is transmitted,
wherein, upon receiving a mode switching request from a network terminal, the mode switching means switches to one of the modes, and the notification generating means generates image file format data according to that mode.

The network camera according to claim 3 or 4, further comprising a drive control unit that drives the camera unit, wherein the image file format data generating means inserts camera position information and/or camera control information into the header area of the image file format data.

The network camera according to any one of claims 3 to 5, wherein, when the audio input from the microphone is at a silence level, the image file format data generating means transmits either image file format data into which only the image data is inserted, with no audio data attached, or image file format data into which data representing the silence level and the image data are inserted.

The network camera according to any one of claims 3 to 6, wherein, when the data length of the audio data exceeds an allowable data length, the notification generating means transmits image file format data to which audio data of the allowable data length is attached.

The network camera according to claim 7, wherein, when the data length of the audio data exceeds the allowable data length, the notification generating means transmits the image file format data to which the audio data of the allowable data length is attached, and thereafter transmits image file format data to which only audio data is attached.

A data transmission method comprising, when audio data and image data are attached to a web page and transmitted: selecting an insertion position in the format of image file format data; inserting the audio data into a header area of the image file format data and the image data into a data area of the image file format data; and transmitting the web page to a network terminal, thereby synchronizing the communication.
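The handling of silent input and of audio that exceeds the allowable data length, as recited in the claims above, can be pictured with the following minimal sketch. The function `plan_frames`, the `limit` parameter, and the `silent` flag are hypothetical names introduced only for illustration. The sketch also assumes that any overflow is split across as many audio-only frames as needed, which the claims do not specify.

```python
from typing import List, Optional, Tuple

# (audio for the header area, image for the data area; None means no image)
Frame = Tuple[bytes, Optional[bytes]]


def plan_frames(audio: bytes, image: bytes, limit: int,
                silent: bool = False) -> List[Frame]:
    """Decide which image file format data to send for one capture interval.

    `limit` stands in for the allowable audio data length per frame.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if silent or not audio:
        # Silent input: send the image with no audio attached.
        return [(b"", image)]
    # First frame: audio capped at the allowable length, plus the image.
    frames: List[Frame] = [(audio[:limit], image)]
    # Remaining audio follows in frames that carry audio only.
    for start in range(limit, len(audio), limit):
        frames.append((audio[start:start + limit], None))
    return frames
```

For example, `plan_frames(b"abcdefg", b"IMG", 3)` yields one frame carrying `b"abc"` with the image, followed by audio-only frames carrying `b"def"` and `b"g"`.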