JP2018137639A

JP2018137639A - Moving image processing system, encoder and program, decoder and program

Info

Publication number: JP2018137639A
Application number: JP2017031341A
Authority: JP
Inventors: Akitoshi Tsukamoto (塚本 明利)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-02-22
Filing date: 2017-02-22
Publication date: 2018-08-30

Abstract

PROBLEM TO BE SOLVED: To provide a moving image processing system capable of efficiently decoding video showing a face having specific characteristics.
SOLUTION: An encoding unit includes: face area processing means that detects a face area containing a person's face in each frame of a moving image and generates face detection result data representing the detection result; encoding processing means that encodes the moving image and generates moving image encoded data; and encoding-side output means that outputs data including the face detection result data and the moving image encoded data. A decoding unit includes: data acquisition means that acquires the data output from the encoding unit; and decoding processing means that, using at least the face detection result data, extracts some of the frames from the moving image encoded data and decodes them to obtain a decoded moving image.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a moving image processing system, an encoding device and program, and a decoding device and program, and is applicable, for example, to a system that compression-encodes (encodes) and transmits video (moving image) data captured by a camera.

Conventionally, one technique for compressing video data captured by a camera is described in Patent Document 1. In that technique, face detection processing is performed on each frame of the video to find the region in which a face appears (hereinafter also called the "face area"), and more code is allocated to the face area than to the region other than the face (hereinafter also called the "non-face area"), so that compression encoding renders the face area at high image quality. The technique of Patent Document 1 can also be applied to a video camera.

Patent Document 1: JP 2007-295370 A

However, with the technique described in Patent Document 1, searching the compression-encoded data for, and displaying, video scenes that show a person with specific characteristics (for example, a particular individual, or a person with a particular attribute such as "female in her 20s") requires first decoding all of the compression-encoded moving images, performing face detection again, and then performing recognition processing (for example, personal identification or gender/age estimation) on the detected faces.

Therefore, there is demand for a moving image processing system, an encoding device and program, and a decoding device and program that can efficiently decode video showing a face with specific characteristics.

A first aspect of the present invention is a moving image processing system comprising an encoding unit and a decoding unit, wherein (1) the encoding unit includes (1-1) face area processing means that detects, in each frame of a moving image, a face area containing a person's face and generates face detection result data indicating the detection result, (1-2) encoding processing means that encodes the moving image to generate moving image encoded data, and (1-3) encoding-side output means that outputs data including the face detection result data and the moving image encoded data; and (2) the decoding unit includes (2-1) data acquisition means that acquires the data output by the encoding unit, and (2-2) decoding processing means that, using at least the face detection result data, extracts and decodes some of the frames from the moving image encoded data to obtain a decoded moving image.

An encoding device according to a second aspect of the present invention includes (1) face area processing means that detects, in each frame of a moving image, a face area containing a person's face and generates face detection result data indicating the detection result, (2) encoding processing means that encodes the moving image to generate moving image encoded data, and (3) encoding-side output means that outputs data including the face detection result data and the moving image encoded data.

An encoding program according to a third aspect of the present invention causes a computer to function as (1) face area processing means that detects, in each frame of a moving image, a face area containing a person's face and generates face detection result data indicating the detection result, (2) encoding processing means that encodes the moving image to generate moving image encoded data, and (3) encoding-side output means that outputs data including the face detection result data and the moving image encoded data.

A decoding device according to a fourth aspect of the present invention includes (1) data acquisition means that acquires data including moving image encoded data, obtained by encoding a moving image, and face detection result data indicating the result of detecting face areas containing a person's face in the moving image, and (2) decoding processing means that, using at least the face detection result data, extracts and decodes some of the frames from the moving image encoded data to obtain a decoded moving image.

A decoding program according to a fifth aspect of the present invention causes a computer to function as (1) data acquisition means that acquires data including moving image encoded data, obtained by encoding a moving image, and face detection result data indicating the result of detecting face areas containing a person's face in the moving image, and (2) decoding processing means that, using at least the face detection result data, extracts and decodes some of the frames from the moving image encoded data to obtain a decoded moving image.

ADVANTAGE OF THE INVENTION: According to the present invention, it is possible to provide a moving image processing system that efficiently decodes video showing a face with specific characteristics.

FIG. 1 is a block diagram showing the configuration of the camera system (moving image processing system) according to the first embodiment.

FIG. 2 is an explanatory diagram showing configuration examples of the search condition information according to the first embodiment.

FIG. 3 is an explanatory diagram showing an example of the transition of video (frames) processed by the camera system according to the first embodiment.

FIG. 4 is a flowchart showing the operation of the encoding unit (encoding device) according to the first embodiment.

FIG. 5 is a flowchart showing the operation of the decoding unit (decoding device) according to the first embodiment.

FIG. 6 is a block diagram showing the configuration of the camera system (moving image processing system) according to the second embodiment.

FIG. 7 is an explanatory diagram showing configuration examples of the recognition content information according to the second embodiment.

FIG. 8 is an explanatory diagram showing configuration examples of the search condition information according to the second embodiment.

FIG. 9 is an explanatory diagram showing an example of the transition of video (frames) processed by the camera system according to the second embodiment.

FIG. 10 is a flowchart showing the operation of the encoding unit (encoding device) according to the second embodiment.

FIG. 11 is a flowchart showing the operation of the decoding unit (decoding device) according to the second embodiment.

(A) First Embodiment
Hereinafter, a first embodiment of the moving image processing system, the encoding device and program, and the decoding device and program according to the present invention will be described in detail with reference to the drawings. Below, an example is described in which the video processing system, the encoding device, and the decoding device of the present invention are applied to a camera system, an encoding unit, and a decoding unit, respectively.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the camera system 1 according to the first embodiment.

The camera system 1 includes a camera 300, an encoding unit 100 (an imaging-side device), and a decoding unit 200 (a search-side device).

The camera 300 supplies video data (a video signal) of the captured video (moving image) to the encoding unit 100. In this embodiment, the video data captured by the camera 300 is described as being supplied to the encoding unit 100 in real time; however, video data recorded on a data recording medium such as a hard disk drive or memory card may instead be supplied to the encoding unit 100 offline. Various cameras, such as digital cameras and surveillance cameras, can be used as the camera 300.

In this embodiment, the camera system 1 is described as consisting of three devices, namely the encoding unit 100, the decoding unit 200, and the camera 300; however, neither the number of devices constituting the camera system 1 nor the combination of functions installed in each device is limited. For example, the entire configuration of the camera system 1 shown in FIG. 1 may be implemented in a single device.

The application of the camera system 1 is not limited; for example, it can be used to extract and output, from video (moving images) captured by the camera 300 in a store such as a bank or convenience store, the frames of scenes in which a specific person or a person with a specific attribute appears.

The encoding unit 100 generates and outputs compressed data D10 obtained by compression-encoding (encoding) the video input from the camera 300 or the like. The format in which the encoding unit 100 outputs the compressed data D10 is not limited. For example, the encoding unit 100 may write the compressed data D10 to a data recording medium such as a hard disk drive or memory card, or may transmit it to the decoding unit 200 by communication. When transmitting the compressed data D10 to the decoding unit 200 by communication, the encoding unit 100 may transmit it in real time (for example, frame by frame) or as a single batch file. The detailed configurations of the encoding unit 100 and the compressed data D10 are described later.

When supplied with the compressed data D10, the decoding unit 200 produces output based on the video obtained by decoding it. Specifically, the decoding unit 200 outputs either data based on the video decoded from the compressed data D10 (hereinafter called "output data D20") or a signal of the decoded video (for example, a signal in a format that can be supplied directly to a display). Neither the method of inputting (supplying) the compressed data D10 to the decoding unit 200 nor the method by which the decoding unit 200 outputs the output data D20 or the video signal is limited. For example, the decoding unit 200 may perform data input and output (input of the compressed data D10 and output of the output data D20) using a data recording medium such as a hard disk drive or memory card, or by communication.

Next, the internal configuration of the encoding unit 100 will be described.

The encoding unit 100 includes a face detection unit 101, a video compression unit 102, and a data output unit 103.

The encoding unit 100 may be configured, for example, by installing programs (including the encoding program according to the embodiment) on a computer having a processor and memory.

The face detection unit 101 performs processing to detect a person's face in each frame of the video supplied from the camera 300 (hereinafter called "face detection processing") and generates data indicating the result (hereinafter called "face detection result data"). As the face detection result data, the face detection unit 101 generates data containing time information that identifies each frame in which a face was detected by the face detection processing (hereinafter called a "face video frame") and position information that identifies the position (face area) at which the face image was detected within that face video frame. The time information in the face detection result data may be any information that can identify the face video frame (for example, on the time axis or within the frame sequence), such as the time elapsed from the start or the frame's position in order from the start. The position information in the face detection result data need only specify the position (range) of the face area within the frame; for example, coordinates or vector-format information specifying the face area can be used. A single face video frame may contain multiple face areas.
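As an illustration of the record described above, the following minimal sketch holds one face detection result per face video frame. It assumes rectangular face areas in pixel coordinates and a constant frame rate; the class and field names are hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FaceArea:
    """Axis-aligned face rectangle in pixels (assumed representation of the position information)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class FaceDetectionResult:
    """One record per face video frame: time information plus one or more face areas."""
    frame_index: int      # time information as the frame's order from the start
    timestamp_ms: float   # time information as elapsed time from the start
    faces: List[FaceArea] = field(default_factory=list)


def make_result(frame_index: int, fps: float, faces: List[FaceArea]) -> FaceDetectionResult:
    # Derive elapsed time from frame order, assuming a constant frame rate.
    return FaceDetectionResult(frame_index, frame_index * 1000.0 / fps, list(faces))
```

For example, `make_result(90, 30.0, [FaceArea(10, 20, 64, 64), FaceArea(200, 40, 48, 48)])` describes frame 90 (3000 ms into a 30 fps stream) containing two face areas. Storing both forms of time information is a convenience here; the source requires only one that identifies the frame.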

The video compression unit 102 uses the result of the face detection processing by the face detection unit 101 (the face detection result data) to compression-encode (encode) the frames supplied from the camera 300 and generate compressed video data. For face video frames, the video compression unit 102 applies different compression encoding processing (encoding processing) to the face area and the non-face area, for example as described in Patent Document 1. Specifically, the video compression unit 102 may, as described in Patent Document 1, allocate different code amounts to the face area and the non-face area (that is, encode them at different rates), or perform other compression encoding suited to the purpose.
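One way to realize "allocate different code amounts to the face area and the non-face area" in a block-based encoder such as H.264 is a per-macroblock quantization parameter (QP) map; this is an assumption, since the source does not name a mechanism. A lower QP means finer quantization and therefore more bits. The sketch below marks every 16x16 block that overlaps a face rectangle; the base QP and offset are illustrative values only.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) of a face area, in pixels


def build_qp_map(frame_w: int, frame_h: int, faces: Sequence[Rect],
                 base_qp: int = 32, face_qp_offset: int = -6,
                 mb: int = 16) -> List[List[int]]:
    """Return one QP per mb x mb block; blocks touching a face area get a lower QP."""
    cols = (frame_w + mb - 1) // mb
    rows = (frame_h + mb - 1) // mb
    face_qp = max(0, min(51, base_qp + face_qp_offset))  # H.264 QP range is 0..51
    qmap = [[base_qp] * cols for _ in range(rows)]
    for x, y, w, h in faces:
        if w <= 0 or h <= 0:
            continue  # ignore degenerate rectangles
        # Block indices covered by the rectangle, clipped to the frame.
        c0, c1 = max(0, x // mb), min(cols - 1, (x + w - 1) // mb)
        r0, r1 = max(0, y // mb), min(rows - 1, (y + h - 1) // mb)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                qmap[r][c] = face_qp
    return qmap
```

The encoder would then use `qmap[r][c]` when quantizing block (r, c). Marking whole blocks rather than exact pixels matches how block-based codecs assign QP.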

The data output unit 103 outputs compressed data D10 in a predetermined format that includes the compressed video data generated by the video compression unit 102 and the face detection result data generated by the face detection unit 101. For example, when the data output unit 103 outputs the compressed data D10 in H.264 format, a standard encoding scheme for moving images, other information (in this embodiment, the face detection result data) can be integrated into it as additional information of the compressed video data. Alternatively, the compressed video data and the face detection result data may be output as separate data (files).

Next, the internal configuration of the decoding unit 200 will be described.

The decoding unit 200 includes a face detection result reading unit 201, a face video decoding unit 202, a face recognition unit 203, and a result output unit 204.

The decoding unit 200 may be configured, for example, by installing programs (including the decoding program according to the embodiment) on a computer having a processor and memory.

The face detection result reading unit 201 reads the compressed video data and the face detection result data from the supplied compressed data D10.

Based on the time information of the face video frames contained in the face detection result data, the face video decoding unit 202 decodes only the face video frames from the compressed video data, obtains the resulting video (hereinafter called the "first decoded video"), and supplies it to the face recognition unit 203.
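The source does not describe how individual frames are pulled out of the compressed stream. If the stream uses inter-frame coding with periodic keyframes (an assumption, not stated in the source), a frame can only be reconstructed by decoding forward from the nearest preceding keyframe. The sketch converts the face video frame indices from the time information into merged, inclusive decode ranges; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple


def decode_ranges(face_frames: Iterable[int],
                  keyframes: List[int]) -> List[Tuple[int, int]]:
    """Return merged inclusive (start, end) frame ranges that must be decoded.

    keyframes must be sorted ascending and include frame 0.
    """
    ranges: List[Tuple[int, int]] = []
    for n in sorted(set(face_frames)):
        # Nearest keyframe at or before frame n.
        start = keyframes[bisect_right(keyframes, n) - 1]
        if ranges and start <= ranges[-1][1] + 1:
            # Decoding can continue from the previous range without restarting.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], n))
        else:
            ranges.append((start, n))
    return ranges
```

With keyframes at 0, 30 and 60 and face video frames 35, 40, 61 and 90, only frames 30 to 40 and 60 to 90 are decoded, and frames 0 to 29 and 41 to 59 are skipped entirely.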

The face recognition unit 203 performs processing to recognize whether the face areas of the first decoded video satisfy the configured search condition information 203a. To do so, the face recognition unit 203 extracts only the face areas of each face video frame, based on the position information (indicating the face areas) in the face detection result data, performs face recognition on them, and checks the result against the conditions in the search condition information 203a. The specific method the face recognition unit 203 uses for face recognition processing (search processing) is not limited, and various face recognition techniques can be applied.
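The extraction step can be sketched as cropping each face rectangle out of the decoded frame before recognition. The sketch assumes a frame stored as rows of luma samples and rectangles given as (x, y, width, height). Both representations are illustrative assumptions, and rectangles are clipped to the frame bounds.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]           # rows of samples (assumed representation)
Rect = Tuple[int, int, int, int]  # (x, y, width, height) from the position information


def crop_face(frame: Frame, rect: Rect) -> Frame:
    """Return the samples inside rect, clipped to the frame."""
    x, y, w, h = rect
    height = len(frame)
    width = len(frame[0]) if height else 0
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in frame[y0:y1]]


def crop_all(frame: Frame, rects: Sequence[Rect]) -> List[Frame]:
    # A face video frame may contain several face areas; crop each one.
    return [crop_face(frame, r) for r in rects]
```

Only the cropped patches are passed to the recognizer, which is what lets the decoder avoid re-running face detection on the whole frame.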

The search condition information 203a defines what is to be recognized (analyzed) about the features of the face images in the face video frames. For example, information indicating "the degree of match with the facial features of a specific person is equal to or greater than a threshold" or "the person is female and in her twenties" is set as the search condition information 203a. The specific format for defining the search condition information 203a is not limited; for example, a logical expression representing such a search condition, or content written in a programming language, may be set.

FIG. 2 is an explanatory diagram showing configuration examples of the search condition information 203a.

As shown in FIG. 2, the search condition information 203a can be composed of, for example, a code representing the search condition type combined with additional information. Multiple pieces of additional information corresponding to the search condition type can be attached. Because each piece of additional information may be variable in length, it consists of a "number of bytes" field indicating its data length followed by the additional information itself. The meaning of each piece of additional information is defined by the condition corresponding to the search condition type.
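The layout in FIG. 2 (a type code followed by repeated "number of bytes" and data pairs) can be serialized and parsed as below. The source gives the sizes of some additional information fields but not the width of the type code or of the byte-count fields, so this sketch assumes 4-byte big-endian unsigned integers for both.

```python
import struct
from typing import List, Tuple


def encode_condition(kind: int, items: List[bytes]) -> bytes:
    """Search condition type code, then (byte count, data) for each additional item."""
    out = bytearray(struct.pack(">I", kind))
    for item in items:
        out += struct.pack(">I", len(item))
        out += item
    return bytes(out)


def decode_condition(data: bytes) -> Tuple[int, List[bytes]]:
    (kind,) = struct.unpack_from(">I", data, 0)
    offset, items = 4, []
    while offset < len(data):
        (n,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if offset + n > len(data):
            raise ValueError("truncated additional information")
        items.append(data[offset:offset + n])
        offset += n
    return kind, items
```

For a type "1" condition, the items would be the 3400-byte facial feature information and the 4-byte match threshold. Their interpretation is left to the handler for that type, in line with the per-type definitions described above.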

For example, search condition information 203a expressing "the degree of match with the facial features of a specific person is equal to or greater than a threshold" has the contents shown in FIGS. 2(a) and 2(b).

In the search condition information 203a shown in FIG. 2(a), the search condition type is set to "1", meaning "the degree of match with the facial features set in additional information 1 is equal to or greater than the threshold set in additional information 2"; additional information 1 is set to facial feature information (3400 bytes), and additional information 2 is set to a match threshold (4 bytes). When the search condition information 203a shown in FIG. 2(a) is set, the face recognition unit 203, following the definition of search condition type "1", determines whether the degree of match between the facial feature information (3400 bytes) in additional information 1 and the features of the face appearing in the face area obtained from the face video frame is equal to or greater than the match threshold in additional information 2.
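The source leaves the "degree of match" metric open. The sketch below assumes cosine similarity between the registered feature vector from additional information 1 and the feature vector extracted from the face area, compared against the threshold from additional information 2. Any other similarity measure could be substituted.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("feature vectors differ in length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # treat a zero vector as no match


def matches_type1(registered: Sequence[float], observed: Sequence[float],
                  threshold: float) -> bool:
    # Search condition type "1": degree of match >= threshold.
    return cosine_similarity(registered, observed) >= threshold
```

How the 3400-byte facial feature information maps to a vector depends on the feature extractor in use and is not specified in the source.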

As shown in FIG. 2(b), the facial feature information in additional information 1 may instead be given as an index (4 bytes) into a database that stores the facial features to be matched, rather than as the facial feature information itself. In this case, the location of the database storing the facial features is not limited; it may be the decoding unit 200 itself or another computer (for example, a file server or cloud service, not shown).

As another example, search condition information 203a expressing "recognize (estimate) gender and age" has the contents shown in FIG. 2(c). In the search condition information 203a shown in FIG. 2(c), the search condition type is set to "3", meaning "the age range and gender match the conditions specified in additional information 1 to 3"; additional information 1 is set to the gender (4 bytes), additional information 2 to the lower limit of the age range (4 bytes), and additional information 3 to the upper limit of the age range (4 bytes). When the search condition information 203a shown in FIG. 2(c) is set, the face recognition unit 203, following the definition of search condition type "3", determines whether the face appearing in the face area obtained from the face video frame matches the gender set in additional information 1 (female) and falls within the age range set in additional information 2 and 3 (20 to 29 years old).
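A type "3" check reduces to comparing an estimated gender and age against the three additional information values. The sketch assumes a separate estimator already produced `(gender, age)` for the face. The numeric gender codes are hypothetical, because the source specifies only that the gender field is 4 bytes.

```python
from dataclasses import dataclass

FEMALE, MALE = 1, 2  # hypothetical codes for additional information 1


@dataclass
class Type3Condition:
    gender: int   # additional information 1
    age_min: int  # additional information 2 (lower limit of the age range)
    age_max: int  # additional information 3 (upper limit of the age range)


def matches_type3(cond: Type3Condition, est_gender: int, est_age: float) -> bool:
    # Both the gender and the inclusive age range must match.
    return est_gender == cond.gender and cond.age_min <= est_age <= cond.age_max
```

For the FIG. 2(c) example, the condition would be `Type3Condition(FEMALE, 20, 29)`.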

The face recognition unit 203 then obtains video consisting only of frames in which a face matching the search condition information 203a appears (hereinafter called the "second decoded video") and supplies it to the result output unit 204.
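Building the second decoded video amounts to keeping the face video frames in which at least one face satisfies the search condition. The sketch assumes the faces of each frame are available as a mapping from frame index to cropped faces and that the condition is wrapped in a predicate. Both interfaces are illustrative.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Face = TypeVar("Face")


def select_matching_frames(faces_by_frame: Dict[int, Sequence[Face]],
                           matches: Callable[[Face], bool]) -> List[int]:
    """Return, in display order, the frames containing at least one matching face."""
    return sorted(i for i, faces in faces_by_frame.items()
                  if any(matches(f) for f in faces))
```

The resulting frame indices identify which frames of the first decoded video make up the second decoded video.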

Arbitrary information can be set as the search condition information 203a of the face recognition unit 203 by user operation or similar means. The method by which the face recognition unit 203 accepts input of the search condition information 203a from the user is not limited. For example, the face recognition unit 203 may accept it by receiving a file in a predetermined format such as a text file, or through an operation screen such as a GUI (for example, a Web screen).

The result output unit 204 then outputs output data D20 based on the second decoded video, or the video signal of the second decoded video.

(A-2) Operation of the First Embodiment
Next, the operation of the camera system 1 according to the first embodiment of the present invention will be described.

First, the flow of image processing in the camera system 1 of the first embodiment will be described.

FIG. 3 is an explanatory diagram showing the transition of image processing in the camera system 1.

FIG. 3(a) shows an example of a frame of video captured by the camera 300: a frame F101 in which a person's face appears.

FIG. 3(b) shows an example of the result of face detection performed by the face detection unit 101 on frame F101. In FIG. 3(b), the face area in which a face was detected in frame F101 is enclosed by a dotted line.

FIG. 3(c) shows an example of the result (compressed video data) of compression encoding (encoding) frame F101 by the video compression unit 102. In FIG. 3(c), the face area is enclosed by a dotted line and the non-face area (everything other than the face area) is hatched. The video compression unit 102 may apply different compression encoding processing (encoding processing) to the face area (the part enclosed by the dotted line) and the non-face area (the hatched part). Specifically, when compression-encoding frame F101, the video compression unit 102 may apply compression that yields a smaller data amount (fewer bits) to the non-face area (hatched part) than to the face area (part enclosed by the dotted line). In other words, when compression-encoding frame F101, the video compression unit 102 may allocate more code to the face area than to the non-face area, encoding the face area at a higher rate with less image quality degradation.

FIG. 3(d) shows processing (an example of generating the first decoded video) in which the face video decoding unit 202 identifies frame F101, a face video frame, in the compressed video data based on the face detection result data (time information) and decodes frame F101 from the compressed video data. When decoding the face video frame (frame F101), the face video decoding unit 202 decodes the face area (the part enclosed by the dotted line) and the non-face area (the hatched part) each with the corresponding decoding process. This is necessary because the video compression unit 102 applied different compression processing (encoding processing) to the face area and the non-face area.

FIG. 3(e) shows the state in which the face recognition unit 203 has extracted, from the frame F101 constituting the first decoded video (the image in FIG. 3(d)), only the face region to be subjected to face recognition processing (only the face region in FIG. 3(d)), based on the face detection result data (the position information of the face region within the face video frame).

Next, an example of the operation of the encoding unit 100 will be described with reference to the flowchart of FIG. 4.

First, assume that a frame of the video captured by the camera 300 has been supplied to the encoding unit 100 (S101).

Next, the face detection unit 101 performs face detection processing on the frame supplied from the camera 300 (hereinafter referred to as the “frame of interest”) and determines whether it contains a face image (face region), that is, whether a face appears in it (S102). At this time, the face detection unit 101 may also take the size and orientation of the face into account and determine whether the frame of interest contains a face photographed from the front at or above a predetermined size. If the frame of interest contains a face image (face region), the face detection unit 101 proceeds to step S103, described below; otherwise, it proceeds to step S106, described below.

If it is determined in step S102 that the frame of interest contains a face image, the face detection unit 101 generates face detection result data including time information that identifies the frame of interest and position information that identifies the face region within the frame of interest (S103).

Next, based on the position information in the face detection result data, the video compression unit 102 applies different compression encoding processes to the face region and the non-face region of the frame of interest, and generates compressed video data for the frame of interest (S104).

On the other hand, if it is determined in step S102 that the frame of interest does not contain a face image (face region), the video compression unit 102 compression-encodes the entire frame of interest without distinction (using a single scheme for the whole frame) and generates compressed video data for the frame of interest (S106).

When the compressed video data of the frame of interest has been generated in step S104 or S106, the data output unit 103 outputs, as compressed data D10, data containing that compressed video data, with the face detection result data appended if face detection result data was generated (S105). The data output unit 103 stores the compressed data D10 (for example, on a data recording medium) or transmits it (for example, to the decoding unit 200) by a predetermined method. The data output unit 103 may also output the compressed video data and the face detection result data as separate data (files).
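Steps S101 to S106 can be summarized in the following Python sketch. The face detector and the frame encoder are passed in as callables because the patent leaves both methods open; `encode_stream`, `FaceDetectionResult`, and `CompressedRecord` are hypothetical names, and the record layout is an assumption rather than the actual format of the compressed data D10.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h); assumed layout of the position information


@dataclass
class FaceDetectionResult:
    timestamp: float     # time information identifying the frame of interest
    regions: List[Rect]  # position information of each detected face region


@dataclass
class CompressedRecord:
    timestamp: float
    payload: bytes                                     # compressed video data for one frame
    face_result: Optional[FaceDetectionResult] = None  # appended only when a face was found


def encode_stream(frames: Iterable[Tuple[float, Any]],
                  detect_faces: Callable[[Any], List[Rect]],
                  encode_frame: Callable[[Any, List[Rect]], bytes]) -> List[CompressedRecord]:
    records: List[CompressedRecord] = []
    for timestamp, frame in frames:                          # S101: frame supplied from the camera
        regions = detect_faces(frame)                        # S102: does the frame contain a face?
        if regions:
            result = FaceDetectionResult(timestamp, regions)  # S103: time + position information
            payload = encode_frame(frame, regions)            # S104: region-dependent encoding
        else:
            result = None
            payload = encode_frame(frame, [])                 # S106: uniform encoding
        records.append(CompressedRecord(timestamp, payload, result))  # S105: output
    return records
```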

Next, an example of the operation of the decoding unit 200 will be described with reference to the flowchart of FIG. 5.

When the compressed data D10 generated by the encoding unit 100 is supplied to the decoding unit 200, the face detection result reading unit 201 reads the time information of the face detection result data in the compressed data D10, and the face video decoding unit 202 identifies the face video frames in the compressed video data of the compressed data D10 (S201).

The face video decoding unit 202 decodes, from the compressed video data, only the frames identified as face video frames in step S201, and obtains the first decoded video (S202).

Next, the face recognition unit 203 performs face recognition processing on each frame of the first decoded video to check whether it contains a face image satisfying the conditions of the search condition information 203a, extracts only the frames containing such a face image, and obtains them as the second decoded video (S203). At this time, the face recognition unit 203 extracts only the face region of each face video frame based on the position information (indicating the face region) in the face detection result data, performs face recognition on it, and checks the result against the conditions of the search condition information 203a.
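Steps S201 to S203 amount to filtering on metadata before decoding. The sketch below, under stated assumptions, takes each record as `(time information, compressed frame, face regions or None)` and uses caller-supplied `decode_frame`, `crop`, and `matches` callables; these names are hypothetical, and `matches` stands in for the face recognition check against the search condition information 203a. It also assumes each face video frame can be decoded on its own (for example, intra-coded); with inter-frame prediction, the reference frames a face video frame depends on would also have to be decoded.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]                    # (x, y, w, h); assumed layout
Record = Tuple[float, bytes, Optional[List[Rect]]]  # (time info, compressed frame, face regions)


def search_face_frames(records: Sequence[Record],
                       decode_frame: Callable[[bytes], Any],
                       crop: Callable[[Any, Rect], Any],
                       matches: Callable[[Any], bool]) -> Tuple[List[float], int]:
    """Return (timestamps of the second decoded video, number of frames decoded)."""
    hits: List[float] = []
    decoded = 0
    for timestamp, payload, regions in records:
        if not regions:
            continue                   # S201: no face detection result, so never decoded
        image = decode_frame(payload)  # S202: frame of the first decoded video
        decoded += 1
        # S203: run recognition only inside the face regions given by the position information
        if any(matches(crop(image, r)) for r in regions):
            hits.append(timestamp)
    return hits, decoded
```

Re-running the function with a different `matches` predicate corresponds to changing the search condition information 203a and repeating the search from step S201, without re-encoding anything.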

Next, the result output unit 204 outputs output data D20 based on the second decoded video (or a video signal based on the second decoded video) (S204).

Thereafter, the decoding unit 200 can accept from the user a change to the search condition information 203a set in the face recognition unit 203 and execute the process again from step S201. The user can have the decoding unit 200 repeat the processing until the desired result (the desired video as output data D20 or a video signal) is obtained.

(A-3) Effects of the First Embodiment
According to the first embodiment, the following effects can be achieved.

In the camera system 1 of the first embodiment, the encoding unit 100 (capture side) performs face detection processing on the video (moving image) captured by the camera 300 and generates face detection result data. By reading the face detection result data, the decoding unit 200 can decode only the face video frames (frames in which a face appears), perform face recognition processing on them, and carry out a search based on the search condition information 203a. For example, when the decoding unit 200 searches the captured and stored compressed video data for scenes showing a specific person (or a person with a predetermined attribute) and displays them, it can perform face recognition and search efficiently without decoding every frame and running face detection on each one. That is, in the camera system 1 of the first embodiment, the encoding unit 100 generates face detection result data and the decoding unit 200 reads it, which makes it possible to search the captured video efficiently for predetermined scenes and display them.

(B) Second Embodiment
A second embodiment of the moving image processing system, the encoding device and program, and the decoding device and program according to the present invention will be described below in detail with reference to the drawings. The following describes an example in which the video processing system, encoding device, and decoding device of the present invention are applied to a camera system, an encoding unit, and a decoding unit, respectively.

(B-1) Configuration of the Second Embodiment
FIG. 6 is an explanatory diagram showing the overall configuration of the camera system 1A of the second embodiment.

In the camera system 1A, the encoding unit 100 and the decoding unit 200 are replaced with an encoding unit 100A and a decoding unit 200A, respectively.

Next, the configuration of the encoding unit 100A will be described.

The encoding unit 100A differs from that of the first embodiment in that the face detection unit 101 is replaced with a face detection/recognition unit 104.

The face detection/recognition unit 104 performs face detection processing on each frame of the video supplied from the camera 300 and determines whether the frame contains a face image (face region), that is, whether it is a face video frame. The face detection/recognition unit 104 then performs recognition processing on the face image of each face video frame according to the recognition content information 104a that it holds. The recognition content information 104a defines what is to be recognized (analyzed) in the face image of a face video frame. For example, the recognition content information 104a may be set to indicate “recognize the degree of coincidence by comparison with the facial features of a specific person” or “recognize (estimate) gender and age”. The specific format used to define the recognition content information 104a is not limited; for example, conditions such as those above may be written as a logical expression or in a programming language.

FIG. 7 is an explanatory diagram showing a configuration example of the recognition content information 104a.

As shown in FIG. 7, the recognition content information 104a can consist of, for example, a code representing the recognition condition type combined with additional information. The recognition content information 104a may carry multiple pieces of additional information for a recognition condition type, or none at all. Because each piece of additional information has a variable length, each is composed of a “number of bytes” field indicating its data length followed by the additional information itself. The meaning of each piece of additional information is defined by the condition associated with the recognition condition type.
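The layout of FIG. 7 (a type code followed by zero or more “number of bytes” + data pairs) can be serialized and parsed as in the sketch below. The patent does not give the width or byte order of the type code or the length field, so the 4-byte big-endian unsigned integers used here, and the function names `pack_condition` and `unpack_condition`, are assumptions for illustration. Because no item count is stored, the parser reads pairs until the buffer ends.

```python
import struct
from typing import List, Tuple


def pack_condition(type_code: int, items: List[bytes]) -> bytes:
    """Serialize a condition type code and its variable-length additional information items."""
    out = bytearray(struct.pack(">I", type_code))  # assumed 4-byte big-endian type code
    for item in items:
        out += struct.pack(">I", len(item))        # "number of bytes" field (assumed 4 bytes)
        out += item                                # the additional information itself
    return bytes(out)


def unpack_condition(buf: bytes) -> Tuple[int, List[bytes]]:
    """Parse the layout produced by pack_condition, rejecting truncated input."""
    if len(buf) < 4:
        raise ValueError("truncated type code")
    (type_code,) = struct.unpack_from(">I", buf, 0)
    pos, items = 4, []
    while pos < len(buf):
        if pos + 4 > len(buf):
            raise ValueError("truncated length field")
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if pos + n > len(buf):
            raise ValueError("truncated additional information")
        items.append(buf[pos:pos + n])
        pos += n
    return type_code, items
```

The search condition information 206a of FIG. 8 is described with the same shape, so the same routines would apply to it under the same assumptions.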

For example, recognition content information 104a representing “recognize the degree of coincidence by comparison with the facial features of a specific person” has the content shown in FIG. 7(a).

In the recognition content information 104a shown in FIG. 7(a), the recognition condition type is set to “1”, which means “generate, as face recognition result data, the degree of coincidence with the facial features set in additional information 1”, and facial feature information (3400 bytes) is set as additional information 1. When the recognition content information 104a shown in FIG. 7(a) is set, the face detection/recognition unit 104, following the definition of recognition condition type “1”, calculates the degree of coincidence between the facial feature information (3400 bytes) set in additional information 1 and the feature quantity of the face appearing in the face region obtained from the face video frame, and generates the result as face recognition result data. As in FIG. 2(b) described above, the facial feature information in additional information 1 of the recognition content information 104a may be given not as the facial feature information itself but as an index (4 bytes) into a database storing the facial features to be compared.
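As one way to picture recognition condition type “1”, the sketch below computes a degree of coincidence as the cosine similarity between two feature vectors. The patent does not specify the feature representation or the similarity measure, so interpreting the 3400-byte facial feature information as 850 little-endian float32 values, using cosine similarity, and reading the 4-byte database index as a little-endian unsigned integer are all assumptions; `resolve_reference` and `degree_of_coincidence` are hypothetical helpers.

```python
import math
import struct
from typing import Mapping, Sequence

FEATURE_BYTES = 3400
DIM = FEATURE_BYTES // 4  # 850 elements, assuming float32 storage


def resolve_reference(item: bytes, database: Mapping[int, bytes]) -> bytes:
    """Return the facial feature information, looking it up when given as a 4-byte index."""
    if len(item) == 4:
        (index,) = struct.unpack("<I", item)  # assumed little-endian unsigned index
        return database[index]
    return item


def degree_of_coincidence(reference: bytes, probe: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1] between the registered feature and the detected face's feature."""
    if len(reference) != FEATURE_BYTES or len(probe) != DIM:
        raise ValueError("unexpected feature size")
    ref = struct.unpack(f"<{DIM}f", reference)
    dot = sum(a * b for a, b in zip(ref, probe))
    norm = math.sqrt(sum(a * a for a in ref)) * math.sqrt(sum(b * b for b in probe))
    return dot / norm if norm else 0.0
```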

Similarly, recognition content information 104a representing “recognize (estimate) gender and age” has the content shown in FIG. 7(b). In the recognition content information 104a shown in FIG. 7(b), the recognition condition type is set to “3”, which means “generate the gender and age recognition (estimation) result as face recognition result data”, and no additional information is set. When the recognition content information 104a shown in FIG. 7(b) is set, the face detection/recognition unit 104, following the definition of recognition condition type “3”, recognizes the gender and age from the feature quantity of the face appearing in the face region obtained from the face video frame, and generates the recognition result as face recognition result data. In addition, for each face video frame in which a face image (face region) is detected by the face detection processing, the face detection/recognition unit 104 generates face detection result data in the same manner as in the first embodiment.

For each face video frame, the face detection/recognition unit 104 also generates face recognition result data describing the recognition result based on the recognition content information 104a.

For example, when the recognition content information 104a specifies “recognize the degree of coincidence by comparison with the facial features of a specific person”, the face detection/recognition unit 104 compares the facial feature quantity described in the recognition content information 104a with the feature quantity obtained from the face image in the face video frame, and generates information on the degree of coincidence (for example, a numerical value indicating it) as face recognition result data. When the recognition content information 104a specifies “recognize (estimate) gender and age”, the unit generates a recognition (estimation) result such as “female, 20s” as face recognition result data. Various face recognition methods can be applied as the specific methods by which the face detection/recognition unit 104 extracts feature quantities from face images and performs recognition based on those feature quantities.

The video compression unit 102 compression-encodes (encodes) the frames supplied from the camera 300 using the face detection result data (time information) from the face detection/recognition unit 104, and generates compressed video data. As in the first embodiment, the video compression unit 102 may apply different encoding processes to the face region and the non-face region when compression-encoding a face video frame.

The data output unit 103 outputs compressed data D11 in a predetermined format based on data including the compressed video data generated by the video compression unit 102 and the face detection result data and face recognition result data generated by the face detection/recognition unit 104. The methods and means by which the data output unit 103 outputs the compressed data D11 are the same as in the first embodiment, so a detailed description is omitted.

Next, the configuration of the decoding unit 200A will be described.

In the decoding unit 200A of the second embodiment, the face detection result reading unit 201 is replaced with a face detection/recognition data reading unit 205, and the face video decoding unit 202 is replaced with a target video decoding unit 206. In addition, the face recognition unit 203 is omitted from the decoding unit 200A.

The face detection/recognition data reading unit 205 reads the compressed video data, the face detection result data, and the face recognition result data from the supplied compressed data D11.

Based on the time information of the face video frames contained in the face detection result data and on the face recognition result data, the target video decoding unit 206 searches for (identifies) the face video frames containing a face image that matches the search condition information 206a it holds (hereinafter referred to as “target frames”). The target video decoding unit 206 then generates, from the compressed video data, a video in which only the target frames are decoded (hereinafter referred to as the “third decoded video”).

The search condition information 206a describes information that corresponds to (can be compared with) the face recognition result data produced according to the recognition content information 104a.

For example, when the recognition content information 104a specifies “recognize the degree of coincidence by comparison with the facial features of a specific person”, the face recognition result data contains the numerical degree of coincidence, so a threshold for the degree of coincidence can be set in the search condition information 206a. In this case, the target video decoding unit 206 detects, as target frames, the face video frames whose degree of coincidence in the face recognition result data is equal to or greater than the threshold set in the search condition information 206a.

Likewise, when the recognition content information 104a specifies “recognize (estimate) gender and age”, the face recognition result data contains the recognized gender and age, so a specific gender and age (age range) such as “female, 20s” can be set in the search condition information 206a. In this case, the target video decoding unit 206 detects, as target frames, the face video frames whose gender and age in the face recognition result data match the “female, 20s” set in the search condition information 206a.
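Because the face recognition result data travels with the compressed data D11, the target frames can be chosen before any decoding takes place. The following sketch, under stated assumptions, represents each frame's face recognition result data as a `FrameMeta` record and expresses the two kinds of search condition as predicates; the field names, the use of an integer age estimate, and the helper names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class FrameMeta:
    timestamp: float                     # time information of the face video frame
    match_score: Optional[float] = None  # result for recognition type "1" (degree of coincidence)
    gender: Optional[str] = None         # result for recognition type "3"
    age: Optional[int] = None            # assumed integer age estimate for type "3"


Predicate = Callable[[FrameMeta], bool]


def score_at_least(threshold: float) -> Predicate:
    # Degree of coincidence equal to or greater than the threshold
    return lambda m: m.match_score is not None and m.match_score >= threshold


def gender_and_age(gender: str, lo: int, hi: int) -> Predicate:
    # Gender matches and the age lies in the inclusive range [lo, hi]
    return lambda m: m.gender == gender and m.age is not None and lo <= m.age <= hi


def select_target_frames(metas: Sequence[FrameMeta], condition: Predicate) -> List[float]:
    """Timestamps of the target frames; only these would be passed to the decoder."""
    return [m.timestamp for m in metas if condition(m)]
```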

FIG. 8 is an explanatory diagram showing a configuration example of the search condition information 206a.

As shown in FIG. 8, the search condition information 206a can consist of, for example, a code representing the search condition type combined with additional information. The search condition information 206a may carry multiple pieces of additional information for a search condition type. Because each piece of additional information has a variable length, each is composed of a “number of bytes” field indicating its data length followed by the additional information itself. The meaning of each piece of additional information is defined by the condition associated with the search condition type.

For example, search condition information 206a representing “the degree of coincidence obtained by comparison with the facial features of a specific person is equal to or greater than a threshold” has the content shown in FIG. 8(a).

In the search condition information 206a shown in FIG. 8(a), the search condition type is set to “1”, which means “the degree of coincidence with the facial features in the recognition result data is equal to or greater than the threshold set in additional information 1”, and a coincidence threshold (4 bytes) is set as additional information 1. When the search condition information 206a shown in FIG. 8(a) is set, the target video decoding unit 206, following the definition of search condition type “1”, determines whether the degree of coincidence in the acquired face recognition result data reaches the coincidence threshold set in additional information 1.

Similarly, search condition information 206a representing “female and in her twenties (20 to 29 years old)” has the content shown in FIG. 8(b). In the search condition information 206a shown in FIG. 8(b), the search condition type is set to “3”, which means “the age and gender in the recognition result data match the conditions specified in additional information 1 to 3”; gender (4 bytes) is set as additional information 1, the lower limit of the age range (4 bytes) as additional information 2, and the upper limit of the age range (4 bytes) as additional information 3. When the search condition information 206a shown in FIG. 8(b) is set, the target video decoding unit 206, following the definition of search condition type “3”, determines whether the gender and age in the face recognition result data match additional information 1 to 3 (female, 20s).
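The two FIG. 8 examples can be turned into executable checks as sketched below, taking the search condition type and the already separated additional information items as input. The patent gives only the field sizes, so the encodings of the 4-byte fields (a big-endian float32 threshold, big-endian unsigned integers for the gender code and age limits) and the gender code table are assumptions; the face recognition result data is modeled as a plain dictionary, and `build_search_predicate` is a hypothetical name. Following the type “1” definition, the threshold test uses “equal to or greater than”.

```python
import struct
from typing import Any, Callable, Dict, List

GENDER_CODES = {0: "male", 1: "female"}  # hypothetical code assignment


def _field(item: bytes, fmt: str) -> Any:
    """Decode one 4-byte additional information item with the given struct format."""
    if len(item) != 4:
        raise ValueError("each additional information item is expected to be 4 bytes")
    return struct.unpack(fmt, item)[0]


def build_search_predicate(type_code: int, items: List[bytes]) -> Callable[[Dict[str, Any]], bool]:
    if type_code == 1:  # FIG. 8(a): degree of coincidence >= threshold in additional information 1
        threshold = _field(items[0], ">f")
        return lambda r: r.get("score") is not None and r["score"] >= threshold
    if type_code == 3:  # FIG. 8(b): gender, age lower limit, age upper limit
        gender = GENDER_CODES[_field(items[0], ">I")]
        lo, hi = _field(items[1], ">I"), _field(items[2], ">I")
        return lambda r: r.get("gender") == gender and r.get("age") is not None and lo <= r["age"] <= hi
    raise ValueError(f"unsupported search condition type: {type_code}")
```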

The result output unit 204 outputs output data D21 based on the third decoded video generated by the target video decoding unit 206, or a video signal of the third decoded video.

As described above, in the second embodiment, the face detection/recognition unit 104 of the encoding unit 100A (capture side) performs face recognition processing (analysis processing) based on the recognition content information 104a and generates face detection result data together with face recognition result data. In the decoding unit 200A (search side), the face detection/recognition data reading unit 205 reads this data, and the target video decoding unit 206 decodes only the target frames that match the search condition information 206a.

The information set in the encoding unit 100A (face detection/recognition unit 104) and the decoding unit 200A (target video decoding unit 206), namely the recognition content information 104a and the search condition information 206a, can be given any desired content through user operation or the like. The method by which the face detection/recognition unit 104 and the target video decoding unit 206 accept this information from the user is not limited. For example, they may accept it by receiving a file in a predetermined format such as a text file, or through an operation screen such as a GUI (for example, a Web page).

(B-2) Operation of the Second Embodiment
Next, the operation of the camera system 1A according to the second embodiment of the present invention will be described.

First, the flow of image processing in the camera system 1A of the second embodiment will be described.

FIG. 9 is an explanatory diagram showing the transitions of image processing in the camera system 1A.

FIG. 9(a) shows an example of a frame of the video captured by the camera 300: a frame F201 in which a person's face appears.

FIG. 9(b) shows an example of the face detection result obtained by the face detection/recognition unit 104 for the frame F201. In FIG. 9(b), the face region in which a face was detected in the frame F201 is enclosed by a dotted line. FIG. 9(b) also shows the recognition result of the face detection/recognition unit 104 (based on the recognition content information 104a), namely the gender and age estimate (“female, 20s” in FIG. 9(b)).

FIG. 9(c) shows an example of the processing result (compressed video data) obtained when the video compression unit 102 compression-encodes (encodes) the frame F201. In FIG. 9(c), the face region is enclosed by a dotted line and the region other than the face region is hatched. As in the first embodiment, when compressing (encoding) the frame F201, the video compression unit 102 applies to the face region (the portion enclosed by the dotted line) a compression encoding process different from that applied to the region other than the face region (the hatched portion).

FIG. 9(d) shows the process of generating the third decoded video, in which the target video decoding unit 206 identifies, based on the face recognition result data, the frame F201 containing the face image being searched for (the frame F201 matching the search condition information 206a) in the compressed video data, and decodes the frame F201 from the compressed video data. As in the first embodiment, when decoding the frame F201 containing the face image, the target video decoding unit 206 applies the corresponding decoding process to each of the face region (the portion enclosed by the dotted line) and the region other than the face region (the hatched portion).

Next, an example of the operation of the encoding unit 100A will be described with reference to the flowchart of FIG. 10.

First, assume that a frame of the video captured by the camera 300 has been supplied to the encoding unit 100A (S301).

Next, the face detection/recognition unit 104 performs face detection processing on the frame supplied from the camera 300 (the frame of interest) and determines whether it contains a face image (face region) (S302). At this time, the face detection/recognition unit 104 may also take the size and orientation of the face into account and determine whether the frame of interest contains a face photographed from the front at or above a predetermined size. If the frame of interest contains a face image (face region), the face detection/recognition unit 104 proceeds to step S303, described below; otherwise, it proceeds to step S306, described below.

If it is determined in step S302 that the frame of interest contains a face image, the face detection/recognition unit 104 recognizes the face image detected in the frame of interest according to the recognition content information 104a and obtains the recognition result as face recognition result data. The face detection/recognition unit 104 also generates face detection result data including time information that identifies the frame of interest and position information that identifies the face region within the frame of interest (S303).

Next, as in the first embodiment, the video compression unit 102 applies different compression encoding processing (encoding processing) to the face area and to the non-face area of the frame of interest, based on the position information in the face detection result data, and generates compressed video data for the frame of interest (S304).

If, on the other hand, step S302 determines that no face appears in the frame of interest, the video compression unit 102 compression-encodes (encodes) the entire frame of interest without distinguishing between areas and generates compressed video data for the frame of interest (S306).
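Steps S304 and S306 differ only in whether the encoder treats face and non-face areas differently. The text does not specify the mechanism; one common approach in block-based codecs such as H.264/AVC and H.265/HEVC is a per-block quantization parameter (QP), where a lower QP quantizes more finely and therefore spends more bits. The following is a minimal Python sketch under that assumption. `FaceBox`, `build_qp_map`, the 16-pixel block size, and the QP values 22 and 34 are hypothetical choices, not taken from the original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceBox:
    """Face area position information (hypothetical layout, in pixels)."""
    x: int
    y: int
    w: int
    h: int


def build_qp_map(width, height, faces, face_qp=22, background_qp=34, block=16):
    """Return a per-block QP grid; blocks overlapping a face get a lower QP.

    A lower QP means finer quantization, so face areas receive more bits.
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    qp = [[background_qp] * cols for _ in range(rows)]
    for f in faces:
        # Range of blocks touched by the face rectangle (inclusive, clamped).
        c0, r0 = max(0, f.x // block), max(0, f.y // block)
        c1 = min(cols - 1, (f.x + f.w - 1) // block)
        r1 = min(rows - 1, (f.y + f.h - 1) // block)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                qp[r][c] = face_qp
    return qp
```

With an empty face list (the S306 case), every block receives the same QP, so the whole frame is encoded uniformly.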

Once the compressed video data of the frame of interest has been generated in step S304 or step S306, the data output unit 103 outputs data containing that compressed video data as the compressed data D11 of the frame of interest (S305). If face detection result data and face recognition result data were generated in step S303, the data output unit 103 also attaches them to the compressed data of the frame of interest. In the second embodiment, the data output method and means used by the data output unit 103 are the same as in the first embodiment, so a detailed description is omitted.
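The per-frame flow from S302 to S306 can be summarized in a short sketch. The detector, recognizer, and encoder are passed in as plain functions because the original leaves their internals open. The names `encode_frame`, `FaceDetectionResult`, and `CompressedFrame`, and the (x, y, w, h) box layout, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceDetectionResult:
    timestamp: float  # time information identifying the frame of interest
    boxes: list       # position information: one (x, y, w, h) per face area


@dataclass
class CompressedFrame:
    payload: bytes                                   # compressed video data
    detection: Optional[FaceDetectionResult] = None  # face detection result data
    recognition: Optional[list] = None               # face recognition result data


def encode_frame(timestamp, frame, detect, recognize, encode):
    """Encode one frame of interest following steps S302 to S306 (sketch)."""
    boxes = detect(frame)                                   # S302: any face?
    if boxes:
        # S303: recognize each detected face and record time + position.
        recognition = [recognize(frame, box) for box in boxes]
        detection = FaceDetectionResult(timestamp, boxes)
        payload = encode(frame, boxes)                      # S304: face-aware encoding
        return CompressedFrame(payload, detection, recognition)  # S305
    payload = encode(frame, [])                             # S306: whole-frame encoding
    return CompressedFrame(payload)                         # S305: no face metadata
```

Frames without faces carry no detection or recognition metadata, which is what later lets the decoding side skip them without decoding.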

Next, an example of the operation of the decoding unit 200A will be described with reference to the flowchart of FIG. 11.

When the decoding unit 200A is supplied with the compressed data D11 generated by the encoding unit 100A, the face detection/recognition data reading unit 205 reads the face detection result data and the face recognition result data from that compressed data (S401).

The target video decoding unit 206 then identifies, based on the face detection result data and the face recognition result data, the target frames that match the search condition information 206a, and decodes only those target frames from the compressed video data to obtain the third decoded video (S402).

Next, the result output unit 204 outputs output data D21 based on the third decoded video (or a video signal based on the third decoded video) (S403).

Thereafter, the decoding unit 200A can accept a change from the user to the search condition information 206a set in the target video decoding unit 206 and execute steps S401 to S403 again. The user can keep changing the search condition information 206a and having the decoding unit 200A rerun the processing until the desired result (the desired video) is obtained.
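Steps S401 and S402 can be sketched as a metadata filter followed by selective decoding. This is a minimal illustration assuming face recognition result data is stored as one attribute dictionary per face and that a search condition matches when every key/value pair in it equals a face's attribute; `CompressedFrame`, `matches`, and `search_and_decode` are hypothetical names. The sketch also treats every frame as independently decodable; with inter-frame prediction, a real decoder would additionally have to decode the reference frames that a selected frame depends on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressedFrame:
    timestamp: float
    payload: bytes
    recognition: Optional[list] = None  # per-face attribute dicts; None if no face


def matches(recognition, condition):
    """True if at least one face has every key/value pair in condition."""
    return any(all(face.get(k) == v for k, v in condition.items())
               for face in recognition)


def search_and_decode(frames, condition, decode):
    """S401-S402: filter on metadata, then decode only the target frames."""
    results = []
    for fr in frames:
        if fr.recognition is None:               # no face: never decoded
            continue
        if matches(fr.recognition, condition):   # metadata comparison only
            results.append((fr.timestamp, decode(fr.payload)))
    return results
```

Changing `condition` and calling `search_and_decode` again corresponds to the user repeating S401 to S403 with new search condition information 206a. Only the metadata is rescanned, and frames that do not match are never decoded.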

(B-3) Effects of the Second Embodiment
Compared with the first embodiment, the second embodiment provides the following effects.

In the second embodiment, the face detection/recognition unit 104 performs face recognition processing according to the recognition content information 104a to generate face recognition result data, and the decoding unit 200A reads that data at decoding time. The decoding unit 200A therefore only needs to compare the search condition information 206a with the face recognition result data of each face video frame to identify the target frames in which the face being searched for (a face matching the search condition information 206a) appears, and it can decode and output only those target frames. For example, when the decoding unit 200A searches stored compressed video data for scenes showing a specific person (or a person with a predetermined attribute) and displays them, it can do so efficiently without decoding all of the video and running face detection and face recognition on it.

In other words, in the second embodiment the encoding unit 100A (capture side) carries out face recognition together with face detection, so the decoding unit 200A (search side) can efficiently retrieve the required video without performing face recognition itself.

(C) Other Embodiments
The present invention is not limited to the above embodiments; modified embodiments such as those illustrated below are also possible.

(C-1) For the camera systems 1 and 1A of the above embodiments, the case was described in which face detection processing is used to search for and display, among the frames in which a face appears (face video frames), those showing a specific person or a person with a predetermined attribute. Instead of accepting any frame in which a face merely appears, only a face photographed from the front (a frontal face image), a face not covered by objects such as sunglasses or a mask, or a face with a neutral expression may be detected as a face image. This makes the subsequent face recognition processing more accurate and the search and display more efficient. For example, the face detection processing performed by the face detection unit 101 of the first embodiment or by the face detection/recognition unit 104 of the second embodiment may, as described above, detect only frames containing a frontal face image, a face without objects such as sunglasses or a mask, or a face with a neutral expression.
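The stricter criterion of (C-1) amounts to a predicate applied to each detected face before a frame counts as a face video frame. The sketch below assumes the detector reports head-pose angles, an occlusion flag, and an expression label; the class `DetectedFace`, its attributes, and the thresholds (64 px width, 15 degrees of yaw or pitch) are hypothetical illustrations rather than values from the original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedFace:
    width: int          # face box width in pixels
    yaw_deg: float      # left/right head rotation; 0 means frontal
    pitch_deg: float    # up/down head rotation; 0 means frontal
    occluded: bool      # e.g. sunglasses or a mask detected over the face
    expression: str     # e.g. "neutral", "smile"


def is_usable_face(face, min_width=64, max_yaw=15.0, max_pitch=15.0):
    """Accept only large, frontal, unoccluded faces with a neutral expression."""
    return (face.width >= min_width
            and abs(face.yaw_deg) <= max_yaw
            and abs(face.pitch_deg) <= max_pitch
            and not face.occluded
            and face.expression == "neutral")


def is_face_frame(faces):
    """A frame is treated as a face video frame if any face passes the filter."""
    return any(is_usable_face(f) for f in faces)
```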

(C-2) In each of the above embodiments, the face detection result data in the compressed data D10 and D11 includes time information for identifying face video frames. Alternatively, the compressed data D10 and D11 may use a data format in which face detection result data is linked to each individual frame, so that this linkage serves as the time information (that part of the face detection result data). As long as the face detection result data is linked frame by frame, the decoding units 200 and 200A can identify the face video frames.
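The alternative in (C-2) can be illustrated by storing one face detection entry per frame, in frame order, so that each entry's position identifies its frame. The sketch assumes a constant frame rate, so a frame's time can be derived as index / fps; the function name `face_frames` and the list layout are hypothetical.

```python
def face_frames(per_frame, fps):
    """Return (frame index, time in seconds) for every frame with a face.

    per_frame[i] is the face detection result linked to frame i: a list of
    face boxes, or None / [] when no face was detected. Because each entry is
    tied to its frame, the time follows from the frame position and no
    explicit time field needs to be stored (constant frame rate assumed).
    """
    return [(i, i / fps) for i, boxes in enumerate(per_frame) if boxes]
```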

(C-3) In the second embodiment, the face detection result data in the compressed data D11 output by the encoding unit 100A was described as including position information of the face area. However, because the decoding unit 200A of the second embodiment does not perform face recognition processing, the position information of the face area may be omitted from the face detection result data of the compressed data D11.

Reference signs: 1 ... camera system, 100 ... encoding unit, 101 ... face detection unit, 102 ... video compression unit, 103 ... data output unit, 200 ... decoding unit, 201 ... face detection result reading unit, 202 ... face video decoding unit, 203 ... face recognition unit, 203a ... search condition information, 204 ... result output unit, 300 ... camera.

Claims

A moving image processing system comprising an encoding unit and a decoding unit, wherein
the encoding unit includes:
face area processing means for detecting, in each frame of a moving image, a face area containing a person's face and generating face detection result data indicating the detection result;
encoding processing means for encoding the moving image to generate moving image encoded data; and
encoding-side output means for outputting data including the face detection result data and the moving image encoded data; and
the decoding unit includes:
data acquisition means for acquiring the data output by the encoding unit; and
decoding processing means for extracting and decoding some of the frames from the moving image encoded data, using at least the face detection result data, to obtain a decoded moving image.

The moving image processing system according to claim 1, wherein, when encoding a frame that contains a face area, the encoding processing means applies different encoding processing to the face area and to the non-face area other than the face area.

The moving image processing system according to claim 1 or 2, wherein the decoding processing means obtains the decoded moving image by extracting and decoding, based on the face detection result data, only the frames containing a face area from the moving image encoded data.

The moving image processing system according to claim 3, wherein the decoding processing means performs recognition processing on each face area of the frames containing a face area, extracts the frames containing a face area that matches a set search condition, and obtains, as the decoded moving image, a moving image composed only of the extracted frames.

The moving image processing system according to claim 4, wherein
when the face area processing means detects a frame containing a face area among the frames of the moving image, it acquires position information of the face area within that frame;
the face detection result data includes the position information of the face area within the frame containing the face area; and
the decoding processing means uses the position information in the face detection result data to extract the face area of the frame containing the face area and performs the recognition processing on it.

The moving image processing system according to claim 1 or 2, wherein
the face area processing means performs, on the face area of each frame in which a face area is detected, recognition processing whose content follows set recognition content, and generates face recognition result data indicating the result of the recognition processing;
the data output by the encoding-side output means further includes the face recognition result data; and
the decoding processing means extracts, based on the face detection result data and the face recognition result data, the frames containing a face area that matches a set search condition, and generates the decoded moving image by decoding only the extracted frames.

An encoding apparatus comprising:
face area processing means for detecting, in each frame of a moving image, a face area containing a person's face and generating face detection result data indicating the detection result;
encoding processing means for encoding the moving image to generate moving image encoded data; and
encoding-side output means for outputting data including the face detection result data and the moving image encoded data.

An encoding program causing a computer to function as:
face area processing means for detecting, in each frame of a moving image, a face area containing a person's face and generating face detection result data indicating the detection result;
encoding processing means for encoding the moving image to generate moving image encoded data; and
encoding-side output means for outputting data including the face detection result data and the moving image encoded data.

A decoding apparatus comprising:
data acquisition means for acquiring data including moving image encoded data obtained by encoding a moving image and face detection result data indicating the result of detecting, in the moving image, a face area containing a person's face; and
decoding processing means for extracting and decoding some of the frames from the moving image encoded data, using at least the face detection result data, to obtain a decoded moving image.

A decoding program causing a computer to function as:
data acquisition means for acquiring data including moving image encoded data obtained by encoding a moving image and face detection result data indicating the result of detecting, in the moving image, a face area containing a person's face; and
decoding processing means for extracting and decoding some of the frames from the moving image encoded data, using at least the face detection result data, to obtain a decoded moving image.