JP2009048347A

JP2009048347A - Image processing apparatus, method and program

Info

Publication number: JP2009048347A
Application number: JP2007212659A
Authority: JP
Inventors: Toshinori Nagahashi; 敏則長橋
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2007-08-17
Filing date: 2007-08-17
Publication date: 2009-03-05

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus capable of tracking a person in a moving image regardless of the presence of an obstacle or changes in the background.

SOLUTION: The image processing apparatus includes first detecting means (face detection/recognition module 62) for detecting, from a frame constituting a moving image, a face image showing the face of a person to be tracked; second detecting means (second tracking area detection module 66) for detecting, from the frame, an image showing at least part of the person's body; and tracking means (central control module 65) for tracking the person based on the face image when the face image is detected, and based on the image showing at least part of the body when the face image is not detected.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program.

Patent Document 1 discloses a technique for extracting a person's face or the like from an image. In this technique, two mutually exclusive virtual regions are set in an acquired image, and a partial image (a face image) is extracted based on the degree of separation between these virtual regions.
Patent Document 1: Japanese Patent Laid-Open No. 11-296659

However, when the technique disclosed in Patent Document 1 is used to track a person's face in a moving image, tracking fails if the face becomes hidden behind an occluding object, and the target person may be lost. In addition, if the background differs from scene to scene or changes over time, the degree of separation changes, and the face may not be tracked properly.

The present invention has been made in view of the above circumstances, and an object thereof is to provide an image processing apparatus, an image processing method, and an image processing program capable of tracking a person in a moving image regardless of the presence of an occluding object or changes in the background.

To achieve the above object, the present invention provides: first detection means for detecting, from a frame constituting a moving image, a face image representing the face of a person to be tracked; second detection means for detecting, from the frame, an image representing at least part of the person's body; and tracking means for tracking the person based on the face image when the face image is detected, and based on the image representing at least part of the body when the face image is not detected.
With this configuration, tracking is performed using the face when the face can be detected, and using part of the body, for example the torso, when the face cannot be detected. This makes it possible to track a person in a moving image regardless of the presence of an occluding object or changes in the background.
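The face-first, body-fallback rule of the tracking means can be sketched as a single per-frame step. This is a minimal illustration, not the patent's implementation: the detector callables, the `Box` tuple layout, and the function name `track_step` are hypothetical stand-ins for the first and second detection means.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]            # (x, y, width, height); hypothetical representation
Detector = Callable[[object], Optional[Box]]

def track_step(frame: object, detect_face: Detector,
               detect_body: Detector) -> Optional[Tuple[str, Box]]:
    """One tracking step: use the face if found, otherwise fall back to the body part."""
    face = detect_face(frame)              # stands in for the first detection means
    body = detect_body(frame)              # stands in for the second detection means
    if face is not None:
        return ("face", face)
    if body is not None:
        return ("body", body)
    return None                            # neither cue found in this frame
```

Running the body detector on every frame, not only when the face is missing, mirrors the embodiment, where the torso area is estimated in each frame so that it is ready as soon as the face becomes occluded.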

In a further aspect of the above invention, the first detection means detects the face image based on image features that characterize a face, and the second detection means detects the image representing at least part of the body based on the difference between a feature value of the image representing at least part of the body and a feature value of the image representing the background in the frame.
With this configuration, the face is detected from image features of the face such as the eyes, nose, and mouth, whereas at least part of the body is detected from its difference in feature values relative to the background. Because the target is tracked by two different detection methods, the likelihood of losing the target is reduced.

In a further aspect of the above invention, the second detection means calculates a plurality of different types of feature values for the image representing at least part of the body and for the image representing the background, and detects the image representing at least part of the body based on the feature type with the highest ability to discriminate between those two images.
With this configuration, when at least part of the body is detected, a plurality of feature values are calculated and the most discriminative one is selected and used. The target can therefore be tracked while minimizing the influence of a background that changes as the target moves.

In a further aspect of the above invention, the second detection means estimates a first region corresponding to the image representing at least part of the body, and treats as the most discriminative feature the one whose distribution over the first region overlaps least with its distribution over a second region, that is, the area other than the first region.
With this configuration, the feature whose distribution curves for the body part and for the background overlap least is judged to be the most discriminative. The most discriminative feature can therefore be determined accurately and quickly from the statistical properties of the features.
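The overlap criterion can be sketched with Python's `statistics.NormalDist`, whose `overlap()` method returns the area shared by two normal density curves. This follows the embodiment's remark that the distributions may be assumed normal. Assumptions not in the source: each feature type has already been reduced to one scalar per small region (a color histogram, for instance, would need some hypothetical scalar summary), and the names `most_discriminative_feature` and `sigma_floor` are illustrative.

```python
from statistics import NormalDist, mean, pstdev
from typing import Dict, List, Optional

def most_discriminative_feature(body: Dict[str, List[float]],
                                background: Dict[str, List[float]],
                                sigma_floor: float = 1e-6) -> Optional[str]:
    """Return the feature type whose body and background distributions overlap least.

    Each class is modelled as a normal curve fitted from its sample mean and
    population standard deviation. sigma_floor keeps NormalDist.overlap()
    defined when a feature happens to be constant over one class.
    """
    best_name, best_overlap = None, float("inf")
    for name, fg_values in body.items():
        bg_values = background[name]
        fg = NormalDist(mean(fg_values), max(pstdev(fg_values), sigma_floor))
        bg = NormalDist(mean(bg_values), max(pstdev(bg_values), sigma_floor))
        ovl = fg.overlap(bg)  # shared area under both density curves, in [0, 1]
        if ovl < best_overlap:
            best_name, best_overlap = name, ovl
    return best_name
```

For example, if torso and background have similar luminance but very different hue, the hue curves barely intersect and hue is returned.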

In a further aspect of the above invention, the second detection means estimates the first region either from a motion vector calculated from the image representing at least part of the body detected in a preceding frame (a frame earlier than the current frame in the time series), or from the position, size, or orientation of the face image detected in the current frame.
With this configuration, the second detection means estimates the region corresponding to at least part of the body from a motion vector based on the body part detected in an earlier frame, or from the position, size, and orientation of the face detected by the first detection means. Estimating from the motion vector, the face, or both allows at least part of the body to be located more accurately, which improves the detection accuracy of the second detection means.
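The face-based branch of this estimate can be sketched as placing a torso box directly below an upright face. The patent does not give the geometry, so the placement rule and the ratios below are hypothetical; they are chosen so that the face is one quarter of the torso area, the example ratio the detailed description gives for step S11. Face orientation is ignored for simplicity.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge (image y axis points down)
    w: float  # width
    h: float  # height

def estimate_torso_from_face(face: Box, width_ratio: float = 1.6,
                             height_ratio: float = 2.5) -> Box:
    """Place a torso box directly below an upright face, centred horizontally.

    The default ratios are hypothetical; they make the torso four times the face area.
    """
    cx = face.x + face.w / 2
    w, h = face.w * width_ratio, face.h * height_ratio
    return Box(cx - w / 2, face.y + face.h, w, h)
```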

In a further aspect of the above invention, the first detection means estimates, in the current frame, a region corresponding to the face image from a motion vector calculated from the face image or from the image representing at least part of the body detected in the preceding frame, and searches the estimated region first when detecting the face image.
With this configuration, the first detection means estimates where the face is from a motion vector based on the face or at least part of the body in the preceding frame, and performs detection preferentially in that estimated region. The face can therefore be found quickly, which reduces the load of face detection.
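"Searching the estimated region first" can be sketched as reordering the candidate windows of a sliding-window face detector so that those closest to the predicted face position are tested first. The sliding-window scheme, square windows, and the name `prioritized_windows` are assumptions for illustration.

```python
from typing import Iterator, Tuple

def prioritized_windows(frame_w: int, frame_h: int, win: int, step: int,
                        predicted: Tuple[float, float]) -> Iterator[Tuple[int, int]]:
    """Yield top-left corners of square search windows, nearest to the prediction first."""
    px, py = predicted
    corners = [(x, y)
               for y in range(0, frame_h - win + 1, step)
               for x in range(0, frame_w - win + 1, step)]

    def dist2(corner: Tuple[int, int]) -> float:
        # Squared distance from the window centre to the predicted face position.
        cx, cy = corner[0] + win / 2, corner[1] + win / 2
        return (cx - px) ** 2 + (cy - py) ** 2

    yield from sorted(corners, key=dist2)
```

Because every window is still visited eventually, a poor prediction only costs time; a good one lets a detector that stops at the first match finish after testing only a few windows.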

The present invention also provides a method that detects, from a frame constituting a moving image, a face image representing the face of a person to be tracked; detects, from the frame, an image representing at least part of the person's body; tracks the person based on the face image when the face image is detected; and tracks the person based on the image representing at least part of the body when the face image is not detected.
With this configuration, tracking is performed using the face when the face can be detected, and using part of the body, for example the torso, when the face cannot be detected. This makes it possible to track a person in a moving image regardless of the presence of an occluding object or changes in the background.

The present invention further provides a program that causes a computer to function as: first detection means for detecting, from a frame constituting a moving image, a face image representing the face of a person to be tracked; second detection means for detecting, from the frame, an image representing part of the person's body; and tracking means for tracking the person based on the face image when the face image is detected, and based on the image representing at least part of the body when the face image is not detected.
With this configuration, tracking is performed using the face when the face can be detected, and using part of the body, for example the torso, when the face cannot be detected. This makes it possible to track a person in a moving image regardless of the presence of an occluding object or changes in the background.

Embodiments of the present invention are described below with reference to the drawings. The description takes as an example the case where the image processing apparatus of the present invention is implemented as a printing apparatus. The image processing method and image processing program of the present invention are described as the operation of the printing apparatus and as a program that controls the printing apparatus.

(A) Configuration of the Embodiment
FIG. 1 shows the schematic configuration of a printing apparatus according to an embodiment of the present invention. As shown in FIG. 1, the printing apparatus 10 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, an EEPROM (Electrically Erasable and Programmable ROM) 13, a RAM (Random Access Memory) 14, an image processing unit 15, an I/F (Interface) 16, a bus 17, an LCD 18, operation buttons 19, a card I/F circuit 20, a card slot 21 into which a memory card M is inserted, a printer engine controller 22, a paper feed motor 23, a roller 24, a carriage motor 25, a drive belt 26, a carriage 27, and a recording head 28. In this example, a moving image playback device 40 is connected via a connection cable 41.

The CPU 11 executes various arithmetic operations according to a program 12a stored in the ROM 12, and controls each part of the apparatus, including the paper feed motor 23 and the carriage motor 25. The ROM 12 is a semiconductor memory that stores the program 12a executed by the CPU 11 and other data. The RAM 14 is a semiconductor memory that temporarily stores programs and data being processed by the CPU 11. The EEPROM 13 is a non-volatile semiconductor memory that stores predetermined data, such as results of arithmetic operations by the CPU 11, and retains this data even after the printing apparatus is powered off. The image processing unit 15 executes drawing processing based on drawing commands supplied from the CPU 11 and supplies the resulting image data to the LCD 18 for display. The I/F 16 converts the data representation format as needed when information is exchanged with the operation buttons 19, the card I/F circuit 20, the printer engine controller 22, and the moving image playback device 40. The bus 17 is a group of signal lines that interconnects the CPU 11, the ROM 12, the EEPROM 13, the RAM 14, the image processing unit 15, and the I/F 16, allowing them to exchange information.

The operation buttons 19 generate and output predetermined information in response to user operations. The memory card M is a non-volatile memory storing image data (still images) captured by, for example, a digital camera. The card slot 21 is provided in part of the housing of the printing apparatus 10, and the memory card M is inserted into it. The card I/F circuit 20 is an interface for writing information to and reading information from the memory card M. The printer engine controller 22 is a control unit that controls the paper feed motor 23, the carriage motor 25, and the recording head 28. The paper feed motor 23 rotates the roller 24 to move printing paper or roll paper in the sub-scanning direction (the direction orthogonal to the main scanning direction, in which the carriage 27 moves). The roller 24 is a cylindrical member that moves the printing paper or roll paper in the sub-scanning direction. The carriage motor 25 reciprocates the carriage 27 in the main scanning direction by driving the drive belt 26, one end of which is fixed to the carriage 27. The recording head 28 has a plurality of nozzles formed on its surface facing the printing paper, and records information on the paper by ejecting ink from these nozzles.

The moving image playback device 40 is, for example, a DVD (Digital Versatile Disk) player, a video player, or a video camera, and plays back and outputs a moving image composed of a plurality of frames. The connection cable 41 is, for example, a USB (Universal Serial Bus) cable or an IEEE (Institute of Electrical and Electronics Engineers) 1394 cable, and transmits signals conforming to the USB or IEEE 1394 standard from the moving image playback device 40 to the I/F 16. Instead of playing back a recorded moving image, a moving image captured in real time by, for example, a television camera may be input.

Next, with reference to FIG. 2, the functional blocks realized when the program 12a stored in the ROM 12 of FIG. 1 is executed, that is, through cooperation between the program 12a as software and the CPU 11 and other hardware, are described.
As shown in FIG. 2, the functional block group 60 realized by executing the program 12a mainly includes a moving image input module 61, a face detection/recognition module 62, a second tracking area estimation module 63, an image feature calculation module 64, a central control module 65, a second tracking area detection module 66, and a calculation result output module 67.
The moving image input module 61 receives the moving image output from the moving image playback device 40. The face detection/recognition module 62 (corresponding to the "first detection means" in the claims) detects the face of a predetermined person in each frame of the moving image as a first tracking area and, as needed, performs face recognition (determining whether faces belong to the same person). The second tracking area estimation module 63 (corresponding to the "second detection means" in the claims) treats a region other than the person's face (mainly the torso) as a second tracking area and estimates this area in each frame based on the motion vector of the second tracking area in an earlier frame, or on the position, size, and orientation of the face. The image feature calculation module 64 (corresponding to the "second detection means" in the claims) calculates and outputs several types of image features (described in detail later) for the estimated second tracking area and for the background. The central control module 65 (corresponding to the "tracking means" in the claims) is the core module of the processing and controls the other modules so that the sequence of processing is executed appropriately. The second tracking area detection module 66 (corresponding to the "second detection means" in the claims) detects the second tracking area based on the most discriminative of the image features calculated by the image feature calculation module 64. The calculation result output module 67 outputs data representing the result of the processing (the tracking result).

(B) Overview of the Operation of the Embodiment
Next, the operation of this embodiment is outlined. In this embodiment, a person in a moving image is tracked. More specifically, as shown in FIG. 3, a first tracking area 82, which is an area containing the face of a person 81 in a frame 80 of the moving image, is detected by face detection processing.
If the first tracking area 82 is successfully detected, a second tracking area 83, which is an area containing the torso, is estimated based on the size, position, orientation, and so on of the first tracking area 82 (the face). If detection of the first tracking area 82 fails, the second tracking area 83 in the current frame is estimated based on the motion vector of the second tracking area 83 detected in a temporally preceding frame. Specifically, as shown in FIG. 4, when the first tracking area 82 cannot be detected because of an occluding object (a ball 95 in this example), the position and extent of the second tracking area 83 are estimated from the motion vector of the second tracking area 83 detected in a temporally preceding frame.
When estimation of the second tracking area 83 is complete, the frame 80 is divided into a plurality of small regions 85, as shown in FIG. 3, and several types of feature values are calculated for each small region 85. The feature types include, for example, luminance, an RGB (Red Green Blue) color histogram, texture information, and spatial frequency components. Once these feature values have been calculated for every small region 85, the mean and variance of each feature are computed separately over the small regions 85 belonging to the estimated second tracking area 83 and over those belonging to the remaining area (the background 90).
Next, the discriminating power of each feature is determined from these means and variances. Here, discriminating power is the ability of a feature, when the small regions 85 are classified into the second tracking area 83 (the torso) and the background on the basis of that feature, to decide correctly to which class each small region 85 belongs. More specifically, as shown in FIG. 5, the feature type is selected for which the area of overlap (hatched in the figure) is smallest between the distribution curve 111 of the feature over all small regions 85 belonging to the second tracking area 83 and the distribution curve 110 of the feature over all small regions 85 belonging to the background 90. For example, if four feature types are used (luminance, color histogram, texture information, and spatial frequency components) and the color histogram has the smallest overlap area between its distribution curves, the color histogram is judged to have the highest discriminating power. The distribution curves may also be assumed to be normal distributions, in which case the overlapping portion can be calculated mathematically from the obtained means and variances. That is, it is not necessary to obtain exact distribution curves, nor to draw them to find the overlapping portion.
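The per-region statistics step can be sketched as follows for one feature type, mean luminance per small region. Assumptions not in the source: the frame is a grayscale grid of nested lists, small regions are square blocks of a hypothetical size `block` (partial blocks at the edges are dropped), and `in_torso` is a boolean mask over blocks derived from the estimated second tracking area 83. The function names are illustrative; both classes must contain at least one block.

```python
from statistics import mean, pvariance
from typing import List, Tuple

Grid = List[List[float]]  # grayscale luminance, row-major

def block_means(img: Grid, block: int) -> Grid:
    """Mean luminance of each block x block small region (partial edge blocks dropped)."""
    rows, cols = len(img) // block, len(img[0]) // block
    return [[mean([img[r * block + i][c * block + j]
                   for i in range(block) for j in range(block)])
             for c in range(cols)]
            for r in range(rows)]

def class_stats(values: Grid, in_torso: List[List[bool]]
                ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """(mean, variance) of a per-block feature over torso blocks and over background blocks."""
    pairs = [(v, m) for vrow, mrow in zip(values, in_torso) for v, m in zip(vrow, mrow)]
    torso = [v for v, m in pairs if m]
    background = [v for v, m in pairs if not m]
    return (mean(torso), pvariance(torso)), (mean(background), pvariance(background))
```

Other feature types (texture, spatial frequency, and so on) would be computed per block in the same way, giving one (mean, variance) pair per class and per feature for the overlap comparison.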

Next, a threshold is set based on the selected feature, and each small region 85 is classified as belonging either to the second tracking area 83 (the torso) or to the background. In the example of FIG. 5, a small region 85 whose feature value is smaller than the threshold Th is judged to belong to the background, and a small region 85 whose feature value is larger than Th is judged to belong to the second tracking area 83 (the torso). The small regions 85 belonging to the torso are thereby identified.
Next, if the first tracking area 82 has been detected by the face detection processing described above, coordinates representing the first tracking area 82 and its extent are output as the tracking result. If the first tracking area 82 has not been detected, coordinates representing the small regions 85 belonging to the second tracking area 83 (the torso) and their extent are output as the tracking result. As a result, even when the face is hidden by an occluding object, tracking continues using the torso, so the target is not lost.
In subsequent processing, if the first tracking area 82 (the face) was detected in the previous iteration, the area of the new frame in which the first tracking area 82 lies is estimated from the motion vector of the first tracking area 82, and face detection is performed preferentially within the estimated area. This shortens the time required for face detection. If the first tracking area 82 was not detected in the previous iteration, the area of the new frame in which the second tracking area 83 lies is estimated from the motion vector of the second tracking area 83; the area in which the first tracking area 82 lies is then estimated from the estimated second tracking area 83, and face detection is performed preferentially within that area. This prevents the first tracking area 82 from being lost and shortens the time required for face detection.
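The source does not say how Th is chosen from the selected feature. As one hypothetical rule, the sketch below places Th at the point lying the same number of standard deviations from both class means, then labels each small region. Which side counts as torso depends on whether the torso mean is above the background mean (above, in the FIG. 5 example); values exactly equal to Th are treated as background here, which the source also leaves open. Function names are illustrative.

```python
from typing import List

def choose_threshold(mu_torso: float, sd_torso: float,
                     mu_bg: float, sd_bg: float) -> float:
    """Point lying the same number of standard deviations from both class means."""
    if sd_torso + sd_bg == 0:
        return (mu_torso + mu_bg) / 2          # degenerate case: plain midpoint
    return (mu_bg * sd_torso + mu_torso * sd_bg) / (sd_torso + sd_bg)

def classify_blocks(values: List[float], th: float,
                    torso_is_above: bool = True) -> List[bool]:
    """True where a small region is judged to belong to the torso (second tracking area)."""
    return [(v > th) if torso_is_above else (v < th) for v in values]
```

In use, `torso_is_above` would be set to `mu_torso > mu_bg`, so the same code covers features for which the torso is darker or lighter than the background.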

(C) Detailed Description of the Operation of the Embodiment
Next, the operation of the embodiment of the present invention is described in detail. FIG. 6 is a flowchart of the processing executed by the functional blocks shown in FIG. 2. This processing tracks a person in a moving image. The person to be tracked may be specified, for example, by the user operating the operation buttons 19, or all persons in the moving image may be tracked.
When a moving image is input from the moving image playback device 40 via the connection cable 41 and the I/F 16, the moving image input module 61 extracts one frame from the input moving image and supplies it to the central control module 65. The central control module 65 then calculates a motion vector for the supplied frame (step S10). More specifically, if a face has been detected as the first tracking area 82 in both of the two immediately preceding frames, the motion vector V = (F2 - F1)/T is calculated from the face position coordinates F1 = (X1, Y1) and F2 = (X2, Y2) in those two frames and the inter-frame time T, and the face position coordinates F3 = (X3, Y3) in the current frame are estimated from the position coordinates F2 in the immediately preceding frame, the motion vector V, and the time T. If no face was detected in at least one of the two immediately preceding frames, a torso motion vector is calculated in the same way from the position coordinates of the torso (the second tracking area 83) in those two frames, and the torso position coordinates in the current frame are estimated from this motion vector.
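Step S10 amounts to a constant-velocity prediction. The text gives V = (F2 - F1)/T and says F3 is estimated from F2, V, and T; the sketch assumes the natural reading F3 = F2 + V·T, that is, linear extrapolation over one frame interval. The function name and the tuple representation of coordinates are hypothetical. The same function applies to torso coordinates when the face was missing.

```python
from typing import Tuple

Point = Tuple[float, float]

def predict_position(f1: Point, f2: Point, t: float) -> Point:
    """Constant-velocity prediction: V = (F2 - F1) / T, then F3 = F2 + V * T."""
    vx, vy = (f2[0] - f1[0]) / t, (f2[1] - f1[1]) / t
    return (f2[0] + vx * t, f2[1] + vy * t)
```

With a fixed frame interval, the result is simply 2·F2 - F1; keeping V explicit makes it easy to reuse the same velocity when the interval varies.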

When the motion vector of the face or torso has been calculated, the central control module 65 supplies the estimated position coordinates of the face or torso (first or second tracking area) to the face detection/recognition module 62 and instructs it to detect the face of the target person in the current frame. The face detection/recognition module 62 then performs face detection processing on the current frame (step S11). When the estimated face position coordinates are supplied, face detection is performed preferentially around those coordinates. When the estimated torso position coordinates are supplied, the face position coordinates are first estimated from them, and face detection is performed preferentially around the estimated face position. As one way to locate the face from the torso, when the torso is represented as a rectangular area, the face can be judged to lie near one of the two short sides of the rectangle, and the face size can be estimated from the area of the rectangle (for example, as one quarter of that area).
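The face-from-torso rule can be sketched in Python as below. The text does not say which short side the face is near; this sketch assumes an upright person (face just above the upper short side) and a square face. `Rect` and `face_window_from_torso` are hypothetical names.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    x: float  # left edge
    y: float  # top edge (image y axis points down)
    w: float
    h: float


def face_window_from_torso(torso: Rect, area_ratio: float = 0.25) -> Rect:
    """Guess where to search for the face, given the torso rectangle.

    Assumes an upright person, so the face sits just above the upper short side,
    and a square face whose area is area_ratio (1/4 in the text's example) of the torso.
    """
    side = math.sqrt(torso.w * torso.h * area_ratio)
    cx = torso.x + torso.w / 2.0
    return Rect(cx - side / 2.0, torso.y - side, side, side)


print(face_window_from_torso(Rect(100.0, 200.0, 40.0, 90.0)))
# Rect(x=105.0, y=170.0, w=30.0, h=30.0)
```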
As a concrete example of the face detection processing, an area containing a region highly correlated with the template shown in FIG. 7A (an image containing the eyes, nose, and mouth, which are facial feature points) is determined to contain a face. Because the size of a face in the image varies with the distance between the subject and the camera, and because several subjects may be present, faces are detected using a plurality of templates of different sizes as shown in FIG. 7A (the first to fifth templates in FIG. 7A); the processing may, for example, be repeated until face areas for 10 people have been found. In addition, when the template image has a high resolution, the matching becomes sensitive to the facial characteristics of individual persons and its accuracy drops; the template is therefore subjected to mosaic processing as shown in FIG. 7B, making it less susceptible to individual characteristics.
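The multi-size matching against a mosaicked template can be sketched with NumPy as follows. Normalized cross-correlation as the correlation measure, block averaging as the mosaic, and a fixed score threshold are all assumptions (the text names none of them); overlapping hits are not merged, and the turned and tilted template variants described next are not handled. All names are hypothetical.

```python
import numpy as np


def mosaic(img: np.ndarray, block: int) -> np.ndarray:
    """Replace each block x block tile by its mean value (cf. FIG. 7B)."""
    out = img.astype(float).copy()
    h, w = out.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = out[y:y + block, x:x + block].mean()
    return out


def ncc_map(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Normalized cross-correlation of the template at every valid offset."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    rows, cols = image.shape[0] - th + 1, image.shape[1] - tw + 1
    scores = np.full((rows, cols), -1.0)  # flat windows keep -1
    for y in range(rows):
        for x in range(cols):
            win = image[y:y + th, x:x + tw].astype(float)
            wz = win - win.mean()
            denom = np.sqrt((wz * wz).sum()) * t_norm
            if denom > 0:
                scores[y, x] = (wz * t).sum() / denom
    return scores


def detect_faces(image, templates, threshold=0.7, max_faces=10, block=4):
    """Return up to max_faces (score, x, y, w, h) hits over all template sizes."""
    hits = []
    for tpl in templates:  # templates of different sizes, as in FIG. 7A
        scores = ncc_map(image, mosaic(tpl, block))
        for y, x in zip(*np.nonzero(scores >= threshold)):
            hits.append((float(scores[y, x]), int(x), int(y), tpl.shape[1], tpl.shape[0]))
    hits.sort(reverse=True)  # best correlation first; no overlap suppression
    return hits[:max_faces]


tpl = np.kron(np.eye(2), np.ones((4, 4)))  # 8x8 toy pattern standing in for a face
img = np.zeros((20, 20))
img[5:13, 7:15] = tpl
print(detect_faces(img, [tpl])[0][:3])  # (1.0, 7, 5)
```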
In practice, the person may be facing up, down, left, or right rather than straight ahead, and the face may also be tilted to the right or left within the frame. The actual processing therefore uses a plurality of templates corresponding to faces turned up, down, left, and right and, to handle the tilt of the face, executes the detection while rotating these templates to the right or left in steps of a predetermined angle.
When a face (first tracking area 82) is detected in the frame, the face detection/recognition module 62 supplies the center coordinates and size of the face to the central control module 65. If no face can be detected, it notifies the central control module 65 accordingly.
When a specific person is to be tracked, the face detection/recognition module 62 extracts feature amounts from the detected face (for example, information indicating the sizes and relative arrangement of the eyes, nose, and mouth) and executes person identification processing (recognition processing).

When the face detection processing is completed, the central control module 65 instructs the second tracking area estimation module 63 to estimate the second tracking area 83 (torso). The second tracking area estimation module 63 then estimates the second tracking area 83 either from the face (first tracking area 82) detected in step S11 or from the motion vector of the second tracking area 83 obtained from the preceding frames (step S12). More specifically, when a face was detected in step S11, the second tracking area 83 is estimated from the size, position, and direction of the face. As an example, the second tracking area 83 is assumed to be a rectangle whose height and width are each twice those of the template used for face detection, and its position is estimated from the detected face position (for example, its center coordinates) and direction (the left/right tilt of the face). That is, the center of the second tracking area 83 is placed on the straight line along the tilt direction of the face, at a predetermined distance from the center of the face. This works because the torso lies at a fixed distance from the face, and its size is correlated with the size of the face.
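A sketch of this face-based estimate is shown below. The "predetermined distance" is modeled as a hypothetical `offset_ratio` times the template height, and the sign convention for the tilt angle is likewise an assumption; `Box` and `torso_from_face` are hypothetical names.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    cx: float  # center x
    cy: float  # center y (image y axis points down)
    w: float
    h: float
    angle_deg: float


def torso_from_face(face_cx: float, face_cy: float, tpl_w: float, tpl_h: float,
                    tilt_deg: float, offset_ratio: float = 1.5) -> Box:
    """Estimate the second tracking area from a detected face.

    Size: twice the width and height of the matching template (the text's example).
    Position: offset_ratio * tpl_h along the face's downward axis; the ratio and
    the sign convention for tilt_deg are assumptions, not values from the text.
    """
    d = offset_ratio * tpl_h
    rad = math.radians(tilt_deg)
    down_x, down_y = math.sin(rad), math.cos(rad)  # tilt 0 -> straight down
    return Box(face_cx + d * down_x, face_cy + d * down_y, 2 * tpl_w, 2 * tpl_h, tilt_deg)


print(torso_from_face(100.0, 50.0, 20.0, 24.0, 0.0))
# Box(cx=100.0, cy=86.0, w=40.0, h=48.0, angle_deg=0.0)
```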
If no face was detected, the second tracking area 83 in the current frame is estimated from the motion vector of the second tracking area 83 in the earlier frames. Specifically, a motion vector is obtained from the change in the center coordinates of the second tracking area 83 detected in the two immediately preceding frames, and the center coordinates of the second tracking area 83 in the current frame are estimated from that motion vector. The size of the second tracking area 83 in the current frame is estimated from its size in the immediately preceding frame. As for its orientation angle, a rotation vector (a vector indicating the direction and speed of rotation) may, for example, be calculated over the two immediately preceding frames and used to estimate the orientation angle of the second tracking area 83 in the current frame.
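The motion-based estimate can be sketched as follows, assuming constant linear and angular velocity and treating the "rotation vector" as a signed angular speed in degrees. `Box`, `wrap_deg`, and `predict_box` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    cx: float
    cy: float
    w: float
    h: float
    angle_deg: float


def wrap_deg(a: float) -> float:
    """Map an angle difference into [-180, 180) so 350 -> 10 counts as +20."""
    return (a + 180.0) % 360.0 - 180.0


def predict_box(prev2: Box, prev1: Box, dt: float) -> Box:
    """Estimate the second tracking area in the current frame from the two preceding frames.

    Center: constant-velocity extrapolation of the center motion vector.
    Size: carried over from the immediately preceding frame.
    Angle: extrapolated with the angular speed over the two frames (the "rotation vector").
    """
    vx = (prev1.cx - prev2.cx) / dt
    vy = (prev1.cy - prev2.cy) / dt
    omega = wrap_deg(prev1.angle_deg - prev2.angle_deg) / dt  # degrees per unit time
    return Box(prev1.cx + vx * dt, prev1.cy + vy * dt, prev1.w, prev1.h,
               (prev1.angle_deg + omega * dt) % 360.0)
```

Wrapping the angle difference avoids treating a small rotation across 0° as an almost full turn.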
In the above description, the second tracking area 83 is estimated from the face when a face is detected, and from the second tracking area 83 in the preceding frame when no face is detected. When a face is detected, however, the two approaches may also be combined: for example, the center coordinates of the second tracking area 83 may be estimated using the motion vector, while its size and direction are estimated from the size, position, and direction of the face.
When the second tracking area 83 has been estimated, the second tracking area estimation module 63 notifies the central control module 65 of the estimated center coordinates, size, and direction of the second tracking area 83.

On receiving the information on the estimated second tracking area 83, the central control module 65 supplies this information to the image feature amount calculation module 64 and instructs it to divide the image into small areas. The image feature amount calculation module 64 then divides the image of the current frame into a plurality of small areas 85 as shown in FIG. 3 (step S13). For example, the image can be divided so that each small area 85 measures 8 × 8 pixels, the same size as an MCU (Minimum Coded Unit).
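With NumPy, the division into 8 × 8 small areas can be done with a reshape and transpose rather than explicit loops, as sketched below. Cropping edge pixels that do not fill a whole block is an assumption of this sketch (a JPEG encoder pads instead), and `split_into_blocks` is a hypothetical name.

```python
import numpy as np


def split_into_blocks(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Split an (H, W, C) image into a (rows, cols, block, block, C) grid of small areas.

    Edge pixels that do not fill a whole block are cropped (an assumption).
    """
    h, w, c = img.shape
    rows, cols = h // block, w // block
    img = img[: rows * block, : cols * block]
    return img.reshape(rows, block, cols, block, c).transpose(0, 2, 1, 3, 4)


frame = np.arange(16 * 24 * 3).reshape(16, 24, 3)
blocks = split_into_blocks(frame)
print(blocks.shape)  # (2, 3, 8, 8, 3)
```

`blocks[r, c]` is then the small area in block row `r` and block column `c`.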

When the division into small areas 85 is completed, the central control module 65 instructs the image feature amount calculation module 64 to calculate a plurality of feature amounts for the small areas 85 belonging to the estimated second tracking area 83 and for those belonging to the background. The image feature amount calculation module 64 first calculates, for each small area 85 belonging to the second tracking area 83 (in FIG. 3, each small area 85 of which at least half lies within the estimated second tracking area 83), its luminance, a color histogram over R, G, and B, texture information, and spatial frequency components (step S14).
More specifically, the luminance is calculated by a well-known formula from the RGB image data of the pixels making up each small area 85. For the color histogram, a histogram showing the distribution of each of the R, G, and B values of the pixels in each small area 85 is calculated. For the texture information, for example, information indicating the light/dark pattern of the pixels in each small area 85 is calculated. For the spatial frequency components, the frequency components obtained by applying a Gabor transform or a Fourier transform to each small area 85 are calculated.
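The four feature types can be illustrated per small area as below. The text leaves the exact formulas open, so these are stand-ins: BT.601 luminance weights, an 8-bin per-channel histogram, a neighbor-difference texture measure, and the non-DC share of Fourier energy (a Gabor filter bank is the other option the text names). The half-area membership test from step S14 is included, assuming axis-aligned rectangles. All names are hypothetical.

```python
import numpy as np


def block_features(block_rgb: np.ndarray, bins: int = 8) -> dict:
    """Compute example versions of the four feature types for one RGB small area."""
    rgb = block_rgb.astype(float)
    # Luminance: ITU-R BT.601 weights, one common "well-known formula".
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Color histogram: per-channel counts over [0, 256), concatenated R, G, B.
    hist = np.concatenate(
        [np.histogram(rgb[..., ch], bins=bins, range=(0, 256))[0] for ch in range(3)]
    )
    # Texture stand-in: mean absolute luminance step to horizontal and vertical neighbors.
    texture = np.abs(np.diff(y, axis=1)).mean() + np.abs(np.diff(y, axis=0)).mean()
    # Spatial frequency: share of Fourier energy outside the DC term.
    power = np.abs(np.fft.fft2(y)) ** 2
    total = power.sum()
    high_freq = float((total - power[0, 0]) / total) if total > 0 else 0.0
    return {"luminance": float(y.mean()), "color_hist": hist,
            "texture": float(texture), "high_freq": high_freq}


def in_tracking_area(block_rect, area_rect) -> bool:
    """True if at least half of the small area lies inside the estimated area.

    Rectangles are (x0, y0, x1, y1).
    """
    bx0, by0, bx1, by1 = block_rect
    ax0, ay0, ax1, ay1 = area_rect
    ox = max(0, min(bx1, ax1) - max(bx0, ax0))
    oy = max(0, min(by1, ay1) - max(by0, ay0))
    return ox * oy >= 0.5 * (bx1 - bx0) * (by1 - by0)
```

Note that the histogram is a vector; comparing its distribution between areas would first require reducing it to a scalar, which the text does not specify.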
When all types of feature amounts have been calculated for all the small areas 85, the image feature amount calculation module 64 supplies the results to the central control module 65.

Based on the feature amounts calculated in step S14, the central control module 65 selects the feature amount with the highest discriminability (step S15). More specifically, for each feature amount, it obtains, based on Equation (1), a distribution curve for the feature amounts of all the small areas 85 belonging to the estimated second tracking area 83 and a distribution curve for the feature amounts of all the small areas 85 belonging to the other areas. Here μ is the mean of the feature amounts of all the small areas 85 belonging to each area, and σ denotes their variance.

Then, as shown in FIG. 5, the distribution curve 111 of the feature amount over all the small areas 85 belonging to the second tracking area 83 is compared with the distribution curve 110 over all the small areas 85 belonging to the background 90, and the feature amount for which the overlapping area of the curves 110 and 111 is smallest is judged to be the most discriminative. For example, if the color histogram gives the smallest overlap, the color histogram is selected as the feature amount with the highest discriminability. At this point, as shown in FIG. 5, the middle value of the overlapping region is also calculated as the threshold Th.
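Assuming Equation (1) is a normal density with mean μ and spread σ (σ is treated as a standard deviation in this sketch), step S15 can be illustrated as below. The numerical integration of the overlap and the reading of "middle value of the overlapping region" as the point between the two means where the curves cross are assumptions. `select_feature` returns the chosen feature with its overlap, Th, and whether the tracking area lies on the high side; all names are hypothetical, and each feature is assumed to be a scalar per small area.

```python
import numpy as np


def gaussian(x: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def overlap_and_threshold(fg, bg, n: int = 20001):
    """Overlap area of two fitted normal curves and a threshold Th between them."""
    mu_f, s_f = float(np.mean(fg)), float(np.std(fg)) + 1e-9  # avoid zero width
    mu_b, s_b = float(np.mean(bg)), float(np.std(bg)) + 1e-9
    lo = min(mu_f - 6 * s_f, mu_b - 6 * s_b)
    hi = max(mu_f + 6 * s_f, mu_b + 6 * s_b)
    x = np.linspace(lo, hi, n)
    f, b = gaussian(x, mu_f, s_f), gaussian(x, mu_b, s_b)
    overlap = float(np.minimum(f, b).sum() * (x[1] - x[0]))  # Riemann sum
    # Th: where the curves cross between the two means (assumed reading).
    a, c = sorted((mu_f, mu_b))
    between = (x >= a) & (x <= c)
    th = float(x[between][np.argmin(np.abs(f - b)[between])]) if between.any() else a
    return overlap, th, mu_f > mu_b


def select_feature(fg_feats: dict, bg_feats: dict):
    """Pick the feature whose tracking-area and background curves overlap least."""
    best = None
    for name in fg_feats:
        overlap, th, fg_above = overlap_and_threshold(fg_feats[name], bg_feats[name])
        if best is None or overlap < best[1]:
            best = (name, overlap, th, fg_above)
    return best
```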

When the most discriminative feature amount has been selected, the central control module 65 notifies the second tracking area detection module 66 of the selected feature amount and the threshold Th and instructs it to detect the second tracking area 83. The second tracking area detection module 66 then compares the selected feature amount of each small area 85 with the threshold Th to determine whether that small area belongs to the second tracking area 83 (torso), thereby detecting the second tracking area 83 (step S16). In the example shown in FIG. 5, a small area 85 whose feature amount is greater than or equal to Th is judged to belong to the second tracking area 83, and one whose feature amount is less than Th is judged not to belong. This judgment is performed for every small area 85. As a result, the hatched area in FIG. 8, for example, is detected as the second tracking area 83. Information on the second tracking area 83 detected in this way (for example, the positions of the small areas 85 belonging to it) is reported to the central control module 65.
Instead of judging all the small areas 85, the judgment may, for example, be limited to the small areas 85 belonging to the second tracking area 83 estimated in step S12 and the small areas 85 around them.
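Step S16, including the optional restriction to the estimated area and its neighbors, can be sketched as below. `fg_above` corresponds to FIG. 5, where the tracking area lies above Th; the one-block neighborhood radius is an assumption, and the names are hypothetical.

```python
import numpy as np


def classify_blocks(feature_map, th: float, fg_above: bool = True, candidates=None) -> np.ndarray:
    """Mark small areas whose selected feature falls on the tracking-area side of Th."""
    fm = np.asarray(feature_map, dtype=float)
    inside = fm >= th if fg_above else fm < th
    if candidates is not None:  # optional restriction to a candidate region
        inside &= candidates
    return inside


def grow(mask: np.ndarray, r: int = 1) -> np.ndarray:
    """Expand a block mask by r blocks in every direction (the surrounding areas)."""
    rows, cols = mask.shape
    padded = np.pad(mask, r, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + rows, dx:dx + cols]
    return out
```

Passing `grow(estimated_mask)` as `candidates` limits the judgment to the area estimated in step S12 plus one ring of neighboring small areas.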

Next, the central control module 65 determines whether the first tracking area 82 was successfully detected in step S11 (step S17). If the first tracking area 82 (face) was successfully detected in step S11 (Yes in step S17), the process proceeds to step S18; otherwise (No in step S17), it proceeds to step S19.

If the first tracking area 82 was detected successfully, the process proceeds to step S18, where the central control module 65 outputs, via the calculation result output module 67, position coordinates representing the face as the first tracking area 82 (for example, the center coordinates of the face) and information indicating its extent (for example, a rectangle enclosing the face). As a result, as shown in FIG. 9, a rectangle 120 surrounding the face, which is the first tracking area 82, is displayed on the LCD 18 together with the image, for example.

If the first tracking area 82 could not be detected, the process proceeds to step S19, where the central control module 65 outputs, via the calculation result output module 67, position coordinates representing the torso as the second tracking area 83 (for example, the coordinates of the center of gravity of all the hatched small areas 85 in FIG. 8) and information indicating its extent (for example, a rectangle enclosing all the hatched small areas 85 in FIG. 8). As a result, as shown in FIG. 10, a rectangle 121 surrounding the torso, which is the second tracking area 83, is displayed on the LCD 18 together with the image, for example.
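Steps S17 to S19 can be sketched as follows: the face rectangle is reported when available, otherwise the centroid and bounding rectangle of the torso's small areas. The centroid is taken over block centers with equal weights, and the "lost" result when neither area is found is an addition of this sketch, not part of the described flow; the names are hypothetical.

```python
import numpy as np


def torso_summary(mask: np.ndarray, block: int = 8):
    """Centroid and bounding rectangle (x0, y0, x1, y1) of the small areas in mask."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    cx = float((cols * block + block / 2).mean())  # block centers, equal weights
    cy = float((rows * block + block / 2).mean())
    rect = (int(cols.min()) * block, int(rows.min()) * block,
            (int(cols.max()) + 1) * block, (int(rows.max()) + 1) * block)
    return (cx, cy), rect


def tracking_output(face, torso_mask: np.ndarray, block: int = 8):
    """Report the face if found (step S18), else the torso (step S19).

    face is ((cx, cy), (w, h)) or None.
    """
    if face is not None:
        (cx, cy), (w, h) = face
        return "face", (cx, cy), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    summary = torso_summary(torso_mask, block)
    if summary is None:
        return "lost", None, None
    return ("torso",) + summary
```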

By referring to the image displayed on the LCD 18 in this way together with the rectangles 120 and 121, the same person can be tracked. Tracking is not interrupted even when the face is hidden by an obstruction. Therefore, as shown in FIG. 4, tracking continues even when, in a soccer match for example, the ball is near a player's face (as when the player heads the ball). A decisive moment of a specific player (for example, the moment of a header shot) can thus be tracked without being missed, and that moment can be printed on printing paper.

Next, the central control module 65 determines whether to end the processing (step S20). If the processing is not to be ended (No in step S20), the process returns to step S10 and the same processing is repeated; otherwise (Yes in step S20), the processing ends.

In the processing shown in FIG. 6, the second tracking area is detected even when the first tracking area (face) has been detected. The reason is that, if the second tracking area were detected only once the first tracking area could no longer be detected, no information on the second tracking area would be available from earlier frames and its detection accuracy would suffer; the second tracking area is therefore always detected. As a result, even if the first tracking area cannot be detected in subsequent frames, tracking can continue without interruption based on the second tracking area detected so far. Moreover, because the processing shown in FIG. 6 is executed, for example, frame by frame, the person can be tracked continuously across frames. Although the examples of FIGS. 3 and 4 involve a single person, when a plurality of persons are present, each person, or a designated specific person, becomes a tracking target. This makes it possible to track several persons in parallel or to select and track a specific person from among several persons.

In the above embodiment, tracking is performed based on two areas, the first tracking area 82 and the second tracking area 83. Therefore, even when the first tracking area 82 cannot be detected because of an obstruction or the like, tracking can continue based on the second tracking area 83 without losing the target.

Also, in the above embodiment, the first tracking area 82 is detected based on the image characteristics of a face (eyes, nose, mouth, and so on), while the second tracking area 83 is detected based on its difference in feature amount from the background. Because detection is carried out by two different methods, the likelihood of losing the target is reduced.

Also, in the above embodiment, when the second tracking area 83 is detected, a plurality of feature amounts are calculated, and the one with the highest discriminability among them is used to detect the second tracking area 83. Therefore, when the target moves and the background changes, the optimal (most discriminative) feature amount is selected for each background, so the target can be tracked without being lost even as the background changes.

Also, in the above embodiment, a motion vector is calculated based on the first or second tracking area 82, 83, the position of the face is estimated from this motion vector, and face detection is performed preferentially in the estimated region. This shortens the time needed to detect the face.

Also, in the above embodiment, when the face is detected, the second tracking area 83 is estimated from the first tracking area 82; when the face cannot be detected, a motion vector is calculated from the second tracking area 83 in the preceding frames, and the second tracking area 83 in the current frame is estimated from that motion vector. As a result, the second tracking area 83 can be detected accurately from the position of the face, and even when the face cannot be detected, the second tracking area 83 of the current frame can be detected accurately based on the second tracking area 83 in the immediately preceding frame.

Also, in the above embodiment, when the first tracking area 82 is detected, the rectangle 120 surrounding the face area is output as shown in FIG. 9, and when it cannot be detected, the rectangle 121 surrounding the torso is output as shown in FIG. 10. This lets the user know the tracking state precisely: output of the rectangle 120 around the face indicates that tracking is stable, while output of the rectangle 121 around the torso indicates that tracking is unstable.

(D) Modified Embodiments
The embodiment described above merely illustrates one aspect of the present invention, and it can of course be modified and applied in any way within the scope of the present invention.
For example, although the above embodiment describes the image processing apparatus of the present invention applied to a printer, the invention can be applied to many other devices. Specifically, it can be applied to a security device linked with a surveillance camera. For example, the tracking process described above may be executed on a moving image input from a camera installed near an ATM (Automated Teller Machine). If a state in which only the second tracking area 83 can be detected continues for a predetermined time or longer (for example, when an ATM user keeps looking down for a certain time, or is peering under the machine), it may be determined that some trouble has occurred or that a crime is being committed, and a warning or the like may be issued.
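The ATM check can be illustrated with a small timer driven by the per-frame tracking result. Timestamps in seconds, resetting the timer when the face reappears or the target is lost, and the class name are assumptions of this sketch.

```python
from typing import Optional


class TorsoOnlyAlarm:
    """Raise an alert when only the torso has been tracked for limit_s seconds or more."""

    def __init__(self, limit_s: float) -> None:
        self.limit_s = limit_s
        self._since: Optional[float] = None  # time the torso-only state began

    def update(self, t: float, face_found: bool, torso_found: bool) -> bool:
        if torso_found and not face_found:
            if self._since is None:
                self._since = t
            return t - self._since >= self.limit_s
        self._since = None  # face visible again (or target lost): reset
        return False


alarm = TorsoOnlyAlarm(3.0)
for t, face, torso in [(0.0, True, True), (1.0, False, True), (4.0, False, True)]:
    print(t, alarm.update(t, face, torso))
```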

A camera capable of freely moving its imaging area may also be combined with the image processing apparatus of the present invention so that the imaging range moves automatically with the subject, allowing the target to be followed over a wide range. More specifically, the target can be followed by panning and tilting the camera so that the detected first and second tracking areas 82, 83 always stay at the center of the imaging area, and by performing autofocus. In such a case, the present invention can continue following the target based on its torso even when the target is hidden behind an object. Further, even when the target moves and the background changes, the second tracking area 83 is detected based on the optimal feature amount, which prevents the target from being missed.
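A proportional controller is one simple way to keep the tracked area centered. The linear pixel-to-angle mapping across the field of view, the gain, and the sign conventions (positive pan to the right, positive tilt downward) are assumptions of this hypothetical `pan_tilt_step`; a real lens needs a proper projection model.

```python
def pan_tilt_step(target_xy, frame_wh, hfov_deg: float, vfov_deg: float, gain: float = 0.5):
    """Proportional pan/tilt correction (degrees) moving the target toward the frame center."""
    (x, y), (w, h) = target_xy, frame_wh
    ex = (x - w / 2) / w  # normalized horizontal offset, -0.5 to 0.5
    ey = (y - h / 2) / h  # normalized vertical offset, -0.5 to 0.5
    return gain * ex * hfov_deg, gain * ey * vfov_deg


# Target at the right edge of a 1920x1080 frame with a 60-degree horizontal field of view.
print(pan_tilt_step((1920, 540), (1920, 1080), 60.0, 34.0))  # (15.0, 0.0)
```

Feeding in the face center when the face is found and the torso center otherwise would mirror the fallback described above.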

Besides outputting the rectangles 120 and 121 indicating the first or second tracking area 82, 83, the frame best suited for printing may be selected based on the pixels inside these areas, or correction processing may be applied based on the state of those pixels. Specifically, in the former case, whether focus and exposure are appropriate can be judged from the second tracking area 83, which is large and often of a single color, while the color tone can be judged from the first tracking area 82, whose color tone is easy to assess; frames for which both are appropriate can then be listed as print candidates. In the latter case, focus and exposure can be corrected based on the first tracking area 82, and the color tone can be corrected based on the second tracking area 83.

In the above embodiment, luminance, a color histogram, texture information, and spatial frequency components are used as feature amounts, but other information may be used instead, or a subset of a set of feature amounts that also includes other information may be used.

Also, in the above embodiment, the optimal feature amount is selected from the plurality of feature amounts and used. Alternatively, for example, at least some of the feature amounts may be used together, each weighted according to its discriminating power, and the judgment in step S16 may be made based on the resulting judgment value.
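The weighted variant can be sketched as follows. Each feature is turned into a signed, scaled distance from its own threshold, weighted by (1 - overlap) as a stand-in for "discriminating power", and the weighted mean is compared with zero. The weighting rule, the per-feature `scale`, and the function name are assumptions, not details from the text.

```python
import numpy as np


def weighted_decision(feature_maps: dict, params: dict) -> np.ndarray:
    """Combine several features per small area instead of picking just one.

    feature_maps: name -> (rows, cols) array of that feature per small area.
    params: name -> (th, scale, fg_above, overlap).
    """
    if not feature_maps:
        raise ValueError("at least one feature map is required")
    total, weight_sum = None, 0.0
    for name, fmap in feature_maps.items():
        th, scale, fg_above, overlap = params[name]
        w = max(0.0, 1.0 - overlap)  # less overlap -> more weight (assumed rule)
        s = (np.asarray(fmap, dtype=float) - th) / scale
        if not fg_above:
            s = -s  # make "tracking-area side" positive for every feature
        total = w * s if total is None else total + w * s
        weight_sum += w
    if weight_sum == 0.0:
        return np.zeros_like(total, dtype=bool)
    return total / weight_sum >= 0.0
```

Unlike picking a single feature, this lets a weaker feature overrule only when the stronger ones are ambiguous for a given small area.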

Also, in the above embodiment, the torso is used as the second tracking area 83, but the whole body including the hands and feet may be used instead, for example. Alternatively, a characteristic part of the clothing (for example, a T-shirt) may be used as the second tracking area 83 instead of the torso.

Also, in the above embodiment, a person is the detection target, but tracking may be performed on an animal, for example. In that case, the animal's face can be used as the first tracking area and the animal's torso as the second tracking area.

The processing functions described above can be implemented by a computer. In that case, a program describing the processing content of the functions that the image processing apparatus should have is provided, and executing that program on a computer implements the processing functions on the computer.
The program describing the processing content can be recorded on a computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical discs, magneto-optical recording media, and semiconductor memories. Magnetic recording devices include hard disk drives (HDD), flexible disks (FD), and magnetic tape. Optical discs include DVD (Digital Versatile Disk), DVD-RAM, CD-ROM (Compact Disk ROM), and CD-R (Recordable)/RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk).

To distribute the program, portable recording media on which the program is recorded, such as DVDs or CD-ROMs, may be sold, for example. The program can also be stored in a storage device of a server computer and transferred from the server computer to other computers over a network.

A computer that executes the program stores in its own storage device, for example, the program recorded on a portable recording medium or the program transferred from the server computer. The computer then reads the program from its own storage device and executes processing according to it. The computer can also read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time a program is transferred from the server computer, the computer can execute processing according to the received program as it arrives.

FIG. 1 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the functions realized when the program is executed.
FIG. 3 is a diagram showing the first and second tracking regions.
FIG. 4 is a diagram showing the second tracking region.
FIG. 5 is a diagram showing distribution curves of feature amounts.
FIG. 6 is a flowchart of the processing executed by the blocks shown in FIG. 2.
FIG. 7 shows an example of a template used for face detection.
FIG. 8 is a diagram showing small regions belonging to the detected second tracking region.
FIG. 9 is a diagram showing the detected first tracking region.
FIG. 10 is a diagram showing the detected second tracking region.

Explanation of Symbols

10 ... printing apparatus
62 ... face detection/recognition module (first detection means)
63 ... second tracking region estimation module (second detection means)
64 ... image feature amount calculation module (second detection means)
65 ... central control module (tracking means)
66 ... second tracking region detection module (second detection means)

Claims

1. An image processing apparatus comprising:
first detection means for detecting, from a frame constituting a moving image, a face image representing the face of a person to be tracked;
second detection means for detecting, from the frame, an image representing at least a part of the person's body; and
tracking means for tracking the person based on the face image when the face image is detected, and based on the image representing at least a part of the body when the face image is not detected.
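
The selection rule in claim 1 reduces to a simple fallback. The sketch below assumes that both detectors run elsewhere and each returns a bounding box or None. The Box type, the TrackResult class, and the "lost" state for frames in which neither is found are hypothetical additions that the claim does not describe.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class TrackResult:
    box: Optional[Box]
    source: str  # "face", "body", or "lost" (the "lost" state is an assumption)


def track_frame(face_box: Optional[Box], body_box: Optional[Box]) -> TrackResult:
    """Track by the face when it is found; otherwise fall back to the body."""
    if face_box is not None:
        return TrackResult(face_box, "face")
    if body_box is not None:
        return TrackResult(body_box, "body")
    return TrackResult(None, "lost")


# Frame 1: face visible. Frame 2: face hidden behind an obstacle, body visible.
frames = [((100, 40, 30, 30), (90, 70, 50, 90)), (None, (95, 72, 50, 90))]
for face, body in frames:
    print(track_frame(face, body))
```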

2. The image processing apparatus according to claim 1, wherein
the first detection means detects the face image based on features of an image representing a face, and
the second detection means detects the image representing at least a part of the body based on a difference between a feature amount of the image representing at least a part of the body and a feature amount of an image representing the background in the frame.

3. The image processing apparatus according to claim 2, wherein the second detection means considers a plurality of different types of feature amounts of the image representing at least a part of the body and of the image representing the background. It detects the image representing at least a part of the body based on whichever of these feature amounts best discriminates between the image representing at least a part of the body and the image representing the background.

4. The image processing apparatus according to claim 3, wherein the second detection means estimates a first region corresponding to the image representing at least a part of the body. It then takes, as the feature amount with the highest discriminating ability, the feature amount whose distribution in the first region overlaps least with its distribution in a second region other than the first region.
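
The selection criterion in claims 3 and 4 can be illustrated with histograms. This sketch assumes that every feature amount is normalized to [0, 1] and measures overlap by histogram intersection. Histogram intersection is one common choice; the claim only requires the "least overlap", so another overlap measure would serve equally well. The feature names and sample values are made up for illustration.

```python
from typing import Dict, List, Sequence


def normalized_histogram(values: Sequence[float], bins: int,
                         lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Histogram of values in [lo, hi], scaled so the bins sum to 1."""
    if not values:
        raise ValueError("need at least one sample")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so values on or just outside the range edges land in an end bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def overlap(p: Sequence[float], q: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))


def most_discriminative_feature(first: Dict[str, Sequence[float]],
                                second: Dict[str, Sequence[float]],
                                bins: int = 16) -> str:
    """Return the feature whose first- and second-region distributions overlap least."""
    scores = {
        name: overlap(normalized_histogram(first[name], bins),
                      normalized_histogram(second[name], bins))
        for name in first
    }
    return min(scores, key=scores.get)


# Hypothetical samples: hue separates body from background, brightness does not.
body = {"hue": [0.10, 0.12, 0.11, 0.13], "brightness": [0.50, 0.52, 0.51, 0.49]}
background = {"hue": [0.80, 0.82, 0.85, 0.90], "brightness": [0.50, 0.51, 0.52, 0.50]}
print(most_discriminative_feature(body, background))  # hue
```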

5. The image processing apparatus according to claim 4, wherein the second detection means estimates the first region in one of two ways. It may use a motion vector calculated from the image representing at least a part of the body detected in a previous frame, which precedes the frame in time series. Alternatively, it may use the position, size, or orientation of the face image detected in the frame.
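
The two estimation routes in claim 5 could look roughly like the sketch below. It makes several assumptions. The motion vector is taken as the displacement between the body boxes of the two most recent frames, which amounts to constant-velocity extrapolation. The body is placed directly below the face, with hypothetical scale factors of 2x the face width and 3x the face height. Face orientation is not modeled. Preferring the face route whenever a face is available is also an assumed policy, since the claim permits either route.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


def predict_by_motion(prev2: Box, prev1: Box) -> Box:
    """Shift the latest body box by its last frame-to-frame displacement."""
    dx, dy = prev1[0] - prev2[0], prev1[1] - prev2[1]
    x, y, w, h = prev1
    return (x + dx, y + dy, w, h)


def predict_from_face(face: Box, width_scale: float = 2.0,
                      height_scale: float = 3.0) -> Box:
    """Place a body box directly below the face, scaled from the face size.

    The scale factors are hypothetical; face orientation is not modeled here.
    """
    fx, fy, fw, fh = face
    w, h = fw * width_scale, fh * height_scale
    center_x = fx + fw / 2
    return (center_x - w / 2, fy + fh, w, h)


def estimate_first_region(face: Optional[Box], prev2: Optional[Box],
                          prev1: Optional[Box]) -> Optional[Box]:
    """Prefer the face-based estimate (assumed policy), else motion, else the last box."""
    if face is not None:
        return predict_from_face(face)
    if prev2 is not None and prev1 is not None:
        return predict_by_motion(prev2, prev1)
    return prev1


print(estimate_first_region(None, (0, 0, 10, 20), (5, 2, 10, 20)))  # (10, 4, 10, 20)
print(estimate_first_region((40, 10, 20, 20), None, None))           # (30.0, 30, 40.0, 60.0)
```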

6. The image processing apparatus according to any one of claims 1 to 5, wherein the first detection means estimates a region in the frame corresponding to the face image. The estimate is based on a motion vector calculated from the face image, or from the image representing at least a part of the body, detected in a previous frame that precedes the frame in time series. The first detection means then detects the face image by searching the estimated region first.
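
One way to realize the prioritized search in claim 6 is to extrapolate the next position from the last motion vector. Candidate windows are then tried in order of their distance from that prediction, so a hit near the estimate ends the search early. The grid scan, the window classifier passed in as `is_face`, and all function names are hypothetical illustrations, not the patent's face detector.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]  # top-left corner of a candidate detection window


def predict_position(prev2: Point, prev1: Point) -> Point:
    """Extrapolate the next position from the last motion vector (prev1 - prev2)."""
    return (2 * prev1[0] - prev2[0], 2 * prev1[1] - prev2[1])


def search_order(predicted: Point, width: int, height: int, step: int) -> List[Point]:
    """All window positions on a grid, nearest to the predicted position first."""
    px, py = predicted
    candidates = [(x, y) for y in range(0, height, step) for x in range(0, width, step)]
    candidates.sort(key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
    return candidates


def detect_face_prioritized(is_face: Callable[[Point], bool], predicted: Point,
                            width: int, height: int, step: int) -> Optional[Point]:
    """Run a (hypothetical) window classifier, trying the estimated area first."""
    for pos in search_order(predicted, width, height, step):
        if is_face(pos):
            return pos
    return None


faces = {(0, 0), (30, 20)}  # made-up positions where the classifier fires
calls: List[Point] = []


def fake_classifier(pos: Point) -> bool:
    calls.append(pos)  # record how many windows were examined
    return pos in faces


predicted = predict_position((10, 20), (20, 20))  # moving right by 10 px per frame
result = detect_face_prioritized(fake_classifier, predicted, 40, 40, 10)
print(result, len(calls))  # (30, 20) 1
```

Only one window is examined here because the prediction lands on the face. A plain raster scan would have reached (0, 0) first instead.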

7. An image processing method comprising:
detecting, from a frame constituting a moving image, a face image representing the face of a person to be tracked;
detecting, from the frame, an image representing at least a part of the person's body; and
tracking the person based on the face image when the face image is detected, and based on the image representing at least a part of the body when the face image is not detected.

8. An image processing program causing a computer to function as:
first detection means for detecting, from a frame constituting a moving image, a face image representing the face of a person to be tracked;
second detection means for detecting, from the frame, an image representing a part of the person's body; and
tracking means for tracking the person based on the face image when the face image is detected, and based on the image representing at least a part of the body when the face image is not detected.