JP2010193406A - Image input apparatus, subject detecting method, and program

Info

Publication number: JP2010193406A
Application number: JP2009038485A
Authority: JP
Inventors: Masao Okada; 正雄岡田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-02-20
Filing date: 2009-02-20
Publication date: 2010-09-02
Anticipated expiration: 2029-02-20
Also published as: JP5253227B2

Abstract

PROBLEM TO BE SOLVED: To achieve highly accurate face tracking by reducing erroneous tracking.

SOLUTION: An image input apparatus has image input means for inputting images, subject detection means for detecting a specific subject from a single input image, tracking means for estimating the position of the specific subject from continuously input images, and decision means for deciding the tracking template used by the tracking means. When the subject detection means detects the specific subject in one of the continuously input images, and, in one of the images input after that image, the subject detection means fails to detect the specific subject while the tracking means estimates its position, the decision means selects either the image detected by the subject detection means or the image whose position was estimated by the tracking means and sets the selected image as the tracking template.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for detecting and tracking a person's face in a captured image.

Conventionally, tracking devices have been proposed that detect a person's face by extracting the shapes of both eyes and the mouth from an image captured by a camera, and then track that face. A conventional tracking device tracks the movement of a person's face by detecting the face continuously over a plurality of frames. However, when the face turns sideways or backward, or is hidden behind an obstacle, the shapes of the eyes and mouth cannot be extracted, so face detection becomes impossible and tracking of the person fails.

To avoid this, there is a technique that estimates rough candidates for the current face position based on the position of the most recently detected face and acquires surrounding information for each of those position candidates. The position candidate whose surrounding information is most similar to the surrounding information associated with the most recently detected face position is then determined to be the current face position (see Patent Document 1). With this technique, the current face position can be estimated without detecting the face in the image. Tracking can therefore continue without losing the target face even when the face cannot be detected, for example when the target face is hidden behind an object or turns backward.

Patent Document 1: Japanese Patent Laid-Open No. 2007-042072 (JP 2007-042072 A)

In Patent Document 1, when the face can no longer be detected because of an obstacle or the orientation of the face, the current face position is estimated by using the face image detected in the immediately preceding image as a face tracking template and performing frame correlation with the current image. The template for face tracking is set as follows: when a face was detected in the immediately preceding image, that face image is used as the face tracking template; when no face was detected in the immediately preceding image, the face position estimated by frame correlation is set as the tracking image.

When face detection becomes impossible because of an obstacle, processing switches to face tracking based on frame correlation. However, with a method that successively uses the immediately preceding tracking image as the face tracking template, the tracker tends to follow the obstacle, which easily results in erroneous tracking. Conversely, if the last detected face image is fixed as the face tracking template, the tracker is less likely to follow the obstacle, but tracking performance drops: when the face turns sideways or backward, the difference from the template becomes large and face tracking is no longer possible.

The present invention has been made in view of the above problems, and reduces erroneous tracking and achieves highly accurate tracking performance.

To solve the above problems, an image input apparatus of the present invention comprises: image input means for inputting images; subject detection means for detecting a specific subject from a single input image; tracking means for estimating the position of the specific subject from continuously input images; and decision means for deciding a tracking template used for tracking by the tracking means. When the specific subject is detected by the subject detection means in one of the continuously input images, and, in one of the images input after that image, the specific subject is not detected by the subject detection means but its position is estimated by the tracking means, the decision means selects either the image detected by the subject detection means or the image whose position was estimated by the tracking means, and sets it as the tracking template.

A subject detection method of the present invention is a subject detection method in an image input apparatus that inputs images, and comprises: a subject detection step of detecting a specific subject from a single input image; a tracking step of estimating the position of the specific subject from continuously input images; and a decision step of deciding a tracking template used for tracking in the tracking step. When the specific subject is detected in the subject detection step in one of the continuously input images, and, in one of the images input after that image, the specific subject is not detected in the subject detection step but its position is estimated in the tracking step, the decision step selects either the image detected in the subject detection step or the image whose position was estimated in the tracking step, and sets it as the tracking template.

According to the present invention, erroneous tracking can be reduced and highly accurate tracking performance can be achieved.

FIG. 1 is a block diagram showing a schematic configuration of an image input apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing from detection of a face to storage of face tracking template information in the face tracking template storage unit.
FIG. 3 is a diagram showing processing for calculating a feature amount from a face tracking template.
FIG. 4 is a diagram illustrating regions to which the face is estimated to move.
FIG. 5 is a diagram showing an example in which face detection becomes impossible because the face is blocked by an obstacle and processing switches to face tracking.
FIG. 6 is a diagram showing an example in which face detection becomes impossible because the face is blocked by an obstacle and processing switches to face tracking.
FIG. 7 is a diagram showing an example in which face detection becomes impossible because the face orientation has changed and processing switches to face tracking.
FIG. 8 is a diagram showing an example in which face detection becomes impossible because the face orientation has changed and processing switches to face tracking.

The best mode for carrying out the present invention is described in detail below with reference to the accompanying drawings. The embodiment described below is one example of realizing the present invention and should be modified or changed as appropriate according to the configuration of the apparatus to which the invention is applied and various conditions; the present invention is not limited to the following embodiment.

FIG. 1 is a block diagram showing a schematic configuration of an image input apparatus according to an embodiment of the present invention. In FIG. 1, the image input apparatus 1 includes an image input unit 10 that continuously inputs a plurality of images captured by a camera or the like. The image input apparatus 1 also includes a face detection unit 11, which detects a person's face and serves as a subject detection unit that detects a specific subject from an input image, and a face direction detection unit 12 that detects the orientation of the face detected by the face detection unit 11. The image input apparatus 1 further includes a face tracking template storage unit 13 that stores template information for face tracking, and an image feature information storage unit 14 that stores image feature information for each face position candidate in the image targeted for face tracking. In addition, the image input apparatus 1 includes a position estimation unit 15 that estimates the face position by comparing the image feature information of the face tracking template with that of each face position candidate in the tracking target image, and a position output unit 16 that outputs to the outside the face position obtained by the face detection unit 11 or the position estimation unit 15.

The image input unit 10 sequentially receives a plurality of images obtained by continuous shooting with an imaging device such as a digital camera and outputs them to the face detection unit 11. The face detection unit 11 detects a person's face in the input image and identifies face information indicating the position, size, and so on of the detected face. The face detection unit 11 also calculates, as feature amounts of the detected face, the luminance distribution, color histogram, and the like of a region containing the detected face. The face direction detection unit 12 detects the orientation of the face using the face information detected by the face detection unit 11. The face tracking template storage unit 13 determines and stores a face tracking template, which serves as the reference image for face tracking, based on the face information detected by the face detection unit 11 and the face orientation detected by the face direction detection unit 12. The image input apparatus 1 may be a single device or a system made up of a plurality of devices. For example, all of the components from the image input unit 10 to the position output unit 16 may be provided inside a single digital camera or digital video camera. Alternatively, only the image input unit 10 may be provided inside the digital camera or digital video camera, with the remaining components provided in an external computer that can communicate with the digital camera or digital video camera.

FIG. 2 is a flowchart showing the processing from detection of a face to storage of face tracking template information in the face tracking template storage unit. In FIG. 2, first, in step S20, the face detection unit 11 detects a person's face in the image input through the image input unit 10. As a face detection method for the face detection unit 11, for example, as described in Japanese Patent Laid-Open No. 7-311833, regions that satisfy the shape of an eye or a mouth are first extracted from the image. Face candidates are then determined from combinations of those regions, each face candidate is compared with a face standard pattern stored in advance, and if the degree of matching is high, the face candidate is determined to be a face. In the present embodiment, a region is extracted in which there are two eyes and in which a nose and a mouth lie on an extension line passing through the midpoint between the two eyes. Then, taking the center of the region containing both eyes, the nose, and the mouth as the reference, the region is enlarged step by step, and the point at which the change in the contrast signal, color difference signal, or luminance signal at the region boundary becomes large is judged to be the boundary of the face candidate. Alternatively, the face candidate region may be determined by directly detecting the contour shape of the face, or a region that contains both eyes, the nose, and the mouth and is estimated to be skin-colored may be determined as the face candidate. The face candidate is then compared with the face standard pattern to determine whether or not it is a face.

In step S21, the face direction detection unit 12 determines the orientation of the detected face. If the face orientation is obtained at the same time from the face detection result of the face detection unit 11, that result may be used; otherwise, the orientation may be estimated from, for example, the positional relationship of the facial components (eyes, nose, mouth, and so on) within the face region. For example, if the eyes, nose, mouth, and other components are shifted toward the right side of the face region as a whole, the face can be estimated to be facing right as seen by the viewer; if they are shifted toward the upper side of the face region as a whole, the face can be estimated to be looking up. Furthermore, from the positions of these components within the face region, it is also possible to estimate by how many degrees the face is turned away from the front.
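The component-position heuristic of step S21 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the embodiment: the function name `estimate_direction`, the (left, top, width, height) box layout, the `margin` dead zone, and the returned labels are all hypothetical, and image coordinates are assumed to have y increasing downward.

```python
from typing import List, Tuple


def estimate_direction(face_box: Tuple[float, float, float, float],
                       parts: List[Tuple[float, float]],
                       margin: float = 0.1) -> Tuple[str, str]:
    """Coarsely classify face direction from where the facial parts sit.

    face_box is (left, top, width, height); parts holds the (x, y) centres of
    the eyes, nose, and mouth. margin is a hypothetical dead zone, expressed
    as a fraction of the box size, inside which the face counts as frontal.
    """
    left, top, width, height = face_box
    mean_x = sum(p[0] for p in parts) / len(parts)
    mean_y = sum(p[1] for p in parts) / len(parts)

    # Offset of the parts' centroid from the box centre, normalised by size.
    dx = (mean_x - (left + width / 2)) / width
    dy = (mean_y - (top + height / 2)) / height

    horizontal = "right" if dx > margin else "left" if dx < -margin else "center"
    # Smaller y means higher in the image, so a negative offset means "up".
    vertical = "up" if dy < -margin else "down" if dy > margin else "center"
    return horizontal, vertical
```

Converting the normalised offsets into an angle in degrees would need a calibration that the text does not specify, so the sketch stops at coarse labels.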

In step S22, it is determined whether or not the face orientation is within a preset range. For example, from the positional relationship of the facial components (eyes, nose, mouth, and so on) in the face region, it is determined whether the face orientation is estimated to be within 45 degrees of the directly frontal orientation in each of the up, down, left, and right directions. This range is not limited to 45 degrees and may be determined experimentally from verification results in various situations. If the face orientation is within the preset range, the process proceeds to step S23, and the last detected face image is set as the face tracking template to be used when no face is detected in subsequent face detection. If it is not within the preset range, the process proceeds to step S24, and the face image obtained by face detection or face tracking from the image captured immediately before face tracking is performed is set as the face tracking template to be used when no face is detected in subsequent face detection.
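The branch in steps S22 to S24 can be sketched as below. This is a minimal illustration, not the embodiment's implementation: the `Face` record, its field names, the function names, and the inclusive comparison at exactly 45 degrees are assumptions, and the 45-degree limit is only the example value given in the text.

```python
from dataclasses import dataclass
from typing import Tuple

# Example limit from the text; the source says it should be tuned experimentally.
MAX_ANGLE_DEG = 45.0


@dataclass
class Face:
    box: Tuple[int, int, int, int]  # hypothetical (x, y, width, height)
    yaw_deg: float = 0.0            # left/right angle from frontal
    pitch_deg: float = 0.0          # up/down angle from frontal
    source: str = "detected"        # "detected" or "tracked"


def within_preset_range(face: Face, limit: float = MAX_ANGLE_DEG) -> bool:
    """Step S22: is the face within the preset range in every direction?"""
    return abs(face.yaw_deg) <= limit and abs(face.pitch_deg) <= limit


def choose_template(last_detected: Face, latest_face: Face) -> Face:
    """Steps S23/S24: pick the face image to use as the tracking template.

    last_detected is the face most recently found by face detection;
    latest_face is the face from the frame just before tracking runs,
    whether it came from detection or from tracking.
    """
    if within_preset_range(last_detected):
        return last_detected  # S23: keep the last detected face
    return latest_face        # S24: follow the most recent face
```

For instance, a last detected face turned 50 degrees to the right falls outside the range, so the most recent face (detected or tracked) becomes the template.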

The face tracking process in the present embodiment is executed only when the face detection unit 11 cannot detect a face. Alternatively, the face tracking process may always be executed regardless of the detection result of the face detection unit 11, in which case, when the face detection unit 11 does detect a face, the face detection result of the face detection unit 11 takes priority over the tracking result. The face position estimated by face tracking is often less accurate than the face position detected by the face detection unit 11. Therefore, when the face detection unit 11 detects a face, giving priority to that detection result allows the face position to be determined more accurately.

If a face is newly detected while face tracking is being performed because face detection had become impossible, face tracking is stopped, the processing of the flow in FIG. 2 is performed again, and a new face tracking template is set. By always setting the face tracking template from the latest face detection result in this way, face tracking can be performed with higher accuracy.
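The hand-over between detection and tracking described above can be sketched as a small per-frame loop. This is a simplified illustration under assumptions: the class `FaceTracker` and the `detect` and `track` callables are hypothetical stand-ins for the face detection unit 11 and the position estimation unit 15, and the template here is simply the last detected box rather than the result of the full FIG. 2 flow.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height)


class FaceTracker:
    """Detection takes priority; tracking runs only when detection fails."""

    def __init__(self,
                 detect: Callable[[object], Optional[Box]],
                 track: Callable[[object, Box], Optional[Box]]) -> None:
        self.detect = detect    # frame -> box, or None if no face is found
        self.track = track      # (frame, template) -> box, or None on failure
        self.template: Optional[Box] = None
        self.tracking = False

    def step(self, frame: object) -> Optional[Box]:
        detected = self.detect(frame)
        if detected is not None:
            # A new detection stops tracking and resets the template
            # (simplified stand-in for rerunning the FIG. 2 flow).
            self.tracking = False
            self.template = detected
            return detected
        if self.template is None:
            return None  # nothing to track until a face is detected
        estimated = self.track(frame, self.template)
        self.tracking = estimated is not None
        if estimated is None:
            self.template = None  # tracking ended; wait for a new detection
        return estimated
```

A caller would invoke `step` once per input frame and pass the returned position to the output stage.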

The image feature information storage unit 14 calculates and stores the feature amount of the face tracking template stored in the face tracking template storage unit 13. FIG. 3 is a diagram showing the processing for calculating a feature amount from a face tracking template. In FIG. 3, a rectangular region surrounding the face is set with center coordinates (x, y), width W, and height H, taken from the face tracking template stored in the face tracking template storage unit 13. The luminance distribution, color histogram, and the like of this rectangular region are then calculated and stored as the feature amount of the template face image. The size of the rectangular region surrounding the face can be set arbitrarily according to the size of the face and other factors.
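As one possible reading of the feature amount, the sketch below computes a normalised luminance histogram over the W x H rectangle centred on (x, y). The function name, the bin count, the 8-bit luminance range, and the choice to clip the rectangle at the image border are assumptions; the text only states that a luminance distribution, color histogram, or similar is calculated.

```python
from typing import List, Sequence


def luminance_histogram(image: Sequence[Sequence[int]],
                        cx: int, cy: int, w: int, h: int,
                        bins: int = 8) -> List[float]:
    """Normalised luminance histogram of the w x h rectangle centred on (cx, cy).

    image is a row-major grid of 8-bit luminance values (0 to 255). Pixels of
    the rectangle that fall outside the image are ignored.
    """
    rows, cols = len(image), len(image[0])
    left, top = cx - w // 2, cy - h // 2
    counts = [0] * bins
    total = 0
    for yy in range(max(top, 0), min(top + h, rows)):
        for xx in range(max(left, 0), min(left + w, cols)):
            counts[image[yy][xx] * bins // 256] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]
```

Normalising by the pixel count keeps histograms comparable between rectangles that were clipped differently at the image border.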

The image feature information storage unit 14 also uses the position information of the face tracking template stored by the face tracking template storage unit 13 to calculate and store image feature amounts at positions to which the face is estimated to move. FIG. 4 is a diagram illustrating regions to which the face is estimated to move. Here, the center coordinates of the face tracking template are (x, y), and the rectangular regions of width W and height H obtained by shifting these center coordinates (region 5) by ±5 pixels in the horizontal and vertical directions are designated regions 1 to 4 and regions 6 to 9. The luminance distribution, color histogram, and the like of each of regions 1 to 4 and 6 to 9 are then calculated and stored as the feature amounts of the estimated face movement position candidates.

In this example, the eight regions obtained by shifting the center coordinates by ±5 pixels are used as the estimated face movement position candidates, but the amount of movement and the number of position candidates can be set arbitrarily according to the size and motion of the face and other factors.
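The nine candidate regions can be generated as in the sketch below. The numbering is an assumption: regions are numbered 1 to 9 in raster order (top-left first) so that region 5 is the unshifted center, which is consistent with the text but cannot be confirmed without FIG. 4. The function name and the `shift` parameter are hypothetical.

```python
from typing import Dict, Tuple


def candidate_centers(cx: int, cy: int, shift: int = 5) -> Dict[int, Tuple[int, int]]:
    """Centres of the nine candidate regions around (cx, cy).

    Regions are numbered 1 to 9 in raster order (an assumed layout), so
    region 5 is the template position itself and the other eight are shifted
    by +/- shift pixels horizontally and/or vertically.
    """
    centers = {}
    number = 1
    for dy in (-shift, 0, shift):
        for dx in (-shift, 0, shift):
            centers[number] = (cx + dx, cy + dy)
            number += 1
    return centers
```

Each candidate keeps the template's width W and height H; a larger `shift` or a denser grid could be used for larger or faster-moving faces, as the text allows.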

The position estimation unit 15 compares the feature amount of the face tracking template stored in the image feature information storage unit 14 with the image feature amount of each of regions 1 to 9 in FIG. 4, which are the face position candidates in the tracking target image. The region whose image feature information is most similar is then estimated to be the current face position. In addition, erroneous tracking can be prevented by configuring the apparatus to stop face tracking when the difference between the most similar image feature amount and the feature amount of the face tracking template is larger than a threshold. The position output unit 16 outputs the face position information estimated by the position estimation unit 15 from the image input apparatus 1 to an external device.
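The comparison and the threshold check can be sketched as follows. The text does not name a similarity measure, so the sum of absolute bin differences used here is an assumption, as are the function names and the convention of returning `None` when tracking should stop.

```python
from typing import Dict, Optional, Sequence


def feature_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors (assumed metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def estimate_position(template_feature: Sequence[float],
                      candidate_features: Dict[int, Sequence[float]],
                      threshold: float) -> Optional[int]:
    """Return the number of the most similar candidate region.

    Returns None when even the best candidate differs from the template by
    more than threshold, which signals that tracking should stop.
    """
    best_region = min(
        candidate_features,
        key=lambda region: feature_difference(template_feature,
                                              candidate_features[region]),
    )
    best_diff = feature_difference(template_feature, candidate_features[best_region])
    if best_diff > threshold:
        return None
    return best_region
```

The threshold trades off robustness against drift: a small value stops tracking early when the face is occluded, while a large value keeps tracking through larger appearance changes.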

Specific examples and effects of the face tracking process of this example are described with reference to FIGS. 5 to 8. FIGS. 5 and 6 show an example in which the face was initially detected, but an obstacle passing in front of the face made face detection impossible, so processing switches to face tracking. FIG. 5 shows an example in which conventional face tracking is applied. In FIG. 5, once face tracking using the face tracking template starts, the most recent face, whether detected by the face detection unit 11 or found by face tracking using the face tracking template, is set as the face tracking template. In the image of frame (a), the face detection unit 11 succeeds in detecting the face, and the face direction detection unit 12 determines that the detected face is facing directly forward.

In the next frame (b), part of one of the subject's eyes is hidden by the obstacle, so the face detection unit 11 fails to detect the face. Because the face detection unit 11 has failed, face tracking using the face tracking template starts in this frame (b). Since the face detection unit 11 detected a face immediately beforehand, the face tracking template storage unit 13 sets that face (the face detected in frame (a)) as the face tracking template. The image feature information storage unit 14 calculates the luminance distribution and color histogram of this face tracking template and stores them as the feature amount. The position estimation unit 15 then extracts, from the image of frame (b), the region most similar to the stored feature amount; because the difference is within the threshold, the extracted region is estimated to be the face in the image of frame (b). Since a new face has been estimated from the image of frame (b), the face tracking template storage unit 13 updates the face tracking template to the face estimated from frame (b), and the image feature information storage unit 14 updates the stored feature amount to the one obtained from it.

In frame (c), most of the subject's face is hidden by the obstacle, so the face detection unit 11 fails to detect the face. The position estimation unit 15 therefore extracts, from the image of frame (c), the region most similar to the feature amount obtained from the image of frame (b), and if the difference in feature amount is within the threshold, estimates this region to be the face. Because the feature amount obtained from the image of frame (b) includes contributions from both the subject's face and the obstacle, a region that likewise contains both the subject's face and the obstacle is estimated to be the new face. Here, since most of the actual face is hidden by the obstacle, the region estimated to be the new face contains more of the obstacle than of the subject's face.

In frame (d) as well, most of the actual face of the subject is hidden by the obstacle, so the face detection unit 11 fails to detect the face. Here too, as in frame (c), a region that contains more of the obstacle than of the subject's face is estimated to be the new face.
In frame (e) as well, about half of the actual face is hidden by the obstacle, so the face detection unit 11 fails to detect the face. In the image of frame (e), about half of the actual face has emerged from behind the obstacle, but because the region estimated to be the face in frame (d) is mostly occupied by the obstacle, the region occupied by the obstacle is estimated to be the new face.

Thus, the conventional method has the problem that when an obstacle is present in front of the face, the obstacle ends up being tracked. In contrast, FIG. 6 shows an example in which the face tracking process of the present embodiment is applied. In the image of frame (a), the face detection unit 11 succeeds in detecting the face, and the face direction detection unit 12 determines that the detected face is facing directly forward.

In the next frame (b), part of one of the subject's eyes is hidden by the obstacle, so the face detection unit 11 fails to detect the face. Because the face detection unit 11 has failed, face tracking using the face tracking template starts in this frame (b). Because the face was within 45 degrees of the directly frontal orientation when the face detection unit 11 detected it immediately beforehand, the face tracking template storage unit 13 sets that face (the face detected in frame (a)) as the face tracking template. The image feature information storage unit 14 calculates the luminance distribution and color histogram of this face tracking template and stores them as the feature amount. The position estimation unit 15 then extracts, from the image of frame (b), the region most similar to the stored feature amount; because the difference is within the threshold, the extracted region is estimated to be the face in the image of frame (b). Here, although a new face has been estimated from the image of frame (b), the orientation of the face last detected by the face detection unit 11 was within the preset range, so the image feature information storage unit 14 keeps the feature amount as it is without updating it.

In frame (c), most of the actual face of the subject is hidden by the obstacle, so the face detection unit 11 fails to detect the face. The position estimation unit 15 therefore extracts, from the image of frame (c), the region most similar to the retained feature amount. However, because the difference between the feature amount of the most similar region and the feature amount obtained from the face detected in frame (a) is larger than the threshold, the position estimation unit 15 judges that face tracking has failed and ends tracking.
In frames (d) and (e), tracking has already ended, so face tracking is not performed until the face detection unit 11 newly detects a face.

Thus, in the face tracking process of the present embodiment, because the face orientation was within the preset range when the face detection unit 11 last detected the face region, the image feature information storage unit 14 retains the feature amount obtained from the face detected by the face detection unit 11. If the face detected by the face detection unit 11 was facing forward and the face detection unit 11 nevertheless could no longer detect it afterward, it is highly likely that an obstacle has appeared in front of the face. In such a case, therefore, the present embodiment sets the face detected by the face detection unit 11 as the face tracking template, even though the face estimated by face tracking using the face tracking template is more recent information than the face detected by the face detection unit 11. This makes it possible to suppress erroneous tracking.

The reason why the above processing is performed only when the orientation of the face detected by the face detection unit 11 is within the preset range is explained next. For comparison with the configuration of this embodiment, FIG. 7 shows an example of applying a face tracking process that always sets the face detected by the face detection unit 11 as the face tracking template. In the image of frame (a), the face detection unit 11 succeeds in detecting the face, and the face direction detection unit 12 detects that the detected face is facing straight ahead. In the image of frame (b), the face detection unit 11 succeeds in detecting the face, and the face direction detection unit 12 detects that the detected face is turned 50 degrees to the right of straight ahead.

In the image of frame (c), the subject's face has turned far to the right and part of one eye is no longer visible, so the face detection unit 11 fails to detect the face. Because detection failed, face tracking using the face tracking template starts in frame (c). Since the face detection unit 11 detected a face in the immediately preceding frame, the face tracking template storage unit 13 sets that face (the face detected in frame (b)) as the face tracking template. The image feature information storage unit 14 calculates the luminance distribution and color histogram of this face tracking template and stores them as the feature amount. The position estimation unit 15 then estimates a new face from the image of frame (c). Although a new face has been estimated from the image of frame (c), it was not detected by the face detection unit 11, so the image feature information storage unit 14 keeps the feature amount as it is without updating it.

In the image of frame (d), the subject's face has turned even further to the right and one eye is no longer visible, so the face detection unit 11 fails to detect the face. Because detection failed, face tracking using the face tracking template continues in frame (d) as well. However, the feature amount held by the image feature information storage unit 14 is still the one obtained from the face detected in frame (b), so the feature amount of the face in frame (d), whose orientation differs greatly, differs from the held feature amount by more than the threshold. The position estimation unit 15 therefore determines that face tracking has failed and ends tracking.
In the image of frame (e) as well, no region is found whose feature amount differs from the held feature amount by no more than the threshold, so face tracking is judged to have failed.

Thus, if the face detected by the face detection unit 11 is always set as the face tracking template, a face whose orientation keeps changing can no longer be tracked even when there is no obstacle. In contrast, FIG. 8 shows an example of applying the face tracking process of this embodiment. In FIG. 8, when the orientation of the face detected by the face detection unit 11 is not within the preset range, the face estimated by face tracking with the face tracking template replaces the current template as the new face tracking template.

The images of frames (a) and (b) are the same as in FIG. 7. In the image of frame (c), the subject's face has turned far to the right and part of one eye is no longer visible, so the face detection unit 11 fails to detect the face. Because detection failed, face tracking using the face tracking template starts in frame (c). Since the face detection unit 11 detected a face in the immediately preceding frame, the face tracking template storage unit 13 sets that face (the face region detected in frame (b)) as the face tracking template. The image feature information storage unit 14 calculates the luminance distribution and color histogram of this face tracking template and stores them as the feature amount. The position estimation unit 15 then estimates a new face from the image of frame (c). Here, a new face has been estimated from the image of frame (c), and the orientation of the latest face detected by the face detection unit 11 (the orientation of the face in frame (b)) is not within the preset range. The face tracking template storage unit 13 therefore updates the face tracking template to the face estimated from frame (c), and the image feature information storage unit 14 updates the feature amount to the one obtained from that face.

In the image of frame (d), the subject's face has turned even further to the right and one eye is no longer visible, so the face detection unit 11 fails to detect the face. Because detection failed, face tracking using the face tracking template continues in frame (d) as well. The feature amount held by the image feature information storage unit 14 is the one obtained from the face estimated in frame (c). The face in frame (d), whose orientation differs only slightly from that of the face in frame (c), therefore differs from the held feature amount by no more than the threshold and is estimated as the face.

Thus, when the face orientation changes gradually, it is desirable to update the face tracking template using the face region estimated by face tracking with the face tracking template.
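The contrast between FIG. 7 and FIG. 8 can be reproduced with a toy simulation. As a simplifying assumption, each face's feature amount is collapsed into a single number (here, the yaw angle in degrees serves as a stand-in for appearance), and the threshold of 15 is arbitrary; neither comes from the source. The function name `track` is hypothetical.

```python
from typing import List


def track(
    appearances: List[float],
    template: float,
    threshold: float,
    update_template: bool,
) -> List[bool]:
    """Track through successive frames in which detection has failed.

    Each entry of the result says whether tracking succeeded in that frame.
    Tracking ends at the first failure, so later frames are not evaluated.
    """
    results: List[bool] = []
    for value in appearances:
        if abs(value - template) > threshold:
            results.append(False)
            break  # tracking has failed and is ended
        results.append(True)
        if update_template:
            # FIG. 8 behavior: the estimated face becomes the new template.
            template = value
    return results


# Frame (b) was detected at 50 degrees; frames (c), (d), (e) keep turning.
frames_c_d_e = [60.0, 70.0, 80.0]

# FIG. 7 behavior: the template stays fixed at the face detected in (b).
fixed = track(frames_c_d_e, template=50.0, threshold=15.0, update_template=False)

# FIG. 8 behavior: the template follows the most recently estimated face.
updated = track(frames_c_d_e, template=50.0, threshold=15.0, update_template=True)
```

With the fixed template, frame (c) is still within the threshold but frame (d) is not, so tracking ends there. With the updated template, each frame is compared against its immediate predecessor and all three frames are tracked.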

As described above, according to this embodiment, the face tracking template is set by switching, based on the face orientation, between the most recently detected face image and the face image most recently found by face tracking. Thus, when face detection is presumed to have failed because of an obstacle, setting the face image from the last successful face detection as the face tracking template prevents erroneous tracking such as that in FIG. 5 and allows face tracking to be stopped as in FIG. 6.

When face detection is presumed to have failed because the face orientation has changed greatly, the face image most recently found by face tracking can be set as the face tracking template. Thus, as in FIG. 8, face tracking can continue without being stopped midway, even when the face turns sideways or backward as in FIG. 7.
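The switching rule summarized above can be sketched as a small selection function. The 30-degree limit is purely illustrative, since the source speaks only of a preset range, and the `Face` type and `choose_template` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit: the source only says "a preset range" and gives no value.
FRONTAL_RANGE_DEG = 30.0


@dataclass
class Face:
    frame: str                        # label of the frame the face came from
    yaw_deg: Optional[float] = None   # orientation, known for detected faces


def choose_template(last_detected: Face, last_tracked: Optional[Face]) -> Face:
    """Choose the tracking template after face detection has failed.

    If the last detected face was within the preset range, an obstacle is
    the likely cause of the failure, so the detected face is kept. Otherwise
    the face has probably turned, so the most recently tracked face is used.
    """
    frontal = (
        last_detected.yaw_deg is not None
        and abs(last_detected.yaw_deg) <= FRONTAL_RANGE_DEG
    )
    if frontal or last_tracked is None:
        return last_detected
    return last_tracked


# Last detection was frontal: keep the detected face as the template.
obstacle_case = choose_template(Face("a", 0.0), Face("b"))

# Last detection was turned 50 degrees: use the tracked face instead.
turning_case = choose_template(Face("b", 50.0), Face("c"))
```

Falling back to the detected face when no tracked face exists yet corresponds to the first frame after detection fails, where the template is the face detected immediately before.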

In the embodiment described above, the method of setting the face tracking template is changed according to the face orientation, but it may instead be changed according to other conditions. For example, when it is determined from a plurality of images in time-series order that an object is approaching the vicinity of the face, the face image from the last successful face detection is set as the face tracking template. Conversely, when it is determined that no object is approaching the vicinity of the face, the face image from the last successful face tracking is set as the face tracking template. This makes it possible to stop tracking instead of erroneously following the object. In other words, the present invention can be applied to any configuration that has a function for detecting a face from a single frame image and a function for tracking, from the correlation among a plurality of frame images, the region where the face is presumed to be located, and that selects the result of one of these processes as the face tracking template. Furthermore, the tracking target does not have to be the subject's face. Any target that can be identified within a single image by comparison with a plurality of conditions such as color and shape, for example a specific posture of the subject or a specific animal, can take the place of the face in the embodiment described above.
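The object-approach variation can be sketched as follows. The source does not say how an approaching object is recognized, so this sketch assumes, purely for illustration, that the object's center position is known in each frame and that "approaching" means its distance to the face center shrinks from every frame to the next. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_approaching(face_center: Point, object_centers: List[Point]) -> bool:
    """True if the object gets strictly closer to the face in every frame.

    An assumed criterion; at least two observations are required.
    """
    dists = [math.dist(face_center, c) for c in object_centers]
    if len(dists) < 2:
        return False
    return all(later < earlier for earlier, later in zip(dists, dists[1:]))


def choose_template_source(face_center: Point, object_centers: List[Point]) -> str:
    """Select which face image becomes the tracking template."""
    if is_approaching(face_center, object_centers):
        return "last_detected"   # an obstacle is likely, keep the detected face
    return "last_tracked"        # no obstacle expected, follow the tracked face


# Object moving toward the face over three frames.
approaching = choose_template_source((0.0, 0.0), [(10.0, 0.0), (6.0, 0.0), (3.0, 0.0)])

# Object moving away from the face.
receding = choose_template_source((0.0, 0.0), [(3.0, 0.0), (6.0, 0.0)])
```

Requiring a strict decrease in every frame is a conservative choice; a real system would more plausibly tolerate noise, for example by comparing only the first and last distances.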

[Other Embodiments]
The object of the present invention can also be achieved by supplying software that implements the functions of the embodiment to a computer of an apparatus or system connected to various devices, so that those devices operate to implement the functions of the embodiment described above. In this case, a storage medium on which the program code of the software is recorded is supplied to the system or apparatus, and its computer (CPU or MPU) reads and executes the program code stored in the storage medium to operate the various devices. The program code read from the storage medium then itself implements the functions of the embodiment described above, and the storage medium storing that program code constitutes the present invention.

Examples of storage media that can be used to supply the program code include a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile semiconductor memory card, and ROM. The functions of the embodiment described above may be implemented by the computer executing the program code it has read. Needless to say, the invention also covers the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on instructions in the program code, and that processing implements the functions of the embodiment described above.

Furthermore, the program code read from the storage medium may be written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. Needless to say, the invention also covers the case where a CPU or the like provided on that function expansion board or function expansion unit then performs part or all of the actual processing based on instructions in the program code, and that processing implements the functions of the embodiment described above.

Claims

An image input apparatus comprising:
image input means for inputting an image;
subject detection means for detecting a specific subject from one input image;
tracking means for estimating the position of the specific subject from continuously input images; and
determination means for determining a tracking template used for tracking by the tracking means,
wherein, when the specific subject is detected by the subject detection means from one of the continuously input images and, in one of the images input after that image, the position of the specific subject is estimated by the tracking means without the specific subject being detected by the subject detection means, the determination means selects either the image detected by the subject detection means or the image whose position was estimated by the tracking means, and sets the selected image as the tracking template.

The image input apparatus according to claim 1, further comprising direction detection means for detecting the orientation of the specific subject,
wherein the subject detection means sets the image detected by the subject detection means as the tracking template if the orientation of the specific subject is within a preset range, and sets the image whose position was estimated by the tracking means as the tracking template if the orientation of the detected subject is not within the preset range.

The image input apparatus according to claim 1 or 2, wherein the specific subject is a person's face.

A subject detection method in an image input apparatus that inputs an image, the method comprising:
a subject detection step of detecting a specific subject from one input image;
a tracking step of estimating the position of the specific subject from continuously input images; and
a determination step of determining a tracking template used for tracking in the tracking step,
wherein, when the specific subject is detected in the subject detection step from one of the continuously input images and, in one of the images input after that image, the position of the specific subject is estimated in the tracking step without the specific subject being detected in the subject detection step, the determination step selects either the image detected in the subject detection step or the image whose position was estimated in the tracking step, and sets the selected image as the tracking template.

A program for causing a computer of an image input apparatus to execute the subject detection method according to claim 4.