JP2021040183A

JP2021040183A - Image processing apparatus and image processing method

Info

Publication number: JP2021040183A
Application number: JP2019158661A
Authority: JP
Inventors: 佳絵伊藤; Yoshie Ito
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2021-03-11
Anticipated expiration: 2039-08-30
Also published as: JP7479803B2

Abstract

To accurately estimate and display a pupil position for a small subject without impairing real-time performance. SOLUTION: An image processing apparatus has: display control means that displays a captured image and an icon on display means; detection means that detects a face and eyes from the captured image; tracking means that detects and tracks a face area of an arbitrary target in the captured image using a detection method that is faster but less accurate than that of the detection means; and determination means that determines estimated eye positions based on the position and size of the face and eye position information. In a frame in which the detection means detects the eyes and the face, the determination means calculates the positions of the eyes relative to the face based on at least one of the eye positions, the face position, and the face size information detected by the detection means. It then applies this relative positional relation to the face position and face size information detected by the tracking means in the current frame to determine the estimated eye positions, and the display control means superimposes icons indicating the estimated eye positions on the display means. SELECTED DRAWING: Figure 5

Description

The present invention relates to a device that detects a subject and displays the position of the detected subject.

In digital cameras that use an image sensor, it has become common to detect and track a subject in the image data obtained from the image sensor and to shoot with focus, brightness, and color adjusted appropriately for that subject. Commonly detected subjects include human faces and bodies, and specific animals such as dogs and cats. A further technique, organ detection, detects specific parts of a detected subject, such as the eyes (pupils), nose, and mouth within a face. A typical use of organ detection is pupil AF, in which a detected eye is set as the focus detection area for autofocus. Patent Document 1 discloses a method of estimating the pupil region from face contour information by using the statistically derived position of the pupil region relative to the face contour region.

Patent Document 1: JP-A-2009-175744

For a subject far from the camera, that is, a subject that appears small, less information about the subject is available, so detection accuracy is hard to achieve. Existing methods have problems when performing pupil detection on such small subjects.

An object of the present invention is therefore to provide a technique that, compared with the prior art, accurately estimates and displays the pupil position for a small subject without impairing real-time performance.

To that end, the present invention comprises: display control means for displaying a captured image and an icon on display means; detection means for detecting a face and eyes from the captured image; tracking means for detecting and tracking a face area of an arbitrary target in the captured image using a detection method that is faster but less accurate than that of the detection means; and determination means for determining an estimated eye position based on the position and size of the face and eye position information. In a frame in which the detection means has detected the eyes and the face, the determination means calculates the position of the eyes relative to the face based on at least one of the eye position, the face position, and the face size information detected by the detection means, and determines the estimated eye position by applying this relative positional relationship to the face position and face size information detected by the tracking means in the current frame. The display control means superimposes an icon indicating the estimated eye position on the display means.

According to the present invention, the pupil position can be accurately estimated and displayed for a small subject without impairing real-time performance.

Configuration diagram for carrying out the present invention.

Diagram of the face and organ detection method.

Diagram of the tracking method.

Timing diagram for Example 1.

Diagram showing the pupil position estimation method in Example 1.

Flowchart of pupil position estimation for a small face in Example 1.

Diagram showing the method of correcting the display frame position according to face angle in Example 2.

Diagram showing the method of correcting the display frame size according to face angle in Example 2.

Various examples of the present invention will be described with reference to the accompanying drawings. Each embodiment illustrates an imaging device having a pupil detection function. Such imaging devices include video cameras, digital cameras, and silver halide still cameras; portable devices equipped with a camera function, such as smartphones, also constitute one aspect of the present invention.

In the present embodiment, to notify the user that pupil AF has been activated and where, a mark such as a frame is displayed over the pupil AF area, superimposed on the image of the subject shown on the camera's display device. For moving subjects in particular, real-time performance is required so that the user is accurately informed of the pupil AF position. It is also preferable that pupil AF be activated, and the frame displayed, even for subjects as far from the camera as possible, to give the photographer time to prepare for the shot.

Known methods of face detection and organ detection include methods using Haar-like features (luminance differences between two regions) and deep learning. Another known approach uses general-purpose object tracking based on template matching: an image area containing the organ position at a certain point in time is stored as a template, and in subsequent frames pattern matching searches for an area similar to the template. Yet another known approach is to roughly estimate the position of an organ from other information.

In general, organ detection means are highly accurate but take a long time to run, which impairs real-time performance. Even more time is required when face detection and organ detection are implemented as separate means. For example, multiple face candidates are detected in the frame, a face that can serve as the AF target is selected from among them, and the selected face area is then passed to the organ detection means, which analyzes it in more detail to detect the organs. When face detection and organ detection are configured separately in this way, the order of execution is fixed, and depending on how long organ detection takes, its result lags behind the time at which face detection was executed.

In contrast, tracking means based on template matching run quickly and make real-time performance easy to secure, but have problems with accuracy. Because template matching searches for similar patterns within the image, it may hit a different, similar-looking subject instead of the intended one. For the even smaller eyes within a small face, there is not enough resolution for tracking, so matching accuracy is insufficient and erroneous tracking occurs. The method of Patent Document 1, which roughly estimates organ positions from other information, has the advantage of a short detection time, but because it estimates the pupil region from the average shape of a human face, the estimate is offset for subjects whose facial features deviate from the average.

The present embodiment is therefore characterized by combining organ detection processing with pattern matching processing to achieve both real-time performance and accuracy in pupil detection.

FIG. 1 shows a configuration example of an imaging apparatus according to a first embodiment of the present invention, illustrating a mirrorless camera (hereinafter, the camera) equipped with a pupil AF function.

The interchangeable lens 100 is one of the optical devices that can be attached to the camera body 120. It includes a photographing lens unit 101 comprising a main photographing optical system 102, an aperture 103 that adjusts the amount of light, and a focus lens group 104 that adjusts focus.

The lens system control microcomputer (hereinafter, the lens control unit) 111 includes an aperture control unit 112 that controls the operation of the aperture 103, a focus lens control unit 113 that controls the operation (also called driving) of the focus lens group 104, and other units. The focus lens control unit 113 drives the focus lens group 104 along the optical axis of the photographing lens unit 101 based on focus lens drive information acquired from the camera body 120, thereby adjusting the focus of the camera.

The focus lens group 104 may have multiple focus lenses or only one. To simplify the figure, a fixed focal length lens is shown here as an example of the interchangeable lens, but a lens whose focal length can be changed (a zoom lens) may also be used. With a zoom lens, the lens control unit 113 obtains focal length information from the output of an encoder that detects the zoom lens position. With a lens equipped with an image stabilization function, the lens control unit 113 also controls components such as a shift lens group for shake correction.

The camera body 120 includes a shutter 121 used for exposure control and an image sensor 122 such as a CMOS (complementary metal oxide semiconductor) sensor. The imaging signal output from the image sensor 122 is processed by an analog signal processing circuit 123 and then sent to a camera signal processing circuit 124.

The camera system control microcomputer (hereinafter, the camera control unit) 131 controls the entire imaging apparatus. For example, the camera control unit 131 controls a shutter drive motor (not shown) to drive the shutter 121. The memory card 125 is a recording medium that records captured image data. The pressed state of the release switch 181, operated by the photographer, is sent to the camera control unit 131, and the captured image is stored on the memory card 125 according to that state.

The image display unit 171 includes a display device, such as a liquid crystal display (LCD) panel, that lets the photographer monitor the image about to be captured and displays captured images. The touch panel 172 is an operation unit with which the photographer specifies coordinates on the image display unit 171 using a finger or stylus, and it can be integrated with the image display unit 171. One example is a built-in (in-cell) type, in which the touch panel has a light transmittance that does not obstruct the display and is incorporated inside the display surface of the image display unit 171. Input coordinates on the touch panel 172 are mapped to display coordinates on the image display unit 171. This provides a GUI (graphical user interface) in which the user appears to operate the screen shown on the image display unit 171 directly. The camera control unit 131 manages the operation state of the operation unit (touch panel) 172.

The camera body 120 has, on its mount surface facing the interchangeable lens 100, a mount contact portion 161, which is a communication terminal for communicating with the interchangeable lens 100. Likewise, the interchangeable lens 100 has, on its mount surface facing the camera body 120, a mount contact portion 114, which is a communication terminal for communicating with the camera body 120.

The lens control unit 111 and the camera control unit 131 control communication so that serial communication takes place at predetermined timings via the mount contact portions 114 and 161. Through this communication, the camera control unit 131 sends focus lens drive information, aperture drive information, and the like to the lens control unit 111, and the lens control unit 111 sends optical information such as the focal length to the camera control unit 131.

The camera signal processing circuit 124 includes a face information detection unit 141 and an organ information detection unit 142; the organ information detection unit 142 detects organ information such as the pupils and mouth from the face information detected by the face information detection unit 141. The tracking unit 143 receives the imaging signal from the analog signal processing circuit 123 and detects pupils and faces using a detection method that is faster but less accurate than that of the face information detection unit 141 and the organ information detection unit 142. The calculation unit 144 uses the detection results of the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143 to calculate the position of the pupil relative to the face. The detection results of the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143, together with the calculation result of the calculation unit 144, are sent to the camera control unit 131.

As blocks related to the present invention, the camera control unit 131 has a pupil automatic selection unit 150, which automatically selects the target pupil from the detected face information, and a display frame setting unit 151, which sets the detection frame to be shown on the display unit 171 for the detected face or pupil information. It also has a pupil designation/cancellation unit 152, which, in response to the photographer's operation, designates the pupil chosen by the photographer as the pupil to keep detecting, or cancels that designation. In addition, there are a storage unit 153, which stores the pupil or face designated by user operation along with the detection results of the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143, and an AF target subject setting unit 154, which notifies the focus detection unit 155 of the selected or designated pupil or face as the subject to be focused on (also called the target subject). These units operate based on the outputs of the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143.

The focus detection unit 155 performs focus detection processing based on the image signal corresponding to the subject to be focused on, as notified by the AF target subject setting unit 154. Focus detection is carried out by, for example, a known phase difference detection method or contrast detection method. With phase difference detection, the focus detection processing calculates the image shift amount by a correlation operation on a pair of image signals having parallax, or further converts the image shift amount into a defocus amount. The defocus amount can in turn be converted into a focus lens drive amount by taking into account factors such as the sensitivity of the interchangeable lens 100 during lens driving. The camera control unit 131 transmits to the lens control unit 111 either the focus detection result (image shift amount or defocus amount) detected by the focus detection unit 155 or the focus lens drive amount calculated from it. The focus lens control unit 113 controls the driving of the focus lens based on the focus lens drive information received from the camera control unit 131. In other words, the camera control unit 131 controls the driving of the focus lens via the focus lens control unit 113.
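The phase difference path described above (correlation of a pair of signals, image shift amount, defocus amount, lens drive amount) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the mean-absolute-difference correlation, the linear `conversion_coeff`, and the division by `sensitivity` are hypothetical choices and are not details given in this document.

```python
def image_shift(sig_a, sig_b, max_shift):
    """Return the integer shift of sig_b relative to sig_a with the lowest
    mean absolute difference (a simple stand-in for the correlation operation)."""
    best_shift, best_cost = 0, float("inf")
    n = len(sig_a)
    for s in range(-max_shift, max_shift + 1):
        # Compare only the samples that overlap for this candidate shift.
        pairs = [(sig_a[i], sig_b[i + s]) for i in range(n) if 0 <= i + s < n]
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def defocus_from_shift(shift_px, conversion_coeff):
    """Assumed linear conversion from image shift (pixels) to defocus amount."""
    return shift_px * conversion_coeff


def lens_drive_amount(defocus, sensitivity):
    """Assumed conversion: sensitivity = focus movement per unit of lens travel."""
    return defocus / sensitivity


# Hypothetical pair of parallax signals: sig_b is sig_a moved two samples right.
sig_a = [0, 0, 1, 5, 1, 0, 0, 0]
sig_b = [0, 0, 0, 0, 1, 5, 1, 0]
shift = image_shift(sig_a, sig_b, max_shift=3)
drive = lens_drive_amount(defocus_from_shift(shift, 0.5), sensitivity=2.0)
```

Either the shift, the defocus amount, or the drive amount could be what the camera side sends to the lens side, matching the alternatives listed in the text.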

The following describes how a camera with this configuration detects and tracks facial organs.

FIG. 2 illustrates the organ detection processing, in which detectors detect faces and organs. In the present embodiment, this is performed by the face information detection unit 141 or the organ information detection unit 142 within it. FIG. 2(a) shows multiple faces being detected in an image that contains multiple faces (face areas 201, 211, and 212). FIG. 2(b) shows the result of passing face area 201 through the organ detector (organ information detection unit 142). Organ detection finds feature points within the face, typically the eyes (203, 205), nose 207, and mouth 209. FIG. 2(c) shows how the face detection score varies with the state of the face. The detection score expresses the reliability, or confidence, of the face detection; it changes with the person's expression and with external factors such as how ambient light falls on the face. It can be used accordingly: the higher the score, the more accurately the position has been detected, while the position information accompanying a low score should not be trusted. In face area 201 of FIG. 2(c), the eyes are wide open and the mouth is distinct (appearing relatively large), so the face detection score is relatively high. In face areas 231 and 251, by contrast, the eyes become progressively smaller or closer to closed and the mouth becomes indistinct and pursed, so the face detection score is correspondingly lower.

FIG. 2(d) shows how the face angle is estimated from the detected positions of the face and organs. For a frontal face (angle 0 degrees), the two pupils lie symmetrically on either side of the center line 262 of the face frame 261, and the centers of the nose and mouth lie on the center line. For a face rotated 45 degrees, in contrast, the eyes, nose, and mouth are shifted to one side of the center line 272 of the face frame 271, and the left-right symmetry is lost. The face angle can be estimated from the degree of this offset. The reliability of the face angle is calculated from the detection scores of the eyes, nose, and mouth.

FIG. 3 shows the operating concept of tracking processing by a general-purpose tracking means. In the present embodiment, the tracking unit 143 performs this tracking processing based on the signal from the image sensor 122. The image sensor is driven at the period 301 of the drive signal VD and repeats exposure 303 and readout 305. The data obtained from the image sensor is developed into a visible image (image generation 307), and in FIG. 3 a template matching (pattern matching) method is applied to these images. In image 1, generated by image generation 307 from exposure 1, an area containing the tracking target is set as the template 331 (tracking area), either as designated by the user or automatically based on subject detection processing. Then, for image 2, generated from exposure 2, a search operation takes the image difference from the template image 331 while shifting the position step by step (search 333). The difference value is computed repeatedly, and the match area 335 with the smallest difference is taken as the position of the tracking target. Once tracking between images 1 and 2 is established, the tracking position found in image 2 becomes the updated template for the next round; repeating this for images 2 and 3, images 3 and 4, and so on achieves continuous tracking. The template may instead be updated every predetermined number of frames, or the update frequency may be controlled according to shooting conditions and environment. The tracking means can obtain the face position and the positions of organs such as the pupils, but for matching accuracy the target should preferably be at least a specified size.
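The search-and-update loop of FIG. 3 can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and uses the sum of absolute differences as the "image difference"; the document does not specify the difference metric, and the function names here are hypothetical.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and one image window."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(image, template):
    """Slide the template over the image and return (difference, top, left)
    for the window with the smallest difference (the match area)."""
    th, tw = len(template), len(template[0])
    best = None
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = sad(image, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


def crop(image, top, left, height, width):
    """Cut a window out of the image, for example to create or update a template."""
    return [row[left:left + width] for row in image[top:top + height]]


def blank(rows, cols):
    return [[0] * cols for _ in range(rows)]


# Image 1: a 2x2 target at row 1, column 1. Image 2: the same target has moved.
image1 = blank(5, 6)
image2 = blank(5, 6)
for r, row in enumerate([[9, 8], [7, 6]]):
    for c, value in enumerate(row):
        image1[1 + r][1 + c] = value
        image2[2 + r][3 + c] = value

template = crop(image1, 1, 1, 2, 2)              # template set on image 1
score, top, left = match_template(image2, template)
template = crop(image2, top, left, 2, 2)         # updated for the next frame
```

An exhaustive search like this is simple but treats any window with a small difference as a match, which is why a similar-looking subject or a very small target can cause erroneous tracking.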

To explain the effect of combining the organ detection processing and tracking processing described above, FIG. 4 shows timing diagrams for continuous shooting from the viewpoint of processing time, using the detection shown in FIG. 2 and the tracking processing described in FIG. 3. FIG. 4(a) shows a sequence in which LV exposures for detection and AF (413, 433, 453) are inserted between still image captures (411, 431, 451), detection is run on them, and the detection results are used for AF 415 and frame display 417. When the pupil position information used for frame display 417 comes from face and organ detection 421, which takes longer than tracking 441, the time 429 until the frame display position is updated becomes longer and real-time performance suffers. The detection processing time also becomes the limiting factor on the high frame rate required for continuous shooting.

FIG. 4(b) shows a sequence in which the AF and display frame target positions are obtained by tracking processing on the images from the LV exposure periods. Tracking based on simple matching is generally known to finish sooner than detection based on organ detection, which is computationally more complex. By using the result of tracking 441 for AF 435 and frame display 437, the frame display position can be updated in a time 449 shorter than in the detection sequence of FIG. 4(a), and a high frame rate can also be achieved. However, continuous operation with tracking alone risks erroneously tracking a similar subject. When the subject is small, for example the face of a person whose whole body fits in the image frame, tracking keeps the face position reasonably accurate, but the accuracy of the eye position is often insufficient.

FIG. 4(c) shows a sequence that combines detection and tracking, and FIG. 5 is a conceptual diagram of how they are combined. The position of the eye 515 relative to the face frame center 513, obtained from the high-accuracy detection processing 465 of the previous frame (in FIG. 5, 20% to the right in the X direction and 15% upward in the Y direction, taking the horizontal width of the face frame 511 as 100%), is applied to the face information 521, 523 (position and size) of the current frame obtained by the high-speed tracking processing 463. The estimated eye position 525 can thus be estimated and displayed with better real-time frame display than when only organ detection is used, and based on a more accurate detection result than when only tracking information is used.
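The FIG. 5 idea, storing the eye position as a fraction of the face frame and reapplying it to the tracked face, can be written out as a short sketch. The `Face` type, the function names, and the pixel values are hypothetical; the sketch also assumes image coordinates with y increasing downward (so "upward" is negative) and a square face frame so that both axes use the same 100% reference.

```python
from dataclasses import dataclass


@dataclass
class Face:
    cx: float      # face frame centre, x
    cy: float      # face frame centre, y
    width: float   # face frame width
    height: float  # face frame height


def relative_eye_offset(face, eye_x, eye_y):
    """Eye offset from the face centre as a fraction of the face frame size."""
    return ((eye_x - face.cx) / face.width, (eye_y - face.cy) / face.height)


def estimate_eye(tracked_face, offset):
    """Apply a stored relative offset to the face found by tracking."""
    dx, dy = offset
    return (
        tracked_face.cx + dx * tracked_face.width,
        tracked_face.cy + dy * tracked_face.height,
    )


# Previous frame, from the accurate detector: the eye sits 20% right of centre
# and 15% above it, as in the FIG. 5 example.
detected_face = Face(cx=100.0, cy=100.0, width=40.0, height=40.0)
offset = relative_eye_offset(detected_face, eye_x=108.0, eye_y=94.0)

# Current frame, from the fast tracker: the face has moved and grown.
tracked_face = Face(cx=120.0, cy=110.0, width=50.0, height=50.0)
eye_x, eye_y = estimate_eye(tracked_face, offset)
```

Because the offset is normalized by the face frame size, the estimate follows both the translation and the scale change reported by the tracker, while the per-subject offset itself comes from the accurate detector and not from an average face shape.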

The processing flow of the present invention will be described with reference to FIG. 6. This flow is executed by the camera control unit 131, or by other units such as the camera signal processing circuit 124 under instruction from the camera control unit 131. When the LV exposure for AF ends, the pupil frame setting process of S201 in FIG. 6(b) starts. The camera control unit 131 then proceeds to S202, acquires the result of the tracking process performed by the tracking unit 143 on the current captured frame, and proceeds to S203. In S203, it determines whether the tracking process detected a face with a reliability at or above a certain threshold. If so, the process proceeds to S204, where the camera control unit 131 determines whether the size of the face detected by the tracking process is at or below a predetermined threshold. S204 is performed because the target should preferably be at least a specified size for tracking to achieve sufficient detection accuracy. If the camera control unit 131 determines in S204 that the face is not large enough for accurate tracking, that is, that the face size is at or below the predetermined threshold, the process proceeds to S205. In S205, the calculation unit 144 acquires the latest face and pupil detection results obtained from the face information detection unit 141 and the organ information detection unit 142. In S206, the camera control unit 131 determines whether the difference between the frame number of the frame in which the acquired detection result was obtained and the frame number of the current frame is at or below a predetermined threshold. Any information tied to the time elapsed between the detection result and the current frame may be used for this determination; for example, the time itself may be recorded and compared instead of the frame number. The threshold is preferably set small when the subject's movement amount is large and large when the movement amount is small. The movement amount is determined from at least one of the changes in face position, face angle, and face size of the same subject between the frame before last and the previous frame. If the camera control unit 131 determines in S206 that the difference is at or below the threshold, it judges that using the acquired detection result is unlikely to cause a positional deviation from the current frame, and the process proceeds to S207. In S207, the calculation unit 144 calculates the relative position of the pupil with respect to the face from the pupil position, face position information, and face size information obtained from the face information detection unit 141 and the organ information detection unit 142. In S208, the calculation unit 144 applies the calculated relative position to the face position and face size information obtained by tracking in the current frame, and calculates the estimated pupil position in the current frame.
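
The movement-dependent threshold used in S206 could look like the sketch below. The description only states that the threshold should be smaller for large movement and larger for small movement; the normalization of each change, the use of the largest change as the movement amount, and the 1-to-8-frame range are all assumptions made for illustration, as are the names `FaceState`, `movement_amount`, `max_detection_age`, and `detection_is_usable`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceState:
    """Face position (pixels), size (pixels), and angle (degrees) in one frame."""
    cx: float
    cy: float
    size: float
    angle_deg: float


def movement_amount(before_last: FaceState, previous: FaceState) -> float:
    """Movement between the frame before last and the previous frame.

    Assumption: each change is normalized to roughly 0..1 and the largest wins.
    """
    shift = math.hypot(previous.cx - before_last.cx,
                       previous.cy - before_last.cy) / before_last.size
    resize = abs(previous.size - before_last.size) / before_last.size
    turn = abs(previous.angle_deg - before_last.angle_deg) / 90.0
    return max(shift, resize, turn)


def max_detection_age(movement: float, longest: int = 8, shortest: int = 1) -> int:
    """Frame-difference threshold: small for large movement, large for small movement."""
    clamped = min(max(movement, 0.0), 1.0)
    return max(shortest, int(longest * (1.0 - clamped)))


def detection_is_usable(detected_frame: int, current_frame: int, max_age: int) -> bool:
    """S206: the stored detection is usable if it is at most max_age frames old."""
    return current_frame - detected_frame <= max_age


slow = movement_amount(FaceState(100, 100, 50, 0), FaceState(105, 100, 50, 0))
fast = movement_amount(FaceState(100, 100, 50, 0), FaceState(150, 100, 50, 0))
```

With these hypothetical constants, a slowly moving face tolerates an older detection result than a fast-moving one, which is the behavior the description asks for.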

When the estimated pupil position in the current frame has been calculated in S208, the process proceeds to S209, where the camera control unit 131 displays the pupil frame on the display unit 171 at the estimated pupil position set by the display frame setting unit 151. If the face size is larger than the predetermined threshold in the first place, using the pupil detection result of the tracking process provides better accuracy and real-time performance than estimating the position from the result of an organ detection process performed in the previous frame or earlier, so the process moves from S204 to S211. In S211, the camera control unit 131 acquires the pupil detection result of the tracking process, and in S212 it determines whether the pupil was detected with a reliability at or above a certain value. If so, the process moves to S209, and the camera control unit 131 displays on the display unit 171 the pupil frame that the display frame setting unit 151 set using the tracking-based pupil detection result. The pupil frame display is stopped (S213) if the camera control unit 131 determines in S203 that the tracking process could not detect a face, in S206 that the difference between the frame number of the organ detection result and the current frame number exceeds the predetermined threshold, or in S212 that the tracking process could not detect the pupil.
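
The branching of FIG. 6(b) (S203 to S213) can be summarized in a short sketch. The data types, the threshold values, and the function name `pupil_frame_position` are hypothetical; the sketch also assumes a single scalar face size and a single confidence threshold, neither of which the description specifies. A return value of `None` stands for stopping the pupil frame display (S213).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrackResult:
    """Output of the fast tracking process for the current frame."""
    face_conf: float
    face_cx: float
    face_cy: float
    face_size: float
    pupil_conf: float = 0.0
    pupil_xy: Optional[Tuple[float, float]] = None


@dataclass(frozen=True)
class StoredDetection:
    """Latest high-accuracy face and pupil detection kept in storage."""
    frame_no: int
    face_cx: float
    face_cy: float
    face_size: float
    pupil_x: float
    pupil_y: float


def pupil_frame_position(track: TrackResult, stored: Optional[StoredDetection],
                         frame_no: int, *, conf_min: float = 0.5,
                         small_face: float = 80.0,
                         max_age: int = 3) -> Optional[Tuple[float, float]]:
    """Return where to draw the pupil frame, or None to stop displaying it."""
    if track.face_conf < conf_min:                       # S203 failed -> S213
        return None
    if track.face_size <= small_face:                    # S204: small face
        if stored is None or frame_no - stored.frame_no > max_age:
            return None                                  # S206 failed -> S213
        # S207: pupil position relative to the detected face
        rel_x = (stored.pupil_x - stored.face_cx) / stored.face_size
        rel_y = (stored.pupil_y - stored.face_cy) / stored.face_size
        # S208: apply it to the tracked face of the current frame
        return (track.face_cx + rel_x * track.face_size,
                track.face_cy + rel_y * track.face_size)
    if track.pupil_xy is None or track.pupil_conf < conf_min:
        return None                                      # S212 failed -> S213
    return track.pupil_xy                                # S211/S212 -> S209


stored = StoredDetection(frame_no=10, face_cx=100, face_cy=100, face_size=50,
                         pupil_x=110, pupil_y=90)
small = TrackResult(face_conf=0.9, face_cx=200, face_cy=100, face_size=40)
position = pupil_frame_position(small, stored, frame_no=12)
```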

The organ detection results of the face information detection unit 141 and the organ information detection unit 142 that are acquired in S205 are assumed to be updated every frame according to the processing flow shown in FIG. 6(a). Specifically, after the organ detection process for each frame, the detection result information update process of S101 starts. The process then proceeds to S102, where the camera control unit 131 acquires the detection result information from the face information detection unit 141 and the organ information detection unit 142. The process then moves to S103, where it is determined whether both the face and the pupil were detected with a reliability at or above a certain threshold. If so, the process proceeds to S104, and the frame number and the face and pupil detection position information stored in the storage unit 153 are updated with the acquired detection result. If the reliability of the face or the pupil is below the threshold in S103, the update process ends without updating. The organ detection process need not run every frame; it may run every predetermined number of frames or every predetermined period. It may also be triggered as an interrupt by a user instruction, scene determination, or the like. In any of these cases, whenever the organ detection process is performed, the latest detection result is stored in the storage unit 153 and the stored information is updated.
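
The update flow of FIG. 6(a) (S101 to S104) amounts to a guarded overwrite of the stored result. The sketch below is illustrative only: `Detection`, `DetectionStore`, and the 0.5 confidence threshold are hypothetical stand-ins for the detection result and the storage unit 153.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Detection:
    """One result of the high-accuracy face and organ detection."""
    frame_no: int
    face_center: Tuple[float, float]
    face_size: float
    pupil: Tuple[float, float]
    face_conf: float
    pupil_conf: float


class DetectionStore:
    """Keeps only the latest detection whose face and pupil are both reliable."""

    def __init__(self, conf_min: float = 0.5) -> None:
        self.conf_min = conf_min
        self.latest: Optional[Detection] = None

    def update(self, result: Detection) -> bool:
        """S103/S104: overwrite the stored result only if both confidences pass."""
        if result.face_conf >= self.conf_min and result.pupil_conf >= self.conf_min:
            self.latest = result
            return True
        return False  # S103 failed: keep the older stored result


store = DetectionStore()
store.update(Detection(10, (100.0, 100.0), 50.0, (110.0, 90.0), 0.9, 0.8))
store.update(Detection(11, (102.0, 100.0), 50.0, (112.0, 90.0), 0.9, 0.2))  # rejected
```

Because a low-confidence result leaves the stored entry untouched, the stored frame number keeps aging, and the frame-difference check in S206 is what eventually rejects it.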

As described above, the present embodiment improves the real-time performance of the frame display compared with performing pupil AF using organ detection alone, while estimating the eye position from a more accurate detection result than when only a tracking process such as pattern matching is used.

Next, a second embodiment is described. It concerns the pupil position estimation that the calculation unit 144 performs based on the detection results of the face information detection unit 141 (organ information detection unit 142), and it allows the pupil frame to be estimated at an accurate position even when the subject is rotating in the Pitch, Yaw, or Roll direction.

As shown in FIG. 7(a), the method of the first embodiment estimates the pupil position accurately when the subject moves relative to the camera lens 100 without changing the face angle, or rotates slowly in the Pitch, Yaw, or Roll direction. When the subject moves abruptly in any or all of these three rotation directions, however, the deviation between the actual pupil position and the frame position grows, because the estimate relies on the pupil's position relative to the face frame in the previous frame. For example, when the subject's face rotates in the Pitch direction as 711 → 721 → 731, applying the relative pupil position 722 from the previous frame to the current frame places the estimate below the actual pupil, so the frame 732 is shifted downward from the actual pupil position. In this modification, the display frame setting unit 151 therefore corrects the frame from the position of frame 732 to frame 733 in FIG. 7(b), based on the face angle information for the frame before last and the previous frame obtained from the face information detection unit 141. Rotation in the Yaw direction (741 to 763) is corrected in the same way as Pitch. A simple way to correct for Yaw and Pitch rotation is to use the movement distance between the frame before last and the previous frame directly as the correction amount. Because the face is roughly spherical, the amount of pupil movement decreases as, for example, the face turns from facing the camera toward the far side in the Yaw direction. For a more accurate correction, the movement amount as a function of face angle may therefore be held as a lookup chart and referenced. When the face rotates in the Roll direction (771 to 793), directly using the movement distance between the frame before last and the previous frame, as for Yaw and Pitch, is not suitable: the pupil moves along a circle in the image, so a correction that assumes roughly linear movement may worsen the misalignment. As shown in FIG. 7(b), for rotation in the Roll direction the display frame is instead moved from 792 to 793, according to the face angle, along the circle 796 whose center is the face frame center 795 and whose radius is the distance from that center to the pupil.
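
The two kinds of correction described above can be contrasted in a short sketch. It assumes that the "movement distance" for Pitch and Yaw is the pupil's displacement vector between the frame before last and the previous frame, and that a positive Roll angle rotates the pupil from the +x axis toward the +y axis in image coordinates; both are assumptions, and `linear_correction` and `roll_correction` are hypothetical names.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def linear_correction(estimate: Point, pupil_before_last: Point,
                      pupil_previous: Point) -> Point:
    """Pitch/Yaw: shift the estimate by the most recent inter-frame displacement."""
    dx = pupil_previous[0] - pupil_before_last[0]
    dy = pupil_previous[1] - pupil_before_last[1]
    return (estimate[0] + dx, estimate[1] + dy)


def roll_correction(face_center: Point, pupil: Point, delta_deg: float) -> Point:
    """Roll: move the pupil along the circle centered on the face frame center.

    The radius (center-to-pupil distance) is preserved; only the angle changes.
    """
    theta = math.radians(delta_deg)
    rx = pupil[0] - face_center[0]
    ry = pupil[1] - face_center[1]
    return (face_center[0] + rx * math.cos(theta) - ry * math.sin(theta),
            face_center[1] + rx * math.sin(theta) + ry * math.cos(theta))


shifted = linear_correction((50.0, 60.0), (48.0, 70.0), (49.0, 66.0))
rotated = roll_correction((100.0, 100.0), (110.0, 100.0), 90.0)
```

Rotating about the face frame center keeps the pupil at a constant distance from it, which a straight-line extrapolation cannot do for large Roll angles.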

FIG. 8(a) shows the camera 811 and the subject 812 viewed from above. Taking the face angle at which the pupil 813 of the subject 812 directly faces the camera 811 as 0 degrees, FIG. 8(b) shows how the on-screen pupil size changes as the face rotates in the direction of the arrow. From 0 to 45 degrees the pupil 813 approaches the camera and becomes larger relative to the face; as the face turns further away, it becomes smaller. Setting the frame size with this rotation-dependent change in pupil size taken into account yields a frame that better matches the actual pupil. Because a frame that is too small is hard to see, the frame size may be kept from falling to or below a certain threshold, or the display method may be changed, for example by indicating the pupil position with an arrow. The frame size is corrected in the same way for Yaw. For Roll, the pupil size does not change, so this size correction is not performed.
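
One way to apply the angle-dependent size correction is a small lookup chart with linear interpolation and a lower bound on the frame size. The chart values below are invented for illustration (the description gives only the qualitative trend: growth up to about 45 degrees, then shrinkage), and the sketch assumes the trend is symmetric for negative angles. `SIZE_CHART`, `scale_for_angle`, and `frame_size` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical chart: (face angle in degrees, pupil size relative to 0 degrees).
SIZE_CHART: List[Tuple[float, float]] = [(0.0, 1.0), (45.0, 1.1), (90.0, 0.4)]


def scale_for_angle(angle_deg: float, chart: List[Tuple[float, float]]) -> float:
    """Linearly interpolate the size scale for a face angle (assumed symmetric)."""
    a = abs(angle_deg)
    if a <= chart[0][0]:
        return chart[0][1]
    for (a0, s0), (a1, s1) in zip(chart, chart[1:]):
        if a <= a1:
            t = (a - a0) / (a1 - a0)
            return s0 + t * (s1 - s0)
    return chart[-1][1]  # beyond the last entry: hold the last value


def frame_size(base_size: float, angle_deg: float, min_size: float = 10.0) -> float:
    """Scale the pupil frame with face angle, but never below min_size."""
    return max(min_size, base_size * scale_for_angle(angle_deg, SIZE_CHART))


front = frame_size(20.0, 0.0)
turned = frame_size(20.0, 90.0)  # 20 * 0.4 = 8 would be too small, so it is clamped
```

Clamping is only one of the options the description mentions; switching to an arrow indicator below the threshold would replace the `max` call with a branch.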

As described above, when the subject is rotating in the Pitch, Yaw, or Roll direction, the pupil frame can be displayed at a more accurate position than in the first embodiment, although the processing load increases.

Although the present invention has been described in detail based on its preferred embodiments, the invention is not limited to these specific embodiments, and various forms that do not depart from the gist of the invention are also included in it. Parts of the above embodiments may be combined as appropriate. The invention also covers the case where a software program implementing the functions of the above embodiments is supplied, directly from a recording medium or via wired or wireless communication, to a system or apparatus having a computer capable of executing the program, and the program is executed. Accordingly, the program code itself that is supplied to and installed on a computer in order to implement the functional processing of the invention on that computer also realizes the invention. In other words, the computer program itself for implementing the functional processing of the invention is included in the invention. In that case, the program may take any form as long as it has the functions of the program, such as object code, a program executed by an interpreter, or script data supplied to an OS. The recording medium for supplying the program may be, for example, a magnetic recording medium such as a hard disk or magnetic tape, an optical or magneto-optical storage medium, or a non-volatile semiconductor memory. As a method of supplying the program, the computer program constituting the invention may be stored on a server on a computer network, and a connected client computer may download the computer program.

122 Image sensor
124 Camera signal processing circuit
131 Camera control unit
141 Face information detection unit
142 Organ information detection unit
143 Tracking unit
144 Calculation unit
150 Automatic pupil selection unit
151 Display frame setting unit
152 Pupil designation / designation cancellation unit
153 Designated pupil storage unit
171 Display unit
172 Touch panel

Claims

An image processing apparatus comprising:
display control means for displaying a captured image and an icon on display means;
detection means for detecting a face and eyes by detecting organs from a captured image;
tracking means for detecting and tracking, in a second captured image, a face region set in a first captured image, using a detection method simpler than the detection by the detection means; and
determination means for determining an estimated eye position based on the position and size of the face and eye position information,
wherein the determination means determines the estimated eye position by applying the relative positional relationship of the eyes to the face to the face position and face size of the face region detected by the tracking means in the second captured image, and
the display control means superimposes an icon indicating the determined estimated eye position on the second captured image displayed on the display means.

The image processing apparatus according to claim 1, wherein the display frame based on the estimated position is displayed when the face size of the subject obtained from the tracking means is determined to be at or below a predetermined threshold.

The image processing apparatus according to claim 1 or 2, wherein the detection result information of the detection means used by the calculation means is information from a predetermined number of frames, or a predetermined time, before the current frame.

The image processing apparatus according to any one of claims 1 to 3, wherein the detection result information of the detection means used by the calculation means is information from within a predetermined time of the current frame, and the time is set shorter when the movement amount of the subject is large and longer when the movement amount is small.

The image processing apparatus according to any one of claims 1 to 4, wherein the relative position of the eye is calculated by expressing the horizontal and vertical distances from the center point of the face to the center point of the eye as amounts relative to the face size.

The image processing apparatus according to any one of claims 1 to 5, wherein the calculation means corrects the eye position calculated from the result of the detection means using the rotation amount of the subject or the movement amount of the organ between the most recent frames.

The image processing apparatus according to any one of claims 1 to 6, wherein the determination means calculates the relative position of the eye with respect to the face based on at least one of the eye position, the face position, and the face size information detected by the detection means.

The image processing apparatus according to any one of claims 1 to 7, further comprising the image pickup means.

An image processing method comprising:
a detection step of detecting, with detection means, a face and eyes by detecting organs from a captured image;
a tracking step of detecting and tracking, in a second captured image, a face region set in a first captured image, using a detection method simpler than the detection by the detection means; and
a determination step of determining an estimated eye position based on the position and size of the face and eye position information,
wherein in the determination step, the estimated eye position is determined by applying the relative positional relationship of the eyes to the face to the face position and face size of the face region detected by the tracking means in the second captured image, and
in the display control step, an icon indicating the determined estimated eye position is superimposed on the second captured image displayed on the display means.