JP2010225118A

JP2010225118A - Image processing device, method and program

Info

Publication number: JP2010225118A
Application number: JP2009074716A
Authority: JP
Inventors: Masashi Nishiyama; 正志西山; Mayumi Yuasa; 真由美湯浅; Osamu Yamaguchi; 修山口
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-03-25
Filing date: 2009-03-25
Publication date: 2010-10-07
Anticipated expiration: 2029-03-25
Also published as: JP5087037B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing technique capable of correctly detecting spoofing in face recognition.

SOLUTION: An acquisition unit 51 acquires captured images of an authentication target in chronological order and extracts adjacent frames. A setting unit 52 sets a face region and a background region for each of the adjacent frames. An association unit 53 associates pixels between the adjacent frames in each of the face region and the background region. A calculation unit 54 calculates a motion feature for each of the face region and the background region. A determination unit 55 compares the motion feature of the face region with that of the background region and determines whether the authentication target is a photograph or a person.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an image processing apparatus, method, and program.

Conventionally, there are face recognition devices that, for example when granting a user permission to use a service, photograph the user's face with a camera and authenticate it. During this authentication, a person may try to deceive the face recognition device by presenting a photograph of a face to the camera instead of a live human face. Such an act is called spoofing, and it is a serious problem from a security standpoint. To prevent it, the device must determine whether the face captured by the camera belongs to a live human or to a photograph. This determination is called photo detection. One approach to photo detection uses facial motion: a human face moves during authentication, whereas a face in a photograph does not. For example, the technique of Patent Document 1 obtains facial motion by searching the eye and mouth regions along the time axis. In a photograph the eyes and mouth do not move as time passes, whereas a human's eyes and mouth do. To detect such human motion, the technique of Patent Document 1 takes the inter-frame difference in each region and judges the target to be a motionless photograph when the sum of the difference values is small.

JP 2006-330936 A

However, the technique of Patent Document 1 uses only local regions of the face for photo detection, which can cause the following problem. The accuracy of photo detection depends strongly on the performance of the technique used to detect the local facial regions themselves, such as the eyes and mouth. Consequently, if the detected eye or mouth position drifts, a face in a photograph may be judged to be moving, and spoofing may not be detected correctly.

The present invention has been made in view of the above, and its object is to provide an image processing apparatus, method, and program capable of correctly detecting spoofing in face recognition.

To solve the above problem and achieve the object, one aspect of the present invention is an image processing apparatus comprising: a setting unit that sets, for each of two temporally adjacent images among acquired images, a face region including all or part of a face and a background region including background; an association unit that, in each of the face region and the background region, associates pixels between the two images and estimates the motion of each pixel; a calculation unit that, in each of the face region and the background region, calculates a motion feature representing the statistical distribution of the pixel motions; and a determination unit that compares the motion feature of the face region with the motion feature of the background region and determines whether what is captured in the images is a photograph or a human.

According to the present invention, spoofing can be correctly detected in face recognition.

FIG. 1 is a diagram illustrating the functional configuration of the image processing apparatus 50 according to the first embodiment.
FIG. 2 is a diagram illustrating a face region and background regions according to the first embodiment.
FIG. 3 is a diagram conceptually illustrating the optical flow calculation method according to the first embodiment.
FIG. 4 is a diagram illustrating per-pixel optical flows in the face region and in the background regions according to the first embodiment.
FIG. 5 is a diagram illustrating per-pixel optical flows in the face region and in the background regions according to the first embodiment.
FIG. 6 is a diagram illustrating the directions into which optical flows are quantized according to the first embodiment.
FIG. 7 is a diagram illustrating a direction histogram according to the first embodiment.
FIG. 8 is a flowchart showing the photo detection procedure performed by the image processing apparatus 50 according to the first embodiment.
FIG. 9 is a diagram illustrating the functional configuration of the image processing apparatus 50 according to the second embodiment.
FIG. 10 is a flowchart showing the photo detection procedure performed by the image processing apparatus 50 according to the second embodiment.
FIG. 11 is a diagram illustrating the functional configuration of the image processing apparatus 50 according to the third embodiment.
FIG. 12 is a flowchart showing the photo detection procedure performed by the image processing apparatus 50 according to the third embodiment.
The remaining three figures each illustrate a face region and background regions according to a modification.

[First Embodiment]
Embodiments of the image processing apparatus according to the present invention are described in detail below with reference to the accompanying drawings. The hardware configuration of the image processing apparatus is described first. The image processing apparatus of the present embodiment includes a control unit such as a CPU (Central Processing Unit) that controls the entire apparatus; a storage unit such as a ROM (Read Only Memory) and a RAM (Random Access Memory) that stores various data and programs; an external storage unit such as an HDD (Hard Disk Drive) or a CD (Compact Disk) drive device that stores various data and programs; and a bus that connects them. This is the hardware configuration of an ordinary computer. In addition, an image input unit composed of a CCD (Charge Coupled Device) image sensor camera or the like that photographs the authentication target and inputs the image, an operation input unit such as a keyboard and mouse that accepts instructions from the user, and a communication I/F (interface) that controls communication with external devices are each connected to the image processing apparatus by wire or wirelessly.

Next, the functions realized in this hardware configuration when the CPU of the image processing apparatus executes the programs stored in the storage unit or the external storage unit are described. FIG. 1 illustrates the functional configuration of the image processing apparatus 50. The image processing apparatus 50 includes an acquisition unit 51, a setting unit 52, an association unit 53, a calculation unit 54, and a determination unit 55. Each of these units is created in a storage unit such as the RAM when the CPU executes a program.

The acquisition unit 51 acquires, frame by frame, images of the authentication target captured in time series by the image input unit, and extracts temporally adjacent frames (referred to as adjacent frames). Specifically, the adjacent frames are a first frame acquired at time t and a second frame acquired at time t−1.
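As a minimal sketch of the adjacent-frame extraction described above, the following Python generator pairs each frame with its predecessor. The function name adjacent_frames and the representation of a frame as an arbitrary Python object are assumptions made for illustration; the source does not prescribe any implementation.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def adjacent_frames(frames: Iterable[Frame]) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (first, second) pairs: first is the frame at time t, second the frame at t-1.

    Hypothetical helper; frames may be any objects delivered in time order.
    """
    previous = None
    have_previous = False
    for current in frames:
        if have_previous:
            # The newer frame plays the role of the "first frame" in the text.
            yield current, previous
        previous = current
        have_previous = True
```

For a stream of three frames f0, f1, f2, this yields the pairs (f1, f0) and (f2, f1).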

The assumptions underlying the present embodiment are as follows. When an image of a photograph is acquired as the image of the authentication target, the photograph is assumed to contain both a face and background rather than being cropped to the face alone. When the authentication target is a human, the human is assumed to change face orientation from time to time. Comparing background motion with face motion under these assumptions, the background and the face move in the same direction if the authentication target is a photograph, whereas they move in different directions if it is a human. The present embodiment uses this difference in relative motion for photo detection, as described in detail below.

Returning to FIG. 1, the setting unit 52 performs face detection on the first frame extracted by the acquisition unit 51 and sets a face region. The setting unit 52 then sets background regions from the part of the first frame outside the face region. FIG. 2 illustrates a face region and background regions. As shown in the figure, a region containing all or part of the face is set as the face region, and regions near the face are set as background regions. Here, multiple background regions are set, in the areas adjacent to the face region on its left and right. The setting unit 52 likewise detects the face in the second frame extracted by the acquisition unit 51 and sets a face region and background regions for it. To reduce computation time, the face region and background regions of the second frame may simply be made identical to those set for the first frame. Hereinafter, when the face region and background regions need not be distinguished, they are simply called regions for convenience.
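The placement of background regions to the left and right of the face region might be sketched as follows. This assumes the face region is an axis-aligned rectangle already supplied by some face detector, which is not shown; the Rect type, the function name background_regions, and the bg_w width parameter are hypothetical and do not come from the source.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def background_regions(face: Rect, image_w: int, bg_w: int) -> List[Rect]:
    """Return background rectangles directly left and right of the face rectangle.

    Each rectangle has the face's vertical extent and is clipped to the image width.
    """
    regions = []
    # Left neighbor: extends up to bg_w pixels leftwards from the face's left edge.
    left_x = max(0, face.x - bg_w)
    if face.x - left_x > 0:
        regions.append(Rect(left_x, face.y, face.x - left_x, face.h))
    # Right neighbor: starts at the face's right edge.
    right_x = face.x + face.w
    right_w = min(bg_w, image_w - right_x)
    if right_w > 0:
        regions.append(Rect(right_x, face.y, right_w, face.h))
    return regions
```

Reusing the same rectangles for the second frame corresponds to the computation-saving option mentioned in the text.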

The association unit 53 associates pixels between the face region set by the setting unit 52 for the first frame and the face region set for the second frame, and likewise associates pixels between the background region set for the first frame and the background region set for the second frame. The association uses an optical flow calculation method. FIG. 3 conceptually illustrates the optical flow calculation. An optical flow is a two-dimensional vector representing how far and in which direction a point at time t has moved by time t+Δt, that is, Δt later. Treating such moving points in the image as pixels, the unit determines, for each pixel in the frame at time t, which pixel it corresponds to in the frame at time t+Δt. In this way the association unit 53 estimates the motion of each pixel and calculates, for each pixel, an optical flow expressing the correspondence. Various optical flow calculation methods are known, such as correlation-based matching (block matching), gradient-based methods, and methods using feature point tracking; any of them may be applied in the present embodiment. The association unit 53 outputs the per-pixel optical flows calculated in the face region and those calculated in the background region as the results of the association. When multiple background regions are set, the association unit 53 calculates and outputs optical flows for each of them.
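Of the methods named above, block matching is the simplest to illustrate. The sketch below estimates the optical flow of one pixel by minimizing the sum of absolute differences (SAD) between small blocks. The grayscale list-of-lists image format, the function names, and the half and search parameters are assumptions for illustration only; the source leaves the method open.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of intensities (assumed format)


def block_sad(a: Image, b: Image, ax: int, ay: int, bx: int, by: int, half: int) -> int:
    """Sum of absolute differences between the block centered at (ax, ay) in a
    and the block centered at (bx, by) in b."""
    total = 0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            total += abs(a[ay + j][ax + i] - b[by + j][bx + i])
    return total


def flow_at(earlier: Image, later: Image, x: int, y: int,
            half: int = 1, search: int = 2) -> Tuple[int, int]:
    """Estimate the flow (dx, dy) of pixel (x, y) from `earlier` to `later`.

    (x, y) must lie at least `half` pixels inside the image border.
    """
    h, w = len(earlier), len(earlier[0])
    best = (0, 0)
    best_cost = block_sad(earlier, later, x, y, x, y, half)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            # Skip candidates whose block would leave the image.
            if not (half <= nx < w - half and half <= ny < h - half):
                continue
            cost = block_sad(earlier, later, x, y, nx, ny, half)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

Calling flow_at for every pixel of a region gives the per-pixel optical flows of that region. An exhaustive search like this is slow; it is meant only to show the principle.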

FIGS. 4 and 5 illustrate per-pixel optical flows in the face region and in the background regions. FIG. 4 shows the optical flows when the authentication target is a human, and FIG. 5 shows them when it is a photograph. To keep the drawings uncluttered, the optical flows in each region are shown in simplified form. As shown in each figure, a point appearing in the first frame and the point at which it appears in the second frame after moving are joined and represented as a vector.

The calculation unit 54 calculates the motion feature of the face region using the per-pixel optical flows of the face region output by the association unit 53, and calculates the motion feature of the background region using the per-pixel optical flows of the background region output by the association unit 53. A motion feature represents the statistical distribution of the motion within a region. Here, the distribution of motion is expressed as a direction histogram. Because the direction of an optical flow is a continuous value, the calculation unit 54 discretizes it by quantizing the direction into, for example, eight directions as shown in FIG. 6. The quantization is given, for example, by Equation 1.
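One possible way to quantize a flow direction into eight bins is sketched below. It is not a reconstruction of Equation 1, which does not appear in this text. The bin layout (bin 0 centered on the positive x axis, numbered in order of increasing angle) and the choice to return None for a zero vector are assumptions.

```python
import math
from typing import Optional


def quantize_direction(dx: float, dy: float, bins: int = 8) -> Optional[int]:
    """Map a flow vector to one of `bins` direction indices.

    Bin 0 is centered on the +x axis; indices increase with the angle
    returned by atan2. A zero vector has no direction, so None is returned.
    """
    if dx == 0 and dy == 0:
        return None
    angle = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
    width = 2 * math.pi / bins
    # Shift by half a bin so that each bin is centered on its direction.
    return int((angle + width / 2) // width) % bins
```

With eight bins, each bin covers 45 degrees, so a flow pointing slightly off an axis still falls into that axis's bin.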

The calculation unit 54 then counts the frequency of each quantized direction and creates a histogram with eight bins, one per direction, as shown for example in FIG. 7. This histogram is the direction histogram, and it serves as the motion feature. The histogram is normalized so that its bins sum to 1. The calculation unit 54 calculates such a motion feature for each of the face region and the background region. When multiple background regions are set, the calculation unit 54 calculates a motion feature for each of them.
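The normalized direction histogram can be sketched as follows, taking already-quantized direction indices as input. The function name direction_histogram and the all-zero result for an empty input are assumptions; the source does not say how a region without any flow should be handled.

```python
from typing import Iterable, List


def direction_histogram(bin_indices: Iterable[int], bins: int = 8) -> List[float]:
    """Count quantized directions and normalize the counts so they sum to 1."""
    counts = [0] * bins
    for index in bin_indices:
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        # Assumption: a region with no flows yields an all-zero histogram.
        return [0.0] * bins
    return [count / total for count in counts]
```

For the indices 0, 0, 2, 7 this gives 0.5 in bin 0, 0.25 in bins 2 and 7, and 0 elsewhere.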

The determination unit 55 calculates the similarity between the motion feature of the face region and the motion feature of the background region calculated by the calculation unit 54, and uses that similarity to determine whether the authentication target is a photograph. The similarity between motion features may be measured by, for example, the histogram intersection, the correlation, or the Bhattacharyya coefficient. If the authentication target is a photograph, the background region and the face region have the same motion feature, as shown in FIG. 5, so the similarity is high. Conversely, if the authentication target is a human, the background region and the face region have different motion features, as shown in FIG. 4, so the similarity is low. The determination unit 55 therefore determines whether the authentication target is a photograph by determining whether the calculated similarity is equal to or greater than a threshold. The threshold is, for example, set in advance and stored in the external storage unit such as the HDD. When multiple background regions are set, the determination unit 55 may sum their motion features and calculate the similarity between the sum and the motion feature of the face region, or it may calculate the similarity between the motion feature of each background region and that of the face region.
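Two of the similarity measures named above, and the threshold decision, can be sketched as follows for histograms normalized to sum to 1. The function names and the idea of passing the measure as a parameter are assumptions for illustration; the threshold value would have to be chosen for the actual system.

```python
import math
from typing import Callable, Sequence


def histogram_intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of bin-wise minima; 1 for identical normalized histograms, 0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of sqrt(p_i * q_i); 1 for identical normalized histograms, 0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def is_photograph(face_hist: Sequence[float], background_hist: Sequence[float],
                  threshold: float,
                  similarity: Callable[[Sequence[float], Sequence[float]], float]
                  = histogram_intersection) -> bool:
    """Judge the target a photograph when face and background move alike."""
    return similarity(face_hist, background_hist) >= threshold
```

A face and background that both move in direction bins 0 and 1 give similarity 1 and are judged a photograph at any threshold up to 1, while histograms with no overlapping bins give similarity 0.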

Next, the photo detection procedure performed by the image processing apparatus 50 according to the present embodiment is described with reference to FIG. 8. The image processing apparatus 50 first acquires, frame by frame, images of the authentication target captured in time series by the image input unit, and extracts the first frame and the second frame, which are adjacent frames (step S1). Next, the image processing apparatus 50 performs face detection on the first frame extracted in step S1, sets a face region as shown for example in FIG. 2 (step S2), and sets background regions from the part of the first frame outside the face region (step S3). The image processing apparatus 50 also sets a face region (step S4) and background regions (step S5) for the second frame extracted in step S1.

Next, the image processing apparatus 50 associates pixels between the face region set for the first frame in step S2 and the face region set for the second frame in step S4, and outputs the result of the association (step S6). The image processing apparatus 50 also associates pixels between the background region set for the first frame in step S3 and the background region set for the second frame in step S5, and outputs the result of the association (step S7). The image processing apparatus 50 then calculates the motion feature of the face region using the per-pixel optical flows of the face region output in step S6 (step S8), and calculates the motion feature of the background region using the per-pixel optical flows of the background region output in step S7 (step S9).

The image processing apparatus 50 then calculates the similarity between the motion feature of the face region calculated in step S8 and the motion feature of the background region calculated in step S9, and determines whether the authentication target is a photograph by determining whether the similarity is equal to or greater than the threshold (step S10).

As described above, the apparatus obtains not only the motion of the face region but also the motion of the background region, and performs photo detection from the relative difference between the background motion and the face motion. As a result, photo detection is correct and stable; that is, spoofing in face recognition can be detected correctly.

Moreover, unlike the conventional technique, there is no need to detect local parts of the face such as the eyes and mouth, and no need to correct for their three-dimensional appearance according to their shape. The appearance of the eyes and mouth as seen from the camera changes as the face orientation changes, but the appearance of a flat photograph also changes when its orientation changes in three-dimensional space. Therefore, when photo detection relies on detecting the eyes and mouth, accurate detection is difficult unless a three-dimensional correction suited to each shape is applied to remove the effect of orientation. The configuration of the present embodiment achieves accurate photo detection without any such three-dimensional correction.

Furthermore, when photo detection relies on detecting the eyes and mouth, it is difficult to tell how the face moved, and a large movement does not necessarily produce a large sum of inter-frame differences. Consequently, if a small sum of inter-frame differences is taken to indicate a photograph, as described in the background section, a human may be misjudged as a photograph. In contrast, performing photo detection from the relative difference between background motion and face motion, as in the present embodiment, enables accurate photo detection.

[Second Embodiment]
Next, a second embodiment of the image processing apparatus, method, and program is described. Parts common to the first embodiment are described using the same reference numerals, or their description is omitted.

In the present embodiment, outliers are removed from the optical flows described in the first embodiment so that photo detection becomes more accurate and stable. FIG. 9 illustrates the functional configuration of the image processing apparatus 50 according to the present embodiment. This configuration differs from that of the first embodiment in that the image processing apparatus 50 further includes a removal unit 56.

The removal unit 56 calculates the variance of the optical flow magnitudes from the per-pixel optical flows of the face region output by the association unit 53, and removes optical flows whose magnitude is equal to or greater than a fixed multiple of the variance (referred to as outliers). Similarly, the removal unit 56 calculates the variance of the optical flow magnitudes from the per-pixel optical flows of the background region output by the association unit 53, and removes the outliers, that is, optical flows whose magnitude is equal to or greater than a fixed multiple of the variance. When multiple background regions are set, the removal unit 56 removes outliers from the optical flows of each background region.
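A literal sketch of this outlier rule follows: flows whose magnitude is at least factor times the variance of the magnitudes are dropped. The function name, the factor parameter, and the decision to keep every flow when the variance is zero are assumptions. Because a variance has squared units, the factor depends on the scale of the flows and would need tuning; this sketch implements only the rule as worded in the text.

```python
import math
from statistics import pvariance
from typing import List, Tuple

Flow = Tuple[float, float]  # (dx, dy)


def remove_outliers(flows: List[Flow], factor: float) -> List[Flow]:
    """Drop flows whose magnitude is >= factor * variance of all magnitudes."""
    if not flows:
        return []
    magnitudes = [math.hypot(dx, dy) for dx, dy in flows]
    limit = factor * pvariance(magnitudes)
    if limit == 0:
        # Assumption: with zero variance there is no outlier, so keep everything.
        return list(flows)
    return [flow for flow, m in zip(flows, magnitudes) if m < limit]
```

For magnitudes 1, 1, 1, and 10 the population variance is 15.1875, so a factor of 0.5 gives a limit of about 7.59 and only the flow of magnitude 10 is removed.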

The calculation unit 54 calculates the motion feature of the face region using the per-pixel optical flows of the face region output by the association unit 53 from which the removal unit 56 has removed the outliers, and calculates the motion feature of the background region using the per-pixel optical flows of the background region output by the association unit 53 from which the removal unit 56 has removed the outliers. The rest of the configuration is the same as in the first embodiment.

Next, the photo detection procedure performed by the image processing apparatus 50 according to the present embodiment is described with reference to FIG. 10. Steps S1 to S7 are the same as in the first embodiment. After step S7, the image processing apparatus 50 removes outliers from the per-pixel optical flows of the face region output in step S6 (step S20), and removes outliers from the per-pixel optical flows of the background region output in step S7 (step S21). Then, in step S8, the image processing apparatus 50 calculates the motion feature of the face region using the optical flows from which the outliers were removed in step S20. In step S9, it calculates the motion feature of the background region using the optical flows from which the outliers were removed in step S21. Step S10 is the same as in the first embodiment.

As described above, removing outliers from the optical flows eliminates the influence of pixel association errors, so photo detection becomes more accurate and stable.

[Third Embodiment]
Next, a third embodiment of the image processing apparatus, method, and program is described. Parts common to the first or second embodiment are described using the same reference numerals, or their description is omitted.

In the present embodiment, the user is instructed to move the face deliberately so that the difference in motion between a photograph and a human becomes clear. An output unit for giving this instruction is connected to the image processing apparatus 50. The output unit generates a stimulus that the user can perceive; examples include a display unit that displays information, an audio output unit that outputs sound, and a light generation unit that emits light.

FIG. 11 illustrates the functional configuration of the image processing apparatus 50 according to the present embodiment. This configuration differs from that of the first embodiment in that the image processing apparatus 50 further includes an instruction unit 57. The instruction unit 57 causes the output unit to generate a stimulus that prompts the user whose face is being photographed by the image input unit to move the face. For example, when the output unit is a display unit, the instruction unit 57 outputs to the display unit an image showing a message such as "Please move your face". When the output unit is a light generation unit, the instruction unit 57 causes the light generation unit to emit light. These visual stimuli prompt the user to move the face. When the output unit is an audio output unit, the instruction unit 57 outputs to the audio output unit a voice message such as "Please move your face" or an alarm sound. Such an auditory stimulus likewise prompts the user to move the face. In this way the instruction unit 57 uses the display of an image, the output of sound, or the emission of light to instruct the user to move the face. The acquisition unit 51 acquires, frame by frame, images of the authentication target, which is expected to be moving in response to the instruction, and extracts adjacent frames. The rest of the configuration is the same as in the first embodiment.

Next, the procedure of the photo detection process performed by the image processing apparatus 50 according to the present embodiment is described with reference to FIG. 12. First, in step S30, the image processing apparatus 50 instructs the user to move the face by generating, via the output unit, a stimulus that the user can perceive. The subsequent steps S1 to S10 are the same as in the first embodiment.

As described above, instructing the user to move the face makes the difference in motion between a photograph and a live person clearer, so that photo detection can be performed more accurately and stably.

The present invention is not limited to the embodiments exactly as described above; at the implementation stage, the constituent elements may be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from the full set shown in an embodiment. Constituent elements of different embodiments may also be combined as appropriate. In addition, various modifications such as those exemplified below are possible.

In each of the embodiments described above, the various programs executed by the image processing apparatus 50 may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The programs may also be provided recorded, as files in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk).

In the photo detection process of each embodiment described above, steps S2 to S5 are performed in that order, but the order is not limited to this. Likewise, the order of steps S6 to S7 and the order of steps S8 to S9 are not limited to those described.

The face area and background area set in each embodiment described above are not limited to those shown in FIG. 2. FIGS. 13 to 15 illustrate examples of the face area and the background area. As these figures show, there may be one background area or several, and the face area and the background area need not be rectangular but may have any shape.

In each embodiment described above, the directions in the direction histogram are quantized into eight directions, but the number of directions is not limited to eight. A state with no motion, that is, a state with no direction, may also be handled.
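As an illustration of this modification, the following is a minimal sketch of a direction histogram with a configurable number of directions and an optional extra bin for the no-motion state. The function name `direction_histogram`, the parameters `n_dirs` and `still_eps`, the centering of bin 0 on the positive x direction, and the normalization to unit sum are all assumptions made for the example and are not taken from the embodiments.

```python
import math

def direction_histogram(flows, n_dirs=8, still_eps=None):
    """Build a normalized histogram of motion directions.

    flows     -- iterable of (dx, dy) per-pixel motion vectors
    n_dirs    -- number of quantized directions (eight in the embodiments)
    still_eps -- hypothetical parameter: if given, vectors whose magnitude
                 is <= still_eps are counted in an extra, last bin that
                 represents "no motion"; if None, every vector is binned
                 by direction (a zero vector then falls into bin 0)
    """
    n_bins = n_dirs + (1 if still_eps is not None else 0)
    hist = [0.0] * n_bins
    width = 2 * math.pi / n_dirs  # angular width of one direction bin
    for dx, dy in flows:
        if still_eps is not None and math.hypot(dx, dy) <= still_eps:
            hist[n_dirs] += 1  # extra bin: no motion, hence no direction
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Assumption: bin 0 is centered on the +x direction.
        idx = int((angle + width / 2) // width) % n_dirs
        hist[idx] += 1
    total = sum(hist)
    # Assumption: normalize so that areas of different size are comparable.
    return [h / total for h in hist] if total else hist
```

For example, `direction_histogram(flows, n_dirs=8, still_eps=0.5)` returns nine bins, the last of which holds the share of pixels treated as stationary.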

DESCRIPTION OF SYMBOLS
50 Image processing apparatus
51 Acquisition unit
52 Setting unit
53 Association unit
54 Calculation unit
55 Determination unit
56 Removal unit
57 Instruction unit

Claims

An image processing apparatus comprising:
a setting unit that sets, for each of two temporally adjacent images among acquired images, a face area including at least part of a face and a background area including a background;
an association unit that, in each of the face area and the background area, associates pixels between the two images and estimates the motion of each pixel;
a calculation unit that, in each of the face area and the background area, calculates a motion feature representing the statistical distribution of the motion of the pixels; and
a determination unit that compares the motion feature of the face area with the motion feature of the background area and determines whether what is captured in the images is a photograph or a person.

The image processing apparatus according to claim 1, further comprising a removal unit that removes outliers from the motion of the pixels.

The image processing apparatus according to claim 2, wherein
the association unit associates the pixels between the two images, in each of the face area and the background area, by calculating optical flow, and
the removal unit obtains a variance from the magnitudes of the optical flow of the pixels and removes, as outliers, any optical flow larger than a constant multiple of that variance.
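The outlier removal recited in this claim can be sketched roughly as follows. This is a literal reading only: the function name `remove_outliers`, the constant `k`, the use of the population variance, the direct comparison of each flow magnitude against `k` times the variance, and the guard for zero variance are all assumptions made for the example; the claim text does not fix these details.

```python
import math

def remove_outliers(flows, k):
    """Drop flow vectors whose magnitude exceeds k times the variance.

    flows -- list of (dx, dy) optical-flow vectors for one area
    k     -- hypothetical constant multiplier (not specified in the claim)
    """
    mags = [math.hypot(dx, dy) for dx, dy in flows]
    if not mags:
        return []
    mean = sum(mags) / len(mags)
    # Assumption: population variance of the flow magnitudes.
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    if var == 0:
        # Assumption: identical magnitudes contain no outliers.
        return list(flows)
    limit = k * var
    return [f for f, m in zip(flows, mags) if m <= limit]
```

With four unit-length vectors and one vector of length 10, the variance of the magnitudes is about 12.96, so `k = 0.5` discards only the long vector.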

The image processing apparatus according to any one of claims 1 to 3, wherein
the association unit associates the pixels between the two images, in each of the face area and the background area, by calculating optical flow, and
the calculation unit calculates the motion feature, in each of the face area and the background area, by discretizing the directions of the optical flow and creating a direction histogram.

The image processing apparatus according to claim 4, wherein the determination unit obtains a similarity between the direction histogram calculated for the face area and the direction histogram calculated for the background area, determines that what is captured in the images is a photograph when the similarity is equal to or greater than a threshold, and determines that what is captured in the images is a person when the similarity is less than the threshold.
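The threshold decision in this claim can be sketched as below. The claim does not name a similarity measure here, so histogram intersection is used purely as an assumed stand-in; the function names `histogram_intersection` and `classify` and the threshold value are hypothetical. Both histograms are assumed to be normalized to unit sum and to have the same number of bins.

```python
def histogram_intersection(h_face, h_bg):
    """Similarity of two normalized histograms (1.0 means identical)."""
    if len(h_face) != len(h_bg):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h_face, h_bg))

def classify(h_face, h_bg, threshold):
    """Return 'photograph' if similarity >= threshold, else 'person'."""
    similarity = histogram_intersection(h_face, h_bg)
    return "photograph" if similarity >= threshold else "person"
```

For instance, identical face and background histograms give a similarity of 1.0 and are classified as a photograph for any threshold up to 1.0, while histograms with no overlapping bins give 0.0 and are classified as a person.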

The image processing apparatus according to any one of claims 1 to 5, further comprising an instruction unit that generates a stimulus perceivable by a user, wherein
the acquisition unit acquires the images after the stimulus has been generated.

The image processing apparatus according to any one of claims 1 to 6, further comprising an image input unit that inputs captured images.

An image processing method, wherein:
a setting unit sets, for each of two temporally adjacent images among acquired images, a face area including all or part of a face and a background area including a background;
an association unit, in each of the face area and the background area, associates pixels between the two images and estimates the motion of each pixel;
a calculation unit, in each of the face area and the background area, calculates a motion feature representing the statistical distribution of the motion of the pixels; and
a determination unit compares the motion feature of the face area with the motion feature of the background area and determines whether what is captured in the images is a photograph or a person.

A program for causing a computer to function as:
a setting unit that sets, for each of two temporally adjacent images among acquired images, a face area including at least part of a face and a background area including a background;
an association unit that, in each of the face area and the background area, associates pixels between the two images and estimates the motion of each pixel;
a calculation unit that, in each of the face area and the background area, calculates a motion feature representing the statistical distribution of the motion of the pixels; and
a determination unit that compares the motion feature of the face area with the motion feature of the background area and determines whether what is captured in the images is a photograph or a person.