JP2008009728A

JP2008009728A - Expression recognition method and expression recognition device

Info

Publication number: JP2008009728A
Application number: JP2006179745A
Authority: JP
Inventors: Hiroki Yamauchi (山内 寛紀); Takuji Miki (三木 拓司)
Original assignee: Ritsumeikan Trust
Current assignee: Ritsumeikan Trust
Priority date: 2006-06-29
Filing date: 2006-06-29
Publication date: 2008-01-17

Abstract

PROBLEM TO BE SOLVED: To provide a facial expression recognition method and a facial expression recognition device that suppress the influence of individual differences.

SOLUTION: A method for recognizing the facial expression in a target face image includes: a feature extraction step of extracting a plurality of pieces of expression feature information from the target face image by subjecting it to frequency analysis with a plurality of spatial filters that differ in region characteristics and direction characteristics; a difference calculation step of obtaining a plurality of pieces of difference information indicating the differences between the pieces of expression feature information and a plurality of pieces of reference feature information obtained by applying the same spatial filters to a reference face image; and a determination step of determining the expression on the basis of the difference information.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a facial expression recognition method and a facial expression recognition device.

To achieve a high degree of cooperation between humans and computers, computer systems such as robots need the ability to recognize human facial expressions. By recognizing a person's facial expression, a computer can infer that person's emotion and respond to it appropriately.

A conventional facial expression recognition method extracts features from a target face image and determines the expression from those features.
With such methods, extracting expression features from a still image is difficult. When features are extracted from a time-series image (video), however, expression-related features are easier to obtain by tracking how facial feature points move as the expression changes. The problem is that this feature-point movement depends on each person's facial structure, so it is strongly affected by individual differences.

Patent Document 1 describes a facial expression detection device that applies a wavelet transform to a video signal of a person's face to generate a spatial-frequency-domain signal for each predetermined band, calculates the average power of that signal in each band, and calculates the difference between this average power and the corresponding average power obtained from the same person's face when it is expressionless.
The wavelet transform of Patent Document 1 is applied to a specific detection region in the image and is implemented by a band-division filter (subband filter) that successively splits the whole detection region into high-frequency and low-frequency components.

In Patent Document 1, a spatial-frequency-domain signal is generated for each predetermined band by wavelet-transforming a video signal of a person's face. However, this wavelet transform is merely a band-division filter that successively splits a specific detection region into high and low frequencies; it does not analyze the frequency content of the target face image with a plurality of spatial filters that differ in region and direction characteristics, as a bank of Gabor filters does.
In other words, Patent Document 1 only separates the high and low frequencies of a specific part of the image and offers no freedom in the region and direction characteristics used to examine the frequency content, so it cannot sufficiently suppress the influence of individual differences.
In particular, because Patent Document 1 tries to extract expression features simply by separating high and low frequencies, the parts of the face where expressions readily appear, such as the eyebrows, eyes, and mouth, must be specified in advance as detection regions, which makes the processing cumbersome.

Moreover, Patent Document 1 calculates the average power of the frequency signal after the wavelet transform and then takes the difference from the corresponding average power obtained from the expressionless face; it does not contemplate taking the difference of the feature information produced by the wavelet transform itself.

Patent Document 2 describes a race estimation device that extracts feature quantities by applying a Gabor wavelet transform to feature points of a subject's face.
The device of Patent Document 2 estimates race rather than recognizing facial expressions, but it does use a Gabor wavelet transform. Although it extracts features with the Gabor wavelet transform, it estimates race directly from those features; it does not contemplate computing the difference between them and the features obtained by applying the Gabor wavelet transform to a reference image.
Patent Document 1: JP-A-8-249447
Patent Document 2: JP 2005-266981 A

An object of the present invention is to provide a facial expression recognition technique that suppresses the influence of individual differences.

The present invention is a method for recognizing the facial expression in a target face image, comprising: a feature extraction step of performing frequency analysis on the target face image with a plurality of spatial filters that differ in region and direction characteristics, thereby extracting a plurality of pieces of expression feature information from the target face image; a difference calculation step of obtaining a plurality of pieces of difference information representing the differences between the pieces of expression feature information and a plurality of pieces of reference feature information obtained by applying the same spatial filters to a reference face image of the same person; and a determination step of determining the facial expression on the basis of the difference information.

The spatial filters are preferably Gabor filters.

Preferably, the method further comprises a step of obtaining compressed difference information by compressing the difference information, and the determination step determines the facial expression on the basis of the compressed difference information.

Preferably, the determination step determines the facial expression with a neural network that takes the difference information as input and outputs the type of expression.

Preferably, in the feature extraction step, the plurality of spatial filters are applied to the entire face in the target face image.

Another aspect of the present invention is a device for recognizing the facial expression in a target face image, comprising: a feature extraction unit that performs frequency analysis on the target face image with a plurality of spatial filters that differ in region and direction characteristics, thereby extracting a plurality of pieces of expression feature information from the target face image; a difference calculation unit that obtains a plurality of pieces of difference information representing the differences between the expression feature information and a plurality of pieces of reference feature information obtained by applying the same spatial filters to a reference face image of the same person; and a determination unit that determines the facial expression on the basis of the difference information.

According to the present invention, facial expressions can be recognized with reduced influence from individual differences by taking the difference between the reference feature information and the expression feature information obtained with a plurality of spatial filters that differ in region and direction characteristics.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 shows a facial expression recognition device 1. The device 1 includes a preprocessing unit 2 that preprocesses a target face image acquired by a camera (imaging device), a difference acquisition unit 3 that obtains difference information between the target face image and a reference image, a dimension compression unit 4 that reduces the dimensionality of the output of the difference acquisition unit 3, and a facial expression determination unit 5 that determines the facial expression from the output of the dimension compression unit 4.
Units 2, 3, 4, and 5 are implemented by a computer executing a computer program stored in its storage device (recording medium), but they may also be implemented in dedicated hardware.

The facial expression recognition device 1 can operate in two modes: a learning mode and a recognition mode. In the learning mode, the facial expression determination unit 5, which consists of a neural network, is trained to recognize facial expressions; in the recognition mode, the expression of a target face image acquired by the camera is recognized.
In the learning mode, the device 1 reads face images from a facial expression database and processes them in the difference acquisition unit 3, the dimension compression unit 4, and the facial expression determination unit 5. For training, the database holds face images of multiple people showing various expressions.

As preprocessing, the preprocessing unit 2 normalizes the target face image (a still image of a frontal face) acquired by the camera.
Because the difference acquisition unit 3 computes difference information between face images, any difference in face position, size, posture, and so on between the images prevents it from obtaining appropriate difference information and lowers recognition accuracy. The preprocessing unit 2 therefore corrects the position, size, posture, and so on of the face in the raw target face image from the camera (the image before preprocessing) to obtain a normalized target face image.
Specifically, for the input target face image, the preprocessing unit 2 extracts between-the-eyes candidate points with a six-segment rectangle filter (SSR filter), matches them against an average face template, searches for the pupils, applies an affine transformation (posture normalization), and extracts the face region (position normalization).

Extracting between-the-eyes candidate points with a six-segment rectangle (SSR) filter divides the area around a person's between-the-eyes point into six segments and examines their intensity (light and dark) information; the technique is robust to lighting conditions. SSR-based extraction of between-the-eyes candidate points is described in S. Kawato and Tetsutani, "Scale Adaptive Face Detection and Tracking in Real Time with SSR Filter and Support Vector Machine", Proc. of ACCV 2004, vol. 1, pp. 132-137, 2004 (hereinafter "Reference 1"), and in S. Kawato, Y. Senda and N. Nobuji, "Detection of Between-the-Eyes with SSR-Filter", Technical Report of IEICE, PRMU2002-207, pp. 41-46, 2003 (hereinafter "Reference 2").
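A minimal NumPy sketch of an SSR-style scan, for illustration only. It assumes the six segments form a 3 × 2 grid (S1 to S3 on top, S4 to S6 below) and tests a commonly cited pattern in which the eye segments S1 and S3 are darker than the nose-bridge segment S2 and darker than the cheek segments S4 and S6. These comparison rules, the segment sizes, and reporting the centre of S2 as the candidate point are assumptions, not this document's specification. An integral image reduces each segment sum to four lookups.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading zero row/column (no bounds checks needed)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, h, w):
    """Sum of img[top:top+h, left:left+w] from the integral image."""
    return ii[top + h, left + w] - ii[top, left + w] - ii[top + h, left] + ii[top, left]

def ssr_candidates(img, seg_w, seg_h):
    """Scan a 3x2 six-segment rectangle over img; return (x, y) candidate points.

    Assumed rule: S1 < S2 > S3 (eyes darker than the bridge) and
    S1 < S4, S3 < S6 (eyes darker than the cheeks).
    """
    img = np.asarray(img, dtype=float)
    ii = integral_image(img)
    height, width = img.shape
    found = []
    for top in range(height - 2 * seg_h + 1):
        for left in range(width - 3 * seg_w + 1):
            # s[0..2] = upper row S1..S3, s[3..5] = lower row S4..S6
            s = [rect_sum(ii, top + r * seg_h, left + c * seg_w, seg_h, seg_w)
                 for r in range(2) for c in range(3)]
            s1, s2, s3, s4, _, s6 = s
            if s1 < s2 > s3 and s1 < s4 and s3 < s6:
                # Hypothetical choice: report the centre of segment S2.
                found.append((left + 3 * seg_w // 2, top + seg_h // 2))
    return found
```

Each surviving position is only a candidate; the text narrows the candidates further with template matching.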

The preprocessing unit 2 then narrows down the candidate points further by performing weighted (variance-pattern) template matching of the input target face image against an average face template prepared in advance, around the between-the-eyes candidate points selected by the SSR filter. Matching with the average face template is described in Reference 2.

Next, the preprocessing unit 2 searches for the pupils on the basis of the narrowed-down between-the-eyes candidate points. The pupil search performs edge extraction around the between-the-eyes point of the target face image, smooths the result to remove noise, binarizes it, and applies dilation and erosion. Finally, the regions are labeled, and the centroid of the most suitable region is taken as the pupil position. This yields the coordinates of both pupils in the target face image input to the preprocessing unit 2.
The sequence of processes used to find the pupils in the input target face image is not limited to the above.
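The pupil-search chain can be sketched with NumPy alone, assuming a grayscale patch around one eye has already been cut out near the between-the-eyes candidate. The gradient-magnitude edge detector, the 3 × 3 box smoothing, the threshold ratio, and the rule "the largest labeled region is the most suitable one" are all hypothetical choices; the text does not say how the optimal region is chosen.

```python
from collections import deque
import numpy as np

def box_smooth(img, r=1):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding (noise removal)."""
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def dilate(mask):
    """3x3 binary dilation."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.any([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def erode(mask):
    """3x3 binary erosion."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.all([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def label_regions(mask):
    """4-connected component labelling; returns one list of (y, x) pixels per region."""
    seen = np.zeros(mask.shape, dtype=bool)
    regions = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        queue, pixels = deque([(y, x)]), []
        seen[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            pixels.append((cy, cx))
            for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        regions.append(pixels)
    return regions

def find_pupil(eye_patch, threshold_ratio=0.5):
    """Edges -> smoothing -> binarization -> dilation/erosion -> labelling -> centroid (x, y)."""
    gy, gx = np.gradient(np.asarray(eye_patch, dtype=float))
    edges = box_smooth(np.hypot(gx, gy))
    if edges.max() <= 0:
        return None                               # flat patch: no edges at all
    mask = edges >= threshold_ratio * edges.max()
    mask = erode(dilate(mask))                    # dilation then erosion (closing)
    regions = label_regions(mask)
    best = max(regions, key=len)                  # hypothetical "most suitable" rule
    ys, xs = zip(*best)
    return float(np.mean(xs)), float(np.mean(ys))
```

Applied to each eye patch, this yields the two pupil coordinates that the posture normalization below relies on.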

As shown in FIG. 2(a), the posture of the face in the input target face image F1 is obtained from the positional relationship between the two pupils E, E. Specifically, the preprocessing unit 2 calculates the face posture (inclination θ) in the target face image F1 from the X-direction spacing X1 and the Y-direction spacing Y1 between the pupils E, E.
On the basis of the inclination θ, the preprocessing unit 2 then applies a rotation formula to the input face image F1 (an affine transformation) and obtains a posture-corrected face image F2 whose inclination has been corrected, as shown in FIG. 2(b) (posture normalization).
In this rotation formula, X1 is the X coordinate in the input target face image F1, Y1 is the Y coordinate in F1, X2 is the X coordinate in the posture-corrected face image F2, and Y2 is the Y coordinate in F2.
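The posture normalization can be sketched as follows, assuming the inclination is θ = atan2(Y1, X1) from the pupil spacings and that the correction is the standard rotation by −θ, i.e. X2 = X1·cos θ + Y1·sin θ and Y2 = −X1·sin θ + Y1·cos θ, taken about a chosen centre. The rotation centre and the nearest-neighbour resampling are assumptions made for illustration.

```python
import math
import numpy as np

def face_tilt(left_pupil, right_pupil):
    """Inclination theta (radians) of the line joining the two pupils."""
    dx = right_pupil[0] - left_pupil[0]   # X-direction spacing between the pupils
    dy = right_pupil[1] - left_pupil[1]   # Y-direction spacing between the pupils
    return math.atan2(dy, dx)

def rotate_point(x, y, theta, cx=0.0, cy=0.0):
    """Map (X1, Y1) to (X2, Y2) by rotating by -theta about (cx, cy)."""
    x, y = x - cx, y - cy
    x2 = x * math.cos(theta) + y * math.sin(theta)
    y2 = -x * math.sin(theta) + y * math.cos(theta)
    return x2 + cx, y2 + cy

def deskew(image, theta, center):
    """Rotate a whole grayscale image by -theta (nearest-neighbour inverse mapping)."""
    h, w = image.shape
    cx, cy = center
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xs - cx, ys - cy
    # For each output pixel, rotating by +theta gives its source location in F1.
    src_x = np.rint(dx * math.cos(theta) - dy * math.sin(theta) + cx).astype(int)
    src_y = np.rint(dx * math.sin(theta) + dy * math.cos(theta) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out
```

For pupils at (30, 40) and (70, 60), `rotate_point(70, 60, face_tilt((30, 40), (70, 60)), 30, 40)` places the right pupil on the row y = 40, level with the left one. Inverse mapping is used for the image so that every output pixel receives a value.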

The preprocessing unit 2 then extracts a face region image F3 from the posture-corrected face image F2. Letting d be the distance between the pupils E, E, it uses d and the pupil positions as the reference and extracts from F2 a face region image measuring 2d in both the X and Y directions (an image of nearly the entire face), as shown in FIG. 3.
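A sketch of the 2d × 2d face-region extraction, assuming the pupils are already level (image F2). The square is centred horizontally between the pupils; its vertical placement (the hypothetical `top_ratio`, putting the pupil line `top_ratio · d` below the top edge) and the 48 × 48 output size (borrowed from the example dimension count given later) are assumptions, since the exact geometry is only shown in FIG. 3.

```python
import numpy as np

def crop_face_region(image, left_pupil, right_pupil, top_ratio=0.5, out_size=48):
    """Cut a 2d x 2d square (d = inter-pupil distance) and resample it to out_size x out_size."""
    (lx, ly), (rx, ry) = left_pupil, right_pupil
    d = float(np.hypot(rx - lx, ry - ly))        # inter-pupil distance d
    side = 2.0 * d                               # region is 2d in both X and Y
    left = (lx + rx) / 2.0 - d                   # centred between the pupils
    top = (ly + ry) / 2.0 - top_ratio * d        # hypothetical vertical placement
    # Nearest-neighbour sampling at out_size evenly spaced points across the square.
    steps = (np.arange(out_size) + 0.5) * side / out_size
    xs = np.floor(left + steps).astype(int)
    ys = np.floor(top + steps).astype(int)
    h, w = image.shape
    out = np.zeros((out_size, out_size), dtype=image.dtype)
    vy = (ys >= 0) & (ys < h)                    # rows that fall inside the image
    vx = (xs >= 0) & (xs < w)                    # columns that fall inside the image
    out[np.ix_(vy, vx)] = image[np.ix_(ys[vy], xs[vx])]
    return out
```

Because the crop is defined purely in units of d, faces at different distances from the camera end up at the same scale, which is what makes pixel-wise differences between images meaningful.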

In the face region image (normalized target face image) F3 obtained as above, the positions of facial parts such as the pupils are normalized, so differences in face position and size between face images are corrected and appropriate difference information can be obtained between them. In particular, since the eyes and the feature points around them move especially strongly when the expression changes, normalizing with respect to the pupil (eye) positions improves the expression recognition rate.

The normalized target face image output from the preprocessing unit 2 is passed to the difference acquisition unit 3, which includes a feature extraction unit 31 that filters the input normalized target face image with Gabor filters.
The feature extraction unit 31 extracts facial features (Gabor features) with Gabor filters. A Gabor filter applies a Gabor wavelet transform to an image to analyze the image's frequency content.
The Gabor filter (Gabor wavelet function) is a spatial filter in which a sin/cos function is localized by a Gaussian function, with its shape controlled by a scale k and a rotation angle φ. Convolving it with the target face image performs a Fourier-type frequency analysis of each localized region (see FIG. 4).
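A widely used Gabor wavelet consistent with this description (a complex sinusoid, whose real and imaginary parts are the cos and sin components, under a Gaussian envelope, parameterized by a scale k and an angle φ) is given below. It is offered as an assumption for illustration, not as the exact equation of this document; σ is a width constant, often set to π, and I is the image.

```latex
\psi_{k,\varphi}(\mathbf{x}) =
  \frac{k^{2}}{\sigma^{2}}
  \exp\!\left(-\frac{k^{2}\,\lVert\mathbf{x}\rVert^{2}}{2\sigma^{2}}\right)
  \left[\exp\!\left(i\,\mathbf{k}\cdot\mathbf{x}\right)
        - \exp\!\left(-\frac{\sigma^{2}}{2}\right)\right],
\qquad
\mathbf{k} = k\begin{pmatrix}\cos\varphi\\ \sin\varphi\end{pmatrix},
\qquad
G_{k,\varphi}(\mathbf{x}) = \left(I * \psi_{k,\varphi}\right)(\mathbf{x})
```

In this form the Gaussian envelope has width σ/k, so a larger k gives a smaller localized region and more local features, while φ rotates the direction of the carrier. The subtracted constant removes the filter's DC response, so a uniform brightness offset produces no output.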

This Gabor filter can extract local intensity information from an image and is relatively insensitive to illumination changes. Changing k and φ in the Gabor function changes the region and direction characteristics of the filter. As shown in FIG. 5, changing the Gaussian scale k changes the size of the region in which the sin/cos function is localized (region characteristic), and changing the rotation angle φ changes the orientation of that localized region.

Enlarging the localized region extracts more global features of the face; shrinking it extracts more local features. Changing the direction characteristic changes the orientation of the extracted features: in FIG. 5, for example, φ = 0 extracts horizontal features, φ = π/2 extracts vertical features, and φ = π/4 and φ = 3π/4 extract diagonal features.

As shown in FIG. 5, for example, the feature extraction unit 31 has a plurality of Gabor filters with different k and φ. It can therefore extract expression feature information ranging from global to local from the input target face image (an image covering nearly the entire face), and can extract both the global and the local expression feature information in various directions.
In other words, using the plurality of Gabor filters, the feature extraction unit 31 extracts from the input target face image as many pieces of expression feature information (expression feature images) as there are Gabor filters.
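A NumPy sketch of such a filter bank, using the four scales and four rotation angles listed in the experimental section, so 16 filters yield 16 feature maps per face image. The kernel follows the common Gabor wavelet form (Gaussian envelope of width σ/k times a complex carrier with wave vector k(cos φ, sin φ), DC term removed); σ = π, the 21 × 21 kernel size, and using the magnitude of the complex response as the feature are assumptions, not details given in this document.

```python
import numpy as np

# Scales k and rotation angles phi taken from the experimental section.
SCALES = [np.pi / np.sqrt(2), np.pi / 2, np.pi / (2 * np.sqrt(2)), np.pi / 4]
ANGLES = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]

def gabor_kernel(k, phi, sigma=np.pi, size=21):
    """Complex Gabor wavelet: Gaussian envelope times complex carrier, DC term removed."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    envelope = (k ** 2 / sigma ** 2) * np.exp(-(k ** 2) * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def convolve_same(image, kernel):
    """2-D linear convolution via zero-padded FFTs, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]

def gabor_features(image, scales=SCALES, angles=ANGLES):
    """One magnitude map per (k, phi) pair: shape (len(scales) * len(angles), H, W)."""
    image = np.asarray(image, dtype=float)
    maps = [np.abs(convolve_same(image, gabor_kernel(k, phi)))
            for k in scales for phi in angles]
    return np.stack(maps)
```

The filter order (outer loop over k, inner loop over φ) is fixed, so map i always comes from the same (k, φ) pair; the difference step relies on this pairing.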

The pieces of expression feature information output from the feature extraction unit 31 are passed to the difference calculation unit 32 of the difference acquisition unit 3.
The difference calculation unit 32 computes the difference between the expression feature information of the input target face image and reference feature information (reference feature images) obtained from a neutral face (reference face image) of the same person. Like the expression feature information of the target face image, the reference feature information is obtained by processing the neutral reference face image with the preprocessing unit 2 and then with the feature extraction unit 31, so that the features of the neutral face are extracted with the same Gabor filters.
For each piece of expression feature information of the input target face image, the difference calculation unit 32 computes the difference from the corresponding reference feature information, that is, the reference feature information obtained with the same Gabor filter (same k and φ) as that piece of expression feature information. This yields as many pieces of difference information (difference images) as there are Gabor filters.
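The per-filter difference is simple once both face images have passed through the same filter bank in the same order. The sketch below assumes feature stacks of shape (n_filters, H, W) and takes the sign as target minus reference (the text only says "difference").

```python
import numpy as np

def feature_differences(expression_feats, reference_feats):
    """Per-filter difference images D_i = C_target_i - C_reference_i.

    Both stacks have shape (n_filters, H, W) and must come from the same
    filter bank in the same (k, phi) order, so index i pairs equal filters.
    """
    expression_feats = np.asarray(expression_feats, dtype=float)
    reference_feats = np.asarray(reference_feats, dtype=float)
    if expression_feats.shape != reference_feats.shape:
        raise ValueError("feature stacks must share filter bank and image size")
    return expression_feats - reference_feats
```

Under a deliberately simplified additive model (features = person-specific component + expression change), the person-specific component cancels: two people with different neutral-face features but the same expression change yield identical difference images. Real faces are not exactly additive, but this is the intuition behind comparing each face with its own neutral face.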

FIG. 6 shows, for a reference face image (neutral) F3-a and target face images with various expressions (smile, surprise, anger) F3-b, F3-c, F3-d, the reference feature information C-a and the expression feature information C-b, C-c, C-d obtained with a common Gabor filter, together with the difference information D-b, D-c, D-d for each expression.
For example, when the target face image is the smiling image F3-b, the difference calculation unit 32 takes the difference between the expression feature information C-b of F3-b and the reference feature information C-a to obtain the difference information D-b. The same processing is performed on the expression feature information obtained with each of the other Gabor filters.

As the expression feature information C-b, C-c, C-d in FIG. 6 shows, expression features appear as clear differences between the expressions. However, the positions and sizes of facial parts such as the eyes, nose, and mouth vary from person to person, so the expression feature information is also strongly affected by individual differences. Such individual differences are a major obstacle to classifying expressions by the differences in their features.

In contrast, the difference calculation unit 32 of this embodiment takes the difference between the face showing an expression and the neutral face, obtaining the amount of feature change from the neutral face as difference information. Even when the sizes of parts such as the eyes and mouth vary between individuals, an expression viewed as a displacement from the neutral face varies comparatively little between individuals, so determining the expression from this difference information (the displacement from the neutral face) suppresses the influence of individual differences.

Moreover, in this embodiment both the expression feature information and the reference feature information are derived from normalized face images, which eliminates misalignment of the face between images and yields appropriate difference information.

Where on the face the displacement from the neutral face occurs, and in which direction parts such as the eyes move, differ from expression to expression. A displacement from the neutral face can range from a global change across the whole face to a change in a small local area, and a part may move horizontally, vertically, or diagonally. How these displacements occur is not the same for all expressions.
Difference information computed with a single Gabor filter (one scale and one rotation angle) is therefore not necessarily significant for every expression. In this embodiment, however, Gabor filters with multiple scales and multiple rotation angles are applied to the face image, so simply filtering the entire face image captures local spatial information anywhere on the face, and significant information is obtained for each expression.

The difference information output from the difference calculation unit 32 is passed to the dimension compression unit 4 and compressed. For example, applying Gabor filters with 3 scales and 4 rotation angles to a 48 × 48 pixel image gives an output of 48 × 48 × 3 × 4 = 27,648 dimensions, which is very large. Because difference information of this dimensionality is difficult to feed into the subsequent facial expression determination unit (neural network) 5, the dimension compression unit 4 reduces its dimensionality to obtain compressed difference information.
The dimension compression unit 4 uses principal component analysis (PCA). PCA summarizes the information in many mutually correlated variables as a small number of mutually uncorrelated composite variables.
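A minimal PCA sketch via NumPy's SVD, assuming one flattened difference vector per row and 30 retained components (the value used in the experimental section). Fitting the projection on learning-mode data and reusing it in recognition mode is an assumption about how the stages are wired, not something this text specifies.

```python
import numpy as np

def fit_pca(train_vectors, n_components=30):
    """Learn a PCA projection from training difference vectors (one per row)."""
    X = np.asarray(train_vectors, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal axes ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def compress(vectors, mean, components):
    """Project difference vectors onto the principal axes (compressed difference info)."""
    return (np.asarray(vectors, dtype=float) - mean) @ components.T
```

For 36 training vectors of 48 × 48 × 3 × 4 = 27,648 dimensions, `compress` returns a 36 × 30 array whose columns are mutually uncorrelated, which is the "few uncorrelated composite variables" property described above. Note that the number of components cannot exceed the number of training vectors.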

The compressed difference information output from the dimension compression unit 4 is passed to the facial expression determination unit 5, which determines the expression by classifying the compressed difference information by expression. As shown in FIG. 7, unit 5 is a three-layer neural network consisting of an input layer, a hidden (intermediate) layer, and an output layer. The number of input units equals the dimensionality of the compressed difference information, and the output layer has one unit per expression to be classified. The number of hidden units is set as appropriate.

In the learning mode, in which target face images are taken from the facial expression database, the difference information obtained from a target face image with a known expression is used as input, and information indicating that known expression is given to the output units as the teacher signal. Training yields the connection weights Q and W between the units, which are used to recognize expressions.

In the recognition mode, in which a target face image of unknown expression is obtained from a camera, the difference information is given to the neural network whose connection weights were obtained in the learning mode. The input values propagate to the intermediate layer and then to the output layer, and each output unit outputs a real value between 0 and 1.0. The expression corresponding to the output unit with the largest value is taken as the facial expression determination result.
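The final decision is an argmax over the output units. A minimal sketch follows. The order of the expression labels is a hypothetical assumption, and ties resolve to the first unit with the maximal value.

```python
EXPRESSIONS = ("neutral", "smile", "surprise", "anger")  # unit order is an assumption


def decide(outputs, labels=EXPRESSIONS):
    """Return the expression whose output unit (0.0 to 1.0) has the largest value."""
    if len(outputs) != len(labels):
        raise ValueError("one output unit per expression is required")
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return labels[best]


print(decide([0.12, 0.81, 0.40, 0.05]))  # smile
```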

[Experimental Results]
Using the facial expression recognition device 1 described above, an experiment was conducted to classify four facial expressions: neutral, smile, surprise, and anger. Face images of 9 people × 4 expressions were used as the images of known expression for the learning mode. Five subjects provided the face images of unknown expression for the recognition mode, and about 10 images (6 to 17 images) were prepared for each expression of each subject.
The Gabor filters used 4 scales (k = π/√2, π/2, π/(2√2), π/4) and 4 rotation angles (φ = 0, π/4, π/2, 3π/4).
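A sketch of the 4 × 4 filter bank built from these parameters. This section does not give the kernel formula, so the sketch assumes the widely used complex Gabor kernel ψ(x) = (k²/σ²) exp(−k²|x|²/(2σ²)) [exp(i k·x) − exp(−σ²/2)] with σ = 2π. The listed k values coincide with that convention's sequence π/2^((v+2)/2) for v = −1 to 2, which is suggestive but not confirmed by the patent. The kernel size and function names are hypothetical.

```python
import cmath
import math

SIGMA = 2 * math.pi  # assumed envelope width; not stated in this section
SCALES = (math.pi / math.sqrt(2), math.pi / 2, math.pi / (2 * math.sqrt(2)), math.pi / 4)
ANGLES = (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)


def gabor_kernel(k, phi, size, sigma=SIGMA):
    """Complex Gabor kernel on an odd size x size grid centred on the origin."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    half = size // 2
    dc = math.exp(-sigma ** 2 / 2)  # subtracting this removes the DC response
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k ** 2 / sigma ** 2) * math.exp(-k ** 2 * (x * x + y * y) / (2 * sigma ** 2))
            row.append(envelope * (cmath.exp(1j * (kx * x + ky * y)) - dc))
        kernel.append(row)
    return kernel


def gabor_bank(size=31):
    """One kernel per (scale k, rotation angle phi) pair: 4 x 4 = 16 kernels."""
    return [gabor_kernel(k, phi, size) for k in SCALES for phi in ANGLES]


def magnitude_at(image, kernel, row, col):
    """|response| at one pixel; pixels outside the image are treated as 0.
    This computes a correlation, but for a real image its magnitude equals that
    of the convolution because this kernel satisfies psi(-x) = conj(psi(x))."""
    half = len(kernel) // 2
    acc = 0j
    for dy, krow in enumerate(kernel):
        for dx, val in enumerate(krow):
            r, c = row + dy - half, col + dx - half
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                acc += image[r][c] * val
    return abs(acc)
```

Evaluating `magnitude_at` for every pixel of a 48 × 48 image with all 16 kernels gives the 48 × 48 × 16 response values that make up one feature vector under this assumption.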

The number of dimensions after compression in the dimension compression unit 4 was set to 30, so the neural network also had 30 input units. The number of output units was 4, matching the number of expression classes, and the number of intermediate units was 15. In the learning mode, training continued until the mean squared error with respect to the teacher signal fell to 0.0001 or below, and the resulting connection weights were used in the recognition mode.
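The stopping rule can be written as a mean-squared-error check against one-hot teacher signals. This is a sketch under one assumption: the error is averaged over all samples and all output units, since the document does not state how the mean is normalized. The function names are hypothetical.

```python
def one_hot(index, n_classes=4):
    """Teacher signal: 1.0 for the known expression, 0.0 for the others."""
    return [1.0 if k == index else 0.0 for k in range(n_classes)]


def mean_squared_error(outputs, teachers):
    """Mean of (output - teacher)^2 over all samples and all output units."""
    total = sum((o - t) ** 2
                for out, tch in zip(outputs, teachers)
                for o, t in zip(out, tch))
    count = sum(len(t) for t in teachers)
    return total / count


def converged(outputs, teachers, target=1e-4):
    """Stopping rule used in the experiment: MSE of 0.0001 or below."""
    return mean_squared_error(outputs, teachers) <= target


teachers = [one_hot(0), one_hot(2)]
print(converged([[0.99, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], teachers))  # True
print(converged([[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], teachers))   # False
```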
For each of subjects 1 to 5, images in which the subject produced each of the four expressions (neutral, smile, surprise, anger) several times in random order were used as input images (target face images). The recognition rate was computed as the proportion of input images whose expression was recognized correctly.
For neutral-face input images, the difference was taken between two neutral faces (a self-difference).
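A minimal sketch of the evaluation bookkeeping. It computes the element-wise difference between target and reference features, which is zero for a self-difference of identical features, and the per-subject recognition rate as correct inputs divided by total inputs. Reporting the overall average as the plain mean of the per-subject rates is an assumption. The result tuples and function names are hypothetical.

```python
from collections import defaultdict


def difference(expression_features, reference_features):
    """Element-wise difference between target-image and reference-image features."""
    return [e - r for e, r in zip(expression_features, reference_features)]


def recognition_rates(results):
    """results: iterable of (subject, true_label, predicted_label) tuples.
    Returns per-subject recognition rates and their plain average."""
    correct, total = defaultdict(int), defaultdict(int)
    for subject, truth, predicted in results:
        total[subject] += 1
        correct[subject] += int(truth == predicted)
    per_subject = {s: correct[s] / total[s] for s in total}
    return per_subject, sum(per_subject.values()) / len(per_subject)


per, avg = recognition_rates([(1, "smile", "smile"), (1, "anger", "smile"),
                              (2, "neutral", "neutral")])
print(per, avg)  # {1: 0.5, 2: 1.0} 0.75
```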

The table below shows the experimental results.

These results show that a good recognition rate, exceeding 80% as an overall average, was obtained.

The present invention is not limited to the above embodiment. For example, the filters are not limited to Gabor filters; any plurality of filters that differ in region characteristics and directional characteristics may be used. The reference face image is preferably a neutral face, but is not limited to one. Likewise, the dimension compression unit 4 and the facial expression determination unit 5 are not limited to those described above, and the number of scales and the rotation angles of the Gabor filters are not limited to the values above; any number of scales and any rotation angles may be adopted.

FIG. 1 is a block diagram of the facial expression recognition device.
FIG. 2 is a diagram showing the image normalization procedure.
FIG. 3 is a diagram showing the relationship between the pupil positions and the inter-pupil distance in a normalized face image.
FIG. 4 is a diagram showing an example of a Gabor filter.
FIG. 5 is a diagram showing Gabor filters of 3 scales × 4 rotation angles.
FIG. 6 is a diagram showing a face image, reference feature information, facial expression feature information, and difference information.
FIG. 7 is a configuration diagram of the neural network.

Explanation of Reference Numerals

1 Facial expression recognition device
2 Preprocessing unit
3 Difference acquisition unit
4 Dimension compression unit
5 Facial expression determination unit
31 Feature extraction unit
32 Difference calculation unit

Claims

1. A facial expression recognition method for recognizing a facial expression in a target face image, comprising:
a feature extraction step of performing frequency analysis on the target face image with a plurality of spatial filters that differ in region characteristics and directional characteristics, to extract a plurality of pieces of facial expression feature information of the target face image;
a difference calculation step of obtaining a plurality of pieces of difference information indicating the differences between a plurality of pieces of reference feature information, obtained by applying the plurality of spatial filters to a reference face image of the same person, and the plurality of pieces of facial expression feature information; and
a determination step of determining the facial expression based on the difference information.

2. The facial expression recognition method according to claim 1, wherein the spatial filters are Gabor filters.

3. The facial expression recognition method according to claim 1 or 2, further comprising a step of obtaining compressed difference information by compressing the difference information,
wherein the determination step determines the facial expression based on the compressed difference information.

4. The facial expression recognition method according to any one of claims 1 to 3, wherein, in the determination step, the facial expression is determined by a neural network that receives the difference information as input and outputs the type of facial expression.

5. The facial expression recognition method according to any one of claims 1 to 4, wherein, in the feature extraction step, the plurality of spatial filters are applied to the entire face in the target face image.

6. A facial expression recognition device for recognizing a facial expression in a target face image, comprising:
a feature extraction unit that performs frequency analysis on the target face image with a plurality of spatial filters that differ in region characteristics and directional characteristics, and extracts a plurality of pieces of facial expression feature information of the target face image;
a difference calculation unit that obtains a plurality of pieces of difference information indicating the differences between a plurality of pieces of reference feature information, obtained by applying the plurality of spatial filters to a reference face image of the same person, and the facial expression feature information; and
a determination unit that determines the facial expression based on the difference information.