JP5234833B2

JP5234833B2 - Facial expression classifier creation apparatus, facial expression classifier creation method, facial expression recognition apparatus, facial expression recognition method, and programs thereof

Info

Publication number: JP5234833B2
Application number: JP2010008650A
Authority: JP
Inventors: 泰彦宮崎; 豪入江; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-01-19
Filing date: 2010-01-19
Publication date: 2013-07-10
Anticipated expiration: 2030-01-19
Also published as: JP2011150381A

Description

The present invention relates to a technique for automatically and efficiently creating a classifier that recognizes the facial expressions of persons present in image data, and to a technique that uses such a classifier to automatically recognize the facial expression of a person in a given image.

In the field of image analysis, and not only for facial expressions, a widely known way to automatically obtain information about what appears in an image is to prepare sample data that has been given correct labels in advance and to use it as training data. For example, the technique of Patent Document 1 can recognize a specific pattern in image data, but it too requires training images. Indeed, as described in Non-Patent Document 3, conventional techniques such as that of Patent Document 1 make it possible to recognize the faces of persons in image data "by collecting images of various persons facing in each direction and learning from them" (Non-Patent Document 3, page 18).

Methods are also known that, rather than recognizing a pattern, automatically classify image data by analyzing it and output the classification result as a "label". Many of these methods share the following approach.

First, some multidimensional numerical information (generally called a feature) is obtained by analyzing the image data. Sample data that has been given correct labels in advance, for example by hand, is also prepared, and statistical processing (also called machine learning) is applied to these samples to build a processing module (also called a classifier or detector) that converts features into labels. Then, even for target image data whose correct label is unknown, computing the same feature and passing it to the classifier outputs a label representing the information.

The same kind of technique can be applied to facial expression recognition (classification). For example, Non-Patent Document 2 describes that facial expressions can be classified as (happiness, sadness, surprise, anger, disgust, fear) by using, as features, the values obtained by applying Gabor wavelets at 34 predetermined positions on the face, such as the outer corners of the eyes, the tip of the nose, and the corners of the mouth.

Non-Patent Document 1 describes a further simplified technique: features obtained by applying Gabor filters (synonymous with Gabor wavelets) at 64 grid points on the image are passed to a classifier that can be built with a neural network, which outputs whether or not the face shows a smile (smile/laughter).

Non-Patent Document 4 focuses on facial expressions and reports an analysis of the dynamic changes in expression under intentional and spontaneous expression conditions.

[Patent Document 1]: JP 2008-287652 A

[Non-Patent Document 1]: Uwe Kowalik et al., "Creating joyful digests by exploiting smile/laughter facial expressions present in video", International Workshop on Advanced Image Technology (2009).
[Non-Patent Document 2]: Michael Lyons et al., "Coding Facial Expressions with Gabor Wavelets", Proceedings, Third IEEE International Conference on Automatic Face and Gesture Recognition (1998).
[Non-Patent Document 3]: Kinebuchi et al., "Advertising Effect Measurement Technology Using Image Processing" (in Japanese), NTT Gijutsu Journal, 2009.7, pp. 16-19.
[Non-Patent Document 4]: Eiko Uchida et al., "Analysis of dynamic changes in facial expression using a high-speed camera" (in Japanese), IEICE Technical Report, HIP (Human Information Processing), 99(722), pp. 1-6 (2000).

When some kind of information is recognized by the methods described above, collecting correct-answer data to serve as training samples is important.

For example, to collect the data needed for recognizing "face regions" as in Non-Patent Document 3, several subjects must be asked to have their faces photographed from each direction. In that case, with measures such as a well-prepared shooting procedure, each correct sample can be created in a short time.

However, when facial expressions are recognized by such methods, it is difficult to collect correct-answer sample data. According to Non-Patent Document 4 and others, "intentional expressions" and "spontaneous expressions" are said to differ, yet in Non-Patent Document 2, for example, the data used was photographed while subjects in front of a camera produced each expression they were instructed to make (and somewhat exaggerated at that). Thus, collecting "training sample data for facial expression recognition" efficiently requires telling subjects the purpose and having them produce as many expressions as possible in a short time, and the results are inevitably "intentional expressions".

On the other hand, collecting training samples that include "spontaneous expressions" is not easy. Indeed, Non-Patent Document 4 also spent a long time collecting data for each subject, and with such a method it is very difficult to gather data from many subjects.

An object of the present invention is to solve the above problems by efficiently collecting training data for facial expression recognition, in particular by collecting many images of natural rather than intentionally posed facial expressions in a short time, and thereby to enable the creation of a classifier with good recognition accuracy and the recognition of facial expressions.

To solve the above problems, the present invention searches image data published on the Internet or the like for image data whose tag information contains a keyword that is very strongly associated with a specific facial expression, extracts from that image data the regions in the image that are estimated to actually be a person's face, collects the image data for which the number and area of those regions meet predetermined conditions, calculates image features from the face regions of that data, and generates a classifier for facial expression recognition by machine learning on the calculated image features.

Furthermore, using the generated classifier, a face region is extracted from given image data, the image features of the extracted face region are calculated, and the calculated image features are passed to the classifier, which outputs a label giving the expression classification result.

More specifically, the present invention is a facial expression classifier creation apparatus that creates a classifier for recognizing the facial expressions of persons present in image data, comprising: a sample image data search unit having a function of searching, for a given keyword, image data published on a network for image data whose associated keyword matches the given keyword; a face detection unit having a function of extracting, from the retrieved image data, regions in the image estimated to be a person's face and outputting those for which the number and area of the regions meet predetermined conditions; a feature calculation unit that computes, from the image data of the output face regions, features consisting of multidimensional numerical data; and a classifier creation unit that creates a classifier by machine learning using the computed features as training data.

For example, a Gabor filter or the like can be used to calculate the features in the feature calculation unit.

Furthermore, in the above facial expression classifier creation apparatus, the sample image data search unit may have a function of retrieving image data containing many smiles by being given smile-related keywords, and the classifier creation unit may be configured to create a facial expression classifier for identifying smiles.

Also, in the above facial expression classifier creation apparatus, the sample image data search unit may further have a function of retrieving image data that does not contain many smiles by being given keywords related to expressions other than smiles, and the classifier creation unit may be configured to create a facial expression classifier that determines whether or not a face is smiling, by machine learning that searches for the difference in statistical distribution between the features obtained from image data containing many smiles and the features obtained from image data not containing many smiles.

Also, in the above facial expression classifier creation apparatus, the sample image data search unit may have a function of retrieving image data containing many instances of a specific expression by being given keywords for that specific expression, which relates to a smiling, angry, surprised, or crying face, and also a function of retrieving image data that does not contain many instances of the specific expression by using keywords related to faces or persons as the condition and adding the condition that the image is not associated with the keywords for the specific expression; and the classifier creation unit may be configured to create a facial expression classifier that determines whether or not a face shows the specific expression, by machine learning that searches for the difference in statistical distribution between the features obtained from image data containing many instances of the specific expression and the features obtained from image data not containing many instances of it.

A second aspect of the present invention is a facial expression recognition apparatus that automatically recognizes the facial expression of a person in a given image, comprising: an image data acquisition unit that acquires the image data whose expression is to be recognized; a face detection unit that extracts, from the acquired image data, a region in the image estimated to be a person's face; a feature calculation unit that computes, from the image data of the extracted face region, features consisting of multidimensional numerical data; and a facial expression classifier created by the above facial expression classifier creation apparatus, wherein the feature calculation unit calculates the features from the image data of the face region using the same calculation method as was used to create the facial expression classifier in the creation apparatus, and supplies the calculated features to the facial expression classifier as input so that the classifier outputs an expression identification result.

The present invention makes it possible to efficiently create a classifier from sample image data that includes many spontaneous expressions. Furthermore, by using a classifier generated in this way, spontaneous expressions can also be recognized more accurately.

FIG. 1 is an overall configuration diagram of an apparatus according to an embodiment of the present invention.
FIG. 2 is a process flowchart of the sample image data search unit.
FIG. 3 is a process flowchart of the face detection unit.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is an overall configuration diagram of an apparatus according to an embodiment of the present invention. In the figure, reference numeral 10 denotes a facial expression classifier creation apparatus composed of a CPU, memory, and the like, which comprises a sample image data search unit 11, a sample-image face detection unit 12, a feature calculation unit 13, and a classifier creation unit 14. Reference numeral 20 denotes a facial expression recognition apparatus, which comprises an image data acquisition unit 21, an input-image face detection unit 22, a feature calculation unit 13, and a facial expression classifier 24. The sample image data 30 is the training data used to create the facial expression classifier 24, and the input image 31 is the image whose expression is to be identified. The expression label 32 is information indicating the result of recognizing the expression in the input image 31.

[First embodiment]
First, an embodiment of the facial expression classifier creation apparatus 10 is described, beginning with the sample image data search unit 11.

Today there are many servers on the Internet known as image sharing sites. On these servers, Internet users can freely register image data by uploading it from a browser. At upload time, the uploading user typically also registers several pieces of keyword information (called tags) describing the image. These sites store the association between the registered tags and the image data in an on-site database. By specifying a keyword to such a site, image data with a tag matching that keyword can be retrieved, and this operation is in fact available from a browser over the Internet.

Being operable from a browser means that accessing a specific URL (Uniform Resource Locator) with specific parameters over the standardized HTTP protocol returns a response representing the result in a standardized format such as HTML, with the actual image data (or its URL) embedded in specific HTML tags. It is therefore possible to build a software module that performs the same operations without a person operating a browser. The sample image data search unit 11 of this embodiment can be implemented by such a module running on a computer connected to the Internet.

The way URL parameters are specified for a search and the way the resulting HTML must be parsed differ from one image sharing site to another. Searching across multiple image sharing sites is therefore handled as shown in FIG. 2.

FIG. 2 is a process flowchart of the sample image data search unit 11. First, in step S10, a keyword representing a facial expression or the like is specified as the search condition. The keyword may be stored in advance in a table or entered by the person creating the classifier. The search condition is not limited to requiring that the specified keyword "match" one of the tags attached to an image; conditions such as "contains keyword K1 and does not contain keyword K2" are usually possible as well. Some image sharing sites expose such a database search function; alternatively, if the list of tags attached to each image can be obtained, the data containing keyword K2 can simply be removed from the retrieved image data. In addition, by using a dictionary database, searches using expressions in other languages (for example, "スマイル" and "smile") are also possible.

In step S11, assuming for example that image data is collected from several video sharing sites A and B, the search module prepared for each of sites A and B is called. The search module for video sharing site A first assembles the search request URL for site A in step S12a. Next, in step S13a, it sends an HTTP request to site A using the assembled URL. In step S14a it receives the HTTP response from site A, and in step S15a it parses the HTML contained in that response to obtain the image data.

Likewise, for video sharing site B, steps S12b to S15b are executed by the search module for site B to obtain, from the image data held by site B, the image data that matches the search condition specified by the keyword.

In step S16, all the image data obtained from video sharing sites A and B is gathered and output to the sample-image face detection unit 12.
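The flow of steps S11 to S16 can be sketched roughly as follows. This is only an illustration under stated assumptions: the site URLs, the parameter names `tag` and `q`, and the idea that image URLs sit in the `src` attribute of `<img>` tags are all hypothetical, since each real site defines its own request format and HTML layout.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


def build_search_url(base_url, param_name, keyword):
    # Step S12: assemble the site-specific search request URL.
    return base_url + "?" + urlencode({param_name: keyword})


def fetch(url):
    # Steps S13 and S14: send the HTTP request and read the response body.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_image_urls(html_text):
    # Step S15: parse the returned HTML and pull out the image URLs.
    parser = ImageLinkParser()
    parser.feed(html_text)
    return parser.image_urls


# Hypothetical per-site settings; real sites differ in URL and parameters.
SITES = {
    "A": ("https://site-a.example/search", "tag"),
    "B": ("https://site-b.example/find", "q"),
}


def search_all_sites(keyword, fetch_html=fetch):
    # Steps S11 and S16: call each site's module and merge the results.
    results = []
    for base_url, param_name in SITES.values():
        url = build_search_url(base_url, param_name, keyword)
        results.extend(parse_image_urls(fetch_html(url)))
    return results
```

Passing `fetch_html` as a parameter keeps the per-site logic testable without network access; a production module would also need error handling and would have to respect each site's terms of use.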

With the above configuration, sample image data such as the following can be retrieved, as concrete examples.

(1) [Example 1]: Acquisition of smile sample image data having the tag "笑顔" (smiling face), "スマイル" (smile), "smile", or "laugh".

(2) [Example 2]: Acquisition of non-smile sample image data having the tag "顔" (face) or "表情" (facial expression) and none of the tags "笑顔", "スマイル", "smile", or "laugh".

(3) [Example 3]: Acquisition of other sample image data in addition to the smile sample image data of [Example 1], namely sample image data for expressions such as angry, surprised, and crying faces. For example, angry-face sample image data has the tag "怒り" (anger) or "anger"; surprised-face sample image data has the tag "驚き" (surprise), "びっくり" (startled), or "surprise"; and crying-face sample image data has the tag "悲しい" (sad), "涙" (tears), "泣き顔" (crying face), "sadness", "sad", or "tear".

In the above examples, sample image data that lacks a specific expression can be obtained by specifying, through AND and OR combinations of multiple keywords, a condition such as "contains a tag related to faces or persons in general and contains no tag related to the specific expression".
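As a minimal sketch of such a tag condition, the filter below reproduces [Example 1] and [Example 2] on a list of tags already obtained for an image. The function names are illustrative, and the case-insensitive, whitespace-trimmed comparison is an assumption that the text does not specify.

```python
# Tag sets taken from [Example 1] and [Example 2].
SMILE_TAGS = ("笑顔", "スマイル", "smile", "laugh")
FACE_TAGS = ("顔", "表情")


def matches(tags, any_of, none_of=()):
    """True if tags contain at least one of any_of and none of none_of."""
    # Assumption: tags are compared case-insensitively after trimming.
    tag_set = {t.strip().lower() for t in tags}
    has_required = any(k.lower() in tag_set for k in any_of)
    has_excluded = any(k.lower() in tag_set for k in none_of)
    return has_required and not has_excluded


def is_smile_sample(tags):
    # [Example 1]: any smile-related tag is present.
    return matches(tags, any_of=SMILE_TAGS)


def is_non_smile_sample(tags):
    # [Example 2]: a generic face tag is present and no smile tag is.
    return matches(tags, any_of=FACE_TAGS, none_of=SMILE_TAGS)
```

The same `matches` helper covers [Example 3] by swapping in the anger, surprise, or crying tag sets.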

FIG. 3 shows the processing flow of the sample-image face detection unit 12. First, in step S20, one item of the image data retrieved by the sample image data search unit 11 is acquired. Next, in step S21, the acquired image data is analyzed to find image regions estimated to be faces. To find these regions, face training data such as that of Non-Patent Document 3 may be prepared and the method of Patent Document 1 applied, for example. Alternatively, a program such as the one known as OpenCV may be used (http://opencv.jp/sample/object_detection.html).

In step S22, it is determined whether an image region estimated to be a face was found. If none was found, the image data is discarded and processing of that image ends, and the same face-region detection is carried out on the next image data.

If an image region estimated to be a face was found, the process proceeds to step S23, which determines whether the region satisfies a predetermined condition. If it does not, the image data is discarded and processing ends. A suitable condition for step S23 is, for example, "the number of extracted regions is 1, and the area of the extracted region is at least 10% of the area of the original image data". The effect of this condition is described later. The condition "at least 10% of the area of the original image data" can also be checked in step S21. That is, when a pattern in an image is recognized by a technique such as that of Patent Document 1, pattern matching is often repeated at various sizes; if results at sizes other than the required ones will not be used, skipping the pattern matching at those sizes makes step S21 faster.
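The checks of steps S22 and S23 can be written as a single small function. The sketch below assumes that the face detector returns rectangles as `(x, y, width, height)` tuples, which is a convention chosen here for illustration; the 10% default comes from the example condition in the text.

```python
def accept_face_regions(regions, image_width, image_height, min_area_ratio=0.10):
    """Return the single accepted face rectangle, or None to discard the image.

    regions: list of (x, y, w, h) rectangles from a face detector
             (hypothetical format, not prescribed by the text).
    """
    # Steps S22 and S23: exactly one face region must have been found.
    if len(regions) != 1:
        return None
    x, y, w, h = regions[0]
    # Step S23: the face must cover at least min_area_ratio of the image.
    if w * h >= min_area_ratio * image_width * image_height:
        return regions[0]
    return None
```

For a 200 × 200 image, a 100 × 100 face rectangle is accepted, while a 10 × 10 rectangle or an image with two detected faces is discarded.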

If the condition is satisfied, step S24 cuts the image region estimated to be a face out of the original image data, converts it to a predetermined standard format, and outputs it. Here, for example, the extracted region is converted to a "128 × 128 pixel, 8-bit grayscale image". Many image processing libraries that can perform such a conversion are in use today.
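To make the conversion of step S24 concrete, here is a dependency-free sketch that crops a rectangle and produces a 128 × 128 8-bit grayscale image. The input format (rows of RGB tuples), nearest-neighbor resampling, and the BT.601 luma weights are all assumptions made for illustration; in practice an image processing library would normally do this.

```python
def to_standard_face(pixels, region, size=128):
    """Crop region from pixels and return size x size rows of 0-255 ints.

    pixels: list of rows, each a list of (r, g, b) tuples with 0-255 values.
    region: (x, y, w, h) rectangle of the detected face.
    """
    x0, y0, w, h = region
    out = []
    for row in range(size):
        # Nearest-neighbor resampling: map the output row to a source row.
        src_y = y0 + row * h // size
        line = []
        for col in range(size):
            src_x = x0 + col * w // size
            r, g, b = pixels[src_y][src_x]
            # Grayscale conversion with ITU-R BT.601 luma weights.
            gray = int(round(0.299 * r + 0.587 * g + 0.114 * b))
            line.append(max(0, min(255, gray)))
        out.append(line)
    return out
```

Fixing the size and bit depth at this stage means every later stage sees inputs of identical shape, whatever the resolution of the downloaded image.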

For the feature calculation unit 13, a processing module that calculates features using Gabor wavelets (Gabor filters) or the like, as described in Non-Patent Document 2 or Non-Patent Document 3, may be used.
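One possible shape for such a module is sketched below: the real part of a Gabor kernel is evaluated at an 8 × 8 grid of points (64 points, as in the description of Non-Patent Document 1) for several orientations. The kernel size, wavelength, sigma, number of orientations, and the use of the absolute real response are illustrative assumptions, not values taken from the cited documents.

```python
import math


def gabor_kernel(half_size, wavelength, theta, sigma):
    """Real part of a Gabor kernel with an isotropic Gaussian envelope."""
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            # Coordinate along the carrier direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def response_at(image, kernel, cx, cy):
    """Filter response at pixel (cx, cy); coordinates are clamped at borders."""
    half = len(kernel) // 2
    height, width = len(image), len(image[0])
    total = 0.0
    for ky, row in enumerate(kernel):
        py = min(height - 1, max(0, cy + ky - half))
        for kx, weight in enumerate(row):
            px = min(width - 1, max(0, cx + kx - half))
            total += weight * image[py][px]
    return total


def gabor_features(image, grid=8, orientations=4,
                   half_size=4, wavelength=4.0, sigma=2.0):
    """Feature vector of length grid * grid * orientations."""
    height, width = len(image), len(image[0])
    kernels = [gabor_kernel(half_size, wavelength,
                            k * math.pi / orientations, sigma)
               for k in range(orientations)]
    features = []
    for gy in range(grid):
        cy = (2 * gy + 1) * height // (2 * grid)  # center of grid cell
        for gx in range(grid):
            cx = (2 * gx + 1) * width // (2 * grid)
            for kernel in kernels:
                features.append(abs(response_at(image, kernel, cx, cy)))
    return features
```

Whatever parameters are chosen, the same function has to be used both when creating the classifier and when recognizing expressions, as the recognition apparatus described earlier requires.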

The classifier creation unit 14 can be implemented as follows, depending on how the sample image data search unit 11 is implemented.

The facial expression classifier 24 is a model for identifying a tag from the features of an image, and it is obtained statistically from tagged image data such as that in [Example 1] to [Example 3] above. Examples of generating the facial expression classifier 24 for each case are described below. Hereafter, the feature of an image i is denoted x_i, and the expression represented by its accompanying tag is denoted y_i.

First, consider the case where the sample image data search unit 11 is implemented so that only data such as that of [Example 1] is acquired. In this case only image data with smile-related tags is available, so the classifier creation unit 14 builds a model based on a joint probability distribution p(x_i, y_i), called a generative model, and uses it as the facial expression classifier. Because the tags are ones that denote a smile, such as "笑顔" and "smile", treating all of them as the single expression "smile" means the resulting generative model is p(x_i; y_i = "smile").

The generative model p(x_i; y_i = "smile") is normally a joint probability distribution, and any joint probability distribution may be used. Typical examples are the normal distribution, the Gaussian mixture distribution, probabilistic latent semantic analysis (pLSA), and latent Dirichlet allocation (LDA).

Probability distributions usually have parameters, which must be estimated from the set of image data. Suppose there are n tagged images {x_i, y_i = "smile" | i = 1, 2, ..., n}, and consider, for example, the (multivariate) normal distribution.

Here, φ with the subscript y_i = "smile" denotes the multivariate normal distribution followed by the feature x_i of an image whose tag y_i is a smile, μ denotes the mean vector, and Σ denotes the variance-covariance matrix. The two parameters are μ and Σ. In this case, the parameters, namely the mean and the variance-covariance matrix, can be estimated by maximum likelihood estimation, which maximizes the likelihood of the n samples.
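The display equations that belong with this passage did not survive in this text. As a hedged reconstruction from the definitions above, and assuming d denotes the dimension of the feature vector (a symbol not used in the original), the standard multivariate normal density, its likelihood over the n samples, and the closed-form maximum likelihood estimates would read:

```latex
\phi_{y_i=\text{smile}}(x_i;\mu,\Sigma)
  = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}\,(x_i-\mu)^{\top}\Sigma^{-1}(x_i-\mu)\right)

L(\mu,\Sigma) = \prod_{i=1}^{n} \phi_{y_i=\text{smile}}(x_i;\mu,\Sigma)

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (x_i-\hat{\mu})(x_i-\hat{\mu})^{\top}
```

These are the textbook forms; the exact notation of the original equations may differ.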

In the case of the Gaussian mixture distribution or pLSA, the parameters can be estimated by the expectation-maximization (EM) method, and in the case of LDA by the variational Bayes method or the Markov chain Monte Carlo (MCMC) method. The EM method is described in Reference 1 below, the variational Bayes method in Reference 2, and the MCMC method in Reference 3.

[Reference 1]: A. P. Dempster, N. M. Laird, D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, pp. 1-38, 1977.
[Reference 2]: H. Attias, "Inferring parameters and structure of latent variable models by variational bayes," Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pp. 21-30, 1999.
[Reference 3]: S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Issue 6, pp. 721-741, 1984.

As a result, when the image data acquired in [Example 1] is used, a facial expression classifier that identifies "smile-likeness" is generated.

When new image data j is obtained and is to be classified, the feature amount x_j of image data j is first extracted, and then the joint probability p(x_j; y_j = "smile") is computed. If this probability value is greater than or equal to a certain threshold, the image is identified as a smile.
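The generative approach described so far (fit a multivariate normal to smile features, then threshold the density of a new feature) can be sketched as below. This is only an illustration under stated assumptions: the function names `fit_gaussian`, `log_density` and `is_smile` are hypothetical, features are plain lists of floats, the covariance is assumed to be positive definite, and the threshold is applied to the log-density rather than the raw probability value for numerical convenience.

```python
import math

def fit_gaussian(samples):
    """Maximum-likelihood mean vector and covariance matrix (1/n normalisation)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(x[k] for x in samples) / n for k in range(d)]
    cov = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in samples) / n
            for b in range(d)] for a in range(d)]
    return mu, cov

def cholesky(m):
    """Lower-triangular matrix `low` with low * low^T = m (m must be positive definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_density(x, mu, cov):
    """Log of the multivariate normal density at x."""
    d = len(x)
    low = cholesky(cov)
    # Forward substitution: solve low * z = (x - mu), so that z.z is the quadratic form.
    z = []
    for i in range(d):
        s = (x[i] - mu[i]) - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def is_smile(x, mu, cov, log_threshold):
    """Threshold test on the log-density (threshold value is an assumption)."""
    return log_density(x, mu, cov) >= log_threshold
```

In practice the threshold would have to be chosen on held-out data; the source does not specify how it is set.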

Next, for an implementation that acquires sample image data as in [Example 2] in addition to [Example 1], a discriminative model that distinguishes smiles from non-smiles may be generated. Compared with obtaining a generative model from smile-only image data as in [Example 1], using both [Example 1] and [Example 2] to obtain a discriminative model that actively separates smiles from non-smiles yields higher accuracy.

A discriminative model can be expressed as p(y_i | x_i), that is, the probability that the tag y_i appears given the feature amount x_i of image data i. Various models can be used, for example a linear discriminant function, a logistic regression function, a neural network, a support vector machine (SVM), or a kernel regression function. Every one of these discriminative models has parameters.

For linear discriminant functions, logistic regression functions, neural networks and the like, the parameters are optimized on the data so as to minimize an objective function such as the least-squares error criterion or the entropy maximization criterion. The optimization may use a known method such as linear optimization or a nonlinear optimization method such as the gradient method. For SVMs, kernel regression functions and the like, the margin maximization criterion via the kernel trick may be optimized by a known method such as the primal-dual interior point method. Furthermore, these discriminative models may be combined by the technique called boosting.
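As one concrete instance of the options listed above, a logistic regression model fitted by the gradient method can be sketched as follows. The choice of cross-entropy loss, the learning rate, the epoch count, and the names `train_logistic` and `predict_proba` are all assumptions made for illustration; the source does not prescribe them. Labels are 1 for smile and 0 for non-smile.

```python
import math

def sigmoid(t):
    # Clamp the argument so math.exp cannot overflow.
    t = max(-60.0, min(60.0, t))
    return 1.0 / (1.0 + math.exp(-t))

def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on cross-entropy loss."""
    d = len(features[0])
    n = len(features)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Prediction error for this sample: p(y=1|x) minus the true label.
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - y
            for k in range(d):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    """Estimated p(y = smile | x) under the fitted model."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
```

A smile/non-smile decision would then compare `predict_proba` against a cutoff such as 0.5, which is again an assumed value.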

As a result, when the sample image data of [Example 1] and [Example 2] are used, a facial expression classifier that classifies whether or not a face is smiling is created.

Next, for an implementation that acquires sample image data for a plurality of facial expressions, as in [Example 1] and [Example 3], the classifier creation unit 14 generates a plurality of the discriminative models described for [Example 2].

For example, consider the three expressions smile, angry face, and surprised face. In this case, three facial expression classifiers are prepared, and each independently identifies whether the face is a smile, whether it is an angry face, or whether it is a surprised face. Each facial expression classifier can be generated by applying the generation method described for [Example 2] as it is. Even when the number of expression types increases or decreases, the facial expression classifiers may be generated in the same way.

As a result, in this case, a facial expression classifier is created that classifies a face into one of several predetermined expression categories.

At classification time, the face is assigned to the expression whose discriminative model gives the highest probability value p(y_i | x_i). For a function that can produce multiple outputs, such as a neural network, the facial expression classifier may instead be generated so that it outputs the probability values of all target expressions at once.
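The "highest probability wins" rule over independently trained per-expression classifiers can be sketched as below. The function name `classify_expression` and the representation of each classifier as a callable returning p(y | x) are assumptions for illustration.

```python
def classify_expression(feature, classifiers):
    """Return (label, probability) for the classifier with the highest output.

    `classifiers` maps an expression label to a callable that returns
    the estimated probability p(y | x) for that expression.
    """
    scores = {label: clf(feature) for label, clf in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because the classifiers are trained independently, their outputs need not sum to one; the rule only compares them against each other.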

The effects of this embodiment are now described. Servers on the Internet known as image sharing sites currently store a very large amount of image data together with tag data. For example, searching one major image sharing site (http://www.flickr.com) for data tagged "smile" returns more than two million items from publicly available image data alone. Most of these are images taken in ordinary situations; that is, a large number of the subjects' expressions are spontaneous.

On the other hand, such image data naturally includes many items that were not uploaded and tagged with attention only to a particular expression of a person. For example, image data tagged "smile" also includes the "smile" of a pet (animal), the logo known as the smile mark, and images of a landscape that happens to look like a laughing face.

Even among smiles of people, some images are extreme close-ups of only a small part of the face (such as the eyes or mouth), and some, such as group photos, show individual faces that are small and unclear. In particular, when several subjects appear, only one person may be showing a "smile" while the others have different expressions. Data of this kind is unsuitable as correct-answer sample data for facial expression recognition.

Therefore, in this embodiment, face detection processing is performed on the image data retrieved by the tag information. By applying, for example, the pattern recognition technique of Patent Document 1 trained in advance on "human faces", data of animals, logos, and partial faces that do not match the "human face" pattern can be excluded. Furthermore, by adding a condition such as "the number of extracted regions is 1, and the area of the extracted region is at least 10% of the area of the original image data", the data can be narrowed down to images that show only one particular person with a reasonably large face. Even after this narrowing, "face image sample data with spontaneous expressions" can still be collected more easily and on a larger scale than with the method described in Non-Patent Document 4.
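The example condition quoted above (exactly one detected region covering at least 10% of the image area) can be sketched as a small filter. The function name `select_training_face` and the `(x, y, width, height)` box format are assumptions; the face detector that produces the boxes is outside this sketch.

```python
def select_training_face(face_boxes, image_width, image_height, min_area_ratio=0.10):
    """Return the single detected face box if it meets the conditions, else None.

    `face_boxes` is a list of (x, y, width, height) tuples from some face detector.
    """
    # Condition 1: exactly one face region was detected.
    if len(face_boxes) != 1:
        return None
    _, _, w, h = face_boxes[0]
    # Condition 2: the region covers at least min_area_ratio of the image area.
    if w * h < min_area_ratio * image_width * image_height:
        return None
    return face_boxes[0]
```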

In particular, an image whose composition satisfies such conditions can be regarded as having been taken with that one particular person as the main subject, and the tags attached to such image data naturally often include keywords that suggest that person's expression. Of course, this does not hold absolutely; for example, the possibility is not zero that someone deliberately tags an image of a crying face with "smile" in the hope that the person will laugh more. However, the proportion of such data is small, and in this embodiment the obtained sample data is processed by machine learning methods based mainly on statistical theory, so a significant facial expression classifier is still generated even if some such data is mixed in.

[Second Embodiment]
Next, an embodiment of the facial expression recognition apparatus 20 is described. The facial expression recognition apparatus 20 recognizes the facial expression of a person shown in a given input image 31 and outputs the recognition result as a facial expression label 32. It can be realized with the configuration shown in FIG. 1.

The image data acquisition unit 21 has the function of acquiring the image data whose expression is to be determined. Specifically, it can be realized by a software module that takes an image data file name given as a parameter by the operator of the apparatus, opens the file, and reads out the image data. Alternatively, it may be a module that retrieves image data over the standardized HTTP protocol when the data is specified in URL form. It may also be a module that, with a device such as a camera attached, captures image information from the camera when triggered by some operation of the operator.
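The first two acquisition variants (a local file name or a URL fetched over HTTP) can be sketched with the Python standard library as below. The function name `load_image_bytes` and the timeout value are assumptions; camera capture is device-specific and is not covered.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

def load_image_bytes(source, timeout=10.0):
    """Return raw image bytes from an http(s) URL or a local file path."""
    # Treat the source as a URL only when it carries an http or https scheme.
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source, timeout=timeout) as response:
            return response.read()
    # Otherwise interpret it as a path to a local image file.
    with open(source, "rb") as f:
        return f.read()
```

Decoding the returned bytes into pixel data is left to whatever image library the rest of the pipeline uses.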

The face detection unit 22 for the input image can be implemented by the same method as in the first embodiment (FIG. 3). In this case, however, it is preferable to relax the condition of step S23 in FIG. 3. For example, no limit is placed on the number of recognized regions, and the size condition is set to something like 32 × 32 pixels or larger. This allows each person's expression to be recognized even in image data that shows several people or in which faces appear somewhat small.
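The relaxed condition for recognition (any number of faces, each at least about 32 × 32 pixels) can be sketched as follows. The function name `select_recognition_faces` and the `(x, y, width, height)` box format are assumptions for illustration.

```python
def select_recognition_faces(face_boxes, min_side=32):
    """Keep every detected face box that is at least min_side pixels wide and high."""
    return [box for box in face_boxes if box[2] >= min_side and box[3] >= min_side]
```

Each surviving box would then be passed separately through feature calculation and the facial expression classifier, giving one label per person.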

The feature amount calculation unit 13 uses the same processing function as in the first embodiment. If the calculated feature amounts differ, the facial expression classifier cannot output an appropriate result label. To indicate that the two are identical, FIG. 1 shows them as a single component. Of course, separate modules with the same processing function may be provided for each apparatus.

As the facial expression classifier 24, the facial expression classifier actually created by the facial expression classifier creation apparatus 10 described in the first embodiment is used. That is, the facial expression classifier 24 is a machine learning processing module that has gone through an appropriate "learning" process; this module converts the feature amount given as input into the facial expression label 32, which is the estimated result, and outputs it. As described in the first embodiment, depending on the learned model, the output is information such as:
- "smile-likeness",
- "smile or not",
- "one of smile, angry face, or surprised face".

The facial expression classifier creation and facial expression recognition processing described above can be realized by a computer and a software program, and the program can be recorded on a computer-readable recording medium or provided through a network.

FIG. 1 shows an example configuration in which the facial expression classifier creation apparatus 10 is incorporated in the facial expression recognition apparatus 20. The two can be realized on the same computer, or separately on different computers.

DESCRIPTION OF SYMBOLS
10 Facial expression classifier creation apparatus
11 Sample image data search unit
12 Face detection unit for sample images
13 Feature amount calculation unit
14 Classifier creation unit
20 Facial expression recognition apparatus
21 Image data acquisition unit
22 Face detection unit for input image
24 Facial expression classifier
30 Sample image data
31 Input image
32 Facial expression label

Claims

A facial expression classifier creation apparatus that creates a classifier for recognizing the facial expression of a person present in image data, comprising:
a sample image data search unit having a function of searching, for a given keyword, image data published on a network for image data whose associated keyword matches the given keyword;
a face detection unit having a function of extracting, from the retrieved image data, regions in the image estimated to be a person's face, and outputting those for which the number and area of the regions satisfy predetermined conditions;
a feature amount calculation unit that computes a feature amount consisting of multidimensional numerical data from the image data of the output face region; and
a classifier creation unit that creates a classifier by machine learning using the computed feature amount as training data.

The facial expression classifier creation apparatus according to claim 1, wherein
the sample image data search unit has a function of searching for image data that contains many smiles by being given a keyword related to smiles, and
the classifier creation unit creates a facial expression classifier for identifying smiles.

The facial expression classifier creation apparatus according to claim 2, wherein
the sample image data search unit further has a function of searching for image data that does not contain many smiles by being given a keyword related to an expression other than a smile, and
the classifier creation unit creates a facial expression classifier that determines whether or not a face is smiling, by machine learning that searches for the difference in statistical distribution between the feature amounts obtained from the image data containing many smiles and the feature amounts obtained from the image data not containing many smiles.

The facial expression classifier creation apparatus according to claim 1, wherein
the sample image data search unit has a function of searching for image data that contains many instances of a specific expression by being given a keyword concerning the specific expression, the specific expression relating to a smile, an angry face, a surprised face, or a crying face, and also has a function of searching for image data that does not contain many instances of the specific expression by using a keyword related to faces or persons as a condition and adding the condition that the image data is not associated with the keyword concerning the specific expression, and
the classifier creation unit creates a facial expression classifier that determines whether or not a face shows the specific expression, by machine learning that searches for the difference in statistical distribution between the feature amounts obtained from the image data containing many instances of the specific expression and the feature amounts obtained from the image data not containing many instances of the specific expression.

A facial expression classifier creation method in which a computer creates a classifier for recognizing the facial expression of a person present in image data, comprising:
a sample image data search step of searching, for a given keyword, image data published on a network for image data whose associated keyword matches the given keyword;
a face detection step of extracting, from the retrieved image data, regions in the image estimated to be a person's face, and outputting those for which the number and area of the regions satisfy predetermined conditions;
a feature amount calculation step of computing a feature amount consisting of multidimensional numerical data from the image data of the output face region; and
a classifier creation step of creating a classifier by machine learning using the computed feature amount as training data.

A facial expression recognition apparatus that automatically recognizes the facial expression of a person in a given image, comprising:
an image data acquisition unit that acquires image data to be subjected to expression recognition;
a face detection unit that extracts, from the acquired image data, a region in the image estimated to be a person's face;
a feature amount calculation unit that computes a feature amount consisting of multidimensional numerical data from the image data of the extracted face region; and
a facial expression classifier created by the facial expression classifier creation apparatus according to any one of claims 1 to 4,
wherein the feature amount calculation unit calculates the feature amount from the image data of the face region using the same calculation method as that used to create the facial expression classifier in the facial expression classifier creation apparatus, and supplies the calculated feature amount as input to the facial expression classifier so that the facial expression classifier outputs an expression identification result.

A facial expression recognition method in which a computer automatically recognizes the facial expression of a person in a given image, comprising:
an image data acquisition step of acquiring image data to be subjected to expression recognition;
a face detection step of extracting, from the acquired image data, a region in the image estimated to be a person's face;
a feature amount calculation step of computing a feature amount consisting of multidimensional numerical data from the image data of the extracted face region; and
a facial expression identification step of identifying the expression by inputting the feature amount into a facial expression classifier created by the facial expression classifier creation apparatus according to any one of claims 1 to 4,
wherein, in the feature amount calculation step, the feature amount is calculated from the image data of the face region using the same calculation method as that used to create the facial expression classifier in the facial expression classifier creation apparatus, and the calculated feature amount is supplied as input to the facial expression classifier so that the facial expression classifier outputs an expression identification result.

A facial expression classifier creation program for causing a computer to execute the facial expression classifier creation method according to claim 5.

A facial expression recognition program for causing a computer to execute the facial expression recognition method according to claim 7.