JP6697356B2

JP6697356B2 - Device, program and method for identifying state of specific object among predetermined objects

Info

Publication number: JP6697356B2
Application number: JP2016178294A
Authority: JP
Inventors: 剣明呉; 矢崎　智基
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-09-13
Filing date: 2016-09-13
Publication date: 2020-05-20
Anticipated expiration: 2036-09-13
Also published as: JP2018045350A

Description

The present invention relates to a technique for identifying the state of a predetermined target based on information about that target.

Conventionally, various techniques have been devised for identifying the state of a predetermined target, for example a human facial expression, using information about that target, for example a photographic image of the face.

In the field of human facial expression recognition in particular, many researchers are working to improve recognition technology, adopting models such as the three-class model (positive, negative, neutral) and Paul Ekman's seven-class model (neutral, joy, disgust, anger, surprise, sadness, fear).

As one example of such efforts, Patent Document 1 discloses a technique that learns feature amounts from a large amount of face image data based on the above classification models and identifies facial expressions based on those feature amounts. This technique aims in particular to efficiently collect training data of natural facial expressions, rather than intentionally posed faces, and to create a classifier with high recognition accuracy.

Japanese Unexamined Patent Application Publication No. 2011-150381 (JP 2011-150381 A)

However, with conventional techniques such as the one described in Patent Document 1, even when the facial expression of a specific individual is determined, the result often differs from the actual expression because of that individual's own tendency in showing expressions. This is a serious problem.

That is, it is well known that the expressions that appear show particular tendencies depending on the individual's personality, ethnicity, region of residence, and so on; for example, a face that looks stern by nature, or a restrained expression of anger. Conventional expression determination, by contrast, as in the technique of Patent Document 1, uses a classifier that has learned the feature amounts of a large amount of face image data. Because the particular tendency of the person being identified often deviates from the general tendency of facial expression, it becomes a cause of identification failures.

Accordingly, an object of the present invention is to provide a device, a program, and a method capable of more reliably identifying a state of a predetermined target where the tendency with which the state manifests differs for each individual target or for each type of target.

According to the present invention, there is provided a state identification device that identifies a state of a predetermined target, where the tendency with which the state manifests differs for each individual target or for each type of target, based on target information relating to the predetermined target, the device comprising:
score determination means for determining, from input target information, scores indicating the degree to which the target relating to that target information can take each state, using an identification model that is determined based on a large amount of target information and that outputs scores indicating the degree of being each of a plurality of possible states;
clustering means for classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the scores of each state determined from those pieces of target information;
correct answer determination means for associating each of the plurality of states with each of the plurality of clusters, and determining the state associated with the cluster to which a piece of target information relating to the specific target belongs as the correct answer for that piece of target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model, which is determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for them, and determining from its output the state relating to that one piece of target information for the specific target.

According to the present invention, there is also provided a state identification device that identifies a state of a predetermined target, where the tendency with which the state manifests differs for each individual target or for each type of target, based on target information relating to the predetermined target, the device comprising:
score determination means for determining, from input target information, a score representing the state of the target relating to that target information, using an identification model determined based on a large amount of target information;
correct answer determination means for determining, where a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters each associated with a state based on the scores determined from them, the state corresponding to the cluster to which a piece of target information relating to the specific target belongs as the correct answer for that piece of target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model, and determining from its output the state relating to that one piece of target information for the specific target, the specific identification model being determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for them, and obtaining an identification hyperplane that maximizes the distance to each feature point in the feature space formed by feature amounts generated from the input scores.
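The identification hyperplane in the last element can be written in the usual maximum-margin form. The following is a sketch in standard support vector machine notation, not notation taken from this document: x_i is assumed to be the feature vector generated from the scores of the i-th piece of target information, y_i in {-1, +1} is assumed to encode its correct answer for a two-state case, and w and b are the parameters of the hyperplane.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
\qquad i = 1, \dots, N
```

Under these assumptions, minimizing the norm of w maximizes the margin 2/||w|| between the hyperplane and the closest feature points. Handling more than two states would need a multi-class extension such as one-versus-rest, which is likewise an assumption here.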

According to the present invention, there is further provided a state identification device that identifies a state of a predetermined target, where the tendency with which the state manifests differs for each individual target or for each type of target, based on target information relating to the predetermined target, the device comprising:
score determination means for determining, from input target information, a score representing the state of the target relating to that target information, using an identification model determined based on a large amount of target information;
correct answer determination means for determining, where a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters each associated with a state based on the scores determined from them, the state corresponding to the cluster to which a piece of target information relating to the specific target belongs as the correct answer for that piece of target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model, and determining from its output the state relating to that one piece of target information for the specific target, the specific identification model being determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for them, including weighting coefficients for the input scores, and updating the weighting coefficients so as to reduce the error between the state relating to the determined correct answer and the output of the model.
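The weighting-coefficient update described above can be illustrated with a small error-correction loop. This is a minimal sketch, not the patented implementation: the perceptron-style rule, the function names `predict` and `train_specific_model`, the learning rate, and the two-state example scores are all assumptions chosen for illustration. The text only states that the coefficients are updated so that the error between the correct state and the model output decreases.

```python
def predict(weights, scores):
    """Return the index of the state whose weighted sum of input scores is largest."""
    sums = [sum(w * s for w, s in zip(row, scores)) for row in weights]
    return max(range(len(sums)), key=sums.__getitem__)


def train_specific_model(score_vectors, correct_states, n_states, lr=1.0, epochs=200):
    """Fit one row of weighting coefficients per state (hypothetical helper).

    weights[k][j] is the coefficient applied to input score j for state k.
    Whenever the model output differs from the correct state, the coefficients
    are moved toward the correct state and away from the wrongly output one.
    """
    n_inputs = len(score_vectors[0])
    weights = [[0.0] * n_inputs for _ in range(n_states)]
    for _ in range(epochs):
        mistakes = 0
        for scores, target in zip(score_vectors, correct_states):
            output = predict(weights, scores)
            if output != target:
                mistakes += 1
                for j, s in enumerate(scores):
                    weights[target][j] += lr * s
                    weights[output][j] -= lr * s
        if mistakes == 0:  # every training sample already matches its correct answer
            break
    return weights


# Assumed scores [neutral, anger] from a general model for one user whose
# anger is restrained: even the angry images score higher on "neutral".
user_scores = [[0.9, 0.1], [0.85, 0.15], [0.6, 0.4], [0.65, 0.35]]
user_correct = [0, 0, 1, 1]  # correct answers, e.g. obtained from clustering
user_weights = train_specific_model(user_scores, user_correct, n_states=2)
```

In this example the general model's largest score would call every image "neutral", whereas the user-specific coefficients separate the two groups.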

Further, as one embodiment of the state identification device according to the present invention, it is also preferable that the predetermined target is a human face, the state is a facial expression, and the target information is information relating to an image of a human face;
that the specific target is an individual whose facial expression is to be identified, or a predetermined attribute group to which the person whose facial expression is to be identified belongs; and
that the state determination means identifies the expression appearing on the face in given image information, based on image information of the facial expression of that individual or of a person belonging to that attribute group.

It is also preferable that, in the state identification device according to the present invention, the classification of the plurality of pieces of target information into the clusters is performed using the k-means method in the space formed by the scores.
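As a rough illustration of clustering in the space formed by the scores, the following standard-library sketch runs plain k-means on per-image score vectors. The function name `kmeans`, the initialization from the first k points, the iteration limit, and the example scores are assumptions made for this sketch; the text specifies only that the k-means method is used.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means on score vectors; returns (assignment, centers).

    Assumption: the centers are initialised from the first k points, which keeps
    the sketch deterministic but is not a robust initialisation in general.
    """
    centers = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign every score vector to its nearest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Move every center to the mean of the score vectors assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:  # assignments stable: converged
            break
        assignment = new_assignment
    return assignment, centers


# Assumed two-state scores [neutral, anger] for four images of one user.
image_scores = [[0.9, 0.1], [0.2, 0.8], [0.85, 0.15], [0.25, 0.75]]
cluster_of_image, cluster_centers = kmeans(image_scores, k=2)
```

Each image ends up with a cluster index, and each cluster has a center in score space that later steps can associate with a state.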

Further, it is also preferable that the identification model used by the score determination means of the state identification device according to the present invention is a learned model of a convolutional neural network (CNN) that includes convolutional layers.

According to the present invention, there is further provided a state identification device that identifies a state of a predetermined target, where the tendency with which the state manifests differs for each individual target or for each type of target, based on target information relating to the predetermined target, the device comprising:
score determination means for determining, from input target information, scores indicating the degree to which the target relating to that target information can take each state, using an identification model that is determined based on a large amount of target information and that outputs scores indicating the degree of being each of a plurality of possible states;
clustering means for classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the scores of each state determined from those pieces of target information; and
state determination means for associating each of the plurality of states with each of the plurality of clusters, and determining, as the state relating to one piece of target information relating to the specific target, the state associated with the cluster whose center, among the centers of the plurality of clusters, is closest to the score determined for that one piece of target information.
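The nearest-center rule of this aspect is simple enough to sketch directly. The function name `determine_state`, the use of Euclidean distance, and the example centers and state names are assumptions for illustration; the text only requires choosing the cluster whose center has the smallest distance to the score.

```python
import math


def determine_state(score, centers, cluster_states):
    """Return the state associated with the cluster center closest to `score`.

    `centers[c]` is the center of cluster c in score space and
    `cluster_states[c]` is the state associated with that cluster.
    """
    nearest = min(range(len(centers)), key=lambda c: math.dist(score, centers[c]))
    return cluster_states[nearest]


# Assumed cluster centers for one user, in [neutral, anger] score space.
example_centers = [[0.875, 0.125], [0.6, 0.4]]
example_states = ["neutral", "anger"]
restrained_anger = determine_state([0.65, 0.35], example_centers, example_states)
```

Here a score of [0.65, 0.35] is assigned "anger" because it lies next to this user's anger cluster, even though its largest component is "neutral".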

According to the present invention, there is also provided a state identification program for a computer mounted on a device that identifies a state of a predetermined target, where the tendency with which the state manifests differs for each individual target or for each type of target, based on target information relating to the predetermined target, the program causing the computer to function as:
score determination means for determining, from input target information, scores indicating the degree to which the target relating to that target information can take each state, using an identification model that is determined based on a large amount of target information and that outputs scores indicating the degree of being each of a plurality of possible states;
clustering means for classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the scores of each state determined from those pieces of target information;
correct answer determination means for associating each of the plurality of states with each of the plurality of clusters, and determining the state associated with the cluster to which a piece of target information relating to the specific target belongs as the correct answer for that piece of target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model, which is determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for them, and determining from its output the state relating to that one piece of target information for the specific target.

According to the present invention, there is further provided a state identification method performed by a computer mounted on a device that identifies a state of a predetermined target, where the tendency with which the state manifests differs for each individual target or for each type of target, based on target information relating to the predetermined target, the method comprising the steps of:
determining, from input target information, scores indicating the degree to which the target relating to that target information can take each state, using an identification model that is determined based on a large amount of target information and that outputs scores indicating the degree of being each of a plurality of possible states;
classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the scores of each state determined from those pieces of target information;
associating each of the plurality of states with each of the plurality of clusters, and determining the state associated with the cluster to which a piece of target information relating to the specific target belongs as the correct answer for that piece of target information; and
inputting the score determined for one piece of target information relating to the specific target into a specific identification model, which is determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for them, and determining from its output the state relating to that one piece of target information for the specific target.

According to the state identification device, program, and method of the present invention, a state of a predetermined target can be identified more reliably even where the tendency with which the state manifests differs for each individual target or for each type of target.

A functional block diagram showing the functional configuration of one embodiment of the state identification device according to the present invention.
A schematic diagram showing one embodiment of the facial expression identification model built and used by the facial expression identification engine.
A table showing one example of the score determination processing in the facial expression score determination unit (facial expression identification engine).
A table showing one example of the processing in the image clustering unit and the correct facial expression determination unit.
A schematic diagram showing one embodiment of the training of the classifier of the specific identification model used by the state determination unit.
A schematic diagram for explaining the decision boundary of the SVM adopted as the classifier of the specific identification model.
A functional block diagram showing the functional configuration of another embodiment of the state identification device according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

[Device Configuration in One Embodiment]
FIG. 1 is a functional block diagram showing the functional configuration of one embodiment of the state identification device according to the present invention.

According to FIG. 1, the smartphone 1 serving as the state identification device of this embodiment has a built-in camera 105 of known configuration. Using the camera 105, it can, for example, photograph the user's face to generate a photographic image of the face (a personal image), identify the facial expression of the user shown in the generated image, and display the identification result on the touch panel display (TP/DP). Naturally, a photographic image of a face to be subjected to expression identification can also be acquired from outside via a communication network and processed.

As one application example, an application 121 of the smartphone 1, for example a dialogue AI app, can use the expression identification result to understand, for instance, the emotion (utterance intention) of the user it is talking with, and then adjust its responses or personalize the content of the dialogue with that user.

Further, in this embodiment, the smartphone 1 preferably acquires from the image management server 2 a large number of general images (photographic images of the faces of various people) for training the facial expression identification engine 112 used for expression identification.

The smartphone 1, as the state identification device according to the present invention, is a device that identifies a state (for example, a facial expression) of a predetermined target (for example, a human face), where the tendency with which the state manifests differs for each individual target (for example, each person) or for each type of target (for example, ethnicity or region of residence), based on target information relating to the predetermined target (for example, (information relating to) a photographic image of a face). It is characterized by having:
(A) score determination means (facial expression score determination unit 112b) that uses an "identification model" determined based on a large amount of target information (photographic images) to determine, from input target information (a photographic image), a score representing the state (facial expression) of the target relating to that target information;
(B) correct answer determination means (correct facial expression determination unit 114) that, where a plurality of pieces of target information (photographic images) relating to a specific target (for example, the face of a specific user), which is the state identification target among the predetermined targets (human faces), are classified into a plurality of clusters each associated with a state based on the scores determined from them, determines the state corresponding to the cluster to which a piece of target information (photographic image) relating to the specific target belongs as the correct answer for that piece of target information; and
(C) state determination means (facial expression determination unit 115) that inputs the score determined for one piece of target information (photographic image) relating to the specific target (the face of the specific user) into a "specific identification model", which is determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for them, and determines from its output the state relating to that one piece of target information (the expression appearing on the specific user's face in the photographic image).
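Step (B) can be pictured as turning cluster membership into training labels. The sketch below assumes that the clustering has already produced a cluster index per image and a center per cluster, and it associates clusters with states by a greedy one-to-one matching on the center scores. That matching rule, the function names `associate_states` and `correct_answers`, and the example values are hypothetical; the passage above does not say how clusters and states are associated.

```python
def associate_states(centers, states):
    """Associate each cluster with exactly one state (hypothetical greedy rule).

    Assumes len(centers) == len(states) and that component s of a center is the
    mean score for states[s]. Pairs (cluster, state) are taken in descending
    order of center score, skipping clusters or states that are already used.
    """
    candidates = sorted(
        ((centers[c][s], c, s) for c in range(len(centers)) for s in range(len(states))),
        reverse=True,
    )
    mapping, used_states = {}, set()
    for _, c, s in candidates:
        if c not in mapping and s not in used_states:
            mapping[c] = states[s]
            used_states.add(s)
    return mapping


def correct_answers(assignment, mapping):
    """Give each image the state of its cluster as its correct answer."""
    return [mapping[cluster] for cluster in assignment]


# Assumed clustering result for one user in [neutral, anger] score space.
user_centers = [[0.875, 0.125], [0.625, 0.375]]
user_assignment = [0, 1, 0, 1]  # cluster index of each of four images
state_of_cluster = associate_states(user_centers, ["neutral", "anger"])
image_labels = correct_answers(user_assignment, state_of_cluster)
```

The labels produced this way, together with the original scores, are what a user-specific model in step (C) would be trained on.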

In this way, the smartphone 1 does not identify the expression by relying solely on the scores determined by the expression classifier. Instead, it uses clustering to determine correct answers in advance for the target information (for example, photographic images) of the specific target (for example, the face of a specific user). This makes it possible to use a "specific identification model" adapted to identifying this specific target, and as a result the state of the specific target (the facial expression of the specific user) can be identified more reliably and with high accuracy.

When identifying human facial expressions as in this embodiment, the specific target to be identified can be a specific individual whose expression is to be identified (for example, the user of the smartphone 1), or a predetermined attribute group to which the person whose expression is to be identified belongs, for example the ethnicity or region of residence of a specific individual.

In fact, various studies have examined differences in expression identification results by country or ethnicity (region) and by personal attributes such as age and gender, using widely adopted expression category models such as the positive/neutral/negative three-class model and Ekman's seven-class model.

For example, the research paper Jack, R. E., Blais, C., Scheepers, C., Schyns, P. G., and Caldara, R., "Cultural confusions show that facial expressions are not universal", Current Biology, 19, 2009, pp. 1543-1548, presents experimental results indicating that East Asian subjects, compared with European subjects, tend toward expressions in which fear is confused with surprise and disgust with anger. As the cause, it states that European subjects, when observing the expressions of others, look at the eyes and the mouth to about the same degree, that is, at the whole face, whereas East Asian subjects fixate more on the eyes.

Further, the research paper Yuki, M., Maddux, W. W., and Masuda, T., "Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States", Journal of Experimental Social Psychology, 43, 2007, pp. 303-311, states that Japanese people, when evaluating facial expressions showing joy or sadness, tend to place more weight on the eyes than on the mouth compared with Americans.

Despite the individual differences and the differences by country, ethnicity, and region that these studies indicate, processing to determine human expressions has conventionally been performed using a classifier that has learned the feature amounts of a large and diverse body of face image data. As a result, it has not been uncommon, for example, to fail to identify the expression of a specific individual. In contrast, the smartphone 1 uses clustering to determine correct answers in advance for a specific user's face and then builds a better-adapted classifier, so that this specific user's expressions can be identified more reliably.

Note that the state identification device according to the present invention, as embodied in the smartphone 1 above, is not applicable only to human facial expressions as the state of the predetermined target to be identified. According to the present invention, various states can be identified more reliably, provided that the tendency with which they manifest differs for each individual target or for each type of target. In other words, whereas such differing tendencies have conventionally caused large errors and mistakes in identification results, the present invention makes it possible to identify such states more accurately.

Further, the state identification device according to the present invention, as embodied in the smartphone 1, is of course not limited to a smartphone. For example, a tablet computer, a notebook computer, a personal computer, a set-top box, a robot, digital signage, or the like can also be adopted as the state identification device. In such devices (terminals) with a built-in camera, reading the user's expression makes it possible, for example, to respond according to information relating to the expression that was read, or to evaluate an action previously taken toward the user based on that information.

同じく図１の機能ブロック図に示すように、状態識別装置（表情識別装置）である本実施形態のスマートフォン１は、通信インタフェース部１０１と、一般画像データベース１０２と、個人画像データベース１０３と、表情データ記憶部１０４と、カメラ１０５と、タッチパネル・ディスプレイ（ＴＰ・ＤＰ）１０６と、プロセッサ・メモリとを有する。ここで、プロセッサ・メモリは、スマートフォン１のコンピュータを機能させるプログラムを実行することによって、状態識別機能（表情識別機能）を実現させる。 Similarly, as shown in the functional block diagram of FIG. 1, the smartphone 1 of the present embodiment, which is a state identification device (facial expression identification device), includes a communication interface unit 101, a general image database 102, a personal image database 103, and facial expression data. It has a storage unit 104, a camera 105, a touch panel display (TP / DP) 106, and a processor memory. Here, the processor memory realizes a state identification function (facial expression identification function) by executing a program that causes the computer of the smartphone 1 to function.

さらに、このプロセッサ・メモリは、機能構成部として、画像管理部１１１と、識別モデル学習部１１２ａ及び表情スコア決定部１１２ｂを有する表情識別エンジン１１２と、画像クラスタリング部１１３と、正解表情決定部１１４と、表情決定部１１５と、アプリケーション１２１とを有する。ここで、図１におけるスマートフォン１の機能構成部間を矢印で接続して示した処理の流れは、本発明による表情識別方法の一実施形態としても理解される。 Further, the processor / memory includes an image management unit 111, a facial expression identification engine 112 having an identification model learning unit 112a and a facial expression score determination unit 112b, an image clustering unit 113, and a correct facial expression determination unit 114 as functional components. The facial expression determination unit 115 and the application 121 are included. Here, the process flow shown by connecting the functional components of the smartphone 1 in FIG. 1 with arrows is understood as an embodiment of the facial expression identification method according to the present invention.

通信インタフェース部１０１は、表情識別エンジン１１２における学習用の大量の一般画像を、画像管理サーバ２からインターネット等の通信ネットワークを介して取得する。また、通信インタフェース部１０１は、本発明に係る表情識別プログラム（アプリ）や、当該表情識別結果を利用したサービスを提供可能なアプリケーション・プログラム、例えば対話ＡＩアプリ、をダウンロードすることもできる。 The communication interface unit 101 acquires a large amount of general images for learning in the facial expression identification engine 112 from the image management server 2 via a communication network such as the Internet. The communication interface unit 101 can also download the facial expression identification program (application) according to the present invention or an application program capable of providing a service using the facial expression identification result, for example, a dialogue AI application.

画像管理部１１１は、カメラ１０５から、又は外部の情報機器から通信インタフェース１０１を介して、表情識別対象である特定の個人（例えばスマートフォン１のユーザ）の個人画像を取得し、個人画像データベース１０３に保存し管理することができる。また、通信インタフェース１０１を介して取得された一般画像も、一般画像データベース１０２に保存し管理してもよい。例えば、個人画像データに対しては、（例えばユーザの指定入力に基づく）個人画像ラベルを付与して管理することも好ましい。 The image management unit 111 acquires a personal image of a specific individual (for example, the user of the smartphone 1) as a facial expression identification target from the camera 105 or an external information device via the communication interface 101, and stores the personal image in the personal image database 103. Can be saved and managed. Further, general images acquired via the communication interface 101 may also be stored and managed in the general image database 102. For example, it is also preferable to add and manage a personal image label (for example, based on a user's designated input) to the personal image data.

In this embodiment, the facial expression identification engine 112 has an identification model learning unit 112a and a facial expression score determination unit 112b. The identification model learning unit 112a trains on the large number of acquired general images (photographic images of various human faces) to construct and determine a facial expression identification model. This model can be, for example, a classifier that includes a convolutional neural network (CNN), a kind of deep learning, and can be regarded as a classifier for people in general, or for a population with average or common tendencies in facial expression.

一方、表情スコア決定部１１２ｂは、構築・決定された表情識別モデルを用いて、入力された対象情報からこの対象情報に係る対象の状態を表すスコアを決定する。 On the other hand, the facial expression score determination unit 112b determines, from the input target information, the score representing the state of the target related to the target information, using the constructed / determined facial expression identification model.

図２は、表情識別エンジン１１２で構築・使用される表情識別モデルの一実施形態を示す模式図である。 FIG. 2 is a schematic diagram showing an embodiment of a facial expression identification model constructed and used by the facial expression identification engine 112.

As shown in FIG. 2, the facial expression identification model constructed and determined by the facial expression identification engine 112 in this embodiment is based on a convolutional neural network (CNN, ConvNet), a kind of feedforward network. The CNN contains multiple convolutional layers. A convolutional layer mimics the function of simple cells in the animal visual cortex and performs convolution, sliding a kernel (a weighting matrix filter) over the image to generate a feature map. This convolution extracts basic features such as edges and gradients while gradually lowering the image resolution, and yields information on local correlation patterns.

It is also preferable that each convolutional layer is paired with a pooling layer (subsampling layer) so that convolution and pooling are repeated. Pooling mimics the function of complex cells in the animal visual cortex: it summarizes the feature map output by the convolutional layer (the responses of the convolution filter within a given region) by a maximum, an average, or a similar value, which reduces the number of tunable parameters while securing local translation invariance. This absorbs differences in appearance caused by small shifts in the image, such as face size, face orientation, head tilt, and added accessories such as hats and sunglasses, so that appropriate features capturing the essential characteristics are obtained.
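The convolution and pooling described above can be illustrated with a minimal sketch. It is not the network of FIG. 2: the 5x5 image patch, the 2x2 kernel, and the function names `convolve2d` and `max_pool` are hypothetical and chosen only to show how a kernel produces a feature map and how max pooling absorbs a one-pixel shift.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in most CNN implementations, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Summarize each non-overlapping size x size window by its maximum."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 5x5 patch with a dark-to-bright vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# The same patch with the edge shifted one pixel to the left.
shifted = [[0, 1, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to a dark-to-bright step.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, edge_kernel)          # 4x4 map, peak at the edge
pooled = max_pool(feature_map)                        # 2x2 summary
pooled_shifted = max_pool(convolve2d(shifted, edge_kernel))
```

In this example the two feature maps differ, but `pooled` and `pooled_shifted` are identical, which is the local translation invariance that the pooling layer is meant to provide.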

The identification model learning unit 112a (FIG. 1) of the facial expression identification engine 112 trains this CNN using, for example, a large-scale image data set made up of the many general images stored in the general image database 102 (FIG. 1). Specifically, images from this data set are input to the CNN; the response of the multilayer network formed by the CNN layers other than the final layer is output as a feature; this output is compared with the correct answer; and training proceeds by generating and updating the neuron connection weights, the network configuration parameters, and so on.

ここで、本実施形態では、入力する大規模画像データセットの画像を、ポジティブ、ニュートラル、ネガティブという表情に関する３つのカテゴリに予め分類しておき、この分類結果を正解として使用する。 Here, in the present embodiment, the images of the input large-scale image data set are classified in advance into three categories regarding facial expressions of positive, neutral, and negative, and the classification result is used as a correct answer.

図３は、表情スコア決定部１１２ｂ（表情識別エンジン１１２）におけるスコア決定処理の一実施例を示すテーブルである。 FIG. 3 is a table showing an example of score determination processing in the facial expression score determination unit 112b (facial expression identification engine 112).

ここで、本実施形態において、スコアは、スコア算定対象の画像を、上述したような表情識別モデルの識別器に入力した結果出力される値であり、ポジティブ、ニュートラル、ネガティブの３項目の各々についての値となっている。すなわち、スコア算定対象である１つの画像を入力することによって、これら３つのスコアの組が１つ出力されるのである。以下、このスコアの組を単にスコアと称呼する場合もある。なお、本実施形態のこれら３つのスコアは、各項目の度合いをレコード間で比較しやすいように、合計値が１となるように規格化されている。 Here, in the present embodiment, the score is a value output as a result of inputting the image for score calculation to the discriminator of the facial expression discrimination model as described above, and for each of the three items of positive, neutral, and negative. Is the value of. That is, by inputting one image to be score-calculated, one set of these three scores is output. Hereinafter, this set of scores may be simply referred to as a score. It should be noted that these three scores in the present embodiment are standardized so that the total value becomes 1 so that the degree of each item can be easily compared between records.

FIG. 3(A) shows the scores for images of “expressions actually judged to be negative” for user A, user B, and so on. User A is the normal type, regarded as typical in how expressions are shown, and indeed the negative value (0.90) is the largest of user A's scores. User B, by contrast, is the “type whose expression stays restrained even when angry”, so the neutral value (0.65) is the largest of user B's scores even though the expression is “actually negative”.

Incidentally, if the expression were judged with only the classifier of this facial expression identification model, the category with the largest of the three scores would be output as the identification result. In FIG. 3(A), for example, user A's expression is identified as negative, but user B's expression is wrongly identified as neutral.
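The largest-score rule can be sketched as follows. Only the values 0.90 (user A, negative) and 0.65 (user B, neutral) come from FIG. 3(A); the remaining values, the simple sum-to-one normalization, and the names `normalize` and `identify_by_max` are assumptions made for illustration.

```python
CATEGORIES = ("positive", "neutral", "negative")


def normalize(raw_values):
    """Scale non-negative raw values so that the three scores sum to 1."""
    total = sum(raw_values)
    return [v / total for v in raw_values]


def identify_by_max(scores):
    """Return the category whose score is largest (general classifier only)."""
    best = max(range(len(CATEGORIES)), key=lambda i: scores[i])
    return CATEGORIES[best]


# Scores in the order (positive, neutral, negative) for images whose
# expression is actually negative. Only 0.90 and 0.65 are from FIG. 3(A);
# the other values are hypothetical fillers.
user_a = [0.02, 0.08, 0.90]
user_b = [0.05, 0.65, 0.30]

result_a = identify_by_max(user_a)   # "negative": correct
result_b = identify_by_max(user_b)   # "neutral": wrong for user B
example = normalize([1.0, 13.0, 6.0])
```

The result for user B is the kind of misjudgment that the clustering and the specific identification model described later are meant to correct.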

次いで、図３（Ｂ）には、ユーザＡ、ユーザＣ、・・・についての「実際にニュートラルと判断される表情」の画像に対するスコアが示されている。ここで、ユーザＡは、上述したように通常タイプであり、実際、そのスコアもニュートラルについての値（0.95）が最も大きくなっている。一方、ユーザＣは、「日頃から表情の厳しいタイプ」であり、それ故、そのスコアは、「実際にはニュートラル」であるにもかかわらずネガティブについての値（0.50）が最も大きくなっている。 Next, FIG. 3 (B) shows the scores for the images of the “expressions actually judged to be neutral” for user A, user C, .... Here, the user A is the normal type as described above, and in fact, the score thereof also has the largest value (0.95) for the neutral. On the other hand, the user C is a “type with a severe expression on a daily basis”, and therefore, the score is the largest value (0.50) for the negative despite being “actually neutral”.

FIG. 3(C) further shows the scores for images of “expressions actually judged to be positive” for user A, user D, and so on. User A is the normal type as described above, and indeed the positive value (1.00) is the largest of user A's scores. User D, by contrast, is the “type whose expression stays restrained even when smiling”, so the neutral value (0.50) is the largest of user D's scores even though the expression is “actually positive”.

以上、ユーザＡ〜Ｄについての実施例を用いて説明したように、表情スコア決定部１１２ｂ（表情識別エンジン１１２）において決定されたスコアは、表情表出傾向の個人差によって、本来あるべき値からずれてしまう場合のあることが理解される。すなわち、当該個人差によっては、正確な表情の識別が行えないことも少なくない。 As described above with reference to the examples of the users A to D, the score determined by the facial expression score determination unit 112b (facial expression identification engine 112) is different from the originally expected value due to individual differences in facial expression expression tendency. It is understood that there may be deviations. That is, it is often the case that the facial expression cannot be accurately identified depending on the individual difference.

Returning to the functional block diagram of FIG. 1, the image clustering unit 113 classifies a plurality of pieces of target information (e.g., photographic images) of a specific target (e.g., the face of a specific user) into a plurality of clusters, each associated with a state (e.g., a facial expression), based on the scores determined from those pieces of target information. This classification may be performed with the k-means method in the space spanned by the scores. The photographic images to be clustered can be, for example, those accumulated once the user of the smartphone 1 has started using the terminal and has stored a predetermined number of photographic images of himself or herself.

The correct facial expression determination unit 114 determines, for each piece of target information (photographic image) of the specific target (the specific user's face) that has been classified into the clusters associated with the respective states (facial expressions), the state (facial expression) corresponding to the cluster to which that piece belongs as the correct answer for that piece of target information.

図４は、画像クラスタリング部１１３及び正解表情決定部１１４における処理の一実施例を示すテーブルである。 FIG. 4 is a table showing an example of processing in the image clustering unit 113 and the correct answer facial expression determination unit 114.

FIG. 4(A) shows the results of the clustering and correct-expression determination for user B, the “type whose expression stays restrained even when angry” described with FIG. 3(A). The figure lists the following face image data records (groups) of user B, which the determined scores alone classify as neutral, neutral, and positive, respectively:
(a1) Records: B-neutral-001, B-neutral-002, B-neutral-003, ...
(a2) Records: B-neutral-101, B-neutral-102, B-neutral-103, ...
(a3) Records: B-positive-001, B-positive-002, B-positive-003, ...
For each of these records, the table of FIG. 4(A) stores the three determined score values together with the ID (identifier) of the cluster to which the record belongs, among the clusters generated from the scores of these records.

FIG. 4(B) shows the results of the clustering and correct-expression determination for user C, the “type whose everyday expression is stern” described with FIG. 3(B). The figure lists the following face image data records (groups) of user C, which the determined scores alone classify as negative, negative, positive, and neutral, respectively:
(b1) Records: C-negative-001, C-negative-002, C-negative-003, ...
(b2) Records: C-negative-101, C-negative-102, C-negative-103, ...
(b3) Records: C-positive-001, C-positive-002, ...
(b4) Records: C-neutral-001, ...
For each of these records, the table of FIG. 4(B) likewise stores the three determined score values together with the ID (identifier) of the cluster to which the record belongs, among the clusters generated from the scores of these records.

Here, in the table of user B's records shown in FIG. 4(A) and the table of user C's records shown in FIG. 4(B), the clusters with cluster IDs 1, 2, and 3 are formed with the k-means method in the score space spanned by the scores determined for these records. A typical procedure is as follows:
(a) Randomly assign a cluster to each point (record) in the score space. The number of clusters equals the number of categories in the expression classification model adopted for expression identification; it is three (k = 3) in this embodiment, which adopts the three-category model.
(b) Compute the centroid of each cluster.
(c) Reassign each point (record) to the cluster whose centroid is nearest to that point.
(d) If the processing of (c) changes the cluster of none of the points, end the clustering. If any assignment has changed, execute the processing of (c) again.
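Steps (a) to (d) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of the image clustering unit 113: the score values are hypothetical, the optional `initial` argument exists only to make the run reproducible, and the loop recomputes the centroids before every reassignment, as k-means is commonly implemented.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k=3, initial=None, seed=0, max_rounds=100):
    """Cluster `points` in score space; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # (a) assign a cluster to every point at random (or use `initial`)
    if initial is not None:
        assignment = list(initial)
    else:
        assignment = [rng.randrange(k) for _ in points]
    centroids = [None] * k
    for _ in range(max_rounds):
        # (b) compute the centroid of each cluster
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
            elif centroids[c] is None:
                centroids[c] = rng.choice(points)  # empty cluster: re-seed
        # (c) move each point to the cluster with the nearest centroid
        updated = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # (d) stop once no point changes its cluster
        if updated == assignment:
            break
        assignment = updated
    return assignment, centroids


# Hypothetical (positive, neutral, negative) scores of one user's images.
records = [
    (0.1, 0.6, 0.3), (0.1, 0.7, 0.2),   # neutral-looking pair
    (0.1, 0.2, 0.7), (0.1, 0.1, 0.8),   # negative-looking pair
    (0.8, 0.1, 0.1), (0.9, 0.1, 0.0),   # positive-looking pair
]
assignment, centroids = kmeans(records, k=3, initial=[0, 1, 2, 0, 1, 2])
```

With this starting assignment the three pairs end up in three separate clusters. The cluster numbers themselves carry no meaning yet; attaching a category to each cluster is a separate step.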

なお、上記（ア）〜（エ）の処理が終了しても、この段階ではまだ、分類されたクラスタは、表情識別の分類カテゴリ（ポジティブ、ネガティブ、ニュートラル）に対応付けられていない。これらのクラスタにカテゴリ（ポジティブ、ネガティブ、ニュートラル）をラベル付けする１つの手法として、例えば、各クラスタに属するレコードにおけるカテゴリ毎のスコアの平均値を算出し、全クラスタの中で、この平均値が最も高いクラスタに対して、この平均値に係るカテゴリをラベル付けする手法が挙げられる。 Even if the above processes (a) to (d) are completed, the classified clusters are not yet associated with the classification category (positive, negative, neutral) of facial expression identification at this stage. As one method for labeling these clusters with categories (positive, negative, neutral), for example, the average value of the score for each category in the records belonging to each cluster is calculated, and this average value among all the clusters is calculated. There is a method of labeling the category related to this average value with respect to the highest cluster.

Specifically, in FIG. 4(B), for example,
(b1) Records: C-negative-001, C-negative-002, C-negative-003, ...
are associated with the cluster with ID = 1 (hereinafter, cluster 1). In these records (b1), the average of the negative scores is larger than the average of the negative scores in any of the other records (b2), (b3), and (b4), and is therefore the maximum. Accordingly, cluster 1, to which the records (b1) belong, is given the negative label. Likewise,
(b2) Records: C-negative-101, C-negative-102, C-negative-103, ...
are associated with cluster 2. In these records (b2), the average of the neutral scores is larger than the average of the neutral scores in any of the other records (b1), (b3), and (b4), and is therefore the maximum. Accordingly, cluster 2, to which the records (b2) belong, is given the neutral label.

Furthermore,
(b3) Records: C-positive-001, C-positive-002, ... and
(b4) Records: C-neutral-001, ...
are associated with cluster 3. In these records (b3) and (b4), the average of the positive scores is larger than the average of the positive scores in either of the other records (b1) and (b2), and is therefore the maximum. Accordingly, cluster 3, to which the records (b3) and (b4) belong, is given the positive label.

また、図４（Ａ）に記録されたクラスタ１〜３についても、上記と同様の手法をもって、それぞれネガティブ、ニュートラル及びポジティブのラベルが付与される。 Also, for the clusters 1 to 3 recorded in FIG. 4A, the negative, neutral, and positive labels are given by the same method as described above.
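The labeling by per-category average score can be sketched as below. The score values are hypothetical stand-ins for user C's records in FIG. 4(B), and the function name `label_clusters` is an assumption. The text does not say what happens when one cluster has the highest average for two categories, so this sketch assumes a greedy rule: (category, cluster) pairs are taken in descending order of average, and a pair is skipped if its category or its cluster has already been used.

```python
CATEGORIES = ("positive", "neutral", "negative")


def label_clusters(scores, cluster_ids):
    """Map each cluster ID to a category using per-category average scores."""
    averages = {}
    for cid in sorted(set(cluster_ids)):
        members = [s for s, c in zip(scores, cluster_ids) if c == cid]
        for idx, category in enumerate(CATEGORIES):
            averages[(category, cid)] = sum(m[idx] for m in members) / len(members)

    labels = {}
    used_categories = set()
    # Greedy tie-breaking (an assumption): highest averages are assigned first.
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    for (category, cid), _ in ranked:
        if category not in used_categories and cid not in labels:
            labels[cid] = category
            used_categories.add(category)
    return labels


# Hypothetical (positive, neutral, negative) scores for user C.
scores = [
    (0.05, 0.10, 0.85), (0.05, 0.15, 0.80),   # cluster 1
    (0.10, 0.40, 0.50), (0.08, 0.42, 0.50),   # cluster 2: negative is largest
    (0.80, 0.15, 0.05), (0.70, 0.25, 0.05),   # cluster 3
]
cluster_ids = [1, 1, 2, 2, 3, 3]

labels = label_clusters(scores, cluster_ids)
```

Here cluster 2 receives the neutral label although negative is the largest score in each of its records, because cluster 1 has the higher negative average and cluster 2 has the highest neutral average of the three clusters.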

以上説明したように、画像クラスタリング部１１３によれば、レコードのスコアだけから判断するとニュートラルであるにもかかわらず、実際にはネガティブな表情でありがちなユーザＢにおいて、これらのレコードの属するクラスタに対し、本来の（正解とされる）カテゴリであるネガティブのラベルを付与することが可能となっている。また、レコードのスコアだけから判断するとネガティブであるにもかかわらず、実際にはニュートラルな表情であることも少なくないユーザＣにおいて、これらのレコードの属するクラスタに対し、本来の（正解とされる）カテゴリであるニュートラルのラベルを付与することも可能となっている。 As described above, according to the image clustering unit 113, for the user B who tends to have a negative facial expression in reality, even though it is neutral when judged only from the score of the record, with respect to the cluster to which these records belong. , It is possible to give a negative label which is the original (correct answer) category. In addition, although it is negative when judged only from the scores of the records, the user C, who often has a neutral expression in reality, has the original (correct answer) for the cluster to which these records belong. It is also possible to add a neutral label that is a category.

In other words, the clustering described above makes it possible to label expression categories in a way that can correct score-based judgment errors caused by individual differences in how expressions are shown. On that basis, the correct facial expression determination unit 114 can determine, for each record (information on a photographic image of the user), the category of the label given to the record's cluster as the “correct answer”.

なお、分類したクラスタに対するラベリング処理は、当然、上述した手法に限定されるものではない。例えば、クラスタを表現するベクトルと、各表情カテゴリを代表する代表ベクトルとのコサイン類似度に基づいてラベルを決定してもよい。または、所定カテゴリを有する点（レコード）からのユークリッド距離が最短となる中心値を有するクラスタに対し、当該所定カテゴリのラベルを付与することも可能である。 The labeling process for the classified clusters is not of course limited to the method described above. For example, the label may be determined based on the cosine similarity between the vector expressing the cluster and the representative vector representing each facial expression category. Alternatively, it is also possible to give a label of the predetermined category to a cluster having a central value with the shortest Euclidean distance from a point (record) having the predetermined category.

Furthermore, although the examples shown in FIGS. 3 and 4 adopt the three-category model of facial expression, the invention is of course not limited to it; for example, Paul Ekman's seven-category model, or an emotion classification model subdivided further than these models, may be applied. As classification categories, one could use, for example, the seven of the Paul Ekman model plus amusement, contempt, contentment, embarrassment, excitement, guilt, pride in achievement, relief, satisfaction, pleasure, and shame. In any case, as many clusters as there are classification categories are generated, and each cluster is given the label of one classification category.

Returning to the functional block diagram of FIG. 1, the facial expression determination unit 115 uses a “specific identification model” that is determined based on
(a) the scores determined for a plurality of pieces of target information (e.g., photographic images) of a specific target (e.g., the face of a specific user), and
(b) the “correct answers” determined for that plurality of pieces of target information (photographic images).
The unit inputs to this model the score determined for one piece of target information (a photographic image) of the specific target (the specific user's face) and, from the output, determines the state that this piece of target information represents for the specific target (the expression appearing on the specific user's face in the photographic image).

このように、表情決定部１１５で決定された、特定対象の対象情報に係る状態（特定ユーザの写真画像の顔に現れた表情）の情報は、この対象情報（写真画像）と対応付けて表情データ記憶部１０４に記録されてもよく、また、アプリケーション１２１へ出力されて、所定のアプリケーション・プログラムによって表情判断データとして処理されてもよい。また、このアプリケーション・プログラムでの処理を介して、タッチパネル・ディスプレイ１０６に表示されてもよく、通信インタフェース部１０１を通して外部に送信されてもよい。 In this way, the information on the state related to the target information of the specific target (the facial expression appearing on the face of the photo image of the specific user) determined by the facial expression determining unit 115 is associated with the target information (photo image) to express the facial expression. It may be recorded in the data storage unit 104, or may be output to the application 121 and processed as facial expression determination data by a predetermined application program. Further, it may be displayed on the touch panel display 106 through processing by this application program, or may be transmitted to the outside through the communication interface unit 101.

Here, the “specific identification model” of the state determination unit 115 may be, for example, a classifier model based on a support vector machine (SVM) that finds, in the feature space spanned by the features generated from the input scores, the separating hyperplane whose distance to the feature points is maximal. Alternatively, it can be a classifier model based on another supervised machine learning technique, for example a neural network.

図５は、状態決定部１１５で使用される特定識別モデルの識別器における学習の一実施形態を示す模式図である。また、図６は、特定識別モデルの識別器に採用されるＳＶＭにおける識別境界面を説明するための模式図である。 FIG. 5 is a schematic diagram showing an embodiment of learning in the discriminator of the specific discrimination model used in the state determination unit 115. Further, FIG. 6 is a schematic diagram for explaining the discrimination boundary surface in the SVM adopted in the discriminator of the specific discrimination model.

図５によれば、状態決定部１１５は、図４（Ａ）及び図４（Ｂ）に示したような、特定ユーザについての（スコアの決定された）各レコードに対し、所属するクラスタのラベルを正解として紐づけたレコードデータを、特徴量化して特定識別モデルの識別器に入力し、当該特定識別モデルの学習・更新を行っている。ここで、これらの正解付きのレコードデータは、その正解のカテゴリ別に、ネガティブログ、ニュートラルログ及びポジティブログの３種に区分されている。 According to FIG. 5, the state determination unit 115, for each record (the score of which has been determined) for a specific user, as shown in FIGS. 4A and 4B, the label of the cluster to which it belongs. The record data associated with the correct answer is converted into a feature quantity and input to the discriminator of the specific identification model to learn and update the specific identification model. Here, these record data with correct answers are classified into three types of negative log, neutral log, and positive log according to the category of the correct answer.

また、この特定識別モデルの識別器は、本実施形態においてＳＶＭを採用している。ＳＶＭは、現在開発されている数多くの機械学習手法の中でも汎用性と認識性能の両方が優れているとされる手法の１つであり、未学習データに対して高い識別性能を発揮することが可能となっている。 The discriminator of this specific discriminating model adopts SVM in this embodiment. SVM is one of the many machine learning methods currently being developed, which is said to have excellent versatility and recognition performance, and can exhibit high discrimination performance for unlearned data. It is possible.

このＳＶＭを採用した識別器では、図６に示すように、例えば、ネガティブ判定を行う場合、特徴量空間において、ネガティブログのレコード点には正解ラベルを付与して、その他のレコード点には不正解ラベルを付与する。次いで、各レコード点からの距離が最大となる面（識別境界面）を決定して、以後、ネガティブ判定に使用する。同様の処理をニュートラル判定やポジティブ判定にも行い、結局、全てのログの各フィールドの変数を入力して集計処理を行い、ＳＶＭ識別関数の判定係数を決定する。 In the discriminator employing this SVM, as shown in FIG. 6, for example, in the case of making a negative determination, the correct answer label is given to the record point of the negative log and the other record point is not recorded in the feature amount space. Give the correct answer label. Next, the surface (identification boundary surface) having the maximum distance from each record point is determined, and thereafter used for negative determination. The same process is performed for the neutral determination and the positive determination, and after all, the variables of each field of all logs are input and the aggregation process is performed to determine the determination coefficient of the SVM discriminant function.
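The one-judgment-per-category scheme above can be sketched as follows. This is a simplified stand-in rather than a full SVM: each binary classifier is a linear model trained by per-sample subgradient steps on the hinge loss (the loss an SVM uses), without the regularization term that makes a true SVM maximize the margin. The score values, the labels (imagined as coming from the cluster labels), and all function names are hypothetical.

```python
CATEGORIES = ("positive", "neutral", "negative")


def decision(w, b, x):
    """Signed output of one linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_hinge(samples, targets, rate=1.0, max_epochs=2000):
    """Fit (w, b) so that target * decision >= 1 for every sample, if possible."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(samples, targets):
            if y * decision(w, b, x) < 1.0:      # hinge loss is active
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                b += rate * y
                updated = True
        if not updated:                          # every sample has margin >= 1
            break
    return w, b


def train_one_vs_rest(samples, labels):
    """One classifier per category: that category (+1) against the rest (-1)."""
    return {
        category: train_hinge(
            samples, [1 if lab == category else -1 for lab in labels])
        for category in CATEGORIES
    }


def predict(models, x):
    return max(CATEGORIES, key=lambda category: decision(*models[category], x))


# Hypothetical (positive, neutral, negative) scores of user B, with the
# correct answers taken from the labels of the clusters they belong to.
samples = [
    (0.05, 0.65, 0.30), (0.05, 0.60, 0.35),   # neutral is largest, truly negative
    (0.03, 0.92, 0.05), (0.05, 0.90, 0.05),
    (0.85, 0.10, 0.05), (0.80, 0.15, 0.05),
]
labels = ["negative", "negative", "neutral", "neutral", "positive", "positive"]

models = train_one_vs_rest(samples, labels)
new_image_scores = (0.05, 0.62, 0.33)            # neutral is the largest score
result = predict(models, new_image_scores)
```

In this sketch the new image, whose largest score is neutral, is identified as negative, matching the labels learned for this user. A production system would more likely rely on an established SVM library with a proper margin-maximizing solver.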

状態決定部１１５では、このように構築された特定識別モデルのＳＶＭ識別器に対し、例えば、識別対象となる特定ユーザの写真画像におけるポジティブ、ニュートラル及びネガティブについての各スコアを入力し、すなわち上記のＳＶＭ識別関数に入力して、この特定ユーザに適した表情識別結果を出力する。 In the state determination unit 115, for example, the scores of positive, neutral and negative in the photographic image of the specific user to be identified are input to the SVM identifier of the specific identification model thus constructed, that is, the above It is input to the SVM identification function and the facial expression identification result suitable for this particular user is output.

For example, when the SVM classifier of the specific identification model trained for user B, the “type whose expression stays restrained even when angry” described in the examples of FIGS. 3 and 4, receives the three scores of a photographic image of user B in which neutral is the largest, it can output the correct result, negative. Likewise, when the SVM classifier of the specific identification model trained for user C, the “type whose everyday expression is stern”, receives the three scores of a photographic image of user C in which negative is the largest, it can output the correct result, neutral.

Thus, the state determination processing in the state determination unit 115 makes it possible to apply, to target information (for example, a photographic image) of a specific target (for example, the face of a specific user), a specific identification model that has been trained using correct answers determined through clustering and is therefore adapted to identifying this specific target (the face of the specific user). As a result, the state of this specific target (the facial expression of the specific user) can be identified with higher accuracy.
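How correct answers might be derived from clustering in the score space can be sketched as below. The claims name the k-means method for the clustering step. Everything else here is an assumption made for illustration: the score values, the deterministic farthest-first seeding, and in particular the rule in `label_clusters` that maps clusters to categories, which is not taken from this document.

```python
import math

# Hypothetical scores <positive, neutral, negative> for one user's photographs.
SCORES = [
    (0.05, 0.55, 0.40), (0.10, 0.80, 0.10), (0.45, 0.50, 0.05),
    (0.04, 0.50, 0.46), (0.12, 0.78, 0.10), (0.50, 0.45, 0.05),
    (0.06, 0.52, 0.42), (0.08, 0.84, 0.08), (0.40, 0.55, 0.05),
]


def farthest_first(points, k):
    """Deterministic seeding (assumed): start at points[0], then repeatedly
    add the point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iterations=100):
    """Plain k-means: assign to the nearest center, then recompute the means."""
    centers = farthest_first(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:  # assignments stable: converged
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


def label_clusters(centers):
    """Assumed labeling rule: the center with the largest negative score is
    'negative', the largest positive score among the rest is 'positive',
    and the remaining cluster is 'neutral'."""
    remaining = list(range(len(centers)))
    negative = max(remaining, key=lambda j: centers[j][2])
    remaining.remove(negative)
    positive = max(remaining, key=lambda j: centers[j][0])
    remaining.remove(positive)
    return {negative: "negative", positive: "positive", remaining[0]: "neutral"}


centers, assignment = kmeans(SCORES, 3)
names = label_clusters(centers)
answers = [names[a] for a in assignment]  # one correct answer per photograph
print(answers)
```

In this toy data the photographs labeled negative all have neutral as their largest raw score, which is the kind of user-specific tendency the clustering step is meant to capture.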

In the present embodiment the classifier of the specific identification model is adapted to a specific user, but it is of course not limited to this. For example, a predetermined attribute group, such as a certain ethnic group or the residents of a predetermined residential area, may be adopted as the facial expression identification target, and a classifier of a specific identification model specialized for such a target may be configured. In this case, the input to the classifier of the specific identification model is the scores (records) for the faces of people belonging to the attribute group that is the facial expression identification target.

The machine learning method adopted for the classifier of the specific identification model is likewise not limited to the SVM described above. For example, a classifier employing a neural network is also possible. In this case, the neural network may be of a type that includes weighting coefficients for the input scores and updates those weighting coefficients so as to reduce the error between the state (facial expression category) of the determined correct answer and the output of the model.
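The weight-update idea in this paragraph can be illustrated with a very small network. The sketch below assumes a single softmax layer trained by full-batch gradient descent on cross-entropy error. The text does not specify the architecture, the error measure or the update rule, so all of these, along with the data and the names `DATA`, `forward`, `mean_error`, `train` and `predict`, are hypothetical.

```python
import math

CATEGORIES = ["positive", "neutral", "negative"]

# Hypothetical (score vector, correct answer) pairs; scores are ordered
# <positive, neutral, negative> and answers are assumed to come from clustering.
DATA = [
    ((0.05, 0.55, 0.40), "negative"), ((0.04, 0.50, 0.46), "negative"),
    ((0.06, 0.52, 0.42), "negative"), ((0.10, 0.80, 0.10), "neutral"),
    ((0.12, 0.78, 0.10), "neutral"), ((0.08, 0.84, 0.08), "neutral"),
    ((0.45, 0.50, 0.05), "positive"), ((0.50, 0.45, 0.05), "positive"),
    ((0.40, 0.55, 0.05), "positive"),
]


def forward(weights, biases, x):
    """Weighted sums of the input scores followed by a softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def mean_error(weights, biases, data):
    """Mean cross-entropy between the model output and the correct answers."""
    return -sum(
        math.log(forward(weights, biases, x)[CATEGORIES.index(label)])
        for x, label in data
    ) / len(data)


def train(data, epochs=500, lr=0.5):
    """Update the weighting coefficients so that the mean error decreases."""
    n_in, n_out = len(data[0][0]), len(CATEGORIES)
    weights = [[0.0] * n_in for _ in range(n_out)]
    biases = [0.0] * n_out
    for _ in range(epochs):
        grad_w = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, label in data:
            probs = forward(weights, biases, x)
            for c, category in enumerate(CATEGORIES):
                err = probs[c] - (1.0 if category == label else 0.0)
                grad_b[c] += err
                for i, xi in enumerate(x):
                    grad_w[c][i] += err * xi
        for c in range(n_out):
            biases[c] -= lr * grad_b[c] / len(data)
            for i in range(n_in):
                weights[c][i] -= lr * grad_w[c][i] / len(data)
    return weights, biases


def predict(weights, biases, x):
    """Category with the largest network output."""
    probs = forward(weights, biases, x)
    return CATEGORIES[probs.index(max(probs))]


zero_w = [[0.0] * 3 for _ in CATEGORIES]
zero_b = [0.0] * 3
error_before = mean_error(zero_w, zero_b, DATA)
weights, biases = train(DATA)
error_after = mean_error(weights, biases, DATA)
print(error_before, error_after)
```

The comparison of `error_before` and `error_after` shows only that the updates reduce the error on the training records. How many epochs are needed for reliable classification depends on the data and is not asserted here.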

Furthermore, the state determination processing in the state determination unit 115 can also be carried out with a simpler implementation that does not use the specific identification model described above. For example, among the centers of the plurality of clusters generated by the image clustering unit 113, the center closest to the score determined for one piece of target information (a photographic image) of the specific target (the face of the specific user) may be found, and the state (facial expression category) of the label assigned to the cluster having that center may be determined as the state (facial expression category) of that piece of target information (photographic image).

Specifically, suppose that a vector in the score space whose elements are the scores of one record is written in the form <(positive), (neutral), (negative)>. In one example, the centers of the three clusters generated by the image clustering unit 113 and labeled with the facial expression categories negative, neutral and positive, respectively, are expressed as follows:
Negative cluster center: <0.02, 0.10, 0.88>,
Neutral cluster center: <0.08, 0.42, 0.50>, and
Positive cluster center: <0.37, 0.35, 0.28>.
Here, let <ng, nt, ps> be the point formed by the scores determined for target information of the specific target that is the facial expression identification target (a photographic image of the specific user's face). Among the three centers above, the label assigned to the cluster whose center has the smallest Euclidean distance to this point <ng, nt, ps> can then be taken as the state (facial expression category) of this target information of the specific target.
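The nearest-center rule can be written in a few lines. The sketch below uses the three example cluster centers given above, keeping the <(positive), (neutral), (negative)> ordering for both the centers and the query point. The query score and the names `CENTERS` and `nearest_label` are hypothetical.

```python
import math

# Example cluster centers from the text, ordered <positive, neutral, negative>.
CENTERS = {
    "negative": (0.02, 0.10, 0.88),
    "neutral": (0.08, 0.42, 0.50),
    "positive": (0.37, 0.35, 0.28),
}


def nearest_label(score, centers=CENTERS):
    """Return the label of the cluster center closest to `score` (Euclidean)."""
    return min(centers, key=lambda label: math.dist(score, centers[label]))


# Hypothetical score for one photograph: the negative component is the largest,
# yet the closest center belongs to the neutral cluster.
print(nearest_label((0.05, 0.40, 0.55)))  # prints "neutral"
```

This variant needs no trained classifier: only the cluster centers and their labels have to be stored on the device.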

[Device Configuration in Other Embodiments]
FIG. 7 is a functional block diagram showing the functional configuration of another embodiment of the state identification device according to the present invention.

The smartphone 5, which is the state identification device of the embodiment shown in FIG. 7, has functional components corresponding to those of the smartphone 1 shown in FIG. 1. Specifically, it has a communication interface unit 501, a camera 505, a touch panel display 506, an image management unit 511, a facial expression identification engine 512 having a facial expression score determination unit 512b, a correct facial expression determination unit 514, a facial expression determination unit 515, and an application 121.

That is, the smartphone 5 does not include functional components corresponding to the identification model learning unit 112a and the image clustering unit 113 of the smartphone 1 shown in FIG. 1. In the present embodiment, the facial expression identification model of the facial expression identification engine 512 is constructed (trained) by an external facial expression identification preparation device 3, which acquires general image data from the image management server 2. The clustering of the scored photographic image data is likewise performed by this facial expression identification preparation device 3, which acquires personal image data from the smartphone 5.

The correct facial expression determination unit 514 of the smartphone 5 receives the constructed facial expression identification model and the clustering result from the facial expression identification preparation device 3 and determines the correct answers for the personal image data it manages. The facial expression determination unit 515 then constructs a specific identification model using these correct answers and, with the constructed specific identification model, determines the facial expression category of the facial expression identification target (for example, a facial photograph of the user of the smartphone 5).

As a modification, the smartphone 5 may include an image clustering unit 513 corresponding to the image clustering unit 113 (FIG. 1) of the smartphone 1. In this case, the clustering is performed on the smartphone 5, so the personal image data no longer needs to be transmitted to the facial expression identification preparation device 3.

As described above, because the smartphone 5 can omit at least the processing of constructing the facial expression identification model, the amount of information processing executed within the device is far smaller. In other words, the smartphone 5 makes facial expression identification feasible with the size and processing capability of a mobile terminal.

As yet another embodiment, the smartphone 5 may include none of the facial expression identification engine 512, the image clustering unit 513, the correct facial expression determination unit 514 and the facial expression determination unit 515, and the facial expression identification preparation device 3 may instead include all of these functional components. In such an embodiment, the facial expression identification preparation device 3 functions as the state identification device according to the present invention.

Specifically, the facial expression identification preparation device 3, having received a personal image captured by the camera 505 of the smartphone 5, performs not only the score determination processing using the facial expression identification model but also the clustering, the correct answer determination processing for the personal images and, further, the determination of the facial expression category of the personal image using the specific identification model. The facial expression identification preparation device 3 then transmits information on the determined facial expression category (the facial expression identification result) to the smartphone 5, and the smartphone 5 that has received this information uses it in the application 521.

Incidentally, the terminal that receives the facial expression identification result output from the server (the facial expression identification preparation device 3) as described above is of course not limited to a smartphone. For example, it may be a tablet computer, a notebook computer or a PC (personal computer), and various other forms of terminal may also be adopted, such as a thin client terminal serving as a device suited to use in an IoT (Internet of Things) environment.

As described in detail above, according to the present invention, facial expressions are not identified by relying solely on the scores determined by the facial expression classifier. Instead, correct answers are determined in advance, by means of clustering, for the target information (for example, photographic images) of a specific target (for example, the face of a specific user). This makes it possible to use a specific identification model adapted to identifying this specific target (the face of the specific user) and, as a result, to identify the state of this specific target (the facial expression of the specific user) more reliably.

In particular, facial expressions vary between individuals and between countries, ethnic groups, residential areas and the like. Constructing a model that takes these differences into account makes it possible to identify such expressions with higher accuracy.

Incidentally, based on the present invention, it is also possible to develop application programs that can provide various services by more reliably identifying the facial expressions of a specific individual, such as a terminal user, and using the resulting highly accurate facial expression identification results. One example of such an app is a conversational AI app that uses the facial expression identification result to understand the emotion (utterance intention) of the terminal user it is conversing with, adjust its responses, and personalize the content of its dialogue with that user.

With regard to the various embodiments of the present invention described above, various changes, modifications and omissions within the scope of the technical idea and viewpoint of the present invention can be readily made by those skilled in the art. The foregoing description is merely an example and is not intended to be limiting in any way. The present invention is limited only by the claims and their equivalents.

1, 5 Smartphone (state identification device)
101, 501 Communication interface unit
102 General image database
103 Personal image database
104 Facial expression data storage unit
105, 505 Camera
106, 506 Touch panel display (TP/DP)
111, 511 Image management unit
112, 512 Facial expression identification engine
112a Identification model learning unit
112b, 512b Facial expression score determination unit
113, 513 Image clustering unit
114, 514 Correct facial expression determination unit
115, 515 Facial expression determination unit
121, 521 Application
2 Image management server
3 Facial expression identification preparation device

Claims

A state identification device that identifies, based on target information relating to a predetermined target, a state of the predetermined target, the state being one whose tendency to manifest differs for each individual target or for each type of target, the state identification device comprising:
score determination means for determining, from input target information, scores indicating the degree to which the target of that target information can be in each state, using an identification model that is determined based on a large amount of target information and that outputs a score indicating the degree of being in each of a plurality of possible states;
clustering means for classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the score of each state determined from the plurality of pieces of target information;
correct answer determination means for associating each of the plurality of states with a respective one of the plurality of clusters, and determining the state associated with the cluster to which target information relating to the specific target belongs as the correct answer for that target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for the plurality of pieces of target information, and determining, from the output thereof, the state relating to the one piece of target information of the specific target.

A state identification device that identifies, based on target information relating to a predetermined target, a state of the predetermined target, the state being one whose tendency to manifest differs for each individual target or for each type of target, the state identification device comprising:
score determination means for determining, from input target information, a score representing the state of the target of that target information, using an identification model determined based on a large amount of target information;
correct answer determination means for determining, where a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters associated with respective states based on the scores determined from the plurality of pieces of target information, the state corresponding to the cluster to which target information relating to the specific target belongs as the correct answer for that target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model that is determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for the plurality of pieces of target information, and that obtains, in the feature space formed by features generated from the input scores, a separating hyperplane that maximizes the distance to the feature points, and determining, from the output thereof, the state relating to the one piece of target information of the specific target.

A state identification device that identifies, based on target information relating to a predetermined target, a state of the predetermined target, the state being one whose tendency to manifest differs for each individual target or for each type of target, the state identification device comprising:
score determination means for determining, from input target information, a score representing the state of the target of that target information, using an identification model determined based on a large amount of target information;
correct answer determination means for determining, where a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, are classified into a plurality of clusters associated with respective states based on the scores determined from the plurality of pieces of target information, the state corresponding to the cluster to which target information relating to the specific target belongs as the correct answer for that target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model that is determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for the plurality of pieces of target information, that includes weighting coefficients for the input scores, and that updates the weighting coefficients so as to reduce the error between the state of the determined correct answer and the output of the model, and determining, from the output thereof, the state relating to the one piece of target information of the specific target.

The state identification device according to any one of claims 1 to 3, wherein
the predetermined target is a human face, the state is a facial expression, and the target information is information relating to an image of a human face,
the specific target is an individual whose facial expression is to be identified, or a predetermined attribute group to which a person whose facial expression is to be identified belongs, and
the state determination means identifies, based on image information of the facial expression of the individual or of a person belonging to the attribute group, the facial expression appearing on the face in that image information.

The state identification device according to any one of claims 1 to 4, wherein the classification of the plurality of pieces of target information into the clusters is performed using the k-means method in the space formed by the scores.

The state identification device according to claim 5, wherein the identification model used in the score determination means is a learning model in a convolutional neural network including a convolutional layer.

A state identification device that identifies, based on target information relating to a predetermined target, a state of the predetermined target, the state being one whose tendency to manifest differs for each individual target or for each type of target, the state identification device comprising:
score determination means for determining, from input target information, scores indicating the degree to which the target of that target information can be in each state, using an identification model that is determined based on a large amount of target information and that outputs a score indicating the degree of being in each of a plurality of possible states;
clustering means for classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the score of each state determined from the plurality of pieces of target information; and
state determination means for associating each of the plurality of states with a respective one of the plurality of clusters, and determining, as the state relating to one piece of target information relating to the specific target, the state associated with the cluster whose center, among the centers of the plurality of clusters, has the smallest distance to the score determined for that one piece of target information.

An evaluation estimation program for causing a computer to function, the computer being mounted on a device that identifies, based on target information relating to a predetermined target, a state of the predetermined target, the state being one whose tendency to manifest differs for each individual target or for each type of target, the program being a state identification program that causes the computer to function as:
score determination means for determining, from input target information, scores indicating the degree to which the target of that target information can be in each state, using an identification model that is determined based on a large amount of target information and that outputs a score indicating the degree of being in each of a plurality of possible states;
clustering means for classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the score of each state determined from the plurality of pieces of target information;
correct answer determination means for associating each of the plurality of states with a respective one of the plurality of clusters, and determining the state associated with the cluster to which target information relating to the specific target belongs as the correct answer for that target information; and
state determination means for inputting the score determined for one piece of target information relating to the specific target into a specific identification model determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for the plurality of pieces of target information, and determining, from the output thereof, the state relating to the one piece of target information of the specific target.

A state identification method performed in a computer mounted on a device that identifies, based on target information relating to a predetermined target, a state of the predetermined target, the state being one whose tendency to manifest differs for each individual target or for each type of target, the state identification method comprising:
a step of determining, from input target information, scores indicating the degree to which the target of that target information can be in each state, using an identification model that is determined based on a large amount of target information and that outputs a score indicating the degree of being in each of a plurality of possible states;
a step of classifying a plurality of pieces of target information relating to a specific target, which is the state identification target among the predetermined targets, into a plurality of clusters defined in the space formed by the scores, based on the score of each state determined from the plurality of pieces of target information;
a step of associating each of the plurality of states with a respective one of the plurality of clusters, and determining the state associated with the cluster to which target information relating to the specific target belongs as the correct answer for that target information; and
a step of inputting the score determined for one piece of target information relating to the specific target into a specific identification model determined based on the scores determined for the plurality of pieces of target information relating to the specific target and the correct answers determined for the plurality of pieces of target information, and determining, from the output thereof, the state relating to the one piece of target information of the specific target.