JP6282121B2

JP6282121B2 - Image recognition apparatus, image recognition method, and program

Info

Publication number: JP6282121B2
Application number: JP2014005274A
Authority: JP
Inventors: 康生片野; 要冨手
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-01-15
Filing date: 2014-01-15
Publication date: 2018-02-21
Anticipated expiration: 2034-01-15
Also published as: JP2015133065A

Description

The present invention relates to an image recognition apparatus, an image recognition method, and a program that are suitable, in particular, for generating CG images that reflect minute facial expression changes and individual differences.

Many studies have been made on techniques for generating facial expressions with CG, such as the technique described in Patent Document 1. In particular, the Blend Shape method described in Non-Patent Document 1 is an important means of generating expressions in many commercially available CG modeling packages, such as MAYA and 3D Studio Max. In this method, the desired expression is described as a weighted linear sum of preset base expressions (joy, anger, sadness, etc.) applied to a neutral standard expression.
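The sketch below illustrates the weighted linear sum described above. It assumes the common "delta" formulation, in which each base expression contributes its offset from the neutral mesh. The array shapes and the function name `blend_shape` are illustrative assumptions. They are not taken from the source or from any modeling package's API.

```python
import numpy as np


def blend_shape(neutral, targets, weights):
    """Blend base expressions onto a neutral face mesh (hypothetical helper).

    neutral: (V, 3) vertex positions of the expressionless face.
    targets: (K, V, 3) vertex positions of K base expressions (joy, anger, ...).
    weights: (K,) blending weight for each base expression.
    """
    neutral = np.asarray(neutral, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Offset of each base expression from the neutral face, shape (K, V, 3).
    deltas = targets - neutral
    # Weighted linear sum over the K base expressions, shape (V, 3).
    return neutral + np.tensordot(weights, deltas, axes=1)


# Usage: two vertices, two base expressions.
neutral = np.zeros((2, 3))
targets = np.stack([np.ones((2, 3)), 2 * np.ones((2, 3))])
face = blend_shape(neutral, targets, [0.5, 0.25])  # every coordinate equals 1.0
```

With all weights at zero the neutral face is returned unchanged. Any expression outside the span of the deltas cannot be produced, which is the limitation discussed next.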

However, the Blend Shape method cannot represent the entire space of facial expressions: some expressions cannot be written as a weighted linear sum of the preset base expressions. In particular, the effects of fine expression changes and individual differences are difficult to express with the Blend Shape method.

To address this, two kinds of methods have been proposed. One introduces the concept of Micro Expression to describe the subtle expression differences to be represented. The other represents sparse expression deformation and fine facial shape separately, using a normal map for the latter. The demand for reproducing fine and realistic facial expressions is thus growing. In addition, Non-Patent Document 2, for example, proposes an image recognition technique in which images generated with such a virtual model are used for learning with a database.

Patent Document 1: JP 2006-18828 A
Patent Document 2: JP 2009-75880 A

Non-Patent Document 1: V. Blanz and T. Vetter, "A Morphable Model for the Synthesis of 3D Faces", Computer Graphics Proc. SIGGRAPH '99, pp. 187-194, 1999.
Non-Patent Document 2: P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter, "A 3D Face Model for Pose and Illumination Invariant Face Recognition", AVSS 2009.
Non-Patent Document 3: D. G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
Non-Patent Document 4: J. Shi and C. Tomasi, "Good features to track", In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR94), Seattle, June 1994.
Non-Patent Document 5: I. Matthews and S. Baker, "Active Appearance Models Revisited", International Journal of Computer Vision, Vol. 60, No. 2, pp. 135-164, November 2004.
Non-Patent Document 6: J. Ahlberg, "CANDIDE-3 -- an updated parameterized face", Report No. LiTH-ISY-R-2326, Dept. of Electrical Engineering, Linkoping University, Sweden, 2001.
Non-Patent Document 7: 片野坂匡裕, 田中一史, 本井美甫, 坂本麻似子, 山崎敏正, 牧秀之, 高柳浩, 山ノ井高洋, 上条憲一, "Development of a BCI using single-trial EEGs: on the reproducibility of ICA results" (in Japanese), IEICE Technical Report, MBE (ME and Bio Cybernetics), 108(405), pp. 83-86, 2009-01-16.
Non-Patent Document 8: T. Beeler, F. Hahn, D. Bradley, B. Bickel, P. Beardsley, C. Gotsman, R. Sumner, and M. Gross, "High-Quality Passive Facial Performance Capture using Anchor Frames", ACM Transactions on Graphics (Proc. SIGGRAPH 2011), vol. 30, no. 3, August 2011.
Non-Patent Document 9: D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization", Nature, vol. 401, pp. 788-791, 1999.
Non-Patent Document 10: Lane Daughtry, 小田島義行, 城間英樹, 畑中敏宏, "Aim to Be a Game Creator: Maya Games" (in Japanese), pp. 115-172, 2002.

Many methods related to the Blend Shape method reproduce a variation vector from the positions and movements of a large number of feature points. In these methods, an operator ultimately has to adjust by hand the subtle variation-component parameters, in particular those called fine expressions. This is acceptable for offline work such as film production. However, it poses the following problems, especially when the goal is to build a database of fine facial expressions.

First, it is difficult to record log images of daily life and manually extract fine expression changes from them. It is not known when or under what circumstances a fine expression will occur. Picking out a desired fine expression from log footage monitored over long periods is therefore laborious and inefficient. As a result, it is very difficult to collect a large number of expression components that originate from the same emotion.

In addition, a database intended for image recognition requires a large number of tagged images. Tagging them by hand is difficult, so such images cannot be used for learning. Furthermore, several factors are involved when a fine expression is to be reflected in a virtual model, so it is unclear which variation components should be applied.

The present invention has been made in view of the above problems. Its object is to make it possible to generate, easily and with high accuracy, variation components to which fine expression changes or individual differences are applied.

An image recognition apparatus according to the present invention comprises the following means. Input means inputs time-series information of a subject. Detection means detects a change point related to an event from the time-series information input by the input means, and detects event information from the time-series information around the change point. Acquisition means acquires, from the time-series information and on the basis of the event information detected by the detection means, a section of one or more frames related to the event, and acquires a feature quantity of the acquired frame section. Decomposition means decomposes the feature quantity acquired by the acquisition means into one or more base components. Extraction means extracts a base component related to the event from the base components obtained by the decomposition means, on the basis of changes in the base components around the change point. Generation means generates a variation component of a virtual model from the base component extracted by the extraction means.

According to the present invention, variation components to which minute expression changes or individual differences are applied can be generated easily and with high accuracy.

FIG. 1 is a block diagram showing an example functional configuration of a virtual model generation apparatus according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of time-series trajectory data related to event detection in the first embodiment of the present invention.
FIG. 3 is a block diagram showing an example functional configuration of a virtual model generation apparatus according to the second embodiment of the present invention.
FIG. 4 is a schematic diagram showing an example of time-series trajectory data related to event detection in the second embodiment of the present invention.
FIG. 5 is a block diagram showing an example functional configuration of a virtual model generation apparatus according to the third embodiment of the present invention.
FIG. 6 is a schematic diagram showing an example of time-series trajectory data related to event detection in the third embodiment of the present invention.
FIG. 7 is a block diagram showing an example functional configuration of a variation texture generation apparatus according to the fourth embodiment of the present invention.

(First Embodiment)
The present embodiment describes a method in which the face of a target person is continuously captured as the subject. Fine expression changes are extracted from the face images and reflected in a virtual model.
FIG. 1 is a block diagram showing an example functional configuration of a virtual model generation apparatus 100 serving as the image recognition apparatus according to the present embodiment.
In FIG. 1, the virtual model generation apparatus 100 includes:
- an input unit 110
- an event detection unit 120
- an event analysis unit 140
- a frame section extraction unit 130
- a base decomposition unit 150
- an event-related base component extraction unit 160
- a deformation vector generation unit 170
- a CG deformation unit 180

The event detection unit 120 detects time-series change points in the moving image input from the input unit 110. A change point can be detected in several ways:
- Feature point tracking, for example with SIFT features (Non-Patent Document 3) or the KLT tracker (Non-Patent Document 4). The change point is then obtained from the time-series trajectory information acquired in this way.
- Pattern matching of the trajectory against a specific trajectory pattern.
- A simple method, such as declaring an event when the difference component extracted by inter-frame differencing covers at least a certain area.

In the present embodiment, there is no need to understand the detailed meaning of a detected change point. Only the time at which an event is detected in the time series is extracted.
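The simplest option above, inter-frame differencing with an area threshold, might look like the following sketch. It assumes grayscale frames stacked in a NumPy array. The two thresholds and the function name are hypothetical and would need tuning for real footage.

```python
import numpy as np


def detect_events_by_frame_difference(frames, pixel_threshold, min_area):
    """Return indices t of frames that differ strongly from frame t-1 (hypothetical helper).

    frames: (T, H, W) grayscale image sequence.
    pixel_threshold: minimum absolute intensity change for a pixel to count as changed.
    min_area: minimum number of changed pixels for the frame to be reported as an event.
    """
    f = np.asarray(frames, dtype=float)
    diff = np.abs(np.diff(f, axis=0))                  # (T-1, H, W)
    changed_area = (diff > pixel_threshold).sum(axis=(1, 2))
    # diff[i] compares frame i+1 with frame i, so shift the index by one.
    return [int(i) + 1 for i in np.nonzero(changed_area >= min_area)[0]]


frames = np.zeros((5, 4, 4))
frames[3, :2, :2] = 100.0   # a 2x2 patch appears in frame 3 and disappears in frame 4
events = detect_events_by_frame_difference(frames, pixel_threshold=10, min_area=4)  # [3, 4]
```

Both the appearance and the disappearance of the patch are reported. In line with the text above, the detector only yields event times and attaches no meaning to them.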

FIG. 2 is a schematic diagram showing an example of time-series trajectory data related to event detection. In FIG. 2, the horizontal axis indicates time in the moving image (frame number), and the vertical axis indicates the amount of variation from the normal (reference) state. The trajectory in FIG. 2 represents a feature quantity in the face image (for example, a point tracked by the KLT tracker). The event time is detected by evaluating the waveform of this trajectory. In the example of FIG. 2, an event is detected at time 200.

In the present embodiment, the input unit 110 is assumed to be a video camera that generates a moving image, but the present invention is not limited to this. For example, if needed to detect event times, a microphone for audio input or a depth-measuring device such as the Kinect may also be used.

The event detection unit 120 may detect the event time directly from the information input to the input unit 110, or the event time may be detected by other methods. For example, to capture fine expression changes, a coarse expression is first obtained with the Blend Shape method described in Non-Patent Document 1. The difference between this result and the data containing the fine expression is then computed, and the event detection unit 120 takes this difference information as its input.
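The coarse-then-fine idea can be sketched as an unconstrained least-squares fit of blend-shape weights. The residual of the fit is the part of the captured shape that the base expressions cannot explain, which is the kind of difference information described above. This is an illustration under assumptions: practical systems usually constrain the weights (for example to [0, 1]), and the names and array layouts here are hypothetical.

```python
import numpy as np


def blend_shape_residual(neutral, targets, observed):
    """Fit blend-shape weights by least squares and return what they cannot explain.

    neutral: (V, 3); targets: (K, V, 3); observed: (V, 3) captured face shape.
    Returns (weights, residual), with residual of shape (V, 3).
    """
    neutral = np.asarray(neutral, dtype=float)
    targets = np.asarray(targets, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Columns of B are the flattened offsets of each base expression from neutral.
    B = (targets.reshape(len(targets), -1) - neutral.ravel()).T   # (3V, K)
    d = (observed - neutral).ravel()                              # (3V,)
    weights, *_ = np.linalg.lstsq(B, d, rcond=None)
    # The part of the observation outside the span of the base expressions.
    residual = d - B @ weights
    return weights, residual.reshape(observed.shape)


# Usage: one base expression moves vertex 0 along x; the capture also moves vertex 1 along z.
neutral = np.zeros((2, 3))
targets = np.zeros((1, 2, 3))
targets[0, 0, 0] = 1.0
observed = np.zeros((2, 3))
observed[0, 0] = 0.7
observed[1, 2] = 0.2
w, r = blend_shape_residual(neutral, targets, observed)
```

Here the fitted weight is 0.7. The 0.2 displacement on vertex 1 remains in the residual, because no base expression can produce it.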

The event analysis unit 140 analyzes the event detected by the event detection unit 120, using the time-series information around the event time. In the present embodiment, time-series feature quantities are used for this analysis, and the trajectory around the event is acquired as event information. As described in detail later, the frame section extraction unit 130 uses this trajectory to extract two sections, shown in FIG. 2: an event-present section 210, which is related to the event, and an event-absent section 220, which is not.

In key frame region extraction processing 131, the frame section extraction unit 130 extracts a key frame region using the information analyzed by the event analysis unit 140. In event-present section extraction processing 132, the frame section extraction unit 130 then determines the event-present section 210 associated with the event time 200 as follows. For example, the event-present section 210 is bounded by two positions: where the waveform departs from its mean value toward the peak, and where it returns to the mean value after the peak at time 200. Any other method that can extract the region related to the detected event may be used instead. For example, inter-frame correlations can be computed, and the region highly correlated with the frame at event time 200 taken as the event-present section 210.
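The mean-crossing rule above can be sketched as follows. This is a minimal illustration. It assumes that the trajectory is a 1-D array covering the analysis window, that "mean value" means the mean over that window, and that the peak frame (time 200 in FIG. 2) is already known. The function name is hypothetical.

```python
import numpy as np


def event_present_section(trajectory, peak_index):
    """Return the inclusive (start, end) of the excursion that contains peak_index.

    The section is bounded by the frames where the trajectory leaves the window
    mean on its way to the peak and where it returns to the mean afterwards.
    """
    x = np.asarray(trajectory, dtype=float)
    dev = x - x.mean()
    side = np.sign(dev[peak_index])   # +1 for a peak above the mean, -1 below it
    if side == 0:
        return int(peak_index), int(peak_index)
    start = int(peak_index)
    while start > 0 and side * dev[start - 1] > 0:
        start -= 1
    end = int(peak_index)
    while end < len(x) - 1 and side * dev[end + 1] > 0:
        end += 1
    return start, end


trajectory = [0.0] * 20 + [1.0, 3.0, 5.0, 3.0, 1.0] + [0.0] * 20
peak = int(np.argmax(trajectory))                  # frame 22
section = event_present_section(trajectory, peak)  # (20, 24)
```

Because the mean is taken over the whole window, a long window keeps the mean close to the baseline. A short window pulls the mean toward the peak and narrows the section.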

In event-absent section extraction processing 133, the event-absent section 220 is the part of the image sequence around the event that remains after the event-present section 210 is removed. In the present embodiment, the event-absent section 220 combines both the front section 221 and the rear section 222 of the event-present section 210. The invention is not limited to this, however. Depending on the use case, only the front section 221 or only the rear section 222 may serve as the event-absent section 220.
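A minimal sketch of how the event-absent section could be derived from the analysis window and the event-present section. It assumes both are given as inclusive frame-index ranges. The function name and the `use` option are hypothetical names for the "front only / rear only / both" choice described above.

```python
def event_absent_sections(window, event_section, use="both"):
    """Frames of the analysis window outside the event-present section.

    window, event_section: inclusive (first_frame, last_frame) pairs.
    use: "pre" (front section 221), "post" (rear section 222) or "both".
    """
    w_start, w_end = window
    e_start, e_end = event_section
    pre = list(range(w_start, e_start))          # frames before the event-present section
    post = list(range(e_end + 1, w_end + 1))     # frames after it
    if use == "pre":
        return pre
    if use == "post":
        return post
    if use == "both":
        return pre + post
    raise ValueError(f"unknown option: {use!r}")


frames = event_absent_sections((0, 9), (3, 5))  # [0, 1, 2, 6, 7, 8, 9]
```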

The frame section extraction unit 130 then performs face image processing on the event-present section 210 and the event-absent section 220. In the present embodiment, the face shape is described using AAM (Active Appearance Model), described in Non-Patent Document 5, and CANDIDE, described in Non-Patent Document 6. AAM is used here, but the invention is not limited to it. Other information, such as the output of the KLT tracker mentioned above, may be used in combination. Markerless methods based on corner points and feature points, such as AAM, can be applied, and so can methods that place markers on the face and track the marker points. In the present embodiment, before the base decomposition described later, the frame section extraction unit 130 obtains feature vectors F_e and F_n as the feature quantities for the frames of the event-present section 210 and the event-absent section 220, respectively.

The base decomposition unit 150 performs base decomposition separately on the event-present section 210 (F_e) and the event-absent section 220 (F_n). PCA (Principal Component Analysis) or ICA (Independent Component Analysis) can be used as the decomposition method, and NMF (Non-Negative Matrix Factorization) or the like may also be applied. With PCA, the eigenvectors change with factors such as missing data. The present embodiment therefore uses as its example ICA, which can handle statistical independence, as described in Non-Patent Document 7.

First, in decomposition processing 152, ICA is applied to the frames of the event-present section 210 output from the frame section extraction unit 130. The frames are decomposed into a weighted linear sum of n basis signals F_k with weights w_k, as shown in equation (1):

$$F_e = \sum_{k=1}^{n} w_k F_k \qquad (1)$$

Similarly, in decomposition processing 153, the frames of the event-absent section 220 are decomposed into a weighted linear sum of m basis signals F_l with weights w_l, as shown in equation (2):

$$F_n = \sum_{l=1}^{m} w_l F_l \qquad (2)$$
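To make the decomposition in equations (1) and (2) concrete, the following is a compact FastICA sketch written directly in NumPy. It uses symmetric fixed-point iteration with a log-cosh contrast. It is an illustration, not the procedure of Non-Patent Document 7. It assumes that the features of each frame are stacked as the rows of a (frames x features) matrix. Under that layout, each column of `bases` plays the role of a basis signal F_k, and each column of `weights` gives the weight trajectory w_k over the frames.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Decompose X (frames x features) so that X ~ mean + weights @ bases.T.

    Column k of `weights` is the trajectory w_k over the frames and column k of
    `bases` is the basis signal F_k. As in any ICA, components are recovered only
    up to sign, scale and order.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    n_frames = Xc.shape[0]

    # PCA whitening restricted to the n_components leading directions.
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / n_frames)
    top = np.argsort(eigval)[::-1][:n_components]
    K = eigvec[:, top] / np.sqrt(eigval[top])          # (features, n)
    Z = Xc @ K                                          # whitened data, (frames, n)

    def decorrelate(W):
        # Symmetric orthogonalization: W <- (W W^T)^(-1/2) W
        s, u = np.linalg.eigh(W @ W.T)
        return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

    rng = np.random.default_rng(seed)
    W = decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)                            # derivative of log cosh
        W_new = G.T @ Z / n_frames - np.diag((1.0 - G**2).mean(axis=0)) @ W
        W_new = decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break

    unmixing = K @ W.T                                  # (features, n)
    weights = Xc @ unmixing                             # w_k over the frames
    bases = np.linalg.pinv(unmixing).T                  # F_k as columns
    return weights, bases, mean


# Usage: two hidden time courses mixed into six landmark coordinates.
t = np.linspace(0, 8, 2000)
true_weights = np.column_stack([np.sin(2 * t), np.sign(np.sin(3 * t))])
true_bases = np.random.default_rng(1).standard_normal((6, 2))
X = true_weights @ true_bases.T
weights, bases, mean = fast_ica(X, n_components=2)
```

The same function would be called once on the event-present frames and once on the event-absent frames, yielding the sets {F_k} and {F_l} that are compared next.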

The event-present section 210 and the event-absent section 220 are contiguous time-series data, so many of their bases are expected to be shared. However, the event-present section 210 contains signal components that originate from the event. It can therefore be assumed to have base components that are not contained in the event-absent section 220.

In inter-region base comparison processing 161, the event-related base component extraction unit 160 compares the set of basis signals F_k of the event-present section 210 with the set of basis signals F_l of the event-absent section 220. It then extracts the base components that exist in the set {F_k} but not in the set {F_l}. Specifically, for example, cross-correlations are computed between all pairs of basis signals F_k and F_l. Pairs whose correlation exceeds a threshold are removed as base components present in both regions. Of the remaining base components, those that exist only in {F_k} and have large weights w_k are extracted as event-related base components. The threshold for the weight w_k may be predetermined, or a value obtained by prior learning or the like may be used.
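The comparison step can be sketched as below, under several assumptions. Bases are stored as columns. "Cross-correlation" is taken as the zero-lag Pearson correlation between basis patterns, using its absolute value because ICA leaves the sign of each component undetermined. A basis counts as having a "large weight" when the peak magnitude of its weight trajectory exceeds a threshold. Both thresholds and the function name are hypothetical.

```python
import numpy as np


def event_related_bases(F_event, w_event, F_none, corr_threshold=0.8, weight_threshold=0.1):
    """Indices k of event-section bases F_k that have no match among the no-event bases.

    F_event: (features, n) bases F_k of the event-present section (columns).
    w_event: (frames, n) weight trajectories w_k of those bases.
    F_none:  (features, m) bases F_l of the event-absent section (columns).
    """
    F_event = np.asarray(F_event, dtype=float)
    w_event = np.asarray(w_event, dtype=float)
    F_none = np.asarray(F_none, dtype=float)
    selected = []
    for k in range(F_event.shape[1]):
        # |correlation| because ICA components have arbitrary sign.
        shared = any(
            abs(np.corrcoef(F_event[:, k], F_none[:, l])[0, 1]) > corr_threshold
            for l in range(F_none.shape[1])
        )
        strength = np.abs(w_event[:, k]).max()
        if not shared and strength > weight_threshold:
            selected.append(k)
    return selected


F_none = np.array([[1, 2, 3, 4, 5], [5, 3, 1, 3, 5]], dtype=float).T
F_event = np.array([[2, 4, 6, 8, 10], [1, -1, 0, 1, -1]], dtype=float).T
w_event = np.array([[1.0, 0.0], [2.0, 0.5], [1.5, 0.1]])
print(event_related_bases(F_event, w_event, F_none))  # [1]
```

Basis 0 is a scaled copy of a no-event basis and is discarded. Basis 1 is uncorrelated with both no-event bases and has a peak weight of 0.5, so it is kept.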

The event-related base components extracted in this way are represented as the deformed structure of landmark points, such as those of AAM. The deformation vector generation unit 170 therefore converts the inter-frame deformation vectors of these landmark points into deformation vectors of the CG model, which constitute the variation component. Specifically, the CG deformation vectors can be calculated from the landmark deformation vectors by, for example, the method described in Non-Patent Document 8.
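Non-Patent Document 8 is cited for converting landmark deformations into CG deformations, and that method is not reproduced here. As a simpler, swapped-in stand-in, the sketch below propagates landmark displacement vectors to arbitrary mesh vertices with Gaussian radial basis function (RBF) interpolation. The kernel width `sigma`, the small regularizer `reg`, and the function name are assumptions.

```python
import numpy as np


def rbf_deform(landmarks, landmark_disp, vertices, sigma=1.0, reg=1e-9):
    """Spread landmark displacements (L, 3) to mesh vertices (V, 3).

    Gaussian RBF interpolation: the landmark displacements are reproduced (up to
    the tiny regularizer), and vertices far from every landmark barely move.
    """
    P = np.asarray(landmarks, dtype=float)
    V = np.asarray(vertices, dtype=float)
    D = np.asarray(landmark_disp, dtype=float)

    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    Phi = kernel(P, P) + reg * np.eye(len(P))
    coeffs = np.linalg.solve(Phi, D)          # one RBF weight vector per landmark
    return V + kernel(V, P) @ coeffs


P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
disp = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.0], [0.0, 0.0, -0.2]])
moved = rbf_deform(P, disp, P)   # approximately P + disp
```

The kernel width controls how far a landmark's displacement spreads over the mesh. It would need tuning to the scale of the face model.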

With the embodiment described above, deformation vectors can be calculated for fine expression changes that event detection can detect but the Blend Shape method cannot fully express. By applying these CG deformation vectors, the CG deformation unit 180 can generate CG images with fine expressions.

The embodiment above has been described for CG images of faces, but the invention is not limited to this and can also be applied to deforming other CG objects. In addition, the input information used above was the difference between the input video and the result of applying the Blend Shape method in advance. The present embodiment is not limited to this, and the input video itself may be used directly as the input information.

(Second Embodiment)
The first embodiment obtained event-related base components from the time-series characteristics of the event. It set an event-present section and an event-absent section and extracted the event-related base components through a logical operation on the base components of the two sections. In contrast, the present embodiment describes in detail a method of extracting event-related base components using a different time-series feature.

FIG. 3 is a block diagram showing an example functional configuration of the virtual model generation apparatus 300 according to the present embodiment. FIG. 4 is a schematic diagram showing an example of time-series trajectory data related to event detection in the present embodiment. The basic configuration is the same as in the first embodiment, so only the differences are described in detail.

Starting from the time 400 of the event detected by the event detection unit 120, the event analysis unit 340 acquires two kinds of information. Section information 341 is the range of frames that forms the event-present section. Waveform information 342 describes the trajectory within the event region, such as its inflection points, zero-crossing points, and curvature. In the frame section extraction unit 330, event-present section extraction processing 132 uses the section information 341 to extract the event-present section 410 shown in FIG. 4. In the present embodiment, no event-absent section needs to be extracted.

The base decomposition unit 350 applies ICA to the event-present section (F_e) and decomposes it into a weighted linear sum of basis signals F_k, as shown in equation (1). It also obtains the trajectory of each weight w_k over the event-present section. This procedure is the same as decomposition processing 152 of the first embodiment. The event-related base component extraction unit 360 acquires the trajectories of the weights w_k within the event-present section and the waveform information 342 acquired by the event analysis unit 340. In comparison processing 361, it computes the cross-correlation between these trajectories and the waveform information 342 to compare the similarity of the waveforms.

As one comparison method, the peak position of the waveform within the event-present section is compared with the peak positions and zero-crossing points in the waveform information 342. In addition, the waveforms may each be normalized by their peak times and the cross-correlation between them computed. A base component F_k whose weight profile w_k resembles the signal of the waveform information 342 for the event-present section may then be taken as an event-related base component.
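One possible reading of the comparison in the second embodiment is sketched below. Each weight trajectory w_k is shifted so that its peak coincides with the peak of a reference waveform taken from the waveform information 342. The absolute Pearson correlation is then used as the similarity score. The shift-based alignment, the threshold, and the function names are assumptions: the source does not specify how the normalization by peak time is performed.

```python
import numpy as np


def peak_aligned_similarity(weights, reference):
    """|Pearson correlation| between each peak-aligned w_k and the reference waveform.

    weights: (frames, n) trajectories w_k within the event-present section.
    reference: (frames,) reference waveform for the same section.
    """
    ref = np.asarray(reference, dtype=float)
    W = np.asarray(weights, dtype=float)
    ref_peak = int(np.argmax(np.abs(ref)))
    scores = []
    for k in range(W.shape[1]):
        wk = W[:, k]
        shift = ref_peak - int(np.argmax(np.abs(wk)))
        aligned = np.zeros_like(wk)            # shift with zero padding, no wrap-around
        if shift > 0:
            aligned[shift:] = wk[:-shift]
        elif shift < 0:
            aligned[:shift] = wk[-shift:]
        else:
            aligned[:] = wk
        scores.append(abs(np.corrcoef(aligned, ref)[0, 1]))
    return np.array(scores)


def select_by_waveform(weights, reference, threshold=0.9):
    return [k for k, s in enumerate(peak_aligned_similarity(weights, reference)) if s > threshold]


idx = np.arange(60)
ref = np.exp(-(idx - 30) ** 2 / 32.0)                # bump peaking at frame 30
w0 = -2.0 * np.exp(-(idx - 25) ** 2 / 32.0)          # same shape, earlier and sign-flipped
w1 = np.zeros(60)
w1[10] = 1.0                                         # a single-frame spike
W = np.column_stack([w0, w1])
print(select_by_waveform(W, ref))  # [0]
```

Taking the absolute value makes the comparison insensitive to the arbitrary sign that ICA assigns to each component.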

As described above, the present embodiment makes it possible to generate, easily and with high accuracy, deformation vectors to which fine expressions are applied.

(Third Embodiment)
The first and second embodiments extracted event-related base components using the time-series characteristics of events. In contrast, the present embodiment describes in detail a method of extracting event-related base components using spatial localization, that is, the part of the face in which an event occurs.

FIG. 5 is a block diagram showing an example functional configuration of the virtual model generation apparatus 500 according to the present embodiment. FIG. 6 is a schematic diagram showing an example of time-series trajectory data related to event detection in the present embodiment. The basic configuration is the same as in the first or second embodiment, so only the differences are described in detail.

Starting from the time 600 of the event detected by the event detection unit 120, the event analysis unit 540 acquires region information 541 for the event-present section. It also acquires local information 542, which indicates the part of the face in which the event occurred. The present embodiment focuses on the mouth and eyes, the parts of the face that change most readily. In the example of FIG. 6, within the event-present section 610, the eye region trajectory 620 varies little, while the mouth region trajectory 630 changes.

In decomposition processing 552, the base decomposition unit 550 performs base decomposition of the event-present section 610 according to equation (1). In the present embodiment, however, the decomposition uses NMF (Non-Negative Matrix Factorization), described in Non-Patent Document 9, so that spatial localization can be evaluated. NMF decomposes the data into localized base components. NMF is used here as a base decomposition method that exhibits spatial localization, but the invention is not limited to it. Other methods characterized by locality, such as ICA, may also be applied.
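For the NMF step, the sketch below uses Lee and Seung's multiplicative update rules for the squared-error cost. The objective used in Non-Patent Document 9 may differ, so treat this as a related variant rather than that exact procedure. The sketch assumes a non-negative data matrix with features as rows and frames as columns. Each column of `W` is then a spatial basis, and each row of `H` is its activation over the frames.

```python
import numpy as np


def nmf(V, rank, n_iter=300, eps=1e-10, seed=0):
    """Factorize non-negative V (features x frames) as W @ H with W, H >= 0."""
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires non-negative data")
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for ||V - W H||^2; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(0)
V = rng.random((20, 3)) @ rng.random((3, 30))   # synthetic rank-3 non-negative data
W, H = nmf(V, rank=3)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative. This is what tends to produce additive, part-like (localized) bases.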

In locality evaluation processing 561, the event-related base component extraction unit 560 compares the local distribution of each base component extracted by the base decomposition unit 550 with the local information 542 obtained by the event analysis unit 540. Base components F_k located in the region indicated by the local information 542 are extracted as event-related base components. This makes it possible to extract base components that are closely related to the local region where the event occurs. The present embodiment uses spatial localization alone, but the invention is not limited to this. The time-series extraction methods described in the first or second embodiment may also be used in combination.
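The locality evaluation can be illustrated as follows. The sketch assumes that each feature (row of the NMF basis matrix) is labelled with the facial region it belongs to. A basis is taken to lie in a region when at least a given fraction of its squared magnitude falls there. The mask representation, the threshold, and the function name are hypothetical.

```python
import numpy as np


def bases_in_region(W, region_mask, min_fraction=0.5):
    """Indices of basis columns of W whose energy is concentrated in the masked region.

    W: (features, rank) non-negative basis matrix from NMF.
    region_mask: (features,) booleans, True for features in the region of interest
                 (e.g. the mouth, as given by the local information 542).
    """
    W = np.asarray(W, dtype=float)
    mask = np.asarray(region_mask, dtype=bool)
    energy = W ** 2
    fraction = energy[mask].sum(axis=0) / (energy.sum(axis=0) + 1e-12)
    return [int(k) for k in np.nonzero(fraction >= min_fraction)[0]]


# Features 0-1 belong to the eyes, features 2-3 to the mouth.
W = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]])
mouth = np.array([False, False, True, True])
print(bases_in_region(W, mouth))  # [1]
```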

(Fourth embodiment)
The first to third embodiments described cases in which event-related base components are extracted using features of the event and the shape of the CG model is deformed. In contrast, this embodiment describes a case in which the color (texture) of the CG model varies according to an event.

FIG. 7 is a block diagram illustrating an example functional configuration of the variable texture generation apparatus 700 according to this embodiment. Its basic configuration is similar to that of the virtual model generation apparatus 500 of FIG. 5 described in the third embodiment, so only the differences are described in detail.

In event-interval extraction process 732, the frame interval extraction unit 730 extracts the event interval. The first to third embodiments use feature quantities based on shape landmarks, such as those of an AAM (Active Appearance Model). In contrast, this embodiment extracts the texture map of the event interval by UV-unwrapping the CG image with the method described in Non-Patent Document 10. Similarly, in non-event-interval extraction process 733, the texture map of the non-event interval is extracted by UV-unwrapping the CG image.
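The paragraph above leaves open how frames are assigned to the event and non-event intervals. The sketch below makes two assumptions. First, the UV-unwrapped texture maps are already available as one array per frame. Second, frames within a fixed half-window of the detected change point form the event interval, and all remaining frames form the non-event interval. The function name and the `half_window` parameter are hypothetical. Each texture map is flattened to one row so that the result can serve as a data matrix for the decomposition.

```python
import numpy as np


def split_intervals(textures, change_point, half_window):
    """Split per-frame UV texture maps into event / non-event data matrices.

    textures:     array (n_frames, height, width, channels) of UV-unwrapped maps.
    change_point: frame index of the detected change point.
    half_window:  frames with |t - change_point| <= half_window count as the
                  event interval (a hypothetical rule, not from the patent).
    Returns (event_rows, non_event_rows), each of shape (k, height*width*channels).
    """
    textures = np.asarray(textures, dtype=float)
    n = textures.shape[0]
    frames = np.arange(n)
    in_event = np.abs(frames - change_point) <= half_window
    flat = textures.reshape(n, -1)  # one flattened texture map per row
    return flat[in_event], flat[~in_event]
```

With 10 frames, a change point at frame 5, and `half_window=2`, frames 3 to 7 form the event interval. Frames 0 to 2, 8, and 9 form the non-event interval.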

In decomposition process 752, the base decomposition unit 750 performs NMF-based decomposition according to Equation (1). Its input is the difference between the texture map of the non-event interval and the texture map of the event interval. The event-related base component extraction unit 760 compares the spatially localized regions of the NMF bases with the local information 542. It extracts the information that changes within the event interval and thereby generates a variation texture map. The texture map synthesis unit 770 combines the generated variation texture map with the original texture map, producing a texture map to which the event-induced variation has been applied. The CG generation unit 780 then synthesizes and outputs the CG image.
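Two details are not stated in the text: how a signed texture difference is fed to NMF, which requires non-negative input, and how the variation map is combined with the original. The sketch below fills both gaps with assumptions. It stacks the positive and negative parts of the difference, a common workaround, so NMF can process it, and it recovers the signed difference afterwards. It then composites additively inside the event region, with colors clipped to [0, 1]. All function names and the `gain` parameter are hypothetical.

```python
import numpy as np


def to_nonnegative(diff):
    """Stack positive and negative parts so a signed difference is NMF-compatible."""
    diff = np.asarray(diff, dtype=float)
    return np.concatenate([np.maximum(diff, 0.0), np.maximum(-diff, 0.0)], axis=-1)


def from_nonnegative(parts):
    """Invert to_nonnegative: positive half minus negative half."""
    parts = np.asarray(parts, dtype=float)
    m = parts.shape[-1] // 2
    return parts[..., :m] - parts[..., m:]


def synthesize_texture(base, variation, region_mask, gain=1.0):
    """Add a variation texture to the original texture map inside the event region.

    base, variation: arrays (height, width, channels) with colors in [0, 1].
    region_mask:     array (height, width), 1 inside the event region, else 0.
    """
    base = np.asarray(base, dtype=float)
    variation = np.asarray(variation, dtype=float)
    # (H, W) -> (H, W, 1) so the mask broadcasts over the color channels.
    mask = np.asarray(region_mask, dtype=float)[..., None]
    return np.clip(base + gain * mask * variation, 0.0, 1.0)
```

Clipping keeps the composited colors in a valid range. A multiplicative or alpha-blended composite would also be plausible, and the patent does not say which is used.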

As described above, according to this embodiment, a texture map reflecting minute variations can be generated easily and with high accuracy.

(Other embodiments)
The present invention can also be realized by the following processing. Software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or any of various storage media. A computer (or a CPU, MPU, or the like) of that system or apparatus then reads out and executes the program.

DESCRIPTION OF SYMBOLS
110 Input unit
120 Event detection unit
130 Frame interval extraction unit
140 Event analysis unit
150 Base decomposition unit
160 Event-related base component extraction unit
170 Deformation vector generation unit
180 CG deformation unit

Claims

An image recognition apparatus comprising:
input means for inputting time-series information of a subject;
detection means for detecting a change point related to an event from the time-series information input by the input means, and for detecting event information from the time-series information around the change point;
acquisition means for acquiring one or more frame intervals related to the event from the time-series information, based on the event information detected by the detection means, and for acquiring feature quantities of the acquired frame intervals;
decomposition means for decomposing the feature quantities acquired by the acquisition means into one or more base components;
extraction means for extracting a base component related to the event from the base components obtained by the decomposition means, based on changes in the base components around the change point; and
generation means for generating a variation component of a virtual model from the base component extracted by the extraction means.

The image recognition apparatus according to claim 1, wherein
the detection means acquires, as the event information, information on time-series variation in a frame interval around the change point related to the event, and
the extraction means extracts the base component related to the event based on the time-series variation information detected by the detection means.

The image recognition apparatus according to claim 2, wherein the acquisition means acquires feature quantities for an event interval and a non-event interval, the decomposition means decomposes each of the feature quantities into base components, and the extraction means extracts the base component related to the event based on the respective base components.

The image recognition apparatus according to claim 2, wherein the acquisition means acquires a feature quantity for an event interval, the decomposition means decomposes the feature quantity into a weighted linear sum of base components, and the extraction means extracts the base component related to the event based on changes in the weights within the event interval.

The image recognition apparatus according to claim 1 or 2, wherein
the detection means detects, as the event information, local information on the position at which the event occurs, and
the extraction means extracts the base component related to the event based on the local information detected by the detection means.

The image recognition apparatus according to any one of claims 1 to 5, wherein the subject is a face.

The image recognition apparatus according to any one of claims 1 to 6, wherein the time-series information of the subject is a moving image.

An image recognition method comprising:
an input step of inputting time-series information of a subject;
a detection step of detecting a change point related to an event from the time-series information input in the input step, and detecting event information from the time-series information around the change point;
an acquisition step of acquiring one or more frame intervals related to the event from the time-series information, based on the event information detected in the detection step, and acquiring feature quantities of the acquired frame intervals;
a decomposition step of decomposing the feature quantities acquired in the acquisition step into one or more base components;
an extraction step of extracting a base component related to the event from the base components obtained in the decomposition step, based on changes in the base components around the change point; and
a generation step of generating a variation component of a virtual model from the base component extracted in the extraction step.

A program that causes a computer to execute:
an input step of inputting time-series information of a subject;
a detection step of detecting a change point related to an event from the time-series information input in the input step, and detecting event information from the time-series information around the change point;
an acquisition step of acquiring one or more frame intervals related to the event from the time-series information, based on the event information detected in the detection step, and acquiring feature quantities of the acquired frame intervals;
a decomposition step of decomposing the feature quantities acquired in the acquisition step into one or more base components;
an extraction step of extracting a base component related to the event from the base components obtained in the decomposition step, based on changes in the base components around the change point; and
a generation step of generating a variation component of a virtual model from the base component extracted in the extraction step.