JP2013228765A - Optimal gradient pursuit for image alignment

Info

Publication number: JP2013228765A
Application number: JP2012098291A
Authority: JP
Inventors: Xiaoming Liu; Frederick W. Wheeler; Peter Henry Tu; Jilin Tu
Original assignee: General Electric Co
Current assignee: General Electric Co
Priority date: 2012-04-24
Filing date: 2012-04-24
Publication date: 2013-11-07
Anticipated expiration: 2032-04-24
Also published as: JP5953097B2

Abstract

PROBLEM TO BE SOLVED: To provide a method for model-based facial image alignment. SOLUTION: The method includes acquiring a facial image of a person and using a discriminative face alignment model to fit a generic facial mesh to the facial image so as to facilitate locating facial features. The discriminative face alignment model may include a generative shape model component and a discriminative appearance model component. Further, the discriminative appearance model component may have been trained to estimate a score function that minimizes the angle between a gradient direction and a vector pointing toward a ground-truth shape parameter. Additional methods, systems, and articles of manufacture are also disclosed.

Description

The present disclosure relates generally to image alignment and, in some embodiments, to techniques for aligning facial images.

Model-based image registration/alignment, in which a model is deformed so that its distance to an image is minimized, is an important topic in computer vision. Face alignment in particular matters for two reasons: it enables a variety of practical capabilities (e.g., facial feature detection, pose rectification, and face animation), and it poses scientific challenges arising from variations in facial appearance due to pose, illumination, expression, and occlusion. Earlier techniques include the Active Shape Model (ASM), which fits a statistical shape model to an object class. ASM was extended to the Active Appearance Model (AAM), which has been used for face alignment. During AAM-based model fitting, the mean-square error between an appearance instance synthesized from the appearance model and the warped appearance from the input image is minimized by iteratively updating the shape and/or appearance parameters. AAM can perform reasonably well when it is learned from and fitted to a small set of subjects, but its performance degrades quickly when it is trained on a large dataset and/or fitted to subjects not seen during model learning.

In addition to generative model-based approaches such as AAM, there are discriminative model-based alignment approaches. The Boosted Appearance Model (BAM) uses the same shape model as AAM but an entirely different appearance model: essentially a two-class classifier learned discriminatively from a set of correctly warped and incorrectly warped images. During model fitting, BAM aims to maximize the classifier score by updating the shape parameters along the gradient direction. BAM has been shown to generalize better than AAM when fitting unseen images, but one potential problem is that the learned binary classifier cannot guarantee a concave score surface while the shape parameters are perturbed. In other words, moving along the gradient direction does not necessarily improve the alignment. The Boosted Ranking Model (BRM) alleviates this problem by enforcing convexity through learning. Using pairs of warped images in which one is better aligned than the other, BRM learns a score function that attempts to rank the two warped images correctly in every training pair. Although BRM may offer certain benefits over earlier techniques, further improvements in image alignment can be achieved as described below.

US Patent Application Publication No. 2008/0310759

Certain aspects commensurate in scope with the originally claimed invention are set forth below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of certain forms that various embodiments of the presently disclosed subject matter might take, and that these aspects are not intended to limit the scope of the invention. Indeed, the invention may encompass a variety of aspects that may not be set forth below.

Embodiments of the presently disclosed subject matter may generally relate to image alignment. In one embodiment, a method includes acquiring a facial image of a person and using a discriminative face alignment model to fit a generic facial mesh to the facial image to facilitate locating facial features of the facial image. The discriminative face alignment model may include a generative shape model component and a discriminative appearance model component. The discriminative appearance model component may have been trained with training data to estimate a score function that is a function of the shape parameters of a given image and that seeks to minimize the angle between the gradient direction of the score function with respect to the shape parameters and the ideal alignment travel direction of the shape parameters.

In another embodiment, a system includes a memory device having a plurality of stored routines and a processor configured to execute the plurality of stored routines. The plurality of stored routines may include a routine configured to access a set of training images, and a routine configured to train an appearance model using the set of training images to learn an alignment score function that minimizes the angle between a gradient direction of the alignment score function and an ideal travel direction toward a desired alignment.

In an additional embodiment, an article of manufacture includes one or more non-transitory computer-readable media having executable instructions stored thereon. The executable instructions may include instructions adapted to access an image that includes a human face, and instructions adapted to align the human face using a discriminative face alignment model. The discriminative face alignment model may include a discriminative appearance model trained to estimate an alignment score function that minimizes the angle between a gradient direction of the alignment score function and a vector pointing in the direction of the maximum of the alignment score function.

Various refinements of the features noted above may exist in relation to various aspects of the subject matter described herein. Further features may also be incorporated in these various aspects. These refinements and additional features may exist individually or in any combination. For instance, various features discussed below in relation to one or more of the illustrated embodiments may be incorporated into any of the described embodiments of the present disclosure, alone or in any combination. Again, the brief summary presented above is intended only to familiarize the reader with certain aspects and contexts of the subject matter disclosed herein, without limitation to the claimed subject matter.

These and other features, aspects, and advantages of the present technique will become better understood when the following detailed description is read with reference to the accompanying drawings, in which like characters represent like parts throughout the drawings.

FIG. 1 illustrates a face shape template in accordance with an embodiment of the present disclosure.
FIG. 2 illustrates an example of a convex alignment score function learned via BRM.
FIG. 3 illustrates an alignment score function whose gradient direction is better aligned with the ideal travel direction, in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates an example of an observed image and a facial image warped using the face shape template, in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates an example of a warped facial image with feature parameterization, in accordance with an embodiment of the present disclosure.
FIG. 6 illustrates examples of rectangular feature types that may be used by the appearance model, in accordance with an embodiment of the present disclosure.
FIG. 7 illustrates an example of a feature template in accordance with an embodiment of the present disclosure.
FIG. 8 generally illustrates a process for estimating an alignment score function in accordance with an embodiment of the present disclosure.
FIGS. 9 and 10 illustrate the top fifteen Haar features selected by a learning algorithm in accordance with an embodiment of the present disclosure.
FIG. 11 illustrates a spatial density map of the top 100 Haar features selected by the learning algorithm of FIGS. 8 and 9, in accordance with an embodiment of the present disclosure.
FIGS. 12-14 illustrate example images from datasets in accordance with an embodiment of the present disclosure.
FIG. 15 is a graph comparing the ranking performance of the learning algorithm of an embodiment of the present disclosure with that of BRM.
FIG. 16 is a graph comparing the angle estimation performance of the learning algorithm of an embodiment of the present disclosure with that of BRM.
FIG. 17 is a graph comparing the alignment speed performance of the learning algorithm of an embodiment of the present disclosure with that of BRM.
FIG. 18 illustrates an example of a face analysis process in accordance with an embodiment of the present disclosure.
FIG. 19 is a block diagram of a processor-based device or system that provides the functionality described in the present disclosure, in accordance with an embodiment of the present disclosure.

One or more specific embodiments of the presently disclosed subject matter are described below. In an effort to provide a concise description of these embodiments, some features of an actual implementation may not be described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure. When introducing elements of various embodiments of the present technique, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.

Image alignment is the process of moving and deforming a landmark-based generic mesh onto an image (e.g., a facial image) so that image features (e.g., facial features) can be located accurately. Some alignment models include a shape model component and an appearance model component. Given an image, landmark points can be located to quantify the shape of the image. In facial image alignment, for example, the shape model may include landmark points corresponding to facial features (e.g., the tip of the nose, the corners of the mouth, etc.). The example mean shape 10 shown in FIG. 1 may include a plurality of triangles 12 defined by landmark points 14 and line segments 16.

The appearance model may generally include a learned alignment score function, as generally depicted in FIGS. 2 and 3. An example of an alignment score function learned via BRM is shown generally as graph 20 in FIG. 2. In this concave function, the ground-truth shape parameter 22 represents the maximum 24 of the function (i.e., the desired alignment), and each line 26 represents points whose score equals that of the other points on the same line 26. The scores of various perturbed shape parameters 28 are plotted as elements 30 having gradient directions 32. In BRM, however, the gradient direction 32 may still form a relatively large angle 36 with the vector 34 that starts at the current shape parameter element 30 and points to the ground-truth shape parameter 22 (i.e., the maximum 24). Thus, although BRM can update the shape parameters along the gradient direction 32, the alignment process in BRM may follow a convoluted path during optimization because of the relatively large angle 36. This not only increases the chance of divergence but also slows the alignment.

To address this problem, one embodiment of the present technique instead uses the Optimal Gradient Pursuit Model (OGPM), described below, to learn a discriminative alignment model that likewise includes a shape model component and an appearance model component. Using the same shape representation as BAM and BRM, learning of the OGPM appearance model component (which is also the alignment score function) is formulated with a very different objective. In particular, as generally represented by graph 40 of FIG. 3, the appearance model aims to learn an alignment score function whose gradient 32 at the various perturbed shape parameters 28 (depicted by reference numeral 30) has a minimal angle 36 with respect to the ideal travel direction (i.e., the vector 34 pointing directly at the ground-truth shape parameter). The score function may include, or consist of, a set of weak functions, each operating on one local feature in the warped image domain. The objective function is formulated so that each weak function can be estimated incrementally from a large pool of feature candidates. During model fitting, given an image with an initial shape parameter, gradient ascent is performed by updating the shape parameter in the gradient direction, which in OGPM is expected to be closer to the ideal travel direction because the angle 36 between gradient 32 and vector 34 has been optimized. Additional details of the presently disclosed alignment model are provided below. Certain embodiments relating to face models and face alignment are described below for explanatory purposes, but use of the model and alignment techniques in other image contexts (i.e., non-facial) is also envisaged.

Face Model

Like BAM and BRM, the face model of one embodiment consists of, or includes, a generative shape model component and a discriminative appearance model component. With respect to the shape model, a landmark-based shape representation is a popular way to describe the facial shape of an image. That is, a set of 2D landmarks $\{x_i, y_i\}_{i=1,\dots,v}$ may be placed on key facial features such as eye corners, mouth corners, and the nose tip. Concatenating these landmarks forms the shape observation of an image, $s = [x_1, y_1, x_2, y_2, \dots, x_v, y_v]^T$. Given a face database in which each image is manually labeled with landmarks, the entire set of shape observations can be treated as training data for the shape model. In one embodiment, the shape model may be a Point Distribution Model (PDM) learned via principal component analysis (PCA) on the observation set. The learned generative PDM can then represent a particular shape instance as

$$s(p) = s_0 + \sum_{i=1}^{n} p_i s_i \tag{1}$$

where $s_0$ and $s_i$ are, respectively, the mean shape and the $i$-th shape basis resulting from PDM learning. The shape parameters are given by $p = [p_1, p_2, \dots, p_n]^T$. As in the shape component of AAM, the first four shape bases may be trained to represent global translation and rotation, while the remaining shape bases may represent non-rigid deformation of the facial shape.
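As an illustration of the PDM described above, the following is a minimal sketch, assuming the shape observations are already stacked as rows of a NumPy array. The function names (`learn_pdm`, `shape_instance`, `project_shape`) are hypothetical, and the sketch does not reproduce the special handling of the first four global-transform bases; it only shows PCA-based learning and the linear combination of equation (1).

```python
import numpy as np

def learn_pdm(shapes, n_bases):
    """Learn a mean shape and shape bases from an (N, 2v) array of shape observations."""
    s0 = shapes.mean(axis=0)
    centered = shapes - s0
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    bases = vt[:n_bases]                                   # (n_bases, 2v), orthonormal rows
    eigvals = sing[:n_bases] ** 2 / (len(shapes) - 1)      # PCA eigenvalues
    return s0, bases, eigvals

def shape_instance(s0, bases, p):
    """Equation (1): s(p) = s0 + sum_i p_i * s_i."""
    return s0 + p @ bases

def project_shape(s0, bases, s):
    """Shape parameters of an observation s (valid because the bases are orthonormal)."""
    return bases @ (s - s0)
```

Setting `p` to the zero vector returns the mean shape, and projecting a training shape onto all bases and reconstructing it recovers that shape.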

As shown in FIG. 4, the warp function 48 from the mean shape coordinate system to coordinates in the image observation 52 is defined as a piecewise affine warp:

$$W(x^0, y^0; p) = [1 \;\; x^0 \;\; y^0]\, a(p) \tag{2}$$

where $(x^0, y^0)$ is a pixel coordinate 46 within the mean shape domain, and $a(p) = [a_1(p) \;\; a_2(p)]$ is a unique 3×2 affine transformation matrix that relates each triangle pair in $s_0$ and $s(p)$. Given a shape parameter $p$, $a(p)$ can be computed for each triangle 12. Because it is known a priori which triangle each pixel $(x^0, y^0)$ belongs to, the warp can be performed efficiently via a simple table lookup. Using this warp function 48, any facial image 52 can be warped to the mean shape (generally represented for one pixel by reference numerals 50 and 56), yielding a shape-normalized facial image $I(W(x; p))$, generally represented by reference numeral 58, from which the appearance model is learned.
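The per-triangle affine matrix of equation (2) can be sketched as follows for a single triangle pair. This is an illustrative sketch only: `triangle_affine` and `warp_point` are hypothetical names, and a full implementation would also need the precomputed pixel-to-triangle lookup table mentioned above, which is omitted here.

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """Return the 3x2 matrix a with [1, x0, y0] @ a = [x, y] for all three vertex pairs.

    src_tri: (3, 2) triangle vertices in the mean shape s0.
    dst_tri: (3, 2) corresponding vertices in the shape instance s(p).
    """
    m = np.hstack([np.ones((3, 1)), src_tri])   # rows are [1, x0, y0]
    return np.linalg.solve(m, dst_tri)          # requires a non-degenerate source triangle

def warp_point(a, x0, y0):
    """Equation (2): map a mean-shape pixel (x0, y0) into image coordinates."""
    return np.array([1.0, x0, y0]) @ a
```

Because the map is affine inside each triangle, interior points such as the centroid of the source triangle land on the centroid of the destination triangle.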

One embodiment of the appearance model can be better understood with reference to FIGS. 5-7. In particular, FIG. 5 shows an example of a warped image 70 having a parameterized feature 72. FIG. 6 shows five feature types 74 (individually labeled as feature types 76, 78, 80, 82, and 84) that may be used by the appearance model. FIG. 7 generally depicts a notional image template A (reference numeral 92).

The appearance model of one embodiment is described by a set of $m$ local features computed on the shape-normalized facial image $I(W(x; p))$. The local features of one embodiment may be Haar-like rectangular features (e.g., feature 72), which can offer benefits in computational efficiency (e.g., owing to the integral image technique). A rectangular feature can be computed as the inner product $A^T I(W(x; p))$, where $A$ is the image template 92. The inner product between the template and the warped image is equivalent to computing the rectangular feature using the integral image. As shown in FIG. 5, the image template $A$ may be parameterized by $(\alpha, \beta, \gamma, \delta, \tau)$, where $(\alpha, \beta)$ is the top-left corner, $\gamma$ and $\delta$ are the width and height, and $\tau$ is the feature type 74.
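The equivalence between the template inner product and the integral-image computation can be checked with a short sketch. The two-rectangle layout used here (a positive rectangle with an equal-sized negative rectangle to its right) is an assumption chosen for illustration; the five feature types of FIG. 6 are not specified in the text, and the function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, left, top, width, height):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    r0, r1, c0, c1 = top, top + height, left, left + width
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def two_rect_template(shape, left, top, width, height):
    """Template A: +1 on one rectangle, -1 on the adjacent rectangle to its right."""
    a = np.zeros(shape)
    a[top:top + height, left:left + width] = 1.0
    a[top:top + height, left + width:left + 2 * width] = -1.0
    return a

def two_rect_feature(ii, left, top, width, height):
    """Same feature computed from the integral image instead of the template."""
    return (rect_sum(ii, left, top, width, height)
            - rect_sum(ii, left + width, top, width, height))
```

For any image, `two_rect_feature` on the integral image should agree with `np.sum(template * img)` up to floating-point error, while needing only eight lookups per feature.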
Alignment Learning

Having introduced the appearance model representation, we now turn to how the appearance model of the present technique is trained. In one embodiment, the appearance model may include, or consist of, the alignment score function used during the model fitting stage. First, let $p$ denote the shape parameters of a given image, representing the current alignment of the shape model of equation (1). In one embodiment, the goal of appearance model learning can be stated as learning, from labeled training data, a score function $F(p)$ that, when maximized with respect to $p$, yields the shape parameters of the correct alignment. Specifically, if $p_0$ is the shape parameter corresponding to the correct alignment of an image, $F$ must satisfy

$$p_0 = \arg\max_{p} F(p). \tag{4}$$

Given the above equation, $F(p)$ can be optimized via gradient ascent. That is, assuming $F$ is differentiable, the shape parameters can be updated iteratively at each alignment iteration, starting from the initial parameters $p^{(0)}$:

$$p^{(k+1)} = p^{(k)} + \lambda \left.\frac{\partial F}{\partial p}\right|_{p = p^{(k)}} \tag{5}$$

where $\lambda$ is the step size. When the alignment process converges after $k$ iterations, the alignment is considered successful if the Euclidean distance $\|p^{(k)} - p_0\|$ is less than a predefined threshold.
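The gradient-ascent fitting loop can be sketched as below. This is a minimal sketch, assuming the gradient of the score function is available as a callable; `fit_by_gradient_ascent`, the stopping tolerance, and the iteration cap are illustrative choices rather than details taken from the disclosure.

```python
import numpy as np

def fit_by_gradient_ascent(grad_f, p_init, step=0.1, max_iter=500, tol=1e-8):
    """Iterate p <- p + step * dF/dp until the update becomes negligible."""
    p = np.asarray(p_init, dtype=float)
    for _ in range(max_iter):
        update = step * grad_f(p)
        p = p + update
        if np.linalg.norm(update) < tol:   # treat a tiny update as convergence
            break
    return p

def alignment_succeeded(p_final, p0, threshold):
    """Success test: Euclidean distance to the ground truth below a threshold."""
    return np.linalg.norm(p_final - p0) < threshold
```

With a toy concave score such as $F(p) = -\|p - p_0\|^2$, whose gradient is $-2(p - p_0)$, the loop converges to $p_0$ from any starting point for a sufficiently small step.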

From equation (5), it is clear that $\partial F / \partial p$ indicates the travel direction of the shape parameters $p$. Because the final destination of such travel is $p_0$, the ideal travel direction should be the vector that starts at $p$ and points to $p_0$, denoted

$$\vec{p}^{\,+} = \frac{p_0 - p}{\|p_0 - p\|}. \tag{6}$$

Similarly, the worst travel direction is the opposite of $\vec{p}^{\,+}$, i.e., $\vec{p}^{\,-} = -\vec{p}^{\,+}$. During learning of the score function $F$, it is therefore desired that $\partial F / \partial p$ have a direction as similar as possible to the ideal travel direction $\vec{p}^{\,+}$ or, equivalently, as dissimilar as possible to the worst travel direction $\vec{p}^{\,-}$. Specifically, if we define a classifier that is the inner product between two unit vectors, and hence also the cosine of the angle between them,

$$H(p; \vec{p}) = \frac{\left(\partial F / \partial p\right)^{T} \vec{p}}{\left\|\partial F / \partial p\right\|} \tag{7}$$

then ideally

$$H(p; \vec{p}^{\,+}) = +1, \qquad H(p; \vec{p}^{\,-}) = -1. \tag{8}$$

In practice, $H$ will hardly ever equal exactly 1 or −1 as in the above equation. The objective function for learning the $H$ classifier can therefore be formulated as

$$\arg\min_{F} \sum_{p} \left\| H(p; \vec{p}^{\,+}) - 1 \right\|^2 \tag{9}$$

where only the ideal travel direction $\vec{p}^{\,+}$ is used, because it can also represent the constraint from $\vec{p}^{\,-}$. From here on, $H(p; \vec{p}^{\,+})$ is simplified to $H(p)$ for clarity. This objective function essentially aims to estimate a function $F$ whose gradient direction has the minimal angle with respect to the ideal travel direction at all possible shape parameters $p$ for all training data.
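To make the objective concrete, the sketch below evaluates the cosine response $H$ and the squared-error objective for a candidate score function whose gradient is supplied as a callable. The names `ideal_direction`, `h_response`, and `objective` are hypothetical, and the sketch assumes the gradient is never exactly zero at a sample.

```python
import numpy as np

def ideal_direction(p, p0):
    """Unit vector that starts at p and points to the ground truth p0."""
    d = p0 - p
    return d / np.linalg.norm(d)

def h_response(grad, direction):
    """Cosine of the angle between the gradient and a unit direction."""
    return float(grad @ direction / np.linalg.norm(grad))

def objective(grad_f, samples, p0):
    """Sum over perturbed parameters of (H(p) - 1)^2, using the ideal direction only."""
    return sum((h_response(grad_f(p), ideal_direction(p, p0)) - 1.0) ** 2
               for p in samples)
```

A score with isotropic contours around `p0` has gradients that point straight at `p0`, so its objective is zero; stretching the contours along one axis makes the gradients deviate from the ideal direction and raises the objective, which mirrors the contrast between FIG. 2 and FIG. 3.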

In one embodiment, a solution that minimizes the objective function (9) can be provided in the manner shown in FIG. 8 and described below. First, assume that the alignment score function uses a simple additive model

$$F(p) = \sum_{i=1}^{m} f_i(p) \tag{10}$$

where $f_i(p)$ is a weak function operating on one rectangular feature. The gradient of $F$ is therefore also additive:

$$\frac{\partial F}{\partial p} = \sum_{i=1}^{m} \frac{\partial f_i}{\partial p}.$$

Substituting this into equation (7), with $\vec{p}^{\,+}$ denoting the ideal travel direction, gives

$$H(p) = \frac{\left(\sum_{i=1}^{m} \partial f_i / \partial p\right)^{T} \vec{p}^{\,+}}{\left\|\sum_{i=1}^{m} \partial f_i / \partial p\right\|}.$$

Given that the $H$ function can be written in recursive form, incremental estimation can be used to minimize the objective function (9). That is, by defining a set of training samples and a hypothesis space from which rectangular features can be selected, each weak function $f_i$ can be estimated iteratively and added incrementally to the target function $F$. Additional details of portions of an example learning process of one embodiment are described below.

In the appearance learning of one embodiment, a training sample is an N-dimensional warped image I(W(x; p)). Given a face database {I_i}, i ∈ [1, K], with manually labeled landmarks {s_i} for each face image I_i, equation (1) can be used to compute the ground-truth shape parameter p_{0,i}, after which multiple "incorrect" shape parameters {p_{j,i}}, j ∈ [1, U], can be synthesized by random perturbation. Equation (12) below describes one example of the perturbation, where v is an n-dimensional vector whose elements are each uniformly distributed within [−1, 1], μ is the vectorized eigenvalues of all shape bases in the PDM, the perturbation index σ is a constant scale that controls the range of perturbation, and ○ denotes the element-wise product of two vectors of equal length.

p_{j,i} = p_i + σ v ○ μ (12)

The set of warped images I_i(W(x; p_{j,i})) can then be treated as positive training samples (y_i = 1) for learning. Together with the ideal movement directions, these can constitute the training set.
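The perturbation of equation (12) can be sketched in a few lines of NumPy. The function name, argument names, and the choice of random generator are assumptions made for illustration; the ideal movement direction is taken here as the vector from each perturbed parameter back to the ground truth.

```python
import numpy as np

def synthesize_perturbed(p_true, mu, sigma, count, rng):
    """Draw `count` perturbed shape parameters p_true + sigma * v * mu.

    Each element of v is drawn uniformly from [-1, 1); mu holds the
    eigenvalues of the shape bases and sigma is the perturbation index.
    """
    p_true = np.asarray(p_true, dtype=float)
    mu = np.asarray(mu, dtype=float)
    v = rng.uniform(-1.0, 1.0, size=(count, p_true.size))
    perturbed = p_true + sigma * v * mu      # element-wise product, as in (12)
    ideal_directions = p_true - perturbed    # points back toward the ground truth
    return perturbed, ideal_directions
```

For example, calling `synthesize_perturbed(p_true, mu, sigma=2.0, count=30, rng=np.random.default_rng(0))` yields 30 perturbed parameter vectors for one image, matching the 30 samples per image used for OGPM in the experiments.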

In one embodiment, the weak function f_i is defined with a sign g_i = ±1 and a normalizing constant that ensures f_i stays within the range [−1, 1]. This choice can be based on several considerations. First, f_i must be differentiable, because F is assumed to be a differentiable function. Second, it may be desirable for each function f_i to act on only one rectangular feature. Within the mean shape space, all possible positions, sizes, and types of rectangular features form the hypothesis space from which the best feature can be selected at each iteration.
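The formula for the weak function is not reproduced in this text, so the sketch below assumes an arctangent of the thresholded feature value scaled by 2/π. That is only one form that meets the stated constraints (differentiable, bounded by [−1, 1], sign g_i = ±1, one scalar feature per function); the names `feature_value`, `threshold`, and `sign` are hypothetical.

```python
import math

def weak_function(feature_value, threshold, sign):
    """Hypothetical weak function: sign * (2/pi) * atan(feature - threshold)."""
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return sign * (2.0 / math.pi) * math.atan(feature_value - threshold)

def weak_function_derivative(feature_value, threshold, sign):
    """Derivative of the assumed weak function with respect to the feature value."""
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return sign * (2.0 / math.pi) / (1.0 + (feature_value - threshold) ** 2)
```

The 2/π factor plays the role of the normalizing constant: atan is bounded by ±π/2, so the product never leaves [−1, 1].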

One procedure for learning the alignment score function (10) is provided as Algorithm 1 in the table below.

This algorithm is also shown generally in FIG. 8 according to one embodiment, in which process 96 estimates the alignment score function based on the set of samples 98 from equation (13) above.

In particular, in process 96, the alignment score function F can be initialized at block 100 (corresponding to step 1 of the algorithm above). A weak function f_t can be fitted at block 102 in the manner described in step 3 of the algorithm. Note that step 3 is the most computationally intensive step, because the entire hypothesis space is searched exhaustively. In step 3, the best feature is selected based on the L² distance of H with respect to 1, rather than the L² distance of a weak classifier as in boosting-based learning. The classifier function H can then be updated with f_t at block 104 (corresponding to step 4), and f_t can be added to the alignment score function at block 106 (corresponding to step 5). Steps 3 to 5 can be repeated for each t, as represented generally by blocks 108 and 110 and return loop 112 of FIG. 8 (corresponding to step 2). Finally, at block 114, process 96 can return an estimate of the alignment score function equal to the sum of the set of weak functions.
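The loop described above can be sketched as a greedy additive fit. This is a simplified illustration rather than Algorithm 1 itself: it assumes each candidate weak function is represented only by its precomputed gradient at every training sample, it tracks only the gradient of F (which is all that H depends on), and it omits the threshold and sign fitting. All names are hypothetical.

```python
import numpy as np

def cosine_rows(a, b, eps=1e-12):
    """Row-wise cosine between two (n_samples, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, eps)

def learn_score_gradient(candidate_grads, ideal_dirs, rounds):
    """Greedily add the candidate whose gradient brings H closest to 1.

    candidate_grads: array (n_candidates, n_samples, dim)
    ideal_dirs:      array (n_samples, dim), directions toward the ground truth
    """
    candidate_grads = np.asarray(candidate_grads, dtype=float)
    ideal_dirs = np.asarray(ideal_dirs, dtype=float)
    total_grad = np.zeros_like(ideal_dirs)          # step 1: F starts at zero
    chosen = []
    for _ in range(rounds):                         # step 2: loop over t
        # step 3: exhaustive search over the hypothesis space
        losses = [
            np.sum((cosine_rows(total_grad + g, ideal_dirs) - 1.0) ** 2)
            for g in candidate_grads
        ]
        best = int(np.argmin(losses))
        total_grad = total_grad + candidate_grads[best]   # steps 4 and 5
        chosen.append(best)
    return chosen, total_grad
```

The selection criterion is the squared distance of H from 1 summed over samples, which mirrors the description of step 3; a full implementation would also search feature position, size, type, threshold, and sign for every candidate.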

In essence, learning the score function F is equivalent to learning a set of features, thresholds {t_i}, and feature signs {g_i}. In a practical implementation, g_i can be set to +1 and to −1 in turn, and the optimal threshold can be estimated for each case. Finally, g_i is set according to which case yields the smaller error (equation 15). The optimal threshold can be estimated by a binary search over the range of feature values such that the error is minimized.

The final set of triplets, together with the shape model {s_i}, i = 1, …, n, is referred to herein as an OGPM (Optimal Gradient Pursuit Model). The top 15 features selected by the learning algorithm in one embodiment are shown in FIGS. 9 and 10. In particular, FIG. 9 provides a representation 118 of the top five Haar features 120 selected by the learning algorithm, and FIG. 10 provides a representation 124 of the next ten Haar features 126. A spatial density map 130 of the top 100 Haar features selected by the learning algorithm of the same embodiment is also provided in FIG. 11. Note that many of the selected features are aligned with the boundaries of facial features.

Face Alignment

In one embodiment, an OGPM can be fitted to the face in a given image I having an initial shape parameter p⁽⁰⁾ (at the 0th iteration) in the manner described below. As shown in equation (5), alignment can be performed iteratively by using a gradient ascent approach. From equations (3), (10), and (14), the derivative of F with respect to p is

where ∇I is the gradient of the image evaluated at W(x; p), and ∂W/∂p is the Jacobian of the warp evaluated at p. A discussion of the BAM alignment procedure, its computational complexity, and an efficient implementation can be found in Xiaoming Liu, "Discriminative Face Alignment" (IEEE Trans. on Pattern Analysis and Machine Intelligence, 31(11):1941-1954, November 2009). Unlike the BAM-based approach, however, the present technique uses a step size λ that is determined dynamically via a line search rather than a simple static constant. That is, at each iteration, the optimal λ within a certain range is sought such that the updated shape parameter increases the current score function value F(p) as much as possible.
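The dynamically chosen step size can be illustrated with a simple grid-based line search inside a gradient ascent loop. The grid of candidate step sizes, the function names, and the callable `score` and `grad` arguments are assumptions for illustration; the patent text only states that the best λ within some range is sought at each iteration.

```python
import numpy as np

def line_search_step(score, grad, p, step_sizes):
    """Try each candidate step size along the gradient and keep the best one."""
    g = grad(p)
    best_p, best_score = p, score(p)
    for lam in step_sizes:
        candidate = p + lam * g
        s = score(candidate)
        if s > best_score:            # accept only strict improvements of F
            best_p, best_score = candidate, s
    return best_p, best_score

def align(score, grad, p0, step_sizes, max_iters=50):
    """Gradient ascent on the score function with a per-iteration line search."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iters):
        new_p, _ = line_search_step(score, grad, p, step_sizes)
        if new_p is p:                # no step size increased the score: stop
            break
        p = new_p
    return p
```

Because a step is accepted only when it raises the score, the loop stops as soon as F(p) can no longer be increased, which is one of the termination conditions used in the experiments.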
Experimental Results

The following experimental results were obtained using an experimental data set containing 964 images from three publicly available databases: the ND1 database, the FERET database, and the BioID database. Each of the 964 images includes 33 manually labeled landmarks. To speed up the training process, the image set in this experiment was downsampled so that the face width was approximately 40 pixels across the set. Sample images 134 from the ND1, FERET, and BioID databases are shown in FIGS. 12, 13, and 14, respectively. As shown in Table 1 below, all images were partitioned into three non-overlapping data sets. Set 1 included 400 images from two databases (one image per subject). Set 2 included 334 different images of the same subjects as the ND1 images in Set 1. Set 3 included 230 images from 23 subjects in the BioID database that were never used for training. Set 1 was used as the training set for model learning, and all three sets were used to test model fitting. The motivation for this partition was to experiment with various levels of generalization ability. For example, Set 2 can be tested as unseen data of seen subjects, and Set 3 can be tested as unseen data of unseen subjects, which is the more challenging case and closer to practical application scenarios.

In the experiments, the OGPM algorithm described above was compared with BRM based on two considerations. First, the OGPM algorithm can be considered an extension of BRM. Second, BRM has been shown to outperform other discriminative image alignment techniques such as BAM. During model learning, both BRM and OGPM were trained on the 400 images of Set 1. BRM used 24,000 (= 400 × 10 × 6) training samples synthesized from Set 1, where each image synthesized 10 profile lines and each line had 6 evenly spaced samples. In comparison, OGPM used 12,000 training samples, with each image synthesizing 30 samples according to equation (12). Fewer samples could be used for OGPM because all of its synthesized samples are randomly spread, rather than being multiple samples selected from one profile line as in BRM, which makes it possible to achieve good performance with fewer training samples. The manually labeled landmarks of the Set 1 images were improved using the automatic model refinement approach described in Xiaoming Liu et al., "Face Model Fitting on Low Resolution Images" (Proc. of the British Machine Vision Conference (BMVC), vol. 3, pp. 1079-1088, 2006). After model learning, the shape model component of both BRM and OGPM was a PDM with 9 shape bases, and the appearance model (i.e., the alignment score function) had 100 weak classifiers/functions.

BRM aims to improve the convexity of the learned score function by correctly ranking pairs of warped images. OGPM extends BRM in the sense that the score function must not only be convex but must also have a minimal angle between the gradient direction and the vector pointing toward the ground-truth shape parameter. Convexity is therefore a good metric for evaluating the score functions of both BRM and OGPM. As with BRM, convexity in the experiments was measured by computing the percentage of correctly ranked pairs of warped images. Given Set 1 and Set 2, two respective sets of pairs were synthesized, and the ranking performance of BRM and OGPM was tested. As shown in graph 140 of FIG. 15, the perturbation index σ controls the amount of perturbation of the image pairs (see equation 12). For both sets, OGPM achieved ranking performance very similar to that of BRM, even though, unlike BRM, OGPM does not directly use ranking in its objective function. BRM showed slightly better performance when the perturbation was very small (σ = 1). This may be attributable mainly to labeling errors in the training data, because a small perturbation of the labeled landmarks can also be treated as a reasonably good alignment, which makes ranking more difficult.
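The convexity metric, the percentage of correctly ranked pairs, can be sketched as follows. The sketch assumes that a pair counts as correctly ranked when the warped image closer to the ground truth receives the strictly higher score; that definition and the function name are assumptions, since the text does not spell them out.

```python
def ranking_accuracy(score_pairs):
    """Percentage of pairs where the closer-to-truth image scores higher.

    score_pairs: iterable of (score_of_closer_image, score_of_farther_image).
    """
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(1 for closer, farther in pairs if closer > farther)
    return 100.0 * correct / len(pairs)
```

With this convention, ties count as ranking errors, which is a conservative choice.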

In addition to the convexity measurement, we also validated the estimate of the angle between the gradient direction and the vector pointing toward the ground-truth shape parameter. Minimizing this angle is the objective function of OGPM, expressed by the H(p) function. Similar to the ranking experiment above, given Set 1, we randomly synthesized six sets of warped images using various perturbation indices σ. Then, for each image in a set, we computed the H(p) score and plotted the average score of each set in graph 150 of FIG. 16. A similar experiment was performed for Set 2. Although OGPM and BRM have similar ranking performance, OGPM achieves a larger function score for both Sets 1 and 2, and hence a smaller gradient angle. This demonstrates that using ranking performance as the objective, as BRM does, does not guarantee optimal angle estimation, and that using the gradient angle directly as the objective function, as OGPM does, can yield a better alignment score function.

In the alignment experiments, the model fitting algorithm was run on each image with multiple initial landmarks, and the alignment results were evaluated. The initial landmarks were generated using equation (12), that is, by randomly perturbing the ground-truth landmarks with independent uniform distributions whose ranges equal a multiple (σ) of the eigenvalues of the shape bases obtained during PDM training. After fitting to an image finished, alignment performance was measured by the resulting root mean square error (RMSE) between the aligned landmarks and the ground-truth landmarks.
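A landmark RMSE can be computed as in the sketch below. The text does not say whether the mean is taken over landmarks or over individual coordinates; this sketch assumes the mean of squared Euclidean distances per landmark, and the function name is hypothetical.

```python
import numpy as np

def landmark_rmse(aligned, truth):
    """Root mean square of per-landmark Euclidean distances, in pixels.

    aligned, truth: arrays of shape (n_landmarks, 2) holding (x, y) positions.
    """
    aligned = np.asarray(aligned, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if aligned.shape != truth.shape:
        raise ValueError("landmark arrays must have the same shape")
    squared_distances = np.sum((aligned - truth) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_distances)))
```

In the experiments described here, each image would contribute one such value per trial over its 33 landmarks.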

We performed alignment experiments on all three sets using both OGPM and BRM. Table 2 shows the RMSE results in pixels, where each entry is the average over more than 2000 trials at a particular perturbation index σ. Accordingly, each image in Sets 1, 2, and 3 was tested with 5, 6, and 9 random trials, respectively. OGPM and BRM were tested under the same conditions. For example, both algorithms were initialized with the same random trials, and the termination conditions were also the same: the alignment iteration terminated when the alignment score F(p) could not be increased further, or when the landmark difference (RMSE) between consecutive iterations was smaller than a predefined threshold, such as 0.05 pixels in the experiments described here.
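The trial counts follow from the set sizes and the number of random trials per image, and the two termination conditions can be written as a single predicate. The short sketch below works through both; the dictionary layout and the `should_stop` name are illustrative assumptions.

```python
# Images per set and random trials per image, as stated in the text.
set_sizes = {"Set 1": 400, "Set 2": 334, "Set 3": 230}
trials_per_image = {"Set 1": 5, "Set 2": 6, "Set 3": 9}

# Total trials per set at one perturbation index.
total_trials = {name: set_sizes[name] * trials_per_image[name] for name in set_sizes}

def should_stop(prev_score, new_score, landmark_change, tol=0.05):
    """Stop when the score no longer increases or the landmarks barely move.

    landmark_change is the RMSE, in pixels, between the landmarks of two
    consecutive iterations; tol defaults to the 0.05-pixel threshold.
    """
    return new_score <= prev_score or landmark_change < tol
```

The products come to 2,000, 2,004, and 2,070 trials for Sets 1, 2, and 3, so Set 1 sits exactly at the 2,000 mark while the other two exceed it.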

Table 2 shows that OGPM achieved better alignment performance than BRM for all three sets. Note that the performance gain was larger when the initial perturbation was relatively large, such as σ = 6 or 8, which is the more challenging case in practical applications. Considering that the test images were of very low resolution, this represents a substantial performance improvement. Comparing the three data sets, the performance gain on the training set (Set 1) was larger than on the other two data sets.

One strength of a smaller gradient angle is the ability to converge in fewer iterations during alignment. FIG. 17 provides a histogram 160 showing the number of iterations that OGPM and BRM needed in the experiments to converge on Set 3 when σ = 8. On average, OGPM converged faster than BRM: the average number of iterations was 5.47 for OGPM and 6.40 for BRM. Similarly, for Set 1 with σ = 8, the average number of iterations was 5.08 for OGPM and 6.09 for BRM.

The image alignment techniques described in this disclosure can be used together with numerous other processing techniques to achieve desired results. For example, as generally shown in FIG. 18, the disclosed image alignment techniques can be used in a face analysis process 170 according to one embodiment. By way of example, such a process 170 can include receiving an image and detecting one or more faces in the image, as generally indicated by blocks 172 and 174. The detected faces can be aligned, such as via the presently disclosed techniques, as generally indicated by block 176. The aligned faces can then be analyzed at block 178, such as for pose estimation or for face recognition in which an aligned face is compared with reference data to identify a person in the image.

Finally, note that the functionality described in this disclosure (e.g., image detection, alignment, and analysis) can be performed by a processor-based system, such as a computer. An example of such a system is provided in FIG. 19 according to one embodiment. The illustrated processor-based system 184 can be a general-purpose computer, such as a personal computer, configured to run a variety of software, including software that implements all or part of the functionality described herein. Alternatively, the processor-based system 184 can include, among other things, a mainframe computer, a distributed computing system, or an application-specific computer or workstation configured to implement all or part of the present technique based on specialized software and/or hardware provided as part of the system. Further, the processor-based system 184 can include either a single processor or multiple processors to facilitate implementation of the presently disclosed functionality.

In general, the processor-based system 184 can include a microcontroller or microprocessor 186, such as a central processing unit (CPU), that can execute the various routines and processing functions of the system 184. For example, the microprocessor 186 can execute various operating system instructions as well as software routines configured to carry out certain processes. The routines can be stored in or provided by an article of manufacture that includes one or more fixed computer-readable media, such as memory 188 (e.g., random access memory (RAM) of a personal computer) or one or more mass storage devices 190 (e.g., an internal or external hard drive, a solid-state storage device, an optical disc, a magnetic storage device, or any other suitable storage device). In addition, the microprocessor 186 processes data provided as input to the various routines or software programs, such as data provided as part of the present technique in a computer-based implementation.

Such data can be stored in or provided by the memory 188 or the mass storage device 190. Alternatively, such data can be provided to the microprocessor 186 via one or more input devices 192. The input devices 192 can include manual input devices, such as a keyboard, a mouse, or the like. In addition, the input devices 192 can include a network device, such as a wired or wireless Ethernet card, a wireless network adapter, or any of various ports or devices configured to facilitate communication with other devices via any suitable communication network 198, such as a local area network or the Internet. Through such a network device, the system 184 can exchange data and communicate with other networked electronic systems, whether near to or remote from the system 184. The network 198 can include various components that facilitate communication, including switches, routers, servers or other computers, network adapters, communication cables, and so forth.

Results generated by the microprocessor 186, such as results obtained by processing data in accordance with one or more stored routines, can be provided to an operator via one or more output devices, such as a display 194 or a printer 196. Based on the displayed or printed output, the operator can request additional or alternative processing, or provide additional or alternative data, such as via the input device 192. Communication between the various components of the processor-based system 184 can typically be accomplished via a chipset and one or more buses or interconnects that electrically connect the components of the system 184.

Technical effects of the invention include improvements in speed, efficiency, and accuracy for the alignment of facial and non-facial images. While only certain features of the invention have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is therefore to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true scope of the invention.

10 mean shape
12 triangle
14 landmark point
16 line segment
20 graph
22 ground-truth shape parameter
24 maximum
26 line
28 perturbed shape parameter
30 element
32 gradient direction
34 vector
36 angle
40 graph
46 pixel coordinates
48 warp function
52 image observation
58 shape-normalized face image
70 warped image
72 parameterized feature
74 feature type
76 feature type
78 feature type
80 feature type
82 feature type
84 feature type
92 conceptual image template A
96 process
98 samples
118 representation
120 top five Haar features
124 representation
126 next ten Haar features
130 spatial density map
134 sample images
140 graph
150 graph
160 histogram
170 face analysis process
184 processor-based system
186 microcontroller or microprocessor
188 memory
190 mass storage device
192 input device
194 display
196 printer
198 communication network

Claims

A method comprising:
acquiring a facial image of a person; and
using, via software executed by a processor of a system, a discriminative face alignment model to align a generic facial mesh to the facial image so as to facilitate locating facial features of the facial image, wherein the discriminative face alignment model includes a generative shape model component and a discriminative appearance model component, and the discriminative appearance model component has been trained with training data to estimate a score function that is a function of the shape parameters of a given image and that seeks to minimize the angle between the gradient direction of the score function at the shape parameters and the ideal alignment movement direction of the shape parameters.

The method of claim 1, wherein the discriminative appearance model component has been trained with the training data to estimate the score function via an objective function defined over all shape parameters p of the training data, where F is the score function and H is a classifier equal to the inner product between two unit vectors representing the gradient direction and the ideal alignment movement direction, respectively.

The method of claim 2, wherein minimizing the objective function includes summing weak functions that each act on a respective single rectangular facial feature.

The method of claim 1, comprising performing face recognition on the facial image following alignment, via additional software executed by the processor.

The method of claim 1, wherein acquiring the facial image of the person includes analyzing image data to detect the face of the person.

The method of claim 1, comprising training the discriminative appearance model using the training data.

The method of claim 6, comprising optimizing the score function via gradient ascent.

The method of claim 6, comprising:
computing a ground-truth shape parameter for each facial image of a plurality of facial images; and
synthesizing a plurality of modified shape parameters for each facial image by random perturbation of the ground-truth shape parameter.

The method of claim 8, wherein the training data includes a set of warped images based on the modified shape parameters and the ideal movement directions of the warped images.

A system comprising:
a memory device having a plurality of routines stored therein; and
a processor configured to execute the plurality of routines stored in the memory device, the plurality of routines including:
a routine configured to access a set of training images; and
a routine configured to train an appearance model using the set of training images to learn an alignment score function that minimizes the angle between the gradient direction of the alignment score function and the ideal movement direction toward a desired alignment.

The system of claim 10, wherein the plurality of routines includes:
a routine configured to determine a ground-truth shape parameter for each image of the set of training images; and
a routine configured to synthesize a plurality of shape parameters derived from the ground-truth shape parameter.

The system of claim 11, wherein the routine configured to synthesize the plurality of shape parameters includes a routine configured to synthesize the plurality of shape parameters via random perturbation.

The system of claim 10, wherein the routine configured to train the appearance model includes a routine that learns the alignment score function by initializing the alignment score function, iteratively estimating a plurality of weak functions that each act on a single rectangular feature, and incrementally adding the estimates of the plurality of weak functions to the alignment score function.

The system of claim 13, wherein iteratively estimating the plurality of weak functions includes fitting a weak function of the plurality of weak functions based on a least-squares distance of a classifier function with respect to 1.

The system of claim 10, wherein the set of training images includes a set of facial images, the routine configured to access the set of training images includes a routine configured to access the set of facial images, and the routine configured to train the appearance model using the set of training images includes a routine configured to train the appearance model using the set of facial images.

The system of claim 10, wherein the memory device includes at least one of an optical disc, a random access memory, or a hard drive.

An article of manufacture comprising:
one or more fixed computer-readable media having executable instructions stored thereon, the executable instructions including:
instructions adapted to access an image containing a human face; and
instructions adapted to align the human face using a discriminative face alignment model that includes a discriminative appearance model trained to estimate an alignment score function that minimizes the angle between the gradient direction of the alignment score function and a vector pointing in the direction of the maximum of the alignment score function.

The article of manufacture of claim 17, wherein the one or more fixed computer-readable media include a plurality of fixed computer-readable media that at least collectively have the executable instructions stored thereon.

The article of manufacture of claim 17, wherein the one or more fixed computer-readable media include an optical disc, a magnetic disk, a solid-state disk, or some combination thereof.

The article of manufacture of claim 17, wherein the one or more fixed computer-readable media include a random access memory of a computer.