JP6742405B2 - Head-mounted display with facial expression detection function

Info

Publication number: JP6742405B2
Application number: JP2018517321A
Authority: JP
Inventors: ジフンチュ; チュンウォンパク
Original assignee: Binaryvr Inc
Current assignee: Binaryvr Inc
Priority date: 2015-09-29
Filing date: 2016-09-26
Publication date: 2020-08-19
Anticipated expiration: 2036-09-26
Also published as: CN108140105A; KR102136241B1; JP2018538593A; KR20180112756A; US10089522B2; WO2017058733A1; DE112016004437T5; US20180365484A1; US20170091535A1

Description

The present disclosure relates generally to head-mounted display units that detect a user's facial expressions for use in virtual reality or augmented reality environments.

Virtual reality (VR) and augmented reality (AR) are emerging fields for applications such as gaming, education, healthcare, and social networking services because they enable immersive, real-life experiences. Some of these applications are social platforms that allow a user to engage with other users through a 3D representation of the user (e.g., an avatar) displayed on a display device. Enabling users to interact with one another through virtual or augmented reality can enhance the user experience in these applications.

In human interaction, facial expressions convey a great deal of information to others about an individual's emotional state. To make interactions between users richer, a user's 3D representation may be developed to show the user's facial expressions. In this way, more information about a user's state of mind can be effectively communicated to other users in a VR or AR environment.

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Serial No. 62/234,478, filed on September 29, 2015, and U.S. Provisional Patent Application Serial No. 62/337,261, filed on May 16, 2016, the entire contents of which are incorporated herein by reference.

Embodiments relate to detecting facial expressions using a head-mounted display. A first image is captured by a first image capturing device on the head-mounted display. The first image includes an upper portion of the user's face. A second image is captured by a second image capturing device on the head-mounted display. The second image includes a lower portion of the user's face. The first image and the second image are processed to extract facial expression parameters representing the user's facial expression.

In one embodiment, the first image capturing device includes a pair of infrared cameras. The second image capturing device includes one of a depth camera, a color camera, an infrared camera, or two stereoscopic cameras.

In one embodiment, the first image and the second image are processed at least to detect, from the first image, landmark positions associated with the user's eyes and eyebrows, and to detect, from the second image, landmark positions associated with the lower portion of the user's face.

In one embodiment, the extracted facial expression parameters are applied to a digital representation of the user to generate a graphical representation of the user.

In one embodiment, calibration is performed by capturing and processing calibration images representing a neutral face of the user.

In one embodiment, calibration is performed by generating a personalized neutral face mesh based on the calibration images, and building a personalized tracking model by applying a deformation transfer technique to the personalized neutral face mesh.

In one embodiment, a blendshape model is fitted to the landmark positions in the first image and the second image, based on the personalized tracking model, to obtain the facial expression parameters.
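The fitting step above can be illustrated with a minimal sketch. It assumes a linear blendshape model in which each landmark vector equals the neutral landmarks plus a weighted sum of per-expression offsets, with weights restricted to the range 0 to 1 and solved by projected gradient descent. The function names, the solver, and the step size are illustrative assumptions; the text here does not specify how the fit is computed.

```python
from typing import List, Sequence

Vector = List[float]


def predict_landmarks(neutral: Sequence[float],
                      deltas: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> Vector:
    """Linear blendshape model: neutral + sum_i weights[i] * deltas[i]."""
    out = list(neutral)
    for w, delta in zip(weights, deltas):
        for k, d in enumerate(delta):
            out[k] += w * d
    return out


def fit_blendshape_weights(neutral: Sequence[float],
                           deltas: Sequence[Sequence[float]],
                           observed: Sequence[float],
                           steps: int = 500,
                           lr: float = 0.1) -> Vector:
    """Least-squares fit of blendshape weights to observed landmarks.

    Hypothetical solver: gradient descent on the squared landmark error,
    clamping every weight to [0, 1] after each update.
    """
    weights = [0.0] * len(deltas)
    for _ in range(steps):
        predicted = predict_landmarks(neutral, deltas, weights)
        residual = [p - o for p, o in zip(predicted, observed)]
        for i, delta in enumerate(deltas):
            # Gradient of sum(residual^2) with respect to weights[i].
            grad = 2.0 * sum(r * d for r, d in zip(residual, delta))
            weights[i] = min(1.0, max(0.0, weights[i] - lr * grad))
    return weights


# Two landmarks flattened as [x0, y0, x1, y1]; two illustrative blendshapes.
neutral = [0.0, 0.0, 1.0, 0.0]
deltas = [[0.0, 1.0, 0.0, 0.0],   # e.g. "brow raise" moves landmark 0 in y
          [0.0, 0.0, 0.0, 1.0]]   # e.g. "jaw open" moves landmark 1 in y
observed = predict_landmarks(neutral, deltas, [0.5, 0.25])
fitted = fit_blendshape_weights(neutral, deltas, observed)
```

In this sketch the fitted weights play the role of the facial expression parameters. The step size must be small relative to the magnitude of the offsets for the iteration to converge, and a production solver would likely use a dedicated constrained least-squares routine instead.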

In one embodiment, the first image and the second image are processed in real time to obtain the facial expression.

Embodiments also relate to a head-mounted display that includes a first capturing device, a second capturing device, a display device, and a body. The first capturing device captures an upper portion of the user's face including the eye region. The second capturing device is located below the first capturing device and captures a lower portion of the user's face. The display device displays images to the user. The first capturing device, the second capturing device, and the display device are mounted on the body.

In one embodiment, the second capturing device is mounted on an extension member that extends from the body toward the lower portion of the user's face.

In one embodiment, the head-mounted display unit further includes a slidable mount on which the second capturing device is mounted.

In one embodiment, the display device includes a first display and a second display. The first display displays a left-side image toward the user's left eye, and the second display displays a right-side image toward the user's right eye.

In one embodiment, the first capturing device includes a pair of cameras, each installed on an opposite side of the body.

In one embodiment, the first capturing device includes a camera installed in the middle of the body.

In one embodiment, the second capturing device is mounted directly on the body.

In one embodiment, the body has a bulging upper portion that encloses the eye region.

In one embodiment, the display device includes a pair of separate display units, and the first capturing device includes two cameras between the pair of display units.

Embodiments also relate to a virtual reality system that includes a head-mounted display unit and a computing device. The computing device is communicatively coupled to the head-mounted display unit. The computing device receives the first image and the second image from the head-mounted display unit and processes them to extract facial expression parameters representing the user's facial expression.

FIG. 1 is a block diagram illustrating a system for capturing and processing a user's facial expressions, according to one embodiment.
FIG. 2A is a schematic view of the head-mounted display unit of FIG. 1, according to one embodiment.
FIG. 2B is a schematic diagram illustrating a 2D camera that captures images of a user's eye region, according to one embodiment.
FIG. 2C is a schematic diagram illustrating components of the head-mounted display unit relative to a user's face, according to one embodiment.
FIG. 2D is a schematic view illustrating a head-mounted display unit according to another embodiment.
FIG. 2E is a schematic view illustrating a head-mounted display unit with stereoscopic image sensors, according to another embodiment.
FIG. 2F is a schematic view illustrating a head-mounted display unit with slidable stereoscopic image sensors, according to another embodiment.
FIG. 2G is a schematic view illustrating a head-mounted display unit with a 2D camera in the middle top portion of the main body, according to one embodiment.
FIG. 2H is a schematic view illustrating a head-mounted display unit with a bulging upper portion that encloses a 2D camera, according to one embodiment.
FIG. 2I is a schematic diagram illustrating capturing of a user's eye region using the head-mounted display unit of FIG. 2G or FIG. 2H, according to one embodiment.
FIG. 2J is a schematic diagram illustrating placement of a pair of 2D cameras between the displays of a head-mounted display unit, according to one embodiment.
FIG. 2K is a schematic view illustrating a head-mounted display unit that receives a mobile device, according to one embodiment.
FIG. 3 is a block diagram illustrating a computing device connected to the head-mounted display unit for determining facial expressions, according to one embodiment.
FIG. 4 is a block diagram of software modules in the computing device, according to one embodiment.
FIG. 5 is a diagram illustrating facial landmarks tracked to determine facial expressions, according to one embodiment.
FIG. 6 is a flowchart illustrating an overall process of using facial expressions in a user's digital representation, according to one embodiment.
FIG. 7 is a flowchart illustrating a process of detecting a user's facial expressions, according to one embodiment.

The drawings and the following description relate to preferred embodiments by way of illustration only. It should be noted that, from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of the claims.

Reference will now be made in detail to several embodiments illustrated in the accompanying drawings. It should be noted that, wherever practicable, similar reference numerals may be used in the figures and may indicate similar functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. In the following description, alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

Embodiments relate to detecting a user's facial expressions in real time using a head-mounted display unit that includes an image capturing device (e.g., an infrared camera) that captures an upper portion of the user's face including the eye region, and another image capturing device (e.g., a depth camera) that captures lower features of the user's face including at least one of the nose, lips, chin, and cheeks. The images captured by the first image capturing device and the second image capturing device are processed to extract parameters associated with the facial expression. The parameters can be transmitted or processed so that a digital representation of the user that includes the facial expression can be generated.

The eye region described herein refers to a facial region covering an eye and the eyebrow above the eye.

Example Architecture for Facial Expression Detection System

FIG. 1 is a block diagram illustrating a system 100 for capturing and processing a user's facial expressions, according to one embodiment. The system 100 may include, among other components, a head-mounted display (HMD) 102 and a computing device 108 that communicates with the HMD 102. The HMD 102 is used in conjunction with the computing device 108 to detect the user's pose, detect the user's facial expression, and display images to the user.

The computing device 108 may communicate with the HMD 102 via wired or wireless communication. Image and audio data 120 for playback on the HMD 102 can be sent from the computing device 108. The HMD 102 also sends the computing device 108 information 110 that indicates the pose of the user's head and includes captured images associated with the facial expression.
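The two data flows described above can be sketched as a pair of simple records. This is only an illustration: the field names, the pose representation, and the use of raw bytes for images and audio are assumptions, since the text does not define a message format for the information 110 or the data 120.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HmdInfo:
    """Information 110, sent from the HMD 102 to the computing device 108."""
    # Hypothetical pose encoding: (yaw, pitch, roll) in radians.
    head_pose: Tuple[float, float, float]
    # Encoded frames from the 2D cameras 104 (eye regions).
    eye_region_images: List[bytes] = field(default_factory=list)
    # Encoded frames from the depth camera 105 (lower face).
    lower_face_images: List[bytes] = field(default_factory=list)

    def ready_for_expression_detection(self) -> bool:
        """True when both an upper-face and a lower-face image are present."""
        return bool(self.eye_region_images) and bool(self.lower_face_images)


@dataclass
class PlaybackData:
    """Data 120, sent from the computing device 108 to the HMD 102."""
    image_frames: List[bytes] = field(default_factory=list)
    audio_chunks: List[bytes] = field(default_factory=list)


info = HmdInfo(head_pose=(0.0, 0.1, 0.0),
               eye_region_images=[b"left-eye", b"right-eye"],
               lower_face_images=[b"lower-face"])
```

Keeping the two directions as separate types mirrors the separate reference numerals 110 and 120 and makes it explicit which side produces each record.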

The HMD 102 is worn on the user's head, as described below in detail with reference to FIG. 2C. The HMD 102 may include, among other components, one or more 2D cameras 104, one or more depth cameras 105, and one or more displays 106. Details of the display 106 and its operating modules are omitted herein for the sake of brevity. Each 2D camera 104 captures an eye region of the user's face and may be embodied as an infrared camera or an RGB camera (with or without an illumination lamp). Each eye region includes an eye and an eyebrow. The depth camera 105, on the other hand, generates depth images of the lower features of the user's face, including at least one of the nose, lips, cheeks, and chin.

The computing device 108 determines the user's facial expression by processing the images generated by the infrared cameras 104 and the depth camera 105, as described below in detail with reference to FIG. 7. A 2D RGB camera or a 2D infrared (IR) camera can also be used instead of the depth camera 105.

Although the computing device 108 is illustrated in FIG. 1 as being separate from the HMD 102, the computing device 108 may be part of the HMD 102.

Example Head-Mounted Display

FIG. 2A is a schematic view of the HMD 102 according to one embodiment. The HMD 102 includes a main body 202 and a vertical extension member 204 that extends downward from the main body 202. The main body 202 houses the 2D cameras 104, the display 106, and other sensors (e.g., a gyroscope).

The HMD 102 and the vertical extension member 204 may be coupled via a mechanism that enables adjustment of the region of the user's face captured by the depth camera 105. Instead of a vertical extension member, a member that extends horizontally or in a slanted orientation may also be used to mount the depth camera 105. The depth camera 105 provides (i) a 3D depth map and (ii) a 2D color image or infrared image of the captured region. Using the depth camera 105 to capture the lower features of the user's face is advantageous because, among other reasons, 3D geometry information about the lower facial features can be obtained with high accuracy. Instead of the depth camera 105, a 2D color camera can also be used to capture the lower features of the user's face. Color images captured by the 2D color camera may be processed by the computing device 108 to generate 3D geometry information about the lower facial features.

The HMD 102 may also be attached to a strap 212 so that the user can secure the HMD 102 to the user's head.

As illustrated in FIG. 2A, a pair of 2D cameras 104 is placed in the upper corners of the front wall of the main body 202 to capture respective regions of the user's face (i.e., the left eye region and the right eye region). In an alternative embodiment, the pair of 2D cameras may be placed on the side walls 203 of the HMD 102. The 2D cameras 104 can also be placed immediately next to the display 106.

The display 106 may include two separate display modules, one that displays a left-side image toward the user's left eye and another that displays a right-side image toward the user's right eye. The two displays 106 may be physically separated. Alternatively, a single display module may be divided into two separate display regions for displaying the left-side image and the right-side image separately.
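The single-module alternative can be sketched as a split of one panel into two side-by-side regions. The `Region` type, the `split_display` function, and the even left/right split are illustrative assumptions; the text only states that a single display module may be divided into two separate display regions.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """Pixel rectangle on the panel: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def split_display(width: int, height: int) -> Tuple[Region, Region]:
    """Divide one panel into a left-eye region and a right-eye region.

    Hypothetical layout: the panel is cut vertically down the middle.
    With an odd width, the extra pixel column goes to the right region.
    """
    half = width // 2
    left = Region(0, 0, half, height)
    right = Region(half, 0, width - half, height)
    return left, right


left_region, right_region = split_display(2160, 1200)
```

The left-side image would then be drawn into `left_region` and the right-side image into `right_region`, which matches the behavior of two physically separate displays from the viewer's perspective.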

FIG. 2B is a schematic diagram illustrating a 2D camera 104 that captures an image representing an eye region of a user's face 220, including an eye and an eyebrow, according to one embodiment. The 2D camera 104 is installed in the body 202 of the HMD 102 so that it faces the user's face when the HMD 102 is worn. Specifically, the 2D camera 104 captures the region of one or both eyes of the user's face.

An infrared camera may be used as the 2D camera 104. Using an infrared camera to capture images of the region around the eyes and eyebrows is advantageous because, among other reasons, (i) an infrared camera can adequately capture facial features without coming into contact with the user's skin, and (ii) an infrared camera operates under the low-light conditions that can result when external light is blocked while the HMD 102 is worn by the user.

In one embodiment, the 2D camera 104 may be equipped with a fisheye lens to capture a wide field of view. Because the distance from the 2D camera to the user's eyes and eyebrows is short (typically within 5 centimeters), the fisheye lens is used to capture the entire eye region. The depth camera 105 is also equipped with a fisheye lens to capture a wide field of view.

FIG. 2C is a schematic diagram illustrating the placement of components of the HMD 102 relative to the user's face 220, according to one embodiment. The HMD 102 of FIG. 2C has a pair of 2D cameras 104, one capturing the left eye region and the other capturing the right eye region. The central axis 244 of each 2D camera forms an angle α with respect to a vertical plane 254. The angle α may be in the range of 30° to 45° to capture the eye region.

FIG. 2D is a schematic view of an HMD 102B according to another embodiment. The HMD 102B is similar to the HMD 102 of FIG. 2A but has a mount 204B to which a camera 105B is attached to capture images of the lower portion of the user's face. The mount 204B is shorter than the vertical extension member of FIG. 2A. The camera 105B may be a depth camera or an RGB/grayscale camera. One or more infrared or visible light sources (not shown) may also be attached to the mount 204B so that the camera 105B can better capture images of the lower portion of the user's face. In an alternative embodiment, the HMD 102 does not include a separate mount or vertical extension member but has the camera 105B mounted directly on the main body 202.

FIG. 2E is a schematic view of an HMD 102C according to another embodiment. The HMD 102C is similar to the HMD 102B of FIG. 2D but has stereoscopic cameras 105B installed on a mount 204C. Both stereoscopic cameras 105B capture images of the lower portion of the user's face. The captured images are processed by the computing device 108 to determine the user's facial expression.

FIG. 2F is a schematic view of an HMD 102D according to one embodiment. The HMD 102D is similar to the HMD 102C of FIG. 2E but has mounts 222A and 222B that are slidable relative to the main body 202. Cameras 105D, which may be IR cameras or grayscale cameras, are mounted on the mounts 222A and 222B. Because the mounts 222A and 222B can slide relative to the main body 202, their positions may be adjusted to better capture the lower portion of the user's face. In some embodiments, the mounts 222A and 222B are moved manually by the user. In other embodiments, the mounts 222A and 222B are adjusted automatically by an actuator (e.g., a motor, not shown).

FIG. 2G is a schematic view of an HMD 102E according to one embodiment. The HMD 102E is similar to the HMD 102 of FIG. 2A except that a single 2D camera 104 is located at the center of the main body 202. The single 2D camera 104 captures both the left eye region and the right eye region of the user's face, as described below with reference to FIG. 2I.

FIG. 2H is a schematic view of an HMD 102F according to one embodiment. The HMD 102F is similar to the HMD 102E of FIG. 2G except that the main body 202 has an upwardly protruding edge 233. The upwardly protruding edge 233 enables the eye region of the user's face to be fully enclosed under the main body 202.

FIG. 2I is a schematic diagram illustrating capturing of the eye regions on both sides using the single 2D camera 104 in the HMD 102E of FIG. 2G or the HMD 102F of FIG. 2H. A fisheye lens may be used in the 2D camera 104 to widen the region of the face captured by the 2D camera 104.

FIG. 2J is a schematic diagram illustrating capturing of the eye regions on both sides using two separate 2D cameras 104J, according to one embodiment. Unlike the 2D cameras 104 of FIG. 2C, the 2D cameras 104J are placed between the displays 106. The central axis 247 of each 2D camera 104J forms an angle β with respect to the vertical plane 254 so that both 2D cameras 104J face the eye regions of the face. One of the many advantages of placing the 2D cameras 104J between the displays 106 is that the dimensions of the HMD (especially the width W) can be reduced.

Although the HMDs described above with reference to FIGS. 2A through 2J use dedicated displays 106 that display an image toward each eye, in other embodiments the display may be embodied as the display device of a separate mobile device (e.g., a smartphone). For example, FIG. 2K is a schematic view of an HMD 102G having a slot 263 that receives a mobile device 261. The mobile device 261 may be inserted into the slot 263 of the main body 202 so that the display device of the mobile device functions as the display of the HMD 102G. The slot 263 illustrated in FIG. 2K is merely an example, and slots of different configurations can also be employed. In the embodiment of FIG. 2K, the display 106, as well as the computing device 108, is embodied by the mobile device 261.

Example Computing Device for Determining Facial Expressions

FIG. 3 is a block diagram illustrating the computing device 108 connected to the HMD 102 for determining facial expressions, according to one embodiment. The computing device 108 may include, among other components, a memory 302, a processor 304, an HMD interface 306, a display 308, a user interface 310, and a bus 301 connecting these components. The computing device 108 may include other components, such as a network interface, to communicate with other computing devices (not shown).

The memory 302 is a non-transitory computer-readable storage medium that stores software modules, as described below in detail with reference to FIG. 4. The instructions stored in the memory 302 are executed by the processor 304 to perform operations associated with facial expression detection and with generating a digital representation of the user that incorporates the detected facial expression.

The processor 304 executes various instructions stored in the memory 302 and controls the operation of other components in the computing device 108. The computing device 108 may include more than one processor.

The HMD interface 306 is hardware, software, firmware, or a combination thereof for communicating with the HMD 102. The HMD interface 306 enables the computing device 108 to send image and audio data 120 for playback on the HMD 102, and to receive from the HMD 102 the information 110 associated with the pose of the user's head and the captured images associated with the facial expression. The HMD interface 306 may be compatible with one or more communication protocols.

The display 308 is used to render and present images to the user. These images may include information associated with the operation of the HMD 102.

The user interface 310 is hardware, software, firmware, or a combination thereof that enables the user to interact with the computing device 108. The user interface 310 may include a pointing device (e.g., a mouse) and a keyboard.

図４は、一実施形態に係る、演算装置１０８におけるソフトウェアモジュールのブロック図である。メモリ３０２は、数あるソフトウェア構成要素の中でも特に、オペレーティングシステム４０６と、表情検出モジュール４１０と、アプリケーションモジュール４４０とを記憶する。メモリ３０２はまた、図４には図示されない他の種々のソフトウェアモジュールも備えてもよい。 FIG. 4 is a block diagram of software modules in the computing device 108 according to one embodiment. The memory 302 stores an operating system 406, a facial expression detection module 410, and an application module 440, among other software components. Memory 302 may also include various other software modules not shown in FIG.

オペレーティングシステム４０６は、演算装置１０８において利用可能なリソースの管理を担うソフトウェアモジュールである。利用可能なオペレーティングシステムには、例えば、ＩＯＳ、ＷＩＮＤＯＷＳ（登録商標）、ＬＩＮＵＸ、ＡＮＤＲＯＩＤ（登録商標）、及びＭＡＣＯＳが含まれてもよい。 The operating system 406 is a software module that manages resources that can be used in the computing device 108. The available operating system, for example, IOS, WINDOWS (registered trademark), LINUX, ANDROID (TM), and may include MAC OS.

表情検出モジュール４１０は、２Ｄカメラ１０４から受信した２Ｄ画像（例えば、赤外線画像）４０２と、深度カメラ１０５から受信した画像４０４とに基づいて、ユーザの表情を検出するソフトウェアモジュールである。画像４０４には、深度カメラ１０５によって生成された深度画像とカラー画像又はグレースケール画像との双方が含まれてもよい。表情検出モジュール４１０は、赤外線画像４０２と画像４０４とを処理することにより、ユーザの表情を示す表情（ＦＥ）パラメータ４２４を生成する。 The facial expression detection module 410 is a software module that detects the facial expression of the user based on the 2D image (for example, infrared image) 402 received from the 2D camera 104 and the image 404 received from the depth camera 105. The image 404 may include both the depth image generated by the depth camera 105 and a color image or a grayscale image. The facial expression detection module 410 processes the infrared image 402 and the image 404 to generate an facial expression (FE) parameter 424 indicating the facial expression of the user.

The facial expression detection module 410 may include sub-modules including, but not limited to, an eye and eyebrow tracking module 414, a lower face tracking module 418, and an FE parameter generator 422. The eye and eyebrow tracking module 414 determines the pupil center, eye contours, and eyebrow contours in the 2D images 402 based on landmark positions. The eye and eyebrow tracking module 414 is pre-trained with training image samples annotated with landmarks for the pupils, eye contours, and eyebrow contours. Such annotation may be performed manually. Example landmarks are shown as "X" points in FIGS. 5A and 5B.

The eye and eyebrow tracking module 414 may employ a tracking algorithm. The tracking algorithm may use techniques well known in the art, for example, (i) the supervised descent method (SDM), (ii) deformable model fitting, (iii) active appearance modeling, or (iv) deep learning. As a result of tracking the user's eyes and eyebrows, the eye and eyebrow tracking module 414 generates landmark positions 415 indicating the position and shape of the eyes and eyebrows. When a fisheye lens is used to capture the 2D images, the eye and eyebrow tracking module 414 may flatten the images before running the tracking algorithm to remove the distortion caused by the fisheye lens.
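The flattening step can be illustrated with a minimal sketch. The description does not specify a lens model, so the equidistant fisheye model (image radius proportional to the ray angle) and the function name `flatten_fisheye_points` are assumptions made only for illustration. The sketch maps pixel coordinates rather than resampling a whole image.

```python
import numpy as np

def flatten_fisheye_points(points, center, focal_px):
    """Map fisheye pixel coordinates to rectilinear (pinhole) coordinates.

    Assumed model (not from the patent): an equidistant fisheye places a ray
    at angle theta from the optical axis at radius r_d = focal_px * theta.
    A pinhole camera would place the same ray at r_u = focal_px * tan(theta).
    """
    center = np.asarray(center, dtype=float)
    pts = np.asarray(points, dtype=float) - center
    r_d = np.linalg.norm(pts, axis=1)
    theta = r_d / focal_px
    # Rays at 90 degrees or more have no pinhole image.
    if np.any(theta >= np.pi / 2):
        raise ValueError("point lies outside the rectilinear field of view")
    # Scale factor r_u / r_d, taken as 1 at the image center (r_d == 0).
    scale = np.ones_like(r_d)
    nonzero = r_d > 0
    scale[nonzero] = focal_px * np.tan(theta[nonzero]) / r_d[nonzero]
    return pts * scale[:, None] + center
```

Resampling a full image would typically use the inverse of this mapping, so that every output pixel looks up its source location in the fisheye image.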

Similarly, the lower face tracking module 418 tracks, based on the images 404, the pose of at least one of the user's nose, lips, chin, cheeks, and the facial silhouette around the chin and cheeks. To track landmarks on the lower part of the user's face, the lower face tracking module 418 may also use a tracking algorithm based on one of the techniques well known in the art, for example, (i) the supervised descent method (SDM), (ii) deformable model fitting, (iii) active appearance modeling, and (iv) deep machine learning. Landmarks on the lower part of the user's face are shown, for example, in FIG. 5C. By tracking these landmarks, the lower face tracking module 418 generates landmark positions 419 of lower facial features including at least one of the nose, lips, chin, and cheeks. One of the many advantages of detecting the silhouette around the chin and cheeks is that jaw and cheek movements can be captured explicitly. It also helps in robustly tracking the head position relative to the camera, which is not easy with lip tracking.

The FE parameter generator 422 receives the landmark positions 415 and 419 and a 3D depth map from the depth camera. The FE parameter generator 422 stores a personalized 3D facial expression model obtained during a calibration process, as described in detail below with reference to FIG. 6. The FE parameter generator 422 also extracts facial expression (FE) parameters 424 that collectively indicate the facial expression of the user wearing the HMD 102 by fitting the landmark positions 415 and 419 and the 3D depth map to the 3D facial expression model, as described in detail below with reference to FIG. 7.

The application module 440 performs various operations based on the detected facial expression in the form of the FE parameters 424. The application module 440 may include, among other elements, a mapping module 442, a graphic representation storage 446, and a virtual reality (VR)/augmented reality (AR) module 448. The graphic representation storage 446 stores one or more digital representations of the user. The mapping module 442 retrieves the user's digital representation from the graphic representation storage 446 and transfers the FE parameters 424 (e.g., blendshape weight values) onto the retrieved digital representation in real time, thereby generating data for use in the VR/AR module 448.
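As a rough illustration of what transferring blendshape weight values onto a digital representation can mean, the following sketch deforms an avatar mesh by a weighted sum of blendshape offsets. The array layout, the function name `apply_blendshape_weights`, and the clipping of weights to [0, 1] are assumptions for this example, not details taken from the patent.

```python
import numpy as np

def apply_blendshape_weights(neutral, targets, weights):
    """Deform an avatar mesh by a weighted sum of blendshape offsets.

    neutral: (V, 3) vertices of the avatar's neutral face.
    targets: (K, V, 3) vertices of K expression shapes (e.g. jaw open, smile).
    weights: (K,) blendshape weights; clipped to [0, 1] here as an assumption.
    """
    neutral = np.asarray(neutral, dtype=float)
    targets = np.asarray(targets, dtype=float)
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    offsets = targets - neutral  # per-shape displacement from the neutral face
    return neutral + np.tensordot(w, offsets, axes=1)
```

Because the weights are independent of the mesh they drive, the same FE parameters could in principle animate any avatar that provides a matching set of expression shapes.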

The VR/AR module 448 may generate a 3D graphical representation of the user according to the FE parameters 424 (e.g., blendshapes), or based on a semantic mapping function between the transferred blendshape weights and the expression parameter space of the user's digital representation. The VR/AR module 448 may be part of, or operate in conjunction with, software modules that provide various facial-expression-based services such as social networking services, games, online shopping, video calls, and human-machine interfaces.

Although the facial expression detection module 410 and the application module 440 are shown in FIG. 4 as being implemented as software modules, these modules may also be implemented as integrated circuit (IC) components.

Facial expression detection process

FIG. 6 is a flowchart illustrating the overall process of using facial expressions in a user's digital representation, according to one embodiment. First, after the user puts on the HMD 102, calibration is performed (606). In one embodiment, an online calibration process is used to build a personalized tracking model for the user of the HMD 102. During calibration, the user holds a neutral facial pose for a predetermined time (e.g., a few seconds) while the 2D camera 104 and/or the depth camera 105 capture multiple depth images and 2D color or infrared images.

As part of the calibration process, the facial expression detection module 410 receives these images and applies a 3D volumetric model creation process to create a smoothed 3D volumetric face mesh of the lower half of the face with associated face color information. The process of creating a smoothed 3D volumetric face mesh is well known in the art (see, for example, Richard A. Newcombe et al., "KinectFusion: Real-time Dense Surface Mapping and Tracking," 2011 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011, which is incorporated by reference herein in its entirety). The facial expression detection module 410 also performs 2D landmark detection on the eye region images and the lower face images to locate the centers and outlines of the eyes, eye lines, eyebrow lines, lip lines, nose lines, and facial silhouette (e.g., chin and cheek lines). The facial expression detection module 410 averages the 2D landmarks over the multiple captured images to reduce noise artifacts in the 2D landmark detection. Using the 3D volumetric face mesh and the 2D facial landmark positions in the 2D images, the FE parameter generator 422 builds a personalized neutral model by (i) estimating the rigid pose of a template neutral model and then (ii) warping a linear principal component analysis (PCA) model of the neutral face to fit the volumetric mesh and the 2D landmarks.
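The landmark averaging used during calibration is simple enough to sketch directly. The array layout and the function name `average_landmarks` are assumptions for illustration. The sketch relies on the face staying still, as it does while the user holds the calibration pose.

```python
import numpy as np

def average_landmarks(per_frame_landmarks):
    """Average 2D landmarks over frames captured while the face is still.

    per_frame_landmarks: (F, N, 2) array-like, F frames of N (x, y) landmarks.
    Returns an (N, 2) array of mean positions, which suppresses per-frame
    detection jitter.
    """
    frames = np.asarray(per_frame_landmarks, dtype=float)
    if frames.ndim != 3 or frames.shape[2] != 2:
        raise ValueError("expected an array of shape (frames, landmarks, 2)")
    return frames.mean(axis=0)
```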

Specifically, the FE parameter generator 422 uses a linear PCA morphable version of the personalized neutral model M to represent the face mesh W in world coordinates using the following equation.

where w represents the linear weight vector for the morphable model, R is a rotation matrix, and t is a translation vector. The FE parameter generator 422 iteratively determines w and (R, t) by minimizing the following energy term.

where α, β, and γ represent weights for the fitting terms. C_pl is a fitting term that minimizes the point-to-plane error between the volumetric mesh V and the face mesh W, as defined by equation (4). C_m is a fitting term that minimizes the point-to-point error between the 2D facial feature landmarks of the mouth, nose, and silhouette and the corresponding vertices of the face mesh W, as defined by equation (5). C_r is a fitting term that minimizes the point-to-point error between the 2D facial feature landmarks of the right eye region and the corresponding vertices of the face mesh W, as defined by equation (6). C_l is a fitting term that minimizes the point-to-point error between the 2D facial feature landmarks of the left eye region and the corresponding vertices of the face mesh W, as defined by equation (7). C_pl is defined as follows.

where v_i is the i-th vertex of the face mesh W, each v_i is paired with its closest point on the volumetric mesh V, and n_i is the surface normal at v_i on the volumetric mesh V. C_m is defined as follows.

where u_j is the position of a tracked 2D facial feature and π_m(v_j) is the projection of the corresponding mesh vertex v_j into the camera space around the user's mouth, nose, and silhouette. C_r is defined as follows.

where u_j is the position of a tracked 2D facial feature and π_r(v_j) is the projection of the corresponding mesh vertex v_j into the camera space of the right eye region. C_l is defined as follows.

where u_j is the position of a tracked 2D facial feature and π_l(v_j) is the projection of the corresponding mesh vertex v_j into the camera space of the left eye region.
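The equations for the calibration fit do not appear in this text. The block below is a reconstruction from the surrounding definitions and should be read as a sketch, not as the patent's exact notation. The assumptions are: the numbers (2) and (3), inferred from the ordering before equation (4); the symbol \bar{v}_i for the closest point on the volumetric mesh V; the squared Euclidean norm in the point-to-point terms; and the placement of the three weights α, β, γ on the three landmark terms.

```latex
\begin{align}
W(w, R, t) &= R\,M(w) + t \tag{2} \\
\min_{w,\,R,\,t} \;\; & C_{pl} + \alpha\, C_{m} + \beta\, C_{r} + \gamma\, C_{l} \tag{3} \\
C_{pl} &= \sum_{i} \bigl[\, n_i \cdot (v_i - \bar{v}_i) \,\bigr]^2 \tag{4} \\
C_{m} &= \sum_{j} \bigl\| \pi_m(v_j) - u_j \bigr\|_2^2 \tag{5} \\
C_{r} &= \sum_{j} \bigl\| \pi_r(v_j) - u_j \bigr\|_2^2 \tag{6} \\
C_{l} &= \sum_{j} \bigl\| \pi_l(v_j) - u_j \bigr\|_2^2 \tag{7}
\end{align}
```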

After the personalized neutral mesh is built, a personalized facial expression model (blendshape model) is obtained by transferring the deformations from the expression blendshapes of a template face using deformation transfer, as is well known in the art. An example method of transferring deformations from expression blendshapes using deformation transfer is described, for example, in Robert W. Sumner et al., "Deformation transfer for triangle meshes," ACM Transactions on Graphics (TOG) 23.3 (2004), pp. 399-405. Alternatively, the personalized facial expression model can be obtained by applying a bilinear face model that encodes the span of facial geometry, identity, and facial expression into a multi-rank data tensor. An example method of applying a bilinear face model to build a personalized facial expression model is described, for example, in Chen Cao et al., "Displaced dynamic expression regression for real-time facial tracking and animation," ACM Transactions on Graphics (TOG) 33.4 (2014), which is incorporated by reference herein in its entirety.

If only one camera is used for both the right and left eyes, C_r and C_l are combined into a single equation.
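The fitting terms described above can be evaluated numerically as in the following sketch. It assumes a simple pinhole projection for π, squared distances for the point-to-point terms, and weights α, β, γ applied to the mouth, right-eye, and left-eye terms respectively. None of these choices, nor the function names, are confirmed by the text; they only show the shape of the computation.

```python
import numpy as np

def point_to_plane_energy(vertices, closest_points, normals):
    """Sum of squared distances from each vertex to the plane at its match."""
    diff = np.asarray(vertices, dtype=float) - np.asarray(closest_points, dtype=float)
    signed = np.einsum("ij,ij->i", np.asarray(normals, dtype=float), diff)
    return float(np.sum(signed ** 2))

def project_pinhole(vertices, focal_px, center):
    """Project camera-space 3D points to pixels (assumed pinhole model)."""
    v = np.asarray(vertices, dtype=float)
    return focal_px * v[:, :2] / v[:, 2:3] + np.asarray(center, dtype=float)

def landmark_energy(vertices, landmarks_2d, focal_px, center):
    """Sum of squared pixel distances between projected vertices and landmarks."""
    projected = project_pinhole(vertices, focal_px, center)
    residual = projected - np.asarray(landmarks_2d, dtype=float)
    return float(np.sum(residual ** 2))

def total_energy(c_pl, c_m, c_r, c_l, alpha, beta, gamma):
    """Weighted sum of the four fitting terms (weight placement is assumed)."""
    return c_pl + alpha * c_m + beta * c_r + gamma * c_l
```

With a single eye camera, the right-eye and left-eye terms would share one projection and could be merged into one call to `landmark_energy`.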

After calibration is performed, the user's facial expression is detected (610) by tracking and processing 2D images and depth images of the user's facial features, as described in detail below with reference to FIG. 7.

The detected facial expression is then applied (616) to the user's digital representation for incorporation into the user's graphical representation. The generated graphical representation may be displayed in virtual or augmented reality by the computing device 108 or by a remote computing device that communicates with the computing device 108 over a network (e.g., the Internet).

FIG. 7 is a flowchart illustrating the facial expression detection process, according to one embodiment. First, landmark positions 415 associated with the eye region are determined (710) from the 2D images, as described in detail above with reference to FIG. 4. Landmark positions associated with the lower facial features of the user are generated (720) by processing IR or RGB images and/or depth images from the 3D camera, as described in detail above with reference to FIG. 4.

The landmark positions (and optionally the 3D depth map data) are used to generate (730) the FE parameters 424 for the user's entire face. In one embodiment, the FE parameter generator 422 generates blendshape parameters as part of the FE parameters 424 to indicate expressions such as jaw open, smile, and puff based on the lower face landmark positions 419, while generating blendshape parameters as part of the FE parameters 424 to indicate eye open/close and eyebrow up/down based on the eye and eyebrow landmark positions 415.

To compute the FE parameters 424, the tracked landmark positions 415 are combined as input constraints, and the FE expression parameters are fitted based on these input constraints. The fitting operation may consist of two parts: (i) rigid stabilization and (ii) expression parameter tracking. The optimization may alternate between rigid stabilization and expression parameter tracking until both the rigid pose values and the parameter values converge.
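The alternation between the two parts can be sketched on a toy problem. The sketch below assumes known 3D point correspondences, a Kabsch least-squares alignment for the rigid step, and an unconstrained linear least-squares solve for the weights. The actual system fits 2D landmarks and a depth map instead, so this shows only the control flow. All function names are hypothetical.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def solve_weights(neutral, deltas, R, t, target):
    """Least-squares blendshape weights e for a fixed rigid pose (R, t)."""
    A = (deltas @ R.T).reshape(len(deltas), -1).T   # (3N, K) rotated offsets
    b = (target - (neutral @ R.T + t)).ravel()      # (3N,) unexplained residual
    e, *_ = np.linalg.lstsq(A, b, rcond=None)
    return e

def residual(neutral, deltas, e, R, t, target):
    """Sum of squared distances between the posed model and the target."""
    shape = neutral + np.tensordot(e, deltas, axes=1)
    return float(np.sum((shape @ R.T + t - target) ** 2))

def fit_pose_and_expression(neutral, deltas, target, max_iters=50, tol=1e-9):
    """Alternate (i) rigid alignment and (ii) weight solving until both settle."""
    e = np.zeros(len(deltas))
    R, t = np.eye(3), np.zeros(3)
    for _ in range(max_iters):
        shape = neutral + np.tensordot(e, deltas, axes=1)
        R_new, t_new = rigid_align(shape, target)                      # part (i)
        e_new = solve_weights(neutral, deltas, R_new, t_new, target)   # part (ii)
        pose_change = np.linalg.norm(R_new - R) + np.linalg.norm(t_new - t)
        weight_change = np.linalg.norm(e_new - e)
        R, t, e = R_new, t_new, e_new
        if pose_change < tol and weight_change < tol:
            break
    return e, R, t
```

Both steps minimize the same squared residual, so the residual cannot increase from one iteration to the next. A production tracker would add bounds on the weights and on the pose.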

In rigid stabilization, the rigid pose of the face is allowed to move relative to the cameras. When the user makes facial expressions, the cheek muscles can push the headset forward and upward, causing the rigid pose of the face relative to the cameras to change over time. If the pose were locked, this rigid motion would create artifacts in expression parameter tracking, and inaccurate parameter values could be obtained because of the unrelated rigid motion. Furthermore, when the user moves the head quickly, the headset may slide relative to the face even if it is firmly attached. Such incidents invalidate the assumption that the head pose is fixed and create artifacts in expression tracking. To accommodate shifts in the position of the head relative to the headset, rigid stabilization is performed to compute the pose of the head relative to the cameras rigidly fixed on the headset.

In one embodiment, a rigid iterative closest point (ICP) algorithm is used during the calibration stage to determine the initial rigid pose of the head. After switching to tracking mode, however, the initial rigid pose is used as an anchor that is allowed to be perturbed only within a limited range to accommodate movement of the head relative to the headset. Rigid ICP may also be performed on a few image frames at the start of tracking mode to determine the initial rigid pose of the head, taking into account the offset of the head pose from the HMD, especially after the user takes off the HMD and puts it back on. The rigid pose is then re-initialized by performing rigid ICP again. After the initial rigid pose is determined, rigid ICP may be performed with additional constraints so that the yaw, roll, and pitch of the rotation and the x, y, and z values of the translation do not deviate beyond given limits from the initial rigid pose, while using the tracked landmark positions and the input depth map as input constraints.
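The idea of keeping the tracked pose within limits of the anchor pose can be sketched as a simple clamp. This is a simplification. The text describes adding the constraints to the rigid ICP itself, whereas the sketch only projects an already estimated pose back into the allowed range. The parameter layout, the limit values, and the function name are assumptions, and angle wrap-around is ignored.

```python
import numpy as np

def clamp_pose_to_anchor(pose, anchor, max_angle_deg, max_shift):
    """Keep a 6-DoF pose within fixed limits of an anchor (initial) pose.

    pose, anchor: (yaw, roll, pitch, x, y, z); angles in degrees.
    Each component may differ from the anchor by at most max_angle_deg
    (rotations) or max_shift (translations).
    """
    pose = np.asarray(pose, dtype=float)
    anchor = np.asarray(anchor, dtype=float)
    limits = np.array([max_angle_deg] * 3 + [max_shift] * 3, dtype=float)
    return anchor + np.clip(pose - anchor, -limits, limits)
```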

To obtain the FE parameters 424, fitting is performed on the personalized tracking model based on the landmark positions 415 and 419 (and the depth map if a 3D camera is used). As in the calibration process, the capturing cameras are assumed to be rigidly fixed to the headset, and the parameters of their relative poses (i.e., rotation and translation) are assumed to be known. The FE parameter generator 422 may perform the fitting operation to obtain the FE expression parameters based on the landmark positions 415 and 419 (and the depth map if a 3D camera is used).

The personalized linear model is a set of facial expression shapes (e.g., smile and jaw open) derived from the personalized neutral model M. In one embodiment, the FE parameter generator 422 uses a personalized linear expression model (i.e., blendshape model) B to represent the face mesh W in world coordinates, as shown in equation (8), for performing the fitting optimization.

where e is the linear weight vector for the blendshape model, R is the rotation matrix, and t is the translation vector computed from the rigid stabilization step. The tracking process in the FE parameter generator 422 iteratively finds the optimal e by minimizing the following energy term.

where α, β, and γ represent weights for the fitting terms. C*_pl is a fitting term that minimizes the point-to-plane error between the depth map and the face mesh W, as defined by equation (10). C*_m is a fitting term that minimizes the point-to-point error between the 2D facial feature landmarks of the mouth, nose, and silhouette and the corresponding vertices of the face mesh W, as defined by equation (11). C*_r is a fitting term that minimizes the point-to-point error between the 2D facial feature landmarks of the right eye region and the corresponding vertices of the face mesh W, as defined by equation (12). C*_l is a fitting term that minimizes the point-to-point error between the 2D facial feature landmarks of the left eye region and the corresponding vertices of the face mesh W, as defined by equation (13). C*_pl is defined as follows.

where v_i is the i-th vertex of the face mesh W, p_i is the point on the depth map that has the same camera space coordinate as v_i, and n_i is the surface normal at p_i. C*_m is defined as follows.

where u_j is the position of a tracked 2D facial feature and π_m(v_j) is the projection of the corresponding mesh vertex v_j into the mouth camera space. C*_r is defined as follows.

where u_j is the position of a tracked 2D facial feature and π_r(v_j) is the projection of the corresponding mesh vertex v_j into the camera space of the right eye region. C*_l is defined as follows.

where u_j is the position of a tracked 2D facial feature and π_l(v_j) is the projection of the corresponding mesh vertex v_j into the camera space of the left eye region. If only one camera is used for both the right and left eyes, C_r and C_l are combined into a single equation.
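The tracking equations (8) to (13) also do not appear in this text. The following reconstruction is derived from the definitions above and mirrors the calibration terms. It is a sketch under assumptions, not the patent's exact notation. The assumptions are: the number (9) for the energy term, the squared Euclidean norm in the point-to-point terms, and the placement of the weights α, β, γ on the three landmark terms.

```latex
\begin{align}
W(e) &= R\,B(e) + t \tag{8} \\
\min_{e} \;\; & C^{*}_{pl} + \alpha\, C^{*}_{m} + \beta\, C^{*}_{r} + \gamma\, C^{*}_{l} \tag{9} \\
C^{*}_{pl} &= \sum_{i} \bigl[\, n_i \cdot (v_i - p_i) \,\bigr]^2 \tag{10} \\
C^{*}_{m} &= \sum_{j} \bigl\| \pi_m(v_j) - u_j \bigr\|_2^2 \tag{11} \\
C^{*}_{r} &= \sum_{j} \bigl\| \pi_r(v_j) - u_j \bigr\|_2^2 \tag{12} \\
C^{*}_{l} &= \sum_{j} \bigl\| \pi_l(v_j) - u_j \bigr\|_2^2 \tag{13}
\end{align}
```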

In one embodiment, the FE parameter generator 422 outputs the blendshape weight values represented by e in equation (8) as the FE parameters that are generated as a result of the tracking process and that collectively indicate the user's detected facial expression.

In one or more embodiments, the steps of FIG. 7 may be performed in real time to process each set of images 402 and 404 received from the cameras. Furthermore, the steps and the sequence of steps shown in FIG. 7 are merely illustrative. For example, step 710 of determining landmark positions and step 720 of determining 3D depth map data may be performed in reverse order or in parallel.
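Running steps 710 and 720 in parallel could look like the following sketch, which submits both trackers for one frame to a thread pool. The tracker functions are placeholders that return dummy values, and their names and return format are hypothetical. A real implementation would call the eye/eyebrow and lower face trackers here.

```python
from concurrent.futures import ThreadPoolExecutor

def track_eye_landmarks(image_2d):
    """Placeholder for step 710; returns dummy (name, x, y) landmarks."""
    return [("pupil_left", 0.0, 0.0)]

def track_lower_face_landmarks(image_lower):
    """Placeholder for step 720; returns dummy (name, x, y) landmarks."""
    return [("chin", 0.0, 0.0)]

def process_frame(image_2d, image_lower, executor):
    """Run both trackers concurrently and wait for both results."""
    eye_future = executor.submit(track_eye_landmarks, image_2d)
    lower_future = executor.submit(track_lower_face_landmarks, image_lower)
    return eye_future.result(), lower_future.result()
```

The two steps read different images and share no state, which is why their order does not matter.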

While particular embodiments and applications of the present invention have been illustrated and described herein, it is to be understood that the invention is not limited to the precise construction and components disclosed herein, and that various modifications, changes, and variations may be made in the arrangement, operation, and details of the methods and apparatuses of the present invention without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

A facial expression detection method, comprising:
capturing, by a first image capturing device on a main body of a head-mounted display, a first image of an upper portion of a user's face including the user's eye region;
capturing, by a second image capturing device on an extension member extending downward from the main body of the head-mounted display toward a lower portion of the user's face, a second image of the user including the lower portion of the user's face; and
processing the first image and the second image to extract facial expression parameters representing a facial expression of the user,
wherein processing the first image and the second image includes performing rigid stabilization to determine a relative pose of the user's face with respect to the first image capturing device and the second image capturing device.

The method of claim 1, wherein the first image capturing device comprises a pair of infrared cameras, and the second image capturing device comprises one of a depth camera, a color camera, an infrared camera, or two stereoscopic cameras.

The method of claim 1, wherein processing the first image and the second image comprises:
detecting, from the first image, landmark positions associated with the user's eyes and the user's eyebrows; and
detecting, from the second image, landmark positions associated with the lower portion of the user's face.

The method of claim 1, further comprising generating a graphical representation of the user by applying the extracted facial expression parameters to a digital representation of the user.

The method of claim 1, further comprising performing calibration by capturing and processing calibration images representing a neutral face of the user.

The method of claim 5, wherein performing the calibration comprises:
generating a personalized neutral mesh based on the calibration images; and
building a personalized tracking model by applying a deformation transfer technique to the personalized neutral mesh,
wherein the processing of the first image and the second image comprises fitting at least one blendshape model to landmark positions in the first image and the second image based on the personalized tracking model to obtain the facial expression parameters.

The method of claim 6, wherein the processing of the first image and the second image is performed in real time.

A head-mounted display unit, comprising:
a first image capturing device on a main body, configured to capture a first image showing an upper portion of a user's face including an eye region;
a second image capturing device configured to capture a second image showing a lower portion of the user's face, the second image capturing device being mounted on the main body or on an extension member extending downward from the main body;
a display device on the main body, configured to display to the user an image generated by performing rigid stabilization to determine a relative pose of the user's face with respect to the first image capturing device and the second image capturing device; and
the main body.

The head-mounted display unit of claim 8, wherein the first image capturing device comprises a pair of infrared cameras, and the second image capturing device comprises one of a depth camera, a color camera, an infrared camera, or two stereoscopic cameras.

The head-mounted display unit of claim 8, further comprising a slidable mount.

The head-mounted display unit of claim 8, wherein the display device comprises a first display and a second display, the first display being configured to display a left image toward the user's left eye and the second display being configured to display a right image toward the user's right eye.

The head-mounted display unit of claim 8, wherein the first image capturing device comprises a pair of cameras, each of the cameras being installed on an opposite side of the main body.

The head-mounted display unit of claim 8, wherein the second image capturing device is mounted directly on the main body.

The head-mounted display unit of claim 8, wherein the main body has a bulging upper portion that encloses the eye region.

The head-mounted display unit of claim 8, wherein the display device comprises a pair of separate display units, and the first image capturing device comprises two cameras between the pair of display units.

仮想現実又は拡張現実システムであって、
目領域を含むユーザの顔の上部の第１画像を撮影するように構成された第１画像撮影装置と、
前記第１画像撮影装置の下方位置に設けられ、前記ユーザの顔の下部の第２画像を撮影するように構成された第２画像撮影装置と、
画像を前記ユーザに表示するように構成された表示装置と、
前記第１画像撮影装置、及び前記表示装置を搭載するように構成された本体と、
前記本体から前記ユーザの顔の前記下部に向かって伸びた伸張部材と、
を備え、
前記第２画像撮影装置は前記本体又は前記伸張部材に搭載されるヘッドマウントディスプレイユニットと、
前記ヘッドマウントディスプレイユニットに通信可能に連結される演算装置と、
を備え、
前記演算装置は、
前記ヘッドマウントディスプレイユニットから前記第１画像及び前記第２画像を受信し、
剛性安定化に基づき個人化無表情メッシュを生成し、前記個人化無表情メッシュに変形転写技術を適用することで個人化追跡モデルを構築することにより、キャリブレーションを実施し、
前記個人化追跡モデルに基づき、前記第１画像及び前記第２画像内のランドマーク位置に少なくともブレンドシェイプモデルをフィッティングすることにより、前記第１画像及び前記第２画像を処理して、前記ユーザの表情を表す表情パラメータを得る仮想現実又は拡張現実システム。 A virtual reality or augmented reality system comprising:
a head mounted display unit comprising:
a first image capturing device configured to capture a first image of an upper portion of a user's face including an eye region;
a second image capturing device provided at a position below the first image capturing device and configured to capture a second image of a lower portion of the user's face;
a display device configured to display an image to the user;
a main body configured to mount the first image capturing device and the display device; and
an extension member extending from the main body toward the lower portion of the user's face,
wherein the second image capturing device is mounted on the main body or the extension member; and
a computing device communicatively coupled to the head mounted display unit,
wherein the computing device:
receives the first image and the second image from the head mounted display unit;
performs calibration by generating a personalized neutral face mesh based on rigid stabilization and building a personalized tracking model by applying a deformation transfer technique to the personalized neutral face mesh; and
processes the first image and the second image, by fitting at least a blendshape model to landmark positions in the first image and the second image based on the personalized tracking model, to obtain facial expression parameters representing a facial expression of the user.
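
As a purely illustrative aid to the fitting step recited above, and not the claimed implementation, the following minimal sketch fits blendshape weights to observed landmark positions. It assumes a linear blendshape model in which each landmark coordinate equals a neutral coordinate plus a weighted sum of per-blendshape offsets, with weights limited to the range 0 to 1. The function name `fit_blendshape_weights`, the flat coordinate layout, the sample blendshapes, and the use of projected gradient descent as the solver are all assumptions introduced for this example.

```python
def fit_blendshape_weights(neutral, deltas, observed, steps=200):
    """Fit weights w in [0, 1] so that neutral + sum_k w[k] * deltas[k]
    approximates the observed landmark coordinates (least squares).

    neutral, observed: flat lists [x0, y0, x1, y1, ...] of landmark coordinates.
    deltas: list of K flat lists, each the offset of one blendshape from neutral.
    """
    k = len(deltas)
    # Step size 1 / (sum of squared offsets) is a safe bound for convergence.
    scale = sum(d * d for row in deltas for d in row) or 1.0
    lr = 1.0 / scale
    w = [0.0] * k
    for _ in range(steps):
        # Residual between the model prediction and the observed landmarks.
        res = [
            n + sum(w[j] * deltas[j][i] for j in range(k)) - o
            for i, (n, o) in enumerate(zip(neutral, observed))
        ]
        # Gradient of 0.5 * ||res||^2 with respect to each weight.
        grad = [sum(d * r for d, r in zip(deltas[j], res)) for j in range(k)]
        # Gradient step followed by projection onto the [0, 1] box.
        w = [min(1.0, max(0.0, wj - lr * g)) for wj, g in zip(w, grad)]
    return w


# Hypothetical data: three 2D landmarks (two brow points, one chin point).
neutral = [0.0, 0.0, 1.0, 0.0, 0.5, -1.0]
jaw_open = [0.0, 0.0, 0.0, 0.0, 0.0, -0.5]   # moves the chin point down
brow_raise = [0.0, 0.3, 0.0, 0.3, 0.0, 0.0]  # moves both brow points up
# Landmarks as they would appear with jaw_open = 0.6 and brow_raise = 0.25.
observed = [0.0, 0.075, 1.0, 0.075, 0.5, -1.3]

weights = fit_blendshape_weights(neutral, [jaw_open, brow_raise], observed)
```

In this sketch the returned weights play the role of the facial expression parameters. In a system of the kind recited, the neutral coordinates and offsets would come from the personalized tracking model, and the observed coordinates from landmarks detected in the first and second images.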

前記演算装置は、
前記第１画像から、前記ユーザの目及び前記ユーザの眉に関連付けられたランドマーク位置を検出し、
前記第２画像から、前記ユーザの顔の前記下部に関連付けられたランドマーク位置を検出するように構成される請求項１６に記載の仮想現実又は拡張現実システム。 The virtual reality or augmented reality system according to claim 16, wherein the computing device is configured to:
detect, from the first image, landmark positions associated with the user's eyes and the user's eyebrows; and
detect, from the second image, landmark positions associated with the lower portion of the user's face.