JP7460450B2

JP7460450B2 - Gaze estimation system, gaze estimation method, gaze estimation program, learning data generation device, and gaze estimation device

Info

Publication number: JP7460450B2
Application number: JP2020098172A
Authority: JP
Inventors: 勇氣 ▲高▼橋
Original assignee: Yazaki Corp
Current assignee: Yazaki Corp
Priority date: 2020-06-05
Filing date: 2020-06-05
Publication date: 2024-04-02
Anticipated expiration: 2040-06-05
Also published as: JP2021190041A

Description

The present invention relates to a gaze estimation system, a gaze estimation method, a gaze estimation program, a learning data generation device, and a gaze estimation device.

Conventionally, as a gaze estimation system, Patent Document 1, for example, describes an information processing device that includes an image acquisition unit that acquires an image including a person's face, an image extraction unit that extracts a partial image including the person's eyes from the acquired image, and an estimation unit that acquires gaze information indicating the person's gaze direction from a trained learner, which has undergone machine learning for estimating the gaze direction, by inputting the partial image into that learner.

Patent Document 1: JP2019-28843A

However, the information processing device described in Patent Document 1 has room for further improvement, for example, in suppressing a decrease in gaze estimation accuracy.

The present invention has been made in view of the above, and its object is to provide a gaze estimation system, a gaze estimation method, a gaze estimation program, and a gaze estimation device that can properly estimate a gaze, as well as a learning data generation device that can support proper gaze estimation.

To solve the above problem and achieve the object, a gaze estimation system according to the present invention includes: a learning data generation device that generates a learning dataset used for machine learning of a trained model that estimates a gaze from an input face image of an estimation target; a model generation unit that generates the trained model by machine learning using a plurality of the learning datasets generated by the learning data generation device; an estimation target input unit that inputs the input face image of the estimation target; and a gaze estimation unit that estimates a gaze from the input face image input by the estimation target input unit, using the trained model generated by the model generation unit. The learning data generation device includes: a learning imaging unit that images the face of a learning subject; a gaze detector that is disposed between the learning imaging unit and the learning subject and detects the gaze of the learning subject; an image detection unit that detects the image of the gaze detector in face image data that is captured by the learning imaging unit and includes the image of the gaze detector and a face image of the learning subject; an image replacement unit that replaces, in the face image data, pixel values of an image region including the image of the gaze detector detected by the image detection unit with predetermined pixel values; and an association processing unit that generates the learning dataset by associating the replaced face image data produced by the image replacement unit with gaze data representing the gaze of the learning subject detected by the gaze detector.
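The image replacement described above can be pictured with a minimal sketch. It assumes a grayscale image held as a list of rows and a rectangular detector region; the names `replace_region`, `detector_box`, and the fill value 255 are illustrative assumptions, not taken from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]          # assumed: grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # assumed: (top, left, bottom, right), end-exclusive


def replace_region(image: Image, box: Box, fill: int) -> Image:
    """Return a copy of `image` with every pixel inside `box` set to `fill`."""
    top, left, bottom, right = box
    replaced = [row[:] for row in image]  # copy so the captured data stays intact
    for y in range(max(top, 0), min(bottom, len(replaced))):
        row = replaced[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill
    return replaced


# Hypothetical 4x4 capture in which the value 99 marks the gaze detector.
captured = [
    [10, 10, 10, 10],
    [10, 99, 99, 10],
    [10, 99, 99, 10],
    [10, 10, 10, 10],
]
detector_box = (1, 1, 3, 3)  # region assumed to come from the image detection unit
replaced = replace_region(captured, detector_box, fill=255)

# Associate the replaced image with gaze data (hypothetical two-value gaze).
learning_dataset = {"face_image": replaced, "gaze": (0.1, -0.2)}
```

Copying the image before overwriting is a design choice of this sketch only; the source does not say whether the original face image data is retained.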

In the above gaze estimation system, it is preferable that the learning data generation device includes a face determination unit that determines whether the replaced face image data includes the face image of the learning subject. When the face determination unit determines that the replaced face image data includes the face image of the learning subject, the association processing unit generates the learning dataset associating the replaced face image data with the gaze data. When the face determination unit determines that the replaced face image data does not include the face image of the learning subject, the association processing unit does not generate that learning dataset.

In the above gaze estimation system, it is preferable that the learning data generation device includes a blink determination unit that determines, based on the gaze data detected by the gaze detector, whether the learning subject is blinking. When the blink determination unit determines that the learning subject is not blinking, the association processing unit generates the learning dataset associating the replaced face image data with the gaze data. When the blink determination unit determines that the learning subject is blinking, the association processing unit does not generate that learning dataset.
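The face determination and blink determination conditions can be sketched as a simple gate in front of the association step. The source presents them as two separately preferable features; combining them in one function, the name `associate`, and the two-value gaze tuple are assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Tuple

Gaze = Tuple[float, float]  # hypothetical gaze representation


def associate(replaced_image: Any, gaze: Gaze,
              face_found: bool, blinking: bool) -> Optional[Dict[str, Any]]:
    """Return a learning dataset, or None when the frame must be skipped."""
    if not face_found or blinking:
        return None  # no dataset is generated for this frame
    return {"face_image": replaced_image, "gaze": gaze}


# Hypothetical frames: (replaced image, gaze data, face found?, blinking?)
frames = [
    ("img0", (0.1, 0.0), True, False),   # kept
    ("img1", (0.2, 0.1), False, False),  # skipped: no face after replacement
    ("img2", (0.0, 0.0), True, True),    # skipped: blink detected
]
datasets = [d for d in (associate(*f) for f in frames) if d is not None]
```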

In the above gaze estimation system, it is preferable that the predetermined pixel value is a pixel value of a color different from the eye color of the learning subject.

A gaze estimation method according to the present invention includes: a learning data generation step of generating a learning dataset used for machine learning of a trained model that estimates a gaze from an input face image of an estimation target; a model generation step of generating the trained model by machine learning using a plurality of the learning datasets generated in the learning data generation step; an estimation target input step of inputting the input face image of the estimation target; and a gaze estimation step of estimating a gaze from the input face image input in the estimation target input step, using the trained model generated in the model generation step. The learning data generation step includes: an imaging step of imaging the face of a learning subject with a learning imaging unit; a gaze detection step of detecting the gaze of the learning subject with a gaze detector disposed between the learning imaging unit and the learning subject; an image detection step of detecting the image of the gaze detector in face image data that is captured in the imaging step and includes the image of the gaze detector and a face image of the learning subject; an image replacement step of replacing, in the face image data, pixel values of an image region including the image of the gaze detector detected in the image detection step with predetermined pixel values; and an association processing step of generating the learning dataset by associating the replaced face image data produced in the image replacement step with gaze data representing the gaze of the learning subject detected in the gaze detection step.

A gaze estimation program according to the present invention causes a computer to execute processes of: generating a learning dataset used for machine learning of a trained model that estimates a gaze from an input face image of an estimation target; generating the trained model by machine learning using a plurality of the generated learning datasets; inputting the input face image of the estimation target; and estimating a gaze from the input face image of the estimation target using the trained model. When generating the learning dataset, the program images the face of a learning subject with a learning imaging unit; detects the gaze of the learning subject with a gaze detector disposed between the learning imaging unit and the learning subject; detects the image of the gaze detector in face image data that is captured by the learning imaging unit and includes the image of the gaze detector and a face image of the learning subject; replaces, in the face image data, pixel values of an image region including the image of the gaze detector with predetermined pixel values; and generates the learning dataset by associating the replaced face image data with gaze data representing the gaze of the learning subject.

A learning data generation device according to the present invention includes: a learning imaging unit that images the face of a learning subject; a gaze detector that is disposed between the learning imaging unit and the learning subject and detects the gaze of the learning subject; an image detection unit that detects the image of the gaze detector in face image data that is captured by the learning imaging unit and includes the image of the gaze detector and a face image of the learning subject; an image replacement unit that replaces, in the face image data, pixel values of an image region including the image of the gaze detector detected by the image detection unit with predetermined pixel values; and an association processing unit that generates a learning dataset by associating the replaced face image data produced by the image replacement unit with gaze data representing the gaze of the learning subject detected by the gaze detector.

A gaze estimation device according to the present invention includes: a driver imaging unit that images the face of a driver of a vehicle; an estimation target input unit that inputs the face image of the driver captured by the driver imaging unit; and a gaze estimation unit that estimates the gaze of the driver from the face image of the driver input by the estimation target input unit, using a trained model that has been machine-learned with a plurality of learning datasets generated by a learning data generation device. The learning data generation device includes: a learning imaging unit that images the face of a learning subject; a gaze detector that is disposed between the learning imaging unit and the learning subject and detects the gaze of the learning subject; an image detection unit that detects the image of the gaze detector in face image data that is captured by the learning imaging unit and includes the image of the gaze detector and a face image of the learning subject; an image replacement unit that replaces, in the face image data, pixel values of an image region including the image of the gaze detector detected by the image detection unit with predetermined pixel values; and an association processing unit that generates the learning dataset by associating the replaced face image data produced by the image replacement unit with gaze data representing the gaze of the learning subject detected by the gaze detector.

The gaze estimation system, gaze estimation method, gaze estimation program, and gaze estimation device according to the present invention generate a learning dataset in which the pixel values of the image region of the face image data that includes the image of the gaze detector are replaced with predetermined pixel values, and use that learning dataset to generate a trained model that estimates a gaze from an input face image of an estimation target. As a result, the gaze estimation system, gaze estimation method, gaze estimation program, and gaze estimation device can properly estimate a gaze using this trained model.

FIG. 1 is a block diagram showing a configuration example of a gaze estimation system according to an embodiment.
FIG. 2 is a schematic diagram showing the processing of the learning phase and the use phase performed by the processing circuit of the gaze estimation system according to the embodiment.
FIG. 3 is a block diagram showing a configuration example of the learning data generation unit according to the embodiment.
FIG. 4 is a timing chart showing an operation example of the synchronization signal generation device according to the embodiment.
FIG. 5 is a diagram showing a detection example of the right eyeball camera of the gaze detector according to the embodiment.
FIG. 6 is a diagram showing a detection example of the left eyeball camera of the gaze detector according to the embodiment.
FIG. 7 is a diagram showing an image replacement example for each eyeball camera of the gaze detector according to the embodiment.
FIG. 8 is a diagram showing replaced face image data that the face determination unit according to the embodiment has determined to be a face.
FIG. 9 is a diagram showing replaced face image data that the face determination unit according to the embodiment has not determined to be a face.
FIG. 10 is a diagram showing a configuration example of a learning dataset according to the embodiment.
FIG. 11 is a flowchart showing the processing procedure of the gaze estimation method in the gaze estimation system according to the embodiment.
FIG. 12 is a flowchart showing an operation example of the learning data generation device according to the embodiment.
FIG. 13 is a block diagram showing a configuration example of a gaze estimation system according to a modification of the embodiment.
FIG. 14 is a schematic diagram showing an application example of a gaze estimation device according to a modification of the embodiment.

Modes for carrying out the present invention (embodiments) will be described in detail with reference to the drawings. The present invention is not limited by the contents described in the following embodiments. The constituent elements described below include those that a person skilled in the art could easily conceive and those that are substantially the same. Furthermore, the configurations described below can be combined as appropriate, and various omissions, substitutions, or changes in the configuration can be made without departing from the gist of the present invention.

[Embodiment]
The learning data generation device 100 according to the embodiment will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of a gaze estimation system 1 according to the embodiment. FIG. 2 is a schematic diagram showing the processing of the learning phase and the use phase performed by the processing circuit 40 of the gaze estimation system 1 according to the embodiment. FIG. 3 is a block diagram showing a configuration example of the learning data generation device 100 according to the embodiment. FIG. 4 is a timing chart showing an operation example of the synchronization signal generation device 50 according to the embodiment. FIG. 5 is a diagram showing a detection example of the right eyeball camera 62R of the gaze detector 60 according to the embodiment. FIG. 6 is a diagram showing a detection example of the left eyeball camera 62L of the gaze detector 60 according to the embodiment. FIG. 7 is a diagram showing an image replacement example for each of the eyeball cameras 62R and 62L of the gaze detector 60 according to the embodiment. FIG. 8 is a diagram showing replaced face image data D1a that the face determination unit 443 according to the embodiment has determined to be a face. FIG. 9 is a diagram showing replaced face image data D1a that the face determination unit 443 according to the embodiment has not determined to be a face. FIG. 10 is a diagram showing a configuration example of a learning dataset D3 according to the embodiment.

The gaze estimation system 1 of the present embodiment shown in FIG. 1 is a system that estimates a gaze. As shown in FIG. 2, the gaze estimation system 1 has a learning phase, in which it generates a trained model M for estimating a gaze, and a use phase, in which it estimates a gaze using the trained model M. The gaze estimation system 1 is realized by various computer devices. Each component of the gaze estimation system 1 is described in detail below with reference to FIGS. 1 and 2.

The gaze estimation system 1 is mounted on a vehicle, for example, and includes an input device 10, an output device 20, a storage circuit 30, a processing circuit 40, a synchronization signal generation device 50, a gaze detector 60, and a camera 70 that serves as both the learning imaging unit and the driver imaging unit. The input device 10, the output device 20, the storage circuit 30, the processing circuit 40, the synchronization signal generation device 50, the gaze detector 60, and the camera 70 are communicably connected to one another via a network. Here, the input device 10, the storage circuit 30, the synchronization signal generation device 50, the gaze detector 60, the camera 70, and a part of the processing circuit 40 (a learning data generation unit 41 described later) constitute the learning data generation device 100. The input device 10, the storage circuit 30, and a part of the processing circuit 40 (a model generation unit 42 described later) constitute a model generation device 200. The input device 10, the output device 20, the storage circuit 30, the camera 70, and a part of the processing circuit 40 (an estimation target input unit 43, a gaze estimation unit 44, and an output unit 45 described later) constitute a gaze estimation device 300. The learning data generation device 100, the model generation device 200, and the gaze estimation device 300 may be configured, for example, as a single system in which all of them are mounted on the same vehicle, or as a distributed system in which each is placed at a different location. In the description of the configuration shown in FIG. 1, as an example, the learning data generation device 100, the model generation device 200, and the gaze estimation device 300 are treated as a single system mounted on the same vehicle. In the description of the configuration shown in FIG. 13, given later, they are treated as a distributed system in which each is placed at a different location.
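The way the three devices reuse the same components can be written out as a small mapping. This is only a bookkeeping sketch of the composition listed above; the identifier strings are invented labels for the numbered components.

```python
# Components that make up each device, as listed in the description of FIG. 1.
DEVICES = {
    "learning_data_generation_device_100": {
        "input_device_10", "storage_circuit_30", "sync_signal_generator_50",
        "gaze_detector_60", "camera_70", "learning_data_generation_unit_41",
    },
    "model_generation_device_200": {
        "input_device_10", "storage_circuit_30", "model_generation_unit_42",
    },
    "gaze_estimation_device_300": {
        "input_device_10", "output_device_20", "storage_circuit_30", "camera_70",
        "estimation_target_input_unit_43", "gaze_estimation_unit_44",
        "output_unit_45",
    },
}

# Components that all three devices share.
shared_by_all = set.intersection(*DEVICES.values())

# Components shared by the learning data generation and gaze estimation devices.
shared_100_300 = (DEVICES["learning_data_generation_device_100"]
                  & DEVICES["gaze_estimation_device_300"])
```

In this mapping the input device 10 and the storage circuit 30 are common to all three devices, while the camera 70 is common to the learning data generation device 100 and the gaze estimation device 300.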

The input device 10 is a device that performs various inputs to the gaze estimation system 1. The input device 10 is realized by, for example, an operation input device that accepts various operation inputs from a user, or a data input device that accepts data (information) input from other devices outside the gaze estimation system 1. The operation input device is realized by, for example, a mouse, a keyboard, a trackball, a switch, a button, a joystick, a touch pad, a touch screen, a non-contact input circuit, or a voice input circuit. The data input device is realized by, for example, a communication interface that sends and receives various data to and from other devices via wired or wireless communication, or a recording medium interface that reads various data from recording media such as a flexible disk (FD), a magneto-optical disk, a CD-ROM, a DVD, a USB memory, an SD card memory, or a flash memory. Here, the input device 10 also serves as the input unit of the learning data generation device 100, the model generation device 200, and the gaze estimation device 300.

The output device 20 is a device that performs various outputs from the gaze estimation system 1. The output device 20 is realized by, for example, a display that outputs and displays various image information, a speaker that outputs sound information, or a data output device that outputs data (information) to other devices outside the gaze estimation system 1. The data output device is realized by, for example, a communication interface that sends and receives various data to and from other devices via wired or wireless communication, or a recording medium interface that writes various data to recording media of the kinds listed above. The data input device and the data output device may share part or all of their configuration.

The storage circuit 30 is a circuit that stores various data. The storage circuit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, a hard disk, or an optical disk. The storage circuit 30 stores, for example, programs with which the gaze estimation system 1 realizes its various functions. The programs stored in the storage circuit 30 include a program that operates the input device 10, a program that operates the output device 20, and a program that operates the processing circuit 40. The storage circuit 30 also stores various data such as data required for the processing in the processing circuit 40, the learning dataset D3 used for training the trained model M, the trained model M, and estimation result data D5 output via the output device 20. These data are read from the storage circuit 30 by the processing circuit 40 and other components as needed. The storage circuit 30 may be realized by a cloud server or the like connected to the gaze estimation system 1 via a network. Here, the storage circuit 30 also serves as the storage unit of the learning data generation device 100, the model generation device 200, and the gaze estimation device 300.

The processing circuit 40 is a circuit that realizes various processing functions in the gaze estimation system 1. The processing circuit 40 is realized by, for example, a processor. A processor here means a circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array). The processing circuit 40 realizes each processing function by, for example, executing a program read from the storage circuit 30.

The synchronization signal generation device 50 has a function of generating a synchronization signal. The gaze detector 60 has a function of detecting the gaze of the learning subject OB. The camera 70 has a function of imaging the entire face of the learning subject OB, who is the subject when machine learning is performed. The camera 70 also has a function of imaging the entire face of the driver of the vehicle. Details of the synchronization signal generation device 50, the gaze detector 60, and the camera 70 will be described later.

The overall configuration of the gaze estimation system 1 according to the present embodiment has been outlined above. Under this configuration, the processing circuit 40 according to the present embodiment has, in the learning phase, functions for performing the various processes that generate the trained model M for estimating a gaze. In the use phase, the processing circuit 40 has functions for performing the various processes that estimate a gaze using the trained model M.

To realize the various processing functions described above, the processing circuit 40 of the present embodiment functionally and conceptually includes a learning data generation unit 41, a model generation unit 42, an estimation target input unit 43, a gaze estimation unit 44, and an output unit 45. The processing circuit 40 realizes the processing functions of the learning data generation unit 41, the model generation unit 42, the estimation target input unit 43, the gaze estimation unit 44, and the output unit 45 by, for example, executing a program read from the storage circuit 30.
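The division of the processing circuit 40 into five functional units can be sketched as five functions chained across the learning phase and the use phase. Everything below is an illustrative assumption: the function names mirror the unit names, the gaze is a hypothetical two-value tuple, and the "model" is a deliberately trivial stand-in that returns the mean gaze of the training data, not the machine learning described in the embodiment.

```python
from statistics import mean


def learning_data_generation_unit(frames):
    """Unit 41 (sketch): pair each replaced face image with its gaze data."""
    return [{"face_image": image, "gaze": gaze} for image, gaze in frames]


def model_generation_unit(datasets):
    """Unit 42 (sketch): build a stand-in model that predicts the mean gaze."""
    g0 = mean(d["gaze"][0] for d in datasets)
    g1 = mean(d["gaze"][1] for d in datasets)
    return lambda face_image: (g0, g1)


def estimation_target_input_unit(face_image):
    """Unit 43 (sketch): accept the face image of the estimation target."""
    return face_image


def gaze_estimation_unit(model, face_image):
    """Unit 44 (sketch): apply the trained model to the input face image."""
    return model(face_image)


def output_unit(gaze):
    """Unit 45 (sketch): package the estimation result for output."""
    return {"estimated_gaze": gaze}


# Learning phase with two hypothetical frames.
frames = [([[0]], (0.0, 1.0)), ([[1]], (2.0, 3.0))]
model = model_generation_unit(learning_data_generation_unit(frames))

# Use phase with a hypothetical input face image.
target = estimation_target_input_unit([[5]])
result = output_unit(gaze_estimation_unit(model, target))
```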

学習用データ生成部４１は、学習フェーズにおいて、推定対象の入力顔画像から視線を推定する学習済みモデルＭを機械学習させる際に用いられる学習用データセットＤ３（図３、図１０参照）を生成する機能を有する部分である。本実施形態の学習用データ生成部４１は、例えば、後述する顔画像データＤ１ａと視線データＤ２とからなる学習用データセットＤ３を生成する処理を実行可能である。この学習用データセットＤ３は、学習済みモデルＭを機械学習によって生成する際に用いられる教師データである。学習用データ生成部４１は、生成した複数の学習用データセットＤ３を記憶回路３０に記憶させる。学習用データ生成部４１の処理の詳細については、後述する。 The learning data generation unit 41 is a part that has a function of generating, in the learning phase, the learning dataset D3 (see FIGS. 3 and 10) used for machine learning of the trained model M, which estimates the line of sight from an input face image to be estimated. The learning data generation unit 41 of this embodiment can execute, for example, a process of generating a learning dataset D3 consisting of face image data D1a and gaze data D2, which will be described later. This learning dataset D3 is teacher data used when generating the trained model M by machine learning. The learning data generation unit 41 stores the plurality of generated learning datasets D3 in the storage circuit 30. Details of the processing of the learning data generation unit 41 will be described later.

モデル生成部４２は、学習フェーズにおいて、複数の学習用データセットＤ３を用いて、学習済みモデルＭを機械学習により生成する処理を実行可能な機能を有する部分である。本実施形態のモデル生成部４２は、学習用データ生成部４１によって生成された複数の学習用データセットＤ３を用いて、学習済みモデルＭを機械学習により生成する処理を実行可能である。 The model generation unit 42 is a part that has a function capable of executing a process of generating a learned model M by machine learning using a plurality of learning data sets D3 in the learning phase. The model generation unit 42 of this embodiment can execute a process of generating a learned model M by machine learning using the plurality of learning data sets D3 generated by the learning data generation unit 41.

モデル生成部４２は、複数の学習用データセットＤ３を教師データとして、種々の機械学習アルゴリズムＡＬに基づく機械学習を行うことによって、学習済みモデルＭを生成する。使用する機械学習アルゴリズムＡＬとしては、例えば、畳み込みニューラルネットワーク（ＣＮＮ；ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋ）がある。畳み込みニューラルネットワークは、パターン認識方法を多層化したＤＮＮ（ＤｅｅｐＮｅｕｒａｌＮｅｔｗｏｒｋ）のうち、２次元データに対応させたものであり、画像に対して高いパターン認識能力を有している。モデル生成部４２は、畳み込みニューラルネットワークによる機械学習の結果として、顔画像から視線を推定するための学習済みモデルＭを生成する。 The model generation unit 42 generates a trained model M by performing machine learning based on various machine learning algorithms AL, using the plurality of learning datasets D3 as teacher data. One example of the machine learning algorithm AL is a convolutional neural network (CNN). Among DNNs (deep neural networks), which are multi-layered pattern recognition methods, a convolutional neural network is one adapted to two-dimensional data, and it has a high pattern recognition capability for images. The model generation unit 42 generates a trained model M for estimating the line of sight from a face image as a result of machine learning using the convolutional neural network.

モデル生成部４２によって生成された学習済みモデルＭは、入力を顔画像とし、出力を視線の推定を定量化した値としたモデルである。すなわち、学習済みモデルＭは、顔画像の入力を受け付けて当該顔画像から視線の予測を定量化した値を出力するように機能付けられる。より詳しくは、学習済みモデルＭは、畳み込みニューラルネットワークの入力層に入力された顔画像に対して、畳み込み層、プーリング層、全結合層等により所定の演算を行い、出力層から視線の推定を定量化した値（例えば視線角度）を出力するようにコンピュータを機能させる。モデル生成部４２は、上記のようにして生成した学習済みモデルＭを記憶回路３０に記憶させる。 The trained model M generated by the model generation unit 42 is a model in which the input is a face image and the output is a quantified value of the gaze estimation. That is, the trained model M is functionalized to accept the input of a face image and output a quantified value of the gaze prediction from the face image. More specifically, the trained model M causes the computer to perform a predetermined calculation using a convolutional layer, a pooling layer, a fully connected layer, etc. on the face image input to the input layer of the convolutional neural network, and to output a quantified value of the gaze estimation (e.g., gaze angle) from the output layer. The model generation unit 42 stores the trained model M generated as described above in the memory circuitry 30.

推定対象入力部４３は、使用フェーズにおいて、推定対象となる入力顔画像データＤ４を入力する処理を実行可能な機能を有する部分である。推定対象入力部４３は、例えば、車両を運転する運転者の顔を撮像するカメラ７０から出力される入力顔画像データＤ４を入力する。 The estimation target input unit 43 is a part that has a function of inputting the input face image data D4 to be estimated in the use phase. The estimation target input unit 43 receives, for example, input face image data D4 output from a camera 70 that captures an image of the face of a driver driving a vehicle.

視線推定部４４は、使用フェーズにおいて、学習済みモデルＭを用いて、入力顔画像データＤ４から視線を推定する処理を実行可能な機能を有する部分である。本実施形態の視線推定部４４は、モデル生成部４２によって生成された学習済みモデルＭを用いて、推定対象入力部４３によって入力された推定対象となる入力顔画像データＤ４から視線を推定する処理を実行可能である。 The gaze estimation unit 44 is a part that has a function capable of executing, in the use phase, a process of estimating the line of sight from the input face image data D4 using the trained model M. The gaze estimation unit 44 of this embodiment can execute a process of estimating the line of sight from the input face image data D4 to be estimated, which is input by the estimation target input unit 43, using the trained model M generated by the model generation unit 42.

視線推定部４４は、モデル生成部４２によって生成された学習済みモデルＭに対して、推定対象入力部４３によって入力された入力顔画像データＤ４を入力データとして入力し、これに応じて当該学習済みモデルＭから視線の推定を定量化した値（例えば視線角度）を出力させる。視線推定部４４は、出力された視線の推定を定量化した値（例えば視線角度）を、推定結果データ（出力データ）Ｄ５として記憶回路３０に記憶させる。 The gaze estimation unit 44 inputs the input face image data D4 input by the estimation target input unit 43 as input data to the trained model M generated by the model generation unit 42, and outputs a quantified value of the gaze estimate (e.g., gaze angle) from the trained model M in response to this. The gaze estimation unit 44 stores the quantified value of the output gaze estimate (e.g., gaze angle) in the memory circuit 30 as estimation result data (output data) D5.

出力部４５は、視線推定部４４による視線の推定結果に基づいて出力を行う処理を実行可能な機能を有する部分である。本実施形態の出力部４５は、視線推定部４４によって推定された推定結果データＤ５に基づいて出力機器２０を介して出力する処理を実行可能である。出力部４５は、例えば、推定結果データＤ５の視線角度に基づいて運転者が脇見運転をしているか否かを判定し、運転者が脇見運転をしていると判定した場合には運転者に警告するための警告データを出力機器２０に出力し、運転者が脇見運転をしていないと判定した場合には警告データを出力機器２０に出力しない。出力部４５は、運転者に警告する場合、例えば、出力機器２０を構成するディスプレイを介して画像情報として警告データを表示してもよいし、出力機器２０を構成するスピーカを介して音情報として警告データを音声出力してもよい。 The output unit 45 is a part that has a function capable of executing a process of producing output based on the gaze estimation result of the gaze estimation unit 44. The output unit 45 of this embodiment can execute a process of outputting via the output device 20 based on the estimation result data D5 estimated by the gaze estimation unit 44. For example, the output unit 45 determines whether or not the driver is driving inattentively based on the gaze angle in the estimation result data D5. If it determines that the driver is driving inattentively, it outputs warning data for warning the driver to the output device 20; if it determines that the driver is not driving inattentively, it does not output the warning data to the output device 20. When warning the driver, the output unit 45 may, for example, display the warning data as image information via a display that constitutes the output device 20, or output the warning data as sound information via a speaker that constitutes the output device 20.
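The decision made by the output unit 45 can be illustrated with a minimal Python sketch. The description does not state how inattentive driving is judged from the gaze angle, so the fixed threshold `threshold_deg`, the function names, and the shape of the warning data below are all assumptions made for illustration only.

```python
from typing import Optional


def is_inattentive(gaze_angle_deg: float, threshold_deg: float = 30.0) -> bool:
    """Return True when the absolute gaze angle exceeds the threshold.

    The 30-degree default is a hypothetical value, not taken from the source.
    """
    return abs(gaze_angle_deg) > threshold_deg


def build_warning(gaze_angle_deg: float, threshold_deg: float = 30.0) -> Optional[dict]:
    """Return warning data for the output device, or None when no warning is issued."""
    if is_inattentive(gaze_angle_deg, threshold_deg):
        # Hypothetical payload; a real system would drive a display or a speaker.
        return {"kind": "inattentive_driving", "gaze_angle_deg": gaze_angle_deg}
    return None
```

With these assumptions, `build_warning(15.3)` yields no warning data, while an angle of -45.0 degrees yields a warning record that could be shown on a display or played through a speaker.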

次に、学習用データ生成装置１００について詳細に説明する。学習用データ生成装置１００は、上述したように、学習フェーズにおいて、学習済みモデルＭを学習させるための学習用データを生成する機能を有する装置である。学習用データ生成装置１００は、図３に示すように、同期信号生成装置５０と、視線検出器６０と、カメラ７０と、学習用データ生成部４１とを含んで構成される。 Next, the training data generation device 100 will be described in detail. As described above, the training data generation device 100 is a device that has a function of generating training data for training the trained model M in the training phase. As shown in FIG. 3, the training data generation device 100 includes a synchronization signal generation device 50, a gaze detector 60, a camera 70, and a training data generation unit 41.

同期信号生成装置５０は、同期信号を生成する処理を実行可能な機能を有するものである。同期信号生成装置５０は、視線検出器６０及びカメラ７０に接続され、当該同期信号生成装置５０に付属するスイッチがＯＮされると、視線検出器６０及びカメラ７０に同期信号を出力する。同期信号生成装置５０は、例えば、図４に示すように、カメラ７０に対して予め定められたタイミングで同期信号Ｓｇ１を出力し、視線検出器６０に対して予め定められたタイミングで同期信号Ｓｇ２を出力する。なお、同期信号Ｓｇ１の出力間隔は、カメラ７０の性能に応じて適宜定められ、同期信号Ｓｇ２の出力間隔は、視線検出器６０の性能に応じて適宜定められる。 The synchronization signal generation device 50 has a function capable of executing a process of generating a synchronization signal. The synchronization signal generation device 50 is connected to the gaze detector 60 and the camera 70, and outputs synchronization signals to the gaze detector 60 and the camera 70 when a switch attached to the synchronization signal generation device 50 is turned on. For example, as shown in FIG. 4, the synchronization signal generation device 50 outputs a synchronization signal Sg1 to the camera 70 at a predetermined timing, and outputs a synchronization signal Sg2 to the gaze detector 60 at a predetermined timing. Note that the output interval of the synchronization signal Sg1 is determined as appropriate according to the performance of the camera 70, and the output interval of the synchronization signal Sg2 is determined as appropriate according to the performance of the gaze detector 60.

視線検出器６０は、学習対象者ＯＢの視線を検出する処理を実行可能な機能を有するものである。視線検出器６０は、学習対象者ＯＢの頭部に装着する装着型の検出器であり、例えば、株式会社ナックイメージテクノロジー製のＥＭＲ－９（帽子型）を採用することができる。視線検出器６０は、図５等に示すように、視野カメラ６１と、右側の眼球カメラ６２Ｒと、左側の眼球カメラ６２Ｌとを備え、これらの視野カメラ６１、眼球カメラ６２Ｒ、及び、眼球カメラ６２Ｌが帽子に取り付けられている。学習対象者ＯＢは、帽子を頭部に被ることで、視野カメラ６１、眼球カメラ６２Ｒ、及び、眼球カメラ６２Ｌがカメラ７０と学習対象者ＯＢとの間に配置される。これにより、視野カメラ６１、眼球カメラ６２Ｒ、及び、眼球カメラ６２Ｌが学習対象者ＯＢの顔の正面、つまり学習対象者ＯＢの顔（眼を除く部分）と重なった位置に配置される。視野カメラ６１は、学習対象者ＯＢが帽子を頭部に被った状態で、学習対象者ＯＢの頭部の額に配置され、学習対象者ＯＢの前方の景色を撮像する。眼球カメラ６２Ｒは、学習対象者ＯＢが帽子を頭部に被った状態で、学習対象者ＯＢの右眼の下、つまり右側の頬に配置され、学習対象者ＯＢの右眼を撮像する。眼球カメラ６２Ｌは、学習対象者ＯＢが帽子を頭部に被った状態で、学習対象者ＯＢの左眼の下、つまり左側の頬に配置され、学習対象者ＯＢの左眼を撮像する。 The gaze detector 60 has a function capable of executing a process of detecting the line of sight of the learning subject OB. The gaze detector 60 is a wearable detector worn on the head of the learning subject OB; for example, the EMR-9 (cap type) manufactured by NAC Image Technology Inc. can be adopted. As shown in FIG. 5 and other figures, the gaze detector 60 includes a visual field camera 61, a right eyeball camera 62R, and a left eyeball camera 62L, and the visual field camera 61, the eyeball camera 62R, and the eyeball camera 62L are attached to a cap. When the learning subject OB wears the cap on the head, the visual field camera 61, the eyeball camera 62R, and the eyeball camera 62L are positioned between the camera 70 and the learning subject OB. As a result, the visual field camera 61, the eyeball camera 62R, and the eyeball camera 62L are positioned in front of the face of the learning subject OB, that is, at positions overlapping the face (excluding the eyes) of the learning subject OB. With the learning subject OB wearing the cap, the visual field camera 61 is positioned at the forehead of the learning subject OB and images the scenery in front of the learning subject OB. 
The eye camera 62R is placed under the right eye of the learning subject OB, that is, on the right cheek, with the learning subject OB wearing a hat on his head, and images the right eye of the learning subject OB. The eye camera 62L is placed under the left eye of the learning subject OB, that is, on the left cheek, with the learning subject OB wearing a hat on his head, and images the left eye of the learning subject OB.

そして、視線検出器６０は、視野カメラ６１により撮像された視野画像、眼球カメラ６２Ｒにより撮像された学習対象者ＯＢの右眼画像、及び、眼球カメラ６２Ｌにより撮像された学習対象者ＯＢの左眼画像に基づいて、視野画像における学習対象者ＯＢの視線位置を検出する。つまり、視線検出器６０は、視野画像のＸＹ座標軸上において、学習対象者ＯＢの視線のＸ座標及びＹ座標を検出する。言い換えれば、視線検出器６０は、視野画像のＸＹ座標軸上において、実際に学習対象者ＯＢが視ている位置のＸ座標及びＹ座標を検出する。視線検出器６０は、検出した視野画像における学習対象者ＯＢの視線位置を表す視線データＤ２を記憶回路３０に保存する。視線検出器６０は、例えば、同期信号生成装置５０から出力された同期信号Ｓｇ２に基づいて学習対象者ＯＢの視線位置を検出し、検出した学習対象者ＯＢの視線位置を表す視線データＤ２及び当該視線データＤ２を検出したタイミングを表す同期信号Ｓｇ２を記憶回路３０に保存する。視線検出器６０は、学習用データ生成部４１に接続され、保存した視線データＤ２及び同期信号Ｓｇ２を学習用データ生成部４１に出力する。なお、視線検出器６０は、学習対象者ＯＢの視線を検出する際に学習対象者ＯＢの瞬きを検出し、瞬きの検出結果も視線データＤ２に含めて学習用データ生成部４１に出力してもよい。 Then, the gaze detector 60 detects the gaze position of the learner OB in the visual field image based on the visual field image captured by the visual field camera 61, the right eye image of the learner OB captured by the eyeball camera 62R, and the left eye image of the learner OB captured by the eyeball camera 62L. That is, the gaze detector 60 detects the X and Y coordinates of the gaze of the learner OB on the XY coordinate axis of the visual field image. In other words, the gaze detector 60 detects the X and Y coordinates of the position where the learner OB is actually looking on the XY coordinate axis of the visual field image. The gaze detector 60 stores the gaze data D2 representing the gaze position of the learner OB in the detected visual field image in the memory circuit 30. The gaze detector 60 detects the gaze position of the learner OB based on, for example, the synchronization signal Sg2 output from the synchronization signal generating device 50, and stores the gaze data D2 representing the detected gaze position of the learner OB and the synchronization signal Sg2 representing the timing of detecting the gaze data D2 in the memory circuit 30. The gaze detector 60 is connected to the learning data generation unit 41 and outputs the stored gaze data D2 and synchronization signal Sg2 to the learning data generation unit 41. 
Note that the gaze detector 60 may detect the blinking of the learning subject OB when detecting the gaze of the learning subject OB, and may output the blink detection result to the learning data generation unit 41 together with the gaze data D2.

カメラ７０は、学習用の撮像部であると共に推定画像入力用の撮像部であり、兼用されている。つまり、カメラ７０は、学習用データ生成装置１００の学習用の撮像部と、視線推定装置３００の推定画像入力用の撮像部として兼用される。なお、学習用データ生成装置１００の学習用の撮像部と、視線推定装置３００の推定画像入力用の撮像部とを別々のカメラとして設けてもよい。 Camera 70 is used both as an imaging unit for learning and as an imaging unit for inputting an estimated image. In other words, camera 70 is used both as an imaging unit for learning of learning data generation device 100 and as an imaging unit for inputting an estimated image of gaze estimation device 300. Note that the imaging unit for learning of learning data generation device 100 and the imaging unit for inputting an estimated image of gaze estimation device 300 may be provided as separate cameras.

カメラ７０は、学習用の撮像部として機能する場合、機械学習を行う際の対象者である学習対象者ＯＢの顔全体を撮像する。カメラ７０は、学習対象者ＯＢと一定の間隔を空けた状態で当該学習対象者ＯＢの顔の前方に配置されている。カメラ７０は、図５、図６に示すように、学習対象者ＯＢの顔の正面に配置された視線検出器６０と共に学習対象者ＯＢの顔全体を撮像する。つまり、カメラ７０により撮像された顔画像データＤ１には、視線検出器６０の画像及び学習対象者ＯＢの顔画像が含まれている。カメラ７０は、撮像した視線検出器６０の画像及び学習対象者ＯＢの顔画像を含む顔画像データＤ１を記憶回路３０に保存する。カメラ７０は、例えば、同期信号生成装置５０から出力された同期信号Ｓｇ１に基づいて学習対象者ＯＢの顔を撮像し、撮像した視線検出器６０の画像及び学習対象者ＯＢの顔画像を含む顔画像データＤ１、並びに、当該顔画像データＤ１を撮像したタイミングを表す同期信号Ｓｇ１を記憶回路３０に保存する。カメラ７０は、学習用データ生成部４１に接続され、保存した顔画像データＤ１及び同期信号Ｓｇ１を学習用データ生成部４１に出力する。 When the camera 70 functions as an imaging unit for learning, it captures the entire face of the learner OB, who is the subject of machine learning. The camera 70 is arranged in front of the face of the learner OB with a certain distance between the learner OB and the camera 70. As shown in FIG. 5 and FIG. 6, the camera 70 captures the entire face of the learner OB together with the gaze detector 60 arranged in front of the face of the learner OB. That is, the facial image data D1 captured by the camera 70 includes an image of the gaze detector 60 and a facial image of the learner OB. The camera 70 stores the facial image data D1 including the captured image of the gaze detector 60 and the facial image of the learner OB in the memory circuit 30. For example, the camera 70 captures the face of the learner OB based on the synchronization signal Sg1 output from the synchronization signal generating device 50, and stores the facial image data D1 including the captured image of the gaze detector 60 and the facial image of the learner OB, as well as the synchronization signal Sg1 indicating the timing of capturing the facial image data D1 in the memory circuit 30. The camera 70 is connected to the learning data generation unit 41 and outputs the stored facial image data D1 and synchronization signal Sg1 to the learning data generation unit 41.

カメラ７０は、推定画像入力用の撮像部として機能する場合、車両の運転者の顔全体を撮像する。カメラ７０は、その撮像範囲が運転者のアイリプス（アイボックス）をカバーする範囲に設定されている。ここで、アイリプスとは、運転席に着座した運転者の視点位置が位置することが想定される範囲を表すものである。カメラ７０は、撮像した運転者の顔画像を表す入力顔画像データＤ４を処理回路４０の推定対象入力部４３に出力する。 When the camera 70 functions as an imaging unit for inputting an estimated image, it captures the entire face of the driver of the vehicle. The imaging range of the camera 70 is set to cover the driver's eyellipse (eye box). Here, the eyellipse represents the range in which the viewpoint position of a driver seated in the driver's seat is expected to be located. The camera 70 outputs input face image data D4 representing the captured face image of the driver to the estimation target input unit 43 of the processing circuit 40.

学習用データ生成部４１は、機械学習を行う際に用いられる学習用データセットＤ３を生成する処理を実行可能な機能を有するものである。学習用データ生成部４１は、眼球カメラ検出部４４１と、画像置換部４４２と、顔判定部４４３と、ラベル付与部４４４と、瞬き判定部４４５と、ラベル付与部４４６と、視線角度演算部４４７と、対応付け処理部４４８とを備える。 The learning data generation unit 41 has a function capable of executing a process of generating a learning dataset D3 used when performing machine learning. The learning data generation unit 41 includes an eyeball camera detection unit 441, an image replacement unit 442, a face determination unit 443, a label assignment unit 444, a blink determination unit 445, a label assignment unit 446, a gaze angle calculation unit 447, and a matching processing unit 448.

眼球カメラ検出部４４１は、視線検出器６０の画像及び学習対象者ＯＢの顔画像を含む顔画像データＤ１において、視線検出器６０の画像を検出する処理を実行可能な機能を有する部分である。眼球カメラ検出部４４１は、例えば、顔画像データＤ１において、眼球カメラ６２Ｒの画像、及び、眼球カメラ６２Ｒを帽子に固定するためのアームの画像を検出する。具体的には、眼球カメラ検出部４４１は、テンプレート画像を用いたテンプレートマッチング等の周知の画像認識技術により、眼球カメラ６２Ｒの画像及び眼球カメラ６２Ｒ固定用のアームの画像を検出する。眼球カメラ検出部４４１は、テンプレート画像として、例えば、眼球カメラ６２Ｒの画像及び眼球カメラ６２Ｒ固定用のアームの画像を表す第１画像Ｕ１と、当該第１画像Ｕ１を反転させた第２画像Ｕ２とを用いる。第２画像Ｕ２は、眼球カメラ６２Ｌの画像及び眼球カメラ６２Ｌ固定用のアームを表す画像である。 The eyeball camera detection unit 441 is a part having a function capable of executing a process of detecting an image of the gaze detector 60 in the facial image data D1 including the image of the gaze detector 60 and the facial image of the learning subject OB. For example, the eyeball camera detection unit 441 detects an image of the eyeball camera 62R and an image of an arm for fixing the eyeball camera 62R to a hat in the facial image data D1. Specifically, the eyeball camera detection unit 441 detects an image of the eyeball camera 62R and an image of an arm for fixing the eyeball camera 62R by a well-known image recognition technique such as template matching using a template image. The eyeball camera detection unit 441 uses, for example, a first image U1 representing an image of the eyeball camera 62R and an image of the arm for fixing the eyeball camera 62R, and a second image U2 obtained by inverting the first image U1, as the template image. The second image U2 is an image representing an image of the eyeball camera 62L and an arm for fixing the eyeball camera 62L.

眼球カメラ検出部４４１は、顔画像データＤ１において、第１画像Ｕ１に基づいてテンプレートマッチングを行い、眼球カメラ６２Ｒの画像及び眼球カメラ６２Ｒ固定用のアームの画像を検出する。また、眼球カメラ検出部４４１は、顔画像データＤ１において、第２画像Ｕ２に基づいてテンプレートマッチングを行い、眼球カメラ６２Ｌの画像及び眼球カメラ６２Ｌ固定用のアームの画像を検出する。眼球カメラ検出部４４１は、画像置換部４４２に接続され、検出した検出結果を画像置換部４４２に出力する。 The eyeball camera detection unit 441 performs template matching on the face image data D1 based on the first image U1, and detects the image of the eyeball camera 62R and the image of the arm for fixing the eyeball camera 62R. Further, the eyeball camera detection unit 441 performs template matching on the face image data D1 based on the second image U2, and detects the image of the eyeball camera 62L and the image of the arm for fixing the eyeball camera 62L. The eyeball camera detection section 441 is connected to the image replacement section 442 and outputs the detected detection result to the image replacement section 442.
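The two-template search described above can be sketched in plain Python. This is only an illustration under stated assumptions: images are small 2D lists of grayscale values, the matching score is a sum of absolute differences (the source only says "template matching"), and the names `mirror` and `match_template` are hypothetical.

```python
def mirror(template):
    """Flip a 2D grayscale template left to right (first image U1 -> second image U2)."""
    return [row[::-1] for row in template]


def match_template(image, template):
    """Return ((row, col), score) of the best match.

    The score is the sum of absolute differences, so lower is better and
    0 means an exact match. Ties keep the first position found.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_score = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                abs(image[r + dr][c + dc] - template[dr][dc])
                for dr in range(th)
                for dc in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

Searching once with U1 and once with `mirror(U1)` gives one candidate position for the right eyeball camera and one for the left, which is the information the image replacement unit needs. A practical system would more likely use an image library and a normalized score; that choice is outside what the source specifies.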

画像置換部４４２は、画像を置換する処理を実行可能な機能を有する部分である。画像置換部４４２は、例えば、眼球カメラ検出部４４１により検出された検出結果に基づいて、顔画像データＤ１における特定の画像領域Ｑを塗りつぶすことで画像置換する。画像置換部４４２は、例えば、図７に示すように、顔画像データＤ１において、眼球カメラ検出部４４１により検出された視線検出器６０の画像を含む画像領域Ｑの画素値を、予め定められた画素値である置換画素値に置き換える。このとき、画像置換部４４２は、顔画像データＤ１における画像領域Ｑの画素値を、眼の色とは異なる色の置換画素値に置き換えることが好ましい。具体的には、画像置換部４４２は、顔画像データＤ１において、眼球カメラ検出部４４１により検出された眼球カメラ６２Ｒの画像及び眼球カメラ６２Ｒ固定用のアームの画像、及び、眼球カメラ６２Ｌの画像及び眼球カメラ６２Ｌ固定用のアームの画像を含む矩形状の画像領域Ｑの画素値を、置換画素値に置き換え、置換後の顔画像データＤ１ａを生成する。これにより、置換後の顔画像データＤ１ａには、眼球カメラ６２Ｒの画像及び眼球カメラ６２Ｒ固定用のアームの画像、及び、眼球カメラ６２Ｌの画像及び眼球カメラ６２Ｌ固定用のアームの画像が含まれなくなる。なお、眼は、例えば、瞳孔及び虹彩から構成される。眼の色とは異なる色は、顔画像データＤ１において、顔の皮膚の色の平均的な色とすることが考えられる。画像置換部４４２は、対象の顔画像データＤ１ごとにそれぞれ異なる置換画素値を設定することが可能である。画像置換部４４２は、顔判定部４４３に接続され、置換後の顔画像データＤ１ａを顔判定部４４３に出力する。 The image replacement unit 442 is a part that has a function capable of executing a process of replacing an image. The image replacement unit 442 performs image replacement by, for example, filling in a specific image area Q in the face image data D1 based on the detection result of the eyeball camera detection unit 441. For example, as shown in FIG. 7, the image replacement unit 442 replaces the pixel values of the image area Q, which includes the image of the gaze detector 60 detected by the eyeball camera detection unit 441 in the face image data D1, with a replacement pixel value, which is a predetermined pixel value. At this time, it is preferable that the image replacement unit 442 replaces the pixel values of the image area Q in the face image data D1 with a replacement pixel value of a color different from the eye color. Specifically, in the face image data D1, the image replacement unit 442 replaces the pixel values of the rectangular image area Q, which includes the image of the eyeball camera 62R and the image of the arm for fixing the eyeball camera 62R as well as the image of the eyeball camera 62L and the image of the arm for fixing the eyeball camera 62L, all detected by the eyeball camera detection unit 441, with the replacement pixel value, and generates face image data D1a after replacement. As a result, the face image data D1a after replacement no longer includes the image of the eyeball camera 62R, the image of the arm for fixing the eyeball camera 62R, the image of the eyeball camera 62L, or the image of the arm for fixing the eyeball camera 62L. Note that the eye is composed of, for example, a pupil and an iris. The color different from the eye color may be, for example, the average skin color of the face in the face image data D1. The image replacement unit 442 can set a different replacement pixel value for each target face image data D1. The image replacement unit 442 is connected to the face determination unit 443 and outputs the replaced face image data D1a to the face determination unit 443.
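The replacement of the rectangular image area Q can be sketched as follows. This is a simplified illustration, not the patented implementation: the image is a 2D list of grayscale values, and when no replacement value is given the sketch uses the mean of the pixels outside Q as a rough stand-in for the "average skin color", which is an assumption. The function name `fill_region` and its parameters are hypothetical.

```python
def fill_region(image, top, left, height, width, value=None):
    """Return a copy of a 2D grayscale image with rectangle Q set to one value.

    If value is None, the mean of the pixels outside Q is used (an assumed
    proxy for the average skin color). Q must be smaller than the image.
    """
    rows = range(top, top + height)
    cols = range(left, left + width)
    if value is None:
        outside = [
            px
            for r, row in enumerate(image)
            for c, px in enumerate(row)
            if not (r in rows and c in cols)
        ]
        value = round(sum(outside) / len(outside))
    # Build a new image so the original face image data D1 stays untouched.
    return [
        [value if (r in rows and c in cols) else px for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]
```

Because the value is computed per image when it is not supplied, each face image can receive its own replacement pixel value, in line with the statement that different values can be set for each face image data D1.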

顔判定部４４３は、顔の判定を行う処理を実行可能な機能を有する部分である。顔判定部４４３は、例えば、Ｖｉｏｌａ－Ｊｏｎｅｓ法等の周知の顔判定アルゴリズムにより顔の判定を行う。顔判定部４４３は、この顔判定アルゴリズムを使用して、置換後の顔画像データＤ１ａにおいて、学習対象者ＯＢの顔画像が含まれるか否かを判定する。顔判定部４４３は、例えば、図８に示すように、画像置換部４４２により眼球カメラ６２Ｒ、６２Ｌ等を含む画像領域Ｑの画素値が置換画素値に置き換えられている場合、学習対象者ＯＢの顔画像が含まれると判定する。一方で、顔判定部４４３は、例えば、図９に示すように、画像置換部４４２により眼球カメラ６２Ｒ、６２Ｌ等を含む画像領域Ｑの画素値が置換画素値に置き換えられていない場合、学習対象者ＯＢの顔画像が含まれていないと判定する。顔判定部４４３は、ラベル付与部４４４に接続され、判定結果をラベル付与部４４４に出力する。 The face determination unit 443 is a part having a function capable of executing a process for determining a face. The face determination unit 443 performs face determination using a known face determination algorithm such as the Viola-Jones method. The face determination unit 443 uses this face determination algorithm to determine whether or not the face image of the learning subject OB is included in the face image data D1a after replacement. For example, as shown in FIG. 8, the face determination unit 443 determines that the face image of the learning subject OB is included when the pixel values of the image area Q including the eyeball cameras 62R, 62L, etc. are replaced with replacement pixel values by the image replacement unit 442. On the other hand, for example, as shown in FIG. 9, the face determination unit 443 determines that the face image of the learning subject OB is not included when the pixel values of the image area Q including the eyeball cameras 62R, 62L, etc. are not replaced with replacement pixel values by the image replacement unit 442. The face determination unit 443 is connected to the label assignment unit 444 and outputs the determination result to the label assignment unit 444.

ラベル付与部４４４は、顔判定部４４３により判定された判定結果に基づいて、置換後の顔画像データＤ１ａに対して顔判定のラベルを付与する処理を実行可能な機能を有する部分である。ラベル付与部４４４は、例えば、顔判定部４４３により判定された判定結果が、学習対象者ＯＢの顔画像が含まれていないことを表す場合、置換後の顔画像データＤ１ａに対して顔判定の不可を表すラベル（例えば「１」）を付与する。一方で、ラベル付与部４４４は、顔判定部４４３により判定された判定結果が、学習対象者ＯＢの顔画像が含まれていることを表す場合、置換後の顔画像データＤ１ａに対して顔判定の可能を表すラベル（例えば「０」）を付与する。ラベル付与部４４４は、対応付け処理部４４８に接続され、置換後の顔画像データＤ１ａ及びラベル付与情報を対応付け処理部４４８に出力する。 The labeling unit 444 is a part having a function capable of executing a process of assigning a face determination label to the replaced face image data D1a based on the determination result determined by the face determination unit 443. For example, when the determination result determined by the face determination unit 443 indicates that the face image of the learning subject OB is not included, the labeling unit 444 assigns a label (e.g., "1") indicating that face determination is impossible to the replaced face image data D1a. On the other hand, when the determination result determined by the face determination unit 443 indicates that the face image of the learning subject OB is included, the labeling unit 444 assigns a label (e.g., "0") indicating that face determination is possible to the replaced face image data D1a. The labeling unit 444 is connected to the association processing unit 448 and outputs the replaced face image data D1a and the label assignment information to the association processing unit 448.

瞬き判定部４４５は、視線検出器６０により検出された視線データＤ２に基づいて学習対象者ＯＢの瞬きを判定する処理を実行可能な機能を有する部分である。瞬き判定部４４５は、例えば、視線検出器６０により検出された視線データＤ２に含まれる学習対象者ＯＢの瞬きの検出結果に基づいて学習対象者ＯＢの瞬きを判定する。瞬き判定部４４５は、例えば、視線データＤ２が学習対象者ＯＢの瞬きを表す場合、学習対象者ＯＢが瞬きをしていると判定する。一方で、瞬き判定部４４５は、視線データＤ２が学習対象者ＯＢの瞬きを表わさない場合、学習対象者ＯＢが瞬きをしていないと判定する。瞬き判定部４４５は、ラベル付与部４４６に接続され、学習対象者ＯＢの瞬きを判定した判定結果をラベル付与部４４６に出力する。 The blink determination unit 445 is a part having a function capable of executing a process of determining whether the learner OB has blinked based on the gaze data D2 detected by the gaze detector 60. The blink determination unit 445 determines whether the learner OB has blinked based on the detection result of the blink of the learner OB contained in the gaze data D2 detected by the gaze detector 60, for example. For example, if the gaze data D2 represents the blink of the learner OB, the blink determination unit 445 determines that the learner OB is blinking. On the other hand, if the gaze data D2 does not represent the blink of the learner OB, the blink determination unit 445 determines that the learner OB is not blinking. The blink determination unit 445 is connected to the label assignment unit 446 and outputs the determination result of the blink of the learner OB to the label assignment unit 446.

ラベル付与部４４６は、瞬き判定部４４５により判定された判定結果に基づいて、置換後の顔画像データＤ１ａに対して瞬き判定のラベルを付与する処理を実行可能な機能を有する部分である。ラベル付与部４４６は、顔画像データＤ１ａを撮像したタイミングを表す同期信号Ｓｇ１と、視線データＤ２を検出したタイミングを表す同期信号Ｓｇ２とに基づいて、顔画像データＤ１ａに対して瞬き判定のラベルを付与する。ラベル付与部４４６は、例えば、視線データＤ２を検出したタイミングと同じタイミングで撮像した顔画像データＤ１ａに対して瞬き判定のラベルを付与する。 The labeling unit 446 is a part that has a function capable of executing a process of assigning a blink determination label to the replaced face image data D1a based on the determination result of the blink determination unit 445. The labeling unit 446 assigns the blink determination label to the face image data D1a based on the synchronization signal Sg1, which represents the timing at which the face image data D1a was captured, and the synchronization signal Sg2, which represents the timing at which the gaze data D2 was detected. For example, the labeling unit 446 assigns the blink determination label to the face image data D1a captured at the same timing as the timing at which the gaze data D2 was detected.
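One way to picture the timing-based pairing is the Python sketch below. It assumes that each stored synchronization signal can be reduced to an integer tick and that "the same timing" means equal ticks; the source does not define the stored format, so the tick representation and the name `pair_by_sync` are hypothetical.

```python
def pair_by_sync(frames, gaze_samples):
    """Pair face images with gaze samples recorded at the same sync tick.

    frames: list of (tick, image_id) recorded with synchronization signal Sg1.
    gaze_samples: list of (tick, sample) recorded with synchronization signal Sg2.
    Returns a list of (image_id, sample) for ticks present in both lists.
    """
    gaze_by_tick = {tick: sample for tick, sample in gaze_samples}
    return [
        (image_id, gaze_by_tick[tick])
        for tick, image_id in frames
        if tick in gaze_by_tick
    ]
```

Since the two signals may have different output intervals, frames without a gaze sample at the same tick are simply left unpaired in this sketch.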

ラベル付与部４４６は、例えば、瞬き判定部４４５により判定された判定結果が、学習対象者ＯＢが瞬きをしたことを表す場合、置換後の顔画像データＤ１ａに対して瞬きをしたことを表すラベル（例えば「１」）を付与する。一方で、ラベル付与部４４６は、瞬き判定部４４５により判定された判定結果が、学習対象者ＯＢが瞬きをしていないことを表す場合、置換後の顔画像データＤ１ａに対して瞬きをしていないことを表すラベル（例えば「０」）を付与する。ラベル付与部４４６は、視線角度演算部４４７に接続され、ラベル付与情報を視線角度演算部４４７に出力する。 For example, when the judgment result determined by the blink judgment unit 445 indicates that the learning subject OB has blinked, the label assignment unit 446 assigns a label (e.g., "1") indicating that the learning subject OB has blinked to the replaced face image data D1a. On the other hand, when the judgment result determined by the blink judgment unit 445 indicates that the learning subject OB has not blinked, the label assignment unit 446 assigns a label (e.g., "0") indicating that the learning subject OB has not blinked to the replaced face image data D1a. The label assignment unit 446 is connected to the gaze angle calculation unit 447 and outputs label assignment information to the gaze angle calculation unit 447.

視線角度演算部４４７は、視野画像における学習対象者ＯＢの視線位置を表す視線データＤ２に基づいて視線角度を演算する処理を実行可能な機能を有する部分である。視線角度演算部４４７は、瞬きをしていることを表すラベル（例えば「１」）が視線データＤ２に付与されている場合、視線データＤ２に対して視線角度を演算しない。一方で、視線角度演算部４４７は、瞬きをしていないことを表すラベル（例えば「０」）が視線データＤ２に付与されている場合、視線データＤ２に基づいて視線角度を演算する。ここで、視線データＤ２には、視野画像のＸＹ座標軸上において、学習対象者ＯＢの視線位置を表すＸ座標及びＹ座標が記録されている。視線角度演算部４４７は、この視野画像において、学習対象者ＯＢの視線位置を表すＸ座標及びＹ座標に基づいて視線角度を演算する。視線角度演算部４４７は、対応付け処理部４４８に接続され、演算した視線角度を対応付け処理部４４８に出力する。 The line-of-sight angle calculation unit 447 is a part that has a function of calculating a line-of-sight angle based on the line-of-sight data D2 representing the line-of-sight position of the learning subject OB in the visual field image. The line-of-sight angle calculation unit 447 does not calculate the line-of-sight angle for the line-of-sight data D2 when a label (for example, "1") representing blinking is given to the line-of-sight data D2. On the other hand, when the line-of-sight data D2 is given a label indicating that the person is not blinking (for example, "0"), the line-of-sight angle calculation unit 447 calculates the line-of-sight angle based on the line-of-sight data D2. Here, the line-of-sight data D2 records the X and Y coordinates representing the line-of-sight position of the learning subject OB on the XY coordinate axes of the visual field image. The line-of-sight angle calculation unit 447 calculates the line-of-sight angle in this visual field image based on the X and Y coordinates representing the line-of-sight position of the learning subject OB. The line-of-sight angle calculation unit 447 is connected to the association processing unit 448 and outputs the calculated line-of-sight angle to the association processing unit 448.
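The description does not give the formula that converts the X and Y coordinates into a gaze angle. The sketch below shows one common possibility, assuming a pinhole model for the visual field camera with a known image center (`cx`, `cy`) and a focal length in pixels (`focal_px`). All of these parameters, and the function name `gaze_angles`, are assumptions introduced for illustration.

```python
import math


def gaze_angles(x, y, cx, cy, focal_px, blink_label=0):
    """Return (horizontal, vertical) gaze angles in degrees, or None on a blink.

    Assumes a pinhole camera: the angle is the arctangent of the offset from
    the image center divided by the focal length in pixels. Image Y grows
    downward, so the vertical offset is cy - y (upward is positive).
    """
    if blink_label == 1:
        # No angle is calculated for samples labeled as blinking.
        return None
    horizontal = math.degrees(math.atan2(x - cx, focal_px))
    vertical = math.degrees(math.atan2(cy - y, focal_px))
    return horizontal, vertical
```

Under these assumptions a gaze point at the image center maps to (0.0, 0.0), and a point offset horizontally by one focal length maps to about 45 degrees.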

対応付け処理部４４８は、各種データを対応づけて学習用データセットＤ３を生成する処理を実行可能な機能を有する部分である。対応付け処理部４４８は、画像置換部４４２により置き換えられた置換後の顔画像データＤ１ａと、視線検出器６０により検出された視線データＤ２の視線角度とを対応付けた学習用データセットＤ３を生成する。 The matching processing unit 448 is a part that has a function capable of executing a process of matching various data to generate a learning data set D3. The matching processing unit 448 generates a learning data set D3 that matches the face image data D1a after replacement by the image replacement unit 442 with the gaze angle of the gaze data D2 detected by the gaze detector 60.

対応付け処理部４４８は、例えば、顔判定部４４３により置換後の顔画像データＤ１に学習対象者ＯＢの顔画像が含まれると判定された場合、つまり顔判定のラベル付与情報が「０」の場合、置換後の顔画像データＤ１ａと視線データＤ２の視線角度とを対応付けた学習用データセットＤ３を生成する。例えば、図１０に示す学習用データセットＤ３には、顔判定のラベル付与情報「０（採用）」が記録されることにより、置換後の顔画像データＤ１ａを表す「０００１.ｊｐｇ」とその視線角度を表す「１５．３」とが対応付けられる。 For example, when the face determination unit 443 determines that the replaced face image data D1 contains the face image of the learning subject OB, that is, when the labeling information for face determination is "0", the association processing unit 448 generates a learning dataset D3 that associates the replaced face image data D1a with the gaze angle of the gaze data D2. For example, in the learning dataset D3 shown in FIG. 10, the labeling information for face determination "0 (adopted)" is recorded, thereby associating "0001.jpg", which represents the replaced face image data D1a, with "15.3", which represents the gaze angle.

一方で、対応付け処理部４４８は、顔判定部４４３により置換後の顔画像データＤ１に学習対象者ＯＢの顔画像が含まれないと判定された場合、つまり顔判定のラベル付与情報が「１」の場合、置換後の顔画像データＤ１ａと視線データＤ２の視線角度とを対応付けた学習用データセットＤ３を生成しない。例えば、図１０に示す学習用データセットＤ３には、顔判定のラベル付与情報「１（不採用）」が記録されることにより、置換後の顔画像データＤ１ａを表す「０００３.ｊｐｇ」とその視線角度を表す「－１２．１」とが対応付けられず不採用となる。 On the other hand, if the face determination unit 443 determines that the replaced face image data D1 does not include the face image of the learning subject OB, that is, if the face determination label information is "1", the matching processing unit 448 does not generate a learning dataset D3 that matches the replaced face image data D1a with the gaze angle of the gaze data D2. For example, in the learning dataset D3 shown in FIG. 10, the face determination label information "1 (not adopted)" is recorded, so that "0003.jpg" representing the replaced face image data D1a and "-12.1" representing its gaze angle are not matched and are not adopted.

また、対応付け処理部４４８は、瞬き判定部４４５により瞬きをしていないと判定された場合、つまり瞬き判定のラベル付与情報が「０」の場合、置換後の顔画像データＤ１ａと視線データＤ２の視線角度とを対応付けた学習用データセットＤ３を生成する。例えば、図１０に示す学習用データセットＤ３には、瞬き判定のラベル付与情報「０（採用）」が記録されることにより、置換後の顔画像データＤ１ａを表す「０００１.ｊｐｇ」とその視線角度を表す「１５．３」とが対応付けられる。 Furthermore, when the blink determination unit 445 determines that the learning subject is not blinking, that is, when the blink determination label assignment information is "0", the association processing unit 448 generates a learning dataset D3 that associates the replaced face image data D1a with the gaze angle of the gaze data D2. For example, in the learning dataset D3 shown in FIG. 10, the blink determination label assignment information "0 (adopted)" is recorded, so that "0001.jpg", which represents the replaced face image data D1a, is associated with "15.3", which represents its gaze angle.

一方で、対応付け処理部４４８は、瞬き判定部４４５により瞬きをしていると判定された場合、つまり瞬き判定のラベル付与情報が「１」の場合、置換後の顔画像データＤ１ａと視線データＤ２の視線角度とを対応付けた学習用データセットＤ３を生成しない。例えば、図１０に示す学習用データセットＤ３には、瞬き判定のラベル付与情報「１（不採用）」が記録されることにより、置換後の顔画像データＤ１ａを表す「０００４.ｊｐｇ」とその視線角度を表す「０」とが対応付けられず不採用となる。 On the other hand, when the blink determination unit 445 determines that a person is blinking, that is, when the blink determination label assignment information is "1", the association processing unit 448 does not generate a learning dataset D3 that associates the replaced facial image data D1a with the gaze angle of the gaze data D2. For example, in the learning dataset D3 shown in FIG. 10, the blink determination label assignment information "1 (not adopted)" is recorded, so that "0004.jpg" representing the replaced facial image data D1a and "0" representing the gaze angle are not associated and are not adopted.

対応付け処理部４４８は、顔判定のラベル付与情報「０（採用）」であり、且つ、瞬き判定のラベル付与情報「０（採用）」である置換後の顔画像データＤ１ａを学習用データセットＤ３に採用する。対応付け処理部４４８は、顔判定のラベル付与情報「１（不採用）」、又は、瞬き判定のラベル付与情報「１（不採用）」を含む置換後の顔画像データＤ１ａを学習用データセットＤ３に採用しない。対応付け処理部４４８は、記憶回路３０に接続され、生成した学習用データセットＤ３を記憶回路３０に記憶させる。 The matching processing unit 448 adopts the replaced face image data D1a having the face determination labeling information "0 (adopted)" and the blink determination labeling information "0 (adopted)" into the learning dataset D3. The matching processing unit 448 does not adopt the replaced face image data D1a having the face determination labeling information "1 (not adopted)" or the blink determination labeling information "1 (not adopted)" into the learning dataset D3. The matching processing unit 448 is connected to the memory circuitry 30 and stores the generated learning dataset D3 in the memory circuitry 30.
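The adoption rule described above (adopt a sample only when both the face determination label and the blink determination label are "0") can be sketched as a simple filter. This is a minimal illustration, not the patented implementation: the `Sample` record and the `build_dataset` function are hypothetical names, and the labels that the text does not state for "0003.jpg" (blink) and "0004.jpg" (face) are assumed to be "0".

```python
from dataclasses import dataclass


@dataclass
class Sample:
    image_name: str    # replaced face image data D1a, e.g. "0001.jpg"
    gaze_angle: float  # gaze angle derived from gaze data D2
    face_label: int    # 0 = face image included (adopt), 1 = not included (reject)
    blink_label: int   # 0 = not blinking (adopt), 1 = blinking (reject)


def build_dataset(samples):
    """Return (image, gaze angle) pairs whose face and blink labels are both 0."""
    return [
        (s.image_name, s.gaze_angle)
        for s in samples
        if s.face_label == 0 and s.blink_label == 0
    ]


# Rows modeled on the FIG. 10 examples quoted in the text.
# Labels not stated in the text are assumed to be 0.
samples = [
    Sample("0001.jpg", 15.3, face_label=0, blink_label=0),
    Sample("0003.jpg", -12.1, face_label=1, blink_label=0),
    Sample("0004.jpg", 0.0, face_label=0, blink_label=1),
]
dataset = build_dataset(samples)
```

Under these assumptions only "0001.jpg" survives the filter, which matches the adopted and rejected rows described for FIG. 10.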

次に、視線推定システム１における視線推定方法の処理手順について説明する。図１１は、実施形態に係る視線推定システム１における視線推定方法の処理手順を示すフローチャートである。図１１に示す視線推定システム１おける視線推定方法は、学習用データ生成ステップ（ステップＳ１）と、モデル生成ステップ（ステップＳ２）と、推定対象入力ステップ（ステップＳ３）と、視線推定ステップ（ステップＳ４）と、警告データ出力ステップ（ステップＳ５）とを有する。ここでは、上記各ステップに関する処理は、視線推定システム１の処理回路４０によって実行される。 Next, the processing steps of the gaze estimation method in the gaze estimation system 1 will be described. FIG. 11 is a flowchart showing the processing steps of the gaze estimation method in the gaze estimation system 1 according to the embodiment. The gaze estimation method in the gaze estimation system 1 shown in FIG. 11 has a learning data generation step (step S1), a model generation step (step S2), an estimation target input step (step S3), a gaze estimation step (step S4), and a warning data output step (step S5). Here, the processing related to each of the above steps is executed by the processing circuit 40 of the gaze estimation system 1.

まず、処理回路４０の学習用データ生成部４１は、学習フェーズにおいて、推定対象の入力顔画像から視線を推定する学習済みモデルＭを機械学習させる際に用いられる学習用データセットＤ３を生成する学習用データ生成ステップ（ステップＳ１）を実行する。学習用データ生成部４１は、生成した複数の学習用データセットＤ３を記憶回路３０に記憶させる。 First, the learning data generation unit 41 of the processing circuit 40 executes a learning data generation step (step S1) for generating a learning data set D3 used in machine learning of a trained model M that estimates the gaze from an input face image of an estimation target in the learning phase. The learning data generation unit 41 stores the generated multiple learning data sets D3 in the memory circuit 30.

次に、処理回路４０のモデル生成部４２は、学習フェーズにおいて、学習用データ生成ステップ（ステップＳ１）で生成された複数の学習用データセットＤ３を用いて、学習済みモデルＭを機械学習により生成するモデル生成ステップ（ステップＳ２）を実行する。モデル生成部４２は、生成した学習済みモデルＭを記憶回路３０に記憶させる。 Next, in the learning phase, the model generation unit 42 of the processing circuit 40 executes a model generation step (step S2) in which a trained model M is generated by machine learning using the multiple training data sets D3 generated in the training data generation step (step S1). The model generation unit 42 stores the generated trained model M in the memory circuit 30.

次に、処理回路４０の推定対象入力部４３は、使用フェーズにおいて、推定対象となる入力顔画像データＤ４を処理回路４０の視線推定部４４に入力する入力ステップ（ステップＳ３）を実行する。この場合、推定対象入力部４３は、例えば、運転者の顔を撮像するカメラ７０から出力される入力顔画像データＤ４を入力する。 Next, in the use phase, the estimation target input unit 43 of the processing circuit 40 executes an input step (step S3) of inputting the input face image data D4 to be estimated to the gaze estimation unit 44 of the processing circuit 40. In this case, the estimation target input unit 43 inputs, for example, the input face image data D4 output from the camera 70 that captures the face of the driver.

次に、処理回路４０の視線推定部４４は、使用フェーズにおいて、モデル生成ステップ（ステップＳ２）で生成された学習済みモデルＭを用いて、入力ステップ（ステップＳ３）で入力された入力顔画像データＤ４から視線を推定する視線推定ステップ（ステップＳ４）を実行する。視線推定部４４は、例えば、モデル生成ステップ（ステップＳ２）で生成された学習済みモデルＭに対して、入力ステップ（ステップＳ３）で入力された入力顔画像データＤ４を入力データとして入力し、これに応じて当該学習済みモデルＭから視線の推定を定量化した値（例えば視線角度）を出力させる。視線推定部４４は、出力された視線の推定を定量化した値（例えば視線角度）を、推定結果データＤ５として記憶回路３０に記憶させる。 Next, in the use phase, the gaze estimation unit 44 of the processing circuit 40 executes a gaze estimation step (step S4) in which the trained model M generated in the model generation step (step S2) is used to estimate the gaze from the input face image data D4 input in the input step (step S3). For example, the gaze estimation unit 44 inputs the input face image data D4 input in the input step (step S3) as input data to the trained model M generated in the model generation step (step S2), and outputs a quantified value of the gaze estimate (e.g., gaze angle) from the trained model M in response to this. The gaze estimation unit 44 stores the output quantified value of the gaze estimate (e.g., gaze angle) in the memory circuit 30 as estimation result data D5.

次に、処理回路４０の出力部４５は、視線推定ステップ（ステップＳ４）で推定された視線の推定結果データＤ５の視線角度に基づいて警告データを出力する警告データ出力ステップ（ステップＳ５）を実行し、本フローチャートによる処理を終了する。出力部４５は、例えば、推定結果データＤ５の視線角度に基づいて運転者が脇見運転をしているか否かを判定し、運転者が脇見運転をしていると判定した場合には警告データを出力機器２０に出力し、運転者が脇見運転をしていないと判定した場合には警告データを出力機器２０に出力しない。 Next, the output unit 45 of the processing circuit 40 executes a warning data output step (step S5) of outputting warning data based on the gaze angle of the gaze estimation result data D5 estimated in the gaze estimation step (step S4), and ends the processing according to this flowchart. For example, the output unit 45 determines whether the driver is looking away from the road based on the gaze angle of the estimation result data D5, and outputs warning data to the output device 20 if it is determined that the driver is looking away from the road, and does not output warning data to the output device 20 if it is determined that the driver is not looking away from the road.
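The warning data output step (step S5) can be sketched as a threshold test on the estimated gaze angle. The text does not state how inattentive driving is judged, so the symmetric threshold of 30 degrees and the function names `is_looking_away` and `warning_output` below are assumptions made only for illustration.

```python
def is_looking_away(gaze_angle_deg, threshold_deg=30.0):
    """Assumed criterion: the gaze deviates from straight ahead by more than the threshold."""
    return abs(gaze_angle_deg) > threshold_deg


def warning_output(gaze_angle_deg, threshold_deg=30.0):
    """Return warning data when inattentive driving is judged, otherwise None."""
    if is_looking_away(gaze_angle_deg, threshold_deg):
        return "WARNING: driver is looking away from the road"
    return None  # no warning data is sent to the output device


# Example with a hypothetical estimation result D5 of 45 degrees.
message = warning_output(45.0)
```

A real system would likely also consider how long the gaze stays off the road, which this sketch leaves out.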

図１２は、実施形態に係る学習用データ生成装置１００の動作例を示すフローチャートである。上記学習用データ生成ステップ（ステップＳ１）は、図１２に示すように、さらに、撮像ステップ（ステップＴ１）と、画像検出ステップ（ステップＴ２）と、画像置換ステップ（ステップＴ３）と、顔判定ステップ（ステップＴ４）と、ラベル付与ステップ（Ｔ５、Ｔ６）と、視線検出ステップ（ステップＴ７）と、瞬き判定ステップ（ステップＴ８）と、ラベル付与ステップ（ステップＴ９、Ｔ１０）と、演算ステップ（ステップＴ１１）と、対応付け処理ステップ（ステップＴ１２）とを含む。ここでは、上記各ステップに関する処理は、学習用データ生成装置１００によって実行される。 FIG. 12 is a flowchart showing an example of the operation of the training data generation device 100 according to the embodiment. As shown in FIG. 12, the training data generation step (step S1) further includes an imaging step (step T1), an image detection step (step T2), an image replacement step (step T3), a face determination step (step T4), a label assignment step (T5, T6), a gaze detection step (step T7), a blink determination step (step T8), a label assignment step (steps T9, T10), a calculation step (step T11), and an association processing step (step T12). Here, the processing related to each of the above steps is executed by the training data generation device 100.

まず、学習用データ生成装置１００において、カメラ７０は、学習対象者ＯＢの顔を撮像する撮像ステップ（ステップＴ１）を実行する。 First, in the learning data generation device 100, the camera 70 executes an imaging step (step T1) to capture an image of the face of the learning subject OB.

次に、眼球カメラ検出部４４１は、撮像ステップ（ステップＴ１）で撮像された視線検出器６０の画像及び学習対象者ＯＢの顔画像を含む顔画像データＤ１において、視線検出器６０の画像を検出する画像検出ステップ（ステップＴ２）を実行する。 Next, the eyeball camera detection unit 441 executes an image detection step (step T2) of detecting the image of the gaze detector 60 in the face image data D1, which was captured in the imaging step (step T1) and includes both the image of the gaze detector 60 and the face image of the learning subject OB.

次に、画像置換部４４２は、顔画像データＤ１において、画像検出ステップ（ステップＴ２）で検出された視線検出器６０の画像を含む画像領域Ｑの画素値を、予め定められた置換画素値に置き換える画像置換ステップ（ステップＴ３）を実行する。このとき、画像置換部４４２は、顔画像データＤ１における画像領域Ｑの画素値を、眼の色とは異なる色の置換画素値に置き換えることが好ましい。 Next, the image replacement unit 442 executes an image replacement step (step T3) in which the pixel values of the image region Q in the facial image data D1, which includes the image of the gaze detector 60 detected in the image detection step (step T2), are replaced with predetermined replacement pixel values. At this time, it is preferable that the image replacement unit 442 replaces the pixel values of the image region Q in the facial image data D1 with replacement pixel values of a color different from the eye color.
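The image replacement step (step T3) amounts to overwriting every pixel of the image region Q with one predetermined value. The sketch below assumes that the image is a list of rows of RGB tuples and that region Q is an axis-aligned rectangle; the function name `replace_region` and the choice of pure green as a color unlike typical eye colors are illustrative assumptions.

```python
def replace_region(image, region, replacement):
    """Return a copy of image with every pixel inside region set to replacement.

    image:       list of rows, each row a list of (r, g, b) tuples
    region:      (top, left, bottom, right); bottom and right are exclusive
    replacement: the predetermined replacement pixel value
    """
    top, left, bottom, right = region
    out = [list(row) for row in image]  # copy so the original D1 stays intact
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = replacement
    return out


# A 3x4 dummy image with dark pixels; region Q covers rows 0-1, columns 1-2.
GREEN = (0, 255, 0)  # assumed replacement color, distinct from eye colors
image_d1 = [[(10, 10, 10)] * 4 for _ in range(3)]
image_d1a = replace_region(image_d1, (0, 1, 2, 3), GREEN)
```

Because the region is filled with a constant instead of being inpainted, the cost is proportional to the area of Q only.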

次に、顔判定部４４３は、置換後の顔画像データＤ１ａにおいて、学習対象者ＯＢの顔画像が含まれるか否かを判定する顔判定ステップ（ステップＴ４）を実行する。顔判定部４４３は、置換後の顔画像データＤ１ａにおいて、学習対象者ＯＢの顔画像が含まれる場合（ステップＴ４；Ｙｅｓ）、ラベル付与ステップ（ステップＴ５）に移行する。一方で、顔判定部４４３は、置換後の顔画像データＤ１ａにおいて、学習対象者ＯＢの顔画像が含まれない場合（ステップＴ４；Ｎｏ）、ラベル付与ステップ（ステップＴ６）に移行する。 Next, the face determination unit 443 executes a face determination step (step T4) to determine whether or not the face image of the learning subject OB is included in the replaced face image data D1a. If the replaced face image data D1a contains the face image of the learning subject OB (step T4; Yes), the face determination unit 443 proceeds to a label assignment step (step T5). On the other hand, if the replaced face image data D1a does not contain the face image of the learning subject OB (step T4; No), the face determination unit 443 proceeds to a label assignment step (step T6).

次に、ラベル付与部４４４は、顔判定部４４３により判定された判定結果に基づいて、置換後の顔画像データＤ１ａに対して顔判定のラベルを付与する。ラベル付与部４４４は、学習対象者ＯＢの顔画像が含まれると判定された場合（ステップＴ４；Ｙｅｓ）、置換後の顔画像データＤ１ａに対して顔判定の可能を表すラベル（例えば「０」）を付与するラベル付与ステップ（ステップＴ５）を実行する。一方で、ラベル付与部４４４は、学習対象者ＯＢの顔画像が含まれないと判定された場合（ステップＴ４；Ｎｏ）、置換後の顔画像データＤ１ａに対して顔判定の不可を表すラベル（例えば「１」）を付与するラベル付与ステップ（ステップＴ６）を実行する。 Next, the label assignment unit 444 assigns a face determination label to the replaced face image data D1a based on the determination result determined by the face determination unit 443. If the label assignment unit 444 determines that the face image of the learning subject OB is included (step T4; Yes), it executes a label assignment step (step T5) of assigning a label indicating that face determination is possible (e.g., "0") to the replaced face image data D1a. On the other hand, if the label assignment unit 444 determines that the face image of the learning subject OB is not included (step T4; No), it executes a label assignment step (step T6) of assigning a label indicating that face determination is impossible (e.g., "1") to the replaced face image data D1a.

視線検出器６０は、同期信号生成装置５０から出力される同期信号Ｓｇ２のタイミングで、学習対象者ＯＢの視線を検出する視線検出ステップ（ステップＴ７）を実行する。 The line-of-sight detector 60 executes a line-of-sight detection step (step T7) of detecting the line of sight of the learning subject OB at the timing of the synchronization signal Sg2 output from the synchronization signal generation device 50.

次に、瞬き判定部４４５は、視線検出器６０により検出された視線を表す視線データＤ２に基づいて学習対象者ＯＢの瞬きを判定する瞬き判定ステップ（ステップＴ８）を実行する。瞬き判定部４４５は、学習対象者ＯＢが瞬きをしていると判定した場合（ステップＴ８；Ｙｅｓ）、ラベル付与ステップ（ステップＴ９）に移行する。一方で、瞬き判定部４４５は、学習対象者ＯＢが瞬きをしていないと判定した場合（ステップＴ８；Ｎｏ）、ラベル付与ステップ（ステップＴ１０）に移行する。 Next, the blink determination unit 445 executes a blink determination step (step T8) to determine whether the learner OB is blinking based on the gaze data D2 representing the gaze detected by the gaze detector 60. If the blink determination unit 445 determines that the learner OB is blinking (step T8; Yes), it proceeds to a label assignment step (step T9). On the other hand, if the blink determination unit 445 determines that the learner OB is not blinking (step T8; No), it proceeds to a label assignment step (step T10).

次に、ラベル付与部４４６は、瞬き判定部４４５により判定された判定結果に基づいて、置換後の顔画像データＤ１ａに対して瞬き判定のラベルを付与する。ラベル付与部４４６は、学習対象者ＯＢが瞬きをしていると判定された場合（ステップＴ８；Ｙｅｓ）、置換後の顔画像データＤ１ａに対して瞬きを実施したことを表すラベル（例えば「１」）を付与するラベル付与ステップ（ステップＴ９）を実行する。一方で、ラベル付与部４４６は、学習対象者ＯＢが瞬きをしていないと判定された場合（ステップＴ８；Ｎｏ）、置換後の顔画像データＤ１ａに対して瞬きを実施していないことを表すラベル（例えば「０」）を付与するラベル付与ステップ（ステップＴ１０）を実行する。 Next, the label assignment unit 446 assigns a blink judgment label to the replaced facial image data D1a based on the judgment result determined by the blink judgment unit 445. If the label assignment unit 446 judges that the learning subject OB is blinking (step T8; Yes), it executes a label assignment step (step T9) of assigning a label (e.g., "1") indicating that a blink has been performed to the replaced facial image data D1a. On the other hand, if the label assignment unit 446 judges that the learning subject OB is not blinking (step T8; No), it executes a label assignment step (step T10) of assigning a label (e.g., "0") indicating that a blink has not been performed to the replaced facial image data D1a.

次に、視線角度演算部４４７は、瞬きを実施していないことを表すラベル（例えば「０」）を付与するラベル付与ステップ（ステップＴ１０）の処理の後、視野画像における学習対象者ＯＢの視線位置を表す視線データＤ２に基づいて視線角度を演算する演算ステップ（ステップＴ１１）を実行する。 Next, after the label assignment step (step T10) of assigning a label (e.g., "0") indicating that no blink has been performed, the gaze angle calculation unit 447 executes a calculation step (step T11) of calculating the gaze angle based on the gaze data D2, which represents the gaze position of the learning subject OB in the visual field image.
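One way to turn a gaze position in the visual field image into an angle is a pinhole camera model. This passage does not give the formula used by the gaze angle calculation unit 447, so the sketch below is an assumption throughout: `gaze_angle_deg`, the image center `center_px`, and the focal length `focal_px` are hypothetical names, and only the horizontal angle is computed.

```python
import math


def gaze_angle_deg(gaze_x_px, center_px, focal_px):
    """Horizontal gaze angle in degrees under an assumed pinhole model.

    gaze_x_px: horizontal gaze position in the visual field image (pixels)
    center_px: horizontal image center (pixels), taken as straight ahead
    focal_px:  focal length of the visual field camera in pixels (assumed known)
    """
    offset = gaze_x_px - center_px
    return math.degrees(math.atan2(offset, focal_px))


# Looking at the image center gives 0 degrees; an offset equal to the
# focal length gives 45 degrees.
straight = gaze_angle_deg(640.0, 640.0, 800.0)
oblique = gaze_angle_deg(1440.0, 640.0, 800.0)
```

The sign convention (positive to the right) is also an assumption; a vertical angle could be computed the same way from the vertical offset.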

次に、対応付け処理部４４８は、ラベル付与ステップ（ステップＴ５）で顔判定の可能を表すラベル（例えば「０」）が付与された置換後の顔画像データＤ１ａと、演算ステップ（ステップＴ１１）で演算された視線データＤ２の視線角度とを対応付けた学習用データセットＤ３を生成する対応付け処理ステップ（ステップＴ１２）を実行する。対応付け処理部４４８は、例えば、撮像ステップ（ステップＴ１）でカメラ７０により学習対象者ＯＢの顔を撮像したタイミングと、視線検出ステップ（ステップＴ７）で視線検出器６０により学習対象者ＯＢの視線を検出したタイミングとがそれぞれ同期する置換後の顔画像データＤ１ａと視線データＤ２とを対応付けた学習用データセットＤ３を生成する。 Next, the association processing unit 448 executes an association processing step (step T12) of generating a learning dataset D3 that associates the replaced face image data D1a, to which a label (e.g., "0") indicating that face determination is possible was assigned in the label assignment step (step T5), with the gaze angle of the gaze data D2 calculated in the calculation step (step T11). For example, the association processing unit 448 generates a learning dataset D3 by associating replaced face image data D1a and gaze data D2 whose timings are synchronized, that is, for which the timing at which the camera 70 captured the face of the learning subject OB in the imaging step (step T1) matches the timing at which the gaze detector 60 detected the gaze of the learning subject OB in the gaze detection step (step T7).
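The timing-based association in step T12 can be sketched as a join on a shared synchronization index. The sketch assumes that each camera frame and each gaze sample carries the index of the synchronization pulse at which it was acquired; that index and the function name `pair_by_sync` are hypothetical and are not described in the text.

```python
def pair_by_sync(frames, gaze_samples):
    """Pair face images and gaze angles acquired at the same sync pulse.

    frames:       list of (sync_index, image_name) for replaced images D1a
    gaze_samples: list of (sync_index, gaze_angle) derived from gaze data D2
    Frames without a gaze sample at the same sync index are skipped.
    """
    gaze_by_index = {index: angle for index, angle in gaze_samples}
    return [
        (name, gaze_by_index[index])
        for index, name in frames
        if index in gaze_by_index
    ]


frames = [(1, "0001.jpg"), (2, "0002.jpg"), (3, "0003.jpg")]
gaze_samples = [(1, 15.3), (3, -12.1)]  # no gaze sample for pulse 2
pairs = pair_by_sync(frames, gaze_samples)
```

If the two devices only shared a clock instead of a pulse index, nearest-timestamp matching within a tolerance would be the analogous approach.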

上述した視線推定方法は、予め用意された視線推定プログラムを種々のコンピュータ機器で実行することによって実現することができる。この視線推定プログラムは、少なくとも上述した学習用データ生成ステップ（ステップＳ１）、モデル生成ステップ（ステップＳ２）、推定対象入力ステップ（ステップＳ３）、視線推定ステップ（ステップＳ４）、警告データ出力ステップ（ステップＳ５）の各処理、さらには、撮像ステップ（ステップＴ１）、視線検出ステップ（ステップＴ７）、画像検出ステップ（ステップＴ２）、画像置換ステップ（ステップＴ３）、対応付け処理ステップ（ステップＴ１２）の各処理をコンピュータ機器に実行させる。 The above-mentioned gaze estimation method can be realized by executing a gaze estimation program prepared in advance on various computer devices. This gaze estimation program causes the computer devices to execute at least the above-mentioned learning data generation step (step S1), model generation step (step S2), estimation target input step (step S3), gaze estimation step (step S4), and warning data output step (step S5), as well as the imaging step (step T1), gaze detection step (step T7), image detection step (step T2), image replacement step (step T3), and matching processing step (step T12).

以上のように、実施形態に係る視線推定システム１は、学習用データ生成装置１００と、モデル生成部４２と、推定対象入力部４３と、視線推定部４４とを備える。学習用データ生成装置１００は、推定対象の入力顔画像データＤ４から視線を推定する学習済みモデルＭを機械学習させる際に用いられる学習用データセットＤ３を生成する。モデル生成部４２は、学習用データ生成装置１００により生成された複数の学習用データセットＤ３を用いて、機械学習により学習済みモデルＭを生成する。推定対象入力部４３は、推定対象の入力顔画像データＤ４を入力する。視線推定部４４は、モデル生成部４２により生成された学習済みモデルＭを用いて、推定対象入力部４３により入力された入力顔画像データＤ４から視線を推定する。 As described above, the gaze estimation system 1 according to the embodiment includes the training data generation device 100, the model generation unit 42, the estimation target input unit 43, and the gaze estimation unit 44. The training data generation device 100 generates a training data set D3 used when performing machine learning to generate a trained model M that estimates a gaze from input face image data D4 of an estimation target. The model generation unit 42 generates a trained model M by machine learning using a plurality of training data sets D3 generated by the training data generation device 100. The estimation target input unit 43 inputs the input face image data D4 of the estimation target. The gaze estimation unit 44 estimates a gaze from the input face image data D4 input by the estimation target input unit 43 using the trained model M generated by the model generation unit 42.

ここで、上記学習用データ生成装置１００は、カメラ７０と、視線検出器６０と、眼球カメラ検出部４４１と、画像置換部４４２と、対応付け処理部４４８と、を含んで構成される。カメラ７０は、機械学習を行う際の対象者である学習対象者ＯＢの顔を撮像する。視線検出器６０は、カメラ７０と学習対象者ＯＢとの間に配置され学習対象者ＯＢの視線を検出する。眼球カメラ検出部４４１は、カメラ７０により撮像された視線検出器６０の画像及び学習対象者ＯＢの顔画像を含む顔画像データＤ１において、視線検出器６０の画像を検出する。画像置換部４４２は、顔画像データＤ１において、眼球カメラ検出部４４１により検出された視線検出器６０の画像を含む画像領域Ｑの画素値を、予め定められた画素値に置き換える。対応付け処理部４４８は、画像置換部４４２により置き換えられた置換後の顔画像データＤ１ａと、視線検出器６０により検出された視線データＤ２とを対応付けた学習用データセットＤ３を生成する。 Here, the learning data generating device 100 includes a camera 70, a gaze detector 60, an eyeball camera detection unit 441, an image replacement unit 442, and a matching processing unit 448. The camera 70 captures the face of the learning subject OB, who is the subject when performing machine learning. The gaze detector 60 is disposed between the camera 70 and the learning subject OB and detects the gaze of the learning subject OB. The eyeball camera detection unit 441 detects the image of the gaze detector 60 in the face image data D1 including the image of the gaze detector 60 captured by the camera 70 and the face image of the learning subject OB. The image replacement unit 442 replaces the pixel values of the image area Q including the image of the gaze detector 60 detected by the eyeball camera detection unit 441 in the face image data D1 with predetermined pixel values. The matching processing unit 448 generates a learning data set D3 in which the face image data D1a after replacement by the image replacement unit 442 is associated with the gaze data D2 detected by the gaze detector 60.

この構成により、視線推定システム１は、学習用データセットＤ３により学習した学習済みモデルＭを生成する際に視線検出器６０を眼として誤認識した状態で学習済みモデルＭが生成されてしまうことを抑制できる。つまり、視線推定システム１は、推定対象の入力顔画像データＤ４から視線を推定することができる学習済みモデルＭを精度よく生成することができる。この結果、視線推定システム１は、精度よく生成された学習済みモデルＭを用いて入力顔画像データＤ４から運転者の視線を適正に推定することができる。またこのとき、視線推定システム１は、視線検出器６０の画像を含む画像領域Ｑの画素値を予め定められた画素値に置き換えるので、従来のように視線検出器６０の画像を削除した上で顔画像を復元するような処理と比較して、推定精度を確保した上で演算負荷を軽減することができる。視線推定システム１は、学習対象者ＯＢの頭部に装着する装着型の視線検出器６０を採用することにより、実際に学習対象者ＯＢが視た位置（視線位置）に基づいて機械学習を行うことができるため、精度のよい学習用データセットＤ３を生成することができる。このように、視線推定システム１は、視線検出器６０により精度のよい視線データＤ２を検出することができ、その上で視線検出器６０を採用するがゆえに視線検出器６０が写り込んでしまうという背反（デメリット）も解消することができる。視線推定システム１は、対応付け処理部４４８により置換後の顔画像データＤ１ａと視線データＤ２とを対応付けるので、自動的に学習用データセットＤ３を生成することができる。 With this configuration, when generating the trained model M from the learning dataset D3, the gaze estimation system 1 can keep the trained model M from being generated in a state where the gaze detector 60 is erroneously recognized as an eye. In other words, the gaze estimation system 1 can accurately generate a trained model M that can estimate the gaze from the input face image data D4 of the estimation target. As a result, the gaze estimation system 1 can properly estimate the driver's gaze from the input face image data D4 using the accurately generated trained model M. In addition, because the gaze estimation system 1 replaces the pixel values of the image region Q including the image of the gaze detector 60 with predetermined pixel values, it can reduce the calculation load while maintaining the estimation accuracy, compared to the conventional process of deleting the image of the gaze detector 60 and then restoring the face image. By employing a wearable gaze detector 60 that is attached to the head of the learning subject OB, the gaze estimation system 1 can perform machine learning based on the position actually looked at by the learning subject OB (the gaze position), and can therefore generate a highly accurate learning dataset D3. In this way, the gaze estimation system 1 can detect the gaze data D2 with high accuracy using the gaze detector 60, and can also eliminate the trade-off (disadvantage) that the gaze detector 60 appears in the captured image because it is used. Because the association processing unit 448 associates the replaced face image data D1a with the gaze data D2, the gaze estimation system 1 can generate the learning dataset D3 automatically.

上記視線推定システム１において、学習用データ生成装置１００は、置換後の顔画像データＤ１ａにおいて、学習対象者ＯＢの顔画像が含まれることを判定する顔判定部４４３を含んで構成される。対応付け処理部４４８は、顔判定部４４３により置換後の顔画像データＤ１ａに学習対象者ＯＢの顔画像が含まれると判定された場合、置換後の顔画像データＤ１ａと視線データＤ２とを対応付けた学習用データセットＤ３を生成する。一方で、対応付け処理部４４８は、顔判定部４４３により置換後の顔画像データＤ１ａに学習対象者ＯＢの顔画像が含まれないと判定された場合、置換後の顔画像データＤ１ａと視線データＤ２とを対応付けた学習用データセットＤ３を生成しない。 In the gaze estimation system 1, the learning data generation device 100 is configured to include a face determination unit 443 that determines whether the replaced face image data D1a includes the face image of the learning subject OB. When the face determination unit 443 determines that the replaced face image data D1a includes the face image of the learning subject OB, the association processing unit 448 generates a learning dataset D3 that associates the replaced face image data D1a with the gaze data D2. On the other hand, when the face determination unit 443 determines that the replaced face image data D1a does not include the face image of the learning subject OB, the association processing unit 448 does not generate a learning dataset D3 that associates the replaced face image data D1a with the gaze data D2.

この構成により、視線推定システム１は、例えば、図９に示すように、置換後の顔画像データＤ１ａにおいて、画像置換部４４２により眼球カメラ６２Ｒ、６２Ｌ等を含む画像領域Ｑの画素値が置換画素値に置き換えられていない場合、当該置換後の顔画像データＤ１ａを不採用とすることができる。これにより、視線推定システム１は、学習用データセットＤ３の信頼性の低下を抑制することができ、この結果、学習済みモデルＭにより適正に視線を推定することができる。 With this configuration, for example, as shown in FIG. 9, when the pixel values of the image region Q including the eyeball cameras 62R, 62L, and the like have not been replaced with the replacement pixel values by the image replacement unit 442 in the replaced face image data D1a, the gaze estimation system 1 can reject that replaced face image data D1a. This allows the gaze estimation system 1 to suppress a decrease in the reliability of the learning dataset D3, and as a result, the gaze can be properly estimated using the trained model M.

上記視線推定システム１において、学習用データ生成装置１００は、視線検出器６０により検出された視線データＤ２に基づいて学習対象者ＯＢの瞬きを判定する瞬き判定部４４５を含んで構成される。対応付け処理部４４８は、瞬き判定部４４５により瞬きをしていないと判定された場合、置換後の顔画像データＤ１ａと視線データＤ２とを対応付けた学習用データセットＤ３を生成する。一方で、対応付け処理部４４８は、瞬き判定部４４５により瞬きをしていると判定された場合、置換後の顔画像データＤ１ａと視線データＤ２とを対応付けた学習用データセットＤ３を生成しない。 In the gaze estimation system 1, the learning data generation device 100 is configured to include a blink determination unit 445 that determines whether the learning subject OB is blinking based on the gaze data D2 detected by the gaze detector 60. When the blink determination unit 445 determines that the learning subject is not blinking, the association processing unit 448 generates a learning dataset D3 that associates the replaced face image data D1a with the gaze data D2. On the other hand, when the blink determination unit 445 determines that the learning subject is blinking, the association processing unit 448 does not generate a learning dataset D3 that associates the replaced face image data D1a with the gaze data D2.

この構成により、視線推定システム１は、例えば、学習対象者ＯＢが瞬きをすることにより眼を閉じた状態となり、学習対象者ＯＢの視線を検出することができない場合、或いは誤って視線を検出した場合、置換後の顔画像データＤ１ａを不採用とすることができる。これにより、視線推定システム１は、学習用データセットＤ３の信頼性の低下を抑制することができ、この結果、学習済みモデルＭにより適正に視線を推定することができる。 With this configuration, the gaze estimation system 1 can reject the replaced facial image data D1a when, for example, the learning subject OB blinks and has his/her eyes closed, making it impossible to detect the learning subject OB's gaze, or when the gaze is detected erroneously. This allows the gaze estimation system 1 to suppress a decrease in the reliability of the learning dataset D3, and as a result, the gaze estimation system 1 can properly estimate the gaze using the trained model M.

上記視線推定システム１において、予め定められた置換画素値は、学習対象者ＯＢの眼の色とは異なる色の画素値である。この構成により、視線推定システム１は、学習済みモデルＭを生成する際に視線検出器６０を眼として誤認識した状態で学習済みモデルＭが生成されてしまうことをより抑制することができ、この結果、学習済みモデルＭにより適正に視線を推定することができる。 In the line of sight estimation system 1, the predetermined replacement pixel value is a pixel value of a color different from the eye color of the learning subject OB. With this configuration, the line-of-sight estimation system 1 can further suppress the generation of the trained model M with the line-of-sight detector 60 erroneously recognized as an eye when generating the trained model M. As a result, the line of sight can be appropriately estimated using the trained model M.

実施形態に係る視線推定方法、及び、視線推定プログラムは、置換後の顔画像データＤ１ａを含む学習用データセットＤ３により機械学習した学習済みモデルＭを用いて入力顔画像データＤ４から視線を推定するので、上述した視線推定システム１と同様に、適正に視線を推定することができる。実施形態に係る学習用データ生成装置１００は、置換後の顔画像データＤ１ａを含む学習用データセットＤ３を生成するので、適正に視線を推定することを支援することができる。 The gaze estimation method and gaze estimation program according to the embodiment estimate the gaze from input facial image data D4 using a trained model M that has been machine-learned using a training dataset D3 that includes post-replacement facial image data D1a, and can therefore estimate the gaze appropriately, similar to the above-described gaze estimation system 1. The training data generation device 100 according to the embodiment generates a training dataset D3 that includes post-replacement facial image data D1a, and can therefore assist in estimating the gaze appropriately.

〔変形例〕 [Modification]
次に、実施形態の変形例について説明する。なお、変形例では、実施形態と同等の構成要素には同じ符号を付し、その詳細な説明を省略する。上述した実施形態では、視線推定システム１として、１つのシステムで学習フェーズと使用フェーズとの双方を行う場合の例を説明したが、実施形態はこれに限られない。 Next, a modification of the embodiment will be described. In the modification, components equivalent to those of the embodiment are given the same reference numerals, and their detailed description is omitted. In the embodiment described above, an example was described in which a single system, the gaze estimation system 1, performs both the learning phase and the use phase, but the embodiment is not limited to this.

例えば、図１３に例示する変形例に係る視線推定システム１Ａは、学習用データセットＤ３を生成する学習用データ生成装置１００Ａと、学習済みモデルＭを生成するモデル生成装置２００Ａと、学習済みモデルＭを用いて入力顔画像データＤ４から視線を推定する視線推定装置３００Ａとに分かれて構成される点で上述した視線推定システム１とは異なる。図１３は、実施形態の変形例に係る視線推定システム１Ａの構成例を示すブロック図である。図１４は、実施形態の変形例に係る視線推定装置３００Ａの適用例を示す概略図である。 For example, the gaze estimation system 1A according to the modified example shown in FIG. 13 differs from the gaze estimation system 1 described above in that it is configured separately from a learning data generation device 100A that generates a learning dataset D3, a model generation device 200A that generates a trained model M, and a gaze estimation device 300A that estimates a gaze from input face image data D4 using the trained model M. FIG. 13 is a block diagram showing an example of the configuration of the gaze estimation system 1A according to the modified example of the embodiment. FIG. 14 is a schematic diagram showing an application example of the gaze estimation device 300A according to the modified example of the embodiment.

視線推定システム１Ａは、図１３に示すように、学習用データ生成装置１００Ａ、モデル生成装置２００Ａ、及び、視線推定装置３００Ａが、それぞれが独立して別々の場所に配置され、分散したシステムを構成している。 As shown in FIG. 13, the gaze estimation system 1A is a distributed system in which a learning data generation device 100A, a model generation device 200A, and a gaze estimation device 300A are each independently located in separate locations.

学習用データ生成装置１００Ａは、入力機器１０Ａと、出力機器２０Ａと、記憶回路３０Ａと、処理回路４０Ａと、同期信号生成装置５０と、視線検出器６０と、学習用撮像部としての学習用カメラ７０Ａとを備えている。学習用データ生成装置１００Ａは、学習フェーズにおいて、推定対象の入力顔画像から視線を推定する学習済みモデルＭを機械学習させる際に用いられる学習用データセットＤ３を生成する処理を行う。学習用カメラ７０Ａは、学習対象者ＯＢの顔全体を撮像するものであり、学習対象者ＯＢと一定の間隔を空けた状態で当該学習対象者ＯＢの顔の前方に配置されている。処理回路４０Ａは、機能概念的に、学習用データ生成部４１を含んで構成される。 The learning data generation device 100A includes an input device 10A, an output device 20A, a memory circuit 30A, a processing circuit 40A, a synchronization signal generation device 50, a gaze detector 60, and a learning camera 70A as a learning imaging unit. In the learning phase, the learning data generation device 100A performs a process of generating a learning data set D3 used when machine learning a trained model M that estimates the gaze from an input face image of an estimation target. The learning camera 70A captures an image of the entire face of the learning subject OB, and is placed in front of the face of the learning subject OB with a certain distance between them. The processing circuit 40A is functionally configured to include a learning data generation unit 41.

モデル生成装置２００Ａは、入力機器１０Ｂと、出力機器２０Ｂと、記憶回路３０Ｂと、処理回路４０Ｂとを備え、学習フェーズにおいて、処理回路４０Ａの学習用データ生成部４１により生成された複数の学習用データセットＤ３を用いて、学習済みモデルＭを機械学習により生成する処理を行う。処理回路４０Ｂは、機能概念的に、モデル生成部４２を含んで構成される。 The model generating device 200A includes an input device 10B, an output device 20B, a memory circuit 30B, and a processing circuit 40B, and in the learning phase, performs a process of generating a trained model M by machine learning using a plurality of training data sets D3 generated by a training data generating unit 41 of the processing circuit 40A. Functionally, the processing circuit 40B is configured to include a model generating unit 42.

視線推定装置３００Ａは、図１４に示すように、車両に搭載されている。視線推定装置３００Ａは、入力機器１０Ｃと、出力機器２０Ｃと、記憶回路３０Ｃと、処理回路４０Ｃと、運転者撮像部としての運転者カメラ７０Ｃとを備え、使用フェーズにおいて、処理回路４０Ｂのモデル生成部４２により生成された学習済みモデルＭを用いて、入力顔画像データＤ４から運転者の視線を推定する処理を行う。これにより、視線推定装置３００Ａは、運転者の視線を適正に推定することができる。運転者カメラ７０Ｃは、車両の運転者の顔を撮像するものであり、車両のメーター内部やステアリングコラムカバーに設置されている。処理回路４０Ｃは、機能概念的に、推定対象入力部４３と、視線推定部４４と、出力部４５とを含んで構成される。記憶回路３０Ｃには、処理回路４０Ｂのモデル生成部４２により生成された学習済みモデルＭが予め保存されている。処理回路４０Ｃの視線推定部４４は、記憶回路３０Ｃに保存された学習済みモデルＭを用いて、運転者カメラ７０Ｃから出力される入力顔画像データＤ４から運転者の視線を推定する処理を行う。 As shown in FIG. 14, the gaze estimation device 300A is mounted on a vehicle. The gaze estimation device 300A includes an input device 10C, an output device 20C, a memory circuit 30C, a processing circuit 40C, and a driver camera 70C as a driver imaging unit. In the use phase, it performs a process of estimating the driver's gaze from the input face image data D4 using the trained model M generated by the model generation unit 42 of the processing circuit 40B. This allows the gaze estimation device 300A to properly estimate the driver's gaze. The driver camera 70C captures an image of the face of the driver of the vehicle, and is installed inside the instrument cluster of the vehicle or on the steering column cover. The processing circuit 40C is functionally configured to include an estimation target input unit 43, a gaze estimation unit 44, and an output unit 45. The trained model M generated by the model generation unit 42 of the processing circuit 40B is stored in advance in the memory circuit 30C. The gaze estimation unit 44 of the processing circuit 40C uses the trained model M stored in the memory circuit 30C to perform a process of estimating the driver's gaze from the input face image data D4 output from the driver camera 70C.

以上のように、視線推定システム１Ａは、学習用データ生成装置１００Ａ、モデル生成装置２００Ａ、及び、視線推定装置３００Ａが、それぞれ分かれて構成されてもよい。 As described above, the gaze estimation system 1A may be configured with the learning data generation device 100A, the model generation device 200A, and the gaze estimation device 300A each being separate.

なお、上記説明では、学習用データ生成装置１００は、置換後の顔画像データＤ１ａにおいて、学習対象者ＯＢの顔画像が含まれることを判定する顔判定部４４３を含んで構成される例について説明したが、顔判定部４４３を含んで構成されていなくてもよい。 In the above description, an example in which the learning data generation device 100 is configured to include a face determination unit 443 that determines that a face image of a learning target OB is included in the face image data D1a after replacement is described. However, it is not necessary to include the face determination unit 443.

学習用データ生成装置１００は、視線検出器６０により検出された視線データＤ２に基づいて学習対象者ＯＢの瞬きを判定する瞬き判定部４４５を含んで構成される例について説明したが、瞬き判定部４４５を含んで構成されていなくてもよい。 The learning data generating device 100 has been described as being configured to include a blink determination unit 445 that determines whether the learning subject OB blinks based on the gaze data D2 detected by the gaze detector 60, but it does not have to be configured to include a blink determination unit 445.
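The optional nature of the face determination unit 443 and the blink determination unit 445 can be illustrated with a minimal Python sketch. Everything named here (`GazeData`, `build_dataset`, `face_ok`, `is_blink`, the list-of-lists image type) is a hypothetical stand-in chosen for illustration, not an interface defined in this document. The sketch only shows the association step pairing each replaced face image with its gaze data, with each check applied only when it is supplied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]  # hypothetical stand-in for replaced face image data D1a


@dataclass
class GazeData:
    """Hypothetical stand-in for gaze data D2."""
    x: float
    y: float
    blink: bool = False  # blink detection result, if the detector provides one


Sample = Tuple[Image, GazeData]  # one learning data set D3 entry


def build_dataset(
    frames: Sequence[Sample],
    face_ok: Optional[Callable[[Image], bool]] = None,
    is_blink: Optional[Callable[[GazeData], bool]] = None,
) -> List[Sample]:
    """Associate images with gaze data, skipping frames rejected by a check.

    Passing None for a check models a device built without that unit.
    """
    dataset: List[Sample] = []
    for image, gaze in frames:
        if face_ok is not None and not face_ok(image):
            continue  # no face found in the replaced image: do not generate
        if is_blink is not None and is_blink(gaze):
            continue  # subject is blinking: do not generate
        dataset.append((image, gaze))
    return dataset
```

With both arguments left as `None`, every frame is kept, which corresponds to the variant that includes neither determination unit.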

瞬き判定部４４５は、視線検出器６０により検出された視線データＤ２に含まれる学習対象者ＯＢの瞬きの検出結果に基づいて学習対象者ＯＢの瞬きを判定する例について説明したが、これに限定されない。瞬き判定部４４５は、例えば、学習対象者ＯＢの瞬きの検出結果が視線データＤ２に含まれない場合、視線データＤ２の視野画像に基づいて学習対象者ＯＢの瞬きを判定する。瞬き判定部４４５は、例えば、視線データＤ２の視野画像に示される学習対象者ＯＢの視線位置の値や黒眼中心値等に時系列フィルタリング等を行い、黒眼を検出できる場合には瞬きをしていないと判定し、黒眼を検出できない場合には瞬きをしていると判定する。 An example has been described in which the blink determination unit 445 determines blinking of the learning subject OB based on the blink detection result included in the gaze data D2 detected by the gaze detector 60, but the present invention is not limited to this. For example, if the gaze data D2 does not include a blink detection result for the learning subject OB, the blink determination unit 445 determines blinking of the learning subject OB based on the visual field image of the gaze data D2. For example, the blink determination unit 445 applies time-series filtering or the like to values such as the gaze position and the center of the dark part of the eye (iris and pupil) of the learning subject OB shown in the visual field image of the gaze data D2. It determines that the subject is not blinking when the dark part of the eye can be detected, and that the subject is blinking when it cannot be detected.
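As a rough illustration of this fallback, the following Python sketch labels each frame as a blink or not from a time series of dark-eye center detections. The text does not specify the filter, so the sliding majority vote used here is an assumption, as are the function name `blink_flags`, the `window` parameter, and the convention that a missing detection is represented by `None`.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) center of the dark part of the eye


def blink_flags(eye_centers: List[Optional[Point]], window: int = 3) -> List[bool]:
    """Return True for each frame judged to be a blink.

    A frame counts as "eye detected" when its center is not None. A sliding
    majority vote (an assumed form of time-series filtering) suppresses
    isolated detection dropouts; ties fall back to the frame's own result.
    """
    detected = [center is not None for center in eye_centers]
    half = window // 2
    flags: List[bool] = []
    for i, own in enumerate(detected):
        votes = detected[max(0, i - half): i + half + 1]
        yes = sum(votes)
        if yes * 2 == len(votes):
            smoothed = own  # tie: trust the current frame
        else:
            smoothed = yes * 2 > len(votes)
        # Eye detected -> not blinking; eye not detected -> blinking.
        flags.append(not smoothed)
    return flags
```

A single missing detection surrounded by valid ones is treated as noise rather than a blink, while a run of consecutive missing detections is reported as a blink.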

視線検出器６０は、株式会社ナックイメージテクノロジー製のＥＭＲ－９（帽子型）を採用することができる例について説明したがこれに限定されず、例えば、ＥＭＲ－９（メガネ型）を採用してもよいし、トビー・テクノロジー株式会社製のＴｏｂｉＰｒｏグラス２（メガネ型）を採用してもよい。メガネ型の場合、画像置換部４４２は、メガネのフレームの画像を含む画像領域の画素値を置換画素値に置き換える。 An example has been described in which the EMR-9 (cap type) manufactured by NAC Image Technology Inc. can be used as the gaze detector 60, but the gaze detector is not limited to this. For example, the EMR-9 (glasses type) may be used, or the Tobii Pro Glasses 2 (glasses type) manufactured by Tobii Technology may be used. In the case of the glasses type, the image replacement unit 442 replaces the pixel values of the image area including the image of the glasses frame with replacement pixel values.
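The replacement performed by the image replacement unit 442 can be sketched in Python as filling a detected region with one fixed pixel value. This is a minimal sketch under stated assumptions: the image is a list of rows of RGB tuples, the detected region is an axis-aligned box, and the names `replace_region`, `box`, and `fill` as well as the default green fill are hypothetical. The claims of this document only require that the predetermined pixel value may be a color different from the learning subject's eye color.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]       # RGB value
Box = Tuple[int, int, int, int]    # left, top, right, bottom (right/bottom exclusive)


def replace_region(
    image: List[List[Pixel]],
    box: Box,
    fill: Pixel = (0, 255, 0),     # assumed replacement pixel value
) -> List[List[Pixel]]:
    """Return a copy of image with every pixel inside box set to fill."""
    height = len(image)
    width = len(image[0]) if image else 0
    left, top, right, bottom = box
    # Clip the box to the image so a partly off-image detection is harmless.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    out = [list(row) for row in image]  # leave the input image untouched
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out
```

Returning a copy keeps the original face image data available, which can be useful if the same frame is processed with several candidate regions.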

機械学習アルゴリズムＡＬとして、畳み込みニューラルネットワークを用いる例について説明したが、これに限定されず、例えば、ロジスティック（Ｌｏｇｉｓｔｉｃ）回帰、アンサンブル学習（ＥｎｓｅｍｂｌｅＬｅａｒｎｉｎｇ）、サポートベクターマシン（ＳｕｐｐｏｒｔＶｅｃｔｏｒＭａｃｈｉｎｅ）、ランダムフォレスト（ＲａｎｄｏｍＦｏｒｅｓｔ）、ナイーブベイズ（ＮａｉｖｅＢａｙｓ）等のアルゴリズムを用いてもよい。 Although an example of using a convolutional neural network as the machine learning algorithm AL has been described, the present invention is not limited to this, and other algorithms such as logistic regression, ensemble learning, support vector machines, random forests, and naive Bayes may also be used.

処理回路４０は、単一のプロセッサによって各処理機能が実現されるものとして説明したがこれに限らない。処理回路４０は、複数の独立したプロセッサを組み合わせて各プロセッサがプログラムを実行することにより各処理機能が実現されてもよい。また、処理回路４０が有する処理機能は、単一又は複数の処理回路に適宜に分散又は統合されて実現されてもよい。また、処理回路４０が有する処理機能は、その全部又は任意の一部をプログラムにて実現してもよく、また、ワイヤードロジック等によるハードウェアとして実現してもよい。 Although the processing circuit 40 has been described as having each processing function realized by a single processor, the present invention is not limited to this. The processing circuit 40 may realize each processing function by combining a plurality of independent processors and having each processor execute a program. Further, the processing functions of the processing circuit 40 may be appropriately distributed or integrated into a single processing circuit or a plurality of processing circuits. Further, all or any part of the processing functions of the processing circuit 40 may be realized by a program, or may be realized by hardware using wired logic or the like.

以上で説明したプロセッサによって実行されるプログラムは、記憶回路３０等に予め組み込まれて提供される。なお、このプログラムは、これらの装置にインストール可能な形式又は実行可能な形式のファイルで、コンピュータで読み取り可能な記憶媒体に記録されて提供されてもよい。また、このプログラムは、インターネット等のネットワークに接続されたコンピュータ上に格納され、ネットワーク経由でダウンロードされることにより提供又は配布されてもよい。 The program executed by the processor described above is provided by being pre-installed in the storage circuit 30 or the like. Note that this program may be provided as a file in an installable or executable format on these devices and recorded on a computer-readable storage medium. Further, this program may be stored on a computer connected to a network such as the Internet, and may be provided or distributed by being downloaded via the network.

１、１Ａ視線推定システム
１００、１００Ａ学習用データ生成装置
３００、３００Ａ視線推定装置
４１学習用データ生成部
４２モデル生成部
４３推定対象入力部
４４視線推定部
６０視線検出器
７０カメラ（学習用撮像部、運転者撮像部）
７０Ａ学習用カメラ（学習用撮像部）
７０Ｃ運転者カメラ（運転者撮像部）
４４１眼球カメラ検出部（画像検出部）
４４２画像置換部
４４３顔判定部
４４５瞬き判定部
４４８対応付け処理部
Ｄ１、Ｄ１ａ顔画像データ
Ｄ２視線データ
Ｄ３学習用データセット
Ｄ４入力顔画像データ（入力顔画像）
Ｍ学習済みモデル
ＯＢ学習対象者
Ｑ画像領域
1, 1A Gaze estimation system
100, 100A Learning data generation device
300, 300A Gaze estimation device
41 Learning data generation unit
42 Model generation unit
43 Estimation target input unit
44 Gaze estimation unit
60 Gaze detector
70 Camera (learning imaging unit, driver imaging unit)
70A Learning camera (learning imaging unit)
70C Driver camera (driver imaging unit)
441 Eyeball camera detection unit (image detection unit)
442 Image replacement unit
443 Face determination unit
445 Blink determination unit
448 Association processing unit
D1, D1a Face image data
D2 Gaze data
D3 Learning data set
D4 Input face image data (input face image)
M Trained model
OB Learning subject
Q Image area

Claims

推定対象の入力顔画像から視線を推定する学習済みモデルを機械学習させる際に用いられる学習用データセットを生成する学習用データ生成装置と、
前記学習用データ生成装置により生成された複数の前記学習用データセットを用いて、機械学習により前記学習済みモデルを生成するモデル生成部と、
前記推定対象の入力顔画像を入力する推定対象入力部と、
前記モデル生成部により生成された前記学習済みモデルを用いて、前記推定対象入力部により入力された前記入力顔画像から視線を推定する視線推定部と、を備え、
前記学習用データ生成装置は、
学習対象者の顔を撮像する学習用撮像部と、
前記学習用撮像部と前記学習対象者との間に配置され前記学習対象者の視線を検出する視線検出器と、
前記学習用撮像部により撮像された前記視線検出器の画像及び前記学習対象者の顔画像を含む顔画像データにおいて、前記視線検出器の画像を検出する画像検出部と、
前記顔画像データにおいて、前記画像検出部により検出された前記視線検出器の画像を含む画像領域の画素値を、予め定められた画素値に置き換える画像置換部と、
前記画像置換部により置き換えられた置換後の顔画像データと、前記視線検出器により検出された前記学習対象者の視線を表す視線データとを対応付けた前記学習用データセットを生成する対応付け処理部と、を含んで構成されることを特徴とする視線推定システム。 a learning data generation device that generates a learning data set used in machine learning a trained model that estimates line of sight from an input face image to be estimated;
a model generation unit that generates the learned model by machine learning using the plurality of learning data sets generated by the learning data generation device;
an estimation target input unit that inputs the input face image of the estimation target;
a line of sight estimating unit that uses the learned model generated by the model generating unit to estimate a line of sight from the input face image input by the estimation target input unit;
The learning data generation device includes:
a learning imaging unit that images the face of the learning subject;
a line-of-sight detector that is arranged between the learning imaging unit and the learning subject and detects the learning subject's line of sight;
an image detection unit that detects an image of the line-of-sight detector in face image data including an image of the line-of-sight detector and a face image of the learning subject captured by the learning imaging unit;
an image replacement unit that replaces, in the face image data, a pixel value of an image area including the image of the line of sight detector detected by the image detection unit with a predetermined pixel value;
and an association processing unit that generates the learning data set in which the replaced face image data produced by the image replacement unit is associated with line-of-sight data representing the line of sight of the learning subject detected by the line-of-sight detector. The line-of-sight estimation system is characterized by comprising the above elements.

前記学習用データ生成装置は、前記置換後の顔画像データにおいて、前記学習対象者の顔画像が含まれることを判定する顔判定部を含み、
前記対応付け処理部は、前記顔判定部により前記置換後の顔画像データに前記学習対象者の顔画像が含まれると判定された場合、前記置換後の顔画像データと前記視線データとを対応付けた前記学習用データセットを生成し、
前記顔判定部により前記置換後の顔画像データに前記学習対象者の顔画像が含まれないと判定された場合、前記置換後の顔画像データと前記視線データとを対応付けた前記学習用データセットを生成しない請求項１に記載の視線推定システム。 The gaze estimation system according to claim 1, wherein
the learning data generation device includes a face determination unit that determines whether the replaced face image data includes a face image of the learning subject,
the association processing unit generates the learning data set in which the replaced face image data and the gaze data are associated with each other when the face determination unit determines that the replaced face image data includes a face image of the learning subject, and
the association processing unit does not generate the learning data set in which the replaced face image data and the gaze data are associated with each other when the face determination unit determines that the replaced face image data does not include a face image of the learning subject.

前記学習用データ生成装置は、前記視線検出器により検出された前記視線データに基づいて前記学習対象者の瞬きを判定する瞬き判定部を含み、
前記対応付け処理部は、前記瞬き判定部により瞬きをしていないと判定された場合、前記置換後の顔画像データと前記視線データとを対応付けた前記学習用データセットを生成し、
前記瞬き判定部により瞬きをしていると判定された場合、前記置換後の顔画像データと前記視線データとを対応付けた前記学習用データセットを生成しない請求項１又は２に記載の視線推定システム。 The gaze estimation system according to claim 1 or 2, wherein
the learning data generation device includes a blink determination unit that determines blinking of the learning subject based on the gaze data detected by the gaze detector,
the association processing unit generates the learning data set in which the replaced face image data and the gaze data are associated with each other when the blink determination unit determines that the learning subject is not blinking, and
the association processing unit does not generate the learning data set in which the replaced face image data and the gaze data are associated with each other when the blink determination unit determines that the learning subject is blinking.

予め定められた前記画素値は、前記学習対象者の眼の色とは異なる色の画素値である請求項１～３のいずれか１項に記載の視線推定システム。 The line of sight estimation system according to any one of claims 1 to 3, wherein the predetermined pixel value is a pixel value of a color different from the eye color of the learning subject.

推定対象の入力顔画像から視線を推定する学習済みモデルを機械学習させる際に用いられる学習用データセットを生成する学習用データ生成ステップと、
前記学習用データ生成ステップで生成された複数の前記学習用データセットを用いて、機械学習により前記学習済みモデルを生成するモデル生成ステップと、
前記推定対象の入力顔画像を入力する推定対象入力ステップと、
前記モデル生成ステップで生成された前記学習済みモデルを用いて、前記推定対象入力ステップで入力された前記入力顔画像から視線を推定する視線推定ステップと、を有し、
前記学習用データ生成ステップでは、
学習対象者の顔を学習用撮像部により撮像する撮像ステップと、
前記学習用撮像部と前記学習対象者との間に配置された視線検出器により前記学習対象者の視線を検出する視線検出ステップと、
前記撮像ステップで撮像された前記視線検出器の画像及び前記学習対象者の顔画像を含む顔画像データにおいて、前記視線検出器の画像を検出する画像検出ステップと、
前記顔画像データにおいて、前記画像検出ステップで検出された前記視線検出器の画像を含む画像領域の画素値を、予め定められた画素値に置き換える画像置換ステップと、
前記画像置換ステップで置き換えられた置換後の顔画像データと、前記視線検出ステップで検出された前記学習対象者の視線を表す視線データとを対応付けた前記学習用データセットを生成する対応付け処理ステップと、を含むことを特徴とする視線推定方法。 a training data generation step for generating a training data set used in machine learning a trained model that estimates line of sight from an input face image to be estimated;
a model generation step of generating the learned model by machine learning using the plurality of learning data sets generated in the learning data generation step;
an estimation target input step of inputting the input face image of the estimation target;
a gaze estimation step of estimating a gaze from the input face image input in the estimation target input step using the learned model generated in the model generation step;
In the learning data generation step,
an imaging step of imaging the learning subject's face using a learning imaging unit;
a line-of-sight detection step of detecting the line-of-sight of the learning subject using a line-of-sight detector disposed between the learning imaging unit and the learning subject;
an image detection step of detecting an image of the line-of-sight detector in face image data including an image of the line-of-sight detector and a face image of the learning subject captured in the imaging step;
an image replacement step of replacing, in the face image data, a pixel value of an image area including the image of the line of sight detector detected in the image detection step with a predetermined pixel value;
and an association processing step of generating the learning data set in which the replaced face image data produced in the image replacement step is associated with line-of-sight data representing the line of sight of the learning subject detected in the line-of-sight detection step. The gaze estimation method is characterized by including the above steps.

推定対象の入力顔画像から視線を推定する学習済みモデルを機械学習させる際に用いられる学習用データセットを生成し、
生成された複数の前記学習用データセットを用いて、機械学習により前記学習済みモデルを生成し、
前記推定対象の入力顔画像を入力し、
前記学習済みモデルを用いて、前記推定対象の入力顔画像から視線を推定する各処理をコンピュータに実行させるものであり、
前記学習用データセットを生成する場合、
学習対象者の顔を学習用撮像部により撮像し、
前記学習用撮像部と前記学習対象者との間に配置された視線検出器により前記学習対象者の視線を検出し、
前記学習用撮像部により撮像された前記視線検出器の画像及び前記学習対象者の顔画像を含む顔画像データにおいて、前記視線検出器の画像を検出し、
前記顔画像データにおいて、前記視線検出器の画像を含む画像領域の画素値を、予め定められた画素値に置き換え、
置換後の顔画像データと前記学習対象者の視線を表す視線データとを対応付けた前記学習用データセットを生成することを特徴とする視線推定プログラム。 A line-of-sight estimation program that causes a computer to execute processes of:
generating a learning data set used in machine learning of a trained model that estimates a line of sight from an input face image to be estimated;
generating the trained model by machine learning using a plurality of the generated learning data sets;
inputting the input face image of the estimation target; and
estimating a line of sight from the input face image of the estimation target using the trained model,
wherein, when generating the learning data set, the program is characterized by:
capturing an image of the face of a learning subject with a learning imaging unit;
detecting the line of sight of the learning subject with a line-of-sight detector disposed between the learning imaging unit and the learning subject;
detecting an image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes the image of the line-of-sight detector and a face image of the learning subject;
replacing, in the face image data, pixel values of an image area including the image of the line-of-sight detector with a predetermined pixel value; and
generating the learning data set in which the replaced face image data is associated with line-of-sight data representing the line of sight of the learning subject.

学習対象者の顔を撮像する学習用撮像部と、
前記学習用撮像部と前記学習対象者との間に配置され前記学習対象者の視線を検出する視線検出器と、
前記学習用撮像部により撮像された前記視線検出器の画像及び前記学習対象者の顔画像を含む顔画像データにおいて、前記視線検出器の画像を検出する画像検出部と、
前記顔画像データにおいて、前記画像検出部により検出された前記視線検出器の画像を含む画像領域の画素値を、予め定められた画素値に置き換える画像置換部と、
前記画像置換部により置き換えられた置換後の顔画像データと、前記視線検出器により検出された前記学習対象者の視線を表す視線データとを対応付けた学習用データセットを生成する対応付け処理部と、を備えることを特徴とする学習用データ生成装置。 a learning image capturing unit that captures an image of a learning target person's face;
a line-of-sight detector arranged between the learning imaging unit and the learning subject and detecting the learning subject's line of sight;
an image detection unit that detects an image of the line-of-sight detector in face image data including an image of the line-of-sight detector and a face image of the learning subject captured by the learning imaging unit;
an image replacement unit that replaces, in the face image data, a pixel value of an image area including the image of the line of sight detector detected by the image detection unit with a predetermined pixel value;
and an association processing unit that generates a learning data set in which the replaced face image data produced by the image replacement unit is associated with line-of-sight data representing the line of sight of the learning subject detected by the line-of-sight detector. The learning data generation device is characterized by comprising the above units.

車両の運転者の顔を撮像する運転者撮像部と、
前記運転者撮像部により撮像された運転者の顔画像を入力する推定対象入力部と、
学習対象者の顔を撮像する学習用撮像部、前記学習用撮像部と前記学習対象者との間に配置され前記学習対象者の視線を検出する視線検出器、前記学習用撮像部により撮像された前記視線検出器の画像及び前記学習対象者の顔画像を含む顔画像データにおいて、前記視線検出器の画像を検出する画像検出部、前記顔画像データにおいて、前記画像検出部により検出された前記視線検出器の画像を含む画像領域の画素値を、予め定められた画素値に置き換える画像置換部、及び、前記画像置換部により置き換えられた置換後の顔画像データと、前記視線検出器により検出された前記学習対象者の視線を表す視線データとを対応付けた学習用データセットを生成する対応付け処理部を含む学習用データ生成装置により生成された複数の前記学習用データセットを用いて機械学習した学習済みモデルを用いて、前記推定対象入力部により入力された前記運転者の顔画像から前記運転者の視線を推定する視線推定部と、
を備えることを特徴とする視線推定装置。 a driver imaging unit that images the face of the driver of the vehicle;
an estimation target input unit that inputs a face image of the driver captured by the driver imaging unit;
and a line-of-sight estimation unit that estimates the line of sight of the driver from the face image of the driver input by the estimation target input unit, using a trained model that has been machine-learned with a plurality of learning data sets generated by a learning data generation device, the learning data generation device including: a learning imaging unit that captures an image of a learning subject's face; a line-of-sight detector that is disposed between the learning imaging unit and the learning subject and detects the learning subject's line of sight; an image detection unit that detects an image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes the image of the line-of-sight detector and a face image of the learning subject; an image replacement unit that replaces, in the face image data, pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with a predetermined pixel value; and an association processing unit that generates a learning data set in which the replaced face image data produced by the image replacement unit is associated with line-of-sight data representing the line of sight of the learning subject detected by the line-of-sight detector.
The line-of-sight estimation device is characterized by comprising the above units.