JP2021190041A

JP2021190041A - Line-of-sight estimation system, line-of-sight estimation method, line-of-sight estimation program, learning data generation apparatus, and line-of-sight estimation apparatus

Info

Publication number: JP2021190041A
Application number: JP2020098172A
Authority: JP
Inventors: Yuki Takahashi (勇氣 髙橋)
Original assignee: Yazaki Corp
Current assignee: Yazaki Corp
Priority date: 2020-06-05
Filing date: 2020-06-05
Publication date: 2021-12-13
Anticipated expiration: 2040-06-05
Also published as: JP7460450B2

Abstract

PROBLEM TO BE SOLVED: To provide a line-of-sight estimation system capable of accurately estimating a line of sight.

SOLUTION: In a line-of-sight estimation system, an eyeball camera detection unit 441 of a learning data generation device 100 detects the image of a line-of-sight detector 60 in face image data, and an image replacement unit 442 replaces the pixel values of the image area including that image with predetermined pixel values. An association processing unit 448 generates a learning data set by associating the face image data after replacement by the image replacement unit 442 with the line-of-sight data detected by the line-of-sight detector 60. A model generation unit generates a trained model by machine learning using a plurality of learning data sets generated by a learning data generation unit 41. A line-of-sight estimation unit uses the trained model generated by the model generation unit to estimate a line of sight from input face image data input by an estimation target input unit.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a line-of-sight estimation system, a line-of-sight estimation method, a line-of-sight estimation program, a learning data generation device, and a line-of-sight estimation device.

Conventionally, as a line-of-sight estimation system, Patent Document 1, for example, describes an information processing device including: an image acquisition unit that acquires an image including a person's face; an image extraction unit that extracts, from the acquired image, a partial image including the person's eyes; and an estimation unit that inputs the partial image into a trained learner, which has undergone machine learning for estimating a line-of-sight direction, and thereby acquires from the learner line-of-sight information indicating the person's line-of-sight direction.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-28843

However, the information processing device described in Patent Document 1 has room for further improvement, for example in suppressing a decrease in the accuracy of line-of-sight estimation.

The present invention has been made in view of the above, and its object is to provide a line-of-sight estimation system, a line-of-sight estimation method, a line-of-sight estimation program, and a line-of-sight estimation device capable of appropriately estimating a line of sight, as well as a learning data generation device capable of supporting appropriate line-of-sight estimation.

To solve the above problem and achieve the object, a line-of-sight estimation system according to the present invention includes: a learning data generation device that generates learning data sets used in machine learning of a trained model that estimates a line of sight from an input face image of an estimation target; a model generation unit that generates the trained model by machine learning using a plurality of the learning data sets generated by the learning data generation device; an estimation target input unit that inputs the input face image of the estimation target; and a line-of-sight estimation unit that estimates a line of sight from the input face image input by the estimation target input unit, using the trained model generated by the model generation unit. The learning data generation device includes: a learning imaging unit that images the face of a learning target person; a line-of-sight detector that is arranged between the learning imaging unit and the learning target person and detects the line of sight of the learning target person; an image detection unit that detects the image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes both the image of the line-of-sight detector and the face image of the learning target person; an image replacement unit that replaces, in the face image data, the pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with predetermined pixel values; and an association processing unit that generates the learning data set by associating the face image data after replacement by the image replacement unit with line-of-sight data representing the line of sight of the learning target person detected by the line-of-sight detector.
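The replacement and association steps described above can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the names `Box`, `LearningSample`, `replace_region`, and `make_learning_sample` are hypothetical, the image is modelled as a nested list of RGB tuples, the line-of-sight data is assumed to be a pair of angles, and the green fill value is an arbitrary choice of predetermined pixel value. Detection of the line-of-sight detector image is assumed to have already produced the bounding boxes.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B); assumed pixel format
Image = List[List[Pixel]]      # rows of pixels; assumed image format


@dataclass
class Box:
    """Hypothetical bounding box of the detector image within the face image."""
    top: int
    left: int
    height: int
    width: int


@dataclass
class LearningSample:
    """One learning data set: replaced face image paired with line-of-sight data."""
    face_image: Image
    gaze: Tuple[float, float]  # assumed representation, e.g. (yaw, pitch)


def replace_region(image: Image, box: Box, fill: Pixel) -> Image:
    """Return a copy of the image with the box area set to the fill value."""
    out = [row[:] for row in image]
    for y in range(max(box.top, 0), min(box.top + box.height, len(out))):
        for x in range(max(box.left, 0), min(box.left + box.width, len(out[y]))):
            out[y][x] = fill
    return out


def make_learning_sample(
    image: Image,
    detector_boxes: Sequence[Box],
    gaze: Tuple[float, float],
    fill: Pixel = (0, 255, 0),  # assumed predetermined pixel value
) -> LearningSample:
    """Replace every detected detector area, then associate image and gaze."""
    replaced = [row[:] for row in image]
    for box in detector_boxes:
        replaced = replace_region(replaced, box, fill)
    return LearningSample(face_image=replaced, gaze=gaze)
```

In this sketch the input image is never modified in place, so the unreplaced face image data remains available to the caller; boxes that extend past the image border are clipped rather than rejected.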

In the line-of-sight estimation system, the learning data generation device preferably includes a face determination unit that determines whether the face image data after replacement includes the face image of the learning target person. When the face determination unit determines that the face image data after replacement includes the face image of the learning target person, the association processing unit generates the learning data set in which the face image data after replacement and the line-of-sight data are associated with each other. When the face determination unit determines that the face image data after replacement does not include the face image of the learning target person, the association processing unit does not generate that learning data set.

In the line-of-sight estimation system, the learning data generation device preferably includes a blink determination unit that determines, based on the line-of-sight data detected by the line-of-sight detector, whether the learning target person is blinking. When the blink determination unit determines that the learning target person is not blinking, the association processing unit generates the learning data set in which the face image data after replacement and the line-of-sight data are associated with each other. When the blink determination unit determines that the learning target person is blinking, the association processing unit does not generate that learning data set.
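The two preferred conditions above (face determination and blink determination) amount to a gate in front of the association step. The following is a minimal sketch under stated assumptions: the `Frame` record and the functions `associate` and `build_dataset` are hypothetical names, the results of the face and blink determinations are assumed to be available as booleans, and applying both conditions together is an illustrative choice, since the text presents them as separate preferences.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Gaze = Tuple[float, float]  # assumed representation of line-of-sight data


@dataclass
class Frame:
    """Hypothetical per-frame record handed to the association step."""
    replaced_image: Any   # face image data after replacement
    gaze: Gaze            # line-of-sight data from the detector
    face_found: bool      # result of the face determination
    blinking: bool        # result of the blink determination


def associate(frame: Frame) -> Optional[Tuple[Any, Gaze]]:
    """Return an (image, gaze) pair, or None when no data set is generated."""
    if not frame.face_found:
        return None  # replaced image does not contain the face
    if frame.blinking:
        return None  # line of sight is unreliable during a blink
    return (frame.replaced_image, frame.gaze)


def build_dataset(frames: List[Frame]) -> List[Tuple[Any, Gaze]]:
    """Keep only the frames for which a learning data set is generated."""
    pairs = (associate(f) for f in frames)
    return [p for p in pairs if p is not None]
```

A frame that fails either check is simply skipped, so the resulting list can be shorter than the number of captured frames.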

In the line-of-sight estimation system, the predetermined pixel value is preferably a pixel value of a color different from the eye color of the learning target person.

A line-of-sight estimation method according to the present invention includes: a learning data generation step of generating learning data sets used in machine learning of a trained model that estimates a line of sight from an input face image of an estimation target; a model generation step of generating the trained model by machine learning using a plurality of the learning data sets generated in the learning data generation step; an estimation target input step of inputting the input face image of the estimation target; and a line-of-sight estimation step of estimating a line of sight from the input face image input in the estimation target input step, using the trained model generated in the model generation step. The learning data generation step includes: an imaging step of imaging the face of a learning target person with a learning imaging unit; a line-of-sight detection step of detecting the line of sight of the learning target person with a line-of-sight detector arranged between the learning imaging unit and the learning target person; an image detection step of detecting the image of the line-of-sight detector in face image data that is captured in the imaging step and includes both the image of the line-of-sight detector and the face image of the learning target person; an image replacement step of replacing, in the face image data, the pixel values of an image area including the image of the line-of-sight detector detected in the image detection step with predetermined pixel values; and an association processing step of generating the learning data set by associating the face image data after replacement in the image replacement step with line-of-sight data representing the line of sight of the learning target person detected in the line-of-sight detection step.

A line-of-sight estimation program according to the present invention causes a computer to execute the processes of: generating learning data sets used in machine learning of a trained model that estimates a line of sight from an input face image of an estimation target; generating the trained model by machine learning using a plurality of the generated learning data sets; inputting the input face image of the estimation target; and estimating a line of sight from the input face image of the estimation target using the trained model. When generating the learning data set, the program images the face of a learning target person with a learning imaging unit; detects the line of sight of the learning target person with a line-of-sight detector arranged between the learning imaging unit and the learning target person; detects the image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes both the image of the line-of-sight detector and the face image of the learning target person; replaces, in the face image data, the pixel values of an image area including the image of the line-of-sight detector with predetermined pixel values; and generates the learning data set by associating the face image data after replacement with line-of-sight data representing the line of sight of the learning target person.

A learning data generation device according to the present invention includes: a learning imaging unit that images the face of a learning target person; a line-of-sight detector that is arranged between the learning imaging unit and the learning target person and detects the line of sight of the learning target person; an image detection unit that detects the image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes both the image of the line-of-sight detector and the face image of the learning target person; an image replacement unit that replaces, in the face image data, the pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with predetermined pixel values; and an association processing unit that generates a learning data set by associating the face image data after replacement by the image replacement unit with line-of-sight data representing the line of sight of the learning target person detected by the line-of-sight detector.

A line-of-sight estimation device according to the present invention includes: a driver imaging unit that images the face of a driver of a vehicle; an estimation target input unit that inputs the driver's face image captured by the driver imaging unit; and a line-of-sight estimation unit that estimates the driver's line of sight from the driver's face image input by the estimation target input unit, using a trained model obtained by machine learning with a plurality of learning data sets generated by a learning data generation device. The learning data generation device includes: a learning imaging unit that images the face of a learning target person; a line-of-sight detector that is arranged between the learning imaging unit and the learning target person and detects the line of sight of the learning target person; an image detection unit that detects the image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes both the image of the line-of-sight detector and the face image of the learning target person; an image replacement unit that replaces, in the face image data, the pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with predetermined pixel values; and an association processing unit that generates the learning data set by associating the face image data after replacement by the image replacement unit with line-of-sight data representing the line of sight of the learning target person detected by the line-of-sight detector.

The line-of-sight estimation system, line-of-sight estimation method, line-of-sight estimation program, and line-of-sight estimation device according to the present invention generate learning data sets in which the pixel values of the image area including the image of the line-of-sight detector in the face image data have been replaced with predetermined pixel values, and use those learning data sets to generate a trained model that estimates a line of sight from an input face image of an estimation target. As a result, the line-of-sight estimation system, line-of-sight estimation method, line-of-sight estimation program, and line-of-sight estimation device can appropriately estimate a line of sight using this trained model.

FIG. 1 is a block diagram showing a configuration example of a line-of-sight estimation system according to an embodiment.
FIG. 2 is a schematic diagram showing the processing of a learning phase and a use phase performed by the processing circuit of the line-of-sight estimation system according to the embodiment.
FIG. 3 is a block diagram showing a configuration example of the learning data generation unit according to the embodiment.
FIG. 4 is a timing chart showing an operation example of the synchronization signal generation device according to the embodiment.
FIG. 5 is a diagram showing a detection example of the right eyeball camera of the line-of-sight detector according to the embodiment.
FIG. 6 is a diagram showing a detection example of the left eyeball camera of the line-of-sight detector according to the embodiment.
FIG. 7 is a diagram showing an image replacement example for each eyeball camera of the line-of-sight detector according to the embodiment.
FIG. 8 is a diagram showing face image data after replacement in which a face was detected by the face determination unit according to the embodiment.
FIG. 9 is a diagram showing face image data after replacement in which no face was detected by the face determination unit according to the embodiment.
FIG. 10 is a diagram showing a configuration example of a learning data set according to the embodiment.
FIG. 11 is a flowchart showing the processing procedure of the line-of-sight estimation method in the line-of-sight estimation system according to the embodiment.
FIG. 12 is a flowchart showing an operation example of the learning data generation device according to the embodiment.
FIG. 13 is a block diagram showing a configuration example of a line-of-sight estimation system according to a modification of the embodiment.
FIG. 14 is a schematic diagram showing an application example of a line-of-sight estimation device according to the modification of the embodiment.

A mode for carrying out the present invention (an embodiment) will be described in detail with reference to the drawings. The present invention is not limited by the contents described in the following embodiment. The components described below include those that a person skilled in the art could easily conceive of and those that are substantially the same. The configurations described below can be combined as appropriate, and various omissions, substitutions, or changes to the configurations can be made without departing from the gist of the present invention.

[Embodiment]
The learning data generation device 100 according to the embodiment will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the line-of-sight estimation system 1 according to the embodiment. FIG. 2 is a schematic diagram showing the processing of the learning phase and the use phase performed by the processing circuit 40 of the line-of-sight estimation system 1 according to the embodiment. FIG. 3 is a block diagram showing a configuration example of the learning data generation device 100 according to the embodiment. FIG. 4 is a timing chart showing an operation example of the synchronization signal generation device 50 according to the embodiment. FIG. 5 is a diagram showing a detection example of the right eyeball camera 62R of the line-of-sight detector 60 according to the embodiment. FIG. 6 is a diagram showing a detection example of the left eyeball camera 62L of the line-of-sight detector 60 according to the embodiment. FIG. 7 is a diagram showing an image replacement example for the eyeball cameras 62R and 62L of the line-of-sight detector 60 according to the embodiment. FIG. 8 is a diagram showing face image data D1a after replacement in which a face was detected by the face determination unit 443 according to the embodiment. FIG. 9 is a diagram showing face image data D1a after replacement in which no face was detected by the face determination unit 443 according to the embodiment. FIG. 10 is a diagram showing a configuration example of the learning data set D3 according to the embodiment.

The line-of-sight estimation system 1 of the present embodiment shown in FIG. 1 is a system that estimates a line of sight. As shown in FIG. 2, the line-of-sight estimation system 1 has a learning phase, in which it generates a trained model M for estimating a line of sight, and a use phase, in which it estimates a line of sight using the trained model M. The line-of-sight estimation system 1 is realized by various computer devices. Each component of the line-of-sight estimation system 1 is described in detail below with reference to FIGS. 1 and 2.

The line-of-sight estimation system 1 is mounted on a vehicle, for example, and includes an input device 10, an output device 20, a storage circuit 30, a processing circuit 40, a synchronization signal generation device 50, a line-of-sight detector 60, and a camera 70 that serves as both the learning imaging unit and the driver imaging unit. The input device 10, the output device 20, the storage circuit 30, the processing circuit 40, the synchronization signal generation device 50, the line-of-sight detector 60, and the camera 70 are connected so as to be able to communicate with one another via a network. Here, the input device 10, the storage circuit 30, the synchronization signal generation device 50, the line-of-sight detector 60, the camera 70, and a part of the processing circuit 40 (a learning data generation unit 41 described later) constitute the learning data generation device 100. The input device 10, the storage circuit 30, and a part of the processing circuit 40 (a model generation unit 42 described later) constitute a model generation device 200. The input device 10, the output device 20, the storage circuit 30, the camera 70, and a part of the processing circuit 40 (an estimation target input unit 43, a line-of-sight estimation unit 44, and an output unit 45 described later) constitute a line-of-sight estimation device 300. The learning data generation device 100, the model generation device 200, and the line-of-sight estimation device 300 may be configured, for example, as a single system mounted on the same vehicle, or as a distributed system in which each is arranged at a different location. In the description of the configuration shown in FIG. 1, the learning data generation device 100, the model generation device 200, and the line-of-sight estimation device 300 are described, as an example, as a single system mounted on the same vehicle. In the description of the configuration shown in FIG. 13, given later, they are described as a distributed system in which each is arranged at a different location.
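The way the three devices share components can be written down as plain data, which makes the overlap easy to check. The sketch below is only an illustration of the composition stated above; the identifier strings and the helper functions `shared_components` and `devices_using` are hypothetical.

```python
from typing import Dict, List, Set

# Device composition as stated in the description; identifier names are illustrative.
DEVICES: Dict[str, Set[str]] = {
    "learning_data_generation_device_100": {
        "input_device_10",
        "storage_circuit_30",
        "sync_signal_generation_device_50",
        "line_of_sight_detector_60",
        "camera_70",
        "learning_data_generation_unit_41",
    },
    "model_generation_device_200": {
        "input_device_10",
        "storage_circuit_30",
        "model_generation_unit_42",
    },
    "line_of_sight_estimation_device_300": {
        "input_device_10",
        "output_device_20",
        "storage_circuit_30",
        "camera_70",
        "estimation_target_input_unit_43",
        "line_of_sight_estimation_unit_44",
        "output_unit_45",
    },
}


def shared_components(devices: Dict[str, Set[str]]) -> Set[str]:
    """Components that belong to every device."""
    sets = list(devices.values())
    return set.intersection(*sets) if sets else set()


def devices_using(component: str, devices: Dict[str, Set[str]]) -> List[str]:
    """Sorted names of the devices that include the given component."""
    return sorted(name for name, parts in devices.items() if component in parts)
```

With this table, `shared_components(DEVICES)` yields the input device 10 and the storage circuit 30, and `devices_using("camera_70", DEVICES)` lists the learning data generation device 100 and the line-of-sight estimation device 300, which matches the camera's double role as learning imaging unit and driver imaging unit.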

The input device 10 is a device that provides various inputs to the line-of-sight estimation system 1. The input device 10 is realized by, for example, an operation input device that accepts various operation inputs from a user, or a data input device that accepts data (information) input from devices outside the line-of-sight estimation system 1. The operation input device is realized by, for example, a mouse, a keyboard, a trackball, a switch, a button, a joystick, a touch pad, a touch screen, a non-contact input circuit, or a voice input circuit. The data input device is realized by, for example, a communication interface that sends and receives various data to and from other devices via wired or wireless communication, or a recording medium interface that reads various data from a recording medium such as a flexible disk (FD), a magneto-optical disk, a CD-ROM, a DVD, a USB memory, an SD card memory, or a flash memory. Here, the input device 10 is shared as the input unit of the learning data generation device 100, the model generation device 200, and the line-of-sight estimation device 300.

The output device 20 is a device that provides various outputs from the line-of-sight estimation system 1. The output device 20 is realized by, for example, a display that outputs and displays various image information, a speaker that outputs sound information, or a data output device that outputs data (information) to devices outside the line-of-sight estimation system 1. The data output device is realized by, for example, a communication interface that sends and receives various data to and from other devices via wired or wireless communication, or a recording medium interface that writes various data to a recording medium of the kinds listed above. The data input device and the data output device may share part or all of their configuration.

The storage circuit 30 is a circuit that stores various data. The storage circuit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, a hard disk, or an optical disk. The storage circuit 30 stores, for example, programs with which the line-of-sight estimation system 1 realizes its various functions. The programs stored in the storage circuit 30 include a program that operates the input device 10, a program that operates the output device 20, and a program that operates the processing circuit 40. The storage circuit 30 also stores various data such as data required for the various processes in the processing circuit 40, the learning data set D3 used for training the trained model M, the trained model M, and estimation result data D5 output via the output device 20. These data are read from the storage circuit 30 as needed by the processing circuit 40 and other components. The storage circuit 30 may be realized by a cloud server or the like connected to the line-of-sight estimation system 1 via a network. Here, the storage circuit 30 is shared as the storage unit of the learning data generation device 100, the model generation device 200, and the line-of-sight estimation device 300.

The processing circuit 40 is a circuit that realizes the various processing functions of the line-of-sight estimation system 1. The processing circuit 40 is realized by, for example, a processor. Here, a processor means a circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array). The processing circuit 40 realizes each processing function by, for example, executing a program read from the storage circuit 30.

The synchronization signal generation device 50 generates synchronization signals. The line-of-sight detector 60 detects the line of sight of the learning subject OB. The camera 70 images the entire face of the learning subject OB, who is the subject when machine learning is performed. The camera 70 also images the entire face of the driver of the vehicle. Details of the synchronization signal generation device 50, the line-of-sight detector 60, and the camera 70 are described later.

The overall configuration of the line-of-sight estimation system 1 according to the present embodiment has been outlined above. With this configuration, the processing circuit 40 according to the present embodiment has, in the learning phase, functions for performing the various processes that generate the trained model M for estimating the line of sight. In the use phase, the processing circuit 40 has functions for performing the various processes that estimate the line of sight using the trained model M.

To realize these processing functions, the processing circuit 40 of the present embodiment includes, in terms of functional concept, a learning data generation unit 41, a model generation unit 42, an estimation target input unit 43, a line-of-sight estimation unit 44, and an output unit 45. The processing circuit 40 realizes the processing functions of the learning data generation unit 41, the model generation unit 42, the estimation target input unit 43, the line-of-sight estimation unit 44, and the output unit 45 by, for example, executing a program read from the storage circuit 30.

The learning data generation unit 41 generates, in the learning phase, the learning data sets D3 (see FIGS. 3 and 10) used for machine learning of the trained model M, which estimates the line of sight from an input face image to be estimated. The learning data generation unit 41 of the present embodiment can execute, for example, a process of generating a learning data set D3 consisting of face image data D1a and line-of-sight data D2, both described later. The learning data set D3 is the teacher data used when the trained model M is generated by machine learning. The learning data generation unit 41 stores the plurality of generated learning data sets D3 in the storage circuit 30. Details of the processing of the learning data generation unit 41 are described later.

The model generation unit 42 generates, in the learning phase, the trained model M by machine learning using a plurality of learning data sets D3. In the present embodiment, the learning data sets D3 it uses are those generated by the learning data generation unit 41.

The model generation unit 42 generates the trained model M by performing machine learning based on any of various machine learning algorithms AL, using the plurality of learning data sets D3 as teacher data. One example of a machine learning algorithm AL is the convolutional neural network (CNN). A convolutional neural network is a DNN (Deep Neural Network), that is, a multilayered pattern recognition method, adapted to two-dimensional data, and it has a high pattern recognition capability for images. As the result of machine learning with the convolutional neural network, the model generation unit 42 generates the trained model M for estimating the line of sight from a face image.

The trained model M generated by the model generation unit 42 takes a face image as input and outputs a value that quantifies the estimated line of sight. That is, the trained model M is configured to accept a face image and to output a quantified line-of-sight prediction for that face image. More specifically, the trained model M causes a computer to perform predetermined operations on the face image supplied to the input layer of the convolutional neural network, using convolutional layers, pooling layers, fully connected layers, and the like, and to output from the output layer a value that quantifies the estimated line of sight (for example, a line-of-sight angle). The model generation unit 42 stores the trained model M generated in this way in the storage circuit 30.

The estimation target input unit 43 inputs, in the use phase, the input face image data D4 to be estimated. For example, it inputs the input face image data D4 output from the camera 70, which images the face of the driver driving the vehicle.

The line-of-sight estimation unit 44 estimates, in the use phase, the line of sight from the input face image data D4 using the trained model M. In the present embodiment, it uses the trained model M generated by the model generation unit 42 to estimate the line of sight from the input face image data D4 supplied by the estimation target input unit 43.

The line-of-sight estimation unit 44 supplies the input face image data D4 received from the estimation target input unit 43 as input data to the trained model M generated by the model generation unit 42, and in response causes the trained model M to output a value that quantifies the estimated line of sight (for example, a line-of-sight angle). The line-of-sight estimation unit 44 stores this output value in the storage circuit 30 as estimation result data (output data) D5.

The output unit 45 produces output based on the line-of-sight estimation result of the line-of-sight estimation unit 44. In the present embodiment, it can output via the output device 20 based on the estimation result data D5 estimated by the line-of-sight estimation unit 44. For example, the output unit 45 determines from the line-of-sight angle in the estimation result data D5 whether the driver is looking aside while driving. If it determines that the driver is looking aside, it outputs warning data for warning the driver to the output device 20; if it determines that the driver is not looking aside, it does not output warning data. When warning the driver, the output unit 45 may, for example, display the warning data as image information on a display constituting the output device 20, or output it as sound through a speaker constituting the output device 20.
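The decision made by the output unit 45 can be sketched as follows. This is only an illustration under stated assumptions: the source does not give a criterion for looking aside, so the angle thresholds, the `EstimationResult` type, and the warning payload are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds in degrees; the source specifies no values.
YAW_LIMIT_DEG = 30.0
PITCH_LIMIT_DEG = 20.0


@dataclass
class EstimationResult:
    """Stand-in for estimation result data D5 (line-of-sight angles in degrees)."""
    yaw_deg: float
    pitch_deg: float


def is_looking_aside(result, yaw_limit=YAW_LIMIT_DEG, pitch_limit=PITCH_LIMIT_DEG):
    """Assumed rule: the driver looks aside when either angle exceeds its limit."""
    return abs(result.yaw_deg) > yaw_limit or abs(result.pitch_deg) > pitch_limit


def make_warning(result):
    """Return warning data for the output device 20, or None when no warning is due."""
    if is_looking_aside(result):
        # Hypothetical payload: text for the display, flag for the speaker.
        return {"display": "Keep your eyes on the road", "sound": True}
    return None


warning = make_warning(EstimationResult(yaw_deg=45.0, pitch_deg=0.0))
no_warning = make_warning(EstimationResult(yaw_deg=5.0, pitch_deg=-3.0))
```

A production system would likely also require the condition to persist for some time before warning; that is outside what the source describes.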

Next, the learning data generation device 100 is described in detail. As described above, the learning data generation device 100 generates, in the learning phase, the learning data used to train the trained model M. As shown in FIG. 3, the learning data generation device 100 includes the synchronization signal generation device 50, the line-of-sight detector 60, the camera 70, and the learning data generation unit 41.

The synchronization signal generation device 50 generates synchronization signals. It is connected to the line-of-sight detector 60 and the camera 70, and when the switch attached to the synchronization signal generation device 50 is turned on, it outputs synchronization signals to the line-of-sight detector 60 and the camera 70. For example, as shown in FIG. 4, the synchronization signal generation device 50 outputs a synchronization signal Sg1 to the camera 70 at predetermined timings and outputs a synchronization signal Sg2 to the line-of-sight detector 60 at predetermined timings. The output interval of the synchronization signal Sg1 is set as appropriate for the performance of the camera 70, and the output interval of the synchronization signal Sg2 is set as appropriate for the performance of the line-of-sight detector 60.

The line-of-sight detector 60 detects the line of sight of the learning subject OB. It is a wearable detector worn on the head of the learning subject OB; for example, the EMR-9 (cap type) manufactured by nac Image Technology Inc. can be used. As shown in FIG. 5 and elsewhere, the line-of-sight detector 60 includes a field-of-view camera 61, a right eyeball camera 62R, and a left eyeball camera 62L, all of which are attached to a cap. When the learning subject OB puts the cap on, the field-of-view camera 61, the eyeball camera 62R, and the eyeball camera 62L are positioned between the camera 70 and the learning subject OB. As a result, they are located in front of the face of the learning subject OB, that is, at positions overlapping the face (the portion other than the eyes). With the cap on the head of the learning subject OB, the field-of-view camera 61 is located at the forehead and images the scene in front of the learning subject OB. The eyeball camera 62R is located below the right eye, that is, at the right cheek, and images the right eye of the learning subject OB. The eyeball camera 62L is located below the left eye, that is, at the left cheek, and images the left eye of the learning subject OB.

The line-of-sight detector 60 then detects the line-of-sight position of the learning subject OB in the field-of-view image, based on the field-of-view image captured by the field-of-view camera 61, the right-eye image of the learning subject OB captured by the eyeball camera 62R, and the left-eye image of the learning subject OB captured by the eyeball camera 62L. That is, the line-of-sight detector 60 detects the X coordinate and the Y coordinate of the line of sight of the learning subject OB on the XY coordinate axes of the field-of-view image. In other words, it detects the X and Y coordinates of the position at which the learning subject OB is actually looking. The line-of-sight detector 60 saves line-of-sight data D2, which represents the detected line-of-sight position in the field-of-view image, in the storage circuit 30. For example, the line-of-sight detector 60 detects the line-of-sight position of the learning subject OB based on the synchronization signal Sg2 output from the synchronization signal generation device 50, and saves in the storage circuit 30 both the line-of-sight data D2 representing the detected position and the synchronization signal Sg2 representing the timing at which that line-of-sight data D2 was detected. The line-of-sight detector 60 is connected to the learning data generation unit 41 and outputs the saved line-of-sight data D2 and synchronization signal Sg2 to it. When detecting the line of sight, the line-of-sight detector 60 may also detect blinks of the learning subject OB and include the blink detection result in the line-of-sight data D2 output to the learning data generation unit 41.

The camera 70 serves both as the imaging unit for learning and as the imaging unit for estimation image input. That is, the camera 70 is shared between the learning data generation device 100, as its imaging unit for learning, and the line-of-sight estimation device 300, as its imaging unit for estimation image input. The imaging unit for learning of the learning data generation device 100 and the imaging unit for estimation image input of the line-of-sight estimation device 300 may instead be provided as separate cameras.

When functioning as the imaging unit for learning, the camera 70 images the entire face of the learning subject OB, who is the subject when machine learning is performed. The camera 70 is placed in front of the face of the learning subject OB at a fixed distance from the learning subject OB. As shown in FIGS. 5 and 6, the camera 70 images the entire face of the learning subject OB together with the line-of-sight detector 60 located in front of the face. The face image data D1 captured by the camera 70 therefore contains both the image of the line-of-sight detector 60 and the face image of the learning subject OB. The camera 70 saves this face image data D1 in the storage circuit 30. For example, the camera 70 images the face of the learning subject OB based on the synchronization signal Sg1 output from the synchronization signal generation device 50, and saves in the storage circuit 30 both the captured face image data D1 and the synchronization signal Sg1 representing the timing at which that face image data D1 was captured. The camera 70 is connected to the learning data generation unit 41 and outputs the saved face image data D1 and synchronization signal Sg1 to it.

When functioning as the imaging unit for estimation image input, the camera 70 images the entire face of the driver of the vehicle. Its imaging range is set to cover the driver's eyellipse (eye box). Here, the eyellipse is the range within which the viewpoint of a driver seated in the driver's seat is expected to lie. The camera 70 outputs the input face image data D4 representing the captured face image of the driver to the estimation target input unit 43 of the processing circuit 40.

The learning data generation unit 41 generates the learning data sets D3 used when machine learning is performed. It includes an eyeball camera detection unit 441, an image replacement unit 442, a face determination unit 443, a label assignment unit 444, a blink determination unit 445, a label assignment unit 446, a line-of-sight angle calculation unit 447, and an association processing unit 448.

The eyeball camera detection unit 441 detects the image of the line-of-sight detector 60 in the face image data D1, which contains both the image of the line-of-sight detector 60 and the face image of the learning subject OB. For example, it detects, in the face image data D1, the image of the eyeball camera 62R and the image of the arm that fixes the eyeball camera 62R to the cap. Specifically, the eyeball camera detection unit 441 detects these images by a well-known image recognition technique such as template matching using template images. As template images it uses, for example, a first image U1 representing the eyeball camera 62R and its fixing arm, and a second image U2 obtained by mirroring the first image U1. The second image U2 represents the eyeball camera 62L and its fixing arm.

The eyeball camera detection unit 441 performs template matching on the face image data D1 with the first image U1 to detect the image of the eyeball camera 62R and the image of its fixing arm. It likewise performs template matching on the face image data D1 with the second image U2 to detect the image of the eyeball camera 62L and the image of its fixing arm. The eyeball camera detection unit 441 is connected to the image replacement unit 442 and outputs the detection results to it.
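The two-template search can be illustrated with a minimal sketch. The source only names "template matching" as a well-known technique; the sum-of-absolute-differences score, the exhaustive scan, the grayscale list-of-lists image format, and all function names here are assumptions made for illustration.

```python
def mirror(template):
    """Flip a 2D template left-to-right (first image U1 -> second image U2)."""
    return [row[::-1] for row in template]


def match_template(image, template):
    """Return ((row, col), score) of the best match, scored by the sum of
    absolute differences (lower is better). Ties keep the first position found."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                abs(image[r + dr][c + dc] - template[dr][dc])
                for dr in range(th)
                for dc in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy grayscale "face image" containing the pattern U1 at (1, 0)
# and its mirror image U2 at (1, 4).
face_image = [
    [0, 0, 0, 0, 0, 0],
    [9, 1, 0, 0, 1, 9],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
u1 = [[9, 1], [1, 1]]   # stands in for eyeball camera 62R and its arm
u2 = mirror(u1)         # stands in for eyeball camera 62L and its arm

pos_r, score_r = match_template(face_image, u1)
pos_l, score_l = match_template(face_image, u2)
```

Deriving U2 by mirroring U1 means only one template has to be prepared, which works because the two eyeball cameras are mounted symmetrically.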

The image replacement unit 442 replaces parts of an image. For example, based on the detection results from the eyeball camera detection unit 441, it performs the replacement by filling a specific image region Q in the face image data D1. As shown in FIG. 7, for example, the image replacement unit 442 replaces the pixel values of the image region Q, which contains the image of the line-of-sight detector 60 detected by the eyeball camera detection unit 441, with a replacement pixel value, which is a predetermined pixel value. The replacement pixel value is preferably a color different from the color of the eyes. Specifically, the image replacement unit 442 replaces the pixel values of a rectangular image region Q in the face image data D1 that contains the images of the eyeball camera 62R and its fixing arm and of the eyeball camera 62L and its fixing arm, as detected by the eyeball camera detection unit 441, and thereby generates the replaced face image data D1a. The replaced face image data D1a therefore no longer contains the images of the eyeball cameras 62R and 62L or of their fixing arms. The eye here consists of, for example, the pupil and the iris. One candidate for a color different from the eye color is the average skin color of the face in the face image data D1. The image replacement unit 442 can set a different replacement pixel value for each face image data D1 it processes. The image replacement unit 442 is connected to the face determination unit 443 and outputs the replaced face image data D1a to it.
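The replacement step can be sketched as below, assuming a grayscale image stored as a list of rows and detection boxes given as `(top, left, bottom, right)` with exclusive bottom and right edges. The source suggests the average facial skin color as the replacement value; the mean of all pixels outside Q is used here only as a crude, hypothetical proxy for it, and all function names are illustrative.

```python
def bounding_box(boxes):
    """Smallest rectangle Q = (top, left, bottom, right) enclosing every box."""
    tops, lefts, bottoms, rights = zip(*boxes)
    return min(tops), min(lefts), max(bottoms), max(rights)


def mean_outside(image, box):
    """Rounded mean pixel value outside the box (assumed proxy for skin color)."""
    top, left, bottom, right = box
    values = [
        px
        for r, row in enumerate(image)
        for c, px in enumerate(row)
        if not (top <= r < bottom and left <= c < right)
    ]
    return round(sum(values) / len(values))


def replace_region(image, box, value):
    """Return a copy of the image with every pixel inside the box set to value."""
    top, left, bottom, right = box
    return [
        [value if (top <= r < bottom and left <= c < right) else px
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


# Toy face image D1: skin value 100, eyeball cameras shown as 0.
d1 = [
    [100, 100, 100, 100],
    [100,   0,   0, 100],
    [100, 100, 100, 100],
]
detections = [(1, 1, 2, 2), (1, 2, 2, 3)]  # boxes for 62R and 62L (hypothetical)

q = bounding_box(detections)
d1a = replace_region(d1, q, mean_outside(d1, q))
```

Because the replacement value is computed per image, each face image data D1 gets its own value, which matches the statement that the value can differ from image to image.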

The face determination unit 443 performs face determination. It does so by, for example, a well-known face determination algorithm such as the Viola-Jones method. Using this algorithm, the face determination unit 443 determines whether the replaced face image data D1a contains the face image of the learning subject OB. For example, as shown in FIG. 8, when the image replacement unit 442 has replaced the pixel values of the image region Q containing the eyeball cameras 62R and 62L and the like with the replacement pixel value, the face determination unit 443 determines that the face image of the learning subject OB is contained. In contrast, as shown in FIG. 9, when the pixel values of the image region Q have not been replaced, the face determination unit 443 determines that the face image of the learning subject OB is not contained. The face determination unit 443 is connected to the label assignment unit 444 and outputs the determination result to it.

The label assignment unit 444 assigns a face determination label to the replaced face image data D1a based on the determination result of the face determination unit 443. For example, when the determination result indicates that the face image of the learning subject OB is not contained, the label assignment unit 444 assigns to the replaced face image data D1a a label indicating that face determination failed (for example, "1"). When the determination result indicates that the face image of the learning subject OB is contained, it assigns a label indicating that face determination succeeded (for example, "0"). The label assignment unit 444 is connected to the association processing unit 448 and outputs the replaced face image data D1a and the label information to it.

The blink determination unit 445 determines whether the learning subject OB is blinking, based on the line-of-sight data D2 detected by the line-of-sight detector 60. For example, it makes this determination from the blink detection result of the learning subject OB included in the line-of-sight data D2. When the line-of-sight data D2 indicates a blink, the blink determination unit 445 determines that the learning subject OB is blinking. When the line-of-sight data D2 does not indicate a blink, it determines that the learning subject OB is not blinking. The blink determination unit 445 is connected to the label assignment unit 446 and outputs the blink determination result to it.

The label assignment unit 446 assigns a blink determination label to the replaced face image data D1a based on the determination result of the blink determination unit 445. It assigns the label based on the synchronization signal Sg1, which represents the timing at which the face image data D1a was captured, and the synchronization signal Sg2, which represents the timing at which the line-of-sight data D2 was detected. For example, the label assignment unit 446 assigns the blink determination label to the face image data D1a captured at the same timing as that at which the line-of-sight data D2 was detected.

For example, when the determination result of the blink determination unit 445 indicates that the learning subject OB blinked, the label assignment unit 446 assigns to the replaced face image data D1a a label indicating a blink (for example, "1"). When the determination result indicates that the learning subject OB did not blink, it assigns a label indicating no blink (for example, "0"). The label assignment unit 446 is connected to the line-of-sight angle calculation unit 447 and outputs the label information to it.
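The timing-based labeling can be sketched as follows. The source does not describe how the synchronization signals Sg1 and Sg2 are recorded, so this illustration assumes each one is stored as an integer tick on a shared clock; the record layouts and names are hypothetical.

```python
def pair_by_sync(frames, gaze_samples):
    """Pair each face frame with the gaze sample detected at the same timing.

    frames:       list of (sg1_tick, frame_id)
    gaze_samples: list of (sg2_tick, sample_dict)
    Frames without a gaze sample at the same tick are skipped.
    """
    gaze_by_tick = dict(gaze_samples)
    return [
        (frame_id, gaze_by_tick[tick])
        for tick, frame_id in frames
        if tick in gaze_by_tick
    ]


def blink_labels(pairs):
    """Blink label per frame: 1 = blinked, 0 = did not blink."""
    return {frame_id: 1 if sample["blink"] else 0 for frame_id, sample in pairs}


# Assumed rates: the camera fires every 2 ticks, the detector every tick.
frames = [(0, "f0"), (2, "f1"), (4, "f2")]
gaze_samples = [
    (t, {"x": 320 + t, "y": 240, "blink": t == 2})  # blink occurs at tick 2
    for t in range(5)
]

labels = blink_labels(pair_by_sync(frames, gaze_samples))
```

Exact tick equality only works if both devices are driven from the same generator, as they are here; with independent clocks a nearest-timestamp match within a tolerance would be needed instead.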

The line-of-sight angle calculation unit 447 is capable of calculating a line-of-sight angle based on the line-of-sight data D2, which represents the line-of-sight position of the learning target person OB in the visual field image. When a label indicating a blink (for example, "1") is attached to the line-of-sight data D2, the line-of-sight angle calculation unit 447 does not calculate a line-of-sight angle for that line-of-sight data D2. On the other hand, when a label indicating no blink (for example, "0") is attached to the line-of-sight data D2, the line-of-sight angle calculation unit 447 calculates the line-of-sight angle based on that line-of-sight data D2. Here, the line-of-sight data D2 records the X coordinate and the Y coordinate of the line-of-sight position of the learning target person OB on the XY coordinate axes of the visual field image. The line-of-sight angle calculation unit 447 calculates the line-of-sight angle from these X and Y coordinates in the visual field image. The line-of-sight angle calculation unit 447 is connected to the association processing unit 448 and outputs the calculated line-of-sight angle to the association processing unit 448.
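The description does not say how the X and Y coordinates are converted into an angle. The following sketch shows one possible conversion under a pinhole-camera assumption, with the optical axis at the center of the visual field image and square pixels. The image size, the horizontal field of view, and the function name `gaze_angles_deg` are all hypothetical.

```python
import math
from typing import Optional, Tuple


def gaze_angles_deg(
    x: float,
    y: float,
    blink_label: int,
    width: int = 1280,       # assumed width of the visual field image in pixels
    height: int = 720,       # assumed height of the visual field image in pixels
    hfov_deg: float = 90.0,  # assumed horizontal field of view of the view camera
) -> Optional[Tuple[float, float]]:
    """Return (horizontal, vertical) angles in degrees, or None for a blink sample."""
    if blink_label == 1:
        return None  # no angle is calculated for data labeled as a blink
    # Focal length in pixels derived from the assumed field of view.
    focal_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    horizontal = math.degrees(math.atan2(x - width / 2, focal_px))
    # Image Y grows downward, so the sign is flipped to make "up" positive.
    vertical = math.degrees(math.atan2(height / 2 - y, focal_px))
    return horizontal, vertical
```

With these assumed parameters, a gaze position at the image center maps to zero degrees on both axes, and a position on the right edge maps to half the field of view.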

The association processing unit 448 is capable of generating a learning data set D3 by associating various data with each other. The association processing unit 448 generates a learning data set D3 in which the replaced face image data D1a produced by the image replacement unit 442 is associated with the line-of-sight angle of the line-of-sight data D2 detected by the line-of-sight detector 60.

For example, when the face determination unit 443 determines that the replaced face image data D1a includes the face image of the learning target person OB, that is, when the face determination label assignment information is "0", the association processing unit 448 generates a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight angle of the line-of-sight data D2. For example, in the learning data set D3 shown in FIG. 10, the face determination label assignment information "0 (adopted)" is recorded, so that "0001.jpg", which represents the replaced face image data D1a, is associated with "15.3", which represents its line-of-sight angle.

On the other hand, when the face determination unit 443 determines that the replaced face image data D1a does not include the face image of the learning target person OB, that is, when the face determination label assignment information is "1", the association processing unit 448 does not generate a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight angle of the line-of-sight data D2. For example, in the learning data set D3 shown in FIG. 10, the face determination label assignment information "1 (not adopted)" is recorded, so that "0003.jpg", which represents the replaced face image data D1a, is not associated with "-12.1", which represents its line-of-sight angle, and the entry is rejected.

Further, when the blink determination unit 445 determines that no blink occurred, that is, when the blink determination label assignment information is "0", the association processing unit 448 generates a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight angle of the line-of-sight data D2. For example, in the learning data set D3 shown in FIG. 10, the blink determination label assignment information "0 (adopted)" is recorded, so that "0001.jpg", which represents the replaced face image data D1a, is associated with "15.3", which represents its line-of-sight angle.

On the other hand, when the blink determination unit 445 determines that a blink occurred, that is, when the blink determination label assignment information is "1", the association processing unit 448 does not generate a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight angle of the line-of-sight data D2. For example, in the learning data set D3 shown in FIG. 10, the blink determination label assignment information "1 (not adopted)" is recorded, so that "0004.jpg", which represents the replaced face image data D1a, is not associated with "0", which represents its line-of-sight angle, and the entry is rejected.

The association processing unit 448 adopts into the learning data set D3 the replaced face image data D1a whose face determination label assignment information is "0 (adopted)" and whose blink determination label assignment information is also "0 (adopted)". The association processing unit 448 does not adopt into the learning data set D3 any replaced face image data D1a whose face determination label assignment information is "1 (not adopted)" or whose blink determination label assignment information is "1 (not adopted)". The association processing unit 448 is connected to the storage circuit 30 and stores the generated learning data set D3 in the storage circuit 30.
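The adoption rule described above (both labels must be "0") can be sketched as a simple filter. The `Record` type and `build_dataset` function are hypothetical names. The file names and angles follow the FIG. 10 examples quoted in the text, while the label values marked as assumed are filled in only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    image_name: str                  # replaced face image data D1a
    face_label: int                  # 0 = face found (adopt), 1 = no face (reject)
    blink_label: int                 # 0 = no blink (adopt), 1 = blink (reject)
    gaze_angle_deg: Optional[float]  # line-of-sight angle derived from D2


def build_dataset(records: List[Record]) -> List[Tuple[str, float]]:
    """Keep only records whose face label and blink label are both 0."""
    dataset: List[Tuple[str, float]] = []
    for rec in records:
        adopted = rec.face_label == 0 and rec.blink_label == 0
        if adopted and rec.gaze_angle_deg is not None:
            dataset.append((rec.image_name, rec.gaze_angle_deg))
    return dataset


records = [
    Record("0001.jpg", face_label=0, blink_label=0, gaze_angle_deg=15.3),
    Record("0003.jpg", face_label=1, blink_label=0, gaze_angle_deg=-12.1),  # blink label assumed
    Record("0004.jpg", face_label=0, blink_label=1, gaze_angle_deg=0.0),    # face label assumed
]
dataset = build_dataset(records)
```

Only "0001.jpg" survives the filter here, which matches the adopted and rejected entries described for FIG. 10.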

Next, the processing procedure of the line-of-sight estimation method in the line-of-sight estimation system 1 will be described. FIG. 11 is a flowchart showing the processing procedure of the line-of-sight estimation method in the line-of-sight estimation system 1 according to the embodiment. The line-of-sight estimation method shown in FIG. 11 includes a learning data generation step (step S1), a model generation step (step S2), an estimation target input step (step S3), a line-of-sight estimation step (step S4), and a warning data output step (step S5). Here, the processing of each of these steps is executed by the processing circuit 40 of the line-of-sight estimation system 1.

First, in the learning phase, the learning data generation unit 41 of the processing circuit 40 executes the learning data generation step (step S1), which generates the learning data sets D3 used for the machine learning of the trained model M that estimates a line of sight from an input face image to be estimated. The learning data generation unit 41 stores the plurality of generated learning data sets D3 in the storage circuit 30.

Next, in the learning phase, the model generation unit 42 of the processing circuit 40 executes the model generation step (step S2), which generates the trained model M by machine learning using the plurality of learning data sets D3 generated in the learning data generation step (step S1). The model generation unit 42 stores the generated trained model M in the storage circuit 30.

Next, in the use phase, the estimation target input unit 43 of the processing circuit 40 executes the input step (step S3), which inputs the input face image data D4 to be estimated to the line-of-sight estimation unit 44 of the processing circuit 40. In this case, the estimation target input unit 43 inputs, for example, the input face image data D4 output from the camera 70 that captures the driver's face.

Next, in the use phase, the line-of-sight estimation unit 44 of the processing circuit 40 executes the line-of-sight estimation step (step S4), which estimates the line of sight from the input face image data D4 input in the input step (step S3) using the trained model M generated in the model generation step (step S2). For example, the line-of-sight estimation unit 44 supplies the input face image data D4 input in the input step (step S3) as input data to the trained model M generated in the model generation step (step S2), and the trained model M outputs in response a value that quantifies the estimated line of sight (for example, a line-of-sight angle). The line-of-sight estimation unit 44 stores the output value (for example, the line-of-sight angle) in the storage circuit 30 as the estimation result data D5.

Next, the output unit 45 of the processing circuit 40 executes the warning data output step (step S5), which outputs warning data based on the line-of-sight angle in the estimation result data D5 estimated in the line-of-sight estimation step (step S4), and the processing of this flowchart ends. For example, the output unit 45 determines, based on the line-of-sight angle in the estimation result data D5, whether the driver is looking away from the road while driving. When it determines that the driver is looking away, it outputs warning data to the output device 20; when it determines that the driver is not looking away, it does not output warning data to the output device 20.
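The description does not specify how the output unit 45 decides that the driver is looking away. One simple possibility, shown here purely as an assumption, is a fixed threshold on the absolute line-of-sight angle. The 30-degree threshold, the function names, and the warning string are hypothetical.

```python
from typing import Optional


def should_warn(gaze_angle_deg: float, threshold_deg: float = 30.0) -> bool:
    """Treat an angle beyond the (assumed) threshold on either side as looking away."""
    return abs(gaze_angle_deg) > threshold_deg


def warning_data(gaze_angle_deg: float) -> Optional[str]:
    """Return warning data for the output device, or None when no warning is needed."""
    if should_warn(gaze_angle_deg):
        return "WARNING: driver is looking away"
    return None
```

A production system would more likely require the angle to stay beyond the threshold for some duration before warning, but that refinement is also not described in the source.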

FIG. 12 is a flowchart showing an operation example of the learning data generation device 100 according to the embodiment. As shown in FIG. 12, the learning data generation step (step S1) further includes an imaging step (step T1), an image detection step (step T2), an image replacement step (step T3), a face determination step (step T4), labeling steps (steps T5 and T6), a line-of-sight detection step (step T7), a blink determination step (step T8), labeling steps (steps T9 and T10), a calculation step (step T11), and an association processing step (step T12). Here, the processing of each of these steps is executed by the learning data generation device 100.

First, in the learning data generation device 100, the camera 70 executes the imaging step (step T1), which captures the face of the learning target person OB.

Next, the eyeball camera detection unit 441 executes the image detection step (step T2), which detects the image of the line-of-sight detector 60 in the face image data D1 captured in the imaging step (step T1). The face image data D1 includes both the image of the line-of-sight detector 60 and the face image of the learning target person OB.

Next, the image replacement unit 442 executes the image replacement step (step T3), which replaces, in the face image data D1, the pixel values of the image region Q containing the image of the line-of-sight detector 60 detected in the image detection step (step T2) with a predetermined replacement pixel value. At this time, the image replacement unit 442 preferably replaces the pixel values of the image region Q in the face image data D1 with a replacement pixel value whose color differs from the eye color.
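The replacement step above amounts to filling a region with a constant pixel value. The sketch below assumes that the image is a list of rows of RGB tuples and that the image region Q is an axis-aligned rectangle given as (top, left, bottom, right) with exclusive bottom and right edges. The green replacement value and the name `replace_region` are hypothetical choices, picked only because green is unlikely to match an eye color.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

# Assumed replacement pixel value: a color unlikely to be confused with an eye.
REPLACEMENT_PIXEL: Pixel = (0, 255, 0)


def replace_region(
    image: Image,
    region: Tuple[int, int, int, int],
    value: Pixel = REPLACEMENT_PIXEL,
) -> Image:
    """Return a copy of D1 in which every pixel inside region Q is set to `value`."""
    top, left, bottom, right = region
    replaced = [list(row) for row in image]  # copy so the original D1 is untouched
    for r in range(max(top, 0), min(bottom, len(replaced))):
        row = replaced[r]
        for c in range(max(left, 0), min(right, len(row))):
            row[c] = value
    return replaced


d1 = [[(10, 10, 10)] * 4 for _ in range(4)]  # tiny stand-in for face image data D1
d1a = replace_region(d1, (1, 1, 3, 3))       # replaced face image data D1a
```

Because no face content is reconstructed behind the detector, this kind of fill is cheap compared with removing the detector and restoring the face image.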

Next, the face determination unit 443 executes the face determination step (step T4), which determines whether the replaced face image data D1a includes the face image of the learning target person OB. When the replaced face image data D1a includes the face image of the learning target person OB (step T4; Yes), the process proceeds to the labeling step (step T5). On the other hand, when the replaced face image data D1a does not include the face image of the learning target person OB (step T4; No), the process proceeds to the labeling step (step T6).

Next, the label assigning unit 444 assigns a face determination label to the replaced face image data D1a based on the determination result of the face determination unit 443. When it is determined that the face image of the learning target person OB is included (step T4; Yes), the label assigning unit 444 executes the labeling step (step T5), which assigns to the replaced face image data D1a a label indicating that face determination is possible (for example, "0"). On the other hand, when it is determined that the face image of the learning target person OB is not included (step T4; No), the label assigning unit 444 executes the labeling step (step T6), which assigns to the replaced face image data D1a a label indicating that face determination is not possible (for example, "1").

The line-of-sight detector 60 executes the line-of-sight detection step (step T7), which detects the line of sight of the learning target person OB at the timing of the synchronization signal Sg2 output from the synchronization signal generation device 50.

Next, the blink determination unit 445 executes the blink determination step (step T8), which determines whether the learning target person OB blinked based on the line-of-sight data D2 representing the line of sight detected by the line-of-sight detector 60. When the blink determination unit 445 determines that the learning target person OB blinked (step T8; Yes), the process proceeds to the labeling step (step T9). On the other hand, when the blink determination unit 445 determines that the learning target person OB did not blink (step T8; No), the process proceeds to the labeling step (step T10).

Next, the label assigning unit 446 assigns a blink determination label to the replaced face image data D1a based on the determination result of the blink determination unit 445. When it is determined that the learning target person OB blinked (step T8; Yes), the label assigning unit 446 executes the labeling step (step T9), which assigns to the replaced face image data D1a a label indicating a blink (for example, "1"). On the other hand, when it is determined that the learning target person OB did not blink (step T8; No), the label assigning unit 446 executes the labeling step (step T10), which assigns to the replaced face image data D1a a label indicating no blink (for example, "0").

Next, after the labeling step (step T10), which assigns the label indicating no blink (for example, "0"), the line-of-sight angle calculation unit 447 executes the calculation step (step T11), which calculates the line-of-sight angle based on the line-of-sight data D2 representing the line-of-sight position of the learning target person OB in the visual field image.

Next, the association processing unit 448 executes the association processing step (step T12), which generates a learning data set D3 in which the replaced face image data D1a that received the label indicating that face determination is possible (for example, "0") in the labeling step (step T5) is associated with the line-of-sight angle of the line-of-sight data D2 calculated in the calculation step (step T11). For example, the association processing unit 448 generates a learning data set D3 that associates the replaced face image data D1a and the line-of-sight data D2 for which the timing at which the camera 70 captured the face of the learning target person OB in the imaging step (step T1) is synchronized with the timing at which the line-of-sight detector 60 detected the line of sight of the learning target person OB in the line-of-sight detection step (step T7).

The line-of-sight estimation method described above can be realized by executing a line-of-sight estimation program prepared in advance on various computer devices. This line-of-sight estimation program causes a computer device to execute at least the learning data generation step (step S1), the model generation step (step S2), the estimation target input step (step S3), the line-of-sight estimation step (step S4), and the warning data output step (step S5) described above, as well as the imaging step (step T1), the line-of-sight detection step (step T7), the image detection step (step T2), the image replacement step (step T3), and the association processing step (step T12).

As described above, the line-of-sight estimation system 1 according to the embodiment includes the learning data generation device 100, the model generation unit 42, the estimation target input unit 43, and the line-of-sight estimation unit 44. The learning data generation device 100 generates the learning data sets D3 used for the machine learning of the trained model M that estimates a line of sight from the input face image data D4 to be estimated. The model generation unit 42 generates the trained model M by machine learning using the plurality of learning data sets D3 generated by the learning data generation device 100. The estimation target input unit 43 inputs the input face image data D4 to be estimated. The line-of-sight estimation unit 44 estimates the line of sight from the input face image data D4 input by the estimation target input unit 43 using the trained model M generated by the model generation unit 42.

Here, the learning data generation device 100 includes the camera 70, the line-of-sight detector 60, the eyeball camera detection unit 441, the image replacement unit 442, and the association processing unit 448. The camera 70 captures the face of the learning target person OB, who is the subject of the machine learning. The line-of-sight detector 60 is arranged between the camera 70 and the learning target person OB and detects the line of sight of the learning target person OB. The eyeball camera detection unit 441 detects the image of the line-of-sight detector 60 in the face image data D1 captured by the camera 70, which includes both the image of the line-of-sight detector 60 and the face image of the learning target person OB. The image replacement unit 442 replaces, in the face image data D1, the pixel values of the image region Q containing the image of the line-of-sight detector 60 detected by the eyeball camera detection unit 441 with a predetermined pixel value. The association processing unit 448 generates a learning data set D3 in which the replaced face image data D1a produced by the image replacement unit 442 is associated with the line-of-sight data D2 detected by the line-of-sight detector 60.

With this configuration, the line-of-sight estimation system 1 can keep the trained model M from being generated in a state in which the line-of-sight detector 60 is misrecognized as an eye when the model is trained on the learning data sets D3. In other words, the line-of-sight estimation system 1 can accurately generate a trained model M capable of estimating a line of sight from the input face image data D4 to be estimated. As a result, the line-of-sight estimation system 1 can properly estimate the driver's line of sight from the input face image data D4 using the accurately generated trained model M. In addition, because the line-of-sight estimation system 1 replaces the pixel values of the image region Q containing the image of the line-of-sight detector 60 with a predetermined pixel value, it can reduce the computational load while maintaining estimation accuracy, compared with the conventional approach of deleting the image of the line-of-sight detector 60 and then restoring the face image. By adopting a wearable line-of-sight detector 60 mounted on the head of the learning target person OB, the line-of-sight estimation system 1 can perform machine learning based on the position the learning target person OB actually looked at (the line-of-sight position), and can therefore generate accurate learning data sets D3. In this way, the line-of-sight estimation system 1 can detect accurate line-of-sight data D2 with the line-of-sight detector 60 while also eliminating the drawback that comes with using it, namely that the line-of-sight detector 60 appears in the captured image. Because the association processing unit 448 associates the replaced face image data D1a with the line-of-sight data D2, the line-of-sight estimation system 1 can generate the learning data sets D3 automatically.

In the line-of-sight estimation system 1, the learning data generation device 100 includes the face determination unit 443, which determines whether the replaced face image data D1a includes the face image of the learning target person OB. When the face determination unit 443 determines that the replaced face image data D1a includes the face image of the learning target person OB, the association processing unit 448 generates a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight data D2. On the other hand, when the face determination unit 443 determines that the replaced face image data D1a does not include the face image of the learning target person OB, the association processing unit 448 does not generate a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight data D2.

With this configuration, for example, as shown in FIG. 9, when the pixel values of the image region Q containing the eyeball cameras 62R, 62L, and the like have not been replaced with the replacement pixel value by the image replacement unit 442 in the replaced face image data D1a, the line-of-sight estimation system 1 can reject that replaced face image data D1a. This allows the line-of-sight estimation system 1 to suppress a decrease in the reliability of the learning data sets D3, and as a result the trained model M can estimate the line of sight properly.

In the line-of-sight estimation system 1, the learning data generation device 100 includes the blink determination unit 445, which determines whether the learning target person OB blinked based on the line-of-sight data D2 detected by the line-of-sight detector 60. When the blink determination unit 445 determines that no blink occurred, the association processing unit 448 generates a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight data D2. On the other hand, when the blink determination unit 445 determines that a blink occurred, the association processing unit 448 does not generate a learning data set D3 in which the replaced face image data D1a is associated with the line-of-sight data D2.

With this configuration, the line-of-sight estimation system 1 can reject the replaced face image data D1a when, for example, the learning target person OB blinks and the eyes are closed, so that the line of sight of the learning target person OB cannot be detected or is detected erroneously. This allows the line-of-sight estimation system 1 to suppress a decrease in the reliability of the learning data sets D3, and as a result the trained model M can estimate the line of sight properly.

In the line-of-sight estimation system 1, the predetermined replacement pixel value is a pixel value whose color differs from the eye color of the learning target person OB. With this configuration, the line-of-sight estimation system 1 can further keep the trained model M from being generated in a state in which the line-of-sight detector 60 is misrecognized as an eye, and as a result the trained model M can estimate the line of sight properly.

The line-of-sight estimation method and the line-of-sight estimation program according to the embodiment estimate the line of sight from the input face image data D4 using the trained model M machine-learned from the learning data sets D3 that include the replaced face image data D1a. They can therefore estimate the line of sight properly, in the same manner as the line-of-sight estimation system 1 described above. Because the learning data generation device 100 according to the embodiment generates learning data sets D3 that include the replaced face image data D1a, it can support proper estimation of the line of sight.

[Modification]
Next, a modification of the embodiment will be described. In the modification, components equivalent to those of the embodiment are given the same reference signs, and detailed description thereof is omitted. The embodiment above described an example in which a single system, the line-of-sight estimation system 1, performs both the learning phase and the use phase, but the embodiment is not limited to this.

For example, the line-of-sight estimation system 1A according to the modification illustrated in FIG. 13 differs from the line-of-sight estimation system 1 described above in that it is divided into a learning data generation device 100A that generates the learning data set D3, a model generation device 200A that generates the trained model M, and a line-of-sight estimation device 300A that estimates the line of sight from the input face image data D4 using the trained model M. FIG. 13 is a block diagram showing a configuration example of the line-of-sight estimation system 1A according to the modification of the embodiment. FIG. 14 is a schematic view showing an application example of the line-of-sight estimation device 300A according to the modification of the embodiment.

As shown in FIG. 13, the line-of-sight estimation system 1A constitutes a distributed system in which the learning data generation device 100A, the model generation device 200A, and the line-of-sight estimation device 300A are each arranged independently at separate locations.

The learning data generation device 100A includes an input device 10A, an output device 20A, a storage circuit 30A, a processing circuit 40A, a synchronization signal generation device 50, a line-of-sight detector 60, and a learning camera 70A serving as a learning imaging unit. In the learning phase, the learning data generation device 100A performs a process of generating the learning data set D3 used in machine learning of the trained model M, which estimates the line of sight from an input face image to be estimated. The learning camera 70A images the entire face of the learning target person OB and is arranged in front of the face of the learning target person OB at a fixed distance from the learning target person OB. In terms of functional concept, the processing circuit 40A includes a learning data generation unit 41.

The model generation device 200A includes an input device 10B, an output device 20B, a storage circuit 30B, and a processing circuit 40B. In the learning phase, it performs a process of generating the trained model M by machine learning, using a plurality of learning data sets D3 generated by the learning data generation unit 41 of the processing circuit 40A. In terms of functional concept, the processing circuit 40B includes a model generation unit 42.

As shown in FIG. 14, the line-of-sight estimation device 300A is mounted on a vehicle. The line-of-sight estimation device 300A includes an input device 10C, an output device 20C, a storage circuit 30C, a processing circuit 40C, and a driver camera 70C serving as a driver imaging unit. In the use phase, it performs a process of estimating the driver's line of sight from the input face image data D4, using the trained model M generated by the model generation unit 42 of the processing circuit 40B. The line-of-sight estimation device 300A can thereby estimate the driver's line of sight properly. The driver camera 70C images the face of the driver of the vehicle and is installed inside the meter of the vehicle or on the steering column cover. In terms of functional concept, the processing circuit 40C includes an estimation target input unit 43, a line-of-sight estimation unit 44, and an output unit 45. The trained model M generated by the model generation unit 42 of the processing circuit 40B is stored in advance in the storage circuit 30C. The line-of-sight estimation unit 44 of the processing circuit 40C uses the trained model M stored in the storage circuit 30C to estimate the driver's line of sight from the input face image data D4 output from the driver camera 70C.

As described above, the line-of-sight estimation system 1A may be configured with the learning data generation device 100A, the model generation device 200A, and the line-of-sight estimation device 300A each provided separately.
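The separated configuration can be pictured as three stages that exchange only data. The sketch below is a toy under stated assumptions: the function names, the JSON serialization, the two-element feature vectors, and the nearest-neighbour lookup that stands in for the trained model M are all hypothetical and are not the convolutional neural network of the embodiment.

```python
import json
from typing import List, Tuple

Gaze = Tuple[float, float]
Sample = Tuple[List[float], Gaze]   # (features of D1a, line-of-sight data D2)


def generate_learning_data() -> List[Sample]:
    """Role of device 100A: emit learning data sets D3 (hard-coded toy values)."""
    return [([0.0, 0.0], (-10.0, 0.0)), ([1.0, 1.0], (10.0, 5.0))]


def generate_model(data: List[Sample]) -> str:
    """Role of device 200A: build model M and serialize it for transfer.

    The toy "model" simply memorizes the samples.
    """
    return json.dumps({"samples": [[feat, list(gaze)] for feat, gaze in data]})


def estimate_gaze(model_json: str, features: List[float]) -> Gaze:
    """Role of device 300A: load the stored model M and estimate a gaze.

    Nearest-neighbour lookup is a stand-in for inference with model M.
    """
    samples = json.loads(model_json)["samples"]

    def squared_distance(sample) -> float:
        return sum((a - b) ** 2 for a, b in zip(sample[0], features))

    best = min(samples, key=squared_distance)
    return (best[1][0], best[1][1])


# The model string plays the role of model M stored in advance in circuit 30C.
stored_model = generate_model(generate_learning_data())
estimated = estimate_gaze(stored_model, [0.9, 0.8])
```

Because the only thing passed to `estimate_gaze` is the serialized model, the estimation stage needs neither the line-of-sight detector 60 nor the learning data, which is the property that lets the device 300A be installed in a vehicle on its own.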

In the above description, an example was described in which the learning data generation device 100 includes the face determination unit 443, which determines whether the replaced face image data D1a includes the face image of the learning target person OB. However, the learning data generation device 100 need not include the face determination unit 443.

An example was described in which the learning data generation device 100 includes the blink determination unit 445, which determines a blink of the learning target person OB based on the line-of-sight data D2 detected by the line-of-sight detector 60. However, the learning data generation device 100 need not include the blink determination unit 445.

An example was described in which the blink determination unit 445 determines a blink of the learning target person OB based on the blink detection result for the learning target person OB included in the line-of-sight data D2 detected by the line-of-sight detector 60, but the determination is not limited to this. For example, when the line-of-sight data D2 does not include a blink detection result for the learning target person OB, the blink determination unit 445 determines a blink of the learning target person OB based on the visual field image of the line-of-sight data D2. In that case, the blink determination unit 445 applies, for example, time-series filtering or the like to the line-of-sight position value, the iris center value, and the like of the learning target person OB shown in the visual field image of the line-of-sight data D2. It determines that the learning target person OB is not blinking when the iris can be detected, and that the learning target person OB is blinking when the iris cannot be detected.
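One possible reading of this fallback is sketched below. The embodiment does not name a specific filter, so the sliding-window majority vote, the window length of 3, and the per-frame boolean input are assumptions chosen for illustration.

```python
from typing import List


def majority_filter(flags: List[bool], window: int = 3) -> List[bool]:
    """Smooth a boolean time series with a centered majority vote.

    Windows are truncated at the ends of the series. A tie counts as False.
    """
    half = window // 2
    smoothed = []
    for i in range(len(flags)):
        segment = flags[max(0, i - half):min(len(flags), i + half + 1)]
        smoothed.append(sum(segment) * 2 > len(segment))
    return smoothed


def blink_flags(iris_detected: List[bool], window: int = 3) -> List[bool]:
    """Return True for frames judged to be blinking (iris not detected)."""
    return [not detected for detected in majority_filter(iris_detected, window)]


# Per-frame iris detection results (toy values). Frame 2 is a one-frame dropout,
# frames 5 to 7 are a sustained loss of the iris.
iris_detected = [True, True, False, True, True, False, False, False, True, True]
blinks = blink_flags(iris_detected)
```

With these inputs the isolated miss at frame 2 is smoothed away, while the sustained loss at frames 5 to 7 is reported as a blink, so only those frames would be excluded from the learning data set D3.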

An example was described in which the EMR-9 (hat type) manufactured by nac Image Technology Inc. can be adopted as the line-of-sight detector 60, but the line-of-sight detector 60 is not limited to this. For example, the EMR-9 (glasses type) may be adopted, or the Tobii Pro Glasses 2 (glasses type) manufactured by Tobii Technology K.K. may be adopted. In the case of a glasses type, the image replacement unit 442 replaces the pixel values of the image area including the image of the glasses frame with the replacement pixel value.

An example was described in which a convolutional neural network is used as the machine learning algorithm AL, but the algorithm is not limited to this. For example, algorithms such as logistic regression, ensemble learning, support vector machine, random forest, and naive Bayes may be used.

The processing circuit 40 was described as realizing each processing function with a single processor, but it is not limited to this. The processing circuit 40 may realize each processing function by combining a plurality of independent processors, each of which executes a program. The processing functions of the processing circuit 40 may be distributed across or integrated into a single processing circuit or a plurality of processing circuits as appropriate. All or any part of the processing functions of the processing circuit 40 may be realized by a program, or may be realized as hardware using wired logic or the like.

The program executed by the processor described above is provided by being incorporated in advance in the storage circuit 30 or the like. The program may instead be provided as a file in a format installable on or executable by these devices, recorded on a computer-readable storage medium. The program may also be stored on a computer connected to a network such as the Internet and provided or distributed by being downloaded via the network.

1, 1A Line-of-sight estimation system
100, 100A Learning data generation device
300, 300A Line-of-sight estimation device
41 Learning data generation unit
42 Model generation unit
43 Estimation target input unit
44 Line-of-sight estimation unit
60 Line-of-sight detector
70 Camera (learning imaging unit, driver imaging unit)
70A Learning camera (learning imaging unit)
70C Driver camera (driver imaging unit)
441 Eyeball camera detection unit (image detection unit)
442 Image replacement unit
443 Face determination unit
445 Blink determination unit
448 Association processing unit
D1, D1a Face image data
D2 Line-of-sight data
D3 Learning data set
D4 Input face image data (input face image)
M Trained model
OB Learning target person
Q Image area

Claims

A line-of-sight estimation system comprising:
a learning data generation device that generates a learning data set used in machine learning of a trained model that estimates a line of sight from an input face image to be estimated;
a model generation unit that generates the trained model by machine learning, using a plurality of the learning data sets generated by the learning data generation device;
an estimation target input unit that inputs the input face image to be estimated; and
a line-of-sight estimation unit that estimates a line of sight from the input face image input by the estimation target input unit, using the trained model generated by the model generation unit,
wherein the learning data generation device includes:
a learning imaging unit that images the face of a learning target person;
a line-of-sight detector that is arranged between the learning imaging unit and the learning target person and detects the line of sight of the learning target person;
an image detection unit that detects an image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes the image of the line-of-sight detector and a face image of the learning target person;
an image replacement unit that replaces, in the face image data, pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with a predetermined pixel value; and
an association processing unit that generates the learning data set in which the replaced face image data resulting from the replacement by the image replacement unit is associated with line-of-sight data representing the line of sight of the learning target person detected by the line-of-sight detector.

The line-of-sight estimation system according to claim 1, wherein
the learning data generation device includes a face determination unit that determines whether the replaced face image data includes the face image of the learning target person,
the association processing unit generates the learning data set in which the replaced face image data is associated with the line-of-sight data when the face determination unit determines that the replaced face image data includes the face image of the learning target person, and
the association processing unit does not generate the learning data set in which the replaced face image data is associated with the line-of-sight data when the face determination unit determines that the replaced face image data does not include the face image of the learning target person.

The line-of-sight estimation system according to claim 1 or 2, wherein
the learning data generation device includes a blink determination unit that determines a blink of the learning target person based on the line-of-sight data detected by the line-of-sight detector,
the association processing unit generates the learning data set in which the replaced face image data is associated with the line-of-sight data when the blink determination unit determines that the learning target person is not blinking, and
the association processing unit does not generate the learning data set in which the replaced face image data is associated with the line-of-sight data when the blink determination unit determines that the learning target person is blinking.

The line-of-sight estimation system according to any one of claims 1 to 3, wherein the predetermined pixel value is a pixel value of a color different from the eye color of the learning target person.

A line-of-sight estimation method comprising:
a learning data generation step of generating a learning data set used in machine learning of a trained model that estimates a line of sight from an input face image to be estimated;
a model generation step of generating the trained model by machine learning, using a plurality of the learning data sets generated in the learning data generation step;
an estimation target input step of inputting the input face image to be estimated; and
a line-of-sight estimation step of estimating a line of sight from the input face image input in the estimation target input step, using the trained model generated in the model generation step,
wherein the learning data generation step includes:
an imaging step of imaging the face of a learning target person with a learning imaging unit;
a line-of-sight detection step of detecting the line of sight of the learning target person with a line-of-sight detector arranged between the learning imaging unit and the learning target person;
an image detection step of detecting an image of the line-of-sight detector in face image data that is captured in the imaging step and includes the image of the line-of-sight detector and a face image of the learning target person;
an image replacement step of replacing, in the face image data, pixel values of an image area including the image of the line-of-sight detector detected in the image detection step with a predetermined pixel value; and
an association processing step of generating the learning data set in which the replaced face image data resulting from the replacement in the image replacement step is associated with line-of-sight data representing the line of sight of the learning target person detected in the line-of-sight detection step.

A line-of-sight estimation program that causes a computer to execute processes of:
generating a learning data set used in machine learning of a trained model that estimates a line of sight from an input face image to be estimated;
generating the trained model by machine learning, using a plurality of the generated learning data sets;
inputting the input face image to be estimated; and
estimating a line of sight from the input face image to be estimated, using the trained model,
wherein, when generating the learning data set, the program causes:
the face of a learning target person to be imaged by a learning imaging unit;
the line of sight of the learning target person to be detected by a line-of-sight detector arranged between the learning imaging unit and the learning target person;
an image of the line-of-sight detector to be detected in face image data that is captured by the learning imaging unit and includes the image of the line-of-sight detector and a face image of the learning target person;
pixel values of an image area including the image of the line-of-sight detector in the face image data to be replaced with a predetermined pixel value; and
the learning data set to be generated in which the replaced face image data is associated with line-of-sight data representing the line of sight of the learning target person.

A learning data generation device comprising:
a learning imaging unit that images the face of a learning target person;
a line-of-sight detector that is arranged between the learning imaging unit and the learning target person and detects the line of sight of the learning target person;
an image detection unit that detects an image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes the image of the line-of-sight detector and a face image of the learning target person;
an image replacement unit that replaces, in the face image data, pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with a predetermined pixel value; and
an association processing unit that generates a learning data set in which the replaced face image data resulting from the replacement by the image replacement unit is associated with line-of-sight data representing the line of sight of the learning target person detected by the line-of-sight detector.

A line-of-sight estimation device comprising:
a driver imaging unit that images the face of a driver of a vehicle;
an estimation target input unit that inputs a face image of the driver captured by the driver imaging unit; and
a line-of-sight estimation unit that estimates the line of sight of the driver from the face image of the driver input by the estimation target input unit, using a trained model machine-learned with a plurality of learning data sets generated by a learning data generation device, the learning data generation device including: a learning imaging unit that images the face of a learning target person; a line-of-sight detector that is arranged between the learning imaging unit and the learning target person and detects the line of sight of the learning target person; an image detection unit that detects an image of the line-of-sight detector in face image data that is captured by the learning imaging unit and includes the image of the line-of-sight detector and a face image of the learning target person; an image replacement unit that replaces, in the face image data, pixel values of an image area including the image of the line-of-sight detector detected by the image detection unit with a predetermined pixel value; and an association processing unit that generates the learning data set in which the replaced face image data resulting from the replacement by the image replacement unit is associated with line-of-sight data representing the line of sight of the learning target person detected by the line-of-sight detector.