JP2019028876A

JP2019028876A - Device and method of generating teacher data for machine learning

Info

Publication number: JP2019028876A
Application number: JP2017149969A
Authority: JP
Inventors: Benjamin Schmitt
Original assignee: DIGITAL MEDIA PROFESSIONAL KK; Digital Media Professionals Inc
Current assignee: DIGITAL MEDIA PROFESSIONAL KK; Digital Media Professionals Inc
Priority date: 2017-08-02
Filing date: 2017-08-02
Publication date: 2019-02-21
Anticipated expiration: 2037-08-02
Also published as: JP6330092B1

Abstract

To efficiently generate teacher data suitable for image analysis. SOLUTION: A teacher data generation device 20 that generates teacher data used by a machine learning system 100 comprises: a database 22 storing one or more image-specific components (the shape of an object and/or other appearance factors) extracted from an input image 10; a change unit 25 that changes the image-specific components stored in the database 22 to generate one or more types of other image-specific components; and a reconstruction unit 26 that uses the other image-specific components to generate a reconstructed image corresponding at least partially to the input image 10, which is then applied to machine learning. SELECTED DRAWING: Figure 5

Description

The present invention relates to an apparatus and method for generating teacher data for machine learning. The present invention also relates to a method for generating a learned model by performing machine learning with teacher data generated by that apparatus and method, and to a method for using the learned model.

In recent years, artificial intelligence (AI) has been used in image analysis and video analysis. Training AI requires a large amount of data, and the amount of data strongly affects the accuracy of AI-based analysis. For example, when AI analyzes an input image to perform image recognition or situation prediction, it must first be trained on a large amount of known image data similar to the input image. When sufficient image data is not available, image recognition or situation prediction either cannot be performed or becomes significantly less accurate.

To address this problem, Patent Literature 1, for example, discloses a technique for creating a sufficient amount of teacher data from a small amount of original data. In the learning method of Patent Literature 1, partial images cut out sequentially from a detection target image are flipped horizontally or rotated in 90-degree steps to generate flipped/rotated versions of each partial image. Discriminators of predetermined types are then applied to these images to determine whether each one is a face image with a predetermined face orientation and up/down direction. In this way, the combination of discriminator type and input image type determines whether a face in any of various orientations and up/down directions is present.

Patent Literature 1: JP 2006-350704 A

However, the technique of Patent Literature 1 merely applies geometric transformations such as flipping and rotation to the input image, so essentially the same image is used repeatedly for training. As a result, it cannot provide enough teacher data for highly accurate image recognition and situation prediction.

Accordingly, an object of the present invention is to provide a technique that can efficiently generate, from original data, teacher data that is particularly suited to image analysis.

After intensive study of ways to overcome these problems of the prior art, the inventor found that many variations of teacher data (teacher images) can be generated easily and efficiently from a single input image by extracting image-specific components, such as the shape and appearance factors of the objects it contains, from the input image serving as the original data, and then reconstructing the input image using other image-specific components obtained by changing them. Based on this finding, the inventor conceived that the problems of the prior art could be solved, and completed the present invention. Specifically, the present invention has the following configurations and steps.

A first aspect of the present invention relates to a teacher data generation device that generates teacher data used for machine learning. The teacher data generation device includes a database, a change unit, and a reconstruction unit. The database stores image-specific components extracted from an input image. Here, the image-specific components include at least one of the shape of an object and other appearance factors; for example, in addition to object shape, they include parameters relating to lighting, texture, and the camera. The change unit changes the image-specific components stored in the database to generate one or more types of other image-specific components. The reconstruction unit uses the other image-specific components created by the change unit to generate a reconstructed image that corresponds at least partially to the input image. The teacher data generation device of the present invention then applies the resulting reconstructed images to machine learning. The machine learning technique is not particularly limited; known techniques such as neural networks, discriminant analysis, logistic regression, genetic programming, inductive logic programming, support vector machines, and clustering may be used.
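The database / change unit / reconstruction unit flow of the first aspect can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `ImageComponents` fields, the perturbation ranges, and the functions `change` and `reconstruct` are all hypothetical, and `reconstruct` only returns render settings where a real system would render an image.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ImageComponents:
    """Hypothetical record of the image-specific components of one input image."""
    shape: str                # identifier of the object's 3D shape model
    texture: str              # identifier of the surface texture
    light_azimuth_deg: float  # direction of the main light source
    camera_fov_deg: float     # camera field of view

def change(comps, n, rng):
    """Change unit: derive n other component sets, here by perturbing only lighting and camera."""
    variants = []
    for _ in range(n):
        azimuth = (comps.light_azimuth_deg + rng.uniform(-90.0, 90.0)) % 360.0
        fov = min(90.0, max(30.0, comps.camera_fov_deg + rng.uniform(-15.0, 15.0)))
        variants.append(replace(comps, light_azimuth_deg=azimuth, camera_fov_deg=fov))
    return variants

def reconstruct(comps):
    """Reconstruction unit stub: a real system would render an image; this returns render settings."""
    return {"shape": comps.shape, "texture": comps.texture,
            "light_azimuth_deg": comps.light_azimuth_deg,
            "camera_fov_deg": comps.camera_fov_deg}

# A plain dict stands in for the database: input-image id -> extracted components.
database = {"input_10": ImageComponents("car_model", "red_paint", 45.0, 60.0)}
teacher_data = [reconstruct(c) for c in change(database["input_10"], 5, random.Random(0))]
```

The shape and texture are held fixed so that every variant still depicts the same object, which mirrors the idea that reconstructed images keep the input's objects while their appearance factors vary.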

As an example, a texture image-specific component such as "the surface pattern with the influence of the light source removed" is extracted from the input image and stored in the database. In this example, the light source parameters and the surface pattern (texture) parameters are thus separated and stored in the database as distinct image-specific components. The change unit then illuminates this image-specific component (the surface pattern without light-source influence) using techniques such as 3D graphics, varying the light position and the type of light source to generate other types of image-specific components and steadily increase their variety. Using the other image-specific components obtained in this way, the reconstruction unit generates reconstructed images that contain the same objects as the input image but differ in their image-specific components (lighting environment and rendering angle). A virtually unlimited number of reconstructed images can be generated. Reconstructed images obtained this way are essentially different from images produced merely by geometrically transforming the input image or by adjusting its brightness or contrast.
Because reconstructed images contain the same objects as the input image while representing those objects in different lighting environments or rendered from different angles, they represent a range of variations of the input image. Such reconstructed images can therefore be used effectively as teacher data for image analysis.
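The separation of a lighting-free surface pattern and re-illumination with new lights can be illustrated with a simple shading model. The patent does not say how shading and texture are separated; this sketch assumes a Lambertian (purely diffuse) surface with known per-pixel normals, a standard simplification of intrinsic image decomposition, and all function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]

def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)

def shading(normal: Vec3, light_dir: Vec3, intensity: float = 1.0) -> float:
    """Lambertian diffuse term: intensity * max(0, n . l)."""
    n, l = normalize(normal), normalize(light_dir)
    return intensity * max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2])

def separate_albedo(image, normals, light_dir, eps=1e-6):
    """Divide out the known shading to recover the lighting-free surface pattern (albedo)."""
    albedo = []
    for img_row, n_row in zip(image, normals):
        row = []
        for pixel, n in zip(img_row, n_row):
            s = shading(n, light_dir)
            row.append(pixel / s if s > eps else 0.0)  # unlit pixels carry no albedo information
        albedo.append(row)
    return albedo

def relight(albedo, normals, light_dir, intensity=1.0):
    """Recompose an image as albedo * shading under a new light."""
    return [[a * shading(n, light_dir, intensity) for a, n in zip(a_row, n_row)]
            for a_row, n_row in zip(albedo, normals)]

normals = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]   # one row, two pixels
observed = [[0.8, 0.5 / math.sqrt(2.0)]]          # photographed with light from +z
albedo = separate_albedo(observed, normals, (0.0, 0.0, 1.0))
variants = [relight(albedo, normals, d) for d in [(1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]]
```

Each new light direction yields a new image of the same surface pattern, which is the kind of variation the paragraph above describes; a real renderer would add specular terms, shadows, and colored lights.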

The teacher data generation device according to the present invention preferably further includes an analysis unit. The analysis unit extracts from the input image, as image-specific components, one or more of the shape of a 3D object, the light source, and information about the device used to capture the image, and registers them in the database. The teacher data generation device may also include a filtering unit that performs image filtering to simplify the input image. In this case, the analysis unit extracts the image-specific components from the filtered input image. Simplifying the input image with the filtering unit in this way reduces the load of the analysis unit's extraction processing and speeds it up.

The teacher data generation device according to the present invention may further include a comparison unit. The comparison unit compares a first feature pattern, output by machine learning that uses a first reconstructed image generated from a first input image, with a second feature pattern, output by machine learning that uses a second reconstructed image generated from a second input image. In this case, the change unit changes the image-specific components based on the comparison result. For example, if the first and second feature patterns are too far apart in feature space (the space in which feature quantities are represented as vectors), the change unit adjusts the amount by which it changes the image-specific components so that the two patterns move closer together; conversely, if they are too close in feature space, it adjusts the change amount so that they move farther apart. This makes it possible to efficiently create teacher data suited to machine learning from a single input image.
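The comparison-driven adjustment can be sketched as a feedback loop on a single "change amount". The patent does not specify the distance measure, the target band, or how the change amount maps to feature distance; this sketch assumes Euclidean distance, a hypothetical band `[d_min, d_max]`, and (also an assumption) that larger changes push the two feature patterns further apart. `toy_features` is a stand-in for training on reconstructed images and reading out a feature pattern.

```python
import math

def adjust_change_amount(amount, f1, f2, d_min, d_max, step=0.5):
    """Comparison-driven update of how strongly the change unit perturbs components.

    Assumes (hypothetically) that a larger change amount moves the feature patterns further apart.
    """
    d = math.dist(f1, f2)  # Euclidean distance in feature space
    if d > d_max:
        return amount * (1.0 - step)  # too far apart: change components less
    if d < d_min:
        return amount * (1.0 + step)  # too close: change components more
    return amount

def tune(amount, features_for, f_ref, d_min, d_max, max_iter=50):
    """Repeat compare-and-adjust until the distance lands in [d_min, d_max] or max_iter is hit."""
    for _ in range(max_iter):
        new_amount = adjust_change_amount(amount, f_ref, features_for(amount), d_min, d_max)
        if new_amount == amount:
            break
        amount = new_amount
    return amount

def toy_features(amount):
    """Hypothetical feature pattern whose distance from the origin grows linearly with the change amount."""
    return (3.0 * amount, 4.0 * amount)

result = tune(1.0, toy_features, (0.0, 0.0), d_min=1.0, d_max=2.0)
```

Starting from an amount of 1.0 (distance 5.0), the loop halves the amount twice and stops at 0.25, where the distance of 1.25 falls inside the target band.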

A second aspect of the present invention relates to a computer program. The computer program of the present invention causes a computer to function as the teacher data generation device according to the first aspect. The program may be downloadable via the Internet or recorded on a recording medium.

A third aspect of the present invention relates to a machine learning system. The machine learning system of the present invention includes the teacher data generation device according to the first aspect and a learning device that performs machine learning using the reconstructed images generated by the teacher data generation device.

A fourth aspect of the present invention relates to a method for generating teacher data used for machine learning. The method is executed by a computer. First, one or more image-specific components (the shape of an object and/or other appearance factors) extracted from an input image are stored in a database (storage step). Next, the image-specific components stored in the database are changed to generate one or more types of other image-specific components (change step). Then, the other image-specific components obtained in this way are used to generate a reconstructed image corresponding at least partially to the input image (reconstruction step). At least the reconstructed image is used as teacher data for machine learning.

A fifth aspect of the present invention relates to a method for generating a learned model. The method is executed by a computer and generates a learned model by performing machine learning using teacher data generated by the method according to the fourth aspect.

A sixth aspect of the present invention relates to a method for using a learned model. The method is executed by a computer: a learned model is generated by performing machine learning using teacher data generated by the method according to the fourth aspect, and the resulting learned model is used to perform image recognition, situation prediction, or both. Image recognition and situation prediction using a learned model are well known.

According to the present invention, teacher data particularly suited to image analysis can be generated efficiently from original data.

FIG. 1 shows an example of the overall configuration of a machine learning system.
FIG. 2 shows an example of input image analysis processing by the teacher data generation device.
FIG. 3 shows details of the input image analysis processing.
FIG. 4 shows another example of input image analysis processing by the teacher data generation device.
FIG. 5 shows an example of image reconstruction processing by the teacher data generation device.
FIG. 6 shows another example of image reconstruction processing by the teacher data generation device.
FIG. 7 shows yet another example of image reconstruction processing by the teacher data generation device.
FIG. 8 shows an example of image-specific component extraction processing.
FIG. 9 shows an example of image-specific component change processing.
FIG. 10 shows another example of image-specific component change processing.
FIG. 11 shows an example of input image reconstruction processing.

Embodiments for carrying out the present invention are described below with reference to the drawings. The present invention is not limited to the embodiments described below and also covers modifications that a person skilled in the art could obviously make to them.

FIG. 1 is a block diagram showing the overall configuration of a machine learning system 100 according to the present invention. As shown in FIG. 1, the machine learning system 100 includes a teacher data generation device 20 and an AI learning device 30. When an input image 10 is input, the teacher data generation device 20 generates multiple items of teacher data based on it and provides them to the AI learning device 30. The AI learning device 30 performs machine learning using the input image 10 and the teacher data provided by the teacher data generation device 20. As a result of machine learning, the AI learning device 30 can output a learned model 40 in which the network structure representing the features extracted from the input images and the weight of each link have been determined. Applying an image to be classified to the learned model 40 makes it possible to perform image recognition or situation prediction on that image. In the present invention, known devices and processing methods can be used as appropriate for the AI learning device 30 and for image classification with the learned model 40. The present invention is characterized by the teacher data generation device 20, which generates the teacher data that the AI learning device 30 uses for machine learning; the teacher data generation device 20 is therefore described in detail below.

The teacher data generation device 20 of the present invention aims to provide enough teacher data to train AI when only a few types or a small number of input images are available. Preparing a sufficient amount of teacher data is essential for effective AI training. For example, identifying automobiles with AI requires a large number of images of many vehicle models, captured under different lighting conditions, in different colors, and by cameras at different positions, or displayed at different image qualities; preparing images of every possible situation as teacher data is extremely difficult. The teacher data generation device 20 is therefore designed to make it easy to create new images representing a variety of situations. That is, as shown in FIG. 1, when given a single input image, the teacher data generation device 20 generates multiple new images (reconstructed images) and applies them to the AI learning device.

The teacher data generation device 20 according to the present invention is implemented by a computer and basically comprises a control/arithmetic unit and a storage device. The control/arithmetic unit can be a processor such as a CPU or GPU. It reads a program stored in the storage device, performs predetermined image processing and arithmetic processing according to that program, and writes results to and reads them from the storage device as appropriate. The figures of this application show various functional blocks implemented by the control/arithmetic unit. The storage function of the storage device can be implemented by nonvolatile memory such as an HDD or SSD. The storage device may also serve as working memory for writing and reading intermediate results of processing by the control/arithmetic unit; this memory function can be implemented by volatile memory such as RAM or DRAM. The storage device can be used as the database 22 described later.

FIG. 2 shows the process of analyzing an input image, extracting predetermined image-specific components from it, and registering them in the database 22. As shown in FIG. 2, the teacher data generation device 20 includes an analysis unit 21 and a database 22. The analysis unit 21 extracts image-specific components from the input image and registers them in the database 22.

FIG. 3 conceptually shows the analysis processing performed by the analysis unit 21. When an input image is received, the analysis unit 21 extracts the image-specific components it contains. As shown in FIG. 3, for example, the image-specific components include shape elements 21a of the objects in the input image and other appearance factors 21b. When the input image contains multiple objects, the analysis unit 21 extracts shape elements for each of them. Examples of shape elements are an object's size, form, contour, and type. Extracting an object's shape elements means, for example, converting the object into a bone model. Even when the input image is a photograph, if it contains an image of a person, that person can be modeled and various information about the model stored in the database 22, so that the object's shape elements are extracted from the input image as image-specific components. For example, a scene depicted in a real image can be modeled with computer graphics techniques so closely that the model approaches the input image (e.g., by photorealistic rendering).
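The patent mentions bone models without giving a format. As a minimal sketch under that caveat, a planar chain of bones with relative joint angles shows how a stored skeleton can be re-posed just by changing angles, which is the kind of shape change the change unit 25 applies. The function name and the two-bone arm are hypothetical; real systems use 3D skeletons with rotation matrices or quaternions.

```python
import math

def forward_kinematics(lengths, angles_deg, root=(0.0, 0.0)):
    """Joint positions of a planar bone chain; each angle is measured relative to the parent bone."""
    x, y = root
    heading = 0.0
    joints = [(x, y)]
    for length, angle in zip(lengths, angles_deg):
        heading += math.radians(angle)  # accumulate rotation down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        joints.append((x, y))
    return joints

# Hypothetical two-bone arm (upper arm, forearm): a straight pose and a bent pose.
straight = forward_kinematics([1.0, 1.0], [0.0, 0.0])
bent = forward_kinematics([1.0, 1.0], [90.0, -90.0])
```

Bone lengths (the shape) stay fixed while the joint angles (the pose) vary, so each new angle set yields a different but consistent posture of the same object.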

The other appearance factors 21b include lighting information (light source information), texture information, and camera information contained in the input image. Lighting information includes the type and intensity of illumination, albedo, reflectance, shadows, light source position, and the like. Texture information includes texture coordinates, color information, and the like. Camera information includes the resolution, image quality, noise, angle of view, and the like of the camera used to capture the input image. Image-specific components relating to appearance factors are thus extracted from the input image and stored in the database 22. Extracting image-specific components in this way makes it possible, for example, to separate the shading component (lighting information) from the color component (texture information), so that an object consisting only of its color component, free of shading, can be obtained. The analysis unit 21 classifies each image-specific component extracted from the input image by type and registers it in the database 22.
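Registering components "classified by type" amounts to a store keyed by component category. The sketch below assumes four categories taken from the text (shape, lighting, texture, camera); the class names, field names, and parameter payloads are hypothetical, since the patent does not define a schema for database 22.

```python
from collections import defaultdict
from dataclasses import dataclass, field

KINDS = ("shape", "lighting", "texture", "camera")  # assumed category names

@dataclass
class Component:
    kind: str                                   # one of KINDS
    source_image: str                           # id of the input image it was extracted from
    params: dict = field(default_factory=dict)  # hypothetical parameter payload

class ComponentDatabase:
    """Stores extracted components grouped by type."""

    def __init__(self):
        self._by_kind = defaultdict(list)

    def register(self, comp):
        if comp.kind not in KINDS:
            raise ValueError(f"unknown component kind: {comp.kind!r}")
        self._by_kind[comp.kind].append(comp)

    def of_kind(self, kind):
        return list(self._by_kind[kind])

db = ComponentDatabase()
db.register(Component("lighting", "input_10", {"azimuth_deg": 45.0, "intensity": 0.9}))
db.register(Component("texture", "input_10", {"albedo_map": "car_paint"}))
db.register(Component("camera", "input_10", {"fov_deg": 60.0, "noise_sigma": 0.01}))
```

Grouping by type lets the change unit later pull, say, every stored lighting component at once and recombine it with shapes and textures from other images.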

As described above, the image-specific components that can be extracted from a given input image fall into object shape elements 21a and other appearance factors 21b. The shape elements 21a describe the input image as a set of components, each defined as a 3D object. For example, an image of a scene with a car on a road can be decomposed into the car itself, the road, and the car's surroundings. The appropriate level of scene decomposition depends on what the AI must learn, so the scene should be decomposed more finely when more detailed learning is required. For example, if the AI learning device is meant to produce a learned model that identifies vehicle types, extracting a single 3D object representing the car is sufficient. If instead it is meant to produce a learned model that detects tire behavior or abnormalities, the car is extracted as a set of 3D objects making up its group of parts, including the tires. The 3D objects that define a scene can be modeled manually, taken from an existing library, or extracted automatically if the input image itself contains enough information. For example, 3D reconstruction is possible when the input image data includes depth information; likewise, 3D reconstruction is possible when the image is a photograph of an existing car or monument. The other appearance factors 21b, on the other hand, capture the visual appearance of the image and the physical environmental factors that produced it. One example of a physical environmental factor is illumination, such as sunlight or office lights. The illumination information in the input image can be decomposed into interdependent parameters, including the diffusivity of the illumination, albedo, reflectance, shadows, light source intensity and position, and atmospheric conditions such as fog and rain. Another physical environmental factor is the device used to capture the image, such as a camera, which can be broken down into elements such as its position relative to the scene, field of view, resolution, noise, and quality.
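The task-dependent decomposition level can be pictured as cutting a scene tree at a chosen depth. This is an illustrative sketch only: the `SceneNode` tree, the node names, and the idea of a numeric depth are assumptions, not structures defined in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    name: str
    children: list["SceneNode"] = field(default_factory=list)

def decompose(node, depth):
    """Return the 3D objects that represent `node` when the scene is split `depth` levels deep."""
    if depth == 0 or not node.children:
        return [node.name]
    parts = []
    for child in node.children:
        parts.extend(decompose(child, depth - 1))
    return parts

car = SceneNode("car", [SceneNode("body"), SceneNode("tire_front"), SceneNode("tire_rear")])
scene = SceneNode("scene", [car, SceneNode("road"), SceneNode("surroundings")])
```

At depth 1 the car stays a single object, which would suit vehicle-type classification; at depth 2 it splits into body and tires, the finer level a tire-abnormality task would need.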

FIG. 4 shows another example of the process of extracting image-specific components. As shown in FIG. 4, the teacher data generation device 20 further includes a filtering unit 23 in addition to the analysis unit 21 and the database 22. Before the analysis unit 21 analyzes the input image, the filtering unit 23 performs image filtering to simplify it. Depending on the type of input image, extracting image-specific components or building models with existing computer graphics techniques can be computationally expensive. For example, realistic modeling of hair is still considered difficult even with current computer graphics, and the resulting long rendering times could reduce the efficiency of generating teacher data for AI training. Likewise, a photograph used as the input image contains a large number of objects, making it difficult to extract image-specific components accurately from all of them. In the example shown in FIG. 4, therefore, a filtering process that simplifies the input image is performed before image-specific components are extracted from it.

Filtering reduces the amount of detail in the original input image while preserving the relevant information. Examples include lowering the image resolution, adjusting the contrast, brightness, and saturation of the image, and removing unnecessary components from the input image. Filtering also includes, for example, the Kuwahara filter (an edge-preserving smoothing filter), the bilateral filter (which blurs neighboring regions while preserving edges), and non-photorealistic rendering (NPR). These filters greatly reduce the detail in the input image while retaining relevant information, such as stroke edges and the flat colors used in animation-style rendering, which speeds up subsequent processing.
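The Kuwahara filter named above can be sketched in a few lines. This is a textbook grayscale version written for illustration, not code from the patent: for each pixel it examines the four overlapping (r+1)×(r+1) quadrants that have the pixel as a corner and outputs the mean of the quadrant with the lowest variance, which smooths flat regions while keeping sharp edges.

```python
def kuwahara(img, r=1):
    """Edge-preserving smoothing of a 2D grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_mean, best_var = 0.0, float("inf")
            # The four quadrants all contain (y, x); clip them at the image border.
            for y0, y1 in ((y - r, y), (y, y + r)):
                for x0, x1 in ((x - r, x), (x, x + r)):
                    vals = [img[j][i]
                            for j in range(max(y0, 0), min(y1, h - 1) + 1)
                            for i in range(max(x0, 0), min(x1, w - 1) + 1)]
                    mean = sum(vals) / len(vals)
                    var = sum((v - mean) ** 2 for v in vals) / len(vals)
                    if var < best_var:  # keep the most homogeneous quadrant
                        best_mean, best_var = mean, var
            out[y][x] = best_mean
    return out

step_edge = [[0, 0, 10, 10] for _ in range(4)]
noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
```

On `step_edge` the output equals the input, because every pixel has at least one quadrant lying entirely on its own side of the edge; on `noisy` the isolated spike at the center is reduced from 9 to 2.25. The pure-Python loops are slow and only meant to show the logic.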

図５は，入力画像から多種の再構成画像を生成する処理の一例を示している。図５に示されるように，教師データ生成装置２０は，上記の工程を経て構築したデータベース２２に加えて，抽出部２４，変更部２５，及び再構成部２６をさらに含む。入力画像が与えられると，抽出部２４による処理が１回実行され，変更部２５及び再構成部２６による処理が複数回繰り返し実行され，複数の再構成画像が作成される。再構成画像はＡＩ学習装置３０において使用される。 FIG. 5 shows an example of processing for generating various reconstructed images from an input image. As shown in FIG. 5, the teacher data generation device 20 includes an extraction unit 24, a changing unit 25, and a reconstruction unit 26, in addition to the database 22 built through the steps above. When an input image is given, the extraction unit 24 runs once. The changing unit 25 and the reconstruction unit 26 then run repeatedly, producing a plurality of reconstructed images. The reconstructed images are used by the AI learning device 30.
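The control flow of FIG. 5 runs extraction once and then repeats change and reconstruction. It can be sketched as follows. Everything here is hypothetical:

- `Components`, `extract`, `change`, and `reconstruct` are placeholder names.
- The database is a plain dictionary.
- The "reconstructed image" is a descriptive string standing in for a rendered image.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Components:
    """Hypothetical bundle of image-specific components for one input image."""
    shape_id: str       # key of a 3D shape model in database 22
    light_angle: float  # illumination angle in degrees
    camera_yaw: float   # camera yaw around the object in degrees

def extract(database: dict[str, Components], image_key: str) -> Components:
    """Stand-in for extraction unit 24: look up stored components."""
    return database[image_key]

def change(c: Components, rng: random.Random) -> Components:
    """Stand-in for changing unit 25: perturb the appearance factors."""
    return replace(c,
                   light_angle=c.light_angle + rng.uniform(-30.0, 30.0),
                   camera_yaw=(c.camera_yaw + rng.uniform(0.0, 360.0)) % 360.0)

def reconstruct(c: Components) -> str:
    """Stand-in for reconstruction unit 26 (a renderer in practice)."""
    return f"{c.shape_id}|light={c.light_angle:.1f}|yaw={c.camera_yaw:.1f}"

def generate(database: dict[str, Components], image_key: str,
             n: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    base = extract(database, image_key)     # executed once per input image
    return [reconstruct(change(base, rng))  # change + reconstruct, n times
            for _ in range(n)]
```

Seeding the random generator keeps each batch of variants reproducible, which helps when comparing models trained on different batches.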

抽出部２４は，データベース２２から所定の画像固有成分を抽出（検索）するための要素である。抽出部２４は，教師データの元となる入力画像１０が入力されると，この入力画像１０に関連する画像固有成分をデータベース２２から抽出（検索）する。例えば，抽出部２４は，上記した解析部２１と同様の処理により入力画像から画像固有成分を抽出し，各画像固有成分と同様の又は対応した画像固有成分をデータベース２２から抽出（検索）すればよい。 The extraction unit 24 extracts (retrieves) predetermined image-specific components from the database 22. When the input image 10 that serves as the source of the teacher data is input, the extraction unit 24 retrieves the image-specific components related to the input image 10 from the database 22. For example, the extraction unit 24 may extract image-specific components from the input image by the same processing as the analysis unit 21 described above. It then retrieves from the database 22 the image-specific components that are similar or correspond to each of them.
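One way to retrieve "similar or corresponding" components is a nearest-neighbour search over descriptor vectors. The description does not specify a matching criterion. The descriptor vectors, the use of cosine similarity, and the names `cosine` and `search` below are therefore all assumptions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(database: dict[str, list[float]], query: list[float],
           k: int = 1) -> list[str]:
    """Return the keys of the k stored descriptors most similar to the query."""
    ranked = sorted(database, key=lambda key: cosine(database[key], query),
                    reverse=True)
    return ranked[:k]
```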

変更部２５は，抽出部２４によって抽出された画像固有成分を変更して，一又は複数種類の別の画像固有成分を生成する。画像固有成分の変更には，例えば入力画像に含まれるオブジェクトの形状要素を変形させることが含まれる。前処理（図４等参照）において，オブジェクトは３Ｄモデル化されているため，そのモデルを動かしてオブジェクトの形や姿勢を変形させればよい。また，画像固有成分の変更には，３Ｄモデルに当たる照明条件を変化させたり，３Ｄモデルを撮影しているカメラの位置を変化させたりすることが含まれる。例えば，３Ｄモデル表面のテクスチャパターンを別のパターンに再構築したり，あるいはこの再構築したテクスチャパターンを貼り付けた３Ｄモデルに対して仮想空間内で照明を当てることにより，画像固有成分を変更する。また，仮想空間内では，３Ｄモデルに当てる照明の色調を変えたり，３Ｄモデルに当てる照明の角度を変えたり，３Ｄモデルを撮影するカメラの角度を変えたりすることもできる。このように，入力画像は，画像固有成分に分離，具体的には入力画像に含まれるオブジェクトの形状モデルと，その他のテクスチャや照明条件，カメラ条件などの外観要因に分離されており，その上で，変更部２５は，形状モデル自体を変形させたり，あるいはその他の外観要因を様々なシーンを想定したものに変化させたりする処理を行う。このようにして，変更部２５は，元の入力画像の画像固有成分に対応する，一又は複数種類の別の画像固有成分を生成していく。 The changing unit 25 changes the image-specific components extracted by the extraction unit 24 to generate one or more types of other image-specific components. Changing an image-specific component includes, for example, deforming a shape element of an object contained in the input image. Because the object has already been converted into a 3D model during preprocessing (see FIG. 4, etc.), its shape and pose can be changed by manipulating that model. Changing image-specific components also includes changing the illumination conditions applied to the 3D model and changing the position of the camera that captures it. For example, the image-specific components can be changed by reconstructing the surface texture pattern of the 3D model into a different pattern. They can also be changed by illuminating, in a virtual space, the 3D model onto which the reconstructed texture pattern has been mapped. In the virtual space, it is also possible to change the color tone of the illumination applied to the 3D model, the angle of that illumination, or the angle of the camera that captures the 3D model. In this way, the input image is separated into image-specific components: specifically, the shape model of each object in the input image, and other appearance factors such as texture, lighting conditions, and camera conditions. The changing unit 25 then deforms the shape model itself, or changes the other appearance factors to ones that assume various scenes. Thus, the changing unit 25 generates one or more types of other image-specific components corresponding to the image-specific components of the original input image.

例えば入力画像に自動車が含まれる場合，次の変更処理を行うことができる。例えば，自動車を写すカメラ位置を変更して様々な角度（上下左右斜め方向，遠近の調整など）から見ることができる。また，照明条件は，例えば，光の位置，輝度の色，光が太陽の場合には時間などを変化させることで，様々な方法で修正することができる。自動車モデルの形態要素については，異なる形状を持つ異なるメーカーの異なる自動車を使用することもでき，また自動車の構成物品（カメラ，照明など）を変更することもできる。 For example, when the input image contains a car, the following changes can be made. The position of the camera that captures the car can be changed so that the car is viewed from various angles (from above, below, left, right, or diagonally, with adjusted distance, etc.). The illumination conditions can be modified in various ways, for example by changing the position of the light, its color and brightness, or, when the light source is the sun, the time of day. As for the shape elements of the car model, different cars from different manufacturers with different shapes can be used. The car's component parts (cameras, lights, etc.) can also be changed.
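Varying the camera position around the car amounts to sampling viewpoints on spheres centred on the object. The sketch below is illustrative only. It assumes a y-up coordinate system, angles in degrees, and a hypothetical helper name `orbit_cameras`. The rendering step that would consume these positions is not shown.

```python
import math
from itertools import product

def orbit_cameras(target, distances, azimuths_deg, elevations_deg):
    """Camera positions on spheres around `target` (y is the up axis).

    Every combination of distance, azimuth and elevation yields one
    position; each camera is assumed to be aimed at the target.
    """
    tx, ty, tz = target
    cams = []
    for d, az, el in product(distances, azimuths_deg, elevations_deg):
        a, e = math.radians(az), math.radians(el)
        # Spherical-to-Cartesian conversion around the target.
        cams.append((tx + d * math.cos(e) * math.sin(a),
                     ty + d * math.sin(e),
                     tz + d * math.cos(e) * math.cos(a)))
    return cams
```

Two distances, four azimuths, and two elevations give 16 viewpoints, each at the requested distance from the target.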

再構成部２６は，上記変更後の画像固有成分を用いて，少なくとも部分的に元の入力画像に対応する再構成画像を生成する。つまり，再構成部２６は，変更後の画像固有成分を再度組み合わせたり，あるいはその組み合わせ方を変えたりすることによって，大量の再構成画像を生成することができる。このように，再構成部２６は，変更後の画像固有成分を最終的に合わせこんで，元画像とは別のバリエーションとなる再構成画像を生成する。 The reconstruction unit 26 uses the changed image-specific components to generate a reconstructed image that corresponds at least partially to the original input image. That is, the reconstruction unit 26 can generate a large number of reconstructed images by recombining the changed image-specific components or by varying how they are combined. In this way, the reconstruction unit 26 assembles the changed image-specific components into reconstructed images that are variations distinct from the original image.
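Recombining changed components can be pictured as enumerating combinations of the variants produced by the changing unit. The sketch assumes each component type is a simple list of labels and uses `itertools.product`. The function name `recombine` and the label values are hypothetical.

```python
from itertools import product

def recombine(shapes, textures, lights, cameras):
    """Enumerate every combination of changed components; each dict describes
    one scene that a renderer would turn into a reconstructed image."""
    return [{"shape": s, "texture": t, "light": l, "camera": c}
            for s, t, l, c in product(shapes, textures, lights, cameras)]
```

Take 2 shapes, 3 textures, 2 lighting setups, and 2 cameras. This yields 2 x 3 x 2 x 2 = 24 scene descriptions, which shows how quickly recombination multiplies the available teacher data.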

図５に示されるように，再構成部２６によって生成された複数の再構成画像は，ＡＩ学習装置３０に提供される。また，これらの再構成画像とあわせて，元の入力画像をＡＩ学習装置３０に入力することとしてもよい。ＡＩ学習装置３０は，多種の再構成画像を教師データとして機械学習を実施し，特徴量を表すネットワークの構造と各リンクの重み付けが決定された学習済みモデル４０を出力する。このように，教師データとして利用できる入力画像の量や種類が乏しい場合であっても，教師データ生成装置２０を利用すれば，入力画像から多数の再構成画像を得ることができるため，教師データの量や種類を増やすことができる。 As shown in FIG. 5, the reconstructed images generated by the reconstruction unit 26 are provided to the AI learning device 30. The original input image may also be input to the AI learning device 30 together with these reconstructed images. The AI learning device 30 performs machine learning using the various reconstructed images as teacher data. It outputs a learned model 40 in which the network structure representing the features and the weight of each link have been determined. Thus, even when the input images available as teacher data are few in number or kind, the teacher data generation device 20 can derive many reconstructed images from them. This increases both the quantity and the variety of teacher data.

図６は，再構成画像を生成する処理の別の例を示している。図６に示されるように，教師データ生成装置２０は，フィルタリング部２７をさらに含む。このフィルタリング部２７は，前述したフィルタリング部２３と同様に，入力画像を単純化するフィルタリング処理を行う。入力画像は，教師データ生成装置２０の抽出部２４やＡＩ学習装置３０には直接入力されず，フィルタリング部２７によって単純化されたものが抽出部２４やＡＩ学習装置３０にそれぞれ入力される。 FIG. 6 shows another example of processing for generating reconstructed images. As shown in FIG. 6, the teacher data generation device 20 further includes a filtering unit 27. Like the filtering unit 23 described above, the filtering unit 27 performs filtering that simplifies the input image. The input image is not fed directly to the extraction unit 24 of the teacher data generation device 20 or to the AI learning device 30. Instead, the version simplified by the filtering unit 27 is input to each of them.

図７は，再構成画像を生成する処理のさらに別の例を示している。図７に示した例において，教師データ生成装置２０は，比較部２８をさらに含む。例えば，第１の入力画像に基づいて第１の再構成画像が生成され，この第１の再構成画像を使用した機械学習により第１の特徴パターンが出力される。同様に，第２の入力画像に基づいて第２の再構成画像が生成され，この第２の再構成画像を使用した機械学習により第２の特徴パターンが出力される。このとき，例えば，比較部２８は，上記第１の特徴パターンと第２の特徴パターンとを比較して，両者が十分に異なるパターンであるか否かを判断する。比較部２８において第１の特徴パターンと第２の特徴パターンとの差が十分ではないと判断された場合，変更部２５は，第２の特徴パターンの学習に利用された第２の再構成画像を生成するための画像固有成分の変更量を変化させる。具体的には，変更部２５は，第１の特徴パターンと第２の特徴パターンの差が十分に大きくなるように，第２の再構成画像を生成するための画像固有成分の変更量を制御する。また，比較部２８は，上記第１の特徴パターンと第２の特徴パターンとを比較して，両者が離れすぎていないかどうかを判断してもよい。この場合，変更部２５は，第１の特徴パターンと第２の特徴パターンの差が許容範囲内に収まるように，第２の再構成画像を生成するための画像固有成分の変更量を制御する。このようにＡＩ学習装置３０の出力結果からのフィードバックを受けて，変更部２５における画像固有成分の変更量を制御することができる。これにより，機械学習に有効活用できる再構成画像（教師データ）を効率的に生成することができる。 FIG. 7 shows yet another example of processing for generating reconstructed images. In the example illustrated in FIG. 7, the teacher data generation device 20 further includes a comparison unit 28. For example, a first reconstructed image is generated based on a first input image, and machine learning using that image outputs a first feature pattern. Similarly, a second reconstructed image is generated based on a second input image, and machine learning using that image outputs a second feature pattern. The comparison unit 28 then compares the two feature patterns to determine whether they differ sufficiently. Suppose the comparison unit 28 determines that the difference is insufficient. The changing unit 25 then changes the amount by which it alters the image-specific components used to generate the second reconstructed image, that is, the image from which the second feature pattern was learned. Specifically, the changing unit 25 controls that change amount so that the difference between the first and second feature patterns becomes sufficiently large. The comparison unit 28 may also compare the two feature patterns to determine whether they are too far apart. In that case, the changing unit 25 controls the change amount so that the difference between them falls within an allowable range. In this way, the changing unit 25 can use feedback from the output of the AI learning device 30 to control how much it changes the image-specific components. This makes it possible to efficiently generate reconstructed images (teacher data) that are effective for machine learning.
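The feedback of FIG. 7 can be sketched as a simple controller. It scales the change amount up when the two feature patterns are too close and down when they are too far apart. Several pieces are assumptions:

- the Euclidean distance between feature vectors;
- the multiplicative step and the clamping bounds;
- all function names.

`distance_fn` is a hypothetical stand-in for the whole inner loop: generating the second reconstructed image, training on it, and measuring the distance of the resulting feature pattern from the first.

```python
import math

def feature_distance(p, q):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.dist(p, q)

def adjust_change_amount(amount, distance, d_min, d_max,
                         step=1.5, lo=1e-3, hi=10.0):
    """Return the next change amount for the changing unit."""
    if distance < d_min:      # patterns too similar: vary the components more
        amount *= step
    elif distance > d_max:    # patterns too far apart: vary them less
        amount /= step
    return min(max(amount, lo), hi)  # keep the amount within safe bounds

def tune(distance_fn, amount=1.0, d_min=0.5, d_max=2.0, max_iter=20):
    """Adjust until the distance lies in [d_min, d_max] or max_iter is hit."""
    for _ in range(max_iter):
        d = distance_fn(amount)
        if d_min <= d <= d_max:
            break
        amount = adjust_change_amount(amount, d, d_min, d_max)
    return amount
```

Consider a toy `distance_fn` where the distance grows linearly with the change amount. With it, `tune(lambda a: 0.2 * a)` settles at 3.375 after three increases.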

次に，教師データ生成装置２０の実施例について説明する。図８は，入力画像からの画像固有成分の抽出及びオブジェクトの３Ｄモデル化の例を示している。この例における入力画像は，自動車を運転している車内のシーンを示しており，そこには運転手と同乗者が含まれる。図８に示した例では，入力画像から運転手と自動車とを抽出している。運転手の位置や姿勢は，骨格を抽出したボーンモデルで推定できる。また，自動車の３Ｄモデルは，データベース２２に登録されている既存のライブラリから取得すればよい。また，自動車の３Ｄモデルには車内の構造も含まれており，車内の３Ｄモデルと入力画像とが類似するように，車内を写すカメラ位置を決定することができる。そして，運転手のボーンモデルと自動車の車内の３Ｄモデルを組み合わせることで，入力画像に近似したシーンを表す３Ｄモデルを仮想空間内に生成することができる。そして，このようにして得られた３Ｄモデルに対して，照明条件やカメラ条件に変更を加えることで，教師データ用の多様な再構成画像を得ることができる。教師データを利用してＡＩ学習を行う目的は，例えば運転者の行動を検出することにあり，例えば片手での運転，電話を使用しながらの運転，喫煙しながらの運転といった危険運転を外部から検出することである。このようなＡＩ目標の定義から，入力画像を例として使用して，自動車，運転手，屋外シーンの３Ｄオブジェクトを定義すればよい。 Next, an embodiment of the teacher data generation device 20 will be described. FIG. 8 shows an example of extracting image-specific components from an input image and converting objects into 3D models. The input image in this example shows the interior of a car being driven, including the driver and a passenger. In the example shown in FIG. 8, the driver and the car are extracted from the input image. The driver's position and posture can be estimated with a bone model obtained by extracting the skeleton. The 3D model of the car may be obtained from an existing library registered in the database 22. The 3D model of the car also includes the interior structure, so the camera position for capturing the interior can be chosen so that the interior model resembles the input image. Combining the driver's bone model with the 3D model of the car interior produces, in a virtual space, a 3D model of a scene that approximates the input image. Changing the illumination conditions and camera conditions for this 3D model then yields a wide variety of reconstructed images for teacher data. The purpose of AI learning with this teacher data is, for example, to detect the driver's behavior. One example is detecting dangerous driving from outside the vehicle, such as driving with one hand, driving while using a phone, or driving while smoking. Based on such a definition of the AI's goal, 3D objects for the car, the driver, and the outdoor scene can be defined using the input image as an example.

３Ｄ形状のパラメータの変更は，オブジェクトに応じて異なる方法で行われる。例えば，自動車の場合には，既存のライブラリから異なる車種を抽出したり，あるいは図９に示されるようにカメラの位置を変更することによって，画像固有成分の変更が行われる。他方で，人間の場合には，例えば図１０に示されるようにパラメトリックモデルを使用することができる。図１０では，上段の画像が体型の異なるモデルを示し，中間の画像は衣服スタイルの異なるモデルを示し，下の画像は異なる骨格アニメーションを示している。形状のパラメータ化は，元の形状またはデータが持つパラメータを１つ以上変更することによって，類似の形状またはデータを持つ異なるモデルを作成する。また，人間の形態の観点では，身体のサイズ（細身，標準，肥満）や，身長，頭の大きさ，腕，年齢などのパラメータを変更することもできる。スケルトンアニメーションについても同様に，初期パスと差異はあるもののこれに類似したパスになるように，パスを適宜変更することができる。図１０の下の画像では，様々なタイプのパスの変化を示している。図１０に示したスケルトンアニメーションは，いずれもランニング中の姿勢である点において共通しているが，それぞれアームバランスや，歩幅，元の立ち位置（スケルトンの中心骨格）のパラメータなどが微修正されていて，それぞれ似ているが異なる姿勢を取っていることがわかる。 The parameters of a 3D shape are changed in different ways depending on the object. For a car, for example, the image-specific components are changed by retrieving a different car model from an existing library, or by changing the camera position as shown in FIG. 9. For a human, on the other hand, a parametric model can be used, as shown in FIG. 10. In FIG. 10, the top row shows models with different body shapes, the middle row shows models with different clothing styles, and the bottom row shows different skeleton animations. Shape parameterization creates different models with similar shapes or data by changing one or more parameters of the original shape or data. For the human form, parameters such as body size (slim, standard, obese), height, head size, arms, and age can also be changed. Likewise, for skeleton animation, the path can be modified so that it differs from the initial path yet remains similar to it. The bottom row of FIG. 10 shows various types of path changes. All the skeleton animations in FIG. 10 share a running posture. However, parameters such as arm balance, stride, and the original standing position (the central bone of the skeleton) have been slightly modified, so each takes a similar but different posture.
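A parametric human model of the kind shown in FIG. 10 can be varied by perturbing each parameter slightly around a base value. The parameter set, the allowed ranges in `LIMITS`, and the relative perturbation `rel` below are illustrative assumptions, not values from the description.

```python
import random
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class BodyParams:
    """Hypothetical parameters of a parametric human model."""
    height_cm: float
    girth: float          # 0.0 = slim, 0.5 = standard, 1.0 = obese
    head_scale: float     # relative head size
    arm_length_cm: float
    stride_m: float       # skeleton-animation parameter for a running pose

# Illustrative plausibility ranges used for clamping (assumed values).
LIMITS = {"height_cm": (140.0, 200.0), "girth": (0.0, 1.0),
          "head_scale": (0.9, 1.1), "arm_length_cm": (50.0, 80.0),
          "stride_m": (0.8, 2.0)}

def vary(base: BodyParams, rel: float, rng: random.Random) -> BodyParams:
    """Scale every parameter by a random factor in [1 - rel, 1 + rel], then
    clamp it to LIMITS, giving a model similar to, but not identical to, base."""
    changes = {}
    for f in fields(base):
        value = getattr(base, f.name) * (1.0 + rng.uniform(-rel, rel))
        lo, hi = LIMITS[f.name]
        changes[f.name] = min(max(value, lo), hi)
    return replace(base, **changes)
```

Keeping `rel` small mirrors the running-pose examples. Each variant stays recognisably close to the base model while still adding diversity.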

図１１は，別の実施例を示している。図１１の例では，入力画像内のオブジェクトを３Ｄモデル化し，この３Ｄモデルに当たる照明の影響の排除をする前処理を行う。そして，照明の影響が排除された３Ｄモデルに対して別種の照明を当てることによって再構成画像を生成して，教師データの種類を増やすこととしている。具体的に説明すると，入力画像内の３Ｄモデルには，既に特定の照明が当てられている。そこで，解析部２１は，この３Ｄモデルを画像固有成分に分離し，例えばカラー成分（あるいはテクスチャ成分）とシェーディング成分とを抽出する。シェーディング成分は，照明の影響を受けて決定されたパラメータであるため，これを取り除いたカラー成分は，照明の影響が排除されたものであるといえる。これらの３Ｄモデルから抽出された画像固有成分は，データベース２２に蓄積される。 FIG. 11 shows another embodiment. In the example of FIG. 11, an object in the input image is converted into a 3D model, and preprocessing removes the influence of the illumination falling on that model. A reconstructed image is then generated by applying a different kind of illumination to the 3D model, which increases the variety of teacher data. Specifically, the 3D model in the input image is already lit by particular illumination. The analysis unit 21 therefore separates this 3D model into image-specific components, for example a color component (or texture component) and a shading component. The shading component is a parameter determined by the illumination. The color component that remains after removing it can therefore be regarded as free of the illumination's influence. The image-specific components extracted from these 3D models are stored in the database 22.
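Separating a colour component from a shading component can be illustrated with two common models: multiplicative image formation (observed = colour x shading) and Lambertian shading. This is an assumption made for the sketch; the description does not commit to a particular reflectance model. The sketch also presumes that the surface normal and the original light direction are already known from the 3D model. The function names are hypothetical.

```python
import math

def lambert_shading(normal, light_dir, ambient=0.1):
    """Scalar shading of a surface point: ambient term plus the Lambertian
    cosine term, assuming unit-length normal and light direction."""
    ndotl = math.fsum(n * l for n, l in zip(normal, light_dir))
    return ambient + max(0.0, ndotl)

def separate(observed_rgb, shading, eps=1e-6):
    """Split an observed colour into colour (albedo) and shading components
    under the model observed = albedo * shading."""
    albedo = tuple(c / max(shading, eps) for c in observed_rgb)
    return {"color": albedo, "shading": shading}
```

The returned dictionary stands in for one entry stored in the database. The colour part no longer depends on the original light, which is what allows the later relighting step.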

次いで，教師データの増量又は多様化が必要となったときに，抽出部２４はデータベース２２から入力画像１０に対応する画像固有成分を抽出する。前述したとおり，３Ｄモデルを構成する画像固有成分は，カラー成分とシェーディング成分とに分離されてデータベース２２に記録されている。抽出部２４は，照明条件を変更した教師データを生成する場合には，シェーディング成分を除いたカラー成分のみをデータベース２２から読み出す。変更部２５は，照明の影響が排除された３Ｄモデルに対して，仮想空間内にて照明条件（色調や角度）を変えた照明を当てることにより再ライティングの演算を行う。つまり，再ライティングによって，３Ｄモデルのカラー成分は変更されず，シェーディング成分のみが変更されることとなる。変更部２５は，このようにして入力画像に含まれる３Ｄモデルの画像固有成分の変更を行う。そして，再構成部２６は，変更された画像固有成分を用いて３Ｄモデルの再構成を行う。図１１の例では，元のカラー成分と，再ライティングによって変更されたシェーディング成分とを組み合わせて，照明条件が変更された再構成画像を生成する。照明条件の種類を変えることで，複数の再構成画像を生成することも容易である。例えば図１１の例において，再ライティング例１は周辺光を白色近くに変更した例であり，再ライティング例２は周辺光をオレンジ色近くに変更した例となっている。このようにして得られた再構成画像がＡＩ学習装置３０に提供されて，機械学習の教師データとして利用される。 Next, when the teacher data needs to be increased or diversified, the extraction unit 24 retrieves the image-specific components corresponding to the input image 10 from the database 22. As described above, the image-specific components of the 3D model are stored in the database 22 as separate color and shading components. When generating teacher data with changed illumination conditions, the extraction unit 24 reads only the color component, not the shading component, from the database 22. The changing unit 25 then performs a relighting computation. In the virtual space, it illuminates the 3D model, from which the influence of illumination has been removed, with light whose conditions (color tone and angle) have been changed. In other words, relighting leaves the color component of the 3D model unchanged and changes only the shading component. In this way, the changing unit 25 changes the image-specific components of the 3D model contained in the input image. The reconstruction unit 26 then reconstructs the 3D model using the changed image-specific components. In the example of FIG. 11, the original color component is combined with the shading component changed by relighting. The result is a reconstructed image with changed illumination conditions. Multiple reconstructed images can easily be generated by varying the illumination conditions. For example, in FIG. 11, relighting example 1 changes the ambient light to near-white, and relighting example 2 changes it to near-orange. The reconstructed images obtained in this way are provided to the AI learning device 30 and used as teacher data for machine learning.
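Relighting under the same assumptions (multiplicative colour x shading, Lambertian shading) keeps the stored colour component fixed and recomputes only the shading. The two ambient colours below are hypothetical values chosen to mimic the near-white and near-orange examples of FIG. 11. They are not taken from the figure. The function name `relight` is likewise illustrative.

```python
import math

def relight(albedo_rgb, normal, light_dir, light_rgb, ambient_rgb):
    """Recombine a stored colour component with a new shading component.

    The colour (albedo) is left untouched; only the shading changes:
    out = albedo * (ambient + light * max(0, n . l)), clipped to at most 1.
    """
    ndotl = max(0.0, math.fsum(n * l for n, l in zip(normal, light_dir)))
    return tuple(min(1.0, a * (amb + lc * ndotl))
                 for a, lc, amb in zip(albedo_rgb, light_rgb, ambient_rgb))

# Hypothetical ambient colours standing in for the two FIG. 11 examples.
NEAR_WHITE = (0.3, 0.3, 0.3)
NEAR_ORANGE = (0.3, 0.18, 0.05)
```

The same grey albedo yields an evenly lit result under `NEAR_WHITE`. Under `NEAR_ORANGE` it yields a warmer result whose red channel exceeds green, which in turn exceeds blue. Each lighting choice adds one more reconstructed image to the teacher data.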

以上，本願明細書では，本発明の内容を表現するために，図面を参照しながら本発明の実施形態の説明を行った。ただし，本発明は，上記実施形態に限定されるものではなく，本願明細書に記載された事項に基づいて当業者が自明な変更形態や改良形態を包含するものである。 As described above, embodiments of the present invention have been described in this specification with reference to the drawings in order to express the content of the invention. However, the present invention is not limited to the above embodiments. It encompasses modifications and improvements that are obvious to those skilled in the art based on the matters described in this specification.

１０…入力画像２０…教師データ生成装置
２１…解析部２２…データベース
２３…フィルタリング部２４…抽出部
２５…変更部２６…再構成部
２７…フィルタリング部２８…比較部
３０…ＡＩ学習装置４０…学習済みモデル DESCRIPTION OF SYMBOLS 10 ... Input image 20 ... Teacher data generation device 21 ... Analysis unit 22 ... Database 23 ... Filtering unit 24 ... Extraction unit 25 ... Changing unit 26 ... Reconstruction unit 27 ... Filtering unit 28 ... Comparison unit 30 ... AI learning device 40 ... Learned model


Claims

機械学習に用いられる教師データを生成する教師データ生成装置であって，
入力画像から抽出されたオブジェクトの形状及びその他の外観要因の少なくともいずれか１つ以上の画像固有成分を記憶するデータベースと，
前記データベースに記憶されている前記画像固有成分を変更して，一又は複数種類の別の画像固有成分を生成する変更部と，
前記別の画像固有成分を用いて，少なくとも部分的に前記入力画像に対応する再構成画像を生成する再構成部と，を含み，
少なくとも前記再構成画像を教師データとして機械学習に適用する
教師データ生成装置。 A teacher data generation device that generates teacher data used for machine learning, comprising:
a database that stores one or more image-specific components among the shape of an object extracted from an input image and other appearance factors;
a changing unit that changes the image-specific components stored in the database to generate one or more types of other image-specific components; and
a reconstruction unit that uses the other image-specific components to generate a reconstructed image corresponding at least partially to the input image,
wherein at least the reconstructed image is applied to machine learning as teacher data.

請求項１に記載の教師データ生成装置であって，
入力画像から，３Ｄオブジェクトの形状，光源，及び当該画像の撮影に用いられた装置に関する情報の１つ以上を前記画像固有成分として抽出し，前記データベースに登録する解析部をさらに含む
教師データ生成装置。 The teacher data generation device according to claim 1,
further comprising an analysis unit that extracts from the input image, as the image-specific components, one or more pieces of information on the shape of a 3D object, the light source, and the device used to capture the image, and registers them in the database.

請求項２に記載の教師データ生成装置であって，
前記入力画像を単純化する画像フィルタリングを行うフィルタリング部をさらに含み，
前記解析部は，前記画像フィルタリング後の入力画像から前記画像固有成分を抽出する
教師データ生成装置。 The teacher data generation device according to claim 2,
further comprising a filtering unit that performs image filtering to simplify the input image,
wherein the analysis unit extracts the image-specific components from the input image after the image filtering.

請求項１から請求項３のいずれかに記載の教師データ生成装置であって，
第１の入力画像に基づいて生成された第１の再構成画像を使用した機械学習により出力された第１の特徴パターンと，第２の入力画像に基づいて生成された第２の再構成画像を使用した機械学習により出力された第２の特徴パターンとを比較する比較部をさらに有し，
前記変更部は，前記比較部における比較結果に基づいて，前記画像固有成分を変更する
教師データ生成装置。 The teacher data generation device according to any one of claims 1 to 3,
further comprising a comparison unit that compares a first feature pattern, output by machine learning using a first reconstructed image generated based on a first input image, with a second feature pattern, output by machine learning using a second reconstructed image generated based on a second input image,
wherein the changing unit changes the image-specific components based on the comparison result of the comparison unit.

コンピュータを請求項１から請求項４のいずれかに記載の教師データ生成装置として機能させるコンピュータプログラム。 A computer program that causes a computer to function as the teacher data generation device according to any one of claims 1 to 4.

請求項１から請求項４のいずれかに記載の教師データ生成装置と，
前記教師データ生成装置が生成した再構成画像を利用して機械学習を実施する学習装置と，を備える
機械学習システム。 A machine learning system comprising:
the teacher data generation device according to any one of claims 1 to 4; and
a learning device that performs machine learning using the reconstructed image generated by the teacher data generation device.

機械学習に用いられる教師データを生成する方法であって，
入力画像から抽出されたオブジェクトの形状及びその他の外観要因の少なくともいずれか１つ以上の画像固有成分をデータベースに記憶する工程と，
前記データベースに記憶されている前記画像固有成分を変更して，一又は複数種類の別の画像固有成分を生成する工程と，
前記別の画像固有成分を用いて，少なくとも部分的に前記入力画像に対応する再構成画像を生成する工程と，を含み，
少なくとも前記再構成画像を教師データとして利用する
教師データの生成方法。 A method of generating teacher data used for machine learning, comprising:
storing, in a database, one or more image-specific components among the shape of an object extracted from an input image and other appearance factors;
changing the image-specific components stored in the database to generate one or more types of other image-specific components; and
generating, using the other image-specific components, a reconstructed image corresponding at least partially to the input image,
wherein at least the reconstructed image is used as teacher data.

請求項７に記載の方法によって生成した教師データを利用して機械学習を実施することにより学習済みモデルを生成する
学習済みモデルの生成方法。 A method of generating a learned model, comprising generating the learned model by performing machine learning using the teacher data generated by the method according to claim 7.

請求項７に記載の方法によって生成した教師データを利用して機械学習を実施することにより学習済みモデルを生成し，前記学習済みモデルを用いて画像認識及び状況予測のいずれか一方又は両方行う
学習済みモデルの使用方法。 A method of using a learned model, comprising generating the learned model by performing machine learning using the teacher data generated by the method according to claim 7, and performing image recognition, situation prediction, or both using the learned model.