JP2022147713A

JP2022147713A - Image generation device, learning device, and image generation method

Info

Publication number: JP2022147713A
Application number: JP2021049089A
Authority: JP
Inventors: 優也田中; Yuya Tanaka; 真也木内; Shinya Kiuchi; 友子森田; Tomoko Morita
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2022-10-06

Abstract

To provide an image generation device and the like capable of reliably improving the detection performance of a machine learning model. SOLUTION: An image generation device 10 includes: a region information extraction unit 11 that acquires a first image of an object captured by a camera and extracts, from the first image, first statistical information on a feature amount of the object shown in the first image; a CG generation unit 13 that acquires a second image that includes the object and differs from the first image, and extracts, from the second image, second statistical information on the feature amount of the object included in the second image; and a feature amount conversion unit 15 that, based on the correspondence between the first statistical information and the second statistical information, performs a correction that brings the second statistical information of the second image closer to the first statistical information of the first image. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to an image generation device, a learning device, and an image generation method.

In recent years, object detection techniques using machine learning such as deep learning have been developed. For example, when object detection is used in vehicles, objects such as people appear frequently and are important detection targets, so it is desirable to detect them with high accuracy.

Such learning models are often generated by training on photographed images showing the object. Which photographed images are used for training greatly affects object detection performance.

Patent Literature 1 discloses a technique for generating a first image by changing a feature (for example, color) of an object in a second image. This makes it possible to generate a learning image having desired features.

International Publication No. WO2020/121811

However, when a machine learning model is trained using learning images generated by image processing, as in the technique of Patent Literature 1, the improvement in the detection performance of the machine learning model through training may be limited.

Therefore, the present disclosure provides an image generation device, a learning device, and an image generation method that can more reliably improve the detection performance of a machine learning model.

An image generation device according to one aspect of the present disclosure includes: a first acquisition unit that acquires a first image of an object captured by a camera; a first extraction unit that extracts, from the first image, first statistical information of a feature amount of the object shown in the first image; a second acquisition unit that acquires a second image that shows the object and differs from the first image; a second extraction unit that extracts, from the second image, second statistical information of the feature amount of the object shown in the second image; and a correction unit that, based on the correspondence between the first statistical information and the second statistical information, performs a correction that brings the second statistical information of the second image closer to the first statistical information of the first image.

A learning device according to one aspect of the present disclosure performs learning processing on a machine learning model using the second image generated by the above image generation device.

An image generation method according to one aspect of the present disclosure includes: acquiring a first image of an object captured by a camera; extracting, from the first image, first statistical information of a feature amount of the object shown in the first image; acquiring a second image that shows the object and differs from the first image; extracting, from the second image, second statistical information of the feature amount of the object shown in the second image; and, based on the correspondence between the first statistical information and the second statistical information, performing a correction that brings the second statistical information of the second image closer to the first statistical information of the first image.

According to one aspect of the present disclosure, it is possible to realize an image generation device and the like that can more reliably improve the detection performance of a machine learning model.

FIG. 1 is a block diagram showing the functional configuration of an information processing system according to an embodiment.
FIG. 2 is a flowchart showing the operation of the information processing system according to the embodiment.
FIG. 3A is a diagram showing a photographed image.
FIG. 3B is a diagram showing the region and the feature amount for each part in the photographed image.
FIG. 4A is a diagram showing a histogram of the R values of the upper body in the photographed image.
FIG. 4B is a diagram showing a histogram of the R values of the upper body in the CG image.
FIG. 4C is a diagram showing a conversion table for the R values of the upper body.
FIG. 5 is a diagram showing postures of an object.

(Circumstances leading to this disclosure)
Prior to the description of the embodiment of the present disclosure, the findings on which the present disclosure is based will be described with reference to FIG. 5. FIG. 5 is a diagram showing postures of an object.

When a person is detected by object detection using machine learning, the machine learning is often performed using images showing people (photographed images). To detect people accurately, training with images showing people in various postures, for example, can effectively improve the learning model's detection performance for people.

However, when acquiring photographed images, some images are easy to collect and others are difficult to collect, depending on, for example, the posture of the object.

As shown in FIG. 5, images of pedestrians and the like can be collected relatively easily, whereas images involving posture variations such as sitting or lying postures, and images of infrequently appearing subjects such as children, may cost more time and money to collect than images of pedestrians and the like.

Thus, depending on the object, sets of images that are easy to collect and sets that are difficult to collect may coexist.

CG (Computer Graphics) has been proposed as one alternative to collecting photographed images of objects that belong to a difficult-to-collect set. For example, a technique has been proposed in which an image obtained by compositing a CG-generated object onto a photographed background is used as a learning image.

However, it has been pointed out that when a machine learning model is trained using images into which a CG-generated object has been composited, the detection rate is lower than when the model is trained using photographed images that contain no CG-generated object. One cited cause is that the distribution of a CG feature amount (for example, edge strength or color tone) may deviate from the distribution of the same feature amount in the photographed image into which the CG is composited, so that the machine learning model learns feature amounts unique to the composite image.

As described above, when a machine learning model is trained using images generated by image processing, that is, images generated by the technique of Patent Literature 1 or by CG, the improvement in the detection performance of the machine learning model through training may be limited.

Therefore, the inventors of the present application conducted intensive studies on an image generation device and the like that can more reliably improve detection performance, and devised the image generation device and the like described below. Note that the image generation device and the like according to the present disclosure are also applicable when a machine learning model is trained using images other than composite images containing a CG-generated object.

This allows the image generation device to bring the statistical information of the feature amount of the second image closer to the statistical information of the feature amount of the image captured by the camera (the photographed image). In other words, deviation between the feature amount of the first image and that of the second image can be suppressed. Therefore, training a machine learning model using the second image corrected by the image generation device keeps the model from learning feature amounts unique to the second image, so the detection performance of the machine learning model can be improved more reliably.

Further, for example, the second image may be a CG image including the object generated by CG (Computer Graphics).

This allows the feature amount of the CG-generated object to be brought closer to the feature amount of the object captured by the camera. Therefore, when a machine learning model is trained using learning images that include a CG-generated object, the detection performance of the machine learning model can be improved more reliably.

Further, for example, the first extraction unit may further extract, from the first image, third statistical information of the feature amount, the third statistical information containing less information than the first statistical information; the second extraction unit may further extract, from the second image, fourth statistical information of the feature amount, the fourth statistical information containing less information than the second statistical information; and the second image may be an image that has been corrected, based on the relationship between the third statistical information and the fourth statistical information, so that the fourth statistical information of the CG-generated object is brought closer to the third statistical information.

This allows the feature amount of the CG-generated object to be brought closer to that of the object in the photographed image before the correction using the first and second statistical information is performed. Correcting with the third and fourth statistical information first makes the correction with the first and second statistical information more effective, so the detection performance of the machine learning model can be improved even more reliably.
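As a rough illustration of such a preliminary correction, the sketch below shifts the values of one color channel of a CG-generated object so that their average matches the average measured on the photographed image. The additive offset, the function name `shift_to_mean`, and the 0 to 255 value range are assumptions made for illustration; the text above does not state how the fourth statistical information is brought closer to the third.

```python
from typing import List


def shift_to_mean(cg_values: List[int], real_mean: float) -> List[int]:
    """Shift CG channel values so their mean approaches the photographed mean.

    Assumption: a simple additive offset, clamped to the 8-bit range 0..255.
    """
    cg_mean = sum(cg_values) / len(cg_values)
    offset = real_mean - cg_mean
    return [min(255, max(0, round(v + offset))) for v in cg_values]


# Hypothetical R values of a CG upper body, and a photographed average of 150.
cg_r_values = [100, 110, 120]
shifted = shift_to_mean(cg_r_values, real_mean=150.0)
```

Because of the clamping, the resulting mean can differ slightly from the target when values saturate at 0 or 255.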

Further, for example, the first statistical information and the second statistical information may each include a feature amount distribution indicating the distribution of the feature amount, and the correction unit may perform the correction using a conversion table that is based on the feature amount distribution in the first statistical information and the feature amount distribution in the second statistical information and that brings the second statistical information closer to the first statistical information.

Using the conversion table in this way allows the detection performance of the machine learning model to be improved more reliably and more simply.
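One common way to derive a table of this kind from two histograms is histogram matching through cumulative distributions. The sketch below uses that technique as an assumption; the text above does not say how the conversion table is computed, and the function names and the toy 4-level histograms are hypothetical. Both histograms are assumed to be non-empty.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List


def cumulative_distribution(hist: List[int]) -> List[float]:
    """Return the normalized cumulative distribution of a histogram."""
    total = sum(hist)
    return [count / total for count in accumulate(hist)]


def build_conversion_table(cg_hist: List[int], real_hist: List[int]) -> List[int]:
    """Map each CG gradation value to a photographed-image gradation value."""
    cg_cdf = cumulative_distribution(cg_hist)
    real_cdf = cumulative_distribution(real_hist)
    last = len(real_hist) - 1
    # For each CG level, pick the first photographed level whose cumulative
    # share is at least as large as the CG level's cumulative share.
    return [min(bisect_left(real_cdf, q), last) for q in cg_cdf]


# Toy 4-level histograms (hypothetical data): the CG object is darker.
cg_hist = [2, 2, 0, 0]
real_hist = [0, 0, 2, 2]
table = build_conversion_table(cg_hist, real_hist)

# Applying the table is a per-pixel lookup.
cg_values = [0, 0, 1, 1]
corrected = [table[v] for v in cg_values]
```

In this toy case the corrected values reproduce the photographed histogram exactly; with real data the match is only approximate because gradation values are discrete.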

Further, for example, the feature amount distribution in the first statistical information and the feature amount distribution in the second statistical information may be generated for each part of the object, and the correction unit may generate the conversion table for each part of the object and perform the correction.

This suppresses deviation between the feature amount of the first image and that of the second image in each part of the object. Therefore, training a machine learning model using the second image corrected by the image generation device further keeps the model from learning feature amounts unique to the second image, so the detection performance of the machine learning model can be improved even more reliably.
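A per-part correction can be pictured as choosing, for each pixel, the table that belongs to the part the pixel lies in. The following sketch assumes a hypothetical part map that labels every pixel with a part name (or `None` for pixels outside the object) and one lookup table per part; these data structures are illustrative and not taken from the source.

```python
from typing import Dict, List, Optional


def apply_tables_per_part(
    channel: List[List[int]],
    part_map: List[List[Optional[str]]],
    tables: Dict[str, List[int]],
) -> List[List[int]]:
    """Convert each pixel with the table of its part; leave other pixels unchanged."""
    return [
        [tables[part][v] if part in tables else v for v, part in zip(row, parts)]
        for row, parts in zip(channel, part_map)
    ]


# Hypothetical 2x2 channel with 4 gradation levels and two labelled parts.
channel = [[0, 1], [1, 0]]
part_map = [["head", "upper_body"], ["upper_body", None]]
tables = {"head": [3, 2, 1, 0], "upper_body": [1, 1, 2, 3]}
converted = apply_tables_per_part(channel, part_map, tables)
```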

Further, for example, the feature amount may be the color tone of the object, the first statistical information and the second statistical information may be histograms whose horizontal axis is the gradation value, and the third statistical information and the fourth statistical information may be average values of the color tone.

In this way, by acquiring histograms and average values of the color tones of the first image and the second image, the color tone of the second image can be brought closer to that of the first image. This keeps the machine learning model from learning a color tone unique to the second image, so the detection performance of the machine learning model can be improved more reliably.

Further, for example, the object shown in the first image and the object shown in the second image may differ from each other in posture.

As a result, the second image can contain the object in a wider variety of postures than the first image. Training a machine learning model using such second images can improve its performance in detecting objects in various postures.

Further, for example, the second image may be an image in which the background of the object is also generated by the CG.

This allows the detection performance of the machine learning model to be improved more reliably when a second image whose background is also generated by CG is used as a learning image.

Further, for example, the second image may be an image generated by superimposing the object generated by the CG, as the foreground, on a photographed image serving as the background.

This allows the detection performance of the machine learning model to be improved more reliably when a second image in which a CG-generated object is superimposed on a photographed image is used as a learning image.
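The superimposition described here can be sketched as a masked copy: wherever a mask marks the CG object, the foreground pixel is used, and elsewhere the photographed background is kept. The binary mask and the function name `composite` are assumptions for illustration; an actual pipeline might instead blend edges with a soft alpha mask.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def composite(
    foreground: List[List[Pixel]],
    background: List[List[Pixel]],
    mask: List[List[bool]],
) -> List[List[Pixel]]:
    """Overlay foreground pixels on the background where the mask is True."""
    return [
        [fg if m else bg for fg, bg, m in zip(fg_row, bg_row, m_row)]
        for fg_row, bg_row, m_row in zip(foreground, background, mask)
    ]


# Hypothetical 1x2 images: the CG object covers only the left pixel.
cg_object = [[(200, 30, 30), (0, 0, 0)]]
photo_background = [[(90, 90, 90), (120, 120, 120)]]
object_mask = [[True, False]]
second_image = composite(cg_object, photo_background, object_mask)
```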

Further, for example, the second image corrected by the correction unit may be a learning image used when training a machine learning model.

This allows the detection performance of the machine learning model to be improved more reliably.

Further, for example, the machine learning model may be a learning model for object detection, a learning model for image segmentation, or a learning model for depth estimation.

This allows the detection performance of a learning model for object detection, image segmentation, or depth estimation to be improved more reliably.

Further, a learning device according to one aspect of the present disclosure performs learning processing on a machine learning model using the second image generated by the above image generation device.

As a result, the generated machine learning model is kept from learning feature amounts unique to the second image, so the detection performance of the machine learning model can be improved more reliably.

Further, an image generation method according to one aspect of the present disclosure includes: acquiring a first image of an object captured by a camera; extracting, from the first image, first statistical information of a feature amount of the object shown in the first image; acquiring a second image that shows the object and differs from the first image; extracting, from the second image, second statistical information of the feature amount of the object shown in the second image; and, based on the correspondence between the first statistical information and the second statistical information, performing a correction that brings the second statistical information of the second image closer to the first statistical information of the first image.

This provides the same effects as the image generation device described above.

Note that these general or specific aspects may be realized by a system, a method, an integrated circuit, a computer program, or a computer-readable non-transitory recording medium such as a CD-ROM, or by any combination of systems, methods, integrated circuits, computer programs, and recording media. The program may be stored in the recording medium in advance, or may be supplied to the recording medium via a wide area network including the Internet.

Hereinafter, an embodiment will be specifically described with reference to the drawings.

Note that the embodiment described below shows a general or specific example. The numerical values, shapes, components, arrangement positions and connection forms of the components, steps, order of the steps, and the like shown in the following embodiment are examples and are not intended to limit the present disclosure. For example, a numerical value is not an expression with only a strict meaning, but an expression that also covers a substantially equivalent range, for example, a difference of about several percent. Among the components in the following embodiment, components not recited in the independent claims are described as optional components.

Each figure is a schematic diagram and is not necessarily drawn precisely. Therefore, for example, the scales and the like do not necessarily match between figures. In the figures, substantially identical configurations are given the same reference signs, and overlapping descriptions are omitted or simplified.

In this specification, terms indicating relationships between elements, such as "same", terms indicating shapes of elements, such as "rectangle", and numerical values and numerical ranges are not expressions with only a strict meaning, but expressions that also cover a substantially equivalent range, for example, a difference of about several percent (for example, about 5%).

(Embodiment)
An information processing system according to the present embodiment will be described below with reference to FIGS. 1 to 4C.

[1. Configuration of information processing system]
FIG. 1 is a block diagram showing the functional configuration of an information processing system 1 according to the present embodiment.

As shown in FIG. 1, the information processing system 1 includes an image generation unit 10 and a learning unit 20. The information processing system 1 is a system that corrects the feature amount of a CG-generated object using the feature amount of a photographed image of the object captured by a camera, thereby keeping the learning model from learning, during training, feature amounts unique to composite images that contain the CG-generated object.

Note that the object is not particularly limited and is determined as appropriate according to the purpose, usage scene, and the like of the device in which the learning model is implemented. The object may be, for example, a person, an animal other than a person, a moving body such as a vehicle, or a stationary object.

The image generation unit 10 generates learning images used for training in the learning unit 20. For example, the image generation unit 10 uses CG to generate images that are difficult to collect as photographed images, corrects the feature amount of each generated image (CG image) so that it approaches the feature amount of a photographed image, and outputs the CG image with the corrected feature amount as a learning image. The image generation unit 10 is an example of the image generation device.

The image generation unit 10 has a region information extraction unit 11, a statistical information calculation unit 12, a CG generation unit 13, a table generation unit 14, and a feature amount conversion unit 15.

The region information extraction unit 11 acquires an image of the object (a photographed image) and extracts region information indicating regions of the acquired photographed image. The region information extraction unit 11 extracts at least region information indicating the region of the object shown in the photographed image. For example, it extracts each constituent part of the object shown in the photographed image as region information. When the object is a person, it extracts, for example, each part of the person's body as region information. The region information includes, for example, the position or region of each constituent part on the photographed image. The photographed image is an image of the object captured by a camera and is an example of the first image.

The region information extraction unit 11 extracts the region information by executing image recognition processing such as image detection or image segmentation. The region information extraction unit 11 includes, for example, a trained model obtained by training for image recognition, and the image recognition processing is executed by inputting the body part of the person shown in the image into this trained model. Note that the method of extracting the region information is not limited to this, and any known method may be used.

Note that the photographed image is an image showing the object, obtained by capturing the real world with a camera. The photographed image may be, for example, an image included in a publicly known dataset. The photographed image is also an image that does not include a CG-generated object.

The statistical information calculation unit 12 extracts, from the photographed image, first statistical information of the feature amount of the object shown in the photographed image. Based on the photographed image and the region information, the statistical information calculation unit 12 calculates statistical information of the feature amount of at least one region (constituent part) of the object. It also calculates the statistical information of the feature amount of a region based on the region on the photographed image specified by the region information. That is, the statistical information calculation unit 12 calculates the statistical information of the feature amount from the photographed image (for example, from its pixel values). In the present embodiment, the statistical information calculation unit 12 calculates the first statistical information of the feature amount from the photographed image for each region of the object (that is, for each constituent part, for example, each part of a person's body).

The feature amount is information that can be obtained from the photographed image, for example, information indicating characteristics of the image of the object itself. The feature amount may also be, for example, information that differs between a CG image and a photographed image. An example in which the feature amount is the color tone of the object is described below. Note that the feature amount is not limited to color tone and may be, for example, noise contained in the image, such as white noise. In the following, the statistical information of a feature amount is also referred to simply as a feature amount.

The first statistical information includes, for example, a feature amount distribution indicating the distribution of the feature amount. When the feature amount is color tone, the first statistical information may be, for example, an RGB histogram as an example of the feature amount distribution. The RGB histogram is a histogram whose horizontal axis is the gradation value and whose vertical axis is the number of pixels in the region having each pixel value (gradation value) of R (red), G (green), and B (blue). The RGB histogram includes a histogram for each of R (red), G (green), and B (blue). The RGB histogram extracted by the statistical information calculation unit 12 is an example of the histogram of the photographed image shown in FIG. 1.
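The RGB histogram described above can be sketched as three counting arrays, one per channel, indexed by gradation value. The 256-level range, the function name `rgb_histogram`, and the sample pixels are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each assumed to be in 0..255


def rgb_histogram(pixels: Sequence[Pixel], levels: int = 256) -> Dict[str, List[int]]:
    """Count, per channel, how many pixels of a region take each gradation value."""
    hist = {"R": [0] * levels, "G": [0] * levels, "B": [0] * levels}
    for r, g, b in pixels:
        hist["R"][r] += 1
        hist["G"][g] += 1
        hist["B"][b] += 1
    return hist


# Hypothetical pixels belonging to one region, e.g. the upper body.
upper_body_pixels = [(200, 30, 30), (200, 40, 30), (180, 30, 35)]
hist = rgb_histogram(upper_body_pixels)
```

Each list in `hist` plays the role of one per-channel histogram: the index is the gradation value and the entry is the pixel count.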

The statistical information calculation unit 12 may further extract, from the photographed image, third statistical information of the feature amount of the object, the third statistical information containing less information than the first statistical information. When the feature amount is color tone, the third statistical information may be an average RGB value. The average RGB value indicates the average of the pixel values (the average color tone) of each of R (red), G (green), and B (blue) in the region. Note that the third statistical information is not limited to an average value and may be the maximum value, minimum value, median value, mode, or the like of the color tone. Likewise, the RGB histogram need not be a histogram, so long as it is a feature amount distribution indicating the distribution of the color tone.

このように、統計情報計算部１２は、対象物の特徴量を示す２つの統計情報を計算してもよい。２つの統計情報は、対象物における同一の特徴量を示す情報であり、かつ、情報量が異なる情報である。 In this way, the statistical information calculation unit 12 may calculate two pieces of statistical information indicating the feature amount of the object. The two pieces of statistical information are information indicating the same feature amount of the object, and are information with different amounts of information.

また、統計情報計算部１２は、少なくとも第１統計情報における特徴量分布を対象物の部位ごとに生成する。統計情報計算部１２は、例えば、第１統計情報における特徴量分布及び第３統計情報を、対象物の部位ごとに生成する。統計情報計算部１２は、例えば、対象物の領域ごと（人物の体の部位ごと）に、平均ＲＧＢ値、及び、ＲＧＢヒストグラムを計算する。統計情報計算部１２は、第１抽出部の一例である。 Also, the statistical information calculation unit 12 generates at least the feature amount distribution in the first statistical information for each part of the object. The statistical information calculation unit 12 generates, for example, the feature quantity distribution in the first statistical information and the third statistical information for each part of the object. The statistical information calculation unit 12 calculates, for example, an average RGB value and an RGB histogram for each region of the object (each part of the human body). The statistical information calculator 12 is an example of a first extractor.

ＣＧ生成部１３は、実写画像に映る対象物とカテゴリが同じ対象物をＣＧにより生成し、ＣＧにより生成された対象物を含むＣＧ画像を生成する。ＣＧ画像は、第２画像の一例である。実写画像に映る対象物のカテゴリとＣＧにより生成された対象物のカテゴリとが同一のカテゴリであることを、対象物が同じであるとも記載する。ＣＧ生成部１３は、ＣＧにより対象物を生成することで、当該対象物が映り、実写画像と異なるＣＧ画像を取得する。ＣＧ生成部１３は、第２取得部として機能する。 The CG generation unit 13 generates, by CG, an object having the same category as that of the object appearing in the photographed image, and generates a CG image including the object generated by CG. A CG image is an example of a second image. When the category of the object shown in the photographed image and the category of the object generated by CG are the same category, it is also described that the object is the same. The CG generation unit 13 generates a target object by CG, and acquires a CG image in which the target object is reflected and which is different from the photographed image. The CG generation unit 13 functions as a second acquisition unit.

実写画像に映る対象物が人物である場合、ＣＧ生成部１３は、ＣＧにより人物を生成する。ＣＧ生成部１３は、例えば、実写画像に映る対象物と姿勢が異なる対象物をＣＧにより生成する。また、ＣＧ生成部１３は、例えば、実写画像に映る対象物と属性が異なる対象物をＣＧにより生成してもよい。対象物が人物である場合、属性は、例えば、年齢であってもよいし、体格であってもよいし、肌の色であってもよい。 When the object appearing in the photographed image is a person, the CG generation unit 13 generates the person by CG. The CG generation unit 13 generates, by CG, an object whose posture is different from that of the object shown in the photographed image, for example. Further, the CG generation unit 13 may generate, by CG, an object whose attributes are different from those of the object appearing in the photographed image. When the object is a person, the attribute may be, for example, age, physique, or skin color.

ＣＧ生成部１３が対象物をＣＧにより生成する手法は、特に限定されず、既知のいかなる方法が用いられてもよい。ＣＧ生成部１３は、例えば、ＤＡＺＳｔｕｄｉｏ、又は、ＡｕｔｏｄｅｓｋＣｈａｒａｃｔｅｒＧｅｎｅｒａｔｏｒ等のソフトウェアを用いて対象物を生成してもよい。 A method for the CG generation unit 13 to generate the object by CG is not particularly limited, and any known method may be used. The CG generator 13 may generate the object using software such as DAZ Studio or Autodesk Character Generator.

ＣＧ生成部１３は、例えば、ＣＧにより生成した対象物を前景とし、実写画像を背景として重畳させることで、ＣＧ画像を生成する。ＣＧ画像は、少なくとも一部がＣＧにより生成された画像である。ここでの実写画像は、領域情報抽出部１１が取得した実写画像であってもよいし、他の実写画像であってもよい。他の実写画像は、例えば、領域情報抽出部１１が取得した実写画像が撮像された環境が同一又は類似である実写画像であってもよい。環境は、例えば、実写画像が撮像された位置、時間帯、周囲の明るさ、カメラの撮像条件の少なくとも１つを含む。 For example, the CG generation unit 13 generates a CG image by superimposing a CG-generated object as a foreground and a photographed image as a background. A CG image is an image at least partially generated by CG. The photographed image here may be the photographed image acquired by the area information extraction unit 11, or may be another photographed image. The other photographed image may be, for example, a photographed image in which the environment in which the photographed image acquired by the area information extraction unit 11 was captured is the same or similar. The environment includes, for example, at least one of the position where the actual image was captured, the time period, the ambient brightness, and the imaging conditions of the camera.
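
The superimposition described above can be sketched as a per-pixel selection between the CG foreground and the photographed background. This is only an illustration under assumptions: the function `composite`, the boolean mask format, and the tiny 2x2 images are hypothetical, and the disclosure does not specify how the overlay is implemented.

```python
def composite(foreground, mask, background):
    """Overlay CG foreground pixels on a photographed background.

    foreground, background: 2-D lists of (R, G, B) tuples with the same shape
    mask: 2-D list of booleans, True where the CG object covers the pixel
    """
    if not (len(foreground) == len(mask) == len(background)):
        raise ValueError("all inputs must have the same number of rows")
    return [
        [fg if covered else bg for fg, covered, bg in zip(fg_row, m_row, bg_row)]
        for fg_row, m_row, bg_row in zip(foreground, mask, background)
    ]


# Hypothetical 2x2 example: the left column belongs to the CG person.
cg_person = [[(200, 150, 120), (0, 0, 0)],
             [(190, 140, 110), (0, 0, 0)]]
person_mask = [[True, False],
               [True, False]]
photo = [[(30, 60, 90), (35, 65, 95)],
         [(40, 70, 100), (45, 75, 105)]]
cg_image = composite(cg_person, person_mask, photo)
```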

なお、ＣＧ生成部１３は、対象物に加えて当該対象物の周囲の背景もＣＧにより生成してもよい。つまり、第２画像は、さらに、対象物の背景もＣＧにより生成された画像であってもよい。例えば、ＣＧ画像の全体がＣＧにより生成されてもよい。ＣＧ生成部１３は、様々な背景に対象物が映るＣＧ画像を容易に生成することができる。例えば、対象物を精度よく検出したい状況を示す背景をＣＧにより再現することで、特定の状況での対象物の検出精度を向上可能な学習用画像を生成し得る。 In addition to the object, the CG generation unit 13 may also generate the background around the object by CG. That is, the second image may be an image in which the background of the object is also generated by CG. For example, the entire CG image may be generated by CG. The CG generation unit 13 can easily generate CG images in which objects appear in various backgrounds. For example, it is possible to generate a learning image capable of improving the detection accuracy of the object in a specific situation by reproducing the background showing the situation in which the object is to be detected with high accuracy by CG.

また、ＣＧ生成部１３がＣＧにより背景を生成する場合、領域情報抽出部１１は実写画像の背景についても領域情報を抽出し、統計情報計算部１２は、背景の領域ごとに特徴量の統計情報を計算してもよい。背景の領域ごとの特徴量は、対象物の領域ごとの特徴量と同一種類の特徴量である。統計情報計算部１２は、例えば、背景の領域ごとに特徴量を計算する。例えば、統計情報計算部１２は、背景の領域ごとに２種類の統計情報（例えば、平均ＲＧＢ値、及び、ＲＧＢヒストグラム）を計算してもよい。 Further, when the CG generation unit 13 generates the background by CG, the area information extraction unit 11 may also extract area information for the background of the photographed image, and the statistical information calculation unit 12 may calculate statistical information of the feature amount for each background region. The feature amount for each background region is the same type of feature amount as that for each region of the object. The statistical information calculation unit 12 calculates, for example, a feature amount for each background region. For example, the statistical information calculation unit 12 may calculate two types of statistical information (for example, an average RGB value and an RGB histogram) for each background region.

ＣＧ生成部１３は、対象物をＣＧにより生成するときに、ＣＧ画像に含まれる対象物の特徴量を取得可能である。ＣＧ生成部１３は、上記のソフトウェアでＣＧをレンダリングするときに、領域ごとの特徴量に関する情報を取得可能である。ＣＧ生成部１３は、対象物の領域、及び、領域ごとの特徴量を取得可能である。ＣＧ生成部１３は、例えば、統計情報計算部１２が実写画像から抽出した特徴量と同じ特徴量をＣＧにより生成された対象物から取得する。 The CG generation unit 13 can acquire the feature amount of the object included in the CG image when generating the object by CG. The CG generation unit 13 can acquire information about feature amounts for each region when rendering CG with the above software. The CG generation unit 13 can acquire the area of the object and the feature amount for each area. For example, the CG generation unit 13 acquires the same feature amount as the feature amount extracted from the photographed image by the statistical information calculation unit 12 from the object generated by CG.

ＣＧ生成部１３は、ＣＧ画像に映る対象物の特徴量の第２統計情報を当該ＣＧ画像から抽出する。ＣＧ生成部１３は、ＣＧ画像に基づいて、対象物の少なくとも１つの領域（構成部）の特徴量の統計情報を計算する。なお、本実施の形態では、ＣＧ生成部１３は、対象物の領域ごと（構成部ごとであり、例えば、人物の体の部位ごと）に、当該ＣＧ画像から特徴量の第２統計情報を計算する。 The CG generation unit 13 extracts, from the CG image, the second statistical information of the feature amount of the object appearing in the CG image. The CG generation unit 13 calculates statistical information of the feature amount of at least one region (constituent portion) of the object based on the CG image. Note that, in the present embodiment, the CG generation unit 13 calculates the second statistical information of the feature amount from the CG image for each region of the object (that is, for each constituent portion, for example, each part of a person's body).

特徴量は、ＣＧ画像から取得され得る情報であり、統計情報計算部１２が抽出した特徴量と同一の情報である。第２統計情報は、第１統計情報と同一の特徴量である。 The feature amount is information that can be acquired from the CG image, and is the same information as the feature amount extracted by the statistical information calculation unit 12. The second statistical information is the same feature amount as the first statistical information.

第２統計情報は、例えば、特徴量の分布を示す特徴量分布を含む。特徴量が色調である場合、第２統計情報は、例えば、特徴量分布の一例として、ＲＧＢヒストグラムであってもよい。ＣＧ生成部１３が抽出するＲＧＢヒストグラムは、図１に示すＣＧ画像のヒストグラムの一例である。 The second statistical information includes, for example, a feature quantity distribution that indicates the distribution of feature quantities. When the feature amount is color tone, the second statistical information may be, for example, an RGB histogram as an example of the feature amount distribution. The RGB histogram extracted by the CG generator 13 is an example of the histogram of the CG image shown in FIG.

また、ＣＧ生成部１３は、さらに、対象物の特徴量の第４統計情報であって第２統計情報より情報量が少ない第４統計情報をＣＧ画像から抽出してもよい。第４統計情報は、第２統計情報と同一の特徴量である。特徴量が色調である場合、第４統計情報は、平均ＲＧＢ値であってもよい。 Further, the CG generation unit 13 may further extract from the CG image the fourth statistical information of the feature amount of the object, which has a smaller amount of information than the second statistical information. The fourth statistical information is the same feature quantity as the second statistical information. When the feature amount is color tone, the fourth statistical information may be an average RGB value.

このように、ＣＧ生成部１３は、対象物の特徴量を示す２つの統計情報を計算してもよい。２つの統計情報は、対象物における同一の特徴量を示す情報であり、情報量が異なる情報であり、かつ、統計情報計算部１２が実写画像から抽出した特徴量と同一の特徴量を示す情報である。 In this way, the CG generation unit 13 may calculate two pieces of statistical information indicating the feature amount of the object. The two pieces of statistical information indicate the same feature amount of the object, differ in their amount of information, and indicate the same feature amount as that extracted from the photographed image by the statistical information calculation unit 12.

また、ＣＧ生成部１３は、少なくとも第２統計情報における特徴量分布を対象物の部位ごとに生成する。ＣＧ生成部１３は、例えば、第２統計情報における特徴量分布及び第４統計情報を、対象物の部位ごとに生成する。ＣＧ生成部１３は、例えば、対象物の領域ごと（人物の体の部位ごと）に、平均ＲＧＢ値、及び、ＲＧＢヒストグラムを計算する。ＣＧ生成部１３は、特徴量（例えば、特徴量の統計情報）を抽出する第２抽出部としても機能する。 Also, the CG generation unit 13 generates at least the feature amount distribution in the second statistical information for each part of the object. The CG generation unit 13 generates, for example, the feature quantity distribution in the second statistical information and the fourth statistical information for each part of the object. The CG generation unit 13 calculates, for example, an average RGB value and an RGB histogram for each region of the object (each part of the human body). The CG generation unit 13 also functions as a second extraction unit that extracts feature amounts (for example, feature amount statistical information).

テーブル生成部１４は、実写画像の特徴量とＣＧ画像の特徴量とに基づいて、実写画像の特徴量とＣＧ画像の特徴量との対応関係を示す変換テーブルを生成する。具体的には、テーブル生成部１４は、実写画像の特徴量の特徴量分布（例えば、ＲＧＢヒストグラム）とＣＧ画像の特徴量の特徴量分布（例えば、ＲＧＢヒストグラム）とに基づいて、変換テーブルを生成する。テーブル生成部１４は、対象物の領域ごと（例えば、人物の体の部位ごと）に変換テーブルを生成する。変換テーブルは、ＣＧ画像の特徴量（例えば、第２統計情報）を実写画像の特徴量（例えば、第１統計情報）に近づけるためのテーブルである。 The table generation unit 14 generates a conversion table indicating the correspondence relationship between the feature amount of the photographed image and the feature amount of the CG image, based on those two feature amounts. Specifically, the table generation unit 14 generates the conversion table based on the feature amount distribution (e.g., RGB histogram) of the photographed image and the feature amount distribution (e.g., RGB histogram) of the CG image. The table generation unit 14 generates a conversion table for each region of the object (for example, each part of a person's body). The conversion table is a table for bringing the feature amount (e.g., second statistical information) of the CG image closer to the feature amount (e.g., first statistical information) of the photographed image.

特徴量変換部１５は、変換テーブル（実写画像の特徴量とＣＧ画像の特徴量との対応関係の一例）に基づいて、ＣＧ画像の特徴量を実写画像の特徴量に近づける補正を行う。特徴量変換部１５は、例えば、変換テーブルに基づいて、ＣＧ画像に含まれる対象物の第２統計情報を、実写画像に映る当該対象物の第１統計情報に近づける補正を行う。特徴量変換部１５は、ＣＧ画像と変換テーブルとに基づいて、ＣＧ画像の特徴量を変換した画像を生成する。特徴量変換部１５は、例えば、対象物の領域ごと（例えば、人物の体の部位ごと）に、当該部位に対応する変換テーブルを用いて、ＣＧ画像の特徴量を補正する。 The feature quantity conversion unit 15 corrects the feature quantity of the CG image to be closer to the feature quantity of the photographed image based on a conversion table (an example of the correspondence relationship between the feature quantity of the photographed image and the feature quantity of the CG image). For example, based on a conversion table, the feature amount conversion unit 15 corrects the second statistical information of the target object included in the CG image so as to be closer to the first statistical information of the target object appearing in the photographed image. The feature amount conversion unit 15 generates an image by converting the feature amount of the CG image based on the CG image and the conversion table. For example, the feature amount conversion unit 15 corrects the feature amount of the CG image for each region of the object (for example, each part of the human body) using a conversion table corresponding to the part.

このように、特徴量変換部１５は、第１統計情報における特徴量分布（例えば、ＲＧＢヒストグラム）と第２統計情報における特徴量分布（例えば、ＲＧＢヒストグラム）とに基づく第２統計情報を第１統計情報に近づけるための変換テーブルを用いて、ＣＧ画像の特徴量の補正を行う。特徴量変換部１５は、補正部の一例である。 In this way, the feature amount conversion unit 15 corrects the feature amount of the CG image using a conversion table for bringing the second statistical information closer to the first statistical information, the table being based on the feature amount distribution (e.g., RGB histogram) in the first statistical information and the feature amount distribution (e.g., RGB histogram) in the second statistical information. The feature amount conversion unit 15 is an example of a correction unit.

ＣＧ画像の特徴量が補正された画像は、学習部２０による機械学習モデルの学習時に使用される学習用画像である。ＣＧ画像の特徴量が補正された画像は、ＣＧ画像の特徴量が変換された画像であるとも言える。 The image obtained by correcting the feature amount of the CG image is a learning image used when the learning unit 20 learns the machine learning model. An image in which the feature amount of the CG image has been corrected can also be said to be an image in which the feature amount of the CG image has been converted.

学習部２０は、カメラ等で撮像された画像に対して物体の識別等を行う学習モデルの訓練を行う。学習部２０は、画像生成部１０により生成された画像（特徴量が変換されたＣＧ画像）を用いて、学習モデルに対して学習処理を行う。学習部２０は、画像生成部１０により生成された画像と、当該画像に付与されたアノテーション情報とを含む学習用データセットを用いた機械学習により、学習モデルの訓練を行う。学習部２０は、学習装置の一例である。 The learning unit 20 trains a learning model for identifying an object on an image captured by a camera or the like. The learning unit 20 uses the image generated by the image generation unit 10 (a CG image whose feature amount has been converted) to perform a learning process on the learning model. The learning unit 20 trains a learning model by machine learning using a learning data set that includes images generated by the image generation unit 10 and annotation information attached to the images. The learning unit 20 is an example of a learning device.

学習モデルは、画像に基づいて物体を識別等の何らかのタスクを行う機械学習モデルの一例であり、例えば、ＤｅｅｐＬｅａｒｎｉｎｇ（深層学習）等のニューラルネットワークを用いた機械学習モデルである。学習モデルは、画像に映る対象物（物体）を検出する物体検出用の学習モデル（物体検出モデル）であってもよいし、画像の各画素がどのカテゴリに属するかを特定する（同じカテゴリに属する物体を同一ラベルとして扱う）画像セグメンテーション用の機械学習モデル（セグメンテーションモデル）であってもよいし、入力された画像の各画素の深度を推定する深度推定用のモデル（深度推定用モデル）であってもよい。セグメンテーションモデルには、例えば、畳み込みニューラルネットワーク（ＣＮＮ）を用いることができる。深度推定モデルには、例えば、ＣＮＮ－Ｄｅｐｔｈを用いることができる。 A learning model is an example of a machine learning model that performs some task, such as identifying an object, based on an image, and is, for example, a machine learning model using a neural network, such as deep learning. The learning model may be a learning model for object detection (object detection model) that detects objects appearing in an image, a machine learning model for image segmentation (segmentation model) that identifies which category each pixel of an image belongs to (treating objects of the same category as the same label), or a model for depth estimation (depth estimation model) that estimates the depth of each pixel in an input image. A convolutional neural network (CNN), for example, can be used for the segmentation model. For example, CNN-Depth can be used for the depth estimation model.

なお、学習モデルは、ニューラルネットワークを用いた機械学習モデルである例について説明したが、他の機械学習モデルであってもよい。例えば、機械学習モデルは、ＲａｎｄｏｍＦｏｒｅｓｔ、ＧｅｎｅｔｉｃＰｒｏｇｒａｍｍｉｎｇ等を用いた機械学習モデルであってもよい。 Although the learning model has been described as a machine learning model using a neural network, it may be another machine learning model. For example, the machine learning model may be a machine learning model using Random Forest, Genetic Programming, or the like.

また、機械学習は、例えば、ディープラーニングなどにおける誤差逆伝播法（ＢＰ：ＢａｃｋＰｒｏｐａｇａｔｉｏｎ）などによって実現される。具体的には、学習部２０は、訓練されていない学習モデルに画像生成部１０が生成した学習用画像を入力し、当該学習モデルが出力する識別結果を取得する。そして、学習部２０は、識別結果が正解情報となるように当該学習モデルを調整する。学習部２０は、このような調整をそれぞれ異なる複数の（例えば数千組の）学習用画像及びこれに対応する正解情報について繰り返すことによって、学習モデルの識別精度を向上させる。 Also, machine learning is realized by, for example, backpropagation (BP) in deep learning or the like. Specifically, the learning unit 20 inputs a learning image generated by the image generation unit 10 to a learning model that has not been trained, and acquires the identification result output by the learning model. Then, the learning unit 20 adjusts the learning model so that the identification result becomes correct information. The learning unit 20 repeats such adjustment for a plurality of different learning images (for example, thousands of sets) and correct information corresponding thereto, thereby improving the recognition accuracy of the learning model.

［２．情報処理システムの動作］ [2. Operation of information processing system]
続いて、上記のように構成された情報処理システム１の動作について、図２～図４Ｃを参照しながら説明する。図２は、本実施の形態に係る情報処理システム１の動作を示すフローチャートである。図３Ａは、実写画像を示す図である。図３Ｂは、実写画像における部位ごとの領域、及び、特徴量を示す図である。図４Ａは、実写画像の上半身のＲ値のヒストグラムを示す図である。図４Ｂは、ＣＧ画像の上半身のＲ値のヒストグラムを示す図である。図４Ｃは、上半身のＲ値の変換テーブルを示す図である。 Next, the operation of the information processing system 1 configured as described above will be described with reference to FIGS. 2 to 4C. FIG. 2 is a flowchart showing the operation of the information processing system 1 according to this embodiment. FIG. 3A is a diagram showing a photographed image. FIG. 3B is a diagram showing regions and feature amounts for each part in a photographed image. FIG. 4A is a diagram showing a histogram of R values of the upper body of a photographed image. FIG. 4B is a diagram showing a histogram of R values of the upper body of the CG image. FIG. 4C is a diagram showing a conversion table for the R value of the upper body.

図２におけるステップＳ１１～Ｓ１８は、画像生成部１０の動作であり、ステップＳ１９は、学習部２０の動作である。また、図３Ｂでは、部位ごとの領域を互いに異なるハッチングで示している。また、図４Ａ及び図４Ｂは、横軸が画素値（Ｒ値）を示し、縦軸が画素数を示す。なお、以下では、対象物が人物である例について説明するが、これに限定されない。 Steps S11 to S18 in FIG. 2 are operations of the image generation unit 10, and step S19 is an operation of the learning unit 20. In FIG. 3B, the regions for each part are indicated by mutually different hatching. In FIGS. 4A and 4B, the horizontal axis indicates the pixel value (R value), and the vertical axis indicates the number of pixels. An example in which the object is a person will be described below, but the object is not limited to this.

図２に示すように、領域情報抽出部１１は、実写画像を取得する（Ｓ１１）。領域情報抽出部１１は、例えば、図３Ａに示すように、人物が映る実写画像を複数取得する。領域情報抽出部１１は、実写画像を取得する第１取得部として機能する。 As shown in FIG. 2, the area information extraction unit 11 acquires a photographed image (S11). For example, as shown in FIG. 3A, the area information extraction unit 11 acquires a plurality of photographed images in which people appear. The area information extraction unit 11 functions as a first acquisition unit that acquires a photographed image.

なお、ステップＳ１２以降の処理は、実写画像を取得するたびに実行されてもよいし、所定の枚数の実写画像を取得した場合に実行されてもよい。 Note that the processing after step S12 may be executed each time a photographed image is acquired, or may be executed when a predetermined number of photographed images are acquired.

図２を再び参照して、次に、領域情報抽出部１１は、実写画像に映る人物の部位ごとの領域情報を抽出する（Ｓ１２）。図３Ｂでは、領域情報抽出部１１が、頭部、顔、上半身、左腕、右腕、下半身、左足及び右足の８個の領域を示す領域情報を、画像セグメンテーション等の画像認識処理を用いて抽出した例を示している。領域情報抽出部１１は、例えば、複数の画像のそれぞれにおいて領域情報を抽出してもよい。領域情報抽出部１１は、抽出した領域情報を統計情報計算部１２に出力する。なお、図３Ｂには、上半身及び下半身の特徴量（特徴量の統計情報）も参考として図示している。 Referring to FIG. 2 again, the area information extraction unit 11 next extracts area information for each part of the person appearing in the photographed image (S12). FIG. 3B shows an example in which the area information extraction unit 11 has extracted area information indicating eight regions (head, face, upper body, left arm, right arm, lower body, left leg, and right leg) using image recognition processing such as image segmentation. The area information extraction unit 11 may, for example, extract area information from each of a plurality of images. The area information extraction unit 11 outputs the extracted area information to the statistical information calculation unit 12. FIG. 3B also shows, for reference, the feature amounts (statistical information of the feature amounts) of the upper body and the lower body.

図２を再び参照して、次に、統計情報計算部１２は、実写画像の部位ごとに、所定の特徴量に関する第１平均値及び第１ヒストグラムを計算する（Ｓ１３）。所定の特徴量は、例えば、色調であり、予め設定されている。第１平均値は、例えば、実写画像の平均ＲＧＢ値であり、第１ヒストグラムは、実写画像のヒストグラムであり、本実施の形態では、ＲＧＢヒストグラムである。第１平均値と、第１ヒストグラムとは、実写画像の人物の同一部位における統計情報である。 Referring to FIG. 2 again, the statistical information calculation unit 12 then calculates a first average value and a first histogram regarding predetermined feature amounts for each part of the photographed image (S13). The predetermined feature amount is, for example, color tone, and is set in advance. The first average value is, for example, the average RGB value of the photographed image, and the first histogram is the histogram of the photographed image, which is the RGB histogram in this embodiment. The first average value and the first histogram are statistical information for the same part of the person in the photographed image.

統計情報計算部１２は、例えば、図４Ａに示すような第１ヒストグラムを計算する。図４Ａは、実写画像の人物の複数の部位（例えば、８個の部位）のうち上半身におけるＲＧＢヒストグラムのうちのＲ値のヒストグラム（第１ヒストグラム）、及び、上半身におけるＲ値の第１平均値（図４Ａ中の平均値）を示す。なお、上半身における第１ヒストグラムには、Ｇ値のヒストグラム及びＢ値のヒストグラムも含まれるが、図示を省略している。 The statistical information calculation unit 12 calculates, for example, a first histogram as shown in FIG. 4A. FIG. 4A shows, for the upper body among a plurality of parts (e.g., eight parts) of the person in the photographed image, the histogram of R values (first histogram) out of the RGB histogram, and the first average value of the R values in the upper body (the average value in FIG. 4A). The first histogram of the upper body also includes a histogram of G values and a histogram of B values, but these are omitted from the drawing.

統計情報計算部１２は、例えば、上半身のＲ値の第１ヒストグラム及び上半身のＲ値の第１平均値を、当該上半身の領域に含まれる複数の画素それぞれの画素値に基づいて計算する。統計情報計算部１２は、当該領域に含まれる複数の画素において、同一の画素値の画素数をカウントすることで、第１ヒストグラムを生成する。また、統計情報計算部１２は、当該領域に含まれる複数の画素それぞれの画素値の平均を第１平均値として計算する。統計情報計算部１２は、第１ヒストグラム（実写画像のヒストグラム）をテーブル生成部１４に出力し、第１平均値（平均ＲＧＢ値）をＣＧ生成部１３に出力する。 The statistical information calculation unit 12 calculates, for example, the first histogram of the R values of the upper body and the first average value of the R values of the upper body based on the pixel values of each of the plurality of pixels included in the upper body region. The statistical information calculation unit 12 generates the first histogram by counting, among the pixels included in the region, the number of pixels having each pixel value. The statistical information calculation unit 12 also calculates the average of the pixel values of the pixels included in the region as the first average value. The statistical information calculation unit 12 outputs the first histogram (the histogram of the photographed image) to the table generation unit 14 and outputs the first average value (average RGB value) to the CG generation unit 13.
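
The counting and averaging in step S13 can be sketched for one channel and one part as follows. The function `region_stats`, the label strings, and the 2x3 sample data are hypothetical; the sketch assumes that the area information is available as a per-pixel label map, which the disclosure does not state explicitly.

```python
from collections import Counter


def region_stats(channel, labels, part, levels=256):
    """Return (histogram, mean) of one color channel inside one labelled region.

    channel: 2-D list of integer pixel values in the range 0..levels-1
    labels:  2-D list of the same shape holding a part label per pixel
    part:    label of the region to measure (hypothetical naming)
    """
    values = [
        value
        for value_row, label_row in zip(channel, labels)
        for value, label in zip(value_row, label_row)
        if label == part
    ]
    if not values:
        raise ValueError(f"no pixels labelled {part!r}")
    counts = Counter(values)
    # One bin per gradation value: the number of pixels having that value.
    histogram = [counts.get(level, 0) for level in range(levels)]
    mean = sum(values) / len(values)
    return histogram, mean


# Hypothetical R plane and part labels for a 2x3 image.
r_plane = [[10, 10, 200],
           [12, 10, 220]]
parts = [["upper", "upper", "head"],
         ["upper", "upper", "head"]]
hist, mean = region_stats(r_plane, parts, "upper")
```

Running the same function on the G and B planes would give the remaining two histograms and averages for the part.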

図２を再び参照して、ＣＧ生成部１３は、第１平均値を用いてＣＧ画像を生成する（Ｓ１４）。ＣＧ生成部１３は、ＣＧにより人物を生成し、生成した人物の部位ごとの平均ＲＧＢ値（第２平均値）を算出し、部位ごとに、当該部位の第１平均値及び第２平均値の対応関係により当該部位の色調を変換する。ＣＧ生成部１３は、例えば、当該部位の第２平均値を当該部位の第１平均値に近づける補正を行うとも言える。なお、第２平均値（例えば、図４Ｂに示す平均値）は、第１平均値と画像上における同一の特徴量を示す値である。第１平均値は、第３統計情報の一例であり、第２平均値は、第４統計情報の一例である。 Referring to FIG. 2 again, the CG generator 13 uses the first average value to generate a CG image (S14). The CG generation unit 13 generates a person by CG, calculates an average RGB value (second average value) for each part of the generated person, and calculates the first average value and the second average value for each part. The color tone of the part concerned is converted according to the correspondence. It can also be said that the CG generation unit 13 performs, for example, correction to bring the second average value of the part closer to the first average value of the part. Note that the second average value (for example, the average value shown in FIG. 4B) is a value indicating the same feature quantity on the image as the first average value. The first average value is an example of third statistical information, and the second average value is an example of fourth statistical information.

ＣＧ生成部１３は、例えば、第１平均値と第２平均値との差異に基づいて、ＣＧ画像における当該部位の画素値を補正する。差異は、例えば、差分又は比率である。ＣＧ生成部１３は、例えば、当該部位の第１平均値と第２平均値との比率（＝第１平均値／第２平均値）を算出し、算出した比率をＣＧにより生成した人物の当該部位を構成する複数の画素のそれぞれに反映する。例えば、ＣＧ生成部１３は、当該比率を複数の画素のそれぞれに演算（乗算）する。ＣＧ生成部１３は、部位ごとに、上記の補正を行う。 The CG generation unit 13 corrects the pixel values of the part in the CG image based on, for example, the difference between the first average value and the second average value. The difference is, for example, a subtractive difference or a ratio. The CG generation unit 13 calculates, for example, the ratio of the first average value to the second average value of the part (= first average value / second average value), and applies the calculated ratio to each of the plurality of pixels that make up that part of the person generated by CG. For example, the CG generation unit 13 multiplies each of the plurality of pixels by the ratio. The CG generation unit 13 performs the above correction for each part.

これにより、ＣＧにより生成された人物の特徴（例えば、色調）を、ＣＧを生成した時点で実写画像に映る人物に近づけることができる。ＣＧにより生成された人物は、生成時の条件に応じた光沢、シワ、凹凸、影等を含む画像である。例えば、第２平均値を第１平均値に置き換える場合、人物からシワ等が消失してしまう。一方、本実施の形態では、ＣＧにより生成された人物に、当該差異を演算するので、シワ等が消失することなく、ＣＧにより生成された人物の特徴量を実写画像の人物に近づけることができる。よって、より現実に近いＣＧ画像を生成することが可能となる。 This makes it possible to bring the characteristics (for example, color tone) of the person generated by CG closer to those of the person appearing in the photographed image at the time the CG is generated. A person generated by CG is an image that includes gloss, wrinkles, irregularities, shadows, and the like according to the conditions at the time of generation. For example, if the pixel values were simply replaced with the first average value, wrinkles and the like would disappear from the person. In the present embodiment, by contrast, the difference is applied to the person generated by CG, so the feature amount of the person generated by CG can be brought closer to that of the person in the photographed image without losing wrinkles and the like. It is therefore possible to generate a more realistic CG image.
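
The ratio-based correction can be sketched as below for one channel of one part. The function `scale_region_to_mean`, the clipping to a maximum value, the rounding, and the sample data are assumptions added for illustration; the disclosure only states that each pixel is multiplied by the ratio.

```python
def scale_region_to_mean(channel, labels, part, target_mean, max_value=255):
    """Multiply the pixels of one part so that their mean approaches target_mean.

    channel: 2-D list of integer pixel values of the CG image (one channel)
    labels:  2-D list of part labels with the same shape
    Returns a new 2-D list; pixels outside the part are left unchanged.
    """
    region = [
        (y, x)
        for y, label_row in enumerate(labels)
        for x, label in enumerate(label_row)
        if label == part
    ]
    corrected = [row[:] for row in channel]
    if not region:
        return corrected
    current_mean = sum(channel[y][x] for y, x in region) / len(region)
    if current_mean == 0:
        return corrected  # a ratio cannot be formed for an all-zero region
    ratio = target_mean / current_mean  # first average / second average
    for y, x in region:
        # Rounding and clipping are assumptions made to keep valid pixel values.
        corrected[y][x] = min(max_value, round(channel[y][x] * ratio))
    return corrected


# Hypothetical CG R plane: the upper body has mean 100, the target mean is 150.
cg_r = [[100, 120],
        [80, 50]]
cg_parts = [["upper", "upper"],
            ["upper", "head"]]
corrected_r = scale_region_to_mean(cg_r, cg_parts, "upper", target_mean=150)
```

In this example the three upper-body pixels keep their relative differences (100 : 120 : 80 becomes 150 : 180 : 120), which corresponds to the point that shading detail such as wrinkles survives, whereas overwriting every pixel with 150 would flatten it.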

なお、生成時の条件は、例えば、光源の位置、光源の明るさ等を含み、例えば、ユーザにより設定される。 The conditions at the time of generation include, for example, the position of the light source, the brightness of the light source, etc., and are set by the user, for example.

ＣＧ生成部１３は、上記のような補正が行われたＣＧにより生成された人物を、実写画像に重畳することでＣＧ画像を生成する。このように、ＣＧ画像は、第１平均値と第２平均値との関係に基づいて、ＣＧにより生成された人物の第２平均値を第１平均値に近づける補正が行われた画像である。ＣＧ生成部１３は、生成したＣＧ画像を特徴量変換部１５に出力する。 The CG generation unit 13 generates a CG image by superimposing the CG-generated person, corrected as described above, on the photographed image. In this way, the CG image is an image that has been corrected so that the second average value of the person generated by CG approaches the first average value, based on the relationship between the first average value and the second average value. The CG generation unit 13 outputs the generated CG image to the feature amount conversion unit 15.

なお、ＣＧ生成部１３は、ＣＧ画像の生成に第１ヒストグラムの情報を用いない。また、ＣＧ生成部１３は、第１平均値を用いてＣＧにより生成された人物の色調を補正することに限定されない。 Note that the CG generation unit 13 does not use the information of the first histogram to generate the CG image. Further, the CG generator 13 is not limited to correcting the color tone of a person generated by CG using the first average value.

次に、ＣＧ生成部１３は、ＣＧ画像の部位ごとに、所定の特徴量に関する第２ヒストグラムを計算する（Ｓ１５）。所定の特徴量は、例えば、ステップＳ１３における特徴量と同じ特徴量であり、本実施の形態では、色調である。第２ヒストグラムは、ＣＧ画像のヒストグラムであり、本実施の形態では、ＲＧＢヒストグラムである。第２ヒストグラムと、第２平均値とは、ＣＧ画像の人物の同一部位における統計情報である。 Next, the CG generation unit 13 calculates a second histogram regarding a predetermined feature amount for each part of the CG image (S15). The predetermined feature amount is, for example, the same feature amount as the feature amount in step S13, and is color tone in the present embodiment. The second histogram is the histogram of the CG image, and in this embodiment, it is the RGB histogram. The second histogram and the second average value are statistical information for the same part of the person in the CG image.

ＣＧ生成部１３は、例えば、図４Ｂに示すような第２ヒストグラムを計算する。図４Ｂは、ＣＧ画像の人物の複数の部位（例えば、８個の部位）のうち上半身におけるＲＧＢヒストグラムのうちのＲ値のヒストグラム（第２ヒストグラム）、及び、上半身のＲ値の第２平均値（図４Ｂ中の平均値）を示す。 The CG generation unit 13 calculates, for example, a second histogram as shown in FIG. 4B. FIG. 4B shows, for the upper body among a plurality of parts (e.g., eight parts) of the person in the CG image, the histogram of R values (second histogram) out of the RGB histogram, and the second average value of the R values of the upper body (the average value in FIG. 4B).

なお、上半身における第２ヒストグラムには、Ｇ値のヒストグラム及びＢ値のヒストグラムも含まれるが、図示を省略している。なお、第２ヒストグラムにおける画素値（Ｒ値）は、第１ヒストグラムにおける画素値（Ｒ値）と同じビット数（例えば、８ｂｉｔ）の情報である。 Although the second histogram of the upper body includes a histogram of G values and a histogram of B values, they are omitted from the drawing. Note that the pixel value (R value) in the second histogram is information of the same number of bits (for example, 8 bits) as the pixel value (R value) in the first histogram.

ＣＧ生成部１３は、例えば、上半身のＲ値のヒストグラムを、ＣＧ画像における当該上半身の領域に含まれる複数の画素それぞれの画素値に基づいて計算する。ＣＧ生成部１３は、第２ヒストグラム（ＣＧ画像のヒストグラム）をテーブル生成部１４に出力する。 The CG generation unit 13 calculates, for example, a histogram of R values of the upper body based on the pixel values of each of the plurality of pixels included in the upper body region in the CG image. The CG generation unit 13 outputs the second histogram (the histogram of the CG image) to the table generation unit 14.

図２を再び参照して、テーブル生成部１４は、第１ヒストグラムと第２ヒストグラムとに基づいて、第２ヒストグラムを第１ヒストグラムに近づけるための変換テーブルを生成する（Ｓ１６）。テーブル生成部１４は、図４Ａ及び図４Ｂに示すヒストグラムの対応関係に基づいて、図４Ｃに示す上半身のＲ値の変換テーブルを生成する。テーブル生成部１４は、図４Ａ及び図４ＢのそれぞれのＲ値の出現頻度の対応関係に基づいて変換テーブルを生成する。テーブル生成部１４は、図４Ａ及び図４ＢのそれぞれのＲ値を出現頻度ごとに順にプロットし、多項式近似、折れ線近似等の近似により変換関係が数式化された変換テーブルを生成する。テーブル生成部１４は、部位ごとに変換テーブルを生成する。テーブル生成部１４は、生成した変換テーブルを特徴量変換部１５に出力する。 Referring to FIG. 2 again, the table generation unit 14 generates, based on the first histogram and the second histogram, a conversion table for bringing the second histogram closer to the first histogram (S16). The table generation unit 14 generates the conversion table of R values of the upper body shown in FIG. 4C based on the correspondence relationship between the histograms shown in FIGS. 4A and 4B. The table generation unit 14 generates the conversion table based on the correspondence between the appearance frequencies of the R values in FIGS. 4A and 4B. The table generation unit 14 plots the R values of FIGS. 4A and 4B in order by appearance frequency, and generates a conversion table in which the conversion relationship is expressed as a formula by an approximation such as polynomial approximation or broken-line (piecewise-linear) approximation. The table generation unit 14 generates a conversion table for each part. The table generation unit 14 outputs the generated conversion table to the feature amount conversion unit 15.

変換テーブルは、横軸が入力Ｒ値を示し、縦軸が出力Ｒ値を示す。入力Ｒ値は、ＣＧ生成部１３がステップＳ１４で生成したＣＧ画像における上半身のＲ値であり、出力Ｒ値は、当該Ｒ値の変換後のＲ値を示す。 In the conversion table, the horizontal axis indicates the input R value, and the vertical axis indicates the output R value. The input R value is the R value of the upper body in the CG image generated by the CG generation unit 13 in step S14, and the output R value indicates the R value after conversion of the R value.
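
One common way to derive an input-to-output table from two histograms is cumulative-histogram matching, sketched below. This is a stand-in technique chosen for illustration and is not the plotting-and-approximation procedure described for the table generation unit 14; the function `build_conversion_table` and the four-level histograms are hypothetical.

```python
from itertools import accumulate


def build_conversion_table(cg_hist, real_hist):
    """Map each CG level to the lowest real-image level whose cumulative
    frequency reaches the CG level's cumulative frequency (histogram matching).

    cg_hist, real_hist: pixel counts per gradation value, same length.
    Returns a list where table[input_level] == output_level.
    """
    if len(cg_hist) != len(real_hist):
        raise ValueError("histograms must have the same number of levels")
    cg_total, real_total = sum(cg_hist), sum(real_hist)
    if cg_total == 0 or real_total == 0:
        raise ValueError("histograms must contain at least one pixel")
    cg_cum = list(accumulate(cg_hist))
    real_cum = list(accumulate(real_hist))
    table = []
    out_level = 0
    for cum in cg_cum:
        # Cross-multiplied integer comparison avoids floating-point error.
        while (out_level < len(real_cum) - 1
               and real_cum[out_level] * cg_total < cum * real_total):
            out_level += 1
        table.append(out_level)
    return table


# Hypothetical 4-level histograms: the CG region is dark, the photo is bright.
cg_hist = [2, 2, 0, 0]
real_hist = [0, 0, 2, 2]
table = build_conversion_table(cg_hist, real_hist)
```

The resulting list plays the role of the table's two axes: the index is the input value and the stored element is the output value. Because both cumulative sums are non-decreasing, the table is monotonic, so the brightness ordering of pixels within a part is preserved.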

Referring again to FIG. 2, the feature amount conversion unit 15 converts the feature amount distribution of the CG image based on the conversion table (S17). Using the conversion table, the feature amount conversion unit 15 modulates the RGB values of each body part of the person in the CG image. This brings the RGB histogram of the person in the CG image closer to the RGB histogram of the person in the photographed image. In other words, the feature amount conversion unit 15 can generate a CG image in which feature amounts peculiar to CG (for example, color tone) are reduced.
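Applying per-part conversion tables to the RGB values can be sketched as a simple lookup. The data layout is an assumption made for illustration: the image is a list of rows of `(r, g, b)` tuples, `part_map` labels each pixel with a body part name or `None` for pixels outside the person, and `tables` holds one lookup table per channel for each part. None of these names come from the disclosure.

```python
def apply_tables(image, part_map, tables):
    """Convert the RGB values of each labeled pixel with its part's tables.

    image:    rows of (r, g, b) tuples with 8-bit integer components.
    part_map: rows of part labels (e.g. "upper_body") or None.
    tables:   {label: (r_table, g_table, b_table)}, each table a list
              that maps an input level to an output level.
    Pixels whose label has no table are copied unchanged.
    """
    converted = []
    for pixel_row, label_row in zip(image, part_map):
        new_row = []
        for pixel, label in zip(pixel_row, label_row):
            luts = tables.get(label)
            if luts is None:
                new_row.append(pixel)
            else:
                new_row.append(tuple(lut[v] for lut, v in zip(luts, pixel)))
        converted.append(new_row)
    return converted


# Example: raise only the R values of the upper body by 10 levels.
identity = list(range(256))
raise_r = [min(v + 10, 255) for v in range(256)]
tables = {"upper_body": (raise_r, identity, identity)}
image = [[(100, 50, 25), (250, 0, 0)]]
part_map = [["upper_body", None]]
result = apply_tables(image, part_map, tables)
```

Because the tables are keyed by part, the same function can apply one set of tables to several CG images that share the same labels.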

Next, the feature amount conversion unit 15 determines whether the processing has been completed for all the photographed images (S18). When the processing has been completed for all the photographed images (Yes in S18), the feature amount conversion unit 15 outputs the CG images whose feature amounts have been converted to the learning unit 20 as learning images. When the processing has not been completed for all the photographed images (No in S18), the feature amount conversion unit 15 returns to step S11 and continues the processing.

Next, the learning unit 20 performs learning processing on the learning model using the learning images acquired from the image generation unit 10 (S19). The learning unit 20 optimizes the parameters of the learning model by a known learning process.

By using the CG images whose feature amounts have been corrected by the image generation unit 10 as described above, the learning unit 20 can generate a trained model in which learning of feature amounts specific to CG is suppressed, compared with the case of using CG images whose feature amounts have not been corrected. In other words, the learning unit 20 can generate a learning model in which a decrease in the detection rate is suppressed even when the model is trained using CG images.

Note that the feature amount conversion unit 15 may apply one conversion table to a plurality of CG images. The plurality of CG images may be, for example, CG images in which the same person appears in different postures. For example, the feature amount conversion unit 15 corrects the feature amounts of each of the plurality of CG images using one conversion table. In this case, for example, a determination as to whether the feature amounts of each of the plurality of CG images have been converted using the one conversion table may be made between steps S17 and S18.

(Other Embodiments)
Although the image generation device and the like according to one or more aspects have been described above based on the embodiment, the present disclosure is not limited to this embodiment. As long as they do not depart from the spirit of the present disclosure, forms obtained by applying various modifications conceivable by those skilled in the art to the embodiment, and forms constructed by combining components of different embodiments, may also be included in the present disclosure.

For example, the above embodiment describes an example in which the image generation unit corrects the feature amounts of a CG image using a photographed image. However, the image to be corrected is not limited to a CG image; it may be, for example, a photographed image, or an image obtained by capturing or importing a picture such as an illustration.

The above embodiment also describes an example in which the statistical information calculation unit calculates two pieces of statistical information indicating the feature amount of the object. However, the number of pieces of statistical information to be calculated is not limited to this, and may be one, or three or more.

The above embodiment also describes an example in which the area information extraction unit extracts area information for each constituent part of the object appearing in the photographed image. However, it suffices to extract the area information of at least one constituent part. The area information extraction unit may, for example, extract only a specific constituent part as the area information. For example, when the object is a person, the area information extraction unit may extract only a specific body part (for example, the upper body) as the area information.

The above embodiment also describes an example in which one person appears in the photographed image, but a plurality of persons may appear. In this case, the area information extraction unit may extract the area of each of the plurality of persons, or may extract the area of any one person.

The above embodiment also describes an example in which the image generation unit generates the CG image. However, the present disclosure is not limited to this, and the CG image may be acquired from an external device or the like.

The above embodiment also describes an example in which the CG generation unit extracts the second statistical information, but the present disclosure is not limited to this. For example, the statistical information calculation unit may acquire the second statistical information from the CG image.

The above embodiment also describes the generation of learning images used when training a learning model, but the present disclosure is also applicable to the generation of re-learning images used when retraining a trained model.

The above embodiment also describes an example in which the learning model is a machine learning model using a neural network, such as deep learning, but the learning model may be another machine learning model. For example, the machine learning model may be a machine learning model using Random Forest, Genetic Programming, or the like.

In the above embodiment, each component may be configured with dedicated hardware, or may be realized by executing a software program suitable for that component. Each component may be realized by a program execution unit, such as a CPU or a processor, reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.

The order in which the steps in the flowchart are executed is given as an example in order to describe the present disclosure specifically, and the steps may be executed in an order other than the above. Some of the steps may be executed simultaneously (in parallel) with other steps, and some of the steps need not be executed.

The division of functional blocks in the block diagram is an example. A plurality of functional blocks may be realized as one functional block, one functional block may be divided into a plurality of functional blocks, and some functions may be moved to other functional blocks. The functions of a plurality of functional blocks having similar functions may be processed by a single piece of hardware or software in parallel or in a time-sharing manner.

Each of the image generation device and the learning device according to the above embodiment and the like may be realized as a single device or by a plurality of devices. When the image generation device and the learning device are realized by a plurality of devices, the components of the image generation device and the learning device may be distributed among the plurality of devices in any way. At least one of the components of the image generation device and the learning device may be realized by a server device. When the image generation device and the learning device are realized by a plurality of devices, the communication method between the devices is not particularly limited, and may be wireless communication or wired communication. Wireless communication and wired communication may also be combined between the devices.

Each component described in the above embodiment may be realized as software, or may typically be realized as an LSI, which is an integrated circuit. The components may be individually made into single chips, or a single chip may include some or all of them. Although the term LSI is used here, it may also be called an IC, a system LSI, a super LSI, or an ultra LSI depending on the degree of integration. The method of circuit integration is not limited to LSI, and may be realized with a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor in which the connections or settings of circuit cells inside the LSI can be reconfigured, may also be used. Furthermore, if an integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology or another derived technology, that technology may naturally be used to integrate the components.

A system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of processing units on a single chip. Specifically, it is a computer system that includes a microprocessor, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. A computer program is stored in the ROM. The system LSI achieves its functions by the microprocessor operating according to the computer program.

One aspect of the present disclosure may be a computer program that causes a computer to execute each characteristic step (S11 to S18) included in the image generation method shown in FIG. 2. One aspect of the present disclosure may also be a computer program that causes a computer to execute each characteristic step (S19) included in the learning method performed by the learning unit. Such a learning method includes acquiring learning images generated by the above image generation method and performing learning processing on a machine learning model using the acquired learning images.

For example, the program may be a program to be executed by a computer. One aspect of the present disclosure may also be a non-transitory computer-readable recording medium on which such a program is recorded. For example, such a program may be recorded on a recording medium and distributed or circulated. For example, by installing the distributed program in a device having another processor and causing that processor to execute the program, the device can be made to perform each of the above processes.

The present disclosure is useful for an image generation device and the like that generate images for training a learning model.

1 Information processing system
10 Image generation unit
11 Area information extraction unit
12 Statistical information calculation unit
13 CG generation unit
14 Table generation unit
15 Feature amount conversion unit
20 Learning unit

Claims

An image generation device comprising:
a first acquisition unit that acquires a first image of an object captured by a camera;
a first extraction unit that extracts, from the first image, first statistical information of a feature amount of the object appearing in the first image;
a second acquisition unit that acquires a second image in which the object appears and which is different from the first image;
a second extraction unit that extracts, from the second image, second statistical information of the feature amount of the object appearing in the second image; and
a correction unit that, based on a correspondence relationship between the first statistical information and the second statistical information, performs correction to bring the second statistical information of the second image closer to the first statistical information of the first image.

The image generation device according to claim 1, wherein the second image is a CG image including the object generated by CG (computer graphics).

The image generation device according to claim 1 or 2, wherein
the first extraction unit further extracts, from the first image, third statistical information of the feature amount, the third statistical information having a smaller amount of information than the first statistical information,
the second extraction unit further extracts, from the second image, fourth statistical information of the feature amount, the fourth statistical information having a smaller amount of information than the second statistical information, and
the second image is an image on which correction has been performed, based on a relationship between the third statistical information and the fourth statistical information, to bring the fourth statistical information of the object generated by the CG closer to the third statistical information.

The image generation device according to any one of claims 1 to 3, wherein
the first statistical information and the second statistical information each include a feature amount distribution indicating a distribution of the feature amount, and
the correction unit performs the correction using a conversion table for bringing the second statistical information closer to the first statistical information, the conversion table being based on the feature amount distribution in the first statistical information and the feature amount distribution in the second statistical information.

The image generation device according to claim 4, wherein
the feature amount distribution in the first statistical information and the feature amount distribution in the second statistical information are generated for each part of the object, and
the correction unit generates the conversion table for each part of the object and performs the correction.

The image generation device according to claim 3, wherein
the feature amount is a color tone of the object,
the first statistical information and the second statistical information are histograms whose horizontal axis represents gradation values, and
the third statistical information and the fourth statistical information are average values of the color tone.

The image generation device according to any one of claims 1 to 6, wherein the object appearing in the first image and the object appearing in the second image differ from each other in posture.

The image generation device according to any one of claims 1 to 7, wherein the second image is an image in which a background of the object is also generated by the CG.

The image generation device according to any one of claims 1 to 7, wherein the second image is an image generated by superimposing the object generated by the CG, as a foreground, on a photographed image serving as a background.

The image generation device according to any one of claims 1 to 9, wherein the second image corrected by the correction unit is a learning image used when training a machine learning model.

The image generation device according to claim 10, wherein the machine learning model is a learning model for object detection, a learning model for image segmentation, or a learning model for depth estimation.

A learning device that performs learning processing on a machine learning model using the second image generated by the image generation device according to any one of claims 1 to 11.

An image generation method comprising:
acquiring a first image of an object captured by a camera;
extracting, from the first image, first statistical information of a feature amount of the object appearing in the first image;
acquiring a second image in which the object appears and which is different from the first image;
extracting, from the second image, second statistical information of the feature amount of the object appearing in the second image; and
performing, based on a correspondence relationship between the first statistical information and the second statistical information, correction to bring the second statistical information of the second image closer to the first statistical information of the first image.