JP2021033707A

JP2021033707A - Information processing apparatus

Info

Publication number: JP2021033707A
Application number: JP2019154000A
Authority: JP
Inventors: Kohei Uezato; Hiroaki Kimura
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2019-08-26
Filing date: 2019-08-26
Publication date: 2021-03-01

Abstract

To provide an information processing apparatus capable of generating learning images for accurately detecting a partially hidden object. SOLUTION: In an information processing apparatus 1, an image acquisition unit 20 acquires a detection target image, a hidden image, and a background image. A first generation unit 22 generates a superimposed image by superimposing the hidden image on a part of the detection target image so that the positional relationship between the detection target image and the hidden image satisfies a predetermined superimposition criterion. A second generation unit 32 generates a learning image by combining the superimposed image with the background image. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for generating teacher data for machine learning.

In recent years, object recognition techniques using machine learning have been developed. Machine learning requires a large amount of teacher data. Patent Document 1 discloses a technique for creating a large number of composite images that serve as teacher data for machine learning by combining an object targeted for image recognition with a background image.

Patent Document 1: JP-A-2018-088223

In an image subjected to image recognition, part of the detection target may be hidden by another object. The present inventors recognized that, in order to accurately detect a partially hidden detection target, it is desirable to train an object detection model using, as teacher data, a large number of learning images that cover the ways in which the target can be expected to be hidden.

The present invention has been made in view of this situation, and its object is to provide an information processing apparatus capable of generating learning images for accurately detecting a partially hidden detection target.

To solve the above problem, an information processing apparatus according to one aspect of the present invention includes: an image acquisition unit that acquires a detection target image, a hidden image, and a background image; a first generation unit that generates a superimposed image by superimposing the hidden image on a part of the detection target image so that the positional relationship between the detection target image and the hidden image satisfies a predetermined superimposition criterion; and a second generation unit that generates a learning image by combining the superimposed image with the background image.

According to this aspect, the hidden image is superimposed on a part of the detection target image so that the positional relationship between the detection target image and the hidden image satisfies the predetermined superimposition criterion, so learning images for accurately detecting a partially hidden detection target can be generated.

According to the present invention, learning images for accurately detecting a partially hidden detection target can be generated.

FIG. 1 is a block diagram of an information processing apparatus according to an embodiment.
FIGS. 2(a) and 2(b) are diagrams showing an example of a superimposed image according to the embodiment.
FIGS. 3(a) and 3(b) are diagrams showing another example of a superimposed image according to the embodiment.
FIG. 4 is a diagram showing an example of a learning image according to the embodiment.
FIG. 5 is a flowchart showing the processing of the information processing apparatus of FIG. 1.
FIG. 6 is a diagram showing another example of a detection target image.
FIG. 7 is a diagram showing an example in which part of the detection target image of FIG. 6 is hidden.
FIG. 8 is a diagram showing an example of a learning image including the detection target image of FIG. 6.

FIG. 1 is a block diagram of the information processing apparatus 1 according to the embodiment. The information processing apparatus 1 generates a plurality of learning images for training an object detection model used for image recognition. As an example, image recognition executed by an in-vehicle device mounted on a vehicle such as an automobile is described, but the application is not particularly limited.

For example, when the detection target of image recognition is a person, part of the person may be hidden by a vehicle, a building, or the like in the image being recognized. As already noted, the present inventors recognized that, in order to accurately detect a partially hidden detection target, it is desirable to use thousands to tens of thousands of learning images that cover the ways in which the target can be expected to be hidden.

Because generating such a large number of learning images is labor-intensive, the embodiment superimposes a hidden image, such as a vehicle, on a part of a detection target image and combines the detection target image carrying the hidden image with a background image to generate a learning image. By repeating these processes, a large number of learning images are generated automatically.

The information processing apparatus 1 includes a processing unit 10 and a storage unit 12. The processing unit 10 includes an image acquisition unit 20, a first generation unit 22, a second generation unit 32, a learning unit 34, and a detection unit 36. The storage unit 12 includes an image storage unit 40, a superimposed image storage unit 42, a positional relationship storage unit 44, a learning image storage unit 46, and a model storage unit 48.

In hardware terms, the configuration of the processing unit 10 can be realized by the CPU, memory, and other LSIs of any computer; in software terms, it is realized by a program loaded into memory or the like. The figure depicts functional blocks realized by the cooperation of these elements. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware alone, software alone, or a combination of the two.

The image storage unit 40 stores, in advance, a plurality of detection target images, a plurality of hidden images, and a plurality of background images.

A detection target image is an image obtained by photographing a detection target such as a person, a bicycle, or a vehicle. Each detection target image is associated in advance with a correct label indicating what the image shows. A plurality of detection target images are stored for each correct label.

A hidden image is an image that hides a part of a detection target image. Hidden images include, for example, images of figures such as rectangles, circles, and stars, and photographed images of objects present in a traffic environment, such as vehicles, bicycles, and buildings. Figure images may have various colors and transmittances. An image of an object present in a traffic environment may be an image with an alpha channel in which the object is set to opaque and the area outside the object is set to transparent.
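As an illustration of how a hidden image with an alpha channel could hide pixels of a detection target image, the following minimal sketch applies the standard "over" blend to a single pixel. The function name `blend_pixel` and the 8-bit RGBA tuple representation are assumptions made for this example; the patent does not specify a blending formula.

```python
def blend_pixel(hidden_rgba, target_rgb):
    """Blend one RGBA pixel of a hidden image over one RGB pixel of a target.

    Assumption: 8-bit channels; alpha 255 is fully opaque, 0 fully transparent.
    """
    r, g, b, a = hidden_rgba
    alpha = a / 255
    return tuple(
        round(alpha * front + (1 - alpha) * back)
        for front, back in zip((r, g, b), target_rgb)
    )


# An opaque pixel of the object replaces the target pixel; a transparent
# pixel outside the object leaves the target pixel visible.
opaque = blend_pixel((200, 10, 10, 255), (0, 0, 255))
transparent = blend_pixel((200, 10, 10, 0), (0, 0, 255))
```

Intermediate alpha values would correspond to the figure images with various transmittances mentioned above.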

A background image is an image that serves as the background of a detection target image, for example an image of a road photographed in any of various locations.

The image acquisition unit 20 acquires a detection target image with a designated correct label, a hidden image, and a background image from the image storage unit 40, outputs the acquired detection target image and hidden image to the first generation unit 22, and outputs the acquired background image to the second generation unit 32.

The first generation unit 22 generates a superimposed image by superimposing the hidden image on a part of the detection target image so that the positional relationship between the detection target image and the hidden image output from the image acquisition unit 20 satisfies a predetermined superimposition criterion. In other words, a superimposed image is an image in which a part of the detection target image is hidden by the hidden image. The first generation unit 22 has a superimposition unit 24 and a determination unit 26.

The superimposition unit 24 generates a superimposed image by superimposing the hidden image on a part of the detection target image output from the image acquisition unit 20. The position at which the hidden image is superimposed may be set randomly or may be specified by the user.

For each generated superimposed image, the superimposition unit 24 stores, in the superimposed image storage unit 42, position information on the region of the detection target image on which the hidden image is superimposed, in association with information identifying the superimposed image. The position of this region is expressed relative to a predetermined reference position, such as the lower-left vertex of the detection target image. When the hidden image is rectangular, the position information of the region includes, for example, the coordinates of diagonally opposite vertices of the region. The superimposition unit 24 may also store, in the superimposed image storage unit 42, position information on the region of the detection target image on which the hidden image is not superimposed.
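The superimposition step and the stored region information can be sketched as follows. This is a minimal illustration rather than the apparatus's implementation: images are plain lists of pixel rows, `None` marks a transparent pixel of the hidden image, the names `superimpose` and `random_superimpose` are hypothetical, and the region is reported relative to the top-left pixel of the detection target image (the text allows any predetermined reference position, such as the lower-left vertex).

```python
import random


def superimpose(target, hidden, left, top):
    """Paste `hidden` onto a copy of `target` with its top-left pixel at (left, top).

    Returns (superimposed_image, region), where region holds the diagonally
    opposite vertices ((x0, y0), (x1, y1)) of the covered area, clipped to the
    target, with exclusive end coordinates. Region is None if nothing overlaps.
    """
    height, width = len(target), len(target[0])
    out = [row[:] for row in target]  # leave the original target untouched
    x0, y0 = max(left, 0), max(top, 0)
    x1 = min(left + len(hidden[0]), width)
    y1 = min(top + len(hidden), height)
    if x0 >= x1 or y0 >= y1:
        return out, None
    for y in range(y0, y1):
        for x in range(x0, x1):
            pixel = hidden[y - top][x - left]
            if pixel is not None:  # None = transparent pixel of the hidden image
                out[y][x] = pixel
    return out, ((x0, y0), (x1, y1))


def random_superimpose(target, hidden, rng=random):
    """Choose a random position at which the hidden image overlaps the target."""
    height, width = len(target), len(target[0])
    left = rng.randint(-len(hidden[0]) + 1, width - 1)
    top = rng.randint(-len(hidden) + 1, height - 1)
    return superimpose(target, hidden, left, top)
```

The returned region is the kind of record that would be stored alongside an identifier of the superimposed image, so that the same region can later be reused with a different hidden image.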

For the same detection target image, the superimposition unit 24 may generate a plurality of superimposed images by using the same hidden image and varying its superimposition position. For the same detection target image, the superimposition unit 24 may also generate a plurality of superimposed images by keeping the superimposition position the same, based on the position information of the region on which the hidden image is superimposed, and varying the hidden image. For each group of superimposed images that share the same region position information, the superimposition unit 24 may store the position information in the superimposed image storage unit 42 in association with information identifying those superimposed images.

The superimposition unit 24 may first superimpose a figure hidden image on a part of the detection target image and then, based on the position information of the region on which that hidden image is superimposed, replace the figure hidden image with a hidden image of a real object such as a vehicle. Alternatively, it may superimpose the hidden image of a vehicle or the like directly, without first superimposing a figure hidden image. Using hidden images of vehicles and the like can improve the detection accuracy for the detection target beyond what is obtained with figure hidden images.

FIGS. 2(a) and 2(b) show an example of the superimposed image 70 according to the embodiment. In FIG. 2(a), a rectangular, white-filled hidden image 74 is superimposed on the lower-right part of the detection target image 72 of a person. In FIG. 2(b), a rectangular, white-filled hidden image 74 is superimposed on the lower-left part of the detection target image 72 of a person.

FIGS. 3(a) and 3(b) show another example of the superimposed image 70 according to the embodiment. In this example, the rectangular hidden image 74 of FIG. 2(a) is replaced with a hidden image 74 of a real object. In FIG. 3(a), a hidden image 74 of a car is superimposed on the lower-right part of the detection target image 72 of a person. In FIG. 3(b), a hidden image 74 of a bus is superimposed on the lower-right part of the detection target image 72 of a person.

The positional relationship storage unit 44 stores, in advance and for each correct label, the correct positional relationships between a detection target image and a hidden image. For example, when the correct label of the detection target image is "person", the correct positional relationship is that the hidden image is not located solely in the interior of the detection target image, such as its central portion; that is, the four vertices of the hidden image do not all lie inside the four sides of the detection target image. This is because it is unrealistic for only the central part of a person to be hidden by an obstacle such as a car.

The correct positional relationships may be set manually, based on a plurality of images obtained by actually photographing detection targets partially hidden by occluding objects, or may be set automatically by image recognition. For automatic setting, it suffices to use an existing image recognition device that can recognize the detection target and output its position and size, together with an existing image recognition device that can recognize the occluding object and output its position and size.

The determination unit 26 determines whether a superimposed image is realistic based on two things: the positional relationship between the detection target image and the hidden image in the superimposed image, identified from the position information of the region on which the hidden image is superimposed, and the correct positional relationships stored in the positional relationship storage unit 44. If the superimposed image is realistic, the determination unit 26 stores it in the superimposed image storage unit 42. If the superimposed image is unrealistic, the determination unit 26 excludes it and does not store it in the superimposed image storage unit 42.

Specifically, the determination unit 26 determines that a superimposed image is unrealistic if the positional relationship between the detection target image and the hidden image in it is not among the correct positional relationships. For example, if the correct label of the detection target image is "person", the determination unit 26 excludes superimposed images in which the hidden image lies only in the central region of the detection target image. Because unrealistic superimposed images can be excluded, learning accuracy can be improved.
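The "person" rule described above can be sketched as a simple geometric test. The sketch assumes a rectangular hidden image given by its diagonally opposite vertices in the detection target image's coordinates, and it reads the rule as "the four vertices must not all lie strictly inside the target". The names `hidden_only_inside`, `UNREALISTIC_RULES`, and `is_realistic` are hypothetical, and the dictionary merely stands in for the positional relationship storage unit 44.

```python
def hidden_only_inside(target_size, hidden_box):
    """True if all four vertices of the hidden rectangle lie strictly inside
    the detection target image (the case treated as unrealistic for a person)."""
    width, height = target_size
    (x0, y0), (x1, y1) = hidden_box
    return 0 < x0 and x1 < width and 0 < y0 and y1 < height


# Hypothetical per-label rules: each maps a correct label to a predicate that
# flags an unrealistic placement.
UNREALISTIC_RULES = {"person": hidden_only_inside}


def is_realistic(label, target_size, hidden_box):
    """Return False when the placement violates the rule stored for `label`."""
    rule = UNREALISTIC_RULES.get(label)
    return rule is None or not rule(target_size, hidden_box)


# A patch floating in the middle of a 10x20 person image is rejected;
# a patch extending past the lower-right corner is accepted.
floating = is_realistic("person", (10, 20), ((3, 5), (6, 9)))
corner = is_realistic("person", (10, 20), ((5, 12), (12, 22)))
```

Labels without a stored rule are accepted unconditionally in this sketch, which is a design choice of the example rather than something the text states.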

In this way, the first generation unit 22 generates a superimposed image by superimposing the hidden image on a part of the detection target image so that the positional relationship between the detection target image and the hidden image is among the correct positional relationships stored in the positional relationship storage unit 44.

The second generation unit 32 generates a learning image by combining the superimposed image generated by the first generation unit 22 with a background image. The position in the background image at which the superimposed image is placed may be set automatically to a realistic position using a known technique, or may be specified by the user. The second generation unit 32 may generate a plurality of learning images by combining one superimposed image with each of a plurality of background images.

FIG. 4 shows an example of the learning image 78 according to the embodiment. In this learning image 78, the superimposed image 70 of FIG. 3(a) is placed near the center of the background image 76.

Returning to FIG. 1, the second generation unit 32 associates the position information of the detection target image within the generated learning image, together with the correct label of the detection target image, with the learning image, and stores the learning image with these associations in the learning image storage unit 46. The position of the detection target image is expressed relative to a predetermined reference position, such as the lower-left vertex of the learning image. When the detection target image is rectangular, its position information includes, for example, the coordinates of diagonally opposite vertices of the rectangle.
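The compositing step and the automatically produced annotation can be sketched as follows. As before, images are lists of pixel rows, the function name `compose` and the dictionary layout of the annotation are assumptions made for illustration, and coordinates are measured from the top-left pixel of the learning image rather than the lower-left vertex mentioned in the text.

```python
def compose(background, superimposed, left, top, label):
    """Place a superimposed image on a copy of the background.

    Returns (learning_image, annotation). The annotation carries the correct
    label and the diagonally opposite vertices of the detection target image
    within the learning image (end coordinates exclusive).
    """
    height, width = len(superimposed), len(superimposed[0])
    if (left < 0 or top < 0
            or top + height > len(background)
            or left + width > len(background[0])):
        raise ValueError("superimposed image must fit inside the background")
    out = [row[:] for row in background]
    for y in range(height):
        out[top + y][left:left + width] = superimposed[y]
    annotation = {"label": label, "box": ((left, top), (left + width, top + height))}
    return out, annotation
```

Because the placement position is known at generation time, the label and box come for free, which is why no manual annotation is needed.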

The image acquisition unit 20, the first generation unit 22, and the second generation unit 32 repeat the above series of processes for one correct label until a predetermined number of learning images have been generated. The predetermined number is set by the user and is, for example, several thousand to several tens of thousands. As a result, a plurality of learning images, each associated with the correct label and the position information of the detection target image, are generated for that correct label. Time-consuming annotation work can therefore be eliminated. In addition, because unrealistic superimposed images are excluded, learning images for accurately detecting a partially hidden detection target can be generated.

The learning image storage unit 46 stores a plurality of learning images, each associated with the correct label and the position information of its detection target image.

Once the predetermined number of learning images have been generated, the learning unit 34 trains the object detection model used for image recognition, using the plurality of learning images stored in the learning image storage unit 46 as teacher data. Training is performed using the correct label and the position information of each detection target image.

The model can be trained using any of various well-known machine learning methods. Deep learning is one example of machine learning, and a specific example of deep learning is error backpropagation using a neural network. Other methods may be used as long as supervised learning can be performed with the teacher data. The learning unit 34 stores the trained object detection model in the model storage unit 48.

The detection unit 36 uses the trained model stored in the model storage unit 48 to detect the detection target in actually photographed verification images. This process corresponds to validating the trained model.

If the detection accuracy achieved by the detection unit 36 is below a predetermined value, or if the variety of learning images needs to be increased, the image acquisition unit 20, the first generation unit 22, and the second generation unit 32 repeat the above series of processes again. In this case, the superimposition unit 24 may replace the current hidden image in each of a plurality of superimposed images with a different hidden image without changing the position of the region, based on the position information, stored in the superimposed image storage unit 42, of the region of the detection target image on which the hidden image is superimposed. In other words, in this case too, the first generation unit 22 generates a superimposed image by superimposing the hidden image on a part of the detection target image so that the positional relationship between the detection target image and the hidden image satisfies a predetermined superimposition criterion. Here, satisfying the criterion means that the position of the region of the detection target image on which the new hidden image is superimposed coincides with the position of the region stored in the superimposed image storage unit 42. The second generation unit 32 generates a plurality of learning images from the superimposed images whose hidden images have been replaced. This makes it easy to generate additional learning images with replaced hidden images and thus to increase the number of learning images. Learning images for accurately detecting a partially hidden detection target can therefore be generated easily.

If the detection accuracy achieved by the detection unit 36 is equal to or higher than the predetermined value, the trained model is finalized. The finalized model is sent, for example over a network, to an in-vehicle device (not shown), where it is used for image recognition of the detection target.

Next, the overall operation of the information processing apparatus 1 configured as above is described. FIG. 5 is a flowchart showing the processing of the information processing apparatus 1 of FIG. 1.

The superimposition unit 24 generates a superimposed image by superimposing a hidden image on a part of a detection target image (S10), the determination unit 26 excludes unrealistic superimposed images (S12), and the second generation unit 32 generates a learning image by combining the superimposed image with a background image (S14). If the number of learning images has not reached the predetermined number (N in S16), the process returns to S10. If the number of learning images has reached the predetermined number (Y in S16), the learning unit 34 trains the model using the plurality of learning images as teacher data (S18), and the detection unit 36 uses the trained model to detect the detection target in actual images (S20). If more data is needed (Y in S22), the process returns to S10. If no more data is needed (N in S22), the process ends.
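The control flow of steps S10 to S22 can be sketched as two nested loops. Every callable passed in (`make_candidate`, `is_realistic`, `compose`, `train`, `evaluate`) is a hypothetical placeholder for the corresponding unit, and the `max_rounds` guard is an addition of this sketch that does not appear in the flowchart.

```python
def generate_learning_images(count, make_candidate, is_realistic, compose):
    """S10 to S16: produce `count` learning images, skipping unrealistic ones.

    Note: loops forever if `is_realistic` never accepts a candidate.
    """
    images = []
    while len(images) < count:               # S16: enough images yet?
        candidate = make_candidate()         # S10: superimpose a hidden image
        if not is_realistic(candidate):      # S12: exclude unrealistic results
            continue
        images.append(compose(candidate))    # S14: combine with a background
    return images


def run(count, make_candidate, is_realistic, compose,
        train, evaluate, threshold, max_rounds=10):
    """S10 to S22: generate data, train, verify, and add data while needed."""
    images, model = [], None
    for _ in range(max_rounds):
        images += generate_learning_images(
            count, make_candidate, is_realistic, compose)
        model = train(images)                # S18: train on all images so far
        if evaluate(model) >= threshold:     # S20 and S22: accuracy sufficient?
            break
    return model, images
```

In this sketch the decision at S22 is reduced to an accuracy threshold; the text also allows returning to S10 simply to increase the variety of learning images.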

According to this embodiment, the position information of the detection target image is associated with each learning image, so annotation work becomes unnecessary. In addition, a large number of learning images that make it easier to detect a partially hidden detection target with high accuracy can be created easily.

(Another example of the detection target image)
The detection target image is not limited to an image of a person, a car, or the like, and may be an image of a mark, characters, or the like. Here, an example is described in which the "秘" (secret) mark and the "Confidential" mark, which indicate confidential information, are detected in an image of the desktop screen of a personal computer.

FIG. 6 shows another example of the detection target image 72. The detection target image 72 is an image of a document bearing the "秘" mark and the "Confidential" mark.

FIG. 7 shows an example in which part of the detection target image 72 of FIG. 6 is hidden. The lower-right part of the detection target image 72 of the document is hidden by another document, such as a PDF file.

When a document bearing the "秘" mark and the "Confidential" mark is open on the desktop screen, the whole document may be in the foreground, as shown in FIG. 6, or part of the document may be hidden by another file and lie behind it, as shown in FIG. 7.

Consider a situation in which a user takes a screenshot of the entire desktop screen on a personal computer and sends the captured image as an email attachment. If, as in FIG. 7, a PDF file is open on top of part of a document bearing the "秘" mark and the user wants to send the contents of the PDF file as a screenshot, the contents of the marked document may also appear in the screenshot. Sending this image could therefore lead to a leak of confidential information.

Therefore, a plurality of learning images are generated and a model is trained in the same manner as in the embodiment, so that partially hidden "秘" and "Confidential" marks can be detected in screenshot images of the desktop screen.

In this case, the hidden images include images of figures and of items present on a desktop screen, such as documents, photographs, and web browsers. The background images include images of desktop screens and the like.

FIG. 8 shows an example of a learning image 78 that includes the detection target image 72 of FIG. 6. In this learning image 78, the superimposed image 70, in which a hidden image 74 of a document is superimposed on the lower-right part of the detection target image 72 of the document, is placed near the center of a background image 76 of a desktop screen.

Image recognition is executed by a mail application running on the personal computer or by an external mail server. The mail application or mail server uses the trained model to detect the "秘" mark or the "Confidential" mark in the desktop screen image attached to an email and, if at least one of the marks is detected, suspends sending of the email and alerts the user. This makes it easier to prevent leaks of confidential information.
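The send-time check described above can be sketched as a small gate in front of the mail transmission. The function `check_outgoing_mail`, the `(name, image)` attachment tuples, and the `detect_marks` callable (standing in for the trained model) are all hypothetical names introduced for this illustration.

```python
def check_outgoing_mail(attachments, detect_marks):
    """Decide whether an email may be sent.

    `attachments` is a list of (name, image) pairs and `detect_marks(image)`
    returns the list of confidentiality marks found in the image. Sending is
    suspended as soon as any attachment contains at least one mark.
    """
    for name, image in attachments:
        marks = detect_marks(image)
        if marks:
            warning = f"{name} appears to contain confidential marks: {', '.join(marks)}"
            return {"send": False, "warning": warning}
    return {"send": True, "warning": None}
```

Whether this gate runs in the mail application or on the mail server does not change the logic; only the place where `detect_marks` is evaluated differs.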

一部が隠れた「秘」マーク、「Ｃｏｎｆｉｄｅｎｔｉａｌ」マークの学習用画像を用いて学習したモデルを用いることで、一部が隠れた「秘」マーク、「Ｃｏｎｆｉｄｅｎｔｉａｌ」マークをより高精度に検出できる。 Using a model trained on learning images of partially hidden "秘" (secret) marks and "Confidential" marks makes it possible to detect partially hidden "秘" marks and "Confidential" marks with higher accuracy.

以上、実施の形態をもとに本発明を説明した。実施の形態はあくまでも例示であり、各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。 The present invention has been described above based on the embodiment. The embodiment is merely an example, and those skilled in the art will understand that various modifications are possible in the combinations of the components and processing steps, and that such modifications also fall within the scope of the present invention.

例えば、第１の変形例として、学習部３４、検出部３６およびモデル記憶部４８は、情報処理装置１に設けられず、情報処理装置１の外部の端末装置などに配置されてもよい。この場合、情報処理装置１は教師データ生成装置として機能する。この変形例では、情報処理装置１の構成の自由度を向上できる。 For example, as a first modification, the learning unit 34, the detection unit 36, and the model storage unit 48 may not be provided in the information processing device 1, but may be arranged in a terminal device or the like outside the information processing device 1. In this case, the information processing device 1 functions as a teacher data generation device. In this modification, the degree of freedom in the configuration of the information processing device 1 can be improved.

第２の変形例として、実施の形態または第１の変形例において情報処理装置１は判定部２６、位置関係記憶部４４を備えず、非現実的な重畳画像を除外しなくてもよい。第２の変形例では、情報処理装置１の処理を簡素化できる。 As a second modification, in the embodiment or the first modification, the information processing device 1 may omit the determination unit 26 and the positional relationship storage unit 44 and need not exclude unrealistic superimposed images. In the second modification, the processing of the information processing device 1 can be simplified.

第３の変形例として、第２の変形例において情報処理装置１は第１生成部２２、重畳画像記憶部４２を備えず、隠蔽画像および重畳画像を利用しなくてもよい。この場合、第２生成部３２は、検出対象画像を背景画像に合成して学習用画像を生成する。第３の変形例では、一部が隠れている検出対象画像を含む学習用画像は生成しないが、アノテーション作業を不要にできる。 As a third modification, in the second modification, the information processing device 1 may omit the first generation unit 22 and the superimposed image storage unit 42 and need not use hidden images or superimposed images. In this case, the second generation unit 32 generates a learning image by compositing the detection target image onto the background image. The third modification does not generate learning images that include a partially hidden detection target image, but it still makes annotation work unnecessary.
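
Annotation work presumably becomes unnecessary because the position at which the detection target image is composited is already known to the generator. The sketch below illustrates that idea only: it picks a random placement for a target of a given size inside a background and returns the label and bounding box directly. The function name `place_and_annotate`, the dictionary layout, and the `(left, top, right, bottom)` box convention are assumptions introduced here.

```python
import random
from typing import Dict, Tuple


def place_and_annotate(target_size: Tuple[int, int],
                       background_size: Tuple[int, int],
                       label: str,
                       rng: random.Random) -> Dict[str, object]:
    """Choose a paste position for the target and return its annotation.

    Sizes are (width, height) in pixels. The box is (left, top, right, bottom).
    """
    tw, th = target_size
    bw, bh = background_size
    if tw > bw or th > bh:
        raise ValueError("target must fit inside the background")
    # The generator decides where the target goes, so the ground-truth box
    # is known without any manual labeling.
    left = rng.randint(0, bw - tw)
    top = rng.randint(0, bh - th)
    return {"label": label, "bbox": (left, top, left + tw, top + th)}
```

For example, `place_and_annotate((40, 20), (640, 480), "秘", random.Random(0))` returns a record whose box is 40 pixels wide and 20 pixels high and lies entirely inside the 640 by 480 background.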

実施の形態では、記憶部１２は情報処理装置１の内部に配置されているが、情報処理装置１の外部に配置されてもよい。たとえば、外部のストレージ装置、外部のサーバ、クラウドストレージなどに記憶部１２を設け、学習用画像や学習済みの物体検出モデルなどを保存してもよい。この変形例では、情報処理装置１の構成の自由度を向上できる。 In the embodiment, the storage unit 12 is arranged inside the information processing device 1, but may be arranged outside the information processing device 1. For example, a storage unit 12 may be provided in an external storage device, an external server, a cloud storage, or the like to store a learning image, a learned object detection model, or the like. In this modification, the degree of freedom in the configuration of the information processing device 1 can be improved.

１…情報処理装置、２０…画像取得部、２２…第１生成部、２４…重畳部、２６…判定部、３２…第２生成部、３４…学習部、３６…検出部、４０…画像記憶部、４２…重畳画像記憶部、４４…位置関係記憶部、４６…学習用画像記憶部、４８…モデル記憶部。 1 ... information processing device, 20 ... image acquisition unit, 22 ... first generation unit, 24 ... superimposition unit, 26 ... determination unit, 32 ... second generation unit, 34 ... learning unit, 36 ... detection unit, 40 ... image storage unit, 42 ... superimposed image storage unit, 44 ... positional relationship storage unit, 46 ... learning image storage unit, 48 ... model storage unit.

Claims

検出対象画像、隠蔽画像および背景画像を取得する画像取得部と、
前記検出対象画像と前記隠蔽画像との位置関係が所定の重畳基準を満たすよう当該検出対象画像の一部に当該隠蔽画像を重畳して重畳画像を生成する第１生成部と、
前記背景画像に前記重畳画像を合成して学習用画像を生成する第２生成部と、
を備えることを特徴とする情報処理装置。 An information processing device comprising:
an image acquisition unit that acquires a detection target image, a hidden image, and a background image;
a first generation unit that generates a superimposed image by superimposing the hidden image on a part of the detection target image so that the positional relationship between the detection target image and the hidden image satisfies a predetermined superimposition criterion; and
a second generation unit that generates a learning image by compositing the superimposed image onto the background image.