JP2021047816A

JP2021047816A - Learning data generation device, learning data generation method, and program

Info

Publication number: JP2021047816A
Application number: JP2019171710A
Authority: JP
Inventors: 美恵大串; Mie Ogushi; 貴広馬場; Takahiro Baba; 陽太 ▲高▼岡; Yota Takaoka; 英雄寺田; Hideo Terada
Original assignee: Toppan Forms Co Ltd; Open Stream Inc
Current assignee: Open Stream Inc; Toppan Edge Inc
Priority date: 2019-09-20
Filing date: 2019-09-20
Publication date: 2021-03-25
Anticipated expiration: 2039-09-20
Also published as: JP7431005B2

Abstract

To provide a learning data generation device, a learning data generation method, and a program that can efficiently prepare diverse learning data for machine learning in document image recognition. SOLUTION: A learning data generation device comprises: an image data generation unit that, based on a generation condition, generates an image including characters and geometric figures as input data for machine learning; and a teacher data generation unit that generates, as teacher data for the machine learning, information that associates each pixel of the generated image with an element type classifying the pixel as a character element (an element constituting a character), a geometric element (an element constituting a geometric figure), or a background element (an element constituting the background, which is neither a character nor a geometric figure). SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning data generation device, a learning data generation method, and a program.

In recent years, various techniques have been proposed for recognizing characters, geometric figures, and the like in a document image by applying character recognition to the document image.

For example, Patent Document 1 below discloses a technique that uses machine learning to determine whether a pixel in a predetermined region of a document image is a character pixel, that is, a pixel representing a character. In this technique, a machine learning model that receives the document image determines, for each pixel of the document image, whether the pixel represents a character, and whether the predetermined region is a character region is then determined according to the extent to which character pixels occupy the region.

JP-A-2019-57803

The accuracy of image recognition using machine learning improves as the amount of data used to train the machine learning model increases. Therefore, to improve the accuracy of recognizing the layout of document images, it is desirable to prepare a larger amount of image data with diverse layouts. Such image data is generated, for example, by scanning printed matter such as documents. Preparing image data with diverse layouts therefore requires preparing a larger amount of printed matter with diverse layouts. However, because the possible layouts of document images are practically unlimited, preparing printed matter with diverse layouts takes time and effort. Furthermore, supervised learning requires teacher data for each piece of image data, which takes even more time and effort.

In view of the above problems, an object of the present invention is to provide a learning data generation device, a learning data generation method, and a program capable of efficiently preparing diverse learning data for machine learning in document image recognition.

To solve the above problems, a learning data generation device according to one aspect of the present invention includes: an image data generation unit that generates, based on generation conditions, an image including characters and geometric figures as input data for machine learning; and a teacher data generation unit that generates, as teacher data for the machine learning, information in which an element type is associated with each pixel of the generated image, the element type distinguishing whether the pixel is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure.

A learning data generation method according to one aspect of the present invention includes: generating, by an image data generation unit and based on generation conditions, an image including characters and geometric figures as input data for machine learning; and generating, by a teacher data generation unit and as teacher data for the machine learning, information in which an element type is associated with each pixel of the generated image, the element type distinguishing whether the pixel is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure.

A program according to one aspect of the present invention causes a computer to function as: an image data generation unit that generates, based on generation conditions, an image including characters and geometric figures as input data for machine learning; and a teacher data generation unit that generates, as teacher data for the machine learning, information in which an element type is associated with each pixel of the generated image, the element type distinguishing whether the pixel is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure.

According to the present invention, diverse learning data for machine learning in document image recognition can be prepared efficiently.

FIG. 1 is a block diagram showing a configuration example of a learning data generation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of learning data generation according to the embodiment.
FIG. 3 is a flowchart showing the flow of processing in the learning data generation device according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Configuration example of the learning data generation device>
First, the learning data generation device according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a configuration example of the learning data generation device 10 according to the embodiment of the present invention.

The learning data generation device 10 is a device having a function of generating learning data used for machine learning. The following describes an example in which the learning data generation device 10 generates the learning data used to create a trained model that determines the element type of each pixel in an image.

The element type is information indicating what kind of content each pixel in an image constitutes, and indicates one of a character element, a line segment element, and a background element. A character element indicates that the pixel constitutes a character region in the image. A line segment element indicates that the pixel constitutes a line segment in the image. A background element indicates that the pixel constitutes the background of the image (anything that is neither a line segment nor a character region). Here, the line segment element is an example of a "geometric element".
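The three element types can be pictured as a small enumeration with one value stored per pixel. The following is a minimal sketch under stated assumptions: the name `ElementType`, the numeric codes, and the helper `count_by_type` are hypothetical and do not appear in the specification.

```python
from enum import IntEnum


class ElementType(IntEnum):
    """Per-pixel element types from the description (numeric codes are assumed)."""
    BACKGROUND = 0  # neither a line segment nor a character region
    CHARACTER = 1   # pixel constitutes a character region
    LINE = 2        # pixel constitutes a line segment (an example of a geometric element)


B, C, L = ElementType.BACKGROUND, ElementType.CHARACTER, ElementType.LINE

# A tiny 3x4 label map: exactly one element type per pixel.
label_map = [
    [L, L, L, L],
    [B, C, C, B],
    [B, B, B, B],
]


def count_by_type(labels):
    """Count how many pixels carry each element type."""
    counts = {t: 0 for t in ElementType}
    for row in labels:
        for t in row:
            counts[t] += 1
    return counts


counts = count_by_type(label_map)
```

Because every pixel holds exactly one value, the per-type counts always sum to the number of pixels in the image.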

The trained model can be used in a device (hereinafter also referred to as a "determination device") that determines what content shown in an image each pixel of the image constitutes. For example, the determination device determines whether a pixel in the image belongs to a character or to some other element. Here, the other element is, for example, a geometric figure. A geometric figure is, for example, a line, a line segment, or a group of symbols arranged so as to satisfy a certain condition.

The image is an image including lines and characters. Ruled lines, frame lines, and the like may be formed by combining lines or by bending (or curving) part of a line. The image is the image to be determined by the determination device. That is, the image is an example of a "target image".

When a target image is input, the trained model determines the element type of each pixel in the target image. To be able to make this determination, the trained model is generated in advance by machine learning using learning data. The learning data is generated by the learning data generation device 10.

The learning data generation device 10 generates, for example, a data set for training in supervised learning. A data set is a set of input data and the teacher data corresponding to that input data. The input data is the data given as input during training. The input data according to the present embodiment is an image including characters and geometric figures. The teacher data is data indicating the correct answer for the output data produced from the input data. The teacher data according to the present embodiment is information that associates each pixel of the image serving as input data with the element type of that pixel.

(Basics of DCNN)
A trained model in supervised learning is generated by training a model such as a DCNN (Deep Convolutional Neural Network) using a training data set. A DCNN is a deep neural network that uses convolution layers as its main components. In image recognition, using a two-dimensional convolution layer as the input layer of the DCNN makes it possible to efficiently recognize image features that take into account both the pixel of interest and the pixels in its vicinity. It is also known that stacking two-dimensional convolutions in multiple layers makes it possible to recognize global image features that take into account not only the vicinity of the pixel of interest but also pixels farther away.
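The effect of stacking convolutions can be illustrated with a plain-Python sketch. The function `conv2d` below is a hypothetical helper written for this illustration (no padding, stride 1, and, as in most deep learning frameworks, computed as a cross-correlation); it is not taken from the specification. One 3x3 layer sees a 3x3 neighborhood, while two stacked 3x3 layers make a single output depend on a 5x5 neighborhood.

```python
def conv2d(image, kernel, bias=0.0):
    """Valid 2D convolution (cross-correlation form) of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


kernel = [[1.0] * 3 for _ in range(3)]

# 5x5 image of ones: one layer gives a 3x3 map, two layers give a single value.
image = [[1.0] * 5 for _ in range(5)]
first = conv2d(image, kernel)
second = conv2d(first, kernel)

# Changing a corner pixel, two pixels away from the center, still changes the
# output of the second layer: its receptive field covers the whole 5x5 image.
changed = [row[:] for row in image]
changed[0][0] = 2.0
second_changed = conv2d(conv2d(changed, kernel), kernel)
```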

(Training of DCNN)
The computation of a convolution layer can be expressed as a mathematical linear transformation (y = <W, x> + b). In other words, it is a differentiable expression. A differentiable computational layer can be trained using backpropagation, the well-known principle of supervised learning for neural networks.
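Written out component-wise, the differentiability of the expression is easy to see. In the following block, the loss symbol L is an assumption introduced only for illustration; W, x, b, and y are the symbols used in the text.

```latex
y = \langle W, x \rangle + b = \sum_{i} W_i x_i + b,
\qquad
\frac{\partial y}{\partial W_i} = x_i,
\qquad
\frac{\partial y}{\partial b} = 1,
\qquad
\frac{\partial L}{\partial W_i} = \frac{\partial L}{\partial y}\, x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial y}.
```

The last two identities follow from the chain rule and are what backpropagation uses to pass the error at the output of a layer back to its parameters.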

In a DCNN, when data is output from a unit in one layer to a unit in a deeper layer, the output data has applied to it a weight W, which corresponds to the coupling coefficient of the node connecting the units, and a bias component b. The learning model performs the operations between units on the input data and outputs the output data from the output layer.

The training data set in the present embodiment is information that associates the image information serving as input with the element type of each of its pixels.

During training, the input data of the training data set is input to the learning model. The learning model is trained by adjusting its parameters (weight W and bias component b) so that the data output from the output layer for the input data (output data) approaches the output of the training data set (teacher data).

For example, backpropagation is used to adjust the parameters (weight W and bias component b) of the DCNN model. In backpropagation, the degree of deviation between the data output from the output layer of the learning model and the output of the training data set is expressed as a loss function. Any index may be used for the degree of deviation; for example, the squared error or the cross entropy is used. In backpropagation, the values of the weight W and the bias component b are determined (updated) in the direction from the output layer toward the input layer so that the loss function is minimized. This trains the learning model and improves the accuracy of its determinations.
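The update rule can be sketched for a single unit y = <W, x> + b with a squared-error loss. This is a minimal illustration, not the training procedure of the embodiment: the function names, the learning rate `lr`, and the sample values are all assumptions, and a real DCNN applies the same idea across many layers.

```python
def forward(W, b, x):
    """Compute y = <W, x> + b for one unit."""
    return sum(w_i * x_i for w_i, x_i in zip(W, x)) + b


def squared_error(y, t):
    """Degree of deviation between the output y and the teacher value t."""
    return (y - t) ** 2


def train_step(W, b, x, t, lr=0.1):
    """One gradient-descent update of W and b (lr is an assumed learning rate)."""
    y = forward(W, b, x)
    grad_y = 2.0 * (y - t)  # dL/dy for the squared error
    new_W = [w_i - lr * grad_y * x_i for w_i, x_i in zip(W, x)]  # dL/dW_i = dL/dy * x_i
    new_b = b - lr * grad_y  # dL/db = dL/dy
    return new_W, new_b


W, b = [0.0, 0.0], 0.0
x, t = [1.0, 2.0], 1.0  # one input and its teacher value

losses = []
for _ in range(10):
    losses.append(squared_error(forward(W, b, x), t))
    W, b = train_step(W, b, x, t)
final_loss = squared_error(forward(W, b, x), t)
```

Each step moves the output closer to the teacher value, so the recorded loss shrinks from its initial value of 1.0 toward zero.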

The learning model is not limited to a DCNN. For example, a method such as a decision tree, hierarchical Bayes, or an SVM (Support Vector Machine) may be used as the learning model.

To realize the function of generating learning data, the learning data generation device 10 includes an image data generation unit 110 and a teacher data generation unit 120, as shown in FIG. 1.

(Image data generation unit 110)
The image data generation unit 110 has a function of generating image data for input in machine learning based on generation conditions. For example, the image data generation unit 110 generates an image including characters and geometric figures based on the generation conditions. After generation, the image data generation unit 110 outputs the generated image to, for example, a storage device (not shown) and causes the storage device to store the image.

The generation conditions may be set by the user or may be set automatically by the image data generation unit 110. The user sets the generation conditions by entering them through an input interface such as a keyboard or a touch panel. The image data generation unit 110 may, for example, set the generation conditions randomly by randomly changing the setting items, or may set the generation conditions based on conditions specified by the user.

The generation conditions are set as a combination of various conditions. Examples of the main generation conditions include image generation conditions, character generation conditions, geometric figure generation conditions, and background generation conditions.

Specifically, the image generation condition is the setting of the size of the image to be generated. The character generation conditions are settings such as the font, size, weight, number, and position of the characters to be generated, and whether they are inverted. The geometric figure generation conditions are settings such as the type, size, number, line width, and position of the geometric figures to be generated, the line type of their boundaries (for example, solid line or dotted line), the fill condition (for example, white fill or solid fill), and the corner shape (for example, square or rounded). The background generation conditions are settings such as the presence or absence of a background pattern and the type of background.

The image data generation unit 110 generates an image including characters and geometric figures as image data for input, based on a combination of the image generation conditions, character generation conditions, geometric figure generation conditions, and background generation conditions described above. This allows the image data generation unit 110 to generate diverse images as image data for input according to the diverse generation conditions obtained by combining these conditions.
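One way to picture such a combination of conditions is a single record whose fields are drawn at random. The sketch below is hypothetical throughout: the class `GenerationConditions`, its field names, and every candidate value are assumptions chosen to mirror the settings listed above (positions are omitted for brevity), and the specification does not prescribe any particular data structure.

```python
import random
from dataclasses import dataclass

# Candidate values are illustrative assumptions, not values from the specification.
FIGURE_KINDS = ("line", "rectangle")
LINE_STYLES = ("solid", "dotted")
FILLS = ("white", "solid")
CORNERS = ("square", "round")


@dataclass(frozen=True)
class GenerationConditions:
    image_size: tuple         # image generation condition (width, height)
    font: str                 # character generation conditions
    char_size: int
    char_weight: str
    char_count: int
    char_inverted: bool
    figure_kind: str          # geometric figure generation conditions
    figure_count: int
    line_width: int
    line_style: str
    fill: str
    corner: str
    background_pattern: bool  # background generation condition


def random_conditions(rng: random.Random) -> GenerationConditions:
    """Draw one random combination of generation conditions."""
    return GenerationConditions(
        image_size=rng.choice(((640, 480), (1024, 768))),
        font=rng.choice(("serif", "sans-serif")),
        char_size=rng.randint(8, 32),
        char_weight=rng.choice(("regular", "bold")),
        char_count=rng.randint(1, 200),
        char_inverted=rng.random() < 0.1,
        figure_kind=rng.choice(FIGURE_KINDS),
        figure_count=rng.randint(1, 10),
        line_width=rng.randint(1, 5),
        line_style=rng.choice(LINE_STYLES),
        fill=rng.choice(FILLS),
        corner=rng.choice(CORNERS),
        background_pattern=rng.random() < 0.5,
    )


samples = [random_conditions(random.Random(seed)) for seed in range(3)]
```

Seeding the random number generator makes each combination reproducible, which is convenient when a generated image has to be regenerated later from its conditions.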

The generation conditions are not limited to these examples, and other conditions may be set. For example, whether to add noise may be set. When noise addition is enabled, the image data generation unit 110 generates an image that contains noise in addition to the characters and geometric figures, based on generation conditions that include the noise setting. This reproduces in the image data the noise that arises, for example, when printed matter is scanned with a scanner. The image data generation unit 110 can therefore generate more realistic image data.

Noise caused by scanning takes the form of, for example, dots or lines, and may appear evenly over the entire image or only in parts of it. Examples of noise types include Gaussian noise, impulse noise, and noise caused by dirt on the reading unit of the scanner.
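Two of the named noise types can be sketched on a grayscale image held as a list of rows. The functions below are hypothetical illustrations: the parameter names `ratio` and `sigma`, the 0 to 255 pixel range, and the way values are clamped are assumptions, since the specification does not state how noise is synthesized.

```python
import random


def add_impulse_noise(image, ratio, rng):
    """Replace roughly `ratio` of the pixels with pure black or white dots."""
    noisy = [row[:] for row in image]  # copy so the clean image is kept
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < ratio:
                row[x] = rng.choice((0, 255))
    return noisy


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clamp to 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


rng = random.Random(0)
clean = [[255] * 8 for _ in range(8)]  # blank white 8x8 page
speckled = add_impulse_noise(clean, 0.2, rng)
grainy = add_gaussian_noise(clean, 10.0, rng)
```

Both functions return a new image and leave the clean image untouched, so the same clean image can be reused with different noise settings.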

The generation conditions can be changed automatically after the image data generation unit 110 generates an image. When the generation conditions are changed, the image data generation unit 110 generates an image based on the changed generation conditions. The automatic change of the generation conditions and the generation of an image based on the changed conditions are then repeated. This allows the learning data generation device 10 to automatically generate a large number of diverse images.
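The repeated change of conditions can be sketched as a generator that re-draws one setting per iteration. Everything here is an assumption made for illustration: the specification does not say how many settings change at a time, and the names `mutate`, `condition_stream`, and the example settings are hypothetical.

```python
import random


def mutate(conditions, choices, rng):
    """Return a copy of `conditions` with one randomly chosen setting re-drawn."""
    key = rng.choice(sorted(choices))
    changed = dict(conditions)
    changed[key] = rng.choice(choices[key])
    return changed


def condition_stream(initial, choices, count, seed=0):
    """Yield `count` sets of conditions, changing them after each one is used."""
    rng = random.Random(seed)
    current = dict(initial)
    for _ in range(count):
        yield current
        current = mutate(current, choices, rng)


CHOICES = {
    "line_style": ("solid", "dotted"),
    "line_width": (1, 2, 3),
    "add_noise": (False, True),
}
initial = {"line_style": "solid", "line_width": 1, "add_noise": False}

stream = list(condition_stream(initial, CHOICES, 5))
```

An image generator would consume each yielded set of conditions in turn, so the loop of "generate, change conditions, generate again" runs without user input.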

(Teacher data generation unit 120)
The teacher data generation unit 120 has a function of generating teacher data corresponding to the input data. For example, the teacher data generation unit 120 generates, as teacher data for machine learning, information that associates the pixels of the image generated as input data with the element types of those pixels. For example, the teacher data generation unit 120 associates each pixel of the image with exactly one element type: character element, line segment element, or background element. After generation, the teacher data generation unit 120 outputs the generated teacher data to, for example, a storage device (not shown) and causes the storage device to store the teacher data.

The teacher data generation unit 120, for example, refers to the generation conditions used when the image was generated to obtain the element type of each pixel, and generates teacher data in which the obtained element type is associated with each pixel. The teacher data generation unit 120 generates, for example, teacher data for each element type.
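Because the image is synthesized from known conditions, the label of every pixel is known at drawing time. The sketch below illustrates this idea under stated assumptions: `render` is a hypothetical function, lines are restricted to horizontal segments, and each character is stood in for by a filled rectangle representing its character region rather than by a real glyph.

```python
BACKGROUND, CHARACTER, LINE = 0, 1, 2  # assumed numeric codes for the element types


def render(width, height, lines, char_boxes):
    """Draw lines and character regions, recording the element type of each pixel.

    lines:      list of (y, x0, x1) horizontal segments, x1 inclusive
    char_boxes: list of (top, left, bottom, right) character regions, inclusive
    """
    image = [[255] * width for _ in range(height)]          # white page
    labels = [[BACKGROUND] * width for _ in range(height)]  # teacher data
    for y, x0, x1 in lines:
        for x in range(x0, x1 + 1):
            image[y][x] = 0
            labels[y][x] = LINE
    for top, left, bottom, right in char_boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                image[y][x] = 0
                labels[y][x] = CHARACTER
    return image, labels


image, labels = render(8, 6, lines=[(0, 0, 7)], char_boxes=[(2, 1, 3, 4)])
```

No manual annotation step is needed: the same drawing instructions that produce the input image also produce its teacher data.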

The teacher data generation unit 120 generates teacher data indicating the pixels associated with the character element by setting pixels whose element type is the character element to one specific color (for example, black) and all other pixels to another color (for example, white).
The teacher data generation unit 120 generates teacher data indicating the pixels associated with the line segment element by setting pixels whose element type is the line segment element to one specific color (for example, black) and all other pixels to another color (for example, white).
The teacher data generation unit 120 generates teacher data indicating the pixels associated with the background element by setting pixels whose element type is the background element to one specific color (for example, black) and all other pixels to another color (for example, white).
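The three per-type teacher images can be derived from a single per-pixel label map. In this minimal sketch the function `masks_by_type`, the numeric element type codes, and the grayscale values 0 (black) and 255 (white) are assumptions for illustration.

```python
BACKGROUND, CHARACTER, LINE = 0, 1, 2  # assumed numeric codes
BLACK, WHITE = 0, 255                  # assumed grayscale values


def masks_by_type(labels, element_types=(CHARACTER, LINE, BACKGROUND)):
    """Build one black-and-white teacher image per element type."""
    return {
        t: [[BLACK if v == t else WHITE for v in row] for row in labels]
        for t in element_types
    }


labels = [
    [LINE, LINE, LINE, LINE],
    [BACKGROUND, CHARACTER, CHARACTER, BACKGROUND],
    [BACKGROUND, BACKGROUND, BACKGROUND, BACKGROUND],
]
masks = masks_by_type(labels)
```

Since every pixel has exactly one element type, each pixel is black in exactly one of the three teacher images and white in the other two.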

The teacher data generation unit 120 may further set region information in the teacher data, based on the teacher data generated for each element type. Region information is information indicating the region of the image to which the pixels of each element type are assigned. The teacher data generation unit 120 sets region information in the teacher data, for example, for each element type.

In the teacher data indicating the pixels associated with the character element, the teacher data generation unit 120 sets a region containing the character element pixels, shown in the specific color (for example, black), as a character region.
In the teacher data indicating the pixels associated with the line segment element, the teacher data generation unit 120 sets a region containing the line segment element pixels, shown in the specific color (for example, black), as a line segment region.
In the teacher data indicating the pixels associated with the background element, the teacher data generation unit 120 sets a region containing the background element pixels, shown in the specific color (for example, black), as a background region.

A region of each element type may contain only one pixel, or may contain a plurality of adjacent pixels of the same element type. Setting region information for each element type in this way allows the teacher data to show clearly, region by region, how the element types of the pixels differ.
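One possible way to derive such regions is to group adjacent black pixels of a teacher image and record a bounding box per group. This is only a sketch of one interpretation: the specification does not say how regions are computed, and the function `regions`, the use of 4-connectivity, and the bounding-box representation are assumptions.

```python
from collections import deque


def regions(mask, on=0):
    """Return bounding boxes (top, left, bottom, right) of 4-connected `on` pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != on or seen[y][x]:
                continue
            # Breadth-first search over the adjacent pixels of the same type.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == on and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


mask = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 0],
]
boxes = regions(mask)
```

In this example the 2x2 block forms one region and the isolated pixel forms a second, single-pixel region, matching the statement that a region may contain one pixel or several adjacent pixels.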

An example of learning data generation will now be described with reference to FIG. 2. FIG. 2 is a diagram showing an example of learning data generation according to the embodiment of the present invention. As described above, the learning data includes input data and teacher data, and the input data and its corresponding teacher data form one data set.

For example, as shown in FIG. 2, input data 20-1 and its three corresponding pieces of teacher data, namely teacher data 30-1, teacher data 31-1, and teacher data 32-1, form one data set 40-1. Teacher data 30-1 is teacher data indicating the pixels associated with the character element, and its black regions indicate character regions. Teacher data 31-1 is teacher data indicating the pixels associated with the line segment element, and its black regions indicate line segment regions. Teacher data 32-1 is teacher data indicating the pixels associated with the background element, and its black regions indicate background regions.

Likewise, as shown in FIG. 2, input data 20-2 and its three corresponding pieces of teacher data, namely teacher data 30-2, teacher data 31-2, and teacher data 32-2, form one data set 40-2. Teacher data 30-2 is teacher data indicating the pixels associated with the character element, and its black regions indicate character regions. Teacher data 31-2 is teacher data indicating the pixels associated with the line segment element, and its black regions indicate line segment regions. Teacher data 32-2 is teacher data indicating the pixels associated with the background element, and its black regions indicate background regions.

<Processing flow>
The configuration example of the learning data generation device 10 has been described above. Next, the flow of processing in the learning data generation device 10 according to the present embodiment will be described. FIG. 3 is a flowchart showing the flow of processing in the learning data generation device 10 according to the embodiment of the present invention.

First, the image data generation unit 110 of the learning data generation device 10 generates image data for input based on the generation conditions (S102).

Next, the teacher data generation unit 120 generates teacher data corresponding to the generated image data (S104).
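The two steps can be sketched as a small loop that produces one (input data, teacher data) pair per set of conditions. This is a deliberately reduced illustration: the functions, the dictionary keys, and the random ranges are hypothetical, and the generated pages contain only full-width horizontal lines so that the example stays short.

```python
import random

BACKGROUND, LINE = 0, 2  # assumed numeric codes for the element types used here


def generate_image(conditions):
    """S102: draw the input image from the generation conditions."""
    w, h = conditions["width"], conditions["height"]
    image = [[255] * w for _ in range(h)]
    for y in conditions["line_rows"]:
        image[y] = [0] * w  # a full-width black line
    return image


def generate_teacher(conditions):
    """S104: derive the per-pixel element types from the same conditions."""
    w, h = conditions["width"], conditions["height"]
    labels = [[BACKGROUND] * w for _ in range(h)]
    for y in conditions["line_rows"]:
        labels[y] = [LINE] * w
    return labels


def build_dataset(count, seed=0):
    """Repeat S102 and S104 with freshly drawn conditions each time."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(count):
        height = rng.choice((4, 6, 8))
        conditions = {
            "width": rng.choice((4, 8)),
            "height": height,
            "line_rows": sorted(rng.sample(range(height), 2)),
        }
        dataset.append((generate_image(conditions), generate_teacher(conditions)))
    return dataset


dataset = build_dataset(3)
```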

As described above, the learning data generation device 10 according to the present embodiment first generates, based on the generation conditions, an image including characters and geometric figures as input data for machine learning. The learning data generation device 10 then generates, as teacher data for the machine learning, information in which an element type is associated with each pixel of the image generated as input data. With this configuration, the learning data generation device 10 generates both the input data and the teacher data from the generation conditions, so each time the generation conditions change it can generate diverse input data and teacher data that reflect the changed conditions. The learning data generation device 10 can therefore efficiently prepare diverse learning data for machine learning in document image recognition.

The embodiment of the present invention has been described above. The learning data generation device 10 in the above-described embodiment may be realized by a computer. In that case, a program for realizing this function may be recorded on a computer-readable recording medium, and the function may be realized by causing a computer system to read and execute the program recorded on the recording medium. The term "computer system" as used herein includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may further include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. The program may realize only a part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although an embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to the one described above, and various design changes and the like can be made without departing from the gist of the invention.

10 ... Learning data generation device
110 ... Image data generation unit
120 ... Teacher data generation unit

Claims

A learning data generation device comprising:
an image data generation unit that generates, based on a generation condition, an image including characters and geometric figures as input data for machine learning; and
a teacher data generation unit that generates, as teacher data for the machine learning, information in which an element type is associated with the pixels of the generated image, the element type distinguishing whether a pixel in the image is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure.

The learning data generation device according to claim 1, wherein the teacher data generation unit sets, in the teacher data, a character area that is an area including the pixels with which the character element is associated.

The learning data generation device according to claim 1 or claim 2, wherein the image data generation unit generates, based on the generation condition, the image further including noise.

A learning data generation method comprising:
generating, by an image data generation unit, based on a generation condition, an image including characters and geometric figures as input data for machine learning; and
generating, by a teacher data generation unit, as teacher data for the machine learning, information in which an element type is associated with the pixels of the generated image, the element type distinguishing whether a pixel in the image is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure.

A program that causes a computer to function as:
an image data generation unit that generates, based on a generation condition, an image including characters and geometric figures as input data for machine learning; and
a teacher data generation unit that generates, as teacher data for the machine learning, information in which an element type is associated with the pixels of the generated image, the element type distinguishing whether a pixel in the image is a character element indicating an element constituting a character, a geometric element indicating an element constituting a geometric figure, or a background element indicating an element constituting a background that is neither a character nor a geometric figure.
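The two dependent device claims add a character area derived from the character-labelled pixels and noise added to the generated image. The following minimal sketch shows one way each could look. It rests on assumptions that the claims do not state: the character area is taken to be a single inclusive bounding box, the noise is salt-and-pepper noise applied only to the image while the label map is left unchanged, and the names `character_area` and `add_noise`, the `rate` and `seed` parameters, and the numeric label values are all hypothetical.

```python
import random

# Element types; the numeric values are illustrative assumptions.
BACKGROUND, CHARACTER, GEOMETRIC = 0, 1, 2


def character_area(labels):
    """Return the inclusive bounding box (left, top, right, bottom) of all
    pixels labelled CHARACTER, or None if there are no such pixels."""
    points = [
        (x, y)
        for y, row in enumerate(labels)
        for x, value in enumerate(row)
        if value == CHARACTER
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def add_noise(image, rate, seed):
    """Return a copy of image in which each pixel is replaced, with
    probability `rate`, by black (0) or white (255)."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # leave the caller's image untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < rate:
                row[x] = rng.choice((0, 255))
    return noisy


# Usage on a hand-written 3x4 label map and a matching image.
labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [2, 2, 2, 2],
]
image = [[255 if v == BACKGROUND else 0 for v in row] for row in labels]
print(character_area(labels))  # (1, 1, 2, 1)
noisy_image = add_noise(image, rate=0.1, seed=0)
```

Treating `rate` and `seed` as fields of the generation condition would keep noisy samples reproducible, which fits the idea that a changed condition yields a different but regenerable sample.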