JP2022120308A

JP2022120308A - Image processing apparatus and program

Info

Publication number: JP2022120308A
Application number: JP2021017121A
Authority: JP
Inventors: 裕太郎平岡; Yutaro Hiraoka
Original assignee: Japan Research Institute Ltd
Current assignee: Japan Research Institute Ltd
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2022-08-18

Abstract

To provide an image processing apparatus which processes an image of a document in a fixed format, and a program.

SOLUTION: In an image processing system, an image processing unit includes: a target image acquisition unit which acquires an image to be processed, which is a captured image of a document in a fixed format; a template image acquisition unit which acquires a template image of the fixed format; a feature point pair extraction unit which extracts multiple pairs of feature points having similar image information between the image to be processed and the template image, from the image to be processed and the template image; and an image transformation unit which performs projective transformation on the image to be processed on the basis of positions of the extracted pairs of feature points, to form a front image which is to be obtained when the document is seen from the front. In the fixed format, a common area is defined where common information is to be shared with a plurality of documents. The template image does not include image information of areas other than the common area, and includes at least a part of a plurality of character images in the common area.

SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing apparatus and a program.

Patent Document 1 describes calculating a projective transformation matrix that expresses the relative positional relationship between a product image registered in advance in a dictionary and an input image, based on the correspondence between the local feature point information of the registered product image and the local feature point information extracted from the input image, and then performing a projective transformation. Patent Document 2 describes recognizing characters detected in an image and determining their font.
[Prior Art Documents]
[Patent Documents]
[Patent Document 1] JP 2017-187971
[Patent Document 2] JP 2013-188935

A first aspect of the present invention provides an image processing apparatus. The image processing apparatus includes a target image acquisition unit that acquires a processing target image, which is a captured image of a fixed-format document. The image processing apparatus includes a template image acquisition unit that acquires a template image of the fixed format. The image processing apparatus includes a feature point pair extraction unit that extracts, from the processing target image and the template image, a plurality of feature point pairs whose image information is similar between the processing target image and the template image. The image processing apparatus includes an image conversion unit that, based on the positions of the plurality of feature point pairs extracted by the feature point pair extraction unit, projectively transforms the processing target image into a front image that should be obtained when the document is viewed from the front. The fixed format defines a common area, which is an area that should hold information common to a plurality of documents. The template image does not include image information of areas other than the common area, and includes at least some of the plurality of character images in the common area.

The template image may include at least part of the images of the frame lines contained in the common area.

For each of a plurality of feature points set in each of a plurality of image regions included in the template image, the feature point pair extraction unit may extract a feature point having similar image information from the processing target image, thereby extracting the plurality of feature point pairs. The image processing apparatus may include a region selection unit that counts, for each of the plurality of image regions, the number of feature points with similar image information that were extracted from the processing target image for the feature points set in that image region, and that selects, based on the counts, some of the image regions to be used for the projective transformation. The image conversion unit may projectively transform the processing target image into the front image based on the positions of the feature point pairs extracted for the feature points set in the image regions selected by the region selection unit.

When selecting the image regions to be used for the projective transformation, the region selection unit may give higher priority to image regions for which a larger number of feature points was counted.

For each of the plurality of feature points set in each of the plurality of image regions, the feature point pair extraction unit may extract a feature point having similar image information from the processing target image, and may then preferentially select, from among the feature points extracted from the processing target image, a subset of feature points for which the positional relationship between the feature points extracted as feature point pairs is closer, as the feature points constituting the plurality of feature point pairs.

The image processing apparatus may include a storage unit that stores the template image.

The fixed-format document may be a document that includes a photographic image of a person at a predetermined position. The image processing apparatus may include an identification unit that identifies a region containing the photographic image of the person in the processing target image, using a trained model obtained by machine learning with photographic images of persons included in fixed-format documents as training data. The image conversion unit may projectively transform the processing target image into the front image further based on the position of the region containing the photographic image identified by the identification unit.

The fixed-format document may be a driver's license for an automobile or a motorized bicycle, a passport, or a health insurance card.

The image processing apparatus may further include a character image extraction unit that extracts a character image from the front image, and a character processing unit that outputs information indicating a difference between the character image extracted by the character image extraction unit and an image of a reference character having a predetermined character shape.

The character processing unit may include a processing target character acquisition unit that acquires the character image extracted by the character image extraction unit as an image of a character to be processed. The character processing unit may include a storage unit that stores a trained model which is generated by machine learning using the image of the reference character and images of a plurality of characters having mutually different character shapes, and which generates, from an input character image, a character image adapted to the predetermined character shape. The character processing unit may include a character image generation unit that uses the trained model to generate, from the image of the character to be processed, an image of that character adapted to the predetermined character shape. The character processing unit may include a difference information output unit that outputs information indicating a difference between the character to be processed and the reference character, based on a result of comparing the image generated by the character image generation unit with the image of the reference character.

A second aspect provides a program. The program causes a computer to function as the image processing apparatus described above.

Note that the above summary of the invention does not enumerate all of the necessary features of the present invention. Subcombinations of these feature groups may also constitute inventions.

Schematically shows the overall configuration of an image processing system 10 in one embodiment.
Shows functional blocks of an image processing unit 40.
Illustrates image clipping processing by an image clipping unit 43.
Schematically shows a fixed format 50 of a driver's license together with an example of a driver's license 52.
Schematically shows a template image 60 used by a feature point pair extraction unit 44.
Schematically shows a result of feature point matching by the feature point pair extraction unit 44.
A table for explaining the processing by which a region selection unit 46 selects regions.
Shows an image region 90 of the template image 60 that should contain a photographic image.
Illustrates processing in which projective transformation is performed further using the position of a photographic image of a person.
Schematically shows the flow of processing executed by the image processing system 10.
Shows functional blocks of a character processing unit 200, a learning device 202, and a character analysis device 280.
Shows the structure of training data.
Shows a conceptual configuration of a learner that executes machine learning in a model generation unit 206.
Shows an example of difference information output by a difference information output unit 240.
Shows enlarged views of a character image 620-1, a character image 630-1, a character image 640-1, and a comparison image 650-1.
Shows, as a reference example, the character image 640-1 superimposed on the character image 620-1.
Shows an example of the data structure of unique character information 160.
Shows an example of a computer 2000 according to the present embodiment.

The present invention is described below through embodiments of the invention, but the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are necessarily essential to the solution of the invention. In the drawings, identical or similar parts may be given the same reference numerals, and duplicate descriptions may be omitted.

FIG. 1 schematically shows the overall configuration of an image processing system 10 in one embodiment. The image processing system 10 includes an image processing apparatus 12, a learning device 202, a character analysis device 280, a storage device 290, and a display device 88. The image processing apparatus 12 includes an image processing unit 40 and a character processing unit 200. The storage device 290 is a storage unit that stores font data 100, a model 120, and unique character information 160.

The processing target image 20 and the analysis target image 30 are captured images of fixed-format documents. The processing target image 20 and the analysis target image 30 are images of identity verification documents. As an example, the processing target image 20 and the analysis target image 30 are images generated by photographing a card such as a driver's license for an automobile or a motorized bicycle, or a health insurance card. The processing target image 20 is the image to be processed by the image processing unit 40 and the character processing unit 200. In this embodiment, a driver's license is used as an example of a fixed-format document.

The image processing unit 40 has a function of extracting, from the processing target image 20, the characters used on the driver's license. The processing target image 20 is, for example, an image of a driver's license captured by its owner using the camera function of a smartphone or the like. The processing target image 20 may therefore contain background noise, such as the pattern of the desk or floor on which the driver's license was placed. The image of the driver's license in the processing target image 20 may also be distorted, for example because the driver's license was not photographed from the front. In addition, the processing target image 20 may include the photographer's shadow, shading variations caused by the light source, and the like. Consequently, if character images are extracted directly from the processing target image 20, the characters used on the driver's license may not be extracted with their accurate shapes.

The image processing unit 40 therefore performs feature point matching between the processing target image 20 and a template image, and calculates a projective transformation matrix for converting the processing target image 20 into a front image, that is, an image of the driver's license as photographed from the front. The image processing unit 40 generates the front image of the processing target image 20 by projectively transforming the processing target image 20 using the calculated projective transformation matrix. The image processing unit 40 then extracts images of the characters used on the driver's license from the front image generated by the projective transformation. For example, the image processing unit 40 extracts the character images from the area where the name of the driver's license owner is printed and from the area where the owner's address is printed.

The template image used by the image processing unit 40 includes only images of the parts that are always present as part of the fixed format of the driver's license. For example, the template image does not include images of the parts where personal information such as the owner's name and address is printed, but does include images of the part where the characters "氏名" (name) are printed in the name field and the part where the characters "住所" (address) are printed in the address field. The template image also includes, for example, images of the parts where each of the characters "運", "転", "免", "許", and "証" of "運転免許証" (driver's license) is printed. Note that, on a driver's license, the era-name characters printed in the date-of-birth field, the part where the driver's license number is printed, and similar parts may use different fonts or different typefaces depending on the issuer. It is desirable that the template image not include character images from parts where the font or typeface may differ depending on the issuer. Because the template image thus includes only images of the parts that are always present as part of the fixed format of the driver's license, an accurate projective transformation matrix can be calculated. This makes it possible to extract images that accurately represent the shapes of the characters used in the name field and the address field of the driver's license.

The character images extracted by the image processing unit 40 are input to the character processing unit 200. The character processing unit 200 has a function of generating information indicating which of the characters used on the driver's license have a characteristic character shape. An outline of the functions of the character processing unit 200 follows.

The learning device 202 generates, by machine learning using character images of a plurality of fonts included in the font data 100, a trained model 120 that adapts the character shape of an input character to the character shape of a reference font. As an example, the model 120 is a neural network model for adapting decorative character shape elements such as "tome" (stop), "hane" (hook), and "harai" (sweep) to the character shape elements of the reference font. The processing by which the learning device 202 generates the model 120 is described later.

The character processing unit 200 uses the model 120 to adapt the characters extracted from the processing target image 20 to the reference character shape, then generates information indicating which of the characters included in the processing target image 20 have a characteristic character shape, and records that information in the unique character information 160. The information generated by the character processing unit 200 may be displayed on the display device 88, and a judge 80 may make the final determination as to whether a character has a characteristic character shape. In this embodiment, a character having a characteristic character shape may be referred to as a "unique character".

The processing target image 20 and the analysis target image 30 may be, for example, images of driver's licenses issued by the same issuing agency. As an example, the processing target image 20 and the analysis target image 30 may be images of driver's licenses issued by the same issuer. As described above, driver's licenses may be created using a font unique to each issuer. Because the character analysis device 280 analyzes the analysis target image 30 using the unique character information 160, it can recognize the characters with characteristic character shapes that are used in creating the driver's license and can appropriately analyze the characters included in the analysis target image 30. The functions of the character processing unit 200 are described with reference to FIGS. 9 to 17 and other figures.

FIG. 2 shows functional blocks of the image processing unit 40. The image processing unit 40 includes a template image acquisition unit 41, a target image acquisition unit 42, an image clipping unit 43, a feature point pair extraction unit 44, an identification unit 45, a region selection unit 46, an image conversion unit 47, a character image extraction unit 48, and a storage unit 49.

The image processing unit 40 is implemented by a computer. The character processing unit 200, the learning device 202, and the character analysis device 280 may be implemented by any number of computers, one or more. The storage unit 49 may be implemented by a non-volatile storage medium or a volatile storage medium. The storage unit 49 may be implemented by an external storage medium accessible through a communication line such as the Internet.

The target image acquisition unit 42 acquires a processing target image, which is a captured image of a fixed-format document. In this embodiment, the fixed-format document is a driver's license.

The template image acquisition unit 41 acquires a template image of the fixed format. Specifically, the storage unit 49 stores the template image, and the template image acquisition unit 41 acquires the template image from the storage unit 49.

The fixed format defines a common area, which is an area that should hold information common to a plurality of documents. The template image does not include image information of areas other than the common area, and includes at least some of the plurality of character images in the common area. The template image may include at least part of the images of the frame lines contained in the common area.

The feature point pair extraction unit 44 extracts, from the processing target image and the template image, a plurality of feature point pairs whose image information is similar between the processing target image and the template image. For example, the feature point pair extraction unit 44 extracts local feature quantities from each region of the processing target image, selects as a feature point each region whose local feature quantity is similar to the local feature quantity extracted from a feature point set in advance in the template image, and extracts the selected feature point and the feature point set in the template image as a feature point pair. Based on the positions of the plurality of feature point pairs extracted by the feature point pair extraction unit 44, the image conversion unit 47 projectively transforms the processing target image into the front image that should be obtained when the document is viewed from the front.
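
The pairing of template feature points with similar feature points in the processing target image can be sketched as a nearest-neighbor search over local feature descriptors. This is only an illustrative sketch: the text does not specify the descriptor type or the similarity measure, so the Euclidean distance, the `max_distance` threshold, and the function name `match_feature_points` are all assumptions.

```python
import math


def match_feature_points(template_points, target_points, max_distance=0.5):
    """Pair each template feature point with its most similar target point.

    Each point is a (position, descriptor) tuple, where descriptor is a
    sequence of floats. Similarity is measured here by Euclidean distance
    between descriptors (an assumption); max_distance is a hypothetical
    threshold above which a template point is left unpaired.
    """
    pairs = []
    for t_pos, t_desc in template_points:
        best_pos, best_dist = None, float("inf")
        for p_pos, p_desc in target_points:
            dist = math.dist(t_desc, p_desc)
            if dist < best_dist:
                best_pos, best_dist = p_pos, dist
        if best_pos is not None and best_dist <= max_distance:
            # (template position, target position) forms one feature point pair
            pairs.append((t_pos, best_pos))
    return pairs


template = [((10, 10), [1.0, 0.0]), ((50, 10), [0.0, 1.0])]
target = [((12, 14), [0.9, 0.1]), ((55, 13), [0.1, 0.9]), ((80, 80), [5.0, 5.0])]
print(match_feature_points(template, target))
```

A template feature point with no sufficiently similar counterpart simply contributes no pair, which is what allows matches to be counted per image region afterwards.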

For each of a plurality of feature points set in each of a plurality of image regions included in the template image, the feature point pair extraction unit 44 extracts a feature point having similar image information from the processing target image, thereby extracting the plurality of feature point pairs. The region selection unit 46 counts, for each of the plurality of image regions, the number of feature points with similar image information that were extracted from the processing target image for the feature points set in that image region, and selects, based on the counts, some of the image regions to be used for the projective transformation. For example, the region selection unit 46 gives higher priority to image regions for which a larger number of feature points was counted when selecting the image regions to be used for the projective transformation. The image conversion unit 47 projectively transforms the processing target image into the front image based on the positions of the feature point pairs extracted for the feature points set in the image regions selected by the region selection unit 46.
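
The count-then-select behavior of the region selection unit can be illustrated with a short sketch. The region identifiers, the fixed number of regions to keep, and the function name `select_regions` are hypothetical; the text only states that regions with more matched feature points are selected with higher priority.

```python
from collections import Counter


def select_regions(matched_region_ids, num_regions_to_use):
    """Rank template image regions by how many of their feature points matched.

    matched_region_ids holds one region identifier per matched template
    feature point. Regions with more matches come first; ties are broken
    by identifier so the result is deterministic.
    """
    counts = Counter(matched_region_ids)
    ranked = sorted(counts, key=lambda region: (-counts[region], region))
    return ranked[:num_regions_to_use], dict(counts)


# Hypothetical region names for the label areas of a template image.
matches = ["title"] * 5 + ["name_label"] * 2 + ["address_label"] * 4
selected, counts = select_regions(matches, num_regions_to_use=2)
print(selected, counts)
```

Only the feature point pairs that belong to the selected regions would then be passed on to the projective transformation step.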

For example, the image conversion unit 47 calculates, based on the on-image positions of the feature points constituting the feature point pairs, a projective transformation matrix for projectively transforming the processing target image into the front image, and generates the front image by applying the calculated projective transformation matrix to the processing target image. For example, letting (x1, y1) be a pixel position after the projective transformation and (x2, y2) be the pixel position before the projective transformation, the image conversion unit 47 calculates the projective transformation matrix H expressed by the following equation (1).
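
Equation (1) itself is not reproduced in this text. A standard homogeneous-coordinate form that is consistent with the definitions given here, offered as an assumption rather than as the patent's exact notation, is shown below. The scale factor s is an arbitrary non-zero constant introduced for this illustration.

```latex
s \begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix}
= H \begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix},
\qquad
H = \begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
```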

For example, the image conversion unit 47 takes the positions of the feature points set in the template image as (x1, y1) and the positions of the feature points extracted from the processing target image 20 as (x2, y2), and calculates the parameters h11 to h33 of the projective transformation matrix H by numerical computation such as the least squares method.
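
A least-squares estimate of H from feature point pairs might look like the following sketch. It is one common formulation, not necessarily the one used in the embodiment: h33 is fixed to 1 (an assumption), the remaining eight parameters are obtained from the normal equations, and the helper names are hypothetical. In practice, coordinates are usually normalized first for numerical stability.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate feature point configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def estimate_homography(pairs):
    """Estimate H from pairs of ((x1, y1), (x2, y2)).

    (x1, y1) is a template (front image) position and (x2, y2) the matching
    position in the processing target image. h33 is fixed to 1 (assumption).
    """
    if len(pairs) < 4:
        raise ValueError("at least four feature point pairs are required")
    rows, b = [], []
    for (x1, y1), (x2, y2) in pairs:
        rows.append([x2, y2, 1, 0, 0, 0, -x2 * x1, -y2 * x1])
        b.append(x1)
        rows.append([0, 0, 0, x2, y2, 1, -x2 * y1, -y2 * y1])
        b.append(y1)
    # Normal equations (A^T A) h = A^T b give the least-squares solution.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * v for r, v in zip(rows, b)) for i in range(8)]
    h = solve_linear_system(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, point):
    """Map a target-image position (x2, y2) to a front-image position (x1, y1)."""
    x2, y2 = point
    w = H[2][0] * x2 + H[2][1] * y2 + H[2][2]
    x1 = (H[0][0] * x2 + H[0][1] * y2 + H[0][2]) / w
    y1 = (H[1][0] * x2 + H[1][1] * y2 + H[1][2]) / w
    return x1, y1
```

With exactly four pairs the system is determined; with more pairs, the normal equations average out small localization errors in the individual feature points.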

For each of the plurality of feature points set in each of the plurality of image regions, the feature point pair extraction unit 44 may extract a feature point having similar image information from the processing target image, and may then preferentially select, from among the feature points extracted from the processing target image, a subset of feature points for which the positional relationship between the feature points extracted as feature point pairs is closer, as the feature points constituting the plurality of feature point pairs.

The fixed-format document may be a document that includes a photographic image of a person. For example, the fixed-format document may be a driver's license or a passport. When the fixed-format document includes a photographic image of a person, the identification unit 45 identifies a region containing the photographic image of the person in the processing target image, using a trained model obtained by machine learning with photographic images of persons included in driver's licenses as training data. The image conversion unit 47 projectively transforms the processing target image into the front image further based on the position of the region containing the photographic image identified by the identification unit 45.

The character image extraction unit 48 extracts character images from the front image generated by the image conversion unit 47. The character processing unit 200 outputs information indicating a difference between a character image extracted by the character image extraction unit 48 and an image of a reference character having a predetermined character shape.

FIG. 3 illustrates image clipping processing by the image clipping unit 43. The processing target image 20 includes a background image 21 and an image 22 in which the driver's license is captured. The background image 21 is, for example, the part of the image showing the desk, floor, or the like behind the driver's license. The image clipping unit 43 identifies the part in which the driver's license is captured by performing object recognition on the processing target image 20. The image clipping unit 43 may identify that part by detecting, in the processing target image 20, the characteristic image feature quantities that an image of a driver's license should have, and identifying a frame line 23 that encloses the part where those feature quantities were detected. The image clipping unit 43 clips, from the processing target image 20, the image 22 of the part in which the driver's license is captured. The image 22 clipped by the image clipping unit 43 is supplied to the feature point pair extraction unit 44 and the image conversion unit 47. This keeps background noise out of the image processed by the feature point pair extraction unit 44 and the image conversion unit 47.
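
The clipping step can be illustrated by computing an axis-aligned frame around the positions where license-like features were detected and cropping to it. This is a simplified sketch: the detection itself is not shown, the image is represented as a plain list of pixel rows, and the function names and the `margin` parameter are hypothetical.

```python
def bounding_frame(points, width, height, margin=0):
    """Return (left, top, right, bottom) enclosing the detected positions.

    right and bottom are exclusive; the frame is clamped to the image size.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin + 1, width)
    bottom = min(max(ys) + margin + 1, height)
    return left, top, right, bottom


def crop_to_frame(image, frame):
    """Cut the framed part out of an image given as a list of pixel rows."""
    left, top, right, bottom = frame
    return [row[left:right] for row in image[top:bottom]]


# 6x5 dummy image whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(6)] for r in range(5)]
frame = bounding_frame([(2, 1), (4, 3)], width=6, height=5)
print(crop_to_frame(image, frame))
```

Everything outside the frame, such as the desk or floor pattern, is discarded before feature point matching.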

When the driver's license is not photographed from the front, the image 22 of the driver's license may be distorted from a rectangle. The feature point pair extraction unit 44, the region selection unit 46, and the image conversion unit 47 apply a projective transformation to the image 22 to virtually generate the front image that would be obtained if the driver's license were photographed from the front.

FIG. 4 schematically shows the fixed format 50 of a driver's license together with an example driver's license 52. The fixed format 50 includes the characters "氏名" (name) in the name field, the characters "年月日生" (date of birth) in the date-of-birth field, the characters "住所" (address) in the address field, the characters "免許の条件等" (license conditions, etc.) in the license conditions field, the characters "運転免許証" (driver's license), and so on. The information included in the fixed format 50 is common to all driver's licenses. The dotted line in FIG. 4 is drawn only to make the outer frame of the driver's license easy to see and is not part of the fixed format as printed information.

In addition to the common information, the driver's license 52 includes information such as the characters of the name "○○太郎", the characters "平成２１１" in the date-of-birth field, and the characters "○○県××市△△１－１－１" in the address field. Because this information is printed by the issuer of the driver's license, the typeface and font may differ from issuer to issuer.

FIG. 5 schematically shows a template image 60 used by the feature point pair extraction unit 44. As in FIG. 4, the dotted line in FIG. 5 is drawn only to make the outer frame of the driver's license easy to see and is not included in the image information of the template image 60.

The template image 60 is an image consisting of an image of an image area 61 containing the characters "氏名" (name) of the name field, an image of an image area 62 containing the characters "住所" (address) of the address field, an image of an image area 63 containing the character "運", an image of an image area 64 containing the character "転", an image of an image area 65 containing the character "免", an image of an image area 66 containing the character "許", and an image of an image area 67 containing the character "証" (image areas 63 to 67 together spell "運転免許証", driver's license). The image of image area 61 includes an image of a frame line 68 surrounding the characters "氏名", and the image of image area 62 includes an image of a frame line 69 surrounding the characters "住所". The template image 60 may instead take a form that contains only the character images and omits the images of frame lines 68 and 69.

As shown in FIG. 5, the template image 60 includes image information for information that is part of the fixed format 50 and does not include image information for information that is not part of the fixed format 50. The template image 60 therefore consists of image information that every driver's license should have in common, regardless of the issuer. This allows the feature point pair extraction unit 44 to extract feature points from the processing target image 20 more accurately.

FIG. 6 schematically shows the result of feature point matching by the feature point pair extraction unit 44. The feature point pair extraction unit 44 sets a plurality of feature points in image areas 61, 62, 63, 64, 65, 66, and 67 of the template image 60. To make the correspondence between feature points easy to follow, FIG. 6 shows the case where two feature points are set in each of image areas 61 to 67. The number of feature points set in each area is not limited to two. It may be three or more, and it may be any number of one or more.

As an example, a plurality of feature points including feature point 71A and feature point 71B are set in image area 61. Through feature point matching, the feature point pair extraction unit 44 extracts from the image 22 a feature point 71a whose image information is similar to that of feature point 71A and a feature point 71b whose image information is similar to that of feature point 71B. The feature point pair extraction unit 44 extracts feature points 71A and 71a as one feature point pair, and feature points 71B and 71b as another feature point pair.

As another example, a plurality of feature points including feature point 73A and feature point 73B are set in image area 63. Through feature point matching, the feature point pair extraction unit 44 extracts from the image 22 a feature point 73a whose image information is similar to that of feature point 73A and a feature point 73b whose image information is similar to that of feature point 73B. The feature point pair extraction unit 44 extracts feature points 73A and 73a as one feature point pair, and feature points 73B and 73b as another feature point pair.

When a feature point 73c is extracted in addition to feature point 73b as a feature point whose image information is similar to that of feature point 73B, the feature point pair extraction unit 44 excludes feature point 73c from the feature point pairs on the basis of the positional relationship between the other feature points set in the template image 60 and the corresponding feature points extracted from the image 22. For example, the line segment connecting feature point 73B and feature point 73c points in a direction that differs greatly from the line segments connecting the other feature points set in the template image 60 to the feature points extracted from the image 22. The feature point pair extraction unit 44 therefore determines that the positional relationship between feature points 73B and 73c differs from the positional relationships of the other pairs, and excludes the combination of feature points 73B and 73c as a feature point pair for feature point 73B. As a result, the feature point pair extraction unit 44 extracts only the combination of feature points 73B and 73b as the feature point pair for feature point 73B.
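
The direction-based exclusion described above can be sketched as follows. This is a minimal illustration rather than the disclosed procedure: it assumes both images are expressed in a shared coordinate system, takes the median direction of all candidate segments as the reference, and uses a hypothetical tolerance `max_angle_deg`; the text only states that a segment with a greatly different direction is excluded.

```python
import math
from statistics import median


def filter_pairs_by_direction(pairs, max_angle_deg=20.0):
    """Keep candidate pairs whose connecting segment has a typical direction.

    pairs: list of ((x_template, y_template), (x_image, y_image)).
    """
    angles = [math.atan2(yi - yt, xi - xt) for (xt, yt), (xi, yi) in pairs]
    # Reference direction: median of the unit vectors, which avoids
    # problems with angles wrapping around at +/- pi.
    ref = math.atan2(median(math.sin(a) for a in angles),
                     median(math.cos(a) for a in angles))
    limit = math.radians(max_angle_deg)
    kept = []
    for pair, angle in zip(pairs, angles):
        # Smallest absolute difference between the two directions.
        deviation = abs((angle - ref + math.pi) % (2 * math.pi) - math.pi)
        if deviation <= limit:
            kept.append(pair)
    return kept


# Hypothetical coordinates: three consistent pairs and one candidate
# (playing the role of 73B-73c) whose segment points elsewhere.
candidates = [
    ((0, 0), (100, 10)),
    ((0, 50), (100, 62)),
    ((20, 20), (120, 28)),
    ((10, 10), (30, -90)),
]
consistent = filter_pairs_by_direction(candidates)
```

In practice, robust estimators such as RANSAC are a common alternative for rejecting mismatched pairs, but the source does not say which criterion or tolerance is used.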

FIG. 7 is a table for explaining the process by which the region selection unit 46 selects regions. As shown in the "Region" column of FIG. 7, the explanation here takes up image area 61 containing the characters "氏名" (name), image area 62 containing the characters "住所" (address), and image area 63 containing the character "運". As shown in the "Feature point" column of FIG. 7, it is assumed that five feature points 71A to 71E are set in image area 61, five feature points 72A to 72E are set in image area 62, and five feature points 73A to 73E are set in image area 63.

In the table of FIG. 7, the "Feature point pair extraction result" column indicates whether a feature point pair was extracted by the feature point pair extraction unit 44. In this column, "○" indicates that a feature point pair was extracted and "×" indicates that no feature point pair was extracted. A feature point pair may fail to be extracted when, for example, no feature point is extracted because of the photographer's shadow falling on the license or because of changes in shading caused by the light source, or when part of the license is hidden by the photographer's finger. Another possibility, as described for feature point 73B of image area 63 in FIG. 6, is that one or more feature points were extracted from the image 22 but were all excluded on the basis of their positional relationship, so that no feature point pair could be extracted.

In the table of FIG. 7, the "Use" column indicates whether the position information of the feature point pairs is used for the projective transformation. In this column, "○" indicates that the position information of the feature point pairs is used for the projective transformation and "×" indicates that it is not. For example, as shown in the "Feature point pair extraction result" column of FIG. 7, feature point pairs were extracted for four of the five feature points set in each of image area 61 and image area 62. In image area 63, on the other hand, no feature point pair was extracted for three of the five feature points set. In this case, the region selection unit 46 selects image areas 61 and 62 as image areas to be used for the projective transformation and does not select image area 63.

In this way, the region selection unit 46 gives higher priority to image areas with a larger number of extracted feature point pairs when selecting the image areas from which feature point pairs for the projective transformation are taken. The region selection unit 46 may select image areas in which the number of extracted feature point pairs is equal to or greater than a predetermined threshold. The region selection unit 46 may also select image areas in which the ratio of the number of extracted feature point pairs to the number of feature points set in the image area is equal to or greater than a predetermined threshold. Thus, based on the number of extracted feature point pairs, the region selection unit 46 may select a subset of the plurality of image areas set in the template image 60 as the image areas from which feature point pairs for the projective transformation are taken. For example, when many feature point pairs are extracted for the feature points set in a particular image area, the extraction result can be judged more reliable than when only a few are extracted. Image areas whose extraction results are likely to be unreliable can therefore be excluded from the image areas used to select feature point pairs for the projective transformation.
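
The ratio-based selection can be sketched as below, using the values from the FIG. 7 example (four of five pairs extracted in image areas 61 and 62, two of five in image area 63). The function name `select_regions` and the threshold value 0.6 are assumptions for illustration; the source says only that the threshold is predetermined.

```python
def select_regions(extraction_results, min_ratio=0.6):
    """Return the ids of regions whose share of extracted pairs is high enough.

    extraction_results: dict mapping a region id to a list of booleans,
    one per feature point set in that region (True = pair extracted).
    """
    selected = []
    for region_id, flags in extraction_results.items():
        if not flags:
            continue  # no feature points were set in this region
        ratio = sum(flags) / len(flags)
        if ratio >= min_ratio:
            selected.append(region_id)
    return selected


# Values corresponding to the FIG. 7 example.
fig7 = {
    61: [True, True, True, True, False],
    62: [True, False, True, True, True],
    63: [True, False, False, True, False],
}
regions_for_transform = select_regions(fig7)
```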

The image conversion unit 47 calculates a projective transformation matrix H based on the positions of the feature point pairs extracted for the feature points set in the image areas selected by the region selection unit 46. The image conversion unit 47 then applies the calculated projective transformation matrix H to the image 22 to generate a front image. In this way, a front image with reduced distortion can be generated from the distorted image 22 of the driver's license in the processing target image 20. The character image extraction unit 48 then extracts character images from the front image generated by the image conversion unit 47. For example, the character image extraction unit 48 extracts the image of the characters "○○太郎" from the area of the front image in which, according to the name field of the fixed format 50, the name of the license holder is printed. Likewise, the character image extraction unit 48 extracts the image of the characters "○○県××市△△１－１－１" from the area in which, according to the address field of the fixed format 50, the address of the license holder is printed.
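
As a rough illustration of how a projective transformation matrix H can be computed from feature point pairs, the sketch below solves the standard linear system obtained by fixing the bottom-right element of H to 1. The source does not state which estimation method is used; the function names and the least-squares formulation via normal equations are assumptions, and at least four pairs with no three points collinear are required.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(pairs):
    """pairs: list of ((x, y) in image 22, (u, v) in the template image).

    Returns a 3x3 matrix H (list of rows) with H[2][2] fixed to 1.
    """
    if len(pairs) < 4:
        raise ValueError("at least four feature point pairs are needed")
    rows, rhs = [], []
    for (x, y), (u, v) in pairs:
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    # Least squares through the normal equations (A^T A) h = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map a point of image 22 to front-image coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

With pixel coordinates in the hundreds or thousands, normalizing the coordinates before solving is advisable for numerical stability. Image libraries such as OpenCV provide equivalent functionality (`cv2.findHomography` and `cv2.warpPerspective`), though nothing in the source indicates which implementation is used.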

FIGS. 8 and 9 are diagrams for explaining the process of performing the projective transformation while additionally using the position of the photographic image of a person. FIG. 8 shows an image area 90 of the template image 60 in which the photographic image should be contained. Image area 90 is a specific area predetermined in the fixed format 50. With reference to FIGS. 8 and 9, the case where feature points 91A and 91B are set in image area 90 is described. Image area 90 is rectangular. In the example shown in FIG. 8, feature points 91A and 91B are two vertices of the rectangle of image area 90.

FIG. 9 schematically shows the feature point pairs extracted from the image area of the photographic image. The identification unit 45 identifies, in the image 22, an image area 900 containing the photographic image of a person. The feature point pair extraction unit 44 identifies feature point pairs by extracting feature points 91a and 91b from the image area 900 identified by the identification unit 45.

For example, the identification unit 45 uses trained models obtained by machine learning with photographic images of persons contained in driver's licenses as training data. A first trained model is generated by supervised machine learning using a machine learning algorithm such as a convolutional neural network (CNN): image feature amounts extracted from the person portion of the photographic image area in images obtained by capturing driver's licenses are used as training data, and the model learns the relationship between image feature amounts extracted from image areas of a driver's license and the image area of a person. A second trained model is likewise generated by supervised machine learning using a machine learning algorithm such as a CNN: image feature amounts extracted from the background portion behind the person in the photographic image area are used as training data, and the model learns the relationship between image feature amounts extracted from image areas of a driver's license and the image area of the background behind a person.

The identification unit 45 identifies the image area 900 in the image 22 by two determinations. Using the image feature amounts extracted from each image area of the image 22 and the first trained model, it determines whether each image area is an image area of the person contained in the driver's license. Using the image feature amounts extracted from each image area of the image 22 and the second trained model, it determines whether each image area is an image area of the background behind the person contained in the driver's license.

The feature point pair extraction unit 44 extracts, as one feature point pair, feature point 91A set in the template image 60 and feature point 91a, which is one vertex of the image area 900 identified by the identification unit 45. The feature point pair extraction unit 44 also extracts, as another feature point pair, feature point 91B set in the template image 60 and feature point 91b, which is another vertex of the image area 900. Among the vertices that define the outline of image area 900, the feature point pair extraction unit 44 may select as feature point 91a the vertex on the person's right out of the two vertices on the side determined by the identification unit 45 to be the background image area. Similarly, it may select as feature point 91b the vertex on the person's right out of the two vertices on the side determined by the identification unit 45 to be the image area of the person.

As described in relation to FIG. 5 and other figures, the feature point pair extraction unit 44 may determine whether to select each of feature points 91a and 91b as a feature point constituting a feature point pair, based on the positional relationship between the other feature points set in the template image 60 and the corresponding feature points extracted from the image 22.

Although FIG. 8 illustrates the case where two feature points are set in image area 90, the number of feature points set in image area 90 is not limited to two. It may be three or more, or it may be one.

The image conversion unit 47 performs the projective transformation using the feature point pairs extracted for feature points 91A and 91B set in image area 90, in addition to the feature point pairs extracted for the plurality of feature points set in image areas 61, 62, 63, 64, 65, 66, and 67 of the template image 60. In doing so, as described in relation to FIG. 7 and other figures, the region selection unit 46 may determine whether to select image area 900 as an image area to be used for the projective transformation based on the number of feature point pairs extracted from image area 900. When image area 900 is selected as an image area to be used for the projective transformation, the image conversion unit 47 may additionally use the feature point pairs extracted for feature points 91A and 91B in the projective transformation.

As described above, the processing of the image processing unit 40 makes the character images extracted from a fixed-format document less susceptible to differences in the character shapes and fonts used by the issuer of the document. This makes it possible to accurately determine the characters printed by the issuer. This embodiment has been described using a driver's license as an example of a fixed-format document. Besides a driver's license, the fixed-format document may be a card-type document containing personal information, such as a health insurance card or a credit card. The fixed-format document may be a page of a passport that contains the holder's personal information, or a page of a passport that contains the holder's photographic image. The fixed-format document may be any document whose fixed format includes an area in which characters are printed or written.

Next, with reference to FIGS. 10 to 17, the processing of the character processing unit 200 is described in detail as one way of using the character images extracted by the image processing unit 40.

FIG. 10 schematically shows the flow of processing executed by the image processing system 10. The learning device 202 generates a model 120 by machine learning using font data 100. The font data 100 includes data for fonts A, B1, B2, and so on, each of which has a different character shape. Among these fonts, a specific font A is called the reference font A. The reference font A is the font used as the basis of comparison when identifying peculiar characters. The plurality of fonts B1, B2, and so on may be referred to collectively as "font Bi", where "i" is an index identifying each font.

The data of font A and font Bi are used as font data for machine learning. By machine learning with the font data of font A and font Bi, the learning device 202 generates a model 120 that adapts characters of each font Bi to the character shapes of font A. The model 120 is, for example, a model constructed with a neural network. The learning device 202 performs machine learning treating {a character of font Bi, the same character of font A} as one character pair, and generates a model 120 that, given an image of a character in font Bi, produces an image of the character adapted to the character shape of font A.
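
The pairing of font Bi characters with reference font A characters can be sketched as follows. The dictionary-based data layout and the function name `build_character_pairs` are hypothetical; the sketch covers only the assembly of training pairs, not the training of model 120 itself.

```python
def build_character_pairs(reference_font, other_fonts):
    """Pair each glyph of a font Bi with the same character in font A.

    reference_font: dict mapping a character to its glyph image in font A.
    other_fonts: dict mapping a font name to a dict of character -> glyph.
    Returns a list of (input_glyph, target_glyph) tuples.
    """
    pairs = []
    for font_name, glyphs in other_fonts.items():
        for char, glyph in glyphs.items():
            # Only characters present in the reference font can form a pair.
            if char in reference_font:
                pairs.append((glyph, reference_font[char]))
    return pairs


# Strings stand in for glyph images in this illustration.
font_a = {"藤": "A:藤", "太": "A:太"}
fonts_bi = {
    "B1": {"藤": "B1:藤", "太": "B1:太"},
    "B2": {"藤": "B2:藤", "郎": "B2:郎"},
}
training_pairs = build_character_pairs(font_a, fonts_bi)
```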

Once the model 120 has been generated, the character processing unit 200 inputs a character image extracted from the processing target image 20 into the model 120 to generate an adapted character 140. Based on the result of comparing the adapted character 140 with the corresponding character of the reference font A, the character processing unit 200 determines whether the character extracted from the processing target image 20 is a peculiar character, and records information on characters determined to be peculiar in the peculiar character information 160.
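
One simple way to turn the comparison into a decision is to measure the fraction of differing pixels, as in the sketch below. This is an assumption made for illustration: the source does not specify the comparison metric, and the binary image layout, the function names, and the threshold of 0.1 are all hypothetical.

```python
def difference_ratio(adapted, reference):
    """Fraction of pixels that differ between two same-size binary images."""
    if len(adapted) != len(reference) or any(
        len(a) != len(r) for a, r in zip(adapted, reference)
    ):
        raise ValueError("images must have the same size")
    total = sum(len(row) for row in adapted)
    differing = sum(
        a != r
        for row_a, row_r in zip(adapted, reference)
        for a, r in zip(row_a, row_r)
    )
    return differing / total


def is_peculiar(adapted, reference, threshold=0.1):
    """Flag the character when too many pixels differ (assumed criterion)."""
    return difference_ratio(adapted, reference) > threshold


# Tiny 3x3 stand-ins for adapted character 140 and the font A glyph.
reference_glyph = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
adapted_glyph = [[0, 1, 0], [1, 1, 0], [1, 1, 0]]
```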

The character analysis device 280 uses the peculiar character information 160 to perform character analysis on the analysis target image 30. For example, when the character "藤" has been determined to be a peculiar character, the character analysis device 280 performs character recognition, based on the peculiar character information 160, using an algorithm constructed to correctly recognize the character "藤" in the analysis target image 30.

FIG. 11 shows the functional blocks of the character processing unit 200, the learning device 202, and the character analysis device 280.

The learning device 202 includes a font selection unit 204 and a model generation unit 206. The character processing unit 200 includes a processing target character acquisition unit 210, a character image generation unit 220, a character image selection unit 230, a difference information output unit 240, and a determination result acquisition unit 250. The character analysis device 280 includes an analysis target image acquisition unit 282, a character image extraction unit 284, and a character analysis unit 286.

The character processing unit 200, the learning device 202, and the character analysis device 280 are implemented by a computer. They may be implemented by any number of computers, one or more. The storage device 290 is implemented by a non-volatile storage medium or a volatile storage medium.

The storage device 290 stores the trained model 120, which is generated by machine learning using images of reference characters having a predetermined character shape and images of a plurality of characters having mutually different character shapes. The model 120 generates, from an input character image, a character image adapted to the predetermined character shape.

In the learning device 202, the model generation unit 206 generates the model 120 using a generative adversarial network (GAN) whose training data consists of a plurality of pairs, each pairing an image of a reference character having the predetermined character shape with an image of a character having a different character shape.

The model generation unit 206 may use the font data 100 as training data. In this case, the "reference characters having a predetermined character shape" are characters belonging to a predetermined first font, and the "plurality of characters having mutually different character shapes" are characters belonging to a plurality of mutually different second fonts, each different from the first font. Specifically, the first font corresponds to the reference font A described above, and the second fonts correspond to the fonts Bi described above.

The font selection unit 204 selects the first font and the plurality of second fonts. The font selection unit 204 may select a font designated by the judge 80 as the first font. The model generation unit 206 generates the model 120 by performing machine learning using images of the first font and images of the plurality of second fonts. The model 120 is a model trained to generate, from images of characters belonging to the plurality of second fonts, character images adapted to the character shapes of the first font.

文字処理部２００において、処理対象文字取得部２１０は、処理対象の文字の画像を取得する。処理対象の文字は、例えば、処理対象画像２０に含まれる文字の画像である。処理対象の文字の画像は、具体的には、画像処理部４０によって処理対象画像２０の正面画像から抽出された文字の画像である。 In the character processing unit 200, the processing target character acquisition unit 210 acquires an image of a character to be processed. The character to be processed is, for example, a character whose image is contained in the processing target image 20. Specifically, the image of the character to be processed is a character image extracted by the image processing unit 40 from the front image of the processing target image 20.

文字画像生成部２２０は、学習済みモデル１２０を用いて、処理対象の文字の画像から、予め定められた字形に適応させた処理対象の文字の画像を生成する。相違情報出力部２４０は、文字画像生成部２２０が生成した画像と基準文字の画像との比較結果に基づいて、処理対象の文字と基準文字との相違を示す情報を出力する。 The character image generating unit 220 uses the trained model 120 to generate an image of the character to be processed adapted to a predetermined character shape from the image of the character to be processed. The difference information output unit 240 outputs information indicating the difference between the character to be processed and the reference character based on the comparison result between the image generated by the character image generation unit 220 and the image of the reference character.

相違情報出力部２４０は、文字画像生成部２２０が生成した画像と基準文字の画像とを重畳して表示させる。例えば、相違情報出力部２４０は、文字画像生成部２２０が生成した画像と基準文字の画像とを互いに異なる色で重畳して表示させてよい。判定結果取得部２５０は、文字画像生成部２２０が生成した画像と基準文字の画像とが相違するか否かを示す情報を、利用者としての判定者８０から取得する。 The difference information output unit 240 displays the image generated by the character image generation unit 220 superimposed on the image of the reference character. For example, the difference information output unit 240 may superimpose the two images in mutually different colors. The determination result acquisition unit 250 acquires, from the judge 80 acting as the user, information indicating whether the image generated by the character image generation unit 220 differs from the image of the reference character.

例えば、文字画像選択部２３０は、複数の基準文字の画像の中から、文字画像生成部２２０が生成した文字の画像に類似する文字の画像を選択する。そして、相違情報出力部２４０は、文字画像選択部２３０が選択した画像と、文字画像生成部２２０が生成した画像とを重畳して表示させる。例えば、相違情報出力部２４０は、文字画像選択部２３０が選択した画像と、文字画像生成部２２０が生成した画像とを重畳して、表示装置８８に表示させる。文字画像生成部２２０が生成した画像と基準文字の画像とを重畳させて表示するので、判定者８０は、文字の字形を基準フォントに適応させた文字が基準文字と相違する部位を適切に判断することができる。 For example, the character image selection unit 230 selects, from among a plurality of reference character images, a character image similar to the character image generated by the character image generation unit 220. The difference information output unit 240 then superimposes the image selected by the character image selection unit 230 and the image generated by the character image generation unit 220 and causes the display device 88 to display the result. Because the image generated by the character image generation unit 220 is displayed superimposed on the image of the reference character, the judge 80 can properly identify the parts in which the character adapted to the reference font differs from the reference character.

相違情報出力部２４０は、処理対象の文字と基準文字とが相違すると判定された場合に、処理対象の文字が特徴的な字形を持つ文字であることを示す情報を特異文字情報１６０に記録してよい。なお、「処理対象の文字が特徴的な字形を持つ文字であることを示す情報」は、「処理対象の文字と基準文字との相違を示す情報」の一例である。相違情報出力部２４０は、特異文字情報１６０に記録する際、処理対象画像２０の発行主体情報とともに記録してよい。 When the character to be processed is determined to differ from the reference character, the difference information output unit 240 may record, in the peculiar character information 160, information indicating that the character to be processed has a characteristic character shape. The "information indicating that the character to be processed has a characteristic character shape" is an example of "information indicating the difference between the character to be processed and the reference character". When recording to the peculiar character information 160, the difference information output unit 240 may record this information together with the publisher information of the processing target image 20.

相違情報出力部２４０は、文字画像生成部２２０が生成した画像と基準文字の画像とを比較して、文字画像生成部２２０が生成した画像において基準文字の画像とは文字の骨格が異なる部位が存在する場合に、文字の骨格が異なる部位を示す情報と処理対象の文字の識別情報とを対応づけて記録する。 The difference information output unit 240 compares the image generated by the character image generation unit 220 with the image of the reference character. If the generated image contains a part whose character skeleton differs from that of the reference character image, the difference information output unit 240 records information indicating that part in association with identification information of the character to be processed.

文字解析装置２８０において、解析対象画像取得部２８２は、文字の解析対象となる書類としての解析対象画像３０の画像データを取得する。解析対象画像３０は、「文字の解析対象となる書類」の一例である。文字画像抽出部２８４は、解析対象画像３０の画像データから、文字を含む画像を抽出する。文字解析部２８６は、相違情報出力部２４０によって記録された情報を用いて、文字画像抽出部２８４が抽出した画像に含まれる文字を解析する。文字解析部２８６は、判定者８０から解析対象画像３０の発行主体情報を受付けて、その発行主体情報をもとに選択された特異文字情報１６０を使用して、文字を解析するよい。このように、文字解析部２８６は、特異文字情報１６０を用いて、文字画像抽出部２８４が抽出した画像に含まれる文字を解析する。処理対象画像２０は、解析対象画像３０と同種の書類のサンプル画像である。そのため、解析対象画像３０には、処理対象画像２０で使用されている文字と同じフォントの文字が使用されている。文字解析部２８６は、処理対象画像２０から特定された特異文字を認識して解析対象画像３０を解析するので、解析対象画像３０に記載されている内容をより正確に解析することができる。 In the character analysis device 280, the analysis target image acquisition unit 282 acquires image data of the analysis target image 30, which is a document whose characters are to be analyzed. The analysis target image 30 is an example of a "document whose characters are to be analyzed". The character image extraction unit 284 extracts images containing characters from the image data of the analysis target image 30. The character analysis unit 286 analyzes the characters contained in the images extracted by the character image extraction unit 284, using the information recorded by the difference information output unit 240. The character analysis unit 286 may receive publisher information for the analysis target image 30 from the judge 80 and analyze the characters using the peculiar character information 160 selected on the basis of that publisher information. In this way, the character analysis unit 286 uses the peculiar character information 160 to analyze the characters contained in the images extracted by the character image extraction unit 284. The processing target image 20 is a sample image of the same kind of document as the analysis target image 30, so the analysis target image 30 uses characters in the same font as the processing target image 20. Because the character analysis unit 286 analyzes the analysis target image 30 while recognizing the peculiar characters identified from the processing target image 20, it can analyze the content of the analysis target image 30 more accurately.

図１２は、学習データの構成を示す。学習データは、文字ペア４００－１１、文字ペア４００－１２、文字ペア４００－１３・・・と、文字ペア４００－２１、文字ペア４００－２２、文字ペア４００－２３・・・とを含む。 FIG. 12 shows the configuration of the learning data. The learning data includes character pairs 400-11, 400-12, 400-13, ... and character pairs 400-21, 400-22, 400-23, ....

文字ペア４００－１１は、フォントＢ１の「藤」の文字と、基準フォントＡの「藤」の文字とのペアである。文字ペア４００－１２は、フォントＢ１の「研」の文字と、基準フォントＡの「研」の文字とのペアである。文字ペア４００－１３は、フォントＢ１の「あ」の文字と、基準フォントＡの「あ」の文字とのペアである。 Character pair 400-11 is a pair of the character "藤" in font B1 and the character "藤" in reference font A. Character pair 400-12 is a pair of the character "研" in font B1 and the character "研" in reference font A. Character pair 400-13 is a pair of the character "あ" in font B1 and the character "あ" in reference font A.

文字ペア４００－２１は、フォントＢ２の「藤」の文字と、基準フォントＡの「藤」の文字とのペアである。文字ペア４００－２２は、フォントＢ２の「研」の文字と、基準フォントＡの「研」の文字とのペアである。文字ペア４００－２３は、フォントＢ２の「あ」の文字と、基準フォントＡの「あ」の文字とのペアである。 Character pair 400-21 is a pair of the character "藤" in font B2 and the character "藤" in reference font A. Character pair 400-22 is a pair of the character "研" in font B2 and the character "研" in reference font A. Character pair 400-23 is a pair of the character "あ" in font B2 and the character "あ" in reference font A.

一般に、フォントＢｉを識別する符号を「ｉ」とし、字体（字種）を識別する符号を「ｊ」とすると、モデル生成部２０６は、文字ペア４００－ｉｊを用いて機械学習を行って、モデル１２０を生成する。モデル生成部２０６は、例えば、文字ペア４００－ｉｊを用いて、フォントＢ１の「藤」の文字の画像が入力された場合に、基準フォントＡの「藤」の文字の字形にできるだけ適応した字形を持つ「藤」文字の画像を生成するように、機械学習を行う。 In general, letting "i" be the index identifying a font Bi and "j" be the index identifying a character form (character type), the model generation unit 206 performs machine learning using the character pairs 400-ij to generate the model 120. For example, using the character pairs 400-ij, the model generation unit 206 performs machine learning so that, when an image of the character "藤" in font B1 is input, it generates an image of the character "藤" whose shape is adapted as closely as possible to the shape of "藤" in reference font A.

これにより、例えばフォントＢ１の「藤」の文字の画像が入力された場合に、基準フォントＡの「藤」の字形に適応した字形を持つ「藤」の文字の画像を生成するモデル１２０が生成される。このように、モデル１２０は、フォントＢｉの文字ｊの画像から、フォントＡの文字ｊが持つ字形に適応した字形を持つ文字ｊの画像を生成するモデルである。 This yields a model 120 that, for example, when an image of the character "藤" in font B1 is input, generates an image of the character "藤" whose shape is adapted to the shape of "藤" in reference font A. In this way, the model 120 is a model that generates, from an image of character j in font Bi, an image of character j whose shape is adapted to the shape of character j in font A.
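The indexing of character pairs 400-ij described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `CharacterPair`, the function `build_character_pairs`, and the use of font names in place of rendered glyph images are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CharacterPair:
    """One training pair: character j in font Bi, paired with character j in reference font A."""
    pair_id: str      # e.g. "400-11" (font index i, then character index j)
    source_font: str  # second font Bi
    target_font: str  # reference font A
    character: str    # character j


def build_character_pairs(second_fonts, characters, reference_font="A"):
    """Enumerate every (font Bi, character j) combination as a pair 400-ij."""
    pairs = []
    for (i, font), (j, ch) in product(
        enumerate(second_fonts, start=1), enumerate(characters, start=1)
    ):
        # Hypothetical ID scheme; it is only unambiguous while i and j are single digits.
        pairs.append(CharacterPair(f"400-{i}{j}", font, reference_font, ch))
    return pairs


pairs = build_character_pairs(["B1", "B2"], ["藤", "研", "あ"])
for p in pairs:
    print(p.pair_id, p.source_font, "->", p.target_font, p.character)
```

In an actual training set each entry would carry the two rendered glyph images rather than font names; the sketch only shows how the indices i and j span the pairs.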

図１３は、モデル生成部２０６における機械学習を実行する学習器の概念構成を示す。図１３は、フォントＢｉの「藤」の文字の画像５００ｂと、基準フォントＡの「藤」の文字の画像５００ａとの文字ペアを学習データとして用いた学習を行う場合を示す。図１３に示す学習器は、ＧＡＮの一種である条件付きＧＡＮを用いた学習器である。 FIG. 13 shows the conceptual configuration of the learner that executes machine learning in the model generation unit 206. FIG. 13 illustrates learning that uses, as learning data, the character pair consisting of an image 500b of the character "藤" in font Bi and an image 500a of the character "藤" in reference font A. The learner shown in FIG. 13 uses a conditional GAN, which is one type of GAN.

学習器は、生成ネットワークＧと識別ネットワークＤとを備える。生成ネットワークＧ及び識別ネットワークＤは、それぞれニューラルネットワークである。生成ネットワークＧは、入力される画像５００ｂからフェイク画像５００ｂ'を生成する。 The learner comprises a generation network G (the generator) and an identification network D (the discriminator). The generation network G and the identification network D are each a neural network. The generation network G generates a fake image 500b' from the input image 500b.

識別ネットワークＤには、画像５００ｂと画像５００ａとの組み合わせが入力される。また、識別ネットワークＤには、画像５００ｂと、生成ネットワークＧが生成したフェイク画像５００ｂ'の組み合わせが入力される。識別ネットワークＤは、入力された画像の組み合わせの識別結果を出力する。例えば、識別ネットワークＤは、入力された画像の組み合わせの正しさの程度を示す確率を、０から１の範囲の数値で出力する。例えば、識別ネットワークＤは、入力された画像の組み合わせが正しいと判断した場合に「１」を出力し、入力された画像の組み合わせ正しくないと判断した場合に「０」を出力する。 Identification network D receives a combination of image 500b and image 500a. Also, the combination of the image 500b and the fake image 500b′ generated by the generation network G is input to the identification network D. The identification network D outputs the identification result of the input image combination. For example, the identification network D outputs a numerical value ranging from 0 to 1, which indicates the degree of correctness of the combination of the input images. For example, the identification network D outputs "1" when it determines that the combination of input images is correct, and outputs "0" when it determines that the combination of input images is incorrect.

識別ネットワークＤは、画像５００ｂと画像５００ａとのペアが入力された場合に「１」に近い値を出力し、画像５００ｂとフェイク画像５００ｂ'のペアが入力された場合に「０」に近い値を出力するように学習する。いわば、識別ネットワークＤは、生成ネットワークＧが生成したフェイク画像５００ｂ'を偽物であると判断できるように学習する。一方で、生成ネットワークＧは、画像５００ｂとフェイク画像５００ｂ'とのペアを識別ネットワークＤに入力した場合に識別ネットワークＤから「１」に近い値が出力されるようなフェイク画像５００ｂ'を生成できるように学習する。画像５００ｂとフェイク画像５００ｂ'とのペアを入力したときの識別ネットワークＤの出力が１／２に十分に近くなった場合に、学習が達成されたと判断される。 The identification network D learns to output a value close to "1" when the pair of image 500b and image 500a is input, and a value close to "0" when the pair of image 500b and fake image 500b' is input. In other words, the identification network D learns to recognize the fake image 500b' generated by the generation network G as a fake. The generation network G, on the other hand, learns to generate a fake image 500b' for which the identification network D outputs a value close to "1" when the pair of image 500b and fake image 500b' is input to it. Learning is judged to have been achieved when the output of the identification network D for the pair of image 500b and fake image 500b' has become sufficiently close to 1/2.
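The opposing training targets described above can be written down as loss functions. The sketch below assumes a binary cross-entropy loss, which is a common choice for GANs but is not stated in the source; the function names and the `tolerance` threshold are likewise hypothetical. Here `d_real` stands for the output of D on the pair (500b, 500a) and `d_fake` for its output on the pair (500b, 500b').

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy between a probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """D is pushed toward 1 on the real pair and toward 0 on the fake pair."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """G is pushed to make D output a value close to 1 on the fake pair."""
    return bce(d_fake, 1.0)


def learning_achieved(d_fake, tolerance=0.05):
    """Stopping test from the text: D's output on the fake pair is close to 1/2."""
    return abs(d_fake - 0.5) <= tolerance


print(discriminator_loss(0.9, 0.1), generator_loss(0.1), learning_achieved(0.52))
```

A discriminator that separates the pairs well (for example 0.9 and 0.1) has a low loss while the generator's loss is high; as training proceeds the two are driven toward the point where D can no longer tell the pairs apart.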

モデル生成部２０６は、図１２に示されるようなフォント及び字体（字種）の組み合わせが異なる多数の文字ペアを用いて機械学習を行う。モデル生成部２０６は、各文字ペアで学習が達成されたと判断した場合に、生成ネットワークＧをモデル１２０として出力する。 The model generating unit 206 performs machine learning using a large number of character pairs having different combinations of fonts and character styles (types of characters) as shown in FIG. 12 . The model generation unit 206 outputs the generation network G as the model 120 when it is determined that learning has been achieved for each character pair.

図１４は、相違情報出力部２４０が出力する相違情報の一例を示す。図１４の画面６００は、相違情報出力部２４０が表示装置８８に出力する相違情報の表示例である。 FIG. 14 shows an example of the difference information output by the difference information output unit 240. The screen 600 in FIG. 14 is a display example of the difference information that the difference information output unit 240 outputs to the display device 88.

文字画像６２０－１及び文字画像６２０－２は、処理対象画像２０の文字の画像である。画面６００において、文字画像６２０－１及び文字画像６２０－２は、文字処理部２００における「検査対象文字」として表示される。 Character image 620-1 and character image 620-2 are images of characters in the processing target image 20. On the screen 600, character image 620-1 and character image 620-2 are displayed as the "inspection target characters" of the character processing unit 200.

文字画像６３０－１は、文字画像６２０－１をモデル１２０に入力することによって文字画像生成部２２０が生成した文字画像である。文字画像６３０－２は、文字画像６２０－２をモデル１２０に入力することによって文字画像生成部２２０が生成した文字である。画面６００において、文字画像６３０－１及び文字画像６３０－２は、基準フォントの字形に適合した文字を持つ「変換後文字」として表示される。 Character image 630-1 is the character image generated by the character image generation unit 220 by inputting character image 620-1 to the model 120. Character image 630-2 is the character image generated by the character image generation unit 220 by inputting character image 620-2 to the model 120. On the screen 600, character image 630-1 and character image 630-2 are displayed as the "converted characters", whose shapes are adapted to the character shapes of the reference font.

文字画像６４０－１は、文字画像６２０－１の文字に対応する、基準フォントＡの文字の画像である。文字画像６４０－２は、文字画像６２０－２の文字に対応する、基準フォントＡの文字の画像である。文字画像６４０－１及び文字画像６４０－２は、文字画像選択部２３０によって選択された画像である。例えば、文字画像選択部２３０は、文字画像６３０－１と一致度が予め定められた値より高い文字を、基準フォントＡの文字の中から選択する。画面６００において、文字画像６４０－１及び文字画像６４０－２は、変換後文字の比較対象となる「基準文字」として表示される。 Character image 640-1 is an image of the character in reference font A that corresponds to the character in character image 620-1. Character image 640-2 is an image of the character in reference font A that corresponds to the character in character image 620-2. Character image 640-1 and character image 640-2 are images selected by the character image selection unit 230. For example, the character image selection unit 230 selects, from among the characters of reference font A, a character whose degree of match with character image 630-1 is higher than a predetermined value. On the screen 600, character image 640-1 and character image 640-2 are displayed as the "reference characters" against which the converted characters are compared.

比較画像６５０－１は、文字画像６３０－１に文字画像６４０－１を重畳した画像である。比較画像６５０－２は、文字画像６３０－２に文字画像６４０－２を重畳した画像である。 A comparative image 650-1 is an image in which the character image 640-1 is superimposed on the character image 630-1. A comparative image 650-2 is an image in which the character image 640-2 is superimposed on the character image 630-2.
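One simple way to form such a comparison image is to label each pixel according to which of the two glyphs covers it, so that the two images can be drawn in different colors. The sketch below is only an illustration under assumed conventions: glyphs are equal-sized binary grids (1 = ink), and the function name `overlay` and the label letters are hypothetical.

```python
def overlay(converted, reference):
    """Superimpose two equal-sized binary glyph images.

    Each output cell is labeled:
      "." background, "G" ink only in the converted (generated) image,
      "R" ink only in the reference image, "B" ink in both.
    A renderer could map "G" and "R" to two different colors.
    """
    if len(converted) != len(reference) or any(
        len(c_row) != len(r_row) for c_row, r_row in zip(converted, reference)
    ):
        raise ValueError("images must have the same size")
    labels = {(0, 0): ".", (1, 0): "G", (0, 1): "R", (1, 1): "B"}
    return [
        "".join(labels[(c, r)] for c, r in zip(c_row, r_row))
        for c_row, r_row in zip(converted, reference)
    ]


converted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(overlay(converted, reference))  # ['BG.', '.BR']
```

Cells labeled "G" or "R" are exactly the places where the converted character and the reference character disagree, which is what a viewer of the comparison image needs to see.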

相違情報出力部２４０は、文字画像６２０－１、文字画像６３０－１、文字画像６４０－１、及び比較画像６５０－１と、ボタン６１０－１とを対応づけて、表示装置８８に表示させる。また、相違情報出力部２４０は、文字画像６２０－２、文字画像６３０－２、文字画像６４０－２、及び比較画像６５０－２と、ボタン６１０－２とを対応づけて、表示装置８８に表示させる。 The difference information output unit 240 causes the display device 88 to display character image 620-1, character image 630-1, character image 640-1, and comparison image 650-1 in association with button 610-1. Likewise, the difference information output unit 240 causes the display device 88 to display character image 620-2, character image 630-2, character image 640-2, and comparison image 650-2 in association with button 610-2.

図１４には、「藤」と「研」の２種の字体の文字についての情報が表示された状態を示す。一般に、検査対象とした文字種を示す符号を「ｉ」とすると、相違情報出力部２４０は、文字画像６２０－ｉ、文字画像６３０－ｉ、文字画像６４０－ｉ、及び比較画像６５０－ｉと、ボタン６１０－ｉとを対応づけて、表示装置８８に表示させる。 FIG. 14 shows a state in which information is displayed for characters of two character forms, "藤" and "研". In general, letting "i" be the index of an inspected character type, the difference information output unit 240 causes the display device 88 to display character image 620-i, character image 630-i, character image 640-i, and comparison image 650-i in association with button 610-i.

なお、相違情報出力部２４０は、文字画像６３０－ｉと文字画像６４０－ｉとを比較して、文字画像６３０－ｉと文字画像６４０－ｉとの一致度を算出してよい。相違情報出力部２４０は、算出した画像の一致度が低い順に、文字画像６２０－ｉ、文字画像６３０－ｉ、文字画像６４０－ｉ、及び比較画像６５０－ｉと、ボタン６１０－ｉとを含む文字画像を表示させてよい。 Note that the difference information output unit 240 may compare character image 630-i with character image 640-i and calculate the degree of match between them. The difference information output unit 240 may then display the entries, each comprising character image 620-i, character image 630-i, character image 640-i, comparison image 650-i, and button 610-i, in ascending order of the calculated degree of match, so that the entries with the lowest match appear first.
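The source does not define how the degree of match is computed. As one possible sketch, the code below uses the intersection-over-union of the inked pixels of two equal-sized binary glyph images and sorts the entries so that the poorest matches come first. The function names and the choice of metric are assumptions for illustration only.

```python
def match_degree(image_a, image_b):
    """Intersection-over-union of inked pixels (1 = ink); 1.0 means identical ink."""
    inter = 0
    union = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 1.0  # two blank images match trivially


def order_for_display(entries):
    """Sort (label, converted_image, reference_image) entries, lowest match first."""
    return sorted(entries, key=lambda e: match_degree(e[1], e[2]))


same = [[1, 0], [0, 1]]
entries = [
    ("研", same, same),                                 # match degree 1.0
    ("藤", [[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]),  # match degree 0.5
]
print([label for label, _, _ in order_for_display(entries)])  # ['藤', '研']
```

Showing low-match entries first puts the characters most likely to be peculiar at the top of the screen for the judge.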

判定者８０は、例えば比較画像６５０－ｉを参照して、文字画像６３０－ｉが文字画像６４０－ｉと相違する部分を持つか否かを判定する。具体的には、判定者８０は、文字画像６３０－ｉが文字画像６４０－ｉと相違する骨格部分を持つか否かを判定する。例えば、判定者８０は、文字画像６３０－ｉが文字画像６４０－ｉと相違する骨格部分を持つと判定した場合に、ボタン６１０－ｉを押す。 The judge 80 refers to, for example, comparison image 650-i and determines whether character image 630-i has a part that differs from character image 640-i. Specifically, the judge 80 determines whether character image 630-i has a skeleton part that differs from character image 640-i. For example, the judge 80 presses button 610-i on determining that character image 630-i has a skeleton part that differs from character image 640-i.

ボタン６１０－ｉを押されたことに応じて、判定結果取得部２５０は、文字画像６２０－ｉが特異文字であると判定結果を取得する。この場合に、相違情報出力部２４０は、文字画像６２０－ｉの文字の情報を、特異文字として特異文字情報１６０に記録する。例えば、相違情報出力部２４０は、文字画像６２０－ｉの文字種の識別情報を、特異文字情報１６０に記録する。 When button 610-i is pressed, the determination result acquisition unit 250 acquires a determination result indicating that character image 620-i is a peculiar character. In this case, the difference information output unit 240 records information on the character in character image 620-i in the peculiar character information 160 as a peculiar character. For example, the difference information output unit 240 records the identification information of the character type of character image 620-i in the peculiar character information 160.

なお、判定者８０は、文字画像６３０－ｉが文字画像６４０－ｉと相違する部位を示す情報を入力してよい。例えば、判定者８０は、文字画像６３０－ｉが文字画像６４０－ｉと相違する骨格部分を含む範囲を示す情報を入力する。この場合、相違情報出力部２４０は、入力された骨格部分を含む範囲を示す情報を、文字画像６２０－ｉの文字種の識別情報に対応づけて特異文字情報１６０に記録する。 Note that the judge 80 may input information indicating the part in which character image 630-i differs from character image 640-i. For example, the judge 80 inputs information indicating a range that contains the skeleton part in which character image 630-i differs from character image 640-i. In this case, the difference information output unit 240 records the input information indicating the range containing the skeleton part in the peculiar character information 160, in association with the identification information of the character type of character image 620-i.

図１５は、文字画像６２０－１、文字画像６３０－１、文字画像６４０－１及び比較画像６５０－１を拡大して示す。Ｄ１は、文字画像６２０－１における「藤」の文字を構成する辺７０１と辺７０２の間の骨格の間隔を示す。Ｄ２は、文字画像６３０－１における辺７０１と辺７０２の間の骨格の間隔であり、Ｄ３は、文字画像６４０－１における辺７０１と辺７０２の間の骨格の間隔である。比較画像６５０－１から、Ｄ１及びＤ２はいずれも、Ｄ３より短いことが明瞭に分かる。 FIG. 15 shows enlarged views of character image 620-1, character image 630-1, character image 640-1, and comparison image 650-1. D1 denotes the skeleton spacing between side 701 and side 702, which form part of the character "藤", in character image 620-1. D2 is the skeleton spacing between side 701 and side 702 in character image 630-1, and D3 is the skeleton spacing between side 701 and side 702 in character image 640-1. Comparison image 650-1 makes it clear that D1 and D2 are both shorter than D3.

文字画像６３０－１の文字は、文字画像６２０－１の文字の字形を基準フォントＡの「藤」の字形に適応させたものである。これにより、文字画像６３０－１と文字画像６４０－１との間では、例えば「とめ」、「はね」、「はらい」等の装飾的デザインや文字の太さの違いによって生じる差が小さくなる。そのため、文字画像６３０－１の文字の装飾的デザインや文字の太さは、基準フォントの文字画像６４０－１の文字の装飾的デザインや太さに近いものとなる。一般的に流通しているフォントは主として、装飾的デザインや太さが異なるものが多い。そのため、装飾的デザインや太さが異なるフォントを用いてモデル１２０を学習することによって、基準フォントＡの装飾的デザインに近い装飾的デザインを持つ文字の画像を生成するモデル１２０が得られる。これにより、文字画像６３０－１と文字画像６４０－１との間の装飾的デザインの差が小さくなる。よって、例えば比較画像６５０－１を通じて、文字画像６３０－１の辺７０１と辺７０２の間の骨格の間隔が短いという、学習に用いたフォントに対する特徴的な相違点を、明確に提示することができる。 The character in character image 630-1 is the character of character image 620-1 with its shape adapted to the shape of "藤" in reference font A. As a result, the differences between character image 630-1 and character image 640-1 that stem from decorative design, such as tome (stops), hane (hooks), and harai (sweeps), or from stroke thickness become small. The decorative design and stroke thickness of the character in character image 630-1 are therefore close to those of the character in character image 640-1 of the reference font. Fonts in general circulation differ from one another mainly in decorative design and thickness. Training the model 120 with fonts that differ in decorative design and thickness therefore yields a model 120 that generates character images whose decorative design is close to that of reference font A. This reduces the difference in decorative design between character image 630-1 and character image 640-1. Consequently, for example through comparison image 650-1, the characteristic difference from the fonts used for learning, namely that the skeleton spacing between side 701 and side 702 in character image 630-1 is short, can be presented clearly.

なお、モデル１２０を生成するための機械学習に用いたフォントデータにおいて、辺７０１と辺７０２の間の骨格の間隔については、フォント間で違いが小さいものであったとする。この場合、フォントデータを用いたモデル１２０の学習工程において、辺７０１と辺７０２の間の骨格の間隔は普遍的な特徴を持つものとして学習される。これにより生成されるモデル１２０は、入力される「藤」の字の画像に対して、辺７０１と辺７０２の間の骨格の間隔を基準フォントに大きく適応させるようなものにはならない。したがって、文字画像６２０－１をモデル１２０で変換すると、文字画像６３０－１のように辺７０１と辺７０２の間の骨格の間隔が比較的に狭い文字の画像が得られる。 Suppose that, in the font data used for the machine learning that generates the model 120, the skeleton spacing between side 701 and side 702 differs little from font to font. In that case, during the training of the model 120 with the font data, the skeleton spacing between side 701 and side 702 is learned as a universal feature. The resulting model 120 therefore does not substantially adapt the skeleton spacing between side 701 and side 702 of an input image of the character "藤" to the reference font. Accordingly, converting character image 620-1 with the model 120 yields a character image in which the skeleton spacing between side 701 and side 702 remains relatively narrow, as in character image 630-1.

このように、画像処理システム１０によれば、文字画像生成部２２０は、モデル１２０に入力した文字の字形要素のうち、装飾的デザインのように様々な違いがある字形要素については、基準フォントＡに適応した文字の画像を生成する。一方で、文字画像生成部２２０は、モデル１２０に入力した文字の字形要素のうち、機械学習に用いたフォントデータに共通している字形要素とは異質な特徴的な字形要素については、その特徴的な字形要素を実質的に維持した文字の画像を生成する。 Thus, according to the image processing system 10, for those shape elements of a character input to the model 120 that vary widely between fonts, such as decorative design, the character image generation unit 220 generates a character image adapted to reference font A. For characteristic shape elements that are foreign to the shape elements common to the font data used for machine learning, on the other hand, the character image generation unit 220 generates a character image in which those characteristic shape elements are substantially preserved.

したがって、判定者８０は、文字画像６３０－１と文字画像６４０－１とを比較することによって、文字画像６２０－１が特異文字であるか否かを容易に判定することができる。特に、相違情報出力部２４０は比較画像６５０－１を表示装置８８に表示させるので、判定者８０は、処理対象画像２０において「藤」の文字の辺７０１と辺７０２の間の骨格の間隔が短いことを一目で判断することができる。 The judge 80 can therefore easily determine whether character image 620-1 is a peculiar character by comparing character image 630-1 with character image 640-1. In particular, because the difference information output unit 240 causes the display device 88 to display comparison image 650-1, the judge 80 can see at a glance that the skeleton spacing between side 701 and side 702 of the character "藤" in the processing target image 20 is short.

図１６は、参考例として、文字画像６２０－１に文字画像６４０－１を重ねた状態を示す。文字画像６２０－１は、文字画像６３０－１とは異なり、基準フォントＡの「藤」の字形に適応させた文字ではない。そのため、文字画像６２０－１と文字画像６４０－１との間には、字形のデザイン性の違いによって生じる誤差が多く存在する。このように、文字画像６２０－１と文字画像６４０－１とを比較しても、字形の装飾的デザインの違いに起因する誤差が目立つ。そのため、判定者８０が目で見て判定する場合においても、コンピュータ等によって判定する場合においても、その文字が特異文字であるか否かを容易に判定することはできない。 FIG. 16 shows, as a reference example, character image 640-1 superimposed on character image 620-1. Unlike character image 630-1, character image 620-1 has not been adapted to the shape of "藤" in reference font A. Many discrepancies caused by differences in glyph design therefore exist between character image 620-1 and character image 640-1. When character image 620-1 is compared directly with character image 640-1, the discrepancies due to differences in decorative design dominate. As a result, whether the character is a peculiar character cannot easily be determined, either visually by the judge 80 or by a computer or the like.

これに対し、図１６及び図１７に示されるとように、画像処理システム１０によれば、どの文字が特異文字であるかを比較的に容易に判定することができる。 On the other hand, as shown in FIGS. 16 and 17, according to the image processing system 10, it is relatively easy to determine which character is the peculiar character.

図１７は、特異文字情報１６０のデータ構造の一例を示す。特異文字情報１６０は、文字識別情報、文字画像、特徴部位及び特徴量を対応づけて格納する。 FIG. 17 shows an example of the data structure of the peculiar character information 160. The peculiar character information 160 stores character identification information, a character image, a characteristic part, and feature quantities in association with one another.

「文字識別情報」には、特異文字として判定された文字の識別情報が格納される。文字識別情報は、例えば文字種を示す情報であってよい。文字識別情報は、文字コードであってよい。「文字画像」には、特異文字として判定された文字の画像データがバイナリ形式で格納される。 The "character identification information" stores identification information of characters determined as peculiar characters. The character identification information may be information indicating character types, for example. The character identification information may be a character code. The "character image" stores the image data of the character determined as the peculiar character in binary format.

「特徴部位」は、特徴的な字形を持つ部位の範囲を示す情報である。例えば、特徴的な字形を持つ領域を矩形領域で示す場合、「特徴部位」は、矩形の対角の座標を示す情報を含んでよい。 A “characteristic part” is information indicating a range of parts having a characteristic character shape. For example, when an area having a characteristic character shape is indicated by a rectangular area, the "characteristic portion" may include information indicating the coordinates of the diagonal corners of the rectangle.

「特徴量」は、文字画像から抽出される特徴量を示す情報である。特徴量は、文字画像全体から抽出される特徴量を含んでよい。特徴量は、文字画像における特徴部位から抽出される特徴量を含んでよい。 "Feature amount" is information indicating the feature amount extracted from the character image. The feature quantity may include a feature quantity extracted from the entire character image. A feature amount may include a feature amount extracted from a feature part in a character image.
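The four associated fields of the peculiar character information 160 could be modeled as a record type along the following lines. This is a sketch under stated assumptions: the class and field names, the `U+XXXX` form of the character code, and the `(x1, y1, x2, y2)` layout of the rectangle are hypothetical choices, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def char_code(ch: str) -> str:
    """Format a single character's Unicode code point, e.g. 'A' -> 'U+0041'."""
    return f"U+{ord(ch):04X}"


@dataclass
class PeculiarCharacterRecord:
    character_id: str                        # character identification information (here a character code)
    character_image: bytes                   # image data of the peculiar character, in binary form
    feature_part: Tuple[int, int, int, int]  # diagonal corners (x1, y1, x2, y2) of the characteristic region
    feature_values: List[float] = field(default_factory=list)  # feature quantities from the image

    def feature_part_size(self) -> Tuple[int, int]:
        """Width and height of the rectangular characteristic region."""
        x1, y1, x2, y2 = self.feature_part
        return (abs(x2 - x1), abs(y2 - y1))


record = PeculiarCharacterRecord(
    character_id=char_code("藤"),
    character_image=b"",            # placeholder; a real record would hold encoded image bytes
    feature_part=(10, 4, 30, 12),
    feature_values=[0.2, 0.8],
)
print(record.character_id, record.feature_part_size())
```

Storing the rectangle by its diagonal corners follows the text; the feature quantities may come from the whole image, from the characteristic region, or both.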

特異文字情報１６０は、文字解析装置２８０が解析対象画像３０を解析する場合に使用される。例えば、解析対象画像３０を解析する場合、解析対象画像取得部２８２が解析対象画像３０の画像データを取得し、文字画像抽出部２８４が解析対象画像３０の画像データから文字画像を抽出する。そして、文字解析部２８６は、特異文字情報１６０に格納されている情報を用いて、文字画像抽出部２８４が抽出した文字画像を解析する。 The peculiar character information 160 is used when the character analysis device 280 analyzes the image 30 to be analyzed. For example, when analyzing the analysis target image 30 , the analysis target image acquisition unit 282 acquires image data of the analysis target image 30 , and the character image extraction unit 284 extracts a character image from the image data of the analysis target image 30 . The character analysis unit 286 analyzes the character image extracted by the character image extraction unit 284 using the information stored in the peculiar character information 160 .

例えば、文字解析部２８６は、解析対象画像３０から抽出された文字画像から特徴量を検出する。文字解析部２８６は、検出した特徴量が、特異文字情報１６０に格納されている特徴量に適合した場合に、当該特徴量に対応づけて特異文字情報１６０に格納されている文字識別情報を読み出す。そして、文字解析部２８６は、当該文字識別情報の文字が、解析対象画像３０から抽出された文字画像の文字であると認識する。 For example, the character analysis unit 286 detects feature amounts from character images extracted from the analysis target image 30 . When the detected feature amount matches the feature amount stored in the peculiar character information 160, the character analysis unit 286 reads the character identification information stored in the peculiar character information 160 in association with the feature amount. . Then, the character analysis unit 286 recognizes that the character of the character identification information is the character of the character image extracted from the image 30 to be analyzed.
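The lookup performed by the character analysis unit 286 can be sketched as a nearest-match search over the stored feature quantities. The source does not say how a detected feature quantity is judged to "match" a stored one, so the Euclidean distance, the `tolerance` threshold, and the function name `recognize` are all assumptions made for this illustration.

```python
import math


def recognize(detected, records, tolerance=0.1):
    """Return the character ID whose stored features best match `detected`.

    `records` is a list of (character_id, stored_features) tuples.
    A record matches when the Euclidean distance between feature vectors
    is at most `tolerance`; None is returned when nothing matches.
    """
    best_id = None
    best_dist = tolerance
    for char_id, stored in records:
        if len(stored) != len(detected):
            continue  # incomparable feature vectors are skipped
        dist = math.dist(detected, stored)
        if dist <= best_dist:
            best_id, best_dist = char_id, dist
    return best_id


records = [("藤", [0.2, 0.8]), ("研", [0.9, 0.1])]
print(recognize([0.22, 0.79], records))  # 藤
print(recognize([0.5, 0.5], records))    # None
```

When no stored record matches, an analysis pipeline would presumably fall back to ordinary character recognition; that fallback is outside this sketch.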

以上に説明した画像処理システム１０によれば、画像として入力された文字が特異文字であるか否かを適切に判定するとができる。そのため、特異文字であることを認識して、文字認識等の文字を解析することができる。 According to the image processing system 10 described above, it is possible to properly determine whether a character input as an image is a peculiar character. Character analysis, such as character recognition, can therefore be performed with the knowledge that a character is a peculiar character.

なお、機械学習に用いるフォントの書体は、処理対象画像２０及び解析対象画像３０で使用されるフォントの書体に整合させることが望ましい。例えば、処理対象画像２０及び解析対象画像３０で使用されるフォントがゴシック体の書体のフォントである場合には、ゴシック体の書体のフォントを機械学習に用いることが望ましい。 It is desirable that the typeface of the font used for machine learning be matched with the typeface of the font used in the image to be processed 20 and the image to be analyzed 30 . For example, when the fonts used in the image to be processed 20 and the image to be analyzed 30 are Gothic typeface fonts, it is desirable to use the Gothic typeface fonts for machine learning.

なお、文字認識は、文字解析の一例である。文字解析としては、書類の有効性判定等を例示することができる。 Note that character recognition is one example of character analysis. Other examples of character analysis include determining the validity of a document.

なお、画像処理システム１０における文字処理の適用例として、手書き文字の評価等に使用できる場合がある。例えば、ペン習字において生徒が楷行体等の標準的な書体を練習する場合の評価に使用できる場合がる。例えば、ある特定の指導者の手書き文字を基準文字とし、他の多数の指導者の手書き文字を用いて学習することによって、モデル１２０を生成する。そして、文字処理部２００に入力される書類として、生徒が書いた書類を入力する。これにより、生徒の手書き文字を特定の指導者が書いた手書き文字に適応した文字と、特定の指導者が書いた基準文字との比較結果に基づいて、生徒が書いた文字が特異性を持つか否かを判定する。また、画像処理システム１０における文字解析処理として、筆跡鑑定等の手書き文字の解析に利用できる場合がある。 As an application example, the character processing in the image processing system 10 may be usable for evaluating handwritten characters. For example, it may be usable for evaluation when a student of pen calligraphy practices a standard script such as kaigyo-tai (楷行体). In that case, the model 120 is generated by taking the handwritten characters of one particular instructor as the reference characters and learning with the handwritten characters of many other instructors. A document written by the student is then input to the character processing unit 200. Whether a character written by the student is peculiar is then determined from the result of comparing the student's character, adapted to the handwriting of the particular instructor, with the reference character written by that instructor. The character analysis processing in the image processing system 10 may also be usable for analyzing handwritten characters, for example in handwriting examination.

FIG. 18 shows an example of a computer 2000 according to the present embodiment. A program installed on the computer 2000 can cause the computer 2000 to function as a device or system according to the embodiment, such as the image processing unit 40, the character processing unit 200, the learning device 202, the character analysis device 280, or the image processing system 10, or as individual units of that device or system; to execute operations associated with the device or system or with its units; and/or to execute a process according to the embodiment or steps of that process. Such a program may be executed by the CPU 2012 to cause the computer 2000 to perform specific operations associated with some or all of the processing procedures and block-diagram blocks described in this specification.

The computer 2000 according to the present embodiment includes a CPU 2012 and a RAM 2014, which are connected to each other by a host controller 2010. The computer 2000 also includes a ROM 2026, a flash memory 2024, a communication interface 2022, and an input/output chip 2040. The ROM 2026, the flash memory 2024, the communication interface 2022, and the input/output chip 2040 are connected to the host controller 2010 via an input/output controller 2020.

The CPU 2012 operates according to programs stored in the ROM 2026 and the RAM 2014, and thereby controls each unit.

The communication interface 2022 communicates with other electronic devices via a network. The flash memory 2024 stores programs and data used by the CPU 2012 in the computer 2000. The ROM 2026 stores a boot program or the like that is executed by the computer 2000 at activation, and/or programs that depend on the hardware of the computer 2000. The input/output chip 2040 may also connect various input/output units, such as a keyboard, a mouse, and a monitor, to the input/output controller 2020 via input/output ports such as a serial port, a parallel port, a keyboard port, a mouse port, a monitor port, a USB port, and an HDMI (registered trademark) port.

The program is provided via a computer-readable medium such as a CD-ROM, a DVD-ROM, or a memory card, or via a network. The RAM 2014, the ROM 2026, and the flash memory 2024 are examples of computer-readable media. The program is installed in the flash memory 2024, the RAM 2014, or the ROM 2026 and is executed by the CPU 2012. The information processing described in these programs is read by the computer 2000 and brings about cooperation between the programs and the various types of hardware resources described above. An apparatus or method may be constituted by realizing the operation or processing of information through use of the computer 2000.

For example, when communication is performed between the computer 2000 and an external device, the CPU 2012 may execute a communication program loaded into the RAM 2014 and, based on the processing described in the communication program, instruct the communication interface 2022 to perform communication processing. Under the control of the CPU 2012, the communication interface 2022 reads transmission data stored in a transmission buffer area provided in a recording medium such as the RAM 2014 or the flash memory 2024, transmits the read transmission data to the network, and writes reception data received from the network to a reception buffer area or the like provided on the recording medium.

The CPU 2012 may also cause all or a necessary portion of a file or database stored in a recording medium such as the flash memory 2024 to be read into the RAM 2014, and may execute various types of processing on the data in the RAM 2014. The CPU 2012 then writes the processed data back to the recording medium.

Various types of information, such as various types of programs, data, tables, and databases, may be stored in the recording medium and subjected to information processing. On data read from the RAM 2014, the CPU 2012 may execute the various types of processing described in this specification and specified by the instruction sequence of a program, including various types of operations, information processing, condition judgment, conditional branching, unconditional branching, and information search and replacement, and writes the results back to the RAM 2014. The CPU 2012 may also search for information in files, databases, and the like in the recording medium. For example, when a plurality of entries, each having an attribute value of a first attribute associated with an attribute value of a second attribute, are stored in the recording medium, the CPU 2012 may search the plurality of entries for an entry matching a condition that specifies the attribute value of the first attribute, read the attribute value of the second attribute stored in that entry, and thereby obtain the attribute value of the second attribute associated with the first attribute that satisfies the predetermined condition.
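The attribute search described above can be illustrated with a short sketch. The entry contents, the function name `lookup_second_attribute`, and the file names are hypothetical and serve only to make the example concrete. The specification does not prescribe any particular data structure.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def lookup_second_attribute(
    entries: Iterable[Tuple[A, B]],
    condition: Callable[[A], bool],
) -> List[B]:
    """Return the second-attribute values of every entry whose
    first-attribute value satisfies the given condition."""
    return [second for first, second in entries if condition(first)]


# Hypothetical entries: (document type, template file name).
entries = [
    ("license", "template_license.png"),
    ("passport", "template_passport.png"),
    ("insurance_card", "template_insurance.png"),
]

# Condition that specifies the value of the first attribute.
matches = lookup_second_attribute(entries, lambda kind: kind == "passport")
print(matches)
```

The condition is passed as a function so that the same lookup also covers range or pattern conditions, not only equality.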

The programs or software modules described above may be stored in a computer-readable medium on or near the computer 2000. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can be used as the computer-readable medium. A program stored in the computer-readable medium may be provided to the computer 2000 via a network.

A program that is installed on the computer 2000 and causes the computer 2000 to function as the image processing system 10 may act on the CPU 2012 and the like to cause the computer 2000 to function as each unit of the image processing system 10, including the image processing unit 40 and the character processing unit 200. When read into the computer 2000, the information processing described in these programs functions as each unit of the image processing system 10, each of which is a concrete means in which software and the various hardware resources described above cooperate. By using these concrete means to realize computation or processing of information suited to the intended use of the computer 2000 in the present embodiment, a distinctive image processing system 10 suited to that intended use is constructed.

Various embodiments have been described with reference to block diagrams and the like. Each block in a block diagram may represent (1) a stage of a process in which an operation is performed or (2) a unit of an apparatus responsible for performing the operation. Particular stages and units may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable medium. Dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuits, including logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, as well as memory elements such as flip-flops, registers, field-programmable gate arrays (FPGAs), and programmable logic arrays (PLAs).

A computer-readable medium may include any tangible device capable of storing instructions to be executed by a suitable device. As a result, a computer-readable medium having instructions stored thereon constitutes at least part of a product that includes instructions executable to provide means for performing the operations specified in the processing procedures or block diagrams. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, and semiconductor storage media. More specific examples of computer-readable media may include a floppy (registered trademark) disk, a diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an electrically erasable programmable read-only memory (EEPROM), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a Blu-ray (registered trademark) disc, a memory stick, and an integrated circuit card.

Computer-readable instructions may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk (registered trademark), JAVA (registered trademark), and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.

Computer-readable instructions may be provided to a processor or programmable circuitry of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and the computer-readable instructions may be executed to provide means for performing the operations specified in the described processing procedures or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.

The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in the above embodiments. It is apparent to those skilled in the art that various changes or improvements can be made to the above embodiments. Matters described for a particular embodiment can also be applied to other embodiments to the extent that no technical contradiction arises. It is apparent from the claims that forms incorporating such changes or improvements can also be included in the technical scope of the present invention.

It should be noted that the operations, procedures, steps, stages, and other processes in the devices, systems, programs, and methods shown in the claims, the specification, and the drawings can be carried out in any order, unless the order is explicitly indicated by "before", "prior to", or the like, and unless the output of an earlier process is used in a later process. Even where an operation flow in the claims, the specification, or the drawings is described using "first", "next", and the like for convenience, this does not mean that the processes must be carried out in that order.

10 Image processing system
12 Image processing apparatus
20 Image to be processed
21 Background image
22 Image
23 Frame line
30 Image to be analyzed
40 Image processing unit
41 Template image acquisition unit
42 Target image acquisition unit
43 Image clipping unit
44 Feature point pair extraction unit
45 Identification unit
46 Region selection unit
47 Image transformation unit
48 Character image extraction unit
49 Storage unit
50 Fixed format
52 Driver's license
60 Template image
61, 62, 63, 64, 65, 66, 67 Image region
68, 69 Frame line
71, 72, 73 Feature point
80 Judge
88 Display device
90 Image region
91 Feature point
91A, 91B Feature point
100 Font data
120 Model
140 Adapted character
160 Peculiar character information
200 Character processing unit
202 Learning device
204 Font selection unit
206 Model generation unit
210 Processing target character acquisition unit
220 Character image generation unit
230 Character image selection unit
240 Difference information output unit
250 Determination result acquisition unit
280 Character analysis device
282 Analysis target image acquisition unit
284 Character image extraction unit
286 Character analysis unit
290 Storage device
400 Character pair
500 Image
600 Screen
610 Button
620, 630, 640 Character image
650 Comparison image
701, 702 Side
900 Image region
2000 Computer
2010 Host controller
2012 CPU
2014 RAM
2020 Input/output controller
2022 Communication interface
2024 Flash memory
2026 ROM
2040 Input/output chip

Claims

An image processing apparatus comprising:
a target image acquisition unit that acquires an image to be processed, which is a captured image of a document in a fixed format;
a template image acquisition unit that acquires a template image of the fixed format;
a feature point pair extraction unit that extracts, from the image to be processed and the template image, a plurality of feature point pairs having similar image information between the image to be processed and the template image; and
an image transformation unit that, based on the positions of the plurality of feature point pairs extracted by the feature point pair extraction unit, projectively transforms the image to be processed into a front image that would be obtained if the document were viewed from the front,
wherein the fixed format defines a common area, which is an area that should hold information common to a plurality of documents, and
the template image does not include image information of areas other than the common area, and includes at least some of a plurality of character images in the common area.
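As an illustration of the projective transformation recited above, the following sketch computes a 3x3 projective transformation matrix from feature point pairs and maps points with it. It is a simplified example under stated assumptions, not the claimed implementation: it uses exactly four point pairs and solves the resulting linear system directly, whereas the claim does not specify an estimation method, and practical systems commonly use many pairs with a robust estimator. All function names and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_pairs(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return H (with h33 fixed to 1) that maps each src point to its dst point."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch needs exactly four point pairs")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a point from the captured image into front-view coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


# Hypothetical pairs: feature points in the captured image (src) and the
# matching feature points in the template image (dst).
captured = [(12.0, 8.0), (95.0, 15.0), (90.0, 70.0), (5.0, 60.0)]
template = [(0.0, 0.0), (100.0, 0.0), (100.0, 60.0), (0.0, 60.0)]
H = homography_from_pairs(captured, template)
print([apply_homography(H, p) for p in captured])
```

Warping the whole image would apply the same mapping, or its inverse, to every pixel position. That step is omitted to keep the sketch short.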

The image processing apparatus according to claim 1, wherein the template image includes at least part of an image of a frame line included in the common area.

The image processing apparatus according to claim 1 or 2, wherein
the feature point pair extraction unit extracts the plurality of feature point pairs by extracting, for each of a plurality of feature points set in each of a plurality of image regions included in the template image, a feature point having similar image information from the image to be processed,
the image processing apparatus further comprises a region selection unit that counts, for each of the plurality of image regions, the number of feature points having similar image information that were extracted from the image to be processed for the plurality of feature points set in that image region, and that selects, from among the plurality of image regions, some image regions to be used for the projective transformation based on the number of feature points counted for each of the plurality of image regions, and
the image transformation unit projectively transforms the image to be processed into the front image based on the positions of the plurality of feature point pairs extracted for the plurality of feature points set in the image regions selected by the region selection unit.

The image processing apparatus according to claim 3, wherein the region selection unit preferentially selects, as an image region to be used for the projective transformation, an image region for which the counted number of feature points is larger among the plurality of image regions.
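The region selection recited above, which counts matched feature points per image region and prefers regions with larger counts, can be sketched as follows. The function name `select_regions`, the `keep` parameter, the tie-breaking rule, and the use of the reference numerals 61 to 64 as region labels are assumptions made for illustration. The claims do not state how many regions are selected or how ties are resolved.

```python
from collections import Counter
from typing import Iterable, List


def select_regions(matched_region_ids: Iterable[str], keep: int) -> List[str]:
    """Rank template image regions by the number of their feature points that
    found a similar feature point in the image to be processed, and return
    the `keep` regions with the largest counts.

    `matched_region_ids` holds one region label per matched feature point.
    Ties are broken by label so that the result is deterministic.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    counts = Counter(matched_region_ids)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [region for region, _ in ranked[:keep]]


# Hypothetical matching result: region "61" has three matches, "63" has two,
# and "62" and "64" have one match each.
matches = ["61", "61", "61", "62", "63", "63", "64"]
print(select_regions(matches, keep=2))
```

Only the feature point pairs that belong to the returned regions would then be passed on to the estimation of the projective transformation.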

The image processing apparatus according to claim 4, wherein the feature point pair extraction unit extracts, for each of the plurality of feature points set in each of the plurality of image regions, a feature point having similar image information from the image to be processed, and, from among the plurality of feature points extracted from the image to be processed, preferentially selects, as feature points constituting the plurality of feature point pairs, those feature points for which the positional relationship between the feature points extracted as a feature point pair is closer.

The image processing apparatus according to any one of claims 1 to 5, further comprising a storage unit that stores the template image.

The image processing apparatus according to any one of claims 1 to 6, wherein
the document in the fixed format is a document that includes a photographic image of a person at a predetermined position,
the image processing apparatus further comprises an identification unit that identifies, from the image to be processed, an area including a photographic image of a person, using a trained model that has been machine-learned with photographic images of persons included in documents in the fixed format as training data, and
the image transformation unit projectively transforms the image to be processed into the front image further based on the position of the area including the photographic image identified by the identification unit.

The image processing apparatus according to any one of claims 1 to 7, wherein the document in the fixed format is a driver's license for an automobile or a motorized bicycle, a passport, or a health insurance card.

The image processing apparatus according to any one of claims 1 to 8, further comprising:
a character image extraction unit that extracts an image of a character from the front image; and
a character processing unit that outputs information indicating a difference between the image of the character extracted by the character image extraction unit and an image of a reference character having a predetermined character shape.

The image processing apparatus according to claim 9, wherein the character processing unit comprises:
a processing target character acquisition unit that acquires the image of the character extracted by the character image extraction unit as an image of a character to be processed;
a storage unit that stores a trained model that is generated by machine learning using the image of the reference character and images of a plurality of characters having mutually different character shapes, and that generates, from an input character image, a character image adapted to the predetermined character shape;
a character image generation unit that uses the trained model to generate, from the image of the character to be processed, an image of the character to be processed adapted to the predetermined character shape; and
a difference information output unit that outputs information indicating a difference between the character to be processed and the reference character based on a result of comparing the image generated by the character image generation unit with the image of the reference character.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 10.