JP2010231686A

JP2010231686A - Device, method and program for extracting document area from image

Info

Publication number: JP2010231686A
Application number: JP2009080901A
Authority: JP
Inventors: Nobuyuki Hara; 伸之原; Akihiro Minagawa; 明洋皆川; Yutaka Katsuyama; 裕勝山; Yoshinobu Hotta; 悦伸堀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-03-30
Filing date: 2009-03-30
Publication date: 2010-10-14
Anticipated expiration: 2029-03-30
Also published as: JP5229050B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for extracting a document area from a real-world photographed image for the purpose of distortion correction and character recognition.

SOLUTION: An area division processing unit divides an input image into a plurality of divided areas based on the color information of each pixel in the input image. An area integration processing unit extracts a temporary document area by integrating divided areas with one another. A temporary document area contour straight line detection processing unit detects temporary document area contour straight lines, which are straight lines representing the contour of the extracted temporary document area. A text block extraction processing unit extracts, from the extracted temporary document area, a text block, which is a character area containing small areas corresponding to characters. A text block contour straight line detection processing unit detects text block contour straight lines, which are straight lines representing the contour of the text block, based on the boundary pixels of the extracted text block. A document area extraction processing unit extracts and outputs, as the document area, a quadrangle enclosed by the detected temporary document area contour straight lines or text block contour straight lines.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The disclosed technique relates to extracting a document area from a real-world photographed image for the purpose of character recognition.

Now that compact digital cameras and mobile phones with camera functions are widespread, expectations are growing for character recognition technology that photographs real-world subjects such as documents and signboards and then acquires and uses the character information in the captured images.

However, conventional character recognition processing has targeted images captured in a disturbance-free environment using a fixed device such as a scanner.
When business cards or printed text are read with a compact digital camera or a mobile phone, it is difficult to capture only the required character information, and the background, which contains unnecessary information, often appears in the image as well. In addition, because the subject cannot always be photographed head-on, perspective distortion and rotational distortion often occur in the image. Unnecessary information in the background, such as character strings and patterns that are not recognition targets, together with the perspective and rotational distortion of the captured image, lowers character recognition accuracy.

There is therefore a demand for a technique that improves the accuracy of character recognition by accurately extracting the area in which characters are written (the document area) and correcting its distortion before character recognition is performed. That is, distortion correction and document area extraction should be executed before character recognition so that distortion and elements other than the recognition target do not affect the recognition. Furthermore, distortion correction is an image transformation whose processing cost grows in proportion to the number of pixels. It is therefore desirable to apply distortion correction only to the document area to be recognized, as extracted by the document area extraction process.

As a first conventional technique for extracting character areas for character recognition in general photographed images, a technique has been proposed that separates the background and the characters over the entire image based on pixel feature values, without extracting a document area.

As a second conventional technique for similar character area extraction, a distortion removal transformation has been proposed that corrects the rotation and perspective distortion arising in a document image photographed from an oblique direction. In this second technique, distortion is removed by a transformation based on the inclinations of a plurality of character strings extracted from the captured image and on the known focal length of the camera.

As a third conventional technique, area extraction techniques aimed at image distortion correction and character recognition have been proposed. In this third technique, the background area and the document area are separated based on straight lines obtained from an edge image derived from the input image, or the document area is extracted as a rectangular area delimited by a plurality of straight lines obtained from the edge image.

The following prior art documents have been disclosed in relation to the conventional techniques described above.

Japanese Patent Laid-Open No. 9-16713
Japanese Patent Laid-Open No. 2002-334327
Japanese Patent Laid-Open No. 2004-96435
Japanese Patent Laid-Open No. 2006-107034

However, the first conventional technique performs character recognition without extracting a character area. As a result, non-character elements in the background may be erroneously recognized as characters. In addition, the character information of the target document area must be selected from the recognition results for the whole image, and this selection may also go wrong.

In the second conventional technique, when characters or images exist in a background area outside the document area, they are transformed together with the document. In such cases it is difficult to correctly extract only the character strings inside the document area for the transformation, and erroneous transformation results.

In the third conventional technique, a general image captured with a handheld camera or the like often yields a large amount of edge information not only from the contour lines of the document area but also from characters and image elements inside the document area and from non-character elements in the background. In such cases it can be difficult to select only the contour lines of the target document area. Moreover, when the background and the document area cannot be distinguished, or when part of the document area lies outside the frame, part of the contour cannot be detected, and it is difficult to extract the target document area correctly.

The problem to be solved by the disclosed technique is to correctly extract a document area from a general image captured with a handheld camera or the like.

To solve the above problem, the disclosed technique is realized, with the following configuration, as a document area extraction apparatus that extracts the document area containing a character string in an input image so that the character string can be recognized.

The area division processing unit divides the input image into a plurality of divided areas based on the color information of each pixel in the input image.
For each divided area, the area integration processing unit compares the difference in color information between a pixel of that area that touches another divided area and the adjacent pixel of the other divided area with a preset threshold. If the difference is smaller than the threshold, the two divided areas are regarded as the same area and integrated. The temporary document area is extracted through this area integration.
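The integration rule described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the image and label map are nested Python lists, the color difference is a Euclidean distance in RGB, and the differences of all boundary pixel pairs between two areas are averaged before comparison with the threshold. The function name `merge_similar_regions`, the averaging, and the union-find bookkeeping are assumptions made for illustration and are not specified in the source.

```python
from collections import defaultdict


def merge_similar_regions(labels, image, threshold):
    """Merge adjacent areas whose boundary pixels have similar colors.

    labels: 2-D list of integer area ids.
    image:  2-D list of (r, g, b) tuples with the same shape as labels.
    Returns a new label map in which merged areas share one id.
    """
    h, w = len(labels), len(labels[0])
    sums = defaultdict(float)  # (a, b) -> summed color difference
    counts = defaultdict(int)  # (a, b) -> number of boundary pixel pairs
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and lower neighbours
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w:
                    continue
                a, b = labels[y][x], labels[ny][nx]
                if a == b:
                    continue
                key = (min(a, b), max(a, b))
                diff = sum((p - q) ** 2
                           for p, q in zip(image[y][x], image[ny][nx])) ** 0.5
                sums[key] += diff
                counts[key] += 1

    parent = {}

    def find(r):
        # Union-find lookup with path halving.
        parent.setdefault(r, r)
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for key, total in sums.items():
        # Assumption: compare the mean boundary difference with the threshold.
        if total / counts[key] < threshold:
            ra, rb = find(key[0]), find(key[1])
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
    return [[find(v) for v in row] for row in labels]
```

With a card that was over-divided into two nearly identical light areas on a dark desk, the two light areas collapse into one while the desk stays separate.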

The temporary document area contour straight line detection processing unit detects temporary document area contour straight lines, which are straight lines representing the contour of the extracted temporary document area.
The document area extraction processing unit extracts and outputs the quadrangular document area enclosed by the detected temporary document area contour straight lines.

According to the disclosed technique, straight lines are fitted to the contour of the extracted temporary document area, so the character recognition target area can be extracted from the background as a quadrangular area with high accuracy, without being affected by elements in the background and even when the separation of the background and the character area partially fails. Furthermore, when, for example, the contour of the temporary document area cannot be extracted, text block contour straight lines can be obtained and used together with it, which makes it possible to extract the document area as a quadrangular area.

In addition, according to the disclosed technique, areas that were over-divided during area division are merged by the area integration process, which prevents unnecessary area boundaries from arising inside the document area. Area boundary pixels inside the document area would lower the accuracy of the temporary document area contour straight line candidate calculation executed in the subsequent stage, so preventing them improves the accuracy with which contour straight line candidates are calculated.

Furthermore, according to the disclosed technique, even if part of the contour is extracted incorrectly because of illumination or similar effects, the document area is extracted as a quadrangle by fitting contour straight lines, so it can be extracted accurately as a rectangular area. In addition, even when temporary document area contour straight lines cannot be extracted, for example because the contour of the document area lies outside the frame or is hard to distinguish from the background, the document area can still be extracted as a quadrangle by also using the text block contour straight lines.

FIG. 1 is a configuration diagram of an embodiment of the document area extraction apparatus.
FIG. 2 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the first embodiment.
FIG. 3 is an operation flowchart showing the details of the area division process based on color information.
FIG. 4 is an operation flowchart showing the details of the process that separates the temporary document area from the background area by area integration (rough extraction of the temporary document area).
FIG. 5 is an operation flowchart showing the details of the contour straight line candidate detection process.
FIG. 6 is an operation flowchart showing the details of the temporary document area contour straight line detection process.
FIG. 7 is an operation flowchart showing the document area extraction process based on the temporary document area contour straight lines of four sides.
FIG. 8 is an operation flowchart showing the details of the text block contour straight line detection process.
FIG. 9 is an operation flowchart showing the details of the document area extraction process based on document area contour straight lines and text block contour straight lines.
FIG. 10 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the second embodiment.
FIG. 11 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the third embodiment.
FIG. 12 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the fourth embodiment.
FIG. 13 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the fifth embodiment.
FIG. 14 is an explanatory diagram (part 1) of the operation of the first embodiment.
FIG. 15 is an explanatory diagram (part 2) of the operation of the first embodiment.
FIG. 16 is an explanatory diagram of the operation of the area division processing unit 103.
FIG. 17 is an explanatory diagram of the clustering process executed by the area division processing unit 103.
FIG. 18 is an explanatory diagram of the operation of the area integration processing unit 104.
FIG. 19 is an explanatory diagram (part 1) of the operation of the temporary document area contour straight line detection processing unit 105.
FIG. 20 is an explanatory diagram (part 2) of the operation of the temporary document area contour straight line detection processing unit 105.
FIG. 21 is an explanatory diagram (part 1) of the operation of the text block contour straight line detection processing unit 107.
FIG. 22 is an explanatory diagram (part 2) of the operation of the text block contour straight line detection processing unit 107.
FIG. 23 is a diagram showing an example of the hardware configuration of a computer that can implement the document area extraction apparatus of FIG. 1.
FIG. 24 is a diagram showing an example of a captured image of a business card placed on a desk.
FIG. 25 is a diagram showing an example of a binarized image.
FIG. 26 is a diagram showing an example of an edge image.

Embodiments are described in detail below.
FIG. 1 is a configuration diagram of an embodiment of the document area extraction apparatus. This configuration can be implemented in devices such as digital cameras and camera-equipped mobile terminals and mobile phones. FIG. 2 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the first embodiment. The portion indicated by 200 in FIG. 2 is the processing executed by the document area extraction apparatus of FIG. 1. Control of the sequence of steps in the flowchart of FIG. 2 is realized by the control unit 109 in FIG. 1 executing a predetermined control program.

The first embodiment is described in detail below, based on the configuration of the document area extraction apparatus in FIG. 1 and its operation flowchart in FIG. 2.
First, the user photographs an image containing a document, such as a business card placed on a desk, with the camera photographing unit 101. The camera photographing unit 101 outputs image data, which is stored in the image data storage unit 102.

Next, the area division processing unit 103 reads the image data from the image data storage unit 102 and performs area division on it based on color information (step S201 in FIG. 2).

Next, the area integration processing unit 104 integrates the division results produced by the area division processing unit 103, separates the temporary document area from the background area, and roughly extracts the temporary document area (step S202 in FIG. 2).

Next, the temporary document area contour straight line detection processing unit 105 takes the rough extraction result from the area integration processing unit 104 and detects contour straight line candidates from the edge pixels of the temporary document area (step S203 in FIG. 2).

The temporary document area contour straight line detection processing unit 105 then evaluates the contour straight line candidates detected in step S203 and thereby detects the contour straight lines of the temporary document area (step S204 in FIG. 2). These contour straight lines are hereinafter called temporary document area contour straight lines.

The control unit 109 then determines whether the temporary document area contour straight line detection processing unit 105 has detected temporary document area contour straight lines for all four sides of the quadrangle (step S205 in FIG. 2).

If it is determined in step S205 that temporary document area contour straight lines have been detected for all four sides, the document area extraction processing unit 108 extracts the quadrangular area enclosed by the four temporary document area contour straight lines and outputs it as the final document area (step S206 in FIG. 2).
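One way to picture step S206 is to intersect the four contour straight lines pairwise to obtain the corners of the quadrangle. The sketch below assumes each line is given as coefficients `(a, b, c)` of `a*x + b*y = c`; this representation and the function names `intersect` and `quadrangle_corners` are assumptions for illustration, since the source does not state how the lines are stored.

```python
def intersect(line1, line2, eps=1e-9):
    """Return the intersection (x, y) of two lines a*x + b*y = c, or None if parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None  # parallel or nearly parallel lines have no usable corner
    # Cramer's rule for the 2x2 linear system.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x, y)


def quadrangle_corners(top, right, bottom, left):
    """Return corners in the order top-left, top-right, bottom-right, bottom-left."""
    corners = [
        intersect(top, left),
        intersect(top, right),
        intersect(bottom, right),
        intersect(bottom, left),
    ]
    if any(corner is None for corner in corners):
        return None  # the four lines do not enclose a quadrangle
    return corners
```

The resulting four corners are also the natural input for a later perspective (distortion) correction stage.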

If it is determined in step S205 that temporary document area contour straight lines have not been detected for all four sides, the text block extraction processing unit 106 extracts text blocks from the temporary document area roughly extracted by the area integration processing unit 104 (step S207 in FIG. 2).

The text block contour straight line detection processing unit 107 then detects contour straight lines for the text blocks extracted by the text block extraction processing unit 106 (step S207 in FIG. 2). These contour straight lines are hereinafter called text block contour straight lines.

Furthermore, the document area extraction processing unit 108 merges the temporary document area contour straight lines detected by the temporary document area contour straight line detection processing unit 105 in step S204 with the text block contour straight lines detected by the text block contour straight line detection processing unit 107 in step S207. The document area extraction processing unit 108 extracts the quadrangular area obtained by this merging as the final document area and outputs it (step S208 in FIG. 2).

The document area output in this way from the document area extraction processing unit 108 of the apparatus in FIG. 1 is input to a distortion correction processing unit (not shown) connected downstream of the document area extraction apparatus, where distortion correction is executed (step S209 in FIG. 2). The distortion correction result is then input to a character recognition processing unit (not shown) connected downstream of the distortion correction processing unit, where character recognition is executed (step S210 in FIG. 2). If the result of the distortion correction or the character recognition is unsatisfactory, the control unit 109 in FIG. 1 returns control to the temporary document area contour straight line detection processing unit 105 (step S204 in FIG. 2), which redoes the detection of temporary document area contour straight lines from the contour straight line candidates. In this case, for example, temporary document area contour straight lines are selected from new contour straight line candidates.
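The control flow of the extraction part of FIG. 2 can be summarized in a short sketch. It only mirrors the branching described above: every stage is passed in as a callable, and the dictionary keys such as `"divide"` or `"merge_lines"` are hypothetical names chosen for this illustration, not identifiers from the source. The detected lines are assumed to be held in a dictionary keyed by side, so that four entries mean all four sides were found.

```python
def extract_document_area(image, stages):
    """Run the extraction flow using caller-supplied stage functions.

    stages: dict of callables (hypothetical names) implementing each processing unit.
    """
    regions = stages["divide"](image)                    # S201: color-based division
    provisional = stages["integrate"](regions)           # S202: rough extraction
    candidates = stages["find_candidates"](provisional)  # S203: line candidates
    doc_lines = stages["select_lines"](candidates)       # S204: contour lines
    if len(doc_lines) == 4:                              # S205: all four sides found?
        return stages["quad_from_lines"](doc_lines)      # S206: output quadrangle
    # Fallback branch: supplement the missing sides with text block contour lines.
    blocks = stages["extract_text_blocks"](provisional)
    text_lines = stages["text_block_lines"](blocks)
    return stages["merge_lines"](doc_lines, text_lines)  # S208: merged quadrangle
```

Passing the stages in as callables keeps the sketch independent of how each unit is actually implemented; the retry from step S204 after an unsatisfactory recognition result would wrap this function in a loop.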

FIG. 24 shows an example of a captured image targeted by the first embodiment: a business card placed on a desk, photographed with a handheld camera such as a compact digital camera or a camera built into a mobile phone. FIG. 25 shows an example of a binarized image obtained by applying general binarization to the captured image of FIG. 24. FIG. 26 shows an example of an edge image obtained by applying general edge extraction to the binarized image of FIG. 25. Character recognition based on handheld camera images generally requires extracting the character area, where the characters are located, from an edge image like that in FIG. 26. In the edge image of FIG. 26, the character area as a whole is rotated to the left, and various patterns caused by the pattern of the desk surface also appear in the background. The challenge is how to extract the character area accurately from such an edge image.

As shown in FIG. 14, the first embodiment exploits the fact that, in image data where various elements such as a desk and sticky notes appear in addition to the document to be recognized, the document area is often rectangular and its non-character portion is often a uniform color.

That is, the area division processing unit 103 first performs area division based on color information, and the temporary document area 1402 is roughly extracted. After this color-based area division, the temporary document area contour straight line detection processing unit 105 fits straight lines to the roughly extracted temporary document area 1402, as indicated by 1403 in FIG. 14, and thereby detects the temporary document area contour straight lines. Through this processing, the final document area 1404 can be separated from the background accurately as a quadrangle, and elements of the background outside the area are kept from affecting the contour detection.

In this case, as indicated by 1501 in FIG. 15, even if partial protrusions or indentations appear in the temporary document area because of ambient light or similar effects, fitting the straight lines 1403 to the temporary document area 1402 obtained by color-based division allows the final document area 1404 to be approximated accurately as a quadrangle. As a result, the accuracy of the subsequent distortion correction and character recognition improves. In FIG. 15, parts identical to those in FIG. 14 are given the same reference numerals.
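Why a fitted line tolerates a local protrusion can be seen with a small numerical sketch. The source does not describe the fitting method at this point, so the code below substitutes a simple, generic technique: an ordinary least-squares fit that is repeated after discarding points far from the current line. The functions `fit_line` and `fit_line_trimmed`, the parametrization `x = m*y + c` (suited to a near-vertical edge), and the residual limit are all assumptions made for illustration.

```python
def fit_line(points):
    """Least-squares fit of x = m*y + c through (x, y) points on a near-vertical edge."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points)
    if var_y == 0:
        raise ValueError("points must span more than one y value")
    m = sum((p[1] - mean_y) * (p[0] - mean_x) for p in points) / var_y
    return m, mean_x - m * mean_y


def fit_line_trimmed(points, max_residual=2.0, rounds=3):
    """Refit after dropping points farther than max_residual from the current line."""
    kept = list(points)
    for _ in range(rounds):
        m, c = fit_line(kept)
        inliers = [p for p in kept if abs(p[0] - (m * p[1] + c)) <= max_residual]
        if len(inliers) == len(kept) or len(inliers) < 2:
            break  # nothing left to trim, or too few points to refit
        kept = inliers
    return fit_line(kept)
```

For a left edge at x = 10 with a three-pixel bulge out to x = 16, the plain fit is pulled toward the bulge, while the trimmed fit returns to the straight edge.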

To improve the accuracy with which the temporary document area contour straight line detection processing unit 105 detects temporary document area contour straight lines, the area division processing unit 103 executes the detailed processing described below.
FIG. 3 is an operation flowchart showing the details of the color-based area division of step S201 in FIG. 2, executed by the area division processing unit 103. FIG. 16 is an explanatory diagram of the operation of the area division processing unit 103, and FIG. 17 is an explanatory diagram of the clustering process executed by the area division processing unit 103.

As its basic operation, the area division processing unit 103 performs area division by clustering based on color information. When division based on color information produces excessive division, the color information of adjacent divided areas is compared, and areas with high similarity are merged by the clustering. This clustering suppresses the generation of edge elements inside the temporary document area and prevents unneeded edge information from affecting later processing. The detailed clustering algorithm is described later.

Building on this clustering, the area division processing unit 103 first executes clustering based on color information on the entire input image data read from the image data storage unit 102 and generates divided areas (step S301 in FIG. 3). In the example of FIG. 16, the divided areas 1602 are generated from the entire input image data 1601.

In parallel, the area division processing unit 103 splits the input image data read from the image data storage unit 102 into, for example, four pieces of partial image data and executes clustering based on color information on each piece individually to generate partial divided areas (step S302 in FIG. 3). In the example of FIG. 16, the partial divided areas 1603 (#1 to #4) are generated from the four pieces of partial image data obtained by splitting the input image data 1601 into four. The processing need not be limited to a single set of partial images such as the four-way split; other sets, such as a nine-way split, may be processed at the same time.
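The four-way and nine-way splits mentioned above amount to cutting the image into a grid of tiles, each of which is then clustered on its own. A minimal sketch of the tiling step follows; the function name `split_into_tiles` and the half-open `(y0, y1, x0, x1)` bounds are assumptions for illustration, and the source does not say that the tiles must form a regular grid.

```python
def split_into_tiles(height, width, rows, cols):
    """Return (y0, y1, x0, x1) bounds of a rows x cols grid covering the image.

    Bounds are half-open, so a tile covers rows y0..y1-1 and columns x0..x1-1.
    """
    tiles = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles.append((y0, y1, x0, x1))
    return tiles
```

Calling it with `rows=2, cols=2` gives the four partial images, and `rows=3, cols=3` gives the nine-way variant; integer division keeps the tiles gap-free even when the image size is not a multiple of the grid size.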

Next, the area division processing unit 103 integrates the divided areas obtained from the entire image with the partial divided areas obtained from the partial images to obtain the final divided areas (step S303 in FIG. 3). In the example of FIG. 16, the divided areas 1602 corresponding to the entire image and the partial divided areas 1603 (#1 to #4) corresponding to the partial images are integrated to obtain the divided areas 1604. When several groups of partial images are processed in step S302, the partial divided areas of all groups are integrated.

The area division processing unit 103 then assigns a separate label to each area classified into the same class by the clustering, and outputs a label result image (step S304 in FIG. 3). Areas that belong to the same class but are not adjacent to each other receive different labels, and this is how the area division is performed.

In the example of FIG. 16, in the divided areas 1602 obtained from the entire input image data, the upper left corner 1605 of the temporary document area at the center of the screen is not separated from the background, whereas the other three corners and the sides are separated successfully. Conversely, in the partial divided areas 1603 (#1 to #4) obtained from the four pieces of partial image data, the upper left corner 1605 is separated successfully, but the central part 1606 of the upper side is not. The divided areas 1602 and the partial divided areas 1603 (#1 to #4) are integrated so that every area divided in either result remains divided. This avoids the risk of part of the document area being lost because of ambient light or similar effects.

Various methods can be used for the clustering in steps S301 and S302 of FIG. 3, including statistics-based methods such as the K-means method. Here, as shown schematically in FIG. 17, a method can be used in which pixel color information is projected onto a color space and a search proceeds from a peak (local maximum of frequency) of the resulting frequency distribution toward a valley (local minimum of frequency), determining classes one after another. The span from a peak of the distribution to a valley is thereby determined as one class. This method determines representative classes in descending order of frequency and is suited to extracting uniform areas such as a document area. FIG. 17 depicts the color space schematically in one dimension; the actual clustering is performed in, for example, a three-dimensional color space.

The clustering algorithm is described in detail below as steps C1 to C3.

Step C1
A color space table indexed by the color information channels is prepared, and each pixel in the image casts a vote according to its color information.
For example, if the color information consists of the three channels Y, U and V with 256 levels per channel, the color space table has 256 × 256 × 256 elements. Each pixel in the image votes into this table according to its color information. For example, if the target pixel has YUV = (64, 128, 128), one vote is added to element (64, 128, 128) of the table. A smaller table may be used by thinning out the color space as necessary.

In this way, the number of votes for an element of the color space table increases in proportion to the number of pixels having the corresponding color information.
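The voting in step C1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `vote_colors` and the `step` parameter (one possible way to thin out the color space by integer division) are assumptions, and a sparse `Counter` stands in for the dense 256 × 256 × 256 table.

```python
from collections import Counter

def vote_colors(pixels, step=1):
    """Cast one vote per pixel into a sparse color space table.

    pixels: iterable of (Y, U, V) tuples with integer channel values.
    step:   hypothetical thinning factor; step=4 reduces 256 levels to 64.
    """
    table = Counter()
    for y, u, v in pixels:
        # Each pixel adds one vote to the element addressed by its color.
        table[(y // step, u // step, v // step)] += 1
    return table

table = vote_colors([(64, 128, 128), (64, 128, 128), (10, 20, 30)])
thinned = vote_colors([(64, 128, 128)], step=4)
```

With `step=1` the element (64, 128, 128) receives two votes in this example; with `step=4` the same color falls into element (16, 32, 32).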

Step C2
The element of the color space table with the most votes is taken as the search start position, and a search for local minima is performed as follows.

Step C2-1: Cluster numbers are assigned to the elements of the color space table. Initially, no element has a cluster number. A cluster number is assigned to the element taken as the search start position (hereinafter, the search element). For example, when the search starts from the element with the most votes, that element is given number 0, and the number is incremented by 1 thereafter.

Step C2-2: The cluster number assignment state of the element adjacent to the search element in the color space, along one axis of the color space table (hereinafter, the adjacent element), is examined.

For example, if the search element is YUV = (64, 128, 128) and the adjacent element is chosen in the increasing direction along the V axis, element (64, 128, 129) is selected.

If this adjacent element already has a cluster number, processing proceeds to step C2-3.
If the adjacent element has no cluster number, the vote count of the search element is compared with that of the adjacent element. If the adjacent element has fewer votes (for example, 100 votes for the search element and 90 for the adjacent element), the cluster number is assigned to the adjacent element, the search position moves to the adjacent element, and step C2-2 is repeated. If the adjacent element has more votes, processing proceeds to step C2-3.

Step C2-3: The cluster number assignment state of the adjacent element in the opposite direction along the axis selected in step C2-2 (for example, (64, 128, 127) in the decreasing direction along the V axis) is examined.

If that adjacent element already has a cluster number, the elements adjacent along a different axis are examined, for example the elements (64, 127, 128) and (64, 129, 128) adjacent along the U axis. If all adjacent elements already have cluster numbers, processing proceeds to step C2-4.

If an adjacent element without a cluster number exists, its vote count is compared with that of the search element.
If the adjacent element has fewer votes, the search position moves to the adjacent element, the cluster number is assigned to it, and step C2-2 is repeated.

If the adjacent element has more votes, the vote counts of the elements adjacent along a different axis are compared. If such an adjacent element has fewer votes than the search element, the cluster number is assigned to it, the search position moves to it, and step C2-2 is repeated. If every adjacent element has more votes than the search element, processing proceeds to step C2-4.

Step C2-4: The search position is returned to the element visited immediately before the current search element.
If the element returned to is not the element selected as the first processing target in step C2-1, step C2-3 is repeated.
If the element returned to is the element selected as the first processing target in step C2-1, the element with the most votes among the elements of the color space table that have no cluster number becomes the new search start position, and processing is repeated from step C2-1 with a different cluster number.
When every element has been assigned a cluster number, the assignment of cluster numbers to the color space table elements ends.
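The peak-to-valley search of step C2 can be sketched as a depth-first descent over the table. This is a simplified sketch under stated assumptions, not the exact procedure above: an explicit stack replaces the move and backtrack bookkeeping of steps C2-2 to C2-4, neighbors with an equal vote count are absorbed into the cluster (the text only covers "fewer" and "more"), and the name `label_color_table` is hypothetical. The table is treated as dense, with missing bins counting as zero votes.

```python
from itertools import product

def label_color_table(votes, shape):
    """Assign a cluster number to every bin of a dense color space table.

    votes: dict mapping bin coordinate tuples to vote counts.
    shape: number of bins along each axis, e.g. (256, 256, 256).
    """
    labels = {}
    # Visiting bins in descending vote order means every new seed is the
    # unlabeled bin with the most votes.
    bins = sorted(product(*(range(n) for n in shape)),
                  key=lambda b: (-votes.get(b, 0), b))
    next_id = 0
    for seed in bins:
        if seed in labels:
            continue
        labels[seed] = next_id
        stack = [seed]
        while stack:
            cur = stack.pop()
            for axis in range(len(shape)):
                for step in (1, -1):
                    nb = cur[:axis] + (cur[axis] + step,) + cur[axis + 1:]
                    if not 0 <= nb[axis] < shape[axis] or nb in labels:
                        continue
                    # Descend only toward bins with no more votes than the
                    # current one, so the cluster stops at a valley.
                    if votes.get(nb, 0) <= votes.get(cur, 0):
                        labels[nb] = next_id
                        stack.append(nb)
        next_id += 1
    return labels

hist = [1, 5, 9, 4, 2, 6, 8, 3]  # one-dimensional histogram, as in FIG. 17
result = label_color_table({(i,): v for i, v in enumerate(hist)}, (8,))
cluster_ids = [result[(i,)] for i in range(8)]
```

In this one-dimensional example the first cluster grows from the peak of 9 votes down to the valley of 2 votes, and the second cluster starts from the peak of 8 votes.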

Step C3
The cluster numbers in the color space table are reflected onto the pixels of the image. Pixels with the same color information therefore receive the same cluster number regardless of whether they are adjacent in the image.
For example, if pixel (XY = (0, 0)) and pixel (XY = (5, 10)) have the same color information (YUV = (64, 128, 128)), they receive the same cluster number, for example 0x000001.
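Step C3 amounts to a table lookup per pixel. The sketch below assumes the image is a nested list of (Y, U, V) tuples and that `table_labels` maps each color to its cluster number; both names are illustrative only.

```python
def labels_to_pixels(image, table_labels):
    """Replace each pixel's color with the cluster number of that color."""
    return [[table_labels[tuple(px)] for px in row] for row in image]

image = [[(64, 128, 128), (0, 0, 0)],
         [(0, 0, 0), (64, 128, 128)]]
cluster_map = labels_to_pixels(image, {(64, 128, 128): 0x000001, (0, 0, 0): 0x000000})
```

The two pixels with YUV = (64, 128, 128) receive the same cluster number even though they are not adjacent in the image.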

In step S301 of FIG. 3, the clustering based on color information described in steps C1 to C3 above is performed on the entire input image data read from the image data storage unit 102. In step S302 of FIG. 3, it is performed individually on each piece of partial image data obtained by dividing the input image data read from the image data storage unit 102.

Next, the integration of the divided areas obtained from the entire image with the partial divided areas obtained from the partial images, performed by the area division processing unit 103 in step S303 of FIG. 3, is described. As a result of steps S301 and S302, each pixel of the image data output by each step has a cluster number assigned for the entire image and one assigned for each partial image split. For example, suppose step S302 uses two splits, a four-way split and a nine-way split, and that for a certain pixel (XY = (0, 0)) the cluster number is 0x000001 on the entire image, 0x000002 on the four-way split and 0x000005 on the nine-way split. In step S303 of FIG. 3, the pixel is assigned a new integrated cluster number obtained by concatenating the cluster numbers of the individual division results, for example 0x000001000002000005. The final divided areas are obtained in this way.
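The concatenation of cluster numbers can be sketched as bit shifting. The 24-bit field width is an assumption read off the six hexadecimal digits in the example, and `combine_cluster_numbers` is a hypothetical name.

```python
def combine_cluster_numbers(numbers, bits=24):
    """Concatenate per-split cluster numbers into one integrated number.

    numbers: cluster numbers of one pixel, e.g. [whole, 4-way, 9-way].
    bits:    assumed width of each field (24 bits = 6 hex digits).
    """
    combined = 0
    for n in numbers:
        if not 0 <= n < (1 << bits):
            raise ValueError("cluster number does not fit in the field width")
        combined = (combined << bits) | n
    return combined

integrated = combine_cluster_numbers([0x000001, 0x000002, 0x000005])
```

Two pixels receive the same integrated number only if they agree in every division result, so any boundary found by any one of the splits survives the integration.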

Finally, the output of the label result image, performed by the area division processing unit 103 in step S304 of FIG. 3, is described. In the image data whose pixels were given integrated cluster numbers in step S303, pixels that are adjacent in the image and have the same cluster number are assigned the same label number. Pixels that have the same cluster number but are not adjacent in the image are assigned different label numbers. For example, an isolated area receives its own label number even if its cluster number is the same as that of another area. Through this labeling, the area division processing unit 103 outputs a labeling result image.
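The labeling of step S304 is a connected-component labeling over the cluster map. The sketch below uses a breadth-first flood fill with 4-neighbor adjacency; the choice of 4-neighbor rather than 8-neighbor adjacency is an assumption, since the text does not specify it.

```python
from collections import deque

def label_regions(cluster_map):
    """Give each connected run of equal cluster numbers its own label."""
    h, w = len(cluster_map), len(cluster_map[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] is not None:
                continue
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] is None
                            and cluster_map[ny][nx] == cluster_map[y][x]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

region_labels = label_regions([[7, 9, 7],
                               [7, 9, 7]])
```

The two columns with cluster number 7 are separated by the column with cluster number 9, so they end up with different label numbers.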

As described above, in the first embodiment the clustering of the entire image (step S301) and the clustering of each partial image (step S302) are used together and their results are integrated. The effect was outlined above with reference to FIG. 16.

The effect is explained again here from the viewpoint of the clustering.
Clustering based on color information operates on the frequency distribution obtained by voting pixel color information into the color space. If the color information is locally biased within the image, the frequency distribution in the color space for the entire image may differ greatly from that for a partial image.

For example, when the document area (the paper surface of interest) lies in a corner of the image and the background around it is small, the partial images obtained by dividing the input image include some with many background pixels and some with few.

In a partial image where background pixels are far fewer than document pixels, the background pixels may, depending on the frequency distribution in the color space, be wrongly included in the cluster of the pixels in the temporary document area.

Even in such a case, the entire image can be expected to contain enough background pixels to yield a desirable clustering result.
Using the clustering results of the partial images and of the entire image together therefore makes it possible to suppress clustering errors (insufficient clustering).

To improve the accuracy with which the temporary document area contour straight line detection processing unit 105 of FIG. 1 detects the temporary document area contour straight lines, the detailed processing of the area division processing unit 103 described above is followed by the following detailed processing in the area integration processing unit 104 of FIG. 1.

FIG. 4 is an operation flowchart showing the details of step S202 of FIG. 2, the separation of the temporary document area from the background area by area integration (rough extraction of the temporary document area), performed by the area integration processing unit 104. FIG. 18 is an explanatory diagram showing the operation of the area integration processing unit 104.

The area integration processing unit 104 receives the labeling result image output by the area division processing unit 103 as the area division result. Where adjacent areas in this labeling result image have different labels, the area integration processing unit 104 evaluates the color information of each area and, if the similarity is high, merges the areas and gives them the same label (step S401 in FIG. 4).

From the result of the label merging in step S401, the area integration processing unit 104 then selects an area near the center of the image whose size (area) is at least a predetermined value, and outputs it as the rough extraction result of the temporary document area (step S402 in FIG. 4).

If the area integration processing described above were not performed, then, as shown in FIG. 18, when an over-segmented area 1802 exists inside the rough extraction result 1801 of the temporary document area, contour boundaries would arise not only at the boundary with the background area but also inside the temporary document area 1801, adversely affecting the detection accuracy of the temporary document area contour straight lines and the amount of computation. The area integration processing described above therefore avoids erroneous integration caused by local shading changes.

The detailed algorithm of steps S401 and S402 is described below as steps M1 to M5.

Step M1
In the labeling result image, pixels with the same label number are treated as one area. Specifically, these pixels are renumbered with a new common label number.

Step M2
In the labeling result image obtained after step M1, an area with a certain label number is selected as the processing target area.

If the processing target area touches areas with other label numbers, the contour length of the shared boundary, that is, the total number of touching pixels, is calculated for each neighboring label number. For example, if one pixel A in the area with label number #001 touches two pixels in the area with label number #003, the contour length for pixel A is 2. This is done for every pixel in the processing target area. For example, if the areas with label numbers #003 and #004 touch the processing target area with label number #001, a contour length is calculated for each of #003 and #004. Hereinafter, an area touching the processing target area is called an adjacent area, and a pixel touching a pixel in the processing target area is called an adjacent pixel.

Next, for each adjacent area (each label number) touching the processing target area, the total of the absolute color differences between touching pixels is calculated. For example, suppose pixel A (YUV = (128, 192, 128)) in the processing target area with label number #001 has two adjacent pixels P (YUV = (125, 190, 120)) and Q (YUV = (125, 190, 125)) in the area with label number #003. The absolute color differences between pixel A and adjacent pixel P are calculated as follows.

Y difference between A and P = |128 - 125| = 3
U difference between A and P = |192 - 190| = 2
V difference between A and P = |128 - 120| = 8

Likewise, the absolute color differences between pixel A and adjacent pixel Q are calculated as follows.
Y difference between A and Q = |128 - 125| = 3
U difference between A and Q = |192 - 190| = 2
V difference between A and Q = |128 - 125| = 3

The totals of the absolute color differences between pixel A and its adjacent pixels P and Q are therefore calculated as follows.
Total Y difference between A and P, Q
= Y difference between A and P + Y difference between A and Q = 3 + 3 = 6
Total U difference between A and P, Q
= U difference between A and P + U difference between A and Q = 2 + 2 = 4
Total V difference between A and P, Q
= V difference between A and P + V difference between A and Q = 8 + 3 = 11

That is, the totals of the absolute color differences between pixel A and its adjacent pixels P and Q are YUV = (6, 4, 11). This summation is performed over all adjacent pixels of one adjacent area, giving a total for each color component. These totals are calculated for each adjacent area (each label number) touching the processing target area.

In this way, the contour length of the adjacent pixels and the per-component totals of the color differences are obtained for each adjacent area (each label number). For example, for the processing target area with label number #001, the adjacent area with label number #003 might have a contour length of 20 and color difference totals of YUV = (60, 40, 55), and the adjacent area with label number #004 might have a contour length of 30 and color difference totals of YUV = (90, 60, 160).
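The quantities of step M2 can be gathered in one pass over the label image. In this sketch the name `boundary_stats`, the nested-list image layout, and the use of 4-neighbor adjacency are assumptions; the text only states that touching pixel pairs are counted.

```python
from collections import defaultdict

def boundary_stats(labels, yuv, target):
    """Contour length and per-component color difference totals per neighbor label.

    labels: 2-D list of label numbers.
    yuv:    2-D list of (Y, U, V) tuples with the same layout as labels.
    target: label number of the processing target area.
    """
    h, w = len(labels), len(labels[0])
    length = defaultdict(int)
    diff = defaultdict(lambda: [0, 0, 0])
    for y in range(h):
        for x in range(w):
            if labels[y][x] != target:
                continue
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != target:
                    other = labels[ny][nx]
                    length[other] += 1  # one touching pixel pair
                    for c in range(3):
                        diff[other][c] += abs(yuv[y][x][c] - yuv[ny][nx][c])
    return dict(length), {k: tuple(v) for k, v in diff.items()}

# Pixel A (label 1) touches P to its right and Q below it (both label 3).
lengths, totals = boundary_stats(
    [[1, 3], [3, 3]],
    [[(128, 192, 128), (125, 190, 120)],
     [(125, 190, 125), (0, 0, 0)]],
    target=1)
```

The example reproduces the pixel A, P, Q case from the text: a contour length of 2 and color difference totals of (6, 4, 11) toward label 3.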

Step M3
For each adjacent area of the processing target area, two quantities derived from step M2 are compared with predetermined thresholds: the contour length of the adjacent pixels, and the per-component mean color difference obtained by dividing the per-component total color difference by that contour length. If the contour length is greater than its threshold and the mean difference of every color component is smaller than its threshold, the processing target area and the adjacent area are regarded as the same area, and the two label numbers are unified into the label number of one of the areas.
For example, consider a contour length threshold of 10 and per-component thresholds of YUV = (5, 5, 5) for the mean color difference.

As in the example of step M2, suppose that for the processing target area with label number #001, the adjacent area with label number #003 has a contour length of 20 and color difference totals of YUV = (60, 40, 55). The per-component mean differences are then YUV = (60/20, 40/20, 55/20) = (3, 2, 2.75). The contour length 20 is greater than the threshold 10, and the mean differences (3, 2, 2.75) are all smaller than the thresholds (5, 5, 5). Because the adjacent area with label number #003 satisfies every condition, it and the processing target area with label number #001 are regarded as the same area and merged. The two label numbers are unified into, for example, the smaller number.

In contrast, suppose that the adjacent area with label number #004 has a contour length of 30 and color difference totals of YUV = (90, 60, 160). The per-component mean differences are then YUV = (90/30, 60/30, 160/30), approximately (3, 2, 5.3). The contour length 30 is greater than the threshold 10, but the V component of the mean differences exceeds its threshold of 5. Because the adjacent area with label number #004 does not satisfy every condition, it and the processing target area with label number #001 are not regarded as the same area and are not merged.
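The merge test of step M3 reduces to a few comparisons. The sketch below uses the example thresholds from the text as defaults; the function name `should_merge` is hypothetical.

```python
def should_merge(contour_len, diff_sum, len_threshold=10, diff_threshold=(5, 5, 5)):
    """Return True if the target area and an adjacent area should be merged.

    contour_len: number of touching pixel pairs from step M2.
    diff_sum:    per-component (Y, U, V) color difference totals from step M2.
    """
    if contour_len <= len_threshold:
        return False  # shared boundary too short (also guards against division by zero)
    means = [s / contour_len for s in diff_sum]
    return all(m < t for m, t in zip(means, diff_threshold))

merge_003 = should_merge(20, (60, 40, 55))   # means (3, 2, 2.75)
merge_004 = should_merge(30, (90, 60, 160))  # V mean is about 5.3
```

With these inputs the area with label number #003 is merged and the area with label number #004 is not, matching the two worked examples.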

Step M4
Area integration is carried out by repeating steps M1 to M3 for the areas of all label numbers.

Step M5
After area integration, the area that is near the center of the image and has the largest area is selected. For example, a function is prepared that gives the greatest weight to pixels at the center of the image and smaller weights toward the periphery. Using this function, a weight is calculated for each pixel in the image data. For each label number, the weights of the pixels belonging to that label number are summed, and the area whose label number has the largest total is selected as the rough extraction result of the temporary document area.
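The center-weighted selection of step M5 can be sketched as follows. The text does not give the weighting function, so the inverse-square-distance weight used here is purely an assumption chosen to peak at the image center and fall off toward the edges; `select_document_region` is a hypothetical name.

```python
from collections import defaultdict

def select_document_region(labels):
    """Return the label whose pixels have the largest total center weight."""
    h, w = len(labels), len(labels[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    score = defaultdict(float)
    for y in range(h):
        for x in range(w):
            # Assumed weight: 1 at the center, decreasing with squared distance.
            score[labels[y][x]] += 1.0 / (1.0 + (y - cy) ** 2 + (x - cx) ** 2)
    return max(score, key=score.get)

# Label 2 fills the central 3x3 block (9 pixels); label 1 is the border (16 pixels).
grid = [[2 if 1 <= y <= 3 and 1 <= x <= 3 else 1 for x in range(5)] for y in range(5)]
chosen = select_document_region(grid)
```

Although the border area has more pixels, the central area accumulates the larger weighted total and is selected.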

Following the detailed processing of the area integration processing unit 104 of FIG. 1 described above, the temporary document area contour straight line detection processing unit 105 of FIG. 1 performs the following detailed processing.

FIG. 5 is an operation flowchart showing the details of step S203 of FIG. 2, the detection of contour straight line candidates, performed by the temporary document area contour straight line detection processing unit 105. FIG. 6 is an operation flowchart showing the details of step S204 of FIG. 2, the detection of the temporary document area contour straight lines, performed by the same unit. FIGS. 19 and 20 are explanatory diagrams showing the operation of the temporary document area contour straight line detection processing unit 105.

まず、図２のステップＳ２０３の輪郭直線候補の検出処理の詳細について、図５の動作フローチャートに従って説明する。
仮文書領域輪郭直線検出処理部１０５は、領域統合処理部１０４によって算出された仮文書領域の粗抽出結果において、隣接領域との境界画素を抽出する（図５のステップＳ５０１）。ここで、仮文書領域内部のテキストブロック（文字領域）との境界は対象外とされる。このテキストブロックは例えば、仮文書領域内にあって面積が所定の閾値以下である大きさを有し、周囲との色情報の差（成分毎又は各成分の合計値）が所定の閾値以上である領域として抽出することができる。 First, the details of the detection process of the contour straight line candidate in step S203 of FIG. 2 will be described according to the operation flowchart of FIG.
The temporary document region outline straight line detection processing unit 105 extracts a boundary pixel with an adjacent region in the rough extraction result of the temporary document region calculated by the region integration processing unit 104 (step S501 in FIG. 5). Here, the boundary with the text block (character area) inside the temporary document area is excluded. For example, the text block has a size within the temporary document area and the area is equal to or smaller than a predetermined threshold, and the difference in color information from each other (for each component or the total value of each component) is equal to or larger than the predetermined threshold It can be extracted as a certain area.

Next, the temporary document area contour straight line detection processing unit 105 votes (maps) each boundary pixel extracted in step S501 onto the corresponding coordinates in a Hough space, which is a two-dimensional polar coordinate space. The Hough transform is used to detect straight lines and circles. For straight line detection, each point (x, y) in the original rectangular coordinates is converted into the two-dimensional polar space of angle θ and distance γ, and a count is accumulated in a memory array for each combination of angle θ and distance γ. The combination of θ and γ with the largest count, converted back to the original rectangular coordinates, gives the set of points most likely to form a straight line. In other words, a straight line in rectangular coordinates becomes a single point in polar coordinates. Lowering the required count therefore yields the next candidates in turn. Coordinates in the Hough space that correspond to pixels aligned on a straight line in the actual image are known to concentrate at a single point. Consequently, a straight line in the actual image that corresponds to a Hough space coordinate with many votes is likely to correspond to a contour straight line of the document area. The temporary document area contour straight line detection processing unit 105 therefore detects, in descending order of votes, a predetermined number of points in the Hough space whose vote counts are equal to or greater than a predetermined threshold. It then calculates the straight lines formed by the boundary pixel groups mapped to those points as contour straight line candidates for the four sides of the quadrangle corresponding to the document area (step S502 in FIG. 5).
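The voting of step S502 can be illustrated with a small accumulator-based Hough transform. The sketch below is only an assumption about one possible realization: the function name `hough_line_candidates`, the one-degree angle step, and the parameters `min_votes` and `top_k` (standing in for the "predetermined threshold" and "predetermined number") are not taken from the patent. The distance written γ in the text is called `rho` in the code, with the usual line equation rho = x·cos θ + y·sin θ.

```python
import numpy as np

def hough_line_candidates(points, shape, n_theta=180, min_votes=10, top_k=5):
    """Vote boundary pixels into a (rho, theta) accumulator and return the
    strongest cells as (votes, rho, theta) tuples, strongest first.

    points : iterable of (row, col) boundary pixel coordinates
    shape  : (height, width) of the image, used to size the accumulator
    """
    pts = np.asarray(points, dtype=float)
    ys, xs = pts[:, 0], pts[:, 1]
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*shape)))          # largest possible |rho|
    # One rho per (pixel, theta) pair, shifted by diag so indices are >= 0.
    rho_idx = np.rint(xs[:, None] * np.cos(thetas) +
                      ys[:, None] * np.sin(thetas)).astype(int) + diag
    theta_idx = np.broadcast_to(np.arange(n_theta), rho_idx.shape)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    np.add.at(acc, (rho_idx, theta_idx), 1)        # accumulate the votes
    # Keep only cells above the vote threshold, then take the top_k of them.
    cells = np.argwhere(acc >= min_votes).tolist()
    cells.sort(key=lambda c: -acc[c[0], c[1]])
    return [(int(acc[r, t]), int(r - diag), float(thetas[t]))
            for r, t in cells[:top_k]]
```

Because of quantization, neighbouring accumulator cells can receive the same number of votes as the true line, which is one practical reason to keep several candidates rather than only the maximum.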

A plurality of contour straight line candidates is selected because contour pixels may be detected erroneously under the influence of the background around the document area, so the coordinate with the largest number of votes does not necessarily correspond to the optimum contour straight line. Examples of such background influence include erroneous boundary detection caused by uneven illumination around the document area, and cases in which part of a background pattern or figure is erroneously included in the document area.

In step S502, the temporary document area contour straight line detection processing unit 105 regards points in the Hough space whose vote counts are smaller than the predetermined threshold as unreliable and does not extract them as contour straight line candidates. This handles cases in which the linearity of the contour is low, for example because the subject is wrinkled or torn. Likewise, when the document area extends beyond the image so that its boundary coincides with the image edge, that boundary is not detected as a contour straight line candidate.

FIG. 19 is a diagram showing an example in which contour straight line candidates are detected for a temporary document area.
Next, the details of the temporary document area contour straight line detection processing in step S204 of FIG. 2 will be described with reference to the operation flowchart of FIG. 6.

In this processing, the contour straight line candidates are evaluated for each side of the quadrangle constituting the temporary document area, and a temporary document area contour straight line is detected for that side. Two quantities are used for the evaluation: the length over which a contour straight line candidate overlaps (crosses) the temporary document area, and the area that the candidate cuts off from the temporary document area.

First, the temporary document area contour straight line detection processing unit 105 selects, from the contours of the quadrangle constituting the temporary document area, one contour for which no temporary document area contour straight line detection result has yet been obtained, and selects one contour straight line candidate for it (step S601 in FIG. 6).

Next, the temporary document area contour straight line detection processing unit 105 calculates, as one evaluation value, the length over which the selected contour straight line candidate overlaps the temporary document area (step S602 in FIG. 6). As shown in FIG. 20, the overlapping length is obtained by counting the pixels that lie on the contour straight line candidate and are also included in the temporary document area.

Subsequently, as shown in FIG. 20, the temporary document area contour straight line detection processing unit 105 calculates the areas (total pixel counts) of the two evaluation regions obtained when the selected contour straight line candidate divides the temporary document area, and takes the smaller of the two areas as the other evaluation value (step S603 in FIG. 6).
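The two evaluation values of steps S602 and S603 can be sketched as below. The sketch assumes the temporary document area is given as a boolean mask and the candidate as a (rho, theta) pair with rho = x·cos θ + y·sin θ; the function name `evaluate_candidate` and the half-pixel tolerance `tol` used to decide which pixels lie "on" the line are assumptions for illustration.

```python
import numpy as np

def evaluate_candidate(mask, rho, theta, tol=0.5):
    """Return (overlap_length, smaller_area) for one contour line candidate.

    mask : 2-D boolean array, True inside the temporary document area
    """
    rows, cols = np.indices(mask.shape)
    # Signed distance of every pixel centre from the candidate line.
    d = cols * np.cos(theta) + rows * np.sin(theta) - rho
    on_line = np.abs(d) <= tol
    # S602: pixels that are on the line and inside the document area.
    overlap = int(np.count_nonzero(mask & on_line))
    # S603: area of the document on each side of the line; keep the smaller.
    side_a = int(np.count_nonzero(mask & (d > tol)))
    side_b = int(np.count_nonzero(mask & (d < -tol)))
    return overlap, min(side_a, side_b)
```

A candidate that runs along the true edge of the area leaves almost nothing on its outer side, so its smaller area is close to zero, while a candidate that cuts through the interior leaves a larger remainder.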

The temporary document area contour straight line detection processing unit 105 determines whether the evaluation values for overlapping length and area have been calculated for all contour straight line candidates (step S604 in FIG. 6).
If the determination in step S604 is NO, the temporary document area contour straight line detection processing unit 105 selects one contour straight line candidate for which the evaluation values have not yet been calculated (step S605 in FIG. 6). Thereafter, by executing step S603 and step S604, the unit calculates the evaluation values for overlapping length and area for each contour straight line candidate.

When the determination in step S604 becomes YES, the temporary document area contour straight line detection processing unit 105 determines, as the temporary document area contour straight line detection result for the selected contour, the contour straight line candidate whose overlapping length is equal to or greater than a predetermined value and whose evaluation region area is the smallest (step S606 in FIG. 6). If no contour straight line candidate has an overlapping length equal to or greater than the predetermined value, the unit records, as the detection result for the selected contour, that no contour straight line could be detected. For example, suppose that three contour straight line candidates exist for one of the four sides of the quadrangle constituting the temporary document area. Candidate #1 overlaps the temporary document area over 50 pixels and has an evaluation region area of 200 pixels. Candidate #2 overlaps it over 10 pixels and has an evaluation region area of 30 pixels. Candidate #3 overlaps it over 40 pixels and has an evaluation region area of 180 pixels. If the threshold for the overlapping length is 30 pixels, candidates #1 and #3 meet the threshold. Comparing their evaluation region areas, candidate #3 has the smaller area, so candidate #3 is selected as the temporary document area contour straight line detection result.
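The selection rule of step S606 and the numerical example above can be expressed as a short function. The dictionary keys and the function name `select_contour_line` are hypothetical, chosen only for this illustration.

```python
def select_contour_line(candidates, min_overlap):
    """Pick the candidate with the smallest evaluation-region area among those
    whose overlap length reaches min_overlap; return None if there is none."""
    eligible = [c for c in candidates if c["overlap"] >= min_overlap]
    if not eligible:
        return None  # corresponds to "contour straight line not detectable"
    return min(eligible, key=lambda c: c["area"])

# The three candidates from the example in the text.
candidates = [
    {"id": 1, "overlap": 50, "area": 200},
    {"id": 2, "overlap": 10, "area": 30},
    {"id": 3, "overlap": 40, "area": 180},
]
best = select_contour_line(candidates, min_overlap=30)
```

With a threshold of 30 pixels, candidate #2 is rejected despite having the smallest area, and candidate #3 is chosen over candidate #1.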

The temporary document area contour straight line detection processing unit 105 determines whether temporary document area contour straight line detection results have been obtained for all contours (step S607 in FIG. 6).
If the determination in step S607 is NO, the temporary document area contour straight line detection processing unit 105 returns to step S601, selects a new contour for which no temporary document area contour straight line detection result has been obtained, and repeats the processing from step S602 to step S606.

When the determination in step S607 becomes YES, the temporary document area contour straight line detection processing unit 105 outputs the temporary document area contour straight line detection result for each side and ends the processing.
In step S205 of FIG. 2, the control unit 109 in FIG. 1 determines whether the temporary document area contour straight line detection results output by the temporary document area contour straight line detection processing unit 105 through the detailed processing described above include contour straight lines for all four sides of the quadrangle.

As described above, if the determination result in step S205 is YES, the document area extraction processing unit 108 in FIG. 1 extracts and outputs, as the document area, the quadrangular region enclosed by the contour straight lines given by the temporary document area contour straight line detection results for the four sides (step S206 in FIG. 2). FIG. 7 is an operation flowchart showing this processing by the document area extraction processing unit 108. That is, the document area extraction processing unit 108 receives the four temporary document area contour straight line detection results output by the temporary document area contour straight line detection processing unit 105 through the detailed processing described above, and outputs the quadrangle they form as the document area (step S701 in FIG. 7).
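One way to turn four contour straight lines into a quadrangle is to intersect adjacent lines. The following sketch assumes each line is given as (rho, theta) with rho = x·cos θ + y·sin θ, as produced by a Hough transform; the function names `intersect` and `quad_corners` and the side ordering are assumptions, since the patent does not specify how the quadrangle is represented.

```python
import numpy as np

def intersect(line_a, line_b):
    """Intersection (x, y) of two lines given as (rho, theta); None if parallel."""
    (r1, t1), (r2, t2) = line_a, line_b
    a = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(a)) < 1e-9:
        return None
    x, y = np.linalg.solve(a, np.array([r1, r2], dtype=float))
    return float(x), float(y)

def quad_corners(top, right, bottom, left):
    """Corners of the quadrangle, clockwise from the top-left corner."""
    return [intersect(top, left), intersect(top, right),
            intersect(bottom, right), intersect(bottom, left)]
```

For example, the lines y = 2, x = 9, y = 8, and x = 1 give the corners (1, 2), (9, 2), (9, 8), and (1, 8), up to floating-point rounding.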

If the determination result in step S205 is NO, the text block extraction processing unit 106 in FIG. 1 extracts text blocks from the temporary document area, and the text block contour straight line detection processing unit 107 then detects text block contour straight lines (step S207 in FIG. 2). The document area extraction processing unit 108 merges these text block contour straight lines with the temporary document area contour straight line detection results obtained for some of the sides in step S204, and extracts and outputs the resulting quadrangular region as the document area (step S208 in FIG. 2).

The details of the processing in steps S207 and S208 are described below. FIG. 8 is an operation flowchart showing details of the text block contour straight line detection processing in step S207 of FIG. 2, executed by the text block contour straight line detection processing unit 107. FIG. 9 is an operation flowchart showing details of the document area extraction processing in step S208 of FIG. 2, which is based on the temporary document area contour straight lines and the text block contour straight lines and is executed by the document area extraction processing unit 108. FIGS. 21 and 22 are explanatory diagrams showing the operation of the text block contour straight line detection processing unit 107.

First, the details of the text block contour straight line detection processing in step S207 of FIG. 2 will be described with reference to the operation flowchart of FIG. 8.
The text block contour straight line detection processing unit 107 first receives the rough extraction result of the temporary document area calculated by the region integration processing unit 104 in step S202 of FIG. 2. It then obtains the circumscribed rectangle of the received temporary document area and sets the inside of this rectangle as the search range for text blocks (character areas) (step S801 in FIG. 8).

Next, the text block contour straight line detection processing unit 107 performs raster scans in the horizontal and vertical directions to extract the pixels that are included in the temporary document area and are adjacent to other regions (text blocks or the background region) (step S802 in FIG. 8).

Next, scanning from the center of the circumscribed rectangle calculated in step S801 toward the contour for which no temporary document area contour straight line was obtained in step S204 of FIG. 2 (that is, for which detection failed), the text block contour straight line detection processing unit 107 detects the pixels that are included in the temporary document area and touch other regions such as text blocks or the background region (step S802 in FIG. 8). For example, when the contour straight line on the right side of the temporary document area was not obtained, horizontal raster scans (from left to right) are performed from the upper end of the circumscribed rectangle to its lower end. This processing extracts the text blocks inside the temporary document area.

Next, from among the pixels touching other regions that were detected in step S802, and excluding the end points of the temporary document area, the text block contour straight line detection processing unit 107 selects the pixels closest to the image edge in the direction for which no temporary document area contour straight line was obtained (step S803 in FIG. 8). For example, as shown in FIG. 21, consider the case in which the upper temporary document area contour straight line 2101, the right temporary document area contour straight line 2102, and the lower temporary document area contour straight line 2103 have been obtained, but the left temporary document area contour straight line has not. In this case, raster scans are performed in the horizontal direction (from left to right), and the pixel groups 2105 and 2106 closest to the left edge of the image are selected, excluding the left end pixel group 2104 of the temporary document area.
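The pixel selection of step S803 for a missing left contour might be sketched as follows. This is one interpretation of the text, not a confirmed reading of FIG. 21: it assumes the temporary document area is a boolean mask in which text block pixels are False, it checks only horizontal neighbours, and it treats the first and last boundary pixels of each row as the end points of the area. The function name `left_text_edge_pixels` is hypothetical.

```python
import numpy as np

def left_text_edge_pixels(doc_mask):
    """For each image row, return the document pixel nearest the left image edge
    that touches another region, skipping the end points of the document area."""
    selected = []
    for r in np.flatnonzero(doc_mask.any(axis=1)):
        row = doc_mask[r]
        padded = np.concatenate(([False], row, [False]))
        # Document pixels whose left or right neighbour is not document.
        touches = row & (~padded[:-2] | ~padded[2:])
        cols = np.flatnonzero(touches)
        inner = cols[1:-1]          # drop the area's own left and right ends
        if inner.size:
            selected.append((int(r), int(inner[0])))
    return selected
```

The selected pixels are the ones that would then be voted into the Hough space to obtain text block contour straight line candidates for the missing side.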

Next, in the same manner as in step S502 of FIG. 5 described above, the text block contour straight line detection processing unit 107 votes the pixel groups selected in step S803 onto the corresponding coordinates in the Hough space. It then detects, in descending order of votes, a predetermined number of points in the Hough space whose vote counts are equal to or greater than a predetermined threshold. Finally, it calculates the straight lines formed by the selected pixel groups mapped to those points as contour straight line candidates corresponding to the sides of the text blocks (step S804 in FIG. 8).

Finally, taking these contour straight line candidates as input, the text block contour straight line detection processing unit 107 calculates the text block contour straight lines by executing processing similar to that realized by the operation flowchart of FIG. 6 (step S805 in FIG. 8). For example, in the example of FIG. 22, which corresponds to FIG. 21, the text block contour straight lines 2201 and 2202 are obtained.

Next, the details of the document area extraction processing in step S208 of FIG. 2, which is based on the temporary document area contour straight lines and the text block contour straight lines, will be described with reference to the operation flowchart of FIG. 9.

First, the document area extraction processing unit 108 determines whether a text block contour straight line detected by the text block contour straight line detection processing unit 107 exists for every side of the quadrangle corresponding to the temporary document area for which no temporary document area contour straight line was obtained in step S204 of FIG. 2 (step S901 in FIG. 9).

If the determination in step S901 is YES, the document area extraction processing unit 108 outputs, as the document area, the quadrangle formed by the temporary document area contour straight lines obtained in step S204 of FIG. 2 and the text block contour straight lines obtained in step S207 of FIG. 2 (step S902 in FIG. 9). For example, in the example of FIG. 22, the region formed by the temporary document area contour straight lines 2101, 2102, and 2103 and the text block contour straight line 2201 is output as the document area.
In this way, even a document area that does not form a complete quadrangle can be extracted appropriately.

On the other hand, if the determination in step S901 is NO, the temporary document area cannot be enclosed by four contour straight lines and the document area cannot be extracted as a quadrangle. The document area extraction processing unit 108 therefore outputs the document area contour straight lines, the text block contour straight lines, and the contour straight line candidates corresponding to each. How this output is used is described later in the explanation of the third embodiment.
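The decision of steps S901 and S902 amounts to filling each missing side from the text block result and checking whether all four sides are covered. The sketch below assumes that the per-side results are held in dictionaries keyed by side name, with `None` for a side that was not detected; this representation and the name `merge_contour_lines` are assumptions made for illustration.

```python
SIDES = ("top", "right", "bottom", "left")

def merge_contour_lines(doc_lines, text_lines):
    """Combine per-side contour lines, preferring the temporary document area
    line and falling back to the text block line for a missing side.

    Returns (merged, complete), where complete is True only if all four sides
    have a line (the YES branch of step S901)."""
    merged = {}
    for side in SIDES:
        line = doc_lines.get(side)
        if line is None:
            line = text_lines.get(side)
        merged[side] = line
    complete = all(merged[side] is not None for side in SIDES)
    return merged, complete
```

When `complete` is False, the caller would fall back to outputting the available lines and candidates instead of a quadrangle, as described for the NO branch.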

In the first embodiment described above, which is based on the configuration of the document area extraction apparatus in FIG. 1 and its operation flowchart in FIG. 2, the captured image is divided into regions on the basis of color information, and the temporary document area to be recognized is roughly extracted. Fitting straight lines to the contour of the roughly extracted temporary document area makes it possible to extract the character recognition target area from the background as a quadrangular region with high accuracy, without being affected by elements of the background and even when the separation of the background from the character area partially fails. Furthermore, when the contour of the temporary document area cannot be extracted, text block contour straight lines are obtained and used together with it, so that the document area can still be extracted as a quadrangular region.

In the first embodiment described above, regions that were over-divided during region division are merged by the region integration processing, which prevents unnecessary region boundaries from arising inside the document area. Region boundary pixels inside the document area would lower the accuracy of the subsequent processing that calculates contour straight line candidates from the boundary pixels of the temporary document area, so preventing them improves the accuracy of the contour straight line candidates.

In the first embodiment described above, a plurality of contour straight line candidates is extracted from the roughly extracted temporary document area. For each candidate, attention is paid to the length (number of pixels) over which it overlaps the temporary document area and to the area it cuts off from the temporary document area. A candidate that overlaps the temporary document area over a long length and cuts off only a small area is judged to lie along the boundary of the temporary document area. Using these two quantities as evaluation values allows the optimum contour straight line to be selected.

In the first embodiment described above, the same approach is applied to text blocks: a plurality of contour straight line candidates is extracted and each candidate is evaluated, as in the case of the temporary document area, so that the optimum text block contour straight line can be selected.

Furthermore, in the first embodiment described above, even if part of the contour is extracted erroneously, for example because of illumination, the document area is extracted as a quadrangle by fitting contour straight lines, so the document area can be extracted accurately as a rectangular region. In addition, even when temporary document area contour straight lines cannot be extracted, for example because the contour of the document area lies outside the screen or is difficult to distinguish from the background, the document area can still be extracted as a quadrangle by also using text block contour straight lines.

Moreover, in the first embodiment described above, region division is also performed on partial images, which addresses cases in which local changes in shading caused by illumination or the like lead to over-division of the document area when the whole image is divided on the basis of color information. The region division results for the whole image and for the partial images are then integrated on the basis of the uniformity of the document image. This suppresses over-division of the document area.

Next, a second embodiment based on the configuration of the document area extraction apparatus in FIG. 1 will be described in detail.
FIG. 10 is an operation flowchart showing the overall operation when the document area extraction apparatus in FIG. 1 operates as the second embodiment. In FIG. 10, steps identical to those in the operation flowchart of FIG. 2 for the first embodiment are given the same step numbers. As in the case of FIG. 2, control of the sequence of operations in the flowchart of FIG. 10 is realized by the control unit 109 in FIG. 1 executing a predetermined control program.

The operation flowchart of FIG. 10 differs from that of FIG. 2 in the processing enclosed by the frame 1000 in FIG. 10.
The second embodiment relies on the premise that a document area is almost uniform apart from characters and figures. The control unit 109 obtains the variance of the pixels inside the document area extracted in step S206 or S208 of FIG. 10, excluding the character areas (step S1001 in FIG. 10). For example, the variance of the color information (YUV channel values) of the pixels included in the document area is calculated.

If a contour straight line selected in step S206 or S208 of FIG. 10 is inappropriate, so that pixels that belong to the background are included in the document area, the variance described above becomes large. The control unit 109 therefore determines whether the variance calculated in step S1001 is equal to or less than a predetermined threshold (step S1002 in FIG. 10).
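The uniformity check of steps S1001 and S1002 can be sketched as follows. The text does not say how the variances of the three YUV channels are combined, so requiring every channel to stay within the threshold is an assumption, as are the function name `is_uniform` and the mask-based inputs.

```python
import numpy as np

def is_uniform(yuv, doc_mask, text_mask, max_variance):
    """Return True if the non-text pixels of the document area are uniform.

    yuv          : H x W x 3 array of Y, U, V values
    doc_mask     : H x W boolean array, True inside the extracted document area
    text_mask    : H x W boolean array, True on character pixels to be excluded
    max_variance : threshold applied to each channel (assumed combination rule)
    """
    pixels = yuv[doc_mask & ~text_mask]      # shape (n_pixels, 3)
    if len(pixels) == 0:
        return False                         # nothing to judge
    variances = pixels.var(axis=0)           # per-channel variance (S1001)
    return bool(np.all(variances <= max_variance))   # comparison (S1002)
```

A False result corresponds to the NO branch of step S1002, in which the contour straight line selection is judged inappropriate.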

If the variance is equal to or less than the threshold and the determination in step S1002 is YES, the control unit 109 judges that the contour straight lines selected in step S206 or S208 are appropriate. The control unit 109 then outputs the document area for the subsequent distortion correction and character recognition processing.

On the other hand, if the variance is greater than the threshold and the determination in step S1002 is NO, the control unit 109 judges that the contour straight lines selected in step S206 or S208 of FIG. 10 are not appropriate. The control unit 109 then returns control to the processing of the temporary document area contour straight line detection processing unit 105 in FIG. 1 (step S204 in FIG. 10) so that the temporary document area contour straight lines are detected again from the contour straight line candidates. As a result, in step S204 the candidate with the highest evaluation value among the contour straight line candidates not yet selected is chosen.

In this way, the second embodiment exploits the fact that the variance is small when only the character area is extracted, because of the uniformity of that area, whereas the variance is large when background elements are erroneously included. This makes it possible to improve the extraction accuracy of the document area.

Next, a third embodiment based on the configuration of the document area extraction apparatus in FIG. 1 will be described in detail.
FIG. 11 is an operation flowchart showing the overall operation when the document area extraction apparatus in FIG. 1 operates as the third embodiment. In FIG. 11, steps identical to those in the operation flowchart of FIG. 2 for the first embodiment are given the same step numbers. As in the case of FIG. 2, control of the sequence of operations in the flowchart of FIG. 11 is realized by the control unit 109 in FIG. 1 executing a predetermined control program.

The operation flowchart of FIG. 11 differs from that of FIG. 2 in the processing enclosed by the frame 1100 in FIG. 11.
In the third embodiment, the control unit 109 in FIG. 1 displays on a display device the document area extracted in step S206 or S208 of FIG. 11 together with the contour straight line candidates or text block contour straight line candidates extracted in step S203 or S207 of FIG. 11. The display device is, for example, the liquid crystal display screen of a digital camera or a mobile phone. The control unit 109 then asks the user for confirmation and allows the user to select an appropriate contour straight line candidate with a pointing device such as a mouse or a touch panel (step S1101 in FIG. 11). The user can thus check the document area visually and, if necessary, select a more appropriate straight line from the contour straight line candidates. The user may also be allowed to specify a new straight line with the pointing device.

Next, the control unit 109 determines whether the user has reselected a contour straight line candidate and thereby corrected the contour straight line (step S1102 of FIG. 11).
If the contour straight line has not been corrected and the determination in step S1102 is NO, the control unit 109 determines that the contour straight line selected in step S206 or S208 is appropriate. The control unit 109 then outputs the document area for the subsequent distortion correction and character recognition processing.

If the contour straight line has been corrected and the determination in step S1102 is YES, the control unit 109 returns control to the processing of the temporary document area contour straight line detection processing unit 105 of FIG. 1 (step S204 of FIG. 11) and causes the detection of the temporary document area contour straight lines, including the contour straight line candidate selected by the user, to be executed again. As a result, a contour straight line reflecting the user's intention can be selected in step S204.

The display processing in step S1101 of FIG. 11 is particularly effective when, in step S208 of FIG. 11, the temporary document area could not be enclosed by four contour straight lines and extraction of the document area as a quadrangle failed (that is, when the determination in step S901 of FIG. 9 is NO). In other words, when the system cannot determine the document area, the document area can be determined with the user's assistance.

The document area extraction result has no meaning in itself; it becomes meaningful only when the result of the whole series of processes, through the distortion correction processing and character recognition processing that follow the document area extraction, is output. Therefore, there is normally no need to show the document area extraction result to the user. However, when the document area or text block used for character recognition and its contour straight lines are shown to the user as an intermediate result of the series of processes leading up to character recognition, the user can evaluate the document area extraction result. The third embodiment thus makes it possible to reflect the user's intention in the extraction of the document area, and also to improve the accuracy of the distortion correction processing and character recognition processing executed afterwards.

Next, a fourth embodiment based on the configuration of the document area extraction apparatus of FIG. 1 is described in detail below.
FIG. 12 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the fourth embodiment. In FIG. 12, steps that are the same as in the operation flowchart of FIG. 2 for the first embodiment are given the same step numbers. As in the case of FIG. 2, control of the sequence of operations in the flowchart of FIG. 12 is realized by the control unit 109 of FIG. 1 executing a predetermined control program.

The operation flowchart of FIG. 12 differs from that of FIG. 2 in the processing portion enclosed by frame 1200 in FIG. 12.
The fourth embodiment can improve the extraction accuracy of the document area by re-executing the document area extraction processing based on the result of the distortion correction processing executed after the document area is extracted.

When the document area has been extracted appropriately and distortion correction processing has been applied to it, the contour straight lines of the document area and of the text block are, in most cases, parallel to the horizontal or the vertical.

Therefore, in the fourth embodiment, distortion correction processing is first applied to the document area extracted in step S206 or S208 of FIG. 12 by a distortion correction processing unit, which is not specifically shown (step S1201 of FIG. 12). Specifically, the image is deformed so that the four sides of the document area enclosed by the contour straight lines become mutually orthogonal, thereby correcting the perspective distortion of the image.
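The perspective correction step can be illustrated with a projective mapping from a rectangle onto the extracted quadrangle. The sketch below is one possible approach under stated assumptions and is not taken from the patent: it uses the well-known closed-form unit-square-to-quadrilateral mapping with nearest-neighbour sampling, expects the corners in the order top-left, top-right, bottom-right, bottom-left, and the function names are hypothetical.

```python
def square_to_quad(quad):
    """Coefficients (a..h) of the projective map that sends the unit square
    (0,0), (1,0), (1,1), (0,1) to the four quad corners given in that order."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    dx1, dx2, sx = x1 - x2, x3 - x2, x0 - x1 + x2 - x3
    dy1, dy2, sy = y1 - y2, y3 - y2, y0 - y1 + y2 - y3
    det = dx1 * dy2 - dx2 * dy1  # non-zero for a non-degenerate quad
    g = (sx * dy2 - dx2 * sy) / det
    h = (dx1 * sy - sx * dy1) / det
    return (x1 - x0 + g * x1, x3 - x0 + h * x3, x0,
            y1 - y0 + g * y1, y3 - y0 + h * y3, y0, g, h)


def map_point(coef, u, v):
    """Map (u, v) in the unit square to image coordinates (x, y)."""
    a, b, c, d, e, f, g, h = coef
    w = g * u + h * v + 1.0
    return ((a * u + b * v + c) / w, (d * u + e * v + f) / w)


def rectify(image, quad, width, height):
    """Resample the quad region of `image` (a list of rows) into a
    width x height rectangle; width and height must be at least 2."""
    coef = square_to_quad(quad)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            x, y = map_point(coef, j / (width - 1), i / (height - 1))
            xi = min(max(int(round(x)), 0), cols - 1)  # clamp to the image
            yi = min(max(int(round(y)), 0), rows - 1)
            row.append(image[yi][xi])
        out.append(row)
    return out


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
patch = rectify(image, [(1, 0), (3, 0), (3, 2), (1, 2)], 3, 3)
coef = square_to_quad([(0, 0), (2, 0), (3, 2), (0, 1)])
```

Because each output pixel is looked up in the source image, the four sides of the quadrangle end up as the orthogonal sides of the output rectangle.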

Next, the area division processing unit 103 executes the same area division processing as in step S201, and the area integration processing unit 104 executes the same rough extraction of the document area as in step S202, so that the document area is roughly extracted again. Then, the text block extraction processing unit 106 executes the same processing as in step S207: a text block is extracted from the re-extracted document area, and the text block contour straight lines enclosing it are extracted (the above is step S1202 of FIG. 12).

Then, the control unit 109 of FIG. 1 determines whether the text block contour straight lines extracted in step S1202 are, like the temporary document area contour straight lines, aligned horizontally or vertically (step S1203 of FIG. 12).

If the text block contour straight lines are aligned horizontally or vertically, like the temporary document area contour straight lines, and the determination in step S1203 is YES, the control unit 109 determines that the extraction of the document area and the subsequent distortion correction processing in step S1201 were performed appropriately. The control unit 109 then outputs the document area for subsequent character recognition processing and the like.

If the text block contour straight lines are not aligned horizontally or vertically, unlike the temporary document area contour straight lines, and the determination in step S1203 is NO, the control unit 109 determines that the document area has not been extracted appropriately. The control unit 109 then returns control to the processing of the temporary document area contour straight line detection processing unit 105 of FIG. 1 (step S204 of FIG. 12) and causes the detection of the temporary document area contour straight lines from the contour straight line candidates to be redone. As a result, in step S204, the candidate with the highest evaluation value among the not-yet-selected contour straight line candidates is selected.

When the document area erroneously includes a background area or, conversely, when part of the document area that should be included is missing, the contour straight lines calculated for the text block inside the document area after the distortion correction processing are inclined with respect to the horizontal and vertical. In the fourth embodiment, this state is detected, a different candidate is selected from the contour straight line candidates, and the document area extraction is executed again, which makes it possible to improve the extraction accuracy of the document area.
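The horizontal/vertical alignment test of step S1203 could be sketched as below. The text does not state how alignment is measured, so this example makes assumptions: each contour straight line is represented by two endpoints, its orientation is compared against 0 and 90 degrees, and the tolerance `tol_deg` and all function names are hypothetical.

```python
import math


def line_angle_deg(p, q):
    """Orientation of the line through p and q, folded into [0, 180]."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0


def is_axis_aligned(p, q, tol_deg=2.0):
    """True if the line is within tol_deg of horizontal or vertical."""
    angle = line_angle_deg(p, q)
    deviation = min(angle, abs(angle - 90.0), 180.0 - angle)
    return deviation <= tol_deg


def text_block_aligned(lines, tol_deg=2.0):
    """True if every text block contour line is nearly horizontal or vertical.

    A False result corresponds to the NO branch, where another contour
    candidate would be selected and the extraction repeated.
    """
    return all(is_axis_aligned(p, q, tol_deg) for p, q in lines)


good = [((0, 0), (100, 1)), ((0, 0), (1, 100))]   # nearly horizontal / vertical
bad = good + [((0, 0), (10, 3))]                  # one clearly slanted line
```

Here `text_block_aligned(good)` accepts the rectified result, while `text_block_aligned(bad)` signals that the document area should be re-extracted.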

Finally, a fifth embodiment based on the configuration of the document area extraction apparatus of FIG. 1 is described in detail below.
FIG. 13 is an operation flowchart showing the overall operation when the document area extraction apparatus of FIG. 1 operates as the fifth embodiment. In FIG. 13, steps that are the same as in the operation flowchart of FIG. 2 for the first embodiment are given the same step numbers. As in the case of FIG. 2, control of the sequence of operations in the flowchart of FIG. 13 is realized by the control unit 109 of FIG. 1 executing a predetermined control program.

The operation flowchart of FIG. 13 differs from that of FIG. 2 in the processing portion enclosed by frame 1300 in FIG. 13.
The fifth embodiment can improve the extraction accuracy of the document area by re-executing the document area extraction processing based on the results of the distortion correction processing and character recognition processing executed after the document area is extracted.

When distortion correction processing and then character recognition processing have been applied to the extracted document area, a low character recognition rate in a particular divided area obtained by dividing the document area indicates a high likelihood that a nearby contour straight line was extracted erroneously.

Therefore, in the fifth embodiment, distortion correction processing is first applied to the document area extracted in step S206 or S208 of FIG. 13 by a distortion correction processing unit, which is not specifically shown (step S1301 of FIG. 13). Specifically, the image is deformed so that the four sides of the document area enclosed by the contour straight lines become mutually orthogonal, thereby correcting the perspective distortion of the image.

Next, character recognition processing is executed by a character recognition processing unit, which is not specifically shown (step S1302 of FIG. 13).
Then, the control unit 109 divides the document area (for example, into four) and calculates, for each divided area, the average of the character recognition rates within that area obtained in step S1302. The control unit 109 then determines whether there is a local difference in recognition rate within the image (step S1303 of FIG. 13).

If the determination in step S1303 is NO, the control unit 109 determines that the document area has been extracted appropriately. The control unit 109 then outputs the document area for subsequent processing and the like.

On the other hand, if, for example, the character recognition rate of some of the divided areas is lower than that of the others and the determination in step S1303 is YES, the control unit 109 determines that a nearby contour straight line has most likely been extracted erroneously. This occurs, for example, when pixels of the background area are erroneously included in the document area, or when the extracted contour is inclined relative to the correct contour straight line so that the four sides of the document area are not horizontal and vertical with respect to the image. In this case, the control unit 109 returns control to the processing of the temporary document area contour straight line detection processing unit 105 of FIG. 1 (step S204 of FIG. 13) and causes the detection of the temporary document area contour straight lines from the contour straight line candidates to be redone. As a result, in step S204, the candidate with the highest evaluation value among the not-yet-selected contour straight line candidates is selected.

As described above, in the fifth embodiment, distortion correction processing and character recognition processing are applied to the document area extracted as a quadrangle, and local differences in the character recognition rate are evaluated. For the contour portion concerned, a different candidate is then selected from the contour straight line candidates and the document area extraction processing is executed again, which makes it possible to improve the extraction accuracy of the document area.
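The local recognition-rate test of step S1303 might look like the following sketch. The criterion for a "local difference" is not specified in the text, so the example assumes a hypothetical per-character score in the range 0 to 1, a split into four quadrants, and a hypothetical `margin` on the gap between the best and worst quadrant averages.

```python
from statistics import mean


def quadrant_rates(chars, width, height):
    """Average recognition score per quadrant of a width x height area.

    chars is a list of (x, y, score); the quadrant order is top-left,
    top-right, bottom-left, bottom-right. Empty quadrants yield None.
    """
    buckets = [[], [], [], []]
    for x, y, score in chars:
        index = (1 if x >= width / 2 else 0) + (2 if y >= height / 2 else 0)
        buckets[index].append(score)
    return [mean(b) if b else None for b in buckets]


def has_local_drop(rates, margin=0.2):
    """True if the quadrant averages differ by more than margin."""
    valid = [r for r in rates if r is not None]
    return len(valid) >= 2 and max(valid) - min(valid) > margin


def weakest_quadrant(rates):
    """Index of the quadrant with the lowest average, ignoring empty ones."""
    valid = [(r, i) for i, r in enumerate(rates) if r is not None]
    return min(valid)[1]


chars = [(10, 10, 0.9), (90, 10, 0.95), (10, 90, 0.9),
         (90, 90, 0.4), (80, 80, 0.5)]
rates = quadrant_rates(chars, 100, 100)
```

With these toy scores the bottom-right quadrant is the weakest, which would point at the contour straight lines near that corner as the ones to reselect.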

FIG. 23 is a diagram showing an example of the hardware configuration of a computer that can realize the document area extraction apparatus of FIG. 1.
The computer shown in FIG. 23 has a CPU 2301, a memory 2302, an input device 2303, an output device 2304, an external storage device 2305, a portable recording medium drive device 2306 into which a portable recording medium 2309 is inserted, and a network connection device 2307, all connected to one another by a bus 2308. The configuration shown in the figure is one example of a computer that can realize the above system, and such a computer is not limited to this configuration. This computer can be mounted in an electronic device such as a mobile phone or a digital camera.

The CPU 2301 controls the entire computer. The memory 2302 is a memory such as a RAM that temporarily stores programs or data held in the external storage device 2305 (or the portable recording medium 2309) when a program is executed, data is updated, and so on. The CPU 2301 performs overall control by reading a program into the memory 2302 and executing it.

The input device 2303 consists of, for example, a keyboard, a mouse, and the like, together with their interface control devices. The input device 2303 detects input operations performed by the user with the keyboard, mouse, or the like, and notifies the CPU 2301 of the detection results.

The output device 2304 consists of a display device, a printing device, and the like, together with their interface control devices. The output device 2304 outputs data sent under the control of the CPU 2301 to the display device or the printing device.

The external storage device 2305 is, for example, a hard disk storage device. It is mainly used to store various data and programs.
The portable recording medium drive device 2306 accommodates a portable recording medium 2309 such as an optical disc, SDRAM, or CompactFlash (registered trademark), and serves as an auxiliary to the external storage device 2305.

The network connection device 2307 is a device for connecting to a communication line of, for example, a LAN (local area network) or a WAN (wide area network).
The document area extraction apparatus of the first to fifth embodiments described above is realized by the CPU 2301 executing a program that implements the functions required by each embodiment. The program may be recorded on and distributed via, for example, the external storage device 2305 or the portable recording medium 2309, or may be made obtainable from a network through the network connection device 2307.

以上の第１〜第５の実施形態に関して、更に以下の付記を開示する。
（付記１）
入力画像中の文字列を認識するために該文字列が含まれる文書領域を抽出する文書領域抽出装置において、
前記入力画像中の各画素の色情報に基づいて該入力画像を複数の分割領域に分割する領域分割処理部と、
前記各分割領域について、分割領域の他の分割領域と接している画素と、該画素に隣接する他の分割領域の画素との色情報の差異を予め設定した閾値と比較し、色情報の差異が予め設定した閾値よりも小さいと双方の分割領域を同一の領域と見なす領域統合を行うことにより、仮文書領域を抽出する領域統合処理部と
前記抽出された仮文書領域の輪郭を示す直線である仮文書領域輪郭直線を検出する仮文書領域輪郭直線検出処理部と、
前記抽出された仮文書領域輪郭直線により囲まれる四角形の文書領域を抽出し出力する文書領域抽出処理部と、
を含むことを特徴とする文書領域抽出装置。
（付記２）
前記抽出された仮文書領域内の文字に相当する小領域を含む文字領域であるテキストブロックを抽出するテキストブロック抽出処理部と
前記抽出したテキストブロックにおいて、前記文書領域輪郭直線検出処理部で四角形を形成する仮文書領域輪郭直線が得られていない方向にラスタスキャンを行い、前記抽出された仮文書領域の端画素を除いて仮文書領域輪郭直線が得られていない方向に近い画素を選択し、該選択した画素をハフ変換して該テキストブロックの輪郭を示す直線であるテキストブロック輪郭直線を検出するテキストブロック輪郭直線検出処理部と、を更に含み、
前記文書領域抽出処理部が、前記抽出された仮文書領域輪郭直線と前記テキストブロック輪郭直線により囲まれる四角形として、文書領域を抽出し出力すること、
を特徴とする付記１に記載の文書領域抽出装置。
（付記３）
前記テキストブロック抽出処理部は、前記抽出された仮文書領域の内部に含まれる文字領域を前記仮文書領域の中心から探索し、該当する文字領域をテキストブロックとして抽出し、
前記テキストブロック輪郭線検出処理部は、該抽出されたテキストブロックの境界画素から輪郭直線候補を複数算出し、該各輪郭直線候補が前記テキストブロックと重なる長さと該各輪郭直線候補が前記テキストブロックを分断したときの分断部分の面積とに基づく評価値を算出し、該評価値に基づいて前記輪郭直線候補から前記テキストブロックを囲む各辺の輪郭に対応するテキストブロック輪郭直線を選択する、
ことを特徴とする付記２に記載の文書領域抽出装置。
（付記４）
前記文書領域抽出処理部は、前記仮文書領域輪郭直線が前記仮文書領域を囲む４辺全てに対して検出された場合は該４辺に対応する仮文書領域輪郭直線で囲まれる四角形の領域を前記文書領域として抽出し、前記仮文書領域輪郭直線が前記仮文書領域を囲む４辺全てに対しては検出されなかった場合は前記仮文書領域輪郭直線及び前記テキストブロック輪郭直線とを併せて得られる４本の輪郭直線で囲まれる四角形の領域を前記文書領域として抽出する、
ことを特徴とする付記２又は３の何れか１項に記載の文書領域抽出装置。
（付記５）
前記仮文書領域又は前記テキストブロック、該仮文書領域又は該テキストブロックに対応する前記輪郭直線候補、及び前記仮文書領域輪郭直線又は前記テキストブロック輪郭直線をユーザに表示し、該ユーザに前記各輪郭直線候補のうち所望のものを選択させ、又は新たな輪郭直線候補を指定させ、該選択又は指定の結果に基づいて、前記仮文書領域輪郭直線検出処理部又は前記テキストブロック輪郭直線検出処理部の処理を再度実行させる制御を更に含む、
ことを特徴とする付記２乃至４の何れか１項に記載の文書領域抽出装置。
（付記６）
前記文書領域抽出処理部が出力する文書領域に対して歪み補正処理を実行し、該歪み補正処理により得られる文書領域から前記テキストブロック輪郭直線を抽出し、該抽出されたテキストブロック輪郭直線と前記歪み補正処理の後の前記文書領域に対応する輪郭直線との傾き関係を比較して得られる比較結果に基づいて、前記仮文書領域輪郭直線検出処理部又は前記テキストブロック輪郭直線検出処理部の処理を再度実行させる制御を更に含む、
ことを特徴とする付記２乃至５の何れか１項に記載の文書領域抽出装置。
（付記７）
前記文書領域抽出処理部にて抽出された文書領域の分散値を算出し、該分散値を所定の閾値と比較して得られる比較結果に基づいて、前記仮文書領域輪郭直線検出処理部又は前記テキストブロック輪郭直線検出処理部の処理を再度実行させる制御を更に含む、
ことを特徴とする付記１乃至６の何れか１項に記載の文書領域抽出装置。
（付記８）
前記文書領域抽出処理部が出力する文書領域に対して歪み補正処理、文字認識処理を実行し、該文字認識処理により得られる前記文書領域内の文字認識率を判定して得られる判定結果に基づいて、前記仮文書領域輪郭直線検出処理部又は前記テキストブロック輪郭直線検出処理部の処理を再度実行させる制御を更に含む、
ことを特徴とする付記１乃至７の何れか１項に記載の文書領域抽出装置。
（付記９）
前記領域分割処理部は、前記入力画像中の各画素の色情報に基づくクラスタリング処理を実行し、該クラスタリング処理により得られる各分割領域にそれぞれラベルを付与することにより、ラベリング結果画像を生成して出力し、
前記領域統合処理部は、前記ラベリング結果画像において、それぞれ異なるラベルが付与された隣接する分割領域間の色情報を評価し、該色情報の類似性が高い場合は前記隣接する仮文書領域を併合して同じラベルを付与し、該併合の結果得られる画像において、画像中央付近にあり領域サイズが所定の大きさ以上である分割領域を選択し、該分割領域を前記仮文書領域の抽出結果として出力する、
ことを特徴とする付記１乃至８の何れか１項に記載の文書領域抽出装置。
（付記１０）
前記領域分割処理部は、前記入力画像に対応する全体画像と前記入力画像の一部を取り出した部分画像をそれぞれ生成し、該全体画像及び該各部分画像のそれぞれに対して前記クラスタリング処理を実行して前記各分割領域を算出し、その後、前記全体画像及び前記各部分画像のそれぞれに対して算出した前記各分割領域を統合し、該統合した各分割領域から前記ラベリング結果画像を生成する、
ことを特徴とする付記９に記載の文書領域抽出装置。
（付記１１）
前記仮文書領域輪郭直線検出処理部は、前記抽出された仮文書領域の境界画素から輪郭直線候補を複数算出し、該各輪郭直線候補が前記仮文書領域と重なる長さと該各輪郭直線候補が前記仮文書領域を分断したときの分断部分の面積とに基づく評価値を算出し、該評価値に基づいて前記輪郭直線候補から前記仮文書領域を囲む各辺の輪郭に対応する仮文書領域輪郭直線を選択する、
ことを特徴とする付記１乃至１０の何れか１項に記載の文書領域抽出装置。
（付記１２）
文書領域抽出装置が入力画像中の文字列を認識するために該文字列が含まれる文書領域を抽出する文書領域抽出方法であって、
前記文書領域抽出装置が
前記入力画像中の各画素の色情報に基づいて該入力画像を複数の分割領域に分割する領域分割処理ステップと、
前記各分割領域について、分割領域の他の分割領域と接している画素と、該画素に隣接する他の分割領域の画素との色情報の差異を予め設定した閾値と比較し、色情報の差異が予め設定した閾値よりも小さいと双方の分割領域を同一の領域と見なす領域統合を行うことにより、仮文書領域を抽出する領域統合処理ステップと、
前記抽出された仮文書領域の輪郭を示す直線である仮文書領域輪郭直線を検出する仮文書領域輪郭直線検出処理ステップと、
前記抽出された仮文書領域輪郭直線により囲まれる四角形の文書領域を抽出し出力する文書領域抽出処理ステップと、
を実行することを特徴とする文書領域抽出方法。
（付記１３）
前記抽出された仮文書領域内の文字に相当する小領域を含む文字領域であるテキストブロックを抽出するテキストブロック抽出処理ステップと
前記抽出したテキストブロックにおいて、前記文書領域輪郭直線検出処理ステップで四角形を形成する仮文書領域輪郭直線が得られていない方向にラスタスキャンを行い、前記抽出された仮文書領域の端画素を除いて仮文書領域輪郭直線が得られていない方向に近い画素を選択し、該選択した画素をハフ変換して該テキストブロックの輪郭を示す直線であるテキストブロック輪郭直線を検出するテキストブロック輪郭直線検出処理ステップと、を更に含み、
前記文書領域抽出処理ステップが、前記抽出された仮文書領域輪郭直線と前記テキストブロック輪郭直線により囲まれる四角形として、文書領域を抽出し出力すること、
を特徴とする付記１２に記載の文書領域抽出方法。
（付記１４）
前記テキストブロック抽出処理ステップは、前記抽出された仮文書領域の内ステップに含まれる文字領域を前記仮文書領域の中心から探索し、該当する文字領域をテキストブロックとして抽出し、
前記テキストブロック輪郭線検出処理ステップは、該抽出されたテキストブロックの境界画素から輪郭直線候補を複数算出し、該各輪郭直線候補が前記テキストブロックと重なる長さと該各輪郭直線候補が前記テキストブロックを分断したときの分断ステップ分の面積とに基づく評価値を算出し、該評価値に基づいて前記輪郭直線候補から前記テキストブロックを囲む各辺の輪郭に対応するテキストブロック輪郭直線を選択する、
ことを特徴とする付記１３に記載の文書領域抽出方法。
（付記１５）
前記文書領域抽出処理ステップは、前記仮文書領域輪郭直線が前記仮文書領域を囲む４辺全てに対して検出された場合は該４辺に対応する仮文書領域輪郭直線で囲まれる四角形の領域を前記文書領域として抽出し、前記仮文書領域輪郭直線が前記仮文書領域を囲む４辺全てに対しては検出されなかった場合は前記仮文書領域輪郭直線及び前記テキストブロック輪郭直線とを併せて得られる４本の輪郭直線で囲まれる四角形の領域を前記文書領域として抽出する、
ことを特徴とする付記１３又は１４の何れか１項に記載の文書領域抽出方法。
（付記１６）
前記仮文書領域又は前記テキストブロック、該仮文書領域又は該テキストブロックに対応する前記輪郭直線候補、及び前記仮文書領域輪郭直線又は前記テキストブロック輪郭直線をユーザに表示し、該ユーザに前記各輪郭直線候補のうち所望のものを選択させ、又は新たな輪郭直線候補を指定させ、該選択又は指定の結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記１３乃至１５の何れか１項に記載の文書領域抽出方法。
（付記１７）
前記文書領域抽出処理ステップが出力する文書領域に対して歪み補正処理を実行し、該歪み補正処理により得られる文書領域から前記テキストブロック輪郭直線を抽出し、該抽出されたテキストブロック輪郭直線と前記歪み補正処理の後の前記文書領域に対応する輪郭直線との傾き関係を比較して得られる比較結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記１３乃至１６の何れか１項に記載の文書領域抽出方法。
（付記１８）
前記文書領域抽出処理ステップにて抽出された文書領域の分散値を算出し、該分散値を所定の閾値と比較して得られる比較結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記１３乃至１６の何れか１項に記載の文書領域抽出方法。
（付記１９）
前記文書領域抽出処理ステップが出力する文書領域に対して歪み補正処理、文字認識処理を実行し、該文字認識処理により得られる前記文書領域内の文字認識率を判定して得られる判定結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記１３乃至１６の何れか１項に記載の文書領域抽出方法。
（付記２０）
前記領域分割処理ステップは、前記入力画像中の各画素の色情報に基づくクラスタリング処理を実行し、該クラスタリング処理により得られる各分割領域にそれぞれラベルを付与することにより、ラベリング結果画像を生成して出力し、
前記領域統合処理ステップは、前記ラベリング結果画像において、それぞれ異なるラベルが付与された隣接する分割領域間の色情報を評価し、該色情報の類似性が高い場合は前記隣接する仮文書領域を併合して同じラベルを付与し、該併合の結果得られる画像において、画像中央付近にあり領域サイズが所定の大きさ以上である分割領域を選択し、該分割領域を前記仮文書領域の抽出結果として出力する、
ことを特徴とする付記１２乃至１９の何れか１項に記載の文書領域抽出方法。
（付記２１）
前記領域分割処理ステップは、前記入力画像に対応する全体画像と前記入力画像の一ステップを取り出したステップ分画像をそれぞれ生成し、該全体画像及び該各ステップ分画像のそれぞれに対して前記クラスタリング処理を実行して前記各分割領域を算出し、その後、前記全体画像及び前記各ステップ分画像のそれぞれに対して算出した前記各分割領域を統合し、該統合した各分割領域から前記ラベリング結果画像を生成する、
ことを特徴とする付記２０に記載の文書領域抽出方法。
（付記２２）
前記仮文書領域輪郭直線検出処理ステップは、前記抽出された仮文書領域の境界画素から輪郭直線候補を複数算出し、該各輪郭直線候補が前記仮文書領域と重なる長さと該各輪郭直線候補が前記仮文書領域を分断したときの分断ステップ分の面積とに基づく評価値を算出し、該評価値に基づいて前記輪郭直線候補から前記仮文書領域を囲む各辺の輪郭に対応する仮文書領域輪郭直線を選択する、
ことを特徴とする付記１２乃至２１の何れか１項に記載の文書領域抽出方法。
（付記２３）
入力画像中の文字列を認識するために該文字列が含まれる文書領域を抽出する文書領域抽出装置として構成されるコンピュータに、
前記入力画像中の各画素の色情報に基づいて該入力画像を複数の分割領域に分割する領域分割処理ステップと、
前記各分割領域について、分割領域の他の分割領域と接している画素と、該画素に隣接する他の分割領域の画素との色情報の差異を予め設定した閾値と比較し、色情報の差異が予め設定した閾値よりも小さいと双方の分割領域を同一の領域と見なす領域統合を行うことにより、仮文書領域を抽出する領域統合処理ステップと、
前記抽出された仮文書領域の輪郭を示す直線である仮文書領域輪郭直線を検出する仮文書領域輪郭直線検出処理ステップと、
前記抽出された仮文書領域輪郭直線により囲まれる四角形の文書領域を抽出し出力する文書領域抽出処理ステップと、
を実行させるためのプログラム。
（付記２４）
前記抽出された仮文書領域内の文字に相当する小領域を含む文字領域であるテキストブロックを抽出するテキストブロック抽出処理ステップと
前記抽出したテキストブロックにおいて、前記文書領域輪郭直線検出処理ステップで四角形を形成する仮文書領域輪郭直線が得られていない方向にラスタスキャンを行い、前記抽出された仮文書領域の端画素を除いて仮文書領域輪郭直線が得られていない方向に近い画素を選択し、該選択した画素をハフ変換して該テキストブロックの輪郭を示す直線であるテキストブロック輪郭直線を検出するテキストブロック輪郭直線検出処理ステップと、を更に含み、
前記文書領域抽出処理ステップが、前記抽出された仮文書領域輪郭直線と前記テキストブロック輪郭直線により囲まれる四角形として、文書領域を抽出し出力すること、
を特徴とする付記２３に記載のプログラム。
（付記２５）
前記テキストブロック抽出処理ステップは、前記抽出された仮文書領域の内ステップに含まれる文字領域を前記仮文書領域の中心から探索し、該当する文字領域をテキストブロックとして抽出し、
前記テキストブロック輪郭線検出処理ステップは、該抽出されたテキストブロックの境界画素から輪郭直線候補を複数算出し、該各輪郭直線候補が前記テキストブロックと重なる長さと該各輪郭直線候補が前記テキストブロックを分断したときの分断ステップ分の面積とに基づく評価値を算出し、該評価値に基づいて前記輪郭直線候補から前記テキストブロックを囲む各辺の輪郭に対応するテキストブロック輪郭直線を選択する、
ことを特徴とする付記２４に記載のプログラム。
（付記２６）
前記文書領域抽出処理ステップは、前記仮文書領域輪郭直線が前記仮文書領域を囲む４辺全てに対して検出された場合は該４辺に対応する仮文書領域輪郭直線で囲まれる四角形の領域を前記文書領域として抽出し、前記仮文書領域輪郭直線が前記仮文書領域を囲む４辺全てに対しては検出されなかった場合は前記仮文書領域輪郭直線及び前記テキストブロック輪郭直線とを併せて得られる４本の輪郭直線で囲まれる四角形の領域を前記文書領域として抽出する、
ことを特徴とする付記２４又は２５の何れか１項に記載のプログラム。
（付記２７）
前記仮文書領域又は前記テキストブロック、該仮文書領域又は該テキストブロックに対応する前記輪郭直線候補、及び前記仮文書領域輪郭直線又は前記テキストブロック輪郭直線をユーザに表示し、該ユーザに前記各輪郭直線候補のうち所望のものを選択させ、又は新たな輪郭直線候補を指定させ、該選択又は指定の結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記２４乃至２６の何れか１項に記載のプログラム。
（付記２８）
前記文書領域抽出処理ステップが出力する文書領域に対して歪み補正処理を実行し、該歪み補正処理により得られる文書領域から前記テキストブロック輪郭直線を抽出し、該抽出されたテキストブロック輪郭直線と前記歪み補正処理の後の前記文書領域に対応する輪郭直線との傾き関係を比較して得られる比較結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記２４乃至２７の何れか１項に記載のプログラム。
（付記２９）
前記文書領域抽出処理ステップにて抽出された文書領域の分散値を算出し、該分散値を所定の閾値と比較して得られる比較結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記２４乃至２７の何れか１項に記載のプログラム。
（付記３０）
前記文書領域抽出処理ステップが出力する文書領域に対して歪み補正処理、文字認識処理を実行し、該文字認識処理により得られる前記文書領域内の文字認識率を判定して得られる判定結果に基づいて、前記仮文書領域輪郭直線検出処理ステップ又は前記テキストブロック輪郭直線検出処理ステップの処理を再度実行させる制御を更に含む、
ことを特徴とする付記２４乃至２７の何れか１項に記載のプログラム。
（付記３１）
前記領域分割処理ステップは、前記入力画像中の各画素の色情報に基づくクラスタリング処理を実行し、該クラスタリング処理により得られる各分割領域にそれぞれラベルを付与することにより、ラベリング結果画像を生成して出力し、
前記領域統合処理ステップは、前記ラベリング結果画像において、それぞれ異なるラベルが付与された隣接する分割領域間の色情報を評価し、該色情報の類似性が高い場合は前記隣接する仮文書領域を併合して同じラベルを付与し、該併合の結果得られる画像において、画像中央付近にあり領域サイズが所定の大きさ以上である分割領域を選択し、該分割領域を前記仮文書領域の抽出結果として出力する、
ことを特徴とする付記２３乃至３０の何れか１項に記載のプログラム。
（付記３２）
前記領域分割処理ステップは、前記入力画像に対応する全体画像と前記入力画像の一ステップを取り出したステップ分画像をそれぞれ生成し、該全体画像及び該各ステップ分画像のそれぞれに対して前記クラスタリング処理を実行して前記各分割領域を算出し、その後、前記全体画像及び前記各ステップ分画像のそれぞれに対して算出した前記各分割領域を統合し、該統合した各分割領域から前記ラベリング結果画像を生成する、
ことを特徴とする付記３１に記載のプログラム。
（付記３３）
前記仮文書領域輪郭直線検出処理ステップは、前記抽出された仮文書領域の境界画素から輪郭直線候補を複数算出し、該各輪郭直線候補が前記仮文書領域と重なる長さと該各輪郭直線候補が前記仮文書領域を分断したときの分断ステップ分の面積とに基づく評価値を算出し、該評価値に基づいて前記輪郭直線候補から前記仮文書領域を囲む各辺の輪郭に対応する仮文書領域輪郭直線を選択する、
ことを特徴とする付記２３乃至３２の何れか１項に記載のプログラム。 Regarding the above first to fifth embodiments, the following additional notes are further disclosed.
(Appendix 1)
A document area extraction apparatus that extracts a document area containing a character string in an input image in order to recognize the character string, the apparatus comprising:
an area division processing unit that divides the input image into a plurality of divided areas based on color information of each pixel in the input image;
an area integration processing unit that extracts a temporary document area by performing area integration in which, for each divided area, the difference in color information between a pixel of the divided area that is in contact with another divided area and the adjacent pixel of that other divided area is compared with a preset threshold, and the two divided areas are regarded as the same area when the difference is smaller than the threshold;
a temporary document area contour straight line detection processing unit that detects temporary document area contour straight lines, which are straight lines indicating the contour of the extracted temporary document area; and
a document area extraction processing unit that extracts and outputs a quadrangular document area enclosed by the extracted temporary document area contour straight lines.
(Appendix 2)
The document area extraction apparatus according to Appendix 1, further comprising:
a text block extraction processing unit that extracts a text block, which is a character area containing small areas corresponding to characters in the extracted temporary document area; and
a text block contour straight line detection processing unit that, for the extracted text block, performs a raster scan in each direction for which the temporary document area contour straight line detection processing unit has not obtained a temporary document area contour straight line forming the quadrangle, selects, excluding the edge pixels of the extracted temporary document area, the pixels closest to the direction for which no temporary document area contour straight line has been obtained, and applies a Hough transform to the selected pixels to detect a text block contour straight line, which is a straight line indicating the contour of the text block,
wherein the document area extraction processing unit extracts and outputs the document area as a quadrangle enclosed by the extracted temporary document area contour straight lines and the text block contour straight lines.
(Appendix 3)
The document area extraction apparatus according to Appendix 2, wherein
the text block extraction processing unit searches, starting from the center of the temporary document area, for character areas contained inside the extracted temporary document area, and extracts the character areas found as the text block, and
the text block contour straight line detection processing unit calculates a plurality of contour straight line candidates from the boundary pixels of the extracted text block, calculates for each candidate an evaluation value based on the length over which the candidate overlaps the text block and on the area of the portion cut off when the candidate divides the text block, and selects, based on the evaluation values, the text block contour straight line corresponding to the contour of each side enclosing the text block from among the candidates.
(Appendix 4)
The document area extracting apparatus according to appendix 2 or 3, wherein,
when the temporary document area contour straight line is detected for all four sides surrounding the temporary document area, the document area extraction processing unit extracts, as the document area, the quadrangular area surrounded by the temporary document area contour straight lines corresponding to the four sides, and
when the temporary document area contour straight line is not detected for all four sides, the document area extraction processing unit extracts, as the document area, the quadrangular area surrounded by four contour straight lines obtained by combining the temporary document area contour straight lines with the text block contour straight lines.
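The fallback described in appendix 4 can be sketched as follows: use the temporary document area contour line for each side where one was detected, substitute the text block contour line otherwise, and intersect neighboring sides to obtain the four corners. Lines are written as coefficient triples (a, b, c) of a*x + b*y + c = 0; the side names and the dictionary layout are assumptions made for this example.

```python
SIDES = ("top", "right", "bottom", "left")


def intersect(line1, line2):
    """Intersection of two lines a*x + b*y + c = 0, or None if parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)


def document_quadrilateral(doc_lines, text_lines):
    """Return four corners, preferring document lines over text block lines.

    doc_lines / text_lines map a side name to (a, b, c); a side that was
    not detected is simply absent from the dictionary.
    """
    chosen = {}
    for side in SIDES:
        line = doc_lines.get(side) or text_lines.get(side)
        if line is None:
            return None  # no contour line of either kind for this side
        chosen[side] = line
    corners = [
        intersect(chosen[SIDES[i]], chosen[SIDES[(i + 1) % 4]])
        for i in range(4)
    ]
    return None if any(p is None for p in corners) else corners
```

The corners come out in the order top-right, bottom-right, bottom-left, top-left, which is convenient input for a subsequent distortion (perspective) correction.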
(Appendix 5)
The document area extracting apparatus according to any one of appendices 2 to 4, further comprising control that displays to the user the temporary document area or the text block, the contour straight line candidates corresponding to the temporary document area or the text block, and the temporary document area contour straight line or the text block contour straight line, prompts the user to select a desired one of the contour straight line candidates or to designate a new contour straight line candidate, and, based on the result of the selection or designation, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.
(Appendix 6)
The document area extracting apparatus according to any one of appendices 2 to 5, further comprising control that executes a distortion correction process on the document area output by the document area extraction processing unit, extracts the text block contour straight line from the document area obtained by the distortion correction process, compares the inclination of the extracted text block contour straight line with that of the contour straight line corresponding to the document area after the distortion correction process, and, based on the comparison result, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.
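One way to read the inclination comparison of appendix 6: after distortion correction, a text block contour line should be nearly parallel to the corresponding document area contour line, and a large angle between them suggests that contour detection should be repeated. The sketch below measures that angle for lines given as (a, b, c) with a*x + b*y + c = 0. The 2-degree tolerance and the function names are assumptions; this section does not state a criterion.

```python
import math


def angle_between(line1, line2):
    """Smallest angle in degrees between two lines a*x + b*y + c = 0."""
    a1, b1, _ = line1
    a2, b2, _ = line2
    # The direction vector of a*x + b*y + c = 0 is (b, -a).
    t1 = math.atan2(-a1, b1)
    t2 = math.atan2(-a2, b2)
    diff = abs(t1 - t2) % math.pi
    return math.degrees(min(diff, math.pi - diff))


def needs_redetection(text_line, doc_line, tol_deg=2.0):
    """True when the two lines are not parallel within tol_deg (assumed)."""
    return angle_between(text_line, doc_line) > tol_deg
```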
(Appendix 7)
The document area extracting apparatus according to any one of appendices 1 to 6, further comprising control that calculates a variance value of the document area extracted by the document area extraction processing unit, compares the variance value with a predetermined threshold value, and, based on the comparison result, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.
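The variance check of appendix 7 can be sketched as below. Two points are assumptions, because this section does not specify them: the variance is taken over grayscale pixel values inside the extracted area, and a variance above the threshold is treated as the signal to repeat contour detection.

```python
from statistics import pvariance


def region_variance(gray, mask):
    """Population variance of the gray values where mask is set."""
    values = [
        gray[y][x]
        for y, row in enumerate(mask)
        for x, inside in enumerate(row)
        if inside
    ]
    return pvariance(values)


def should_rerun(gray, mask, threshold):
    """Assumed rule: rerun contour detection when variance is too high."""
    return region_variance(gray, mask) > threshold
```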
(Appendix 8)
The document area extracting apparatus according to any one of appendices 1 to 7, further comprising control that executes distortion correction processing and character recognition processing on the document area output by the document area extraction processing unit, determines the character recognition rate in the document area obtained by the character recognition processing, and, based on the determination result, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.
(Appendix 9)
The document area extracting apparatus according to any one of appendices 1 to 8, wherein
the area division processing unit executes a clustering process based on the color information of each pixel in the input image, assigns a label to each divided area obtained by the clustering process, and thereby generates and outputs a labeling result image, and
the area integration processing unit evaluates the color information of adjacent divided areas to which different labels are assigned in the labeling result image, merges those adjacent areas when the similarity of their color information is high, selects, in the image obtained as a result of the merging, a divided area that is near the center of the image and whose size is equal to or larger than a predetermined size, and outputs the selected divided area as the temporary document area extraction result.
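The merging and selection of appendix 9 can be sketched with a union-find structure over region labels. The sketch assumes that a mean color per label has already been computed, uses Euclidean color distance as the similarity measure, and interprets "near the center" as "owning the qualifying pixel closest to the image center". All three choices, and the function name, are illustrative assumptions.

```python
import math
from collections import Counter


def merge_and_select(labels, colors, sim_threshold, min_area):
    """Merge similar adjacent regions, then pick the central large one.

    labels: 2D list of region labels (the labeling result image).
    colors: label -> mean (r, g, b), assumed precomputed.
    """
    parent = {label: label for label in colors}

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny >= h or nx >= w:
                    continue
                p, q = labels[y][x], labels[ny][nx]
                # Simplification: compare the original mean colors.
                if find(p) != find(q) and math.dist(colors[p], colors[q]) < sim_threshold:
                    parent[find(q)] = find(p)

    merged = [[find(label) for label in row] for row in labels]
    area = Counter(label for row in merged for label in row)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    candidates = [
        ((y - cy) ** 2 + (x - cx) ** 2, merged[y][x])
        for y in range(h)
        for x in range(w)
        if area[merged[y][x]] >= min_area
    ]
    selected = min(candidates)[1] if candidates else None
    return merged, selected
```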
(Appendix 10)
The document area extracting apparatus according to appendix 9, wherein the area division processing unit generates a whole image corresponding to the input image and partial images each obtained by extracting a part of the input image, executes the clustering process on the whole image and on each of the partial images to calculate the divided areas, integrates the divided areas calculated for the whole image and for each of the partial images, and generates the labeling result image from the integrated divided areas.
(Appendix 11)
The document area extracting apparatus according to any one of appendices 1 to 10, wherein the temporary document area contour straight line detection processing unit calculates a plurality of contour straight line candidates from the boundary pixels of the extracted temporary document area, calculates for each candidate an evaluation value based on the length over which the candidate overlaps the temporary document area and on the area of the part cut off when the candidate divides the temporary document area, and selects from the candidates, based on the evaluation values, the temporary document area contour straight line corresponding to the contour of each side surrounding the temporary document area.
(Appendix 12)
A document area extraction method in which a document area extracting apparatus extracts a document area including a character string in order to recognize the character string in an input image, the document area extracting apparatus executing:
an area division processing step of dividing the input image into a plurality of divided areas based on color information of each pixel in the input image;
an area integration processing step of extracting a temporary document area by performing area integration in which, for each divided area, the difference in color information between a pixel of the divided area that is in contact with another divided area and the adjacent pixel of that other divided area is compared with a preset threshold value, and the two divided areas are regarded as the same area when the difference is smaller than the threshold value;
a temporary document area contour straight line detection processing step of detecting a temporary document area contour straight line, which is a straight line indicating the contour of the extracted temporary document area; and
a document area extraction processing step of extracting and outputting a quadrangular document area surrounded by the extracted temporary document area contour straight lines.
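The boundary comparison in the area integration processing step can be sketched as follows. For every pair of touching divided areas, the color differences across their shared boundary are collected and compared with the threshold. Averaging the per-pixel differences over the boundary, and using Euclidean distance in RGB, are assumptions; this section only states that the difference is compared with a preset threshold value.

```python
import math
from collections import defaultdict


def boundary_differences(labels, image):
    """Mean color difference across the boundary of each touching pair.

    labels: 2D list of divided-area labels.
    image:  2D list of (r, g, b) pixels with the same shape.
    """
    sums = defaultdict(lambda: [0.0, 0])
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    key = tuple(sorted((labels[y][x], labels[ny][nx])))
                    acc = sums[key]
                    acc[0] += math.dist(image[y][x], image[ny][nx])
                    acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def pairs_to_merge(labels, image, threshold):
    """Label pairs whose boundary difference is below the threshold."""
    diffs = boundary_differences(labels, image)
    return sorted(key for key, diff in diffs.items() if diff < threshold)
```

The returned pairs would then be treated as the same area, for example with a union-find structure, before the temporary document area is chosen.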
(Appendix 13)
The document area extraction method according to appendix 12, further comprising:
a text block extraction processing step of extracting a text block, which is a character area including small areas corresponding to characters in the extracted temporary document area; and
a text block contour straight line detection processing step of raster-scanning the extracted text block in a direction for which the temporary document area contour straight line detection processing step has not obtained a temporary document area contour straight line forming the quadrangle, selecting pixels close to that direction while excluding the end pixels of the extracted temporary document area, and applying a Hough transform to the selected pixels to detect a text block contour straight line, which is a straight line indicating the contour of the text block,
wherein the document area extraction processing step extracts and outputs the document area as a quadrangle surrounded by the extracted temporary document area contour straight lines and the text block contour straight lines.
(Appendix 14)
The document area extraction method according to appendix 13, wherein
the text block extraction processing step searches for a character area included in the extracted temporary document area, starting from the center of the temporary document area, and extracts the character area found as the text block, and
the text block contour straight line detection processing step calculates a plurality of contour straight line candidates from the boundary pixels of the extracted text block, calculates for each candidate an evaluation value based on the length over which the candidate overlaps the text block and on the area of the part cut off when the candidate divides the text block, and selects from the candidates, based on the evaluation values, the text block contour straight line corresponding to the contour of each side surrounding the text block.
(Appendix 15)
The document area extraction method according to appendix 13 or 14, wherein,
when the temporary document area contour straight line is detected for all four sides surrounding the temporary document area, the document area extraction processing step extracts, as the document area, the quadrangular area surrounded by the temporary document area contour straight lines corresponding to the four sides, and
when the temporary document area contour straight line is not detected for all four sides, the document area extraction processing step extracts, as the document area, the quadrangular area surrounded by four contour straight lines obtained by combining the temporary document area contour straight lines with the text block contour straight lines.
(Appendix 16)
The document area extraction method according to any one of appendices 13 to 15, further comprising control that displays to the user the temporary document area or the text block, the contour straight line candidates corresponding to the temporary document area or the text block, and the temporary document area contour straight line or the text block contour straight line, prompts the user to select a desired one of the contour straight line candidates or to designate a new contour straight line candidate, and, based on the result of the selection or designation, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 17)
The document area extraction method according to any one of appendices 13 to 16, further comprising control that executes a distortion correction process on the document area output by the document area extraction processing step, extracts the text block contour straight line from the document area obtained by the distortion correction process, compares the inclination of the extracted text block contour straight line with that of the contour straight line corresponding to the document area after the distortion correction process, and, based on the comparison result, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 18)
The document area extraction method according to any one of appendices 13 to 16, further comprising control that calculates a variance value of the document area extracted in the document area extraction processing step, compares the variance value with a predetermined threshold value, and, based on the comparison result, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 19)
The document area extraction method according to any one of appendices 13 to 16, further comprising control that executes distortion correction processing and character recognition processing on the document area output by the document area extraction processing step, determines the character recognition rate in the document area obtained by the character recognition processing, and, based on the determination result, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 20)
The document area extraction method according to any one of appendices 12 to 19, wherein
the area division processing step executes a clustering process based on the color information of each pixel in the input image, assigns a label to each divided area obtained by the clustering process, and thereby generates and outputs a labeling result image, and
the area integration processing step evaluates the color information of adjacent divided areas to which different labels are assigned in the labeling result image, merges those adjacent areas when the similarity of their color information is high, selects, in the image obtained as a result of the merging, a divided area that is near the center of the image and whose size is equal to or larger than a predetermined size, and outputs the selected divided area as the temporary document area extraction result.
(Appendix 21)
The document area extraction method according to appendix 20, wherein the area division processing step generates a whole image corresponding to the input image and partial images each obtained by extracting a part of the input image, executes the clustering process on the whole image and on each of the partial images to calculate the divided areas, integrates the divided areas calculated for the whole image and for each of the partial images, and generates the labeling result image from the integrated divided areas.
(Appendix 22)
The document area extraction method according to any one of appendices 12 to 21, wherein the temporary document area contour straight line detection processing step calculates a plurality of contour straight line candidates from the boundary pixels of the extracted temporary document area, calculates for each candidate an evaluation value based on the length over which the candidate overlaps the temporary document area and on the area of the part cut off when the candidate divides the temporary document area, and selects from the candidates, based on the evaluation values, the temporary document area contour straight line corresponding to the contour of each side surrounding the temporary document area.
(Appendix 23)
A program for causing a computer, configured as a document area extracting apparatus that extracts a document area including a character string in order to recognize the character string in an input image, to execute:
an area division processing step of dividing the input image into a plurality of divided areas based on color information of each pixel in the input image;
an area integration processing step of extracting a temporary document area by performing area integration in which, for each divided area, the difference in color information between a pixel of the divided area that is in contact with another divided area and the adjacent pixel of that other divided area is compared with a preset threshold value, and the two divided areas are regarded as the same area when the difference is smaller than the threshold value;
a temporary document area contour straight line detection processing step of detecting a temporary document area contour straight line, which is a straight line indicating the contour of the extracted temporary document area; and
a document area extraction processing step of extracting and outputting a quadrangular document area surrounded by the extracted temporary document area contour straight lines.
(Appendix 24)
The program according to appendix 23, further causing the computer to execute:
a text block extraction processing step of extracting a text block, which is a character area including small areas corresponding to characters in the extracted temporary document area; and
a text block contour straight line detection processing step of raster-scanning the extracted text block in a direction for which the temporary document area contour straight line detection processing step has not obtained a temporary document area contour straight line forming the quadrangle, selecting pixels close to that direction while excluding the end pixels of the extracted temporary document area, and applying a Hough transform to the selected pixels to detect a text block contour straight line, which is a straight line indicating the contour of the text block,
wherein the document area extraction processing step extracts and outputs the document area as a quadrangle surrounded by the extracted temporary document area contour straight lines and the text block contour straight lines.
(Appendix 25)
The program according to appendix 24, wherein
the text block extraction processing step searches for a character area included in the extracted temporary document area, starting from the center of the temporary document area, and extracts the character area found as the text block, and
the text block contour straight line detection processing step calculates a plurality of contour straight line candidates from the boundary pixels of the extracted text block, calculates for each candidate an evaluation value based on the length over which the candidate overlaps the text block and on the area of the part cut off when the candidate divides the text block, and selects from the candidates, based on the evaluation values, the text block contour straight line corresponding to the contour of each side surrounding the text block.
(Appendix 26)
The program according to appendix 24 or 25, wherein,
when the temporary document area contour straight line is detected for all four sides surrounding the temporary document area, the document area extraction processing step extracts, as the document area, the quadrangular area surrounded by the temporary document area contour straight lines corresponding to the four sides, and
when the temporary document area contour straight line is not detected for all four sides, the document area extraction processing step extracts, as the document area, the quadrangular area surrounded by four contour straight lines obtained by combining the temporary document area contour straight lines with the text block contour straight lines.
(Appendix 27)
The program according to any one of appendices 24 to 26, further comprising control that displays to the user the temporary document area or the text block, the contour straight line candidates corresponding to the temporary document area or the text block, and the temporary document area contour straight line or the text block contour straight line, prompts the user to select a desired one of the contour straight line candidates or to designate a new contour straight line candidate, and, based on the result of the selection or designation, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 28)
The program according to any one of appendices 24 to 27, further comprising control that executes a distortion correction process on the document area output by the document area extraction processing step, extracts the text block contour straight line from the document area obtained by the distortion correction process, compares the inclination of the extracted text block contour straight line with that of the contour straight line corresponding to the document area after the distortion correction process, and, based on the comparison result, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 29)
The program according to any one of appendices 24 to 27, further comprising control that calculates a variance value of the document area extracted in the document area extraction processing step, compares the variance value with a predetermined threshold value, and, based on the comparison result, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 30)
The program according to any one of appendices 24 to 27, further comprising control that executes distortion correction processing and character recognition processing on the document area output by the document area extraction processing step, determines the character recognition rate in the document area obtained by the character recognition processing, and, based on the determination result, causes the temporary document area contour straight line detection processing step or the text block contour straight line detection processing step to be executed again.
(Appendix 31)
The program according to any one of appendices 23 to 30, wherein
the area division processing step executes a clustering process based on the color information of each pixel in the input image, assigns a label to each divided area obtained by the clustering process, and thereby generates and outputs a labeling result image, and
the area integration processing step evaluates the color information of adjacent divided areas to which different labels are assigned in the labeling result image, merges those adjacent areas when the similarity of their color information is high, selects, in the image obtained as a result of the merging, a divided area that is near the center of the image and whose size is equal to or larger than a predetermined size, and outputs the selected divided area as the temporary document area extraction result.
(Appendix 32)
The program according to appendix 31, wherein the area division processing step generates a whole image corresponding to the input image and partial images each obtained by extracting a part of the input image, executes the clustering process on the whole image and on each of the partial images to calculate the divided areas, integrates the divided areas calculated for the whole image and for each of the partial images, and generates the labeling result image from the integrated divided areas.
(Appendix 33)
The program according to any one of appendices 23 to 32, wherein the temporary document area contour straight line detection processing step calculates a plurality of contour straight line candidates from the boundary pixels of the extracted temporary document area, calculates for each candidate an evaluation value based on the length over which the candidate overlaps the temporary document area and on the area of the part cut off when the candidate divides the temporary document area, and selects from the candidates, based on the evaluation values, the temporary document area contour straight line corresponding to the contour of each side surrounding the temporary document area.

The disclosed technology can be used, for example, in applications that photograph various documents with a handheld camera, recognize them, and import them as character codes, including a function that reads and recognizes the characters on a business card with a compact digital camera or a camera mounted on a mobile phone and registers them in an address book.

Description of Symbols
101 Camera photographing unit
102 Image data storage unit
103 Area division processing unit
104 Area integration processing unit
105 Temporary document area contour straight line detection processing unit
106 Text block extraction processing unit
107 Text block contour straight line detection processing unit
108 Document area extraction processing unit
109 Control unit
2301 CPU
2302 Memory
2303 Input device
2304 Output device
2305 External storage device
2306 Portable recording medium driving device
2307 Network connection device
2308 Portable recording medium

Claims

A document area extracting apparatus for extracting a document area including a character string in order to recognize the character string in an input image, comprising:
an area division processing unit that divides the input image into a plurality of divided areas based on color information of each pixel in the input image;
an area integration processing unit that extracts a temporary document area by performing area integration in which, for each divided area, the difference in color information between a pixel of the divided area that is in contact with another divided area and the adjacent pixel of that other divided area is compared with a preset threshold value, and the two divided areas are regarded as the same area when the difference is smaller than the threshold value;
a temporary document area contour straight line detection processing unit that detects a temporary document area contour straight line, which is a straight line indicating the contour of the extracted temporary document area; and
a document area extraction processing unit that extracts and outputs a quadrangular document area surrounded by the extracted temporary document area contour straight lines.

The document area extracting apparatus according to claim 1, further comprising:
a text block extraction processing unit that extracts a text block, which is a character area including small areas corresponding to characters in the extracted temporary document area; and
a text block contour straight line detection processing unit that raster-scans the extracted text block in a direction for which the temporary document area contour straight line detection processing unit has not obtained a temporary document area contour straight line forming the quadrangle, selects pixels close to that direction while excluding the end pixels of the extracted temporary document area, and applies a Hough transform to the selected pixels to detect a text block contour straight line, which is a straight line indicating the contour of the text block,
wherein the document area extraction processing unit extracts and outputs the document area as a quadrangle surrounded by the extracted temporary document area contour straight lines and the text block contour straight lines.

The document area extracting apparatus according to claim 2, further comprising control that displays to the user the temporary document area or the text block, the contour straight line candidates corresponding to the temporary document area or the text block, and the temporary document area contour straight line or the text block contour straight line, prompts the user to select a desired one of the contour straight line candidates or to designate a new contour straight line candidate, and, based on the result of the selection or designation, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.

The document area extracting apparatus according to claim 2 or 3, further comprising control that executes a distortion correction process on the document area output by the document area extraction processing unit, extracts the text block contour straight line from the document area obtained by the distortion correction process, compares the inclination of the extracted text block contour straight line with that of the contour straight line corresponding to the document area after the distortion correction process, and, based on the comparison result, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.

The document area extracting apparatus according to any one of claims 1 to 4, further comprising control that calculates a variance value of the document area extracted by the document area extraction processing unit, compares the variance value with a predetermined threshold value, and, based on the comparison result, causes the temporary document area contour straight line detection processing unit or the text block contour straight line detection processing unit to execute its processing again.

A document area extraction method in which a document area extracting apparatus extracts a document area including a character string in order to recognize the character string in an input image, the document area extracting apparatus executing:
an area division processing step of dividing the input image into a plurality of divided areas based on color information of each pixel in the input image;
an area integration processing step of extracting a temporary document area by performing area integration in which, for each divided area, the difference in color information between a pixel of the divided area that is in contact with another divided area and the adjacent pixel of that other divided area is compared with a preset threshold value, and the two divided areas are regarded as the same area when the difference is smaller than the threshold value;
a temporary document area contour straight line detection processing step of detecting a temporary document area contour straight line, which is a straight line indicating the contour of the extracted temporary document area; and
a document area extraction processing step of extracting and outputting a quadrangular document area surrounded by the extracted temporary document area contour straight lines.

A program for causing a computer, configured as a document area extracting apparatus that extracts a document area including a character string in order to recognize the character string in an input image, to execute:
an area division processing step of dividing the input image into a plurality of divided areas based on color information of each pixel in the input image;
an area integration processing step of extracting a temporary document area by performing area integration in which, for each divided area, the difference in color information between a pixel of the divided area that is in contact with another divided area and the adjacent pixel of that other divided area is compared with a preset threshold value, and the two divided areas are regarded as the same area when the difference is smaller than the threshold value;
a temporary document area contour straight line detection processing step of detecting a temporary document area contour straight line, which is a straight line indicating the contour of the extracted temporary document area; and
a document area extraction processing step of extracting and outputting a quadrangular document area surrounded by the extracted temporary document area contour straight lines.