JP2019021085A - Image processing program, image processing method, and image processing device

Info

Publication number: JP2019021085A
Application number: JP2017139647A
Authority: JP
Inventors: Misako So (宗 美佐子); Eigo Segawa (瀬川 英吾)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-07-19
Filing date: 2017-07-19
Publication date: 2019-02-07

Abstract

To enable a character region to be extracted accurately. SOLUTION: A first image containing characters is binarized based on a first binarization threshold to generate a second image, and the line width of the characters contained in the second image is estimated from the generated second image. Each connected component or partial region of the first image is binarized with each of a plurality of binarization thresholds to generate an image group, line widths are extracted from a plurality of locations in each image of the image group, the binarization threshold whose evaluation value indicates the smallest deviation from the estimated line width is identified, and characters are extracted from the image generated with the identified binarization threshold. SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing program, an image processing method, and an image processing apparatus.

Techniques that use OCR (Optical Character Recognition) to extract and recognize characters from camera-captured images containing characters are in widespread use. Methods for extracting characters include, for example, the MSER (Maximally Stable Extremal Regions) method, which extracts characters as a binary image from an image with non-uniform brightness, and the region division method, which divides the image into small regions, calculates a brightness threshold for each small region, and extracts characters from the image as a binary image based on the calculated brightness thresholds.

However, with the MSER method and the region division method described above, when blur or camera shake in the image causes character patterns to be crushed or adjacent characters to touch, the character region cannot be extracted accurately, and characters may be misrecognized.

In response, techniques that improve on the MSER method exist. The following describes how an improved MSER method extracts a character region. The improved MSER method extracts a character region from an image binarized by the MSER method, recomputes the character boundary based on the contour points and the density gradient corresponding to the extracted character region, and thereby repairs crushed strokes and contact between characters. With this method, a character region can be extracted accurately when the character outline is relatively simple, as with alphanumeric characters, and the image quality is good.

However, when a character has many strokes and intricate character lines, as with kanji, or when the brightness image suffers from degradation such as noise, the density gradient deviates from the ideal state, the boundary calculation fails, and misrecognition may occur.

Next, processing with an improved region division method is described. The improved region division method divides the brightness image into small regions sized to contain a character line. Whether each region contains a character line is determined from, for example, the average of the brightness values calculated for that region. For each region determined to contain a character line, a binarization threshold is set so that the character line width matches a width given in advance or calculated from the image, and binarization is performed.

However, if the character line width given in advance or calculated from the image is not appropriate, the presence or absence of a character line in each region is misjudged, and misrecognition may occur.

Japanese Unexamined Patent Application Publication No. 2015-28735 (JP-A-2015-28735)

Huizhong Chen, Sam S. Tsai, George Schroth, David M. Chen, Radek Grzeszczuk, Bernd Girod, "Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions", Image Processing (ICIP), 2011 18th IEEE International Conference on, pp. 2609-2612

Accordingly, an object of the present invention is to extract a character region accurately.

A first image containing characters is binarized based on a first binarization threshold to generate a second image, and the line width of the characters contained in the second image is estimated from the generated second image. A connected component or partial region of the first image is binarized with each of a plurality of binarization thresholds to generate an image group, and line widths are extracted from a plurality of locations in each image of the image group. The binarization threshold whose evaluation value indicates the smallest deviation from the estimated line width is identified, and characters are extracted from the image generated with the identified binarization threshold.

A character region can be extracted accurately.

FIG. 1 is a diagram illustrating a configuration example of an image processing apparatus that performs processing according to the first embodiment.
FIG. 2 is a schematic diagram of calculating the shortest black-pixel run length (or the shortest white-pixel run length if the character region consists of white pixels) from a contour point of a binary image.
FIG. 3 is a diagram illustrating the processing flow of the image processing apparatus according to the first embodiment.
FIG. 4 is a diagram illustrating specific examples of images during image processing.
FIG. 5 is a diagram illustrating images obtained by binarizing the image shown in FIG. 4(a) while decreasing the binarization threshold at a predetermined interval.
FIG. 6 is a diagram illustrating an example in which the image processing apparatus according to the first embodiment is configured using a hardware processor.

Embodiments of the present invention are described below with reference to the drawings.

First, the configuration of the image processing apparatus 100 that performs processing according to the first embodiment is described. FIG. 1 is a diagram illustrating a configuration example of the image processing apparatus 100 that performs processing according to the first embodiment.

The image processing apparatus 100 that performs processing according to the first embodiment includes a reception unit 101, a conversion unit 102, an estimation unit 103, a generation unit 104, an extraction unit 105, an evaluation unit 106, a creation unit 107, and a storage unit 108.

The reception unit 101 receives an image from an input device 109 such as an imaging device, and transmits the received image to the conversion unit 102. The image processed by the image processing apparatus 100 according to the present embodiment does not necessarily have to be received from the input device 109. For example, the image processing apparatus 100 may include an imaging unit and perform image processing on an image captured by that imaging unit.

When the image received from the reception unit 101 is a color image, the conversion unit 102 converts it into a monochrome image by brightness conversion and transmits the converted monochrome image to the estimation unit 103 and the generation unit 104. When the image received from the reception unit 101 is already a monochrome image, the conversion unit 102 transmits it unchanged to the estimation unit 103 and the generation unit 104.
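The brightness conversion performed by the conversion unit 102 could look like the following minimal sketch. The source does not specify the conversion formula; the ITU-R BT.601 luma weights, the function name `to_monochrome`, and the list-of-rows image representation are all assumptions made for illustration.

```python
def to_monochrome(image):
    """Convert an image given as rows of pixels to brightness values (0-255).

    Pixels that are (r, g, b) tuples are converted; pixels that are already
    plain numbers are treated as monochrome and passed through.
    """
    out = []
    for row in image:
        new_row = []
        for px in row:
            if isinstance(px, (tuple, list)):
                r, g, b = px
                # BT.601 luma weights: an assumption, not taken from the source.
                new_row.append(int(round(0.299 * r + 0.587 * g + 0.114 * b)))
            else:
                new_row.append(int(px))
        out.append(new_row)
    return out
```

A monochrome input passes through unchanged, which mirrors the two cases described for the conversion unit.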

The estimation unit 103 binarizes the monochrome image received from the conversion unit 102 and estimates the character line width based on the binarized image.

The method by which the estimation unit 103 binarizes the monochrome image received from the conversion unit 102 is described here. Examples of such methods include the MSER method and the region division method. The MSER method varies the binarization threshold at regular intervals and binarizes with the threshold at which the area of the connected components of the foreground region (character portion) changes least. The region division method divides the monochrome image received from the conversion unit 102 into small regions sized to contain a character line, determines whether each region contains a character line from, for example, the average of the brightness values calculated for that region, and, for each region determined to contain a character line, sets a binarization threshold so that the character line width becomes a constant value and performs binarization. The method by which the estimation unit 103 calculates a binarization threshold from the monochrome image and binarizes the image is not limited to the MSER method or the region division method, and other methods may be used.
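The stability criterion behind the MSER method can be illustrated with a deliberately simplified sketch. It is not the MSER algorithm itself: it scores the total foreground area of one region rather than tracking individual connected components, and it assumes dark characters on a light background (a pixel is foreground when its brightness is at or below the threshold). The names `foreground_area` and `most_stable_threshold` are hypothetical.

```python
def foreground_area(gray, threshold):
    """Count pixels whose brightness is at or below the threshold."""
    return sum(1 for row in gray for v in row if v <= threshold)


def most_stable_threshold(gray, thresholds, step):
    """Return the candidate threshold whose foreground area changes least.

    Stability is scored as (area(t + step) - area(t - step)) / area(t);
    a smaller score means a more stable area. Candidates that produce no
    foreground are skipped. Returns None if no candidate is usable.
    """
    best_t, best_q = None, None
    for t in thresholds:
        area = foreground_area(gray, t)
        if area == 0:
            continue
        q = (foreground_area(gray, t + step) - foreground_area(gray, t - step)) / area
        if best_q is None or q < best_q:
            best_t, best_q = t, q
    return best_t
```

A production implementation would more likely rely on an existing MSER implementation that builds the component tree, which this sketch omits.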

Next, the process by which the estimation unit 103 estimates the character line width is described. The estimation unit 103 scans the binarized image in the horizontal and vertical directions and measures the run lengths of the black or white pixels that correspond to the foreground region (character portion). From the run lengths measured over the entire image, the estimation unit 103 takes, for example, the most frequent run length as the estimated character line width.

The estimation unit 103 does not necessarily have to use the most frequent run length as the character line width. For example, the mean or median of the run lengths over the entire image may be used instead. The character line width also does not have to be estimated from the entire image and may be estimated from a connected component or a partial region. Nor does the character line width have to be estimated from the monochrome image by the estimation unit 103: a character line width may be specified in advance and used for the image processing. The estimation unit 103 transmits the binarization threshold information to the generation unit 104 and the estimated character line width information to the evaluation unit 106.
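The run-length based estimate described above can be sketched as follows. The sketch assumes the binary image is a list of rows with foreground pixels encoded as 1, and it breaks ties between equally frequent run lengths toward the shorter run; both choices, and the function names, are assumptions made for illustration.

```python
from collections import Counter


def foreground_runs(line, fg=1):
    """Return the lengths of consecutive foreground runs in a 1-D sequence."""
    runs, count = [], 0
    for v in line:
        if v == fg:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


def estimate_line_width(binary, fg=1):
    """Estimate the character line width d as the most frequent run length.

    Runs are collected from horizontal scans (rows) and vertical scans
    (columns). Returns 0 when the image has no foreground pixels.
    """
    hist = Counter()
    for row in binary:
        hist.update(foreground_runs(row, fg))
    for col in zip(*binary):
        hist.update(foreground_runs(col, fg))
    if not hist:
        return 0
    best = max(hist.values())
    # Tie-break toward the shorter run: an arbitrary choice for this sketch.
    return min(length for length, c in hist.items() if c == best)
```

Replacing the mode with the mean or median of the collected runs would give the alternative estimators mentioned in the text.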

Based on the monochrome image received from the conversion unit 102 and the binarization threshold received from the estimation unit 103, the generation unit 104 generates, for each connected component of the foreground region (character portion) or each partial region in the monochrome image, a group of images binarized while the binarization threshold is decreased or increased at a predetermined interval δ.

An example of how to calculate the interval δ used to decrease or increase the binarization threshold for each connected component or partial region is described below. For the explanation, let N be the total number of connected components or partial regions, and let N_i denote the i-th of the N connected components or partial regions. Let P_i0 denote the binarization threshold used when the estimation unit 103 performed binarization.

The generation unit 104 calculates the representative brightness values P_iF and P_iB of the foreground region (character portion) and the background region, respectively, obtained when the image is binarized into foreground and background based on the binarization threshold P_i0 used by the estimation unit 103. A representative brightness value is the mode of the brightness in the foreground region (character portion) or the background region obtained by binarizing with the threshold P_i0. However, the representative brightness value calculated by the generation unit 104 does not have to be the mode of each region and may be the median or the mean.

The generation unit 104 calculates the absolute differences between the binarization threshold P_i0 used by the estimation unit 103 and the representative brightness values P_iF and P_iB of the foreground region (character portion) and the background region, namely |P_i0 - P_iF| and |P_i0 - P_iB|. The generation unit 104 takes the smaller of the two absolute differences as the interval δ used to decrease or increase the binarization threshold. However, the interval δ does not have to be calculated by this method and may be set in advance. The upper or lower limit for increasing or decreasing the binarization threshold may be the representative brightness value P_iF of the foreground region (character portion) or the representative brightness value P_iB of the background region obtained when the estimation unit 103 performed binarization, or the upper or lower limit may be set in advance.
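The calculation of δ from the representative brightness values can be sketched as follows. The sketch assumes dark characters on a light background, so a pixel counts as foreground when its brightness is at or below P_i0, and it uses the mode with ties broken toward the smaller value. The function names are hypothetical.

```python
from collections import Counter


def representative_brightness(values):
    """Mode of the brightness values (ties broken toward the smaller value)."""
    hist = Counter(values)
    best = max(hist.values())
    return min(v for v, c in hist.items() if c == best)


def threshold_step(gray_region, p_i0):
    """Return (delta, P_iF, P_iB) for one connected component or partial region.

    delta is the smaller of |P_i0 - P_iF| and |P_i0 - P_iB|.
    Assumes pixels with brightness <= p_i0 are foreground (dark text).
    """
    fg = [v for row in gray_region for v in row if v <= p_i0]
    bg = [v for row in gray_region for v in row if v > p_i0]
    if not fg or not bg:
        raise ValueError("threshold does not split the region into two classes")
    p_if = representative_brightness(fg)
    p_ib = representative_brightness(bg)
    delta = min(abs(p_i0 - p_if), abs(p_i0 - p_ib))
    return delta, p_if, p_ib
```

The returned P_iF and P_iB can also serve as the lower and upper limits of the threshold sweep, as the text permits.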

The generation unit 104 binarizes the monochrome image received from the conversion unit 102 using binarization thresholds obtained by decreasing or increasing the threshold received from the estimation unit 103 in steps of the interval δ, and generates an image group {A_i} for the connected component or partial region N_i. The generation unit 104 transmits the generated image group {A_i} to the extraction unit 105.
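A minimal sketch of generating the image group {A_i} for one region follows. It sweeps the threshold both downward and upward from P_i0 in steps of δ within caller-supplied limits, and assumes dark characters on a light background (foreground = 1 where brightness is at or below the threshold). The dictionary keyed by threshold and the function names are illustrative assumptions.

```python
def binarize(gray, threshold):
    """Binarize a grayscale region: 1 where brightness <= threshold, else 0."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def generate_image_group(gray_region, p_i0, delta, lower, upper):
    """Return {threshold: binary image} for thresholds p_i0 + m * delta.

    m ranges over all integers for which the threshold stays inside
    [lower, upper]; p_i0 itself is always included.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    thresholds = [p_i0]
    t = p_i0 - delta
    while t >= lower:          # sweep downward from the initial threshold
        thresholds.append(t)
        t -= delta
    t = p_i0 + delta
    while t <= upper:          # sweep upward from the initial threshold
        thresholds.append(t)
        t += delta
    return {t: binarize(gray_region, t) for t in sorted(thresholds)}
```

Each key of the returned dictionary corresponds to one value of the threshold index k used in the text.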

The extraction unit 105 extracts the character line width from each image in the image group {A_i} for the connected component or partial region N_i received from the generation unit 104. An example of how the extraction unit 105 extracts the character line width is described below. For example, the extraction unit 105 scans across the foreground pixels (pixels of the character region) in a plurality of directions from each contour point P_n of the binary image and, among the resulting black-pixel run lengths (white-pixel run lengths if the character region consists of white pixels), extracts the shortest one as the character line width.

FIG. 2 is a schematic diagram of calculating the shortest black-pixel run length (or the shortest white-pixel run length if the character region consists of white pixels) from a contour point P_n of a binary image. As shown in FIG. 2, the extraction unit 105 scans in a plurality of directions from the contour point P_n and calculates the run length in each direction. FIG. 2 shows an example of scanning in four directions from the contour point P_n. At the contour point P_n shown in FIG. 2, the run length obtained by scanning in direction (c) is the shortest, so the run length in direction (c) is taken as the character line width w_ikn. The directions in which the extraction unit 105 scans at the contour point P_n may be set in advance or may be changed according to the shape of the character region.

The extraction unit 105 extracts the character line width w_ikn for every contour point P_n. The character line width w_ikn denotes the character line width at the n-th contour point for the k-th binarization threshold of the i-th connected component or partial region. The extraction unit 105 transmits the extracted character line widths w_ikn and the total number T_ik of contour points of each image to the evaluation unit 106.
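The contour-point based width extraction can be sketched as follows. Several details are assumptions because the source leaves them open: a contour point is taken to be a foreground pixel with at least one 4-neighbour that is background or outside the image, the run length along a direction is measured as the full foreground run passing through the contour point along that axis, and the default axes are horizontal, vertical, and the two diagonals. All names are hypothetical.

```python
AXES_4 = ((0, 1), (1, 0), (1, 1), (1, -1))  # horizontal, vertical, two diagonals


def is_fg(binary, r, c):
    """True if (r, c) lies inside the image and is a foreground pixel (1)."""
    return 0 <= r < len(binary) and 0 <= c < len(binary[0]) and binary[r][c] == 1


def contour_points(binary):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    pts = []
    for r, row in enumerate(binary):
        for c, v in enumerate(row):
            if v == 1 and any(
                not is_fg(binary, r + dr, c + dc)
                for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0))
            ):
                pts.append((r, c))
    return pts


def run_through(binary, r, c, dr, dc):
    """Length of the foreground run through (r, c) along the axis (dr, dc)."""
    length = 1
    for sign in (1, -1):
        rr, cc = r + sign * dr, c + sign * dc
        while is_fg(binary, rr, cc):
            length += 1
            rr, cc = rr + sign * dr, cc + sign * dc
    return length


def contour_line_widths(binary, axes=AXES_4):
    """Return the widths w_n: the shortest run through each contour point."""
    return [
        min(run_through(binary, r, c, dr, dc) for dr, dc in axes)
        for r, c in contour_points(binary)
    ]
```

The length of the returned list is the contour-point count T_ik for that image. On a pixel grid, diagonal runs near stroke corners can be very short, so restricting `axes` to the horizontal and vertical axes may be preferable for some fonts.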

Based on the character line width d estimated by the estimation unit 103 and the character line widths w_ikn extracted by the extraction unit 105, the evaluation unit 106 calculates an evaluation value f_ik with an evaluation function that measures the degree of crushing or fading in each binary image of the image group {A_i} of each connected component or partial region. The evaluation value f_ik is calculated by Equation 1.

$$ f_{ik} = \sqrt{\frac{1}{T_{ik}} \sum_{n=1}^{T_{ik}} \left( w_{ikn} - d \right)^{2}} $$ (Equation 1)

In the above equation, d is the character line width estimated by the estimation unit 103, w_ikn is the character line width at the n-th contour point for the k-th binarization threshold of the i-th connected component or partial region, and T_ik is the total number of contour points in the binary image of that connected component or partial region. The evaluation value f_ik calculated by the evaluation unit 106 is the root mean square error between the character line width estimated by the estimation unit 103 and the character line widths in the binary image of each connected component or partial region extracted by the extraction unit 105. It is small when the extracted character line widths are close to the estimated character line width. From the evaluation values of the images binarized with different thresholds, the evaluation unit 106 extracts the minimum evaluation value and transmits it, together with the corresponding binarization threshold, to the creation unit 107.
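The root mean square evaluation and the selection of the minimizing threshold can be sketched as follows. The mapping from threshold to width list and the function names are illustrative assumptions, as is the choice to score an image with no contour points as infinitely bad.

```python
import math


def evaluation_value(widths, d):
    """f_ik: root mean square deviation of the widths w_ikn from d.

    An image with no contour points is scored as infinity so that it is
    never selected (an assumption of this sketch).
    """
    if not widths:
        return math.inf
    return math.sqrt(sum((w - d) ** 2 for w in widths) / len(widths))


def select_threshold(widths_by_threshold, d):
    """Return (threshold, f) for the threshold with the smallest f_ik."""
    scored = {t: evaluation_value(w, d) for t, w in widths_by_threshold.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]
```

For example, with an estimated width d = 2, widths [4, 4, 6, 6] score about 3.16, widths [2, 2, 3, 3] score about 0.71, and widths [1, 1, 1, 1] score 1.0, so the second threshold would be selected.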

Based on the evaluation value and the corresponding binarization threshold received from the evaluation unit 106, the creation unit 107 generates, for each connected component or partial region, a binary image using the binarization threshold that gives the minimum evaluation value. The creation unit 107 then replaces the image at the location of the corresponding connected component or partial region in the binary image received from the estimation unit 103 with the generated binary image, producing the final binary image. The creation unit 107 transmits the generated binary image to a storage device 110. However, the creation unit 107 does not necessarily have to transmit the generated image to the storage device 110 and may transmit it to, for example, an output device.
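The replacement step performed by the creation unit 107 can be sketched for rectangular partial regions as follows. Regions are given as hypothetical (top, left, bottom, right) boxes with exclusive bottom and right bounds, and dark characters on a light background are assumed; handling connected components would need a per-pixel mask instead of a box.

```python
def compose_binary(base_binary, gray, regions, best_thresholds):
    """Re-binarize each region with its own best threshold.

    base_binary: binary image produced with the initial threshold.
    gray: the monochrome image the binary image was derived from.
    regions: list of (top, left, bottom, right) boxes, bottom/right exclusive.
    best_thresholds: one selected threshold per region, in the same order.
    Returns a new image; base_binary is left unmodified.
    """
    out = [row[:] for row in base_binary]
    for (top, left, bottom, right), t in zip(regions, best_thresholds):
        for r in range(top, bottom):
            for c in range(left, right):
                # Foreground = 1 where brightness <= threshold (dark text).
                out[r][c] = 1 if gray[r][c] <= t else 0
    return out
```

Pixels outside every listed region keep the value they had in the initially binarized image.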

The storage unit 108 stores various information that the image processing apparatus 100 needs to perform image processing. This information includes, for example, the image received by the reception unit 101, the monochrome image converted by the conversion unit 102, the character line width estimated by the estimation unit 103, the image group generated by the generation unit 104, the binarization threshold of each image in the image group processed by the extraction unit 105, the evaluation values calculated by the evaluation unit 106 and the binarization thresholds of the corresponding images, and the binary image generated by the creation unit 107. The storage unit 108 may also store other information related to image processing.

Next, the processing flow of the image processing apparatus 100 according to the first embodiment is described together with a specific example. As the specific example, the image processing apparatus 100 performs image processing on an image containing the character "町" (town) affected by blur or camera shake. FIG. 3 is a diagram illustrating the processing flow of the image processing apparatus 100 according to the first embodiment.

In the image processing apparatus 100 according to the first embodiment, the reception unit 101 receives an image from the input device 109 (step S301).

The conversion unit 102 of the image processing apparatus 100 converts the image received by the reception unit 101 (step S302). When the image received from the reception unit 101 is a color image, the conversion unit 102 converts it into a monochrome image and transmits the converted image to the estimation unit 103. When the image received from the reception unit 101 is a monochrome image, the conversion unit 102 transmits the received monochrome image to the estimation unit 103.

FIG. 4 is a diagram illustrating specific examples of images during image processing. FIG. 4(a) shows the image converted by the conversion unit 102 of the image processing apparatus 100. As shown in FIG. 4(a), the image containing the character "町", after monochrome conversion by the conversion unit 102, is blurred, and the outline of the character "町" is hard to make out. When the conventional MSER method or region division method is applied to an image blurred as in FIG. 4(a), the binarization may crush the left part of the character "町" as shown in FIG. 4(b), and the character "町" may ultimately be misrecognized as the character "可" as shown in FIG. 4(c).

The estimation unit 103 of the image processing apparatus 100 binarizes the monochrome image received from the conversion unit 102 (step S303). Examples of methods by which the estimation unit 103 binarizes the monochrome image include the MSER method and the region division method.

Next, the estimation unit 103 of the image processing apparatus 100 estimates the character line width from the binarized image (step S304). One way to do so is to estimate it from a histogram of line widths in the binarized image. The estimation unit 103 scans the binarized image in the horizontal and vertical directions and measures the run lengths of black (or white) pixels. From the run lengths measured over the entire image, the estimation unit 103 takes, for example, the most frequent run length as the estimated character line width. However, the estimation unit 103 does not necessarily have to use the most frequent run length and may, for example, use the mean or median of the run lengths over the entire image. The character line width also does not have to be estimated from the monochrome image by the estimation unit 103: a character line width may be specified in advance and used for the image processing.

Based on the monochrome image converted by the conversion unit 102 and the binarization threshold T estimated by the estimation unit 103, the generation unit 104 of the image processing apparatus 100 generates an image group {A_i} for each connected component of the foreground region (character portion) or each partial region in the monochrome image, binarizing with thresholds obtained by decreasing or increasing the binarization threshold T at a predetermined interval δ (step S305). FIG. 5 shows images obtained by binarizing the image shown in FIG. 4(a) while decreasing the binarization threshold at a predetermined interval. FIG. 5(a) shows the image binarized by the estimation unit 103, and FIGS. 5(b) to 5(e) show images binarized by the image processing apparatus 100 with thresholds reduced from the binarization threshold T in successive steps of the predetermined interval δ. As FIG. 5 shows, lowering the binarization threshold changes the line width of the character extracted from the blurred character "町" in the image. As shown in FIG. 5, the image processing apparatus 100 generates the image group {A_i} of images binarized with different binarization thresholds, indexed by k.

The extraction unit 105 of the image processing apparatus 100 extracts the character line width from each image in the image group {A_i} generated by the generation unit 104 (step S306). For example, the extraction unit 105 scans across the foreground pixels (pixels of the character region) in a plurality of directions from each contour point P_n of the binary image and measures the run lengths of black pixels or white pixels (whichever the character region consists of). It takes the shortest black-pixel run length (or the shortest white-pixel run length if the character region is white) as the character line width w_ikn. The extraction unit 105 extracts the character line width w_ikn for every contour point P_n.
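A simplified sketch of the per-contour-point width extraction in step S306 is shown below. It is an illustration under stated assumptions rather than the method of the specification: a contour point is taken to be a foreground pixel with at least one 4-connected background neighbor, only the horizontal and vertical directions are scanned (the text says only "a plurality of directions"), and the run length is counted through the contour point in both senses of each direction. All names are hypothetical.

```python
AXES = [(0, 1), (1, 0)]  # assumption: horizontal and vertical scans only

def is_fg(img, y, x):
    """True if (y, x) lies inside the image and is a foreground pixel."""
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] == 1

def contour_points(img):
    """Foreground pixels that touch the background (or the image border)."""
    neighbors = ((0, 1), (0, -1), (1, 0), (-1, 0))
    return [(y, x)
            for y, row in enumerate(img)
            for x, px in enumerate(row)
            if px == 1 and any(not is_fg(img, y + dy, x + dx)
                               for dy, dx in neighbors)]

def run_through(img, y, x, dy, dx):
    """Length of the foreground run passing through (y, x) along (dy, dx)."""
    length = 1
    for sign in (1, -1):
        cy, cx = y + sign * dy, x + sign * dx
        while is_fg(img, cy, cx):
            length += 1
            cy += sign * dy
            cx += sign * dx
    return length

def line_widths(img):
    """Shortest run length at every contour point (the w_ikn values)."""
    return [min(run_through(img, y, x, dy, dx) for dy, dx in AXES)
            for y, x in contour_points(img)]

stroke = [[0, 0, 1, 1, 0, 0] for _ in range(5)]
widths = line_widths(stroke)
```

Adding diagonal directions would require scaling diagonal run lengths by the pixel spacing along the diagonal, which is why they are left out of this sketch.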

Based on the character line width d estimated by the estimation unit 103 and the character line widths w_ikn extracted by the extraction unit 105, the evaluation unit 106 of the image processing apparatus 100 calculates an evaluation value f_ik for each binary image in the image group {A_i} of each connected component or partial region. The evaluation function measures the degree of crushing (strokes merging together) and fading (strokes breaking up) in the binary image. The evaluation value f_ik is the root mean square error between the character line width estimated by the estimation unit 103 and the character line widths extracted by the extraction unit 105 from the binary image of the connected component or partial region; it is small when the extracted line widths are close to the estimated line width. Among the evaluation values of the images binarized with different thresholds, the evaluation unit 106 determines the minimum (step S307).
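The evaluation in step S307 can be written as f_ik = sqrt((1/N) Σ_n (w_ikn − d)²), where N is the number of contour points. The sketch below computes this value and picks the threshold that minimizes it for one region. The function names and the dictionary layout are assumptions for illustration, as is the choice to return infinity when a candidate image has no contour points.

```python
import math

def evaluation_value(widths, d):
    """Root mean square error between extracted widths and the estimate d.

    Assumption: an image with no measurable widths gets an infinite score,
    so it is never selected.
    """
    if not widths:
        return math.inf
    return math.sqrt(sum((w - d) ** 2 for w in widths) / len(widths))

def best_threshold(widths_by_threshold, d):
    """Return the threshold k whose widths deviate least from d."""
    return min(widths_by_threshold,
               key=lambda k: evaluation_value(widths_by_threshold[k], d))

# Hypothetical widths measured for one region at three thresholds.
measured = {108: [1, 1, 1], 118: [2, 2, 3], 128: [4, 5, 4]}
chosen = best_threshold(measured, d=2)
```

In this hypothetical example the threshold 118 is chosen, because its widths stay closest to the estimated width of 2, while 108 thins the strokes and 128 thickens them.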

Based on the evaluation values calculated by the evaluation unit 106 and the binarization thresholds corresponding to them, the creation unit 107 of the image processing apparatus 100 generates, for each connected component or partial region, a binary image using the binarization threshold that corresponds to the minimum evaluation value. The creation unit 107 then replaces the corresponding connected component or partial region in the binary image received from the estimation unit 103 with the generated binary image, producing the final binary image (step S308). By applying this image processing to the image of FIG. 4(a), the image processing apparatus 100 generates the image of FIG. 5(c). The image processing apparatus 100 outputs the binary image generated by the creation unit 107 to the storage device 110 (step S309). As one use of the generated binary image, characters can be extracted from the binary images stored in the storage device 110 and attached to each image as tags, making image search more efficient.
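The replacement in step S308 can be pictured as pasting each re-binarized region back into the initial binary image. The sketch below assumes that a partial region is an axis-aligned rectangle identified by its top-left corner; the specification also allows connected components, which would need a mask instead. The function name and parameters are hypothetical.

```python
def paste_region(base, patch, top, left):
    """Return a copy of `base` with `patch` written at (top, left).

    Assumption: the patch fits entirely inside the base image.
    """
    out = [row[:] for row in base]  # copy so the input image is not modified
    for dy, patch_row in enumerate(patch):
        for dx, px in enumerate(patch_row):
            out[top + dy][left + dx] = px
    return out

initial = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
rebinarized = [[1, 1]]
final = paste_region(initial, rebinarized, top=1, left=1)
```

Repeating the call once per region, each time with the patch produced by that region's best threshold, yields the final composite image.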

As described above, the processing of the first embodiment allows character regions to be extracted accurately for each connected component or partial region even when the characters in the image are out of focus or motion-blurred. For intricate characters, which were a problem for the improved variant of the conventional MSER method, crushing can be suppressed because the estimation unit 103 estimates the character line width in the image beforehand. Likewise, for misrecognition caused by an inappropriate character line width value, which was a problem for the improved variant of the conventional region division method, estimating the character line width in advance with the estimation unit 103 reduces the chance of misjudging whether a connected component or partial region contains characters. Consequently, even when the degree of defocus, motion blur, or brightness differs from character to character, each connected component or partial region can be binarized with its own optimal binarization threshold, and character regions can be extracted accurately.

FIG. 6 shows an example in which the image processing apparatus 100 according to the first embodiment is configured using a hardware processor. The image processing apparatus 100 includes a CPU (Central Processing Unit) 601, a memory (main storage device) 602, an auxiliary storage device 603, an I/O device 604, and a network interface 605. These devices are connected via a bus 606.

The CPU 601 controls the image processing apparatus 100 as a whole and executes each process shown in the flow of FIG. 3. The memory 602 stores a program that performs the processing according to the present embodiment.

The CPU 601 reads the program information for the processing from the auxiliary storage device 603 and stores it in the memory 602. The CPU 601 then performs image processing based on the information stored in the memory 602. Not all of the information for every process has to be held in the memory 602 at all times; it is sufficient that the data used for the current processing is stored there. The program also does not have to be stored in the auxiliary storage device 603; it may instead be stored on a portable medium such as a disk inserted into the computer.

The I/O device 604 receives, for example, image data from the input device 109 and accepts input such as settings for the image processing. It also outputs the results of the image processing to a display or the like.

The network interface 605 is an interface device that exchanges information over a network.

The bus 606 is a communication path that connects the above devices to one another and carries data between them.

The present invention is not limited to the embodiments described above, and various configurations or embodiments can be adopted without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
100 Image processing apparatus
101 Reception unit
102 Conversion unit
103 Estimation unit
104 Generation unit
105 Extraction unit
106 Evaluation unit
107 Creation unit
108 Storage unit
109 Input device
110 Storage device
601 CPU
602 Memory (main storage device)
603 Auxiliary storage device
604 I/O device
605 Network interface
606 Bus

Claims

An image processing program that causes a computer to execute processing comprising:
generating a second image by performing binarization processing on a first image containing characters based on a first binarization threshold;
estimating, based on the generated second image, a line width of the characters contained in the second image;
generating an image group by performing binarization processing on a connected component or partial region of the first image based on each of a plurality of binarization thresholds;
extracting line widths from a plurality of positions in each image of the image group, and identifying the binarization threshold whose evaluation value, which relates to the deviation between the extracted line widths and the estimated line width, indicates the smallest deviation; and
extracting characters from an image generated based on the identified binarization threshold.

The image processing program according to claim 1, wherein the plurality of binarization thresholds are values obtained by adding a multiple of a predetermined value to, or subtracting it from, the first binarization threshold.

The image processing program according to claim 1 or 2, which causes the computer to execute processing of calculating the predetermined value based on the first binarization threshold and on a value identified from the brightness values in each of the regions produced when the image is binarized based on the first binarization threshold.

The image processing program according to any one of claims 1 to 3, which causes the computer to execute processing of setting the upper limit and the lower limit of the plurality of binarization thresholds to values identified from the brightness values in each of the regions produced when the image is binarized based on the first binarization threshold.

The image processing program according to any one of claims 1 to 4, which causes the computer to execute processing of estimating the line width of the characters based on any of the entire image, a connected component of the image, or a partial region.

An image processing method in which a computer executes processing comprising:
generating a second image by performing binarization processing on a first image containing characters based on a first binarization threshold;
estimating, based on the generated second image, a line width of the characters contained in the second image;
generating an image group by performing binarization processing on a connected component or partial region of the first image based on each of a plurality of binarization thresholds;
extracting line widths from a plurality of positions in each image of the image group, and identifying the binarization threshold whose evaluation value, which relates to the deviation between the extracted line widths and the estimated line width, indicates the smallest deviation; and
extracting characters from an image generated based on the identified binarization threshold.

An image processing apparatus comprising:
an estimation unit that generates a second image by performing binarization processing on a first image containing characters based on a first binarization threshold, and estimates, based on the generated second image, a line width of the characters contained in the second image;
a generation unit that generates an image group by performing binarization processing on a connected component or partial region of the first image based on each of a plurality of binarization thresholds;
an evaluation unit that extracts line widths from a plurality of positions in each image of the image group, and identifies the binarization threshold whose evaluation value, which relates to the deviation between the extracted line widths and the estimated line width, indicates the smallest deviation; and
a creation unit that extracts characters from an image generated based on the identified binarization threshold.