JP2001143021A

JP2001143021A - Method and device for recognizing character

Info

Publication number: JP2001143021A
Application number: JP31948299A
Authority: JP
Inventors: Jutaro Ishioka; 寿太郎石岡
Original assignee: Japan Digital Laboratory Co Ltd
Current assignee: Japan Digital Laboratory Co Ltd
Priority date: 1999-11-10
Filing date: 1999-11-10
Publication date: 2001-05-25

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition method and device that improve the recognition rate when an image reader reads a document or the like on which ruled lines or frame lines are printed in a single non-dropout color. SOLUTION: Characters are segmented from an image from which the ruled lines, etc. have been removed (S1, S2), and ruled line contact information is stored in memory. Recognition processing is performed on each segmented character image (S4, S5). When a segmented character image cannot be recognized, it is interpolated on the basis of the ruled line contact information so as to connect the portions lost through ruled line removal (S7), recognition processing is performed again (S8, S9), and it is judged whether a recognition result is to be output (S10).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION
The present invention relates to character recognition technology, and more particularly to technology for recognizing characters after ruled lines or frame lines have been removed from a scanned image obtained by reading, with an image reader, characters written on a form whose ruled lines and frame lines are printed in a single non-dropout color.

[0002]

DESCRIPTION OF THE RELATED ART
An image reader such as an OCR device or a scanner reads the characters on a form or document, converts them into electrical signals, and outputs a character image. Forms usually carry printed ruled lines and frame lines, and characters are printed or written along those lines, so the ruled lines and frame lines are printed in a dropout color so that they do not interfere with reading. Consequently, when an image reader reads a form on which characters have been printed or written, the ruled lines and frame lines are not picked up, and only the characters, free of any ruled-line or frame-line image, are read.

[0003] However, to read such a form with an image reader and perform character recognition, it has conventionally been necessary to print guide marks in a non-dropout color that indicate the positions of the ruled lines or frame lines, so that the character entry positions can be determined.

[0004] Printing guide marks in a non-dropout color as described above, however, requires the form to be printed in two colors: the ruled lines or frame lines in a dropout color and the guide marks in a non-dropout color. This raises the printing cost of the forms and therefore the running cost.

[0005]

PROBLEM TO BE SOLVED BY THE INVENTION
The increase in running cost caused by two-color forms can be avoided by using forms printed in a single non-dropout color. In that case, however, if a written character touches or overlaps a ruled line or frame line printed in the non-dropout color, the line and the character cannot be told apart when the form is read by the image reader, which can result in misrecognition or a failure to read.

[0006] To prevent misrecognition or read failures when a character touches a ruled line or frame line, the ruled lines or frame lines can be forcibly removed after the form has been read by the image reader. Simply removing them, however, lowers the recognition rate of the resulting character image, because the parts of the characters that were in contact with the lines are removed as well. Conventionally, therefore, the removed portion was estimated from the directions of the strokes on either side of the remaining character image portions, the distance between them, and so on, and the image was interpolated accordingly.

[0007] This interpolation method, however, cannot interpolate the image correctly when the stroke direction differs from that of the true character image. For example, if the lower part of the character "2" touches a ruled line as in the example of FIG. 3, that part is removed together with the ruled line. As in the example of FIG. 10(a), the remaining stroke then points diagonally down and to the left, and the gap is long, so the conventional method yields an image in which the stroke is extended diagonally to the left, as in FIG. 10(c) (which would be recognized as "7"). The interpolated image therefore differs from the correct character image (FIG. 10(b)).

[0008] The present invention has been made to solve the above problems. Its object is to provide a character recognition method and a character recognition device that improve the recognition rate when an image reader reads a form or the like on which ruled lines or frame lines (hereinafter "ruled lines, etc.") are printed in a single non-dropout color.

[0009]

MEANS FOR SOLVING THE PROBLEM
To solve the above problems, the character recognition method of the first invention removes the image of ruled lines, etc. from the scanned image of a document, cuts out character images one character at a time, and performs character recognition. The method is characterized by: acquiring, when a character image is cut out, contact information between that character image and the ruled lines, etc.; performing recognition processing on the cut-out character; and, when the recognition result does not satisfy a predetermined condition, creating, on the basis of the recognition result and the contact information, a character image interpolated so as to connect the contact portions that were removed from the character image when the ruled lines, etc. were removed from the scanned image, and performing recognition processing on the interpolated character image.

[0010] The second invention is the character recognition method of the first invention, characterized in that, when the interpolated character image does not satisfy a predetermined condition, a corrected character image is created, on the basis of the recognition result and the contact information, by removing the superfluously generated portions from the character image that was interpolated so as to connect the contact portions removed during ruled line removal, and recognition processing is performed on the re-interpolated character image.

[0011] The character recognition device of the third invention is characterized by comprising: ruled line removing means for removing the image of ruled lines, etc. from the scanned image of a document; cutout means for cutting out character images one character at a time from the character image from which the ruled line removing means has removed the ruled lines, etc.; ruled line contact information acquiring means for acquiring, from that character image, contact information on the portions that the ruled lines, etc. had touched; character recognition means for performing recognition processing on the character image cut out by the cutout means and evaluating the recognition result; and character image interpolating means for interpolating, when the evaluation of the recognition result by the character recognition means does not satisfy a predetermined condition, so as to connect the contact portions removed from the character image during ruled line removal by the ruled line removing means, on the basis of the recognition result and the contact information.

[0012] The fourth invention is the character recognition device of the third invention, characterized by comprising interpolated image determining means for determining whether the character image interpolated by the character image interpolating means satisfies a predetermined condition, and in that the character image correcting means includes means for creating, when the interpolated image determining means determines that the interpolated character image does not satisfy the predetermined condition, a corrected character image by removing, on the basis of the recognition result and the contact information, the superfluously generated portions from the character image interpolated so as to connect the contact portions.

[0013] The fifth invention is the character recognition device of the third or fourth invention, characterized in that the contact information is all of, or a combination of, the following: the direction of contact between the ruled lines, etc. and the character; the number of places where the ruled lines, etc. and the character touch or overlap; and the positions of one end or both ends of each contact or overlapping portion.
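The contact information listed above can be pictured as a small per-character record. The following is a minimal sketch in Python, not the patent's data layout: the class names `Contact` and `RuledLineContactInfo`, the field names, and the string values used for the contact direction are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass
class Contact:
    """One place where a character touched a ruled line (hypothetical layout)."""
    direction: str  # side on which the line was touched, e.g. "right" or "bottom"
    start: Point    # one end of the contact portion
    end: Point      # the other end of the contact portion


@dataclass
class RuledLineContactInfo:
    """Per-character contact information, playing the role of If2."""
    contacts: List[Contact] = field(default_factory=list)

    @property
    def count(self) -> int:
        # Number of contact places; 0 means "no ruled line contact".
        return len(self.contacts)

    @property
    def touches_ruled_line(self) -> bool:
        return self.count > 0
```

A character that touches a ruled line on its right side and a frame line at its bottom would carry two `Contact` entries, while a character that touches nothing would carry an empty list.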

[0014]

EMBODIMENTS OF THE INVENTION
FIG. 1 is a block diagram showing the configuration of one embodiment of the character recognition device of the present invention. The character recognition device 100 comprises a ruled line removing unit 110, a character cutout unit 120, a ruled line contact information storage unit 130, and a character recognition block 140. Although not shown, the character recognition device 100 also includes a control unit, consisting of a CPU and its peripheral circuits, that controls the operation of each of these components and of the character recognition device as a whole.

[0015] The ruled line removing unit 110 obtains a character image Im2 (FIG. 4) by removing the image of ruled lines, etc. (ruled lines or frame lines) from the scanned image Im1 (FIG. 3) read in from a scanner or the like, and acquires ruled line contact information (for example, the contact direction of the character, the number of contact points, and the positions (coordinates) of the contact portions) for the image portions that were touching the ruled lines, etc.
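As a rough illustration of removing a ruled line while recording where characters touched it, the sketch below erases one horizontal line from a binary image and returns the column ranges in which a character pixel adjoined the line. It assumes, beyond what the description states, that the image is a list of rows of 0/1 pixels, that the line is one pixel thick, and that its row index `y` is already known. The function name `remove_horizontal_line` is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel


def remove_horizontal_line(img: Image, y: int) -> Tuple[Image, List[Tuple[int, int]]]:
    """Erase row y and return (new image, contact runs).

    A contact run is an inclusive (x_start, x_end) column range in which a
    character pixel sits directly above or below the erased line.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input image is left untouched
    touching = []
    for x in range(w):
        above = y > 0 and img[y - 1][x] == 1
        below = y < h - 1 and img[y + 1][x] == 1
        touching.append(above or below)
        out[y][x] = 0  # remove the ruled line pixel

    # Collapse consecutive touching columns into runs with two end positions.
    runs: List[Tuple[int, int]] = []
    start = None
    for x, t in enumerate(touching):
        if t and start is None:
            start = x
        elif not t and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, w - 1))
    return out, runs
```

The two ends of each run correspond to the "positions of both ends of the contact portion" that the description stores as coordinates.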

[0016] The character cutout unit 120 cuts out character images one character at a time from the image Im2, from which the ruled line removing unit 110 has removed the ruled lines, etc., to obtain cut-out character images Ci1, and converts the ruled line contact information belonging to each cut-out character image into per-character ruled line contact information If2.

[0017] The ruled line contact information storage unit 130 consists of temporary storage memory such as RAM and stores the ruled line contact information If2 obtained by the character cutout unit 120.

[0018] The character recognition block 140 comprises a feature extraction unit 141, a dictionary unit 142, an identification unit 143, an image interpolation processing unit 144, and a rejection determination unit 146, and performs character recognition after feature extraction, image interpolation processing, and the like.

[0019] Specifically, in the character recognition block 140, the feature extraction unit 141 calculates a feature quantity Fd1 from the character image Ci1 cut out by the character cutout unit 120 (or a feature quantity Fd2 from the character image Ci2 interpolated by the image interpolation processing unit 144).

[0020] The dictionary unit 142 has a template structure consisting of, for example, standard feature quantities Fdd for digits, alphabetic characters, and so on, together with character codes Co and the like. Multiple templates for each character type are stored in advance in non-volatile memory such as ROM.

[0021] The identification unit 143 calculates the distance between the feature quantity Fd1 computed by the feature extraction unit 141 and the feature quantity Fdd of each template in the dictionary unit 142, and obtains recognition candidate information, such as the character codes Co1 and their distances Di1, for the candidates from the smallest distance (closest features) down to a specified rank. (Alternatively, it calculates the distance between the feature quantity Fd2 computed by the feature extraction unit 141 and the feature quantity Fdd of each template, and obtains recognition candidate information such as the character codes Co2 and their distances Di2 in the same way.) It then determines whether the character can be identified.
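The distance calculation and candidate ranking can be sketched as follows. The description does not specify the distance measure or the form of the feature quantities, so Euclidean distance over plain numeric vectors is an assumption here, and the function name `rank_candidates` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (character code, standard feature vector Fdd); several templates may share a code
Template = Tuple[str, Sequence[float]]


def rank_candidates(
    fd: Sequence[float], templates: List[Template], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_n (character code, distance) pairs, closest first."""
    scored = [(code, math.dist(fd, fdd)) for code, fdd in templates]
    scored.sort(key=lambda candidate: candidate[1])  # smaller distance = closer features
    return scored[:top_n]
```

Because the dictionary holds several templates per character type, the same character code can legitimately appear more than once among the top candidates.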

[0022] The image interpolation processing unit 144 interpolates the portions lost through the removal of ruled lines, etc., on the basis of the character image Ci1 that the identification unit 143 judged unidentifiable (that is, the character image cut out by the character cutout unit 120), the corresponding recognition candidate information obtained by the identification unit 143, and the ruled line contact information If2 stored in the ruled line contact information storage unit 130, and thereby obtains a character image Ci2.

[0023] For the character image Ci2 interpolated by the image interpolation processing unit 144, the rejection determination unit 146 determines, on the basis of the recognition candidate information obtained by the identification unit 143 from the features extracted by the feature extraction unit 141, which recognition candidate's character code to output, or whether to output a reject code instead.

[0024] FIG. 2 is a flowchart showing an example of the character recognition operation of the character recognition device 100 of FIG. 1; the control unit controls the operation sequence of each step. FIG. 3 shows an example of a scanned character image. FIG. 4 shows an example of the character image after removal of the ruled lines, etc., and FIG. 5 shows image interpolation and the recognition result, taking as an example the character "3" of FIG. 4 after removal of the ruled lines, etc.

Step S1 (removal of ruled lines, etc.): In FIG. 2, the ruled line removing unit 110 takes the scanned image Im1 of a non-dropout-color form or document that has been loaded into temporary storage memory such as DRAM (in the example of FIG. 3, the characters "1", "2", "3", "4", and "5" have been written on it), removes the ruled lines, etc. (in the example of FIG. 3, the ruled lines denoted by reference numerals 31 to 38 and the frame line denoted by reference numeral 39) to obtain the character image Im2 (FIG. 4), and stores it in temporary storage memory such as DRAM.

[0025] Step S2 (character cutout): Next, the character cutout unit 120 cuts out character images one character at a time from the image Im2, from which the ruled lines, etc. were removed in step S1, and obtains the cut-out character images Ci1 (in the example of FIG. 4, the cut-out character images denoted by reference numerals 41 to 45).

[0026] Step S3 (acquisition and storage of ruled line contact information): The character cutout unit 120 also converts the ruled line contact information belonging to each character image cut out in step S2 into per-character ruled line contact information If2 (for example, the contact direction of the character, the number of contact points, and the positions (coordinates) of the contact portions) and stores it in the ruled line contact information storage unit 130. In the example of FIG. 4, the cut-out character images 41 to 45 are each converted, giving five pieces of ruled line contact information. For characters that do not touch any ruled lines, etc., ruled line contact information meaning "no ruled line contact" (for example, number of contact points = 0) is stored.

Take as an example the "3" denoted by reference numeral 43 among the five cut-out character images shown in FIG. 4. In FIG. 3 the character "3" touches the ruled line 36 on its right and the lower side of the frame line 39, so removing the ruled lines in step S1 yields a cut-out image split into three blocks, as shown at reference numeral 43 in FIG. 4. (In other words, step S2 cuts out a single character image that remains split into three blocks, as shown in FIG. 5(a).) In this example, the character cutout unit 120 stores in the ruled line contact information storage unit 130, as the ruled line contact information of the character image, the positions of contact with the ruled lines, etc.: the positions 51 and 52 at the two ends of the contact between the ruled line 36 and the character "3", and the positions 53 and 54 at the two ends of the contact between the frame line 39 and the character "3". In this embodiment the positions 51, 52, 53, and 54 are expressed as coordinate values (X, Y), but the representation is not limited to this.

[0027] Step S4 (feature extraction from the cut-out character image): The feature extraction unit 141 calculates the feature quantity Fd1 from one character image Ci1 cut out in step S2 (for example, the cut-out character image of FIG. 5(a)).

[0028] Step S5 (identification (character recognition) of the cut-out character image): The identification unit 143 calculates the distance between the feature quantity Fd1 of the character image Ci1 computed in step S4 and the standard feature quantity Fdd stored in each template of the dictionary unit 142, and obtains, as recognition candidate information, the recognition candidate character codes Co1 and the distance calculation results Di1 for the top three candidates in order of increasing distance (closest features first).

[0029] Step S6 (determination of whether identification is possible): The identification unit 143 judges the character recognizable and proceeds to S11 when the recognition candidate character codes Co1 (that is, the recognition candidate codes ranked first to third) all match and the distance Di1 of each recognition candidate is at or below a predetermined value. Otherwise (when the recognition candidate codes Co1 do not all match, or a distance Di1 exceeds the predetermined value), it proceeds to S7 to perform character image interpolation processing. For example, suppose that the recognition processing of step S5 on the cut-out character image of FIG. 5(a) gives recognition candidate character codes Co1 that all indicate "3" for the top three candidates, but that the distance to the standard feature quantities Fdd of the dictionary unit 142 is large (that is, above the predetermined value). A reliable recognition result cannot be output as things stand, so the image is treated as requiring interpolation and processing proceeds to S7.
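The accept-or-interpolate decision of step S6 reduces to two checks on the ranked candidates. The sketch below assumes candidates are (character code, distance) pairs and that the predetermined value is passed in as `max_distance`; both the function name and the parameter name are hypothetical.

```python
from typing import List, Tuple


def is_recognizable(candidates: List[Tuple[str, float]], max_distance: float) -> bool:
    """True when all candidate codes agree and every distance is within the limit."""
    codes = {code for code, _ in candidates}
    all_match = len(codes) == 1  # also False for an empty candidate list
    all_close = all(distance <= max_distance for _, distance in candidates)
    return all_match and all_close
```

In the FIG. 5(a) example, the three candidates all read "3" but lie too far from the templates, so this check would fail on the distance condition and the character would go on to interpolation.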

[0030] Step S7 (image interpolation processing): The image interpolation processing unit 144 performs the image interpolation processing described below on the basis of the recognition candidate information obtained in step S5 (the recognition candidate character codes Co1 and the distance calculation results Di1) and the ruled line contact information If2 of this cut-out character image stored in the ruled line contact information storage unit 130 (for example, the contact direction of the character, the number of contact points, and the positions (coordinates) of the contact portions), and obtains the character image Ci2 (in the example of FIG. 5, the character image of FIG. 5(b)).

[0031] First, the recognition candidate character code Co1 is examined to determine whether the character it represents consists of a single block, that is, whether it is not a character made up of two or more blocks, such as a character composed of a left-hand radical (hen) and a right-hand component (tsukuri), or a character containing a shinnyu radical, a crown radical, or dots. If the character consists of a single block, the portions removed by the ruled line removal processing of step S1 (in the example of FIG. 5(a), between positions 51 and 52 and between positions 53 and 54) are interpolated and connected to obtain a character image Ci2 consisting of one block. If the character consists of two or more blocks, the portions removed by the ruled line removal processing of step S1 are interpolated and connected block by block to obtain a single character image Ci2 consisting of the interpolated blocks. (Each of the digits "0" to "9" forms a single block, so processing is simpler when the characters to be recognized are digits only.)
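For the single-block case, connecting the stored end positions can be sketched as drawing a straight run of black pixels between each pair of contact positions. This is a simplification under stated assumptions: the description does not say how the gap is filled, so the straight-line fill, the list-of-rows image format, and the function names `fill_gap` and `interpolate` are all illustrative choices.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel
Point = Tuple[int, int]  # (x, y)


def fill_gap(img: Image, p: Point, q: Point) -> Image:
    """Return a copy of img with black pixels along the straight segment p-q."""
    out = [row[:] for row in img]
    (x0, y0), (x1, y1) = p, q
    steps = max(abs(x1 - x0), abs(y1 - y0))
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        x = round(x0 + (x1 - x0) * t)
        y = round(y0 + (y1 - y0) * t)
        out[y][x] = 1
    return out


def interpolate(img: Image, contact_pairs: List[Tuple[Point, Point]]) -> Image:
    """Fill every removed contact portion, e.g. positions 51-52 and 53-54."""
    for p, q in contact_pairs:
        img = fill_gap(img, p, q)
    return img
```

Because the fill is driven by stored contact positions rather than by extrapolating the remaining strokes, it does not depend on the stroke direction that misleads the conventional method in the "2" versus "7" example.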

[0032] Step S8 (feature extraction from the interpolated character image): Next, the feature extraction unit 141 performs feature extraction on the character image Ci2 interpolated and generated in step S7 and calculates the feature quantity Fd2.

[0033] Step S9 (identification (character recognition) of the interpolated character image): The identification unit 143 calculates the distance between the feature quantity Fd2 of the character image Ci2 computed in step S8 and the standard feature quantity Fdd stored in each template of the dictionary unit 142, and obtains, as recognition candidate information, the recognition candidate character codes Co2 and the distance calculation results Di2 for the top three candidates in order of increasing distance (closest features first).

[0034] Step S10 (determination of whether to output a recognition result or a reject code): Next, the rejection determination unit 146 judges the character recognizable and outputs the recognition result when the recognition candidate character codes Co2 (that is, the recognition candidates ranked first to third) all match and the distance Di2 of each recognition candidate is at or below a predetermined value. Otherwise (when the recognition candidate codes Co2 do not all match, or a distance Di2 exceeds the predetermined value), it outputs a reject code. Processing then proceeds to S11. In the example of FIG. 5, if the recognition candidate character codes Co2 for the character image Ci2 obtained by the above method (that is, the recognition candidate character codes ranked first to third) all indicate "3", and the distance to the standard feature quantities Fdd held in the dictionary unit 142 is at or below the predetermined value, the rejection determination unit 146 judges that the character image (FIG. 5(b)) is highly reliable as the character "3" and outputs the character code corresponding to "3" as the recognition result.

[0035] Step S11 (determination of whether recognition processing is finished): The control unit checks whether the character recognition processing of S4 to S10 has been completed for all the character images cut out in step S2. If not, it returns to S4 and repeats the character recognition processing from S4 onward. With the above configuration, even when a character touches ruled lines, etc. and is split into several parts (blocks) by their removal, an interpolated character image such as that of FIG. 5(b) can be obtained. Character recognition performance when reading a non-dropout-color form or document and removing the ruled lines, etc. is therefore improved over the prior art.
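The per-character flow of steps S4 to S10 can be summarized in a short sketch. The four callables (`extract`, `classify`, `accept`, `interpolate`) are hypothetical placeholders for the feature extraction unit, identification unit, acceptance checks, and image interpolation processing unit; they are not part of the description. The sketch also assumes that a character accepted at S6 has its top candidate output as the result, and it passes only the contact positions to the interpolation step, whereas the description also consults the candidate character code.

```python
REJECT = None  # stands in for the reject code


def recognize_character(ci1, contact_pairs, extract, classify, accept, interpolate):
    """Recognize one cut-out character image, interpolating once if needed."""
    candidates = classify(extract(ci1))        # S4, S5
    if accept(candidates):                     # S6
        return candidates[0][0]
    ci2 = interpolate(ci1, contact_pairs)      # S7
    candidates2 = classify(extract(ci2))       # S8, S9
    if accept(candidates2):                    # S10
        return candidates2[0][0]
    return REJECT
```

Step S11 then amounts to calling this function once for each character image cut out in step S2.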

[0036] FIG. 6 is a block diagram showing the configuration of another embodiment of the character recognition device of the present invention. In this example the device is configured so that, when the reliability of character recognition for an interpolated character image is low, image interpolation is repeated to obtain a new character image. In FIG. 6, the character recognition device 100' comprises a ruled line removing unit 110, a character cutout unit 120, a ruled line contact information storage unit 130, and a character recognition block 140'. As with the character recognition device 100 of FIG. 1, although not shown, the character recognition device 100' also includes a control unit, consisting of a CPU and its peripheral circuits, that controls the operation of each of these components and of the character recognition device as a whole.

【００３７】ここで、罫線除去部１１０、文字切り出し
部１２０及び罫線接触情報格納部１３０の構成、機能及
び動作は図１の文字認識装置１００の場合と同様であ
る。Here, the configurations, functions, and operations of the ruled line removing unit 110, the character cutout unit 120, and the ruled line contact information storage unit 130 are the same as in the character recognition device 100 of FIG. 1.

【００３８】また、文字認識ブロック１４０は、特徴抽
出部１４１、辞書部１４２、識別部１４３、イメージ補
間処理部１４４’、補間イメージ判定部１４５及び棄却
判定部１４６を備え、特徴抽出やイメージ補間処理等を
行った後、文字認識を行う。The character recognition block 140' includes a feature extraction unit 141, a dictionary unit 142, an identification unit 143, an image interpolation processing unit 144', an interpolation image determination unit 145, and a rejection determination unit 146, and performs character recognition after feature extraction, image interpolation processing, and the like.

【００３９】ここで、文字認識ブロック１４０’で特徴
抽出部１４１、辞書部１４２、識別部１４３及び棄却判
定部１４６の構成、機能及び動作は図１の文字認識装置
１００の文字認識部１４０の場合と同様である。また、
イメージ補間処理部１４４’は、識別部１４３で識別不
可と判定された文字イメージ（つまり、文字切り出し部
１２０で切り出された文字イメージ）Ｃｉ１と、それに
対応する識別部１４３で得られた認識候補情報と、罫線
接触情報格納部１３０に格納されている罫線接触情報Ｉ
ｆ２（例えば、文字の接触方向、接触個所数、接触部分
の位置（座標））を基に罫線等の除去により失われた部
分のイメージ補間処理を行い、文字イメージＣｉ２を取
得する。また、イメージ補間処理部１４４’は、補間イ
メージ判定部１４５で再度イメージ補間を要すると判定
された場合に、再度イメージ補間を行い、新たな文字イ
メージＣｉ２を取得する。Here, in the character recognition block 140', the configurations, functions, and operations of the feature extraction unit 141, the dictionary unit 142, the identification unit 143, and the rejection determination unit 146 are the same as in the character recognition unit 140 of the character recognition device 100 of FIG. 1. The image interpolation processing unit 144' interpolates the portions lost through the removal of ruled lines and the like, and obtains a character image Ci2. It does so based on three inputs: the character image Ci1 that the identification unit 143 judged unidentifiable (that is, the character image cut out by the character cutout unit 120), the corresponding recognition candidate information obtained by the identification unit 143, and the ruled line contact information If2 stored in the ruled line contact information storage unit 130 (for example, the contact direction of the character, the number of contact points, and the positions (coordinates) of the contact portions). When the interpolation image determination unit 145 determines that image interpolation is needed again, the image interpolation processing unit 144' interpolates again and obtains a new character image Ci2.

【００４０】また、補間イメージ判定部１４５は、イメ
ージ補間処理部１４４でイメージ補間処理された文字イ
メージＣｉ２に対し、特徴抽出部１４１で特徴抽出さ
れ、識別部１４３で算出された認識候補情報（認識候補
文字コードＣｏ２及び距離計算結果Ｄｉ２）から認識対
象文字のイメージとしての信頼性を判定する。The interpolation image determination unit 145 judges how reliable the character image Ci2 produced by the image interpolation processing unit 144' is as an image of the character to be recognized. The judgment is based on the recognition candidate information (recognition candidate character codes Co2 and distance calculation results Di2) that the identification unit 143 calculates after the feature extraction unit 141 has extracted features from Ci2.

【００４１】図７は、図６の文字認識装置１００’によ
る文字認識動作例を示すフローチャートであり、補間イ
メージ判定ステップ（Ｓ９’）での補間イメージ判定が
否の場合に再度イメージ補間を行うイメージ補間処理ス
テップ（Ｓ９”）を設け、イメージ補間された文字イメ
ージに対する文字認識の信頼度が低い場合に再度イメー
ジの補間を繰り返して新たな文字イメージを取得可能と
した例である。また、各ステップの動作シーケンスの制
御は制御部によって行われる。また、図７でステップＳ
１〜Ｓ７（罫線等の除去〜イメージ補間処理）の動作は
図２の文字認識動作と同様である。FIG. 7 is a flowchart showing an example of the character recognition operation of the character recognition device 100' of FIG. 6. It adds an image interpolation step (S9") that interpolates again when the result of the interpolation image determination step (S9') is negative, so that when the reliability of character recognition for an interpolated character image is low, interpolation can be repeated to obtain a new character image. The control unit controls the operation sequence of the steps. In FIG. 7, the operations of steps S1 to S7 (from removal of ruled lines and the like through image interpolation processing) are the same as in the character recognition operation of FIG. 2.
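
The control flow of FIG. 7 can be sketched as a short Python loop. This is only an illustration under stated assumptions: the callables `extract_features`, `identify`, `is_reliable`, `interpolate`, and `reinterpolate`, and the retry bound `max_retries`, are hypothetical names that do not appear in the specification, which also gives no limit on how often S9" may repeat.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (character code, distance); assumed representation


def recognize_with_reinterpolation(
    image,
    contact_info,
    extract_features: Callable,  # S4 / S8
    identify: Callable,          # S5 / S9, returns a list of Candidate
    is_reliable: Callable,       # S6 / S9'
    interpolate: Callable,       # S7
    reinterpolate: Callable,     # S9"
    max_retries: int = 1,        # assumed bound, not from the specification
) -> List[Candidate]:
    """Return the candidates that would be handed to the rejection check (S10)."""
    candidates = identify(extract_features(image))                 # S4, S5
    if is_reliable(candidates):                                     # S6
        return candidates
    image2 = interpolate(image, candidates, contact_info)           # S7
    for attempt in range(max_retries + 1):
        candidates = identify(extract_features(image2))             # S8, S9
        if is_reliable(candidates) or attempt == max_retries:       # S9'
            break
        image2 = reinterpolate(image2, candidates, contact_info)    # S9"
    return candidates
```

Passing the processing stages in as callables keeps the sketch independent of any particular feature extractor or dictionary.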

【００４２】ステップＳ８：（イメージ補間後の文字イ
メージの特徴抽出）図７で、特徴抽出部１４１は上記ステップＳ７または後
述のステップＳ９”で補間された文字イメージＣｉ２に
対して特徴抽出を行い、特徴量Ｆｄ２を算出する。Step S8: (Feature extraction from the character image after image interpolation) In FIG. 7, the feature extraction unit 141 extracts features from the character image Ci2 interpolated in step S7 above or in step S9" described later, and calculates the feature amount Fd2.

【００４３】ステップＳ９：（イメージ補間後の文字イ
メージの識別（文字認識））また、認識部１４３は上記ステップＳ８で算出された文
字イメージＣｉ２の特徴量Ｆｄ２と辞書部１４２の各テ
ンプレートに格納されている標準的な特徴量Ｆｄｄとの
距離計算を行い、距離の小さい順（特徴の近い順）から
上位３位までの認識候補文字コードＣｏ２及び距離計算
結果Ｄｉ２を認識候補情報として取得する。Step S9: (Identification of the character image after image interpolation (character recognition)) The identification unit 143 calculates the distance between the feature amount Fd2 of the character image Ci2 calculated in step S8 and the standard feature amounts Fdd stored in each template of the dictionary unit 142. It then takes the top three candidates in ascending order of distance (closest features first) and obtains their recognition candidate character codes Co2 and distance calculation results Di2 as recognition candidate information.
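
The ranking in step S9 can be illustrated with a small Python sketch. It assumes that feature amounts are plain numeric vectors and that the distance is Euclidean; the specification does not state the distance measure, and the function name `top_candidates` and the template layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def top_candidates(
    fd2: Sequence[float],
    templates: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (code, distance) pairs whose template features are nearest to fd2.

    `templates` holds (character code, standard feature vector) pairs; a code
    may appear more than once if the dictionary has several templates for it.
    """
    scored = [(code, math.dist(fd2, fdd)) for code, fdd in templates]
    scored.sort(key=lambda item: item[1])  # smallest distance (closest features) first
    return scored[:k]
```

Allowing several templates per code is what lets all three top candidates indicate the same character, as in the worked examples of this embodiment.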

【００４４】ステップＳ９’：（補間イメージの可否判
定）補間イメージ判定部１４５は上記ステップＳ９で取得し
た文字イメージＣｉ２の認識候補情報（認識候補文字コ
ードＣｏ２及び距離計算結果Ｄｉ２）を基に後述するよ
うに補間イメージの信頼性を判定し、信頼性が低いと判
定した場合にはＳ９”に移行し、信頼性ありと判定した
場合はＳ１０に移行する。Step S9': (Determining whether the interpolated image is acceptable) The interpolation image determination unit 145 judges the reliability of the interpolated image, as described later, based on the recognition candidate information (recognition candidate character codes Co2 and distance calculation results Di2) of the character image Ci2 obtained in step S9. If the reliability is judged to be low, processing proceeds to S9"; if the image is judged reliable, processing proceeds to S10.

【００４５】ステップＳ９”：（再イメージ補間処理）イメージ補間処理部１４４’では、上記ステップＳ９で
得た認識候補情報（認識候補文字コードＣｏ２及び距離
計算結果Ｄｉ２）と、罫線接触情報格納部１３０に格納
されているこの切り出し文字イメージの罫線接触情報Ｉ
ｆ２を基に次に述べるようなイメージ補間処理を行い、
新たな文字イメージＣｉ２（図８の例では図８（ｂ）の
文字イメージを取得する。Step S9": (Image re-interpolation processing) The image interpolation processing unit 144' performs the image interpolation described next, based on the recognition candidate information (recognition candidate character codes Co2 and distance calculation results Di2) obtained in step S9 and on the ruled line contact information If2 of this cut-out character image stored in the ruled line contact information storage unit 130. It thereby obtains a new character image Ci2 (in the example of FIG. 8, the character image of FIG. 8(b)).

【００４６】まず、文字イメージＣｉ２と罫線接触情報
Ｉｆ２（例えば、文字の接触方向、接触個所数、接触部
分の位置（座標））を調べ、上記ステップＳ９のイメー
ジ補間で２組以上の位置の間で余分につながっている区
間（例えば、図８（ａ）の例で（位置８１、８２）と位
置（８３、８４）の間）のつながりを取り除いて補間部
分を補正する。First, the character image Ci2 and the ruled line contact information If2 (for example, the contact direction of the character, the number of contact points, and the positions (coordinates) of the contact portions) are examined. Any section that the preceding image interpolation joined superfluously between two or more sets of positions (for example, between positions (81, 82) and positions (83, 84) in the example of FIG. 8(a)) is then disconnected, which corrects the interpolated portion.
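
As a rough sketch of this correction, suppose each interpolated stroke is recorded as a pair of contact position labels and each contact portion is a set of labels such as (81, 82). The specification does not say how a superfluous section is detected; the rule below, which drops any segment joining two different sets, is an assumption, as are the names `Segment` and `remove_superfluous_segments`.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start label, end label) of one interpolated stroke


def remove_superfluous_segments(
    segments: List[Segment],
    contact_groups: List[Tuple[int, ...]],
) -> List[Segment]:
    """Keep only segments whose two endpoints belong to the same contact set.

    Assumes every endpoint is a recorded contact position; an unknown label
    raises KeyError.
    """
    group_of: Dict[int, int] = {}
    for index, group in enumerate(contact_groups):
        for position in group:
            group_of[position] = index
    return [(a, b) for a, b in segments if group_of[a] == group_of[b]]
```

With the contact sets (81, 82), (83, 84), and (85, 86), a segment recorded as (82, 83) spans two sets and is dropped, while segments within one set are kept.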

【００４７】以下、図６の文字認識装置１００’による
文字認識の具体的動作例について上記図７のフローチャ
ート（Ｓ１〜Ｓ７については図２のフローチャート）を
基に説明する。Hereinafter, a specific example of character recognition by the character recognition device 100' of FIG. 6 will be described with reference to the flowchart of FIG. 7 (for S1 to S7, the flowchart of FIG. 2).

【００４８】読み込みイメージＩｍ１を図３に示したイ
メージとし、罫線除去部１１０により罫線等を除去した
イメージＩｍ２を図４に示したイメージとする（Ｓ
１）。ここで、切り出された５文字分の切り出しイメー
ジＣｍ１のうち、記入文字の質が悪く真の記入文字自体
が２ブロックに分かれ、枠線３９（図３）の右側に接触
して更に分かれて４ブロックになってしまった切り出し
イメージ「５」（図８（ａ））（Ｓ２）の認識処理につ
いて説明する。Let the read image Im1 be the image shown in FIG. 3, and let the image Im2, from which the ruled line removing unit 110 has removed ruled lines and the like, be the image shown in FIG. 4 (S1). Of the cut-out images Cm1 for the five cut-out characters, the following describes recognition of the cut-out image "5" (FIG. 8(a)) (S2). In this image, poor writing quality split the written character itself into two blocks, and contact with the right side of the frame line 39 (FIG. 3) split it further, leaving four blocks.

【００４９】文字切り出し部１２０は図８（ａ）の切り
出しイメージについて枠線３９の右側と接触している部
分（８１，８２）、（８３，８４）と枠線３９の下側と
接触している部分（８５，８６）の位置（及び文字の接
触方向、接触個所数等）を罫線接触情報格納部１３０に
格納する（Ｓ３）。For the cut-out image of FIG. 8(a), the character cutout unit 120 stores in the ruled line contact information storage unit 130 the positions of the portions (81, 82) and (83, 84) that were in contact with the right side of the frame line 39 and of the portion (85, 86) that was in contact with its lower side, together with the contact direction of the character, the number of contact points, and so on (S3).
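
One way to picture the stored contact information is as a small record per contact portion, keyed by the cut-out character. The sketch below is hypothetical: the class names `RuledLineContact` and `ContactInfoStore`, their fields, and the use of a character index as the key are assumptions made for illustration, not the layout used by the storage unit 130.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RuledLineContact:
    side: str                   # contact direction, e.g. "right" or "bottom"
    positions: Tuple[int, int]  # pair of position labels or coordinates


@dataclass
class ContactInfoStore:
    records: Dict[int, List[RuledLineContact]] = field(default_factory=dict)

    def add(self, char_index: int, contact: RuledLineContact) -> None:
        """Record one contact portion for the given cut-out character."""
        self.records.setdefault(char_index, []).append(contact)

    def contact_count(self, char_index: int) -> int:
        """Number of contact portions recorded for the character."""
        return len(self.records.get(char_index, []))

    def contacts_on(self, char_index: int, side: str) -> List[RuledLineContact]:
        """Contact portions of the character on one side of the frame."""
        return [c for c in self.records.get(char_index, []) if c.side == side]
```

For the "5" of FIG. 8(a), such a store would hold two contacts on the right side and one on the bottom.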

【００５０】次に、特徴抽出部１４１で図８（ａ）の切
り出しイメージに対して特徴抽出を行って特徴量Ｆｄ１
を算出し（Ｓ４）、識別部１４３でこの特徴量Ｆｄ１と
辞書部１４２の各テンプレートに格納されている標準的
な特徴量Ｆｄｄとの距離計算を行い、距離の小さい順
（特徴の近い順）から上位３位までの認識候補文字コー
ドＣｏ１及び距離計算結果Ｄｉ１を認識候補情報として
取得する（Ｓ６）。ここで得られた第３位までの認識候
補文字コードＣｏ１は全てが「５」を示す文字コードで
あるが、辞書部１４２の持っている標準的な特徴量Ｆｄ
ｄとの距離が大きい（つまり、所定値以上）とすると、
このままでは信頼性のある認識結果を出力することがで
きないと判定されると、イメージ補間を要するものとし
てＳ７に移行する（Ｓ６）。Next, the feature extraction unit 141 extracts features from the cut-out image of FIG. 8(a) and calculates the feature amount Fd1 (S4). The identification unit 143 calculates the distance between Fd1 and the standard feature amounts Fdd stored in each template of the dictionary unit 142, and obtains the top three recognition candidate character codes Co1, in ascending order of distance (closest features first), together with the distance calculation results Di1 as recognition candidate information (S5). Suppose that all three recognition candidate character codes Co1 obtained here indicate "5", but that their distance from the standard feature amounts Fdd held by the dictionary unit 142 is large (that is, at or above a predetermined value). It is then judged that a reliable recognition result cannot be output as is, image interpolation is deemed necessary, and processing proceeds to S7 (S6).

【００５１】イメージ補間処理部１４４’では、上記ス
テップＳ５で得た認識候補情報（認識候補文字コードＣ
ｏ１及び距離計算結果Ｄｉ１）と、罫線接触情報格納部
１３０に格納されているこの切り出し文字イメージ（図
８（ａ））の罫線接触情報Ｉｆ２を基にイメージブロッ
クを一つとするため罫線除去で失われた区間をつなぐイ
メージ補間処理を行い、文字イメージＣｉ２（図８の例
では図８（ｂ）の文字イメージ）を取得する。すなわ
ち、イメージ補間処理部１４４で認識候補文字コードＣ
ｏ１を調べると、その文字コードで表される文字「５」
は数字であるから１ブロックからなる文字と判定し、ス
テップＳ１の罫線等の除去処理で除去された部分（図８
（ａ）の例では位置（８１，８２）、（８３，８４）、
（８５，８６）間）を補間してつなぎ、１ブロックから
なる文字イメージＣｉ２（図８（ｂ））を得る（Ｓ
７）。The image interpolation processing unit 144' performs image interpolation that connects the sections lost through ruled line removal so that the image forms a single block, and obtains the character image Ci2 (in the example of FIG. 8, the character image of FIG. 8(b)). The interpolation is based on the recognition candidate information (recognition candidate character codes Co1 and distance calculation results Di1) obtained in step S5 and on the ruled line contact information If2 of this cut-out character image (FIG. 8(a)) stored in the ruled line contact information storage unit 130. That is, the image interpolation processing unit 144' examines the recognition candidate character code Co1 and, because the character "5" that the code represents is a numeral, judges it to be a character consisting of one block. It then interpolates and connects the portions removed by the ruled line removal of step S1 (in the example of FIG. 8(a), between positions (81, 82), (83, 84), and (85, 86)), and obtains a character image Ci2 consisting of one block (FIG. 8(b)) (S7).
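
The idea of reconnecting removed sections until the character forms one block can be sketched on a tiny binary image. This is a simplified illustration, not the method of the specification: it assumes the removed ruled line was a single horizontal pixel row, that each contact is a column span on that row, and that blocks are 4-connected; the names `bridge_removed_row` and `count_blocks` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # 1 = stroke pixel, 0 = background


def bridge_removed_row(image: Image, line_row: int, spans: List[Tuple[int, int]]) -> Image:
    """Return a copy with pixels restored on the removed row for each contact span."""
    result = [row[:] for row in image]
    for start, end in spans:
        for col in range(start, end + 1):
            result[line_row][col] = 1
    return result


def count_blocks(image: Image) -> int:
    """Count 4-connected groups of stroke pixels using breadth-first search."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blocks = 0
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            blocks += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blocks
```

Counting blocks before and after bridging gives a simple check that the interpolation has produced the single block expected for a numeral.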

【００５２】次に、特徴抽出部１４１で上記Ｓ７で得た
補正後の文字イメージＣｉ２または後述のＳ９”で再補
間された文字イメージＣｉ２に対して特徴抽出を行い
（Ｓ８）、特徴量Ｆｄ２を算出し、識別部１４３でこの
特徴量Ｆｄ２と辞書部１４２の各テンプレートに格納さ
れている標準的な特徴量Ｆｄｄとの距離計算を行い、距
離の小さい順（特徴の近い順）から上位３位までの認識
候補文字コードＣｏ２及び距離計算結果Ｄｉ２を認識候
補情報として取得する（Ｓ９）。Next, the feature extraction unit 141 extracts features from the corrected character image Ci2 obtained in S7, or from the character image Ci2 re-interpolated in S9" described later, and calculates the feature amount Fd2 (S8). The identification unit 143 calculates the distance between Fd2 and the standard feature amounts Fdd stored in each template of the dictionary unit 142, and obtains the top three recognition candidate character codes Co2, in ascending order of distance (closest features first), together with the distance calculation results Di2 as recognition candidate information (S9).

【００５３】補間イメージ判定部１４５は補間後の文字
イメージＣｉ２に対する認識候補情報を基に補間イメー
ジの信頼性を判定する。ここで、図８（ｂ）の「５」の
文字イメージ（補正後の文字イメージＣｉ２）の認識候
補文字コードＣｏ２は第１位が「５」、第２位が
「８」、第３位が「９」を示す文字コードであり辞書部
１４２で持っている標準的な特長量Ｆｄｄとの距離も大
きとすると、補間イメージ判定部１４５はイメージとし
ての信頼性が低いと判定して再度イメージ判定を行うた
めにＳ９”に移行する（Ｓ９’）。The interpolation image determination unit 145 judges the reliability of the interpolated image based on the recognition candidate information for the interpolated character image Ci2. Suppose that, for the "5" character image of FIG. 8(b) (the corrected character image Ci2), the recognition candidate character codes Co2 indicate "5" in first place, "8" in second place, and "9" in third place, and that the distance from the standard feature amounts Fdd held by the dictionary unit 142 is also large. The interpolation image determination unit 145 then judges the reliability of the image to be low and proceeds to S9" so that the image can be interpolated again (S9').
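
The specification illustrates the S9' judgment only through examples, so the rule below is an assumption rather than the stated criterion: an image is treated as reliable when all top candidates agree on one character code and the best distance is below a threshold. The name `is_reliable`, the parameter `max_distance`, and the distance values in the usage lines are invented for illustration.

```python
from typing import List, Tuple


def is_reliable(candidates: List[Tuple[str, float]], max_distance: float) -> bool:
    """Judge an interpolated image from its ranked (code, distance) candidates.

    Assumed rule: every candidate names the same code and the nearest
    template lies closer than max_distance.
    """
    codes = {code for code, _ in candidates}
    return len(codes) == 1 and candidates[0][1] < max_distance


# Illustrative values only: mixed candidates far from the dictionary templates
# are rejected, matching the FIG. 8(b) example.
mixed = [("5", 7.5), ("8", 8.0), ("9", 8.4)]
agreed = [("5", 1.0), ("5", 1.2), ("5", 1.5)]
print(is_reliable(mixed, max_distance=3.0), is_reliable(agreed, max_distance=3.0))
```

A real device might weight agreement and distance differently; the sketch only shows how the two kinds of evidence in the examples could be combined.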

【００５４】イメージ補間処理部１４４’は、上記ステ
ップＳ９で得た認識候補情報（認識候補文字コードＣｏ
２及び距離計算結果Ｄｉ２）と、罫線接触情報格納部１
３０に格納されているこの切り出し文字イメージ（図８
（ａ））の罫線接触情報Ｉｆ２を基に余分な補間部分
（図８（ａ）の例では位置８２と８３の間の線分）を取
り除いて補間部分を補正するイメージ補間処理を行い、
新たな文字イメージＣｉ２（図８の例では図８（ｃ）の
文字イメージ）を取得しＳ８に戻る（Ｓ９”）。The image interpolation processing unit 144' performs image interpolation that removes the superfluous interpolated portion (in the example of FIG. 8(a), the line segment between positions 82 and 83) and thereby corrects the interpolated portion. The processing is based on the recognition candidate information (recognition candidate character codes Co2 and distance calculation results Di2) obtained in step S9 and on the ruled line contact information If2 of this cut-out character image (FIG. 8(a)) stored in the ruled line contact information storage unit 130. The unit obtains a new character image Ci2 (in the example of FIG. 8, the character image of FIG. 8(c)) and returns to S8 (S9").

【００５５】以下、ステップＳ８で特徴抽出部１４１は
上記Ｓ７で得た補正後の文字イメージＣｉ２または後述
のＳ９”で再補間された文字イメージＣｉ２に対して特
徴抽出を行なって特徴量Ｆｄ２を算出し、ステップＳ９
で識別部１４３はこの特徴量Ｆｄ２と辞書部１４２の各
テンプレートに格納されている標準的な特徴量Ｆｄｄと
の距離計算を行い、距離の小さい順から上位３位までの
認識候補文字コードＣｏ２及び距離計算結果Ｄｉ２を新
たな認識候補情報として取得する。Thereafter, in step S8, the feature extraction unit 141 extracts features from the corrected character image Ci2 obtained in S7, or from the character image Ci2 re-interpolated in S9", and calculates the feature amount Fd2. In step S9, the identification unit 143 calculates the distance between Fd2 and the standard feature amounts Fdd stored in each template of the dictionary unit 142, and obtains the top three recognition candidate character codes Co2, in ascending order of distance, together with the distance calculation results Di2 as new recognition candidate information.

【００５６】次に、ステップＳ９’で補正イメージ判定
部１４５は補正後の文字イメージＣｉ２に対する認識候
補情報を基に補間イメージの信頼性を判定する。ここ
で、図８（ｃ）の「５」の文字イメージ（再補正後の文
字イメージＣｉ２）の認識候補文字コードＣｏ１は第３
位までが「５」を示す文字コードであり、辞書１４２で
持っている標準的な特長量Ｆｄｄとの距離が小さいとす
ると、補正イメージ判定部１４５はイメージとしての信
頼性が高いと判定してＳ１０に移行する。Next, in step S9', the corrected image determination unit 145 judges the reliability of the interpolated image based on the recognition candidate information for the corrected character image Ci2. Suppose that, for the "5" character image of FIG. 8(c) (the re-corrected character image Ci2), the recognition candidate character codes Co1 through third place all indicate "5", and that the distance from the standard feature amounts Fdd held by the dictionary 142 is small. The corrected image determination unit 145 then judges the reliability of the image to be high and proceeds to S10.

【００５７】ステップＳ１０（図２）で、棄却判定部１
４６は上記Ｓ９で得られた文字イメージＣｉ２に対する
認識候補情報を基に認識結果を出力するか、リジェクト
コードを出力する（図８（ｃ）の例では認識結果の信頼
度が高いと判定され認識結果が出力される）。In step S10 (FIG. 2), the rejection determination unit 146 either outputs a recognition result based on the recognition candidate information for the character image Ci2 obtained in S9, or outputs a reject code (in the example of FIG. 8(c), the recognition result is judged to be highly reliable and is output).

【００５８】上記構成により、罫線除去処理によって図
８（ａ）に示したように文字が数ブロックに分かれてい
て１度のイメージ補間で分離している部分を補間しても
余分な線分により認識できないような場合にも、余分な
補間部分を除去する再イメージ補間を行うことにより信
頼性の高い認識結果を得ることができる。With the above configuration, a highly reliable recognition result can be obtained even when ruled line removal has split a character into several blocks, as shown in FIG. 8(a), and a single image interpolation of the separated portions leaves superfluous line segments that prevent recognition. In such a case, a second image interpolation removes the superfluous interpolated portions.

【００５９】次に、切り出された５文字分の切り出しイ
メージＣｍ１のうち、記入文字の一部のストロークの大
部分が枠線３９に重なり、罫線除去部１１０によって図
４に示すようにそのストロークを失ってしまった符号４
２で示される切り出しイメージ「２」（図９（ａ））の
認識処理について説明する。Next, among the cut-out images Cm1 for the five cut-out characters, recognition of the cut-out image "2" indicated by reference numeral 42 (FIG. 9(a)) will be described. Most of one stroke of this written character overlapped the frame line 39, and the ruled line removing unit 110 removed that stroke, as shown in FIG. 4.

【００６０】文字切り出し部１２０は図９（ａ）の切り
出しイメージについて枠線３９の下と接触している部分
（９１，９２）の位置（及び文字の接触方向、接触個所
数等）を罫線接触情報格納部１３０に格納する（Ｓ
３）。For the cut-out image of FIG. 9(a), the character cutout unit 120 stores in the ruled line contact information storage unit 130 the position of the portion (91, 92) that was in contact with the bottom of the frame line 39, together with the contact direction of the character, the number of contact points, and so on (S3).

【００６１】次に、特徴抽出部１４１で図９（ａ）の切
り出しイメージに対して特徴抽出を行って特徴量Ｆｄ１
を算出し（Ｓ４）、識別部１４３でこの特徴量Ｆｄ１と
辞書部１４２の各テンプレートに格納されている標準的
な特徴量Ｆｄｄとの距離計算を行い、距離の小さい順
（特徴の近い順）から上位３位までの認識候補文字コー
ドＣｏ１及び距離計算結果Ｄｉ１を認識候補情報として
取得する（Ｓ６）。ここで得られた第３位までの認識候
補文字コードＣｏ１は全てが「７」を示す文字コードで
あるが、辞書部１４２の持っている標準的な特徴量Ｆｄ
ｄとの距離が少し大きい（つまり、所定値以上）とする
と、この例の場合、「７」と認識するには標準的な特長
量ＦＤＤとの距離が少し大きく、下側の罫線と接触して
いるという情報（位置情報（９１，９２））から他の文
字（例えば「２」）の可能性が在るので「７」としての
信頼性は低いと判定され、イメージ補間を要するものと
してＳ７に移行する（Ｓ６）。Next, the feature extraction unit 141 extracts features from the cut-out image of FIG. 9(a) and calculates the feature amount Fd1 (S4). The identification unit 143 calculates the distance between Fd1 and the standard feature amounts Fdd stored in each template of the dictionary unit 142, and obtains the top three recognition candidate character codes Co1, in ascending order of distance (closest features first), together with the distance calculation results Di1 as recognition candidate information (S5). Suppose that all three recognition candidate character codes Co1 obtained here indicate "7", but that their distance from the standard feature amounts Fdd held by the dictionary unit 142 is somewhat large (that is, at or above a predetermined value). In this example, the distance from Fdd is somewhat too large for the character to be recognized as "7", and the information that the character touched the lower ruled line (position information (91, 92)) suggests that it may be another character (for example, "2"). The reliability of "7" is therefore judged to be low, image interpolation is deemed necessary, and processing proceeds to S7 (S6).

【００６２】イメージ補間処理部１４４’では、上記ス
テップＳ５で得た認識候補情報（認識候補文字コードＣ
ｏ１及び距離計算結果Ｄｉ１）と、罫線接触情報格納部
１３０に格納されているこの切り出し文字イメージ（図
９（ａ））の罫線接触情報Ｉｆ２を基にイメージブロッ
クが一つになるように罫線除去処理（Ｓ１）で失われた
区間をつなぐイメージ補間処理を行い、文字イメージＣ
ｉ２（図９の例では図９（ｂ）の文字イメージ）を取得
する。すなわち、イメージ補間処理部１４４’で認識候
補文字コードＣｏ１を調べると、その文字コードで表さ
れる文字「７」は数字であるから１ブロックからなる文
字と判定し、ステップＳ１の罫線等の除去処理で除去さ
れた部分（図９（ａ）の例では位置（９１，９２）を補
間してつなぎ、１ブロックからなる文字イメージＣｉ２
（図９（ｂ））を得る（Ｓ７）。The image interpolation processing unit 144' performs image interpolation that connects the sections lost in the ruled line removal processing (S1) so that the image forms a single block, and obtains the character image Ci2 (in the example of FIG. 9, the character image of FIG. 9(b)). The interpolation is based on the recognition candidate information (recognition candidate character codes Co1 and distance calculation results Di1) obtained in step S5 and on the ruled line contact information If2 of this cut-out character image (FIG. 9(a)) stored in the ruled line contact information storage unit 130. That is, the image interpolation processing unit 144' examines the recognition candidate character code Co1 and, because the character "7" that the code represents is a numeral, judges it to be a character consisting of one block. It then interpolates and connects the portion removed by the ruled line removal of step S1 (in the example of FIG. 9(a), positions (91, 92)), and obtains a character image Ci2 consisting of one block (FIG. 9(b)) (S7).

【００６３】次に、特徴抽出部１４１で上記Ｓ７で得た
補正後の文字イメージＣｉ２または後述のＳ９”で再補
間された文字イメージＣｉ２に対して特徴抽出を行って
特徴量Ｆｄ２を算出し（Ｓ８）、識別部１４３でこの特
徴量Ｆｄ２と辞書部１４２の各テンプレートに格納され
ている標準的な特徴量Ｆｄｄとの距離計算を行い、距離
の小さい順から上位３位までの認識候補文字コードＣｏ
２及び距離計算結果Ｄｉ２を認識候補情報として取得す
る（Ｓ９）。Next, the feature extraction unit 141 extracts features from the corrected character image Ci2 obtained in S7, or from the character image Ci2 re-interpolated in S9" described later, and calculates the feature amount Fd2 (S8). The identification unit 143 calculates the distance between Fd2 and the standard feature amounts Fdd stored in each template of the dictionary unit 142, and obtains the top three recognition candidate character codes Co2, in ascending order of distance, together with the distance calculation results Di2 as recognition candidate information (S9).

【００６４】補間イメージ判定部１４５は補間後の文字
イメージＣｉ２に対する認識候補情報を基に補間イメー
ジの信頼性を判定する。ここで、図９（ｂ）の「２」の
文字イメージ（補正後の文字イメージＣｉ２）の認識候
補文字コードＣｏ１は第３位まで全て「２」を示す文字
コードであり辞書部１４２で持っている標準的な特長量
Ｆｄｄとの距離も大きいとすると、補間イメージ判定部
１４５はイメージとしての信頼性が高いと判定してＳ１
０に移行し再イメージ補間（Ｓ９”）は行わない。The interpolation image determination unit 145 judges the reliability of the interpolated image based on the recognition candidate information for the interpolated character image Ci2. Suppose that, for the "2" character image of FIG. 9(b) (the corrected character image Ci2), the recognition candidate character codes Co1 through third place all indicate "2", and that the distance from the standard feature amounts Fdd held by the dictionary unit 142 is also small. The interpolation image determination unit 145 then judges the reliability of the image to be high and proceeds to S10 without performing image re-interpolation (S9").

【００６５】ステップＳ１０（図２）で、棄却判定部１
４６は上記Ｓ９で得られた文字イメージＣｉ２に対する
認識候補情報を基に認識結果を出力するか、リジェクト
コードを出力する（図９（ｃ）の例では認識結果の信頼
度が高いと判定され認識結果が出力される）。In step S10 (FIG. 2), the rejection determination unit 146 either outputs a recognition result based on the recognition candidate information for the character image Ci2 obtained in S9, or outputs a reject code (in the example of FIG. 9(c), the recognition result is judged to be highly reliable and is output).

【００６６】上記構成により、罫線除去処理によって図
９（ａ）に示したように文字が数ブロックに分かれてい
て１度のイメージ補間で分離している部分を補間しても
余分な線分により認識できないような場合にも、余分な
補間部分を除去する再イメージ補間を行うことにより信
頼性の高い認識結果を得ることができる。With the above configuration, a highly reliable recognition result can be obtained even when ruled line removal has split a character into several blocks, as shown in FIG. 9(a), and a single image interpolation of the separated portions leaves superfluous line segments that prevent recognition. In such a case, a second image interpolation removes the superfluous interpolated portions.

【００６７】上記構成により、罫線除去処理によって記
入文字の一部のストロークの大部分が枠線３９に重な
り、罫線除去部１１０によって図９（ａ）に示すように
そのストロークを失ってしまった場合でも、従来のよう
に誤ったイメージ補間（図１０（ｂ）を行うことなく失
われたストローク部分を再現することができる。With the above configuration, even when most of one stroke of a written character overlaps the frame line 39 and the ruled line removing unit 110 removes that stroke, as shown in FIG. 9(a), the lost stroke can be reproduced without the erroneous image interpolation of the prior art (FIG. 10(b)).

【００６８】以上、本発明のいくつかの実施例について
説明したが本発明はこれらの実施例に限定されるもので
はなく、種々の変形実施が可能であることはいうまでも
ない。Although several embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and it goes without saying that various modifications can be made.

【００６９】[0069]

【発明の効果】上記説明したように、第１の発明の文字
認識方法及び第３の発明の文字認識装置によれば、罫線
除去の際、罫線と接触していた部分の情報を保持してお
き、その情報を用いて文字イメージを補間するので、ス
クロール方向のいかんによらず文字イメージの補間がで
き、また、罫線に接触していた文字が幾つかの部分（ブ
ロック）に分離されても補間を行うことができるので非
ドロップアウトカラーの罫線等を１色刷りした帳票等を
用いても認識率の高い文字認識を実現できる。As described above, according to the character recognition method of the first invention and the character recognition device of the third invention, information on the portions that were in contact with ruled lines is retained when the ruled lines are removed, and the character image is interpolated using that information. The character image can therefore be interpolated regardless of the scroll direction, and interpolation is possible even when a character that touched a ruled line has been separated into several parts (blocks). As a result, character recognition with a high recognition rate can be achieved even with forms and the like on which non-dropout-color ruled lines are printed in a single color.

【００７０】また、第１の発明の文字認識方法及び第３
の発明の文字認識装置によれば、罫線除去により多くの
ブロックに分離され、１度のイメージ補間では認識度が
低いため文字に対してはイメージ補間を２度行い、最初
のイメージ補間の結果（補間後の文字イメージ）を罫線
除去の際、罫線と接触していた部分の情報で補正できる
ので、非ドロップアウトカラーの罫線等を１色刷りした
帳票等を用いても更に高い認識率の文字認識を実現でき
る。Further, according to the character recognition method of the first invention and the character recognition device of the third invention, image interpolation is performed twice for a character that ruled line removal has separated into many blocks and for which a single image interpolation gives a low degree of recognition. The result of the first interpolation (the interpolated character image) can be corrected using the information on the portions that were in contact with ruled lines when the ruled lines were removed. Character recognition with an even higher recognition rate can therefore be achieved even with forms and the like on which non-dropout-color ruled lines are printed in a single color.

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明の文字認識装置の一実施例の構成を示す
ブロック図である。FIG. 1 is a block diagram illustrating a configuration of an embodiment of a character recognition device of the present invention.

【図２】図１の文字認識装置による文字認識動作例を示
すフローチャートである。FIG. 2 is a flowchart illustrating an example of a character recognition operation performed by the character recognition device of FIG. 1.

【図３】読み込み文字イメージの例を示す図である。FIG. 3 is a diagram illustrating an example of a read character image.

【図４】罫線等の除去後の文字イメージの例を示す図で
ある。FIG. 4 is a diagram showing an example of a character image after removing ruled lines and the like.

【図５】罫線等の除去後の文字を例としたイメージ補間
及び認識結果を示す図である。FIG. 5 is a diagram showing image interpolation and recognition results, using a character after removal of ruled lines and the like as an example.

【図６】本発明の文字認識装置の一実施例の構成を示す
ブロック図である。FIG. 6 is a block diagram showing the configuration of an embodiment of the character recognition device of the present invention.

【図７】図６の文字認識装置による文字認識動作例を示
すフローチャートである。FIG. 7 is a flowchart illustrating an example of a character recognition operation performed by the character recognition device of FIG. 6.

【図８】罫線等の除去後の文字を例としたイメージ補間
及び認識結果を示す図である。FIG. 8 is a diagram showing image interpolation and recognition results, using a character after removal of ruled lines and the like as an example.

【図９】罫線等の除去後の文字を例としたイメージ補間
及び認識結果を示す図である。FIG. 9 is a diagram showing image interpolation and recognition results, using a character after removal of ruled lines and the like as an example.

【図１０】本発明の文字認識方法による認識結果と、従
来の文字認識方法による認識結果の比較説明図である。FIG. 10 is a diagram illustrating a comparison between a recognition result obtained by the character recognition method of the present invention and a recognition result obtained by the conventional character recognition method.

【符号の説明】[Explanation of symbols]

１００、１００’ 文字認識装置
１１０ 罫線除去部（罫線除去手段）
１２０ 文字切り出し部（切り出し手段、罫線接触情報取得手段）
１４１ 特長抽出部（文字認識手段）
１４２ 辞書部（文字認識手段）
１４３ 識別部（文字認識手段）
１４４ イメージ補間処理部（文字イメージ補間手段）
１４５ 補間イメージ判定部（補間イメージ判定手段）

100, 100' Character recognition device
110 Ruled line removing unit (ruled line removing means)
120 Character cutout unit (cutout means, ruled line contact information acquiring means)
141 Feature extraction unit (character recognition means)
142 Dictionary unit (character recognition means)
143 Identification unit (character recognition means)
144 Image interpolation processing unit (character image interpolating means)
145 Interpolation image determination unit (interpolation image determining means)

【手続補正書】[Procedure amendment]

【提出日】平成１２年１月２８日（２０００．１．２８）[Submission date] January 28, 2000 (2000.1.28)

【手続補正１】[Procedure amendment 1]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】特許請求の範囲[Correction target item name] Claims

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【特許請求の範囲】[Claims]

【手続補正２】[Procedure amendment 2]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００１０[Correction target item name] 0010

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００１０】また、第２の発明は上記第１の発明の文字
認識方法において、補間後の文字イメージが所定の条件
を満たさない場合に、該認識結果と接触情報を基に、原
稿の読み取りイメージから罫線等のイメージを取り除く
際に該文字イメージから取り除かれた接触部分をつなぐ
ように補間した補間後の文字イメージのうち、余分に生
成された部分をとり除いて補間した文字イメージを作成
し、再補間後の文字イメージの認識処理を行う、ことを
特徴とする。The second invention is the character recognition method of the first invention, characterized in that, when the interpolated character image does not satisfy a predetermined condition, a corrected character image is created based on the recognition result and the contact information, and recognition processing is performed on the re-interpolated character image. The corrected image is created by removing superfluously generated portions from the interpolated character image, that is, from the image interpolated so as to connect the contact portions that were removed from the character image when images of ruled lines and the like were removed from the read image of the document.

【手続補正３】[Procedure amendment 3]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００１１[Correction target item name] 0011

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００１１】また、第３の発明の文字認識装置は、読み
取った原稿の読み取りイメージから罫線等のイメージを
取り除く罫線除去手段と、この罫線除去手段によって罫
線等が取り除かれた文字イメージから１文字ずつ文字イ
メージを切り出す切り出し手段と、罫線除去手段によっ
て罫線等が取り除かれた文字イメージから罫線等が接触
していた部分の接触情報を取得する罫線接触情報取得手
段と、切り出し手段によって切り出された文字イメージ
の認識処理を行なうと共に該認識処理の結果を評価する
認識手段と、この文字認識手段による認識結果の評価が
所定の条件を満たさない場合に、該認識結果と前記接触
情報を基に、前記罫線除去手段による罫線除去の際に該
文字イメージから取り除かれた接触部分をつなぐように
補間する文字イメージ補間手段とを備えたことを特徴と
する。The character recognition device of the third invention comprises: ruled line removing means for removing images of ruled lines and the like from the read image of a scanned document; cutout means for cutting out character images one character at a time from the character image from which the ruled line removing means has removed the ruled lines and the like; ruled line contact information acquiring means for acquiring, from the character image from which the ruled line removing means has removed the ruled lines and the like, contact information on the portions where the ruled lines and the like were in contact; recognition means for performing recognition processing on the character image cut out by the cutout means and evaluating the result of that recognition processing; and character image interpolating means for interpolating, when the evaluation of the recognition result by the recognition means does not satisfy a predetermined condition, so as to connect the contact portions removed from the character image during ruled line removal by the ruled line removing means, based on the recognition result and the contact information.

【手続補正４】[Procedure amendment 4]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００１２[Correction target item name] 0012

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００１２】また、第４の発明は上記第３の発明の文字
認識装置において、文字イメージ補間手段による補間後
の文字イメージが所定の条件を満たしているか否かを判
定する補間イメージ判定手段を備え、文字イメージ補間
手段は、文字イメージ補間手段による補間後の文字イメ
ージが補間イメージ判定手段によって所定の条件を満た
さないと判定された場合に、該認識結果と接触情報を基
に、接触部分をつなぐように補間した文字イメージのう
ち、余分に生成された部分を除いて補間した文字イメー
ジを作成する手段を含むことを特徴とする。According to a fourth aspect of the invention, the character recognition device of the third aspect further comprises interpolated image determining means for determining whether the character image interpolated by the character image interpolating means satisfies a predetermined condition, and the character image interpolating means includes means for creating, when the interpolated image determining means determines that the interpolated character image does not satisfy the predetermined condition, a re-interpolated character image, based on the recognition result and the contact information, by removing the superfluous portions from the character image that was interpolated so as to connect the contact portions.
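The order of operations in the third and fourth aspects can be sketched as a small control-flow function in Python. The four callables (`recognize`, `is_reliable`, `interpolate`, `trim_excess`) are hypothetical stand-ins for the recognition means, the determination of the predetermined condition, the character image interpolating means and the removal of superfluous portions. The sketch shows only the sequence of steps, not how any of them is implemented.

```python
def recognize_with_fallback(image, contact, recognize, is_reliable,
                            interpolate, trim_excess):
    """Recognize a cut-out character, retrying on interpolated images.

    All callables are assumptions supplied by the caller:
      recognize(image) -> result
      is_reliable(result) -> bool
      interpolate(image, result, contact) -> image
      trim_excess(image, result, contact) -> image
    """
    # First pass on the image as cut out after ruled line removal.
    result = recognize(image)
    if is_reliable(result):
        return result

    # Second pass on the image with the removed contact portions reconnected.
    filled = interpolate(image, result, contact)
    result = recognize(filled)
    if is_reliable(result):
        return result

    # Third pass after removing superfluous portions created by interpolation.
    trimmed = trim_excess(filled, result, contact)
    return recognize(trimmed)
```

The function returns the last recognition result even if it is still unreliable, leaving the decision on whether to output it to the caller.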

【手続補正５】[Procedure amendment 5]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００５６[Correction target item name] 0056

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００５６】次に、ステップＳ９’で補正イメージ判定
部１４５は補正後の文字イメージＣｉ２に対する認識候
補情報を基に補間イメージの信頼性を判定する。ここ
で、図８（ｃ）の「５」の文字イメージ（再補正後の文
字イメージＣｉ２）の認識候補文字コードＣｏ１は第３
位までが「５」を示す文字コードであり、辞書１４２で
持っている標準的な特徴量Ｆｄｄとの距離が小さいとす
ると、補正イメージ判定部１４５はイメージとしての信
頼性が高いと判定してＳ１０に移行する。Next, in step S9', the corrected image determination unit 145 determines the reliability of the interpolated image based on the recognition candidate information for the corrected character image Ci2. Here, if the recognition candidate character codes Co1 for the character image of “5” in FIG. 8(c) (the re-corrected character image Ci2) all indicate “5” up to third place, and the distance to the standard feature amount Fdd held in the dictionary 142 is small, the corrected image determination unit 145 determines that the image is highly reliable and the process proceeds to S10.

【手続補正６】[Procedure amendment 6]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００６１[Correction target item name] 0061

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００６１】次に、特徴抽出部１４１で図９（ａ）の切
り出しイメージに対して特徴抽出を行って特徴量Ｆｄ１
を算出し（Ｓ４）、識別部１４３でこの特徴量Ｆｄ１と
辞書部１４２の各テンプレートに格納されている標準的
な特徴量Ｆｄｄとの距離計算を行い、距離の小さい順
（特徴の近い順）から上位３位までの認識候補文字コー
ドＣｏ１及び距離計算結果Ｄｉ１を認識候補情報として
取得する（Ｓ６）。ここで得られた第３位までの認識候
補文字コードＣｏ１は全てが「７」を示す文字コードで
あるが、辞書部１４２の持っている標準的な特徴量Ｆｄ
ｄとの距離が少し大きい（つまり、所定値以上）とする
と、この例の場合、「７」と認識するには標準的な特徴
量Ｆｄｄとの距離が少し大きく、下側の罫線と接触して
いるという情報（位置情報（９１，９２））から他の文
字（例えば「２」）の可能性が在るので「７」としての
信頼性は低いと判定され、イメージ補間を要するものと
してＳ７に移行する（Ｓ６）。Next, the feature extraction unit 141 performs feature extraction on the cut-out image of FIG. 9(a) to calculate a feature amount Fd1 (S4). The identification unit 143 then calculates the distance between this feature amount Fd1 and the standard feature amount Fdd stored in each template of the dictionary unit 142, and acquires, as recognition candidate information, the recognition candidate character codes Co1 and the distance calculation results Di1 for the top three candidates in ascending order of distance (that is, in order of feature similarity) (S6). All of the recognition candidate character codes Co1 up to third place obtained here indicate “7”, but suppose that the distance to the standard feature amount Fdd held by the dictionary unit 142 is somewhat large (that is, equal to or greater than a predetermined value). In this example, the distance to the standard feature amount Fdd is somewhat too large for the character to be recognized as “7”, and the information that the character was in contact with the lower ruled line (position information (91, 92)) leaves open the possibility of another character (for example, “2”). The reliability of “7” is therefore determined to be low, image interpolation is deemed necessary, and the process proceeds to S7 (S6).
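The candidate ranking and reliability check described in this paragraph can be sketched in Python as follows. Several points are assumptions made for illustration: the text does not name a distance measure, so Euclidean distance is used; the function names, the `max_distance` parameter and the representation of templates as `(character code, feature vector)` pairs are hypothetical; and the rule that interpolation is attempted only when the character touched a ruled line is a simplification.

```python
import math


def top_candidates(feature, templates, k=3):
    """Return the k nearest (character code, distance) pairs, closest first.

    `templates` is a list of (character code, standard feature vector) pairs;
    a character may have more than one template.
    """
    scored = [(code, math.dist(feature, ref)) for code, ref in templates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


def is_reliable(candidates, max_distance):
    """Reliable if all candidates agree and the best distance is below the limit."""
    codes = {code for code, _ in candidates}
    return len(codes) == 1 and candidates[0][1] < max_distance


def needs_interpolation(candidates, touches_ruled_line, max_distance):
    """Assumed rule: interpolate only unreliable characters that touched a line."""
    return touches_ruled_line and not is_reliable(candidates, max_distance)
```

With a rule of this shape, a character whose top three candidates all read “7” is still sent to interpolation when its best distance is at or above `max_distance` and it touched a ruled line, which mirrors the “7” versus “2” example above.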

【手続補正７】[Procedure amendment 7]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】符号の説明[Correction target item name] Description of reference numerals

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【符号の説明】
１００、１００’ 文字認識装置
１１０ 罫線除去部（罫線除去手段）
１２０ 文字切り出し部（切り出し手段、罫線接触情報取得手段）
１４１ 特徴抽出部（文字認識手段）
１４２ 辞書部（文字認識手段）
１４３ 識別部（文字認識手段）
１４４ イメージ補間処理部（文字イメージ補間手段）
１４５ 補間イメージ判定部（補間イメージ判定手段）
[Description of Reference Numerals]
100, 100': character recognition device
110: ruled line removal unit (ruled line removing means)
120: character cutout unit (cutout means, ruled line contact information acquiring means)
141: feature extraction unit (character recognition means)
142: dictionary unit (character recognition means)
143: identification unit (character recognition means)
144: image interpolation processing unit (character image interpolating means)
145: interpolated image determination unit (interpolated image determining means)

Claims

【特許請求の範囲】[Claims]

【請求項１】読み取った原稿の読み取りイメージから
罫線等のイメージを取り除いて１文字ずつ文字イメージ
を切り出して文字認識を行う文字認識方法であって、前記文字イメージの切り出しの際に切り出された文字イ
メージと罫線等との接触情報を取得し、前記切り出し文字の認識処理を行い、その認識結果が所定の条件を満たさない場合に、該認識
結果と前記接触情報を基に、原稿の読み取りイメージか
ら罫線等のイメージを取り除く際に該文字イメージから
取り除かれた接触部分をつなぐように補間した文字イメ
ージを作成し、上記補間後の文字イメージの認識処理を行う、ことを特
徴とする文字認識方法。1. A character recognition method in which images of ruled lines or the like are removed from a read image of a read original and character images are cut out one character at a time for character recognition, the method comprising: acquiring, when the character images are cut out, contact information between each cut-out character image and the ruled lines or the like; performing recognition processing on the cut-out character; creating, when the recognition result does not satisfy a predetermined condition, based on the recognition result and the contact information, a character image interpolated so as to connect the contact portions that were removed from the character image when the images of ruled lines or the like were removed from the read image of the original; and performing recognition processing on the interpolated character image.

【請求項２】前記補正後の文字イメージが所定の条件
を満たさない場合に、該認識結果と前記接触情報を基
に、原稿の読み取りイメージから罫線等のイメージを取
り除く際に該文字イメージから取り除かれた接触部分を
つなぐように補間した補間後の文字イメージのうち、余
分に生成された部分をとり除いて補正した文字イメージ
を作成し、上記再補間後の文字イメージの認識処理を行う、ことを
特徴とする請求項１記載の文字認識方法。2. The character recognition method according to claim 1, wherein, when the corrected character image does not satisfy a predetermined condition, a corrected character image is created, based on the recognition result and the contact information, by removing the superfluous portions from the interpolated character image, that is, the image interpolated so as to connect the contact portions removed from the character image when the images of ruled lines or the like were removed from the read image of the original, and recognition processing is performed on the re-interpolated character image.

【請求項３】読み取った原稿の読み取りイメージから
罫線等のイメージを取り除く罫線除去手段と、この罫線除去手段によって罫線等が取り除かれた文字イ
メージから１文字ずつ文字イメージを切り出す切り出し
手段と、前記罫線除去手段によって罫線等が取り除かれた文字イ
メージから罫線等が接触していた部分の接触情報を取得
する罫線接触情報取得手段と、前記切り出し手段によって切り出された文字イメージの
認識処理を行なうと共に認識手段による認識結果を評価
する文字認識手段と、この文字認識手段による認識結果の評価が所定の条件を
満たさない場合に、該認識結果と前記接触情報を基に、
前記罫線除去手段による罫線除去の際に該文字イメージ
から取り除かれた接触部分をつなぐように補間する文字
イメージ補間手段とを備えたことを特徴とする文字認識
装置。3. A character recognition device comprising: ruled line removing means for removing images of ruled lines or the like from a read image of a read original; cutout means for cutting out character images one character at a time from the character image from which the ruled lines or the like have been removed by the ruled line removing means; ruled line contact information acquiring means for acquiring contact information on the portions where the ruled lines or the like were in contact, from the character image from which the ruled lines or the like have been removed by the ruled line removing means; character recognition means for performing recognition processing on the character image cut out by the cutout means and evaluating the recognition result; and character image interpolating means for interpolating the character image, when the evaluation of the recognition result by the character recognition means does not satisfy a predetermined condition, based on the recognition result and the contact information, so as to connect the contact portions that were removed from the character image when the ruled line removing means removed the ruled lines.

【請求項４】前記文字イメージ補間手段による補間後
の文字イメージが所定の条件を満たしているか否かを判
定する補間イメージ判定手段を備え、前記文字イメージ補正手段は、該文字イメージ補間手段
による補間後の文字イメージが前記補間イメージ判定手
段によって所定の条件を満たさないと判定された場合
に、該認識結果と前記接触情報を基に、接触部分をつな
ぐように補間した文字イメージのうち、余分に生成され
た部分を除いて補正した文字イメージを作成する手段を
含むことを特徴とする請求項３記載の文字認識装置。4. The character recognition device according to claim 3, further comprising interpolated image determining means for determining whether the character image interpolated by the character image interpolating means satisfies a predetermined condition, wherein the character image correcting means includes means for creating, when the interpolated image determining means determines that the character image interpolated by the character image interpolating means does not satisfy the predetermined condition, a corrected character image, based on the recognition result and the contact information, by removing the superfluous portions from the character image interpolated so as to connect the contact portions.

【請求項５】前記接触情報は、罫線等と文字の接触方
向、罫線等と文字の接触又は重複個所数、罫線等と文字
の接触部分又は重複部分の端部または両端の位置の全部
またはそれらの組み合わせであることを特徴とする請求
項３又は４記載の文字認識装置。5. The character recognition device according to claim 3 or 4, wherein the contact information is all of, or a combination of, the following: the direction of contact between the ruled line or the like and the character; the number of places where the ruled line or the like touches or overlaps the character; and the position of one end or both ends of the portion where the ruled line or the like touches or overlaps the character.