JP3239965B2 - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JP3239965B2
Authority
JP
Japan
Prior art keywords
character
character frame
black
pixels
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP01850593A
Other languages
Japanese (ja)
Other versions
JPH06231304A (en)
Inventor
良則 桑村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1993-02-05
Filing date
1993-02-05
Publication date
2001-12-17
Application filed by Fujitsu Ltd
Priority to JP01850593A
Publication of JPH06231304A
Application granted
Publication of JP3239965B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Input (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device, and in particular to a character recognition device capable of recognizing handwritten characters entered on a form whose character frames are printed in a non-dropout color.

[0002] In recent years, with the spread of information processing apparatuses such as computers, character recognition technology that reads handwritten or printed characters on paper or similar media has been in demand as a means of input to these apparatuses. Such a character recognition device optically reads and recognizes handwritten or printed characters entered at specific positions on a form designed in advance.

[0003]

[Prior Art] In general, an OCR form for character recognition requires character frames to specify where characters are to be entered. In the recognition process, however, the character frame is superfluous and interferes with recognition, so it must be removed before recognition is performed.

[0004] Conventionally, to prevent the character frame from making recognition impossible, the frame has been printed in a special ink such as light red or light blue. This approach exploits the fact that a character frame printed in a dropout color is not detected by the reading device, so that only the characters written in pencil or ballpoint pen are captured during scanning.

[0005]

[Problems to Be Solved by the Invention] However, printing character frames in a dropout color requires special ink and is therefore very costly. In addition, dropout colors are generally hard for people to see, which makes the form difficult to fill in.

[0006] The present invention has been made to overcome these drawbacks, and its object is to provide a character recognition device that can accurately recognize characters written within a character frame printed in a non-dropout color.

[0007]

[Means for Solving the Problems] FIG. 1 shows the principle of the present invention. The character recognition device comprises: an image input unit 8 that reads a form image; a contour tracing unit 4 that, from every black pixel in contact with the inner periphery of the character frame 2 detected by a character frame detection unit 3, traces adjacent black pixels toward the inside of the character frame 2 with that black pixel as the starting point, and deletes black pixels that do not continue for a predetermined length, thereby obtaining contour-forming pixels; a contour direction line calculation unit 5 that calculates the direction of the contour-forming pixels obtained by the contour tracing unit 4; and a character portion extraction unit 7 that extends the contour direction lines toward the character frame 2, takes the extensions within the character frame 2 as black-white boundary lines, converts the portions of the character frame 2 outside the regions enclosed by the black-white boundary lines into white pixels, and thereby extracts the character 6 portion from the portion that interferes with the character frame 2.

[0008]

[Operation] The read image 1 captured by the image input unit 8 contains the character frame 2 printed on the form in advance and the handwritten or printed character 6 to be recognized. The position of the character frame 2, which is determined in advance when the form is designed, is detected by the character frame detection unit 3. The character frame 2 can be detected easily by known means such as the marginal distribution.

[0009] The contour tracing unit 4 extracts the black pixels in contact with the inner periphery of the character frame 2 detected by the character frame detection unit 3, and traces the contour toward the inside of the frame with each such black pixel as the starting point. The number of pixels to be traced by the contour tracing unit 4 is set in advance, for example to 4 or 5 pixels, and black pixels whose run falls short of this threshold are regarded as part of the character frame 2 and converted to white pixels.

[0010] The contour-forming pixels traced by the contour tracing unit 4 are a set of black pixels that touch the character frame 2 and have continuity of the predetermined length. They are presumed to interfere with the character frame 2 and to form part of the character 6.

[0011] The contour direction line calculation unit 5 calculates the direction of the contour line from the contour-forming pixels traced by the contour tracing unit 4. The character portion extraction unit 7 extends the contour direction lines thus obtained toward the character frame 2 to obtain black-white boundary lines within the character frame 2. A black-white boundary line, being the extension of a contour direction line into the character frame 2, is regarded as an extension of the contour of the character 6. Converting the black pixels outside the regions enclosed by the black-white boundary lines into white pixels therefore extracts the components of the character 6 that interfere with the character frame 2.

[0012]

[Embodiments] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings. First, as shown in FIG. 1, the form has a character frame 2 printed in a non-dropout color, and the character 6 entered in the character frame 2 is captured through the image input unit 8.

[0013] The image input unit 8 converts the form into image data, refers to form data (not shown) storing information such as character entry positions, and inputs the image of the corresponding portion. The read image 1 captured by the image input unit 8 contains the character frame 2 component together with the character 6.

[0014] FIG. 2 shows an example of the read image 1. In the illustrated example, an "8" is handwritten in the character frame 2 and the character 6 touches the character frame 2. As shown in FIG. 6, this is a case in which simply deleting the character frame 2, or trying to recognize only the inside of the character frame 2, leads to misrecognition or failure to recognize. In FIGS. 2 and 6, which illustrate the read image 1, black pixels are shown as black circles and white pixels as white circles.

[0015] The recognition of the character 6 is described below, taking the read image 1 above as an example. As shown in FIG. 3, prior to character recognition, the character frame detection unit 3 identifies the position of the character frame 2.

[0016] The well-known marginal distribution method can be used to identify the frame position. FIG. 4 shows in detail the procedure for identifying the character frame 2 from the marginal distribution. First, the black pixels of the read image 1 are counted in the vertical and horizontal directions to obtain their frequency distributions. Next, the peaks of the resulting histograms are found, and locations that are close to the edges of the read image and have a reasonably high distribution are identified as frame positions. What matters here is the inner side of the character frame 2. These positions are indicated by a1 to a4 in FIG. 2, and the inner periphery of the character frame 2 derived from them is indicated by b.
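The marginal distribution step described in paragraph [0016] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the `min_ratio` peak threshold, and the `edge_frac` definition of "close to the edge" are all assumptions introduced here, and the image is assumed to be a binary array with 1 for black and 0 for white.

```python
import numpy as np

def _inner_pair(profile, length, min_ratio, edge_frac):
    """Find frame lines near both ends of a projection profile.

    Returns the first and last index lying just inside the frame lines.
    min_ratio and edge_frac are hypothetical tuning parameters.
    """
    n = profile.size
    # Indices whose black-pixel count is high enough to be a frame line.
    idx = np.flatnonzero(profile >= min_ratio * length)
    # Only peaks close to the image edges are treated as frame lines.
    near = idx[idx < n * edge_frac]
    far = idx[idx >= n * (1 - edge_frac)]
    if near.size == 0 or far.size == 0:
        raise ValueError("no frame line found near both edges")
    return int(near.max()) + 1, int(far.min()) - 1

def find_frame_inner_edges(img, min_ratio=0.8, edge_frac=0.25):
    """Return (top, bottom, left, right) of the area inside the frame."""
    img = np.asarray(img)
    h, w = img.shape
    # Horizontal projection: black pixels per row; vertical: per column.
    top, bottom = _inner_pair(img.sum(axis=1), w, min_ratio, edge_frac)
    left, right = _inner_pair(img.sum(axis=0), h, min_ratio, edge_frac)
    return top, bottom, left, right
```

Restricting candidates to the edge region keeps a long vertical or horizontal stroke of the character itself from being mistaken for a frame line.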

[0017] Control then passes to the contour tracing unit 4, which traces the character contour. That is, as shown in FIG. 4, the points just inside the character frame 2 obtained by the above procedure are examined in turn, and whenever a black pixel is found, the contour of black pixels is traced inward from it.

[0018] Tracing of black pixels continues until the number of pixels reaches a threshold (5 pixels in this example). A black pixel such as the one labeled c in FIG. 2, for example, has no further black pixel to trace to. Tracing ends there, and the pixel is regarded as part of the character frame 2 and deleted.

[0019] Taking the black pixel d1 as an example, the contour trace proceeds through d2, d3, d4, and d5. This is easily done by examining the pixels adjacent to the black pixel of interest. Although the threshold is set to 5 pixels in this example, the process is effective even with about 3 pixels, and the threshold can be varied according to the size of the character frame 2 concerned.
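The tracing and pruning in paragraphs [0017] to [0019] might look something like the sketch below. It simplifies the patent's contour tracing to a breadth-first walk over 8-connected black pixels, capped at the threshold. The function names and the `bounds` tuple (top, bottom, left, right of the area inside the frame) are assumptions made for illustration.

```python
from collections import deque

def trace_inward(img, start, bounds, threshold=5):
    """Collect up to `threshold` connected black pixels from `start`.

    Only pixels inside `bounds` are visited, so the frame line itself
    is never followed.
    """
    top, bottom, left, right = bounds
    seen = {start}
    queue = deque([start])
    traced = []
    while queue and len(traced) < threshold:
        r, c = queue.popleft()
        traced.append((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (nr, nc) in seen:
                    continue
                inside = top <= nr <= bottom and left <= nc <= right
                if inside and img[nr][nc] == 1:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return traced

def prune_short_runs(img, bounds, threshold=5):
    """Whiten runs shorter than `threshold`; return (image copy, kept runs)."""
    top, bottom, left, right = bounds
    out = [list(row) for row in img]
    kept = []
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            on_ring = r in (top, bottom) or c in (left, right)
            if not (on_ring and img[r][c] == 1):
                continue
            run = trace_inward(img, (r, c), bounds, threshold)
            if len(run) < threshold:
                # Too short to be a stroke: treat as frame and erase.
                for pr, pc in run:
                    out[pr][pc] = 0
            else:
                kept.append(run)
    return out, kept
```

Runs that reach the threshold correspond to the contour-forming pixels (d1 to d5 in the text), while an isolated pixel such as c is erased.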

[0020] The direction line calculation unit 5 obtains contour direction lines from the contour-forming pixels found by the contour tracing unit 4. Here this is done by straight-line approximation, which is possible with known techniques such as the least-squares method. The contour direction lines obtained in this way are indicated by e1 to e10.
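As one way to picture the straight-line approximation in paragraph [0020], the sketch below fits a line to the traced pixels. Note the substitution: it uses a total least-squares fit via SVD rather than an ordinary `y = ax + b` regression, because strokes that meet the frame are often vertical in image coordinates. The function name and the (row, col) point convention are assumptions.

```python
import numpy as np

def contour_direction(points):
    """Fit a line to traced pixels given as (row, col) pairs.

    Returns (centroid, unit direction). The direction is oriented from
    the last traced pixel back toward the first one, i.e. toward the
    frame, which is the way the line is later extended.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The first right singular vector is the direction of largest spread.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    if np.dot(direction, pts[0] - pts[-1]) < 0:
        direction = -direction
    return centroid, direction
```

For a vertical run such as `[(1, 4), (2, 4), (3, 4), (4, 4), (5, 4)]`, the fitted direction points straight up toward the top frame line.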

[0021] The character portion extraction unit 7 uses the contour direction lines e1 to e10 to separate the character-forming pixels that interfere with the character frame 2. That is, as shown in FIG. 3, the contour direction lines e1 to e10 are first extended toward the outside of the character frame 2, and the pixel positions lying on the contour direction lines are taken as the separation boundary between the character 6 and the character frame 2.

[0022] In the example of FIG. 2, at point d1 the pixel above is white and the pixel below is black, so the boundary is likewise one with white above and black below. This process is carried out in turn for each black pixel that touches the character frame 2 and lies at a transition between white and black (or black and white).

[0023] Erasing the pixels judged to be white outside the boundaries, based on the black-white boundary lines obtained above, can be done by labeling or similar means. The image after this processing is as shown in FIG. 5. The character 6 has no missing parts and no extraneous parts, and can be recognized as "8" by conventional recognition techniques.
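The boundary extension and erasure in paragraphs [0021] to [0023] can be illustrated with a heavily simplified sketch. It handles only the top band of the frame, assumes the boundary lines have already been paired into the left and right edge of each stroke, and does not use the labeling mentioned in the text. All names here are hypothetical.

```python
def column_at_row(point, direction, row):
    """Column where the line through `point` along `direction` crosses `row`.

    `point` and `direction` are (row, col) pairs.
    """
    r0, c0 = point
    dr, dc = direction
    if dr == 0:
        raise ValueError("line is parallel to the frame band")
    return c0 + (row - r0) * dc / dr

def clear_top_band(img, band_rows, boundary_pairs):
    """Whiten frame pixels in `band_rows` except between boundary pairs.

    Each element of `boundary_pairs` is (left_line, right_line), where a
    line is a (point, direction) tuple for one edge of a stroke.
    """
    out = [list(row) for row in img]
    for row in band_rows:
        keep = set()
        for left_line, right_line in boundary_pairs:
            lo = column_at_row(*left_line, row)
            hi = column_at_row(*right_line, row)
            lo, hi = min(lo, hi), max(lo, hi)
            # Keep pixels whose centre lies between the two extended lines.
            for c in range(len(out[row])):
                if lo - 0.5 <= c <= hi + 0.5:
                    keep.add(c)
        for c in range(len(out[row])):
            if c not in keep:
                out[row][c] = 0
    return out
```

The pixels left black inside the band are the part of the frame that the stroke overlaps, so the stroke stays connected after the frame is removed.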

[0024] Thereafter, the image of the character 6 is pattern-matched against the recognition dictionary 10 in the recognition processing unit 9, and the character 6 is recognized.
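The text does not say how the pattern matching in paragraph [0024] is performed. Purely as an illustration, the sketch below assumes the recognition dictionary is a mapping from labels to binary templates of the same size as the extracted character image, and picks the template with the fewest differing pixels. The function name and dictionary layout are assumptions.

```python
import numpy as np

def recognize(char_img, dictionary):
    """Return the label of the template that differs in the fewest pixels.

    Templates whose shape does not match the input are skipped.
    Returns None if no template is comparable.
    """
    img = np.asarray(char_img)
    best_label, best_cost = None, None
    for label, template in dictionary.items():
        tpl = np.asarray(template)
        if tpl.shape != img.shape:
            continue
        # Count pixels where the image and the template disagree.
        cost = int(np.count_nonzero(tpl != img))
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label
```

A practical recognizer would normalize size and position before matching. This sketch assumes that has already been done.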

[0025]

[Effects of the Invention] As is clear from the above description, according to the present invention, on a form whose character frames are printed in a non-dropout color, the character alone can be cut out accurately even when the character touches or crosses the character frame.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the principle of the present invention.

FIG. 2 is an explanatory diagram showing an embodiment of the present invention.

FIG. 3 is a flowchart showing the operation of the present invention.

FIG. 4 is a flowchart showing the operation of the character frame detection unit and the contour tracing unit.

FIG. 5 is a diagram showing the cut-out state of a character.

FIG. 6 is a diagram showing the cut-out state of a character.

[Explanation of Reference Numerals]

1 Read image
2 Character frame
3 Character frame detection unit
4 Contour tracing unit
5 Direction line calculation unit
6 Character
7 Character portion extraction unit

Claims (2)

(57) [Claims]

1. A character recognition device comprising: a character frame detection unit that detects a character frame from a read image; a contour tracing unit that, starting from black pixels in contact with the inner periphery of the detected character frame, traces adjacent black pixels toward the inside of the character frame and deletes black pixels that do not continue for a predetermined length, thereby obtaining contour-forming pixels; a direction line calculation unit that obtains a contour direction line from the direction in which the contour-forming pixels continue; and a character portion extraction unit that extends the contour direction line toward the character frame, takes the extension within the character frame as a black-white boundary line, converts the character frame outside the region enclosed by the black-white boundary lines into white pixels, and extracts the character portion from the portion interfering with the character frame.

2. The character recognition device according to claim 1, wherein the character frame detection unit detects the character frame from the marginal distribution of pixel counts.
JP01850593A 1993-02-05 1993-02-05 Character recognition device Expired - Fee Related JP3239965B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP01850593A JP3239965B2 (en) 1993-02-05 1993-02-05 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP01850593A JP3239965B2 (en) 1993-02-05 1993-02-05 Character recognition device

Publications (2)

Publication Number Publication Date
JPH06231304A JPH06231304A (en) 1994-08-19
JP3239965B2 true JP3239965B2 (en) 2001-12-17

Family

ID=11973486

Family Applications (1)

Application Number Title Priority Date Filing Date
JP01850593A Expired - Fee Related JP3239965B2 (en) 1993-02-05 1993-02-05 Character recognition device

Country Status (1)

Country Link
JP (1) JP3239965B2 (en)

Also Published As

Publication number Publication date
JPH06231304A (en) 1994-08-19

Similar Documents

Publication Publication Date Title
US6600834B1 (en) Handwriting information processing system with character segmentation user interface
Maddouri et al. Text lines and PAWs segmentation of handwritten Arabic document by two hybrid methods
Shafait et al. Layout analysis of Urdu document images
CN111209865A (en) File content extraction method and device, electronic equipment and storage medium
Bukhari et al. Layout analysis of Arabic script documents
JPH07105312A (en) Method and device for eliminating dirt from character image in optical character reader
JP3239965B2 (en) Character recognition device
US7103220B2 (en) Image processing apparatus, method and program, and storage medium
JP2000082110A (en) Ruled line deletion device, character picture extraction device, ruled line deletion method, character picture extraction method and storage medium
JP3954247B2 (en) Document input method, recording medium recording document input program, and document input device
JP3794285B2 (en) Optical character reader
Airphaiboon et al. Recognition of handprinted Thai characters using loop structures
JP3163698B2 (en) Character recognition method
JP2580976B2 (en) Character extraction device
JP2909132B2 (en) Optical character reader
JP2877380B2 (en) Optical character reader
EP0701226B1 (en) Method for segmenting handwritten lines of text in a scanned image
JPH10162104A (en) Character recognition device
JP3190794B2 (en) Character segmentation device
JP2925270B2 (en) Character reader
Shima et al. A form dropout method based on line-elimination and image-subtraction
JP3356819B2 (en) Mark recognition device
JPH06231305A (en) Character recognition method and slip used for the method
JPS6031682A (en) Method and apparatus for region extraction of printed document picture
JPH10293845A (en) Broken-line recognition method

Legal Events

Date Code Title Description
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20010925

LAPS Cancellation because of no payment of annual fees