JPS62281096A

JPS62281096A - Character recognition device

Info

Publication number: JPS62281096A
Application number: JP61123725A
Authority: JP
Inventors: Akihiko Uekusa; 植草　明彦; Toshiaki Yagasaki; 矢ケ崎　敏明; Shinko Ishitani; 石谷　新子; Yumie Gou; 郷　由美恵
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1986-05-30
Filing date: 1986-05-30
Publication date: 1987-12-05
Anticipated expiration: 2012-02-05
Also published as: JP2578767B2

Abstract

PURPOSE: To greatly shorten processing time by reducing the quantity of feature information compared in the feature matching process. CONSTITUTION: A memory means 2 stores in advance predetermined standard feature information for character recognition. A stroke length extraction means 572 counts the black picture elements horizontally and vertically over a photoelectrically converted and read character pattern, thereby extracting horizontal and vertical stroke length information of the character pattern. A stroke number extraction means 573 extracts, for example on a predetermined horizontal line set on the character pattern, the number of strokes of the character pattern that cross that line. A stroke position extraction means 574 extracts, for example on a predetermined horizontal line set on the character pattern, the position of the stroke of the character pattern that first crosses that line. A recognition means 50 compares the feature information extracted by the extraction means 572-574 with the standard feature information stored by the memory means 2 to recognize the character.

Description

3. Detailed Description of the Invention

[Field of Industrial Application] The present invention relates to a character recognition device, and in particular to a character recognition device that automatically reads and recognizes handwritten or printed alphanumeric characters and the like.

[Prior Art] Conventional character recognition devices of this type perform very complicated recognition processing; recognition therefore takes a long time, and the devices are expensive.

FIG. 9 is a flowchart showing an example of conventional character recognition processing. In the figure, the characters on paper P are photoelectrically converted and read (step S81) and converted into a binary character pattern of logical "1" and "0" (step S82). The character pattern is then preprocessed to make the subsequent recognition processing easy and reliable (step S83). The preprocessing is a series of operations including, for example, removal of noise caused by black specks on the paper P and smoothing of peaks or voids arising at the boundaries of character lines. Next, feature extraction processing extracts several kinds of feature information (intersections, branch points, number of loops, stroke lengths, and so on) (step S84). When the recognition targets are diverse, the amount of feature information becomes considerable. If the feature extraction result selects a single character candidate, that candidate becomes the recognition output (step S85 → step S88). That is the case in which no dictionary matching is required.

In many cases, however, several character candidates sharing the same feature information are selected, and detailed discrimination processing is performed to single out one character (step S85 → step S86). This detailed discrimination is generally called dictionary matching; as the recognition targets become more diverse it grows considerably more complex and the matching takes longer. When the matching succeeds, a specific character candidate is selected. When it still fails, an unrecognizable result is finally output (step S87 → step S88).

Thus, because conventional character recognition devices are designed to handle a wide variety of characters, they carry out the complicated recognition processing described above even when recognizing simple (easily recognized) characters such as letters and numerals. Processing therefore takes too long and, depending on the application, cost performance is significantly reduced.

[Problems to be Solved by the Invention] The present invention has been made in view of the above drawbacks of the prior art. Its object is to provide a character recognition device that targets characters such as numerals and letters of the alphabet and can recognize them simply, quickly, and with high accuracy.

[Means for Solving the Problems] To achieve the above object, the character recognition device of the present invention comprises: storage means that stores predetermined standard feature information for character recognition; stroke length extraction means that extracts stroke length information of a read character pattern by counting its black pixels in the horizontal and vertical directions; stroke number extraction means that extracts stroke number information by counting the strokes of the character pattern that cross a predetermined horizontal direction on the character pattern; stroke position extraction means that extracts stroke position information by detecting the position of the stroke of the character pattern that first crosses a predetermined horizontal direction on the character pattern; and recognition means that recognizes the character by comparing the feature information extracted by the extraction means with the standard feature information stored in the storage means.

[Operation] In this configuration, the storage means stores in advance predetermined standard feature information for character recognition. The stroke length extraction means extracts the horizontal and vertical stroke length information of the character pattern by counting, in the horizontal and vertical directions, the black pixels of the character pattern read by photoelectric conversion. The stroke number extraction means extracts, for example on a predetermined horizontal line set on the character pattern, the number of strokes of the character pattern that cross that line. The stroke position extraction means extracts, for example on a predetermined horizontal line set on the character pattern, the position of the stroke of the character pattern that first crosses that line. The recognition means then recognizes the character by comparing the feature information extracted by the extraction means with the standard feature information stored in the storage means.

[Embodiments] An embodiment of the present invention is described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram of the mark/character recognition device (OMR) of the embodiment. The mark/character recognition device of the embodiment is a general-purpose recognition device that, besides recognizing handwritten marks and the handwritten characters of this embodiment, can also perform general image processing; however, the present invention can also be realized as a simple recognition device that recognizes only handwritten marks and characters of a specific group.

In FIG. 1, 1 is a reader that reads handwritten marks, handwritten characters, and other original images and converts them into electrical signals; 7 is an auto feeder that operates in conjunction with the reader 1 to feed mark sheets, image originals, and the like to the reading section of the reader 1; 2 is an optical disk that stores the image information read by the reader 1 and the recognition results for marks and characters; 3 is a host computer that controls the entire mark/character recognition device; 4 is a keyboard for entering various control commands, unrecognizable characters, and the like; 5 is a CRT display device (CRT) that displays image information, mark and character recognition results, other operation information, and the like; and 6 is a printer that prints out mark and character recognition results, other image information, and the like.

In the host computer 3, 50 is a central processing unit (CPU) that executes various programs (for example, a Motorola MC68000 microcomputer); 51 is a ROM that stores the character recognition processing program of FIG. 3, which the CPU 50 executes, together with standard feature information 51a for character recognition. Further, 52 is a CRT interface, 53 is a keyboard interface, 54 is an optical disk interface, 55 is a reader interface, 56 is a printer interface, 57 is a RAM that stores the progress of program execution, the read character pattern information, and the extracted feature information needed for character recognition, and 60 is the common bus of the CPU 50.

In the RAM 57, 571 is a character buffer that stores the read character pattern data; 572 is an area that stores the stroke length information of the character pattern, extracted by counting the black pixels of the pattern in the character buffer in the horizontal and vertical directions; 573 is an area that stores the number of strokes of the character pattern crossing a predetermined horizontal region (or line) set on the character pattern, counted on that region (or line); 574 is an area that stores the position information of the stroke of the character pattern that first crosses the predetermined horizontal region (or line), extracted on that region (or line); and 575 is an area that stores the mark codes, character codes, and the like of the recognition results.

FIG. 2 is an external view of the mark/character recognition device having the configuration of FIG. 1. Components identical to those in FIG. 1 bear the same reference numerals. 8 is an interface cable connecting the reader 1 and the host computer 3, and 9 is an interface cable connecting the printer 6 and the host computer 3.

Next, the principle by which the device of this embodiment recognizes characters is explained.

FIG. 7 shows the forms of the letters and numerals that the device of the embodiment recognizes. In the figure, each character is composed of a combination of elements drawn from 16 segments. Typical character strokes built from these elements are: full-length or half-length strokes written in the top, middle, or bottom row in the horizontal (x) direction; full-length or half-length strokes written at the left, center, or right in the vertical (y) direction; and full-length or half-length diagonal strokes. For example, row 1, column 1 of FIG. 7 shows the letter "A". Its stroke lengths in the X direction are, from the top, "long", "long", "short". A stroke of "short" length does not actually exist; as described later, "short" is what is recognized when, for example, a vertical stroke is viewed in the horizontal direction. Its stroke lengths in the Y direction are, from the left, "long", "short", "long".

Row 1, column 2 shows the letter "B". Likewise, its stroke lengths in the X direction are, from the top, "long", "half", "long", and its stroke lengths in the Y direction are, from the left, "short", "long", "long". Such combinations of stroke lengths are characteristic of each character, and characters can easily be told apart by these differences.

Some characters, however, cannot be distinguished by the stroke length features alone. For example, the numeral "2" in row 5, column 4 and the numeral "5" in row 6, column 1 both have X-direction stroke lengths of "long", "long", "long" from the top and Y-direction stroke lengths of "half", "short", "half" from the left, so they cannot be told apart. The same holds for the letter "P" in row 3, column 4 and the letter "R" in row 3, column 6.
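
The collision described above can be illustrated by grouping characters by their stroke-length signature. This is only a minimal Python sketch: the table name `STROKE_LENGTHS` and the helper `group_by_signature` are hypothetical, and the table holds only the four characters whose stroke lengths are stated in the text.

```python
from collections import defaultdict

# Stroke-length signatures as stated in the description:
# (X direction, top to bottom), (Y direction, left to right).
# The table name and layout are illustrative assumptions.
STROKE_LENGTHS = {
    "A": (("long", "long", "short"), ("long", "short", "long")),
    "B": (("long", "half", "long"), ("short", "long", "long")),
    "2": (("long", "long", "long"), ("half", "short", "half")),
    "5": (("long", "long", "long"), ("half", "short", "half")),
}


def group_by_signature(table):
    """Group characters that share the same stroke-length signature."""
    groups = defaultdict(list)
    for char, signature in table.items():
        groups[signature].append(char)
    return dict(groups)


groups = group_by_signature(STROKE_LENGTHS)
# Signatures shared by more than one character need extra features.
ambiguous = sorted(chars for chars in groups.values() if len(chars) > 1)
print(ambiguous)
```

In this sketch "A" and "B" each have a signature of their own, while "2" and "5" fall into the same group, which is why further features are needed.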

To distinguish these characters, predetermined lines 1 and 2 are set in the horizontal direction on the character pattern, as shown for example in FIG. 6(a). Two features are then extracted: the number of vertical strokes crossing line 1 or line 2, and stroke position information indicating where, seen from the base point at the left edge of the character (the base point at the right edge would also do), the first vertical stroke crossing line 1 or line 2 lies. For "P" and "R", the number of strokes crossing line 2 is one for "P" and two for "R". For "2", the distance from the left edge to the first crossing of line 1 is long, whereas for "5" it is short; for line 2 the reverse holds. By combining the above features, all the characters in FIG. 7 can be recognized.
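
The two line-based features can be sketched in Python as follows. The function names and the 12-pixel scan lines are hypothetical; they only mimic the qualitative behaviour the text describes for "P", "R", "2", and "5", and are not taken from the figures.

```python
def count_strokes(scan_line):
    """Count separate runs of black pixels (1s) along one horizontal scan line."""
    strokes = 0
    previous = 0
    for pixel in scan_line:
        if pixel == 1 and previous == 0:
            strokes += 1  # a white-to-black transition starts a new stroke
        previous = pixel
    return strokes


def first_stroke_offset(scan_line):
    """Offset of the first black pixel from the left base point (None if blank)."""
    for offset, pixel in enumerate(scan_line):
        if pixel == 1:
            return offset
    return None


# Hypothetical 12-pixel scan lines (assumed data, not read from FIG. 6 or 7).
p_line2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]     # "P": one stroke crosses line 2
r_line2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]     # "R": two strokes cross line 2
two_line1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # "2": first stroke far from the left
five_line1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # "5": first stroke near the left

print(count_strokes(p_line2), count_strokes(r_line2))
print(first_stroke_offset(two_line1), first_stroke_offset(five_line1))
```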

Consider now the problem of noise caused by dust and the like on the paper. For stroke lengths, a small amount of dust is unlikely to affect feature extraction. This is because, for each of the predetermined areas 1 to 6, the number of black pixels is counted several times within the area and the maximum is taken, and that maximum is then ternarized with predetermined threshold levels x1 and x2, quantizing it into the coarse classes long, half, and short. If there is dust or the like inside the character pattern, on the other hand, the number of strokes may be mistakenly overcounted.

Therefore, as shown in FIG. 6(b), predetermined areas 7 and 8 are set in the horizontal direction on the character pattern. Within each area the strokes are counted, for example, three times in the horizontal direction, and the smallest of the resulting counts is taken as the stroke count for that area. This reduces the chance that noise such as dust causes the strokes to be overcounted.
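
A minimal Python sketch of this minimum-over-several-lines rule is shown below. The three scan lines standing in for area 7 are made-up data, and `region_stroke_count` is a hypothetical helper name.

```python
from itertools import groupby


def count_strokes(scan_line):
    """Count runs of consecutive black pixels (1s) on one scan line."""
    return sum(1 for value, _run in groupby(scan_line) if value == 1)


def region_stroke_count(scan_lines):
    """Take the minimum stroke count over the scan lines sampled in a region."""
    return min(count_strokes(line) for line in scan_lines)


# Three hypothetical scan lines sampled inside one area; the middle one
# picks up a dust speck between the two real strokes.
area7 = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]

per_line = [count_strokes(line) for line in area7]
print(per_line, region_stroke_count(area7))
```

Taking the minimum discards the line inflated by the speck. Note that this choice guards against overcounting only; a scan line that happened to miss a real stroke would lower the result.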

Furthermore, the line on which the minimum stroke count in that area was obtained is reused: it is scanned from the base point at the left edge of the character pattern to find the position of the first stroke crossing the line. This position is then ternarized with the threshold levels d1 and d2 of FIG. 6(b). The line with the minimum stroke count is used because, for the reason given above, it can be regarded as a line unaffected by noise.
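
The position feature can be sketched in Python as below. The threshold values `D1 = 16` and `D2 = 32`, the handling of offsets that fall exactly on a threshold, and all function names are assumptions made for illustration; the text gives only the symbols d1 and d2.

```python
def count_strokes(scan_line):
    """Count white-to-black transitions along one scan line."""
    return sum(
        1
        for i, pixel in enumerate(scan_line)
        if pixel == 1 and (i == 0 or scan_line[i - 1] == 0)
    )


def cleanest_line(scan_lines):
    """Pick the scan line with the fewest strokes (the first one on ties)."""
    return min(scan_lines, key=count_strokes)


def position_code(scan_line, d1, d2):
    """Ternarize the offset of the first black pixel into 1, 2 or 3."""
    offset = scan_line.index(1)  # raises ValueError on a blank line
    if offset < d1:
        return 1
    if offset < d2:
        return 2
    return 3


WIDTH, D1, D2 = 48, 16, 32  # D1 and D2 are assumed values


def make_line(black_pixels):
    """Build a WIDTH-pixel scan line with the given pixels set to black."""
    return [1 if x in black_pixels else 0 for x in range(WIDTH)]


clean = make_line({40, 41})     # a single stroke near the right edge
dusty = make_line({5, 40, 41})  # the same stroke plus a dust speck on the left
chosen = cleanest_line([dusty, clean, clean])

print(position_code(chosen, D1, D2), position_code(dusty, D1, D2))
```

With these made-up lines, scanning the minimum-count line gives code 3, whereas scanning the dusty line would have given code 1.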

In the case of FIG. 6(b), the first character stroke lies at the base point, so the stroke position value is "1" for both area 7 and area 8.

FIG. 3 is a flowchart of the character recognition procedure of the embodiment. This procedure is entered after the characters on the sheet have been read, binarized, and stored in the character buffer 571. In the binarization, the characters on the paper P are first photoelectrically converted, for example one character at a time. Subsequent recognition is simplified if, as shown in FIG. 7, the 16 segments are preprinted in a highly reflective dropout color and the characters are written on the segments. The photoelectrically converted characters are then converted into a binary pattern of "1" and "0". The character pattern stored in the character buffer 571 in this way has a pattern size of 48 × 48 bits, as shown in FIG. 4.

In step S1, the number of black pixels in the X direction is counted and stored in the RAM 57. For example, the first row in FIG. 4 forms part of a long stroke and its count is 48. The second row also forms part of a long stroke and its count is 48. The third row crosses the long strokes in the Y direction twice and its count is 4. The counts for every row up to the 48th are obtained in this way and stored in the RAM 57. Next, the number of black pixels in the Y direction is counted in the same manner and stored in the RAM 57. The first and second columns in FIG. 4 form part of a long stroke and their counts are 48 each. The third column crosses the long strokes in the X direction twice and its count is 4. In step S3, the data stored in the RAM 57 are divided into predetermined areas and the maximum count in each area is extracted. The predetermined areas are, for example, areas 1, 2, and 3, obtained by dividing the pattern into three in the vertical direction as in FIG. 5(b), and areas 4, 5, and 6, obtained by dividing it into three in the horizontal direction as in FIG. 5(c).
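
The counting and per-area maximum can be sketched in Python as follows. The synthetic "A"-like pattern (2-pixel-wide strokes, one dust pixel) and the split into three equal bands of 16 lines are assumptions chosen to be consistent with the counts quoted in the text; FIG. 4 and FIG. 5 themselves are not reproduced here.

```python
SIZE = 48


def make_pattern():
    """Build a hypothetical 48x48 'A'-like pattern with 2-pixel-wide strokes."""
    grid = [[0] * SIZE for _ in range(SIZE)]
    for y in range(SIZE):
        for x in range(SIZE):
            on_bar = y in (0, 1, 23, 24)   # top and middle horizontal strokes
            on_post = x in (0, 1, 46, 47)  # left and right vertical strokes
            if on_bar or on_post:
                grid[y][x] = 1
    grid[10][24] = 1  # one dust pixel in the central column band
    return grid


def row_counts(grid):
    """Black pixels per row (counting along the X direction)."""
    return [sum(row) for row in grid]


def col_counts(grid):
    """Black pixels per column (counting along the Y direction)."""
    return [sum(col) for col in zip(*grid)]


def band_maxima(counts, bands=3):
    """Split the counts into equal bands and keep the maximum of each band."""
    size = len(counts) // bands
    return [max(counts[i * size:(i + 1) * size]) for i in range(bands)]


grid = make_pattern()
rows = row_counts(grid)
cols = col_counts(grid)
print(rows[:3], cols[:3])
print(band_maxima(rows), band_maxima(cols))
```

Under these assumptions the first three row counts and the first three column counts are 48, 48, and 4, and the band maxima correspond to areas 1 to 3 (rows) and areas 4 to 6 (columns).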

Accordingly, when a character "A" such as that in FIG. 5(a) is read, the maximum is 48 for area 1, 48 for area 2, 4 for area 3, 48 for area 4, 5 for area 5 (assuming the noise at the center is counted), and 48 for area 6. In step S4, the maximum of each area is ternarized against slice levels x1 and x2, set on the pixel count in the X-axis direction as in FIG. 5(b), and slice levels y1 and y2, set on the pixel count in the y-axis direction as in FIG. 5(c). For example, in FIG. 5(b) the maximum black pixel count of area 1, namely 48, is at or above both x1 and x2, so it is quantized to "3". Likewise, area 2 becomes "3" and area 3 becomes "1"; area 4 becomes "3", area 5 becomes "1", and area 6 becomes "3". These values are stored in the area 572 and form part of the extracted feature information.
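
The ternarization of step S4 can be sketched in Python as below, using the area maxima quoted in the text. The slice level values (16 and 32) are assumptions, since the text names the levels x1, x2, y1, and y2 without giving numbers, and reading code 2 as "half" is inferred from the long/half/short classes mentioned earlier.

```python
def ternarize(count, low, high):
    """Quantize a black-pixel count to 1 (short), 2 (half) or 3 (long)."""
    if count >= high:
        return 3
    if count >= low:
        return 2
    return 1


# Maximum black-pixel counts per area for the "A" example in the text.
AREA_MAXIMA = {1: 48, 2: 48, 3: 4, 4: 48, 5: 5, 6: 48}

# Assumed slice levels for a 48 x 48 pattern; the text gives no values.
X1, X2 = 16, 32  # applied to areas 1 to 3
Y1, Y2 = 16, 32  # applied to areas 4 to 6

codes = [ternarize(AREA_MAXIMA[a], X1, X2) for a in (1, 2, 3)]
codes += [ternarize(AREA_MAXIMA[a], Y1, Y2) for a in (4, 5, 6)]
print(codes)
```

With these levels the noise-inflated count of 5 in area 5 still quantizes to 1, the same code the clean count of 4 would give.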

In step S5, the number of character strokes crossing the predetermined areas 7 and 8 is counted. For the character "A" in FIG. 6(b), the number of strokes crossing area 7 is "2", and the number crossing area 8 is also "2". Alternatively, step S5 may count the character strokes crossing line 1 and line 2 as in FIG. 6(a). For the character "A" in FIG. 6(a), the number of strokes crossing line 1 is "2", and the number crossing line 2 is also "2". In either case, step S6 stores the stroke counts in the area 573.

In step S7, the position of the character stroke that first crosses each of the predetermined areas 7 and 8 is detected. For the character "A" in FIG. 6(b), the first stroke crossing area 7 lies between the base point and d1, so its value is "1". The value for the first stroke crossing area 8 is also "1". Alternatively, step S7 may detect the position of the character stroke that first crosses line 1 and line 2 as in FIG. 6(a). For the character "A" in FIG. 6(a), the first stroke crossing line 1 lies between the base point and d1, so its value is "1"; the value for line 2 is also "1". In step S8, the stroke position feature information thus obtained is stored in the area 574. For the numeral "2", incidentally, the first stroke crossing area 7 lies to the right of d2, so its value is "3".

The ten items of information obtained in this way constitute the feature pattern extracted from one character. In step S9, the contents of the areas 572 to 574 are compared with the standard feature information 51a in the ROM 51. If a match is obtained in step S10, the corresponding character code is read from the ROM 51 in step S11 and stored in the code buffer 575. If no match is obtained in step S10, a reject code is stored in the code buffer 575 in step S12.
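
Steps S9 to S12 amount to looking up a ten-item feature pattern in a table of standard patterns. The Python sketch below assumes an exact-match dictionary lookup; the names `STANDARD_FEATURES`, `REJECT_CODE`, and `recognize` are hypothetical, as is the reject value "?". The single table entry uses the values the text gives for "A" (six stroke-length codes, two stroke counts, two stroke-position codes).

```python
REJECT_CODE = "?"  # hypothetical reject code; the text does not give its value

# Standard feature table keyed on the ten-item pattern:
# (areas 1-6 stroke-length codes, areas 7-8 stroke counts, areas 7-8 positions).
STANDARD_FEATURES = {
    (3, 3, 1, 3, 1, 3, 2, 2, 1, 1): "A",  # values stated in the description
}


def recognize(length_codes, stroke_counts, position_codes):
    """Return the matching character code, or the reject code if none matches."""
    pattern = tuple(length_codes) + tuple(stroke_counts) + tuple(position_codes)
    if len(pattern) != 10:
        raise ValueError("expected ten feature items")
    return STANDARD_FEATURES.get(pattern, REJECT_CODE)


print(recognize([3, 3, 1, 3, 1, 3], [2, 2], [1, 1]))
print(recognize([3, 3, 3, 3, 1, 3], [2, 2], [1, 1]))
```

A real table would hold one entry (or several) per character of FIG. 7; a hash lookup is used here only because the comparison described is an exact match on a short, coarsely quantized pattern.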

The mark/character recognition device of the embodiment reads marks and handwritten characters entered on a mark sheet such as that shown in FIG. 8. Wide timing marks are printed in the first column of the mark sheet, and the device determines the timing for reading handwritten marks or handwritten characters by reading these timing marks. Keyword marks are printed in the first row. The data fields follow from the second row, and data marks are entered there. Handwritten characters are entered in the two rows at the bottom. Handwritten characters need not be written exactly within the preprinted segments, but doing so makes entry easier and raises recognition accuracy.

The mark sheet could be used, for example, for registering keywords in an electronic filing system, for entering facsimile telephone numbers, and for setting modes such as the number of copies and the reduction ratio of an image forming apparatus such as a copying machine.

[Effects of the Invention] As described above, according to the present invention the amount of feature information compared in the feature matching process is small, so the processing time is greatly shortened. Moreover, each item of feature information can be obtained by simple processing.

In addition, representing the stroke length by the maximum number of black pixels counted in a predetermined area reduces the influence of dust, peaks, and voids within that area.

Likewise, the strokes crossing a predetermined area are counted several times and the smallest count is taken as the number of strokes crossing that area, which again reduces the influence of dust and peaks within the area.

As a result, the accuracy of character recognition is improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the mark/character recognition device (OMR) of the embodiment.
FIG. 2 is an external view of the mark/character recognition device having the configuration of FIG. 1.
FIG. 3 is a flowchart of the character recognition procedure of the embodiment.
FIG. 4 is a diagram showing the bit size of a character pattern.
FIGS. 5(a) to 5(c) are diagrams explaining the detection of the stroke lengths of a character pattern.
FIGS. 6(a) and 6(b) are diagrams explaining the detection of the number of vertical strokes of a character and of the stroke position.
FIG. 7 is a diagram showing the forms of the letters and numerals that the device of the embodiment recognizes.
FIG. 8 is a diagram showing an example of a mark sheet.
FIG. 9 is a flowchart showing an example of conventional character recognition processing.

In the figures, 1 is the reader, 2 is the optical disk, 3 is the host computer, 4 is the keyboard, 5 is the CRT display device, 6 is the printer, and 7 is the auto feeder.

Patent applicant: Canon Inc.

Claims

[Claims]

(1) A character recognition device comprising: storage means that stores predetermined standard feature information for character recognition; stroke length extraction means that extracts stroke length information of a read character pattern by counting its black pixels in the horizontal and vertical directions; stroke number extraction means that extracts stroke number information by counting the strokes of the character pattern that cross a predetermined horizontal direction on the character pattern; stroke position extraction means that extracts stroke position information by detecting the position of the stroke of the character pattern that first crosses a predetermined horizontal direction on the character pattern; and recognition means that recognizes a character by comparing the feature information extracted by the extraction means with the standard feature information stored in the storage means.

(2) The character recognition device according to claim 1, wherein the stroke length extraction means extracts, as the stroke length information, the largest of the black pixel counts obtained for the individual lines of a predetermined area on the read character pattern.

(3) The character recognition device according to claim 1, wherein the stroke length extraction means extracts, as the stroke length information, the largest of the black pixel counts obtained for the individual lines of a predetermined area on the read character pattern, quantized to N levels with predetermined threshold levels.

(4) The character recognition device according to claim 1, wherein the stroke number extraction means extracts, as the stroke number information, the number of strokes counted on a predetermined horizontal line on the read character pattern.

(5) The character recognition device according to claim 1, wherein the stroke number extraction means extracts, as the stroke number information, the smallest of the stroke counts obtained for the individual lines of a predetermined horizontal area on the read character pattern.