JPS60160486A

JPS60160486A - Optical character reader

Info

Publication number: JPS60160486A
Application number: JP59015487A
Authority: JP
Inventors: Keiichi Aoyama; 恵一青山
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-01-31
Filing date: 1984-01-31
Publication date: 1985-08-22

Abstract

PURPOSE: To allow characters to be detected and extracted on forms whose character frames are printed in a color other than a drop-out color. The character frame pattern held in a memory means is detected on the basis of preset character frame data, and each character pattern is then extracted on the basis of the detected frame pattern. CONSTITUTION: One line of character patterns 12 and a character frame pattern 40 are stored in a line buffer 37. A detection-extraction section 32 scans the patterns in the line buffer 37 and detects the character frame pattern 40 on the basis of character frame data stored beforehand in a memory 33. The character frame data specify the size, height, pitch and line thickness of the character frames. Because a character frame normally consists only of vertical and horizontal lines, it is easily detected in the line buffer 37 by referring to the character frame data. The detection-extraction section 32 then extracts the pattern of each character on the basis of the position data of the detected character frame pattern 40.

Description

DETAILED DESCRIPTION OF THE INVENTION
[Technical Field of the Invention]
The present invention relates to an optical character reader, in particular one for reading forms that have character frames.

[Technical Background of the Invention and Its Problems]

Conventionally, an optical character reader (OCR) generally uses a form 11 on which character frames 10 are printed, as shown in FIG. 1. On the form 11 the character frames 10 are printed in a drop-out color, and the characters to be read are written into the character frames 10, one character per frame.

The OCR scans a form 11 such as that shown in FIG. 1 with an optical scanning unit and extracts the character patterns, usually one line at a time, as data (a binarized signal). The extracted character patterns are stored, for example, in a line buffer 20, as shown in FIG. 2.

Because the character frames 10 are printed in the drop-out color, only the characters written on the form 11 are stored in the line buffer 20.

Suppose, for example, that the character string 12 on the fourth line of the form 11 shown in FIG. 1 has been stored in the line buffer 20.

The OCR stores, in advance, format control data specifying the character pitch, the character size, the character positions (specifically, the starting coordinates X, Y of the character frame 10 shown in FIG. 1), the character type, and so on. On the basis of this format control data, the OCR scans the line buffer 20 and creates a vertical projection 21 of the character patterns.
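As a rough illustration, a vertical projection can be computed by counting the black pixels in each column of the line buffer. This sketch assumes the line buffer is a list of rows of 0 (white) and 1 (black) pixels; the name `vertical_projection` is hypothetical, and restricting the scan to the rows given by the format control data is omitted.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0 (white) / 1 (black) pixels

def vertical_projection(line_buffer: Bitmap) -> List[int]:
    """Count the black pixels in each column of a one-line binary image."""
    if not line_buffer:
        return []
    width = len(line_buffer[0])
    return [sum(row[x] for row in line_buffer) for x in range(width)]

# Two small "characters" separated by one blank column.
buf = [
    [1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
]
print(vertical_projection(buf))  # [1, 2, 0, 2, 1]
```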

On the basis of the vertical projection 21, the OCR recognizes the black portions as characters and the white portions as gaps between characters, and can therefore detect and extract the pattern of each character (hereinafter abbreviated as "detection-extraction"). The recognition unit then performs recognition processing on each character pattern.
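A minimal sketch of this conventional projection-based extraction, assuming the vertical projection 21 is given as a list of per-column black-pixel counts: each maximal run of non-zero columns is taken as one character, and each zero column as a gap. The function name `segment_by_projection` is hypothetical.

```python
from typing import List, Optional, Tuple

def segment_by_projection(projection: List[int]) -> List[Tuple[int, int]]:
    """Return half-open column ranges [start, end) of consecutive non-zero
    (black) columns; each range is taken to be one character."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                  # entering a black run
        elif count == 0 and start is not None:
            spans.append((start, x))   # leaving a black run
            start = None
    if start is not None:              # run reaching the right edge
        spans.append((start, len(projection)))
    return spans

print(segment_by_projection([0, 3, 2, 0, 0, 4, 1, 0]))  # [(1, 3), (5, 7)]
```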

However, an OCR of this kind requires a form 11 on which the character frames 10 are printed in a drop-out color. Because such a form must be printed with special drop-out ink, it is expensive. This raises the overall cost of the OCR system and also limits the types of forms that can be used.

[Purpose of the Invention]

The present invention has been made in view of the above circumstances. Its object is to provide a useful optical character reader that can reliably read the characters written on a form even when the character frames are printed in a color other than a drop-out color, thereby reducing the cost of forms and increasing the variety of forms that can be used.

[Summary of the Invention]

In the present invention, memory means is provided for storing character pattern data corresponding to one line of characters and character frames recorded on a form, together with detection-extraction means for detecting and extracting the pattern of each character from the memory means. The detection-extraction means detects the character frame pattern in the memory means on the basis of preset character frame data, and then extracts the pattern of each character from the character pattern data in the memory means on the basis of the position data of that frame pattern and preset format control data. The extracted character patterns are then processed by recognition means.
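The two kinds of preset data used by the detection-extraction means can be pictured as simple records. The layout below is hypothetical: the field names, the pixel units, the cell count `n_chars`, and the helper `expected_vertical_lines` are assumptions for illustration, not definitions from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CharacterFrameData:
    """Preset description of the printed frames (hypothetical fields, pixels)."""
    width: int      # size of one frame cell
    height: int     # frame height
    pitch: int      # distance between neighbouring vertical frame lines
    thickness: int  # stroke width of the frame lines

@dataclass
class FormatControlData:
    """Preset layout of the line to be read (hypothetical fields, pixels).
    Character size and character type are omitted for brevity."""
    start_x: int     # x coordinate of the leading frame
    start_y: int     # y coordinate of the leading frame
    char_pitch: int  # character pitch
    n_chars: int     # number of character cells on the line

def expected_vertical_lines(frame: CharacterFrameData,
                            fmt: FormatControlData) -> List[int]:
    """Nominal x positions of the n_chars + 1 vertical lines bounding the cells."""
    return [fmt.start_x + i * frame.pitch for i in range(fmt.n_chars + 1)]

frame = CharacterFrameData(width=8, height=10, pitch=10, thickness=1)
fmt = FormatControlData(start_x=5, start_y=0, char_pitch=10, n_chars=3)
print(expected_vertical_lines(frame, fmt))  # [5, 15, 25, 35]
```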

With this configuration, when the character frames on a form are printed in a color other than a drop-out color, each character of the character string written on the form can be reliably detected and extracted.

[Embodiment of the Invention]

An embodiment of the present invention will now be described with reference to the drawings. FIG. 3 is a block diagram showing the configuration of an OCR according to the embodiment. In FIG. 3, a scanning unit 30 scans a form such as the form 11 shown in FIG. 1 and converts the characters on the form into character pattern data, that is, an electrical (binarized) signal. A preprocessing unit 31 performs preprocessing such as normalization on the character pattern data sent from the scanning unit 30 and outputs the result to a detection-extraction section 32.

The detection-extraction section 32 is composed, for example, of a microprocessor and includes a line buffer 37 that stores the character pattern data corresponding to one line of characters and character frames. The character pattern data sent from the preprocessing unit 31 are stored temporarily in the line buffer 37. The detection-extraction section 32 then extracts the pattern of each character from the line buffer 37 on the basis of format control data and character frame data stored beforehand in a detection-extraction memory 33, and outputs it to a recognition unit 34.

The recognition unit 34 performs recognition processing on each extracted character pattern, for example by a matching method that compares it with standard patterns stored beforehand in a dictionary memory 35.
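Matching-based recognition can be sketched as comparing the extracted pattern with each standard pattern in the dictionary and choosing the closest one. This is a simplified stand-in, not the patent's matcher: it assumes the patterns have already been normalized to the same size and uses the fraction of agreeing pixels as the similarity measure. `similarity`, `recognize`, and the two-entry dictionary are hypothetical.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of 0 (white) / 1 (black), already size-normalized

def similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixel positions at which two equal-sized patterns agree."""
    total = sum(len(row) for row in a)
    agree = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return agree / total

def recognize(pattern: Bitmap, dictionary: Dict[str, Bitmap]) -> str:
    """Label of the standard pattern that best matches the input pattern."""
    return max(dictionary, key=lambda label: similarity(pattern, dictionary[label]))

# Hypothetical 3x3 standard patterns standing in for dictionary memory 35.
DICTIONARY = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 1, 0]],
}
print(recognize([[0, 1, 0], [0, 1, 0], [0, 1, 1]], DICTIONARY))  # 1
```

A real reader would use a more discriminating measure, but the structure (score every dictionary entry, keep the best) is the same.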

An editing unit 36 edits the results recognized by the recognition unit 34 and outputs the edited result as the final reading result.

The operation of the OCR configured as above will now be described with reference to FIGS. 4 and 5. First, in the present invention the character frames 10 on a form such as the form 11 of FIG. 1 are printed in a color other than a drop-out color, for example in the same black as the characters. When the scanning unit 30 scans such a form, the characters and character frames on a given line are converted into character pattern data (a binarized electrical signal) and sent to the preprocessing unit 31. The preprocessing unit 31 performs preprocessing such as normalization on the character pattern data and then sends it to the detection-extraction section 32.

In the detection-extraction section 32, one line of the character pattern data sent from the preprocessing unit 31 is stored in the line buffer 37. As shown in FIG. 4, the line buffer 37 then holds both the character patterns 12 of the line and the character frame pattern 40. The character frame pattern 40 corresponds to the character frames printed on the form in the same black as the characters, as described above. The detection-extraction section 32 scans the line buffer 37 and detects the character frame pattern 40 on the basis of the character frame data stored beforehand in the memory 33. The character frame data specify the size, height, pitch, line thickness and similar attributes of the character frames. Because a character frame normally consists only of vertical and horizontal lines, it can easily be detected in the line buffer 37 by referring to the character frame data.

Next, the detection-extraction section 32 extracts the pattern of each character on the basis of the position data of the detected character frame pattern 40 (that is, the positions of its vertical and horizontal lines) and the format control data (data specifying the character pitch, character size and so on). Since each character pattern lies between vertical lines of the character frame, as shown in FIG. 4, the position of each character pattern can easily be determined from the frame position data. As a result, the pattern of each character is extracted from the line buffer 37 separately from the character frame 40. Even when a character 50 in the line extends beyond the character frame 40, as shown in FIG. 5(a), the detection-extraction processing described above separates the characters from the character frame 40 and extracts each character (FIG. 5(b)).
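The frame detection and per-character extraction described above can be sketched as follows. This is a simplified illustration, not the patent's implementation. It assumes a 0/1 pixel grid; it treats a column as a vertical-line candidate when it has at least `frame_height` black pixels and the run is no thicker than `thickness`; it keeps only candidates lying on the `pitch` grid, so that a tall character stroke is not mistaken for a frame line; and it takes horizontal lines to be rows that are completely black across the frame. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]   # rows of 0 (white) / 1 (black)
Run = Tuple[int, int]      # inclusive (first, last) index of a frame line

def group_runs(indices: List[int]) -> List[Run]:
    """Group ascending indices into runs of consecutive values."""
    runs: List[Run] = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs

def detect_frame(buf: Bitmap, frame_height: int, pitch: int,
                 thickness: int, tol: int = 0) -> Tuple[List[Run], List[Run]]:
    """Detect vertical and horizontal frame lines using preset frame data."""
    h, w = len(buf), len(buf[0])
    cols = [x for x in range(w) if sum(buf[y][x] for y in range(h)) >= frame_height]
    runs = [r for r in group_runs(cols) if r[1] - r[0] + 1 <= thickness]
    if not runs:
        return [], []
    origin = runs[0][0]  # assume the leftmost candidate is a real frame line
    vertical = [r for r in runs
                if min((r[0] - origin) % pitch, -(r[0] - origin) % pitch) <= tol]
    left, right = vertical[0][0], vertical[-1][1]
    rows = [y for y in range(h) if sum(buf[y][left:right + 1]) == right - left + 1]
    horizontal = [r for r in group_runs(rows) if r[1] - r[0] + 1 <= thickness]
    return vertical, horizontal

def extract_characters(buf: Bitmap, vertical: List[Run],
                       horizontal: List[Run]) -> List[Bitmap]:
    """Erase frame pixels, then cut out one pattern per frame cell,
    trimmed to its black bounding box ([] for an empty cell)."""
    clean = [row[:] for row in buf]
    left, right = vertical[0][0], vertical[-1][1]
    for y0, y1 in horizontal:
        for y in range(y0, y1 + 1):
            for x in range(left, right + 1):
                clean[y][x] = 0
    for x0, x1 in vertical:
        for row in clean:
            for x in range(x0, x1 + 1):
                row[x] = 0
    patterns: List[Bitmap] = []
    for (_, l_end), (r_start, _) in zip(vertical, vertical[1:]):
        # Every row is kept, so strokes above or below the frame stay with their cell.
        cell = [row[l_end + 1:r_start] for row in clean]
        ys = [y for y, row in enumerate(cell) if any(row)]
        xs = [x for x in range(r_start - l_end - 1) if any(row[x] for row in cell)]
        if not ys:
            patterns.append([])
            continue
        patterns.append([row[xs[0]:xs[-1] + 1] for row in cell[ys[0]:ys[-1] + 1]])
    return patterns

buf = [
    [0, 0, 0, 0, 0, 0, 1, 0, 0],  # 2nd character sticks out above the frame
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # top frame line
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # bottom frame line
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
v, hz = detect_frame(buf, frame_height=5, pitch=4, thickness=1)
print(v, hz)  # [(0, 0), (4, 4), (8, 8)] [(1, 1), (5, 5)]
print(extract_characters(buf, v, hz))
```

In the example, column 6 is as tall as a frame line because the second character protrudes, as in FIG. 5(a). The pitch check rejects it. The protruding stroke stays with its character because whole rows are kept when cutting out a cell. However, erasing the top frame line leaves a one-pixel gap where the stroke crossed it, which a practical implementation would have to handle.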

Only the character patterns extracted by the detection-extraction section 32 are passed to the recognition unit 34, one character at a time. The recognition unit 34 then recognizes each character pattern by matching it against the preset standard patterns, as described above. The editing unit 36 edits the per-character recognition results from the recognition unit 34, for example line by line, and outputs the edited result as the final reading result.

In this way, when characters are read from a form whose character frames are printed in black or another color that is not a drop-out color, the detection-extraction section 32 described above can detect and extract each character. In this case, because the character frames are printed in black or another non-drop-out color, they are stored in the line buffer 37 together with the characters.

Consequently, the conventional detection-extraction method based on projection (21 in FIG. 2) cannot extract the character patterns alone. In contrast, the OCR of the present invention uses the preset character frame data to detect the character frames in the line buffer 37, and then uses the position data of those frames to separate the characters from the frames.
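The failure of the projection method can be illustrated with a tiny example. When the frame lines are black, the horizontal lines contribute to every column, so the projection never falls to zero and the whole line comes back as a single segment. The helper `projection_spans` and the two sample bitmaps are hypothetical and use the same 0/1 pixel-grid assumption.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of 0 (white) / 1 (black)

def projection_spans(buf: Bitmap) -> List[Tuple[int, int]]:
    """Projection-based extraction: half-open column ranges with non-zero projection."""
    projection = [sum(col) for col in zip(*buf)]  # black pixels per column
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(projection + [0]):  # trailing 0 closes a final run
        if count and start is None:
            start = x
        elif not count and start is not None:
            spans.append((start, x))
            start = None
    return spans

# Two characters inside black frames: the top and bottom lines fill every column.
FRAMED = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
# The same line with the frames in a drop-out color (invisible to the scanner).
DROPOUT = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(projection_spans(FRAMED))   # [(0, 9)]: one segment for the whole line
print(projection_spans(DROPOUT))  # [(2, 3), (5, 7)]: one segment per character
```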

This eliminates the need to print the character frames on the form in a special drop-out color, which simplifies the production of forms.

Furthermore, because the character frames are detected during detection-extraction, the character positions can be corrected before reading even when the frame positions are slightly shifted, for example because of limited printing accuracy of the form.
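A minimal sketch of such a position correction follows. It assumes the detected vertical frame lines are listed from left to right and that the nominal layout is given by a start coordinate and a pitch from the format control data. Using the median of the per-line deviations as the shift estimate is a choice made here for robustness, not something the patent specifies; `frame_offset` and `corrected_cells` are hypothetical names.

```python
from statistics import median
from typing import List, Tuple

def frame_offset(detected_x: List[int], start_x: int, pitch: int) -> int:
    """Estimated horizontal shift of the printed frames from the nominal layout.

    detected_x lists the detected vertical frame lines from left to right;
    line i is compared with its nominal position start_x + i * pitch."""
    deviations = [x - (start_x + i * pitch) for i, x in enumerate(detected_x)]
    return round(median(deviations))  # the median tolerates one badly detected line

def corrected_cells(start_x: int, pitch: int, n_chars: int,
                    offset: int) -> List[Tuple[int, int]]:
    """Half-open x ranges of the character cells after shifting by offset."""
    left = start_x + offset
    return [(left + i * pitch, left + (i + 1) * pitch) for i in range(n_chars)]

offset = frame_offset([13, 33, 52, 73], start_x=10, pitch=20)
print(offset)                              # 3
print(corrected_cells(10, 20, 3, offset))  # [(13, 33), (33, 53), (53, 73)]
```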

[Effect of the Invention]

As detailed above, according to the present invention, the characters written in the character frames of a form can be reliably recognized even when the frames are printed in black or another color that is not a drop-out color. Because printing in a special color is no longer required when producing forms, form production is simplified and the cost of forms can be reduced. In addition, the variety of forms that can be used with the OCR is increased.

As a result, the overall cost of the OCR can be reduced, and because the range of forms the OCR can read is widened, its performance is ultimately improved.

[Brief Description of the Drawings]

FIG. 1 shows an example of a form; FIG. 2 illustrates the detection-extraction operation in a conventional optical character reader; FIG. 3 is a block diagram showing the configuration of an optical character reader according to an embodiment of the present invention; FIG. 4 shows an example of the contents of the line buffer of FIG. 3; and FIGS. 5(a) and 5(b) each illustrate the detection-extraction operation of the optical character reader of FIG. 3.

32: detection-extraction section; 33: detection-extraction memory; 34: recognition unit; 35: dictionary memory; 37: line buffer.

Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Claims]

An optical character reader comprising: memory means for storing pattern data corresponding to at least one line of characters and character frames recorded on a form; detection-extraction means for detecting a character frame pattern stored in the memory means on the basis of preset character frame data, and for detecting and extracting the pattern of each character from the memory means on the basis of position data of the character frame pattern and preset format control data; and recognition means for performing recognition processing on each character pattern detected and extracted by the detection-extraction means.