JPS6316795B2 - - Google Patents

Publication number
JPS6316795B2
Authority
JP
Japan
Prior art keywords
kanji
discrimination
character
katakana
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP55000590A
Other languages
Japanese (ja)
Other versions
JPS5699573A (en)
Inventor
Yasutomi Ejiri
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP59080A
Publication of JPS5699573A
Publication of JPS6316795B2
Granted

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/24Character recognition characterised by the processing or recognition method
    • G06V30/242Division of the character sequences into groups prior to recognition; Selection of dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a kanji discrimination method that uses furigana (e.g., katakana) discrimination, and more particularly to a kanji discrimination method for an optical character reader.

Conventionally, optical character readers have used a variety of character recognition methods, but every one of them works by identifying a character from an unknown pattern and outputting the result.

That is, the characters written on a form are first captured into a memory section by photoelectric conversion. As shown in Fig. 1, the captured pattern 1 (usually called the unknown pattern) is discriminated by some recognition method 2, and the character is then determined 3.

In the case of handwritten character recognition, as shown in Fig. 2, the input pattern obtained by photoelectric conversion first undergoes normalization 4, which removes handwriting variations such as differences in writing style so that the character can be recognized stably. Next, a feature extraction circuit extracts features from the shape of the character pattern (5), and the result is fed to an identification circuit for recognition (6). Identification 6 is the operation of determining the unknown input character based on the set of features obtained in feature extraction 5.
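The normalization and feature extraction stages of Fig. 2 can be sketched roughly as follows. This is a minimal illustration only: the patent does not specify how either stage is implemented, so the bounding-box rescaling, the row and column ink counts, and the names `normalize` and `extract_features` are all assumptions made for the example.

```python
from typing import List

Pattern = List[List[int]]  # 1 = ink, 0 = background (assumed binary image)


def normalize(pattern: Pattern, size: int = 8) -> Pattern:
    """Crop to the ink bounding box, then rescale to size x size (nearest neighbour)."""
    rows = [r for r, row in enumerate(pattern) if any(row)]
    cols = [c for c in range(len(pattern[0])) if any(row[c] for row in pattern)]
    if not rows:  # blank input: return an empty grid
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1
    return [
        [pattern[top + r * height // size][left + c * width // size] for c in range(size)]
        for r in range(size)
    ]


def extract_features(pattern: Pattern) -> List[int]:
    """Toy feature vector: ink count per row followed by ink count per column."""
    row_counts = [sum(row) for row in pattern]
    col_counts = [sum(col) for col in zip(*pattern)]
    return row_counts + col_counts
```

Rescaling to a fixed grid removes differences in character size and position, which is one simple form of the handwriting variation that normalization 4 is meant to suppress.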

For character feature extraction, the pattern matching method is widely used for printed characters as well as everyday handwritten characters. It recognizes a character by examining the degree of match between a standard pattern and the unknown input pattern. Other methods include stroke analysis, character contour analysis, the feature symbol sequence method, geometric feature extraction, and the A-b-S (Analysis-by-Synthesis) method.
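As a rough illustration of pattern matching, the degree of match can be taken as the fraction of cells on which the unknown pattern and a standard pattern agree. The scoring rule, the tiny 3 x 3 templates, and the names `match_score` and `recognize` are assumptions chosen for the sketch, not details given in the patent.

```python
from typing import Dict, List

Pattern = List[List[int]]  # 1 = ink, 0 = background


def match_score(unknown: Pattern, standard: Pattern) -> float:
    """Fraction of cells on which the two equally sized patterns agree."""
    total = sum(len(row) for row in unknown)
    agree = sum(x == y for ru, rs in zip(unknown, standard) for x, y in zip(ru, rs))
    return agree / total


def recognize(unknown: Pattern, templates: Dict[str, Pattern]) -> str:
    """Return the label of the standard pattern with the highest degree of match."""
    return max(templates, key=lambda label: match_score(unknown, templates[label]))


# Hypothetical standard patterns for two characters.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
```

With only a handful of templates the best match is usually unambiguous; with about 2,000 kanji many templates score nearly the same, which is the difficulty the patent goes on to address.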

The character recognition methods described above seldom misread characters when the character set is small, as with digits, Latin letters, kana, and symbols.

Kanji recognition, however, requires distinguishing among roughly 2,000 character types. The conventional discrimination approach therefore has the problem that it cannot recognize kanji correctly.

To eliminate these problems of the prior art, an object of the present invention is to provide a kanji discrimination method using katakana discrimination that reduces misreading and recognizes kanji correctly in an optical character reader.

The kanji discrimination method using katakana discrimination according to the present invention is characterized in that kana discrimination is added to and combined with the conventional discrimination method: kana characters are written alongside each kanji, the candidate kanji are narrowed down to some extent from the kana, and the determination is made in combination with the kanji discrimination result.

To recognize a character pattern printed on paper, its optical image must be observed and converted into an electrical signal for processing by the recognition device, and the line position and character position of the line of characters being observed must be determined.

Line and character positions are specified as approximate positions from the reference edge of the form, but they vary with printing misalignment and paper cutting, so the character observation system must locate the positions on each form before scanning.

Embodiments of the present invention are described below with reference to the drawings.

Fig. 3 is an example of a form used to explain the kanji discrimination method using katakana discrimination according to the present invention. An address and a name are written on the form.

In Fig. 3, form 7 has character frames 8, and each character frame 8 contains a kanji 9 and its corresponding katakana 10. A mark 11 at the right edge of form 7 indicates the line position of the characters to be read.

Fig. 4 is a block diagram of the kanji discrimination method using katakana discrimination according to the present invention.

As shown in Fig. 4, the unknown pattern 12 of the katakana portion corresponding to one kanji is extracted from the form by photoelectric conversion, and katakana discrimination 13 is performed by the same conventional method shown in Figs. 1 and 2. The kanji 14 inferred from the discriminated katakana are retrieved from the memory section. Kanji A, B, ..., X are the inferred kanji.

Meanwhile, the unknown pattern 15 of one kanji is extracted from the form. Kanji discrimination 16 is performed on this unknown pattern 15 using a discrimination method based on the same conventional approach as shown in Figs. 1 and 2, and a determination result is produced. Because there are about 2,000 kanji, however, several similar candidate kanji 17 result. Kanji A', ..., X' are these similar kanji.

This kanji determination result 17 and the katakana-based inference 14 are then combined, discrimination 18 is performed using a similarity method or the like, and the final determination 19 is made.

The kanji discrimination method with katakana discrimination of Fig. 4 is illustrated concretely in Figs. 5, 6, and 7.

Katakana discrimination 13 and kanji discrimination 16 in Fig. 4 use the conventional method described with reference to Figs. 1 and 2.

Fig. 5 shows the method of inferring kanji 14 from the result of katakana discrimination 13 in Fig. 4.

That is, kanji 21 are stored in memory in correspondence with kana 20, and the inferred kanji 14 are retrieved from the memory.
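The kana-to-kanji table of Fig. 5 behaves like a simple lookup. The sketch below assumes a small in-memory dictionary; the entries are ordinary example readings chosen for illustration, not the contents of the patent's table, and `KANA_TO_KANJI` and `candidates_from_kana` are hypothetical names.

```python
from typing import Dict, List

# Illustrative table: katakana reading -> kanji that can carry that reading.
KANA_TO_KANJI: Dict[str, List[str]] = {
    "カワ": ["川", "河", "皮"],
    "ヤマ": ["山"],
    "タ": ["田", "多", "他"],
}


def candidates_from_kana(kana: str) -> List[str]:
    """Return the kanji inferred from a recognized katakana reading (empty if unknown)."""
    return list(KANA_TO_KANJI.get(kana, []))
```

A reading typically maps to far fewer kanji than the full set of about 2,000, which is how the kana narrows the candidates before the kanji pattern itself is considered.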

Fig. 6 shows the specific method of discrimination 18 in Fig. 4.

First, one character a of the kanji 14 inferred from the katakana is loaded into register 22. Next, one character b of the similar kanji 17 obtained from kanji discrimination 16 is loaded into another register 23. Similarity calculation 24 is performed on these two characters a and b using a fixed-point sampling method or the like. This is repeated for every combination of the two sets of kanji.

The results are entered into a matrix as shown in Fig. 7, and the kanji is finally determined 25 from the similarity values.
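The procedure of Figs. 6 and 7 can be sketched as follows. The text does not say how the similarity is computed beyond "a fixed-point sampling method or the like", nor how the final kanji is chosen from the matrix, so this sketch makes several assumptions: each kanji has a stored standard pattern, similarity is the fraction of fixed sample points on which two patterns agree, and the final answer is the kana-derived candidate in the highest-scoring cell. The sample points and the 3 x 3 patterns are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Pattern = List[List[int]]

# Hypothetical fixed sample points (row, column) on a 3 x 3 grid.
SAMPLE_POINTS: List[Tuple[int, int]] = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]


def similarity(p: Pattern, q: Pattern) -> float:
    """Fraction of the fixed sample points on which two patterns agree."""
    agree = sum(p[r][c] == q[r][c] for r, c in SAMPLE_POINTS)
    return agree / len(SAMPLE_POINTS)


def similarity_matrix(from_kana: List[str], from_kanji: List[str],
                      standard: Dict[str, Pattern]) -> List[List[float]]:
    """Rows: kanji inferred from the kana (14). Columns: kanji discrimination candidates (17)."""
    return [[similarity(standard[a], standard[b]) for b in from_kanji] for a in from_kana]


def final_decision(from_kana: List[str], from_kanji: List[str],
                   standard: Dict[str, Pattern]) -> Optional[str]:
    """Assumed rule: pick the kana-derived kanji in the highest-scoring cell."""
    matrix = similarity_matrix(from_kana, from_kanji, standard)
    best_score, best_char = -1.0, None
    for a, row in zip(from_kana, matrix):
        for score in row:
            if score > best_score:  # strict comparison keeps the first maximum
                best_score, best_char = score, a
    return best_char


# Made-up standard patterns, for illustration only.
STANDARD: Dict[str, Pattern] = {
    "川": [[1, 0, 1], [1, 0, 1], [1, 0, 1]],
    "河": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "州": [[1, 0, 1], [1, 1, 1], [1, 0, 1]],
}
```

For example, if the kana yields the candidates 川 and 河 while kanji discrimination yields the look-alikes 川 and 州, the only cell with similarity 1.0 is (川, 川), so 川 is selected under the assumed rule.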

As explained above, according to the present invention, a character reading system such as an OCR uses furigana reading, for example katakana reading, in addition to kanji discrimination, which makes kanji discrimination with few misreadings possible.

That is, in the kanji discrimination method of the present invention using furigana discrimination, for example katakana discrimination, kana are rarely misread, so the kanji inferred from them are also rarely misread. In addition, because this is combined with the result of discriminating the kanji itself, accurate reading that was not possible with the conventional approach becomes possible.

[Brief description of the drawings]

Fig. 1 is a block diagram of a conventional discrimination method; Fig. 2 is an explanatory diagram of conventional character recognition; Fig. 3 shows an example of a form used with the kanji discrimination method of the present invention; Fig. 4 is a system configuration diagram of the kanji discrimination method with katakana discrimination according to an embodiment of the present invention; Fig. 5 shows a specific example of kanji corresponding to the katakana discrimination of Fig. 4; Fig. 6 shows a practical example of the final discrimination method of Fig. 4; and Fig. 7 shows an example of the kanji determination method based on similarity calculation.
1 ... unknown pattern, 2 ... recognition, 3 ... determination, 4 ... normalization, 5 ... feature extraction, 6 ... identification, 7 ... form, 8 ... character frame, 9 ... kanji, 10 ... katakana, 11 ... reading mark, 12 ... unknown pattern of the katakana portion, 13 ... katakana discrimination, 14, 17 ... kanji, 15 ... unknown pattern of the kanji portion, 16 ... kanji discrimination, 18 ... discrimination, 19, 25 ... determination, 20 ... katakana, 21 ... kanji, 22 ... kanji from katakana, 23 ... kanji from kanji discrimination, 24 ... similarity calculation.

Claims (1)

[Claims]
1. A kanji discrimination method using furigana discrimination, for an optical character reader that recognizes kanji written on a form provided with a frame in which one kanji can be written and, corresponding to that frame, one frame in which furigana consisting of a plurality of characters can be written, the method being characterized by comprising: recognition means for recognizing the kanji written in one frame on the form and the furigana written in the one frame corresponding to that kanji; and means for comparing the recognition results for the kanji and for the furigana corresponding to the kanji to determine the candidate characters for discriminating the kanji.
JP59080A 1980-01-09 1980-01-09 Kanji (chinese character) distinction system using katakana (square form of japanese syllabary) Granted JPS5699573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59080A JPS5699573A (en) 1980-01-09 1980-01-09 Kanji (chinese character) distinction system using katakana (square form of japanese syllabary)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59080A JPS5699573A (en) 1980-01-09 1980-01-09 Kanji (chinese character) distinction system using katakana (square form of japanese syllabary)

Publications (2)

Publication Number Publication Date
JPS5699573A (en) 1981-08-10
JPS6316795B2 (en) 1988-04-11

Family

ID=11477937

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59080A Granted JPS5699573A (en) 1980-01-09 1980-01-09 Kanji (chinese character) distinction system using katakana (square form of japanese syllabary)

Country Status (1)

Country Link
JP (1) JPS5699573A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0255761A (en) * 1988-08-22 1990-02-26 Shikoku Chem Corp Polyamide resin composition
JPH0662490A (en) * 1992-08-05 1994-03-04 Mitsubishi Electric Corp Multichannel audio reproducing device

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58222379A (en) * 1982-06-18 1983-12-24 Fujitsu Ltd Processing system of correction of character recognition
JPS592191A (en) * 1982-06-29 1984-01-07 Fujitsu Ltd Recognizing and processing system of handwritten japanese sentence
JPS6334680A (en) * 1986-07-29 1988-02-15 Toshiba Corp Character reader
JPH0546806A (en) * 1991-08-20 1993-02-26 Oki Electric Ind Co Ltd Character recognition method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5347733A (en) * 1976-10-14 1978-04-28 Fujitsu Ltd Recognizing device for hand-written kana and chinese characters

Also Published As

Publication number Publication date
JPS5699573A (en) 1981-08-10
