JPS6316795B2 - Google Patents
Info
- Publication number
- JPS6316795B2
- Authority
- JP
- Japan
- Prior art keywords
- kanji
- discrimination
- character
- katakana
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/242—Division of the character sequences into groups prior to recognition; Selection of dictionaries
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Character Discrimination (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a kanji discrimination method that uses furigana (e.g., katakana) discrimination, and more particularly to a kanji discrimination method for an optical character reader.
Conventionally, optical character readers have used a variety of character recognition methods, but every one of them produces its result by discriminating a single character from an unknown pattern.
That is, the characters written on the form are first captured into a memory unit by photoelectric conversion. Then, as shown in Fig. 1, the captured pattern 1 (usually called the unknown pattern) is discriminated by some recognition method 2, and the character is determined (3).
In the case of handwritten character recognition, as shown in Fig. 2, the input pattern obtained by photoelectric conversion is first normalized (4) to remove handwriting variations such as writing style so that the character can be recognized stably. Next, a feature extraction circuit extracts features from the shape of the character pattern (5), and the result is passed to an identification circuit for recognition (6). Identification 6 is the operation of determining the unknown input character from the set of features obtained in feature extraction 5.
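The three stages of Fig. 2 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the binary-grid representation, the bounding-box rescaling used as "normalization", the row and column ink counts used as features, and the nearest-neighbour rule used for identification. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background


def normalize(pattern: Grid, size: int = 8) -> Grid:
    """Stage 4 (assumed form): crop to the ink bounding box, rescale to size x size."""
    rows = [r for r, row in enumerate(pattern) if any(row)]
    cols = [c for c in range(len(pattern[0])) if any(row[c] for row in pattern)]
    if not rows or not cols:
        return [[0] * size for _ in range(size)]  # blank input stays blank
    top, left = rows[0], cols[0]
    height, width = rows[-1] - top + 1, cols[-1] - left + 1
    # Nearest-neighbour resampling of the cropped region.
    return [[pattern[top + (r * height) // size][left + (c * width) // size]
             for c in range(size)] for r in range(size)]


def extract_features(grid: Grid) -> Tuple[int, ...]:
    """Stage 5 (assumed form): ink count per row, then ink count per column."""
    row_counts = [sum(row) for row in grid]
    col_counts = [sum(col) for col in zip(*grid)]
    return tuple(row_counts + col_counts)


def identify(features: Tuple[int, ...],
             references: Dict[str, Tuple[int, ...]]) -> str:
    """Stage 6 (assumed form): label of the nearest reference vector (L1 distance)."""
    def distance(label: str) -> int:
        return sum(abs(a - b) for a, b in zip(features, references[label]))
    return min(references, key=distance)
```

Because the unknown pattern is cropped and rescaled before features are taken, the same shape written larger or at a different position in the frame maps to the same feature vector in this sketch.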
For character feature extraction, the pattern matching method is widely used for printed characters and for everyday handwritten characters. It recognizes a character by examining the degree of match between standard patterns and the unknown input pattern. Other methods include the stroke analysis method, the character contour analysis method, the feature symbol sequence method, the geometric feature extraction method, and the A-b-S (Analysis-by-Synthesis) method.
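As a rough illustration of pattern matching, the sketch below scores an unknown binary pattern against each standard pattern and picks the best match. The cell-agreement score and the names `match_degree` and `recognize` are assumptions for illustration; the patent does not specify how the degree of match is computed.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background


def match_degree(standard: Grid, unknown: Grid) -> float:
    """Fraction of cells on which two equally sized binary grids agree."""
    cells = [(s, u)
             for s_row, u_row in zip(standard, unknown)
             for s, u in zip(s_row, u_row)]
    return sum(1 for s, u in cells if s == u) / len(cells)


def recognize(unknown: Grid, standards: Dict[str, Grid]) -> Tuple[str, float]:
    """Return the label of the best-matching standard pattern and its score."""
    best = max(standards, key=lambda label: match_degree(standards[label], unknown))
    return best, match_degree(standards[best], unknown)
```

With only a handful of standard patterns the best score is usually well separated from the rest; with thousands of patterns, many of them similar, several candidates can score almost equally, which is the difficulty the following paragraphs describe for kanji.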
With the character recognition methods described above, misreading is rare when the character set is small, as with digits, Latin letters, kana, and symbols.
When it comes to kanji, however, about 2,000 different characters must be distinguished. The conventional discrimination approach therefore has the problem that it cannot recognize kanji correctly.
SUMMARY OF THE INVENTION
To eliminate these conventional problems, the object of the present invention is to provide a kanji discrimination method using katakana discrimination that reduces misreading and recognizes kanji correctly in an optical character reader.
The kanji discrimination method using katakana discrimination according to the present invention is characterized in that kana discrimination is added to and combined with the conventional discrimination method: kana characters are written alongside each kanji, the kana are used to narrow down the candidate kanji to some extent, and the decision is made by combining this with the result of kanji discrimination.
To recognize a character pattern printed on paper, its optical image must be observed and converted into an electrical signal that the recognition device can process, and the line position and character position of the character line being observed must be determined.
Line positions and character positions are specified as approximate positions relative to the reference edge of the form, but they vary because of printing misalignment and paper cutting, so the character observation system must position itself on each form before scanning.
Embodiments of the present invention are described below with reference to the drawings.
Fig. 3 shows an example of a form used to explain the kanji discrimination method using katakana discrimination according to the present invention. An address and a name are written on the form.
In Fig. 3, form 7 carries character frames 8, in which a kanji 9 and its corresponding katakana 10 are written. Mark 11 at the right edge of form 7 indicates the line position of the characters to be read.
Fig. 4 is a configuration diagram of the kanji discrimination method using katakana discrimination according to the present invention.
As shown in Fig. 4, the unknown pattern 12 of the katakana portion corresponding to a single kanji is extracted from the form by photoelectric conversion, and katakana discrimination 13 is performed by the same conventional method shown in Figs. 1 and 2. The kanji 14 estimated from the discriminated katakana are then retrieved from the memory unit. Kanji A, B, ..., X are the estimated kanji.
Meanwhile, the unknown pattern 15 of a single kanji is extracted from the form. Kanji discrimination 16 is performed on this unknown pattern 15 using the same conventional approach shown in Figs. 1 and 2, and a determination result is produced. However, because there are about 2,000 kanji, several similar kanji 17 are obtained as the result. Kanji A', ..., X' are these similar kanji.
The kanji determination 17 and the katakana-based estimate 14 described above are then combined, discrimination 18 is performed by a similarity method or the like, and the final determination 19 is made.
The kanji discrimination method with katakana discrimination of Fig. 4 is shown in more concrete form in Figs. 5, 6, and 7.
Katakana discrimination 13 and kanji discrimination 16 in Fig. 4 use the conventional method described with reference to Figs. 1 and 2.
Fig. 5 shows how the kanji 14 are estimated from the result of katakana discrimination 13 in Fig. 4.
That is, kanji 21 are stored in memory in correspondence with kana 20, and the estimated kanji 14 are retrieved from that memory.
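The kana-to-kanji table of Fig. 5 behaves like a dictionary lookup. The sketch below is illustrative only: the three entries are ordinary readings chosen as examples, not the contents of the patent's figure, and `KANA_TO_KANJI` and `candidates_from_kana` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical table: recognized katakana reading -> kanji with that reading.
# The entries are illustrative and are not taken from Fig. 5.
KANA_TO_KANJI: Dict[str, List[str]] = {
    "カワ": ["川", "河", "皮", "革"],
    "ハシ": ["橋", "箸", "端"],
    "ヤマ": ["山"],
}


def candidates_from_kana(kana: str) -> List[str]:
    """Return the kanji stored for a reading, or an empty list if none is stored."""
    return list(KANA_TO_KANJI.get(kana, []))
```

A reading such as カワ narrows roughly 2,000 possible kanji down to a handful of candidates, which is the narrowing step the description relies on.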
Fig. 6 shows a specific method for discrimination 18 in Fig. 4.
First, one character a of the kanji 14 estimated from the katakana is loaded into register 22. Next, one character b of the similar kanji 17 obtained from kanji discrimination 16 is loaded into another register 23. Similarity calculation 24 is then performed on the two characters a and b by a fixed-point sampling method or the like. This is repeated for every combination of a kanji from the first group with a kanji from the second.
The results are entered into a matrix as shown in Fig. 7, and the kanji is finally determined (25) from the similarity values.
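The pairwise comparison and matrix of Figs. 6 and 7 might be sketched as follows. Several points are assumptions, since the patent leaves them open: fixed-point sampling is modelled as reading every `step`-th cell of a stored reference grid, similarity is the fraction of sampled points that agree, and the final rule returns the kana-side candidate of the highest-scoring pair. The 4x4 grids are arbitrary toy patterns, not real glyph bitmaps, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background

# Arbitrary toy reference patterns (not real glyph shapes).
TOY_PATTERNS: Dict[str, Grid] = {
    "川": [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]],
    "河": [[1, 0, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 1, 1]],
    "何": [[0, 0, 1, 0], [1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0]],
}


def sample_points(grid: Grid, step: int = 2) -> List[int]:
    """Read the grid at fixed positions: every step-th row and column."""
    return [grid[r][c]
            for r in range(0, len(grid), step)
            for c in range(0, len(grid[0]), step)]


def similarity(a: Grid, b: Grid, step: int = 2) -> float:
    """Fraction of the fixed sample points at which two patterns agree."""
    sa, sb = sample_points(a, step), sample_points(b, step)
    return sum(1 for x, y in zip(sa, sb) if x == y) / len(sa)


def similarity_matrix(from_kana: List[str], from_kanji: List[str],
                      patterns: Dict[str, Grid], step: int = 2) -> List[List[float]]:
    """Rows: kanji estimated from the kana. Columns: kanji from kanji discrimination."""
    return [[similarity(patterns[a], patterns[b], step) for b in from_kanji]
            for a in from_kana]


def final_decision(from_kana: List[str], from_kanji: List[str],
                   patterns: Dict[str, Grid], step: int = 2) -> Tuple[str, float]:
    """Assumed rule: return the kana-side kanji of the highest-scoring pair."""
    matrix = similarity_matrix(from_kana, from_kanji, patterns, step)
    best_score, best_row = -1.0, 0
    for i, row in enumerate(matrix):
        for score in row:
            if score > best_score:
                best_score, best_row = score, i
    return from_kana[best_row], best_score
```

When the same kanji appears in both candidate lists its pair scores 1.0, so under this assumed rule the decision reduces to picking the character on which the two recognizers agree; otherwise the closest pair wins.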
As described above, according to the present invention, a character reading system such as an OCR reads furigana (for example, katakana) in addition to discriminating kanji, so kanji can be discriminated with fewer misreadings.
That is, in the kanji discrimination method of the present invention using furigana discrimination (for example, katakana discrimination), kana are rarely misread, so the kanji estimated from them are also rarely wrong. In addition, because this estimate is combined with the result of the kanji discrimination itself, accurate reading becomes possible that could not be achieved with the conventional approach.
Fig. 1 is a block diagram of a conventional discrimination method; Fig. 2 is an explanatory diagram of conventional character recognition; Fig. 3 shows an example of a form used with the kanji discrimination method of the present invention; Fig. 4 is a system configuration diagram of a kanji discrimination method with katakana discrimination according to an embodiment of the present invention; Fig. 5 shows a specific example of kanji corresponding to the katakana discrimination of Fig. 4; Fig. 6 shows a practical example of the final discrimination method of Fig. 4; and Fig. 7 shows an example of a kanji determination method based on similarity calculation.
1: unknown pattern; 2: recognition; 3: determination; 4: normalization; 5: feature extraction; 6: identification; 7: form; 8: character frame; 9: kanji; 10: katakana; 11: reading mark; 12: unknown pattern of the katakana portion; 13: katakana discrimination; 14, 17: kanji; 15: unknown pattern of the kanji portion; 16: kanji discrimination; 18: discrimination; 19, 25: determination; 20: katakana; 21: kanji; 22: kanji from katakana; 23: kanji from kanji discrimination; 24: similarity calculation.
Claims (1)
1. A kanji discrimination method using furigana discrimination, for an optical character reader that recognizes kanji written on a form provided with one frame in which a single kanji can be written and, corresponding to it, one frame in which furigana consisting of a plurality of characters can be written, characterized by comprising: recognition means for recognizing the kanji written in the one frame on the form and the furigana written in the one frame corresponding to that kanji; and means for comparing the recognition results for the kanji and for the furigana corresponding to that kanji to determine the candidate characters for the kanji.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59080A JPS5699573A (en) | 1980-01-09 | 1980-01-09 | Kanji (chinese character) distinction system using katakana (square form of japanese syllabary) |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59080A JPS5699573A (en) | 1980-01-09 | 1980-01-09 | Kanji (chinese character) distinction system using katakana (square form of japanese syllabary) |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS5699573A (en) | 1981-08-10 |
JPS6316795B2 (en) | 1988-04-11 |
Family
ID=11477937
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59080A Granted JPS5699573A (en) | 1980-01-09 | 1980-01-09 | Kanji (chinese character) distinction system using katakana (square form of japanese syllabary) |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS5699573A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0255761A (en) * | 1988-08-22 | 1990-02-26 | Shikoku Chem Corp | Polyamide resin composition |
JPH0662490A (en) * | 1992-08-05 | 1994-03-04 | Mitsubishi Electric Corp | Multichannel audio reproducing device |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS58222379A (en) * | 1982-06-18 | 1983-12-24 | Fujitsu Ltd | Processing system of correction of character recognition |
JPS592191A (en) * | 1982-06-29 | 1984-01-07 | Fujitsu Ltd | Recognizing and processing system of handwritten japanese sentence |
JPS6334680A (en) * | 1986-07-29 | 1988-02-15 | Toshiba Corp | Character reader |
JPH0546806A (en) * | 1991-08-20 | 1993-02-26 | Oki Electric Ind Co Ltd | Character recognition method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5347733A (en) * | 1976-10-14 | 1978-04-28 | Fujitsu Ltd | Recognizing device for hand-written kana and chinese characters |
Also Published As
Publication number | Publication date |
---|---|
JPS5699573A (en) | 1981-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2553608B2 (en) | Optical character reader | |
US6886136B1 (en) | Automatic template and field definition in form processing | |
US5040226A (en) | Courtesy amount read and transaction balancing system | |
EP0862132A2 (en) | Robust identification code recognition system | |
JP2713622B2 (en) | Tabular document reader | |
JPS6316795B2 (en) | ||
EP3477547B1 (en) | Optical character recognition systems and methods | |
JPH07182448A (en) | Character recognition method | |
JPH05108806A (en) | Picture characteristic extracting method and device | |
JP2877380B2 (en) | Optical character reader | |
JPH0991385A (en) | Character recognition dictionary adding method and terminal ocr device using same | |
JP2925270B2 (en) | Character reader | |
JP2906758B2 (en) | Character reader | |
JPH0426153B2 (en) | ||
JP3151866B2 (en) | English character recognition method | |
JP2924356B2 (en) | Optical character reader | |
JP2832035B2 (en) | Character recognition device | |
JP2600703B2 (en) | Partial line collation device | |
JPS6074094A (en) | Character recognizing device | |
JPS62295192A (en) | Optical character image reader | |
JPH0319589B2 (en) | ||
JPH0877293A (en) | Character recognition device and generating method for dictionary for character recognition | |
EP0114996A2 (en) | Character recognition utilizing transition measurements | |
JP2727755B2 (en) | Character string recognition method and apparatus | |
JPH0628521A (en) | Optical character reader |