JP2903599B2

JP2903599B2 - Character recognition device

Info

Publication number: JP2903599B2
Application number: JP2037604A
Authority: JP
Inventors: 学人杉本; 真司近藤; 哲也松本; 禎造成本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-02-19
Filing date: 1990-02-19
Publication date: 1999-06-07
Anticipated expiration: 2014-06-07
Also published as: JPH03240891A

Description

[Detailed Description of the Invention]

Field of Industrial Application

The present invention relates to a character recognition device that recognizes printed and handwritten characters, such as those in newspapers and magazines, including hiragana and katakana voiced (dakuon) and semi-voiced (handakuon) characters, and converts them into information such as JIS codes.

Prior Art

Some character recognition devices automatically correct errors among voiced, semi-voiced, and unvoiced (seion) characters in the result of recognizing an input image containing voiced and semi-voiced characters with known character recognition technology. Among these, some divide the recognition result into words according to the grammar of the language and correct it by collating each word against a pre-registered word dictionary.

Problems to Be Solved by the Invention

However, the conventional technique described above has the drawback that the word dictionary against which words are collated is enormous, so collation takes a very long time.

The present invention has been made in view of this point. Its object is to provide a character recognition device that corrects, quickly, automatically, and by a simple method, errors among voiced, semi-voiced, and unvoiced characters in the result of recognizing an input image containing voiced and semi-voiced characters with character recognition technology.

Means for Solving the Problems

To achieve the above object, the present invention is a character recognition device comprising: an image information input unit that inputs image information including a character recognition target; an image information memory unit that stores the image information input to the image information input unit; a character cutout unit that extracts character image information one character at a time from the image information stored in the image information memory unit; a character image information memory unit that stores the character image information extracted by the character cutout unit; a character recognition unit that performs character recognition on the character image information stored in the character image information memory unit to obtain a recognition result; and a voiced/semi-voiced character processing unit that corrects misrecognition of voiced, semi-voiced, and unvoiced characters using the recognition result obtained by the character recognition unit and the character image information stored in the character image information memory unit.

Operation

With the above configuration, the present invention stores the image information input by the image information input unit in the image information memory unit. The character cutout unit extracts character image information one character at a time from the stored image information, and the extracted character image information is stored in the character image information memory unit. The character recognition unit performs character recognition on the stored character image information and extracts a recognition result. The voiced/semi-voiced character processing unit then corrects misrecognition of voiced, semi-voiced, and unvoiced characters using the extracted recognition result and the character image information stored in the character image information memory unit.

Embodiments

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a configuration diagram of one embodiment of a character recognition device according to the present invention. Reference numeral 1 denotes an image information input unit, which scans an image including a character recognition target and stores it in the image information memory unit 2 as a binary signal. Reference numeral 3 denotes a character cutout unit, which extracts character image information one character at a time from the image information stored in the image information memory unit 2 and stores it in the character image information memory unit 4. Reference numeral 5 denotes a character recognition unit, which performs character recognition on the character image information stored in the character image information memory unit 4 and extracts a recognition result. Reference numeral 6 denotes a voiced/semi-voiced character processing unit, which corrects misrecognition of voiced, semi-voiced, and unvoiced characters using the recognition result extracted by the character recognition unit 5 and the character image information stored in the character image information memory unit 4.

The operation of the character recognition device configured as described above is explained using the input image 7 shown in FIG. 2 as an example.

The image 7 input from the image information input unit 1 is stored in the image information memory unit 2 as binary data, with black pixels in the character portions set to 1 and white pixels in the background set to 0.
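As a minimal sketch of this binary representation, the following Python function maps a grayscale scan to the 1 = black, 0 = white convention described above. The function name `binarize`, the 0-255 grayscale input, and the fixed threshold of 128 are assumptions made for illustration; the patent does not say how the binary signal is produced.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = black, 0 = white.

    The fixed threshold is an assumption for illustration only.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


if __name__ == "__main__":
    scan = [[0, 255], [200, 100]]
    print(binarize(scan))
```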

The character cutout unit 3 scans the input image 7 stored in the image information memory unit 2 in the horizontal direction and calculates the distances between black pixels. It likewise scans the input image 7 in the vertical direction and calculates the distances between black pixels. Based on the distance information between black pixels obtained from the vertical and horizontal scans, it extracts character image information 8 one character at a time and stores it in the character image information memory unit 4. FIG. 3 shows the state in which the character cutout unit 3 has extracted the character image information 8 one character at a time from the input image 7 of FIG. 2.
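One possible reading of this step is gap-based segmentation: rows and columns without black pixels separate text lines and characters. The Python sketch below illustrates that reading for horizontally written text. The names `runs_with_ink` and `segment_characters` and the `min_gap` parameter are hypothetical, and the patent does not specify how the distance information is thresholded.

```python
def runs_with_ink(profile, min_gap=1):
    """Return (start, end) ranges, end exclusive, of positions containing ink.

    Runs are split only where at least `min_gap` consecutive positions are empty.
    """
    runs, start, gap = [], None, 0
    for i, has_ink in enumerate(profile):
        if has_ink:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def segment_characters(image, min_gap=1):
    """Split a binary image (1 = black) into per-character sub-images."""
    # Rows that contain black pixels delimit text lines.
    row_profile = [any(row) for row in image]
    chars = []
    for top, bottom in runs_with_ink(row_profile, min_gap):
        line = image[top:bottom]
        # Columns that contain black pixels delimit characters within the line.
        col_profile = [any(row[x] for row in line) for x in range(len(line[0]))]
        for left, right in runs_with_ink(col_profile, min_gap):
            chars.append([row[left:right] for row in line])
    return chars
```

In this sketch `min_gap` would have to exceed the gaps that occur inside a single character (for example between the two strokes of 『ハ』, or between a base character and its voicing mark); otherwise one character would be split into several pieces.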

The character recognition unit 5 divides the character image information of each character stored in the character image information memory unit 4 into four parts horizontally and four parts vertically, for a total of 16 small regions. FIG. 4 shows the character image information 9 of 『す』 divided in this way. For each of the 16 small regions, the number of black pixels in the character portion and the number of white pixels in the background portion are calculated as feature values. These are collated with the feature values registered in advance for each character, and the most similar character, 『す』, is taken as the recognition result. FIG. 5 shows the recognition result 10 obtained when the character recognition unit 5 recognizes the character image information 8 of FIG. 3.
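The 4 x 4 zoning features described above can be sketched as follows. The function names `zone_features` and `recognize`, the `templates` dictionary, and the use of the sum of absolute differences as the similarity measure are assumptions; the patent says only that the most similar registered character is chosen. The sketch also assumes that the templates were computed from images of the same size as the input.

```python
def zone_features(char_img, grid=4):
    """Black and white pixel counts for each cell of a grid x grid partition.

    For grid=4 this yields 16 cells x 2 counts = 32 values.
    """
    h, w = len(char_img), len(char_img[0])
    feats = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            black = sum(char_img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            feats.extend([black, area - black])
    return feats


def recognize(char_img, templates):
    """Return the label whose registered features are closest to the image's.

    Closeness is the sum of absolute differences (an assumed measure).
    """
    feats = zone_features(char_img)
    return min(
        templates,
        key=lambda label: sum(abs(a - b) for a, b in zip(feats, templates[label])),
    )
```

Because a voicing mark occupies only a small part of one or two zones, features of this kind differ little between, for example, 『ハ』, 『バ』, and 『パ』, which is consistent with the need for a separate correction stage.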

The voiced/semi-voiced character processing unit 6 extracts the voiced, semi-voiced, and unvoiced characters from the recognition result 10 extracted by the character recognition unit 5 and, for the corresponding character image information 8 in the character image information memory unit 4, calculates the number of black pixel blocks, that is, groups of connected black pixels. FIG. 6 shows the calculation results for the black pixel blocks of 『ハ，バ，パ』. Next, the calculated number of black pixel blocks is collated with the pre-registered per-character black pixel block count information 11. The character whose number of black pixel blocks matches is taken as correct, the recognition result 10 extracted by the character recognition unit 5 is corrected accordingly, and the corrected recognition result 12 is obtained. FIG. 7 shows the pre-registered per-character black pixel block count information 11. FIG. 8 shows the recognition result 12 corrected by the voiced/semi-voiced character processing unit 6.
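The correction step can be sketched as connected-component counting followed by a table lookup. In the Python sketch below, the names `count_black_blocks`, `correct_character`, `BLOCK_COUNTS`, and `FAMILIES` are hypothetical, 8-connectivity is an assumption (the patent does not state which connectivity is used), and the block counts 2, 4, and 3 are illustrative values for typical glyph shapes rather than figures taken from FIG. 6 or FIG. 7.

```python
from collections import deque


def count_black_blocks(char_img):
    """Count 8-connected groups of black pixels (1 = black) in a binary image."""
    h, w = len(char_img), len(char_img[0])
    seen = [[False] * w for _ in range(h)]
    blocks = 0
    for y in range(h):
        for x in range(w):
            if char_img[y][x] != 1 or seen[y][x]:
                continue
            # New, unvisited black pixel: flood-fill its whole block.
            blocks += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if char_img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blocks


# Hypothetical registered data: assumed block counts for one character family
# (unvoiced, voiced, semi-voiced). Real values would come from the registered table.
BLOCK_COUNTS = {"ハ": 2, "バ": 4, "パ": 3}
FAMILIES = [("ハ", "バ", "パ")]


def correct_character(recognized, char_img):
    """Return the family member whose registered block count matches the image."""
    for family in FAMILIES:
        if recognized in family:
            n = count_black_blocks(char_img)
            for candidate in family:
                if BLOCK_COUNTS[candidate] == n:
                    return candidate
    # Not in any family, or no registered count matched: keep the original result.
    return recognized
```

Restricting the lookup to the family of the recognized character keeps the comparison to at most three candidates per character, in line with the stated aim of avoiding a large word dictionary.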

With the character recognition device configured as described above, character image information can be extracted from an input image to obtain a recognition result, and misrecognition of voiced, semi-voiced, and unvoiced characters can additionally be corrected.

Effects of the Invention

As described above, according to the present invention, a recognition result can be obtained by a simple method from an input image including a recognition target, and misrecognized voiced, semi-voiced, and unvoiced characters can additionally be corrected automatically.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of a character recognition device according to one embodiment of the present invention; FIG. 2 is an explanatory diagram of an input image; FIG. 3 is an explanatory diagram of the character image information for each character; FIG. 4 is an explanatory diagram of the divided state of the character image information of 『す』; FIG. 5 is an explanatory diagram of the recognition result before correction; FIG. 6 is an explanatory diagram of the calculation results for the black pixel blocks of 『ハ，バ，パ』; FIG. 7 is an explanatory diagram of the per-character black pixel block count information; and FIG. 8 is an explanatory diagram of the recognition result after correction.

1: image information input unit; 2: image information memory unit; 3: character cutout unit; 4: character image information memory unit; 5: character recognition unit; 6: voiced/semi-voiced character processing unit; 7: input image; 8: character image information; 9: divided state of the character image information of 『す』; 10: recognition result before correction; 11: per-character black pixel block count information; 12: recognition result after correction.

Continuation of front page
(72) Inventor: Sadazo Narimoto, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(58) Field searched (Int. Cl.⁶, DB name): G06K 9/03

Claims

(57) [Claims]

[Claim 1] A character recognition device comprising: an image information input unit that inputs image information including a character recognition target; an image information memory unit that stores the image information input to the image information input unit; a character cutout unit that extracts character image information one character at a time from the image information stored in the image information memory unit; a character image information memory unit that stores the character image information extracted by the character cutout unit; a character recognition unit that performs character recognition on the character image information stored in the character image information memory unit to obtain a recognition result; and a voiced/semi-voiced character processing unit that corrects misrecognition of voiced, semi-voiced, and unvoiced characters using the recognition result obtained by the character recognition unit and the character image information stored in the character image information memory unit.