JP2001143020A - Character recognition device, method and recording medium thereof - Google Patents

Character recognition device, method and recording medium thereof

Info

Publication number
JP2001143020A
Authority
JP
Japan
Prior art keywords
character
character recognition
reliability
type
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP32061799A
Other languages
Japanese (ja)
Inventor
Junji Kashioka
潤二 柏岡
Satoshi Naoi
聡 直井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP32061799A
Publication of JP2001143020A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

PROBLEM TO BE SOLVED: To output a character recognition result quickly, efficiently, and with high reliability by determining the character type of characters in a document or the like whose type (e.g. handwritten or printed) is unknown. SOLUTION: This character recognition device is provided with a means for reading an image from a document or the like, a means for determining the character type of the characters in the read image, and a means for executing character recognition processing based on the determined character type. When the reliability of the character recognition result is at or above a threshold, that result is output. When the reliability is below the threshold, the other character recognition processing is executed, the reliability it yields is compared with the original reliability, and the character recognition result with the higher reliability is output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device, a character recognition method, and a recording medium for recognizing characters in documents whose character type (for example, handwritten or printed) is unknown. In recent years, demand for character recognition software as an input peripheral has been increasing. Such software is expected to recognize characters automatically and with higher reliability when a document mixes handwritten and printed characters, or when it is unknown whether the characters entered in the character entry fields of a form are handwritten or printed.

[0002]

[Prior Art] Conventionally, to recognize a document in which handwritten and printed characters are mixed, or a form in which it is unknown whether the characters entered in a character entry field are handwritten or printed, both handwritten character recognition and printed character recognition were performed, and the more reliable of the two recognition results was adopted.

[0003] Alternatively, as described in Japanese Patent Application No. 10-357701, the character type (handwritten or printed) was determined, and recognition processing for the determined character type was performed to obtain the recognition result.

[0004]

[Problems to Be Solved by the Invention] In the former approach, both handwritten and printed character recognition are performed and the more reliable result is output. Because both recognition processes always run, a great deal of time is required and one of the two processes is always wasted, so a document whose characters may be either handwritten or printed cannot be recognized quickly and efficiently.

[0005] In the latter approach, the character type is determined, only the recognition process for the determined type is performed, and its result is output. If the handwritten/printed determination is wrong for any reason, the reliability of the recognition result becomes very low.

[0006] To solve these problems, the present invention determines the character type (handwritten, printed, and so on) of characters in a document or form and performs character recognition for the determined type; if the reliability is below a threshold, it performs the other character recognition process and outputs the more reliable result. Alternatively, it performs the faster character recognition process first and, when the determined character type matches the type of the applied recognition process and the reliability of the result is below the threshold, performs the other recognition process and outputs the more reliable result. The object is to output a highly reliable character recognition result quickly and efficiently when the character type (handwritten, printed, and so on) is unknown.

[0007]

[Means for Solving the Problems] The means for solving the problems will be described with reference to FIG. 1. In FIG. 1, the processing device 1 executes various processes in accordance with a program and here comprises reading means 2, determining means 3, character recognition means 4, and the like.

[0008] The reading means 2 controls a scanner or the like to read, from a document or form, characters for which it is unknown whether they are handwritten or printed. The determining means 3 determines, from the read image (image data), whether the characters are handwritten or printed.

[0009] The character recognition means 4 performs handwritten character recognition, printed character recognition, and the like (see FIGS. 2 and 3). The operation is described next.

[0010] The reading means 2 reads an image of characters from a document or form, and the determining means 3 determines the character type (handwritten, printed, and so on) from the read image. The character recognition means 4 performs character recognition based on the determined character type and outputs the recognition result when its reliability is at or above a threshold. When the reliability is below the threshold, it performs the other character recognition process, compares the resulting reliability with the first one, and outputs the recognition result with the higher reliability.

[0011] Alternatively, the reading means 2 reads an image from a document or form, and the character recognition means 4 determines the character type from the read image and then performs character recognition for whichever character type can be processed faster. When the determined character type matches the character type of the applied recognition process, the recognition result is output if its reliability is at or above the threshold. If the reliability is below the threshold, the other recognition process is performed, its reliability is compared with the first one, and the recognition result with the higher reliability is output.

[0012] Likewise, the reading means 2 reads an image from a document or form, and the character recognition means 4 determines the character type from the read image and then performs character recognition for whichever character type can be processed faster. When the determined character type does not match the character type of the applied recognition process and the reliability of the recognition result is below the threshold, the other character recognition process is performed and its result is output. When the reliability is at or above the threshold, the other recognition process is performed, its reliability is compared with the first one, and the recognition result with the higher reliability is output.

[0013] Accordingly, the character type (handwritten, printed, and so on) of characters in a document or form is determined and character recognition for the determined type is performed; if the reliability is below the threshold, the other recognition process is performed and the more reliable recognition result is output. Alternatively, the faster character recognition process is performed and, when the determined character type matches the type of the applied recognition process and the reliability of the result is below the threshold, the other recognition process is performed and the more reliable result is output. In this way a highly reliable character recognition result can be output quickly and efficiently when the character type is unknown.

[0014]

[Embodiments] Next, embodiments and operation of the present invention will be described in detail in order with reference to FIGS. 1 to 4.

[0015] FIG. 1 shows a system configuration diagram of the present invention. In FIG. 1, the processing device 1 loads a program read from a recording medium (not shown) into main memory, starts it, and executes the various processes described below. Here it comprises reading means 2, determining means 3, character recognition means 4, and the like.

[0016] The reading means 2 controls a scanner or the like to read, from a document or form, an image whose characters may be either handwritten or printed. The determining means 3 determines from the read image whether the characters are handwritten or printed (described later with reference to FIG. 4).

[0017] The character recognition means 4 obtains the circumscribed rectangle of each character in the image, extracts the character pattern, and performs character recognition by collating the pattern with the handwritten character dictionary 5 or the printed character dictionary 6 (described later with reference to FIGS. 2 and 3).
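The circumscribed-rectangle step in [0017] can be sketched as follows. This is a minimal illustration only: the source does not say how the image is represented or how characters are segmented, so the binary list-of-lists image and the names `bounding_box` and `crop_pattern` are assumptions, and the sketch handles an image that already contains a single character.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed representation: 1 = ink pixel, 0 = background


def bounding_box(image: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of the ink pixels.

    Returns None when the image contains no ink at all.
    """
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_pattern(image: Image) -> Image:
    """Cut the character pattern out of the image along its bounding box."""
    box = bounding_box(image)
    if box is None:
        return []
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

The cropped pattern is what would then be collated with a dictionary; any normalization to a fixed size before matching is not described in the source.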

[0018] The handwritten character dictionary 5 is a dictionary for collating a character pattern read from handwritten characters with the character patterns registered in advance in the dictionary and recognizing it as the character that matches (the one with the smallest distance).

[0019] The printed character dictionary 6 is a dictionary for collating a character pattern read from printed characters with the character patterns registered in advance in the dictionary and recognizing it as the character that matches (the one with the smallest distance).
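Dictionary collation by smallest distance, as described for dictionaries 5 and 6, might look like the sketch below. The source specifies neither the feature representation nor the distance measure, so the fixed-length feature vectors, the Euclidean distance, and the function name `recognize` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def recognize(pattern: List[float],
              dictionary: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return (character, distance) for the closest registered pattern.

    Euclidean distance is an assumed stand-in for the unspecified
    distance used by the dictionaries in the source.
    """
    if not dictionary:
        raise ValueError("dictionary is empty")
    best_char, best_dist = "", math.inf
    for char, reference in dictionary.items():
        dist = math.dist(pattern, reference)
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char, best_dist
```

The same function serves either dictionary; only the registered patterns differ. How a distance is converted into the reliability value used later in the flowcharts is not stated in the source.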

[0020] The output file 7 is the file to which recognition results are output. The display device 8 displays screens and the like. The input device 9 comprises various input devices, here a mouse, a keyboard, and a scanner that scans handwritten or printed characters on a document or form and reads them as an image.

[0021] Next, the operation of the configuration of FIG. 1 will be described in detail, following the order of the flowchart of FIG. 2. FIG. 2 shows an operation flowchart (part 1) of the present invention. In this example, the character type (handwritten or printed) is determined for an image read from a document or form, and character recognition for the determined type is performed. When the reliability of the recognition result is at or above the threshold, the result is output. When it is below the threshold, the other character recognition process is performed, the two reliabilities are compared, and the recognition result with the higher reliability is output.

[0022] In FIG. 2, S1 inputs an image. A scanner reads a document or form whose characters may be either handwritten or printed, and the image is captured (input).

[0023] S2 extracts character patterns. For the image captured in S1, the circumscribed rectangle of each character is obtained and the character pattern is extracted. S3 discriminates between handwritten and printed characters. As described later with reference to FIG. 4, the determination relies on features such as printed characters having a high frequency of straight lines and handwritten characters a low one.

[0024] S4 checks whether the determination result of S3 is handwritten. If YES, the characters have been found to be handwritten, so the process proceeds to S5. If NO, the characters have been found to be printed, so the process proceeds to S10.

[0025] S5 performs handwritten character recognition, since the characters were found to be handwritten (YES in S4). The character pattern extracted for each cell in S2 is compared with the handwritten character patterns registered in advance in the handwritten character dictionary 5, and the character with the greatest similarity (smallest distance) is generated as the handwritten character recognition result.

[0026] S6 checks whether the reliability of the handwritten character recognition result from S5 is at or above the threshold. If YES, the handwritten character recognition result is output in S7. If NO, the reliability has been found to be below the threshold, so the process proceeds to S8.

[0027] In S8, since the reliability of handwritten character recognition was found to be below the threshold (NO in S6), the character recognition process that has not yet been applied (here, printed character recognition) is applied. The character pattern extracted for each cell in S2 is compared with the printed character patterns registered in advance in the printed character dictionary 6, and the character with the greatest similarity (smallest distance) is generated as the printed character recognition result.

[0028] S9 outputs the more reliable recognition result. Of the printed character recognition result obtained in S8 and the handwritten character recognition result obtained in S5, the one with the higher reliability is output.

[0029] S10 performs printed character recognition when the characters were found to be printed (NO in S4). As described above, the character pattern extracted in S2 is compared with the printed character patterns registered in advance in the printed character dictionary 6, and the character with the greatest similarity (smallest distance) is generated as the printed character recognition result.

[0030] S11 checks whether the reliability is at or above the threshold. If YES, the printed character recognition result is output in S12. If NO, the reliability of the printed character recognition result has been found to be below the threshold, so the character recognition process not yet applied (here, handwritten character recognition) is applied in S8 as already described, and the recognition result with the higher reliability is output in S9.

[0031] As described above, when the characters in an image read from a document or form are determined to be handwritten, handwritten character recognition is performed. If the reliability is above the threshold, the handwritten character recognition result is output; if it is below the threshold, printed character recognition is also performed, the two reliabilities are compared, and the more reliable recognition result is output. Similarly, when the characters are determined to be printed, printed character recognition is performed. If the reliability is above the threshold, the printed character recognition result is output; if it is below the threshold, handwritten character recognition is also performed, the two reliabilities are compared, and the more reliable recognition result is output. By determining whether the characters are handwritten or printed, running the recognition process for the determined type, outputting its result when the reliability is high enough, and otherwise running the other recognition process and outputting the more reliable of the two results, a recognition result can be output quickly and with higher reliability.
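The control flow of FIG. 2 (S4 to S12) can be summarized in a short sketch. The recognizers are passed in as plain callables returning a `(text, reliability)` pair; that interface, the function name `recognize_by_type`, and the choice to treat a reliability exactly equal to the threshold as sufficient are assumptions, since the source does not define them precisely.

```python
from typing import Callable, Tuple

Result = Tuple[str, float]            # (recognized text, reliability)
Recognizer = Callable[[object], Result]


def recognize_by_type(pattern: object,
                      is_handwritten: bool,
                      handwritten: Recognizer,
                      printed: Recognizer,
                      threshold: float) -> Result:
    """Run the recognizer for the determined type; fall back only if needed."""
    # S4: pick the recognizer that matches the determined character type.
    first, other = (handwritten, printed) if is_handwritten else (printed, handwritten)
    first_result = first(pattern)                 # S5 or S10
    if first_result[1] >= threshold:              # S6 or S11
        return first_result                       # S7 or S12
    other_result = other(pattern)                 # S8: the process not yet applied
    # S9: output whichever result has the higher reliability.
    return max(first_result, other_result, key=lambda r: r[1])
```

The second recognizer runs only when the first result is not reliable enough, which is where the time saving over always running both comes from.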

[0032] FIG. 3 shows an operation flowchart (part 2) of the present invention. In this example, the character type (handwritten or printed) is determined for an image read from a document or form, and then whichever of handwritten or printed character recognition can be processed faster is performed. When the determined character type matches the character type of the applied recognition process, the recognition result is output if its reliability is at or above the threshold; if it is below the threshold, the other character recognition process is performed, the two reliabilities are compared, and the recognition result with the higher reliability is output.

[0033] In FIG. 3, S21 inputs an image. A scanner reads a document or form whose characters may be either handwritten or printed, and the image is captured (input).

[0034] S22 extracts character patterns. For the image captured in S21, the circumscribed rectangle of each character is obtained and the character pattern is extracted. S23 discriminates between handwritten and printed characters. As described later with reference to FIG. 4, the determination relies on features such as printed characters having a high frequency of straight lines and handwritten characters a low one.

[0035] S24 performs character recognition. Here:

  • printed character recognition has few categories in the printed character dictionary 6 and a short processing time;
  • handwritten character recognition has many categories in the handwritten character dictionary 5 and a long processing time.

The printed character recognition process, which has the shorter processing time, is therefore performed. Conversely, if handwritten character recognition has the shorter processing time, handwritten character recognition is performed instead.

[0036] S25 checks whether the determined character type matches the character type of the recognition process. If YES (for example, the character type determined in S23 is printed and the faster recognition process applied in S24 is also for printed characters), the two match and the process proceeds to S26. If NO in S25 (the two do not match), the process proceeds to S30.

[0037] In S26, since the two were found to match in S25, it is checked whether the reliability of the recognition result from S24 is greater than the threshold. If YES, the character recognition result of S24 is output in S27. If NO, the reliability of the S24 result has been found to be below the threshold, so the process proceeds to S28.

[0038] S28 applies the character recognition process that has not yet been applied. That is, it performs the recognition process (for example, handwritten character recognition) other than the faster one (for example, printed character recognition) performed in S24.

[0039] S29 outputs the more reliable recognition result. The reliability of the result from the faster recognition process in S24 is compared with the reliability of the result from the other (slower) recognition process in S28, and the character recognition result with the higher reliability is output.

[0040] In S30, since the two character types were found not to match in S25, it is checked whether the reliability of the recognition result is lower than the threshold. If YES, the reliability of the S24 result has been found to be below the threshold, so the character recognition process not yet applied is performed in S31 and its result is output in S32. If NO in S30, the reliability of the S24 result has been found not to be below the threshold, so, as already described, the character recognition process not yet applied is applied in S28 and the more reliable recognition result is output in S29.

[0041] As described above, the character recognition process with the shorter processing time is performed on the image read from a document or form. When the determined character type matches the recognition character type and the reliability is at or above the threshold, the recognition result is output (S27); when the reliability is below the threshold, the other character recognition process is performed and the more reliable result is output (S29). When the determined character type does not match the recognition character type, the other character recognition process is performed and its result is output if the reliability of the first result is below the threshold (S32); otherwise the other character recognition process is performed and the more reliable result is output (S29). In this way a recognition result can be output quickly and with higher reliability.
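The branching of FIG. 3 (S24 to S32) can be sketched in the same style. The string labels for character types, the `(text, reliability)` result pair, the function name `recognize_fast_first`, and the handling of a reliability exactly equal to the threshold are assumptions for illustration.

```python
from typing import Callable, Tuple

Result = Tuple[str, float]            # (recognized text, reliability)
Recognizer = Callable[[object], Result]


def recognize_fast_first(pattern: object,
                         determined_type: str,
                         fast_type: str,
                         fast: Recognizer,
                         slow: Recognizer,
                         threshold: float) -> Result:
    """Always start with the faster recognizer, then branch as in FIG. 3."""
    fast_result = fast(pattern)                        # S24
    if determined_type == fast_type:                   # S25: types match
        if fast_result[1] >= threshold:                # S26
            return fast_result                         # S27
        slow_result = slow(pattern)                    # S28
        return max(fast_result, slow_result, key=lambda r: r[1])  # S29
    # S25: types do not match, so the other recognizer runs in either branch.
    slow_result = slow(pattern)                        # S31 or S28
    if fast_result[1] < threshold:                     # S30
        return slow_result                             # S32: output it as is
    return max(fast_result, slow_result, key=lambda r: r[1])      # S29
```

Note that in the mismatch case both recognizers always run; the saving comes from the matched case, where a sufficiently reliable fast result is output immediately.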

[0042] FIG. 4 is an explanatory diagram of handwritten/printed character discrimination according to the present invention. It covers four categories:

  • printed numerals (printed/numeral)
  • printed kanji (printed/kanji)
  • handwritten numerals (hand/numeral)
  • handwritten kanji (hand/kanji)

When actual samples are measured with the number of strokes on the horizontal axis and the frequency of straight lines on the vertical axis, they tend to fall within the ranges shown by the ellipses on the right side of the figure. That is, printed characters have a high frequency of straight lines and handwritten characters a low one.

[0043] The discriminant function at the lower left of the figure is derived by well-known discriminant analysis. It is a straight line perpendicular to the line passing through the centers of the ellipses at the upper right, which represent the populations of printed and handwritten characters.
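One function consistent with this geometric description can be written down explicitly. The symbols below do not appear in the source: $\boldsymbol{\mu}_p$ and $\boldsymbol{\mu}_h$ stand for the centers of the printed and handwritten populations, and placing the boundary at the midpoint between them is an assumption.

```latex
f(\mathbf{x}) = (\boldsymbol{\mu}_p - \boldsymbol{\mu}_h)^{\top}
\left(\mathbf{x} - \tfrac{1}{2}\,(\boldsymbol{\mu}_p + \boldsymbol{\mu}_h)\right),
\qquad
\mathbf{x} = \begin{pmatrix} \text{number of strokes} \\ \text{frequency of straight lines} \end{pmatrix}
```

The boundary $f(\mathbf{x}) = 0$ is a straight line perpendicular to $\boldsymbol{\mu}_p - \boldsymbol{\mu}_h$, and $f$ is positive on the printed side. A discriminant analysis that weights by within-class covariance would in general tilt this line; the source does not say which variant was used.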

[0044] Accordingly, the circumscribed rectangle of each character is obtained from the image read by the scanner from a document or form, and a character pattern is generated. The frequency of straight lines and the number of strokes of the character pattern are obtained and substituted into the discriminant function shown in the figure; the character can then be determined automatically as printed when the value is positive and as handwritten when it is negative.
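The sign test in [0044] can be illustrated with a small sketch. It assumes the discriminant is the line perpendicular to the segment joining the two class centroids and passing through its midpoint; the source only says the function comes from known discriminant analysis, so this particular construction, the names `make_discriminant` and `classify`, and the treatment of a value of exactly zero are assumptions.

```python
from statistics import mean
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (number of strokes, frequency of straight lines)


def centroid(samples: List[Sample]) -> Sample:
    """Mean point of a set of samples."""
    return (mean(s[0] for s in samples), mean(s[1] for s in samples))


def make_discriminant(printed: List[Sample],
                      handwritten: List[Sample]) -> Callable[[Sample], float]:
    """Build f(x) that is positive on the printed side, negative on the handwritten side."""
    mp, mh = centroid(printed), centroid(handwritten)
    w = (mp[0] - mh[0], mp[1] - mh[1])                # direction between centroids
    mid = ((mp[0] + mh[0]) / 2, (mp[1] + mh[1]) / 2)  # assumed boundary point

    def f(x: Sample) -> float:
        return w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])

    return f


def classify(f: Callable[[Sample], float], x: Sample) -> str:
    # Positive means printed, as in [0044]; zero is treated as handwritten (assumption).
    return "printed" if f(x) > 0 else "handwritten"
```

For example, with invented training points `[(4, 0.8), (6, 0.9)]` for printed and `[(4, 0.2), (6, 0.3)]` for handwritten, a character measured at `(5, 0.9)` falls on the printed side and one at `(5, 0.1)` on the handwritten side.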

[0045]

[Effects of the Invention] As described above, according to the present invention, the character type of characters in a document or form (for example, handwritten or printed) is discriminated and the recognition process for the discriminated type is performed; if the reliability of the result is lower than the threshold, the other recognition process is performed and the result with the higher reliability is output. Alternatively, the faster recognition process is performed first and, when the discriminated character type matches the character type of the applied recognition process but the reliability of the result is lower than the threshold, the other recognition process is performed and the more reliable result is output. Because these configurations are adopted, a highly reliable character recognition result can be output quickly and efficiently even when the character type of the characters in a document or form, such as handwritten or printed, is unknown.

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of the present invention.

FIG. 2 is a flowchart (part 1) explaining the operation of the present invention.

FIG. 3 is a flowchart (part 2) explaining the operation of the present invention.

FIG. 4 is an explanatory diagram of discrimination between printed characters and handwritten characters according to the present invention.

[Explanation of Reference Numerals]

1: processing device
2: reading means
3: discriminating means
4: character recognition means
5: dictionary for handwritten characters
6: dictionary for printed characters
7: output file
8: display device
9: input device

Claims (5)

[Claims]

1. A character recognition device that performs character recognition on documents, forms, and the like, comprising: means for reading an image from the document, form, or the like; means for discriminating the character type of characters in the read image; and means for performing character recognition processing based on the discriminated character type, outputting the character recognition result when the reliability of the character recognition result is equal to or higher than a threshold, and, when the reliability is equal to or lower than the threshold, performing another character recognition process, comparing the reliability obtained at that time with the initial reliability, and outputting the character recognition result with the higher reliability.
2. A character recognition device that performs character recognition on documents, forms, and the like, comprising: means for reading an image from the document, form, or the like; means for discriminating the character type of characters in the read image; and means for performing the one character recognition process, among those for a plurality of character types, that can be processed at high speed, and, when the discriminated character type matches the character type of that character recognition process, outputting the character recognition result when the reliability of the character recognition result is equal to or higher than a threshold, and, when the reliability is equal to or lower than the threshold, performing another character recognition process, comparing the reliability obtained at that time with the initial reliability, and outputting the character recognition result with the higher reliability.
3. A character recognition device that performs character recognition on documents, forms, and the like, comprising: means for reading an image from the document, form, or the like; means for discriminating the character type of characters in the read image; and means for performing the one character recognition process, among those for a plurality of character types, that can be processed at high speed, and, when the discriminated character type does not match the character type of that character recognition process, performing another character recognition process and outputting its character recognition result when the reliability of the character recognition result is equal to or lower than a threshold, and, when the reliability is equal to or higher than the threshold, performing another character recognition process, comparing the reliability obtained at that time with the initial reliability, and outputting the character recognition result with the higher reliability.
4. A character recognition method for performing character recognition on documents, forms, and the like, comprising the steps of: reading an image from the document, form, or the like; discriminating the character type of characters in the read image; and performing character recognition processing based on the discriminated character type, outputting the character recognition result when the reliability of the character recognition result is equal to or higher than a threshold, and, when the reliability is equal to or lower than the threshold, performing another character recognition process, comparing the reliability obtained at that time with the initial reliability, and outputting the character recognition result with the higher reliability.
5. A computer-readable recording medium storing a program that causes a computer to function as: means for reading an image from the document, form, or the like; means for discriminating the character type of characters in the read image; and means for performing character recognition processing based on the discriminated character type, outputting the character recognition result when the reliability of the character recognition result is equal to or higher than a threshold, and, when the reliability is equal to or lower than the threshold, performing another character recognition process, comparing the reliability obtained at that time with the initial reliability, and outputting the character recognition result with the higher reliability.
JP32061799A 1999-11-11 1999-11-11 Character recognition device, method and recording medium thereof Pending JP2001143020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP32061799A JP2001143020A (en) 1999-11-11 1999-11-11 Character recognition device, method and recording medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP32061799A JP2001143020A (en) 1999-11-11 1999-11-11 Character recognition device, method and recording medium thereof

Publications (1)

Publication Number Publication Date
JP2001143020A true JP2001143020A (en) 2001-05-25

Family

ID=18123420

Family Applications (1)

Application Number Title Priority Date Filing Date
JP32061799A Pending JP2001143020A (en) 1999-11-11 1999-11-11 Character recognition device, method and recording medium thereof

Country Status (1)

Country Link
JP (1) JP2001143020A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006092138A (en) * 2004-09-22 2006-04-06 Oki Electric Ind Co Ltd Character recognition device using a plurality of recognition dictionaries
JP2008033604A (en) * 2006-07-28 2008-02-14 Univ Of Tokyo Image processing system, character recognition system, and image processing program

Similar Documents

Publication Publication Date Title
US6643401B1 (en) Apparatus and method for recognizing character
US7146047B2 (en) Image processing apparatus and method generating binary image from a multilevel image
CN115082942A (en) Document image flow chart identification method, device and medium based on YOLO v5
JP2001143020A (en) Character recognition device, method and recording medium thereof
JP2001092924A (en) Method and device for recognizing pattern
JP2867531B2 (en) Character size recognition device
JP3428504B2 (en) Character recognition device
JPH0916715A (en) Character recognition system and method therefor
JP2023034823A (en) Image processing apparatus, and control method, and program for image processing apparatus
JPS6146573A (en) Character recognizing device
JPH04280392A (en) Character recognizing system
JPS63269267A (en) Character recognizing device
JPS62281082A (en) Character recognizing device
JPH0632074B2 (en) Normalization method
JPH05189604A (en) Optical character reader
JPH0245892A (en) Method and device for recognizing character
JPH10334190A (en) Character recognition method and device and recording medium
JPH05182025A (en) Character recognition device
JPH08249421A (en) Recognizing method for reverse character
JPH10214308A (en) Character discrimination method
JP2002157550A (en) Device and method for recognizing character and recording medium
JPH09223189A (en) Method and processor for table processing
JPH076207A (en) Character recognition device
JPH0581480A (en) Character recognizing method
JPH05210759A (en) Character recognizing device

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20061004

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20091029

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20091110

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20100309