JPH056456A

JPH056456A - Character recognizing device

Info

Publication number: JPH056456A
Application number: JP3158049A
Authority: JP
Inventors: Hirobumi Okazaki; 博文岡崎
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1991-06-28
Filing date: 1991-06-28
Publication date: 1993-01-14

Abstract

PURPOSE: To improve recognition performance by excluding unused characters from the recognition targets. CONSTITUTION: Picture information read from a form 13 by a read part 12 is sent to a character segmenting part 14, where it is segmented into individual character patterns. A feature extracting part 15 extracts a feature pattern from each segmented character pattern and sends it to a candidate character selecting part 16. This part calculates distance data indicating the degree of resemblance to each standard character pattern by pattern matching between the input feature pattern data and the standard character pattern data in a dictionary storage part 17. After the distance data corresponding to each non-recognition character code stored in a non-recognition character group storage part 18 is set to a maximum value, a prescribed number of candidate characters are output in ascending order of distance data. Thus, non-recognition characters are excluded from the candidates.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a data processing device, and more particularly to a character recognition device that recognizes characters from information read from a document such as a form.

[0002]

[Prior Art] A character recognition device, usually called OCR (Optical Character Recognition), is often used as one means of inputting data into a data processing device. This device electrically recognizes printed or handwritten characters based on input from a photoelectric conversion element, and allows simpler and faster input than keyboard entry or the like. A large number of standard patterns are registered in the device in advance, and pattern matching between these and the feature pattern extracted from a read character yields several closely matching candidate characters, which are extracted and output.

[0003] In actual work using such a device, however, the documents read often contain a relatively limited range of character types, as with forms of a predetermined layout, and matching against every existing character is inefficient. Conventionally, therefore, the characters to be read are divided into types such as numerals, alphabetic characters, and kanji; matching is performed only within the type to which a character belongs, and several candidates are output in order of closeness.
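The conventional type-restricted matching described above can be sketched roughly as follows. This is only an illustration: the toy dictionary, the two-element feature vectors, the squared Euclidean distance, and the names `DICTIONARY`, `squared_distance`, and `match_within_type` are all assumptions, since the source does not specify the matching algorithm.

```python
# Hypothetical toy dictionary: character -> (character type, feature vector).
DICTIONARY = {
    "1": ("digit", (1.0, 0.0)),
    "7": ("digit", (0.9, 0.2)),
    "l": ("letter", (1.0, 0.1)),
    "I": ("letter", (0.95, 0.05)),
}


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_within_type(feature, char_type, top_n=2):
    """Match only against standard patterns of the given character type."""
    scored = [
        (squared_distance(feature, pattern), char)
        for char, (ctype, pattern) in DICTIONARY.items()
        if ctype == char_type
    ]
    scored.sort()  # smallest distance (closest match) first
    return [char for _, char in scored[:top_n]]
```

Restricting by type keeps letters out of a numeric field, but every character of the chosen type still competes, which is the limitation the following paragraphs describe for kanji.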

[0004]

[Problem to Be Solved by the Invention] However, when the characters to be read belong to a type with an extremely large number of standard patterns, such as kanji, many similar characters exist, so characters that are never actually used can become candidates. With handwritten characters in particular, differences in individual handwriting can make even characters that are not very similar become candidates.

[0005] Thus, conventional character recognition devices have the problem that recognition performance does not improve much even when the character types subject to matching are limited. This problem needs to be solved.

[0006] The present invention has been made to solve this problem, and its object is to provide a character recognition device that improves recognition performance by excluding unused characters from the recognition targets.

[0007]

[Means for Solving the Problem] A character recognition device according to the present invention comprises: (i) reading means for reading various forms and the like; (ii) storage means for storing in advance, as a non-recognition character group, a group of characters that are not used for a given type of form or the like; (iii) pattern matching means for performing pattern matching between the characters read by the reading means and standard character patterns; and (iv) exclusion means for excluding the non-recognition character group held in the storage means from either the processing targets or the processing results of the pattern matching means.

[0008]

[Operation] In the character recognition device according to the present invention, character groups that are not used for a given type of form or the like are registered in advance as non-recognition character groups and are excluded from the targets or the results of pattern matching, so unused characters are no longer output as candidates.

[0009]

[Embodiment] The present invention will now be described in detail with reference to an embodiment.

[0010] FIG. 1 shows a character recognition device and its peripheral devices according to one embodiment of the present invention. In this figure, the character recognition device 11 has a reading unit 12 that reads information such as characters on a form 13 as image information. The reading unit 12 is connected via a character segmentation unit 14 to a feature extraction unit 15, which in turn is connected to a pattern matching unit 31. Connected to the pattern matching unit 31 are a dictionary storage unit 17, which stores standard patterns in association with character codes, and a distance data storage unit 32, which stores the distance data obtained by matching. The distance data storage unit 32 is connected to a candidate character selection unit 16, which is in turn connected to a non-recognition character group storage unit 18 that stores the codes of characters not used on the form 13 (hereinafter, non-recognition characters). The candidate character selection unit 16 is connected via a transfer unit 19 to a transfer unit 22 of a data processing device 21. The transfer unit 19 is also connected to the non-recognition character group storage unit 18 via a write control unit 20. The transfer unit 22 of the data processing device 21 is connected to a processing unit 23. An external storage device 24, such as a floppy disk device or hard disk device, and a keyboard (KB) 25 are connected to the processing unit 23, as is a display device (CRT) 27 via a display control unit 26.

[0011] The operation of the character recognition device and peripheral devices configured as above will now be described. Image information read from the form 13 by the reading unit 12 is sent to the character segmentation unit 14, where it is segmented into character patterns one character at a time. Each character pattern is sent to the feature extraction unit 15, where a feature pattern is extracted according to a predetermined procedure. This feature pattern 28 is input to the pattern matching unit 31.

[0012] FIG. 2 shows the candidate character selection unit 16 and its surroundings. The candidate character selection unit 16 includes a match determination unit 33, to which the non-recognition character group storage unit 18 is connected and which is also connected to the distance data storage unit 32 and a distance data rewrite control unit 34. The distance data storage unit 32 is connected to the transfer unit 19 (FIG. 1) via a candidate character extraction unit 35. The distance data rewrite control unit 34 rewrites the data in the distance data storage unit 32 directly.

[0013] The feature pattern 28 output from the feature extraction unit 15 is input to the pattern matching unit 31, where it is matched against every standard pattern in the dictionary storage unit 17. For example, suppose the dictionary storage unit 17 holds standard pattern data P_1 to P_n, one per character code, as shown in FIG. 3(A). Using a predetermined algorithm, the pattern matching unit 31 calculates distance data D_1 to D_n, which indicate the degree of similarity between the feature pattern 28 and each of the standard pattern data P_1 to P_n, and stores them in the distance data storage unit 32. The contents of the distance data storage unit 32 then become as shown in FIG. 3(B).
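The distance calculation in this paragraph might look like the following minimal sketch. The source only says that a "predetermined algorithm" is used, so the Euclidean distance here is an assumption, as are the function name `compute_distances` and the representation of patterns as numeric tuples keyed by character code.

```python
import math


def compute_distances(feature, dictionary):
    """Return {character_code: distance} for every standard pattern.

    `dictionary` maps character codes to standard patterns P_1..P_n
    (assumed here to be equal-length numeric tuples). A smaller distance
    D_k means the feature pattern is more similar to P_k.
    """
    return {
        code: math.dist(feature, pattern)  # Euclidean distance (assumed metric)
        for code, pattern in dictionary.items()
    }
```

The returned mapping plays the role of the distance data storage unit 32: one distance value per character code, covering the whole dictionary.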

[0014] Each time the match determination unit 33 reads a character code from the distance data storage unit 32, it refers to the contents of the non-recognition character group storage unit 18 to determine whether a matching code exists; if one does, it sends that character code to the distance data rewrite control unit 34. On receiving it, the distance data rewrite control unit 34 rewrites the distance data for that character code in the distance data storage unit 32 to the maximum value D_max. For example, if the non-recognition character codes shown in FIG. 3(C) (○○××, ○×○×, ○○△△, ...) are registered in the non-recognition character group storage unit 18, the corresponding distance data (D_4, D_i, D_j, ...) in the distance data storage unit 32 are all rewritten to the maximum value D_max. The contents of the distance data storage unit 32 then become as shown in FIG. 3(D).
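The rewrite step performed by the match determination unit 33 and the distance data rewrite control unit 34 can be sketched as below. The names `D_MAX` and `mask_non_recognition` are hypothetical, and using floating-point infinity for the maximum value is an assumption; the source only says the distance is rewritten to a maximum value D_max.

```python
D_MAX = float("inf")  # assumed stand-in for the maximum distance value D_max


def mask_non_recognition(distances, non_recognition_codes):
    """Overwrite the distance of every non-recognition code with D_MAX.

    `distances` maps character codes to distance values and is modified
    in place, mirroring the direct rewrite of the distance data store.
    """
    for code in distances:
        if code in non_recognition_codes:  # match determination
            distances[code] = D_MAX        # distance data rewrite
    return distances
```

Keeping the non-recognition codes in a set makes each membership check constant time, so the whole pass is linear in the dictionary size.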

[0015] The candidate character extraction unit 35 then sorts the distance data in the distance data storage unit 32 and extracts and outputs a predetermined number (for example, five) of character codes, in ascending order of distance, as a candidate character group 29. Because the distance data for the character codes (○○××, ○×○×, ○○△△, ...) have been set to the maximum value D_max, as shown in FIG. 3(D), these character codes are not output as candidates.
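A minimal sketch of the candidate extraction step, assuming the distance data is held as a mapping from character code to distance. The function name `extract_candidates` and the default count of five (taken from the example in the text) are illustrative choices.

```python
def extract_candidates(distances, count=5):
    """Return the `count` character codes with the smallest distances.

    `distances` maps character codes to distance values; codes whose
    distance was rewritten to the maximum value sort to the end.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [code for code, _ in ranked[:count]]
```

Note that this simple form relies on the dictionary containing at least `count` characters that are not masked. If fewer remain, masked codes would fill the tail of the list, so a practical version might also skip entries whose distance equals the maximum; the source does not discuss that case.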

[0016] The candidate character codes 29 output from the candidate character extraction unit 35 are sent out via the transfer unit 19 (see FIG. 1 from here on) and transferred to the processing unit 23 via the transfer unit 22 of the data processing device 21. Based on the received character codes, the processing unit 23 has the display control unit 26 display the corresponding candidate characters on the display device 27. The correct character is then determined from among the candidates according to input from the keyboard 25.

[0017] Consider, for example, recognizing the characters "雑費" (miscellaneous expenses) as shown in FIG. 4(A). When a conventional character recognition device performs the recognition, the candidate character groups shown in FIG. 4(C) are output, and the characters are not recognized correctly. In the present device, by contrast, storing the non-recognition character group shown in FIG. 4(B) in the non-recognition character group storage unit 18 excludes those characters from recognition, so the candidate character groups shown in FIG. 4(D) are output. Here the first-ranked candidates are "雑費", and recognition is correct.

[0018] In the description of this embodiment, the non-recognition character codes were assumed to be stored in the non-recognition character group storage unit 18 in advance. This registration is performed at device start-up by reading a data file of non-recognition character codes stored beforehand in the external storage device 24 of the data processing device 21. The non-recognition character codes read from the external storage device 24 are transferred to the character recognition device 11 via the transfer unit 22 and stored in the non-recognition character group storage unit 18 under the control of the write control unit 20.
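The start-up registration described here could be approximated as follows. The file format (one character code per line, UTF-8 text) and the names `parse_non_recognition_codes` and `load_non_recognition_codes` are assumptions for illustration; the source does not describe the data file layout.

```python
def parse_non_recognition_codes(lines):
    """Build the non-recognition code set from an iterable of text lines.

    Assumes one character code per line; blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def load_non_recognition_codes(path):
    """Read the code file at start-up (hypothetical one-code-per-line file)."""
    with open(path, encoding="utf-8") as handle:
        return parse_non_recognition_codes(handle)
```

Because the set is rebuilt from a file on each start-up, switching to a different type of form only requires supplying a different code file.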

[0019] Non-recognition character codes are registered in the external storage device 24 from the keyboard 25. For example, entering a kanji radical or left-hand component (hen) from the keyboard 25 may display the matching kanji on the screen of the display device 27, from which the desired kanji is selected with a cursor. Furthermore, although this embodiment excludes the non-recognition character group after pattern matching, the group may instead be excluded before pattern matching.
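The alternative mentioned at the end of this paragraph, excluding the non-recognition group before pattern matching, might be sketched like this. As in the other sketches, the Euclidean distance, the tuple representation of patterns, and the name `match_excluding_first` are assumptions rather than details from the source.

```python
import math


def match_excluding_first(feature, dictionary, non_recognition_codes, count=5):
    """Drop non-recognition codes from the dictionary, then match.

    Only the remaining standard patterns are compared with the feature
    pattern, so no distance is computed for excluded characters.
    """
    active = {
        code: pattern
        for code, pattern in dictionary.items()
        if code not in non_recognition_codes
    }
    ranked = sorted(active, key=lambda code: math.dist(feature, active[code]))
    return ranked[:count]
```

Compared with rewriting distances afterwards, this variant skips the distance computation for excluded characters, which may reduce matching work when the non-recognition group is large.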

[0020]

[Effect of the Invention] As described above, according to the present invention, character groups that are not used for a given type of form or the like are registered in advance as non-recognition character groups and are excluded from the targets or the results of pattern matching. Unused characters are therefore no longer output as candidates, and character recognition performance improves.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a character recognition device and its peripheral devices according to one embodiment of the present invention.

FIG. 2 is a block diagram showing in detail the candidate character selection unit 16 of this character recognition device.

FIG. 3 is an explanatory diagram for explaining the operation of this character recognition device.

FIG. 4 is an explanatory diagram showing an example of character recognition by this character recognition device.

[Explanation of Reference Numerals]

11 character recognition device
12 reading unit
13 form
14 character segmentation unit
15 feature extraction unit
16 candidate character selection unit
17 dictionary storage unit
18 non-recognition character group storage unit
21 data processing device
31 pattern matching unit
32 distance data storage unit
33 match determination unit
34 distance data rewrite control unit
35 candidate character extraction unit

Claims

[Claim 1] A character recognition device comprising: a reading unit that reads various forms and the like; storage means for storing in advance, as a non-recognition character group, a group of characters that are not used for a given type of form or the like; pattern matching means for performing pattern matching between the characters read by the reading unit and standard character patterns; and exclusion means for excluding the non-recognition character group held in the storage means from the processing targets or the processing results of the pattern matching means.