JPH076208A

JPH076208A - Character recognition device

Info

Publication number: JPH076208A
Application number: JP5147976A
Authority: JP
Inventors: Yoshitake Tsuji (辻善丈); Daisuke Nishiwaki (西脇大輔); Makoto Uchida (内田誠)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-06-18
Filing date: 1993-06-18
Publication date: 1995-01-10

Abstract

PURPOSE: To discriminate more accurately between similar characters whose shapes differ only locally. CONSTITUTION: For each character type, a distance calculation unit 6 takes the difference between a feature of the character to be recognized, obtained by a feature extraction unit 4, and the corresponding feature of that character type. It multiplies the difference by a weighting coefficient stored in a weighting coefficient storage unit 7 and sums all of the resulting distance values. A determination unit 8 outputs, as the recognition result, the character-type code of the character type with the smallest summed distance.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that recognizes input characters.

[0002]

[Prior Art] A conventional character recognition device calculates the distance between the features obtained from the character to be recognized and the features of each character type stored in advance in a dictionary. It then outputs the code of the character type with the smallest distance (see, e.g., Meguro et al., "Multi-font printed kanji recognition experimental device", Technical Report of the Institute of Electronics and Communication Engineers of Japan, Pattern Recognition and Learning, PRL82-46).

[0003]

[Problem to Be Solved by the Invention] The conventional method computes the distance between the features of the character to be recognized and the features stored in the dictionary, and takes the character type with the minimum distance as the result. This fails for characters whose shapes differ only locally, such as 「テ」 (te) and 「ラ」 (ra). For such characters, the distance produced by an accumulation of many small differences comes close to the distance produced by the one local difference, which makes the characters hard to tell apart.

[0004] An object of the present invention is to provide a character recognition device that can recognize characters more accurately, even when their shapes differ only locally.

[0005]

[Means for Solving the Problem] The character recognition device of the present invention comprises the following units:
- an image input unit that inputs an image containing the characters to be recognized;
- an image memory that stores the image input by the image input unit;
- a character cutout unit that cuts out, as a rectangle, each character to be recognized from the image stored in the image memory;
- a feature extraction unit that extracts the features needed for recognition from the pattern of the character obtained by the character cutout unit;
- a dictionary unit that stores standard features for each character type to be recognized;
- a weighting coefficient storage unit that stores a weighting coefficient for each standard feature of each character type stored in the dictionary unit;
- a distance calculation unit that calculates a distance value for every character type whose standard features are stored in the dictionary unit. For each feature of the character pattern obtained by the feature extraction unit, it takes the absolute difference from the corresponding standard feature in the dictionary unit and multiplies it by that feature's weighting coefficient from the weighting coefficient storage unit. It does this for all features of the pattern and sums the products;
- a determination unit that outputs, as the character recognition result, the character-type code of the character type with the smallest of the distance values calculated by the distance calculation unit.

[0006]

[Operation] The device computes the distance between the features extracted from the character to be recognized and the standard features of each character type stored in the dictionary unit. During this computation, the difference for each feature is multiplied by a weighting coefficient chosen to emphasize local differences. This lets the device discriminate more accurately between similar characters whose shapes differ only locally. Japanese Patent Laid-Open No. 62-55783 differs from the present invention: it normalizes the distance obtained by the distance calculation unit by a weighting coefficient assigned to each character.

[0007]

[Embodiments] An embodiment of the present invention will now be described with reference to the drawings.

[0008] The drawings for this embodiment are as follows:
- FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.
- FIG. 2 shows an example of an input image.
- FIG. 3 shows an example of a character to be recognized, cut out by the character cutout unit 3.
- FIG. 4 shows character features.
- FIG. 5 shows the contents of the weighting coefficient storage unit 7.

[0009] As shown in FIG. 1, the character recognition device of this embodiment comprises the following components:
- an image input unit 1;
- an image memory 2;
- a character cutout unit 3;
- a feature extraction unit 4;
- a dictionary unit 5;
- a distance calculation unit 6;
- a weighting coefficient storage unit 7;
- a determination unit 8;
- a control unit 9;
- a data bus 10.

[0010] The image input unit 1 inputs an image containing the characters to be recognized, such as the one shown in FIG. 2, and stores it in the image memory 2.

The character cutout unit 3 binarizes each character to be recognized in the image stored in the image memory 2 and extracts it as a rectangle. For the numeral "2", for example, the result is the pattern shown in FIG. 3.

The feature extraction unit 4 computes the features needed for recognition from the character pattern obtained by the character cutout unit 3. Here, as shown in FIG. 4(2), the features are expressed as the lengths of line segments in each of eight directions A0, A1, A2, ..., A7.

As shown in FIG. 4(1), the character pattern of FIG. 3 consists of a first concavity and a second concavity (or a first convexity and a second convexity). The first concavity begins with a segment in direction A1, followed by segments in directions A0, A7, A6, and A5. These segments have lengths 2, 4, 2, 2, and 12, respectively. The features G_j (j = 1, 2, ..., 8) of the first concavity are therefore as shown in Table 1. The features of the second concavity are obtained in the same way.

[0011]

[Table 1]

The dictionary unit 5 stores standard features F_ij for each character type C1, C2, ..., Cn to be recognized, one for each direction A0, A1, ..., A7.

As shown in FIG. 5, the weighting coefficient storage unit 7 stores the weighting coefficients α_11, α_12, ..., α_1m, α_22, ..., α_33, ..., α_ij, ..., α_nm. These apply to the standard features F_ij of character types C1, C2, ..., Cn in directions A0, A1, ..., A7.

The distance calculation unit 6 computes the distance D_i for character type C_i from three quantities:
- the features G_j of the character pattern obtained by the feature extraction unit 4;
- the standard features F_ij of character type C_i stored in the dictionary unit 5;
- the weighting coefficients α_ij stored in the weighting coefficient storage unit 7.

The computation uses the following equation.

[0012]

[Equation 1]

$$D_i = \sum_{j=1}^{m} \alpha_{ij} \left| G_j - F_{ij} \right| \qquad (i = 1, 2, \ldots, n)$$

The determination unit 8 outputs, as the recognition result, the code of the character type whose distance is the smallest of the distances D1, D2, ..., Dn computed by the distance calculation unit 6.

[0013] The operation of this embodiment is described next, with reference to Tables 2 and 3. The example case is one in which the character cutout unit 3 obtains the character pattern shown in FIG. 6.

[0014]

[Table 2]

[0015]

[Table 3]

Tables 2 and 3 compare the features of the FIG. 6 pattern with the standard features of 「テ」 (te) and 「ラ」 (ra). All values are listed for directions A0 through A7:
- FIG. 6 pattern, first concavity: 5, 0, 0, 0, 0, 12, 0, 0.
- FIG. 6 pattern, first convexity: 2, 12, 0, 0, 10, 0, 0, 0.
- 「テ」 (Table 2), first concavity: 3, 0, 0, 0, 0, 13, 0, 0.
- 「テ」 (Table 2), first convexity: 5, 14, 0, 0, 10, 0, 0, 0.
- 「ラ」 (Table 3), first concavity: 8, 0, 0, 0, 0, 13, 0, 0.
- 「ラ」 (Table 3), first convexity: 0, 14, 0, 0, 10, 0, 0, 0.

The device computes the differences (distances) between corresponding features G_j and standard features F_ij and sums them. For 「テ」 the total is 8 (see Table 2). For 「ラ」, the difference of 2 in direction A0 of the first convexity is multiplied by a weighting coefficient of 2, giving 4, so the total distance is 10 (see Table 3). Because the total distance is smaller for 「テ」, the determination unit 8 outputs the character-type code of 「テ」 as the recognition result.

[0016]

[Effect of the Invention] As described above, the present invention computes the distance between the features extracted from the character to be recognized and the standard features of each character type stored in the dictionary unit. During this computation, it multiplies the difference for each feature value by a weighting coefficient chosen to emphasize local differences. As a result, similar characters whose shapes differ only locally can be discriminated more accurately.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of an input image.

FIG. 3 is a diagram showing an example of a character to be recognized, cut out by the character cutout unit 3.

FIG. 4 is a diagram illustrating character features.

FIG. 5 is a diagram showing the contents of the weighting coefficient storage unit 7.

FIG. 6 is a diagram showing an example of a cut-out character pattern.

[Explanation of Symbols]

1 Image input unit
2 Image memory
3 Character cutout unit
4 Feature extraction unit
5 Dictionary unit
6 Distance calculation unit
7 Weighting coefficient storage unit
8 Determination unit
9 Control unit
10 Data bus

─────────────────────────────────────────────────────

[Procedural Amendment]

[Submission date] November 16, 1993

[Procedural Amendment 1]

[Document to be amended] Specification

[Item to be amended] Claims

[Amendment method] Change

[Content of amendment]

[Claims]

Claims


[Claim 1] A character recognition device comprising:
- an image input unit that inputs an image containing characters to be recognized;
- an image memory that stores the image input by the image input unit;
- a character cutout unit that cuts out, as a rectangle, a character to be recognized from the image stored in the image memory;
- a feature extraction unit that extracts the features needed for recognition from the pattern of the character obtained by the character cutout unit;
- a dictionary unit that stores standard features for each character type to be recognized;
- a weighting coefficient storage unit that stores a weighting coefficient for each standard feature of each character type stored in the dictionary unit;
- a distance calculation unit that calculates a distance value for every character type whose standard features are stored in the dictionary unit. For each feature of the character pattern obtained by the feature extraction unit, it multiplies the absolute difference from the corresponding standard feature stored in the dictionary unit by that standard feature's weighting coefficient stored in the weighting coefficient storage unit. It does this for all features of the character pattern and sums the products; and
- a determination unit that outputs, as the character recognition result, the character-type code of the character type corresponding to the smallest of the distance values calculated by the distance calculation unit.