JPH06139407A - Character segmenting method - Google Patents

Character segmenting method

Info

Publication number
JPH06139407A
Authority
JP
Japan
Prior art keywords
character
circumscribed
processing step
rectangle
circumscribed rectangle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP4289784A
Other languages
Japanese (ja)
Other versions
JP2576080B2 (en)
Inventor
Masaomi Nakajima
正臣 中嶋
Toshiyuki Yoshida
敏之 吉田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK, NTT Data Communications Systems Corp
Priority to JP4289784A (patent JP2576080B2)
Publication of JPH06139407A
Application granted
Publication of JP2576080B2
Anticipated expiration
Status: Expired - Lifetime

Links

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To improve the accuracy of segmenting characters from a freely handwritten character string by classifying each target circumscribed rectangle into one of two groups, one in which it is merged with its adjacent circumscribed rectangle and one in which it is not, based on a composite variable obtained by weighting feature values. CONSTITUTION: A centroid calculation circuit 8 calculates the coordinates of the centroid of the character pattern contained in the target circumscribed rectangle. A figure feature calculation circuit 9 considers the target circumscribed rectangle and the circumscribed rectangle adjacent to it in the x direction and calculates figure features. A merge-decision feature calculation circuit 10 calculates standardized features from the figure features and the estimate produced by a character size estimation circuit 6. A second merge circuit 11 calculates the composite variable from the features calculated by circuit 10 and the weights and constant term of a previously estimated linear discriminant function, and decides from the sign of the composite variable whether the target circumscribed rectangle should be merged with the circumscribed rectangle adjacent in the x direction. If so, the operations from the calculation of the centroid of the character pattern onward are carried out again.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character segmentation method suitable for use in optical character readers (hereinafter "OCR") and similar devices.

[0002]

[Prior Art] When character patterns are extracted from a fixed-pitch form, in which the character entry pitch is uniquely determined, a marginal distribution is obtained by counting black pixels in the direction orthogonal to the character string direction, and the range of one character pitch starting from a position where this marginal distribution is minimal can be taken as a character region. In a freely handwritten character string, however, the pitch between characters is not necessarily constant, so highly accurate segmentation cannot be achieved from the marginal distribution alone. For this reason, the usual approach to extracting character patterns from a free handwritten character string has been to obtain the circumscribed rectangles of black-pixel connected components, which reflect how character strokes are connected, and to decide whether the circumscribed rectangle of the target pattern should be merged with an adjacent circumscribed rectangle. In this approach, circumscribed rectangles were separated and merged, and the character regions fixed, on the assumptions that "characters are close to square" and that "rectangles separated by a large gap are likely to be different characters". For details see, for example, Nakabayashi et al., "High-speed frameless handwritten character string reading method using fuzzy search" (Trans. IEICE (D-II), J74-D-II, 11, pp. 1528-1537).
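The fixed-pitch procedure described above can be illustrated with a minimal Python sketch. It assumes a binary image stored as a list of rows in which 1 means a black pixel and the character string runs along x; the function names and the rule of cutting every `pitch` columns from the first minimum are illustrative assumptions, not taken from the patent.

```python
def marginal_distribution(image):
    """Count black pixels (value 1) in each column of a binary image.

    Counting along y for every x gives the marginal distribution
    (projection profile) of a horizontally written character string.
    """
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def fixed_pitch_cuts(profile, pitch):
    """Assumed rule: start at the first minimum of the profile and
    take one character region every `pitch` columns."""
    start = profile.index(min(profile))
    return [(x, min(x + pitch, len(profile)))
            for x in range(start, len(profile), pitch)]
```

This only works when the pitch is known and constant, which is exactly the limitation the text points out for free handwriting.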

[0003]

[Problems to Be Solved by the Invention] The prior art described above uses the aspect ratio of a circumscribed rectangle and the gap between rectangles as the features for deciding whether to separate or merge circumscribed rectangles, and merges them when each feature satisfies a fixed condition. These features, however, vary widely from writer to writer, particularly in free handwritten character strings, and many exceptional patterns occur, so the segmentation accuracy obtained so far cannot be called sufficient. The present invention has been made in view of these circumstances. Its object is to resolve the above problems of the prior art and, in order to improve segmentation accuracy for free handwritten character strings, to provide a character segmentation method that is highly tolerant of writer-dependent character variation.

[0004]

[Means for Solving the Problems] The above object is achieved by a character segmentation method that extracts each character from image data obtained by optically scanning a character string to be processed, the method comprising: a circumscribed rectangle calculation step of scanning the image data in the direction orthogonal to the character string direction (x direction) to obtain the coordinates of the circumscribed rectangles of black connected components; a marginal distribution calculation step of counting black pixels in the direction (y direction) orthogonal to the character string direction; a first merge step of deciding whether to merge circumscribed rectangles in the y direction; a character size estimation step of estimating the character size from the circumscribed rectangles after the first merge; a forced cutting step of detecting contact between characters or between parts of characters from the marginal distribution and separating the contact point with a cutting line in the y direction; a centroid calculation step of calculating at least the x coordinate of the centroid of the character pattern contained in the target circumscribed rectangle; a figure feature calculation step of obtaining the y-direction length of the target circumscribed rectangle, the x-direction distance between centroids, and the gap between the merged circumscribed rectangle and the circumscribed rectangle adjacent to it in the x direction; a merge-decision feature calculation step of standardizing the figure features by the estimated character size; and a second merge step of deciding whether to merge the target circumscribed rectangle with the circumscribed rectangle adjacent in the x direction, based on the value obtained by adding the constant term of a previously obtained linear discriminant function to the sum of products of that function's coefficients and the features.
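As an illustration of the first step listed above, the following Python sketch finds the circumscribed rectangles of black connected components in a small binary image. The 8-connectivity, the breadth-first traversal, and the function name are assumptions made for this sketch; the patent text does not specify how the connected components are traced.

```python
from collections import deque


def circumscribed_rectangles(image):
    """Return sorted (x_min, y_min, x_max, y_max) boxes, in inclusive pixel
    coordinates, one per 8-connected component of black pixels (value 1)."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first search over one component, tracking its extent.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return sorted(boxes)
```

Sorting the boxes by `x_min` gives the left-to-right order in which later steps look at x-adjacent rectangles.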

[0005]

[Operation] The character segmentation method according to the present invention classifies each target circumscribed rectangle into one of two groups, a group that is merged with the adjacent circumscribed rectangle and a group that is not, based on a composite variable obtained by weighting a plurality of features used for the merge decision (hereinafter simply "features"). The weights can be obtained by, for example, discriminant analysis. Discriminant analysis is a method for deriving a single combined decision factor from several decision factors. Substituting the features of a number of samples into the right-hand side of equation (1) below and running the analysis yields the weights and constant term of the linear discriminant function, which define the composite variable that best separates the two groups. The two groups can be distinguished, for example, as the group for which the composite variable is positive and the group for which it is negative. If discriminant analysis is run in advance on samples representative of the intended application, so that the weights and constant term are known, then afterwards it is enough to calculate the features of the target circumscribed rectangle and evaluate the composite variable of equation (1) to decide whether it should be merged with the adjacent circumscribed rectangle.

$$f_i = a_0 + a_1 z_{i1} + a_2 z_{i2} + \cdots + a_n z_{in} \qquad (1)$$

where
$i$: index of the sample under consideration
$f_i$: composite variable for sample $i$ (its sign determines which of the two groups the sample belongs to)
$z_{i1}$ to $z_{in}$: the $n$ features of sample $i$ used in the linear discriminant function
$a_0$: constant term of the linear discriminant function
$a_1$ to $a_n$: weight obtained for each feature
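Equation (1) and the sign-based classification can be sketched in a few lines of Python. The function names are hypothetical, and the numbers in the usage note are arbitrary illustrative values, not weights from the patent.

```python
def composite_variable(features, weights, constant):
    """Evaluate f = a0 + a1*z1 + ... + an*zn for one sample.

    `features` holds z1..zn, `weights` holds a1..an, `constant` is a0.
    """
    if len(features) != len(weights):
        raise ValueError("one weight is needed per feature")
    return constant + sum(a * z for a, z in zip(weights, features))


def group_by_sign(features, weights, constant):
    """Assign the sample to one of the two groups from the sign of f."""
    f = composite_variable(features, weights, constant)
    return "negative" if f < 0 else "non-negative"
```

For example, `composite_variable([1.0, 2.0], [0.5, -1.0], 0.25)` evaluates 0.25 + 0.5 - 2.0. Which sign means "merge" is a convention fixed when the discriminant function is estimated.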

[0006] FIG. 2 describes in detail the figure features used in the character segmentation method according to the present invention, and FIG. 3 shows how the features are calculated from those figure features. All of the features may be used, or the weights of the linear discriminant function may be tested statistically and only the features whose weights prove significant may be used. Equation (2) shows the result of discriminant analysis, carried out to obtain the weights and constant term of the linear discriminant function, on 20 samples of the place-name part of addresses, 200 characters in total.

$$f = -10.35 + (-8.14)\cdot(\text{y-direction size of merged circumscribed rectangle}) + 29.23\cdot(\text{x-direction distance between centroids}) + (-14.97)\cdot(\text{circumscribed rectangle gap after merging}) + 0.70\cdot(\text{mean line density after merging}) \qquad (2)$$

Features that do not appear in equation (2) are those whose weights were not significant in a discriminant analysis using all features; the weights and constant term in equation (2) are the result of a discriminant analysis run without those features. Equation (2) is set up so that the composite variable f is negative when the rectangles should be merged. The analysis gave a segmentation accuracy of 95%. This indicates that a similar accuracy can be expected when the method is applied, with the weights and constant term of equation (2), to samples drawn from a similar population. Looking at the coefficients in equation (2), the larger the x-direction distance between centroids, the more the rectangles tend to be kept separate, and the larger the y-direction size of the merged circumscribed rectangle, the more they tend to be merged. Both tendencies agree with the hypothesis that characters are close to square. The x-direction distance between centroids was selected as a feature; because a centroid is little affected by the protrusion of a few strokes, segmentation remains tolerant of writer-dependent character variation.
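A small Python sketch of equation (2) makes the sign convention concrete. The coefficients are the ones quoted in the text; the dictionary keys and the feature values used in the usage note are hypothetical, since the standardization formulas are given only in FIG. 3, which is not reproduced here.

```python
# Constant term and weights as quoted in equation (2).
EQ2_CONSTANT = -10.35
EQ2_WEIGHTS = {
    "merged_height": -8.14,        # y-direction size of merged rectangle
    "centroid_dx": 29.23,          # x-direction distance between centroids
    "merged_gap": -14.97,          # circumscribed rectangle gap after merging
    "merged_line_density": 0.70,   # mean line density after merging
}


def equation_2(features):
    """Evaluate the composite variable f of equation (2)."""
    return EQ2_CONSTANT + sum(
        weight * features[name] for name, weight in EQ2_WEIGHTS.items()
    )


def should_merge(features):
    """Equation (2) is set up so that negative f means 'merge'."""
    return equation_2(features) < 0
```

With the made-up values `{"merged_height": 1.0, "centroid_dx": 0.3, "merged_gap": 0.2, "merged_line_density": 2.0}` the positive centroid term is small and f is negative (merge). Raising `centroid_dx` to 1.0 makes the 29.23 term dominate and f turns positive (keep separate), which matches the interpretation of the coefficients given in the text.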

[0007]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a character segmentation device according to one embodiment of the present invention. In the figure, 1 is an image memory that stores image data read from an image input device such as a scanner; 2 is a black connected component circumscribed rectangle calculation circuit that scans the character string in image memory 1 in the direction (y direction) orthogonal to the character string direction (x direction) to obtain the circumscribed rectangles of black connected components; 3 is a marginal distribution calculation circuit that likewise counts black pixels in the y direction to obtain the marginal distribution; and 4 is a line density calculation circuit that likewise counts line segments in the y direction to obtain the line density. Circuits 2, 3 and 4 operate in parallel and produce the black connected component circumscribed rectangles, the marginal distribution and the line density of the target character string. 5 is a first merge circuit that merges black connected component circumscribed rectangles in the y direction. First merge circuit 5 merges two circumscribed rectangles when the length over which their projections onto a coordinate axis parallel to the x direction overlap is greater than half the x-direction length of the shorter of the two overlapping rectangles.
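The first merge criterion can be written down directly. In this Python sketch a rectangle is a tuple `(x_min, y_min, x_max, y_max)` in continuous coordinates, so its x-direction length is `x_max - x_min`; that representation and the function names are assumptions of the sketch.

```python
def x_overlap(a, b):
    """Length of the overlap of the two rectangles' projections onto the x axis."""
    return max(0, min(a[2], b[2]) - max(a[0], b[0]))


def first_merge_test(a, b):
    """Merge when the x-overlap exceeds half the x-length of the shorter rectangle."""
    shorter = min(a[2] - a[0], b[2] - b[0])
    return x_overlap(a, b) > shorter / 2


def union(a, b):
    """Circumscribed rectangle of the two rectangles taken together."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```

A rectangle `(2, 6, 8, 12)` stacked under `(0, 0, 10, 4)` overlaps it by 6 along x, which is more than half of its own width of 6, so the two are merged into `(0, 0, 10, 12)`. This is how vertically separated parts of one character are joined.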

[0008] Characters have the property that their width and height are roughly equal, so if either the width or the height of a character can be estimated, that value can be regarded as the character size. 6 is a character size estimation circuit that, based on this property, estimates the character size as the mean or median of the y-direction lengths of the circumscribed rectangles after the first merge. 7 is a forced cutting circuit that detects contact between characters or between parts of characters and separates the contact point with a line segment in the y direction. The forced cutting circuit previously proposed by the present applicant in Japanese Patent Application No. Hei 4-259501, "Character segmentation method", can be used as forced cutting circuit 7. That circuit performs a process that decides whether to perform forced cutting from the ratio of the x-direction length of the circumscribed rectangle to the estimated character size; a process that smooths the marginal distribution within the circumscribed rectangle selected for forced cutting; a forced-cut search range detection process that compares the results of smoothing at different pitches to find the range in which to search for the cut point; and a process that finds the position within that search range where the marginal distribution is minimal and splits the circumscribed rectangle at that position.
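The character size estimate and the final step of forced cutting can be sketched as follows in Python. Rectangles are `(x_min, y_min, x_max, y_max)` tuples. The ratio threshold of 1.5 is a hypothetical value chosen for illustration, and the search range is simply passed in, because the text does not detail the smoothing-based range detection of the earlier application.

```python
from statistics import median


def estimate_character_size(boxes):
    """Median of the y-direction lengths of the rectangles after the first merge."""
    return median(b[3] - b[1] for b in boxes)


def needs_forced_cut(box, char_size, ratio_threshold=1.5):
    """Assumed rule: a rectangle much wider than one character is a cut candidate."""
    return (box[2] - box[0]) / char_size > ratio_threshold


def cut_position(profile, search_start, search_end):
    """Column with the smallest marginal distribution in [search_start, search_end)."""
    window = profile[search_start:search_end]
    return search_start + window.index(min(window))
```

The median is used here because it is less sensitive than the mean to a few rectangles that still contain touching characters; the text allows either.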

[0009] The centroid calculation circuit 8 calculates the coordinates of the centroid of the character pattern contained in the target circumscribed rectangle. The figure feature calculation circuit 9 considers the target circumscribed rectangle and the circumscribed rectangle adjacent to it in the x direction and calculates the figure features shown in FIG. 2. The merge-decision feature calculation circuit 10 then uses the figure features from circuit 9 and the estimate from character size estimation circuit 6 to calculate standardized features with the formulas shown in FIG. 3. Finally, the second merge circuit 11 obtains the composite variable from the features calculated by circuit 10 and the previously estimated weights and constant term of the linear discriminant function, and decides from the sign of the composite variable whether to merge the circumscribed rectangle with the circumscribed rectangle adjacent in the x direction. When they are merged, the merged circumscribed rectangle becomes the new target and the processing from the centroid calculation onward is carried out again. By deciding in this way whether to merge the multiple circumscribed rectangles of characters extracted from a free handwritten character string, this embodiment provides a character segmentation method that is highly tolerant of writer-dependent character variation and improves segmentation accuracy for free handwritten character strings.
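The centroid calculation and the "merged rectangle becomes the new target" loop can be sketched in Python as below. Rectangles use inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`. The `decide` callback stands in for the linear discriminant test on standardized features, whose formulas are in FIG. 3 and are not reproduced here; the names are hypothetical.

```python
def centroid(image, box):
    """Centroid (x, y) of the black pixels (value 1) inside an inclusive box.

    The box is assumed to come from a connected component, so it
    always contains at least one black pixel.
    """
    x_min, y_min, x_max, y_max = box
    pts = [(x, y)
           for y in range(y_min, y_max + 1)
           for x in range(x_min, x_max + 1)
           if image[y][x] == 1]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def union(a, b):
    """Circumscribed rectangle of two rectangles taken together."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def second_merge(boxes, decide):
    """Walk left to right; `decide(target, neighbor)` returns True to merge.

    After a merge the merged rectangle is re-tested against its next
    x-adjacent neighbor, mirroring the repetition described in the text.
    """
    boxes = list(boxes)
    i = 0
    while i + 1 < len(boxes):
        if decide(boxes[i], boxes[i + 1]):
            boxes[i:i + 2] = [union(boxes[i], boxes[i + 1])]
        else:
            i += 1
    return boxes
```

In a fuller implementation `decide` would compute the centroids, figure features and standardized features for the pair and return whether the composite variable is negative.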

[0010] In the above embodiment all of the figure features shown in FIG. 2 are calculated, but some of them, for example the vertical distance between centroids, may be omitted. The figure features shown in FIG. 2 may also be calculated by methods different from those described in the above embodiment. FIG. 4 shows the schematic configuration of an OCR to which the character segmentation device described above is applied. In FIG. 4, 41 is an image input device such as a scanner, 42 is the character segmentation device of the above embodiment, 43 is a character feature extraction unit, 44 is a character identification unit, and 45 is an identification result display unit. The OCR of this embodiment is highly tolerant of writer-dependent character variation and achieves improved segmentation accuracy for free handwritten character strings. The above embodiments are merely examples of the present invention, and the present invention is of course not limited to them.

[0011]

[Effect of the Invention] As described in detail above, the present invention has the notable effect of providing a character segmentation method that is highly tolerant of writer-dependent character variation and can improve segmentation accuracy for free handwritten character strings.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a character segmentation device according to an embodiment of the present invention.

FIG. 2 is a detailed explanatory diagram of the figure features in the embodiment.

FIG. 3 is a diagram explaining how the features are calculated from the figure features shown in FIG. 2.

FIG. 4 is a diagram showing the schematic configuration of an OCR to which the character segmentation device according to the present invention is applied.

[Explanation of Reference Numerals]

1: image memory, 2: black connected component circumscribed rectangle calculation circuit, 3: marginal distribution calculation circuit, 4: line density calculation circuit, 5: first merge circuit, 6: character size estimation circuit, 7: forced cutting circuit, 8: centroid calculation circuit, 9: figure feature calculation circuit, 10: merge-decision feature calculation circuit, 11: second merge circuit.

Claims (3)

[Claims]

1. A character segmentation method for extracting each character from image data obtained by optically scanning a character string to be processed, comprising: a circumscribed rectangle calculation step of scanning the image data in the direction orthogonal to the character string direction (x direction) to obtain the coordinates of the circumscribed rectangles of black connected components; a marginal distribution calculation step of counting black pixels in the direction (y direction) orthogonal to the character string direction; a first merge step of deciding whether to merge circumscribed rectangles in the y direction; a character size estimation step of estimating the character size from the circumscribed rectangles after the first merge; a forced cutting step of detecting contact between characters or between parts of characters from the result of the marginal distribution calculation and separating the contact point with a cutting line in the y direction; a centroid calculation step of calculating at least the x coordinate of the centroid of the character pattern contained in the target circumscribed rectangle; a figure feature calculation step of obtaining the y-direction length of the target circumscribed rectangle, the x-direction distance between centroids, and the gap between the merged circumscribed rectangle and the circumscribed rectangle adjacent to it in the x direction; a merge-decision feature calculation step of standardizing the figure features by the estimated character size; and a second merge step of deciding whether to merge the target circumscribed rectangle with the circumscribed rectangle adjacent in the x direction, based on the value obtained by adding the constant term of a previously obtained linear discriminant function to the sum of products of that function's coefficients and the features.

2. The character segmentation method according to claim 1, further comprising, in addition to the above steps, a line density calculation step of counting line segments in the y direction, wherein the figure feature calculation step also obtains the total line density of the merged circumscribed rectangle, and these results are also taken into account in deciding whether to merge the target circumscribed rectangle with the circumscribed rectangle adjacent in the x direction.

3. The character segmentation method according to claim 1 or 2, wherein, in addition to the above, the y coordinate of the centroid is also calculated, the figure feature calculation step also obtains, in addition to the above features, the x-direction length of the target circumscribed rectangle and the y-direction distance between centroids, and these results are also taken into account in deciding whether to merge the target circumscribed rectangle with the circumscribed rectangle adjacent in the x direction.
JP4289784A 1992-10-28 1992-10-28 Character extraction method Expired - Lifetime JP2576080B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4289784A JP2576080B2 (en) 1992-10-28 1992-10-28 Character extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4289784A JP2576080B2 (en) 1992-10-28 1992-10-28 Character extraction method

Publications (2)

Publication Number Publication Date
JPH06139407A (en) 1994-05-20
JP2576080B2 (en) 1997-01-29

Family

ID=17747729

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4289784A Expired - Lifetime JP2576080B2 (en) 1992-10-28 1992-10-28 Character extraction method

Country Status (1)

Country Link
JP (1) JP2576080B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012008909A (en) * 2010-06-28 2012-01-12 Fuji Xerox Co Ltd Image processing device and image processing program

Also Published As

Publication number Publication date
JP2576080B2 (en) 1997-01-29

Similar Documents

Publication Publication Date Title
US6970601B1 (en) Form search apparatus and method
US6636631B2 (en) Optical character reading method and system for a document with ruled lines and its application
US20210110194A1 (en) Method for automatic extraction of data from graph
US6754385B2 (en) Ruled line extracting apparatus for extracting ruled line from normal document image and method thereof
US5034991A (en) Character recognition method and system
US5504822A (en) Character recognition system
US4183013A (en) System for extracting shape features from an image
US6246794B1 (en) Method of reading characters and method of reading postal addresses
CN103034848B (en) A kind of recognition methods of form types
JPH05217019A (en) Business form identifying system and image processing system
CN111091124B (en) Spine character recognition method
CN112419260A (en) PCB character area defect detection method
Xiao et al. Knowledge-based English cursive script segmentation
JPH06139407A (en) Character segmenting method
JP3835652B2 (en) Method for determining Japanese / English of document image and recording medium
Eiterer et al. Postal envelope address block location by fractal-based approach
Zhang et al. Extraction of karyocytes and their components from microscopic bone marrow images based on regional color features
JP4194309B2 (en) Document direction estimation method and document direction estimation program
Humied Segmentation accuracy for offline Arabic handwritten recognition based on bounding box algorithm
JP3914119B2 (en) Character recognition method and character recognition device
JPH06251195A (en) Character position estimating method
Chen Bank Card Number Identification Program Based on Template Matching
Zhi-Gang et al. Image segmentation considering intensity roughness and color purity
CN115482525A (en) Image-based license plate identification method and system, electronic equipment and storage medium
JP3127413B2 (en) Character recognition device

Legal Events

Date Code Title Description
R250 Receipt of annual fees. Free format text: JAPANESE INTERMEDIATE CODE: R250
R250 Receipt of annual fees. Free format text: JAPANESE INTERMEDIATE CODE: R250
R250 Receipt of annual fees. Free format text: JAPANESE INTERMEDIATE CODE: R250
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20071107. Year of fee payment: 11
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20081107. Year of fee payment: 12
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20091107. Year of fee payment: 13
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20091107. Year of fee payment: 13
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20101107. Year of fee payment: 14
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20111107. Year of fee payment: 15
FPAY Renewal fee payment (event date is renewal date of database). Free format text: PAYMENT UNTIL: 20121107. Year of fee payment: 16
EXPY Cancellation because of completion of term