JPH03136181A - Character segmenting method

Info

Publication number: JPH03136181A
Application number: JP1273690A
Authority: JP
Inventors: Toru Ishikawa; 石河　融; Hiroshi Yoshida; 浩史吉田; Koichi Higuchi; 浩一樋口; Yoshiyuki Yamashita; 山下　義征
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1989-10-23
Filing date: 1989-10-23
Publication date: 1991-06-10

Abstract

PURPOSE: To segment characters accurately and at high speed from a form containing multiple typefaces, by successively segmenting character patterns with a segmentation line whose direction is determined from the binary digital signal corresponding to each region of the black-bit distribution. CONSTITUTION: A character segmentation selection unit 13 scans line image data from a line buffer 12 in the direction perpendicular to the character line being read, counts the black bits on each scanning line, and calculates the black-bit distribution. Each span from the position where the distribution changes from 0 to 1 or more up to the position just before it changes from 1 or more back to 0 is detected as a black-bit distribution region, and the average line width of the characters in that region is calculated. Based on the average line width, line elements perpendicular to the character line whose length is at least a threshold value are extracted. According to the presence or absence of such vertical line elements in the region, the direction of the segmentation line is selected and output to a character segmentation unit 14. Character patterns can thus be segmented more reliably and at high speed, with simple processing, from a form containing multiple typefaces.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character segmentation method capable of accurately segmenting characters on a form that contains characters in a plurality of typefaces, in particular a form that contains italic characters.

(Prior Art) In a conventional character segmentation method for separating single-character regions from the pattern data of a character string stored in a line buffer, the character pattern is read out by scanning one column from the top to the bottom of the line buffer and then advancing column by column in the direction perpendicular to that scan.

While each column is scanned, its black bits are counted (character portions are black bits, background portions are white bits) to build a black-bit distribution, and the region of each character is determined by referring to that distribution.

FIG. 7 shows part of the line image data handled by the conventional segmentation method based on the black-bit distribution, for a character string that includes italic characters. In the figure, 60, 61, 62, 63 and 64 are character patterns, which are part of the character patterns stored in the line buffer.

65 is the column-direction black-bit distribution of the character patterns 60, 61, 62, 63 and 64. In the figure, reading starts from a specified position at the left end of the line buffer, and the black-bit count of each column is obtained while that column is read. The count is then compared with a threshold α (α: constant): the column where the count first exceeds α is taken as the start point, the column where it next falls below α is taken as the end point, and the span from start point to end point is segmented as one character region. However, because the threshold α is set with normal characters as the reference, it has been impossible to segment a character pattern string containing italic characters one character at a time.

To solve this problem, the present applicant previously proposed a method of segmenting character patterns from a character string containing italic characters such as that shown in FIG. 7 (see Japanese Unexamined Patent Publication No. Sho 62-127985). In this method, when the width of a region of the black-bit distribution is larger than a fixed threshold, the region is presumed to contain two or more character patterns. Column scans are run from the top and bottom sides of the bounding frame of the character pattern toward the opposite sides, and the background is classified into regions by scanning direction. Based on the classification result, the inside of the bounding frame is scanned horizontally to detect the segmentation region, and the segmentation position is determined.

This segmentation method has been used to segment character patterns from forms that include an italic typeface, for example English text such as that shown in FIG. 3, in which only particular words are printed in italics for emphasis.

(Problem to be Solved by the Invention) However, the conventional method described above is applied only when a character pattern is presumed to contain two or more characters. When a pattern that actually contains two characters is presumed to contain one, the input character pattern cannot be segmented into individual characters. In addition, because the segmentation processing is complicated, the reading speed drops when a recognition device is built around this method. The recognition device also requires a large amount of hardware, which makes it difficult to reduce its size.

The object of the present invention is to solve these problems of the conventional art and to provide a character segmentation method that segments character patterns more reliably and at high speed, with simple processing, from a form containing characters in a plurality of typefaces, and that also allows the recognition device to be made smaller and faster.

(Means for Solving the Problems) To achieve the above object, the character segmentation method of the present invention photoelectrically converts a character line on a medium to obtain quantized line image data, detects character segmentation positions from the line image data, and segments the line image data at the detected positions to extract character patterns. The method is characterized by comprising: a step of scanning the line image data in the direction perpendicular to the character line, counting the number of black bits of the character pattern in each scan, and obtaining the black-bit distribution; a step of extracting line elements, at an arbitrary angle to the character line, over which the number of black bits continuously exceeds a predetermined value; a step of detecting, from the black-bit distribution, regions in which black bits continue in the character line direction; a step of detecting the presence or absence of the extracted line elements at the arbitrary angle to the character line; a step of determining the segmentation scanning direction for the region on the basis of the amount of such line elements; and a step of segmenting characters by scanning in the determined direction.

In this character segmentation method, the segmentation scanning direction is perpendicular to the character line when a line element perpendicular to the character line and larger than K (K: constant) is extracted within the region in which black bits continue in the character line direction, and is a direction inclined by θ degrees with respect to the character line (θ: constant) when no such line element is extracted.

(Function) In the present invention, the line image data is scanned in the direction perpendicular to the character line, the number of black bits of the character pattern is counted in each scan, and the black-bit distribution is obtained. Line elements at an arbitrary angle to the character line, over which the number of black bits continuously exceeds a predetermined value, are extracted. Regions in which black bits continue in the character line direction are detected from the black-bit distribution, the presence or absence of the extracted line elements is detected, the segmentation scanning direction for each region is determined on the basis of the amount of such line elements, and characters are segmented by scanning in the determined direction.

Here, the segmentation scanning direction is perpendicular to the character line when a line element perpendicular to the character line and larger than K (K: constant) is extracted within the region in which black bits continue in the character line direction, and is a direction inclined by θ degrees with respect to the character line (θ: constant) when no such line element is extracted.

As a result, characters can be segmented quickly and accurately from a form containing a plurality of typefaces, and a recognition device built around this method can be made smaller.

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 2 is a block diagram showing the configuration of a character recognition device to which the present invention is applied.

In FIG. 2, 10 is a character recognition device that recognizes character strings written on a form or the like. 11 is a photoelectric conversion unit that converts the optical signal read by optical means (not shown; the signal is indicated by S in the figure) into an image signal. 12 is a line buffer that stores the converted image signal (a binary digital signal). 13 is the character segmentation selection unit, the characteristic part of the present invention, which selects the direction of the segmentation line used to cut out character patterns and determines the segmentation line. 14 is a character segmentation unit that cuts input character patterns out of the binary digital signal along the determined segmentation line. 15 is a pattern register that stores the segmented input character patterns. 16 is a recognition unit that performs feature extraction on the input character patterns and recognizes them. 17 is an output terminal, connected for example to the data input terminal of external equipment such as a computer, that outputs the character names (for example JIS character codes) of the recognized characters.

FIG. 3 shows the form on which the characters used in this embodiment are written. The first line of the form 31 reads "THE SPEED ENHANCEMENT". The second line reads "TO THE", followed by a word printed in italics whose first character is "U", followed by "FILE SYSTEM".

This embodiment describes recognition of the character string on the second line, which includes italic characters.

FIG. 4(a) shows an example of the input format table, and FIG. 4(b) is a diagram explaining the input format table.

As shown in FIG. 4(b), the input format table records the distances of the first line region from the top edge and the left edge of the form (y and x), the size of a line region (line height h and line width W), the line pitch P, and the number of lines n.
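As a rough illustration of how such a table might be used, the following Python sketch derives the rectangle of each line region from the recorded values. The class name `InputFormat`, its field names, the sample numbers, and the assumption that each line region starts one pitch P below the previous one are illustrative and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class InputFormat:
    """Hypothetical container for the values recorded in the input format table."""
    y: int      # distance of the first line region from the top edge of the form
    x: int      # distance of the first line region from the left edge of the form
    h: int      # line height
    w: int      # line width
    pitch: int  # line pitch P (assumed: distance between the tops of successive lines)
    n: int      # number of lines


def line_regions(fmt):
    """Return (top, left, height, width) for each of the n line regions."""
    return [(fmt.y + k * fmt.pitch, fmt.x, fmt.h, fmt.w) for k in range(fmt.n)]


# Illustrative values only.
fmt = InputFormat(y=40, x=20, h=128, w=4096, pitch=160, n=2)
print(line_regions(fmt))
```

Each tuple could then be used to read one line region at a time, in the order in which the embodiment processes the lines of the form.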

FIG. 5(a) shows the binary line image data used in this embodiment, namely the data of the second line of the form 31 shown in FIG. 3. FIG. 5(b) shows the black-bit distribution of the line image data. In the figure, Msi is the start position and Mei the end position of a black-bit distribution region, where i is the ordinal of the region within the line. FIG. 5(c) shows the line elements (subpatterns) perpendicular to the character line that are extracted from the line image data.

A feature of this embodiment is that line elements perpendicular to the character line whose length is at least a threshold are extracted. FIG. 5(d) shows the input character patterns segmented from the line image data.

In this embodiment, the processing of the character segmentation selection unit 13 makes the segmentation of italic characters more reliable than before. FIG. 5(e) shows the recognition results corresponding to the input character patterns.

FIG. 6(a) shows an example of an input character pattern, and FIG. 6(b) shows its feature matrix. FIG. 6(c) shows the dictionary matrix for the italic "U", and FIG. 6(d) shows the dictionary matrix for the standard "U". In FIGS. 6(b), (c) and (d), H, V, L and R are directional patterns (subpatterns): H is the horizontal line element, V the vertical line element, L the left-diagonal line element, and R the right-diagonal line element. These figures illustrate an example in which several planes, each holding the features of one direction, are prepared and matched against the dictionary pattern for that direction. In this embodiment, a dictionary of directional patterns is provided for each typeface.

FIG. 8 illustrates the segmentation angle used to cut out character patterns. 70 is the line image data of a character string, 71 is the segmentation line, and 72 is the angle θ of the segmentation line.

FIG. 9 shows an example in which two character patterns exist within the region of one character.

The operation of each part of the character recognition device 10 shown in FIG. 2 is described in detail below with reference to FIGS. 3 to 9 and FIG. 1.

The photoelectric conversion unit 11 detects a character line region from the optical signal S coming from a medium such as a form on which characters, figures and the like (hereinafter called characters) are written, as shown in FIG. 3. It then photoelectrically converts the character line region to obtain line image data in which each pixel is represented by a binary digital signal, with character strokes as black bits of pixel value "1" and the background as white bits of pixel value "0", and stores the data in the line buffer 12. In this embodiment, the line regions are detected one after another by referring to the input format table shown in FIG. 4(a). Here, a character line region is the region of one line on the form in which characters are written.

The line buffer 12 stores the line image data, a binary digital signal as shown in FIG. 5(a). It holds the signal of each pixel of the line image data in a form that can be reproduced according to the two-dimensional coordinates of the region, and has a size of 128 × 4096 pixels.

The character segmentation selection unit 13 and the character segmentation unit 14 are described in detail below using the operation flowchart of FIG. 1.

The character segmentation selection unit 13 reads the line image data from the line buffer 12, scans it in the direction perpendicular to the character line, counts the number of black bits on each scanning line, and obtains the black-bit distribution shown in FIG. 5(b) (step 101). It then detects, as a black-bit distribution region, the span from the position where the distribution changes from 0 to 1 or more up to the position just before it changes from 1 or more back to 0 (step 102). It calculates the average line width Wc of the characters within the detected region (step 103) and, based on the calculated average line width Wc, extracts the line elements (subpatterns) perpendicular to the character line whose length is at least the threshold K, as shown in FIG. 5(c) (step 104).
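Steps 101 and 102 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and the line image is assumed to be a list of rows of 0/1 values with the character line running horizontally, so that a scan perpendicular to the character line is a column.

```python
def black_bit_distribution(image):
    """Step 101 (sketch): count the black bits (1s) in each column of the line image."""
    return [sum(column) for column in zip(*image)]


def distribution_regions(distribution):
    """Step 102 (sketch): return inclusive (start, end) column pairs.

    A region starts where the count changes from 0 to 1 or more and ends at the
    position just before it changes from 1 or more back to 0.
    """
    regions = []
    start = None
    for x, count in enumerate(distribution):
        if count >= 1 and start is None:
            start = x
        elif count == 0 and start is not None:
            regions.append((start, x - 1))
            start = None
    if start is not None:  # region that runs to the right edge of the buffer
        regions.append((start, len(distribution) - 1))
    return regions


image = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
dist = black_bit_distribution(image)
print(dist)                        # [2, 0, 1, 2, 0]
print(distribution_regions(dist))  # [(0, 0), (2, 3)]
```

The inclusive pairs play the role of the start and end positions of each black-bit distribution region.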

According to the presence or absence of the extracted vertical line elements (subpatterns) within the detected black-bit distribution region (step 105), the unit selects one of two directions for the segmentation line and outputs the segmentation signal C, the start position Msi (1 ≤ i ≤ n) and the end position Mei (1 ≤ i ≤ n) of the black-bit distribution region to the character segmentation unit 14.

In this embodiment, when a vertical line element (subpattern) is extracted within the detected black-bit distribution region (step 105), the segmentation line for the line image data of that region is taken perpendicular to the character line, and the segmentation signal C = 0 is output to the character segmentation unit 14. The span from the point where the black-bit distribution changes from 0 to 1 or more up to the position just before it changes from 1 or more to 0 is then stored in the pattern register 15 as an input character pattern (step 106). The threshold K for extracting vertical line elements is set to K = (5/2)·Wc, where Wc is the average line width. The character segmentation unit 14 cuts characters out of the line image data read from the line buffer 12 on the basis of the segmentation signal C output from the character segmentation selection unit 13, and obtains the input character patterns.
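The choice of segmentation signal in step 105 might look like the following Python sketch. It simplifies the extraction of vertical line elements to "a vertical run of black bits at least K long in some column of the region", which is an assumption; the source does not spell out how the subpatterns are extracted. The function names are hypothetical, and the average line width Wc is taken as an input because the source does not state how it is computed.

```python
def longest_vertical_run(image, col):
    """Length of the longest run of consecutive black bits in one column."""
    best = run = 0
    for row in image:
        run = run + 1 if row[col] else 0
        best = max(best, run)
    return best


def select_segmentation_signal(image, start, end, avg_line_width):
    """Return C = 0 (cut perpendicular to the line) or C = 1 (cut at an angle).

    Uses the threshold K = (5/2) * Wc given in the embodiment. A vertical line
    element is assumed to be any vertical run of at least K black bits.
    """
    k = 2.5 * avg_line_width
    for col in range(start, end + 1):
        if longest_vertical_run(image, col) >= k:
            return 0  # vertical line element found: upright character
    return 1          # no vertical line element: treat the region as italic


upright = [[1, 0] for _ in range(6)]  # a 6-pixel vertical stroke
slanted = [[1 if c == 5 - r else 0 for c in range(6)] for r in range(6)]  # a "/" stroke
print(select_segmentation_signal(upright, 0, 1, avg_line_width=1))  # 0
print(select_segmentation_signal(slanted, 0, 5, avg_line_width=1))  # 1
```

With Wc = 10, as in the worked example of the embodiment, the same function would require a vertical run of at least 25 pixels before reporting C = 0.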

Specifically, in the black-bit region from Ms1 to Me1 shown in FIG. 5(b), the average line width Wc = 10 is calculated, so the threshold is K = 25, and a vertical line element is extracted as shown in FIG. 5(c). The character segmentation unit 14 then receives from the character segmentation selection unit 13 the start position Ms1 and the end position Me1 of the black-bit distribution region and the segmentation signal C = 0. It accordingly cuts out the region from Ms1 to Me1 as the region of a single character pattern and extracts the input character pattern "T". For the following "O", "T", "H" and "E", vertical line elements are likewise extracted as shown in FIG. 5(c), just as for the input character pattern "T". The start position and end position of each black-bit region and the segmentation signal C = 0 are therefore input from the character segmentation selection unit 13, and the character segmentation unit 14 cuts out each region as a single character pattern and extracts the input character patterns.

When no vertical line element is extracted within the black-bit distribution region in step 105, the segmentation line for the line image data of that region is taken in the direction θ = 75 degrees with respect to the character line, as shown in FIG. 8, and the segmentation signal C = 1 is output to the character segmentation unit 14 (step 107). Specifically, in the black-bit region from Ms6 to Me6 shown in FIG. 5(b), no vertical line element is extracted, as shown in FIG. 5(c), so the character segmentation selection unit 13 inputs the start position Ms6 and the end position Me6 of the black-bit distribution and the segmentation signal C = 1. The input character patterns are then extracted from the region Ms6 to Me6 using segmentation lines inclined at θ = 75 degrees to the character line.

When the segmentation signal C = 1, the input character patterns are extracted by scanning the black-bit distribution region from Ms6 to Me6 in the direction θ = 75 degrees with respect to the character line.

When a position at which black bits are present along the scanning direction and a position at which black bits are no longer present have been detected, the span from the position where black bits appear up to the position just before they disappear is extracted as the input character pattern of one character.
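The oblique scan for C = 1 can be sketched as a sheared projection. The Python below is an illustration under several assumptions that the source does not state: the italic strokes lean to the right, each black bit is assigned to the scanning line identified by the (rounded) column where that line meets the bottom row, and the function names are hypothetical. The small example uses 45 degrees so the result is easy to check by hand; the embodiment uses θ = 75 degrees.

```python
import math


def oblique_profile(image, theta_deg):
    """Count black bits along scanning lines inclined at theta to the character line.

    Returns a dict mapping the column where a scanning line meets the bottom
    row to the number of black bits on that line.
    """
    height = len(image)
    shift_per_row = 1.0 / math.tan(math.radians(theta_deg))
    counts = {}
    for r, row in enumerate(image):
        shift = (height - 1 - r) * shift_per_row  # horizontal offset at this height
        for c, bit in enumerate(row):
            if bit:
                x0 = round(c - shift)
                counts[x0] = counts.get(x0, 0) + 1
    return counts


def oblique_segments(image, theta_deg):
    """Inclusive (start, end) spans of consecutive scanning lines that hit black bits."""
    segments = []
    start = prev = None
    for x0 in sorted(oblique_profile(image, theta_deg)):
        if start is None:
            start = prev = x0
        elif x0 == prev + 1:
            prev = x0
        else:
            segments.append((start, prev))
            start = prev = x0
    if start is not None:
        segments.append((start, prev))
    return segments


# Two "/" strokes whose column projections overlap in the middle column.
two_slashes = [
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
]
print(oblique_segments(two_slashes, 90))  # [(0, 4)]: a vertical scan merges them
print(oblique_segments(two_slashes, 45))  # [(0, 0), (2, 2)]: an oblique scan separates them
```

The example shows why the inclined scan helps: the two slanted strokes form a single region under a vertical scan but fall on separate scanning lines once the scan follows their slant.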

The extracted input character pattern is stored in the pattern register 15 (step 108). The processing of step 107 is repeated up to the end position Me6 of the black-bit distribution region (step 109). The processing of steps 101 to 109 is carried out for one line of the line buffer and for all lines of the form (steps 110 and 111). The segmentation angle used when the segmentation signal C is 1 should be set to an angle suited to the typeface to be read; in this embodiment it is 75 degrees with respect to the character line, as shown in FIG. 8. This angle may be changed freely according to the typeface.

The pattern register 15 stores the signal of each pixel in the expected character region of an input character pattern in a form that can be reproduced according to the two-dimensional coordinates of the region, and has a size of 128 × 128 pixels per input character pattern. FIG. 5(d) shows the input character patterns stored in the pattern register 15 in this embodiment. The input character patterns stored in the pattern register 15 are read into the recognition unit 16.

The recognition unit 16 reads the input character patterns one after another from the pattern register 15 and performs feature extraction and recognition on them.

Any of various known methods can be used for this feature extraction; in this embodiment it is carried out as described below.

First, the rectangular frame circumscribing the input character pattern is detected and taken as the character frame. The line width Wp of the input character pattern is then calculated, for example with the well-known approximation formula (I) below.

Wp = 1 / (1 − (Q/A))   ... (I)

In formula (I), Q is the number of windows in which every point is a black bit when the points making up the input character pattern are viewed through windows of (2 × 2) points each, and A is the number of black bits within the character frame.
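Formula (I) is straightforward to evaluate. The Python sketch below assumes that the (2 × 2) windows overlap, that is, a window is placed at every pixel position; the source does not say whether the windows overlap or tile the pattern. The function name is hypothetical.

```python
def estimate_line_width(pattern):
    """Evaluate Wp = 1 / (1 - Q/A) for a binary pattern given as a list of rows.

    A is the number of black bits; Q is the number of 2x2 windows (assumed to
    be overlapping) in which all four bits are black.
    """
    a = sum(sum(row) for row in pattern)
    if a == 0:
        raise ValueError("pattern contains no black bits")
    q = 0
    for r in range(len(pattern) - 1):
        for c in range(len(pattern[0]) - 1):
            if (pattern[r][c] and pattern[r][c + 1]
                    and pattern[r + 1][c] and pattern[r + 1][c + 1]):
                q += 1
    return 1.0 / (1.0 - q / a)


bar = [[1, 1, 1] for _ in range(10)]  # a stroke 3 pixels wide and 10 pixels long
print(estimate_line_width(bar))       # A = 30, Q = 18, so Wp is about 2.5
```

For a solid stroke w pixels wide and h pixels long, A = w·h and Q = (w − 1)(h − 1), so the formula gives w·h / (w + h − 1), which approaches w as the stroke gets longer. This is why it works as an approximation of the line width.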

The input character pattern is then scanned in a plurality of directions to detect the run length of consecutive black bits in each scanning line, and the subpattern for each direction is extracted on the basis of these run lengths and the line width described above. For each subpattern, the region inside the character frame is divided into (N × M) regions (N, M: constants), a feature quantity representing the stroke length within each region is calculated, and the feature quantities are normalized by the size of the character frame to obtain the feature matrix. In this embodiment, the feature quantity is normalized by dividing it by (ΔX + ΔY)/2, where ΔX is the horizontal length and ΔY the vertical length of the character frame.
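The division into regions and the normalization by (ΔX + ΔY)/2 can be sketched as follows. The source does not define how the stroke-length feature is measured, so this Python sketch simply counts the black bits of one subpattern that fall in each region; that choice, the reading of N as the number of rows and M as the number of columns, and the function name are all assumptions.

```python
def feature_matrix(subpattern, n, m):
    """Build an n-by-m feature matrix for one directional subpattern.

    `subpattern` is a binary image already cropped to the character frame.
    Each cell holds the black-bit count of its region (a stand-in for stroke
    length) divided by (dX + dY) / 2, the mean side length of the frame.
    """
    dy = len(subpattern)     # vertical length of the character frame
    dx = len(subpattern[0])  # horizontal length of the character frame
    norm = (dx + dy) / 2
    cells = [[0.0] * m for _ in range(n)]
    for r, row in enumerate(subpattern):
        for c, bit in enumerate(row):
            if bit:
                cells[r * n // dy][c * m // dx] += 1
    return [[value / norm for value in row] for row in cells]


vertical_stroke = [[1, 0, 0, 0] for _ in range(4)]  # one vertical line at the left edge
print(feature_matrix(vertical_stroke, 2, 2))        # [[0.5, 0.0], [0.5, 0.0]]
```

Dividing by the mean side length makes the features comparable between large and small character frames, which is the stated purpose of the normalization.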

The recognition unit 16 matches the feature matrix extracted in this way against the dictionary matrices and outputs the character name (JIS code or the like) corresponding to the dictionary matrix with the highest similarity to an external device (not shown) through the output terminal 17. In this embodiment, the similarity is obtained with formula (II) below.

R = (1 / [illegible]) × 100   ... (II)

In formula (II), R is the similarity, fi is the input character pattern, gi is a dictionary matrix stored in the dictionary, and i = 1, 2, 3, ..., N × M.

The dictionary holds not one but several dictionary matrices per category; in this embodiment there are two for each.

FIG. 6 shows the feature matrix of "U", the first character of the italic word in the example sentence of this embodiment, together with two dictionary matrices (italic and standard). The similarity between the feature matrix of "U" (FIG. 6(b)) and the dictionary matrices is calculated as R = 11.6 for the dictionary matrix of FIG. 6(c) and R = 1.0 for the dictionary matrix of FIG. 6(d). This calculation is carried out for all dictionary matrices. FIG. 5(e) shows the characters corresponding to the codes of the dictionary matrices for which formula (II) gave the highest similarity.

In this way, the characters corresponding to the codes shown in FIG. 5(e) are output.

(Effects of the Invention) As described above, according to the present invention, the direction of the segmentation line used to cut out character patterns is selected according to the presence or absence of line elements perpendicular to the character line within the width of each extracted black-bit distribution region. Character patterns are then cut out one after another, using the segmentation line so determined, from the binary digital signal corresponding to that black-bit distribution region. This makes it possible to segment italic character patterns, and in particular character patterns that would not otherwise be estimated to be two characters, such as the case shown in FIG. 9 in which two character patterns lie within the area of a single character. Characters on forms containing multiple typefaces can therefore be segmented quickly and accurately. Furthermore, because the processing scheme is simple, a recognition device built with this method requires little hardware and can be made compact.

[Brief Description of the Drawings]

FIG. 1 is an operation flowchart of the character segmentation method according to an embodiment of the present invention.
FIG. 2 is a block diagram of a character recognition device to which the present invention is applied.
FIG. 3 shows a form bearing the characters used in the embodiment of the present invention.
FIGS. 4(a) and 4(b) show examples of the input format table.
FIGS. 5(a) to 5(e) show the character recognition process in the embodiment of the present invention.
FIGS. 6(a) and 6(b) show an input character pattern and its feature matrix.
FIGS. 6(c) and 6(d) show the italic and standard dictionary matrices for "U".
FIG. 7 shows example patterns for the conventional method using the black-bit distribution.
FIG. 8 illustrates the segmentation angle used to cut out character patterns.
FIG. 9 shows an example in which two character patterns exist within the area of one character.

10: character recognition device; 11: photoelectric conversion unit; 12: line buffer; 13: character segmentation selection unit; 14: character segmentation unit; 15: pattern register; 16: recognition unit; 17: output terminal; 31: form; 60 to 64: character patterns; 65: black-bit distribution; 70: line image data; 71: segmentation line; 72: angle of the segmentation line.

Claims

[Claims]

(1) A character segmentation method in which a character line on a medium is photoelectrically converted to obtain quantized line image data, character segmentation positions are detected from the line image data, and the line image data is cut at the detected positions to extract character patterns, the method comprising:
a step of scanning the line image data in the direction perpendicular to the character line, counting the number of black bits of the character pattern in each scan, and obtaining the distribution of black bits;
a step of extracting line elements, at an arbitrary angle to the character line, in which the number of black bits continuously exceeds a predetermined value;
a step of detecting, from the obtained distribution of black bits, regions in which black bits are continuous in the character line direction;
a step of detecting the presence or absence of the extracted line elements at an arbitrary angle to the character line;
a step of determining the character segmentation scanning direction in each such region based on the amount of those line elements; and
a step of scanning in the determined character segmentation scanning direction to segment the characters.
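The first and third steps of claim 1 (counting black bits per perpendicular scan, then finding regions where black bits are continuous along the character line) can be sketched as below. This is an illustration under assumptions not stated in the claim: the line image is a list of rows of 0/1 values with the character line running horizontally, so each column is one scan, and the function names are hypothetical.

```python
def black_bit_distribution(image):
    """Count black bits (1s) in each column, i.e. each scan
    perpendicular to a horizontal character line."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def black_regions(distribution):
    """Return inclusive (start, end) column ranges where the count stays
    at 1 or more: from the column where it rises from 0 to the column
    just before it falls back to 0."""
    regions = []
    start = None
    for x, count in enumerate(distribution):
        if count >= 1 and start is None:
            start = x
        elif count == 0 and start is not None:
            regions.append((start, x - 1))
            start = None
    if start is not None:
        # The last region runs to the right edge of the line image.
        regions.append((start, len(distribution) - 1))
    return regions
```

For a two-row image `[[0, 1, 1, 0, 1], [0, 1, 0, 0, 1]]`, the distribution is `[0, 2, 1, 0, 2]` and the regions are columns 1 to 2 and column 4.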

(2) The character segmentation method according to claim 1, wherein the character segmentation scanning direction is perpendicular to the character line when a line element perpendicular to the character line and larger than K (K: a constant) is extracted within the region in which black bits are continuous in the character line direction, and is a direction tilted by θ degrees (θ: a constant) with respect to the character line when no line element perpendicular to the character line and larger than K is extracted.
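The selection rule of claim 2 can be sketched as follows. This is a simplified illustration, not the claimed implementation: a perpendicular line element is approximated as a run of consecutive black bits within a single column, the image is assumed to be a list of rows of 0/1 values with a horizontal character line, and the names `longest_vertical_run` and `scan_direction` are hypothetical. The constant θ is left to the caller, so the function only reports which of the two directions applies.

```python
def longest_vertical_run(image, x):
    """Length of the longest run of consecutive black bits in column x."""
    longest = run = 0
    for row in image:
        run = run + 1 if row[x] else 0
        longest = max(longest, run)
    return longest


def scan_direction(image, region, k):
    """Choose the segmentation scan direction for one black-bit region.

    region is an inclusive (start, end) column range. Returns
    "perpendicular" if any column in the region holds a vertical run
    longer than k, otherwise "tilted" (the direction inclined by the
    constant theta).
    """
    start, end = region
    has_vertical = any(
        longest_vertical_run(image, x) > k for x in range(start, end + 1)
    )
    return "perpendicular" if has_vertical else "tilted"
```

A region containing an upright stroke is cut with perpendicular scans, while a region with no sufficiently long upright stroke, as expected for italic characters, is cut along the tilted direction.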