JPH0113583B2

Info

Publication number: JPH0113583B2
Application number: JP56034243A
Authority: JP
Inventors: Eiichiro Yamamoto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-03-10
Filing date: 1981-03-10
Publication date: 1989-03-07
Also published as: JPS57147783A

Description

[Detailed Description of the Invention]

The present invention relates to a character recognition method that scans a character in two directions, vertically and horizontally, extracts the regions (background portions) enclosed by the character lines (strokes), and creates a feature pattern of the character from them.

In character recognition, methods that extract features of the character strokes and methods that encode the background portions of the character are both known. Conventional methods of the latter type examine, at each point in the background, the directions in which character line segments lie. As a result, in FIG. 3(a) and FIG. 3(b), for example, a background portion of (b) receives a different code label even though both figures show the same character. Both take the kanji "文" as an example. Codes are assigned according to criteria such as whether there are line segments to the right and below, to the left and below, or above, below, left and right. In (a), labels are attached to the background portions as illustrated. In (b), deformation during handwriting has lengthened part of a diagonal stroke, which creates a background portion with a line segment below it that does not exist in (a), and this portion receives an additional label. Such a difference in features works against recognizing (a) and (b) as the same character. In general, when the target is a complex character shape such as a kanji, differences of this kind arise frequently from variations in stroke length, and it becomes difficult to construct discrimination logic for handwritten characters. In addition, complex character shapes require many kinds of labels, which makes processing cumbersome.

Furthermore, if a histogram of the portions carrying the same code is created and used as the feature for recognition, different features are extracted when the character is deformed, as in FIG. 3(c) and FIG. 3(d). In this example both are the kanji "文", but in (c) the region A₁ enclosed by the central line segments is written small, while in (d) the corresponding region A₂ is written large. Because both regions fall under the same classification criterion, they receive the same label. However, A₁ and A₂ differ in area, so counting the number of labels produces different histograms H₁ and H₂, which makes character recognition difficult. (The image signal of one character consists of, for example, 50 × 50 bits; each bit is written to one memory cell, and a label is attached to each bit.)
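The area dependence described above can be illustrated with a minimal sketch. The toy glyph, the label name "E" and the function names below are hypothetical and are not taken from the patent; the sketch only shows that a per-pixel label count grows with the size of the enclosed region even when the shape is the same.

```python
def label_pixel_counts(label_map):
    """Count how many pixels carry each label (the conventional histogram)."""
    counts = {}
    for row in label_map:
        for label in row:
            if label is not None:
                counts[label] = counts.get(label, 0) + 1
    return counts


def make_box(size):
    """Hypothetical toy glyph: a border of unlabeled cells around a
    size x size interior whose pixels all carry the label 'E'."""
    total = size + 2
    grid = [[None] * total for _ in range(total)]
    for y in range(1, total - 1):
        for x in range(1, total - 1):
            grid[y][x] = "E"
    return grid


small = label_pixel_counts(make_box(3))  # small enclosed region, like A1
large = label_pixel_counts(make_box(6))  # large enclosed region, like A2
print(small, large)
```

The two histograms differ only because the enclosed area differs, which is the weakness the description attributes to the conventional feature.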

The present invention provides a character recognition method that adopts a feature extraction technique which is less susceptible to the deformation of handwritten characters and is also easy to handle. The character recognition method of the present invention extracts features from input character image information, matches the extracted feature pattern against dictionary patterns prepared in advance, and determines the input character from the matching result. It is characterized in that the background of the character strokes is divided into portions enclosed by strokes above and below, portions enclosed on the left and right, and portions enclosed above, below, left and right; a different label is given to each kind of portion; the number of times the horizontal and vertical scanning lines cross each labeled region is counted; and the resulting histogram is used as the feature pattern of the input character. This is described in detail below with reference to the illustrated embodiment.

FIG. 1 shows an embodiment of the present invention. In the figure, 1 is an observation section, 2 is an image memory, 3 is a vertical scanning address counter, 4 is a horizontal scanning address counter, 5 is a horizontal start point detection section, 6 is a horizontal end point detection section, 7 is a vertical start point detection section, 8 is a vertical end point detection section, 9 is a horizontal start point register, 10 is a horizontal end point register, 11 is a vertical start point register, 12 is a vertical end point register, 13 to 16 are comparators, 17 and 18 are flip-flops, 19 is a horizontal plane feature memory, 20 is a vertical plane feature memory, 21 to 23 are AND circuits, 24 is a horizontal address counter, 25 is a vertical address counter, 26 is a horizontal/vertical enclosure feature counting section, 27 is a horizontal enclosure feature counting section, 28 is a vertical enclosure feature counting section, and 29 is a matching section.

The observation section 1 converts optical image information from a form into electrical image information, and its output is stored in the image memory 2. The vertical scanning address counter 3 scans the image memory 2 in the vertical direction, and the horizontal scanning address counter 4 scans it in the horizontal direction. The horizontal start point detection section 5 detects the points in a horizontal scan where the image changes from "black" to "white" (character portions are called "black" and background portions "white"). The horizontal end point detection section 6 detects the points where the image changes from white to black after the first horizontal start point has been detected in that scan. This is explained with reference to FIG. 2, in which 31 is the image stored in the image memory 2. For a given horizontal scan 32, the black-to-white change points are as shown by signal 33, and the white-to-black change points after the first horizontal start point are as shown by signal 34. The horizontal start point register 9 and the horizontal end point register 10 are set at the timing of horizontal end point detection, so registers 9 and 10 are loaded with the values (position information) marked by the black dots on signals 33 and 34.
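The start and end point detection along one scan line can be sketched as follows. This is only one plausible reading: the description does not say which white-to-black change is retained when a scan line contains several, so the sketch assumes that later changes overwrite earlier ones. The function name scan_line_points and the 0/1 pixel encoding are likewise assumptions made for illustration.

```python
def scan_line_points(row):
    """Return (start, end) for one scan line, or None if either is missing.

    row   : list of pixels, 1 = black (stroke), 0 = white (background)
    start : index of the white pixel at the first black-to-white change
    end   : index of the black pixel at a white-to-black change that
            follows the start point (assumption: the last one is kept)
    """
    start = None
    end = None
    for x in range(1, len(row)):
        prev, cur = row[x - 1], row[x]
        if start is None:
            if prev == 1 and cur == 0:
                start = x  # first black-to-white change
        elif prev == 0 and cur == 1:
            end = x  # later changes overwrite earlier ones (assumption)
    if start is None or end is None:
        return None
    return start, end


points = scan_line_points([0, 1, 0, 0, 1, 0, 1, 0])
no_end = scan_line_points([0, 0, 1, 0, 0])
print(points, no_end)
```

A scan line that yields only a start point returns None here, which stands in for a line that contributes no span.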

The vertical start point detection section 7, vertical end point detection section 8, vertical start point register 11 and vertical end point register 12 operate in the same way as their horizontal counterparts, so the following explanation mainly uses horizontal scanning as the example. When one horizontal scan ends, the horizontal start point and end point values (X addresses) for that scan row are held in the horizontal start point register 9 and the horizontal end point register 10. The horizontal address counter 24 then starts counting. Comparator 13 turns on when the count matches the value in the horizontal start point register 9, and comparator 14 turns on when the count matches the value in the horizontal end point register 10. Flip-flop 17 stays on from the moment comparator 13 turns on until the moment comparator 14 turns on. The output of flip-flop 17 is written into the horizontal plane feature memory 19 at the corresponding addresses, so the memory contents become as shown at 19a in FIG. 2. The many horizontal line segments in that figure (in the memory, for example, runs of cells storing "1") indicate the portions between the start and end points found as described above, that is, between character strokes. Scan lines that have only an end point or only a start point are written into memory 19 as "0" in this example. Vertical scanning works in the same way: the output of flip-flop 18 is written into the vertical plane feature memory 20, whose contents become as shown at 20a in FIG. 2.

AND gate 21 receives the read outputs 19a and 20a of memories 19 and 20 and outputs their logical product. The two memories are read by applying the same address signal to both at the same time, so the contents of the memory cells at the same address enter gate 21 together, and the gate output corresponds to the cross-hatched portion 38 in FIG. 2, which is marked with both vertical and horizontal lines. AND gate 22 receives the output of memory 19 and the output of memory 20 inverted by inverter 22a, so it produces an output indicating portions that have horizontal enclosure only, with no vertical enclosure. In this example there is no such output. AND gate 23 receives the output of memory 20 and the output of memory 19 inverted by inverter 23b, and produces an output indicating the portion 37 that has vertical enclosure only, with no horizontal enclosure.
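The role of the two feature memories and the three AND gates can be sketched in software. This is a simplified per-pixel interpretation, not the register and flip-flop circuit itself: a white pixel is treated as horizontally enclosed when some black pixel lies to its left and to its right on the same row, and as vertically enclosed when some black pixel lies above and below it in the same column. All function and variable names are hypothetical.

```python
def enclosure_maps(image):
    """Build the two maps that play the role of memories 19 and 20.

    image: list of rows, 1 = black (stroke), 0 = white (background).
    Returns (horiz, vert), each a 0/1 map of the same size.
    """
    h, w = len(image), len(image[0])
    horiz = [[0] * w for _ in range(h)]
    vert = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x] == 1:
                continue  # stroke pixels are never background regions
            left = any(image[y][i] for i in range(x))
            right = any(image[y][i] for i in range(x + 1, w))
            up = any(image[j][x] for j in range(y))
            down = any(image[j][x] for j in range(y + 1, h))
            horiz[y][x] = int(left and right)
            vert[y][x] = int(up and down)
    return horiz, vert


def combine(horiz, vert):
    """Mimic gates 21, 22 and 23: both, horizontal only, vertical only."""
    both = [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]
    only_h = [[a & (1 - b) for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]
    only_v = [[(1 - a) & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]
    return both, only_h, only_v


demo = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1],
]
both, only_h, only_v = combine(*enclosure_maps(demo))
for name, region in (("both", both), ("only_h", only_h), ("only_v", only_v)):
    print(name, region)
```

In this toy image the interior of the box is enclosed in both directions, the gap in the bottom row is enclosed only horizontally, and most of the blank fourth row is enclosed only vertically.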

The horizontal/vertical enclosure feature counting section 26, the horizontal enclosure feature counting section 27 and the vertical enclosure feature counting section 28 monitor the outputs 38, 36 and 37 of AND circuits 21, 22 and 23 respectively, in the horizontal or vertical direction, and count the number of changes from white to black. Taking the horizontal/vertical enclosure feature as an example, the portion 38 in FIG. 2 marked with both vertical and horizontal lines is, as stated above, the output of AND circuit 21. At the position of scanning line 39 there are three white-to-black change points, so the horizontal/vertical enclosure feature counting section 26 outputs the value 3. Such an output is produced for each horizontal scanning line, of which there are 50 in the example above. The horizontal and vertical enclosure feature counting sections 27 and 28 work likewise, although for vertical enclosure the count is of course output for each vertical scanning line. The counting sections 26, 27 and 28 therefore each produce 50 outputs in this example, 150 in total. The matching section 29 calculates the distance between the features extracted from these inputs and the standard patterns prepared in advance, and outputs as the result the category (character) of the standard pattern at the smallest distance.
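The per-scan-line counting and the distance-based matching can be sketched as follows. Several points are assumptions: a crossing is counted each time the scan enters a region (a 0-to-1 change in the region map), the distance is taken as the city-block distance because the description does not name a measure, and the template names "A" and "B" are hypothetical.

```python
def crossings_per_row(region):
    """For each row, count how many times a left-to-right scan enters the region."""
    counts = []
    for row in region:
        prev, n = 0, 0
        for cell in row:
            if prev == 0 and cell == 1:
                n += 1  # the scan line has just entered a labeled run
            prev = cell
        counts.append(n)
    return counts


def crossings_per_column(region):
    """Same count along vertical scan lines, by transposing the map."""
    return crossings_per_row([list(col) for col in zip(*region)])


def nearest_category(feature, templates):
    """Return the template name with the smallest city-block distance (assumed measure)."""
    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))
    return min(templates, key=lambda name: distance(feature, templates[name]))


region = [
    [0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
]
rows = crossings_per_row(region)
cols = crossings_per_column(region)
best = nearest_category(rows, {"A": [3, 0, 0], "B": [0, 0, 1]})
print(rows, cols, best)
```

Because each run contributes one count regardless of its length, widening or narrowing an enclosed region leaves the feature unchanged as long as the scan line still crosses it, which is the property the description relies on.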

Experimental data for character recognition using the method of the present invention are shown below.

A recognition experiment was carried out on the following 37 similar categories.

Target categories: 悪、意、炎、恩、害、完、患、危、鬼、急、愚、恵、憲、更、克、思、慈、充、是、泉、走、束、息、怠、態、忠、展、東、唐、念、売、尾、免、吏、竜、昆

Data used: 100 samples per category, 3,700 in total

Recognition experiment
Recognition rate using the features of the method of the present invention: 65.3%
Recognition rate using the features of FIG. 3(c) and (d): 62.7%
Features of the method of the present invention combined with two other features: 86.2%
Features of FIG. 3(c) and (d) combined with two other features: 83.4%

As these results show, using the features of the method of the present invention improves the recognition rate by roughly 3 percentage points compared with using the features of FIG. 3(c) and (d).

As is clear from the above description, the present invention makes it possible to extract character features that are unaffected by the character slant and the stroke-length variations caused by a writer's individual habits, and that are also easy to handle.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a diagram explaining its operation, and FIG. 3 is a diagram explaining conventional character recognition methods. In the figures, 31 is the input image information, 20a is a portion enclosed above and below, 19a is a portion enclosed on the left and right, and 38 is a portion enclosed above, below, left and right.

Claims

[Claims]

1. A character recognition method that extracts features from input character image information, matches the extracted feature pattern against dictionary patterns prepared in advance, and determines the input character from the matching result, characterized in that the background of the character strokes is divided into portions enclosed by the strokes above and below, portions enclosed on the left and right, and portions enclosed above, below, left and right, a different label is given to each kind of portion, the number of times the horizontal and vertical scanning lines cross each labeled region is counted, and the resulting histogram is used as the feature pattern of the input character.