JPH03222082A

JPH03222082A - Character recognizing device

Info

Publication number: JPH03222082A
Application number: JP2018212A
Authority: JP
Inventors: Toru Ishikawa; 石河　融; Hiroshi Yoshida; 浩史吉田; Koichi Higuchi; 浩一樋口; Yoshiyuki Yamashita; 山下　義征
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-01-29
Filing date: 1990-01-29
Publication date: 1991-10-01

Abstract

PURPOSE: To recognize characters quickly and accurately by segmenting character data one character at a time from the image data of a character string, detecting the position feature point coordinate of each character, deciding a classification value from that coordinate together with the feature point coordinate and classification value of the adjacent character, and selecting a dictionary accordingly. CONSTITUTION: In the character recognition device 10, a character segmentation unit 14 segments character patterns one character at a time from the character string image supplied by the photoelectric conversion unit 12. A lower end point detection unit 18a of the character classification unit detects a position feature point coordinate, such as the lower end point, of each character pattern. A classification value determination unit 18b decides the classification value of that pattern from the detected coordinate together with the lower end point coordinate and classification value of the immediately preceding character pattern, both held in the storage unit 18c. A histogram creation unit 18d builds a histogram of the frequency of each classification value, and a classification value comparison unit 18f uses the most frequent classification value as a threshold to form a dictionary selection signal S for matching. The corresponding dictionary in the dictionary unit 20 is selected, so characters can be recognized quickly and accurately without correcting the character-height deviation caused by skew.

Description

[Detailed Description of the Invention] (Industrial Field of Application) The present invention relates to a character recognition device.

(Prior Art) When recognizing characters that have the same shape but are written at different positions, such as the comma "," and the apostrophe "'", or the capital letter "P" and the lowercase letter "p", the characters cannot be recognized accurately from the shape of the character pattern alone. For this reason, character position information is sometimes used as additional information for character recognition. A character recognition device used for this kind of recognition generally comprises the following five components.

(1) A photoelectric conversion unit that photoelectrically converts the optical signal obtained by scanning a medium, such as a form, on which a character string made up of characters, figures, and the like is written, and then quantizes it, for example into black bits for character strokes and white bits for the background, to obtain image data of the character string.

(2) A character segmentation unit that cuts out character patterns one character at a time from this image data.

(3) A character classification unit that classifies each cut-out character pattern.

(4) A dictionary unit that has a plurality of dictionaries, one of which is selected based on the classification result of the character classification unit.

(5) An identification unit that recognizes the character by matching the character pattern against the selected dictionary.

Among these components, one conventional configuration of the character classification unit performs the processing disclosed in the literature (Proceedings of the 1988 IEICE Spring National Convention (March 15, 1988), D-448).

The processing in the character classification unit disclosed in this document is as follows.

(1) First, the circumscribed rectangle of each character in a line is extracted. Each character in the line is then compared with the largest character in that line, and extremely small characters are removed as minute characters. A histogram of the height positions of the upper and lower ends of the circumscribed rectangles of the remaining characters is then created.

(2) Next, from this histogram, the lowest-positioned peak among the rectangle upper ends and the highest-positioned peak among the rectangle lower ends are detected.

(3) Next, using the coordinates of the upper and lower ends of the characters whose size is approximately equal to the distance between these peaks, a straight line giving the slope of the character line is obtained by the least squares method.

(4) Next, the character-height deviation caused by skew is corrected using the slope of the obtained line, and a histogram is created again in the same way as described above.

(5) Next, two peaks are detected from this histogram in the same way as described above, and these peaks are taken as the upper reference line and the lower reference line.

(6) Next, the distance between the upper and lower reference lines is taken as the size of a reference-size character. The size of each character pattern in the line is compared with this reference size, and the position of each character pattern is compared with the upper and lower reference lines. Based on these comparisons, each character pattern in the line is classified into one of seven categories. FIG. 8 is an explanatory diagram of these seven categories. The character patterns are classified into the seven categories a to g according to the size of the circumscribed rectangle and its position relative to the upper reference line U and the lower reference line L.

Each classified character pattern is then matched in the identification unit using the dedicated dictionary that corresponds to its category.

(Problem to Be Solved by the Invention) However, the character classification unit described above creates the histogram of the upper and lower end height positions of the character circumscribed rectangles twice, and extracts a reference line by the least squares method in order to correct the character-height deviation caused by skew. The processing is therefore complicated, which lengthens the processing time and lowers the recognition speed.

The present invention has been made in view of this point. Its object is to provide a character recognition device that classifies characters faster than before and accurately even when skew is present, and that can consequently recognize characters both accurately and quickly.

(Means for Solving the Problem) To achieve this object, the present invention provides a character recognition device comprising: a photoelectric conversion unit that photoelectrically converts and quantizes an optical signal from a medium to obtain image data of a character string on the medium; a character segmentation unit that cuts out character patterns one character at a time from the image data; a character classification unit that classifies each character pattern; a dictionary unit that has a plurality of dictionaries, one of which is selected based on the classification result of the character classification unit; and an identification unit that matches the character pattern against the selected dictionary to recognize the character. The device is characterized in that the character classification unit comprises a position feature point detection unit that detects the position feature point coordinates of each character pattern within the character string, and a classification value determination unit that determines the classification value of each character pattern in sequence, assigning a classification value to the character pattern of interest based on the position feature point coordinates of the character pattern of interest, the position feature point coordinates of a character pattern adjacent to it, and the classification value already assigned to that adjacent character pattern.

In implementing the present invention, the character recognition device preferably further comprises a histogram creation unit that obtains the frequency of each classification value determined by the classification value determination unit and creates a histogram of those frequencies, and a classification value comparison unit that takes the most frequent classification value as a threshold, compares the classification value of each character pattern with the threshold, and outputs to the dictionary unit a signal that selects the matching dictionary for each character pattern.

(Operation) According to the character recognition device of the present invention, the position feature point coordinates of each character pattern need to be obtained only once. Processing is therefore simpler than in the conventional configuration, which created the histogram of the upper and lower end height positions of the character circumscribed rectangles twice.

Furthermore, according to the character recognition device of the present invention, the classification value of a character pattern is determined based on the position feature point coordinates of adjacent character patterns and the classification value assigned to the adjacent character pattern. When the position feature point coordinates of adjacent character patterns are used in this way, errors due to skew have little effect on the coordinates relative to each other, even if the character string is skewed. The classification value can therefore be determined accurately without correcting the character-height deviation caused by skew. Moreover, because the processing for correcting that deviation becomes unnecessary, the classification value determination is simplified accordingly.
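As an illustration only (not part of the original disclosure), the following Python sketch shows why differences between adjacent characters are insensitive to skew. It assumes characters that all rest on a baseline tilted by a hypothetical 1 pixel of rise per 20 pixels of run, spaced at a hypothetical pitch of 40 pixels; the function name and every number are assumptions chosen for the example.

```python
def lower_edges_under_skew(num_chars, base_y=60, pitch=40, rise=1, run=20):
    """Lower-edge Y of characters resting on a baseline skewed by rise/run."""
    return [base_y + (i * pitch * rise) // run for i in range(num_chars)]


edges = lower_edges_under_skew(20)
# Difference between each character and its immediate predecessor.
adjacent_diffs = [b - a for a, b in zip(edges, edges[1:])]
# Accumulated drift between the first and the last character of the line.
total_drift = edges[-1] - edges[0]

print(max(adjacent_diffs), total_drift)
```

With these assumed numbers the line drifts by 38 pixels from end to end, yet each adjacent difference is only 2 pixels, which is why a comparison between neighbors can tolerate skew that an absolute reference line could not.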

In addition, when the frequency of each classification value is obtained, the most frequent classification value is taken as a threshold, and the classification value of each character pattern is compared with that threshold, the character patterns can easily be divided into three categories: the set of character patterns whose classification value is greater than the threshold, the set whose classification value is equal to the threshold, and the set whose classification value is smaller than the threshold.
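The three-way split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the group labels, and the handling of ties between equally frequent values (the value seen first wins, as `Counter.most_common` behaves in Python 3.7 and later) are assumptions.

```python
from collections import Counter


def split_by_modal_class(class_values):
    """Group pattern indices by comparing each classification value with the mode."""
    histogram = Counter(class_values)            # frequency of each classification value
    threshold = histogram.most_common(1)[0][0]   # most frequent value becomes the threshold
    groups = {"greater": [], "equal": [], "smaller": []}
    for index, value in enumerate(class_values):
        if value > threshold:
            groups["greater"].append(index)
        elif value == threshold:
            groups["equal"].append(index)
        else:
            groups["smaller"].append(index)
    return threshold, groups


threshold, groups = split_by_modal_class([5, 5, 4, 5, 7, 5])
print(threshold, groups)
```

The input list here is hypothetical. Each group would then map to one dictionary; which group selects which dictionary is not assumed in this sketch.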

(Embodiments) An embodiment of the character recognition device of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram schematically showing the configuration of the character recognition device of the embodiment.

The character recognition device 10 of this embodiment comprises: a photoelectric conversion unit 12 that photoelectrically converts and quantizes an optical signal from a medium (not shown), such as a form on which characters, figures, and the like (hereinafter referred to as characters) are written, to obtain image data of the character string on the medium; a character segmentation unit 14 that cuts out character patterns one character at a time from the image data; a pattern register 16 that stores the cut-out character pattern; a character classification unit 18 that classifies each character pattern; a dictionary unit 20 that has a plurality of dictionaries, one of which is selected based on the classification result of the character classification unit 18; an identification unit 22 that matches the character pattern against the selected dictionary to recognize the character; and a character name output terminal 24 for outputting the recognized character name to, for example, an external computer or an external display device.

The photoelectric conversion unit 12 can be built from a conventionally known image sensor or the like. In this case it is configured to create image data in which each pixel is represented by a binary digital signal, for example a black bit with pixel value "1" for character strokes and a white bit with pixel value "0" for the background, and to output that data to the character segmentation unit 14.
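The quantization step can be pictured with the short sketch below. It is only an illustration under stated assumptions: the patent does not say how the sensor signal is thresholded, so the 8-bit grayscale input, the dark-ink-on-light-paper polarity, the threshold of 128, and the function name are all hypothetical.

```python
def binarize(gray_rows, threshold=128):
    """Map each grayscale pixel to 1 (black, character stroke) or 0 (white, background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


image = binarize([[0, 255], [200, 100]])
print(image)
```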

Next, in this embodiment the character segmentation unit 14 consists of a line segmentation unit 14a for cutting out line image data from the image data obtained by the photoelectric conversion unit 12, a line buffer 14b for storing the cut-out line image data, and a character pattern segmentation unit 14c for cutting out character patterns from the line image data stored in the line buffer 14b. The character segmentation unit 14 outputs to the character classification unit 18 the coordinates indicating the position of each character pattern in the line buffer 14b and a signal indicating that segmentation of one line of characters has finished.

The line segmentation unit 14a of the character segmentation unit 14 stores the image data output from the photoelectric conversion unit 12. The line buffer 14b in this case has a capacity for 128 × 4096 pixels of data. The character pattern segmentation unit 14c cuts out the character pattern of one character from the one line of image data stored in the line buffer 14b.

Next, the pattern register 16 in this case has a capacity for 128 × 128 pixels of data.

Next, in this embodiment the character classification unit 18 comprises a lower end point detection unit 18a that detects the lower end point, used here as the position feature point coordinate of each character pattern within the character string (in this embodiment, within the line buffer 14b). The character classification unit 18 further comprises a classification value determination unit 18b that determines the classification value of each character pattern in sequence, based on the lower end point coordinate of the character pattern of interest detected by the lower end point detection unit 18a, the lower end point coordinate of a character pattern adjacent to the character pattern of interest (in this embodiment, the character pattern immediately preceding it), and the classification value assigned to that immediately preceding character pattern. In addition, the character classification unit 18 of this embodiment comprises: a previous lower end point / previous classification value storage unit 18c for storing the lower end point of the immediately preceding character pattern (called the previous lower end point) and the classification value assigned to the immediately preceding character pattern (called the previous classification value); a histogram creation unit 18d that creates a histogram of the frequency of each classification value determined by the classification value determination unit 18b and stores the most frequent classification value; a classification value storage unit 18e that stores the classification value determined by the classification value determination unit 18b for each character pattern; and a classification value comparison unit 18f that takes the most frequent classification value stored in the histogram creation unit 18d as a threshold, compares the classification value of each character pattern stored in the classification value storage unit 18e with that threshold, and outputs to the dictionary unit 20 a signal that selects the matching dictionary for each character pattern (hereinafter also called the dictionary selection signal).

In this embodiment, as shown in FIG. 2, the dictionary unit 20 comprises three dictionaries in total, a first dictionary 20a, a second dictionary 20b, and a third dictionary 20c, together with a dictionary selection unit 20d that selects, from the first to third dictionaries 20a, 20b, and 20c, the one dictionary corresponding to the dictionary selection signal output from the classification value comparison unit 18f and outputs it to the identification unit 22. In this embodiment the first to third dictionaries 20a, 20b, and 20c are organized around the reference line usually called the baseline. The first dictionary 20a is a dictionary for characters whose strokes extend below the baseline, such as the English lowercase letters "g, p, y". The second dictionary 20b is a dictionary for characters whose lower end rests on the baseline, such as "A, B, a, b". The third dictionary 20c is a dictionary for characters whose lower end lies well above the baseline, such as "'".

Next, to aid understanding of the character recognition device of the embodiment, its operation is described with reference to FIGS. 1 to 7. FIG. 3 shows a specific example of a form 31 used to simplify this description: a form bearing the character string 33 "They'll get on the bus.". FIG. 4 shows the line image data 41 cut out from the image data of the form 31 by the line segmentation unit 14a. FIGS. 5(A) and 5(B) are flowcharts outlining the operation of the character classification unit 18, and FIG. 6 is a flowchart outlining the operation of the classification value determination unit 18b. FIG. 7 is a diagram for explaining the classification values assigned to the character patterns of the character string 33.

First, the photoelectric conversion unit 12 outputs to the character segmentation unit 14 image data in which each pixel is represented by a binary digital signal, with character strokes as black bits of pixel value "1" and the background as white bits of pixel value "0".

Next, the line segmentation unit 14a of the character segmentation unit 14 scans the image data input from the photoelectric conversion unit 12, with the horizontal direction (corresponding to the X direction in FIG. 4) as the main scanning direction and the vertical direction (corresponding to the Y direction in FIG. 4) as the sub-scanning direction, and creates a distribution of black bits. In this distribution, the span from the position (main scanning line) where the number of black bits changes from 0 to 1 or more, up to the position (main scanning line) immediately before the number of black bits changes from 1 or more to 0, is taken as the character line region of one line. The image data of this region is output to the line buffer 14b as line image data.
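The line cut-out rule just described amounts to finding runs of non-empty rows in a horizontal projection profile. A minimal sketch follows, assuming the binary image is held as a list of rows of 0/1 values; the function name and this data layout are assumptions, not details from the patent.

```python
def find_text_lines(image):
    """Return inclusive (top, bottom) row index pairs, one per text line.

    `image` is a list of rows; each row is a list of 0 (white) / 1 (black).
    """
    row_counts = [sum(row) for row in image]   # black-bit count per main scanning line
    lines = []
    start = None
    for y, count in enumerate(row_counts):
        if count > 0 and start is None:
            start = y                          # count changed from 0 to 1 or more
        elif count == 0 and start is not None:
            lines.append((start, y - 1))       # last row before the count returns to 0
            start = None
    if start is not None:                      # a line that touches the bottom edge
        lines.append((start, len(row_counts) - 1))
    return lines


sample = [[0, 0], [1, 0], [1, 1], [0, 0], [0, 1]]
print(find_text_lines(sample))
```

A single stray black pixel would open a line in this sketch; any noise filtering is outside what the text describes and is not attempted here.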

Next, the line buffer 14b stores the signal of each pixel of the line image data in a form that can be reproduced according to the two-dimensional coordinates of the character line region.

As already explained, the line buffer 14b has a capacity of 128 × 4096 pixels. FIG. 4 shows the line image data cut out by the line segmentation unit 14a from the form 31 (see FIG. 3) bearing the character string 33 "They'll get on the bus.". Here, the character line region mentioned above is the region on the form 31 in which one line of characters can be written.

Next, the character pattern segmentation unit 14c of the character segmentation unit 14 reads the line image data from the line buffer 14b and scans it, with the vertical direction (Y direction in FIG. 4) as the main scanning direction and the horizontal direction (X direction in FIG. 4) as the sub-scanning direction, to create a distribution of black bits. In this distribution, the span from the position (main scanning line) where the number of black bits changes from 0 to 1 or more, up to the position (main scanning line) immediately before the number of black bits changes from 1 or more to 0, is cut out as one character region to obtain a character pattern. The character pattern obtained is stored in the pattern register 16 and at the same time output to the character classification unit 18.
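Character cut-out works the same way as line cut-out, but on the vertical projection of one line image. The sketch below assumes the line image is a list of equal-length rows of 0/1 values; the function name and layout are hypothetical, and touching characters are not separated, since the text does not describe any handling for them.

```python
def find_characters(line_image):
    """Return inclusive (left, right) column index pairs, one per character region."""
    width = len(line_image[0]) if line_image else 0
    # Black-bit count per column (each column is one main scanning line here).
    col_counts = [sum(row[x] for row in line_image) for x in range(width)]
    regions = []
    start = None
    for x, count in enumerate(col_counts):
        if count > 0 and start is None:
            start = x                          # count changed from 0 to 1 or more
        elif count == 0 and start is not None:
            regions.append((start, x - 1))     # last column before the count returns to 0
            start = None
    if start is not None:                      # a character that touches the right edge
        regions.append((start, width - 1))
    return regions


line = [[1, 0, 1, 1, 0],
        [1, 0, 0, 1, 0]]
print(find_characters(line))
```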

The pattern register 16 stores the character pattern input from the character pattern segmentation unit 14c in a form in which the signal of each pixel in the character region can be reproduced according to the two-dimensional coordinates of that region. As already explained, the pattern register 16 has a capacity of 128 × 128 pixels.

Next, the character classification unit 18 assigns a classification value to each character pattern input from the character pattern segmentation unit 14c by the procedure described below. For this description, refer as appropriate to the block diagram of FIG. 1 and the flowcharts of FIGS. 5(A), 5(B), and 6.

First, the lower end point detection unit 18a of the character classification unit 18 reads a character pattern from the character pattern segmentation unit 14c of the character segmentation unit 14 (step 101). The character pattern read in this way is referred to below as the character pattern of interest.

Next, the lower end point detection unit 18a detects the lower end point of the character pattern of interest using the X-Y coordinates output from the character pattern segmentation unit 14c (step 102). It then judges, for example from the X coordinate, whether the character pattern of interest is the first character pattern in the line (step 103). If it is the first character pattern, the classification value determination unit 18b stores in the previous lower end point / previous classification value storage unit 18c the Y coordinate of the character pattern of interest as the initial value of the previous lower end point, and, in this embodiment, "5" as the initial value of the previous classification value (steps 104, 105). In this embodiment, as is clear from FIGS. 3 and 4, the first character pattern in the line is "T" and its Y coordinate is "60", so the storage unit 18c holds "60" as the initial value of the previous lower end point and "5" as the initial value of the previous classification value.
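The patent states only that the lower end point is found from the X-Y coordinates of the cut-out pattern. One plausible reading, sketched here under assumptions, is to take the lowest row of the line image that contains a black bit within the character's column range. The function name, the row-list layout, and the convention that Y grows downward are all hypothetical.

```python
def lower_end_y(line_image, left, right):
    """Largest row index holding a black bit within columns left..right (inclusive).

    Returns None when the column range contains no black bit.
    """
    for y in range(len(line_image) - 1, -1, -1):   # search upward from the bottom row
        if any(line_image[y][left:right + 1]):
            return y
    return None


line = [[0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
print(lower_end_y(line, 0, 1), lower_end_y(line, 3, 3))
```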

次に、下端点検出部１８ａは、つぎの文字パタン（この
例では「ｈ」の文字パタン）を読み込み（ステップ１０
１　）　、この着目文字パタンの下端点を先頭の文字パ
Next, the lower end point detection unit 18a reads the next character pattern (in this example, the character pattern "h") (step 101) and detects the lower end point of this character pattern of interest in the same way as for the first character pattern (step 102). However, since the character pattern "h" is not the first character pattern, its classification value is determined by the procedure of step 106. This procedure is explained with reference to Fig. 6. That is:
(A) if the difference between the lower end point of the character pattern of interest (here, the character pattern "h") and the lower end point of the character pattern adjacent to it (in this embodiment the immediately preceding character pattern, here "T") is greater than the predetermined value 2t, the value obtained by subtracting 2 from the previous classification value is taken as the classification value of the character pattern of interest (steps 201, 202);
(B) if the difference is greater than t and not greater than 2t, the value obtained by subtracting 1 from the previous classification value is taken as the classification value of the character pattern of interest (steps 203, 204);
(C) if the difference is smaller than -2t, the value obtained by adding 2 to the previous classification value is taken as the classification value of the character pattern of interest (steps 205, 206);
(D) if the difference is smaller than -t and not smaller than -2t, the value obtained by adding 1 to the previous classification value is taken as the classification value of the character pattern of interest (steps 207, 208); and
(E) if the difference is not smaller than -t and not greater than t, the previous classification value is used unchanged as the classification value of the character pattern of interest (step 209).
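The decision rules (A) to (E) can be sketched as a single function. This is a minimal illustration, not the patented circuit: the function and parameter names are hypothetical, and the sign convention (previous lower end point minus current lower end point) is inferred from the worked examples in this description, where "y" at 75 following "e" at 60 gives a difference of -15.

```python
def next_classification_value(prev_value, prev_bottom, cur_bottom, t=14):
    """Return the classification value of the character pattern of interest.

    prev_value  : classification value of the immediately preceding pattern
    prev_bottom : lower end point coordinate of the preceding pattern
    cur_bottom  : lower end point coordinate of the pattern of interest
    t           : predetermined value (14 in the described embodiment)
    """
    # Assumed sign convention: previous minus current.
    diff = prev_bottom - cur_bottom
    if diff > 2 * t:        # rule (A), steps 201 and 202
        return prev_value - 2
    if diff > t:            # rule (B), steps 203 and 204
        return prev_value - 1
    if diff < -2 * t:       # rule (C), steps 205 and 206
        return prev_value + 2
    if diff < -t:           # rule (D), steps 207 and 208
        return prev_value + 1
    return prev_value       # rule (E), step 209


# "h" after "T": both lower end points are 60, so the value stays 5.
print(next_classification_value(5, 60, 60))
```

Because each rule only looks at the previous pattern, the function needs no per-line state beyond the previous classification value and the previous lower end point.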

Here, the predetermined value in this embodiment is t = 14. As is clear from Fig. 4, the lower end point of the character pattern "h" is "60", the same as the lower end point of the immediately preceding character pattern "T", so the difference between the two lower end points is 0. Since this difference lies in the range from -t to t, the processing of step 209 is applied to the character pattern "h", and its classification value is set to "5", the same as the classification value of the character pattern "T".

When the classification value of the character pattern of interest has been determined, the classification value determination unit 18b rewrites the previous classification value and the previous lower end point held in the previous lower end point / previous classification value storage unit 18c with the classification value and the lower end point determined for the character pattern of interest (steps 107, 108).

The classification values determined by the classification value determination unit 18b are stored sequentially, one for each character pattern, in the classification value storage unit 18e until the classification values for one line (in this example, "They'll get on the bus.") have all been determined (step 109).

Each time the classification value determination unit 18b determines the classification value of a character pattern, the histogram creation unit 18d counts the frequency of each classification value and creates the corresponding histogram (step 110). For the first two character patterns, "T" and "h", both classification values are "5", so a histogram in which classification value "5" has a frequency of 2 is created.

Next, the character classification unit 18 judges whether classification values have been determined for all the character patterns in the line (step 111). If classification value determination is not finished, processing returns to step 101, the next character pattern is read, and steps 102 and 106 to 111 are repeated. When classification values have been determined for all the character patterns, the character pattern cutting unit 14c outputs to the histogram creation unit 18d a pulse signal indicating that the classification value determination processing for one line has finished.

In this example, the classification values of the character patterns obtained from the line image data "They'll get on the bus." are, as shown in Fig. 7 and listed in order from the first character pattern "T" to the last character pattern ".", "5, 5, 5, 6, 4, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5". That is, the classification value of the third character pattern "e" is determined to be "5" for the same reason as for the second character pattern "h". The lower end point of the fourth character pattern "y" is "75", as shown in Fig. 4, so its difference from the lower end point "60" of the immediately preceding character pattern "e" is -15; this difference is smaller than -t (t = 14) and not smaller than -2t. The classification value of "y" is therefore determined by the processing of steps 207 and 208 and becomes "6", that is, the classification value "5" of "e" plus 1. The lower end point of the fifth character pattern "'" is "36", as shown in Fig. 4, so its difference from the lower end point "75" of the immediately preceding character pattern "y" is 39; this difference is greater than 2t (t = 14). The classification value of "'" is therefore determined by the processing of steps 201 and 202 and becomes "4", that is, the classification value "6" of "y" minus 2. The classification values of the remaining character patterns are determined in turn by the same procedure shown in Fig. 6.

Since the classification values of the character patterns, listed in order from the first character pattern "T" to the last character pattern ".", are "5, 5, 5, 6, 4, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5", the histogram creation unit 18d creates a histogram in which classification value 4 has a frequency of 1, classification value 5 has a frequency of 16, and classification value 6 has a frequency of 2.
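As a small illustration of the histogram step, the frequencies for the example line can be counted as follows. The helper names are hypothetical, and the tie-breaking rule (prefer the smaller value when two classification values share the maximum frequency) is an assumption, since the description does not say how ties are handled.

```python
from collections import Counter


def build_histogram(values):
    """Count how often each classification value occurs in one line."""
    return Counter(values)


def most_frequent_value(histogram):
    """Return the classification value with the maximum frequency.

    Ties are broken in favour of the smaller value (an assumption).
    """
    return min(histogram, key=lambda v: (-histogram[v], v))


# Classification values of "They'll get on the bus." as listed in the text.
line_values = [5, 5, 5, 6, 4, 5, 5, 6] + [5] * 11

hist = build_histogram(line_values)
thl = most_frequent_value(hist)
print(dict(hist), thl)
```

The value returned by `most_frequent_value` plays the role of the threshold THL that the histogram creation unit stores in step 112.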

Next, the histogram creation unit 18d stores the classification value that showed the maximum frequency as the threshold THL (step 112).

Next, the classification value comparison unit 18f reads the classification value of each character pattern stored in the classification value storage unit 18e (step 113) and judges whether it is smaller than the threshold THL stored in the histogram creation unit 18d (step 114). If the classification value of the character pattern is smaller than the threshold THL, the dictionary selection signal S for that character pattern is set to 1 (step 115). If the classification value is greater than or equal to the threshold THL, the classification value comparison unit 18f next judges whether it is equal to the threshold THL (step 116); if the two are equal, the dictionary selection signal S for that character pattern is set to 2 (step 117). If the classification value is greater than the threshold THL, the dictionary selection signal S for that character pattern is set to 3 (step 118).
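The three-way comparison of steps 114 to 118 can be sketched as below. The function name is hypothetical; the returned values 1, 2 and 3 are the dictionary selection signal values named in the description.

```python
def dictionary_selection_signal(value, thl):
    """Map a classification value to the dictionary selection signal S."""
    if value < thl:     # steps 114 and 115
        return 1
    if value == thl:    # steps 116 and 117
        return 2
    return 3            # step 118: value greater than THL


# With THL = 5, classification values 4, 5 and 6 select S = 1, 2 and 3.
signals = [dictionary_selection_signal(v, 5) for v in (4, 5, 6)]
print(signals)
```

Comparing against the most frequent value rather than against fixed coordinates means the mapping adapts to each line without any absolute baseline measurement.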

For the character string "They'll get on the bus." of this embodiment, the threshold THL is classification value 5, the classification value with the maximum frequency. The dictionary selection signal S of each character pattern in this character string is set to 1 for the character pattern "'", whose classification value is "4"; to 2 for each of the character patterns "T, h, e, l, l, e, t, o, n, t, h, e, b, u, s, .", whose classification value is "5"; and to 3 for each of the character patterns "y" and "g", whose classification value is "6".

Next, the classification value comparison unit 18f outputs the dictionary selection signals S determined in this way to the dictionary unit 20 (step 119).

Furthermore, the classification value comparison unit 18f judges whether the dictionary selection signal S has been determined for all the character patterns in the line (step 120). Once the processing for one line has finished, the character recognition device 10 performs the same series of character cutting and classification value determination processing described above on the second and subsequent lines of image data.

Next, the identification unit 22 performs feature extraction and recognition on the character patterns stored in the pattern register 16.

Feature extraction can be performed by any of various conventionally known methods; in this embodiment it is performed by the method described below.

First, a frame (for example, a rectangular frame) circumscribing the character line portion of the character pattern is detected and taken as the character frame. The line width W of the character pattern is then calculated. The line width W can be calculated using, for example, a well-known approximation such as equation (1) below.

W = 1 / (1 - (Q / A))   ... (1)

In equation (1), Q is the number of windows in which all four points are black bits when the points making up the character pattern are viewed through windows covering (2 x 2) points each, and A is the number of black bits within the character frame.
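A minimal sketch of equation (1) on a binary bitmap follows. It assumes that the (2 x 2) windows overlap, that is, one window per pixel position; with that reading a long stroke of width W has Q close to (W - 1)/W times A, which is what makes the approximation work. The description does not state whether the windows overlap, so this reading and the function name are assumptions.

```python
def estimate_line_width(bitmap):
    """Estimate the stroke width W = 1 / (1 - Q / A) of a binary bitmap.

    bitmap: list of equal-length rows, 1 for a black bit and 0 for white.
    """
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0
    # A: number of black bits in the character frame.
    a = sum(sum(row) for row in bitmap)
    if a == 0:
        raise ValueError("bitmap contains no black bits")
    # Q: number of overlapping 2 x 2 windows whose four points are all black.
    q = 0
    for y in range(rows - 1):
        for x in range(cols - 1):
            if (bitmap[y][x] and bitmap[y][x + 1]
                    and bitmap[y + 1][x] and bitmap[y + 1][x + 1]):
                q += 1
    return 1.0 / (1.0 - q / a)


# A horizontal bar 3 pixels thick and 20 pixels long, with a white border.
bar = [[0] * 22] + [[0] + [1] * 20 + [0] for _ in range(3)] + [[0] * 22]
width = estimate_line_width(bar)
print(round(width, 3))
```

For this bar A is 60 and Q is 38, so the estimate is 60/22, a little under the true thickness of 3; the end effects shrink as the stroke gets longer.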

Furthermore, the character pattern is scanned in a plurality of directions to detect the number of consecutive black bits in each scanning line, and a subpattern corresponding to each of those directions is extracted on the basis of the number of consecutive black bits and the line width W described above. The character frame of the character pattern is then divided, for each subpattern, into (N x M) regions (N and M are constants), a feature quantity representing the character lines in each divided region is calculated for every divided region, and these feature quantities are normalized by the size of the character frame to obtain a feature matrix. In this embodiment, normalization is performed by dividing each feature quantity by the value (ΔX + ΔY)/2, where ΔX is the horizontal length of the character frame and ΔY is its vertical length.
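The normalization step alone can be sketched as follows. How the per-region feature quantities are computed is not specified here, so the input matrix is simply taken as given; the function name and the sample numbers are hypothetical.

```python
def normalize_features(features, dx, dy):
    """Divide every feature quantity by (dx + dy) / 2.

    features: N x M list of lists of raw feature quantities
    dx, dy  : horizontal and vertical lengths of the character frame
    """
    scale = (dx + dy) / 2.0
    if scale <= 0:
        raise ValueError("character frame must have positive size")
    return [[value / scale for value in row] for row in features]


# Hypothetical 2 x 2 raw features for a character frame 8 wide and 12 high.
matrix = normalize_features([[10, 20], [0, 5]], dx=8, dy=12)
print(matrix)
```

Dividing by the mean side length of the frame makes the feature matrix roughly independent of character size, so the same dictionary matrices can serve characters printed at different sizes.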

The identification unit 22 then matches the feature matrix obtained in this way against the dictionary matrices in whichever of the first to third dictionaries 20a, 20b and 20c (see Fig. 2) is selected by the dictionary selection signal S, and outputs the character name (JIS code) corresponding to the dictionary matrix that showed the greatest similarity to an external device via the output terminal 24.

In this embodiment, the similarity mentioned above is obtained from equation (2) below.

In equation (2), S is the similarity, f_i is an element value of the feature matrix of the character pattern to be recognized, g_i is an element value of the dictionary matrix, and N x M is the number of dimensions of the feature matrix of the character to be recognized and of the dictionary matrix.
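The matching step can be sketched as follows. The similarity formula used here is normalized correlation between the flattened feature matrix and each dictionary matrix; this particular formula is an assumption chosen for illustration, since the text of equation (2) is not reproduced in this description. The function names and the toy dictionary are likewise hypothetical.

```python
import math


def similarity(f, g):
    """Normalized correlation between two flattened N x M matrices.

    Assumed form; not confirmed as the patent's equation (2).
    """
    numerator = sum(fi * gi for fi, gi in zip(f, g))
    denominator = (math.sqrt(sum(fi * fi for fi in f))
                   * math.sqrt(sum(gi * gi for gi in g)))
    return numerator / denominator if denominator else 0.0


def recognize(feature, dictionary):
    """Return the character name whose dictionary matrix is most similar.

    dictionary: mapping from character name to flattened dictionary matrix,
    standing in for the one dictionary chosen by the selection signal.
    """
    return max(dictionary, key=lambda name: similarity(feature, dictionary[name]))


# Toy dictionary with two entries and a feature vector close to "A".
toy_dictionary = {"A": [1, 0, 0, 1], "B": [0, 1, 1, 0]}
best = recognize([0.9, 0.1, 0.0, 0.8], toy_dictionary)
print(best)
```

Whatever the exact similarity measure, the search only runs over the entries of the selected dictionary, which is where the speed gain from classification comes from.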

An embodiment of the character recognition device of this invention has been described above, but the invention is not limited to that embodiment, and various modifications such as those described below can be made.

In the embodiment described above, the classification value determination unit 18b assigned to each character pattern one of five classification values in total: the previous classification value plus 1 or 2, the previous classification value minus 1 or 2, or the previous classification value unchanged. The kinds of classification value are not limited to these and can be changed according to the design.

Also, in the embodiment described above, the difference between the lower end points of adjacent character patterns was used to determine the classification value, but the way the lower end points are used is not limited to this.

Also, in the embodiment described above, the lower end point of each character pattern was used as the position feature point coordinate of that pattern, but the upper end point of each character pattern, or both the upper and lower end points, may be used instead.

(Effects of the Invention)
As is clear from the above description, according to the character recognition device of this invention, the classification value of each character pattern in a line is determined by a procedure that assigns a classification value to the character pattern of interest on the basis of the position feature point coordinate of the character pattern of interest, the position feature point coordinate of the immediately preceding character pattern, and the classification value assigned to that preceding character pattern. A dictionary corresponding to the determined classification value is then selected, and recognition is performed using the selected dictionary.

Consequently, the position feature point coordinate of each character pattern needs to be obtained only once, so processing is simpler than in the conventional configuration, which creates histograms of the height positions of the upper and lower ends of the character circumscribing rectangles twice.

Moreover, because the classification value of a character pattern is determined from the position feature point coordinates of adjacent character patterns, those coordinates are little affected by skew error relative to each other even when the character string is skewed. Classification values can therefore be determined accurately without correcting the character height deviation caused by skew, so the processing for that correction becomes unnecessary and the classification value determination is simplified accordingly.
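The skew argument can be checked numerically with the first five lower end points given in this description ("T", "h", "e" at 60, "y" at 75, "'" at 36). The sketch below applies the difference rules along a line and compares an upright line with a skewed copy. The skew of 2 pixels per character is a hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
def classify_line(bottoms, first_value=5, t=14):
    """Assign classification values along a line from lower end points.

    bottoms     : lower end point coordinate of each character pattern
    first_value : classification value given to the first pattern
    t           : predetermined value used by the difference rules
    """
    values = [first_value]
    for prev, cur in zip(bottoms, bottoms[1:]):
        diff = prev - cur               # previous minus current
        if diff > 2 * t:
            step = -2
        elif diff > t:
            step = -1
        elif diff < -2 * t:
            step = 2
        elif diff < -t:
            step = 1
        else:
            step = 0
        values.append(values[-1] + step)
    return values


upright = [60, 60, 60, 75, 36]          # "T", "h", "e", "y", "'"
# Hypothetical skew: the baseline drifts down by 2 pixels per character.
skewed = [b + 2 * i for i, b in enumerate(upright)]

print(classify_line(upright))
print(classify_line(skewed))
```

Across the whole skewed line the absolute coordinates drift by 8 pixels, but each adjacent difference changes by only 2, so both runs give the same classification values. The margin is not unlimited: the "y" difference of -15 is only one pixel beyond -t, so a drift of the same size in the opposite direction would change its classification.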

As a result, character patterns that have the same shape and differ only in writing position can be classified more easily, more quickly and yet accurately than before, which makes it possible to realize a device that recognizes characters quickly and accurately.

Furthermore, when the frequency of each classification value is obtained, the classification value with the highest frequency is used as a threshold, and the classification value of each character pattern is compared with that threshold, each character pattern can easily be classified into one of three categories: the set of character patterns whose classification value is greater than the threshold, the set whose classification value is equal to the threshold, and the set whose classification value is smaller than the threshold. This also has the effect of simplifying the configuration of the dictionary unit.

[Brief Description of the Drawings]

Fig. 1 is a block diagram schematically showing the configuration of the character recognition device of the embodiment.
Fig. 2 is a diagram for explaining the dictionary unit of the embodiment.
Fig. 3 is a diagram for explaining the form used in the embodiment.
Fig. 4 is a diagram for explaining the line image data.
Figs. 5(A) and 5(B) are diagrams for explaining the operation of the character classification unit of the embodiment.
Fig. 6 is a diagram for explaining the operation of the classification value determination unit of the embodiment.
Fig. 7 is a diagram for explaining the classification values assigned to the character patterns.
Fig. 8 is a diagram for explaining the prior art.

10: character recognition device
12: photoelectric conversion unit
14: character cutting unit
14a: line cutting unit
14b: line buffer
14c: character pattern cutting unit
16: pattern register
18: character classification unit
18a: lower end point detection unit
18b: classification value determination unit
18c: previous lower end point / previous classification value storage unit
18d: histogram creation unit
18e: classification value storage unit
18f: classification value comparison unit
20: dictionary unit
20a: first dictionary
20b: second dictionary
20c: third dictionary
20d: dictionary selection unit
22: identification unit
24: character name output terminal
L: optical signal from the medium
S: dictionary selection signal
31: form
33: character string
41: line image data

Claims

[Scope of Claims]

(1) A character recognition device comprising: a photoelectric conversion unit that photoelectrically converts and quantizes an optical signal from a medium to obtain image data of a character string on the medium; a character cutting unit that cuts out character patterns one character at a time from the image data; a character classification unit that classifies each character pattern; a dictionary unit that has a plurality of dictionaries, one of which is selected on the basis of the classification result of the character classification unit; and an identification unit that recognizes characters by matching character patterns using the selected dictionary, characterized in that the character classification unit comprises: a position feature point detection unit that detects the position feature point coordinate of each character pattern within the character string; and a classification value determination unit that determines the classification value of each character pattern in turn by a procedure in which a classification value is assigned to the character pattern of interest on the basis of the position feature point coordinate of the character pattern of interest, the position feature point coordinate of a character pattern adjacent to the character pattern of interest, and the classification value assigned to that adjacent character pattern.

(2) The character recognition device according to claim 1, characterized in that the classification value determination unit is configured so that:
(A) if the difference between the position feature point coordinate of the character pattern of interest and the position feature point coordinate of the character pattern adjacent to the character pattern of interest is greater than a predetermined value 2t, the value obtained by subtracting 2 from the classification value assigned to the adjacent character pattern is taken as the classification value of the character pattern of interest;
(B) if the difference is greater than t and not greater than 2t, the value obtained by subtracting 1 from the classification value assigned to the adjacent character pattern is taken as the classification value of the character pattern of interest;
(C) if the difference is smaller than -2t, the value obtained by adding 2 to the classification value assigned to the adjacent character pattern is taken as the classification value of the character pattern of interest;
(D) if the difference is smaller than -t and not smaller than -2t, the value obtained by adding 1 to the classification value assigned to the adjacent character pattern is taken as the classification value of the character pattern of interest; and
(E) if the difference is not smaller than -t and not greater than t, the classification value assigned to the adjacent character pattern is used unchanged as the classification value of the character pattern of interest.

(3) The character recognition device according to claim 1, characterized in that the position feature point detection unit is configured to detect the lower end point, the upper end point, or both of these end points of each character pattern within the character string.

(4) The character recognition device according to any one of claims 1 to 3, further comprising: a histogram creation unit that obtains the frequency of each classification value determined by the classification value determination unit and creates a histogram of the classification value frequencies; and a classification value comparison unit that uses the classification value showing the highest frequency as a threshold, compares the classification value of each character pattern with the threshold, and outputs to the dictionary unit a signal that selects the dictionary to be used for matching each character pattern.