JPS62224872A - Character recognition device

Info

Publication number: JPS62224872A
Application number: JP61067730A
Authority: JP
Inventors: Koichi Ejiri; 公一江尻; Hajime Sato; 元佐藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-03-26
Filing date: 1986-03-26
Publication date: 1987-10-02

Abstract

PURPOSE: To improve the recognition rate for documents in which the character size varies, by resetting the threshold for character-type discrimination according to the number of rejections per line. CONSTITUTION: A block 18 measures the height of each character in a line segmented by a segmentation circuit 14, generates a histogram of character heights, obtains the threshold for character-type discrimination from that histogram, and sets the result in a register 26. Meanwhile, a block 32 counts the reject codes per line of the input document and compares the count with a prescribed value. If the comparison shows that the number of reject codes exceeds the prescribed value, block 18 resets the threshold in register 26 and character recognition is executed again. As a result, the recognition rate is improved for documents in which the character size varies.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a character recognition device that can handle documents containing a mixture of characters of different sizes, such as Western-language documents.

[Prior Art]

For example, English documents contain uppercase and lowercase letters with similar shapes, such as c, s, v, and w.

To perform character recognition on a document containing such a mixture of characters of different sizes, it is necessary to discriminate between uppercase and lowercase letters and the like.

The overwhelming majority of character recognition devices to date assume that they will process documents printed in type. Because character sizes in such documents are standardized, conventional character recognition processing uses preset thresholds to discriminate character types that depend on character size, such as uppercase and lowercase letters.

In recent years, however, the spread of office automation equipment such as word processors has made digital characters easy to produce, and conventional character recognition devices suffer from an increase in recognition errors. Digital characters vary widely in size, so when character recognition is performed with fixed thresholds, uppercase and lowercase letters with similar shapes are easily confused.

A method that determines the thresholds adaptively in parallel with recognition processing has also been proposed. However, it cannot adapt when the character size changes partway through the text, which made it difficult to put into practical use.

[Object]

Accordingly, an object of the present invention is to provide a character recognition device that is more robust to variations in character size than conventional devices.

[Configuration]

Even if the thresholds for character-type discrimination related to character size are set appropriately, the number of rejects rises sharply wherever the character size changes partway through the text.

Focusing on this point, the present invention provides a character recognition device with a first means for measuring the character size of a line and setting thresholds for character-type discrimination, and a second means for comparing the number of rejects per line with a predetermined value. When the second means determines that the number of rejects in the line on which character recognition processing has just been executed is equal to or greater than the predetermined value, the first means resets the thresholds for that line and character recognition processing is executed again.

[Embodiment]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a schematic block diagram showing an embodiment of an optical character recognition device (OCR) according to the present invention. FIG. 2 is a schematic flowchart explaining the operation of this OCR.

In FIG. 1, reference numeral 10 denotes a scanner that optically reads an input document. The image information of the input document output from scanner 10 is temporarily stored in an image buffer 12. Reference numeral 14 denotes a segmentation circuit, which reads image information from image buffer 12, segments it into lines and characters, and transfers the image information of each segmented character to a character buffer 16. This segmentation is performed, for example, by the well-known projection method.
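The projection method mentioned above can be sketched roughly as follows. This is a minimal illustration, not the circuit described in the patent: it assumes a binary image stored as nested lists (1 for a black pixel), and the function names `runs`, `segment_lines`, and `segment_chars` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends at an empty position
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Horizontal projection: count black pixels per row, split at empty rows."""
    return runs([sum(row) for row in img])


def segment_chars(img: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Vertical projection inside one line: split at empty columns."""
    width = len(img[0])
    profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(profile)
```

A real document would also need noise handling and touching-character splitting, which this sketch leaves out.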

Reference numeral 18 denotes a block that measures the heights of the characters in a line and sets the thresholds and scaling constants needed for character recognition processing. In block 18, a measuring circuit 20 measures the height of each character in a line segmented by segmentation circuit 14, a histogram of character heights is built in a histogram memory 22, and a setting unit 24 derives the character-type discrimination thresholds and scaling constants from that histogram and sets them in a register 26.

The histogram of character heights is explained here. When an English document printed in 10-pitch type is sampled and read at 240 dpi (dots per inch), the histogram of its character heights is as shown in FIG. 3. In this histogram, peak a corresponds to noise, peak b to symbols, peak c to lowercase letters, peak d to uppercase letters, and peak e to special symbols and the like.
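As a rough illustration of the height histogram, the sketch below counts character heights and picks out local peaks. The patent does not spell out how setting unit 24 locates peaks or derives thresholds from them, so the peak rule here (a bin strictly higher than both neighbouring bins) and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def height_histogram(heights: List[int]) -> Dict[int, int]:
    """Map each character height (in pixels) to the number of characters with it."""
    return dict(Counter(heights))


def local_peaks(hist: Dict[int, int]) -> List[int]:
    """Return heights whose count is strictly greater than both neighbouring bins.

    Assumed peak rule; missing bins are treated as a count of zero.
    """
    peaks = []
    for h in sorted(hist):
        if hist[h] > hist.get(h - 1, 0) and hist[h] > hist.get(h + 1, 0):
            peaks.append(h)
    return peaks
```

With made-up heights such as `[3, 3, 14, 15, 15, 15, 16, 22, 23, 23, 24]`, the peaks fall at 3, 15, and 23, which would play the roles of a small-mark peak, a lowercase peak, and an uppercase peak.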

Therefore, by setting the thresholds H1 = 12 and H2 = 20, three character types (symbols; lowercase letters; uppercase letters and special symbols) can be discriminated from the character height. Setting unit 24 determines such thresholds H1 and H2 and sets them in register 26. A character recognition processing unit 28 can then use thresholds H1 and H2 to discriminate the type of each input character.
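The height-based discrimination can be sketched as a pair of comparisons. The defaults H1 = 12 and H2 = 20 come from the text; how a height exactly equal to a threshold is treated is not stated, so assigning it to the taller class is an assumption, as are the function name and the class labels.

```python
def classify_by_height(height: int, h1: int = 12, h2: int = 20) -> str:
    """Discriminate the character type from its height in pixels.

    Assumption: a height equal to a threshold belongs to the taller class.
    """
    if height < h1:
        return "symbol"
    if height < h2:
        return "lowercase"
    return "uppercase_or_special"
```

For instance, `classify_by_height(15)` gives `"lowercase"` under these assumptions, and passing different `h1` and `h2` values models the thresholds being reset in register 26.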

In a dictionary 29, which character recognition processing unit 28 consults for character recognition processing, each character is registered as a pattern of a standard size (for example, height = 25, width = 18). Character recognition processing unit 28 therefore normalizes (scales) each input character pattern to the standard size before recognizing it.

Setting unit 24 also sets the scaling constants for this scaling process. That is, it determines the scaling constants from the ratio between the character heights corresponding to peaks c and d of the histogram and the standard heights of the lowercase and uppercase letters registered in the dictionary, and sets them in register 26.
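A scaling constant of this kind could be computed as below. The text only says the constant is determined from the ratio of the two heights; taking it as standard height divided by measured peak height (so that multiplying an input size by it yields the standard size) is an assumption, and the function names are hypothetical.

```python
from typing import Tuple


def scaling_constant(peak_height: float, standard_height: float) -> float:
    """Ratio that maps a measured peak height onto the dictionary's standard height."""
    if peak_height <= 0:
        raise ValueError("peak_height must be positive")
    return standard_height / peak_height


def normalized_size(width: int, height: int, k: float) -> Tuple[int, int]:
    """Scale an input character's bounding box by k, rounding to whole pixels."""
    return round(width * k), round(height * k)
```

For example, if the uppercase peak sits at 20 pixels and the dictionary's standard height is 25, the constant is 1.25, and a 16 by 20 input character is scaled to 20 by 25.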

Character recognition processing unit 28 outputs a character code or a reject code as the recognition result. This result is output to an external storage device, a printer, or the like via an output buffer 30, and is also sent to a block 32.

In block 32, a counter 34 counts the reject codes per line of the input document, and a comparator 36 compares the count with a predetermined value. The result of this comparison is reported to a control unit 38. Control unit 38 controls the operation of each of the units described above and contains a counter 40 used for the skip determination described later.
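The counter and comparator of block 32 can be modelled in a few lines. This is only a software sketch of the behaviour described: `RejectMonitor` and `REJECT` are hypothetical names, and representing a reject code as `None` is an assumption.

```python
from typing import Optional

REJECT = None  # assumption: a reject code is represented as None


class RejectMonitor:
    """Software model of counter 34 plus comparator 36 for a single line."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # the predetermined value
        self.count = 0      # state of counter 34

    def reset(self) -> None:
        """Reset before recognition of a line starts."""
        self.count = 0

    def feed(self, code: Optional[str]) -> None:
        """Count up each time a reject code is output."""
        if code is REJECT:
            self.count += 1

    @property
    def high(self) -> bool:
        """Comparator output: True (H level) when count >= limit."""
        return self.count >= self.limit
```

The control logic would call `reset()` before each line, `feed()` for every recognition result, and read `high` once the line is finished.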

Next, the operation of this OCR is described with reference also to the flowchart of FIG. 2.

Under the control of control unit 38, scanner 10 reads the input document and the image information is accumulated sequentially in image buffer 12. Once a certain amount of image information has accumulated, segmentation circuit 14 starts line segmentation and character segmentation, and the image information of the segmented characters is stored in character buffer 16. In parallel with this segmentation, block 18 measures the character heights and builds their histogram in histogram memory 22.

Control unit 38 determines whether the line whose segmentation has been completed is the first line of the input document (step 100). If it is the first line, control unit 38 controls register 26 via control line S1 so that the thresholds and scaling constants obtained by setting unit 24 are set in register 26 (step 102). Control unit 38 then instructs character recognition processing unit 28, via control signal S2, to execute processing (step 104).

Character recognition processing unit 28 takes in the image information of the input characters from character buffer 16 one character at a time and compares each character's height with the thresholds set in register 26 to discriminate the character type of the input character (symbol, lowercase letter, or uppercase letter or special symbol).

It then normalizes the input character to the standard size according to the scaling constant set in register 26 for the discriminated character type, and matches it against the standard patterns of that character type registered in dictionary 29 to identify the input character. If the character can be identified, its character code is output; if not, a reject code is output.

Counter 34 in block 32 is reset by control unit 38 before character recognition processing of each line begins, and is incremented each time a reject code is output during character recognition processing.

When character recognition processing for one line is completed, control unit 38 checks whether any lines remain to be processed (step 106). If unprocessed lines remain, segmentation of the next line has already started.

When processing a line other than the first line, control unit 38 checks the output of comparator 36 (step 108).

If the number of rejects in the immediately preceding line, on which character recognition processing has just been executed, is equal to or greater than the predetermined value, the output of comparator 36 is at the H level. Note that counter 34 has not yet been reset at this point.

If the output of comparator 36 is at the L level, the number of rejects in the immediately preceding line is less than the predetermined value, and the line is judged to have been recognized normally. In this case, control unit 38 resets counter (n) 40 (step 110) and instructs character recognition processing unit 28, via control signal S2, to start character recognition processing of the next line. Immediately before that, counter 34 is also reset via control line S3.

If the output of comparator 36 is determined to be at the H level in step 108, control unit 38 increments its internal counter 40 (step 112) and compares its value n with a specific value N (here, N = 3) (step 114).

If n < N, control unit 38 cancels, via control signal S4, the recognition result data for the immediately preceding line stored in output buffer 30 (step 116). Control unit 38 then instructs segmentation circuit 14, via control signal S5, to segment the immediately preceding line again (step 118). At this time, if necessary, the reading operation of scanner 10 is temporarily stopped by control signal S6. When this re-segmentation finishes, the histogram for that line has been built in histogram memory 22 and setting unit 24 has obtained thresholds and scaling constants based on it. Control unit 38 has the new thresholds and scaling constants set in register 26 via control signal S1 and then instructs character recognition processing unit 28 to start processing (step 102).

If n ≥ N is determined in step 114, that is, if character recognition of the line immediately preceding the line whose segmentation has just been completed has failed twice, control unit 38 resets counter 40 (step 120). It then cancels the recognition result of the immediately preceding line and instead sets a one-line skip code in output buffer 30 (step 122). In other words, a line on which recognition has failed twice in succession is skipped. After that, character recognition processing is started for the line whose segmentation has been completed. Counter 34 is reset immediately before that.
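The retry-and-skip control flow of steps 100 to 122 can be sketched as a sequential loop. This is a simplification under stated assumptions: the hardware overlaps segmentation of the next line with recognition of the current one, whereas the sketch handles one line at a time; `process_document`, `recognize`, `derive_params`, and the `SKIP` marker are hypothetical names; a reject is represented as `None`; and the counter is compared with `n_max` exactly as in steps 112 and 114.

```python
from typing import Any, Callable, List, Optional, Sequence

SKIP = "<SKIP>"  # hypothetical stand-in for the one-line skip code


def process_document(
    lines: Sequence[Any],
    recognize: Callable[[Any, Any], List[Optional[str]]],
    derive_params: Callable[[Any], Any],
    reject_limit: int,
    n_max: int = 3,
) -> List[Any]:
    """Recognize each line, re-deriving thresholds and retrying on too many rejects."""
    output: List[Any] = []
    params = None
    for index, line in enumerate(lines):
        if index == 0:
            params = derive_params(line)          # steps 100 and 102 (first line)
        n = 0                                     # counter 40
        while True:
            codes = recognize(line, params)       # step 104
            rejects = sum(1 for c in codes if c is None)  # counter 34
            if rejects < reject_limit:            # comparator at L level
                output.append(codes)              # keep the result (step 110)
                break
            n += 1                                # step 112
            if n >= n_max:                        # step 114
                output.append(SKIP)               # steps 120 and 122
                break
            params = derive_params(line)          # steps 116, 118, then 102
    return output
```

Because `params` persists across lines, a line whose character size matches the previous one is recognized on the first pass, and thresholds are re-derived only when the reject count signals a change.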

In this way, the number of rejects in each line is checked, and when it exceeds the predetermined value, the thresholds (and scaling constants) determined from the histogram of character heights in that line are set anew and recognition is performed again. This improves the recognition rate for documents printed in non-standard character sizes and for documents in which the character size changes partway through the text.

In this embodiment the first line is identified, but the first line may instead be handled in the same way as the second and subsequent lines. In that case, however, the initial character recognition pass on the first line is wasted, and the processing time increases accordingly.

If image buffer 12 holds only one line, the same processing can be achieved by having scanner 10 feed the input document back by one line and read it again when a line with many rejects is re-segmented.

Furthermore, each processing unit in this embodiment may be implemented in software.

In this embodiment the thresholds obtained from the histogram are set unconditionally, but a condition can also be imposed. For example, standard thresholds may be prepared in advance, and a threshold obtained from the histogram (an extracted threshold) may be set only when it is close to a standard threshold and otherwise not set.

As a concrete example, the processing shown in the flowchart of FIG. 4 can be added to step 102 in the flowchart of FIG. 2. H1 and H2 are the extracted thresholds, and H11 and H22 are the standard thresholds. In this example, if the differences between H1 and H11 and between H2 and H22 are each ΔT or less, H1 and H2 are adopted as the thresholds as they are (steps 200, 202). If the difference between H1 and H22 is ΔT or less and the difference between H2 and H11 is ΔT or less, H1 is adopted as H2 and H2 as H1 (steps 204, 206).

In this case, the scaling constants are also set (step 208).

Otherwise, H1 and H2 are not adopted as the new thresholds.
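One reading of this conditional variant is sketched below. It assumes two extracted thresholds and two standard thresholds, a tolerance ΔT passed as `delta_t`, and that the second branch handles extracted thresholds that match the standard ones in swapped order; the function name and the `current` fallback argument are hypothetical.

```python
from typing import Tuple

Pair = Tuple[float, float]


def select_thresholds(extracted: Pair, standard: Pair, delta_t: float, current: Pair) -> Pair:
    """Adopt extracted thresholds only when they lie close to the standard ones."""
    h1, h2 = extracted
    s1, s2 = standard
    # Direct match: each extracted threshold is within delta_t of its standard value.
    if abs(h1 - s1) <= delta_t and abs(h2 - s2) <= delta_t:
        return (h1, h2)
    # Swapped match (assumed reading): the pair matches in reverse order.
    if abs(h1 - s2) <= delta_t and abs(h2 - s1) <= delta_t:
        return (h2, h1)
    # Otherwise the extracted thresholds are not adopted.
    return current
```

Under these assumptions, extracted thresholds of (13, 19) against standard thresholds of (12, 20) with a tolerance of 2 are adopted as they are, while (5, 30) leaves the current thresholds unchanged.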

This example uses two standard thresholds and two extracted thresholds, but in general m standard thresholds and n extracted thresholds are conceivable. In that case, sorting the standard thresholds and the extracted thresholds in ascending or descending order beforehand allows comparisons such as those in steps 200 and 204 to be performed efficiently and reliably.

[Effect]

As is clear from the above description, the present invention makes it possible to realize a character recognition device that improves the recognition rate for Western-language documents whose character size is not constant, such as those printed in digital characters, and for documents in which the character size changes partway through the text.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the functional block configuration of an embodiment of the present invention. FIG. 2 is a schematic flowchart explaining the operation of the embodiment. FIG. 3 is a diagram showing an example of a histogram of character heights in an English document. FIG. 4 is a flowchart explaining a variation concerning the determination and setting of the thresholds.

10: scanner; 12: image buffer; 14: segmentation circuit; 20: character height measuring circuit; 22: histogram memory; 24: setting unit for thresholds and the like; 26: register in which thresholds and the like are set; 28: character recognition processing unit; 29: dictionary; 34: reject counter; 36: comparator; 38: control unit.

Claims

[Claims]

(1) A character recognition device capable of handling documents containing a mixture of characters of different sizes, such as Western-language documents, comprising: a first means for measuring the character size of a line and setting a threshold used in character recognition processing to discriminate character types related to character size; and a second means for comparing the number of rejects per line with a predetermined value, wherein, when the second means determines that the number of rejects in the line on which character recognition processing has just been executed is equal to or greater than the predetermined value, the threshold is reset for that line by the first means and character recognition processing is then executed again.

(2) The character recognition device according to claim 1, wherein the first means obtains a histogram of character heights and, based on that histogram, sets a threshold for discriminating between symbols and lowercase letters and a threshold for discriminating between lowercase and uppercase letters.