JPH11232388A

JPH11232388A - Document/slip recognition system

Info

Publication number: JPH11232388A
Application number: JP10028573A
Authority: JP
Inventors: Hidekazu Hatano; 英一羽田野; Takeyuki Sugimoto; 建行杉本; Akizo Kadota; 彰三門田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-02-10
Filing date: 1998-02-10
Publication date: 1999-08-27

Abstract

PROBLEM TO BE SOLVED: To recognize forms with high accuracy when a stack of forms varies in print density from one batch of sheets to the next, regardless of that variation. SOLUTION: The system comprises an image scanner device 14 that reads a document or form, a format information extraction device 15 that extracts format information on ruled lines or character lines from the binarized image supplied by the image scanner device, and a character recognition device 16 that takes the recognition image data of a character line identified from the format information and converts it into character codes. The format information extraction device 15 obtains determination information on the state of the ruled lines or character lines, and the character recognition device 16 obtains determination information on the state of character recognition. A changed slice level value for the image scanner device 14 is obtained from this determination information, and the slice level of the image scanner device is set on the basis of the changed slice level value and the current slice level value.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION: The present invention relates to a recognition system that converts image data of documents and forms into character codes, and in particular to a method and system for automatically correcting the slice level to an optimum value.

[0002]

DESCRIPTION OF THE RELATED ART: In general, an image input from a scanner is binarized at a slice level set by the user. The recognition rate therefore deteriorates depending on the print density of the form, and in many cases the user adjusts the slice level each time. When a large stack of forms is read through an auto feeder, however, having the user reset the level every time is too inefficient to be workable in practice.

[0003] One known way of obtaining the optimum slice level is an apparatus that derives it from a specific pattern provided for determination. A specific pattern, however, works only with a specially prepared sheet, so the approach cannot be applied to general forms. An apparatus of this kind is described in Japanese Patent Application Laid-Open No. 5-20479.

[0004] Another method makes the determination from the character recognition result and a pattern dictionary. This method determines the optimum slice level after recognition has been performed several times, and from then on uses only that slice level. Consequently, when a large stack of forms whose print density varies from one batch of sheets to the next is read through an auto feeder, the recognition rate is poor for some batches. An apparatus of this kind is described in Japanese Patent Application Laid-Open No. 6-124365.

[0005] A further method obtains a slice level with high recognition accuracy from a multi-level image and resets the slice level to that value. Because this method also determines the optimum level once at the outset, it has the same problem: when a large stack of forms whose print density varies from one batch of sheets to the next is read through an auto feeder, the recognition rate is poor for some batches. If the determination is instead performed for every form, recognition must be run many times, which takes time. Moreover, recognition accuracy cannot be computed unless a master recognition result is available. An apparatus of this kind is described in Japanese Patent Application Laid-Open No. 8-287193.

[0006] As described above, conventional recognition systems have the problem that the recognition rate is poor when a large stack of forms whose print density varies from one batch of sheets to the next is read through an auto feeder.

[0007]

PROBLEM TO BE SOLVED BY THE INVENTION: With the conventional methods, when a large stack of forms whose print density varies from one batch of sheets to the next is read through an auto feeder, recognition accuracy is high for forms of standard density but poor for faint or dark forms.

[0008] Accordingly, an object of the present invention is to extract the states of the ruled lines, character lines, and character recognition of the input image as determination information, and to determine and change the slice level from that information, so that the slice level is optimized for the forms being input and stable recognition accuracy is obtained regardless of variation in print density.

[0009]

MEANS FOR SOLVING THE PROBLEM: To solve the above problem, the present invention mainly adopts the following configuration.

[0010] A document/form recognition system comprising: an image scanner device that reads a document or form; a format information extraction device that extracts format information on ruled lines and/or character lines from the binarized image supplied by the image scanner device; and a character recognition device that takes the recognition image data of a character line identified from the format information and converts it into character codes; wherein the format information extraction device obtains determination information on the state of the ruled lines and/or character lines, the character recognition device obtains determination information on the state of character recognition, a changed slice level value for the image scanner device is obtained on the basis of the determination information, and the slice level of the image scanner device is set on the basis of the changed slice level value and the current slice level value.

[0011]

DESCRIPTION OF THE PREFERRED EMBODIMENTS: First, the principle and operation of an embodiment of the present invention are described. FIG. 1 shows the processing blocks that perform slice level extraction. Form 10 is one of the forms in a large stack 11. First, form 10 is input by the image scanner 14 and subjected to format extraction 15 and character recognition 16. To extract the determination information, ruled line / character line / frame state extraction 17 and recognition state extraction 18 are then performed. Slice level determination 19 is made on the basis of this determination information, followed by slice level setting 20. The print density 12 shown beside the stack 11 indicates the print state of the forms, and the slice level 13 is the value resulting from slice level setting 20. The character recognition result is output as output data 21.

[0012] FIG. 2 shows the extraction of ruled-line break information, one item of determination information obtained from ruled lines. Ruled line extraction on the input image data 30 yields ruled line data 31. Using the start and end coordinates of this ruled line data, the image data 30 is searched along the search direction to count black pixels (search 32), and the black pixel frequency rate, which constitutes the break information 33, is obtained by dividing the number of black pixels by the ruled-line length in pixels. In other words, to assess how broken a ruled line is, the determination information is obtained from the black pixel frequency between the start and end coordinates of the extracted ruled line, scanning horizontally for a horizontal rule and vertically for a vertical rule.
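
As an illustration only, the black pixel frequency rate described above can be sketched in Python. The image encoding (a list of rows with 1 for black and 0 for white), the function name `black_pixel_rate`, and the sample data are assumptions made for this sketch and do not come from the patent.

```python
def black_pixel_rate(image, start, end):
    """Fraction of black pixels along an axis-aligned ruled line.

    image: list of rows, 1 = black, 0 = white (assumed encoding).
    start, end: (x, y) coordinates of the extracted rule, inclusive.
    """
    (x0, y0), (x1, y1) = start, end
    if y0 == y1:
        # Horizontal rule: scan in the horizontal direction.
        pixels = [image[y0][x] for x in range(min(x0, x1), max(x0, x1) + 1)]
    elif x0 == x1:
        # Vertical rule: scan in the vertical direction.
        pixels = [image[y][x0] for y in range(min(y0, y1), max(y0, y1) + 1)]
    else:
        raise ValueError("only axis-aligned ruled lines are handled")
    # Black pixel count divided by the rule length in pixels.
    return sum(pixels) / len(pixels)


# A horizontal rule with two one-pixel breaks (made-up sample data).
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
rate = black_pixel_rate(image, (0, 1), (9, 1))
print(rate)
```

A rate close to 1 would indicate an unbroken rule; lower values suggest a faint scan in which the rule has started to break up.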

[0013] FIG. 3 shows the extraction of ruled-line thickness information, another item of determination information obtained from ruled lines. Ruled line extraction on the input image data 40 yields ruled line data 41. Using the start and end coordinates of this ruled line data, the image data 40 is searched along the search direction to count black pixels (search 42), and the thickness unit frequency, which constitutes the thickness information 43, is obtained for each extracted thickness by dividing the number of black pixels having that thickness by the ruled-line length in pixels. In this example the thickness information 43 contains entries for four different thicknesses. In other words, to assess the variation in ruled-line thickness, black pixels are counted between the start and end coordinates of the extracted ruled line, vertically for a horizontal rule and horizontally for a vertical rule, and the determination information is obtained from the frequency of each thickness found.
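
The thickness measurement can be sketched in the same way. This is a minimal Python sketch for horizontal rules only, assuming a 1 = black, 0 = white image; the name `thickness_frequencies`, the decision to skip gap columns, and the sample data are assumptions of this sketch, not details given in the patent.

```python
from collections import Counter


def thickness_frequencies(image, start, end):
    """Frequency of each rule thickness along a horizontal ruled line."""
    (x0, y), (x1, y_end) = start, end
    if y != y_end:
        raise ValueError("this sketch handles horizontal rules only")
    columns = range(min(x0, x1), max(x0, x1) + 1)
    counts = Counter()
    for x in columns:
        if not image[y][x]:
            continue  # a gap; breaks are covered by the black pixel rate
        # Measure the vertical run of black pixels through (x, y).
        top = y
        while top > 0 and image[top - 1][x]:
            top -= 1
        bottom = y
        while bottom < len(image) - 1 and image[bottom + 1][x]:
            bottom += 1
        counts[bottom - top + 1] += 1
    # Count per thickness divided by the rule length in pixels.
    return {t: n / len(columns) for t, n in sorted(counts.items())}


# Made-up sample: a rule on row 2 that thickens in the middle.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
freq = thickness_frequencies(image, (0, 2), (7, 2))
print(freq, "thickness types:", len(freq))
```

The number of distinct keys in the result corresponds to the number of thickness types, which is the quantity the later table lookup uses.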

[0014] FIG. 4 shows the extraction of isolated point information, an item of determination information obtained from character lines. An isolated point is a black pixel surrounded by white pixels. The extracted isolated point count 51 is obtained from the character image data 50 cut out by character line extraction. From this count 51 and the preset basic isolated point count 52 for the recognized character, the isolated point information 53, consisting of the isolated point count difference and the isolated point count difference rate, is obtained. In this example, for the faint character image data 50, the isolated point count difference is 18 and the difference rate is 460%. The dark character image data 54 is processed in the same way, giving an isolated point count difference of −3 and a difference rate of 60%.
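
A rough Python sketch of this step follows. It assumes that an isolated point is a single black pixel whose eight neighbours are all white, that the difference is the extracted count minus the baseline, and that the rate is the extracted count divided by the baseline. None of these formulas is stated explicitly in the text; the example counts 23 and 5 were chosen here only because they reproduce the quoted difference of 18 and rate of 460%.

```python
def count_isolated_points(glyph):
    """Count black pixels (1) whose eight neighbours are all white (0)."""
    h, w = len(glyph), len(glyph[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if not glyph[y][x]:
                continue
            neighbours = [
                glyph[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if not any(neighbours):
                count += 1
    return count


def isolated_point_info(extracted, baseline):
    """Return (difference, rate in percent); both formulas are assumed."""
    return extracted - baseline, 100.0 * extracted / baseline


glyph = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
print(count_isolated_points(glyph))   # two isolated pixels in this sample
print(isolated_point_info(23, 5))     # assumed counts: difference 18, rate 460.0
```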

[0015] FIG. 5 shows the extraction of isolated white area information, an item of determination information obtained from character lines. An isolated white area is a region of white pixels surrounded by black pixels. The extracted isolated white area count 61 is obtained from the character image data 60 cut out by character line extraction. From this count 61 and the preset basic isolated white area count 62 for the recognized character, the isolated white area information 63, consisting of the isolated white area count difference and the isolated white area count difference rate, is obtained. In this example, for the dark character image data 60, the isolated white area count difference is −2 and the difference rate is 50%. The faint character image data 64 is processed in the same way, giving an isolated white area count difference of 7 and a difference rate of 275%.
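
Counting isolated white areas amounts to finding white regions that are fully enclosed by black pixels. The Python sketch below uses a breadth-first flood fill with 4-connectivity and treats any white region that touches the image border as not enclosed; the connectivity rule, the border rule, and the name `count_isolated_white_areas` are assumptions of this sketch rather than details from the patent.

```python
from collections import deque


def count_isolated_white_areas(glyph):
    """Count white (0) regions completely enclosed by black (1) pixels."""
    h, w = len(glyph), len(glyph[0])
    seen = [[False] * w for _ in range(h)]
    enclosed = 0
    for sy in range(h):
        for sx in range(w):
            if glyph[sy][sx] or seen[sy][sx]:
                continue
            # Flood-fill one white region and note whether it reaches the border.
            touches_border = False
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not glyph[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                enclosed += 1
    return enclosed


# A ring shape encloses exactly one white area (made-up sample).
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(count_isolated_white_areas(ring))
```

If the rate is taken to be the extracted count divided by the baseline, a baseline of 4 with 11 extracted areas would give the quoted difference of 7 and rate of 275%; those two counts are inferred here, not stated in the text.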

[0016] FIG. 6 shows the extraction of similarity information, the determination information obtained from the character recognition result. Performing character recognition 71 on the input image data 70 yields a recognition result 72 and a similarity 73 for each character. Collecting these similarities 73 over all characters of the form (all-character similarity extraction 74) produces a graph such as the similarity information 75. From this graph, two values are obtained as the similarity information 75: the similarity at the maximum frequency and the minimum similarity. That is, the X-axis value corresponding to the maximum of the Y axis (the maximum frequency) is the similarity at maximum frequency (the left-hand value in the bottom row of FIG. 10), and the minimum similarity (the right-hand value in the bottom row of FIG. 10) is read off at the left end of the graph. These similarities are expressed, for example, as percentages (a similarity of 100% means a complete match). The similarity is the degree of closeness to a character pattern, obtained by matching the input image data against character patterns stored in advance for character recognition.
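
The two similarity values can be read from a histogram as in the following Python sketch. The bin width of 5 percentage points, the tie-breaking rule (the lower bin wins), the function name `similarity_info`, and the sample similarities are all assumptions made for illustration.

```python
from collections import Counter


def similarity_info(similarities, bin_width=5):
    """Return (similarity at maximum frequency, minimum similarity).

    similarities: per-character similarity values in percent (integers).
    The histogram bin width is an assumed parameter.
    """
    bins = Counter((s // bin_width) * bin_width for s in similarities)
    # Most populated bin; on a tie, prefer the lower similarity bin.
    peak_bin = max(bins, key=lambda b: (bins[b], -b))
    return peak_bin, min(similarities)


# Made-up similarities for the characters of one form.
scores = [95, 97, 92, 88, 96, 71, 93, 99]
peak, lowest = similarity_info(scores)
print(peak, lowest)
```

A peak near 100% with a high minimum would suggest a well-binarized form, while a low minimum points to at least some characters that matched poorly.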

[0017] FIG. 7 shows slice level determination and setting (see FIG. 1). An extracted slice level value is obtained from the slice level determination table 80 on the basis of each item of determination information. This extracted value is multiplied by a coefficient to obtain the changed slice level value 81. The current slice level value and the changed slice level value 81 are then combined to give the set slice level value 82. The ± in this calculation is − when the determination information indicates that the form is faint, and + when it is dark. The coefficient is a value set in advance for each item of determination information.

[0018] An embodiment of the present invention is now described in detail. FIG. 8 shows the configuration of a recognition system according to the present invention. The recognition system comprises: an image scanner device 201 that inputs documents and forms as image data; an image storage device 202 that stores the input image; an image cutout device 203 that cuts out the image at the global cutout position specified by the user; a user instruction information storage device 204 that stores the global cutout position information specified by the user; a ruled line extraction device 205 that extracts ruled lines; a ruled line data storage device 206 that stores the extracted ruled line data; a determination state extraction device 207 that extracts determination information from the states of the ruled lines, character lines, and character recognition result; a determination information storage device 208 that stores the determination information; a character line extraction device 209 that extracts character lines; a character line data storage device 210 that stores the extracted character line data; a character recognition device 211 that recognizes characters; a storage device 215 that stores the recognition result; a slice level determination device 212 that determines the slice level from the determination information; a slice level setting device 213 that sets the slice level; an external instruction input device 101 through which the user enters instructions; a display device 103 that shows the dialogue for entering user instructions and is used for displaying and correcting recognition results and for setting the number of recognition lines; and a control device 102 that controls each of these devices.

[0019] Next, the slice level extraction flow is described with reference to FIG. 9 together with FIGS. 1 to 6, 8, 10, and 11.

[0020] First, in image input 301 of FIG. 9, form 10 is input from the image scanner device 201 at the slice level set in advance by the user or by system information, and the image data is stored in the image storage device 202.

[0021] In format extraction 310, the image cutout device 203 determines the portion to be recognized from the image data in the image storage device 202, using the global cutout position information specified by the user and held in the user instruction information storage device 204. The portion to be recognized may be the entire form, part of the form, or a character portion.

[0022] In ruled line extraction 311, the ruled line extraction device 205 obtains ruled line data 31 and 41 from the portion to be recognized, and the ruled line data 31 (see FIG. 2) and 41 (see FIG. 3) are stored in the ruled line data storage device.

[0023] In character line extraction 312, the character line extraction device 209 obtains character line data from the portion to be recognized, and the character line data (see FIGS. 4 and 5) is stored in the character line data storage device 210. The character line data consists of the area containing each character line and the cutout positions of the individual characters.

[0024] In character recognition 320, character images are transferred one character at a time, according to the character line data in the character line data storage device 210, to the character recognition device 211 for recognition. In character recognition result output 321, the character codes recognized by the character recognition device 211 and the similarity for each character (see FIG. 6) are stored in the recognition result storage device 215.

[0025] In state extraction 330, the determination information extraction device 207 extracts the states of the ruled lines, character lines, and character recognition result and uses them as determination information. For ruled lines, the processing shown in FIGS. 2 and 3 is performed to obtain determination information 33 and 43. For character lines, the processing of FIGS. 4 and 5 is performed to obtain determination information 53 and 63. For the character recognition result, the processing of FIG. 6 is performed to obtain determination information 75. Each item of determination information is then stored in the determination information storage device 208.

[0026] In slice level determination 340, the slice level determination device 212 obtains the changed slice level value SLD (Slice Level Delta) 81 from each item of determination information in the determination information storage device 208 and the preset slice level determination table 80 of FIG. 7 held in the slice level determination device 212.

[0027] As shown in FIG. 10, the changed slice level value SLD1 is determined from table 400 and the black pixel frequency rate. SLD2 is determined from table 401 and the number of thickness types. SLD3 is determined from table 410 and the isolated point count difference. SLD4 is determined from table 411 and the isolated point count difference rate. SLD5 is determined from table 420 and the isolated white area count difference. SLD6 is determined from table 421 and the isolated white area count difference rate. SLD7 is determined from table 430 and the similarity at maximum frequency. SLD8 is determined from table 431 and the minimum similarity.
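
Each of these lookups maps one measured value to a slice level delta. The contents of the tables in FIG. 10 are not reproduced in this text, so the thresholds and delta values in the Python sketch below are entirely hypothetical; only the idea of a range-based lookup is taken from the description.

```python
import bisect

# Hypothetical stand-in for one determination table: upper bounds of the
# black pixel frequency rate and the delta (SLD) assigned to each range.
# A negative delta corresponds to a form judged faint.
BLACK_RATE_BOUNDS = [0.70, 0.90]
BLACK_RATE_SLD = [-2, -1, 0]


def lookup_sld(value, bounds, slds):
    """Return the delta for the range that `value` falls into."""
    # bisect_right gives the index of the first bound greater than value.
    return slds[bisect.bisect_right(bounds, value)]


for rate in (0.65, 0.80, 0.95):
    print(rate, lookup_sld(rate, BLACK_RATE_BOUNDS, BLACK_RATE_SLD))
```

One such pair of lists per determination item would be enough to produce SLD1 to SLD8 under this assumed table layout.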

[0028] Next, in slice level setting 350 (see FIG. 9), the slice level setting device 213 performs the calculation according to the slice level calculation method 500 of FIG. 11, on the basis of the changed slice level values obtained by the slice level determination device 212. First, the signs of the changed slice level values SLD1 to SLD6 are determined; a − sign shifts the slice level toward darker, and a + sign shifts it toward lighter. Next, each changed slice level value SLD (Slice Level Delta) is multiplied by its coefficient and averaged per element of each item of determination information. The coefficient-weighted changed slice level values are then averaged, excluding any value that was 0 in the sign determination. Finally, the resulting changed slice level is added to or subtracted from the currently set slice level according to its sign.
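
The calculation can be sketched as follows, as a simplification under stated assumptions. The sketch carries the sign inside each delta (so adding a negative average is the subtraction case), collapses the per-element averaging into a single average, and rounds the result to an integer level. The name `next_slice_level`, the rounding, and every number in the example are assumptions of this sketch.

```python
def next_slice_level(current, slds, coefficients):
    """Combine per-item deltas into the slice level for the next scan."""
    # Weight each delta by its coefficient; deltas of 0 are left out of
    # the average, as the description requires.
    weighted = [s * c for s, c in zip(slds, coefficients) if s != 0]
    if not weighted:
        return current  # every item judged the current level adequate
    delta = sum(weighted) / len(weighted)
    # Signed delta: negative for a faint form, positive for a dark one.
    return current + round(delta)


# Hypothetical deltas and coefficients for six determination items.
slds = [-2, -1, 0, -4, 0, 0]
coefficients = [1.0, 2.0, 1.0, 0.5, 1.0, 1.0]
print(next_slice_level(128, slds, coefficients))
```

Leaving zero deltas out of the average keeps items that saw nothing wrong from diluting the correction suggested by the others.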

[0029] As a result, the set slice level thus determined is set in the image input device 201. If the set slice level is the same as the current slice level, nothing is set in the image input device 201. The next form is then read from the image input device 201 at the slice level that has been set.
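
The per-form feedback loop can be illustrated with a small Python simulation. The scanner and the determination step are replaced by toy stand-ins: each form is represented only by the level that would suit it, and the determination moves the level at most 4 steps toward that value. These stand-ins, the step limit, and all names are hypothetical and serve only to show that a level computed from one form takes effect on the next.

```python
def read_stack(forms, scan, determine_level, initial_level):
    """Scan forms in order, updating the slice level after each one."""
    level = initial_level
    levels_used = []
    for form in forms:
        image = scan(form, level)        # binarized at the current level
        levels_used.append(level)
        new_level = determine_level(image, level)
        if new_level != level:           # reprogram the scanner only on a change
            level = new_level
    return levels_used


def toy_scan(form, level):
    """Toy scanner: the 'image' is just the level that would suit the form."""
    return form


def toy_determine(image, level):
    """Toy determination: move at most 4 steps toward the suitable level."""
    step = max(-4, min(4, image - level))
    return level + step


# Two normal forms followed by a batch of fainter ones (made-up values).
levels = read_stack([128, 128, 120, 120, 120], toy_scan, toy_determine, 128)
print(levels)
```

In this toy run the first faint form is still scanned at the old level, and the level settles over the following forms of the same batch.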

[0030] If the image data held in the image storage device is stored as a multi-level image, it can also be binarized at the slice level set in slice level setting 350 and recognized again.
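
Re-binarizing a stored multi-level image is a simple threshold operation, sketched here in Python. The sketch assumes 8-bit grey values where smaller numbers are darker and treats pixels below the slice level as black; the patent does not state this convention, and the name `binarize` and the sample pixel values are made up.

```python
def binarize(gray, slice_level):
    """Threshold a grey image: pixels darker than the level become black (1)."""
    return [[1 if p < slice_level else 0 for p in row] for row in gray]


# A stored multi-level image (made-up 8-bit values, 0 = darkest).
gray = [
    [30, 120, 200],
    [140, 90, 250],
]
first_pass = binarize(gray, 128)
second_pass = binarize(gray, 150)   # retry at the newly set level
print(first_pass)
print(second_pass)
```

Under this convention, raising the level turns the mid-grey pixel (140) black on the second pass, which is how a faint stroke could be recovered without rescanning the sheet.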

[0031] According to detection of the PCS (print contrast signal) of forms, the relationship between the PCS and the slice level set by the embodiment described above is as shown in graph 600 of FIG. 12. The PCS expresses the density of the print (including handwritten entries). As the PCS decreases, the print becomes fainter, so the slice level decreases in proportion. In the direction of increasing PCS the print becomes darker, so the slice level also increases. In the increasing direction, however, the characteristic changes sharply at first and then saturates with little further change; in that region changing the slice level does not affect the PCS, that is, the contrast, so the slice level is left unchanged.

[0032] As described above, the embodiments of the present invention can be summarized as including the following configurations.

[0033] Determination information is obtained from the states of the ruled lines, character lines, and character recognition result. For ruled lines, to assess how broken a rule is, the determination information is obtained from the black pixel frequency between the start and end coordinates of the extracted ruled line, scanning horizontally for a horizontal rule and vertically for a vertical rule. To assess the variation in ruled-line thickness, black pixels are counted between the start and end coordinates of the extracted ruled line, vertically for a horizontal rule and horizontally for a vertical rule, and the determination information is obtained from the frequency of each thickness found.

[0034] For character lines, determination information is obtained by comparing the black pixel state in the image data of each character of the cut-out character line image with the basic isolated black pixel count preset for that character, selected according to the character recognition result. Likewise, determination information is obtained by comparing the black pixel state in the image data of each character of the cut-out character line image with the basic isolated white pixel count preset for that character, selected according to the character recognition result.

[0035] For the character recognition result, determination information is obtained from the frequency distribution of the similarities obtained when each character is recognized.

[0036] The slice level is then determined on the basis of each of the above items of determination information so that the slice level can be changed. This optimizes the slice level for the next document or form input from the image scanner device, or for re-inputting the document or form currently being processed. Although the determination information for deciding the slice level has been described as being obtained from all of the states (ruled lines, character lines, and character recognition result), a suitable subset of these states may instead be selected and the slice level determined from the selected states.

[0037] Changing the slice level on the basis of the determination information is particularly effective when applied form by form to a large stack of forms whose print density varies from one batch of sheets to the next.

[0038]

EFFECT OF THE INVENTION: As described above, according to the present invention, the states of the ruled lines, character lines, and character recognition of the input image are extracted as determination information, and the slice level is determined and changed from that information. As a result, when recognizing a large stack of forms whose print density varies from one batch of sheets to the next, high recognition accuracy can be obtained regardless of the variation in print density.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the processing blocks that perform slice level extraction according to the present invention.

FIG. 2 is a diagram showing the extraction of the ruled-line break state according to the present invention.

FIG. 3 is a diagram showing the extraction of the ruled-line thickness state according to the present invention.

FIG. 4 is a diagram showing the extraction of isolated points in the character line state according to the present invention.

FIG. 5 is a diagram showing the extraction of isolated white areas in the character line state according to the present invention.

FIG. 6 is a diagram showing the extraction of the character recognition result state according to the present invention.

FIG. 7 is a diagram showing slice level determination and setting according to the present invention.

FIG. 8 is a diagram showing the system configuration of an embodiment of the present invention.

FIG. 9 is a diagram showing the flow of slice level extraction in an embodiment of the present invention.

FIG. 10 is a diagram showing the contents of the slice level determination tables in an embodiment of the present invention.

FIG. 11 is a diagram showing the slice level calculation in an embodiment of the present invention.

FIG. 12 is a diagram showing the relationship between PCS variation and the slice level.

[Explanation of Symbols]

10 Form
11 Forms stacked in large quantity
12 Print density
13 Slice level
14 Image scanner
15 Format extraction
16 Character recognition
17 Ruled line / character line state extraction
18 Recognition state extraction
19 Slice level determination
20 Slice level setting
21 Output data
33 Ruled line breakage information
43 Ruled line thickness information
53 Isolated point information
63 Isolated white area information
75 Character recognition similarity information

Claims

[Claims]

1. A document/form recognition system comprising: an image scanner device that reads a document or form; a format information extraction device that extracts format information of ruled lines and/or character lines from a binarized image supplied by the image scanner device; and a character recognition device that receives recognition image data of a character line based on the format information and converts it into character codes, wherein the format information extraction device obtains determination information on the state of the ruled lines and/or character lines, the character recognition device obtains determination information on the state of character recognition, a change slice level value for the image scanner device is obtained based on the determination information, and the slice level of the image scanner device is set based on the change slice level value and the current slice level value.

2. The document/form recognition system according to claim 1, wherein the means for obtaining the determination information on the state of the ruled lines determines the black pixel frequency from the start point coordinates to the end point coordinates of an extracted ruled line, in the horizontal direction for a horizontal ruled line and in the vertical direction for a vertical ruled line, and thereby detects the degree of breakage of the ruled line.
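
As an illustration of the breakage measurement in claim 2, the following minimal sketch scans a binarized image along a ruled line and reports the fraction of the line that is missing. The function name `rule_breakage`, the image layout (a list of rows with 1 for black), and the use of a single pixel row or column per ruled line are assumptions made for the example, not details taken from the specification.

```python
def rule_breakage(image, start, end):
    """Fraction of white pixels along a ruled line (0.0 = unbroken).

    image is a list of rows of 0/1 values (1 = black); start and end
    are (x, y) coordinates of the extracted ruled line.
    """
    (x0, y0), (x1, y1) = start, end
    if y0 == y1:
        # Horizontal ruled line: walk along x on a fixed row.
        pixels = [image[y0][x] for x in range(x0, x1 + 1)]
    elif x0 == x1:
        # Vertical ruled line: walk along y on a fixed column.
        pixels = [image[y][x0] for y in range(y0, y1 + 1)]
    else:
        raise ValueError("ruled line must be horizontal or vertical")
    black = sum(pixels)
    return 1.0 - black / len(pixels)


image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
breakage = rule_breakage(image, (0, 1), (7, 1))
```

A high value suggests faint printing where the ruled line has dropped out, which a slice level change could compensate for.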

3. The document/form recognition system according to claim 1 or 2, wherein the means for obtaining the determination information on the state of the ruled lines determines the number of black pixels from the start point coordinates to the end point coordinates of an extracted ruled line, in the vertical direction for a horizontal ruled line and in the horizontal direction for a vertical ruled line, tallies the frequency of each extracted thickness, and thereby detects variation in the thickness of the ruled line.
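
The thickness tally in claim 3 can be sketched in a similar way. The example below counts black pixels across the ruled line at each position and builds a frequency table of the resulting thickness values. The function name `thickness_histogram` and the bounding-box argument are hypothetical, and counting all black pixels inside the box is a simplifying assumption.

```python
from collections import Counter


def thickness_histogram(image, box, horizontal=True):
    """Frequency of each line thickness inside a bounding box.

    image is a list of rows of 0/1 values (1 = black); box is
    (x0, y0, x1, y1), inclusive, around the extracted ruled line.
    """
    x0, y0, x1, y1 = box
    if horizontal:
        # Horizontal ruled line: measure thickness vertically at each x.
        widths = [sum(image[y][x] for y in range(y0, y1 + 1))
                  for x in range(x0, x1 + 1)]
    else:
        # Vertical ruled line: measure thickness horizontally at each y.
        widths = [sum(image[y][x] for x in range(x0, x1 + 1))
                  for y in range(y0, y1 + 1)]
    return Counter(widths)


image = [
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
hist = thickness_histogram(image, (0, 0, 3, 2))
```

A histogram concentrated on one thickness indicates a stable line, while a wide spread indicates the variation that the claim uses as determination information.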

4. The document/form recognition system according to claim 1 or 3, wherein the means for obtaining the determination information on the state of the character lines determines an extracted isolated point count from the black pixel state in the image data of each character of a cut-out character line image, and obtains isolated point information from a comparison between the extracted isolated point count and a preset basic isolated point count for the recognized character.
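
One possible reading of the isolated point comparison in claim 4 is sketched below. It treats an isolated point as a black pixel with no black pixel among its eight neighbors, which is an assumption, as are the names `count_isolated_points`, `isolated_point_info`, and the `basic_counts` table of expected counts per recognized character.

```python
def count_isolated_points(char_img):
    """Count black pixels that have no black 8-neighbor."""
    h, w = len(char_img), len(char_img[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if not char_img[y][x]:
                continue
            # Collect the neighbors that lie inside the image.
            neighbours = [char_img[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (ny, nx) != (y, x)]
            if not any(neighbours):
                count += 1
    return count


def isolated_point_info(char_img, recognized_char, basic_counts):
    """Excess of extracted isolated points over the preset basic count."""
    expected = basic_counts.get(recognized_char, 0)
    return count_isolated_points(char_img) - expected


char_img = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
info = isolated_point_info(char_img, "A", {"A": 0})
```

A positive result means the character image carries more specks than the recognized character should have, which points to noise in the binarized image.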

5. The document/form recognition system according to claim 1 or 4, wherein the means for obtaining the determination information on the state of the character lines determines an extracted isolated white area count from the black pixel state in the image data of each character of a cut-out character line image, and obtains isolated white area information from a comparison between the extracted isolated white area count and a preset basic isolated white area count for the recognized character.
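
The isolated white area count in claim 5 can be illustrated with a flood fill. The sketch below assumes that an isolated white area is a white region fully enclosed by black pixels (using 4-connectivity), such as the loop of a "0"; that definition and the name `count_isolated_white_areas` are assumptions for the example.

```python
def count_isolated_white_areas(char_img):
    """Count white regions that do not touch the image border."""
    h, w = len(char_img), len(char_img[0])
    seen = [[False] * w for _ in range(h)]

    def fill(sy, sx):
        # Iterative flood fill over 4-connected white pixels.
        stack = [(sy, sx)]
        seen[sy][sx] = True
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < h and 0 <= nx < w
                        and not seen[ny][nx] and not char_img[ny][nx]):
                    seen[ny][nx] = True
                    stack.append((ny, nx))

    # First mark the background: white pixels reachable from the border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not char_img[y][x] and not seen[y][x]:
                fill(y, x)

    # Every remaining unvisited white region is enclosed by black.
    areas = 0
    for y in range(h):
        for x in range(w):
            if not char_img[y][x] and not seen[y][x]:
                areas += 1
                fill(y, x)
    return areas


closed_loop = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
broken_loop = [[1, 1, 1], [1, 0, 1], [1, 0, 1]]
two_loops = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

Comparing the count with the expected number for the recognized character shows whether faint printing has opened a loop or heavy printing has filled one in.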

6. The document/form recognition system according to claim 1 or 5, wherein the means for obtaining the determination information on the state of character recognition determines, from the similarity extraction results obtained by recognizing a plurality of characters, the similarity value that occurs with the highest frequency and the lowest similarity value, and thereby detects the character similarity.
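
The two similarity statistics named in claim 6 can be sketched as follows. The example assumes integer similarity scores on a 0 to 100 scale grouped into bins of width 5; the scale, the bin width, the tie-breaking rule, and the name `similarity_summary` are assumptions, not values from the specification.

```python
from collections import Counter


def similarity_summary(similarities, bin_width=5):
    """Return (most frequent similarity bin, lowest similarity).

    similarities is a list of integer scores, one per recognized
    character. Ties between bins are broken toward the higher bin.
    """
    # Group scores into bins identified by their lower edge.
    bins = Counter((s // bin_width) * bin_width for s in similarities)
    most_frequent = max(bins, key=lambda b: (bins[b], b))
    return most_frequent, min(similarities)


summary = similarity_summary([95, 96, 97, 80, 62, 91])
```

A low most-frequent bin or a very low minimum suggests that the binarized characters are degraded, which would feed into the slice level decision.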

7. The document/form recognition system according to any one of claims 1 to 6, wherein the processing area used to obtain the determination information is the entire binarized image from the image scanner device, a part of that image, or the character portion.