JP2000113101A

JP2000113101A - Method and device for segmenting character

Info

Publication number: JP2000113101A
Application number: JP10278773A
Authority: JP
Inventors: Masato Minami (南 正人); Toshiyuki Koda (香田 敏行)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1998-09-30
Filing date: 1998-09-30
Publication date: 2000-04-21

Abstract

PROBLEM TO BE SOLVED: To remove the area struck out by a correction line more accurately, by using the horizontal projection, the character background, and the circumscribed rectangle width so that a correction line can be detected even when there is only one line or the line is slightly slanted. SOLUTION: A horizontally long image selection unit 3 extracts wide connected images that may contain a correction; a continuation line determination unit 4 uses the horizontal projection to extract images that have a continuation line, which includes correction lines; a correction line determination unit 6 uses the character background to exclude joined characters and extract only images containing a correction line; and a struck-out character removal unit 9 removes the area struck out by the correction line.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character segmentation method and a character segmentation device used in OCR handwritten character recognition devices and the like. The invention is not limited to handwritten character recognition; it is applicable to character segmentation in a broad sense, including recognition of printed characters and character segmentation in drawing recognition.

[0002]

[Prior Art] As a method of judging corrections in a conventional character segmentation device when a manual correction by a correction line is present, there is the one disclosed in Japanese Unexamined Patent Publication No. Hei 8-202822, "Character segmentation device and character segmentation method". That publication detects correction lines as follows.

[0003] FIG. 32 shows the main configuration. First, a connected pattern extraction unit 201 extracts the connected patterns contained in the input pattern. Next, a horizontally long pattern extraction unit 202 extracts those patterns that are horizontally long. Then a continuation line extraction unit 203 extracts, as patterns having a continuation line, those whose horizontal projection exceeds a predetermined ratio of the pattern width. Examples of patterns extracted by the continuation line extraction unit 203 are shown in FIGS. 33(a) to 33(c). FIG. 33(a) shows joined characters (characters written with a continuous connecting stroke), for which a recognition result should be output. In contrast, FIGS. 33(b) and 33(c) show characters struck out by correction lines, for which no recognition result may be output.
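
The continuation line test of the prior art can be sketched as follows. This is a minimal illustration, not the implementation of the cited publication: the binary image is assumed to be a list of rows of 0/1 values, and the function names and the ratio of 0.8 are hypothetical.

```python
def horizontal_projection(image):
    """Count the foreground pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def has_continuation_line(image, ratio=0.8):
    """Return True if any row's projection exceeds ratio * pattern width.

    The ratio is an assumed value; the source only says it is predetermined.
    """
    width = len(image[0])
    return any(p > ratio * width for p in horizontal_projection(image))


# A 5-row, 10-column pattern with one long horizontal stroke in row 2.
pattern = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
]
print(horizontal_projection(pattern))
print(has_continuation_line(pattern))
```

Because the test only looks at the projection, it cannot by itself tell joined characters from struck-out characters; both produce one long row.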

[0004]

[Problems to Be Solved by the Invention] However, the conventional character segmentation device has the following problems. When there is only one correction line, as in FIG. 33(c), or when there are two correction lines that are so close together or overlapping that they appear as one in the horizontal projection, the pattern is wrongly judged to be joined characters. In addition, the continuation line judgment treats a line as a continuation line only when its horizontal projection is at least a predetermined ratio of the pattern width; for a wide pattern whose continuation line is slightly slanted, the ratio of the horizontal projection to the pattern width drops and the line is no longer judged to be a continuation line. Finally, when the struck-out character and the correction character overlap as in FIG. 34, so that the connected pattern extraction unit cannot separate them, it is difficult to isolate the correction character.

[0005] The present invention solves the above conventional problems. Its object is to provide an excellent character segmentation method and device that can correctly identify struck-out characters even when there is only one correction line, or when two correction lines are so close together or overlapping that they appear as one in the horizontal projection.

[0006]

[Means for Solving the Problems] To achieve the above object, the present invention first extracts images containing a continuation line by extracting images with a large horizontal projection, and then excludes joined characters by exploiting the fact that, in joined characters, the background distance from the top edge changes little. This makes detection possible even when there is only one correction line. Furthermore, in the continuation line judgment, the threshold, which is defined as a ratio of the horizontal projection to the pattern width, is varied according to the pattern width, so that a correction line can be detected even when the pattern is wide and the correction line is slightly slanted. In addition, the images in the regions above and below the correction line are each divided into connected images, the sizes of the connected image groups are compared region by region, and the smaller group is removed. This removes the struck-out character, so that even when the correction character and the struck-out character overlap and are hard to separate, the correction character can be extracted with little influence from the struck-out character.

[0007]

[Embodiments of the Invention] The invention according to claim 1 of the present invention is a character segmentation method characterized by: removing the entry frame when the input pattern contains one; cutting out the frame-removed pattern into connected images, which form a first connected image group; obtaining, for each first connected image, the width and height of its circumscribed rectangle; if the width is large, obtaining the horizontal projection of each horizontal line and comparing each projection with a predetermined threshold; if any projection exceeds the threshold, obtaining the range of horizontal lines whose projection is at or above the threshold and, for each vertical line, the background distance from the top edge; if the continuous range over which the background distance from the top edge of the image changes little is narrow, removing the image when its height is small and, when its height is large, judging from the horizontal line range whether the correction line lies in the upper or lower half of the image; removing from the image the upper half and the horizontal line range when the correction line is judged to lie in the upper half, or the lower half and the horizontal line range when it is judged to lie in the lower half; cutting out the remaining correction character region, which is the recognition target, into connected images, which form a second connected image group; segmenting the second connected image group into individual characters if it exists, or the first connected image group otherwise; and finally outputting images in which every character in the input pattern is segmented individually.

[0008] Conventionally a correction line was identified by the presence of two continuation lines. In the present invention, an image is judged to contain a correction line when the range over which the background distance changes little does not continue for long. As a result, a correction line can be identified even when there is only one, and joined characters are not mistaken for a correction.
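
The background distance test described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 values, and the function names, the step tolerance `max_step`, and the ratio `min_flat_ratio` are hypothetical, since the source gives no concrete values.

```python
def background_distance_from_top(image):
    """For each column, count background pixels (0s) above the first foreground pixel.

    A column with no foreground pixel gets the full image height.
    """
    height = len(image)
    distances = []
    for x in range(len(image[0])):
        d = 0
        while d < height and image[d][x] == 0:
            d += 1
        distances.append(d)
    return distances


def longest_flat_run(distances, max_step=1):
    """Length of the longest run of adjacent columns whose distance changes by <= max_step."""
    best = run = 1
    for prev, cur in zip(distances, distances[1:]):
        run = run + 1 if abs(cur - prev) <= max_step else 1
        best = max(best, run)
    return best


def looks_like_correction_line(image, max_step=1, min_flat_ratio=0.6):
    """Judge a correction line when the flat run is short relative to the width.

    Both parameters are assumed values chosen for illustration.
    """
    distances = background_distance_from_top(image)
    return longest_flat_run(distances, max_step) < min_flat_ratio * len(distances)
```

In this sketch a pattern whose top profile stays flat over most of its width is treated as joined characters, while a pattern whose top profile jumps repeatedly, as happens when several characters poke out above a strike-through line, is treated as containing a correction line.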

[0009] The invention according to claim 2 of the present invention is a character segmentation method characterized by: removing the entry frame when the input pattern contains one; cutting out the frame-removed pattern into connected images, which form a first connected image group; obtaining, for each first connected image, the width and height of its circumscribed rectangle; if the width is large, obtaining the horizontal projection of each horizontal line and calculating, from the circumscribed rectangle width, a horizontal projection threshold expressed as a ratio of that width; comparing the horizontal projection of each horizontal line with the calculated threshold; if any horizontal line has a projection at or above the threshold, obtaining the range of such horizontal lines and, for each vertical line, the background distance from the top edge; if the continuous range over which the background distance from the top edge of the image changes little is narrow, removing the image when its height is small and, when its height is large, judging whether the center of the horizontal line range lies within a predetermined range, not exceeding half the image height, from the top or bottom edge of the image; when the center lies within the predetermined range and in the upper half of the image, removing the region from the top edge of the image to the bottom of the horizontal line range and cutting out the remainder into connected images, which are output as a third connected image group; when the center lies within the predetermined range and in the lower half of the image, removing the region from the top of the horizontal line range to the bottom edge of the image and cutting out the remainder into connected images, which are output as a fourth connected image group; when the center does not lie within the predetermined range, cutting out both the region above and the region below the horizontal line range into connected images, comparing the two resulting image groups, and selecting the group more likely to be the pieces of the correction character as a fifth connected image group; and segmenting whichever of the third, fourth, or fifth connected image groups exists into individual characters for output, or, if none exists, segmenting the first connected images into individual characters for output.

[0010] When the correction line is close to the top or bottom edge, the correction character can be assumed to lie on the opposite side. Deleting the region from that edge to the correction line therefore reliably deletes the struck-out character region, which contains the correction line, without mistakenly deleting the correction character region. When the correction line lies near the center of the pattern, for example because of a correction seal, and it is hard to tell whether the correction character is above or below the line, both the region above and the region below the correction line are first divided into connected images, the states of the resulting connected images are compared, and the one more likely to be the correction character is selected. The region in which the correction character is written can thus be detected more efficiently than by examining both regions for every pattern. Moreover, because the region is forcibly split above or below the correction line, the correction character can be separated with little influence from the struck-out character even when the two images overlap.
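
The three-way decision described above (line near the top, line near the bottom, line near the center) can be sketched as follows. The function name `region_to_keep` and the parameter `edge_ratio` are hypothetical; the source only states that the predetermined range does not exceed half the image height.

```python
def region_to_keep(height, line_top, line_bottom, edge_ratio=0.3):
    """Decide which rows survive, given the correction-line rows line_top..line_bottom.

    Returns a list of (first_row, last_row) ranges, inclusive. One range means
    the side was decided directly; two ranges mean both sides are still
    candidates and must be compared after division into connected images.
    """
    center = (line_top + line_bottom) / 2
    margin = edge_ratio * height  # assumed value; must stay below height / 2
    if center <= margin:
        # Line near the top edge: drop everything down to the line's bottom row.
        return [(line_bottom + 1, height - 1)]
    if center >= height - 1 - margin:
        # Line near the bottom edge: drop everything from the line's top row down.
        return [(0, line_top - 1)]
    # Line near the center: keep both sides as candidates.
    return [(0, line_top - 1), (line_bottom + 1, height - 1)]


print(region_to_keep(40, 5, 7))
print(region_to_keep(40, 19, 21))
```

Only the ambiguous middle case pays the cost of dividing and comparing two regions, which is the efficiency argument made in the paragraph above.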

[0011] The invention according to claim 3 of the present invention is a character segmentation method characterized by: removing the entry frame when the input pattern contains one; cutting out the frame-removed pattern into connected images, which form a first connected image group; obtaining, for each first connected image, the width and height of its circumscribed rectangle; if the width is large, obtaining the horizontal projection of each horizontal line and determining, from the circumscribed rectangle width, a horizontal projection threshold expressed as a ratio of that width; comparing the horizontal projection of each horizontal line with the threshold determined by the projection threshold determination unit; if any horizontal line has a projection at or above the threshold, obtaining the range of such horizontal lines and, for each vertical line, the background distances from the top and bottom edges; if the continuous range over which the background distance from the top edge of the image changes little is narrow, removing the image when its height is small and, when its height is large, judging whether the center of the horizontal line range lies within a predetermined range, not exceeding half the image height, from the top or bottom edge of the image; when the center lies within the predetermined range and in the upper half of the image, removing the region from the top edge of the image to the bottom of the horizontal line range and cutting out the remainder into connected images, which form a sixth connected image group; when the center lies within the predetermined range and in the lower half of the image, removing the region from the top of the horizontal line range to the bottom edge of the image and cutting out the remainder into connected images, which form a seventh connected image group; when the center does not lie within the predetermined range, cutting out both the region above and the region below the horizontal line range into connected images, removing the small images from each of the two resulting groups, comparing the extents occupied by the remaining groups, and selecting the group with the wider extent as an eighth connected image group; and segmenting whichever of the sixth, seventh, or eighth connected image groups exists into individual characters for output, or, if none exists, segmenting the first connected images into individual characters for output.
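
The comparison used in the ambiguous case of claim 3 (drop small pieces, then pick the side whose remaining pieces occupy the wider extent) might look like the sketch below. It assumes each connected image is summarized by its circumscribed rectangle and that the extent is measured horizontally; the source does not specify either point, and the names and the `min_area` value are hypothetical.

```python
def horizontal_extent(boxes, min_area=4):
    """Width spanned by the boxes that remain after dropping small ones.

    Each box is (left, top, right, bottom) in inclusive pixel coordinates.
    """
    kept = [b for b in boxes
            if (b[2] - b[0] + 1) * (b[3] - b[1] + 1) >= min_area]
    if not kept:
        return 0
    return max(b[2] for b in kept) - min(b[0] for b in kept) + 1


def pick_correction_side(upper_boxes, lower_boxes, min_area=4):
    """Return 'upper' or 'lower', whichever group spans the wider extent."""
    upper = horizontal_extent(upper_boxes, min_area)
    lower = horizontal_extent(lower_boxes, min_area)
    return "upper" if upper >= lower else "lower"


# Above the line: one small piece and one stray speck left over from the struck-out text.
upper = [(0, 0, 1, 1), (30, 0, 30, 1)]
# Below the line: two full-size characters.
lower = [(2, 10, 12, 20), (20, 10, 28, 20)]
print(pick_correction_side(upper, lower))
```

Removing the small pieces first matters: without it, the stray speck at column 30 would stretch the upper group's extent and could flip the decision.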

[0012] In the second horizontal line removal unit, when the correction line lies near the center of the pattern and it is hard to tell whether the correction character is above or below the line, both the region above and the region below the correction line are first divided into connected images, the states of the resulting connected images are compared, and the one more likely to be the correction character is selected. The region in which the correction character is written can thus be detected more efficiently than by examining both regions for every pattern. Moreover, because the region is forcibly split above or below the correction line, the correction character can be separated with little influence from the struck-out character even when the two images overlap.

[0013] The invention according to claim 4 of the present invention is a character reading method characterized by: cutting out an input pattern one character at a time by the character segmentation method according to any one of claims 1 to 3; extracting, from each segmented character pattern, the features needed to distinguish character types; and recognizing, based on the extracted features, which type the character belongs to.

[0014] Characters involved in corrections, which could not previously be segmented properly, can now be segmented properly by the segmentation device (character segmentation unit) of the present invention. The improved segmentation accuracy in turn improves the recognition performance of the recognition unit, which represents the final performance of the system.

[0015] The invention according to claim 5 of the present invention is a recording medium storing a program that implements, in software, the character segmentation method according to any one of claims 1 to 3. The character segmentation method of the present invention can thereby be introduced easily into other systems simply by reading the recording medium. Combined with a system, or a recording medium, that provides the function of a recognition unit, it also makes it possible to read the characters contained in an input pattern simply by acquiring that pattern. Furthermore, the improved recognition performance reduces the user's burden of checking and correcting results.

[0016] The invention according to claim 6 of the present invention is a recording medium storing a program that implements, in software, the character reading method according to claim 4. The character reading method of the present invention can thereby be introduced easily into other systems simply by reading the recording medium, and the characters contained in an input pattern can be read simply by acquiring that pattern. Furthermore, the improved recognition performance reduces the user's burden of checking and correcting results.

[0017] The invention according to claim 7 of the present invention is a character segmentation device comprising: a frame removal unit that removes the entry frame when the input pattern contains one; a first connected image division unit that cuts out the pattern output from the frame removal unit into connected images and obtains the width and height of the circumscribed rectangle of each image; a horizontally long image selection unit that selects only the wide connected images from the connected image group output from the first connected image division unit; a continuation line determination unit that obtains the horizontal projection of each horizontal line of an image selected by the horizontally long image selection unit, compares each projection with a predetermined threshold, judges that a continuation line is present if a projection exceeds the threshold, and outputs the range of horizontal lines whose projection is at or above the threshold; a background distance calculation unit that obtains, for an image judged by the continuation line determination unit to have a continuation line, the background distance from the top edge for each vertical line of the image; a correction line determination unit that, based on the background distances output from the background distance calculation unit, judges that an image contains a correction line when the range over which the background distance from the top edge changes little does not continue for long; a height determination unit that judges whether the height of an image judged by the correction line determination unit to contain a correction line is large or small; an image removal unit that removes an image whose height is judged to be small by the height determination unit; a struck-out character removal unit that, for an image whose height is judged to be large by the height determination unit, removes the region struck out by the correction line based on the horizontal line range output from the continuation line determination unit, and cuts out the remaining image region into connected images; and a single-character segmentation unit that segments into individual characters the connected images output from the struck-out character removal unit when such images are output, or the connected images output from the first connected image division unit otherwise.

[0018] Conventionally a correction line was identified by the presence of two continuation lines. In the present invention, the correction line determination unit judges an image to contain a correction line when the range over which the background distance changes little does not continue for long. As a result, a correction line can be identified even when there is only one, and joined characters are not mistaken for a correction.

[0019] The invention according to claim 8 of the present invention is the character segmentation device according to claim 7, wherein the continuation line determination unit comprises: a horizontal projection calculation unit that obtains the horizontal projection of each horizontal line of a connected image selected by the horizontally long image selection unit; a projection threshold determination unit that determines, based on the circumscribed rectangle width output from the connected image division unit, a horizontal projection threshold expressed as a ratio of the circumscribed rectangle width; and a horizontal line range selection unit that compares the horizontal projection of each horizontal line output from the horizontal projection calculation unit with the threshold determined by the projection threshold determination unit and, when any horizontal line has a projection at or above the threshold, outputs the range of such horizontal lines.

[0020] The projection threshold determination unit lowers the horizontal projection threshold, expressed as a ratio of the circumscribed rectangle width, as the circumscribed rectangle width increases. A continuation line can therefore be detected even when the circumscribed rectangle is wide and the continuation line is slightly slanted, so that the ratio of its horizontal projection to the circumscribed rectangle width is low.
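
A width-dependent threshold of this kind could be sketched as below. The source only says that the ratio decreases as the circumscribed rectangle width grows; the linear interpolation, the breakpoints `narrow` and `wide`, and the ratios `high` and `low` are all assumptions made for illustration, as are the function names.

```python
def projection_threshold_ratio(width, narrow=40, wide=160, high=0.9, low=0.6):
    """Ratio of the rectangle width that a row's projection must reach.

    Falls linearly from `high` at `narrow` pixels to `low` at `wide` pixels.
    """
    if width <= narrow:
        return high
    if width >= wide:
        return low
    t = (width - narrow) / (wide - narrow)
    return high + t * (low - high)


def continuation_rows(image):
    """Return (first_row, last_row) of the rows at or above the threshold, or None."""
    width = len(image[0])
    threshold = projection_threshold_ratio(width) * width
    rows = [y for y, row in enumerate(image) if sum(row) >= threshold]
    return (rows[0], rows[-1]) if rows else None


# A 200-pixel-wide pattern whose slanted line is split across two rows,
# each covering only 65% of the width.
slanted = [
    [1] * 130 + [0] * 70,
    [0] * 70 + [1] * 130,
]
print(continuation_rows(slanted))
```

With a fixed ratio of 0.9 neither row of the slanted example would qualify; with the lowered ratio for wide patterns both rows do.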

[0021] The invention according to claim 9 of the present invention is the character segmentation device according to claim 7, wherein the struck-out character removal unit comprises: a horizontal line range coarse classification unit that, for an image whose height is judged to be large by the height determination unit, judges from the horizontal line range output from the continuation line determination unit whether the correction line lies in the upper half or the lower half of the image; a correction character region estimation unit that removes from the image the upper half and the horizontal line range when the line is judged to lie in the upper half, or the lower half and the horizontal line range when it is judged to lie in the lower half; and a second connected image division unit that cuts out the image region remaining after the removal into connected images.

[0022] Conventional extraction of the correction character required the correction character and the struck-out character to be separated by labeling. In the present invention, the correction character region estimation unit forcibly splits the image at the horizontal line range and the upper or lower half, so the correction character region can be extracted even when the two are not separated by labeling. Even if image fragments other than the correction character remain at that point, they can be removed by dividing the region into connected images in the second connected image division unit and then removing the small connected images.
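
The division into connected images followed by removal of small pieces can be illustrated with a simple labeling pass. This is a generic breadth-first labeling sketch, not the patent's own procedure: the use of 8-connectivity, the function names, and the `min_pixels` cutoff are assumptions.

```python
from collections import deque


def connected_images(image):
    """Split a binary image into 8-connected components, each a list of (y, x) pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                # Visit the 3x3 neighbourhood, clipped to the image bounds.
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def drop_small(components, min_pixels=3):
    """Keep only the components with at least min_pixels pixels (assumed cutoff)."""
    return [c for c in components if len(c) >= min_pixels]


# Region left after the forced split: one real stroke plus two stray specks.
region = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(len(connected_images(region)), len(drop_small(connected_images(region))))
```

The specks stand in for fragments of the struck-out character that survive the forced split; filtering by size after labeling removes them without touching the larger piece.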

【００２３】According to the invention of claim 10, in the character segmentation device of claim 7, the struck-out character removal unit comprises: a horizontal line range determination unit that, for an image whose height the height determination unit has judged to be large, determines whether the center of the horizontal line range at or above the threshold, output by the continuous line determination unit, falls within a predetermined range measured from the top or bottom edge of the image and not exceeding half the image height; a first line region removal unit that, when the horizontal line range determination unit judges the center to be within the predetermined range, operates as follows: if the center lies within the predetermined range from the top edge, it removes the region from the top edge of the image to the bottom of the horizontal line range from the connected image region output by the first connected image division unit and outputs the remainder as the correction character region, and if the center lies within the predetermined range from the bottom edge, it removes the region from the top of the horizontal line range to the bottom edge of the image and outputs the remainder as the correction character region; a second line region removal unit that, when the center is judged not to be within the predetermined range, selects the two regions above and below the horizontal line range and outputs them as correction character candidate regions; a second connected image division unit that, when two candidate regions are output by the second line region removal unit, cuts each of them into connected images, and, when no candidate region is output, cuts the correction character region output by the correction character region estimation unit into connected images, and outputs the resulting image groups; and a correction character likelihood determination unit that, when the second connected image division unit outputs the cut-out image groups, compares the image groups of the two candidate regions and selects and outputs the group more likely to be a division of the correction character.

【００２４】In the first line region removal unit, when the correction line is close to the top or bottom edge, the correction character can be assumed to lie on the opposite side. Deleting the region from that edge to the correction line therefore reliably removes the struck-out character region containing the correction line without mistakenly deleting the correction character region.

【００２５】In the second line region removal unit, when the correction line lies near the center of the pattern, for example because of a correction seal, and it is therefore difficult to tell whether the correction character is above or below the correction line, both the region above and the region below the correction line are first divided into connected images, the states of the resulting connected images are compared, and the one more likely to contain the correction character is selected. This detects the region where the correction character is written more efficiently than examining both the regions above and below the correction line for every pattern. Moreover, because the region is forcibly split above or below the correction line, the correction character can be separated with little influence from the struck-out character even when the two images overlap.
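The choice between the first and second line region removal units can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `candidate_regions`, the `margin` parameter (the "predetermined range" measured from either edge), and the use of half-open row ranges are all assumptions made for the example.

```python
def candidate_regions(height, line_top, line_bottom, margin):
    """Return the row ranges (start, end_exclusive) that may hold the correction character.

    line_top and line_bottom are the first and last rows of the correction line.
    margin is a hypothetical name for the predetermined range from either edge;
    the text only requires that it not exceed half the image height.
    """
    if not 0 < margin <= height / 2:
        raise ValueError("margin must be positive and at most half the image height")
    center = (line_top + line_bottom) / 2
    if center < margin:
        # Line near the top edge: drop everything from the top edge through the line.
        return [(line_bottom + 1, height)]
    if center >= height - margin:
        # Line near the bottom edge: drop everything from the line to the bottom edge.
        return [(0, line_top)]
    # Line near the middle: keep both sides as candidates and compare them later.
    return [(0, line_top), (line_bottom + 1, height)]
```

When two ranges are returned, each would be divided into connected images and the more plausible one selected afterwards.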

【００２６】According to the invention of claim 11, in the character segmentation device of claim 10, the correction character likelihood determination unit comprises: a small image removal unit that, for the divided image groups of the two correction character candidate regions output by the second connected image division unit, removes the small divided images from each group; and a divided image group selection unit that compares the extents occupied in the original image by the two post-removal divided image groups output by the small image removal unit and selects the group with the wider extent.

【００２７】The small image removal unit removes divided images too small to be judged as characters, such as remnants of the struck-out character or part of a correction seal, leaving only the correction character. The divided image group selection unit then compares the widths of the remaining character portions in the two candidate regions and selects the larger, so that the region containing the correction character is extracted more accurately.
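The small-image removal and group selection described here can be sketched as below. This is an illustrative sketch under assumptions: connected images are represented by bounding boxes `(x, y, width, height)`, the size limits `min_width` and `min_height` are hypothetical parameters, and "extent" is taken to mean the horizontal span covered by the surviving boxes.

```python
def drop_small(boxes, min_width, min_height):
    """Keep only boxes (x, y, width, height) large enough to be a character."""
    return [b for b in boxes if b[2] >= min_width and b[3] >= min_height]


def horizontal_extent(boxes):
    """Horizontal span covered by a group of boxes; 0 for an empty group."""
    if not boxes:
        return 0
    left = min(x for x, _, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    return right - left


def choose_candidate(upper_boxes, lower_boxes, min_width=3, min_height=5):
    """Pick the candidate region whose surviving boxes span the wider extent."""
    upper = drop_small(upper_boxes, min_width, min_height)
    lower = drop_small(lower_boxes, min_width, min_height)
    if horizontal_extent(upper) >= horizontal_extent(lower):
        return "upper", upper
    return "lower", lower
```

Ties are resolved in favor of the upper region here, which is an arbitrary choice for the sketch.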

【００２８】According to the invention of claim 12, a character reading device comprises: a pattern input device that inputs a pattern containing characters; the character segmentation device of any one of claims 7 to 10, which cuts characters one at a time out of the input pattern image; a feature extraction unit that extracts, from each character pattern cut out by the character segmentation device, the features needed to distinguish character types; and a recognition unit that recognizes which type each pattern belongs to on the basis of the features extracted by the feature extraction unit.

【００２９】Characters involved in corrections, which could not previously be cut out properly, can be cut out properly by the segmentation device (character segmentation unit) of the present invention, which improves segmentation accuracy. Better segmentation in turn improves the recognition performance of the recognition unit, which represents the final performance of the system. Higher recognition performance also reduces the effort operators and users spend checking and correcting results.

【００３０】(Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the character segmentation device according to the first embodiment. In FIG. 1:
1 is a frame removal unit, which removes the entry frame when the input image contains one (71 in FIG. 28).
2 is a first connected image division unit, which cuts the pattern from which the frame removal unit 1 has removed the frame into individual connected images and obtains the width and height of the circumscribed rectangle of each (72 and 73 in FIG. 28).
3 is a wide image selection unit, which selects only the wide connected images from the group cut out by the first connected image division unit 2 (75 in FIG. 28).
4 is a continuous line determination unit, which computes the horizontal projection of each horizontal line of an image selected by the wide image selection unit 3 (76 in FIG. 28), compares each projection with a predetermined threshold (77 in FIG. 28), judges that a continuous line is present if a projection exceeds the threshold, and outputs the horizontal line range in which the projection is at or above the threshold (78 in FIG. 28).
5 is a background distance calculation unit, which, for an image judged by the continuous line determination unit 4 to contain a continuous line, obtains the background distance from the top edge for each vertical line of the image (79 in FIG. 28).
6 is a correction line determination unit, which, from the background distances output by the background distance calculation unit 5, judges that an image contains a correction line when a range in which the background distance from the top edge changes little continues for a long stretch (80 in FIG. 28).
7 is a height determination unit, which judges whether the height of an image judged by the correction line determination unit 6 to contain a correction line is large or small (81 in FIG. 28).
8 is an image removal unit, which removes an image whose height the height determination unit 7 has judged to be small (82 in FIG. 28).
9 is a struck-out character removal unit, which, for an image whose height the height determination unit 7 has judged to be large, determines from the horizontal line range output by the continuous line determination unit 4 whether the correction line lies in the upper or lower half of the image (83 in FIG. 28); removes the upper-half region and the horizontal line range region when it lies in the upper half (84 in FIG. 28); removes the lower-half region and the horizontal line range region when it lies in the lower half (85 in FIG. 28); and cuts the image region remaining after removal into individual connected images (86 in FIG. 28).
10 is a single-character segmentation unit, which cuts the connected images output by the struck-out character removal unit 9 into single characters when that unit has output connected images after removing the struck-out region, and otherwise cuts the connected images output by the first connected image division unit 2 into single characters (87 in FIG. 28).

【００３１】FIG. 6 shows the configuration of the continuous line determination unit 4. 11 is a horizontal projection calculation unit, which computes the horizontal projection of a divided image as shown in FIG. 7. 12 is a first horizontal line range selection unit: if any of the horizontal projections computed by unit 11 exceeds the image width multiplied by a predetermined ratio, it judges the image to contain a continuous line and outputs it to the background information calculation unit 5. Conversely, if all projections are smaller, the image is passed unchanged to the single-character segmentation unit 10.

【００３２】FIG. 10 shows the configuration of the struck-out character removal unit 9. 31 is a horizontal line range major classification unit, which classifies, from the correction line position information output by the height determination unit 7, whether the correction line lies in the upper or lower half of the image. 32 is a correction character region estimation unit: when the classification result output by the horizontal line range major classification unit 31 is the upper half, it removes the upper half of the image, and when the result is the lower half, it removes the lower half. If the image region left after removal still contains a horizontal line range at or above the threshold, as output by the first horizontal line range selection unit 12, that range is also removed, and the remaining image region is output to the second connected image division unit 33. 33 is the second connected image division unit, which divides the image region output by the correction character region estimation unit 32 into connected images by the same method as the first connected image division unit 2.

【００３３】The operation of the character segmentation device configured as above is described below. First, the frame removal unit 1 removes the entry frame if the input image contains one; otherwise it outputs the input image unchanged. FIG. 2 shows examples of patterns that contain entry frames. Examples of frame removal methods follow.

【００３４】"Frame removal method 1"
(1) Binarize at a density lower than that of the frame.
(2) Store the regions of the frame and of printed characters not needed for recognition (e.g., digit-position labels). For example, the frame is stored as the coordinates of its intersections and bend points, and each printed character as the coordinates of the four sides of its entry region.
(3) Erase the frame and printed characters found in (2) from the image of (1).
(4) Binarize at a density between that of the frame and that of the characters.
(5) Add the image of (3) to the image of (4).
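Steps (1) to (5) of frame removal method 1 can be sketched as below. This is a minimal illustration under assumptions: the input is a grid of density values where larger means darker, the frame and printed-character regions of step (2) are supplied as a precomputed 0/1 mask, "adding" two binary images is taken to be a pixel-wise OR, and the names `binarize`, `remove_frame`, `low_threshold`, and `mid_threshold` are hypothetical.

```python
def binarize(density, threshold):
    """Turn a grid of density values into a 0/1 image (1 = at or above threshold)."""
    return [[1 if value >= threshold else 0 for value in row] for row in density]


def remove_frame(density, frame_mask, low_threshold, mid_threshold):
    """Frame removal by two binarizations and a stored frame mask.

    low_threshold lies below the frame density, mid_threshold between the
    frame density and the handwriting density.
    """
    # (1) Everything at least as dark as the frame becomes black.
    step1 = binarize(density, low_threshold)
    # (3) Erase the stored frame and printed-character regions from (1).
    step3 = [
        [0 if masked else pixel for pixel, masked in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(step1, frame_mask)
    ]
    # (4) Only strokes darker than the frame survive the second binarization.
    step4 = binarize(density, mid_threshold)
    # (5) Combine: strokes lost under the frame in (3) are restored from (4).
    return [[a | b for a, b in zip(row3, row4)] for row3, row4 in zip(step3, step4)]
```

The OR in step (5) is what lets a dark handwritten stroke that crosses the frame survive even though the frame pixels themselves were erased in step (3).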

【００３５】FIGS. 3(a) to 3(e) show the intermediate images produced by this method when handwritten characters darker than the frame are entered, and FIGS. 3(f) to 3(j) show them for printed characters of the same density as the frame. FIGS. 3(a) and 3(f) are the images after the binarization of step (1); FIGS. 3(b) and 3(g) are the regions of step (2); FIGS. 3(c) and 3(h) are the characters remaining after step (3); FIGS. 3(d) and 3(i) are the images after the binarization of step (4); and FIGS. 3(e) and 3(j) are the final binarized images of step (5).

【００３６】When the entered characters are darker than the frame, part of a character disappears together with the unneeded printed characters in FIG. 3(c), but the whole character appears in FIG. 3(d), so only the characters to be recognized appear in the final image (e). Conversely, for printed characters of the same density as the frame, the characters are erased by the binarization in FIG. 3(i), but the whole characters appear in FIG. 3(h), so only the characters to be recognized appear in the final image (j).

【００３７】"Frame removal method 2"
(1) Binarize at a density lower than that of the frame.
(2) Recognize the position of the frame.
(3) Erase the frame (character portions overlapping the frame are erased as well).
(4) Interpolate the character portions that were erased together with the frame.

【００３８】FIGS. 4(a) to 4(c) show an example in which the character shape is restored by interpolation from an image whose character portions were erased together with the frame.

【００３９】Next, the first connected image division unit 2 takes the pattern from which the frame removal unit 1 has removed the frame, divides it into connected images at the gaps in the vertical and horizontal projections as shown in FIG. 5, and obtains the height and width of each divided image. The rectangle circumscribing each divided image in FIG. 5 is the character circumscribed rectangle, and its height and width are the height and width of the divided image. For an exact division, splitting by vertical projection and splitting by horizontal projection are repeated alternately (vertical, horizontal, vertical, and so on) until the rectangle sizes stop changing; in practice, the repetition may be capped at a fixed number of passes in consideration of processing load and time. A labeling-based division method may be used instead of projection-based division.
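The alternating projection-based division can be sketched as follows. This is a simplified illustration, not the patent's implementation: the image is a grid of 0/1 pixels, boxes are half-open rectangles `(x0, y0, x1, y1)`, and the names `runs`, `split_box`, `divide`, and `max_passes` are assumptions. The loop stops once one vertical and one horizontal pass in a row change nothing, or when the pass cap is reached.

```python
def runs(profile):
    """Return (start, end_exclusive) for each maximal run of non-zero entries."""
    found, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            found.append((start, i))
            start = None
    if start is not None:
        found.append((start, len(profile)))
    return found


def split_box(image, box, vertical):
    """Split one box at the empty gaps of its column (vertical) or row projection."""
    x0, y0, x1, y1 = box
    if vertical:
        profile = [sum(image[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
        return [(x0 + a, y0, x0 + b, y1) for a, b in runs(profile)]
    profile = [sum(image[y][x0:x1]) for y in range(y0, y1)]
    return [(x0, y0 + a, x1, y0 + b) for a, b in runs(profile)]


def divide(image, max_passes=8):
    """Alternate vertical and horizontal splits until the boxes stop changing."""
    boxes = [(0, 0, len(image[0]), len(image))]
    unchanged = 0
    for i in range(max_passes):
        new_boxes = [
            piece for box in boxes for piece in split_box(image, box, vertical=(i % 2 == 0))
        ]
        unchanged = unchanged + 1 if new_boxes == boxes else 0
        boxes = new_boxes
        if unchanged == 2:
            break
    return boxes
```

The width and height of each divided image follow directly as `x1 - x0` and `y1 - y0`.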

【００４０】The following steps are executed as one sequence of processing for each connected image and are repeated once per connected image.

【００４１】Next, the wide image selection unit 3 selects the wide images from those divided by the first connected image division unit 2. For example, among the divided images in FIG. 5, the image of the lower character string "500", struck out by a correction line, is selected and output to the continuous line determination unit 4. The width threshold is usually set to two or three character widths, which allows connected characters and corrections to be detected. The remaining, narrower images are processed by the single-character segmentation unit 10.

【００４２】In the continuous line determination unit 4, as shown in FIG. 6, the horizontal projection calculation unit 11 computes the horizontal projection of every line of each divided image. If the first horizontal line range selection unit 12 finds even one line whose horizontal projection exceeds the image width multiplied by a predetermined ratio, the divided image is judged to contain a continuous line, and the range in which the horizontal projection exceeds the threshold is output. For example, FIG. 7 shows the horizontal projection of the character string "500" struck out by a correction line; the projection exceeds the threshold where the correction line lies, so this divided image is judged to contain a continuous line. When a continuous line is found, the horizontal line range exceeding the threshold, as in FIG. 7, is computed and the image is processed by the background information calculation unit 5; otherwise the image is processed by the single-character segmentation unit 10.
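The horizontal projection test can be sketched as below. This is an illustrative sketch: the image is a grid of 0/1 pixels, the `ratio` value of 0.8 is an arbitrary stand-in for the predetermined ratio, and the function names are assumptions.

```python
def horizontal_projection(image):
    """Count the black pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def continuous_line_rows(image, ratio=0.8):
    """Return (first_row, last_row) for each band whose projection exceeds width * ratio.

    An empty result means no continuous line was found.
    """
    threshold = len(image[0]) * ratio
    bands, start = [], None
    for y, count in enumerate(horizontal_projection(image)):
        if count > threshold:
            if start is None:
                start = y
        elif start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(image) - 1))
    return bands
```

An image for which the list is non-empty would be passed on for the background distance check, and the returned bands correspond to the horizontal line range that later stages use.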

【００４３】Next, for an image judged by the continuous line determination unit 4 to contain a continuous line, the background information calculation unit 5 obtains the background distance, that is, the length scanned vertically from a point on the top or bottom side of the circumscribed rectangle until a black pixel of a character is reached. For example, for the divided images in FIGS. 8(a) to 8(c), the figure shows the length from each pixel on the top and bottom edges to the first black pixel reached by scanning vertically; this is the background distance of each line.
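The background distance can be computed per column as in the sketch below. The image is assumed to be a grid of 0/1 pixels cropped to the circumscribed rectangle; assigning the full image height to a column with no black pixel is a choice made for this example, since the text does not cover that case.

```python
def background_distances(image):
    """Per-column distance from the top and bottom edges to the first black pixel.

    Returns (top, bottom), two lists with one entry per column.
    """
    height, width = len(image), len(image[0])
    top, bottom = [], []
    for x in range(width):
        column = [image[y][x] for y in range(height)]
        if 1 in column:
            top.append(column.index(1))
            bottom.append(column[::-1].index(1))
        else:
            # No stroke in this column: treat the whole column as background.
            top.append(height)
            bottom.append(height)
    return top, bottom
```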

【００４４】Next, for an image judged by the continuous line determination unit 4 to contain a continuous line, the correction line determination unit 6 determines, from the background information output by the background information calculation unit 5, whether the continuous line is a correction line or part of connected characters. For example, FIG. 8(a) shows the connected characters "000"; here the image can be identified as connected characters by detecting that there is one continuous line and that it lies at the top edge of the image. FIGS. 8(b) and 8(c) show character strings made up of a large "5" connected to small "000"; here the image can be identified as connected characters by detecting that there is one continuous line and that the change in the size of the background from the top edge of the image is nearly constant. As an additional condition for identifying connected characters, periodic variation of the size of the background from the bottom edge of the image may also be used. When the continuous line is judged to be a correction line, the image is output to the height determination unit 7 together with its height and the horizontal line range in which the horizontal projection is at or above the threshold, as in FIG. 7; otherwise the image is processed by the single-character segmentation unit 10.
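One feature the correction line determination unit 6 relies on, per its description in the device overview, is a long run of columns over which the background distance from the top edge changes little. The sketch below computes that feature only; it is one possible reading, and `max_step`, `min_run_ratio`, and the function names are hypothetical. The additional cues mentioned in the text (number of continuous lines, position at the top edge, periodic variation from the bottom edge) are not modeled.

```python
def longest_stable_run(distances, max_step=1):
    """Length of the longest run of adjacent columns whose distance changes by at most max_step."""
    if not distances:
        return 0
    best = run = 1
    for previous, current in zip(distances, distances[1:]):
        run = run + 1 if abs(current - previous) <= max_step else 1
        best = max(best, run)
    return best


def has_long_stable_run(top_distances, min_run_ratio=0.7, max_step=1):
    """True when the stable run covers at least min_run_ratio of the image width."""
    if not top_distances:
        return False
    return longest_stable_run(top_distances, max_step) >= min_run_ratio * len(top_distances)
```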

【００４５】Next, the height determination unit 7 uses the height of an image judged by the correction line determination unit 6 to contain a correction line to decide whether to delete the whole image or keep part of it. Specifically, when the image height is small, as in FIG. 9(a), only struck-out characters are present, so the image removal unit 8 removes the whole image. When the image height is large, as in FIG. 9(b), a correction character has been written above or below the struck-out characters, so the range in which the horizontal projection exceeds the threshold is output to the struck-out character removal unit 9, which uses the position of the continuous line to delete only the struck-out character region, producing an image like FIG. 9(c).

【００４６】Next, the image removal unit 8 removes in its entirety any image whose height the height determination unit 7 has judged to be small.

【００４７】Next, for an image whose height the height determination unit 7 has judged to be large, the struck-out character removal unit 9 uses the range in which the horizontal projection exceeds the threshold, output by the height determination unit 7, to remove the region of the characters struck out by the correction line, and divides the remaining correction character region, which is the recognition target, into connected images.

【００４８】A concrete example of the processing of the struck-out character removal unit 9 is given below using the configuration of FIG. 10. First, the horizontal line range major classification unit 31 determines, from the range in which the horizontal projection exceeds the threshold, output by the height determination unit 7, whether the correction line lies in the upper or lower half of the image region. In FIG. 9(b), for example, that range lies in the lower half, so the correction line is judged to be in the lower half. Next, the correction character region estimation unit 32 removes the upper half if the correction line is in the upper half and the lower half if it is in the lower half. In FIG. 9(b) the correction line is in the lower half, and FIG. 9(c) is the result of removing the lower half. When there are two or more correction lines and they fall in both the upper and lower halves, as in FIG. 9(d), the decision is made according to whether the midpoint of the correction lines lies in the upper or lower half. In that case a correction line remains in the image after the upper or lower half is removed, so the region containing that correction line is removed as well. In FIG. 9(d), for example, the midpoint of the two correction lines is in the lower half, so the lower half is removed; correction line 1 remains in the upper half, so correction line 1 and the region below it are removed too.
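The half classification and removal can be sketched as below. This is a minimal sketch under assumptions: the image is a list of rows, each correction line is given as `(first_row, last_row)`, the "midpoint of the correction lines" is taken as the midpoint between the topmost and bottommost line rows, and a midpoint exactly on the center row counts as the lower half. The function name is hypothetical.

```python
def remove_struck_out_half(image, line_ranges):
    """Drop the half of the image containing the correction line(s).

    Returns (half, remaining_rows), where half is "upper" or "lower".
    Any correction line left in the kept half is cut away together with the
    rows between it and the removed half.
    """
    height = len(image)
    half = height // 2
    top = min(first for first, _ in line_ranges)
    bottom = max(last for _, last in line_ranges)
    center = (top + bottom) / 2
    if center < height / 2:
        # Lines sit in the upper half: keep rows below both the half boundary
        # and the lowest correction line.
        keep_from = max(half, bottom + 1)
        return "upper", image[keep_from:]
    # Lines sit in the lower half: keep rows above both the half boundary
    # and the topmost correction line.
    keep_to = min(half, top)
    return "lower", image[:keep_to]
```

For the two-line case of FIG. 9(d), the `min(half, top)` term is what removes correction line 1 and the region below it from the kept upper half.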

【００４９】Next, the second connected image division unit 33 divides the remaining correction character region into connected images, as in FIG. 9(e), using the same processing as the first connected image division unit 2. The region removed need not be the upper or lower half; it may be only the region containing the correction line.

【００５０】Finally, the single-character segmentation unit 10 operates as follows. For images that the wide image selection unit 3 did not judge to be wide, that the continuous line determination unit 4 did not judge to contain a continuous line, or that the correction line determination unit 6 did not judge to contain a correction line, it cuts single characters out of the same connected images that were output by the first connected image division unit 2. For images processed by the struck-out character removal unit 9, it cuts single characters out of the images that unit divided into connected images. Methods usable for single-character segmentation include label separation, forced oblique separation, separation at the point where the height of the character portion is smallest, and forced equal division. If part of a struck-out character remains after the struck-out character removal unit 9, the images with small labels may be removed after cutting out single characters with the above methods.
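Of the methods listed, forced equal division is the simplest to illustrate. The sketch below splits a horizontal span into a given number of equal-width pieces; how the character count is estimated is not stated in the text, so `count` is simply an input here, and the function name is an assumption.

```python
def equal_split(x0, x1, count):
    """Split the half-open span [x0, x1) into count pieces of near-equal width."""
    if count < 1:
        raise ValueError("count must be at least 1")
    width = x1 - x0
    # Integer edges; floor division keeps the result deterministic.
    edges = [x0 + i * width // count for i in range(count + 1)]
    return list(zip(edges, edges[1:]))
```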

[0051] As described above, according to Embodiment 1, the continuation line determination unit 4 uses the magnitude of the horizontal projection to extract images of correction lines and connected characters that contain a continuation line, and the correction line determination unit 6 removes the connected characters from the images extracted by the continuation line determination unit 4 based on the change in background distance. Whereas the conventional method judged a correction line by the presence of two continuation lines, Embodiment 1 can detect a correction even when there is only one correction line, and is less likely to misjudge connected characters as a correction.

[0052] (Embodiment 2) FIG. 11 is a block diagram of a character cutout device according to a second embodiment of the present invention. Except for the continuation line determination unit 4 shown in the block diagram, the frame removing unit 1, first connected image dividing unit 2, horizontally long image selection unit 3, background information calculation unit 5, correction line determination unit 6, height determination unit 7, image removing unit 8, corrected character removing unit 9, and one-character cutout unit 10 of FIG. 1 are the same as in the first embodiment described above and operate in the same way (FIG. 29 is the same as FIG. 28 except that 91 and 92 are added and 27 is deleted).

[0053] The difference from the first embodiment lies in the configuration of the continuation line determination unit 4. In place of the first threshold range selection unit 12 of FIG. 3, FIG. 11 provides a projection threshold determination unit 22, which determines the horizontal projection threshold from the image width (91 in FIG. 29), and a second horizontal line range selection unit 23 (92 in FIG. 29). The second horizontal line range selection unit 23 compares the horizontal projection of each line, output from the horizontal projection calculation unit 11, with the threshold output from the projection threshold determination unit 22. If any line has a horizontal projection that exceeds the threshold, it outputs that range to the background information calculation unit 5 and the correction line determination unit 6. If no line exceeds the threshold, it outputs the image to the one-character cutout unit 10.

[0054] The operation of the character cutout device configured as described above is explained below. The operating procedure of the device is shown as a flowchart in FIG. 29. The operation up to the horizontal projection calculation unit 11 is the same as in the first embodiment and is omitted. The projection threshold determination unit 22 determines the horizontal projection threshold based on the width of the input image. Specifically, following a relationship such as those in FIGS. 12(a) and 12(b), the threshold (expressed as a ratio to the width) decreases monotonically as the image width increases. An example of a linear relationship such as that in FIG. 12(a) is given by expression (1):

threshold = width × (1.0 − width / 1000.0)   ... (1)

This compensates for the fact that, when the correction line is slightly oblique, the ratio of the horizontal projection to the image width decreases as the image width grows, as shown in FIG. 13. The threshold may also be determined by methods other than those of FIGS. 12(a) and 12(b); for example, the threshold may be changed in steps.
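
Expression (1) can be written directly as a small function. The sketch below is illustrative only; the function name and the `scale` parameter (standing in for the constant 1000.0) are assumptions. Note that, taken literally, expression (1) becomes zero or negative for widths of 1000 or more, so a practical implementation would presumably bound the ratio; the specification does not say how.

```python
def projection_threshold(width, scale=1000.0):
    """Horizontal projection threshold (in pixels) from expression (1).

    The ratio (1.0 - width / scale) is the threshold expressed as a fraction
    of the image width; it shrinks linearly as the image gets wider.
    """
    ratio = 1.0 - width / scale
    return width * ratio


# Ratio of threshold to width for a few image widths.
for w in (100, 200, 500):
    print(w, projection_threshold(w), projection_threshold(w) / w)
```

For a 100-pixel-wide image the line must cover about 90% of the width, while for a 500-pixel-wide image about 50% suffices, which is the relaxation the paragraph above describes for slightly oblique lines.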

[0055] Next, the second horizontal line range selection unit 23 compares the per-line horizontal projection of the divided image, obtained by the horizontal projection calculation unit 11, with the threshold obtained by the projection threshold determination unit 22. If any line has a horizontal projection that exceeds the threshold, it outputs that range to the background information calculation unit 5 and the correction line determination unit 6.
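
The comparison performed by the second horizontal line range selection unit 23 can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a list of rows of 0/1 pixels (1 = black), the horizontal projection of a row is taken to be its black-pixel count, and the function names are hypothetical.

```python
def horizontal_projection(image):
    """Black-pixel count of each row (assumed definition of the projection)."""
    return [sum(row) for row in image]


def ranges_above_threshold(projection, threshold):
    """Return inclusive (first_row, last_row) runs whose projection exceeds threshold."""
    ranges = []
    start = None
    for y, value in enumerate(projection):
        if value > threshold:
            if start is None:
                start = y  # a new run begins
        elif start is not None:
            ranges.append((start, y - 1))  # the run ended on the previous row
            start = None
    if start is not None:
        ranges.append((start, len(projection) - 1))  # run reaches the last row
    return ranges
```

An empty result corresponds to the case in which no line exceeds the threshold and the image is passed straight to the one-character cutout unit 10.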

[0056] As described above, according to Embodiment 2, the projection threshold determination unit 22 determines the horizontal projection threshold (as a ratio to the image width) according to the image width, so that a correction can be detected even when the correction line is slightly oblique and the image width is large.

[0057] (Embodiment 3) FIG. 14 is a block diagram of a character cutout device according to a third embodiment of the present invention. Except for the corrected character removing unit 9 shown in the block diagram, the frame removing unit 1, first connected image dividing unit 2, horizontally long image selection unit 3, continuation line determination unit 4, background information calculation unit 5, correction line determination unit 6, height determination unit 7, image removing unit 8, and one-character cutout unit 10 of FIG. 1 are the same as in the first or second embodiment described above and operate in the same way (FIG. 30 is the same as FIG. 28 except that 101 to 109 are added and 83 to 85 are deleted). Unlike in the first or second embodiment, the corrected character removing unit 9 consists of a horizontal line range determination unit 41, a first line region removing unit 42, a second line region removing unit 43, a second connected image dividing unit 44, and a correction character possibility determination unit 45.

[0058] FIG. 19 shows the configuration of the correction character possibility determination unit 45. Reference numeral 51 denotes a small image removing unit, which removes the small images from each group of divided images. Reference numeral 52 denotes a divided image group selection unit, which compares the extents of the two divided image groups after they have passed through the small image removing unit 51 and selects the group with the wider extent.

[0059] The operation of the character cutout device configured as described above is explained below. The operating procedure of the device is shown as a flowchart in FIG. 30. The processing up to the height determination unit 7 is the same as in the first or second embodiment and is omitted. The horizontal line range determination unit 41 takes the horizontal line range at or above the threshold, output from the height determination unit 7, and determines whether the center of that range falls within a predetermined band measured from the top or bottom edge of the image, where the band does not exceed half the image height (101 in FIG. 30). For example, it determines whether the center lies within 30% of the image height from the top or bottom edge. If it does, processing passes to the first line region removing unit 42; if it does not, processing passes to the second line region removing unit 43. In FIG. 15(a), the center of the horizontal line range lies within 30% of the image height from the bottom edge, so the image is processed by the first line region removing unit 42. In FIG. 15(b), the center lies outside the 30% band from both the top and the bottom edge, so the image is processed by the second line region removing unit 43.
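
The 30% test described above can be sketched as follows. This is illustrative only: the function name, the row-index convention (row 0 at the top, row `height - 1` at the bottom), and the inclusive comparison at the band boundary are assumptions.

```python
def edge_side(range_top, range_bottom, height, margin=0.30):
    """Classify where the center of a horizontal line range lies.

    Returns "top" or "bottom" if the center is within margin * height of that
    edge (handled by the first line region removing unit 42), otherwise None
    (handled by the second line region removing unit 43).
    margin must not exceed 0.5, matching the "not more than half" condition.
    """
    if not 0.0 < margin <= 0.5:
        raise ValueError("margin must be in (0, 0.5]")
    center = (range_top + range_bottom) / 2.0
    band = margin * height
    if center <= band:
        return "top"
    if (height - 1) - center <= band:
        return "bottom"
    return None
```

For a 100-row image, a line range at rows 80 to 84 is classified as "bottom" (the FIG. 15(a) situation), while a range at rows 48 to 52 returns None (the FIG. 15(b) situation).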

[0060] Next, when the horizontal line range determination unit 41 has determined that the center of the horizontal line range falls within the predetermined band, the first line region removing unit 42 operates as follows. If the band is in the upper half, it removes the region from the top edge of the image to the bottom of the horizontal line range (108 in FIG. 30). Conversely, if the band is in the lower half, it removes the region from the top of the horizontal line range to the bottom edge of the image (109 in FIG. 30). For example, in FIG. 16(a) the center of the horizontal line range lies within 30% from the bottom edge, so the region from the top of the horizontal line range to the bottom edge of the image is removed. In FIG. 16(b), conversely, the center of the horizontal line range lies within 30% from the top edge, so the region from the top edge of the image to the bottom of the horizontal line range is removed.

[0061] Next, when the horizontal line range determination unit 41 has determined that the center of the horizontal line range does not fall within the predetermined band, the second line region removing unit 43 selects the two regions above and below the horizontal line range. For example, in FIG. 17 the two lightly hatched regions, selection region 1 and selection region 2, are selected.
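
The behavior of the first and second line region removing units 42 and 43 can be summarized in one sketch that returns the row ranges left as correction character candidates. The function name, the `side` argument (the hypothetical result of the edge test: "top", "bottom", or None), and the half-open row ranges are assumptions made for illustration.

```python
def candidate_regions(side, range_top, range_bottom, height):
    """Return half-open (start_row, end_row) ranges that remain as candidates.

    range_top and range_bottom are the inclusive rows of the horizontal line
    range (the correction line).
    """
    if side == "top":
        # Unit 42: drop everything from the image top through the line range.
        return [(range_bottom + 1, height)]
    if side == "bottom":
        # Unit 42: drop everything from the line range through the image bottom.
        return [(0, range_top)]
    # Unit 43: the line is near the center, so keep both sides as candidates.
    return [(0, range_top), (range_bottom + 1, height)]
```

When two ranges are returned, a later stage has to decide which one holds the correction character; a single range needs no such decision.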

[0062] Next, the second connected image dividing unit 44 divides the input image region into connected images (86 and 102 in FIG. 30). The division is performed in the same way as in the first connected image dividing unit 2. When the second line region removing unit 43 has selected two regions, each region is divided into connected images (102 in FIG. 30).

[0063] Next, when the second connected image dividing unit 44 outputs divided image groups for two correction character candidate regions, the correction character possibility determination unit 45 compares the groups and selects the one more likely to be the correction character to be recognized (104 to 106 in FIG. 30). When no division signal has been output, the divided images output by the second connected image dividing unit 44 are output unchanged.

[0064] The operation is explained with reference to the configuration of FIG. 19. When the image of selection region 1 in FIG. 17 is divided into connected images by the second connected image dividing unit 44 and the small connected images are removed by the small image removing unit 51 (103 in FIG. 30), an image group such as that in FIG. 18(a) remains. Applying the same processing to the image of selection region 2 in FIG. 17 leaves an image group such as that in FIG. 18(b). Comparing the extents (the lengths of the double-headed arrows) of the image groups in FIGS. 18(a) and 18(b) (104 in FIG. 30) shows that FIG. 18(a) is wider, so FIG. 18(a) is selected (105 in FIG. 30) and used in the subsequent processing.
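
The selection made by the small image removing unit 51 and the divided image group selection unit 52 can be sketched as follows. This is a minimal illustration: connected images are represented by inclusive bounding boxes `(x0, y0, x1, y1)`, "small" is taken to mean that both the width and the height are below a hypothetical `min_size`, and the extent is the horizontal span of the remaining boxes. None of these specifics are stated in the specification.

```python
def remove_small(boxes, min_size):
    """Drop boxes whose width and height are both below min_size."""
    return [b for b in boxes
            if (b[2] - b[0] + 1) >= min_size or (b[3] - b[1] + 1) >= min_size]


def horizontal_extent(boxes):
    """Horizontal span covered by a group of boxes (0 for an empty group)."""
    if not boxes:
        return 0
    return max(b[2] for b in boxes) - min(b[0] for b in boxes) + 1


def select_group(group_a, group_b, min_size=5):
    """Return the filtered group with the wider horizontal extent."""
    a = remove_small(group_a, min_size)
    b = remove_small(group_b, min_size)
    return a if horizontal_extent(a) >= horizontal_extent(b) else b
```

A group that contains only remnants of the struck-out character tends to lose most of its boxes in `remove_small`, so its extent shrinks and the other group is selected.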

[0065] As described above, according to Embodiment 3, even when the position of the correction character to be recognized is hard to judge because the correction line lies near the center, the region of the correction character can be selected accurately. The second line region removing unit 43 selects both the region above and the region below the correction line, the second connected image dividing unit 44 divides them into units close to single characters using projection or labeling, and the correction character possibility determination unit 45 compares the character extents that remain after remnants of the character to be corrected and noise have been removed, and selects the larger one.

[0066] (Embodiment 4) FIG. 20 is a block diagram of a character reading device according to a fourth embodiment of the present invention. The frame removing unit 1, first connected image dividing unit 2, horizontally long image selection unit 3, continuation line determination unit 4, background information calculation unit 5, correction line determination unit 6, height determination unit 7, image removing unit 8, corrected character removing unit 9, and one-character cutout unit 10 are the same as those shown in FIG. 1 (71 to 89 in FIG. 31 are the same as in FIG. 28). The additions to FIG. 1 are a feature extraction unit 61, which extracts, from each character image output by the one-character cutout unit 10, features that the recognition unit can discriminate easily, and a recognition unit 62, which receives the features extracted by the feature extraction unit 61 and determines which character type each character belongs to (111 and 112 are added in FIG. 31).

[0067] The operation of the character reading device configured as described above is explained below. The operations from the frame removing unit 1 to the one-character cutout unit 10 are the same as in the first embodiment and are omitted. The feature extraction unit 61 extracts features from each character image output by the one-character cutout unit 10 (111 in FIG. 31); an example of feature extraction follows. First, contour points are extracted from the one-character cutout image to obtain a contour image. Contour directions are coded as shown in FIG. 21, and the direction of the contour point at the center pixel is determined by matching against mask patterns such as those in FIG. 22. For example, consider a matching scheme in which the pixels shown in gray in the 3×3 masks of FIGS. 22(a) and 22(b) may be either white or black, while the pixels shown in white or black must match that color exactly. If any 3×3 region of the contour image has the same pattern as the 3×3 mask of FIG. 22(a), the center pixel of that region is coded as 7; if it has the same pattern as FIG. 22(b), it is coded as 4.
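
Mask matching with "either color" pixels can be sketched as follows. The two mask patterns below are hypothetical placeholders, since the actual patterns of FIG. 22 are not reproduced in the text; only the codes 7 and 4 come from the paragraph above. Pixels are 1 for black and 0 for white, and `ANY` marks a gray ("either color") position.

```python
ANY = None  # "gray" mask position: matches either a white or a black pixel

# Hypothetical 3x3 masks keyed by direction code; NOT the patterns of FIG. 22.
EXAMPLE_MASKS = {
    7: ((0, 0, ANY),
        (0, 1, ANY),
        (ANY, ANY, 1)),
    4: ((ANY, 1, ANY),
        (0, 1, 0),
        (ANY, 1, ANY)),
}


def matches(patch, mask):
    """True if every non-ANY mask position equals the corresponding pixel."""
    return all(m is ANY or m == p
               for patch_row, mask_row in zip(patch, mask)
               for p, m in zip(patch_row, mask_row))


def encode_center(image, y, x, masks=EXAMPLE_MASKS):
    """Direction code of the pixel at (y, x), or None if no mask matches.

    (y, x) must not lie on the image border, so that a full 3x3 patch exists.
    """
    patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
    for code, mask in masks.items():
        if matches(patch, mask):
            return code
    return None
```

Masks are tried in dictionary order and the first match wins; whether the masks of FIG. 22 are mutually exclusive is not stated, so this ordering is a design choice of the sketch.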

[0068] Next, four sets of distances are obtained from the contour image. As shown in FIG. 23(a), taking the left side of the circumscribed rectangle as the origin, each row of the image is scanned to the right and the distance to the first contour pixel found is measured. As shown in FIG. 23(b), taking the right side of the circumscribed rectangle as the origin, each row is scanned to the left and the distance to the first contour pixel found is measured. As shown in FIG. 23(c), taking the upper side of the circumscribed rectangle as the origin, each column is scanned downward and the distance to the first contour pixel found is measured. As shown in FIG. 23(d), taking the lower side of the circumscribed rectangle as the origin, each column is scanned upward and the distance to the first contour pixel found is measured.
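
The four scans described above can be sketched as follows. This is illustrative only: the image is assumed to be already cropped to the circumscribed rectangle, contour pixels are nonzero, and a row or column with no contour pixel is assigned the full scan length, which is an assumption the specification does not address.

```python
def first_hit(sequence):
    """Index of the first nonzero element, or len(sequence) if there is none."""
    for i, value in enumerate(sequence):
        if value:
            return i
    return len(sequence)


def background_distances(image):
    """Return (left, right, top, bottom) distance lists for a contour image.

    left/right hold one distance per row; top/bottom hold one per column.
    """
    height, width = len(image), len(image[0])
    columns = [[image[y][x] for y in range(height)] for x in range(width)]
    left = [first_hit(row) for row in image]
    right = [first_hit(row[::-1]) for row in image]
    top = [first_hit(col) for col in columns]
    bottom = [first_hit(col[::-1]) for col in columns]
    return left, right, top, bottom
```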

[0069] Next, the cutout image is divided into a plurality of regions. For example, as shown in FIG. 24, centroid division is performed so that the image is divided horizontally and vertically into regions that each contain an equal number of black pixels. The contour direction components are then counted per block and per contour direction. For example, in the contour direction image of FIG. 25(a), where the thick lines are block boundaries and each number is the code of the contour direction component of the pixel, counting per block and per contour direction gives the results shown in FIGS. 25(b) to 25(e). Next, the count of contour direction components for each block and each contour direction is divided by the size of the block to normalize it, giving the contour direction statistical feature.
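
The block division and normalized counting can be sketched as follows. This is a simplified illustration under stated assumptions: `balanced_split` performs a single two-way split along one axis, blocks are half-open rectangles `(y0, y1, x0, x1)`, non-contour pixels are `None`, direction codes run from 0 to `n_dirs - 1`, and "size of the block" is taken to be its area. The function names are hypothetical.

```python
def balanced_split(black_counts):
    """Index i such that black_counts[:i] holds at least half of all black pixels.

    black_counts: black-pixel count per column (or per row).
    """
    total = sum(black_counts)
    running = 0
    for i, count in enumerate(black_counts):
        running += count
        if running * 2 >= total:
            return i + 1
    return len(black_counts)


def block_direction_features(direction_image, blocks, n_dirs=4):
    """Per-block histogram of direction codes, divided by the block area."""
    features = []
    for y0, y1, x0, x1 in blocks:
        counts = [0] * n_dirs
        for y in range(y0, y1):
            for x in range(x0, x1):
                code = direction_image[y][x]
                if code is not None:
                    counts[code] += 1
        area = (y1 - y0) * (x1 - x0)
        features.append([c / area for c in counts])
    return features
```

Dividing by the block area keeps features comparable between blocks of different sizes, which matters because centroid division produces unequal blocks by design.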

[0070] Next, for each block, the image is scanned parallel to the block division boundary, and the contour direction components of the first character pattern pixels encountered (the outer contour direction components) are counted per contour direction. For example, in FIG. 26, which is the contour direction image of one block, each number is the code of the contour direction component of the pixel, the symbol − marks a pixel that is not a contour point, and the scanning direction is from the left edge to the right as shown by the arrows. The first character pattern pixels encountered when scanning parallel to the block division boundary are the shaded pixels, and the contour direction components (outer contour direction components) of all the shaded pixels are counted per contour direction (0 to 3 in this example). Blocks divided in the horizontal direction are scanned in two directions, from the left edge rightward and from the right edge leftward; blocks divided in the vertical direction are scanned in two directions, from the top edge downward and from the bottom edge upward.
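
The counting of outer contour direction components for a horizontally scanned block can be sketched as follows. The representation (a list of rows holding a direction code or `None` for non-contour pixels) and the function name are assumptions; vertical scans would apply the same loop to columns.

```python
def outer_direction_counts(block, n_dirs=4, from_left=True):
    """Tally the direction code of the first contour pixel met in each row.

    block: rows of direction codes (0 .. n_dirs - 1) or None for non-contour pixels.
    from_left: scan left to right if True, right to left otherwise.
    """
    counts = [0] * n_dirs
    for row in block:
        scan = row if from_left else row[::-1]
        for code in scan:
            if code is not None:
                counts[code] += 1
                break  # only the first pixel hit in this row is counted
    return counts
```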

[0071] Next, the count of outer contour direction components for each block and each contour direction is divided by the size of the block over which the outer contour directions were counted, to normalize it, giving the outer contour direction statistical feature. Next, for blocks divided in the horizontal direction, the sum of the background distances from the left edge rightward and the sum of the background distances from the right edge leftward are calculated; for blocks divided in the vertical direction, the sum of the background distances from the top edge downward and the sum of the background distances from the bottom edge upward are calculated. The sums of the background distances in these four directions for each block are then divided by the size of the block over which the outer contour directions were counted, to normalize them, giving the background distance statistical feature. The above is an example of feature extraction in the feature extraction unit 61.

[0072] Next, the recognition unit 62 determines the category to which the character pattern belongs, based on the features obtained by the feature extraction unit 61 (112 in FIG. 31). FIG. 27 shows, as one example of the recognition unit 62, a configuration that uses a trained hierarchical neural network. The contour direction statistical feature, the outer contour direction statistical feature, and the background distance statistical feature are input to the neurons of the first layer, and signals propagate through the intermediate layer to the final layer, so that each final-layer neuron produces an output signal. The recognition unit 62 outputs the character type corresponding to the final-layer neuron with the largest output value. For example, in FIG. 27 the output of the leftmost final-layer neuron is the largest, so the character type output from the recognition unit 62 is "0". In this way, the character reading device of FIG. 20 can recognize the characters contained in the input pattern.
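
The "largest output wins" rule of the recognition unit 62 can be sketched with a tiny feed-forward network. Everything concrete here is an assumption made for illustration: the sigmoid activation, the single hidden layer, the layer sizes, and the weights are hypothetical and are not taken from FIG. 27.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def classify(features, w_hidden, b_hidden, w_out, b_out, labels):
    """Propagate features through the network and pick the strongest output."""
    hidden = layer(features, w_hidden, b_hidden)
    outputs = layer(hidden, w_out, b_out)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best], outputs


# Hypothetical 2-2-2 network with identity weights, for demonstration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
label, outputs = classify([2.0, -2.0], identity, [0.0, 0.0],
                          identity, [0.0, 0.0], ["0", "1"])
print(label)
```

In a real system the feature vector would be the concatenation of the three statistical features and the weights would come from training; the selection step at the end is the only part fixed by the paragraph above.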

[0073] As described above, according to Embodiment 4, characters involved in a correction, which previously could not be cut out properly, can now be cut out properly by the character cutout device (character cutout unit) of the present invention, which improves cutout accuracy. The improved cutout accuracy in turn improves the recognition performance of the recognition unit 62, which represents the final performance of the system, and the improved recognition performance reduces the effort operators and users spend checking and correcting recognition results.

【００７４】[0074]

[Effects of the Invention] As described above, according to the character cutout method of claim 1 of the present invention, whereas a correction line was conventionally judged by the presence of two continuation lines, the present invention judges as a correction line an image in which the run of small background-distance change does not continue for long. A correction can therefore be detected even when there is only one correction line, and connected characters are no longer misjudged as a correction.

[0075] According to the character cutout method of claim 2 of the present invention, the horizontal projection threshold, expressed as a ratio to the circumscribed rectangle width, is made smaller as the circumscribed rectangle width increases. A continuation line can therefore be detected even when the circumscribed rectangle is wide and the continuation line is slightly oblique, so that the ratio of the horizontal projection to the circumscribed rectangle width is low.

[0076] According to the character cutout method of claim 3 of the present invention, when the correction line lies near the center of the pattern and it is therefore difficult to tell whether the correction character is above or below the correction line, both the region above and the region below the correction line are first divided into connected images, and the states of the resulting connected images are compared to select the one more likely to be the correction character. The region in which the correction character is written can thus be detected more efficiently than by examining both the regions above and below the correction line for every pattern. In addition, because the region is first forcibly divided above or below the correction line, the correction character can be separated with little influence from the character to be corrected even when the images of the two overlap.

[0077] According to the character reading method of claim 4 of the present invention, characters involved in a correction, which previously could not be cut out properly, can be cut out properly by the cutout device (character cutout unit) of the present invention. The improved cutout accuracy improves the recognition performance of the recognition unit, which represents the final performance of the system.

[0078] The recording medium of claim 5 of the present invention allows the character cutout method of the present invention to be introduced easily into other systems. When it is combined with a system or recording medium that provides the function of a recognition unit, simply acquiring an input pattern makes it possible to read the characters contained in that pattern, and the improved recognition performance reduces the user's burden of checking and correcting results.

[0079] The recording medium of claim 6 of the present invention allows the character reading method of the present invention to be introduced easily into other systems, so that simply acquiring an input pattern makes it possible to read the characters contained in that pattern. The improved recognition performance also reduces the user's burden of checking and correcting results.

[0080] According to the character cutout device of claim 7 of the present invention, continuation lines, including those of correction lines and connected characters, are extracted even when there is only one, and images in which the run of small background-distance change is long are excluded as connected characters. Only patterns that contain a correction line are thereby extracted, so a correction can be detected even when there is only one correction line, and connected characters are no longer misjudged as a correction.

[0081] According to the character cutout device of claim 8 of the present invention, the horizontal projection threshold, expressed as a ratio to the circumscribed rectangle width, is made smaller as the circumscribed rectangle width increases. A continuation line can therefore be detected even when the circumscribed rectangle is wide and the continuation line is slightly oblique, so that the ratio of the horizontal projection to the circumscribed rectangle width is low.

[0082] According to the character cutout device of claim 9 of the present invention, whereas conventional extraction of a correction character required the correction character and the character to be corrected to be separable by labeling, the present invention forcibly divides the image at the horizontal line range and at the upper or lower half, so the region of the correction character can be extracted even when the two are not separable by labeling. Even if images other than the correction character remain at that point, they can be removed by dividing the region into connected images and then removing the small connected images.

【００８３】本発明の請求項１０に記載の文字切り出し
装置によれば、訂正線が上端または下端に近い場合は、
訂正文字は反対側に存在すると言えるので、端から訂正
線までの領域を削除することにより、訂正線が含まれる
被訂正文字の領域を確実に削除し、訂正文字の領域を誤
って削除しなくなり、訂正線がパターンの中央付近にあ
って、訂正文字が訂正線の上または下のどちらに存在す
るかの判別が難しい場合は、一旦訂正線の上そして下の
領域の両方について連結イメージに分割して比較し、分
割後の連結イメージの状態を比較して訂正文字の可能性
の高いものを選択することにより、全てのパターンにつ
いて訂正線の上と下の両方の領域を調べるよりも効率的
に訂正文字の記入領域が検出でき、さらに、一旦訂正線
の上または下で強制的に領域を分割することにより、訂
正文字と被訂正文字のイメージが重なる場合でも被訂正
文字の影響を少なく分離できる。According to the character segmenting device of claim 10 of the present invention, when the correction line is near the top or bottom edge, the correction character can be assumed to lie on the opposite side. Deleting the region from that edge to the correction line therefore reliably removes the corrected character region containing the correction line without mistakenly deleting the correction character region. When the correction line is near the center of the pattern and it is difficult to tell whether the correction character lies above or below it, both the region above and the region below the line are divided into connected images, the resulting connected images are compared, and the group more likely to be the correction character is selected. The entry region of the correction character can thus be detected more efficiently than by examining both regions for every pattern. Furthermore, forcibly splitting the region above or below the correction line allows the correction character to be separated with little influence from the corrected character even when their images overlap.
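The edge-proximity decision can be illustrated with the short sketch below. The parameter `edge_ratio` stands in for the "predetermined range" and its default of 0.3 is a hypothetical value; the text only requires that the range not exceed half the image height. Row ranges are returned as (first_row, last_row_exclusive).

```python
def candidate_regions(height, line_top, line_bottom, edge_ratio=0.3):
    """Row ranges that may hold the correction character, given the correction line rows."""
    if not 0 < edge_ratio <= 0.5:
        raise ValueError("edge_ratio must not exceed half the image height")
    margin = edge_ratio * height
    center = (line_top + line_bottom) / 2
    above = (0, line_top)
    below = (line_bottom + 1, height)
    if center < margin:
        # Line near the top edge: drop everything from the top edge through the line.
        return [below]
    if center >= height - margin:
        # Line near the bottom edge: drop everything from the line through the bottom edge.
        return [above]
    # Line near the middle: keep both sides as candidates for later comparison.
    return [above, below]
```

Only the ambiguous middle case yields two candidates, so the more expensive comparison of divided image groups is needed for a fraction of the patterns.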

【００８４】本発明の請求項１１に記載の文字切り出し
装置によれば、訂正文字以外の、被訂正文字の残りや訂
正印の一部のように文字とは判定できないような小さい
分割イメージを除去して訂正文字だけを残し、二つの訂
正文字候補領域の残った文字部分の幅を比較して大きい
方を選択することにより、訂正文字が含まれる領域をよ
り正確に抽出できる。According to the character segmenting device of claim 11 of the present invention, small divided images that cannot be judged to be characters, such as remnants of the corrected character or parts of a correction seal, are removed so that only the correction character remains. The widths of the character portions remaining in the two correction character candidate regions are then compared and the wider one is selected, so the region containing the correction character can be extracted more accurately.
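A sketch of this selection step follows, assuming each divided image is represented by its bounding box (left, right, top, bottom) in pixel coordinates. The bounding-box area test and the `min_area` default are hypothetical stand-ins for the unspecified "small image" criterion.

```python
def drop_small(boxes, min_area=20):
    """Discard bounding boxes too small to be a character (assumed area criterion)."""
    kept = []
    for left, right, top, bottom in boxes:
        area = (right - left + 1) * (bottom - top + 1)
        if area >= min_area:
            kept.append((left, right, top, bottom))
    return kept


def horizontal_extent(boxes):
    """Width spanned by the boxes taken together, or 0 when none remain."""
    if not boxes:
        return 0
    return max(b[1] for b in boxes) - min(b[0] for b in boxes) + 1


def select_correction_group(group_above, group_below, min_area=20):
    """Keep the candidate group whose remaining images span the wider range."""
    kept_above = drop_small(group_above, min_area)
    kept_below = drop_small(group_below, min_area)
    if horizontal_extent(kept_above) >= horizontal_extent(kept_below):
        return kept_above
    return kept_below
```

Removing the fragments before measuring matters: two specks at opposite ends of a region would otherwise span a wider range than the real characters.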

【００８５】本発明の請求項１２に記載の文字読取装置
によれば、これまで正常に切り出せなかった訂正に関連
する文字が、本発明の切り出し装置（文字切り出し部）
で正常に切り出すことができるようになることで切り出
し精度が向上し、切り出し精度が良くなれば認識部の認
識性能も良くなるので、システムの最終性能を表す認識
部の認識性能も向上し、認識性能が向上することで、オ
ペレータやユーザによる結果の確認作業および修正作業
の手間が少なくなる。According to the character reading device of claim 12 of the present invention, characters involved in corrections, which could not previously be segmented correctly, can now be segmented correctly by the segmenting device (character segmenting unit) of the present invention, which improves segmentation accuracy. Better segmentation accuracy in turn improves the recognition performance of the recognition unit, which represents the final performance of the system, and this reduces the effort that operators and users spend checking and correcting the results.

【図面の簡単な説明】[Brief description of the drawings]

【図１】第１の実施の形態における文字切り出し装置基
本構成を示すブロック図FIG. 1 is a block diagram showing the basic configuration of the character segmenting device according to the first embodiment.

【図２】第１の実施の形態における枠付きの入力画像パ
ターン例を示す模式図FIG. 2 is a schematic diagram showing an example of a framed input image pattern in the first embodiment.

【図３】第１の実施の形態における枠消去方法を示す模
式図FIG. 3 is a schematic diagram showing the frame erasing method in the first embodiment.

【図４】第１の実施の形態における枠消去にともなう文
字部分補間方法を示す模式図FIG. 4 is a schematic diagram showing the method of interpolating character portions that accompanies frame erasure in the first embodiment.

【図５】第１の実施の形態における投影を用いた連結イ
メージ分割方法を示す模式図FIG. 5 is a schematic diagram showing a connected image dividing method using projection in the first embodiment.

【図６】第１の実施の形態における続き線判定部の構成
を示すブロック図FIG. 6 is a block diagram illustrating a configuration of a continuous line determination unit according to the first embodiment.

【図７】第１の実施の形態における続き線判定方法を示
す模式図FIG. 7 is a schematic diagram showing the continuous line determination method in the first embodiment.

【図８】第１の実施の形態における背景距離算出方法を
示す模式図FIG. 8 is a schematic diagram showing a background distance calculation method according to the first embodiment.

【図９】第１の実施の形態における被訂正文字除去と訂
正文字抽出方法を示す模式図FIG. 9 is a schematic diagram showing the method of removing the corrected character and extracting the correction character in the first embodiment.

【図１０】第１の実施の形態における被訂正文字除去部
の構成を示すブロック図FIG. 10 is a block diagram showing the configuration of the corrected character removing unit in the first embodiment.

【図１１】第２の実施の形態における続き線判定部の構
成を示すブロック図FIG. 11 is a block diagram illustrating a configuration of a continuous line determination unit according to the second embodiment.

【図１２】第２の実施の形態における横方向投影閾値算
出方法を示す模式図FIG. 12 is a schematic diagram showing the method of calculating the horizontal projection threshold in the second embodiment.

【図１３】第２の実施の形態における矩形幅と横方向投
影の関係を示す模式図FIG. 13 is a schematic diagram showing a relationship between a rectangular width and a horizontal projection in the second embodiment.

【図１４】第３の実施の形態における被訂正文字除去部
の構成を示すブロック図FIG. 14 is a block diagram showing the configuration of the corrected character removing unit in the third embodiment.

【図１５】第３の実施の形態における訂正文字領域の抽
出方法を示す模式図FIG. 15 is a schematic diagram showing the method of extracting the correction character area in the third embodiment.

【図１６】第３の実施の形態における被訂正文字除去方
法を示す模式図FIG. 16 is a schematic diagram showing the method of removing the corrected character in the third embodiment.

【図１７】第３の実施の形態における訂正文字抽出のた
めの選択領域を示す模式図FIG. 17 is a schematic diagram showing the regions selected for extracting the correction character in the third embodiment.

【図１８】第３の実施の形態における訂正文字領域の判
別方法を示す模式図FIG. 18 is a schematic diagram showing the method of discriminating the correction character area in the third embodiment.

【図１９】第３の実施の形態における訂正文字可能性判
定部の構成を示すブロック図FIG. 19 is a block diagram showing the configuration of the correction character possibility determining unit in the third embodiment.

【図２０】第４の実施の形態における文字読取装置の構
成を示すブロック図FIG. 20 is a block diagram illustrating a configuration of a character reading device according to a fourth embodiment.

【図２１】第４の実施の形態における輪郭方向の符号化
例を示す模式図FIG. 21 is a schematic diagram showing an example of encoding of a contour direction in the fourth embodiment.

【図２２】第４の実施の形態における各点の輪郭方向の
算出方法を示す模式図FIG. 22 is a schematic diagram illustrating a method of calculating the contour direction of each point according to the fourth embodiment.

【図２３】第４の実施の形態における背景距離算出方法
を示す模式図FIG. 23 is a schematic diagram showing a background distance calculation method according to the fourth embodiment.

【図２４】第４の実施の形態における重心による領域分
割方法を示す模式図FIG. 24 is a schematic diagram showing an area dividing method based on the center of gravity in the fourth embodiment.

【図２５】第４の実施の形態における領域毎の輪郭方向
成分の集計例を示す模式図FIG. 25 is a schematic diagram showing an example of counting contour direction components for each region in the fourth embodiment.

【図２６】第４の実施の形態における外郭方向成分例を
示す模式図FIG. 26 is a schematic diagram showing an example of outer contour direction components in the fourth embodiment.

【図２７】第４の実施の形態における認識部の例を示す
模式図FIG. 27 is a schematic diagram illustrating an example of a recognition unit according to the fourth embodiment.

【図２８】第１の実施の形態における動作を示すフロー
図FIG. 28 is a flowchart showing the operation in the first embodiment.

【図２９】第２の実施の形態における動作を示すフロー
図FIG. 29 is a flowchart showing the operation in the second embodiment.

【図３０】第３の実施の形態における動作を示すフロー
図FIG. 30 is a flowchart showing the operation in the third embodiment.

【図３１】第４の実施の形態における動作を示すフロー
図FIG. 31 is a flowchart showing the operation in the fourth embodiment.

【図３２】従来の文字切り出し装置の構成を示すブロッ
ク図FIG. 32 is a block diagram showing the configuration of a conventional character segmenting device.

【図３３】従来手法における検出される続き線パターン
例を示す模式図FIG. 33 is a schematic diagram showing examples of continuous line patterns detected by the conventional method.

【図３４】従来手法では切り出せない被訂正文字・訂正
文字の連結パターン例を示す模式図FIG. 34 is a schematic diagram showing an example of a connected pattern of a corrected character and a correction character that the conventional method cannot segment.

【符号の説明】[Explanation of symbols]

１ 枠除去部 / frame removing unit
２ 第１の連結イメージ分割部 / first connected image dividing unit
３ 横長イメージ選択部 / horizontally long image selecting unit
４ 続き線判定部 / continuous line determination unit
５ 背景距離算出部 / background distance calculation unit
６ 訂正線判定部 / correction line determination unit
７ 高さ判定部 / height determination unit
８ イメージ除去部 / image removing unit
９ 被訂正文字除去部 / corrected character removing unit
１０ 一文字切り出し部 / single-character segmenting unit
１１ 横方向投影算出部 / horizontal projection calculation unit
１２ 第１の水平ライン範囲選択部 / first horizontal line range selection unit
２２ 投影閾値決定部 / projection threshold determination unit
２３ 第２の水平ライン範囲選択部 / second horizontal line range selection unit
３１ 水平ライン範囲大分類部 / horizontal line range major classification unit
３２ 訂正文字領域推定部 / correction character area estimation unit
３３ 第２の連結イメージ分割部 / second connected image dividing unit
４１ 水平ライン範囲判定部 / horizontal line range determination unit
４２ 第１のライン領域除去部 / first line region removing unit
４３ 第２のライン領域除去部 / second line region removing unit
４４ 第２の連結イメージ分割部 / second connected image dividing unit
４５ 訂正文字可能性判定部 / correction character possibility determining unit
５１ 小イメージ除去部 / small image removing unit
５２ 分割イメージ群選択部 / divided image group selection unit
６１ 特徴抽出部 / feature extraction unit
６２ 認識部 / recognition unit
１０１ 連結パターン抽出部 / connected pattern extraction unit
１０２ 横長パターン抽出部 / horizontally long pattern extraction unit
１０３ 続き線抽出部 / continuous line extraction unit
１０４ 変化点抽出部 / change point extraction unit
１０５ 分離点決定部 / separation point determination unit
１０６ 文字分離部 / character separation unit

Claims

【特許請求の範囲】[Claims]

【請求項１】入力パターンに記入枠が含まれている場
合は枠を除去し、枠除去後のパターンを連結するイメー
ジ毎に切り出して、これを第１の連結イメージ群とし、
各々の第１の連結イメージについて、外接矩形の幅およ
び高さを求め、求めた外接矩形の幅が大きければ水平ラ
イン毎の横方向投影を求め、各々の水平ラインに対する
横方向投影と予め定められた閾値を比較し、閾値より大
きくなるものが存在すれば、横方向投影が閾値以上の水
平ライン範囲と垂直ライン毎の上端からの背景距離を求
め、求めたイメージ上端からの背景距離の変化の少ない
連続範囲が狭ければ、イメージの高さが小さいときにはイメージを除去し、イメージの高さが大きいときには横方向投影が閾値以上
の水平ライン範囲に基づいて、訂正線がイメージの上半
分または下半分に存在するかを判定し、訂正線がイメージの上半分にあると判定された場合には
イメージから上半分の領域と水平ライン範囲の領域を除
去し、訂正線がイメージの下半分と判定された場合にはイメー
ジから下半分の領域と水平ライン範囲の領域を除去し、
上半分または下半分の領域と水平ライン範囲の領域の除
去後に残った認識対象である訂正文字領域のイメージか
ら連結するイメージ毎に切り出して、切り出したイメー
ジ群を第２の連結イメージ群とし、第２の連結イメージ群が発生した場合は第２の連結イメ
ージ群を一文字単位に切り出し、第２の連結イメージ群が発生しない場合は第１の連結イ
メージ群を一文字単位に切り出し、最終的に入力パター
ンに存在する全ての文字を一文字単位に切り出したイメ
ージを出力することを特徴とする文字切り出し方法。1. A character segmenting method comprising: removing an entry frame when the input pattern contains one; cutting out the frame-removed pattern into connected images to form a first connected image group; obtaining, for each first connected image, the width and height of its circumscribed rectangle; when the obtained width is large, obtaining a horizontal projection for each horizontal line and comparing the horizontal projection of each horizontal line with a predetermined threshold; when any projection exceeds the threshold, obtaining the horizontal line range in which the horizontal projection is at or above the threshold and the background distance from the top edge for each vertical line; when the continuous range over which the background distance from the top of the image changes little is narrow, removing the image if its height is small and, if its height is large, determining from the horizontal line range in which the horizontal projection is at or above the threshold whether the correction line lies in the upper half or the lower half of the image; when the correction line is determined to lie in the upper half, removing the upper-half region and the horizontal-line-range region from the image, and when it is determined to lie in the lower half, removing the lower-half region and the horizontal-line-range region from the image; cutting out connected images from the image of the correction character region, which is the recognition target remaining after that removal, to form a second connected image group; segmenting the second connected image group into individual characters when it has been generated, and otherwise segmenting the first connected image group into individual characters; and finally outputting images in which every character present in the input pattern has been segmented character by character.

【請求項２】入力パターンに記入枠が含まれている場
合は枠を除去し、枠除去後のパターンを連結するイメー
ジ毎に切り出して、これを第１の連結イメージ群とし、
各々の第１の連結イメージについて、外接矩形の幅およ
び高さを求め、幅が大きければ水平ライン毎の横方向投
影を求め、外接矩形幅に基づいて、外接矩形幅に対する
割合で表す横方向投影の閾値を算出し、各々の水平ライ
ンの横方向投影と算出された閾値とを比較して、横方向
投影が閾値以上の水平ラインが存在する場合に、横方向
投影が閾値以上の水平ライン範囲と垂直ライン毎の上端
からの背景距離を求め、求めたイメージ上端からの背景
距離の変化の少ない連続範囲が狭ければ、イメージの高さが小さいときにはイメージを除去し、イメージの高さが大きいときには閾値以上の水平ライン
の中心が、イメージの上端または下端からイメージの高
さの半分を超えない予め定められた範囲内に含まれるか
否かを判定し、水平ラインの中心が予め定められた範囲に含まれると判
定された場合において、水平ラインの中心がイメージの上半分に存在する場合
は、連結イメージ領域のイメージ上端から水平ライン範
囲下端までの領域を除去して、除去後の領域から連結す
るイメージ毎に切り出して出力し、これを第３の連結イ
メージ群とし、水平ラインの中心がイメージの下半分に存在する場合
は、連結イメージ領域の水平ライン範囲上端からイメー
ジ下端までの領域を除去して、除去後の領域から連結す
るイメージ毎に切り出して出力し、これを第４の連結イ
メージ群とし、水平ライン範囲の中心が予め定められた範囲に含まれな
いと判定された場合は、水平ライン範囲の上および下の
二つの領域を選択して連結するイメージ毎に切り出し、
二個の領域の分割イメージ群を比較して、訂正文字を分
割している可能性の高い方の分割イメージ群を選択し、
選択された方を第５の連結イメージ群とし、第３、第４、第５いずれかの連結イメージ群が存在する
場合は、それを一文字単位に切り出して出力し、どれの
存在しない場合は第１の連結イメージを一文字単位に切
り出して出力することを特徴とする文字切り出し方法。2. A character segmenting method comprising: removing an entry frame when the input pattern contains one; cutting out the frame-removed pattern into connected images to form a first connected image group; obtaining, for each first connected image, the width and height of its circumscribed rectangle; when the width is large, obtaining a horizontal projection for each horizontal line, calculating from the circumscribed rectangle width a horizontal projection threshold expressed as a ratio to that width, and comparing the horizontal projection of each horizontal line with the calculated threshold; when a horizontal line whose horizontal projection is at or above the threshold exists, obtaining the horizontal line range in which the horizontal projection is at or above the threshold and the background distance from the top edge for each vertical line; when the continuous range over which the background distance from the top of the image changes little is narrow, removing the image if its height is small and, if its height is large, determining whether the center of the horizontal lines at or above the threshold falls within a predetermined range, not exceeding half the image height, from the top or bottom edge of the image; when the center is determined to fall within the predetermined range and lies in the upper half of the image, removing from the connected image region the region from the top edge of the image to the lower end of the horizontal line range, and cutting out and outputting connected images from the remaining region as a third connected image group; when the center lies in the lower half of the image, removing from the connected image region the region from the upper end of the horizontal line range to the bottom edge of the image, and cutting out and outputting connected images from the remaining region as a fourth connected image group; when the center of the horizontal line range is determined not to fall within the predetermined range, selecting the two regions above and below the horizontal line range, cutting out connected images from each, comparing the divided image groups of the two regions, selecting the divided image group more likely to have segmented the correction character, and taking the selected group as a fifth connected image group; and, when any of the third, fourth, and fifth connected image groups exists, segmenting it into individual characters and outputting the result, and when none exists, segmenting the first connected images into individual characters and outputting the result.

【請求項３】入力パターンに記入枠が含まれている場
合は枠を除去し、枠除去後のパターンを連結するイメー
ジ毎に切り出して、これを第１の連結イメージ群とし、
各々の第１の連結イメージについて、外接矩形の幅およ
び高さを求め、幅が大きければ水平ライン毎の横方向投
影を求め、外接矩形幅に基づいて、外接矩形幅に対する
割合で表す横方向投影の閾値を決定し、各々の水平ライ
ンの横方向投影と投影閾値決定部で決定した閾値とを比
較して、横方向投影が閾値以上の水平ラインが存在する
場合に、横方向投影が閾値以上の水平ライン範囲と垂直
ライン毎の上端および下端からの背景距離を求め、求め
たイメージ上端からの背景距離の変化の少ない連続範囲
が狭ければ、イメージの高さが小さいときにはイメージを除去し、イメージの高さが大きいときには閾値以上の水平ライン
の中心が、イメージの上端または下端からイメージの高
さの半分を超えない予め定められた範囲内に含まれるか
否かを判定し、水平ラインの中心が予め定められた範囲に含まれると判
定された場合において、水平ラインの中心がイメージの上半分に存在する場合
は、連結イメージ領域のイメージ上端から水平ライン範
囲下端までの領域を除去し、除去後の領域から連結する
イメージ毎に切り出し、これを第６の連結イメージ群と
し、水平ラインの中心がイメージの下半分に存在する場合
は、連結イメージ領域の水平ライン範囲上端からイメー
ジ下端までの領域を除去し、除去後の領域から連結する
イメージ毎に切り出し、これを第７の連結イメージ群と
し、水平ライン範囲の中心が予め定められた範囲に含まれな
いと判定された場合は、水平ライン範囲の上および下の
二つの領域を選択して連結するイメージ毎に切り出し、
二つの領域の分割イメージ群に対して、各々の分割イメ
ージ群中で大きさの小さい分割イメージを除去し、除去
後の分割イメージ群の存在範囲を比較し、存在範囲が広
い方を選択し、選択された方を第８の連結イメージ群と
し、第６、第７、第８いずれかの連結イメージ群が存在する
場合は、それを一文字単位に切り出して出力し、どれの
存在しない場合は第１の連結イメージを一文字単位に切
り出して出力することを特徴とする文字切り出し方法。3. A character segmenting method comprising: removing an entry frame when the input pattern contains one; cutting out the frame-removed pattern into connected images to form a first connected image group; obtaining, for each first connected image, the width and height of its circumscribed rectangle; when the width is large, obtaining a horizontal projection for each horizontal line, determining from the circumscribed rectangle width a horizontal projection threshold expressed as a ratio to that width, and comparing the horizontal projection of each horizontal line with the threshold determined by the projection threshold determination unit; when a horizontal line whose horizontal projection is at or above the threshold exists, obtaining the horizontal line range in which the horizontal projection is at or above the threshold and the background distances from the top edge and the bottom edge for each vertical line; when the continuous range over which the background distance from the top of the image changes little is narrow, removing the image if its height is small and, if its height is large, determining whether the center of the horizontal lines at or above the threshold falls within a predetermined range, not exceeding half the image height, from the top or bottom edge of the image; when the center is determined to fall within the predetermined range and lies in the upper half of the image, removing from the connected image region the region from the top edge of the image to the lower end of the horizontal line range, and cutting out connected images from the remaining region as a sixth connected image group; when the center lies in the lower half of the image, removing from the connected image region the region from the upper end of the horizontal line range to the bottom edge of the image, and cutting out connected images from the remaining region as a seventh connected image group; when the center of the horizontal line range is determined not to fall within the predetermined range, selecting the two regions above and below the horizontal line range, cutting out connected images from each, removing the small divided images from the divided image group of each region, comparing the ranges occupied by the divided image groups after the removal, selecting the group that occupies the wider range, and taking the selected group as an eighth connected image group; and, when any of the sixth, seventh, and eighth connected image groups exists, segmenting it into individual characters and outputting the result, and when none exists, segmenting the first connected images into individual characters and outputting the result.

【請求項４】請求項１から３のいずれかに記載の文字
切り出し方法で入力パターンを一文字ずつ切り出し、切
り出された各々の文字パターンに対し、文字の種類を分
離するのに必要な特徴を抽出し、抽出された特徴に基づ
いて、どの種類に属するかを認識することを特徴とする
文字読取方法。4. A character reading method comprising: segmenting an input pattern character by character with the character segmenting method according to any one of claims 1 to 3; extracting, from each segmented character pattern, the features needed to distinguish character types; and recognizing, on the basis of the extracted features, which type the pattern belongs to.

【請求項５】請求項１から３のいずれかに記載の文字
切り出し方法をソフトウェアで表現したプログラムを納
めた記録媒体。5. A recording medium storing a program that implements in software the character segmenting method according to any one of claims 1 to 3.

【請求項６】請求項４の文字読取方法をソフトウェア
で表現したプログラムを納めた記録媒体。6. A recording medium storing a program that implements in software the character reading method according to claim 4.

【請求項７】入力パターンに記入枠が含まれている場
合に、記入枠を除去する枠除去部と、前記枠除去部から出力されたパターンを連結するイメー
ジ毎に切り出し、各々のイメージの外接矩形の幅および
高さを求める第１の連結イメージ分割部と、前記第１の連結イメージ分割部から出力された連結イメ
ージ群の中から幅の大きい連結イメージだけを選択する
横長イメージ選択部と、前記横長イメージ選択部で選択されたイメージについて
水平ライン毎の横方向投影を求め、各々の水平ラインに
対する横方向投影と予め定められた閾値を比較し、閾値
より大きければ続き線ありと判定し、横方向投影が閾値
以上の水平ライン範囲を出力する続き線判定部と、前記続き線判定部で続き線ありと判定されたイメージに
ついて、イメージの垂直ライン毎の上端からの背景距離
を求める背景距離算出部と、前記背景距離算出部から出力されたイメージの背景距離
に基づいて、イメージ上端からの背景距離の変化の小さ
い範囲が長く連続しないイメージを訂正線ありと判定す
る訂正線判定部と、前記訂正線判定部で訂正線ありと判定されたイメージの
高さの大小を判定する高さ判定部と、前記高さ判定部で高さが小さいと判定されたイメージを
除去するイメージ除去部と、前記高さ判定部で高さが大きいと判定されたイメージに
ついて、前記続き線判定部から出力された水平ライン範
囲に基づいて、訂正線で抹消された被訂正領域を除去
し、除去後に残ったイメージ領域を連結イメージ毎に切
り出す被訂正文字除去部と、前記被訂正文字除去部から被訂正領域除去後の連結イメ
ージが出力された場合は、前記被訂正文字除去部から出
力された連結イメージを一文字単位に切り出し、前記被
訂正文字除去部から連結イメージが出力されていない場
合は、第１の連結イメージ分割部から出力された連結イ
メージを一文字単位に切り出す一文字切り出し部を備え
たことを特徴とする文字切り出し装置。7. A character segmenting device comprising: a frame removing unit that removes an entry frame when the input pattern contains one; a first connected image dividing unit that cuts out the pattern output from the frame removing unit into connected images and obtains the width and height of the circumscribed rectangle of each image; a horizontally long image selecting unit that selects only the wide connected images from the connected image group output from the first connected image dividing unit; a continuous line determination unit that obtains a horizontal projection for each horizontal line of an image selected by the horizontally long image selecting unit, compares the horizontal projection of each horizontal line with a predetermined threshold, determines that a continuous line is present when the projection exceeds the threshold, and outputs the horizontal line range in which the horizontal projection is at or above the threshold; a background distance calculation unit that obtains, for an image determined by the continuous line determination unit to contain a continuous line, the background distance from the top edge for each vertical line of the image; a correction line determination unit that, on the basis of the background distances output from the background distance calculation unit, determines that an image contains a correction line when the range over which the background distance from the top of the image changes little does not continue for long; a height determination unit that determines whether the height of an image determined by the correction line determination unit to contain a correction line is large or small; an image removing unit that removes an image determined by the height determination unit to have a small height; a corrected character removing unit that, for an image determined by the height determination unit to have a large height, removes the corrected region struck out by the correction line on the basis of the horizontal line range output from the continuous line determination unit, and cuts out the image region remaining after the removal into connected images; and a single-character segmenting unit that segments the connected images output from the corrected character removing unit into individual characters when the corrected character removing unit has output connected images after removal of the corrected region, and otherwise segments the connected images output from the first connected image dividing unit into individual characters.

【請求項８】前記続き線判定部において、前記横長イメージ選択部で選択された連結イメージにつ
いて各々の水平ラインの横方向投影を求める横方向投影
算出部と、前記連結イメージ分割部から出力された外接矩形幅に基
づいて、外接矩形幅に対する割合で表す横方向投影の閾
値を決定する投影閾値決定部と、前記横方向投影算出部から出力された各々の水平ライン
の横方向投影と投影閾値決定部で決定した閾値とを比較
して、横方向投影が閾値以上の水平ラインが存在する場
合に、閾値以上の水平ラインの範囲を出力する水平ライ
ン範囲選択部を備えたことを特徴とする請求項７記載の
文字切り出し装置。8. The character segmenting device according to claim 7, wherein the continuous line determination unit comprises: a horizontal projection calculation unit that obtains the horizontal projection of each horizontal line of the connected image selected by the horizontally long image selecting unit; a projection threshold determination unit that determines, on the basis of the circumscribed rectangle width output from the connected image dividing unit, a horizontal projection threshold expressed as a ratio to the circumscribed rectangle width; and a horizontal line range selection unit that compares the horizontal projection of each horizontal line output from the horizontal projection calculation unit with the threshold determined by the projection threshold determination unit and, when a horizontal line whose horizontal projection is at or above the threshold exists, outputs the range of the horizontal lines at or above the threshold.

【請求項９】前記被訂正文字除去部において、前記高さ選択部で高さが大きいと判定されたイメージに
ついて、前記続き線判定部から出力された水平ライン範
囲に基づいて、訂正線がイメージの上半分または下半分
のどちらに存在するかを判定する水平ライン範囲大分類
部と、上半分と判定された場合にはイメージから上半分
の領域と水平ライン範囲の領域を除去し、下半分と判定
された場合にはイメージから下半分の領域と水平ライン
範囲の領域を除去する訂正文字領域推定部と、除去後に
残ったイメージ領域を連結するイメージ毎に切り出す第
２の連結イメージ分割部を備えたことを特徴とする請求
項７記載の文字切り出し装置。9. The character segmenting device according to claim 7, wherein the corrected character removing unit comprises: a horizontal line range major classification unit that, for an image determined by the height selection unit to have a large height, determines on the basis of the horizontal line range output from the continuous line determination unit whether the correction line lies in the upper half or the lower half of the image; a correction character area estimation unit that removes the upper-half region and the horizontal-line-range region from the image when the correction line is determined to lie in the upper half, and removes the lower-half region and the horizontal-line-range region from the image when it is determined to lie in the lower half; and a second connected image dividing unit that cuts out the image region remaining after the removal into connected images.

【請求項１０】前記被訂正文字除去部において、前記高さ判定部で高さが大きいと判定されたイメージに
ついて、前記続き線判定部から出力された閾値以上の水
平ライン範囲の中心が、イメージの上端または下端から
イメージの高さの半分を超えない予め定められた範囲内
に含まれるか否かを判定する水平ライン範囲判定部と、前記水平ライン範囲判定部で水平ライン範囲の中心が予
め定められた範囲に含まれると判定された場合におい
て、水平ライン範囲の中心がイメージ上端から予め定め
られた範囲に存在する場合は、前記第１の連結イメージ
分割部から出力された連結イメージ領域から、イメージ
上端から水平ライン範囲下端までの領域を除去したもの
を訂正文字領域として出力し、水平ラインの中心がイメ
ージ下端から予め定められた範囲に存在する場合は、前
記第１の連結イメージ分割部から出力された連結イメー
ジ領域から、水平ライン範囲上端からイメージ下端まで
の領域を除去したものを訂正文字領域として出力する第
１のライン領域除去部と、前記水平ライン範囲判定部で水平ライン範囲の中心が予
め定められた範囲に含まれないと判定された場合は、水
平ライン範囲の上および下の二つの領域を選択して訂正
文字候補領域として出力する第２のライン領域除去部
と、前記第２のライン領域除去部から二個の訂正文字候補領
域が出力された場合は、二つの訂正文字候補領域から連
結するイメージ毎に切り出し、訂正文字候補領域が出力
されていない場合は、前記訂正文字領域推定部から出力
された訂正文字領域から連結するイメージ毎に切り出
し、切り出し後のイメージ群を出力する第２の連結イメ
ージ分割部と、前記第２の連結イメージ分割部から切り出し後のイメー
ジ群が出力された場合は、前記第２の連結イメージ分割
部から出力された二個の訂正文字候補領域の切り出しイ
メージ群を比較して、訂正文字を分割している可能性の
高い方の切り出しイメージ群を選択して出力する訂正文
字可能性判定部を備えたことを特徴とする請求項７記載
の文字切り出し装置。10. The character segmenting device according to claim 7, wherein the corrected character removing unit comprises: a horizontal line range determination unit that, for an image determined by the height determination unit to have a large height, determines whether the center of the horizontal line range at or above the threshold output from the continuous line determination unit falls within a predetermined range, not exceeding half the image height, from the top or bottom edge of the image; a first line region removing unit that, when the horizontal line range determination unit determines that the center of the horizontal line range falls within the predetermined range, outputs as a correction character region the connected image region output from the first connected image dividing unit with the region from the top edge of the image to the lower end of the horizontal line range removed if the center lies within the predetermined range from the top edge of the image, and with the region from the upper end of the horizontal line range to the bottom edge of the image removed if the center lies within the predetermined range from the bottom edge of the image; a second line region removing unit that, when the horizontal line range determination unit determines that the center of the horizontal line range does not fall within the predetermined range, selects the two regions above and below the horizontal line range and outputs them as correction character candidate regions; a second connected image dividing unit that cuts out connected images from the two correction character candidate regions when two such regions have been output from the second line region removing unit, cuts out connected images from the correction character region output from the correction character area estimation unit when no candidate region has been output, and outputs the resulting image group; and a correction character possibility determining unit that, when the second connected image dividing unit has output cut-out image groups, compares the cut-out image groups of the two correction character candidate regions output from the second connected image dividing unit and selects and outputs the group more likely to have segmented the correction character.

【請求項１１】前記訂正文字可能性判定部において、第２の連結イメージ分割部から出力された二つの訂正文
字候補領域の切り出しイメージ群に対して、各々の切り
出しイメージ群中で大きさの小さい切り出しイメージを
除去する小イメージ除去部と、前記小イメージ除去部から出力された二つの訂正文字候
補領域の小さい切り出しイメージ除去後の切り出しイメ
ージ群の入力パターン中における存在範囲を比較し、存
在範囲が広い方の切り出しイメージ群を選択する分割イ
メージ群選択部を備えたことを特徴とする請求項１０記
載の文字切り出し装置。11. The character segmenting device according to claim 10, wherein the correction character possibility determining unit comprises: a small image removing unit that removes the small cut-out images from each of the cut-out image groups of the two correction character candidate regions output from the second connected image dividing unit; and a divided image group selection unit that compares the ranges occupied within the input pattern by the cut-out image groups of the two correction character candidate regions output from the small image removing unit after the small cut-out images have been removed, and selects the cut-out image group that occupies the wider range.

【請求項１２】文字が含まれるパターンを入力するパ
ターン入力装置と、入力パターン画像から一文字ずつ切
り出す請求項７から１１のいずれかに記載の文字切り出
し装置と、前記の文字切り出し装置で一文字単位に切り
出された各々の文字パターンに対し、文字の種類を分離
するのに必要な特徴を抽出する特徴抽出部と、特徴抽出
部で抽出された特徴に基づいて、どの種類に属するかを
認識する認識部を備えたことを特徴とする文字読取装
置。12. A character reading device comprising: a pattern input device that inputs a pattern containing characters; the character segmenting device according to any one of claims 7 to 11, which cuts out characters one by one from the input pattern image; a feature extraction unit that extracts, from each character pattern cut out character by character by the character segmenting device, the features needed to distinguish character types; and a recognition unit that recognizes, on the basis of the features extracted by the feature extraction unit, which type the pattern belongs to.