JPH10134145A

JPH10134145A - Character segmenting method, character recognition device using the same, and computer-readable storage medium where program implementing the same character segmenting method is stored

Info

Publication number: JPH10134145A
Application number: JP8304228A
Authority: JP
Inventors: Toshio Miyazawa; 利夫宮澤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1996-10-31
Filing date: 1996-10-31
Publication date: 1998-05-22

Abstract

PROBLEM TO BE SOLVED: To segment characters accurately even when a document contains touching characters or has extremely narrow character spacing. SOLUTION: A standard rectangle width is obtained from the generated rectangles, and the width of a rectangle of interest 41a among them is compared with the standard rectangle width. Further, a rectangle 41b adjacent to the rectangle of interest 41a is compared with the standard rectangle width. Comparing two kinds of rectangle width with the standard rectangle width in this way allows an error in a character segmentation candidate position 42 to be judged accurately. When the rectangle of interest 41a is larger than the standard size, a specific rectangle 43 is further generated in the rectangle of interest 41a. The rectangle 43 is then scanned in the row direction to obtain the maximum black run length and the black pixel ratio in the rectangle. When the black run length and the black pixel ratio are larger than specific values, a new character segmentation candidate position 42d is generated in addition to the character segmentation candidate position 42b.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character segmentation method, a character recognition device using the method, and a computer-readable storage medium storing a program that executes the method. More specifically, it relates to a character segmentation method that can segment characters accurately even when a segmentation position is wrong because characters touch or the character spacing is narrow, together with a character recognition device using the method and a computer-readable storage medium storing a program that executes it.

[0002]

[Prior Art] Offline character recognition devices, which read text that has already been written, are now widely used for high-speed reading of large volumes of general documents, account books, slips, and the like. Such documents contain handwritten as well as printed characters. In handwriting, adjacent characters may touch each other (hereinafter, touching characters), and the character spacing may become very narrow (hereinafter, closely spaced characters).

[0003] As a technique for recognizing such touching characters, Japanese Patent Laid-Open No. 5-128307 describes extracting the character components from text in which touching characters are mixed, calculating an average character width and character pitch from these components, and segmenting on the basis of that average width and pitch.

[0004] Japanese Patent Laid-Open No. 6-215183 describes another technique: when a character pattern exists whose width differs markedly from the standard character width, a histogram is computed over that pattern together with the patterns before and after it, and the touching characters are divided at a position where the histogram is at or below a predetermined threshold. There is also a method, as in Japanese Patent Laid-Open No. 6-16720, that recognizes the character block as it is without separating the touching characters.

[0005]

[Problem to Be Solved by the Invention] However, the method that calculates an average character width and segments touching characters on that basis (Japanese Patent Laid-Open No. 5-128307 above) may choose the wrong segmentation position. The method that segments characters at the valleys of a histogram (Japanese Patent Laid-Open No. 6-215183) may fail to segment touching characters for which no valley appears. As a result, characters could not be segmented accurately. The same problem arose when the character spacing was very narrow.

[0006] The present invention has been made in view of the above. Its object is to provide a character segmentation method that can segment characters accurately even in text containing touching characters or text with very narrow character spacing, a character recognition device using the method, and a computer-readable storage medium storing a program that executes the method.

[0007]

[Means for Solving the Problem] To achieve the above object, the character segmentation method according to claim 1 segments each character constituting a text as a rectangle and includes: a step of acquiring a first rectangle width, which is the width of one rectangle of interest among the segmented rectangles, and a second rectangle width, which is the sum of the width of the rectangle of interest and the width of a rectangle adjacent to it; a step of acquiring a standard rectangle width, which is the width of a standard rectangle obtained from the segmented rectangles; a step of comparing the first rectangle width with the standard rectangle width and further comparing the second rectangle width with the standard rectangle width; and a step of judging that the character segmentation candidate position is wrong when the first rectangle width is far from the standard rectangle width and the second rectangle width is close to twice the standard rectangle width.

[0008] That is, the comparison against the standard rectangle width considers not only the width of the rectangle of interest but also the width of the rectangle adjacent to it. This makes it possible to judge errors in the character segmentation candidate position accurately. As a result, segmentation positions for touching and closely spaced characters are less likely to be wrong, and characters can be read accurately.

[0009] The character segmentation method according to claim 2 further includes: a step of generating, from the second rectangle width, a separation candidate rectangle for judging whether the character segmentation candidate position is wrong; a step of judging whether the candidate position is wrong on the basis of black run information, such as black run length and black pixel ratio, within the separation candidate rectangle; and a step of forcibly separating the separation candidate rectangle when the candidate position is judged to be wrong.

[0010] First, a separation candidate rectangle is generated for judging errors in the character segmentation candidate position, and the state of the black runs within it is examined. For example, if the black pixel ratio within the separation candidate rectangle is high, the characters belonging to the rectangles on either side of it may be touching. Segmenting the characters as they stand would then put the segmentation position in the wrong place. In such a case, therefore, the separation candidate rectangle is forcibly separated, which prevents errors in the segmentation position.

[0011] The character segmentation method according to claim 3 further includes a step of not performing forced separation when the separation candidate rectangle is smaller than a predetermined width.

[0012] If the separation candidate rectangle is too small, forcibly separating it causes problems such as increased processing time. Forced separation is therefore skipped when the separation candidate rectangle is small, which avoids these problems.

[0013] The character recognition device according to claim 4 comprises: image input means for inputting, as images, printed and handwritten characters from general documents, slips, account books, and the like; rectangle extraction means for extracting rectangles from the image input by the image input means, judging errors in the character segmentation candidate positions determined by the extracted rectangles by carrying out the method according to any one of claims 1 to 3, and re-extracting the rectangles correctly; character segmentation means for segmenting each rectangle extracted by the rectangle extraction means character by character; and character recognition means for recognizing the segmented characters.

[0014] Using such a character recognition device reduces errors in character segmentation candidate positions and allows characters to be read accurately. Because only the rectangle extraction means needs to incorporate the above technique and no additional hardware is required, the device can be built at low cost.

[0015] The computer-readable storage medium according to claim 5 stores a program that executes the method according to any one of claims 1 to 3.

[0016] Storing the program in a computer-readable storage medium in this way allows the program for the above method to be protected appropriately.

[0017]

[Embodiments of the Invention] The present invention is described in detail below with reference to the drawings. The invention is not limited to this embodiment.

[0018] FIG. 1 is a configuration diagram of the character recognition device. The character recognition device 100 comprises an OCR unit 1 that can input printed and handwritten characters from general documents, slips, account books, and the like as images; a CPU 2 that performs processing such as segmenting characters from the image input from the OCR unit 1; and a display device 3 that displays, for example, the characters being segmented.

[0019] FIG. 2 is a functional block diagram of the character recognition device shown in FIG. 1. The character recognition device 100 comprises a rectangle extraction unit 21 that extracts rectangles enclosing connected patterns of black runs; a character segmentation unit 22 that uses the rectangle information from the rectangle extraction unit 21 to merge small rectangles and perform forced separation, and thereby determines the character segmentation candidate positions; a character recognition unit 23 that recognizes characters from the candidate positions and outputs the recognition result; and a dictionary 24 in which character patterns are recorded.

[0020] FIG. 3 is a flowchart showing the character segmentation procedure of the character recognition device 100. The following processing is performed according to a processing procedure stored in a storage medium built into the CPU 2. In step S1, an image of handwritten text is input through the OCR unit 1. In step S2, rectangles are extracted from the input image by tracing connected patterns of black pixels.
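Step S2 can be illustrated with a minimal sketch. The text does not say how the connected patterns are traced, so the breadth-first search, the 8-connectivity, the binary image layout (a list of rows, 1 for black), and the name `extract_rectangles` are all assumptions made for illustration.

```python
from collections import deque

def extract_rectangles(image):
    """Return bounding boxes (left, top, right, bottom), inclusive, of the
    8-connected black-pixel components in `image` (list of rows, 1 = black).
    The connectivity and the traversal order are assumptions, not from the source."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new component and grow its bounding box while tracing it.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    # Sort left to right so that gaps between neighbours are easy to read off.
    return sorted(boxes)
```

Sorting the boxes from left to right means the gaps between consecutive boxes correspond to the segmentation candidate positions for horizontally written text.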

[0021] FIG. 4 shows the extracted rectangles 41. In this figure, the gaps between rectangles are the character segmentation candidate positions 42. For example, when the handwritten text "天の川" ("Milky Way") appears in a line, FIG. 4 shows "の" together with the left stroke of "川" extracted as one rectangle 41a, and the remaining strokes of "川" extracted as one rectangle 41b. Rectangle 41a is large because "の" touches the left stroke of "川". The character segmentation candidate positions in this case are 42a, 42b, and 42c. Next, in step S3, rectangle information is extracted: specifically, the standard character width and character spacing within a line, and the character height and character width of each rectangle.
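The rectangle information of step S3 might be gathered as in the sketch below. The text does not state how the standard character width is derived, so taking the median of the widths is purely an assumption, as are treating the minimum gap in the line as the minimum character spacing and the name `rectangle_info`.

```python
from statistics import median

def rectangle_info(boxes):
    """Summarise one text line.

    `boxes` is a list of (left, right) inclusive x-extents sorted left to right.
    The median as "standard width" and the smallest gap as "minimum spacing"
    are illustrative assumptions; the source does not define either.
    """
    widths = [right - left + 1 for left, right in boxes]
    gaps = [boxes[i + 1][0] - boxes[i][1] - 1 for i in range(len(boxes) - 1)]
    return {
        "widths": widths,
        "gaps": gaps,
        "standard_width": median(widths) if widths else 0,
        "min_gap": min(gaps) if gaps else 0,
    }
```

A median is used here only because it is less sensitive than a mean to one oversized rectangle such as 41a; any other robust estimate would serve the sketch equally well.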

[0022] Next, it is decided whether to perform forced separation. Forced separation is performed when the following Condition 1 and Condition 2 are both satisfied. For Condition 1, let Gr be the left rectangle gap, Gl the right rectangle gap, W1 the width of rectangle 1, W2 the combined width of rectangle 1 and rectangle 2, Ws the standard character width, and Gm the minimum character spacing. Condition 1 is then:

(1) Gr < 0.5 Ws
(2) Gl < 0.5 Ws
(3) 0.3 Ws < W1 < 0.9 Ws, or 1.1 Ws < W1 < 1.7 Ws
(4) 0.75 Ws ≤ (W2 − Gm) / 2 ≤ 1.25 Ws
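Condition 1 translates almost directly into code. The sketch below assumes that all four sub-conditions must hold together, which is how the list reads but is not stated explicitly; the function name `satisfies_condition1` is hypothetical, and the parameter names follow the symbols in the text.

```python
def satisfies_condition1(gr, gl, w1, w2, ws, gm):
    """Check Condition 1 with the symbols of paragraph [0022].

    gr, gl: left and right rectangle gaps; w1: width of rectangle 1;
    w2: combined width of rectangles 1 and 2; ws: standard character width;
    gm: minimum character spacing. All values share one unit.
    Combining (1) to (4) with AND is an assumption.
    """
    gaps_are_narrow = gr < 0.5 * ws and gl < 0.5 * ws                    # (1), (2)
    w1_off_standard = (0.3 * ws < w1 < 0.9 * ws) or (1.1 * ws < w1 < 1.7 * ws)  # (3)
    pair_fits_two_chars = 0.75 * ws <= (w2 - gm) / 2 <= 1.25 * ws        # (4)
    return gaps_are_narrow and w1_off_standard and pair_fits_two_chars
```

Sub-condition (4) expresses the idea from claim 1 that the combined width is close to twice the standard width: halving W2 − Gm should give roughly one standard character.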

[0023] When Condition 1 is met, the rectangle can be judged to be "far from the standard character size". The characters in that rectangle are then likely to be touching or closely spaced characters, so forced separation is needed. In the example of FIG. 4, rectangle 41a is far from the standard character size because "の" and the left stroke of "川" form touching characters.
Since the character related to the rectangle is likely to be a contact character or a close character, it is necessary to forcibly separate the character. Referring to the example of FIG. 4, the rectangle 41a is far away from the standard character size. This is because “no” and the left line segment of “river” are contact characters.

[0024] Accordingly, in step S4, if Condition 1 is satisfied the process proceeds to step S6, and forced separation is performed provided that Condition 2 below is also satisfied. If Condition 1 is not satisfied, the process proceeds to step S5 and forced separation is not performed, because the characters in rectangle 41a are then unlikely to be touching characters or the like.

[0025] When Condition 1 is satisfied, black pixel information is extracted in step S6. First, a predetermined rectangle 43 of width Wn (W2 − Gm), as shown in FIG. 5, is generated from the combined width W2 of rectangle 1 and rectangle 2 and the minimum character spacing Gm. This rectangle 43 is generated starting from the right edge of rectangle 41a. The generated rectangle 43 is then scanned in the row direction to obtain the maximum black run length and the black pixel ratio within the rectangle.
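The scan of rectangle 43 in step S6 could look like the following sketch. It assumes a binary image stored as a list of rows (1 for black), reads "row direction" as a horizontal scan along each pixel row, and takes the rectangle to span the full image height; the name `black_run_info` and the `left` and `right` column bounds are hypothetical.

```python
def black_run_info(image, left, right):
    """Scan columns left..right (inclusive) of every row of `image`.

    Returns (maximum horizontal black run length, black pixel ratio) for
    that window. The image layout and the horizontal reading of
    "row direction" are assumptions, not taken from the source.
    """
    max_run = 0
    black = 0
    total = 0
    for row in image:
        run = 0
        for pixel in row[left:right + 1]:
            total += 1
            if pixel == 1:
                black += 1
                run += 1
                max_run = max(max_run, run)
            else:
                run = 0  # a white pixel ends the current black run
    ratio = black / total if total else 0.0
    return max_run, ratio
```

Both values are returned together because the text compares both against preset thresholds before deciding on forced separation.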

[0026] Next, in step S7, it is judged whether Condition 2 is satisfied. Condition 2 is either "the width Wn of rectangle 43 satisfies 11 mm < Wn" or "the width Wn of rectangle 43 satisfies 4 mm < Wn ≤ 11 mm and rectangle 43 contains continuous black pixels". Forced separation is performed when the black run length and the black pixel ratio are larger than predetermined values. These predetermined values are set in advance on the basis of the normal case.

[0027] For example, as shown in FIG. 5, when rectangle 43 contains part of "の" and the left stroke of "川", a continuous character pattern exists within rectangle 43, so the black run length and the black pixel ratio are larger than normal. Segmenting at the original character segmentation candidate position 42b in this case would lead to incorrect character recognition.

[0028] Therefore, when Condition 2 is satisfied, the process proceeds to step S8 and forced separation is performed. When Condition 2 is not satisfied, for example when the left stroke of "川" is not inside rectangle 43, the black run length and the black pixel ratio within rectangle 43 are no different from normal, so the process proceeds to step S5 and forced separation is not performed.

[0029] Forced separation is also not performed when rectangle 43 is too small (4 mm or less), because passing such a small segmentation candidate to the character recognition unit 23 would require a large amount of processing time or adversely affect the recognition result.
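The decision made in steps S4 to S8 can be sketched as below, using the 4 mm and 11 mm bounds from the text. How "continuous black pixels" is derived from the black run length and black pixel ratio is not specified (the thresholds are only described as preset), so it is passed in as a boolean; the names `satisfies_condition2` and `next_step` are hypothetical.

```python
def satisfies_condition2(wn_mm, has_continuous_black):
    """Condition 2 on rectangle 43, whose width `wn_mm` is in millimetres.

    `has_continuous_black` stands for the unspecified threshold test on
    black run length and black pixel ratio (an assumption of this sketch).
    """
    if wn_mm > 11:
        return True
    if 4 < wn_mm <= 11:
        return has_continuous_black
    return False  # 4 mm or less: too small, never force a separation

def next_step(condition1, wn_mm, has_continuous_black):
    """Return "S8" (forced separation) or "S5" (no forced separation)."""
    if condition1 and satisfies_condition2(wn_mm, has_continuous_black):
        return "S8"
    return "S5"
```

For instance, `next_step(True, 8, True)` returns `"S8"`, while a 4 mm rectangle yields `"S5"` regardless of its black pixel content.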

[0030] In step S8, "の" and "川" are forcibly separated. That is, a new character segmentation candidate position 42d is generated in addition to the previous character segmentation candidate position 42b. The description above concerns horizontally written characters, but forced separation is performed in the same way for vertically written characters. It is likewise performed, by the same processing, for closely spaced characters as well as touching characters.

[0031]

[Effects of the Invention] As described above, according to the character segmentation method of the present invention (claim 1), the comparison against the standard rectangle width considers not only the width of the rectangle of interest but also the width of the rectangle adjacent to it, so errors in the character segmentation candidate position can be judged accurately. As a result, segmentation positions for touching and closely spaced characters are less likely to be wrong, and characters can be read accurately.

[0032] According to the character segmentation method of the next invention (claim 2), a separation candidate rectangle is further generated from the second rectangle width, errors in the character segmentation candidate position are judged on the basis of black run information such as black run length and black pixel ratio within the separation candidate rectangle, and the separation candidate rectangle is forcibly separated when the position is wrong. Errors in the character segmentation position can therefore be prevented.

[0033] According to the character segmentation method of the next invention (claim 3), forced separation is additionally not performed when the separation candidate rectangle is smaller than a predetermined width. Problems such as increased processing time can therefore be prevented.

[0034] According to the character recognition device of the next invention (claim 4), errors in character segmentation candidate positions are reduced, so characters can be read accurately. Moreover, because this is achieved through processing alone, the device can be built at low cost.

[0035] According to the computer-readable storage medium of the next invention (claim 5), the program for the above method can be protected appropriately.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of the character recognition device according to the present invention.

FIG. 2 is a functional block diagram showing the detailed configuration of the character recognition device shown in FIG. 1.

FIG. 3 is a flowchart showing the procedure of the character segmentation method of the present invention.

FIG. 4 is an explanatory diagram showing the extraction of rectangles.

FIG. 5 is an explanatory diagram showing the forced separation of a rectangle.

[Explanation of symbols]

100 character recognition device
1 OCR unit
2 CPU
3 display device
21 rectangle extraction unit
22 character segmentation unit
23 character recognition unit
24 dictionary

Claims

[Claims]

[Claim 1] A character segmentation method for segmenting each character constituting a text as a rectangle, comprising: a step of acquiring a first rectangle width, which is the width of one rectangle of interest among the segmented rectangles, and a second rectangle width, which is the sum of the width of the rectangle of interest and the width of a rectangle adjacent to the rectangle of interest; a step of acquiring a standard rectangle width, which is the width of a standard rectangle obtained from the segmented rectangles; a step of comparing the first rectangle width with the standard rectangle width and further comparing the second rectangle width with the standard rectangle width; and a step of judging that the character segmentation candidate position is wrong when the first rectangle width is far from the standard rectangle width and the second rectangle width is close to twice the standard rectangle width.

[Claim 2] The character segmentation method according to claim 1, further comprising: a step of generating, from the second rectangle width, a separation candidate rectangle for judging whether the character segmentation candidate position is wrong; a step of judging whether the character segmentation candidate position is wrong on the basis of black run information, such as black run length and black pixel ratio, within the separation candidate rectangle; and a step of forcibly separating the separation candidate rectangle when the character segmentation candidate position is judged to be wrong.

[Claim 3] The character segmentation method according to claim 2, further comprising a step of not performing forced separation when the separation candidate rectangle is smaller than a predetermined width.

[Claim 4] A character recognition device comprising: image input means for inputting, as images, printed and handwritten characters from general documents, slips, account books, and the like; rectangle extraction means for extracting rectangles from the image input by the image input means, judging errors in the character segmentation candidate positions determined by the extracted rectangles by carrying out the method according to any one of claims 1 to 3, and re-extracting the rectangles correctly; character segmentation means for segmenting each rectangle extracted by the rectangle extraction means character by character; and character recognition means for recognizing the segmented characters.

[Claim 5] A computer-readable storage medium storing a program that executes the method according to any one of claims 1 to 3.