JP2014035622A

JP2014035622A - Image processing apparatus and image processing program

Info

Publication number: JP2014035622A
Application number: JP2012175918A
Authority: JP
Inventors: Satoshi Kubota (久保田 聡); Shunichi Kimura (木村 俊一)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2012-08-08
Filing date: 2012-08-08
Publication date: 2014-02-24
Anticipated expiration: 2032-08-08
Also published as: JP6003375B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that prevents a multi-line character string from being extracted when a single line of character string is to be extracted from an image.
SOLUTION: An image processing apparatus includes: clipping means for clipping a character string candidate area, which is a candidate area for a character string, from an image; determining means for determining whether the character string candidate area is a single line of character string; and dividing means for dividing the character string candidate area in a direction perpendicular to its line direction when the determining means determines that it is not a single line. The clipping means takes the image of an area divided by the dividing means as its target image.

Description

The present invention relates to an image processing apparatus and an image processing program.

Patent Document 1 discloses a character string input method comprising: character segmentation means that, from a handwritten character string written freely without character frames, outputs candidates for single-character regions, each assigned a probability; single-character recognition means that recognizes the single-character region candidates output by the segmentation means and outputs character code candidates, each assigned a probability; term matching means that assigns probabilities to term candidates based on the output of the single-character recognition means and outputs them; first probability integration means that integrates the probability of each single-character region candidate output by the segmentation means with the probability of that candidate derived from the recognition output, to obtain an overall single-character region candidate output; second probability integration means that integrates the probability of each character code candidate output by the recognition means with the probability of that candidate derived from the term matching output, to obtain an overall character code candidate probability; and output determination means that outputs the final character code string based on the outputs of the first and second probability integration means. The outputs of the first and second probability integration means are fed back to the segmentation means and the recognition means, respectively, and probability integration is repeated until a steady state is reached.

Patent Document 2 aims to improve the recognition rate for Japanese documents with multiple character sizes and variable pitch. The number of blocks in each character is registered in a character pattern dictionary. A recognition unit matches a character pattern cut out by a character segmentation unit against the dictionary, and a block-count reference unit compares the block count of the recognized character with the block count of the character pattern. If the counts do not match, a re-segmentation unit divides the character pattern, a re-recognition unit recognizes the parts, and the recognition result is output. This is disclosed to improve the segmentation accuracy and recognition rate for characters made up of separated parts.

Patent Document 1: Japanese Patent Laid-Open No. H02-165287
Patent Document 2: Japanese Patent Laid-Open No. H05-143778

An object of the present invention is to provide an image processing apparatus and an image processing program that prevent a multi-line character string from being extracted when a single line of character string is to be extracted from an image.

The gist of the present invention for achieving this object lies in the inventions described below.
The invention of claim 1 is an image processing apparatus comprising: cutout means for cutting out, from an image, a character string candidate area, which is an area that is a candidate for a character string; determination means for determining whether the character string candidate area is a single line of character string; and dividing means for dividing the character string candidate area in a direction perpendicular to its line direction when the determination result is that it is not a single line of character string, wherein the cutout means takes the image of an area divided by the dividing means as its target image.

The invention of claim 2 is the image processing apparatus according to claim 1, further comprising second cutout means for cutting out, from the character string candidate area, a character candidate area, which is an area that is a candidate for a character, wherein the determination means determines whether the character string candidate area is a single line of character string based on a feature amount of the character candidate area.

The invention of claim 3 is the image processing apparatus according to claim 2, wherein the determination means uses, as the feature amount of the character candidate area, one or more of the size of the character candidate area, its position, and the ratio between the size of the character string candidate area and the size of the character candidate area.

The invention of claim 4 is the image processing apparatus according to claim 2 or 3, further comprising character recognition means for performing character recognition on the character candidate area, wherein the determination means determines whether the character string candidate area is a single line of character string based on the recognition result of the character recognition means.

The invention of claim 5 is the image processing apparatus according to any one of claims 1 to 4, wherein the dividing means determines the division position based on a projection distribution in the line direction within the character string candidate area.

The invention of claim 6 is the image processing apparatus according to any one of claims 2 to 5, wherein the dividing means reproduces the image of the divided character string candidate area based on the character candidate area.

The invention of claim 7 is an image processing program that causes a computer to function as: cutout means for cutting out, from an image, a character string candidate area, which is an area that is a candidate for a character string; determination means for determining whether the character string candidate area is a single line of character string; and dividing means for dividing the character string candidate area in a direction perpendicular to its line direction when the determination result is that it is not a single line of character string, wherein the cutout means takes the image of an area divided by the dividing means as its target image.

The image processing apparatus of claim 1 prevents a multi-line character string from being extracted when a single line of character string is to be extracted from an image.

The image processing apparatus of claim 2 can determine whether the character string candidate area is a single line of character string based on the feature amount of the character candidate area.

The image processing apparatus of claim 3 can use, as the feature amount of the character candidate area, one or more of the size of the character candidate area, its position, and the ratio between the size of the character string candidate area and the size of the character candidate area.

The image processing apparatus of claim 4 can determine whether the character string candidate area is a single line of character string based on the character recognition result.

The image processing apparatus of claim 5 can determine the division position based on the projection distribution in the line direction within the character string candidate area.

The image processing apparatus of claim 6 can reproduce the image of the divided character string candidate area based on the character candidate area.

The image processing program of claim 7 prevents a multi-line character string from being extracted when a single line of character string is to be extracted from an image.

FIG. 1 is a conceptual module configuration diagram of a configuration example of this embodiment.
FIG. 2 is a flowchart showing a processing example according to this embodiment.
FIG. 3 is an explanatory diagram showing a processing example in the character string determination module.
FIG. 4 is a flowchart showing a processing example according to this embodiment.
FIG. 5 is an explanatory diagram showing an example of a target character string candidate area.
FIG. 6 is an explanatory diagram showing an example in which single-character candidate areas are extracted from a character string candidate area.
FIG. 7 is a flowchart showing a processing example according to this embodiment.
FIG. 8 is an explanatory diagram showing a processing example according to this embodiment.
FIG. 9 is an explanatory diagram showing a processing example according to this embodiment.
FIG. 10 is an explanatory diagram showing a processing example according to this embodiment.
FIG. 11 is an explanatory diagram showing a processing example according to this embodiment.
FIG. 12 is an explanatory diagram showing a processing example according to this embodiment.
FIG. 13 is a block diagram showing an example hardware configuration of a computer that implements this embodiment.

Hereinafter, an example of a preferred embodiment for implementing the present invention will be described with reference to the drawings.
FIG. 1 is a conceptual module configuration diagram of a configuration example of this embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in this embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. This embodiment therefore also serves as a description of computer programs for causing a computer to function as these modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, and a program for causing a computer to realize each function), as well as of a system and a method. For convenience of description, the terms "store", "cause to be stored", and equivalent wording are used; when the embodiment is a computer program, these terms mean storing data in a storage device or performing control so that data is stored in a storage device. Modules may correspond one-to-one to functions, but in implementation one module may consist of one program, multiple modules may consist of one program, or, conversely, one module may consist of multiple programs. Multiple modules may be executed by one computer, or one module may be executed by multiple computers in a distributed or parallel environment. One module may also include another module. Hereinafter, "connection" covers logical connections (exchange of data, instructions, reference relationships between data, and so on) as well as physical connections. "Predetermined" means determined before the target processing. This includes not only values determined before the processing of this embodiment starts but also values determined after it has started, as long as they are determined before the target processing, according to the situation or state at that time or up to that time.
When there are multiple "predetermined values", they may differ from one another, or two or more of them (including, of course, all of them) may be the same. A statement meaning "if A, do B" is used in the sense of "determine whether A holds, and do B if it is determined that A holds", except where it is unnecessary to determine whether A holds.
A system or apparatus includes not only a configuration in which multiple computers, pieces of hardware, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), but also a configuration realized by a single computer, piece of hardware, device, or the like. "Apparatus" and "system" are used as synonyms. Of course, a "system" does not include a mere social "mechanism" (social system) that is an artificial arrangement.
For each process performed by a module, or for each of multiple processes performed within a module, the target information is read from a storage device, the process is performed, and the processing result is then written to the storage device. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

The image processing apparatus according to this embodiment extracts a single line of character string from an image and performs character recognition on the image of that line. As shown in the example of FIG. 1, it includes a character string cutout module 110, a character cutout module 120, a character recognition module 130, a character string determination module 140, and a character string division module 150. It is aimed in particular at cases where the text lines (character strings) themselves are skewed: for example, when the document image is skewed (tilted), when the spacing between lines or columns in a handwritten document is not constant, or when text is written diagonally in a handwritten free-entry box.

The character string cutout module 110 is connected to the character cutout module 120 and the character string division module 150. It cuts out, from an image (character image data 108), a character string candidate area (character string image data 112), which is an area that is a candidate for a character string. The character string cutout module 110 also takes the image of an area divided by the character string division module 150 (divided image data 152) as its target image and cuts character string candidate areas (character string image data 112) out of it. Here, a character string is a sequence of characters and may be written either horizontally or vertically.
The character cutout module 120 is connected to the character string cutout module 110, the character recognition module 130, and the character string determination module 140. It cuts out character candidate areas, which are areas that are candidates for characters, from the character string candidate area (character string image data 112) cut out by the character string cutout module 110. It passes the character candidate areas to the character recognition module 130 and passes the character string candidate area (character string image data 122, equivalent to character string image data 112) to the character string determination module 140.
The character recognition module 130 is connected to the character cutout module 120 and the character string determination module 140. It performs character recognition (recognizing characters in an image and converting them into character code data) on the character candidate areas cut out by the character cutout module 120, and passes the recognition result 132 to the character string determination module 140.

The character string determination module 140 is connected to the character cutout module 120, the character recognition module 130, and the character string division module 150. It determines whether the character string candidate area cut out by the character string cutout module 110 is a single line of character string. Here, a line may be a horizontal line (character string) or a vertical line (character string). If the character string determination module 140 determines that the area is not a single line (that is, the character string candidate area contains multiple lines of character strings), it passes character string image data 142 (equivalent to character string image data 112 or 122) to the character string division module 150. If it determines that the area is a single line (the character string candidate area contains only one line), it outputs a final recognition result 144, which is the recognition result produced by the character recognition module 130 for that line. Outputting the final recognition result 144 includes, for example, displaying it on a display device, printing it with a printing device such as a printer, writing it as a document recognition result to a storage device such as a document database, storing it on a storage medium such as a memory card, and passing it to another information processing apparatus (for example, a translation apparatus).
The character string determination module 140 may also determine whether the character string candidate area is a single line of character string based on a feature amount of the character candidate areas. As this feature amount, one or more of the size of a character candidate area, its position, and the ratio between the size of the character string candidate area and the size of the character candidate area may be used.
The character string determination module 140 may also determine whether the character string candidate area is a single line of character string based on the recognition result of the character recognition module 130.
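One plausible reading of the recognition-based determination is that a candidate area mixing two lines tends to yield fragments or merged glyphs that a recognizer scores poorly. The Python sketch below assumes a hypothetical recognizer that returns a (character code, confidence) pair for each single-character candidate area; the scoring rule, the function name, and every threshold are illustrative assumptions, not values from the source.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical recognizer output: (character code, confidence in [0, 1])
# for each single-character candidate area, in reading order.
Recognition = Tuple[str, float]


def looks_like_single_line_by_ocr(results: List[Recognition],
                                  min_mean_conf: float = 0.6,
                                  low_conf: float = 0.4,
                                  max_low_ratio: float = 0.3) -> bool:
    """Accept the candidate as one line if recognition is confident overall.

    Rejects when the mean confidence is low or when too many individual
    characters fall below `low_conf`. All thresholds are illustrative.
    """
    if not results:
        return False
    confs = [conf for _, conf in results]
    low_ratio = sum(1 for c in confs if c < low_conf) / len(confs)
    return mean(confs) >= min_mean_conf and low_ratio <= max_low_ratio
```

Combining a mean with a count of individually poor characters is one design choice among many; it keeps a single badly recognized glyph from rejecting an otherwise clean line.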

The character string division module 150 is connected to the character string cutout module 110 and the character string determination module 140. When the character string determination module 140 determines that the area is not a single line of character string, the character string division module 150 divides the character string candidate area in a direction perpendicular to its line direction. "Perpendicular" here is not limited to exactly perpendicular to the line direction; any line, straight or curved, that cuts across the line direction of the character string candidate area may be used. The division position may, for example, be a predetermined position, such as the midpoint of the character string candidate area in the line direction.
The character string division module 150 may also determine the division position based on the projection distribution in the line direction within the character string candidate area.
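A minimal sketch of choosing the division position from the projection distribution in the line direction, assuming horizontal writing and a binary list-of-rows image: count the ink in each column, then cut at the least-inked column near the middle of the candidate area so that the cut tends to fall between characters. The search window, the tie-breaking rule, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image, 1 = ink; horizontal writing assumed


def column_projection(img: Image) -> List[int]:
    """Projection along the line direction: ink count for each x position."""
    return [sum(col) for col in zip(*img)]


def choose_split_x(img: Image, window: float = 0.25) -> int:
    """Pick the least-inked column within +/- window*width of the centre.

    Ties go to the column closest to the centre. The result x means
    "cut just before column x", so both parts are non-empty.
    """
    profile = column_projection(img)
    w = len(profile)
    if w < 2:
        raise ValueError("need at least two columns to split")
    centre = w // 2
    half = max(1, int(w * window))
    lo, hi = max(1, centre - half), min(w - 1, centre + half)
    return min(range(lo, hi + 1), key=lambda x: (profile[x], abs(x - centre)))


def split_at(img: Image, x: int) -> Tuple[Image, Image]:
    """Divide perpendicular to the line direction into left and right parts."""
    return [row[:x] for row in img], [row[x:] for row in img]
```

Restricting the search to a window around the centre keeps both parts long enough to be cut into lines again, while choosing a projection valley avoids slicing through a character. A very small `window` approximates a fixed midpoint split.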
The character string division module 150 may also reproduce the image of each divided character string candidate area based on the character candidate areas. The image reproduction process is described later with reference to FIG. 12.
Although the example of FIG. 1 includes the character cutout module 120 and the character recognition module 130, these two modules may be omitted. In that case, the character string cutout module 110 is connected directly to the character string determination module 140, which determines whether the character string candidate area is a single line of character string based on a feature amount of the character string candidate area itself. One such feature amount is the shape of the projection distribution of the character string candidate area. Specific examples are described later.
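The specific examples of the projection-shape feature are not given in this section, so the following is only a hedged sketch under one assumption: for horizontal writing, the projection perpendicular to the lines (ink per row) of a single line forms one hump, whereas two merged, skewed lines tend to form two humps separated by a shallow valley. The hump-counting rule, the `ratio` threshold, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image, 1 = ink; horizontal writing assumed


def count_profile_humps(profile: List[int], ratio: float = 0.5) -> int:
    """Count maximal runs of positions whose value reaches ratio * max(profile)."""
    if not profile or max(profile) == 0:
        return 0
    level = ratio * max(profile)
    humps, inside = 0, False
    for value in profile:
        above = value >= level
        if above and not inside:
            humps += 1  # entering a new hump
        inside = above
    return humps


def looks_like_single_line_by_profile(img: Image, ratio: float = 0.5) -> bool:
    """Treat the candidate as one line if its per-row ink profile has one hump."""
    row_profile = [sum(row) for row in img]  # projection perpendicular to the lines
    return count_profile_humps(row_profile, ratio) == 1
```

A relative threshold is used rather than a zero-ink test because skewed lines that touch leave a valley that is shallow but not empty.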

FIG. 2 is a flowchart showing a processing example (character recognition processing example) according to this embodiment. The flow described below covers the processing of one character string; to process multiple character strings, steps S202 to S214 are repeated once per character string.

In step S202, the character string cutout module 110 sequentially cuts out candidate areas corresponding to character strings in the character image data 108 as character string candidate areas (character string image data 112). This character string cutout may use an existing technique that divides the image into areas based on the projection distribution in the character string direction.
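As a rough illustration of the kind of projection-based line cutting that step S202 may rely on, the Python sketch below assumes horizontal writing and a binary image given as a list of rows (1 = ink); the function names are hypothetical and not taken from the source. It also shows how two skewed lines can leave no empty row between them and so come out as a single candidate, which is the situation the determination and division steps are meant to catch.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image, 1 = ink; horizontal writing assumed


def horizontal_projection(img: Image) -> List[int]:
    """Ink count for each row (projection perpendicular to the lines)."""
    return [sum(row) for row in img]


def cut_line_candidates(img: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of inked row runs.

    Runs separated by fewer than `min_gap` empty rows are merged.
    """
    profile = horizontal_projection(img)
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    gap = 0
    for y, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                lines.append((start, y - gap + 1))  # close the current run
                start, gap = None, 0
    if start is not None:
        lines.append((start, len(profile) - gap))
    return lines
```

For example, two skewed lines whose rows overlap produce no empty row and therefore a single candidate:

    skewed = [[1,1,0,0,0,0], [0,0,1,1,0,0], [1,1,0,0,1,1], [0,0,1,1,0,0], [0,0,0,0,1,1]]
    cut_line_candidates(skewed)  # one range covering all five rows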
In step S204, the character cutout module 120 performs character cutout on the character string candidate image (character string image data 112) cut out by the character string cutout module 110 and cuts out single-character candidate areas. This may use an existing technique that cuts out pixel blocks. A pixel block includes at least a pixel region that is continuous under 4-connectivity or 8-connectivity, and also includes a set of such pixel regions. A set of pixel regions means several continuous (for example, 4-connected) pixel regions that lie near one another. Regions near one another include, for example, pixel regions that are close in distance, image regions obtained by projecting a line of text vertically or horizontally to cut out one character at a time and cutting at blank positions, and image regions cut out at regular intervals. A single pixel block often corresponds to the image of one character. However, a pixel block need not be a region that a human would actually recognize as a character; it may be part of a character or a region that forms no character, and any cluster of pixels qualifies. A single-character candidate area may be the circumscribed rectangle of such a pixel block.
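The pixel-block extraction in step S204 can be sketched as connected-component labeling followed by taking each component's circumscribed rectangle. The sketch below is a minimal illustration, assuming a binary list-of-rows image; the name `pixel_blocks` and the box format are hypothetical, and it does not implement the optional grouping of nearby components into a single block.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # binary image, 1 = ink
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def pixel_blocks(img: Image, connectivity: int = 8) -> List[Box]:
    """Label connected ink regions (BFS) and return their circumscribed rectangles."""
    h = len(img)
    w = len(img[0]) if h else 0
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:  # 4-connectivity
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right + 1, bottom + 1))
    return boxes
```

The choice of connectivity matters: diagonally touching pixels form one block under 8-connectivity but separate blocks under 4-connectivity.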

In step S206, the character recognition module 130 performs character recognition on each single-character candidate area cut out by the character cutout module 120 and outputs character codes as the recognition result 132.
In step S208, the character string determination module 140 determines whether the character string candidate area has been cut out as a single line of character string or as an area in which multiple lines are mixed, based, for example, on the position information of the single-character candidate areas cut out by the character cutout module 120 and the recognition result 132 obtained by the character recognition module 130. The details of this determination are described later.

In step S210, the character string determination module 140 judges whether the character string candidate image is the image of a single line of character string. If it is, the process proceeds to step S214; otherwise (that is, if multiple lines are judged to be mixed in the character string candidate area), the process proceeds to step S212.
In step S212, the character string division module 150 performs division processing. Details of the division processing in the character string division module 150 will be described later with reference to the flowchart example of FIG.
In step S214, the character string determination module 140 outputs the recognition result 132 as the final recognition result 144. Because the character string candidate area has been determined to be a single-line character string, the character string determination module 140 outputs the recognition result 132 of the character recognition module 130 and ends the processing of the target character string candidate area.
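The branch in steps S206 to S214 can be sketched as follows. This is a minimal sketch, not the patent's implementation: the callables `cut_chars`, `recognize`, `is_single_line`, and `divide_and_reprocess` are hypothetical stand-ins for modules 120, 130, 140, and 150, and a recognition result is modelled as a plain string.

```python
from typing import Callable, List, Sequence


def process_candidate(
    region: object,
    cut_chars: Callable[[object], Sequence[object]],                  # module 120 (hypothetical)
    recognize: Callable[[Sequence[object]], str],                     # module 130 (hypothetical)
    is_single_line: Callable[[object, Sequence[object], str], bool],  # module 140 (hypothetical)
    divide_and_reprocess: Callable[[object, Sequence[object]], List[str]],  # module 150 / FIG. 4
) -> List[str]:
    """Return the final recognition result(s) for one character string candidate area."""
    boxes = cut_chars(region)                   # single character candidate areas
    codes = recognize(boxes)                    # S206: recognition result 132
    if is_single_line(region, boxes, codes):    # S208 and S210
        return [codes]                          # S214: output as final recognition result 144
    return divide_and_reprocess(region, boxes)  # S212: divide and process the parts again
```

Passing the modules in as callables keeps the control flow of FIG. 2 separate from any particular segmentation or recognition engine.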

次に、文字列判定モジュール１４０における文字列候補領域に対する判定処理の詳細について説明する。
文字列判定モジュール１４０は、文字列切出しモジュール１１０、文字切出しモジュール１２０、文字認識モジュール１３０で得られた情報に基づいて、文字列候補領域の「文字列らしさ」又は「文字列としてのきれいさ」を数値として算出し、その数値に基づいて判定処理を行う。ここで言う「文字列らしさ」又は「文字列としてのきれいさ」とは、文字列候補領域内の各単文字候補領域の並びが揃っていること、各単文字候補領域の大きさが均一であること、又は各単文字候補領域の文字認識結果が妥当であるということを意味するものである。したがって、例えば、入力された文字列候補領域内の各単文字候補領域のサイズ（外接矩形の幅、高さ、面積）がほぼ均一であること、文字列候補領域の中心に沿って均等な距離で並んでいること、又は各単文字候補領域の文字認識結果が妥当であると判断される場合は、その文字列は「文字列らしい」又は「文字列としてきれい」とする。その反対に文字列候補領域内の各単文字候補領域がばらばらに位置していたり、各単文字候補領域の大きさが極端に異なっていたりする場合は、その文字列は「文字列らしくない」又は「文字列としてきれいではない」とする。言い換えれば、「文字列らしさ」又は「文字列としてのきれいさ」は文字列候補領域内の単文字候補領域の並び、大きさ、又は文字認識結果のばらつきを数値化したものである。 Next, details of the determination process for the character string candidate region in the character string determination module 140 will be described.
Based on the information obtained by the character string cutout module 110, the character cutout module 120, and the character recognition module 130, the character string determination module 140 calculates the “character string likelihood” or “cleanness as a character string” of the character string candidate area as a numerical value and performs the determination on the basis of that value. Here, “character string likelihood” or “cleanness as a character string” means that the single character candidate areas in the character string candidate area are neatly aligned, that they are uniform in size, or that the character recognition result of each single character candidate area is valid. Therefore, for example, if the sizes (width, height, and area of the circumscribed rectangle) of the single character candidate areas in the input character string candidate area are substantially uniform, if they are lined up at equal distances along the center of the character string candidate area, or if the character recognition result of each single character candidate area is judged to be valid, the character string is regarded as “like a character string” or “clean as a character string”. Conversely, if the single character candidate areas are scattered or differ extremely in size, the character string is regarded as “not like a character string” or “not clean as a character string”. In other words, “character string likelihood” or “cleanness as a character string” quantifies the variation in the arrangement, size, or character recognition results of the single character candidate areas in the character string candidate area.

ここで、文字列判定モジュール１４０において算出される「文字列らしさ」又は「文字列のきれいさ」を表す数値の具体的な例を示す。
１．文字列内の文字矩形高さの標準偏差：σ_ｓ

ここで、ｓ_ｉはｉ番目の文字矩形高さ、ｓ（なお、数式中ではｓのオーバーバー（ｏｖｅｒｂａｒ、上線）、以下同様）は文字列中の文字矩形高さの平均値、ｎは文字列中の文字矩形数をそれぞれ表す。文字矩形高さとは、単文字候補領域の高さである。なお、文字矩形高さのかわりに、文字矩形幅、文字矩形面積であってもよい。
２．（文字列中央値−文字中央値）の標準偏差：σ_ｃ

ここで、ｃ_ｉはｉ番目の文字矩形の重心のｙ座標値（又はｘ座標値）と文字矩形の重心のｙ座標値（又はｘ座標値）の差分の絶対値、ｃ（なお、数式中ではｃのオーバーバー、以下同様）はｃ_ｉの平均値、ｎは文字列中の文字矩形数をそれぞれ表す。
３．文字列高さ（あるいは文字列幅）対する平均文字矩形高さ：ｒ_ｓ

ここで、ｓ（なお、数式中ではｓのオーバーバー、以下同様）は文字列中の文字矩形高さの平均値、ｌｓは文字列高さをそれぞれ表す。なお、文字列高さのかわりに文字列幅であってもよい。文字矩形高さのかわりに文字矩形幅であってもよい。
４．認識確度平均値：ｃｆ（なお、数式中ではｃｆのオーバーバー、以下同様）

ここで、ｃｆ_ｉは文字列中のｉ番目の文字の認識確度値を表す。 Here, a specific example of a numerical value representing “character string likelihood” or “character string cleanliness” calculated by the character string determination module 140 will be shown.
1. Standard deviation of the character rectangle heights in the character string: σ_s

Here, s_i is the height of the i-th character rectangle, s̄ (written as s with an overbar in the equation; the same notation is used hereinafter) is the average character rectangle height in the character string, and n is the number of character rectangles in the character string. The character rectangle height is the height of a single character candidate area. The character rectangle width or the character rectangle area may be used instead of the character rectangle height.
2. Standard deviation of (character string center − character center): σ_c

Here, c_i is the absolute value of the difference between the y-coordinate (or x-coordinate) of the centroid of the i-th character rectangle and the y-coordinate (or x-coordinate) of the centroid of the character string rectangle, c̄ (c with an overbar in the equation) is the average of the c_i, and n is the number of character rectangles in the character string.
3. Ratio of the average character rectangle height to the character string height (or character string width): r_s

Here, s̄ is the average character rectangle height in the character string, and ls is the character string height. The character string width may be used instead of the character string height, and the character rectangle width instead of the character rectangle height.
4. Average recognition accuracy: cf (written as cf with an overbar in the equation)

Here, cf_i is the recognition accuracy value of the i-th character in the character string.
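The four values above can be computed as in the following sketch for a horizontal string. The formulas are not reproduced in the text, so this assumes the population standard deviation (dividing by n), r_s = s̄ / ls, and the plain mean of the cf_i; it also uses the center of each circumscribed rectangle as its centroid. The names `Rect` and `line_features` are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Dict, Sequence


@dataclass
class Rect:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def line_features(chars: Sequence[Rect], line: Rect, conf: Sequence[float]) -> Dict[str, float]:
    """Compute sigma_s, sigma_c, r_s and the mean cf for one character string candidate area."""
    heights = [r.h for r in chars]                                 # s_i
    line_cy = line.y + line.h / 2                                  # string centre (y), assumed
    offsets = [abs(line_cy - (r.y + r.h / 2)) for r in chars]      # c_i
    return {
        "sigma_s": pstdev(heights),          # 1. spread of character heights
        "sigma_c": pstdev(offsets),          # 2. spread of centre offsets
        "r_s": fmean(heights) / line.h,      # 3. mean character height / string height
        "cf": fmean(conf),                   # 4. mean recognition accuracy
    }


one_line = line_features([Rect(0, 0, 8, 10), Rect(10, 0, 8, 10)], Rect(0, 0, 18, 10), [0.9, 0.8])
two_lines = line_features([Rect(0, 0, 8, 10), Rect(10, 12, 8, 10)], Rect(0, 0, 18, 22), [0.9, 0.8])
```

In `two_lines`, two stacked characters sit symmetrically about the string centre, so σ_c is 0 even though two lines are mixed; r_s (10/22, about 0.45) still exposes the problem, which illustrates why several values are combined.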

次に、算出された上述の「文字列らしさ」又は「文字列のきれいさ」を表す数値を用いた判定処理の詳細を説明する。
図３は、文字列判定モジュール１４０における判定処理の流れの一例を表した図である。文字列判定モジュール１４０では、算出された上記の「文字列らしさ」又は「文字列のきれいさ」を表す数値を、予め定められた各閾値と比較する。ここで図３を用いて判定処理の流れを説明する。
１．まず算出した文字列内の文字矩形高さ（又は文字矩形幅）の標準偏差σ_ｓの値と予め定められた閾値Ｔｈ_σｓとの比較判定処理を行う（閾値判定３１５）。ここでσ_ｓ≦Ｔｈ_σｓの場合は、標準偏差σ_ｓが所定の閾値Ｔｈ_σｓ以下なので単文字候補領域の高さは揃っていると判定して、文字列候補画像は「文字列らしい」として（文字列中央値−文字中央値）の標準偏差σ_ｃによる判定処理に移る。またσ_ｓ＞Ｔｈ_σｓの場合は、単文字候補領域の高さが揃っていないと判定して、文字列候補領域は「文字列らしくない」として複数行混在文字列領域と判定３８０し、文字列判定モジュール１４０による文字列候補領域の判定処理を終了する。 Next, details of the determination process using the numerical value representing the calculated “character string likelihood” or “character string cleanliness” will be described.
FIG. 3 is a diagram illustrating an example of the flow of determination processing in the character string determination module 140. The character string determination module 140 compares the calculated numerical value indicating the “character string likelihood” or “character string cleanliness” with each predetermined threshold value. Here, the flow of the determination process will be described with reference to FIG.
1. First, the standard deviation σ_s of the character rectangle heights (or widths) in the character string is compared with a predetermined threshold Th_σs (threshold determination 315). If σ_s ≦ Th_σs, the standard deviation is at or below the threshold, so the heights of the single character candidate areas are judged to be uniform, the character string candidate image is regarded as “like a character string”, and the process moves on to the determination based on the standard deviation σ_c of (character string center − character center). If σ_s > Th_σs, the heights of the single character candidate areas are judged not to be uniform, the character string candidate area is regarded as “not like a character string” and determined to be a multi-line mixed character string area (determination 380), and the determination of this character string candidate area by the character string determination module 140 ends.

２．次に、算出した（文字列中央値−文字中央値）の標準偏差σ_ｃの値と予め定められた閾値Ｔｈ_σｃとの比較判定処理を行う（閾値判定３２５）。ここでσ_ｃ≦Ｔｈ_σｃの場合は、各単文字候補領域の重心がほぼ一致していると判定して、文字列候補画像は「文字列らしい」として、文字列高さ（又は文字列幅）に対する平均文字矩形高さ（又は平均文字矩形幅）ｒ_ｓによる判定処理に移る。またσ_ｃ＞Ｔｈ_σｃの場合は、各単文字候補領域の重心が揃ってないと判定して、文字列候補領域は「文字列らしくない」として複数行混在文字列領域と判定３８０し、文字列判定モジュール１４０による文字列候補領域の判定処理を終了する。
2. Next, the standard deviation σ_c of (character string center − character center) is compared with a predetermined threshold Th_σc (threshold determination 325). If σ_c ≦ Th_σc, the centroids of the single character candidate areas are judged to nearly coincide, the character string candidate image is regarded as “like a character string”, and the process moves on to the determination based on the ratio r_s of the average character rectangle height (or width) to the character string height (or width). If σ_c > Th_σc, the centroids of the single character candidate areas are judged not to be aligned, the character string candidate area is regarded as “not like a character string” and determined to be a multi-line mixed character string area (determination 380), and the determination of this character string candidate area by the character string determination module 140 ends.

３．次に、算出した文字列高さ（又は文字列幅）に対する平均文字矩形高さ（又は文字矩形幅）ｒ_ｓと予め定められた閾値Ｔｈ_ｒｓとの比較判定処理を行う（閾値判定３３５）。ここでｒ_ｓ≧Ｔｈ_ｒｓの場合は、各単文字候補領域は文字列候補領域の列方向に沿って並んでいると判定して、文字列候補画像は「文字列らしい」として、認識確度平均値ｃｆによる判定処理に移る。またｒ_ｓ＜Ｔｈ_ｒｓの場合は、各単文字候補領域は列方向に沿って並んでいないと判定して、文字列候補領域は「文字列らしくない」として複数行混在文字列領域と判定３８０し、文字列判定モジュール１４０による文字列候補領域の判定処理を終了する。
3. Next, the ratio r_s of the average character rectangle height (or width) to the character string height (or width) is compared with a predetermined threshold Th_rs (threshold determination 335). If r_s ≧ Th_rs, the single character candidate areas are judged to be lined up along the direction of the character string candidate area, the character string candidate image is regarded as “like a character string”, and the process moves on to the determination based on the average recognition accuracy cf. If r_s < Th_rs, the single character candidate areas are judged not to be lined up along that direction, the character string candidate area is regarded as “not like a character string” and determined to be a multi-line mixed character string area (determination 380), and the determination of this character string candidate area by the character string determination module 140 ends.

４．最後に、文字列判定モジュール１４０は認識確度平均値ｃｆと所定の閾値Ｔｈ_ｃｆとの比較判定処理を行う（閾値判定３４５）。ここでｃｆ≧Ｔｈ_ｃｆの場合は、各単文字候補領域の文字認識結果は妥当で文字切り出しが精度よく処理されていると判定して、文字列候補画像は「文字列らしい」として１行文字列領域と判定３９０する。また、ｃｆ＜Ｔｈ_ｃｆの場合は、各単文字候補領域の文字認識結果は妥当ではなく文字切り出しが精度よく処理されていないと判定し、文字列候補領域は「文字列らしくない」として複数行混在文字列領域と判定３８０する。いずれの場合も文字列判定モジュール１４０による文字列候補領域の判定処理を終了する。
4. Finally, the character string determination module 140 compares the average recognition accuracy cf with a predetermined threshold Th_cf (threshold determination 345). If cf ≧ Th_cf, the character recognition results of the single character candidate areas are judged to be valid and the characters to have been cut out accurately, so the character string candidate image is regarded as “like a character string” and determined to be a one-line character string area (determination 390). If cf < Th_cf, the character recognition results are judged not to be valid and the characters not to have been cut out accurately, so the character string candidate area is regarded as “not like a character string” and determined to be a multi-line mixed character string area (determination 380). In either case, the determination of this character string candidate area by the character string determination module 140 then ends.

以上説明してきたように、文字列判定モジュール１４０では、算出した各々の「文字列らしさ」又は「文字列のきれいさ」を表す数値を予め定められた各閾値と比較して、最終的にすべての数値との比較で、「文字列らしい」と判定された文字列候補領域を１行文字列領域と判定３９０し、文字認識モジュール１３０における認識結果を出力する。逆にいずれかの数値との比較において「文字列らしくない」と判定された文字列候補領域は複数行混在文字列領域として判定３８０し、文字列候補領域を後段の文字列分割モジュール１５０に出力する。
なお、これまでの説明では、標準偏差σ_ｓ、標準偏差σ_ｃ、高さの比ｒ_ｓ、認識確度平均値ｃｆの順での判定であったが、文字列判定モジュール１４０における判定処理ではすべての数値と予め定められた各閾値の比較結果において「文字列らしい」と判定されるかどうかを評価しているので、判定の順番は関係なく、どの数値から判定を行ってもよい。ただし、計算量の少ないものを先に判定するようにしてもよいし、本実施の形態のように「複数行文字列として判定３８０」することが確実なものを先に判定するようにしてもよい。
As described above, the character string determination module 140 compares each calculated value representing “character string likelihood” or “character string cleanliness” with its predetermined threshold. A character string candidate area judged “like a character string” in every comparison is determined to be a one-line character string area (determination 390), and the recognition result of the character recognition module 130 is output. Conversely, a character string candidate area judged “not like a character string” in any comparison is determined to be a multi-line mixed character string area (determination 380) and is passed to the downstream character string division module 150.
In the description so far, the determinations were made in the order of the standard deviation σ_s, the standard deviation σ_c, the height ratio r_s, and the average recognition accuracy cf. However, because the determination processing in the character string determination module 140 evaluates whether the area is judged “like a character string” in the comparison of every value with its threshold, the order does not matter, and the determination may start from any value. For example, values that require less computation may be checked first, or, as in the present embodiment, values that reliably lead to “determination 380 as a multi-line character string” may be checked first.
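The threshold cascade of FIG. 3 reduces to a short function. This is a sketch, assuming the four values are supplied in a dictionary; the key names and the threshold values in the usage lines are hypothetical, since the text gives no concrete thresholds.

```python
from typing import Dict


def is_single_line(feat: Dict[str, float], th: Dict[str, float]) -> bool:
    """FIG. 3 cascade: True means determination 390 (one-line area),
    False means determination 380 (multi-line mixed area)."""
    if feat["sigma_s"] > th["sigma_s"]:   # threshold determination 315
        return False
    if feat["sigma_c"] > th["sigma_c"]:   # threshold determination 325
        return False
    if feat["r_s"] < th["r_s"]:           # threshold determination 335
        return False
    if feat["cf"] < th["cf"]:             # threshold determination 345
        return False
    return True


thresholds = {"sigma_s": 2.0, "sigma_c": 2.0, "r_s": 0.6, "cf": 0.5}  # hypothetical values
```

Because the result is the logical AND of four independent tests, reordering the `if` statements never changes the verdict; it only changes how early a rejection exits, which matters if the values are computed lazily.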

次に、文字列分割モジュール１５０における文字列候補領域の分割処理について詳細に説明する。
図５（ａ）は、文字列判定モジュール１４０において「文字列らしくない」と判定された、文字列切出しモジュール１１０で複数行が１行文字列領域として切り出された文字列候補領域５１０の具体的な一例を表す図である。この文字列候補領域５１０は、２行の文字列が１行文字列領域として切り出されている具体例である。図５（ａ）に示すように、例えば自由記述枠に手書きで斜め書きされたような文書画像の場合には、文字列方向の黒画素投影分布５２０に基づいて文字列領域を切り出す技術を利用した場合、その黒画素投影分布５２０は例えば図５（ｂ）に示すような分布となり、１行単位の文字列領域を分割する明確な分布の谷が現れず、結果として図５（ａ）のような２行が１行文字列領域として切り出されてしまう。 Next, the character string candidate area dividing process in the character string dividing module 150 will be described in detail.
FIG. 5A shows a specific example of a character string candidate area 510 that the character string determination module 140 has judged “not like a character string” and in which the character string cutout module 110 has cut out multiple lines as a single one-line character string area; in this example, two lines of text have been cut out as one line. As shown in FIG. 5A, for a document image handwritten at a slant in a free-entry field, for example, a technique that cuts out character string areas on the basis of the black pixel projection distribution 520 along the character string direction yields a black pixel projection distribution 520 such as that shown in FIG. 5B. No clear valley separating the individual lines appears in this distribution, and as a result the two lines in FIG. 5A are cut out as a single one-line character string area.
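The failure mode of FIG. 5B can be reproduced with a toy example. This minimal sketch assumes a binary image stored as a list of rows (1 = black pixel) and a horizontal string direction; `row_projection` and `line_bands` are hypothetical names for a generic projection-based line extractor, not the patent's module 110.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # row-major binary image, 1 = black pixel


def row_projection(img: Image) -> List[int]:
    """Black-pixel count of each row (projection along a horizontal string)."""
    return [sum(row) for row in img]


def line_bands(img: Image, threshold: int = 0) -> List[Tuple[int, int]]:
    """Split rows into [start, end) text bands wherever the projection is <= threshold."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, v in enumerate(row_projection(img)):
        if v > threshold and start is None:
            start = y                     # a band of text begins
        elif v <= threshold and start is not None:
            bands.append((start, y))      # a valley closes the band
            start = None
    if start is not None:
        bands.append((start, len(img)))
    return bands


straight = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
# Two lines that each climb to the right: their rows interleave, so no row is empty.
skewed = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
```

`line_bands(straight)` finds two bands, while `line_bands(skewed)` returns a single band covering all four rows. Restricting the skewed image to its left two columns brings the valleys back, which is the idea behind dividing the area perpendicular to the string direction.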

図６は、図５（ａ）に示す文字列候補領域に対して文字切出しモジュール１２０で文字切出し処理を行った結果の具体例を表す図である。ここで図６中に示した各矩形領域（単文字候補領域６０２等）が単文字候補領域を表す。
図６の例に示したように、図５（ａ）の例に示すような複数行が１行文字列領域として誤って切り出された場合、文字切出しモジュール１２０では、入力された文字列候補領域を１行文字列領域として文字切出し処理を行うので、図６に示すように、例えば上下に並ぶ「手」と「２」を単文字候補領域６０２として切り出す。その結果、この単文字候補領域を入力する文字認識モジュール１３０における文字認識精度は著しく低下する。
そこで、文字列分割モジュール１５０では、文字列判定モジュール１４０において「文字列らしくない」と判定された、例えば図５（ａ）に示すような２行の文字列が１行文字列領域として切り出された文字列候補領域５１０を分割して、分割画像を生成し、各分割画像に対して、再度、文字列切出しモジュール１１０による文字列切出し処理、文字切出しモジュール１２０による文字切出し処理等を行う。
FIG. 6 shows a specific example of the result of character cutout processing performed by the character cutout module 120 on the character string candidate area shown in FIG. 5A. Each rectangular area in FIG. 6 (single character candidate area 602 and so on) represents a single character candidate area.
As illustrated in FIG. 6, when multiple lines such as those in FIG. 5A are erroneously cut out as a one-line character string area, the character cutout module 120 performs character cutout treating the input character string candidate area as a one-line character string area, so that, for example, the vertically stacked “手” (hand) and “2” are cut out together as the single character candidate area 602. As a result, the recognition accuracy of the character recognition module 130, which receives such single character candidate areas, drops significantly.
Therefore, the character string division module 150 divides a character string candidate area 510 that the character string determination module 140 has judged “not like a character string”, such as the area in FIG. 5A in which two lines were cut out as a one-line character string area, and generates divided images. The character string cutout processing by the character string cutout module 110, the character cutout processing by the character cutout module 120, and so on are then performed again on each divided image.

ここで、文字列分割モジュール１５０における文字列候補領域の分割処理の流れを図４に示すフローチャート例を用いて説明する。なお、この処理は、図２に例示したフローチャート内のステップＳ２１０でＮＯの場合に行われる。
ステップＳ４０２では、文字列分割モジュール１５０が、文字列候補領域を分割するための分割位置を探す。文字列分割モジュール１５０における分割位置探索方法についての詳細は後述する。
ステップＳ４０４では、文字列分割モジュール１５０が、ステップＳ４０２で探索した分割位置に基づいて、文字列候補領域を予め定められた方法で分割し、複数の分割画像を生成する。文字列分割モジュール１５０における分割画像生成方法についての詳細も後述する。なお、分割画像は、次の処理においては文字画像データ１０８となる。ただし、この段階での分割画像には複数行があるが、文字列切出しモジュール１１０によって１行だけの文字列候補領域として切り出される可能性が高いものである。 Here, the flow of the character string candidate area dividing process in the character string dividing module 150 will be described with reference to the flowchart example shown in FIG. This process is performed in the case of NO in step S210 in the flowchart illustrated in FIG.
In step S402, the character string division module 150 searches for a division position for dividing the character string candidate area. Details of the division position search method in the character string division module 150 will be described later.
In step S404, the character string division module 150 divides the character string candidate area by a predetermined method on the basis of the division positions found in step S402 and generates multiple divided images. Details of the divided image generation method in the character string division module 150 will also be described later. The divided images serve as the character image data 108 in the subsequent processing. A divided image at this stage may still contain multiple lines, but the character string cutout module 110 is now highly likely to cut out each of them as a character string candidate area containing only one line.

ステップＳ４０６では、文字列切出しモジュール１１０が、文字列候補領域を分割して生成した分割画像中の文字列候補を切り出す。
ステップＳ４０８では、文字切出しモジュール１２０が、文字列切出しモジュール１１０によって切り出された文字列候補画像に対して文字切出し処理を行い、単文字候補領域を切り出す。
ステップＳ４１０では、文字認識モジュール１３０が、文字切出しモジュール１２０によって切り出された単文字候補領域に対して文字認識を行い、認識結果１３２を出力する。 In step S406, the character string cutout module 110 cuts out character string candidates in the divided image generated by dividing the character string candidate area.
In step S408, the character cutout module 120 performs a character cutout process on the character string candidate image cut out by the character string cutout module 110, and cuts out a single character candidate area.
In step S410, the character recognition module 130 performs character recognition on the single character candidate region cut out by the character cut-out module 120 and outputs a recognition result 132.

ステップＳ４１２では、文字列判定モジュール１４０が、分割画像中のすべての文字列候補領域に対して文字切出し処理と文字認識処理を行ったか否かを判断し、処理を行った場合はステップＳ４１４へ進み、それ以外の場合（分割画像中に未処理の文字列候補領域が存在する場合）はステップＳ４０８へ戻り、未処理の文字列候補領域に対してステップＳ４０８、ステップＳ４１０における文字切出し処理、文字認識処理を繰り返す。
ステップＳ４１４では、文字列判定モジュール１４０が、元の文字列候補領域を分割位置で分割して生成したすべての分割画像に対して処理を行ったか否かを判断し、処理を行った場合はステップＳ２１４へ進み、それ以外の場合（未処理の分割画像が存在する場合）はステップＳ４０６へ戻り、未処理の分割画像に対してステップＳ４０６からステップＳ４１０までの処理を繰り返す。 In step S412, the character string determination module 140 determines whether or not character extraction processing and character recognition processing have been performed on all character string candidate regions in the divided image. If processing has been performed, the process proceeds to step S414. In other cases (when an unprocessed character string candidate area exists in the divided image), the process returns to step S408, and character extraction processing and character recognition in steps S408 and S410 are performed on the unprocessed character string candidate area. Repeat the process.
In step S414, the character string determination module 140 determines whether or not all the divided images generated by dividing the original character string candidate region at the dividing position have been processed. The process proceeds to S214. In other cases (when an unprocessed divided image exists), the process returns to step S406, and the processes from step S406 to step S410 are repeated for the unprocessed divided image.
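Steps S402 to S414 form two nested loops: S412 loops over the character string candidates within one divided image, and S414 loops over the divided images. A minimal sketch, in which every callable is a hypothetical stand-in for the corresponding module or step:

```python
from typing import Callable, List, Sequence


def divide_and_reprocess(
    region: object,
    boxes: Sequence[object],
    find_positions: Callable[[object, Sequence[object]], List[int]],       # S402 (hypothetical)
    split: Callable[[object, Sequence[object], List[int]], List[object]],  # S404 (hypothetical)
    cut_strings: Callable[[object], List[object]],                         # module 110
    cut_chars: Callable[[object], Sequence[object]],                       # module 120
    recognize: Callable[[Sequence[object]], str],                          # module 130
) -> List[str]:
    """Divide a multi-line candidate area and recognize every line found in the parts."""
    results: List[str] = []
    positions = find_positions(region, boxes)            # S402: search division positions
    for part in split(region, boxes, positions):         # S404; loop closed by S414
        for line in cut_strings(part):                   # S406; loop closed by S412
            results.append(recognize(cut_chars(line)))   # S408 and S410
    return results                                       # afterwards: output in S214
```

As in the flowchart, the divided images are not sent back through the determination of module 140; their recognition results are output directly.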

次に、文字列分割モジュール１５０における分割位置探索の詳細について、図７、図８を用いて説明する。
図７は、文字列分割モジュール１５０における分割位置探索の処理例の流れを説明したフローチャートである。
ステップＳ７０２では、文字列分割モジュール１５０は、初期位置ｓとカレント位置ｃをそれぞれ、ｓ＝０、ｃ＝１にセットする。ここで初期位置ｓ、カレント位置ｃは、図８の例に示すように、分割の対象である文字列候補領域５１０に対する文字切出しモジュール１２０による単文字候補領域の位置に相当する。なお、図８に示す例では単文字領域位置を単文字候補領域の個数を表すインデックス（０、１、２、・・・・）としているが、文字列候補領域に対する単文字候補領域の相対座標値をセットしてもよい。最終的には、初期位置ｓとカレント位置ｃの間の領域を分割することになる。分割画像の左端が初期位置ｓ、右端がカレント位置ｃとなる。 Next, details of the division position search in the character string division module 150 will be described with reference to FIGS.
FIG. 7 is a flowchart for explaining a flow of a processing example of division position search in the character string division module 150.
In step S702, the character string division module 150 sets the initial position s and the current position c to s = 0 and c = 1, respectively. As shown in the example of FIG. 8, the initial position s and the current position c correspond to positions of the single character candidate areas cut out by the character cutout module 120 within the character string candidate area 510 to be divided. In the example of FIG. 8, the position of a single character area is an index (0, 1, 2, ...) counting the single character candidate areas, but the coordinates of the single character candidate area relative to the character string candidate area may be used instead. Ultimately, the area between the initial position s and the current position c is split off: the left edge of the divided image is the initial position s, and the right edge is the current position c.

ステップＳ７０４では、文字列分割モジュール１５０は、初期位置ｓからカレント位置ｃまでの文字列方向の投影分布を算出する。文字列方向は、横書きの場合は横方向、縦書きの場合は縦方向であり、図８の例では横方向となる。
ステップＳ７０６では、文字列分割モジュール１５０は、投影分布の最低値（もっとも深い谷の値）が予め定められた閾値（例えば０など）を超えたか否かを判断し、超えた場合はステップＳ７１２へ進み、それ以外の場合（投影分布の最低値が予め定められた閾値以下の場合）はステップＳ７０８へ進む。なお、ここでの閾値は、文字列切出しモジュール１１０が用いている閾値と同じ値であるとよい。
ステップＳ７０８では、投影分布の最低値が閾値以下なので、文字列分割モジュール１５０は、カレント位置ｃに１を加える（インクリメントする）。つまり、カレント位置ｃで分割した場合に、文字列切出しモジュール１１０が１行の文字列を切り出すことができるので、分割位置を右方向に移動させる。
In step S704, the character string division module 150 calculates the projection distribution along the character string direction over the range from the initial position s to the current position c. The character string direction is horizontal for horizontal writing and vertical for vertical writing; in the example of FIG. 8 it is horizontal.
In step S706, the character string dividing module 150 determines whether or not the lowest value (the deepest valley value) of the projection distribution exceeds a predetermined threshold (for example, 0), and if so, the process proceeds to step S712. In other cases (when the minimum value of the projection distribution is equal to or smaller than a predetermined threshold value), the process proceeds to step S708. Note that the threshold value here may be the same value as the threshold value used by the character string cutting module 110.
In step S708, since the minimum value of the projection distribution is at or below the threshold, the character string division module 150 increments the current position c by 1. In other words, if the area were divided at the current position c, the character string cutout module 110 would still be able to cut out single-line character strings, so the division position is moved further to the right.

ステップＳ７１０では、文字列分割モジュール１５０は、（カレント位置ｃ＜単文字候補領域数）であるか否かを判断し、（ｃ＜単文字候補領域数）である場合はステップＳ７０４へ戻り、同様の処理を繰り返し、それ以外の場合（カレント位置ｃが単文字候補領域数と同数となった場合）は文字列分割モジュール１５０における分割処理を終了する（ステップＳ７９９）。ここで、本ステップではカレント位置ｃと単文字候補領域数を比較しているが、これは先に述べたように単文字候補領域位置を単文字候補領域の個数を表すインデックスとしているからであり、カレント位置ｃが相対座標値の場合には、文字列候補領域の右端座標値と比較すればよい。
ステップＳ７１２では、文字列分割モジュール１５０は、カレント位置ｃにおいて投影分布の最低値が予め定められた閾値を超えたので、文字列候補領域の分割位置ｄをｄ＝ｃ−１とする。
ステップＳ７１４では、文字列分割モジュール１５０は、初期位置ｓ＝ｃ−１とし、これまでの投影分布をリセットする。そして、ステップＳ７０４に戻り、文字列候補領域の別の分割位置の探索を続ける。
In step S710, the character string division module 150 determines whether the current position c is less than the number of single character candidate areas. If so, the process returns to step S704 and the same processing is repeated; otherwise (when the current position c equals the number of single character candidate areas), the division processing in the character string division module 150 ends (step S799). This step compares the current position c with the number of single character candidate areas because, as described above, the single character candidate area positions are indices counting the single character candidate areas; if the current position c is a relative coordinate value, it may instead be compared with the right-end coordinate of the character string candidate area.
In step S712, the character string division module 150 sets the division position d of the character string candidate region to d = c−1 because the minimum value of the projection distribution exceeds the predetermined threshold at the current position c.
In step S714, the character string dividing module 150 sets the initial position s = c−1 and resets the projection distribution so far. Then, the process returns to step S704, and the search for another division position of the character string candidate area is continued.
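The search of FIG. 7 can be sketched as below for a horizontal string. Three assumptions are made and should be read as such: positions are box indices and the projected range covers boxes s through c inclusive (consistent with the c < count test in S710); the "lowest value" is taken only between the first and last inked rows, so that empty margins above and below the text do not count as valleys; and after a cut the sketch restarts with s = c and c = s + 1, rather than setting s = c − 1 and returning to S704 as described, so that every box falls into exactly one segment and the loop always terminates. `has_valley` and `find_division_positions` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]   # row-major binary image, 1 = black pixel
Span = Tuple[int, int]    # [x0, x1) columns of one single character candidate area


def has_valley(img: Image, x0: int, x1: int, threshold: int = 0) -> bool:
    """True if the row projection over columns [x0, x1) dips to <= threshold
    strictly between the first and last inked rows."""
    proj = [sum(row[x0:x1]) for row in img]
    inked = [y for y, v in enumerate(proj) if v > 0]
    inner = proj[inked[0] + 1:inked[-1]] if inked else []
    return bool(inner) and min(inner) <= threshold


def find_division_positions(img: Image, boxes: Sequence[Span], threshold: int = 0) -> List[int]:
    """Return box indices d; the candidate area is cut just after box d."""
    cuts: List[int] = []
    s, c = 0, 1                                       # S702
    while c < len(boxes):                             # S710
        x0 = min(b[0] for b in boxes[s:c + 1])        # S704: project boxes s..c
        x1 = max(b[1] for b in boxes[s:c + 1])
        if has_valley(img, x0, x1, threshold):        # S706: lines still separable
            c += 1                                    # S708
        else:
            cuts.append(c - 1)                        # S712: d = c - 1
            s, c = c, c + 1                           # S714, adjusted as noted above
    return cuts


skewed = [
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
boxes = [(0, 2), (2, 4), (4, 6)]
```

For `skewed`, boxes 0 and 1 still show a valley, but adding box 2 removes it, so the search returns a cut after box 1; a cut x-coordinate could then be taken as the right edge of that box.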

次に文字列分割モジュール１５０における分割画像生成処理について説明する。
図９は、文字列分割モジュール１５０における分割画像生成例について説明する図である。
例えば、先に説明した文字列分割モジュール１５０における分割位置探索で、図９（ａ）に示すように分割位置９０２が決定される。文字列分割モジュール１５０における分割画像生成処理では、入力された文字列候補領域５１０に対して分割位置９０２で、文字列方向に垂直に切ることで、図９（ｂ）、図９（ｃ）の例に示すような、分割画像９１０、分割画像９２０を生成する。つまり、図９（ｂ）の例に示す分割画像９１０においては黒画素投影分布９１５であり、図９（ｃ）の例に示す分割画像９２０においては黒画素投影分布９２５であり、投影分布の最低値が予め定められた閾値以下の状態である。なお、図９に示す具体例では分割位置が一箇所の場合の例であるので２つの分割画像が生成されるが、例えば分割位置が２箇所の場合は３つの分割画像が生成される。つまり文字列分割モジュール１５０においては、（分割位置＋１）個の分割画像が生成される。 Next, the divided image generation processing in the character string dividing module 150 will be described.
FIG. 9 is a diagram for explaining an example of divided image generation in the character string division module 150.
For example, the division position search in the character string division module 150 described above determines the division position 902 shown in FIG. 9A. In the divided image generation processing, the character string division module 150 cuts the input character string candidate area 510 at the division position 902, perpendicular to the character string direction, to generate the divided images 910 and 920 shown in FIGS. 9B and 9C. The black pixel projection distribution 915 of the divided image 910 in FIG. 9B and the black pixel projection distribution 925 of the divided image 920 in FIG. 9C each have a minimum at or below the predetermined threshold. Since the specific example in FIG. 9 has a single division position, two divided images are generated; with two division positions, for example, three divided images would be generated. In general, the character string division module 150 generates (number of division positions + 1) divided images.
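Cutting perpendicular to a horizontal string amounts to slicing every row at the division x-coordinates. A minimal sketch, assuming a row-major binary image; `split_at_columns` is a hypothetical name, and converting a box-index division position into an x-coordinate (for example, the right edge of that box) is left to the caller.

```python
from typing import List

Image = List[List[int]]  # row-major binary image, 1 = black pixel


def split_at_columns(img: Image, cut_xs: List[int]) -> List[Image]:
    """Cut perpendicular to a horizontal string at each x; yields len(cut_xs) + 1 images."""
    edges = [0] + sorted(cut_xs) + [len(img[0])]
    return [[row[a:b] for row in img] for a, b in zip(edges, edges[1:])]


skewed = [
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
parts = split_at_columns(skewed, [4])
```

Each part now has a row projection with empty rows between the two lines ([0, 4, 0, 4] and [2, 0, 2, 0]), matching the behaviour described for distributions 915 and 925.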

図１０（ａ）の例は、分割画像９１０を対象とした場合の文字列切出しモジュール１１０、文字切出しモジュール１２０の処理結果であり、図１０（ｂ）の例は、分割画像９２０を対象とした場合の文字列切出しモジュール１１０、文字切出しモジュール１２０の処理結果である。つまり、文字列切出しモジュール１１０は、分割画像９１０に対して文字列切出し処理を行って、分割画像９１０から文字列候補領域１０１０と文字列候補領域１０２０を抽出し、文字切出しモジュール１２０は、文字列候補領域１０１０に対して文字切出し処理を行って単文字候補領域１０１１〜１０１６を抽出し、文字列候補領域１０２０に対して文字切出し処理を行って単文字候補領域１０２１〜１０２６を抽出している。また、同様に、文字列切出しモジュール１１０は、分割画像９２０に対して文字列切出し処理を行って、分割画像９２０から文字列候補領域１０３０と文字列候補領域１０５０を抽出し、文字切出しモジュール１２０は、文字列候補領域１０３０に対して文字切出し処理を行って単文字候補領域１０３１〜１０４２を抽出し、文字列候補領域１０５０に対して文字切出し処理を行って単文字候補領域１０５１〜１０５７を抽出している。 The example of FIG. 10A is the processing result of the character string extraction module 110 and the character extraction module 120 when the divided image 910 is targeted, and the example of FIG. 10B is the target of the divided image 920. This is a processing result of the character string cutout module 110 and the character cutout module 120. That is, the character string cutout module 110 performs character string cutout processing on the divided image 910 to extract the character string candidate area 1010 and the character string candidate area 1020 from the divided image 910. The character cutout module 120 The character extraction process is performed on the candidate area 1010 to extract the single character candidate areas 1011 to 1016, and the character extraction process is performed on the character string candidate area 1020 to extract the single character candidate areas 1021 to 1026. 
Similarly, the character string cutout module 110 performs character string cutout processing on the divided image 920 to extract the character string candidate areas 1030 and 1050 from it, and the character cutout module 120 performs character cutout processing on the character string candidate area 1030 to extract the single character candidate areas 1031 to 1042 and on the character string candidate area 1050 to extract the single character candidate areas 1051 to 1057.

図１１は、文字列分割モジュール１５０における他の分割画像生成処理例について説明する図である。ここでは、図９の具体例と同様の分割位置９０２の場合について説明する。
この分割画像生成処理では、図９（ａ）における分割位置９０２より左側にある単文字候補領域６０２（手、２）、単文字候補領域６０４（書、行）、単文字候補領域６０６（き、目）、単文字候補領域６０８（さ、も）、単文字候補領域６１０（れ、や）、単文字候補領域６１２（た、っ）と、図９（ａ）における分割位置９０２より右側にある単文字候補領域６１４（文、ぱ）、単文字候補領域６１６（字、り）、単文字候補領域６１８（列、傾）、単文字候補領域６２０（カ、い）、単文字候補領域６２２（ヾ、て）、単文字候補領域６２４（斜、い）、単文字候補領域６２６（め、る）、単文字候補領域６２８（に）、単文字候補領域６３０(傾)、単文字候補領域６３２（い）、単文字候補領域６３４（て）、単文字候補領域６３６（し）、単文字候補領域６３８（ヽ）、単文字候補領域６４０（る）をそれぞれ再画像化（再ラスタライズ）することで図９（ｂ）、図９（ｃ）に示すような分割画像９１０、分割画像９２０を生成する。
再画像化が行える理由は、文字列分割モジュール１５０の分割対象として入力された文字列候補領域は、すでに文字列切出しモジュール１１０、文字切出しモジュール１２０、文字認識モジュール１３０において、単文字候補領域が切り出されており、文字列候補領域に対する単文字候補領域の位置情報（例えば、矩形の左上の座標）、矩形情報（矩形の幅、高さの他に矩形内の画像を含む）が得られているため、文字列候補領域に対する各単文字候補領域の位置関係を保ちつつ再画像化が可能となる。つまり、単文字候補領域の位置情報、矩形情報を記憶しておき、それらを用いて再画像化処理を行う。
また、この分割画像生成処理は再画像化処理を行って分割画像を生成するため、前述の分割画像生成処理と比較して処理負荷が高くなるが、以下に示すような処理を行うこともできるようになる。 FIG. 11 is a diagram illustrating another example of divided image generation processing in the character string division module 150. Here, the case of the division position 902 similar to the specific example of FIG. 9 will be described.
In this divided image generation processing, the single character candidate areas to the left of the division position 902 in FIG. 9A, namely 602 (手, 2), 604 (書, 行), 606 (き, 目), 608 (さ, も), 610 (れ, や), and 612 (た, っ), and those to the right of it, namely 614 (文, ぱ), 616 (字, り), 618 (列, 傾), 620 (カ, い), 622 (ヾ, て), 624 (斜, い), 626 (め, る), 628 (に), 630 (傾), 632 (い), 634 (て), 636 (し), 638 (ヽ), and 640 (る), are each re-imaged (re-rasterized) to generate the divided images 910 and 920 shown in FIGS. 9B and 9C.
The reason why the image can be re-imaged is that the character string candidate area input as the division target of the character string dividing module 150 is already cut out by the character string cutting module 110, the character cutting module 120, and the character recognition module 130. The position information of the single character candidate area with respect to the character string candidate area (for example, the upper left coordinates of the rectangle) and the rectangle information (including the image in the rectangle in addition to the width and height of the rectangle) are obtained. Therefore, re-imaging is possible while maintaining the positional relationship of each single character candidate region with respect to the character string candidate region. That is, the position information and rectangular information of the single character candidate area are stored, and the reimaging process is performed using them.
In addition, since this divided image generation process generates a divided image by performing a re-imaging process, the processing load is higher than the above-described divided image generation process, but the following process can also be performed. It becomes like this.
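The following is a minimal sketch of the re-imaging described above. It assumes that each stored single-character candidate region is kept as two things: its upper-left offset within the character string candidate region, and a binary bitmap of its rectangle. The names `CharRegion`, `reimage` and `divide_by_reimaging` are hypothetical. The division is expressed as an index `k` into the regions in reading order. This is an illustration, not the embodiment's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharRegion:
    """A stored single-character candidate region (hypothetical layout).

    x, y: upper-left corner of the rectangle, relative to the character
    string candidate region. bitmap: pixels inside the rectangle.
    """
    x: int
    y: int
    bitmap: List[List[int]]  # rows of 0 (background) / 1 (ink)

    @property
    def width(self) -> int:
        return len(self.bitmap[0])

    @property
    def height(self) -> int:
        return len(self.bitmap)


def reimage(regions: List[CharRegion]) -> List[List[int]]:
    """Re-rasterize a group of regions onto a blank canvas cropped to the
    group's bounding box, keeping their relative positions."""
    left = min(r.x for r in regions)
    top = min(r.y for r in regions)
    right = max(r.x + r.width for r in regions)
    bottom = max(r.y + r.height for r in regions)
    canvas = [[0] * (right - left) for _ in range(bottom - top)]
    for r in regions:
        for dy, row in enumerate(r.bitmap):
            for dx, px in enumerate(row):
                if px:  # OR ink in, so overlapping rectangles never erase each other
                    canvas[r.y - top + dy][r.x - left + dx] = 1
    return canvas


def divide_by_reimaging(regions: List[CharRegion], k: int):
    """Divide between region k-1 and region k (regions in reading order)
    and return the two re-imaged divided images."""
    return reimage(regions[:k]), reimage(regions[k:])
```

For example, dividing `[CharRegion(0, 0, [[1, 1], [1, 0]]), CharRegion(1, 1, [[1, 1], [0, 1]]), CharRegion(5, 0, [[1]])]` at `k = 2` yields a 3×3 left image containing the first two regions and a 1×1 right image. Because ink is OR-ed onto a fresh canvas, overlapping rectangles are reproduced intact.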

For example, as shown in FIG. 12, suppose the character string division module 150 finds single-character candidate region position n−1 as the division position 1248 of the character string candidate region. The circumscribed rectangles of the two single-character candidate regions 1210 and 1220 on either side of division position 1248 may overlap, as in the example of FIG. 12. In that case, the divided image generation method described earlier simply cuts the character string candidate region at division position 1248. This cut splits the single-character candidate region "つ", which then cannot be recognized correctly and causes misrecognition.
In this divided image generation processing, however, the divided images are generated by re-imaging the group of single-character candidate regions on each side (left and right) of division position 1248. A single-character candidate region therefore cannot be split in this way, in principle, and the misrecognition that such splitting would cause is prevented.
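The contrast between a simple cut and re-imaging can be shown with a toy version of FIG. 12. This sketch assumes that each region is represented as a set of ink pixel coordinates. The coordinates, the cut column and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) within the character string candidate region


def regions_split_by_cut(regions: Dict[str, Set[Pixel]], cut_x: int) -> List[str]:
    """Simple cut at column cut_x: pixels with x < cut_x go left, the rest right.
    Returns the ids of regions whose ink lands on both sides (i.e. is split)."""
    return sorted(
        rid
        for rid, pts in regions.items()
        if any(x < cut_x for x, _ in pts) and any(x >= cut_x for x, _ in pts)
    )


def split_whole_regions(
    regions: Dict[str, Set[Pixel]], order: List[str], k: int
) -> Tuple[Set[Pixel], Set[Pixel]]:
    """Re-imaging style: each whole region goes to one side, so none is split."""
    left: Set[Pixel] = set()
    right: Set[Pixel] = set()
    for i, rid in enumerate(order):
        (left if i < k else right).update(regions[rid])
    return left, right


# Toy stand-in for FIG. 12 (hypothetical coordinates): the rectangles of
# regions 1210 and 1220 ("つ") overlap at x = 4..5, and the cut sits at x = 6,
# the right edge of region 1210's rectangle.
fig12 = {
    "1210": {(x, 0) for x in range(0, 6)},
    "1220": {(x, 2) for x in range(4, 10)},
}
```

Here `regions_split_by_cut(fig12, 6)` reports that region 1220 is split. By contrast, `split_whole_regions(fig12, ["1210", "1220"], 1)` keeps every pixel of each region together on its own side.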

A hardware configuration example of the image processing apparatus according to the present embodiment is described with reference to FIG. 13. The configuration shown in FIG. 13 is implemented by, for example, a personal computer (PC). It includes a data reading unit 1317 such as a scanner and a data output unit 1318 such as a printer.

The CPU (Central Processing Unit) 1301 is a control unit. It executes processing according to a computer program that describes the execution sequence of the modules described in the above embodiment. These modules are the character string extraction module 110, the character extraction module 120, the character recognition module 130, the character string determination module 140 and the character string division module 150.

The ROM (Read Only Memory) 1302 stores programs, operation parameters and the like used by the CPU 1301. The RAM (Random Access Memory) 1303 stores programs used during execution by the CPU 1301, together with parameters that change as appropriate during that execution. These components are interconnected by a host bus 1304, such as a CPU bus.

The host bus 1304 is connected via a bridge 1305 to an external bus 1306, such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 1308 and a pointing device 1309, such as a mouse, are input devices operated by an operator. The display 1310 is, for example, a liquid crystal display or a CRT (Cathode Ray Tube), and displays various kinds of information as text or images.

The HDD (Hard Disk Drive) 1311 contains a hard disk and drives it to record or reproduce the programs executed by the CPU 1301 and related information. The hard disk stores the character image data 108, character string image data 112, recognition result 132, final recognition result 144 and the like. It also stores various other computer programs, such as data processing programs.

The drive 1312 reads data or a program recorded on a mounted removable recording medium 1313, such as a magnetic disk, optical disk, magneto-optical disk or semiconductor memory. It supplies that data or program to the RAM 1303, which is connected via the interface 1307, the external bus 1306, the bridge 1305 and the host bus 1304. Like the hard disk, the removable recording medium 1313 can also be used as a data recording area.

The connection port 1314 is a port for connecting an external connection device 1315, and has a connection unit such as USB or IEEE 1394. The connection port 1314 is connected to the CPU 1301 and other components via the interface 1307, the external bus 1306, the bridge 1305, the host bus 1304 and so on. The communication unit 1316 is connected to a communication line and performs data communication with external devices. The data reading unit 1317 is, for example, a scanner, and reads documents. The data output unit 1318 is, for example, a printer, and outputs document data.

The hardware configuration shown in FIG. 13 is only one example. The present embodiment is not limited to it; any configuration capable of executing the modules described in the embodiment may be used. For example:

- some modules may be implemented in dedicated hardware, such as an Application Specific Integrated Circuit (ASIC);
- some modules may reside in an external system connected via a communication line;
- several of the systems shown in FIG. 13 may be connected to one another via communication lines and operate cooperatively.

The apparatus may also be built into a copier, fax machine, scanner or printer. It may likewise be built into a multifunction device, meaning an image processing apparatus that has two or more of the functions of a scanner, printer, copier, fax machine and so on.

Although the description uses mathematical expressions, each expression also covers its equivalents. Equivalents include the expression itself, transformations of it that do not affect the final result, and solving it by an algorithmic method.
In the description of the above embodiment, comparisons with a predetermined value may be substituted as follows, as long as the resulting combination raises no contradiction:

- "equal to or greater than" may be read as "greater than";
- "equal to or less than" may be read as "less than";
- "greater than" may be read as "equal to or greater than";
- "less than" may be read as "equal to or less than".

The program described above may be provided stored on a recording medium, or it may be provided via communication means. In that case, the program may be regarded, for example, as an invention of a "computer-readable recording medium on which a program is recorded".
A "computer-readable recording medium on which a program is recorded" is a computer-readable recording medium that holds the program and is used for installing, executing or distributing it.
Examples of the recording medium include:

- digital versatile discs (DVD), such as "DVD-R, DVD-RW, DVD-RAM, etc." (standards formulated by the DVD Forum) and "DVD+R, DVD+RW, etc." (standards formulated by the DVD+RW Alliance);
- compact discs (CD), such as read-only memory (CD-ROM), CD recordable (CD-R) and CD rewritable (CD-RW);
- Blu-ray Disc (registered trademark);
- magneto-optical disks (MO) and flexible disks (FD);
- magnetic tape and hard disks;
- read-only memory (ROM) and electrically erasable and rewritable read-only memory (EEPROM (registered trademark));
- flash memory and random access memory (RAM);
- SD (Secure Digital) memory cards.

All or part of the program may be recorded on the recording medium for storage or distribution. It may also be transmitted by communication over a transmission medium. Such a medium may be a wired network used for a local area network (LAN), metropolitan area network (MAN), wide area network (WAN), the Internet, an intranet or an extranet. It may also be a wireless communication network, or a combination of these. The program may also be carried on a carrier wave.
Furthermore, the program may be part of another program, or it may be recorded on a recording medium together with a separate program. It may also be divided and recorded across several recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

DESCRIPTION OF SYMBOLS
110 ... Character string extraction module
120 ... Character extraction module
130 ... Character recognition module
140 ... Character string determination module
150 ... Character string division module

Claims

An image processing apparatus comprising:
clipping means for cutting out, from an image, a character string candidate area that is a candidate area for a character string;
determining means for determining whether or not the character string candidate area is a single-line character string; and
dividing means for dividing the character string candidate area in a direction perpendicular to its line direction when the determination result is that it is not a single-line character string,
wherein the clipping means takes the image of an area divided by the dividing means as a target image.
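The interplay of the three means in this claim can be sketched as a recursive loop. The callables `clip`, `is_single_line` and `divide` stand in for the clipping, determining and dividing means; they and the function name are hypothetical placeholders. The `max_depth` guard is also hypothetical; it is added only so that recursion stops if a division fails to make progress.

```python
from typing import Any, Callable, List

Image = Any  # placeholder for whatever image type the pipeline uses


def extract_lines(
    image: Image,
    clip: Callable[[Image], List[Image]],
    is_single_line: Callable[[Image], bool],
    divide: Callable[[Image], List[Image]],
    max_depth: int = 8,
) -> List[Image]:
    """Clip string candidates from `image` and keep those judged to be one line.
    Divide the others perpendicular to the line direction, then feed the
    divided images back into the clipping step."""
    lines: List[Image] = []
    for candidate in clip(image):
        if max_depth == 0 or is_single_line(candidate):
            lines.append(candidate)  # accepted, or recursion budget exhausted
            continue
        for part in divide(candidate):
            # The clipping means takes each divided area as its new target image.
            lines.extend(extract_lines(part, clip, is_single_line, divide, max_depth - 1))
    return lines
```

As a usage example, strings can stand in for images. Then `extract_lines("ab|c+d", lambda s: s.split("|"), lambda s: "+" not in s, lambda s: s.split("+", 1))` returns `["ab", "c", "d"]`.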

The image processing apparatus according to claim 1, further comprising
second clipping means for cutting out, from the character string candidate area, a character candidate area that is a candidate area for a character,
wherein the determining means determines whether or not the character string candidate area is a single-line character string based on a feature amount of the character candidate area.

The image processing apparatus according to claim 2, wherein the determining means uses, as the feature amount of the character candidate area, one or more of the size of the character candidate area, its position, and the ratio between the sizes of the character string candidate area and the character candidate area.
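One of the feature amounts named in this claim is the size ratio between the character string candidate area and its character candidate areas. The sketch below shows how that ratio alone could be used. It assumes horizontal text and compares against the median character height. The threshold `max_height_ratio` and the function name are hypothetical, not values taken from the specification.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); horizontal text assumed


def looks_single_line(string_box: Box, char_boxes: List[Box],
                      max_height_ratio: float = 1.6) -> bool:
    """Judge a string candidate area as one line if its height is not much
    larger than the typical (median) height of its character candidate areas.

    max_height_ratio is a hypothetical threshold chosen for illustration.
    """
    if not char_boxes:
        return True  # nothing to compare against; leave the area undivided
    typical_height = max(median([h for _, _, _, h in char_boxes]), 1)
    return string_box[3] / typical_height <= max_height_ratio
```

Two merged lines roughly double the area's height relative to its characters, which pushes the ratio past the threshold.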

The image processing apparatus according to claim 2 or 3, further comprising
character recognition means for performing character recognition on the character candidate area,
wherein the determining means determines whether or not the character string candidate area is a single-line character string based on a recognition result from the character recognition means.
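A recognition-based judgment could, for example, count how many character candidates are recognized with high confidence. This rests on the assumption that two merged lines produce fragments that the recognizer scores poorly. The confidence scale, both thresholds and the function name are hypothetical.

```python
from typing import Sequence


def single_line_by_recognition(scores: Sequence[float],
                               min_score: float = 0.5,
                               min_fraction: float = 0.8) -> bool:
    """scores: one recognition confidence in [0, 1] per character candidate.

    The area is judged to be one line when at least `min_fraction` of its
    candidates are recognized with confidence >= `min_score`. Both thresholds
    are hypothetical.
    """
    if not scores:
        return False  # no recognizable characters: no evidence of a clean line
    confident = sum(1 for s in scores if s >= min_score)
    return confident / len(scores) >= min_fraction
```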

The image processing apparatus according to any one of claims 1 to 4, wherein the dividing means determines a division position based on a projection distribution in the line direction within the character string candidate area.
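The sketch below rests on two assumptions made for illustration. First, it reads "projection distribution in the line direction" as an ink count per column for horizontal text. Second, it places the cut at the emptiest column away from the edges. The claim itself does not specify how the distribution is used, and the function names and the `margin` parameter are hypothetical.

```python
from typing import List


def column_profile(bitmap: List[List[int]]) -> List[int]:
    """Project ink onto the line-direction (x) axis: one count per column."""
    return [sum(col) for col in zip(*bitmap)]


def choose_division_column(bitmap: List[List[int]], margin: float = 0.25) -> int:
    """Pick the column with the least ink, ignoring the outer `margin`
    fraction at each end so the cut does not land at the very edge.
    Ties go to the leftmost column."""
    profile = column_profile(bitmap)
    lo = int(len(profile) * margin)
    hi = max(lo + 1, len(profile) - lo)
    return min(range(lo, hi), key=lambda x: profile[x])
```

For a 3×8 bitmap whose column profile is `[2, 3, 3, 1, 2, 3, 3, 2]`, the search covers columns 2 to 5 and returns column 3.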

The image processing apparatus according to any one of claims 2 to 5, wherein the dividing means regenerates the image of the divided character string candidate area based on the character candidate areas.

An image processing program causing a computer to function as:
clipping means for cutting out, from an image, a character string candidate area that is a candidate area for a character string;
determining means for determining whether or not the character string candidate area is a single-line character string; and
dividing means for dividing the character string candidate area in a direction perpendicular to its line direction when the determination result is that it is not a single-line character string,
wherein the clipping means takes the image of an area divided by the dividing means as a target image.