JP3972546B2

JP3972546B2 - Image processing apparatus and image processing method

Info

Publication number: JP3972546B2
Application number: JP2000006396A
Authority: JP
Inventors: 篤伊藤; 佳恭武藤; 宗一岡; 進大来
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2000-01-14
Filing date: 2000-01-14
Publication date: 2007-09-05
Anticipated expiration: 2020-01-14
Also published as: JP2001195542A

Description

【０００１】
【発明の属する技術分野】
本発明は、入力画像中から文字列を抽出するための画像処理装置および画像処理方法に関するものである。
【０００２】
【従来の技術】
一般に、入力画像中からの文字列の抽出は、画像出力の高画質化のための領域別（テキスト／イメージ別）適応処理、画像圧縮時の容量削減のための領域別（テキスト／イメージ別）圧縮処理、ＯＣＲ（Optical Character Reader）の前処理等において行われている。例えば、近年のデジタル複写機においては、文字列を含有する原画や原稿等から画像情報を読み取ると、その画像情報について文字列の抽出を行った後に、文字列に相当する部分と他の部分とに対し別個に処理パラメータを切り替えたり圧縮処理等を行ったりすることで、それぞれの処理の好適化を図っている。
【０００３】
このような文字列の抽出は、従来、以下のようにして行われている。その１つとしては、入力画像に関する特徴量、例えば文字列を構成する画素（黒画素等）の分布をその画像周辺に投影し、その投影された特徴量に基づき投影軸方向（例えば入力画像の副走査方向）に沿った分布の切れ目を探し出し、その切れ目によって入力画像を分割することで、その入力画像から文字列を抽出する方法がある。また、他の１つとして、入力画像中に含まれるエッジ等を認識することで、局所的に文字候補となる塊を抽出し、その塊を所定のスレッショルド（閾値）を基に統合することで、入力画像中からの文字または文字列の抽出を行う方法がある。
【０００４】
【発明が解決しようとする課題】
しかしながら、上述した従来の文字列抽出技術では、入力画像中に文字の大きさや形状、文字間隔等が一定でない文字列や、文字の並ぶ方向が様々であったり曲線上に並んでいたりする文字列が混在していると、文字列の抽出を的確に行えないおそれがある。
【０００５】
例えば、上述した画像を分割する抽出方法では、画像周辺に特徴量を投影することにより一方向のみの情報になってしまうため、抽出すべき文字列の並ぶ方向、その文字列を構成する文字の大きさや行間等が略一定であれば有効であるが、文字列の方向、文字の大きさや行間等が異なっている文字列（例えば投影方向に対して斜めに並んでいる文字列）が混在していると、特徴量の分布の切れ目を探し出すことができず、結果として文字列を正しく抽出することができなくなってしまう。また、他の１つの方法である文字塊を統合する抽出方法では、文字列が局所的に一方向に並ぶことを利用して文字列を抽出するため、様々な方向に並ぶ文字列が混在していても文字列抽出が可能となるが、所定のスレッショルドに基づいて文字塊の統合を行うことから、統合時の条件が文字の大きさや形状、文字間隔等に大きく依存してしまう。したがって、文字の大きさ等や文字間隔が異なっている文字列が混在していると、文字列統合時の判定が困難になってしまい、文字列抽出を高精度に行えなくなるのに加え、その文字列抽出を複数回に分けて行う必要が生じてしまう可能性もある。さらには、上述したいずれの抽出方法も、曲線上に並んでいる文字列については全く考慮していないため、当該文字列が入力画像中に混在していても、これを文字列として抽出することができない。
【０００６】
そこで、本発明は、これらの問題点を鑑み、入力画像中に文字の大きさや形状、文字間隔等が一定でない文字列や、文字の並ぶ方向が様々であったり曲線上に並んでいたりする文字列が混在していても、これらの文字列を的確に、かつ、精度良く抽出することのできる画像処理装置および画像処理方法を提供することを目的とする。
【００１２】
本発明は上記目的を達成するために案出された画像処理装置で、入力画像から当該入力画像に含まれる文字候補を抽出する文字候補抽出手段と、前記文字候補抽出手段が抽出した文字候補に関する特徴量を抽出するとともに当該抽出を複数種類の特徴量について行う特徴抽出手段と、前記文字候補抽出手段が抽出した各文字候補に個別に付された識別番号である文字候補番号と、抽出すべき文字列候補の識別番号である文字列番号との関係が、入力Ｕの値によって出力Ｖの値が遷移するニューロンの状態によって特定され、当該ニューロンの出力Ｖの値によって各文字候補がどの文字列番号の文字列候補に属するか分かるように構成された処理テーブルと、前記特徴抽出手段が抽出した複数種類の特徴量を所定の演算式に代入して前記処理テーブルにおける各ニューロンの入力Ｕの値を変化させ、当該変化によって前記処理テーブル上に存在する全てのニューロンの出力Ｖの値を初期設定状態から最適状態に収束するように遷移させ、その遷移結果を基にしつつ前記文字候補抽出手段が抽出した複数の文字候補の中から同一の文字列番号に属する文字候補群を文字列を構成する文字候補群として抽出する文字列抽出手段とを備えることを特徴とするものである。
【００１３】
さらに、本発明は上記目的を達成するために案出された画像処理方法で、入力画像から当該入力画像に含まれる文字候補を抽出する文字候補抽出ステップと、前記文字候補抽出ステップで抽出した文字候補に関する特徴量を抽出するとともに当該抽出を複数種類の特徴量について行う特徴抽出ステップと、前記文字候補抽出ステップで抽出した各文字候補に個別に付された識別番号である文字候補番号と、抽出すべき文字列候補の識別番号である文字列番号との関係が、入力Ｕの値によって出力Ｖの値が遷移するニューロンの状態によって特定され、当該ニューロンの出力Ｖの値によって各文字候補がどの文字列番号の文字列候補に属するか分かるように構成された処理テーブルを用い、前記特徴抽出ステップで抽出した複数種類の特徴量を所定の演算式に代入して前記処理テーブルにおける各ニューロンの入力Ｕの値を変化させ、当該変化によって前記処理テーブル上に存在する全てのニューロンの出力Ｖの値を初期設定状態から最適状態に収束するように遷移させ、その遷移結果を基にしつつ前記文字候補抽出手段が抽出した複数の文字候補の中から同一の文字列番号に属する文字候補群を文字列を構成する文字候補群として抽出する文字列抽出ステップとを備えることを特徴とする。
【００１４】
上記構成の画像処理装置または上記手順の画像処理方法によれば、入力画像に含まれる文字候補を抽出すると、抽出した文字候補について、複数種類の特徴量抽出する。ここで抽出する特徴量としては、例えば、他の文字候補との間を結ぶ直線または曲線との距離関係、文字候補の大きさや形状、他の文字候補との間隔等が挙げられる。そして、複数種類の特徴量を抽出すると、それぞれの特徴量を参照しながらニューロンの状態によって表されるニューラル表現の処理テーブルにおける各ニューロンの値を遷移させ、その遷移結果を基にしつつ入力画像中から抽出した文字候補の中から文字列を構成する文字候補群を抽出する。これにより、入力画像中に存在する文字列は、その文字列を構成する文字の大きさや並ぶ方向等に依存することなく、文字列として抽出されることになる。
【００１５】
【発明の実施の形態】
以下、図面に基づき本発明に係る画像処理装置および画像処理方法について説明する。
【００１６】
先ず、本発明に係る画像処理装置の概略構成について説明する。図１は、本発明に係る画像処理装置の一例の概略構成を示すブロック図である。
【００１７】
図例のように、本実施形態における画像処理装置は、画像入力手段１０と、文字候補抽出手段２０と、文字列抽出手段３０と、画像出力手段４０と、から構成されている。
【００１８】
画像入力手段１０は、入力画像を光学的読み取りによって取得するためのもので、具体的にはスキャナやデジタルカメラ等からなるものである。この画像入力手段１０では、文字列を含有する原画または原稿、特にオフィス文書、メモ書き、手書き文書、カタログ、マニュアル、雑誌、チラシ、地図、標識や看板や車体番号等の文字列を含有する写真から、入力画像を取得するようになっている。ただし、画像入力手段１０は、ＬＡＮ（Local Area Network）やいわゆるインターネット等に繋がる通信回線を介して入力画像を取得するものであってもよい。
【００１９】
文字候補抽出手段２０は、画像入力手段１０が入力画像を取得すると、後述するようにして、その入力画像から当該入力画像に含まれる文字候補を抽出するものであり、具体的にはＣＰＵ（Central Processing Unit)とこれに実行される所定プログラムとの組み合わせ等によって実現されるものである。
【００２０】
文字列抽出手段３０は、文字候補抽出手段２０が文字候補を抽出すると、その抽出結果を基にしつつ、後述するようにして、入力画像中に存在する文字列を抽出するものであり、具体的にはＣＰＵとこれに実行される所定プログラムとの組み合わせ等によって実現されるものである。
【００２１】
画像出力手段４０は、文字列抽出手段３０による文字列抽出結果を外部へ出力するためのもので、具体的には外部装置とのインターフェース等からなるものである。この画像出力手段４０による出力先としては、例えば、画像入力手段１０が取得した入力画像に対して階調補正や画像圧縮等の処理を行う他の画像処理装置や、その入力画像中に存在する文字を認識するＯＣＲが挙げられる。
【００２２】
ここで、このように構成された画像処理装置における文字候補抽出手段２０の詳細について説明する。図２は、文字候補抽出手段の構成例を示すブロック図である。
【００２３】
図例のように、本実施形態における文字候補抽出手段２０は、エッジ抽出手段２１と、２値化手段２２と、１文字候補抽出手段２３と、から構成されている。
【００２４】
エッジ抽出手段２１は、画像入力手段１０が取得した入力画像に対し、その複数方向（例えば主走査方向および副走査方向）について、エッジの抽出を行うものである。具体的には、例えば図３に示すように、並列に動作する複数のデジタルフィルタ１〜４を備えており、入力画像を構成する各画素値に対し並列にデジタルフィルタリングを行って、その結果と当該画素値との差を算出し、当該差の絶対値の最も大きい値を出力することで、入力画像中に存在するエッジを抽出するようになっている。ただし、エッジ抽出手段２１は、他の周知技術を用いてエッジ抽出処理を行うようにしてもよい。
【００２５】
また図２において、２値化手段２２は、エッジ抽出手段２１が抽出したエッジに対し２値化を行うものである。２値化処理に関しては、単純２値化や浮動２値化などといった周知技術を利用して行うことが考えられる。
【００２６】
１文字候補抽出手段２３は、２値化手段２２による２値化処理後の画像に対して、当該画像中に存在する点や線分等の画像要素毎に外接矩形を仮設するとともに、互いに関連する外接矩形同士は１つの塊であると認識し、当該塊を１つの文字候補として抽出するものである。詳しくは、例えば図４（ａ）に示すように１つの外接矩形１内に他の外接矩形２が位置する場合、図４（ｂ）に示すように１つの外接矩形１と他の外接矩形２とが重なり合って共有領域を持つ場合、図４（ｃ）に示すように１つの外接矩形１と他の外接矩形２との大きさ比、距離間隔、それぞれの形状等が所定の関係にある場合に、いずれも双方の外接矩形１，２同士は１つの塊であると認識し、当該外接矩形１，２同士を統合するようになっている。つまり、１文字候補抽出手段２３は、１つの文字を構成するであろう１個以上の画像要素を、１つの文字候補として抽出するものである。なお、１文字候補抽出手段２３は、文字候補の抽出にあたって、外接矩形の代わりに外接多角形を用いるようにしてもよい。
【００２７】
このように、文字候補抽出手段２０では、エッジ抽出手段２１、２値化手段２２および１文字候補抽出手段２３を備えることによって、画像入力手段１０が取得した入力画像から、１つの文字を構成すると判定される１個以上の画像要素を１つの文字候補として抽出するようになっている。ただし、入力画像中に存在する文字候補を１文字分毎に抽出できれば、文字候補抽出手段２０は、上述した以外の構成によるもの、すなわち上述した以外の周知技術を利用したものであっても構わない。また、例えば画像入力手段１０が取得した入力画像が当初から２値画像である場合には、エッジ抽出手段２１および２値化手段２２を省略することも考えられる。
【００２８】
次いで、本実施形態の画像処理装置において最も特徴的な部分である文字列抽出手段３０の詳細について説明する。図５は、文字列抽出手段の構成例を示すブロック図である。
【００２９】
図例のように、本実施形態における文字列抽出手段３０は、特徴抽出手段３１と、文字列候補抽出手段３２と、文字列評価手段３３と、から構成されている。
【００３０】
特徴抽出手段３１は、文字候補抽出手段２０が抽出したそれぞれの文字候補に対して、その特徴量の抽出を行うものである。この特徴抽出手段３１が抽出する特徴量としては、例えば、文字候補を構成する点や線分等といった画像要素の面積、文字候補の外接矩形の面積、外接多角形の面積、外接矩形または外接多角形の辺の長さ、外接矩形の縦横比、外接多角形の最長幅と最短幅の比、外接矩形または外接多角形の重心位置の座標、などが挙げられる。すなわち、特徴抽出手段３１は、文字候補抽出手段２０が抽出した文字候補に関する特徴量を抽出するとともに、その抽出を複数種類の特徴量について行うものである。なお、特徴量の抽出処理は、周知技術を利用した所定の演算等によって行えばよい。
【００３１】
文字列候補抽出手段３２は、特徴抽出手段３１が抽出した文字候補に関する特徴量を参照しながら、文字候補抽出手段２０が抽出した複数の文字候補の中から文字列を構成する文字候補群の候補、すなわち文字列候補を抽出するものである。
【００３２】
ところで、この文字列候補抽出手段３２は、全連結型のニューラルネットワークによって構成されており、そのニューラルネットワークによる最適化手法を利用して文字列候補を抽出する点に特徴がある。なお、ニューラルネットワークとは、生物の脳の神経細胞（ニューロン）を手本に構成された人口神経回路網をいう。
【００３３】
ここで、ニューラルネットワーク構成の文字列候補抽出手段３２について、さらに詳しく説明する。図６はニューロンモデルを示す説明図であり、図７はそのニューロンモデルの入出力関係の一例であるマッカロック・ピッツモデルを示す説明図であり、図８はニューラル表現の一例を示す説明図である。これらの図中において、Ｖｊは他ニューロンの出力、Ｗｊは重み係数、ＵはニューロンＡの入力、ＶはニューロンＡの出力を示しているものとする。なお、以下の説明を簡単にするため、Ｗｊは均等（＝１）であるものとする。
【００３４】
図６および図７では、ニューロンＡの入力Ｕの状態が他ニューロンの出力Ｖｊに応じて変化するとともに、その入力ＵによってニューロンＡの出力Ｖの値が遷移することを示している。具体的には、図７に示すマッカロック・ピッツモデルであれば、ニューロンＡの入力Ｕが「０」を境に、それよりも大きければ出力Ｖの値が「１」となり、それよりも小さければ出力Ｖの値が「０」となる。なお、ここでは、ニューロンモデルの一例としてマッカロック・ピッツモデルを示しているが、その他にはシグモイド関数モデルやヒステリシス・マッカロック・ピッツモデルなどがある。
【００３５】
図８のニューラル表現は、文字列候補抽出手段３２が有する処理テーブルの状態の一例を示している。この処理テーブルの横軸方向は、文字候補抽出手段２０が抽出した各文字候補に個別に付された識別番号（以下「文字候補番号」という）ｊを表している。したがって、文字候補抽出手段２０で抽出された文字候補がｍ個であれば、処理テーブルの横軸方向には、ｊ＝１〜ｍの文字候補番号が存在することになる。一方、処理テーブルの縦軸方向は、文字列候補抽出手段３２が抽出すべき文字列候補の識別番号（以下「文字列番号」という）ｉを表している。
【００３６】
これら文字候補番号と文字列番号とによって特定される領域、すなわち図中の点線枠内の領域内は、ニューロンの状態を表している。つまり、図中の点線枠内の「０」および「１」は、それぞれがニューロンの出力Ｖの値を表している。これらの値により、この処理テーブルからは、以下のことが分かる。例えば、文字候補番号ｊ＝３で文字列番号ｉ＝２に相当するニューロンが「１」だった場合には、その文字候補番号が３である文字候補は、文字列番号が２である文字列に属する、ということが分かる。
【００３７】
この処理テーブルにおける各ニューロンの出力Ｖの値は、初期状態においては乱数発生によって設定されるが、その後、文字列候補抽出手段３２が文字列候補の抽出を完了するまでの間に、それぞれが最適な状態となるように遷移する。この遷移のために、文字列候補抽出手段３２は、以下の（１）式のような動作式を用意し、さらに後述する（６）式のような更新式を用いて、該当するニューロンの入力Ｕを変化させるようになっている。
【００３８】
【数１】

【００３９】
この動作式（１）において、Ａ，Ｂ，Ｃ，Ｄ，Ｅは、それぞれ予め設定されている係数である。
【００４０】
また、ＡＲＥＡは、注目対象の外接矩形における他の外接矩形に対する面積比に該当するもので、以下の（２）式によって表される。
【００４１】
【数２】

【００４２】
この（２）式において、ＬＸｋ、ＬＹｋは、それぞれ外接矩形ｋの直交する辺の長さであり、ＬＸｊ、ＬＹｊはそれぞれ外接矩形ｊの直交する辺の長さを示している。ただし、ＡＲＥＡは、文字候補を構成する画像要素の面積や外接矩形の面積等を基に、外接矩形同士の面積比を特定するものであってもよい。
【００４３】
また、動作式（１）において、ＮＥＡＲは、同一の文字列番号ｉに属する文字候補の外接矩形についての回帰直線と、注目対象の外接矩形ｋの重心座標との間の距離に該当するもので、以下の（３）式によって表される。
【００４４】
【数３】

【００４５】
この（３）式において、Ｘｋ、Ｙｋは、それぞれ外接矩形ｋの重心座標であり、ａｉ，ｂｉ，ｃｉは文字列ｉの回帰直線ａ_iｘ_k＋ｂ_iｙ_k＋ｃ_i＝０の係数である。
【００４６】
ただし、上述した（３）式ではなく、例えば以下の（４）式のように、ｙ軸方向とｘ軸方向に関する回帰直線との距離のうちの小さいほうをＮＥＡＲとしてもよい。
【００４７】
【数４】

【００４８】
なお、この（４）式において、ｍｉｎ（Ｑ，Ｒ）は、ＱとＲのうち小さいほうのいずれかの値を示し、さらにＱまたはＲの値が存在しない場合はＲまたはＱの値を出力することを意味している。
【００４９】
また、このとき、回帰直線の代わって回帰曲線（２次曲線、３次曲線…、等）を用いれば、曲線状に並んだ文字候補から文字列候補を抽出することができるようになる。例えば、２次曲線を回帰曲線ａ_iｘ_k ²＋ｂ_iｘ_k＋ｃ_iｙ_k＋ｄ_i＝０とすれば、その回帰曲線と外接矩形ｋの重心座標のｙ軸方向の距離は、以下の（５）式のようになる。
【００５０】
【数５】

【００５１】
さらには、上述したようなＮＥＡＲの代わりに、ＮＥＡＲを文字列ｉに属する文字候補の外接矩形の面積の平方根で割った値を使用することも考えられる。
【００５２】
また、動作式（１）において、ＩＮＴＥＲは、注目対象の外接矩形の重心と他外接矩形の重心との距離に該当するものである。また、動作式（１）中におけるmin≡k,k=1〜m（Ｑｋ）の項は、ｋ＝１〜ｍに対応するＱｋのうち正数で最も小さい値を表している。よって、min≡k,k=1〜m（INTER Ｖ_ik）の項は、文字候補ｊと、その文字候補ｊが属する文字列ｉを構成する他の文字候補の中で当該文字候補ｊの最も近傍にある他の文字候補との間の距離に該当する。ただし、このとき、文字候補ｊが属する文字列ｉを構成する他の文字候補の中で当該文字候補ｊに対し２番目に近傍にある文字候補との距離を考慮に入れてもよい。さらには、上述したようなＩＮＴＥＲの代わりに、ＩＮＴＥＲを文字列ｉに属する文字候補の外接矩形の面積の平方根で割った値を使用することも考えられる。
【００５３】
また、動作式（１）において、ｈ(t) は、周知のヒルクライム項に相当するものであり、局所最適解から脱出する働きを持っている。
【００５４】
以上のような動作式（１）によりｄＵ_ijを算出すると、文字列候補抽出手段３２は、そのｄＵ_ijを以下の（６）式のような更新式に代入して、各ニューロンの入力Ｕの値を変化させる。
【００５５】
【数６】

【００５６】
このとき、文字列候補抽出手段３２では、適切な収束条件を与えることで、動作式（１）および更新式（６）による各ニューロンの入力Ｕの更新を終了することができる。その結果、文字列候補抽出手段３２では、ニューラル表現の処理テーブルにおける各ニューロンの出力Ｖの値が最適な状態となるように遷移するので、その遷移後の各ニューロンの出力Ｖの値から各文字候補の属する文字列を判定でき、その文字列の特性を算出することができるようになる。
【００５７】
つまり、文字列候補抽出手段３２は、特徴抽出手段３１が抽出した文字候補に関する特徴量を参照しながら、動作式（１）および更新式（６）を用いてニューラル表現の処理テーブルにおける各ニューロンの出力Ｖの値を最適な状態に遷移させ、その遷移結果を基にしつつ文字列候補を抽出するようになっている。
【００５８】
このような文字列候補抽出手段３２に抽出された文字列候補は、文字列抽出手段３０の文字列評価手段３３によって、文字列であるか否かの評価判定が行われる。すなわち、文字列評価手段３３は、文字列候補抽出手段３２によって抽出された文字列候補に対して、文字列であるか否かの最終的な判断を行うものである。かかる判断は、例えば、直線性に関する評価関数、外接矩形の面積比に関する評価関数、外接矩形の間隔に関する評価関数等を用いて行うことが考えられる。
【００５９】
直線性に関する評価関数は、文字列ｉに属する文字候補の回帰直線または回帰曲線との平均距離を評価するためのものであり、具体的にはそれぞれは以下の（７）、（８）式によって表されるものである。
【００６０】
【数７】

【００６１】
【数８】

【００６２】
外接矩形の面積比に関する評価関数は、文字列ｉに属する文字候補の外接矩形の面積に関する分散を評価するためのものであり、具体的には以下の（９）式によって表されるものである。
【００６３】
【数９】

【００６４】
外接矩形の間隔に関する評価関数は、文字列ｉに属する文字候補の外接矩形の重心座標に関して、最近傍の重心座標との平均距離を評価するためのものであり、具体的には以下の（１０）式によって表されるものである。ただし、この外接矩形の間隔に関する評価関数は、２番目に近傍である重心座標との距離を考慮するようにしてもよい。
【００６５】
【数１０】

【００６６】
文字列評価手段３３は、これら（７）〜（１０）式のそれぞれに対応した閾値Ｔ₁，Ｔ₂，Ｔ₃，Ｔ₄を予め用意しており、各式の算出結果をそれぞれに対応する閾値Ｔ₁，Ｔ₂，Ｔ₃，Ｔ₄と比較することで、文字列候補抽出手段３２に抽出された文字列候補が文字列であるか否かを判断するようになっている。例えば、（７）式の算出結果が閾値Ｔ₁の範囲内になければ、文字列評価手段３３は、抽出された文字列候補が文字列ではないと判断する。また、（９）式の算出結果が閾値Ｔ₃の範囲内に収まっていなければ、文字列評価手段３３は、抽出された文字列候補が文字列ではないと判断する。また、（１０）式の算出結果が閾値Ｔ₄の範囲内になければ、文字列評価手段３３は、抽出された文字列候補が文字列ではないと判断する。
【００６７】
つまり、文字列評価手段３３は、文字列候補抽出手段３２によって抽出された文字列候補に対して、文字列であるか否かの最終的な判断を行うことによって、文字列抽出手段３０における文字列の抽出精度を向上させるためのものである。
【００６８】
次に、以上のように構成された画像処理装置における文字列抽出の処理手順、すなわち本実施形態における画像処理方法について説明する。
【００６９】
本実施形態の画像処理装置では、画像入力手段１０が入力画像を取得すると、文字候補抽出手段２０がその入力画像から当該入力画像に含まれる全ての文字候補を抽出する。そして、全ての文字候補が抽出されると、文字列抽出手段３０の特徴抽出手段３１は、各文字候補に関する複数種類の特徴量を抽出する。また、これと同時に、各文字候補には、文字候補番号ｊが付される。
【００７０】
その後、文字列抽出手段３０の文字列候補抽出手段３２は、ニューラル表現の処理テーブルにおける各ニューロンの出力Ｖの値を最適な状態に遷移させ、その遷移結果から文字列候補を抽出する。そのために、文字列候補抽出手段３２は、先ず、処理テーブルにおける各ニューロンの出力Ｖの値を、乱数発生によって設定する。これにより、処理テーブルにおいては、例えば図８に示すように、文字候補番号ｊの数に対応し、かつ、２次元のマップ状に構成された各ニューロンの出力Ｖの値（「０」または「１」）が設定されることになる。
【００７１】
各ニューロンの出力Ｖの値を乱数発生によって設定すると、続いて、文字列候補抽出手段３２は、次いで、処理テーブル上における１つのニューロンを注目対象とし、その注目対象の値を動作式（１）および更新式（６）を用いて最適な状態に遷移させる。
【００７２】
例えば、文字候補番号ｊ＝１で文字列番号ｉ＝１に相当するニューロンを注目対象とした場合であれば、各文字候補に関する複数種類の特徴量を参照しつつ、文字候補番号ｊ＝１に該当する文字候補の外接矩形と文字列番号ｉ＝１に属する他の外接矩形との面積比、文字候補番号ｊ＝１に該当する文字候補の外接矩形の重心座標と文字列番号ｉ＝１に属する文字候補の外接矩形についての回帰直線との間の距離、文字候補番号ｊ＝１に該当する文字候補の外接矩形の重心と文字列番号ｉ＝１に属する他の外接矩形の重心との距離、等を動作式（１）に代入してｄＵ_ijを算出するとともに、そのｄＵ_ijを更新式（６）に代入して各ニューロンの入力Ｕの値を変化させる。そのため、文字候補番号ｊ＝１で文字列番号ｉ＝１に相当するニューロンの出力Ｖの値は、図７に示すように入力Ｕの値の変化に応じて、文字候補番号ｊ＝１に該当する文字候補が文字列番号ｉ＝１に属するのに適していれば「１」に、そうでなければ「０」に遷移するようになる。この遷移の結果、文字候補番号ｊ＝１で文字列番号ｉ＝１に相当するニューロンの出力Ｖの値が「１」だった場合には、その文字候補番号が１である文字候補は、文字列番号が１である文字列候補に属することになる。
【００７３】
文字列候補抽出手段３２は、このような出力Ｖの値の遷移を、処理テーブル上に存在する全てのニューロンについて、順次繰り返して行う。これにより、処理テーブル上における各ニューロンの出力Ｖの値は、最適な状態となるように遷移して、当該最適な状態に収束することになる。
【００７４】
その後、文字列抽出手段３０では、文字列候補抽出手段３２に抽出された文字列候補に対して、文字列評価手段３３が文字列であるか否かを判断する。この判断の結果、文字列と判断された文字列候補のみが、画像出力手段４０から文字列の抽出結果として出力されることになる。
【００７５】
なお、ニューラル表現の処理テーブルを構成する文字候補番号ｊと文字列番号ｉとの数は、文字列候補抽出手段３２における処理容量（処理能力）等を考慮して適宜決定すればよいが、例えば文字列候補抽出手段３２と文字列評価手段において、文字列評価手段３３で文字列として判断された当該文字列を削除した結果を再び文字列候補抽出手段３２に入力するという手順を踏み、文字列候補抽出手段３２と文字列評価手段３３とによる処理を繰り返すようにすれば、処理テーブルにおける文字候補番号ｊと文字列番号ｉとの数に限定されずに、より多数の文字列を入力画像から抽出することも可能となる。
【００７６】
以上のように、本実施形態の画像処理装置および画像処理方法によれば、入力画像に含まれる文字候補を抽出した後に、文字候補同士を結ぶ直線または曲線と各文字候補との距離関係、すなわち入力画像中における文字候補の位置を考慮しつつ、ニューラル表現の処理テーブルを利用して文字列に沿うであろう直線または曲線を特定する。このとき、入力画像中に複数の文字列が混在していれば、その入力画像中における文字候補の位置を考慮するので、複数の直線または曲線を特定することになる。そして、直線または曲線を特定すると、その直線または曲線上に並ぶ文字候補群については、各文字候補の大きさ等に拘らず文字列を構成するものとして抽出する。
【００７７】
また、本実施形態の画像処理装置および画像処理方法によれば、入力画像に含まれる文字候補を抽出すると、抽出した文字候補について複数種類の特徴量抽出し、それぞれの特徴量を考慮しつつ、それぞれの特徴量に基づいてニューラル表現の処理テーブルを利用しながら、入力画像中から抽出した文字候補の中から文字列を構成する文字候補群を抽出する。
【００７８】
これらのことより、本実施形態の画像処理装置および画像処理方法では、入力画像中に文字の大きさや形状、文字間隔等が一定でない文字列や、文字の並ぶ方向が様々であったり曲線上に並んでいたりする文字列が混在している場合であっても、その文字の大きさや並ぶ方向等に依存することなく、精度の高い文字列抽出を行うことが可能となる。また、入力画像中において曲線上に並ぶ文字により構成される文字列が存在しても、その文字列を的確に抽出することができる。しかも、その際に、各文字候補についての複数種類の特徴量を総合的に考慮するので、文字列統合時の判定が困難になったり、文字列抽出を複数回に分けて行ったりする必要がない。
【００７９】
特に、本実施形態では、文字列候補抽出手段３２が動作式（１）において、文字候補の特徴量として、複数の文字候補を結ぶ回帰直線または回帰曲線と注目対象である文字候補の重心座標との距離関係を考慮するようになっている。したがって、例えば入力画像中に互いに異なる方向を向いた文字列が混在していたり、曲線上に並ぶ文字列が存在していても、これらを的確に抽出することができる。
【００８０】
また、本実施形態における文字列候補抽出手段３２は、文字候補の特徴量として、各文字候補の大きさをも考慮するようになっている。具体的には、動作式（１）において、文字候補を構成する画像要素の面積、文字候補に外接する矩形の面積、あるいは文字候補に外接する矩形の辺の長さ等を考慮するようになっている。したがって、例えば入力画像中にフォントの大きさが互いに異なる文字列が混在していても、これらを的確に抽出することができる。
【００８１】
さらに、このとき、各文字候補の形状、例えば文字候補に外接する矩形の縦横比をも考慮すれば、フォントの大きさのみならず、フォントの種類等が互いに異なる文字列が混在していても、これらを的確に抽出し得るようになる。
【００８２】
また、本実施形態における文字列候補抽出手段３２は、文字候補の特徴量として、各文字候補同士の間隔をも考慮するようになっている。具体的には、動作式（１）において、注目対象の外接矩形の重心と他外接矩形の重心との距離を考慮するようになっている。したがって、例えば入力画像中に文字間隔が互いに異なる文字列が混在していても、これらを的確に抽出することができるようになる。
【００８３】
その上、本実施形態の画像処理装置および画像処理方法では、文字列評価手段３３による評価判定を経た後に、文字列の抽出結果を出力するようになっているので、より一層の文字列抽出の高精度化が図れるようになる。
【００８４】
なお、本実施形態では、文字列候補抽出手段３２が上述した（１）〜（６）式を用いて文字列候補の抽出を行う場合を例に挙げて説明したが、本発明はこれに限定されるものではなく、例えば他のニューロンモデルや他のニューラルネットワークによる最適化手法を利用したものであっても構わない。また、他の最適化手法を利用したものであっても構わない。
【００８５】
【発明の効果】
以上に説明したように、本発明に係る画像処理装置および画像処理方法は、入力画像中に文字の大きさや形状、文字間隔等が一定でない文字列や、文字の並ぶ方向が様々な文字列が混在していたり、曲線上に並ぶ文字により構成される文字列が存在していても、その文字列を構成する文字の大きさや並ぶ方向等に依存することなく、精度の高い文字列抽出を行うことができる。しかも、その際に、各文字候補についての特徴量を総合的に考慮すれば、従来のように文字列統合時の判定が困難になったり文字列抽出を複数回に分けて行ったりすることなく、文字列抽出の高精度化が図れるようになる。
【図面の簡単な説明】
【図１】本発明に係る画像処理装置の一例の概略構成を示すブロック図である。
【図２】図１の画像処理装置が備える文字候補抽出手段の構成例を示すブロック図である。
【図３】図２の文字候補抽出手段における要部の構成例を示すブロック図である。
【図４】図２の文字候補抽出手段による１文字抽出処理の概要を示す説明図であり、（ａ）〜（ｃ）はそれぞれ１文字抽出処理の一態様を示す図である。
【図５】図１の画像処理装置が備える文字列抽出手段の構成例を示すブロック図である。
【図６】ニューロンモデルを示す説明図である。
【図７】ニューロンモデルの入出力関係の一例であるマッカロック・ピッツモデルを示す説明図である。
【図８】ニューラル表現の一例を示す説明図である。
【符号の説明】
１０…画像入力手段、２０…文字候補抽出手段、３０…文字列抽出手段、３１…特徴抽出手段、３２…文字列候補抽出手段、３３…文字列評価手段
[0001]
[Technical field to which the invention belongs]
The present invention relates to an image processing apparatus and an image processing method for extracting a character string from an input image.
[0002]
[Prior art]
In general, character strings are extracted from an input image in region-specific (text/image) adaptive processing for improving output image quality, in region-specific (text/image) compression for reducing data size during image compression, in OCR (Optical Character Reader) preprocessing, and the like. For example, when a recent digital copying machine reads image information from an original or manuscript containing character strings, it extracts the character strings from that image information and then switches processing parameters or applies compression separately to the portions corresponding to character strings and to the other portions, so that each kind of processing is optimized.
[0003]
Such extraction of character strings is conventionally performed as follows. One method projects a feature amount of the input image, for example the distribution of the pixels (black pixels or the like) that make up character strings, onto the periphery of the image, searches for breaks in the projected distribution along the projection axis (for example, the sub-scanning direction of the input image), and divides the input image at those breaks, thereby extracting character strings from the input image. Another method recognizes edges or the like in the input image to extract blocks that are local character candidates, and merges those blocks on the basis of a predetermined threshold to extract characters or character strings from the input image.
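The projection-based method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the conventional implementation itself: the image is taken to be a small binary array (1 = black pixel), the projection is a per-row count of black pixels, and the function name `split_rows_by_projection` is hypothetical.

```python
def split_rows_by_projection(image):
    """Split a binary image into row bands separated by empty rows.

    image: list of rows, each a list of 0/1 values (1 = black pixel).
    Returns a list of (start_row, end_row_exclusive) bands.
    """
    # Project the black-pixel distribution onto the vertical axis.
    profile = [sum(row) for row in image]

    bands = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                  # a band containing black pixels begins
        elif count == 0 and start is not None:
            bands.append((start, y))   # a break in the distribution ends the band
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
bands = split_rows_by_projection(page)
print(bands)
```

Because the profile keeps information along one axis only, lines that run obliquely to the projection direction would leave no empty rows between them, and no break would be found.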
[0004]
[Problems to be solved by the invention]
However, with the conventional character string extraction techniques described above, character strings may not be extracted accurately when the input image mixes character strings whose character size, shape, or spacing is not constant, or character strings whose characters run in various directions or lie along a curve.
[0005]
For example, the extraction method that divides the image projects the feature amount onto the periphery of the image and therefore retains information in only one direction. It is effective when the direction of the character strings to be extracted and the size and line spacing of their characters are approximately constant. However, when character strings that differ in direction, character size, line spacing, and so on are mixed (for example, character strings running obliquely to the projection direction), breaks in the distribution of the feature amount cannot be found, and as a result the character strings cannot be extracted correctly. The other method, which merges character blocks, extracts character strings by exploiting the fact that characters line up locally in one direction, so it can extract character strings even when strings running in various directions are mixed. However, because the character blocks are merged on the basis of a predetermined threshold, the merging conditions depend heavily on character size, shape, spacing, and so on. Therefore, when character strings with different character sizes or spacings are mixed, the merging decision becomes difficult, character strings cannot be extracted with high accuracy, and the extraction may also have to be carried out in several passes. Furthermore, neither extraction method takes character strings arranged along a curve into account at all, so even if such a character string is present in the input image, it cannot be extracted as a character string.
[0006]
Therefore, in view of these problems, an object of the present invention is to provide an image processing apparatus and an image processing method capable of extracting character strings accurately and with high precision even when the input image mixes character strings whose character size, shape, or spacing is not constant, or character strings whose characters run in various directions or lie along a curve.
[0012]
The present invention is an image processing apparatus devised to achieve the above object, comprising: character candidate extraction means for extracting, from an input image, character candidates included in the input image; feature extraction means for extracting feature amounts of the character candidates extracted by the character candidate extraction means, the extraction being performed for a plurality of types of feature amounts; a processing table in which the relationship between a character candidate number, which is an identification number individually assigned to each character candidate extracted by the character candidate extraction means, and a character string number, which is an identification number of a character string candidate to be extracted, is specified by the state of a neuron whose output V transitions according to the value of its input U, the table being configured so that the value of the output V of each neuron indicates which character string number's character string candidate each character candidate belongs to; and character string extraction means for substituting the plurality of types of feature amounts extracted by the feature extraction means into a predetermined arithmetic expression to change the value of the input U of each neuron in the processing table, causing by that change the values of the outputs V of all neurons in the processing table to transition so as to converge from an initial state to an optimal state, and extracting, on the basis of the transition result, a group of character candidates belonging to the same character string number from among the plurality of character candidates extracted by the character candidate extraction means as a group of character candidates constituting a character string.
[0013]
Furthermore, the present invention is an image processing method devised to achieve the above object, comprising: a character candidate extraction step of extracting, from an input image, character candidates included in the input image; a feature extraction step of extracting feature amounts of the character candidates extracted in the character candidate extraction step, the extraction being performed for a plurality of types of feature amounts; and a character string extraction step of using a processing table in which the relationship between a character candidate number, which is an identification number individually assigned to each character candidate extracted in the character candidate extraction step, and a character string number, which is an identification number of a character string candidate to be extracted, is specified by the state of a neuron whose output V transitions according to the value of its input U, the table being configured so that the value of the output V of each neuron indicates which character string number's character string candidate each character candidate belongs to, substituting the plurality of types of feature amounts extracted in the feature extraction step into a predetermined arithmetic expression to change the value of the input U of each neuron in the processing table, causing by that change the values of the outputs V of all neurons in the processing table to transition so as to converge from an initial state to an optimal state, and extracting, on the basis of the transition result, a group of character candidates belonging to the same character string number from among the plurality of character candidates extracted by the character candidate extraction means as a group of character candidates constituting a character string.
[0014]
According to the image processing apparatus configured as above or the image processing method following the above procedure, once the character candidates included in the input image have been extracted, a plurality of types of feature amounts are extracted for them. Examples of the feature amounts extracted here include the distance relationship to a straight line or curve connecting a candidate with other character candidates, the size and shape of the character candidate, and the spacing between it and other character candidates. Once the plurality of types of feature amounts have been extracted, the value of each neuron in the processing table, a neural representation expressed by neuron states, is made to transition with reference to those feature amounts, and on the basis of the transition result a group of character candidates constituting a character string is extracted from the character candidates extracted from the input image. As a result, a character string present in the input image is extracted as a character string regardless of the size of its characters, the direction in which they are arranged, and so on.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an image processing apparatus and an image processing method according to the present invention will be described with reference to the drawings.
[0016]
First, a schematic configuration of an image processing apparatus according to the present invention will be described. FIG. 1 is a block diagram showing a schematic configuration of an example of an image processing apparatus according to the present invention.
[0017]
As shown in the figure, the image processing apparatus according to this embodiment includes an image input unit 10, a character candidate extraction unit 20, a character string extraction unit 30, and an image output unit 40.
[0018]
The image input means 10 acquires an input image by optical reading and specifically consists of a scanner, a digital camera, or the like. The image input means 10 acquires the input image from an original or manuscript containing character strings, in particular office documents, memos, handwritten documents, catalogs, manuals, magazines, flyers, maps, and photographs containing character strings such as signs, signboards, and vehicle body numbers. However, the image input means 10 may instead acquire the input image via a communication line connected to a LAN (Local Area Network), the Internet, or the like.
[0019]
When the image input means 10 acquires the input image, the character candidate extraction means 20 extracts, as described later, the character candidates included in the input image. Specifically, it is realized by, for example, a combination of a CPU (Central Processing Unit) and a predetermined program executed by the CPU.
[0020]
When the character candidate extracting unit 20 extracts character candidates, the character string extracting unit 30 extracts, as described later, the character strings present in the input image on the basis of the extraction result. Specifically, it is realized by, for example, a combination of a CPU and a predetermined program executed by the CPU.
[0021]
The image output means 40 outputs the character string extraction result of the character string extraction means 30 to the outside and specifically consists of an interface with an external device or the like. Examples of the output destination of the image output unit 40 include another image processing apparatus that performs processing such as gradation correction or image compression on the input image acquired by the image input unit 10, and an OCR that recognizes the characters present in the input image.
[0022]
Here, the details of the character candidate extraction means 20 in the image processing apparatus configured as described above will be described. FIG. 2 is a block diagram illustrating a configuration example of the character candidate extraction unit.
[0023]
As shown in the figure, the character candidate extraction unit 20 in the present embodiment includes an edge extraction unit 21, a binarization unit 22, and a single character candidate extraction unit 23.
[0024]
The edge extraction unit 21 extracts edges from the input image acquired by the image input unit 10 in a plurality of directions (for example, the main scanning direction and the sub-scanning direction). Specifically, as shown for example in FIG. 3, it includes a plurality of digital filters 1 to 4 that operate in parallel, applies digital filtering to each pixel value of the input image in parallel, calculates the difference between each filter result and the pixel value, and outputs the largest absolute value among those differences, thereby extracting the edges present in the input image. However, the edge extraction means 21 may perform edge extraction using other known techniques.
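The edge extraction step can be illustrated with the sketch below. The text does not specify the coefficients of digital filters 1 to 4, so the four directional two-neighbour averaging filters used here are purely an assumption, as are the function name `extract_edges` and the border handling (clamping to the nearest pixel). What the sketch does follow from the description is the overall scheme: filter each pixel with several filters, take the difference between each result and the pixel value, and output the largest absolute difference.

```python
def extract_edges(image):
    """Return an edge-strength map for a grayscale image (list of lists)."""
    height, width = len(image), len(image[0])

    # Hypothetical stand-ins for digital filters 1-4: each one averages the
    # two neighbours of a pixel along one direction (dy, dx).
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]

    def pixel(y, x):
        # Clamp coordinates so border pixels reuse the nearest valid pixel.
        y = min(max(y, 0), height - 1)
        x = min(max(x, 0), width - 1)
        return image[y][x]

    edges = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            diffs = []
            for dy, dx in directions:
                filtered = (pixel(y - dy, x - dx) + pixel(y + dy, x + dx)) / 2
                diffs.append(abs(filtered - center))
            # Output the largest absolute difference over all filters.
            edges[y][x] = max(diffs)
    return edges


# A vertical step edge between the second and third columns.
step = [[0, 0, 100, 100] for _ in range(3)]
edge_map = extract_edges(step)
print(edge_map[1])
```

In this sketch the filters are evaluated one after another; in the apparatus they are described as operating in parallel, which changes the timing but not the result.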
[0025]
In FIG. 2, the binarization means 22 binarizes the edges extracted by the edge extraction means 21. As for the binarization processing, it is conceivable to use a known technique such as simple binarization or floating binarization.
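The two binarization techniques named above can be sketched as follows. This is only an illustration: "floating binarization" is interpreted here as a threshold that follows the local mean of a small window, and the names `simple_binarize` and `floating_binarize` and the parameters `threshold`, `radius`, and `offset` are hypothetical.

```python
def simple_binarize(image, threshold):
    """Simple binarization: one fixed threshold for the whole image."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def floating_binarize(image, radius=1, offset=0):
    """Floating binarization: the threshold follows the local mean."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the pixels in a window clipped to the image bounds.
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            result[y][x] = 1 if image[y][x] > local_mean + offset else 0
    return result


edge_map = [
    [0, 50, 50, 0],
    [0, 50, 50, 0],
]
binary = simple_binarize(edge_map, threshold=25)
print(binary)
```

A fixed threshold is cheaper, while a local threshold tends to cope better with uneven backgrounds; which one suits a given input is a design choice left open by the text.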
[0026]
The one-character candidate extraction unit 23 provisionally sets a circumscribed rectangle for each image element, such as a point or a line segment, in the image after binarization by the binarization unit 22, recognizes mutually related circumscribed rectangles as one block, and extracts that block as one character candidate. Specifically, when another circumscribed rectangle 2 lies inside one circumscribed rectangle 1 as shown in FIG. 4(a), when circumscribed rectangle 1 and circumscribed rectangle 2 overlap and share a region as shown in FIG. 4(b), or when the size ratio, distance, shapes, and so on of circumscribed rectangle 1 and circumscribed rectangle 2 satisfy a predetermined relationship as shown in FIG. 4(c), the two circumscribed rectangles 1 and 2 are recognized as one block and are merged. That is, the one-character candidate extraction unit 23 extracts one or more image elements that are likely to constitute one character as one character candidate. The one-character candidate extraction unit 23 may use a circumscribed polygon instead of a circumscribed rectangle when extracting character candidates.
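The merging of related circumscribed rectangles can be sketched as below. Rectangles are assumed to be `(x0, y0, x1, y1)` tuples. The containment and overlap tests correspond to cases (a) and (b) of FIG. 4; for case (c) the text only says "a predetermined relationship", so the single gap limit `max_gap` used here is a hypothetical simplification that ignores size ratio and shape. Grouping is done with a small union-find, which is an implementation choice of this sketch and is not taken from the source.

```python
def contains(a, b):
    """True if rectangle b lies entirely inside rectangle a (FIG. 4(a))."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]


def overlaps(a, b):
    """True if the two rectangles share a region (FIG. 4(b))."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def gap(a, b):
    """Largest axis-wise gap between two rectangles (0 if they touch)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)


def related(a, b, max_gap):
    # Case (c) is reduced to a hypothetical gap limit in this sketch.
    return contains(a, b) or contains(b, a) or overlaps(a, b) or gap(a, b) <= max_gap


def merge_rectangles(rects, max_gap=1):
    """Group related rectangles and return one bounding rectangle per group."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if related(rects[i], rects[j], max_gap):
                parent[find(j)] = find(i)

    groups = {}
    for i, rect in enumerate(rects):
        groups.setdefault(find(i), []).append(rect)

    merged = [
        (min(r[0] for r in g), min(r[1] for r in g),
         max(r[2] for r in g), max(r[3] for r in g))
        for g in groups.values()
    ]
    return sorted(merged)


rects = [
    (0, 0, 10, 10),    # outer stroke
    (2, 2, 5, 5),      # lies inside the first rectangle
    (8, 8, 14, 14),    # overlaps the first rectangle
    (30, 0, 34, 10),   # left part of a second character
    (35, 0, 38, 10),   # right part, one pixel away
    (60, 0, 70, 10),   # isolated element
]
candidates = merge_rectangles(rects)
print(candidates)
```

The sketch relates only the original rectangles to one another in a single pass; it does not re-test the merged rectangles against each other.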
[0027]
As described above, the character candidate extracting unit 20 includes the edge extracting unit 21, the binarizing unit 22, and the one-character candidate extracting unit 23, and thereby extracts, from the input image acquired by the image input unit 10, one or more image elements judged to constitute one character as one character candidate. However, as long as the character candidates present in the input image can be extracted character by character, the character candidate extracting means 20 may have a configuration other than that described above, that is, it may use known techniques other than those described above. In addition, when the input image acquired by the image input unit 10 is a binary image from the start, for example, the edge extraction unit 21 and the binarization unit 22 may be omitted.
[0028]
Next, details of the character string extraction means 30 which is the most characteristic part in the image processing apparatus of the present embodiment will be described. FIG. 5 is a block diagram illustrating a configuration example of the character string extraction unit.
[0029]
As shown in the figure, the character string extraction unit 30 in the present embodiment includes a feature extraction unit 31, a character string candidate extraction unit 32, and a character string evaluation unit 33.
[0030]
The feature extraction unit 31 extracts the feature amounts of each character candidate extracted by the character candidate extraction unit 20. Examples of the feature amounts extracted by the feature extraction unit 31 include the area of the image elements, such as points and line segments, that make up a character candidate; the area of the circumscribed rectangle of the character candidate; the area of its circumscribed polygon; the side lengths of the circumscribed rectangle or circumscribed polygon; the aspect ratio of the circumscribed rectangle; the ratio of the longest width to the shortest width of the circumscribed polygon; and the coordinates of the center of gravity of the circumscribed rectangle or circumscribed polygon. That is, the feature extraction unit 31 extracts feature amounts of the character candidates extracted by the character candidate extraction unit 20 and does so for a plurality of types of feature amounts. The feature amount extraction may be performed by predetermined calculations using known techniques.
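Several of the feature amounts listed above follow directly from a circumscribed rectangle, as the sketch below shows. The function name `rectangle_features`, the `(x0, y0, x1, y1)` rectangle layout, the dictionary keys, and the definition of aspect ratio as height divided by width are all assumptions made for illustration; the black-pixel count is passed in because it cannot be derived from the rectangle alone.

```python
def rectangle_features(rect, black_pixel_count):
    """Compute a few feature amounts for one character candidate.

    rect: circumscribed rectangle as (x0, y0, x1, y1), with non-zero width.
    black_pixel_count: area of the image elements inside the rectangle.
    """
    x0, y0, x1, y1 = rect
    width, height = x1 - x0, y1 - y0
    return {
        "element_area": black_pixel_count,
        "rect_area": width * height,
        "side_lengths": (width, height),
        # Assumed definition: vertical side over horizontal side.
        "aspect_ratio": height / width,
        "centroid": ((x0 + x1) / 2, (y0 + y1) / 2),
    }


features = rectangle_features((30, 0, 38, 10), black_pixel_count=41)
print(features)
```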
[0031]
The character string candidate extraction unit 32 refers to the feature amounts of the character candidates extracted by the feature extraction unit 31 and extracts, from among the plurality of character candidates extracted by the character candidate extraction unit 20, candidates for the groups of character candidates that constitute character strings, that is, character string candidates.
[0032]
The character string candidate extracting means 32 is constituted by a fully connected neural network, and is characterized in that it extracts character string candidates using an optimization method based on the neural network. A neural network here refers to an artificial neural network modeled on the nerve cells (neurons) of a biological brain.
[0033]
Here, the character string candidate extracting means 32 having the neural network configuration will be described in more detail. FIG. 6 is an explanatory diagram showing a neuron model, FIG. 7 is an explanatory diagram showing the McCulloch-Pitts model, which is an example of the input/output relationship of the neuron model, and FIG. 8 is an explanatory diagram showing an example of a neural expression. In these figures, Vj represents the output of another neuron, Wj represents a weighting factor, U represents the input of neuron A, and V represents the output of neuron A. To simplify the following description, all Wj are assumed to be equal (= 1).
[0034]
FIGS. 6 and 7 show that the state of the input U of the neuron A changes in accordance with the outputs Vj of other neurons, and that the value of the output V of the neuron A changes with the input U. Specifically, in the McCulloch-Pitts model shown in FIG. 7, with “0” as the boundary, the value of the output V becomes “1” if the input U of the neuron A is greater than the boundary and “0” if it is smaller. The McCulloch-Pitts model is shown here as an example of the neuron model, but other examples include a sigmoid function model and a hysteresis McCulloch-Pitts model.
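The input/output relationship described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the treatment of U exactly equal to 0 and the hysteresis band limits (upper, lower) are assumptions, and the function names are hypothetical.

```python
def mcculloch_pitts(u: float) -> int:
    """Binary neuron: output 1 when the input U exceeds the boundary 0."""
    # Assumption: U == 0 is mapped to 0; the text only covers U > 0 and U < 0.
    return 1 if u > 0 else 0


def hysteresis_mcculloch_pitts(u: float, previous_v: int,
                               upper: float = 0.5, lower: float = -0.5) -> int:
    """Binary neuron with a dead band: the output only flips once U leaves
    the interval [lower, upper]; inside it the previous output is kept."""
    # Assumption: the band limits are illustrative values, not from the text.
    if u > upper:
        return 1
    if u < lower:
        return 0
    return previous_v
```

The hysteresis variant tends to suppress oscillation of the output when U hovers around the boundary.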
[0035]
The neural expression of FIG. 8 shows an example of the state of the processing table that the character string candidate extraction unit 32 has. The horizontal axis direction of this processing table represents an identification number (hereinafter referred to as “character candidate number”) j assigned to each character candidate extracted by the character candidate extraction means 20. Therefore, if there are m character candidates extracted by the character candidate extracting means 20, character candidate numbers j = 1 to m exist in the horizontal axis direction of the processing table. On the other hand, the vertical axis direction of the processing table represents the identification number (hereinafter referred to as “character string number”) i of the character string candidate to be extracted by the character string candidate extraction unit 32.
[0036]
An area specified by a character candidate number and a character string number, that is, an area within a dotted frame in the figure, represents the state of a neuron. That is, each “0” or “1” in a dotted frame in the figure represents the value of the output V of the neuron. The following can be read from the values in this processing table. For example, when the neuron corresponding to the character candidate number j = 3 and the character string number i = 2 is “1”, the character candidate whose character candidate number is 3 belongs to the character string whose character string number is 2.
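Reading the processing table can be sketched as follows. The sketch assumes the table is held as a list of rows, one row per character string number i and one column per character candidate number j, with numbering starting at 1 as in the text; the function name strings_from_table is hypothetical.

```python
from typing import Dict, List


def strings_from_table(v: List[List[int]]) -> Dict[int, List[int]]:
    """Map each character string number i to the character candidate
    numbers j whose neuron output V is 1 in row i of the table."""
    groups: Dict[int, List[int]] = {}
    for i, row in enumerate(v, start=1):
        members = [j for j, out in enumerate(row, start=1) if out == 1]
        if members:  # rows with no active neuron describe no string
            groups[i] = members
    return groups
```

For a table whose second row has a “1” in the third column, the result contains candidate 3 under string number 2, matching the example in the text.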
[0037]
The value of the output V of each neuron in this processing table is set by random number generation in the initial state, and each value transitions to an optimal state by the time the character string candidate extraction unit 32 completes the extraction of the character string candidates. For this transition, the character string candidate extraction unit 32 prepares an operation expression such as the following expression (1), and further uses an update expression such as the expression (6) described later to change the input U of the corresponding neuron.
[0038]
[Expression 1]

[0039]
In this operational equation (1), A, B, C, D, and E are preset coefficients, respectively.
[0040]
AREA corresponds to the area ratio of the circumscribed rectangle of interest to other circumscribed rectangles, and is represented by the following equation (2).
[0041]
[Expression 2]

[0042]
In this equation (2), LXk and LYk are the lengths of the two mutually orthogonal sides of the circumscribed rectangle k, and LXj and LYj are the lengths of the two mutually orthogonal sides of the circumscribed rectangle j. However, AREA may instead specify the area ratio based on the area of the image elements constituting the character candidates, the area of the circumscribed rectangles, and the like.
[0043]
In the operation formula (1), NEAR corresponds to the distance between the regression line for the circumscribed rectangle of the character candidate belonging to the same character string number i and the barycentric coordinate of the circumscribed rectangle k of interest. It is expressed by the following equation (3).
[0044]
[Equation 3]

[0045]
In this equation (3), Xk and Yk are the barycentric coordinates of the circumscribed rectangle k, and a_i, b_i, and c_i are the coefficients of the regression line a_i x + b_i y + c_i = 0 of the character string i.
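Equation (3) itself is not reproduced in this text. As a hedged illustration, the standard perpendicular distance from a point to a line in the general form a x + b y + c = 0 is one formula consistent with the description of NEAR; whether it matches equation (3) term by term is an assumption. The function name near_perpendicular is hypothetical.

```python
import math


def near_perpendicular(a: float, b: float, c: float,
                       xk: float, yk: float) -> float:
    """Perpendicular distance from the centroid (xk, yk) to the line
    a*x + b*y + c = 0 (assumed form of NEAR, see the lead-in)."""
    norm = math.hypot(a, b)
    if norm == 0:
        raise ValueError("a and b must not both be zero")
    return abs(a * xk + b * yk + c) / norm
```

Because the distance is measured perpendicular to the line, it does not depend on the direction in which the character string runs.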
[0046]
However, instead of the above-described equation (3), the smaller of the distance to the regression line measured in the y-axis direction and the distance measured in the x-axis direction may be used as NEAR, for example as in the following equation (4).
[0047]
[Expression 4]

[0048]
In this equation (4), min (Q, R) denotes the smaller of the values Q and R; if either Q or R does not exist, the other value (R or Q) is output.
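The behavior of min (Q, R) with a missing operand can be sketched as follows. The sketch assumes that Q and R are the distances to the regression line a x + b y + c = 0 measured along the y axis and the x axis, and that a distance "does not exist" when the line is parallel to the measuring axis; None is used to represent a missing value. The function names are hypothetical.

```python
from typing import Optional


def min_existing(q: Optional[float], r: Optional[float]) -> Optional[float]:
    """Smaller of q and r; if one is missing (None), return the other."""
    if q is None:
        return r
    if r is None:
        return q
    return min(q, r)


def near_axis(a: float, b: float, c: float,
              xk: float, yk: float) -> Optional[float]:
    """Smaller of the y-direction and x-direction distances from (xk, yk)
    to the line a*x + b*y + c = 0."""
    residual = abs(a * xk + b * yk + c)
    # A vertical line (b == 0) has no y-direction distance, and a
    # horizontal line (a == 0) has no x-direction distance.
    dist_y = residual / abs(b) if b != 0 else None
    dist_x = residual / abs(a) if a != 0 else None
    return min_existing(dist_y, dist_x)
```

Axis-direction distances avoid the square root needed for a perpendicular distance, at the cost of overestimating the distance for oblique lines.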
[0049]
At this time, if a regression curve (quadratic curve, cubic curve, etc.) is used instead of the regression line, character string candidates can also be extracted from character candidates arranged along a curve. For example, if the regression curve is a quadratic curve represented by a_i x² + b_i x + c_i y + d_i = 0, the distance in the y-axis direction between the regression curve and the barycentric coordinates of the circumscribed rectangle k is expressed by the following equation (5).
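Equation (5) is not reproduced in this text. The following short derivation, offered as a sketch, shows the y-direction distance that follows from a quadratic curve a_i x² + b_i x + c_i y + d_i = 0 with c_i ≠ 0; it may differ in form from the original equation (5).

```latex
\begin{align}
y_{\mathrm{curve}}(X_k) &= -\frac{a_i X_k^{2} + b_i X_k + d_i}{c_i},
  \qquad c_i \neq 0, \\
\bigl| Y_k - y_{\mathrm{curve}}(X_k) \bigr|
  &= \frac{\bigl| a_i X_k^{2} + b_i X_k + c_i Y_k + d_i \bigr|}{\lvert c_i \rvert}.
\end{align}
```

The first line solves the curve equation for y at x = X_k, and the second takes the absolute difference from the centroid ordinate Y_k.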
[0050]
[Equation 5]

[0051]
Furthermore, instead of NEAR as described above, a value obtained by dividing NEAR by the square root of the area of the circumscribed rectangle of the character candidate belonging to the character string i may be used.
[0052]
In the operation formula (1), INTER corresponds to the distance between the center of gravity of the circumscribed rectangle of interest and the center of gravity of another circumscribed rectangle. In addition, the term min_{k=1..m}(Q_k) in the equation (1) represents the smallest positive value among the Q_k for k = 1 to m. Therefore, min_{k=1..m}(INTER·V_ik) corresponds to the distance between the character candidate j and the closest of the other character candidates constituting the character string i to which the character candidate j belongs. At this time, however, the distance to the second-closest of the other character candidates constituting the character string i to which the character candidate j belongs may also be taken into consideration. Further, instead of INTER as described above, a value obtained by dividing INTER by the square root of the area of the circumscribed rectangle of the character candidate belonging to the character string i may be used.
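The smallest-positive-value term can be illustrated with the sketch below. It assumes 0-based indices, centroids given as (x, y) pairs, and a table v[i][k] of neuron outputs; INTER is taken as the Euclidean distance between centroids, which the text does not state explicitly. The function name nearest_member_distance is hypothetical.

```python
import math
from typing import List, Optional, Tuple


def nearest_member_distance(j: int, i: int,
                            centroids: List[Tuple[float, float]],
                            v: List[List[int]]) -> Optional[float]:
    """Smallest positive value of INTER_jk * V_ik over all candidates k.

    Candidates outside string i contribute 0 (V_ik == 0) and candidate j
    itself contributes 0 (zero distance), so both are skipped by the
    positivity test. Returns None when string i has no other member."""
    xj, yj = centroids[j]
    best: Optional[float] = None
    for k, (xk, yk) in enumerate(centroids):
        q = math.hypot(xk - xj, yk - yj) * v[i][k]
        if q > 0 and (best is None or q < best):
            best = q
    return best
```

Multiplying by V_ik is what restricts the minimum to candidates currently assigned to string i.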
[0053]
In the operation formula (1), h (t) corresponds to a well-known hill-climbing term and serves to escape from local optimum solutions.
[0054]
When dU_ij has been obtained by the above equation (1), the character string candidate extraction unit 32 substitutes this dU_ij into an update equation such as the following equation (6) to change the value of the input U of each neuron.
[0055]
[Formula 6]

[0056]
At this time, the character string candidate extraction unit 32 can end the update of the input U of each neuron by the operation formula (1) and the update formula (6) by giving an appropriate convergence condition. As a result, the character string candidate extraction means 32 makes a transition so that the value of the output V of each neuron in the neural expression processing table is in an optimum state, so that each character is determined from the value of the output V of each neuron after the transition. The character string to which the candidate belongs can be determined, and the characteristics of the character string can be calculated.
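The update loop can be sketched as follows. Neither equation (1) nor equation (6) is reproduced in this text, so the sketch makes several assumptions: the update is taken to be the additive step U ← U + dU, the inputs U start at 0, the output follows the McCulloch-Pitts rule, and convergence is declared when a full sweep changes no output. The callable motion stands in for equation (1) and is hypothetical, as are the other names.

```python
import random
from typing import Callable, List

Table = List[List[int]]


def relax(num_strings: int, num_candidates: int,
          motion: Callable[[int, int, Table], float],
          max_sweeps: int = 100, seed: int = 0) -> Table:
    """Drive the neuron outputs V toward a stable state."""
    rng = random.Random(seed)
    # Initial outputs V are set by random number generation, as in the text.
    v = [[rng.randint(0, 1) for _ in range(num_candidates)]
         for _ in range(num_strings)]
    u = [[0.0] * num_candidates for _ in range(num_strings)]  # assumed start
    for _ in range(max_sweeps):
        changed = False
        for i in range(num_strings):
            for j in range(num_candidates):
                u[i][j] += motion(i, j, v)       # assumed form of equation (6)
                new_v = 1 if u[i][j] > 0 else 0  # McCulloch-Pitts output
                if new_v != v[i][j]:
                    v[i][j] = new_v
                    changed = True
        if not changed:  # assumed convergence condition
            break
    return v
```

In practice the convergence condition and the cap on the number of sweeps would be tuned to the problem size.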
[0057]
That is, the character string candidate extraction unit 32 refers to the feature amount related to the character candidate extracted by the feature extraction unit 31, and uses the operation formula (1) and the update formula (6) to calculate each neuron in the processing table of the neural expression. The value of the output V is transitioned to an optimal state, and character string candidates are extracted based on the transition result.
[0058]
The character string candidate extracted by the character string candidate extraction unit 32 is evaluated by the character string evaluation unit 33 of the character string extraction unit 30 as to whether or not it is a character string. That is, the character string evaluation unit 33 makes a final determination as to whether or not the character string candidate extracted by the character string candidate extraction unit 32 is a character string. Such a determination may be performed using, for example, an evaluation function related to linearity, an evaluation function related to the area ratio of the circumscribed rectangle, an evaluation function related to the interval of the circumscribed rectangle, and the like.
[0059]
The evaluation function related to linearity evaluates the average distance between the character candidates belonging to the character string i and their regression line or regression curve. Specifically, it is expressed by the following equations (7) and (8), respectively.
[0060]
[Expression 7]

[0061]
[Equation 8]

[0062]
The evaluation function relating to the area ratio of the circumscribed rectangle is for evaluating the variance of the areas of the circumscribed rectangles of the character candidates belonging to the character string i, and is specifically expressed by the following equation (9).
[0063]
[Equation 9]

[0064]
The evaluation function related to the interval between circumscribed rectangles is for evaluating the average distance between the barycentric coordinates of the circumscribed rectangle of each character candidate belonging to the character string i and the nearest neighboring barycentric coordinates, and is specifically expressed by the following equation (10). However, this evaluation function may also take into account the distance to the second-nearest barycentric coordinates.
[0065]
[Expression 10]

[0066]
The character string evaluation means 33 prepares in advance thresholds T_1, T_2, T_3, and T_4 corresponding to the expressions (7) to (10), respectively, and compares the calculation result of each expression with the corresponding threshold to determine whether or not the character string candidate extracted by the character string candidate extraction unit 32 is a character string. For example, if the calculation result of equation (7) is not within the range of the threshold T_1, the character string evaluation means 33 determines that the extracted character string candidate is not a character string. Likewise, if the calculation result of equation (9) is not within the range of the threshold T_3, the character string evaluation means 33 determines that the extracted character string candidate is not a character string. Also, if the calculation result of equation (10) is not within the range of the threshold T_4, the character string evaluation means 33 determines that the extracted character string candidate is not a character string.
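The threshold test can be sketched as follows. Equations (7) to (10) are not reproduced in this text, so the three measures below follow only the verbal descriptions (mean distance to the regression line or curve, variance of the circumscribed rectangle areas, mean nearest-neighbor centroid distance). Reading "within the range of the threshold" as "less than or equal to" is an assumption, as is the rejection of candidates with fewer than two members; all names are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_nearest_spacing(centroids: Sequence[Point]) -> float:
    """Average distance from each centroid to its nearest other centroid."""
    nearest = []
    for m, (xm, ym) in enumerate(centroids):
        others = [math.hypot(xm - xn, ym - yn)
                  for n, (xn, yn) in enumerate(centroids) if n != m]
        nearest.append(min(others))
    return mean(nearest)


def is_character_string(line_distances: Sequence[float],
                        rect_areas: Sequence[float],
                        centroids: Sequence[Point],
                        t_linearity: float, t_area: float,
                        t_spacing: float) -> bool:
    """Accept a candidate only if every measure is within its threshold."""
    if len(centroids) < 2:  # assumption: a single candidate is not a string
        return False
    return (mean(line_distances) <= t_linearity        # cf. T_1 / T_2
            and pvariance(rect_areas) <= t_area        # cf. T_3
            and mean_nearest_spacing(centroids) <= t_spacing)  # cf. T_4
```

Here line_distances would hold the per-candidate distances to the regression line or curve computed during extraction.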
[0067]
That is, by having the character string evaluation unit 33 make a final determination as to whether or not the character string candidate extracted by the character string candidate extraction unit 32 is a character string, the character string extraction unit 30 improves the accuracy of character string extraction.
[0068]
Next, a character string extraction processing procedure in the image processing apparatus configured as described above, that is, an image processing method in this embodiment will be described.
[0069]
In the image processing apparatus of the present embodiment, when the image input unit 10 acquires an input image, the character candidate extraction unit 20 extracts all the character candidates included in the input image from the input image. When all the character candidates are extracted, the feature extraction unit 31 of the character string extraction unit 30 extracts a plurality of types of feature amounts related to each character candidate. At the same time, a character candidate number j is assigned to each character candidate.
[0070]
Thereafter, the character string candidate extraction unit 32 of the character string extraction unit 30 causes the value of the output V of each neuron in the neural expression processing table to transition to an optimal state, and extracts a character string candidate from the transition result. For this purpose, the character string candidate extraction unit 32 first sets the value of the output V of each neuron in the processing table by random number generation. Accordingly, in the processing table, for example as shown in FIG. 8, the values (“0” or “1”) of the outputs V of the neurons, which correspond in number to the character candidate numbers j and are arranged in a two-dimensional map, are set.
[0071]
When the value of the output V of each neuron has been set by random number generation, the character string candidate extraction unit 32 then selects one neuron on the processing table as the target of attention, and causes the value of that neuron to transition to an optimal state using the operation formula (1) and the update formula (6).
[0072]
For example, if the neuron corresponding to the character candidate number j = 1 and the character string number i = 1 is the target of attention, the character string candidate extraction unit 32 refers to the plurality of types of feature amounts related to each character candidate and substitutes the following into the equation (1) to obtain dU_ij: the area ratio between the circumscribed rectangle of the character candidate corresponding to the character candidate number j = 1 and the other circumscribed rectangles belonging to the character string number i = 1; the distance between the barycentric coordinates of the circumscribed rectangle of the character candidate corresponding to the character candidate number j = 1 and the regression line for the circumscribed rectangles of the character candidates belonging to the character string number i = 1; the distance between the center of gravity of the circumscribed rectangle of the character candidate corresponding to the character candidate number j = 1 and the centers of gravity of the other circumscribed rectangles belonging to the character string number i = 1; and the like. This dU_ij is then substituted into the update equation (6) to change the value of the input U of each neuron. Accordingly, in response to the change in the value of the input U as shown in FIG., the value of the output V of the neuron corresponding to the character candidate number j = 1 and the character string number i = 1 transitions to “1” if it is appropriate for the character candidate corresponding to the character candidate number j = 1 to belong to the character string number i = 1, and to “0” otherwise. If, as a result of this transition, the value of the output V of the neuron corresponding to the character string number i = 1 and the character candidate number j = 1 is “1”, the character candidate whose character candidate number is 1 belongs to the character string candidate whose character string number is 1.
[0073]
The character string candidate extraction unit 32 sequentially repeats such a transition of the value of the output V for all neurons existing on the processing table. As a result, the value of the output V of each neuron on the processing table transitions to an optimum state and converges to the optimum state.
[0074]
Thereafter, in the character string extraction unit 30, the character string evaluation unit 33 determines whether or not each character string candidate extracted by the character string candidate extraction unit 32 is a character string. As a result of this determination, only character string candidates determined to be character strings are output from the image output means 40 as character string extraction results.
[0075]
The number of character candidate numbers j and character string numbers i constituting the neural expression processing table may be determined as appropriate in consideration of, for example, the processing capacity of the character string candidate extraction unit 32. Moreover, if the character candidates determined to form a character string by the character string evaluation unit 33 are deleted, the result is input to the character string candidate extraction unit 32 again, and the processing by the character string candidate extraction unit 32 and the character string evaluation unit 33 is repeated, more character strings can be extracted from the input image without being limited by the number of character candidate numbers j and character string numbers i in the processing table.
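The repeated extraction described above can be sketched as the following loop. The callables extract_groups and is_string are hypothetical stand-ins for the character string candidate extraction unit 32 and the character string evaluation unit 33; the convention that groups are lists of indices into the remaining candidates, and the cap max_rounds, are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def extract_all_strings(candidates: Sequence[T],
                        extract_groups: Callable[[List[T]], List[List[int]]],
                        is_string: Callable[[List[T]], bool],
                        max_rounds: int = 10) -> List[List[T]]:
    """Repeat extraction and evaluation, removing accepted candidates."""
    remaining = list(candidates)
    accepted: List[List[T]] = []
    for _ in range(max_rounds):
        if not remaining:
            break
        used = set()
        for group in extract_groups(remaining):
            members = [remaining[k] for k in group]
            if members and is_string(members):
                accepted.append(members)
                used.update(group)
        if not used:  # nothing accepted this round, so stop
            break
        # Delete the accepted candidates and feed the rest back in.
        remaining = [c for k, c in enumerate(remaining) if k not in used]
    return accepted
```

With this arrangement, a processing table that holds only a few character string numbers can still recover many strings over several rounds.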
[0076]
As described above, according to the image processing apparatus and the image processing method of the present embodiment, after extracting the character candidates included in the input image, the distance relationship between the straight line or curve connecting the character candidates and each character candidate, that is, Considering the position of the character candidate in the input image, a straight line or curve that will be along the character string is specified using a processing table of neural expression. At this time, if a plurality of character strings are mixed in the input image, the positions of the character candidates in the input image are taken into consideration, so that a plurality of straight lines or curves are specified. Then, when a straight line or a curve is specified, a character candidate group arranged on the straight line or curve is extracted as constituting a character string regardless of the size of each character candidate.
[0077]
Further, according to the image processing apparatus and the image processing method of the present embodiment, when character candidates included in the input image are extracted, a plurality of types of feature amounts are extracted from the extracted character candidates, and each feature amount is taken into consideration. A character candidate group constituting a character string is extracted from character candidates extracted from the input image using a processing table of neural expression based on each feature amount.
[0078]
For these reasons, in the image processing apparatus and the image processing method according to the present embodiment, even when the input image contains a mixture of character strings whose character size, shape, character spacing, and the like are not constant and character strings whose characters are arranged in various directions, highly accurate character string extraction can be performed without depending on the size of the characters, the direction in which they are arranged, or the like. Even if the input image contains a character string composed of characters arranged on a curve, that character string can be accurately extracted. In addition, since multiple types of feature amounts for each character candidate are comprehensively considered at that time, the determination at the time of character string integration does not become difficult, and there is no need to perform character string extraction multiple times.
[0079]
In particular, in the present embodiment, the character string candidate extraction unit 32 considers, as a feature amount of the character candidate in the equation (1), the distance relationship between the regression line or regression curve connecting a plurality of character candidates and the barycentric coordinates of the character candidate that is the target of attention. Therefore, for example, even if character strings in different directions are mixed in the input image or there are character strings arranged on a curve, these can be accurately extracted.
[0080]
In addition, the character string candidate extraction unit 32 according to the present embodiment also considers the size of each character candidate as a feature amount of the character candidate. Specifically, the operation formula (1) takes into account the area of the image elements constituting the character candidate, the area of the rectangle circumscribing the character candidate, the lengths of the sides of the rectangle circumscribing the character candidate, and the like. Therefore, for example, even if character strings having different font sizes are mixed in the input image, these can be accurately extracted.
[0081]
Furthermore, if the shape of each character candidate, for example the aspect ratio of the rectangle circumscribing the character candidate, is also taken into account at this time, character strings can be accurately extracted even when the input image contains a mixture of character strings that differ not only in font size but also in font type.
[0082]
In addition, the character string candidate extraction unit 32 in the present embodiment also considers the interval between the character candidates as the character candidate feature amount. Specifically, in the operation formula (1), the distance between the center of gravity of the circumscribed rectangle to be noticed and the center of gravity of the other circumscribed rectangle is considered. Therefore, for example, even if character strings having different character intervals are mixed in the input image, these can be accurately extracted.
[0083]
In addition, in the image processing apparatus and the image processing method of the present embodiment, the extraction result of the character string is output after the evaluation determination by the character string evaluation means 33, so that the accuracy of character string extraction can be further improved.
[0084]
In the present embodiment, the case where the character string candidate extraction unit 32 extracts character string candidates using the above-described equations (1) to (6) has been described as an example. However, the present invention is not limited thereto. For example, an optimization method using another neuron model or another neural network may be used. Also, other optimization methods may be used.
[0085]
[Effect of the invention]
As described above, the image processing apparatus and the image processing method according to the present invention can perform highly accurate character string extraction, without depending on the size or direction of the characters that make up a character string, even when the input image contains a mixture of character strings whose character size, shape, character spacing, and the like are not constant and character strings with various character alignment directions, or contains a character string composed of characters lined up on a curve. In addition, if the feature amounts of each character candidate are comprehensively considered at that time, the determination at the time of character string integration does not become difficult and character string extraction need not be performed multiple times as in the past. Thus, the accuracy of character string extraction can be improved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an example of an image processing apparatus according to the present invention.
FIG. 2 is a block diagram illustrating a configuration example of a character candidate extraction unit included in the image processing apparatus of FIG.
FIG. 3 is a block diagram showing a configuration example of a main part in the character candidate extraction unit of FIG. 2;
FIGS. 4A and 4B are explanatory diagrams showing an outline of a single character extraction process by the character candidate extraction unit of FIG. 2, and FIGS.
FIG. 5 is a block diagram illustrating a configuration example of character string extraction means included in the image processing apparatus of FIG. 1.
FIG. 6 is an explanatory diagram showing a neuron model.
FIG. 7 is an explanatory diagram showing the McCulloch-Pitts model, which is an example of an input/output relationship of a neuron model.
FIG. 8 is an explanatory diagram showing an example of a neural expression.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 10 ... Image input means, 20 ... Character candidate extraction means, 30 ... Character string extraction means, 31 ... Feature extraction means, 32 ... Character string candidate extraction means, 33 ... Character string evaluation means

Claims

An image processing apparatus comprising:
character candidate extraction means for extracting, from an input image, character candidates included in the input image;
feature extraction means for extracting feature amounts related to the character candidates extracted by the character candidate extraction means, the extraction being performed for a plurality of types of feature amounts;
a processing table in which the relationship between a character candidate number, which is an identification number individually assigned to each character candidate extracted by the character candidate extraction means, and a character string number, which is an identification number of a character string candidate to be extracted, is specified by the state of a neuron whose output V transitions in value according to the value of its input U, the processing table being configured so that the value of the output V of each neuron indicates to which character string number's character string candidate each character candidate belongs; and
character string extraction means for substituting the plurality of types of feature amounts extracted by the feature extraction means into a predetermined arithmetic expression to change the value of the input U of each neuron in the processing table, thereby causing the values of the outputs V of all the neurons existing on the processing table to transition so as to converge from an initial setting state to an optimal state, and for extracting, based on the transition result, a character candidate group belonging to the same character string number from among the plurality of character candidates extracted by the character candidate extraction means as a character candidate group constituting a character string.

The image processing apparatus according to claim 1, wherein one of the plurality of types of feature amounts is the distance between each regression line or each regression curve connecting the circumscribed rectangles of the character candidates belonging to the same character string number and the barycentric coordinates of the character candidate that is the target of attention.

The image processing apparatus according to claim 1 or 2, wherein one of the plurality of types of feature amounts is the size of each character candidate.

The image processing apparatus according to claim 3, wherein the size of the character candidate is specified from the area of the image elements constituting the character candidate.

The image processing apparatus according to claim 3, wherein the size of the character candidate is specified from the area of a rectangle circumscribing the character candidate.

The image processing apparatus according to claim 3, wherein the size of the character candidate is specified from the length of a side of a rectangle circumscribing the character candidate.

The image processing apparatus according to any one of claims 1 to 6, wherein one of the plurality of types of feature amounts is the interval between character candidates.

The image processing apparatus according to any one of claims 1 to 7, wherein one of the plurality of types of feature amounts is the shape of each character candidate.

The image processing apparatus according to claim 8, wherein the shape of the character candidate is specified from the aspect ratio of a rectangle circumscribing the character candidate.

An image processing method comprising:
a character candidate extraction step of extracting, from an input image, character candidates included in the input image;
a feature extraction step of extracting feature amounts related to the character candidates extracted in the character candidate extraction step, the extraction being performed for a plurality of types of feature amounts; and
a character string extraction step of using a processing table in which the relationship between a character candidate number, which is an identification number individually assigned to each character candidate extracted in the character candidate extraction step, and a character string number, which is an identification number of a character string candidate to be extracted, is specified by the state of a neuron whose output V transitions in value according to the value of its input U, the processing table being configured so that the value of the output V of each neuron indicates to which character string number's character string candidate each character candidate belongs, substituting the plurality of types of feature amounts extracted in the feature extraction step into a predetermined arithmetic expression to change the value of the input U of each neuron in the processing table, thereby causing the values of the outputs V of all the neurons existing on the processing table to transition so as to converge from an initial setting state to an optimal state, and extracting, based on the transition result, a character candidate group belonging to the same character string number from among the plurality of character candidates extracted by the character candidate extraction means as a character candidate group constituting a character string.