JP4063706B2 - Visual device and its application

Info

Publication number: JP4063706B2
Application number: JP2003122945A
Authority: JP
Inventors: 岩片智; 萩原将文; 味岡義明
Original assignee: 株式会社エッチャンデス
Priority date: 2003-04-25
Filing date: 2003-04-25
Publication date: 2008-03-19
Anticipated expiration: 2023-04-25
Also published as: JP2004328549A

Description

【０００１】
【発明の属する技術の分野】
本発明は、画像中の複数の文字又は図形を抽出する視覚装置に関し、詳しくは、ビデオカメラ及びスキャナなどの撮影装置で撮影された二値画像、濃淡画像、三原色波長、可視光波長、赤外線波長、紫外線波長、その他全ての電磁波のうち任意の帯域から構成される画像などの静止画像中の複数の文字又は図形と、それらの輪郭の色から多色画像を生成し、各々の色の画像に対して文字又は図形を抽出するものに関する。
【０００２】
【従来の技術】
これまで本発明者のうちの一人により、視覚装置が開発されてきた（例えば、特許文献１、２及び３参照。）。この視覚装置はデジタル技術によって人間の視覚機能を具現化したものであり、例えば顕微鏡によって撮影された画像中の微生物の数を数えたり、また、移動カメラのパン、チルト、ロール及びズームの機構を制御することにより、複数の物体を探索し、これらの物体に対して画像処理を行うことができる。この視覚装置が行うこの画像処理の大半が局所処理であり、しかもこれらの局所処理は二次元格子状に配列された複数の配列演算ユニットによって並列に実行される。この視覚装置は、パーソナルコンピュータのようなコンピュータシステム、複数のプロセッサを搭載した並列計算機及びこれらを１個のＬＳＩ（大規模集積回路）に搭載したシステムＬＳＩなどによって実現され得るが、さらに複数の仮想配列演算ユニットから構成される専用回路が用いられるならば、この視覚装置はこれらの局所並列画像処理を高速に実行することができる（特許文献２、３及び非特許文献１参照。）。
【０００３】
一方で、これまでに多くの研究者が画像中の文字を抽出する方法を開発してきた（例えば、非特許文献２、３、４、５及び６参照。）。これらの方法の多くが濃淡画像を対象としたものである。また、これらの方法がカラー画像を処理できる場合でも、これらの方法は、３個の濃淡画像に分解した後に各々の濃淡画像に対して文字を抽出するので、これらの方法は基本的に濃淡画像に対する文字抽出法である（例えば、非特許文献２参照。）。加えて、これらの方法の多くがフーリエ変換、ハフ変換及びヒストグラムの計算を行う（例えば、非特許文献３、４及び６参照。）。したがって、例え処理領域を区切ったとしても、これらの方法はこの処理領域の中で大域処理を行わなければならないので、基本的に並列性が低くなり、結果として処理速度が遅くなる。
【０００４】
ところで、局所並列画像処理はこのような問題を解決することができる。例えば、８近傍しか参照しない位置及び大きさ検出アルゴリズム（以下、位置／大きさ検出アルゴリズムと略記する。例えば、特許文献１参照。）が出力する重複情報が一定の範囲内にある場合、この重複情報が文字の位置のヒントになる。そこで、８個以下の画素を参照する関数のみから構成される局所並列画像処理アルゴリズムでも情景画像から文字を抽出することができ、しかも比較的高い抽出率を達成することができる。しかしながら、このアルゴリズムには以下のような３つの問題がある。第一に、このアルゴリズムはカラー画像を黒文字画像と白文字画像にしか変換しないので、他の色を処理することが極めて困難である。そのため、このアルゴリズムは、このカラー画像の赤帯域だけにバイアスを加えることにより、赤色を黒色か白色のいずれかとして際立たせている。第二に、例えこの重複情報がこの一定の範囲内にあるとしても、この重複情報は必ずしも文字の位置を指し示すとは限らない。多くの場合、塗り潰された領域の中にも同様の重複情報が生成されるため、予めこの塗り潰された領域に含まれる重複情報が除去された後に、このアルゴリズムは残りの重複情報を用いて文字の判定を行わなければならない。第三に、このアルゴリズムは８近傍以下の処理のみを用いているので、この情景画像が大きくなればなるほど計算量も増加する。特にこの塗り潰された領域を除外する場合、このアルゴリズムは８近傍処理を多数回繰り返さなければならない。
【０００５】
ここで、人間の視覚について調べると、補色関係を有効に利用していることが判る（例えば、非特許文献７参照。）。この補色関係は、人間の視覚において見られる特徴であり、色相環において向い合う色の組み合せである。例えば、赤色と緑色は補色関係にあり、同じ画素において同時に見えることはない。同様に黄色と青色も補色関係にある。但し、人間の視覚では、黄色と青色の影響力は白色、黒色、赤色及び緑色に比べて弱く、例えば白地に黄色の文字や黒地に青色の文字は大変見辛い。したがって、黄色は白色に、青色は黒色に近いと見做すことができるが、このことは、心理学のみならず網膜の構造からも裏付けられている。
【０００６】
加えて、上述の白色及び黒色自体も相対的な明るさを表し、ある画素が同じ明るさであっても、その周辺が暗ければこの画素は白色に見え、その周辺が明るければこの画素は黒色に見える。つまり灰色というどの色にも属さない周辺の明るさを基準とした相対的な明るさが人間にとっての白色及び黒色となる。何故なら、太陽光の下など明るさが常時変化する環境において人間が生活する場合、このような相対的な明るさを利用することは大変便利であるからである。但し、神経生理学の分野において、この灰色を決定する具体的な網膜の仕組みは未だ解明されていないばかりか、網膜から神経節細胞を介して一次視覚野に送信する画像データを圧縮するのに、この灰色が重要な役割を果たしていることに、多くの研究者は気付いていない（例えば、非特許文献８参照。）。また、網膜には多くの種類のアマクリン細胞があり、これらのアマクリン細胞は双極細胞及び神経節細胞と複雑なネットワークを形成しているにも係わらず、これらのアマクリン細胞に対しては、ＬＳＩの分野においても動き検出にのみ注目が集まっている（例えば、非特許文献９及び１０参照。）。
【０００７】
実は人間が文字又は図形を書く場合、これらの文字及び図形を見易くするために、無意識にこの補色関係を利用している。そこで第一の問題に対しては、この補色関係を有効に用いれば、これらの文字及び図形の位置を容易に見付けることができるばかりか、少ない情報量でこれらの文字及び図形の色を的確に表すこともできる。しかも塗り潰された領域は色の変化がないために灰色と見做され得る。そこで上述の情景画像からこの灰色の部分を除くことにより、この補色関係は第二の問題に対しても有効である。
【０００８】
さて、第三の問題に対しては、位置／大きさ検出アルゴリズムを使う代りに任意の近傍を直接参照して総和を求める関数を用いる。特に、文字及び図形が一般に線分、円弧及び自由曲線を含む少なくとも１個の線によって構成されているので、この近傍に対する近傍長はこれらの線の最大幅を越える必要はない。したがって、総和を求めるこの関数も局所関数になる。これにより、視覚装置は計算量を大幅に削減することができる。なお、以下では、文字とは所定の線幅以下の線のみから構成されたものを指し、図形とは所定の線幅を越える線若しくは領域を少なくとも１個含むものを指すものとする。
【０００９】
これらのことを考慮すると、視覚装置が入力画像を少なくとも白色、黒色、赤色、緑色及び灰色の５色に分類し、各々の色毎に、総和を求める局所関数を用いて文字又は図形を選択することにより、この視覚装置はこれらの文字及び図形を抽出することができるばかりか、これらの文字及び図形の色を的確に表すことができると期待される。しかも多くの関数が８近傍以下を参照する関数であるので、この視覚装置は少ないハードウェア量と計算量でこれらの文字及び図形を抽出することができると期待される。
【００１０】
【特許文献１】
国際公開番号ＷＯ００／１６２５９号パンフレット
【特許文献２】
国際公開番号ＷＯ０１／４１４４８号パンフレット
【特許文献３】
国際公開番号ＷＯ０２／７３５３８号パンフレット
【非特許文献１】
福田静人，味岡義明，天野英晴，「位置／大きさ検出装置のＦＰＧＡへの実装」，第１０回エフピージーエー／ピーエルディー・デザイン・コンファレンス(FPGA/PLD Design Conference)論文集，エレクトロニック・デザイン・アンド・ソリューション・フェア(Electronic Design and Solution Fair)２００３，２００３年１月３０日，ｐ．１１５−１２２
【非特許文献２】
顧力栩，田中直樹，金子豊久，Ｒ．Ｍ．Ｈａｒａｌｉｃｋ，「表紙画像からの文字領域抽出方式」，電子情報通信学会論文誌，１９９７年１０月，第Ｊ８０−Ｄ−ＩＩ巻，第１０号，ｐ．２６９６−２７０４
【非特許文献３】
劉詠梅，山村毅，大西昇，杉江昇，「シーン内の文字列領域の抽出について」，電子情報通信学会論文誌，１９９８年４月，第Ｊ８１−Ｄ−ＩＩ巻，第４号，ｐ．６４１−６５０
【非特許文献４】
後藤英昭，平山理継，阿曽弘具，「局所多値しきい値処理による濃淡文書画像からの文字パターンの抽出」，電子情報通信学会論文誌，１９９９年１１月，第Ｊ８２−Ｄ−ＩＩ巻，第１１号，ｐ．２１８８−２１９２
【非特許文献５】
金大祐，高橋裕樹，中嶋正之，「情景画像からのハングル文字列領域抽出と字素識別」，電子情報通信学会技術報告，２００２年５月，第ＰＲＭＵ−２００２−１８号，第ＭＩ２００２−３４号，ｐ．６５−７０
【非特許文献６】
松尾賢一，上田勝彦，梅田三千雄，「局所対象領域の２値化による情景画像からの文字列領域抽出」，電気学会論文誌，２００２年２月，第１２２−Ｃ巻，第２号，ｐ．２３２−２４１
【非特許文献７】
池田光男著，「眼はなにをみているか−視覚系の情報処理」，第１版，平凡社，１９８８年８月２２日，ｐ．１８３−２７６
【非特許文献８】
ヘルガ・コルブ(Helga Kolb)，エドアルド・フェルナンデス(Eduardo Fernandez)，ラルフ・ネルソン(Ralph Nelson)編，「ウェブビジョン (WEBVISION)」，（米国），[online]，２００３年，ジョン・Ａ・モラン・アイ・センター(John A. Moran Eye Center)，ユタ大学 (University of Utah)，［２００３年４月１４日検索］，インターネット＜URL:http://webvision.med.utah.edu＞
【非特許文献９】
Ｋ．ボアヘン(K.Boahen)，「レチノモルフィック・チップ・ザット・シー・クアドラプル・イメージ(Retinomorphic Chips that see Quadruple Images)」，（米国），第７回ニューラル，ファジーとバイオインスパイアード・システムのためのマイクロエレクトロニクスに関する国際会議予稿集(Proceedings of the Seventh International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems)，アイトリプルイー(IEEE)，１９９９年，ｐ．１２−２０
【非特許文献１０】
山田仁，宮下貴重，大谷真弘，米津宏雄，「内網膜機能に学んだ動き情報の生成とその電子回路化」，デンソーテクニカルレビュー，株式会社デンソー，２０００年，第５巻，第２号，ｐ．１０１−１０６
【００１１】
【発明が解決しようとする課題】
そこで、請求項記載の本発明は、色の補色関係を用いて、画像から少なくとも１個の文字及び図形を抽出することにより、少ないハードウェア量及び計算量でこれらの文字及び図形の位置又は色を同時に抽出することを目的とする。
【００１２】
【課題を解決するための手段及び発明の効果】
請求項１の発明は、少なくとも１個のデータ処理装置を含む視覚装置であって、前記データ処理装置が、少なくとも１個のプロセッサ及び少なくとも１組のメモリを含むことと、前記メモリが、１個の画像を表すベクトルの１個の要素である帯域画素値を少なくとも１個蓄えることと、を特徴とし、前記データ処理装置が、補色関係に基づいて、入力したデジタル画像の各画素を灰色を含む少なくとも５色に分類することにより、前記デジタル画像から少なくとも前記５色で表される多色画像を生成する多色分類手段を備えることにより、前記デジタル画像の各画素を少なくとも３ビットの帯域画素値に圧縮することを特徴とする視覚装置である。本発明において、前記多色分類手段は二次元格子状に配列された複数の配列演算ユニットによって実現される。したがって、前記視覚装置が行う画像処理は局所並列画像処理である。前記多色分類手段では、前記デジタル画像の各画素の複数の前記帯域画素値で表される色を、白色、黒色及び色相環において向い合う少なくとも１対の色を含む少なくとも４色に分類する手段（又はステップ）と、前記白色と前記黒色の差分を計算する手段（又はステップ）と、向い合う少なくとも前記１対の色から差分を計算する手段（又はステップ）と、前記白色と前記黒色の前記差分と、向い合う少なくとも前記１対の色の前記差分と、から灰色を分類する手段（又はステップ）と、を用いて、少なくとも前記５色を含む前記多色画像に変換することが好ましい。なお、前記デジタル画像がカラー画像である場合、向い合う少なくとも１対の色として、一般に赤色及び緑色が用いられることが好ましい。また、特に断りがない限り、黄色及び青色はそれぞれ前記白色及び前記黒色と見做される。そこでこれら５色を表すためには少なくとも３ビットあれば良いので、前記視覚装置は、前記デジタル画像を、少なくとも３ビットの前記帯域画素値から構成される画像に圧縮することができる。しかも圧縮された前記帯域画素値の多くが前記灰色となるので、これらの前記帯域画素値に対する圧縮も容易である。本発明は、前記デジタル画像を圧縮することができるので、前記デジタル画像の圧縮に関する諸問題が好適に解決される。
【００１３】
請求項２の発明は、少なくとも１個のデータ処理装置を含む視覚装置であって、前記データ処理装置が、少なくとも１個のプロセッサ及び少なくとも１組のメモリを含むことと、前記メモリが、１個の画像を表すベクトルの１個の要素である帯域画素値を少なくとも１個蓄えることと、を特徴とし、前記データ処理装置が、補色関係に基づいて、入力したデジタル画像の各画素を灰色を含む少なくとも５色に分類することにより、前記デジタル画像から少なくとも前記５色で表される多色画像を生成する多色分類手段と、前記多色画像の各画素の少なくとも１色に対して、所定の近傍内にある所定の帯域画素値の個数が所定の範囲内にある画素だけを選択することにより、前記多色画像を粗文字／図形画像に変換する文字／図形選択手段と、前記粗文字／図形画像の各画素の少なくとも１色に対して、所定の近傍内にある所定の帯域画素値の個数が所定の範囲内にある画素だけを除去することにより、前記粗文字／図形画像を文字／図形画像に変換するテクスチャ除去手段と、を備えることにより、前記文字／図形画像の各画素を少なくとも３ビットの帯域画素値に圧縮することを特徴とする視覚装置である。本発明において、文字とは所定の線幅以下の線のみから構成されたものを指し、図形とは所定の線幅を越える線若しくは領域を少なくとも１個含むものを指すものとする。そこで本発明では、前記デジタル画像中のこれらの文字及び／又は図形（以下、文字／図形と略記する。）を抽出し、前記文字／図形画像を生成する。本発明において、全ての手段は二次元格子状に配列された複数の配列演算ユニットによって実現される。したがって、前記視覚装置が行う画像処理は局所並列画像処理である。前記多色分類手段では、前記デジタル画像の各画素の複数の前記帯域画素値で表される色を、白色、黒色及び色相環において向い合う少なくとも１対の色を含む少なくとも４色に分類する手段（又はステップ）と、前記白色と前記黒色の差分を計算する手段（又はステップ）と、向い合う少なくとも前記１対の色から差分を計算する手段（又はステップ）と、前記白色と前記黒色の前記差分と、向い合う少なくとも前記１対の色の前記差分と、から灰色を分類する手段（又はステップ）と、を用いて、少なくとも前記５色を含む前記多色画像に変換することが好ましい。なお、前記デジタル画像がカラー画像である場合、向い合う少なくとも１対の色として、一般に赤色及び緑色が用いられる。また、特に断りがない限り、黄色及び青色はそれぞれ前記白色及び前記黒色と見做される。そこでこれら５色を表すためには少なくとも３ビットあれば良いので、前記視覚装置は、前記デジタル画像を、少なくとも３ビットの前記帯域画素値から構成される画像に圧縮することができる。しかも圧縮された前記帯域画素値の多くが前記灰色となるので、これらの前記帯域画素値に対する圧縮も容易である。前記文字／図形選択手段では、前記多色画像の各画素の少なくとも１色に対して、前記所定の近傍内にある前記所定の帯域画素値を集める手段（又はステップ）と、集まった前記帯域画素値の個数が前記所定の範囲内にある前記画素だけを選択する手段（又はステップ）と、を用いて、前記多色画像を粗文字／図形画像に変換する。もし前記所定の帯域画素値の前記個数が前記所定の範囲内にあれば、前記多色画像の前記画素の前記色の帯域画素値を前記文字／図形と見做し、前記粗文字／図形画像の帯域画素値とする。もし前記所定の帯域画素値の前記個数が前記所定の範囲内になければ、前記帯域画素値を前記灰色と見做し、前記粗文字／図形画像の帯域画素値とする。前記テクスチャ除去手段では、前記粗文字／図形画像の各画素の少なくとも１色に対して、エッジ情報に対応する帯域画素値を生成する手段（又はステップ）と、所定の近傍内にある所定の帯域画素値を集める手段（又はステップ）と、集まった前記帯域画素値の個数が所定の範囲内にある画素だけを除去する手段（又はステップ）と、を用いて、前記粗文字／図形画像を文字／図形画像に変換することが好ましい。もし前記所定の帯域画素値の前記個数が前記所定の範囲内になければ、前記粗文字／図形画像の前記画素の前記色の帯域画素値をテクスチャではないと見做し、前記文字／図形画像の帯域画素値とする。もし前記エッジ情報の数が前記所定の範囲内にあれば、前記粗文字／図形画像の前記帯域画素値を前記灰色と見做し、前記文字／図形画像の帯域画素値とする。本発明は、前記デジタル画像中の前記文字／図形の色を的確に表したまま、前記デジタル画像中の前記文字／図形を抽出することができるので、前記文字／図形の抽出に関する諸問題が好適に解決される。
【００１４】
請求項３の発明は、請求項２記載の視覚装置において、前記データ処理装置が精細化手段を備え、前記精細化手段が、前記文字／図形画像の各画素を構成する複数の前記帯域画素値が前記文字／図形を表す場合、精細文字／図形画像において対応する前記帯域画素値の各々を前記デジタル画像において対応する前記帯域画素値に設定する手段（又はステップ）と、前記文字／図形画像の各画素を構成する複数の前記帯域画素値が前記灰色を表す場合、前記精細文字／図形画像において対応する前記帯域画素値の各々を所定の灰色を表す前記帯域画素値に設定する手段（又はステップ）と、を用いることを特徴とする視覚装置である。本発明において、文字とは所定の線幅以下の線のみから構成されたものを指し、図形とは所定の線幅を越える線若しくは領域を少なくとも１個含むものを指すものとする。そこで本発明では、前記デジタル画像中のこれらの文字及び／又は図形（以下、文字／図形と略記する。）を抽出し、前記精細文字／図形画像を生成する。本発明において、全ての手段は二次元格子状に配列された複数の配列演算ユニットによって実現される。したがって、前記視覚装置が行う画像処理は局所並列画像処理である。前記精細化手段は、前記デジタル画像のうち、前記文字／図形画像中の前記灰色の前記画素、つまり前記文字及び前記図形に含まれない前記画素に対応する前記画素を所定の前記灰色に置き換え、それ以外の画素をそのまま維持する。これにより、前記精細文字／図形画像中の前記文字及び前記図形の色は正確になる。しかも多数の画素が前記灰色になるので、これらの前記画素の帯域画素値に対する圧縮も容易である。本発明は、前記デジタル画像中の前記文字及び前記図形の色を正確に表したまま、前記デジタル画像中の前記文字及び前記図形を抽出することができるので、前記文字及び前記図形の抽出に関する諸問題が好適に解決される。
【００１５】
請求項４の発明は、二次元格子状に配列された複数の配列演算ユニットから構成される少なくとも１個のデータ処理装置を含む視覚装置であって、前記データ処理装置の各々が、少なくとも１個のプロセッサ及び少なくとも１組のメモリを含むことと、前記メモリが、１個の画像を表すベクトルの１個の要素である帯域画素値を少なくとも１個蓄えることと、各々の前記配列演算ユニットが、少なくとも１個の前記プロセッサを用いて、少なくとも１個の前記帯域画素値を処理することと、を特徴とし、各々の前記配列演算ユニットにおいて、前記配列演算ユニットを初期化する手段と、入力すべきデジタル画像の帯域画素値がなければ処理を終了する手段と、前記デジタル画像の少なくとも１個の前記帯域画素値を入力する手段と、前記デジタル画像の各画素を平滑するために、前記デジタル画像の複数の前記帯域画素値を平滑画像の少なくとも１個の帯域画素値に変換する手段と、前記平滑画像の各画素を強調するために、前記平滑画像の複数の前記帯域画素値を強調画像の少なくとも１個の帯域画素値に変換する手段と、前記強調画像の各画素を少なくとも５色に分類するために、前記強調画像の少なくとも１個の前記帯域画素値から色分類画像の少なくとも１個の帯域画素値を生成する手段と、前記色分類画像の各画素から孤立点及び孤立孔を除去するために、前記色分類画像の複数の前記帯域画素値を多色画像の少なくとも１個の帯域画素値に変換する手段と、前記多色画像の各々の前記帯域画素値を出力する手段と、を備えたことを特徴とする視覚装置である。本発明は前記配列演算ユニットが提供する機能をデジタル技術によって実現するためのアルゴリズムの実装形態である。前記デジタル画像がカラー画像の場合、前記多色画像は、少なくとも白色、黒色、赤色、緑色及び灰色の５色を含むことができる。前記配列演算ユニットを前記二次元格子状に配列し、前記配列演算ユニットを近傍同士相互に結合し、前記配列演算ユニットの各パラメータの初期値を設定した後に、前記デジタル画像を画素単位で適宜入力し、平滑化から前記多色画像の出力までを順次行い、前記デジタル画像が入力されなくなるまで繰り返す。本発明は前記配列演算ユニットを並列に動作させることができるので、複数の前記文字及び前記図形の抽出に関する諸問題が好適に解決される。
【００１６】
請求項５の発明は、二次元格子状に配列された複数の配列演算ユニットから構成される少なくとも１個のデータ処理装置を含む視覚装置であって、前記データ処理装置の各々が、少なくとも１個のプロセッサ及び少なくとも１組のメモリを含むことと、前記メモリが、１個の画像を表すベクトルの１個の要素である帯域画素値を少なくとも１個蓄えることと、各々の前記配列演算ユニットが、少なくとも１個の前記プロセッサを用いて、少なくとも１個の前記帯域画素値を処理することと、を特徴とし、各々の前記配列演算ユニットにおいて、前記配列演算ユニットを初期化する手段と、入力すべきデジタル画像の帯域画素値がなければ処理を終了する手段と、前記デジタル画像の少なくとも１個の前記帯域画素値を入力する手段と、前記デジタル画像の各画素を平滑するために、前記デジタル画像の複数の前記帯域画素値を平滑画像の少なくとも１個の帯域画素値に変換する手段と、前記平滑画像の各画素を強調するために、前記平滑画像の複数の前記帯域画素値を強調画像の少なくとも１個の帯域画素値に変換する手段と、前記強調画像の各画素を少なくとも５色に分類するために、前記強調画像の少なくとも１個の前記帯域画素値から色分類画像の少なくとも１個の帯域画素値を生成する手段と、前記色分類画像の各画素から孤立点及び孤立孔を除去するために、前記色分類画像の複数の前記帯域画素値を多色画像の少なくとも１個の帯域画素値に変換する手段と、前記多色画像から文字及び図形を選択するために、前記多色画像の複数の前記帯域画素値を粗文字／図形画像の少なくとも１個の帯域画素値に変換する手段と、前記粗文字／図形画像からテクスチャを除去するために、前記粗文字／図形画像の複数の前記帯域画素値を文字／図形画像の少なくとも１個の帯域画素値に変換する手段と、前記文字／図形画像の各々の前記帯域画素値を出力する手段と、を備えたことを特徴とする視覚装置である。本発明において、文字とは所定の線幅以下の線のみから構成されたものを指し、図形とは所定の線幅を越える線若しくは領域を少なくとも１個含むものを指すものとする。そこで本発明では、前記デジタル画像中のこれらの文字及び／又は図形（以下、文字／図形と略記する。）を抽出し、前記文字／図形画像を生成する。つまり、本発明は前記配列演算ユニットが提供する機能をデジタル技術によって実現するためのアルゴリズムの実装形態である。前記デジタル画像がカラー画像の場合、前記多色画像は、少なくとも白色、黒色、赤色、緑色及び灰色の５色を含むことができる。前記配列演算ユニットを前記二次元格子状に配列し、前記配列演算ユニットを近傍同士相互に結合し、前記配列演算ユニットの各パラメータの初期値を設定した後に、前記デジタル画像を画素単位で適宜入力し、平滑化から前記文字／図形画像の出力までを順次行い、前記デジタル画像が入力されなくなるまで繰り返す。本発明は前記配列演算ユニットを並列に動作させることができるので、複数の前記文字／図形の抽出に関する諸問題が好適に解決される。
【００１７】
請求項６の発明は、請求項５記載の視覚装置において、前記デジタル画像を用いて前記文字／図形画像を精細化し、精細文字／図形画像の少なくとも１個の帯域画素値に変換する手段と、前記文字／図形画像の各々の前記帯域画素値の代りに、前記精細文字／図形画像の各々の前記帯域画素値を出力する手段と、備えたことを特徴とする視覚装置である。本発明において、文字とは所定の線幅以下の線のみから構成されたものを指し、図形とは所定の線幅を越える線若しくは領域を少なくとも１個含むものを指すものとする。そこで本発明では、前記デジタル画像中のこれらの文字及び／又は図形（以下、文字／図形と略記する。）を抽出し、前記精細文字／図形画像を生成する。つまり、本発明は前記配列演算ユニットが提供する機能をデジタル技術によって実現するためのアルゴリズムの実装形態である。前記デジタル画像がカラー画像の場合、前記多色画像は、少なくとも白色、黒色、赤色、緑色及び灰色の５色を含むことができる。前記配列演算ユニットを前記二次元格子状に配列し、前記配列演算ユニットを近傍同士相互に結合し、前記配列演算ユニットの各パラメータの初期値を設定した後に、前記デジタル画像を画素単位で適宜入力し、平滑化から前記精細文字／図形画像の出力までを順次行い、前記デジタル画像が入力されなくなるまで繰り返す。本発明は前記配列演算ユニットを並列に動作させることができるので、複数の前記文字／図形の抽出に関する諸問題が好適に解決される。
【００１８】
請求項７の発明は、請求項１〜５のうちのいずれか１項に記載の視覚装置と、前記デジタル画像を撮影する撮影装置又は前記デジタル画像を描画する描画装置と、画像圧縮手段及び画像通信手段を含む通信装置と、を含む通信システムであって、前記視覚装置が出力した、前記多色画像、前記文字／図形画像又は前記精細文字／図形画像に対して、前記画像圧縮手段が、複数の前記灰色の前記帯域画素値を一纏めに圧縮して圧縮画像データを生成することと、前記画像通信手段が、前記圧縮画像データを通信ネットワークに送信することと、を特徴とする通信システムである。前記撮影装置は、カメラ及びスキャナなど、前記デジタル画像を撮影できるものである。また、前記描画装置は、マウス、タブレット及びタッチパネルなど、前記デジタル画像を描画できるものであることが好ましい。前記通信ネットワークは、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）、イーサーネット、ＩＳＤＮ（総合デジタル通信網）、ｘＤＳＬ（ＤｉｇｉｔａｌＳｕｂｓｃｒｉｂｅｒＬｉｎｅ）、ＡＴＭ（ＡｓｙｎｃｈｒｏｎｏｕｓＴｒａｎｓｆｅｒＭｏｄｅ）及びフレームリレーなどの有線技術、赤外線、無線ＬＡＮ、ＩＭＴ−２０００（ＤＳ−ＣＤＭＡ及びＭＣ−ＣＤＭＡなど）及びＢｌｕｅｔｏｏｔｈなどの無線技術から構成される。また前記画像圧縮手段は、ＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）及びＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃｃｏｄｉｎｇＥｘｐｅｒｔｓＧｒｏｕｐ）などの画像圧縮技術を用いて、前記多色画像、前記文字／図形画像及び前記精細文字／図形画像のうち少なくとも１個を圧縮することができることが好ましい。このとき、前記画像圧縮手段がＧＩＦ（ＧｒａｐｈｉｃＩｎｔｅｒｃｈａｎｇｅＦｏｒｍａｔ）及びＦｌａｓｈのようなアニメーションに適した圧縮形式を用いるならば、前記画像圧縮手段はこれらの画像を高い圧縮率で圧縮することができる。本発明は、前記視覚装置が出力するこれらの画像を高速に通信することができるので、前記文字／図形の通信に関する諸問題が好適に解決される。
【００１９】
請求項８の発明は、請求項７記載の通信システムと、データベースを含む少なくとも１個のコンピュータシステムと、前記通信システムと前記コンピュータシステムを接続する通信ネットワークと、を含む文字／図形探索システムであって、前記コンピュータシステムの各々が、前記通信システムから前記圧縮画像データを受信する画像受信手段と、受信した前記圧縮画像データを前記多色画像、前記文字／図形画像又は前記精細文字／図形画像に伸張する画像伸張手段と、前記多色画像、前記文字／図形画像又は前記精細文字／図形画像から少なくとも１個の前記文字／図形を含む領域を検出する文字／図形領域検出手段と、前記領域から前記データベースのキー画像を生成するキー画像生成手段と、前記キー画像を、前記データベースに蓄えられた記録画像データと比較する画像検出手段と、を含むことにより、前記領域に対する検出結果を生成することを特徴とする文字／図形探索システムである。前記画像受信手段は、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）、イーサーネット、ＩＳＤＮ（総合デジタル通信網）、ｘＤＳＬ（ＤｉｇｉｔａｌＳｕｂｓｃｒｉｂｅｒＬｉｎｅ）、ＡＴＭ（ＡｓｙｎｃｈｒｏｎｏｕｓＴｒａｎｓｆｅｒＭｏｄｅ）及びフレームリレーなどの有線技術、赤外線、無線ＬＡＮ、ＩＭＴ−２０００（ＤＳ−ＣＤＭＡ及びＭＣ−ＣＤＭＡなど）及びＢｌｕｅｔｏｏｔｈなどの無線技術を用いて、前記通信ネットワークから前記圧縮画像データを受信する。前記画像伸張手段は、ＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）及びＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃｃｏｄｉｎｇＥｘｐｅｒｔｓＧｒｏｕｐ）などの画像伸張技術を用いて、前記圧縮画像データを前記多色画像、前記文字／図形画像又は前記精細文字／図形画像に伸張することができることが好ましい。前記文字／図形領域検出手段は、前記多色画像、前記文字／図形画像又は前記精細文字／図形画像が多数の前記灰色を含むことを利用して、水平方向と垂直方向のヒストグラムを計算することにより、少なくとも１個の前記文字／図形を含む前記領域の位置及び大きさを検出することが好ましい。また、前記視覚装置の位置／大きさ検出アルゴリズムを用いることにより、前記文字／図形領域検出手段は、多数の前記領域の前記位置及び前記大きさを高速に検出することができる。前記キー画像生成手段は、前記位置及び前記大きさを用いて、前記多色画像、前記文字／図形画像又は前記精細文字／図形画像から前記領域を切り出し、前記キー画像を生成する。前記画像検出手段は、前記データベースに組み込まれた画像検索技術（例えばｉｍｇＳｅｅｋなど）及びニューラルネットワークなどの画像処理技術を用いて、前記キー画像に対応する前記データを前記データベースの中から検出することが好ましい。前記データベースにはＳＱＬ（ＳｔｒｕｃｔｕｒｅＱｕｅｒｙＬａｎｇｕａｇｅ）のようなデータベース記述言語を備えたリレーショナルデータベース及びオブジェクト指向データベースなどが用いられることが好ましい。ここに、前記文字／図形に関する前記記録画像データが蓄えられ、前記記録画像データには画像ＩＤが割り当てられることが好ましい。前記コンピュータシステムは、前記多色画像、前記文字／図形画像又は前記精細文字／図形画像から前記灰色の前記帯域画素値を除去した後、残りの前記帯域画素値が指し示す前記文字／図形の前記領域を切り出し、前記記録画像データと比較する。これにより、前記コンピュータシステムが、背景など、前記文字／図形以外の部分と、前記記録画像データを比較する必要がない。一般に、前記領域に対する前記検出結果は前記記録画像データであるが、前記記録画像データの代りに前記画像ＩＤが用いられても良い。本発明は、前記データベースの前記データの量を少なくすることができると共に、前記コンピュータシステムにおいて所定の前記文字／図形を高速に検出することができるので、前記文字／図形の探索に関する諸問題が好適に解決される。
【００２０】
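The invention of claim 9 is the character/figure search system of claim 8, characterized in that at least one of the computer systems uses image transmission means to transmit the detection result for the region, or position information of the region, to the communication device, and in that the communication device includes a display or a speaker, whereby the communication device outputs the detection result or the position information using the display or the speaker. The image transmission means preferably transmits the detection result or the position information to the communication network using wired technologies such as USB (Universal Serial Bus), Ethernet, ISDN (Integrated Services Digital Network), xDSL (Digital Subscriber Line), ATM (Asynchronous Transfer Mode), and frame relay, or wireless technologies such as infrared, wireless LAN, IMT-2000 (DS-CDMA, MC-CDMA, and the like), and Bluetooth. The position information represents either the two-dimensional coordinates of the character/figure in the digital image or a relative direction from the center of the digital image, such as up, down, left, or right. The communication device either shows the position given by the two-dimensional coordinates on the display or outputs audio data such as "up", "down", "left", and "right" through the speaker. By following the position on the display or the audio instructions and changing the orientation of the imaging device or of the photographed object, the user of the imaging device can easily find a given character/figure. Because the present invention can search for the character/figure even when the communication capacity of the communication network is small, problems relating to searching for characters/figures are suitably solved.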
【００２１】
請求項１０の発明は、請求項８記載の文字／図形探索システムにおいて、少なくとも１個の前記コンピュータシステムが、画像送信手段を用いて、前記記録画像データに関連付けられた検索結果文字列又は検索結果画像を前記通信装置に送信することと、前記通信装置がディスプレイ又はスピーカを含むことと、により、前記通信装置が前記ディスプレイ又は前記スピーカを用いて前記検索結果文字列又は前記検索結果画像を出力することを特徴とする文字／図形探索システムである。前記画像送信手段は、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）、イーサーネット、ＩＳＤＮ（総合デジタル通信網）、ｘＤＳＬ（ＤｉｇｉｔａｌＳｕｂｓｃｒｉｂｅｒＬｉｎｅ）、ＡＴＭ（ＡｓｙｎｃｈｒｏｎｏｕｓＴｒａｎｓｆｅｒＭｏｄｅ）及びフレームリレーなどの有線技術、赤外線、無線ＬＡＮ、ＩＭＴ−２０００（ＤＳ−ＣＤＭＡ及びＭＣ−ＣＤＭＡなど）及びＢｌｕｅｔｏｏｔｈなどの無線技術を用いて、前記検索結果画像又は前記検索結果文字列を前記通信ネットワークに送信する。前記検索結果文字列は任意の文字列であり、前記検索結果画像も任意の画像であることが好ましい。前記検索結果文字列及び前記検索結果画像は前記記録画像データに関連付けられた状態で前記データベースに記録される。したがって、前記キー画像によって前記記録画像データが検索された場合、前記データベースは、前記記録画像データを用いることにより、前記検索結果文字列又は前記検索結果画像を参照することができる。前記通信装置は前記ディスプレイに前記検索結果文字列又は前記検索結果画像を表示するか、若しくは前記スピーカを用いて前記検索結果文字列の音声データを出力する。前記撮影装置の利用者は、前記検索結果文字列又は前記検索結果画像を得るまで適当に前記撮影装置又は撮影物の向きを変えることにより、所定の前記文字／図形を容易に見付けることができる。本発明は、前記通信ネットワークの通信容量が少なくても、前記文字／図形を探索することができるので、前記文字／図形の探索に関する諸問題が好適に解決される。
【００２２】
【発明の実施の形態】
以下、本発明の配列演算ユニット(ARRAY OPERATION UNIT)１００を利用した視覚装置(VISUAL DEVICE)２の実施形態を挙げ、図面を参照して説明する。
【００２３】
まず、図１に示すように、視覚装置２は少なくとも１個のデータ処理装置１１０を含む。このデータ処理装置１１０は、少なくとも１個のプロセッサ１０１を含み、このプロセッサ１０１が、局所関数によって記述される局所並列画像処理アルゴリズムに従った、少なくとも１個の手段を実現すると共に、少なくとも１個の入力画像を入力した後でこの局所並列画像処理を実行し、必要に応じて少なくとも１個の出力画像を出力する。なお、上述の手段が複数ある場合、この視覚装置２は、図１に示すように１個のデータ処理装置１１０を用いてこれらの手段を実現することもできるし、図２に示すように複数（ここでは４個）のデータ処理装置１１０を用いてこれらの手段を実現することもできる。そこで以下では、説明を簡単にするために、この視覚装置２は１個のデータ処理装置１１０を用いるものとする。
【００２４】
次に、図３に示すように、このデータ処理装置１１０は、格子状に配列された複数の配列演算ユニット１００から構成される。これらの配列演算ユニット１００の各々は、このプロセッサ１０１のプログラムとして記述される。しかしながら、このプロセッサ１０１の数が１個の場合、これらの配列演算ユニット１００のプログラムは順番に実行されるため、このデータ処理装置１１０は局所並列画像処理の並列性を生かすことができない。そこで以下では、この並列性を生かしたデータ処理装置１１０の専用回路について説明する。
【００２５】
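As shown in FIG. 3, the array operation units 100 are arranged in a lattice within the data processing device 110, and each is wired so that it can communicate only with the adjacent array operation units 100 in the data processing device 110. In other words, 4-neighbors are wired directly to one another. Compared with wiring 8-neighbors, this uses fewer electronic components and less wiring while operating at comparable speed, and it can be extended easily if the neighborhood size is enlarged in the future.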
【００２６】
さらにこれらの配列演算ユニット１００の各々は、図１０に示す通り、画像処理における数式を計算するためのプロセッサ１０１と、数式で使われる全てのパラメータ、定数、関数及びオペレータを記憶するための一組のメモリ１０２と、近傍の配列演算ユニット１００と通信するための少なくとも１個のコントローラ１０３から構成される。なお、このプロセッサ１０１及びこのコントローラ１０３は、それぞれクロック信号を入力する同期回路である。このプロセッサ１０１は、アドレスバス５１で指定したアドレスによりこれらのメモリ１０２及びコントローラ１０３の任意のメモリ素子及びレジスタを選択することができる。またこのプロセッサ１０１はデータバス５２を介してこれらのメモリ１０２及びコントローラ１０３と双方向に通信可能に接続され、このアドレスバス５１で指定された任意のメモリ素子及びレジスタのデータにアクセスすることができる。この配列演算ユニット１００が１つ以上の入力画素から構成される前入力データ群を入力すると、このコントローラ１０３は前入力データ群をこのメモリ１０２に記憶させる。またこのコントローラ１０３は、関数により作成された、このメモリ１０２中の計算データを隣接する配列演算ユニット１００に送信すると共に、この隣接する配列演算ユニット１００から受信した計算データをこのメモリ１０２に記憶させ、さらに必要ならば、入力した以外の配列演算ユニット１００に転送する。最終的にこのコントローラ１０３は、出力画像の画像データを結果データとして出力する。勿論、この配列演算ユニット１００は複数のコントローラ１０３を備えても良い。なお、この配列演算ユニット１００及びこれを改良した仮想配列演算ユニットの詳細な回路図については、特許文献３を参照のこと。
【００２７】
さてここまでは、視覚装置２を構成するデータ処理装置１１０について説明してきた。このデータ処理装置１１０は、少なくとも１個のプロセッサ１０１を備えることにより、局所並列画像処理に従った少なくとも１個の手段を実現することができる。そこで以下では、本発明で用いるこれらの手段について説明する。
【００２８】
図４に示すように、請求項１記載の発明に対応する視覚装置２の実施形態は、データ処理装置１１０上に実装された多色分類手段１１から構成される。この多色分類手段１１は、デジタル画像１１１を入力して、少なくとも５色を含む多色画像１１２を生成する。以下では説明を簡単にするために、デジタル画像１１１は一般的なカラー画像であるものとし、この多色分類手段１１は、このデジタル画像１１１の各画素を白色、黒色、赤色、緑色及び灰色の５色に分類し、この多色画像１１２として各々の色毎に二値画像を生成するものとする。但し、灰色は残りの４色以外を意味するので、実際には白色、黒色、赤色及び緑色の４色があれば十分である。したがって多色画像１１２は、それぞれ白色、黒色、赤色及び緑色の二値画像に対応した、多色画像（白）１１２ａ、多色画像（黒）１１２ｂ、多色画像（赤）１１２ｃ及び多色画像（緑）１１２ｄから構成されるものとする。なお、このデジタル画像１１１を白色、黒色、赤色、緑色及び灰色の５色に分類した場合、一般に黄色は白色に、青色は黒色に分類されるが、赤みがかった黄色及び緑がかった青色は、それぞれ赤色及び緑色に分類される。
【００２９】
さて、この多色分類手段１１は、補色関係に基づいてデジタル画像１１１の各画素を白色、黒色、赤色、緑色及び灰色の５色に分類し、多色画像１１２を生成する。この補色関係は、人間の視覚において見られる特徴であり、色相環において向い合う色の組み合せである。例えば赤色と緑色は補色関係にあり、１画素においてこれらの赤色と緑色が同時に見えることはなく、しかもお互いにコントラストを強調する。また、黄色と青色（実際には紫色）も補色関係にある。さらに、白色と黒色は相対的な明るさを表し、ある画素が同じ明るさであっても、その周辺が暗ければこの画素は白色に見え、その周辺が明るければこの画素は黒色に見える。つまり灰色というどの色にも属さない周辺の明るさを基準とした相対的な明るさが人間にとっての白色及び黒色である。したがって、１画素において白色と黒色も同時に見えることはない。そこでデジタル画像１１１の各画素において、赤帯域の画素値と緑帯域の画素値の差をそれぞれの色の強さとし、さらに周辺の画素の明るさを基準とした各画素の明るさを白色及び黒色の強さとする。ここで重要なのは、両方の組とも色の強さに差がない場合、この画素を灰色、つまり４色以外の色とすることである。これにより、このデジタル画像１１１のうち色の変化が僅かな画素は全て灰色となり、一方で文字及び図形など複数の色が複雑に入り組んだ画素は灰色以外の色になる。実際に人間が文字及び図形を書く場合、これらの文字及び図形を見易くするために、無意識にコントラストを強調する色の組み合せを選んでいる。したがって、この多色分類手段１１は、この補色関係を利用することにより、これらの文字及び図形の色を損なうことなく、このデジタル画像１１１から、これらの文字及び図形と、それらの輪郭と、を除いた背景を灰色で塗り潰すことができる。このとき、この多色画像１１２は丁度マンガの描写に似た画像となる。
【００３０】
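Although four binary images are used here as the multi-color image 112, each pixel of the multi-color image 112 may of course be represented with 3 bits. The multi-color classification means 11 may also classify pixels into seven colors (white, black, red, green, blue, yellow, and gray) based on complementary color relationships; in this case too, each pixel of the multi-color image 112 can be represented with 3 bits. Furthermore, the digital image 111 may include at least one band outside visible light, such as infrared or ultraviolet. Such bands can be assigned to any of red, green, blue, and yellow, or of course to new colors other than these.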
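As a minimal sketch of the 3-bit representation mentioned above, the following Python packs the four binary planes (white, black, red, green) of one pixel into a single code and back. The code values and the names `pack_pixel` and `unpack_pixel` are assumptions for illustration; the text only states that 3 bits suffice. The sketch also assumes that at most one plane is set per pixel.

```python
# Hypothetical 3-bit codes; the patent text does not fix these values.
GRAY, WHITE, BLACK, RED, GREEN = 0, 1, 2, 3, 4


def pack_pixel(white, black, red, green):
    """Map four binary band values (at most one set) to a 3-bit code."""
    for code, bit in ((WHITE, white), (BLACK, black), (RED, red), (GREEN, green)):
        if bit:
            return code
    return GRAY  # no plane set: the pixel is gray


def unpack_pixel(code):
    """Inverse of pack_pixel: return (white, black, red, green)."""
    return tuple(int(code == c) for c in (WHITE, BLACK, RED, GREEN))
```

Seven colors would fit the same way, since codes 0 to 6 still need only 3 bits.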
【００３１】
ところで、所定の線幅以下で書かれた文字がデジタル画像１１１中にある場合、このデジタル画像１１１から生成された多色画像１１２中にもこの文字とほぼ同じ文字が現れる。また、この線幅を越える線及び領域を含む図形がこのデジタル画像１１１中にある場合、この多色画像１１２中にもこの図形とほぼ同じ図形及びその輪郭が現れる。そこで以下では、この多色画像１１２から、これらの文字及び／又は図形（以下、単に文字／図形と略記する。）を抽出する視覚装置２について説明する。
【００３２】
図５に示すように、請求項２記載の発明に対応する視覚装置２の実施形態は、データ処理装置１１０上に実装された文字／図形抽出手段２１によって実現され、さらにこの文字／図形抽出手段２１は、多色分類手段１１、文字／図形選択手段１２及びテクスチャ除去手段１３から構成される。なお、図５において、この多色分類手段１１は、デジタル画像１１１を白色、黒色、赤色、緑色及び灰色の５色に分類する。したがって、視覚装置２が出力する文字／図形画像１１３は、文字／図形画像（白）１１３ａ、文字／図形画像（黒）１１３ｂ、文字／図形画像（赤）１１３ｃ及び文字／図形画像（緑）１１３ｄから構成されるものとする。この文字／図形選択手段１２は、この多色分類手段１１が生成した多色画像１１２を入力して、この多色画像１１２の各帯域画素値に対して、色毎に所定の近傍の帯域画素値の総和を求め、この総和が所定の範囲内にあるか判定する。このとき、この総和がこの範囲内にあれば、この文字／図形選択手段１２は、この多色画像１１２の帯域画素値が文字／図形に含まれると判断して、粗文字／図形画像の対応する色の帯域画素値にする。さもなくば、この帯域画素値を灰色にする。これにより、この文字／図形選択手段１２は、背景に関係なく、この多色画像１１２中の文字／図形に含まれる画素を色毎に選択することができる。さらに、このテクスチャ除去手段１３は、この粗文字／図形画像を入力して色毎にエッジ情報を生成し、このエッジ情報に対して、色毎に所定の近傍の帯域画素値の総和を求め、この総和が所定の範囲内にあるか判定する。このとき、この総和がこの範囲内になければ、このテクスチャ除去手段１３は、この粗文字／図形画像の帯域画素値がテクスチャに含まれないと判断して、文字／図形画像１１３の対応する色の帯域画素値にする。さもなくば、この帯域画素値を灰色にする。これにより、このテクスチャ除去手段１３は、文字／図形の大きさ又は形に関係なく、この粗文字／図形画像中のテクスチャに含まれる画素を色毎に除去することができる。
【００３３】
なお、ここでは文字／図形画像１１３として４個の二値画像を用いたが、勿論この文字／図形画像１１３の各画素を３ビットで表しても良い。また、この多色分類手段１１は、補色関係に基づいて、白色、黒色、赤色、緑色、青色、黄色及び灰色の７色に分類しても良い。勿論この場合も、この文字／図形画像１１３の各画素を３ビットで表すことができる。さらに、デジタル画像１１１として、赤外線及び紫外線など、可視光以外の帯域を少なくとも１個含んでも良い。この場合、これらの帯域を赤色、緑色、青色及び黄色のいずれかに割り当てることができる。勿論、これら以外の新たな色に割り当てても良い。
【００３４】
さて、ここで図６に示すように、請求項３記載の発明に対応する視覚装置２の実施形態は、精細文字／図形抽出手段２２によって実現され、この精細文字／図形抽出手段２２は、上述の文字／図形抽出手段２１に対して精細化手段１４を加えたものである。この精細化手段１４は、文字／図形画像１１３を用いることにより、デジタル画像１１１から文字／図形の帯域画素値を抜き出して、精細文字／図形画像１１４の対応する帯域画素値にし、それ以外を所定の灰色にする。これにより、本発明は、文字／図形画像１１３では正確に表すことができなかったこの文字／図形の色を正確に表すことができる。さらに、この文字／図形以外の画素を灰色にすることにより、本発明は、精細文字／図形画像１１４の情報量を少なくすると共に、圧縮プログラムによってこの精細文字／図形画像１１４の更なる圧縮を可能にすることができる。
【００３５】
さて、視覚装置２で用いられている多色分類手段１１、文字／図形抽出手段２１及び精細化手段１４は、幾つかの配列演算ユニット１００から構成されるデータ処理装置１１０を用いることにより実装することができる。そこで以下では、配列演算ユニット１００を利用したデータ処理装置１１０の実施形態を挙げ、この視覚装置２を図面を参照して説明する。
【００３６】
まず配列演算ユニット１００は、入力画像の１つの画素とその近傍画素を用いることにより、出力画像の１つの画素を生成する。そこで図３に示したように、配列演算ユニット１００を入力画像のサイズに合わせて格子状に配列したデータ処理装置１１０を用いることにより、データ処理装置１１０は入力画像から出力画像を生成することができる。なお図３において、配列演算ユニット１００をＡＯＵと略記する。また図３では、配列演算ユニット１００は正方格子状に配列されているが、勿論実装面積を最小にするために、配列演算ユニット１００を六角格子状、つまり最密充填構造に配置しても良い。この場合、配列演算ユニット１００間の複数の信号線の一部はジグザグに配線される。次に配列演算ユニット１００は専用ハードウェアによって実装されても良いし、汎用コンピュータ上でソフトウェアによって実装することもできる。つまり入力画像から出力画像を生成することができれば、実装手段は制限されない。したがって配列演算ユニット１００のアルゴリズムを示すことにより、データ処理装置１１０の画像処理を示すことができる。そこで配列演算ユニット１００のアルゴリズムを示すために、図４、５及び６で示された多色分類手段１１、文字／図形抽出手段２１、テクスチャ除去手段１３及び精細化手段１４で用いる数式について説明する。
【００３７】
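Let x and y be arbitrary 2^n-level images of width w, height h, and b bands. Using the band pixel values x_ijk and y_ijk at position p(i, j, k), the images x and y are expressed as in Equations 1 and 2, respectively. Underlined letters denote vectors. Here n is a non-negative integer, and w, h, b, i, j, and k are natural numbers.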
【００３８】
【数１】

【００３９】
【数２】

【００４０】
First, the functions for point processing of each band pixel value of the image are described below.
【００４１】
To convert image x into a binary image, the band pixel values are binarized according to Equation 3.
【００４２】
【数３】

【００４３】
To saturate image x between a lower limit a and an upper limit b, the band pixel values are saturated according to Equation 4.
【００４４】
【数４】

【００４５】
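To generate a window from image x corresponding to a lower limit c and an upper limit d, the band pixel values are generated according to Equation 5.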
【００４６】
【数５】

【００４７】
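To convert image x into a band-maximum image, the maximum of the band values of the pixel at row i, column j is selected according to Equation 6. Because the band-maximum image is a single-band image, it is treated for convenience as an image with one band; the third index is therefore 1.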
【００４８】
【数６】
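The point functions of Equations 3 to 6 can be sketched in Python as follows. The equation bodies are not reproduced in this text, so the details are assumptions: binarization is taken to mean "positive becomes 1", and the window is taken to be 1 when the value lies in the closed interval [c, d]. All function names are illustrative.

```python
def binarize(v):
    """Equation 3 (assumed form): 1 for a positive value, else 0."""
    return 1 if v > 0 else 0


def saturate(v, a, b):
    """Equation 4: clamp v into the range [a, b]."""
    return max(a, min(b, v))


def window(v, c, d):
    """Equation 5 (assumed form): 1 if c <= v <= d, else 0."""
    return 1 if c <= v <= d else 0


def band_max(pixel):
    """Equation 6: largest band value of one pixel (a sequence of bands)."""
    return max(pixel)
```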

【００４９】
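The set P_ijk(q) of q-neighbor positions of position p(i, j, k) in an image is given by Equation 7. Here q takes values from the sequence 4, 8, 24, 48, 80, 120, ..., (2r+1)^2 − 1, where r is a natural number. When P_ijk(q) would contain a position outside the image, the position p(i, j, k) itself is substituted unless otherwise specified. Where a different rule is specified, a fictitious position that is not part of the image and whose pixel value is equivalent to 0 is substituted instead. Border handling is thus performed automatically, and the number of elements of P_ijk(q) is always q.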
【００５０】
【数７】
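A minimal Python sketch of the neighbor set P_ijk(q), using the default border rule stated above (an out-of-image position is replaced by the position itself). Zero-based row and column indices, the square-window shape for q ≥ 8, and the name `neighbors` are assumptions; the band index k is omitted because it does not change the set.

```python
import math


def neighbors(i, j, rows, cols, q):
    """Return the q neighbor positions of (i, j) as (row, col) pairs.

    Positions outside the image are replaced by (i, j) itself, so the
    result always has exactly q elements.
    """
    if q == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        r = (math.isqrt(q + 1) - 1) // 2
        if r < 1 or (2 * r + 1) ** 2 - 1 != q:
            raise ValueError("q must be 4 or (2r+1)^2 - 1 with r >= 1")
        offsets = [(di, dj)
                   for di in range(-r, r + 1)
                   for dj in range(-r, r + 1)
                   if (di, dj) != (0, 0)]
    result = []
    for di, dj in offsets:
        l, m = i + di, j + dj
        inside = 0 <= l < rows and 0 <= m < cols
        result.append((l, m) if inside else (i, j))
    return result
```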

【００５１】
Next, the functions and operators for neighborhood processing of each band pixel value that use at most the 8-neighborhood are described below.
【００５２】
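Smoothing at position p(i, j, k) of image x is performed according to Equation 8, where int(v) denotes truncation of the fractional part of the real number v. If the band pixel values of image x are integers, the division circuit can be omitted in a hardware implementation by right-shifting the sum of x_lmk twice when q = 4 and three times when q = 8.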
【００５３】
【数８】
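The smoothing step can be sketched in Python for a single band as follows. Since the body of Equation 8 is not reproduced in this text, the form used here (mean of the q neighbor values, excluding the center) is an assumption based on the description above; the names `neighbor_values` and `smooth` are illustrative, and pixel values are assumed to be non-negative integers so that the shift equals truncating division.

```python
def neighbor_values(img, i, j, q):
    """Values at the q neighbors (q = 4 or 8) of pixel (i, j).

    Assumed border rule: a neighbor outside the image is replaced by
    the center pixel, so exactly q values are returned.
    """
    if q == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif q == 8:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)]
    else:
        raise ValueError("this sketch supports q = 4 or q = 8 only")
    rows, cols = len(img), len(img[0])
    values = []
    for di, dj in offsets:
        l, m = i + di, j + dj
        inside = 0 <= l < rows and 0 <= m < cols
        values.append(img[l][m] if inside else img[i][j])
    return values


def smooth(img, q=8):
    """Neighborhood mean with the division replaced by a right shift."""
    shift = {4: 2, 8: 3}[q]  # divide by 4 -> >> 2, divide by 8 -> >> 3
    return [[sum(neighbor_values(img, i, j, q)) >> shift
             for j in range(len(img[0]))]
            for i in range(len(img))]
```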

【００５４】
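The Laplacian is simply a second-order difference operator, as shown in Equation 9. The 8-neighborhood captures subtle variations in noise and yields more zero points and zero crossings, which suits the present invention. Because q is either 4 or 8, the multiplication circuit can be omitted in a hardware implementation by left-shifting x_ijk twice when q = 4 and three times when q = 8.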
【００５５】
【数９】

【００５６】
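Assuming image x is an arbitrary binary image, isolated points and isolated holes in x are removed by computing Equation 10. Because the 4-neighborhood by its nature cannot detect diagonal lines, the 8-neighborhood should be used wherever possible.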
【００５７】
【数１０】
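A hedged Python sketch of isolated-point and isolated-hole removal on one binary plane. The body of Equation 10 is not reproduced in this text; the rule assumed here is that a 1 whose q neighbors are all 0 becomes 0, and a 0 whose q neighbors are all 1 becomes 1. The name `remove_isolated` and the center-substitution border rule are likewise assumptions.

```python
def remove_isolated(img, q=8):
    """Remove isolated points and fill isolated holes in a binary image."""
    if q == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif q == 8:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)]
    else:
        raise ValueError("this sketch supports q = 4 or q = 8 only")
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            total = 0
            for di, dj in offsets:
                l, m = i + di, j + dj
                inside = 0 <= l < rows and 0 <= m < cols
                total += img[l][m] if inside else img[i][j]
            value = img[i][j]
            if value == 1 and total == 0:
                row.append(0)      # isolated point
            elif value == 0 and total == q:
                row.append(1)      # isolated hole
            else:
                row.append(value)  # unchanged
        out.append(row)
    return out
```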

【００５８】
Next, further functions and operators for neighborhood processing of each band pixel value of an image are described below.
【００５９】
Given two images x and y, their difference is computed according to Equation 11.
【００６０】
【数１１】

【００６１】
Using the Laplacian of Equation 9 and the difference of Equation 11, sharpening of image x can be written simply as Equation 12.
【００６２】
【数１２】
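The Laplacian, difference, and sharpening operators can be sketched together in Python for a single band. The equation bodies are not reproduced in this text, so the forms used here are assumptions drawn from the description: the Laplacian is the sum of the q neighbor values minus q times the center value (with the multiplication done by a left shift), and sharpening is the image minus its Laplacian. Function names are illustrative.

```python
def neighbor_sum(img, i, j, q=8):
    """Sum of the q neighbor values; out-of-image neighbors are replaced
    by the center pixel (assumed default border rule)."""
    if q == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif q == 8:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)]
    else:
        raise ValueError("this sketch supports q = 4 or q = 8 only")
    rows, cols = len(img), len(img[0])
    total = 0
    for di, dj in offsets:
        l, m = i + di, j + dj
        inside = 0 <= l < rows and 0 <= m < cols
        total += img[l][m] if inside else img[i][j]
    return total


def laplacian(img, i, j, q=8):
    """Assumed Equation 9: neighbor sum minus q * center (via left shift)."""
    shift = {4: 2, 8: 3}[q]
    return neighbor_sum(img, i, j, q) - (img[i][j] << shift)


def difference(x, y):
    """Equation 11: difference of two band pixel values."""
    return x - y


def sharpen(img, q=8):
    """Assumed Equation 12: image minus its Laplacian."""
    return [[difference(img[i][j], laplacian(img, i, j, q))
             for j in range(len(img[0]))]
            for i in range(len(img))]
```

The sharpened values can fall outside the input range (including below zero), which is presumably where the saturation of Equation 4 would be applied.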

【００６３】
Given two images x and y, where y is a single-band binary image, each band pixel value of x can be extracted using the band pixel values of y according to Equation 13.
【００６４】
【数１３】

【００６５】
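Furthermore, given two images x and y, where y is a single-band binary image, Equation 14 extracts each band pixel value of x using the band pixel values of y and sets the remaining band pixel values to an arbitrary value g. Equation 13 is equivalent to Equation 14 with g = 0.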
【００６６】
【数１４】

【００６７】
The sum over the q-neighborhood at position p(i, j, k) of image x is computed according to Equation 15.
【００６８】
【数１５】

【００６９】
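When image x is a multi-band image such as a color image, two of its bands, for example red as the first band and green as the second band, are used to extract white, black, red, green, and gray from the characters/figures and their contours in x. Multi-color extraction at position p(i, j, k) of image x is performed using its q-neighborhood according to Equations 16 to 19, respectively. Strictly, the right-hand sides of Equations 16 to 19 should be divided by q; because these band pixel values are used only for comparing magnitudes with one another, the right-hand sides are instead multiplied by q so that the division is omitted. Gray is an intermediate color among white, black, red, and green, and denotes any color other than these four.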
【００７０】
【数１６】

【００７１】
【数１７】

【００７２】
【数１８】

【００７３】
【数１９】

【００７４】
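Next, one color is selected from the colors extracted from the image. Multi-color selection at position p(i, j, k) of the four-band image x is performed using a predetermined value f according to Equations 20 to 23, respectively. It is desirable to fine-tune f appropriately for each pixel of the four-band image x. When these equations select none of white, black, red, and green, the pixel in question is treated as gray.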
【００７５】
【数２０】

【００７６】
【数２１】

【００７７】
【数２２】

【００７８】
【数２３】

【００７９】
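Using the multi-color extraction of Equations 16 to 19 and the multi-color selection of Equations 20 to 23, multi-color classification of image x can be written simply as Equation 24.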
【００８０】
【数２４】
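The following Python is only a speculative sketch of multi-color classification for one pixel, because the bodies of Equations 16 to 24 are not reproduced in this text. It follows the verbal description: the red/green strength is the difference between the red and green band values, the white/black strength is the pixel's brightness relative to its surroundings, every term is scaled by q instead of dividing the neighborhood sum, and a pixel whose strongest color does not exceed the value f is gray. Taking red + green as brightness, the exact formulas, and the name `classify_pixel` are all assumptions.

```python
def classify_pixel(red, green, i, j, f, q=8):
    """Return 'white', 'black', 'red', 'green' or 'gray' for pixel (i, j).

    red, green: 2-D lists holding the two bands of the image.
    f: hypothetical selection threshold (the "predetermined value f").
    """
    if q == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif q == 8:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)]
    else:
        raise ValueError("this sketch supports q = 4 or q = 8 only")
    rows, cols = len(red), len(red[0])

    def brightness(l, m):
        # Out-of-image neighbors fall back to the center pixel.
        if not (0 <= l < rows and 0 <= m < cols):
            l, m = i, j
        return red[l][m] + green[l][m]

    around = sum(brightness(i + di, j + dj) for di, dj in offsets)
    white = q * brightness(i, j) - around       # brighter than surroundings
    red_green = q * (red[i][j] - green[i][j])   # redder than green
    strengths = {"white": white, "black": -white,
                 "red": red_green, "green": -red_green}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] > f else "gray"
```

A uniformly filled area yields zero strength for both color pairs and is therefore gray, which matches the statement that pixels with little color variation become gray.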

【００８１】
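For the q-neighborhood at position p(i, j, k) of image x, using the binarization of Equation 3, the sum of Equation 15, the window of Equation 5, and the mask of Equation 13, character/figure selection for each band of x can be written simply as Equation 25.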
【００８２】
【数２５】
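A rough Python sketch of character/figure selection on one color plane, composed from the operations named above: binarize, sum over the neighborhood, window, and mask. The body of Equation 25 is not reproduced in this text, so the composition order is an assumption. The parameters `r` (neighborhood radius, so that q = (2r+1)^2 − 1), `c`, and `d` are hypothetical, and out-of-image neighbors are simply counted as 0 here.

```python
def select_characters(plane, r, c, d):
    """Keep a pixel only if the number of set pixels among its
    (2r+1)^2 - 1 neighbors lies within [c, d]; otherwise clear it."""
    rows, cols = len(plane), len(plane[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            count = 0
            for l in range(i - r, i + r + 1):
                for m in range(j - r, j + r + 1):
                    if (l, m) == (i, j):
                        continue  # the center is not part of the neighborhood
                    if 0 <= l < rows and 0 <= m < cols and plane[l][m] > 0:
                        count += 1  # binarize, then sum
            keep = c <= count <= d  # window
            row.append(plane[i][j] if keep else 0)  # mask
        out.append(row)
    return out
```

With a suitable upper limit, thin strokes have few set neighbors and survive, while the interior of a filled area has many and is cleared.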

【００８３】
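For the q-neighborhood at position p(i, j, k) of binary image x, using the Laplacian of Equation 9, the binarization of Equation 3, the sum of Equation 15, the window of Equation 5, and the mask of Equation 13, texture removal for each band of the binary image x can be written simply as Equation 26.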
【００８４】
【数２６】
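Texture removal can be sketched in Python along the same lines. The body of Equation 26 is not reproduced in this text, so this is an assumed composition: edge information is the binarized 8-neighbor Laplacian of the plane, the edge pixels within a radius-`r` neighborhood are counted, and a pixel is cleared when that count lies within [c, d]. The names `edge_map` and `remove_texture`, the parameters `r`, `c`, `d`, and the zero border rule are hypothetical.

```python
def _get(img, i, j):
    """Pixel value, or 0 for a position outside the image."""
    if 0 <= i < len(img) and 0 <= j < len(img[0]):
        return img[i][j]
    return 0


def edge_map(plane):
    """Binarized 8-neighbor Laplacian: 1 where (neighbor sum - 8*center) > 0."""
    out = []
    for i in range(len(plane)):
        row = []
        for j in range(len(plane[0])):
            total = sum(_get(plane, i + di, j + dj)
                        for di in (-1, 0, 1) for dj in (-1, 0, 1))
            total -= plane[i][j]  # exclude the center itself
            row.append(1 if total - 8 * plane[i][j] > 0 else 0)
        out.append(row)
    return out


def remove_texture(plane, r, c, d):
    """Clear pixels whose neighborhood edge count lies within [c, d]."""
    edges = edge_map(plane)
    out = []
    for i in range(len(plane)):
        row = []
        for j in range(len(plane[0])):
            count = sum(_get(edges, i + di, j + dj)
                        for di in range(-r, r + 1)
                        for dj in range(-r, r + 1))
            count -= edges[i][j]  # exclude the center itself
            row.append(0 if c <= count <= d else plane[i][j])
        out.append(row)
    return out
```

Useful values of `r`, `c`, and `d` depend on the image and on the stroke widths to be preserved; the small values in any test of this sketch are not tuned.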

【００８５】
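Given two images x and y, where y is a binary image with an arbitrary number of bands, using the maximum selection of Equation 6 and the mask of Equation 14, refinement of the binary image y can be written simply as Equation 27.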
【００８６】
【数２７】
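A minimal Python sketch of refinement, assuming it composes the two named operations directly: the band maximum of the binary image y gives a single-band mask, and the mask of Equation 14 then copies the original pixel of x where the mask is 1 and writes a gray value g elsewhere. The body of Equation 27 is not reproduced in this text; the name `refine`, the data layout, and the default `g = 128` are assumptions.

```python
def refine(digital, planes, g=128):
    """Restore original colors at character/figure pixels.

    digital: 2-D list of pixels, each a tuple of band values (image x).
    planes:  list of binary 2-D lists, one per color (image y).
    g:       gray value written to every band of non-character pixels.
    """
    out = []
    for i in range(len(digital)):
        row = []
        for j in range(len(digital[0])):
            is_char = max(plane[i][j] for plane in planes)  # band maximum
            pixel = digital[i][j]
            row.append(tuple(pixel) if is_char else (g,) * len(pixel))
        out.append(row)
    return out
```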

【００８７】
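Equations 1 to 27 thus suffice to describe the algorithms of all array operation units 100 of the data processing device 110 that implements the multi-color classification means 11, the character/figure extraction means 21, and the fine character/figure extraction means 22 shown in FIGS. 4, 5, and 6. Below, these three means are described in turn using the algorithm of an arbitrary array operation unit 100 in the data processing device 110.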
【００８８】
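As shown in FIG. 4, the array operation units 100 arranged in a lattice operate synchronously and in parallel so that the multi-color classification means 11, realized by the data processing device 110, generates the multi-color image 112 from the digital image 111. Let AOU_ij denote the array operation unit 100 at row i, column j of the lattice; the algorithm of AOU_ij for the multi-color classification means 11 is then as shown in FIG. 7.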
【００８９】
In step 1101, AOU_ij is placed at row i, column j of the lattice. Whether logical or physical, this placement is needed to determine the neighbors of AOU_ij.
【００９０】
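In step 1102, the neighbors of AOU_ij and the initial values of its variables are set. The neighborhood size q used by each of the functions above may be chosen individually or unified across all of them. To improve the accuracy of the multi-color image 112 generated by the data processing device 110 of the present invention, it is desirable to set every neighborhood size q to a large value. However, depending on constraints on computation time, the size of the input digital image 111, and so on, the multi-color classification means 11 can cope by changing the neighborhood size as needed.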
【００９１】
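In step 1103, it is determined whether the digital image 111 has ended. If there is no digital image 111 (step 1103: YES), the algorithm terminates. If there is a digital image 111 (step 1103: NO), the algorithm proceeds to step 1104. When the array operation unit 100 is implemented for a specific number of bands and image size, however, an infinite loop may be used.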
【００９２】
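In step 1104, the pixel at row i, column j of the digital image 111 is input for all bands, because AOU_ij processes the bands of that pixel together. AOU_ij therefore needs a memory 102 that stores image data for at least the number of bands.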
【００９３】
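In step 1105, AOU_ij communicates with the neighboring array operation units 100 and smooths each band pixel value of the input digital image 111 according to the function S_ijk(x). The smoothed band pixel values are treated as band pixel values of the smoothed image. The function S_ijk(x) may be repeated several times as needed; for a typical multi-band image, two repetitions are sufficient. After smoothing, the smoothed image may be log-transformed as needed.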
【００９４】
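In step 1106, AOU_ij communicates with the neighboring array operation units 100 and sharpens each band pixel value of the smoothed image according to the function E_ijk(x). The sharpened band pixel values are treated as band pixel values of the sharpened image.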
【００９５】
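In step 1107, each band pixel value of the sharpened image is classified into multiple colors according to the function M^q_ijk(x). The classified band pixel values are treated as band pixel values of the color-classified image.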
【００９６】
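In step 1108, AOU_ij communicates with the neighboring array operation units 100 and removes isolated points and isolated holes from the band pixel values of the color-classified image according to the function A_ijk(x). The resulting band pixel values are treated as band pixel values of the multi-color image 112.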
【００９７】
In step 1109, the band pixel values of the multi-color image 112 are output. The algorithm then returns to step 1103.
【００９８】
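In this way, using the data processing device 110 composed of array operation units 100, the multi-color classification means 11 can generate the multi-color image 112 from the digital image 111.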
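The control flow of steps 1103 to 1109 can be sketched as a Python generator. This is a sequential, whole-image stand-in for what the text describes as many array operation units running in parallel, and the four stage functions are passed in as parameters rather than defined here; the name `run_multicolor` and the parameter names are assumptions.

```python
def run_multicolor(frames, smooth, sharpen, classify, clean, repeats=2):
    """Yield one multi-color image per input frame.

    frames:   iterable of digital images; the loop ends when it is exhausted
              (step 1103).
    smooth, sharpen, classify, clean: callables standing in for
              S, E, M and A of steps 1105 to 1108.
    repeats:  number of smoothing passes (two is said to suffice).
    """
    for frame in frames:              # steps 1103 and 1104: next input image
        image = frame
        for _ in range(repeats):      # step 1105: repeated smoothing
            image = smooth(image)
        image = sharpen(image)        # step 1106
        image = classify(image)       # step 1107
        yield clean(image)            # steps 1108 and 1109: clean up, output
```

The character/figure extraction and refinement variants would append their extra stages before the output in the same way.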
【００９９】
図５に示すように、データ処理装置１１０によって実現される文字／図形抽出手段２１がデジタル画像１１１から文字／図形画像１１３を生成するために、格子状に配列された配列演算ユニット１００は同期して並列に動作する。格子上ｉ行ｊ列に配置された配列演算ユニット１００をＡＯＵ_ｉｊとすると、文字／図形抽出手段２１に対するＡＯＵ_ｉｊのアルゴリズムは図８のようになる。
【０１００】
ステップ２１０１で、ＡＯＵ_ｉｊを格子上のｉ行ｊ列に配置する。これは論理的であれ物理的であれ、ＡＯＵ_ｉｊの近傍を決定するために必要である。
【０１０１】
ステップ２１０２で、ＡＯＵ_ｉｊの近傍や変数の初期値を設定する。近傍の設定においては、前記各関数で使う近傍サイズｑを個別に決めても良いし、全部を統一しても良い。本発明のデータ処理装置１１０が生成した文字／図形画像１１３の正確さを上げるためには近傍サイズｑを全て大きな値に設定することが望ましい。しかしながら計算時間の制約や、入力されるデジタル画像１１１のサイズなどにより、文字／図形抽出手段２１は必要に応じて適宜近傍サイズを変えることで対処することができる。
【０１０２】
ステップ２１０３で、デジタル画像１１１が終了したかどうか判断する。もしデジタル画像１１１が無ければ（ステップ２１０３：ＹＥＳ）、アルゴリズムを終了する。もしデジタル画像１１１があれば（ステップ２１０３：ＮＯ）、ステップ２１０４に移行する。ただし特定の帯域数と画像サイズに対して配列演算ユニット１００を実装する場合には、無限ループにしても良い。
【０１０３】
ステップ２１０４で、デジタル画像１１１のｉ行ｊ列の画素を帯域数分入力する。これは、ＡＯＵ_ｉｊがデジタル画像１１１のｉ行ｊ列の画素を一括して処理するためである。このためＡＯＵ_ｉｊは少なくとも帯域数分の画像データを記憶するメモリ１０２を必要とする。
【０１０４】
ステップ２１０５で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、入力したデジタル画像１１１の各帯域画素値に対して関数Ｓ_ｉｊｋ（ｘ）に従い平滑化を行う。平滑化された帯域画素値は平滑画像の帯域画素値として扱われる。ここで関数Ｓ_ｉｊｋ（ｘ）は必要に応じて数回繰り返しても良い。一般的な多帯域画像の場合、この回数は２回で十分である。なお、この平滑化の後、必要に応じて、この平滑画像を対数変換しても良い。
【０１０５】
ステップ２１０６で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、平滑画像の各帯域画素値に対して関数Ｅ_ｉｊｋ（ｘ）に従い鮮鋭化を行う。鮮鋭化された帯域画素値は鮮鋭画像の帯域画素値として扱われる。
【０１０６】
ステップ２１０７で、鮮鋭画像の各帯域画素値に対して関数Ｍ^ｑ _ｉｊｋ（ｘ）に従い多色に分類する。分類された帯域画素値は色分類画像の帯域画素値として扱われる。
【０１０７】
ステップ２１０８で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、色分類画像の帯域画素値に対して関数Ａ_ｉｊｋ（ｘ）に従い孤立点および孤立孔を除去する。孤立点および孤立孔を除去された帯域画素値は多色画像１１２の帯域画素値として扱われる。
【０１０８】
ステップ２１０９で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、多色画像の各帯域画素値に対して関数Ｕ^ｑ _ｉｊｋ（ｘ）に従い文字／図形を選択する。選択された帯域画素値は粗文字／図形画像の帯域画素値として扱われる。なお、必要に応じて、この関数を数回繰り返しても良い。
【０１０９】
ステップ２１１０で、粗文字／図形画像の各帯域画素値に対して関数Ｔ^ｑ _ｉｊｋ（ｘ）に従いテクスチャを除去する。テクスチャを除去された帯域画素値は文字／図形画像１１３の帯域画素値として扱われる。なお、必要に応じて、この関数を数回繰り返しても良い。
【０１１０】
ステップ２１１１で、文字／図形画像１１３の帯域画素値を出力する。その後ステップ２１０３に戻る。
【０１１１】
これにより、配列演算ユニット１００から構成されるデータ処理装置１１０を用いて、文字／図形抽出手段２１はデジタル画像１１１から文字／図形画像１１３を生成することができる。
【０１１２】
図６に示すように、データ処理装置１１０によって実現される精細文字／図形抽出手段２２がデジタル画像１１１から精細文字／図形画像１１４を生成するために、格子状に配列された配列演算ユニット１００は同期して並列に動作する。格子上ｉ行ｊ列に配置された配列演算ユニット１００をＡＯＵ_ｉｊとすると、精細文字／図形抽出手段２２に対するＡＯＵ_ｉｊのアルゴリズムは図９のようになる。
【０１１３】
ステップ２２０１で、ＡＯＵ_ｉｊを格子上のｉ行ｊ列に配置する。これは論理的であれ物理的であれ、ＡＯＵ_ｉｊの近傍を決定するために必要である。
【０１１４】
ステップ２２０２で、ＡＯＵ_ｉｊの近傍や変数の初期値を設定する。近傍の設定においては、前記各関数で使う近傍サイズｑを個別に決めても良いし、全部を統一しても良い。本発明のデータ処理装置１１０が生成した精細文字／図形画像１１４の正確さを上げるためには近傍サイズｑを全て大きな値に設定することが望ましい。しかしながら計算時間の制約や、入力されるデジタル画像１１１のサイズなどにより、精細文字／図形抽出手段２２は必要に応じて適宜近傍サイズを変えることで対処することができる。
【０１１５】
ステップ２２０３で、デジタル画像１１１が終了したかどうか判断する。もしデジタル画像１１１が無ければ（ステップ２２０３：ＹＥＳ）、アルゴリズムを終了する。もしデジタル画像１１１があれば（ステップ２２０３：ＮＯ）、ステップ２２０４に移行する。ただし特定の帯域数と画像サイズに対して配列演算ユニット１００を実装する場合には、無限ループにしても良い。
【０１１６】
ステップ２２０４で、デジタル画像１１１のｉ行ｊ列の画素を帯域数分入力する。これは、ＡＯＵ_ｉｊがデジタル画像１１１のｉ行ｊ列の画素を一括して処理するためである。このためＡＯＵ_ｉｊは少なくとも帯域数分の画像データを記憶するメモリ１０２を必要とする。
【０１１７】
ステップ２２０５で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、入力したデジタル画像１１１の各帯域画素値に対して関数Ｓ_ｉｊｋ（ｘ）に従い平滑化を行う。平滑化された帯域画素値は平滑画像の帯域画素値として扱われる。ここで関数Ｓ_ｉｊｋ（ｘ）は必要に応じて数回繰り返しても良い。一般的な多帯域画像の場合、この回数は２回で十分である。なお、この平滑化の後、必要に応じて、この平滑画像を対数変換しても良い。
【０１１８】
ステップ２２０６で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、平滑画像の各帯域画素値に対して関数Ｅ_ｉｊｋ（ｘ）に従い鮮鋭化を行う。鮮鋭化された帯域画素値は鮮鋭画像の帯域画素値として扱われる。
【０１１９】
ステップ２２０７で、鮮鋭画像の各帯域画素値に対して関数Ｍ^ｑ _ｉｊｋ（ｘ）に従い多色に分類する。分類された帯域画素値は色分類画像の帯域画素値として扱われる。
【０１２０】
ステップ２２０８で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、色分類画像の帯域画素値に対して関数Ａ_ｉｊｋ（ｘ）に従い孤立点および孤立孔を除去する。孤立点および孤立孔を除去された帯域画素値は多色画像１１２の帯域画素値として扱われる。
【０１２１】
ステップ２２０９で、ＡＯＵ_ｉｊが近傍の配列演算ユニット１００と通信することにより、多色画像の各帯域画素値に対して関数Ｕ^ｑ _ｉｊｋ（ｘ）に従い文字／図形を選択する。選択された帯域画素値は粗文字／図形画像の帯域画素値として扱われる。なお、必要に応じて、この関数を数回繰り返しても良い。
【０１２２】
ステップ２２１０で、粗文字／図形画像の各帯域画素値に対して関数Ｔ^ｑ _ｉｊｋ（ｘ）に従いテクスチャを除去する。テクスチャを除去された帯域画素値は文字／図形画像１１３の帯域画素値として扱われる。なお、必要に応じて、この関数を数回繰り返しても良い。
【０１２３】
ステップ２２１１で、文字／図形画像１１３の各帯域画素値に対して関数Ｆ_ｉｊｋ（ｘ，ｙ）に従い精細化する。精細化された帯域画素値は精細文字／図形画像１１４の帯域画素値として扱われる。
【０１２４】
ステップ２２１２で、精細文字／図形画像１１４の帯域画素値を出力する。その後ステップ２２０３に戻る。
【０１２５】
これにより、配列演算ユニット１００から構成されるデータ処理装置１１０を用いて、精細文字／図形抽出手段２２はデジタル画像１１１から精細文字／図形画像１１４を生成することができる。
【０１２６】
なお、ここで多色画像１１２及び文字／図形画像１１３は４帯域画像として扱われているが、視覚装置２は出力手段でこれらの画像の各画素を３ビットの帯域画素値に変換しても良い。これにより、これらの画像のデータ量は、入力されたデジタル画像１１１に比べて大幅に圧縮される。しかもこれらの画像の殆んどの画素が灰色となるため、圧縮プログラムを用いれば、これらの画像はさらに圧縮され得る。加えて、ここではデジタル画像１１１の赤帯域と緑帯域を用いた場合について説明したが、勿論青帯域を用いることにより、同様の方法で白色、黒色、赤色、緑色、青色、黄色及び灰色の７色に変換することもできる。この場合も、視覚装置２は出力手段で多色画像１１２及び文字／図形画像１１３の各画素を３ビットの帯域画素値に変換することができる。
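To illustrate why images that are mostly gray compress well, here is a small run-length encoding sketch in Python over a row of 3-bit color codes. Run-length encoding is used only as a simple stand-in; the text does not name a specific compression program at this point, and the function names and the code value 0 for gray are assumptions.

```python
def rle_encode(codes):
    """Collapse consecutive equal codes into (code, run_length) pairs."""
    runs = []
    for value in codes:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (code, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]
```

A row of several hundred gray pixels with one short stroke collapses to three runs, so the cost depends on the number of color changes rather than on the image width.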
【０１２７】
さてここまでは、文字／図形を抽出する視覚装置２について説明してきた。この視覚装置２はデジタル画像１１１の各画素を少なくとも３ビットに圧縮することができ、しかも多くの画素を灰色にすることができるので、この視覚装置２はこのデジタル画像１１１の高速通信に適している。そこで以下では、この通信装置７及び通信ネットワーク８を中心に通信システムの実施形態について説明する。
【０１２８】
まず、図１１に示すように、請求項７記載の発明に対応する通信システムの実施形態は、視覚装置２、カメラ９及び通信装置７から構成される。なお、図１１では、撮影装置としてこのカメラ９を用いているが、このカメラ９の代りに、スキャナを用いることもできる。また、このカメラ９の代りに、マウス、タブレット及びタッチパネルなどの描画装置を用いることもできる。この通信装置７は画像圧縮手段７０１及び画像通信手段７０２を有する。この画像圧縮手段７０１は、このカメラ９が撮影したデジタル画像１１１からこの視覚装置２によって生成された多色画像１１２、文字／図形画像１１３及び精細文字／図形画像１１４を入力し、ＭＰＥＧなどの動画像圧縮技術及びＪＰＥＧなどの静止画像圧縮技術を用いて動画像及びフレーム画像を圧縮し、画像通信手段７０２に出力する。このとき、これらの入力画像の多くの画素が灰色であり、他の色の数も限られているので、これらの色を効率よく圧縮できるＧＩＦ及びＦｌａｓｈのようなアニメーションに適した圧縮形式を組み合せることにより、この画像圧縮手段７０１は高い圧縮率を達成することができる。
【０１２９】
次に、画像通信手段７０２は、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）のような通信ネットワーク８を介して、傍にあるコンピュータシステム８００にＭＰＥＧ画像及びＪＰＥＧ画像などを送信することができる。さらにこの画像通信手段７０２は、ＴＣＰ／ＩＰなどの通信プロトコルを実装することにより、インターネットなどの通信ネットワーク８を介して遠隔地のコンピュータシステム８００にこれらのＭＰＥＧ画像及びＪＰＥＧ画像などを送信することができる。さらにこの画像通信手段７０２としてＤＳ−ＣＤＭＡ及びＭＣ−ＣＤＭＡなどの高帯域無線技術を用いることにより、この通信装置７は、このカメラ９が撮影した動画像を安価に、しかも場所と時間を選ばずに送信することができる。勿論、電子メールを利用することにより、この通信装置７は、このコンピュータシステム８００と通信できるときだけ、これらのＭＰＥＧ画像及びＪＰＥＧ画像などを送信することができる。そこで、請求項８記載の発明に対応する文字／図形探索システムの実施形態では、この通信ネットワーク８に少なくとも１個のコンピュータシステム８００を接続することにより、このコンピュータシステム８００は、この通信ネットワーク８からこれらの圧縮画像を高速に受信することができる。
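[0130]
Here, as shown in FIG. 13, the computer system 800 is assumed to comprise image receiving means 801, image decompression means 802, character/figure region detection means 803, key image generation means 804, and image detection means 805. The image receiving means 801 receives compressed image data from the communication network 8 using the same technology as the image communication means 702. The image decompression means 802 uses moving-image decompression techniques such as MPEG and still-image decompression techniques such as JPEG to decompress the compressed image data into the multicolor image 112, the character/figure image 113, and the fine character/figure image 114 generated by the visual device 2. Because many pixels in these images are gray, the character/figure region detection means 803 can easily determine the position and size of a region containing at least one character/figure by computing horizontal and vertical histograms of the non-gray pixels. Alternatively, by applying the position/size detection algorithm of the visual device 2 to a binary image in which a predetermined color is 1 and the remaining colors are 0, the character/figure region detection means 803 can detect the position and size of this region at high speed. Furthermore, by using a position, size, and inclination detection algorithm (hereinafter abbreviated as the position/size/inclination detection algorithm) in place of the position/size detection algorithm, the inclination of the region can also be detected at high speed. The key image generation means 804 can then generate a key image for the database by cutting the region out of these images using its position and size.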
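The histogram-based region detection can be sketched as follows. This is a minimal Python illustration under assumptions of our own: the image is a list of rows of color codes, gray is encoded as 0, and the function name `bounding_box` is hypothetical. It finds a single box enclosing all non-gray pixels and does not separate multiple regions.

```python
GRAY = 0  # hypothetical code for gray

def bounding_box(image, gray=GRAY):
    """Return (left, top, width, height) of the non-gray pixels, or None.

    The column histogram counts non-gray pixels per x position and the
    row histogram counts them per y position; the first and last
    non-empty bins give the extent of the region.
    """
    if not image or not image[0]:
        return None
    col_hist = [0] * len(image[0])
    row_hist = [0] * len(image)
    for y, row in enumerate(image):
        for x, code in enumerate(row):
            if code != gray:
                col_hist[x] += 1
                row_hist[y] += 1
    xs = [x for x, n in enumerate(col_hist) if n > 0]
    ys = [y for y, n in enumerate(row_hist) if n > 0]
    if not xs:
        return None                          # the image is entirely gray
    return (xs[0], ys[0], xs[-1] - xs[0] + 1, ys[-1] - ys[0] + 1)
```

The returned box can then be used to crop the key image from the decompressed image.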
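[0131]
Meanwhile, as shown in FIG. 14, the database stores many image records, each containing an image ID 501 that is the identification number of a recorded image, an image description 502 that describes the recorded image, and recorded image data 503 that is the data of the recorded image. By using image retrieval software such as imgSeek, relational and object-oriented databases equipped with a database language such as SQL (Structured Query Language), or neural networks such as the perceptron, the image detection means 805 can detect a predetermined character/figure from this key with high accuracy, including color information.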
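[0132]
As shown in FIG. 12, in the embodiment of the character/figure search system corresponding to the invention of claim 9, a communication device 7 such as a mobile phone 211 is equipped with a display 222 and a speaker 224 and can thereby convey information to the photographer of the digital image 111. The computer system 800 is provided with image transmission means similar to the image communication means 702 and transmits to the communication device 7, via the communication network 8, the detection result or position information that the image detection means 805 obtained using a region containing at least one predetermined character/figure; the communication device 7 can then tell the photographer the position of this region. For example, suppose the position information is expressed in the two-dimensional coordinates of the digital image 111. In this case, the position indicated by the position information is shown on the display 222, so the photographer can point the camera 9 at the character/figure. If the photographer then moves forward while keeping this position aligned with the center of the digital image 111, the photographer can reach the place where the character/figure is located. The position information may instead represent a direction relative to the center of the digital image 111, such as up, down, left, or right. In this case, the speaker 224 outputs these directions as audio data via a speech synthesizer, so the photographer can reach the character/figure without looking at the display 222. Such an audio function is especially useful for people who have difficulty seeing the display 222, such as visually impaired and elderly people, and for people who cannot watch the display 222 continuously, such as workers on the job.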
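Converting a two-dimensional position into a relative direction can be sketched as below. This Python example is only an assumption about how such a conversion might be done: the function name, the `tolerance` parameter, and the convention that y grows downward are ours, not the description's.

```python
def direction_from_center(x, y, width, height, tolerance=0.1):
    """Map a pixel position to 'up', 'down', 'left', 'right', or 'center'.

    Assumes image coordinates with the origin at the top-left corner and
    y increasing downward. `tolerance` is the fraction of the image size
    within which the position counts as already centered.
    """
    dx = x - (width - 1) / 2
    dy = y - (height - 1) / 2
    if abs(dx) <= tolerance * width and abs(dy) <= tolerance * height:
        return "center"
    # Compare offsets as fractions of the image size so that a wide
    # image does not favor the horizontal directions.
    if abs(dx) / width >= abs(dy) / height:
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"
```

The returned word could be passed to a speech synthesizer so that the photographer turns the camera until "center" is reported.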
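[0133]
Now suppose that, as shown in FIG. 14, the database stores many image records, each containing an image ID 501 that is the identification number of a recorded image, an image description 502 that describes the recorded image, and recorded image data 503 that is the data of the recorded image, and that, as shown in FIG. 15, each image record further contains at least one of a search result string 504 for the recorded image and a search result image ID 505 that points to another recorded image serving as the search result image. The embodiment of the character/figure search system corresponding to the invention of claim 10 can then use the functions of the database to convey to the photographer of the digital image 111 not the character/figure itself in the captured digital image 111 but the search result string 504 or search result image associated with that character/figure. That is, the computer system 800 transmits the search result string 504 or the search result image to the communication device 7 via the communication network 8, and the communication device 7 presents it to the photographer. For example, if the photographed words are "the highest mountain in the world", the computer system 800 can send the word "Everest" or an image of Everest to the communication device 7. Because the present invention uses the image representing the words as the database key, the database can be searched regardless of the language used, whether Japanese, English, French, or German. Moreover, because the visual device 2 extracts the words from the digital image 111 and further compresses them, the present invention can greatly reduce the communication load on the communication network 8. It is therefore particularly useful for communication networks 8 with limited bandwidth, such as that of the mobile phone 211. In addition, by making the search result string 504 a home page address, the photographer can use an Internet connection function such as i-mode to call up a home page directly from the photographed character/figure. Of course, the photographer can also use the search result string 504 as text input to the home page.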
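The record layout of FIG. 14 and FIG. 15 can be pictured with a small data structure. The following Python sketch is hypothetical: the class name, field names, and `lookup` helper are chosen for illustration, and the image matching that produces `matched_id` is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class ImageRecord:
    image_id: int                           # image ID 501
    description: str                        # image description 502
    data: bytes                             # recorded image data 503
    result_string: Optional[str] = None     # search result string 504
    result_image_id: Optional[int] = None   # search result image ID 505

def lookup(records: Dict[int, ImageRecord],
           matched_id: int) -> Tuple[Optional[str], Optional[bytes]]:
    """Return the search result string and search result image data
    associated with the record whose image matched the key image."""
    record = records[matched_id]
    image = None
    if record.result_image_id is not None:
        image = records[record.result_image_id].data
    return record.result_string, image
```

With a record for the words "the highest mountain in the world" whose result string is "Everest", `lookup` returns that string together with the data of the linked Everest image.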
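[0134]
Although the case where an image of an ordinary character/figure is used as the recorded image data 503 has been described here, the database can of course use as the recorded image data 503 not only an image containing the character/figure but also any of the multicolor image 112, the character/figure image 113, and the fine character/figure image 114 associated with that image. In this case, the database can use the image ID 501 of the image of the character/figure as the search result image ID 505. The database can thereby retrieve the image of the character/figure at high speed.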
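[0135]
According to the embodiments of the inventions of claims 1 and 4, all means can be realized by local processing. An image sensor manufacturer can therefore manufacture an LSI (large-scale integrated circuit) that implements the data processing device 110 composed of a plurality of array operation units 100 arranged in a two-dimensional grid and, by stacking as many of these LSIs as required, realize an image sensor that reduces the colors of the digital image 111 simply and at high speed. By incorporating this image sensor into electronic equipment such as personal computers, game machines, televisions, and video recorders, such equipment can capture a user's gestures while respecting the user's privacy. The image sensor is particularly suited to confirming the safety of elderly people and monitoring their wandering. The present embodiment can also compress each pixel of the digital image 111 to a 3-bit band pixel value and generate an image resembling a color comic. Incorporating the present embodiment into a video recorder therefore allows the recorder to attach an accurate label to each scene of recorded video. When the user of the video recorder sketches the desired scene using a touch panel or the like, the recorder can quickly cue up that scene. Furthermore, by incorporating the present embodiment into communication equipment such as the mobile phone 211, the equipment can compress the digital image 111 efficiently even when the image has many pixels. Because the present embodiment closely resembles the visual function of the human brain, it is also highly useful for elucidating that function. For example, ancient people such as the Cro-Magnons painted murals in caves and our ancestors used primitive hieroglyphs, and many of these were abstract line drawings. The reason is thought to be that the expressive power of their brains and hands was immature, but the present embodiment suggests that they depicted animals and objects faithfully according to the visual function of the brain, that is, "as seen". We ourselves can also easily recognize the characters and scenery in comics, and the present embodiment makes the reason for this easy to understand. The present embodiment is therefore useful for developing user interfaces that users find easy to understand.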
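[0136]
According to the embodiments of the inventions of claims 2 and 5, all means can be realized by local processing. An image sensor manufacturer can therefore manufacture an LSI (large-scale integrated circuit) that implements the data processing device 110 composed of a plurality of array operation units 100 arranged in a two-dimensional grid and, by stacking as many of these LSIs as required, realize an image sensor that extracts the characters/figures in the digital image 111 simply and at high speed. By incorporating this image sensor into electronic equipment such as personal computers, game machines, televisions, and video recorders, such equipment can capture a user's gestures while respecting the user's privacy. The image sensor is particularly suited to confirming the safety of elderly people and monitoring their wandering. The present embodiment can also compress the digital image 111 without impairing the color and shape of the characters/figures and generate an image resembling a color comic. Incorporating the present embodiment into a video recorder therefore allows the recorder to attach an accurate label to each scene of recorded video. When the user of the video recorder sketches the desired scene using a touch panel or the like, the recorder can quickly cue up that scene. Furthermore, by incorporating the present embodiment into communication equipment such as the mobile phone 211, the equipment can compress the digital image 111 efficiently even when the image has many pixels, and the user of the equipment can identify the characters/figures accurately while respecting privacy. The embodiment is particularly suited to a supporter remotely assisting visually impaired and elderly people with tasks such as reading on their behalf and walking.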
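[0137]
According to the embodiments of the inventions of claims 3 and 6, all means can be realized by local processing. An image sensor manufacturer can therefore manufacture an LSI (large-scale integrated circuit) that implements the data processing device 110 composed of a plurality of array operation units 100 arranged in a two-dimensional grid and, by stacking as many of these LSIs as required, realize an image sensor that extracts the characters/figures in the digital image 111 simply and at high speed. By incorporating this image sensor into electronic equipment such as personal computers, game machines, televisions, and video recorders, such equipment can capture a user's gestures while respecting the user's privacy. The image sensor is particularly suited to confirming the safety of elderly people and monitoring their wandering. The present embodiment can also compress the digital image 111 while accurately preserving the color and shape of the characters/figures and generate an image resembling a color comic. Incorporating the present embodiment into a video recorder therefore allows the recorder to attach an accurate label to each scene of recorded video. When the user of the video recorder sketches the desired scene using a touch panel or the like, the recorder can quickly cue up that scene. Furthermore, by incorporating the present embodiment into communication equipment such as the mobile phone 211, the equipment can compress the digital image 111 efficiently even when the image has many pixels, and the user of the equipment can identify the characters/figures precisely while respecting privacy. The embodiment is particularly suited to a supporter remotely assisting visually impaired and elderly people with tasks such as reading on their behalf and walking.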
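[0138]
According to the embodiments of the inventions of claims 7 and 8, the computer system 800 connected via the communication network 8 to the communication system equipped with the visual device 2 can easily select characters/figures from the digital image 111 captured by the camera 9 and can accurately identify their colors. By giving the computer system 800 a function for recognizing these characters/figures, its user can easily grasp the surroundings of a remote photographer holding the camera 9.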
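[0139]
According to the embodiment of the invention of claim 9, the computer system 800 connected via the communication network 8 to the communication system equipped with the visual device 2 can easily select characters/figures from the digital image 111 captured by the camera 9 and can accurately identify their colors. By giving the computer system 800 a function for recognizing these characters/figures, its user can easily grasp the surroundings of a remote photographer holding the camera 9. For example, by giving the computer system 800 a function for recognizing the logo of a hamburger chain, the computer system 800 can determine whether a store of that chain is near the photographer. If there is one, the computer system 800 can guide the photographer to the store entrance by having the photographer point the camera 9 toward the logo. Of course, by giving the computer system 800 a function for recognizing signboards and landmarks along the way, it can also guide the photographer to the store. The photographer can thus easily find the store even in an unfamiliar place. Furthermore, by giving the computer system 800 a function for recognizing restroom signs, it can guide the photographer to the store's restroom using ordinary signboards, flyers, and the like as they are, without building expensive infrastructure such as radio guidance devices. Combining such a communication system with GPS (Global Positioning System) allows it to achieve high accuracy and usefulness, which is especially convenient when visually impaired or elderly people go out. In addition, by giving the computer system 800 a function for recognizing multiple lanes and markings on a road, it can not only detect the positions of these lanes but also warn of the positions of safety zones and of speed limits.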
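[0140]
According to the embodiment of the invention of claim 10, the computer system 800 connected via the communication network 8 to the communication system equipped with the visual device 2 can easily select characters/figures from the digital image 111 captured by the camera 9, can accurately identify their colors, and can moreover provide information corresponding to the characters/figures to a remote photographer holding the camera 9. By giving the computer system 800 a function for recognizing specific keywords and marks, the photographer can find such keywords and marks hidden in the surroundings. For example, if the computer system 800 uses the database to combine date and time, place, and these keywords and marks, the communication system can realize, over a wide area via the communication network 8, a treasure-hunt game in which the keywords and marks are sought in sequence. Combining such a game with the guidance to hamburger chain stores described above allows the communication system to promote the chain's sales activities. Also, by giving the computer system 800 a function for recognizing program names, times, and marks printed in a television program guide, the photographer can not only obtain information on the photographed program but also use the communication device 7 as a remote control for a television or video recorder. Because the text in program guides is generally small, this is especially convenient when visually impaired or elderly people use a television or video recorder. Furthermore, when the photographer uses a communication device 7 such as the mobile phone 211 to photograph a product code printed on a flyer, home page, or the like, the computer system 800 can send a product description, advertisements, and ordering guidance to the mobile phone 211. A mail-order company using the computer system 800 can therefore automate customer orders without using barcodes or the like in its catalog. Moreover, even if the product code is small, the computer system 800 can identify it when the photographer takes a close-up of the catalog, which is especially convenient when visually impaired or elderly people use mail order. Of course, the photographer can also photograph clothes worn by an actress in a television program or accessories found at a friend's house and order them immediately. In addition, by giving the computer system 800 a function for recognizing road signs, it can warn of road closures and guide the photographer's car onto a detour even for temporary roadworks not registered in the car navigation system.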
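[0141]
The present embodiments have been described above, but the present invention is not limited to them. A person skilled in the art can carry out the invention in various forms, and its configuration can naturally be modified as appropriate without departing from its technical idea. Such modifications also fall within the technical scope of the present invention.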
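[Brief Description of the Drawings]
[FIG. 1] A block diagram of a visual device of the present embodiment that includes one data processing device.
[FIG. 2] A block diagram of a visual device of the present embodiment that includes four data processing devices.
[FIG. 3] A block diagram of the array operation units of the present embodiment arranged in a grid.
[FIG. 4] An explanatory diagram of the present embodiment for the case where each pixel of a digital image is classified into five colors.
[FIG. 5] An explanatory diagram of the present embodiment for the case where characters and figures are extracted from a digital image.
[FIG. 6] An explanatory diagram of the present embodiment for the case where the colors of characters and figures extracted from a digital image are represented accurately.
[FIG. 7] A flowchart showing the algorithm of the multicolor classification means of the present embodiment.
[FIG. 8] A flowchart showing the algorithm of the character/figure extraction means of the present embodiment.
[FIG. 9] A flowchart showing the algorithm of the fine character/figure extraction means of the present embodiment.
[FIG. 10] A block diagram of the internal structure of the array operation unit of the present embodiment.
[FIG. 11] A block diagram of the communication system of the present embodiment composed of a camera, a visual device, and a communication device.
[FIG. 12] An explanatory diagram of the mobile phone of the present embodiment equipped with a camera, a display, and a speaker.
[FIG. 13] An explanatory diagram of the computer system of the present embodiment that searches a database using compressed image data.
[FIG. 14] An explanatory diagram of an image record in the database of the present embodiment that stores recorded images.
[FIG. 15] An explanatory diagram of an image record in the database of the present embodiment that stores a search result image ID and a search result string.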
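[Explanation of Symbols]
2 Visual device
7 Communication device
8 Communication network
9 Camera
11 Multicolor classification means
12 Character/figure selection means
13 Texture removal means
14 Refinement means
21 Character/figure extraction means
22 Fine character/figure extraction means
51 Address bus
52 Data bus
100 Array operation unit
101 Processor
102 Memory
103 Controller
110 Data processing device
111 Digital image
112 Multicolor image
112a Multicolor image (white)
112b Multicolor image (black)
112c Multicolor image (red)
112d Multicolor image (green)
113 Character/figure image
113a Character/figure image (white)
113b Character/figure image (black)
113c Character/figure image (red)
113d Character/figure image (green)
114 Fine character/figure image
211 Mobile phone
222 Display
224 Speaker
501 Image ID
502 Image description
503 Recorded image data
504 Search result string
505 Search result image ID
701 Image compression means
702 Image communication means
800 Computer system
801 Image receiving means
802 Image decompression means
803 Character/figure region detection means
804 Key image generation means
805 Image detection means
[0001]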
[Field of the Invention]
The present invention relates to a visual device that extracts a plurality of characters or figures from an image. More specifically, it relates to a device that generates a multicolor image from the colors of a plurality of characters or figures, and of their outlines, in a still image captured by an imaging device such as a video camera or scanner (for example, a binary image, a grayscale image, or an image composed of any band among the three primary color wavelengths, visible light, infrared, ultraviolet, and all other electromagnetic waves), and that extracts characters or figures from the image of each color.
[0002]
[Prior art]
A visual device has previously been developed by one of the present inventors (see, for example, Patent Documents 1, 2, and 3). This visual device embodies human visual functions using digital technology. For example, it can count the microorganisms in an image taken through a microscope, or search for a plurality of objects and perform image processing on them by controlling the pan, tilt, roll, and zoom mechanisms of a moving camera. Most of the image processing performed by the visual device is local processing, and this local processing is executed in parallel by a plurality of array operation units arranged in a two-dimensional grid. The visual device can be realized by a computer system such as a personal computer, by a parallel computer with multiple processors, or by a system LSI that integrates these on a single LSI (large-scale integrated circuit). If a dedicated circuit composed of a plurality of virtual array operation units is used, the visual device can execute this local parallel image processing at high speed (see Patent Documents 2 and 3 and Non-Patent Document 1).
[0003]
Meanwhile, many researchers have developed methods for extracting characters from images (see, for example, Non-Patent Documents 2, 3, 4, 5, and 6). Most of these methods target grayscale images. Even when they can process a color image, they decompose it into three grayscale images and then extract characters from each grayscale image, so they are essentially character extraction methods for grayscale images (see, for example, Non-Patent Document 2). In addition, many of these methods compute Fourier transforms, Hough transforms, and histograms (see, for example, Non-Patent Documents 3, 4, and 6). Consequently, even if the processing area is partitioned, these methods must perform global processing within each area, so their parallelism is inherently low and processing is slow as a result.
[0004]
Local parallel image processing can solve this problem. For example, when the overlap information output by a position and size detection algorithm that refers only to the 8-neighborhood (hereinafter abbreviated as the position/size detection algorithm; see, for example, Patent Document 1) falls within a certain range, that overlap information is a hint to the position of a character. Thus even a local parallel image processing algorithm composed only of functions that refer to eight or fewer pixels can extract characters from a scene image, and it can achieve a relatively high extraction rate. However, this algorithm has the following three problems. First, because it converts a color image only into a black-character image and a white-character image, processing other colors is extremely difficult. The algorithm therefore applies a bias to the red band alone so that red stands out as either black or white. Second, even when the overlap information is within the certain range, it does not necessarily indicate the position of a character. In many cases similar overlap information is also generated inside filled regions, so the algorithm must first remove the overlap information contained in filled regions and then judge characters using the remaining overlap information. Third, because the algorithm uses only processing over the 8-neighborhood or smaller, the amount of computation grows as the scene image becomes larger. In particular, when excluding filled regions, the algorithm must repeat 8-neighborhood processing many times.
[0005]
Examining human vision shows that it makes effective use of the complementary color relationship (see, for example, Non-Patent Document 7). The complementary color relationship is a characteristic of human vision and refers to a pair of colors facing each other on the hue circle. For example, red and green are complementary and do not appear simultaneously in the same pixel. Yellow and blue are likewise complementary. In human vision, however, yellow and blue have a weaker influence than white, black, red, and green. For example, yellow characters on a white background and blue characters on a black background are very hard to see. Yellow can therefore be regarded as close to white and blue as close to black, which is supported not only by psychology but also by the structure of the retina.
[0006]
In addition, white and black themselves represent relative brightness. Even when a pixel has the same brightness, it appears white if its surroundings are dark and black if its surroundings are bright. That is, for humans, white and black are brightness relative to the brightness of the surroundings, namely gray, which belongs to no color. Using such relative brightness is very convenient for humans living in environments where brightness changes constantly, such as under sunlight. In the field of neurophysiology, however, not only has the specific retinal mechanism that determines this gray not yet been elucidated, but many researchers are also unaware that this gray plays an important role in compressing the image data transmitted from the retina to the primary visual cortex via ganglion cells (see, for example, Non-Patent Document 8). Although the retina contains many types of amacrine cells, which form a complex network with bipolar cells and ganglion cells, attention in the field has likewise been focused only on motion detection (see, for example, Non-Patent Documents 9 and 10).
[0007]
In practice, when people write characters or draw figures, they unconsciously use this complementary color relationship to make the characters and figures easy to see. For the first problem, therefore, effective use of the complementary color relationship makes it easy to find the positions of these characters and figures and also allows their colors to be expressed accurately with a small amount of information. Moreover, a filled region has no color change and can therefore be regarded as gray. Removing this gray portion from the scene image thus makes the complementary color relationship effective for the second problem as well.
[0008]
For the third problem, instead of the position/size detection algorithm, a function that directly refers to an arbitrary neighborhood and computes a sum is used. Because characters and figures generally consist of at least one line, including line segments, arcs, and free curves, the neighborhood length need not exceed the maximum width of these lines. The function that computes the sum is therefore also a local function, and the visual device can greatly reduce the amount of computation. In what follows, a character is one composed only of lines of a predetermined line width or less, and a figure is one that includes at least one line or region exceeding the predetermined line width.
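A function that directly sums over a neighborhood can be sketched in a few lines. The Python example below is an assumption-laden illustration: it uses a square window of hypothetical half-width `radius` that includes the center pixel and is clipped at the image border, which may differ from the neighborhood definition used by the array operation units.

```python
def neighborhood_sum(image, i, j, radius):
    """Sum the values in the (2*radius+1) x (2*radius+1) window centered
    on row i, column j, clipping the window at the image border.

    `image` is a list of rows of numbers; the window shape and the
    inclusion of the center pixel are assumptions of this sketch.
    """
    height, width = len(image), len(image[0])
    total = 0
    for y in range(max(0, i - radius), min(height, i + radius + 1)):
        for x in range(max(0, j - radius), min(width, j + radius + 1)):
            total += image[y][x]
    return total
```

Because the window size depends only on `radius` and not on the image size, the cost per pixel stays constant as the scene image grows.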
[0009]
In view of the above, the visual device classifies the input image into at least the five colors white, black, red, green, and gray, and selects characters or figures for each color using a local function that computes a sum. The visual device is thus expected not only to extract these characters and figures but also to represent their colors accurately. In addition, because many of the functions refer to the 8-neighborhood or fewer pixels, the visual device is expected to extract these characters and figures with a small amount of hardware and computation.
[0010]
[Patent Document 1]
International Publication No. WO 00/16259 pamphlet
[Patent Document 2]
International Publication No. WO 01/41448 pamphlet
[Patent Document 3]
International Publication No. WO 02/73538 pamphlet
[Non-Patent Document 1]
Shizuto Fukuda, Yoshiaki Ajioka, Hideharu Amano, "Implementation of position / size detector on FPGA", 10th FPGA/PLD Design Conference, Electronic Design and Solution Fair 2003, January 30, 2003, p. 115-122
[Non-Patent Document 2]
Kaoru Tsuji, Naoki Tanaka, Toyohisa Kaneko, R. M. Haralick, "Character region extraction method from cover image", IEICE Transactions, October 1997, Vol. J80-D-II, No. 10, p. 2696-2704
[Non-Patent Document 3]
Liu Xiaomei, Satoshi Yamamura, Noboru Onishi, Noboru Sugie, "About extraction of character string region in scene", IEICE Transactions, April 1998, Vol. J81-D-II, No. 4, p. 641-650
[Non-Patent Document 4]
Hideaki Goto, Ritsuka Hirayama, Hiroki Aki, "Extracting Character Patterns from Gray and Gray Document Images Using Local Multilevel Threshold Processing", IEICE Transactions, November 1999, Vol. J82-D-II, No. 11, p. 2188-2192
[Non-Patent Document 5]
Daisuke Kin, Hiroki Takahashi, Masayuki Nakajima, "Hangul Character String Region Extraction and Scene Element Recognition from Scene Images", IEICE Technical Report, May 2002, PRMU-2002-18, MI2002-34, p. 65-70
[Non-Patent Document 6]
Kenichi Matsuo, Katsuhiko Ueda, Michio Umeda, "Character string region extraction from scene image by binarization of local target region", IEEJ Transactions, February 2002, Vol. 122-C, No. 2, p. 232-241
[Non-Patent Document 7]
Ikeda Mitsuo, "What does the eye see? - Information processing in the visual system", first edition, Heibonsha, August 22, 1988, p. 183-276
[Non-Patent Document 8]
Edited by Helga Kolb, Eduardo Fernandez, Ralph Nelson, "Webvision", (USA), [online], 2003, John A. Moran Eye Center, University of Utah, [Retrieved 14 April 2003], Internet <URL: http://webvision.med.utah.edu>
[Non-Patent Document 9]
K. K. Boahen, "Retinomorphic Chips that see Quadruple Images", (USA), Proceedings of the Seventh International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems, IEEE, 1999, p. 12-20
[Non-Patent Document 10]
Hitoshi Yamada, Rare Miyashita, Masahiro Otani, Hiroo Yonezu, "Generation of Motion Information Learned from the Inner Retina Function and Its Electronic Circuit", Denso Technical Review, Denso Corporation, 2000, Vol. 5, No. 2, p. 101-106
[0011]
[Problems to be solved by the invention]
The object of the present invention as set forth in the claims is therefore to extract at least one character or figure from an image by using the complementary color relationship, thereby extracting the positions and colors of these characters and figures simultaneously with a small amount of hardware and computation.
[0012]
[Means for Solving the Problems and Effects of the Invention]
The invention of claim 1 is a visual device including at least one data processing device, wherein the data processing device includes at least one processor and at least one set of memory, the memory stores at least one band pixel value that is one element of a vector representing a pixel of an image, and the data processing device comprises multicolor classification means that generates, from an input digital image, a multicolor image represented by at least five colors including gray, by classifying each pixel of the digital image into the at least five colors based on the complementary color relationship, whereby each pixel of the digital image is compressed into a band pixel value of at least 3 bits. In the present invention, the multicolor classification means is realized by a plurality of array operation units arranged in a two-dimensional grid, so the image processing performed by the visual device is local parallel image processing. The multicolor classification means preferably converts the digital image into the multicolor image of at least five colors using: means (or a step) for classifying the color represented by the plurality of band pixel values of each pixel of the digital image into at least four colors comprising white, black, and at least one pair of colors facing each other on the hue circle; means (or a step) for calculating the difference between the white and the black; means (or a step) for calculating the difference between the colors of the at least one pair; and means (or a step) for classifying gray from the white-black difference and the difference of the at least one pair. When the digital image is a color image, it is generally preferable to use red and green as the at least one pair of colors facing each other. Unless otherwise specified, yellow and blue are regarded as white and black, respectively. Since at least 3 bits are required to represent these five colors, the visual device can compress the digital image into an image composed of band pixel values of at least 3 bits. Moreover, because most of the compressed band pixel values are gray, they are easy to compress further. Because the present invention can compress the digital image, various problems relating to compression of the digital image are suitably solved.
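One way to picture the classification into white, black, red, green, and gray is the per-pixel sketch below. It is a simplification under our own assumptions and not the claimed means: brightness is taken as the mean of the red and green bands, the surrounding brightness `surround` is supplied by the caller, and a single hypothetical `threshold` decides when both differences are small enough to count as gray.

```python
def classify_pixel(red, green, surround, threshold=32):
    """Classify one pixel as 'white', 'black', 'red', 'green', or 'gray'.

    red, green : band pixel values of the pixel (0..255 assumed)
    surround   : brightness of the pixel's surroundings (assumed given)
    threshold  : hypothetical limit below which a difference is ignored
    """
    white_black = (red + green) / 2 - surround   # relative brightness
    red_green = red - green                      # complementary pair
    if abs(white_black) < threshold and abs(red_green) < threshold:
        return "gray"                            # no notable difference
    if abs(red_green) > abs(white_black):
        return "red" if red_green > 0 else "green"
    return "white" if white_black > 0 else "black"
```

A pixel in a uniformly filled region has brightness close to its surroundings and little red-green difference, so it falls into the gray class, which is what makes the later removal of filled regions straightforward.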
[0013]
The invention of claim 2 is a visual device including at least one data processing device, wherein the data processing device includes at least one processor and at least one set of memory, the memory stores at least one band pixel value that is one element of a vector representing a pixel of an image, and the data processing device comprises: multicolor classification means that generates, from an input digital image, a multicolor image represented by at least five colors including gray, by classifying each pixel of the digital image into the at least five colors based on the complementary color relationship; character/figure selection means that converts the multicolor image into a coarse character/figure image by selecting, for at least one color of each pixel of the multicolor image, only pixels for which the number of predetermined band pixel values within a predetermined neighborhood is within a predetermined range; and texture removal means that converts the coarse character/figure image into a character/figure image by removing, for at least one color of each pixel of the coarse character/figure image, only pixels for which the number of predetermined band pixel values within a predetermined neighborhood is within a predetermined range, whereby each pixel of the character/figure image is compressed into a band pixel value of at least 3 bits. In the present invention, a character is one composed only of lines of a predetermined line width or less, and a figure is one that includes at least one line or region exceeding the predetermined line width. The present invention thus extracts these characters and/or figures (hereinafter abbreviated as characters/figures) from the digital image to generate the character/figure image. In the present invention, all of the means are realized by a plurality of array operation units arranged in a two-dimensional grid, so the image processing performed by the visual device is local parallel image processing. The multicolor classification means preferably converts the digital image into the multicolor image of at least five colors using: means (or a step) for classifying the color represented by the plurality of band pixel values of each pixel of the digital image into at least four colors comprising white, black, and at least one pair of colors facing each other on the hue circle; means (or a step) for calculating the difference between the white and the black; means (or a step) for calculating the difference between the colors of the at least one pair; and means (or a step) for classifying gray from the white-black difference and the difference of the at least one pair. When the digital image is a color image, red and green are generally used as the at least one pair of colors facing each other. Unless otherwise specified, yellow and blue are regarded as white and black, respectively. Since at least 3 bits are required to represent these five colors, the visual device can compress the digital image into an image composed of band pixel values of at least 3 bits. Moreover, because most of the compressed band pixel values are gray, they are easy to compress further. The character/figure selection means converts the multicolor image into the coarse character/figure image using: means (or a step) for collecting, for at least one color of each pixel of the multicolor image, the predetermined band pixel values within the predetermined neighborhood; and means (or a step) for selecting only pixels for which the number of collected band pixel values is within the predetermined range. If the number of the predetermined band pixel values is within the predetermined range, the band pixel value of that color of the pixel of the multicolor image is regarded as a character/figure and becomes the band pixel value of the coarse character/figure image. If the number is not within the predetermined range, the band pixel value is regarded as gray and becomes the band pixel value of the coarse character/figure image. The texture removal means preferably converts the coarse character/figure image into the character/figure image using: means (or a step) for generating, for at least one color of each pixel of the coarse character/figure image, a band pixel value corresponding to edge information; means (or a step) for collecting the predetermined band pixel values within a predetermined neighborhood; and means (or a step) for removing only pixels for which the number of collected band pixel values is within a predetermined range. If the number of the predetermined band pixel values is not within the predetermined range, the band pixel value of that color of the pixel of the coarse character/figure image is regarded as not being texture and becomes the band pixel value of the character/figure image. If the number of pieces of edge information is within the predetermined range, the band pixel value of the coarse character/figure image is regarded as gray and becomes the band pixel value of the character/figure image. Because the present invention can extract the characters/figures in the digital image while accurately representing their colors, various problems relating to extraction of the characters/figures are suitably solved.
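The selection rule, keeping a pixel only when the count of like-colored pixels in its neighborhood lies within a range, can be sketched for a single color band as follows. The square window, the parameter names `radius`, `low`, and `high`, and the use of 0 to stand for gray are assumptions of this Python example rather than details taken from the claim.

```python
def select_characters(band, radius, low, high):
    """Keep a set pixel only if the number of set pixels in its window
    is within [low, high]; all other pixels become 0 (treated as gray).

    `band` is a list of rows of 0/1 values for one color of the
    multicolor image. The window is clipped at the image border.
    """
    height, width = len(band), len(band[0])
    out = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            if band[i][j] == 0:
                continue
            count = 0
            for y in range(max(0, i - radius), min(height, i + radius + 1)):
                for x in range(max(0, j - radius), min(width, j + radius + 1)):
                    count += band[y][x]
            if low <= count <= high:
                out[i][j] = 1
    return out
```

With a small window and a low upper bound, thin strokes survive because each stroke pixel sees only a few neighbors, while the interior of a filled region sees a nearly full window and is turned gray.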
[0014]
The invention of claim 3 is the visual device according to claim 2, wherein the data processing device comprises refinement means, and the refinement means uses: means (or a step) for setting, when the plurality of band pixel values constituting each pixel of the character/figure image represent a character/figure, each corresponding band pixel value of the fine character/figure image to the corresponding band pixel value of the digital image; and means (or a step) for setting, when the plurality of band pixel values constituting each pixel of the character/figure image represent gray, each corresponding band pixel value of the fine character/figure image to a band pixel value representing a predetermined gray. In the present invention, a character is one composed only of lines of a predetermined line width or less, and a figure is one that includes at least one line or region exceeding the predetermined line width. The present invention thus extracts these characters and/or figures (hereinafter abbreviated as characters/figures) from the digital image to generate the fine character/figure image. In the present invention, all of the means are realized by a plurality of array operation units arranged in a two-dimensional grid, so the image processing performed by the visual device is local parallel image processing. The refinement means replaces with the predetermined gray those pixels of the digital image that correspond to gray pixels of the character/figure image, that is, pixels not included in a character or figure, and leaves the other pixels as they are. The colors of the characters and figures in the fine character/figure image are thereby accurate. In addition, because many pixels become gray, the band pixel values of these pixels are easy to compress. Because the present invention can extract the characters and figures in the digital image while accurately representing their colors, various problems relating to extraction of the characters and figures are suitably solved.
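The refinement rule amounts to a per-pixel mask. The following Python sketch illustrates it under assumptions of our own: the character/figure image stores one color code per pixel with 0 meaning gray, the digital image stores an RGB tuple per pixel, and mid-gray (128, 128, 128) stands in for the predetermined gray.

```python
def refine(digital, char_image, gray_code=0, gray_value=(128, 128, 128)):
    """Build the fine character/figure image.

    Where the character/figure image marks a pixel as a character or
    figure, copy the original pixel of the digital image; where it marks
    the pixel as gray, substitute the predetermined gray value.
    """
    return [
        [pixel if code != gray_code else gray_value
         for pixel, code in zip(digital_row, code_row)]
        for digital_row, code_row in zip(digital, char_image)
    ]
```

The character pixels therefore keep their original colors exactly, while every other pixel collapses to one value that compresses well.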
[0015]
The invention of claim 4 is a visual device including at least one data processing device composed of a plurality of array operation units arranged in a two-dimensional grid, wherein each of the data processing devices is at least one. Each processor and at least one set of memories, wherein said memory stores at least one band pixel value that is one element of a vector representing one image, and each said array operation unit comprises: Processing at least one of the band pixel values using at least one of the processors, and means for initializing the array operation unit in each of the array operation units; Means for terminating the processing if there is no band pixel value of the digital image; means for inputting at least one band pixel value of the digital image; Means for converting the plurality of band pixel values of the digital image into at least one band pixel value of the smooth image to smooth each pixel of the tall image, and for emphasizing each pixel of the smooth image, Means for converting a plurality of band pixel values of the smoothed image into at least one band pixel value of the enhanced image; and at least one of the enhanced image for classifying each pixel of the enhanced image into at least five colors Means for generating at least one band pixel value of a color classification image from the band pixel value of the plurality of color classification images, and for removing isolated points and holes from each pixel of the color classification image, A visual device comprising: means for converting a band pixel value into at least one band pixel value of a multicolor image; and means for outputting each band pixel value of each of the multicolor images. . The present invention is an implementation form of an algorithm for realizing the function provided by the array operation unit by digital technology. 
When the digital image is a color image, the multicolor image may include at least the five colors white, black, red, green, and gray. The array operation units are arranged in the two-dimensional grid, neighboring array operation units are connected to one another, and the initial values of the parameters of each array operation unit are set. The digital image is then input pixel by pixel as appropriate, and the processing from smoothing to the output of the multicolor image is performed in sequence and repeated until no further digital image is input. According to the present invention, the array operation units can be operated in parallel, so that various problems relating to the extraction of a plurality of the characters and graphics are preferably solved.
[0016]
The invention according to claim 5 is a visual device including at least one data processing device composed of a plurality of array operation units arranged in a two-dimensional grid, wherein each of the data processing devices includes at least one processor and at least one set of memories, the memory stores at least one band pixel value that is one element of a vector representing one image, and each of the array operation units processes at least one of the band pixel values using at least one of the processors. Each of the array operation units comprises: means for initializing the array operation unit; means for terminating the processing if there is no band pixel value of the digital image; means for inputting at least one band pixel value of the digital image; means for converting a plurality of band pixel values of the digital image into at least one band pixel value of a smoothed image in order to smooth each pixel of the digital image; means for converting a plurality of band pixel values of the smoothed image into at least one band pixel value of an enhanced image in order to emphasize each pixel of the smoothed image; means for generating at least one band pixel value of a color classification image from at least one band pixel value of the enhanced image in order to classify each pixel of the enhanced image into at least five colors; means for converting a plurality of band pixel values of the color classification image into at least one band pixel value of a multicolor image in order to remove isolated points and holes from each pixel of the color classification image; means for converting a plurality of band pixel values of the multicolor image into at least one band pixel value of a rough character / graphic image in order to select characters and graphics from the multicolor image; means for converting a plurality of band pixel values of the rough character / graphic image into at least one band pixel value of a character / graphic image in order to remove texture from the rough character / graphic image; and means for outputting each band pixel value of the character / graphic image. In the present invention, a character refers to something composed only of lines having a predetermined line width or less, and a graphic refers to something including at least one line or region exceeding the predetermined line width. Therefore, in the present invention, these characters and / or graphics (hereinafter abbreviated as characters / graphics) in the digital image are extracted to generate the character / graphic image. That is, the present invention is an implementation form of an algorithm for realizing, by digital technology, the functions provided by the array operation unit. When the digital image is a color image, the multicolor image may include at least the five colors white, black, red, green, and gray. The array operation units are arranged in the two-dimensional grid, neighboring array operation units are connected to one another, and the initial values of the parameters of each array operation unit are set. The digital image is then input pixel by pixel as appropriate, and the processing from smoothing to the output of the character / graphic image is performed in sequence and repeated until no further digital image is input. In the present invention, since the array operation units can be operated in parallel, various problems relating to the extraction of a plurality of the characters / graphics are preferably solved.
[0017]
The invention of claim 6 is the visual device according to claim 5, further comprising: means for refining the character / graphic image using the digital image and converting it into at least one band pixel value of a fine character / graphic image; and means for outputting each band pixel value of the fine character / graphic image instead of each band pixel value of the character / graphic image. In the present invention, a character refers to something composed only of lines having a predetermined line width or less, and a graphic refers to something including at least one line or region exceeding the predetermined line width. Therefore, in the present invention, these characters and / or graphics (hereinafter abbreviated as characters / graphics) in the digital image are extracted to generate the fine character / graphic image. That is, the present invention is an implementation form of an algorithm for realizing, by digital technology, the functions provided by the array operation unit. When the digital image is a color image, the multicolor image may include at least the five colors white, black, red, green, and gray. The array operation units are arranged in the two-dimensional grid, neighboring array operation units are connected to one another, and the initial values of the parameters of each array operation unit are set. The digital image is then input pixel by pixel as appropriate, and the processing from smoothing to the output of the fine character / graphic image is performed in sequence and repeated until no further digital image is input. In the present invention, since the array operation units can be operated in parallel, various problems relating to the extraction of a plurality of the characters / graphics are preferably solved.
[0018]
A seventh aspect of the present invention is a communication system comprising: the visual device according to any one of the first to fifth aspects; a photographing device for photographing the digital image or a drawing device for drawing the digital image; and a communication device including image compression means and image communication means, wherein the image compression means generates compressed image data by compressing together the plurality of gray band pixel values of the multicolor image, the character / graphic image, or the fine character / graphic image output from the visual device, and the compressed image data is transmitted to a communication network. The photographing device is preferably a device capable of photographing the digital image, such as a camera or a scanner. The drawing device is preferably a device capable of drawing the digital image, such as a mouse, a tablet, or a touch panel. The communication network is composed of wired technologies such as USB (Universal Serial Bus), Ethernet, ISDN (Integrated Services Digital Network), xDSL (Digital Subscriber Line), ATM (Asynchronous Transfer Mode), and frame relay, and wireless technologies such as infrared, wireless LAN, IMT-2000 (DS-CDMA, MC-CDMA, etc.), and Bluetooth. The image compression means can preferably compress at least one of the multicolor image, the character / graphic image, and the fine character / graphic image using an image compression technology such as MPEG (Moving Picture Experts Group) or JPEG (Joint Photographic Experts Group). In this case, if the image compression means uses a compression format suited to animation, such as GIF (Graphics Interchange Format) or Flash, it can compress these images at a high compression rate.
According to the present invention, since these images output from the visual device can be communicated at high speed, various problems relating to the communication of the characters / graphics are preferably solved.
[0019]
The invention of claim 8 is a character / graphic search system including the communication system according to claim 7, at least one computer system including a database, and a communication network connecting the communication system and the computer system, wherein each of the computer systems includes: image receiving means for receiving the compressed image data from the communication system; image expansion means for expanding the received compressed image data into the multicolor image, the character / graphic image, or the fine character / graphic image; character / graphic area detection means for detecting an area including at least one character / graphic from the multicolor image, the character / graphic image, or the fine character / graphic image; key image generation means for generating a key image for the database; and image detection means for generating a detection result for the area by comparing the key image with recorded image data in the database. The image receiving means preferably receives the compressed image data from the communication network using wired technologies such as USB (Universal Serial Bus), Ethernet, ISDN (Integrated Services Digital Network), xDSL (Digital Subscriber Line), ATM (Asynchronous Transfer Mode), and frame relay, or wireless technologies such as infrared, wireless LAN, IMT-2000 (DS-CDMA, MC-CDMA, etc.), and Bluetooth. The image expansion means can preferably expand the compressed image data into the multicolor image, the character / graphic image, or the fine character / graphic image using an image expansion technique such as MPEG (Moving Picture Experts Group) or JPEG (Joint Photographic Experts Group).
The character / graphic area detection means preferably detects the position and size of an area including at least one character / graphic by calculating horizontal and vertical histograms of the multicolor image, the character / graphic image, or the fine character / graphic image, each of which contains a large amount of gray. Furthermore, by using the position / size detection algorithm of the visual device, the character / graphic area detection means can detect the positions and sizes of many such areas at high speed. The key image generation means cuts the area out of the multicolor image, the character / graphic image, or the fine character / graphic image using the position and the size, and generates the key image. The image detection means preferably detects the data corresponding to the key image from the database using an image search technique incorporated in the database (for example, imgSeek) and an image processing technique such as a neural network. The database is preferably a relational database or an object-oriented database having a database description language such as SQL (Structured Query Language). The database preferably stores the recorded image data relating to the characters / graphics, with an image ID assigned to the recorded image data. The computer system removes the gray band pixel values from the multicolor image, the character / graphic image, or the fine character / graphic image, then cuts out the area of the character / graphic indicated by the remaining band pixel values and compares it with the recorded image data. This eliminates the need for the computer system to compare the recorded image data with portions other than the characters / graphics, such as the background. In general, the detection result for the area is the recorded image data, but the image ID may be used instead of the recorded image data.
The present invention can reduce the amount of data in the database and can detect a predetermined character / graphic at high speed in the computer system, so that various problems relating to the search for the character / graphic are preferably solved.
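The histogram-based area detection described above can be sketched as follows. This is only a minimal illustration in Python, assuming a binary mask in which 1 marks a non-gray (character / graphic) pixel and assuming a single area per image; the function name `detect_region` and the returned tuple layout are hypothetical and do not appear in the specification.

```python
from typing import List, Optional, Tuple


def detect_region(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, width, height) of the non-gray pixels, or None.

    Assumption: mask[i][j] is 1 for a character / graphic pixel, 0 for gray.
    """
    if not mask or not mask[0]:
        return None
    # Horizontal histogram: number of non-gray pixels in each row.
    row_hist = [sum(row) for row in mask]
    # Vertical histogram: number of non-gray pixels in each column.
    col_hist = [sum(col) for col in zip(*mask)]
    rows = [i for i, n in enumerate(row_hist) if n > 0]
    cols = [j for j, n in enumerate(col_hist) if n > 0]
    if not rows or not cols:
        return None  # the image is entirely gray
    top, left = rows[0], cols[0]
    return (left, top, cols[-1] - left + 1, rows[-1] - top + 1)
```

The position and size returned here are what a key image generation step would use to cut the area out of the image. Detecting several separate areas would require splitting the histograms at empty runs, which this sketch does not attempt.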
[0020]
According to a ninth aspect of the present invention, in the character / graphic search system according to the eighth aspect, at least one of the computer systems transmits the detection result or position information of the area to the communication device using image transmission means, and the communication device includes a display or a speaker, so that the communication device outputs the search result or the position information using the display or the speaker. The image transmission means preferably transmits the search result or the position information to the communication network using wired technologies such as USB (Universal Serial Bus), Ethernet, ISDN (Integrated Services Digital Network), xDSL (Digital Subscriber Line), ATM (Asynchronous Transfer Mode), and frame relay, or wireless technologies such as infrared, wireless LAN, IMT-2000 (DS-CDMA, MC-CDMA, etc.), and Bluetooth. The position information represents the two-dimensional coordinates of the character / graphic in the digital image, or a relative direction from the center of the digital image such as up, down, left, or right. The communication device displays the position of the two-dimensional coordinates on the display, or outputs audio data such as "up", "down", "left", or "right" using the speaker. The user of the photographing device can easily find the predetermined character / graphic by changing the direction of the photographing device or of the photographed object in accordance with the position on the display or the instruction in the audio data. According to the present invention, the character / graphic can be searched for even if the communication network has a small communication capacity, so that various problems relating to the search for the character / graphic are preferably solved.
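As an illustration of the position information described above, the following Python sketch converts two-dimensional coordinates into a relative direction from the image center. The function name `relative_direction`, the 0-based coordinates, the downward-increasing y axis, and the rule of choosing the axis with the larger offset are all assumptions made for this sketch.

```python
def relative_direction(x: int, y: int, width: int, height: int) -> str:
    """Map a pixel position to 'up', 'down', 'left', 'right' or 'center'.

    Assumptions: x and y are 0-based, and y increases downward.
    """
    dx = x - (width - 1) / 2.0   # horizontal offset from the image center
    dy = y - (height - 1) / 2.0  # vertical offset from the image center
    if dx == 0 and dy == 0:
        return "center"
    # Report the axis along which the offset is larger.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"
```

A communication device could show the returned word on its display or pass it to a speech output.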
[0021]
A tenth aspect of the present invention is the character / graphic search system according to the eighth aspect, wherein at least one of the computer systems transmits a search result character string or a search result image associated with the recorded image data to the communication device using image transmission means, and the communication device includes a display or a speaker, so that the communication device outputs the search result character string or the search result image using the display or the speaker. The image transmission means preferably transmits the search result image or the search result character string to the communication network using wired technologies such as USB (Universal Serial Bus), Ethernet, ISDN (Integrated Services Digital Network), xDSL (Digital Subscriber Line), ATM (Asynchronous Transfer Mode), and frame relay, or wireless technologies such as infrared, wireless LAN, IMT-2000 (DS-CDMA, MC-CDMA, etc.), and Bluetooth. The search result character string is an arbitrary character string, and the search result image is likewise an arbitrary image. The search result character string and the search result image are recorded in the database in association with the recorded image data. Therefore, when the recorded image data is found by means of the key image, the database can refer to the search result character string or the search result image through the recorded image data. The communication device displays the search result character string or the search result image on the display, or outputs voice data of the search result character string using the speaker. The user of the photographing device can easily find the predetermined character / graphic by changing the direction of the photographing device or of the photographed object as appropriate until the search result character string or the search result image is obtained.
According to the present invention, even if the communication network has a small communication capacity, the character / graphic can be searched, so that various problems relating to the search for the character / graphic are solved preferably.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of a visual device (VISUAL DEVICE) 2 using an array operation unit (ARRAY OPERATION UNIT) 100 of the present invention will be described with reference to the drawings.
[0023]
First, as shown in FIG. 1, the visual device 2 includes at least one data processing device 110. The data processing device 110 includes at least one processor 101 and thereby implements at least one means in accordance with a local parallel image processing algorithm described by local functions. After at least one input image is input, the data processing device 110 executes this local parallel image processing and outputs at least one output image as necessary. If there are a plurality of such means, the visual device 2 can realize them using one data processing device 110 as shown in FIG. 1 or, as shown in FIG., using a plurality of (in this case, four) data processing devices 110. In the following, for simplicity, it is assumed that the visual device 2 uses one data processing device 110.
[0024]
Next, as shown in FIG. 3, the data processing apparatus 110 is composed of a plurality of array operation units 100 arranged in a grid pattern. Each of these array operation units 100 is described as a program of this processor 101. However, when the number of processors 101 is one, the programs of these array operation units 100 are executed in order, so that the data processing apparatus 110 cannot take advantage of the parallelism of local parallel image processing. Therefore, a dedicated circuit of the data processing apparatus 110 that takes advantage of this parallelism will be described below.
[0025]
As shown in FIG. 3, these array operation units 100 are arranged in a grid in the data processing device 110, and each array operation unit 100 is wired so that it can communicate only with the adjacent array operation units 100 in the data processing device 110. That is, only the four neighbors are wired directly. Compared with wiring the eight neighbors, this allows operation at a comparable speed with fewer electronic components and less wiring, and the design can be extended easily should the neighborhood size be enlarged in the future.
[0026]
Further, as shown in FIG. 10, each of these array operation units 100 includes a processor 101 for calculating the mathematical expressions used in image processing, a set of memories 102 for storing all parameters, constants, functions, and operators used in those expressions, and at least one controller 103 for communicating with the neighboring array operation units 100. The processor 101 and the controller 103 are synchronous circuits that receive a clock signal. The processor 101 can select any memory element or register of the memory 102 and the controller 103 by the address specified on the address bus 51. The processor 101 is connected to the memory 102 and the controller 103 via the data bus 52 for bidirectional communication, and can access the data in any memory element or register designated by the address bus 51. When the array operation unit 100 receives a previous input data group composed of one or more input pixels, the controller 103 stores the previous input data group in the memory 102. The controller 103 also transmits the calculation data generated by the functions in the memory 102 to the adjacent array operation units 100, stores the calculation data received from the adjacent array operation units 100 in the memory 102, and, if necessary, forwards that data to an array operation unit 100 other than the one from which it was received. Finally, the controller 103 outputs the image data of the output image as result data. Of course, the array operation unit 100 may include a plurality of controllers 103. For detailed circuit diagrams of the array operation unit 100 and of the virtual array operation unit, which is an improvement on the array operation unit 100, see Patent Document 3.
[0027]
Up to this point, the data processing device 110 constituting the visual device 2 has been described. The data processing apparatus 110 includes at least one processor 101, thereby realizing at least one means according to local parallel image processing. Therefore, hereinafter, these means used in the present invention will be described.
[0028]
As shown in FIG. 4, the embodiment of the visual device 2 corresponding to the first aspect of the present invention consists of multicolor classification means 11 implemented on the data processing device 110. The multicolor classification means 11 receives the digital image 111 and generates a multicolor image 112 containing at least five colors. In the following, for simplicity, the digital image 111 is assumed to be an ordinary color image, and the multicolor classification means 11 is assumed to classify each pixel of the digital image 111 into five colors (white, black, red, green, and gray) and to generate a binary image for each color. However, since gray means any color other than the remaining four, binary images for the four colors white, black, red, and green are actually sufficient. The multicolor image 112 therefore consists of a multicolor image (white) 112a, a multicolor image (black) 112b, a multicolor image (red) 112c, and a multicolor image (green) 112d, corresponding to the binary images of white, black, red, and green, respectively. When the digital image 111 is classified into the five colors white, black, red, green, and gray, yellow is generally classified as white and blue as black, whereas reddish yellow and greenish blue are classified as red and green, respectively.
[0029]
The multicolor classification means 11 classifies each pixel of the digital image 111 into the five colors white, black, red, green, and gray on the basis of complementary color relationships, and generates the multicolor image 112. A complementary color relationship is a characteristic of human vision: it is a pair of colors that face each other on the hue circle. For example, red and green are complementary; they are never seen at the same time in one pixel, and their contrast is enhanced. Yellow and blue (actually purple) are also complementary. White and black, in turn, represent relative brightness: even for pixels of the same brightness, a pixel looks white if its surroundings are dark and black if its surroundings are bright. That is, for humans, white and black are brightness measured relative to the brightness of the surroundings, namely gray, which belongs to no color. Consequently, white and black are likewise never seen at the same time in one pixel. Therefore, in each pixel of the digital image 111, the difference between the pixel value of the red band and the pixel value of the green band is taken as the intensity of red and green, and the brightness of the pixel relative to the brightness of the surrounding pixels is taken as the intensity of white and black. What is important here is that if neither pair shows a difference in intensity, the pixel is gray, that is, a color other than the four. As a result, every pixel of the digital image 111 where the color changes only slightly becomes gray, while pixels where a plurality of colors are intricately mixed, as in characters and figures, take colors other than gray. When a person actually writes characters and figures, he or she unconsciously selects a combination of colors that emphasizes contrast in order to make them easy to see.
Therefore, by using the complementary color relationships, the multicolor classification means 11 can extract these characters and figures and their outlines from the digital image 111 without impairing their colors, and can paint the removed background gray. The resulting multicolor image 112 closely resembles a drawing in the style of a manga.
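The classification rule described above can be illustrated with the following minimal Python sketch. It is not the means 11 itself, which also involves smoothing, enhancement, and removal of isolated points and holes. The function name `classify_five_colors`, the threshold value, the use of the band average as brightness, the 8-neighborhood, and the rule of choosing the pair with the larger intensity are all assumptions introduced for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def classify_five_colors(img: List[List[Pixel]],
                         threshold: int = 32) -> List[List[str]]:
    """Label each pixel 'white', 'black', 'red', 'green' or 'gray'."""
    h, w = len(img), len(img[0])
    # Brightness as the plain band average (an assumption of this sketch).
    lum = [[sum(p) / 3.0 for p in row] for row in img]
    out: List[List[str]] = []
    for i in range(h):
        row: List[str] = []
        for j in range(w):
            # Brightness of the surrounding pixels (8-neighborhood,
            # clipped at the image border).
            around = [lum[a][b]
                      for a in range(max(0, i - 1), min(h, i + 2))
                      for b in range(max(0, j - 1), min(w, j + 2))
                      if (a, b) != (i, j)]
            rel = (lum[i][j] - sum(around) / len(around)) if around else 0.0
            r, g, _ = img[i][j]
            rg = r - g  # red/green intensity: red band minus green band
            if abs(rg) < threshold and abs(rel) < threshold:
                row.append("gray")  # neither pair shows a marked difference
            elif abs(rg) >= abs(rel):
                row.append("red" if rg > 0 else "green")
            else:
                row.append("white" if rel > 0 else "black")
        out.append(row)
    return out
```

With this rule, a uniform background is labeled gray everywhere, while a single saturated red pixel on a mid-gray background is labeled red. Four binary images, one per color, could be derived from the labels by comparing each label with the color name.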
[0030]
Here, four binary images are used as the multicolor image 112, but each pixel of the multicolor image 112 may of course be represented by 3 bits. Further, the multi-color classification unit 11 may classify into seven colors of white, black, red, green, blue, yellow and gray based on the complementary color relationship. Of course, also in this case, each pixel of the multicolor image 112 can be represented by 3 bits. Furthermore, the digital image 111 may include at least one band other than visible light such as infrared rays and ultraviolet rays. In this case, these bands can be assigned to any of red, green, blue and yellow. Of course, you may assign to a new color other than these.
[0031]
By the way, when a character written with a predetermined line width or less is present in the digital image 111, almost the same character appears in the multicolor image 112 generated from the digital image 111. In addition, when a graphic including a line and a region exceeding the line width is present in the digital image 111, the same graphic and its outline appear in the multicolor image 112 as well. Therefore, hereinafter, the visual device 2 that extracts these characters and / or figures (hereinafter simply abbreviated as characters / graphics) from the multicolor image 112 will be described.
[0032]
As shown in FIG. 5, the embodiment of the visual device 2 corresponding to the invention described in claim 2 is realized by character / graphic extraction means 21 implemented on the data processing device 110, and this character / graphic extraction means 21 consists of the multicolor classification means 11, character / graphic selection means 12, and texture removal means 13. In FIG. 5, the multicolor classification means 11 classifies the digital image 111 into the five colors white, black, red, green, and gray. The character / graphic image 113 output by the visual device 2 is therefore assumed to consist of a character / graphic image (white) 113a, a character / graphic image (black) 113b, a character / graphic image (red) 113c, and a character / graphic image (green) 113d. The character / graphic selection means 12 receives the multicolor image 112 generated by the multicolor classification means 11, obtains, for each band pixel value of the multicolor image 112 and for each color, the sum of the band pixel values in a predetermined neighborhood, and determines whether this sum is within a predetermined range. If the sum is within this range, the character / graphic selection means 12 determines that the band pixel value of the multicolor image 112 belongs to a character / graphic, and sets the band pixel value of the corresponding color in the rough character / graphic image. Otherwise, this band pixel value is set to gray. As a result, the character / graphic selection means 12 can select, for each color and regardless of the background, the pixels of the multicolor image 112 that belong to characters / graphics.
Further, the texture removal means 13 receives the rough character / graphic image, generates edge information for each color, obtains for this edge information the sum of the band pixel values in a predetermined neighborhood for each color, and determines whether this sum is within a predetermined range. If the sum is not within this range, the texture removal means 13 determines that the band pixel value of the rough character / graphic image does not belong to a texture, and sets the band pixel value of the corresponding color in the character / graphic image 113. Otherwise, this band pixel value is set to gray. As a result, the texture removal means 13 can remove, for each color and regardless of the size or shape of the character / graphic, the pixels of the rough character / graphic image that belong to texture.
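The neighborhood-sum test used by the character / graphic selection means 12 can be sketched for a single color as follows. This is a minimal Python illustration; the function name `select_by_neighborhood_sum`, the square neighborhood of the given radius, the inclusion of the center pixel in the sum, and the concrete range are assumptions, since the specification only speaks of a predetermined neighborhood and a predetermined range.

```python
from typing import List


def select_by_neighborhood_sum(band: List[List[int]], radius: int,
                               lo: int, hi: int) -> List[List[int]]:
    """Keep a set pixel only if the neighborhood sum lies within [lo, hi].

    band[i][j] is 1 where the pixel has this color and 0 where it is gray.
    """
    h, w = len(band), len(band[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if band[i][j] == 0:
                continue  # not of this color: stays gray (0)
            # Sum of band pixel values in the square neighborhood,
            # clipped at the image border (center pixel included).
            total = sum(band[a][b]
                        for a in range(max(0, i - radius), min(h, i + radius + 1))
                        for b in range(max(0, j - radius), min(w, j + radius + 1)))
            if lo <= total <= hi:
                out[i][j] = 1
    return out
```

With a radius of 1 and a range of 1 to 4, a one-pixel-wide line is kept, whereas the interior of a solid block, whose neighborhood sum is 9, becomes gray. The texture removal means 13 applies a comparable test to edge information with the opposite decision: a pixel is kept when the sum falls outside the range.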
[0033]
In this example, four binary images are used as the character / graphic image 113. Of course, each pixel of the character / graphic image 113 may be represented by 3 bits. Further, the multi-color classification unit 11 may classify into seven colors of white, black, red, green, blue, yellow and gray based on the complementary color relationship. Of course, also in this case, each pixel of the character / graphic image 113 can be represented by 3 bits. Furthermore, the digital image 111 may include at least one band other than visible light such as infrared rays and ultraviolet rays. In this case, these bands can be assigned to any of red, green, blue and yellow. Of course, you may assign to a new color other than these.
[0034]
Now, as shown in FIG. 6, the embodiment of the visual device 2 corresponding to the invention of claim 3 is realized by fine character / graphic extraction means 22, which consists of the character / graphic extraction means 21 described above with refinement means 14 added. Using the character / graphic image 113, the refinement means 14 extracts the band pixel values of the characters / graphics from the digital image 111 and uses them as the corresponding band pixel values of the fine character / graphic image 114, while setting all other pixels to a predetermined gray. As a result, the present invention can accurately represent the colors of the characters / graphics, which could not be represented accurately in the character / graphic image 113. Further, by making the pixels other than the characters / graphics gray, the present invention reduces the amount of information in the fine character / graphic image 114 and allows the fine character / graphic image 114 to be compressed further by a compression program.
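The refinement step can be pictured as a masked copy. The following minimal Python sketch assumes that the character / graphic image 113 is given as a list of per-color binary masks and that the predetermined gray is (128, 128, 128); the function name `refine` and both of these choices are assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)
GRAY: Pixel = (128, 128, 128)  # assumed value of the "predetermined gray"


def refine(digital: List[List[Pixel]],
           masks: List[List[List[int]]],
           gray: Pixel = GRAY) -> List[List[Pixel]]:
    """Copy the original pixel where any color mask is set; gray elsewhere."""
    return [[digital[i][j] if any(m[i][j] for m in masks) else gray
             for j in range(len(digital[0]))]
            for i in range(len(digital))]
```

Because every pixel outside the characters / graphics receives the same value, the result contains long uniform runs, which is why a general-purpose compression program can shrink it further.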
[0035]
Now, the multicolor classification means 11, the character / graphic extraction means 21, and the refinement means 14 used in the visual device 2 can be implemented using a data processing device 110 composed of array operation units 100. Therefore, in the following, an embodiment of the data processing device 110 using the array operation units 100 is presented, and the visual device 2 is described with reference to the drawings.
[0036]
First, the array operation unit 100 generates one pixel of the output image from one pixel of the input image and its neighboring pixels. Therefore, as shown in FIG. 3, a data processing device 110 in which the array operation units 100 are arranged in a grid matching the size of the input image can generate the output image from the input image. In FIG. 3, the array operation unit 100 is abbreviated as AOU. Although the array operation units 100 in FIG. 3 are arranged in a square lattice, they may instead be arranged in a hexagonal lattice, that is, in a close-packed structure, in order to minimize the mounting area. In that case, some of the signal lines between the array operation units 100 are wired in a zigzag. The array operation unit 100 may be implemented by dedicated hardware or by software on a general-purpose computer. That is, as long as the output image can be generated from the input image, the means of implementation is not limited. The image processing of the data processing device 110 can therefore be described by giving the algorithm of the array operation unit 100. To present this algorithm, the mathematical expressions used in the multicolor classification means 11, the character / graphic extraction means 21, the texture removal means 13, and the refinement means 14 shown in FIGS. are described below.
[0037]
Let x and y be arbitrary 2ⁿ-gradation images of width w, height h, and number of bands b. Using the band pixel values x_ijk and y_ijk at each position p(i, j, k), the images x and y are expressed as in Equations 1 and 2. An underlined character indicates a vector. Here n is a non-negative integer, and w, h, b, i, j, and k are natural numbers.
[0038]
[Expression 1]

[0039]
[Expression 2]

[0040]
First, a function related to point processing for each band pixel value of the image will be described below.
[0041]
When the image x is converted into a binary image, each band pixel value is binarized according to Equation 3.
[0042]
[Equation 3]

[0043]
When the image x is saturated at the lower limit a and the upper limit b, each band pixel value is saturated according to Equation 4.
[0044]
[Expression 4]

[0045]
When a window corresponding to the lower limit c and the upper limit d is generated for the image x, each band pixel value is generated according to Equation 5.
[0046]
[Equation 5]
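As an illustration only, the point operations of Equations 3 to 5 might be sketched in Python as follows. The equation bodies are not reproduced in this text, so the binarization threshold of 0 and the window output of 1 inside the closed interval [c, d] are assumptions, as are the function names and the nested-list layout x[i][j][k] with 0-based indices.

```python
def _map_pixels(x, f):
    """Apply f to every band pixel value of an image stored as x[i][j][k]."""
    return [[[f(v) for v in pixel] for pixel in row] for row in x]


def binarize(x, threshold=0):
    # Assumption: values above the threshold become 1, all others 0.
    return _map_pixels(x, lambda v: 1 if v > threshold else 0)


def saturate(x, a, b):
    # Clamp each band pixel value to the lower limit a and the upper limit b.
    return _map_pixels(x, lambda v: min(max(v, a), b))


def window(x, c, d):
    # Assumption: values inside the closed interval [c, d] become 1, others 0.
    return _map_pixels(x, lambda v: 1 if c <= v <= d else 0)
```

Each function touches one band pixel value at a time, so an array operation unit could evaluate it without communicating with its neighbors.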

[0047]
When the image x is converted into a band maximum value image, the maximum value is selected from the values of the bands of the pixel at row i, column j according to Equation 6. Since the band maximum value image is a single-band image, it is handled for convenience as an image whose number of bands is 1. Therefore, the third subscript is 1.
[0048]
[Formula 6]
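The band maximum selection described above can be illustrated with a short Python sketch. The function name band_max and the nested-list layout x[i][j][k] are assumptions made for this example; the result keeps a third index of length 1 to mirror the single-band convention.

```python
def band_max(x):
    """Return a single-band image whose pixel at (i, j) is the maximum over all bands."""
    return [[[max(pixel)] for pixel in row] for row in x]
```

For example, band_max([[[1, 5, 3], [9, 2, 2]]]) yields [[[5], [9]]].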

[0049]
The set P_ijk(q) of positions in the q-neighborhood of the position p(i, j, k) of an image is expressed by Equation 7. Here q follows the sequence 4, 8, 24, 48, 80, 120, ..., (2r + 1)² − 1, where r is a natural number. If a position that falls outside the image is included in the set P_ijk(q), the position p(i, j, k) is substituted for it unless otherwise specified. Where so specified, an imaginary position that is not included in the image and whose pixel value is treated as 0 is substituted instead. Edge processing is thereby performed automatically. Therefore, the number of elements in the set P_ijk(q) is always q.
[0050]
[Expression 7]
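The neighborhood rule described above, including the substitution at the image border, might look like the following Python sketch. It is not the text of Equation 7, which is not reproduced here. The function name, the 0-based indices, the omission of the band index k (the neighborhood is the same for every band), and the use of None to stand for the imaginary zero-valued position are all assumptions of this example.

```python
from math import isqrt


def neighborhood(i, j, h, w, q, zero_outside=False):
    """Return the q neighbor positions of (i, j) in an h-by-w image (0-based)."""
    if q == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        side = isqrt(q + 1)
        if side * side != q + 1 or side % 2 == 0 or side < 3:
            raise ValueError("q must be 4 or (2r + 1)^2 - 1 with r >= 1")
        r = (side - 1) // 2
        offsets = [(di, dj) for di in range(-r, r + 1)
                   for dj in range(-r, r + 1) if (di, dj) != (0, 0)]
    positions = []
    for di, dj in offsets:
        l, m = i + di, j + dj
        if 0 <= l < h and 0 <= m < w:
            positions.append((l, m))
        elif zero_outside:
            positions.append(None)  # imaginary position whose value is 0
        else:
            positions.append((i, j))  # substitute the center position
    return positions
```

Because every out-of-range position is replaced rather than dropped, the returned list always has exactly q entries, matching the statement that the set always has q elements.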

[0051]
Next, functions and operators relating to neighborhood processing of up to 8 neighbors for each band pixel value of the image are described below.
[0052]
Smoothing at the position p(i, j, k) of the image x is performed according to Equation 8. Here int(v) means truncation of the fractional part of the real number v. If the band pixel values of the image x are integers, then in a hardware implementation the circuit that performs the division can be omitted by using a circuit that executes a right shift instruction twice on the sum of x_lmk when q = 4, or three times on the sum of x_lmk when q = 8.
[0053]
[Equation 8]
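The shift-based smoothing can be sketched in Python as below. Equation 8 itself is not reproduced in this text, so the sketch assumes that the smoothed value is the truncated mean of the q neighbors (center excluded), that pixel values are non-negative integers, and that border positions are replaced by the center position. The names OFFSETS, SHIFT, neighbor_sum and smooth are hypothetical.

```python
OFFSETS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)],
}
SHIFT = {4: 2, 8: 3}  # dividing by q = 4 or q = 8 is a right shift by 2 or 3 bits


def neighbor_sum(x, i, j, k, q):
    """Sum band k over the q neighbors of (i, j); out-of-range neighbors use (i, j)."""
    h, w = len(x), len(x[0])
    total = 0
    for di, dj in OFFSETS[q]:
        l, m = i + di, j + dj
        if not (0 <= l < h and 0 <= m < w):
            l, m = i, j
        total += x[l][m][k]
    return total


def smooth(x, q=8):
    """Truncated neighborhood mean computed with a right shift instead of a division."""
    return [[[neighbor_sum(x, i, j, k, q) >> SHIFT[q]
              for k in range(len(x[i][j]))]
             for j in range(len(x[i]))]
            for i in range(len(x))]
```

For non-negative integers the right shift gives the same result as integer division by q, which is why the divider circuit can be dropped.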

[0054]
The Laplacian is simply a second-order difference operator, as shown in Equation 9. The 8-neighborhood captures subtle changes in noise and yields more zero points and zero crossings, which makes it suitable for the present invention. Because q is 4 or 8, in a hardware implementation the circuit that performs the multiplication can be omitted by using a circuit that executes a left shift instruction twice on x_ijk when q = 4, or three times on x_ijk when q = 8.
[0055]
[Equation 9]
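A possible Python sketch of this second-order difference follows. Since Equation 9 is not reproduced in this text, the form "sum of the q neighbors minus q times the center value" and its sign are assumptions, as are the names OFFSETS, SHIFT and laplacian and the replacement of border positions by the center position.

```python
OFFSETS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)],
}
SHIFT = {4: 2, 8: 3}  # multiplying by q = 4 or q = 8 is a left shift by 2 or 3 bits


def laplacian(x, q=8):
    """Return, per band pixel, the neighbor sum minus q times the center value."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            pixel = []
            for k in range(len(x[i][j])):
                total = 0
                for di, dj in OFFSETS[q]:
                    l, m = i + di, j + dj
                    if not (0 <= l < h and 0 <= m < w):
                        l, m = i, j  # border: substitute the center position
                    total += x[l][m][k]
                pixel.append(total - (x[i][j][k] << SHIFT[q]))
            row.append(pixel)
        out.append(row)
    return out
```

On a constant image the result is zero everywhere, including the border, because substituted positions contribute the center value itself.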

[0056]
When the image x is an arbitrary binary image, isolated points and isolated holes in the image x are removed according to Equation 10. Because the 4-neighborhood cannot, by its nature, detect diagonal lines, the 8-neighborhood should be used wherever possible.
[0057]
[Expression 10]

[0058]
Next, functions and operators related to the neighborhood processing for each band pixel value of the image will be described below.
[0059]
When there are two images x and y, the difference between these images is calculated according to Equation 11.
[0060]
[Expression 11]

[0061]
Using the Laplacian of Equation 9 and the difference of Equation 11, the sharpening of the image x can be described simply according to Equation 12.
[0062]
[Expression 12]

[0063]
When there are two images x and y and the image y is a single-band binary image, each band pixel value of the image x can be extracted using the band pixel values of the image y according to Equation 13.
[0064]
[Formula 13]

[0065]
Likewise, when there are two images x and y and the image y is a single-band binary image, according to Equation 14 the band pixel values of the image x can be extracted using the band pixel values of the image y, and the remaining band pixel values can be set to an arbitrary value g. Note that Equation 13 is equivalent to Equation 14 with g = 0.
[0066]
[Expression 14]
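The masking of Equations 13 and 14 can be illustrated by the following Python sketch. It assumes that a pixel of x is kept where the single-band binary image y holds 1 and is otherwise replaced by g in every band; the function name mask and the nested-list layout are hypothetical.

```python
def mask(x, y, g=0):
    """Keep the pixels of x where y[i][j][0] == 1; set all other band values to g."""
    return [[[v if y[i][j][0] == 1 else g for v in pixel]
             for j, pixel in enumerate(row)]
            for i, row in enumerate(x)]
```

With the default g = 0 this behaves like the mask of Equation 13; passing a mid-gray value for g corresponds to filling the background with gray, as the refinement means is described as doing.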

[0067]
The summation over the q-neighborhood of the position p(i, j, k) of the image x is performed according to Equation 15.
[0068]
[Expression 15]

[0069]
When the image x is a multi-band image such as a color image, assume that white, black, red, green and gray are extracted from the characters / graphics in the image x and from their outlines using two bands, for example red as the first band and green as the second band. In this case, multi-color extraction at the position p(i, j, k) of the image x is performed according to Equations 16 to 19 using the q-neighborhood. Strictly, the right-hand sides of Equations 16 to 19 should be divided by q. However, since these band pixel values are used only for comparing magnitudes with one another, the right-hand sides are multiplied by q so that the division is omitted. Here gray means an intermediate color among white, black, red, and green, that is, any color other than these.
[0070]
[Expression 16]

[0071]
[Expression 17]

[0072]
[Formula 18]

[0073]
[Equation 19]

[0074]
Next, assume that one color is selected from the plurality of colors extracted from the image. In this case, multi-color selection at the position p(i, j, k) of the 4-band image x is performed according to Equations 20 to 23 using a predetermined value f. It is desirable to fine-tune f appropriately for each pixel of this 4-band image x. When none of white, black, red, and green is selected by these equations, the pixel is gray.
[0075]
[Expression 20]

[0076]
[Expression 21]

[0077]
[Expression 22]

[0078]
[Expression 23]

[0079]
Using the multi-color extraction of Equations 16 to 19 and the multi-color selection of Equations 20 to 23, the multi-color classification of the image x can be described simply according to Equation 24.
[0080]
[Expression 24]

[0081]
Using the binarization of Equation 3, the summation of Equation 15, the window of Equation 5, and the mask of Equation 13 for the q-neighborhood of the position p(i, j, k) of the image x, the character / graphic selection for each band of the image x can be described simply as in Equation 25.
[0082]
[Expression 25]

[0083]
Using the Laplacian of Equation 9, the binarization of Equation 3, the summation of Equation 15, the window of Equation 5, and the mask of Equation 13 for the q-neighborhood of the position p(i, j, k) of the binary image x, the texture removal for each band of this binary image x can be described simply as in Equation 26.
[0084]
[Equation 26]

[0085]
When there are two images x and y and the image y is a binary image with an arbitrary number of bands, the refinement of this binary image y can be described simply according to Equation 27 using the maximum value selection of Equation 6 and the mask of Equation 14.
[0086]
[Expression 27]

[0087]
Therefore, using Equations 1 to 27, the algorithms of all the array operation units 100 in the data processing device 110 that implements the multi-color classification means 11, the character / graphic extraction means 21, and the fine character / graphic extraction means 22 shown in FIGS. can be described. Below, the multi-color classification means 11, the character / graphic extraction means 21, and the fine character / graphic extraction means 22 are described in order using the algorithm of an arbitrary array operation unit 100 in the data processing device 110.
[0088]
As shown in FIG. 4, for the multi-color classification means 11 realized by the data processing device 110 to generate the multicolor image 112 from the digital image 111, the array operation units 100 arranged in a lattice operate synchronously in parallel. Let AOU_ij denote the array operation unit 100 arranged at row i, column j of the lattice. The algorithm of AOU_ij for the multi-color classification means 11 is then as shown in FIG.
[0089]
In step 1101, AOU_ij is arranged at row i, column j of the lattice. Whether logical or physical, this arrangement is necessary to determine the neighbors of AOU_ij.
[0090]
In step 1102, the neighborhoods of AOU_ij and the initial values of its variables are set. In the neighborhood setting, the neighborhood size q used in each function may be determined individually, or all of them may be unified. To increase the accuracy of the character / graphic image 113 generated by the data processing device 110 of the present invention, it is desirable to set all the neighborhood sizes q to large values. However, the character / graphic extraction means 21 can cope with constraints on calculation time and with the size of the input digital image 111 by changing the neighborhood size as necessary.
[0091]
In step 1103, it is determined whether the digital images 111 have been exhausted. If there is no further digital image 111 (step 1103: YES), the algorithm terminates. If there is a digital image 111 (step 1103: NO), the process proceeds to step 1104. However, when the array operation unit 100 is implemented for a specific number of bands and image size, an infinite loop may be used.
[0092]
In step 1104, the pixel at row i, column j of the digital image 111 is input for all bands. This is because AOU_ij processes the pixel at row i, column j of the digital image 111 collectively. For this reason, AOU_ij requires a memory 102 that stores image data for at least the number of bands.
[0093]
In step 1105, AOU_ij communicates with the neighboring array operation units 100 and smooths each band pixel value of the input digital image 111 according to the function S_ijk(x). The smoothed band pixel values are treated as the band pixel values of the smoothed image. The function S_ijk(x) may be applied repeatedly as necessary; for a general multi-band image, two applications are sufficient. After the smoothing, the smoothed image may be logarithmically converted as necessary.
[0094]
In step 1106, AOU_ij communicates with the neighboring array operation units 100 and sharpens each band pixel value of the smoothed image according to the function E_ijk(x). The sharpened band pixel values are treated as the band pixel values of the sharpened image.
[0095]
In step 1107, each band pixel value of the sharpened image is classified into multiple colors according to the function M^q_ijk(x). The classified band pixel values are treated as the band pixel values of the color-classified image.
[0096]
In step 1108, AOU_ij communicates with the neighboring array operation units 100 and removes isolated points and isolated holes from the band pixel values of the color-classified image according to the function A_ijk(x). The band pixel values from which the isolated points and isolated holes have been removed are treated as the band pixel values of the multicolor image 112.
[0097]
In step 1109, the band pixel value of the multicolor image 112 is output. Thereafter, the process returns to step 1103.
[0098]
Accordingly, the multi-color classification means 11 can generate the multicolor image 112 from the digital image 111 using the data processing device 110 composed of the array operation units 100.
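The order of steps 1103 to 1109 can be summarized by a small Python sketch. It only shows the control flow for whole images rather than the per-unit communication of AOU_ij. The function name classify_frames and the idea of passing the four stages in as callables are assumptions made for illustration; the stage functions themselves stand in for S, E, M and A and are not defined here.

```python
def classify_frames(frames, smooth, sharpen, classify, clean, smooth_passes=2):
    """Yield one multicolor image per input image, following steps 1103 to 1109."""
    for image in frames:                 # steps 1103 and 1104: next input image
        for _ in range(smooth_passes):   # step 1105: smoothing, repeated as needed
            image = smooth(image)
        image = sharpen(image)           # step 1106: sharpening
        image = classify(image)          # step 1107: multi-color classification
        yield clean(image)               # steps 1108 and 1109: clean up, then output
```

The default of two smoothing passes follows the remark that two applications suffice for a general multi-band image; the loop ends when the input sequence is exhausted, which corresponds to the YES branch of step 1103.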
[0099]
As shown in FIG. 5, for the character / graphic extraction means 21 realized by the data processing device 110 to generate the character / graphic image 113 from the digital image 111, the array operation units 100 arranged in a lattice operate synchronously in parallel. Let AOU_ij denote the array operation unit 100 arranged at row i, column j of the lattice. The algorithm of AOU_ij for the character / graphic extraction means 21 is then as shown in FIG.
[0100]
In step 2101, AOU_ij is arranged at row i, column j of the lattice. Whether logical or physical, this arrangement is necessary to determine the neighbors of AOU_ij.
[0101]
In step 2102, the neighborhoods of AOU_ij and the initial values of its variables are set. In the neighborhood setting, the neighborhood size q used in each function may be determined individually, or all of them may be unified. To increase the accuracy of the character / graphic image 113 generated by the data processing device 110 of the present invention, it is desirable to set all the neighborhood sizes q to large values. However, the character / graphic extraction means 21 can cope with constraints on calculation time and with the size of the input digital image 111 by changing the neighborhood size as necessary.
[0102]
In step 2103, it is determined whether the digital images 111 have been exhausted. If there is no further digital image 111 (step 2103: YES), the algorithm terminates. If there is a digital image 111 (step 2103: NO), the process proceeds to step 2104. However, when the array operation unit 100 is implemented for a specific number of bands and image size, an infinite loop may be used.
[0103]
In step 2104, the pixel at row i, column j of the digital image 111 is input for all bands. This is because AOU_ij processes the pixel at row i, column j of the digital image 111 collectively. For this reason, AOU_ij requires a memory 102 that stores image data for at least the number of bands.
[0104]
In step 2105, AOU_ij communicates with the neighboring array operation units 100 and smooths each band pixel value of the input digital image 111 according to the function S_ijk(x). The smoothed band pixel values are treated as the band pixel values of the smoothed image. The function S_ijk(x) may be applied repeatedly as necessary; for a general multi-band image, two applications are sufficient. After the smoothing, the smoothed image may be logarithmically converted as necessary.
[0105]
In step 2106, AOU_ij communicates with the neighboring array operation units 100 and sharpens each band pixel value of the smoothed image according to the function E_ijk(x). The sharpened band pixel values are treated as the band pixel values of the sharpened image.
[0106]
In step 2107, each band pixel value of the sharpened image is classified into multiple colors according to the function M^q_ijk(x). The classified band pixel values are treated as the band pixel values of the color-classified image.
[0107]
In step 2108, AOU_ij communicates with the neighboring array operation units 100 and removes isolated points and isolated holes from the band pixel values of the color-classified image according to the function A_ijk(x). The band pixel values from which the isolated points and isolated holes have been removed are treated as the band pixel values of the multicolor image 112.
[0108]
In step 2109, AOU_ij communicates with the neighboring array operation units 100 and selects characters / graphics from each band pixel value of the multicolor image according to the function U^q_ijk(x). The selected band pixel values are treated as the band pixel values of the coarse character / graphic image. This function may be applied repeatedly as necessary.
[0109]
In step 2110, texture is removed from each band pixel value of the coarse character / graphic image according to the function T^q_ijk(x). The band pixel values from which the texture has been removed are treated as the band pixel values of the character / graphic image 113. This function may be applied repeatedly as necessary.
[0110]
In step 2111, the band pixel value of the character / graphic image 113 is output. Thereafter, the process returns to step 2103.
[0111]
Thereby, the character / graphic extraction means 21 can generate the character / graphic image 113 from the digital image 111 using the data processing device 110 configured by the array operation unit 100.
[0112]
As shown in FIG. 6, for the fine character / graphic extraction means 22 realized by the data processing device 110 to generate the fine character / graphic image 114 from the digital image 111, the array operation units 100 arranged in a lattice operate synchronously in parallel. Let AOU_ij denote the array operation unit 100 arranged at row i, column j of the lattice. The algorithm of AOU_ij for the fine character / graphic extraction means 22 is then as shown in FIG.
[0113]
In step 2201, AOU_ij is arranged at row i, column j of the lattice. Whether logical or physical, this arrangement is necessary to determine the neighbors of AOU_ij.
[0114]
In step 2202, the neighborhoods of AOU_ij and the initial values of its variables are set. In the neighborhood setting, the neighborhood size q used in each function may be determined individually, or all of them may be unified. To increase the accuracy of the fine character / graphic image 114 generated by the data processing device 110 of the present invention, it is desirable to set all the neighborhood sizes q to large values. However, the fine character / graphic extraction means 22 can cope with constraints on calculation time and with the size of the input digital image 111 by changing the neighborhood size as necessary.
[0115]
In step 2203, it is determined whether the digital images 111 have been exhausted. If there is no further digital image 111 (step 2203: YES), the algorithm terminates. If there is a digital image 111 (step 2203: NO), the process proceeds to step 2204. However, when the array operation unit 100 is implemented for a specific number of bands and image size, an infinite loop may be used.
[0116]
In step 2204, the pixel at row i, column j of the digital image 111 is input for all bands. This is because AOU_ij processes the pixel at row i, column j of the digital image 111 collectively. For this reason, AOU_ij requires a memory 102 that stores image data for at least the number of bands.
[0117]
In step 2205, AOU_ij communicates with the neighboring array operation units 100 and smooths each band pixel value of the input digital image 111 according to the function S_ijk(x). The smoothed band pixel values are treated as the band pixel values of the smoothed image. The function S_ijk(x) may be applied repeatedly as necessary; for a general multi-band image, two applications are sufficient. After the smoothing, the smoothed image may be logarithmically converted as necessary.
[0118]
In step 2206, AOU_ij communicates with the neighboring array operation units 100 and sharpens each band pixel value of the smoothed image according to the function E_ijk(x). The sharpened band pixel values are treated as the band pixel values of the sharpened image.
[0119]
In step 2207, each band pixel value of the sharpened image is classified into multiple colors according to the function M^q_ijk(x). The classified band pixel values are treated as the band pixel values of the color-classified image.
[0120]
In step 2208, AOU_ij communicates with the neighboring array operation units 100 and removes isolated points and isolated holes from the band pixel values of the color-classified image according to the function A_ijk(x). The band pixel values from which the isolated points and isolated holes have been removed are treated as the band pixel values of the multicolor image 112.
[0121]
In step 2209, AOU_ij communicates with the neighboring array operation units 100 and selects characters / graphics from each band pixel value of the multicolor image according to the function U^q_ijk(x). The selected band pixel values are treated as the band pixel values of the coarse character / graphic image. This function may be applied repeatedly as necessary.
[0122]
In step 2210, texture is removed from each band pixel value of the coarse character / graphic image according to the function T^q_ijk(x). The band pixel values from which the texture has been removed are treated as the band pixel values of the character / graphic image 113. This function may be applied repeatedly as necessary.
[0123]
In step 2211, each band pixel value of the character / graphic image 113 is refined according to the function F_ijk(x, y). The refined band pixel values are treated as the band pixel values of the fine character / graphic image 114.
[0124]
In step 2212, the band pixel value of the fine character / graphic image 114 is output. Thereafter, the process returns to step 2203.
[0125]
As a result, the fine character / graphic extraction means 22 can generate the fine character / graphic image 114 from the digital image 111 using the data processing device 110 constituted by the array operation unit 100.
[0126]
Here, the multicolor image 112 and the character / graphic image 113 are handled as four-band images, but the visual device 2 may convert each pixel of these images into a 3-bit band pixel value in its output means. As a result, the data amount of these images is greatly reduced compared with the input digital image 111. In addition, since most of the pixels in these images are gray, these images can be compressed further using a compression program. Although the case where the red band and the green band of the digital image 111 are used has been described here, by also using the blue band the image can likewise be converted into seven colors: white, black, red, green, blue, yellow, and gray. In this case as well, the visual device 2 can convert each pixel of the multicolor image 112 and the character / graphic image 113 into a 3-bit band pixel value in its output means.
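Seven colors fit in three bits because 2³ = 8. The following Python sketch shows one way such 3-bit values could be packed into bytes. The particular code assigned to each color, the big-endian packing, and the names COLORS, CODE, pack and unpack are assumptions made for this example; the source does not specify an encoding.

```python
COLORS = ("gray", "white", "black", "red", "green", "blue", "yellow")
CODE = {name: n for n, name in enumerate(COLORS)}  # hypothetical 3-bit codes 0 to 6


def pack(labels):
    """Pack a sequence of color names into bytes, 3 bits per pixel."""
    bits = 0
    for name in labels:
        bits = (bits << 3) | CODE[name]
    return bits.to_bytes((3 * len(labels) + 7) // 8, "big")


def unpack(data, count):
    """Recover count color names from bytes produced by pack."""
    bits = int.from_bytes(data, "big")
    return [COLORS[(bits >> (3 * (count - 1 - n))) & 7] for n in range(count)]
```

Five pixels occupy 15 bits, that is 2 bytes, instead of the 15 bytes they would need as 24-bit color values; a general-purpose compressor can then exploit the long runs of gray.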
[0127]
So far, the visual device 2 for extracting characters / graphics has been described. Since the visual device 2 can compress each pixel of the digital image 111 to as little as 3 bits and can make many pixels gray, the visual device 2 is suitable for high-speed communication of the digital image 111. Therefore, an embodiment of the communication system is described below, focusing on the communication device 7 and the communication network 8.
[0128]
First, as shown in FIG. 11, an embodiment of a communication system corresponding to the invention described in claim 7 includes a visual device 2, a camera 9 and a communication device 7. In FIG. 11, the camera 9 is used as the photographing device, but a scanner may be used instead of the camera 9. A drawing device such as a mouse, a tablet, or a touch panel can also be used in place of the camera 9. The communication device 7 includes an image compression means 701 and an image communication means 702. The image compression means 701 receives the multicolor image 112, the character / graphic image 113 and the fine character / graphic image 114 that the visual device 2 generates from the digital image 111 captured by the camera 9, compresses the moving images and frame images using a moving image compression technique such as MPEG and a still image compression technique such as JPEG, and outputs them to the image communication means 702. Since many pixels of these input images are gray and the number of other colors is limited, the image compression means 701 can achieve a high compression ratio by combining compression formats suited to animation, such as GIF and Flash, which compress such colors efficiently.
[0129]
Next, the image communication means 702 can transmit MPEG images, JPEG images, and the like to a nearby computer system 800 via a communication network 8 such as USB (Universal Serial Bus). By implementing a communication protocol such as TCP / IP, the image communication means 702 can also transmit these MPEG images and JPEG images to a remote computer system 800 via a communication network 8 such as the Internet. Further, by using a broadband wireless technology such as DS-CDMA or MC-CDMA as the image communication means 702, the communication device 7 can inexpensively transmit the moving images captured by the camera 9 regardless of place and time. Of course, by using electronic mail, the communication device 7 can transmit these MPEG images and JPEG images only when it can communicate with the computer system 800. Therefore, in the embodiment of the character / graphic search system corresponding to the invention described in claim 8, at least one computer system 800 is connected to the communication network 8, so that the computer system 800 can receive these compressed images at high speed.
[0130]
Here, as shown in FIG. 13, the computer system 800 includes an image receiving means 801, an image expansion means 802, a character / graphic area detection means 803, a key image generation means 804, and an image detection means 805. The image receiving means 801 receives compressed image data from the communication network 8 using the same technique as the image communication means 702. The image expansion means 802 uses a moving image expansion technique such as MPEG and a still image expansion technique such as JPEG to decompress the compressed image data into the multicolor image 112, the character / graphic image 113 and the fine character / graphic image 114 generated by the visual device 2. Since many pixels in these images are gray, the character / graphic area detection means 803 can easily determine the position and size of an area containing at least one character / graphic in these images by calculating horizontal and vertical histograms of the non-gray pixels. Further, by applying the position / size detection algorithm of the visual device 2 to a binary image in which a predetermined color is set to 1 and the remaining colors are set to 0, the character / graphic area detection means 803 can detect the position and size of this area at high speed. Furthermore, by using a position, size and inclination detection algorithm (hereinafter abbreviated as position / size / inclination detection algorithm) instead of this position / size detection algorithm, it can also detect the inclination of this area at high speed. The key image generation means 804 can then generate a key image for the database by cutting the area out of these images using its position and size.
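The histogram-based area detection can be sketched in Python as follows. This is a minimal illustration, not the position / size detection algorithm of the visual device 2. It assumes the image is a rectangular grid of color names with "gray" as the background, and it finds a single bounding area; the function name character_region and the (top, left, height, width) return format are hypothetical.

```python
def character_region(labels, background="gray"):
    """Return (top, left, height, width) of the non-background area, or None."""
    if not labels or not labels[0]:
        return None
    # Horizontal histogram: number of non-background pixels in each row.
    rows = [sum(1 for v in row if v != background) for row in labels]
    # Vertical histogram: number of non-background pixels in each column.
    cols = [sum(1 for row in labels if row[j] != background)
            for j in range(len(labels[0]))]
    ys = [i for i, count in enumerate(rows) if count]
    xs = [j for j, count in enumerate(cols) if count]
    if not ys:
        return None
    return ys[0], xs[0], ys[-1] - ys[0] + 1, xs[-1] - xs[0] + 1
```

A key image could then be cut out by slicing the rows top to top + height and the columns left to left + width. Separating several characters would additionally require splitting at the empty bins of each histogram.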
[0131]
On the other hand, as shown in FIG. 14, the database stores image records, each including an image ID 501 that is the identification number of a recorded image, an image description 502 that describes the recorded image, and recorded image data 503 that is the data of the recorded image. The image detection means 805 can therefore detect a predetermined character / graphic, including its color information, from this key image with high accuracy by using image search software such as imgSeek, a relational or object-oriented database having a database description language such as SQL (Structured Query Language), or a neural network such as a perceptron.
[0132]
As shown in FIG. 12, in the embodiment of the character / graphic search system corresponding to the invention described in claim 9, the communication device 7, such as the mobile phone 211, includes the display 222 and the speaker 224 so that information can be conveyed to the photographer of the digital image 111. To this end, the computer system 800 includes an image transmission means similar to the image communication means 702, and transmits the detection result or the position information of an area containing at least one predetermined character / graphic, as detected by the image detection means 805, to the communication device 7 via the communication network 8, so that the communication device 7 can inform the photographer of the position of this area. For example, this position information is represented by two-dimensional coordinates in the digital image 111. In this case, when the position indicated by the position information is displayed on the display 222, the photographer can point the camera 9 at the character / graphic. Further, by moving forward while keeping this position at the center of the digital image 111, the photographer can reach the place where the character / graphic is located. The position information may instead represent a relative direction from the center of the digital image 111, for example up, down, left, or right. In this case, the speaker 224 outputs these directions as voice data via a voice synthesizer, so that the photographer can reach the place where the character / graphic is located without looking at the display 222. Such an audio function is particularly useful for people who have difficulty seeing the display 222, such as visually impaired and elderly people, and for people who cannot keep watching the display 222, such as workers on the job.
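Converting two-dimensional coordinates into a relative direction from the image center might be done as in the following Python sketch. The function name relative_direction, the tolerance parameter, the word "center", and the convention that row 0 is the top of the image are assumptions of this example rather than details given in the source.

```python
def relative_direction(row, col, height, width, tolerance=0):
    """Return direction words for a position relative to the image center."""
    dy = row - (height - 1) / 2  # negative when the position is above the center
    dx = col - (width - 1) / 2   # negative when the position is left of the center
    words = []
    if dy < -tolerance:
        words.append("up")
    elif dy > tolerance:
        words.append("down")
    if dx < -tolerance:
        words.append("left")
    elif dx > tolerance:
        words.append("right")
    return words or ["center"]
```

The resulting words could be shown on the display or handed to a voice synthesizer; a non-zero tolerance keeps the guidance from flickering when the target is already close to the center.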
[0133]
As shown in FIG. 14, the database includes a large number of image records, each including an image ID 501 that is the identification number of a recorded image, an image description 502 that describes the recorded image, and recorded image data 503 that is the data of the recorded image. Further, as shown in FIG. 15, each image record includes at least one of a search result character string 504 for the recorded image and a search result image ID 505 indicating another recorded image as a search result image. In this case, by using the functions of the database, the embodiment of the character / graphic search system corresponding to the invention described in claim 10 can convey to the photographer not the characters / graphics themselves in the captured digital image 111 but the search result character string 504 or the search result image associated with those characters / graphics. That is, when the computer system 800 transmits the search result character string 504 or the search result image to the communication device 7 via the communication network 8, the communication device 7 can present the search result character string 504 or the search result image to the photographer. For example, if the photographed words are "the highest mountain in the world", the computer system 800 can transmit the word "Everest" or an image of Everest to the communication device 7. Since the present invention uses an image representing the words as the key of the database, the database can be searched regardless of the language used, such as Japanese, English, French or German. Moreover, since the visual device 2 extracts the words from the digital image 111 and further compresses them, the present invention can greatly reduce the communication load on the communication network 8. It is therefore particularly useful for a communication network 8 whose communication bandwidth is limited, as with the mobile phone 211. Further, by using the search result character string 504 as a home page address, the photographer can call up the home page directly from the photographed characters / graphics using an Internet connection function such as i-mode. Of course, the photographer can also use the search result character string 504 for character input on the home page.
[0134]
Although the case where a general image containing characters / graphics is used as the recorded image data 503 has been described here, the database is of course not limited to such images: any of the multicolor image 112, the character / graphic image 113, and the fine character / graphic image 114 associated with such an image can also be used as the recorded image data 503. In this case, the database can use the image ID 501 of the image containing the characters / graphics as the search result image ID 505. As a result, the database can search for the image containing the characters / graphics at high speed.
[0135]
According to the embodiments of the invention described in claims 1 and 4, all means can be realized by local processing. Accordingly, an image sensor manufacturer can manufacture an LSI (large-scale integrated circuit) implementing a data processing device 110 composed of a plurality of array operation units 100 arranged in a two-dimensional lattice and, by stacking the required number of these LSIs on an image sensor, realize an image sensor that reduces the colors of the digital image 111 simply and at high speed. Therefore, by incorporating this image sensor into electronic devices such as personal computers, game machines, televisions, and video decks, these devices can capture the user's gestures while respecting the user's privacy. In particular, this image sensor is suitable for confirming the safety of elderly people and for monitoring their wandering. In the present embodiment, each pixel of the digital image 111 can be compressed to a 3-bit band pixel value to generate an image resembling a color manga. Therefore, by incorporating this embodiment into a video deck, the video deck can accurately label each scene of a recorded moving image, and its user can quickly cue a scene by sketching the scene to be viewed on a touch panel or the like. Furthermore, by incorporating this embodiment into a communication device such as the mobile phone 211, the communication device can compress the digital image 111 efficiently even if it has a large number of pixels. Since the present embodiment closely resembles the visual function of the human brain, it is also very useful for elucidating that function. For example, ancient people such as the Cro-Magnons painted murals in caves, and our ancestors used primitive hieroglyphs, many of which were abstract line drawings. This is usually attributed to the immaturity of their brains and of their manual expressive ability, but the present embodiment suggests instead that they faithfully followed the visual function of the brain, that is, they drew animals and objects "as seen". Furthermore, we can easily recognize the characters and scenery in manga, and the present embodiment makes the reason easy to understand. Therefore, the present embodiment is useful for developing user interfaces that are easy for users to understand.
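As a rough illustration of the 3-bit band pixel value: five color classes need ceil(log2 5) = 3 bits, so eight classified pixels fit in three bytes. The label codes and the bit layout below are assumptions; the text fixes only that the classes include gray and that each pixel is compressed to 3 bits.

```python
# Hypothetical label codes (gray plus the colors of reference signs 112a to 112d).
GRAY, WHITE, BLACK, RED, GREEN = range(5)


def pack_labels(labels):
    """Pack 3-bit color labels into bytes, most significant bits first."""
    bits = 0      # bit accumulator
    nbits = 0     # number of bits in the accumulator not yet written out
    out = bytearray()
    for label in labels:
        if not 0 <= label < 8:
            raise ValueError("label does not fit in 3 bits")
        bits = (bits << 3) | label
        nbits += 3
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        # Left-align the remaining bits and pad the last byte with zeros.
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_labels(data, count):
    """Recover `count` 3-bit labels from bytes produced by pack_labels."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    return [(bits >> (total - 3 * (i + 1))) & 0b111 for i in range(count)]
```

Assuming an 8-bit-per-channel RGB input, the same eight pixels would occupy 24 bytes before classification, so the packed form is one eighth of that size before any further compression.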
[0136]
According to the embodiments of the invention described in claims 2 and 5, all means can be realized by local processing. Accordingly, an image sensor manufacturer can manufacture an LSI (large-scale integrated circuit) implementing a data processing device 110 composed of a plurality of array operation units 100 arranged in a two-dimensional lattice and, by stacking the required number of these LSIs, realize an image sensor that extracts the characters / graphics in the digital image 111 simply and at high speed. Therefore, by incorporating this image sensor into electronic devices such as personal computers, game machines, televisions, and video decks, these devices can capture the user's gestures while respecting the user's privacy. In particular, this image sensor is suitable for confirming the safety of elderly people and for monitoring their wandering. Further, the present embodiment can generate an image resembling a color manga by compressing the digital image 111 without impairing the color and shape of the characters / graphics. Therefore, by incorporating this embodiment into a video deck, the video deck can accurately label each scene of a recorded moving image, and its user can quickly cue a scene by sketching the scene to be viewed on a touch panel or the like. Furthermore, by incorporating this embodiment into a communication device such as the mobile phone 211, the communication device can compress the digital image 111 efficiently even if it has a large number of pixels. A user of this communication device can accurately identify the characters / graphics while respecting privacy. This is particularly suitable for remotely assisting visually impaired and elderly people with reading and walking.
[0137]
According to the embodiments of the invention described in claims 3 and 6, all means can be realized by local processing. Accordingly, an image sensor manufacturer can manufacture an LSI (large-scale integrated circuit) implementing a data processing device 110 composed of a plurality of array operation units 100 arranged in a two-dimensional lattice and, by stacking the required number of these LSIs, realize an image sensor that extracts the characters / graphics in the digital image 111 simply and at high speed. Therefore, by incorporating this image sensor into electronic devices such as personal computers, game machines, televisions, and video decks, these devices can capture the user's gestures while respecting the user's privacy. In particular, this image sensor is suitable for confirming the safety of elderly people and for monitoring their wandering. In addition, the present embodiment can generate an image resembling a color manga by compressing the digital image 111 while accurately preserving the color and shape of the characters / graphics. Therefore, by incorporating this embodiment into a video deck, the video deck can accurately label each scene of a recorded moving image, and its user can quickly cue a scene by sketching the scene to be viewed on a touch panel or the like. Further, by incorporating this embodiment into a communication device such as the mobile phone 211, the communication device can compress the digital image 111 efficiently even if it has a large number of pixels. A user of this communication device can accurately identify the characters / graphics while respecting privacy. This is particularly suitable for remotely assisting visually impaired and elderly people with reading and walking.
[0138]
According to the embodiments of the invention described in claims 7 and 8, the computer system 800, which is connected via the communication network 8 to the communication system equipped with the visual device 2, can easily select characters / graphics from the digital image 111 captured by the camera 9 and can accurately identify the color of those characters / graphics. Therefore, by providing the computer system 800 with a function for recognizing the characters / graphics, the user of the computer system 800 can easily grasp the situation around the remote photographer who holds the camera 9.
[0139]
According to the embodiment of the invention described in claim 9, the computer system 800, which is connected via the communication network 8 to the communication system equipped with the visual device 2, can easily select characters / graphics from the digital image 111 photographed by the camera 9 and can accurately identify the color of those characters / graphics. Therefore, by providing the computer system 800 with a function for recognizing the characters / graphics, the user of the computer system 800 can easily grasp the situation around the remote photographer who holds the camera 9. For example, by providing the computer system 800 with a function of recognizing the logo of a hamburger chain, the computer system 800 can determine whether a store of that chain is near the photographer. Moreover, if there is one, the computer system 800 can guide the photographer to the entrance of the store by having the photographer point the camera 9 toward the logo. Of course, by providing the computer system 800 with a function of recognizing signs and landmarks on the way to the store, the computer system 800 can guide the photographer to the store, so the photographer can easily find the store even in a place visited for the first time. Further, by providing the computer system 800 with a function of recognizing a toilet mark, the computer system 800 can guide the photographer to the toilet of the store using ordinary signboards, flyers, and the like, without requiring expensive social infrastructure such as wireless guidance devices. Combining such a communication system with GPS (Global Positioning System) makes the communication system more accurate and more useful, especially when visually impaired or elderly people go out. In addition, by providing the computer system 800 with a function of recognizing lanes and markings on the road, the computer system 800 can not only detect the positions of these lanes but also warn of the position of safety zones and of the speed limit.
[0140]
According to the embodiment of the invention described in claim 10, the computer system 800, which is connected via the communication network 8 to the communication system equipped with the visual device 2, can not only easily select characters / graphics from the digital image 111 photographed by the camera 9 and accurately identify their color, but can also provide information corresponding to those characters / graphics to the remote photographer who holds the camera 9. Therefore, by providing the computer system 800 with a function of recognizing specific keywords and marks, the photographer can find keywords and marks hidden in the surroundings. For example, when the computer system 800 uses a database to combine the date and time, the location, and these keywords and marks, the communication system can realize, over the wide area covered by the communication network 8, a treasure hunt game in which the keywords and marks are searched for in sequence. Therefore, by combining such a game with the guidance to a hamburger chain store described above, the communication system can promote the business activities of the hamburger chain. Further, by providing the computer system 800 with a function of recognizing the program names, times, and marks printed in a TV program guide, the photographer can not only obtain information about the photographed program but also use the communication device 7 as a remote controller for a television or video deck. Since the characters in a program guide are generally small, this is particularly convenient when visually impaired or elderly people use a television or video deck. Further, when the photographer uses the communication device 7, such as the mobile phone 211, to photograph a product code printed on a flyer, a home page, or the like, the computer system 800 can also send a description of the product and ordering guidance to the mobile phone 211. Therefore, a mail-order company that uses this computer system 800 can automate orders from customers without printing barcodes or the like in its catalog. Moreover, even if the product code is small, the computer system 800 can identify it if the photographer photographs the catalog in close-up, which makes mail-order sales convenient for visually impaired and elderly people. Of course, the photographer can also photograph clothes worn by actresses on television or accessories found at a friend's home and order them immediately. In addition, by providing the computer system 800 with a function of recognizing road signs, the computer system 800 can warn of road closures even for temporary construction that is not registered in the car navigation system and can guide the photographer's car to a detour.
[0141]
Although the present embodiment has been described above, the present invention is not limited to the above-described embodiment, and various modes can be implemented by those skilled in the art without departing from the technical idea of the present invention. Of course, the configuration of the present invention can be modified as appropriate, and such modifications are also within the technical scope of the present invention.
[Brief description of the drawings]
FIG. 1 is a block diagram of a visual device including one data processing device of the present embodiment.
FIG. 2 is a block diagram of a visual device including four data processing devices of the present embodiment.
FIG. 3 is a block diagram of an array operation unit arranged in a grid pattern according to the present embodiment.
FIG. 4 is an explanatory diagram when each pixel of a digital image according to the present embodiment is classified into five colors.
FIG. 5 is an explanatory diagram when characters and figures are extracted from the digital image of the present embodiment.
FIG. 6 is an explanatory diagram in a case where characters and figures extracted from a digital image according to the present embodiment are accurately expressed.
FIG. 7 is a flowchart showing an algorithm of the multi-color classification unit of the present embodiment.
FIG. 8 is a flowchart showing an algorithm of a character / graphic extraction unit of the present embodiment.
FIG. 9 is a flowchart showing an algorithm of a fine character / graphic extracting unit of the present embodiment.
FIG. 10 is a block diagram of an internal structure of the array operation unit according to the present embodiment.
FIG. 11 is a block diagram of a communication system including a camera, a visual device, and a communication device according to the present embodiment.
FIG. 12 is an explanatory diagram of a mobile phone equipped with a camera, a display, and a speaker according to the present embodiment.
FIG. 13 is an explanatory diagram of a computer system that searches a database using compressed image data according to the present embodiment.
FIG. 14 is an explanatory diagram of an image record in a database for recording a recorded image according to the present embodiment.
FIG. 15 is an explanatory diagram of an image record in a database that records a search result image ID and a search result character string according to the present embodiment.
[Explanation of symbols]
2 Visual device
7 Communication device
8 Communication network
9 Camera
11 Multicolor classification means
12 Character / graphic selection means
13 Texture removal means
14 Refinement means
21 Character / graphic extraction means
22 Fine character / graphic extraction means
51 Address bus
52 Data bus
100 Array operation unit
101 Processor
102 Memory
103 Controller
110 Data processing device
111 Digital image
112 Multicolor image
112a Multicolor image (white)
112b Multicolor image (black)
112c Multicolor image (red)
112d Multicolor image (green)
113 Character / graphic image
113a Character / graphic image (white)
113b Character / graphic image (black)
113c Character / graphic image (red)
113d Character / graphic image (green)
114 Fine character / graphic image
211 Mobile phone
222 Display
224 Speaker
501 Image ID
502 Image description
503 Recorded image data
504 Search result character string
505 Search result image ID
701 Image compression means
702 Image communication means
800 Computer system
801 Image receiving means
802 Image expansion means
803 Character / graphic region detection means
804 Key image generation means
805 Image detection means

Claims

A visual device including at least one data processing device, characterized in that:
the data processing device includes at least one processor and at least one set of memory; and
the memory stores at least one band pixel value, which is one element of a vector representing one image,
wherein the data processing device comprises multicolor classification means that generates, from an input digital image, a multicolor image represented by at least five colors including gray by classifying each pixel of the digital image into the at least five colors on the basis of complementary color relationships,
whereby each pixel of the digital image is compressed to a band pixel value of at least 3 bits.

A visual device including at least one data processing device, characterized in that:
the data processing device includes at least one processor and at least one set of memory; and
the memory stores at least one band pixel value, which is one element of a vector representing one image,
wherein the data processing device comprises:
multicolor classification means that generates, from an input digital image, a multicolor image represented by at least five colors including gray by classifying each pixel of the digital image into the at least five colors on the basis of complementary color relationships;
character / graphic selection means that converts the multicolor image into a rough character / graphic image by selecting, for at least one color of each pixel of the multicolor image, only those pixels for which the number of predetermined band pixel values within a predetermined neighborhood falls within a predetermined range; and
texture removal means that converts the rough character / graphic image into a character / graphic image by removing, for at least one color of each pixel of the rough character / graphic image, only those pixels for which the number of predetermined band pixel values within a predetermined neighborhood falls within a predetermined range,
whereby each pixel of the character / graphic image is compressed to a band pixel value of at least 3 bits.
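
The character / graphic selection means and the texture removal means in the claim above both count matching band pixel values in a neighborhood and then keep or remove pixels according to a range. A minimal Python sketch follows; the 8-neighborhood, the single-color label plane, the background value, and the thresholds are assumptions, since the claim says only "predetermined".

```python
def count_neighbors(image, row, col, value, radius=1):
    """Count pixels equal to `value` around (row, col), excluding the center."""
    height, width = len(image), len(image[0])
    total = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            if (r, c) != (row, col) and image[r][c] == value:
                total += 1
    return total


def filter_by_count(image, value, lo, hi, keep_in_range, background=0, radius=1):
    """Selection (keep_in_range=True) keeps only `value` pixels whose count is in [lo, hi].

    Removal (keep_in_range=False) clears exactly the `value` pixels whose count
    is in [lo, hi]. Counts are always taken on the unmodified input image.
    """
    out = [row[:] for row in image]
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel != value:
                continue
            in_range = lo <= count_neighbors(image, r, c, value, radius) <= hi
            if in_range != keep_in_range:
                out[r][c] = background
    return out
```

Under these assumed thresholds, a thin stroke (few matching neighbors) survives selection while the interior of a filled region (many matching neighbors) does not, which is the intuition behind separating characters from painted areas.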

The visual device according to claim 2, wherein
the data processing device comprises refinement means, and
the refinement means uses:
a step of, when the plurality of band pixel values constituting a pixel of the character / graphic image represent the character / graphic, setting each of the corresponding band pixel values in a fine character / graphic image to the corresponding band pixel value in the digital image; and
a step of, when the plurality of band pixel values constituting a pixel of the character / graphic image represent the gray, setting each of the corresponding band pixel values in the fine character / graphic image to a band pixel value representing a predetermined gray.
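
The two steps of the refinement means amount to a per-pixel choice between the original pixel and a fixed gray. The sketch below assumes a label image in which one code marks gray and every other code marks a character / graphic pixel; the label code and the gray value are hypothetical.

```python
GRAY_LABEL = 0                        # hypothetical code for gray pixels
PREDETERMINED_GRAY = (128, 128, 128)  # assumed value of the predetermined gray


def refine(char_image, digital_image,
           gray_label=GRAY_LABEL, gray_value=PREDETERMINED_GRAY):
    """Build the fine character / graphic image from labels and original pixels."""
    fine = []
    for label_row, pixel_row in zip(char_image, digital_image):
        fine.append([
            # Gray label: use the fixed gray. Otherwise: copy the original pixel.
            gray_value if label == gray_label else pixel
            for label, pixel in zip(label_row, pixel_row)
        ])
    return fine
```

The result keeps the original colors only where a character / graphic was detected, so everything else collapses to a single value that is cheap to compress.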

A visual device including at least one data processing device composed of a plurality of array operation units arranged in a two-dimensional lattice, characterized in that:
each of the data processing devices includes at least one processor and at least one set of memory;
the memory stores at least one band pixel value, which is one element of a vector representing one image; and
each of the array operation units processes at least one band pixel value using at least one of the processors,
wherein each of the array operation units comprises:
means for initializing the array operation unit;
means for terminating processing if there is no band pixel value of a digital image to be input;
means for inputting at least one band pixel value of the digital image;
means for converting a plurality of band pixel values of the digital image into at least one band pixel value of a smoothed image in order to smooth each pixel of the digital image;
means for converting a plurality of band pixel values of the smoothed image into at least one band pixel value of an enhanced image in order to enhance each pixel of the smoothed image;
means for generating at least one band pixel value of a color classification image from at least one band pixel value of the enhanced image in order to classify each pixel of the enhanced image into at least five colors;
means for converting a plurality of band pixel values of the color classification image into at least one band pixel value of a multicolor image in order to remove isolated points and isolated holes from each pixel of the color classification image; and
means for outputting each band pixel value of the multicolor image.

A visual device including at least one data processing device composed of a plurality of array operation units arranged in a two-dimensional lattice, characterized in that:
each of the data processing devices includes at least one processor and at least one set of memory;
the memory stores at least one band pixel value, which is one element of a vector representing one image; and
each of the array operation units processes at least one band pixel value using at least one of the processors,
wherein each of the array operation units comprises:
means for initializing the array operation unit;
means for terminating processing if there is no band pixel value of a digital image to be input;
means for inputting at least one band pixel value of the digital image;
means for converting a plurality of band pixel values of the digital image into at least one band pixel value of a smoothed image in order to smooth each pixel of the digital image;
means for converting a plurality of band pixel values of the smoothed image into at least one band pixel value of an enhanced image in order to enhance each pixel of the smoothed image;
means for generating at least one band pixel value of a color classification image from at least one band pixel value of the enhanced image in order to classify each pixel of the enhanced image into at least five colors;
means for converting a plurality of band pixel values of the color classification image into at least one band pixel value of a multicolor image in order to remove isolated points and isolated holes from each pixel of the color classification image;
means for converting a plurality of band pixel values of the multicolor image into at least one band pixel value of a rough character / graphic image in order to select characters and graphics from the multicolor image;
means for converting a plurality of band pixel values of the rough character / graphic image into at least one band pixel value of a character / graphic image in order to remove texture from the rough character / graphic image; and
means for outputting each band pixel value of the character / graphic image.

The visual device according to claim 5, further comprising:
means for refining the character / graphic image using the digital image and converting it into at least one band pixel value of a fine character / graphic image; and
means for outputting each band pixel value of the fine character / graphic image instead of each band pixel value of the character / graphic image.

A communication system comprising:
the visual device according to any one of claims 1 to 5;
an imaging device that captures the digital image or a drawing device that draws the digital image; and
a communication device including image compression means and image communication means,
wherein, for the multicolor image, the character / graphic image, or the fine character / graphic image output by the visual device,
the image compression means generates compressed image data by compressing a plurality of the gray band pixel values collectively, and
the image communication means transmits the compressed image data to a communication network.
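
One way to picture compressing a plurality of gray band pixel values collectively is run-length coding of the gray label only. This is an assumed realization for illustration; the claim does not name a coding scheme, and the token format and label code below are hypothetical.

```python
def compress_gray_runs(labels, gray=0):
    """Merge each run of consecutive gray labels into one ("G", run_length) token."""
    tokens = []
    run = 0
    for label in labels:
        if label == gray:
            run += 1
            continue
        if run:
            tokens.append(("G", run))
            run = 0
        tokens.append(("P", label))  # non-gray pixels are stored one by one
    if run:
        tokens.append(("G", run))
    return tokens


def expand_gray_runs(tokens, gray=0):
    """Inverse of compress_gray_runs, as an image expansion step would need."""
    labels = []
    for kind, value in tokens:
        if kind == "G":
            labels.extend([gray] * value)
        else:
            labels.append(value)
    return labels
```

Because most pixels of a character / graphic image are gray, long gray runs shrink to single tokens while the comparatively few character pixels keep their exact colors.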

A character / graphic search system comprising:
the communication system according to claim 7;
at least one computer system including a database; and
a communication network connecting the communication system and the computer system,
wherein each of the computer systems includes:
image receiving means for receiving the compressed image data from the communication system;
image expansion means for expanding the received compressed image data into the multicolor image, the character / graphic image, or the fine character / graphic image;
character / graphic region detection means for detecting, from the multicolor image, the character / graphic image, or the fine character / graphic image, a region including at least one character / graphic;
key image generation means for generating a key image for the database from the region; and
image detection means for comparing the key image with recorded image data stored in the database,
whereby a detection result for the region is generated.

The character / graphic search system according to claim 8, wherein
at least one of the computer systems transmits, using image transmission means, the detection result for the region or position information of the region to the communication device, and
the communication device includes a display or a speaker,
whereby the communication device outputs the search result or the position information using the display or the speaker.

The character / graphic search system according to claim 8, wherein
at least one of the computer systems transmits, using image transmission means, a search result character string or a search result image associated with the recorded image data to the communication device, and
the communication device includes a display or a speaker,
whereby the communication device outputs the search result character string or the search result image using the display or the speaker.