JP2004266428A

JP2004266428A - Method for shuffling asian character images and shuffle processing system

Info

Publication number: JP2004266428A
Application number: JP2003052822A
Authority: JP
Inventors: Kokuyo Tei; 國揚鄭
Original assignee: DYNACOMWARE CORP
Current assignee: DYNACOMWARE CORP
Priority date: 2003-02-28
Filing date: 2003-02-28
Publication date: 2004-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a computer-based method and a shuffle processing system for shuffling Asian character images, in which the characters in a sentence are correctly identified as whole units and extracted by an image analysis method, so that the shuffled character images can still be converted to text correctly.

SOLUTION: A multi-resolution pyramid structure is constructed for a scanned Asian character image and organized as a tree data structure; the characters are thereby identified as whole units and extracted, and are then shuffled.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、秘密を保つための文書画像イメージの解析に関し、特に、第三者にアジア文字で書かれた秘密文書の内容の意味を知られないようにするための、アジア文字イメージのシャッフル技術に関するものである。
【０００２】
【従来の技術】
アジア文字（漢字、かな、ハングル文字等、アジア地域で使用されている文字をいう。）で書かれた文書（以下、単に「アジア文書」という。）をイメージスキャナ等でスキャニングして、文書を画像（イメージ）データとしてコンピュータに取り込み、得られたイメージを、文字（ｃｈａｒａｃｔｅｒ）部分と非文字部分に分けて、文字イメージ部分のみをシャッフルする（すなわち、文字イメージの順序をばらばらに入れ替えること）技術はいままで存在していなかった。
【０００３】
従って、本発明に直接関連する従来技術は存在していないので、ここでは、この文字イメージシャッフル技術を発明するに至った動機（背景）について説明する。
【０００４】
アジア文書の文は一連の連続したアジア文字（それらはほぼ正方形である）から成り立っている。アジア文字は、文字単体として意味を有してはいるが、連続した文字（熟語若しくはセンテンス）として一つの意味を有し、人々に読まれ理解される場合もある。
【０００５】
このようなアジア文字からなるアジア文書の文字イメージをテキストデータに変換（以下、これを「テキスト化」という。）する場合、通常はそのアジア文書をイメージスキャナで読み取ってイメージ化し、それをＯＣＲソフトでテキスト化するが、現在のＯＣＲソフトの認識率（正しいテキストデータに変換できる率）は１００％とはならないため、人間の手による修正作業が不可欠となる。ここで、前記アジア文書が秘密文書でなければ、この修正作業を含めたテキスト化作業を外部の第三者に委託することもできるが、文書内容が秘密性の高いものであれば、そのままの状態でこれを第三者に委託することができない。
【０００６】
しかしながら、もしも、元の文書の文字がシャッフルされたら、シャッフル後の新しい文書は、（文字としては）読めるけれども、その元の意味を理解することはできなくなる。例えば、元の文書の第一行目の１番目、３番目、５番目そして４番目の文字を、それぞれ第二行目の１番目、３番目、５番目そして４番目の文字と交換したような場合である。言い換えるならば、アジア文字で書かれた文がシャッフルされると、文字単体としては読めるものの、シャッフル後の新たな文は完全に意味をなさなくなるということである。文書の文字に対して複数回のシャッフルがなされれば尚更である。
【０００７】
そこで、文字イメージをシャッフルし、文字の順序をばらばらにした状態にして第三者にテキスト化の委託をし、変換されたテキストデータを委託した者が自分で元の文字の順序に戻せば、第三者に意味内容を知られずに秘密文書のテキスト化が可能となる。
【０００８】
また、アジア文書をイメージ化して通信によって相手に伝える場合、これをそのままの状態で送信すると、通信が傍受されるおそれのある状況では、文書の秘密性を保持することができなくなる。この場合、イメージ化した文書の文字イメージをシャッフルしたものを送信すれば、傍受されても意味内容が第三者に知られるおそれがなくなり、正当な受信者は予め決められたリシャッフル方法によって文字イメージを元の並びに戻せば、文書内容を知ることができる。つまり、送信側におけるシャッフリングは、文書の文字の順番をバラバラにするための暗号化方法に相当し、受信側におけるリシャッフリングは、送られてきた文書の文字の並びを元に戻すための復号化方法に相当するものとなる。
【０００９】
【発明が解決しようとする課題】
しかしながら、この文字イメージのシャッフルをコンピュータによって自動的に行う場合、一つの問題がある。すなわち、アジア文字は人間の目には正方形のブロックが１つの文字として認識されるのに対し、コンピュータは文字をあくまでもイメージ（画像）として把握するため、一つの独立した意味を有している１個の文字（例えば、図１の「休」）が、分離した二つの部分（図１の、「イ」と「木」）から成り立っている場合は、これを１個の「休」という文字として認識せずに、「イ」という図形（外枠１で囲まれた部分）と「木」という図形（外枠２で囲まれた部分）に分割して認識してしまうという問題である。そして、コンピュータ内では、「イ」のある場所を例えば外枠１の頂点の座標として記憶し、「木」のある場所を例えば外枠２の頂点の座標として記憶する。従って、これを分割したまま別個にシャッフルすると、シャッフルした後は文字として成立せず、ＯＣＲを用いた文字のテキスト化ができなくなるという問題が発生する。
【００１０】
言うまでもなく、上記二つの部分の間にギャップが無いような文字の場合は、（たとえ一つ以上の分離した非正方形のブロックが中に含まれていたり、また、複数の外枠で囲まれた部分が重なり合っていたとしても）外枠はもはや正方形となり、一つの文字として認識されることになる。例えば、図２における「復」という文字の場合は、「復」という文字全体を囲む正方形の枠３の中に、小さな「ノ」を囲む４角形の枠４が完全に含まれているので、この場合は一つの文字として認識される。また、図１と同じ「休」という文字であっても、図３に示すような太い字体の「休」の場合は、「イ」と「木」の間にギャップがないので、初めから一つの文字として認識される。
【００１１】
そこで、上記のような分離した二つ以上の部分から成り立っているアジア文字イメージをシャッフルする場合に、その文字を構成している各部分が別個に移動されないで、一体として移動されるように、コンピュータによって文字を一体視（漢字の例で言えば、「へん」と「つくり」をバラバラに単独の文字として認識せずに、一つの漢字として一体的に認識すること）し抽出する技術が必要となる。逆に言えば、アジア文字イメージのシャッフルが成功するか否かは、コンピュータによってアジア文字イメージを一体視して抽出できるか否かにかかっている。
【００１２】
本発明は、かかるコンピュータによる文字認識における問題点に鑑み為されたものであり、本発明の目的は、イメージ化された文字をシャッフルし、シャッフル後の文字イメージのテキスト化が正しく行われるように、イメージ解析方法によって文中の文字を正しく一体視し抽出できるような、コンピュータによるアジア文字イメージのシャッフル方法及びシャッフル処理システムを提供することにある。
【００１３】
【課題を解決するための手段】
本発明は、コンピュータによるアジア文字イメージのシャッフル方法及びシャッフル処理システムに関し、本発明の上記目的を達成するための第１の発明は、スキャンされたアジア文書の文字イメージをシャッフルし、ランダムに並び替えて新たなアジア文書を作るためのアジア文字イメージのシャッフル方法であって、該方法は、次のステップを含むことを特徴とするものである。
【００１４】
ａ）アジア文書をスキャンしてそのイメージを取り込むステップ
ｂ）前記取り込んだアジア文書のイメージを、文字イメージ部分と非文字イメージ部分とに分割するステップ
ｃ）前記文字イメージ部分について、多解像度レベルの木構造を構築するステップ
ｄ）前記多解像度レベルの木構造を解析することによって、シャッフル可能な文字イメージの島を抽出するステップ
ｅ）一個又はそれ以上の文字イメージのブロックを含むシャッフル可能な島をランダムに選んでシャッフルし、該シャッフルされた島をリシャッフルして元の位置に戻すステップ
ｆ）前記リシャッフルされた文字イメージ部分と前記非文字イメージ部分を合体させて完全な元のアジア文書にするステップ
本発明の上記目的を達成するための第２の発明は、前記のアジア文字イメージのシャッフル方法におけるアジア文字イメージについての多解像度レベルの木構造を構築する方法が、さらに以下のステップを有することを特徴とするアジア文字イメージのシャッフル方法である。
【００１５】
ｉ）文字イメージがフルドットイメージになるまで解像度を粗くしていくことにより、文字イメージの多解像度ピラミッド構造を構築するステップ
ｉｉ）各解像度レベルにおけるピラミッド構造の中から、全ての島を見つけるステップ
ｉｉｉ）ピラミッド構造の各レベル間において木構造の各ノードの親子関係を構築するステップ
ｉｖ）木構造の各レベルのノードに、それぞれの位置情報とサイズ情報を持たせるステップ
本発明の上記目的を達成するための第３の発明は、前記のアジア文字イメージのシャッフル方法であって、２^{（ｊ−１）}×２^{（ｊ−１）}の解像度レベルにおける１ドットのまわりを囲むように塗りつぶすことにより２^ｊ×２^ｊの解像度レベルにおける１ドットが構成され、「島」は４連結の塗りつぶしによって結合されたイメージブロックの外枠として定義され、２^ｊ×２^ｊレベルと２^{（ｊ−１）}×２^{（ｊ−１）}レベルはノード上の親子関係を構成し、２^ｊ×２^ｊレベルにおける親ノードの島は、２^{（ｊ−１）}×２^{（ｊ−１）}レベルにおける「島」又は子ノードをすべて含むことを特徴とするアジア文字イメージのシャッフル方法である。
【００１６】
本発明の上記目的を達成するための第４の発明は、前記のアジア文字イメージのシャッフル方法であって、前記島のシャッフル方法がさらに以下のステップを有することを特徴とするアジア文字イメージのシャッフル方法である。
【００１７】
ａ）シャッフルされる文字島を配置するための空スペースを作るステップ
ｂ）シャッフルされる文字島を１個ずつランダムに選び、それを前記空スペースに移し換え、すべての空スペースを前記文字島で埋めるステップ
ｃ）シャッフルされた文字島に、シャッフルされる前のスキャンされた文字島の位置座標を属性として持たせるステップ
本発明の上記目的を達成するための第５の発明は、前記のアジア文字イメージのシャッフル方法であって、シャッフルされた文字島を元の位置に戻すためのリシャッフリングを行う際に、シャッフルされた文字島が属性として有する元の文字島の位置座標を用いることを特徴とするアジア文字イメージのシャッフル方法である。
【００１８】
本発明の上記目的を達成するための第６の発明は、前記のアジア文字イメージのシャッフル方法であって、スキャンされたアジア文書を文字イメージ部分と非文字イメージ部分に分けるステップを有し、さらに、該ステップがさらに以下のステップを有することを特徴とするアジア文字イメージのシャッフル方法である。
【００１９】
ａ）スキャンされた文書からアジア文字イメージ部分を分離するイメージ前処理技術を使用するステップ
ｂ）テキスト文字イメージ又は前景イメージのような、文字イメージのみを含むイメージを使用するステップ
ｃ）背景イメージのような非文字イメージのみを含むイメージを使用するステップ
本発明の上記目的を達成するための第７の発明は、前記のアジア文字イメージのシャッフル方法であって、前記イメージ前処理技術が、傾き補正、ノイズ除去、罫線検出、非文字イメージ検出を行うものであることを特徴とするアジア文字イメージのシャッフル方法である。
【００２０】
本発明の上記目的を達成するための第８の発明は、スキャンされたアジア文書の文字イメージをシャッフルし、ランダムに並び替えて新たなアジア文書を作るためのアジア文字イメージのシャッフル処理システムであって、該システムは、次の手段を含むことを特徴とするものである。
【００２１】
ａ）アジア文書をスキャンしてそのイメージを取り込むイメージ入力手段
ｂ）前記入力されたアジア文書のイメージを、文字イメージ部分と非文字イメージ部分とに分割する手段
ｃ）前記文字イメージ部分について、多解像度レベルの木構造を構築する手段
ｄ）前記多解像度レベルの木構造を解析することによって、シャッフル可能な文字イメージの島を抽出する手段
ｅ）一個又はそれ以上の文字イメージのブロックを含むシャッフル可能な島をランダムに選んでシャッフルし、該シャッフルされた島をリシャッフルして元の位置に戻す手段
ｆ）前記リシャッフルされた文字イメージ部分と前記非文字イメージ部分を合体させて完全な元のアジア文書にする手段
本発明の上記目的を達成するための第９の発明は、前記のアジア文字イメージのシャッフル処理システムにおけるアジア文字イメージについての多解像度レベルの木構造を構築する手段が、さらに以下の手段を有することを特徴とするアジア文字イメージのシャッフル処理システムである。
【００２２】
ｉ）文字イメージがフルドットイメージになるまで解像度を粗くしていくことにより、文字イメージの多解像度ピラミッド構造を構築する手段
ｉｉ）各解像度レベルにおけるピラミッド構造の中から、全ての島を見つける手段
ｉｉｉ）ピラミッド構造の各レベル間において木構造の各ノードの親子関係を構築する手段
ｉｖ）木構造の各レベルのノードに、それぞれの位置情報とサイズ情報を持たせる手段
本発明の上記目的を達成するための第１０の発明は、前記のアジア文字イメージのシャッフル処理システムであって、２^{（ｊ−１）}×２^{（ｊ−１）}の解像度レベルにおける１ドットのまわりを囲むように塗りつぶすことにより２^ｊ×２^ｊの解像度レベルにおける１ドットが構成され、「島」は４連結の塗りつぶしによって結合されたイメージブロックの外枠として定義され、２^ｊ×２^ｊレベルと２^{（ｊ−１）}×２^{（ｊ−１）}レベルはノード上の親子関係を構成し、２^ｊ×２^ｊレベルにおける親ノードの島は、２^{（ｊ−１）}×２^{（ｊ−１）}レベルにおける「島」又は子ノードをすべて含むことを特徴とするアジア文字イメージのシャッフル処理システムである。
【００２３】
本発明の上記目的を達成するための第１１の発明は、前記のアジア文字イメージのシャッフル処理システムであって、前記島のシャッフル手段がさらに以下の手段を有することを特徴とするアジア文字イメージのシャッフル処理システムである。
【００２４】
ａ）シャッフルされる文字島を配置するための空スペースを作る手段
ｂ）シャッフルされる文字島を１個ずつランダムに選び、それを前記空スペースに移し換え、すべての空スペースを前記文字島で埋める手段
ｃ）シャッフルされた文字島に、シャッフルされる前のスキャンされた文字島の位置座標を属性として持たせる手段
本発明の上記目的を達成するための第１２の発明は、前記のアジア文字イメージのシャッフル処理システムであって、前記シャッフルされた文字島を元の位置に戻すためのリシャッフリングを行う際に、シャッフルされた文字島が属性として有する元の文字島の位置座標を用いることを特徴とするアジア文字イメージのシャッフル処理システムである。
【００２５】
本発明の上記目的を達成するための第１３の発明は、スキャンされたアジア文書を文字イメージ部分と非文字イメージ部分に分ける手段を有する前記のアジア文字イメージのシャッフル処理システムであって、前記手段がさらに以下の手段を有することを特徴とするアジア文字イメージのシャッフル処理システムである。
【００２６】
ａ）スキャンされた文書からアジア文字イメージ部分を分離するイメージ前処理技術を使用する手段
ｂ）テキスト文字イメージ又は前景イメージのような、文字イメージのみを含むイメージを使用する手段
ｃ）背景イメージのような非文字イメージのみを含むイメージを使用する手段
本発明の上記目的を達成するための第１４の発明は、前記のアジア文字イメージのシャッフル処理システムにおける前記イメージ前処理技術が、傾き補正、ノイズ除去、罫線検出、非文字イメージ検出を行うものであることを特徴とするアジア文字イメージのシャッフル処理システムである。
【００２７】
本発明の上記目的を達成するための第１５の発明は、コンピュータを制御するためのプログラムを記録したコンピュータ読み取り可能な記録媒体であって、前記プログラムは、前記コンピュータに、スキャンされたアジア文書の文字イメージのシャッフル及びリシャッフルを下記ステップによって行わせるためのプログラムであることを特徴とするコンピュータ読み取り可能な記録媒体である。
【００２８】
ａ）アジア文書をスキャニングして、得られたイメージを文字イメージ部分と非文字イメージ部分に分けるステップ
ｂ）前記文字イメージについて、多解像度レベル木構造を構築するステップであって、該ステップは以下のステップを含む
ｉ）解像度レベルが２^０×２^０、２^１×２^１、２^２×２^２、…とフルドットになるまで文字イメージの解像度を変化させて文字イメージの多解像度ピラミッド構造を構築するステップ
ｉｉ）各解像度レベルにおけるイメージの中で、ドットが繋がっているものを一つの島と判断して、全ての島を検出するステップ
ｉｉｉ）ピラミッド構造になっている各解像度レベル間において、島の木構造のノードの親子関係を構築するステップ
ｉｖ）木構造の各ノードにそれぞれ一つ前のレベルの島の位置情報及びサイズ情報を属性として格納しておくステップ
ｃ）アジア文字イメージをシャッフルし、また、リシャッフルするステップ
【００２９】
【発明の実施の形態】
以下、本発明に係るアジア文字イメージのシャッフル方法及びシャッフル処理システムについて、図面を参照して詳細に説明する。
【００３０】
図４は、本発明に係るアジア文字イメージのシャッフル処理システムを含む全体構成を示す概念図である。アジア文字で書かれた紙媒体からなるアジア文書１０が、イメージスキャナ２１によってイメージ（画像）データとしてコンピュータ２２の記憶装置内に取り込まれ、コンピュータ２２に格納されているアジア文字イメージのシャッフル処理プログラムによって文字イメージがシャッフルされる。シャッフルされたアジア文字イメージはネットワーク３０を介して、前記文字イメージをテキスト化する第三者のコンピュータ４０に送られる。コンピュータ４０にはＯＣＲソフトが格納されており、前記シャッフルされたアジア文字イメージはテキストデータに変換される。該テキストデータはネットワーク３０を介して送信元のコンピュータ２２に送られ、リシャッフリング（元の文字の並びに戻すこと）される。イメージスキャナ２１、コンピュータ２２及びアジア文字イメージのシャッフル処理プログラムによってアジア文字イメージのシャッフル処理システム２０が構成されている。なお、秘密性の高いアジア文書をネットワーク３０を介して送信して相手に内容を伝える場合に、リシャッフルプログラムが格納されたコンピュータ５０で受信し、該リシャッフルプログラムでリシャッフルすれば、正しい文書内容を伝達することができる。逆に言えば、コンピュータ５０にリシャッフルプログラムが格納されていなければ、リシャッフルできないので、通信が傍受されても意味内容を第三者に知られることがない。
【００３１】
図５は、本発明に係るアジア文字イメージのシャッフル方法の手順を示したフローチャートである。この図に基づいて詳細に説明する。まず、シャッフルを行う文が記載されているアジア文書１０（図１２はその実例である。）をイメージスキャナ２１で読み取り、イメージを取り込む（ステップＳ１０）。取り込んだイメージには、文字の部分と文字を含まない部分（写真や図形など）が含まれているため、これを文字部分と非文字部分に分ける（ステップＳ２０）。その後、必要に応じて前処理を行う。ここに、前処理とは、スキャニングを行う際に紙が傾いていたために、行が傾いたまま読み込まれたような場合に、その傾きを無くするための「傾き補正」や、ゴミや汚れがついていたために、それが文字イメージの一部として読み込まれてしまったような場合にそれを除去するための「ノイズ除去」や、文字に罫線や下線が付されていたような場合に、それを文字の一部とみなさないように、予め「罫線検出」を行い、それを除去する処理等のことである。これらの前処理は、既存のＯＣＲソフト等に組み込まれている機能を利用して行うことが可能である。図１３は、図１２のイメージについて前処理が施された後のイメージである。
【００３２】
次に、ステップＳ２０において得られた文字イメージについて、多解像度レベルの木構造のデータ構造を構築する（ステップＳ３０）。これは、前述のように、分離した二つ以上の部分から成り立っているアジア文字イメージをシャッフルする場合に、その文字を構成している各部分が別個に移動されないで、一体として移動されるように、コンピュータによって文字を一体視（ｉｄｅｎｔｉｆｙ）し抽出することができるようにするために行うものである。そのために、入力した文字イメージの解像度を段階的に粗くし、文字の線の太さを太くしていき、前記分離した二つ以上の部分の間にギャップが無くなるまで行う。ギャップが無くなれば、コンピュータは前記分離した二つ以上の部分から成り立っているアジア文字イメージを一体視（つまり１文字として認識）することができる。最終的にシャッフルする文字イメージは、イメージスキャナで入力したときの文字イメージであるから、解像度を粗くする前の状態の文字イメージの属性（サイズ、位置）情報を記憶しておく必要がある。
【００３３】
そこで、コンピュータ内に文字イメージの属性を記憶する場合に、文字イメージのうち、線が繋がっている最大サイズのブロックの外側を枠で囲い、それを「島」（ｉｓｌａｎｄ）と名付け、島の位置は前記外枠の座標で表す。通常は対角線上の座標を用いる。図１の「休」という文字の例で言えば、それぞれ外枠１、外枠２で囲まれた「イ」及び「木」が「島」となる。図２の例では、外枠３又は外枠４で囲まれた部分が島であるが、島４は島３に含まれることになる。また、図３の例では「休」単独で一つの島である。
【００３４】
図６は、前記ステップＳ３０をさらに細かいステップに分けたものであり、以下、図６に基づいて説明する。
【００３５】
まず、読み込んだオリジナルの文字テキストイメージ（図７の（ａ）、（ｂ））がフルドットイメージ（すべて塗りつぶされた状態のこと）になるまで段階的に解像度を粗くしていくことにより、文字イメージの多解像度ピラミッド構造を構築する（ステップＳ３１）。
【００３６】
ここで、ピラミッド構造とは、コンピュータ画像処理の分野で用いられている階層構造によるデータ構造の一つであり、２^ｋ×２^ｋ画素からなる画像に対して、解像度（分解能）の異なる（２^０×２^０〜２^ｋ×２^ｋ）ｋ＋１枚の画像の階層的集積を考えたものであり、例えば、図８のような形で表現される。すなわち、入力画像Ｉ_０から出発して、順次、画素数が縦横とも１／２になる画像Ｉ_１，Ｉ_２，…を次々に発生させていく。逆に言えば、画像全体の大きさを一定にした場合の１ドットの大きさが４倍になっていくので、解像度が落ちてくることになる。
【００３７】
本発明の実施例における上記Ｉ_ｋ−１とＩ_ｋとの関係は、図９又は図１０に示す通りである。まず、図９に示すものは「４連結」法と呼んでいるものであり、入力イメージである図９（ａ）の１ピクセルの上下左右（これが４連結の意味である）を塗りつぶして２^１×２^１レベルのイメージ（同図（ｂ））を作る。次に、この４連結されたものを新たな１ドットとみなし、これを中心として上下左右に２^１×２^１レベルのドットを連結する。このようにして２^２×２^２レベルのイメージ（同図（ｃ））が出来上がる。以下、同様にして、２^３×２^３レベル（同図（ｄ））、２^４×２^４レベル（同図（ｅ））のイメージが形成され、多解像度レベルのピラミッド構造が構築される。図１０は、図９の４連結に対して、上下左右に加えて左右の斜め方向を加えた「８連結」法の場合を示したものである。基本的な考え方は同じなので、説明は省略する。
【００３８】
次に、ステップＳ３１において構築したピラミッド構造の各解像度レベルのイメージブロックの中から、前述の「島」を抽出する（ステップＳ３２）。島は、繋がったイメージブロックを囲む最大枠のことであり、島を抽出するということは、その外枠の座標をコンピュータに記憶させることを意味する。
【００３９】
次に、前のステップで得られた各解像度レベルごとの島の属性情報を、木構造のデータ構造とし、木構造の各ノード（結節点）について親子関係を構築する（ステップＳ３３）。図７を用いてこれを具体的に説明すると、入力画像の解像度を低くしていくと、次第に線が太くなり、隣り合う島が一つになって新しい島が生成されるため、２^ｋ×２^ｋレベルにある島は２^{（ｋ−１）}×２^{（ｋ−１）}レベルにある島を必ず含むことになる。具体的には、図７（ｆ）の島１は、一つ下のレベルの島２，３，４を含み、（ｅ）の島２は、その下のレベルの島５，６，７を含むという関係になる。このような関係は木構造のデータ構造を用いるのが適している。図７の例では、島１は木構造におけるルート（根）ノードになり、下のレベルの島２は島１の子ノードになるとともに、Ｌｅｖｅｌ２の島５，６，７に対応するノード（子）の親ノードにもなる。このようにして、各島に対応するノードを設け、それについての親子関係を構築する。
【００４０】
このようにして、各解像度レベルにおける島に対応するノードについて親子関係が構築されたら、各親ノードに子ノードに対応する島の属性情報（位置及びサイズ）を格納する（ステップＳ３４）。具体的には、図７において、Ｌｅｖｅｌ４における島９及び１０の属性情報は島８のノードに格納される。こうすることにより、コンピュータは、島８は島９と島１０とから成り立っていることを知ることができる。
【００４１】
ここで、人間であれば、図７の島２が３個の文字から出来ていることが目視できるが、コンピュータは１つのブロックとしてしか把握できない。しかしながら、アジア文字は１文字がほぼ正方形であるため、横幅と縦の長さとの比を計算することにより個数を推定することができる。
【００４２】
次に、前記多解像度レベルの木構造を解析することにより、シャッフル可能な文字ブロック（島）を抽出する（ステップＳ４０）。具体的には、木構造のルートノードに属する島（図７の（ｆ）の１）において、シャッフル可能な文字の個数がいくつあるかを計算で推定する。図７の場合では、読点「、」が含まれているので半端な数になってしまうが、その下のレベル（Ｌｅｖｅｌ１）を木構造の解析によってたどっていくことにより、島４は他に比べて極端に小さいことがわかる。そこで、これをシャッフル対象から除き、残りの４個をシャッフルの対象とする。
【００４３】
次に、シャッフルを行う文字島をランダムに選んで、シャッフルを行う（ステップＳ５０）。シャッフルの方法にはいろいろあるが、一つの方法は、図１１のフローチャートに示すようなものが考えられる。まず、シャッフルされる文字イメージを配置するための空きスペースを作る（ステップＳ５１）。次に、シャッフルされる文字イメージを１個ずつランダムに選び、それを前のステップで作った空きスペースに入れていき、空きスペースを文字イメージで全部埋める（ステップＳ５２）。なお、移動させる文字イメージの島は、入力画像のレベル（図７のＬｅｖｅｌ４）のを用いるとともに、その位置情報を属性として持たせる（ステップＳ５３）。元の位置にリシャッフルするときに必要になるからである。図１４は、図１３の文字イメージをシャッフルした後の文字イメージである。文字単独では判読できるが、全体として意味をなさないことが分かる。
【００４４】
以上のようにしてアジア文字イメージをシャッフルすることができるが、ＯＣＲによるテキスト化等の作業が終わった場合は、これをまた元に戻す必要がある。この場合は、シャッフルされた文字の島の位置情報を利用して戻す（ステップＳ６０）。最後に、リシャッフルされた文字イメージ部分と、非文字イメージ部分を合体させて、元のアジア文書を復元する（ステップＳ７０）。
【００４５】
なお、本発明による文字シャッフル技術は、アジア文字を対象としたものではあるが、アルファベットのような欧米各国で使用されている文字にも適用できるものであることは言うまでもない。
【００４６】
以上に述べた発明の実施の形態は単なる例示に過ぎず、本願発明の精神及び範囲を逸脱しない限りにおいて、様々な変形例が可能であることは言うまでもない。
【００４７】
【発明の効果】
以上述べた通り、本発明に係るアジア文字イメージのシャッフル方法及びシャッフル処理システムによれば、文字イメージをシャッフルし、文字の順序をばらばらにした状態にして第三者にテキスト化の委託をすることができるので、第三者に意味内容を知られずに秘密文書のテキスト化が可能となる。
【００４８】
また、アジア文書をイメージ化して通信によって相手に伝える場合、文字イメージをシャッフルしてから送信することができるので、通信が傍受されるおそれのある状況であっても、文書の秘密性を保持することができるという利点がある。
【図面の簡単な説明】
【図１】アジア文字の構成を説明するための図である。
【図２】文字島の概念を説明するための図である。
【図３】一体として認識されるアジア文字の例を示す図である。
【図４】本発明に係るアジア文字イメージシャッフルシステムの構成を示すブロック図である。
【図５】本発明に係るアジア文字イメージシャッフル方法（システム）のフローチャートである。
【図６】多解像度レベルの木構造を構築する方法（手段）のフローチャートである。
【図７】多解像度レベルのピラミッド構造及び木構造の一例を示す図である。
【図８】一般的な多解像度のピラミッド構造を説明するための図である。
【図９】４連結法を説明するための図である。
【図１０】８連結法を説明するための図である。
【図１１】シャッフル方法（手段）のフローチャートの一例を示す図である。
【図１２】アジア文書の一例を示す図である。
【図１３】前処理を行った後のアジア文書の一例を示す図である。
【図１４】シャッフル後のアジア文字イメージの一例である。
【符号の説明】
１０アジア文書（紙媒体）
２０アジア文字シャッフル処理システム
２１イメージスキャナ
２２コンピュータ（パソコン）
３０ネットワーク
４０第三者の端末
５０第三者の端末
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to the analysis of document images for the purpose of maintaining confidentiality, and more particularly to a technique for shuffling Asian character images so that a third party cannot learn the meaning of a confidential document written in Asian characters.
[0002]
[Prior art]
No technique has previously existed in which a document written in Asian characters (characters used in Asia, such as kanji, kana, and hangul; hereinafter simply an "Asian document") is scanned with an image scanner or the like and captured by a computer as image data, the resulting image is divided into a character portion and a non-character portion, and only the character image portion is shuffled (that is, the order of the character images is scrambled).
[0003]
Therefore, since there is no prior art directly related to the present invention, the motivation (background) for inventing the character image shuffling technology will be described here.
[0004]
A sentence in an Asian document consists of a series of consecutive Asian characters, each approximately square. Each Asian character has a meaning on its own, but a run of consecutive characters (a compound word or a sentence) may also carry a single meaning and be read and understood as such.
[0005]
When the character images of such an Asian document are converted into text data (hereinafter, "text conversion"), the document is usually read with an image scanner to produce an image, which is then converted to text by OCR software. However, because the recognition rate of current OCR software (the rate at which correct text data is produced) does not reach 100%, manual correction is indispensable. If the Asian document is not confidential, the text conversion work, including this correction, can be entrusted to an external third party; if its content is highly confidential, however, it cannot be entrusted to a third party as it is.
[0006]
However, if the characters of the original document are shuffled, the new document can still be read (as individual characters), but its original meaning can no longer be understood. An example is the case where the first, third, fifth, and fourth characters of the first line of the original document are exchanged with the first, third, fifth, and fourth characters of the second line, respectively. In other words, when a sentence written in Asian characters is shuffled, each character remains legible, but the resulting sentence becomes completely meaningless. This is all the more true if the characters of the document are shuffled multiple times.
[0007]
Therefore, if the character images are shuffled so that the characters are out of order before text conversion is entrusted to a third party, and the party who commissioned the work then restores the converted text data to the original character order, a confidential document can be converted to text without the third party learning its meaning.
[0008]
Likewise, when an Asian document is converted to an image and transmitted to another party, sending it as it is means that its confidentiality cannot be maintained where the communication may be intercepted. If instead the character images of the document image are shuffled before transmission, a third party cannot learn the meaning even if the transmission is intercepted, while the legitimate recipient can learn the content by returning the character images to their original arrangement using a predetermined reshuffling method. In other words, shuffling on the sending side corresponds to an encryption method that scrambles the order of the characters in the document, and reshuffling on the receiving side corresponds to a decryption method that restores that order.
[0009]
[Problems to be solved by the invention]
However, one problem arises when the shuffling of character images is performed automatically by a computer. To the human eye, an Asian character is a single square block recognized as one character, whereas a computer handles the character purely as an image. Consequently, when a single character with its own independent meaning (for example, 休 in FIG. 1) is made up of two separated parts (イ and 木 in FIG. 1), the computer does not recognize it as the one character 休 but splits it into the figure イ (the part surrounded by outer frame 1) and the figure 木 (the part surrounded by outer frame 2). Inside the computer, the location of イ is stored as, for example, the coordinates of the vertices of outer frame 1, and the location of 木 as, for example, the coordinates of the vertices of outer frame 2. If the two parts are then shuffled separately while still divided, they no longer form a character after shuffling, and text conversion by OCR becomes impossible.
[0010]
Needless to say, when there is no gap between the parts of a character (even if it contains one or more separate non-square blocks, or the parts enclosed by several outer frames overlap), the outer frame is simply a square and the character is recognized as a single character. For example, for the character 復 in FIG. 2, the rectangular frame 4 surrounding the small ノ is completely contained in the square frame 3 surrounding the entire character 復, so the whole is recognized as one character. Similarly, even for the same character 休 as in FIG. 1, a bold typeface such as that in FIG. 3 leaves no gap between イ and 木, so the character is recognized as a single character from the start.
[0011]
Therefore, when shuffling Asian character images made up of two or more separated parts as described above, a technique is needed by which the computer identifies each character as a whole unit (in the case of kanji, recognizing the left-hand radical "hen" and the right-hand component "tsukuri" together as one kanji rather than as separate characters) and extracts it, so that the parts making up the character are moved together rather than separately. Put another way, whether shuffling of Asian character images succeeds depends on whether the computer can identify and extract each character image as a whole unit.
[0012]
The present invention has been made in view of this problem in character recognition by computer. Its object is to provide a computer-based shuffling method and shuffle processing system for Asian character images in which the characters in a sentence are correctly identified as whole units and extracted by an image analysis method, so that the imaged characters can be shuffled and the shuffled character images can still be converted to text correctly.
[0013]
[Means for Solving the Problems]
The present invention relates to a computer-based shuffling method and shuffle processing system for Asian character images. A first invention for achieving the above object is a method of shuffling Asian character images in which the character images of a scanned Asian document are shuffled and randomly rearranged to create a new Asian document, the method comprising the following steps.
[0014]
a) scanning an Asian document and capturing its image
b) dividing the captured Asian document image into a character image portion and a non-character image portion
c) constructing a multi-resolution level tree structure for the character image portion
d) extracting shuffleable character image islands by analyzing the multi-resolution level tree structure
e) randomly selecting and shuffling shuffleable islands each containing one or more character image blocks, and reshuffling the shuffled islands to return them to their original positions
f) combining the reshuffled character image portion and the non-character image portion into the complete original Asian document
A second invention for achieving the above object is the above method of shuffling Asian character images, wherein the method of constructing the multi-resolution level tree structure for the Asian character images further comprises the following steps.
[0015]
i) constructing a multi-resolution pyramid structure of the character image by making the resolution progressively coarser until the character image becomes a full-dot image
ii) finding all islands in the pyramid structure at each resolution level
iii) constructing the parent-child relationships of the nodes of the tree structure between the levels of the pyramid structure
iv) giving the nodes at each level of the tree structure their respective position information and size information
A third invention for achieving the above object is the above method of shuffling Asian character images, wherein one dot at the 2^j × 2^j resolution level is formed by filling in the area surrounding one dot at the 2^{j-1} × 2^{j-1} resolution level; an "island" is defined as the outer frame of an image block joined by the 4-connected fill; the 2^j × 2^j level and the 2^{j-1} × 2^{j-1} level form a parent-child relationship between nodes; and the island of a parent node at the 2^j × 2^j level contains all of the "islands" or child nodes at the 2^{j-1} × 2^{j-1} level.
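The level-to-level relationship described above can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows and approximates each coarsening step by filling the four neighbours (up, down, left, right) of every set dot by one pixel. The names `dilate4` and `build_pyramid` and the `max_levels` cap are hypothetical and do not come from the patent, and the patent's own fill may enlarge the dot at each level rather than grow by a single pixel.

```python
from typing import List

Grid = List[List[int]]  # binary image: 1 = filled dot, 0 = empty


def dilate4(img: Grid) -> Grid:
    """Return a copy in which the 4 neighbours of every filled dot are filled."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            if img[y][x]:
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] = 1
    return out


def build_pyramid(img: Grid, max_levels: int = 16) -> List[Grid]:
    """Coarsen repeatedly until the image is fully filled (a full-dot image).

    max_levels is an assumed safety cap so that an empty image terminates.
    """
    levels = [img]
    while len(levels) < max_levels and not all(all(row) for row in levels[-1]):
        levels.append(dilate4(levels[-1]))
    return levels


# A single dot in a 3x3 image becomes a plus shape, then a full-dot image.
levels = build_pyramid([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

In this sketch the first entry of `levels` is the scanned image and the last is the fully filled one, so the list runs from the finest level to the coarsest.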
[0016]
A fourth invention for achieving the above object is the above method of shuffling Asian character images, wherein the method of shuffling the islands further comprises the following steps.
[0017]
a) creating empty spaces in which the character islands to be shuffled will be placed
b) randomly selecting the character islands to be shuffled one at a time, moving each into one of the empty spaces, and filling all of the empty spaces with the character islands
c) giving each shuffled character island, as an attribute, the position coordinates of the scanned character island before shuffling
A fifth invention for achieving the above object is the above method of shuffling Asian character images, wherein, when reshuffling is performed to return the shuffled character islands to their original positions, the position coordinates of the original character island that each shuffled character island holds as an attribute are used.
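Steps a) to c) and the reshuffling that relies on the stored coordinates can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the `Island` record, its field names, and the use of a character as a stand-in for the island's pixel data are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) of an island's top-left corner


@dataclass
class Island:
    glyph: str        # stand-in for the island's pixel data (assumption)
    position: Point   # where the island currently sits
    original: Point   # where it sat in the scanned image


def shuffle_islands(islands: List[Island], rng: random.Random) -> List[Island]:
    slots = [isl.position for isl in islands]   # a) the empty spaces to fill
    pool = list(islands)
    rng.shuffle(pool)                           # b) random selection order
    # c) each moved island keeps its pre-shuffle coordinates as an attribute
    return [Island(isl.glyph, slot, isl.original) for slot, isl in zip(slots, pool)]


def reshuffle_islands(shuffled: List[Island]) -> List[Island]:
    # Move every island back to the coordinates stored in its attribute.
    restored = [Island(isl.glyph, isl.original, isl.original) for isl in shuffled]
    return sorted(restored, key=lambda isl: (isl.position[1], isl.position[0]))


text = "秘密文書"
scanned = [Island(ch, (10 * i, 0), (10 * i, 0)) for i, ch in enumerate(text)]
shuffled = shuffle_islands(scanned, random.Random(0))
restored = reshuffle_islands(shuffled)
```

Because the original coordinates travel with each island, restoring the layout in this sketch needs no separate record of the random choices.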
[0018]
A sixth invention for achieving the above object is the above method of shuffling Asian character images, comprising the step of dividing the scanned Asian document into a character image portion and a non-character image portion, wherein that step further comprises the following steps.
[0019]
a) using an image preprocessing technique to separate the Asian character image portion from the scanned document
b) using an image containing only character images, such as a text character image or a foreground image
c) using an image containing only non-character images, such as a background image
A seventh invention for achieving the above object is the above method of shuffling Asian character images, wherein the image preprocessing technique performs skew correction, noise removal, ruled-line detection, and non-character image detection.
[0020]
An eighth invention for achieving the above object is a shuffle processing system for Asian character images that shuffles the character images of a scanned Asian document and randomly rearranges them to create a new Asian document, the system comprising the following means.
[0021]
a) image input means for scanning an Asian document and capturing its image
b) means for dividing the input Asian document image into a character image portion and a non-character image portion
c) means for constructing a multi-resolution level tree structure for the character image portion
d) means for extracting shuffleable character image islands by analyzing the multi-resolution level tree structure
e) means for randomly selecting and shuffling shuffleable islands each containing one or more character image blocks, and for reshuffling the shuffled islands to return them to their original positions
f) means for combining the reshuffled character image portion and the non-character image portion into the complete original Asian document
A ninth invention for achieving the above object is the above shuffle processing system for Asian character images, wherein the means for constructing the multi-resolution level tree structure for the Asian character images further comprises the following means.
[0022]
i) means for constructing a multi-resolution pyramid structure of the character image by making the resolution progressively coarser until the character image becomes a full-dot image
ii) means for finding all islands in the pyramid structure at each resolution level
iii) means for constructing the parent-child relationships of the nodes of the tree structure between the levels of the pyramid structure
iv) means for giving the nodes at each level of the tree structure their respective position information and size information
A tenth invention for achieving the above object is the above shuffle processing system for Asian character images, wherein one dot at the 2^j × 2^j resolution level is formed by filling in the area surrounding one dot at the 2^{j-1} × 2^{j-1} resolution level; an "island" is defined as the outer frame of an image block joined by the 4-connected fill; the 2^j × 2^j level and the 2^{j-1} × 2^{j-1} level form a parent-child relationship between nodes; and the island of a parent node at the 2^j × 2^j level contains all of the "islands" or child nodes at the 2^{j-1} × 2^{j-1} level.
[0023]
An eleventh invention for achieving the above object is the above shuffle processing system for Asian character images, wherein the means for shuffling the islands further comprises the following means.
[0024]
a) means for creating empty spaces in which the character islands to be shuffled will be placed
b) means for randomly selecting the character islands to be shuffled one at a time, moving each into one of the empty spaces, and filling all of the empty spaces with the character islands
c) means for giving each shuffled character island, as an attribute, the position coordinates of the scanned character island before shuffling
A twelfth invention for achieving the above object is the above shuffle processing system for Asian character images, wherein, when reshuffling is performed to return the shuffled character islands to their original positions, the position coordinates of the original character island that each shuffled character island holds as an attribute are used.
[0025]
A thirteenth invention for achieving the above object is the above shuffle processing system for Asian character images having means for dividing a scanned Asian document into a character image portion and a non-character image portion, wherein that means further comprises the following means.
[0026]
a) means for using an image preprocessing technique to separate the Asian character image portion from the scanned document
b) means for using an image containing only character images, such as a text character image or a foreground image
c) means for using an image containing only non-character images, such as a background image
A fourteenth invention for achieving the above object is the above shuffle processing system for Asian character images, wherein the image preprocessing technique performs skew correction, noise removal, ruled-line detection, and non-character image detection.
[0027]
A fifteenth invention for achieving the above object is a computer-readable recording medium on which a program for controlling a computer is recorded, wherein the program causes the computer to shuffle and reshuffle the character images of a scanned Asian document by the following steps.
[0028]
a) scanning an Asian document and dividing the obtained image into a character image portion and a non-character image portion
b) constructing a multi-resolution level tree structure for the character image, this step including the following steps:
i) constructing a multi-resolution pyramid structure of the character image by varying the resolution of the character image through the levels 2^0 × 2^0, 2^1 × 2^1, 2^2 × 2^2, ... until it becomes a full-dot image
ii) detecting all islands by treating each group of connected dots in the image at each resolution level as one island
iii) constructing the parent-child relationships of the nodes of the island tree structure between the resolution levels of the pyramid structure
iv) storing in each node of the tree structure, as attributes, the position information and size information of the islands at the immediately preceding level
c) shuffling and reshuffling the Asian character images
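Step ii) above, in which connected dots are treated as one island, can be sketched as a search for 4-connected components whose bounding boxes serve as the islands' outer frames. The function name `find_islands`, the breadth-first search, and the `(left, top, right, bottom)` box layout are assumptions made for illustration; the patent does not specify how connected dots are traced.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def find_islands(img: List[List[int]]) -> List[Box]:
    """Return the bounding box of every 4-connected group of filled dots."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    islands: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not img[y][x] or seen[y][x]:
                continue
            left = right = x
            top = bottom = y
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first walk over one connected group
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            islands.append((left, top, right, bottom))
    return islands


# Two parts separated by an empty column, like the two halves of 休.
glyph = [
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
]
islands = find_islands(glyph)
```

On this toy glyph the sketch reports two islands, which is the split that the coarser pyramid levels are meant to merge.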
[0029]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a shuffling method and a shuffling processing system for Asian character images according to the present invention will be described in detail with reference to the drawings.
[0030]
FIG. 4 is a conceptual diagram showing the overall configuration, including the shuffle processing system for Asian character images according to the present invention. An Asian document 10, a paper document written in Asian characters, is captured by an image scanner 21 as image data into the storage device of a computer 22, and its character images are shuffled by an Asian character image shuffle processing program stored on the computer 22. The shuffled Asian character images are sent via a network 30 to a computer 40 belonging to the third party who converts the character images to text. OCR software is stored on the computer 40, and the shuffled Asian character images are converted into text data. The text data is sent back via the network 30 to the originating computer 22, where it is reshuffled (returned to the original character order). The image scanner 21, the computer 22, and the Asian character image shuffle processing program together constitute the Asian character image shuffle processing system 20. When a highly confidential Asian document is transmitted via the network 30 to convey its content to another party, the correct document content can be conveyed if it is received by a computer 50 on which a reshuffle program is stored and reshuffled by that program. Conversely, if the reshuffle program is not stored on the computer 50, reshuffling is not possible, so even if the communication is intercepted, a third party cannot learn the meaning.
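The round trip between computer 22 and the third party's computer 40 can be sketched as follows. The sketch assumes that the sender keeps the shuffle order as a list of indices and applies its inverse to the returned text; the patent does not state how the returned text data is reordered, and the function names, the seed, and the sample string are hypothetical. OCR is simulated by simply joining the shuffled characters.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_with_order(items: Sequence[str], seed: int) -> Tuple[List[str], List[int]]:
    """Shuffle items and return them with the order needed to undo the shuffle."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    # Slot k of the shuffled sequence holds the item that was at order[k].
    return [items[i] for i in order], order


def restore(shuffled: Sequence[str], order: Sequence[int]) -> List[str]:
    """Put each item back at the index it came from."""
    restored = [""] * len(order)
    for slot, source in enumerate(order):
        restored[source] = shuffled[slot]
    return restored


original = "機密文書の内容"
shuffled_chars, order = shuffle_with_order(list(original), seed=2003)

# The third party only ever sees the shuffled characters (OCR simulated here).
ocr_text = "".join(shuffled_chars)

# Back on the sender's side, the retained order restores the text.
recovered = "".join(restore(list(ocr_text), order))
```

In this sketch only `ocr_text` crosses the network; `order` stays with the sender, which mirrors the point that a party without the means to reshuffle cannot recover the content.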
[0031]
FIG. 5 is a flowchart illustrating the procedure of the Asian character image shuffling method according to the present invention, which is described in detail below with reference to that figure. First, an Asian document 10 containing the text to be shuffled (FIG. 12 shows an actual example) is read by the image scanner 21 and its image is captured (step S10). Since the captured image includes both character portions and portions that contain no characters (such as photographs and figures), it is divided into a character portion and a non-character portion (step S20). Preprocessing is then performed as necessary. Here, preprocessing refers to "skew correction," which removes the tilt when the paper was scanned at an angle and the lines were read tilted; "noise removal," which removes dust or dirt that was on the paper and was read as part of a character image; and "ruled line detection," which detects underlines and other ruled lines in advance and removes them so that they are not regarded as part of a character. These preprocessing steps can be performed using functions built into existing OCR software or the like. FIG. 13 shows the image after preprocessing has been applied to the captured image.
[0032]
Next, a multi-resolution-level tree data structure is constructed for the character image obtained in step S20 (step S30). As described above, this is done so that, when an Asian character image consisting of two or more separate parts is shuffled, the parts making up the character are moved together as one unit rather than separately, and so that the computer can identify and extract whole characters. To this end, the resolution of the input character image is gradually reduced, thickening the character strokes, until the gaps between the separated parts disappear. Once the gaps are gone, the computer can recognize an Asian character image composed of two or more separated parts integrally, that is, as a single character. Because the character images that are ultimately shuffled are the ones input by the image scanner, the attribute information (size and position) of the character images before the resolution was reduced must be stored.
[0033]
Therefore, to store the attributes of a character image in the computer, the largest block of connected strokes in the character image is enclosed in a frame, which is called an "island" and is represented by the coordinates of its outer frame, usually the coordinates of two diagonally opposite corners. In the example of the character 休 ("rest") in FIG. 1, the components イ and 木 ("tree"), enclosed by outer frame 1 and outer frame 2 respectively, are each an island. In the example of FIG. 2, the portions enclosed by outer frame 3 and outer frame 4 are islands, with island 4 contained in island 3. In the example of FIG. 3, 休 as a whole is a single island.
[0034]
FIG. 6 breaks step S30 down into smaller steps, which are described below with reference to that figure.
[0035]
First, a multi-resolution pyramid structure of the image is constructed by gradually reducing the resolution of the original scanned character image ((a) and (b) in FIG. 7) until it becomes a full-dot image (a state in which the characters are completely filled in) (step S31).
[0036]
Here, the pyramid structure is one of the hierarchical data structures used in the field of computer image processing. For an image composed of $2^k \times 2^k$ pixels, it is a hierarchical stack of $k + 1$ images with different resolutions ($2^0 \times 2^0$ to $2^k \times 2^k$), and is expressed, for example, in the form shown in FIG. 8. That is, starting from the input image $I_0$, the images $I_1, I_2, \ldots$, in each of which the number of pixels is halved in both the vertical and horizontal directions, are generated one after another. Conversely, when the size of the entire image is kept constant, the size of one dot is quadrupled at each step, so the resolution is reduced.
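As a rough illustration of the general pyramid described above, the following sketch builds the stack $I_0, I_1, \ldots, I_k$ for a square binary image by halving its width and height at each step. The reduction rule (a coarse dot is black if any pixel in its 2 x 2 block is black) and the names `halve` and `build_pyramid` are assumptions made for illustration; the text does not specify them.

```python
def halve(image):
    """Halve a square binary image (list of rows of 0/1) in both directions.

    Assumption: a coarse dot is set when any pixel of its 2x2 block is set.
    """
    n = len(image) // 2
    return [
        [
            1 if (image[2 * r][2 * c] or image[2 * r][2 * c + 1]
                  or image[2 * r + 1][2 * c] or image[2 * r + 1][2 * c + 1]) else 0
            for c in range(n)
        ]
        for r in range(n)
    ]


def build_pyramid(image):
    """Return [I_0, I_1, ..., I_k] for a 2^k x 2^k binary image."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    return levels


# A 4x4 input (k = 2) yields k + 1 = 3 images of side 4, 2 and 1.
sample = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
pyramid = build_pyramid(sample)
print([len(level) for level in pyramid])
```

Each level has a quarter of the pixels of the one before it, which matches the statement that one dot covers four times the area when the overall image size is held constant.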
[0037]
In the embodiment of the present invention, the relationship between $I_{k-1}$ and $I_k$ is as shown in FIG. 9 or FIG. 10. The method shown in FIG. 9 is called the "4-connection" method. The dots above, below, to the left of, and to the right of one pixel in (a) of the figure are filled in (this is what 4-connection means), producing the image at the $2^1 \times 2^1$ level ((b) in the figure). Next, the 4-connected block is regarded as a new single dot, and the dots above, below, to the left of, and to the right of it are connected in the same way at the $2^1 \times 2^1$ level. This yields the image at the $2^2 \times 2^2$ level ((c) in the figure). The images at the $2^3 \times 2^3$ level ((d) in the figure) and the $2^4 \times 2^4$ level ((e) in the figure) are formed in the same way, and a multi-resolution-level pyramid structure is constructed. FIG. 10 shows the "8-connection" method, which adds the diagonal directions to the vertical and horizontal directions of the 4-connection method in FIG. 9. Since the basic concept is the same, its description is omitted.
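The 4-connection and 8-connection steps can be approximated by a binary dilation that is repeated until the image is completely filled. The sketch below is a simplification under stated assumptions: the image is a list of rows of 0/1 values containing at least one black pixel, strokes grow by one pixel per level rather than by the exact dot sizes drawn in FIG. 9, and the names `dilate` and `thicken_until_full` are hypothetical.

```python
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # up, down, left, right
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # plus the diagonals


def dilate(image, neighbors=FOUR):
    """Fill the neighbors of every black pixel (one thickening step)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(h):
        for c in range(w):
            if image[r][c]:
                for dr, dc in neighbors:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out


def thicken_until_full(image, neighbors=FOUR):
    """Return the list of images from the input up to the full-dot image."""
    if not any(any(row) for row in image):
        raise ValueError("image must contain at least one black pixel")
    levels = [image]
    while not all(all(row) for row in levels[-1]):
        levels.append(dilate(levels[-1], neighbors))
    return levels


dot = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(len(thicken_until_full(dot, FOUR)))   # levels needed with 4-connection
print(len(thicken_until_full(dot, EIGHT)))  # levels needed with 8-connection
```

Because 8-connection also fills the diagonals, it closes gaps faster and reaches the full-dot image in fewer levels than 4-connection on the same input.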
[0038]
Next, the "islands" described above are extracted from the image blocks at each resolution level of the pyramid structure constructed in step S31 (step S32). An island is the largest frame enclosing a connected image block, and extracting an island means storing the coordinates of its outer frame in the computer.
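Island extraction at one resolution level can be sketched as connected-component labeling that records only the bounding frame of each component. This is a minimal sketch, assuming 4-connectivity and a binary image given as a list of rows of 0/1 values; the function name `extract_islands` and the `(top, left, bottom, right)` tuple layout are illustrative choices, not part of the original text.

```python
from collections import deque


def extract_islands(image):
    """Return the outer frames (top, left, bottom, right) of 4-connected blocks."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    islands = []
    for r in range(h):
        for c in range(w):
            if image[r][c] and not seen[r][c]:
                top, left, bottom, right = r, c, r, c
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first walk over one connected block
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                islands.append((top, left, bottom, right))
    return islands


page = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
print(extract_islands(page))
```

The two corner coordinates stored per island correspond to the diagonal coordinates of the outer frame mentioned earlier in the description.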
[0039]
Next, the attribute information of the islands at each resolution level obtained in the previous step is organized into a tree data structure, and parent-child relationships are constructed among the nodes of the tree (step S33). This is described in detail with reference to FIG. 7. As the resolution of the input image is reduced, the strokes gradually thicken, and adjacent islands merge into a new island. An island at the $2^k \times 2^k$ level therefore necessarily contains the islands at the $2^{k-1} \times 2^{k-1}$ level. Specifically, island 1 in (f) of FIG. 7 contains islands 2, 3, and 4 at the next lower level, and island 2 in (e) of FIG. 7 contains islands 5, 6, and 7 at the level below it. A tree data structure is well suited to such containment relationships. In the example of FIG. 7, island 1 is the root node of the tree, and island 2 at the next lower level is a child node of island 1 and at the same time the parent node of the nodes corresponding to islands 5, 6, and 7 of Level 2. In this way, a node is provided for each island, and parent-child relationships are established among the nodes.
[0040]
After the parent-child relationship is established for the node corresponding to the island at each resolution level in this way, the attribute information (position and size) of the island corresponding to the child node is stored in each parent node (step S34). Specifically, in FIG. 7, the attribute information of the islands 9 and 10 in Level 4 is stored in the node of Island 8. By doing so, the computer can know that the island 8 is composed of the island 9 and the island 10.
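The tree of steps S33 and S34 can be sketched by linking each island to the coarser-level island whose frame contains it. This is a minimal sketch under assumptions: islands are `(top, left, bottom, right)` frames, `levels[0]` is the coarsest level, and the class `IslandNode`, the helper `contains`, and the level indices are hypothetical names that do not follow the level numbering of FIG. 7.

```python
from dataclasses import dataclass, field


@dataclass
class IslandNode:
    box: tuple                      # (top, left, bottom, right): position attribute
    level: int                      # 0 = coarsest level in this sketch
    children: list = field(default_factory=list)

    @property
    def size(self):
        top, left, bottom, right = self.box
        return (bottom - top + 1, right - left + 1)  # (height, width)


def contains(outer, inner):
    """True when the frame `inner` lies entirely inside the frame `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def build_tree(levels):
    """levels[0] holds the coarsest frames; each later list is one level finer."""
    nodes = [[IslandNode(box, i) for box in boxes] for i, boxes in enumerate(levels)]
    for coarse, fine in zip(nodes, nodes[1:]):
        for child in fine:
            for parent in coarse:
                if contains(parent.box, child.box):
                    parent.children.append(child)  # parent keeps child attributes
                    break
    return nodes[0]


levels = [
    [(0, 0, 9, 29)],                                  # one merged island (root)
    [(0, 0, 9, 19), (0, 22, 9, 29)],                  # two islands
    [(0, 0, 9, 8), (0, 11, 9, 19), (0, 22, 9, 29)],   # three separate characters
]
root = build_tree(levels)[0]
print(root.size, [child.box for child in root.children])
```

Since every parent holds its child nodes, the position and size of the islands one level finer can be read directly from the parent, which is the property the description relies on.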
[0041]
Here, a human can see that island 2 in FIG. 7 consists of three characters, but the computer can grasp it only as a single block. However, since an Asian character is almost square, the number of characters can be estimated by calculating the ratio between the width and the height of the island.
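The width-to-height estimate can be written in a few lines. This is a minimal sketch, assuming horizontally written text and roughly square glyphs; the function name `estimate_char_count` and the rounding rule are illustrative assumptions.

```python
def estimate_char_count(width, height):
    """Estimate how many square characters fit in a horizontal island."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # A single square character has width / height close to 1.
    return max(1, round(width / height))


print(estimate_char_count(300, 100))  # an island three times as wide as it is tall
```

For vertically written text the same idea would apply with the roles of width and height swapped.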
[0042]
Next, the shuffleable character blocks (islands) are extracted by analyzing the multi-resolution-level tree structure (step S40). Specifically, the number of shuffleable characters in the island belonging to the root node of the tree (island 1 in (f) of FIG. 7) is estimated by calculation. In the case of FIG. 7, the island includes the comma ",", so the estimated number does not come out evenly. However, tracing the tree down to the level below (Level 1) shows that island 4 is extremely small compared with the others. It is therefore excluded from the shuffle targets, and the remaining four characters become the shuffle targets.
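Excluding islands that are much smaller than the rest, such as punctuation, can be sketched as a size filter over sibling islands. The threshold `ratio=0.5`, the use of the tallest island as the reference line height, and the name `split_shuffle_targets` are assumptions for illustration; the text only says that the excluded island is extremely small compared with the others.

```python
def split_shuffle_targets(boxes, ratio=0.5):
    """Split frames (top, left, bottom, right) into shuffle targets and excluded ones.

    An island is excluded when its longer side is below `ratio` times the
    tallest island's height (an assumed stand-in for the line height).
    """
    def dims(box):
        top, left, bottom, right = box
        return bottom - top + 1, right - left + 1  # (height, width)

    line_height = max(dims(box)[0] for box in boxes)
    targets, excluded = [], []
    for box in boxes:
        height, width = dims(box)
        if max(height, width) >= ratio * line_height:
            targets.append(box)
        else:
            excluded.append(box)
    return targets, excluded


siblings = [
    (0, 0, 9, 29),    # a three-character block
    (0, 32, 9, 41),   # a single character
    (7, 44, 9, 46),   # a small mark such as a comma
]
targets, excluded = split_shuffle_targets(siblings)
print(targets, excluded)
```

Comparing the longer side rather than the area keeps wide but flat characters among the targets while still dropping marks that are small in both directions.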
[0043]
Next, the character islands to be shuffled are randomly selected and shuffled (step S50). There are various shuffling methods; one of them, shown as a flowchart in the drawings, proceeds as follows. First, empty spaces in which the character images to be shuffled will be placed are created (step S51). Next, the character images to be shuffled are randomly selected one at a time and placed into the empty spaces created in the previous step until all the empty spaces are filled with character images (step S52). The islands at the level of the input image (Level 4 in FIG. 7) are used as the islands of the character images to be moved, and each is given its position information as an attribute (step S53). This information is needed to reshuffle the characters back to their original positions. FIG. 14 shows the character image after the character image of FIG. 13 has been shuffled. The individual characters are legible, but the text as a whole makes no sense.
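Steps S51 to S53 can be sketched as follows. This is a minimal sketch, assuming every island fits into every empty space (equal-sized characters) and that each island is given as a `(position, image)` pair; the function name `shuffle_islands` and the record keys are hypothetical.

```python
import random


def shuffle_islands(islands, rng=None):
    """Place randomly chosen islands into empty slots, remembering their origin.

    islands: list of (position, image) pairs.
    Returns one record per slot with the new and the original position.
    """
    rng = rng or random.Random()
    slots = [position for position, _ in islands]   # S51: create the empty spaces
    remaining = list(islands)
    placed = []
    for slot in slots:                               # S52: fill the slots one by one
        original_position, image = remaining.pop(rng.randrange(len(remaining)))
        placed.append({
            "position": slot,
            "original_position": original_position,  # S53: keep as an attribute
            "image": image,
        })
    return placed


characters = [((0, 0), "A"), ((0, 10), "B"), ((0, 20), "C"), ((0, 30), "D")]
shuffled = shuffle_islands(characters, random.Random(0))
print([item["image"] for item in shuffled])
```

Passing a seeded `random.Random` makes a run repeatable for testing; in practice the stored `original_position` attribute, not the seed, is what allows the order to be restored.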
[0044]
Asian character images can be shuffled as described above. Once work such as OCR text conversion has been completed, however, the original arrangement must be restored. This is done by returning each shuffled character to its original place using the position information of its island (step S60). Finally, the reshuffled character image portion and the non-character image portion are combined to restore the original Asian document (step S70).
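The reshuffle of step S60 amounts to moving each item back to the position stored as its attribute. The sketch below assumes a hypothetical record layout in which each shuffled item carries `original_position` and a payload under `content` (a character image, or the text recognized from it); the function name `reshuffle` is likewise illustrative.

```python
def reshuffle(placed):
    """Return every item to the position stored in its 'original_position'."""
    restored = [
        {"position": item["original_position"], "content": item["content"]}
        for item in placed
    ]
    # Reading order in this sketch is simply ascending (row, column) position.
    return sorted(restored, key=lambda item: item["position"])


# Shuffled records as they might come back after OCR (illustrative data).
received = [
    {"position": (0, 0), "original_position": (0, 20), "content": "C"},
    {"position": (0, 10), "original_position": (0, 0), "content": "A"},
    {"position": (0, 20), "original_position": (0, 10), "content": "B"},
]
restored = reshuffle(received)
print("".join(item["content"] for item in restored))
```

A party that holds the shuffled items without the original-position attributes has no way to perform this step.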
[0045]
Although the character shuffling technique according to the present invention is intended for Asian characters, it can of course also be applied to characters used in Western countries, such as the alphabet.
[0046]
The embodiments of the present invention described above are merely examples, and it goes without saying that various modifications can be made without departing from the spirit and scope of the present invention.
[0047]
[Effect of the Invention]
As described above, with the shuffle method and shuffle processing system for Asian character images according to the present invention, the character images are shuffled so that the order of the characters is randomized before text conversion is entrusted to a third party. A secret document can therefore be converted into text without the third party learning the meaning of its content.
[0048]
In addition, when an Asian document is converted into an image and transmitted to another party, the character images can be shuffled before transmission, which has the advantage that the confidentiality of the document is maintained even where the communication may be intercepted.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining the configuration of Asian characters.
FIG. 2 is a diagram for explaining the concept of a character island.
FIG. 3 is a diagram showing an example of an Asian character recognized as a single unit.
FIG. 4 is a block diagram illustrating the configuration of the Asian character image shuffle system according to the present invention.
FIG. 5 is a flowchart of the Asian character image shuffling method (system) according to the present invention.
FIG. 6 is a flowchart of a method (means) for constructing a tree structure of a multi-resolution level.
FIG. 7 is a diagram illustrating an example of a pyramid structure and a tree structure at a multi-resolution level.
FIG. 8 is a diagram for explaining a general multi-resolution pyramid structure.
FIG. 9 is a diagram for explaining a 4-connection method.
FIG. 10 is a diagram for explaining an 8-connection method.
FIG. 11 is a diagram showing an example of a flowchart of a shuffling method (means).
FIG. 12 is a diagram illustrating an example of an Asian document.
FIG. 13 is a diagram illustrating an example of an Asian document after performing preprocessing.
FIG. 14 is an example of an Asian character image after shuffling.
[Explanation of symbols]
10 Asian document (paper)
20 Asian Character Shuffle Processing System
21 Image Scanner
22 Computer (PC)
30 Network
40 Third party terminal
50 Third party terminal

Claims

A method for shuffling Asian character images, in which the character images of a scanned Asian document are shuffled and randomly rearranged to create a new Asian document, the method comprising the following steps:
a) scanning an Asian document and capturing its image;
b) dividing the captured image of the Asian document into a character image portion and a non-character image portion;
c) constructing a multi-resolution-level tree structure for the character image portion;
d) extracting shuffleable character image islands by analyzing the multi-resolution-level tree structure;
e) randomly selecting and shuffling shuffleable islands, each containing one or more character image blocks, and reshuffling the shuffled islands to return them to their original positions;
f) combining the reshuffled character image portion and the non-character image portion into the complete original Asian document.

The method for shuffling Asian character images according to claim 1, wherein constructing the multi-resolution-level tree structure for the Asian character image further comprises the following steps:
i) constructing a multi-resolution pyramid structure of the character image by coarsening the resolution until the character image becomes a full-dot image;
ii) finding all islands in the pyramid structure at each resolution level;
iii) constructing parent-child relationships among the nodes of the tree structure between the levels of the pyramid structure;
iv) giving the nodes at each level of the tree structure their respective position information and size information.

The method for shuffling Asian character images according to claim 2, wherein one dot at the $2^{j} \times 2^{j}$ resolution level is formed by filling in the surroundings of one dot at the $2^{j-1} \times 2^{j-1}$ resolution level; an "island" is defined as the outer frame of an image block joined by 4-connected filling; the $2^{j} \times 2^{j}$ level and the $2^{j-1} \times 2^{j-1}$ level form a parent-child relationship between nodes; and the island of a parent node at the $2^{j} \times 2^{j}$ level includes all of the "islands", or child nodes, at the $2^{j-1} \times 2^{j-1}$ level.

The method for shuffling Asian character images according to claim 1, wherein the shuffling of the islands further comprises the following steps:
a) creating empty spaces in which the character islands to be shuffled are placed;
b) randomly selecting the character islands to be shuffled one at a time and moving each into the empty spaces until all the empty spaces are filled with the character islands;
c) giving each shuffled character island, as an attribute, the position coordinates of the scanned character island before shuffling.

The method for shuffling Asian character images according to claim 1, wherein the reshuffling that returns the shuffled character islands to their original positions uses the position coordinates of the original character islands that the shuffled character islands hold as attributes.

The method for shuffling Asian character images according to claim 1, having the step of dividing the scanned Asian document into a character image portion and a non-character image portion, wherein that step further comprises the following steps:
a) using an image preprocessing technique that separates the Asian character image portion from the scanned document;
b) using an image that contains only character images, such as a text character image or a foreground image;
c) using an image that contains only non-character images, such as a background image.

The method for shuffling Asian character images according to claim 6, wherein the image preprocessing technique performs skew correction, noise removal, ruled line detection, and non-character image detection.

A shuffle processing system for Asian character images, which shuffles the character images of a scanned Asian document and randomly rearranges them to create a new Asian document, the system comprising the following means:
a) image input means for scanning an Asian document and capturing its image;
b) means for dividing the input image of the Asian document into a character image portion and a non-character image portion;
c) means for constructing a multi-resolution-level tree structure for the character image portion;
d) means for extracting shuffleable character image islands by analyzing the multi-resolution-level tree structure;
e) means for randomly selecting and shuffling shuffleable islands, each containing one or more character image blocks, and for reshuffling the shuffled islands to return them to their original positions;
f) means for combining the reshuffled character image portion and the non-character image portion into the complete original Asian document.

The shuffle processing system for Asian character images according to claim 8, wherein the means for constructing the multi-resolution-level tree structure for the Asian character image further comprises the following means:
i) means for constructing a multi-resolution pyramid structure of the character image by coarsening the resolution until the character image becomes a full-dot image;
ii) means for finding all islands in the pyramid structure at each resolution level;
iii) means for constructing parent-child relationships among the nodes of the tree structure between the levels of the pyramid structure;
iv) means for giving the nodes at each level of the tree structure their respective position information and size information.

The shuffle processing system for Asian character images according to claim 9, wherein one dot at the $2^{j} \times 2^{j}$ resolution level is formed by filling in the surroundings of one dot at the $2^{j-1} \times 2^{j-1}$ resolution level; an "island" is defined as the outer frame of an image block joined by 4-connected filling; the $2^{j} \times 2^{j}$ level and the $2^{j-1} \times 2^{j-1}$ level form a parent-child relationship between nodes; and the island of a parent node at the $2^{j} \times 2^{j}$ level includes all of the "islands", or child nodes, at the $2^{j-1} \times 2^{j-1}$ level.

The shuffle processing system for Asian character images according to claim 8, wherein the means for shuffling the islands further comprises the following means:
a) means for creating empty spaces in which the character islands to be shuffled are placed;
b) means for randomly selecting the character islands to be shuffled one at a time and moving each into the empty spaces until all the empty spaces are filled with the character islands;
c) means for giving each shuffled character island, as an attribute, the position coordinates of the scanned character island before shuffling.

The shuffle processing system for Asian character images according to claim 8, wherein the reshuffling that returns the shuffled character islands to their original positions uses the position coordinates of the original character islands that the shuffled character islands hold as attributes.

The shuffle processing system for Asian character images according to claim 8, having means for dividing the scanned Asian document into a character image portion and a non-character image portion, wherein that means further comprises the following means:
a) means for using an image preprocessing technique that separates the Asian character image portion from the scanned document;
b) means for using an image that contains only character images, such as a text character image or a foreground image;
c) means for using an image that contains only non-character images, such as a background image.

The shuffle processing system for Asian character images according to claim 13, wherein the image preprocessing technique performs skew correction, noise removal, ruled line detection, and non-character image detection.

A computer-readable recording medium on which a program for controlling a computer is recorded, wherein the program causes the computer to shuffle and reshuffle the character images of a scanned Asian document by the following steps:
a) scanning an Asian document and dividing the obtained image into a character image portion and a non-character image portion;
b) constructing a multi-resolution-level tree structure for the character image, this step including the following steps:
i) constructing a multi-resolution pyramid structure of the character image by changing the resolution of the character image through the resolution levels $2^0 \times 2^0$, $2^1 \times 2^1$, $2^2 \times 2^2$, ... until it becomes a full-dot image;
ii) detecting all islands by treating each group of connected dots in the image at each resolution level as one island;
iii) constructing parent-child relationships among the island tree nodes between the resolution levels of the pyramid structure;
iv) storing, as attributes in each node of the tree structure, the position information and size information of the islands at the immediately preceding level;
c) shuffling and reshuffling the Asian character images.