JP3839604B2

JP3839604B2 - Data processing method

Info

Publication number: JP3839604B2
Application number: JP36370098A
Authority: JP
Inventors: 明斉藤
Original assignee: Toshiba Corp; Toshiba TEC Corp
Current assignee: Toshiba Corp; Toshiba TEC Corp
Priority date: 1998-12-22
Filing date: 1998-12-22
Publication date: 2006-11-01
Anticipated expiration: 2018-12-22
Also published as: JP2000188692A

Description

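[0001]
[Technical Field of the Invention]
The present invention relates to a data processing method and, more specifically, to a data processing method that efficiently compresses image data using a data compression technique built on the dictionary-based schemes typified by LZ77 and LZ78.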
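[0002]
[Prior Art]
Current dictionary-based data compression methods originate in the paper "A Universal Algorithm for Sequential Data Compression", published by Abraham Lempel and Jacob Ziv in IEEE Transactions on Information Theory in 1977. The method is commonly known as the sliding-dictionary method of Lempel-Ziv coding, or LZ77.
[0003]
It is introduced, for example, in Seiji Munakata (宗像清治), "The Ziv-Lempel data compression method", Joho Shori (Information Processing), Vol. 26, No. 1 (1985).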
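[0004]
The LZ77 algorithm partitions the data to be encoded into the longest sequences that match a sequence starting at any position in the past data, and encodes each one as a copy of that past sequence.
[0005]
Specifically, as shown in FIG. 2, the encoder has a sliding window (moving window) that holds already-encoded input data and a look-ahead buffer that holds the data about to be encoded. The data sequence in the look-ahead buffer is compared against every subsequence of the data in the sliding window to find the longest matching subsequence in the window.
[0006]
To designate this longest subsequence within the sliding window, the encoder then encodes a triple consisting of the start position of the longest subsequence, the match length, and the next symbol, which caused the mismatch.
[0007]
Next, the encoded data sequence is moved from the look-ahead buffer into the sliding window, and the look-ahead buffer is refilled with as much new data as was just encoded.
[0008]
The same processing is then repeated, so that the data is broken into subsequences and encoded piece by piece.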
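The exhaustive search described in [0004] to [0008] can be sketched in C as follows. This is a minimal illustration rather than code from the patent: the names `Match` and `lz77_find_longest` and all parameters are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical result type: where the best match starts and how long it is. */
typedef struct {
    size_t offset;  /* distance back from the encoding position (0 = no match) */
    size_t length;  /* number of matching symbols */
} Match;

/* Brute-force LZ77 search: compare the look-ahead data at buf[pos] with the
 * data starting at every offset 1..window in the sliding window and keep the
 * longest match, capped at max_len. Ties keep the smallest offset. */
Match lz77_find_longest(const unsigned char *buf, size_t len, size_t pos,
                        size_t window, size_t max_len)
{
    Match best = {0, 0};
    for (size_t off = 1; off <= window && off <= pos; off++) {
        size_t n = 0;
        while (n < max_len && pos + n < len &&
               buf[pos + n] == buf[pos - off + n])
            n++;
        if (n > best.length) {
            best.offset = off;
            best.length = n;
        }
    }
    return best;
}
```

An encoder built on this would emit the offset, the length, and the symbol `buf[pos + length]` that broke the match, then advance `pos` past the encoded symbols. The cost is one inner loop per window position, which is the cost the invention sets out to reduce.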
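[0009]
Many improved variants of this basic data compression technique have been proposed.
[0010]
One example is the LZSS coding scheme (T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. Commun., Vol. COM-34, No. 12, Dec. 1986), which adds a flag indicating whether each item is an encoded code or raw data, and outputs the raw data whenever the encoded code would be longer than the raw data.
[0011]
Another reference is M. Nelson, Data Compression Handbook (データ圧縮ハンドブック), revised 2nd edition, Toppan (1996), ISBN 4-8101-8605-9.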
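[0012]
In recent years, office automation (OA) systems such as scanners, printers, and digital copiers have become widespread and are moving toward higher speeds and higher resolutions.
[0013]
These devices must process large volumes of image data at high speed, so it has become essential to reduce the amount of data to be processed by applying fast compression with a high compression ratio.
[0014]
Conventional techniques for such compression include standardized schemes such as MMR and JBIG, but the compression ratio of MMR tends to degrade on finely detailed images.
[0015]
JBIG comes close to the best in terms of compression ratio, but because it fundamentally processes data pixel by pixel, its speed is limited and it could not be adopted in high-speed systems.
[0016]
In contrast, the dictionary-based compression schemes described above fundamentally process data byte by byte, so they can run far faster than JBIG, and their compression ratio does not degrade on detailed images as much as that of MMR. They are therefore well suited to high-speed, high-resolution OA systems.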
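[0017]
[Problems to Be Solved by the Invention]
However, to find the longest matching partial data sequence in the sliding window during encoding, a conventional LZ77-based data compression apparatus must compare the data sequence about to be encoded against the data sequences at every position in the window.
[0018]
That is, as shown in FIG. 2, the data sequence to be encoded is compared with the data sequence starting at offset 1 in the sliding window, the data sequence starting at offset 2, and so on up to the data sequence starting at offset n (where n is the size of the sliding window), in order to find the offset that gives the maximum match length.
[0019]
This way of finding the maximum match length has the drawback that processing slows down when every offset yields a long match. For example, when a white area of an image is encoded, the comparison against every offset yields the longest possible match (for example, 256). If comparing one data item counts as one comparison, 256 comparisons are performed for each offset, and the time spent comparing data sequences grows dramatically.
[0020]
The present invention was made in view of these circumstances, and its object is to provide a data compression method that can quickly compress data in which the same data repeats, that is, data that yields relatively long match lengths.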
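[0021]
[Means for Solving the Problems]
To solve the above problems and achieve the object, the data compression method of the present invention is configured as follows.
[0022]
The invention is a data processing method that compresses an image data stream, obtained by scanning an image and corresponding to a plurality of lines of line length L of that image, by encoding it. A specific symbol contained in the image data stream, together with the plurality of symbols that begins with that specific symbol, is taken as the symbols to be encoded. The symbol at offset 1, adjacent to the specific symbol on its upstream side, and the plurality of symbols at offsets L+n to L-n, lying upstream and downstream in the stream around the symbol at offset L one line length away from the specific symbol, are taken as the comparison symbols. When the symbols to be encoded are compared in turn with the comparison symbols to detect a match length, detection of the match length ends at the point where a predefined maximum match length is obtained, and encoding is performed on the basis of the detected match length.
[0023]
The invention is also a data processing method with the same image data stream, symbols to be encoded, and comparison symbols as in [0022], in which, when the symbols to be encoded are compared in turn with the comparison symbols and encoded, the symbols at offsets 1, L, L+1, and L-1 are selected preferentially as comparison targets for the specific symbol contained in the symbols to be encoded.
[0024]
The invention is further a data processing method with the same image data stream, symbols to be encoded, and comparison symbols as in [0022], in which the comparison priority for the specific symbol contained in the symbols to be encoded is, from highest to lowest, the symbols at offsets L, 1, L-1, L+1, L-2, L+2, …, L-n, and L+n, and in which, when the symbols to be encoded are compared in turn with the comparison symbols to detect a match length, detection of the match length ends at the point where a predefined maximum match length is obtained, and encoding is performed on the basis of the detected match length.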
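[0025]
[Embodiments of the Invention]
Embodiments of the invention are described below with reference to the drawings.
[0026]
First, the selection of comparison points from two-dimensionally nearby positions is described.
[0027]
As noted in the description of the prior art, the simplest software implementation of LZ77-based compression compares the data sequence starting at the encoding position with the data sequences starting at every position in the sliding window and detects the longest match position. With this approach, processing speed drops markedly when the sliding window is made large.
[0028]
The first invention is therefore still based on LZ77, but instead of comparing the data sequence to be encoded with the data sequences starting at every position in the sliding window, it compares only those starting at positions that are likely to match, which improves processing speed. For example, it can be implemented with about 16 or 32 comparison positions.
[0029]
Simply reducing the number of comparison positions, however, would be expected to lower the chance of a match and hence the compression ratio. The first invention therefore selects the positions to compare by exploiting the periodicity of image data: rather than comparing every subsequence in the sliding window, it compares only the data positions that the periodicity of the image makes likely to match.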
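[0030]
The principle of the first invention is described below with reference to FIG. 1. Here the unit of data to be compressed is the byte. Given the two-dimensional locality of image data, the bytes most similar to a given byte are those directly above, below, left, and right of it. Assuming the usual raster scan from upper left to lower right as the input order, the right and lower neighbors of a byte have not yet been input and are not yet in the sliding window, so they cannot be used as comparison targets. The left neighbor is the most recent data, input immediately before, and sits at offset 1 in the sliding window. The upper neighbor sits at offset L in the sliding window, where L is the line length (width) of the input image data in bytes, provided that the sliding window is at least L bytes long. Conventional LZ-family codecs, even when the input was image data, ignored this periodicity and in effect searched for matching points only in the leftward direction. Here, in addition to the left neighbor, the position adjacent in the upward direction and its surroundings are chosen as comparison points. In FIG. 1, the diamond marks the first byte of the data about to be encoded, the equals signs mark the data sequence about to be encoded, the filled squares mark byte positions in the sliding window used as comparison points, and the open squares mark byte positions in the sliding window not used as comparison points. In this example, the data sequences starting at 16 offset positions (1, L-7, L-6, L-5, L-4, L-3, L-2, L-1, L, L+1, L+2, L+3, L+4, L+5, L+6, L+7) are chosen as comparison targets. L is the line length of the image (its size in the main scanning direction) and is set externally in advance.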
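The 16 comparison offsets of FIG. 1 can be generated from the line length alone. The following C sketch shows one way to do it. The function name `candidate_offsets` and the rule that rejects very short lines are assumptions made for this example, not part of the patent text.

```c
#include <stddef.h>

/* Fill out[] with the 16 comparison offsets for a line length of line_len
 * bytes: offset 1 (left neighbour) followed by L-7 .. L+7 (the upper
 * neighbour and its surroundings). Returns the number of offsets written,
 * or 0 when the line is too short for L-7 to stay distinct from offset 1. */
size_t candidate_offsets(size_t line_len, size_t out[16])
{
    if (line_len < 9)
        return 0;
    size_t n = 0;
    out[n++] = 1;
    for (size_t off = line_len - 7; off <= line_len + 7; off++)
        out[n++] = off;
    return n;
}
```

A caller would still have to check that each offset fits inside the data already encoded, which in practice means a sliding window of at least L+7 bytes.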
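[0031]
Reducing the number of comparison positions also has the advantage of shortening the offset code. For example, with a 2 kB sliding window the conventional method needs 2k (2048) distinct offset codes, whereas the example of FIG. 1 needs only 16. With a simple fixed-length code, the offset code is therefore 11 bits long in the conventional method but only 4 bits in this invention.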
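The 11-bit and 4-bit figures follow from the number of distinct offsets that a fixed-length code has to distinguish. A small C helper (the name `fixed_code_bits` is hypothetical) makes the arithmetic explicit:

```c
/* Number of bits a fixed-length code needs to distinguish n values (n >= 1).
 * Intended for small n such as window sizes; n must not exceed half the
 * range of unsigned long. */
unsigned fixed_code_bits(unsigned long n)
{
    unsigned bits = 0;
    while ((1UL << bits) < n)
        bits++;
    return bits;
}
```

`fixed_code_bits(2048)` gives 11 for the 2 kB window and `fixed_code_bits(16)` gives 4 for the 16 offsets of FIG. 1. With the Huffman-coded offsets mentioned later in the description, the code lengths would vary per offset, so these values apply only to the simple fixed-length case.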
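[0032]
The invention further speeds up compression by combining the order in which the match length is computed at each offset with a condition for cutting the longest-match search short. For the 16 comparison points of the invention, a straightforward longest-match search looks like this.
[0033]
Module that simply finds the longest match over the 16 comparison points
{ Compute the match length of the data at the encoding position against the data at offset 1; call the result len1
Compute the match length of the data at the encoding position against the data at offset L-7; call the result len(L-7)
Compute the match length of the data at the encoding position against the data at offset L-6; call the result len(L-6)
…
Compute the match length of the data at the encoding position against the data at offset L+6; call the result len(L+6)
Compute the match length of the data at the encoding position against the data at offset L+7; call the result len(L+7)
Return the maximum of len1, len(L-7), len(L-6), …, len(L+7) and the corresponding offset }
When the match length is computed at each offset, an upper limit is set by the structure of the match-length code, and the comparison is cut off at that limit even if a longer match is available. For example, if the maximum length of the match-length code is 256, comparison stops once the match reaches 256, and 256 is taken as the match length.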
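[0034]
A conventional maximum-match-length search of this kind slows down when every offset yields a long match. For example, when a white area of an image is encoded, every offset yields the longest match (for example, 256), so if comparing one data item counts as one comparison, 256 comparisons are made per offset, for a total of 4096 (16 × 256).
[0035]
Because 256 is the largest possible match length, the invention speeds things up by skipping the comparison of the remaining offsets as soon as a match length of 256 is obtained. Since Huffman codes are normally used for both the offset code and the match-length code, however, short offset codes are assigned to the offsets that are likely to yield long matches. With offsets chosen as in this invention, positions closer to the encoding position can be expected to be more similar, so it is best to assign the shortest code to L and then progressively longer codes in the order 1, L-1, L+1, L-2, L+2, …, L-7, L+7. If early termination at the maximum match length is then introduced naively, as below, an offset whose code is not the shortest may be selected, which is not optimal.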
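[0036]
Early termination added to the module that simply finds the longest match over the 16 comparison points of the invention
{ Compute the match length of the data at the encoding position against the data at offset 1; call the result len1
If len1 = 256, set offset = 1 and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset L-7; call the result len(L-7)
If len(L-7) = 256, set offset = L-7 and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset L-6; call the result len(L-6)
If len(L-6) = 256, set offset = L-6 and match length = 256, and exit the module
…
Compute the match length of the data at the encoding position against the data at offset L+6; call the result len(L+6)
If len(L+6) = 256, set offset = L+6 and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset L+7; call the result len(L+7)
If len(L+7) = 256, set offset = L+7 and match length = 256, and exit the module
Return the maximum of len1, len(L-7), len(L-6), …, len(L+6), len(L+7) and the corresponding offset }
Searched in this order, a white area of the image selects offset 1, but the shortest code was assigned to offset L, so the resulting code is not optimal. To fix this, the search order should follow the offset code length from shortest to longest (never from a longer code to a shorter one), as follows.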
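[0037]
Early termination with improved search order (the method of this invention)
{ Compute the match length of the data at the encoding position against the data at offset L; call the result lenL
If lenL = 256, set offset = L and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset 1; call the result len1
If len1 = 256, set offset = 1 and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset L-1; call the result len(L-1)
If len(L-1) = 256, set offset = L-1 and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset L+1; call the result len(L+1)
If len(L+1) = 256, set offset = L+1 and match length = 256, and exit the module
…
Compute the match length of the data at the encoding position against the data at offset L-7; call the result len(L-7)
If len(L-7) = 256, set offset = L-7 and match length = 256, and exit the module
Compute the match length of the data at the encoding position against the data at offset L+7; call the result len(L+7)
If len(L+7) = 256, set offset = L+7 and match length = 256, and exit the module
Return the maximum of lenL, len1, len(L-1), len(L+1), …, len(L-7), len(L+7) and the corresponding offset }
In this case, even when a match of 256 occurs and the remaining match-length search is abandoned, the offset with the shortest offset code is always the one selected, so better coding efficiency and faster processing are achieved together. For a white area of an image, for example, the original method required 4096 comparisons in total, whereas the method of the invention needs only 256 and produces the same code.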
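The effect of the search order and the early exit can be checked with a small C sketch that counts byte comparisons. It is an illustration under stated assumptions, not the patent's code: the names `match_len`, `search`, `priority_offsets`, and the counter `g_compares` are hypothetical, and the caller is assumed to guarantee that `max_len` bytes are readable from `cp` onward and that the largest offset fits in the data before `cp`.

```c
#include <stddef.h>

#define N_OFFSETS 16

/* Comparison counter, present only to illustrate the cost (hypothetical). */
static unsigned long g_compares;

/* Match length between the data at cp and the data off bytes before it,
 * capped at max_len. Every byte comparison is counted once. */
static size_t match_len(const unsigned char *cp, size_t off, size_t max_len)
{
    const unsigned char *ref = cp - off;
    size_t n = 0;
    while (n < max_len) {
        g_compares++;
        if (cp[n] != ref[n])
            break;
        n++;
    }
    return n;
}

/* Offsets in priority order L, 1, L-1, L+1, ..., L-7, L+7 (requires L > 8). */
static void priority_offsets(size_t L, size_t out[N_OFFSETS])
{
    size_t k = 0;
    out[k++] = L;
    out[k++] = 1;
    for (size_t d = 1; d <= 7; d++) {
        out[k++] = L - d;
        out[k++] = L + d;
    }
}

/* Try the offsets in the given order and return the best match length.
 * With stop_len > 0 the search ends as soon as one offset reaches stop_len.
 * Ties keep the offset that comes first in the list, i.e. the shortest code. */
static size_t search(const unsigned char *cp, const size_t *offs,
                     size_t max_len, size_t stop_len, size_t *best_off)
{
    size_t best = 0;
    *best_off = 0;
    for (int i = 0; i < N_OFFSETS; i++) {
        size_t n = match_len(cp, offs[i], max_len);
        if (n > best) {
            best = n;
            *best_off = offs[i];
        }
        if (stop_len != 0 && n >= stop_len)
            break;
    }
    return best;
}
```

On an all-zero ("white") region, `search(cp, offs, 256, 0, &off)` performs 16 × 256 = 4096 byte comparisons, while `search(cp, offs, 256, 256, &off)` stops after the 256 comparisons made at offset L and selects the same offset. Passing a `stop_len` below `max_len` makes the search stop on shorter matches as well, trading some coding efficiency for speed.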
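[0038]
To gain further speed at some cost in coding efficiency, the search cut-off condition may be set below the maximum value of the match-length code. For example, when the maximum of the match-length code is 256, stopping the search as soon as a match longer than 128 is obtained gives up a little coding efficiency but runs faster.
[0039]
Next, the second invention is described.
[0040]
The module that finds the longest match position compares the data sequence at the encoding position with the data sequence starting at each offset position to obtain the match length. In this invention the comparison is made for 16 offsets in total. When the coding unit is the byte, a straightforward conventional implementation compares one byte at a time, as shown below.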
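typedef unsigned char BYTE;
#define MAX_MATCHLEN 256  /* upper limit of the match-length code, e.g. 256 (name assumed) */
int search_matchlen(const BYTE *offset, const BYTE *cp) {
    int count = 0;
    /* Compare one byte at a time; stop at a mismatch or at the upper limit. */
    while (count < MAX_MATCHLEN && offset[count] == cp[count]) count++;
    return count;
}
This is adequate when compression runs on an 8-bit CPU, but on the 32-bit and wider CPUs now common it can be sped up as follows. Taking the CPU's natural data length to be 32 bits, the second invention is implemented as shown below.
#include <stdint.h>  /* uintptr_t, uint32_t */
int search_matchlen(const BYTE *offset, const BYTE *cp) {
    int count = 0;
    /* Word comparison only when the difference between offset and cp is a multiple of 4 bytes; otherwise the byte loop at the end does all the work, as in the conventional version. */
    if (((uintptr_t)cp - (uintptr_t)offset) % 4 == 0) {
        /* Compare 1 to 3 bytes until offset and cp reach a 4-byte boundary. */
        while (count < MAX_MATCHLEN && (uintptr_t)(cp + count) % 4 != 0) {
            if (offset[count] != cp[count]) return count;
            count++;
        }
        while (count + 4 <= MAX_MATCHLEN && *(const uint32_t *)(offset + count) == *(const uint32_t *)(cp + count)) count += 4;  /* compare in 4-byte units */
    }
    /* The last, mismatching 4 bytes may still match in 1 to 3 bytes; count them one at a time. */
    while (count < MAX_MATCHLEN && offset[count] == cp[count]) count++;
    return count;
}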
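If the offset is not a multiple of 4 bytes, 4-byte comparison is not possible, so the comparison is done one byte at a time as before. If the offset is a multiple of 4 bytes, the parts lying on 4-byte boundaries can be compared 4 bytes at a time, so first 1 to 3 bytes are compared singly to check whether the data match up to the 4-byte boundary. Once the 4-byte boundary is reached, comparison proceeds quickly in 4-byte units until a mismatch occurs or the upper limit of the match-length code is reached. When a mismatch occurs, 1 to 3 bytes within the last 4-byte word may still match, so those are compared one byte at a time.
[0041]
On a typical 32-bit CPU, a 1-byte comparison and a 4-byte comparison take the same number of cycles. Comparing in this way can therefore give a speedup of up to 4 times.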
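The "up to 4 times" figure can be illustrated by counting comparison operations instead of cycles. The sketch below is hypothetical in its names (`load32`, `match_len_byte`, `match_len_word`) and in its use of `memcpy` for the 4-byte loads, a choice made here to avoid alignment and aliasing pitfalls in portable C. The patent text itself only describes comparing in native-word units.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 4 bytes through memcpy, which is safe for any address. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-by-byte match length, capped at max_len; *ops counts comparisons. */
size_t match_len_byte(const unsigned char *ref, const unsigned char *cp,
                      size_t max_len, unsigned long *ops)
{
    size_t n = 0;
    while (n < max_len) {
        ++*ops;
        if (ref[n] != cp[n])
            break;
        n++;
    }
    return n;
}

/* Word-wise match length (ref must lie before cp); same result as above. */
size_t match_len_word(const unsigned char *ref, const unsigned char *cp,
                      size_t max_len, unsigned long *ops)
{
    size_t n = 0;
    if ((size_t)(cp - ref) % 4 == 0) {
        /* Up to 3 single-byte comparisons to reach a 4-byte boundary. */
        while (n < max_len && (uintptr_t)(cp + n) % 4 != 0) {
            ++*ops;
            if (ref[n] != cp[n])
                return n;
            n++;
        }
        /* One comparison per 4 bytes. */
        while (n + 4 <= max_len) {
            ++*ops;
            if (load32(ref + n) != load32(cp + n))
                break;
            n += 4;
        }
    }
    /* Byte-wise tail: the whole search when the distance is not a multiple
     * of 4, otherwise the bytes inside the last word. */
    return n + match_len_byte(ref + n, cp + n, max_len - n, ops);
}
```

For a 256-byte match between 4-byte-aligned positions, the byte version performs 256 comparisons and the word version 64. Whether that becomes a fourfold gain in time depends on the processor and memory system, so the ratio is an upper bound.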
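[0042]
Next, the third invention is described.
[0043]
Decompression repeatedly decodes a code to obtain the match offset and match length, copies match-length bytes of data starting at the match offset in the data buffer, and appends them to the output as newly decoded data. A straightforward conventional implementation performs this memory copy one byte at a time, as shown below.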
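void matchl_copy(BYTE *offset, BYTE *cp, int length) {
    memcpy_in_BYTE(cp, offset, length);  /* memory copy in 1-byte units */
}
As with the second invention, this can be sped up on the 32-bit and wider CPUs now common. Taking the CPU's natural data length to be 32 bits, the third invention is implemented as shown below.
#include <stdint.h>  /* uintptr_t, uint32_t */
void matchl_copy(const BYTE *offset, BYTE *cp, int length) {
    int i = 0;
    /* Word copy only when the difference between offset and cp is a multiple of 4 bytes; otherwise the byte loop at the end does all the work, as in the conventional version. */
    if (((uintptr_t)cp - (uintptr_t)offset) % 4 == 0) {
        for (; i < length && (uintptr_t)(cp + i) % 4 != 0; i++) cp[i] = offset[i];  /* 1 to 3 bytes up to the 4-byte boundary */
        for (; i + 4 <= length; i += 4) *(uint32_t *)(cp + i) = *(const uint32_t *)(offset + i);  /* memory copy in 4-byte units */
    }
    for (; i < length; i++) cp[i] = offset[i];  /* remaining 1 to 3 bytes, if any */
}
If the offset is not a multiple of 4 bytes, memory cannot be copied in 4-byte units, so the copy is done one byte at a time as in the conventional version. If the offset is a multiple of 4 bytes, the parts lying on 4-byte boundaries can be copied 4 bytes at a time, so first 1 to 3 bytes are copied singly up to the 4-byte boundary. Once the 4-byte boundary is reached, memory is copied quickly in 4-byte units until the match length is reached. Finally, if bytes remain within the last 4-byte word, they are copied one byte at a time.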
[0044]
On a typical 32-bit CPU, a 1-byte memory copy and a 4-byte memory copy take the same number of cycles. Copying memory as in the third invention can therefore give a speedup of up to 4 times.
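A portable C sketch of the word-wise match copy follows. The name `match_copy_word` and the use of the standard `memcpy` for each 4-byte move are assumptions made for this example. The sketch also relies on a property the text leaves implicit: in LZ77 the source and destination may overlap when the offset is smaller than the match length, so bytes must be produced in increasing address order, and a 4-byte move is used only when the distance is at least 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy length bytes of a match from src (= dst - offset, offset >= 1) to dst.
 * The regions may overlap, so the copy always runs front to back. */
void match_copy_word(unsigned char *dst, const unsigned char *src, size_t length)
{
    size_t n = 0;
    /* A distance that is a nonzero multiple of 4 is at least 4, so each
     * 4-byte source block lies entirely before its destination block. */
    if ((size_t)(dst - src) % 4 == 0) {
        /* 1 to 3 single bytes up to the next 4-byte boundary. */
        while (n < length && (uintptr_t)(dst + n) % 4 != 0) {
            dst[n] = src[n];
            n++;
        }
        /* 4 bytes per move. */
        while (n + 4 <= length) {
            memcpy(dst + n, src + n, 4);
            n += 4;
        }
    }
    /* Remaining bytes, or the whole copy when the distance is not a
     * multiple of 4. */
    while (n < length) {
        dst[n] = src[n];
        n++;
    }
}
```

With a buffer that starts with "abcd", copying 10 bytes at offset 4 extends it to "abcdabcdabcdab", which is the repeating behaviour a decoder needs for runs longer than the offset.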
[0045]
Next, the key points of the invention are summarized.
[0046]
When Lempel-Ziv (sliding-window) compression and decompression are implemented in software:
<Compression>
If an offset examined earlier yields a long match, the other offsets are not examined.
(Conventional)
for (from offset 1 to offset N) {
len 1 = search_matchlen(offset 1, current_pointer)
len 2 = search_matchlen(offset 2, current_pointer)
len 3 = search_matchlen(offset 3, current_pointer)
…
}
maxlen = max(len 1, len 2, ...)
(This invention)
for (from offset 1 to offset N) {
if ((len 1 = search_matchlen(offset 1, current_pointer)) >= thresh_len) break
if ((len 2 = search_matchlen(offset 2, current_pointer)) >= thresh_len) break
if ((len 3 = search_matchlen(offset 3, current_pointer)) >= thresh_len) break
…
}
maxlen = max(len 1, len 2, ...)
The search_matchlen function compares in units of the processor's native word length (4 bytes on a 32-bit processor).
<Decompression>
When the original image is rebuilt from match codes and the word boundaries coincide, the memcpy operation is performed with 4-byte copy instructions.
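[0047]
As described above, when an LZ77-based compressor is implemented in software, the invention exploits the periodicity of image data to search efficiently for the offset giving the longest match, and ends the match-length search as soon as that offset is found. This eliminates superfluous match-length searches and raises compression speed. In addition, when the match length is examined at each offset and the unit of compression (for example, 1 byte) differs from the native word length of the processor running the compression software (for example, 4 bytes on a 32-bit processor), comparing in units of the native word length improves processing efficiency. Likewise, when decompression is implemented in software, performing the memory copy in units of the processor's native word length while decoding match codes also improves processing efficiency.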
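[0048]
[Effects of the Invention]
The invention provides a data compression method that can quickly compress data in which the same data repeats, that is, data that yields relatively long match lengths.
[0049]
(1) When a conventional LZ77-based data compressor is implemented in software, the data sequence to be encoded is compared against every candidate offset position, so wasted comparisons are made in areas that are entirely white, such as the margins of a document. The invention exploits the periodicity of image data, compares starting from the two-dimensionally nearest offset positions, and abandons the comparison with the remaining offset positions as soon as an offset position reaching the maximum match length defined by the match code is found, which shortens compression time.
[0050]
Even when the maximum match length defined by the match code is not reached, a similar effect is obtained by setting a match-length threshold in advance and abandoning comparison with the remaining offsets once a match of at least that length is obtained. In this case the optimal compression ratio may not be achieved, but the loss can be limited by adjusting the threshold.
[0051]
(2) When a conventional LZ77-based data compressor is implemented in software, the data sequence to be encoded must be compared with the data sequence at each candidate offset position to determine the match length. Conventionally, when the unit of compression was 1 byte, the two sequences were compared one byte at a time. In the invention, when the native word length of the processor running the compression software (for example, 4 bytes on a 32-bit processor) is larger than the unit of compression (for example, 1 byte), comparing the data sequences in units of the native word length improves processing efficiency.
[0052]
(3) When a conventional LZ77-based data decompressor is implemented in software, decoding a match code requires copying match-length bytes of original data from the offset position that the code indicates, and conventionally this memory copy was done in the unit of compression (for example, 1 byte). In the invention, when the native word length of the processor running the compression software (for example, 4 bytes on a 32-bit processor) is larger than the unit of compression (for example, 1 byte), performing the memory copy in units of the native word length improves processing efficiency.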
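[Brief Description of the Drawings]
[FIG. 1] A diagram outlining the data compression method according to the invention.
[FIG. 2] A diagram outlining a conventional data compression method.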
BACKGROUND OF THE INVENTION
The present invention relates to a data processing method, and more specifically, to a data processing method for efficiently compressing image data using a data compression technique based on a dictionary-based method represented by LZ77 and LZ78. .
[0002]
[Prior art]
The origin of the current dictionary-based data compression method can be found in the paper “AUniversal Algorithm for Sequential Data Compression” published by Abraham Lempel and Jacob Ziv in IEEE Transaction on Information Theory in 1977. This is commonly called the Lempel-Ziv encoded slide dictionary method or LZ77 method.
[0003]
For example, Seiji Munakata: Ziv-Lempel's data compression method, Information Processing, Vol. 26. No. 1 (1985).
[0004]
The LZ77 algorithm is a method of encoding encoded data as a duplicate of a past sequence by dividing the encoded data into a maximum length sequence that matches from an arbitrary position in the past data sequence.
[0005]
Specifically, as shown in FIG. 2, a moving window for storing encoded input data and a prefetch buffer for storing data to be encoded are provided, and the data sequence of the prefetch buffer and the data sequence of the moving window Are compared with each other to obtain the maximum partial sequence that matches in the moving window.
[0006]
Then, in order to specify this maximum length subsequence in the moving window, a set of “start position of the maximum length subsequence”, “matching length”, and “next symbol that caused mismatch” Encode.
[0007]
Next, the encoded data sequence in the prefetch buffer is moved to the moving window, and a new data sequence for the data sequence encoded in the prefetch buffer is input.
[0008]
Thereafter, by repeating the same processing, the data is decomposed into partial series and encoded.
[0009]
Many improved types of such basic data compression techniques have been proposed.
[0010]
For example, an LZSS encoding method (TCBell coding method) is provided in which a flag for identifying whether the data is encoded data or raw data is provided, and the raw data is encoded when the encoded code becomes longer than the raw data. "Better OPM / L Text Compression", IEEE Transaction Commun., Vol. COM-34, No. 12, Dec. (1986)).
[0011]
Further, as other documents, M.M. Nelson: Data Compression Handbook Revised 2nd edition, Toppan (1996). ISBN4-8101-8605-9.
[0012]
Incidentally, in recent years, OA systems (scanners, printers, digital copiers, etc.) have become widespread, aiming for higher speed and higher resolution.
[0013]
In these apparatuses, it is necessary to process a large amount of image data at high speed, and it is essential to destroy the amount of data to be processed by applying high-speed and high-compression data compression.
[0014]
As conventional techniques for data compression, there are standardized methods such as MMR and JBIG, but MMR tends to deteriorate the compression rate with fine images.
[0015]
Also, in terms of compression rate, JBIG, which is close to the best, is basically a pixel-by-pixel process, so there is a limit to speeding up, and it could not be adopted in a high-speed system.
[0016]
However, the dictionary-based compression method described above is basically processing in units of bytes, so it can be much faster than JBIG, and the compression rate does not deteriorate as much as MMR for fine images. Suitable for high-speed, high-resolution OA systems.
[0017]
[Problems to be solved by the invention]
However, in the conventional LZ77-based data compression apparatus, when encoding, in order to obtain the maximum length partial data sequence that matches in the moving window, the data sequence to be encoded and all positions in the moving window are to be obtained. Data column comparisons must be made between
[0018]
That is, as shown in FIG. 2, the data string to be encoded is a data string starting from the position of offset 1 in the moving window, a data string starting from the position of offset 2,... Offset n (n is the size of the moving window) Is to find the offset that gives the maximum match length compared to the data string starting from the position.
[0019]
The method for obtaining the maximum matching length as described above has a drawback in that the processing speed decreases when a long match is obtained for each offset. For example, if the white part of the image is encoded, the longest match (for example, 256) is obtained in comparison with all offsets, so if one data is counted as one comparison, 256 comparisons are made for each offset. As a result, there has been a problem that the comparison time of the data string is dramatically increased.
[0020]
An object of the present invention has been made in view of the above-described circumstances, and is a data compression capable of compressing at high speed compression target data in which the same data is continuous, that is, compression target data having a relatively long matching length. It is to provide a method.
[0021]
[Means for Solving the Problems]
In order to solve the above problems and achieve the object, a data compression method of the present invention is as follows.
[0022]
The present invention provides a data processing method for encoding an image data stream corresponding to a plurality of lines having a line length L of the image obtained by scanning the image, and compressing the image data stream. And a plurality of symbols starting from the specific symbol as encoding target symbols, an offset 1 symbol adjacent to the upstream side of the specific symbol, and an offset L one line away from the specific symbol A plurality of symbols having offset L + n to offset L-n on the upstream side and downstream side of the stream centering on the symbol of the above are used as comparison target symbols, and the encoding target symbol and the comparison target symbol are sequentially compared to obtain a match length. When detecting, the matching length is detected when the maximum matching length specified in advance is obtained. Exit, encoding is performed by the detected match length based.
[0023]
The present invention provides a data processing method for encoding an image data stream corresponding to a plurality of lines having a line length L of the image obtained by scanning the image, and compressing the image data stream. And a plurality of symbols starting from the specific symbol as encoding target symbols, an offset 1 symbol adjacent to the upstream side of the specific symbol, and an offset L one line away from the specific symbol A plurality of symbols with offset L + n to offset L-n on the upstream side and downstream side of the stream centering on the symbol of the above are used as comparison target symbols, and the encoding target symbol and the comparison target symbol are sequentially compared and encoded. A comparison of the specific symbol included in the encoding target symbol As the target, the offset 1, L, L + 1, and preferentially selects the L-1 symbols.
[0024]
The present invention provides a data processing method for encoding an image data stream corresponding to a plurality of lines having a line length L of the image obtained by scanning the image, and compressing the image data stream. And a plurality of symbols starting from the specific symbol as encoding target symbols, an offset 1 symbol adjacent to the upstream side of the specific symbol, and an offset L one line away from the specific symbol A plurality of symbols with offset L + n to offset L-n on the upstream side and downstream side of the stream centering on the symbol of the above are used as comparison target symbols, and the encoding target symbol and the comparison target symbol are sequentially compared and encoded. A comparison of the specific symbol included in the encoding target symbol Elephant priority is set to the symbols of the offsets L, 1, L-1, L + 1, L-2, L + 2,..., L-n, and L + n in descending order of priority, and the comparison target symbol and the comparison are compared. When the matching length is detected by sequentially comparing with the target symbol, the detection of the matching length is terminated when a predetermined maximum matching length is obtained, and encoding is performed based on the detected matching length. .
[0025]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0026]
First, the point of selecting a comparison point from a two-dimensionally close position will be described.
[0027]
As mentioned in the previous example, when trying to implement LZ77-based compression in software, the simplest implementation compares the data sequence starting from the sign or position with the data sequence starting from all positions in the moving window. The longest matching position is detected. In this method, the processing speed is remarkably reduced when the moving window is large.
[0028]
Therefore, in the first invention, the data sequence to be encoded is not compared with the data sequence starting from all the positions in the moving window, but based on LZ77, the data starting from the position that is highly likely to match. The processing speed is improved by comparing only the columns. For example, the comparison target position is realized with about 16 or 32.
[0029]
However, simply reducing the number of comparison target positions will reduce the possibility of matching and reduce the compression rate. In the first invention, the position to be compared and selected is selected by paying attention to the periodicity of the image data. That is, instead of comparing all the partial sequences in the moving window, focusing on the image data periodicity, only the data positions that are likely to match are compared.
[0030]
Hereinafter, the principle of the first invention will be described with reference to FIG. Here, the unit of data to be compressed is a byte unit. Considering the two-dimensional locality of the image data, the most similar to a certain byte is its vertical and horizontal positions. Considering a general raster scan from the upper left to the lower right as the input order of image data, for a certain byte, the right and lower adjacent bytes are input from now on, so they are not yet in the moving window. . Therefore, the right and lower adjacent bytes cannot be compared. The adjacent byte on the left is the most recent data that was input one before in the input order, and is in the position of offset 1 in the moving window. The upper adjacent byte is in the position of the offset L in the moving window when the line length (horizontal width) of the input image data is L in terms of the number of bytes. However, the condition is that the size of the moving window is L or more. In the conventional LZ codec, even if the input is image data, the periodicity is ignored and a matching point is searched only in the left direction. Here, in addition to the left direction, the position adjacent to the upper direction and its periphery are selected as the comparison points. In FIG. 1, the rhombus indicates the first byte of data to be encoded, the equal indicates the data series to be encoded, the black square indicates the byte position as the comparison point in the moving window, and the margin is removed. The squares indicate the byte positions not used as comparison points in the moving window. That is, here, as comparison targets, 16 offset positions (1, L-7, L-6, L-5, L-4, L-3, L-2, L-1, L, L + 1, L + 2 , L + 3, L + 4, L + 5, L + 6, L + 7). L is the line length of the image (the size of the main scanning method) and is set in advance from the outside.
[0031]
Reducing the comparison target position is also excellent in that the offset code can be shortened. For example, if the size of the moving window is 2 kB, the conventional example requires 2k codes as the offset, but the example of FIG. 1 requires only 16 codes as the offset. In this case, the offset code length is 11 bits, whereas in the present invention, it is as short as 4 bits.
[0032]
Further, in the present invention, the compression processing speed is increased by combining the order of obtaining the matching length at each offset and the longest matching search termination condition. The method of simply obtaining the longest match for the 16 comparison points of the present invention is as follows.
[0033]
Module that simply obtains the longest match for 16 comparison points {data string from the encoding position and offset from data string from the encoding position, and data string and offset from the encoding position where the result is len1 The coincidence length of the data string from L-7 is obtained, the coincidence length of the data string from the encoding position having the result len (L-7) and the data string from the offset L-6 is obtained, and the result is obtained as len (L -6) ...
The coincidence length of the data string from the encoding position and the data string from the offset L + 6 is obtained, the coincidence length of the data string from the encoding position and the data string from the offset L + 7 with the result being len (L + 6) is obtained, and the result is obtained. len (L + 7), len1, len (L-7), (L-6)..., return the maximum value of len (L + 7) and the offset at that time}
When obtaining the match length at each offset, the upper limit is determined by the configuration of the match length code, and even if a long match is obtained, the comparison process is cut off at the upper limit. For example, when the maximum length of the coincidence length code is 256, when the coincidence detection reaches 256, the subsequent comparison is interrupted and 256 is made the coincidence length.
[0034]
The conventional method for obtaining the maximum matching length has a drawback that the processing speed is lowered when a long match is obtained for each offset. For example, if the white part of the image is encoded, the longest match (eg, 256) is obtained when compared to all offsets, so if one data comparison is counted once, it will be 256 times for each offset, for a total of 2096 times. A comparison will be made.
[0035]
In the present invention, paying attention to the fact that 256 is the maximum matching length, the remaining offsets are not compared when the matching length of 256 is obtained, thereby increasing the speed. However, since a Huffman code is normally used for both the offset code and the coincidence length code, a short offset code is assigned to an offset at which a long coincidence length is likely to be obtained. For example, when an offset is selected as in the present invention, it is considered that the closer to the position to be encoded, the higher the degree of similarity. Therefore, the shortest code is assigned to L, and L-1, L + 1, L- It is preferable to assign short codes in the order of 1, L + 1, L-2, L + 2,..., L-7, L + 7. At this time, if truncation is simply introduced with the maximum matching length as follows, an offset other than the shortest may be selected, which is not optimal.
[0036]
Introducing a truncation in a module that simply obtains the longest match for the 16 comparison points of the present invention {determining the match length between the data sequence from the encoding position and the data sequence from offset 1, and letting the result be len1 = len1 = 256, the match length between the data string from the module end coding position and the data string from the offset L-7 is obtained with offset = 1 and match length = 256, and the result is len (L−7). 7) If 256, the match length between the data string from the module end coding position and the data string from the offset L-6 is obtained with offset = L-7 and match length = 256, and the result is len (L-6). If len (L-6) = 256, the module ends with offset = L-6, match length = 256, and so on.
The matching length of the data string from the coding position and the data string from the offset L + 6 is obtained, and the result is len (L + 6). If len (L + 6) = 256, the module end coding position with offset = L + 6 and matching length = 256 Len (L + 7) = 256, where the matching length of the data string from FF and the data string from offset L + 7 is obtained, and the result is len (L + 7). If len (L + 7) = 256, the module ends len1, len (L− 7), (L-6)..., Return the maximum value of len (L + 6) and len (L + 7) and the offset at that time}
When searching in this order, offset 1 is selected in the white portion of the image, but since the shortest code is assigned to offset L, it is not an optimal code. In order to improve this point, the search order may be set in the order of shorter offset codes (not longer) as follows.
[0037]
Introduce censoring to improve search position (method of the present invention)
{Determine the matching length of the data string from the encoding position and the data string from the offset L, and if lenL = 256, where the result is lenL, then the offset = L, the matching length = 256, and the data string from the module end encoding position If the length of the data string from the offset L is obtained and len = 1, the result is lenL, if len = 1, the data string from the module end coding position and the data string from the offset L-1 are set as offset = 1 and the matching length = 256. If len (L-1) = 256, where the match length is obtained and the result is len (L-1), offset = L-1, match length = 256, the data string from the module end coding position and the offset L + 1 When the matching length of the data string is obtained and len (L + 1) = 256 where the result is len (L + 1), offset = L + 1, matching length = 256 Module end Te ...
The matching length of the data string from the encoding position and the data string from the offset L-7 is obtained, and the result is len (L-7). When len (L-7) = 256, the offset = L-7, the matching length = 256, the matching length of the data string from the module end coding position and the data string from the offset L + 7 is obtained, and if len (L + 7) = 256 where the result is len (L + 7), the offset = L + 7 and the matching length = 256 Returns the maximum value of module end len1, len (L-7), (L-6) ..., len (L + 6), len (L + 7) and the offset at that time}
In this case, even if 256 matches occur and the subsequent match length search is interrupted, the offset that is the shortest offset code is always selected, so both improvement in coding efficiency and speeding up of processing can be achieved. For example, for the white portion of the image, a total of 4096 comparisons were required in the original method, but in the method of the present invention, 256 comparisons are sufficient, and the generated codes are the same.
[0038]
The search termination condition of the present invention may be set to be smaller than the maximum value of the matching length code in order to achieve high speed even if the coding efficiency is somewhat reduced. For example, when the maximum value of the match length code is 256, if a match exceeding 128 is obtained, the subsequent search is not performed.
[0039]
Next, the second invention will be described.
[0040]
In the module for obtaining the longest matching position, the data string from the encoding position is compared with the data string starting from each offset position to obtain the matching length. In the present invention, comparison is made with respect to a total of 16 offsets. It is carried out. When the encoding unit is byte-implemented, if it is simply implemented, conventionally, it is compared byte by byte as shown below.
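typedef unsigned char BYTE;
#define MAX_MATCHLEN 256 /* upper limit of the match length code (256 in the examples of this description) */
int search_matchlen (const BYTE * offset, const BYTE * cp) {
    int count = 0;
    /* compare byte by byte until a mismatch occurs or the upper limit is reached */
    while (count < MAX_MATCHLEN && offset[count] == cp[count]) count++;
    return (count);
}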
This may suffice when the compression processing runs on an 8-bit CPU, but on a 32-bit CPU, as is common in recent years, it can be sped up as follows. When the natural data length of the CPU is 32 bits, the second invention is implemented as shown below.
int search_matchlen (BYTE * offset, BYTE * cp) {
    count = 0
    (If the difference between offset and cp is not a multiple of 4 bytes, compare in units of 1 byte as in the conventional example; otherwise perform the following processing)
    (Compare offset and cp 1 byte at a time, for 1 to 3 bytes, until they reach a 4-byte boundary)
    while (* (int *) offset == * (int *) cp) count += 4 // 4-byte comparison (after the final mismatching 4 bytes, check whether 1 to 3 of those bytes match and add them to count)
    return (count)
}
If the offset is not a multiple of 4 bytes, the comparison cannot be made in units of 4 bytes, so it is made in units of 1 byte as in the conventional example. When the offset is a multiple of 4 bytes, the portion aligned to 4-byte boundaries can be compared in units of 4 bytes. Therefore, 1 to 3 bytes are first compared in units of 1 byte to determine whether the data match up to the 4-byte boundary. Once the 4-byte boundary is reached, high-speed comparison in units of 4 bytes continues until a mismatch occurs or the upper limit of the match length code is reached. Even when a mismatch occurs, 1 to 3 bytes within the last 4-byte word may still match, so those bytes are compared in units of 1 byte.
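The word-wise comparison described above can be sketched in runnable C as follows. This is a hedged illustration rather than the code of the invention: the function name `search_matchlen_word` and the constant `MAX_MATCHLEN` are assumptions introduced here, the caller is assumed to guarantee that `MAX_MATCHLEN` bytes are readable from both pointers, and `memcpy` into a `uint32_t` is used in place of the pointer cast shown in the pseudocode so that the load stays well defined in standard C.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned char BYTE;

#define MAX_MATCHLEN 256   /* assumed upper limit of the match length code */

/* Match length between the data at cp and the earlier data at offset,
 * comparing 4 bytes at a time whenever both pointers can be aligned. */
int search_matchlen_word(const BYTE *offset, const BYTE *cp)
{
    int count = 0;

    /* 4-byte comparison is only possible when the distance between the two
     * pointers is a multiple of 4; otherwise they can never both be aligned. */
    if (((uintptr_t)cp - (uintptr_t)offset) % 4 == 0) {
        /* Compare 1 to 3 bytes until cp (and hence offset) is aligned. */
        while (((uintptr_t)(cp + count) % 4) != 0 && count < MAX_MATCHLEN) {
            if (offset[count] != cp[count])
                return count;
            count++;
        }
        /* Compare whole 32-bit words. */
        while (count + 4 <= MAX_MATCHLEN) {
            uint32_t a, b;
            memcpy(&a, offset + count, 4);
            memcpy(&b, cp + count, 4);
            if (a != b)
                break;
            count += 4;
        }
    }
    /* Finish byte by byte: the tail after a word mismatch, or the whole
     * comparison when the pointers are not mutually aligned. */
    while (count < MAX_MATCHLEN && offset[count] == cp[count])
        count++;
    return count;
}
```

The final byte loop serves both cases named in the text: it resolves the 1 to 3 matching bytes inside the last mismatching word, and it is the complete fallback when the distance is not a multiple of 4.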
[0041]
On a typical 32-bit CPU, a 1-byte comparison and a 4-byte comparison take the same number of cycles. Therefore, performing the comparison as in the present invention yields a speedup of up to four times.
[0042]
Next, the third invention will be described.
[0043]
In the decompression process, each code is decoded to obtain the match offset and the match length, data of the match length is copied from the match offset in the data buffer, and the decoded data is appended to the output; this operation is repeated. In a straightforward conventional implementation, the memory copy is performed byte by byte, as shown below.
void matchl_copy (BYTE * offset, BYTE * cp, int length) {
    memcpy_in_BYTE (cp, offset, length) // memory copy in 1-byte units
}
As with the second invention, the copy can be sped up as follows on a 32-bit CPU, as is common in recent years. When the natural data length of the CPU is 32 bits, the third invention is implemented as shown below.
void matchl_copy (BYTE * offset, BYTE * cp, int length) {
    (If the difference between offset and cp is not a multiple of 4 bytes, copy in 1-byte units as in the conventional example; otherwise perform the following processing)
    (Copy 1 to 3 bytes in 1-byte units until offset and cp reach a 4-byte boundary)
    memcpy_in_4BYTE (cp, offset, length) // memory copy in 4-byte units (if there is a remainder, copy the last 1 to 3 bytes)
}
If the offset is not a multiple of 4 bytes, the memory copy cannot be performed in units of 4 bytes, so it is performed in units of 1 byte as in the conventional example. When the offset is a multiple of 4 bytes, the portion aligned to 4-byte boundaries can be copied in units of 4 bytes. Therefore, 1 to 3 bytes are first copied in units of 1 byte up to the 4-byte boundary. Once the 4-byte boundary is reached, memory is copied at high speed in 4-byte units until the match length is reached. If the end does not fall on a 4-byte boundary, the remaining bytes are copied in 1-byte units.
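The word-wise copy described above can be sketched in runnable C as follows. This is a minimal sketch under stated assumptions, not the code of the invention: the function name `matchl_copy_word` is hypothetical, `offset` is assumed to point to earlier data in the same buffer as `cp`, and the two regions may overlap, as happens in LZ77 decoding when the match length exceeds the offset.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned char BYTE;

/* Copy length bytes of an earlier match at offset to the output position cp.
 * The regions may overlap (cp - offset may be smaller than length), so bytes
 * must be produced in forward order. */
void matchl_copy_word(const BYTE *offset, BYTE *cp, int length)
{
    int i = 0;

    /* Word copies are used only when the distance is a nonzero multiple of 4:
     * then both pointers can be aligned together, and each 4-byte source word
     * lies entirely in data that has already been written. */
    uintptr_t dist = (uintptr_t)cp - (uintptr_t)offset;
    if (dist != 0 && dist % 4 == 0) {
        /* Copy 1 to 3 bytes until cp (and hence offset) is aligned. */
        while (i < length && ((uintptr_t)(cp + i) % 4) != 0) {
            cp[i] = offset[i];
            i++;
        }
        /* Copy whole 32-bit words. */
        while (i + 4 <= length) {
            uint32_t w;
            memcpy(&w, offset + i, 4);
            memcpy(cp + i, &w, 4);
            i += 4;
        }
    }
    /* Remaining 1 to 3 bytes, or everything when word copies are not usable. */
    while (i < length) {
        cp[i] = offset[i];
        i++;
    }
}
```

A library `memcpy` over the whole range would not be a safe substitute here, because its behavior is undefined for overlapping regions; the forward word loop stays correct because a distance of at least 4 keeps each source word behind the write position.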
[0044]
On a typical 32-bit CPU, a 1-byte memory copy and a 4-byte memory copy take the same number of cycles. Therefore, performing the memory copy as in the third invention yields a speedup of up to four times.
[0045]
Next, the main points of the present invention are summarized.
[0046]
When the compression and decompression of the Lempel-Ziv method (moving window method) are implemented in software:
<Compression>
If a long match is obtained at an offset examined earlier, the remaining offsets are not examined.
(Conventional)
for (from offset 1 to offset N) {
len 1 = search_matchlen (offset 1, current_pointer)
len 2 = search_matchlen (offset 2, current_pointer)
len 3 = search_matchlen (offset 3, current_pointer)
...
}
maxlen = max (len 1, len 2 ...)
(Invention)
for (from offset 1 to offset N) {
if ((len 1 = search_matchlen (offset 1, current_pointer)) >= thresh_len) break
if ((len 2 = search_matchlen (offset 2, current_pointer)) >= thresh_len) break
if ((len 3 = search_matchlen (offset 3, current_pointer)) >= thresh_len) break
...
}
maxlen = max (len 1, len 2 ...)
The search_matchlen function performs its comparison in units of the native word length of the processor (4 bytes for a 32-bit processor).
<Decompression>
When the original image is reconstructed from a match code, if the word boundaries coincide, the memcpy operation is executed with 4-byte copy instructions.
[0047]
As described above, according to the present invention, when an LZ77-based compression device is realized in software, the offset giving the longest match is searched for efficiently by exploiting the periodicity of image data, and the match-length search ends as soon as such an offset is found. Redundant match-length searches are thereby omitted, and the compression processing speed improves. Further, when checking the match length at each offset, if the compression unit (for example, 1 byte) differs from the native word length of the processor that runs the compression software (for example, 4 bytes for a 32-bit processor), processing efficiency is improved by comparing in units of the native word length. Likewise, when the decompression process is implemented in software, processing efficiency is improved by performing the memory copy for decoding a match code in units of the processor's native word length.
[0048]
[Effect of the Invention]
According to the present invention, it is possible to provide a data compression method capable of compressing, at high speed, data in which the same data continues, that is, data having relatively long match lengths.
[0049]
(1) When a conventional LZ77-based data compression apparatus is realized in software, the data string to be encoded is compared with the data strings at all candidate offset positions, so that, for example, areas consisting entirely of white were compared needlessly. In the present invention, attention is paid to the periodicity of image data: comparison starts from the offset positions that are two-dimensionally closest, and once an offset position reaching the maximum match length defined by the match code is found, comparison with the remaining offset positions is cancelled, which shortens the compression processing time.
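As an illustration of what "two-dimensionally close" means for a raster-scanned stream, the sketch below converts a neighbor position into a stream offset. It assumes L symbols per line and offsets measured backwards from the current symbol; the function name `neighbor_offset` and its parameters are hypothetical and introduced only for this example.

```c
/* Stream offset (distance back from the current symbol) of the symbol that
 * lies `up` lines above and `left` columns to the left of it, for a raster
 * stream with L symbols per line. A negative `left` means to the right. */
long neighbor_offset(long L, long up, long left)
{
    return up * L + left;
}
```

Under these assumptions, the left neighbor is at offset 1, the symbol directly above is at offset L, and the symbols above-left and above-right are at offsets L+1 and L-1, so the range L-n to L+n covers the 2n+1 symbols of the previous line nearest to the current column.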
[0050]
Even when the maximum match length specified by the match code is not reached, a similar effect is obtained by setting a match-length threshold in advance and cancelling comparison with the subsequent offsets once a match longer than the threshold is obtained. In this case the optimum compression rate may not be obtained, but the loss in compression rate can be limited by adjusting the threshold.
[0051]
(2) When a conventional LZ77-based data compression apparatus is realized in software, the data string to be encoded must be compared with the data string from each candidate offset position to determine the match length. Conventionally, when the compression processing unit is 1 byte, the two data strings are compared in units of 1 byte to obtain the match length. In the present invention, when the native word length of the processor that runs the compression software (for example, 4 bytes for a 32-bit processor) is larger than the compression unit (for example, 1 byte), the data strings are compared in units of the processor's native word length, which improves processing efficiency.
[0052]
(3) When a conventional LZ77-based data decompression device is realized in software, decoding a match code requires copying original data of the match length from the offset position indicated by the match code. Conventionally, this memory copy was performed in units of the compression processing unit (for example, 1 byte). In the present invention, when the native word length of the processor that runs the software (for example, 4 bytes for a 32-bit processor) is larger than the compression unit (for example, 1 byte), the memory copy is performed in units of the processor's native word length, which improves processing efficiency.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining an outline of a data compression method according to the present invention;
FIG. 2 is a diagram for explaining an outline of a conventional data compression method;

Claims

A data processing method for encoding, and thereby compressing, an image data stream that is obtained by scanning an image and corresponds to a plurality of lines of line length L of the image, wherein
a specific symbol included in the image data stream and a plurality of symbols beginning with the specific symbol are taken as encoding target symbols; the symbol at offset 1, adjacent to the specific symbol on the upstream side, and a plurality of symbols at offsets L+n to L-n, on the upstream and downstream sides of the stream centered on the symbol at offset L one line length away from the specific symbol, are taken as comparison target symbols; and when the encoding target symbols and the comparison target symbols are sequentially compared to detect a match length, the remaining comparisons are cut off and detection of the match length is ended at the point when a predefined maximum match length is obtained, and encoding is performed on the basis of the detected match length.

A data processing method for encoding, and thereby compressing, an image data stream that is obtained by scanning an image and corresponds to a plurality of lines of line length L of the image, wherein
a specific symbol included in the image data stream and a plurality of symbols beginning with the specific symbol are taken as encoding target symbols; the symbol at offset 1, adjacent to the specific symbol on the upstream side, and a plurality of symbols at offsets L+n to L-n, on the upstream and downstream sides of the stream centered on the symbol at offset L one line length away from the specific symbol, are taken as comparison target symbols; when the encoding target symbols and the comparison target symbols are sequentially compared to detect a match length and perform encoding, the symbols at offsets 1, L, L+1, and L-1 are preferentially selected as comparison targets for the specific symbol included in the encoding target symbols; and when the encoding target symbols and the comparison target symbols are sequentially compared to detect the match length, the remaining comparisons are cut off and detection of the match length is ended at the point when a predefined maximum match length is obtained, and encoding is performed on the basis of the detected match length.

A data processing method for encoding, and thereby compressing, an image data stream that is obtained by scanning an image and corresponds to a plurality of lines of line length L of the image, wherein
a specific symbol included in the image data stream and a plurality of symbols beginning with the specific symbol are taken as encoding target symbols; the symbol at offset 1, adjacent to the specific symbol on the upstream side, and a plurality of symbols at offsets L+n to L-n, on the upstream and downstream sides of the stream centered on the symbol at offset L one line length away from the specific symbol, are taken as comparison target symbols; when the encoding target symbols and the comparison target symbols are sequentially compared and encoded, the comparison priority for the specific symbol included in the encoding target symbols is, from highest to lowest, the symbols at offsets L, 1, L-1, L+1, L-2, L+2, ..., L-n, and L+n; and when the encoding target symbols and the comparison target symbols are sequentially compared to detect a match length, the remaining comparisons are cut off and detection of the match length is ended at the point when a predefined maximum match length is obtained, and encoding is performed on the basis of the detected match length.

The data processing method according to claim 1, claim 2, or claim 3, wherein, when the arithmetic processing unit responsible for the compression processing is an n-bit arithmetic processing unit, the data is compressed in units of n bits.

The data processing method according to claim 1, claim 2, or claim 3, wherein one symbol is 8-bit data and, when the arithmetic processing unit responsible for the compression processing is an n-bit arithmetic processing unit, the data is compressed in units of (n/8 × symbol).

The data processing method according to claim 1, claim 2, claim 3, claim 4, or claim 5, wherein, when part of the data included in an encoded data stream is copied in order to decompress the encoded data stream and the arithmetic processing unit that copies the data is an n-bit arithmetic processing unit, the data is copied in units of n bits.

The data processing method according to claim 1, claim 2, claim 3, claim 4, or claim 5, wherein, when part of the data included in an encoded data stream is copied in order to decompress the encoded data stream, one symbol is 8-bit data, and the arithmetic processing unit that copies the data is an n-bit arithmetic processing unit, the data is copied in units of (n/8 × symbol).

A data processing method for encoding, and thereby compressing, an image data stream that is obtained by scanning an image and corresponds to a plurality of lines of line length L of the image, wherein
a specific symbol included in the image data stream and a plurality of symbols beginning with the specific symbol are taken as encoding target symbols; the symbol at offset 1, adjacent to the specific symbol on the upstream side, and a plurality of symbols at offsets L+n to L-n, on the upstream and downstream sides of the stream centered on the symbol at offset L one line length away from the specific symbol, are taken as comparison target symbols; and when the encoding target symbols and the comparison target symbols are sequentially compared to detect a match length, the remaining comparisons are cut off and detection of the match length is ended at the point when a match exceeding half the maximum value of the match length code is obtained, and encoding is performed on the basis of the detected match length.