JPH04123619A

JPH04123619A - Data compressing and restoring device

Info

Publication number: JPH04123619A
Application number: JP2245176A
Authority: JP
Inventors: Hirotaka Chiba; 広隆千葉; Yoshiyuki Okada; 佳之岡田; Shigeru Yoshida; 茂吉田; Yasuhiko Nakano; 泰彦中野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-09-14
Filing date: 1990-09-14
Publication date: 1992-04-23
Anticipated expiration: 2015-05-08
Also published as: JP3038233B2

Abstract

PURPOSE: To shorten dictionary search time and speed up operation by measuring the appearance frequency of the data and applying a code conversion that assigns smaller code values to data in descending order of frequency. CONSTITUTION: A measuring circuit 12 measures the appearance frequency of a group of input data, and the result is sent to a converting circuit 14, which generates a conversion table. For example, if the input data with decimal code value 1 has the tenth-highest appearance frequency, the value '10', indicating its frequency rank, is stored at the conversion-table address given by the input code value '1'. Input data with decimal code value 1 is therefore converted by the conversion table to the frequency-based code value '10' and passed to a dictionary retrieving circuit 22, where the dictionary search and dictionary registration for LZW encoding are carried out.

Description

[Detailed description of the invention]

[Summary] The invention concerns a dictionary search method for a data compression device that uses LZW coding, an improvement on the incremental parsing type of universal coding. Its object is to shorten dictionary search time by enabling high-speed search of a dictionary memory organized as the linked lists of the external hash method. During compression, the usage frequency of the input data is measured, and the input data is converted so that more frequently used data takes a smaller value and is therefore placed nearer the head of the linked list. During decoding, the decoded data is converted back to the original input data using the inverse of the conversion rule created during compression.
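
The conversion described in the summary can be sketched as follows. This is a minimal illustration and not the circuit of the patent: the function name `build_conversion_table`, the 256-value alphabet, the 0-based ranks, and the tie-break by original code value are all assumptions made here. The decoder is also assumed to receive the inverse table by some means the text does not specify.

```python
from collections import Counter


def build_conversion_table(data, alphabet_size=256):
    """Return (forward, inverse) tables that rank code values by frequency.

    forward[value] is the converted code: 0 for the most frequent value,
    1 for the next, and so on (0-based ranks are an assumption here).
    inverse[code] gives back the original value.
    """
    counts = Counter(data)
    # Most frequent first; ties are broken by the original code value
    # so that the table is deterministic.
    order = sorted(range(alphabet_size), key=lambda v: (-counts[v], v))
    forward = [0] * alphabet_size
    for rank, value in enumerate(order):
        forward[value] = rank
    return forward, order


data = b"banana"
forward, inverse = build_conversion_table(data)
converted = bytes(forward[v] for v in data)      # input to the LZW encoder
restored = bytes(inverse[c] for c in converted)  # applied after LZW decoding
```

In this example the most frequent byte, `a`, receives code 0, so it would sit at the head of any list ordered by ascending code value, and applying `inverse` restores the original bytes.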

[Industrial application field]

The present invention relates to a dictionary search method for a data compression device that uses LZW coding, an improvement on the incremental parsing type of universal coding.

In recent years computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is growing rapidly. When large amounts of data are handled, removing the redundant parts of the data to compress it reduces the storage capacity required and allows faster transmission.

Universal coding has been proposed as a way to compress such varied data with a single method. The invention is not limited to compressing character codes and can be applied to many kinds of data. Below, however, following the terminology of information theory, one word of data is called a character, and a run of several consecutive words is called a character string.

A representative universal code is the Ziv-Lempel code (for details see, for example, Munakata, "Ziv-Lempel data compression method," Information Processing, Vol. 26, No. 1, 1985). Two algorithms have been proposed for Ziv-Lempel codes:
■ the universal type
■ the incremental parsing type

The LZSS code is an improvement of the universal-type algorithm (see T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986). The LZW (Lempel-Ziv-Welch) code is an improvement of the incremental parsing algorithm (see T. A. Welch, "A Technique for High-Performance Data Compression," Computer, June 1984). Of these codes, LZW has come to be used for file compression in storage devices and similar applications because it can be processed at high speed and its algorithm is simple.

[Conventional technology]

FIG. 4 shows the encoding flow of conventional LZW coding, and FIG. 5 shows the decoding flow.

LZW encoding keeps a rewritable dictionary. It divides the input character string into mutually different strings (substrings) and registers them in the dictionary with reference numbers assigned in order of appearance. At the same time, it encodes the string currently being input as the reference number of the longest matching string registered in the dictionary.

FIG. 6 illustrates LZW encoding, FIG. 7 illustrates LZW decoding, and FIG. 8 shows an example of the dictionary built during decoding. To keep the explanation simple, FIGS. 6, 7, and 8 take the example of compressing and restoring data made up of combinations of the three characters a, b, and c.

In the LZW encoding of FIG. 4, step S1 (the word "step" is omitted below) first registers in the dictionary, as initial values, a one-character string for every character, and then encoding starts. In S1 the dictionary is searched with the first input character K to obtain its reference number ω, which becomes the prefix string.

Next, S2 reads the next character K of the input data, and S3 checks whether the character input has ended. S4 then looks in the dictionary for the extended string (ωK) formed by appending the character K read in S2 to the prefix string ω obtained in S1.

If the string (ωK) is not in the dictionary at S4, S6 outputs the reference number ω of the prefix as the code word code(ω), registers the string (ωK) in the dictionary with a new reference number, makes the input character K of S2 the new ω, increments the dictionary address n, and returns to S2 to read the next character K.

If the string (ωK) is in the dictionary at S4, S5 makes the string (ωK) the new ω and the flow returns to S2. The search for the longest match continues until (ωK) can no longer be found in the dictionary at S4.

LZW encoding is explained concretely with reference to FIGS. 4 and 6 as follows.

The input data (INPUT) in FIG. 6 is read from left to right. When the first character a is input, the dictionary contains no matching string other than the character a itself, so OUTPUT CODE 1 (the reference number ω) is output as the code word, and the character a becomes the prefix string ω.

When the second character b is input, the extended string ωK = ab, formed by appending this character to the prefix string ω, is not in the dictionary, so OUTPUT CODE 2 for the character b is output as the code word. The extended string ωK = ab is given reference number 4 and registered in the dictionary. The actual entry is stored as the string 1b, as shown on the right of FIG. 6. The character b then becomes the prefix string ω.

When the third character a is input, the extended string ωK = ba = 2a is not in the dictionary. With OUTPUT CODE 2 for the character b output as the code word, the extended string ωK = ba is expressed as 2a, given reference number 5, and registered in the dictionary. The character a then becomes the new prefix string ω.

For the fourth input character b, the extended string ωK = ab is already registered in the dictionary as 1b with code word 4, so the string ωK becomes the new prefix string ω. The fifth character c is then input to form the extended string ωK = 4c = abc. Since this extended string ωK = abc is not registered in the dictionary, OUTPUT CODE 4 for the string ab = 1b is output as the code word, and the extended string ωK = abc is registered in the dictionary in the form 4c with code word 6. Processing continues in the same way.

The decoding of FIG. 5 performs the reverse of the encoding of FIG. 4. As in encoding, a one-character string for every character is first registered in the dictionary as initial values, and then decoding starts.

S1 reads the first code (reference number) and sets the current CODE as OLDcode. Because the first code always corresponds to one of the single-character reference numbers already registered in the dictionary, the character code(K) matching the input CODE is found and the character K is output. The output character K is also set in FINchar for the exception handling described later.

S2 then reads the next code and sets it in CODE as INcode. S3 checks whether there is a new code, that is, whether the code input has ended, and S4 checks whether the input code CODE is defined (registered) in the dictionary.

Normally the input code word has already been registered in the dictionary by the preceding processing, so S5 reads the string code(ωK) corresponding to CODE from the dictionary, and S6 temporarily stacks the character K and returns to S5 with the reference number CODE(ω) as the new CODE. S5 and S6 are repeated recursively until the reference number ω reaches a single character K, and finally S7 pops the characters stacked at S6 in LIFO (last in, first out) order and outputs them. Also at S7, the string (ωK), formed from the previously used code ω and the first character K of the string just restored, is given a new reference number and registered in the dictionary.

LZW decoding is explained concretely with reference to FIG. 7 as follows.

The first input code word (INPUT CODE) in FIG. 7 is 1. The single characters a, b, and c are already registered in the dictionary under reference numbers 1, 2, and 3, as shown in FIG. 8, so by referring to the dictionary the code word is replaced with the string a, whose reference number matches code word 1, and output.

The next code word 2 is likewise replaced with the character b and output. At this point the string ωK = 1b, combining the previously processed code word 1 with the first character b of the string just decoded, is given the new reference number 4 and registered in the dictionary.

The third code word 4 is replaced, via the string 1b found by dictionary search, with the string ab, which is output. At the same time the string ωK = 2a (= ba), combining the previously processed code word 2 with the first character a of the string just decoded, is given the new reference number 5 and registered in the dictionary. Processing continues in the same way.

The LZW decoding of FIG. 7 involves the following exception handling, which arises when decoding the sixth input code word, 8. Code word 8 is not yet defined in the dictionary at that point and cannot be decoded. In this case, the string 5b is formed by appending the first character b of the previously decoded string ba to the previously processed code word 5, and it is expanded as 5b = 2ab = bab and output. After the output, the string 5b, formed from the previous code word 5 and the first character b of the string just decoded, is given reference number 8 and registered in the dictionary. This exception handling is carried out through S4 and S8 of the decoding flow of FIG. 5, and the string is finally output, and the new string registered with its reference number, at S7.

The LZW decoding of FIGS. 5 and 7 has been described for the case in which the decoder builds the dictionary in real time as it interprets the codes, but decoding may instead use a copy of the dictionary built during encoding. In that case the exception handling on the decoding side is unnecessary.

When LZW encoding follows the procedure of FIG. 4, each dictionary lookup of a string may in the worst case require searching the entire dictionary, so dictionary search is slow. Conventional dictionary search therefore uses the external hash method (open hashing, or chaining) to raise the processing speed.

In dictionary search by the general hash method, given a set S of character strings, the address at which a string x of S is stored can be computed directly from x itself, which allows high-speed dictionary search. If the storage locations, that is, the hash table, are addressed from 0 to m-1, the hash method fixes one function h: S → {0, 1, ..., m-1} and takes h(x) as the address of string x. The function h(x) is called the hash function, and the value h(x) the hash address of x. The hash method is normally used when the set S is much larger than the number of addresses m.

However the hash function h is chosen, two different strings x1 and x2 of S may have the same hash address, h(x1) = h(x2). This is called a collision, and the external hash method is one countermeasure against it.

As shown in FIG. 9, the external hash method prepares a linked list for each hash address i given by the index (directory), and stores the colliding strings x with hash address h(x) = i in order from the head of that list. Each linked list of strings sharing a hash address h(x) is called a bucket.

FIG. 10 shows the processing flow of LZW encoding that uses the list structure of the external hash method for dictionary search. FIG. 11 shows the organization of the dictionary memory under the external hash method and, taking the tree of already-encoded strings in FIG. 12 as an example, shows concretely the search and registration procedures of LZW encoding.

In FIG. 11 the dictionary memory consists of a first memory 100, a next memory 200, and an extended memory (extension memory) 300, which is an extension area of the next memory 200. The first memory 100 corresponds to the index (directory) of the external hash method in FIG. 9, the next memory 200 corresponds to the "next" field of the linked list in FIG. 9, and the extended memory 300 corresponds to the field of FIG. 9 that holds the registered item.

In the tree of FIG. 12, the characters K10, K21, K22, ..., K41 are already registered, and K42, drawn with a broken line, is about to be newly registered. The level within this tree is given by the i counter in the processing of FIG. 10, and the number of the character within a level is given by the j counter. The registration address of each character is therefore written ωij.

Suppose the string "K10, K22, K32, K42", contained in the registered tree of FIG. 12, is input. LZW encoding and registration by dictionary search then proceed as follows under the processing flow of FIG. 10.

In FIG. 10, S1 first performs the following initialization.
■ The dictionary is initialized to contain the first-level characters. For the 26 letters of the alphabet, for example, the character codes are registered directly as hash addresses in the first memory of FIG. 11. In FIG. 12 this corresponds to the tree-top character K10 being registered at address ω10.
■ The current number n of characters registered in the dictionary is set to the number registered in the first item above. For the 26 letters of the alphabet, n = 26.
■ The first input character K becomes the prefix string i. In FIG. 12 the first input character is K10, so the prefix string is i = 1. In the processing flow below, the prefix string i is described as the i counter.
■ The arrays used for dictionary search are initialized to 0. The search arrays of the first, next, and extended memories are written First[1, Nmax], Next[1, Nmax], and EXT[1, Nmax], and these are initialized to 0.

When the initialization of S1 is complete, S2 reads the next character "K22". S3 then checks whether any unprocessed characters remain. Once everything has been processed, the flow goes to S16, outputs the code word code(ω), and ends. Here unprocessed characters remain, so the flow proceeds to the dictionary search steps S5 to S9.

In the dictionary search steps, S5 first sets the i of address ωij to the current value of the prefix string, i = 1, and sets the j counter to j = 0. This generates the first-memory address ωij = ω10.

S6 then reads the content of address ω10 of the first memory 100, which yields the address ωij = ω21, so the i counter is set to i = 2.

S7 checks whether i = 0. Since i = 2 here, the flow goes to S8, which reads the character "K21" from the extended memory 300 at the address ω21 obtained from the first memory 100 in S6 and compares it with the input character "K22" obtained in S2. The two do not match, so the flow goes to S9, which sets the current i counter value, i = 2, into the j counter (j = 2) and sets the i of the address ωij = ω22, stored at address ω21 of the next memory 200, into the i counter as i = 2. A new address ωij = ω22 is thus produced.

The flow returns to S7 and checks i = 0. Since i = 2, it goes to S8 again, reads the registered character "K22" from the extended memory 300 at address ω22, and compares it with the input character "K22". The two match, so the flow returns to S2 and reads the next character "K32". By repeating S5 to S9 in the same way, the dictionary is searched in the order shown by the solid arrows in FIG. 11, up to the already registered character "K41".

When the search reaches the registered character "K41" and S8 finds that it does not match the last input character "K42", S9 again copies the i counter into the j counter and, because the content of the next memory 200 at address ω41 is 0, sets i = 0. On returning to S7, i = 0 is detected, so the flow leaves the dictionary search steps and goes to S10, where the address ω32, which represents the string "K10, K22, K32" matched so far, is output as the code word code(ω). The flow then proceeds to the dictionary registration steps S11 to S14.

In the dictionary registration steps, S11 first sets the i counter from the current registration count n (the value 4 in this example) and then increments n by one. The character "K42" is then registered at address ωij = ω42 of the extended memory 300.

S12 checks whether j = 0. Since j = 2 here, the flow goes to S14, which writes the address ω42, where the character "K42" was registered, into address ω41 of the next memory 200. If j = 0 at S12, that is, if the registration is to be made in the first memory 100, the character registration address in the extended memory 300 is stored in the first memory 100, as shown for addresses such as ω11 and ω22 of the first memory 100 in FIG. 11.

With the character "K42" registered in this way, the next memory 200 and the extended memory 300 of FIG. 11 take the state shown for addresses ω41 and ω42, set off by a broken line at the bottom of the figure, and the address ω42 of the new character "K42" has been added to the tree of FIG. 12. For convenience of explanation, FIG. 11 shows address ω41 twice, once for the search and once for the registration.

When the dictionary registration steps S11 to S14 are complete, S15 sets the registered character "K42" as the new prefix string i, that is, as the value of the i counter, and the flow returns to S2 to search the dictionary for the string that follows, with "K42" as the tree top.
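
The encoding and decoding walk-throughs above, including the exception for a code word that is not yet registered, can be sketched compactly. This is a minimal illustration under stated assumptions and not the flow of FIGS. 4 and 5 itself: the function names are hypothetical, a Python `dict` stands in for the dictionary memory, whole strings are stored instead of (reference number, character) pairs, and the sample input is chosen here so that its codes begin 1, 2, 4 and its sixth code is 8, as in the walk-through.

```python
def lzw_encode(text, alphabet):
    """Encode text as reference numbers; single characters get 1, 2, 3, ..."""
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_ref = len(dictionary) + 1
    codes = []
    prefix = ""  # the prefix string (omega); text must use only alphabet characters
    for ch in text:
        candidate = prefix + ch  # the extended string (omega K)
        if candidate in dictionary:
            prefix = candidate  # keep extending the longest match
        else:
            codes.append(dictionary[prefix])
            dictionary[candidate] = next_ref  # register with a new reference number
            next_ref += 1
            prefix = ch
    if prefix:
        codes.append(dictionary[prefix])
    return codes


def lzw_decode(codes, alphabet):
    """Rebuild the text, creating the dictionary while decoding."""
    strings = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_ref = len(strings) + 1
    if not codes:
        return ""
    previous = strings[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in strings:
            current = strings[code]
        elif code == next_ref:
            # Exception case: the code is not registered yet, so it must be
            # the previous string plus that string's own first character.
            current = previous + previous[0]
        else:
            raise ValueError(f"invalid code word: {code}")
        out.append(current)
        # Register previous string + first character of the string just decoded.
        strings[next_ref] = previous + current[0]
        next_ref += 1
        previous = current
    return "".join(out)


codes = lzw_encode("ababcbababaaaaaaa", "abc")
decoded = lzw_decode(codes, "abc")
```

With this input, the sixth code is 8, which the decoder has not yet registered when it arrives, so decoding passes through the exception branch and produces `bab`.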

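The external hash method described in this section (one chain per hash address, with colliding strings stored in order along the chain) can be sketched as follows. This is a generic illustration, not the first/next/extended memory layout of FIG. 11: the class name `ChainedHashTable`, the toy hash function (sum of bytes modulo m), and the use of Python lists as chains are assumptions made here.

```python
class ChainedHashTable:
    """Hash table with separate chaining: one bucket (chain) per address."""

    def __init__(self, m):
        self.buckets = [[] for _ in range(m)]  # addresses 0 .. m-1

    def address(self, key):
        # Toy hash function h(x): deliberately simple so that collisions
        # such as "ab" and "ba" are easy to see.
        return sum(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key, value):
        chain = self.buckets[self.address(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: update it
                return
        chain.append((key, value))  # new keys go to the end of the chain

    def lookup(self, key):
        """Return (value or None, number of chain entries compared)."""
        probes = 0
        for k, v in self.buckets[self.address(key)]:
            probes += 1
            if k == key:
                return v, probes
        return None, probes


table = ChainedHashTable(8)
table.insert("ab", 4)
table.insert("ba", 5)  # collides with "ab" and is chained after it
```

Looking up `ba` here needs two comparisons because it sits second in its chain, which is the sequential cost that the rest of the document sets out to reduce.
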
【発明が解決しようとする課題］このように従来のＬＺＷ符号化にあっては、ソフトウェ
アにより第４図に示した処理フローを実行して符号化す
る場合、辞書検索処理に多くの時間を要することから、
外部ハツシュ法を利用して第１０図の処理フローにより
辞書検索の高速化を図っている。しかしながら、外部ハツシュ法を利用した辞書検索にあ
っては、連結リストの検索により候補文字の続出、候補
文字と入力文字との照合、一致不一致の判定がシーケル
シャルに行なわれるために、辞書検索時間が全体時間の
約８０％を占め、高速化が難しいという欠点があった。この欠点を解消するため辞書検索にパイプライン制御を
取り入れて高速化を図ることも提案されているが、辞書
メモリの転送速度に対し実時間の処理速度を得ることは
困難であった。本発明は、このような従来の問題点に鑑みてなされたも
ので、外部ハツシュ法の連結リスト構造を備えた辞書メ
モリのより一層の高速検索を図って辞書検索時間を短縮
できるデータ圧縮及び復元装置を提供するここを目的き
する。【課題を解決するための手段】第１図は本発明の原理説明図である。まず本発明は、符号化済みデータを相異なる部分列に分
けて各部分列毎に異なる参照番号を付加して辞書に登録
しておき、入力データを該辞書中の部分列の内、最大長
一致する部分列の参照番号で指定して符号化するデータ
圧縮装置、例えばＬｚＷ符号化を行なうデータ圧縮装置
を対象とする。このようなデータ圧縮装置として本発明にあっては、候補データの登録場所とは別に、入力データのコード値
の小さい順番に前記候補データの連結アドレスを示す辞
書検索リストを備え、該辞書検索リストに従って入力デ
ータに一致する候補データを順次検索し、最後に一致し
た候補データの連結アドレスで示される参照番号を一連
の入力データ列の符号データとして出力する該符号化手
段１０と；入力データの出現頻度を計測する計測手段１２と；計測手段１２の計測結果に基づき、出現頻度が高いほど
値が小さく出現頻度が低いほど値が大きいコードに入力
データを変換するコード変換手段１４と；を設け、コード変換手段１４により出現頻度に従ってコ
ード変換された入力データを符号化手段１０により符号
化することを特徴とする。ここで計測手段１２は、入力コードと１又は複数コード
前からの入力コードとの組合せ関係から出現頻度を計測
するようにしても良い。また符号化手段１２は、外部ハツシュ法のリスト構造に
従ったファーストメモリ及び拡張メモリを有するネクス
トメモリを備えた辞書メモリ２０と、入力データに基づ
いたアドレス発生により辞書メモリ２０の拡張メモリに
格納された入力データに一致する候補データを検索する
辞書検索手段２２とを備える。また符号化済み文字データを相異なる部分列に分けて各
部分列毎に異なる参照番号を付加して辞書に登録してお
き、入力データを該辞書中の部分列の内、最大長一致す
る部分列の参照番号で指定して圧縮符号化された符号デ
ータから元の入力データを復元するデータ復元装置とし
て本発明は、符号データを復号する復号手段１６と；入
力データを圧縮符号化する際に計測された出現頻度に基
づき、出現頻度が高いほど値が小さく出現頻度が低いほ
ど値が大きいコードに入力データを変換する変換規則の
逆変換規則に従って復号手段１６の復元データを元の入
力データに逆変換する逆変換手段１８とを備えたことを
特徴とする。[Problems to be Solved by the Invention] As described above, in conventional LZW encoding, when encoding is performed by executing the processing flow shown in FIG. 4 by software, a lot of time is required for dictionary search processing. Therefore,
By using the external hash method, the processing flow shown in FIG. 10 speeds up the dictionary search. Even so, in a dictionary search using the external hash method, searching the linked list (generating candidate characters one after another and comparing each with the input character to determine whether they match) occupies about 80% of the total processing time, which makes further speed-up difficult. Pipeline control has been proposed to overcome this drawback, but it has remained difficult to achieve a real-time processing speed relative to the transfer speed of the dictionary memory.

The present invention has been made in view of these conventional problems. Its object is to provide a data compression and restoration device that shortens the dictionary search time by enabling an even faster search of a dictionary memory that has a linked list structure based on the external hash method.

[Means for Solving the Problems]

FIG. 1 is a diagram illustrating the principle of the present invention.

First, the present invention is directed to a data compression device, such as one that performs LZW encoding, in which already encoded character data is divided into mutually different substrings, each substring is registered in a dictionary with a distinct reference number, and input data is compressed and encoded by specifying the reference number of the longest matching substring in the dictionary.

In the present invention, such a data compression device comprises: encoding means 10, which has, separately from the location where candidate data is registered, a dictionary search list indicating the linked addresses of the candidate data in ascending order of the code value of the input data, which sequentially searches for candidate data matching the input data according to the dictionary search list, and which outputs the reference number indicated by the linked address of the last matching candidate data as the code data for a series of input data; measuring means 12 for measuring the appearance frequency of the input data; and code conversion means 14 for converting the input data, based on the measurement result of the measuring means 12, into a code whose value is smaller the higher the appearance frequency and larger the lower the appearance frequency. The input data that has been code-converted according to appearance frequency by the code conversion means 14 is then encoded by the encoding means 10.

Here, the measuring means 12 may measure the appearance frequency from the combination of the input code with the input codes from one or more codes before.

Further, the encoding means 10 comprises a dictionary memory 20 having, according to the list structure of the external hash method, a first memory and a next memory with an extended memory, and dictionary search means 22 that generates addresses based on the input data and searches for candidate data, stored in the extended memory of the dictionary memory 20, that matches the input data.

The present invention is also directed to a data restoration device that restores the original input data from code data produced by dividing already encoded character data into mutually different substrings, registering each substring in a dictionary with a distinct reference number, and compressing and encoding the input data by specifying the reference number of the longest matching substring in the dictionary. This data restoration device comprises decoding means 16 for decoding the code data, and inverse conversion means 18 for converting the data restored by the decoding means 16 back into the original input data. The inverse conversion follows the inverse of the conversion rule that, based on the appearance frequency measured when the input data was compressed and encoded, converts the input data into a code whose value is smaller the higher the appearance frequency and larger the lower the appearance frequency.
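The search that the encoding means performs can be illustrated with a minimal Python sketch. It is not the circuit of the invention: the dictionaries `first`, `nxt`, and `char` are hypothetical stand-ins that only loosely mirror the first memory, next memory, and extended memory, the dictionary size is unbounded, and the walk stops early once a larger code value is reached (an assumption that the ascending order makes possible). The counter `compares` tallies how many candidate characters are examined, which is the cost the invention aims to reduce.

```python
def lzw_encode(data: bytes):
    """LZW encoding where the children of each entry form an ascending linked list.

    Returns (list of reference numbers, number of candidate comparisons).
    """
    first = {}       # reference number -> first child (smallest character code)
    nxt = {}         # reference number -> next sibling in ascending order
    char = {}        # reference number -> character appended by that entry
    next_ref = 256   # 0..255 are reserved for the single-character strings
    out, compares = [], 0
    if not data:
        return out, compares

    w = data[0]                      # reference number of the string matched so far
    for k in data[1:]:
        prev, cur = None, first.get(w)
        # Walk the sibling list while the candidate code is smaller than k.
        while cur is not None and char[cur] < k:
            compares += 1
            prev, cur = cur, nxt.get(cur)
        if cur is not None:
            compares += 1            # the comparison that stopped the walk
        if cur is not None and char[cur] == k:
            w = cur                  # string w+k is registered: keep extending
            continue
        # No match: emit w, then register w+k between prev and cur.
        out.append(w)
        char[next_ref] = k
        if cur is not None:
            nxt[next_ref] = cur
        if prev is None:
            first[w] = next_ref
        else:
            nxt[prev] = next_ref
        next_ref += 1
        w = k
    out.append(w)
    return out, compares


codes, compares = lzw_encode(b"ABABABA")
print(codes, compares)
```

Because each sibling list is kept in ascending code order, a character with a small code value is reached after few comparisons, which is why remapping frequent characters to small values pays off.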

【作用】[Operation]

According to the data compression and restoration device of the present invention configured in this way, the target is a dictionary memory having a dictionary search list (linked list) according to the external hash method, in which the linked addresses of candidate data are arranged in ascending order of the code value of the input characters. The appearance frequency (usage frequency) of the input characters is measured, the code values are converted so that characters with a higher appearance frequency receive smaller values, and the linked list is built from the converted codes. As a result, the more frequently a character appears, the closer it is placed to the head of the linked list, and the less frequently it appears, the closer it is placed to the tail. The time needed to search the linked list is therefore shorter for frequently used characters, and a candidate character matching the input character can be found with the fewest search steps.
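The effect can be checked with a small worked example in Python. This is a simplified model under stated assumptions, not a measurement of the device: all characters are treated as candidates in one linked list sorted by ascending code value, finding the candidate at position i costs i comparisons, and the frequencies in `freq` are made-up illustrative numbers.

```python
def walk_cost(child_codes, target):
    """Comparisons needed to reach `target` in a list sorted by ascending code."""
    return sorted(child_codes).index(target) + 1


def total_cost(freq, code_of):
    """Total comparisons over all lookups, weighted by how often each symbol occurs."""
    codes = [code_of[s] for s in freq]
    return sum(n * walk_cost(codes, code_of[s]) for s, n in freq.items())


# Hypothetical frequencies, for illustration only.
freq = {"z": 1, "e": 50, "a": 5, "t": 30}

raw_codes = {s: ord(s) for s in freq}   # original character codes
rank_codes = {s: r for r, s in enumerate(sorted(freq, key=lambda s: -freq[s]))}

print(total_cost(freq, raw_codes))      # list order a, e, t, z
print(total_cost(freq, rank_codes))     # list order e, t, a, z
```

With the raw codes the frequent characters "e" and "t" sit behind "a", giving 199 comparisons in total. After the rank conversion they move to the head of the list and the total drops to 129.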

【実施例】[Embodiment]

FIG. 2 is a configuration diagram of an embodiment of the present invention that functions as both a data compression device and a data restoration device.

In FIG. 2, the original data (character string) to be encoded is input via a DMA (Direct Memory Access) control circuit 24. An MPU 30 serving as control means implements the functions of the encoding means 10 and the decoding means 16 under program control. A dictionary memory 20 and a dictionary search circuit 22 are provided for the encoding and decoding processing performed by the MPU 30.

The encoding means 10 of the MPU 30 cuts out one character at a time from the input original data, sets it in the dictionary search circuit 22, and then starts the dictionary search circuit 22.

The dictionary search circuit 22 reads a candidate character for the character string ωK, which is obtained by appending the newly read character K to the reference number of the character string so far, that is, the already encoded code ω. It checks whether the input character K matches the candidate character and, if they match, detects whether a next candidate character exists. It is desirable that the dictionary search circuit 22 perform the comparison of the input character with the candidate character, the detection of whether a candidate character exists, and the reading of the next candidate character from the dictionary memory 20 in parallel under pipeline control.

In LZW encoding, the longest matching character string in the dictionary memory 20 is sought. The character string is therefore extended one character at a time by appending input characters, and when no candidate characters remain, the string so far is known to be the longest match. At this point, the longest matching string is represented by a reference number that uses the address ω, and this reference number ω is output from the input/output port 26 to the outside as the compressed code code(ω).

As shown in FIG. 11, the dictionary memory 20 has a memory structure consisting of a first memory 100, a next memory 200, and an extended memory 300 according to the external hash method. It holds a linked list, which serves as the dictionary search list, built from the first memory 100 and the next memory 200 in ascending order of character code value. The character candidates K corresponding to this linked list are stored in the separately provided extended memory 300.

Furthermore, the present invention newly provides a measurement circuit 12, a conversion circuit 14, and an inverse conversion circuit 18.

The measurement circuit 12 measures the appearance frequency over the group of character strings that is input via the DMA control circuit 24 and processed at one time. For example, as shown in FIG. 3(a), the measurement circuit 12 measures the appearance frequency of a group of input data of 256 kinds, with decimal code values 0 to 255, in which one character is represented by 8 bits. The measurement result of the measurement circuit 12 is sent to the conversion circuit 14, and the conversion table shown in FIG. 3(b) is created.

The conversion table of FIG. 3(b) uses the decimal code values of the 256 kinds of input data shown in FIG. 3(a), with decimal values 0 to 255, as addresses. At each address it stores the corresponding appearance frequency, that is, a numerical value indicating the rank in order of appearance frequency. For example, if the input data with decimal code value 1 in FIG. 3(a) has the tenth highest appearance frequency, the value "10" indicating its frequency rank is stored at the address of decimal code value 1 in the conversion table of FIG. 3(b). Input data with decimal code value 1 is therefore converted by the conversion table to the code value "10", which is based on its appearance frequency.

The conversion circuit 14, equipped with the conversion table of FIG. 3 that assigns smaller code values in descending order of appearance frequency, converts the input data of FIG. 3(a) into the data shown in FIG. 3(c) and supplies it to the dictionary search circuit 22, where the dictionary search and dictionary registration for LZW encoding are performed according to the processing flow shown in FIG. 10. In the linked list according to the external hash method used to search for candidate characters in the dictionary memory 20, input data with a higher appearance frequency is placed closer to the head, so the number of search steps needed to find a candidate character matching the input character is kept to a minimum.

The restoration processing by the decoding means 16 provided in the MPU 30, on the other hand, first performs the conventional restoration that turns the codes back into the original character string. The result is then converted in the inverse conversion circuit 18 using an inverse conversion table that performs the reverse of the conversion table used at encoding time, shown in FIG. 3(b), which returns the code values of the original input data. That is, the inverse conversion table uses the registered contents of the conversion table of FIG. 3(b), "158, 10, ..., 245", as addresses and stores the decimal code values "0, 1, ..., 255" at them, so it suffices to read out the contents of the inverse conversion table using the decoded code as the address.

In the above embodiment, the measurement circuit 12 measures the appearance frequency of the data input at encoding time, and the conversion tables of the conversion circuit 14 and the inverse conversion circuit 18 are created from that measurement. When the type of input data to be converted is known in advance, however, each conversion table may be created beforehand from appearance frequencies measured on data of the same type, and the encoding and decoding may be performed with the already prepared conversion table and inverse conversion table.

Also, although the above embodiment measures the appearance frequency of a single input character, another embodiment may measure the order of appearance frequency for the combination with the immediately preceding input character, that is, for two-dimensional data, or may measure the order of appearance frequency from the combination with the character codes two or more positions before.
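The conversion table and inverse conversion table described for FIG. 3 can be sketched in Python as follows. This is an illustrative sketch rather than the circuit itself: the function names `build_tables` and `convert` are hypothetical, ranks are assumed to start at 0 so that they fit in 8 bits, and ties in frequency are broken by the original code value, a detail the text does not specify.

```python
from collections import Counter


def build_tables(data: bytes):
    """Build a forward table (code value -> frequency rank) and its inverse.

    Rank 0 is the most frequent symbol. Symbols that never occur keep
    count 0 and therefore receive the largest ranks.
    """
    counts = Counter(data)
    # Most frequent first; ties broken by the original code value (assumption).
    order = sorted(range(256), key=lambda s: (-counts[s], s))
    forward = [0] * 256
    inverse = [0] * 256
    for rank, sym in enumerate(order):
        forward[sym] = rank      # address: original code, content: rank
        inverse[rank] = sym      # address: rank, content: original code
    return forward, inverse


def convert(data: bytes, table) -> bytes:
    """Apply a 256-entry table to every byte of data."""
    return bytes(table[b] for b in data)


sample = b"abracadabra"
fwd, inv = build_tables(sample)
converted = convert(sample, fwd)          # what would be fed to the LZW encoder
restored = convert(converted, inv)        # applied after LZW decoding
print(list(converted), restored)
```

Since the forward table is a permutation of 0 to 255, the inverse table always exists, and applying it after decoding returns the original bytes. The decoder needs the same table, either transmitted with the data or prepared in advance for a known data type, as the embodiment notes.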

【発明の効果】[Effect of the invention]

As explained above, according to the present invention, the appearance frequency of the data input to LZW encoding is measured, and a code conversion assigns smaller code values in descending order of frequency. Frequently appearing data is therefore always registered as a candidate at the head of the sorted linked list, frequently appearing data can be found with the fewest search steps, and encoding can be made even faster.

【図面の簡単な説明】[Brief explanation of drawings]

FIG. 1 is a diagram explaining the principle of the present invention;
FIG. 2 is a configuration diagram of an embodiment of the present invention;
FIG. 3 is a diagram explaining code conversion according to the present invention;
FIG. 4 is a flowchart of conventional LZW encoding processing;
FIG. 5 is a flowchart of conventional LZW decoding processing;
FIG. 6 is an explanatory diagram of LZW encoding;
FIG. 7 is an explanatory diagram of a dictionary configuration example;
FIG. 8 is an explanatory diagram of LZW encoding;
FIG. 9 is an explanatory diagram of the list structure of the external hash method;
FIG. 10 is a flowchart of conventional LZW encoding processing using the external hash method;
FIG. 11 is an explanatory diagram of the search procedure and registration procedure for the LZW code of FIG. 10;
FIG. 12 is a tree structure diagram showing the dictionary registration contents of FIG. 11.

In the figures:
10: encoding means
12: counting means (counting circuit)
14: conversion means (conversion circuit)
16: decoding means
18: inverse conversion means (inverse conversion circuit)
20: dictionary memory
22: dictionary search circuit
24: DMA control circuit
26: input/output circuit
30: MPU
100: first memory
200: next memory
300: extended memory

Claims

【特許請求の範囲】[Claims]

(1) In a data compression device in which already encoded character data is divided into mutually different substrings, each substring is registered in a dictionary with a distinct reference number, and input data is compressed and encoded by specifying the reference number of the longest matching substring among the substrings in the dictionary, a data encoding device characterized by comprising:
encoding means (10) that has, separately from the registration location of candidate data, a dictionary search list indicating the linked addresses of the candidate data in ascending order of the code value of the input data, that sequentially searches for candidate data matching the input data according to the dictionary search list, and that outputs the reference number indicated by the linked address of the last matching candidate data as the code data for a series of input data;
measuring means (12) for measuring the appearance frequency of the input data; and
code conversion means (14) for converting, based on the measurement result of the measuring means (12), the input data into a code whose value is smaller the higher the appearance frequency and larger the lower the appearance frequency;
wherein the input data code-converted according to appearance frequency by the code conversion means (14) is encoded by the encoding means (10).

(2) The data compression device according to claim 1, characterized in that the measuring means (12) measures the appearance frequency from the combination of the input code with the input codes from one or more codes before.

(3) In the data compression device according to claim 1, a dictionary search method for a data compression device, characterized in that the encoding means (10) comprises:
a dictionary memory (20) having, according to the list structure of the external hash method, a first memory and a next memory with an extended memory; and
dictionary search means (22) that generates addresses based on the input data and searches for candidate data, stored in the extended memory of the dictionary memory (20), that matches the input data.

(4) A data restoration device that restores the original input data from code data produced by dividing already encoded character data into mutually different substrings, registering each substring in a dictionary with a distinct reference number, and compressing and encoding input data by specifying the reference number of the longest matching substring among the substrings in the dictionary, the data restoration device being characterized by comprising:
decoding means (16) for decoding the code data; and
inverse conversion means (18) for converting the data restored by the decoding means (16) back into the original input data according to the inverse of the conversion rule that, based on the appearance frequency measured when the input data was compressed and encoded, converts the input data into a code whose value is smaller the higher the appearance frequency and larger the lower the appearance frequency.