JP2002344326A

JP2002344326A - Method for compressing data by synthetic index and method for restoring compressed data completely

Info

Publication number: JP2002344326A
Application number: JP2001208560A
Authority: JP
Inventors: Togo Nakamura; 東吾中村
Original assignee: SYSTEM KAISEKI KENKYUSHO KK
Current assignee: SYSTEM KAISEKI KENKYUSHO KK
Priority date: 2001-03-10
Filing date: 2001-06-06
Publication date: 2002-11-29

Abstract

PROBLEM TO BE SOLVED: To improve compression by applying a simple operation to an information source. SOLUTION: A maximum value and a threshold are set on the indexes of information source symbols whose appearance frequencies are known, and the index sequence of the input symbols is converted into a shorter index sequence before being compressed by entropy coding, e.g. Huffman coding.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a data compression method for efficiently compressing data in data processing, and to a method for completely restoring the compressed data.

[0002]

[Prior Art] Conventionally, when data is compressed, and in particular when entropy coding (for example, Huffman coding) is applied, the number of information source symbols that make up the data sequence has been regarded as fixed.

[0003]

[Problems to be Solved by the Invention] However, when the number of information source symbols is fixed, the entropy, which is the compression limit, is determined by that number. No matter how skewed the appearance frequencies of the symbols are, the average code length of the information source symbols can approach this limit but cannot exceed it.

[0004] The present invention therefore devises a compression method that focuses on the indexes of high-frequency information source symbols, and aims to present a simple compression example that exceeds the limit yet can be completely restored.

[0005]

[Means for Solving the Problems] The present invention is carried out by a data compression method characterized by comprising: means for sorting the information source symbols constituting a data sequence in ascending order of appearance frequency, setting a maximum value and a threshold on the indexes of the symbols in advance, and, when the index of a newly input symbol equals the maximum value, synthesizing an index by adding the threshold to the index of the symbol input immediately before; means for calculating an appearance frequency table of the synthesized indexes and compressing the data by applying entropy coding to this table; and means for setting the maximum value to the index of the symbol with the highest appearance frequency and the threshold to one less than the maximum value.

[0006] Furthermore, in the present invention, restoration is carried out by a method for completely restoring compressed data, applied to the result generated in claim 2 and characterized by comprising: means for sequentially reading the codeword data of the compressed data, entropy-decoding the index, extracting the composite index from a reference table, and, when the index exceeds the threshold, decomposing the index into its difference from the threshold and the maximum value; and means for restoring the input symbols by looking up the decomposed indexes in a table.

[0007]

[Embodiments of the Invention] Embodiments of the data compression method and the complete restoration method according to the present invention are described in detail below with reference to the drawings.

[0008] FIG. 1 is a flowchart of the data compression method of the present invention and shows the flow of processing. In the figure, the original data of step 11 is the data to be compressed. In step 12, all of the data is read and an appearance frequency table of the information source symbols is created. The number of symbols is assumed to be M. In step 13, the table is sorted in ascending order of appearance frequency, the index M of the symbol with the highest appearance frequency is set as the maximum value, and M-1 is set as the threshold.
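Steps 12 and 13 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `build_index_table` is hypothetical, and the tie-breaking rule for equally frequent symbols (the symbol that first appears later gets the smaller index) is an assumption, chosen only because it reproduces the index assignment implied by the worked example (Y=1, o=2, d=3, b=4, a=5). The text does not specify a rule.

```python
from collections import Counter


def build_index_table(data):
    """Steps 12-13: count symbols, sort ascending by frequency, assign 1-based indexes."""
    freq = Counter(data)

    # Remember where each symbol first appears (used only to break ties; assumption).
    first_pos = {}
    for pos, sym in enumerate(data):
        first_pos.setdefault(sym, pos)

    # Ascending frequency; among ties, later first appearance sorts first.
    ordered = sorted(freq, key=lambda s: (freq[s], -first_pos[s]))

    index_of = {sym: i + 1 for i, sym in enumerate(ordered)}
    max_index = len(ordered)      # M: index of the most frequent symbol
    threshold = max_index - 1     # M - 1
    return index_of, max_index, threshold


index_of, M, threshold = build_index_table("baYadabadoo")
print(index_of, M, threshold)
```

With a different tie-breaking rule the indexes of o, d and b would be permuted, but the maximum value and threshold would be unchanged.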

[0009] In step 14, new indexes are synthesized using the maximum value, the threshold, and the appearance frequency table created in step 12, and an appearance frequency table of those indexes is created. This process is described in detail later with reference to FIG. 2.

[0010] In step 15, three kinds of header information needed for decoding are output. The first is flag information: when the index of the first symbol of the data sequence equals M, the decoded index cannot be identified as either 1·M or M, so a flag is needed to tell the two apart. The second is the table of information source symbols that make up the data sequence. The third is the composite index table registered in step 14. These three kinds of header information are written at the beginning of the compressed data file. An example of the header output is shown in FIG. 5.

[0011] In step 16, entropy coding such as Huffman coding is applied using the appearance frequency table of the composite indexes created in step 14, and the codewords are output to the compressed data file. This processing is described in detail later with reference to FIG. 3.

[0012] Repeating the processing of step 16 until the end of the data sequence yields the compressed data of step 17. The output structure of the compressed data is shown in FIG. 5.

[0013] FIG. 2 is a flowchart detailing the process of step 14 in FIG. 1, which creates the appearance frequency table of the composite indexes. In the figure, the preceding process is step 13 of FIG. 1, and step 11 is the original data of FIG. 1. In step 22, the index of each input symbol of the original data is compared with the maximum value. If they are equal, a new index is created by adding the threshold to the index of the symbol input immediately before (step 23), and this index is registered in the appearance frequency table as a composite index (step 25). If they are not equal (step 24), the index is not synthesized and is registered as is in step 25. In step 26, the series of steps 11 to 25 is repeated until no input symbols remain. This yields the appearance frequency table of the composite indexes, and processing moves on to the next process (step 15).
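The loop of FIG. 2 can be sketched as below. The name `synthesize` is hypothetical, and two details are assumptions rather than statements of the text: "the index input immediately before" is read as the running (possibly already composite) index, so that a run of consecutive maximum indexes keeps adding the threshold, and a maximum index at the very start of the data is emitted unchanged and reported through the header flag.

```python
from collections import Counter


def synthesize(indexes, max_index):
    """Merge every occurrence of max_index into the index that precedes it."""
    threshold = max_index - 1
    composite = []
    for idx in indexes:
        if idx == max_index and composite:
            # Steps 22-23: add the threshold to the preceding index.
            composite[-1] += threshold
        else:
            # Step 24: register the index as is (also covers a leading max_index).
            composite.append(idx)
    # Header flag: 1 if the data starts with the maximum index, else 0.
    flag = 1 if indexes and indexes[0] == max_index else 0
    # Step 25: appearance frequency table of the composite indexes.
    return composite, Counter(composite), flag


s1 = [4, 5, 1, 5, 3, 5, 4, 5, 3, 2, 2]   # "45153545322" from the worked example
composite, table, flag = synthesize(s1, max_index=5)
print(composite, dict(table), flag)
```

For the example sequence this yields the seven composite indexes 8, 5, 7, 8, 3, 2, 2 and flag = 0.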

[0014] FIG. 3 is a flowchart of the entropy coding process performed with reference to the appearance frequency table of the composite indexes created in step 14 of FIG. 1. Steps 11 to 24 are exactly the same as in FIG. 2. In step 35, these indexes are entropy-coded, for example by Huffman coding, with reference to the appearance frequency table of the composite indexes, and are output to the compressed data file as codewords. This series of steps is repeated until no data remains (step 36). Processing then moves on to the next process (step 17); that is, the compressed data is obtained.
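As an illustration of step 35, the sketch below builds a Huffman code for the composite indexes with Python's `heapq`. It is a generic textbook construction, not the patent's code table: the function name `huffman_code` is hypothetical, the composite values are coded directly instead of going through the small index of the reference table, and the codewords it assigns may differ from those in FIG. 7 because Huffman codes are not unique. The total length is the same for any optimal assignment.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freq):
    """Return {symbol: bitstring} for a frequency mapping with at least two symbols."""
    tiebreak = count()  # unique counter so that dicts are never compared
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


composite = [8, 5, 7, 8, 3, 2, 2]            # composite indexes of the worked example
code = huffman_code(Counter(composite))
bits = "".join(code[c] for c in composite)    # payload written to the compressed file
print(code, len(bits))
```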

[0015] FIG. 4 is a flowchart of the method for completely restoring the compressed data; step 17 in the figure is the compressed data. In step 42, the three kinds of header information are read from the compressed data file, the flag needed for decoding is set, and the reference tables for the information source symbols and the composite indexes are created. In step 43, the codeword data following the header information is read sequentially, and the index is entropy-decoded (step 44). In step 45, the composite index is obtained from the decoded index using the reference table. Whether the composite index is larger than the set threshold is checked (step 46), and if it is, the composite index is decomposed as in Equation 1 (step 47). In Equation 1, L is the composite index and M is the maximum index.

[0016]

[Equation 1] L = (L - (M - 1)) · (M - 1 + 1) = i · M

If i in Equation 1 satisfies i > M - 1, set L = i and repeat this decomposition until i ≤ M - 1. This operation decomposes L into the form of Equation 2. Equation 2 corresponds to the unique decomposition L = i + (M - 1)·n obtained when L is divided by the threshold M - 1.

[0017]

[Equation 2] L = i · M M … M (M repeated n times)
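The repeated decomposition can be sketched as a short loop. The sketch reads the dot in Equations 1 and 2 as "followed by" (a sequence of indexes) rather than as multiplication, which is an interpretation that the example values suggest but the text does not state explicitly. The function name `decompose` is hypothetical.

```python
def decompose(L, M):
    """Return (i, n): composite index L stands for index i followed by n copies of M.

    Equivalent to the unique decomposition L = i + (M - 1) * n with 1 <= i <= M - 1.
    """
    threshold = M - 1
    i, n = L, 0
    while i > threshold:      # steps 46-47, repeated until i <= M - 1
        i -= threshold        # peel off one occurrence of the maximum index
        n += 1
    return i, n


for L in (3, 5, 8, 12):
    print(L, decompose(L, M=5))
```

With M = 5, the composite index 8 decomposes into index 4 followed by one 5, and the composite index 5 into index 1 followed by one 5, which is why a leading 5 needs the header flag to be told apart from "1 followed by 5".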

[0018] Step 48 handles the case i ≤ M - 1; here nothing is done and the index L = i is passed on to step 49. In step 49, the information source symbol corresponding to the index i or M is taken from the reference table, and the symbol or symbol string is restored. Repeating the series of steps 43 to 49 until no codeword data remains (step 50) yields the completely restored original data (step 11).

[Example] An example of the data compression method using composite indexes and of the method for completely restoring the compressed data according to the present invention is described using actual data. Let the input sequence of 11 symbols be S = "baYadabadoo". Creating the appearance frequency table of the information source symbols of S and sorting it in ascending order gives the table shown in FIG. 6. The encoding procedure is described first.

[0019] Expressed as indexes, S becomes S_1 = "45153545322". Since the maximum index is 5, the threshold is set to 5 - 1 = 4. Since the first index is not equal to the maximum index, the flag is set to flag = 0. Next, the sequence S_1 is read in order, and each time the maximum index 5 appears, the threshold 4 is added to the preceding index. Repeating this operation yields the sequence of seven composite indexes S_2 = "8578322".

[0020] A frequency table is created for the composite index sequence S_2, and coding it with the Huffman method as the entropy coding gives the table of FIG. 7. Reading and encoding the symbols of the input sequence S in order on the basis of this table gives a compressed data file with the output structure shown in FIG. 8.

[0021] Next, the decoding procedure is described. First, the header information is extracted from the compressed data file shown in FIG. 8. The reference table needed for decoding (FIG. 9) is then created, and the maximum index of the information source symbols, M = 5, and the threshold, 4, are set. Next, the codewords of FIG. 10 are read in order and entropy-decoded (with Huffman coding, using the Huffman tree).

[0022] The detailed procedure is as follows. The codeword 00 of FIG. 10 is read and index 1 is decoded. In practice, the codeword is read one bit at a time and decoded using the Huffman tree. Next, the composite index table of FIG. 9 is consulted to obtain the first composite index, 8. Since the composite index is larger than the set threshold 4, it is decomposed as in Equation 1, and the table is consulted to obtain the fourth and fifth symbols, b and a. The restored symbol string is S^(1) = "ba". Next, index 4 is decoded from the codeword 100, and the composite index 5 is obtained. Since this composite index is also larger than the set threshold 4, it is decomposed in the same way, giving the symbols Y and a, and hence S^(2) = "baYa". Repeating the same operation as S^(1), S^(2), … until no codewords remain completely restores S = "baYadabadoo".
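The restoration of the worked example can be reproduced with the sketch below, which starts from the already entropy-decoded composite indexes. The function name `restore` is hypothetical, the symbol order "Yodba" is inferred from S and its index form "45153545322" (FIG. 6 and FIG. 9 are not reproduced in the text), and the treatment of the header flag (drop the leading index 1 of the first composite index when flag = 1) is an assumption about how the flag is meant to be used.

```python
def restore(composite, symbols, flag=0):
    """Expand composite indexes back into symbols.

    symbols[k - 1] is the symbol with index k, so len(symbols) is the maximum index M.
    """
    M = len(symbols)
    threshold = M - 1
    out = []
    for pos, L in enumerate(composite):
        n = (L - 1) // threshold       # how many times the maximum index follows
        i = L - n * threshold          # leading index, 1 <= i <= M - 1
        if not (pos == 0 and flag == 1):
            out.append(symbols[i - 1])
        out.extend([symbols[M - 1]] * n)
    return "".join(out)


print(restore([8, 5, 7, 8, 3, 2, 2], "Yodba"))
```

The first composite index 8 expands to "ba" and the second, 5, to "Ya", matching the intermediate strings given in the text.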

[0023] To see the effect of the compression method of the present invention, the average code length under the Huffman method and the entropy are calculated. From the table of FIG. 8, the total length of the codewords produced by the method of the present invention is 16 bits, which gives an average code length of 16/11 (bits/symbol). Applying the Huffman method directly to the input sequence S gives an average code length of 25/11. The entropy H of the input sequence S, which is the compression limit, is H ≈ 24.05/11 (bits/symbol). These calculations therefore show that the compression achieved by the present invention exceeds the entropy of the input sequence S.
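The three figures quoted above can be checked with a few lines of Python. The sketch uses the standard property that the total length of a Huffman-coded message equals the sum of the weights of all merged nodes, so no code table is needed. The helper name `huffman_total_bits` is hypothetical, and the 16-bit figure counts codewords only, not the header information of step 15.

```python
import heapq
from collections import Counter
from math import log2


def huffman_total_bits(freqs):
    """Total Huffman-coded length: sum of the weights of all merged (internal) nodes."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


S = "baYadabadoo"
composite = [8, 5, 7, 8, 3, 2, 2]
n = len(S)

direct_bits = huffman_total_bits(Counter(S).values())             # Huffman applied to S
composite_bits = huffman_total_bits(Counter(composite).values())  # Huffman on composite indexes
entropy_bits = sum(c * log2(n / c) for c in Counter(S).values())  # n * H(S)

print(direct_bits, composite_bits, round(entropy_bits, 2))
```

Dividing each value by n = 11 gives the per-symbol figures 25/11, 16/11 and approximately 24.05/11.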

[0024]

[Effects of the Invention] The present invention focuses on the index of the most frequent symbol, sets a maximum value and a threshold, and converts the whole index sequence into a shorter index sequence before encoding. In the simple calculation example, the compression exceeds the entropy, which is the compression limit, and the data can still be completely restored; in this respect it is a very noteworthy compression method with potential for the future. When the amount of data is small the overhead is large, but when the amount of data is sufficiently large the overhead becomes relatively negligible.

[0025] A further advantage of the compression method of the present invention is that it can be achieved with extremely simple operations compared with conventional methods that involve complicated calculations.

[0026] According to the present invention, data can be compressed and completely restored by simple processing, which increases the data reduction achieved by compression and makes effective use of the storage means used to hold data. This reduces storage costs and leads to a substantial improvement in processing efficiency and a reduction in cost for the computer system as a whole.

[Brief Description of the Drawings]

FIG. 1 is a flowchart of a data compression method according to an embodiment of the present invention.

FIG. 2 is a flowchart detailing the process of step 14 in FIG. 1, which creates the appearance frequency table of the composite indexes.

FIG. 3 is a flowchart detailing the process of step 16 in FIG. 1, which entropy-codes the composite indexes.

FIG. 4 is a flowchart of a method for completely restoring compressed data according to an embodiment of the present invention.

FIG. 5 is a diagram showing an example of the structure of the compressed data.

FIG. 6 is a diagram showing an example of the appearance frequency table of the input sequence S.

FIG. 7 is a diagram showing an example of the appearance frequency table of the composite indexes of the input sequence S.

FIG. 8 is a diagram showing an example of the output when the input sequence S is compressed according to the present invention.

FIG. 9 is a diagram showing an example of the reference table used when decoding the input sequence S according to the present invention.

FIG. 10 is a diagram showing an example of the codeword data of the input sequence S to be decoded according to the present invention.

[Explanation of Symbols]

510: Flag indicating whether the index of the first symbol of the file equals the maximum value; flag = 1 if equal, flag = 0 if not.
511: Total number of information source symbols.
512: Total number of composite indexes.
513: Composite index.
61: Information source symbols of the input sequence S.
71: Composite indexes of the input sequence S.

Claims

[Claims]

1. A data compression method characterized in that information source symbols constituting a data sequence are sorted in ascending order of appearance frequency, a maximum value and a threshold are set in advance on the indexes of the symbols, and, when the index of a newly input symbol is equal to the maximum value, an index is synthesized by adding the threshold to the index of the symbol input immediately before.

2. The data compression method according to claim 1, characterized in that an appearance frequency table of the synthesized indexes is calculated and the data is compressed by applying entropy coding to this table.

3. The data compression method according to claim 1, characterized in that the maximum value is the index of the symbol with the highest appearance frequency and the threshold is one less than the maximum value.

4. A method for completely restoring compressed data, applied to the result generated in claim 2, characterized by comprising: means for sequentially reading the codeword data of the compressed data, entropy-decoding the index, extracting the composite index from a reference table, and, when the index exceeds the threshold, decomposing the index into its difference from the threshold and the maximum value; and means for restoring the input symbols by looking up the decomposed indexes in a table, whereby the data is restored.