JPH08116263A

JPH08116263A - Data processor and data processing method

Info

Publication number: JPH08116263A
Application number: JP25101794A
Authority: JP
Inventors: Hitoshi Ono; 均大野; Yuko Abe; 優子安部; Akio Shinagawa; 明雄品川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-10-17
Filing date: 1994-10-17
Publication date: 1996-05-07

Abstract

PURPOSE: To speed up data processing by having the data processing unit predict the incidence probability of the data that follows the data to be compressed, and to improve the data compression rate by assigning short compressed data to data with a high incidence probability. CONSTITUTION: The processing unit comprises a data generating means 11 that counts the frequency of incidence of each kind of data to be compressed and generates an incidence frequency table; a data prediction means 12 that predicts the incidence of the data following the data to be compressed while referring to the incidence frequency table from the data generating means 11; and a data conversion means 13 that converts data with a high incidence probability into compressed data of shorter bit length and data with a low incidence probability into compressed data of longer bit length.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data processing device and a data processing method, and more specifically to improvements in a device and method that use Huffman coding to compress input data or to decode compressed data. In recent years, as information processing devices have become more sophisticated and diverse, devices that use storage devices such as magnetic disk drives to store enormous amounts of data, and devices that transmit such data over communication lines, have come into use. In this field, to reduce user costs through higher efficiency, a device that compresses data is needed: to effectively increase storage capacity when storing data, and to shorten transmission time when transferring data.

[0002]

[Prior Art] The information-theoretic foundation of data compression is said to have first been established by the concept of "entropy," published in 1948 by Claude Shannon of Bell Laboratories in the United States. Because R. M. Fano of MIT arrived at a similar theory at about the same time, the resulting scheme is commonly called Shannon-Fano coding. It compresses data by assigning fewer bits to characters with a high appearance probability. Later, in 1952, Huffman published the paper "A Method for the Construction of Minimum-Redundancy Codes" as a way of generating such variable-length codes, and so-called "Huffman coding" became mainstream. All of these compress data by exploiting differences in the appearance frequency of characters.
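The principle described in this paragraph, that frequent characters receive shorter codes, can be illustrated with a minimal Python sketch of Huffman's construction. The function name `huffman_code_lengths` and the sample text are illustrative assumptions and do not come from the patent; only the code lengths are computed, not the bit patterns.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(weight, i, {symbol: 0})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


lengths = huffman_code_lengths("aaaaaaaabbbbccd")
print(sorted(lengths.items()))
```

In this sample the most frequent character "a" ends up with the shortest code and the rare characters "c" and "d" with the longest.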

[0003] In contrast, an entirely different method was devised, which uses the concept of a dictionary and compresses data by exploiting repeated character strings. This is commonly called the sliding-dictionary method of Lempel-Ziv coding, or the LZ77 method. This dictionary-based compression method appears in the paper "A Universal Algorithm for Sequential Data Compression," published by Abraham Lempel and Jacob Ziv in 1977 in IEEE Transactions on Information Theory. In short, the basic principles of conventional compression algorithms fall broadly into two kinds: those that compress data according to the appearance frequency of characters (Huffman coding and the like), and those that use the concept of a dictionary and compress data according to repeated character strings (Lempel-Ziv coding).
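For contrast with frequency-based coding, the sliding-dictionary idea can be sketched in a few lines of Python. This is a simplified illustration under assumed conventions (triples of offset, length, and next character; a 16-character window; a brute-force search), not the encoding of any particular product or of the devices discussed here.

```python
def lz77_encode(data: str, window: int = 16) -> list:
    """Encode data as (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so that next_char always exists.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples: list) -> str:
    """Rebuild the text by copying from already-decoded output."""
    buf = []
    for offset, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-offset])   # copy one character at a time (may overlap)
        buf.append(ch)
    return "".join(buf)


triples = lz77_encode("abababab")
print(triples)
print(lz77_decode(triples))
```

Repeated strings collapse into a single back-reference, which is the property the dictionary-based family relies on.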

[0004] FIG. 12 is an explanatory diagram of conventional examples. FIG. 12(A) is a block diagram of a data compression device (hereinafter, the first device) as found in Japanese Unexamined Patent Publication No. H4-123619, published by the Japan Patent Office, and FIG. 12(B) is likewise a block diagram of a data compression device (hereinafter, the second device) as found in Japanese Unexamined Patent Publication No. H4-280517.

[0005] As shown in FIG. 12(A), the first device, which improves on Lempel-Ziv coding, comprises measuring means 1 that measures the appearance frequency of input data, converting means 2 that converts the input data DIN according to the appearance frequency, and encoding means 3 that sequentially searches for candidate data based on the converted data DT according to a dictionary search list and outputs the reference number of the candidate data as code data DOUT.

[0006] The operation of the first device is as follows. First, measuring means 1 measures the appearance frequency of the input data. Based on that measurement, converting means 2 converts the input data DIN into codes such that the higher the appearance frequency, the smaller the code value, and the lower the appearance frequency, the larger the code value. For the resulting converted data DT, encoding means 3 sequentially searches, according to the dictionary search list, for candidate data matching the input data, and outputs the reference number of the last matching candidate as the code data DOUT for the series of input data. The input data DIN is thereby encoded.

[0007] As shown in FIG. 12(B), the second device, which compresses data by arithmetic coding, comprises a self-organizing encoding unit (hereinafter, SOR encoding unit) 4 having a search/registration unit 4A and a dictionary reordering unit 4B, a dictionary 5 that stores dictionary data, a counter 6 that counts the appearance frequency, cumulative frequency, and the like of each character string, and an arithmetic coding unit 7 that arithmetically encodes the SOR code and outputs multi-level code data.

[0008] The operation of the second device is as follows. First, the search/registration unit 4A of the SOR encoding unit 4 consults the dictionary 5 to determine whether the character string to be compressed is registered in it. The dictionary reordering unit 4B updates the dictionary 5 according to the self-organization rule, reordering the character strings so that strings with a higher appearance frequency have smaller registration numbers.

[0009] If the same character string is in the dictionary 5, the search/registration unit 4A outputs its registration number in the dictionary 5 to the arithmetic coding unit 7 as the SOR code. If the string is not registered, it is registered, and the string itself is output to the arithmetic coding unit 7 as the SOR code. The arithmetic coding unit 7 arithmetically encodes the SOR code and outputs multi-level code data. In doing so, it derives the code bit value and the values of the upper and lower bits from the counts supplied by the counter 6, such as the appearance frequency and cumulative frequency of each character string, and outputs these as the multi-level code data. Multi-level code data encoding the input character string is thereby output.

[0010]

[Problems to Be Solved by the Invention] In the first conventional device, candidate data matching the input data is searched for sequentially according to the dictionary search list, and the reference number of the last matching candidate is output as the code data DOUT for the series of input data. Consequently, even for an input data sequence in which the order of characters is predictable to some extent, the dictionary, which has a linked-list structure, must still be searched sequentially according to the search list.

[0011] For example, given English input data in which the sequence "and" appears frequently, the probability that "a" is followed by "n" is high. Likewise, the probability that the letter "q" is followed by "u" is high. Even for such characters and character strings, candidate data matching the input data must be searched for, which wastes dictionary search time and data transfer time and hinders faster data processing.
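The regularity described here can be made concrete by counting which character follows each character. The sketch below is only an illustration; the function name `successor_counts` and the sample sentence are assumptions, not material from the patent.

```python
from collections import Counter, defaultdict


def successor_counts(text: str) -> dict:
    """For each character, count which characters immediately follow it."""
    table = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        table[current][following] += 1
    return table


sample = "the queen and the quiet band quit and ran"
table = successor_counts(sample)
print(table["q"].most_common(1))   # most frequent successor of "q"
print(table["a"].most_common(1))   # most frequent successor of "a"
```

In this sample every "q" is followed by "u" and every "a" by "n", so a compressor that knows the preceding character has very little left to encode.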

[0012] In the second conventional device, when the same character string is in the dictionary 5, its registration number is output to the arithmetic coding unit 7 as the SOR code, and the arithmetic coding unit 7 arithmetically encodes the SOR code and outputs multi-level code data. Consequently, even for an input data sequence in which the order of characters is predictable, the dictionary 5 must be searched to determine whether the same string exists, and its registration number must be transferred to the arithmetic coding unit 7 as the SOR code.

[0013] As with the first device, this hinders faster data processing. A data compression device such as that found in Japanese Unexamined Patent Publication No. H3-68219, published by the Japan Patent Office, uses Huffman coding to compress data according to the appearance frequency of the input data. It calculates the appearance probability of single characters only and assigns variable-length codes accordingly.

[0014] In other words, because this device exploits only the variation in the appearance frequency of single characters, the appearance probability of single characters must be calculated even for an input data sequence in which the order of characters is predictable to some extent, and the compression ratio for data sequences whose characters appear uniformly does not become very high. The present invention was created in view of these problems of the conventional examples. Its object is to provide a data processing device and a data processing method that, when the order of characters is predictable to some extent, predict the appearance probability of the data following the data to be compressed, thereby speeding up data processing, and convert data with a high appearance probability into short compressed data, thereby improving the data compression ratio.

[0015]

[Means for Solving the Problems] FIG. 1 shows the principle of the data processing device according to the present invention. As shown in FIG. 1, the first data processing device of the present invention comprises: data creating means 11 that counts the number of appearances of data for each kind of data to be compressed and creates an appearance frequency table; data predicting means 12 that predicts the appearance of the data following the data to be compressed while referring to the appearance frequency table from the data creating means; and data converting means 13 that, according to the appearance prediction made by the data predicting means, converts data with a higher appearance probability into compressed data of shorter bit length and data with a lower appearance probability into compressed data of longer bit length.

[0016] The second data processing device of the present invention is characterized in that the data creating means 11 updates the appearance frequency table each time one unit of data to be compressed is input. The third data processing device of the present invention is characterized in that level adjusting means 14 is provided to adjust the length of the compressed data produced by the data converting means 13.

[0017] In the first to third data processing devices of the present invention, the data creating means 11 takes in either all of the data to be compressed or one unit of that data and creates the appearance frequency table. Also in the first to third data processing devices, the data converting means 13 sorts the data in the appearance frequency table in descending order of appearance frequency and refers to a code table in which, over the positions running from the most frequent data to the least frequent data, shorter position information is assigned to data with a higher appearance frequency and longer position information to data with a lower appearance frequency.

[0018] In the first data processing method of the present invention, as shown in the processing flowchart of FIG. 2, in step P1 the number of appearances of data is counted in advance for each kind of data to be compressed and an appearance frequency table is created; next, in step P2, the appearance of the data following the data to be compressed is predicted while referring to the created appearance frequency table; then, in step P3, according to that prediction, data with a higher appearance probability is converted into compressed data of shorter bit length and data with a lower appearance probability into compressed data of longer bit length.
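Steps P1 and P2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the appearance frequency table is kept per preceding byte, that ties are broken by byte value, and that the "prediction" is the rank (position) of the actual next byte among the candidates sorted by count. All function names are hypothetical. A variable-length code that gives small positions short code words would then complete step P3.

```python
from collections import Counter, defaultdict


def build_tables(data: bytes) -> dict:
    """Step P1: count, for each preceding byte, how often each byte follows it."""
    tables = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        tables[prev][cur] += 1
    return tables


def predicted_order(tables: dict, prev: int) -> list:
    """Step P2: candidates for the next byte, most frequent first."""
    counts = tables[prev]
    return sorted(counts, key=lambda b: (-counts[b], b))


def positions(data: bytes) -> list:
    """Position of each actual next byte within the predicted order."""
    tables = build_tables(data)
    return [predicted_order(tables, prev).index(cur)
            for prev, cur in zip(data, data[1:])]


text = b"and and and sand band"
pos = positions(text)
print(pos)
print(pos.count(0), "of", len(pos), "bytes were the top prediction")
```

Because most positions are 0 for predictable input, a code that spends the fewest bits on position 0 compresses such input well.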

[0019] In the first data processing method of the present invention, the conversion to compressed data is characterized by sorting the data of the appearance frequency table in descending order of appearance frequency and referring to a code table in which, over the positions running from the most frequent data to the least frequent data, shorter position information is assigned to data with a higher appearance frequency and longer position information to data with a lower appearance frequency.

[0020] The second data processing method of the present invention is characterized in that the appearance frequency table and the code table are updated each time one unit of data to be compressed is input. The third data processing method of the present invention is characterized in that, after the data to be compressed has been converted into compressed data, the bit length of the compressed data is adjusted in step P4 of the processing flowchart of FIG. 2.

[0021] In the first to third data processing methods of the present invention, the appearance frequency table is created from all of the data to be compressed or from one unit of that data. Also in the first to third data processing methods, the appearance of the next data is predicted by comparing the data for each appearance frequency written in the appearance frequency table with the data following the data to be compressed.

[0022] In the first to third data processing methods of the present invention, the compressed data consists of position information indicating a position in the sequence running from the most frequent data to the least frequent data, and an identifier that identifies the position information. The above object is thereby achieved.

[0023]

[Operation] Next, the operation of the first data processing device of the present invention will be described with reference to FIG. 1. In FIG. 1, the data creating means 11 first measures, for each kind of data, the appearance frequency of all of the data DIN to be compressed or of one unit of the data DIN, and creates the appearance frequency table.

[0024] The data predicting means 12 refers to the appearance frequency table created by the data creating means 11 and predicts the appearance of the data following the data DIN to be compressed. Specifically, the data for each appearance frequency written in the appearance frequency table is compared with the data following the data DIN to be compressed.

[0025] According to the appearance prediction made by the data predicting means, the data converting means converts data with a higher appearance probability into compressed data of shorter bit length and data with a lower appearance probability into compressed data of longer bit length. For example, the data converting means 13 sorts the data in the appearance frequency table in descending order of appearance frequency and refers to a code table in which, over the positions running from the most frequent data to the least frequent data, shorter position information is assigned to data with a higher appearance frequency and longer position information to data with a lower appearance frequency.
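A code table of this kind can be sketched by sorting symbols by count and assigning each position a prefix-free code word whose length grows with the position. The Elias gamma code is used here purely as a stand-in, since it has the same general shape of a length prefix followed by position bits; it is not the bit assignment of the patent's own code table, and the counts and names below are made up for illustration.

```python
def elias_gamma(n: int) -> str:
    """Prefix-free code for n >= 1: a unary length prefix, then n in binary."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def build_code_table(frequencies: dict) -> dict:
    """Sort symbols by descending count; earlier positions get shorter codes."""
    ordered = sorted(frequencies, key=lambda s: (-frequencies[s], s))
    return {symbol: elias_gamma(pos + 1) for pos, symbol in enumerate(ordered)}


def decode(bits: str, code_table: dict) -> list:
    """Decode by matching code words; valid because the code is prefix-free."""
    inverse = {code: symbol for symbol, code in code_table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


freq = {"e": 12, "t": 9, "a": 8, "q": 1, "z": 1}   # made-up counts
code_table = build_code_table(freq)
message = ["t", "e", "a", "z", "e"]
bits = "".join(code_table[s] for s in message)
print(code_table)
print(bits, decode(bits, code_table) == message)
```

The most frequent symbol costs one bit, while the rare symbols at the end of the sorted table cost five.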

[0026] By referring to such a code table, for an input data sequence in which the order of characters is predictable to some extent, the compression ratio for uniformly appearing data sequences can be improved compared with a Huffman coding method that calculates the appearance probability of single characters only. Data can thus be compressed while predicting the appearance of the characters and character strings before and after the data to be compressed, dictionary search time and data transfer time can be reduced, and data processing becomes faster.

[0027] According to the second data processing device of the present invention, the data creating means 11 updates the appearance frequency table each time one unit of data to be compressed is input. As the table is updated, the data converting means 13 can rewrite the compressed data in a short time, and the code table can be reconstructed dynamically (second data processing method).

[0028] Moreover, when the appearance frequency table is updated, the code table can be updated merely by swapping some of the position information and identifiers, without re-sorting all of the compressed data. The code table therefore need not be included in the compressed data, and the data compression ratio improves. As a result, when a data compression function is built into a target device with limited memory capacity, the appearance frequency can be encoded dynamically in units of one byte or one character.
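The idea of updating by a partial swap, with encoder and decoder rebuilding the same table so that it never has to be transmitted, can be sketched as below. This is an illustrative assumption about how such an update might work, not the patent's circuit: it uses a single table over byte values (no preceding-byte context), starts with all counts at zero, and emits plain positions rather than bit strings. The class and function names are hypothetical.

```python
class AdaptiveTable:
    """Symbols kept in descending-count order; a symbol's index is its position."""

    def __init__(self, symbols):
        self.order = list(symbols)                              # position -> symbol
        self.count = {s: 0 for s in self.order}
        self.index = {s: i for i, s in enumerate(self.order)}   # symbol -> position

    def update(self, symbol):
        """Count one appearance and restore the order with at most one swap."""
        self.count[symbol] += 1
        i = j = self.index[symbol]
        # Walk up past every entry whose count is now smaller.
        while j > 0 and self.count[self.order[j - 1]] < self.count[symbol]:
            j -= 1
        if j != i:
            other = self.order[j]
            self.order[i], self.order[j] = other, symbol
            self.index[symbol], self.index[other] = j, i


def encode(data: bytes) -> list:
    table, out = AdaptiveTable(range(256)), []
    for b in data:
        out.append(table.index[b])   # emit the current position, then learn
        table.update(b)
    return out


def decode(positions: list) -> bytes:
    table, out = AdaptiveTable(range(256)), bytearray()
    for p in positions:
        b = table.order[p]           # same table state as the encoder had
        out.append(b)
        table.update(b)
    return bytes(out)


codes = encode(b"abracadabra")
print(codes)
print(decode(codes))
```

A single swap suffices because all entries that the incremented symbol overtakes share the same count, so exchanging it with the first of them keeps the table sorted.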

[0029] According to the third data processing device of the present invention, the level adjusting means 14 adjusts the length of the compressed data after the input data has been converted into compressed data. Offset coding, which dynamically adjusts the level of the compressed data length according to the type of input data and reconstructs the compressed data, can therefore be performed (third data processing method).

[0030] That is, depending on the type of input data, prediction of the next data may be difficult, for example when the data to be compressed is skewed and the appearance probabilities are averaged out, or when appearance frequencies have not yet been sufficiently collected. By performing offset coding, however, input data can be converted into compressed data of shorter bit length when the trend of the frequency distribution is clear, and, when the appearance frequency distribution is skewed, it can be converted into compressed data of short bit length by raising the level.

[0031] The data compression ratio is thereby improved. If the appearance probability of each datum is known in advance, initially placing the data with high appearance frequencies in the upper part of memory further increases the probability of conversion to short position information. According to the first data processing method of the present invention, as shown in the processing flowchart of FIG. 2, the appearance frequency table is created in advance in step P1 by counting the number of appearances for each kind of data to be compressed, so the way the characters and character strings before and after the data to be compressed connect can be ascertained. For example, given English input data in which the sequence "and" appears frequently, "a" is often followed by "n", and the letter "q" is often followed by "u"; such connections between neighboring characters and character strings can be ascertained.

[0032] In step P2, the appearance of the data following the data DIN to be compressed is predicted while referring to the appearance frequency table, so, in the example above, the "n" that follows "a" and the "u" that follows "q" can be predicted. Further, in step P3, while referring to the code table and according to the appearance prediction for the input data DIN, input data DIN with a higher appearance probability is converted into compressed data of shorter bit length and input data DIN with a lower appearance probability into compressed data of longer bit length, so the compressed data DOUT, consisting of position information and an identifier, can be output directly from the code table.

[0033] Because the appearance of characters and character strings is predicted, the notion of searching a dictionary disappears. In particular, for an input data sequence in which the order of characters is predictable, dictionary search time and data transfer time are reduced and data processing becomes faster.

[0034]

[Embodiments] Next, embodiments of the present invention will be described with reference to the drawings. FIGS. 3 to 11 illustrate data processing devices and data processing methods according to embodiments of the present invention.

(1) Description of the first embodiment

FIG. 3 is a block diagram of a data compression and decompression device according to the embodiments of the present invention, and FIG. 4 is a flowchart of data compression and decompression according to the first embodiment. FIG. 5 illustrates the function of the code conversion editor during data compression, and FIG. 6 illustrates the compressed data format and the code tree according to the embodiments.

[0035] For example, a data compression or decompression device combining the first to third devices of the present invention comprises, as shown in FIG. 3, an appearance frequency creation editor 21, a data comparison editor 22, a code conversion editor 23, a level adjustment editor 24, a memory 25, an EPROM 26, a display 27, a keyboard 28, a central processing unit (hereinafter, CPU) 29, an input data file 30, and a compressed data file 31.

[0036] The appearance frequency creation editor 21 is an example of the data creating means 11. It receives the data DIN to be compressed and creates an appearance frequency table that pairs each datum DIN with its appearance frequency. It takes in either all of the data DIN to be compressed or one unit of that data to create the table. The editor 21 may either add up or cumulatively total the appearance frequencies.

[0037] For example, the editor 21 creates an appearance frequency table such as Table 1, which pairs the 256 data values "00" to "FF", expressed in hexadecimal, with their numbers of appearances, expressed in decimal.

[0038]

[Table 1]

[0039] In the example of Table 1, the appearance frequency is 75 for data "00", 50 for data "01", 100 for data "02", and 50 for data "FF". The appearance frequency creation editor 21 also updates the appearance frequency table each time one unit of data to be compressed is input (second device of the present invention).
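A table with one counter per byte value, updated unit by unit, might look like the sketch below. It is only an illustration: the function names are hypothetical, a "unit" is assumed to be an arbitrary block of bytes, and the sample bytes are made up rather than taken from Table 1.

```python
def new_table() -> list:
    """One counter for each byte value 0x00 to 0xFF."""
    return [0] * 256


def update_table(table: list, unit: bytes) -> None:
    """Add the appearances found in one unit of input data."""
    for value in unit:
        table[value] += 1


table = new_table()
update_table(table, bytes([0x02, 0x00, 0x02, 0xFF]))   # first unit
update_table(table, bytes([0x02, 0x01]))               # second unit
for value in (0x00, 0x01, 0x02, 0xFF):
    print(f"{value:02X}: {table[value]}")
```

Printing the index as two hexadecimal digits and the count in decimal mirrors the layout described for the appearance frequency table.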

[0040] The data comparison editor 22 is an example of the data predicting means 12. Referring to the appearance frequency table from the appearance frequency creation editor 21, it predicts the appearance of the data that follows the data DIN to be compressed. For example, the data comparison editor 22 compares each appearance frequency entry written in the table with the data following the data DIN to be compressed, and predicts the appearance of the next data from the result of this comparison.

[0041] The code conversion editor 23 is an example of the data conversion means 13. It converts the data DIN into position information according to the appearance prediction for the input data DIN supplied by the data comparison editor 22. For example, the editor 23 converts data with a higher appearance frequency into shorter position information and data with a lower appearance frequency into longer position information. Expressing the example of Table 1 in concrete position information yields a code table such as Table 2.

[0042]

[Table 2]

[0043] In the code table of Table 2, the most frequent data "02" and "FD" are converted into the position information "00" and "01". The next most frequent data "00" and "03" are converted into "100" and "101". The next most frequent data "01", "04", "FE", and "FF" are converted into "1100", "1101", "1110", and "1111", respectively. The function of creating the code table may instead be provided in the editor 21.

[0044] The format of the compressed data in each embodiment of the present invention is shown in FIG. 6(A). In FIG. 6(A), the compressed data DOUT consists of an identifier and position information. A concrete example is shown in Table 3.

[0045]

[Table 3]

[0046] In Table 3, for example, four identifiers are assigned to distinguish the position information for 256 data samples. Specifically, "00" is assigned as the identifier for the head position or the first position information. "010" is assigned as the identifier for the second to fifth position information. Similarly, "011" is assigned as the identifier for the eighth to thirteenth position information, and "1" is assigned as the identifier for the fourteenth to 255th position information.

[0047] As for the bit width of the position information, 1 bit is assigned for the identifier "00", and its content is "0" or "1". Two bits are assigned for the identifier "010", with contents "00", "01", "10", and "11". Three bits are assigned for the identifier "011", with contents "000", "001", "010", "100", "011", "101", "110", and "111". Eight bits are assigned for the identifier "1", with contents "00000000" to "11111111". The function of the code conversion editor 23 is described with reference to FIGS. 5(A) and 5(B).

[0048] The level adjustment editor 24 is an example of the level adjusting means 14 and adjusts the length of the compressed data converted by the code conversion editor 23 (the third device of the present invention). The function of the editor 24 is described in detail with reference to FIGS. 10 and 11. During compression, the memory 25 temporarily stores the input data DIN. For example, a memory that can be written to and read from at any time is used as the memory 25. During decompression, the memory 25 temporarily stores the compressed data to be restored.

[0049] The EPROM 26 is a programmable read-only memory that stores the control algorithms used in each embodiment. For example, in the first embodiment, the EPROM 26 stores the data compression algorithm shown in FIG. 4(A) and the data decompression algorithm shown in FIG. 4(B). In the second embodiment, it stores the data compression algorithm (main routine) shown in FIG. 7(A) and the data decompression algorithm shown in FIG. 7(B). The EPROM 26 also stores the appearance frequency table update algorithm (subroutine) shown in FIG. 8, which belongs to the main routine of FIG. 7(A).

[0050] In the third embodiment, the data compression algorithm shown in FIG. 10(A) and the data decompression algorithm shown in FIG. 10(B) are stored. The position information level adjustment algorithm (subroutine) shown in FIG. 11, which belongs to the main routine of FIG. 10(A), is also stored in the EPROM 26. The details of these control algorithms are described in each embodiment.

[0051] The display 27 is a tool that assists input and output for the keyboard 28 and the CPU 29. The keyboard 28 is a tool for entering, as control statements, the initial settings and start commands for the editors 21 to 24. The CPU 29 controls the input and output of the editors 21 to 24, the memory 25, the EPROM 26, the display 27, the keyboard 28, the input data file 30, and the compressed data file 31.

[0052] The input data file 30 stores the data used during compression or decompression. The compressed data file 31 is a memory that stores the compressed data during compression or decompression, and it uses the same kind of memory device as the file 30. Together these components constitute a data compression and decompression device that can encode the data to be compressed and decode the encoded compressed data.

[0053] Next, the data compression method according to the first embodiment of the present invention, and with it the operation of the device, is described with reference to the processing flowchart of FIG. 4(A). FIG. 4(A) is the data compression flowchart according to the first embodiment and constitutes a control algorithm stored in the EPROM 26 shown in FIG. 3. For example, when data is encoded and output as compressed data DOUT while the appearance of the data following the input data DIN is predicted, first, in step P1 of the flowchart of FIG. 4(A), the entire input data string is read from the file 30 and the appearance frequency table is created. The appearance frequency creation editor 21 creates the table by counting the data for each kind of input data DIN.

[0054] Specifically, the editor 21 creates an appearance frequency table, as shown in FIG. 5(A), that lists the 256 data values "00" to "FF" (hexadecimal) together with their appearance counts (decimal). In the appearance frequency table T1 of this embodiment, the appearance frequency is 3 for data "00", 0 for data "01", 23 for data "02", 0 for data "03", 10 for data "04", 5 for data "FE", and 1 for data "FF".

[0055] In such an example, for English text, assume input data in which the sequence "and" appears frequently. The data "a", represented by hexadecimal 61, is then often followed by the data "n", represented by hexadecimal 6E, and the character that appears after the letter "q" is often "u". Acquiring such statistical information makes it possible to grasp how a target character or character string connects to what precedes and follows it. Note that the appearance frequency table may also be created for each unit of input data to be compressed, without reading all of the input data.
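The "a" then "n" and "q" then "u" regularity can be sketched in Python as follows. The text does not spell out how the table is indexed by the preceding data, so this sketch assumes one successor count per preceding byte; `build_context_table`, `predict_next`, and the sample sentence are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_context_table(data: bytes) -> dict:
    """Map each byte value to a Counter of the byte values that follow it."""
    following = defaultdict(Counter)
    for current, nxt in zip(data, data[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following: dict, current: int) -> Optional[int]:
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in following:
        return None
    return following[current].most_common(1)[0][0]


# Made-up English sample in which "a" is always followed by "n"
# and "q" is always followed by "u".
sample = b"the queen and the quiet band ran quickly"
context = build_context_table(sample)
print(chr(predict_next(context, ord("a"))))
print(chr(predict_next(context, ord("q"))))
```

Keeping a separate count per preceding byte costs up to 256 times the memory of a single table, which is one reason a real implementation might limit the context.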

[0056] Next, in step P2, a code table is created from the appearance frequency table. Here, in FIG. 5(A), the code conversion editor 23 rearranges the data of the appearance frequency table T1 in descending order of appearance frequency. As a result, data "02" is written at the head position of the rearranged table T2, followed in order by data "04", "FE", "00", "FF", ..., "01", and "03".
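The rearrangement from T1 to T2 amounts to a sort by descending frequency. A minimal Python sketch follows, using only the seven frequencies quoted in the text for T1; the function name `sort_by_frequency` and the tie-breaking rule (ascending data value) are assumptions.

```python
def sort_by_frequency(freq: dict) -> list:
    """Return data values ordered from most to least frequent (table T2).

    Ties are broken by ascending data value; this tie rule is an assumption.
    """
    return sorted(freq, key=lambda value: (-freq[value], value))


# Frequencies quoted in the text for table T1 (other values omitted here).
t1 = {0x00: 3, 0x01: 0, 0x02: 23, 0x03: 0, 0x04: 10, 0xFE: 5, 0xFF: 1}
t2 = sort_by_frequency(t1)
print([f"{value:02X}" for value in t2])
```

With these seven entries the result is 02, 04, FE, 00, FF, 01, 03, which agrees with the order given for T2.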

[0057] Next, in step P3, one unit of the data string to be compressed is read from the file 30 and encoded. At this time, the appearance of the data following the data DIN to be compressed is predicted with reference to the appearance frequency table from the appearance frequency creation editor 21. For example, the data comparison editor 22 compares each appearance frequency entry written in the table with the data following the data DIN to be compressed. In the earlier example, this makes it possible to predict the "n" that appears after "a" and the "u" that appears after the letter "q".

[0058] Further, as shown in FIG. 5(B), the editor 23 converts the data into position information for the relative positions 0 to 15 and beyond, which run from the high-frequency data "02" to the low-frequency data "03". Applying this conversion, with identifiers, to the rearranged table T2 of FIG. 5(A) described above gives the result shown in FIG. 5(B). For relative positions "0" and "1", the data is converted into the identifier "00" and the position information "0" and "1", respectively.

[0059] For relative positions "2" to "5", the data is converted into the identifier "010" and the position information "00", "01", "10", and "11", respectively. For relative positions "6" to "13", the data is converted into the identifier "011" and the position information "000", "001", "010", "100", "011", "101", "110", and "111", respectively.
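The mapping from relative position to identifier plus position information can be sketched in Python as below. The class boundaries (0 to 1, 2 to 5, 6 to 13, 14 to 255) follow the relative positions given for FIG. 5(B). Two points are assumptions: position information is written as the plain binary offset within its class, and the 8-bit class stores the relative position itself. The names `CLASSES`, `encode_rank`, and `encode_data` are hypothetical.

```python
# (first rank, last rank, identifier bits, width of the position information)
CLASSES = [
    (0, 1, "00", 1),
    (2, 5, "010", 2),
    (6, 13, "011", 3),
    (14, 255, "1", 8),
]


def encode_rank(rank: int) -> str:
    """Return identifier + position information for a relative position."""
    for first, last, identifier, width in CLASSES:
        if first <= rank <= last:
            # Assumption: the 8-bit class stores the rank itself, the other
            # classes store the offset from the first rank of the class.
            value = rank if width == 8 else rank - first
            return identifier + format(value, f"0{width}b")
    raise ValueError(f"rank out of range: {rank}")


def encode_data(data: bytes, order: list) -> str:
    """Encode each byte as the code of its rank in `order` (table T2)."""
    rank_of = {value: rank for rank, value in enumerate(order)}
    return "".join(encode_rank(rank_of[byte]) for byte in data)


order = [0x02, 0x04, 0xFE, 0x00, 0xFF]  # head of T2 as listed in the text
bits = encode_data(bytes([0x02, 0x04, 0xFE]), order)
print(bits)
```

Because no identifier is a prefix of another and each identifier fixes the width of what follows, the resulting bit string can be split back into codes without separators.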

[0060] Then, in step P4, it is determined whether all of the input data DIN has been compressed. If all of the input data DIN has been compressed (YES), the control algorithm ends. If not all of the data has been compressed (NO), the process returns to step P3 and encoding continues. In this way, the data is encoded while the appearance of the data following the input data DIN is predicted, and the compressed data DOUT can be stored in the file 31.

[0061] Next, the decompression of compressed data according to the first embodiment of the present invention is described. In the decoding flowchart of FIG. 4(B), first, in step P1, the compressed data is read from the file 31 and a code table is created from the identifiers and position information. The code table described with reference to FIG. 5 is reproduced. Next, in step P2, the compressed data is read from the file 31 again and decoded with reference to the code table. Decoding is performed using a commonly used code tree structure such as that shown in FIG. 6(B).
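Decoding by walking a code tree can be sketched as reading the identifier bit by bit and then reading the fixed-width position information it announces. The Python sketch below uses the identifiers of Table 3 and assumes, as an illustration only, that position information is a plain binary offset and that the 8-bit class carries the relative position itself; `IDENTIFIERS` and `decode_ranks` are hypothetical names.

```python
# identifier -> (first relative position of the class, position-info width)
IDENTIFIERS = {"00": (0, 1), "010": (2, 2), "011": (6, 3), "1": (14, 8)}


def decode_ranks(bits: str) -> list:
    """Split a bit string into codes and return the relative positions."""
    ranks = []
    i = 0
    while i < len(bits):
        # Read the identifier one bit at a time, like descending a code tree.
        identifier = ""
        while identifier not in IDENTIFIERS:
            if i >= len(bits):
                raise ValueError("truncated identifier")
            identifier += bits[i]
            i += 1
        first, width = IDENTIFIERS[identifier]
        field = bits[i:i + width]
        if len(field) < width:
            raise ValueError("truncated position information")
        i += width
        value = int(field, 2)
        # Assumption: the 8-bit class stores the relative position itself.
        ranks.append(value if width == 8 else first + value)
    return ranks


order = [0x02, 0x04, 0xFE, 0x00, 0xFF]  # head of the sorted table T2
ranks = decode_ranks("000" + "001" + "01000")
restored = bytes(order[rank] for rank in ranks)
print(restored.hex())
```

The decoder needs the same sorted table as the encoder; how that table reaches the decoder in the first embodiment is not modeled here.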

[0062] Next, in step P3, it is determined whether all of the compressed data has been restored. If all of the compressed data has been restored (YES), the control algorithm ends; if not (NO), the process returns to step P2 and decoding of the compressed data continues. In this way, the compressed data is decoded and the resulting original data is stored in the file 30.

[0063] Thus, the data processing device according to each embodiment of the present invention includes, as shown in FIG. 3, the appearance frequency creation editor 21, the data comparison editor 22, the code conversion editor 23, the level adjustment editor 24, and so on. Once the appearance frequency creation editor 21 has created the appearance frequency table, the data comparison editor 22 can predict the character or character string that will appear after the data to be compressed. The code conversion editor 23 can then convert the predicted data into position information that is shorter the higher its appearance probability and longer the lower its appearance probability.

[0064] As a result, input data with a high appearance frequency is held in the upper part of the table area, and its position from the top can be encoded (tentatively called offset encoding). In particular, for an input data string in which the order of characters can be predicted by the data comparison editor 22, the compression ratio for uniformly appearing data strings improves and data processing becomes faster, compared with the conventional Huffman coding method, which uses a code tree and calculates the appearance probability of single characters only.

[0065] Further, according to the device of the present invention, the level adjustment editor 24 adjusts the length of the compressed data converted by the code conversion editor 23, so the compressed data can be reconstructed with its length dynamically level-adjusted according to the type of input data. In addition, according to the data compression method of the first embodiment, as shown in the processing flowchart of FIG. 4(A), the appearance frequency table is created in advance in step P1 by counting the appearance frequency of each kind of data to be compressed. This makes it possible to grasp how the target characters connect before and after each item of input data DIN as it is read in sequence.

[0066] As explained with the English example, assuming input data in which the sequence "and" appears frequently, regularities such as "a" often being followed by "n", or the character after the letter "q" being "u", can be grasped probabilistically. In step P3, the appearance of the data following the data DIN to be compressed is predicted with reference to the appearance frequency table, so in the earlier example the "n" that appears after "a" and the "u" that appears after "q" can be predicted.

[0067] Furthermore, in step P3, with reference to the code table and according to the appearance prediction for the input data DIN, data with a higher appearance probability is converted into shorter position information and data with a lower appearance probability into longer position information. The compressed data DOUT, consisting of position information and an identifier, can therefore be output directly from the code table. Predicting the appearance of characters and character strings in this way eliminates the notion of a dictionary search. In particular, for an input data string in which the order of characters can be predicted, a dictionary search such as checking whether the same character string already exists, as in the second conventional device, becomes unnecessary, which reduces dictionary search time and data transfer time.

[0068] As a further measure to shorten encoding and decoding time, when the identifier in Table 3 and FIG. 5(B) is "1", the input data DIN may be encoded as its original 8 bits in place of the position information. This improves the data compression ratio, expressed as [input data / compressed data] × 100%. To raise the compression ratio further, it is advisable to consider not only the single character following the input data DIN but also the connections among multiple characters. This raises the compression ratio but imposes constraints on data processing speed, memory capacity, and the like.

[0069] (2) Description of the Second Embodiment

FIG. 7(A) is a data compression flowchart according to the second embodiment of the present invention, and FIG. 7(B) is the corresponding decompression flowchart. FIG. 8 is a flowchart for updating the appearance frequency table, and FIGS. 9(A) to 9(C) are state diagrams of the data exchange during the update. Each flowchart is stored as a control algorithm in the EPROM 26 shown in FIG. 3.

[0070] Unlike the first embodiment, the second embodiment dynamically updates the appearance frequency table and the code table each time one unit of data to be compressed is input. When the data array A of the appearance frequency table shown in FIG. 9(A) is updated, first, in step P1 of the flowchart of FIG. 7(A), the appearance frequency table and the code table are initialized. In FIG. 9(A), the data array A of the appearance frequency table before the update is, for example, "00", "01", "02", "03", and so on.

[0071] Next, in step P2, one unit of the input data string DIN is read from the file 30 and encoded. As described in the first embodiment, the code conversion editor 23 performs the encoding in units of one byte or one character. Then, in step P3, the appearance frequency table is updated. Specifically, the appearance frequency creation editor 21 updates the table each time one unit of data is input. To update the table, the process moves to the subroutine of FIG. 8, and in step P31 the appearance frequency in the data array B is updated for each code as it is input. In this example, as shown in FIG. 9(A), for the data (codes) "03", "00", "01", and "02" input in sequence to the appearance frequency table, the pointer indicates "03" in array B.

[0072] Next, in step P32, the current appearance frequency in the data array B is compared with the appearance frequency of the entry above it. This is done in order to move codes with a high appearance frequency upward. Then, in step P33, if the comparison shows that the current appearance frequency is greater than or equal to that of the upper entry (YES), the process proceeds to step P34. If the current appearance frequency is smaller (NO), the process returns to the main routine without updating.

[0073] If the current appearance frequency is judged to be greater (YES), the appearance frequencies of the upper entries in the data array B are searched in step P34. The search continues until the current appearance frequency is smaller than that of the upper entry being compared. Next, in step P35, the current entry is exchanged with the entry found in the data array B. In FIG. 9(B), codes "01" and "02" are exchanged, so that the pre-update data array A shown in FIG. 9(A) is updated, with "02" and "03" swapped, as shown in FIG. 9(C). At the same time, the code conversion editor 23 rewrites the position information to reflect the update of the data array A.
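The update subroutine (steps P31 to P35) can be sketched in Python as a single increment followed by at most one swap. This is a simplified variant, not the patent's exact procedure: it swaps only past entries with a strictly smaller count, whereas the text also proceeds when the counts are equal. The names `update_order`, `order`, and `counts` are hypothetical, and the input sequence is made up rather than taken from FIG. 9.

```python
def update_order(order: list, counts: dict, code: int) -> None:
    """Count one occurrence of `code` and move it up with a single swap."""
    counts[code] = counts.get(code, 0) + 1           # P31: update the count
    current = order.index(code)
    target = current
    # P32 to P34: search upward while the entry above has a smaller count.
    # Assumption: strict comparison, so the array stays sorted by count.
    while target > 0 and counts.get(order[target - 1], 0) < counts[code]:
        target -= 1
    if target != current:                            # P35: exchange entries
        order[current], order[target] = order[target], order[current]


order = [0x00, 0x01, 0x02, 0x03]   # initial array, all counts zero
counts = {}
for code in (0x03, 0x02, 0x02):    # made-up input sequence
    update_order(order, counts, code)
print([f"{value:02X}" for value in order])
```

Only two entries move per update, so no full re-sort is needed, and a decoder that applies the same update after each decoded unit keeps an identical table.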

[0074] Then, in step P36, the corresponding pointer in the data array A is moved, and the process returns to the main routine. In step P4 of the main routine, it is determined whether all of the data to be compressed has been encoded. If all of the input data DIN has been encoded (YES), the control algorithm ends; if not (NO), the process returns to step P2 and encoding of the data DIN continues.

[0075] In this way, the input data DIN can be encoded while the code table is dynamically updated, and the compressed data can be stored in the file 31. Next, the decompression of compressed data according to the second embodiment of the present invention is described. FIG. 7(B) is the compressed data decompression flowchart according to the second embodiment. The decompression flowchart is stored as a control algorithm in the EPROM 26 shown in FIG. 3.

[0076] That is, when compressed data is decoded while the appearance frequency table and the code table are dynamically updated, first, in step P1 of FIG. 7(B), the appearance frequency table and the code table are initialized. Next, in step P2, the compressed data is read from the file 31 and decoded. Then, in step P3, the appearance frequency table is updated each time compressed data is input. In this way, the compressed data is restored to the original data.

[0077] Thus, according to the data compression method of the second embodiment of the present invention, the appearance frequency table is updated each time one unit of data to be compressed is input, as shown in FIG. 7(A). As the appearance frequency creation editor 21 updates the table, the code conversion editor 23 can rewrite the position information in a short time, so the code table can be reconstructed dynamically. Moreover, updating the appearance frequency table requires exchanging only some of the data; it is not necessary to re-sort all of the data.

[0078] As a result, the code table no longer needs to be included in the compressed data, which improves the data compression ratio. Further, according to this embodiment, appearance frequencies can be encoded dynamically in units of one byte or one character, so the memory area can be used effectively when, for example, the data compression function of this embodiment is built into a target device with limited memory capacity.

[0079] (3) Description of the Third Embodiment

FIG. 10(A) is a data compression flowchart according to the third embodiment of the present invention, and FIG. 10(B) is the corresponding decompression flowchart. FIG. 11 is a flowchart for adjusting the level of the position information during data compression. Each flowchart is stored as a control algorithm in the EPROM 26 shown in FIG. 3.

[0080] Unlike the first and second embodiments, the third embodiment adjusts the length of the compressed data after the data has been converted into position information. With the data processing methods of the first and second embodiments, the next data cannot be predicted accurately when, for example, the appearance frequency distribution of the character following the input data is flat, or when not enough appearance frequency data has been collected.

[0081] The third embodiment of the present invention is therefore characterized in that the level adjustment editor 24 adjusts the length of the compressed data to prevent degradation of the data compression ratio. For example, when the data to be compressed is program data, the distribution of transfer instructions and the like written in machine language is skewed. As a concrete example, in a jump instruction, where an address follows the operation code, the appearance probability of the next byte is flattened out.

[0082] The type of data is determined from the file extension. Operating systems generally follow conventions here; for example, under MS-DOS on a personal computer, the extension EXE indicates that the input data is an executable (machine language) file. That is, in the third embodiment of the present invention, when the skew of the appearance frequency distribution is known, the length of the compressed data produced by the initial encoding is adjusted. Converting to still shorter position information in this way improves the data compression ratio.

[0083] Table 4 shows the structure of the compressed data for input data in units of one byte (8 bits). In this embodiment, seven levels are provided, including level "0", in which the original data is encoded as 8 bits with no identifier, and the compressed data is constructed on the basis of these levels.

[0084]

[Table 4]

[0085] In Table 4, the first level assigns five identifiers to identify the position information for, for example, 256 data samples. Specifically, the identifier "00" is assigned to the position information from the head position to the 15th position, and 4 bits are assigned to the position information, so the compressed data is 6 bits long. Similarly, the identifier "010" is assigned to the 16th to 31st positions with 4 bits of position information, so the compressed data is 7 bits long. The identifier "011" is assigned to the 32nd to 63rd positions with 5 bits of position information, so the compressed data is 8 bits long. The identifier "10" is assigned to the 64th to 127th positions with 6 bits of position information, so the compressed data is 8 bits long. The identifier "11" is assigned to the 128th to 255th positions with 7 bits of position information, so the compressed data is 9 bits long.
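The Level 1 layout can be made concrete with a short sketch. The Python below is an illustration only, not code from the patent: the names `LEVEL1` and `encode_level1` are hypothetical, and it assumes that the position bits hold the offset from the first position of each range, which the text does not state explicitly.

```python
# Level 1 layout as described in the paragraph above:
# (first position, last position, identifier bits, number of position bits)
LEVEL1 = [
    (0, 15, "00", 4),
    (16, 31, "010", 4),
    (32, 63, "011", 5),
    (64, 127, "10", 6),
    (128, 255, "11", 7),
]


def encode_level1(position):
    """Return the identifier followed by the position bits, as a bit string."""
    for first, last, identifier, nbits in LEVEL1:
        if first <= position <= last:
            # Assumption: the position bits are the offset within the range.
            offset = position - first
            return identifier + format(offset, "0{}b".format(nbits))
    raise ValueError("position must be in 0..255")
```

For instance, position 16 falls in the second range, so it is written as the identifier 010 followed by the 4-bit offset 0000, seven bits in total as stated in the text.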

[0086] The second level, like the first, assigns five identifiers. Specifically, the identifier "000" is assigned to the position information from the head position to the 3rd position, and 2 bits are assigned to the position information, so the compressed data is 5 bits long. Similarly, the identifier "001" is assigned to the 4th to 11th positions with 3 bits of position information, so the compressed data is 6 bits long. The identifier "010" is assigned to the 12th to 27th positions with 4 bits of position information, so the compressed data is 7 bits long. The identifier "011" is assigned to the 28th to 59th positions with 5 bits of position information, so the compressed data is 8 bits long. The identifier "1" is assigned to the 60th to 255th positions with 7 bits of position information, so the compressed data is 9 bits long (as written; a 1-bit identifier and a 9-bit total imply 8 bits of position information, which the 196 positions in this range also require).

[0087] The third level, like the first and second, assigns five identifiers. Specifically, the identifier "000" is assigned to the head position and the 1st position, and 1 bit is assigned to the position information, so the compressed data is 4 bits long. Similarly, the identifier "001" is assigned to the 2nd to 5th positions with 2 bits of position information, so the compressed data is 5 bits long. The identifier "010" is assigned to the 6th to 13th positions with 3 bits of position information, so the compressed data is 6 bits long. The identifier "011" is assigned to the 14th to 29th positions with 4 bits of position information, so the compressed data is 7 bits long. The identifier "1" is assigned to the 60th to 255th positions (as written; the preceding range ends at the 29th position, so this range presumably begins at the 30th) with 8 bits of position information, so the compressed data is 9 bits long.

[0088] The fourth level, unlike the first to third, assigns four identifiers. Specifically, the identifier "00" is assigned to the head position and the 1st position, and 1 bit is assigned to the position information, so the compressed data is 3 bits long. Similarly, the identifier "010" is assigned to the 2nd and 3rd positions with 1 bit of position information, so the compressed data is 4 bits long. The identifier "011" is assigned to the 4th to 7th positions with 2 bits of position information, so the compressed data is 5 bits long. The identifier "1" is assigned to the 8th to 255th positions with 8 bits of position information, so the compressed data is 9 bits long.

[0089] The fifth level, unlike the first to fourth, assigns three identifiers. Specifically, the identifier "00" alone is assigned to the head position, so the compressed data is 2 bits long. Similarly, the identifier "01" is assigned to the 1st and 2nd positions with 1 bit of position information, so the compressed data is 3 bits long. The identifier "1" is assigned to the 3rd to 255th positions with 8 bits of position information, so the compressed data is 9 bits long.

[0090] The sixth level, unlike the first to fifth, assigns two identifiers. Specifically, the identifier "0" alone is assigned to the head position, so the compressed data is 1 bit long. Similarly, the identifier "1" is assigned to the 1st and 255th positions (as written; presumably the 1st through 255th) with 8 bits of position information, so the compressed data is 9 bits long.
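To compare how the higher levels trade a very short code for the top-ranked positions against 9 bits for everything else, the code lengths of Levels 4 to 6 can be tabulated. This is a sketch under assumptions: `LEVELS` and `code_length` are hypothetical names, and the last range of Level 6 is read as the 1st through 255th positions.

```python
# (first position, last position, identifier, number of position bits),
# transcribed from the descriptions of Levels 4, 5 and 6.
LEVELS = {
    4: [(0, 1, "00", 1), (2, 3, "010", 1), (4, 7, "011", 2), (8, 255, "1", 8)],
    5: [(0, 0, "00", 0), (1, 2, "01", 1), (3, 255, "1", 8)],
    # Assumption: the last range of Level 6 covers positions 1 through 255.
    6: [(0, 0, "0", 0), (1, 255, "1", 8)],
}


def code_length(level, position):
    """Bit length of the compressed data for a position at the given level."""
    for first, last, identifier, nbits in LEVELS[level]:
        if first <= position <= last:
            return len(identifier) + nbits
    raise ValueError("position must be in 0..255")
```

At Level 6 only the head position benefits (1 bit), while at Level 4 the first eight positions all stay at 5 bits or fewer, so the higher levels pay off only when the distribution is strongly concentrated on the top-ranked data.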

[0091] The seven levels shown in Table 4 are stored in advance in the memory 25 or the like. The code conversion editor 23 encodes the input data with reference to these seven levels. The level adjustment editor 24 examines the level of the compressed data assigned by the code conversion editor 23 and, for data whose appearance frequency distribution is clearly biased, converts it into still shorter position information.

[0092] Next, the data compression method according to the third embodiment of the present invention will be described. For example, when input data is encoded according to the bias of the appearance frequency distribution, in the flowchart of FIG. 10(A), first, in step P1, the entire input data string is read from the file 30 to create an appearance frequency table, and in step P2 a code table is created from the appearance frequency table. Up to this point the process is the same as in the first embodiment.
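Steps P1 and P2 might look roughly like the following. This is a minimal sketch, not the patent's implementation: `build_tables` is a hypothetical name, a single table is used for the whole input rather than the prediction-based tables of the first embodiment, and ties are broken by byte value, which the text does not specify.

```python
from collections import Counter


def build_tables(data):
    """Step P1: count occurrences. Step P2: rank byte values by count."""
    freq = Counter(data)  # appearance frequency table (missing keys count as 0)
    # Most frequent byte first; ties broken by byte value (an assumption).
    ranked = sorted(range(256), key=lambda b: (-freq[b], b))
    # Code table: byte value -> position, where position 0 is the head position.
    position_of = {b: pos for pos, b in enumerate(ranked)}
    return freq, ranked, position_of
```

The position returned for a byte is what the level tables then turn into an identifier plus position bits.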

[0093] Next, in step P3, offset encoding is performed to adjust the length of the compressed data. Here the level adjustment editor 24 adjusts the length of the compressed data. For this adjustment, control passes to the subroutine of FIG. 11: in step P31, one unit of data is read from the file 30 into the memory 25, and in step P32 it is encoded. As described in the first embodiment, the encoding is performed by the code conversion editor 23 in units of 1 byte or 1 character.

[0094] Next, in step P33, the bit length of the compressed data is checked. If the compressed data previously assigned by the code conversion editor 23 is 8 bits or less (NO), the process proceeds to step P35. If the compressed data is 9 bits or more (YES), the process proceeds to step P34, lowers the encoding level by one rank, and returns to the main routine. For example, suppose the code conversion editor 23 has assigned the fifth level of Table 4 on the assumption that the appearance distribution of the data following the input data is biased. If the level adjustment editor 24 then determines that the appearance distribution is not very biased, the compressed data is constructed using the fourth level, one rank lower.

[0095] If the compressed data is 8 bits or less, step P35 determines whether the encoding level is the highest. If it is the highest (YES), the process returns directly to the main routine. If it is not the highest (NO), the process proceeds to step P36 and performs a pseudo level adjustment. The pseudo level adjustment is a trial encoding carried out to find out whether the bit length of the compressed data would become longer or shorter than at present.

[0096] That is, if the trial encoding shows in step P37 that the compressed data would be shorter than at present (YES), the process proceeds to step P38 and raises the encoding level. For example, suppose the code conversion editor 23 has assigned the fifth level of Table 4 on the assumption that the appearance distribution of the data following the input data is biased. If the level adjustment editor 24 then determines that the appearance distribution is indeed biased, the level is changed by one rank (the text here repeats "the fourth level, one rank lower", although step P38 raises the level). Conversely, if the compressed data would be longer than at present (the NO branch of step P37; the text prints YES), the pseudo level adjustment is cancelled and the process returns to the main routine.
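The subroutine of FIG. 11 (steps P33 to P38) can be sketched as follows. The code is illustrative: `adjust_level`, `code_length` and `LEVELS` are hypothetical names, only Levels 4 to 6 are tabulated, the lowering in step P34 is clamped at the lowest tabulated level, and step P38 is read as moving one level up. The last range of Level 6 is assumed to cover positions 1 through 255.

```python
# (first position, last position, identifier, number of position bits)
LEVELS = {
    4: [(0, 1, "00", 1), (2, 3, "010", 1), (4, 7, "011", 2), (8, 255, "1", 8)],
    5: [(0, 0, "00", 0), (1, 2, "01", 1), (3, 255, "1", 8)],
    6: [(0, 0, "0", 0), (1, 255, "1", 8)],  # assumption: range is 1..255
}
LOWEST, HIGHEST = min(LEVELS), max(LEVELS)


def code_length(level, position):
    """Bit length of the compressed data for a position at the given level."""
    for first, last, identifier, nbits in LEVELS[level]:
        if first <= position <= last:
            return len(identifier) + nbits
    raise ValueError("position must be in 0..255")


def adjust_level(level, position):
    """Return the level chosen after steps P33 to P38 for one encoded unit."""
    current = code_length(level, position)
    # P33: is the compressed data 9 bits or more?
    if current >= 9:
        # P34: lower the level by one rank (clamped; an assumption here).
        return max(level - 1, LOWEST)
    # P35: nothing to try if the level is already the highest.
    if level == HIGHEST:
        return level
    # P36: pseudo level adjustment, i.e. a trial encoding one level up.
    trial = code_length(level + 1, position)
    # P37/P38: raise the level only if the trial is shorter; otherwise cancel.
    return level + 1 if trial < current else level
```

With these tables, a unit at the head position moves from Level 5 to Level 6 (2 bits down to 1 bit), while a unit at position 10 costs 9 bits at Level 5 and therefore drops back to Level 4.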

[0097] Then, in step P4 of the main routine, the end of data encoding is determined in the same way as in the first embodiment. In this way, the input data can be encoded according to the bias of the appearance frequency distribution, and the compressed data can be stored in the file 31. Next, the decompression of compressed data according to the third embodiment of the present invention will be described. FIG. 10(B) is a flowchart of the decompression of compressed data according to the third embodiment. The decompression flowchart is stored as a control algorithm in the EPROM 26 shown in FIG. 3.

[0098] In FIG. 10(B), first, in step P1, the compressed data is read from the file 31 and a code table is created from the identifiers and the position information. The code table described with reference to FIG. 5 is reproduced. Next, in step P2, the compressed data is read from the file 31 again and decoded while keeping track of the level-adjusted code table. Decoding is performed using a commonly used code tree structure such as that shown in FIG. 6(B).

[0099] Next, step P3 determines whether all of the compressed data has been restored. If it has (YES), the control algorithm ends; if it has not (NO), the process returns to step P2 and continues decoding the compressed data. The compressed data is thereby decoded, and the original data is stored in the file 30.
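The decoding loop of steps P2 and P3 can be illustrated for a stream encoded entirely at Level 1. This is a sketch under assumptions: it matches identifiers by prefix instead of walking the code tree of FIG. 6(B), it assumes the position bits are an offset from the first position of each range, and it returns positions rather than restored bytes, since mapping a position back to a byte needs the code table. `LEVEL1` and `decode_level1` are hypothetical names.

```python
# Level 1 layout: (first position, last position, identifier, position bits)
LEVEL1 = [
    (0, 15, "00", 4),
    (16, 31, "010", 4),
    (32, 63, "011", 5),
    (64, 127, "10", 6),
    (128, 255, "11", 7),
]


def decode_level1(bits):
    """Decode a string of '0'/'1' characters into a list of positions."""
    positions = []
    i = 0
    while i < len(bits):  # step P3: repeat until everything is restored
        for first, last, identifier, nbits in LEVEL1:
            if bits.startswith(identifier, i):
                start = i + len(identifier)
                if start + nbits > len(bits):
                    raise ValueError("truncated input")
                # Assumption: position bits are the offset within the range.
                positions.append(first + int(bits[start:start + nbits], 2))
                i = start + nbits
                break
        else:
            raise ValueError("no identifier matches at bit {}".format(i))
    return positions
```

Because the five identifiers are prefix-free, at most one of them can match at any bit offset, so the order in which they are tried does not matter.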

[0100] Thus, according to the data compression method of the third embodiment of the present invention, as shown in FIG. 11, after the data is converted into compressed data in step P32, the length of the compressed data is adjusted in steps P33 to P38. This makes it possible to perform offset encoding, which reconstructs the compressed data with its length dynamically level-adjusted according to the type of input data.

[0101] That is, in the offset encoding of this embodiment, when the data to be compressed is program data or the like, the appearance frequency distribution of the data becomes clear, so the data is converted into shorter position information; and when the appearance frequency distribution is biased, raising the level allows conversion into shorter position information. As a further measure to increase the compression ratio, the appearance frequency of each input data value may be determined in advance, for example by collecting statistics beforehand; when this is possible, data with a high appearance frequency is initially placed at the upper addresses of the memory 25.

[0102] Specifically, the initial value of the counter that counts the appearance frequency of the relevant code is set to 1 or more. This raises the probability that the code is converted into the code bits of short position information. For example, in a computer program (C source) or the like, there is a high probability that hexadecimal "0D" is followed by "0A". In this way, the more frequently data appears, the more likely it is to be converted into short position information. Even when little statistical information is available, or for characters or character strings whose next-character frequency distribution is evened out and therefore hard to predict, the next character or character string can be predicted accurately, and the data compression ratio improves.
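One way to picture the preset counters is a table of counts indexed by the previous byte and the next byte. The sketch below is an assumption-laden illustration: the patent does not give the table layout, and `make_counters` and `rank_after` are hypothetical names. Presetting the count for the pair (0x0D, 0x0A) to 1 puts 0x0A at the head position after 0x0D before any input has been seen.

```python
from collections import defaultdict


def make_counters(presets):
    """Build counts[prev][nxt], seeding selected pairs with an initial value.

    presets maps (prev, nxt) byte pairs to an initial count of 1 or more.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for (prev, nxt), initial in presets.items():
        counts[prev][nxt] = initial  # start above 0 instead of at 0
    return counts


def rank_after(counts, prev):
    """Byte values ordered by descending count after prev (ties by value)."""
    return sorted(range(256), key=lambda b: (-counts[prev][b], b))
```

With `counts = make_counters({(0x0D, 0x0A): 1})`, the first element of `rank_after(counts, 0x0D)` is `0x0A`, so a line feed following a carriage return would receive the shortest code from the very first occurrence.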

[0103]

[Effects of the Invention] As described above, the data processing device of the present invention is provided with data prediction means that predicts the appearance of the data following the data to be compressed while referring to the appearance frequency table. The data conversion means can therefore convert the character or character string that appears next into shorter position information the higher its appearance probability, and into longer position information the lower its appearance probability. For an input data string in which the order of the character strings can be predicted, the compression ratio of the data string can be improved compared with a Huffman coding method that calculates the appearance probability of single characters only.

[0104] According to another data processing device of the present invention, the data creation means updates the appearance frequency table each time one unit of data is input. The data conversion means can therefore rewrite the position information in a short time as the appearance frequency table is updated, and the code table can be reconstructed dynamically. In addition, the code table can be updated without reordering all of the position information, which speeds up data compression.

[0105] According to another data processing device of the present invention, the level adjustment means adjusts the length of the compressed data after the input data has been converted into compressed data. Offset encoding, which reconstructs the compressed data with its length dynamically level-adjusted according to the type of input data, can therefore be performed. When the tendency of the appearance frequency distribution is clear, the data is converted into compressed data of a shorter bit length; when the appearance frequency distribution is biased, the level is raised and the data is converted into compressed data of a short bit length. The data compression ratio improves as a result.

[0106] Incorporating such a data compression function, which encodes dynamically on the basis of appearance frequency in units of 1 byte or 1 character, into a target device makes it possible to reduce memory capacity. According to the data processing method of the present invention, an appearance frequency table in which the appearance frequencies are counted is created in advance for each type of data to be compressed, so the way the characters or character strings before and after the data to be compressed are connected can be determined.

[0107] Further, according to the data processing method of the present invention, compressed data consisting of position information and an identifier can be output directly from the code table. This makes it possible to construct a data compression or decompression device capable of high-speed encoding or decoding while predicting the appearance of characters and character strings, which contributes greatly to a substantial increase in the storage capacity of magnetic disk devices and the like and to shorter transmission times in data transmission.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the principle of the data processing device according to the present invention.

FIG. 2 is a diagram showing the principle of the data processing method according to the present invention.

FIG. 3 is a configuration diagram of the data compression and decompression device according to each embodiment of the present invention.

FIG. 4 is a flowchart of data compression and decompression according to the first embodiment of the present invention.

FIG. 5 is a diagram explaining the function of the code conversion editor according to each embodiment of the present invention.

FIG. 6 is a diagram explaining the compressed data format and the code tree according to each embodiment of the present invention.

FIG. 7 is a flowchart of data compression and decompression according to the second embodiment of the present invention.

FIG. 8 is a flowchart of updating the appearance frequency table according to the second embodiment of the present invention.

FIG. 9 is a diagram showing how data is exchanged when the appearance frequency table is updated according to the second embodiment of the present invention.

FIG. 10 is a flowchart of data compression and decompression according to the third embodiment of the present invention.

FIG. 11 is a flowchart of the level adjustment of compressed data according to the third embodiment of the present invention.

FIG. 12 is a configuration diagram of a data compression device according to a conventional example.

[Explanation of Symbols]

11 ... data creation means, 12 ... data prediction means, 13 ... data conversion means, 14 ... level adjustment means, 21 ... appearance frequency creation editor, 22 ... data comparison editor, 23 ... code conversion editor, 24 ... level adjustment editor, 25 ... memory, 26 ... EPROM, 27 ... display, 28 ... keyboard, 29 ... CPU, 30 ... input data file, 31 ... compressed data file, 32 ... bus, DIN ... input data, DOUT ... compressed data.

Continuation of the front page: (51) Int. Cl.⁶ (identification code, office reference number, FI, technical display location): H04N 1/417, 7/32

Claims

[Claims]

[Claim 1] A data processing device comprising: data creation means for counting, for each type of data to be compressed, the appearance frequency of the data to create an appearance frequency table; data prediction means for predicting the appearance of the data following the data to be compressed while referring to the appearance frequency table from the data creation means; and data conversion means for converting the data, in accordance with the appearance prediction made by the data prediction means, into compressed data of a shorter bit length the higher its appearance probability and into compressed data of a longer bit length the lower its appearance probability.

[Claim 2] The data processing device according to claim 1, wherein the data creation means updates the appearance frequency table each time one unit of data to be compressed is input.

[Claim 3] The data processing device according to claim 1, further comprising level adjustment means for adjusting the length of the compressed data converted by the data conversion means.

[Claim 4] The data processing device according to any one of claims 1, 2 and 3, wherein the data creation means takes in all of the data to be compressed or one unit of data to create the appearance frequency table.

[Claim 5] The data processing device according to any one of claims 1, 2 and 3, wherein the data conversion means refers to a code table in which the data in the appearance frequency table is sorted in descending order of appearance frequency and, for the positions running from data with a high appearance frequency to data with a low appearance frequency, shorter position information is assigned to data with a higher appearance frequency and longer position information is assigned to data with a lower appearance frequency.

[Claim 6] The data processing device according to any one of claims 1, 2 and 3, wherein the data conversion means rewrites the compressed data as the appearance frequency table is updated by the data creation means.

[Claim 7] A data processing method comprising: counting in advance, for each type of data to be compressed, the appearance frequency of the data to create an appearance frequency table; predicting the appearance of the data following the data to be compressed while referring to the created appearance frequency table; and converting the data, in accordance with the appearance prediction, into compressed data of a shorter bit length the higher its appearance probability and into compressed data of a longer bit length the lower its appearance probability.

[Claim 8] The data processing method according to claim 7, wherein the conversion into compressed data refers to a code table in which the data in the appearance frequency table is sorted in descending order of appearance frequency and, for the positions running from data with a high appearance frequency to data with a low appearance frequency, shorter position information is assigned to data with a higher appearance frequency and longer position information is assigned to data with a lower appearance frequency.

[Claim 9] The data processing method according to either of claims 7 and 8, wherein the appearance frequency table and the code table are updated each time one unit of data to be compressed is input.

[Claim 10] The data processing method according to any one of claims 7, 8 and 9, wherein the bit length of the compressed data is adjusted after the data to be compressed has been converted into compressed data.

[Claim 11] The data processing method according to any one of claims 7, 8 and 9, wherein the appearance frequency table is created from all of the data to be compressed or from one unit of data.

[Claim 12] The data processing method according to any one of claims 7, 8 and 9, wherein the appearance of the data following the data is predicted by comparing the data of each appearance frequency written in the appearance frequency table with the data following the data to be compressed.

[Claim 13] The data processing method according to any one of claims 7, 8 and 9, wherein the compressed data consists of position information indicating a position in the order running from data with a high appearance frequency to data with a low appearance frequency, and an identifier identifying the position information.