JPH08167852A

JPH08167852A - Method and device for compressing data

Info

Publication number: JPH08167852A
Application number: JP30866294A
Authority: JP
Inventors: Nobuko Sato; 宣子佐藤; Yoshiyuki Okada; 佳之岡田; Shigeru Yoshida; 茂吉田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-12-13
Filing date: 1994-12-13
Publication date: 1996-06-25

Abstract

PURPOSE: To attain a sufficient compression ratio even when the data to be compressed is small, by sorting characters into a plurality of groups, each consisting of characters with the same statistical properties, and calculating the appearance probability of each group. CONSTITUTION: A character string is input to a character group sorting part 10, which sorts the characters in the string into a plurality of hierarchical groups, each made up of characters with the same statistical properties. A probability model preparing part 20 then calculates the appearance probability of each group and the appearance probability of each input character within the groups. An encoding part 30 encodes each input character based on the calculated in-group character appearance probability. Even when the file to be compressed is not large enough to build a probability model, a high compression ratio can be obtained without storing individual character appearance frequencies in advance. When sorting characters into groups and calculating the group appearance probabilities, it is preferable to fix in advance the classification of the members of each group and the appearance probability of each group.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] In recent years, as computers have come to handle various kinds of data such as character codes and image data, the amount of data handled has also grown. Compressing such large volumes of data by removing their redundant portions reduces the storage capacity required and allows faster transmission.

[0002] Compressed data, on the other hand, must be decompressed before it can be referenced or used, so access is slower than for uncompressed data. For this reason, data compression has so far been used mainly for data backup and communication, where partial access to the data is rare.

[0003] In recent years, however, dedicated compression LSIs have come into use and the time needed to decompress compressed data has shortened, so that compressing and decompressing data that is accessed in the same way as ordinary data is now being considered.

[0004] When data is compressed, it is decompressed one compressed unit at a time, so it is desirable that the unit of data compressed be about the same size as the unit of data accessed (5 Kbytes or less, roughly 1 to 2 Kbytes).

[0005]

[Prior Art] Universal coding has been proposed as a data compression method applicable to various kinds of data (character codes, image data, and so on). The present invention is not limited to the compression of character codes and can be applied to various data, but in the following, in keeping with the terminology of information theory, one word unit of data is called a character (alphabet) and any number of concatenated words is called a character string.

[0006] Arithmetic coding is a representative universal coding method. Rather than encoding each character separately, as the widely used Huffman code does, it maps the input to a single point and outputs the digits after the binary point as the code.

[0007] The principle of multi-valued arithmetic coding is described here with reference to FIG. 2. The basic idea of arithmetic coding is to represent a character string by a section of real numbers lying between 0 and 1 ([0, 1)).

[0008] The interval [0, 1) is used because the digits after the binary point are output as the code. It is closed at 0 ("[") and open at 1 (")") because with [0, 1] the fractional parts of 0 and 1 would be identical, so 0 and 1 could not be distinguished, while with (0, 1) the value 0 could not be used.

[0009] FIG. 2(A) shows the appearance frequency of each character, assuming that the four characters a, b, c, and d appear. The numbers (4), (2), (1), and (3) written below the characters a, b, c, and d on the horizontal axis are the appearance-frequency ranks of the respective characters.

[0010] FIG. 2(B) shows, based on the appearance frequencies in FIG. 2(A), the cumulative appearance-frequency probability of each character in order of appearance frequency. The column labeled cf0 on the horizontal axis shows the cumulative appearance-frequency probability of character c among the four characters c, b, d, and a. Likewise, the column labeled cf1 shows that of character b among the three characters b, d, and a; the column labeled cf2 shows that of character d among the two characters d and a; and the column labeled cf3 shows that of character a.

[0011] FIG. 2(C) shows how arithmetic coding is performed from the cumulative appearance-frequency probabilities of FIG. 2(B). When character c is input, a section whose width equals the cumulative appearance-frequency probability of character c (the hatched portion of the column labeled cf0 in FIG. 2(B)) is adopted as the corresponding section 10.

[0012] Next, when the second character a is input, section 11, obtained by subdividing section 10 (which corresponds to the first character) according to the cumulative appearance-frequency probabilities of the characters, is adopted as the corresponding section.

[0013] Then, when the third character d is input, section 12, obtained by subdividing section 11 (which corresponds to the second character) according to the cumulative appearance-frequency probabilities of the characters, is adopted as the corresponding section.

[0014] In this way, the character string acd is encoded as an arbitrary value in section 12 (any value between the upper and lower ends of section 12). The lower end of each section is obtained from equations (1-1) and (1-2).

lower end of new partial section = lower end of current partial section + width of current partial section × cumulative probability of the character of interest ... (1-1)

width of new partial section = width of current partial section × probability of the character of interest ... (1-2)

To decode a code word, it suffices to determine, subdividing successively, which of the sections partitioned according to the character probabilities contains the code word.
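The subdivision in equations (1-1) and (1-2) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the counts in `FREQS` are hypothetical values chosen only so that the ranks match FIG. 2(A), the function names are invented for this sketch, and the cumulative probability of a character is taken as the summed probability of all lower-frequency characters. The input follows the order of characters given in the steps above (c, then a, then d).

```python
from fractions import Fraction

# Hypothetical counts: only the ranks (c 1st, b 2nd, d 3rd, a 4th) come from
# FIG. 2(A); the source gives no actual frequencies.
FREQS = {"a": 1, "b": 3, "c": 4, "d": 2}

def build_model(freqs):
    """Map each character to (cumulative probability, probability)."""
    total = sum(freqs.values())
    model, cum = {}, Fraction(0)
    # Accumulate from the least frequent character upward.
    for ch in sorted(freqs, key=lambda c: (freqs[c], c)):
        p = Fraction(freqs[ch], total)
        model[ch] = (cum, p)
        cum += p
    return model

def encode(text, model):
    """Return (lower end, width) of the final section."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        cum, p = model[ch]
        low = low + width * cum      # equation (1-1)
        width = width * p            # equation (1-2)
    return low, width

def decode(value, length, model):
    """Recover `length` characters from any value inside the final section."""
    low, width, out = Fraction(0), Fraction(1), []
    for _ in range(length):
        target = (value - low) / width
        for ch, (cum, p) in model.items():
            if cum <= target < cum + p:
                out.append(ch)
                low, width = low + width * cum, width * p
                break
    return "".join(out)

model = build_model(FREQS)
low, width = encode("cad", model)
print(low, width)                 # 151/250 1/125
print(decode(low, 3, model))      # cad
```

Exact fractions are used here only to keep the arithmetic easy to check by hand; a practical coder would work with fixed-precision integers.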

[0015] Thus arithmetic coding encodes with a section, but decoding does not need the section itself; it is enough to specify one number inside it. As the actual code word, one may choose the number in the section that can be expressed with the fewest bits.
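One way to pick such a code word, shown here only as a sketch, is to try successively longer binary fractions until one falls inside the section. The function name `shortest_code` and the search strategy are assumptions of this example, not something the source specifies; the first section used below is the one that results from hypothetical probabilities 0.4, 0.1, and 0.2 for three successive characters.

```python
from fractions import Fraction
from math import ceil

def shortest_code(low, width):
    """Return the shortest bit string b1..bk with 0.b1..bk (binary) in [low, low + width).

    Assumes 0 <= low, width > 0, and low + width <= 1.
    """
    k = 1
    while True:
        n = ceil(low * 2**k)                  # smallest k-bit fraction n / 2^k >= low
        if Fraction(n, 2**k) < low + width:   # still below the upper end?
            return format(n, "0{}b".format(k))
        k += 1

print(shortest_code(Fraction(151, 250), Fraction(1, 125)))   # 100111
print(shortest_code(Fraction(3, 5), Fraction(2, 5)))          # 11
```

The narrow section of width 1/125 needs six bits, while the wide section of width 2/5 needs only two, which is the relationship between section width and code length that the text relies on.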

[0016] That is, the higher the appearance frequency, the wider the section, and the wider the section, the fewer digits are needed after the binary point, so the code can be expressed in fewer bits. The description so far assumed fixed symbol appearance frequencies, but the appearance frequencies (the probability model) can also be updated successively so that coding is performed dynamically, as shown below.

cumulative probability of the character of interest = cumulative number of appearances of characters with a lower appearance frequency than the character of interest / input character string length ... (2-1)

probability of the character of interest = number of appearances of the character of interest / input character string length ... (2-2)

FIG. 3 shows the configuration of a device that performs dynamic arithmetic coding, recalculating the appearance frequencies each time a character is input. The device consists of a probability model (symbol appearance frequency) creation unit 20, which creates the appearance frequencies of the input characters as in FIG. 2(A) and the cumulative appearance frequencies of the characters in order of appearance frequency as in FIG. 2(B), and an arithmetic coding unit 40, which performs arithmetic coding from the cumulative appearance-frequency probabilities as in FIG. 2(C). The probability model creation unit 20 has a dictionary and a counter, neither of which is shown.

[0017] Next, the operation of the arithmetic coding device of FIG. 3 is described following the flow of FIG. 4. First, in step 401, the initial values for arithmetic coding are set: upper end = 1, lower end = 0, section width = 1.0. The dictionary of the probability model creation unit 20 holds the symbols and their appearance-frequency ranks, and the counter holds the appearance frequency of each symbol. As initialization, a dictionary with as many entries as there are symbols (the number of characters that may appear: 256 for 1-byte characters) is prepared, and a counter that counts the appearance frequency of each character is prepared and initialized to "1". The arithmetic coding unit 40 holds the rank of each symbol and the cumulative appearance frequencies.

[0018] Each time one character (call it k) is input from the input character string (step 402), its appearance-frequency rank is selected from the dictionary, and the arithmetic coding unit 40 uses this rank and the cumulative appearance frequency to calculate the section and perform arithmetic coding (step 403). That is, the upper and lower ends of the section for the input character are obtained from equations (1-1) and (1-2) and equations (2-1) and (2-2), and an arbitrary value in the section is output as the code.

[0019] The counter then increases the appearance frequency of the input character by "1" (step 404). To reflect the incremented character, the dictionary is re-sorted in order of frequency (step 405) and the cumulative appearance frequencies are updated (step 406). Steps 405 and 406 may be performed in the reverse order.
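The loop of steps 401 through 406 might look roughly like the sketch below. Several details are assumptions made for illustration: the class and function names are invented, the sum of all counters is used as the denominator (the counters start at 1, so this differs from the plain input length in equations (2-1) and (2-2)), ties between equal counts are broken by character order, and the ranking is recomputed on demand instead of keeping a sorted dictionary.

```python
from fractions import Fraction

class AdaptiveModel:
    def __init__(self, alphabet):
        # Step 401: one counter per possible character, initialized to 1.
        self.counts = {ch: 1 for ch in alphabet}

    def interval(self, ch):
        """Return (cumulative probability, probability) of ch under the current counts."""
        total = sum(self.counts.values())
        key = (self.counts[ch], ch)
        # Cumulative count of the characters ranked below ch (lower frequency first).
        below = sum(c for s, c in self.counts.items() if (c, s) < key)
        return Fraction(below, total), Fraction(self.counts[ch], total)

    def update(self, ch):
        # Step 404; steps 405 and 406 are implicit because interval() recomputes the order.
        self.counts[ch] += 1

def encode_adaptive(text, alphabet):
    model = AdaptiveModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    for ch in text:                      # step 402
        cum, p = model.interval(ch)
        low += width * cum               # step 403, equation (1-1)
        width *= p                       # step 403, equation (1-2)
        model.update(ch)
    return low, width

print(encode_adaptive("aab", "ab"))   # (Fraction(1, 6), Fraction(1, 12))
```

A decoder that starts from the same initial counters and applies the same update after each decoded character reproduces the model exactly, which is why no frequency table needs to accompany the compressed data.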

[0020] Steps 402 through 406 are executed repeatedly. The description so far is an example of arithmetic coding based on the appearance probability of each individual character. To raise the compression ratio further, arithmetic coding is performed using conditional appearance probabilities that take into account the dependency between the input character and the immediately preceding character (hereinafter, the "context").

[0021] A context is represented by a tree structure, as shown in FIG. 5. Each time a character string passing through the characters of the nodes appears, the number of appearances is counted at each node, and the conditional probability is obtained from these counts. In FIG. 5, the number to the right of each character is the number of appearances. For example, the 5 to the right of the character a immediately below the root (branch length 1) means that the character a has appeared 5 times; the 2 to the right of the character a two levels below the root (branch length 2) means that the string aa has appeared 2 times; and the 1 to the right of the character a three levels below the root (branch length 3) means that the string aaa has appeared once.

[0022] The occurrence probability of every symbol is determined according to its "context", the symbol string that appeared immediately before it, and the length of the symbol string used to form the context is called the "order". The following methods (1) and (2) are available for collecting contexts, that is, for setting the order.

(1) Fixed-order context
The condition of the conditional probability is fixed to a given order. For example, with a second-order context, the contexts of characters following the two immediately preceding characters are collected (branch length 3 from the root in FIG. 5), giving the conditional probability p(y|x1,x2).

[0023] Here y is the character to be encoded, x1 and x2 are the first and second immediately preceding characters, and p(y|x1,x2) is the probability that y appears after x1 and x2 have appeared in succession.

(2) Blending context
Blending (mixing of orders) does not fix the length of the conditioning string but extends the order according to the input data.
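A fixed-order context of the kind described in (1) can be illustrated by counting substrings and dividing, as in the sketch below. The function names and the sample string are hypothetical, and a flat counter of substrings stands in for the tree of FIG. 5: each key corresponds to one node, and its value to the count written beside that node.

```python
from collections import Counter
from fractions import Fraction

def count_contexts(text, order=2):
    """Count every substring of length 1 .. order + 1 (the nodes of the context tree)."""
    counts = Counter()
    for i in range(len(text)):
        for n in range(1, order + 2):
            if i + n <= len(text):
                counts[text[i:i + n]] += 1
    return counts

def conditional_probability(counts, context, y):
    """Estimate p(y | context) from the counts of the strings context + (one character)."""
    followers = sum(c for s, c in counts.items()
                    if len(s) == len(context) + 1 and s.startswith(context))
    if followers == 0:
        return None   # the context was never followed by any character
    return Fraction(counts[context + y], followers)

counts = count_contexts("aaab")
print(counts["a"], counts["aa"], counts["aaa"])      # 3 2 1
print(conditional_probability(counts, "aa", "b"))    # 1/2
```

With a second-order context, `conditional_probability(counts, x1 + x2, y)` plays the role of p(y|x1,x2).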

[0024] When the number of characters that can appear in multi-valued arithmetic coding is large (for example, when one character is represented by 16 bits, giving 64K possible characters), many characters never appear in the file being compressed. In this case, if arithmetic coding is performed dynamically by recalculating the appearance frequencies at each character input, and every appearance frequency is given an initial value of "1" to allow for any character appearing, many sections are wasted and the compression ratio falls. Blending of order -1 and order 0 is one way to eliminate this waste. Order -1 treats the characters that have not yet appeared as equally probable, and order 0 represents the context-free character appearance frequencies.

[0025] The flow of the arithmetic coding method using blending of order -1 and order 0 is described with reference to FIG. 6. This flow is processed by, for example, the device shown in FIG. 3. First, in step 601, the initial values for arithmetic coding are set: upper end = 1, lower end = 0, section width = 1.0. In addition, all characters that can appear (the information source) are registered in the dictionary as not-yet-appeared characters.

[0026] Then, each time one character (call it k) is input from the input character string (step 602), the dictionary is used to determine whether it has already appeared during the flow (step 603).

[0027] If step 603 determines that the character has not yet appeared, the section for not-yet-appeared characters is arithmetically coded (step 607), and character k is then arithmetically coded with all not-yet-appeared characters treated as equally probable (step 608). The counter then sets the appearance frequency of input character k to "1", and character k is removed from the not-yet-appeared characters (step 609).

[0028] If, on the other hand, step 603 determines that the character has already appeared, character k is arithmetically coded (step 604). The counter then increases the appearance frequency of the input character by "1" (step 605), and the dictionary is re-sorted in order of appearance frequency (step 606).

[0029] After step 606 or step 609 is executed, the cumulative appearance frequencies are updated (step 610). Execution then repeats from step 602.
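The flow of steps 601 through 610 can be sketched as below. The source does not say how wide the section reserved for not-yet-appeared characters is, so the fixed weight `ESCAPE = 1` is purely an assumption of this example, as are the function names and the placement of that section at the bottom of the range.

```python
from fractions import Fraction

ESCAPE = 1   # assumed weight of the section for not-yet-appeared characters

def encode_blended(text, alphabet):
    seen = {}                         # order 0: counts of characters that have appeared
    unseen = sorted(set(alphabet))    # order -1: step 601, everything starts as not yet appeared
    low, width = Fraction(0), Fraction(1)

    def narrow(cum, p):
        nonlocal low, width
        low, width = low + width * cum, width * p

    for ch in text:                                       # step 602
        esc = ESCAPE if unseen else 0
        total = sum(seen.values()) + esc
        if ch in seen:                                    # step 603: already appeared
            below = esc + sum(c for s, c in seen.items()
                              if (c, s) < (seen[ch], ch))
            narrow(Fraction(below, total), Fraction(seen[ch], total))    # step 604
            seen[ch] += 1                                 # step 605
        else:
            narrow(Fraction(0), Fraction(esc, total))     # step 607: escape section
            i = unseen.index(ch)
            narrow(Fraction(i, len(unseen)), Fraction(1, len(unseen)))   # step 608
            unseen.remove(ch)                             # step 609
            seen[ch] = 1
    return low, width

print(encode_blended("aa", "abcd"))   # (Fraction(1, 8), Fraction(1, 8))
```

Because characters that never occur keep no section of their own, the cost of a large alphabet is paid only once per newly appearing character, not on every coding step.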

[0030]

[Problems to Be Solved by the Invention] Compression methods that assign short code lengths to characters with high appearance probabilities according to the statistical appearance frequency of each character (probabilistic-statistical compression methods) include, as described above, those that fix the symbol appearance frequencies (the probability model) and those that change them dynamically.

[0031] The former requires, at decompression time, either a preset probability model or a probability model obtained by processing the entire character string, so those appearance frequencies must be stored together with the compressed data.

[0032] The latter is an adaptive coding method that recalculates the probability model as the character string is input. It does not need to store a probability model in advance, and it can build a probability model suited to each set of data being compressed. However, when the character string to be compressed is short, a sufficient dictionary cannot be built and a good compression ratio cannot be obtained.

[0033] The present invention was made in view of these circumstances, and its object is to provide a data compression method and device that can obtain a good compression ratio, without storing individual character appearance frequencies in advance, even when the file to be compressed is not large enough to build a probability model.

[0034]

[Means for Solving the Problems]

<First data compression method of the present invention> To solve the problems described above, the first data compression method of the present invention is configured as follows (corresponding to claim 1).

[0035] That is, a data compression method that performs variable-length coding, outputting code lengths according to appearance probabilities, comprises a group forming step, a group appearance probability calculating step, an in-group character appearance probability calculating step, and an input character encoding step.

[0036] The group forming step classifies the characters that may be input into a plurality of hierarchical groups, each made up of characters with the same statistical properties. The group appearance probability calculating step calculates the appearance probability of each group.

[0037] The in-group character appearance probability calculating step calculates the appearance probability of the input character within the plurality of groups. The input character encoding step encodes the input character based on the appearance probability calculated in the in-group character appearance probability calculating step.
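The four steps can be read as a two-stage narrowing of the section: first to the part belonging to the character's group, then to the character within that group. The sketch below illustrates this reading under several assumptions that the source does not fix: a flat (single-level) set of groups, group probabilities supplied as constants, in-group counters that start at 1 and adapt as characters are coded, and characters ordered alphabetically within a group. All names and sample values are hypothetical.

```python
from fractions import Fraction

def encode_grouped(text, groups, group_probs):
    """groups: list of strings of member characters; group_probs: Fractions summing to 1."""
    where = {ch: gi for gi, g in enumerate(groups) for ch in g}   # group forming step
    counts = [{ch: 1 for ch in g} for g in groups]               # assumed initial counts
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        gi = where[ch]
        # Stage 1: narrow to the section of the group (group appearance probability).
        low += width * sum(group_probs[:gi], Fraction(0))
        width *= group_probs[gi]
        # Stage 2: narrow to the character's section inside its group.
        members = sorted(counts[gi])
        total = sum(counts[gi].values())
        below = sum(counts[gi][s] for s in members[:members.index(ch)])
        low += width * Fraction(below, total)
        width *= Fraction(counts[gi][ch], total)
        counts[gi][ch] += 1
    return low, width

groups = ["ab", "xyz"]
probs = [Fraction(3, 4), Fraction(1, 4)]
print(encode_grouped("ax", groups, probs))   # (Fraction(9, 32), Fraction(1, 32))
```

The section given to a character has width p(group) × p(character | group), so a character from a likely group starts with a wide section even before any statistics about that individual character have been collected.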

[0038] <Second data compression method of the present invention> To solve the problems described above, the second data compression method of the present invention is configured as follows (corresponding to claim 2).

[0039] That is, in the first data compression method, the classification of the members of each group is fixed and given in advance in the group forming step (S1).

<Third data compression method of the present invention> To solve the problems described above, the third data compression method of the present invention is configured as follows (corresponding to claim 3).

[0040] That is, in the first data compression method, the appearance probability of each group is fixed and given in advance in the group appearance probability calculating step (S2).

<Fourth data compression method of the present invention> To solve the problems described above, the fourth data compression method of the present invention is configured as follows (corresponding to claim 4).

[0041] That is, in the first data compression method, the group appearance probability calculating step (S2) sets an initial value for the appearance probability of each group in advance and dynamically recalculates that probability as characters are input.

[0042] <Fifth data compression method of the present invention> To solve the problems described above, the fifth data compression method of the present invention is configured as follows (corresponding to claim 5).

[0043] That is, in the first data compression method, the group appearance probability calculating step (S2) calculates the appearance probability of a group as a conditional group appearance probability, conditioned on the appearance of the groups to which the immediately preceding characters belong.
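As a simplified illustration of a conditional group appearance probability, the sketch below conditions on the group of a single preceding character only, whereas the text allows several preceding characters. The function name, the transition-count table, and the initial count of 1 per transition are assumptions of this example.

```python
from fractions import Fraction

def conditional_group_probs(group_ids, n_groups):
    """For each character after the first, return P(its group | group of the previous character),
    estimated adaptively from transition counts that start at 1 (an assumed initial value)."""
    counts = [[1] * n_groups for _ in range(n_groups)]
    probs = []
    for prev, cur in zip(group_ids, group_ids[1:]):
        row = counts[prev]
        probs.append(Fraction(row[cur], sum(row)))
        row[cur] += 1          # update after use, so a decoder can mirror the same counts
    return probs

print(conditional_group_probs([0, 0, 0, 1], 2))
# [Fraction(1, 2), Fraction(2, 3), Fraction(1, 4)]
```

These probabilities would take the place of the unconditional group probabilities when the section is narrowed to a group.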

[0044] <Sixth data compression method of the present invention> To solve the problems described above, the sixth data compression method of the present invention is configured as follows (corresponding to claim 6).

[0045] That is, in the first data compression method, the group forming step (S1) forms the plurality of hierarchical groups from a first group consisting of high-appearance-probability characters and a second group consisting of low-appearance-probability characters.

[0046] <Data compression device of the present invention> To solve the problems described above, the data compression device of the present invention is configured as follows (corresponding to claim 9).

[0047] That is, a data compression device that performs variable-length coding, outputting code lengths according to appearance probabilities, comprises a group forming unit, a group appearance probability calculating unit, an in-group character appearance probability calculating unit, and an input character encoding unit.

[0048] The group forming unit classifies the characters that may be input into a plurality of hierarchical groups, each made up of characters with the same statistical properties. The group appearance probability calculating unit calculates the appearance probability of each group.

[0049] The in-group character appearance probability calculating unit calculates the appearance probability of the input character within the plurality of groups. The input character encoding unit encodes the input character based on the appearance probability calculated by the in-group character appearance probability calculating unit.

[0050]

[Operation]

<Operation of the first data compression method of the present invention> First, in the group forming step, the characters that may be input are classified into a plurality of hierarchical groups, each made up of characters with the same statistical properties. In the group appearance probability calculating step, the appearance probability of each group is calculated. In the in-group character appearance probability calculating step, the appearance probability of the input character within the plurality of groups is calculated. Finally, in the input character encoding step, the input character is encoded based on the appearance probability calculated in the in-group character appearance probability calculating step.

[0051] <Operation of the second data compression method of the present invention> In the operation of the first data compression method, the classification of the members of each group is fixed and given in advance in the group forming step.

[0052] <Operation of the third data compression method of the present invention> In the operation of the first data compression method, the appearance probability of each group is fixed and given in advance in the group appearance probability calculating step.

[0053] <Operation of the fourth data compression method of the present invention> In the operation of the first data compression method, the group appearance probability calculating step sets an initial value for the appearance probability of each group in advance and dynamically recalculates that probability as characters are input.

[0054] <Operation of the fifth data compression method of the present invention> In the operation of the first data compression method, the group appearance probability calculating step calculates the appearance probability of a group as a conditional group appearance probability, conditioned on the appearance of the groups to which the immediately preceding characters belong.

[0055] <Operation of the sixth data compression method of the present invention> In the operation of the first data compression method, the group forming step forms the plurality of hierarchical groups from a first group consisting of high-appearance-probability characters and a second group consisting of low-appearance-probability characters.

[0056] <Operation of the data compression device of the present invention> First, the group forming unit classifies the characters that may be input into a plurality of hierarchical groups, each made up of characters with the same statistical properties. The group appearance probability calculating unit calculates the appearance probability of each group. The in-group character appearance probability calculating unit calculates the appearance probability of the input character within the plurality of groups. Finally, the input character encoding unit encodes the input character based on the appearance probability calculated by the in-group character appearance probability calculating unit.

[0057]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.
<Configuration of the embodiment>
FIG. 7 shows the configuration of the arithmetic coding device of this embodiment. As shown in the figure, the device comprises the following elements (a) to (c).
(a) A character group classification unit 10, which receives a character string and classifies each character in it into character group 1, character group 2, or character group 3. Character group 1 consists of hiragana; character group 2 consists of spaces, punctuation marks, and line-feed marks; character group 3 consists of other characters, for example kanji.
(b) A probability model creation unit 20, which receives the group number (1, 2, or 3) output by the character group classification unit 10 together with the character string, and outputs the character appearance frequency and the rank of the input character within its group.
(c) An encoding unit 30, which obtains the cumulative appearance frequency of a group from the group number of the character being encoded and encodes that interval, then obtains the cumulative appearance frequency of the character within that group and encodes that interval. The encoding unit 30 receives the "group number and group appearance frequency" from the character group classification unit 10 and the "character appearance frequency and rank of the input character in each group" from the probability model creation unit 20, and outputs an arithmetic code.

[0058] Elements (a) to (c) are described in detail below.
[Character group classification unit 10]
As shown in FIG. 8, the character group classification unit 10 consists of a group classification unit 11 and a group probability holding unit 12.

[0059] The group classification unit 11 receives a character string, classifies each character (also called a symbol) in it into character group 1, character group 2, or character group 3, and outputs the group number of that group. It has a correspondence table 11a that stores each symbol together with its group number. The group number stored in the correspondence table 11a is output to the probability model creation unit 20 and the encoding unit 30.

[0060] The group probability holding unit 12 receives the group number from the group classification unit 11 and outputs the appearance frequency of each character group. It has a correspondence table 12a that stores each group number together with the probability of that group. The group appearance probability stored in the correspondence table 12a is output to the encoding unit 30.

[0061] [Probability model creation unit 20]
The probability model creation unit 20 consists of a dictionary 21 and a counter 22. The dictionary 21 receives the character string together with the group number, supplied by the character group classification unit 10, of the group to which the input character belongs, and outputs the group-number rank (the character's appearance-frequency rank within its group). The dictionary 21 has a correspondence table 21a that stores, for each character group, each symbol together with its group-number rank. The group-number rank stored in the correspondence table 21a is output to the encoding unit 30.

[0062] The counter 22 receives the group-number rank from the dictionary 21 and outputs the character appearance probability. It has a correspondence table 22a that stores, for each character group, each in-group appearance-frequency rank together with the corresponding character appearance frequency.

[0063] [Encoding unit 30]
The encoding unit 30 consists of a table 31 and an arithmetic coding unit 32. The table 31 receives the "group number and group appearance probability" from the character group classification unit 10 and the "in-group character rank and in-group character appearance probability" from the probability model creation unit 20. The table 31 has a table 31a that stores each group number together with its cumulative appearance frequency, and a plurality of tables 31b, one per character group, that store each in-group character rank together with its cumulative appearance frequency.

[0064] The arithmetic coding unit 32 receives the cumulative appearance frequencies held in the table 31 and outputs an arithmetic code. The information on which character belongs to which group, and the information on the appearance frequency of each character group, are given initially according to appearance frequencies expected in advance. For example, as shown in FIG. 9, characters such as space (blank), E, and T are classified into the high-appearance character group, and characters such as H, D, and L are classified into the low-appearance character group. The appearance probability of each group is the sum of the individual appearance probabilities of the characters belonging to that group.
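The summation rule stated above can be sketched as follows. This is only an illustration: the function name `group_probabilities` and the per-character probabilities are assumptions made for the example and are not the values of FIG. 9.

```python
from collections import defaultdict

def group_probabilities(char_prob, group_of):
    """Sum per-character probabilities into per-group probabilities."""
    totals = defaultdict(float)
    for ch, p in char_prob.items():
        totals[group_of[ch]] += p
    return dict(totals)

# Hypothetical probabilities, chosen only for illustration (not from FIG. 9).
char_prob = {" ": 0.18, "E": 0.10, "T": 0.08, "H": 0.05, "D": 0.04, "L": 0.04}
group_of = {" ": "high", "E": "high", "T": "high",
            "H": "low", "D": "low", "L": "low"}

probs = group_probabilities(char_prob, group_of)
```

With these assumed inputs, the "high" group receives the sum of the probabilities of space, E, and T, and the "low" group receives the sum for H, D, and L.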

[0065] <Operation of the embodiment>
Next, the operation of the embodiment is described with reference to FIG. 10. First, in step 1001, the codeword to be arithmetically coded is initialized with upper end = 1, lower end = 0, and interval width = 1.0.

[0066] Here, the character group classification unit 10 initializes the group classification of the group classification unit 11 and the group probabilities of the group probability holding unit 12 based on appearance frequencies expected in advance. Initializing the group classification means supplying the members of each group, that is, which character belongs to which group. Initializing the group probabilities means supplying the appearance probabilities of the groups as initial values, for example group 1 : group 2 : group 3 = 3 : 5 : 1.

[0067] The probability model creation unit 20 then sorts the symbols into their groups, prepares a counter 22 for each symbol, and initializes it to 1. The probability model creation unit 20 also computes the group cumulative appearance frequencies by accumulation and, separately for each character group, computes the rank and cumulative appearance frequency of each symbol. Computing the group cumulative appearance frequencies by accumulation means, for example, adding up the appearance frequencies of groups 3 through M to obtain the cumulative appearance frequency of group 2.
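The accumulation described above, in which the cumulative appearance frequency of a group is the sum of the frequencies of all later groups, can be sketched as follows. The function name `cumulative_after` is an assumption for the example; the frequencies reuse the 3 : 5 : 1 ratio mentioned in paragraph [0066].

```python
def cumulative_after(freqs):
    """Return (cum, total), where cum[i] is the summed frequency of all
    entries that come after position i."""
    cum = [0] * len(freqs)
    running = 0
    for i in range(len(freqs) - 1, -1, -1):
        cum[i] = running
        running += freqs[i]
    return cum, running

# Group 1 : group 2 : group 3 = 3 : 5 : 1 (index 0 is group 1).
group_freq = [3, 5, 1]
cum, total = cumulative_after(group_freq)
# cum[1] is the cumulative frequency of group 2, i.e. the frequency of group 3.
```

The same routine could be applied to the characters of a single group once they are ordered by rank.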

[0068] Next, each time one character (call it "k") is input from the input character string (step 1002), the character group classification unit 10 searches the dictionary of the group classification unit 11 to determine the group (call it "K") to which the input character belongs (step 1003).

[0069] The probability model creation unit 20 then searches the dictionary 21 using the group determined in step 1003 and the input character, and outputs the appearance-frequency rank and the appearance frequency of each character in the group.

[0070] The encoding unit 30 then arithmetically codes the character group K using the character-group cumulative appearance frequencies (step 1004) and arithmetically codes the input character k (step 1005). The arithmetic coding of step 1004 is performed by:
(a) obtaining the upper and lower ends of the interval of the input character's group from the group number and the group cumulative appearance frequency;
(b) obtaining the upper and lower ends of the interval of the input character from the character's in-group appearance-frequency rank and the cumulative appearance frequency of that group; and
(c) outputting an arbitrary value in the interval as the code.
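The two-stage narrowing of the code interval, first by group and then by character within the group, can be sketched as follows. This is a simplified illustration using exact fractions rather than the finite-precision arithmetic a practical coder would need; the names `narrow` and `encode_symbol` and all numeric values are assumptions for the example.

```python
from fractions import Fraction

def narrow(low, high, cum, freq, total):
    """Select the sub-interval of [low, high) for an entry whose cumulative
    frequency is `cum` and whose own frequency is `freq`, out of `total`."""
    width = high - low
    return (low + width * Fraction(cum, total),
            low + width * Fraction(cum + freq, total))

def encode_symbol(low, high, group_entry, char_entry):
    """Narrow by the group first, then by the character inside the group."""
    low, high = narrow(low, high, *group_entry)   # interval of group K
    return narrow(low, high, *char_entry)         # interval of character k

# Hypothetical entries given as (cumulative frequency, frequency, total).
low, high = encode_symbol(Fraction(0), Fraction(1), (1, 5, 9), (2, 1, 4))
```

Any value inside the final interval identifies the group and the character, which corresponds to item (c) above.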

[0071] The counter 22 then increases the appearance frequency of the input character k by 1 (step 1006), and the dictionary of character group K is re-sorted in order of frequency (step 1007). Next, the appearance-frequency ranks and cumulative appearance frequencies are updated to reflect the character whose count increased by 1 (step 1008). Execution then repeats from step 1002.
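Steps 1006 to 1008 can be sketched for a single character group as follows. The function name `update_group` and the three-character dictionary are assumptions for the example; ties are kept in their previous order, which the source does not specify.

```python
def update_group(dictionary, counts, k):
    """Increment the count of k, re-sort the group's dictionary by
    descending count, and recompute cumulative frequencies."""
    counts[k] += 1                                   # step 1006
    dictionary.sort(key=lambda ch: -counts[ch])      # step 1007 (stable sort)
    cum, running = {}, 0                             # step 1008
    for ch in reversed(dictionary):
        cum[ch] = running        # summed count of all lower-ranked characters
        running += counts[ch]
    return cum

dictionary = ["a", "b", "c"]           # hypothetical group, counters start at 1
counts = {"a": 1, "b": 1, "c": 1}
update_group(dictionary, counts, "c")
cum = update_group(dictionary, counts, "c")
```

After "c" has been seen twice it moves to the top rank of its group and therefore receives the widest sub-interval within that group.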

[0072] [Operation of arithmetic coding using blending of orders -1 and 0]
Next, the operation of arithmetic coding using the blending method of orders -1 and 0 is described with reference to FIG. 11.

[0073] First, in step 1101, arithmetic coding is initialized by (a) preparing the character-group cumulative appearance frequencies, (b) setting the appearance frequency of every character to 0, (c) registering all characters of each character group as not-yet-appeared characters, and (d) setting the not-yet-appeared character probability prepared for each character group to 1.

[0074] Next, each time one character (call it "k") is input from the input character string (step 1102), the character group classification unit 10 searches the dictionary of the group classification unit 11 to determine the group (call it "K") to which the input character belongs (step 1103).

[0075] The encoding unit 30 then arithmetically codes the character group K using the character-group cumulative appearance frequencies (step 1104). It is then determined whether the character group K has appeared before (step 1105). If step 1105 determines that it has, the character k is arithmetically coded using the cumulative appearance frequencies of character group K (step 1106), the character k is counted (step 1107), and the dictionary is re-sorted in order of frequency (step 1108).

[0076] If, on the other hand, step 1105 determines that it has not appeared before, the not-yet-appeared character interval of character group K is arithmetically coded (step 1109), the character k is arithmetically coded (step 1110), and the character k is inserted into the dictionary of character group K and removed from the not-yet-appeared characters of character group K (step 1111). In step 1110, all not-yet-appeared characters of character group K are treated as equally probable.
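The handling of characters that have and have not yet appeared can be sketched for one character group as follows. This is a minimal sketch under stated assumptions: the class name `GroupModel` is hypothetical, the not-yet-appeared entry is modeled as a frequency of 1 shared equally by all unseen characters, and that entry is dropped once every character has appeared. It computes the in-group probability only and does not include the group probability.

```python
from fractions import Fraction

class GroupModel:
    def __init__(self, alphabet):
        self.counts = {}              # characters seen so far -> frequency
        self.unseen = set(alphabet)   # not-yet-appeared characters
        self.escape = 1               # frequency reserved for unseen characters

    def probability(self, k):
        """In-group probability of k under the current counts."""
        total = sum(self.counts.values()) + (self.escape if self.unseen else 0)
        if k in self.counts:
            return Fraction(self.counts[k], total)
        if k not in self.unseen:
            raise KeyError(k)
        # Unseen-character interval, split equally among unseen characters.
        return Fraction(self.escape, total) * Fraction(1, len(self.unseen))

    def update(self, k):
        """Move k out of the unseen set on first sight and count it."""
        if k in self.unseen:
            self.unseen.remove(k)
            self.counts[k] = 0
        self.counts[k] += 1

model = GroupModel("abcd")            # hypothetical four-character group
p_first = model.probability("a")
model.update("a")
p_seen = model.probability("a")
p_unseen = model.probability("b")
```

In this sketch the probabilities of all characters in the group always sum to 1, so the group's interval is fully used.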

[0077] After step 1108 or step 1111, the cumulative appearance frequencies of character group K are updated.
[Specific example of arithmetic coding in the encoding unit 30]
FIG. 12 shows a specific example of arithmetic coding in the encoding unit 30. In FIG. 12, hiragana form character group 1; spaces, punctuation marks, and line-feed marks form character group 2; and the remaining characters, such as digits, are treated together as a single group, character group 3. The appearance probability of hiragana is 0.52, and that of spaces, punctuation marks, and line-feed marks is 0.13. At the start of compression no character has yet appeared, so the appearance frequency of every character is 0.

[0078] In this situation a conventional method assumes that every character is equally likely and assigns code intervals of equal width. In this embodiment, as shown in FIG. 12(B), the width of each character-group interval is instead set according to the appearance probability of the group, and the characters belonging to a group share that group's interval equally. The character-group intervals are divided according to the group appearance probabilities described above (see FIG. 12(A)).

[0079] With the method of the present invention, which fixes the character-group intervals first and then the individual character intervals, wide code intervals can be given to characters with high appearance probability from the very start of compression, as shown in FIG. 12(B).
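The effect on the initial code intervals can be sketched as follows. The group probabilities 0.52 and 0.13 are taken from the text; the value 0.35 for character group 3 is assumed here as the remainder, and the group sizes are purely hypothetical. The function name `initial_widths` is likewise an assumption.

```python
from fractions import Fraction

def initial_widths(group_prob, group_size):
    """Interval width given to each character of a group before any
    character has appeared: the group's interval split equally."""
    return {g: group_prob[g] / group_size[g] for g in group_prob}

group_prob = {1: Fraction(52, 100),   # hiragana (from the text)
              2: Fraction(13, 100),   # spaces, punctuation, line feeds (from the text)
              3: Fraction(35, 100)}   # others (assumed remainder)
group_size = {1: 80, 2: 5, 3: 6000}   # hypothetical numbers of characters

grouped = initial_widths(group_prob, group_size)
uniform = Fraction(1, sum(group_size.values()))   # conventional equal-width interval
```

Under these assumed sizes, each hiragana character starts with a much wider interval than it would under equal division, while each character of group 3 starts with a narrower one.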

[0080] <Modifications of this embodiment>
In the embodiment above, the character-group appearance probabilities were fixed. Modifications in which they change dynamically are described next.

[0081] (1) A modification that dynamically changes the appearance probability of each individual group.
(2) A modification that takes the context of groups into account and dynamically changes conditional appearance probabilities.
FIG. 13 shows the first modification, in which the appearance probability of each individual group changes dynamically. The unit shown corresponds to the character group classification unit 10 in FIG. 7. It consists of a group classification unit 11, which indicates which character belongs to which group, and a group counter 13, which gives each character group an initial value and, each time a character is input, increases the appearance frequency of the group to which that character belongs by 1 and updates the group cumulative appearance frequencies.
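The behavior of the group counter 13 can be sketched as follows. The class name `GroupCounter`, its method names, and the initial values (again the 3 : 5 : 1 ratio of paragraph [0066]) are assumptions for the example.

```python
class GroupCounter:
    def __init__(self, initial):
        self.freq = dict(initial)     # group number -> appearance frequency

    def observe(self, group):
        """Count one more character belonging to `group`."""
        self.freq[group] += 1

    def cumulative(self):
        """Return (cum, total), where cum[g] is the summed frequency of all
        groups numbered higher than g."""
        cum, running = {}, 0
        for g in sorted(self.freq, reverse=True):
            cum[g] = running
            running += self.freq[g]
        return cum, running

counter = GroupCounter({1: 3, 2: 5, 3: 1})   # initial values as a prior
counter.observe(3)                           # a character of group 3 arrives
cum, total = counter.cumulative()
```

The initial values act as a starting estimate that the observed data gradually override.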

[0082] Its operation is described with reference to FIG. 14. First, in step 1401, initialization is performed: the character-group cumulative appearance frequencies are computed, the appearance frequency of every character is set to 1, and the cumulative appearance frequencies are computed for each character group.

[0083] Next, each time one character ("k") is input from the input character string (step 1402), the character group classification unit 10 searches the dictionary of the group classification unit 11 to determine the group (call it "K") to which the input character belongs (step 1403).

[0084] The encoding unit 30 then arithmetically codes the character group K using the character-group cumulative appearance frequencies (step 1404) and arithmetically codes the input character k (step 1405).

[0085] The appearance frequency of character k and the appearance frequency of character group K are then each increased by 1 (step 1406), and the dictionary of character group K is re-sorted in order of frequency (step 1407).

[0086] Next, the cumulative appearance frequencies of character group K are updated to reflect the increase made in step 1406 (step 1408).
Conditional appearance probabilities that take the context of groups into account can be obtained dynamically in the same way. The order-0 values are given as initial values. Conditional probabilities of order 1 and higher are obtained, as shown in FIG. 5, by counting the number of appearances at each node every time a character group passing through the nodes appears. Whereas each node of the tree was conventionally a symbol, in this embodiment each node of the tree is a group.

[0087] FIG. 15 shows the flow when a first-order conditional appearance probability is used for the group appearance frequency. First, the following (a) to (f) are performed as initialization (step 1601).
(a) Initialize the appearance frequency of each character group.
(b) Compute the character-group cumulative appearance frequencies.
(c) Set the appearance frequency of every character to 1.
(d) Compute the cumulative appearance frequencies for each character group.
(e) Hold the group number of the preceding character.
(f) Initialize the register R (the context), which holds the preceding group number.

[0088] Next, one character (call it k) is input (step 1602), and the character group (call it K) to which the input character k belongs is determined (step 1603).

[0089] The encoding unit 30 then arithmetically codes the conditional probability P(K|R), which means "appearance frequency of RK / appearance frequency of R". That is, the interval is divided according to the probability that each group follows R, and the sub-interval of group K is selected. The lower limit of each group's sub-interval is determined by the cumulative appearance frequency of the character groups that follow R (step 1604).

[0090] The conditional probability P(k|K) is then arithmetically coded, with the input character k being coded using the conditional cumulative appearance frequency CF(k|K) of the character group (step 1605).

[0091] The appearance frequencies C(k|K) and C(K|R) are then each increased by 1 (step 1606), and the dictionary of character group K is re-sorted according to the character appearance frequencies C(x|K) (step 1607).

[0092] The cumulative character appearance frequencies CF(x|K) of character group K are then updated, as are the group cumulative appearance frequencies CF(X|R) of the character groups that follow character group R (step 1608). The input character k is then set in register R (step 1609).
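The first-order conditional group model of steps 1604 to 1609 can be sketched as follows. This is a sketch under stated assumptions rather than the flow of FIG. 15 itself: the class name `ConditionalGroupModel` is hypothetical, every pair count starts at 1, the initial context is arbitrarily the first group, and the register is updated with the group K of the input character, in line with the description of register R as holding the preceding group number.

```python
from fractions import Fraction

class ConditionalGroupModel:
    def __init__(self, groups, initial=1):
        self.groups = list(groups)
        # pair_count[R][K]: how often group K has followed group R.
        self.pair_count = {r: {k: initial for k in self.groups}
                           for r in self.groups}
        self.prev = self.groups[0]    # register R (assumed initial context)

    def probability(self, K):
        """P(K|R) = count of R followed by K / count of R."""
        row = self.pair_count[self.prev]
        return Fraction(row[K], sum(row.values()))

    def update(self, K):
        """Count the transition R -> K, then make K the new context."""
        self.pair_count[self.prev][K] += 1
        self.prev = K

model = ConditionalGroupModel([1, 2, 3])
p_before = model.probability(2)   # P(2|1) with all counts equal
model.update(2)                   # group 2 follows group 1
model.update(1)                   # group 1 follows group 2
p_after = model.probability(2)    # P(2|1) after one observed 1 -> 2 transition
```

The in-group character model P(k|K) would be kept separately, for example with one frequency table per group as in the earlier steps.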

[0093] Thereafter, processing repeats from step 1602.
<Effects of the embodiment>
Next, the data compression effect of the embodiment is described with reference to FIG. 16.

[0094] FIG. 16(A) shows how the data compression rate varies with the size of the file to be compressed in three cases: this embodiment, a static coding method (semi-adaptive), and an adaptive coding method. The horizontal and vertical axes of FIG. 16(A) are the size of the file to be compressed and the data compression rate, respectively. Line 7a is this embodiment, line 7b is the static coding method, and line 7c is the adaptive coding method.

[0095] As FIG. 16(A) makes clear, the static coding method keeps a nearly constant data compression rate regardless of the size of the file to be compressed, and compresses the data best of the methods compared. With the adaptive coding method and with this embodiment, the compression rate becomes smaller (that is, the data are compressed better) as the file to be compressed grows, approaching the compression rate of the static coding method. The compression rate of this embodiment is always smaller than that of the adaptive coding method.

[0096] When the size of the file to be compressed is nearly 0, the difference in data compression rate between the adaptive coding method and the static coding method arises because the static coding method is given initial values for the appearance frequency of each character.

[0097] Likewise, when the size of the file to be compressed is nearly 0, the difference in data compression rate between the adaptive coding method and this embodiment arises because this embodiment is given initial values for the appearance frequency of each group.

[0098] Next, FIG. 16(B) compares how the file size after compression varies with the file size before compression in three cases: this embodiment, the static coding method (semi-adaptive), and the adaptive coding method. The case in which no coding is performed is also shown in FIG. 16(B) for reference. The horizontal and vertical axes of FIG. 16(B) are the file size before compression and the file size after compression, respectively. Line 7d is this embodiment, line 7e is the static coding method, line 7f is the adaptive coding method, and line 7g is the case in which no coding is performed.

[0099] As FIG. 16(B) makes clear, in every case where coding is performed, the growth of the file size after compression slows as the file size before compression increases. With the static coding method and with this embodiment, when the file is small, the file size after compression becomes larger than the file size before compression.

[0100] When the file size before compression is smaller than a predetermined value, the file size after compression decreases in the order of the static coding method, this embodiment, and the adaptive coding method. Once the file size before compression exceeds that predetermined value, the file size after compression decreases in the order of the adaptive coding method, the static coding method, and this embodiment.

[0101] When the file size before compression is nearly 0, the file size for the static coding method with auxiliary data added is not 0, because the initial-value information for the appearance frequency of each character is carried as auxiliary data.

[0102] Likewise, when the file size before compression is nearly 0, the file size for this embodiment with auxiliary data added is not 0, because the initial-value information for the appearance frequency of each group is carried as auxiliary data.

[0103]

[Effects of the invention] According to the first data compression method and the data compression apparatus of the present invention, characters are classified into a plurality of groups of characters sharing the same statistical property, and the appearance probability of each group is calculated, so that, compared with conventional methods, optimal code regions can be allocated from the initial stage. This is particularly effective when the number of possible characters is large and the file to be compressed is small. A conventional adaptive coding method needs an input sequence of some length to build its probability model, so its compression rate does not improve when the data to be compressed are small, whereas the present invention can achieve a sufficient compression rate.

[0104] According to the second and third data compression methods of the present invention, codes that follow the appearance probabilities of the data can be assigned in advance, so that, compared with the first data compression method, a high compression rate can be obtained even when the file is small.

[0105] According to the fourth data compression method of the present invention, the appearance frequencies are recalculated according to the input data, so compression gradually comes to be based on appearance frequencies that match the data. According to the fifth and sixth data compression methods of the present invention, a still higher compression rate is obtained by using conditional probabilities conditioned on the group to which the immediately preceding character belongs, or on the group to which the character before that belongs.

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明のデータ圧縮の原理図である。（Ａ）は
データ圧縮方法の原理図を示し、（Ｂ）はデータ圧縮装
置の原理図を示す。FIG. 1 is a principle diagram of data compression of the present invention. (A) shows a principle diagram of a data compression method, and (B) shows a principle diagram of a data compression device.

【図２】多値算術符号の原理を示す図である。（Ａ）
は、各文字の出現頻度を示している。（Ｂ）は、出現頻
度順の累積出現頻度を示している。（Ｃ）は、算術符号
化の原理を示している。FIG. 2 is a diagram showing the principle of multi-value arithmetic code. (A)
Indicates the appearance frequency of each character. (B) shows the cumulative appearance frequencies in order of appearance frequency. (C) shows the principle of arithmetic coding.

【図３】算術符号化の装置構成を示す図である。FIG. 3 is a diagram showing a device configuration of arithmetic coding.

【図４】従来の多値算術符号化のフローを示す図であ
る。FIG. 4 is a diagram showing a flow of conventional multilevel arithmetic coding.

【図５】文脈の木構造（２次の場合）を示す図である。FIG. 5 is a diagram showing a tree structure of context (secondary case).

【図６】従来の多値算術符号化（−１、０次のブレンデ
ィング）のフローを示す図である。FIG. 6 is a diagram showing a flow of conventional multi-value arithmetic coding (-1, 0th order blending).

【図７】実施例の装置構成の概略を示す図である。FIG. 7 is a diagram showing an outline of a device configuration of an example.

【図８】実施例の装置構成を詳細に示す図である。FIG. 8 is a diagram showing in detail the device configuration of the embodiment.

【図９】群分類と群出現確率を示す図である。FIG. 9 is a diagram showing group classification and group appearance probability.

【図１０】実施例の多値算術符号化のフローを示す図で
ある（その１）。FIG. 10 is a diagram showing a flow of multi-value arithmetic coding according to the embodiment (No. 1).

【図１１】実施例の多値算術符号化のフローを示す図で
ある（その２）。このフローは、−１、０次のブレンデ
ィングになっている。FIG. 11 is a diagram showing a flow of multilevel arithmetic encoding according to the embodiment (No. 2). This flow is -1, 0th order blending.

FIG. 12 is a diagram showing the character group appearance probabilities and the initial code interval. (A) shows the character group appearance probabilities. (B) shows the code interval at a point in the period during which no probability model is held.

FIG. 13 is a diagram showing the character group classification unit.

FIG. 14 is a diagram showing the flow of multi-value arithmetic coding in the embodiment (part 3).

FIG. 15 is a diagram showing the flow of multi-value arithmetic coding in the embodiment (part 4).

FIG. 16 is a diagram comparing the effects of conventional arithmetic coding and the arithmetic coding of the present embodiment. (A) shows how the data compression ratio changes as the size of the file to be compressed changes. (B) shows how the file size after compression changes as the file size before compression changes.
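For readers without access to the drawings, the multi-value arithmetic coding principle that FIG. 2 refers to (appearance frequencies, cumulative frequencies in order of frequency, and interval narrowing) can be sketched roughly as below. This is a minimal sketch using exact fractions rather than the fixed-precision arithmetic a real coder would use; the frequency values and function names are made up for illustration and are not taken from the figure.

```python
from fractions import Fraction


def cumulative_table(freq):
    """Map each symbol to its [low, high) share of the unit interval.

    Symbols are laid out in descending order of frequency, as in a
    cumulative frequency table sorted by appearance frequency.
    """
    total = sum(freq.values())
    table, running = {}, 0
    for sym, f in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        table[sym] = (Fraction(running, total), Fraction(running + f, total))
        running += f
    return table


def encode_interval(message, table):
    """Narrow [0, 1) once per symbol; any value in the result identifies the message."""
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = table[sym]
        low = low + width * s_low
        width = width * (s_high - s_low)
    return low, low + width


def decode(value, length, table):
    """Recover `length` symbols from a value inside the final interval."""
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in table.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)


freq = {"a": 4, "b": 2, "c": 1, "d": 1}  # made-up appearance frequencies
table = cumulative_table(freq)
low, high = encode_interval("abca", table)
```

The width of the final interval equals the product of the symbol probabilities, so frequent symbols shrink the interval less and cost fewer bits.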

[Explanation of Symbols]

S1: group configuration step
S2: group appearance probability calculation step
S3: in-group character appearance probability calculation step
M1: group configuration unit
M2: group appearance probability calculation unit
M3: in-group character appearance probability calculation unit
10: character group classification unit
11: group classification unit
12: group probability holding unit
13: group counter
20: probability model creation unit
21: dictionary
22: counter
30: encoding unit
31: table
32: arithmetic coding unit
40: arithmetic code unit

Claims

[Claims]

1. A data compression method for compressing data by encoding each input character into a variable-length code whose code length corresponds to the appearance probability of the character, the method comprising: a group configuration step of classifying the characters that may be input into a plurality of hierarchical groups, each consisting of characters having the same statistical properties; a group appearance probability calculation step of calculating the appearance probability of each of the groups; an in-group character appearance probability calculation step of calculating the appearance probability of the input character within the plurality of groups; and an input character encoding step of encoding the input character based on the appearance probability calculated in the in-group character appearance probability calculation step.

2. The data compression method according to claim 1, wherein, in the group configuration step, the classification of the constituent elements of the groups is fixed and given in advance.

3. The data compression method according to claim 1, wherein, in the group appearance probability calculation step, the appearance probabilities of the groups are fixed and given in advance.

4. The data compression method according to claim 1, wherein, in the group appearance probability calculation step, initial values are set in advance for the appearance probabilities of the groups, and the appearance probabilities of the groups are dynamically recalculated as the characters are input.

5. The data compression method according to claim 1, wherein, in the group appearance probability calculation step, the appearance probability of each group is calculated as a conditional group appearance probability conditioned on the appearance of the respective groups to which a plurality of immediately preceding characters belong.

6. The data compression method according to claim 1, wherein, in the group configuration step, the plurality of hierarchical groups consist of a first group composed of characters with high appearance probability and a second group composed of characters with low appearance probability.

7. A data compression device for compressing data by encoding each input character into a variable-length code whose code length corresponds to the appearance probability of the character, the device comprising: a group configuration unit that classifies the characters that may be input into a plurality of hierarchical groups, each consisting of characters having the same statistical properties; a group appearance probability calculation unit that calculates the appearance probability of each of the groups; an in-group character appearance probability calculation unit that calculates the appearance probability of the input character within the plurality of groups; and an input character encoding unit that encodes the input character based on the appearance probability calculated by the in-group character appearance probability calculation unit.
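As a rough illustration of the group-based probability model recited in the method and device claims above, the sketch below assigns each character the probability of its group multiplied by its probability within the group, and derives the ideal code length from that product. The two-group split, the preset group probabilities, and the assumption that characters within a group are equally likely are all hypothetical choices for this sketch, not values from the specification.

```python
import math

ALPHABET_SIZE = 256                    # assumed 8-bit character set
FREQUENT = set(" etaoinshr")           # hypothetical high-probability group
GROUP_PROB = {"frequent": 0.75, "rare": 0.25}   # hypothetical preset values
GROUP_SIZE = {
    "frequent": len(FREQUENT),
    "rare": ALPHABET_SIZE - len(FREQUENT),
}


def group_of(ch):
    """Classify a character into one of the two assumed groups."""
    return "frequent" if ch in FREQUENT else "rare"


def char_probability(ch):
    """P(ch) = P(group) * P(ch | group).

    With no per-character statistics held yet, this sketch treats the
    characters of a group as equally likely.
    """
    g = group_of(ch)
    return GROUP_PROB[g] / GROUP_SIZE[g]


def ideal_code_length(text):
    """Total ideal code length in bits, -log2 P(ch) summed over the text."""
    return sum(-math.log2(char_probability(ch)) for ch in text)


text = "the rain in spain"
grouped_bits = ideal_code_length(text)
flat_bits = len(text) * math.log2(ALPHABET_SIZE)   # 8 bits per character
```

For this short sample, `grouped_bits` comes out below `flat_bits` even though no per-character frequencies were collected, which is the situation the grouping is meant to address when the input is small.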