WO2010067897A1

WO2010067897A1 - Data compression device, data compression method, data compression program, and compressed data communication system

Info

Publication number: WO2010067897A1
Application number: PCT/JP2009/071014
Authority: WO
Inventors: 木下聡
Original assignee: 日本電気株式会社
Priority date: 2008-12-12
Filing date: 2009-12-10
Publication date: 2010-06-17
Also published as: JPWO2010067897A1

Abstract

Conventional data compression devices can lower the compression ratio because they use codes of inappropriate length without considering the diversity of the data. Disclosed is a data compression device including: a code length storage means which stores, for each number of fields, a code length that never decreases and increases at least once as the number of fields increases; and an allocation means which receives a number of fields (N: more than 1), acquires from the code length storage means the code length corresponding to a number of fields (n) not greater than N, and outputs a code of that code length.

Description

データ圧縮装置、データ圧縮方法、データ圧縮プログラム、および、圧縮データ通信システム Data compression apparatus, data compression method, data compression program, and compressed data communication system

　本発明は、データ圧縮装置、データ圧縮方法、データ圧縮プログラム、および、圧縮データ送受信システムに関する。 The present invention relates to a data compression device, a data compression method, a data compression program, and a compressed data transmission / reception system.

　特許文献１は、データストリームの文字列の各々に専用のコードを割り当て、当該コードを用いて、データストリームを圧縮する送信機が記載されている。ここで、当該コードは、文字列の辞書中におけるアドレスである。
　特許文献２は、入力文字列と一致する辞書中の部分列で、最も長い部分列（Ｓ）を検索するデータ圧縮復元装置を開示する。この装置は、部分列Ｓを、辞書中の位置に基づいて決定されるビット長の符号に変換して圧縮する。
　特許文献３は、複数のフィールドからなる元ファイルのレコードを入力して、その複数のフィールドをまとめて固定長フィールドに変更するファイル管理方法を開示する。
特開平８−２５１０３５号公報、特開平６−２０２８４４号公報、特開平１１−１５４１５５号公報
Patent Document 1 describes a transmitter that assigns a dedicated code to each character string of a data stream and compresses the data stream using those codes. Here, each code is the address of the character string in a dictionary.
Patent Document 2 discloses a data compression / decompression device that searches a dictionary for the longest substring (S) among those matching the input character string. The device compresses the substring S by converting it into a code whose bit length is determined by its position in the dictionary.
Patent Document 3 discloses a file management method that receives a record of an original file composed of a plurality of fields and merges those fields into a single fixed-length field.
JP-A-8-251035, JP-A-6-202844, JP-A-11-154155

　上記の文献の技術は、複数フィールドからなるデータの圧縮に際し、各フィールドに於けるデータの多様度を考慮した最適な長さの符号を使用して、当該データを圧縮することが出来ない。この結果、不適切な長さのコードを使用して、圧縮率を低下させるおそれがある。
　具体的に、特許文献１および特許文献２に記載の技術は、文字列をコードまたは符号で圧縮するに際し、当該文字列の多様度を考慮しない。また、特許文献３の方法は、フィールドの長さを基準に複数フィールドをまとめており、フィールド内データの多様度は考慮しない。
　本発明の目的は、上記課題を解決するための、データ圧縮装置、データ圧縮方法、データ圧縮プログラム、および、圧縮データ通信システムを提供することにある。
With the techniques of the above documents, data consisting of a plurality of fields cannot be compressed using codes of an optimum length that takes into account the diversity of the data in each field. As a result, codes of inappropriate length may be used, lowering the compression ratio.
Specifically, the techniques described in Patent Document 1 and Patent Document 2 do not consider the diversity of character strings when compressing them into codes. Further, the method of Patent Document 3 merges a plurality of fields on the basis of field length and does not consider the diversity of the data within the fields.
An object of the present invention is to provide a data compression device, a data compression method, a data compression program, and a compressed data communication system that solve the above problems.

　本発明の一実施形態のデータ圧縮装置は、フィールド数の各々に対応して、前記フィールド数の増加に伴って減少せずに少なくとも一度は増加する符号長の各々を格納する符号長記憶部と、フィールド数（Ｎ；複数）を入力して、前記符号長記憶部から前記Ｎ以下の前記フィールド数（ｎ）に対応する前記符号長を取得し、前記符号長の符号を出力する割り当て部を備える。
　本発明の一実施形態のデータ圧縮方法は、フィールド数の各々に対応して、前記フィールド数の増加に伴って減少せずに少なくとも一度は増加する符号長の各々を格納する符号長記憶部を準備する符号長記憶工程と、フィールド数（Ｎ；複数）を入力して、前記符号長記憶部から前記Ｎ以下の前記フィールド数（ｎ）に対応する前記符号長を取得し、前記符号長の符号を出力する割り当て工程を有する。
　本発明の一実施形態のデータ圧縮プログラムは、フィールド数の各々に対応して、前記フィールド数の増加に伴って減少せずに少なくとも一度は増加する符号長の各々を格納する符号長記憶部を備えるコンピュータに、フィールド数（Ｎ；複数）を入力して、前記符号長記憶部から前記Ｎ以下の前記フィールド数（ｎ）に対応する前記符号長を取得し、前記符号長の符号を出力する割り当て処理を実行させる。
A data compression apparatus according to an embodiment of the present invention includes: a code length storage unit that stores, for each number of fields, a code length that never decreases and increases at least once as the number of fields increases; and an allocation unit that receives a number of fields (N; a plurality), obtains from the code length storage unit the code length corresponding to a number of fields (n) not greater than N, and outputs a code of that code length.
A data compression method according to an embodiment of the present invention includes: a code length storage step of preparing a code length storage unit that stores, for each number of fields, a code length that never decreases and increases at least once as the number of fields increases; and an allocation step of receiving a number of fields (N; a plurality), obtaining from the code length storage unit the code length corresponding to a number of fields (n) not greater than N, and outputting a code of that code length.
A data compression program according to an embodiment of the present invention causes a computer, which includes a code length storage unit that stores, for each number of fields, a code length that never decreases and increases at least once as the number of fields increases, to execute an allocation process of receiving a number of fields (N; a plurality), obtaining from the code length storage unit the code length corresponding to a number of fields (n) not greater than N, and outputting a code of that code length.

　本発明は、データの多様度を反映した、圧縮率が高いデータ圧縮を可能とする。 The present invention enables data compression with a high compression ratio reflecting the diversity of data.

図１は、第１の実施形態にかかるデータ圧縮装置１０が圧縮するデータを示す。 FIG. 1 shows data to be compressed by the data compression apparatus 10 according to the first embodiment.
図２は、第１の実施形態のデータ圧縮装置１０の構成図である。 FIG. 2 is a configuration diagram of the data compression apparatus 10 according to the first embodiment.
図３は、符号長記憶部４０に格納されるデータを示す。 FIG. 3 shows data stored in the code length storage unit 40.
図４は、符号表５０に格納されるデータを示す。 FIG. 4 shows data stored in the code table 50.
図５は、ＤＢ管理部３０の動作フローチャートである。 FIG. 5 is an operation flowchart of the DB management unit 30.
図６Ａは、割り当て部３２の動作フローチャート（１／２）である。 FIG. 6A is an operation flowchart (1/2) of the allocation unit 32.
図６Ｂは、割り当て部３２の動作フローチャート（２／２）である。 FIG. 6B is an operation flowchart (2/2) of the allocation unit 32.
図７は、検索符号化部３３の動作フローチャートである。 FIG. 7 is an operation flowchart of the search encoding unit 33.
図８は、復号部３１の動作フローチャートである。 FIG. 8 is an operation flowchart of the decoding unit 31.
図９は、第２の実施形態のデータ圧縮装置１０が圧縮するデータを示す。 FIG. 9 shows data to be compressed by the data compression apparatus 10 according to the second embodiment.
図１０は、第２の実施形態の符号長記憶部４０に格納されるデータを示す。 FIG. 10 shows data stored in the code length storage unit 40 of the second embodiment.
図１１は、第３の実施形態のデータ圧縮装置１０の構成図である。 FIG. 11 is a configuration diagram of the data compression apparatus 10 according to the third embodiment.
図１２は、本実施形態のデータ圧縮装置１０が行うデータ圧縮の様子を示す。 FIG. 12 shows how the data compression apparatus 10 of this embodiment performs data compression.
図１３は、第４の実施形態のデータ圧縮装置１０の構成図である。 FIG. 13 is a configuration diagram of the data compression apparatus 10 according to the fourth embodiment.
図１４は、第５の実施形態のデータ圧縮装置１０が圧縮するデータを示す。 FIG. 14 shows data to be compressed by the data compression apparatus 10 according to the fifth embodiment.
図１５は、第５の実施形態の割り当て部３２の動作フローチャートである。 FIG. 15 is an operation flowchart of the allocation unit 32 according to the fifth embodiment.
図１６は、本発明のデータ圧縮装置１０の基本構成を示す。 FIG. 16 shows the basic configuration of the data compression apparatus 10 of the present invention.

　１０　データ圧縮装置
　１１　コンピュータ
　１２　コンテンツデータベース
　１３　圧縮データ受信装置
　１４　圧縮データ通信システム
　２０　格納データ
　２１　格納圧縮データ
　２２　格納コンテンツ
　２４　コンテンツデータ
　２５　コンテンツ圧縮データ
　２６　検索データ
　２７　検索圧縮データ
　２８　出力コンテンツ
　３０　ＤＢ管理部
　３１　復号部
　３２　割り当て部
　３３　検索符号化部
　３４　送信部
　３５　受信部
　３６　受信符号表格納部
　３８　圧縮解除プログラム
　３９　データ圧縮プログラム
　４０　符号長記憶部
　４１　記憶部エントリ
　４２　フィールド数
　４３　符号長
　５０　符号表
　５１　表エントリ
　５２　コード
　５３　ｎ値
　５４　ｎデータ
　５５　長さ表示
　５６　符号
DESCRIPTION OF SYMBOLS
10 Data compression apparatus
11 Computer
12 Content database
13 Compressed data receiving apparatus
14 Compressed data communication system
20 Stored data
21 Stored compressed data
22 Stored content
24 Content data
25 Content compressed data
26 Search data
27 Search compressed data
28 Output content
30 DB management unit
31 Decoding unit
32 Allocation unit
33 Search encoding unit
34 Transmission unit
35 Reception unit
36 Reception code table storage unit
38 Decompression program
39 Data compression program
40 Code length storage unit
41 Storage unit entry
42 Number of fields
43 Code length
50 Code table
51 Table entry
52 Code
53 n value
54 n data
55 Length display
56 Code

　図１は、第１の実施形態にかかるデータ圧縮装置１０が圧縮するデータを示す。圧縮されるデータは、複数（Ｎ個）のフィールドから構成される。各フィールドは区切り記号で区切られているものとする。図１によれば、データ圧縮装置１０は、先頭から第ｎ１フィールドまでのデータを圧縮するときは長さｌ１の圧縮符号（以降、符号）を用いる。同装置は、先頭から第ｎ２フィールドまでのデータを圧縮するときは長さｌ２の符号を用いる。同装置は、先頭から第Ｎフィールドまでのデータを圧縮するときは長さｌＮの符号を用いる。
　以降、先頭から第ｎフィールドまでのデータをｎデータと略記することがある。
　利用者は、各フィールドが取りうる値の多様度を考慮して、フィールド数に対応する符号の長さ（ｌ１、ｌ２、ｌＮ等）をデータ圧縮装置１０に指定することが出来る。例えば、同装置が扱う範囲内で、ｎ１データは比較的固定的で多様度が低いが、第ｎ１＋１フィールドから第ｎ２フィールド迄のデータは多様度が高い場合、利用者はｌ１に比較的小さく、ｌ２に大きく増加した値を指定できる。更に、例えば、第ｎ２＋１以降のデータの多様度が高いときは、利用者はｌ２に比べて大きく増加した値をｌＮとして指定できる。
　なお、フィールドが取りうる値の多様度が高いとは、当該フィールドに格納されうるデータのバリエーション数が多いことを意味する。反対に、フィールドが取りうる値の多様度が低いとは、当該フィールドに格納されうるデータのバリエーション数が少ないことを意味する。例えば、あるレコードの先頭フィールドは、和暦の元号２文字データの格納域であり、データのバリエーションは『昭和』か『平成』の２通りであるとする。一方、後続のフィールドは英文字２文字データの格納域であり、データのバリエーションは、５２ｘ５２通りであるとする。この場合、先頭フィールドの多様度は低いが、後続フィールドは多様度は高いと言っても良い。
　データは、例えば、ＥＰＣ（Ｅｌｅｃｔｒｏｎｉｃ　Ｐｒｏｄｕｃｔ　Ｃｏｄｅ）のコード体系で表現したＲＦＩＤ（Ｒａｄｉｏ　Ｆｒｅｑｕｅｎｃｙ　ＩＤｅｎｔｉｆｉｃａｔｉｏｎ）タグＩＤが考えられる。ＥＰＣは階層的な構造を持ち、１つのＥＰＣは、タグの種別コード、企業コード、商品コード、および個品を区別するシリアルコードという複数のフィールドから構成される。
　データは、例えば、ＵＲＬ（Ｕｎｉｆｏｒｍ　Ｒｅｓｏｕｒｃｅ　Ｌｏｃａｔｏｒ）であっても良い。ＵＲＬも複数のフィールドからなる階層的な構造を持つ。データはＥＰＣやＵＲＬに限られず、複数のフィールドから構成されるものであれば良い。
　図２は、第１の実施形態のデータ圧縮装置１０の構成図である。データ圧縮装置１０は、コンテンツデータベース１２と接続されている。コンテンツデータベース１２は、例えば、ＲＦＩＤのタグＩＤ対応に、当該タグが付されている商品等の情報をコンテンツとして格納している。
　同装置は、ＤＢ管理部３０、割り当て部３２、検索符号化部３３、復号部３１、符号長記憶部４０、符号表５０を包含する。符号長記憶部４０、符号表５０は図示しないメモリ等に配置される。
　ＤＢ管理部３０は、格納データ２０と格納コンテンツ２２を端末等から入力する。同部は、格納データ２０を圧縮した格納圧縮データ２１をインデックスとして格納コンテンツ２２をコンテンツデータベース１２に格納する。
　また、同部は検索データ２６を端末等から入力する。同部は、格納データ２０を圧縮した格納圧縮データ２１をインデックスとして用いて、コンテンツデータベース１２を検索し、出力コンテンツ２８を端末等に出力する。
　格納コンテンツ２２は、コンテンツデータ２４を包含していても良い。ＤＢ管理部３０は、コンテンツデータ２４を圧縮してコンテンツデータベース１２に格納する。圧縮されたコンテンツデータ２４は、コンテンツ圧縮データ２５である。
　格納データ２０、検索データ２６、コンテンツデータ２４が、図１のデータに該当する。なお、ＤＢ管理部３０は、その一部として市販のデータベースシステムを使用していても良い。
　割り当て部３２は、格納データ２０、または、コンテンツデータ２４を入力し、圧縮して、格納圧縮データ２１、または、コンテンツ圧縮データ２５を出力する。検索符号化部３３は、検索データ２６を入力し、圧縮して、検索圧縮データ２７を出力する。復号部３１は、コンテンツ圧縮データ２５を入力し、圧縮解除して、コンテンツデータ２４を出力する。
　ＤＢ管理部３０、割り当て部３２、検索符号化部３３、復号部３１は、ハードウェアで実現される。ＤＢ管理部３０、割り当て部３２、検索符号化部３３、復号部３１は、コンピュータ１１でもあるデータ圧縮装置１０の図示しないプロセッサが、図示しないメモリ上のデータ圧縮プログラム３９を実行することで実現されても良い。
　図３は、符号長記憶部４０に格納されるデータを示す。符号長記憶部４０は記憶部エントリ４１を複数格納する。各記憶部エントリ４１はフィールド数４２と符号長４３を対応させて記憶する。
　フィールド数４２は、例えば、１から格納データ２０（検索データ２６、コンテンツデータ２４も同じ）のフィールドの数の最大値（例えば６）までを格納する。フィールド数４２は、１から格納データ２０等のフィールドの数の最大値未満の数までを格納していても良い。
　符号長４３は、フィールド数４２の増加に伴って、順次増加する値（例えば、１から４まで。同じ値があっても良い）を格納する。符号長４３の単位は、例えばバイト長やビット長である。
　図４は、符号表５０に格納されるデータを示す。符号表５０は表エントリ５１を格納する。各表エントリ５１はコード５２、ｎ値５３、ｎデータ５４を対応させて記憶する。コード５２は長さ表示５５と符号５６を包含する。長さ表示５５は符号５６の長さを示す、例えば、２ビットのデータである。例えば、０１は符号５６が１バイトであること、１０は符号５６が２バイトであること、１１は符号５６が４バイトであることを示すこととする。（００は、後述するように、符号化されなかったことを示しても良い。）
　なお、符号５６に終端表示を付加して符号５６の長さを認識する場合、長さ表示５５は不要である。データ圧縮装置１０は、例えば、符号５６の値として２ビット連続する０は含めないようにして、２ビット連続する０を終端表示として使用することが出来る。
　図５は、ＤＢ管理部３０の動作フローチャートである。ＤＢ管理部３０は、端末等からコンテンツデータベース１２への格納要求を入力する（Ｓ１でＹ）と、同端末等から格納データ２０を入力して割り当て部３２に出力し、同部から格納圧縮データ２１を受信する（Ｓ２）。
　続いて、ＤＢ管理部３０は、同端末等から格納コンテンツ２２を入力し、格納コンテンツ２２がコンテンツデータ２４を包含していればこれを圧縮する（Ｓ３）。
　具体的に、同部は格納コンテンツ２２にコンテンツデータ２４が含まれているか否かを判断する。含まれていれば、同部は、コンテンツデータ２４を取り出して割り当て部３２に出力し、同部からコンテンツ圧縮データ２５を受信する。同部は、受信したコンテンツ圧縮データ２５で格納コンテンツ２２内のコンテンツデータ２４を置換する。
　コンテンツデータ２４が含まれているか否かの判断方法や格納コンテンツ２２内でのコンテンツデータ２４の位置取得方法は、予め定められているものとする。例えば、格納コンテンツ２２の特定エリアが、フラグやポインタを含んでいても良い。
　ＤＢ管理部３０は、格納圧縮データ２１を検索用のインデックス値として付して、格納コンテンツ２２をコンテンツデータベース１２に格納する（Ｓ４）。
　端末等からコンテンツデータベース１２への検索要求を入力する（Ｓ１でＮ）と、ＤＢ管理部３０は、同端末等から検索データ２６を入力して検索符号化部３３に出力し、同部から検索圧縮データ２７を受信する（Ｓ５）。
　続いて、ＤＢ管理部３０は、検索圧縮データ２７をキーとしてコンテンツデータベース１２を検索し、出力コンテンツ２８を読み込む（Ｓ６）。
　最後に同部は、出力コンテンツ２８がコンテンツ圧縮データ２５を包含していればこれを圧縮解除して、出力コンテンツ２８を端末等に出力する（Ｓ７）。
　具体的に、同部は出力コンテンツ２８にコンテンツ圧縮データ２５が含まれているか否かを判断する。含まれていれば、同部は、コンテンツ圧縮データ２５を取り出して復号部３１に出力し、同部からコンテンツデータ２４を受信する。同部は、受信したコンテンツデータ２４で出力コンテンツ２８内のコンテンツ圧縮データ２５を置換する。
　図６Ａ及び図６Ｂは、割り当て部３２の動作フローチャートである。なお、以下の説明は、入力データが格納データ２０である場合についてのものであるが、入力データがコンテンツデータ２４である場合も同じである。
　割り当て部３２は、入力した格納データ２０のフィールド数（Ｎ）をカウントし（Ｓ１１）、検索フィールド数（Ｌ）にＮを設定する（Ｓ１２）。格納データ２０のフィールド数が一律である場合、このカウントは不要である。また、Ｎは、パラメータ値として外部から与えられても良い。
　同部は、フィールド数４２がＬである記憶部エントリ４１を発見するため符号長記憶部４０を検索し（Ｓ１３）、発見できれば（Ｓ１４でＹ）、発見した記憶部エントリ４１から符号長４３を取得する（Ｓ１５）。
　その後同部は、格納データ２０の先頭から第Ｌフィールドまでのデータ（格納Ｌデータ）に既に割り当てられている符号５６を発見するため符号表５０を検索する。即ち、同部は、ｎ値５３がＬと同じ値、かつ、ｎデータ５４が格納Ｌデータと一致する表エントリ５１を探す（Ｓ１６）。
　発見できなければ（Ｓ１ＡでＮ）、同部は新たな符号を生成する。即ち、同部は、取得した符号長４３の長さを持ち、符号表５０に登録されていない符号を生成する（Ｓ１Ｂ）。具体的に同部は、例えば、符号長４３対応に既に生成済み符号５６の最大値を記憶しておき、生成時に１加算した値を新たな符号として出力しても良い。
　生成できると（Ｓ１ＣでＹ）、同部は符号表５０に新たな表エントリ５１を追加する。即ち同部は、生成した符号を符号５６に、その長さを長さ表示５５に、Ｌをｎ値５３に、格納Ｌデータをｎデータ５４に各々格納する（Ｓ１Ｄ）。
　同部は、格納データ２０中の格納Ｌデータを、追加した表エントリ５１のコード５２（長さ表示５５と符号５６）で置換して、格納圧縮データ２１を生成してＤＢ管理部３０に出力する（Ｓ１Ｅ）。なお、ＤＢ管理部３０が置換を行うこととして、割り当て部３２は、コード５２をＤＢ管理部３０に出力することとしても良い。
　ｎ値５３がＬと同じ値、かつ、ｎデータ５４が格納Ｌデータと一致する表エントリ５１を発見できた場合（Ｓ１ＡでＹ）、同部は、格納データ２０中の格納Ｌデータを、当該表エントリ５１のコード５２で置換して格納圧縮データ２１を生成する。同部は、生成した格納圧縮データ２１をＤＢ管理部３０に出力する（Ｓ１Ｆ）。
　フィールド数４２がＬである記憶部エントリ４１を発見できない（Ｓ１４でＮ）、または、新たな符号生成に失敗した場合（Ｓ１ＣでＮ）、割り当て部３２はＬを１減じる（Ｓ１７）。その後同部はＳ１３から再実行する。Ｌを１減じた結果０になれば（Ｓ１８でＮ）、同部は、格納データ２０の先頭に長さ表示５５の代わりの値（２ビットデータ００）を付して、格納圧縮データ２１としてＤＢ管理部３０に出力する（Ｓ１９）。このケースは、使用可能な符号が生成出来ず、圧縮出来なかった場合である。
　図７は、検索符号化部３３の動作フローチャートである。検索符号化部３３は、入力した検索データ２６のフィールド数（Ｎ）をカウントし（Ｓ２１）、検索フィールド数（Ｌ）にＮを設定する（Ｓ２２）。検索データ２６のフィールド数が一律である場合、このカウントは不要である。また、Ｎは、パラメータ値として外部から与えられても良い。
　同部は、フィールド数４２がＬである記憶部エントリ４１を発見するため符号長記憶部４０を検索する（Ｓ２３）。発見できれば（Ｓ２４でＹ）、同部は、検索データ２６の先頭から第Ｌフィールドまでのデータ（検索Ｌデータ）に既に割り当てられている符号５６を発見するため符号表５０を検索する。即ち、同部は、ｎ値５３がＬと同じ値、かつ、ｎデータ５４が検索Ｌデータと一致する表エントリ５１を探す（Ｓ２５）。
　発見できた場合（Ｓ２６でＹ）、同部は、検索データ２６中の検索Ｌデータを、当該表エントリ５１のコード５２で置換して、検索圧縮データ２７を生成してＤＢ管理部３０に出力する（Ｓ２７）。
　フィールド数４２がＬである記憶部エントリ４１を発見できない（Ｓ２４でＮ）、または、ｎ値５３がＬと同じ値かつｎデータ５４が検索Ｌデータと一致する表エントリ５１を発見できない場合（Ｓ２６でＮ）、検索符号化部３３はＬを１減ずる（Ｓ２８）。その後同部はＳ２３から再実行する。Ｌを１減じた結果０になれば（Ｓ２９でＮ）、同部は、検索データ２６の先頭に長さ表示５５の代わりの値（２ビットデータ００）を付して、検索圧縮データ２７としてＤＢ管理部３０に出力する（Ｓ２Ａ）。このケースは、圧縮できる検索Ｌデータが符号表５０に登録されていないため、圧縮出来なかった場合である。
　図８は、復号部３１の動作フローチャートである。復号部３１は、コンテンツ圧縮データ２５を入力して（Ｓ３１）、先頭２ビット（圧縮されていれば長さ表示が格納されている領域）が００であるか確認する（Ｓ３２）。
　００でない場合（Ｓ３２でＮ）、同部はコンテンツ圧縮データ２５からコードを取り出して（Ｓ３３）、当該コードと一致するコード５２を有する表エントリ５１を検索する（Ｓ３４）。同部は、コンテンツ圧縮データ２５のコードを、当該表エントリ５１のｎデータ５４で置換して、コンテンツデータ２４を生成して出力する（Ｓ３５）。
　００である場合（Ｓ３２でＹ）、同部は、コンテンツ圧縮データ２５の先頭２ビットの００を削除して、コンテンツデータ２４を生成して出力する（Ｓ３６）。
　上記の説明において、データ圧縮装置１０がデータの先頭から連続したｎフィールドを符号５６で圧縮する。データ圧縮装置１０は、データの後ろから連続したｎフィールドを符号５６で圧縮するようにしても良い。
　また、連続したフィールドの多様度を調整する（例えば、多様度の低いフィールドを連続させる）為に、ＤＢ管理部３０が、格納データ２０や検索データ２６のフィールドの前後関係を入れ替えてから、圧縮するようにしても良い。
　本実施形態のデータ圧縮装置１０は、格納データ２０等のフィールドの取りうる値の多様度に応じて、圧縮率が高いデータ圧縮を可能とする。その理由は、利用者が符号長記憶部４０に、フィールドの値の多様度に応じた適切な符号長４３を指定できるからである。
　また、本実施形態のデータ圧縮装置１０は、複数フィールドを包含するデータの圧縮率を高く維持できる。その理由は、複数のフィールドをまとめて一つの符号５６に圧縮するからである。
　さらに、本実施形態のデータ圧縮装置１０は、ある符号長４３の符号５６が使い切られたときでも、高い圧縮率を維持できる。その理由は、圧縮対象フィールド数を順次減じながらも、複数フィールドを当該複数フィールドの多様度に適した符号長４３の符号５６を用いるからである。
　図９は、第２の実施形態のデータ圧縮装置１０が圧縮するデータを示す。本実施形態のデータ圧縮装置１０は、コンテンツデータベース１２の検索に於いて、ワイルドカード指定の使用を可能とする。これを達成するために、本実施形態のデータ圧縮装置１０は、格納データ２０および検索データ２６の符号化する最大範囲を第ｐ−１フィールド（ｐ＜Ｎ）までに限定している。ワイルドカードは、圧縮されない第ｐフィールド以降で指定可能である。
　図１０は、第２の実施形態の符号長記憶部４０に格納されるデータを示す。符号表５０が格納するフィールド数４２の最大値はｐ−１となっている。その理由は、格納データ２０および検索データ２６を符号化する最大範囲を第ｐ−１フィールドまでに限定するためである。
　本実施形態のＤＢ管理部３０は、ワイルドカードを指定したコンテンツデータベース１２の検索を行う。ワイルドカードを指定した検索技術は公知であるため詳細は省略する。他の点に於いて、本実施形態のデータ圧縮装置１０は第１の実施形態と同じである。
　本実施形態のデータ圧縮装置１０は、柔軟なデータ検索を可能とする。その理由は、検索に於いて、ワイルドカード指定が使用できるからである。
　図１１は、第３の実施形態のデータ圧縮装置１０の構成図である。本実施形態のデータ圧縮装置１０は複数の符号長記憶部４０と複数の符号表５０を包含する。
　図１２は、本実施形態のデータ圧縮装置１０が行うデータ圧縮の様子を示す。本実施形態のデータ圧縮装置１０の割り当て部３２は、格納データ２０を複数のフィールド列（部分格納データ）に分割して、各々を異なる符号長記憶部４０と符号表５０を用いて圧縮する。
　例えば、割り当て部３２は、第１フィールドから第ｎ１−１フィールドまで（第１の部分格納データ）を第１の符号長記憶部４０と符号表５０を用いて圧縮する。同部は、第ｎ１フィールドから第ｎ２−１フィールドまで（第２の部分格納データ）を第２の符号長記憶部４０と符号表５０を用いて圧縮する。同部は、第ｎ２フィールドから第Ｎフィールドまで（第３の部分格納データ）を第２の符号長記憶部４０と符号表５０を用いて圧縮する。
　同装置は、複数の部分格納データを、同一の符号長記憶部４０と符号表５０を用いて圧縮しても良い。格納データ２０の分割数は３に限定されない。さらに、同部および検索符号化部３３は、各々、コンテンツデータ２４および検索データ２６も同様に圧縮する。
　なお、部分格納データの区切りや、各部分格納データと、符号長記憶部４０および符号表５０との対応付けは、例えば、予め固定的に定められているものとする。他の点に於いて、本実施形態のデータ圧縮装置１０は第１の実施形態と同じである。
　本実施形態のデータ圧縮装置１０は、柔軟なデータ圧縮が可能となる。その理由は、部分格納データ等に対して、それぞれ、適切な符号長記憶部４０を指定できるからである。
　図１３は、第４の実施形態のデータ圧縮装置１０の構成図である。本実施形態のデータ圧縮装置１０は圧縮データ通信システム１４の送信装置として機能する。
　本実施形態のデータ圧縮装置１０は、割り当て部３２、符号長記憶部４０、符号表５０、送信部３４を包含する。符号長記憶部４０、符号表５０は、第１の実施形態と同じである。
　割り当て部３２は、格納データ２０を端末等から入力し、格納圧縮データ２１を送信部３４に出力する。他の点に於いて、割り当て部３２は、第１の実施形態と同じである。
　送信部３４は、格納圧縮データ２１の生成過程で登録された符号表５０の内容を圧縮データ受信装置１３に送信する。その後、送信部３４は、格納圧縮データ２１を圧縮データ受信装置１３に送信する。
　割り当て部３２、送信部３４は、ハードウェアで実現される。割り当て部３２、送信部３４は、コンピュータ１１でもあるデータ圧縮装置１０の図示しないプロセッサが、図示しないメモリ上のデータ圧縮プログラム３９を実行することで実現されても良い。
　圧縮データ受信装置１３は、復号部３１、受信部３５、受信符号表格納部３６を備える。受信符号表格納部３６は図示しないメモリ等に配置される。
　受信部３５は、データ圧縮装置１０から符号表５０のデータを受信して受信符号表格納部３６内に、データ圧縮装置１０内と同じ内容の符号表５０を再現する。
　復号部３１は、データ圧縮装置１０から格納圧縮データ２１を受信して、圧縮解除を行って格納データ２０を、端末等に出力する。復号部３１は、コンテンツ圧縮データ２５、コンテンツデータ２４に代えて、格納圧縮データ２１、格納データ２０を扱う。他の点に於いて、復号部３１は第１の実施形態と同じである。
　復号部３１、受信部３５は、ハードウェアで実現される。復号部３１、受信部３５は、コンピュータ１１でもある圧縮データ受信装置１３の図示しないプロセッサが、図示しないメモリ上の圧縮解除プログラム３８を実行することで実現されても良い。
　本実施形態のデータ圧縮装置１０は、格納データ２０を効率よく送信できる。その理由は、格納データ２０を圧縮して送信するからである。
　本発明は、図１が示すようなデータ以外にも、フィールド数によってデータが特定され、多様度が予測可能なデータ一般に適用できる。
　図１４は、第５の実施形態のデータ圧縮装置１０が圧縮するデータを示す。図１４によればデータは多種存在する。データは、例えば、ｎ１個のフィールドから構成される第１種のデータ、ｎ２個のフィールドから構成される第２種のデータ、．．．．．Ｎ個のフィールドから構成される第Ｎ種のデータ等である。
　図１５は、第５の実施形態の割り当て部３２の動作フローチャートである。割り当て部３２は、入力した格納データ２０のフィールド数（Ｎ）をカウントする（Ｓ４１）。
　同部は、フィールド数４２がＮである記憶部エントリ４１を発見するため符号長記憶部４０を検索して（Ｓ４２）、発見した記憶部エントリ４１から符号長４３を取得する（Ｓ４３）。
　その後同部は、格納Ｎデータに既に割り当てられている符号５６を発見するため符号表５０を検索する。即ち、同部は、ｎ値５３がＮと同じ値、かつ、ｎデータ５４が格納Ｎデータと一致する表エントリ５１を探す（Ｓ４４）。
　発見できなければ（Ｓ４５でＮ）、同部は新たな符号を生成する。即ち、同部は、取得した符号長４３の長さを持ち、符号表５０に登録されていない符号を生成する（Ｓ４６）。生成できると（Ｓ４７でＹ）、同部は符号表５０に新たな表エントリ５１を追加する。即ち同部は、生成した符号を符号５６に、その長さを長さ表示５５に、Ｎをｎ値５３に、格納Ｎデータをｎデータ５４に各々格納する（Ｓ４８）。最後に同部は、生成した符号を格納圧縮データ２１として出力する（Ｓ４９）。
　新たな符号の生成が出来ないと（Ｓ４７でＮ）、同部はエラーリターンする（Ｓ４Ａ）。
　ｎ値５３がＮと同じ値、かつ、ｎデータ５４が格納Ｎデータと一致する表エントリ５１が発見出来ると（Ｓ４５でＹ）、同部は発見した表エントリ５１から符号５６を取得して格納圧縮データ２１として出力する（Ｓ４Ｂ）。
　本実施形態のデータ圧縮装置１０は、幅広い格納データ２０の圧縮が可能である。
　図１６は、本発明のデータ圧縮装置１０の基本構成を示す。データ圧縮装置１０は、符号長記憶部４０と割り当て部３２を備える。
　符号長記憶部４０は、フィールド数の各々に対応して、フィールド数４２の増加に伴って減少せずに少なくとも一度は増加する符号長４３の各々を格納する。割り当て部３２は、フィールド数（Ｎ；複数）を入力して、符号長記憶部４０からＮ以下のフィールド数４２（ｎ）に対応する符号長４３を取得し、当該符号長４３の符号５６を出力する。
　以上、実施形態を参照して本願発明を説明した。しかし、本願発明は、上記の実施形態に限定されるものではない。本願発明の構成や詳細には、本願発明のスコープ内で当業者が理解しうる様々な変更をすることができる。
　この出願は、２００８年１２月１２日に出願された日本出願特願２００８−３１６６９５を基礎とする優先権を主張し、その開示の全てをここに取り込む。
FIG. 1 shows data to be compressed by the data compression apparatus 10 according to the first embodiment. The data to be compressed consists of a plurality (N) of fields, which are assumed to be separated by delimiters. According to FIG. 1, the data compression apparatus 10 uses a compression code (hereinafter, a code) of length l1 when compressing the data from the head to the n1-th field, a code of length l2 when compressing the data from the head to the n2-th field, and a code of length lN when compressing the data from the head to the N-th field.
Hereinafter, the data from the head to the n-th field may be abbreviated as n data.
The user can specify to the data compression apparatus 10 the code lengths (l1, l2, lN, etc.) corresponding to the numbers of fields, taking into account the diversity of values that each field can take. For example, if, within the range handled by the apparatus, the n1 data is relatively fixed and low in diversity while the data from the (n1+1)-th field to the n2-th field is high in diversity, the user can specify a relatively small value for l1 and a much larger value for l2. Further, if, for example, the data from the (n2+1)-th field onward is high in diversity, the user can specify for lN a value much larger than l2.
Here, a field has high diversity when many variations of data can be stored in it, and low diversity when only a few can. For example, suppose the first field of a record stores a two-character Japanese era name and has only two variations, "Showa" or "Heisei". Suppose further that the subsequent field stores two English letters and has 52 × 52 variations. In this case, the diversity of the first field is low and that of the subsequent field is high.
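As a rough illustration of why diversity drives the choice of code length, the following Python sketch computes the smallest number of bits that can give every variation of a field a distinct code. The function name `min_code_bits` is hypothetical and is not part of the disclosed apparatus; the counts 2 and 52 × 52 come from the era-name and two-letter example.

```python
def min_code_bits(variations: int) -> int:
    """Smallest number of bits that gives each variation a distinct code."""
    if variations < 1:
        raise ValueError("variations must be positive")
    # (variations - 1).bit_length() equals ceil(log2(variations)) for variations >= 1.
    return max(1, (variations - 1).bit_length())


era_bits = min_code_bits(2)                  # first field: two era names
letter_bits = min_code_bits(52 * 52)         # second field: two English letters
combined_bits = min_code_bits(2 * 52 * 52)   # both fields compressed together

print(era_bits, letter_bits, combined_bits)
```

Under these assumptions the low-diversity field needs far fewer bits than the high-diversity one, which is the motivation for letting the code length grow with the number of fields covered.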
The data may be, for example, an RFID (Radio Frequency IDentification) tag ID expressed in an EPC (Electronic Product Code) code system. The EPC has a hierarchical structure, and one EPC is composed of a plurality of fields such as a tag type code, a company code, a product code, and a serial code for distinguishing individual items.
The data may be, for example, a URL (Uniform Resource Locator). The URL also has a hierarchical structure composed of a plurality of fields. The data is not limited to EPC or URL, but may be any data as long as it is composed of a plurality of fields.
FIG. 2 is a configuration diagram of the data compression apparatus 10 according to the first embodiment. The data compression apparatus 10 is connected to the content database 12. The content database 12 stores as content, for example, information on the product or the like to which an RFID tag is attached, in association with the tag ID.
The apparatus includes a DB management unit 30, an allocation unit 32, a search encoding unit 33, a decoding unit 31, a code length storage unit 40, and a code table 50. The code length storage unit 40 and the code table 50 are placed in a memory or the like (not shown).
The DB management unit 30 receives the stored data 20 and the stored content 22 from a terminal or the like. It stores the stored content 22 in the content database 12, using as the index the stored compressed data 21 obtained by compressing the stored data 20.
The unit also receives the search data 26 from a terminal or the like. It searches the content database 12, using as the index the stored compressed data 21 obtained by compressing the stored data 20, and outputs the output content 28 to the terminal or the like.
The stored content 22 may include content data 24. The DB management unit 30 compresses the content data 24 before storing it in the content database 12. The compressed content data 24 is referred to as content compressed data 25.
The stored data 20, the search data 26, and the content data 24 correspond to the data of FIG. 1. The DB management unit 30 may use a commercially available database system as part of its implementation.
The allocation unit 32 receives the stored data 20 or the content data 24, compresses it, and outputs the stored compressed data 21 or the content compressed data 25, respectively. The search encoding unit 33 receives the search data 26, compresses it, and outputs the search compressed data 27. The decoding unit 31 receives the content compressed data 25, decompresses it, and outputs the content data 24.
The DB management unit 30, the allocation unit 32, the search encoding unit 33, and the decoding unit 31 are realized by hardware. Alternatively, they may be realized by a processor (not shown) of the data compression apparatus 10, which is also a computer 11, executing a data compression program 39 in a memory (not shown).
FIG. 3 shows data stored in the code length storage unit 40. The code length storage unit 40 stores a plurality of storage unit entries 41. Each storage unit entry 41 stores the number of fields 42 and the code length 43 in association with each other.
The number of fields 42 takes values from, for example, 1 up to the maximum number of fields (for example, 6) of the stored data 20 (the same applies to the search data 26 and the content data 24). The number of fields 42 may instead range from 1 to a number smaller than that maximum.
The code length 43 stores values that increase as the number of fields 42 increases (for example, from 1 to 4; the same value may appear more than once). The unit of the code length 43 is, for example, bytes or bits.
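The constraint on the contents of the code length storage unit 40 (code lengths that never decrease and increase at least once as the number of fields grows) can be sketched as a simple check. The function name and the table values below are illustrative assumptions; they are not the actual contents of FIG. 3.

```python
def is_valid_code_length_table(table: dict) -> bool:
    """True if code lengths never decrease and increase at least once
    when the entries are ordered by number of fields."""
    lengths = [table[n] for n in sorted(table)]
    pairs = list(zip(lengths, lengths[1:]))
    never_decreases = all(a <= b for a, b in pairs)
    increases_once = any(a < b for a, b in pairs)
    return never_decreases and increases_once


# Hypothetical table: number of fields 42 -> code length 43 (in bytes).
CODE_LENGTHS = {1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4}

print(is_valid_code_length_table(CODE_LENGTHS))
```

A table in which every entry has the same code length, or in which a larger number of fields maps to a shorter code, would fail this check.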
FIG. 4 shows data stored in the code table 50. The code table 50 stores table entries 51. Each table entry 51 stores a code 52, an n value 53, and n data 54 in association with one another. The code 52 consists of a length display 55 and a code 56. The length display 55 is, for example, 2-bit data indicating the length of the code 56: for example, 01 indicates that the code 56 is 1 byte long, 10 that it is 2 bytes long, and 11 that it is 4 bytes long. (00 may indicate that the data was not encoded, as described later.)
When a terminator is appended to the code 56 so that its length can be recognized, the length display 55 is unnecessary. For example, the data compression apparatus 10 can exclude two consecutive 0 bits from the values of the code 56 and use two consecutive 0 bits as the terminator.
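One possible way to form and parse a code 52 with a 2-bit length display 55 is sketched below. The representation as a string of '0' and '1' characters, the placement of the length display immediately before the code 56, and the names `make_code` and `split_code` are assumptions made for illustration; the text fixes only the example meanings of 01, 10, and 11.

```python
# Code length in bytes -> 2-bit length display 55 (example values from the text).
LENGTH_DISPLAY = {1: "01", 2: "10", 4: "11"}
DISPLAY_TO_BYTES = {bits: n for n, bits in LENGTH_DISPLAY.items()}


def make_code(value: int, length_bytes: int) -> str:
    """Build code 52 as a bit string: length display 55 followed by code 56."""
    if not 0 <= value < 1 << (8 * length_bytes):
        raise ValueError("value does not fit in the requested code length")
    return LENGTH_DISPLAY[length_bytes] + format(value, f"0{8 * length_bytes}b")


def split_code(bits: str):
    """Return (length in bytes, value of code 56, remaining bits)."""
    length_bytes = DISPLAY_TO_BYTES[bits[:2]]
    end = 2 + 8 * length_bytes
    return length_bytes, int(bits[2:end], 2), bits[end:]


code = make_code(5, 1)
print(code)                      # length display "01" followed by 8 code bits
print(split_code(code + "111"))  # trailing bits are returned untouched
```

Because the length display is read first, the parser knows how many bits belong to the code 56 without needing a terminator.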
FIG. 5 is an operation flowchart of the DB management unit 30. When a request to store into the content database 12 is received from a terminal or the like (Y in S1), the DB management unit 30 receives the stored data 20 from the terminal, outputs it to the allocation unit 32, and receives the stored compressed data 21 from that unit (S2).
Subsequently, the DB management unit 30 receives the stored content 22 from the terminal or the like and, if the stored content 22 includes content data 24, compresses that content data (S3).
Specifically, the unit determines whether the stored content 22 includes content data 24. If so, it extracts the content data 24, outputs it to the allocation unit 32, and receives the content compressed data 25 from that unit. It then replaces the content data 24 in the stored content 22 with the received content compressed data 25.
The method for determining whether the content data 24 is included, and the method for obtaining its position within the stored content 22, are assumed to be determined in advance. For example, a specific area of the stored content 22 may contain a flag or a pointer.
The DB management unit 30 attaches the stored compressed data 21 as a search index value and stores the stored content 22 in the content database 12 (S4).
When a search request for the content database 12 is received from a terminal or the like (N in S1), the DB management unit 30 receives the search data 26 from the terminal, outputs it to the search encoding unit 33, and receives the search compressed data 27 from that unit (S5).
Subsequently, the DB management unit 30 searches the content database 12 using the search compressed data 27 as a key, and reads the output content 28 (S6).
Finally, if the output content 28 includes content compressed data 25, the unit decompresses it, and then outputs the output content 28 to the terminal or the like (S7).
Specifically, the unit determines whether the output content 28 includes content compressed data 25. If so, it extracts the content compressed data 25, outputs it to the decoding unit 31, and receives the content data 24 from that unit. It then replaces the content compressed data 25 in the output content 28 with the received content data 24.
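The store and search paths of FIG. 5 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class `DBManager`, the in-memory dictionary standing in for the content database 12, and the toy functions standing in for the allocation unit 32 and the search encoding unit 33 are all assumptions, and the optional compression of content data 24 (S3, S7) is omitted.

```python
from typing import Callable, Optional


class DBManager:
    """Sketch of the DB management unit 30: compressed keys index the content."""

    def __init__(self, allocate: Callable[[str], str],
                 search_encode: Callable[[str], str]) -> None:
        self.allocate = allocate
        self.search_encode = search_encode
        self.content_db = {}  # stands in for the content database 12

    def store(self, stored_data: str, stored_content: str) -> None:
        key = self.allocate(stored_data)        # S2: obtain stored compressed data
        self.content_db[key] = stored_content   # S4: store content under that index

    def search(self, search_data: str) -> Optional[str]:
        key = self.search_encode(search_data)   # S5: obtain search compressed data
        return self.content_db.get(key)         # S6: look up the output content


# Toy stand-ins for units 32 and 33 (hypothetical, for demonstration only).
_codes = {}


def toy_allocate(data: str) -> str:
    """Assign a new short key on first sight, reuse it afterwards."""
    return _codes.setdefault(data, f"#{len(_codes)}")


def toy_search_encode(data: str) -> str:
    """Look up an existing key; never create one."""
    return _codes.get(data, data)


db = DBManager(toy_allocate, toy_search_encode)
db.store("urn:epc:example.1.2", "widget")
print(db.search("urn:epc:example.1.2"))
```

The point of the sketch is that storing and searching must map the same input to the same compressed key, which is why the search path only looks up existing codes.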
FIGS. 6A and 6B are operation flowcharts of the allocation unit 32. The following description covers the case where the input data is the stored data 20; the same applies when the input data is the content data 24.
The allocation unit 32 counts the number of fields (N) of the input stored data 20 (S11) and sets the search field count (L) to N (S12). If the number of fields of the stored data 20 is uniform, this count is unnecessary. N may also be given from outside as a parameter value.
The unit searches the code length storage unit 40 for a storage unit entry 41 whose number of fields 42 is L (S13) and, if one is found (Y in S14), obtains the code length 43 from that entry (S15).
The unit then searches the code table 50 for a code 56 already assigned to the data from the head of the stored data 20 to the L-th field (stored L data). That is, it looks for a table entry 51 whose n value 53 equals L and whose n data 54 matches the stored L data (S16).
If no such entry is found (N in S1A), the unit generates a new code: a code that has the obtained code length 43 and is not yet registered in the code table 50 (S1B). Specifically, the unit may, for example, keep the maximum value of the codes 56 already generated for each code length 43 and output that value plus 1 as the new code.
If the code can be generated (Y in S1C), the unit adds a new table entry 51 to the code table 50, storing the generated code in the code 56, its length in the length display 55, L in the n value 53, and the stored L data in the n data 54 (S1D).
The unit replaces the stored L data in the stored data 20 with the code 52 (the length display 55 and the code 56) of the added table entry 51, thereby generating the stored compressed data 21, and outputs it to the DB management unit 30 (S1E). Alternatively, the allocation unit 32 may output only the code 52 to the DB management unit 30 and leave the replacement to the DB management unit 30.
If a table entry 51 whose n value 53 equals L and whose n data 54 matches the stored L data is found (Y in S1A), the unit replaces the stored L data in the stored data 20 with the code 52 of that table entry 51 to generate the stored compressed data 21, and outputs the generated stored compressed data 21 to the DB management unit 30 (S1F).
If no storage unit entry 41 whose number of fields 42 is L is found (N in S14), or if generation of a new code fails (N in S1C), the allocation unit 32 decrements L by 1 (S17) and re-executes from S13. If L reaches 0 as a result of the decrement (N in S18), the unit prefixes the stored data 20 with a value substituting for the length display 55 (the 2-bit data 00) and outputs the result to the DB management unit 30 as the stored compressed data 21 (S19). This is the case where no usable code could be generated and the data could not be compressed.
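The procedure of FIGS. 6A and 6B can be sketched in Python as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the class name `Allocator`, the representation of data as a list of field strings, the use of bits as the unit of code length, the per-length counter used to generate new codes, and the return value `(code, remaining fields)` are all choices made for the example. A returned code of `None` corresponds to the uncompressed case marked with 00.

```python
from dataclasses import dataclass, field


@dataclass
class Allocator:
    """Sketch of the allocation unit 32 (steps S11 to S19)."""
    code_lengths: dict                               # number of fields 42 -> code length 43 (bits)
    code_table: dict = field(default_factory=dict)   # (n value, n data) -> (length, code value)
    next_value: dict = field(default_factory=dict)   # code length -> next unused code value

    def _new_code(self, length):
        """S1B: return an unused code value of this length, or None when exhausted."""
        value = self.next_value.get(length, 0)
        if value >= 1 << length:
            return None
        self.next_value[length] = value + 1
        return value

    def allocate(self, fields):
        """Return (code, remaining fields); code is None if nothing was compressed."""
        for count in range(len(fields), 0, -1):       # S12, then S17 on each retry
            length = self.code_lengths.get(count)     # S13 to S15
            if length is None:
                continue                              # N in S14
            key = (count, tuple(fields[:count]))
            code = self.code_table.get(key)           # S16
            if code is None:
                value = self._new_code(length)
                if value is None:
                    continue                          # N in S1C
                code = (length, value)
                self.code_table[key] = code           # S1D
            return code, list(fields[count:])         # S1E or S1F
        return None, list(fields)                     # S19


allocator = Allocator({1: 1, 2: 1, 3: 2})
print(allocator.allocate(["a", "b", "c"]))
```

In this sketch the counter is kept per code length rather than per number of fields, so two entries with the same code length never share a code value; that keeps decoding by code alone unambiguous.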
FIG. 7 is an operation flowchart of the search encoding unit 33. The search encoding unit 33 counts the number of fields (N) of the input search data 26 (S21) and sets the number of search fields (L) to N (S22). If the number of fields in the search data 26 is fixed, this count is unnecessary, and N may instead be given from the outside as a parameter.
The search encoding unit 33 searches the code length storage unit 40 for a storage unit entry 41 whose field number 42 is L (S23). If one is found (Y in S24), the unit searches the code table 50 for the code 56 already assigned to the data from the head of the search data 26 through the L-th field (search L data). That is, it searches for a table entry 51 whose n value 53 equals L and whose n data 54 matches the search L data (S25).
If such an entry is found (Y in S26), the unit replaces the search L data in the search data 26 with the code 52 of that table entry 51 to generate the search compressed data 27, and outputs it to the DB management unit 30 (S27).
When no storage unit entry 41 whose field number 42 is L is found (N in S24), or when no table entry 51 whose n value 53 equals L and whose n data 54 matches the search L data is found (N in S26), the search encoding unit 33 decrements L by 1 (S28) and re-executes from S23. If L becomes 0 as a result of the decrement (N in S29), the unit attaches the 2-bit value 00 to the head of the search data 26 in place of the length display 55 and outputs the result to the DB management unit 30 as the search compressed data 27 (S2A). This is the case where no compressible search L data is registered in the code table 50, so the search data cannot be compressed.
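The search-side procedure differs from allocation only in that it never creates a new code. A minimal sketch under the same illustrative assumptions (hypothetical function name `encode_search`, dictionary-based tables, codes as bit strings) could look like this:

```python
def encode_search(fields, code_lengths, code_table):
    """Replace the longest registered prefix of the search data with its code.

    code_lengths: {field count n: code length in bits}   (assumed layout)
    code_table:   {(n, tuple of first n fields): code}   (assumed layout)
    Returns (code, remaining fields); code is None when no prefix is registered.
    """
    for n in range(len(fields), 0, -1):              # S28/S29: L, L-1, ..., 1
        if n not in code_lengths:                    # N in S24
            continue
        code = code_table.get((n, tuple(fields[:n])))  # S25
        if code is not None:                         # Y in S26
            return code, list(fields[n:])            # S27
    return None, list(fields)                        # S2A: output uncompressed
```

Because the table is only read here, search data whose prefix was never stored passes through uncompressed and therefore cannot match any compressed index.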
FIG. 8 is an operation flowchart of the decoding unit 31. The decoding unit 31 receives the content compressed data 25 (S31) and checks whether its first 2 bits (the area that holds the length display 55 when the data is compressed) are 00 (S32).
If they are not 00 (N in S32), the decoding unit 31 extracts the code from the content compressed data 25 (S33) and searches for the table entry 51 whose code 52 matches the extracted code (S34). The unit then replaces the code in the content compressed data 25 with the n data 54 of that table entry 51 to generate and output the content data 24 (S35).
If they are 00 (Y in S32), the unit deletes the leading 2 bits 00 from the content compressed data 25 to generate and output the content data 24 (S36).
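The decoding steps can be sketched as follows. The record layout, a pair of a code string (whose first two characters stand for the length display) and the list of uncoded fields, and the name `decode` are assumptions for illustration; the actual bit layout is defined by the figures.

```python
def decode(compressed, table_by_code):
    """Restore the original fields from one compressed record.

    compressed:    (code string, remaining uncoded fields)   (assumed layout)
    table_by_code: {code string: tuple of the n data}        (assumed layout)
    """
    code, rest = compressed
    if code[:2] == "00":               # Y in S32: the record was not compressed
        return list(rest)              # S36: drop the 00 marker only
    n_data = table_by_code[code]       # S33/S34: look up the matching entry
    return list(n_data) + list(rest)   # S35: put the n data back in place
```

A `KeyError` here would mean the code was never registered, a situation the flowchart does not address.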
In the above description, the data compression apparatus 10 compresses the n consecutive fields from the head of the data into the code 56. The data compression apparatus 10 may instead compress the n consecutive fields from the tail of the data into the code 56.
Further, in order to adjust the diversity of the consecutive fields (for example, so that fields with low diversity are adjacent), the DB management unit 30 may change the order of the fields of the stored data 20 and the search data 26 before they are compressed.
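The text does not say how the new field order is chosen. One possible heuristic, shown purely as an assumption, is to count the distinct values observed in each field over a sample of records and to place the fields with the fewest distinct values first. The function name `reorder_by_diversity` is hypothetical.

```python
def reorder_by_diversity(records):
    """Reorder fields so that low-diversity fields come first.

    Diversity is estimated as the number of distinct values seen per field.
    Returns (order, reordered records); `order` must also be applied to
    search data so that both sides are compressed consistently.
    """
    n_fields = len(records[0])
    diversity = [len({r[i] for r in records}) for i in range(n_fields)]
    order = sorted(range(n_fields), key=lambda i: diversity[i])  # stable sort
    return order, [[r[i] for i in order] for r in records]
```

For example, three records whose second field is always the same value would have that field moved to the head.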
The data compression apparatus 10 according to the present embodiment achieves a high compression rate suited to the diversity of the values that the fields of the stored data 20 and the like can take. This is because the user can specify, in the code length storage unit 40, a code length 43 appropriate to the diversity of the field values.
In addition, the data compression apparatus 10 according to the present embodiment can maintain a high compression rate for data composed of a plurality of fields, because a plurality of fields are compressed together into a single code 56.
Furthermore, the data compression apparatus 10 of the present embodiment can maintain a high compression rate even when the codes 56 of a given code length 43 are used up. The reason is that the number of fields to be compressed is reduced one at a time, and a code 56 whose code length 43 suits the diversity of the remaining fields is then used for those fields.
FIG. 9 shows data to be compressed by the data compression apparatus 10 according to the second embodiment. The data compression apparatus 10 according to the present embodiment allows wildcards to be specified when searching the content database 12. To achieve this, it limits the range over which the stored data 20 and the search data 26 are encoded to at most the (p-1)-th field (p < N). Wildcards can be specified in the p-th and subsequent fields, which are not compressed.
FIG. 10 shows data stored in the code length storage unit 40 of the second embodiment. The maximum value of the field number 42 stored in the code length storage unit 40 is p-1, because encoding of the stored data 20 and the search data 26 is limited to at most the (p-1)-th field.
The DB management unit 30 of the present embodiment searches the content database 12 with a wildcard specification. Search techniques that use wildcards are well known, so the details are omitted. In other respects, the data compression apparatus 10 of this embodiment is the same as that of the first embodiment.
The data compression apparatus 10 according to the present embodiment enables flexible data search, because wildcards can be used in the search.
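The restriction to fewer than p fields and a wildcard comparison on the uncompressed tail can be sketched as below. The wildcard symbol `*`, the record layout (code, remaining fields), and both function names are assumptions for illustration; the document leaves the wildcard search itself to known techniques.

```python
WILDCARD = "*"  # assumed wildcard symbol


def restrict_code_lengths(code_lengths, p):
    """Keep only entries for fewer than p fields, so field p onward stays raw."""
    return {n: bits for n, bits in code_lengths.items() if n < p}


def index_matches(index, query):
    """Compare a stored index with search compressed data containing wildcards.

    Both arguments are (code, list of uncompressed fields)  (assumed layout).
    """
    code_i, rest_i = index
    code_q, rest_q = query
    if code_i != code_q or len(rest_i) != len(rest_q):
        return False
    return all(q == WILDCARD or q == i for i, q in zip(rest_i, rest_q))
```

Because the coded prefix never extends into the fields where a wildcard may appear, the coded parts of the index and the query can be compared exactly.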
FIG. 11 is a configuration diagram of the data compression apparatus 10 according to the third embodiment. The data compression apparatus 10 according to the present embodiment includes a plurality of code length storage units 40 and a plurality of code tables 50.
FIG. 12 shows how the data compression apparatus 10 of this embodiment compresses data. The allocation unit 32 of the data compression apparatus 10 according to the present embodiment divides the stored data 20 into a plurality of field sequences (partial stored data) and compresses each of them using a different code length storage unit 40 and code table 50.
For example, the allocation unit 32 compresses the first through (n1-1)-th fields (first partial stored data) using the first code length storage unit 40 and code table 50. It compresses the n1-th through (n2-1)-th fields (second partial stored data) using the second code length storage unit 40 and code table 50, and the n2-th through N-th fields (third partial stored data) using the third code length storage unit 40 and code table 50.
The apparatus may also compress a plurality of pieces of partial stored data using the same code length storage unit 40 and code table 50. The number of divisions of the stored data 20 is not limited to three. The allocation unit 32 and the search encoding unit 33 handle the content data 24 and the search data 26 in the same manner.
It is assumed, for example, that the boundaries of the partial stored data and the association between each piece of partial stored data and its code length storage unit 40 and code table 50 are fixed in advance. In other respects, the data compression apparatus 10 of this embodiment is the same as that of the first embodiment.
The data compression apparatus 10 according to the present embodiment can perform flexible data compression, because an appropriate code length storage unit 40 can be designated for each piece of partial stored data and the like.
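The division into partial stored data can be sketched as follows. To keep the example short, each segment is coded as a whole with a single fixed code length and without the field-count fallback of the first embodiment; that simplification, the 0-based boundary positions, and the names `make_segment_compressor` and `compress_partitioned` are assumptions for illustration.

```python
def make_segment_compressor(code_length):
    """Return a compressor that owns its own code table for one segment."""
    table = {}

    def compress(segment):
        key = tuple(segment)
        if key not in table:
            if len(table) >= 2 ** code_length:
                raise RuntimeError("codes of this length are used up")
            # Assign the next unused code as a fixed-width bit string.
            table[key] = format(len(table), "0{}b".format(code_length))
        return table[key]

    return compress


def compress_partitioned(fields, boundaries, compressors):
    """Split `fields` at the fixed boundaries and code each part separately."""
    segments, start = [], 0
    for end in list(boundaries) + [len(fields)]:
        segments.append(fields[start:end])
        start = end
    return [compress(seg) for compress, seg in zip(compressors, segments)]
```

Giving each segment its own table means a short code length can be chosen for a low-diversity segment without being constrained by a high-diversity one.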
FIG. 13 is a configuration diagram of the data compression apparatus 10 according to the fourth embodiment. The data compression apparatus 10 of this embodiment functions as the transmitting apparatus of the compressed data communication system 14.
The data compression apparatus 10 according to the present embodiment includes an allocation unit 32, a code length storage unit 40, a code table 50, and a transmission unit 34. The code length storage unit 40 and the code table 50 are the same as those in the first embodiment.
The allocation unit 32 receives the stored data 20 from a terminal or the like and outputs the stored compressed data 21 to the transmission unit 34. In other respects, the allocation unit 32 is the same as in the first embodiment.
The transmission unit 34 first transmits, to the compressed data receiving device 13, the contents of the code table 50 registered in the process of generating the stored compressed data 21. It then transmits the stored compressed data 21 to the compressed data receiving device 13.
The allocation unit 32 and the transmission unit 34 are realized by hardware. Alternatively, they may be realized by a processor (not shown) of the data compression apparatus 10, which is also a computer 11, executing a data compression program 39 held in a memory (not shown).
The compressed data receiving device 13 includes a decoding unit 31, a receiving unit 35, and a received code table storage unit 36. The received code table storage unit 36 is located in a memory or the like (not shown).
The receiving unit 35 receives the data of the code table 50 from the data compression apparatus 10 and reproduces, in the received code table storage unit 36, a code table 50 with the same contents as that of the data compression apparatus 10.
The decoding unit 31 receives the stored compressed data 21 from the data compression apparatus 10, decompresses it, and outputs the stored data 20 to a terminal or the like. The decoding unit 31 handles the stored compressed data 21 and the stored data 20 instead of the content compressed data 25 and the content data 24. In other respects, the decoding unit 31 is the same as in the first embodiment.
The decoding unit 31 and the receiving unit 35 are realized by hardware. Alternatively, they may be realized by a processor (not shown) of the compressed data receiving device 13, which is also a computer 11, executing a decompression program 38 held in a memory (not shown).
The data compression apparatus 10 of this embodiment can transmit the stored data 20 efficiently, because the stored data 20 is compressed before it is transmitted.
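The order of transmission, code table contents first and compressed data afterwards, can be sketched with an in-memory message list standing in for the communication channel. The message tuples, the record layout (code, remaining fields), and the names `send` and `receive` are assumptions for illustration, not a protocol defined by the document.

```python
def send(table_by_code, compressed_records):
    """Build the message sequence: code table entries first, then the data."""
    messages = [("table", code, n_data) for code, n_data in table_by_code.items()]
    messages += [("data", code, rest) for code, rest in compressed_records]
    return messages


def receive(messages):
    """Rebuild the code table on the receiving side and decode each record."""
    received_table = {}   # plays the role of the received code table storage
    decoded = []
    for kind, code, payload in messages:
        if kind == "table":
            received_table[code] = tuple(payload)
        elif code[:2] == "00":                 # record was sent uncompressed
            decoded.append(list(payload))
        else:                                  # replace the code by its n data
            decoded.append(list(received_table[code]) + list(payload))
    return decoded
```

Sending the table first guarantees that every code in a data message can already be resolved when it arrives.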
In addition to the data shown in FIG. 1, the present invention can be applied to data in general whose type is identified by its number of fields and whose diversity can be predicted.
FIG. 14 shows data to be compressed by the data compression apparatus 10 according to the fifth embodiment. As FIG. 14 shows, the data is of various types: for example, a first type of data composed of n1 fields, a second type of data composed of n2 fields, and so on, up to a type of data composed of N fields.
FIG. 15 is an operation flowchart of the allocation unit 32 according to the fifth embodiment. The allocation unit 32 counts the number of fields (N) of the input stored data 20 (S41).
The allocation unit 32 searches the code length storage unit 40 for the storage unit entry 41 whose field number 42 is N (S42), and acquires the code length 43 from the entry found (S43).
The unit then searches the code table 50 for the code 56 already assigned to the stored N data, that is, the data of all N fields of the stored data 20. Specifically, it searches for a table entry 51 whose n value 53 equals N and whose n data 54 matches the stored N data (S44).
If no such entry is found (N in S45), the unit generates a new code, namely a code that has the acquired code length 43 and is not yet registered in the code table 50 (S46). If the code can be generated (Y in S47), the unit adds a new table entry 51 to the code table 50, storing the generated code in the code 56, its length in the length display 55, N in the n value 53, and the stored N data in the n data 54 (S48). Finally, the unit outputs the generated code as the stored compressed data 21 (S49).
If a new code cannot be generated (N in S47), the unit returns an error (S4A).
When a table entry 51 whose n value 53 equals N and whose n data 54 matches the stored N data is found (Y in S45), the unit acquires the code 56 from that table entry 51 and outputs it as the stored compressed data 21 (S4B).
The data compression apparatus 10 according to the present embodiment can compress a wide range of stored data 20.
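The flow of FIG. 15, in which the whole record is coded according to its field count and exhaustion is an error rather than a fallback, can be sketched as follows. The name `allocate_whole`, the dictionary layouts, and the omission of the length display 55 are assumptions for illustration.

```python
def allocate_whole(fields, code_lengths, code_table):
    """Code an entire record with the code length assigned to its field count.

    code_lengths: {field count: code length in bits}     (assumed layout)
    code_table:   {(N, tuple of all fields): code}       (assumed layout)
    A field count missing from code_lengths raises KeyError; the flowchart
    does not describe that case.
    """
    n = len(fields)                                   # S41
    length = code_lengths[n]                          # S42, S43
    key = (n, tuple(fields))
    if key in code_table:                             # Y in S45
        return code_table[key]                        # S4B
    used = set(code_table.values())
    for i in range(2 ** length):                      # S46: look for a free code
        candidate = format(i, "0{}b".format(length))
        if candidate not in used:
            code_table[key] = candidate               # S48
            return candidate                          # S49
    raise RuntimeError("no unused code of this length")   # S4A
```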
FIG. 16 shows the basic configuration of the data compression apparatus 10 of the present invention. The data compression apparatus 10 includes a code length storage unit 40 and an allocation unit 32.
The code length storage unit 40 stores, in association with each number of fields 42, a code length 43 that does not decrease and increases at least once as the number of fields 42 increases. The allocation unit 32 receives a number of fields (N, two or more), acquires from the code length storage unit 40 the code length 43 corresponding to a number of fields 42 (n) equal to or less than N, and outputs a code 56 of that code length 43.
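The condition on the stored code lengths (never decreasing, and increasing at least once, as the number of fields grows) can be checked mechanically. The sketch below assumes the code length storage unit is modelled as a dictionary from field count to code length; the function names are hypothetical, and picking the largest n not exceeding N in `lookup` is an assumption that mirrors the first embodiment rather than a requirement of the basic configuration.

```python
def is_valid_code_length_table(code_lengths):
    """True if lengths never decrease and increase at least once with n."""
    lengths = [code_lengths[n] for n in sorted(code_lengths)]
    pairs = list(zip(lengths, lengths[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    increases_once = any(a < b for a, b in pairs)
    return non_decreasing and increases_once


def lookup(code_lengths, N):
    """Return (n, code length) for the largest stored n <= N, or None."""
    candidates = [n for n in code_lengths if n <= N]
    if not candidates:
        return None
    n = max(candidates)
    return n, code_lengths[n]
```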
The present invention has been described above with reference to the embodiments. However, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2008-316695, filed on December 12, 2008, the entire disclosure of which is incorporated herein.

Claims

A data compression device comprising:
code length storage means for storing, in association with each number of fields, a code length that does not decrease and increases at least once as the number of fields increases; and
allocation means for receiving a number of fields (N, where N is two or more), acquiring from the code length storage means the code length corresponding to a number of fields (n) equal to or less than N, and outputting a code of that code length.
The data compression device according to claim 1, wherein the allocation means acquires as n, from the numbers of fields in the code length storage means, the largest value not exceeding N among the numbers of fields whose corresponding code length still has an unused code, marks that unused code as in use, and outputs it.
The data compression device according to claim 1 or 2, further comprising a code table,
wherein the allocation means receives stored data consisting of the N fields, stores in the code table an entry associating the code with the data of the n consecutive fields from the head of the stored data (n data), and outputs the stored data in which the n data has been replaced by the code (stored compressed data).
The data compression device according to claim 3, further comprising:
a content database that stores content using the stored compressed data as an index;
search encoding means for receiving search data including the N consecutive fields, searching the code table, identifying the entry storing the n data for which the number of consecutive fields from the head that match between the search data and the n data is largest (m, where m is 1 or more and N or less), and creating the search data in which the data of the m consecutive fields from the head (m data) has been replaced by the code of that entry (search compressed data); and
DB management means for acquiring, from the content database, the content whose index matches the search compressed data.
The data compression device according to claim 4, wherein:
the search data includes a wildcard designation in the p-th field or later (p being a value less than N);
the code length storage means contains code lengths corresponding to numbers of fields less than p but contains no code length corresponding to a number of fields equal to or greater than p; and
the DB management means acquires, from the content database, the content whose index matches the portion of the search compressed data other than the wildcard designation.
The data compression device according to claim 3, comprising:
first and second code length storage means;
first and second code tables; and
the allocation means, which acquires first and second sequences of consecutive fields (partial stored data) from the stored data, generates first partial stored compressed data from the first partial stored data on the basis of the first code length table and the first code table, generates second partial stored compressed data from the second partial stored data on the basis of the second code length table and the second code table, and creates the stored replacement data by replacing the first partial stored data with the first partial stored compressed data and the second partial stored data with the second partial stored compressed data.
The data compression device according to claim 3, further comprising transmission means for transmitting the contents of the code table and the stored compressed data to a compressed data receiving device, the compressed data receiving device comprising:
received code table storage means;
receiving means for receiving the contents of the code table and storing them in the received code table storage means; and
decoding means for receiving the stored compressed data, searching the code table in the received code table storage means for the entry whose code matches the code in the stored compressed data, and replacing the code in the stored compressed data with the n data of that entry.
A compressed data communication system including the compressed data receiving device and the data compression device according to claim 7.
A data compression method comprising:
a code length storage step of preparing code length storage means that stores, in association with each number of fields, a code length that does not decrease and increases at least once as the number of fields increases; and
an allocation step of receiving a number of fields (N, where N is two or more), acquiring from the code length storage means the code length corresponding to a number of fields (n) equal to or less than N, and outputting a code of that code length.
The data compression method according to claim 9, wherein, in the allocation step, the largest value not exceeding N among the numbers of fields whose corresponding code length still has an unused code is acquired as n from the numbers of fields in the code length storage means, and that unused code is marked as in use and output.
The data compression method according to claim 9 or 10, further comprising a code table step of preparing a code table,
wherein, in the allocation step, stored data consisting of the N fields is received, an entry associating the code with the data of the n consecutive fields from the head of the stored data (n data) is stored in the code table, and the stored data in which the n data has been replaced by the code (stored compressed data) is output.
The data compression method according to claim 11, further comprising:
a DB step of preparing a content database that stores content using the stored compressed data as an index;
a search encoding step of receiving search data including the N consecutive fields, searching the code table, identifying the entry storing the n data for which the number of consecutive fields from the head that match between the search data and the n data is largest (m, where m is 1 or more and N or less), and creating the search data in which the data of the m consecutive fields from the head (m data) has been replaced by the code of that entry (search compressed data); and
a DB management step of acquiring, from the content database, the content whose index matches the search compressed data.
The data compression method according to claim 12, wherein:
the search data includes a wildcard designation in the p-th field or later (p being a value less than N);
the code length storage step prepares the code length storage means containing code lengths corresponding to numbers of fields less than p but containing no code length corresponding to a number of fields equal to or greater than p; and
the DB management step acquires, from the content database, the content whose index matches the portion of the search compressed data other than the wildcard designation.
The data compression method according to claim 11, comprising:
a code length storage step of preparing first and second code length storage means;
a code table step of preparing first and second code tables; and
the allocation step of acquiring first and second sequences of consecutive fields (partial stored data) from the stored data, generating first partial stored compressed data from the first partial stored data on the basis of the first code length table and the first code table, generating second partial stored compressed data from the second partial stored data on the basis of the second code length table and the second code table, and creating the stored replacement data by replacing the first partial stored data with the first partial stored compressed data and the second partial stored data with the second partial stored compressed data.
The data compression method according to claim 11, further comprising a transmission step of transmitting the contents of the code table and the stored compressed data to a compressed data receiving device, the compressed data receiving device comprising:
received code table storage means;
receiving means for receiving the contents of the code table and storing them in the received code table storage means; and
decoding means for receiving the stored compressed data, searching the code table in the received code table storage means for the entry whose code matches the code in the stored compressed data, and replacing the code in the stored compressed data with the n data of that entry.
A data compression program that causes a computer, which comprises code length storage means storing, in association with each number of fields, a code length that does not decrease and increases at least once as the number of fields increases, to execute:
an allocation process of receiving a number of fields (N, where N is two or more), acquiring from the code length storage means the code length corresponding to a number of fields (n) equal to or less than N, and outputting a code of that code length.
The data compression program according to claim 16, causing the computer to execute the allocation process in which the largest value not exceeding N among the numbers of fields whose corresponding code length still has an unused code is acquired as n from the numbers of fields in the code length storage means, and that unused code is marked as in use and output.
The data compression program according to claim 16 or 17, causing the computer, which further comprises a code table, to execute the allocation process of receiving stored data consisting of the N fields, storing in the code table an entry associating the code with the data of the n consecutive fields from the head of the stored data (n data), and outputting the stored data in which the n data has been replaced by the code (stored compressed data).
The data compression program according to claim 18, causing the computer, which further comprises a content database that stores content using the stored compressed data as an index, to execute:
a search encoding process of receiving search data including the N consecutive fields, searching the code table, identifying the entry storing the n data for which the number of consecutive fields from the head that match between the search data and the n data is largest (m, where m is 1 or more and N or less), and creating the search data in which the data of the m consecutive fields from the head (m data) has been replaced by the code of that entry (search compressed data); and
a DB management process of acquiring, from the content database, the content whose index matches the search compressed data.
The data compression program according to claim 19, wherein:
the search data includes a wildcard designation in the p-th field or later (p being a value less than N);
the computer comprises the code length storage means containing code lengths corresponding to numbers of fields less than p but containing no code length corresponding to a number of fields equal to or greater than p; and
the program causes the computer to execute the DB management process of acquiring, from the content database, the content whose index matches the portion of the search compressed data other than the wildcard designation.
The data compression program according to claim 18, causing the computer, which comprises first and second code length storage means and first and second code tables, to execute the allocation process of acquiring first and second sequences of consecutive fields (partial stored data) from the stored data, generating first partial stored compressed data from the first partial stored data on the basis of the first code length table and the first code table, generating second partial stored compressed data from the second partial stored data on the basis of the second code length table and the second code table, and creating the stored replacement data by replacing the first partial stored data with the first partial stored compressed data and the second partial stored data with the second partial stored compressed data.
The data compression program according to claim 18, causing the computer to execute a transmission process of transmitting the contents of the code table and the stored compressed data to a compressed data receiving device, the compressed data receiving device comprising:
received code table storage means;
receiving means for receiving the contents of the code table and storing them in the received code table storage means; and
decoding means for receiving the stored compressed data, searching the code table in the received code table storage means for the entry whose code matches the code in the stored compressed data, and replacing the code in the stored compressed data with the n data of that entry.