CN101800761A

CN101800761A - Lossless data compression method based on network dictionary

Info

Publication number: CN101800761A
Application number: CN 200910186807
Authority: CN
Inventors: 吴昊; 刘鹏; 陈宏欣; 冯小辉; 虞芬
Original assignee: Individual
Current assignee: Nanjing Communications Institute of Technology
Priority date: 2009-12-25
Filing date: 2009-12-25
Publication date: 2010-08-11
Anticipated expiration: 2029-12-25
Also published as: CN101800761B

Abstract

The invention relates to a lossless data compression method based on a network dictionary. The method uses a compression dictionary that originates from a compressed-data dictionary stored on the server side. A client compares a file digest or a file block against the server-side dictionary and takes the matching value returned by the query as the compression result. This effectively improves compression efficiency and is especially suitable for files that have a large number of copies on the network.

Description

A lossless data compression method based on a network dictionary

Technical field

The present invention relates to a lossless data compression method, and in particular to a lossless data compression method based on a network dictionary.

Technical background

Compression techniques can be broadly divided into lossy compression and lossless compression. Lossy compression is generally used for multimedia data, whereas lossless compression is generally used for general-purpose data. Lossless compression can in turn be divided into methods based on statistical models, represented by Huffman coding and arithmetic coding, and methods based on dictionary models, represented by LZ77, LZ78, LZW and the like. The general-purpose lossless compression software now popular on the market, for example ZIP, LHarc and ARJ, usually adopts dictionary-based compression. However, the dictionaries of these algorithms are generated locally from the source file. In existing dictionary-based compression methods, the dictionary, whether static or dynamically generated, resides locally, so compression efficiency is usually limited.

Summary of the invention

The purpose of the present invention is to provide a lossless data compression method based on a network dictionary that can effectively improve compression efficiency; in the limiting case the compression efficiency approaches 100%.

The technical scheme adopted to achieve the above purpose comprises a dictionary used by the compression algorithm, wherein the dictionary comes from a compressed-data dictionary stored on the server side; the client compares a file or file block against the server-side dictionary and takes the matching value returned by the query as the compression result.

Compared with the prior art, the present invention has the following advantages.

Because a dedicated server stores dictionaries of various types, and a dictionary index or dictionary address table is established, compression efficiency can be effectively improved; in the limiting case the compression efficiency approaches 100%.

Embodiment

The method comprises a dictionary used by the compression algorithm, wherein the dictionary comes from a compressed-data dictionary stored on the server side; the client compares a file or file block against the server-side dictionary and takes the matching value returned by the query as the compression result.

The server-side compressed-data dictionary may comprise one or both of the following two classes: one is a full-text dictionary built from file names, file contents, and hash values; the other is a dictionary generated from fixed-length or variable-length feature values.
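The two dictionary classes can be pictured as two lookup tables. The following is only a minimal in-memory sketch: the names `FullTextEntry`, `ServerDictionary`, `add_file` and `add_block` are hypothetical, and the use of SHA-256 both as the file hash and as the block feature value is an assumption, since the text does not name a hash function or define how feature values are computed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FullTextEntry:
    """One full-text dictionary entry: file name, file content, hash value."""
    filename: str
    content: bytes
    digest: str  # hex digest of content (SHA-256 is an assumption)


@dataclass
class ServerDictionary:
    """Hypothetical in-memory stand-in for the server-side dictionaries."""
    full_text: dict[str, FullTextEntry] = field(default_factory=dict)  # digest -> entry
    blocks: dict[str, bytes] = field(default_factory=dict)  # feature value -> block

    def add_file(self, filename: str, content: bytes) -> str:
        """Store a whole file in the full-text dictionary and return its digest."""
        digest = hashlib.sha256(content).hexdigest()
        self.full_text[digest] = FullTextEntry(filename, content, digest)
        return digest

    def add_block(self, block: bytes) -> str:
        """Store a block keyed by its feature value (here simply a hash)."""
        feature = hashlib.sha256(block).hexdigest()
        self.blocks[feature] = block
        return feature
```

A real server would persist these tables and index them, for example by file name or file type; the sketch only shows the shape of the data.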

The client comparing a file against the server-side dictionary means that the client computes a hash of the file to be compressed and compares the resulting message digest, or the message digest plus the file name, against the server side, wherein the message digest is matched exactly and the file name is matched fuzzily.
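The combination of exact digest matching and fuzzy file-name matching could look roughly like the sketch below. It rests on several assumptions that the text does not settle: SHA-256 as the hash, `difflib.SequenceMatcher` as the fuzzy measure, a similarity threshold of 0.6, and a digest table that maps each digest to a list of `(index, stored_filename)` pairs. The function names are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def file_digest(data: bytes) -> str:
    """Fixed-length message digest of the file (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def filename_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two file names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_file(digest, filename, digest_table, threshold=0.6):
    """Return the dictionary index of a matching file, or None.

    digest_table maps digest -> list of (index, stored_filename).
    The digest must match exactly; the file name, if given, only has to
    be similar enough to one of the stored names.
    """
    candidates = digest_table.get(digest)
    if not candidates:
        return None  # exact digest match failed
    if filename is None:
        return candidates[0][0]  # digest-only query
    best_index, best_name = max(
        candidates, key=lambda c: filename_similarity(filename, c[1])
    )
    if filename_similarity(filename, best_name) >= threshold:
        return best_index
    return None
```

Whether a file-name mismatch should reject a digest match, as it does here, is a design choice of this sketch rather than something the text prescribes.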

The client comparing a file block against the server side may use exact matching or approximate matching; the matching value subsequently returned comprises the server-side dictionary index value and the difference value between the client file and the server-side file, or that difference value after compression.
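Approximate matching with a returned difference value can be sketched as follows. The text does not specify a similarity measure or a difference format, so this sketch assumes `difflib.SequenceMatcher` for both, encodes the difference as copy/literal operations in JSON, and uses zlib as the "existing compression method". All function names are hypothetical, and the linear scan over the dictionary is for illustration only.

```python
import json
import zlib
from difflib import SequenceMatcher


def make_delta(reference: bytes, target: bytes) -> bytes:
    """Encode target as copy/literal operations against reference, then compress."""
    ops = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])  # reuse reference[i1:i2]
        elif j2 > j1:  # "replace" or "insert": carry the new bytes literally
            ops.append(["data", target[j1:j2].hex()])
        # "delete" needs no entry: the reference bytes are simply skipped
    return zlib.compress(json.dumps(ops).encode("ascii"))


def apply_delta(reference: bytes, delta: bytes) -> bytes:
    """Rebuild the original block from the reference block and the delta."""
    out = bytearray()
    for op in json.loads(zlib.decompress(delta)):
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += bytes.fromhex(op[1])
    return bytes(out)


def approximate_match(block: bytes, dictionary: list) -> tuple:
    """Return (index, delta) for the most similar block in a non-empty dictionary."""
    index = max(
        range(len(dictionary)),
        key=lambda i: SequenceMatcher(None, dictionary[i], block, autojunk=False).ratio(),
    )
    return index, make_delta(dictionary[index], block)
```

The pair `(index, delta)` plays the role of the matching value: the index identifies the server-side entry, and the compressed delta is what must be stored alongside it so that the block can be restored exactly.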

The present invention is a lossless data compression method based on a network dictionary. In existing dictionary-based compression methods, the dictionary, whether static or dynamically generated, resides locally. The key difference of this compression method is that a dedicated server stores dictionaries of various types, and a dictionary index or dictionary address table is established.

Its working principle is as follows. A dedicated server is set up to store information such as the full-text dictionary, the block dictionary, and digests. For full-text compression, a hash is computed over the source file to be compressed to generate a fixed-length digest, which is sent to the server and compared with the digests held there; if it matches, the index of the data on the server is returned as the compression result. For block-level compression, each block to be compressed is compared with the server-side block dictionary to find the data with the greatest similarity; the index of that data is then returned, and the difference value is compressed with an existing compression method and sent back to the client. Combining these two methods can effectively improve compression efficiency.

Example

Compression method one: Because many files exist in a large number of copies, defining the dictionary with the file as the unit makes compression much more efficient in both time and space. In a concrete implementation, a hash algorithm can be used to reduce the original file to a fixed-length digest, which is compared with the server side; if the two are identical, a one-to-one correspondence can be established.

The compression process is as follows:

1. The client computes a hash of the source file, generating a fixed-length digest;

2. The client transmits the digest, or the digest plus the file name, to the server;

3. The server matches the digest, or the digest plus the file name, against the digest dictionary, wherein the digest is matched exactly and the file name is matched fuzzily; if the match succeeds, a one-to-one mapping is established and the mapping result is passed back to the client; otherwise go to step 5;

4. The client obtains the file mapping result and saves it as the compressed file; compression ends;

5. The server optionally adds the digest, or the digest plus the file name, submitted by the client, together with the source file, to the dictionary;

6. The client compresses using compression method two below or a conventional compression method.
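The six steps of compression method one can be sketched end to end as below. This is an illustration under stated assumptions, not the patented implementation: `DigestServer` is a hypothetical in-memory stand-in for the server, SHA-256 is assumed as the hash, zlib stands in for the conventional compression method of step 6, and file-name fuzzy matching is left out for brevity.

```python
import hashlib
import zlib


class DigestServer:
    """Hypothetical in-memory server holding the digest dictionary."""

    def __init__(self):
        self.by_digest = {}  # digest -> index
        self.files = []      # index -> file content

    def lookup(self, digest):
        """Exact digest match; returns the index or None."""
        return self.by_digest.get(digest)

    def add(self, digest, content):
        """Add a new file to the dictionary and return its index."""
        self.by_digest[digest] = len(self.files)
        self.files.append(content)
        return self.by_digest[digest]

    def fetch(self, index):
        return self.files[index]


def compress_file(data, server, contribute=True):
    digest = hashlib.sha256(data).hexdigest()  # step 1: fixed-length digest
    index = server.lookup(digest)              # steps 2-3: send digest, server matches
    if index is not None:
        return ("ref", index)                  # step 4: mapping result is the output
    if contribute:
        server.add(digest, data)               # step 5: server optionally learns the file
    return ("raw", zlib.compress(data))        # step 6: conventional fallback


def decompress_file(record, server):
    kind, payload = record
    return server.fetch(payload) if kind == "ref" else zlib.decompress(payload)
```

With this sketch, the first client to compress a given file gets a conventionally compressed result, while every later copy of the same file compresses to a single index, which is the case in which the compression efficiency approaches its limit.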

Compression method two: When compressing, the client may split the original data, or the data after an initial compression pass, into blocks; the split may be fixed-length or variable-length. Each block is then compared with the server dictionary. If an identical block is found, only the number or address of that block in the dictionary needs to be recorded; if not, the block is compressed with an existing compression algorithm, and the server may optionally add the data to the data dictionary.
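The difference between fixed-length and variable-length splitting can be illustrated with the sketch below. The text does not say how variable-length boundaries are chosen; the content-defined rule used here (cut where a rolling sum over the last `window` bytes has its low bits all zero) is a swapped-in, commonly used technique and purely an assumption, as are the function names and all parameter values.

```python
def split_fixed(data: bytes, size: int = 4096) -> list:
    """Fixed-length split: every block has `size` bytes except possibly the last."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def split_variable(data: bytes, window: int = 16, mask: int = 0x3F,
                   min_size: int = 32, max_size: int = 1024) -> list:
    """Variable-length split with content-defined boundaries (assumed rule).

    A boundary is placed after byte i when the sum of the last `window`
    bytes has all bits of `mask` clear, subject to min/max block sizes.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # keep the sum over the last `window` bytes
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial block
    return chunks
```

Fixed-length splitting is simpler, but a single inserted byte shifts every later block. Boundaries that depend on content tend to realign after an insertion, which can make identical blocks easier to find in a shared dictionary.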

The compression process is as follows:

1. The client performs an initial compression on the original data (optional);

2. The client splits the original data, or the initially compressed data, into blocks;

3. The client compares the resulting data blocks one by one with the server-side dictionary, or submits each data block to the server for comparison; if an identical block is found in the dictionary, go to step 4, otherwise go to step 5;

4. The client obtains the index number or address of the data block on the server and uses it as the compression result for that block. If further data blocks remain, go to step 3; otherwise go to step 6;

5. The server optionally adds the data submitted by the client to the dictionary, and the client compresses the data block with an existing compression algorithm. If further data blocks remain, go to step 3;

6. The compression results of all data blocks are combined into the final compressed file.
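The block-level procedure above can be sketched as a loop over blocks. The sketch makes several assumptions: `BlockServer` is a hypothetical in-memory stand-in for the server, blocks are looked up by SHA-256 hash with exact matching only, splitting is fixed-length, zlib stands in for the existing compression algorithm, and the optional initial compression of step 1 is skipped.

```python
import hashlib
import zlib


class BlockServer:
    """Hypothetical in-memory server holding the block dictionary."""

    def __init__(self):
        self.index_of = {}  # block hash -> index
        self.blocks = []    # index -> block content

    def lookup(self, block):
        """Exact match by hash (assumes no collisions); returns index or None."""
        return self.index_of.get(hashlib.sha256(block).digest())

    def add(self, block):
        self.index_of[hashlib.sha256(block).digest()] = len(self.blocks)
        self.blocks.append(block)


def compress_blocks(data, server, block_size=64, contribute=True):
    records = []
    for start in range(0, len(data), block_size):  # step 2: split into blocks
        block = data[start:start + block_size]
        index = server.lookup(block)                # step 3: compare with dictionary
        if index is not None:
            records.append(("ref", index))          # step 4: index is the result
        else:
            if contribute:
                server.add(block)                   # step 5: server optionally learns it
            records.append(("raw", zlib.compress(block)))
    return records                                  # step 6: combined result


def decompress_blocks(records, server):
    return b"".join(
        server.blocks[payload] if kind == "ref" else zlib.decompress(payload)
        for kind, payload in records
    )
```

Because the server learns blocks as they are submitted, a block repeated later in the same file, or in any later file, is replaced by its index. A production design would also need a serialised record format and a way to guard against hash collisions, neither of which the sketch addresses.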

Existing data compression algorithms hold that data can be compressed essentially because the data itself contains redundancy. Data compression uses various algorithms to reduce that redundancy to a minimum while keeping distortion as low as possible, thereby improving transmission efficiency and saving storage space. The lossless data compression method based on a network dictionary does not consider only the redundancy within the data itself; it takes a higher-level view and looks at the redundancy between files. If existing data compression algorithms are concerned with how to reuse small-scale subroutines inside a program, the lossless data compression method based on a network dictionary is concerned with how to reuse large-scale components. Its main feature is that the server stores a large number of dictionaries, which can be classified using various indexing methods, for example by file name or file type. Compression of data can be divided into compression of the whole client file and block-level compression of the client file: method one compresses the whole file, and method two compresses file blocks.

Claims

1. A lossless data compression method based on a network dictionary, comprising a dictionary used by a compression algorithm, characterized in that the dictionary comes from a compressed-data dictionary stored on the server side, and the client compares a file or file block against the server-side dictionary and takes the matching value returned by the query as the compression result.

2. The lossless data compression method based on a network dictionary according to claim 1, characterized in that the server-side compressed-data dictionary may comprise one or both of the following two classes: one is a full-text dictionary built from file names, file contents, and hash values; the other is a dictionary generated from fixed-length or variable-length feature values.

3. The lossless data compression method based on a network dictionary according to claim 1, characterized in that the client comparing a file against the server-side dictionary means that the client computes a hash of the file to be compressed and compares the resulting message digest, or the message digest plus the file name, against the server side, wherein the message digest is matched exactly and the file name is matched fuzzily.

4. The lossless data compression method based on a network dictionary according to claim 1, characterized in that the client comparing a file block against the server side may use exact matching or approximate matching, and the matching value subsequently returned comprises the server-side dictionary index value and the difference value between the client file and the server-side file, or that difference value after compression.