CN101800761A - Lossless data compression method based on network dictionary - Google Patents

Publication number
CN101800761A
Authority
CN
China
Prior art keywords
dictionary
server end
compression
client
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200910186807
Other languages
Chinese (zh)
Other versions
CN101800761B (en)
Inventor
吴昊
刘鹏
陈宏欣
冯小辉
虞芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Communications Institute of Technology
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN 200910186807 (CN101800761B)
Publication of CN101800761A
Application granted
Publication of CN101800761B
Current legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a lossless data compression method based on a network dictionary. The dictionary used by the compression algorithm comes from a compressed-data dictionary stored on the server side. The client compares a file digest or a file block against the server-side dictionary, and the matching value returned by the query serves as the compression result. This effectively improves compression efficiency and is especially suitable for files that have many copies on the network.

Description

A lossless data compression method based on a network dictionary
Technical field
The present invention relates to a lossless data compression method, and in particular to a lossless data compression method based on a network dictionary.
Technical background
Compression techniques can be broadly divided into lossy and lossless compression. Lossy compression is generally used for multimedia data, while lossless compression is generally used for general-purpose data. Lossless compression can be further divided into methods based on statistical models and methods based on dictionary models; the former are represented by Huffman coding and arithmetic coding, the latter by LZ77, LZ78, LZW and the like. The general-purpose lossless compression software now popular on the market, for example ZIP, LHarc and ARJ, usually adopts dictionary-based compression. However, the dictionaries of these algorithms are generated locally from the source file. In existing dictionary-based methods, whether the dictionary is static or dynamically generated, it resides locally, so compression efficiency is usually limited.
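To make the distinction between dynamically generated and static dictionaries concrete, the sketch below uses Python's standard zlib module, whose DEFLATE format combines LZ77-style matching with Huffman coding. It compresses the same short message once with a dictionary built on the fly and once with a preset dictionary passed through the `zdict` argument. The sample strings and function names are illustrative assumptions, not taken from the patent. In both cases the dictionary lives on the local machine.

```python
import zlib

# Illustrative data only; neither string comes from the patent.
message = b"the client compares file blocks against the server-side dictionary"
preset = b"server-side dictionary; the client compares file blocks against the "


def compress_dynamic(data: bytes) -> bytes:
    """Dictionary is built on the fly from the data itself (sliding window)."""
    return zlib.compress(data, 9)


def compress_static(data: bytes, dictionary: bytes) -> bytes:
    """Dictionary is a preset byte string that both sides must already hold."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_static(blob: bytes, dictionary: bytes) -> bytes:
    """Decompression needs exactly the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()
```

With a preset dictionary that already contains the phrases of the message, the compressed output is noticeably shorter than with the dynamic dictionary alone, but the preset dictionary still has to be present locally on both sides.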
Summary of the invention
The purpose of the present invention is to provide a lossless data compression method based on a network dictionary that can effectively improve compression efficiency; in the limiting case the compression efficiency approaches 100%.
The technical solution adopted to achieve this purpose comprises a dictionary used by the compression algorithm. The dictionary comes from a compressed-data dictionary stored on the server side; the client compares a file or file blocks against the server-side dictionary, and the matching value returned by the query is used as the compression result.
Compared with the prior art, the present invention has the following advantages.
Because a dedicated server stores various types of dictionaries, and a dictionary index or dictionary address table is established, compression efficiency can be effectively improved; in the limiting case the compression efficiency approaches 100%.
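The text does not define compression efficiency. As a rough illustration, assume it means the fraction of space saved, and assume a hypothetical 100 MiB file (104,857,600 bytes) that matches a server-side entry as a whole and is replaced by a hypothetical 8-byte index:

```latex
\text{efficiency}
  = 1 - \frac{S_{\text{compressed}}}{S_{\text{original}}}
  = 1 - \frac{8}{104\,857\,600}
  \approx 0.99999992
```

Under these assumed sizes the saving is very close to, but never exactly, 100%, which is consistent with the "approaches 100%" wording for the limiting case.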
Embodiment
The method comprises a dictionary used by the compression algorithm. The dictionary comes from a compressed-data dictionary stored on the server side; the client compares a file or file blocks against the server-side dictionary, and the matching value returned by the query is used as the compression result.
The server-side compressed-data dictionary may comprise either or both of the following two classes: a full-text dictionary built from file names, file contents and hash values, and a dictionary generated from fixed-length or variable-length feature values.
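One possible in-memory layout for the two dictionary classes is sketched below. The class and field names are hypothetical, and using SHA-256 both as the file hash and as the block feature value is an assumption; the patent does not name a specific hash or feature function.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FullTextEntry:
    """One entry of the full-text dictionary: file name, content, hash value."""
    file_name: str
    content: bytes
    digest: str


@dataclass
class ServerDictionary:
    # Class 1: full-text dictionary, keyed by the hash of the whole file.
    full_text: dict[str, FullTextEntry] = field(default_factory=dict)
    # Class 2: block dictionary, keyed by a feature value of each block
    # (here simply SHA-256 of the block, which is an assumption).
    blocks: dict[str, bytes] = field(default_factory=dict)

    def add_file(self, file_name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.full_text[digest] = FullTextEntry(file_name, content, digest)
        return digest

    def add_block(self, block: bytes) -> str:
        feature = hashlib.sha256(block).hexdigest()
        self.blocks[feature] = block
        return feature
```

A server could hold only one of the two mappings, which corresponds to the "either or both" wording above.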
The client comparing a file against the server-side dictionary means that the client hashes the file to be compressed and compares the resulting message digest, or the message digest plus the file name, against the server side; the message digest is matched exactly and the file name is matched fuzzily.
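The matching rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for the unspecified hash, `difflib.SequenceMatcher` stands in for the unspecified fuzzy comparison, the 0.6 threshold is arbitrary, and requiring both checks to pass when a file name is supplied is one possible reading of the text.

```python
import hashlib
from difflib import SequenceMatcher


def file_digest(content: bytes) -> str:
    """Fixed-length message digest of the file to be compressed."""
    return hashlib.sha256(content).hexdigest()


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two file names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_file(digest, file_name, digest_dict, name_threshold=0.6):
    """Return the server index for a file, or None if there is no match.

    digest_dict maps digest -> (index, stored_file_name); this layout is
    hypothetical. The digest must match exactly; the file name, if one is
    given, only has to be similar enough.
    """
    entry = digest_dict.get(digest)  # exact digest match
    if entry is None:
        return None
    index, stored_name = entry
    if file_name is not None and name_similarity(file_name, stored_name) < name_threshold:
        return None  # fuzzy file-name match failed
    return index
```

Passing `None` as the file name corresponds to the digest-only variant; passing a name corresponds to the digest plus file name variant.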
The client comparing file blocks against the server side may use exact matching or approximate matching. The matching value returned afterwards comprises the server-side dictionary index value together with the difference between the client file and the server-side file, or that difference after compression.
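A small sketch of approximate matching follows. It picks the most similar server block, encodes the client block as a delta against it, and returns the index plus the compressed delta. The delta format (copy and literal records), the use of `difflib.SequenceMatcher` as the similarity measure, and all function names are assumptions made for illustration; the patent does not specify how the difference value is computed. The server block list is assumed to be non-empty.

```python
import struct
import zlib
from difflib import SequenceMatcher


def make_delta(reference: bytes, target: bytes) -> bytes:
    """Encode target as copies from reference plus literal bytes."""
    out = bytearray()
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += b"C" + struct.pack(">II", i1, i2 - i1)  # copy from reference
        elif j2 > j1:
            # "replace" or "insert": carry the new bytes literally
            out += b"D" + struct.pack(">I", j2 - j1) + target[j1:j2]
        # "delete": those reference bytes are simply never copied
    return bytes(out)


def apply_delta(reference: bytes, delta: bytes) -> bytes:
    """Rebuild the target from the reference block and a delta."""
    out = bytearray()
    pos = 0
    while pos < len(delta):
        op = delta[pos:pos + 1]
        pos += 1
        if op == b"C":
            start, length = struct.unpack_from(">II", delta, pos)
            pos += 8
            out += reference[start:start + length]
        else:
            (length,) = struct.unpack_from(">I", delta, pos)
            pos += 4
            out += delta[pos:pos + length]
            pos += length
    return bytes(out)


def match_block(block: bytes, server_blocks: list) -> tuple:
    """Return (index of most similar server block, compressed difference)."""
    def similarity(i):
        return SequenceMatcher(None, server_blocks[i], block, autojunk=False).ratio()
    index = max(range(len(server_blocks)), key=similarity)
    return index, zlib.compress(make_delta(server_blocks[index], block))


def restore_block(index: int, compressed_delta: bytes, server_blocks: list) -> bytes:
    return apply_delta(server_blocks[index], zlib.decompress(compressed_delta))
```

An exact match is the special case in which the delta consists of a single copy record. Scanning every server block for the best ratio is only workable for a toy dictionary; a real server would need an index over feature values.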
The present invention is a lossless data compression method based on a network dictionary. In existing dictionary-based compression methods, whether the dictionary is static or dynamically generated, it resides locally. The key difference of this method is that a dedicated server stores various types of dictionaries and maintains a dictionary index or dictionary address table.
It works as follows. A dedicated server is set up to store information such as the full-text dictionary, the block dictionary and the digests. For full-text compression, the source file to be compressed is hashed to produce a fixed-length digest, which is sent to the server and compared with the digests stored there; on a match, the index of the data on the server is returned as the compression result. For block-level compression, each block to be compressed is compared with the server-side block dictionary to find the most similar data; the index of that data is returned, and the difference is compressed with an existing compression method and sent back to the client. Combining these two methods can effectively improve compression efficiency.
Embodiment
Compression method one: because many files exist in a great many copies, defining the dictionary with the file as the unit can greatly improve both the time efficiency and the space efficiency of compression. In a concrete implementation, a hash algorithm can be used to reduce the original file to a fixed-length digest, which is compared with the server side; if the two are identical, a one-to-one correspondence can be established.
The compression process of method one is as follows:
1. The client hashes the source file to generate a fixed-length digest;
2. The client transmits the digest, or the digest plus the file name, to the server;
3. The server matches the digest, or the digest plus the file name, against the digest dictionary; the digest is matched exactly and the file name is matched fuzzily. If the match succeeds, a one-to-one mapping is established and the mapping result is returned to the client; otherwise go to step 5;
4. The client receives the file mapping result and saves it as the compressed file; compression is finished;
5. The server optionally adds the digest, or the digest plus the file name, submitted by the client, together with the source file, to the dictionary;
6. The client compresses the file using compression method two below or a conventional compression method.
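The six steps above can be sketched as a single-process simulation. Everything here is a simplifying assumption: the server is an in-memory object rather than a network service, SHA-256 stands in for the unspecified hash, zlib stands in for the "conventional compression method", file-name fuzzy matching is left out, and the `decompress_file` helper is added only to show that the result is lossless (the patent text does not describe decompression).

```python
import hashlib
import zlib


class DigestServer:
    """Toy stand-in for the server-side digest dictionary."""

    def __init__(self, add_unknown: bool = True):
        self.by_digest = {}  # digest -> index
        self.files = []      # index -> file content
        self.add_unknown = add_unknown

    def lookup(self, digest: str):
        """Step 3: exact digest match; returns an index or None."""
        return self.by_digest.get(digest)

    def offer(self, digest: str, content: bytes) -> None:
        """Step 5: optionally add an unmatched file to the dictionary."""
        if self.add_unknown and digest not in self.by_digest:
            self.by_digest[digest] = len(self.files)
            self.files.append(content)


def compress_file(content: bytes, server: DigestServer):
    digest = hashlib.sha256(content).hexdigest()  # step 1
    index = server.lookup(digest)                 # steps 2 and 3
    if index is not None:
        return ("ref", index)                     # step 4: mapping is the result
    server.offer(digest, content)                 # step 5
    return ("raw", zlib.compress(content))        # step 6: conventional fallback


def decompress_file(record, server: DigestServer) -> bytes:
    kind, payload = record
    return server.files[payload] if kind == "ref" else zlib.decompress(payload)
```

In this sketch the first client to submit a file pays the conventional compression cost, and every later copy of the same file is reduced to a single index.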
Compression method two: when compressing, the client can divide the original data, or data that has undergone initial compression, into blocks. The segmentation can be fixed-length or variable-length. Each block is then compared with the server dictionary. If an identical block is found, only the number or address of that block in the dictionary needs to be recorded; if not, an existing compression algorithm is used to compress it, and the server can optionally add the data to its data dictionary.
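The two segmentation options can be sketched as follows. The fixed-length version is straightforward. For the variable-length version the patent gives no algorithm, so the sketch substitutes a simple content-defined rule: a boundary is placed wherever the sum of the last `window` bytes has all bits of `mask` set, subject to minimum and maximum block sizes. The rule, the parameter values and the function names are all assumptions.

```python
def split_fixed(data: bytes, size: int = 64) -> list:
    """Fixed-length segmentation: every block except the last has `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def split_variable(data: bytes, window: int = 16, mask: int = 0x3F,
                   min_size: int = 32, max_size: int = 512) -> list:
    """Variable-length segmentation driven by the content itself.

    Assumes window <= min_size <= max_size. The window sum is recomputed
    at each position for clarity rather than speed.
    """
    chunks = []
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        window_sum = sum(data[i + 1 - window:i + 1])
        if (window_sum & mask) == mask or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing block may be shorter than min_size
    return chunks
```

Content-defined boundaries tend to line up again after an insertion or deletion earlier in the data, whereas fixed-length boundaries shift for the rest of the file; that is the usual motivation for variable-length blocks in a shared block dictionary.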
The compression process of method two is as follows:
1. The client performs initial compression on the original data (optional);
2. The client splits the original data, or the data after initial compression, into blocks;
3. The client compares the data blocks one at a time with the server-side dictionary, or submits each data block to the server for comparison. If an identical data block is found in the dictionary, go to step 4; otherwise go to step 5;
4. The client obtains the index number or address of the data block on the server and uses it as the compression result for that block. If there are further data blocks, go to step 3; otherwise go to step 6;
5. The server optionally adds the data submitted by the client to the dictionary, and the client compresses the data block with an existing compression algorithm. If there are further data blocks, go to step 3; otherwise go to step 6;
6. The compression results of all data blocks are combined into the final compressed file.
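The block-level procedure above can be sketched in the same single-process style. The assumptions are: fixed-length blocks of 64 bytes, exact matching on a SHA-256 block hash (hash collisions are ignored), zlib as the "existing compression algorithm", no initial compression in step 1, and an in-memory object in place of a networked server. The `decompress_blocks` helper is an addition used only to check that the round trip is lossless.

```python
import hashlib
import zlib


class BlockServer:
    """Toy stand-in for the server-side block dictionary."""

    def __init__(self, add_unknown: bool = True):
        self.index_of = {}  # block hash -> index
        self.blocks = []    # index -> block content
        self.add_unknown = add_unknown

    def find(self, block: bytes):
        """Step 3: exact match by block hash; returns an index or None."""
        return self.index_of.get(hashlib.sha256(block).digest())

    def offer(self, block: bytes) -> None:
        """Step 5: optionally add an unmatched block to the dictionary."""
        key = hashlib.sha256(block).digest()
        if self.add_unknown and key not in self.index_of:
            self.index_of[key] = len(self.blocks)
            self.blocks.append(block)


def compress_blocks(data: bytes, server: BlockServer, block_size: int = 64) -> list:
    records = []
    for start in range(0, len(data), block_size):          # step 2
        block = data[start:start + block_size]
        index = server.find(block)                          # step 3
        if index is not None:
            records.append(("ref", index))                  # step 4
        else:
            server.offer(block)                             # step 5
            records.append(("raw", zlib.compress(block)))
    return records                                          # step 6


def decompress_blocks(records: list, server: BlockServer) -> bytes:
    return b"".join(
        server.blocks[payload] if kind == "ref" else zlib.decompress(payload)
        for kind, payload in records
    )
```

Because step 5 updates the dictionary while the file is still being processed, a block that repeats later in the same file is already found on the server by the time it recurs.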
Existing data compression algorithms hold that data can be compressed essentially because the data itself contains redundancy. Data compression uses various algorithms to reduce that redundancy to a minimum while reducing distortion as far as possible, thereby improving transmission efficiency and saving storage space. The lossless data compression method based on a network dictionary does not consider only the redundancy within the data itself; it takes a higher-level view and looks at redundancy between files. If existing compression algorithms consider how to reuse small subroutines inside a program, this method considers how to reuse large-scale components. Its main feature is that the server side stores a large number of dictionaries, which can be classified by various indexing methods, for example file name or file type. Compression can be applied either to a whole client file or to blocks of a client file: compression method one described above works on the whole file, and compression method two described above works on file blocks.

Claims (4)

1. A lossless data compression method based on a network dictionary, comprising a dictionary used by a compression algorithm, characterized in that the dictionary comes from a compressed-data dictionary stored on the server side, the client compares a file or file blocks against the server-side dictionary, and the matching value returned by the query is used as the compression result.
2. The lossless data compression method based on a network dictionary according to claim 1, characterized in that the server-side compressed-data dictionary may comprise either or both of the following two classes: a full-text dictionary built from file names, file contents and hash values, and a dictionary generated from fixed-length or variable-length feature values.
3. The lossless data compression method based on a network dictionary according to claim 1, characterized in that the client comparing a file against the server-side dictionary means that the client hashes the file to be compressed and compares the resulting message digest, or the message digest plus the file name, against the server side, wherein the message digest is matched exactly and the file name is matched fuzzily.
4. The lossless data compression method based on a network dictionary according to claim 1, characterized in that the client comparing file blocks against the server side may use exact matching or approximate matching, and the matching value returned afterwards comprises the server-side dictionary index value together with the difference between the client file and the server-side file, or that difference after compression.
CN 200910186807 2009-12-25 2009-12-25 Lossless data compression method based on network dictionary Expired - Fee Related CN101800761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910186807 CN101800761B (en) 2009-12-25 2009-12-25 Lossless data compression method based on network dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910186807 CN101800761B (en) 2009-12-25 2009-12-25 Lossless data compression method based on network dictionary

Publications (2)

Publication Number Publication Date
CN101800761A (en) 2010-08-11
CN101800761B (en) 2013-04-17

Family

ID=42596252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910186807 Expired - Fee Related CN101800761B (en) 2009-12-25 2009-12-25 Lossless data compression method based on network dictionary

Country Status (1)

Country Link
CN (1) CN101800761B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100428754C (en) * 2004-11-26 2008-10-22 上海理工大学 Medical record exchanging system based on ebXML
CN1972311A (en) * 2006-12-08 2007-05-30 华中科技大学 A stream media server system based on cluster balanced load

Also Published As

Publication number Publication date
CN101800761B (en) 2013-04-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: NANJING COMMUNICATIONS INSTITUTE OF TECHNOLOGY

Free format text: FORMER OWNER: WU HAO

Effective date: 20140609

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 332000 JIUJIANG, JIANGXI PROVINCE TO: NANJING, JIANGSU PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20140609

Address after: No. 629 Jiangsu Nanjing Science Park Avenue.

Patentee after: Nanjing Communications Institute of Technology

Address before: 332000 Department of electrical engineering, Jiujiang Vocational and Technical College, Jiangxi, Jiujiang

Patentee before: Wu Hao

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130417

Termination date: 20141225

EXPY Termination of patent right or utility model