WO2015019484A1 - Data compression device and data expansion device
- Publication number
- WO2015019484A1 PCT/JP2013/071617
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- plaintext
- compressed
- payload
- length
- Prior art date
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0688—Non-volatile semiconductor memory arrays
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3086—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6005—Decoder aspects
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6017—Methods or arrangements to increase the throughput
Definitions
- The present invention relates to data compression and data decompression techniques.
- LZ77 is a widely used lossless data compression algorithm published by Lempel and Ziv in 1977.
- LZ77 is based on slide dictionary type compression.
- While moving a pointer from the beginning to the end of a character string stream, LZ77 searches the portion of the stream of a predetermined length preceding the current pointer (referred to as a slide dictionary) for the longest match with the character string starting at the current pointer. LZ77 then converts the character string at the current pointer into a copy symbol, thereby reducing the data amount of the character string stream and compressing it.
- LZ77 is known for its simple principle, ease of implementation, and relatively good compression ratio. The size of the slide dictionary is arbitrary; the larger the slide dictionary, the higher the probability of finding a matching character string, so the compression ratio improves.
- Some storage systems that store and manage large amounts of data provide a data compression function, because storing more data in the same capacity reduces the cost per capacity.
- Storage systems are classified into two types: file storage that manages data in units of files, and block storage that manages data in units of fixed-size sectors.
- Most of the compression methods employed in such data compression functions are the above-described LZ77 or algorithms derived from LZ77.
- Patent Document 1 discloses a technique for increasing the speed of decompression of compressed data based on LZ77.
- Patent Document 1 discloses a parallelization technique for decompressing compressed data at high speed by software processing using a general-purpose processor. This technique divides plaintext data to be compressed into a plurality of blocks and compresses each block. The number of blocks is determined based on the parallel processing capability (number of cores, etc.) of the processor used for decompression.
- Patent Document 1 also discloses a method of making the size of each block variable, adjusting the division boundaries through repeated trial compression/decompression, and thereby leveling the decompression time across blocks.
- Data output throughput may need to be guaranteed.
- Decompression by software may not be able to guarantee that throughput.
- A data compression device includes: a division unit that divides input plaintext data into a plurality of plaintext blocks each having a predetermined plaintext block length; a compression unit that, for each plaintext block of the plurality of plaintext blocks, generates a payload by compressing the plaintext block using a slide dictionary type compression algorithm, generates a header indicating the length of the payload, and generates a compressed block including the header and the payload; and a concatenation unit that generates compressed data by concatenating the plurality of compressed blocks generated from the plurality of plaintext blocks.
- a data decompression device wherein plaintext data is divided into a plurality of plaintext blocks, each of the plurality of plaintext blocks has a predetermined plaintext block length, and each plaintext block of the plurality of plaintext blocks.
- a plaintext block is compressed by using a slide dictionary type compression algorithm, a payload is generated, a header indicating the length of the payload is generated, a compressed block including the header and the payload is generated, and a plurality of plaintext blocks are generated.
- Shows the configuration of the flash memory storage 101 according to an embodiment of the present invention.
- Shows an example of data compression by the Deflate algorithm.
- Shows an example of data decompression by the Deflate algorithm.
- Shows the configuration of the compression/decompression circuit 104 according to the first embodiment.
- Shows the data compression process of the first embodiment.
- Shows a specific example of the data compression process of the first embodiment.
- Shows the data decompression process of the first embodiment.
- Shows a specific example of the operation of the payload extraction unit 621.
- Shows a specific example of the configuration and operation of the code expansion unit 622.
- Shows a specific example of the configuration and operation of the character resolution unit 623.
- Shows the configuration of the compression/decompression circuit 104 according to the second embodiment.
- Shows a specific example of secondary compression.
- Flash memory storage such as an SSD (Solid State Drive) equipped with a large amount of NAND flash memory, which is a nonvolatile semiconductor memory, has attracted attention as a storage medium for storage systems in addition to or instead of the HDD (Hard Disk Drive).
- Because flash memory has no head seek mechanism like an HDD, there is little seek delay (latency) in data access. Flash memory therefore has excellent speed performance in random data reading. For this reason, in applications such as databases that require high-speed random reads, HDDs are increasingly being replaced with flash memory storage as the storage medium in storage systems.
- The bit cost of flash memory storage is falling year by year as the integration density of flash memory cells increases with the miniaturization of semiconductor processes, but it is still about 3 to 10 times higher than the bit cost of HDDs.
- This bit cost is a factor that makes users hesitate to adopt flash memory storage.
- the physical data size stored in the flash memory can be reduced by introducing a data compression technique in the flash memory storage. Then, the storage capacity of the flash memory storage can be virtually increased, and the bit cost can be reduced.
- FIG. 1 shows the configuration of a computer system according to an embodiment of the present invention.
- This computer system has a host controller 110 and a plurality of flash memory storages 101.
- the host controller 110 is connected to the flash memory storage 101 and controls the flash memory storage 101.
- the computer system may have one flash memory storage 101.
- The flash memory storage 101 includes a host I/F (Interface) 102, a flash memory controller 103, a plurality of flash memories 105, and a DRAM (Dynamic Random Access Memory) 106 that is a volatile memory.
- The flash memory 105 is a chip having a NAND flash memory.
- The flash memory controller 103 includes a compression/decompression circuit 104 and a microprocessor (MP) 107.
- The microprocessor 107 is connected to the host I/F 102, the flash memory 105, and the DRAM 106 and controls them.
- According to a program stored in the DRAM 106, the microprocessor 107 interprets the contents of read/write commands from the host controller 110, transmits and receives data to and from the host controller 110, controls compression/decompression of data by the compression/decompression circuit 104, and executes data transfer between the flash memory 105 and the DRAM 106.
- the computer system may be a storage system or a server system.
- the host controller 110 in the storage system is, for example, a storage controller that controls RAID (Redundant Arrays of Independent Disks) using the flash memory storage 101.
- the storage controller is connected to the host computer via a network such as a SAN (Storage Area Network), and controls the flash memory storage 101 according to a read / write command from the host computer.
- the host controller 110 in the server system is, for example, a server computer that performs read / write on the flash memory storage 101.
- The host I/F 102 is an interface mechanism for connecting to the host controller 110. In response to a read/write command, it transmits data to the host controller 110 or receives data from the host controller 110.
- The mechanism and commands of the host I/F 102 and its data transmission/reception protocol conform to, for example, an HDD-compatible interface specification.
- The compression/decompression circuit 104 is a logic circuit that performs data compression and data decompression. To reduce the amount of data stored in the flash memory 105, the compression/decompression circuit 104 losslessly compresses the write data received with a write command from the host controller 110 and generates compressed data. In addition, to transmit plaintext data to the host controller 110 in response to a read command from the host controller 110, the compression/decompression circuit 104 decompresses the compressed data read from the flash memory 105 and generates the plaintext data.
- On a write, the microprocessor 107 first stores the write data from the host controller 110 in the DRAM 106. At this point, the microprocessor 107 returns a write completion response to the host controller 110. Thereafter, the microprocessor 107 instructs the compression/decompression circuit 104 to compress the write data. In response to this instruction, the compression/decompression circuit 104 compresses the write data into compressed data and stores the compressed data in the DRAM 106. The microprocessor 107 then writes the compressed data to the flash memory 105.
- On a read, the microprocessor 107 reads the compressed data from the flash memory 105 in response to a read command from the host controller 110 and stores it in the DRAM 106. Thereafter, the microprocessor 107 instructs the compression/decompression circuit 104 to decompress the compressed data. In response to this instruction, the compression/decompression circuit 104 decompresses the compressed data into plaintext data and stores the plaintext data in the DRAM 106. The microprocessor 107 then transmits the plaintext data stored in the DRAM 106 to the host controller 110.
- Because the write completion response is returned before compression is performed, the write performance of the flash memory storage 101 as seen from the host controller 110 is constant regardless of whether compression is performed.
- In contrast, the read performance of the flash memory storage 101 as seen from the host controller 110 depends on the decompression time of the compressed data. That is, if the compression/decompression circuit 104 can perform high-speed, real-time decompression, the read performance of the flash memory storage 101 can be improved.
- The compression/decompression circuit 104 is implemented as a logic circuit. Because the compression/decompression circuit 104 has a high data decompression throughput, the high-speed random read performance characteristic of the flash memory storage 101 can be realized not only for uncompressed data but also for compressed data. In the storage system, when a read request for compressed data is received from the host controller 110, the throughput of decompressing the compressed data and returning the plaintext data to the host controller 110 is not significantly degraded compared with reading ordinary uncompressed data. In addition, it is preferable to decompress the compressed data as fast as possible.
- the use case of the present invention is not limited to the flash memory storage 101.
- the present invention can also be applied to the data compression function of other storage systems.
- the capacity cost is reduced by using the storage equipped with the compression function.
- the data compression function is often used for static data with low read access frequency, such as backup and snapshot images. According to the storage system to which the present invention is applied, high-speed read performance can be ensured even for compressed data, so the data compression function can be used for dynamic data with high read access frequency.
- FIG. 2 shows an example of data compression by the Deflate algorithm.
- the data compression by the Deflate algorithm includes LZ77 compression and encoding to a bit pattern.
- This figure shows plaintext data 200 represented as a character string, LZ77 compressed data 210 that is a character string obtained by compressing plaintext data 200 with LZ77, and compressed data 220 that is a bit pattern obtained by encoding LZ77 compressed data 210.
- the encoding is, for example, Huffman encoding.
- The data compression function sequentially checks whether a character string that has already appeared in the plaintext data 200, which is a character string stream, appears again later.
- When a character string matches L consecutive characters starting J characters before its first character, the data compression function converts the character string into a copy symbol [L, J]. L is sometimes called the copy length, and J is sometimes called the copy distance.
- For example, the four-character string 201 of “b, c, d, e” matches four consecutive characters starting six characters before its first character “b”. In this case, the data compression function converts the character string 201 into a copy symbol 211 of [4, 6].
- The four-character string 202 of “a, b, a, b” matches four consecutive characters, including an overlapping portion, starting two characters before its first character “a”. In this case, the data compression function converts the character string 202 into a copy symbol 212 of [4, 2].
- The four-character string 203 of “c, d, e, f” matches four consecutive characters starting 14 characters before its first character “c”. In this case, the data compression function converts the character string 203 into a copy symbol 213 of [4, 14].
- This conversion generates LZ77 compressed data 210 in which the data amount of the plaintext data 200 is reduced.
- This conversion is not performed for characters that appear for the first time in the plaintext data 200 or for character strings with fewer than three consecutively matching characters (that is, L < 3), because the original character string already has a sufficiently small data amount.
- The range of the character string stream referred to in the matching search described above (hereinafter referred to as a dictionary) extends backward from the immediately preceding character by the number of characters of a predetermined dictionary size. This is because, if the dictionary size were not limited, the amount of data needed to represent J (the distance to go back) of the copy symbol [L, J] would increase and the data reduction effect would decrease. Moreover, the search time would increase and performance would decrease. Because the dictionary range slides toward the rear of the stream with each search, this dictionary is also called a slide dictionary, and LZ77 is one of the slide dictionary type compression algorithms (slide dictionary method).
- The data compression function converts the longest matching character string into a copy symbol, which reduces the amount of data further.
- LZ77 compression is character string conversion that reduces the amount of data by using a character string in the past range up to the dictionary size as a slide dictionary.
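The conversion into literal characters and copy symbols [L, J] described above can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the function name `lz77_symbols`, the default dictionary size of 32 characters, and the brute-force search are assumptions made for readability. The minimum copy length of 3 follows the L < 3 rule stated above.

```python
def lz77_symbols(text, dict_size=32, min_len=3):
    """Convert text into literal characters (str) and copy symbols (L, J).

    dict_size is a hypothetical slide dictionary size; min_len=3 follows the
    rule that strings with fewer than three matching characters stay literal.
    """
    out = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Try every copy distance J that still lies inside the slide dictionary.
        for dist in range(1, min(dict_size, i) + 1):
            length = 0
            # The matched source may overlap the current position (J < L).
            while (i + length < len(text)
                   and text[i + length - dist] == text[i + length]):
                length += 1
            if length > best_len:  # keep the longest match
                best_len, best_dist = length, dist
        if best_len >= min_len:
            out.append((best_len, best_dist))  # copy symbol [L, J]
            i += best_len
        else:
            out.append(text[i])  # literal character
            i += 1
    return out
```

For instance, `lz77_symbols("ababab")` yields `['a', 'b', (4, 2)]`, an overlapping match of the same kind as the copy symbol [4, 2] in the description.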
- Compressed data is not yet generated just by performing LZ77 compression.
- the copy symbol is a symbol that represents the operation of copying, and is not compressed data.
- The data compression function encodes each character that has not been converted to a copy symbol (hereinafter referred to as a literal character) and each copy symbol with a specified encoding method, and concatenates the results to form a bit stream.
- This bit stream is the result of encoding according to the Deflate rules, and is the compressed data 220 that is finally generated.
- The bit pattern 221 is 13 bits long and is the codeword of the copy symbol [4, 6].
- The bit pattern 222 is 12 bits long and is the codeword of the copy symbol [4, 2].
- The bit pattern 223 is 14 bits long and is the codeword of the copy symbol [4, 14].
- The other bit patterns, each 8 bits long, are codewords of literal characters. Thus, the bit pattern length of each codeword in the compressed data 220 is not fixed.
- FIG. 3 shows an example of data expansion by the Deflate algorithm.
- the data decompression function takes the bit stream of the compressed data 220 as input, and reverses the processing of the data compression function to restore the original plaintext data 200.
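The second half of this reversal, turning literal characters and copy symbols back into plaintext, can be sketched as below. It assumes the bit patterns have already been decoded into symbols; the function name `lz77_expand` and the tuple representation `(L, J)` are illustrative choices, not part of the source.

```python
def lz77_expand(symbols):
    """Rebuild plaintext from literal characters (str) and copy symbols (L, J)."""
    out = []
    for sym in symbols:
        if isinstance(sym, tuple):
            length, dist = sym
            # Copy one character at a time so that an overlapping copy
            # (dist < length) can read characters it has just written.
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(sym)
    return "".join(out)
```

For example, `lz77_expand(["a", "b", (4, 2)])` returns `"ababab"`.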
- To do so, the data decompression function needs to extract the individual bit patterns (codewords) from the bit stream of the compressed data 220, in which a plurality of bit patterns are concatenated. However, since the bit pattern length of each codeword is not fixed, it is difficult to extract a bit pattern directly from an arbitrary position.
- The data decompression function therefore basically sets an extraction pointer at the beginning of the bit stream of the compressed data 220 and extracts bit patterns one after another in order.
- Once the preceding bit pattern has been decoded, the extraction pointer is moved (242) and set to position 243. Then, if the bit pattern “10010010” starting there is found to represent the character “b”, the data decompression function moves the extraction pointer (244) and sets it to position 245. In this way, the data decompression function can determine the next position of the extraction pointer only after the previous bit pattern has been decoded. In general, the decoding process that returns the compressed data 220 to the symbolic representation of the LZ77 compressed data 210 is this kind of inefficient serial process.
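The serial dependency can be seen in a small sketch. The prefix code `TOY_CODE` below is a made-up four-symbol table used only for illustration; it is not the Deflate code table, and the bit stream is modeled as a string of "0" and "1" characters.

```python
# Hypothetical variable-length prefix code (NOT the Deflate tables).
TOY_CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}

def decode_serial(bits, code=TOY_CODE):
    """Decode a bit string one codeword at a time, from the beginning."""
    out, pointer = [], 0
    max_len = max(len(word) for word in code)
    while pointer < len(bits):
        for n in range(1, max_len + 1):
            word = bits[pointer:pointer + n]
            if word in code:
                out.append(code[word])
                # The next extraction position is known only now,
                # after the current codeword has been identified.
                pointer += n
                break
        else:
            raise ValueError(f"no codeword starts at bit {pointer}")
    return "".join(out)
```

Here `decode_serial("0101100111")` returns `"abcad"`. Each loop iteration needs the pointer produced by the previous one, which is why the extraction cannot simply start from an arbitrary position.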
- If the data decompression function knows in advance the possible bit pattern lengths of the codewords, it can enumerate extraction point candidates 247 covering all possible arrangements of codewords.
- The data decompression function then knows in advance that exactly one real extraction point exists among the extraction point candidates 247. Therefore, in addition to extracting codewords in order from the beginning, it starts provisional extraction at every point in the extraction point candidates 247 at the same time.
- Later, the correct extraction point among the candidates 247 is narrowed down to one. Because extraction from that point has already progressed by then, processing performance improves.
- However, realizing this method in hardware requires a large number of bit pattern decoders, so the circuit scale becomes very large. This method is therefore difficult to use in practice.
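A rough software model of this provisional-extraction idea is sketched below, under simplifying assumptions: every bit offset is treated as an extraction point candidate, the prefix code `TOY_CODE` is a made-up table rather than the Deflate tables, and the "parallel" stage is just a list comprehension. The count of unused provisional decodes gives a feel for why the hardware cost grows.

```python
# Hypothetical variable-length prefix code (NOT the Deflate tables).
TOY_CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}

def decode_at(bits, pos, code):
    """Return (symbol, codeword_length) for the codeword at pos, or None."""
    for n in range(1, max(len(word) for word in code) + 1):
        word = bits[pos:pos + n]
        if word in code:
            return code[word], n
    return None

def decode_speculative(bits, code=TOY_CODE):
    """Decode by provisional extraction at every candidate point."""
    # Stage 1: one provisional decode per candidate (one decoder each in hardware).
    provisional = [decode_at(bits, p, code) for p in range(len(bits))]
    # Stage 2: follow the chain of real extraction points from the beginning.
    out, pos, unused = [], 0, len(bits)
    while pos < len(bits):
        hit = provisional[pos]
        if hit is None:
            raise ValueError(f"no codeword starts at bit {pos}")
        symbol, n = hit
        out.append(symbol)
        pos += n
        unused -= 1
    return "".join(out), unused
```

With the input `"0101100111"` the function returns `("abcad", 5)`: ten provisional decodes were performed, but only five lay on the real chain of extraction points.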
- The compression/decompression circuit 104 of this embodiment can increase the speed of data decompression by hardware processing while suppressing the increase in circuit scale and cost.
- FIG. 4 shows the configuration of the compression/decompression circuit 104 according to the first embodiment.
- The compression/decompression circuit 104 includes a compression circuit 610 that performs data compression processing and a decompression circuit 620 that performs data decompression processing.
- The compression circuit 610 includes a dividing unit 611, a compression unit 617, and a concatenation unit 616.
- Plain text data is input to the dividing unit 611.
- the compression unit 617 includes a payload generation unit 612 and a header generation unit 613.
- the dividing unit 611 divides plaintext data into a plurality of plaintext blocks.
- the payload generation unit 612 generates a compressed payload by compressing a plaintext block using a slide dictionary type compression algorithm.
- the header generation unit 613 generates a header indicating the length of the compressed payload, and generates a compressed block including the compressed header and the compressed payload.
- the concatenation unit 616 generates compressed data by concatenating a plurality of compressed blocks.
- the compressed header is sometimes called a header.
- the compressed payload may be referred to as a payload.
- The decompression circuit 620 includes an extraction unit 627 and a decompression unit 628.
- the extraction unit 627 has a payload extraction unit 621.
- the decompression unit 628 includes a code expansion unit 622 and a character resolution unit 623.
- the payload extraction unit 621 receives compressed data.
- the payload extraction unit 621 recognizes the compressed header in the compressed data, and extracts the compressed payload from the compressed data based on the payload length indicated in the recognized compressed header.
- the code expansion unit 622 converts the compressed payload into an intermediate block having the same length as the plaintext block.
- the character resolution unit 623 restores the plaintext block by resolving unresolved characters in the intermediate block using the slide dictionary.
- the code expansion unit 622 and the character resolution unit 623 have a pipeline for converting the compressed payload into a plaintext block.
- the plaintext data to be compressed is divided into a plurality of plaintext blocks of equal size.
- the plaintext block length is N [bytes].
- the value of N is determined using the output throughput value, which is the throughput at which the decompression circuit 620 decompresses the compressed data and outputs plain text data, and the drive clock frequency of the decompression circuit 620.
- One byte in the plaintext data 400 means one character.
- each divided plaintext block includes a character string of N characters.
- the decompression circuit 620 decompresses the compressed data, and outputs plaintext data of N bytes for each drive clock cycle, so that plaintext data can be output at a predetermined output throughput.
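The source does not spell out the formula for N, but since N bytes are output per drive clock cycle, one plausible reading is that N is the required output throughput divided by the clock frequency, rounded up. The sketch below works under that assumption; the function name and the numeric values in the usage note are hypothetical.

```python
def plaintext_block_length(output_throughput_bytes_per_s, clock_hz):
    """Smallest N such that N bytes per clock cycle meets the output throughput.

    Assumes one plaintext block of N bytes is output every drive clock cycle.
    Integer arithmetic is used for the ceiling division to avoid rounding error.
    """
    if output_throughput_bytes_per_s <= 0 or clock_hz <= 0:
        raise ValueError("throughput and clock frequency must be positive")
    return (output_throughput_bytes_per_s + clock_hz - 1) // clock_hz
```

For example, a target of 2,000,000,000 bytes per second at a 250 MHz clock gives `plaintext_block_length(2_000_000_000, 250_000_000) == 8`.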
- FIG. 5 shows the data compression processing of the first embodiment.
- the dividing unit 611 divides the input plaintext data into a plurality of blocks in units of N characters (bytes) (301).
- the payload generation unit 612 compresses each divided plaintext block by LZ77 compression as described above, and generates a bit stream by bit pattern encoding (302). This bit stream is called a compressed payload.
- In step 302, the payload generation unit 612 performs LZ77 compression of each plaintext block according to the following two rules.
- The first rule is that the range of the slide dictionary may cross plaintext block boundaries. That is, as in general LZ77 compression, the slide dictionary has a predetermined dictionary size and slides toward the rear of the plaintext data regardless of plaintext block boundaries.
- The second rule is that, while a match search is being performed between a character string in the plaintext block and the slide dictionary, the search is stopped when the character string reaches the end of the plaintext block. That is, a character string straddling a plaintext block boundary is never converted into a single copy symbol. For example, if a 10-character string matches a string J characters earlier, but a plaintext block boundary lies between its first 6 characters and its last 4 characters, the first 6 characters are converted into the copy symbol [6, J] and included before the boundary, and the last 4 characters are converted into the copy symbol [4, J] and included after the boundary.
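The effect of the second rule on a single match can be sketched as follows. This is a simplification: the source stops the search at the block boundary and starts a fresh search in the next block, whereas this hypothetical helper just cuts an already found match into pieces, and it ignores that a piece shorter than three characters would normally be emitted as literal characters.

```python
def split_at_block_boundary(pos, length, dist, block_len):
    """Split copy symbol [length, dist] so that no piece crosses a block boundary.

    pos is the plaintext offset where the matched string starts and
    block_len is the plaintext block length N.
    """
    pieces = []
    while length > 0:
        room = block_len - (pos % block_len)  # characters left in this block
        take = min(length, room)
        pieces.append((take, dist))
        pos += take
        length -= take
    return pieces
```

With N = 8, a 10-character match starting at offset 10 has six characters left in its block, so `split_at_block_boundary(10, 10, 20, 8)` returns `[(6, 20), (4, 20)]`, mirroring the [6, J] and [4, J] example above.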
- The header generation unit 613 calculates the length of the compressed payload in bits, generates a compressed header indicating that length, and adds the compressed header immediately before each compressed payload to obtain a compressed block (303).
- the compressed header is, for example, a bit pattern (1001010 for 74 bits) representing the bit length itself of the compressed payload.
- the compressed header is not limited thereto, and any data may be used as long as the bit length of the compressed payload can be specified.
- the concatenation unit 616 generates the compressed data 440 (bit stream) by concatenating all the compressed blocks (304), and ends this flow.
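Steps 303 and 304 can be sketched as below. Bit streams are modelled as strings of '0' and '1' for readability. The fixed 7-bit header width is an assumption chosen to match the "1001010 for 74 bits" example; the text allows any header from which the payload bit length can be determined.

```python
HEADER_BITS = 7  # assumed fixed header width; covers payloads of up to 127 bits

def frame(payloads: list[str]) -> str:
    """Prefix each payload with its bit length and concatenate the blocks."""
    stream = []
    for payload in payloads:
        if len(payload) >= 1 << HEADER_BITS:
            raise ValueError("payload too long for the assumed header width")
        header = format(len(payload), f"0{HEADER_BITS}b")  # step 303
        stream.append(header + payload)                    # one compressed block
    return "".join(stream)                                 # step 304

print(format(74, "07b"))   # 1001010
print(frame(["101"]))      # 0000011101
```

Because every header has the same width under this assumption, a reader of the stream can always find the next header by skipping the header width plus the decoded payload length.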
- FIG. 6 shows a specific example of the data compression processing of the first embodiment.
- In this specific example, the plaintext block length N is 8, as obtained by the plaintext block length calculation method described above.
- The dividing unit 611 divides the plaintext data into units of 8 bytes. In this specific example, the first 40 bytes of the plaintext data 400 are divided into plaintext blocks 401, 402, 403, 404, and 405, and the division continues in the same way for the rest of the data. Next, the payload generation unit 612 compresses the data of each plaintext block by LZ77.
- the plaintext block 401 is compressed into a 24-bit compressed payload 421 using the range from the beginning of the plaintext data 400 to the compression target as a slide dictionary.
- the plaintext block 402 is compressed into a 40-bit compressed payload 422 using the range 412 from the beginning of the plaintext data 400 to the compression target as a slide dictionary.
- the plaintext block 403 is compressed into a 48-bit compressed payload 423 using the range 413 from the beginning of the plaintext data 400 to the compression target as a slide dictionary.
- the plaintext block 404 is compressed into a 16-bit compressed payload 424 using the range 414 from the beginning of the plaintext data 400 to the compression target as a slide dictionary.
- the plaintext block 405 is compressed into a 40-bit compressed payload 425 using the range 415 from the beginning of the plaintext data 400 to the compression target as a slide dictionary. Thereafter, similarly, compression is performed up to the final plaintext block of the plaintext data.
- In this example, the dictionary range of each plaintext block starts from the beginning of the plaintext data 400. In subsequent plaintext blocks, however, once the size of the character string included in the slide dictionary reaches the predetermined dictionary size, the beginning of the slide dictionary range slides toward the rear of the plaintext data 400, as in conventional LZ77 compression. Thereafter, the slide dictionary holds the character string that extends back from the compression target by the dictionary size.
- The compressed headers 431, 432, 433, 434, 435, ... are data indicating the bit lengths of the compressed payloads 421, 422, 423, 424, 425, ..., respectively.
- the final compressed data 440 is a bit stream in which compressed blocks that are a combination of a compressed header and a compressed payload are concatenated.
- In this way, the plaintext data can be divided into a plurality of plaintext blocks of the same size, each plaintext block can be compressed using LZ77 compression to generate a compressed payload, and a compressed header indicating the length of the compressed payload can be added.
- the compressed payload can be extracted based on the compressed header, and the next compressed block can be extracted.
- FIG. 7 shows data decompression processing of the first embodiment.
- The payload extraction unit 621 has a buffer register. First, the payload extraction unit 621 receives the input compressed data into the buffer register and sets an analysis pointer, which indicates the position of the bit pattern to be analyzed, at the head of the bit stream of the compressed data (that is, the head of the first compressed header).
- The payload extraction unit 621 analyzes the compressed header starting from the analysis pointer, determines the length of the subsequent compressed payload based on the compressed header, and extracts the compressed payload based on that length (501).
- the payload extraction unit 621 sends the extracted compressed payload to the pipeline of the code expansion unit 622 and the character resolution unit 623 (502).
- the payload extraction unit 621 determines whether the end of the extracted compressed payload is the end of the compressed data (503). If the end of the extracted compressed payload is the end of the compressed data, the payload extraction unit 621 proceeds to Step 505. On the other hand, if it is not the end of the compressed data, the payload extraction unit 621 moves the analysis pointer to the next compressed header (504). Then, the payload extraction unit 621 makes the process transition to Step 501 again.
- In step 505, the payload extraction unit 621 waits until the last compressed payload sent to the pipeline of the code expansion unit 622 and the character resolution unit 623 has finished passing through the pipeline (505).
- the payload extraction unit 621 performs this process in one drive clock cycle.
- Those skilled in the art who are familiar with logic circuit design can easily design the hardware of the payload extractor 621.
- the payload extraction unit 621 is a logic circuit having the following configuration, for example.
- Buffer register that stores compressed data input from the outside
- Bit pattern decoder that is connected to the head of the buffer register and analyzes the compressed header
- Data loader that extracts the payload based on the decoding result of the bit pattern decoder and passes it to the above-mentioned pipeline
- Barrel shifter that moves the data so that the next compressed header comes to the head of the buffer register, based on the decoding result of the bit pattern decoder
- each circuit element operates simultaneously. As a result, one compressed payload is extracted for each cycle from the bit stream of the compressed data. Each compressed payload is passed to the pipeline of the code expansion unit 622 and the character resolution unit 623 every cycle.
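The extraction flow of steps 501 to 504 can be sketched in software as a loop over the bit stream. Each loop iteration corresponds to what the hardware does in one cycle. As an assumption for this example, bit streams are strings of '0' and '1' and every compressed header is a fixed 7-bit length field.

```python
def extract_payloads(stream: str, header_bits: int = 7) -> list[str]:
    """Split a concatenated bit stream into its compressed payloads."""
    payloads = []
    pointer = 0                                     # analysis pointer
    while pointer < len(stream):
        if pointer + header_bits > len(stream):
            raise ValueError("truncated compressed header")
        # Step 501: decode the header, then cut out the payload that follows.
        length = int(stream[pointer:pointer + header_bits], 2)
        start = pointer + header_bits
        payloads.append(stream[start:start + length])   # step 502: hand to pipeline
        pointer = start + length                        # step 504: next header
    return payloads

# Two blocks: a 3-bit payload "101" and a 2-bit payload "11".
print(extract_payloads("0000011" + "101" + "0000010" + "11"))  # ['101', '11']
```

The pointer update plays the role of the barrel shifter, and the slice that is appended plays the role of the data loader.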
- FIG. 8 shows a specific example of the operation of the payload extraction unit 621.
- This figure shows a configuration of the compressed data 700 and a timing chart 750 showing the operation of the pipeline.
- The payload extraction unit 621 extracts the subsequent compressed payload 721 based on the compressed payload length indicated in the first compressed header 711 in the compressed data 700, sends it to the pipeline, and moves the analysis pointer to the next compressed header 712.
- The payload extraction unit 621 extracts the subsequent compressed payload 722 based on the compressed payload length indicated in the compressed header 712, sends it to the pipeline, and moves the analysis pointer to the next compressed header 713.
- the payload extraction unit 621 extracts the subsequent compressed payload 723 based on the compressed payload length indicated in the compressed header 713, sends it to the pipeline, and moves the analysis pointer to the next compressed header 714.
- the payload extraction unit 621 extracts the compressed payload up to the end of the compressed data.
- the timing chart 750 shows an operation period of data expansion by the pipelines of the code expansion unit 622 and the character resolution unit 623 for each compressed payload.
- the operation period 731 indicates the timing at which the compressed payload 721 is processed by the code expansion unit 622.
- the operation period 732 indicates the timing at which the compressed payload 722 is processed by the code expansion unit 622.
- An operation period 733 indicates the timing at which the compressed payload 723 is processed by the code expansion unit 622.
- the operation period 741 indicates the timing at which the compressed payload 721 is processed by the character resolution unit 623.
- An operation period 742 indicates the timing at which the compressed payload 722 is processed by the character resolution unit 623.
- the operation period 743 indicates the timing at which the compressed payload 723 is processed by the character resolution unit 623.
- the pipeline processing timings of the code expansion unit 622 and the character resolution unit 623 of each of the plurality of compressed payloads constituting the compressed data 700 are shifted by one cycle.
- the code expansion unit 622 and the character resolution unit 623 simultaneously perform eight pipeline processes.
- The pipeline of the code expansion unit 622 converts a compressed payload into an intermediate block in 8 cycles, and the pipeline of the character resolution unit 623 converts the intermediate block into a plaintext block in 1 cycle.
- FIG. 9 shows a specific example of the configuration and operation of the code expansion unit 622.
- the code expansion unit 622 includes a bit pattern decoder 800, and the bit pattern decoder 800 analyzes the head portion of the bit stream of each compressed payload.
- the bit pattern decoder 800 has a function of generating an intermediate block corresponding to the compressed payload.
- the intermediate block is intermediate data for data decompression processing, has the same length as the plaintext block, and has an element of N characters (8 characters).
- the compressed payload element is a bit pattern indicating a literal character or a bit pattern indicating a copy symbol [L, J].
- When the bit pattern decoder 800 detects a bit pattern indicating a literal character at the head of the bit stream, it appends the literal character to the intermediate block. When the bit pattern decoder 800 detects a bit pattern indicating the copy symbol [L, J], it appends as many undetermined characters [J] as the copy length L to the intermediate block.
- the compressed payload 723 is a bit stream of “e, g, c, [3, J], a, b”.
- the lowermost row shows the head bit pattern. Processing of the compressed payload 723 in the pipeline is performed from time T0 to time T8 in the timing chart 750.
- the code expansion unit 622 detects the first “e” bit pattern from the bit stream of the compressed payload 723.
- the code expansion unit 622 sets the literal character “e” 811 in the intermediate block.
- the code expansion unit 622 shifts the bitstream forward and detects the next “g” bit pattern.
- the code expansion unit 622 adds a literal character “g” 812 to the intermediate block.
- the code expansion unit 622 shifts the bit stream forward and detects the next “c” bit pattern.
- the code expansion unit 622 appends the literal character “c” 813 to the intermediate block.
- the code expansion unit 622 shifts the bit stream forward and detects the bit pattern of the copy symbol [3, J].
- the code expansion unit 622 adds three unconfirmed characters [J] 814 to the intermediate block.
- the code expansion unit 622 shifts the bit stream forward and detects the bit pattern of “a”.
- the code expansion unit 622 adds the literal character “a” 815 to the intermediate block.
- the code expansion unit 622 shifts the bit stream forward and detects the bit pattern “b”.
- the code expansion unit 622 appends the literal character “b” 816 to the intermediate block.
- the code expansion unit 622 completes the entry of N characters (8 characters) in the intermediate block.
- the code expansion unit 622 shifts the bit stream forward, and as a result, the bit stream becomes empty.
- the code expansion unit 622 does not perform bit pattern analysis. Therefore, the contents of the intermediate block do not change at times T7 and T8.
- The timing chart 750 indicates, by hatching, the periods during which the content of the intermediate block does not change in the code expansion unit 622. Since the bit stream of any compressed payload is a string of bit patterns representing at most N bytes (8 bytes) of plaintext, the corresponding intermediate block is always complete N cycles (8 cycles) after the compressed payload enters the code expansion unit 622.
- The compressed payload 723 is thus converted into the intermediate block “e, g, c, [J], [J], [J], a, b”.
- In this intermediate block, each of the five literal characters (e, g, c, a, b) means that the character itself occupies that position in the plaintext block.
- Each of the three undetermined characters [J] carries the numerical value J, which means that the literal character located J characters in the past occupies that position in the plaintext block; which character that is has not yet been resolved. Thereafter, the character resolution unit 623 resolves the undetermined characters.
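The conversion from a compressed payload to an intermediate block can be sketched as follows. The bit pattern encoding is not modelled here: the input is assumed to be an already decoded list of symbols, with one-character strings for literals and `(L, J)` tuples for copy symbols, and an undetermined character [J] is represented by the integer J. The value J = 26 is a placeholder.

```python
def expand_codes(symbols, n=8):
    """Build an intermediate block of n elements from decoded symbols."""
    block = []
    for sym in symbols:
        if isinstance(sym, tuple):            # copy symbol [L, J]
            length, distance = sym
            block.extend([distance] * length) # L undetermined characters [J]
        else:                                 # literal character
            block.append(sym)
    if len(block) != n:
        raise ValueError("payload does not expand to one plaintext block")
    return block

J = 26  # hypothetical copy distance
print(expand_codes(["e", "g", "c", (3, J), "a", "b"]))
# ['e', 'g', 'c', 26, 26, 26, 'a', 'b']
```

The loop handles one symbol per iteration, which mirrors the one-bit-pattern-per-cycle behaviour of the bit pattern decoder 800; no dictionary access is needed at this stage.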
- Since the code expansion unit 622 in this specific example is an N-stage (8-stage) pipeline-type arithmetic circuit, the bit streams of eight compressed payloads are processed simultaneously by the bit pattern decoders 800 arranged in the horizontal direction.
- This specific example shows only the processing of the code expansion unit 622 for the compressed payload 723, but the processing for the other seven compressed payloads is also proceeding on other registers at the same time.
- The intermediate block corresponding to the compressed payload 723 is in the register in the third column from the left, the intermediate block corresponding to the compressed payload 722 is in the register in the fourth column from the left, and the intermediate block corresponding to the compressed payload 721 is in the register in the fifth column from the left.
- the code expansion unit 622 converts the code word to a literal character when the code word in the compressed payload indicates a literal character, and converts the code word to an undetermined character indicating that it is undetermined when the code word indicates a non-literal character.
- an intermediate block having a length of N characters can be generated from the compressed payload.
- the subsequent character resolution unit 623 can process N characters in one cycle.
- FIG. 10 shows a specific example of the configuration and operation of the character resolution unit 623.
- the character resolution unit 623 is installed as a process following the code expansion unit 622 and includes a dictionary register 900 and a data selector 910.
- the dictionary register 900 implements a slide dictionary.
- the data selector 910 detects an undetermined character included in the intermediate block output from the code expansion unit 622.
- The data selector 910 generates a plaintext block by selecting from the dictionary register 900 the character indicated by the undetermined character 901 and replacing the undetermined character with the selected character, outputs the plaintext block, and inputs it to the dictionary register 900.
- This specific example shows the operation of the character resolution unit 623 for the intermediate block 920 corresponding to the compressed payload 723.
- the intermediate block 920 is input to the character resolution unit 623 at time T8.
- the lowermost row indicates the first character.
- The data selector 910 detects an undetermined character string 901 of length 3 in the intermediate block 920. The data selector 910 then selects from the dictionary register 900 the character string of length 3 located J characters before the position of the undetermined character string 901. For example, when J is 26 in the undetermined character string 901, the character string 902 “x, y, z”, located 26 characters earlier in the dictionary register 900, is the corresponding string.
- each column indicates a plaintext block, and the position below the character in the plaintext block indicates a past character.
- the right position of the character in the plaintext block indicates the past character by the plaintext block length. Therefore, the data selector 910 determines the character string 903 by replacing the undetermined character string 901 with the character string 902.
- the plaintext block 904 “e, g, c, x, y, z, a, b” corresponding to the compressed payload 723 is restored at time T9.
- the restored N-character (8-character) plaintext block 904 is output as a part of plaintext data from the decompression circuit 620 and is added to the dictionary register 900 at the same time.
- the dictionary register is a shift register in units of N characters (8 characters). When a new plaintext block is added from the left end, the oldest plaintext block is discarded from the right end.
- the number of stages in the dictionary register 900 is the number obtained by dividing the dictionary size by N.
- the plaintext block can be restored by the character resolution unit 623 replacing the unconfirmed character with the literal character in the dictionary register 900. Further, the character resolution unit 623 processes N characters per cycle, so that the output throughput can be N times the driving clock.
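The character resolution step can be sketched in software as below. This is an illustration under assumptions: the dictionary register is modelled as a plain list of already restored characters, undetermined characters are integers holding J, and the block is resolved one element at a time, whereas the data selector 910 resolves all N characters in a single cycle.

```python
def resolve_block(intermediate, history, dict_size=4096):
    """Replace undetermined entries (ints holding J) with characters from history.

    `history` is the list of already restored characters, oldest first.
    Returns the restored plaintext block and the updated, trimmed history.
    """
    out = list(history)
    base = len(out)
    for elem in intermediate:
        if isinstance(elem, int):                  # undetermined character [J]
            if not 0 < elem <= len(out):
                raise ValueError("copy distance points outside the dictionary")
            out.append(out[len(out) - elem])       # character J positions back
        else:                                      # literal character
            out.append(elem)
    return out[base:], out[-dict_size:]

history = list("012345678xyz" + "-" * 20)          # 32 characters already restored
block, history = resolve_block(["e", "g", "c", 26, 26, 26, "a", "b"], history)
print("".join(block))  # egcxyzab
```

Trimming the history to `dict_size` characters corresponds to the oldest plaintext block being discarded from the shift register when a new one is added.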
- the decompression circuit 620 can extract the payload from the compressed data based on the payload length indicated in the recognized header by recognizing the compressed header of each of the plurality of compressed blocks. Furthermore, a plaintext block can be restored by decompressing the payload using a slide dictionary type compression algorithm.
- Since the payload extraction unit 621 extracts one compressed payload per cycle from the compressed data and the pipeline of the code expansion unit 622 and the character resolution unit 623 restores the plaintext blocks, plaintext data can be output at N characters per cycle.
- the decompression throughput of data compressed by the slide dictionary type compression algorithm is always a predetermined value regardless of the data content.
- the throughput can be arbitrarily determined at the hardware design stage. Thereby, for example, in a storage system having a data compression function, high-speed read performance can be guaranteed even for compressed data.
- In data compression and decompression by software, the plaintext data tends to be large and the dictionary size is correspondingly large.
- In contrast, since the flash memory storage 101 according to the present embodiment reads and writes in units of pages (several kilobytes), for example, both the plaintext data size and the dictionary size can be made smaller than in data compression and decompression by software.
- With LZ77 compression, the amount of data is reduced by converting a character string that matches a past character string into a copy symbol.
- In the second embodiment, secondary compression is further performed: consecutive copy symbols are combined into one copy symbol, which further reduces the data amount.
- FIG. 11 shows the configuration of the compression / decompression circuit 104 of the second embodiment.
- the compression unit 617 according to the present embodiment includes a conversion unit 615 for secondary compression in addition to the elements of the compression unit 617 according to the first embodiment.
- the extraction unit 627 of the present embodiment includes an inverse conversion unit 625 for decompression with respect to secondary compression.
- When each of a plurality of consecutive compressed payloads consists of one copy symbol and the copy distance J of these copy symbols is the same, the conversion unit 615 combines the copy symbols into one copy symbol.
- FIG. 12 shows a specific example of secondary compression.
- In this specific example, N = 8.
- the payload generation unit 612 finds the same character string as the character string of three consecutive plaintext blocks in the past position by J characters in the slide dictionary.
- the payload generation unit 612 converts these three plaintext blocks into compressed payloads 1011, 1012, and 1013, respectively, by data compression processing similar to that in the first embodiment.
- Each of the compressed payloads 1011, 1012, 1013 becomes a copy symbol [8, J], and three identical copy symbols are arranged in succession.
- the header generation unit 613 adds the compressed headers 1001, 1002, and 1003 to the compressed payloads 1011, 1012, and 1013.
- the conversion unit 615 adds up the copy lengths 8 of the three copy symbols into one copy length of 24.
- the compressed payload 1021 is generated by converting the three copy symbols [8, J] into one copy symbol [24, J]. Further, the conversion unit 615 converts the compressed headers 1001, 1002, and 1003 into a compressed header 1020 that indicates the length of the compressed payload 1021. With this conversion, the number of compressed headers and compressed payloads is reduced from three to one, so that the amount of compressed data can be further reduced as compared with the first embodiment. Further, the compressed header 1020 includes combined information indicating that the compressed payload 1021 is a combined copy symbol.
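The combining step of the secondary compression can be sketched as follows. Payloads are modelled as lists of decoded symbols with `(L, J)` tuples for copy symbols; the compressed headers and the combined information they carry are left out, and the function names and J = 40 are assumptions made for the example.

```python
def copy_distance_if_full(payload, n):
    """Return J if payload is one copy symbol covering whole blocks, else None."""
    if len(payload) == 1 and isinstance(payload[0], tuple) and payload[0][0] % n == 0:
        return payload[0][1]
    return None

def merge_copy_payloads(payloads, n=8):
    """Combine runs of consecutive [n, J] payloads with equal J into [n*M, J]."""
    merged = []
    for payload in payloads:
        dist = copy_distance_if_full(payload, n)
        if dist is not None and merged and copy_distance_if_full(merged[-1], n) == dist:
            length, _ = merged[-1][0]
            merged[-1] = [(length + payload[0][0], dist)]   # add up copy lengths
        else:
            merged.append(list(payload))
    return merged

J = 40  # hypothetical copy distance
print(merge_copy_payloads([[(8, J)], [(8, J)], [(8, J)]]))  # [[(24, 40)]]
```

Payloads that contain literals, or whose copy distance differs from that of the preceding payload, pass through unchanged.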
- the compression circuit 610 can perform the processes other than the conversion unit 615 in the same manner as in the first embodiment.
- the inverse transform unit 625 determines whether or not the compressed header 1020 includes combined information.
- If the compressed header 1020 includes the combined information, the inverse conversion unit 625 obtains the combined copy symbol from the single compressed payload 1021 that follows and, as the inverse of the conversion by the conversion unit 615, converts it into a plurality (M) of copy symbols, thereby generating the M compressed payloads 1011, 1012, and 1013.
- The copy length L of the combined copy symbol is N × M. The M compressed payloads are transferred to the code expansion unit 622 one by one over M cycles.
- the payload extraction unit 621 passes one compressed payload per cycle to the code expansion unit 622 as in the first embodiment.
- the decompression circuit 620 can perform the processes other than the inverse transform unit 625 in the same manner as in the first embodiment.
- the extraction unit 627 may determine that the compressed payload is a combined copy symbol without using the combined information. For example, when the compressed payload indicates a copy symbol and the copy length indicates M times the plaintext block length, the extraction unit 627 may convert the compressed payload into M consecutive copy symbols.
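The variant that works without combined information can be sketched as below: a payload that consists of a single copy symbol whose copy length is a multiple of N greater than N is split back into M payloads of copy length N. The symbol representation and the function name are assumptions carried over for illustration.

```python
def split_combined_copy(payload, n=8):
    """Expand a combined copy symbol [n*M, J] into M payloads [n, J]."""
    if len(payload) == 1 and isinstance(payload[0], tuple):
        length, dist = payload[0]
        if length > n and length % n == 0:
            return [[(n, dist)] for _ in range(length // n)]
    return [payload]                      # anything else passes through as is

print(split_combined_copy([(24, 40)]))   # [[(8, 40)], [(8, 40)], [(8, 40)]]
```

Each returned payload would then be handed to the code expansion stage in its own cycle, so the downstream pipeline still sees exactly one plaintext block per payload.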
- the compression circuit 610 and the expansion circuit 620 may be different devices.
- the first storage device has a compression circuit 610
- the second storage device has a decompression circuit 620.
- the data read by the first storage device is compressed by the compression circuit 610 and converted into compressed data
- the compressed data is sent to the storage controller
- the storage controller sends the received compressed data to the second storage device
- the second storage device decompresses the received compressed data by the decompression circuit 620 and stores it.
- In this case, the load on the storage controller, on the communication line between the storage controller and the first storage device, and on the communication line between the storage controller and the second storage device can be reduced by reducing the data transfer amount.
- the first communication device has a compression circuit 610
- the second communication device has an expansion circuit 620.
- Data input by the first communication device is compressed by the compression circuit 610 and converted into compressed data, and the compressed data is transmitted to the second communication device.
- The second communication device decompresses the received compressed data by the expansion circuit 620 and outputs the result. In this case, the load on the network can be reduced by reducing the data transfer amount. Further, since the expansion circuit 620 performs expansion at a predetermined output throughput, the network communication speed can be guaranteed.
- the flash memory storage 101 may be a cache device.
- the computer system of the third modified example includes an HDD or a flash memory device as a storage device in addition to the elements of the computer system of the first embodiment.
- the storage device is connected to the host controller 110.
- the host controller 110 transfers the write data to the cache device according to the write command.
- the compression circuit 610 generates compressed write data by compressing the write data
- the flash memory 105 stores the compressed write data.
- the flash memory 105 may store write data.
- the host controller 110 transfers the compressed write data to the storage device and writes it to the storage device.
- the host controller 110 reads the data in the storage device in accordance with the read command, converts it into compressed read data, and transfers it from the storage device to the cache device.
- the flash memory 105 stores the compressed read data
- the expansion circuit 620 generates read data by expanding the compressed read data.
- the flash memory 105 may store read data.
- the host controller 110 transfers read data from the cache device. As a result, the amount of data stored in the storage device can be reduced, and the amount of data transferred between the cache device and the storage device can be reduced. Further, it is possible to guarantee the read throughput in the computer system.
- the compression circuit 610 and the decompression circuit 620 may use a slide dictionary type compression algorithm other than LZ77.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Description
(1) Buffer register that stores compressed data input from the outside
(2) Bit pattern decoder that is connected to the head of the buffer register and analyzes the compressed header
(3) Data loader that extracts the payload based on the decoding result of the bit pattern decoder and passes it to the above-mentioned pipeline
(4) Barrel shifter that moves the data so that the next compressed header comes to the head of the buffer register, based on the decoding result of the bit pattern decoder
101: Flash memory storage, 103: Flash memory controller, 104: Compression / decompression circuit, 105: Flash memory, 107: Microprocessor, 110: Host controller, 610: Compression circuit, 611: Dividing unit, 612: Payload generation unit, 613: Header generation unit, 615: Conversion unit, 616: Concatenation unit, 617: Compression unit, 620: Decompression circuit, 621: Payload extraction unit, 622: Code expansion unit, 623: Character resolution unit, 625: Inverse conversion unit, 627: Extraction unit, 628: Extension unit
Claims (10)
- 入力される平文データを、夫々が所定の平文ブロック長を有する複数の平文ブロックに分割する分割部と、
前記複数の平文ブロックの夫々の平文ブロックに対し、スライド辞書型圧縮アルゴリズムを用いて前記平文ブロックを圧縮することによりペイロードを生成し、前記ペイロードの長さを示すヘッダを生成し、前記ヘッダおよび前記ペイロードを含む圧縮ブロックを生成する圧縮部と、
前記複数の平文ブロックから生成された複数の圧縮ブロックを連結することにより圧縮データを生成する連結部と、
を備えるデータ圧縮装置。 A dividing unit that divides input plaintext data into a plurality of plaintext blocks each having a predetermined plaintext block length;
For each plaintext block of the plurality of plaintext blocks, a payload is generated by compressing the plaintext block using a slide dictionary type compression algorithm, a header indicating the length of the payload is generated, and the header and the header A compression unit that generates a compressed block including a payload;
A concatenation unit that generates compressed data by concatenating a plurality of compressed blocks generated from the plurality of plaintext blocks;
A data compression apparatus comprising: - 前記平文ブロック長は、前記圧縮データを伸張する論理回路により伸張されたデータの出力スループットを、前記論理回路の駆動クロックの周波数で除した値である、
請求項1に記載のデータ圧縮装置。 The plaintext block length is a value obtained by dividing the output throughput of data expanded by the logic circuit that expands the compressed data by the frequency of the driving clock of the logic circuit.
The data compression apparatus according to claim 1. - 前記圧縮部は、前記平文ブロック内の第1データが前記平文データ内の過去の第2データに一致している場合、前記第1データから第2データへの距離を示すコピー距離と前記第1データの長さであるコピー長さとを示すコピー記号を用いて、前記第1データを前記コピー記号に変換し、前記コピー記号を示す符号語を前記ペイロードに含め、
前記複数の平文ブロックから生成された複数のペイロードの中の連続するM個のペイロードの夫々が同一の第1コピー記号を示し、且つ前記第1コピー記号のコピー長さが前記平文ブロック長である場合、前記圧縮部は、前記M個のペイロードを一つの第2コピー記号に変換し、前記第2コピー記号のコピー長さとして前記平文ブロック長のM倍を指定する、
請求項1に記載のデータ圧縮装置。 When the first data in the plaintext block matches the past second data in the plaintext data, the compression unit includes a copy distance indicating a distance from the first data to the second data, and the first data Using a copy symbol indicating a copy length that is the length of data, the first data is converted into the copy symbol, a code word indicating the copy symbol is included in the payload,
Each of M consecutive payloads among the plurality of payloads generated from the plurality of plaintext blocks indicates the same first copy symbol, and the copy length of the first copy symbol is the plaintext block length. In this case, the compression unit converts the M payloads into one second copy symbol, and specifies M times the plaintext block length as the copy length of the second copy symbol.
The data compression apparatus according to claim 1. - 平文データが複数の平文ブロックに分割され、前記複数の平文ブロックの夫々が所定の平文ブロック長を有し、前記複数の平文ブロックの夫々の平文ブロックに対し、スライド辞書型圧縮アルゴリズムを用いて前記平文ブロックを圧縮することによりペイロードが生成され、前記ペイロードの長さを示すヘッダが生成され、前記ヘッダおよび前記ペイロードを含む圧縮ブロックが生成され、前記複数の平文ブロックに対して生成された複数の圧縮ブロックを連結することにより生成された圧縮データを用い、前記圧縮データから前記複数の圧縮ブロックの夫々のヘッダを認識し、前記認識されたヘッダに示されている前記ペイロード長に基づいて前記圧縮データからペイロードを抽出する抽出部と、
前記スライド辞書型圧縮アルゴリズムを用いて前記抽出されたペイロードを伸張することにより前記平文ブロックを復元する伸張部と、
A data decompression device comprising:
an extraction unit that receives compressed data, recognizes the header of each of a plurality of compressed blocks in the compressed data, and extracts each payload from the compressed data based on the payload length indicated in the recognized header, the compressed data having been generated by dividing plaintext data into a plurality of plaintext blocks each having a predetermined plaintext block length, compressing each plaintext block with a slide dictionary compression algorithm to generate a payload, generating a header indicating the length of the payload, generating a compressed block including the header and the payload, and concatenating the compressed blocks generated for the plurality of plaintext blocks; and
a decompression unit that restores the plaintext block by decompressing the extracted payload using the slide dictionary compression algorithm.
- The data decompression device according to claim 4, wherein the plaintext block length is the value obtained by dividing the output throughput of the plaintext blocks restored by the decompression unit by the frequency of the driving clock of the decompression unit.
- The data decompression device according to claim 5, wherein the decompression unit generates, from the extracted payload, an intermediate block having the plaintext block length by converting each code word in the extracted payload into a literal character when the code word indicates a literal character, and into an unconfirmed character, which is a character indicating that its value is not yet determined, when the code word indicates anything other than a literal character.
- The data decompression device according to claim 6, wherein the decompression unit generates a plaintext block from the intermediate block by converting each unconfirmed character into a literal character in a slide dictionary, and stores the generated plaintext block in the slide dictionary.
- The data decompression device according to claim 7, wherein the decompression unit includes a pipeline that decompresses the extracted payload.
- The data decompression device according to claim 8, wherein the decompression unit restores one plaintext block in each cycle of the driving clock.
- The data decompression device according to claim 4, wherein:
when first data in the plaintext block matches past second data in the plaintext data, the first data is converted into a copy symbol indicating a copy distance, which is the distance from the first data to the second data, and a copy length, which is the length of the first data, and a code word indicating the copy symbol is included in the payload; and
when the extracted payload indicates a third copy symbol and the copy length of the third copy symbol is M times the plaintext block length, the extraction unit converts the extracted payload into M consecutive fourth copy symbols and designates the plaintext block length as the copy length of each fourth copy symbol.
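The header-and-payload framing recited in claim 4 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the claims: the header is a 2-byte big-endian length field, each block is compressed independently, and `zlib` (DEFLATE, an LZ77-style sliding-window scheme) stands in for the slide dictionary compression algorithm. The function names are hypothetical.

```python
import struct
import zlib

# Assumption: the header is a 2-byte big-endian payload length.
HEADER = struct.Struct(">H")


def compress_blocks(plaintext: bytes, block_len: int) -> bytes:
    """Split plaintext into fixed-length blocks and emit header + payload per block."""
    out = bytearray()
    for i in range(0, len(plaintext), block_len):
        # zlib is only a stand-in for the slide dictionary compressor.
        payload = zlib.compress(plaintext[i:i + block_len])
        out += HEADER.pack(len(payload)) + payload
    return bytes(out)


def extract_payloads(compressed: bytes):
    """Extraction step: read each header, then slice out that many payload bytes."""
    pos = 0
    while pos < len(compressed):
        (length,) = HEADER.unpack_from(compressed, pos)
        pos += HEADER.size
        yield compressed[pos:pos + length]
        pos += length


def decompress_blocks(compressed: bytes) -> bytes:
    """Decompression step: restore each plaintext block from its payload."""
    return b"".join(zlib.decompress(p) for p in extract_payloads(compressed))
```

Because each header states the payload length, the extraction step can locate every block boundary without decoding any payload, which is what lets extraction and decompression run as separate stages.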
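The two-stage decoding of claims 6 and 7 and the copy-symbol splitting of claim 10 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed hardware: symbols are modelled as hypothetical tuples `("lit", byte)` and `("copy", distance, length)`, each unconfirmed slot is assumed to remember its copy distance, and the slide dictionary is an unbounded `bytearray`.

```python
def split_copy(symbol, block_len):
    """Split a copy of length M * block_len into M copies of length block_len."""
    if symbol[0] == "copy" and symbol[2] > block_len and symbol[2] % block_len == 0:
        _, distance, length = symbol
        # The same distance stays valid: source and destination both advance
        # by block_len for each successive piece.
        return [("copy", distance, block_len)] * (length // block_len)
    return [symbol]


def to_intermediate(symbols):
    """Stage 1: literals become bytes, copies become unconfirmed slots."""
    block = []
    for sym in symbols:
        if sym[0] == "lit":
            block.append(sym[1])
        else:
            _, distance, length = sym
            # Assumption: an unconfirmed slot keeps the copy distance.
            block.extend([("?", distance)] * length)
    return block


def resolve(block, dictionary: bytearray) -> bytes:
    """Stage 2: fill unconfirmed slots from the dictionary and store the block."""
    start = len(dictionary)
    for slot in block:
        if isinstance(slot, int):
            dictionary.append(slot)
        else:
            # Byte-wise lookup also handles copies that overlap their own output.
            dictionary.append(dictionary[-slot[1]])
    return bytes(dictionary[start:])
```

Splitting long copies keeps every symbol within one plaintext block, so each intermediate block has a fixed length and each stage does a bounded amount of work per block.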
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015530638A JP6009676B2 (en) | 2013-08-09 | 2013-08-09 | Data compression device and data decompression device |
PCT/JP2013/071617 WO2015019484A1 (en) | 2013-08-09 | 2013-08-09 | Data compression device and data expansion device |
US14/360,500 US9479194B2 (en) | 2013-08-09 | 2013-08-09 | Data compression apparatus and data decompression apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/071617 WO2015019484A1 (en) | 2013-08-09 | 2013-08-09 | Data compression device and data expansion device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015019484A1 (en) | 2015-02-12 |
Family
Family ID: 52460851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/071617 WO2015019484A1 (en) | 2013-08-09 | 2013-08-09 | Data compression device and data expansion device |
Country Status (3)
Country | Link |
---|---|
US (1) | US9479194B2 (en) |
JP (1) | JP6009676B2 (en) |
WO (1) | WO2015019484A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017041834A (en) * | 2015-08-21 | 2017-02-23 | ブラザー工業株式会社 | Data processing system, decompression device, and compression device |
EP3242243A1 (en) * | 2016-05-03 | 2017-11-08 | Safran Identity & Security | Method for backing up and restoring data of a secure element |
WO2024105793A1 (en) * | 2022-11-15 | 2024-05-23 | 株式会社メガチップス | Memory system, decoding circuit, and encoded data generating method |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10411732B2 (en) | 2017-02-13 | 2019-09-10 | International Business Machines Corporation | Parallel Lempel-Ziv compression for highly-parallel computer architectures |
US10606840B2 (en) * | 2017-02-13 | 2020-03-31 | International Business Machines Corporation | Parallel Lempel-Ziv decompression for highly-parallel computer architectures |
KR102659832B1 (en) | 2019-03-05 | 2024-04-22 | 삼성전자주식회사 | Data storage device and system |
US10944423B2 (en) * | 2019-03-14 | 2021-03-09 | International Business Machines Corporation | Verifying the correctness of a deflate compression accelerator |
US11050436B2 (en) * | 2019-06-21 | 2021-06-29 | Sap Se | Advanced database compression |
JP7197541B2 (en) * | 2020-04-01 | 2022-12-27 | 株式会社日立製作所 | storage device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10187410A (en) * | 1996-12-24 | 1998-07-21 | Fujitsu Ltd | Method and device for compressing data |
JP2005269184A (en) * | 2004-03-18 | 2005-09-29 | Seiko Epson Corp | Data compressing process, program, data recovery method, and apparatus |
WO2009057459A1 (en) * | 2007-10-30 | 2009-05-07 | Nec Corporation | Data compression method |
JP2011193406A (en) * | 2010-03-16 | 2011-09-29 | Ricoh Co Ltd | Data processing apparatus and method |
JP2013150041A (en) * | 2012-01-17 | 2013-08-01 | Fujitsu Ltd | Program, compressed file generation method, compression code expansion method, information processing apparatus and recording medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5831558A (en) * | 1996-06-17 | 1998-11-03 | Digital Equipment Corporation | Method of compressing and decompressing data in a computer system by encoding data using a data dictionary |
US5861827A (en) * | 1996-07-24 | 1999-01-19 | Unisys Corporation | Data compression and decompression system with immediate dictionary updating interleaved with string search |
JP3337633B2 (en) * | 1997-12-03 | 2002-10-21 | 富士通株式会社 | Data compression method and data decompression method, and computer-readable recording medium recording data compression program or data decompression program |
US7167115B1 (en) * | 2005-08-26 | 2007-01-23 | American Megatrends, Inc. | Method, apparatus, and computer-readable medium for data compression and decompression utilizing multiple dictionaries |
US8326605B2 (en) * | 2008-04-24 | 2012-12-04 | International Business Machines Incorporation | Dictionary for textual data compression and decompression |
EP2148444B1 (en) | 2008-07-21 | 2010-09-15 | Sony Computer Entertainment Europe Limited | Data compression and decompression |
- 2013
  - 2013-08-09: US, application US14/360,500, published as US9479194B2 (status: Active)
  - 2013-08-09: JP, application JP2015530638A, published as JP6009676B2 (status: Expired - Fee Related)
  - 2013-08-09: WO, application PCT/JP2013/071617, published as WO2015019484A1 (status: Application Filing)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017041834A (en) * | 2015-08-21 | 2017-02-23 | ブラザー工業株式会社 | Data processing system, decompression device, and compression device |
EP3242243A1 (en) * | 2016-05-03 | 2017-11-08 | Safran Identity & Security | Method for backing up and restoring data of a secure element |
FR3051061A1 (en) * | 2016-05-03 | 2017-11-10 | Morpho | METHOD FOR BACKING UP AND RESTORING DATA OF A SECURE ELEMENT |
US10387054B2 (en) | 2016-05-03 | 2019-08-20 | Idemia Identity & Security | Secure element including a non-volatile memory and methods for saving and restoring data including defragmenting and compressing data stored in occupied and free regions |
WO2024105793A1 (en) * | 2022-11-15 | 2024-05-23 | 株式会社メガチップス | Memory system, decoding circuit, and encoded data generating method |
JP7493062B1 (en) | 2022-11-15 | 2024-05-30 | 株式会社メガチップス | MEMORY SYSTEM, DECODING CIRCUIT, AND ENCODED DATA GENERATION METHOD |
Also Published As
Publication number | Publication date |
---|---|
JPWO2015019484A1 (en) | 2017-03-02 |
JP6009676B2 (en) | 2016-10-19 |
US20160233880A1 (en) | 2016-08-11 |
US9479194B2 (en) | 2016-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6009676B2 (en) | | Data compression device and data decompression device |
US7358867B2 (en) | | Content independent data compression method and system |
US8407378B2 (en) | | High-speed inline data compression inline with an eight byte data path |
CN102244518A (en) | | System and method for realizing parallel decompression of hardware |
JPH07283739A (en) | | Method and device to compress and extend data of short block |
KR20120115244A (en) | | Evaluating alternative encoding solutions during data compression |
US10303402B2 (en) | | Data compression using partial statistics |
US11424761B2 (en) | | Multiple symbol decoder |
JP2016052046A (en) | | Compression device, decompression device and storage device |
JP7381393B2 (en) | | Conditional transcoder and transcoding method for encoded data |
JP2023064241A (en) | | Storage system and data processing method in storage system |
JP7305609B2 (en) | | A device that processes received data |
US12019921B2 (en) | | Apparatus for processing received data |
EP2779467A2 (en) | | Staged data compression, including block-level long-range compression, for data streams in a communications system |
US20240106459A1 (en) | | Compression device and compression method |
Vasanthi et al. | | Implementation of Robust Compression Technique Using LZ77 Algorithm on Tensilica's Xtensa Processor |
US11909423B2 (en) | | Compression circuit, storage system, and compression method |
US11593311B2 (en) | | Compression system with longest match processing for generating compressed data |
KR20240078422A (en) | | Conditional transcoding for encoded data |
JP2023132713A (en) | | Data expansion device, memory system, and data expansion method |
JP2005175926A (en) | | Decoder and decoding method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 14360500; Country of ref document: US |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13890929; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2015530638; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 13890929; Country of ref document: EP; Kind code of ref document: A1 |