CN116594572A

CN116594572A - Floating point number stream data compression method, device, computer equipment and medium

Info

Publication number: CN116594572A
Application number: CN202310869512.0A
Authority: CN
Inventors: 王勇; 杨谕黔; 于宁; 唐鹏洲; 王昊; 姚延栋; 翁岩青
Original assignee: Beijing Siweizongheng Data Technology Co ltd
Current assignee: Beijing Siweizongheng Data Technology Co ltd
Priority date: 2023-07-17
Filing date: 2023-07-17
Publication date: 2023-08-15
Anticipated expiration: 2043-07-17
Also published as: CN116594572B

Abstract

The embodiment of the application provides a floating point number streaming data compression method, device, computer equipment and medium, and relates to the technical field of data processing. The method comprises: establishing an index table having N buckets, each bucket having M slots; determining a key value based on the binary representation of the current floating point number; searching for a target bucket among the N buckets according to the key value; using a linear search, performing an exclusive OR (XOR) calculation between the current floating point number and the data in each slot of the target bucket in turn to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as a base value, and recording the position of the base value in the current window; encoding the current floating point number according to the zero bits of a second value, obtained by XORing the current floating point number with the base value, and the position of the base value in the current window; and compressing and storing the result according to a preset storage format. This scheme achieves lossless compression and improves both the compression ratio and the decompression speed.

Description

Floating point number stream data compression method, device, computer equipment and medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a floating point number streaming data compression method, apparatus, computer device, and medium.

Background

Compression effectively reduces the volume of data, the space it occupies and the amount of IO (input/output), and thereby increases data processing speed. With rapid advances in data processing technology, data stream processing is becoming increasingly common. Conventional compression methods such as zstd (Zstandard, a lossless compression algorithm) target fixed data: they search globally over a fixed file or a large data block to obtain a good compression effect. Stream data, by contrast, is generated continuously and processed as it arrives, so encoding and compression must rely on the relationship between each value and the values that precede it.

The Gorilla algorithm, proposed by Facebook, exploits the similarity between the binary forms of adjacent values: it XORs each value with the previous value and stores the result with the leading and trailing zeros removed. This approach does not work well in real scenarios, because floating point numbers have a special internal representation, and values that are close in decimal are not necessarily similar in binary.
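To make the XOR idea concrete, the following minimal Python sketch XORs each value with its predecessor, as Gorilla-style schemes do. The helper names `f64_bits` and `xor_with_previous` are illustrative assumptions and are not taken from any Gorilla implementation. The sketch only shows that equal neighbours give an all-zero XOR, while 1.5 and 1.6, which look close in decimal, differ in many bits.

```python
import struct

def f64_bits(x: float) -> int:
    """Return the IEEE 754 double-precision bit pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_with_previous(values):
    """XOR the bit pattern of each value with that of the value before it."""
    bits = [f64_bits(v) for v in values]
    return [cur ^ prev for prev, cur in zip(bits, bits[1:])]

xors = xor_with_previous([1.5, 1.5, 1.6])
for x in xors:
    # Print the 64-bit XOR result and how many of its bits are set.
    print(f"{x:064b}  set bits: {bin(x).count('1')}")
```

The first XOR is all zeros and would compress to almost nothing; the second has set bits spread across the mantissa, which leaves few leading or trailing zeros to remove.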

Similarly, some systems store floating point numbers as strings to obtain a more repeatable representation, and better compression ratios can be obtained by using some general compression methods. However, this means that the character string is converted into a floating point number every time data is read and processed, and the cost is very high.

VictoriaMetrics offers another idea: it discards some of the precision of floating point numbers and converts them into integers for storage. This can effectively increase the compression ratio, but in many scenarios the loss of precision is unacceptable to users.

Chimp is another improvement over the Gorilla algorithm, which explores a large number of open datasets, and optimizes the bit-encoding scheme of the Gorilla algorithm, thus performing better than the Gorilla algorithm in most cases. It also inherits the lower decoding efficiency introduced by the Gorilla bit encoding.

Therefore, the conventional floating point number streaming data compression has the problems of high cost, low storage precision and low decoding efficiency during data reading and processing.

Disclosure of Invention

In view of the above, the embodiment of the application provides a floating point number streaming data compression method, so as to solve the problems of high cost, low storage precision and low decoding efficiency in data reading and processing in the floating point number streaming data compression in the prior art. The method comprises the following steps:

establishing an index table, wherein the index table is provided with N buckets, each bucket is provided with M slots, and N and M are positive integers;

determining a key value as an index based on the binary representation of the current floating point number;

searching a target bucket in the N buckets by utilizing a hash searching method according to the key value;

performing an exclusive OR calculation between the current floating point number and the data in each slot in the target bucket in sequence by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and recording the position of the base value in the current window;

encoding the current floating point number according to the zero bits of a second value, obtained by performing an exclusive OR calculation between the current floating point number and the base value, and the position of the base value in the current window;

and compressing and storing the encoded floating point number according to a preset storage format.

The embodiment of the application also provides a floating point number compression device, which solves the problems of high cost, low storage precision and low decoding efficiency in data reading and processing in the floating point number streaming data compression in the prior art. The device comprises:

the index table establishing module is used for establishing an index table, wherein the index table is provided with N buckets, each bucket is provided with M slots, and N and M are positive integers;

a key value determining module for determining a key value as an index based on the binary representation of the current floating point number;

the target bucket searching module is used for searching target buckets in the N buckets by utilizing a hash searching method according to the key value;

the base value searching module is used for sequentially performing an exclusive OR calculation between the current floating point number and the data in each slot in the target bucket by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and recording the position of the base value in the current window;

the encoding module is used for encoding the current floating point number according to the zero bits of a second value, obtained by performing an exclusive OR calculation between the current floating point number and the base value, and the position of the base value in the current window;

the storage module is used for compressing and storing the encoded floating point number according to a preset storage format.

The embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements any of the floating point number stream data compression methods described above when executing the computer program, so as to solve the problems of high cost, low storage precision and low decoding efficiency in data reading and processing in the floating point number stream data compression in the prior art.

The embodiment of the application also provides a computer readable storage medium which stores a computer program for executing any floating point number stream data compression method, so as to solve the problems of high data reading and processing cost, low storage precision and low decoding efficiency in the floating point number stream data compression in the prior art.

Compared with the prior art, the at least one technical solution adopted in the embodiments of this description can achieve at least the following beneficial effects. An index table is established with N buckets, each having M slots, where N and M are positive integers; a key value serving as an index is determined based on the binary representation of the current floating point number; a target bucket is searched for among the N buckets by a hash lookup according to the key value; using a linear search, the current floating point number is XORed with the data in each slot of the target bucket in turn to obtain a plurality of first values, the data corresponding to the first value with the largest number of zero bits is taken as the base value for encoding the current floating point number, and the position of the base value in the current window is recorded; the current floating point number is encoded according to the zero bits of a second value, obtained by XORing the current floating point number with the base value, and the position of the base value in the current window; and the encoded floating point number is compressed and stored according to a preset storage format. The application achieves lossless compression of data by using XOR, improves the compression ratio through efficient search of the data in the history window, and adopts a simplified data representation, which helps improve the decompression speed.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a floating point number streaming data compression method provided by an embodiment of the application;

FIG. 2 is a schematic diagram of an index table according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a coding format according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a configuration of a preset storage format according to an embodiment of the present application;

FIG. 5 is a block diagram of a computer device according to an embodiment of the present application;

FIG. 6 is a block diagram of a floating point number streaming data compression device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The described embodiments are only some, not all, of the embodiments of the application. The application may also be practiced or carried out in other, different embodiments, and the details in this description may be modified or varied based on different viewpoints and applications without departing from the spirit of the application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.

In an embodiment of the present application, a floating point number streaming data compression method is provided, as shown in fig. 1, where the method includes:

S1, establishing an index table, wherein the index table is provided with N buckets, each bucket is provided with M slots, and N and M are positive integers;

S2, determining a key value serving as an index based on the binary representation of the current floating point number;

S3, searching for a target bucket among the N buckets by using a hash lookup method according to the key value;

S4, sequentially performing an exclusive OR calculation between the current floating point number and the data in each slot in the target bucket by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and recording the position of the base value in the current window;

S5, encoding the current floating point number according to the zero bits of a second value, obtained by performing an exclusive OR calculation between the current floating point number and the base value, and the position of the base value in the current window;

S6, compressing and storing the encoded floating point number according to a preset storage format.

As can be seen from the flow shown in FIG. 1, and in particular from steps S3 and S5, the embodiment of the present application inherits the XOR concept of Gorilla to achieve lossless compression of data, uses an efficient search of the data in a history window to increase the compression ratio, and uses a simplified data representation to increase the decompression speed.

With reference to FIG. 2, the following describes in detail how the key value serving as an index is determined based on the binary representation of the current floating point number, that is, how a compact index speeds up the search for historical values over the history window.

As shown in the upper half of FIG. 2, the continuously arriving data forms a window; here the sliding window is set to 128 values. The current floating point number could be compared with every previous value in the window to select the best one, but that would require 128 comparisons per value, which is very costly. The application therefore uses a compact in-memory index to reduce the number of comparisons while still finding a good value.

As shown in the lower part of FIG. 2, an index table is first defined with N buckets, each having M slots. Buckets are implemented by hashing a specified column: the data under a column name is split into a group of buckets by hash value, and each bucket corresponds to a storage file under that column name. A slot is the unit that holds data. In general, M=8 achieves a good effect, and M=8 in this embodiment, as shown in FIG. 2. The whole hash map is fully contiguous in memory, so the address of each bucket can be obtained by calculation alone. Based on this basic structure, the following describes in detail how the key value serving as an index is determined based on the binary representation of the current floating point number and how the base value is found with it, in the form of three variant lookup methods.
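The contiguous layout described above can be sketched as one flat array in which bucket b occupies the M consecutive entries starting at b * M, so that a bucket address is obtained by multiplication alone. This is a minimal Python sketch under stated assumptions: the class name `IndexTable`, the choice N = 256, the modulo mapping from key to bucket and the use of -1 to mark an empty slot are all illustrative and are not specified in the source.

```python
from array import array

class IndexTable:
    """N buckets of M slots stored in one contiguous array (illustrative)."""

    EMPTY = -1  # assumed sentinel for an unused slot

    def __init__(self, n_buckets: int = 256, m_slots: int = 8):
        self.n_buckets = n_buckets
        self.m_slots = m_slots
        # One flat block of signed 64-bit entries: N * M slots in total.
        self.slots = array("q", [self.EMPTY]) * (n_buckets * m_slots)

    def bucket_start(self, key: int) -> int:
        """Index of the first slot of the bucket for this key."""
        return (key % self.n_buckets) * self.m_slots

    def bucket(self, key: int):
        """Return a copy of the M slots of the bucket for this key."""
        start = self.bucket_start(key)
        return self.slots[start:start + self.m_slots]

table = IndexTable(n_buckets=4, m_slots=8)
print(table.bucket_start(3), list(table.bucket(1)))
```

Keeping all slots in a single block avoids per-bucket allocations and pointer chasing, which is one plausible reading of "the address of each bucket can be obtained by calculation alone".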

In one embodiment, a tail-index lookup based on the binary representation of the current floating point number is used, which specifically comprises the following steps:

taking a plurality of bits at the tail of the binary representation of the current floating point number as a first key value serving as an index;

searching for a target bucket among the N buckets by using a hash lookup method according to the first key value;

and sequentially performing an exclusive OR calculation between the current floating point number and the data in each slot in the target bucket by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and recording the position of the base value in the current window.
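The three steps above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes 32-bit floats, an 8-bit tail key, a dictionary of lists standing in for the bucket array, and it reads "largest number of zero bits" literally as the count of zero bits in the XOR result. The names `f32_bits`, `tail_key` and `find_base` are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def tail_key(bits: int, key_bits: int = 8) -> int:
    """First key value: the lowest key_bits bits of the bit pattern."""
    return bits & ((1 << key_bits) - 1)

def find_base(cur_bits: int, buckets: dict, key_bits: int = 8, width: int = 32):
    """Linear search of the target bucket; returns (zeros, position, bits) or None."""
    best = None
    for position, slot_bits in buckets.get(tail_key(cur_bits, key_bits), []):
        zeros = width - bin(cur_bits ^ slot_bits).count("1")
        if best is None or zeros > best[0]:
            best = (zeros, position, slot_bits)
    return best

# Fill the index with three earlier values; all three share tail key 0.
buckets = {}
for position, value in enumerate([0.5, 1.5, 3.0]):
    bits = f32_bits(value)
    buckets.setdefault(tail_key(bits), []).append((position, bits))

best = find_base(f32_bits(2.0), buckets)
print(best)  # 3.0 at position 2 wins: its XOR with 2.0 has a single set bit
```

Only the slots of one bucket are compared, rather than every value in the window, which is where the reduction in comparisons comes from.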

In one embodiment, a dual-index lookup based on the binary representation of the current floating point number is used. In addition to an index table like that of the tail-index lookup, a head-index lookup is established, so that the dual-index lookup comprises a tail-index lookup and a head-index lookup. The head index is specifically as follows:

a second key value serving as an index is taken from the head of the binary representation of the current floating point number: when the floating point number is 64 bits, the second key value is the sign bit plus the 6th through 12th bits counted from the high-order end; when the floating point number is 32 bits, the second key value is likewise the sign bit plus the 6th through 12th bits counted from the high-order end.

In a specific implementation, the dual-index lookup proceeds as follows. First, a plurality of bits at the tail of the binary representation of the current floating point number is taken as the first key value serving as an index; a target bucket is searched for among the N buckets by a hash lookup according to the first key value; using a linear search, the current floating point number is XORed with the data in each slot of the target bucket in turn to obtain a plurality of first values, the data corresponding to the first value with the largest number of zero bits is taken as the base value for encoding the current floating point number, and the position of the base value in the current window is recorded;

if no base value is found based on the first key value, a plurality of bits at the head of the binary representation of the current floating point number is taken as the second key value serving as an index: when the floating point number is 64 bits, the second key value is the sign bit plus the 6th through 12th bits counted from the high-order end, and when the floating point number is 32 bits, the second key value is likewise the sign bit plus the 6th through 12th bits counted from the high-order end. These bits are chosen because, across different floating point numbers, fluctuations in these bits have a larger influence on the XOR result. A target bucket is then searched for among the N buckets by a hash lookup according to the second key value; using a linear search, the current floating point number is XORed with the data in each slot of the target bucket in turn to obtain a plurality of first values, the data corresponding to the first value with the largest number of zero bits is taken as the base value for encoding the current floating point number, and the position of the base value in the current window is recorded;

if no base value is found based on either the first key value or the second key value, the floating point number immediately preceding the current floating point number is used as the base value.
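The fallback chain of the dual-index lookup can be sketched as below for 64-bit values. Several points are assumptions made for illustration only: the tail key is taken as the low 8 bits; "the 6th through 12th bits counted from the high-order end" is read with the sign bit as the 1st bit, so the head key uses integer bit positions 58 down to 52; "not found" is read as "the target bucket is empty"; and dictionaries of lists stand in for the two index tables. All function names are hypothetical.

```python
def tail_key(bits: int) -> int:
    """First key value: low 8 bits (assumed key width)."""
    return bits & 0xFF

def head_key(bits: int) -> int:
    """Second key value: sign bit plus 7 high-order bits (assumed positions)."""
    sign = bits >> 63
    middle = (bits >> 52) & 0x7F
    return (sign << 7) | middle

def best_in_bucket(cur_bits: int, bucket):
    """Pick the slot whose XOR with cur_bits has the most zero bits."""
    best = None
    for position, slot_bits in bucket:
        zeros = 64 - bin(cur_bits ^ slot_bits).count("1")
        if best is None or zeros > best[0]:
            best = (zeros, position, slot_bits)
    return best

def find_base(cur_bits: int, tail_table: dict, head_table: dict, previous):
    """Try the tail index, then the head index, then fall back to the previous value."""
    for key, table in ((tail_key(cur_bits), tail_table),
                       (head_key(cur_bits), head_table)):
        best = best_in_bucket(cur_bits, table.get(key, ()))
        if best is not None:
            return best[1], best[2]
    return previous  # (position, bits) of the immediately preceding value

two = 0x4000000000000000            # bit pattern of 2.0
previous = (9, 0x3FF8000000000000)  # 1.5 at hypothetical position 9
tail = {0: [(5, 0x3FF8000000000000), (7, 0x4008000000000000)]}  # 1.5 and 3.0
print(find_base(two, tail, {}, previous))
```

The second table doubles the memory used by the index and can double the number of lookups, which is the cost that the hybrid variant described next is meant to avoid.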

In one embodiment, a hybrid lookup based on the binary representation of the current floating point number is used. The dual-index lookup may give a good effect, but its two index tables consume more memory and require more lookups. The hybrid lookup variant therefore combines the tail-index lookup and the dual-index lookup, giving a faster lookup with a comparable effect. It specifically comprises the following steps:

when the floating point number is 64 bits, taking the sign bit, the upper 7th bit and the lower 6th bit of the binary representation of the current floating point number as a third key value serving as an index; when the floating point number is 32 bits, taking the sign bit, the upper 5th bit and the lower 7th bit of the binary representation of the current floating point number as the third key value serving as an index;

then searching for a target bucket among the N buckets by using a hash lookup method according to the third key value; and sequentially performing an exclusive OR calculation between the current floating point number and the data in each slot in the target bucket by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and recording the position of the base value in the current window.

In one embodiment, if no suitable base value is found when the key value is used to search for the base value in step S4, the floating point number immediately preceding the current floating point number is used as the base value.

Therefore, by providing three different lookup methods over the historical data, the embodiment of the application uses a compact in-memory index to find the base value for encoding the floating point number, which reduces the number of comparisons and improves the lookup speed and the decompression speed. Selecting a lookup method suited to the characteristics of the floating point data can further improve the lookup speed.

In one embodiment, encoding the current floating point number according to the zero bits of the second value, obtained by XORing the current floating point number with the base value, and the position of the base value in the current window specifically includes:

the position of the base value in the current window is denoted offset, whose value ranges from 0 to 2^(M-1) - 1; the second value obtained by XORing the current floating point number with the base value is denoted Xorj = Xj ⊕ Xi, where Xj is the current floating point number and Xi is the base value;

when every bit of Xorj is zero, only the offset is recorded during encoding;

when at least one bit of Xorj is non-zero, the tail zero length L1 and the head zero length L2 of Xorj are calculated, where a zero length is a number of consecutive zero bits, L1 and L2 are each rounded down to whole bytes, and the total head and tail zero length is L = L1 + L2;

if L is greater than or equal to 1, the offset is recorded during encoding with its M-th bit set to 1, L1 is recorded in 4 bits, L is recorded in 4 bits, and the part of Xorj remaining after the tail zeros of length L1 are removed is recorded in at least one byte;

if L is less than 1, the first M-1 bits of the offset are all set to 1 during encoding, the M-th bit of the offset is set to 1, and the original value of the current floating point number is then recorded.

Taking M = 8 as an example and referring to the coding format of FIG. 3, let Xi be the base value found, which is a 64-bit floating point number, let Xj be the current floating point number, and let Xorj = Xj ⊕ Xi. The position is represented by offset, which takes a value from 0 to 127. The cases are discussed below.

If every bit of Xorj is zero (Xorj = 0), that is, the current value is exactly equal to the value at the offset position, only the offset is recorded during encoding and the flow exits; see the first row in FIG. 3;

otherwise, 128 is added to the offset, that is, its 8th bit is set to 1, indicating that further information follows, as described in the flow below;

when at least one bit of Xorj is non-zero, the tail zero length L1 and the head zero length L2 of Xorj are calculated, where a zero length is a number of consecutive zero bits, L1 and L2 are each rounded down to whole bytes, and the total head and tail zero length is L = L1 + L2; for example, if Xorj has 10 consecutive zero bits at the head and 7 consecutive zero bits at the tail, then since 8 bits make one byte, the 10 head bits round down to 1 byte and the 7 tail bits round down to 0 bytes, so L1 = 0, L2 = 1 and L = L1 + L2 = 1;

referring to the second row in FIG. 3, if L is greater than or equal to 1, the offset is recorded during encoding with its 8th bit set to 1, then L1 (the tail zero length) is recorded in 4 bits, L (the total zero length) is recorded in 4 bits, and the part of Xorj remaining after the tail zeros of length L1 are removed is recorded in at least one byte (the non-zero portion of the second row in FIG. 3);

referring to the third row in FIG. 3, if L is less than 1, the first 7 bits of the offset are all set to 1 during encoding, giving the special TAG 127 (0x7F in the third row of FIG. 3), and the 8th bit of the offset is set to 1; the original value of the current floating point number is then recorded. Since the floating point number in this embodiment is 64 bits, the original 8-byte value is recorded (the non-zero portion of the third row in FIG. 3).
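The three cases for M = 8 can be sketched as a single Python function. This is a sketch under assumptions that the source leaves open: the flag bit is taken as the most significant bit of the offset byte, L1 occupies the upper 4 bits of the second byte and L the lower 4 bits, the payload is stored big-endian in (width - L) bytes so that leading zero bytes are dropped as well as trailing ones, and offset 127 with a non-zero XOR falls back to the raw form because 0xFF is already the raw tag. The names `encode` and `zero_byte_lengths` are hypothetical.

```python
import struct

RAW_TAG = 0xFF  # first 7 bits all 1 (0x7F) plus the 8th bit set

def f64_bits(x: float) -> int:
    """Return the IEEE 754 double-precision bit pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def zero_byte_lengths(xor: int, width_bytes: int):
    """Return (L1, L2): trailing and leading zero runs of a non-zero XOR, in whole bytes."""
    tail_bits = (xor & -xor).bit_length() - 1
    head_bits = 8 * width_bytes - xor.bit_length()
    return tail_bits // 8, head_bits // 8

def encode(cur_bits: int, base_bits: int, offset: int, width_bytes: int = 8) -> bytes:
    """Encode one value against its base value (offset in 0..127)."""
    if not 0 <= offset <= 127:
        raise ValueError("offset must fit in 7 bits")
    xor = cur_bits ^ base_bits
    if xor == 0:
        return bytes([offset])                      # case 1: offset only
    l1, l2 = zero_byte_lengths(xor, width_bytes)
    total = l1 + l2
    if total < 1 or offset == 127:                  # offset == 127 is an assumed fallback
        return bytes([RAW_TAG]) + cur_bits.to_bytes(width_bytes, "big")  # case 3
    payload = (xor >> (8 * l1)).to_bytes(width_bytes - total, "big")
    return bytes([0x80 | offset, (l1 << 4) | total]) + payload           # case 2

record = encode(f64_bits(2.0), f64_bits(1.5), offset=5)
print(record.hex(" "))
```

Because every field is a whole byte or a nibble, a decoder can read a record with plain byte indexing and shifts instead of the variable-length bit fields used by Gorilla-style encodings.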

Thus, based on the encoding method of the above embodiment, for a continuously arriving column of floating point numbers (x1, x2, x3, ..., xn), the preset storage format is as shown in FIG. 4. The header includes: a magic number, a version number, the original length of the data, the compressed length of the data, and the parameters used for encoding and compression ("parameters" in the figure). The header is followed by the original value of the first floating point number ("first value" in the figure) and then the encoded, compressed value of each floating point number ("encoded representation of the Xor value" in the figure). The magic number is used to verify that a compressed data block is legitimate; the version number is used to check compatibility with future versions; the original length of the data is used for verification during decompression; the compressed length of the data is used for restoring the data; and the parameters used for encoding and compression generally include the compression method, the data width, which base value lookup method is used, preprocessing parameters, whether the floating point numbers are converted, and the like. The first original value, recorded after the header, is the basis of the compression. Each encoded value recorded after it is obtained by XORing a floating point number with the value at some earlier position and encoding the result.
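The block layout can be illustrated with Python's `struct` module. The source names the header fields but gives no field widths, byte order or magic value, so everything concrete here is an assumption made for illustration: the 4-byte magic `b"FPXC"`, the 2-byte version, the two 4-byte lengths, and a two-byte parameter area holding only the data width and a lookup method code. The function names are hypothetical.

```python
import struct

MAGIC = b"FPXC"                      # hypothetical magic number
HEADER = struct.Struct(">4sHIIBB")   # magic, version, original len, compressed len, width, method

def pack_block(first_value: float, records: bytes, n_values: int,
               lookup_method: int = 0) -> bytes:
    """Header, then the raw first value, then the encoded records."""
    body = struct.pack(">d", first_value) + records
    header = HEADER.pack(MAGIC, 1, 8 * n_values, len(body), 8, lookup_method)
    return header + body

def parse_header(block: bytes) -> dict:
    """Check the magic number and return the header fields."""
    magic, version, original, compressed, width, method = HEADER.unpack_from(block)
    if magic != MAGIC:
        raise ValueError("not a compressed floating point block")
    return {"version": version, "original_length": original,
            "compressed_length": compressed, "data_width": width,
            "lookup_method": method}

# Two values: 1.5 stored raw, then one record meaning "equal to the value at offset 0".
block = pack_block(1.5, bytes([0x00]), n_values=2)
print(parse_header(block))
```

A reader can reject a foreign block on the magic number, check the version before decoding, and compare the decoded size with the stored original length as a consistency check.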

The encoding method for the floating point number avoids expensive bit encoding operation, improves decompression speed by using simplified data representation, is a lossless, rapid and high-compression-rate stream floating point number compression method, reduces the volume of data and realizes high-speed data processing.

In one embodiment, after the step of performing exclusive-or calculation on the current floating point number and the data in each slot in the target bucket by using the linear lookup method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of bits of zero as the base value of the current floating point number code, and recording the position of the base value in the current window, the method further includes:

if, when the base value is searched for, the bucket in which the base value is located still has a slot that is not filled with data or whose data has slid out of the window, the current floating point number is filled into that slot;

if the slots of the bucket in which the base value is located are all filled, the data stored earliest is replaced by the current floating point number.

The current floating point number is filled into the slot or the oldest data in the slot is replaced to update the data in the slot, so that the subsequent floating point number data is more similar to the data in the slot, and the searching speed and the coding efficiency of the floating point number data can be improved.
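The fill-or-replace-oldest policy maps naturally onto a bounded queue. The following minimal Python sketch uses `collections.deque` with `maxlen` set to M, which appends into free space and, once full, silently discards the oldest entry. The `Bucket` class and the (position, bits) tuple layout are illustrative assumptions, and the handling of entries that have slid out of the window is not modelled.

```python
from collections import deque

class Bucket:
    """A bucket of at most m_slots entries that evicts the oldest when full."""

    def __init__(self, m_slots: int = 8):
        self.slots = deque(maxlen=m_slots)

    def insert(self, position: int, bits: int) -> None:
        # Fills a free slot if one exists; otherwise the oldest entry is dropped.
        self.slots.append((position, bits))

bucket = Bucket(m_slots=2)
bucket.insert(0, 0x3F000000)   # 0.5
bucket.insert(1, 0x3FC00000)   # 1.5
bucket.insert(2, 0x40000000)   # 2.0 replaces the oldest entry (0.5)
print(list(bucket.slots))
```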

In one embodiment, the simplest method, the tail-index lookup, is illustrated using 32-bit floating point numbers for ease of description. For the 10 floating point numbers from 1.1 to 2.0, the binary representations are as follows:

float: 1.1, binary:00111111100011001100110011001101

float: 1.2, binary:00111111100110011001100110011010

float: 1.3, binary:00111111101001100110011001100110

float: 1.4, binary:00111111101100110011001100110011

float: 1.5, binary:00111111110000000000000000000000

float: 1.6, binary:00111111110011001100110011001101

float: 1.7, binary:00111111110110011001100110011010

float: 1.8, binary:00111111111001100110011001100110

float: 1.9, binary:00111111111100110011001100110011

float: 2.0, binary:01000000000000000000000000000000

for 1.5, its tail 8 bits are all 0, so the index number is also 0, so in bucket 0, there is only one value.
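The bit patterns listed above, and the tail key derived from each, can be reproduced with a few lines of Python. The sketch assumes big-endian packing purely for display and an 8-bit tail key as in this example; `f32_binary` is a hypothetical helper name.

```python
import struct

def f32_binary(x: float) -> str:
    """Return the 32-bit IEEE 754 pattern of x as a string of 0s and 1s."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits:032b}"

values = [round(1.1 + 0.1 * i, 1) for i in range(10)]  # 1.1, 1.2, ..., 2.0
for v in values:
    b = f32_binary(v)
    # The tail key is the integer value of the lowest 8 bits.
    print(f"float: {v}, binary: {b}, tail key: {int(b[-8:], 2)}")
```

Of the ten values, only 1.5 and 2.0 have a tail key of 0, so they are the only two that land in bucket 0.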

When 2.0 is processed, its lower 8 bits are likewise taken, and the index number is again 0, so bucket 0 is searched and the value 1.5 is found; the sequence number recorded for it is 5. The Xorj value obtained by XORing 2.0 with 1.5 is 01111111110000000000000000000000. Counted in bytes, the head has only 1 zero bit, so the head zero length L2 is 0; the tail has 22 zero bits, so the tail zero length L1 is 2. The final encoded representation follows the second row of FIG. 3; according to the coding structure illustrated in FIG. 3, the following is obtained:

1: the first bit, a flag set to 1;

0000101: the seven offset bits, indicating the 5th value in the corresponding reference window;

0010: four bits representing the tail zero length L1, here 2 bytes of trailing zeros;

0010: four bits representing the total zero length L, here again 2 bytes of zeros in total;

0111111111000000: the data bits that must be retained after the 2 bytes of trailing zeros are removed from Xorj.
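The source does not spell out the decoder, but the record above can be checked by decoding it. The following Python sketch assumes the same layout as the worked example (flag bit first, then the 7-bit offset, then L1 and L in one byte, then the retained bytes in big-endian order) and a dictionary mapping window positions to bit patterns; `decode_f32` is a hypothetical name.

```python
import struct

def decode_f32(record: bytes, window: dict) -> float:
    """Decode one 32-bit record; window maps position -> base bit pattern."""
    head = record[0]
    if head == 0xFF:                         # raw tag: the next 4 bytes are the value itself
        return struct.unpack(">f", record[1:5])[0]
    base_bits = window[head & 0x7F]          # low 7 bits are the offset
    xor = 0
    if head & 0x80:                          # flag bit set: L1, L and payload follow
        l1 = record[1] >> 4
        total = record[1] & 0x0F
        n_payload = 4 - total                # bytes kept after removing zero bytes
        xor = int.from_bytes(record[2:2 + n_payload], "big") << (8 * l1)
    return struct.unpack(">f", struct.pack(">I", base_bits ^ xor))[0]

record = bytes([0b10000101, 0b00100010, 0b01111111, 0b11000000])
window = {5: 0x3FC00000}                     # position 5 holds 1.5
decoded = decode_f32(record, window)
print("".join(f"{b:08b}" for b in record), "->", decoded)
```

Decoding needs one table lookup, one shift and one XOR per value, which is consistent with the stated aim of avoiding expensive bit-level decoding.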

Because the floating point number stream data compression method of the application avoids expensive bit encoding operations, its variants decompress several times faster than the Gorilla method and 5 to 6 times faster than the zstd method. The encoding speed is roughly on par with that of zstd, while the compression ratio is several times higher than that of Gorilla and roughly on par with that of zstd.

In this embodiment, a computer device is provided, as shown in fig. 5, including a memory 501, a processor 502, and a computer program stored in the memory and capable of running on the processor, where the processor implements any floating point number streaming data compression method described above when executing the computer program.

In particular, the computer device may be a computer terminal, a server or similar computing means.

In this embodiment, a computer-readable storage medium storing a computer program for executing any of the floating point number streaming data compression methods described above is provided.

In particular, computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable storage media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

Based on the same inventive concept, the embodiment of the application also provides a floating point number stream data compression device, as described in the following embodiment. Because the device solves the problem on a principle similar to that of the floating point number stream data compression method, its implementation can refer to the implementation of the method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 6 is a block diagram of a floating point number streaming data compression device according to an embodiment of the present application. As shown in FIG. 6, the device includes an index table establishment module 601, a key value determination module 602, a target bucket search module 603, a base value search module 604, an encoding module 605, and a storage module 606. The structure is described below.

an index table establishment module 601, configured to establish an index table, where the index table is provided with N buckets, each bucket has M slots, and N and M are positive integers;

a key value determination module 602, configured to determine a key value as an index based on the binary representation of the current floating point number;

a target bucket search module 603, configured to search for a target bucket among the N buckets by using a hash search method according to the key value;

a base value search module 604, configured to sequentially perform exclusive-or calculation on the current floating point number and the data in each slot of the target bucket by using a linear search method to obtain a plurality of first values, use the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and record the position of the base value in the current window;

an encoding module 605, configured to encode the current floating point number according to the number of zero bits of the second value, obtained by performing exclusive-or calculation on the current floating point number and the base value, and according to the position of the base value in the current window;

the storage module 606 is configured to compress and store the encoded floating point number according to a preset storage format.
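As a rough illustration of how the index table establishment, key value determination, target bucket search, and base value search modules could fit together, the following Python sketch assumes 64-bit floating point numbers, a key taken from the lowest bits of the bit pattern, and a "number of zero bits" counted over the whole exclusive-or result. The constants `N_BUCKETS` and `M_SLOTS`, all helper names, and the oldest-first replacement in `insert` are illustrative assumptions and are not taken from the application.

```python
import struct

N_BUCKETS = 16  # N: assumed to be a power of two so the key can be masked
M_SLOTS = 4     # M: assumed number of slots per bucket


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 binary64 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def make_index_table() -> list:
    """Create N empty buckets; each holds at most M slot values."""
    return [[] for _ in range(N_BUCKETS)]


def key_of(bits: int) -> int:
    # Assumption: the key is formed from the lowest bits of the pattern.
    return bits & (N_BUCKETS - 1)


def insert(table: list, bits: int) -> None:
    # Assumption: when a bucket is full, the oldest slot is dropped.
    bucket = table[key_of(bits)]
    bucket.append(bits)
    if len(bucket) > M_SLOTS:
        bucket.pop(0)


def find_base(table: list, bits: int):
    """Linearly search the target bucket for the slot whose XOR with
    `bits` has the most zero bits. Returns (position, base), or
    (None, None) when the bucket is empty."""
    best_pos, best_base, best_zeros = None, None, -1
    for pos, candidate in enumerate(table[key_of(bits)]):
        zeros = 64 - bin(bits ^ candidate).count("1")
        if zeros > best_zeros:
            best_pos, best_base, best_zeros = pos, candidate, zeros
    return best_pos, best_base
```

For example, after inserting 2.0, 1.0, and 1.5 (all of which fall into bucket 0 under the assumed key), `find_base` for 1.25 selects 1.0 at position 1, because the two bit patterns differ in a single bit.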

In one embodiment, the key value determination module 602 is further configured to: use several tail bits of the binary representation of the current floating point number as a first key value serving as the index.

In one embodiment, the key value determination module 602 is further configured to: use several tail bits of the binary representation of the current floating point number as a first key value serving as the index; and use header bits of the binary representation of the current floating point number as a second key value serving as the index, where, when the floating point number is 64 bits, the sign bit together with the upper 6th bit to the upper 12th bit is used as the second key value, and when the floating point number is 32 bits, the sign bit together with the upper 6th bit to the upper 12th bit is used as the second key value.

In one embodiment, the key value determination module 602 is further configured to: when the floating point number is 64 bits, use the sign bit, the upper 7th bit, and the lower 6th bit of the binary representation of the current floating point number as a third key value serving as the index; and when the floating point number is 32 bits, use the sign bit, the upper 5th bit, and the lower 7th bit of the binary representation of the current floating point number as the third key value serving as the index.
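The first and second key values can be sketched with simple shifts and masks. The following Python example is only one possible reading of the text: it assumes that "upper k-th bit" counts from the most significant bit with the sign bit as the upper 1st bit, and the number of tail bits (`tail_bits`) is a hypothetical parameter, since the text does not state it. The third key value is not shown because its bit selection is more ambiguous.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 binary64 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def first_key(bits: int, tail_bits: int = 4) -> int:
    # First key value: the lowest `tail_bits` bits of the pattern.
    # The default of 4 is a hypothetical choice for illustration.
    return bits & ((1 << tail_bits) - 1)


def second_key(bits: int, width: int = 64) -> int:
    # Second key value: sign bit followed by the upper 6th..12th bits.
    # Assumption: the most significant bit is the "upper 1st" bit.
    sign = (bits >> (width - 1)) & 1
    # Shifting keeps the upper 12 bits; the mask keeps the lowest 7 of
    # those, which are the upper 6th through upper 12th bits.
    field = (bits >> (width - 12)) & 0x7F
    return (sign << 7) | field
```

Under these assumptions the second key value occupies 8 bits, so it can address up to 256 buckets, and values that share a sign and most exponent bits land in the same bucket.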

In one embodiment, the encoding module 605 is further configured to: express the position of the base value in the current window as an offset, whose value ranges from 0 to (2^(M-1) - 1), and express the second value obtained by exclusive-or calculation of the current floating point number and the base value as Xorj = xj ⊕ xi, where xj is the current floating point number and xi is the base value;

when each bit of the Xorj is zero, only the offset is recorded during encoding;
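The all-zero case can be illustrated with a short Python sketch. It assumes 64-bit floating point numbers and represents a record as a tuple; the function names and the record layout are hypothetical. The branch for a non-zero exclusive-or result is only a placeholder that stores the raw value, because the encoding of that case is not described in this passage.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 binary64 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def encode(current_bits: int, base_bits: int, offset: int) -> tuple:
    xor = current_bits ^ base_bits
    if xor == 0:
        # Every bit of the XOR is zero: record only the offset.
        return (offset,)
    # Placeholder (assumption): keep the raw XOR next to the offset.
    return (offset, xor)


def decode(record: tuple, window: list) -> int:
    # XOR is its own inverse, so base ^ xor restores the original bits.
    base_bits = window[record[0]]
    xor = record[1] if len(record) > 1 else 0
    return base_bits ^ xor
```

Because exclusive-or is reversible, decoding with the same window returns exactly the original bit pattern, which is what makes the scheme lossless.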

In one embodiment, the apparatus further includes a data filling module, configured to, when the base value is found, fill the current floating point number into the slot if the window in which the base value is located is a window not yet filled with data or a sliding window;

In one embodiment, the header of the preset storage format in the storage module 606 includes: a magic number, a version number, the original length of the data, the compressed length of the data, and the parameters used for encoding and compression; the header is followed by the original value of the first floating point number and then the compressed encoded value of each floating point number.
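A header of this kind could be serialized as in the following Python sketch. The field widths, the little-endian byte order, the magic value `b"FPXC"`, and the choice of N and M as the recorded parameters are all assumptions made for illustration; the text names the fields but does not specify their sizes or order of encoding.

```python
import struct

MAGIC = b"FPXC"          # hypothetical magic number
HEADER_FMT = "<4sBQQHH"  # magic, version, original len, compressed len, N, M


def pack_header(version: int, original_len: int, compressed_len: int,
                n_buckets: int, m_slots: int) -> bytes:
    """Serialize the header fields into a fixed-size byte string."""
    return struct.pack(HEADER_FMT, MAGIC, version, original_len,
                       compressed_len, n_buckets, m_slots)


def unpack_header(buf: bytes) -> tuple:
    """Read the header fields back from the start of a buffer."""
    return struct.unpack_from(HEADER_FMT, buf, 0)


def build_stream(header: bytes, first_value: float, payload: bytes) -> bytes:
    # The header is followed by the first floating point number stored
    # uncompressed, then by the encoded records of the later values.
    return header + struct.pack("<d", first_value) + payload
```

Storing the first value uncompressed gives the decoder a starting entry for its window before any encoded record is read.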

The embodiment of the application realizes the following technical effects: establishing an index table, wherein the index table is provided with N buckets, each bucket has M slots, and N and M are positive integers; determining a key value as an index based on the binary representation of the current floating point number; searching for a target bucket among the N buckets by using a hash search method according to the key value; sequentially performing exclusive-or calculation on the current floating point number and the data in each slot of the target bucket by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of zero bits as the base value for encoding the current floating point number, and recording the position of the base value in the current window; encoding the current floating point number according to the number of zero bits of the second value, obtained by performing exclusive-or calculation on the current floating point number and the base value, and according to the position of the base value in the current window; and compressing and storing the encoded floating point number according to a preset storage format. The application realizes lossless compression of data by using an exclusive-or method, improves the compression rate through efficient search of data over a duration window, and improves the decompression speed through a simplified data representation.

It will be apparent to those skilled in the art that the modules or steps of the embodiments of the application described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that shown or described herein; alternatively, the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, embodiments of the application are not limited to any specific combination of hardware and software.

The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A floating point number streaming data compression method, comprising:

2. The floating point number streaming data compression method of claim 1, wherein the determining a key value as an index based on the binary representation of the current floating point number comprises:

using several tail bits of the binary representation of the current floating point number as a first key value serving as the index.

3. The floating point number streaming data compression method of claim 2, wherein the determining a key value as an index based on the binary representation of the current floating point number further comprises:

using header bits of the binary representation of the current floating point number as a second key value serving as the index, wherein, when the floating point number is 64 bits, the sign bit together with the upper 6th bit to the upper 12th bit is used as the second key value, and when the floating point number is 32 bits, the sign bit together with the upper 6th bit to the upper 12th bit is used as the second key value.

4. The floating point number streaming data compression method of claim 1, wherein the determining a key value as an index based on the binary representation of the current floating point number comprises:

when the floating point number is 64 bits, using the sign bit, the upper 7th bit, and the lower 6th bit of the binary representation of the current floating point number as a third key value serving as the index;

when the floating point number is 32 bits, using the sign bit, the upper 5th bit, and the lower 6th bit of the binary representation of the current floating point number as the third key value serving as the index.

5. The floating point number stream data compression method as set forth in claim 1, wherein the encoding the current floating point number based on the bit number of zeros of the second value obtained by exclusive-or calculation of the current floating point number with the base value and the position of the base value in the current window comprises:

the position of the base value in the current window is expressed as an offset, whose value ranges from 0 to (2^(M-1) - 1); the second value obtained by exclusive-or calculation of the current floating point number and the base value is expressed as Xorj = xj ⊕ xi, where xj is the current floating point number and xi is the base value;

when each bit of the Xorj is zero, only the offset is recorded during encoding;

6. The floating point number stream data compression method as claimed in any one of claims 1 to 5, wherein after the step of sequentially xoring the current floating point number with the data in each slot in the target bucket by using a linear search method to obtain a plurality of first values, taking the data corresponding to the first value with the largest number of bits of zero as a base value of the current floating point number code, and recording the position of the base value in the current window, the method further comprises:

7. The floating point number streaming data compression method as in any one of claims 1-5, wherein the header of the preset storage format includes: a magic number, a version number, the original length of the data, the compressed length of the data, and the parameters used for encoding and compression; the header is followed by the original value of the first floating point number and then the compressed encoded value of each floating point number.

8. A floating point number streaming data compression device, comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the floating point number streaming data compression method of any one of claims 1 to 7 when executing the computer program.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for executing the floating point number streaming data compression method according to any one of claims 1 to 7.