CN114301468A - FSE encoding method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114301468A
Authority
CN
China
Prior art keywords
coded
symbol
encoded
sequence
fse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111608151.1A
Other languages
Chinese (zh)
Inventor
张永兴
陈静静
吴睿振
张旭
孙华锦
王凛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd filed Critical Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202111608151.1A priority Critical patent/CN114301468A/en
Publication of CN114301468A publication Critical patent/CN114301468A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses an FSE encoding method, apparatus, device, and storage medium. The method comprises: acquiring a symbol sequence to be coded corresponding to data to be coded, and determining, from the symbol sequence, the symbols whose states have no dependency relationship with the states of other symbols, to obtain target symbols to be coded; splitting the symbol sequence based on the positions of the target symbols in the sequence to obtain a plurality of sub-sequences to be coded; and encoding the plurality of sub-sequences independently to obtain the encoded data. By using the symbols with a unique state as split points, the symbol sequence is divided into a plurality of sub-sequences. Compared with serially encoding the entire symbol sequence, the sub-sequences can be encoded in parallel, which improves the encoding rate.

Description

FSE encoding method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an FSE encoding method, apparatus, device, and storage medium.
Background
With the rapid development of technologies such as big data, data volumes are growing explosively, and this mass of data puts heavy pressure on existing storage equipment. In the face of continuously growing data, data compression is an effective way to reduce the storage burden on servers and lower storage cost. Without losing useful information, data compression reduces the data volume, which saves storage space and improves transmission, storage, and processing efficiency. Lossless data compression is generally achieved by two kinds of methods. One is dictionary-based compression, including the LZ series of algorithms, which search for repeated data. The other is compression based on a statistical model, such as Huffman coding and arithmetic coding. The core idea of the latter is to allocate code lengths according to symbol frequency: the more frequent a symbol is, the shorter its code.
However, Huffman coding can only assign codewords of integer length. Arithmetic coding removes this limitation, but it is complex to implement and its coding efficiency is low (approximately 1/10 that of Huffman coding), which limits its use in data compression. The Finite State Entropy (FSE) coding algorithm combines the advantages of Huffman coding and arithmetic coding. Building on it, the hybrid compression algorithm Zstandard (zstd), which consists of LZ77 encoding, Huffman encoding, and FSE, achieves better compression performance than other compression algorithms (such as DEFLATE and LZ4). However, its encoding rate is significantly limited by the tightly chained serial encoding flow, so its full performance cannot be realized.
Therefore, how to provide an FSE encoding method with a higher encoding rate is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides an FSE encoding method, apparatus, device and storage medium, which can implement parallel encoding of subsequences and improve encoding rate. The specific scheme is as follows:
a first aspect of the present application provides an FSE encoding method, including:
acquiring a symbol sequence to be coded corresponding to the data to be coded, and determining a symbol to be coded with a symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded to obtain a target symbol to be coded;
splitting the symbol sequence to be coded based on the position of the target symbol to be coded in the symbol sequence to be coded to obtain a plurality of sub sequences to be coded corresponding to the symbol sequence to be coded;
and encoding the plurality of sub-sequences to be coded independently to obtain the encoded data.
Optionally, the determining, from the sequence of symbols to be encoded, a symbol to be encoded whose symbol state does not have a dependency relationship with other symbol states to obtain a target symbol to be encoded includes:
determining the occurrence frequency of each symbol to be coded in the symbol sequence to be coded so as to obtain the normalized probability corresponding to each symbol to be coded;
and determining the symbol to be coded with the symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded according to the normalized probability so as to obtain the target symbol to be coded.
Optionally, after determining the occurrence frequency of each symbol to be encoded in the symbol sequence to be encoded, the method further includes:
and calculating to obtain the normalized probability corresponding to each symbol to be coded according to the occurrence frequency of each symbol to be coded, the total symbol number of the symbol sequence to be coded and the length of a preset FSE code table for coding.
Optionally, the determining, according to the normalized probability, a symbol to be coded whose symbol state does not have a dependency relationship with other symbol states from the symbol sequence to be coded to obtain the target symbol to be coded includes:
and determining, from the symbol sequence to be coded, the symbols to be coded whose normalized probability has an absolute value of 1, so as to obtain the target symbol to be coded.
Optionally, the splitting the symbol sequence to be encoded based on the position of the target symbol to be encoded in the symbol sequence to be encoded to obtain a plurality of sub-sequences to be encoded corresponding to the symbol sequence to be encoded, including:
splitting the symbol sequence to be coded at the starting position of each target symbol to be coded, so as to separate each target symbol from the symbol sequence and obtain first-type sub-sequences to be coded, whose number equals the number of target symbols to be coded, and second-type sub-sequences to be coded, whose number is one more than the number of target symbols to be coded; each first-type sub-sequence corresponds to one target symbol to be coded, and each second-type sub-sequence is a broken chain segment formed by the gaps at the split positions.
Optionally, the separately encoding the multiple sub-sequences to be encoded to obtain the encoded data to be encoded includes:
and encoding the plurality of sub-sequences to be coded independently by using a preset FSE code table to obtain a plurality of encoded sub-sequences, and splicing the encoded sub-sequences according to their order in the symbol sequence to be coded to obtain the encoded data.
Optionally, the separately encoding the plurality of sub-sequences to be encoded includes:
a plurality of encoders are utilized to carry out independent encoding on a plurality of subsequences to be encoded in parallel; wherein one of the encoders corresponds to one of the sub-sequences to be encoded.
A second aspect of the present application provides an FSE encoding apparatus comprising:
an acquisition module, configured to acquire a symbol sequence to be coded corresponding to data to be coded, and to determine, from the symbol sequence to be coded, a symbol to be coded whose symbol state has no dependency relationship with other symbol states, so as to obtain a target symbol to be coded;
the splitting module is used for splitting the symbol sequence to be coded based on the position of the target symbol to be coded in the symbol sequence to be coded so as to obtain a plurality of sub-sequences to be coded corresponding to the symbol sequence to be coded;
and the coding module is used for separately coding the plurality of subsequences to be coded to obtain the coded data to be coded.
A third aspect of the present application provides an electronic device comprising a processor and a memory; wherein the memory is used to store a computer program that is loaded and executed by the processor to implement the aforementioned FSE encoding method.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein computer-executable instructions that, when loaded and executed by a processor, implement the aforementioned FSE encoding method.
In the present application, a symbol sequence to be coded corresponding to the data to be coded is first obtained, and the symbols whose states have no dependency relationship with other symbol states are determined from the sequence to obtain the target symbols to be coded; the symbol sequence is then split based on the positions of the target symbols to obtain a plurality of sub-sequences to be coded; finally, the sub-sequences are encoded independently to obtain the encoded data. By finding the symbols whose states have no dependency on other symbol states, that is, the symbols with a unique state, and using them as split points, the symbol sequence is divided into a plurality of sub-sequences. Compared with serially encoding the entire symbol sequence, the sub-sequences can be encoded in parallel, which improves the encoding rate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a method for FSE encoding provided herein;
FIG. 2 is a diagram illustrating a specific FSE encoding method provided herein;
FIG. 3 is an illustration of an FSE code representation provided herein;
FIG. 4 is a diagram illustrating an example of a specific encoding process for a subsequence to be encoded according to the present application;
FIG. 5 is a diagram illustrating another example of a specific encoding process for a subsequence to be encoded according to the present application;
FIG. 6 is a schematic structural diagram of an FSE encoding apparatus provided in the present application;
FIG. 7 is a block diagram of an electronic device for FSE encoding provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The existing FSE entropy coding algorithm combines the advantages of Huffman coding and arithmetic coding. Building on it, the hybrid compression algorithm Zstandard (zstd), which consists of LZ77 coding, Huffman coding, and FSE, achieves better compression performance than other compression algorithms (such as DEFLATE and LZ4). However, its encoding rate is significantly limited by the tightly chained serial encoding flow, so its full performance cannot be realized. In view of these technical defects, the present application provides an FSE encoding scheme that enables parallel encoding of sub-sequences and improves the encoding rate.
Fig. 1 is a flowchart of an FSE encoding method according to an embodiment of the present application. Referring to fig. 1, the FSE encoding method includes:
s11: obtaining a symbol sequence to be coded corresponding to the data to be coded, and determining a symbol to be coded with a symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded to obtain a target symbol to be coded.
In this embodiment, when encoding data to be encoded, a symbol sequence to be encoded corresponding to the data is obtained, and the symbols whose states have no dependency relationship with other symbol states are determined from the sequence to obtain the target symbols to be encoded. The state of a target symbol in the FSE code table is unique. The symbols to be encoded are the symbols (Symbols) of the Zstandard protocol, that is, the output of the LZ77 algorithm after it searches the source data for repetitions. There are four types of output symbols: literal, literal_length, match_length, and offset. Huffman coding is used for the literals, and FSE coding is used for the other three types.
Specifically, the occurrence frequency of each symbol to be coded in the symbol sequence to be coded is determined first, so as to obtain the normalized probability corresponding to each symbol to be coded. The normalized probability corresponding to each symbol to be coded can be obtained by calculation according to the occurrence frequency of each symbol to be coded, the total number of the symbols of the symbol sequence to be coded and the length of a preset FSE code table used for coding. And then determining the symbol to be coded with the symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded according to the normalized probability so as to obtain the target symbol to be coded.
It is understood that the length of the FSE code table is an integer power of 2, and the zstd protocol refers to the indexes of the FSE code table as states, each state corresponding to one symbol. Since the number of states assigned to each symbol in the FSE code table depends on the normalized probability of the symbol, the symbol frequencies must be normalized (Normalize) to generate a probability table. The normalization formula is as follows:
norm(s) = freq(s) / total_symbols × Table_size
The normalized probability of a symbol is proportional to the share of its occurrence frequency in the total. The normalization algorithm must satisfy two principles: (1) the sum of the normalized probabilities must equal Table_size; (2) any symbol that appears must have a normalized probability of at least 1. Assuming that the total number of source symbols is 1024, the length of the FSE code table is 32, the frequency of symbol s1 is 32, and the frequency of symbol s2 is 2, the formula above gives:
norm(s1) = 32 / 1024 × 32 = 1
norm(s2) = 2 / 1024 × 32 = 0.0625
By principle (2), norm(s2) must be raised to 1. To distinguish the two cases in which the normalized probability equals 1, the Zstandard protocol marks the s2 case with "-1" and the s1 case with "1". Therefore, this embodiment takes the normalized probabilities "1" and "-1" as the breakthrough point: the symbols to be encoded whose normalized probability has an absolute value of 1 are determined from the symbol sequence to obtain the target symbols to be encoded.
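The normalization and target-symbol selection described above can be sketched as follows. This is a minimal illustration, not the normalization routine of any particular zstd implementation: the function name `normalize`, the symbol `rest` (standing for all remaining source symbols), and the correction step that pushes rounding error onto the most frequent symbol are assumptions made for the example.

```python
def normalize(freqs, table_size):
    """Scale symbol counts so that the magnitudes sum to table_size.

    Symbols whose scaled value falls below 1 are marked -1 (they still
    occupy one state); all others are rounded to the nearest integer.
    """
    total = sum(freqs.values())
    norm = {}
    for sym, count in freqs.items():
        scaled = count * table_size / total
        norm[sym] = -1 if scaled < 1 else max(1, round(scaled))
    # Assumed correction step: move any rounding surplus or deficit onto
    # the most frequent symbol so the magnitudes sum to table_size.
    largest = max(freqs, key=freqs.get)
    norm[largest] += table_size - sum(abs(v) for v in norm.values())
    return norm


# s1 and s2 follow the example in the text; "rest" is a hypothetical
# stand-in for the other 990 source symbols.
norm = normalize({"s1": 32, "s2": 2, "rest": 990}, 32)

# Target symbols are those whose normalized probability has absolute value 1.
targets = [sym for sym, value in norm.items() if abs(value) == 1]
```

Here `s1` keeps the value 1, `s2` is marked -1, and both are selected as target symbols. The correction step is deliberately crude and could misbehave on very skewed distributions.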
The rationale for selecting the target symbols is explained with reference to the existing FSE code table construction process. States are first assigned to the symbols with probability "-1", in reverse order starting from state (Table_size - 1). The pseudocode is as follows:
[Figure: pseudocode for assigning states to symbols with probability "-1"]
States are then assigned to the other symbols. The pseudocode is as follows:
[Figure: pseudocode for assigning states to the remaining symbols]
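The two assignment passes can be sketched as follows. The spreading rule used here, a step of (Table_size >> 1) + (Table_size >> 3) + 3 that skips positions already taken by "-1" symbols, is taken from the Zstandard format documentation and is an assumption about what the pseudocode figures contain. The function name and the 16-entry distribution are illustrative only.

```python
def build_state_table(norm, table_size):
    """Return a list mapping each state (index) to its symbol."""
    table = [None] * table_size
    high = table_size - 1

    # Pass 1: symbols marked -1 take the highest states, in reverse order.
    for sym in sorted(norm):
        if norm[sym] == -1:
            table[high] = sym
            high -= 1

    # Pass 2: spread the remaining symbols with a fixed step,
    # skipping the positions reserved in pass 1.
    step = (table_size >> 1) + (table_size >> 3) + 3
    mask = table_size - 1
    pos = 0
    for sym in sorted(norm):
        for _ in range(max(norm[sym], 0)):
            table[pos] = sym
            pos = (pos + step) & mask
            while pos > high:
                pos = (pos + step) & mask
    return table


def states_of(table, sym):
    """List the states assigned to one symbol, in ascending order."""
    return [state for state, s in enumerate(table) if s == sym]


# Hypothetical distribution over a 16-entry table (magnitudes sum to 16).
norm = {"a": 1, "b": -1, "c": 2, "d": 2, "e": 4, "f": 3, "g": 3}
table = build_state_table(norm, 16)
```

With this distribution, symbol `a` (probability 1) and symbol `b` (probability -1) each own exactly one state, which is why their state is known before encoding starts.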
Each state in the FSE code table corresponds to one symbol and also to one state range (State-range), described as (base, number_bits). During FSE encoding and decoding, the State-range of each state describes the range of state values of the next symbol, [base, base + (1 << number_bits)). A symbol has several states in the FSE code table; the State-ranges of these states do not overlap, and the sum of their lengths equals the total number of states in the FSE code table (Table_size). The length of a State-range must be an integer power of 2 (the exponent is number_bits), and the State-ranges of all states of the same symbol take either one or two distinct lengths. If the probability of a symbol is exactly an integer power of 2 (norm = 2^n), the length of the State-range takes a single value:
number_bits = table_log - n
State_range_length = 2^number_bits
In other cases, the length of the State-range takes two values:
num_bits1 = table_log - floor(log2(norm))
num_bits2 = table_log - ceil(log2(norm))
For a given symbol, all of its states are first sorted. The smaller states take the State-range length given by num_bits1, and the larger states take the length given by num_bits2. Let N = 2^ceil(log2(norm)); then N - norm states use num_bits1 and 2 × norm - N states use num_bits2. The baselines (Baseline) of the State-ranges are assigned starting from the first of the larger states.
Assuming that the length of the FSE code table is 64 and the normalized probability of a symbol s is 5, if assignment starts from state 0, the states assigned to s are [0, 43, 22, 1, 44], which sort to [0, 1, 22, 43, 44]. N = 2^3 = 8, so N - norm = 3 states have a number_bits value of 4, namely [0, 1, 22], and 2 × norm - N = 2 states have a number_bits value of 3, namely [43, 44]. The baseline of State(43) is 0, that of State(44) is 8 (0 + 8), that of State(0) is 16 (8 + 8), that of State(1) is 32 (16 + 16), and that of State(22) is 48 (32 + 16). The calculation is summarized in the following table:
state_order   0      1      2      3     4
State         0      1      22     43    44
Number_bits   4      4      4      3     3
Base_line     16     32     48     0     8
State_range   16-31  32-47  48-63  0-7   8-15
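The State-range calculation summarized in the table can be reproduced with a short sketch. The function name `state_ranges` and its return shape are assumptions made for illustration; the rule itself (smaller states get the wider ranges, baselines start at the larger states) follows the description above.

```python
def state_ranges(states, table_log):
    """Map each state of one symbol to (number_bits, baseline)."""
    ordered = sorted(states)
    norm = len(ordered)                   # normalized probability of the symbol
    ceil_log = (norm - 1).bit_length()    # ceil(log2(norm))
    floor_log = norm.bit_length() - 1     # floor(log2(norm))
    n_wide = (1 << ceil_log) - norm       # N - norm states use the larger width

    wide, narrow = ordered[:n_wide], ordered[n_wide:]
    ranges = {}
    baseline = 0
    # Baselines start at the larger states (narrow ranges) and then
    # continue with the smaller states (wide ranges).
    for group, bits in ((narrow, table_log - ceil_log),
                        (wide, table_log - floor_log)):
        for state in group:
            ranges[state] = (bits, baseline)
            baseline += 1 << bits
    return ranges


# Symbol s from the example: five states in a 64-entry table (table_log = 6).
ranges = state_ranges([0, 43, 22, 1, 44], 6)
```

For the example symbol this yields (4, 16), (4, 32), (4, 48) for states 0, 1, 22 and (3, 0), (3, 8) for states 43, 44, matching the table. When the probability is a power of 2, `n_wide` is 0 and every state gets the same width.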
In encoding, assume that the previous symbol s1 has selected State[39], and that the next symbol s2 is the symbol s described by Table 1. In FSE encoding, s2 must select the state whose State-range contains state 39, which here is State[1] (range 32-47). Once the state of s2 is determined, the state of s1 can be encoded: State[39] is encoded with a bit width equal to the number_bits of State[1] (4) and a value of 7 (39 - 32, where 32 is the baseline of State[1]). This completes the encoding of s1. Assuming that s3 is the same symbol as s2, s3 selects State[43], whose State-range (0-7) contains state 1; s2 is then encoded with a bit width equal to the number_bits of State[43] (3) and a value of 1 (1 - 0, where 0 is the baseline of State[43]).
Therefore, the state of the 1st symbol determines the state of the 2nd symbol, and the codeword of the 1st symbol is determined by the range of the 2nd symbol's state; the state of the 2nd symbol determines the state of the 3rd symbol, and the codeword of the 2nd symbol is determined by the range of the 3rd symbol's state; and so on, until the state of the second-to-last symbol determines the state of the last symbol, and the codeword of the second-to-last symbol is determined by the range of the last symbol's state. In summary, the state of the previous symbol determines the state of the next symbol, while the encoded content (code length and codeword) of the previous symbol's state is determined by the next symbol. In the FSE encoding process, the state of each symbol in the whole symbol sequence therefore depends on the state of the previous symbol, and the encoding proceeds step by step as a tightly chained serial flow, which makes it difficult to encode a symbol sequence in parallel. However, a special symbol that has only one state, that is, a symbol with a normalized probability of "1" or "-1", always selects that state, so its state can be predicted in advance.
S12: splitting the symbol sequence to be coded based on the position of the target symbol to be coded in the symbol sequence to be coded to obtain a plurality of sub-sequences to be coded corresponding to the symbol sequence to be coded.
In this embodiment, after the target symbols to be encoded are found, the symbol sequence to be encoded is split based on the positions of the target symbols in the sequence, so as to obtain a plurality of sub-sequences to be encoded. Specifically, the symbol sequence is split at the starting position of each target symbol, which separates each target symbol from the sequence and yields first-type sub-sequences to be encoded, whose number equals the number of target symbols, and second-type sub-sequences to be encoded, whose number is one more than the number of target symbols. Each first-type sub-sequence corresponds to one target symbol, and each second-type sub-sequence is a broken chain segment formed by the gaps at the split positions.
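A minimal sketch of the splitting step, assuming the layout used in the worked example of this description, where every sub-sequence after the first begins with a target symbol. The function name and the short input string are hypothetical.

```python
def split_at_targets(symbols, targets):
    """Split a symbol string so each target symbol starts a new sub-sequence."""
    if not symbols:
        return []
    subsequences = []
    start = 0
    for i, sym in enumerate(symbols):
        # Cut in front of every target symbol (except at position 0,
        # where there is nothing before it to cut off).
        if sym in targets and i > start:
            subsequences.append(symbols[start:i])
            start = i
    subsequences.append(symbols[start:])
    return subsequences


# Hypothetical input; "a" and "b" play the role of the target symbols.
parts = split_at_targets("fdgcaedcbfea", {"a", "b"})
```

Because the state of each target symbol is fixed, every sub-sequence that begins with one has a known starting state and does not need the result of the preceding sub-sequence.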
S13: and respectively and independently coding the plurality of subsequences to be coded to obtain the coded data to be coded.
In this embodiment, the plurality of sub-sequences to be encoded are encoded independently to obtain the encoded data. A plurality of encoders may be used to encode the sub-sequences in parallel, with one encoder corresponding to one sub-sequence, which raises the FSE encoding rate in hardware. On this basis, the sub-sequences are encoded independently using a preset FSE code table to obtain a plurality of encoded sub-sequences, and the encoded sub-sequences are spliced according to their order in the symbol sequence to be encoded to obtain the encoded data.
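The encode-in-parallel and splice-in-order flow can be sketched in software as follows. This is only an illustration of the control flow: `encode_subsequence` is a placeholder that emits a fixed 3-bit code per symbol rather than a real FSE encoder, and a thread pool stands in for the multiple hardware encoders described above.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_subsequence(subsequence):
    """Placeholder encoder (assumption): fixed 3-bit code for symbols a..g."""
    return "".join(format(ord(sym) - ord("a"), "03b") for sym in subsequence)


def encode_parallel(subsequences, workers=4):
    """Encode each sub-sequence independently, then splice in source order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the splice
        # follows the natural order of the source data.
        encoded = list(pool.map(encode_subsequence, subsequences))
    return "".join(encoded)


stream = encode_parallel(["ab", "c"])
```

The point of the sketch is that no worker needs the output of another, so the only ordering constraint is the final concatenation.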
As shown in FIG. 2, in a specific implementation of this embodiment, the entire symbol sequence (symbols-sequences) is first searched to find the positions of all symbols with a probability of "1" or "-1". Using the found positions as starting positions, the whole symbol sequence is then divided into a plurality of sub-sequences (sub-sequences). Each sub-sequence is encoded using the FSE code table, which generates n FSE-encoded data segments; all sub-sequences can be encoded in parallel. Finally, the n FSE-encoded segments are spliced in the natural order of the source data, and the encoded code stream is output. More specifically, assume that the symbol sequence to be encoded is "fddgecaedcfcegeeveeeadedgedfeggfeggedgegfegefgfegefgfecgaefbgfdeffagfeccfgegfecgfeggegfecgfecgfecgfecgfecgfeffedfecgfeffagfaggfaggfecgfaggfecgfaggfecgfaggfaggfecgfagdfecgfagcdef", and the symbol frequency statistics are as follows:
a b c d e f g
9 5 16 17 32 24 24
The normalization results are as follows:
a b c d e f g
1 -1 2 2 4 3 3
assuming that the length of the FSE code table is 16, the code table shown in FIG. 3 is constructed. Finding the characters a and b, the character string can be divided into the following 15 subsequences: "fdegc", "aedcfcege", "afee", "adedgfecc", "afgedgegface", "bfeccegcgef", "bcdg", "aefde", "bgfdef", "agfecfgeff", "adgecfgfcgec", "bfefdegcgecgef", "acefdgccdeff", "acefdgccdefd". The encoding process is illustrated by "afgedgegface" and "bfecgegcgef", respectively.
For "afgedgegface", the state of a is 0. The next symbol is f, which corresponds to states 2 (8-15), 5 (0-3), and 12 (4-7); state 0 lies in the range of state 5, so f selects state 5, and a is encoded with a bit width of 2 and a value of 0. The next symbol is g, which corresponds to states 3 (8-15), 6 (0-3), and 9 (4-7); state 9 is selected, and f is encoded with a bit width of 2 and a value of 1 (5 - 4). The next symbol is e, which corresponds to states 1 (0-3), 8 (4-7), 11 (8-11), and 14 (12-15); state 11 is selected, and g is encoded with a bit width of 2 and a value of 1 (9 - 8). The next symbol is g, which corresponds to states 3 (8-15), 6 (0-3), and 9 (4-7); state 3 is selected, and e is encoded with a bit width of 3 and a value of 3 (11 - 8). The rest of the process follows by analogy and is shown in FIG. 4; the final binary code stream is 0001010111000011010111101000011110. Similarly, the mapping process for "bfeccgegcgef" is shown in FIG. 5, and the final binary code stream is 11110001010110111001011010101. Splicing the two code streams gives 000101011100001101011110100001111011110001010110111001011010101.
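The first four steps of this walkthrough can be replayed with a small sketch. The State-ranges for e, f, and g are copied from the paragraph above; the function name `encode_steps` and the data layout are assumptions, and only the prefix "afgeg" is traced because the ranges of the remaining symbols are not listed in the text.

```python
# State ranges quoted in the walkthrough: symbol -> {state: (low, high)}
RANGES = {
    "f": {2: (8, 15), 5: (0, 3), 12: (4, 7)},
    "g": {3: (8, 15), 6: (0, 3), 9: (4, 7)},
    "e": {1: (0, 3), 8: (4, 7), 11: (8, 11), 14: (12, 15)},
}


def encode_steps(first_state, following):
    """Trace the states selected and the bits emitted for each transition."""
    state = first_state
    states = [state]
    emitted = []
    for sym in following:
        for next_state, (low, high) in RANGES[sym].items():
            if low <= state <= high:
                # The range length is a power of 2; its exponent is the bit width.
                width = (high - low + 1).bit_length() - 1
                emitted.append(format(state - low, "0{}b".format(width)))
                state = next_state
                states.append(state)
                break
        else:
            raise ValueError("no state of %r covers state %d" % (sym, state))
    return states, emitted


# "a" has the single state 0, so the sub-sequence can start without
# knowing anything about the symbols that precede it.
states, emitted = encode_steps(0, "fgeg")
```

The emitted pieces are `00`, `01`, `01`, `011`, which concatenate to the first nine bits of the code stream quoted for this sub-sequence.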
It can be seen that, in this embodiment of the application, a symbol sequence to be coded corresponding to the data to be coded is first obtained, and the symbols whose states have no dependency relationship with other symbol states are determined from the sequence to obtain the target symbols to be coded; the symbol sequence is then split based on the positions of the target symbols to obtain a plurality of sub-sequences to be coded; finally, the sub-sequences are encoded independently to obtain the encoded data. By finding the symbols whose states have no dependency on other symbol states, that is, the symbols with a unique state, and using them as split points, the symbol sequence is divided into a plurality of sub-sequences. Compared with serially encoding the entire symbol sequence, the sub-sequences can be encoded in parallel, which improves the encoding rate.
Referring to fig. 6, an embodiment of the present application further discloses an FSE encoding apparatus, which includes:
the obtaining module 11 is configured to obtain a symbol sequence to be encoded corresponding to the data to be encoded, and determine a symbol to be encoded from the symbol sequence to be encoded, where a symbol state and other symbol states do not have a dependency relationship, so as to obtain a target symbol to be encoded;
a splitting module 12, configured to split the symbol sequence to be encoded based on a position of the target symbol to be encoded in the symbol sequence to be encoded, so as to obtain a plurality of sub-sequences to be encoded corresponding to the symbol sequence to be encoded;
and the encoding module 13 is configured to separately encode the multiple sub-sequences to be encoded, so as to obtain the encoded data to be encoded.
It can be seen that, as with the method embodiment, the apparatus of this embodiment takes the symbols whose states have no dependency relationship with other symbol states, that is, the symbols with a unique state, as split points, splits the symbol sequence to be coded into a plurality of sub-sequences, and encodes the sub-sequences independently. Compared with serially encoding the entire symbol sequence, the sub-sequences can be encoded in parallel, which improves the encoding rate.
In some specific embodiments, the obtaining module 11 specifically includes:
the frequency determining unit is used for determining the occurrence frequency of each symbol to be coded in the symbol sequence to be coded;
the probability determining unit is used for calculating to obtain the normalized probability corresponding to each symbol to be coded according to the occurrence frequency of each symbol to be coded, the total symbol number of the symbol sequence to be coded and the length of a preset FSE code table for coding;
and the symbol determining unit is used for determining the symbol to be coded with the symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded according to the normalized probability so as to obtain the target symbol to be coded.
In some specific embodiments, the symbol determining unit is specifically configured to determine, from the symbol sequence to be encoded, a symbol to be encoded whose normalized probability has an absolute value of 1, so as to obtain the target symbol to be encoded.
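The selection step described above can be sketched as follows. This is a minimal illustration, not the normalization used by any particular FSE implementation: the function names `normalized_counts` and `target_symbols`, the `table_log` parameter, and the rounding rule (floor, at least one slot per symbol, remainder given to the most frequent symbol) are all assumptions made for the example. The absolute value is taken because the text speaks of an absolute value of 1, which also covers implementations that mark such symbols with -1.

```python
from collections import Counter


def normalized_counts(symbols, table_log=5):
    """Scale raw frequencies so that they sum to the table length 2**table_log.

    Simplified, hypothetical rounding rule: floor each share, give every
    present symbol at least one slot, and let the most frequent symbol
    absorb the rounding remainder.
    """
    table_size = 1 << table_log
    freq = Counter(symbols)
    total = len(symbols)
    norm = {s: max(1, (c * table_size) // total) for s, c in freq.items()}
    largest = max(freq, key=freq.get)
    norm[largest] += table_size - sum(norm.values())
    if norm[largest] < 1:
        raise ValueError("table too small for this alphabet")
    return norm


def target_symbols(norm):
    """Symbols whose normalized count has absolute value 1 (a single state)."""
    return {s for s, n in norm.items() if abs(n) == 1}


# Usage: 32 symbols mapped onto a 32-entry table.
sequence = "a" * 20 + "b" * 10 + "c" + "d"
norm = normalized_counts(sequence, table_log=5)
print(sorted(target_symbols(norm)))  # ['c', 'd']
```

Symbols `c` and `d` each occupy exactly one slot of the table, so they are the candidates for split points in this toy input.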
In some specific embodiments, the splitting module 12 is specifically configured to split the symbol sequence to be encoded at the position where each target symbol to be encoded begins, so that each target symbol to be encoded is split out of the symbol sequence to be encoded. This yields first-type sub-sequences to be encoded, whose number equals the number of target symbols to be encoded, and second-type sub-sequences to be encoded, whose number is one more than the number of target symbols to be encoded. Each first-type sub-sequence to be encoded corresponds to one target symbol to be encoded, and each second-type sub-sequence to be encoded is a broken chain, that is, the run of symbols left between split positions.
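A minimal sketch of this splitting rule, assuming the target symbols are already known. The function name `split_at_targets` and the labels `"target"` and `"chain"` are hypothetical and only serve the illustration; a chain may be empty when two target symbols are adjacent or when a target symbol sits at either end of the sequence.

```python
def split_at_targets(sequence, targets):
    """Split a symbol sequence at every occurrence of a target symbol.

    Returns (kind, subsequence) pairs in original order, where kind is
    "target" for a first-type sub-sequence (exactly one target symbol)
    and "chain" for a second-type sub-sequence (the run between splits).
    """
    parts = []
    chain = []
    for sym in sequence:
        if sym in targets:
            parts.append(("chain", chain))   # close the run before the split
            parts.append(("target", [sym]))  # the target symbol on its own
            chain = []
        else:
            chain.append(sym)
    parts.append(("chain", chain))           # trailing run after the last split
    return parts


# Usage: two target occurrences give two "target" parts and three "chain" parts.
for kind, sub in split_at_targets(list("aabcaadab"), {"c", "d"}):
    print(kind, "".join(sub))
```

With k occurrences of target symbols the function always returns k first-type parts and k + 1 second-type parts, which matches the counts stated in the text.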
In some embodiments, the encoding module 13 specifically includes:
the coding unit is used for respectively and independently coding the plurality of subsequences to be coded by utilizing a preset FSE code table to obtain a plurality of coded subsequences to be coded;
and the splicing unit is used for splicing the plurality of encoded sub-sequences according to their order in the symbol sequence to be encoded, so as to obtain the encoded data.
In some embodiments, the encoding unit is further configured to separately encode the plurality of sub-sequences to be encoded in parallel by using a plurality of encoders; wherein one of the encoders corresponds to one of the sub-sequences to be encoded.
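The encode-then-splice structure can be sketched as below. This shows only the control flow: `encode_subsequence` is a hypothetical placeholder that emits fixed-width bits per symbol and stands in for a table-driven FSE encoder, which the sketch does not implement. The worker pool models "one encoder per sub-sequence"; in CPython, threads do not give true CPU parallelism, so a hardware or multi-process design would be needed for a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_subsequence(subseq):
    """Placeholder encoder (assumption): 8 bits per symbol, not real FSE."""
    return "".join(format(ord(s), "08b") for s in subseq)


def encode_parallel(subsequences, max_workers=4):
    """Encode each sub-sequence independently, then splice in original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of its inputs, so the
        # splice below follows the order of the original symbol sequence.
        encoded = list(pool.map(encode_subsequence, subsequences))
    return "".join(encoded)


# Usage: sub-sequences as produced by splitting "aabcaadab" at 'c' and 'd'.
bits = encode_parallel(["aab", "c", "aa", "d", "ab"])
print(len(bits))  # 72
```

Because each call to the placeholder depends only on its own input, the sub-sequences can be handed to separate workers without any shared state, which is the property the embodiment relies on.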
Further, the embodiment of the application also provides electronic equipment. FIG. 7 is a block diagram illustrating an electronic device 20 in accordance with an exemplary embodiment, the contents of which should not be construed as limiting the scope of use of the present application in any way.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the FSE encoding method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it may include an operating system 221, a computer program 222, data 223, and the like, and the storage may be transient or permanent.
The operating system 221 is used for managing and controlling each hardware device and the computer program 222 on the electronic device 20, so as to realize the operation and processing of the mass data 223 in the memory 22 by the processor 21, and may be Windows Server, Netware, Unix, Linux, and the like. The computer programs 222 may further include computer programs that can be used to perform other specific tasks in addition to the computer programs that can be used to perform the FSE encoding method performed by the electronic device 20 disclosed in any of the foregoing embodiments. Data 223 may include data to be encoded that is collected by electronic device 20.
Further, an embodiment of the present application also discloses a storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the FSE encoding method disclosed in any of the foregoing embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The FSE encoding method, apparatus, device and storage medium provided by the present invention are described in detail above, and the principle and the implementation of the present invention are explained in detail herein by applying specific examples, and the description of the above embodiments is only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and as described above, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method of FSE encoding, comprising:
acquiring a symbol sequence to be coded corresponding to the data to be coded, and determining a symbol to be coded with a symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded to obtain a target symbol to be coded;
splitting the symbol sequence to be coded based on the position of the target symbol to be coded in the symbol sequence to be coded to obtain a plurality of sub sequences to be coded corresponding to the symbol sequence to be coded;
and respectively and independently coding the plurality of subsequences to be coded to obtain the coded data to be coded.
2. The FSE encoding method as claimed in claim 1, wherein the determining a symbol to be encoded from the symbol sequence to be encoded, which has no dependency relationship between symbol states and other symbol states, to obtain a target symbol to be encoded, comprises:
determining the occurrence frequency of each symbol to be coded in the symbol sequence to be coded so as to obtain the normalized probability corresponding to each symbol to be coded;
and determining the symbol to be coded with the symbol state having no dependency relationship with other symbol states from the symbol sequence to be coded according to the normalized probability so as to obtain the target symbol to be coded.
3. The FSE encoding method of claim 2, wherein said determining the frequency of occurrence of each symbol to be encoded in the sequence of symbols to be encoded further comprises:
and calculating to obtain the normalized probability corresponding to each symbol to be coded according to the occurrence frequency of each symbol to be coded, the total symbol number of the symbol sequence to be coded and the length of a preset FSE code table for coding.
4. The FSE encoding method as claimed in claim 3, wherein the determining, according to the normalized probability, from the symbol sequence to be encoded, the symbol to be encoded whose symbol state has no dependency relationship with other symbol states, to obtain the target symbol to be encoded, comprises:
determining, from the symbol sequence to be encoded, a symbol to be encoded whose normalized probability has an absolute value of 1, so as to obtain the target symbol to be encoded.
5. The FSE encoding method as claimed in claim 1, wherein the splitting the symbol sequence to be encoded based on the position of the target symbol to be encoded in the symbol sequence to be encoded to obtain a plurality of sub-sequences to be encoded corresponding to the symbol sequence to be encoded comprises:
splitting the symbol sequence to be coded at the position where each target symbol to be coded begins, so that each target symbol to be coded is split out of the symbol sequence to be coded, to obtain first-type sub-sequences to be coded, whose number equals the number of target symbols to be coded, and second-type sub-sequences to be coded, whose number is one more than the number of target symbols to be coded; each first-type sub-sequence to be coded corresponds to one target symbol to be coded, and each second-type sub-sequence to be coded is a broken chain, that is, the run of symbols left between split positions.
6. The FSE encoding method as claimed in claim 1, wherein the separately encoding the plurality of sub-sequences to be encoded to obtain the encoded data to be encoded comprises:
coding the plurality of sub-sequences to be coded separately and independently by using a preset FSE code table to obtain a plurality of coded sub-sequences, and splicing the plurality of coded sub-sequences according to their order in the symbol sequence to be coded to obtain the coded data.
7. The FSE encoding method according to any of claims 1 to 6, wherein the separately encoding the plurality of subsequences to be encoded comprises:
a plurality of encoders are utilized to carry out independent encoding on a plurality of subsequences to be encoded in parallel; wherein one of the encoders corresponds to one of the sub-sequences to be encoded.
8. An FSE encoding apparatus, comprising:
an acquisition module, used for acquiring a symbol sequence to be coded corresponding to data to be coded, and determining, from the symbol sequence to be coded, a symbol to be coded whose symbol state has no dependency relationship with other symbol states, so as to obtain a target symbol to be coded;
the splitting module is used for splitting the symbol sequence to be coded based on the position of the target symbol to be coded in the symbol sequence to be coded so as to obtain a plurality of sub-sequences to be coded corresponding to the symbol sequence to be coded;
and the coding module is used for respectively and independently coding the plurality of subsequences to be coded so as to obtain the coded data to be coded.
9. An electronic device, comprising a processor and a memory; wherein the memory is for storing a computer program that is loaded and executed by the processor to implement the FSE encoding method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer-executable instructions which, when loaded and executed by a processor, carry out the FSE encoding method of any of claims 1 to 7.
CN202111608151.1A 2021-12-23 2021-12-23 FSE encoding method, device, equipment and storage medium Pending CN114301468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111608151.1A CN114301468A (en) 2021-12-23 2021-12-23 FSE encoding method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114301468A true CN114301468A (en) 2022-04-08

Family

ID=80970361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111608151.1A Pending CN114301468A (en) 2021-12-23 2021-12-23 FSE encoding method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114301468A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513210A (en) * 2022-04-20 2022-05-17 苏州浪潮智能科技有限公司 State selection method, system, storage medium and device for finite state entropy coding
WO2023202149A1 (en) * 2022-04-20 2023-10-26 苏州浪潮智能科技有限公司 State selection method and system for finite state entropy encoding, and storage medium and device
CN117119066A (en) * 2023-10-20 2023-11-24 国网天津市电力公司营销服务中心 Mobile terminal high-frequency service data dumping method and system adopting message queue
CN117119066B (en) * 2023-10-20 2024-07-02 国网天津市电力公司营销服务中心 Mobile terminal high-frequency service data dumping method and system adopting message queue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination