CN114900193A - Adaptive Huffman coding system and method - Google Patents

Info

Publication number
CN114900193A
CN114900193A CN202210366617.XA CN202210366617A CN114900193A CN 114900193 A CN114900193 A CN 114900193A CN 202210366617 A CN202210366617 A CN 202210366617A CN 114900193 A CN114900193 A CN 114900193A
Authority
CN
China
Prior art keywords
node
binary tree
memory
huffman
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210366617.XA
Other languages
Chinese (zh)
Inventor
萧文远
郑岚心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bouffalo Lab Nanjing Co ltd
Original Assignee
Bouffalo Lab Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bouffalo Lab Nanjing Co ltd
Priority to CN202210366617.XA
Publication of CN114900193A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an adaptive Huffman coding system and method. The system comprises a to-be-encoded data acquisition module and a Huffman binary tree coding module. The to-be-encoded data acquisition module acquires the data to be encoded. The Huffman binary tree coding module constructs a Huffman binary tree from the data acquired by the to-be-encoded data acquisition module, establishes an index between each node of the Huffman binary tree and a memory, and, while the tree is being constructed, calculates the memory index value of a target node from the unique path from the root node to that target node. The system and method allow the coding data structure to be accessed efficiently in hardware, optimize the storage space used for coding, and keep the memory configuration flexible.

Description

Adaptive Huffman coding system and method
Technical Field
The invention belongs to the technical field of artificial intelligence and data coding, relates to a coding system, and particularly relates to an adaptive Huffman coding system and method.
Background
Lossless data compression is a key element of efficient data storage, and Huffman coding is the most popular variable-length coding algorithm. Given a set of data symbols and their probabilities of occurrence, Huffman coding assigns shorter codes to more frequently occurring symbols, generating the codewords so that the average code length is minimized. Because it guarantees optimality among codes that assign one codeword to each symbol, Huffman coding has been widely adopted in various applications. In modern multi-stage compression designs it is often used as the back end of the system to improve the compression performance of domain-specific front ends, as in JPEG and MP3.
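As a small illustration of the static case described above, the following sketch computes Huffman code lengths from symbol frequencies. It is a generic textbook construction, not part of the claimed invention; the function name `huffman_code_lengths` and the sample frequencies are assumptions made for this example.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a static Huffman code.

    Generic textbook construction (illustrative only): repeatedly merge
    the two lightest subtrees; every merge adds one bit to the code of
    each symbol inside the merged subtrees.
    """
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, depths1 = heapq.heappop(heap)
        w2, _, depths2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical frequencies: the most frequent symbol gets the shortest code.
lengths = huffman_code_lengths({"a": 5, "b": 2, "c": 1, "d": 1})
print(lengths)
```

With these sample frequencies the most frequent symbol "a" receives a 1-bit code and the rarest symbols receive 3-bit codes, which is the behaviour the paragraph above describes.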
Adaptive Huffman coding is a dynamic coding technique based on Huffman coding. Unlike static Huffman coding, it does not know the probability of occurrence of each symbol in advance; instead, it collects and updates the symbol probabilities dynamically as the data stream arrives, typically traversing the Huffman tree recursively from the root to the leaf nodes. Implementing adaptive Huffman coding in hardware has three main disadvantages: (1) because of the distribution of AI model parameters and feature map data, adaptive Huffman coding often produces a skewed tree, which is unfavorable for storing the data in memory; (2) recursive data access requires additional space, which increases memory usage; (3) traversing the tree for every symbol is computationally very expensive in a hardware design.
AI models often have millions of parameters and large amounts of feature map data, which require a large amount of memory to store and transmit during edge model inference. Feature maps in particular occupy memory I/O bandwidth during operation, which leads to insufficient bandwidth if the memory is shared by multiple systems.
Therefore, how to transmit data efficiently in hardware, reducing the memory bandwidth occupied per unit time while retaining as much of the data information as possible, has become an important issue. Adaptive Huffman coding is a real-time lossless data compression algorithm; however, because of its special tree data structure, accessing the data of subtree nodes recursively is difficult.
In view of the above, there is a need to design a new encoding scheme to overcome at least some of the above-mentioned disadvantages of the existing encoding schemes.
Disclosure of Invention
The invention provides an adaptive Huffman coding system and method that allow the coding data structure to be accessed efficiently in hardware, optimize the storage space used for coding, and keep the memory configuration flexible.
In order to solve the technical problem, according to one aspect of the invention, the following technical scheme is adopted:
an adaptive Huffman coding system, comprising:
a to-be-encoded data acquisition module, used for acquiring data to be encoded;
a Huffman binary tree coding module, used for constructing a Huffman binary tree from the data to be encoded acquired by the to-be-encoded data acquisition module, establishing an index between each node of the Huffman binary tree and a memory, and, while the Huffman binary tree is being constructed, calculating the memory index value of a target node from the unique path from the root node to the target node.
As an embodiment of the present invention, the Huffman binary tree coding module includes:
a Huffman binary tree initialization unit, used for initializing a Huffman binary tree;
a Huffman binary tree updating unit, used for updating the Huffman binary tree according to set rules;
when updating the Huffman binary tree, the Huffman binary tree updating unit follows these rules:
(1) nodes are numbered in increasing order: node numbers increase from top to bottom and from left to right;
(2) internal nodes are represented by circles; the weight of an internal node is the sum of the weights of its child nodes;
(3) leaf nodes are represented by squares; the weight of a leaf node is initialized to 1 and is increased by 1 each time the same data is input again;
(4) during the update of the tree, nodes must be exchanged if the left subtree has a greater weight than the right subtree.
As an embodiment of the present invention, the Huffman binary tree coding module needs to dynamically update node weights and exchange nodes while constructing the Huffman binary tree; the mapping from node data to hardware memory is optimized by using fixed-address access;
a continuous memory is configured for storing the information of all internal nodes and leaf nodes of a full binary tree of a specific depth; if an internal node is stored, the dynamic operations are performed directly through its memory address, without accessing the left and right subtrees one by one through parent nodes; if a leaf node is stored, the data to be encoded in the leaf node can be accessed directly and its weight updated;
the memory index value of the target node is calculated from the unique path from the root node (root) to the target node: the path to a left child node is recorded as 0, and the path to a right child node is recorded as 1;
the depth of the full binary tree is set to k; the root node is mapped to memory address 1, the addresses of its left and right child nodes are defined as 2 and 3 according to the full binary tree, and so on; the memory address mapped to any target node is obtained with the left shift operator (<<), which shifts the bits of an integer or enumeration expression to the left; according to the definition of the left and right child node indexes, for any node with index j, the memory address index of its left child node is j << 1 and that of its right child node is (j << 1) + 1, so that the memory index i of the target node is calculated from the unique path from the root node to the target node.
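The child-index definition above can be written as a pair of one-line helpers. This is a minimal sketch; the function names are illustrative, and the `parent` helper is the standard inverse for this layout rather than something the text states.

```python
def left_child(j: int) -> int:
    """Memory index of the left child of node j (root is index 1)."""
    return j << 1


def right_child(j: int) -> int:
    """Memory index of the right child of node j."""
    return (j << 1) + 1


def parent(j: int) -> int:
    """Inverse of the two helpers above (assumption: not stated in the text,
    but it follows directly from the index definition)."""
    return j >> 1


# Root at address 1, its children at 2 and 3, and so on.
print(left_child(1), right_child(1), left_child(2), parent(5))
```

Because each step is a single shift and an optional increment, no pointer chasing is needed to reach a child or parent slot.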
As an embodiment of the present invention, the Huffman binary tree coding module further includes a binary tree memory address remapping unit, used for taking a given group of continuous memory spaces after the Huffman binary tree is constructed and, by means of sub-binary-tree memory address remapping, pointing null pointers in the memory to nodes beyond the memory range, so that the nodes can be stored in hardware as a balanced tree data structure with higher memory efficiency.
As an embodiment of the present invention, assuming that the depth of the full binary tree is k, the memory index value of the target node is pre-calculated while the binary tree is constructed, and binary tree memory address remapping is performed once the index value exceeds $2^k - 1$; in the remapping process, an address mapping relationship is established between the parent node of the target node whose index value exceeds $2^k - 1$ and the parent node of an empty node originally recorded in the memory;
the binary tree memory address remapping comprises: accessing the parent node of an empty node originally recorded in the memory and constructing the address mapping relationship between the nodes; first, owing to the characteristics of the adaptive Huffman binary tree, a leaf node containing data to be encoded has no left or right subtree, so among the $2^k$ node slots configured for a full binary tree of depth k, if a leaf node storing data has memory index j, its left and right child node indexes 2j and (2j + 1) are both empty nodes; the leaf node with the smallest index value is used as the parent node of the empty nodes for the mapping; then, while the mapping relationship is constructed, the node pointer is pointed to the node whose index value exceeds $2^k - 1$, and during encoding the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree.
As an embodiment of the present invention, the Huffman binary tree coding module further comprises a binary tree depth acquisition unit, used for acquiring the target binary tree depth;
when the binary tree depth obtained during encoding exceeds the set target binary tree depth, the binary tree depth acquisition unit records the current encoding information, a new Huffman binary tree is constructed, and a new round of encoding is performed, until all the set data to be encoded has been encoded;
for the new round, the Huffman binary tree is re-initialized with an NYT leaf node, the data to be encoded is again treated as character nodes appearing for the first time, and a new adaptive Huffman binary tree is constructed.
According to another aspect of the invention, the following technical scheme is adopted: an adaptive Huffman coding method, comprising:
a to-be-encoded data acquisition step of acquiring data to be encoded;
a Huffman binary tree coding step of constructing a Huffman binary tree in real time from the acquired data to be encoded, establishing an index between each node of the Huffman binary tree and a memory, and, while the Huffman binary tree is being constructed, calculating the memory index value of a target node from the unique path from the root node to the target node.
In the Huffman binary tree coding step, node weights need to be dynamically updated and nodes exchanged while the Huffman binary tree is constructed; the mapping from node data to hardware memory is optimized by using fixed-address access;
a continuous memory is configured for storing the information of all internal nodes and leaf nodes of a full binary tree of a specific depth; if an internal node is stored, the dynamic operations are performed directly through its memory address, without accessing the left and right subtrees one by one through parent nodes; if a leaf node is stored, the data to be encoded in the leaf node can be accessed directly and its weight updated;
the memory index value of the target node is calculated from the unique path from the root node (root) to the target node: the path to a left child node is recorded as 0, and the path to a right child node is recorded as 1;
the depth of the full binary tree is set to k; the root node is mapped to memory address 1, the addresses of its left and right child nodes are defined as 2 and 3 according to the full binary tree, and so on; the memory address mapped to any target node is obtained with the left shift operator (<<), which shifts the bits of an integer or enumeration expression to the left; according to the definition of the left and right child node indexes, for any node with index j, the memory address index of its left child node is j << 1 and that of its right child node is (j << 1) + 1, so that the memory index i of the target node is calculated from the unique path from the root node to the target node.
As an embodiment of the present invention, the Huffman binary tree coding step comprises a binary tree memory address remapping step: after the Huffman binary tree is constructed, a group of continuous memory spaces is given and, by means of sub-binary-tree memory address remapping, null pointers in the memory are pointed to nodes beyond the memory range, so that the nodes can be stored in hardware as a balanced tree data structure with higher memory efficiency;
assuming that the depth of the full binary tree is k, the memory index value of the target node is pre-calculated while the binary tree is constructed, and binary tree memory address remapping is performed once the index value exceeds $2^k - 1$; in the remapping process, an address mapping relationship is established between the parent node of the target node whose index value exceeds $2^k - 1$ and the parent node of an empty node originally recorded in the memory;
the binary tree memory address remapping step comprises: accessing the parent node of an empty node originally recorded in the memory and constructing the address mapping relationship between the nodes; first, owing to the characteristics of the adaptive Huffman binary tree, a leaf node containing data to be encoded has no left or right subtree, so among the $2^k$ node slots configured for a full binary tree of depth k, if a leaf node storing data has memory index j, its left and right child node indexes 2j and (2j + 1) are both empty nodes; the leaf node with the smallest index value is used as the parent node of the empty nodes for the mapping; then, while the mapping relationship is constructed, the node pointer is pointed to the node whose index value exceeds $2^k - 1$, and during encoding the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree.
As an embodiment of the present invention, the Huffman binary tree coding step further includes a binary tree depth acquisition step of acquiring the target binary tree depth;
when the binary tree depth obtained during encoding exceeds the set target binary tree depth, the current encoding information is recorded, a new Huffman binary tree is constructed, and a new round of encoding is performed, until all the set data to be encoded has been encoded;
for the new round, the Huffman binary tree is re-initialized with an NYT leaf node, the data to be encoded is again treated as character nodes appearing for the first time, and a new adaptive Huffman binary tree is constructed.
The beneficial effects of the invention are as follows: the adaptive Huffman coding system and method allow the coding data structure to be accessed efficiently in hardware, optimize the storage space used for coding, and keep the memory configuration flexible.
Drawings
Fig. 1 is a schematic diagram of the composition of an adaptive Huffman coding system according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the composition of the Huffman binary tree coding module according to an embodiment of the present invention.
Fig. 3 is a flowchart of an adaptive Huffman coding method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an example of adaptive Huffman coding.
Fig. 5 is a schematic diagram of an access mechanism for optimizing Huffman binary tree memory address mapping according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of an example memory for optimizing Huffman binary tree memory address mapping according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of sub-binary-tree memory address remapping according to an embodiment of the present invention.
Fig. 8 is a flowchart of the configurable binary tree depth operation according to an embodiment of the present invention.
Fig. 9 is a schematic diagram of an example of compressing parameters with hardware-efficient adaptive Huffman coding according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For a further understanding of the invention, reference will now be made to the preferred embodiments of the invention by way of example, and it is to be understood that the description is intended to further illustrate features and advantages of the invention, and not to limit the scope of the claims.
This section describes only several exemplary embodiments, and the present invention is not limited to the scope of the embodiments described. Replacing some technical features of the embodiments with the same or similar prior art means also falls within the scope of disclosure and protection of the present invention.
The steps in the embodiments in the specification are only expressed for convenience of description, and the implementation manner of the present application is not limited by the order of implementation of the steps.
The term "connected" in the specification includes both direct connection and indirect connection.
The invention discloses an adaptive Huffman coding system. Fig. 1 is a schematic diagram of the composition of the adaptive Huffman coding system in an embodiment of the invention. Referring to Fig. 1, the adaptive Huffman coding system comprises a to-be-encoded data acquisition module 1 and a Huffman binary tree coding module 2. The to-be-encoded data acquisition module 1 is used for acquiring data to be encoded. The Huffman binary tree coding module 2 is used for constructing a Huffman binary tree from the data to be encoded acquired by the to-be-encoded data acquisition module 1, establishing an index between each node of the Huffman binary tree and a memory, and, while the Huffman binary tree is being constructed, calculating the memory index value of a target node from the unique path from the root node to the target node.
Fig. 2 is a schematic diagram of the composition of the Huffman binary tree coding module in an embodiment of the present invention. Referring to Fig. 2, in an embodiment of the present invention, the Huffman binary tree coding module 2 includes a Huffman binary tree initializing unit 21 and a Huffman binary tree updating unit 22. The Huffman binary tree initializing unit 21 is used for initializing a Huffman binary tree; the Huffman binary tree updating unit 22 is used for updating the Huffman binary tree according to set rules.
When updating the Huffman binary tree, the Huffman binary tree updating unit 22 follows these rules:
(1) nodes are numbered in increasing order: node numbers increase from top to bottom and from left to right;
(2) internal nodes are represented by circles; the weight of an internal node is the sum of the weights of its child nodes;
(3) leaf nodes are represented by squares; the weight of a leaf node is initialized to 1 and is increased by 1 each time the same data is input again;
(4) during the update of the tree, nodes must be exchanged if the left subtree has a greater weight than the right subtree.
In its initial state, the Huffman binary tree initialized by the Huffman binary tree initializing unit 21 contains only one leaf node, the NYT (Not Yet Transmitted) leaf node. NYT is an escape character: when a decoder decodes an NYT, it knows that the content following the NYT is an uncoded symbol. When a symbol q is inserted, two situations arise:
(1) q is a character node occurring for the first time: the NYT node is expanded into a subtree made up of two leaf nodes, the NYT symbol and the new symbol q; whether the parent node of this subtree satisfies rule (4) above is then checked, and if it does not, the nodes must be exchanged before the weights are updated;
(2) q is not a character node occurring for the first time: if, at the parent node of the node where q is located, the weight of the left subtree is less than that of the right subtree, the weights of the q node and of its parent node are each increased by 1 directly; otherwise, the nodes must be exchanged and the weights updated.
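The local part of the update rules (rules (2), (3) and (4)) can be sketched for a single parent node as follows. This is a simplified illustration under stated assumptions, not the full update procedure of the invention: the `Node` class and `refresh` function are hypothetical names, and propagation of the update up to the root is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node record for illustration only.
    weight: int = 0
    symbol: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def refresh(parent: Node) -> None:
    """Apply rule (4), then rule (2), to one internal node.

    Rule (4): exchange the children if the left subtree is heavier.
    Rule (2): the internal node weight is the sum of its children.
    """
    if parent.left.weight > parent.right.weight:
        parent.left, parent.right = parent.right, parent.left
    parent.weight = parent.left.weight + parent.right.weight


# Two leaves, each seen once (rule (3): leaf weight starts at 1).
a = Node(weight=1, symbol=0x11)
b = Node(weight=1, symbol=0x22)
root = Node(left=a, right=b)
refresh(root)          # weights equal: no exchange, root weight becomes 2

a.weight += 1          # symbol 0x11 is input again (rule (3))
refresh(root)          # left is now heavier: children are exchanged
print(root.left.symbol, root.right.symbol, root.weight)
```

After the second call the heavier leaf sits on the right, so the invariant "left weight is not greater than right weight" holds again at this node.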
In an embodiment of the present invention, the Huffman binary tree coding module 2 needs to dynamically update node weights and exchange nodes while constructing the Huffman binary tree, so obtaining node addresses quickly for these operations is crucial to the overall encoding and decoding speed. Conventionally, these dynamic operations reach each node through recursive pointer access, so a node cannot be accessed through a fixed address. The present invention optimizes the mapping of node data to hardware memory by using fixed-address access.
First, a continuous memory is configured for storing the information of all internal nodes and leaf nodes of a full binary tree of a specific depth. If an internal node is stored, the dynamic operations are performed directly through its memory address, without accessing the left and right subtrees one by one through parent nodes; if a leaf node is stored, the data to be encoded in the leaf node can be accessed directly and its weight updated.
Next, the memory index value of the target node is calculated from the unique path from the root node (root) to the target node: the path to a left child node is recorded as 0, and the path to a right child node is recorded as 1.
Let the depth of the full binary tree be k. In the usual method, by definition, the i-th layer has $2^{i-1}$ nodes; for a node with index j, the left child node index is 2j and the right child node index is (2j + 1); a memory of size $2^k$ needs to be configured in total.
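The memory size follows from summing the layer sizes. The short derivation below is added for illustration; the remark that address 0 stays unused is an inference from the root being mapped to address 1, not a statement from the source.

```latex
% Number of nodes in a full binary tree of depth k
\sum_{i=1}^{k} 2^{\,i-1} = 2^{k} - 1
% Valid node addresses are 1, 2, ..., 2^k - 1; with address 0 left unused
% (assumption), a memory of 2^k slots is sufficient.
% Example for k = 4: 1 + 2 + 4 + 8 = 15 nodes, stored in 16 slots.
```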
Fig. 4 is an example of adaptive Huffman coding. Given the sequence of data symbols "0x11 0x22 0x33 0x11 0x22", the final corresponding encoding result is "00010001 0100100010 10100110011 0 11". The figure shows the final Huffman binary tree after the last data symbol 0x22 is inserted.
In the invention, the root node is mapped to memory address 1, its left and right child nodes are assigned addresses 2 and 3 respectively according to the full binary tree, and so on. The memory address mapped to any target node is obtained with the left shift operator (<<), which shifts the bits of an integer or enumeration expression to the left and is often used in hardware to accelerate multiplication by powers of two. According to the definition of the left and right child node indexes, for any node with index j, the memory address index of its left child node is j << 1 and that of its right child node is (j << 1) + 1, so that the memory index i of the target node is calculated from the unique path from the root node to the target node.
Fig. 5 illustrates an access mechanism for optimizing Huffman binary tree memory address mapping according to an embodiment of the present invention. Take AI model parameters encoded in hexadecimal as an example, where one node stores 1 byte of data, for example 0x11. During encoding, a Huffman binary tree is constructed in real time; the address of the root node (root) is set to 1, and the value remaining after the highest bit 1 of an address is removed is used as the code, in the same way as in the original adaptive Huffman coding. The path to a left child node is recorded as 0; from the root node, the left child's memory address is accessed as (1 << 1). The path to a right child node is recorded as 1, and its memory address is accessed as (1 << 1) + 1. By recording the position of a node relative to the root node, a set of fixed access addresses is obtained. For example: the position of node b is recorded as $(10)_2$, i.e., the left child node of the root node, whose address is (1 << 1) = 2; the position of node c is recorded as $(11)_2$, i.e., the right child node of the root node, whose address is (1 << 1) + 1 = 3; the position of node d is recorded as $(100)_2$, i.e., the left child of the left child of the root node, whose address is ((1 << 1) << 1) = 4. Node data can thus be accessed through hardware-friendly shift operators while AI model parameters are compressed in real time on edge-side hardware.
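The relationship between a path, its fixed memory address, and the resulting code can be sketched as follows. The function names are illustrative assumptions; the path is given as a string of "0" (left) and "1" (right) steps starting from the root.

```python
def address_from_path(path: str) -> int:
    """Fold a root-to-node path ('0' = left, '1' = right) into an address.

    The root has address 1; each step shifts left by one bit and adds 1
    for a right child, as described for Fig. 5.
    """
    addr = 1
    for step in path:
        if step not in "01":
            raise ValueError(f"invalid path step: {step!r}")
        addr = (addr << 1) + (1 if step == "1" else 0)
    return addr


def codeword_from_address(addr: int) -> str:
    """Return the bits remaining after the highest 1 bit is removed."""
    return bin(addr)[3:]  # bin() yields '0b1...'; drop '0b' and the leading 1


# Nodes b, c and d from the worked example in the text.
print(address_from_path("0"), address_from_path("1"), address_from_path("00"))
print(codeword_from_address(4))
```

The two functions are inverses of each other: the address is the path with a leading 1 bit prepended, which is why the root address must be 1 rather than 0.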
Fig. 6 illustrates an example memory layout for the optimized Huffman binary tree memory address mapping according to an embodiment of the present invention. Assuming that a memory of 16 bytes is configured for storing data, it can accommodate a binary tree of depth at most 4, with 15 nodes (as shown in Fig. 5); the address at which each node of Fig. 5 is actually stored in the memory is shown in Fig. 6.
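A flat memory of this kind can be modelled with a plain list. The sketch below is illustrative only: the class name `FlatTreeMemory` and its methods are assumptions, and leaving slot 0 unused is inferred from the root being placed at address 1.

```python
class FlatTreeMemory:
    """Contiguous storage for a full binary tree of a given depth.

    Hypothetical model: 2**depth slots, node addresses 1 .. 2**depth - 1,
    slot 0 unused (assumption based on the root living at address 1).
    """

    def __init__(self, depth: int):
        self.depth = depth
        self.slots = [None] * (1 << depth)

    def max_address(self) -> int:
        return (1 << self.depth) - 1

    def store(self, addr: int, value) -> None:
        if not 1 <= addr <= self.max_address():
            raise IndexError(f"address {addr} is outside the configured memory")
        self.slots[addr] = value

    def load(self, addr: int):
        if not 1 <= addr <= self.max_address():
            raise IndexError(f"address {addr} is outside the configured memory")
        return self.slots[addr]


mem = FlatTreeMemory(depth=4)   # 16 slots, 15 usable node addresses
mem.store(4, 0x11)              # node d from Fig. 5, holding one byte
print(len(mem.slots), mem.max_address(), hex(mem.load(4)))
```

Any node is reached in one indexed access, with no traversal from the root.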
The Huffman binary tree coding module 2 further comprises a binary tree memory address remapping unit 23, used for taking a given group of continuous memory spaces after the Huffman binary tree is constructed and, by means of sub-binary-tree memory address remapping, pointing null pointers in the memory to nodes beyond the memory range, so that the nodes can be stored in hardware as a balanced tree data structure with higher memory efficiency.
Assuming that the depth of the full binary tree is k, the memory index value of the target node is pre-calculated while the binary tree is constructed, and binary tree memory address remapping is performed once the index value exceeds $2^k - 1$. In the remapping process, an address mapping relationship is established between the parent node of the target node whose index value exceeds $2^k - 1$ and the parent node of an empty node originally recorded in the memory.
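The pre-calculation can be sketched as a simple range check on the child index before a node is expanded. The function name `needs_remap` is an assumption made for this example.

```python
def needs_remap(parent_addr: int, k: int) -> bool:
    """Return True if the children of parent_addr would exceed 2**k - 1.

    The left child index is parent_addr << 1. Since 2**k - 1 is odd and the
    left child index is even, the right child fits whenever the left one does,
    so checking the left child is sufficient.
    """
    limit = (1 << k) - 1
    return (parent_addr << 1) > limit


# With k = 4 the largest valid address is 15.
print(needs_remap(7, 4))   # children 14 and 15 still fit
print(needs_remap(8, 4))   # children 16 and 17 exceed the memory range
```

In the Fig. 7 example discussed in the text, node h at address $(1000)_2$ = 8 is exactly such a parent when the depth is limited to 4.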
The binary tree memory address remapping unit 23 is used for accessing the parent node of an empty node originally recorded in the memory and constructing the address mapping relationship between the nodes. First, owing to the characteristics of the adaptive Huffman binary tree, a leaf node containing data to be encoded has no left or right subtree, so among the $2^k$ node slots configured for a full binary tree of depth k, if a leaf node storing data has memory index j, its left and right child node indexes 2j and (2j + 1) are both empty nodes; the leaf node with the smallest index value is used as the parent node of the empty nodes for the mapping. Then, while the mapping relationship is constructed, the node pointer is pointed to the node whose index value exceeds $2^k - 1$, and during encoding the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree.
Fig. 7 shows an example of sub-binary-tree memory address remapping according to an embodiment of the present invention. As can be observed from Fig. 5 and Fig. 6, if the binary tree has a skewed-tree coding structure, a large amount of memory is wasted (for example, most of the 16 bytes of memory in Fig. 6 are empty). After a feature map of an AI model is encoded, the binary tree has about $2^{17}$ nodes, but only 513 nodes actually store data (assuming each pixel of the feature map ranges from 0 to 255, 256 × 2 + 1 = 513 nodes are required), which wastes a great deal of memory. Fig. 7 shows a skewed-tree data structure. Assuming the memory only allows a binary tree depth of 4, the nodes (p, q) that exceed the depth limit after encoding are stored at the positions (f, g) of empty nodes in the memory; that is, an address mapping relationship is established between the parent node (c) of positions (f, g) and the parent node (h) of the original nodes (p, q). When the address of node f or g is to be encoded, if the node is located in the sub binary tree, the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree. Taking node p as an example: its address in the original binary tree is $(10000)_2$; p is mapped to node f of the sub binary tree, at $(110)_2$, and the parent node mapping relationship [h $(1000)_2$ → c $(11)_2$] is established at the same time. When node f needs to be encoded into a codeword, the address of f is remapped back to the original binary tree address through the parent node mapping relationship: $([11]0)_2$ → $([1000]0)_2$. The codeword for node f is therefore $(0000)_2$ (the value remaining after the highest bit 1 is removed). By analogy, the data is gradually arranged into a data structure whose left and right subtrees are relatively balanced.
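The worked example for nodes f and p can be reproduced with a small lookup. This is a minimal sketch under stated assumptions: the parent mapping is modelled as a dictionary from the stored parent address (c) to the original parent address (h), only direct children of a mapped parent are handled, and the address of g/q is inferred by symmetry with f/p. The function names are hypothetical.

```python
def remap_to_original(addr: int, parent_map: dict) -> int:
    """Map a stored address back to its original binary tree address.

    Splits addr into (parent prefix, last path bit); if the parent prefix
    is a remapped slot, substitutes the original parent address.
    """
    parent, last_bit = addr >> 1, addr & 1
    if parent in parent_map:
        return (parent_map[parent] << 1) | last_bit
    return addr  # node was never remapped


def codeword(addr: int) -> str:
    """Bits remaining after the highest 1 bit of the address is removed."""
    return bin(addr)[3:]


# Parent mapping from the Fig. 7 example: c (11)_2 stands in for h (1000)_2.
parent_map = {0b11: 0b1000}

f = 0b110                                   # slot where node p is stored
original = remap_to_original(f, parent_map)
print(bin(original), codeword(original))
```

The substitution replaces the bracketed parent prefix [11] with [1000] and keeps the final path bit, mirroring the notation $([11]0)_2$ → $([1000]0)_2$ used in the text.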
With sub-binary-tree memory address remapping, nodes can be stored in hardware as a balanced tree data structure with higher memory efficiency, and the number of binary tree nodes generated after a feature map of the AI model is encoded is reduced to about $2^{10}$ (i.e., from the original 131 KB to 1 KB, which improves memory storage efficiency by about 100 times).
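The quoted figures can be checked with a short calculation. It assumes one byte of storage per node, as in the Fig. 5 example where one node stores 1 byte of data; the exact ratio is then 128, which the text rounds to "about 100 times".

```latex
% Before remapping: about 2^{17} node slots, 1 byte each (assumption)
2^{17}\ \text{B} = 131072\ \text{B} \approx 131\ \text{KB}
% After remapping: about 2^{10} node slots
2^{10}\ \text{B} = 1024\ \text{B} \approx 1\ \text{KB}
% Reduction factor
\frac{2^{17}}{2^{10}} = 2^{7} = 128
```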
In an embodiment of the present invention, the Huffman binary tree coding module 2 further includes a binary tree depth acquisition unit 24, used for acquiring the target binary tree depth. When the binary tree depth obtained during encoding exceeds the set target binary tree depth, the binary tree depth acquisition unit 24 records the current encoding information, a new Huffman binary tree is constructed, and a new round of encoding is performed, until all the set data to be encoded has been encoded. For the new round, the Huffman binary tree is re-initialized with an NYT leaf node, the data to be encoded is again treated as character nodes appearing for the first time, and a new adaptive Huffman binary tree is constructed according to the method described above.
Fig. 3 is a flowchart of an adaptive huffman coding method according to an embodiment of the present invention; referring to fig. 3, the adaptive huffman coding method includes:
step S1, acquiring data to be encoded;
step S2, constructing a binary Huffman tree in real time according to the acquired data to be encoded; and establishing indexes for each node of the binary Huffman tree and a memory, and calculating the memory index value of the target node according to the unique path from the root node to the target node when the binary Huffman tree is established.
In an embodiment of the present invention, in the Huffman binary tree coding step, node weights must be dynamically updated and nodes exchanged while the Huffman binary tree is constructed, so obtaining node addresses quickly for these operations is crucial to the overall coding speed. Such dynamic operations are typically performed by recursively following pointers to each node, which cannot be reached through a fixed address. Here, the mapping of node data to the hardware memory is instead optimized by means of fixed address access.
First, a contiguous memory is configured to store the information of all internal nodes and leaf nodes of a full binary tree of a specific depth. If a stored node is an internal node, the dynamic operations are performed directly through its memory address, without accessing the left and right subtrees one by one through parent nodes; if it is a leaf node, its data to be encoded can be accessed and its weight updated directly.
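One way to picture this fixed-address storage is as a pair of preallocated arrays indexed by node address. The Python sketch below is an assumption about layout (class and method names are hypothetical, and entry 0 is left unused so the root can sit at address 1); it is not the hardware design itself.

```python
class NodeStore:
    """Contiguous storage for a full binary tree of depth k, addressed by node index."""

    def __init__(self, depth_k):
        size = 1 << depth_k              # addresses 1 .. 2^k - 1 hold nodes
        self.weight = [0] * size
        self.symbol = [None] * size      # None marks an internal or empty slot

    def bump_leaf(self, address):
        """Update a leaf's weight directly, with no traversal from the root."""
        self.weight[address] += 1

    def refresh_internal(self, address):
        """Internal node weight is the sum of its two children's weights."""
        left = address << 1
        self.weight[address] = self.weight[left] + self.weight[left + 1]


store = NodeStore(3)
store.symbol[2], store.symbol[3] = "a", "b"
store.bump_leaf(2)
store.bump_leaf(2)
store.bump_leaf(3)
store.refresh_internal(1)
print(store.weight[1])  # 3
```

Each operation touches a known address, which is the property the description contrasts with pointer-chasing from parent to child.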
Next, according to the unique path from the root node (root) to the target node, calculating the memory index value of the target node: the left child node path record of the root node is 0; the right child node path record is 1.
Let the depth of the full binary tree be k. By definition, its i-th layer has 2^(i-1) nodes; for a node with index j, the index of its left child node is 2j and the index of its right child node is (2j + 1); a memory of 2^k entries needs to be configured in total.
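As a consistency check on the layer sizes (a worked identity added for illustration), summing the node counts over all k layers gives the largest index in the full binary tree:

```latex
\sum_{i=1}^{k} 2^{\,i-1} = 2^{k} - 1
```

With the root at index 1, valid node indexes therefore run from 1 to 2^k - 1, which fits within the 2^k configured entries. For k = 4 this gives 15 nodes in 16 entries. Reading entry 0 as simply unused is an assumption; the text does not say how that entry is treated.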
The root node is mapped to memory address 1, and, following the full binary tree definition, the addresses of its left and right child nodes are 2 and 3, and so on. The memory address mapped to any target node is obtained through the left shift operator (<<), which shifts the bits of an integer or enumerated-type expression to the left and is often used to accelerate multiplication by powers of two in hardware. According to the definition of the left and right child node indexes, for any target node with index j, the memory address index of its left child node is j << 1 and that of its right child node is (j << 1) + 1, where the memory index j of the target node is calculated from the unique path from the root node to the target node.
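The index calculation can be sketched in Python as follows. The function names are hypothetical; the path is given as a sequence of 0 (left) and 1 (right) steps from the root, matching the path record convention above.

```python
def left_child(j):
    """Memory index of the left child of node j."""
    return j << 1


def right_child(j):
    """Memory index of the right child of node j."""
    return (j << 1) + 1


def index_from_path(path_bits):
    """Memory index of the node reached from the root by following path_bits."""
    index = 1                      # the root is mapped to memory address 1
    for bit in path_bits:
        index = (index << 1) + bit
    return index


print(index_from_path([]), index_from_path([0]), index_from_path([1]))  # 1 2 3
print(bin(index_from_path([0, 0, 0, 0])))  # 0b10000
```

Read in reverse, the binary form of an index is a leading 1 followed by the path bits, so the path (and hence the codeword) can be recovered from the address alone.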
The Huffman binary tree coding step includes a binary tree memory address remapping step: after the construction of the Huffman binary tree is completed, given a group of contiguous memory spaces, sub binary tree memory address remapping points the null pointers in the memory to the nodes that fall beyond the memory range, so that the nodes can be stored on hardware in a more memory-efficient balanced tree data structure.
Assume the depth of the full binary tree is k. The memory index value of the target node is pre-calculated while the binary tree is constructed, and binary tree memory address remapping is performed once the index value exceeds 2^k - 1. The remapping establishes an address mapping relationship between the parent node of the target node whose index value exceeds 2^k - 1 and the parent node of the empty node originally recorded in the memory.
The binary tree memory address remapping step includes accessing the parent node of the empty nodes originally recorded in the memory and constructing the address mapping relationship between the nodes. First, owing to the characteristics of the adaptive Huffman binary tree, leaf nodes containing data to be encoded have no left or right subtrees; therefore, among the 2^k nodes of the binary tree space configured according to the size of a full binary tree of depth k, if the memory index of a leaf node storing data is j, its left and right child node indexes 2j and (2j + 1) are both empty nodes. The leaf node with the minimum index value is used as the parent node of the empty nodes for mapping. Then, in constructing the mapping relationship, the pointer of that leaf node is pointed to the parent node of the target node whose index value exceeds 2^k - 1, and, during encoding, the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree.
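The slot selection described above might be sketched in Python as follows. Everything named here is hypothetical (the function, the `leaves` and `used` sets, the mapping dictionary), and the node layout in the example is an assumed arrangement chosen to reproduce the c/h/f/g relationship of FIG. 7, not data taken from the figure.

```python
def place_children(parent, leaves, used, depth_k, slot_parent_to_original):
    """Return the two slot addresses in which the children of `parent` are stored."""
    limit = (1 << depth_k) - 1
    left, right = parent << 1, (parent << 1) + 1
    if right <= limit:
        return left, right                         # fits the full-tree layout
    # Overflow: use the data leaf with the smallest index whose child slots are free.
    for host in sorted(leaves):
        h_left, h_right = host << 1, (host << 1) + 1
        if h_right <= limit and h_left not in used and h_right not in used:
            slot_parent_to_original[host] = parent  # host stands in for the real parent
            used.update((h_left, h_right))
            return h_left, h_right
    raise MemoryError("no free child slots left for remapping")


# Assumed skewed layout for depth 4: leaves at 3 (c), 5 and 9; parent h at index 8.
leaves = {3, 5, 9}
used = {1, 2, 3, 4, 5, 8, 9}
mapping = {}
print(place_children(8, leaves, used, 4, mapping), mapping)  # (6, 7) {3: 8}
```

The children of index 8 would fall at 16 and 17, beyond the limit of 15, so they land in the empty slots 6 and 7 under leaf 3, and the recorded pair lets an encoder translate slot addresses back to original ones.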
The Huffman binary tree coding step further includes a binary tree depth obtaining step for obtaining the target binary tree depth. When the binary tree depth obtained during encoding exceeds the set target binary tree depth, the current encoding information is recorded, a new Huffman binary tree is constructed, and a new round of encoding is performed, until all of the set data to be encoded has been encoded. The new Huffman binary tree is re-initialized with a single NYT leaf node, the data to be encoded is again treated as character nodes appearing for the first time, and a new adaptive Huffman binary tree is constructed according to the method described above.
In a usage scenario of the invention, edge model inference requires a large amount of memory to store and transmit AI parameters and feature map data, and hardware-efficient adaptive Huffman coding is used to store the data efficiently and compress it losslessly. During encoding, a Huffman binary tree must be constructed in real time; the method replaces recursive node access with fixed address access and optimizes the mapping of node data to the hardware memory. While the binary tree is constructed, the distribution of the AI model parameters and feature map data often leads to a skewed tree coding structure; with sub binary tree memory address remapping, nodes can be stored on hardware in a more memory-efficient balanced tree data structure. For hardware with strict memory limits, a configurable binary tree depth is provided so that the construction of the binary tree can satisfy the memory limit. Compressing the AI model parameters and feature map data in real time on edge hardware with hardware-efficient adaptive Huffman coding greatly reduces the memory bandwidth occupied per unit time and allows the model to be deployed on chips with fewer hardware resources.
During encoding, a Huffman binary tree must be constructed in real time. The left child node (left child) of a target node with index j can access its memory address as (j << 1), and the right child node (right child) as (j << 1) + 1. A fixed set of access addresses can therefore be obtained by recording the data path, which optimizes the mapping of node data to the hardware memory. While the binary tree is constructed, owing to the characteristics of the adaptive Huffman binary tree itself, newly added symbols are continually inserted at the NYT node of a one-sided subtree and non-NYT nodes no longer extend downward; combined with the distribution of the AI model parameters and feature map data, this often leads to a skewed tree coding structure. For hardware with strict memory limits, a configurable binary tree depth is provided: if the coding process exceeds the target binary tree depth, the current coding information is recorded and the binary tree is rebuilt for a new round of coding, until all data are coded, so that the construction of the binary tree can satisfy the memory limit.
FIG. 8 is a flow diagram of the configurable binary tree depth operation. For hardware with strict memory limits, the allowed binary tree depth and node count N are first calculated from the memory size. N data items are coded in sequence until the N nodes are filled, and the corresponding coding result is stored. If uncoded data remain, the binary tree is re-initialized and the next N data items are coded, until all data are coded. After optimization by the sub binary tree memory address remapping described above, a feature map of the AI model requires about 1 KB of memory space; therefore, if the configurable memory is less than 1 KB, the method meets the hardware memory requirement by coding in batches.
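The batch loop of FIG. 8 can be outlined in Python. This is a sketch under stated assumptions: the description does not give the formula for N, so N is taken here as the 2^k - 1 nodes of the deepest full binary tree that fits in the available slots; `encode_batch` is a hypothetical caller-supplied function standing in for one round of adaptive Huffman coding on a freshly initialized tree.

```python
def plan_batches(memory_slots):
    """Return (allowed depth k, node count N) for a memory of `memory_slots` entries."""
    depth = memory_slots.bit_length() - 1     # largest k with 2^k <= memory_slots
    return depth, (1 << depth) - 1


def encode_in_batches(data, memory_slots, encode_batch):
    """Encode `data` N items at a time, starting a fresh tree for every batch."""
    _, n = plan_batches(memory_slots)
    if n < 1:
        raise ValueError("memory too small to hold any tree node")
    results = []
    for start in range(0, len(data), n):
        results.append(encode_batch(data[start:start + n]))  # store this batch's result
    return results


print(plan_batches(16))                                # (4, 15)
print(encode_in_batches(list(range(40)), 16, len))     # [15, 15, 10]
```

In the usage line, `len` merely stands in for a real encoder so that the batch sizes are visible.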
FIG. 9 is an example of compressing parameters with hardware-efficient adaptive Huffman coding. Assume the configurable memory space is 16 bytes (less than 1 KB), so coding is carried out in batches to meet the hardware memory requirement. First, a binary tree is generated using hardware-efficient adaptive Huffman coding, and the data of each node are mapped to a hardware memory address according to the node's position relative to the root node. To avoid a skewed tree data structure, the nodes (p, q, alpha, beta, gamma, delta) that exceed the depth limit after coding are stored at the positions (f, g, l, m, n, o) of empty nodes in the memory; that is, the root node indexes of p and q are changed to point to c, and the root node indexes of gamma and delta are changed to point to q. After the batch of data has been coded, the corresponding coding result is stored; if uncoded data remain, the binary tree is re-initialized and the next batch of data is coded, until all data are coded.
In summary, the adaptive huffman coding system and method provided by the present invention can efficiently access the coded data structure on the hardware and optimize the storage space of the code, while giving flexibility to the configurable memory.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware; for example, it may be implemented using Application Specific Integrated Circuits (ASICs), general purpose computers, or any other similar hardware devices. In some embodiments, the software programs of the present application may be executed by a processor to implement the above steps or functions. As such, the software programs (including associated data structures) of the present application can be stored in a computer-readable recording medium; such as RAM memory, magnetic or optical drives or diskettes, and the like. In addition, some steps or functions of the present application may be implemented using hardware; for example, as circuitry that cooperates with the processor to perform various steps or functions.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The description and applications of the invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Effects or advantages referred to in the embodiments may not be embodied in the embodiments due to interference of various factors, and the description of the effects or advantages is not intended to limit the embodiments. Variations and modifications of the embodiments disclosed herein are possible, and alternative and equivalent various components of the embodiments will be apparent to those skilled in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other components, materials, and parts, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims (10)

1. An adaptive Huffman coding system, comprising:
the data to be coded acquisition module is used for acquiring data to be coded;
the Huffman binary tree coding module is used for constructing a Huffman binary tree according to the data to be coded, which is obtained by the data to be coded obtaining module; and establishing indexes for each node of the binary Huffman tree and a memory, and calculating the memory index value of the target node according to the unique path from the root node to the target node when the binary Huffman tree is established.
2. The adaptive huffman coding system according to claim 1, wherein:
the Huffman binary tree coding module comprises:
the Huffman binary tree initialization unit is used for initializing a Huffman binary tree;
the Huffman binary tree updating unit is used for updating the Huffman binary tree according to a set rule;
the process of updating the binary huffman tree by the binary huffman tree updating unit is required to follow the following specifications:
(1) the nodes are numbered according to the increasing sequence; that is, the node numbers are increased from top to bottom; numbering from left to right and increasing progressively;
(2) the internal nodes are represented by circles; the internal node weight is the sum of the child node weights;
(3) leaf nodes are represented by squares; the weight of the leaf node is initialized to 1, and if data are input repeatedly, the weight is increased by 1;
(4) during the update of the tree, nodes must be switched if the left sub-tree has a greater weight than the right sub-tree.
3. The adaptive huffman coding system according to claim 1, wherein:
the Huffman binary tree coding module dynamically updates node weights and switching nodes in the process of constructing a Huffman binary tree; optimizing the mapping of the node data to the hardware memory in a fixed address access mode;
configuring a continuous memory for storing information of all internal nodes and leaf nodes of a full binary tree under a specific depth; if the internal node is stored, the dynamic operation is carried out through accessing the address of the memory, and the left subtree and the right subtree do not need to be accessed one by one through the father node; if the leaf node is stored, directly accessing the data to be encoded of the leaf node and updating the weight;
calculating the memory index value of the target node according to the unique path from the root node to the target node: the left child node path record of the root node is 0; the right child node path record is 1;
setting the depth of a full binary tree as k; mapping the root node to memory address 1, and defining the addresses of its left and right child nodes as 2 and 3 according to the full binary tree, and so on; the memory address mapped to any target node is obtained through the left shift operator <<, which shifts the bits of an integer or enumerated-type expression to the left; according to the definition of the left and right child node indexes, the memory address index of the left child node of any target node with index j is represented as j << 1, and the memory address index of the right child node is represented as (j << 1) + 1, wherein the memory index j of the target node is calculated from the unique path from the root node to the target node.
4. The adaptive huffman coding system according to claim 1, wherein:
the Huffman binary tree coding module further comprises a binary tree memory address remapping unit, and is used for giving a group of continuous memory spaces by using a sub-binary tree memory address remapping mode after the construction of the Huffman binary tree is completed, and pointing a null pointer in a memory to a node beyond a memory range, so that the node can be stored in a balanced tree data structure with higher memory efficiency on hardware.
5. The adaptive Huffman coding system of claim 4, wherein:
setting a full binary tree depth as k, pre-calculating the memory index value of the target node in the process of constructing the binary tree, and performing binary tree memory address remapping once the index value exceeds 2^k - 1; the binary tree memory address remapping establishes an address mapping relationship between the parent node of the target node whose index value exceeds 2^k - 1 and the parent node of the empty node originally recorded in the memory;
the binary tree memory address remapping unit is used for accessing the parent node of the empty node originally recorded in the memory and constructing the address mapping relationship between the nodes; in the adaptive Huffman binary tree, leaf nodes containing data to be encoded have no left or right subtrees, so that, among the 2^k nodes of the binary tree space configured according to the size of a full binary tree of depth k, for a leaf node storing data with memory index j, the left and right child node indexes 2j and (2j + 1) are both empty nodes; the leaf node with the minimum index value is used as the parent node of the empty nodes for mapping; in constructing the mapping relationship, the pointer of the leaf node with the minimum index value points to the parent node of the target node whose index value exceeds 2^k - 1, and, during encoding, the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree.
6. The adaptive huffman coding system according to claim 1, wherein:
the Huffman binary tree coding module further comprises a binary tree depth acquisition unit used for acquiring the depth of a target binary tree;
the binary tree depth acquisition unit is used for recording current coding information and constructing a new Huffman binary tree when the binary tree depth acquired by the binary tree depth acquisition unit exceeds the set target binary tree depth in the coding process, and carrying out a new round of coding until the set data to be coded is coded;
and the binary Huffman tree re-initializes an NYT leaf node, the data to be encoded is regarded as the character node appearing for the first time again, and a new adaptive binary Huffman tree is constructed.
7. An adaptive Huffman coding method, comprising:
a to-be-coded data acquisition step of acquiring to-be-coded data;
a Huffman binary tree coding step, namely constructing a Huffman binary tree in real time according to the acquired data to be coded; and establishing indexes for each node of the binary Huffman tree and a memory, and calculating the memory index value of the target node according to the unique path from the root node to the target node when the binary Huffman tree is established.
8. The adaptive huffman coding method according to claim 7, wherein:
in the encoding step of the binary Huffman tree, the node weight and the exchange node need to be dynamically updated in the process of constructing the binary Huffman tree; optimizing the mapping of the node data to the hardware memory in a fixed address access mode;
configuring a continuous memory for storing information of all internal nodes and leaf nodes of a full binary tree under a specific depth; if the internal node is stored, the dynamic operation is carried out through accessing the address of the memory, and the left subtree and the right subtree do not need to be accessed one by one through the father node; if the leaf node is stored, directly accessing the data to be encoded of the leaf node and updating the weight;
calculating the memory index value of the target node according to the unique path from the root node to the target node: the left child node path record of the root node is 0; the right child node path record is 1;
setting the depth of a full binary tree as k; mapping the root node to memory address 1, and defining the addresses of its left and right child nodes as 2 and 3 according to the full binary tree, and so on; the memory address mapped to any target node is obtained through the left shift operator <<, which shifts the bits of an integer or enumerated-type expression to the left; according to the definition of the left and right child node indexes, the memory address index of the left child node of any target node with index j is represented as j << 1, and the memory address index of the right child node is represented as (j << 1) + 1, wherein the memory index j of the target node is calculated from the unique path from the root node to the target node.
9. The adaptive huffman coding method according to claim 7, wherein:
the Huffman binary tree coding step comprises a binary tree memory address remapping step, after the construction of the Huffman binary tree is completed, a group of continuous memory spaces are given by using a sub binary tree memory address remapping mode, and null pointers in a memory point to nodes beyond the memory range, so that the nodes can be stored in a hardware by using a balanced tree data structure with higher memory efficiency;
setting a full binary tree depth as k, pre-calculating the memory index value of the target node in the process of constructing the binary tree, and performing binary tree memory address remapping once the index value exceeds 2^k - 1; the binary tree memory address remapping establishes an address mapping relationship between the parent node of the target node whose index value exceeds 2^k - 1 and the parent node of the empty node originally recorded in the memory;
in the binary tree memory address remapping step, the parent node of the empty node originally recorded in the memory is accessed and the address mapping relationship between the nodes is constructed; in the adaptive Huffman binary tree, leaf nodes containing data to be encoded have no left or right subtrees, so that, among the 2^k nodes of the binary tree space configured according to the size of a full binary tree of depth k, for a leaf node storing data with memory index j, the left and right child node indexes 2j and (2j + 1) are both empty nodes; the leaf node with the minimum index value is used as the parent node of the empty nodes for mapping; in constructing the mapping relationship, the pointer of the leaf node with the minimum index value points to the parent node of the target node whose index value exceeds 2^k - 1, and, during encoding, the parent node mapping relationship is used to remap the address information of the sub binary tree back to the original binary tree.
10. The adaptive huffman coding method according to claim 7, wherein:
the Huffman binary tree coding step further comprises a binary tree depth obtaining step for obtaining the depth of a target binary tree;
when the depth of the binary tree obtained in the encoding process exceeds the depth of the set target binary tree, recording current encoding information, constructing a new Huffman binary tree, and performing a new round of encoding until the set data to be encoded is encoded;
and the binary Huffman tree re-initializes an NYT leaf node, the data to be encoded is regarded as the character node appearing for the first time again, and a new adaptive binary Huffman tree is constructed.
CN202210366617.XA 2022-04-08 2022-04-08 Adaptive Huffman coding system and method Pending CN114900193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210366617.XA CN114900193A (en) 2022-04-08 2022-04-08 Adaptive Huffman coding system and method

Publications (1)

Publication Number Publication Date
CN114900193A true CN114900193A (en) 2022-08-12

Family

ID=82716277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210366617.XA Pending CN114900193A (en) 2022-04-08 2022-04-08 Adaptive Huffman coding system and method

Country Status (1)

Country Link
CN (1) CN114900193A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116318173A (en) * 2023-05-10 2023-06-23 青岛农村商业银行股份有限公司 Digital intelligent management system for financial financing service
CN116452682A (en) * 2023-05-09 2023-07-18 北京数慧时空信息技术有限公司 Slicing-free real-time release system and method based on tree structure
CN116610084A (en) * 2023-07-20 2023-08-18 北京柏瑞安电子技术有限公司 PCBA production data intelligent management system
CN117435776A (en) * 2023-12-20 2024-01-23 杭州拓数派科技发展有限公司 Metadata storage and query method, device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452682A (en) * 2023-05-09 2023-07-18 北京数慧时空信息技术有限公司 Slicing-free real-time release system and method based on tree structure
CN116452682B (en) * 2023-05-09 2024-06-11 北京数慧时空信息技术有限公司 Slicing-free real-time release system and method based on tree structure
CN116318173A (en) * 2023-05-10 2023-06-23 青岛农村商业银行股份有限公司 Digital intelligent management system for financial financing service
CN116318173B (en) * 2023-05-10 2023-08-11 青岛农村商业银行股份有限公司 Digital intelligent management system for financial financing service
CN116610084A (en) * 2023-07-20 2023-08-18 北京柏瑞安电子技术有限公司 PCBA production data intelligent management system
CN116610084B (en) * 2023-07-20 2023-09-12 北京柏瑞安电子技术有限公司 PCBA production data intelligent management system
CN117435776A (en) * 2023-12-20 2024-01-23 杭州拓数派科技发展有限公司 Metadata storage and query method, device, computer equipment and storage medium
CN117435776B (en) * 2023-12-20 2024-04-30 杭州拓数派科技发展有限公司 Metadata storage and query method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination